MCMC Methods For Fitting and Comparing Multinomial Response Models
Abstract
This paper is concerned with statistical inference in multinomial probit, multinomial-t and multinomial logit models. New Markov chain Monte Carlo (MCMC) algorithms for fitting these models are introduced and compared with existing MCMC methods. The question of parameter identification in the multinomial probit model is readdressed. Model comparison issues are also discussed and the method of Chib (1995) is utilized to find Bayes factors for competing multinomial probit and multinomial logit models. The methods and ideas are illustrated in detail with an example.
Keywords: Bayes factor; Gibbs sampling; Monte Carlo EM algorithm; Marginal likelihood; Metropolis-Hastings algorithm; Multinomial logit; Multinomial probit; Multinomial-t; Model comparison.
1 Introduction
The fitting of multinomial probit models has been viewed as a challenge for over twenty-five years. One major difficulty is the problem of evaluating the likelihood function, while another, somewhat neglected one, is the problem of estimating the covariance parameters of the model given that only one outcome per subject is observed. As a result of this missingness, which is inherent in multinomial data, it is possible that different combinations of regression and covariance parameters can produce virtually identical outcome probabilities.
Recently, developments in simulation-based Bayesian and classical methods have given rise to reasonably effective methods for estimating this model [McFadden (1989), Albert and Chib (1993), McCulloch, Polson, and Rossi (1994) and Stern (1997)]. Despite these developments, further improvements in the fitting of the model are possible, based on Markov chain Monte Carlo methods [Gelfand and Smith (1990), Chib and Greenberg (1996)].
In general terms, Markov chain simulation methods provide a rather attractive framework for dealing with the MNP and related multinomial models. The use of these methods in the context of probit models was initiated by Albert and Chib (1993). A central reason for studying these methods is that they are easy to implement and can be applied from both a classical and Bayesian perspective. One version of these methods can be used to sample the posterior distribution of the parameters, while another can be used to search for the maximum-likelihood estimate. As a bonus, these methods can be extended for the fitting of more general multinomial models than the MNP. One such model that is introduced in this paper, the multinomial-t, relies on a multivariate-t assumption for the latent data. It turns out that the basic algorithms have to be modified only slightly to apply to this model.
In addition to tackling the question of fitting the MNP model, another purpose of this paper is to develop a framework within which alternative multinomial models can be compared. This framework is important because there is a paucity of discussion in the literature on the practical benefits of the MNP model over the much simpler multinomial logit (MNL) model. Although it is well known that the MNL model suffers from a weakness not shared with the MNP model, namely that the ratio of probabilities of any two outcomes does not depend on the presence or absence of other outcomes, it appears that the importance of this weakness has not been assessed in empirical settings. One reason for this may be that the comparison of these non-nested models is difficult from a classical perspective. From a Bayesian viewpoint, however, such comparisons can be handled more conveniently. For a specified set of priors, a method due to Chib (1995) can be used to calculate the marginal likelihood of the model and the Bayes factor, which is used in the Bayesian context to compare models. We apply this technique to a data set and find that the support for the MNL model over both the MNP and MNT models is decisive. Moreover, the Bayes factor supports the MNL model in another example, which we do not report on, in which the data are artificially generated from the MNP model. This result is possibly an artifact of the covariates and our design, but it nonetheless emphasizes the important point that support for the MNP model over the MNL model is not guaranteed once model complexity is taken into account.
The rest of the paper is organized as follows. In Section 2 the various multinomial models are described, and in Section 3 two new MCMC algorithms for fitting the MNP and related models are presented. This section also discusses the issues related to identification and points out why the parameters of the MNP model are likely to be weakly identified. Section 4 explains the computation of the marginal likelihood and Bayes factors and considers an application that involves the comparison of MNP, MNT, and MNL models. Results from a real data set are introduced at various places in the text to illustrate the methods. Concluding remarks are contained in Section 5.
or in vector notation as Z_i = X_i β + ε_i, where β = (δ', β_1', β_2', ..., β_J')',

\[
X_i = \begin{pmatrix}
(v_{i1} - v_{i,J+1})' & w_i' & 0' & \cdots & 0' \\
(v_{i2} - v_{i,J+1})' & 0' & w_i' & \cdots & 0' \\
\vdots & \vdots & & \ddots & \vdots \\
(v_{iJ} - v_{i,J+1})' & 0' & 0' & \cdots & w_i'
\end{pmatrix},
\]

and ε_i = (ε_{i1}, ..., ε_{iJ})' ~ N_J(0, Σ). For identifiability reasons the (1,1) element of Σ, σ_11, is constrained to equal one and β_{J+1} is normalized to zero. In terms of the latent values z_ij, the observed outcome is given by the conditions

\[
Y_i = \begin{cases} j & \text{if } z_{ij} = \max\{Z_i, 0\} \\ J+1 & \text{if } \max_l\{z_{il}\} \le 0, \end{cases} \qquad (1)
\]

and the probability mass function of Y_i is

\[
\Pr(Y_i = j \mid \beta, \Sigma) = \int_{A_j} \phi_J(Z_i \mid X_i\beta, \Sigma)\, dZ_i, \qquad j \le J+1,
\]

where φ_J is the density function of the J-variate normal distribution and

\[
A_j = \begin{cases}
\{Z_i : z_{i1} < z_{ij}, \ldots, 0 < z_{ij}, z_{i,j+1} < z_{ij}, \ldots, z_{iJ} < z_{ij}\}, & j \le J \\
\{Z_i : z_{i1} < 0, \ldots, z_{ij} < 0, \ldots, z_{iJ} < 0\}, & j = J+1.
\end{cases}
\]
The multinomial probabilities thus require the computation of a complicated multivariate integral. One way to compute the integral is by the Monte Carlo importance sampling method developed by Geweke (1991), Hajivassiliou (1990), and Keane (1994), and known as the GHK method (see Appendix A.1 for further details). For estimation purposes, it is not necessary to compute this probability, as is discussed below.
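To make the mapping in (1) concrete, the following minimal sketch (our own illustration, not part of the original paper; the function name observed_choice is ours) classifies a differenced latent vector into one of the J + 1 outcomes.

```python
import numpy as np

def observed_choice(Z_i):
    """Map the differenced latent vector Z_i (length J) to the observed outcome
    as in equation (1): outcome j if z_ij is the positive maximum, else J + 1."""
    j = int(np.argmax(Z_i))               # index (0-based) of the largest latent value
    return j + 1 if Z_i[j] > 0 else len(Z_i) + 1

# Example with J = 3 differenced latent utilities
print(observed_choice(np.array([-0.2, 1.3, 0.4])))    # 2
print(observed_choice(np.array([-0.2, -1.3, -0.4])))  # 4, i.e. outcome J + 1
```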
2.2 Multinomial-t
Now suppose that the distribution F on the underlying undifferenced latent values is multivariate-t with specified degrees of freedom ν. This gives rise to a model that we call the multinomial-t model. Albert and Chib (1993) extended the probit link to the t-link in the binary response case and provided a simple approach for estimating the resulting model. As in the MNP case, the MNT model can be expressed in terms of the differenced latent values Z_i, where now Z_i | β, Σ ~ MVT_J(X_iβ, Σ, ν) with density

\[
f(Z_i \mid \beta, \Sigma) \propto |\Sigma|^{-1/2}\left[1 + \frac{1}{\nu}(Z_i - X_i\beta)'\Sigma^{-1}(Z_i - X_i\beta)\right]^{-(\nu+J)/2}.
\]
As before, σ_11 = 1 and the observed outcomes Y_i are defined by (1). Following Albert and Chib (1993), the model for the latent Z_i may be expressed as a scale mixture of normals by introducing a random variable λ_i ~ Gamma(ν/2, ν/2) and letting Z_i | λ_i ~ N_J(X_iβ, λ_i^{-1}Σ).
The restriction on σ_11 makes it difficult to sample Σ. To solve this problem, McCulloch and Rossi (1994) propose an algorithm that ignores the restriction on σ_11 in the sampling. Their algorithm simulates the non-identified parameters of the model, obtaining draws of the identified parameters ex post from the draws of the non-identified parameters. Nobile (1995) has pointed out that, as a consequence of sampling the non-identified parameters, this method is sensitive to the prior distribution.
To sample the identified parameters in an MCMC simulation with data augmentation, one iterates on the following steps a large number of times.
where

\[
R_{ij} = \begin{cases}
\left(\max\{0, \max\{Z_i^{(-j)}\}\},\ \infty\right) & \text{if } y_i = j,\ j = 1, \ldots, J \\
\left(-\infty,\ \max\{Z_i^{(-j)}\}\right) & \text{if } y_i \ne j,\ j = 1, \ldots, J \\
(-\infty,\ 0] & \text{if } y_i = J+1,
\end{cases}
\]
which follows from the set-valued inverse of the mapping in (1). The density f(z_ij | Z_i^{(-j)}, β, Σ) is obtained by the usual multivariate normal theory. Instead of sampling the z_ij in this manner, the entire vector Z_i can be sampled from (Z_i | y_i, β, Σ) by the accept-reject method [Albert and Chib (1993)]. In this approach the vector Z_i is drawn from N(X_iβ, Σ) and accepted as a valid draw if the vector falls in the region implied by y_i. The advantages of this method are that it requires little coding and that it tends to improve the serial correlation of the sampled output because the Z_i are drawn in one block. A disadvantage is that several sampled vectors may have to be discarded before one is accepted. Nonetheless, because the accept-reject method is not a Markov chain sampler, the method is useful in initializing the Markov chain simulations for the latent data.
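The accept-reject step just described can be sketched as follows (a minimal illustration under our own naming; observed_choice implements the mapping (1) as in the earlier sketch).

```python
import numpy as np

def observed_choice(Z_i):
    """Outcome implied by the differenced latent vector Z_i, equation (1)."""
    j = int(np.argmax(Z_i))
    return j + 1 if Z_i[j] > 0 else len(Z_i) + 1

def draw_Z_accept_reject(mean_i, Sigma, y_i, rng, max_tries=100000):
    """Draw Z_i from N(X_i beta, Sigma) and accept the draw only if it lies in the
    region implied by the observed choice y_i (the accept-reject step in the text)."""
    for _ in range(max_tries):
        Z = rng.multivariate_normal(mean_i, Sigma)
        if observed_choice(Z) == y_i:
            return Z
    raise RuntimeError("acceptance rate too low; revert to one-at-a-time truncated draws")

rng = np.random.default_rng(1)
Z_draw = draw_Z_accept_reject(np.array([0.3, -0.2, 0.1]), np.eye(3), y_i=1, rng=rng)
```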
The next two distributions are proportional to the complete data density

\[
f(Z \mid \beta, \Sigma) = \prod_{i=1}^n f(Z_i \mid \beta, \Sigma)
\propto |\Sigma|^{-n/2} \exp\!\left(-\frac{1}{2}\sum_{i=1}^n (Z_i - X_i\beta)'\Sigma^{-1}(Z_i - X_i\beta)\right). \qquad (3)
\]
The mapping between Σ and γ is one-to-one. This parameterization of Σ leaves the vector γ entirely unrestricted. Any γ ∈ R^p leads to a matrix Σ that is symmetric, positive definite, and has σ_11 = 1.
To understand the nature of this parameterization, consider the case J = 2, where

\[
L = \begin{pmatrix} 1 & 0 \\ l_{21} & l_{22} \end{pmatrix}.
\]

From Σ = LL' it follows that σ_21 = l_21 and σ_22 = l_21^2 + l_22^2. These imply that l_22^2 = σ_22 − σ_21^2, which is the determinant of Σ and is positive if Σ is positive definite. Thus, the parameterization γ = (l_21, log(l_22)) imposes the required properties of positive definiteness along with the condition that σ_11 = 1.
A major advantage of the γ parameterization from a Bayesian perspective is that it permits a straightforward use of MCMC methods. Furthermore, a prior distribution on γ can be assigned by specifying a prior distribution on each σ_ij and then using this prior distribution to infer the required distribution of γ. To illustrate this idea, suppose that our prior beliefs about vech(Σ) are proportional to a normal distribution with mean vector s_0 and covariance matrix S_0, as in Chib and Greenberg (1995b). The required prior on γ can be determined by the following Monte Carlo procedure (a sketch follows the list):
1. Set i = 1.
   (a) While i is less than I (a prespecified quantity), sample a vector vech(Σ)_i ~ N(s_0, S_0), form the matrix Σ_i, and compute its Cholesky factorization Σ_i = L_i L_i'. From L_i compute and store the vector γ_i.
   (b) Increment i and go to (1a).
2. Compute v_0 = I^{-1} Σ_{i=1}^I γ_i and G_0 = I^{-1} Σ_{i=1}^I (γ_i − v_0)(γ_i − v_0)', the mean and covariance of {γ_i}. Let the prior distribution of γ be N(v_0, G_0).
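The sketch below illustrates the procedure for the J = 2 case with γ = (l_21, log l_22); it is our own illustration, and the handling of draws of vech(Σ) that are not positive definite (they are simply discarded here) is an assumption.

```python
import numpy as np

def induced_prior_on_gamma(s0, S0, I=50000, seed=0):
    """Monte Carlo approximation of the prior on gamma = (l21, log l22) implied by
    a N(s0, S0) prior on vech(Sigma) = (sigma21, sigma22), with sigma11 fixed at 1."""
    rng = np.random.default_rng(seed)
    gammas = []
    while len(gammas) < I:
        s21, s22 = rng.multivariate_normal(s0, S0)
        if s22 - s21 ** 2 <= 0:            # discard draws that are not positive definite
            continue
        Sigma = np.array([[1.0, s21], [s21, s22]])
        L = np.linalg.cholesky(Sigma)      # L = [[1, 0], [l21, l22]]
        gammas.append([L[1, 0], np.log(L[1, 1])])
    gammas = np.asarray(gammas)
    v0 = gammas.mean(axis=0)               # prior mean of gamma
    G0 = np.cov(gammas, rowvar=False)      # prior covariance of gamma
    return v0, G0

v0, G0 = induced_prior_on_gamma(np.array([0.0, 1.0]), np.diag([1.0, 0.5]))
```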
Note that the above prior on {σ_ij} overcomes the well known limitation of the Wishart distribution wherein the spread of the distribution is controlled by a single scalar degrees of freedom parameter. A notable advantage of working in the γ parameterization is that it leads to an unrestricted posterior density. In contrast, the posterior density of vech(Σ) is restricted to the region that produces a positive-definite matrix.
Now consider the sampling of Σ (equivalently, the sampling of γ) from the density π(γ | Z, β). By definition the full conditional density is

\[
\pi(\gamma \mid Z, \beta) \propto \pi(\gamma) \prod_{i=1}^n \phi_J(Z_i \mid X_i\beta, \Sigma(\gamma))
\propto \pi(\gamma)\, f(Z \mid \beta, \Sigma(\gamma)), \qquad \gamma \in R^p, \qquad (4)
\]

where π(γ) is the unnormalized Gaussian prior density for γ and the value of the normalizing constant is not required. This posterior density can be sampled by the MH algorithm with a tailored proposal density. Tailoring is achieved by finding the mode and curvature of log f(Z | β, Σ(γ)) from a few Newton-Raphson steps. The mode and curvature are then used to create a multivariate-t proposal density f_T(γ | m, τV, ν), where m is the mode, V is the inverse of the negative Hessian at the mode, and τ and ν are adjustable parameters. With γ denoting the current point in the iterations, the MCMC algorithm proceeds by iterating on the following steps.
Algorithm MNP 1
Sample Z as in the basic algorithm for sampling the MNP posterior distribution;
Sample β as in the basic algorithm for sampling the MNP posterior distribution;
Sample γ† from f_T(· | m, τV, ν) and compute

\[
\alpha(\gamma, \gamma^{\dagger}) = \min\left\{1,\ \frac{f(Z \mid \beta, \Sigma(\gamma^{\dagger}))\, \pi(\gamma^{\dagger})\, f_T(\gamma \mid m, \tau V, \nu)}{f(Z \mid \beta, \Sigma(\gamma))\, \pi(\gamma)\, f_T(\gamma^{\dagger} \mid m, \tau V, \nu)}\right\};
\]

Move to γ† with probability α(γ, γ†) and stay at γ with probability 1 − α(γ, γ†).
It should be noted that this algorithm is easily modified if the covariance matrix has more constraints than σ_11 = 1. In that case one can operate directly on the unique elements of Σ, as in Chib and Greenberg (1995b) in a different but related context. This point is illustrated in one of the examples considered below.
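As an illustration of the tailored MH step for γ, the following sketch uses a generic numerical optimizer in place of the few Newton-Raphson steps mentioned above and SciPy's multivariate-t distribution for the proposal; all function names (Sigma_from_gamma, loglik_Z, mh_step_gamma) and the J = 2 specialization are our own assumptions, not the paper's code.

```python
import numpy as np
from scipy import optimize, stats

def Sigma_from_gamma(gamma):
    """J = 2 case: gamma = (l21, log l22) and Sigma = L L' with l11 = 1."""
    L = np.array([[1.0, 0.0], [gamma[0], np.exp(gamma[1])]])
    return L @ L.T

def loglik_Z(gamma, Z, M):
    """Complete-data log density log f(Z | beta, Sigma(gamma)) of equation (3);
    Z and M (= X_i beta stacked by rows) are (n, J) arrays."""
    Sigma = Sigma_from_gamma(gamma)
    R = Z - M
    Sinv = np.linalg.inv(Sigma)
    return -0.5 * Z.shape[0] * np.log(np.linalg.det(Sigma)) - 0.5 * np.sum((R @ Sinv) * R)

def mh_step_gamma(gamma, Z, M, prior_mean, prior_cov, tau=1.0, df=20, rng=None):
    """One tailored MH update of gamma: build a multivariate-t proposal at the mode
    and curvature of the complete-data log likelihood, then accept or reject."""
    rng = np.random.default_rng() if rng is None else rng
    opt = optimize.minimize(lambda g: -loglik_Z(g, Z, M), gamma, method="BFGS")
    prop = stats.multivariate_t(loc=opt.x, shape=tau * opt.hess_inv, df=df)
    log_prior = stats.multivariate_normal(prior_mean, prior_cov).logpdf
    cand = prop.rvs(random_state=rng)
    log_alpha = (loglik_Z(cand, Z, M) + log_prior(cand) + prop.logpdf(gamma)
                 - loglik_Z(gamma, Z, M) - log_prior(gamma) - prop.logpdf(cand))
    return cand if np.log(rng.uniform()) < log_alpha else gamma
```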
Posterior sampling without augmentation
Algorithm 1 exploits the simplification that arises from data augmentation. One question is whether it is possible to sample the posterior distribution without augmentation. The main problem (one that is avoided by data augmentation) is that it is necessary to compute the likelihood function at least once in each iteration. This can be a prohibitive computational burden if the sample size and the number of alternatives are large. In the case of smaller models, however, one may proceed as follows.
Let ψ = (β, γ) denote the parameters of the model and consider sampling ψ in one block with the MH algorithm. To find the proposal density for ψ one can utilize the output of Algorithm 1. Specifically, one can run Algorithm 1 for G = 5000 iterations (say) to find the mean vector ψ̄ = G^{-1} Σ_{g=1}^G ψ^{(g)} and the sample covariance matrix V = G^{-1} Σ_{g=1}^G (ψ^{(g)} − ψ̄)(ψ^{(g)} − ψ̄)'. Based on these quantities, the proposal density can be specified as f_T(ψ | ψ̄, τV, ν), where f_T is the multivariate-t density with ν degrees of freedom. A sample of draws from the posterior distribution can then be obtained by repeating the following step.
Algorithm MNP 2
Sample (β†, γ†) from f_T(· | ψ̄, τV, ν) and let

\[
\alpha[(\beta, \gamma), (\beta^{\dagger}, \gamma^{\dagger})] = \min\left\{1,\ \frac{p(y \mid \beta^{\dagger}, \gamma^{\dagger})\, \pi(\beta^{\dagger}, \gamma^{\dagger})\, f_T(\beta, \gamma \mid \bar\psi, \tau V, \nu)}{p(y \mid \beta, \gamma)\, \pi(\beta, \gamma)\, f_T(\beta^{\dagger}, \gamma^{\dagger} \mid \bar\psi, \tau V, \nu)}\right\}
\]

denote the probability of move. Then move to (β†, γ†) with probability α[(β, γ), (β†, γ†)] and stay at (β, γ) with probability 1 − α[(β, γ), (β†, γ†)].
It should be noted that if J is large it may be necessary to sample β and γ in two blocks. In that case, however, the likelihood function p(y | β, γ) must be evaluated twice within each iteration and the proposal density for each block must also be defined in a different way. At this point, therefore, it does not seem feasible to implement this algorithm in general without incurring an enormous computational cost.
Starting values for algorithms
It is often useful to initialize posterior sampling algorithms in regions that have high mass under the posterior distribution. This seems to be particularly important in the fitting of MNP models. One way to compute a high density point is by the Monte Carlo EM (MCEM) algorithm, which also relies on data augmentation and delivers the approximate maximum likelihood estimate [Natarajan, Kiefer, and McCulloch (1995)]. Let (β^{(t)}, Σ^{(t)}) denote the current value of the parameters and (β̂, Σ̂) the estimates obtained at convergence. The algorithm is implemented by iterating on the following steps.
Algorithm MCEM
Sample Z^{(j)} as in the basic algorithm for sampling the posterior distribution. Repeat this step N times.
Update β through the expression

\[
\beta^{(t+1)} = \left(\sum_{i=1}^n X_i' \Sigma^{-1} X_i\right)^{-1} \left(\sum_{i=1}^n X_i' \Sigma^{-1} \bar{Z}_i\right),
\]

where Z̄_i = N^{-1} Σ_{j=1}^N Z_i^{(j)}.
In implementing this algorithm N is initially chosen to be a small number, and its value is steadily increased as the maximizer is approached. In the examples below, N is set equal to ten for the first twenty iterations and is increased to four hundred close to convergence.
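A minimal sketch of the β update (our own illustration; the array layout is an assumption):

```python
import numpy as np

def mcem_beta_update(X, Zbar, Sigma):
    """MCEM update
        beta = (sum_i X_i' Sigma^{-1} X_i)^{-1} (sum_i X_i' Sigma^{-1} Zbar_i),
    where Zbar_i averages the N latent draws for subject i.
    X has shape (n, J, k) and Zbar has shape (n, J)."""
    Sinv = np.linalg.inv(Sigma)
    A = sum(Xi.T @ Sinv @ Xi for Xi in X)
    b = sum(Xi.T @ Sinv @ Zi for Xi, Zi in zip(X, Zbar))
    return np.linalg.solve(A, b)
```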
A well known problem with the EM algorithm is that it does not automatically provide an estimate of the observed information matrix at convergence. This is not a problem if one is using the MCEM algorithm to supply starting values for the full posterior sampling algorithms. If standard errors are required, then one can compute the observed information matrix using the Louis (1982) formula

\[
E\left[-\nabla^2 \log f(Z \mid \beta, \Sigma)\right] - \mathrm{Var}\left\{\nabla \log f(Z \mid \beta, \Sigma)\right\},
\]

where the expectation and variance are with respect to the distribution Z | y, β̂, Σ̂ and ∇ denotes differentiation with respect to the parameters. Each of these terms can be estimated by taking M additional draws {Z^{(1)}, ..., Z^{(M)}} from Z | y, β̂, Σ̂ and computing the expectation and variance as corresponding sample averages.
3.2 MCMC sampling of the MNT and MNL models
Consider now the fitting of the MNT model by MCMC methods. In this case, Algorithm 1 is easily modified because of the fundamental connection between the multivariate-t and multivariate normal distributions. The general idea is to conduct the sampling with λ_i (i ≤ n) as additional parameters of the model. Then, conditional on λ_i, the latent data Z_i follow the distribution

\[
Z_i \sim N\!\left(X_i\beta,\ \lambda_i^{-1}\Sigma\right).
\]

Accordingly, the full conditional distributions of z_ij and β are obtained by replacing Σ by λ_i^{-1}Σ in the expressions presented above. To sample γ, the MH approach given in the context of Algorithm 1 can again be applied by noting that λ_i^{1/2} Z_i is distributed as normal with mean λ_i^{1/2} X_iβ and variance Σ. Finally, the mixing variable λ_i (i ≤ n) is sampled from the gamma distribution

\[
\lambda_i \mid Z_i, \beta, \Sigma \sim \mathrm{Gamma}\!\left(\frac{\nu + J}{2},\ \frac{\nu + (Z_i - X_i\beta)'\Sigma^{-1}(Z_i - X_i\beta)}{2}\right), \qquad i \le n.
\]
Algorithm 2 can also be modified by making use of the GHK algorithm to evaluate Pr(y_i = j | β, Σ), but now under the assumption that the distribution of the latent data is multivariate-t. The GHK algorithm in this case requires simulation from univariate Student-t distributions, as discussed in Appendix A.1.
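A short sketch of the λ_i step (our own illustration, with an assumed array layout):

```python
import numpy as np

def draw_lambda(Z, M, Sigma, nu, rng):
    """Draw each mixing variable from its gamma full conditional,
    lambda_i ~ Gamma((nu + J)/2, rate = (nu + (Z_i - X_i beta)' Sigma^{-1} (Z_i - X_i beta)) / 2),
    where Z and M (= X_i beta stacked by rows) are (n, J) arrays."""
    J = Z.shape[1]
    R = Z - M
    quad = np.einsum("ij,jk,ik->i", R, np.linalg.inv(Sigma), R)  # quadratic forms per subject
    return rng.gamma(0.5 * (nu + J), 2.0 / (nu + quad))           # numpy gamma uses scale = 1/rate
```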
To conduct MCMC sampling of β in the MNL model we note that the posterior density of β is proportional to

\[
\pi(\beta \mid y) \propto \prod_{i,j}\left(\frac{\exp(v_{ij}'\delta + w_i'\beta_j)}{\sum_{l=1}^{J+1}\exp(v_{il}'\delta + w_i'\beta_l)}\right)^{d_{ij}} \pi(\beta),
\]

where

\[
d_{ij} = \begin{cases} 1 & \text{if } y_i = j \\ 0 & \text{otherwise} \end{cases}
\]

and π(β) is the prior distribution for β, assumed to be multivariate normal with known mean vector and covariance matrix. This density can be sampled by the MH algorithm in which the proposal density q(β) is taken to be multivariate-t with mean vector equal to the mode of the posterior distribution and scale matrix equal to the curvature at the mode of the posterior distribution. The algorithm is then implemented by iterating on the following steps.
Algorithm MNL
Let β be the current value and draw β† from q(β).
Accept β† as the next value in the sample with probability

\[
\alpha(\beta, \beta^{\dagger}) = \min\left\{1,\ \frac{\pi(\beta^{\dagger} \mid y)\, q(\beta)}{\pi(\beta \mid y)\, q(\beta^{\dagger})}\right\}.
\]

Otherwise, retain β as the next value in the sample with probability 1 − α(β, β†).
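A sketch of this MNL sampler appears below; it is our own illustration (the design-array layout and helper names are assumptions), with the posterior mode and curvature obtained from a generic numerical optimizer.

```python
import numpy as np
from scipy import optimize, stats
from scipy.special import logsumexp

def mnl_log_posterior(beta, U, y, b0, B0inv):
    """Log posterior of the MNL model. U has shape (n, J+1, k) with U[i, j] the covariate
    vector of alternative j for subject i, y holds 0-based observed choices, and the
    prior on beta is N(b0, B0) with precision matrix B0inv."""
    eta = U @ beta                                             # (n, J+1) linear indices
    loglik = eta[np.arange(len(y)), y].sum() - logsumexp(eta, axis=1).sum()
    d = beta - b0
    return loglik - 0.5 * d @ B0inv @ d

def mnl_mh_sampler(U, y, b0, B0, n_draws=5000, df=20, seed=0):
    """MH sampler with a multivariate-t proposal located at the posterior mode and
    scaled by the inverse curvature at the mode (the tailored proposal in the text)."""
    rng = np.random.default_rng(seed)
    B0inv = np.linalg.inv(B0)
    opt = optimize.minimize(lambda b: -mnl_log_posterior(b, U, y, b0, B0inv),
                            np.zeros(U.shape[2]), method="BFGS")
    prop = stats.multivariate_t(loc=opt.x, shape=opt.hess_inv, df=df)
    beta, draws = opt.x.copy(), []
    for _ in range(n_draws):
        cand = prop.rvs(random_state=rng)
        log_alpha = (mnl_log_posterior(cand, U, y, b0, B0inv) + prop.logpdf(beta)
                     - mnl_log_posterior(beta, U, y, b0, B0inv) - prop.logpdf(cand))
        if np.log(rng.uniform()) < log_alpha:
            beta = cand
        draws.append(beta.copy())
    return np.asarray(draws)
```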
3.3 Comparison of algorithms for the MNP model
The algorithms for the MNP model are now compared with data on four multinomial choices. The results are similar for the MNT model and are suppressed. The data consist of 210 observations on highway and transit usage between Sydney, Melbourne, and New South Wales, Australia, that were collected by David Hensher and are contained in the Limdep computer package. The choices are whether to travel by air (A), train (T), bus (B), or car (C), with car treated as the base choice. The covariates are terminal waiting time (TTME), in-vehicle time (INVT), in-vehicle cost (INVC), a generalized cost measure (GC), indicator variables for the first three choices (IND1, IND2, IND3), household income times A (HA), and traveling party size times A (PA). Data for the first two observations are presented in Table 1. The covariates are in their undifferenced form (v_ij).
One model that is useful for these data consists of the seven covariates TTME, GC, IND1, IND2, IND3, HA, and PA. On the assumption that the prior information on the covariance parameters is represented by a normal distribution on vech(Σ) with mean s_0 = (0, 1, 0, 0, 0.75) and covariance S_0 = diag(1, 0.51, 1, 1, 0.51), we find (via the simulation method described in Section 3) that the prior mean of γ is (0.01, 0.057, 0.006, 0.006, 0.383) and that the prior variance is approximately 0.28 for each component of γ. The prior on γ is taken to be Gaussian with these moments. Algorithms 1 and 2 are run for 10,000 cycles, and the two adjustable parameters of the MH proposal density in Algorithms 1 and 2 are set at 1 and 20. Each of the posterior sampling algorithms is initialized by the point estimate from the MCEM algorithm.
y_i  TTME  INVT  INVC  GC  IND1  IND2  IND3  HA  PA
4    69    59    100   70   1     0     0    35   1
4    34    31    372   71   0     1     0     0   0
4    35    25    417   70   0     0     1     0   0
4     0    10    180   30   0     0     0     0   0
4    64    58     68   68   1     0     0    30   2
4    44    31    354   84   0     1     0     0   0
4    53    25    399   85   0     0     1     0   0
4     0    11    255   50   0     0     0     0   0
Table 1: Data for the first two observations (one row per mode: air, train, bus, car); the covariates are in undifferenced form.
Results are summarized in Tables 2 and 3. Point estimates of β and Σ (MLE and posterior means) are fairly close across the various algorithms. Some of the differences may be attributable to identification problems inherent in this model that are discussed below. Differences between the MLE and the posterior means may also reflect asymmetries in the posterior distribution, with a resulting difference between modes and means. It is also interesting to compare the serial correlation of the sampled output from Algorithms 1 and 2. Figures 1 and 2 reproduce the sampled output for the elements of Σ. It is seen that the serial correlation of the output from Algorithm 2 dissipates quickly relative to Algorithm 1. However, the point estimates from the two algorithms are very close (as are the predicted probabilities computed below) and, therefore, one may conclude that the benefits that accrue from adopting Algorithm 2 are outweighed by the computational burden.
Finally, we compare the posterior predicted probability of the observed choice of each individual from each of the three algorithms. This probability is computed from the posterior sample of the parameters generated by each of the algorithms as

\[
\Pr(Y_i = j_i \mid a) = G^{-1} \sum_{g=1}^{G} \Pr\!\left(Y_i = j_i \mid \beta_a^{(g)}, \Sigma_a^{(g)}\right), \qquad (5)
\]

where j_i is the choice made by the ith subject, (β_a^{(g)}, Σ_a^{(g)}) are draws from the posterior distribution, and a = 1, 2 indexes, respectively, the MNP Algorithms 1 and 2.
           MCEM                  Algorithm 1          Algorithm 2
Variable   MLE      Std Error    Mean     Std Dev     Mean     Std Dev
TTME      -0.030    0.007       -0.040    0.007      -0.039    0.007
GC        -0.011    0.002       -0.012    0.002      -0.012    0.002
IND1       2.096    0.743        2.807    0.601       2.666    0.601
IND2       1.474    0.317        1.786    0.271       1.714    0.273
IND3       1.272    0.316        1.511    0.269       1.477    0.266
HA         0.013    0.005        0.013    0.006       0.014    0.006
PA        -0.471    0.125       -0.523    0.125      -0.512    0.122
Table 2: Estimates of β: MLE and standard error from the MCEM algorithm, and posterior means and standard deviations from Algorithms 1 and 2.
[Figure 1: sampled output and autocorrelations for σ_21, σ_22, σ_31, σ_32, σ_33.]
Figure 3 displays the scatter plot of the probabilities for two pairs of the algorithms. It will be seen that the points lie on or very close to the 45-degree line. The correlations between the predicted probabilities are over 0.999, indicating that the results from the algorithms are indistinguishable.
3.4 Identification issues
In fitting MNP models it is important to keep in mind the issue of parameter identification. Keane (1992) points out that the parameters of the MNP model are weakly identified and attributes this problem to the lack of exclusion restrictions. He argues that "movements in the regressor coefficients can effectively mimic the effects of changes in the covariance parameters," thus leading to a flat likelihood surface. We attribute the problem of fragile identification to the large number of free parameters in the model rather than to the lack of exclusion restrictions.
[Figure 2: sampled output and autocorrelations for σ_21, σ_22, σ_31, σ_32, σ_33.]
It is possible to obtain very similar likelihood functions for quite different sets of parameter values whether or not there are exclusion restrictions. The same problem arises in the MNT version of the model, but it is less serious in the MNL model because there are no covariance parameters to estimate.
The case J = 2 is examined. Figure 4 displays in the (z_1, z_2) space those regions that lead, respectively, to choices 1 (lightly shaded), 2 (medium shaded), or 3 (heavily shaded). The distribution of a (z_{i1}, z_{i2}) pair depends on its mean X_iβ and the covariance matrix Σ. To see how fragile identification may arise, consider an observation for which the mean is located deep in the region where y_i = j (i.e., the covariates are very effective in predicting choice). In that case, the probability that the person chooses j is very high. If the covariates are effective predictors for most of the observations in a sample, the observed choice is consistent with quite different covariance matrices, and the resulting likelihood function is flat.
[Figure 3: scatter plot of the predicted probabilities of the observed choices from pairs of algorithms (axes pr1 and pr2).]
Note that the likelihood contribution of the ith subject is based only on the actual choice made. Figure 4 illustrates the problem in a less extreme case. The dashed 99% contour is plotted around a mean of (0, -0.5) with vech(Σ) = (0, 1), and the solid contour is around mean (0.39, -0.22) with vech(Σ) = (1.68, 3.00). The correlation is zero for the first of these and 0.97 for the second. Although the two sets of parameters are very different, they yield the same probabilities of choices to two decimal places: 0.43, 0.22, and 0.35. Thus, even for observations that are not deep in one of the regions, the parameters may not be well identified, and the extent of the problem would vary for different data sets.
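The two parameter configurations above can be checked by direct simulation; the sketch below (our own, not from the paper) estimates the three choice probabilities by Monte Carlo and reproduces their near equality.

```python
import numpy as np

def choice_probs(mean, Sigma, n=1_000_000, seed=0):
    """Monte Carlo choice probabilities for J = 2: outcome 1 or 2 when the
    corresponding latent value is the positive maximum, outcome 3 otherwise."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(mean, Sigma, size=n)
    zmax = Z.max(axis=1)
    p3 = np.mean(zmax <= 0)
    p1 = np.mean((zmax > 0) & (Z[:, 0] >= Z[:, 1]))
    return p1, 1.0 - p1 - p3, p3

# vech(Sigma) = (sigma21, sigma22) with sigma11 = 1
print(choice_probs(np.array([0.0, -0.5]), np.array([[1.0, 0.0], [0.0, 1.0]])))
print(choice_probs(np.array([0.39, -0.22]), np.array([[1.0, 1.68], [1.68, 3.0]])))
```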
In view of this discussion, we support Keane's idea that identification may be fragile but believe that for some data sets this fragility will persist even in the presence of exclusion restrictions.
Figure 4: Means at (0, -0.50) and (0.39, -0.22), vech(Σ) at (0, 1) and (1.68, 3.00), and two 99% contours (axes z_1 and z_2).
(Our model implies several restrictions; for example, the variable IND1 is not contained in the T and B equations.) The problem seems less severe for estimates of β than for Σ, and although the coefficients are somewhat different, the predicted probabilities are very close.
of model M_k be given by

\[
m(y \mid M_k) = \int p(y \mid M_k, \beta, \Sigma(\gamma))\, \pi(\beta, \gamma \mid M_k)\, d\beta\, d\gamma, \qquad (6)
\]

where we have adopted the γ parameterization for Σ and suppressed the dependence of the parameters on M_k. The MNL marginal likelihood has the same form except for the integration over γ. Given the marginal likelihood of each model, model evidence in favor of M_k over M_r is measured by the Bayes factor B_kr, which is given by the ratio m(y | M_k)/m(y | M_r).
4.1 Computation of marginal likelihood
A straightforward way to estimate the integral (6) is by the method of Chib (1995) [see DiCiccio, Kass, Raftery, and Wasserman (1997) for this and other methods of computing the marginal likelihood]. The Chib method utilizes Bayes theorem to obtain

\[
m(y \mid M_k) = \frac{p(y \mid M_k, \beta^*, \Sigma(\gamma^*))\, \pi(\beta^*, \gamma^* \mid M_k)}{\pi(\beta^*, \gamma^* \mid M_k, y)},
\]

where all normalizing constants are included and β* and γ* are arbitrary points, taken to be high density values such as the posterior means. Transforming to the log scale and utilizing conditional/marginal decompositions yields

\[
\log m(y \mid M_k) = \log p(y \mid M_k, \beta^*, \Sigma(\gamma^*)) + \log \pi(\beta^* \mid M_k) + \log \pi(\gamma^* \mid M_k)
- \log \pi(\gamma^* \mid M_k, y, \beta^*) - \log \pi(\beta^* \mid M_k, y). \qquad (7)
\]

A key desirable feature of this approach is that the likelihood function p(y | M_k, β*, Σ(γ*)) needs to be computed only once. In the appendix we explain how each term in (7) is computed.
In order to implement this Bayesian model selection approach it is necessary to think carefully about the prior inputs. One criterion is that the prior distributions lead a priori to the same distribution of observable responses across models. Another possible requirement on the prior is that the choice between different models depends primarily on the data and only slightly on the details of the prior. We offer two suggestions for choosing such priors for the parameters.
One approach to specifying a prior on θ_k, where θ_k denotes the parameters of M_k, is based on the preposterior distribution of the data under M_k:

\[
\Pr(y \mid M_k) = \int f_k(z \mid \beta_k, \Sigma_k)\, \pi_k(\beta_k)\, \pi_k(\Sigma_k)\, dz\, d\beta_k\, d\Sigma_k,
\]

where β_k ~ N(0, cI) and vech(Σ_k) ∝ N(s_0, S_0). In this approach, c, s_0, and S_0 are chosen to make Pr(y | M_k) approximately equal for the models to be compared and approximately equal to what is known about Pr(y | M_k). For example, for the travel data discussed in the example, the approximate percentage breakdown of people traveling by the various modes may be known from previous studies, or information may be available for trips between comparable destinations. Under this approach, the priors for the two models are comparable in the sense that they produce the same probabilities of choice.
An alternative prior can be based on a method that uses a training sample. For model M_k, assume that the prior distribution is π_k(β_k, Σ_k | c_k), where c_k is a vector of hyperparameters. Let y_t be a vector of n_1 observations selected at random from y, and let y_r be the remainder of the sample. The training prior distribution is defined as

\[
\pi_k(\beta_k, \Sigma_k \mid y_t) \propto \pi_k(y_t \mid \beta_k, \Sigma_k)\, \pi_k(\beta_k, \Sigma_k \mid c_k).
\]

The ratio of marginal likelihoods for M_k and M_j based on y_t is

\[
B_{kj} = \frac{m_k(y_t)}{m_j(y_t)} = \frac{\int p_k(y_t \mid \beta_k, \Sigma_k)\, \pi_k(\beta_k, \Sigma_k \mid c_k)\, d\beta_k\, d\Sigma_k}{\int p_j(y_t \mid \beta_j, \Sigma_j)\, \pi_j(\beta_j, \Sigma_j \mid c_j)\, d\beta_j\, d\Sigma_j}.
\]

This expression represents the Bayes factor before seeing the data in y_r. Our suggestion is to choose c_k and c_j so that B_kj = 1. This choice makes the first-stage priors π_k(β_k, Σ_k | c_k) and π_j(β_j, Σ_j | c_j) comparable in the sense that the Bayes factor based on them and the training sample does not favor either model.
Bayes factors are now computed for the data of our example. For the purpose of this illustration we have chosen proper priors for each model that imply approximately the same prior probability distribution on the outcomes. The consequences of a particular prior for the outcomes are determined by simulation. This requires the simulation of parameters from the prior distribution followed by a simulation of the outcomes given the parameters. These two steps are repeated a large number of times and the hyperparameters are adjusted until the implied empirical distribution of the outcomes is roughly similar across models.
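The following sketch outlines one way to carry out this prior-predictive calibration for an MNP model (our own illustration; the J = 2 specialization, names, and the discarding of non-positive-definite draws are assumptions).

```python
import numpy as np

def prior_predictive_shares(X, c, s0, S0, n_rep=2000, seed=0):
    """Simulate (beta, Sigma) from the prior, then outcomes given the parameters, and
    tabulate the implied outcome shares; c, s0 and S0 are adjusted until the shares
    are roughly similar across the models being compared."""
    rng = np.random.default_rng(seed)
    n, J, k = X.shape                       # X[i] is the J x k design matrix X_i
    counts = np.zeros(J + 1)
    for _ in range(n_rep):
        beta = rng.normal(0.0, np.sqrt(c), size=k)           # beta ~ N(0, c I)
        s21, s22 = rng.multivariate_normal(s0, S0)            # vech(Sigma), J = 2 case
        if s22 - s21 ** 2 <= 0:                                # keep positive definite draws
            continue
        Sigma = np.array([[1.0, s21], [s21, s22]])
        Z = X @ beta + rng.multivariate_normal(np.zeros(J), Sigma, size=n)
        chosen = np.where(Z.max(axis=1) > 0, Z.argmax(axis=1), J)   # J codes outcome J+1
        counts += np.bincount(chosen, minlength=J + 1)
    return counts / counts.sum()
```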
4.2 Example (cont.)
Let the model fitted in Section 3.3 be denoted M_1, let M_2 denote the MNP model that adds two covariates, in-vehicle cost for all stages (INVC) and in-vehicle time for all stages (INVT), to model M_1, and let M_3 denote the MNP model in which Σ has equal covariances. This patterned covariance arises from the assumption that the original set of four latent variables are independent. Finally, let M_4 denote the MNL model and let M_5 denote the MNT model with ν = 10 (both with the same covariates as M_1).
We begin with the posterior distribution of Σ in model M_3. Due to the restriction on the covariances, Algorithm 2 cannot be applied in this case, but one can use a version of Algorithm 1 in which the unique elements of Σ are sampled directly through an MH step. The posterior distribution is summarized in Table 4. The posterior distribution of β in this model is close to that of M_1 and is not reported.
              Algorithm 1
Covariance    Mean     Std Dev
σ_ij, i ≠ j   0.267    0.112
σ_22          0.800    0.310
σ_33          0.445    0.184
Table 4: Posterior distribution of the covariance parameters in model M_3.
Model   M_1      M_2      M_3     M_4
M_2     -4.63    --       --      --
M_3      0.54    5.17     --      --
M_4      7.81    12.43    7.27    --
M_5      2.23    6.86     1.69    -5.58
Table 5: Log (base 10) of Bayes factors for row model against column model.
Model   Data Likelihood   Prior Ordinate   Posterior Ordinate   Marginal Likelihood   S.E.
M_1     -83.37            -10.92           -9.55                -103.72               0.05
M_2     -79.76            -13.65           -14.94               -108.35               0.06
M_3     -83.11            -10.54           -9.53                -103.18               0.04
M_4     -80.75            -9.99            -5.17                -95.91                0.04
M_5     -83.82            -8.78            -8.89                -101.49               0.05
Table 6: Log (base 10) of the marginal likelihood and its components. The numerical standard error in the last column is computed as in Chib (1995).
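The entries of Table 5 follow directly from the last column of Table 6; for example, the entry for M_4 against M_1 is

\[
\log_{10} B_{41} = \log_{10} m(y \mid M_4) - \log_{10} m(y \mid M_1) = -95.91 - (-103.72) = 7.81.
\]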
Figure 5 displays a scatter plot of the predicted probabilities from models M_1 and M_4 of the choice made by each subject. In the case of the MNL model, the probability of the observed outcome is larger than that from the MNP model in about 80% of the observations. Thus, in this case, the MNL model is more successful than the MNP model in predicting the choices made by the individuals. The MNT model was included because it represents a compromise between MNP and MNL in the sense that it allows for correlated errors but has thicker tails than the normal. Interestingly, Greene (1997) obtains a parallel result with a classical nested models test. He compares the MNL model and the nested logit model (a model that is similar to the MNP in that both relax the independence of irrelevant alternatives property) and finds that the MNL model cannot be rejected for these data.
5 Conclusions
This paper has presented a set of new MCMC-based algorithms and inference procedures for the Bayesian analysis of the MNP model. One contribution is the comparison, for the first time, of different MCMC algorithms for simulating the posterior distribution of the parameters.
Figure 5: Predicted posterior mean of observed choices from the MNL model (vertical axis) and the MNP model (horizontal axis).
Another contribution is the study of the MNT model and its analysis by MCMC methods. A general comment based on our experience is that the fitting of these models requires some care and that the covariance parameters can be particularly difficult to estimate, regardless of the algorithm that may be used in the fitting.
An important concern of this paper is the question of comparing the fit of alternative MNP models and the fit of the MNP model with that of the MNT and the simpler MNL model. We show that the Bayes factor framework is quite useful for this purpose and that the marginal likelihood of competing models can be computed from the MCMC output as a by-product of the simulation procedure. One interesting result is that the MNP model is not guaranteed to fit better than the MNL model once model complexity is taken into account. Finally, the paper reports on a probability plot for comparing the fit of alternative multinomial response models that should be useful in the practical fitting of these models.
A Appendix
A.1 Computing p(y_i | β, Σ) with the GHK algorithm
If y_i = j (j < J + 1), reorder the variables so that outcome j appears in the first row, and write μ_i = X_iβ (reordered accordingly) and L = (l_rs) for the lower-triangular Cholesky factor of the reordered Σ. Let

\[
Q_{i1}^{(r)} = 1 - \Phi(A_{i1}), \qquad A_{i1} = -\mu_{i1}/l_{11},
\]

where Φ(·) is the cdf of the standard normal distribution. Draw η_{i1}^{(r)} from TN(A_{i1}, ∞), where TN(B, U) is the standard normal distribution truncated to (B, U).

For j = 2, ..., J, let

\[
Q_{ij}^{(r)} = \Phi(B_{ij}^{(r)}), \qquad
B_{ij}^{(r)} = \frac{\mu_{i1} - \mu_{ij} + l_{11}\eta_{i1}^{(r)} - \sum_{m=1}^{j-1} l_{jm}\eta_{im}^{(r)}}{l_{jj}},
\]

and draw η_{ij}^{(r)} from TN(−∞, B_{ij}^{(r)}).

Compute

\[
Q_i^{(r)} = \prod_{j=1}^{J} Q_{ij}^{(r)}.
\]

The GHK estimate of the probability p(y_i | β, Σ) is then given by

\[
\hat{Q}_i = R^{-1} \sum_{r=1}^{R} Q_i^{(r)}.
\]

To apply this method to the MNT model, one draws the η_{ij}^{(r)} from the standard univariate-t distribution, truncated as above, and replaces the cdf of the normal in the above calculations by the cdf of the t distribution.
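A compact sketch of the GHK recursion above (our own illustration; it assumes the mean vector and covariance have already been reordered so that the chosen outcome is first, and it uses SciPy's truncated-normal sampler).

```python
import numpy as np
from scipy.stats import norm, truncnorm

def ghk_probability(mu, Sigma, R=1000, seed=0):
    """GHK estimate of Pr(y_i = j | beta, Sigma) for the reordered system, where
    mu is the reordered mean vector X_i beta and Sigma the reordered covariance."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    J = len(mu)
    Q = np.ones(R)
    eta = np.zeros((R, J))
    # first component: z_1 > 0  <=>  eta_1 > A_1 = -mu_1 / l_11
    A1 = -mu[0] / L[0, 0]
    Q *= 1.0 - norm.cdf(A1)
    eta[:, 0] = truncnorm.rvs(A1, np.inf, size=R, random_state=rng)
    # remaining components: z_j < z_1  <=>  eta_j < B_j
    for j in range(1, J):
        B = (mu[0] - mu[j] + L[0, 0] * eta[:, 0] - eta[:, :j] @ L[j, :j]) / L[j, j]
        Q *= norm.cdf(B)
        eta[:, j] = truncnorm.rvs(-np.inf, B, size=R, random_state=rng)
    return Q.mean()
```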
A.2 Computing the marginal likelihood using Chib's method
Details for computing the marginal likelihood for the MNP model of equation (7) follow, and the necessary modifications for the MNT model are obvious. Calculations for the MNL model are discussed at the end of this subsection. Note that in this section β* and γ* refer to values of β and γ at high density points. The dependence of the parameters on M_k is suppressed.
5. Kernel smoothing may be applied to the sample of γ generated by the original MCMC run to obtain the ordinate at γ*. If γ is high-dimensional it may be desirable to find the ordinate by applying the kernel smoothing to several blocks of its elements [Chib and Greenberg (1995b)]. Note that the kernel smoothing steps suggested here and above can be made as accurate as desired by increasing the number of simulated values. This option is, of course, not available when kernel smoothing is employed on data for which the sample size is fixed.
Finally, the calculation of the marginal likelihood for the MNL model proceeds in a similar fashion. The likelihood function at β* is available in closed form. The prior ordinate for β* is that of a normal distribution, and the posterior ordinate is computed by kernel smoothing.
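For concreteness, a kernel-smoothed posterior ordinate of the sort used in these calculations can be sketched as follows (our own illustration using SciPy's Gaussian KDE).

```python
import numpy as np
from scipy.stats import gaussian_kde

def posterior_ordinate(draws, theta_star):
    """Kernel-smoothed estimate of a posterior ordinate pi(theta* | y) from MCMC draws
    (one row per draw), as needed for the ordinate terms in equation (7)."""
    kde = gaussian_kde(np.asarray(draws).T)       # gaussian_kde expects a (dim, n) array
    return float(kde(np.atleast_2d(theta_star).T)[0])
```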
References
Albert, J. and S. Chib (1993), Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
Chib, S. (1995), Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90, 1313–1321.
Chib, S. and E. Greenberg (1995a), Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327–335.
Chib, S. and E. Greenberg (1995b), Analysis of multivariate probit models. Biometrika, forthcoming.
Chib, S. and E. Greenberg (1996), Markov chain Monte Carlo simulation methods in econometrics. Econometric Theory, 12, 409–431.
DiCiccio, T., R. Kass, A. Raftery, and L. Wasserman (1997), Computing Bayes factors by combining simulation and asymptotic approximations. Journal of the American Statistical Association, 92, 903–915.
Gelfand, A. E. and A. F. M. Smith (1990), Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
Geweke, J. (1991), Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface, 571–578.
Geweke, J., M. Keane, and D. Runkle (1994), Alternative computational approaches to inference in the multinomial probit model. Review of Economics and Statistics, 76, 609–632.
Greene, W. (1997), Econometric Analysis, 3rd ed., Upper Saddle River, NJ: Prentice-Hall.
Hajivassiliou, V. A. (1990), Smooth simulation estimation of panel LDV models. Manuscript.
Hausman, J. A. and D. A. Wise (1978), A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogenous preferences. Econometrica, 46, 403–426.
Keane, M. P. (1992), A note on identification in the multinomial probit model. Journal of Business & Economic Statistics, 10, 193–200.
Keane, M. P. (1994), A computationally practical simulation estimator for panel data. Econometrica, 62, 95–116.
Louis, T. A. (1982), Finding the observed information matrix using the EM algorithm. Journal of the Royal Statistical Society B, 44, 226–233.
McCulloch, R. E. and P. E. Rossi (1994), Exact likelihood analysis of the multinomial probit model. Journal of Econometrics, 64, 207–240.
McFadden, D. (1989), A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica, 57, 995–1026.
Natarajan, R., C. E. McCulloch, and N. M. Kiefer (1995), Maximum likelihood for the multinomial probit model. Manuscript.
Nobile, A. (1995), A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Manuscript.
Pinheiro, J. C. and D. M. Bates (1996), Unconstrained parametrizations for variance-covariance matrices. Statistics and Computing, 6, 289–296.
Stern, S. (1997), Simulation-based estimation. Journal of Economic Literature, 35, 2006–2039.