Applied Choice Analysis
The second edition of this popular book brings students fully up to date with the latest
methods and techniques in choice analysis. Comprehensive yet accessible, it offers a
unique introduction for anyone interested in understanding how to model and forecast
the range of choices made by individuals and groups. In addition to a complete
rewrite of several chapters, new topics covered include ordered choice, scaled MNL,
generalized mixed logit, latent class models, group decision making, heuristics and
attribute processing strategies, expected utility theory, and prospect theoretic
applications. Many additional case studies illustrate the applications of choice
analysis, with extensive command syntax provided for all Nlogit applications and
datasets available online. With its unique blend of theory, estimation, and application,
the book has broad appeal to all those interested in choice modeling methods and
will be a valuable resource for students as well as researchers, professionals, and
consultants.
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:00:54 BST 2015.
http://ebooks.cambridge.org/ebook.jsf?bid=CBO9781316136232
Cambridge Books Online © Cambridge University Press, 2015
Applied Choice Analysis
Second Edition
David A. Hensher
The University of Sydney Business School
John M. Rose
The University of Sydney Business School*
William H. Greene
Stern School of Business, New York University
* John Rose completed his contribution to the second edition while at The University of Sydney. He has since relocated
to the University of South Australia.
University Printing House, Cambridge CB2 8BS, United Kingdom
www.cambridge.org
Information on this title: www.cambridge.org/9781107465923
© David A. Hensher, John M. Rose and William H. Greene 2015
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2005
Second edition 2015
Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Hensher, David A., 1947–
Applied choice analysis / David A. Hensher, The University of Sydney Business School,
John M. Rose, The University of Sydney Business School, William H. Greene,
Stern School of Business, New York University. – 2nd edition.
pages cm
John M. Rose now at University of South Australia.
Includes bibliographical references and index.
ISBN 978-1-107-09264-8
1. Decision making – Mathematical models. 2. Probabilities – Mathematical models. 3. Choice.
I. Rose, John M. II. Greene, William H., 1951– III. Title.
QA279.4.H46 2015
519.5′042–dc23
2014043411
ISBN 978-1-107-09264-8 Hardback
ISBN 978-1-107-46592-3 Paperback
Additional resources for this publication at www.cambridge.org/9781107465923
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
1 In the beginning 3
1.1 Choosing as a common event 3
1.2 A brief history of choice modeling 6
1.3 The journey ahead 11
2 Choosing 16
2.1 Introduction 16
2.2 Individuals have preferences and they count 17
2.3 Using knowledge of preferences and constraints in choice analysis 27
Figures
Tables
I’m all in favor of keeping dangerous weapons out of the hands of fools. Let’s start
with typewriters.
(Frank Lloyd Wright 1868–1959)
Preface
discrete choice analysis are written for the well informed is common, and was
sufficient incentive to write the first edition of this book and a subsequent need
to revise it to include the many new developments since 2004 (when the first
edition was completed), as well as to clarify points presented in the first edition
on which many readers sought further advice. The new topics, in addition to a
complete rewrite of most previous chapters, include ordered choice, generalized
mixed logit, latent class models, statistical tests (including partial effects and
model output comparisons), group decision making, heuristics, and attribute
processing strategies, expected utility theory, prospect theoretic applications,
and extensions to allow for non-linearity in parameters. The single case study
has been replaced by a number of case studies, each chosen as an example of
data that best illustrate the application of one or more choice models.
This book, aimed at beginners in particular but also of value to seasoned
researchers, is our attempt to meet the challenge. We agreed to try to write
the first draft of the first edition without referring to any of the existing
material, as a means (we hoped) of encouraging a flow of explanation.
Pausing to consult can often interrupt the flow of the prose (as writers of
novels can attest). Further draft versions leading to the final product did,
however, cross-reference the literature to ensure that we had acknowledged
appropriate material. This book, in both its first and second edition guises,
is not about ensuring that all contributors to the literature on choice are
acknowledged, but rather about ensuring that the novice choice analyst is given a
fair go in their first journey through this intriguing topic.
We dedicate this book to the beginners, but we also acknowledge our research
colleagues who have influenced our thinking as well as co-authored papers over
many years. We thank Michiel Bliemer for his substantial input to Chapter 6
as well as Andrew Collins and Chinh Ho for their case studies using NGene. We
also thank Waiyan Leong and Andrew Collins for their substantial contribution
to Chapter 21. We especially recognize Dan McFadden (2000 Nobel Laureate in
Economics), Ken Train, Chandra Bhat, Jordan Louviere, Andrew Daly, Moshe
Ben-Akiva, David Brownstone, Michiel Bliemer, Juan de Dios Ortúzar, Joffre
Swait, and Stephane Hess. Colleagues and doctoral students at the University of
Sydney read earlier versions. In particular, we thank Andrew Collins, Riccardo
Scarpa, Sean Puckett, David Layton, Danny Campbell, Matthew Beck, Zheng Li,
Waiyan Leong, Chinh Ho, Kwang Kim and Louise Knowles, and the 2004–2013
graduate classes in Choice Analysis as well as participants in the annual short
courses on choice analysis and choice experiments at The University of Sydney
and various other locations in Europe, Asia, and the United States, who were
guinea pigs for the first full use of the book in a teaching environment.
Part I
Getting started
1 In the beginning
Why did we choose to write the first edition of this primer and then a second
edition? Can it be explained by some inherent desire to seek personal gain or
was it some other less self-centered interest? In determining the reason, we are
revealing an underlying objective. It might be one of maximizing our personal
satisfaction level or that of satisfying some community-based objective (or
social obligation). Whatever the objective, it is likely that there are a number of
reasons why we made such a choice (between writing and not writing this
primer) accompanied by a set of constraints that had to be taken into account.
An example of a reason might be to “promote the field of research and practice
of choice analysis”; examples of constraints might be the time commitment
and the financial outlay.
Readers should be able to think of choices that they have made in the last
seven days. Some of these might be repetitive and even habitual (such as
taking the bus to work instead of the train or car), buying the same daily
newspaper (instead of other ones on sale); other choices might be a once-off
decision (such as going to the movies to watch the latest release or purchasing
this book). Many choice situations involve more than one choice (such as
choosing a destination and means of transport to get there, choosing where to
live and the type of dwelling, or choosing which class of grapes and winery in
sourcing a nice bottle of red or white).
The storyline above is rich in information about what we need to include in
a study of the choice behavior of individuals or groups of individuals (such as
households, lobby groups, and organizations). To arrive at a choice, an
means of transport and departure time. That is, a specific choice of means of
transport may indeed be changed as a consequence of the person changing
where they reside or work. In a shorter period such as one year, choosing
among modes of transport may be conditional on where one lives or works,
but the latter cannot be changed, given the time that it takes to relocate
one's employment.
The message in the previous paragraphs is that careful thought is
required to define the choice setting so as to ensure that all possible
behavioral responses (as expressed by a set of choice situations) can be
accommodated when a change in the decision environment occurs. For
example, if we increase fuel prices, then the cost of driving a car increases.
If one has only studied the choice of mode of transport then the decision
maker will be “forced” to modify the choice among a given set of modal
alternatives (e.g., bus, car, train). However, it may be that the individual
would prefer to stay with the car but to change the time of day they travel
so as to avoid traffic congestion and conserve fuel. If the departure time
choice model is not included in the analysis, then experience shows that
the modal choice model tends to force a substitution between modes,
which in reality is a substitution between travel at different times of the
day by car.
Armed with a specific problem or a series of associated questions, the
analyst now recognizes that to study choices we need a set of choice situations
(or outcomes), a set of alternatives and a set of attributes that belong to each
alternative. But how do we take this information and convert it to a useful
framework within which we can study the choice behavior of individuals? To
do this, we need to set up a number of behavioral rules under which we believe
it is reasonable to represent the process by which an individual considers a set
of alternatives and makes a choice. This framework needs to be sufficiently
realistic to explain past choices and to give confidence in likely behavioral
responses in the future that result in staying with an existing choice or making
a new choice. The framework should also be capable of assessing the likely
support for alternatives that are not currently available, be they new
alternatives in the market or existing ones that are physically unavailable to some
market segments. These are some of the important issues that choice modelers
will need to address and which are central to the journey throughout this
book.
Before we overview the structure of the book, we thought it useful to go
back in time and get an appreciation of the evolution of choice modeling,
which began at least ninety years ago.
response function by setting πi/πj = exp(αi−αj). Bradley and Terry (1952) were
the first (in the psychological literature) to estimate the logit response function
by using a maximum likelihood estimator, although the logistic form goes
back many years in bioassay (see Ashton 1972 for a review and summary of
the contribution of Berkson). Estimates of the natural log of πi/πj were
obtained by employing logistic deviates yij = ln{probij/(1−probij)}. After
exponential transformation of parameters (what later became the representative
or observed component of utility), the Bradley–Terry–Luce (BTL) model
becomes equivalent to Thurstone’s Case V model, except that the logistic’s
density replaces the Gaussian density of Thurstone’s response function. The
principle of IIA has exactly the same effect as constant correlation of discriminal
processes for all pairs of alternatives (stimuli). This implies that the
conditional probability of an individual's choice between any two alternatives, given
their choice between any other two alternatives, is equal to the unconditional
probability. The famous red bus/blue bus example, introduced by Mayberry in
Quandt (1970) and due to Debreu (1960), has been used extensively to highlight
the threat that IIA poses to empirical validity, and became the springboard for
many of the developments in discrete choice models that circumvent the rigidity
of IIA.
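The red bus/blue bus problem can be made concrete with a small numeric sketch (in Python rather than Nlogit; the utilities are illustrative, not from the text). Under IIA, adding a blue bus identical to the red bus forces the car's share down to one third, even though intuitively the car should keep its half while the two buses split the rest:

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities: P(i) = exp(V_i) / sum_k exp(V_k)."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative utilities: car and (red) bus are equally attractive.
p_car, p_red = mnl_probs([1.0, 1.0])          # 0.5 each

# Add a blue bus identical to the red bus. IIA fixes the car:red-bus
# odds at 1:1, so MNL predicts 1/3 each -- the car loses share to a
# duplicate alternative that changed nothing behaviorally.
p_car2, p_red2, p_blue2 = mnl_probs([1.0, 1.0, 1.0])

print(round(p_car, 3), round(p_car2, 3))      # car share drops 0.5 -> 0.333
print(round(p_car2 / p_red2, 3))              # car:red-bus ratio unchanged (1.0)
```

The unchanged ratio of any two probabilities, regardless of what else is in the choice set, is precisely the rigidity the later model developments sought to relax.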
Marschak (1959) generalized the BTL model to stochastic utility maximization
over multiple alternatives, and introduced it to economics, referring for the
first time to Random Utility Maximization (RUM) (also see Georgescu-Roegen
1954). Marschak explored the testable implications of maximization of random
preferences, and proved for a finite set of alternatives that choice probabilities
satisfying Luce’s IIA axiom were consistent with RUM. An extension of
this result established that a necessary and sufficient condition for RUM
with independent errors to satisfy the IIA axiom was that the εi be identically
distributed with a Type I Extreme Value distribution, Prob(εi ≤ c) = exp(−e^(−(c−α)/σ)),
where σ is a scale factor and α is a location parameter. The sufficiency was
proved by Anthony Marley and reported by Luce and Suppes (1965).
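The RUM result above can be checked by simulation (a Python sketch with illustrative systematic utilities, not a derivation from the text): drawing iid Type I Extreme Value errors, adding them to the systematic utilities, and picking the maximum reproduces the logit probabilities:

```python
import math
import random

random.seed(0)

def gumbel():
    """Standard Type I Extreme Value (Gumbel) draw via the inverse CDF."""
    u = random.random()
    return -math.log(-math.log(u))

V = [0.5, 0.0, -0.5]            # illustrative systematic utilities
R = 200_000                     # simulation draws
wins = [0, 0, 0]
for _ in range(R):
    utilities = [v + gumbel() for v in V]      # RUM: U_i = V_i + eps_i
    wins[utilities.index(max(utilities))] += 1 # choose the max-utility option
sim = [w / R for w in wins]

# Analytical logit probabilities implied by RUM with iid EV1 errors
denom = sum(math.exp(v) for v in V)
logit = [math.exp(v) / denom for v in V]
print([round(p, 3) for p in sim], [round(p, 3) for p in logit])
```

With 200,000 draws, the simulated shares agree with the closed-form logit probabilities to about two decimal places.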
In the 1960s a number of researchers realized that ordinary least squares
(OLS) regression was not appropriate for the study of choices among mutually
exclusive (discrete) alternatives. Given that the dependent variable of
interest was discrete, typically binary (1, 0), the use of OLS would result in
predicted outcomes violating the boundary limits of probability. Although, in
a binary choice setting, probabilities in the range 0.3 to 0.7 tended to be
well approximated by a linear OLS (or linear probability) model, predicted
probabilities at the extremes were likely to be greater than 1.0 or less
than 0. To avoid this, a transformation is required, the
most popular being the logistic (log of the odds) transformation. Software to
estimate a binary logit (or probit) model started to appear in the 1960s,
replacing the popular discriminant analysis method. The early programs
included PROLO (PRObit-LOgit), written by Cragg at the University of
British Columbia, which was used in many PhD theses in the late 1960s
and early 1970s (including Charles Lave 1970, Thomas Lisco 1967, and David
Hensher 1974). Peter Stopher (at Northwestern University, and now at
Sydney) in the late 1960s had written a program to allow for more than two
alternatives, but as far as we are aware it was rarely used. During the period of
the late 1960s and early 1970s there were a number of researchers developing
logit software for multinomial logit, including McFadden’s code that became
the basis of QUAIL (programmed in particular by David Brownstone),
Charles Manski's program (XLogit) used by MIT students such as Ben-Akiva,
Andrew Daly's ALogit, Hensher and Johnson's BLogit, and Daganzo
and Sheffi’s TROMP. Bill Greene had a version of Limdep in the 1970s that
began with Tobit and then Logit.
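The logistic (log of the odds) transformation mentioned above can be illustrated with a short Python sketch (the intercept and slope are assumed for the example, not taken from the text). The linear probability model escapes the [0, 1] bounds at extreme attribute values, while the logistic transform cannot:

```python
import math

def linear_prob(x, a=-0.2, b=0.15):
    """Linear probability model: predictions can escape [0, 1]."""
    return a + b * x

def logit_prob(x, a=-0.2, b=0.15):
    """Logistic transform: predictions stay strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

for x in [-10, 0, 5, 10]:
    print(x, round(linear_prob(x), 2), round(logit_prob(x), 3))
# linear_prob(-10) = -1.7 (below 0) and linear_prob(10) = 1.3 (above 1),
# while logit_prob is bounded; its inverse, ln(p/(1-p)) = a + b*x, is the
# log of the odds referred to in the text.
```

The inverse relation ln(p/(1−p)) = a + bx is exactly the logistic-deviate form used by Bradley and Terry earlier in the chapter.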
Despite the developments in software (mainly binary choice and some
limited multiple choice capability), it was not until the link was made between
McFadden's contribution at Berkeley (McFadden 1968) and a project
undertaken by Charles River Associates to develop a joint mode and destination
choice model (Domencich and McFadden 1975), that we saw a significant
growth in research designed to deliver practical tools for modeling
interdependent discrete choices involving more than two alternatives. By the late
1960s, McFadden had developed an empirical model from Luce’s choice
axiom (centered on IIA as described above). Letting PC(i) denote the
probability that a subject confronted with a set of mutually exclusive and
exhaustive alternatives C will choose alternative i, given the IIA property, Luce
showed that if his axiom holds, then one can associate with each alternative
a positive "strict utility" wi such that PC(i) = wi / Σk∈C wk. Taking the strict
utility for alternative i to be a parametric exponential function of its attributes
xi, wi = exp(xiβ), gave a practical statistical model for individual choice data.
McFadden called this the conditional logit model because it reduced to a
logistic in the two-alternative case, and had a ratio form analogous to the
form for conditional probabilities (McFadden 1968, 1974). McFadden (1968,
1974) proved necessity (given sufficiency had already been shown), starting
from the implication of the Luce axiom that multinomial choice between an
object with strict utility w1 and m objects with strict utilities w2 matched
binomial choice between an object with strict utility w1 and an object with
strict utility mw2.
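McFadden's conditional logit form can be sketched directly from the two equations above (a Python illustration; the attribute values and taste weights are assumed for the example, not estimates from any data set):

```python
import math

def conditional_logit(X, beta):
    """Conditional logit: P_C(i) = w_i / sum_k w_k with strict utility
    w_i = exp(x_i . beta)."""
    w = [math.exp(sum(x * b for x, b in zip(xi, beta))) for xi in X]
    total = sum(w)
    return [wi / total for wi in w]

# Illustrative attributes (travel time, cost) for three alternatives,
# and illustrative taste weights beta -- assumptions for this sketch.
X = [(-0.5, -1.0),
     (-1.0, -0.5),
     (-0.8, -0.8)]
beta = (1.0, 0.5)

probs = conditional_logit(X, beta)
print([round(p, 3) for p in probs], round(sum(probs), 3))

# In the two-alternative case the model reduces to the binary logistic form:
p2 = conditional_logit(X[:2], beta)
v1 = sum(x * b for x, b in zip(X[0], beta))
v2 = sum(x * b for x, b in zip(X[1], beta))
assert abs(p2[0] - 1.0 / (1.0 + math.exp(-(v1 - v2)))) < 1e-12
```

The final assertion illustrates why McFadden called it the *conditional logit* model: with two alternatives it collapses to the familiar binary logistic in the utility difference.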
that may not align with intuition in the construction of decision trees. With the
knowledge that the distribution of the variance associated with the unobserved
effects can be defined by a location and a scale parameter, the nested logit
model had found a way of explicitly identifying and parameterizing this scale,
which became known alternatively as composite cost, inclusive value, logsum
and expected maximum utility. The contributions to this literature, in particular
the theoretical justification under RUM, are attributable to Williams (1977) and
Daly and Zachary (1978), with a later generalization by McFadden (2001). In
particular, the Williams–Daly–Zachary analysis provides the foundation for
derivation of RUM-consistent choice models from social surplus functions, and
connects RUM-based models to willingness to pay (WTP) for projects.
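The inclusive value (logsum, or expected maximum utility) that the nested logit parameterizes can be written out in a few lines (a Python sketch; the nest utilities and scale parameter are assumed for illustration):

```python
import math

def logsum(V, scale=1.0):
    """Expected maximum utility (inclusive value / logsum) of a nest:
    IV = (1/scale) * ln( sum_j exp(scale * V_j) )."""
    return math.log(sum(math.exp(scale * v) for v in V)) / scale

# Illustrative nest of two transit alternatives (bus, train) with an
# assumed nest scale parameter of 2.0.
V_nest = [0.4, 0.1]
iv = logsum(V_nest, scale=2.0)

# The nest competes with car at the upper level through its logsum,
# which carries the composite worth of the whole nest upward:
V_upper = [0.8, iv]            # [car, transit composite]
denom = sum(math.exp(v) for v in V_upper)
p_car = math.exp(V_upper[0]) / denom
print(round(iv, 4), round(p_car, 4))
```

Note that the logsum is never below the best alternative in the nest, which is what makes it a sensible "composite cost" summary and, via the Williams–Daly–Zachary results, the bridge to social surplus and WTP calculations.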
The period from the mid 1970s to 2010 saw an explosion of contributions to
theory, computation, and empirical applications of closed-form discrete
choice models of the multinomial logit (MNL) and nested logit (NL) variety.
The most notable development of closed-form models occurred when it was
recognized that the nested logit model reveals crucial information to
accommodate the pooling of multiple data sets, especially revealed and stated
preference data. Although Louviere and Hensher (1982, 1983) and Louviere
and Woodworth (1983) had recognized the role of stated choice data in the
study of discrete choices in situations involving new alternatives and/or
existing alternatives with attribute levels stretched outside of those observed
in real markets, it was Morikawa's contribution (see Ben-Akiva and
Morikawa 1991) that developed a way to combine data sets while accounting
for the differences in scale (or variance) that had to be accommodated if the
resulting model was to satisfy the theoretical properties of RUM. Bradley and
Daly (1997, but written in 1992) and Hensher and Bradley (1993) showed how
the nested logit method could be used as a "nested logit trick" to identify the
scale parameter(s) associated with pooled data sets and to adjust the
parameter estimates so that all absolute parameters can be compared across data sets.
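The scale confounding behind this "trick" can be sketched numerically (a Python illustration with assumed taste and scale values, not estimates): estimated logit parameters are the product of scale and taste, so absolute parameters from two data sets differ when their scales differ, while ratios of parameters are scale-free:

```python
# Assumed "true" common tastes (time, cost) and data-set scales -- all
# illustrative numbers, chosen only to show the confounding.
beta_time, beta_cost = -0.04, -0.20
scale_rp, scale_sp = 1.0, 0.6      # revealed vs stated preference scales

# What each data set identifies is scale * beta, not beta itself.
est_rp = (scale_rp * beta_time, scale_rp * beta_cost)
est_sp = (scale_sp * beta_time, scale_sp * beta_cost)

print(est_rp, est_sp)                                # absolute parameters differ
print(est_rp[0] / est_rp[1], est_sp[0] / est_sp[1])  # ratios identical (~0.2)

# Rescaling the SP estimates by the relative scale restores comparability,
# which is what identifying the scale parameter from pooled data achieves.
rescaled_sp = tuple(b * (scale_rp / scale_sp) for b in est_sp)
```

The invariant ratio here is, for example, a value of travel-time savings: WTP measures survive the scale confounding even when absolute parameters do not.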
Despite great progress in linking multiple choices and multiple data sets,
some critical challenges remained. These centered initially on open-form
models such as multinomial probit, which in the 1980s was difficult to estimate
beyond a few alternatives, given the need to accommodate multiple integrals
through analytical solutions. Numerical integration was required, but it was
not until a number of breakthroughs associated with the notion of simulated
moments (McFadden 1989) that the door opened to ways of accommodating
more complex choice models, including models that could account for the
fuller range of sources of unobserved heterogeneity in preferences.
The era of open-form models such as random parameter logit (also referred
to as mixed logit) and error components logit enabled researchers to account
for random and systematic sources of taste (or preference) heterogeneity, to
allow for the correlated structure of data common to each sampled individual
(especially the case for stated choice data), and to obtain richer insights into
preference and scale heterogeneity (and heteroskedasticity) associated with
structural and latent influences on choices.
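The open-form idea can be sketched in a few lines of Python (an illustration with an assumed normally distributed taste parameter, not any model from the text): the mixed logit probability has no closed form, so it is approximated by averaging closed-form logit probabilities over random draws of the tastes:

```python
import math
import random

random.seed(1)

def logit_prob(beta, X, chosen):
    """Closed-form conditional logit probability of the chosen alternative
    for a single attribute per alternative and taste parameter beta."""
    w = [math.exp(beta * x) for x in X]
    return w[chosen] / sum(w)

def mixed_logit_prob(X, chosen, mean, sd, draws=50_000):
    """Open-form mixed logit probability: average the logit probability
    over random draws of beta ~ Normal(mean, sd)."""
    total = 0.0
    for _ in range(draws):
        beta = random.gauss(mean, sd)
        total += logit_prob(beta, X, chosen)
    return total / draws

# Illustrative single attribute (e.g. cost) for three alternatives.
X = [-1.0, -0.5, -0.2]
p = mixed_logit_prob(X, chosen=0, mean=1.0, sd=0.5)
print(round(p, 3))
```

With the standard deviation set to zero the simulator collapses back to the fixed-parameter logit, which is one way to see mixed logit as a generalization rather than a replacement of the closed-form model.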
The following sections of this primer will introduce the main rules that are
needed to start understanding the richness of the methods available to study
choice. We will start right from the beginning and learn to "walk before we
run." We will be pedantic in the interest of clarity, since what is taken for
granted by the long-established choice analyst is often gobbledygook to the
beginner. Intolerance on the part of such "experts" has no place in this book.
We have found in our graduate teaching and short courses that the best
way to understand the underlying constructs that are the armory of choice
analysis is to select one or, at most, a limited number of specific choice
problems and follow them through from the beginning to the end.
However, the main feedback on the first edition was a request to use a
number of data sets that show the diversity of relevance of choice analysis,
including the popular choose one (or first preference) labeled choice data,
unlabeled choices, best–worst attribute and alternative designs, ordered
choices, and choices involving more than one agent, in the context of
mixtures of (or stand alone) settings of revealed and stated preference
data. While readers will come from different disciplinary backgrounds
such as economics, geography, environmental science, marketing, health
science, statistics, engineering, transportation, logistics, and so forth, and
will be practicing in these and other fields, the tools introduced through a
limited number of case studies should be sufficient to demonstrate that they
are universally relevant.
A reader who insists that this is not so is at a disadvantage; they are
committing the sin of assuming uniqueness in behavioral decision making
and choice response. Indeed the great virtue of the methods developed under
the rubric of choice analysis is their universal relevance. Their portability is
amazing. Disciplinary boundaries and biases are a threat to this strength.
While it is true that specific disciplines have a lot to offer to the literature on
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:09:04 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.003
Cambridge Books Online © Cambridge University Press, 2015
(Chapter 15), latent class (Chapter 16), binary choices (Chapter 17), ordered
logit (Chapter 18), and data fusion (especially SP–RP), including the topic of
hypothetical bias (Chapter 19). Chapter 12 is a diversion to the important
topic of how to handle unlabeled data. The mixed logit chapter includes all of
the variants such as scaled multinomial logit, generalized mixed logit, in
preference and WTP space, and error components, as well as latent class
models (separated into Chapter 16), the latter being a discrete distribution
interpretation of a fixed or random parameter mixed logit model.
The final three chapters (Part IV) are new developments that were not
included in the 1st edition. As model functional form becomes more complex,
the need for nonlinear (in parameters) estimation becomes increasingly
relevant. The old grid search methods need to be replaced with a joint
estimation capability. Chapter 20 introduces the nonlinear random parameters model form as a frontier in choice analysis; after setting out the new model form, it is illustrated with a number of nonlinear models associated with expected utility theory and variants of prospect theory (such as rank dependent utility theory and cumulative prospect theory). Chapter 21 brings
together the growing literature on attribute processing, more broadly referred
to as process heuristics, which recognizes that respondents typically make
choices in the context of a set of rules that condition how each attribute or
alternative is processed. This is a lengthy new chapter given the growing
importance of this literature in choice modeling. The final chapter,
Chapter 22, moves beyond the single decision maker (or chooser) to a
recognition that many choices are made by groups of individuals. We show
how standard choice modeling methods can be used with data appropriate to
a multiple agent setting, in establishing the influence (or power) of each
decision maker in arriving at a group choice (be it cooperative or non-
cooperative).
Throughout the book we add numerous hints under the boxed heading of “as
an aside.” This format was chosen as a way of preserving the flow of the argument
while placing useful tips where they can best be appreciated. Finally, the data sets
used to illustrate the application of specific choice modeling methods are not
provided with the book; however a few of the data sets will be made available on a
service site for analysts to access (http://sydney.edu.au/itls/ACA-2015).
Chapter 2
Choosing
2.1 Introduction
see that the selection of these two attributes is a gross simplification of the
underlying influences on preferences for car or train travel; in a serious
data collection we have to measure a greater number of the potentially
relevant attributes. Indeed, even travel time itself is a complex attribute
because it includes all types of travel time – walking to a train station, waiting
for a train, time in the train, the proportion of in-train time that the person is
seated, time in the car, time parking a car, time walking to workplace after
parking the car or alighting from the train and travel time variability (or
reliability) over repeated trips.
To be able to progress, we have to decide on how we might measure the
underlying influences that define an individual’s preferences for car over
train or vice versa. Putting aside a concern about the image of a particular
form of transport (which may ultimately be an important influence on pre-
ference formation, especially for new means of transport), we have assumed
that the choice between car and train is determined by a comparison of the
travel times and costs of the trip. But how relevant or important is time
compared to cost, and does it differ within each alternative? Throughout the
development of choice analysis, we have sought to find a way of measuring an
individual’s preferences through what we call the “sources of preferences.”
Once the sources are identified, they have to be measured in units that enable
us to compare various combinations of the attributes across the alternatives,
and hopefully be confident that the alternative with the highest (positive)
value or index is the most preferred. Whether we can say that an alternative is
preferred by an exact number (i.e., a cardinal measure) or simply state that it
is more preferred (i.e., an ordinal measure) is a more challenging question, but
for this book we can safely put that issue aside.
The best way to progress the measurement of preferences is to recognize
that if the only influencing attribute were travel time, then it would be a simple
exercise to compare the travel times, and conclude that the alternative that
has the shorter travel time to the given destination would be preferred.
However, here we have two attributes (and usually many more). So how do
we measure an individual’s preferences in a multi-attribute environment?
We will begin by looking at each mode of transport separately. Take the
car with its travel time and cost. To reveal an individual’s preferences, we will
invite them to evaluate different combinations of travel time and cost asso-
ciated with a particular trip (whose distance travelled is known). We need to
ensure that all combinations are realistic, although at the same time noting
that some combinations may be outside of the individual’s existing experi-
ences (in the sense that there is no current technology available that can
Figure 2.1 Identification of an individual’s preferences for bus use (indifference curves I1, I2, I3, I4; axes: bus fare satisfaction and travel time satisfaction)
[Figure 2.2: an individual’s indifference curves I1, I2, I3, I4 for bus versus car travel, overlaid with the budget constraint running from R/Pc on the car travel axis to R/Pb on the bus travel axis, with slope −Pb/Pc.]
the preferences for bus versus car travel (either in terms of a single attribute
such as cost, or in terms of combinations of attributes such as time and cost).
Note that if we evaluate more than one attribute associated with each alternative, then to be able to stay with a simple two-dimensional diagram, we will
need to add the attributes together. In the current context we would have to convert
travel time to dollars, and add it to fares for bus and operating costs for car,
to get what is referred to as the generalized cost or generalized price of a trip.
We discuss issues of converting attributes into dollar units in a later chapter
on WTP, but for the time being we will assume that we have a single attribute
associated with bus and car, called cost. Within this revised setting of two
modes, we will define the budget constraint as an individual’s personal
income. In Figure 2.2 we overlay this budget constraint on a set of preference
(or indifference) curves to identify the domain within which preferences can
be realized. How do we represent the budget constraint? There are three main
elements in the definition of the budget constraint:
1. The total resources (R) available (e.g., personal income) over a relevant
time period (which we refer to as resources available per unit of time);
2. The unit price associated with car (Pc ) and bus (Pb) travel; and
3. Whether these unit prices are influenced by the individual (as a price
maker) or the individual has no influence, and simply takes prices as
given (i.e., a price taker).
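The generalized cost conversion described earlier (valuing travel time in dollars and adding it to the monetary cost of each mode) can be sketched in a few lines of Python. The value of travel time and the trip times and costs below are hypothetical numbers chosen purely for illustration:

```python
def generalized_cost(money_cost, travel_time_hours, value_of_time):
    """Convert travel time to dollars and add it to the monetary cost."""
    return money_cost + travel_time_hours * value_of_time

# Hypothetical inputs: value of travel time $20/hour; bus fare $4 for a
# 0.75 hour trip; car operating cost $7 for a 0.5 hour trip.
VOT = 20.0
gc_bus = generalized_cost(4.0, 0.75, VOT)   # 4 + 0.75 * 20 = 19.0
gc_car = generalized_cost(7.0, 0.50, VOT)   # 7 + 0.50 * 20 = 17.0
```

On these made-up numbers the car has the lower generalized cost even though its monetary cost is higher, which is exactly the time–money trade-off the single "cost" attribute is meant to capture.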
Let us define the price of a unit of car travel as Pc and the price of a bus trip as
Pb. We will also assume that the individual is a price taker and so has no
influence over these unit prices. The slope of the budget constraint is the ratio
of the prices of the two modal trips. To see why this is so, note that if all of the
budget is spent on car travel, then the maximum amount of car travel
that one can “consume” is the total resources (R) divided by Pc. Likewise the
Figure 2.3 Changes to the budget or resource constraint (original budget line A, with line B for a reduction in the price of car travel, line D for an increase in the price of car travel, and line C for an increase in total resources; axes: car travel, with intercept R/Pc, and bus travel)
[Figure 2.4: utility maximization subject to the budget constraint, with tangency point E and points F, G, H also shown; axes: car travel (intercept R/Pc) and bus travel (intercept R/Pb).]
total amount of bus travel that can be undertaken is R/Pb. Because the individual is a price taker, the
unit prices are constant at all levels of car and bus travel, and so the budget constraint is a straight
line. To illustrate what happens to the budget line when we vary price and
resources, in Figure 2.3, starting with the original budget constraint (denoted
as line A) we present line B for a reduction in the price of car travel, line D for
an increase in the price of car travel, and line C for an increase in total
resources (holding prices fixed at Pc and Pb).
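The intercepts and slope just described, and the shifts illustrated in Figure 2.3, can be verified with a small sketch; the resource level and unit prices are hypothetical:

```python
def budget_line(R, Pc, Pb):
    """Return the car-axis intercept R/Pc, the bus-axis intercept R/Pb,
    and the slope -Pb/Pc of a price taker's budget constraint."""
    return R / Pc, R / Pb, -Pb / Pc

# Hypothetical values: R = 120 resource units, Pc = 6, Pb = 3.
max_car, max_bus, slope = budget_line(120, 6, 3)     # 20.0, 40.0, -0.5

# A reduction in the price of car travel (line B in Figure 2.3) raises the
# car-axis intercept while leaving the bus-axis intercept unchanged.
max_car_b, max_bus_b, _ = budget_line(120, 4, 3)     # 30.0, 40.0
```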
In Figure 2.4, when we overlay the budget constraint with the preference
curves, we can see the possible combinations of the two modal trips that an
individual can choose in order to maximize utility subject to the resource
constraint. This will be at point E where an indifference curve is tangential to
the budget constraint for the prices Pc and Pb, and resources R. The individual
cannot improve their level of utility without varying the total amount of
resources and/or the unit prices of the two modal trips. In Figure 2.4 we
[Figure: indifference curves I1, I2, I3 and budget constraints R1, R2, R3; vertical axis: expenditure on commodity Y.]
[Figure 2.6: derivation of an individual’s demand curve; expenditure on bus travel plotted with indifference curves I1, I2, I3, budget constraints R1, R2, R3, bus fare levels Px1, Px, Px2, and tangency point a.]
Figure 2.6 has provided the necessary information to derive one demand
curve for an individual. Movements along this demand curve are attributed to
changes in the price of bus travel, all other considerations held constant.
However other influences may change from time to time for many reasons,
and then we have a problem in interpreting movements along a given demand
curve. Simply put, we cannot observe movements along a given demand curve
(called change in the quantity demanded) when something other than what
is on the vertical axis changes. If an individual’s personal income increases,
we might expect a change in the quantity of bus travel because the additional
income might enable the individual to buy a car and switch from bus to car
travel for some of the trips per unit of time. What we now have is more than
a change in the quantity of bus travel; we also have a change in the level of
demand.
That is, the demand curve itself will change, resulting in an additional
demand curve that accounts for both the reduction in bus travel and the
increase in car travel. Since Figure 2.6 is focused on bus travel, we will only
be able to observe the change in bus travel in this diagram. We will also need
an additional diagram (Figure 2.7) to show the amount of travel that has
Figure 2.7 Changes in demand and changes in quantity demanded (left panel: bus fare against no. of bus trips per unit of time, with a movement from a to b showing a change in quantity demanded and a shift from D1 to D2 showing a change in demand; right panel: car running cost against no. of car trips per unit of time, with a shift from D3 to D4 showing a change in demand)
We are now able to focus again on choice analysis, armed with some very
important constructs. The most important is an awareness of how we can
identify an individual’s preferences for specific alternatives, and the types of
constraints that might limit the alternatives that can be chosen. The shapes
of the preference curves will vary from individual to individual, alternative to
alternative, and even at different points in time. Identifying the influences
molding a specific set of preferences is central to choice analysis. Together
with the constraints that limit the region within which preferences can be
honored, and the behavioral decision rule used to process all inputs, we
establish a choice outcome.
The challenge for the analyst is to find a way of identifying, capturing, and
using as much as possible of the information that an individual takes on board when
they process a situation leading to a choice. There is a lot of information, much
of which the analyst is unlikely to observe. Although knowledge of an indivi-
dual’s choice, and the factors influencing it, is central to choice analysis,
the real game is in being able to explain choices made by a population of
individuals. When we go beyond an individual to a group of individuals (as a
Chapter 3
Choice and utility
To call in the statistician after the experiment is done may be no more than asking
him to perform a post-mortem examination: he may be able to say what the experi-
ment died of.
(Ronald Fisher)
3.1 Introduction
level (e.g., transport or health). Truong and Hensher (2012), among others,
develop the theoretical linkages between discrete choice models and continuous
choice models, where discrete choice models focus on the structure of tastes or
preferences at the individual level, while continuous demand models can be
used to describe the interactions between these preferences at the industry or
sectoral level, extendable to an entire economy.
Working with disaggregate level data poses many challenges that those
working with more aggregate level data might not have to worry about. In
particular, disaggregate level data requires that data be captured pertaining to
the specific context within which each observed decision was made (for aggregate level data, it is usually sufficient to know that X widgets were sold
last month and Y the month before; the circumstances under which each and every
widget was purchased are of little to no relevance, although the average price per
widget, etc. might also come in handy). As such, the analyst may need to capture
data on the decision context of each observed choice (e.g., was a
trip made for work or non-work purposes), the alternatives that were available
to the decision maker at the time the choice was made (e.g., a bus, train, and car,
or just a bus and train), relevant variables that describe those same alternatives
which may have influenced the choice (e.g., the times and costs of the various
modes), as well as the characteristics of the individual decision makers them-
selves (e.g., their age, gender, income, etc.).
The decision context will typically be used as a segmentation instrument, with
different choice models estimated for different decision contexts. In some
instances, however, data collected over multiple decision contexts may be pooled,
with the decision context used as an explanatory variable to help explain
differences in choice patterns. The alternatives (which are also referred to as
profiles or treatment combinations, depending on what literature one reads),
and the variables that describe them (when a variable relates to an alternative, we
use the term attribute) group together to form what is known as either a choice
situation (which we will adopt throughout), choice set, choice task, choice
observation, profile combination, or even a run. Any information about the
characteristics of the decision makers may also be used as explanatory variables to
help explain differences in observed choices over the sampled population.
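Choice data of the kind just described are commonly organized with one row per alternative within each choice situation, combining attributes of the alternatives, characteristics of the decision maker, and an indicator of the chosen alternative. A minimal sketch, with entirely hypothetical attribute values:

```python
# One choice situation for one respondent; one row (dict) per alternative.
# "time" and "cost" are attributes of the alternatives; "age" is a
# characteristic of the decision maker; "choice" marks the chosen alternative.
choice_situation = [
    {"alt": "car",   "time": 30, "cost": 7.0, "age": 41, "choice": 1},
    {"alt": "train", "time": 40, "cost": 4.5, "age": 41, "choice": 0},
    {"alt": "bus",   "time": 55, "cost": 3.0, "age": 41, "choice": 0},
]

# Exactly one alternative is observed to be chosen in each choice situation.
chosen = [row["alt"] for row in choice_situation if row["choice"] == 1]
```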
Of particular importance to the modeling process are the choice situations
consisting of information related to the various alternatives that were available
at the time a choice was observed to have been made (one can still estimate a
choice model without knowing anything about the specific choice context or
the decision makers, although having such information may result in better
modeling outcomes). Discrete choice models require that each choice
Figure 3.1 Example: log versus linear relationship (Y plotted against X over the range 1 to 5)
line. The relationship between X and log(Y) is a much better linear approximation than the relationship between X and Y, suggesting that log(Y), and not Y,
should be used in any linear regression model estimated on this data.
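The point made by Figure 3.1 can be reproduced with a short least squares sketch: when Y is generated by an exponential relationship, regressing log(Y) on X gives an essentially perfect linear fit, while regressing Y on X does not. The data-generating coefficients (0.1 and 0.6) are assumptions for illustration only:

```python
import math

def ols_r2(x, y):
    """Fit y = b0 + b1*x by least squares and return R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]
y = [math.exp(0.1 + 0.6 * xi) for xi in x]        # exponential in X

r2_linear = ols_r2(x, y)                          # straight line misses the curvature
r2_log = ols_r2(x, [math.log(yi) for yi in y])    # exactly linear in log(Y)
```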
While it may be possible to transform the data to ensure that the linearity
assumption is maintained, one assumption of the linear regression model that
cannot be so easily overcome is that the dependent variable must be continuous in nature, such that Yn ∈ (−∞, ∞). No such assumptions are necessary
with regards to the independent variables, however. In many cases, the
dependent variable of interest will not be continuous, but rather take a finite
number of discrete values. In Chapter 2, we suggested just such a dependent
variable; that being some form of discrete choice, where one alternative out of
a set is observed to be chosen. Assuming that Yn takes the value one if
alternative n is chosen, or zero otherwise, the dependent variable is represented as a categorical or dichotomous variable and hence definitely cannot be
treated as if it is truly continuous (i.e., Yn ∈ {0, 1}). It is important to note,
however, that despite our use of the term “discrete choice,” as in discrete
choice models, the underlying econometric models which represent the focus
of this book can be applied to any data where the dependent variable is
categorical, and not just choice data. In this sense, the models we present
are far more flexible in terms of the data to which they may be applied than
otherwise might seem the case, and in some literature they are referred to as
categorical dependent variable models.
To understand the concern with using linear regression models for data
involving categorical dependent variables, and to further demonstrate that the
methods discussed within this book can be extended beyond disaggregate level
choice data, consider as an example the party make-up of The United States
Congress. Each State is divided into voting districts representing approxi-
mately 700,000 people who elect a Congress person to act as their representa-
tive, typically belonging to either the Republican or Democratic Party.
Suppose a researcher was interested in determining whether districts with a
larger proportion of minority groups (i.e., persons who identify as being non-
Caucasian) are more likely to be represented by a Democratic Party member
in Congress. Plotted in Figure 3.2 is data on congressional party affiliation
against the proportion of the population considered to be from minority
groups for each of the 435 congressional voting districts.1 As shown in
Figure 3.2, the Y values take only the value zero (Republican Party) or one
1 The data was obtained from http://ballotpedia.org/Portal:Congress, accessed on 17 October 2013. Data on 430 voting districts was used for the analysis due to missing data related to five voting districts.
Figure 3.2 Plot of Congressional party affiliation (1 = Democrat, 0 = Republican) against proportion of minority groups in voting districts
Figure 3.3 Linear regression model of Congressional party affiliation against proportion of minority groups in voting districts
The linear regression model was estimated using Nlogit 5.0. The syntax for this and other models is
provided in Appendix 3B. The syntax for estimating models in Nlogit will be
explained further in later chapters.
Figure 3.3 plots the linear regression line based on this model. A number
of issues become readily apparent. First, given that the results of a
linear regression model should be interpreted as if the dependent variable is
continuous, the model will predict outcomes other than zero and one. Thus, for example, the model predicts that a district with a 50 percent non-white
population will have a congressional representative belonging to a party
coded as 0.810. While it might be tempting to treat this outcome as a
probability, and to suggest that, as the result is closer to one, the district is more
likely to have a Democratic Party member as its congressional representative than a Republican, the linear regression model should not be interpreted this way: the regression line is continuous, and hence the
model is in fact predicting a congressional member belonging to a party
coded as 0.810. The second concern with using linear regression models on
data with a categorical dependent variable can also be seen clearly from
Figure 3.3: the model may potentially predict outcomes outside of the zero
and one range (whether it does will depend on the parameter estimates, and
the values of the X variable). For the present example, a district with
70 percent of the population identifying as being from a minority group will
be predicted as having a Congress person belonging to a party coded as 1.097 (noting
that 70 percent falls within the data range). This further substantiates the
fact that the previous prediction of 0.810 should not be treated as if it is a
probability of party representation, as probabilities by definition are con-
strained to be between zero and one. It is worth noting that there are models
such as tobit regression that ensure compliance with lower and upper limit
constraints; however this does not change the issue of interpretation of the
predictions as probabilities.
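Both concerns (fractional predictions, and predictions outside the zero-one range) are easy to reproduce with a linear probability model fitted to a small synthetic data set. The 0/1 outcomes below are invented for illustration and are not the congressional data used in this chapter:

```python
def ols(x, y):
    """Least squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1

# Synthetic 0/1 outcomes that become more likely as x grows.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
y = [0,   0,   0,   1,   0,   1,   1,   1]

b0, b1 = ols(x, y)

def predict(xi):
    """Linear probability model prediction: fractional, and unbounded."""
    return b0 + b1 * xi
```

With these data the fitted line gives a fractional "outcome" in the middle of the x range, a value above one at the top of the range, and a value below zero at the bottom, which is precisely why such predictions cannot be read as probabilities.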
To resolve the above issues, it is necessary to transform the dichotomous
dependent variable into a continuous variable. As noted above, transforma-
tions of the dependent variable are possible when using linear regression
models, provided that the dependent variable remains (or becomes) contin-
uous. Indeed, numerous studies have employed transformed dependent vari-
ables in the past. For example, as discussed above, a common transformation
involves taking the log of the dependent variable and using this in the
regression model, as shown in Equation (3.2):

log(Yn) = β′0 + β′k xkn + ε′n,   (3.2)

where the prime (′) is used to indicate that the estimates obtained in Equation (3.2) would
be expected to differ from those obtained from a model estimated as per
Equation (3.1).
Where a transformation of the dependent variable is used, it is not possible
to directly calculate the impact of the independent variables upon the depen-
dent variable Y due to the transformative process used. It is therefore neces-
sary to retrieve the relationship mathematically. For example, given the above,
consider the impact upon Y given a one unit change in xk. In this case, a change of Δxkn
will result in Y changing by a factor of exp(β′k Δxkn), whereas previously a change in xk was
predicted to result in a βk change in Y.
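The multiplicative nature of this effect can be checked numerically; the coefficient values below are hypothetical, not estimates from any model in this chapter:

```python
import math

b0p, bkp = 1.0, 0.2   # hypothetical estimates from a log(Y) regression

def predict_y(x):
    """Back-transform the log-linear prediction to the level of Y."""
    return math.exp(b0p + bkp * x)

# Effect of a one unit change in x: Y changes by a constant FACTOR exp(bkp),
# not by a constant amount as in the untransformed linear model.
ratio = predict_y(3.0) / predict_y(2.0)
```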
The transformation of a dependent variable results in what is sometimes
referred to as a link function. Consider Equation (3.3):

Yn* = f(Yn),   (3.3)

where f(Yn) is any transformative function of Yn producing Yn*. Here, Yn* is used
in model estimation and not Yn, where f(Yn) is the transformative link
between the two values. In the previous example, f(Yn) = log(Yn).
Figure 3.4 PDF and CDFs for Normal distribution: 1 ((a) probability density function and (b) cumulative density function, each plotted against Y*)
some density (represented by its PDF) will be located at some value less
than or equal to Yn*.
For example, assume that Y is a random variable drawn from a standard
normal distribution and that the analyst is interested in calculating the probability that Y is observed to take on a value less than or equal to some known
value, say Yn* = −1.0. This is graphically represented in Figure 3.5(a), sub-panel
(a), as the area under the PDF to the left of Yn* = −1.0. The precise probability of
this occurring, assuming that Y is drawn from a standard normal distribution, is
0.1587, which is calculated from the CDF shown in sub-panel (b). Likewise, the
probability that Y is observed to take on a value less than or equal to some
known value, Yn* = 1.0, is 0.8413, as shown in Figure 3.5(b).
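The two probabilities quoted above can be verified using the standard Normal CDF, written here via the error function in Python's standard library:

```python
import math

def std_normal_cdf(z):
    """Phi(z): cumulative distribution function of the standard Normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_low = std_normal_cdf(-1.0)    # area to the left of -1.0, approx. 0.1587
p_high = std_normal_cdf(1.0)    # area to the left of  1.0, approx. 0.8413
```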
Relating this back to the problem at hand, assuming the right hand side of
our regression equation is normally distributed, then:

Φ−1(Yn) = β0 + βk xkn + εn,   (3.4)

where Φ is the Greek symbol (upper case) Phi, representing the CDF of the
Normal distribution.
Figure 3.5 PDF and CDFs for Normal distribution: 2 ((a) probability density function and (b) cumulative density function, each plotted against Y*)
Figure 3.5 (cont.)
In Equation (3.4), the link function is f(Yn) = Φ−1(Yn), which has been termed
the probability unit link function, subsequently shortened to probit, and the
resulting model is termed the probit model. Yn* represents a latent variable
which is assumed to be Normally distributed.
In the case of a binary outcome, the probit model is known as a binary
probit model, and Equation (3.4) relates to the outcome coded as 1.
To demonstrate, Table 3.2 presents the results of a probit model estimated on the
previously described data. We discuss the probit model in more detail in
Chapter 4; however, it is sufficient to state at this time that it is possible to
substitute values for the independent variables, much like with a linear
regression model, to calculate the value of the latent variable Yn*.
For example, assuming that a voting district had 50 percent of its residents
identify as belonging to a minority group, then Yn* in the above example would
be equal to 1.899. Given that Yn* is assumed to be distributed standard normal,
this value can be treated as if it were a Z-score. Hence, in this instance, the model
predicts that a voting district made up of 50 percent minority groups has a
probability of 0.88 of having a representative belonging to the party coded as 1,
the Democratic Party. Given that probabilities sum to one, it is easy to calculate
[Table 3.2 Probit model parameter estimates (t-ratios): not reproduced here.]
[Figure 3.6 Probit model of Congressional party affiliation against proportion of minority groups in voting districts: predicted probability of a Democrat Congress person (0 to 1) plotted against the proportion of minority groups in the population (0 to 1).]
that the same district will have a 0.12 probability of having a Republican Party
member as its elective representative. Likewise, the reader can confirm that a
district with a 70 percent minority group make-up will have a 0.98 probability of
having a Democratic Party member as its elective representative.
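A probit prediction of this kind can be sketched as follows. The intercept and slope below are hypothetical round numbers chosen for illustration only, not the Table 3.2 estimates, and `probit_predict` is our own helper:

```python
from math import erf, sqrt

def normal_cdf(y):
    """Standard Normal CDF, Phi(y)."""
    return 0.5 * (1.0 + erf(y / sqrt(2.0)))

def probit_predict(beta0, beta1, x):
    """P(outcome = 1) for a probit with latent Y* = beta0 + beta1 * x."""
    return normal_cdf(beta0 + beta1 * x)

# Hypothetical coefficients for illustration (not the book's estimates):
b0, b1 = -1.0, 4.0
p_democrat = probit_predict(b0, b1, 0.5)   # district with a 50% minority share
p_republican = 1.0 - p_democrat            # probabilities sum to one
```

Because the latent value here is b0 + b1 × 0.5 = 1.0, the predicted probability is Φ(1.0) ≈ 0.8413, and the complementary outcome follows by subtraction, exactly as in the text.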
Unlike the linear regression model, the probit model predictions derived
from the latent variable Y* are probabilities and are therefore bounded between
zero and one. Plotting the probit probabilities over the range of the x variable
produces an S-shaped curve known as a sigmoidal curve. The sigmoidal curve
for the model estimated above is shown in Figure 3.6. As shown in Figure 3.6, a
district with 50 percent of its population identifying as belonging to a minority
group will have a 0.88 probability of having a Congress person belonging to
the party coded as 1, in this case the Democratic Party.
[Figure 3.7 Sigmoid curve examples of alternative probit models: predicted probabilities plotted against the proportion of minority groups in the population for several parameter settings, for example −3.000 + 6.000 × min %.]
The shape of the sigmoidal curve will depend on the parameter estimates.
Figure 3.7 plots the sigmoidal curve of the binary probit under various
assumptions about the parameter estimates. As can be seen in the plot, the slope,
and where the curve approaches its upper and lower bounds, will depend on
both the constant term and the parameter estimate associated with the inde-
pendent variable.
An alternative model to the probit is the logit model. As with the probit
model, we discuss the logit model in more detail in Chapter 4; however, it is
sufficient to state at this time that the logit model is related to what are known
as odds ratios. The odds ratio represents the odds of an outcome occurring
based on some known probability. The odds of an outcome occurring may be
calculated as per Equation (3.5):
Odds(Y) = p / (1 − p).   (3.5)
p      Odds(Y)   log(Odds(Y))
0      0         −∞
0.1    1/9       −2.197
0.2    1/4       −1.386
0.3    3/7       −0.847
0.4    2/3       −0.405
0.5    1         0.000
0.6    3/2       0.405
0.7    7/3       0.847
0.8    4         1.386
0.9    9         2.197
1      ∞         ∞
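The odds and log-odds entries in the table can be verified directly; a minimal Python sketch (the `odds` helper is ours, not the book's):

```python
from math import log

def odds(p):
    """Odds of an outcome with probability p, per Equation (3.5)."""
    return p / (1.0 - p)

# Two rows of the table: p = 0.1 and p = 0.9 are mirror images,
# so their log-odds differ only in sign.
assert round(log(odds(0.1)), 3) == -2.197
assert round(log(odds(0.9)), 3) == 2.197
assert odds(0.5) == 1.0   # even odds at p = 0.5
```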
For the logit model, the link function used is the log of the odds ratio.
Algebraically the logit probabilities for the binary outcome case can be derived
as per Equation (3.6). As stated above, we discuss the logit model in more
detail in Chapter 4, where we extend it to the multinomial case. We also
discuss the multinomial case in Section 3.3:
log(pn / (1 − pn)) = Yn = β0 + β1x1n + β2x2n + . . . + βkxkn + en
pn / (1 − pn) = exp(Yn)
pn = (1 − pn) exp(Yn)
pn = exp(Yn) − exp(Yn) pn                                        (3.6)
pn(1 + exp(Yn)) = exp(Yn)
pn = exp(Yn) / (1 + exp(Yn))
Table 3.4 presents the results of a logit model based on the same data
described above. The logit formula (for the binary case) is given in Equation
(3.6). Assuming a voting district with a 50 percent minority group population,
the latent variable Y* will be equal to 1.899 based on the reported model results.
As with the probit model, the logit model predicts the probability of an
outcome occurring. Substituting 1.899 into Equation (3.6) leads to a predic-
tion of 0.87 that the district will have a Democrat as its elected Congressional
member.
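The final line of Equation (3.6) can be evaluated directly to reproduce this prediction (a sketch; `logit_prob` is our own helper name):

```python
from math import exp, log

def logit_prob(y):
    """Binary logit probability from Equation (3.6): p = exp(Y) / (1 + exp(Y))."""
    return exp(y) / (1.0 + exp(y))

p = logit_prob(1.899)      # latent utility for the 50 percent district
print(round(p, 2))         # -> 0.87, the prediction reported in the text

# Round trip: the log-odds of the predicted probability recover Y.
assert abs(log(p / (1 - p)) - 1.899) < 1e-9
```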
[Table 3.4 Logit model parameter estimates (t-ratios): not reproduced here.]
[Figure 3.8 Logit and probit models of Congressional party affiliation against proportion of minority groups in voting districts: predicted probability of a Democrat Congress person (0 to 1) plotted against the proportion of minority groups in the population (0 to 1); logit as a continuous line, probit as a dashed line.]
Similar to the probit model, the logit model probabilities produce a sigmoi-
dal curve over the range of the independent variables. The sigmoidal prob-
abilities for the above model are shown in Figure 3.8 as a continuous line. Also
shown in the figure is the sigmoidal curve for the probit model reported in
Table 3.2 (shown as a dashed line). As can be seen in the figure, the two models
tend to predict very similar results, with the main differences being close to the
zero and one probabilities.
We return to discuss both the probit and logit models in Chapter 4. In
Section 3.3, with the preceding background, we return to a discussion of utility
and choice.
apples is now measured as zero while the utility for oranges is –5. Once again,
all we are able to conclude under ordinal utility theory is that the person
prefers apples to oranges, and not that they are indifferent to apples but have a
dislike (what is referred to as a disutility) for oranges. For this reason, when
dealing with ordinal utility theory, it is common to refer to relative utility, as
the absolute value of utility is meaningless. The distinction between cardinal
and ordinal utility is important here as the utilities derived from discrete
choice models are interpreted as being ordinal.
The fact that utilities obtained from discrete choice models are measured on
an ordinal scale is important in that it implies that only differences in utility
matter, not the absolute value of utility. In making this statement, however, it
is necessary to make a distinction between the level and scale of utility. The
level of utility represents the absolute value of utility. Adding or subtracting a
constant to/from the utilities of all J alternatives, while changing the level of
utility, will maintain the relative differences of utility between each of the
alternatives. For example, subtracting 10 from the utilities of both the apple
and orange alternatives above did not change the differences in utility between
either. The scale of utility refers to the relative magnitude of utility. Consider
an example where the utilities of all J alternatives are multiplied by the same
value. The resulting utilities will not change in terms of their relative pre-
ference rankings; however, the utility differences will change. For example,
multiplying the utility for both apples and oranges by 2 will produce a utility of
20 for apples and 10 for oranges such that apples are still preferred to oranges,
but the relative difference in the utilities for the two alternatives is now 10.
Table 3.5 demonstrates the difference between utility level and utility scale for
the apple and orange example.
[Table 3.5 Utility level and scale for the apple and orange example: not reproduced here.]
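The level and scale properties just described are easy to check numerically; a minimal sketch using the apple and orange utilities from the running example:

```python
# Utilities for the apple and orange alternatives from the running example.
apples, oranges = 10, 5

# Level shift: subtracting the same constant preserves the difference.
assert (apples - 10) - (oranges - 10) == apples - oranges == 5

# Scale change: multiplying both by 2 keeps the ranking (apples still
# preferred) but doubles the difference to 10, as stated in the text.
assert (apples * 2) > (oranges * 2)
assert (apples * 2) - (oranges * 2) == 10
```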
The fact that only differences in utility matter has a number of important
implications in terms of the identification of discrete choice models. Firstly, it
is only possible to estimate parameters when there exist differences across the
alternatives. This has important ramifications for the estimation of model
constants and covariates, which we discuss later. Secondly, while the scale of
utility does not matter, in that multiplying the utilities of all of the alternatives
by the same amount will not change the relative preference rankings, it does
play an important role econometrically. Consider Equation (3.8), in which we
multiply utility by some positive amount λ > 0:

λUnsj = λVnsj + λensj.   (3.8)
What becomes apparent from Equation (3.8) is that the scale of the observed
component of utility is intrinsically linked to that of the unobserved compo-
nent, in that both components are affected. We will discuss the unobserved
effects in detail in Section 3.4; however, for the present it is sufficient to state
that the unobserved effects are assumed to be randomly distributed with some
density. Given this fact, it is easy to show that the scale of the observed
component of utility will necessarily affect both the mean and variance of
the unobserved component of utility. In the latter case, the variance is scaled by
precisely λ², given that var(λensj) = λ²var(ensj).
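The variance result holds exactly for any draw of the unobserved component; a quick check using an arbitrary fixed sample standing in for ensj (values chosen purely for illustration):

```python
from statistics import pvariance

# An arbitrary fixed draw standing in for the unobserved component e_nsj.
e = [0.3, -1.2, 0.8, 2.1, -0.5, 0.0, 1.4, -2.2]
lam = 3.0

scaled = [lam * v for v in e]

# var(lambda * e) = lambda^2 * var(e), exactly as stated in the text.
assert abs(pvariance(scaled) - lam**2 * pvariance(e)) < 1e-9
```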
Note that Equations (3.7) and (3.8) make use of subscript j. This implies
that each alternative, j, present within a person’s choice set, will have its own
utility function. To relate this back to our discussion in Section 3.2, we note
that each of the models was represented by a single equation even though
there existed two possible outcomes; a Democratic or Republican Party
representative. As should be clear now from the discussion above, the latent
variable for the outcome coded as zero in Section 3.2 was simply normalized to
zero and it was for this reason that the equations related to the outcome coded
as one. In the case of binary outcomes, it is necessary to normalize the utility of
one alternative to zero. This is because there will exist an infinite number of
utility functions that may reproduce the same relative utility differences.
In many instances, however, there will exist more than one possible
mutually exclusive non-continuous outcome. For example, there might exist
a third political party that could be voted in, or a person may potentially have
three or more possible modes of transport to choose from when travelling to
work. In such cases, each alternative will have its own unique utility function,
one of which may or may not be normalized to zero. How and when to
normalize utility will form a large basis of the remainder of this chapter, where
Vnsj = Σ(k=1 to K) βk xnsjk.   (3.10)
Vns,car = Σ(k=1 to K) βk xns,car,k,
Vns,bus = Σ(k=1 to K) βk xns,bus,k.   (3.11)
Utility functions may also be specified to contain what are termed alternative-
specific parameter (ASP) estimates. An alternative-specific parameter is one
which is allowed to differ across alternatives (and hence is not constrained to
be the same, as with generic parameter estimates; i.e., βkj is not constrained to
equal βki). An example of this is given in Equation (3.12), where the parameter
estimates associated with the two different utility functions are represented by
different subscripts:
Vns,car = Σ(k=1 to K) βk,car xns,car,k,
Vns,bus = Σ(k=1 to K) βk,bus xns,bus,k.   (3.12)
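The distinction between Equations (3.11) and (3.12) can be sketched in a few lines. The attribute levels and parameter values below are hypothetical, chosen only to show the mechanics:

```python
# Hypothetical attribute levels (time in minutes, cost in dollars).
x_car = {"time": 30.0, "cost": 6.5}
x_bus = {"time": 45.0, "cost": 2.0}

def utility(betas, x):
    """V = sum over k of beta_k * x_k, as in Equations (3.10)-(3.12)."""
    return sum(betas[k] * x[k] for k in x)

# Generic parameters (Equation 3.11): one beta per attribute, shared by modes.
generic = {"time": -0.05, "cost": -0.2}
v_car_generic = utility(generic, x_car)   # -0.05*30 - 0.2*6.5 = -2.8
v_bus_generic = utility(generic, x_bus)   # -0.05*45 - 0.2*2.0 = -2.65

# Alternative-specific parameters (Equation 3.12): the time beta differs by
# mode, reflecting, say, greater disutility of a minute spent on the bus.
asp_car = {"time": -0.03, "cost": -0.2}
asp_bus = {"time": -0.08, "cost": -0.2}
v_car_asp = utility(asp_car, x_car)       # -0.03*30 - 0.2*6.5 = -2.2
v_bus_asp = utility(asp_bus, x_bus)       # -0.08*45 - 0.2*2.0 = -4.0
```

Note that the cost parameter is left generic in the second specification; as the text discusses below, a subset of parameters may be generic while others are alternative-specific.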
As with generic parameter estimates, the analyst may wish to specify alternative-
specific parameter estimates for a number of reasons. Firstly, the data may
suggest that the amount of (dis)-utility a decision maker will obtain for a specific
attribute is not uniform across alternatives (e.g., a minute spent in an air-
conditioned car with a functioning radio may be worth more to a person
travelling to work on a hot summer day than a minute spent on an overcrowded
non-air-conditioned bus full of commuters who forgot to bathe in the past
week). Note, however, that the specification of alternative-specific parameter
estimates does not preclude the resulting estimates being statistically equal to
one another. Secondly, an attribute may belong to one, or a subset, of alternatives,
and hence cannot by definition belong to all J alternatives (e.g., for a specific trip,
a decision maker may be faced with a decision to use their car and pay for petrol,
possible tolls and parking, or take the bus and pay the bus fare (it is unlikely they
will have to pay tolls or parking when taking the bus, however)). In such cases, the
parameter associated with a specific attribute may be non-zero for some alter-
natives (although it could be statistically equal to zero also), but zero for other
alternatives simply due to the non-presence of that attribute.
In some instances, it is possible that a subset of the parameters will be
generic, while others will be alternative-specific. In Equation (3.13), for
10, and 8, respectively. Firstly, this implies that individuals prefer cars to buses
and buses to trains. Secondly, the difference between the utility for car and bus is
5, car and train 7, and bus and train 2. If we were to change the utility levels by
subtracting 8 from the utility of each alternative, the utilities now become 7, 2
and zero. Note that the relative preference rankings between the alternatives
remain the same as, too, the differences between the utilities. Likewise, chan-
ging the utility levels to be −85, −90, and −92 by subtracting 100 from each of
the original values reproduces the exact same result in terms of the relative
utility differences. For this reason, in estimating the model, the analyst will
need to set the level of utility by normalizing at least one of the ASCs to some
arbitrary value, with the most common value selected being zero. Once more,
this does not imply preference indifference for the selected alternative as one can
only consider relative utility values for two or more alternatives. Further, the fact
that utility is measured on an ordinal scale implies that the choice of which
alternative's ASC to normalize to zero does not matter, as the utility levels
implied by the ASCs will simply adjust by adding or subtracting the same value to
the utility of each of the alternatives to reproduce the same utility differences. For
example, assuming the ASCs for car, bus and train are 2, 1, and 0, respectively
(where the train ASC has been set to zero), then the same differences in utility will
be maintained if we set the ASC for the bus alternative to be zero, such that the
ASCs would now be 1, 0 and −1, hence preserving the relative differences.
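The invariance of the utility differences to the choice of normalized ASC can be checked directly, using the car, bus, and train constants from the example above:

```python
# ASCs for car, bus, train with the train ASC normalized to zero...
asc_a = {"car": 2.0, "bus": 1.0, "train": 0.0}
# ...and the same model with the bus ASC normalized to zero instead.
asc_b = {"car": 1.0, "bus": 0.0, "train": -1.0}

# Every pairwise utility difference is identical under either normalization,
# which is why the choice of base alternative is arbitrary.
for i in asc_a:
    for j in asc_a:
        assert asc_a[i] - asc_a[j] == asc_b[i] - asc_b[j]
```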
Taking our earlier example, Equation (3.15) allows for ASCs for the car,
bus, and train alternatives that will be relative to the tram alternative. Note
that, as with any other parameter, constants may be also made to be generic
across two or more alternatives. Treating constants as generic parameters,
however, should only be done if, empirically, the ASCs for two or more
alternatives are found to be statistically equivalent:
Given that utility is ordinal, which alternative the ASCs are associated with
does not matter. As such, results obtained from models based on Equations
(3.16a) and (3.16b) will functionally be equivalent to one another.
Where a status quo alternative is present, the alternative will typically have
attributes that will take specific levels (e.g., the current apartment where a
person lives will have some level of rental, number of bedrooms, etc.). As such,
unlike a “no choice” alternative, status quo alternatives will typically be
described by a set of attributes. Similar to situations involving a “no choice”
alternative, the analyst is free to select which alternatives to specify ASCs for,
provided they belong to no more than J−1 of the alternatives. Hence, the status
quo alternative may either not include an ASC (as in Equation 3.17a) or may
include an ASC (as in Equation 3.17b), with the choice of where the ASCs are
placed being once again completely arbitrary:
Vns1 = Σ(k=1 to K) βk xns1k + δ1vn,
Vns2 = Σ(k=1 to K) βk xns2k + δ2vn.   (3.18)

Vns,car = β0,car + β1,car timens,car + β2,car tollns,car + β3,car parking costns,car + β4,car agen,
Vns,bus = β0,bus + β1,pt timens,bus + β2,bus farens,bus + β3,bus waiting timens,bus + β4,bus agen,
Vns,train = β0,train + β1,pt timens,train + β2,rail farens,train + β3,train season + β4,rail femalen,
Vns,tram = β1,pt timens,tram + β2,rail farens,tram + β3,tram agen + β4,rail femalen.   (3.19)
comparing across the utility functions, one can interpret the parameter as
suggesting that relative to the train alternative (where the parameter has been
constrained to be equal to zero), older decision makers have a higher utility for
the car alternative, all else being equal. Generic covariate parameters may
similarly be interpreted.
The presence of a no choice or status quo alternative offers a number of
options to the analyst where covariates are concerned. As with ASCs, the
analyst may allow a covariate to enter into all utility functions excluding the
no choice (or status quo) alternative and estimate either generic or alternative-
specific parameter estimates for it. For example, in Equation (3.20a), age now
enters into the utility functions of non-no choice alternatives with a generic
parameter. In this case, β4 will be interpreted relative to the no choice
alternative with a positive parameter, suggesting that older decision makers
are more likely to choose one of the travel modes relative to not travelling, all
else being equal, while a negative parameter suggests the opposite to be true.
Of course, age could be allowed to have alternative-specific parameter esti-
mates, in which case the degree to which age influences the preferences of the
various travel modes is assumed to be different relative to the no choice
alternative:
Vns,car = β0,car + β1,car timens,car + β2,car tollns,car + β3,car parking costns,car + β4 agen,
Vns,bus = β0,bus + β1,pt timens,bus + β2,bus farens,bus + β3,bus waiting timens,bus + β4 agen,
Vns,train = β0,train + β1,pt timens,train + β2,rail farens,train + β4 agen,
Vns,tram = β1,pt timens,tram + β2,rail farens,tram + β4 agen,
Vns,no choice = β0,no choice.   (3.20a)
In Equation (3.20b), age enters into the no choice alternative only. Here, the
parameter estimate will be interpreted relative to the non-no choice alterna-
tives such that a positive age parameter is indicative of older decision makers
having a preference not to travel via one of the travel modes, while a negative
parameter suggests that older decision makers prefer to select one of the travel
modes. Note that Equations (3.20a) and (3.20b) effectively tell the same story
and indeed, β4 from Equation (3.20a) would be the same as β4,no choice
from Equation (3.20b); however, the sign will be reversed. That is, we would
expect that β4 = −β4,no choice. This relationship will only hold, however, if the
parameter for age in Equation (3.20a) is generic across all of the non-no choice
alternatives:

Vns,car = β0,car + β1,car timens,car + β2,car tollns,car + β3,car parking costns,car,
Vns,bus = β0,bus + β1,pt timens,bus + β2,bus farens,bus + β3,bus waiting timens,bus,
Vns,train = β0,train + β1,pt timens,train + β2,rail farens,train,
Vns,tram = β1,pt timens,tram + β2,rail farens,tram,
Vns,no choice = β0,no choice + β4,no choice agen.   (3.20b)
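The equivalence between placing a generic age effect in every travel-mode utility and placing its negative in the no choice utility alone can be checked numerically. The sketch below previews the multinomial logit probability formula of Chapter 4; all utility and parameter values are hypothetical:

```python
from math import exp

def logit_shares(utilities):
    """Multinomial logit probabilities (Chapter 4): exp(V) / sum of exp(V)."""
    expu = [exp(v) for v in utilities]
    total = sum(expu)
    return [e / total for e in expu]

beta_age, age = 0.02, 40.0

# Age enters every travel-mode utility (Equation 3.20a style)...
v_a = [1.0 + beta_age * age, 0.5 + beta_age * age, -0.2]   # car, bus, no choice
# ...versus age with the opposite sign in the no choice utility only (3.20b).
v_b = [1.0, 0.5, -0.2 - beta_age * age]

# The two specifications shift every utility by the same constant, so they
# imply identical choice probabilities.
for pa, pb in zip(logit_shares(v_a), logit_shares(v_b)):
    assert abs(pa - pb) < 1e-12
```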
the variables, such as taking the log of a variable (i.e., log(xnsjk)) or squaring it
(i.e., x²nsjk) prior to it entering into the utility functions of one or more of the
alternatives. For example, consider Equation (3.21), representing the utility
functions for two alternatives, car and bus. For the car alternative, toll is
entered into the utility function both as xnsjk and x²nsjk, while time enters into
both the car and bus alternatives as the square of the original attribute. Similar
to Train (1978), the fare attribute associated with the bus alternative is divided
by income to reflect the fact that decision makers with different incomes
may have a different marginal utility associated with the fare attribute.
Further, the age variable is assumed to enter into the bus alternative as the
natural log of age:

Vns,car = β0,car + β1,car time²ns,car + β2,car tollns,car + β3,car toll²ns,car + β4,car parking costns,car,
Vns,bus = β1,pt time²ns,bus + β2,bus (farens,bus / incomen) + β3,bus waiting timens,bus + β4,bus log(agen).   (3.21)
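The transformed-attribute utilities of Equation (3.21) can be evaluated mechanically once values are substituted. All attribute levels and parameter values below are hypothetical, chosen only to illustrate the squared, ratio, and log transformations:

```python
from math import log

# Hypothetical values for one traveller, for illustration only.
time_car, toll_car, park_car = 25.0, 3.0, 12.0
time_bus, fare_bus, wait_bus = 40.0, 2.5, 8.0
income, age = 50.0, 35.0

# Hypothetical parameters; the structure follows Equation (3.21): time and
# toll enter squared, fare is divided by income, and age enters as a log.
v_car = (0.4 - 0.0004 * time_car**2 - 0.3 * toll_car
         + 0.01 * toll_car**2 - 0.05 * park_car)
v_bus = (-0.0004 * time_bus**2 - 0.8 * fare_bus / income
         - 0.02 * wait_bus + 0.1 * log(age))
```

With these numbers v_car works out to −1.26; note that the marginal effect of toll on car utility now depends on the toll level itself, because toll appears both linearly and squared.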
above example, one cannot consider the impact of fare upon the utility of bus,
without jointly considering the decision maker’s income. One common trick,
which we already introduced in our discussion surrounding Equation (3.19)
without comment, is to invoke the ceteris paribus assumption. Ceteris paribus
is a Latin phrase adopted by economists, which translates to “all else being
equal” or “all else being held constant.” For example, one could state that
holding income constant, as fare changes, utility for bus will change by β2;bus .
Likewise, one could interpret the impact of income upon utility holding fare
constant. For variables transformed using the Box–Cox transformation, one
cannot interpret the results without considering the value of the λ parameter.
This is because the value of each transformed data point will depend on the value of λ. We
return to discussing how to correctly interpret the parameters for non-linear
transformed variables in Section 3.4.5.2.
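The Box–Cox transformation referred to above is commonly written as (x^λ − 1)/λ, with log(x) as its limit when λ → 0; a small sketch under that standard definition (the helper name is ours):

```python
from math import log, isclose

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam, with log(x) as the lam -> 0 limit."""
    if lam == 0:
        return log(x)
    return (x**lam - 1.0) / lam

# lam = 1 recovers a simple shift of the raw variable...
assert box_cox(5.0, 1.0) == 4.0
# ...while lam near zero approaches the natural log, illustrating why the
# parameter estimate cannot be interpreted without knowing lam.
assert isclose(box_cox(5.0, 1e-8), log(5.0), rel_tol=1e-5)
```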
[Figure 3.9 Marginal utility for season (linear coding): utility for the train alternative plotted against the season variable coded (1) summer, (2) autumn, (3) winter, (4) spring.]
the resulting parameter estimate may become a function of how the variable
is coded. If, for example, the season variable was coded 1 = autumn (fall),
2 = summer, 3 = winter and 4 = spring, then the slope of the line, that is the
parameter estimate, may be very different, leading the analyst to a completely
different interpretation of the relationship between season and the utility
obtained for the train alternative.
Several non-linear coding schemes exist; however, for the sake of brevity we
limit ourselves to a discussion of only three such schemes: dummy
coding, effects coding, and orthogonal polynomial coding. All three
schemes allow for a non-linear relationship between the levels of attributes
and utility, and in each case involve the analyst constructing a number of new
variables from the recoded attribute. For each non-linear coding scheme, the number of
new variables created will be equivalent to the number of levels associated with
that attribute, lk, minus one. Thus, taking our previous season example,
the variable for season has four levels (i.e., lk = 4), hence, the recoding of the
season variable into any of the non-linear coding schemes will require the
creation of three new variables (i.e., lk − 1 = 3). When using dummy coding,
each newly constructed variable will be associated with one of the original
levels, taking the value 1 if that level appears in the data, or zero otherwise.
For example, let xnsjk1 ; xnsjk2 , and xnsjk3 represent the newly constructed
variables for our season example. Although any mapping is possible, let us
assume that summer is associated with xnsjk1 ; autumn (fall) with xnsjk2 ; and
winter with xnsjk3. If in the data, a choice observation was recorded in the
summer, then xnsjk1 will take the value one while xnsjk2 and xnsjk3 will both take
the values zero. If, on the other hand, a choice situation occurred in winter, then
xnsjk3 will take the value one while xnsjk1 and xnsjk2 will both take the values zero.
Dummy and effects coding differ only in how the last level, referred to as the
base level, is coded. In dummy coding, the base level receives a value of zero for
each of the newly constructed dummy coded variables. Thus, if the mode choice
was made in spring, then xnsjk1 ; xnsjk2 , and xnsjk3 will simultaneously be coded as
zero. In effects coding, however, the base level receives a value of minus one (−1)
for each of the newly constructed effects coded variables, such that xnsjk1 ; xnsjk2 ,
and xnsjk3 will simultaneously be coded as minus one if the choice situation were
recorded in spring. The dummy or effects coding schemes for up to seven levels
are given in Table 3.6, panels (a) and (b), respectively.
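The construction of dummy and effects codes for the four-level season variable can be sketched directly (the function names and the choice of spring as the base level follow the running example; the helpers themselves are ours):

```python
LEVELS = ["summer", "autumn", "winter", "spring"]  # spring is the base level

def dummy_code(season):
    """l_k - 1 = 3 dummy variables; the base level is coded all zeros."""
    return [1 if season == lvl else 0 for lvl in LEVELS[:-1]]

def effects_code(season):
    """Same construction, except the base level is coded -1 on every variable."""
    if season == LEVELS[-1]:
        return [-1, -1, -1]
    return dummy_code(season)

assert dummy_code("summer") == [1, 0, 0]
assert dummy_code("spring") == [0, 0, 0]       # base level under dummy coding
assert effects_code("spring") == [-1, -1, -1]  # base level under effects coding
```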
The reason we require only lk − 1 dummy or effects codes is the collinearity
that would arise among the resulting variables if all lk variables were constructed
and used. Typically, most people think of correlation as being bivariate, that is,
between two random variables (e.g., ρ(xnsjk, xnsjl)). Mathematically, however,
correlation can also exist between linear combinations of multiple variables (e.g.,
ρ(xnsjk, xnsjl + xnsjm)). Unfortunately, constructing all lk variables produces
perfect correlations of this kind. Consider by way of
example a three level variable where the analyst constructs three dummy
coded variables (the effects coded case is analogous). If one of the first two
variables took the value one, we would know immediately that the third column
would have to take the value zero. On the other hand, if neither of the first two
variables took the value one, then by deduction the last variable must be equal
to one. As such, for both the dummy and effects coding schemes, if we know
the values of lk − 1 variables, we also know the value of the remaining
coded variable. Indeed, one of the variables will always be redundant in terms
of the information provided.
Equation (3.24) represents the utility function for the train alternative based
on Equation (3.19) assuming that the season variable has now been either
dummy or effects coded. Note that in writing out the new utility function, we
have dropped the train subscript for the sake of expediency:
Vns = β0 + β1 timens + β2 farens + β31 xn,summer + β32 xn,autumn + β33 xn,winter + β4 femalen.   (3.24)
Table 3.6 Non-linear coding schemes

(a) Dummy coding. For an attribute with lk levels, lk − 1 variables Xnsjk1, . . ., Xnsjk(lk−1) are created; level l (l < lk) is coded Xnsjkl = 1 with zero on every other variable, and the base level lk is coded zero on every variable. For example, for lk = 4:

level   Xnsjk1  Xnsjk2  Xnsjk3
1       1       0       0
2       0       1       0
3       0       0       1
4       0       0       0

(b) Effects coding. Identical to dummy coding except that the base level lk is coded −1 on every variable. For lk = 4:

level   Xnsjk1  Xnsjk2  Xnsjk3
1       1       0       0
2       0       1       0
3       0       0       1
4       −1      −1      −1

(c) Orthogonal polynomial coding. Each variable captures a polynomial effect (linear, quadratic, . . .); the entries below give the codes assigned to levels 1, . . ., lk in order:

lk = 2: linear (−1, 1)
lk = 3: linear (−1, 0, 1); quadratic (1, −2, 1)
lk = 4: linear (−3, −1, 1, 3); quadratic (1, −1, −1, 1); cubic (−1, 3, −3, 1)
lk = 5: linear (−2, −1, 0, 1, 2); quadratic (2, −1, −2, −1, 2); cubic (−1, 2, 0, −2, 1); quartic (1, −4, 6, −4, 1)
lk = 6: linear (−5, −3, −1, 1, 3, 5); quadratic (5, −1, −4, −4, −1, 5); cubic (−5, 7, 4, −4, −7, 5); quartic (1, −3, 2, 2, −3, 1); quintic (−1, 5, −10, 10, −5, 1)
lk = 7: linear (−3, −2, −1, 0, 1, 2, 3); quadratic (5, 0, −3, −4, −3, 0, 5); cubic (−1, 1, 1, 0, −1, −1, 1); quartic (3, −7, 1, 6, 1, −7, 3); quintic (−1, 4, −5, 0, 5, −4, 1); sextic (1, −6, 15, −20, 15, −6, 1)
take the value one, while xautumn and xwinter will both take the value zero, and
ceteris paribus, the seasonal impact on the utility for the train alternative will
be equivalent to β31. Assuming that a choice observation was recorded to have
occurred in winter, however, the variable xwinter will now take the value one
while xsummer and xautumn will both simultaneously be equal to zero. Under this
scenario, the seasonal impact on the utility for the train alternative will now be
equal to β33, ceteris paribus. The substitution of values differs only for the base
level for dummy and effects coded variables. Assuming that the data have been
dummy coded, the base level associated with spring requires that xsummer,
xautumn, and xwinter be simultaneously equal to zero, and hence the seasonal
contribution to utility will be equal to zero, all else being equal.
It is important to note that, given that utility is measured on an ordinal scale, a
utility of zero does not mean that the decision maker is indifferent to or has no
preference for spring. Rather, the other parameters will be estimated relative to
this base dummy coded level. Given that the ASC of an alternative represents the
average of the unobserved or un-modeled effects for that alternative, the fact that
the base level is forced to have a marginal utility of zero has led some researchers
to suggest that the marginal utility for the base level of a dummy coded variable,
not being independently measured, is perfectly confounded with the ASC for that
alternative. As such, some researchers add the ASC to the resulting marginal
utilities when dealing with dummy coded variables.
Unlike dummy coded variables, effects coded variables do not take a zero
value for the base level, but rather minus one. Taking our season example, the
variables xsummer, xautumn, and xwinter will simultaneously be assigned values of
minus one such that the seasonal impact on the utility for the train alternative
will now be equal to −(β31 + β32 + β33), all else being equal. As such, the base
level of an effects coded variable will produce a unique utility value which is no
longer perfectly confounded with the alternative's ASC. This is one of the
reasons why many researchers prefer to use effects coding rather than dummy
coding. Table 3.7 summarizes this discussion.
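The mapping from levels to coded columns can be written out mechanically. The following minimal sketch (our own helper functions, written in Python rather than Nlogit syntax) reproduces the dummy and effects codes for the four season levels, with spring as the base:

```python
# Map a 4-level season variable (1=summer, 2=autumn, 3=winter, 4=spring)
# into dummy and effects coded columns, using spring (level 4) as the base.
def dummy_code(level, base=4, levels=(1, 2, 3, 4)):
    non_base = [l for l in levels if l != base]
    return [1 if level == l else 0 for l in non_base]

def effects_code(level, base=4, levels=(1, 2, 3, 4)):
    non_base = [l for l in levels if l != base]
    if level == base:
        return [-1] * len(non_base)   # the base level takes -1 in every column
    return [1 if level == l else 0 for l in non_base]

for season, name in zip((1, 2, 3, 4), ("summer", "autumn", "winter", "spring")):
    print(name, dummy_code(season), effects_code(season))
```

Note how the two schemes differ only in the row for the base level: all zeros under dummy coding, all minus ones under effects coding.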
A second reason that effects coding is generally preferred to dummy coding
lies in what happens when two variables are non-linearly coded. Assume now
that both the season and female variables in Equation (3.24) are dummy coded
(hence male is coded as 0 for the latter variable). As seen above, the marginal
utility for spring has been normalized to be equal to zero within the season
dummy coding scheme; however, so too has male for the gender-related
variable. Both base levels have been normalized to the same value, and both
are perfectly confounded with the model’s ASCs. As such, one cannot say
whether being male or making a choice in spring will have a greater impact on
the utility for train, as both effects have been forced to be equal. If both season
and gender are effects coded, however, then the base level for season will be
equal to −(β31 + β32 + β33), while the marginal utility associated with being
male will be −β4. As such, the base levels for both variables now take on
unique values, and one can compare −(β31 + β32 + β33) to −β4 to determine
which has the greater overall impact on utility (Table 3.8).
As an aside, in models where all variables are effects coded, the ASC of an alternative and
the effects coded variables will have a very specific interpretation. To demonstrate, we show
how the effects codes and ASCs of just such a model are estimated. Consider the example in
Table 3.8. In the table, we compute the hypothetical average utilities over a sample of
respondents for each level of an effects coded variable with four levels. Note that while in the
model, we would have only three effects codes, it is still possible to compute the average
utility over the sample for all four levels. Thus, for example, over all N respondents, the average
utility for summer might be calculated as −0.225 while the average utility over the same
respondents for the winter level is 0.875. Once computed, the average of these averages is
calculated, which we call the grand mean. The effects coded parameters are then calculated
as the difference in the average utility for that level from the grand mean. The grand mean
itself will be the ASC for the model. As such, the effects code may be interpreted as reflecting
the average difference in utility for that level relative to the average utility for all of the effects
coded variables. Note that this interpretation only works when all variables in the model are
effects coded, however.
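The calculation described in this aside can be sketched numerically. The summer and winter averages below come from the text; the autumn and spring averages are invented here purely for illustration:

```python
# Hypothetical average utilities over the sample for each season level.
# Summer (-0.225) and winter (0.875) are taken from the text; the autumn
# and spring values are assumptions made for this sketch.
avg_utility = {"summer": -0.225, "autumn": 0.10, "winter": 0.875, "spring": -0.35}

grand_mean = sum(avg_utility.values()) / len(avg_utility)   # this becomes the ASC
effects = {level: u - grand_mean for level, u in avg_utility.items()}

print(round(grand_mean, 4))                       # the ASC
print({l: round(b, 4) for l, b in effects.items()})
# The effects coded parameters are deviations from the grand mean,
# so they sum to zero by construction.
print(round(sum(effects.values()), 10))
```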
[Figure: utility (train) for each level of the effects coded variable; vertical axis from −0.8 to 0.8]
For both the dummy and effects coding schemes, the choice of which level
to set as the base is completely arbitrary. This is because utility is measured on
an ordinal scale, and the parameters will be estimated relative to whatever
level is set as the base. Further, all else being equal, the choice of coding scheme
should not matter in terms of the final model results, only in how the results
are interpreted. This is because moving from dummy coding to effects coding
(or vice versa) will, if done correctly, lead simply to a shift of the parameter
estimates by precisely the value of the effects coded base level, −Σ(l=1 to Lk−1) βkl,
such that the predicted utilities for the alternatives will remain exactly the
same. To demonstrate, consider an example where the season variable is
effects coded and produces parameter estimates for summer, autumn (fall),
and winter of −0.8, 0.3, and 0.6, respectively.
The base level associated with spring will be −0.1 (i.e., −(−0.8 + 0.3 + 0.6)). If
we were to use dummy codes as opposed to effects codes, the parameters for
summer, autumn, and winter would rescale by −0.1, to become −0.7, 0.4, and
0.7, respectively. Table 3.9 and Figure 3.10 show this rescaling.
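This rescaling is easy to verify numerically, using the parameter values from the example above:

```python
# Effects coded parameter estimates from the example in the text.
effects = {"summer": -0.8, "autumn": 0.3, "winter": 0.6}

# The effects coded base level (spring) is minus the sum of the others.
spring = -sum(effects.values())
print(round(spring, 10))          # -0.1

# Switching to dummy coding shifts every parameter by the spring value,
# which is absorbed into the ASC; spring itself becomes the zero base.
dummy = {level: round(b - spring, 10) for level, b in effects.items()}
print(dummy)                      # {'summer': -0.7, 'autumn': 0.4, 'winter': 0.7}

# Utility differences between any two seasons are unchanged.
assert abs((effects["winter"] - effects["summer"]) -
           (dummy["winter"] - dummy["summer"])) < 1e-9
```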
Table 3.10 Dummy and effects coding with a status quo alternative: 1
Situation (s)  Policy (j)  Fish saved  Constant  Dummy coding: x10 x20 x30  Effects coding: x10 x20 x30
1 A 10 1 1 0 0 1 0 0
1 B 20 1 0 1 0 0 1 0
1 C 10 1 1 0 0 1 0 0
1 Do nothing 0 0 0 0 0 −1 −1 −1
2 A 30 1 0 0 1 0 0 1
2 B 10 1 1 0 0 1 0 0
2 C 20 1 0 1 0 0 1 0
2 Do nothing 0 0 0 0 0 −1 −1 −1
As an aside, one instance when the parameter estimates for dummy and effects coding will
not rescale to reproduce the same result is when the base level of a variable is associated
with only one particular alternative, typically a status quo alternative. To demonstrate, taking
an environmental economics example, assume that a researcher is investigating potential
policies related to improving the quality of water for a particular river. One attribute is the
number of fish each policy will save, where for the status quo alternative representing “doing
nothing,” the value for this attribute will be zero. Assuming that each policy will save at least
some fish, the attribute associated with each policy will never take the value zero. In such
cases, many researchers in setting up the data will assign as the base level the attribute
level with the lowest value, which in this case would be saving zero fish. To demonstrate
how the data might look, consider Table 3.10.
In Table 3.10, we have used a data format whereby each row of data
represents an alternative, and each column a variable. Groups of rows repre-
sent a choice observation. For example, assuming a decision maker is faced
with the choice between three policy alternatives, A, B, or C, and the choice
not to do anything, then each block of four rows will represent one choice
observation. This particular data format convention is used by software such
as Nlogit and is further discussed in Chapter 10. Within the table, choice
situations consisting of four alternatives are shown. Assuming that the num-
ber of fish saved attribute takes four levels (0, 10, 20, and 30), both dummy and
effects coding will require the creation of three additional variables. Note that
in Table 3.10, we have also included a column for model constants (ASCs)
following the discussion presented after Equation (3.15).
Coding the data as shown causes a number of problems when attempting to
estimate the model. First, x10, x20, and x30 will induce near perfect
correlations within the data. To see why, consider any two of the three dummy
or effects coded variables. If either of these two columns takes the value one,
then we know that the last column must be zero (i.e., if either x10 or x20 = 1,
then x30 must equal 0). If, on the other hand, neither of the two columns takes
the value one, then the last column must be equal to one. As such, we know the
value of the last column by knowing the value of the other columns (this is the
same reason we only need lk−1 columns when we use dummy or effects
coding). Hence, the information for any one dummy or effects coded variable
is contained within the other variables, and in this way at least one variable is
mathematically redundant. Second, the base level for both the dummy and
effects coded variables is perfectly confounded with an alternative, in this
case the no choice alternative. In the previous example used to describe
dummy and effects coding, the season in which a choice observation was
recorded was not specific to any particular alternative, and over the data set
will hopefully be associated with each alternative at some point in time (the
fact that we used the season dummy for only the train alternative is beside the
point; in some observations, the choice observation will be for summer, in
others winter, etc.). Whereas the correlation resulting in the need for only lk−1
columns results from correlations caused by linear combination of the col-
umns, now we have correlation in both the columns and rows of the data. This
latter point means that the base level of the dummy or effects code will act like
a model ASC as it is a constant for that alternative for all choice observations
within the data. As such, one can no longer estimate J−1 ASCs, as the base
level of the dummy or effects coded variable will not only represent the base
level of the variable but also a constant for that alternative. Hence, whereas
before one could have estimated an ASC for policies A, B, and C after
normalizing the ASC for the no choice alternative to zero, now we would be
able to estimate ASCs for only two of the policies as the third ASC is the base
level of the dummy or effects coded variable associated with the no choice
alternative. Further, as an ASC represents the mean of the unobserved effects
for that alternative, the difference in the coding of the base level between the
dummy and effects coding schemes applies not only with that variable as
before, but now to the entire alternative. For this reason, the two coding
schemes will produce different model results!
Cooper et al. (2012) propose using a hybrid coding scheme for such data,
combining both dummy and effects codes. This coding scheme involves
effects coding the values not perfectly correlated with an alternative (using
our example, this would be for levels 10, 20, and 30) while setting the values for
the correlated level to be zero. Note that it is necessary to use only lk−1 of the
Table 3.11 Dummy and effects coding with a status quo alternative: 2
Situation (s) Policy (j) Fish saved (00) Constant x10 x20
1 A 10 1 1 0
1 B 20 1 0 1
1 C 10 1 1 0
1 Do nothing 0 0 0 0
2 A 30 1 −1 −1
2 B 10 1 1 0
2 C 20 1 0 1
2 Do nothing 0 0 0 0
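The hybrid scheme in Table 3.11 can be generated mechanically. The helper below is our own illustration of the idea, not code from Cooper et al. (2012): the non-zero levels are effects coded while the status quo level receives all zeros:

```python
# Hybrid coding for the fish-saved attribute (levels 0, 10, 20, 30),
# where zero fish saved occurs only for the status quo alternative.
# This helper is an illustrative assumption, not from Cooper et al. (2012).
def hybrid_code(level):
    if level == 0:
        return [0, 0]            # status quo level: all zeros, as in dummy coding
    return {10: [1, 0],          # non-status-quo levels are effects coded,
            20: [0, 1],          # with level 30 acting as the effects coded base
            30: [-1, -1]}[level]

# Reproduce the coded columns for the two choice situations in Table 3.11.
for fish in (10, 20, 10, 0, 30, 10, 20, 0):
    print(fish, hybrid_code(fish))
```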
Here, however, the levels the variable can take are discrete, and the polynomial
transformation is achieved via how the variable is recoded. Hence, the inter-
pretation of the results obtained from models estimated on data using an
orthogonal polynomial coding scheme do not require that one apply a power
transformation to each of the variables, as this is already accounted for in the
coding. Hence, the effect that a particular level will have on the utility of an
alternative is given as Equation (3.27) below.
Table 3.12 Dummy, effects, and orthogonal polynomial coding correlation comparisons

Original code | Dummy coding: Xnsjk1 Xnsjk2 Xnsjk3 | Effects coding: Xnsjk1 Xnsjk2 Xnsjk3 | Orthogonal polynomial coding: Xnsjk1 Xnsjk2 Xnsjk3
1 (Summer) | 1 0 0 | 1 0 0 | −3 1 −1
2 (Autumn) | 0 1 0 | 0 1 0 | −1 −1 3
3 (Winter) | 0 0 1 | 0 0 1 | 1 −1 −3
4 (Spring) | 0 0 0 | −1 −1 −1 | 3 1 1

Correlation structure (within each scheme):
Dummy coding: each pairwise correlation between the constructed columns = −0.33
Effects coding: each pairwise correlation = 0.5
Orthogonal polynomial coding: each pairwise correlation = 0
At the base of Table 3.12, we present the correlation structures inferred for
each of the three different non-linear coding schemes. As can be seen, dummy
and effects codes induce correlations between each of the constructed vari-
ables, while orthogonal polynomial coding does not. It is for this reason that
orthogonal polynomial coding is generally preferred (see, e.g., Louviere et al.
2000, 269, who recommend that orthogonal polynomial coding be used
wherever possible); however, the interpretation of orthogonal polynomial
coding may be somewhat more difficult to describe to those less familiar
with such a coding scheme.
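The correlation structures at the base of Table 3.12 can be reproduced directly from the codes themselves. A small numpy sketch:

```python
import numpy as np

# The three coding schemes for a 4-level variable (rows = levels 1..4).
dummy   = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]])
effects = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, -1, -1]])
ortho   = np.array([[-3, 1, -1], [-1, -1, 3], [1, -1, -3], [3, 1, 1]])

for name, X in (("dummy", dummy), ("effects", effects), ("orthogonal", ortho)):
    # Correlation between the first two constructed columns.
    r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
    print(name, round(r, 2))
# dummy -0.33, effects 0.5, orthogonal 0.0
```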
To finish our discussion, we present an example of how the data might
appear under all three coding schemes. In doing so, we use −999 to represent
missing data (the value used in Nlogit5). In our example (Table 3.14), the
season variable only applies to the train alternative, and hence we have used
the missing value code to represent this (although in reality, we would not do
this, as we may wish to apply the season variable to another alternative in
another model).
[Figure 3.10 Examples (1) to (4): utility (train) plotted against the four season levels, (1) Summer, (2) Autumn, (3) Winter, (4) Spring]
Table 3.14 Example data set up for dummy, effects, and orthogonal polynomial coding
n  s  Mode (j)  Season | Dummy coding: Xnsjk1 Xnsjk2 Xnsjk3 | Effects coding: Xnsjk1 Xnsjk2 Xnsjk3 | Orthogonal polynomial coding: Xnsjk1 Xnsjk2 Xnsjk3
1 1 car 1 −999 −999 −999 −999 −999 −999 −999 −999 −999
1 1 bus 1 −999 −999 −999 −999 −999 −999 −999 −999 −999
1 1 train 1 1 0 0 1 0 0 −3 1 −1
1 1 tram 1 −999 −999 −999 −999 −999 −999 −999 −999 −999
1 2 car 3 −999 −999 −999 −999 −999 −999 −999 −999 −999
1 2 bus 3 −999 −999 −999 −999 −999 −999 −999 −999 −999
1 2 train 3 0 0 1 0 0 1 1 −1 −3
1 2 tram 3 −999 −999 −999 −999 −999 −999 −999 −999 −999
2 1 car 2 −999 −999 −999 −999 −999 −999 −999 −999 −999
2 1 bus 2 −999 −999 −999 −999 −999 −999 −999 −999 −999
2 1 train 2 0 1 0 0 1 0 −1 −1 3
2 1 tram 2 −999 −999 −999 −999 −999 −999 −999 −999 −999
2 2 car 4 −999 −999 −999 −999 −999 −999 −999 −999 −999
2 2 bus 4 −999 −999 −999 −999 −999 −999 −999 −999 −999
2 2 train 4 0 0 0 −1 −1 −1 3 1 1
2 2 tram 4 −999 −999 −999 −999 −999 −999 −999 −999 −999
βk = G(xnsjl, βl).   (3.27)

Vnsj = βl Σ(k=1 to K) βjk xnsjk = Σ(k=1 to K) βl βjk xnsjk.   (3.28)
The marginal impact on utility for a change in xnsjk will be βl βjk = θjk; however,
as βl is common to all θjk, the resulting estimates are likely to be correlated, not
only within each alternative but also between alternatives (Hess and Rose 2012).
Models allowing for non-linearity in the parameters of the utility specification
have also been applied to data dealing with risky choices (e.g., Anderson et al.
2012; Hensher et al. 2011; and Chapter 20). “Risk” in such studies has typically
been defined in terms of uncertainty in observing what outcome is likely to
occur prior to the choice being made. In defining risk, each potential outcome
is assumed to occur with some probability, and decision makers are assumed
to know both the potential outcomes and the probabilities with which they
will actually occur. In reality, it is a little more complicated than this:
research has shown that decision makers are likely to use subjective rather
than objective probabilities when making decisions. The objective probability
is the actual mathematical probability associated with an outcome, while the
subjective probability is the decision maker's perception of that probability,
which will typically involve either an over- or under-weighting of the
objective probability. For example, prior to driving to work, a commuter will not know
precisely how long the trip will take due to unknowns such as the degree of
traffic congestion, how many traffic lights (or robots for our South African
readers) they will be stopped by, the weather, etc. Nevertheless, based on
previous experience, the same car commuter will likely have an expectation as
to the minimum and maximum amount of time the trip might take, as well as
an expectation as to the most likely amount of time (probably an average of
the travel times over repeated similar trips). The car commuter is also likely to
have some idea about the likelihood of each outcome actually occurring (e.g.,
the shortest and longest travel times are more likely to occur with much less
frequency than the average travel time). Based on this same scenario, Hensher
et al. (2011) modeled choice data in which car commuters were presented with
competing routes described by a range of travel times, each with an associated
probability of occurring. Let pnsoj be the probability that decision maker n in
choice situation s will experience for route j a travel time of xnsoj. Hensher et al.
(2011) parameterized their model such that:

κnsj = βl [ Σ(o=1 to O) ω(pnsoj) xnsoj^(1−α) / (1−α) ],   (3.29)
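A numerical sketch in the spirit of Equation (3.29) follows. The parameter values and the power probability weighting function ω(p) = p^γ are assumptions chosen purely for illustration; Hensher et al. (2011) should be consulted for the functional forms actually estimated:

```python
# Illustrative expected-utility style valuation of one route with three
# possible travel times, each with a stated probability of occurring.
# beta, alpha, gamma and omega() are assumptions for this sketch only.
times = [20.0, 30.0, 45.0]          # minutes
probs = [0.2, 0.6, 0.2]             # objective probabilities (sum to one)

beta, alpha, gamma = -0.05, 0.4, 0.9

def omega(p):
    return p ** gamma               # a simple probability weighting function

kappa = beta * sum(omega(p) * t ** (1 - alpha) / (1 - alpha)
                   for p, t in zip(probs, times))
print(kappa)                        # negative: longer times lower utility
```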
Despite the name, discrete choice modeling need not be about choices, at least
in the sense that most people think about them. The econometric models and
underlying economic or psychological theories may be adapted to fit any
discrete outcome. For example, Jones and Hensher (2004) apply discrete
choice models to model corporate bankruptcies, insolvencies, and takeovers.
Likewise, the substantive example used in Section 3.2 dealt with election
outcomes (i.e., voting), which at the aggregate level represent disaggregate
choices that, however, should be thought of differently. Although the majority
of this chapter has dealt with the issue of utility, as discussed in Section 3.2,
discrete choice models relate, via a link function, some latent variable to the
observed outcomes, and it just so happens that the latent variable is called
utility when the outcomes are disaggregate choices.
Whether dealing with choices or otherwise, this chapter set out the basis for
modeling the observed component of discrete choice models. The chapter
sought to explain the alternatives available to the analyst when writing out the
utility functions of discrete choice models, as well as the various interpreta-
tions that each approach has. The chapter has made reference to material to
follow. Although we have introduced probit and logit models, link functions,
choice probabilities, etc., the next few chapters will go into far more detail
about the different econometric models and how they are estimated. In
particular, in outlining the various econometric models over the next few
chapters, a significant amount of time will be devoted to the unobserved
effects of discrete choice models and the impact that different assumptions
about them have on discrete choice models.
REGRESS;Lhs=Party;Rhs=ONE,min$
-----------------------------------------------------------------------------------------------------
Ordinary least squares regression . . . . . . . . . . . .
LHS=PARTY Mean = .46279
Standard deviation = .49919
---------- No. of observations = 430 DegFreedom Mean square
Regression Sum of Squares = 26.0374 1 26.03737
Residual Sum of Squares = 80.8673 428 .18894
Total Sum of Squares = 106.905 429 .24919
---------- Standard error of e = .43467 Root MSE .43366
Fit R-squared = .24356 R-bar squared .24179
Model test F[ 1, 428] = 137.80596 Prob F > F* .00000
Model was estimated on Dec 02, 2013 at 05:13:26 PM
-----------+--------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
PARTY| Coefficient Error t |t|>T* Interval
-----------+---------------------------------------------------------------------------------------
Constant| .09081** .03799 2.39 .0173 .01634 .16528
MIN| 1.43778*** .12248 11.74 .0000 1.19772 1.67783
-----------+----------------------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------------------
PROBIT;Lhs=Party;Rhs=ONE,min$
Normal exit: 5 iterations. Status=0, F= 235.3260
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable PARTY
Log likelihood function -235.32597
Restricted log likelihood -296.86149
Chi squared [ 1 d.f.] 123.07103
Significance level .00000
McFadden Pseudo R-squared .2072870
Estimation based on N = 430, K = 2
Inf.Cr.AIC = 474.7 AIC/N = 1.104
Model estimated: Dec 02, 2013, 17:13:26
Hosmer-Lemeshow chi-squared = 17.71639
P-value= .02346 with deg.fr. = 8
-----------+--------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
PARTY| Coefficient Error z |z|>Z* Interval
-----------+---------------------------------------------------------------------------------------
|Index function for probability
Constant| -1.30938*** .13944 -9.39 .0000 -1.58267 -1.03608
MIN| 4.92644*** .53252 9.25 .0000 3.88272 5.97017
-----------+---------------------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------------------
4 Families of discrete choice models

4.1 Introduction
Unsj = σn Σ(k=1 to K) βnk xnsjk + εnsj,   (4.2)
where βnk represents the marginal utility or parameter weight associated with
attribute k for respondent n. The unobserved component, εnsj, is often
assumed to be an independently and identically distributed (IID) extreme
value type 1 (EV1) random variable.
Unsj = Σ(k=1 to K) βnk xnsjk + (εnsj/σn).   (4.3)
It can be seen that the variance of εnsj is inversely related to the magnitude of
σn Σ(k=1 to K) βnk xnsjk via σn. If εnsj has an EV1 distribution with this scale para-
meter, then Var(εnsj/σn) = π2/6; if, instead, εnsj is normally distributed, then
Var(εnsj/σn) = 1.
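The EV1 variance claim can be checked by simulation; numpy's `gumbel` generator draws standard EV1 variates:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.gumbel(loc=0.0, scale=1.0, size=1_000_000)   # standard EV1 draws

# The variance of a standard EV1 variate is pi^2 / 6, approximately 1.645.
print(eps.var(), np.pi ** 2 / 6)
```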
x in Equation (4.2) may also contain up to J−1 alternative-specific constants
(ASCs) that capture the residual mean influences of the unobserved effects on
choice associated with their respective alternatives. This x takes the value 1 for
the alternative under consideration or zero otherwise. The utility specification
in Equation (4.2) is flexible in that it allows for the possibility that different
respondents may have different marginal utilities for each attribute being
modeled. In practice, it is not generally feasible to estimate individual specific
parameter weights. As such, it is common to estimate parameter weights for
the population that vary randomly around a mean, such that:
For a given choice situation, the analyst will rely on data on the attributes
describing the alternatives faced by a decision maker, covariates representing
the characteristics of the decision maker, and the context in which the decision
is being made. Also required are the observed choice outcomes. These data are
then used to form utility specifications that explain the observed choice out-
comes. The analyst, however, will never observe the actual utility that the
decision maker holds towards each of the alternatives. Utility is a latent
construct known only (even if subconsciously) to the decision maker. To
further complicate matters, the analyst will rarely, if ever, observe all of the
variables that lead to each decision maker’s level of utility for each of the
alternatives, labeled j. In part, this might be the result of failing to ask for all
relevant information from each decision maker, or that the decision makers
themselves cannot relate fully the relevant information to the analyst. As such,
the utility, Unsj will never actually equal the model specified by the analyst,
Vnsj. To reconcile these two utility constructs, an additional term is required.
This additional term, first introduced in Equation (3.7), is given as Equation
(4.5), where Unsj equals Vnsj + εnsj, such that εnsj captures the factors that affect
utility but are not measured within Vnsj and not directly observable by the
analyst:
Assuming that each decision maker acts as a utility maximiser, they will
choose the alternative for which they will derive the largest amount of utility.
Given that the specific value εnsj is unknown for all n, s, and j, the total utility,
Unsj, for each decision maker will also be unknown. As such, while the analyst
may be able to calculate the amount of utility for each alternative associated
with the observed component of the model, it remains impossible to calculate
precisely the overall utility that each decision maker will hold for any given
alternative. (That is, even assuming that Vnsj is observable. Typically, in the
context of a model, Vnsj will also involve unknown parameters that must be
estimated using the observed data.) To demonstrate, consider a scenario in
which decision maker n is faced with a choice between four possible alter-
natives, car, bus, train, and tram. Assume further that based only on their
relative times and costs, the decision maker would value the four choices, in
relative terms, at 2, 3, –2 and 0 for Vnj, for car, bus, train, and tram,
respectively. The utility functions for this example are given as Equation (4.6):
Un,car = 2 + εn,car,
Un,bus = 3 + εn,bus,
Un,train = −2 + εn,train,
Un,tram = 0 + εn,tram.   (4.6)
Given this scenario, it is tempting to state that the decision maker would select
the bus alternative; however, whether they actually choose the bus will depend
on the values they hold for εn,car, εn,bus, εn,train, and εn,tram. For example, suppose
for our hypothetical decision maker, εn,car = −1, εn,bus = −3, εn,train = 5, and
εn,tram = 0. Then, the alternative with the highest total utility will be train.
Despite all other modes of transport offering greater amounts of utility in terms
of what was modeled, the train alternative will be chosen. The chooser has a
strong preference for rail travel. Perhaps they are a train buff, a variable not
included in the observed component of the model.
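The arithmetic of this example can be laid out directly, combining the observed utilities in Equation (4.6) with the hypothetical error draws above:

```python
# Observed utilities from Equation (4.6) and the hypothetical draws of the
# unobserved components used in the text.
V   = {"car": 2, "bus": 3, "train": -2, "tram": 0}
eps = {"car": -1, "bus": -3, "train": 5, "tram": 0}

U = {j: V[j] + eps[j] for j in V}       # total utility for each mode
print(U)                                # {'car': 1, 'bus': 0, 'train': 3, 'tram': 0}
print(max(U, key=U.get))                # train is chosen despite V favouring bus
```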
In order to make any progress at modeling choices, it is necessary to make a
number of assumptions about the unobserved components of utility. The most
common assumption is that for each alternative, j, εnsj, will be randomly
distributed with some density, f ðensj Þ, over decision makers, n, and choice
situations, s. Further assumptions about the specific density specification
adopted for the unobserved effects, εnsj (e.g., the unobserved effects are drawn
from a multivariate normal distribution) lead to alternate econometric models.
Assuming that there exists some joint density such that ε_ns = ⟨ε_ns1, . . ., ε_nsJ⟩
represents a vector of the J unobserved effects for the full choice
set, it becomes possible to make probabilistic statements about the choices
made by the decision makers. Specifically, the probability that respondent n in
choice situation s will select alternative j is given as the probability that
outcome j will have the maximum utility:
Equation (4.8) reflects the probability that the differences in the random
terms, ε_nsi − ε_nsj, will be less than the differences in the observed components
of utility, V_nsj − V_nsi.
The fact that discrete choice models are probabilistic in nature is important
for a number of reasons. The probabilities described in Equations (4.7) and
(4.8) represent the translation between the categorical dependent variable and
the latent utility. The properties of probabilities provide a natural link between
the utility functions of the J alternatives. While it might appear that the utility
specifications are independent of each other (indeed, one could think of them
as separate regression equations), the fact that for a mutually exclusive and
exhaustive set of alternatives, the probability of any one alternative being
selected is constrained to be between zero and one, and the sum of the
probabilities for all alternatives must sum to one, means that the utilities are
related via their associated probabilities. Thus, if the utility for one alternative
increases, ceteris paribus, the probability that that alternative will be selected
increases and, correspondingly, the probabilities that other alternatives will be
selected will fall, even though the utilities for the remaining alternatives do not
change. It is the choice probabilities that link the separate utility functions
together in one single model.
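This linkage can be illustrated numerically. The sketch below uses the logit formula (introduced later in this chapter) purely as a convenient closed-form mapping from utilities to probabilities; the utility values are arbitrary:

```python
import math

def probs(V):
    """Map a utility vector to choice probabilities that sum to one.

    The logit (MNL) form is used here purely as a convenient closed-form
    link between utilities and probabilities.
    """
    expV = [math.exp(v) for v in V]
    s = sum(expV)
    return [e / s for e in expV]

p0 = probs([2.0, 3.0, -2.0, 0.0])
p1 = probs([2.0, 4.0, -2.0, 0.0])  # raise the utility of alternative 2 only

print(round(sum(p0), 10))                      # 1.0: probabilities sum to one
print(p1[1] > p0[1])                           # True: its own probability rises
print(all(p1[i] < p0[i] for i in (0, 2, 3)))   # True: all other probabilities fall
```

Raising one alternative's utility, ceteris paribus, necessarily draws probability away from the others, exactly as described above.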
The relationship between the modeled utility and the choice probabilities is
non-linear. In Chapter 3, we demonstrated that choice probabilities are not
linear in shape over changes in x. A unit change in an attribute x, will result in
a different change in the predicted choice probabilities given different initial
values of x. To illustrate, consider the change in probability that a voting
district will have elected a Democratic Party member given an increase from
Choice models are built around two families of distributions for the random
component of the utility function. Most recent studies rely on the Gumbel, or
Type 1 Extreme value distribution (EV1) discussed above. This model was
used in the earliest developments and remains the basic framework of choice.
Contemporary studies generally build outward from this essential model. The
alternative is the multivariate normal. The normal distribution is more natural
where |Ω| is the determinant of Ωe. Figure 4.1 shows a plot for a multivariate
normal distribution assuming two alternatives.
The symmetric covariance matrix Ωe will have J variance terms (i.e., σii ∀i)
and ((J−1)J)/2 unique covariance terms (i.e., σij ∀i ≠ j), with a total of ((J+1)J)/2
distinct elements. For example, if J equals 5, Ωe will have 5 variance terms and
10 unique covariance terms, giving 15 distinct elements.
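These element counts are easy to verify with a short, illustrative calculation:

```python
def cov_counts(J):
    """Count distinct elements of a symmetric J x J covariance matrix."""
    variances = J                      # one variance per alternative
    covariances = (J - 1) * J // 2     # unique off-diagonal terms
    total = (J + 1) * J // 2           # all distinct elements
    return variances, covariances, total

print(cov_counts(5))  # (5, 10, 15)
```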
Figure 4.1 Multivariate Normal distribution for two alternatives (probability density plotted over ε1 and ε2)
(U_nsj − U_nsi)/τ = (V_nsj − V_nsi)/τ + (ε_nsj − ε_nsi)/τ. (4.14)
But, the scaling does not affect the comparison. If j were the most preferred
alternative without the scaling of the utilities, it is still the preferred alternative
after the scaling. The empirical implication is that even after accommodating
the idea that, for modeling purposes, utilities are considered only relative to
each other, we must also accommodate this scaling ambiguity. Once again,
there are many ways to do this by modifying the covariance matrix so that, in
terms of observable information, the matrix is “observable” (that is, estimable).
Again, a straightforward way to proceed is to normalize one more of the
remaining variances to one, and implicitly scale the entire remaining matrix.
This would appear as follows:
ε_ns ~ N[ (0, 0, 0, 0, 0)′,
  [ λ11 λ12 λ13 λ14 0 ]
  [ λ12 λ22 λ23 λ24 0 ]
  [ λ13 λ23 λ33 λ34 0 ]
  [ λ14 λ24 λ34 1   0 ]
  [ 0   0   0   0   1 ] ], (4.15)
so that throughout the matrix, λii = θii/θ44 and λij = θij/√θ44. That is, the
normalization process impacts upon the unobserved effects, both variance
and covariances related to the J alternatives. We emphasize there are an
infinite number of possible ways to normalize and scale the covariance
matrix to satisfy the requirements. (See the work of Moshe Ben-Akiva and
Joan Walker for studies on different ways that the covariance matrix may
be normalized in models based on the normal distribution.) The empirical
implication is that the normalization is necessary for “identification.” We
hope to learn about Ω from observed data. But the observed data on
choices only contain a certain amount of information, and no more, without
further assumptions by the analyst. Choice data on J = 5 outcomes
only provide sufficient information to analyze a matrix such as that in
Equation (4.13), or a transformation of such a matrix. Second, we must
note that normalization and scaling have implications for the deterministic
parts of utility, Vnsi, as well as the unobserved part. To see this at work,
reconsider the original unnormalized, unscaled model:
We now know, based on this discussion, that given the observed choice data
we cannot actually learn about β itself. Because of the need for a scale
normalization, in terms of our five-choice example, all we can learn about is a
scaled vector, β/√θ44.
Ωe = [ 1   λ12 0 0 ]
     [ λ12 1   0 0 ]
     [ 0   0   1 0 ]
     [ 0   0   0 1 ]. (4.17)
U_nj = V_nj + ε_nj, j = 1, . . ., 4, where
(ε_n1, ε_n2, ε_n3, ε_n4)′ ~ N[ (0, 0, 0, 0)′,
  [ λ11 λ21 λ31 0 ]
  [ λ21 λ22 λ32 0 ]
  [ λ31 λ32 1   0 ]
  [ 0   0   0   1 ] ]. (4.18)
Consider the probability that the individual chooses alternative 1. For con-
venience at this point, we will drop the observation subscript. This means
that:
The three random terms, (w12, w13, w14), are linear combinations of joint
normally distributed variables, so they are joint normally distributed. The
means are obviously (0,0,0). The 3×3 covariance matrix is:
where ϕ3(. . .) denotes the trivariate normal density in Equation (4.20) with
means zero and covariance matrix Σ[1]. The practical complication is in
computing the trivariate normal integral; no closed-form function exists for
evaluating it. The GHK simulator, invented in the early 1990s, is a method (an
algorithm) that is used to approximate these integrals using Monte Carlo
simulation. The calculation is approximate, and extremely computer inten-
sive (time consuming) even with modern technology. Simulation-based
computations and the GHK simulator are developed in further detail in
Chapter 5.
Let the variance of the unobserved effect for alternative j be Var(ε_nsj) = σ_j².
This value equals π²/(6λ_j²) for the unstandardized Type 1 GEV distribution,
where λ_j is the scale parameter mentioned in Section 4.2. As before, because only
rankings of alternatives, and not actual utilities, can be observed, it is neces-
sary to normalize the scale of the utility function – we do not have information
for estimation of an unknown scale parameter. We do this by scaling the
utility by λj as:
The standardized GEV1 random variable in (4.22) that does not have a
separate scale parameter (i.e., when the scale equals one) has variance
π²/6. Thus, Var(λ_j ε_nsj) = π²/6. This is in contrast to the normalized probit
model, in which the standardized random effect has Var[ε_nsj] = 1. The
standardized GEV1 variable also has a non-zero mean, E[ε_nsj] = 0.57721 (the
constant 0.57721 is the Euler–Mascheroni constant, γ = −Γ′(1)).
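These two moments can be checked by simulation; a standard Gumbel (EV1) draw can be generated as −ln(−ln u) for u uniform on (0, 1). This is an illustrative sketch, not part of the book's estimation code:

```python
import math
import random

# Standard Gumbel (EV1) draws via the inverse-CDF transform e = -ln(-ln(u)).
random.seed(42)
draws = [-math.log(-math.log(random.random())) for _ in range(100_000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)

print(abs(mean - 0.57721) < 0.05)         # True: mean near the Euler-Mascheroni constant
print(abs(var - math.pi ** 2 / 6) < 0.1)  # True: variance near pi^2/6 = 1.64493
```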
[Figure 4.2 Type 1 Extreme value (GEV1) distribution for two alternatives: probability density plotted over ε1 and ε2]
Logit models are usually specified under the general assumption that the
variances of the unobserved effects are the same for all alternatives j.
Consistent with the notation used earlier, where ε_ns = ⟨ε_ns1, . . ., ε_nsJ⟩
represents the vector of unobserved effects, assuming J = 5, we can express
Equation (4.22) as Equation (4.23):
ε_ns ~ GEV1[ (0.57721/λ1, 0.57721/λ2, 0.57721/λ3, 0.57721/λ4, 0.57721/λ5)′,
  [ σ11 σ12 σ13 σ14 σ15 ]
  [ σ12 σ22 σ23 σ24 σ25 ]
  [ σ13 σ23 σ33 σ34 σ35 ]
  [ σ14 σ24 σ34 σ44 σ45 ]
  [ σ15 σ25 σ35 σ45 σ55 ] ]. (4.23)
Figure 4.2 shows a plot for a Type 1 GEV distribution assuming two alter-
natives, setting λj = 1,∀j. Note that we now recognize that different alternatives
may have different scale parameters via the inclusion of the subscript j.
The logit model requires the same level and scale normalizations as the
probit model in order to be able to estimate the parameters of interest. The
approach to ensuring identification of the logit model, however, is different
from how it is handled within the probit model framework. Although there
exist different types of logit models (e.g., multinomial logit, nested logit, mixed
multinomial logit), logit models are generally derived under the assumption
[Figure 4.3 Probability density f(ε) of a univariate Type 1 EV distribution plotted for λ = 1.0, 1.5, and 0.5]

Lambda (λj)    1.0        1.5        0.5
Mean           0.57721    0.38481    1.15442
Variance       1.64493    0.73108    6.57974
Std Dev.       1.28255    0.85503    2.56510
that the variances of the unobserved effects are the same for all alternatives j.
The assumption that the variances of the unobserved effects are constant
across alternatives j requires some form of normalization of σ_j².
As an aside, given that only differences in utility matter, the fact that the mean of the
Type 1 GEV distribution is not zero is of no consequence: the difference in means will
be zero for any pair of alternatives, i and j, assuming λ_j = λ_i. Nevertheless, the
distribution clearly depends on the scale parameter, as demonstrated in Figure 4.3,
where the probability density function of a univariate Type 1 EV distribution is plotted
for three different values of λ_j (equivalently, of σ_j²).
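The values tabulated for Figure 4.3 follow directly from the EV1 formulas mean = 0.57721/λ_j and variance = π²/(6λ_j²); a short illustrative check:

```python
import math

GAMMA = 0.57721  # Euler-Mascheroni constant, to 5 decimal places

def ev1_moments(lam):
    """Mean, variance, and standard deviation of a Type 1 EV variable with scale lam."""
    mean = GAMMA / lam
    var = math.pi ** 2 / (6 * lam ** 2)
    return round(mean, 5), round(var, 5), round(math.sqrt(var), 5)

for lam in (1.0, 1.5, 0.5):
    print(lam, ev1_moments(lam))
```

The output reproduces the tabulated means, variances, and standard deviations for λ = 1.0, 1.5, and 0.5.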
The most common normalization is to set Var(ε_nsj) = π²/6 = 1.6449, which is
equivalent to saying that σ11 = σ22 = . . . = σ55 in Equation (4.23). Note that
normalizing the variance of the unobserved effects is therefore equivalent to
normalizing the scale of utility.
Further restrictions or normalizations are imposed, depending on the
specific logit model being estimated. The simplest logit model, the multi-
nomial logit (MNL) model, restricts all covariances to be zero such that
Equation (4.23) becomes:
ε_ns ~ IID EV1[ (0.57721, 0.57721, 0.57721, 0.57721, 0.57721)′,
  [ π²/6 0    0    0    0    ]
  [ 0    π²/6 0    0    0    ]
  [ 0    0    π²/6 0    0    ]
  [ 0    0    0    π²/6 0    ]
  [ 0    0    0    0    π²/6 ] ]. (4.24)
IID is used to indicate that the random variables are independently and
identically distributed. Here, “independent” implies zero covariances or
correlations between the J unobserved effects, while “identical” implies that the
distributions of the unobserved effects are all the same. Note that we have also
used the term EV1 in Equation (4.24) as opposed to GEV, as this is more
consistent with the language used in the literature.
The computation of the multinomial probit probabilities in Equation (4.20)
is complex, and requires an involved approximation that uses Monte Carlo
simulation. The expression in Equation (4.10), which involves integrals that need
to be approximated, is denoted an “open-form” computation. In contrast, the
probabilities for an MNL model are considerably simpler, and can be computed
in closed form. It has been shown in many sources, such as Train (2009), that
for an MNL model:
Prob(Alt j is chosen) = exp(V_nsj) / Σ_{i=1}^{J} exp(V_nsi), j = 1, . . ., J. (4.25)
Assuming that the utility functions themselves are straightforward, the prob-
abilities in Equation (4.25) can be computed simply by plugging relevant
quantities into the formula, with no approximations required. This is one of
the appealing features of the logit form of choice model. We do note that, with
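As a sketch of how directly Equation (4.25) can be evaluated (plain Python for illustration; Nlogit handles this internally), using the observed utilities from the earlier four-alternative example:

```python
import math

def mnl_probability(V, j):
    """Closed-form MNL probability of alternative j, Equation (4.25)."""
    denom = sum(math.exp(v) for v in V)
    return math.exp(V[j]) / denom

V = [2.0, 3.0, -2.0, 0.0]  # observed utilities for car, bus, train, tram
p = [mnl_probability(V, j) for j in range(len(V))]

print([round(x, 4) for x in p])
print(round(sum(p), 10))  # 1.0
```

No integration or simulation is needed: each probability is a simple ratio of exponentiated utilities.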
The logit (MNL) model with constant variances and zero covariances is a
fairly restrictive form. In this regard, the probit model is somewhat more
attractive. However, the probit form is rather cumbersome and, as we have
seen, still quite complicated to estimate. The logit form is a convenient starting
point for a large number of interesting extensions.
Choice analysis has often been described as a way of explaining variations
in the behavior of a sample of individuals. The consequence of this view is that
a key focus of model development has been the search for increasing sources
4.5.1 Heteroskedasticity
As we examined earlier, it is not possible to learn the scales of the utility
functions from observed choice data. We did observe, however, that relative
scales can be determined. See, for example, Equation (4.15). With the normal-
ization of one of the scale factors to one, we could specify the logit model as:
ε_ns ~ EV1[ (0.57721, 0.57721, 0.57721, 0.57721, 0.57721)′,
  [ θ1²π²/6 0       0       0       0    ]
  [ 0       θ2²π²/6 0       0       0    ]
  [ 0       0       θ3²π²/6 0       0    ]
  [ 0       0       0       θ4²π²/6 0    ]
  [ 0       0       0       0       π²/6 ] ]. (4.26)
Note that we no longer state that the random terms are IID – they remain
independent, but they are not identically distributed. The normalization is
θ5 = 1. The heteroskedasticity specified in Equation (4.26) is with respect to
the set of utility functions. All individuals are still characterized by the same
scale factor. Later in the book (see Chapter 15) we will examine models in
which characteristics of the individuals (such as age, education, income, and
gender) also influence the scaling of the utilities. For present purposes, the
extension might appear as shown in Equation (4.27):
U_nsj = (1 + Σ_{q=1}^{Q} δ_q w_nq) Σ_{k=1}^{K} β_k x_nsjk + ε_nsj, (4.28)
two terms are multiplicative such that the utility for alternative j in choice set s
held by respondent n may be represented as:
Assuming that both Vnsj and εnsj are positive, Fosgerau and Bierlaire
(2009) note that it is possible to take logs of Vnsj and εnsj without affecting
choice probabilities, and in doing so the model is equivalent to an additive
model, where Vnsj is replaced by ln(Vnsj). Assuming, for example, that Vnsj < 0
and εnsj > 0, Equation (4.29) may be equivalently rewritten as:
U_nsj = −ln(−V_nsj) − ln(ε_nsj). (4.30)
Adopting the assumption that −ln(ε_nsj) = ε̃_nsj/λ_n, Equation (4.30) then
becomes:

U_nsj = −ln(−V_nsj) + ε̃_nsj/λ_n, or, scaling through by λ_n,
λ_n U_nsj = −λ_n ln(−V_nsj) + ε̃_nsj. (4.31)
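The equivalence between the multiplicative form and its log-transformed additive form can be verified numerically. The sketch below assumes, as in the text, negative observed utilities and positive multiplicative errors; the specific numbers are arbitrary:

```python
import math
import random

random.seed(7)
V = [-1.5, -0.4, -3.0]  # observed utilities, all negative

for _ in range(1000):
    eps = [random.uniform(0.1, 5.0) for _ in V]  # positive multiplicative errors
    U_mult = [v * e for v, e in zip(V, eps)]             # U = V * eps
    U_add = [-math.log(-v) - math.log(e) for v, e in zip(V, eps)]  # Equation (4.30)
    # The utility-maximizing alternative is identical under both forms,
    # because U_add = -ln(-U_mult) is a strictly increasing transform of U_mult.
    assert max(range(3), key=lambda j: U_mult[j]) == \
           max(range(3), key=lambda j: U_add[j])

print("same choice under both forms")
```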
The CDF of the error term for this new model, assuming εnsj is Extreme value
distributed, is:
The most common model used to date in the literature to account for scale
heterogeneity is the nested logit (NL) model. The NL model is typically set up
with a hierarchical tree-like structure linking alternatives that share common
scale or error variances. Each branch or nest of the model, which sits above the
(elemental) alternatives in the tree, also will have its own utility as well as
scale. The NL model allows for a (partial) parameterization of scale at each
level of the model (after some normalization). The scale parameters within the
model are inversely related to the error (co)variances of the common set of
alternatives linked to that branch or nest and are multiplicative with the utility
of those same alternatives. The NL model is estimated in Chapter 14.
Let λb represent the scale parameter at the top branch level or nest and μ(j|b)
represent the scale at the elemental alternative level of the tree. The utility of
an alternative located at the lower level of the tree-like structure nested within
branch or nest b is given as Equation (4.33):
U_nsj = μ(j|b) Σ_{k=1}^{K} β_k x_nsjk + ε_nsj, (4.33)

where μ(j|b) = π²/(6 var(ε_nsj|b)).
From Equation (4.33), the influence of scale and error variance upon utility
can clearly be seen. As the error variance increases, the magnitude of μ(j|b)
decreases and hence the observed component of utility decreases. Likewise, a
decrease in error variance will result in an increase in μ(j|b) and an increase in
the magnitude of the observed component of utility.
The utility at the upper level of the tree structure is linked to the utility of
the alternatives contained within the “nest” below such that:
(λ_b/μ(j|b)) log( Σ_{j∈b} exp(μ(j|b) V_nsj|b) ), (4.34)

where λ_b = π²/(6 var(ε_b)) represents the scale at the upper branch level.
The NL model remains over-parameterized, requiring the normalization of
one or more parameters for model identification. It is typical to normalize
either μ(j|b) or λb to 1.0 for one or more of the branches or nests. Normalizing
μ(j|b) = 1.0 results in models that are said to be normalized to random utility 1
(RU1), while normalizing λb = 1.0 produces random utility 2 (RU2) models
(see, e.g., Carrasco and Ortúzar 2002 and Hensher and Greene 2002). In either
case, what is actually being estimated in the model is the ratio λb/μ(j|b), rather
than both μ(j|b) and λb separately. The estimated parameters are often referred to as
inclusive value, or IV parameters within the literature. In Chapter 14 we use
the RU2 specification, given the controversy over whether an upper or lower
corr(U_j|b, U_i|b) = 1 − λ_b/μ(j|b). (4.35)
The link between the scales contained at each level of the tree structure can
best be seen when examining the choice probabilities produced from the NL
model. These are calculated using Equation (4.36):
P_nsj = P_nsj|b × P_nsb, (4.36)

where P_nsj|b is the conditional probability that respondent n will select
alternative j in choice task s, given that alternative j belongs to branch b, and
P_nsb is the probability of respondent n choosing branch b.
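A minimal numerical sketch of this conditional-times-marginal structure, using a hypothetical two-nest allocation of the four modes from earlier in the chapter (the nesting structure, utilities, and IV parameter values are invented for illustration):

```python
import math

V = {"car": 2.0, "bus": 3.0, "train": -2.0, "tram": 0.0}
nests = {"A": ["car"], "B": ["bus", "train", "tram"]}
lam = {"A": 1.0, "B": 0.6}  # IV parameters; lam = 1 for every nest collapses to MNL

def nl_probs(V, nests, lam):
    # Inclusive value (log-sum) of each nest
    iv = {b: math.log(sum(math.exp(V[j] / lam[b]) for j in alts))
          for b, alts in nests.items()}
    denom = sum(math.exp(lam[b] * iv[b]) for b in nests)
    p = {}
    for b, alts in nests.items():
        p_b = math.exp(lam[b] * iv[b]) / denom            # P(branch b)
        for j in alts:
            p_jb = math.exp(V[j] / lam[b]) / math.exp(iv[b])  # P(j | b)
            p[j] = p_jb * p_b                             # product, as in (4.36)
    return p

p = nl_probs(V, nests, lam)
print(round(sum(p.values()), 10))  # 1.0
```

Because the conditional probabilities sum to one within each nest and the branch probabilities sum to one across nests, the unconditional probabilities necessarily sum to one.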
In estimating the model, EðPnsj Þ is substituted for the probability given in
Equation (4.36). As such, the model does not have a panel specification
equivalent.
For example, Equation (4.37) shows a configuration in which there are two
covariance parameters that allow correlation across two sets of alternatives:
ε_ns ~ EV1[ (0.57721, 0.57721, 0.57721, 0.57721, 0.57721)′,
  [ σa² σa  σa  0   0   ]
  [ σa  σa² σa  0   0   ]
  [ σa  σa  σa² 0   0   ]
  [ 0   0   0   σb² σb  ]
  [ 0   0   0   σb  σb² ] ]. (4.37)
The mixed logit model differs from the MNL model in that it assumes that at least
some of the parameters are random, following a certain probability distribution,
as suggested earlier in Equation (4.4). These random parameter distributions are
assumed to be continuous over the sampled population. This model form takes
on many names including mixed logit, random parameters logit, kernel logit, and
mixed multinomial logit (MMNL). The choice probabilities of the MMNL
model, P_n, therefore now depend on the random parameters, with distributions
defined by the analyst. The MMNL model is summarized in Equation (4.39):
Prob(choice_ns = j | x_nsj, z_n, v_n) = exp(V_nsj) / Σ_{j=1}^{J_ns} exp(V_nsj), (4.39)

where

β_n = β + Δz_n + Γv_n.
1 One can, however, allow for deterministic taste heterogeneity via interaction terms with respondent-specific characteristics.
Having separate univariate distributions for each parameter has the benefit
that different distributions can be easily mixed. For example, if β1 ~ N(μ, σ)
and β2 ~ U(a, b), then E(P_n) is written as:

E(P_n) = ∫_{z1} ∫_{z2} P_n(β1(z1 | μ, σ), β2(z2 | a, b)) f1(z1) f2(z2) dz1 dz2, (4.43)
2 Note that if one would not like to assume independent random variables, then one can sample directly from the multivariate distribution. In the case of a multivariate normal distribution, this is possible through a Cholesky decomposition (see, e.g., Greene 2002).
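Simulating E(P_n) for mixed distributions is straightforward to sketch. The example below draws one normally distributed and one uniformly distributed parameter, averages the resulting logit probabilities over draws, and uses invented attribute values:

```python
import math
import random

random.seed(1)

def mnl_p(V, j):
    """Closed-form logit probability of alternative j."""
    return math.exp(V[j]) / sum(math.exp(v) for v in V)

# Two attributes per alternative; beta1 ~ Normal(mu, sigma), beta2 ~ Uniform(a, b).
X = [(1.0, 0.5), (0.2, 1.5), (0.8, 0.1)]
mu, sigma, a, b = -0.5, 0.3, 0.1, 0.9

R = 5000
acc = [0.0] * len(X)
for _ in range(R):
    b1 = random.gauss(mu, sigma)
    b2 = random.uniform(a, b)
    V = [b1 * x1 + b2 * x2 for x1, x2 in X]
    for j in range(len(X)):
        acc[j] += mnl_p(V, j)

EP = [s / R for s in acc]      # simulated E(P_n), as in Equation (4.43)
print(round(sum(EP), 10))      # 1.0
```

Averaging the closed-form logit probabilities over the random draws approximates the double integral in Equation (4.43).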
log E(L_N) = Σ_{n=1}^{N} Σ_{s∈S_n} Σ_{j∈J_ns} y_nsj log E(P_nsj). (4.44)

log E(L_N) = Σ_{n=1}^{N} log E( Π_{s∈S_n} Π_{j∈J_ns} (P_nsj)^{y_nsj} ), (4.45)
or:

log(L_N) = Σ_{n=1}^{N} log E(P_n). (4.46)

U_nsj = Σ_{k=1}^{K} β_k x_nsjk + Σ_{l=1}^{L} η_l z_lns d_lb + ε_nsj, (4.47)

where d_lb = 1 if alternative j is in nest b, and d_lb = 0 otherwise.
The interpretation of the ECs therefore relates to their associations with
specific alternatives and not with attributes as with more traditional random
taste models. Each estimated EC represents the residual random error var-
iances linking those alternatives, and by estimating different ECs for different
subsets of alternatives it is possible to estimate complex correlation structures
among the error variances of the various alternatives being modeled. Indeed,
the use of ECs in a model induces particular covariance structures among the
modeled alternatives, and hence represents a relaxation of the IID assumption
typically associated with most logit type models.
The covariance structure is shown in Equation (4.48):

Cov(U_nsi, U_nsj) = E[(η_i z_nsi d_bi + ε_nsi)′(η_j z_nsj d_bj + ε_nsj)]
                  = ϑ_b if alternatives i and j are in nest b, and 0 otherwise. (4.48)
The generalized mixed logit model builds on the specifications of the mixed
logit model developed in Train (2003, 2009), Hensher and Greene (2003), and
Greene (2007), among others, and the “generalized multinomial logit model”
proposed in Fiebig et al. (2010) – see also Greene and Hensher (2010b).
A growing number of authors have stated that the mixed logit model, and
multinomial choice models more generally, do not adequately account for
scale heterogeneity (e.g., Fiebig et al. 2010 and Keane 2006). Scale heterogeneity
across choices is easily accommodated in the model already considered
by random alternative-specific constants. As in the earlier
implementation, we accommodate both observed and unobserved heteroge-
neity in the model.
The starting position is the standard mixed multinomial logit model,
Equation (4.39), modified accordingly as Equation (4.50) (see also
Section 15.10 of Chapter 15):
where
σn = exp(s + δ′hn + τwn), the individual-specific standard deviation of the
idiosyncratic error term;
hn = a set of L characteristics of individual n that may overlap with zn;
δ = the parameters in the observed heterogeneity in the scale term;
wn = the unobserved heterogeneity, standard normally distributed;
s = a mean parameter in the variance;
τ = the coefficient on the unobserved scale heterogeneity;
γ = a weighting parameter that indicates how variance in residual
preference heterogeneity varies with scale, with 0 ≤ γ ≤ 1.
The weighting parameter, γ, is central to the generalized model. It controls
the relative importance of the overall scaling of the utility function, σn, versus
the scaling of the individual preference weights contained in the diagonal
elements of Γ. Note that if σn equals one (i.e., τ = 0), then γ falls out of the
model and Equation (4.50) reverts to the base case random parameters
model in Equation (4.39). A non-zero γ cannot be estimated apart from Γ
when σn equals one. When σn is not equal to one, then γ will spread the
influence of the random components between overall scaling and the scaling
of the preference weights. In addition to the useful special cases of the original
mixed model, some useful special cases arise in this model. If γ = 0, then a
scaled mixed logit model emerges, given in Equation (4.51):
β_n = σ_n β. (4.52)
It follows that s is not identified separately from τ, which appears nowhere else
in the model. Some normalization is required. A natural normalization would
be to set s = 0. However, it is more convenient to normalize σn so that E[σn] =
1, by setting s = −τ²/2 instead of zero.
A second complication concerns the variation in σn during the simulations.
The log-normal distribution implied by exp(–τ2/2 + τwn) can produce extre-
mely large draws and lead to overflows and instability of the estimator. To
accommodate this concern, in Nlogit we have truncated the standard normal
distribution of wn at −1.96 and +1.96. In contrast to Fiebig et al. who propose
an acceptance/rejection method for the random draws, we have used a one
draw method, wnr = Φ−1[.025 + .95Unr], where Φ−1(t) is the inverse of the
standard normal CDF and Unr is a random draw from the standard uniform
population. This will maintain the smoothness of the estimator in the random
draws. The acceptance/rejection approach requires, on average, 1/.95 draws to
obtain an acceptable draw, while the inverse probability approach always
requires exactly one.
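The one-draw inverse-probability method is easy to sketch (using Python's standard library rather than Nlogit's internal implementation):

```python
import random
from statistics import NormalDist

random.seed(3)
inv_cdf = NormalDist().inv_cdf  # inverse of the standard normal CDF

# One smooth draw per uniform: w = inv_cdf(0.025 + 0.95 * u) truncates the
# standard normal to (-1.96, 1.96), and always uses exactly one uniform draw.
draws = [inv_cdf(0.025 + 0.95 * random.random()) for _ in range(100_000)]

print(max(abs(d) for d in draws) < 1.96)  # True: all draws inside the bounds
```

Because the mapping from the uniform draw to w is smooth and one-to-one, the simulated log-likelihood remains a smooth function of the parameters, unlike an acceptance/rejection scheme.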
Finally, in order to impose the limits on γ, γ is reparameterized in terms of
α, where γ = exp(α)/[1 + exp(α)] and α is unrestricted. Likewise, to ensure τ ≥
0, the model is fit in terms of λ, where τ = exp(λ) and λ is unrestricted.
Restricted versions in which it is desired to restrict γ = 1 or 0 and/or τ = 0 are
imposed directly during the estimation, rather than using extreme values of
the underlying parameters, as in previous studies. Thus, in estimation, the
restriction γ = 0 is imposed directly, rather than using, for example, α = −10.0
or some other large value. See Section 15.9 of Chapter 15, where we estimate a
number of generalized mixed logit models.
β_n = σ_n β_c [(1/β_c)(β + Γv_n)] = σ_n β_c [θ_c + Γ_c v_n]. (4.53)
In the simple multinomial logit case (σn = 1, Γ = 0), this is a one to one
transformation of the parameters of the original model. Where the parameters
are random, however, the transformation is no longer that simple. We, as well
as Train and Weeks (2005), have found, in application, that this form of the
transformed model produces generally much more reasonable estimates of
WTP for individuals in the sample than the model in the original form in
which WTP is computed using ratios of parameters (Hensher and Greene
2011).3
Assuming utility is separable in price, cnsj, and the other non-price attributes
xnsjk, it is possible to write Equation (4.54) in WTP space:

$$U_{nsj} = \lambda_n\left[c_{nsj} + \frac{1}{\beta_{nc}}\sum_{k=1}^{K}\beta_{nk}x_{nsjk}\right] + e_{nsj} = \lambda_n\left[c_{nsj} + \sum_{k=1}^{K}\theta_{nk}x_{nsjk}\right] + e_{nsj}, \qquad (4.54)$$
where the price parameter has been normalized to 1.0 and where θnk repre-
sents a direct parameterization of WTP for the remaining non-price attributes
xnsjk. As can be seen from Equation (4.54), scale also plays a dominant role in
the model. Indeed, this is noted by Scarpa et al. (2008), who discuss the
confound between scale and preference heterogeneity, stating that
If the scale parameter varies and [the taste parameters] are fixed, then the utility
coefficients vary with perfect correlation. If the utility coefficients have correlation less
than unity, then [the taste parameters] are necessarily varying in addition to, or
instead of, the scale parameter. Finally, even if [the scale parameter] does not vary
over [respondents] . . ., utility coefficients can be correlated simply due to correlations
among tastes for various attributes.
3 The paper by Hensher and Greene (2011), like Train and Weeks (2005), supports the WTP space
framework for estimating WTP distributions, given that the evidence on the range is behaviorally more
plausible, despite the overall goodness of fit being inferior to that of the utility space specifications.
114 Getting started
$$P_{nc} = \frac{\exp(V_{nc})}{\sum_{c\in C}\exp(V_{nc})}, \qquad (4.55)$$

where $V_{nc} = \delta_c' h_n$ represents the observed component of utility from the class
assignment model and $h_n$ are respondent-specific covariates that condition
class membership.
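Equation (4.55) is an ordinary MNL (softmax) over the classes. As a minimal sketch, with hypothetical values for the covariates hn and the class-specific parameters δc (these numbers are our own, purely for illustration):

```python
import math

def class_probabilities(h_n, delta):
    # Equation (4.55): P_nc = exp(delta_c'h_n) / sum_c exp(delta_c'h_n).
    v = [sum(d * h for d, h in zip(delta_c, h_n)) for delta_c in delta]
    denom = sum(math.exp(vc) for vc in v)
    return [math.exp(vc) / denom for vc in v]

# Hypothetical: two covariates (a constant and one characteristic),
# three latent classes, with class 1 normalized as the base.
h_n = [1.0, 0.4]
delta = [[0.0, 0.0], [0.5, -1.2], [-0.3, 0.8]]
probs = class_probabilities(h_n, delta)
print(probs)  # the class probabilities sum to one
```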
$$P_{nsj|c} = \frac{\exp(V_{nsj|c})}{\sum_{i}\exp(V_{nsi|c})}, \qquad (4.56)$$
$$P_{nc|j} = \frac{P_{nc}\prod_{s}\left(P_{nsj|c}\right)^{y_{nsj}}}{\sum_{c\in C} P_{nc}\prod_{s}\left(P_{nsj|c}\right)^{y_{nsj}}}, \quad \forall c \in C. \qquad (4.59)$$
If the number of choice tasks observed per respondent equals one, then
Equation (4.59) will collapse to Equation (4.58). Analysts wanting to use the
LC model in its various forms can now go to Chapter 16.
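The collapse just noted is easy to verify numerically. The sketch below is our own illustration (all probability values are hypothetical): for each class, it takes the product over the S choice situations of the chosen alternative's within-class probability, weights by the class-assignment probability, and normalizes, in the spirit of Equation (4.59):

```python
import math

def posterior_class_probs(P_nc, chosen_probs_by_class):
    # Equation (4.59): product over choice situations of the chosen
    # alternative's within-class probability, weighted by the
    # class-assignment probability, then normalized over classes.
    joint = [p_c * math.prod(ps) for p_c, ps in zip(P_nc, chosen_probs_by_class)]
    denom = sum(joint)
    return [j / denom for j in joint]

# Hypothetical two-class example, two choice situations per respondent.
P_nc = [0.6, 0.4]
chosen = [[0.7, 0.5],   # class 1: P of the chosen alternative in tasks 1 and 2
          [0.2, 0.9]]   # class 2
post = posterior_class_probs(P_nc, chosen)
print(post)
# With a single choice task the product over s disappears and the
# expression collapses to the one-task form, as noted in the text.
```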
This chapter has taken the reader on a journey through the (historical)
development of discrete choice models, mainly logit (and some limited dis-
cussion of probit), to show the growth in behavioral richness that is now
available in the progression of choice models from the very basic multinomial
logit to the advanced versions of mixed multinomial logit. The latter can allow
for scale and preference heterogeneity in various guises including decomposi-
tion to recognize sources of the systematic explanation of variation in the
distribution of preferences and scale in a sample.
A challenge for the analyst is to compare the model forms, noting the
progression in behavioral richness in closed-form models (MNL and nested
logit) where a partial relaxation of IID occurs, and the migration to open-form
models associated with the growing family of mixed multinomial logit models
that allow for continuous preference distributions for observed attributes
describing alternatives as well as variance heterogeneity (through scale and
error components) in the unobserved influences on utility. The LC model
form is linked to the open-form mixed multinomial logit model through a
discrete (in contrast to a continuous) distribution of attribute parameteriza-
tion through class assignment, as well as allowing for continuous distributions
on parameters within a class.
In several of the following chapters (especially Chapters 11–16), we show
the user how to estimate the full range of models presented in this chapter,
including applying the statistical tests presented in Chapter 7.
Chapter 5
Estimating discrete choice models
An approximate answer to the right problem is worth a good deal more than an exact
answer to an approximate problem.
(Tukey 1962)
5.1 Introduction
Chapters 3 and 4 introduced a number of new concepts and models to the reader,
including the probit and logit models. As seen in Chapter 4, probit and logit
models are derived under different assumptions about the error term. For probit
models, the error terms are assumed to be multivariate Normally distributed,
while logit models assume a multivariate extreme value Type 1 distribution, or
some restriction thereof. In Chapter 4, we briefly discussed the fact that discrete
choice models are estimated using a method known as maximum likelihood
estimation. The current chapter seeks to explain maximum likelihood estimation
in the context of discrete choice models. In doing so, we also briefly discuss several
of the more common algorithms used in estimating discrete choice models.
In addition to discussing maximum likelihood estimation, we also introduce
the related concept of simulated maximum likelihood. A number of the models
introduced in Chapter 4 do not have analytically tractable solutions when one
attempts to compute their choice probabilities. Such models are said to be of open
form, requiring simulation of the choice probabilities. We therefore discuss the
several common simulation approaches used in estimating discrete choice models.
Given data, the objective of the analyst is to estimate the unknown parameters,
β. While there exist several methods to do so, the most common approach
when dealing with discrete choice data is to use a method known as maximum
likelihood estimation. Maximum likelihood estimation involves the analyst
specifying some objective function, known as a likelihood function, where
the only unknowns are the parameters which are related to the data via the
analyst’s defined utility specifications, and then maximizing the function.
Given that the parameters are the only unknowns, the data being fixed, they
remain the only component of the equation that can change in maximizing
the likelihood function. The difficulty therefore is in deriving a likelihood
function that is appropriate for the problem, identifying the parameters that
best fit the data.
The likelihood function of discrete choice models is designed to maximize
the choice probabilities associated with the alternatives that are observed to be
chosen in the data. That is, the likelihood function is defined in such a way as
to maximize the predictions obtained by the model. To demonstrate, let ynsj
equal one if j is the chosen alternative in choice situation s faced by decision
maker n, and zero otherwise. In other words, y represents the observed choice
outcomes within some data. Then the parameters can be estimated by max-
imizing the likelihood function L:
$$L_{NS} = \prod_{n=1}^{N}\prod_{s\in S_n}\prod_{j\in J_{ns}}\left(P_{nsj}\right)^{y_{nsj}}, \qquad (5.1)$$
where N denotes the total number of decision makers, Sn is the set of choice
situations faced by decision maker n, and Pnsj is a function of the data and
unknown parameters β.
To appreciate how Equation (5.1) works, consider an example data set
involving two decision makers who were observed to have each made choices
in two separate choice situations. Let us assume that the first decision maker
had three alternatives to choose from in the first situation, coded 1, 2, and 3,
but only the first two alternatives in the second situation. Further, let us
assume that the second decision maker faced alternatives 2 and 3 in the
first choice situation but had four alternatives in the second choice situation
they faced, coded 1, 2, 3, and 4. Let each alternative be represented by three
attributes, x1, x2, and x3, and assume that the model has generic parameters, with no
constants. The data as described are presented in Table 5.1 alongside an
indicator variable, ynsj ; representing which of the j alternatives were observed
to have been chosen in each choice situation s.
Let us assume that the analyst has estimated the parameters related to x1, x2,
and x3 to be −0.2, 0.2, and 0.2, respectively. Based on these parameters, the
utilities associated with each of the alternatives are calculated, as are the choice
probabilities, assuming an MNL model. The utilities and choice probabilities
are also given in Table 5.1. Given the choice probabilities, we are able to apply
Equation (5.1) to calculate the likelihood function. Noting that for ynsj = 0,
(Pnsj)^0 = 1, while for ynsj = 1, (Pnsj)^1 = Pnsj, the likelihood function for the
model is 0.049.
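Since Table 5.1 is not reproduced above, the mechanics of Equation (5.1) can be sketched with hypothetical data of the same shape (the attribute levels, chosen alternatives, and parameter values below are our own illustration, not the book's table):

```python
import math

def mnl_probs(v):
    e = [math.exp(x) for x in v]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical data: (attribute rows, one per alternative; index of the
# chosen alternative). Choice sets may have different numbers of alternatives.
choice_sets = [
    ([[8, 4, 6], [9, 5, 5], [10, 5, 4]], 0),   # three alternatives
    ([[5, 2, 9], [8, 2, 6]], 1),               # two alternatives
]
beta = [-0.2, 0.2, 0.2]

L = 1.0
for attributes, chosen in choice_sets:
    v = [sum(b * x for b, x in zip(beta, alt)) for alt in attributes]
    # Only the chosen alternative contributes: (P_nsj)^1 = P_nsj, (P_nsj)^0 = 1.
    L *= mnl_probs(v)[chosen]
print(L)
```

Because each factor is a probability strictly between zero and one, the likelihood shrinks as choice situations are added, which is why the log form is preferred in practice.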
Now consider that for the same data a new model is estimated, and the
parameters are now found to be −0.301, 0.056, and −0.189 for x1, x2, and x3,
respectively (actually these are the parameter estimates which maximize the
likelihood function). Given the new parameter estimates, the utilities are
re-calculated as are the choice probabilities, both of which are presented in
Table 5.2. Note now that the model likelihood is much larger than for the
previous estimates. Also, note that for three of the four choice situations, the
choice probabilities for the chosen alternatives are larger, suggesting that, for
these choice situations, the model is a better predictor of these outcomes. This
is precisely what Equation (5.1) seeks to do. By maximizing Equation (5.1), the
analyst is attempting to maximize the choice probabilities for the chosen
alternatives within the data. Nevertheless, as demonstrated in the example
below, it may not be possible to maximize the choice probabilities for all
choice situations. In other words, the objective is to locate the parameters that
will produce the best choice probabilities over the entire sample, not just at the
individual choice situation level.
The fact that one or more choice probabilities may be worse off despite the
overall model likelihood improving may be indicative of a number of issues.
Firstly, the analyst may have mis-specified the utility expressions. For
example, perhaps the correct utility specification should have involved
some form of transformation of one or more of the independent variables.
Or perhaps there exists one or more interaction terms that have not been
accounted for in the model specification. Secondly, there may exist either
scale or preference heterogeneity that has not been allowed for in the model
specified. Or perhaps different decision makers make use of different mental
algebra in making their choices (i.e., attribute processing – see Chapter 21),
which the estimated model ignores. Thirdly, there may be omitted variables
that if included would ensure a solution whereby all the choice probabilities
for the chosen alternatives would improve as the model likelihood improves.
Only when all the choice probabilities for the chosen alternatives are equal to
one can any of the above be ruled out; however, in such a case the choices
would be completely deterministic, in the sense that if one can predict
perfectly the choices made in all choice situations by all decision makers,
then there exists no error and the analyst will have perfect knowledge of the
decision processes of all decision makers. In such a scenario, the choice
models described herein may fail, given that the choice probabilities are
derived under the assumption that there does exist error within the data,
which represents somewhat of a paradox.
Putting aside the above, it is more common to maximize the log of the
likelihood function rather than the likelihood function itself. This is because
taking the product of a series of probabilities will typically produce values that
are extremely small, particularly as n, s, and j increase. Unfortunately, most
$$LL_{NS} = \sum_{n=1}^{N}\sum_{s\in S_n}\sum_{j\in J_{ns}} y_{nsj}\ln\left(P_{nsj}\right). \qquad (5.3)$$
This is the LL function of the MNL model. For any choice situation, s, as the
probability Pnsj increases, ln(Pnsj) → 0 and hence, for ynsj = 1, ynsj ln(Pnsj) → 0.
Thus, as the probability approaches one for a chosen alternative within choice
situation s, the choice situation-specific LL approaches zero. The objective is
therefore to locate (sample) population parameter estimates that maximize
(noting that the log of any value between zero and one will be negative, hence
we wish to maximize, not minimize) as many within choice situation LLs as
possible. In doing so, it is important to note that, as with the likelihood
function, the choice situation-specific LL may worsen for some choice
situations as the parameters are estimated; however, it is hoped that, over all
choice situations, the majority of the within choice situation LLs improve
(i.e., move closer to zero).
Table 5.3 shows, for the same sample data used to demonstrate the like-
lihood function, the calculations for the LL function. Note that the para-
meters found by maximizing the LL function are precisely the same as those
obtained by maximizing the likelihood function. That is, the parameters are
found to be −0.301, 0.056, and −0.189 for x1, x2, and x3, respectively. Note,
however, that the objective function being optimized, the LL, is now negative
(as is expected).
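The link between Equations (5.1) and (5.3) can be checked directly in code: summing the logs of the chosen alternatives' probabilities gives the log of their product. The data values below are again hypothetical (the book's tables are not reproduced here):

```python
import math

def mnl_probs(v):
    e = [math.exp(x) for x in v]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical choice data: (attribute rows per alternative, chosen index).
choice_sets = [
    ([[8, 4, 6], [9, 5, 5], [10, 5, 4]], 2),
    ([[5, 2, 9], [8, 2, 6]], 0),
]
beta = [-0.301, 0.056, -0.189]

chosen_probs = []
for attributes, chosen in choice_sets:
    v = [sum(b * x for b, x in zip(beta, alt)) for alt in attributes]
    chosen_probs.append(mnl_probs(v)[chosen])

L = math.prod(chosen_probs)                   # Equation (5.1)
LL = sum(math.log(p) for p in chosen_probs)   # Equation (5.3)
print(L, LL)  # LL is negative and equals ln(L)
```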
As an aside, in the examples above we assumed discrete choice data; however, the same
technique can be used to estimate the parameters for other related types of data – for
example, count or proportions data. For count data, decision makers are observed to select
each alternative, j, more than once, or not at all. That is, rather than making a discrete
choice, decision makers may select how many times each alternative will be chosen. When
dealing with count data, the choice indicator, ynsj, is replaced by the count variable, cnsj, in
the LL function. An example of this is given in Table 5.4, assuming that the parameters
for x1, x2, and x3 are now estimated to be −0.019, −0.208, and 0.208, respectively.
[Table: attribute levels for the ten choice situations used in the simulation exercise; the column headings were lost in extraction]
 1   8  4   6  0
 2   9  5   5  0
 3  10  5   4  1
 4   5  2   9  2
 5   8  2   6  4
 6   7  4   7  1
 7   4  0  10  5
 8   8  1   6  3
 9  10  4   4  0
10   5  3   9  3
Given the above information, we simulate the utilities for all 1,000 choice
observations by assuming that β1 = −0.5 and β2 = 0.7, and taking random
draws from an EV1 IID distribution to replicate the error structure of the
assumed sample population (we discuss methods for drawing from distribu-
tions in Section 5.5). Given knowledge of the utilities, and assuming that
decision makers will select the alternative that maximizes their utility, it is
then possible to simulate the choice index, ynsj, for each observation. Given
data simulated as described above, one would expect to retrieve the input
parameter estimates assuming the same utility specification by maximizing
the LL function given in Equation (5.3). Rather than do this, however, we
jointly vary the parameter estimates systematically in increments of 0.1
between the ranges of −1.0 and 1.0 (i.e., (β1,β2)=(−1,−1),(−1,−0.9),. . .,
(0,0),. . .,(1,0.9),(1,1)) and calculate the log-likelihood for each parameter
pair.

[Figure 5.1 Log-likelihood surfaces over the (β1, β2) grid: (a) MNL linear in the parameters specification; (b) MNL non-linear in the parameters specification]

Figure 5.1a plots the results of this exercise. As can be seen from the
plot, the LL function is always negative and approaches its maximum value
when β1 = −0.5 and β2 = 0.7, suggesting that these are the estimates most
likely to have come from this data.
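A hedged replication of this exercise in Python (the attribute levels, seed, and data-generating details below are our assumptions, not the book's exact setup): simulate 1,000 three-alternative choices under β1 = −0.5, β2 = 0.7 with EV1 errors, then evaluate the LL over the same 0.1 grid. Since the true values lie on the grid, the best grid point must fit at least as well as they do:

```python
import math
import random

rng = random.Random(7)

def gumbel(rng):
    # EV1 (Gumbel) draw via the inverse CDF of a uniform draw.
    return -math.log(-math.log(rng.random()))

# Simulate 1,000 choice situations, three alternatives, two attributes.
true_beta = (-0.5, 0.7)
data = []
for _ in range(1000):
    alts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(3)]
    u = [true_beta[0] * x1 + true_beta[1] * x2 + gumbel(rng) for x1, x2 in alts]
    data.append((alts, u.index(max(u))))  # utility-maximizing choice

def log_likelihood(b1, b2):
    ll = 0.0
    for alts, chosen in data:
        v = [b1 * x1 + b2 * x2 for x1, x2 in alts]
        m = max(v)
        ll += v[chosen] - m - math.log(sum(math.exp(x - m) for x in v))
    return ll

# Evaluate the LL over the (beta1, beta2) grid in 0.1 increments.
grid = [round(-1 + 0.1 * i, 1) for i in range(21)]
best = max(((b1, b2) for b1 in grid for b2 in grid),
           key=lambda p: log_likelihood(*p))
print(best, log_likelihood(*best))
```

With this much data the grid maximum lands at or immediately next to the generating values, mirroring the surface plotted in Figure 5.1a.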
For MNL models with linear in the parameters utility specifications, the
surface of the LL function will be globally concave, meaning that there will
exist one maximum, which should be relatively straightforward to locate. For
all other models, including MNL models with non-linear in the parameters
specifications, there may exist multiple (local) maxima. In such instances,
locating the global maximum may not always be so straightforward. To
demonstrate, consider now that the analyst was to specify an MNL model
using the following non-linear in the parameters specification (which can be
estimated in Nlogit5 – see Chapter 20) based on the same simulated discrete
choice data:
Figure 5.1b plots the LL surface over the same range of parameter combina-
tions based on this new utility specification. Note now that the LL surface is no
longer globally concave and has two maxima, one local (i.e., when β1 = 0.9 and
β2 = −0.3 the LL function equals −619.500) and one global (i.e., when β1 = −1.0
and β2 = 0.3 the LL function equals −604.008). Although the two LLs appear to
be quite different in value, depending on a number of factors it is possible
that estimation terminates at the parameters associated with the local
optimum, the analyst being unaware that there exists a different global
optimum. This is because most algorithms do not plot the entire surface
Several discrete choice models assume that (some of) the parameters are
randomly distributed over the population, where typically the random
parameters are assumed to follow certain parametric probability distribu-
tions (there also exist several models that allow for non- or semi-parametric
representations of the probability distributions (e.g., Briesch et al. 2010;
Fosgerau 2006; Klein and Spady 1993); however, these models remain out-
side of the scope of the current text). The probit model is an example of
one such model, where (subject to restrictions) the error terms are para-
meters to be estimated under the assumption that they are Normally dis-
tributed over the sample population. Further, as discussed in Chapter 4,
the probit model can be extended to allow for tastes to be Normally (and
log-Normally) distributed over the sampled population. Likewise, the
MMNL model allows tastes to vary over the population according to some
analyst defined continuous distribution (see Chapter 15). Of importance
is the fact that, for these models, the estimated parameters describe the
moments of the assumed distributions for the sampled population. Where
a particular individual resides within the distribution is not known (i.e., their
assignment is random without an interaction with the source of a potential
systematic influence which could place a respondent at a particular location
on a distribution). Thus, for each individual, it is necessary to evaluate the
choice probabilities over the entire real line represented by the population
level distributions. Hence, the probability that respondent n in choice
situation s will choose alternative j can be written as:
$$L_{nsj} = \int_{\beta} P_{nsj}(\beta)\, f(\beta|\theta)\, d\beta, \qquad (5.6)$$
$$L_{nsj} = E(P_{nsj}) \approx \frac{1}{R}\sum_{r=1}^{R} P_{nsj}\left(\beta^{(r)}|X\right). \qquad (5.10)$$
At the base of each choice situation, the average choice probability is shown
with the expected choice probability for the chosen alternative placed in bold.
The last column of the table calculates the choice situation-specific contribu-
tion to the model simulated LL, as well as the model simulated log-likelihood
based on the parameter values provided.
Note that at the base of each choice situation, we have also calculated and
shown the average simulated utility for each alternative. This has been done
for purely cosmetic purposes. Indeed, it is worthwhile stressing that the
average probability is used to calculate the simulated LL and not the average
simulated utilities. This is because

$$E\left[\frac{e^{V_{nsj}^{r}}}{\sum_{i=1}^{J} e^{V_{nsi}^{r}}}\right] \neq \frac{e^{E(V_{nsj}^{r})}}{\sum_{i=1}^{J} e^{E(V_{nsi}^{r})}}.$$

We leave it to the
reader to confirm that the choice probabilities for the chosen alternatives
would be 0.707, 0.181, 0.859, and 0.745 for the four choice situations,
respectively, if the average utilities were (incorrectly) used, leading to simu-
lated LL of −2.503, as opposed to the correct value of −2.249. Although there
is only a small discrepancy in this instance, as the number of choice
observations increase so too will the differences, resulting in very different
model outcomes.
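The gap between the average of the simulated probabilities and the probability evaluated at the average utilities can be demonstrated numerically. The utility distributions below are our own hypothetical choices, with one alternative given a deliberately large variance so that the gap is visible:

```python
import math
import random

rng = random.Random(3)

def mnl_probs(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical simulated utilities for three alternatives over R draws;
# alternative 1's utility varies a lot across draws, alternative 2's is fixed.
R = 2000
draws = [[rng.gauss(2.0, 3.0), 0.0, rng.gauss(-1.0, 0.5)] for _ in range(R)]

# Correct: average the simulated probabilities over the draws ...
avg_prob = [sum(mnl_probs(v)[j] for v in draws) / R for j in range(3)]
# ... incorrect: evaluate the probabilities at the average utilities.
avg_util = [sum(v[j] for v in draws) / R for j in range(3)]
prob_at_avg = mnl_probs(avg_util)

print(avg_prob)
print(prob_at_avg)  # noticeably different for the high-variance alternative
```

Because the logit transform is non-linear, averaging utilities before transforming systematically distorts the probabilities, exactly as the inequality in the text states.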
Table 5.7 Example of simulated log-likelihood estimation (cross-sectional model)
[Table 5.7 columns: n, s, r, ynsj; attributes x11–x43; parameter draws β̃1r, β̃2r, β̃3r; simulated utilities Vrns1–Vrns4; simulated probabilities Prns1–Prns4; contribution ynsj ln E(Pnsj). Table body not recoverable from the extraction.]
Note that even though we have assumed the same parameter estimates for
the cross-sectional and panel versions of the MMNL model in our examples,
in practice the two models will tend to produce very different estimates and
simulated LL functions. Further, in practice the log-likelihood function of
the panel version of the model will typically be better than that of the cross-
sectional version of the same model, all else being equal. Unfortunately,
the two models are not necessarily nested in that they are maximizing
different log-likelihood functions (Equation (5.12) versus Equation (5.13)),
and hence a direct comparison may not be possible between the two.
Nevertheless, theory would suggest that in short run panels such as with
stated preference data, tastes are likely to be consistent within decision
maker n across choice observations, and hence the panel model may be
the more appropriate model to estimate on such data. Finally, we note that in
the case of each decision maker observing a single choice situation, the two
models will necessarily be the same, as the product over s in the panel model
will disappear.
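The cross-sectional versus panel distinction can be sketched as follows (our illustration; the attribute values, parameter distribution, and number of draws are assumptions). The cross-sectional form averages each choice situation's probability over the draws separately; the panel form averages the product over a respondent's situations:

```python
import math
import random

rng = random.Random(9)

def mnl_prob(v, chosen):
    m = max(v)
    return math.exp(v[chosen] - m) / sum(math.exp(x - m) for x in v)

def simulated_lls(tasks, R=500, mean=0.3, sd=1.0):
    # For one respondent: per draw, the chosen-alternative probability in
    # each task, holding beta^(r) fixed across that respondent's tasks.
    per_draw = []
    for _ in range(R):
        beta_r = rng.gauss(mean, sd)
        per_draw.append([mnl_prob([beta_r * x for x in alts], c)
                         for alts, c in tasks])
    # Cross-section: average each task's probability, then sum the logs.
    cross = sum(math.log(sum(d[s] for d in per_draw) / R)
                for s in range(len(tasks)))
    # Panel: average the product over tasks, then take a single log.
    panel = math.log(sum(math.prod(d) for d in per_draw) / R)
    return cross, panel

one_task = [([1.0, 2.0, 3.0], 2)]
two_tasks = one_task + [([2.0, 1.0, 0.5], 0)]
c1, p1 = simulated_lls(one_task)    # identical when S = 1
c2, p2 = simulated_lls(two_tasks)   # generally differ when S > 1
print((c1, p1), (c2, p2))
```

With a single task the product over s is a single term and the two forms coincide exactly; with two tasks the shared β(r) induces correlation between the task-level probabilities, so the panel average of the product differs from the product of the averages.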
Table 5.8 Example of simulated log-likelihood estimation (panel model)
[Table 5.8 columns: n, s, r, ynsj (with yn1j and yn2j for the two choice situations); attributes x11–x43; parameter draws β̃1r, β̃2r, β̃3r; simulated utilities Vrns1–Vrns4; simulated probabilities Prns1–Prns4; contribution ln E(Pn1j · Pn2j). Table body not recoverable from the extraction.]
[Figure 5.2 Example for drawing from two different PDFs: probability density plotted against β(r) over the range −5 to 5]
[Figure 5.3 Example for drawing from two different CDFs: cumulative probability plotted against β(r) for N(0, 1) and N(−1, 0.75)]
different estimation runs even when different seeds are used. Independent
of whether the seed is set or not, the pseudo-random draws are retained and re-
used over different iterations (see Section 5.6). These draws are then converted
to draws taken from the density functions of the random parameters.
Figure 5.4 demonstrates this process using Microsoft Excel. In step 1, we
generate two sequences of pseudo-random numbers for R =10 draws using the
rand() function. Although not shown in the figure, these values should then be
fixed, by first copying the values and then using paste special to convert the
formulas to values. Next, assuming normal distributions, the Excel
Norminv(<prob>, <mean>, <std dev>) function is used to convert the 0–1
draws to the density functions of the random parameters. Here, the first
argument of the function is the probability, which is represented by the
0–1 sequences. Next, the function requires the analyst to specify the mean and
standard deviation of the random parameter.
Randomness of the draws is not a prerequisite in the approximation of the
integral in Equation (5.6). Rather, Winiarski (2003) has posited that correla-
tion between draws for different dimensions can have a positive effect on
the approximation, and that draws which are distributed as uniformly as
possible over the area of integration are more desirable. Hence, selecting draws
deterministically so that they possess these properties represents a potential
way to minimize the integration error (for further discussion, see Niederreiter
1992 or Fang and Wang 1994). QMC simulation methods are almost identical
to the PMC simulation method, except that they use deterministic sequences
in generating the population of values $u_k^{(r)} \sim U(0, 1)$. In QMC methods,
the numbers $u_k^{(r)}$ are taken from different intelligent quasi-random
sequences, also termed low discrepancy sequences. One argument for the
$$r = \sum_{\ell=0}^{L} b_{\ell}^{(r)} p_k^{\ell}, \qquad (5.17)$$

where $0 \le b_{\ell}^{(r)} \le p_k - 1$ determines the L digits used in base $p_k$ in order to
represent r (i.e., solving Equation (5.17)), and where the range for L is
determined by $p_k^{L} \le r < p_k^{L+1}$. The draw is then obtained as:

$$u_k^{(r)} = \sum_{\ell=0}^{L} b_{\ell}^{(r)} p_k^{-\ell-1}. \qquad (5.18)$$
Base 10 integer  Prime 2  Prime 3  Prime 5  Prime 7  Prime 11  Prime 13  Prime 17  Prime 19  Prime 23  Prime 29  Prime 31  Prime 37
0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1
2 10 2 2 2 2 2 2 2 2 2 2 2
3 11 10 3 3 3 3 3 3 3 3 3 3
4 100 11 4 4 4 4 4 4 4 4 4 4
5 101 12 10 5 5 5 5 5 5 5 5 5
6 110 20 11 6 6 6 6 6 6 6 6 6
7 111 21 12 10 7 7 7 7 7 7 7 7
8 1000 22 13 11 8 8 8 8 8 8 8 8
9 1001 100 14 12 9 9 9 9 9 9 9 9
10 1010 101 20 13 A A A A A A A A
11 1011 102 21 14 10 B B B B B B B
12 1100 110 22 15 11 C C C C C C C
13 1101 111 23 16 12 10 D D D D D D
14 1110 112 24 20 13 11 E E E E E E
15 1111 120 30 21 14 12 F F F F F F
16 10000 121 31 22 15 13 G G G G G G
17 10001 122 32 23 16 14 10 H H H H H
18 10010 200 33 24 17 15 11 I I I I I
19 10011 201 34 25 18 16 12 10 J J J J
20 10100 202 40 26 19 17 13 11 K K K K
Step 1. List the integers zero to R in Base 10. Most readers will be familiar
with Arabic numerals that consist of 10 digits, zero to nine. Arabic numerals
presented in this way are said to be in decimal, or base 10 units. Working in
base 10, all digits are used to count from zero up to nine, after which two digits
are required to represent numbers between 10 and 99, three digits for values in
the hundreds, etc. In Binary or base 2, there are only two digits available, zero
and one, while when working in base 3, there are three digits available, zero,
one, and two. For numbers presented in bases greater than 10, more than ten
digit symbols will be required. Unfortunately, the counting system adopted by Western
societies is such that we have only ten digits available (perhaps the
Babylonians, who developed a mathematical system equivalent to base 60, had
it right!). Hence, for base 11 we require 11 digits but have only ten
numerals available; as such, it is common to use capital letters A, B, C, D, E,
etc. to represent the decimal numbers 10, 11, 12, 13, 14, etc. The base representations
for the Prime numbers two to 37 (dimensions one to 12) are shown in Table 5.9.
Step 2. For each integer in Table 5.9, reverse the order of the digits and
convert the resulting value into a decimal number by placing the reversed
value after the decimal point. We treat the letters as a special case, substituting
the value that the letter represents and keeping that value unchanged. The
result of this process is shown in Table 5.10.
Step 3. For each decimalized number, convert the number back to base 10 using Equation (5.19):

$$u_k^{(r)} = \sum_{\ell=1}^{L} b_\ell^{(r)} / p_k^{\ell}, \qquad (5.19)$$

where $p_k$ represents the Prime number used as the base for the $k$th parameter, and $b_\ell^{(r)}$ represents the $\ell$th digit after the decimal place for draw $r$. For example,
consider the 13th value for Prime 2, 0.1011. Based on Equation (5.19), this translates to $1/2^1 + 0/2^2 + 1/2^3 + 1/2^4 = 0.6875$. Likewise, consider the 20th draw based on Prime 3, that being 0.202. The conversion back to base 10 would be $2/3^1 + 0/3^2 + 2/3^3 = 0.7407407$. For the non-decimal values (i.e., the ones that we used letters to represent previously), the process is somewhat simplified, in that the conversion back to base 10 is:

$$u_k^{(r)} = b_\ell^{(r)} / p_k. \qquad (5.20)$$
Thus, for example, consider the 12th draw for the sequence generated using Prime 13: the draw is simply $12/13 = 0.9230769$.
Step 4. Remove the first row, related to r = 0.
Using the process described above, the Halton sequence for R = 20 draws is given in Table 5.11. Note that in addition to deleting the first row of the
sequence related to r = 0, it is commonly advised to also delete the rows
associated with r = 1 to 10 (see Bratley et al. 1992 or Morokoff and Caflisch
1995). Although not necessary, this is done as the generated sequences may be
sensitive to the starting point chosen. Note that where the first r (not including
the r = 0) rows are discarded, it is necessary to construct longer sequences to
derive the necessary number of draws. For example, if the analyst wishes to
make use of 500 Halton draws, but at the same time delete the first 10, then
510 draws must be constructed.
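The digit-reversal construction in Steps 1 to 3 (and the discarding of r = 0 in Step 4) can be sketched in a few lines of Python. The function name `halton` and the loop structure are our own illustration, not the book's Nlogit syntax:

```python
def halton(r, base):
    """Return the r-th Halton draw for a given prime base by reversing
    the base-`base` digits of r behind the decimal point (Steps 1-3)."""
    u, f = 0.0, 1.0 / base
    while r > 0:
        r, digit = divmod(r, base)   # peel off the lowest-order digit of r
        u += digit * f               # place it after the "decimal" point
        f /= base
    return u

# Step 4: start at r = 1, discarding the row associated with r = 0
draws = [halton(r, 2) for r in range(1, 21)]
```

For example, `halton(13, 2)` returns 0.6875 and `halton(20, 3)` returns 0.7407407..., matching the worked examples and Table 5.11.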
Halton sequences generated in the above fashion will exhibit a certain
degree of correlation, particularly among sequences generated from higher
Prime numbers. Indeed, when two large Prime-based sequences associated
with two high dimensions are paired, the sampled points increasingly lie on
Table 5.10 Converting the Base values to decimals
r Prime 2 Prime 3 Prime 5 Prime 7 Prime 11 Prime 13 Prime 17 Prime 19 Prime 23 Prime 29 Prime 31 Prime 37
0 0 0 0 0 0 0 0 0 0 0 0 0
1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
2 0.01 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
3 0.11 0.01 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
4 0.001 0.11 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4
5 0.101 0.21 0.01 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
6 0.011 0.02 0.11 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6
7 0.111 0.12 0.21 0.01 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7
8 0.0001 0.22 0.31 0.11 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8
9 0.1001 0.001 0.41 0.21 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9
10 0.0101 0.101 0.02 0.31 10 10 10 10 10 10 10 10
11 0.1101 0.201 0.12 0.41 0.01 11 11 11 11 11 11 11
12 0.0011 0.011 0.22 0.51 0.11 12 12 12 12 12 12 12
13 0.1011 0.111 0.32 0.61 0.21 0.01 13 13 13 13 13 13
14 0.0111 0.211 0.42 0.02 0.31 0.11 14 14 14 14 14 14
15 0.1111 0.021 0.03 0.12 0.41 0.21 15 15 15 15 15 15
16 0.00001 0.121 0.13 0.22 0.51 0.31 16 16 16 16 16 16
17 0.10001 0.221 0.23 0.32 0.61 0.41 0.01 17 17 17 17 17
18 0.01001 0.002 0.33 0.42 0.71 0.51 0.11 18 18 18 18 18
19 0.11001 0.102 0.43 0.52 0.81 0.61 0.21 0.01 19 19 19 19
20 0.00101 0.202 0.04 0.62 0.91 0.71 0.31 0.11 20 20 20 20
Table 5.11 Halton sequences for Primes 2 to 37
R Prime 2 Prime 3 Prime 5 Prime 7 Prime 11 Prime 13 Prime 17 Prime 19 Prime 23 Prime 29 Prime 31 Prime 37
1 0.5 0.3333333 0.2 0.1428571 0.090909 0.0769231 0.0588235 0.0526316 0.0434783 0.0344828 0.0322581 0.027027
2 0.25 0.6666667 0.4 0.2857143 0.181818 0.1538462 0.1176471 0.1052632 0.0869565 0.0689655 0.0645161 0.0540541
3 0.75 0.1111111 0.6 0.4285714 0.272727 0.2307692 0.1764706 0.1578947 0.1304348 0.1034483 0.0967742 0.0810811
4 0.125 0.4444444 0.8 0.5714286 0.363636 0.3076923 0.2352941 0.2105263 0.173913 0.137931 0.1290323 0.1081081
5 0.625 0.7777778 0.04 0.7142857 0.454545 0.3846154 0.2941176 0.2631579 0.2173913 0.1724138 0.1612903 0.1351351
6 0.375 0.2222222 0.24 0.8571429 0.545455 0.4615385 0.3529412 0.3157895 0.2608696 0.2068966 0.1935484 0.1621622
7 0.875 0.5555556 0.44 0.0204082 0.636364 0.5384615 0.4117647 0.3684211 0.3043478 0.2413793 0.2258065 0.1891892
8 0.0625 0.8888889 0.64 0.1632653 0.727273 0.6153846 0.4705882 0.4210526 0.3478261 0.2758621 0.2580645 0.2162162
9 0.5625 0.037037 0.84 0.3061224 0.818182 0.6923077 0.5294118 0.4736842 0.3913043 0.3103448 0.2903226 0.2432432
10 0.3125 0.3703704 0.08 0.4489796 0.909091 0.7692308 0.5882353 0.5263158 0.4347826 0.3448276 0.3225806 0.2702703
11 0.8125 0.7037037 0.28 0.5918367 0.008264 0.8461538 0.6470588 0.5789474 0.4782609 0.3793103 0.3548387 0.2972973
12 0.1875 0.1481481 0.48 0.7346939 0.099174 0.9230769 0.7058824 0.6315789 0.5217391 0.4137931 0.3870968 0.3243243
13 0.6875 0.4814815 0.68 0.877551 0.190083 0.0059172 0.7647059 0.6842105 0.5652174 0.4482759 0.4193548 0.3513514
14 0.4375 0.8148148 0.88 0.0408163 0.280992 0.0828402 0.8235294 0.7368421 0.6086957 0.4827586 0.4516129 0.3783784
15 0.9375 0.2592593 0.12 0.1836735 0.371901 0.1597633 0.8823529 0.7894737 0.6521739 0.5172414 0.483871 0.4054054
16 0.03125 0.5925926 0.32 0.3265306 0.46281 0.2366864 0.9411765 0.8421053 0.6956522 0.5517241 0.516129 0.4324324
17 0.53125 0.9259259 0.52 0.4693878 0.553719 0.3136095 0.0034602 0.8947368 0.7391304 0.5862069 0.5483871 0.4594595
18 0.28125 0.0740741 0.72 0.6122449 0.644628 0.3905325 0.0622837 0.9473684 0.7826087 0.6206897 0.5806452 0.4864865
19 0.78125 0.4074074 0.92 0.755102 0.735537 0.4674556 0.1211073 0.0027701 0.826087 0.6551724 0.6129032 0.5135135
20 0.15625 0.7407407 0.16 0.8979592 0.826446 0.5443787 0.1799308 0.0554017 0.8695652 0.6896552 0.6451613 0.5405405
parallel lines. This is illustrated in Figure 5.5. The panel on the left of
Figure 5.5 plots the space covered or evaluated for R = 1,000 draws based on
Halton sequences generated from Primes 2 and 3. The panel on the right of
Figure 5.5 plots for the same number of draws the coverage based on Halton
sequences generated from Primes 61 and 67 (dimensions 18 and 19). As
shown in the figure, the use of higher dimensions leads to a rapid deterioration
in the uniformity of the coverage of Halton sequences, with a noticeable
deterioration after only five dimensions (i.e., Prime 13 onwards) (e.g., Bhat
2001, 2003).
To break this correlation, researchers have suggested several ways in
which the Halton sequences can be randomized. We discuss two of these
approaches in Sections 5.2.2 and 5.2.3. Nevertheless, in addition to increas-
ingly worsening correlation structures, the use of higher dimensions leads to a
need to use more draws. This can clearly be seen in Table 5.11 by comparing
the sequences generated from Primes 2 to 19 (dimensions one to eight) to
sequences generated from Prime 23 onwards.
For the first set of sequences, the Halton sequences cover the zero-one space
at least once before starting again. For example, examining the sequence
generated from Prime 19, the sequence begins with a value close to zero,
before increasing to close to one (draw 18), before starting the cycle once more
with a value close to zero. As such, within 20 draws, sequences generated using
Primes 2 to 19 will have completed at least one cycle between zero and one.
Note that for sequences generated from Prime numbers greater than 19, this
is not the case. Indeed, for Prime 37, the sequence requires 36 draws before
it begins the cycle anew. As a consequence, higher dimensional Halton
sequences based on larger Prime numbers will require many more draws in
Figure 5.6 Multivariate Normal distributions for 100 Halton sequences based on different Primes
Figure 5.7 Multivariate Normal distributions for 1,000 Halton sequences based on different Primes
Figure 5.8 Multivariate Normal distribution for 5,000 Halton draws based on Primes 61 and 67
Note, however, that the process need not break the correlation structure as advertised,
particularly when larger Prime numbers are used to construct the Halton
sequences. For example, panel (a) of Figure 5.9 plots the unit space covered or
evaluated for R = 1,000 draws based on randomized Halton sequences gener-
ated from Primes 61 and 67, where Z61 = 663 and Z67 = 931. As can be seen in
the plot, a similar correlation pattern as existed before remains. Further still,
as can be seen in panel (b) of Figure 5.9, which plots the multivariate normal
distribution simulated using the same randomized Halton draws, even with
1,000 draws the approximation to the assumed density remains less than
desirable.
Figure 5.9 Example of 1,000 randomized Halton draws based on Primes 61 (Z = 663) and 67 (Z = 931): (a) unit space covered for the randomized Halton sequence; (b) multivariate Normal distribution of the randomized Halton sequence
Figure 5.10 Example of 1,000 shuffled Halton draws based on Primes 61 and 67: (a) unit space covered for the shuffled Halton sequence; (b) multivariate Normal distribution of the shuffled Halton sequence
$$u_k^{(r)} = \frac{r-1}{R} + \xi_k, \quad r = 1, \ldots, R, \qquad (5.22)$$
Figure 5.11 Example of 5,000 shuffled Halton draws based on Primes 61 and 67: (a) unit space covered for the shuffled Halton sequence; (b) multivariate Normal distribution of the shuffled Halton sequence
Finally, the order of values within each sequence is randomized. The final sequence might therefore be 0.296, 0.896, 0.696, 0.496, and 0.096.
Figure 5.12 demonstrates the construction of 10 MLHS random draws for
five different sequences using Microsoft Excel rand functions. In row 2 we
compute the rand draws for ξk by first calculating 1/R and multiplying this value
by different random draws for each of the five sequences. The random draws are
taken from a random uniform distribution based on the Microsoft Excel rand()
function. The MLHS draws are then calculated as per Equation (5.22).
Once the random draws have been computed, the analyst should fix the
values ξk or else the draws will continually change every time an operation is
performed in Excel. Further, the analyst should take the draws as calculated
and randomize each column, as shown in Figure 5.13.
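The same construction can be sketched outside of Excel. The minimal Python version below (the function name and the fixed seed are our own assumptions) implements the grid-plus-shift of Equation (5.22) and the final per-column shuffle:

```python
import random

def mlhs(R, K, seed=12345):
    """Modified Latin Hypercube Sampling: K columns of R draws on (0, 1).
    Each column is the evenly spaced grid (r - 1)/R shifted by its own
    random offset xi_k drawn from [0, 1/R), then shuffled."""
    rng = random.Random(seed)        # fixing the seed plays the role of freezing the Excel draws
    columns = []
    for k in range(K):
        xi = rng.random() / R        # the (1/R) x rand() step from row 2 of Figure 5.12
        column = [(r - 1) / R + xi for r in range(1, R + 1)]
        rng.shuffle(column)          # randomize the order of values within the column
        columns.append(column)
    return columns

columns = mlhs(R=10, K=5)            # 10 MLHS draws for 5 sequences, as in Figure 5.12
```

Sorting any column recovers the evenly spaced grid with spacing 1/R, which is the defining property of the method.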
Table 5.13 shows the first five primitive polynomials where the first dimension
will use the first primitive polynomial, the second dimension the second, etc.
Table 5.13 The first five primitive polynomials

Degree   Primitive polynomial(s)
0        1                 −
1        x + 1             −
2        x^2 + x + 1       −
3        x^3 + x + 1       x^3 + x^2 + 1
For higher dimensions, several primitive polynomials will exist from which
the analyst may randomly select one.
Next, the set of values $m_r$ is located for each draw r, using the coefficients of the primitive polynomial and a recursive relationship for r > d such that:

$$m_r = 2c_1 m_{r-1} \oplus 2^2 c_2 m_{r-2} \oplus \cdots \oplus 2^{d-1} c_{d-1} m_{r-d+1} \oplus 2^d m_{r-d} \oplus m_{r-d}, \qquad (5.24)$$

where c1, c2, . . ., cd−1 are the coefficients of the primitive polynomial of degree d and ⊕ is the bit-by-bit exclusive-or (EOR) operator. For example, 14 ⊕ 24 expanded to base 2 is represented as:

$$01110 \oplus 11000 = 10110,$$

that is, 22 in base 10.
As Equation (5.24) generates values of $m_r$ for r > d only, the first d odd integers must be supplied rather than constructed. Any odd values can be chosen provided the condition $0 < m_r < 2^r$ is satisfied. A set of direction numbers is next generated by converting each $m_r$ value into a binary fraction in the base 2 number system such that:

$$v^{(r)} = \frac{m_r}{2^r} \quad \text{in base 2}. \qquad (5.25)$$
The draws are then generated recursively as:

$$\phi^{(n+1)} = \phi^{(n)} \oplus v^{(q)},$$

where $\phi^{(0)} = 0$, $v^{(q)}$ is the qth direction number, and q is the position of the rightmost zero bit in the base 2 expansion of n. For example, the rightmost zero bit for n = 9 represented in the base 2 number system (1001) corresponds to q = 2.
To demonstrate, consider the construction of the first six Sobol draws using the third-degree primitive polynomial $x^3 + x^2 + 1$. Arbitrarily choosing m1, m2, and m3 to equal 1, 3, and 7, respectively, then for r = 4 to 6 we obtain the values in Table 5.14 for $m_r$ and $v^{(r)}$.
The last step involves the actual generation of the draws themselves. For the first draw, we consider n = 0, whose binary expansion has its rightmost zero bit at q = 1, meaning that we apply v(1). Hence for the first draw we obtain ϕ(1) = ϕ(0) ⊕ v(1) = 0.0 ⊕ 0.1 = 0.1, which in base 10 gives the value 0.5. For the second draw, with n = 1, the binary expansion is 1 and the rightmost zero bit is at q = 2. As such, for the second draw we obtain ϕ(2) = ϕ(1) ⊕ v(2) = 0.10 ⊕ 0.11 = 0.01, which in base 10 returns the value 0.25. The generation of the remaining values continues in this manner. The entire process is shown in Table 5.14.
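The recursion and draw generation above can be sketched in Python. The helper names are our own, and the sketch assumes the primitive polynomial x^3 + x^2 + 1 (c1 = 1, c2 = 0) with the initial values m1, m2, m3 = 1, 3, 7 used in the worked example:

```python
def extend_m(bits, d, c, m_init):
    """Extend the initial odd integers m_1..m_d using the recurrence
    m_r = 2*c_1*m_{r-1} XOR ... XOR 2^d*m_{r-d} XOR m_{r-d}."""
    m = list(m_init)
    for r in range(d, bits):
        new = m[r - d] ^ (m[r - d] << d)   # the 2^d m_{r-d} XOR m_{r-d} terms
        for j in range(1, d):              # the 2^j c_j m_{r-j} terms
            if c[j - 1]:
                new ^= m[r - j] << j
        m.append(new)
    return m

def sobol_draws(R, bits=30, d=3, c=(1, 0), m_init=(1, 3, 7)):
    """First R draws of one Sobol dimension for x^3 + x^2 + 1."""
    m = extend_m(bits, d, c, m_init)
    # direction numbers v(r) = m_r / 2^r, held as integers scaled by 2^bits
    v = [m[r] << (bits - r - 1) for r in range(bits)]
    draws, phi = [], 0
    for n in range(R):
        q = 0
        while (n >> q) & 1:                # locate the rightmost zero bit of n
            q += 1
        phi ^= v[q]                        # phi(n+1) = phi(n) XOR v(q)
        draws.append(phi / 2 ** bits)
    return draws
```

Here `extend_m` reproduces m4 = 14 ⊕ 8 ⊕ 1 = 7 and m5 = 14 ⊕ 24 ⊕ 3 = 21, and the first two draws are 0.5 and 0.25, as in the worked example.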
Table 5.15 presents the first 10 Sobol draws for the first 10 dimensions of
Sobol sequences.
Figure 5.14 plots the coverage in unit space for 250 Sobol draws based on dimensions one and two, and on dimensions 19 and 20. Although patterns become increasingly discernible in higher dimensions, coverage of the space tends to remain superior to that of Halton sequences with the same number of draws. Nevertheless, as shown in Figure 5.15, the number of draws required to adequately simulate
Table 5.15 The first 10 Sobol draws for the first 10 dimensions

R Sobol 1 Sobol 2 Sobol 3 Sobol 4 Sobol 5 Sobol 6 Sobol 7 Sobol 8 Sobol 9 Sobol 10
1 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
2 0.7500 0.2500 0.7500 0.2500 0.7500 0.2500 0.7500 0.2500 0.2500 0.7500
3 0.2500 0.7500 0.2500 0.7500 0.2500 0.7500 0.2500 0.7500 0.7500 0.2500
4 0.3750 0.3750 0.6250 0.1250 0.8750 0.8750 0.1250 0.6250 0.1250 0.8750
5 0.8750 0.8750 0.1250 0.6250 0.3750 0.3750 0.6250 0.1250 0.6250 0.3750
6 0.6250 0.1250 0.3750 0.3750 0.1250 0.6250 0.8750 0.8750 0.3750 0.1250
7 0.1250 0.6250 0.8750 0.8750 0.6250 0.1250 0.3750 0.3750 0.8750 0.6250
8 0.1875 0.3125 0.3125 0.6875 0.5625 0.1875 0.0625 0.9375 0.1875 0.0625
9 0.6875 0.8125 0.8125 0.1875 0.0625 0.6875 0.5625 0.4375 0.6875 0.5625
10 0.9375 0.0625 0.5625 0.9375 0.3125 0.4375 0.8125 0.6875 0.4375 0.8125
Figure 5.14 Unit space covered by 250 Sobol draws based on dimensions 1 and 2 (left panel) and dimensions 19 and 20 (right panel)
Figure 5.15 Multivariate Normal distributions for 250 Sobol draws based on different dimensions
An issue with the use of antithetic draws lies in the fact that the number of draws will have to be a multiple of $2^K$. That is, unlike Halton, Sobol, and MLHS draws, where the analyst can specify any value for R, the use of antithetic draws constrains the choice of R.
The simulated maximum likelihood function requires the estimation of the log of the simulated choice probabilities. As such, even though the simulated probabilities themselves may be unbiased for the true probabilities for a given number of draws, R, the logs of the probabilities may not be. If this is the case, then the simulated maximum likelihood function will also be biased. While this bias will decrease as R increases, one must also consider the impact of the number of choice observations. Train (2009) provides a discussion of these issues, and hence we omit a detailed treatise here, providing only a brief summary of the arguments.
Firstly, as argued by Train (2009), if R is fixed, then the simulated maximum likelihood estimator will fail to converge to the true parameter estimates as the number of choice observations in the sample, S, increases. If R increases at the same rate as S, the simulated maximum likelihood estimator will be consistent; however, it will not be asymptotically normal, meaning that it will not be possible to estimate the standard errors (see Section 5.7). Indeed, R must increase at a rate greater than $\sqrt{S}$ for the simulated maximum likelihood estimator to be consistent, asymptotically normal, and efficient, in which case it will be equivalent to the maximum likelihood estimator. The corollary of this is that the number of draws used in practice should increase as the number of choice observations in the sample increases.
The discussion to date has implicitly assumed that the random parameters
are drawn from univariate distributions. This is because in the simulation
process as described, each individual random estimate, whether it be a ran-
dom taste parameter or random error term, is assigned to a unique PMC or
QMC generated sequence. In theory, although not in practice, each sequence
is unrelated to each other. Earlier we noted that correlation between draws for
different dimensions can have a positive effect on the approximation of
whatever integral is being evaluated. In describing the various QMC methods,
we plotted the coverage of the draws in the 0–1 space and related the resulting
patterns to correlation. To demonstrate the issue further, in Table 5.17,
we show the correlation structures for the first 12 dimensions of Halton
sequences assuming 50, 100, 500, and 1,000 draws. As shown in the table,
several of the sequences are non-trivially correlated when a low number of
draws is taken. Nevertheless, we note that the correlations tend to shrink as
the number of draws increases. While correlation between the draws may aid
in evaluating the integral of interest, it also has implications for interpreting
the results. For example, let:

$$\beta_{n1} = \bar{\beta}_1 + \eta_1 z_{n1} \quad \text{and} \quad \beta_{n2} = \bar{\beta}_2 + \eta_2 z_{n2}$$

represent two random parameters, where zn1 and zn2 are random draws from two univariate distributions – say, two standard Normals – and $\bar{\beta}_k$ and $\eta_k$ are the mean and deviation parameters of the two distributions, k = 1, 2. Let ϖ1 and ϖ2 represent the simulated standard deviations of the two distributions.
Given the above, we note that if zn1 and zn2 are correlated, then by definition
so too must ϖ1 and ϖ2 and hence βn1 and βn2. As such, while the analyst has
assumed that βn1 and βn2 are independent and interpreted the model as if this
were the case, the simulation process has induced correlations (or covar-
iances) between the two random parameters.
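The induced correlation is easy to reproduce. The sketch below (helper names are our own) computes the Pearson correlation between the Prime 29 and Prime 31 Halton sequences with only 50 draws, which panel (a) of Table 5.17 reports as a non-trivial 0.721:

```python
def halton(r, base):
    """r-th Halton draw for a prime base (digit-reversal construction)."""
    u, f = 0.0, 1.0 / base
    while r > 0:
        r, digit = divmod(r, base)
        u += digit * f
        f /= base
    return u

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

h29 = [halton(r, 29) for r in range(1, 51)]   # 50 draws from Prime 29
h31 = [halton(r, 31) for r in range(1, 51)]   # 50 draws from Prime 31
print(round(pearson(h29, h31), 3))            # substantial positive correlation
```

Repeating the exercise with 1,000 draws shows the correlation shrinking, consistent with panel (d) of Table 5.17.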
The assumption that correlation exists between random terms need not be a
concern. Indeed, in reality, some or all of the tastes that decision makers have
towards different attributes may be correlated. For example, there might exist
a time–cost trade-off such that decision makers who are more time sensitive
are less cost sensitive, while those who are more cost sensitive are less time
sensitive. In this case, one would expect there to exist a negative correlation
between the tastes for time and cost. Likewise, the flexibility of the probit
model allows for correlation between the random error terms. In both of these
cases, the assumption that the random estimates should be drawn from
uncorrelated univariate distributions no longer holds. The problem is that
in drawing from univariate densities as we have described above, the degree of
correlation is an input into the model that, aside from using different numbers
and types of draws, the analyst has no control over or, without performing
post-estimation simulations, any way of retrieving.
Rather than draw from separate univariate distributions, the solution is
to draw directly from the multivariate distribution. In doing so, it should be
possible to estimate the covariances of the random terms and hence recover
the degree of correlation between the estimates. Unfortunately, this is not
so straightforward and to date is only truly feasible if one is working with a
multivariate Normal distribution. The process involves making use of a
process known as Cholesky factorization or, alternatively, a Cholesky
transformation. Let βn be a vector of K normally distributed elements
such that:
$$\beta_n \sim N(\bar{\beta}, \Omega_r), \qquad (5.29)$$
where Ωr represents the covariance matrix of βn. Note that Ωr differs from the
covariance matrix Ωe described in Chapter 4. Ωe was the covariance matrix of
the error terms, while Ωr represents the covariances of the random parameter
estimates.
In the multivariate case, the aim is to estimate all the elements of Ωr. This
includes the off-diagonal elements which describe the covariances (and hence
correlations) between the random parameter estimates. Cholesky factorization involves constructing a lower triangular matrix, C, such that $\Omega_r = CC'$, as shown in Equation (5.30):

$$\Omega_r = CC' = \begin{pmatrix} s_{11} & 0 & \cdots & 0 \\ s_{21} & s_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ s_{K1} & s_{K2} & \cdots & s_{KK} \end{pmatrix} \begin{pmatrix} s_{11} & s_{21} & \cdots & s_{K1} \\ 0 & s_{22} & \cdots & s_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & s_{KK} \end{pmatrix}. \qquad (5.30)$$
$$s_{k1} = \eta_{k1} / s_{11}, \quad \forall k \neq 1 \ \text{(lower off-diagonal elements in the first column)}, \qquad (5.31c)$$

$$s_{kl} = \left( \eta_{kl} - \sum_{m=1}^{l-1} s_{km} s_{lm} \right) \Big/ s_{ll}, \quad \forall k \neq 1, \ k \neq l \ \text{(lower off-diagonal elements not in the first column)}. \qquad (5.31d)$$
Once computed, the values for ϖk may then be determined such that:

$$\begin{pmatrix} \varpi_1 \\ \varpi_2 \\ \varpi_3 \\ \varpi_4 \end{pmatrix} = \begin{pmatrix} s_{11} & 0 & 0 & 0 \\ s_{21} & s_{22} & 0 & 0 \\ s_{31} & s_{32} & s_{33} & 0 \\ s_{41} & s_{42} & s_{43} & s_{44} \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix}, \qquad (5.32)$$

that is,

$$\begin{aligned} \varpi_1 &= s_{11} z_1, \\ \varpi_2 &= s_{21} z_1 + s_{22} z_2, \\ \varpi_3 &= s_{31} z_1 + s_{32} z_2 + s_{33} z_3, \\ \varpi_4 &= s_{41} z_1 + s_{42} z_2 + s_{43} z_3 + s_{44} z_4, \end{aligned} \qquad (5.33)$$

where $s_{kl}$ are parameters to be estimated and $z_k$ are draws from univariate standard Normal distributions.
As an aside, Equations (5.31a–d) are not used in practice. As stated above, it is the elements
in C that are estimated and not those in Ωr. That is, in practice the matrix C is computed, from
which Ωr is later determined. Equations (5.31a–d) assume that Ωr is known, and from this
are used to calculate the elements of C. As such, we show these equations simply to
demonstrate the relationship between the two matrices (see Appendix 5A).
From Equation (5.33), it can be seen that the Cholesky factorization process
correlates the K terms based on K independent components, zk. For example,
in the above, ϖ2 and ϖ1 are correlated due to the common influence of z1. Note
that the two terms are not perfectly correlated, given that z2 affects only ϖ2 and not ϖ1. Similar patterns of correlation are derived for the other paired combinations of ϖk values.
To demonstrate the above, assume that the following Cholesky matrix was obtained from a hypothetical model:

$$C = \begin{pmatrix} 1.361 & 0 & 0 & 0 \\ 0.613 & 0.094 & 0 & 0 \\ 0.072 & 0.037 & 0.219 & 0 \\ 0.106 & 0.109 & 0.095 & 0.039 \end{pmatrix}, \qquad (5.34)$$

such that:

$$\begin{aligned} \varpi_1 &= 1.361 z_1, \\ \varpi_2 &= 0.613 z_1 + 0.094 z_2, \\ \varpi_3 &= 0.072 z_1 + 0.037 z_2 + 0.219 z_3, \\ \varpi_4 &= 0.106 z_1 + 0.109 z_2 + 0.095 z_3 + 0.039 z_4. \end{aligned} \qquad (5.35)$$
Given the above estimates, the covariance matrix of the random terms, Ωr, is thus computed as:

$$\Omega_r = \begin{pmatrix} 1.361 & 0 & 0 & 0 \\ 0.613 & 0.094 & 0 & 0 \\ 0.072 & 0.037 & 0.219 & 0 \\ 0.106 & 0.109 & 0.095 & 0.039 \end{pmatrix} \begin{pmatrix} 1.361 & 0.613 & 0.072 & 0.106 \\ 0 & 0.094 & 0.037 & 0.109 \\ 0 & 0 & 0.219 & 0.095 \\ 0 & 0 & 0 & 0.039 \end{pmatrix} = \begin{pmatrix} 1.853 & 0.835 & 0.098 & 0.144 \\ 0.835 & 0.385 & 0.048 & 0.075 \\ 0.098 & 0.048 & 0.055 & 0.033 \\ 0.144 & 0.075 & 0.033 & 0.034 \end{pmatrix}. \qquad (5.36)$$
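The product $\Omega_r = CC'$ is easy to verify numerically. A small pure-Python check (variable names are our own) reproduces the published covariance matrix to within the rounding of the reported C values:

```python
# Hypothetical Cholesky matrix C from Equation (5.34)
C = [
    [1.361, 0.000, 0.000, 0.000],
    [0.613, 0.094, 0.000, 0.000],
    [0.072, 0.037, 0.219, 0.000],
    [0.106, 0.109, 0.095, 0.039],
]

# Omega_r = C C': element (k, l) is the dot product of rows k and l of C
omega_r = [[sum(C[k][m] * C[l][m] for m in range(4)) for l in range(4)]
           for k in range(4)]

for row in omega_r:
    print([round(x, 3) for x in row])   # matches Equation (5.36) up to rounding
```

Because C is lower triangular, the result is symmetric and positive semi-definite by construction, which is exactly why the Cholesky parameterization is used in estimation.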
Note that the multivariate case will collapse to the univariate case when $s_{kl} = 0, \ \forall k \neq l$; that is, $\varpi_k = s_{kk} z_k$. The correlation between any two random terms, $\eta_k$ and $\eta_l$, is then given by:

$$\rho(\eta_k, \eta_l) = \frac{\operatorname{cov}(\eta_k, \eta_l)}{\sigma_{\eta_k} \times \sigma_{\eta_l}}. \qquad (5.38)$$
To demonstrate how the process works in practice, assume now that the four random parameters have the following moments: $\beta_1 \sim N(-0.5, 0.1)$, $\beta_2 \sim N(0.25, 0.05)$, $\beta_3 \sim N(-1.00, 0.60)$, and $\beta_4 \sim N(0.80, 0.20)$. Further assume that the random parameters are correlated, with C equal to Equation (5.34). Given this information, four steps are followed.
Step 1: For each random parameter, k, draw R independent uniformly
distributed random numbers on the interval [0,1]. For example, Figure 5.16
shows the first 15 out of 100 draws generated from Halton sequences for K=4
random parameters.
Step 2: Transform the R independent uniformly distributed random num-
bers into standard Normal distributions. Figure 5.17 demonstrates this trans-
formation using the Microsoft Excel formula normsinv() (see cell I22). As
shown in the figure, the univariate standard Normal distributions have corre-
lations close to zero (Table 5.17).
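Steps 1 and 2 can be sketched end to end in Python, with `statistics.NormalDist().inv_cdf` playing the role of Excel's normsinv(), continuing through the Cholesky transformation of Equation (5.32). The helper names are our own, and the means are our reading of the hypothetical moments above:

```python
from statistics import NormalDist

def halton(r, base):
    """r-th Halton draw for a prime base."""
    u, f = 0.0, 1.0 / base
    while r > 0:
        r, digit = divmod(r, base)
        u += digit * f
        f /= base
    return u

R, primes = 100, [2, 3, 5, 7]

# Step 1: R uniform draws on (0, 1) per parameter, from Halton sequences
uniform = [[halton(r, p) for r in range(1, R + 1)] for p in primes]

# Step 2: transform to standard Normals (Excel's normsinv())
z = [[NormalDist().inv_cdf(u) for u in seq] for seq in uniform]

# Cholesky transformation of Equation (5.32): varpi = C z
C = [[1.361, 0.000, 0.000, 0.000],
     [0.613, 0.094, 0.000, 0.000],
     [0.072, 0.037, 0.219, 0.000],
     [0.106, 0.109, 0.095, 0.039]]
varpi = [[sum(C[k][m] * z[m][r] for m in range(4)) for r in range(R)]
         for k in range(4)]

# Add the means to obtain the correlated simulated random parameters
means = [-0.5, 0.25, -1.0, 0.8]
beta = [[means[k] + varpi[k][r] for r in range(R)] for k in range(4)]
```

Unlike the Excel construction, the Halton draws here are deterministic, so there is no need to freeze the values against recalculation.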
Figure 5.16 Draw R uniformly distributed random numbers on the interval [0,1]
Table 5.17 Correlation structure of Halton sequences for dimensions 1 to 12 for 50 to 1,000 draws

[Table 5.17 reports four 11 × 11 correlation matrices for the Halton dimensions generated by the primes 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, and 31, for increasing numbers of draws; panels (c) and (d) report 500 and 1,000 draws. The off-diagonal correlations shrink toward zero as the number of draws grows: the largest, between the p29 and p31 dimensions, falls from 0.721 and 0.404 in the first two panels to 0.066 at 500 draws and 0.043 at 1,000 draws.]
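The pattern in Table 5.17 is easy to reproduce. The following is a minimal sketch (pure Python; the helper names are our own, not Nlogit syntax) that generates Halton draws for two prime bases and computes their correlation, illustrating how the strong correlation between large-prime dimensions at 50 draws washes out as the number of draws grows:

```python
import math

def halton(index, base):
    """index-th element (1-based) of the Halton sequence for a given prime base."""
    f, h = 1.0, 0.0
    while index > 0:
        f /= base
        h += (index % base) * f
        index //= base
    return h

def halton_sequence(n_draws, base):
    return [halton(i, base) for i in range(1, n_draws + 1)]

def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

# Correlation between the dimensions generated by primes 29 and 31
c_50 = corr(halton_sequence(50, 29), halton_sequence(50, 31))
c_1000 = corr(halton_sequence(1000, 29), halton_sequence(1000, 31))
```

With this indexing, the 50-draw correlation for the p29–p31 pair is large while the 1,000-draw correlation is near zero, mirroring the pattern in the table (the exact values depend on whether any initial draws are discarded).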
Given the knowledge of how to draw from different densities, we are now
able to describe how the choice probabilities may be calculated for models in
which the choice probabilities do not have a closed analytical form. This
includes the probit model and any logit model where there are random
parameter estimates, including the MMNL and GMNL models. Such models
require that the choice probabilities be simulated using either PMC or
QMC methods.
We begin with a discussion of how to calculate the choice probabilities of the probit model.
$$\Omega(e_i, e_j) = \begin{pmatrix} 1.000 & 0.997 & 0 & 0 \\ 0.997 & 1.000 & 0 & 0 \\ 0 & 0 & 1.000 & 0 \\ 0 & 0 & 0 & 1.000 \end{pmatrix}, \qquad (5.42)$$

the Cholesky transformation for $\Omega_e$ is computed:

$$C = \begin{pmatrix} 1.000 & 0 & 0 & 0 \\ 0.997 & 0.072 & 0 & 0 \\ 0 & 0 & 1.000 & 0 \\ 0 & 0 & 0 & 1.000 \end{pmatrix}, \qquad (5.43)$$

which we use to correlate the elements contained within $e^r_n$ as described in Section 5.5.
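The mechanics of Equations (5.42) and (5.43) can be sketched in a few lines of pure Python (the helper names are our own). Note that with the correlation rounded to 0.997, the (2,2) element of the factor works out to about 0.077; the 0.072 printed in Equation (5.43) presumably reflects the unrounded inputs:

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive definite matrix (a = L L')."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# Correlation structure from Equation (5.42): errors 1 and 2 correlated at 0.997
omega = [[1.0, 0.997, 0.0, 0.0],
         [0.997, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
C = cholesky(omega)

random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(4)]                 # independent N(0,1) draws
e = [sum(C[i][k] * z[k] for k in range(4)) for i in range(4)]  # correlated errors e = Cz
```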
Assuming that Halton sequences are used, for r = 1 we obtain $e^1_n = (0.000, -0.031, -0.842, -1.068)$. The utilities for each of the $j$ alternatives may now be constructed such that $U^1_{nsj} = V_{nsj} + e^1_{nsj}$. Note that if random taste parameters are assumed, $V_{nsj}$ will require that one or more of the parameters be drawn from a simulated distribution, and hence will also require a superscript, leading to $U^1_{nsj} = V^1_{nsj} + e^1_{nsj}$. Let $U^1_n$ represent the vector of utilities. Given
the above information, we obtain $U^1_n = (6.766, 6.851, 4.393, 5.015)$. In this case, j = 2 is observed to have the highest utility of the four alternatives, and hence the choice index becomes $I^1_n = (0, 1, 0, 0)$. Repeating the process for r = 2, we obtain $e^2_n = (-0.674, -0.642, -0.253, -0.566)$, such that $U^2_n = (6.092, 6.240, 4.982, 5.517)$. Once again, the second alternative is observed to have the highest utility, so $I^2_n = (0, 1, 0, 0)$. We repeat this process R = 1,000 times.
The simulated probability for alternative j is then computed as the average
number of times that alternative is accepted over the R draws. We leave it to the
reader to confirm that the simulated choice probabilities are 0.041, 0.620, 0.077
and 0.262 for j = 1 to 4, respectively.
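As a concrete sketch, the AR calculation just described can be written as follows (pure Python, with pseudo-random rather than Halton draws, so the resulting shares will be close to, but not exactly, the 0.041, 0.620, 0.077, and 0.262 reported above; the function name is our own):

```python
import random

def ar_probit_probabilities(V, C, R=1000, seed=12345):
    """Accept-reject (AR) simulator: the simulated probability of each
    alternative is the share of draws on which U = V + C z is largest."""
    J = len(V)
    rng = random.Random(seed)
    wins = [0] * J
    for _ in range(R):
        z = [rng.gauss(0.0, 1.0) for _ in range(J)]                    # independent N(0,1) draws
        e = [sum(C[i][k] * z[k] for k in range(J)) for i in range(J)]  # correlated errors
        U = [V[i] + e[i] for i in range(J)]
        wins[U.index(max(U))] += 1                                     # accept the utility-maximal alternative
    return [w / R for w in wins]

V = [6.766, 6.882, 5.235, 6.083]     # observed utilities from the worked example
C = [[1.0,   0.0,   0.0, 0.0],       # Cholesky factor from Equation (5.43)
     [0.997, 0.072, 0.0, 0.0],
     [0.0,   0.0,   1.0, 0.0],
     [0.0,   0.0,   0.0, 1.0]]
p = ar_probit_probabilities(V, C)
```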
The AR simulator represents the simplest approach to calculating the
choice probabilities for probit models (indeed, the approach is general in
the sense that it can be applied to any model, and hence is not limited to just
probit models). Nevertheless, the approach is rather crude and can cause
problems in estimation. The primary concern is that, depending on the
draws taken, it is not uncommon for an alternative to have a zero probability
of being chosen. This is an issue in simulated maximum likelihood estimation
which requires that the log of the probability be taken. Unfortunately, the log
of zero is undefined and hence the simulated maximum likelihood cannot be
computed. The likelihood that an alternative's simulated probability will be zero increases when (a) the true choice probability over the sample is low, (b) a small number of draws is taken, and (c) there are a large number of alternatives present.
Also of concern is the fact that the simulated probabilities are not smooth in
the parameters. This is a problem, as most estimation procedures require that
the simulated maximum likelihood be twice differentiable (see Section 5.7).
Unfortunately, this requires that the simulated probabilities be smooth in the parameters, which is not the case, and, as such, the estimation procedures commonly used may not perform as required. To overcome this problem, larger step sizes are sometimes used than would ordinarily be the case.
$$P^r_{nsi} = \frac{e^{\lambda U^r_{nsi}}}{\sum_{j=1}^{J} e^{\lambda U^r_{nsj}}}. \qquad (5.44)$$
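Equation (5.44) is the logit-smoothed AR simulator: instead of the 1/0 accept-reject indicator, each draw r contributes a logit weight, with the scale factor λ controlling the degree of smoothing (as λ grows, the weights approach the crude AR frequencies). A sketch for a single draw, where λ = 5 is simply an illustrative value:

```python
import math

def smoothed_probs(U, lam=5.0):
    """Logit-smoothed AR weights for the simulated utilities of one draw r (Equation (5.44))."""
    m = max(U)                                   # subtract the max for numerical stability
    expU = [math.exp(lam * (u - m)) for u in U]
    denom = sum(expU)
    return [x / denom for x in expU]

U_r1 = [6.766, 6.851, 4.393, 5.015]              # simulated utilities for r = 1
w = smoothed_probs(U_r1)
```

Averaging these weights over the R draws gives a simulated probability that is smooth in the parameters, avoiding the differentiability problem noted above.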
The GHK simulator works with utility differences and assumes that the
model is normalized for both scale and level. In working with utility differences, the approach iteratively sets each of the J alternatives as the base utility,
each time calculating the choice probability for the base alternative. As such,
the GHK simulator can be quite slow in practice; however, as stated above,
compared to other methods, it is generally far more accurate.
The process begins by selecting one of the alternatives as the base alternative. For the moment, assume that alternative i is chosen as the base (and
hence we will calculate the choice probability for this alternative). Adopting
the notation used in Chapter 4, the utility differences are:
and the normalized covariance matrix of the error differences is:

$$\tilde{\Omega}_{e1} = \begin{pmatrix} 1.0000 & 0.5000 & 0.5000 \\ 0.5000 & 387.5969 & 193.7984 \\ 0.5000 & 193.7984 & 387.5969 \end{pmatrix}. \qquad (5.49)$$

Equation (5.49) is obtained by dividing all elements in Equation (5.48) by the first element of $\tilde{\Omega}_{e1}$. As noted above, this same operation must also be performed on the vector of differences in the observed components of utility. As before, the modeled or observed utilities for each of the J alternatives were observed to be $V_j = (6.766, 6.882, 5.235, 6.083)$. Setting j = 1 as the base alternative, we obtain:
$$\tilde{V}_{nsj1} = \begin{pmatrix} 1.6119 & -21.3184 & -9.5139 \end{pmatrix}, \qquad (5.51)$$

which is the vector of differences in the observed utilities normalized for both scale and level.
The next step in the process requires the estimation of the Cholesky factor for $\tilde{\Omega}_{e1}$. We first convert $\tilde{\Omega}_{e1}$ into a correlation matrix, which can be done via Equation (5.38), yielding:

$$\begin{pmatrix} 1.0000 & 9.8437 & 9.8437 \\ 9.8437 & 1.0000 & 75115.6781 \\ 9.8437 & 75115.6781 & 1.0000 \end{pmatrix}, \qquad (5.52)$$
Given the above, we are now able to express the differences in the observed
utilities accounting for the correct degree of correlation expressed in Equation
(5.52). That is, the model may be written as:
$$\begin{aligned}
\tilde{U}_{ns21} &= \tilde{V}_{ns21} + s_{11}z_1 = 1.6119 + z_1, \\
\tilde{U}_{ns31} &= \tilde{V}_{ns31} + s_{21}z_1 + s_{22}z_2 = -21.3184 + 0.5z_1 + 19.6811z_2, \\
\tilde{U}_{ns41} &= \tilde{V}_{ns41} + s_{31}z_1 + s_{32}z_2 + s_{33}z_3 = -9.5139 + 0.5z_1 + 9.8342z_2 + 17.0480z_3.
\end{aligned} \qquad (5.55)$$
For the current example, where there exist J = 4 alternatives, this trans-
lates to:
$$\tilde{P}^1_{nsi} = P\!\left(z_1 < \frac{-\tilde{V}_{ns1i}}{s_{11}}\right) = \Phi\!\left(\frac{-\tilde{V}_{ns1i}}{s_{11}}\right). \qquad (5.58)$$
This process is repeated for all alternatives j not equal to i, such that the final
calculation in the series is:
$$\tilde{P}^J_{nsi} = P\!\left(z_J < \frac{-\left(\tilde{V}_{nsJi} + \sum_{j=1}^{J-1} s_{Jj} z_j\right)}{s_{JJ}} \,\middle|\, z_1 = z^r_1, \ldots, z_{J-1} = z^r_{J-1}\right) = \Phi\!\left(\frac{-\left(\tilde{V}_{nsJi} + \sum_{j=1}^{J-1} s_{Jj} z^r_j\right)}{s_{JJ}}\right), \qquad (5.61)$$

where $z^r_j = \Phi^{-1}\!\left[u^r_j\,\Phi\!\left(-\left(\tilde{V}_{nsJi} + \sum_{j=1}^{J-1} s_{Jj} z^r_j\right)\middle/ s_{JJ}\right)\right]$, and $u^r_j$ is a draw from a standard uniform distribution.

The simulated choice probability for the rth draw is then:

$$P^r_{nsi} = \tilde{P}^{1r}_{nsi} \times \tilde{P}^{2r}_{nsi} \times \cdots \times \tilde{P}^{Jr}_{nsi}. \qquad (5.62)$$

The process is repeated for r = 1, . . ., R draws and the simulated probability calculated as:

$$E(P_{nsi}) = \frac{1}{R}\sum_{r=1}^{R} P^r_{nsi}. \qquad (5.63)$$
from the second dimension of the Sobol sequence, $u^1_2 = 0.5$; $z^1_2 = \Phi^{-1}\!\left[u^1_2\,\Phi\!\left(-(\tilde{V}_{ns31} + s_{21}z^1_1)/s_{22}\right)\right] = \Phi^{-1}\!\left[0.5\,\Phi\!\left(-(-21.3184 + 0.5 \times (-1.931))/19.6811\right)\right] = \Phi^{-1}\left[0.5\,\Phi(1.132)\right] = -0.162$, which allows us to calculate $\tilde{P}^{3r}_{ns1} = \Phi\!\left(-(-9.5139 + 0.5 \times (-1.931) + 9.8342 \times (-0.162))/17.0480\right) = 0.761$. The probability may then be calculated for r = 1 as $P^1_{ns1} = 0.053 \times 0.871 \times 0.761 = 0.0354$. Taking 1,000
Sobol draws, $E(P_{ns1}) = 0.035$. The choice probabilities for the remaining alternatives were calculated as 0.632, 0.073, and 0.259, respectively.
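Pulling the steps together, a compact sketch of the GHK recursion for this worked example is given below (pure Python; we substitute pseudo-random uniforms for the Sobol sequence, so the result approximates rather than reproduces the 0.035 reported; the utility differences and Cholesky elements are those of Equations (5.51) and (5.55)):

```python
import random
from statistics import NormalDist

ND = NormalDist()

def ghk_probability(v_diff, L, R=1000, seed=7):
    """GHK simulator for the probability of the base alternative.
    v_diff: differences in observed utility (other alternatives minus the base);
    L: lower-triangular Cholesky factor of the covariance of the error differences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        z, p_r = [], 1.0
        for j in range(len(v_diff)):
            mean = v_diff[j] + sum(L[j][k] * z[k] for k in range(j))
            p_j = ND.cdf(-mean / L[j][j])        # P(this utility difference < 0)
            p_r *= p_j
            u = max(rng.random(), 1e-12)
            z.append(ND.inv_cdf(u * p_j))        # truncated draw, kept below the bound
        total += p_r
    return total / R

# Worked example with j = 1 as the base alternative
v_diff = [1.6119, -21.3184, -9.5139]
L = [[1.0, 0.0, 0.0],
     [0.5, 19.6811, 0.0],
     [0.5, 9.8342, 17.0480]]
p1 = ghk_probability(v_diff, L)
```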
A number of algorithms exist within the literature for locating the parameters of discrete choice models. In this section, we briefly discuss the
most widely used algorithms, noting that those discussed do not represent
anywhere near a definitive list of those used within the discrete choice
literature. The algorithms discussed make use of the principles of calculus,
in particular, the derivatives of the LL function with respect to the parameter
estimates. We therefore start with a discussion from this perspective, before
going into the details of the various algorithms themselves.
$$g^t_{NS} = \sum_{n=1}^{N}\sum_{s=1}^{S} \frac{g_{ns}(\beta^t)}{NS} = E\!\left[\frac{\partial LL_{ns}(\beta)}{\partial \beta}\bigg|_{\beta^t}\right], \qquad (5.65)$$
$$LL^t_{NS} = \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\ln(P_{nsj}) = \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\ln\!\left(\frac{e^{V_{nsj}}}{\sum_{i=1}^{J} e^{V_{nsi}}}\right) = \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\!\left(V_{nsj} - \ln\sum_{i=1}^{J} e^{V_{nsi}}\right). \qquad (5.66)$$

$$\begin{aligned}
\frac{\partial LL^t_{NS}}{\partial \beta^t_k} &= \frac{\partial}{\partial \beta_k}\sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\!\left(V_{nsj} - \ln\sum_{i=1}^{J} e^{V_{nsi}}\right) \\
&= \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\!\left(\frac{\partial V_{nsj}}{\partial \beta_k} - \frac{\partial}{\partial \beta_k}\ln\sum_{i=1}^{J} e^{V_{nsi}}\right) \\
&= \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\!\left(\frac{\partial V_{nsj}}{\partial \beta^t_k} - \frac{1}{\sum_{i=1}^{J} e^{V_{nsi}}}\sum_{i=1}^{J} e^{V_{nsi}}\frac{\partial V_{nsi}}{\partial \beta^t_k}\right).
\end{aligned} \qquad (5.67)$$
$$\begin{aligned}
\frac{\partial LL^t_{NS}}{\partial \beta^t_k} &= \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\!\left(\frac{\partial V_{nsj}}{\partial \beta^t_k} - \frac{\sum_{i=1}^{J} e^{V_{nsi}}\frac{\partial V_{nsi}}{\partial \beta^t_k}}{\sum_{i=1}^{J} e^{V_{nsi}}}\right) \\
&= \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\!\left(x_{nsjk} - \sum_{i=1}^{J}\frac{e^{V_{nsi}}}{\sum_{i=1}^{J} e^{V_{nsi}}}\,x_{nsik}\right) \\
&= \sum_{n=1}^{N}\sum_{s \in S_n}\sum_{j \in J_{ns}} y_{nsj}\!\left(x_{nsjk} - \sum_{i=1}^{J} P_{nsi}\,x_{nsik}\right).
\end{aligned} \qquad (5.68)$$
The gradients for other models, while different, may be similarly computed.
Daly (1987) provides the gradients for the nested logit model, while Bliemer
and Rose (2010) provide the gradients for both the panel and cross-sectional
versions of the MMNL models.
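For the MNL case, the gradient in Equation (5.68) takes only a few lines to compute. The sketch below (pure Python, with a made-up two-attribute, three-alternative data set purely for illustration) returns the LL and its analytic gradient; the analytic gradient can be verified against a finite-difference check:

```python
import math

def mnl_probs(x_alts, beta):
    """MNL probabilities for one choice set; x_alts[j] is alternative j's attribute vector."""
    v = [sum(b * xv for b, xv in zip(beta, alt)) for alt in x_alts]
    m = max(v)                                   # subtract the max for numerical stability
    e = [math.exp(vi - m) for vi in v]
    s = sum(e)
    return [ei / s for ei in e]

def mnl_ll_and_gradient(data, beta):
    """LL and analytic gradient of the MNL model (last line of Equation (5.68))."""
    K = len(beta)
    ll, g = 0.0, [0.0] * K
    for x_alts, chosen in data:
        p = mnl_probs(x_alts, beta)
        ll += math.log(p[chosen])
        for k in range(K):
            xbar_k = sum(p[i] * x_alts[i][k] for i in range(len(x_alts)))
            g[k] += x_alts[chosen][k] - xbar_k   # x_nsjk - sum_i P_nsi x_nsik
    return ll, g

# Made-up data: two choice sets, three alternatives, two attributes
data = [([[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]], 0),
        ([[0.5, 1.0], [1.5, 0.5], [0.0, 2.0]], 2)]
ll, g = mnl_ll_and_gradient(data, [0.1, -0.2])
```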
When the gradients of a model are unknown or too complex to compute
analytically, numerical approximation may be used instead. This involves first
calculating the LL for a given set of parameter estimates, and then subsequently
either adding or subtracting a small value, δk, to each parameter one at a time,
re-calculating the LL value for each parameter change (e.g., βtk þ 0:000001).
The gradient for each parameter estimate is then computed as the average over
the sample of the difference between the LL calculated using the original
parameter values and the newly calculated LL, divided by the amount added
or subtracted from the parameter estimate. Let $(g^k_{NS})^t$ represent the gradient for the $k$th parameter; then the procedure as described is simply:
1. Calculate the LL for the model assuming $\beta^t$. Designate this LL as $LL^t_{NS}(\beta^t)$.
2. Recalculate the LL assuming $\beta^t_1 + \delta_1$, fixing the remaining $k-1$ parameters at the values given in $\beta^t$. Designate this new LL as $LL^t_{NS}(\beta_1)$.
3. The gradient for the first parameter is then calculated as:
$$(g^1_{NS})^t = E\!\left[\frac{LL^t_{NS}(\beta^t) - LL^t_{NS}(\beta_1)}{\delta_1}\right].$$
4. Recalculate the LL assuming $\beta^t_2 + \delta_2$, fixing the remaining $k-1$ parameters at the values given in $\beta^t$. Designate this new LL as $LL^t_{NS}(\beta_2)$.
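The steps above amount to a one-sided finite-difference routine. A generic sketch follows (any log-likelihood function can be passed in; here we check it on a simple quadratic whose gradient is known; we use the forward difference $(LL(\beta+\delta) - LL(\beta))/\delta$, so the sign convention depends on whether $\delta$ is added or subtracted):

```python
def numerical_gradient(LL, beta, delta=1e-6):
    """One-sided finite-difference approximation of the gradient of LL at beta."""
    base = LL(beta)                              # step 1: LL at the current parameters
    g = []
    for k in range(len(beta)):
        bumped = list(beta)
        bumped[k] += delta                       # steps 2 and 4: perturb one parameter at a time
        g.append((LL(bumped) - base) / delta)    # step 3: difference quotient
    return g

# Check on a function with a known gradient: LL(b) = -(b1^2 + 3 b2^2)
g = numerical_gradient(lambda b: -(b[0] ** 2 + 3 * b[1] ** 2), [1.0, 2.0])
```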
$$H^t_{NS} = E\!\left[\frac{\partial^2 LL_{ns}(\beta)}{\partial \beta\,\partial \beta'}\bigg|_{\beta^t}\right]. \qquad (5.69)$$

The negative of this matrix, called the Fisher Information matrix, or simply the Information matrix, is therefore computed as:

$$I^t_{NS} = -E\!\left[\frac{\partial^2 LL_{ns}(\beta)}{\partial \beta\,\partial \beta'}\bigg|_{\beta^t}\right]. \qquad (5.70)$$
$$LL_{NS}(\beta^{t+1}) \approx LL_{NS}(\beta^t) + (\beta^{t+1} - \beta^t)'\,g^t_{NS} + \frac{1}{2}(\beta^{t+1} - \beta^t)'\,H^t_{NS}\,(\beta^{t+1} - \beta^t). \qquad (5.71)$$

$$\begin{aligned}
\frac{\partial LL_{NS}(\beta^{t+1})}{\partial \beta^{t+1}} &= g^t_{NS} + H^t_{NS}(\beta^{t+1} - \beta^t) = 0, \\
H^t_{NS}(\beta^{t+1} - \beta^t) &= -g^t_{NS}, \\
\beta^{t+1} - \beta^t &= -(H^t_{NS})^{-1} g^t_{NS}, \\
\beta^{t+1} &= \beta^t - (H^t_{NS})^{-1} g^t_{NS}.
\end{aligned} \qquad (5.72)$$
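The update rule in Equation (5.72) can be sketched directly. A minimal single-parameter version is shown below (illustrative only; for a vector of parameters the division becomes multiplication by the inverse Hessian):

```python
def newton_raphson(gradient, hessian, beta, iters=25):
    """Newton-Raphson updates beta(t+1) = beta(t) - H^-1 g (Equation (5.72));
    scalar parameter shown for clarity."""
    for _ in range(iters):
        beta = beta - gradient(beta) / hessian(beta)
    return beta

# Maximize LL(b) = -(b - 3)^2: gradient -2(b - 3), Hessian -2; the maximum is at b = 3
b_hat = newton_raphson(lambda b: -2.0 * (b - 3.0), lambda b: -2.0, beta=0.0)
```

For a quadratic LL such as this, the update converges in a single iteration, which is why Newton-Raphson is fast near the maximum, where the LL is approximately quadratic.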
the ability to estimate the model parameter estimates. In addition to its role in
assisting in the location of the parameter estimates, the matrix Ωp plays
another important role from an econometrics perspective. This matrix,
which is often referred to within the literature as the variance-covariance
matrix, or simply covariance matrix, contains information as to the robustness
of the parameter estimates as well as any relationships that exist between
them, and plays an important role in experimental design (see Chapter 6) as
well as in conducting tests of statistical inference about the parameter estimates themselves (see Chapter 7).
Despite being referred to as the covariance matrix, note that Ωp differs from
the covariance matrix of error terms, Ωe, as defined in Chapter 4. As shown in
Chapter 4, the elements contained within the covariance matrix of error terms
relate to the unobserved effects of the model and, depending on which model is
estimated and what normalizations are used, may be thought of as parameters to
be estimated. For example, in Section 4.3.2, we estimated, using a probit model,
a correlation term (which maps to a covariance term) from Ωe. As we show in
Chapters 6 and 7, the elements in Ωp are not parameters per se, but rather
provide information about the parameters themselves. Hence, all estimates,
whether they be parameter estimates, scale terms, or modeled error terms, will
be represented within Ωp. This is important to note because, despite presenting our discussion in terms of parameter estimates, the discussion extends to all estimates. That is, the vector of parameter estimates, β, may be considered to
contain not just parameters associated with specific variables (i.e., βk), but also
any other parameters associated with a model being estimated, such as those
related to scale, λj, and even the error terms, σij, contained within Ωe.
As we show in the following sections, the computation of Ωp can be somewhat difficult and definitely more than a little time consuming, all of which is
exacerbated by the fact that Ωp needs to be computed over multiple iterations,
t. To save both computational effort and time, some software will multiply $\Omega^t_p g^t_{NS}$ by a scalar, α, as shown in Equation (5.74).
and

$$\begin{aligned}
\frac{\partial P_{nsj}}{\partial \beta^t_l} &= \frac{\partial}{\partial \beta^t_l}\!\left(\frac{e^{V_{nsj}}}{\sum_{i=1}^{J} e^{V_{nsi}}}\right) \\
&= \left(\frac{1}{\sum_{i=1}^{J} e^{V_{nsi}}}\right)^{\!2}\left[\frac{\partial}{\partial \beta^t_l}\!\left(e^{V_{nsj}}\right)\sum_{i=1}^{J} e^{V_{nsi}} - e^{V_{nsj}}\frac{\partial}{\partial \beta^t_l}\sum_{i=1}^{J} e^{V_{nsi}}\right] \\
&= \frac{e^{V_{nsj}}\frac{\partial V_{nsj}}{\partial \beta^t_l}}{\sum_{i=1}^{J} e^{V_{nsi}}}\cdot\frac{\sum_{i=1}^{J} e^{V_{nsi}}}{\sum_{i=1}^{J} e^{V_{nsi}}} - \frac{e^{V_{nsj}}}{\sum_{i=1}^{J} e^{V_{nsi}}}\cdot\frac{\sum_{i=1}^{J} e^{V_{nsi}}\frac{\partial V_{nsi}}{\partial \beta^t_l}}{\sum_{i=1}^{J} e^{V_{nsi}}} \\
&= P_{nsj}\!\left(\frac{\partial V_{nsj}}{\partial \beta^t_l} - \sum_{i=1}^{J} P_{nsi}\frac{\partial V_{nsi}}{\partial \beta^t_l}\right).
\end{aligned} \qquad (5.78)$$
Substituting the relevant values into Equation (5.78) will produce the Hessian
matrix for the MNL model, the negative inverse of which will be the covariance matrix of the model. Like the gradients, the equations necessary for
deriving the Hessian matrix of different choice models will differ given the
divergent LL functions of the various possible models. Daly (1987) and
Bliemer et al. (2009) provide the equations required to compute the Hessian
for the nested logit model, while Bliemer and Rose (2010) provide those for
both the panel and cross-sectional versions of the MMNL models.
For models where the analytical derivatives are unknown, or too difficult to
compute, alternative approaches that approximate the Hessian will need to be
adopted. The specific approximation adopted represents the defining differences of the algorithms that we now discuss.
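One common fallback when analytic second derivatives are unavailable is to approximate the Hessian itself by finite differences of the LL. A sketch (pure Python; checked on a quadratic whose Hessian is known exactly):

```python
def numerical_hessian(LL, beta, d=1e-4):
    """Central finite-difference approximation of the Hessian of LL at beta."""
    K = len(beta)
    H = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(K):
            pts = []
            for si, sj in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                b = list(beta)
                b[i] += si * d
                b[j] += sj * d
                pts.append(LL(b))
            H[i][j] = (pts[0] - pts[1] - pts[2] + pts[3]) / (4 * d * d)
    return H

# Check on a quadratic whose Hessian is [[-2, -1], [-1, -4]]
H = numerical_hessian(lambda b: -(b[0] ** 2 + b[0] * b[1] + 2 * b[1] ** 2), [0.3, -0.7])
```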
where $g^k_{ns}$ is the $k$th element of $g_{ns}(\beta^t)$, with $\beta^t$ omitted purely for convenience. The elements of the Information matrix for the model are then computed by simply summing the corresponding elements of the choice-specific Information matrices, such that $I^t_{NS} = \sum_{n=1}^{N}\sum_{s=1}^{S} I^t_{ns}$.
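The BHHH construction is simply a sum of outer products of the observation-level gradients. A sketch with two illustrative (made-up) gradient vectors:

```python
def bhhh_information(gradients):
    """BHHH approximation to the Information matrix: the sum over observations
    of the outer products of the observation-level gradients."""
    K = len(gradients[0])
    I = [[0.0] * K for _ in range(K)]
    for g in gradients:
        for i in range(K):
            for j in range(K):
                I[i][j] += g[i] * g[j]
    return I

# Two observation-level gradient vectors for a two-parameter model (illustrative values)
I = bhhh_information([[0.5, -1.0], [-0.2, 0.3]])
```

Because each outer product is positive semi-definite, the resulting matrix is symmetric and, given enough observations, positive definite, which is the invertibility property noted below.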
Use of the BHHH algorithm offers both advantages and disadvantages
over the NR algorithm. With regards to advantages, the BHHH algorithm
will generally require less time to estimate, as the algorithm requires only
the estimation of the model gradients, which are then used to compute the
Information matrix given as Equation (5.79). The NR algorithm, on the other
hand, requires additional calculations be made to compute the second derivatives of the model's LL function with respect to the parameter estimates,
leading to longer estimation times. Nevertheless, experience suggests that the
BHHH algorithm can be somewhat slower and require substantially more
iterations than the NR algorithm to reach model convergence, particularly
when the algorithm is far from the maximum of the LL function. This is
because the BHHH algorithm will tend to produce small step changes when
far from the maximum, and hence require more iterations than the NR
algorithm to reach model convergence. Second, the BHHH algorithm, as an
approximation of the Information matrix, does not require the analyst to first
calculate, then program, the second derivatives of the LL function. This means
that the algorithm can be easily applied to any model, no matter how complex,
and hence is extremely portable. Third, unlike the NR algorithm, the
Information matrix obtained from the BHHH algorithm is guaranteed to be
positive definite at each iteration, meaning that it will always be possible to
invert the matrix (which is necessary to obtain the estimates of Ωp required in
Equation 5.72) and that improvements in LL function will be observed at each
iteration.
Nevertheless, the BHHH algorithm does suffer from a number of shortcomings. Aside from tending to provide a poor approximation to the
Information matrix when far from the maximum of the LL function, the
major disadvantage of the BHHH algorithm is that the Information matrix
produced by the algorithm will converge to the true Information matrix,
assuming that the model is correctly specified, only as NS→∞: That is, the
BHHH can yield values that are very different to the true values in small
samples.
$$W^{(0)} = I.$$

Then,

$$W^{(t+1)} = W^{(t)} + a^{(t+1)} a^{(t+1)\prime} + b^{(t+1)} b^{(t+1)\prime} = W^{(t)} + E^{(t+1)}, \qquad (5.81)$$

where $a^{(t+1)}$ and $b^{(t+1)}$ are two vectors that are computed using the gradients at the current and previous iterations, $g^{(t+1)}$ and $g^{(t)}$. Notice that the update matrix, $E^{(t+1)}$, is the sum of two outer products and thus has rank 2, hence the name. The BFGS algorithm adds a third term, $c^{(t+1)} c^{(t+1)\prime}$, which produces a rank 3 update. Precise details on the computation of $E^{(t+1)}$ appear in Appendix E of Greene (2012, 1099). After a sufficient number of iterations, $W^{(t)}$ will
provide an approximation to the negative inverse of the second derivatives
matrix of the LL. In some applications, this approximation has been used as
the estimator of the asymptotic covariance matrix of the coefficient estimators.
As a general outcome, while the approximation is sufficiently accurate for
optimization purposes, it is not sufficiently accurate to be used directly for
computing standard errors. After optimization (or during, on the side), it is
necessary to compute the estimator of the asymptotic covariance matrix for
the maximum likelihood estimation (MLE) separately.
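To illustrate the structure of the rank-two update, the sketch below uses the standard DFP formula for the inverse-Hessian approximation (the precise a and b vectors used in practice follow Greene (2012, Appendix E); this is one common parameterization, not necessarily the one Nlogit implements). A defining property is the secant condition, $W^{(t+1)}\gamma = \delta$, which the test verifies:

```python
def dfp_update(W, delta, gamma):
    """One DFP rank-two update of the inverse-Hessian approximation W,
    where delta = beta(t+1) - beta(t) and gamma = g(t+1) - g(t)."""
    K = len(W)
    Wg = [sum(W[i][k] * gamma[k] for k in range(K)) for i in range(K)]
    dg = sum(delta[k] * gamma[k] for k in range(K))      # delta' gamma
    gWg = sum(gamma[i] * Wg[i] for i in range(K))        # gamma' W gamma
    return [[W[i][j] + delta[i] * delta[j] / dg - Wg[i] * Wg[j] / gWg
             for j in range(K)] for i in range(K)]

W0 = [[1.0, 0.0], [0.0, 1.0]]                            # W(0) = I
W1 = dfp_update(W0, delta=[1.0, 2.0], gamma=[3.0, 1.0])  # illustrative step and gradient change
```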
The calculations used to compute the Cholesky matrix in Equation (5.37) are:

$$s_{11} = \sqrt{\eta_{11}} = \sqrt{1.853} = 1.361 \qquad \text{from (5.33a)}$$
Chapter 6
As far as the laws of mathematics refer to reality, they are not certain; and as far as
they are certain, they do not refer to reality.
(Einstein 1921)
This chapter was co-authored with Michiel Bliemer and Andrew Collins.
6.1 Introduction
This chapter might be regarded as a diversion from the main theme of discrete
choice models and estimation; however, the popularity of stated choice (SC)
data developed within a formal framework known as the “design of choice
experiments” is sufficient reason to include one chapter on the topic,1 a topic
growing in such interest that it justifies an entire book-length treatment. In
considering the focus of this chapter (in contrast to the chapter in the first
edition), we have decided to focus on three themes. The first is a broad
synthesis of what is essentially experimental design in the context of data
needs for choice analysis (essentially material edited from the first edition).
The second is an overview in reasonable chronological order of the main
developments in the literature on experimental design, drawing on the con-
tribution of Rose and Bliemer (2014), providing an informative journey on the
evolution of approaches that are used to varying degrees in the design and
implementation of choice experiments. With the historical record in place, we
then focus on a number of topics which we believe need to be given a more
1. This chapter draws on the first edition and a number of papers which were written primarily by John Rose and Michiel Bliemer, with some inputs from papers by David Hensher and Andrew Collins. Andrew Collins provided some examples on how to use Ngene; Chinh Ho contributed the case study Ngene design on BRT versus LRT.
detailed treatment, which includes sample size issues, best–worst designs, and
pivot designs. We draw on the key contributions in Rose and Bliemer (2012,
2013); Rose (2014); and Rose et al. (2008). We use Ngene (Choice Metrics
2012), a comprehensive tool that complements Nlogit5 and which has the
capability to design the wide range of choice experiments discussed in this
chapter, and to provide syntax for use in a few of the designs. We refer the
reader to the Ngene manual for more details (www.choice-metrics.com/documentation.html).
Unlike most survey data, where information on both the dependent and
explanatory variables is captured directly from respondents, SC data is unique
in that typically only the choice response variable is provided by the respon-
dent. With the exception of covariate information, which is often ignored in
most analysis, the primary variables of interest, consisting of attributes and
their associated levels, are designed in advance and presented to the respon-
dent in the form of competing alternatives in SC studies. However, increasing
evidence of both an empirical (e.g., Bliemer and Rose 2011; Louviere, Street
et al. 2008) and a theoretical nature (e.g., Burgess and Street 2005; Sándor and
Wedel 2001, 2002, 2005) suggests that the specific allocation of the attribute
levels to the alternatives presented to respondents may impact to a greater or
lesser extent on the reliability of the model outputs, particularly when small
samples are involved. As such, rather than simply randomly assign the
attribute levels shown to respondents over the course of an experiment,
experimental design theory has been applied to allocate the attribute levels
to the alternatives in some systematic manner.
The objective of this chapter is twofold. First, it is argued that, despite the disparate nature of the existent literature, there does indeed exist a unified experimental design theory for the construction of SC experiments. Furthermore, this theory is capable of accommodating each of the design paradigms that have appeared within the literature at one time or another. Second, in presenting this theory, we discuss how the various researchers in this field have actually relied on it, many without knowing it, but under very different sets of assumptions. It is these assumptions that define
the different approaches, and not differences in the underlying experimental
design theory.
The remainder of this chapter is set out as follows. Section 6.2 discusses
what exactly an experimental design is, and why it is important. Section 6.3
outlines a number of decisions that are required prior to generating the
experimental design. Section 6.4 then provides a discussion of the theory of
experimental design as it relates to SC studies. Section 6.5 provides a selective
191 Design and choice experiments
As an aside, this chapter draws on the many papers by Rose and Bliemer and Bliemer and Rose to highlight some of the main developments in the literature on choice experiments since the first edition. In addition, we provide a number of examples of how the Ngene software, which complements Nlogit, can be used to design choice experiments.
with the terms attribute and attribute levels. We do so noting that the experimental designs we discuss throughout this book involve the manipulation of
the levels of goods and services only.
We also note that much of the literature refers to each individual attribute
level as a treatment. A combination of attributes, each with unique levels, is
called a treatment combination. Treatment combinations thus describe the
profile of the alternatives within the choice set. Again, different literatures
have developed their own terminology – for example, marketing, which refers
to treatment combinations as profiles. We will use the terms treatment and
treatment combination throughout. The language associated with the field of
experimental design can quickly become quite complicated.
Figure 6.1 summarizes the process used to generate stated preference
experiments. This process begins with a refinement of the problem, to ensure
that the analyst has an understanding of what the research project hopes to
achieve by the time of completion.
Once the problem is well understood, the analyst is required in stage two to
identify and refine the stimuli to be used within the experiment. It is at this stage
of the research that the analyst decides on the list of alternatives, attributes, and
attribute levels to be used. This refinement may result in further scrutiny of the
problem definition and as a result a return to the problem refinement stage of
the process. Moving from stimuli refinement, the analyst must now make
several decisions as to the statistical properties that will be allied with the final
design.
As an aside, the first two stages of the process consist of refining the analyst’s under-
standing of behavioral aspects of the problem as they relate to decision makers. It is hoped
that this understanding of the behavioral impacts will regulate the analyst’s decision process
at the time of considering the statistical properties of the design. Often, however, statistical
considerations must take precedence. Statistically inefficient designs, designs that are
unwieldy in size, or possibly even the non-availability of a design that fits the behavioral
requirements established in the earlier stages, may trigger a return to the first two stages of
the design process.
Provided that the analyst is sufficiently happy to continue at this point, the experimental
design may be generated in stage three. While it is preferable to generate such designs from
first principles, such a derivation requires expert knowledge. For the beginner, we note that
several statistical packages are capable of generating simple experimental designs that may
be of use (e.g., SPSS®, SAS®, and Ngene®). Following the generation of the experimental
design, the analyst must allocate in stage four the attributes selected in stage two to specific
columns of the design. Again, a return to previous stages of the design process may be
necessary if the design properties do not meet the criteria established at earlier stages of the
process.
Once the attributes have been allocated to columns within the design, the
analyst manipulates the design to produce the response stimuli in stage five.
While several forms of response stimuli are available to the analyst, we con-
centrate in this book on only one type, that of choice. Thus, the sixth stage of
the design process sees the analyst construct choice sets that will be used in the
survey instrument (e.g., a questionnaire). To overcome possible biases from
order effects, the order of appearance of these choice sets is randomized across
the survey instrument shown to each respondent. As such, several versions are
created for each single choice experiment undertaken. The final stage of the
experimental design process is to construct the survey, by inserting the choice
sets as appropriate into the different versions and inserting any other questions
that the analyst may deem necessary to answer the original research problem.
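The randomization of choice-set order across survey versions described for stage six can be sketched in a few lines. This is an illustrative Python sketch, not Ngene or Nlogit syntax; the function and set names are hypothetical.

```python
import random

def make_survey_versions(choice_sets, n_versions, seed=0):
    """Create survey versions that present the same choice sets in a
    different random order, to guard against order effects."""
    rng = random.Random(seed)            # seeded for reproducibility
    versions = []
    for _ in range(n_versions):
        order = choice_sets[:]           # copy the master list
        rng.shuffle(order)               # randomize presentation order
        versions.append(order)
    return versions

# Hypothetical example: nine choice sets allocated to four versions.
sets = [f"choice_set_{i}" for i in range(1, 10)]
versions = make_survey_versions(sets, n_versions=4)
```

Each version contains every choice set exactly once; only the order of presentation differs across respondents.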
As an aside, we note that at this stage of the research the analyst should not be wed to any
particular methodological approach to answer the research questions. Rather the questions the
analyst arrives at from the problem refinement process should decide the approach to be
taken. Hence the approach does not decide which questions should be asked. As a further
aside, we note that given the possibility of deriving several possible research questions from a
single research problem, the analyst may be required to employ several research approaches
to satisfactorily resolve the problem.
conditions of the roads between the two cities. For example, poorly main-
tained roads may result in higher patronage of modes that do not rely on the
road system (i.e., trains or aircraft). In setting up such hypotheses, the analyst
begins to build upon the types of questions that need to be asked of the
population of travelers. For the above example, without asking questions
related to the road conditions experienced and without attempting to obtain
information as to the impacts of such conditions upon mode choice, this
hypothesis will remain just that, a hypothesis.
Only once the research problem has been properly refined should the
analyst proceed. We will assume that the analyst has garnered a sufficient
understanding of the problem to meaningfully proceed. We will further
assume that given the necessity to estimate modal market shares in the
presence of a new modal alternative, the analyst has decided upon the use of
a stated preference experiment. The next stage of the design process is the
refinement of the stimuli to be used in the experimental design.
be left with little choice but to cull alternatives in order to reach a manageable
number to study. We note several ways to reduce the number of alternatives to be
used within a study. Firstly, the analyst may assign to each decision maker a
randomly sampled sub-set of alternatives drawn from the universal but finite
list (plus the chosen alternative). Hence, each decision maker is presented with a
different sub-set of alternatives. Thus, while in the aggregate (provided enough
decision makers are surveyed) the entire population of alternatives may be
studied, each individual decision maker views a reduced set of alternatives within
their given choice set (essentially they adopt a process heuristic such as ignoring
certain alternatives – see Chapter 21). While such an approach appears more
appealing than simply removing alternatives from all decision makers’ choice
sets, the experimental designs for such studies tend to be quite large and complex.
This process, however, under the strict condition of IID (see Chapter 4), does not
violate the global utility maximization assumption. When we deviate from IID,
the global utility maximization assumption is violated.
The second approach to reducing the alternatives is to exclude “insignif-
icant” alternatives. The problem here is that the analyst is required to make the
somewhat subjective decision as to what alternatives are to be considered
insignificant and therefore removed from the study. However, in making such
a decision, the analyst is placing more weight on practical, as opposed to
theoretical, considerations. A third approach is to use experiments that do not
name the alternatives (i.e., the analyst defines generic or unlabeled alternatives).
If the universal, but finite, list of alternatives is relatively small (typically up to
10 alternatives although we have often studied the choice among 20 alterna-
tives), the analyst may decide not to reject alternatives from the choice analysis
at all. We end this discussion by stating that the analyst should be guided in their
decision by the research problem in determining how best to proceed.
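The first approach above, assigning each decision maker a random sub-set of the universal list plus their chosen alternative, can be sketched as follows. This is an illustrative Python sketch; all names are hypothetical.

```python
import random

def personal_choice_set(universal, chosen, n_others, seed=None):
    """Sample n_others alternatives at random from the universal but
    finite list and add the respondent's chosen alternative, so each
    decision maker faces a reduced sub-set of alternatives."""
    rng = random.Random(seed)
    others = [a for a in universal if a != chosen]
    return [chosen] + rng.sample(others, n_others)

# Hypothetical example: 20 alternatives, each respondent sees 6 in total.
universal = [f"alt_{i}" for i in range(20)]
subset = personal_choice_set(universal, chosen="alt_3", n_others=5, seed=42)
```

With enough respondents, every alternative appears across the pooled sample even though each individual sees only a reduced choice set.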
station and time taken walking to the station) and fares. None of these attributes
is associated with driving a car to work. Instead, decision makers are likely to
consider such car-related attributes as fuel, toll, and parking costs. Both modes
do share some attributes that decision makers are likely to consider. For
example, departure time from home, arrival time at work, and comfort. Yet,
despite these attributes being shared by both alternatives, the levels decision
makers cognitively associate with each alternative are likely to be different.
There is no need for the decision maker to travel to the station if they choose
to travel by car, and hence they are likely to be able to leave home later if this
mode of transport is selected (assuming favorable traffic conditions). The levels
one attaches to comfort may differ across the alternatives. Indeed, we invite the
reader to consider what the attribute comfort means in the context of traveling
to work either by train or by car. A discussion of the meaning of comfort is an
excellent group discussion theme. It reveals the ambiguities in meaning and
measurement of many attributes one may wish to use as part of a choice study.
Continuing with the example of comfort, interesting questions are raised as to
how the analyst is to communicate attributes and attribute levels to decision
makers (recalling that in SP tasks the analyst relates the attributes and attribute
levels to respondents). What does the word “comfort” really mean, and does it
mean the same thing to all decision makers? In the context of a train trip, does
comfort refer to the softness of the seats aboard the train? Or could comfort
relate to the number of other patrons aboard which affects the personal space
available for all on board? Alternatively, could comfort refer to the temperature
or ambience aboard the train? Or is it possible that decision makers perceive
comfort to be some combination of all of the above, or perhaps even none of the
above but rather some other aspect that we have missed, such as getting a seat?
And what does comfort refer to in the context of a car trip?
As an aside, the consequences of attribute ambiguity may not be apparent at first. We note
that by including an ambiguous attribute the analyst has more than likely added
to the unobserved variance in choice between the alternatives without adding to the ability
to explain any of that increase in variation. Further, looking ahead, consider
how the analyst may use such attributes after model estimation. Assuming that the attribute
is statistically significant for the train alternative, what recommendations can the analyst
make? The analyst may recommend improving comfort aboard trains; however questions
remain as to how the organization responsible for the running of the trains may proceed.
What aspects of comfort should be improved? Will the specific improvements result in
persuading all decision makers to switch modes, or just those who perceive comfort as
relating to those areas in which improvements were made? Failure to correctly express
attribute descriptors results in lost time and money for all parties.
In such cases, we suggest that the beginner identify attributes that may act as
proxies for other attributes and select the most appropriate attribute
for the study.
Having identified the attributes to be used in the experiment, the analyst must
now derive attribute labels and attribute level labels. We define attribute levels
as the levels assigned to an attribute as part of the experimental design process.
These are represented by numbers that will have meaning for the analyst but not
for the decision maker being surveyed. Attribute level labels, on the other hand,
are assigned by the analyst and are related to the experimental design only
insofar as the number of attribute level labels must equal the number of attribute
levels for a given attribute. Attribute level labels are the narrative assigned to each
attribute level that will (if the experiment is designed correctly) provide meaning
to the decision maker. Attribute level labels may be represented as numbers
(e.g., quantitative attributes such as travel time may have attribute level labels of
10 minutes, 20 minutes, etc.) or as words (e.g., qualitative attributes such as color
may have attribute level labels of green and black).
The identification and refinement of the attribute levels and attribute level
labels to be used in an experiment is not an easy task, requiring several
important decisions to be made by the analyst. The first decision is how
many attribute levels to assign to each attribute, noting that the number of
levels does not have to be the same for each attribute. Let us consider the
attribute travel time for a single alternative. For any given decision maker,
there will exist for this attribute different quantities of utility associated with
the various levels that may be taken. That is, the utility for 5 minutes of travel
time is likely to be different to the utility attached to 10 minutes of travel time.
Is the utility attached to 5 minutes of travel time likely to be different to the
utility attached to 5 minutes and 10 seconds of travel time? Each “possible”
attribute level may be mapped to a point in utility space. The more levels we
measure of an attribute, the more information (and hopefully accuracy) we
capture in utility space.
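The point that more measured levels let the analyst detect more of the shape of the part-worth utility function can be illustrated numerically. This is an illustrative Python sketch; the quadratic utility function below is purely hypothetical.

```python
def true_utility(x):
    """Hypothetical part-worth utility: quadratic, peaking at level 3."""
    return -(x - 3.0) ** 2

# Two levels (1 and 5): the analyst can only infer a straight line.
x_lo, x_hi = 1.0, 5.0
slope = (true_utility(x_hi) - true_utility(x_lo)) / (x_hi - x_lo)
intercept = true_utility(x_lo) - slope * x_lo
linear_at_3 = slope * 3.0 + intercept   # -4.0, but the true value is 0.0:
                                        # the curvature is invisible.

# Three levels (1, 3, 5): a quadratic through the three points recovers
# the true shape (Lagrange interpolation).
def quad_fit(xs, ys, x):
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

xs = (1.0, 3.0, 5.0)
ys = tuple(true_utility(x) for x in xs)
```

With only two levels the fitted relationship is linear by construction; the third level is what first allows any non-linearity to be detected.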
Figure 6.2 illustrates this point. Figure 6.2 shows in utility space the level of
utility derived from a single attribute at varying levels. The utility brought
about by the levels of a single attribute has been referred to as part-worth
utility in some literatures such as marketing, or marginal utility in others. As
we move from Figure 6.2(a) to 6.2(d) we note the analyst’s ability to detect
more complex utility relationships as more levels (and hence more observa-
tions) are added. Indeed, starting with Figure 6.2(a), the analyst would be
forced to conclude that the utility relationship for the attribute is linear given a
change in the attribute level from level one to level two. Examination of
Figure 6.2 Mapping part-worth utility (panels a to d plot utility against an attribute taking two, three, four, and five levels, respectively)
As an aside, while we have concentrated on a quantitative example we note that the above
holds for qualitative attributes as well. We make a distinction between nominal and ordinal
scale qualitative attributes. A nominal qualitative attribute may be one such as the color
used for the bus alternative, where no natural order exists between the levels assumed.
Selecting attribute levels to use for such attributes involves an in-depth study as to what
levels are likely to result in changes to preference (e.g., should the analyst use as levels the
colors blue or red or green?). Ordinal qualitative attributes assume that some natural order
exists among the levels. Taking the bus alternative as an example once more, the demeanor
of the bus driver may be a significant attribute in preference formation for this alternative.
Demeanor may be measured on some non-quantitative continuum ranging from “grumpy”
to “gregarious,” where gregarious is naturally rated higher than grumpy. Assigning attribute
level labels to points between these extremes is a tedious task, requiring careful
thought as to how many descriptive labels may exist between the two extremes.
To conclude, we note the existence of the axiom “garbage in, garbage out.”
The meaning of this axiom is quite simple. If a computer programmer enters
invalid data into a system, the resulting output produced will also be invalid.
Although originating in computer programming, the axiom applies equally well to
other systems, including systems dealing with decision making. The point to
take away from this is that the analyst is best advised to spend as much time as possible identifying
and refining the lists of alternatives, attributes, attribute levels, and attribute level
labels to be used before proceeding to the formal design of the experiment.
Treatment combination  Comfort  Travel time
1  Low     10 hours
2  Low     12 hours
3  Low     14 hours
4  Medium  10 hours
5  Medium  12 hours
6  Medium  14 hours
7  High    10 hours
8  High    12 hours
9  High    14 hours
Treatment combination  Comfort (design code)  Travel time (design code)
1  0  0
2  0  1
3  0  2
4  1  0
5  1  1
6  1  2
7  2  0
8  2  1
9  2  2
As an aside, convention suggests that we use only odd numbers in such coding (i.e., −3, −1,
0, 1, 3, etc.). Table 6.3 shows the orthogonal codes for the equivalent design codes used in
Table 6.2 above for attributes with up to six levels. Note that, by convention, −5 and 5 are not used
in orthogonal coding.
The analyst may choose to stop at this point of the design process and use
the design as it is. By having decision makers rate or rank each of the treatment
combinations, the analyst has elected to perform conjoint analysis. We focus
on choice analysis and not conjoint analysis, and as such require some
mechanism by which decision makers make some type of choice. To
proceed, we note that the treatment combinations above represent possible
product forms that our single alternative may take. For a choice to take place
we require treatment combinations that describe some other alternative
(recalling that choice requires at least two alternatives).
Assuming that the second alternative also has two attributes that are deemed
important in preference formation, we now have a total of four attributes: two
attributes for each alternative. As discussed earlier, these attributes do not have
to be the same across alternatives, and even if they are the attribute levels each
Number of levels  Design code  Orthogonal code
2  0  −1
   1   1
3  0  −1
   1   0
   2   1
4  0  −3
   1  −1
   2   1
   3   3
5  0  −3
   1  −1
   2   0
   3   1
   4   3
6  0  −7
   1  −3
   2  −1
   3   1
   4   3
   5   7
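As a quick check of the coding conventions above, the following sketch (illustrative Python, not from the book) recodes the 3×3 full factorial of Table 6.2 into orthogonal codes and confirms that the two columns are orthogonal: each column sums to zero and their inner product is zero.

```python
from itertools import product

# Orthogonal codes for a three-level attribute (from Table 6.3).
ORTHO_3 = {0: -1, 1: 0, 2: 1}

# The 3 x 3 full factorial in design codes, recoded orthogonally.
rows = [(ORTHO_3[a], ORTHO_3[b]) for a, b in product(range(3), repeat=2)]
col_a = [r[0] for r in rows]
col_b = [r[1] for r in rows]

# In a full factorial each coded column sums to zero and any two
# columns have a zero inner product (i.e., they are uncorrelated).
inner = sum(a * b for a, b in zip(col_a, col_b))
```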
assumes do not have to be the same. For ease, we will assume that the
attributes for the two alternatives are the same. Let us assume that the
attribute levels for the comfort attribute are the same for both alternatives,
but that for alternative 2 we observe attribute levels of 1 hour, 1.5 hours, and
2 hours for the travel time attribute (as opposed to 10, 12, and 14 hours for
alternative 1). Taking the full factorial design for two alternatives, each with
two attributes with three levels, 81 different treatment combinations exist.
How did we arrive at this number?
The full enumeration of possible choice sets is equal to L^(JH) for labeled
choice experiments (defined in Section 6.2.3.1) and L^H for unlabeled experiments,
where L is the number of attribute levels, J the number of alternatives, and H the
number of attributes per alternative. Thus, the above example yields 81 (i.e.,
3^(2×2) = 3^4) possible treatment combinations, assuming a labeled choice
experiment; for an unlabeled choice experiment, we could reduce this to nine
treatment combinations (i.e., 3^2).
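These counts can be reproduced directly by enumeration, here with L = 3 levels, J = 2 alternatives, and H = 2 attributes per alternative. This is an illustrative Python sketch; the variable names are ours.

```python
from itertools import product

L = range(3)      # three levels per attribute, design-coded 0, 1, 2
H = 2             # attributes per alternative
J = 2             # alternatives (labeled experiment)

# Labeled experiment: L**(J*H) treatment combinations.
labeled = list(product(L, repeat=J * H))      # 3**4 = 81 combinations

# Unlabeled experiment: only L**H distinct profiles are needed.
unlabeled = list(product(L, repeat=H))        # 3**2 = 9 combinations
```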
[Table fragment: Alternative 1 = Car, Alternative 2 = Plane]
As an aside, a further problem arising from the use of labeled experiments develops from the
perceptual assumptions decision makers hold for each labeled alternative. To date, we have
kept our example simple for pedagogical purposes. Clearly, however, in reality, mode choice
depends on more than the two attributes we have identified. Decision makers may use
assumptions surrounding the labels attached to alternatives as proxies for these omitted
attributes. We invite the reader to return to our earlier discussion on the IID assumption in
Chapter 4 to see how omitted attributes are treated. The message here is that one should
spend as much time as is feasible in identifying which attributes, attribute levels, and
attribute level labels to use in an experiment.
with Qantas or their competitors, simply because of the brand name Qantas.
The same logic has engendered an emotional attachment to light rail com-
pared to bus rapid transit (Hensher et al. 2014, in press). The brand name
connotes an historical accumulation of utility associated with attribute levels
experienced in the past, and as such is a very powerful influence on choice,
often downgrading the role of currently observed (actual or perceived) attri-
bute levels.
[Table fragment: attributes listed by alternative (Car, Bus, Train, Plane)]
That is, each attribute will have only two attribute levels, both at the two
extremes of the attribute level range. Such designs are known as end-point
designs (as promoted in Louviere et al., 2000, Chapter 5). For the example
above, using an end-point design reduces the number of treatment combina-
tions to 256. End-point designs are particularly useful if the analyst believes
that linear relationships exist among the part-worth utilities, or if the analyst is
using the experiment as an exploratory tool.
As an aside, the number of rows, which represents the number of alternative combinations of
attribute levels, is critical to the determination of column orthogonality; but once
orthogonality is established for a given number of rows, we can easily remove columns
without affecting it. Removing rows, however, will affect the orthogonality. In
studies we often give individuals sub-sets of rows, which is fine provided that, when we pool
the data for analysis, we retain an equal number of responses for each row. Sampling is
therefore paramount in preserving orthogonality.
As an aside, full factorial designs are orthogonal by construction, which is why we have been
able to ignore this important concept to date.
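The asymmetry between rows and columns can be demonstrated concretely. The sketch below (illustrative Python, using a hypothetical orthogonally coded 2^3 full factorial) shows that deleting a column preserves column orthogonality while deleting a row destroys it.

```python
from itertools import product

# Orthogonally coded 2**3 full factorial (codes -1 and 1).
design = [row for row in product((-1, 1), repeat=3)]

def orthogonal(rows):
    """True if every pair of columns has a zero inner product."""
    cols = list(zip(*rows))
    return all(sum(x * y for x, y in zip(cols[i], cols[j])) == 0
               for i in range(len(cols)) for j in range(i + 1, len(cols)))

full_ok = orthogonal(design)                      # full factorial: orthogonal
drop_col = orthogonal([r[:2] for r in design])    # column removed: still orthogonal
drop_row = orthogonal(design[1:])                 # row removed: orthogonality lost
```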
Vi = β0i + β1i f(X1i) + β2i f(X2i) + β3i f(X3i) + . . . + βKi f(XKi),   (6.1)
where
β1i is the weight (or parameter) associated with attribute X1 and alternative i;
β0i is a parameter not associated with any of the observed and measured
attributes, called the alternative-specific constant (ASC), which represents on
average the role of all the unobserved sources of utility.
Using Equation (6.1), an ME is the effect each attribute has on the response
variable (Vi in Equation 6.1) independent of all other attribute effects.
Examination of Equation (6.1) suggests that the impact of any attribute, for
example X1i, on Vi, is equivalent to its associated parameter weight, in this
instance β1i. Thus the βki represent our estimates of MEs. For any given
design, the total number of MEs that we can estimate is equivalent to the
number of attribute levels present in the design.
What we have not shown in Equation (6.1) are the interaction terms. An
interaction occurs when the preference for the level of one attribute is
dependent upon the level of a second attribute. A good example of this is
nitro-glycerine. Kept separately, nitro and glycerine are relatively inert; how-
ever, when combined an explosive compound is created. This is not a chem-
istry text, however, and a useful example for students of choice is warranted.
The part-worth utility functions might thus look like Equation (6.2):
Vi = β0i + β1i f(X1i) + β2i f(X2i) + β3i f(X3i) + . . . + βKi f(XKi) + βLi f(X1iX2i)
+ βMi f(X1iX3i) + . . . + βOi f(X1iXKi) + βPi f(X2iX3i) + . . . + βZi f(X1iX2iX3i . . . XKi),
(6.2)
where
f(X1iX2i) is the two-way interaction between the attributes X1i and X2i, and
βLi is the associated interaction effect; f(X1iX2iX3i . . . XKi) is the Kth-way interaction, and
βZi is the related interaction effect.
Returning to our example, assume that the analyst identified color as being
an important attribute for the bus alternative. Research showed that for trips
of 10 hours or less, decision makers had no preference for the color of the bus.
However, for trips over 10 hours, bus patrons prefer light colored buses to
dark colored buses (the analyst suspects that dark colored buses become hotter
and therefore more uncomfortable over longer distances). As such, the pre-
ference decision makers have for the bus alternative is not formed by the effect
of color independent of the effect of travel time but rather is formed due to
some combination of both.
Because the level of one attribute when acting in concert with a second
attribute’s level affects utility for that alternative, the analyst should not
examine the two variables separately, but rather in combination with one
another. That is, the bus company should not look at the decision of which
color bus to use as separate to the decision of what route to take (affecting
travel times). Rather, the two decisions should be considered together in order
to arrive at an optimal solution. In terms of our model, if an interaction effect
is found to be significant, then we need to consider the variables collectively
(though the model itself does not tell us what the optimal combination is). If
the interaction effect is found not to be significant, then we examine the main
effects by themselves in order to arrive at the optimal solution.
As an aside, one might confuse the concept of interaction with the concept of correlation.
Correlation between variables is said to occur when we see movement in one variable similar
to the movement in a second variable. For example, a positive correlation may be said to exist if, as
price increases, we also observe quality increasing. While this looks remarkably like the concept
of interactions discussed above, it is not. The concept of an interaction between two attributes is
about the impact two attributes are having when acting in concert. Thus, in the example described
earlier, we are not interested in whether, as the level of color changes from light to dark, travel
time increases. Rather we are interested in the impact certain combinations of color and travel
time may have on bus patronage (i.e., increasing utility for the bus alternative relative to the other
alternatives). That is, which combinations of color and travel time will sell more bus tickets? Put
simply, a correlation is said to be a relationship between two variables, whereas an interaction
may be thought of as the impact two (or more) variables have on a third (response) variable.
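The distinction drawn in the aside can be made concrete with a toy 2×2 example (illustrative Python; the coding of color and travel time is hypothetical): the two attribute columns are uncorrelated by design, yet their product, the interaction, can carry all of the effect on the response.

```python
from itertools import product

# 2 x 2 full factorial in two effects-coded attributes (-1, 1):
# hypothetically, color (light/dark) and travel time (short/long).
rows = list(product((-1, 1), repeat=2))
x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]

# Zero correlation: by design the attributes do not move together.
corr_numerator = sum(a * b for a, b in zip(x1, x2))

# Yet the response can depend entirely on the interaction: here the
# (hypothetical) utility is literally the product of the two codes.
utility = [a * b for a, b in zip(x1, x2)]
interaction_effect = sum(u * a * b for u, a, b in zip(utility, x1, x2))
```

The attributes are uncorrelated (the cross-product sum is zero) while the interaction term alone explains every response value: correlation and interaction are different concepts.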
Interaction and main effects are important concepts and must be fully
understood by the analyst. One benefit of using full factorial designs is that
all the main effects and all the interaction effects may be estimated indepen-
dent of one another. That is, the analyst may estimate parameters for all main
effects and interaction effects such that there is no confoundment present. As
we argue later, however, the above arguments relate to linear models. Due to the
exponentiation in the choice probabilities, discrete choice models are non-linear
Variable
Attribute level Comfort1 Comfort2
High 1 0
Medium 0 1
Low 0 0
Such coding allows for non-linear effects to be tested in the levels of the
attributes. Returning to Equation (6.3), Comfort1 would now be associated
with f(X1i) and Comfort2 with f(X2i). Consequently we now have two β
parameters associated with our single comfort attribute, β1i and β2i. The utility
associated with a high level of comfort, ceteris paribus, now becomes:
What we now have is a different value of utility associated with each level of
the attribute coded. We have therefore overcome the problem noted with the
more traditional coding method of linear changes in the response variable
given one unit changes in the explanatory variable.
We have left β0i in Equations (6.4) through (6.9) quite deliberately.
Examination of Equation (6.9) shows that the utility associated with the
base level will always, by default, equal β0i. That is, we are not measuring
the utility associated with low comfort at all, but rather the average overall
utility level when we look at the utility for the base level. This suggests that by
dummy coding the data we have perfectly confounded the base level of an
attribute with the overall or grand mean. Each attribute we dummy code will
also be perfectly confounded with the grand mean. The question is then: What
have we measured? Have we measured the utility for the base level or the
overall or grand mean?
It is for the above reason that we prefer effects coding as opposed to dummy
coding. Effects coding has the same advantage as dummy coding in that non-linear
effects in the attribute levels may be measured, but it dispenses with the
disadvantage of perfectly confounding the base attribute level with the grand
mean of the utility function.
To effects code, we follow the procedure set out above for dummy coding;
however, instead of coding the base level 0 across our newly created variables,
we now code the base level as −1 across each of these new variables. Thus for
our example we now have the coding structure in Table 6.8.
Table 6.8 Effects coding structure

Attribute level    Comfort1    Comfort2
High                   1           0
Medium                 0           1
Low                   −1          −1
Table 6.9 Effects coding structure for attributes with up to five levels

Two-level attribute:
Level 1     1
Level 2    −1

Three-level attribute:
Level 1     1    0
Level 2     0    1
Level 3    −1   −1

Four-level attribute:
Level 1     1    0    0
Level 2     0    1    0
Level 3     0    0    1
Level 4    −1   −1   −1

Five-level attribute:
Level 1     1    0    0    0
Level 2     0    1    0    0
Level 3     0    0    1    0
Level 4     0    0    0    1
Level 5    −1   −1   −1   −1
As we have not changed the coding for the high and medium comfort levels,
Equations (6.7) and (6.8) still hold. However, given the change in the coding for
the low level of comfort, the utility associated with that level now becomes:

Vi(low comfort) = β0i + β1i(−1) + β2i(−1) = β0i − β1i − β2i
We note that the utility for the base level is now no longer perfectly confounded
with alternative i’s grand mean, but rather may be estimated as β0i – β1i – β2i. As
such, the effect of the base level is therefore equivalent to – β1i – β2i around β0i.
In Table 6.9, we show the coding structure for attributes up to five levels.
For attributes with more than five levels, the analyst is simply required to add
more variables.
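The pattern in Table 6.9 can be generated mechanically for any number of levels; a minimal sketch (the helper name is our own):

```python
def effects_code(level, num_levels):
    """Return the effects-coded row (length num_levels - 1) for a level
    numbered 1..num_levels; the base (last) level is coded -1 throughout."""
    width = num_levels - 1
    if level == num_levels:      # base level
        return [-1] * width
    row = [0] * width
    row[level - 1] = 1           # indicator position for non-base levels
    return row

# Reproduce the three-level block of Table 6.9
for level in (1, 2, 3):
    print(effects_code(level, 3))  # [1, 0] / [0, 1] / [-1, -1]
```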
To demonstrate the importance of the coding choice (i.e., the use of a single
(linear) attribute or several dummy or effects coded variables representing a
single attribute), consider Figure 6.4.
Figure 6.4 Estimation of linear versus quadratic effects (panels (a) to (d) plot
utility against attribute level for attributes with five, two, three, and five
levels, respectively)
As an aside, because the estimation of a single parameter for an attribute will produce a
linear estimate (i.e., slope), we refer to such estimates as linear estimates. An attribute
estimated with two dummy (or effects) parameters is known as a quadratic estimate, and
subsequent dummy (or effects) parameters produce polynomial estimates of degree L − 1
(where L is the number of levels of the attribute).
What the above discussion suggests is that the more complex the part-worth
utility function, the better off one is to move to a more complex coding structure
capable of estimating more complex non-linear relationships. Of course, prior
to model estimation, beyond experience and information gleaned from other
studies, the analyst will have no information as to the complexity of a part-worth
utility function. This would suggest that the analyst is best to assume the worst
and produce models capable of estimating complex non-linear relationships.
However, as we shall see, this comes at a considerable cost, and may be far from
the best strategy to employ.
The degrees of freedom available to a design are the number of independent
observations it yields minus the number of independent (linear) constraints placed
upon it during the modeling process. The independent (linear) constraints are the β
parameters we estimate, including any constants.
The above definition suggests that the more parameters an analyst desires to
estimate, the greater the number of degrees of freedom required for estimation
purposes. That is, the more complex non-linear relationships we wish to
detect, the more parameters we are required to estimate, and in turn the
more degrees of freedom we require for model estimation. As we shall show,
more degrees of freedom mean larger designs.
Assuming the estimation of a main effects only model (ignoring interac-
tions between attributes), the degrees of freedom required of a design depend
on the types of effects to be estimated and whether the design is labeled or
unlabeled. For our example, each (labeled) alternative (i.e., car, bus, train, and
plane, i.e., J = 4) has two attributes defined on three levels. Assuming estimation
of linear effects only, the degrees of freedom required for the design are
equal to eight (i.e., 4 × 2). As noted above, the degrees of freedom required
of a design corresponds to the number of parameters to be estimated over all
alternatives. Consider the utility functions for each of the four alternatives
given that the marginal utilities (or part-worths) are assumed to be linear. The
utility functions will be estimated as shown below (ignoring constant terms),
where X1 and X2 denote the comfort and travel time attributes of each
alternative:

Vcar = β1car X1car + β2car X2car
Vbus = β1bus X1bus + β2bus X2bus
Vtrain = β1train X1train + β2train X2train
Vplane = β1plane X1plane + β2plane X2plane
Table 6.10 Minimum treatment combination requirements for main effects only
fractional factorial designs (effects type by unlabeled versus labeled
experiment)
In the first utility specification, the main effects are estimated as linear effects
and the interaction effect as the multiplication of the two linear main effects.
Thus, the interaction effect requires the estimation of only a single parameter
and hence necessitates only a single degree of freedom. In the second utility
specification, the main effects have been estimated as non-linear effects (i.e.,
they have been either dummy or effects coded). Interaction effects may thus be
estimated for each combination of the non-linear main effects. Under this
specification, 4 interactions are generated requiring 4 parameter estimates and
an additional 4 degrees of freedom. The last utility specification shows an
example whereby the main effects are estimated as non-linear effects; how-
ever, the interaction effect is estimated as if the main effects were linear in
effect. As such, only a single parameter will be estimated for the interaction
effect requiring only a single degree of freedom for estimation. The total
number of degrees of freedom required for this specification is 6.
The degrees of freedom required for the estimation of interaction terms
therefore depend on how the utility functions are likely to be estimated. The
degrees of freedom from an interaction term estimated from linear main effects
(two or more) will be one. Two-way interaction terms estimated from non-
linear main effects will be equal to (L1 – 1) × (L2 – 1), where L1 is the number of
levels associated with attribute 1 and L2 is the number of levels associated with
attribute 2. The degrees of freedom associated with the addition of attributes to
an interaction (e.g., three-way interactions) require the addition of multiplica-
tion terms to Equation (6.12) (e.g., (L1 – 1) × (L2 – 1) × (L3 – 1)).
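Under the counting rules just described (a non-linear main effect for an L-level attribute costs L − 1 degrees of freedom, and an interaction of non-linear effects costs the product of the (L − 1) terms), the chapter's arithmetic can be sketched as follows. The helper names are our own:

```python
def main_effect_df(levels, nonlinear=True):
    """Degrees of freedom needed for one attribute's main effect."""
    return levels - 1 if nonlinear else 1

def interaction_df(levels_list, nonlinear=True):
    """Degrees of freedom needed for one interaction term, e.g.
    (L1 - 1) * (L2 - 1) for a two-way interaction of non-linear effects."""
    if not nonlinear:
        return 1
    df = 1
    for levels in levels_list:
        df *= levels - 1
    return df

# The chapter's example: 8 three-level attributes with non-linear main
# effects, plus two two-way interactions (comfort x travel time for the
# car and bus alternatives)
main = sum(main_effect_df(3) for _ in range(8))            # 16
inter = interaction_df([3, 3]) + interaction_df([3, 3])    # 8
print(main + inter)                                        # 24
```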
Given knowledge of all of the above, we are now ready to proceed with the
design of an experiment. Firstly, the analyst is required to determine which
effects of interest are to be modeled. It is usual to model all main effects (treated
as either linear or non-linear) and ignore any possible interaction effects; hence,
the smallest number of effects to be estimated is equivalent to the number of
main effects (i.e., parameters). This will produce a model equivalent to Equation
(6.1). Such designs are called orthogonal main effects only designs assuming
orthogonality is retained as a statistical property. We noted earlier that reducing
the number of treatment combinations through the use of fractional factorial
designs results in confoundment of effects. Main effects only designs are designs
in which the main effects are estimable independently of all other effects, but the
interaction effects will be confounded with one another.
For our example, we have 8 attributes each with 3 levels (2 attributes for
each of the 4 alternatives). Assuming that non-linear estimates are required,
each attribute requires 2 degrees of freedom for main effects to be estimated,
suggesting that the design generated requires at minimum 16 degrees of freedom.
A B C D E F G H
AB BC CD DE EF FG GH
AC BD CE DF EG FH
AD BE CF DG EH
AE BF CG DH
AF BG CH
AG BH
AH
Fortunately, the analyst is able to generate designs that allow for the
estimation of selected interaction effects. Such designs require advance knowl-
edge as to which interactions are likely to be statistically significant. The ability
to estimate interaction effects comes at the cost of design size, however. The
more interaction effects the analyst wishes to estimate the more treatment
combinations are required. Let us assume that our analyst believes that the
interaction between comfort and travel time will be significant for the car
alternative and similarly for the bus alternative. Therefore, the analyst wishes
to estimate two two-way interaction effects. Taking the two-way interaction
effect for the car alternative and assuming non-linear effects, the degrees of
freedom for the interaction effect is 4 (i.e., (3 – 1) × (3 – 1)) (one if linear effects
are used). Similarly, the two-way interaction for the bus alternative is also 4.
Thus the analyst now requires a design with 24 degrees of freedom (16 main
effects degrees of freedom plus 8 two-way interaction degrees of freedom).
Again, a search for an orthogonal array shows that the smallest number of
treatment combinations is 27 and not 24. Thus the analyst must generate a
design with 27 treatment combinations. So how do we determine which 27
treatment combinations to select (recall that there exist 6,561 such
combinations)?
One strategy is to add an orthogonal 3-level blocking column, the
result of which is that 3 different decision makers are required to complete the
full design. Assuming that the analyst has done as described above, then for
the design with 27 treatment combinations, each decision maker would
receive 9 of the 27 treatment combinations. If a 9-level column was used,
then 9 respondents would each receive 3 treatment combinations. Note that
we could have blocked the full factorial design (although the block would not
be orthogonal; full factorial designs allocate all possible orthogonal columns to
the attributes); however, as can be seen from the above, for a fixed number of
treatment combinations, the sample size required for a blocked design increases
in inverse proportion as the number of treatment combinations within a block
decreases.
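The respondent arithmetic behind blocking is simple division; a minimal sketch (the function is our own illustration):

```python
def respondents_needed(total_treatments, block_size):
    """Number of respondents required to cover a blocked design once,
    given that each respondent completes exactly one block."""
    assert total_treatments % block_size == 0, "blocks must tile the design"
    return total_treatments // block_size

print(respondents_needed(27, 9))  # 3 respondents see 9 treatments each
print(respondents_needed(27, 3))  # 9 respondents see 3 treatments each
```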
As an aside, a design is only orthogonal if the complete (fractional factorial or full factorial)
design is used. Thus, if blocks of block size nine are used and only two of the three decision
makers complete the experiment, the (pooled) design used at the time of estimation will not
be orthogonal. Acknowledgement of this fact has largely been ignored by both academics
and practitioners. One wonders how many carefully crafted orthogonal designs in reality
have maintained their statistical properties after the data have been collected and used in
model estimation!
As a further aside, note that we suggested a blocking strategy that involves the use of an
extra column that is orthogonal to the other design columns. By using an orthogonal column
for the block, the attribute parameter estimates will be independent of the assigned blocks.
This is important statistically; however, it may not always be possible to add an extra design
column for the purpose of blocking without increasing the number of treatment combina-
tions, as for every design there exists a finite number of orthogonal columns available that
the analyst may choose from. It is therefore not uncommon to move to a larger design in
order to locate an additional design column that may be allocated as a blocking variable.
Note, however, that unless it is the desire of the analyst to test the effect of a blocking
variable, assuming that there exists an additional design column that may be allocated as
the blocking variable, we do not have to increase the number of treatment combinations as a
result of an increase in the degrees of freedom required.
attributes such that each design column that will be generated will be assigned to
a specific attribute (e.g., SPSS will generate a column in the design called
comfort). Examination of the design process outlined in Figure 6.1 suggests
that for main effects only designs, the generation of the design (stage four) and
the allocation of attributes to design columns (stage five) occur simultaneously.
If the analyst wishes to test for specific interaction effects (e.g., the two-way
interaction between the comfort and travel time attributes for the car alternative)
then stages four and five of the design process occur sequentially such that the
analyst provides SPSS with generic attribute names and assigns the generated
design columns at a later stage of the design process. We see why this is so as our
discussion progresses.
As an aside, the orthogonal design developed in SPSS can also be developed in Ngene. Later
in the chapter we show how you can use Ngene to design most choice experiments.
Returning to our earlier example, we note that the analyst requires a main
effects plus selected (two-way) interaction design (i.e., the two way interaction
between the attributes comfort and travel time for the car alternative, and the
two-way interaction between comfort and travel time for the bus alternative).
Note that the analyst must pre-specify which interactions are to be tested
before a design can be generated, as well as whether linear or non-linear effects
(or some combination of the two) are to be estimated.
To generate an experimental design in SPSS, the following actions are
required. Go to the Data option in the toolbar menu. In the drop-down
menu select Orthogonal Design and then Generate. . . This will open the
Generate Orthogonal Design dialog box. We show these dialog boxes in
Figure 6.5.
To progress, the analyst uses the Generate Orthogonal Design dialog box to
name the attributes to be generated. We do this by typing the names of the
attributes (factors) into the Factor Name box and pressing the Add button after
each entry. Once the Add button has been pressed, the attribute name will
appear in the box next to the Add button. Note that SPSS allows for designs with
up to 10 attributes (we shall discuss later how one may generate larger designs).
Continuing with our example, the analyst provides each of the attributes with
generic titles. For this example, we will use the names, A, B, C, D, E, F, G, H, and
I. Note that we have provided 9 attribute names, and not 8. We will use one of
these attributes (we do not know which yet) as a blocking attribute.
Next the analyst must specify the number of attribute levels. For main
effects only designs, the analyst provides attribute names that are specific to
names. Before we can continue, the analyst must be aware of the number of
treatment combinations required due to the degrees of freedom necessary for
estimation purposes. In generating a design for any number of attributes and
attribute levels, SPSS will generate the smallest design available (although this is
not strictly true, as we will see) capable of estimating non-linear effects. As we
have seen, the smallest experimental design possible (that we would consider
using) is a main effects only design. Thus, if a larger design as a result of a greater
number of degrees of freedom is required for estimation purposes (e.g., due to the
necessity to estimate interaction effects), we are required to inform SPSS of this.
We do this by selecting the options button in the Generate Orthogonal Design
dialog box. This will open the Generate Orthogonal Design: Options dialog box.
Following on from our earlier example, we note that the minimum number of
treatment combinations required for a design with main effects plus 2 two-way
interactions, where each attribute within the design has 3 levels, is 24. We
therefore place 25 in the Minimum number of cases to generate: box and press
Continue. (We put 25 instead of 24 purely to demonstrate that the number of
treatment combinations, S, may exceed the degrees of freedom required of the
design; it must only be at least as large, so we could equally have put 24 here.)
The analyst is now ready to generate an experimental design. Before doing
so, however, the analyst may elect to generate the design in a file saved
somewhere on their computer, or to have the design replace the active work-
sheet in SPSS. This decision is made in the Generate Orthogonal Design dialog
box in the section titled Data File (Figure 6.7). Having selected where the
design is to be output, press the OK button to generate the design.
As an aside, note that the reader is best to generate statistical designs from first principles
rather than use statistical packages such as SPSS. To see why, we invite the reader to
generate a design with 8 attributes each with 3 levels without specifying the minimum
number of treatment combinations to generate. In doing so, the reader should note that the
design generated will have 27 treatment combinations. This is certainly not the smallest
design possible. As we noted earlier, we are able to generate a workable design capable of
estimating non-linear effects for each attribute from first principles that has only 18
treatment combinations for the attributes and attribute levels as specified. While the 27
treatment combination design is usable, this highlights one problem in using computer
programs to generate designs.
As a further aside, note that SPSS will generate a different design each time the process
described is followed. Thus, if the reader uses SPSS to generate two designs without
changing any inputs to the program, two completely different designs will be generated. As
such, the reader is advised that should they follow our example and attempt to replicate the
design we display in the next section, it is probable that they will obtain a different looking
design. No matter what the design generated, SPSS ensures that the design generated will
be orthogonal.
Treatment
combination A B C D E F G H I
1 −1 −1 1 0 0 −1 0 −1 1
2 1 0 1 −1 −1 0 0 −1 −1
3 0 0 1 0 −1 −1 1 0 0
4 0 0 0 1 0 −1 −1 0 −1
5 1 −1 0 −1 1 1 −1 0 1
6 0 1 0 −1 −1 1 1 −1 −1
7 −1 1 0 0 −1 0 −1 0 0
8 0 −1 0 0 1 0 0 1 −1
9 1 0 0 0 0 0 1 −1 1
10 1 1 1 0 1 −1 −1 1 −1
11 0 −1 −1 1 −1 0 1 1 1
12 −1 0 −1 0 1 1 1 1 −1
13 −1 1 −1 1 0 0 0 0 −1
14 1 1 −1 −1 0 −1 1 1 0
15 −1 −1 0 1 1 −1 1 −1 0
16 0 −1 1 −1 0 0 −1 1 0
17 −1 0 1 1 −1 1 −1 1 1
18 −1 −1 −1 −1 −1 −1 −1 −1 −1
19 1 −1 −1 0 −1 1 0 0 0
20 −1 1 1 −1 1 0 1 0 1
21 1 0 −1 1 1 0 −1 −1 0
22 0 1 −1 0 0 1 −1 −1 1
23 −1 0 0 −1 0 1 0 1 0
24 1 1 0 1 −1 −1 0 1 1
25 0 1 1 1 1 1 0 −1 0
26 1 −1 1 1 0 1 1 0 −1
27 0 0 −1 −1 1 −1 0 0 1
As an aside, it is good practice to test the effect of blocks on the experiment. This may
include testing the interaction of the blocking variable with the attribute columns of the
design. If we use up to the nine-way interaction, then we have included the blocking
variable in the interaction effect. Doing so, however, will require that the blocking variable
be included in the model, which in turn requires that the degrees of freedom for that
variable be included in the determination of how many treatment combinations are
required.
As a rule of thumb, the analyst should generate all interactions up to the order of
the highest order interaction the analyst wishes to estimate. For our example,
the highest order interaction the analyst desires to test is a two-way interac-
tion. Thus, all two-way interaction columns should be produced. Had it been
the desire of the analyst to test a four-way interaction (e.g., the interaction
between the travel time attributes for each of the alternatives) then all two-
way, three-way, and four-way design columns should be examined. Table 6.13
shows the design columns for all main effects and all two-way interactions
columns for the design shown in Table 6.12.
As an aside, Table 6.13 was derived using Microsoft Excel as shown in Figure 6.8. In
Figure 6.8, cells K2 through P2 show the calculations of the two-way interaction effects,
while cells K3 through Q28 show the results of similar calculations for each remaining row of
the design.
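The Excel calculation described in the aside multiplies two design columns element by element; the same operation in Python, using the first five rows of the B and F columns of Table 6.12, is:

```python
# First five rows of design columns B and F from Table 6.12
B = [-1, 0, 0, 0, -1]
F = [-1, 0, -1, -1, 1]

# Two-way interaction column: elementwise product, as in the Excel formulas
BF = [b * f for b, f in zip(B, F)]
print(BF)  # [1, 0, 0, 0, -1]
```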
The next stage of the process is to generate the complete correlation matrix
for all main effects and interaction terms. This is shown in Table 6.14.
Examining the correlation matrix in Table 6.14 reveals that all of the main
effects are uncorrelated with all other main effects. Using the terminology of
the experimental design literature, we say that the main effects are un-
confounded with each other.
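This claim can be verified directly. The sketch below computes Pearson correlations between the first four main effects columns (A to D) of Table 6.12 using only the standard library; every off-diagonal correlation is zero:

```python
def pearson(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Columns A-D of the 27-run design in Table 6.12
A = [-1,1,0,0,1,0,-1,0,1,1,0,-1,-1,1,-1,0,-1,-1,1,-1,1,0,-1,1,0,1,0]
B = [-1,0,0,0,-1,1,1,-1,0,1,-1,0,1,1,-1,-1,0,-1,-1,1,0,1,0,1,1,-1,0]
C = [1,1,1,0,0,0,0,0,0,1,-1,-1,-1,-1,0,1,1,-1,-1,1,-1,-1,0,0,1,1,-1]
D = [0,-1,0,1,-1,-1,0,0,0,0,1,0,1,-1,1,-1,1,-1,0,-1,1,0,-1,1,1,1,-1]

cols = {"A": A, "B": B, "C": C, "D": D}
names = list(cols)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        print(p, q, pearson(cols[p], cols[q]))  # every pair: 0.0
```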
Figure 6.10 Microsoft Excel Data Analysis and Correlation dialog boxes
As an aside, the correlation matrix shown as Table 6.14 was also derived using Microsoft
Excel. This can be done by first selecting the Tools toolbar option followed by the Data
Analysis. . . option as shown in Figure 6.9. Note that the Data Analysis option is not
automatically installed with Microsoft Excel. If the option is not present in the Tools
Toolbar dropdown menu, the reader will need to add the option via the Add-Ins. . . option,
also shown in Figure 6.9.
Selecting Data Analysis. . . from the Tools dropdown menu will open the
Data Analysis dialog box (see Figure 6.10). From the Data Analysis dialog box
the analyst next selects the heading Correlation before pressing OK. This will
open the Correlation dialog box also shown in Figure 6.10.
Table 6.13 Orthogonal codes for main effects plus all two-way interaction columns
Treatment Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
A −1 1 0 0 1 0 −1 0 1 1 0 −1 −1 1 −1 0 −1 −1 1 −1 1 0 −1 1 0 1 0
B −1 0 0 0 −1 1 1 −1 0 1 −1 0 1 1 −1 −1 0 −1 −1 1 0 1 0 1 1 −1 0
C 1 1 1 0 0 0 0 0 0 1 −1 −1 −1 −1 0 1 1 −1 −1 1 −1 −1 0 0 1 1 −1
D 0 −1 0 1 −1 −1 0 0 0 0 1 0 1 −1 1 −1 1 −1 0 −1 1 0 −1 1 1 1 −1
E 0 −1 −1 0 1 −1 −1 1 0 1 −1 1 0 0 1 0 −1 −1 −1 1 1 0 0 −1 1 0 1
F −1 0 −1 −1 1 1 0 0 0 −1 0 1 0 −1 −1 0 1 −1 1 0 0 1 1 −1 1 1 −1
G 0 0 1 −1 −1 1 −1 0 1 −1 1 1 0 1 1 −1 −1 −1 0 1 −1 −1 0 0 0 1 0
H −1 −1 0 0 0 −1 0 1 −1 1 1 1 0 1 −1 1 1 −1 0 0 −1 −1 1 1 −1 0 0
I 1 −1 0 −1 1 −1 0 −1 1 −1 1 −1 −1 0 0 0 1 −1 0 1 0 1 0 1 0 −1 1
AB 1 0 0 0 −1 0 −1 0 0 1 0 0 −1 1 1 0 0 1 −1 −1 0 0 0 1 0 −1 0
AC −1 1 0 0 0 0 0 0 0 1 0 1 1 −1 0 0 −1 1 −1 −1 −1 0 0 0 0 1 0
AD 0 −1 0 0 −1 0 0 0 0 0 0 0 −1 −1 −1 0 −1 1 0 1 1 0 1 1 0 1 0
AE 0 −1 0 0 1 0 1 0 0 1 0 −1 0 0 −1 0 1 1 −1 −1 1 0 0 −1 0 0 0
AF 1 0 0 0 1 0 0 0 0 −1 0 −1 0 −1 1 0 −1 1 1 0 0 0 −1 −1 0 1 0
AG 0 0 0 0 −1 0 1 0 1 −1 0 −1 0 1 −1 0 1 1 0 −1 −1 0 0 0 0 1 0
AH 1 −1 0 0 0 0 0 0 −1 1 0 −1 0 1 1 0 −1 1 0 0 −1 0 −1 1 0 0 0
AI −1 −1 0 0 1 0 0 0 1 −1 0 1 1 0 0 0 −1 1 0 −1 0 0 0 1 0 −1 0
BC −1 0 0 0 0 0 0 0 0 1 1 0 −1 −1 0 −1 0 1 1 1 0 −1 0 0 1 −1 0
BD 0 0 0 0 1 −1 0 0 0 0 −1 0 1 −1 −1 1 0 1 0 −1 0 0 0 1 1 −1 0
BE 0 0 0 0 −1 −1 −1 −1 0 1 1 0 0 0 −1 0 0 1 1 1 0 0 0 −1 1 0 0
BF 1 0 0 0 −1 1 0 0 0 −1 0 0 0 −1 1 0 0 1 −1 0 0 1 0 −1 1 −1 0
BG 0 0 0 0 1 1 −1 0 0 −1 −1 0 0 1 −1 1 0 1 0 1 0 −1 0 0 0 −1 0
BH 1 0 0 0 0 −1 0 −1 0 1 −1 0 0 1 1 −1 0 1 0 0 0 −1 0 1 −1 0 0
BI −1 0 0 0 −1 −1 0 1 0 −1 −1 0 −1 0 0 0 0 1 0 1 0 1 0 1 0 1 0
CD 0 −1 0 0 0 0 0 0 0 0 −1 0 −1 1 0 −1 1 1 0 −1 −1 0 0 0 1 1 1
CE 0 −1 −1 0 0 0 0 0 0 1 1 −1 0 0 0 0 −1 1 1 1 −1 0 0 0 1 0 −1
CF −1 0 −1 0 0 0 0 0 0 −1 0 −1 0 1 0 0 1 1 −1 0 0 −1 0 0 1 1 1
CG 0 0 1 0 0 0 0 0 0 −1 −1 −1 0 −1 0 −1 −1 1 0 1 1 1 0 0 0 1 0
CH −1 −1 0 0 0 0 0 0 0 1 −1 −1 0 −1 0 1 1 1 0 0 1 1 0 0 −1 0 0
CI 1 −1 0 0 0 0 0 0 0 −1 −1 1 1 0 0 0 1 1 0 1 0 −1 0 0 0 −1 −1
DE 0 1 0 0 −1 1 0 0 0 0 −1 0 0 0 1 0 −1 1 0 −1 1 0 0 −1 1 0 −1
DF 0 0 0 −1 −1 −1 0 0 0 0 0 0 0 1 −1 0 1 1 0 0 0 0 −1 −1 1 1 1
DG 0 0 0 −1 1 −1 0 0 0 0 1 0 0 −1 1 1 −1 1 0 −1 −1 0 0 0 0 1 0
DH 0 1 0 0 0 1 0 0 0 0 1 0 0 −1 −1 −1 1 1 0 0 −1 0 −1 1 −1 0 0
DI 0 1 0 −1 −1 1 0 0 0 0 1 0 −1 0 0 0 1 1 0 −1 0 0 0 1 0 −1 −1
EF 0 0 1 0 1 −1 0 0 0 −1 0 1 0 0 −1 0 −1 1 −1 0 0 0 0 1 1 0 −1
EG 0 0 −1 0 −1 −1 1 0 0 −1 −1 1 0 0 1 0 1 1 0 1 −1 0 0 0 0 0 0
EH 0 1 0 0 0 1 0 1 0 1 −1 1 0 0 −1 0 −1 1 0 0 −1 0 0 −1 −1 0 0
EI 0 1 0 0 1 1 0 −1 0 −1 −1 −1 0 0 0 0 −1 1 0 1 0 0 0 −1 0 0 1
FG 0 0 −1 1 −1 1 0 0 0 1 0 1 0 −1 −1 0 −1 1 0 0 0 −1 0 0 0 1 0
FH 1 0 0 0 0 −1 0 0 0 −1 0 1 0 −1 1 0 1 1 0 0 0 −1 1 −1 −1 0 0
FI −1 0 0 1 1 −1 0 0 0 1 0 −1 0 0 0 0 1 1 0 0 0 1 0 −1 0 −1 −1
GH 0 0 0 0 0 −1 0 0 −1 −1 1 1 0 1 −1 −1 −1 1 0 0 1 1 0 0 0 0 0
GI 0 0 0 1 −1 −1 0 0 1 1 1 −1 0 0 0 0 −1 1 0 1 0 −1 0 0 0 −1 0
HI −1 1 0 0 0 1 0 −1 −1 −1 1 −1 0 0 0 0 1 1 0 0 0 −1 0 1 0 0 0
As an aside, the correlation coefficient used by Microsoft Excel is the Pearson product
moment correlation coefficient. This statistic is strictly appropriate only when the variables
used in the test are ratio scaled. This is clearly not the case here. The full design with
interaction columns (Table 6.13) could be exported into SPSS and either the Spearman rho
or Kendall’s tau-b correlations be calculated; however, neither of these is strictly appropriate
for the data either. The appropriate measure to use for the design would be the J-index;
however, unless the analyst has access to specialized software, this correlation coefficient
will need to be calculated manually. This calculation is beyond the scope of this book, and as
such we rely on the Pearson product moment correlation coefficient, treating it as an
approximation to the more appropriate similarity indices.
In the Correlation dialog box, all the cells shown in Figure 6.10 are selected
in the Input Range: cell. By selecting the first row that includes the column
headings, the analyst is also required to check the Labels in First Row box. This
will show the column headings as part of the Excel correlation output,
otherwise Excel will assign generic column headings to the resulting correla-
tion matrix output.
Note, however, the existence of correlations with the main effects col-
umns and several of the interaction effects columns (e.g., design column A
is correlated with the BF interaction column), as shown in Table 6.14. This
is an unfortunate consequence of using fractional factorial designs. By
using only a fraction of the available treatment combinations, fractional
factorial designs must confound some of the effects. Unless designs are
generated from first principles, the analyst has no control over which
effects are confounded (another reason why the serious choice analyst is
best to learn how to generate statistical designs from first principles and not
rely on computer packages).
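The confounding of design column A with the BF interaction column can be reproduced numerically from Table 6.12; a sketch using only the standard library:

```python
def pearson(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Columns A, B, and F of the 27-run design in Table 6.12
A = [-1,1,0,0,1,0,-1,0,1,1,0,-1,-1,1,-1,0,-1,-1,1,-1,1,0,-1,1,0,1,0]
B = [-1,0,0,0,-1,1,1,-1,0,1,-1,0,1,1,-1,-1,0,-1,-1,1,0,1,0,1,1,-1,0]
F = [-1,0,-1,-1,1,1,0,0,0,-1,0,1,0,-1,-1,0,1,-1,1,0,0,1,1,-1,1,1,-1]

BF = [b * f for b, f in zip(B, F)]       # two-way interaction column
print(round(pearson(A, B), 6))   # 0.0      -> main effects un-confounded
print(round(pearson(A, BF), 6))  # -0.612372 -> A confounded with BF
```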
As an aside, experience suggests that it is more likely that two-way interactions are
statistically significant than three-way or higher interactions. Thus, designs in which all
main effects are un-confounded with all two-way interactions are preferable. To demon-
strate why this is so, return to our earlier discussion of the effects of confoundment.
Confoundment produces model effects similar to the effects of multicollinearity in linear
regression models. That is, the parameter estimates we obtain at the time of estimation
are likely to be incorrect, as are their standard errors, and as such so are any tests we
perform on attribute significance. Taking this example, unless the analyst specifically
tests the statistical significance of the attributes assigned to the B and F design columns
(we have not yet allocated attributes to columns), the analyst can never be sure that the
parameter estimates for the main effect for the attribute assigned to column A are
correct. We note that design column A is also confounded with the CI, EG, and FH
interaction columns. As with the BF interaction, the significance of any of these inter-
actions also poses problems for model estimation. We draw the reader’s attention to the
fact that the other main effects columns are also correlated with other interaction effects.
Does the analyst test for these effects also? Doing so would require larger designs, given
the additional degrees of freedom needed. Thus the only practical way to proceed is to assume
that these interaction effects are insignificant in practice. While this assumption may seem
unwise, the analyst can improve the odds by selecting, in advance, specific interaction
effects to test. The selection of which effects to test
for occurs prior to design generation. For example, the analyst in our example believed
that the interactions between the comfort attribute and travel time attribute for the car
and bus alternatives were likely to be significant. This determination was made in
advance of the design generation.
To continue, the analyst first assigns the attributes for which interaction
effects are to be tested. The analyst revisits the correlation matrix and iden-
tifies the two-way interaction columns for correlations with the main effects
columns. Examination of Table 6.14 reveals that the AD, BC, BE, BI, CD, CE,
CF, DF, and EF two-way interaction columns are all un-confounded with all
main effects design columns (but not with other two-way interaction col-
umns). The analyst requires four design columns (two for the car interaction
and two for the bus interaction). What is required is to determine which of the
columns to use given the correlations among the main effects and two-way
interaction columns. The analyst may assign any of the column pairs
suggested above. That is, for the interaction
between the comfort and travel time attributes the analyst may assign the
comfort attribute to column A and the travel time attribute to column D (or
comfort to D and travel time to A). Alternatively, the B and C, B and E, B and
I, C and D, C and E, C and F, D and F, or E and F columns could be used. The
analyst must also assign the attributes for the bus alternative to one of these
combinations.
But which combinations should be used? Again, the correlation matrix
provides the answer. Once the analyst has identified which interaction
design columns are un-confounded with the main effects design columns,
the next step is to examine the correlations among the two-way interactions.
Doing so, we note that the AD interaction column is confounded with the
BC, BE, BI, CE, and EF design interaction columns. Hence, if the analyst
were to assign the attributes of the car alternative to the A and D columns
and the bus alternative attributes to any of the combinations of the interac-
tion columns mentioned above, then the estimated interaction effects will be
confounded. Thus, should the analyst decide to use the A and D columns for
one interaction effect, the other interaction attributes should be assigned to
the C and D, C and F, or the D and F columns. Note that we cannot assign
two attributes to the D column; therefore, the second two attributes for
which an interaction effect is to be tested must be assigned to the C and F
columns. Had the B and C columns been utilized for the car attributes, the
reader is invited to check that either D and F or E and F columns may be
used for the second two attributes for which the analyst wishes to obtain
interactions.
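The column-pair selection logic just described can be automated. The sketch below hard-codes the candidate interaction pairs named in the text and, for the AD pair, the confoundments listed above; the `compatible` helper and the encoding of the confoundment relation are our own illustrative constructions, not part of any design package:

```python
# Candidate two-way interaction pairs un-confounded with all main effects
# (the set identified in the text):
candidates = [("A", "D"), ("B", "C"), ("B", "E"), ("B", "I"), ("C", "D"),
              ("C", "E"), ("C", "F"), ("D", "F"), ("E", "F")]

# Interaction-column confoundments (illustrative subset, from the discussion:
# AD is correlated with BC, BE, BI, CE, and EF):
confounded_with = {
    ("A", "D"): {("B", "C"), ("B", "E"), ("B", "I"), ("C", "E"), ("E", "F")},
}

def compatible(p, q):
    """Two interaction pairs can be used together if they share no design
    column and neither is recorded as confounded with the other."""
    if set(p) & set(q):
        return False
    if q in confounded_with.get(p, set()) or p in confounded_with.get(q, set()):
        return False
    return True

# With the car attributes on A and D, the only usable pair left is C and F:
options_with_ad = [q for q in candidates
                   if q != ("A", "D") and compatible(("A", "D"), q)]
```

Given the confoundments recorded above, the search reproduces the text's conclusion: once A and D are taken, the bus attributes must go to the C and F columns.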
As an aside, assuming the A and D columns and B and C columns were the ones used,
the two two-way interactions may be treated as being independent of all main effects,
but not of all other two-way interaction effects. Indeed, as with the main effects, the
analyst must assume that the two-way interactions for the BC, BE, BI, CE, and EF
interaction terms are insignificant in order to proceed (and that is only for the AD
interaction). If, however, it turns out in practice that any other interactions are significant,
then the problem of estimating models with correlated data arises once more. That these
interactions are statistically insignificant is an assumption that cannot be tested.
We have said nothing of the confoundment that exists with higher-order
interaction terms.
Assuming that the analyst elected to assign the car alternative attributes to
the A and D columns and the bus alternative attributes to the C and F design
columns, the remainder of the attributes may be distributed to the remaining
design columns. No interaction terms are required for these, and hence
confoundment of the interaction terms for these remaining attributes is
not an issue. Note that, for our example, all attributes have three levels each,
hence it does not matter to which design columns we assign the remaining
attributes. Had the design required one attribute to have four levels, then
that attribute would be assigned to the design column with four attribute levels
(or a pair of design columns each of two levels). Table 6.15 shows the
attributes as they might be allocated to the design columns for the experi-
mental design introduced in Table 6.11. Note that we have allocated column I
to be the block variable.
As an aside, the reader should be aware of the issue of balanced designs versus
unbalanced designs. A balanced design is a design in which the levels of any given
attribute appear the same number of times as all other levels for that particular attribute.
For example, for the design described in Table 6.15, for each attribute, the level coded
−1 occurs nine times, 0 nine times, and 1 nine times. An unbalanced design is a design
in which the attribute levels do not appear the same number of times within each
attribute for the design. The use of balanced versus unbalanced designs is of interest as
early research suggests that the unbalanced attributes of an unbalanced
design are often found to be statistically significant, not so much because the attribute
itself is statistically significant but because attention is drawn to that attribute at the time
of the survey.
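Attribute level balance is easy to verify programmatically. A minimal check, using illustrative columns rather than any of the tabled designs:

```python
from collections import Counter

def is_balanced(column):
    """True if every level appearing in the column occurs the same
    number of times."""
    counts = Counter(column)
    return len(set(counts.values())) == 1

# A balanced three-level column: each of -1, 0, 1 appears three times.
balanced = [-1, -1, -1, 0, 0, 0, 1, 1, 1]
# An unbalanced column: level 1 appears four times, level -1 only twice.
unbalanced = [-1, -1, 0, 0, 0, 1, 1, 1, 1]
```

Applying `is_balanced` to every column of a candidate design is a quick screen before any correlation analysis.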
                         Design column
Treatment combination    A    D    C    F    E    B    G    H    I
1 −1 0 1 −1 0 −1 0 −1 1
2 1 −1 1 0 −1 0 0 −1 −1
3 0 0 1 −1 −1 0 1 0 0
4 0 1 0 −1 0 0 −1 0 −1
5 1 −1 0 1 1 −1 −1 0 1
6 0 −1 0 1 −1 1 1 −1 −1
7 −1 0 0 0 −1 1 −1 0 0
8 0 0 0 0 1 −1 0 1 −1
9 1 0 0 0 0 0 1 −1 1
10 1 0 1 −1 1 1 −1 1 −1
11 0 1 −1 0 −1 −1 1 1 1
12 −1 0 −1 1 1 0 1 1 −1
13 −1 1 −1 0 0 1 0 0 −1
14 1 −1 −1 −1 0 1 1 1 0
15 −1 1 0 −1 1 −1 1 −1 0
16 0 −1 1 0 0 −1 −1 1 0
17 −1 1 1 1 −1 0 −1 1 1
18 −1 −1 −1 −1 −1 −1 −1 −1 −1
19 1 0 −1 1 −1 −1 0 0 0
20 −1 −1 1 0 1 1 1 0 1
21 1 1 −1 0 1 0 −1 −1 0
22 0 0 −1 1 0 1 −1 −1 1
23 −1 −1 0 1 0 0 0 1 0
24 1 1 0 −1 −1 1 0 1 1
25 0 1 1 1 1 1 0 −1 0
26 1 1 1 1 0 −1 1 0 −1
27 0 −1 −1 −1 1 0 0 0 1
As a further aside, the formulas shown in Table 6.10 are used to calculate the minimum
degrees of freedom necessary for estimating the desired number of parameters. The
numbers derived may, however, not represent the true minimum number of treatment
combinations necessary to achieve an orthogonal design, due to the necessity to maintain
attribute level balance within each attribute. For example, let M = 2, A = 3, and L = 2. The
minimum number of treatment combinations, assuming the estimation of non-linear effects
in the marginal utilities in a labeled choice experiment, is equal to (2 − 1) × 2 × 3 + 1, or 7.
However, such a design will not be balanced, as each attribute has 2 levels that cannot
appear an equal number of times across 7 treatment combinations. This represents an additional
constraint, such that the smallest possible design will have a number of treatment combina-
tions equal to or greater than that calculated using the relevant formula shown in Table 6.10,
but also one that produces an integer when divided by all L.
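Both constraints, the degrees-of-freedom minimum and the divisibility requirement for level balance, can be combined in a small search. The function below encodes only the single formula used in this example, (L − 1) × M × A + 1 for non-linear effects in a labeled experiment; it is a sketch of that one case, not of every formula in Table 6.10:

```python
def min_treatment_combinations(M, A, L):
    """Smallest design size that meets the degrees-of-freedom minimum for
    non-linear effects in a labeled experiment, (L-1)*M*A + 1, and is also
    divisible by L so that each level can appear equally often."""
    size = (L - 1) * M * A + 1   # degrees-of-freedom minimum
    while size % L != 0:         # enforce attribute level balance
        size += 1
    return size

# M = 2 alternatives, A = 3 attributes, L = 2 levels:
# the dof minimum is 7, but level balance pushes the design up to 8 rows.
rows_needed = min_treatment_combinations(2, 3, 2)
```

With attributes of differing level counts, the same idea applies with divisibility checked against every attribute's number of levels.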
Before proceeding to stage six of the design process, the analyst may wish to
sort the experimental design by the blocking variable. Doing so informs the
analyst which mixture of treatment combinations will be shown to various
decision makers. Looking at Table 6.16, we see that one out of every three
decision makers will be given treatment combinations 2, 4, 6, 8, 10, 12, 13, 18,
and 26. Other decision makers will be presented with treatment combinations
3, 7, 14, 15, 16, 19, 21, 23, and 25. Yet other decision makers will be given
treatment combinations 1, 5, 9, 11, 17, 20, 22, 24, and 27.
We have, in generating a fractional factorial design for the example
described, managed to reduce the number of treatment combinations from
6,561 (the full factorial design) to 27. Further, we have managed to reduce the
27 treatment combinations to 9 in terms of how many treatment combina-
tions each decision maker will be presented with (in the guise of choice sets).
We have done so by confounding higher order interaction effects that we are
required to assume will be statistically insignificant.
The experimental design shown in Table 6.16 represents a workable design
capable of estimating all main effects and two two-way interactions. However,
returning to how the design was derived, we note that the
degrees of freedom used to determine the number of treatment combinations
for the design was such that non-linear effects could be estimated for each
attribute. That is, the analyst may elect to dummy or effects code each attribute
and estimate a parameter for each dummy or effect variable thus created. It is
therefore worthwhile examining how the design will look should the analyst
elect to use effects codes (for dummy codes one simply has to replace all −1s
with 0s). Using the orthogonal code of −1 in Table 6.15 to represent the base
level (i.e., the level that will take the value −1 in our effects code), the design
will appear as in Table 6.17.
Note that we did not effects code column I of Table 6.16. This is because this
column represents the blocking column of the design for which we will not be
estimating a parameter when we estimate our choice model (although we
could in order to determine whether the block assigned was a significant
contributor to the choice outcome). Table 6.18 shows the correlation matrix
for the design shown in Table 6.17.
                         Design column
Treatment combination    A    D    C    F    E    B    G    H    I
2 1 −1 1 0 −1 0 0 −1 −1
4 0 1 0 −1 0 0 −1 0 −1
6 0 −1 0 1 −1 1 1 −1 −1
8 0 0 0 0 1 −1 0 1 −1
10 1 0 1 −1 1 1 −1 1 −1
12 −1 0 −1 1 1 0 1 1 −1
13 −1 1 −1 0 0 1 0 0 −1
18 −1 −1 −1 −1 −1 −1 −1 −1 −1
26 1 1 1 1 0 −1 1 0 −1
3 0 0 1 −1 −1 0 1 0 0
7 −1 0 0 0 −1 1 −1 0 0
14 1 −1 −1 −1 0 1 1 1 0
15 −1 1 0 −1 1 −1 1 −1 0
16 0 −1 1 0 0 −1 −1 1 0
19 1 0 −1 1 −1 −1 0 0 0
21 1 1 −1 0 1 0 −1 −1 0
23 −1 −1 0 1 0 0 0 1 0
25 0 1 1 1 1 1 0 −1 0
1 −1 0 1 −1 0 −1 0 −1 1
5 1 −1 0 1 1 −1 −1 0 1
9 1 0 0 0 0 0 1 −1 1
11 0 1 −1 0 −1 −1 1 1 1
17 −1 1 1 1 −1 0 −1 1 1
20 −1 −1 1 0 1 1 1 0 1
22 0 0 −1 1 0 1 −1 −1 1
24 1 1 0 −1 −1 1 0 1 1
27 0 −1 −1 −1 1 0 0 0 1
Examination of Table 6.18 shows that there now exist correlations within
the design. Design orthogonality has been lost. Indeed, design orthogonality
will exist for linear main effects designs only. Once one moves towards designs
capable of estimating non-linear effects using such methods as effects or
dummy coding, one automatically introduces correlations (we leave it to the
reader to show the correlation structure formed when dummy codes are used
instead of effects codes for the above example).
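The source of these correlations can be demonstrated in a few lines: expanding a balanced three-level column into two effects-coded columns necessarily induces a correlation of 0.5 between those two columns, exactly the pattern visible in the correlation matrix. The column below is illustrative, and the `effects_code` helper is our own sketch:

```python
import numpy as np

def effects_code(column, base=-1):
    """Expand a three-level column (orthogonal codes -1, 0, 1) into two
    effects-coded columns, with `base` as the level coded (-1, -1)."""
    levels = [l for l in (-1, 0, 1) if l != base]
    out = np.empty((len(column), 2))
    for i, v in enumerate(column):
        if v == base:
            out[i] = (-1, -1)
        else:
            out[i] = [1 if v == l else 0 for l in levels]
    return out

col = np.array([-1, -1, -1, 0, 0, 0, 1, 1, 1])  # balanced three-level column
coded = effects_code(col)
r = np.corrcoef(coded[:, 0], coded[:, 1])[0, 1]  # 0.5 for a balanced column
```

The 0.5 arises from the coding itself, not the design, which is why orthogonality in the original codes cannot survive the expansion.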
                         Design column
Treatment combination    A1  A2  D1  D2  C1  C2  F1  F2  E1  E2  B1  B2  G1  G2  H1  H2  I
2 1 0 −1 −1 1 0 0 1 −1 −1 0 1 0 1 −1 −1 −1
4 0 1 1 0 0 1 −1 −1 0 1 0 1 −1 −1 0 1 −1
6 0 1 −1 −1 0 1 1 0 −1 −1 1 0 1 0 −1 −1 −1
8 0 1 0 1 0 1 0 1 1 0 −1 −1 0 1 1 0 −1
10 1 0 0 1 1 0 −1 −1 1 0 1 0 −1 −1 1 0 −1
12 −1 −1 0 1 −1 −1 1 0 1 0 0 1 1 0 1 0 −1
13 −1 −1 1 0 −1 −1 0 1 0 1 1 0 0 1 0 1 −1
18 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1
26 1 0 1 0 1 0 1 0 0 1 −1 −1 1 0 0 1 −1
3 0 1 0 1 1 0 −1 −1 −1 −1 0 1 1 0 0 1 0
7 −1 −1 0 1 0 1 0 1 −1 −1 1 0 −1 −1 0 1 0
14 1 0 −1 −1 −1 −1 −1 −1 0 1 1 0 1 0 1 0 0
15 −1 −1 1 0 0 1 −1 −1 1 0 −1 −1 1 0 −1 −1 0
16 0 1 −1 −1 1 0 0 1 0 1 −1 −1 −1 −1 1 0 0
19 1 0 0 1 −1 −1 1 0 −1 −1 −1 −1 0 1 0 1 0
21 1 0 1 0 −1 −1 0 1 1 0 0 1 −1 −1 −1 −1 0
23 −1 −1 −1 −1 0 1 1 0 0 1 0 1 0 1 1 0 0
25 0 1 1 0 1 0 1 0 1 0 1 0 0 1 −1 −1 0
1 −1 −1 0 1 1 0 −1 −1 0 1 −1 −1 0 1 −1 −1 1
5 1 0 −1 −1 0 1 1 0 1 0 −1 −1 −1 −1 0 1 1
9 1 0 0 1 0 1 0 1 0 1 0 1 1 0 −1 −1 1
11 0 1 1 0 −1 −1 0 1 −1 −1 −1 −1 1 0 1 0 1
17 −1 −1 1 0 1 0 1 0 −1 −1 0 1 −1 −1 1 0 1
20 −1 −1 −1 −1 1 0 0 1 1 0 1 0 1 0 0 1 1
22 0 1 0 1 −1 −1 1 0 0 1 1 0 −1 −1 −1 −1 1
24 1 0 1 0 0 1 −1 −1 −1 −1 1 0 0 1 1 0 1
27 0 1 −1 −1 −1 −1 −1 −1 1 0 0 1 0 1 0 1 1
a1 a2 d1 d2 c1 c2 f1 f2 e1 e2 b1 b2 g1 g2 h1 h2 i
a1 1
a2 0.5 1
d1 0 0 1
d2 0 0 0.5 1
c1 0 0 0 0 1
c2 0 0 0 0 0.5 1
f1 0 0 0 0 0 0 1
f2 0 0 0 0 0 0 0.5 1
e1 0 0 0 0 0 0 0 0 1
e2 0 0 0 0 0 0 0 0 0.5 1
b1 0 0 0 0 0 0 0 0 0 0 1
b2 0 0 0 0 0 0 0 0 0 0 0.5 1
g1 0 0 0 0 0 0 0 0 0 0 0 0 1
g2 0 0 0 0 0 0 0 0 0 0 0 0 0.5 1
h1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
h2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.5 1
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Treatment combination A B C D
1 −3 −3 −3 −3
2 1 1 −3 1
3 3 3 −3 3
4 −1 −1 −3 −1
5 −1 3 1 −3
6 3 1 −1 −3
7 1 −1 3 −3
8 −3 3 3 1
9 3 −3 3 −1
10 −3 1 1 −1
11 −1 1 3 3
12 −1 −3 −1 1
13 1 −3 1 3
14 1 3 −1 −1
15 3 −1 1 1
16 −3 −1 −1 3
One approach involves using SPSS to generate a base design and then using this base design to generate
the other attribute columns as required. Let us use a simpler (smaller) design
example to demonstrate how. Table 6.19 shows a design generated by SPSS for
four attributes each with four levels. Note that we have used orthogonal coding.
The analyst may generate the additional design columns required using a
number of different approaches. Firstly, the analyst may use the existing treat-
ment combinations and use these as the additional design columns. To do so,
the analyst might randomize the treatment combinations and assign these
randomized treatment combinations to the new design columns while retaining
the existing design for the original columns. We do this in Table 6.20.
In assigning the randomized treatment combinations, it is important that
the analyst check that a randomized treatment combination is not assigned
next to its replicate treatment combination (i.e., randomized treatment com-
bination one is not assigned next to the original treatment combination one).
Table 6.21 shows the correlation matrix for the above design. The design
produced is not likely to be orthogonal, as is the case here. As such, the problems
associated with modeling with correlated data are likely to be experienced.
An alternative approach is to take the foldover of the design and to use it as
the new design columns. Taking the foldover involves the reproduction of the
Table 6.20 Randomizing treatment combinations to use for additional design columns
1 −3 −3 −3 −3 2 1 1 −3 1
2 1 1 −3 1 16 −3 −1 −1 3
3 3 3 −3 3 15 3 −1 1 1
4 −1 −1 −3 −1 6 3 1 −1 −3
5 −1 3 1 −3 4 −1 −1 −3 −1
6 3 1 −1 −3 10 −3 1 1 −1
7 1 −1 3 −3 9 3 −3 3 −1
8 −3 3 3 1 14 1 3 −1 −1
9 3 −3 3 −1 8 −3 3 3 1
10 −3 1 1 −1 1 −3 −3 −3 −3
11 −1 1 3 3 12 −1 −3 −1 1
12 −1 −3 −1 1 5 −1 3 1 −3
13 1 −3 1 3 3 3 3 −3 3
14 1 3 −1 −1 7 1 −1 3 −3
15 3 −1 1 1 11 −1 1 3 3
16 −3 −1 −1 3 13 1 −3 1 3
A B C D E F G H
A 1
B 0 1
C 0 0 1
D 0 0 0 1
E −0.1 −0.05 −0.15 0.2 1
F 0.2 −0.4 0 0 0 1
G 0.6 −0.05 0.15 0 0 0 1
H 0.25 −0.25 0 0.5 0 0 0 1
design such that the factor levels of the design are reversed (e.g., replace 0 with
1 and 1 with 0). If orthogonal codes have been used we may achieve this effect
by multiplying each column by −1. Table 6.22 shows the foldover for our
simplified example. Columns E through H of Table 6.22 represent the foldover
columns.
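Taking the foldover is mechanical when orthogonal codes are used. A minimal sketch, using the first three treatment combinations of Table 6.19 for illustration:

```python
def foldover(design):
    """Foldover of an orthogonally coded design: reverse every attribute
    level by multiplying each entry by -1."""
    return [[-v for v in row] for row in design]

# First three treatment combinations of Table 6.19 (orthogonal codes):
base = [[-3, -3, -3, -3],
        [ 1,  1, -3,  1],
        [ 3,  3, -3,  3]]
extra_columns = foldover(base)   # candidate columns E-H for those rows
```

Because each foldover column is just the negation of its source column, the two are perfectly negatively correlated, which is what the correlation matrix in Table 6.23 shows.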
Unfortunately, the way SPSS generates designs means that using the fold-
over to generate extra design columns is not a desirable approach to the
problem of creating larger designs: as Table 6.23 shows, each foldover column is
perfectly negatively correlated with the column that generated it.
Treatment combination A B C D E F G H
1 −3 −3 −3 −3 3 3 3 3
2 1 1 −3 1 −1 −1 3 −1
3 3 3 −3 3 −3 −3 3 −3
4 −1 −1 −3 −1 1 1 3 1
5 −1 3 1 −3 1 −3 −1 3
6 3 1 −1 −3 −3 −1 1 3
7 1 −1 3 −3 −1 1 −3 3
8 −3 3 3 1 3 −3 −3 −1
9 3 −3 3 −1 −3 3 −3 1
10 −3 1 1 −1 3 −1 −1 1
11 −1 1 3 3 1 −1 −3 −3
12 −1 −3 −1 1 1 3 1 −1
13 1 −3 1 3 −1 3 −1 −3
14 1 3 −1 −1 −1 −3 1 1
15 3 −1 1 1 −3 1 −1 −1
16 −3 −1 −1 3 3 1 1 −3
As an aside, assuming that the analyst wishes to estimate non-linear effects, even had no
correlations been observed, the above designs would be unusable as a result of insufficient
degrees of freedom. That is, for 4 attributes each with 4 levels, 16 treatment combinations
provide a sufficient number of degrees of freedom for estimation purposes (i.e., we require
3 × 4 = 12 degrees of freedom for main effects only). For 8 attributes each with 4 levels we
require 24 degrees of freedom (i.e., 3 × 8) for a main effects only design. As such, 16
treatment combinations provide an insufficient number of degrees of freedom. In our
defense, the above is used only as an example of procedure. The reader should note that had
we done the above correctly, we would specify the minimum number of treatment
combinations to be generated as 24 before proceeding to generate the additional columns.
We conclude that the analyst is best to use the first method to generate
additional design columns (at least if SPSS is used to generate the design).
Unless all decision makers respond to the questionnaire, the design as
entered into the computer will not be orthogonal anyway (i.e., for
Table 6.23 Correlation matrix for designs using foldovers to generate additional columns
A B C D E F G H
A 1
B 0 1
C 0 0 1
D 0 0 0 1
E −1 0 0 0 1
F 0 −1 0 0 0 1
G 0 0 −1 0 0 0 1
H 0 0 0 −1 0 0 0 1
orthogonality, it is the rows of the design which are important and not the
columns). We can remove columns and not lose orthogonality. If we lose rows
(treatment combinations) then the design will no longer be orthogonal. Thus
if a block of a design is not returned by one decision maker, orthogonality will
be lost. In practice, this fact is largely ignored. We continue our discussion on
the experimental design process in Chapter 7.
particular data types corresponding to the problems that were being addressed
at the time. Indeed, the original theories dealt specifically with experimental
problems where the dependent variable was continuous in nature. As such,
the resulting design theory was developed specifically for models capable of
handling such data; hence much of the work on experimental design theory
has concentrated on use of analysis of variance (ANOVA) and linear regres-
sion type models (see Peirce 1876). From a historical perspective this has had a
significant impact on the SC literature. The original SC studies, unsurprisingly,
concentrated on introducing and promoting the benefits of the new modeling
method and did not concentrate specifically on the issue of experimental design
(see Louviere and Hensher 1983 and Louviere and Woodworth 1983). As such,
these earlier works understandably borrowed from the early theories on experi-
mental design without considering whether they were appropriate or not for use
with models applied to such data. Over time, the designs used in these earlier SC
studies became the norm and have largely remained so ever since.
Sporadic research over the years, however, has looked at the specific
problem of experimental designs as related to econometric models estimated
on discrete choice data. In order to calculate the statistical efficiency of a SC
design, Fowkes and Wardman (1988), Bunch et al. (1996), Huber and Zwerina
(1996), Sándor and Wedel (2001), and Kanninen (2002), among others, have
shown that the common use of logit models to analyze discrete choice data
requires a priori information about the parameter estimates, as well as the
final econometric model form that will be used in estimation. Specifically,
information on the expected parameter estimates, in the form of priors, is
required in order to calculate the expected utilities for each of the alternatives
present within the design. Once known, these expected utilities can in turn be
used to calculate the likely choice probabilities. Hence, given knowledge of the
attribute levels (the design), expected parameter estimate values, and the
resultant choice probabilities, it becomes a straightforward exercise to calcu-
late the asymptotic variance-covariance (AVC) matrix for the design, from
which the expected standard errors can be obtained. The AVC matrix of the
design, ΩN, can be determined as the inverse of the Fisher Information matrix, IN,
which is computed as the negative expected second derivatives of the log-
likelihood (LL) function, considering N respondents, of the discrete choice
model to be estimated (see Train 2009 and Chapter 5). By manipulating
the attribute levels of the alternatives, for known (assumed) parameter values,
the analyst is able to minimize the elements within the AVC matrix, which in the
case of the diagonals means lower standard errors and hence greater reliability in
the estimates at a fixed sample size (or even at a reduced sample size).
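For the simplest case, an MNL model with generic parameters and a single respondent, the chain from design and priors to choice probabilities to the AVC matrix can be sketched in a few lines. The design, priors, and dimensions below are invented for illustration, and `mnl_avc` is our own construction rather than a function from any design package:

```python
import numpy as np

def mnl_avc(X, beta):
    """Single-respondent AVC matrix of an MNL design.

    X: (S, J, K) array - S choice sets, J alternatives, K generic attributes.
    beta: length-K vector of assumed prior parameter values.
    Returns the inverse of the Fisher information matrix."""
    S, J, K = X.shape
    info = np.zeros((K, K))
    for s in range(S):
        v = X[s] @ beta                      # utilities in choice set s
        p = np.exp(v) / np.exp(v).sum()      # MNL choice probabilities
        xbar = p @ X[s]                      # probability-weighted attributes
        info += (X[s] * p[:, None]).T @ X[s] - np.outer(xbar, xbar)
    return np.linalg.inv(info)

# Illustrative design: 4 choice sets, 2 alternatives, 2 attributes,
# evaluated at assumed prior parameter values (all invented).
X = np.array([
    [[ 1.,  0.], [-1.,  1.]],
    [[ 0.,  1.], [ 1., -1.]],
    [[-1.,  1.], [ 1.,  0.]],
    [[ 1., -1.], [ 0.,  1.]],
])
omega = mnl_avc(X, beta=np.array([-0.4, 0.6]))
# Diagonal entries are the expected parameter variances for one respondent.
```

Manipulating the attribute levels in `X`, holding the priors fixed, changes `omega`, which is precisely the lever the design-efficiency literature exploits.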
In taking this approach, these authors have remained consistent with the
underlying theory of experimental design as defined previously. Indeed, the theory
for generating SC experimental designs has the same objective as the theory for
linear models; that is, minimizing the variances and covariances of the parameter
estimates. What is different, however, is the econometric models to which the
theory is being applied. As discussed above, other differences have also emerged,
related to the various assumptions that must be made when dealing with data
specifically generated for logit type models.
The efficiency of a design can be derived from the AVC matrix. Instead of
assessing a whole AVC matrix, it is easier to assess a design based on a single
value. Therefore, efficiency measures have been proposed in the literature in
order to calculate such an efficiency value, typically expressed as an efficiency
"error" (i.e., a measure of inefficiency). The objective then becomes to
minimize this efficiency error.
The most widely used measure is called the D-error, which takes the deter-
minant of the AVC matrix Ω₁, assuming only a single respondent.² A design
with the lowest D-error is called D-optimal. In practice, it is very difficult to find
the design with the lowest D-error; we are therefore satisfied if the design has a
sufficiently low D-error, called a D-efficient design. Different types of D-error
have been proposed in the literature, depending on the available information on
the prior parameters β̃. We will distinguish three cases (see also Bliemer and
Rose 2005b, 2009 and Rose and Bliemer 2004, 2008):
(a) No information is available
If no information is available (not even the sign of the parameters), then
we set β̃ = 0. This leads to a so-called Dz-error ("z" from "zero").
(b) Information is available with good approximations of β
If the information is relatively accurate, β̃ is set to the best guesses, assum-
ing that they are correct. This leads to a so-called Dp-error ("p" from "priors").
(c) Information is available with uncertainty about the approximations of β
Instead of assuming fixed priors β̃, they are assumed to be random,
following some given probability distribution, to express the uncertainty
about the true value of β. This Bayesian approach leads to a so-called
Db-error ("b" from "Bayesian").
The D-errors are a function of the experimental design X and the prior values
(or probability distributions) β̃, and can be mathematically formulated as:
² The assumption of a single respondent is just for convenience and comparison reasons and does not have
any further implications. Any other sample size could have been used, but it is common in the literature to
base it on a single respondent.
$$D_z\text{-error} = \left[ \det \Omega_1(X, 0) \right]^{1/H}, \qquad (6.11)$$

$$D_p\text{-error} = \left[ \det \Omega_1(X, \tilde{\beta}) \right]^{1/H}, \qquad (6.12)$$

$$D_b\text{-error} = \int_{\tilde{\beta}} \left[ \det \Omega_1(X, \tilde{\beta}) \right]^{1/H} \phi(\tilde{\beta} \mid \theta) \, d\tilde{\beta}, \qquad (6.13)$$

where H is the number of parameters to be estimated and φ(β̃ | θ) denotes the prior density of β̃.
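Given the single-respondent AVC matrix, each D-error reduces to one determinant. A sketch with two hypothetical AVC matrices standing in for the same design evaluated under zero priors and under best-guess priors:

```python
import numpy as np

def d_error(omega):
    """D-error of a design: determinant of its single-respondent AVC
    matrix, raised to the power 1/H for H parameters."""
    H = omega.shape[0]
    return np.linalg.det(omega) ** (1.0 / H)

# Hypothetical AVC matrices (invented numbers, two parameters):
omega_zero = np.array([[0.50, 0.10], [0.10, 0.40]])   # priors set to zero -> Dz
omega_prior = np.array([[0.30, 0.05], [0.05, 0.25]])  # best-guess priors -> Dp

dz = d_error(omega_zero)
dp = d_error(omega_prior)
```

Comparing designs then amounts to comparing these scalars; the design with the lower value is preferred under the chosen criterion.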
single respondent, and that the AVC matrix can be calculated for any sample
size such that

$$\Omega_N = \frac{\Omega_1}{N}. \qquad (6.15)$$
Rose and Bliemer (2013) showed that re-arranging the t-ratio for the hth
parameter,

$$t_h = \frac{\beta_h}{\mathrm{s.e.}_h / \sqrt{n_h}},$$

gives

$$n_h = t_h^2 \, \frac{\mathrm{s.e.}_h^2}{\beta_h^2}. \qquad (6.16)$$
The S-error is then given as the maximum of n_h over the parameters, under
assumptions about the desired t-ratios and non-zero betas. The problem of finding
an efficient design can be described as follows:
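The rearranged t-ratio expression translates directly into a sample size rule. A sketch with invented priors and single-respondent standard errors (the attribute names and numbers are hypothetical):

```python
import math

def required_sample(t_target, beta, se):
    """Respondents needed for a parameter to reach t-ratio t_target,
    n = t^2 * se^2 / beta^2; beta must be a non-zero prior."""
    return t_target ** 2 * se ** 2 / beta ** 2

# Hypothetical priors and single-respondent standard errors per parameter:
params = {"time": (-0.05, 0.4), "cost": (-0.3, 0.9)}
n_by_param = {k: required_sample(1.96, b, se) for k, (b, se) in params.items()}
s_error = max(n_by_param.values())  # sample is driven by the worst parameter
n_needed = math.ceil(s_error)
```

The maximum over parameters is the binding constraint: one weakly identified attribute can dictate the entire sample size.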
Given feasible attribute levels Λjk for all j and k, given the number of choice situations
S, and given the prior parameter values β~ (or probability distributions of β), ~ deter-
mine a level balanced design X with xjks 2 Λjk that minimizes the efficiency error in
Equation (6.11), (6.12), (6.13), or (6.14).
Travel time (min.) {10, 20, 30} {15, 30, 45} {15, 25, 35}
Delay/waiting time (min.) {0, 5, 10} {5, 10, 15} {5, 10}
Toll cost/fare ($) {2, 4, 6, 8} {0, 1, 2, 3} {4, 6, 8}
[Figure: iterative design generation. A candidate design (drawn from the full or fractional factorial) is evaluated for its D-error; Step 4 stores the design with the lowest efficiency error before the next iteration.]
[Figure: a second variant of the iterative loop. Each candidate design is evaluated for its D-error; Step 4 stores the design with the lowest efficiency error before the next iteration.]
all variables are dummy coded). Sometimes only swapping is used, sometimes
only relabeling and swapping is used, as special cases of this algorithm type.
A genetic algorithm, also a column-based algorithm, has been proposed by
Quan et al. (2011). In this algorithm, a population of designs is (randomly)
created, and new designs are determined by cross-over of designs in the
population (combining columns of two designs, called the parents, creating
a new design, called the child). The fittest designs in the population, measured
by their efficiency, will most likely survive in the population, while less fit
designs with a high efficiency error will be removed from the population (i.e.,
die). Mutation in the population takes place by randomly swapping attribute
levels in the columns. Genetic algorithms seem to be quite powerful in finding
efficient designs relatively quickly.
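The swapping idea can be reduced to a few lines: exchange two entries within a column, and keep the swap only if the efficiency error falls. The sketch below is generic and illustrative; the toy error function merely stands in for a D-error evaluation, and because swaps permute entries within a column, level balance is preserved automatically:

```python
import random

def swap_search(design, error_fn, iters=200, seed=1):
    """Column-based swapping: exchange two entries within a randomly chosen
    column and keep the change only when error_fn(design) decreases."""
    rng = random.Random(seed)
    design = [row[:] for row in design]
    best = error_fn(design)
    rows, cols = len(design), len(design[0])
    for _ in range(iters):
        c = rng.randrange(cols)
        r1, r2 = rng.randrange(rows), rng.randrange(rows)
        design[r1][c], design[r2][c] = design[r2][c], design[r1][c]
        err = error_fn(design)
        if err < best:
            best = err
        else:  # undo the unhelpful swap
            design[r1][c], design[r2][c] = design[r2][c], design[r1][c]
    return design, best

# Toy error: squared cross-product of the two columns, a stand-in for a
# D-error evaluation (two identical columns are maximally "inefficient").
toy = [[-1, -1], [-1, -1], [1, 1], [1, 1]]
def error(d):
    return sum(r[0] * r[1] for r in d) ** 2

improved, err = swap_search(toy, error)
```

Real implementations differ in how candidate swaps are proposed and in using an actual D-error, but the accept-if-better loop is the core of the swapping approach.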
If for some reason orthogonality is required in a Dp design, one could
construct a single orthogonal design, from this design easily create a large (but
not huge) number of other orthogonal designs, and then evaluate all these
orthogonal designs and select the most efficient one. Creating other orthogo-
nal designs from a single orthogonal design is relatively simple.
Evaluating each design for the efficiency error is the most time consuming
part of each algorithm; therefore the number of D-error or other efficiency
error evaluations should be kept to a minimum by putting more intelligence
into the construction of the designs. In determining Bayesian efficient designs
this becomes even more important, as the integral in Equation (6.13) cannot
be computed analytically, but only by simulation. Mainly pseudo-random
Monte Carlo simulations have been performed for determining the Bayesian
D-error for each design, which enables the approximation of this D-error by
taking the average of all D-errors for the same design using pseudo-random
draws for the prior parameter values. This is clearly a computation intensive
process, such that finding Bayesian efficient designs is a very time consuming
task. Bliemer et al. (2008) have proposed using quasi-random draws (such as
Halton or Sobol sequences) or preferably Gaussian quadrature methods
instead of pseudo-random draws, which require fewer simulations and there-
fore enable the evaluation of more designs in the same amount of time.
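A sketch of the point being made, with assumed values throughout: a Halton sequence (base 2) is mapped through the inverse normal CDF to produce quasi-random draws from a prior, and the Bayesian measure is approximated by averaging over the draws. The prior N(−0.9, 0.3) and the quadratic stand-in for the per-draw efficiency error are hypothetical; in practice each draw requires a full D-error evaluation, which is why sequences that converge with fewer draws matter.

```python
# Sketch: quasi-random (Halton) versus pseudo-random draws for approximating
# a Bayesian efficiency measure. The prior and the toy per-draw "error"
# function f are hypothetical stand-ins.
import random
from statistics import NormalDist

def halton(n, base=2):
    out = []
    for i in range(1, n + 1):
        f, r, k = 1.0, 0.0, i
        while k:
            f /= base
            r += f * (k % base)        # radical-inverse digit expansion
            k //= base
        out.append(r)
    return out

prior = NormalDist(mu=-0.9, sigma=0.3)
f = lambda b: b * b                    # stand-in for a D-error evaluation

# Bayesian measure: the average of f over draws from the prior
halton_est = sum(f(prior.inv_cdf(u)) for u in halton(500)) / 500

random.seed(1)
pseudo_est = sum(f(prior.inv_cdf(random.random())) for _ in range(500)) / 500

exact = prior.mean ** 2 + prior.stdev ** 2   # E[b^2] for a normal prior
```

With evenly spread Halton points, halton_est typically sits closer to the exact value than pseudo_est does for the same number of draws, which is the sense in which fewer simulations suffice.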
Manually determining efficient designs is only possible for the smallest
hypothetical experiments. Computer software such as SAS and Ngene are able
to generate efficient designs. SAS, however, is limited to MNL models and
does not include Bayesian efficiency measures, while Ngene is able to deter-
mine efficient designs for the MNL, NL, and ML models, including Bayesian
efficient designs.
do not satisfy the constraints are removed from this set. This ensures that all
designs generated from this candidature set will be feasible.
Note that it may be hard or even impossible to find an attribute level balanced
design satisfying the constraints, especially when the constraints impose many
restrictions. Also note that in theory RSC algorithms can also be used (see
Section 6.2.6), but that after each relabeling, swapping, or cycling all choice
situations need to be checked for feasibility. Ensuring that all choice situations
are feasible could be difficult, hence RSC algorithms may not be suitable.
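The candidature-set approach described above can be sketched as full enumeration followed by filtering; the two attributes, their levels, and the constraint ("a tolled alternative must be strictly faster") are hypothetical.

```python
# Sketch: build a candidature set for a constrained design by enumerating all
# choice situations and dropping the infeasible ones. Attributes, levels, and
# the constraint are hypothetical.
from itertools import product

times = [10, 20, 30]   # travel time levels (min.)
tolls = [0, 2, 4]      # toll levels ($)

def feasible(situation):
    (t1, c1), (t2, c2) = situation
    # hypothetical constraint: a tolled alternative must be strictly faster
    if c1 > 0 and t1 >= t2:
        return False
    if c2 > 0 and t2 >= t1:
        return False
    return True

alternatives = list(product(times, tolls))
candidature_set = [s for s in product(alternatives, repeat=2) if feasible(s)]
# every design assembled from candidature_set satisfies the constraint
```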
                                Respondent 1                   Respondent 2
    Design                      (travel time = 10, toll = 2)   (travel time = 30, toll = 3)
    Travel time   Toll cost ($) Travel time (min.)  Toll ($)   Travel time (min.)  Toll ($)
1.     −10%          +2                 9               4              27               5
2.     +10%          +1                11               3              33               4
3.     +20%          +0                12               2              36               3
4.     +10%          +2                11               4              33               5
5.     −10%          +0                 9               2              27               3
6.     +20%          +1                12               3              36               4
Hence, instead of creating a design with the actual attribute levels, a pivot
design is created with relative or absolute deviations from references. Suppose
that a single pivot design is created. The efficiency of this design depends on the
references of the respondents, as these determine the actual attribute levels in
the choice situations and therefore the AVC matrix. However, the references of
the respondents are typically not available in advance. Rose et al. (2008) have
compared several different approaches for finding efficient pivot designs:
(a) Use the population average as the reference (yields a single design)
(b) Segment the population based on a finite set of different references (yields
multiple designs)
(c) Determine an efficient design on the fly (yields a separate design for each
respondent)
(d) Use a two-stage process in which the references are captured in the first
stage and the design is created in the second stage (yields a single design).
Intuitively, approach (a) should give the lowest efficiency (individual reference
alternatives may differ widely from those assumed in generating the design),
while approach (d) should yield the highest efficiency (likely to produce truly
efficient data). This was also the outcome of the Rose et al. study. Approach (a)
worked relatively well, and approach (b) only performed marginally better.
Approach (c) and (d) performed best. The outcomes were also compared with
an orthogonal design, which performed poorly. Pivot designs for approaches (a)
and (b) are relatively easy to generate, for approaches (c) and (d) more effort is
needed. Approach (c) requires a Computer-Assisted Personal Interview
(CAPI) or an internet survey, and an efficient design is generated while the
respondent is answering other questions. Approach (d) is sensitive to drop-outs,
as the design will only be optimal if all respondents in the second stage
participate again in the survey. An example of the design of a pivot experiment
using Ngene is given in Section 6.5.
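The pivoting computation underlying the table in this section can be sketched as follows: percentage deviations pivot travel time off the respondent's reference, and absolute deviations pivot the toll. The helper name is hypothetical; note that the third and sixth design rows must be +20 percent to reproduce the printed travel times of 12 and 36 minutes.

```python
# Sketch of the pivot computation: the design stores deviations from each
# respondent's reference alternative, and the actual levels shown are
# computed on the fly. The helper function is hypothetical.
def pivot(ref_time, ref_toll, time_pct, toll_add):
    # percentage deviation pivots travel time; absolute deviation pivots toll
    return round(ref_time * (1 + time_pct / 100)), ref_toll + toll_add

design = [(-10, 2), (10, 1), (20, 0), (10, 2), (-10, 0), (20, 1)]

levels_r1 = [pivot(10, 2, p, a) for p, a in design]   # reference: 10 min., $2
levels_r2 = [pivot(30, 3, p, a) for p, a in design]   # reference: 30 min., $3
```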
In recent years there has been growing interest within the discrete choice
framework in seeking responses to scenarios where stakeholders select both
the best option and worst option (or attribute) from a set of alternatives, and
this literature recognizes the additional behavioral information in the best and
worst response mechanism (e.g., Marley and Louviere 2005; Marley and
Pihlens 2012). Best–worst scaling delivers more efficient and richer
discrete-choice elicitation than other approaches, and is gaining popularity
as a way to narrow down a set of attributes for a traditional choice experiment
from a much larger set of candidate influences on preferences. It is hence
an attractive method for preference assessment when the number of statements
or attributes far exceeds the number that might be included in a
comprehensive and comprehensible SC experiment.
Recent advances in survey design for SC experiments suggest that obtaining
a ranking from an iterative set of best–worst choices offers significant advan-
tages in terms of cognitive effort (for example, see Auger et al. 2007; Cohen
2009; Flynn et al. 2007; Louviere and Islam 2008). In addition to the standard
choice response (the most preferred option), best–worst designs include a
response mechanism to reveal the respondents’ perceived worst alternative.
This method can be implemented at the attribute or statement level or at a
choice alternative level. As is common practice with best–worst choice data,
the observation for the worst choice is assumed to be the negative of the best
choice data. Under this assumption, preferences for the least preferred choice
are assumed to be the negative inflection of preferences for the most preferred
choice (see Marley and Louviere 2005; Marley and Pihlens 2012). Best–worst
scaling as a data collection method has been increasingly used in studying
consumer preference for goods or services (Collins and Rose 2011; Flynn et al.
2007; Louviere and Islam 2008; Marley and Pihlens 2012). Best–worst data are
typically analyzed using conditional logit models.
In a recent study (Hensher et al. 2014), involving sets of statements
(Table 6.26) on the design of bus rapid transport (BRT) and light rail transit
(LRT), a Bayesian D-efficient design was developed assuming normally distrib-
uted priors, with means of zero and standard deviations of one. The design allows
for all main effects and was constructed to allow for best–worst choices. In
generating the design, it was assumed that the alternative chosen as best was
deleted when constructing the pseudo-worst choice task. To generate the design,
spherical-radial transformed draws were used (see Gotwalt et al. 2009), assuming
1 There are fewer bus stops than light rail (tram) stations so people have to walk further to catch a bus
2 Light rail (tram) systems provide better network coverage than bus systems
3 A new light rail (tram) line can bring more life to the city than a new bus route in a bus lane or dedicated
corridor
4 A light rail (tram) service looks faster than a bus service in a bus lane or dedicated corridor
5 Light rail (tram) lines are fixed, so light rail (tram) stops provide more opportunity for new housing than
a bus route which can be changed very easily
6 New light rail (tram) stops will improve surrounding properties more than new bus stops or a new bus
route in a bus lane or dedicated corridor
7 Light rail (trams) are more environmentally friendly than buses in a bus lane or dedicated corridor
8 More jobs will be created surrounding a light rail (tram) route than a bus route in a bus lane or dedicated
corridor
9 A light rail (tram) is more likely than a bus service in a bus lane or dedicated corridor to still be in use in
30 years’ time
10 Light rail (tram) services stop nearer to more people than bus services
11 Light rail (tram) services are less polluting than buses
12 Light rail (tram) services are more likely to have level boarding (no steps up or down to get on the
vehicle) than buses
13 Light rail (trams) are quieter than buses
14 Light rail (tram) services have been more successful for cities than bus services in a bus lane or dedicated
corridor
15 Light rail (trams) are more permanent than buses in a bus lane or dedicated corridor
16 Light rail (trams) provide more opportunities for land redevelopment than buses in a bus lane or
dedicated corridor
17 Light rail (trams) provide more focussed development opportunities than buses in a bus lane or
dedicated corridor
18 Light rail (trams) are more likely to be funded with private investment than buses in a bus lane or
dedicated corridor
19 Light rail (trams) support higher population and employment growth than buses in a bus lane or
dedicated corridor
20 Putting down rails and buying light rail (trams) makes a light rail (tram) system cheaper than bus
services running in a bus lane or a dedicated corridor
21 Light rail (tram) systems have lower operating costs than bus services provided in a bus lane or dedicated
corridor
22 Light rail (tram) systems have lower operating costs per person carried than bus services provided in a
bus lane or dedicated corridor
23 Building a new light rail (tram) line will cause less disruption to roads in the area than a new bus route in
a bus lane or dedicated corridor
24 Overall, light rail (trams) and light rail (tram) track have lower maintenance costs than buses in a bus
lane or dedicated corridor
25 Light rail (tram) stops have greater visibility for passengers than bus stops
26 Light rail (trams) have lower accident rates than buses in a bus lane or dedicated corridor
27 Light rail (trams) provide a more liveable environment than buses in a bus lane or dedicated corridor
28 Light rail (trams) have greater long-term sustainability than buses in a bus lane or dedicated corridor
29 Light rail (trams) provide more comfort for travelers than buses
30 Light rail (tram) systems are quicker to build and put into operation than bus services in a bus lane or
dedicated corridor
31 The long-term benefits of a new light rail (tram) line are higher than a new bus route in a bus lane or
dedicated corridor
32 House prices will rise faster around new light rail (tram) stops than bus stops associated with a bus lane
or dedicated corridor
33 Light rail (trams) provide better value for money to taxpayers than buses in a bus lane or dedicated
corridor
three radii and two randomly rotated orthogonal matrices. The final design had
22 choice tasks. An illustrative preference screen is given in Figure 6.13. The
underlying experimental design is given in Appendix 6A. The Nlogit syntax used
to estimate a choice model is given in Table 6.27. Each respondent was given four
best–worst choice tasks.
To show how the data are set up, we present part of the data set for the first
respondent in Table 6.28. The names of each variable in Table 6.27 are column
headings in Table 6.28. There are 4 choice sets, each represented by 7 rows.
The first 4 rows (cset=4) are the unlabeled alternatives for the full choice set,
and the last 3 rows (cset=3) are the same unlabeled alternatives minus the
most preferred alternative in the 4-alternative set. Altype is an indicator (1,−1)
to identify the two specifications for the best (1) and the worst (−1) preference
regime. It is used to construct a sign reversal for the attribute levels under the
worst preference form. Altij indicates which alternative is associated with each
row, noting again that the best alternative has been removed from the worst
choice set. The choice column indicates which alternative was chosen as the
best and as the worst. We have listed the first 10 statements (out of the full set
of 66, i.e., 33 BRT preferred and 33 LRT preferred) to show the sign reversal in
the worst preferred choice set. The full model set up in Nlogit shows the way
in which the data are used. There are only 4 alternatives, but the relevant set is
recognized via cset and altij for each of the 8 choice sets (i.e., a 4-alternative
and a 3-alternative set for each of the 4 choice scenarios). Given the dummy variable nature of this
specific data, the marginal utility (or parameter estimate) is relative to one of
the alternatives, arbitrarily selected as the 33rd statement for both the BRT
favoring and LRT favoring statements, where the BRT favoring statement is
Table 6.27 Nlogit syntax for estimating a choice model
nlogit
;lhs=choice,cset,Altij
;choices=A,B,C,D
;smnl;pts=200
;pds=4;halton
;model:
U(A,B,C,D) = <AASC,BASC,CASC,0>+
s1b*stat101+s2b*stat102+s3b*stat103+s4b*stat104+s5b*stat105+s6b*stat106+s7b*stat107+s8b*stat108
+s9b*stat109+s10b*stat110+s11b*stat111+s12b*stat112+s13b*stat113+s14b*stat114+s15b*stat115
+s16b*stat116+s17b*stat117+s18b*stat118+s19b*stat119+s20b*stat120+s21b*stat121+s22b*stat122
+s23b*stat123+s24b*stat124+s25b*stat125+s26b*stat126+s27b*stat127+s28b*stat128+s29b*stat129
+s30b*stat130+s31b*stat131+s32b*stat132/
s1lr*stat201+s2lr*stat202+s3lr*stat203+s4lr*stat204+s5lr*stat205+s6lr*stat206+s7lr*stat207+s8lr*stat208
+s9lr*stat209+s10lr*stat210+s11lr*stat211+s12lr*stat212+s13lr*stat213+s14lr*stat214+s15lr*stat215
+s16lr*stat216+s17lr*stat217+s18lr*stat218+s19lr*stat219+s20lr*stat220+s21lr*stat221+s22lr*stat222
+s23lr*stat223+s24lr*stat224+s25lr*stat225+s26lr*stat226+s27lr*stat227+s28lr*stat228+s29lr*stat229
+s30lr*stat230+s31lr*stat231+s32lr*stat232$
[Figure 6.13 screen text (Game 7): "Consider a scenario where the government was planning to build a public transport corridor in your city. Thinking about buses operating on bus only dedicated roads versus light rail (trams), which of the following statements most and least describe the service and/or design characteristics of a busway over a light rail (tram) line?"]
Table 6.28 The data set up for analysis of best–worst data
RespId GameNo BlockId GameId BusImg TramImg Versus CSet AtlType Altij Choice StateId State01 State02 State03 State04 State05 State06 State07 State08 State09 State10
1 1 15 3 2 2 1 4 1 1 0 27 0 0 0 0 0 0 0 0 0 0
1 1 15 3 2 2 1 4 1 2 0 9 0 0 0 0 0 0 0 0 1 0
1 1 15 3 2 2 1 4 1 3 1 30 0 0 0 0 0 0 0 0 0 0
1 1 15 3 2 2 1 4 1 4 0 25 0 0 0 0 0 0 0 0 0 0
1 1 15 3 2 2 1 3 −1 1 0 −27 0 0 0 0 0 0 0 0 0 0
1 1 15 3 2 2 1 3 −1 2 0 −9 0 0 0 0 0 0 0 0 −1 0
1 1 15 3 2 2 1 3 −1 4 1 −25 0 0 0 0 0 0 0 0 0 0
1 3 15 1 1 1 2 4 1 1 1 7 0 0 0 0 0 0 1 0 0 0
1 3 15 1 1 1 2 4 1 2 0 20 0 0 0 0 0 0 0 0 0 0
1 3 15 1 1 1 2 4 1 3 0 30 0 0 0 0 0 0 0 0 0 0
1 3 15 1 1 1 2 4 1 4 0 26 0 0 0 0 0 0 0 0 0 0
1 3 15 1 1 1 2 3 −1 2 0 −20 0 0 0 0 0 0 0 0 0 0
1 3 15 1 1 1 2 3 −1 3 0 −30 0 0 0 0 0 0 0 0 0 0
1 3 15 1 1 1 2 3 −1 4 1 −26 0 0 0 0 0 0 0 0 0 0
1 5 15 4 1 2 2 4 1 1 0 30 0 0 0 0 0 0 0 0 0 0
1 5 15 4 1 2 2 4 1 2 0 21 0 0 0 0 0 0 0 0 0 0
1 5 15 4 1 2 2 4 1 3 1 22 0 0 0 0 0 0 0 0 0 0
1 5 15 4 1 2 2 4 1 4 0 10 0 0 0 0 0 0 0 0 0 1
1 5 15 4 1 2 2 3 −1 1 0 −30 0 0 0 0 0 0 0 0 0 0
1 5 15 4 1 2 2 3 −1 2 0 −21 0 0 0 0 0 0 0 0 0 0
1 5 15 4 1 2 2 3 −1 4 1 −10 0 0 0 0 0 0 0 0 0 −1
1 7 15 2 2 1 1 4 1 1 1 20 0 0 0 0 0 0 0 0 0 0
1 7 15 2 2 1 1 4 1 2 0 25 0 0 0 0 0 0 0 0 0 0
1 7 15 2 2 1 1 4 1 3 0 13 0 0 0 0 0 0 0 0 0 0
1 7 15 2 2 1 1 4 1 4 0 6 0 0 0 0 0 1 0 0 0 0
1 7 15 2 2 1 1 3 −1 2 0 −25 0 0 0 0 0 0 0 0 0 0
1 7 15 2 2 1 1 3 −1 3 0 −13 0 0 0 0 0 0 0 0 0 0
1 7 15 2 2 1 1 3 −1 4 1 −6 0 0 0 0 0 −1 0 0 0 0
associated with the s1b to s32b parameters and the LRT favoring statement is
associated with the s1lr to s32lr parameters. Given the panel nature of the data,
we have used a scaled multinomial logit (SMNL) model (see Chapters 4 and 15)
to account for the correlated nature of the 4 choice sets.
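The construction of Table 6.28 can be sketched in a few lines: a full choice set is written out under the best regime (altype = 1), then repeated under the worst regime (altype = −1) with the best alternative dropped and the statement codes sign-reversed. Variable names follow the table; the helper function itself is hypothetical.

```python
# Sketch of how one best-worst scenario expands into the rows of Table 6.28.
def best_worst_rows(statement_ids, best_alt, worst_alt):
    rows = []
    n = len(statement_ids)
    for alt, stat in enumerate(statement_ids, start=1):       # best regime
        rows.append({"cset": n, "altype": 1, "altij": alt,
                     "choice": int(alt == best_alt), "stateid": stat})
    for alt, stat in enumerate(statement_ids, start=1):       # worst regime
        if alt == best_alt:
            continue                        # best alternative is removed
        rows.append({"cset": n - 1, "altype": -1, "altij": alt,
                     "choice": int(alt == worst_alt),
                     "stateid": -stat})     # sign reversal of the codes
    return rows

# first scenario of respondent 1: statements 27, 9, 30, 25 shown;
# alternative 3 chosen as best, alternative 4 as worst
rows = best_worst_rows([27, 9, 30, 25], best_alt=3, worst_alt=4)
```

The seven resulting rows (a 4-alternative best set followed by a 3-alternative worst set) mirror the first block of Table 6.28.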
The example above is one format for a best–worst experiment. Rose (2014)
sets out the range of options in some detail. There are three unique approaches
to best–worst survey response mechanisms. In case 1, respondents are asked
to choose the most and least preferred object from a set of objects (e.g.,
Louviere et al. 2013). In case 2, the task consists of respondents viewing a
set of attributes, each described by a series of attribute levels, and being asked
to select the most and least preferred attribute or level out of the set shown
(e.g., Beck et al. 2013). Case 3 involves the respondents viewing a set of
alternatives, each described by a number of attributes and levels, and being
asked to select the best and worst alternative from those shown (e.g., Rose and
Hensher 2014). See Appendix 6B on how Ngene sets up the designs for each of
these three cases.
Within the SC experiment, the eight attributes (four per alternative) can
take on different levels over the different choice tasks shown to respondents.
Let us assume that each attribute can take on one of three levels, more
precisely, Lk = {1,2,3} for k = 1,. . .,4. These values were chosen for demonstra-
tion purposes. Following common practice, we constrain ourselves to attri-
bute level balanced designs (although such a constraint may result in the
generation of a sub-optimal design).
We will examine three different design types: (a) a D-efficient design, (b) an
orthogonal design, and (c) an S-efficient design. The D-efficient design aims to
minimize all (co)variances of all parameter estimates, the orthogonal design
forces the correlations between the attribute values to zero, and the
sample size efficient design aims to minimize the sample size needed to obtain
statistically significant parameter estimates. In order to generate the D- and S-
efficient designs, it is necessary to assume prior parameter estimates. For the
current case study, we have selected for illustrative purposes the following
prior parameter estimates in generating the designs:
As such, we treat the first two parameters as random parameters drawn from a
Normal distribution with a certain mean and standard deviation. The last two
parameters are treated as fixed parameters. In generating and evaluating each
of the designs, we employ Gaussian quadrature with six abscissae associated
with each random parameter (see Bliemer et al. 2008), which should provide a
very accurate approximation of the random parameter distributions and the
resulting simulated likelihood, and a simulated sample of 5,000 respondents to
obtain the choice vector needed for computing the AVC matrix of the panel
MMNL model (see Bliemer and Rose 2010).
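The quadrature idea can be sketched for a single normally distributed parameter: with Gauss–Hermite nodes x_i and weights w_i, E[f(β)] for β ~ N(µ, σ) is approximated by Σ_i (w_i/√π) f(µ + √2·σ·x_i), so six abscissae can replace hundreds of simulation draws. The values of µ and σ below are hypothetical.

```python
# Sketch of Gaussian (Gauss-Hermite) quadrature with six abscissae for one
# normally distributed random parameter. mu and sigma are hypothetical.
import numpy as np

mu, sigma = 0.6, 0.25
nodes, weights = np.polynomial.hermite.hermgauss(6)   # six abscissae

def expect(f):
    # E[f(beta)] for beta ~ N(mu, sigma) via the change of variables
    # beta = mu + sqrt(2) * sigma * x
    return float(np.sum(weights / np.sqrt(np.pi) * f(mu + np.sqrt(2) * sigma * nodes)))

m1 = expect(lambda b: b)        # recovers mu
m2 = expect(lambda b: b ** 2)   # recovers mu^2 + sigma^2
```

Six nodes integrate polynomials up to degree 11 exactly, which is why so few evaluations can match the accuracy of hundreds of pseudo-random draws for smooth integrands.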
The three designs are presented in Table 6.29, where E(Psj) denotes the
expected simulated probability that alternative j will be chosen in choice task s.
In line with common practice, the orthogonal design generated was selected at
random. Methods of manipulating the attribute levels so as to generate and
locate D-efficient designs are discussed in detail in Kessels et al. (2009) and
Quan et al. (2011), among other sources. For finding the D-efficient designs
and S-efficient designs presented here, we used several simple randomization
and swapping heuristics on the attribute levels using Ngene 1.2 (ChoiceMetrics,
2012).
As expected, the worst performing design in terms of D- and S-error is the
orthogonal design. The orthogonal design is also the only design in which
the choice probabilities reveal dominant alternatives: in choice
tasks 3 and 4 the second alternative will be chosen in most cases (94 and 93
percent, respectively). More worrying, the orthogonal design has one choice
task with identical alternatives (see choice task 10), and four choice tasks with
strict dominating alternatives (choice tasks 1, 3, 4, and 8), in which one
alternative is better or equal in all of the attributes. Note that this cannot be
observed from the choice probabilities, but has to be identified by inspecting
the levels for each attribute in conjunction with the sign of the prior parameter
value. In generating the D- and S-efficient design, we implemented extra
checks that ensure that no choice tasks with identical alternatives or strictly
dominant alternatives are generated. Strictly dominant alternatives should be
avoided at all times and excluded from the data set to avoid biases in para-
meter estimates.
The D-error of the orthogonal design is roughly three times larger than the
D-error of the D-efficient design. This means that on average the standard
error of the parameter estimates using the orthogonal design will be
√3 ≈ 1.73 times larger than the average standard error of the estimates using
the D-efficient design (they are in fact on average 1.76 times larger). This in
turn means that roughly three times as many observations using the orthogonal
[Table 6.29 body not shown; for each of the three designs the columns are s, j, xsj1, xsj2, xsj3, xsj4, and E(Psj).]
design are required in order to obtain the same values for the standard errors.
This demonstrates that information on prior parameter estimates can clearly
help significantly in making a more efficient design. In cases where one has no
information on the parameter estimates whatsoever, it is common practice to
assume that the prior parameter estimates are all equal to zero. As mentioned
in Rose and Bliemer (2005), assuming all zero priors (i.e., assuming that no
information exists on any of the parameters, not even the sign), an orthogonal
design will be the most efficient design.
Therefore, an orthogonal design will be a good design in a worst case scenario
(i.e., when no prior information is available to the analyst). Unfortunately, it
may be possible to generate a large number of different orthogonal designs for
[Figure 6.14 plots asymptotic t-ratios (0–25) against sample size N (0–500) for β1(µ), β1(σ), β2(µ), β2(σ), β3, and β4 in each design; the resulting minimum sample sizes are 57.51 for design (a), 512.43 for design (b), and 49.49 for design (c).]
Figure 6.14 Asymptotic t-ratios for different sample sizes for the (a) D-efficient design, (b) orthogonal design, and (c)
S-efficient design
6.5.2 Effect of number of choice tasks, attribute levels, and attribute level range
In order to analyze the impact of different designs on D-efficiency and S-
efficiency, we examine the following effects: (i) the effect of the number of
choice tasks, S; (ii) the effect of the number of attribute levels; and (iii) the effect of the
attribute level range.
Previously, it was assumed that each respondent reviewed twelve choice
tasks, a number that was chosen essentially arbitrarily (although for attribute level
balance it should be a multiple of the number of attribute levels). Typically,
all choice tasks are shown to each respondent, and to avoid placing too high a
burden on the respondent the number of choice tasks is preferably kept small.
Alternatively, one could give each respondent only a sub-set (block) of the
complete design, but it should be noted that such a blocking strategy should be
part of the design generation process for a panel MMNL model, as there will
only be dependent choice observations within each block, not across blocks.
Simultaneous optimization of the design and the blocking scheme is therefore
not trivial for the panel MMNL model and is beyond the scope of this chapter.
The minimum number of choice tasks, S, is determined by the number of
degrees of freedom, which is essentially the number of parameters to estimate, K.
It needs to hold that S ≥ K; in our case study, since K = 6, the minimum is therefore
S = 6. A D-efficient design can be found using this minimum number of choice
tasks, whereas there does not exist an orthogonal design with this number of
choice tasks. As such, in many instances orthogonal designs will be required to
be (much) larger than is necessary. Using the same attribute levels as before, we
vary the number of choice tasks, from 6 to 27. Finding larger designs with 30 or
more choice tasks tends to be problematic, as it is increasingly difficult to find
additional choice tasks without any strictly dominant alternative. For each
design size, a D- and S-efficient design is constructed. The D-errors and
S-errors are shown in Table 6.30. As the D-error and S-error will always decrease
with the design size, in order to make a fair comparison we investigate whether
the overall efficiency is improved by normalizing for the design size. The D-error
is normalized to a single choice task by multiplying the D-error by S. Also, for
comparison purposes, the S-error is normalized by multiplying it by S, resulting
in the number of observations. If the normalized D-error decreases, it means
that the decrease of the D-error is not just because an extra choice task is added,
but because it actually increases the overall efficiency. Similarly, if the normalized
S-error decreases, it means that we can obtain the same t-ratio of the most
difficult to estimate parameter using fewer observations in total.
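The normalization is a one-line computation; using two of the D-errors reported in Table 6.30 (S = 6 and S = 12, designs optimized for D-error):

```python
# Normalizing the D-error to a single choice task: multiply by the design
# size S. The figures are taken from Table 6.30.
d_errors = {6: 0.779, 12: 0.326}                      # S -> D-error
normalized = {s: round(d * s, 2) for s, d in d_errors.items()}
# the larger design is more efficient per choice task (4.67 vs 3.91),
# not merely because it contains more choice tasks
```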
Clearly, the D-error and S-error will decrease with larger designs; however,
once normalized to the number of choice tasks, the increase in efficiency due
to a larger design is not that great. Looking at designs optimized for D-error,
the drop in the normalized D-error from 18 to 27 choice tasks is small. Very
small designs are not very efficient, but there does not seem to be a reason to
generate very large designs. A similar conclusion can be drawn for designs that
are optimized for S-error, where there is a steep initial decline in the number
of observations required when moving from a very small design to a larger
design, while this decline becomes smaller for even larger designs. However,
the decrease in the normalized S-error is noticeably larger than the decrease in
Table 6.30 Effect of the number of choice tasks on D-error and S-error

Number of choice tasks (S)         6      9     12     15     18     21     24     27

Designs optimized for D-error
D-error (D)                    0.779  0.455  0.326  0.256  0.210  0.179  0.155  0.137
Normalized D-error (D.S)        4.67   4.10   3.91   3.84   3.77   3.75   3.72   3.70
S-error (N)                    253.4   91.9   57.5   43.3   32.8   24.3   20.8   17.6
Normalized S-error (N.S)        1520    827    690    650    590    510    498    476
MNL normalized D-error          2.47   2.47   2.44   2.47   2.46   2.47   2.48   2.50

Designs optimized for S-error
D-error (D)                    1.184  0.706  0.454  0.378  0.356  0.263  0.211  0.182
Normalized D-error (D.S)        7.10   6.35   5.45   5.67   6.41   5.52   5.06   4.91
S-error (N)                    174.1   89.1   49.5   35.0   28.0   22.7   19.0   15.8
Normalized S-error (N.S)        1045    802    594    524    504    476    456    428
MNL normalized S-error         153.3  151.1  150.6  150.7  150.8  150.9  151.0  151.2
the D-error. As mentioned before, the standard deviation parameters are the
most difficult ones to estimate in our panel MMNL model, and it seems that
collecting more data from a single respondent (i.e., using a larger design)
contributes to the efficiency of estimating these parameters. This is an interest-
ing result, as it is different from the conclusions that can be drawn for the MNL
model. If we were to optimize for an MNL model (with parameters β1 = 0.6, β2 =
−0.9, β3 = −0.2, and β4 = 0.8), we observe from Table 6.30 that there is no need to
go beyond 12 choice tasks, as the choice tasks that provide the most information
are already in the design, and the normalized D-error and S-error may even go
up. This is consistent with the conclusion in Bliemer and Rose (2011) that, in
terms of the normalized D- or S-error, a relatively small design for an MNL
model is just as efficient as (and often more efficient than) a large design. For the
panel MMNL, it seems that at least the standard deviation parameters benefit
from larger designs. Note that this analysis is from a statistical point of view, and
it is questionable whether a respondent is actually willing to face 27 choice tasks.
Using 12 choice tasks, we also vary the number of levels for each attribute
from two to four levels, and simultaneously make the attribute level range
narrower and wider. The attribute levels are shown in Table 6.31. For each
combination of number of levels and level range, in total nine combinations,
we again find D- and S-efficient designs.
The lowest D-errors and the minimum sample sizes (based on the S-error
calculation) for all combinations are listed in Table 6.32. There is a consistent
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:20:59 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.008
Cambridge Books Online © Cambridge University Press, 2015
Table 6.32 Effect of number of levels and level range on D-error and sample size
[The body of this table is not recovered in this extraction; for each number of attribute levels, its columns report the lowest D-error and the minimum sample size under narrow, medium, and wide attribute level ranges.]
pattern that favors two-level designs with a wide level range, in terms both of
D-error and of sample size requirements (such designs are sometimes referred
to as end-point designs, see Louviere et al. 2000). While it appears that the
number of levels does make a difference, it is the attribute level range that has a
substantial impact upon the overall efficiency of the design (and has been
shown by Louviere and Hensher (2001) to have the greatest impact on the
WTP estimates). Therefore, for linear relationships, it is recommended to
choose an attribute level range as wide as is realistically sensible.
Table 6.33 lists the D- and S-efficient designs in the case of two levels with
wide range. By moving to such an end-point design, the minimum sample size
for obtaining statistically significant parameter estimates has decreased from
around 50 to 11. For completeness, we also list an orthogonal design in
Table 6.33, which performs poorly and is again problematic (it contains
identical alternatives in choice tasks 7 and 12, and strictly dominant alter-
natives in choice tasks 3, 5, and 9). The two efficient designs show similarities,
having 7 out of 12 choice tasks in common.
A remark has to be made with respect to the number of levels. Designs with
only a few attributes (e.g., two or three) may not always benefit from using two
levels and a wide level range. This is due to the fact that the designs are likely to
have dominant alternatives (that is, an alternative that will be chosen with a probability close to 1).
[Table 6.33, which lists the D-efficient, S-efficient, and orthogonal designs for two levels with a wide range, is not recovered in this extraction; its columns are the choice task s, alternative j, attribute levels xsj1 to xsj4, and expected choice probability E(Psj).]
A disadvantage of using only two levels, however, is that one is restricted to testing linear relationships for that attribute (see Hensher et al. 2005) instead of non-linear effects with dummy or effects coding.
Figure 6.15 Impact of prior misspecification on the sample size for the (a) D-efficient design, (b) orthogonal design, and (c) S-efficient design
[Each panel plots the S-error (vertical axis, 0 to 900) against the percentage deviation of each prior from its assumed value (−40% to +40%), with separate curves for the mean and standard deviation priors β1(µ), β1(σ), β2(µ), β2(σ), β3, and β4.]
To examine the impact of prior mis-specification on the overall efficiency of the design, we determine the S-errors when each true parameter independently deviates between −40 percent and +40 percent of its prior parameter value.
A number of points are worth noting from this exercise. First, the D-efficient and S-efficient designs are much more robust against prior mis-specification than the orthogonal design, with lower losses of efficiency due to parameter mis-specification. Second, mis-specifying the standard deviation priors in particular seems to result in large efficiency losses (although in the orthogonal design mis-specification of one of the mean priors can also lead to significantly larger required sample sizes). It is interesting to note that the smaller the magnitude of the standard deviation parameters relative to what was assumed in the design generation process, the greater the loss of efficiency experienced for all types of designs.
Robustness of a design can be improved by assuming Bayesian priors (i.e.,
probability distributions) instead of local (fixed) priors, as argued in Sándor
and Wedel (2001), and for which efficient algorithms for the MNL model have
been proposed in Kessels et al. (2009). However, generating Bayesian efficient
designs for the panel MMNL model is computationally extremely challenging
and at this moment not feasible except for the smallest of designs.
? (the start of this Ngene command is not reproduced in this extraction)
;cond:
if(A.att1=2, B.att1=[4,6]),
if(A.att2<3, B.att2=[3,5])
;model:
U(A) = A0[-0.1] +
G1[n,-0.4,0.1] * att1[2,4,6] +
G2[u,-0.4,-0.2] * att2[1,3,5] +
A1[0.7] * att3[2.5,3,3.5] +
A2[0.6] * att4[4,6,8] /
U(B) = B0[-0.2] +
G1 * att1 +
G2 * att2 +
B1[-0.4] * att7[2.5,4,5.5] +
B2[0.7] * att8[4,6,8]
$
Choice situation  a.att1 a.att2 a.att3 a.att4  b.att1 b.att2 b.att7 b.att8  Block
1 6 5 3 4 2 1 4 4 1
2 6 3 2.5 6 4 3 4 4 2
3 2 3 3.5 6 4 3 4 6 2
4 6 1 3 4 2 5 2.5 6 1
5 6 5 2.5 6 6 3 4 4 2
6 2 5 2.5 4 6 3 2.5 8 1
7 6 5 3 8 2 5 4 8 2
8 4 3 3.5 8 6 5 4 6 1
9 4 5 2.5 6 4 1 2.5 6 2
10 4 3 3.5 8 6 1 5.5 4 1
11 4 5 2.5 6 4 5 2.5 4 1
12 4 1 2.5 4 2 3 5.5 8 1
13 2 1 3 8 6 3 5.5 4 2
14 2 3 3.5 8 4 5 5.5 6 1
15 6 3 3 8 4 1 5.5 8 1
16 6 5 3.5 4 6 1 5.5 8 2
17 2 1 3 6 4 3 2.5 6 2
18 4 1 3.5 4 2 5 2.5 8 2
This design has a D-error of 0.329256. While this measure can be compared across designs generated under the same design specification, it is not in itself particularly informative. That said, a value greater than 1 is generally indicative of a poor experimental design. Another red flag is choice probabilities for specific alternatives that are very high or very low; Ngene allows these probabilities to be interrogated.
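These probabilities can also be checked by hand. As an illustration, substituting the means of the random priors G1 and G2 (−0.4 and −0.3) as point values, an approximation that ignores their spread, the MNL probabilities for the first choice task of the design above are:

```python
import math

# priors from the syntax above; the random priors G1 (normal) and G2
# (uniform) are replaced by their means, -0.4 and -0.3, as an approximation
A0, G1, G2, A1, A2 = -0.1, -0.4, -0.3, 0.7, 0.6
B0, B1, B2 = -0.2, -0.4, 0.7

# attribute levels of choice situation 1 (first row of the design table)
va = A0 + G1 * 6 + G2 * 5 + A1 * 3 + A2 * 4   # utility of alternative A
vb = B0 + G1 * 2 + G2 * 1 + B1 * 4 + B2 * 4   # utility of alternative B

pa = math.exp(va) / (math.exp(va) + math.exp(vb))
print(round(va, 2), round(vb, 2), round(pa, 3))   # 0.5 -0.1 0.646
```

A probability of about 0.65 for alternative A is comfortably away from the extremes, so this task is informative rather than dominated.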
that this approach is different from generating three separate designs inde-
pendently, as here only a single Fisher Information matrix is calculated, as a
weighting of the various segments, and from this combined matrix the overall
efficiency is calculated (note how the ;fisher property is linked to the ;eff
property by the label “fish” in this example). This is consistent with estimating
a single model from all respondents, irrespective of what reference alternative
they experience.
Alternatively, a homogeneous design could be generated. This results in a
single design only, albeit one with an efficiency measure informed by the three
reference alternative segments. A homogeneous design could be specified by
using:
;fisher(fish) = des1(small[0.33], medium[0.33], large[0.34])
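Numerically, the weighting amounts to forming a single information matrix as the weighted sum of the segment-specific matrices and computing the efficiency measure from its inverse. A sketch with hypothetical 2 × 2 Fisher information matrices for the three segments:

```python
import numpy as np

# hypothetical Fisher information matrices for the three reference segments
I_small  = np.array([[4.0, 1.0], [1.0, 3.0]])
I_medium = np.array([[5.0, 0.5], [0.5, 2.5]])
I_large  = np.array([[3.5, 1.5], [1.5, 4.0]])

# combined matrix using the weights from the ;fisher command (0.33/0.33/0.34)
I_comb = 0.33 * I_small + 0.33 * I_medium + 0.34 * I_large

# D-error computed from the single combined matrix, as in the text
k = I_comb.shape[0]
d_err = np.linalg.det(np.linalg.inv(I_comb)) ** (1.0 / k)
print(round(d_err, 4))
```

This mirrors estimating a single model from all respondents, whichever reference alternative each one faces: the segments contribute jointly to one information matrix rather than to three separate designs.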
Table 6.34 Pre-defined attributes and attribute levels for survey design

Description of investment:
Construction cost | cost | 0.5, 1, 3, 6 b$ | 4
Construction time | time | 1, 2, 5, 10 | 4
% metropolitan population serviced | pop | 5, 10, 15, 20 | 4
% route dedicated to this system only and no other means of transport | roway | 25, 50, 75, 100 | 4
Operating and maintenance cost per year (million) | opcost | 2, 5, 10, 15 m$ | 4

Service levels:
Service capacity in one direction (passengers/hour) | capa | 5k, 15k, 30k | 4
Peak frequency of service, every . . . | pfreq | 5, 10, 15 mins | 3
Off-peak frequency of service, every . . . | ofreq | 5, 10, 15, 20 mins | 4
Travel time (door-to-door) compared to car | tcar | −10, 10, 15, 25 % | 4
Fare per trip compared to car-related costs (fuel, tolls, parking) | fare | ±20, ±10% | 4

Features of the system:
Off-vehicle prepaid ticket required | prepaid | Yes, No | 2
Integrated fare | ticket | Yes, No | 2
Waiting time incurred when transferring | wait | 1, 5, 10, 15 mins | 4
On-board staff for passenger safety and security | staff | present, absent | 2
Ease of boarding public transport vehicle | board | level boarding, steps | 2

General characteristics of investment:
Operation is assured for a minimum of | yearop | 10, 20, 30, 40, 50, 60 years | 6
Risk of it being closed down after the assured minimum period | close | 0, 25, 50, 100% | 4
Attracting business around stations/stops | buss | low, medium, high | 3
% car trips switching to this option within first 3 years of opening | shiftcar | 0, 5, 10, 20 % | 4
Overall environmental friendliness compared to car | env | ±25, −10, ±5, 0 % | 6
The two systems described above are actually | brt | BRT, LRT | 2
The attributes are summarized in Table 6.34, alongside the attribute levels and attribute names. The survey is designed with the same route length for BRT and LRT systems, which are referred to as System A (sysA) and System B (sysB) in the choice experiment. Thus, the survey is designed as unlabeled, with the exception that the difference between BRT and LRT systems is itself treated as an attribute in the experiment, since voters may hold different images of bus-based and rail-based systems.
Each respondent is asked to answer 2 choice tasks. Given the number of levels
for each attribute and the desire to maintain attribute level balance, the survey
is designed with 24 rows (i.e., choice tasks) and blocked into 12 blocks so that
each respondent will be assigned a block with 2 choice tasks (24 tasks/12
blocks = 2 tasks per block). This design uses a D-error measure for finding an
efficient design for estimating MNL models, which is specified in the syntax
with the ;eff = (mnl,d) command. In addition, a set of conditions is employed
to require the peak-hour level of service to be no worse than the off-peak level
of service. This is executed in Ngene by a ;cond: command, which in this
example limits the off-peak level of service to a set of attribute levels. The first
condition states that if the level of peak-hour frequency of System A is 10
(minutes), then the allowed levels of off-peak frequency of System A are 10,
15, and 20 (minutes). When the level of peak frequency is 5 minutes, the off-
peak frequency can be any of the pre-defined levels, and thus no condition is
required.
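The effect of such a condition can be checked by enumerating the attribute level combinations it admits. The sketch below encodes the rule that off-peak service may be no more frequent than peak service, using the pfreq and ofreq levels from Table 6.34:

```python
from itertools import product

pfreq_levels = [5, 10, 15]        # peak frequency of service (minutes)
ofreq_levels = [5, 10, 15, 20]    # off-peak frequency of service (minutes)

# keep only combinations where off-peak service is no more frequent than peak
allowed = [(p, o) for p, o in product(pfreq_levels, ofreq_levels) if o >= p]

for p, o in allowed:
    print(p, o)
```

Of the 12 raw combinations, 9 survive: a peak frequency of 5 minutes imposes no restriction, while peak frequencies of 10 and 15 minutes rule out the faster off-peak levels, exactly as the ;cond: statements described in the text.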
This survey is designed for estimating MNL models using a ;model:
command to define the utility functions of the alternatives (System A and
System B), set up like Nlogit. When generating an efficient design, each
parameter must have a prior value, which can be fixed (e.g., MNL model) or
random (e.g., mixed logit model). Prior parameters are specified within
square brackets, immediately following the parameter names. For example, the syntax above uses zeros as the prior parameters of all service attributes, but only construction cost is explicitly specified with a prior of zero (cost[0]); the prior parameters of the other attributes are left empty and so receive the Ngene default value of zero. When priors are not available, as in this study, zero priors may be used to create a pilot survey that is distributed to a small proportion of the sample. The pilot data are then used to estimate a model whose parameter estimates serve as priors for generating an efficient design for the main survey.
The above syntax also shows how Ngene handles designs with non-linear
relationships through the model utility functions with dummy coded attri-
butes. To specify dummy variables, the parameter names need to be followed
by the syntax .dummy, such as prepaid.dummy in the above example. Prior
parameters of dummy variables are required for l−1 levels and must be
specified within square brackets, separated by a | symbol. An example syntax
is buss.dummy[0|0]*buss[0,1,2], where the first two levels of attracting busi-
ness around stations/stops have been assigned a prior value of 0. Effect coded
variables are handled in a similar way with .effect being used after parameter
name in place of the .dummy syntax.
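The difference between the two coding schemes can be made concrete. For an attribute with l = 3 levels, both schemes use l − 1 = 2 columns; dummy coding assigns the base level all zeros, while effects coding assigns it −1 in every column. A minimal sketch:

```python
def dummy_code(level, n_levels):
    """Indicator columns for the first n_levels - 1 levels; base level -> all zeros."""
    return [1 if level == k else 0 for k in range(n_levels - 1)]

def effects_code(level, n_levels):
    """Like dummy coding, but the base (last) level is coded -1 in every column."""
    if level == n_levels - 1:
        return [-1] * (n_levels - 1)
    return dummy_code(level, n_levels)

# e.g. the three levels of the buss attribute (low = 0, medium = 1, high = 2)
for lvl in range(3):
    print(lvl, dummy_code(lvl, 3), effects_code(lvl, 3))
```

The two codings carry the same information; they differ only in how the base level enters the utility function, which is why Ngene requires l − 1 priors for either scheme.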
The pre-defined levels of each attribute are specified within square
brackets, following attribute names and being separated by a comma (,).
The attribute levels and prior parameters should only be defined the first
time the attributes appear in the utility function. For example, in the above
syntax all attribute levels and prior parameters are defined in the utility
function of System A and are not defined again in the utility function of
System B.
A screenshot below shows a design generated by Ngene using the above
syntax. The output includes information on the efficiency measures related to
the design, which in this case are D-error, A-error, percentage of utility
balance (B estimate) and minimum sample size (S estimate) for estimating
significant parameters, assuming the priors are correct. As zero priors are used, the utilities of System A and System B are both zero, which produces a design with 100 percent utility balance and a minimum sample size of zero (i.e., if all parameters were truly not statistically different from zero, there would be no need to run the survey at all).
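Why zero priors produce perfect utility balance is easy to verify: with all β = 0, every alternative has zero utility, so the MNL probabilities equal 1/J in every choice task regardless of the attribute levels chosen. A minimal check (the attribute levels below are arbitrary):

```python
import math

def mnl_probs(utilities):
    """Standard MNL choice probabilities."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

beta = [0.0, 0.0, 0.0]          # zero priors
x_sysA = [6.0, 10.0, 25.0]      # arbitrary illustrative attribute levels
x_sysB = [0.5, 20.0, 100.0]

vA = sum(b * x for b, x in zip(beta, x_sysA))
vB = sum(b * x for b, x in zip(beta, x_sysB))
probs = mnl_probs([vA, vB])
print(probs)   # [0.5, 0.5] -> 100 percent utility balance
```

The same logic explains the reported minimum sample size of zero: under the assumed (zero) priors there is nothing to detect, so the S-error output is only meaningful once non-zero priors are supplied.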
We now want you to look at various scenarios that describe different ways in which taxpayers' money might be spent on building new infrastructure.
The table below summarises a scenario of two public transport systems (called System A and System B) with the same route length.
We ask you to review these systems and then choose an answer for each of the following questions.
Description of investment:
Construction cost: $6000m / $500m
Fare per trip compared to car-related costs (fuel, tolls, parking): 15% higher / 10% lower
Risk of it being closed down after the assured minimum period: 100% / 0%
% car trips switching to this option within first 3 years of opening: 10% / 5%
If you were voting now, which one would you vote for?
Which investment would improve the liveability of the metropolitan area more?
6.7 Conclusions
One important point that we hope the reader takes away from this chapter concerns the inappropriate use of a number of statistical measures that have become prevalent in the literature. In particular, we deliberately point out two such measures: one designed specifically to optimize designs for linear models, and a second used to optimize designs assuming an MNL model specification with generic local priors equal to zero under orthonormal codes (i.e., the specific case examined by Street and Burgess 2004). While use of these measures is perfectly valid if one wishes to optimize a design under these express assumptions, applying them to judge the optimality of a design generated under other assumptions (including different model specifications, prior parameter assumptions, and coding structures) is both incorrect and misleading (we are not implying that those who devised these measures have applied them incorrectly; however, we can attest that a number of reviewers have over time applied them inappropriately to infer that designs generated under different sets of assumptions are not optimal). In this chapter, it has been argued that the use of orthogonal designs for non-linear models, such as the logit model, will be inefficient under most, but not all, assumptions made during the design generation phase (for example, the specific case examined by Street and Burgess shows that orthogonal designs are optimal under some assumptions). Nevertheless, orthogonal designs remain to this day the most widely used design type. This prevalence reflects the fact that orthogonal designs appear to (and actually do) work well in most cases, and it is important to understand why this is so.
Designs of all types, whether orthogonal or non-orthogonal, are gener-
ated under assumptions about the true population parameter estimates
(i.e., the priors that are assumed). These assumptions are either explicitly
acknowledged by those generating the design or implicitly made without
their knowledge. Perhaps unknown to many, an orthogonal design will be
the optimal design under the assumption of locally optimal parameter
estimates set at zero (see Bliemer and Rose 2005a). As per Figure 6.3, if
the true population parameters differ from those that are assumed in the
design generation phase, then the design will generally lose statistical
efficiency. The impact of such a loss of efficiency can be seen in Figure 6.16, which shows the relationship between the standard errors and the parameter priors assumed for two different locally optimal designs, one generated under zero prior parameters and one under non-zero prior parameters. [Figure 6.16 plots the standard error (s.e.) against the assumed prior parameter, with curves for designs generated under β̃k = 0 and β̃k ≠ 0.] As
shown in the figure, if the prior parameter is incorrectly specified, this will
typically result in an increased standard error at the true parameter value,
all else being equal. Note that this does not mean that the true parameter
cannot be estimated by the design, but simply that a larger sample size
would be required to detect the statistical significance of the parameter
estimate than otherwise would have been the case had the prior parameter
assumed been correct. It is for this precise reason that orthogonal designs
have appeared to work well in the past and will likely continue to work well
into the future. That is, in most cases reported in the literature, the sample sizes used in practice have been large enough to outweigh any loss of efficiency in the design as the true parameters diverge from those assumed in generating it. The point made by those advocating non-orthogonal designs generated under non-zero prior parameters, however, is that in undertaking SC experiments one would assume that the attributes chosen have some influence on the choices made by respondents, and hence that the true population parameters are non-zero. In such cases, the argument is that these designs will outperform orthogonal designs given similar sample sizes, or produce the same results as an orthogonal design but with smaller sample sizes.
It is important to note that the above discussion is predicated on the
assumption of all else being equal. That is, it assumes that there exists no
link between the population parameter estimates and the design itself.
Several articles have convincingly argued that the design may result in
unintended biases of the parameter estimates (e.g., Louviere and Lancsar
2009). In theory however, this should not be the case. McFadden (1974)
showed that asymptotically the parameter estimates should converge to the
population parameters, independent of the data matrix (i.e., design in this
instance). Using Monte Carlo simulations, McFadden further showed that
this was the case in quite small finite samples, with as few as 50 choice
observations. Numerous studies using simulation have led to the same
conclusions (e.g., see Ferrini and Scarpa 2007). However, the arguments
put forward by Louviere and Lancsar (2009) remain compelling. They posit
that if the design attributes correlate with unobserved omitted covariates or
latent contrasts such as personality profiles or other such characteristics,
then the resulting parameters obtained from different designs will indeed
be influenced by the specific design used. Such biases will not exist in simulated data unless they are assumed in the data generation process, which makes empirical studies far more important in determining whether these biases are real. This therefore represents an urgent and important area of research, as the existence of any such biases may require a different line of enquiry in generating designs than has occurred in the past, as outlined in this chapter.
Similarly, the impact of designs upon scale also represents an important research area. Louviere et al. (2008) and Bliemer and Rose (2011) found scale differences across various designs related to how easy or hard the questions generated from the design are. Both studies found, for example, that orthogonal designs tended to produce lower error variances than efficient designs, possibly as a result of the presence of dominated alternatives: given that efficient designs are less likely to contain dominated alternatives than orthogonal designs, the questions arising from orthogonal designs are easier to answer, resulting in lower error variance. As such, there exists the very real possibility that any move away from orthogonal designs represents a trade-off between capturing more information per question and lowering error variance. Once more, further research is required to address this specific issue.
Id blockId gameId descrIdA descrIdB descrIdC descrIdD busImg tramImg Id blockId gameId descrIdA descrIdB descrIdC descrIdD busImg tramImg
1 1 1 24 28 26 10 1 2 69 18 1 63 21 58 32 2 1
2 1 2 8 36 57 4 2 1 70 18 2 44 61 3 47 1 1
3 1 3 17 14 11 2 2 2 71 18 3 32 29 10 17 2 2
4 1 4 4 46 40 64 1 1 72 18 4 36 45 49 54 1 2
5 2 1 16 18 7 2 2 2 73 19 1 14 13 21 66 1 2
6 2 2 65 39 49 51 2 1 74 19 2 62 54 8 48 1 1
7 2 3 20 7 12 23 1 1 75 19 3 30 1 31 28 2 2
8 2 4 46 34 62 45 1 2 76 19 4 59 41 8 51 2 1
9 3 1 35 8 54 65 2 1 77 20 1 19 15 14 16 1 2
10 3 2 22 32 24 60 2 2 78 20 2 50 3 41 39 2 2
11 3 3 4 38 45 3 1 1 79 20 3 62 39 47 59 2 1
12 3 4 23 27 5 24 1 2 80 20 4 21 9 19 25 1 1
13 4 1 40 36 65 3 1 1 81 21 1 35 41 6 61 1 1
14 4 2 13 20 24 19 1 2 82 21 2 28 32 66 9 1 2
15 4 3 63 24 17 7 2 1 83 21 3 52 46 57 49 2 1
16 4 4 54 6 50 57 2 2 84 21 4 14 7 32 31 2 2
17 5 1 25 13 63 7 1 2 85 22 1 51 44 54 40 2 2
18 5 2 37 6 4 34 2 1 86 22 2 15 56 58 7 1 1
19 5 3 29 20 2 30 2 2 87 22 3 26 16 32 13 2 1
20 5 4 3 8 43 6 1 1 88 22 4 51 55 62 43 1 2
21 6 1 28 56 13 11 1 1 89 23 1 10 25 11 15 2 2
22 6 2 49 48 40 42 2 2 90 23 2 52 4 61 62 2 1
23 6 3 5 28 12 21 1 2 91 23 3 60 63 15 29 1 1
24 6 4 36 62 64 50 2 1 92 23 4 8 65 46 50 1 2
25 7 1 19 63 10 30 1 1 93 24 1 57 51 3 64 2 2
26 7 2 39 43 53 40 1 2 94 24 2 7 26 21 31 2 1
27 7 3 20 11 27 32 2 1 95 24 3 65 60 30 56 1 1
28 7 4 40 59 45 35 2 2 96 24 4 48 35 52 36 1 2
29 8 1 55 45 8 37 2 2 97 25 1 5 31 17 16 1 2
30 8 2 7 2 28 19 1 1 98 25 2 59 55 42 36 2 1
31 8 3 34 65 38 47 2 1 99 25 3 43 52 47 45 2 2
32 8 4 56 10 16 21 1 2 100 25 4 1 58 13 23 1 1
33 9 1 61 42 46 53 2 2 101 26 1 22 26 27 14 1 2
34 9 2 32 18 30 5 1 2 102 26 2 38 49 61 59 2 1
35 9 3 16 25 60 58 1 1 103 26 3 33 40 55 6 1 1
36 9 4 45 50 51 42 2 1 104 26 4 60 14 23 28 2 2
37 10 1 1 58 13 23 1 1 105 27 1 56 10 16 21 2 2
38 10 2 43 52 47 45 1 2 106 27 2 34 65 38 47 2 1
39 10 3 58 30 14 12 2 1 107 27 3 7 2 28 19 1 2
40 10 4 6 52 39 38 2 2 108 27 4 55 45 8 37 1 1
41 11 1 24 11 31 12 2 2 109 28 1 18 58 22 11 1 2
42 11 2 65 61 55 57 1 1 110 28 2 47 35 51 33 2 1
43 11 3 8 50 44 52 1 2 111 28 3 17 15 28 18 1 1
44 11 4 56 27 1 63 2 1 112 28 4 41 64 34 55 2 2
45 12 1 7 66 23 26 2 1 113 29 1 36 62 64 50 2 2
46 12 2 61 64 43 54 1 2 114 29 2 5 28 12 21 1 2
47 12 3 31 2 60 9 1 1 115 29 3 49 48 40 42 2 1
48 12 4 34 49 35 43 2 2 116 29 4 28 56 13 11 1 1
49 13 1 27 30 9 15 2 2 117 30 1 1 7 15 5 2 1
50 13 2 50 38 33 37 2 1 118 30 2 54 47 4 55 1 2
51 13 3 15 56 58 7 1 2 119 30 3 37 6 4 34 2 2
52 13 4 51 44 54 40 1 1 120 30 4 25 13 63 7 1 1
53 14 1 42 8 33 52 2 2 121 31 1 54 6 50 57 1 1
54 14 2 31 23 56 25 2 1 122 31 2 63 24 17 7 2 2
55 14 3 28 32 66 9 1 1 123 31 3 13 20 24 19 2 1
56 14 4 35 41 6 61 1 2 124 31 4 40 36 65 3 1 2
57 15 1 44 62 41 49 1 1 125 32 1 46 51 6 48 1 1
58 15 2 9 17 20 58 2 1 126 32 2 2 16 22 1 1 2
59 15 3 26 1 19 17 2 2 127 32 3 45 33 48 61 2 2
60 15 4 41 43 42 38 1 2 128 32 4 11 7 29 66 2 1
61 16 1 59 41 8 51 2 2 129 33 1 46 34 62 45 2 1
62 16 2 30 1 31 28 1 2 130 33 2 20 7 12 23 2 2
63 16 3 62 54 8 48 2 1 131 33 3 66 17 25 27 1 1
64 16 4 14 13 21 66 1 1 132 33 4 38 40 8 46 1 2
65 17 1 36 45 49 54 1 1 133 34 1 4 46 40 64 2 1
66 17 2 32 29 10 17 2 2 134 34 2 17 14 11 2 2 2
67 17 3 44 61 3 47 2 1 135 34 3 7 12 56 2 1 2
68 17 4 63 21 58 32 1 2 136 34 4 48 53 50 34 1 1
As an aside, the Ngene syntax should work for all model types, independent of case type, by changing the efficiency measure.
The data is set up as per a normal DCE, where the attributes are dummy codes for the alternatives shown. Each task, however, is repeated: once for best and once for worst. For worst, the coding is the same except that −1 is used instead of 1. An example is presented in Table 6B.1, where the first task corresponds to the task shown above.
The Ngene syntax for this design would look like:
Design
;eff=(mnl,d)
;alts = A, B, C, D ? this is the number of options to show
;rows = 12
;prop = bw1(bw)
;con
;model:
U(A) = Airline.d[0|0|0|0|0|0|0] * Airline[1,2,3,4,5,6,7,8]/
U(B) = Airline.d * Airline[1,2,3,4,5,6,7,8]/
U(C) = Airline.d * Airline[1,2,3,4,5,6,7,8]/
U(D) = Airline.d * Airline[1,2,3,4,5,6,7,8]
$
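The data construction described above (each task repeated, with the worst sub-task coded −1) can be sketched as a small routine. The task below is illustrative: four alternatives, each a row of seven airline dummy codes (the eighth airline is the base):

```python
def expand_best_worst(task):
    """Duplicate a choice task: best rows keep the +1 dummy codes,
    worst rows use the same coding with every code negated (same choice set)."""
    rows = []
    for alt in task:                              # best sub-task
        rows.append({"bestworst": 1, "codes": alt[:]})
    for alt in task:                              # worst sub-task: negated coding
        rows.append({"bestworst": -1, "codes": [-c for c in alt]})
    return rows

# one task with 4 alternatives and 7 dummy columns (8 airlines, last is base)
task = [
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
rows = expand_best_worst(task)
print(len(rows))   # 8 rows: 4 best + 4 worst
```

This reproduces the layout of Table 6B.1, where each observed task contributes two stacked sub-tasks with Bestworst = 1 and Bestworst = −1.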
Table 6B.1 Example B/W case 1 task data set up
Resp Set Altij Cset Bestworst AirNZ Delta Emirates JetStar Qantas Singapore United Choice
1 1 1 4 1 0 0 0 0 0 1 0 0
1 1 2 4 1 0 0 1 0 0 0 0 1
1 1 3 4 1 0 0 0 0 1 0 0 0
1 1 4 4 1 0 0 0 0 0 0 0 0
1 1 1 4 −1 0 0 0 0 0 −1 0 0
1 1 2 4 −1 0 0 −1 0 0 0 0 0
1 1 3 4 −1 0 0 0 0 −1 0 0 0
1 1 4 4 −1 0 0 0 0 0 0 0 1
1 2 1 4 1 1 0 0 0 0 0 0 0
1 2 2 4 1 0 0 1 0 0 0 0 0
1 2 3 4 1 0 0 0 0 1 0 0 0
1 2 4 4 1 0 0 0 0 0 0 1 1
1 2 1 4 −1 −1 0 0 0 0 0 0 0
1 2 2 4 −1 0 0 −1 0 0 0 0 1
1 2 3 4 −1 0 0 0 0 −1 0 0 0
1 2 4 4 −1 0 0 0 0 0 0 −1 0
Or
Design
;eff=(mnl,d)
;alts = A, B, C, D
;rows = 12
;prop = bw1(bw)
;model:
U(A,B,C,D) = Airline.d[0|0|0|0|0|0|0] * Airline[1,2,3,4,5,6,7,8]$
You can have non-zero priors. Some researchers take different approaches to
constructing the worst task, where they delete the alternative chosen as best
when constructing the worst task. The assumption is that the respondent chooses the worst from the remaining set of alternatives shown; an example is given below. The two approaches represent two different assumptions about how respondents answer these questions. The syntax for this type of design becomes:
Design
;eff=(mnl,d)
;alts = A, B, C, D
;rows = 12
;prop = bw1(b,w)
;con
;model:
U(A) = A[0] + Airline.d[0|0|0|0|0|0|0] * Airline[1,2,3,4,5,6,7,8]/
U(B) = B[0] + Airline.d * Airline[1,2,3,4,5,6,7,8]/
U(C) = C[0] + Airline.d * Airline[1,2,3,4,5,6,7,8]/
U(D) = Airline.d * Airline[1,2,3,4,5,6,7,8] $
and the assumed data structure being optimized is shown in Table 6B.2.
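The delete-the-best construction can likewise be sketched: the best sub-task uses the full choice set, while the worst sub-task drops the alternative chosen as best, so Cset shrinks by one. The task and the chosen best alternative below are illustrative:

```python
def expand_best_then_worst(task, best_index):
    """Best sub-task over the full choice set; worst sub-task over the
    remaining alternatives after deleting the one chosen as best."""
    rows = []
    for alt in task:                              # best sub-task, full choice set
        rows.append({"bestworst": 1, "cset": len(task), "codes": alt[:]})
    for i, alt in enumerate(task):                # worst sub-task, best removed
        if i == best_index:
            continue
        rows.append({"bestworst": -1, "cset": len(task) - 1,
                     "codes": [-c for c in alt]})
    return rows

# illustrative task: 4 alternatives, 7 airline dummy columns
task = [
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
rows = expand_best_then_worst(task, best_index=2)   # third alternative chosen best
print(len(rows))   # 7 rows: 4 best + 3 worst
```

This matches the pattern visible in the table that follows, where the worst rows carry Cset = 3 and the best-chosen alternative does not reappear.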
The design also allows for more than one best–worst ranking, for example:
Design
;eff=(mnl,d)
;alts = A, B, C, D
;rows = 12
;choices = bw1(bw,b)
;model:
U(A,B,C,D) = Airline.d[0|0|0|0|0|0|0] * Airline[1,2,3,4,5,6,7,8]$
Table 6B.2 Example B/W case 1 task data set up with the best alternative deleted from the worst task
Resp Set Altij Cset Bestworst AirNZ Delta Emirates JetStar Qantas Singapore United Choice
1 1 1 4 1 0 0 0 0 0 1 0 0
1 1 2 4 1 0 0 1 0 0 0 0 1
1 1 3 4 1 0 0 0 0 1 0 0 0
1 1 4 4 1 0 0 0 0 0 0 0 0
1 1 1 3 −1 0 0 0 0 0 −1 0 0
1 1 3 3 −1 0 0 0 0 −1 0 0 0
1 1 4 3 −1 0 0 0 0 0 0 0 1
1 2 1 4 1 1 0 0 0 0 0 0 1
1 2 2 4 1 0 0 1 0 0 0 0 0
1 2 3 4 1 0 0 0 0 1 0 0 0
1 2 4 4 1 0 0 0 0 0 0 1 0
1 2 2 3 −1 0 0 −1 0 0 0 0 1
1 2 3 3 −1 0 0 0 0 −1 0 0 0
1 2 4 3 −1 0 0 0 0 0 0 −1 0
Resp Set Altij Altn Cset Bestworst Inch28 Inch30 CabScr LimMov Pay Hour1 Hour3 Choice
1 1 1 1 4 1 0 1 0 0 0 0 0 0
1 1 2 2 4 1 0 0 0 1 0 0 0 1
1 1 3 3 4 1 0 0 0 0 0 0 0 0
1 1 4 4 4 1 0 0 0 0 0 0 0 0
1 1 1 5 4 −1 0 −1 0 0 0 0 0 0
1 1 2 6 4 −1 0 0 0 −1 0 0 0 0
1 1 3 7 4 −1 0 0 0 0 0 0 0 1
1 1 4 8 4 −1 0 0 0 0 0 0 0 0
1 2 1 1 4 1 1 0 0 0 0 0 0 0
1 2 2 2 4 1 0 0 1 0 0 0 0 0
1 2 3 3 4 1 0 0 0 0 1 0 0 1
1 2 4 4 4 1 0 0 0 0 0 0 1 0
1 2 1 5 4 −1 −1 0 0 0 0 0 0 1
1 2 2 6 4 −1 0 0 −1 0 0 0 0 0
1 2 3 7 4 −1 0 0 0 0 −1 0 0 0
1 2 4 8 4 −1 0 0 0 0 0 0 −1 0
The command above is still B/W case 1 (case 2 follows shortly). You can also include constants in the design, such that the assumed syntax and data structure are:
Design
;eff=(mnl,d)
;alts = Seat, Movie, Pay, Stop
;rows = 12
;choices = bw2(bw)
;con
;model:
U(Seat) = ASCseat[0] + seat.dummy[0|0] * seat[0,1,2] /
U(Movie) = ASCMovie[0] + Movie.dummy[0|0]*Movie[0,1,2] /
U(Pay) = ASCPay[0] + Pay[0]*Pay[0,1] /
U(Stop) = Stop.dummy[0|0]*stop[0,1,2] $
As with case 1, some researchers tend to delete the best when constructing the
worst task (Table 6B.5). The Ngene syntax to do this is:
Design
;eff=(mnl,d)
;alts = Seat, Movie, Pay, Stop
;rows = 12
;choices = bw2(b,w)
;con
;model:
Table 6B.5 Example B/W case 2 task data set up 1 with constants
Resp Set Altij Altn Cset Bestworst Seat Mov Alc Inch28 Inch30 CabScr LimMov Pay Hour1 Hour3 Choice
1 1 1 1 4 1 1 0 0 0 1 0 0 0 0 0 0
1 1 2 2 4 1 0 1 0 0 0 0 1 0 0 0 1
1 1 3 3 4 1 0 0 1 0 0 0 0 0 0 0 0
1 1 4 4 4 1 0 0 0 0 0 0 0 0 0 0 0
1 1 1 5 4 −1 1 0 0 0 −1 0 0 0 0 0 0
1 1 2 6 4 −1 0 1 0 0 0 0 −1 0 0 0 0
1 1 3 7 4 −1 0 0 1 0 0 0 0 0 0 0 1
1 1 4 8 4 −1 0 0 0 0 0 0 0 0 0 0 0
1 2 1 1 4 1 1 0 0 1 0 0 0 0 0 0 0
1 2 2 2 4 1 0 1 0 0 0 1 0 0 0 0 0
1 2 3 3 4 1 0 0 1 0 0 0 0 1 0 0 1
1 2 4 4 4 1 0 0 0 0 0 0 0 0 0 1 0
1 2 1 5 4 −1 1 0 0 −1 0 0 0 0 0 0 1
1 2 2 6 4 −1 0 1 0 0 0 −1 0 0 0 0 0
1 2 3 7 4 −1 0 0 1 0 0 0 0 −1 0 0 0
1 2 4 8 4 −1 0 0 0 0 0 0 0 0 0 −1 0
In the presence of multiple best–worst questions, you can also assume that respondents repeatedly choose the best alternative from those remaining:
Design
;eff=(mnl,d)
;alts = Seat, Movie, Pay, Stop
;rows = 12
;choices = bw2(b,b,b)
;con
;model:
U(Seat) = ASCseat[0] + seat.dummy[0|0] * seat[0,1,2] /
U(Movie) = ASCMovie[0] + Movie.dummy[0|0]*Movie[0,1,2] /
U(Pay) = ASCPay[0] + Pay[0]*Pay[0,1] /
U(Stop) = Stop.dummy[0|0]*stop[0,1,2] $
Table 6B.6 Example B/W case 2 task data set up 2
Resp Set Altij Altn Cset Bestworst Seat Mov Alc Inch28 Inch30 CabScr LimMov Pay Hour1 Hour3 Choice
1 1 1 1 4 1 1 0 0 0 1 0 0 0 0 0 0
1 1 2 2 4 1 0 1 0 0 0 0 1 0 0 0 1
1 1 3 3 4 1 0 0 1 0 0 0 0 0 0 0 0
1 1 4 4 4 1 0 0 0 0 0 0 0 0 0 0 0
1 1 1 5 3 −1 1 0 0 0 −1 0 0 0 0 0 0
1 1 3 7 3 −1 0 0 1 0 0 0 0 0 0 0 1
1 1 4 8 3 −1 0 0 0 0 0 0 0 0 0 0 0
1 2 1 1 4 1 1 0 0 1 0 0 0 0 0 0 0
1 2 2 2 4 1 0 1 0 0 0 1 0 0 0 0 0
1 2 3 3 4 1 0 0 1 0 0 0 0 1 0 0 1
1 2 4 4 4 1 0 0 0 0 0 0 0 0 0 1 0
1 2 1 5 3 −1 1 0 0 −1 0 0 0 0 0 0 1
1 2 2 6 3 −1 0 1 0 0 0 −1 0 0 0 0 0
1 2 4 8 3 −1 0 0 0 0 0 0 0 0 0 −1 0
possible that they answered best, next best, next best, and so on.
Alternatively, the respondent may have answered best, worst, next best,
next worst, and so on.
Assuming best, next best, next best, and so on, the data would be set up as in
Table 6B.7 (representing a traditional rank explosion exercise).
The Ngene syntax for the above design would look like:
Design
;alts =A,B,C,D,E
;eff= (mnl,d,mean)
;rows=24
;bdraws=halton(100)
;choices = bw3(b,b,b,b)
;model:
U(A) = Dr.dummy[(u,-1,-0.5)|(u,-0.5,0)]*drink[0,1,2] + Sm.dummy[(u,-1,-
0.5)|(u,-0.5,0)]*Smoke[0,1,2]
+ Ch.dummy[(u,-1,-0.5)|(u,-0.5,0)]*Child[0,1,2]+ Jo.dummy[(u,-1,-0.5)|
(u,-0.5,0)]*Job[0,1,2]
+ Lo.dummy[(u,-1,-0.5)|(u,-0.5,0)]*Looks[0,1,2] + Cst[(n,-0.05,0.01)]
*Cost[5,10,15,20] /
Resp Set RespSet Explode Altij Altn Cset Choice Drink Smoke Child Job Looks Cost
1 1 1 1 1 1 5 1 0 1 1 0 2 20
1 1 1 1 2 2 5 0 1 2 0 1 0 15
1 1 1 1 3 3 5 0 2 0 1 1 2 10
1 1 1 1 4 4 5 0 1 1 1 2 0 15
1 1 1 1 5 5 5 0 2 2 0 0 1 10
1 2 1 2 2 7 4 0 1 2 0 1 0 15
1 2 1 2 3 8 4 1 2 0 1 1 2 10
1 2 1 2 4 9 4 0 1 1 1 2 0 15
1 2 1 2 5 10 4 0 2 2 0 0 1 10
1 3 1 3 2 12 3 0 1 2 0 1 0 15
1 3 1 3 4 14 3 0 1 1 1 2 0 15
1 3 1 3 5 15 3 1 2 2 0 0 1 10
1 4 1 4 2 17 2 1 1 2 0 1 0 15
1 4 1 4 4 19 2 0 1 1 1 2 0 15
The attributes are dummy coded in the syntax for this example; however, in
theory, the method itself does not impose any particular coding structure.
Some researchers take a different track for the explosions by deleting the
alternative chosen as best or worst in the previous pseudo-observation. In this
case, the assumption is that the respondent is comparing the next best or next
worst from the remaining set of alternatives. This data set up is shown in
Table 6B.8. Note that the two approaches represent two different assumptions
as to how respondents are answering these questions.
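The explosion logic described above can be sketched as a simple data-processing step. The following Python illustration is ours (the function and alternative names are hypothetical, not Ngene or Nlogit output); it deletes each chosen alternative from the remaining set after every pseudo-observation:

```python
def explode_best_worst(ranking, pattern):
    """Explode a full ranking of alternatives into best/worst pseudo-observations.

    ranking: alternatives ordered from most to least preferred.
    pattern: e.g. ('b', 'b', 'b', 'b') for repeated best, or ('b', 'w', 'b', 'w')
             for alternating best/worst. The chosen alternative is removed from
             the remaining set after each pseudo-observation.
    """
    remaining = list(ranking)
    obs = []
    for p in pattern:
        if len(remaining) < 2:          # need at least two alternatives to choose
            break
        chosen = remaining[0] if p == 'b' else remaining[-1]
        obs.append((p, chosen, tuple(remaining)))  # task type, choice, choice set
        remaining.remove(chosen)
    return obs

# Alternating best/worst explosion of a five-alternative ranking A > B > C > D > E
for task, choice, cset in explode_best_worst(['A', 'B', 'C', 'D', 'E'],
                                             ('b', 'w', 'b', 'w')):
    print(task, choice, cset)
```

Under the alternating pattern this yields four pseudo-observations with shrinking choice sets (5, 4, 3, and 2 alternatives), mirroring the Cset column in the tables above.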
The Ngene syntax for the above design would look like:
Design
;alts =A,B,C,D,E
;eff= (mnl,d,mean)
;rows=24
;bdraws=halton(100)
;choices = bw3(b,w,b,w)
;model:
U(A) = Dr.dummy[(u,-1,-0.5)|(u,-0.5,0)]*drink[0,1,2] + Sm.dummy[(u,-1,-
0.5)|(u,-0.5,0)]*Smoke[0,1,2]
+ Ch.dummy[(u,-1,-0.5)|(u,-0.5,0)]*Child[0,1,2]+ Jo.dummy[(u,-1,-0.5)|
Resp Bestworst Explode Altij Altn Cset Choice Drink Smoke Child Job Looks Cost
1 1 1 1 1 5 1 0 1 1 0 2 20
1 1 1 2 2 5 0 1 2 0 1 0 15
1 1 1 3 3 5 0 2 0 1 1 2 10
1 1 1 4 4 5 0 1 1 1 2 0 15
1 1 1 5 5 5 0 2 2 0 0 1 10
1 −1 2 2 7 4 0 −1 −2 0 −1 0 −15
1 −1 2 3 8 4 0 −2 0 −1 −1 −2 −10
1 −1 2 4 9 4 1 −1 −1 −1 −2 0 −15
1 −1 2 5 10 4 0 −2 −2 0 0 −1 −10
1 1 3 2 12 3 0 1 2 0 1 0 15
1 1 3 3 13 3 1 2 0 1 1 2 10
1 1 3 5 15 3 0 2 2 0 0 1 10
1 −1 4 2 17 2 1 −1 −2 0 −1 0 −15
1 −1 4 5 20 2 0 −2 −2 0 0 −1 −10
(u,-0.5,0)]*Job[0,1,2]
+ Lo.dummy[(u,-1,-0.5)|(u,-0.5,0)]*Looks[0,1,2] + Cst[(n,-0.05,0.01)]
*Cost[5,10,15,20] /
U(B) = Dr*drink + Sm*Smoke+ Ch*Child+ Jo*Job + Lo*Looks + Cst*Cost /
U(C) = Dr*drink + Sm*Smoke+ Ch*Child+ Jo*Job + Lo*Looks + Cst*Cost /
U(D) = Dr*drink + Sm*Smoke+ Ch*Child+ Jo*Job + Lo*Looks + Cst*Cost /
U(E) = Dr*drink + Sm*Smoke+ Ch*Child+ Jo*Job + Lo*Looks + Cst*Cost $
6C.1 Louviere and Hensher (1983), Louviere and Woodworth (1983), and others
The first SC studies focused on introducing the method and promoting its
benefits over the standard stated preference techniques used at the time (such
as traditional conjoint methods). These early studies, therefore, did not
VC = σ²(X′X)⁻¹, (6C.1)
where σ² is the model variance, and X is the matrix of attribute levels in the
design or in the data to be used in estimation.
Fixing the model variance for the present (which simply acts as a scaling
factor), the elements of the VC matrix for linear regression models will generally
be minimized when the columns of the X matrix are orthogonal. As such, when
such models are estimated, the orthogonality of data is considered important as
this property ensures that (a) the model will not suffer from multicollinearity,
and (b) the variances (and covariances) of the parameter estimates are mini-
mized. As such, orthogonal designs, at least in relation to linear models, meet
the two criteria for a good design mentioned earlier; they allow for an indepen-
dent determination of each attribute’s contribution on the dependent variable
and they maximize the power of the design to detect statistically significant
relationships (i.e., maximize the t-ratios at any given sample size). Of course, the
role that sigma plays may be important and as such cannot always be ignored as
suggested above. This is because it may be possible to locate a non-orthogonal
design which produces non-zero covariances and slightly larger variances, but
has smaller elements overall when scaled by sigma. Nevertheless, orthogonal
designs will tend to perform well overall for this type of model.
3 This is not to suggest that research into aspects associated with the specific use of orthogonal designs as applied to discrete choice data was not undertaken in the early years of SC studies. For example,
Anderson and Wiley (1992) and Lazari and Anderson (1994) looked at orthogonal designs capable of
addressing problems of availability of alternatives. See Louviere et al. (2000) for a review of orthogonal
design theory as applied to SC methods.
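To see why orthogonality minimizes the elements of the VC matrix for a linear model, the following minimal Python sketch (ours, not from the text) computes Equation (6C.1) for a hypothetical two-attribute design, comparing orthogonal and correlated columns:

```python
# Illustrative sketch: VC = sigma^2 (X'X)^-1 (Equation 6C.1) for a
# two-column design matrix, using a hand-coded 2x2 inverse.

def vc_matrix(x1, x2, sigma2=1.0):
    """Variance-covariance matrix of the estimates for a two-column design."""
    a = sum(v * v for v in x1)             # (X'X)[0][0]
    b = sum(u * v for u, v in zip(x1, x2)) # (X'X)[0][1] = (X'X)[1][0]
    d = sum(v * v for v in x2)             # (X'X)[1][1]
    det = a * d - b * b                    # determinant for the 2x2 inverse
    return [[sigma2 * d / det, -sigma2 * b / det],
            [-sigma2 * b / det, sigma2 * a / det]]

orth = vc_matrix([-1, -1, 1, 1], [-1, 1, -1, 1])  # orthogonal columns
corr = vc_matrix([-1, -1, 1, 1], [-1, 1, 1, 1])   # correlated columns

print(orth[0][0], orth[0][1])  # variance 0.25, covariance 0
print(corr[0][0])              # variance larger than 0.25
```

With orthogonal columns the off-diagonal elements are zero and the variances are at their minimum; inducing correlation between the columns inflates the variance of each estimate, which is the point made in the paragraph above.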
Despite the fact that discrete choice data are typically analyzed with non-linear models, the question of whether designs generated for linear models are appropriate for such data went largely unexamined for a number of years. Where the problem was examined, the analysis conducted was often inappropriate, leading to the unsurprising conclusion that orthogonal designs are preferred to non-orthogonal designs.
For example, Kuhfeld et al. (1994) compared balanced and unbalanced
orthogonal designs to non-orthogonal designs using the Information matrix
associated with linear models (specifically Equation 6C.1 without sigma)
despite applying the designs to non-linear logit models. It is hardly surprising that they concluded that while “preserving orthogonality at all costs can lead to decreased efficiency,” particularly when a balanced orthogonal design was not available, “non-orthogonal designs will never be more efficient than balanced orthogonal designs, when they exist.”
Such misconceptions continue to this day. To demonstrate, consider the frequent practice of either (i) reporting the following design statistic in SC studies or (ii) using the statistic itself as the objective function to be maximized when generating a SC design (e.g., Kuhfeld et al. 1994; Lusk and Norwood 2005):

D-efficiency = 100 / (S|(X′X)⁻¹|^(1/K)), (6C.2)
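The statistic in Equation (6C.2) can be computed directly. The following Python sketch is ours (with S taken here as the number of design rows, an assumption); it contrasts an orthogonal with a non-orthogonal two-attribute design:

```python
def d_efficiency(X, S=None):
    """D-efficiency = 100 / (S * |(X'X)^-1|^(1/K)) for a two-column design
    (a sketch of Equation (6C.2); S assumed to be the number of rows)."""
    K = 2                                 # two design columns in this sketch
    S = len(X) if S is None else S
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det_xtx = a * d - b * b
    det_inv = 1.0 / det_xtx               # |(X'X)^-1| = 1 / |X'X|
    return 100.0 / (S * det_inv ** (1.0 / K))

X_orth = [[-1, -1], [-1, 1], [1, -1], [1, 1]]   # orthogonal design
X_corr = [[-1, -1], [-1, 1], [1, 1], [1, 1]]    # correlated columns

print(d_efficiency(X_orth))  # 100.0
print(d_efficiency(X_corr))  # below 100: non-orthogonality reduces efficiency
```

Note that this is exactly the linear-model statistic criticized in the text: it depends only on X, not on the (non-linear) choice model to be estimated.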
[Figure omitted: efficiency plotted against the prior parameter βk]
Figure 6C.1 Locally optimal parameter priors and parameter prior misspecification
methods to locate designs that minimized the variances of the ratio of two
parameters, and as such generate designs which can be considered to be
optimal under the assumptions for which they were generated.
Careful examination of the designs that were generated by this group led to
the observation that many of the resulting choice tasks were not realistic from
the perspective of the respondent. For this reason additional requirements
were imposed on the generated designs in which a reasonable coverage of so-
called “boundary values” were sought and obtained (see Fowkes and
Wardman 1988; Fowkes et al. 1993; Toner et al. 1998, 1999; Watson et al.
2000 for further discussion of these designs). Further examination of these
designs by the Leeds group found that they tended to retrieve very specific
choice probabilities, which they referred to as “Magic P’s.” This finding was
later independently rediscovered by other researchers working in other dis-
cipline areas, in particular by Kanninen in 2002.
4 An earlier version of the paper appeared in 1994; however, we prefer the later date as this version remains freely accessible on the lead author’s own webpage.
matrix. Designs which minimize the D-error statistic are therefore called D-
optimal designs.
Keeping in line with earlier empirical work in SC, Bunch et al. (1996)
searched only among orthogonal designs. In doing so, they considered both
simultaneously and sequentially constructed orthogonal designs in the gen-
eration process. A simultaneous orthogonal design is one where the attributes
of the design are orthogonal not only within alternatives, but also between
alternatives. This requires that the design be generated simultaneously for all
alternatives. A sequentially constructed orthogonal design is one where the
attributes of the design may be orthogonal within an alternative, but not
necessarily between alternatives (see Louviere et al. 2000). As such, their
designs also kept the same properties as orthogonal designs, including attri-
bute level balance constraints.
Unlike the approach of the Leeds group, the use of pre-specified fixed attribute levels makes it generally difficult to locate the design matrix that will be optimal. As such, algorithms are required which search over the
design space by re-arranging the attribute levels of the design and testing the
efficiency measure after each change. Only if all possible designs are tested can
one conclude that the design is optimal. For designs with large design dimen-
sions, this is not always possible, and for this reason such designs are more
correctly referred to as efficient designs. Given that Bunch et al. (1996) con-
sidered designs which were orthogonal, only a sub-set of all possible designs was
examined. For this reason, the class of designs generated by Bunch et al. are
more correctly referred to as locally optimal D-efficient designs, as opposed to
D-optimal designs. Although algorithms for locating SC designs are important,
and formed a central part of the Bunch et al. paper, for reasons of space we do
not discuss this aspect of the design generation process here (see Kessels et al.
2006 for an excellent discussion of design algorithms).
(a)                         (b)
A1  A2  A3 | A1  A2  A3     A1  A2  A3 | A1  A2  A3
10   3   2 | 10   3   2     10   3   2 | 10   3   2
20  –5   4 | 30  –5   6     20  –5   4 | 20  –5   4
30   3   6 | 20  –5   4     30   3   6 | 30   3   4
10  –5   6 | 20   3   4     30  –5   6 | 10   3   4
30  –5   2 | 10  –5   2     30  –5   2 | 10  –5   2
20   3   4 | 30   3   6     20  –5   6 | 20   3   6
Figure 6C.2 Different definitions of attribute level balance
[Figure omitted: efficiency plotted against the prior parameter βk over a Bayesian prior range μL to μU]
manner. Whereas other researchers derive the second derivatives with respect to the parameters, such that:

ΩN = IN⁻¹, with IN = −EN(∂²log L/∂β∂β′), (A6.3.3)

where EN(.) is used to express the large sample population mean, Street and Burgess calculate the second derivatives with respect to total utility V, such that:

ΩN = IN⁻¹, with IN = −EN(∂²log L/∂V∂V′). (A6.3.4)
Situation | Design (attribute levels) | Situation | Coded design
 1 | 0 0 2 1 |  1 | −1  1 −1  1  1  1
 2 | 1 1 0 0 |  2 |  0 −2  1 −1  1 −1
 3 | 2 1 1 0 |  3 |  1  1  1  0 −2 −1
 4 | 2 0 1 1 |  4 |  1  1 −1  0 −2  1
 5 | 0 0 2 1 |  5 | −1  1 −1  1  1  1
 6 | 1 1 0 0 |  6 |  0 −2  1 −1  1 −1
 7 | 1 0 0 1 |  7 |  0 −2 −1 −1  1  1
 8 | 2 0 1 1 |  8 |  1  1 −1  0 −2  1
 9 | 0 1 2 0 |  9 | −1  1  1  1  1 −1
10 | 1 0 0 1 | 10 |  0 −2 −1 −1  1  1
11 | 2 1 1 0 | 11 |  1  1  1  0 −2 −1
12 | 0 1 2 0 | 12 | −1  1  1  1  1 −1
Sums of squares of coded columns: 8 24 12 8 24 12
The major contribution of Street and Burgess, however, has been to derive a
method that can be used to locate the optimal design under the above set of
assumptions without having to resort to complex (iterative) algorithms. This
involves first determining the maximum value of the determinant of the AVC
matrix. To do this, they first calculate the value Mk, which represents the
largest number of pairs of alternatives that can assume different levels for each
attribute, k, in a choice situation. This value for each attribute k, can be
established using Equation (6C.5). Note that the particular formula to adopt
to calculate Mk is a function of the number of alternatives in the design, J, and
the number of levels of attribute k, Lk:
Mk = (J² − 1)/4,                    if Lk = 2 and J is odd;
     J²/4,                          if Lk = 2 and J is even;
     (J² − (Lk x² + 2xy + y))/2,    if 2 ≤ Lk ≤ J;          (6C.5)
     J(J − 1)/2,                    if Lk ≥ J,
where x and y are non-negative integers that satisfy the equation J = Lkx + y for 0 ≤ y ≤ Lk. For the case where an attribute has levels 2 ≤ Lk ≤ J, the analyst will need to fit integer values for y between zero and Lk to obtain values of x that satisfy this equation. Any value of y that results in an integer value of x represents a possible candidate for the design.
Once the value of Mk has been established for each attribute, the maximum
value of the determinant of C is calculated as:
det(Cmax) = ∏(k=1..K) [2Mk / (J²(Lk − 1)∏(i≠k) Li)]^(Lk − 1) × 100. (6C.6)
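As a sketch of Equation (6C.5), the following Python function (ours, not Ngene) computes Mk for each of the cases; the values for small J and Lk can be checked by enumerating the pairs of alternatives by hand:

```python
def m_k(J, Lk):
    """Largest number of pairs of alternatives that can differ on attribute k
    in a choice situation (a sketch of Equation (6C.5))."""
    if Lk == 2:
        # two-level attribute: formula depends on whether J is odd or even
        return (J * J - 1) // 4 if J % 2 == 1 else (J * J) // 4
    if Lk >= J:
        # at least as many levels as alternatives: every pair can differ
        return J * (J - 1) // 2
    # 2 < Lk < J: integers x, y with J = Lk * x + y and 0 <= y < Lk
    x, y = divmod(J, Lk)
    return (J * J - (Lk * x * x + 2 * x * y + y)) // 2

print(m_k(3, 2), m_k(4, 2), m_k(5, 3))
```

Once Mk is in hand for every attribute, the product in Equation (6C.6) can be evaluated directly to obtain det(Cmax).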
Number of         Number of unique choice     Optimal choice-percentage split
attributes (K)    situations in the design    for two-alternative model
2                 2                           0.82 / 0.18
3                 4                           0.77 / 0.23
4                 4                           0.74 / 0.26
5                 8                           0.72 / 0.28
6                 8                           0.70 / 0.30
7                 8                           0.68 / 0.32
8                 8                           0.67 / 0.33
[Figure omitted: asymptotic standard error seN(X, β) declining with sample size N (0–50) for a given design XI versus a more efficient design XII]
(a) investing in larger samples; (b) investing in better design
Figure 6C.4 Comparison of investing in larger sample sizes versus more efficient designs
studies. Figure 6C.4b reveals the impact for a given set of population para-
meters of investing in a better design XII (i.e., more efficient design). Typically,
larger decreases in the standard error can be achieved by investing in finding a
more efficient design than by investing in a larger sample. Note that the
relationships shown in Figure 6C.4 are an inescapable property of the logit
model; however, the rate of decline represented in the curve will be specific to
the design.
Given the above, Bliemer and Rose were able to use this relationship to provide insight into the sample size requirements for SC experiments. Since the square roots of the diagonal elements of the AVC matrix represent the asymptotic standard errors of the parameter estimates, and the asymptotic t-ratios are simply the parameter estimates divided by the asymptotic standard errors (Equation (6C.7)), it is possible to determine the likely asymptotic t-ratios for a design assuming a set of prior parameter estimates:
tk = βk / √(σk²/N). (6C.7)

Rearranging Equation (6C.7) for the sample size N at which parameter k attains a given t-ratio yields:

Nk = tk²σk² / βk². (6C.8)
Equation (6C.8) allows for a determination of the sample size required for
each parameter to achieve a minimum asymptotic t-ratio, assuming a set of
non-zero prior parameter values. To use these equations, the analyst might use
the prior parameters used in generating the design, or test the sample size
requirements under various prior parameter mis-specifications. Once the
sample size is determined for all attributes, the analyst can then select the
sample size that will be expected to result in all asymptotic t-ratios taking a
minimum pre-specified value (e.g., 1.96). Such designs are called S-efficient
designs. Bliemer and Rose noted, however, that sample sizes calculated using
this method should be considered as an absolute theoretical minimum. The
method assumes certain asymptotic properties that may not hold in small
samples. Further, the method does not consider the stability of the parameter
estimates, nor at what sample size parameter stability is likely to be achieved.
Comparing sample sizes computed using Equation (6C.8) across parameters may also indicate which parameters will be more difficult to estimate (at a given level of significance) than others.
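As a sketch of how Equation (6C.8) is used in practice, the following Python fragment (with purely hypothetical prior parameters and variances, chosen for illustration only) computes the theoretical minimum sample size per parameter and then takes the maximum across parameters:

```python
from math import ceil

def min_sample_size(beta, var, t_min=1.96):
    """Theoretical minimum N for parameter beta to reach |t| >= t_min
    (Equation (6C.8): N_k = t_min^2 * sigma_k^2 / beta_k^2), rounded up."""
    return ceil(t_min ** 2 * var / beta ** 2)

# Hypothetical priors: parameter name -> (beta_k, sigma_k^2)
priors = {'cost': (-0.05, 0.004), 'time': (-0.10, 0.020), 'transfer': (-0.40, 0.250)}

sizes = {k: min_sample_size(b, v) for k, (b, v) in priors.items()}
print(sizes)
print(max(sizes.values()))  # sample size at which every parameter reaches t >= 1.96
```

The maximum over all parameters is the S-efficiency figure: the smallest sample at which every asymptotic t-ratio is expected to clear the pre-specified threshold. As the text stresses, this is an absolute theoretical minimum, not a recommended sample size.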
Rose and Bliemer (2006) next extended the theory of SC designs to include
covariates in the utility functions and hence also in the AVC matrices of the
designs. Assuming an MNL model with non-zero local priors and combina-
tions of alternative-specific and generic parameters, they were able to demon-
strate a method capable of jointly minimizing the elements of the AVC matrix
while determining the optimal number of respondents to sample from differ-
ent segments. This was accomplished by determining optimal weights to
apply to different segments of the Fisher Information matrix based on how
many respondents belong to each segment.
Rose et al. (2008) next looked at SC studies requiring pivot (or customized)
designs where the levels of the design alternatives are represented as percentage
differences from some pre-specified respondent-specific status quo alternative,
rather than as specific pre-defined levels chosen by the analyst. Again, assuming
an MNL model specification with non-zero local priors and combinations of
alternative-specific and generic parameters, they explored a number of design
procedures capable of optimizing designs at the individual respondent level.
Meanwhile, Ferrini and Scarpa (2007), writing in the environmental eco-
nomics literature, extended the optimal design theory to panel error compo-
nent models assuming non-zero local priors and fixed attribute levels. In
considering the panel version of the model, this paper represented a signifi-
cant leap forward in the theory of SC experimental design, as it was the first to
consider the issue of within respondent preference correlations which theo-
retically exist over repeated choice tasks. Unlike earlier papers, however,
Ferrini and Scarpa used simulation to derive the AVC matrix of the model
rather than more common analytical derivations.
has been collected. To address this specific issue, Rose et al. (2009) advocated
the use of a model averaging approach, where different weights could be
applied to the Fisher Information matrices obtained assuming different
model specifications given a common design. Included in the model averaging
process were MNL, cross-sectional error components and MMNL, and panel
error components and MMNL model specifications.
More recently, Rose et al. (2011) sought to extend the earlier research originating from both the Leeds group and Kanninen to a wider range of SC problems. Unfortunately, they found that it was only possible to derive the optimal choice probabilities for designs generated under the assumption of an MNL model specification involving two alternatives, non-zero local prior parameters, and generic parameter estimates. To overcome this limitation, they
demonstrated how the Nelder–Mead algorithm could be used to locate the
optimal choice probabilities for any model type with any number of alter-
natives and any type of prior parameters, including non-zero Bayesian priors.
In contrast to the case with two alternatives and all generic parameters, fixed
Magic Ps do not seem to exist for this more general case.
minimize the standard errors obtained from the design, they do remain
consistent with the general theory of experimental design.
One acknowledged limitation of the current chapter lies in the way that we
have attempted to present the outputs of the various research groups in
chronological order of publication. Unfortunately, such an approach need
not reflect the true history of the research efforts of those involved in this field.
The issue lies in the variable length of time that it takes for academic research
to be published, which admittedly may be longer in some disciplines than
others. Further, we have attempted where possible to reference formally
published work over working papers, which may distort the true chronology
of events. For example, the work by Rose et al. (2009) advocating the use of a
model averaging process in generating designs when the final model type is
unknown also employs designs generated for panel MMNL model specifica-
tions, prior to publication of the paper by Bliemer and Rose (2010a) specifi-
cally dealing with panel MMNL model designs. This is because the 2010a
paper has as its origins a 2008 conference paper originally written in 2007,
whereas the 2009 paper remains a conference paper. Likewise, the Bunch et al.
(1996) paper was originally written in 1994; however, the final working paper
that is now available dates from 1996. As such, we ask that care be taken in
interpreting the exact timeline of events.
Chapter 7
Statistical inference
7.1 Introduction
This chapter will discuss some issues in statistical inference in the analysis of
choice models. We are concerned with two kinds of computations, hypothesis
tests and variance estimation. To illustrate the analyses, we will work through
an example based on a revealed preference (RP) data set. In this chapter, we
present syntax and output generated using Nlogit to demonstrate the concepts
covered. The syntax and output is, for the more familiar reader, largely self-
explanatory; however, for the less familiar reader, we refer you to Chapter 11,
which you may wish to read before going further. The multinomial logit
model for the study is shown in the following Nlogit set up which gives the
utility functions for four travel modes: bus, train, busway and car, respectively:
;Model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = TC*TC + PC*PC + invtcar*invt + egtcar*egt
The attributes are act = access time, invc = in vehicle cost, invt2 = in vehicle
time, egt = egress time, trnf = transfer wait time, tc = toll cost, pc = parking cost,
and invt = in vehicle time for car. Where a particular example uses a method
given in more detail in later chapters, we will provide a cross-reference.
Hypothesis tests are carried out using a variety of methods appropriate for the
situation. The most common tests compare nested models. These are cases in
which one model is obtained by a restriction on the parameters of a (larger)
Note that the test requires that the restricted model have a smaller number of
free parameters. The multinomial logit model, for example, is a special case of
the nested logit model where all of the inclusive value parameters are equal to
one. To illustrate the likelihood-ratio test (LRT), we will test for a nesting
structure in our choice model (see Chapter 14 for nested logit models). Full
results and the set up appear below. The LL for the nested logit is −199.25552.
The LL for the MNL is −200.40253. Twice the difference is the estimated Chi-
squared statistic, which is only 2.294. With two degrees of freedom, the critical
value (95 percent) is 5.99. So the hypothesis of the MNL model is not rejected
by this test. The LL function for the nested logit model is not significantly
larger than that for the MNL model:
? LR Test of MNL vs. nested logit
? This first model is a nested logit model.
? Note the tree definition after the list of choices (see Chapter 14 for
details)
NLOGIT
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;tree=bwtn(bw,tn),bscar(bs,cr)
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
? Capture the unrestricted log likelihood
CALC ; llnested = logl $
? This second model is a simple MNL model – note no tree definition.
NLOGIT
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt $
? Capture the restricted log likelihood, then compute the statistic.
? The Ctb(..) function in CALC reports the 95% critical value for the
? chi squared with two degrees of freedom.
CALC ; loglmnl = logl $
CALC ; List ; LRTest = 2*(llnested-loglmnl) ; Ctb(0.95,2) $
The estimation results for this test are shown below. For convenience, some of
the computer-generated output is omitted:
-----------------------------------------------------------------------------------------------------
FIML Nested Multinomial Logit Model
Dependent variable CHOICE
Log likelihood function -199.25552
Response data are given as ind. choices
Estimation based on N = 197, K = 14
Number of obs.= 197, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
|Attributes in the Utility Functions (beta)
BS| −1.66524** .80355 −2.07 .0382 −3.24017 −.09030
ACTPT| −.07931*** .02623 −3.02 .0025 −.13071 −.02791
INVCPT| −.06125 .05594 −1.09 .2735 −.17089 .04839
INVTPT| −.01362 .00936 −1.45 .1457 −.03197 .00473
EGTPT| −.04509** .02235 −2.02 .0437 −.08890 −.00128
TRPT| −1.40080*** .46030 −3.04 .0023 −2.30297 −.49863
TN| −3.90899 2.80641 −1.39 .1637 −9.40946 1.59148
BW| −4.26044 2.91116 −1.46 .1433 −9.96621 1.44533
INVTCAR| −.04768*** .01232 −3.87 .0001 −.07183 −.02354
TC| −.11493 .08296 −1.39 .1659 −.27752 .04766
PC| −.01771 .01906 −.93 .3527 −.05507 .01965
EGTCAR| −.05896* .03316 −1.78 .0754 −.12395 .00603
|IV parameters, tau(b|l,r),sigma(l|r),phi(r)
BWTN| .55619** .26662 2.09 .0370 .03363 1.07874
BSCAR| .99522*** .24722 4.03 .0001 .51069 1.47976
-----------+----------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -200.40253
Estimation based on N = 197, K = 12
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
BS| −1.87740** .74583 −2.52 .0118 −3.33920 −.41560
ACTPT| −.06036*** .01844 −3.27 .0011 −.09650 −.02423
INVCPT| −.08571* .04963 −1.73 .0842 −.18299 .01157
INVTPT| −.01106 .00822 −1.35 .1782 −.02716 .00504
EGTPT| −.04117** .02042 −2.02 .0438 −.08119 −.00114
TRPT| −1.15503*** .39881 −2.90 .0038 −1.93668 −.37338
TN| −1.67343** .73700 −2.27 .0232 −3.11791 −.22894
BW| −1.87376** .73750 −2.54 .0111 −3.31924 −.42828
INVTCAR| −.04963*** .01166 −4.26 .0000 −.07249 −.02677
TC| −.11063 .08471 −1.31 .1916 −.27666 .05540
PC| −.01789 .01796 −1.00 .3192 −.05310 .01731
EGTCAR| −.05806* .03309 −1.75 .0793 −.12291 .00679
-----------+----------------------------------------------------------------------------------------
[CALC] LRTEST = 2.2940253
[CALC] = 5.9914645
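The arithmetic behind the two [CALC] results can be reproduced outside Nlogit. Below is a minimal Python sketch (illustrative only, not part of the Nlogit workflow) using the two LL values reported above:

```python
from scipy.stats import chi2

ll_nested = -199.25552   # unrestricted log likelihood (nested logit)
ll_mnl = -200.40253      # restricted log likelihood (MNL)

lr_stat = 2 * (ll_nested - ll_mnl)   # likelihood-ratio statistic
critical = chi2.ppf(0.95, df=2)      # 95% critical value, 2 restrictions

# lr_stat is about 2.294, below the critical value of about 5.991,
# so the MNL restriction is not rejected
reject = lr_stat > critical
```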
H0: Rβ − q = 0. (7.2)
In some cases, the Wald test is built into software such as NLOGIT. In others,
you would use matrix algebra to obtain the result. NLOGIT contains a WALD
command that allows you to specify the constraints (they may be non-linear
as well) and that does the matrix algebra for you. In the example below, we use
WALD, and then show how to use matrix algebra to obtain the same result.
The command below specifies a nested logit model as per Chapter 14. The
inclusive value (IV) parameters are the 13th and 14th in the estimated
parameter vector. In Section 7.2.1, we used a likelihood ratio to test the null
hypothesis that both parameters equal one. We now test the hypothesis using
WALD, and using matrix algebra:
Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
;tree=bwtn(bw,tn),bscar(bs,cr)$
? Wald Test
WALD ; parameters = b ; labels = 12_c,ivbwtn,ivbscar
? in the above line, there are 12 parameters plus 2 IV parameters
; covariance = varb
; fn1 = ivbwtn-1 ; fn2 = ivbscar - 1 $
? Same computation using matrix algebra
MATRIX ; R = [0,0,0,0,0,0,0,0,0,0,0,0,1,0 / 0,0,0,0,0,0,0,0,0,0,0,0,0,1] ; q = [1/1] $
MATRIX ; m = R*b - q ; vm = R*Varb*R' ; list ; w = m'<vm>m $
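The same Wald computation can be sketched in Python with numpy. The values of b and varb below are stand-ins (only the two IV estimates and their standard errors are filled in, with a diagonal covariance matrix), so the resulting statistic differs from the 3.01209 produced with the full covariance matrix:

```python
import numpy as np

# Stand-ins for illustration: in a real application b and varb would be the
# estimated parameter vector and covariance matrix from the nested logit.
b = np.zeros(14)
b[12], b[13] = 0.55619, 0.99522                       # IV parameters (13th, 14th)
varb = np.eye(14) * 0.01                              # hypothetical filler
varb[12, 12], varb[13, 13] = 0.26662**2, 0.24722**2   # squared standard errors

R = np.zeros((2, 14))
R[0, 12] = R[1, 13] = 1.0     # R picks out the two IV parameters
q = np.ones(2)                # hypothesized values (both equal to one)

m = R @ b - q                           # discrepancy vector R*b - q
Vm = R @ varb @ R.T                     # its covariance matrix
W = float(m @ np.linalg.solve(Vm, m))   # Wald statistic m' Vm^{-1} m
```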
------------------------------------------------------------------------------------------------------
FIML Nested Multinomial Logit Model
Dependent variable CHOICE
-----------+------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+------------------------------------------------------------------------------------------
|Attributes in the Utility Functions (beta)
BS| −1.66524** .80355 −2.07 .0382 −3.24017 −.09030
ACTPT| −.07931*** .02623 −3.02 .0025 −.13071 −.02791
INVCPT| −.06125 .05594 −1.09 .2735 −.17089 .04839
INVTPT| −.01362 .00936 −1.45 .1457 −.03197 .00473
EGTPT| −.04509** .02235 −2.02 .0437 −.08890 −.00128
TRPT| −1.40080*** .46030 −3.04 .0023 −2.30297 −.49863
TN| −3.90899 2.80641 −1.39 .1637 −9.40946 1.59148
BW| −4.26044 2.91116 −1.46 .1433 −9.96621 1.44533
INVTCAR| −.04768*** .01232 −3.87 .0001 −.07183 −.02354
TC| −.11493 .08296 −1.39 .1659 −.27752 .04766
PC| −.01771 .01906 −.93 .3527 −.05507 .01965
EGTCAR| −.05896* .03316 −1.78 .0754 −.12395 .00603
|IV parameters, tau(b|l,r),sigma(l|r),phi(r)
BWTN| .55619** .26662 2.09 .0370 .03363 1.07874
BSCAR| .99522*** .24722 4.03 .0001 .51069 1.47976
-----------+------------------------------------------------------------------------------------------
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = 3.01209
Prob. from Chi-squared[ 2] = .22179
Functions are computed at means of variables
-----------+------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+------------------------------------------------------------------------------------------
Fncn(1)| −.44381* .26662 −1.66 .0960 −.96637 .07874
Fncn(2)| −.00478 .24722 −.02 .9846 −.48931 .47976
-----------+------------------------------------------------------------------------------------------
W| 1
-----------+--------------
1| 3.01209
The Wald statistic, 3.01209, appears at the top of the results for the WALD
command. As before, the critical value with two degrees of freedom is 5.99, so
the Wald test does not reject the hypothesis of the MNL model either. The
reported P value leads to the same conclusion: since we are testing at the α = 5
percent significance level, the P value of 0.22179, being larger than α, indicates
that the null hypothesis should not be rejected. The last
result shows that the same value for the Wald statistic can be computed using
matrix algebra.
For the two models, labeled “A” and “B,” the LL functions are the sums of the
individual contributions:
log L_j = Σ_{i=1}^{N} log L_{i|j},  j = A, B. (7.4)
Thus, V = √N v̄/s_v is the standard t statistic used to test the null hypothesis
that E[v_i] = 0, where s_v is the sample standard deviation and v̄ is the average
of the LR statistics v_i across the sample. Under the assumptions needed to justify use of the
statistic, the large sample distribution of V is standard normal. Under the
hypothesis of model A, V will be positive. If sufficiently large, i.e., greater than
1.96, the test favors model A. If V is sufficiently negative, i.e., less than −1.96,
the test favors model B. The range between −1.96 and +1.96 is inconclusive (at
the 5 percent significance level).
Consider two competing nested logit models:
?;tree=bwtn(bw,tn),bscar(bs,cr)
?;tree=Bus(bs,bw),trncar(tn,cr)
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
The two proposed models involve the same parameters, though a different
structure for the tree. The results are shown below. The test results super-
ficially favor the second tree structure; V is negative. But the value of the test
statistic, −0.391, is squarely in the inconclusive region. We note that the LL for
the first model is slightly larger. But this is not definitive when the models are
non-nested; nor does it imply the direction of the outcome of the Vuong test.
Results are shown only for the test itself. Estimated models are omitted for
convenience:
NLOGIT
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
;tree=bwtn(bw,tn),bscar(bs,cr) $
CREATE ; llmdl1 = logl_obs $
NLOGIT
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
;tree=Bus(bs,bw),trncar(tn,cr) $
CREATE ; LLmdl2 = logl_obs$
CREATE ; dll = llmdl1 - llmdl2 $
CALC ; for[choice=1];list;dbar = xbr(dll)$
CALC ; for[choice=1];list;sd=sdv(dll)$
CALC ; list ; v = sqr(197)*dbar/sd$
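The Vuong arithmetic in the CALC commands above is simple enough to sketch in Python; the per-observation log likelihood contributions below are simulated stand-ins for the logl_obs values that Nlogit saves:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 197
# Hypothetical stand-ins for the per-observation log likelihood
# contributions (logl_obs) from the two competing models.
ll_model1 = rng.normal(-1.0, 0.6, size=N)
ll_model2 = ll_model1 + rng.normal(0.0, 0.1, size=N)

d = ll_model1 - ll_model2     # the dll variable in the commands above
dbar = d.mean()
sd = d.std(ddof=1)
V = np.sqrt(N) * dbar / sd    # Vuong statistic

# V > 1.96 favors model 1, V < -1.96 favors model 2;
# anything in between is inconclusive at the 5 percent level.
```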
In the second example below, we consider the RUM and RRM models as
competing models. The Vuong test, however, is inconclusive once again. This
is the set-up for testing RUM versus RRM. The evidence vaguely favors RUM
(the test statistic is positive, which favors the first model), but the test is
inconclusive: the value is only 0.155:
NLOGIT
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr ?/0.2,0.3,0.1,0.4
;model:
u(bs) = bs + actpt(0)*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt $
CREATE ; llmnl=logl_obs $
RRLOGIT
... same model specification $
If an alternative and the individuals who chose that alternative were removed
from the sample, the remaining three choices with the associated data should
be sufficient to estimate these same parameters. That is an implication of the
IIA property of the MNL model.
It will generally not be the case in other models, such as the multinomial
probit model or any model that relaxes the IIA condition. The strategy, then,
will be to estimate the model parameters under these two scenarios and use a
Wald statistic to measure the difference. A remaining detail is to define how
the covariance matrix for the difference is to be computed. Hausman’s (1978)
famous result (see Hausman and McFadden 1984), adapted for this application,
is that the covariance matrix of the difference equals the difference of the
covariance matrices: Est.Var[b_subset − b_full] = Est.Var[b_subset] − Est.Var[b_full].
The following carries out this test. The model command is modified directly to
remove the second alternative (tn). The ;IAS = tn specification removes the
corresponding observations from the sample. (These are flagged as the 46 “bad
observations.”) In computing a Hausman test, it is a good idea to check the
definiteness of the covariance
matrix. It is not guaranteed to be positive definite. When it is not, the test
statistic is not valid. The MATRIX commands below list the characteristic
roots of the matrix. As they are all positive, the test can proceed. The test
statistic is Chi-squared with five degrees of freedom. The value is 16.11. The 95
percent critical value for the Chi-squared variable with five degrees of freedom
is 11.07, so on this basis, the IIA assumption is rejected. This suggests that a
nested logit structure might be tested since (as shown in Chapters 4 and 14), it
permits relaxation of IIA between branches:
NLOGIT
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt $
MATRIX ; b1 = b(2:6) ; v1 = varb(2:6,2:6) $
? b(2:6) are the 5 generic parameters (noting 1=bs): actpt, invcpt, invtpt, egtpt and trpt
NLOGIT
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
When the entire model is generic, Nlogit can carry out the test automatically
(i.e., you do not have to include the matrix commands to identify the para-
meters of interest). The following shows, for example, a model in which only
the public transport alternatives are included in the choice set, and the
constant terms are removed from the utility functions. The “?” removes the
line that defines the utility function for car from the command. In the first
model, ;ias=cr removes the drivers from the sample. In the second, ;ias=tn,cr
removes both drivers and those who choose the train. Since the model is now
completely generic across the alternatives, the Hausman statistic can be
computed by the program. The program reports a Chi-squared statistic of
6.8644 with five degrees of freedom. The critical value is 11.07, as before. This
implies that if we restrict attention to those who choose the public modes, the
IIA assumption appears to be valid. Specifically, this suggests that an MNL
model is acceptable for the choice among the public transport alternatives
bs and bw:
NLOGIT
...
;model:
u(bs) = actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf
?u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
;ias=cr $
NLOGIT
...
;model:
u(bs) = actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf
?u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
;ias=tn,cr $
-----------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
... results omitted ...
Number of obs.= 197, skipped 117 obs
Hausman test for IIA. Excluded choices are
TN CR
ChiSqrd[ 5] = 6.8644, Pr(C>c) = .2309
-----------+----------------------------------------------------------------------------------------
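The mechanics of the Hausman test, including the eigenvalue check on the differenced covariance matrix, can be sketched in Python with hypothetical estimates (in a real application b and varb would come from the two Nlogit runs):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5                                             # five generic parameters
b_full = rng.normal(size=K)                       # hypothetical full-sample estimates
b_sub = b_full + rng.normal(scale=0.05, size=K)   # hypothetical subset estimates
V_full = 0.010 * np.eye(K)                        # hypothetical covariances; the
V_sub = 0.025 * np.eye(K)                         # subset estimator is less efficient

d = b_sub - b_full
Vd = V_sub - V_full                         # Hausman: covariance of the difference
roots = np.linalg.eigvalsh(Vd)              # characteristic roots must be positive
ok = bool((roots > 0).all())                # otherwise the statistic is not valid
H_stat = float(d @ np.linalg.solve(Vd, d))  # Chi-squared with K dof under H0
```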
where P(β,xij) is the multinomial logit probability for outcome j and dij = 1 if
individual i chooses alternative j and zero otherwise. The maximum likelihood
estimator of β is denoted b. The first derivatives of the LL function with
respect to β are:
g = ∂log L(β)/∂β = Σ_{i=1}^{N} Σ_{j=1}^{J} d_ij(x_ij − x̄_i) = Σ_{i=1}^{N} g_i, (7.8)

where x̄_i = Σ_{j=1}^{J} P(β, x_ij) x_ij. The second derivatives are:
H = ∂²log L(β)/∂β∂β′ = −Σ_{i=1}^{N} Σ_{j=1}^{J} P_ij(x_ij − x̄_i)(x_ij − x̄_i)′ = Σ_{i=1}^{N} H_i. (7.9)
This matrix forms the basis of the standard errors reported with the estimates
in Section 7.2.1. The theory that justifies the usual estimator above also implies
an alternative estimator, the BHHH estimator (see Chapter 5):
Est.Var[b]_BHHH = B^{−1}, where B = Σ_{i=1}^{N} g_i g_i′. (7.11)
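Equations (7.8), (7.9), and (7.11) can be evaluated directly on a small simulated data set. The Python sketch below uses an assumed β and simulated attributes (in practice the quantities would be evaluated at the MLE b):

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 200, 3, 2                    # individuals, alternatives, attributes
X = rng.normal(size=(N, J, K))         # attributes x_ij (simulated)
beta = np.array([0.5, -1.0])           # assumed parameter vector

U = X @ beta
P = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)           # MNL probabilities
choice = np.array([rng.choice(J, p=P[i]) for i in range(N)])   # chosen j per i

xbar = np.einsum('ij,ijk->ik', P, X)         # xbar_i = sum_j P_ij x_ij
g_i = X[np.arange(N), choice] - xbar         # per-observation gradients, eq. (7.8)
dev = X - xbar[:, None, :]
H = -np.einsum('ij,ijk,ijl->kl', P, dev, dev)   # Hessian, eq. (7.9)

A = np.linalg.inv(-H)       # conventional variance estimator
B = g_i.T @ g_i             # BHHH outer-product matrix, eq. (7.11)
bhhh = np.linalg.inv(B)     # BHHH variance estimator
robust = A @ B @ A          # "sandwich" (robust) estimator
```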
In order for the estimator above to be appropriate, it must be the case that
the parameter estimator, itself, remains consistent even with the failure of
the model assumptions. Thus, the ordinary least squares (OLS) estimator of
the linear regression model remains consistent (and unbiased) whether or
not there is heteroskedasticity. In the settings of the choice models dis-
cussed in this book, it is difficult (we are tempted to suggest impossible,
but there are exceptions in practice) to devise failures of the model assumptions
under which the MLE would still be consistent. As such, use of the so-called
robust estimator in this setting seems, in the main, unjustified, but it is useful
for analysts to be aware of it, since the authors have occasionally found that
the inclusion of ;robust resolves problems with standard errors. It is worth
noting that under the full set of model assumptions,
without violations, the robust estimator estimates the same matrix as the
conventional estimator. That is, it is generally benign, even if mostly
redundant.
A specific exception to this observation might apply to models based on
stated choice (SC) experiments, in which individuals answer multiple choice
scenarios. In this case, the data for an individual consist of a “cluster” of
responses that must be correlated since the same individual is answering the
questions. Consider, then, estimating a simple MNL model in an SC experi-
ment in which each individual provides T choice responses. Referring to the
earlier definitions, the “cluster corrected” estimator for this case would be
constructed as follows:
H = ∂²log L(β)/∂β∂β′ = −Σ_{i=1}^{N} Σ_{t=1}^{T} Σ_{j=1}^{J} P_ijt(x_ijt − x̄_it)(x_ijt − x̄_it)′ (7.13)
C = Σ_{i=1}^{N} (Σ_{t=1}^{T} g_it)(Σ_{t=1}^{T} g_it)′. (7.14)
As an aside, we would encourage analysts to use the ;robust command to see
whether the standard errors change in a noticeable way. If they do, this might
signal problems with the model specification.
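The cluster correction in (7.14) amounts to summing the scenario-level gradients within each respondent before forming the outer product. A Python sketch with random stand-in gradients and a stand-in inverse Hessian:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 50, 4, 3                  # respondents, scenarios each, parameters
g_it = rng.normal(size=(N, T, K))   # stand-in per-scenario gradients g_it
H_inv = 0.1 * np.eye(K)             # stand-in for the inverse (negative) Hessian

g_clustered = g_it.sum(axis=1)      # sum over t within each respondent i
C = g_clustered.T @ g_clustered     # C as in equation (7.14)
cluster_var = H_inv @ C @ H_inv     # cluster-corrected sandwich estimator
```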
The following demonstrates the technique for our multinomial logit model
used in earlier examples. Note that in the execute command that generates the
bootstrap samples, we have accounted for the fact that in this sampling setting
an “observation” consists of “cset” rows of data. We use the North West transport
data set (that is also used in Chapters 11 and 13–15) to illustrate the way in
which Nlogit obtains “revised” standard errors for each parameter estimate,
together with confidence intervals:
LOAD;file="C:\Projects\NWTptStudy_03\NWTModels\ACA Ch 15 ML_RPL models\nw15jul03-3limdep.SAV.lpj"$
Project file contained 27180 observations.
create
;if(employ=1)ftime=1
;if(whopay=1)youpay=1$
sample;all$
reject;dremove=1$ Bad data
reject;altij=-999$
reject;ttype#1$ work =1
proc$
Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*waitt+
acwt*acctim + accbusf*accbusf+eggT*egresst
+ ptinc*pinc + ptgend*gender + NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf
+ ptinc*pinc + ptgend*gender + NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT + accTb*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT + accTb*acctim
+ eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT + accTb*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost + CReggT*egresst$
endproc $
execute ; n=100 ; pds = cset ; bootstrap = b $
Completed 100 bootstrap iterations.
-------------------------------------------------------------------------------------------------------
Results of bootstrap estimation of model.
Model has been reestimated 100 times.
Coefficients shown below are the original
model estimates based on the full sample.
Bootstrap samples have 1840 observations.
Estimated parameter vector is B .
Estimated variance matrix saved as VARB. See below.
-----------+--------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
BootStrp| Coefficient Error z |z|>Z* Interval
-----------+--------------------------------------------------------------------------------------------
B001| 2.69464*** .34831 7.74 .0000 2.01197 3.37731
B002| -.18921*** .01565 -12.09 .0000 -.21989 -.15854
B003| -.04940*** .00216 -22.89 .0000 -.05363 -.04517
B004| -.05489*** .00580 -9.46 .0000 -.06626 -.04352
B005| -.09962*** .03819 -2.61 .0091 -.17447 -.02477
B006| -.01157** .00467 -2.48 .0132 -.02072 -.00242
B007| -.00757*** .00164 -4.61 .0000 -.01079 -.00436
B008| 1.34212*** .19793 6.78 .0000 .95419 1.73005
B009| -.94667*** .35724 -2.65 .0081 -1.64685 -.24649
B010| 2.10793*** .32810 6.42 .0000 1.46486 2.75100
B011| -.94474** .43006 -2.20 .0280 -1.78765 -.10184
B012| 1.41575*** .36756 3.85 .0001 .69534 2.13617
B013| -.07612*** .02150 -3.54 .0004 -.11825 -.03398
B014| -.06162*** .00754 -8.17 .0000 -.07641 -.04683
B015| 1.86891*** .30646 6.10 .0000 1.26825 2.46956
B016| 1.76517*** .33121 5.33 .0000 1.11601 2.41433
B017| -.11424*** .02791 -4.09 .0000 -.16894 -.05954
B018| -.03298*** .00401 -8.22 .0000 -.04084 -.02512
B019| -.01513* .00807 -1.88 .0606 -.03094 .00067
B020| -.05190*** .01207 -4.30 .0000 -.07555 -.02825
The estimates of each of the 20 parameters for each of the 100 repetitions are
shown below (available as an Nlogit output matrix called MatrixBootstrp):
Choice modelers should know that the estimated parameters are not indivi-
dually useful. Because the scale of the error terms is not identified, the scale of
the individual parameters is also not identified (see Chapter 4). Therefore we
typically look at ratios of the parameters (usually identifying willingness to
pay (WTP) in the model), or use the parameters to carry out demand
simulations. Even though these are the quantities of interest for policy analy-
sis, it is very rare that any confidence region is given.
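The bootstrap replications provide exactly such a confidence region for a ratio. The Python sketch below uses hypothetical draws of a time and a cost coefficient (standing in for two columns of the replication matrix) to form a percentile-bootstrap interval for their ratio:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical bootstrap draws of (time, cost) coefficients, standing in for
# two columns of the 100-replication matrix produced by the bootstrap.
draws = rng.multivariate_normal(
    mean=[-0.0494, -0.1892],
    cov=[[0.00216**2, 0.0], [0.0, 0.01565**2]],
    size=100)

wtp = draws[:, 0] / draws[:, 1]            # ratio of coefficients (WTP-style)
lo, hi = np.percentile(wtp, [2.5, 97.5])   # percentile-bootstrap 95% interval
```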
[Var(B): the 20 × 20 bootstrap estimate of the variance matrix of the parameter estimates, saved as VARB; the full table of values (obtained by copying into Excel first to select a nice format) is omitted here.]
[MatrixBootstrp: the estimates of the 20 parameters from each of the 100 bootstrap repetitions, one row per repetition; the table of values is omitted here.]
14 2.53811 −0.178 −0.04811 −0.05772 −0.11028 −0.01634 −0.00833 1.26671 −0.43714 1.91162 −0.44643 1.22869 −0.06285 −0.06446 1.4911 1.44315 −0.13651 −0.03213 −0.006 −0.08345
15 1.98343 −0.16758 −0.05016 −0.04312 −0.10192 −0.00794 −0.00854 1.78754 −0.90579 1.4132 −1.18115 0.952768 −0.08152 −0.06778 1.43979 1.36476 −0.13479 −0.03576 −0.01312 −0.05379
16 2.77402 −0.18195 −0.05423 −0.05537 −0.09696 −0.01472 −0.01023 1.00378 −0.9336 2.08753 −1.00619 1.41782 −0.05852 −0.06762 1.88161 1.72324 −0.18225 −0.03737 −0.00612 −0.05093
17 2.91102 −0.18798 −0.04823 −0.0502 −0.06937 −0.00916 −0.00766 1.0609 −1.17684 2.29573 −1.66289 1.50493 −0.02153 −0.0598 2.04183 1.66466 −0.07818 −0.03431 −0.00714 −0.04823
18 3.14602 −0.18882 −0.04862 −0.04952 −0.20991 −0.01007 −0.01011 1.73221 −1.15075 2.5057 −0.97251 1.95995 −0.0833 −0.07368 2.16754 2.40897 −0.1131 −0.03049 −0.0108 −0.04351
19 2.5014 −0.17099 −0.04794 −0.06049 −0.15066 −0.01169 −0.00625 1.54847 −1.13953 1.83937 −1.13264 1.2547 −0.08383 −0.0659 1.37629 1.53763 −0.09286 −0.03412 −0.00749 −0.07202
20 2.39102 −0.18341 −0.05032 −0.05362 −0.18438 −0.00305 −0.00914 1.50861 −0.75403 1.87434 −1.14347 1.01835 −0.07678 −0.06803 1.5235 1.54669 −0.14127 −0.03358 −0.01595 −0.06478
21 3.22899 −0.19643 −0.05095 −0.06732 −0.04383 −0.02249 −0.00901 1.50592 −1.04799 2.62052 −0.8391 1.90627 −0.09959 −0.06952 2.4624 2.22293 −0.13523 −0.02884 −0.00925 −0.05363
22 2.49108 −0.20118 −0.04985 −0.05959 −0.08565 −0.01404 −0.00695 1.352 −0.99131 1.89138 −0.90557 1.13537 −0.08202 −0.06649 1.69282 1.54012 −0.12144 −0.0378 −0.01682 −0.03921
23 3.2374 −0.1882 −0.04949 −0.06673 −0.11753 −0.01456 −0.00807 1.5467 −0.55589 2.51868 −1.43623 1.50086 −0.05532 −0.0636 1.96525 1.87111 −0.09445 −0.03186 −0.01444 −0.05095
24 2.83047 −0.17427 −0.04973 −0.05205 −0.08511 −0.01522 −0.00812 1.46867 −1.16228 2.2206 −0.83146 1.55752 −0.0503 −0.05633 1.94142 1.89116 −0.08498 −0.03183 −0.02514 −0.03981
25 2.77202 −0.1917 −0.04864 −0.06041 −0.12566 −0.01607 −0.00932 1.49141 −0.81905 2.11683 −0.62124 1.38759 −0.07963 −0.06996 1.79877 1.79339 −0.10592 −0.03504 −0.02071 −0.05756
26 2.56711 −0.20678 −0.04618 −0.06222 −0.13406 −0.00878 −0.00519 1.44289 −0.85722 2.0132 −0.69176 1.2517 −0.09212 −0.0569 1.50269 1.61936 −0.13511 −0.03085 −0.019 −0.04761
27 3.23474 −0.16288 −0.05218 −0.05453 −0.08053 −0.01027 −0.00857 1.48734 −1.55624 2.68776 −1.75428 1.7691 −0.05716 −0.08203 2.59825 2.43802 −0.0927 −0.02824 −0.0202 −0.02825
28 2.2625 −0.17636 −0.0464 −0.05338 −0.12389 −0.01 −0.00676 1.56733 −0.28504 1.66127 0.116758 0.718882 −0.04073 −0.0638 1.28315 1.13038 −0.13454 −0.02764 −0.01455 −0.07632
29 2.81633 −0.19692 −0.05114 −0.05401 −0.09596 −0.02175 −0.00927 1.44821 −0.67327 2.17097 −0.74266 1.61842 −0.10641 −0.05408 2.06534 1.94799 −0.11057 −0.03484 −0.02568 −0.05713
30 2.80843 −0.19552 −0.0499 −0.05466 −0.10292 −0.01629 −0.00948 1.25976 −0.94786 2.17608 −1.62568 1.72198 −0.09595 −0.05969 2.07177 1.91855 −0.1182 −0.03194 −0.02503 −0.05645
31 2.71947 −0.18924 −0.0492 −0.05237 −0.06242 −0.01621 −0.00926 1.35839 −1.01852 2.04652 −0.75197 1.28989 −0.06746 −0.04765 1.81893 1.70068 −0.16304 −0.03122 −0.00156 −0.06033
32 2.95597 −0.18995 −0.05056 −0.05252 −0.13936 −0.01862 −0.01262 1.35268 −1.02157 2.34785 −0.88923 1.55905 −0.00878 −0.06968 2.12957 1.88312 −0.0932 −0.04102 −0.02358 −0.04445
33 2.0589 −0.17867 −0.04898 −0.05507 −0.05338 −0.009 −0.00724 1.23153 −0.80042 1.44277 −0.48732 0.705188 −0.06554 −0.06855 1.46379 1.0651 −0.1566 −0.03618 −0.00797 −0.03871
34 2.61638 −0.1827 −0.0477 −0.05621 −0.10418 −0.01065 −0.00863 1.2044 −1.28756 2.24202 −0.85117 1.48749 −0.05863 −0.06795 1.93929 1.81402 −0.11776 −0.03051 −0.01544 −0.05143
35 2.79489 −0.20199 −0.04678 −0.06574 −0.10138 −0.018 −0.00903 1.40822 −0.81104 2.22641 −1.25682 1.42656 −0.0662 −0.05338 1.77326 1.68907 −0.09727 −0.03703 −0.00371 −0.0837
36 2.51444 −0.17489 −0.05021 −0.0531 −0.10651 −0.01867 −0.00748 1.49798 −0.11466 1.82296 −0.18897 1.02318 −0.04938 −0.05784 1.58219 1.47589 −0.15924 −0.03064 −0.015 −0.03358
37 0.98574 −0.21921 −0.04842 −0.05488 −0.05117 −0.00836 −0.00398 1.37423 −0.62738 0.331399 −0.17352 −0.36942 −0.05966 −0.05683 0.124198 0.100593 −0.21998 −0.04374 0.003924 −0.10913
38 3.06507 −0.20754 −0.04911 −0.05301 −0.10635 −0.00962 −0.01231 1.51725 −0.82811 2.46964 −1.1568 1.81187 −0.05026 −0.06773 2.27538 2.18079 −0.11148 −0.03006 −0.02521 −0.06244
39 3.31969 −0.20555 −0.05173 −0.06063 −0.09401 −0.01751 −0.0065 1.42213 −0.96057 2.55146 −0.96278 2.00537 −0.0765 −0.06887 2.38122 2.20527 −0.12419 −0.02969 −0.00419 −0.04336
40 2.04124 −0.21879 −0.04627 −0.05943 −0.072 −0.01885 −0.00393 1.11167 −0.62812 1.45083 −0.69045 0.767981 −0.08066 −0.0529 1.14033 1.04435 −0.13321 −0.03703 −0.02795 −0.08431
41 2.98034 −0.18286 −0.05328 −0.05877 −0.11699 −0.01133 −0.00722 1.37904 −1.43477 2.46139 −1.38785 1.69772 −0.07746 −0.06499 2.16103 2.09061 −0.0775 −0.03496 −0.02989 −0.05863
42 3.16544 −0.19069 −0.04747 −0.05655 −0.09147 −0.01649 −0.00682 1.47533 −1.37752 2.6037 −1.05634 2.07543 −0.07563 −0.06066 2.33698 2.27288 −0.05107 −0.02658 −0.02858 −0.06464
43 3.10965 −0.19328 −0.04995 −0.06513 −0.13171 −0.01296 −0.00558 1.42002 −1.4035 2.51408 −1.28524 1.46141 −0.04971 −0.06553 1.94017 1.77966 −0.08576 −0.03384 −0.01466 −0.05805
44 2.62787 −0.20805 −0.05138 −0.05435 −0.09166 −0.00683 −0.00527 1.1766 −1.0842 2.17529 −1.3063 1.229 −0.06766 −0.05037 1.83722 1.63284 −0.12416 −0.03202 −0.014 −0.04684
45 2.37621 −0.19193 −0.05089 −0.03864 −0.14647 −0.01113 −0.00623 1.12176 −1.19713 1.86862 −0.78461 1.53052 −0.08386 −0.07385 2.10631 2.07361 −0.11531 −0.03584 −0.01002 −0.06369
46 3.17222 −0.18174 −0.05333 −0.05974 −0.1155 −0.01078 −0.00696 1.4535 −1.67472 2.54421 −1.15203 1.72399 −0.06158 −0.06722 2.18401 2.10444 −0.1029 −0.03324 −0.00439 −0.04247
47 2.95095 −0.1617 −0.05109 −0.0507 −0.18614 −0.01248 −0.00964 1.19089 −1.32231 2.4869 −0.58043 1.51172 −0.0507 −0.06013 2.01828 1.94035 −0.11471 −0.02991 −0.02425 −0.03644
48 2.99199 −0.1956 −0.05349 −0.06168 −0.06434 −0.01426 −0.00529 1.23891 −1.31825 2.26075 −0.71209 1.34862 −0.06428 −0.05956 2.04794 1.63261 −0.05372 −0.03928 −0.02372 −0.05111
49 2.29455 −0.19736 −0.0461 −0.04584 −0.07291 −0.01368 −0.0088 1.51337 −0.33932 1.62878 −0.335 1.36336 −0.08855 −0.0672 1.75667 1.64039 −0.1607 −0.02905 −0.00769 −0.05697
50 2.98313 −0.18612 −0.04946 −0.05846 −0.1318 −0.01648 −0.01029 1.32688 −0.82229 2.35077 −0.6845 1.20096 −0.06263 −0.06014 2.04867 1.92991 −0.14129 −0.0327 −0.0059 −0.0699
51 2.5045 −0.19816 −0.04634 −0.05125 −0.13205 −0.01795 −0.00977 1.43752 −0.41064 1.89174 −0.42894 1.27371 −0.09516 −0.05406 1.58647 1.54188 −0.15657 −0.03152 −0.02179 −0.05347
52 2.43068 −0.17349 −0.04904 −0.06129 −0.10145 −0.01658 −0.00673 1.2732 −0.46946 2.02286 −0.64993 1.44293 −0.1181 −0.05356 1.79601 1.69256 −0.14662 −0.03057 −0.00349 −0.0564
53 2.75678 −0.15434 −0.05198 −0.05826 −0.06112 −0.01581 −0.00559 1.24625 −0.8529 2.1026 −0.74841 1.1988 −0.0392 −0.08183 1.88525 1.64962 −0.09423 −0.03449 −0.0067 −0.06674
54 2.77522 −0.16273 −0.05021 −0.05548 −0.08262 −0.01176 −0.0095 1.56674 −1.35775 2.07649 −0.9251 1.44934 −0.10404 −0.06119 2.07197 1.86698 −0.11333 −0.03232 −0.01322 −0.04813
55 2.80767 −0.1603 −0.04772 −0.05522 −0.14742 −0.00521 −0.00838 1.17797 −1.0311 2.18692 −1.25558 1.67368 −0.06462 −0.07936 1.90168 1.92702 −0.11263 −0.02995 −0.00821 −0.04171
56 2.79175 −0.17703 −0.05142 −0.05966 −0.17103 −0.00091 −0.0084 1.49696 −1.16735 2.20933 −1.20499 1.54898 −0.11687 −0.07533 2.06002 2.12009 −0.08374 −0.03759 −0.01749 −0.05251
57 2.9819 −0.17449 −0.04968 −0.05553 −0.06397 −0.01341 −0.00755 1.4479 −0.92529 2.32797 −0.94322 1.61913 −0.06127 −0.05815 2.10833 1.87698 −0.12032 −0.02859 −0.0053 −0.04822
58 2.77843 −0.18974 −0.05043 −0.05546 −0.1185 −0.01883 −0.0088 1.46535 −0.77753 2.23648 −0.7434 1.54402 −0.0739 −0.06112 1.94037 1.76184 −0.11675 −0.03292 −0.01324 −0.05457
59 3.02765 −0.17303 −0.05287 −0.06042 −0.09038 −0.01121 −0.00802 1.36309 −1.06731 2.39519 −1.33398 1.40036 −0.07153 −0.05533 1.93236 1.82162 −0.13308 −0.03208 −0.00897 −0.02596
60 2.025 −0.14003 −0.04104 −0.05174 −0.14117 −0.00998 −0.00599 0.944203 −1.15478 1.44992 −1.33189 0.499766 −0.07328 −0.05291 1.066 1.09713 −0.12764 −0.02812 −0.02329 −0.04763
61 2.31248 −0.19131 −0.04877 −0.05194 −0.11191 −0.00883 −0.00801 1.17341 −0.75476 1.80224 −0.46079 1.51195 −0.09509 −0.08021 1.7131 1.64153 −0.10458 −0.0328 −0.03411 −0.08643
62 2.27008 −0.18105 −0.05036 −0.04992 −0.10247 −0.00706 −0.00506 1.07515 −1.09883 1.75761 −0.97857 0.959756 −0.03431 −0.06306 1.42756 1.35662 −0.0794 −0.04086 −0.01185 −0.06606
63 3.04276 −0.20755 −0.05255 −0.06411 −0.15023 −0.01603 −0.00912 1.53682 −0.11284 2.45696 −0.61502 1.3419 −0.0557 −0.0516 1.80433 1.71237 −0.14561 −0.03393 −0.00196 −0.06927
64 2.43177 −0.18065 −0.05223 −0.05471 −0.1093 −0.01941 −0.00912 1.54572 −0.74485 1.80062 −0.67672 1.12182 −0.05519 −0.082 1.64031 1.62111 −0.17906 −0.03743 −0.00152 −0.05657
65 2.38757 −0.18295 −0.0515 −0.05126 −0.05804 −0.01171 −0.00819 1.52026 −1.12153 1.82133 −1.2963 1.34591 −0.09993 −0.06873 1.91553 1.68603 −0.09618 −0.0336 −0.02981 −0.04833
66 2.83914 −0.21156 −0.05648 −0.05409 −0.06105 −0.00374 −0.00693 1.57635 −1.58025 2.2265 −1.42363 1.7443 −0.06867 −0.06768 2.06946 1.83772 −0.15771 −0.03357 −0.00436 −0.0519
67 3.41428 −0.22346 −0.04928 −0.06038 −0.12469 −0.00646 −0.00584 1.11737 −1.40967 2.71023 −0.98978 1.56797 −0.05444 −0.05117 2.29396 2.246 −0.11001 −0.02745 −0.0158 −0.03817
68 3.02309 −0.19251 −0.04946 −0.06104 −0.09613 −0.00792 −0.00706 1.06884 −1.17921 2.29943 −0.57873 1.63914 −0.08864 −0.06861 2.04873 2.03048 −0.09626 −0.03441 −0.00699 −0.05666
69 2.06503 −0.19379 −0.04922 −0.04105 −0.05705 −0.01287 −0.00728 1.24259 −0.98585 1.52202 −0.66483 0.960847 −0.07504 −0.06235 1.54577 1.50693 −0.11777 −0.03714 −0.01071 −0.06162
70 2.5089 −0.18348 −0.05208 −0.04411 −0.08844 −0.01045 −0.00797 1.54436 −0.72763 1.99424 −0.67102 1.49246 −0.06523 −0.06778 1.91189 1.74991 −0.11276 −0.03271 −0.00568 −0.04884
71 2.31269 −0.18141 −0.04963 −0.05397 −0.06141 −0.01151 −0.00657 1.39917 −0.92023 1.6933 −0.68812 1.09403 −0.05482 −0.07159 1.44782 1.36107 −0.14627 −0.03496 −0.00813 −0.04053
72 3.18683 −0.19814 −0.04967 −0.0572 −0.12052 −0.00716 −0.00939 1.50497 −0.59918 2.58377 −1.12222 1.86604 −0.10131 −0.06615 2.35493 2.20985 −0.11105 −0.03074 −0.01639 −0.03552
73 2.32795 −0.21771 −0.04927 −0.05286 −0.08125 −0.01377 −0.00616 1.21188 −0.9629 1.73549 −2.05236 0.980039 −0.09158 −0.05665 1.65563 1.42218 −0.19034 −0.03176 −0.011 −0.0787
74 2.74255 −0.19572 −0.04686 −0.05763 −0.08259 0.002855 −0.00845 1.55652 −1.42871 2.14646 −0.99659 1.37818 −0.08501 −0.06235 2.06162 1.8296 −0.11061 −0.02827 −0.01145 −0.05084
75 2.32231 −0.15281 −0.04659 −0.0547 −0.13564 −0.01884 −0.00932 1.40853 −0.80771 1.78096 −0.41756 0.790214 −0.05726 −0.0635 1.39931 1.27066 −0.14147 −0.03322 −0.01451 −0.05987
76 2.17538 −0.18967 −0.04638 −0.05851 −0.10705 −0.01449 −0.00563 1.34533 −0.34237 1.68484 −0.57325 1.01311 −0.08859 −0.06523 1.40836 1.41189 −0.12695 −0.03311 −0.02154 −0.04748
77 2.72412 −0.20056 −0.04675 −0.05665 −0.08014 −0.01155 −0.00521 1.20552 −0.47215 2.16914 −1.28922 1.09962 −0.02048 −0.05727 1.68026 1.43029 −0.11599 −0.03031 −0.01114 −0.05632
78 1.91814 −0.1964 −0.04787 −0.05486 −0.07038 −0.01095 −0.0051 1.35744 −0.69987 1.35364 −0.58791 0.623303 −0.06613 −0.05868 1.12906 0.994532 −0.13906 −0.03598 −0.02282 −0.04005
79 2.20662 −0.19111 −0.05014 −0.04981 −0.19346 −0.01007 −0.0078 1.23383 −0.78591 1.7236 −1.40426 1.03994 −0.1051 −0.05209 1.47565 1.46771 −0.17259 −0.03601 −0.01918 −0.07653
80 3.00024 −0.20155 −0.04971 −0.0591 −0.07123 −0.02152 −0.01056 1.3618 0.008392 2.41963 −0.10384 1.89266 −0.11646 −0.04384 2.21873 2.00268 −0.08718 −0.03597 −0.02102 −0.05196
81 2.20187 −0.2134 −0.05136 −0.0447 −0.11397 −0.00711 −0.00652 1.44177 −1.43173 1.66829 −0.89508 1.16437 −0.08945 −0.06371 1.54451 1.46267 −0.15077 −0.03238 −0.01829 −0.08065
82 2.53619 −0.19302 −0.05227 −0.06568 −0.10307 −0.00749 −0.00515 1.40546 −0.74195 2.02886 −1.0698 1.42958 −0.14272 −0.05792 1.77499 1.67692 −0.12723 −0.03339 −0.01941 −0.05259
83 3.07634 −0.18124 −0.04843 −0.05507 −0.04816 −0.01513 −0.00829 1.36687 −0.65342 2.45959 −0.92009 1.65776 −0.07046 −0.06133 2.41075 2.08162 −0.12378 −0.02955 −0.00477 −0.04278
84 2.62186 −0.18269 −0.05022 −0.05315 −0.09648 −0.01195 −0.00508 1.25873 −1.15127 2.07582 −1.2258 1.12544 −0.01773 −0.06556 1.65728 1.56637 −0.1088 −0.03532 −0.0042 −0.05806
85 2.36809 −0.19074 −0.04722 −0.05243 −0.08951 −0.01415 −0.00741 1.42243 −0.46385 1.89498 −0.63004 1.24193 −0.0644 −0.0578 1.56259 1.39717 −0.09157 −0.03669 −0.02342 −0.04998
86 3.03783 −0.18852 −0.05347 −0.06074 −0.16869 −0.0133 −0.00721 1.35547 −0.86968 2.54437 −0.82365 1.80347 −0.09182 −0.06534 1.97871 2.03655 −0.13738 −0.03159 −0.00819 −0.05135
87 2.39716 −0.20259 −0.04947 −0.04965 −0.09892 −0.0132 −0.00953 1.26216 −0.59307 1.81611 −1.04873 1.44305 −0.11095 −0.04688 1.77885 1.50949 −0.11551 −0.04477 −0.01328 −0.03575
88 2.81868 −0.17311 −0.04979 −0.05404 −0.11221 −0.01013 −0.01095 1.49612 −1.14932 2.19736 −2.43795 1.36928 −0.02896 −0.06482 1.7804 1.71047 −0.15689 −0.03217 −0.01479 −0.03366
89 2.47773 −0.18815 −0.05315 −0.06111 −0.12201 −0.01025 −0.00337 1.42279 −1.30915 1.73941 −1.28193 1.05775 −0.12024 −0.05135 1.37722 1.2405 −0.13661 −0.035 −0.00701 −0.09087
90 2.73949 −0.20917 −0.05012 −0.05907 −0.08778 −0.00909 −0.00579 1.25704 −1.65573 2.08618 −0.85935 1.26021 −0.04961 −0.06128 1.74798 1.63899 −0.13803 −0.02646 −0.02607 −0.05491
91 2.49617 −0.17754 −0.04636 −0.05771 −0.04394 −0.01006 −0.0058 1.29957 −1.18184 1.89537 −0.59179 1.08621 −0.0654 −0.07338 1.91722 1.4971 −0.12395 −0.02655 −0.01569 −0.06601
92 2.97021 −0.18326 −0.04997 −0.06133 −0.06877 −0.0093 −0.00863 1.36207 −0.88372 2.36211 −0.54977 1.60368 −0.1002 −0.06614 2.05749 2.10231 −0.04135 −0.03335 −0.0259 −0.05501
93 3.06323 −0.20994 −0.0478 −0.05702 −0.07107 −0.00915 −0.00754 1.52055 −1.05284 2.35792 −0.89739 1.69016 −0.11499 −0.05139 2.30026 2.05679 −0.09196 −0.03088 −0.01414 −0.04017
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
94 2.72454 −0.18678 −0.05211 −0.05204 −0.14971 −0.01677 −0.00832 1.51471 −0.79036 2.11146 −1.24719 1.61151 −0.07906 −0.04786 1.75335 1.6258 −0.09117 −0.03622 −0.01129 −0.0631
95 2.97069 −0.19496 −0.0478 −0.05339 −0.13895 −0.01569 −0.00773 1.26759 −1.17626 2.42418 −1.2627 1.7293 −0.07298 −0.06396 2.05353 2.18491 −0.12313 −0.03155 −0.01629 −0.04242
96 3.09314 −0.20704 −0.04933 −0.06206 −0.05454 −0.01123 −0.00886 1.15129 −0.79159 2.50301 −0.65512 1.48748 −0.04611 −0.07176 2.15281 2.09623 −0.13545 −0.03061 −0.01253 −0.04733
97 2.33055 −0.17391 −0.04829 −0.05447 −0.08478 −0.00941 −0.00731 1.32992 −0.57779 1.7923 −0.40595 0.911452 −0.05697 −0.07259 1.65931 1.40245 −0.14603 −0.02682 −0.01474 −0.06252
98 2.7949 −0.20094 −0.05231 −0.05729 −0.13112 −0.01601 −0.00546 1.08688 −1.18668 2.04567 −0.83092 1.30328 −0.05069 −0.06604 1.55092 1.56347 −0.15077 −0.03867 −0.00622 −0.04771
99 2.48321 −0.2049 −0.05136 −0.06118 −0.0833 −0.00932 −0.00439 1.21036 −1.53589 1.87442 −0.9157 1.17197 −0.09312 −0.0538 1.53231 1.40838 −0.11112 −0.03792 −0.02194 −0.05207
100 2.64511 −0.19791 −0.04907 −0.05448 −0.08583 −0.01236 −0.00722 1.38516 −0.41325 2.06043 −0.44269 1.17563 −0.05055 −0.07039 1.7151 1.76961 −0.10696 −0.03234 −0.02115 −0.0764
345 Statistical inference
As an aside, David Brownstone in 2000 made the important comment: “Judging from
reading many applied papers, the implied assertion is that if the individual coefficients
have high t-statistics, then any nonlinear combination of them must also have high t-
statistics.”
This is obviously incorrect. Even if the asymptotic normal approximation to the joint
distribution of the parameter estimates is accurate, there is no reason why the ratios of any
two of these parameters would even have a mean or a variance. If the parameter estimates
are uncorrelated, then the ratios will typically have a Cauchy distribution (which has no finite
moments). This fact suggests that standard delta-method approximations (see above and
also Greene 1997, 127 and 916) will not yield reliable inferences, although the resulting
standard error estimates are certainly better than nothing!
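Brownstone's point is easy to check by simulation. The sketch below is illustrative only (it is not part of the text): it draws two uncorrelated normal "parameter estimates" centred on zero and shows that their ratio behaves like a Cauchy variate, whose median is stable but whose sample variance never settles down.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Two uncorrelated "parameter estimates" centred on zero: their ratio is a
# standard Cauchy variate, which has no finite mean or variance.
num = rng.standard_normal(100_000)
den = rng.standard_normal(100_000)
ratio = num / den

# P(|Cauchy| < 1) = 0.5, so the sample median of |ratio| sits near 1 ...
print(np.median(np.abs(ratio)))
# ... while the sample variance is dominated by a few huge draws and does
# not settle down as n grows.
print(np.var(ratio))
```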
Car drivers’ willingness to pay higher tolls for a shorter trip would be
measured by:
wtp = invtcar/tc.
Since invtcar and tc are estimated parameters with sampling variances, wtp is
also an estimated parameter. A more involved example is provided by partial
effects calculations (see Chapters 8 and 13 for more details). Consider a binary
choice model based on our example of whether an individual chooses to drive
or to take some other mode. A logit model would appear as:

choose car if β′x + ε > 0,

where ε has a standardized logistic distribution (mean zero, variance π²/3). The
econometric model that follows is:
Prob(choose car) = exp(β′x) / (1 + exp(β′x)) = Λ(β′x).   (7.17)
The vector of partial effects of x on the choice probability is then:

∂Λ(β′x)/∂x = Λ(β′x)[1 − Λ(β′x)]β = δ(β′x).   (7.18)
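A minimal numerical sketch of Equations (7.17) and (7.18) follows; the coefficient values and attribute levels are made up for illustration and are not taken from any model estimated in this chapter.

```python
import numpy as np

def logit_prob(beta, x):
    """Prob(choose car) = Lambda(beta'x), as in Equation (7.17)."""
    return 1.0 / (1.0 + np.exp(-(beta @ x)))

def partial_effects(beta, x):
    """delta(beta'x) = Lambda(1 - Lambda) * beta, as in Equation (7.18)."""
    p = logit_prob(beta, x)
    return p * (1.0 - p) * beta

beta = np.array([-0.05, -0.11])    # illustrative time and cost coefficients
x = np.array([20.0, 3.0])          # illustrative attribute levels

p = logit_prob(beta, x)
print(p)                           # probability of choosing car
print(partial_effects(beta, x))    # both effects carry the sign of beta
```

Note that the partial effects scale the coefficients by Λ(1 − Λ), which is at most 0.25, so they are always smaller in magnitude than the coefficients themselves.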
The delta method estimator of the asymptotic covariance matrix of the estimated
function is:

W = GVG′.   (7.20)
For the wtp example, V would be the 2 × 2 matrix of sampling variances and
covariance for (invtcar, tc) and G would be the 1 × 2 (one function × two
parameters) matrix:

G = [∂wtp/∂invtcar, ∂wtp/∂tc] = [1/tc, −invtcar/tc²].   (7.21)

For the partial effects in Equation (7.18), G is the Jacobian of δ(b′x) with
respect to the estimated parameters:

G = Λ(b′x)[1 − Λ(b′x)][I + (1 − 2Λ(b′x))bx′].   (7.22)
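The Jacobian in Equation (7.22) can be checked against numerical differentiation; in the sketch below the parameter vector b and data point x are purely illustrative.

```python
import numpy as np

def lam(z):
    """Binary logit probability Lambda(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def delta(b, x):
    """Partial effects Lambda(1 - Lambda) * b, as in Equation (7.18)."""
    p = lam(b @ x)
    return p * (1.0 - p) * b

def jacobian_analytic(b, x):
    """G = Lambda(1 - Lambda)[I + (1 - 2*Lambda) b x'], as in Equation (7.22)."""
    p = lam(b @ x)
    return p * (1.0 - p) * (np.eye(len(b)) + (1.0 - 2.0 * p) * np.outer(b, x))

b = np.array([0.4, -0.9, 0.2])     # illustrative parameter values
x = np.array([1.0, 0.5, 2.0])      # illustrative data point

# Central finite-difference Jacobian for comparison.
eps = 1e-6
G_num = np.empty((3, 3))
for j in range(3):
    bp, bm = b.copy(), b.copy()
    bp[j] += eps
    bm[j] -= eps
    G_num[:, j] = (delta(bp, x) - delta(bm, x)) / (2.0 * eps)

print(np.max(np.abs(jacobian_analytic(b, x) - G_num)))   # effectively zero
```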
For functions such as these partial effects, which are functions of the data as
well as the parameters, there is a question as to how to handle the data part. It
is common to do the computation at the means of the data – this produces the
“partial effects at the means.” Many applications have suggested instead
computing “average partial effects.” To compute the average partial effects, the
effects are computed at each observation, and the effects themselves, rather
than the data, are averaged. The delta method must be modified in this case –
the change requires only that the average Jacobian, rather than the Jacobian at
the means, be used to compute W. (See Greene 2012 for details.)
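The difference between the two summaries can be sketched on synthetic data; the coefficients and data below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)

def partial_effects(b, x):
    """Logit partial effects Lambda(1 - Lambda) * b at one data row."""
    p = 1.0 / (1.0 + np.exp(-(x @ b)))
    return p * (1.0 - p) * b

b = np.array([0.5, -1.2])            # illustrative coefficients
X = rng.normal(size=(500, 2))        # synthetic data matrix

# Average partial effects: evaluate at every observation, average the effects.
ape = np.mean([partial_effects(b, row) for row in X], axis=0)

# Partial effects at the means: average the data first, evaluate once.
pem = partial_effects(b, X.mean(axis=0))

print(ape)
print(pem)   # same signs, similar magnitudes, but not identical
```

Because Λ(1 − Λ) is a nonlinear function of b′x, averaging the effects and evaluating at the averaged data generally give different answers, which is exactly the pattern seen in the Nlogit output below.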
Nlogit provides two devices for computing variances of functions. The
WALD command used in Section 7.2.1.2 can be used for basic functions of
variables. For functions such as partial effects that involve the data, two
commands, SIMULATE and PARTIALS, are used to do the relevant aver-
aging or summing over the observations and deriving the appropriate var-
iances (both discussed in detail in Chapter 13). Both procedures can be used
for the delta method or the KR method described in Section 7.4.2.
This first example uses the delta method to compute a WTP measure based
on the MNL model. We use two data sets (the RP component as above and the
SC data from the same survey). The model is estimated first. The estimated
WTP is reported as the function value by the WALD command. The evidence
below suggests that the mean estimate of WTP (i.e., the value of travel time
savings in $/min.) is not statistically significantly different from 0 at the
95 percent level of confidence for the RP data set (z = 1.23), but is statistically
significant for the SC data (z = 3.44):
?RP Data set
nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw+ actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt $
Wald ; Parameters = b ; Covariance = Varb
; Labels = 8_c,binvt,btc,c11,c12
? Note that 8_c means c1,c2,c3,c4,c5,c6,c7,c8, which are the first 8 parameters in the output
; fn1 = wtp = binvt/btc $
----------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -200.40253
Estimation based on N = 197, K = 12
Inf.Cr.AIC = 424.8 AIC/N = 2.156
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Chi-squared[ 9] = 132.82086
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 197, skipped 0 obs
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -1.87740** .74583 -2.52 .0118 -3.33920 -.41560
ACTPT| -.06036*** .01844 -3.27 .0011 -.09650 -.02423
INVCPT| -.08571* .04963 -1.73 .0842 -.18299 .01157
INVTPT| -.01106 .00822 -1.35 .1782 -.02716 .00504
EGTPT| -.04117** .02042 -2.02 .0438 -.08119 -.00114
TRPT| -1.15503*** .39881 -2.90 .0038 -1.93668 -.37338
TN| -1.67343** .73700 -2.27 .0232 -3.11791 -.22894
BW| -1.87376** .73750 -2.54 .0111 -3.31924 -.42828
INVTCAR| -.04963*** .01166 -4.26 .0000 -.07249 -.02677
TC| -.11063 .08471 -1.31 .1916 -.27666 .05540
PC| -.01789 .01796 -1.00 .3192 -.05310 .01731
EGTCAR| -.05806* .03309 -1.75 .0793 -.12291 .00679
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------------------------------
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = 1.52061
Prob. from Chi-squared[ 1] = .21753
Functions are computed at means of variables
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
WTP| .44859 .36378 1.23 .2175 -.26441 1.16158
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------------------------------
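The WALD function value can be reproduced directly from the reported coefficients. In the sketch below the point estimates and standard errors are taken from the output above, but the covariance between INVTCAR and TC is set to zero as an illustrative assumption, since varb is not printed; the resulting standard error is therefore only an approximation to the WALD figure.

```python
import numpy as np

# INVTCAR and TC coefficients and standard errors from the RP output above.
b_invt, b_tc = -0.04963, -0.11063
se_invt, se_tc = 0.01166, 0.08471

wtp = b_invt / b_tc
print(round(wtp, 4))                   # close to the reported WTP of .44859

# Delta method: Var(wtp) = g'Vg with gradient g = (1/b_tc, -b_invt/b_tc**2).
# The off-diagonal covariance is set to zero here, an assumption made only
# because varb is not shown in the output.
V = np.array([[se_invt**2, 0.0],
              [0.0,        se_tc**2]])
g = np.array([1.0 / b_tc, -b_invt / b_tc**2])
se_wtp = float(np.sqrt(g @ V @ g))
print(round(se_wtp, 4))                # same order as the reported .36378
```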
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = 11.82781
Prob. from Chi-squared[ 1] = .00058
Functions are computed at means of variables
-----------+-------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+-------------------------------------------------------------------------------------------
WTP| .28870*** .08395 3.44 .0006 .12417 .45323
This second example fits a binary logit model to the choice of whether to
drive or not for the SC data set. In the three commands, “if[altij = 4];”
restricts the analysis to the subset of the sample in which the variable altij
equals 4 – that is, the outcome row for car (cr) in the choice model. The LOGIT
command then fits a binary logit model to the outcome. The two
PARTIALS commands compute partial effects for the four indicated vari-
ables. The first computes the average partial effects. The second computes
the partial effects at the means of the variables in the model. The results are
broadly similar, though perhaps less so than we might expect based on
sampling variability alone. In fact, average partial effects and partial effects
at the means are slightly different functions:
-----------------------------------------------------------------------------------------------------
Partial Effects for Logit Probability Function
Partial Effects Averaged Over Observations
-----------------------------------------------------------------------------------------------------
Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
-----------------------------------------------------------------------------------------------------
TC -.02461 .01265 1.95 -.04940 .00019
PC -.00458 .00282 1.63 -.01010 .00094
EGT -.01210 .00527 2.30 -.02243 -.00178
INVT -.00691 .00148 4.68 -.00981 -.00402
-----------------------------------------------------------------------------------------------------
Partial Effects Computed at data Means
-----------------------------------------------------------------------------------------------------
Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
-----------------------------------------------------------------------------------------------------
TC -.03348 .01805 1.85 -.06887 .00190
PC -.00623 .00393 1.59 -.01392 .00147
EGT -.01647 .00739 2.23 -.03096 -.00198
INVT -.00941 .00232 4.06 -.01395 -.00487
-----------------------------------------------------------------------------------------------------
Obtaining the variance (in Equation (7.19)) involves some complex calcula-
tions; see Table 7.1.
1 For example: “Would you be willing to pay $X? Yes/No. If Yes, would you be willing to pay $Z (where Z > X)? Yes/No. If No, would you be willing to pay $Y (where Y < X)? Yes/No.”
Table 7.1. Obtaining the variance of wtp = βk/(2βc xc) by the delta method

First note that βk/(2βc xc) = βk(2βc xc)⁻¹; this makes the use of the product rule to derive the gradient easier:

∇g[βk(2βc xc)⁻¹] = [ ∂(βk(2βc xc)⁻¹)/∂βk , ∂(βk(2βc xc)⁻¹)/∂βc ]′
                 = [ (2βc xc)⁻¹ , −2xc βk(2βc xc)⁻² ]′.   (1)

So that:

Var(βk/(2βc xc)) = ∇g′ V ∇g, where V = [ Var(βk)  Cov(βk, βc) ; Cov(βk, βc)  Var(βc) ].   (2)

The first product, ∇g′V, is the row vector:

[ (2βc xc)⁻¹Var(βk) − 2xc βk(2βc xc)⁻²Cov(βk, βc) ,  (2βc xc)⁻¹Cov(βk, βc) − 2xc βk(2βc xc)⁻²Var(βc) ].   (3)

Then, multiplying the resulting row vector by the final column vector ∇g gives:

(2βc xc)⁻¹[(2βc xc)⁻¹Var(βk) − 2xc βk(2βc xc)⁻²Cov(βk, βc)]
− 2xc βk(2βc xc)⁻²[(2βc xc)⁻¹Cov(βk, βc) − 2xc βk(2βc xc)⁻²Var(βc)].   (4)

Collecting terms:

Var(βk/(2βc xc)) = (2βc xc)⁻²[Var(βk) − 2xc βk(2βc xc)⁻¹Cov(βk, βc)]
− 2xc βk(2βc xc)⁻³[Cov(βk, βc) − 2xc βk(2βc xc)⁻¹Var(βc)].   (5)
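The collected-terms expression in (5) can be verified numerically against the direct matrix product ∇g′V∇g; all parameter and variance values in the sketch below are illustrative.

```python
import numpy as np

# Illustrative values only.
bk, bc, xc = 0.8, -0.3, 1.5
var_k, var_c, cov_kc = 0.04, 0.02, -0.01

d = 2.0 * bc * xc                          # the denominator 2*beta_c*x_c
grad = np.array([1.0 / d, -2.0 * xc * bk / d**2])
V = np.array([[var_k, cov_kc],
              [cov_kc, var_c]])
direct = float(grad @ V @ grad)            # delta-method variance, g'Vg

# Collected-terms form, as in Equation (5):
collected = (d**-2) * (var_k - 2.0 * xc * bk * cov_kc / d) \
    - 2.0 * xc * bk * d**-3 * (cov_kc - 2.0 * xc * bk * var_c / d)

print(direct, collected)                   # the two agree to rounding error
```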
--------------------------------------------------------------------------------------------------------
FIML Nested Multinomial Logit Model
Dependent variable RESP1
Log likelihood function -2486.23068
Restricted log likelihood -3621.05512
Chi squared [ 22](P= .000) 2269.64888
Significance level .00000
McFadden Pseudo R-squared .3133961
Estimation based on N = 1840, K = 22
Inf.Cr.AIC = 5016.5 AIC/N = 2.726
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
No coefficients -3621.0551 .3134 .3117
Constants only can be computed directly
Use NLOGIT ;...;RHS=ONE$
At start values -2487.3624 .0005-.0020
Response data are given as ind. choices
BHHH estimator used for asymp. variance
The model has 2 levels.
Random Utility Form 2:IVparms = Mb|l,Gl
Number of obs.= 1840, skipped 0 obs
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
RESP1| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
|Attributes in the Utility Functions (beta)
NLRASC| 2.50852*** .35399 7.09 .0000 1.81472 3.20232
COST| -.17977*** .01550 -11.60 .0000 -.21014 -.14940
INVT| -.04607*** .00314 -14.69 .0000 -.05221 -.03992
ACWT| -.05176*** .00627 -8.25 .0000 -.06406 -.03947
ACCBUSF| -.09067*** .03143 -2.89 .0039 -.15226 -.02907
EGGT| -.01076** .00434 -2.48 .0132 -.01927 -.00225
PTINC| -.00717*** .00193 -3.72 .0002 -.01095 -.00339
PTGEND| 1.27200*** .17781 7.15 .0000 .92349 1.62051
NLRINSDE| -.79922*** .30048 -2.66 .0078 -1.38814 -.21029
TNASC| 1.96138*** .31850 6.16 .0000 1.33713 2.58562
NHRINSDE| -.76401** .34238 -2.23 .0256 -1.43506 -.09297
NBWASC| 1.37009*** .34763 3.94 .0001 .68874 2.05144
WAITTB| -.07264*** .02386 -3.04 .0023 -.11941 -.02586
ACCTB| -.05855*** .00916 -6.39 .0000 -.07650 -.04059
BSASC| 1.74362*** .30317 5.75 .0000 1.14941 2.33782
BWASC| 1.64330*** .31035 5.30 .0000 1.03504 2.25157
CRCOST| -.10797*** .02752 -3.92 .0001 -.16190 -.05403
CRINVT| -.03105*** .00424 -7.33 .0000 -.03935 -.02274
CRPARK| -.01429** .00685 -2.09 .0370 -.02773 -.00086
CREGGT| -.04953*** .01592 -3.11 .0019 -.08073 -.01832
|IV parameters, RU2 form = mu(b|l),gamma(l)
PTNEW| 1.21849*** .13886 8.77 .0000 .94632 1.49066
ALLOLD| 1.05917*** .07644 13.86 .0000 .90935 1.20900
K and R:
wald ; parameters = b ; labels = 20_c,ivpt,ivcar
; covariance = varb
; fn1 = ivpt-1 ; fn2 = ivcar - 1 ; k&r ; pts=500 $
-------------------------------------------------------------------------------------------------------
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = 2.46781
Prob. from Chi-squared[ 2] = .29115
Krinsky-Robb method used with 500 draws
Functions are computed at means of variables
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
Fncn(1)| .21849 .14012 1.56 .1189 -.05614 .49312
Fncn(2)| .05917 .07731 .77 .4441 -.09236 .21070
Wald:
wald ; parameters = b ; labels = 20_c,ivptn,ivold
; covariance = varb
; fn1 = ivptn-1 ; fn2 = ivold - 1 ; pts=500 $
-------------------------------------------------------------------------------------------------------
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = 2.51379
Prob. from Chi-squared[ 2] = .28454
Functions are computed at means of variables
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
Fncn(1)| .21849 .13886 1.57 .1156 -.05368 .49066
Fncn(2)| .05917 .07644 .77 .4389 -.09065 .20900
-----------+-----------------------------------------------------------------------------------------
A second example computes the WTP (as the value of travel time savings in $/
min.) for the car alternative as the ratio binvtcr/binvccr. The result of interest is
reproduced here from the nested logit model directly above. The mean
estimate is $0.288/min. with a standard error obtained using the K&R method
of $0.101/min. and a 95 percent confidence interval of $0.089/min. to $0.486/min. With a z value of 2.84 we can conclude that the mean estimate is
statistically significantly different from zero:
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:21:49 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.009
Cambridge Books Online © Cambridge University Press, 2015
357 Statistical inference
-------------------------------------------------------------------------------------------------------
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = 10.68424
Prob. from Chi-squared[ 3] = .01356
Krinsky-Robb method used with 500 draws
Functions are computed at means of variables
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
Fncn(1)| .21849* .12854 1.70 .0892 -.03344 .47041
Fncn(2)| .05917 .07147 .83 .4078 -.08092 .19926
WTP| .28755*** .10122 2.84 .0045 .08917 .48594
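The mechanics of the Krinsky–Robb calculation reported above can be sketched in a few lines of Python (a minimal illustration, not Nlogit's implementation: the CRINVT and CRCOST point estimates and standard errors are taken from the nested logit output, but the zero off-diagonal covariance and the percentile-based interval are simplifying assumptions):

```python
import numpy as np

# Krinsky-Robb sketch for WTP = b_invt / b_cost for the car alternative.
# Point estimates and standard errors from the nested logit output above;
# the zero covariance between the two parameters is an assumption.
rng = np.random.default_rng(0)
b = np.array([-0.03105, -0.10797])                # (CRINVT, CRCOST)
vcov = np.diag([0.00424**2, 0.02752**2])          # assumed diagonal covariance

draws = rng.multivariate_normal(b, vcov, size=500)  # 500 draws, as in ;pts=500
wtp = draws[:, 0] / draws[:, 1]                     # $/min for each draw

mean, se = wtp.mean(), wtp.std(ddof=1)
lo, hi = np.percentile(wtp, [2.5, 97.5])
print(f"mean={mean:.3f} se={se:.3f} 95% CI=({lo:.3f}, {hi:.3f})")
```

The mean of the draws should sit near the $0.288/min. point estimate; the simulated standard error and interval will differ somewhat from Nlogit's because the off-diagonal covariance terms are ignored here.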
Figure 7.1 Kernel plot of the WTP distribution using Krinsky–Robb derived standard errors
As an aside, one way to try to mitigate this unpalatable result is to identify systematic
sources of influence on the parameters in the numerator and the denominator,
so that the connection between the attribute parameters for each respondent has some
behavioral sense, as proxied by a "third party" influence such as a socio-economic
characteristic.
Figure 7.2 Kernel plot of the inverse of a cost parameter
create ; xtc=rnn(-.110632,.0847106)$
create ; rtc = 1/xtc $
kernel;rhs=Rtc $
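The create/kernel commands above can be mimicked in Python to see why the inverse of a normally distributed cost parameter misbehaves (a hypothetical sketch of the same simulation):

```python
import numpy as np

# Draws from a normal "cost parameter" whose mean (-0.1106) is only about
# 1.3 standard deviations (0.0847) away from zero, then inverted, as in the
# create;xtc / create;rtc commands above.
rng = np.random.default_rng(1)
xtc = rng.normal(-0.110632, 0.0847106, size=10_000)
rtc = 1.0 / xtc

# Because some draws of xtc fall arbitrarily close to zero, 1/xtc has
# extreme outliers: the sample standard deviation dwarfs the central mass.
print(np.abs(rtc).max(), rtc.std())
```

This fat-tailed shape is what the kernel plot in Figure 7.2 displays, and it is why the simulated (Krinsky–Robb) standard error of the WTP ratio (10.94) is so much larger than the delta method value (0.36) in the output below.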
----------------------------------------------------------------------------------------------------
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = .00168
Prob. from Chi-squared[ 1] = .96729
Krinsky-Robb method used with 500 draws
Functions are computed at means of variables
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
WTP| .44859 10.93804 .04 .9673 -20.98958 21.88676
-----------+----------------------------------------------------------------------------------------
Based on delta method
-----------+----------------------------------------------------------------------------------------
WTP| .44859 .36378 1.23 .2175 -.26441 1.16158
-----------+----------------------------------------------------------------------------------------
8 Matters that analysts inquire about
No matter how much one tries to cover the major themes in which choice
analysts are interested, we find that there are topics left out that are often listed
as future inclusions. In this chapter we identify topics that often arise out of
conference debates, referees' suggestions to improve a paper, and questions
e-mailed to the Limdep/logit list or directly to the authors.
where S(i) is the share of people who choose i. If the model is correctly
specified, then P(i) = S(i), such that:
a(β) = Σ_{i∈C} g(β | i) P(i) = f(β),   (8.4)
since Σ_{i∈C} L(i | β) = 1. That is, the average of the conditional distributions is
the unconditional distribution. Stated more directly, if you calculated the
conditional distribution for each person based on that person’s choices, and
then averaged the conditional distributions over all people in the population,
then you would get the unconditional distribution – provided only that the
model is correctly specified such that P(i) = S(i). If you do this exercise, and the
average of the conditionals is not the same as the unconditional, then it means
that the model is mis-specified.
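This averaging identity is easy to verify numerically. The sketch below uses a hypothetical two-class discrete mixture and two alternatives, computes g(β | i) by Bayes' rule, and confirms that Σ_i g(β | i) P(i) recovers f(β):

```python
import numpy as np

# Two taste classes (a discrete f(beta)), two alternatives, logit kernels.
betas = np.array([-1.0, -3.0])   # support of beta (hypothetical)
f = np.array([0.5, 0.5])         # f(beta): population shares
x = np.array([0.0, 1.0])         # attribute levels of the two alternatives

v = np.outer(betas, x)                                 # utilities (class, alt)
L = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)   # L(i | beta)
P = f @ L                                              # P(i) = sum_beta L(i|beta) f(beta)
g = (L * f[:, None]) / P                               # g(beta | i), Bayes' rule
a = g @ P                                              # sum_i g(beta|i) P(i)

assert np.allclose(a, f)  # average of conditionals = unconditional density
```

In a real (continuous) mixed logit the sums become integrals, but the logic is identical; any gap between the averaged conditionals and the unconditional distribution in a sample reflects sampling noise or mis-specification.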
Here is the intuition: f(β) is the density of β over all people in the popula-
tion. The population consists of sub-populations of people, where each sub-
population contains people who make the same choices. g(β | i) is the density
of β in the sub-population of people who choose i. When you take the density
in each sub-population, and aggregate it over all the sub-populations, you get
back the density in the population.
This is the conditional distribution that you would calculate for a person,
given their attributes, the characteristics of the choices they faced, and their
observed choices. The average of the conditional densities within the sub-
population with the same s and x is:
a(β | s, x) = Σ_{i∈C} g(β | i, s, x) S(i | s, x).   (8.7)
where S(i | s, x) is the share of people with s and x who choose i. In a correctly
specified model, P(i | s, x) = S(i | s, x), such that, using the same steps as above:
Other averages can also be calculated. The average of the conditional densities
within the sub-population of people with the same attributes s but different x
is:
a(β | s) = ∫_x Σ_{i∈C} g(β | i, s, x) S(i | s, x) q(x) dx = f(β | s).   (8.9)
That is, the average of the conditional distributions within each demographic
group (i.e., attributes s) is the unconditional distribution within that group.
The average of the conditional densities over the entire population is:
a(β) = ∫_s f(β | s) m(s) ds   (8.11)
= f(β),   (8.12)
where f(β) is the density of β in the entire population, aggregated over the
attributes of people.
Essentially, the conditional distributions provide no new information about
the population. The conditioning just breaks the population into sub-groups
and finds the distribution in each sub-group. But the sub-groups necessarily
aggregate back to the population.
When the aggregation is done over a sample (as opposed to the population,
as in the above derivations), then the sample average of conditionals might
not match the unconditional, because the sample does not capture the full
integration over the density of s and x, and the sample share choosing i need
not be exactly the population share. However, the difference just represents
sampling noise (and/or mis-specification, as discussed above). It does not
provide additional or alternative information about the distribution of β in the
population.
the set of choice alternatives. RRmax denotes the original model specification
in which regret is judged against the best alternative for each attribute
separately. RRsum denotes the specification that defines regret as the max-
imum utility differences between the chosen alternative and all forgone alter-
natives that result in a higher utility on the attribute of interest. RRexp
represents the most recent "new regret" specification, based on the logarithm
function and all pairwise comparisons.
The RRexp form came about as a result of applications in a multiple choice
setting (see Chorus 2010), when it was found that the max operators imply
a non-smooth likelihood function, which may create problems in deriving
marginal effects and elasticities. Consequently the original regret specification
(RRMax) was replaced by the RRexp new regret model specification (set out
in the following paragraphs and the empirical application in Nlogit in
Chapter 13):
RRmax_n = Σ_{k=1}^{K} max_{n′≠n∈C} [ max{0, β_k (x_k^{n′} − x_k^n)} ]   (8.13a)

RRsum_n = Σ_{n′≠n∈C} Σ_{k=1}^{K} max{0, β_k (x_k^{n′} − x_k^n)}   (8.13b)

RRexp_n = Σ_{n′≠n∈C} Σ_{k=1}^{K} ln[1 + exp{β_k (x_k^{n′} − x_k^n)}]   (8.13c)
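A small sketch of the RRexp specification, with hypothetical attribute levels and taste parameters, shows how pairwise, attribute-by-attribute regret is accumulated and converted into choice probabilities:

```python
import numpy as np

# Hypothetical choice set: 3 alternatives (rows), 2 attributes (columns).
X = np.array([[10.0, 4.0],
              [12.0, 2.0],
              [ 8.0, 5.0]])
beta = np.array([-0.2, -0.4])   # illustrative taste parameters

def rr_exp(X, beta):
    """Systematic regret for each alternative: the sum over rival
    alternatives and attributes of ln(1 + exp(beta_k * (x_rival - x_own)))."""
    J = X.shape[0]
    R = np.zeros(J)
    for n in range(J):
        for rival in range(J):
            if rival != n:
                R[n] += np.log1p(np.exp(beta * (X[rival] - X[n]))).sum()
    return R

R = rr_exp(X, beta)
P = np.exp(-R) / np.exp(-R).sum()   # regret minimization: P_n proportional to exp(-R_n)
```

Because regret is minimized rather than utility maximized, the probabilities load the negative of R_n into the familiar logit form.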
alternative with each of the other alternatives in the choice set.1 The level of
binary regret associated with comparing the considered alternative i with
another alternative j equals the sum of the regrets that are associated with
comparing the two alternatives in terms of each of their M attributes. This
attribute level regret in turn is formulated as follows:
R^m_{i↔j} = ln[1 + exp(β_m · {x_jm − x_im})].   (8.14)
This formulation implies that regret is close to zero when alternative j performs
(much) worse than i in terms of attribute m, and that it grows as an approxi-
mately linear function of the difference in attribute values where i performs
worse than j in terms of attribute m. In that case, the estimable parameter βm
(for which the sign is also estimated) gives the approximation of the slope
of the regret-function for attribute m. See Figure 8.1 for a visualization.
In combination, this implies the following formulation for systematic
regret: R_i = Σ_{j≠i} Σ_{m=1..M} ln(1 + exp[β_m · (x_jm − x_im)]). Acknowledging that
minimization of random regret is mathematically equivalent to maximizing
the negative of random regret, choice probabilities may be derived using a
variant of the multinomial logit formulation2: the choice probability asso-
ciated with alternative i equals P_i = exp(−R_i) / Σ_{j=1..J} exp(−R_j).
The parameters estimated within an RRM framework have a different
meaning from those estimated within a RUM framework. The RUM para-
meters represent the contribution of an attribute to an alternative’s utility,
whereas the RRM parameters represent the potential contribution of an
attribute to the regret associated with an alternative. An attribute’s actual
contribution to regret depends on whether an alternative performs better or
worse on the attribute than the alternative it is compared with. As a result, in
contrast with linear additive utilitarian choice models, the RRM model implies
semi-compensatory behavior. This follows from the convexity of the regret
1 This heuristic across alternatives has similar behavioral properties to the parameter-transfer rule
advocated by Hensher and Layton (2010) within an alternative. Furthermore, the symmetrical form
devised initially by Quiggin with regret and rejoice has similar properties to the best–worse (BW)
processing rule that focuses on contrasts between alternatives (see Marley and Louviere 2005).
2 Note that, as has been formally shown in Chorus (2010), the two models (RRM and RUM) give identical
results when choice sets are binary. A referee asked whether the findings would be sensitive to six alternatives
in contrast to the three alternatives used herein. We are unable to provide a definitive response since it will
depend on a number of considerations, including whether the differences in attribute levels between pairs of
alternatives are likely to vary significantly or not. This is an area of relevance in future research.
Figure 8.1 Attribute level regret R^m_{i↔j} as a function of the attribute difference (x_jm − x_im)
We formally derive the formula for the direct elasticity, which has not been derived or
implemented in previous papers on RRM. The definition of Ri is:
R_i = Σ_{j≠i} Σ_{m=1}^{M} ln{1 + exp[β_m (x_jm − x_im)]}.   (8.15)
To simplify Equation (8.15), we add and then subtract the j = i term in the
outer sum; each added term equals ln(1 + exp(0)) = ln 2, which accounts for
the M ln 2 correction. This gives Equation (8.16):
R_i = { Σ_{j=1}^{J} Σ_{m=1}^{M} ln{1 + exp[β_m (x_jm − x_im)]} } − M ln 2.   (8.16)
By definition:
P_i = exp[−R_i] / Σ_{j=1}^{J} exp[−R_j].   (8.17)
= β_m Σ_{j=1}^{J} q(j, i, m),

∂ ln P_i / ∂x_lm = β_m [ Σ_{j=1}^{J} P_j q(j, i, m) − q(l, i, m) ].   (8.20)
Combining terms, the first part of Equation (8.17) is common to both l = i (own
elasticities) and l ≠ i (cross-elasticities), while the second term in Equation
(8.17) involves either the second or the first term in Equation (8.18), respec-
tively. The elasticity, ∂lnPi/∂lnxlm, is then a simple multiplication of Equation
(8.17) or Equation (8.19) by xlm. One oddity, unfortunately, is that the sign
results that hold for the MNL are not ensured here. The elasticities appear to be
reasonably well behaved; however, some peculiar sign reversals can occur.
8.3 Endogeneity
for, then one or more variables may end up in both V and ε and hence both
terms are no longer orthogonal. For example, if there is a price/quality trade-
off, and only price appears in V, then the interaction between price and quality
resides in the ε. Then price is in both V and ε and the two are no longer
independent. A useful paper on this topic is Petrin and Train,
http://elsa.berkeley.edu/~train/petrintrain.pdf. This issue can occur for any variable.
Some analysts (and journal referees) often have a broader definition of
endogeneity. Take the example of mode choice and crowding on public
transport. Some analysts interpret endogeneity as relating to choices – that
is, the issue of crowding occurs because people choose to travel, and one is
modeling mode choice (the dependent variable) as a function of choice
(leading to crowding), so that choices are on both the LHS and RHS of the
equation. In our opinion, this is an invalid inference. Yes, crowding occurs due
to people’s choices; however, you are modeling respondent preferences in
response to other people’s choices, not their own. So while the system may
have an inherent endogeneity concern, a particular individual’s preferences
are formed on the basis of crowding being exogenous.
Not only is there a distinction between the forms that elasticities may take;
there is also a distinction in how one may calculate the elasticity
for an attribute or SEC. The two main methods of calculation are the arc
elasticity method and the point elasticity method. We will ignore the
differences between the two estimation methods for the present and note
that the default Nlogit output (see Chapter 13) is a point elasticity (except
where a dummy variable is used, in which case an arc elasticity is provided,
based on the average of the before and after probabilities and attribute
levels). We discuss arc elasticities in Chapter 13 and how they can be derived
for any measurement unit (e.g., ratio or ordinal) using Nlogit’s simulation
capability.
The direct point elasticity for the MNL model is given as Equation (8.22),
based on the definition of the partial effect (the derivative of the choice
probability with respect to X):

∂P_iq / ∂X_ik = ∂/∂X_ik [ exp(V_iq) / Σ_m exp(V_mq) ]

= [ exp(V_iq) (∂V_iq/∂X_ik) Σ_m exp(V_mq) − exp(V_iq) exp(V_iq) (∂V_iq/∂X_ik) ] / [ Σ_m exp(V_mq) ]²

= P_iq (1 − P_iq) ∂V_iq/∂X_ik,   (8.21)

so that, with the elasticity defined as

E_{X_ikq}^{P_iq} = (∂P_iq / ∂X_ikq)(X_ikq / P_iq),   (8.22)
E_{X_ikq}^{P_iq} = β_ik X_ikq (1 − P_iq).   (8.23)
Examination of the subscripts used within Equation (8.24) will reveal that
the cross-point elasticity is calculated for alternative j independent of alter-
native i. As such, the cross-point elasticities with respect to a variable associated
with alternative j will be the same for all j, j ≠ i and, as a consequence, a choice
model estimated using MNL will display uniform cross-elasticities across all j, j
≠ i. This property relates to the IID assumption underlying the MNL model.
More advanced models (such as those described in later chapters) which relax
the IID assumption use different formulae to establish elasticities, and as such
allow for non-uniform cross-elasticities to be estimated. Equations (8.22) and
(8.24) yield elasticities for each individual decision maker.
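The direct point elasticity formula can be verified against a numerical derivative (an illustrative sketch with hypothetical attribute levels and a single generic taste parameter, not the book's data):

```python
import numpy as np

beta = -0.15                       # hypothetical taste parameter
X = np.array([10.0, 12.0, 8.0])    # attribute levels of three alternatives

def probs(X):
    e = np.exp(beta * X)
    return e / e.sum()             # MNL probabilities

P = probs(X)
analytic = beta * X[0] * (1 - P[0])   # direct elasticity, as in Eq. (8.23)

h = 1e-6
Xh = X.copy(); Xh[0] += h
numeric = (probs(Xh)[0] - P[0]) / h * X[0] / P[0]   # (dP/dX)(X/P)

assert abs(analytic - numeric) < 1e-4
```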
To calculate sample elasticities (noting that the MNL choice model is
estimated on sample data), the analyst may either (1) utilize the sample
average Xik and average estimated Pi for the direct point elasticity, and Xjk
and average estimated Pj for the direct cross-elasticities, or (2) calculate the
elasticity for each individual decision maker and weight each individual
elasticity by the decision maker’s choice probability associated with a specific
alternative (this last method is known as probability weighted sample enu-
meration (PWSE), and uses ;pwt in Nlogit). Alternative aggregation method
(3), known as naive pooling, is to calculate the elasticity for each individual
decision maker but not weight each individual elasticity by the decision
maker’s associated choice probability.
Louviere et al. (2000) warn against using aggregation methods (1) and (3). They
reject method (1) when using logit choice models due to the non-linear nature
of such models, which means that the estimated logit function need not pass
through the point defined by the sample averages. Indeed, they report that
this method of obtaining aggregate elasticities may result in errors of up to
The use of PWSE has important ramifications for the direct cross-elasticities
estimated. Because uniform cross-elasticities are observed as a result of the IID
assumption when calculated for individual decision makers, the use of sample
enumeration, which weights each individual decision maker differently, will
produce non-uniform cross-elasticities. Naive pooling, which does not weight
each individual elasticity by the decision maker's associated choice probability, will,
however, display uniform cross-elasticities. Analysts should not be concerned that
the sample cross-elasticities for an attribute differ between pairs of alternatives; the
individual-level cross-elasticities are strictly identical for the IID model.
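The difference between naive pooling and PWSE can be seen in a few lines (hypothetical data: three respondents, two alternatives, one attribute):

```python
import numpy as np

beta = -0.15                   # hypothetical generic taste parameter
X = np.array([[10.0, 12.0],    # one row per respondent, one column per alternative
              [14.0,  9.0],
              [11.0, 11.0]])

V = beta * X
P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)
E = beta * X[:, 0] * (1 - P[:, 0])   # individual direct elasticities for alt 1

naive = E.mean()                              # method (3): naive pooling
pwse = (E * P[:, 0]).sum() / P[:, 0].sum()    # method (2): PWSE (;pwt in Nlogit)
print(naive, pwse)
```

The two aggregates differ because PWSE gives more weight to respondents with higher choice probabilities for the alternative in question.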
Independent of how the elasticities are calculated, the resulting values are
interpreted in exactly the same manner. For direct elasticities, we interpret the
calculated elasticity as the percentage change of the choice probability for
alternative i given a 1 percent change in Xik. For cross-elasticities, we interpret
the calculated elasticity as the percentage change of the choice probability for
alternative j given a 1 percent change in Xik. If the percentage change in the
probability for either the direct or cross-elasticity is observed to be greater
than 1, that elasticity is said to be relatively elastic. If the percentage change in
the probability for either the direct or cross-elasticity is observed to be less
than 1, that elasticity is said to be relatively inelastic. If a 1 percent change in a
choice probability is observed given a 1 percent change in Xik, then the
elasticity is described as being unit elastic. Table 8.1 summarizes each sce-
nario, including the impact on revenue (or cost) given that Xik is the price of
alternative i, noting that e ¼ EXP jkq
i
.
Table 8.1 Relationship between elasticity of demand, change in price and revenue
[Table 8.1 cross-tabulates the absolute value of the elasticity observed against the direct elasticity, the cross-elasticity, and the revenue effect of a price increase or decrease; only the perfectly elastic row is recoverable from this copy: an ∞ percent decrease in P_i (direct) and an ∞ percent increase in P_j (cross).]
M_{X_ikq}^{P_iq} = ∂P_iq / ∂X_ikq.   (8.26)

It can be shown that at the level of the individual decision maker, Equation
(8.26) is equivalent to Equation (8.27) when calculating a direct marginal
effect:

M_{X_ikq}^{P_iq} = ∂P_iq / ∂X_ikq = P_iq [1 − P_iq] β_k.   (8.27)
It can also be shown that for cross-marginal effects, Equation (8.26) becomes
Equation (8.28) at the level of the individual decision maker:
M_{X_jkq}^{P_iq} = −β_jk P_jq P_iq.   (8.28)
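Both marginal effect formulas can be checked against numerical derivatives (an illustrative sketch with hypothetical values, not tied to the book's data):

```python
import numpy as np

beta = -0.15                       # hypothetical generic parameter
X = np.array([10.0, 12.0, 8.0])    # attribute levels of three alternatives

def probs(X):
    e = np.exp(beta * X)
    return e / e.sum()

P = probs(X)
direct = P[0] * (1 - P[0]) * beta   # dP_1/dX_1: direct marginal effect
cross = -P[0] * P[1] * beta         # dP_1/dX_2: cross marginal effect

h = 1e-6
Xa = X.copy(); Xa[0] += h
Xb = X.copy(); Xb[1] += h
assert abs((probs(Xa)[0] - P[0]) / h - direct) < 1e-5
assert abs((probs(Xb)[0] - P[0]) / h - cross) < 1e-5
```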
As an aside, to see why the marginal effect for a categorical variable is calculated
differently from that of a continuous variable, note that marginal effects are mathematically
equivalent to the slopes of lines tangent to the cumulative probability curve for the variable
for which the marginal effect is being calculated, as taken at each distinct value of that
variable (Powers and Xie 2000). We show this in Figure 8.2 for an individual decision
maker. Given that the tangent can be taken at any point along the cumulative distribution
for the variable, xi, our earlier discussion with regard to the use of the sample means,
sample enumeration or naive pooling is of particular importance, as it is these approaches
which dictate where on the cumulative distribution curve that the tangent (i.e., the
marginal effect) is calculated.
For categorical variables, a cumulative distribution function curve may be drawn for each
level that the variable of concern may take. We show this for a dummy coded (0, 1) variable
in Figure 8.3, in which two curves are present. As with continuous level data, the marginal
effects as given by the tangents to the cumulative distribution functions are not constant over
the range of the variable xi. However, as suggested by Figure 8.3, the maximum difference
between the cumulative distribution function for the two levels occurs at Prob(Y=1) = 0.5. It
is at this point that many researchers calculate the marginal effects (i.e., the tangents to the
curves).
Figure 8.2 Marginal effects as the slopes of the tangent lines to the cumulative probability curve
Figure 8.3 Marginal effects for a categorical (dummy coded) variable
Figure 8.4 Trading a change in attribute x_k (Δx_k) against a compensating change in cost (ΔCost)
Using the trade-off between travel time and travel cost as an example (see
Figure 8.4), the marginal WTP as the measure of value of time savings (VTTS)
describes by how much the cost attribute, xc, would need to change, given a
1 unit change in an attribute, xk, such that the change in total utility will be
zero. The marginal WTP is calculated by taking the ratio of the derivatives of
both the attribute of interest and cost, which in the case of a linear in the
attributes indirect utility specification is given by Equation (8.29):
WTP_k = Δx_k / Δx_c = (∂V_nsj / ∂x_k) / (∂V_nsj / ∂x_c) = β_k / β_c,   (8.29)
where Vnsj is the utility for respondent n in choice task s for alternative j, and
βk and βc are the marginal (dis)utilities for the attribute of interest and cost,
respectively.
Sometimes an attribute is expressed in natural logarithmic form. When this
occurs, there is an additional non-linearity that requires a different treatment
when taking derivatives. For example, if xk is defined as ln(xk), then Equation
(8.29) becomes Equation (8.30):
WTP = [∂(β_k ln(x_k)) / ∂x_k] / [∂(β_c x_c) / ∂x_c] = (β_k x_k^{−1}) / β_c = β_k / (β_c x_k).   (8.30)
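Numerically, the contrast between Equations (8.29) and (8.30) is that a linear attribute yields a constant WTP while a log-transformed attribute yields a WTP that declines with the attribute level (hypothetical parameter values):

```python
# Hypothetical time and cost parameters (utility per minute and per $).
beta_k, beta_c = -0.046, -0.180

wtp_linear = beta_k / beta_c                 # Eq. (8.29): constant $/min
wtp_log = lambda xk: beta_k / (beta_c * xk)  # Eq. (8.30): depends on x_k

print(wtp_linear, wtp_log(10.0), wtp_log(60.0))
```

Under the log specification, a minute saved on a 60-minute trip is valued at one-sixth of a minute saved on a 10-minute trip.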
WTP = [∂(β_k x_k x_l) / ∂x_k] / [∂(β_c x_c) / ∂x_c] = β_k x_l / β_c.   (8.31)
Consider, for example, the utility function

V = … + β_1 x_k + β_2 x_k x_c + β_3 x_c² + … .

The mean WTP is given by (β̂_1 + β̂_2 x_c) / (β̂_2 x_k + 2 β̂_3 x_c), while the variance
can be computed by the delta method as:

var(WTP_k) = g′ · Ω · g, where g = (∂WTP_k/∂β_1, ∂WTP_k/∂β_2, ∂WTP_k/∂β_3)′

and Ω is the estimated covariance matrix of (β̂_1, β̂_2, β̂_3).
In Chapter 13, we use the Wald procedures (from Chapter 7) to illustrate how
the analyst can obtain the empirical estimates of the mean, the standard error,
and the confidence intervals.
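The delta method calculation for the interaction specification can be sketched as follows (all parameter values and the diagonal covariance matrix are hypothetical):

```python
import numpy as np

# V = ... + b1*x_k + b2*x_k*x_c + b3*x_c**2, so that
# WTP = (b1 + b2*x_c) / (b2*x_k + 2*b3*x_c).
b = np.array([-0.05, 0.002, -0.001])               # (b1, b2, b3), hypothetical
omega = np.diag([0.01**2, 0.0005**2, 0.0002**2])   # assumed covariance matrix
xk, xc = 20.0, 5.0

num = b[0] + b[1] * xc
den = b[1] * xk + 2 * b[2] * xc
wtp = num / den

grad = np.array([1 / den,                       # dWTP/db1
                 xc / den - num * xk / den**2,  # dWTP/db2
                 -2 * xc * num / den**2])       # dWTP/db3
var = grad @ omega @ grad                       # delta method variance
se = np.sqrt(var)
```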
3 The significance of an ASC related to an unlabeled alternative simply implies that after controlling for the
effects of the modeled attributes, this alternative has been chosen more or less frequently than the base
alternative. It is possible that this might be the case because the alternative is close to the reference
alternative, or that culturally, those undertaking the experiment tend to read left to right. Failure to
estimate an ASC would in this case confound the alternative order effect with the other estimated
parameters, possibly distorting the model results.
For SC alternative j (where j≠r), the observed utility function is given by:
$$V_{j,\mathrm{new}} = \delta_j + \delta_{\mathrm{Toll}(j)} + \delta_{\mathrm{FC}(j)} + \beta_{\mathrm{FF(inc)}}\max(\mathrm{FF}_j - \mathrm{FF}_r, 0) + \beta_{\mathrm{FF(dec)}}\max(\mathrm{FF}_r - \mathrm{FF}_j, 0) + \ldots$$
This specification is obtained through taking differences for the four attributes
relative to the reference alternative, where separate coefficients are estimated
for increases (inc) and decreases (dec), hence allowing for asymmetrical
responses. The resulting model structure is still very easy to estimate and
also apply, which is crucial for practical large-scale modeling analyses.
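The gain/loss splitting described above amounts to constructing two new regressors per attribute, one for increases and one for decreases relative to the reference level. A minimal sketch (the free-flow time values are hypothetical):

```python
# Split an attribute into separate 'increase' and 'decrease' regressors
# relative to the reference alternative, as in the asymmetric specification.
def asymmetric_split(x_alt, x_ref):
    """Return (increase, decrease) components of x_alt relative to x_ref."""
    inc = max(x_alt - x_ref, 0.0)   # positive only when the SC level is higher
    dec = max(x_ref - x_alt, 0.0)   # positive only when the SC level is lower
    return inc, dec

# Example: free-flow time of 25 min in the SC alternative vs a 20 min reference
print(asymmetric_split(25.0, 20.0))  # (5.0, 0.0)
print(asymmetric_split(18.0, 20.0))  # (0.0, 2.0)
```

Exactly one of the two components is non-zero in any row, which is what lets separate coefficients capture asymmetric responses to gains and losses.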
A point that deserves some attention before describing the results of the
modeling analysis is the way in which the models deal with the repeated choice
nature of the data. Not accounting for the possible correlation between the
behavior of a given respondent across the individual choice situations can
potentially have a significant effect on model results, especially in terms of
biased standard errors. In an analysis looking at differences between the
responses to gains and losses, issues with over- or under-estimated standard
errors can clearly lead to misleading conclusions.
Rather than relying on the use of a lagged response formulation (cf., Train
2003) or a jackknife correction approach (cf., Cirillo et al. 2000), we can make
use of an error components specification of the mixed logit (MMNL) model
(see Section 15.8 of Chapter 15)4 to account for individual-specific correlation.
With Vn,t,RP,base, Vn,t,SP1,base, and Vn,t,SP2,base giving the base utilities for the
three alternatives5 for respondent n and choice situation t, the final utility
function (for respondent n and choice situation t) is given by Equation (8.35)
for the reference alternative and two stated preference alternatives:
4 Our method differs from the commonly used approach of capturing serial correlation with a random
coefficients formulation, where tastes are assumed to vary across respondents but remain constant across
observations for the same respondent. This approach not only makes the considerable assumption of an
absence of inter-observational variation (cf. Hess and Rose 2007), but the results are potentially also
affected by confounding between serial correlation and random taste heterogeneity.
5 Independently of which specification is used, i.e., models based on Equation (8.33).
where εn,k,RP, εn,k,SP1, and εn,k,SP2 are the IID draws from a Type I Extreme
value distribution, and ξn,RP, ξn,SP1 and ξn,SP2 are draws from three indepen-
dent Normal variates with a zero mean and a standard deviation of 1. To allow
for correlation across replications for the same individual, the integration over
these latter three variates is carried out at the respondent level rather than the
individual observation level. However, the fact that independent N(0,1) draws
are used for different alternatives (i.e., ξn,RP, ξn,SP1 and ξn,SP2) means that the
correlation does not extend to correlation across alternatives but is restricted
to correlation across replications for the same individual and a given alter-
native. Finally, the fact that the separate error components are distributed
identically means that the model remains homoskedastic.
Letting jn,t refer to the alternative chosen by respondent n in choice situa-
tion t (with t = 1,. . .,T), the contribution of respondent n to the log-likelihood
(LL) function is then given by:
$$LL_n = \ln\left(\int_{\xi_n} \prod_{t=1}^{T} P\big(j_{n,t}\,\big|\,V_{n,t,\mathrm{RP,base}},\, V_{n,t,\mathrm{SP1,base}},\, V_{n,t,\mathrm{SP2,base}},\, \xi_{n,\mathrm{RP}},\, \xi_{n,\mathrm{SP1}},\, \xi_{n,\mathrm{SP2}},\, \theta\big)\, f(\xi_n)\, d\xi_n\right), \quad (8.36)$$
where ξ groups together ξn,RP, ξn,SP1, and ξn,SP2, and where f(ξn) refers to the
joint distribution of the elements in ξ, with a diagonal covariance matrix.
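The integral in Equation (8.36) has no closed form, so in practice it is approximated by simulation: average the product of choice probabilities over repeated draws of the error components, holding each draw fixed across a respondent's choice situations. The sketch below uses one generic scale parameter θ on the error components and purely hypothetical base utilities and chosen alternatives:

```python
import math
import random

# Simulated log-likelihood contribution of one respondent (cf. Equation (8.36)),
# approximating the integral over the error components with R pseudo-random draws.
# Base utilities, theta, and the chosen alternatives are hypothetical.
random.seed(0)

def respondent_ll(v_base, chosen, theta, n_draws=500):
    """v_base: dict alt -> list of base utilities over T choice situations.
    chosen: list of chosen alternatives, one per choice situation."""
    alts = list(v_base)
    total = 0.0
    for _ in range(n_draws):
        # One N(0,1) draw per alternative, held fixed across all T situations,
        # so correlation is across replications, not across alternatives.
        xi = {a: random.gauss(0.0, 1.0) for a in alts}
        prob = 1.0
        for t, j in enumerate(chosen):
            util = {a: v_base[a][t] + theta * xi[a] for a in alts}
            denom = sum(math.exp(u) for u in util.values())
            prob *= math.exp(util[j]) / denom   # logit probability of chosen alt
        total += prob
    return math.log(total / n_draws)

v = {"RP": [0.5, 0.5], "SP1": [0.2, 0.4], "SP2": [0.0, 0.1]}
print(respondent_ll(v, ["RP", "SP1"], theta=1.0))
```

Integrating at the respondent level (the draw is outside the loop over t) is what induces correlation across replications for the same individual and alternative.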
Using the example for VTTS, in Table 8.2 we summarize the trade-offs
between the various estimated parameters, giving the monetary values of
changes in travel time, as well as the WTP a bonus in return for avoiding
congestion and road tolls. These trade-offs were calculated separately for the
(Notes to Table 8.2: (1) numerator of trade-off not significant beyond the 4 percent level of confidence; (2) numerator of trade-off not significant beyond the 93 percent level of confidence.)
travel cost and road toll coefficient, where the low level of differences needs to
be recognized when comparing the results. The main differences between
the two sets of trade-offs and across the two population segments arise in the
greater willingness by commuters to accept increases in road tolls, and the
higher sensitivity to slowed-down time for commuters.
In an asymmetrical model, the calculation is slightly different, as we now
have separate parameters for increases and decreases, suggesting different
possible combinations of VTTS calculations. As an example, the willingness
to accept increases in travel cost in return for reductions in free flow time
would be given by −βFF(dec)/βC(inc). This approach was used to calculate
WTP indicators for the two components of travel time with the two separate
cost components, where trade-offs were also calculated for δFC and δT. The
results of these calculations are summarized in Table 8.3.
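The WTP indicators in the asymmetrical model are simple ratios of the estimated coefficients, such as −βFF(dec)/βC(inc) above. The coefficient values below are hypothetical, chosen only to show the arithmetic and sign convention:

```python
# Willingness to accept a travel-cost increase per minute of free-flow time
# saved: -beta_ff_dec / beta_c_inc. Coefficient values are hypothetical.
beta_ff_dec = 0.06   # utility gain per minute of free-flow time decrease
beta_c_inc = -0.40   # utility loss per dollar of cost increase

wta = -beta_ff_dec / beta_c_inc   # dollars per minute
print(wta * 60)                   # expressed in dollars per hour
```

The leading minus sign converts the ratio of a positively signed "decrease" coefficient and a negatively signed "increase" coefficient into a positive monetary trade-off.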
In comparison with the results for the base model, there are some signifi-
cant differences. The willingness to accept (WTA) increases in travel cost in
return for reductions in free flow time decreases by 25 percent and 45 percent
for non-commuters and commuters, respectively. Even more significant
decreases (47 percent and 60 percent) are observed when looking at the
WTA increases in road tolls. While the WTA increases in travel cost in return
for reductions in slowed-down time stays almost constant for non-
commuters, it decreases by 17 percent for commuters (when compared to
the base model). When using road tolls instead of travel cost, there are
decreases in both population segments, by 26 percent and 39 percent, respec-
tively. These differences are yet another indication of the effects of allowing for
asymmetrical response rates.
Part II
Software and data
Chapter 9
Nlogit for applied choice analysis
Programming today is a race between software engineers striving to build bigger and
better idiot-proof programs, and the Universe trying to produce bigger and better
idiots. So far, the Universe is winning.
(Cook, The Wizardry Compiled 1989)
9.1 Introduction
This book uses the computer program, Nlogit, which will enable you to
explore the models that are discussed in the book using your own computer.
Nlogit is a major commercial package published by Econometric Software,
Inc. (ESI), which is used worldwide by discrete choice modelers in many
disciplines such as Transport, Economics, Agriculture, Health, Marketing,
Statistics, and all the social sciences (you might want to visit the website,
www.NLOGIT.com). This chapter will describe how to install and use this
program on your computer.
Nlogit also provides a non-linear random parameters capability, in which users write out their own functional forms (non-linear in parameters and variables) for estimation as a logit model (see Chapter 19), and, in addition, some tools for analyzing discrete choice models, such as the model simulator described elsewhere in this book.
Figure 9.2 File Menu on Main Desktop and Open Project... Explorer
Figure 9.4 Dialog for Exiting Nlogit and Saving the Project File
Once you have started the program and input your data, you are ready to
analyze them. The functions you will perform with Nlogit will include many
options such as the following:
a. Compute new variables or transform existing variables.
b. Set the sample to use particular sets of observations in your analyses.
drop, etc. Use File, New then OK (assuming Text/Command Document is high-
lighted as in Figure 9.5) to open the text editing screen. This will appear as in
Figure 9.6. You are now ready to type your instructions. Figure 9.7 shows some
examples (the format of the instructions is discussed below and elsewhere in the
book). Typing an instruction in the editor is the first step in getting your
command carried out. You must then “submit” the instruction to the program.
This is done in two ways, both using the “Go” button that is marked in Figure 9.7.
To submit a single line of text to the program for execution, put the blinking text
cursor on that line, then with your mouse, press the Go button. The instruction
will then be executed (assuming it has no errors in it). To submit more than one
line at the same time (or one line), highlight the lines as you would in any word
processor, then, again, press “Go.”
or capital letters, and you may place spaces anywhere you wish. An instruction
may use as many lines as desired. The general format of a command is:
VERB; other information . . .$
The command always begins with a verb followed by a semicolon and always
ends with a dollar sign ($). Commands often give several pieces of informa-
tion. These are separated by semicolons. For example, when you wish to
compute a regression, you must tell Nlogit what the dependent (LHS) and
independent (RHS) variables are. You might do this as follows:
REGRESS ; LHS = y ; Rhs = One,X $
The order of command parts generally does not matter either – the RHS
variables could appear first. The other element of commands that you need at
this point is the naming convention. Nlogit operates on variables in your data
set. Variables all have names, of course. In Nlogit, variable names must have
one to eight characters, must begin with a letter, and may use only letters,
digits, and the underscore character.
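The naming rule just stated (one to eight characters, beginning with a letter, using only letters, digits, and the underscore) can be expressed as a simple check. This is a sketch of the rule as described in the text, not part of Nlogit itself:

```python
import re

# Validate a variable name against the rule described in the text:
# 1-8 characters, starting with a letter, then letters, digits, or underscore.
NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,7}$")

def valid_name(name):
    return bool(NAME_RE.match(name))

print(valid_name("ttime"))      # True
print(valid_name("comfort_1"))  # False: nine characters
print(valid_name("1time"))      # False: starts with a digit
```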
9.5.3 Commands
Nlogit recognizes hundreds of different commands, but for your purposes,
you will only need a small number of them – they are differentiated by the
verbs. The functions of the program that you will use (once your data are read
in and ready to analyze) are as follows. Note that the actual command is
written in boldface below. Comments appear after the $ character. If you
actually include these in your commands, the comments that follow the $ are
ignored by the program.
(1) Data analysis
DSTATS ; Rhs = the list of variables $ For descriptive statistics.
REGRESS ; Lhs = dependent variable ; Rhs = independent variables $
Note: ONE is the term used for the constant term in a regression. Nlogit does not put one in automatically; you must request that a constant be estimated.
LOGIT ; Lhs = variable ; Rhs = variables $ For a binomial logit model.
As an aside, any text after a question mark (?) is not read by Nlogit. This allows the analyst to
make comments in the command editor that may be useful in future sessions. Also note that
spacing of characters in the commands typed is not essential (i.e., the words may run into
each other); however, the use of spacing often makes it easier to follow what the command
is, and will help in locating errors.
As an aside, the .SAV file extension, the file extension name used by SPSS, was also
historically used by Nlogit. Nlogit no longer uses this file extension. The two program files are
not compatible; hence those attempting to read data from SPSS must first save the data into
another format (such as .txt, .xls, etc.) before reading it into Nlogit. We do note an extremely
useful and not very expensive utility program, StatTransfer, that can be used to convert native
“save” files from dozens of programs into the format of the others. You can use StatTransfer
to convert an SPSS .SAV file to an Nlogit .LPJ file – the operation takes a few seconds.
The Project File Box contains several folders that will allow the analyst to
access various useful screens. Double clicking on the Data folder, for example,
will open up several other sub-folders, one of which is a Folder titled Variables
(Figure 9.7). Double clicking on the Variables folder will allow the analyst to
view the names of all the variables in the data set (including any that have been
created since reading in the data). By double clicking on any one of the
variable names, the Data Editor will be opened displaying up to 5,000 rows
of the data for all of the variables, in a spreadsheet format. (Nlogit makes no
claims to be a spreadsheet program. If spreadsheet capabilities are required,
the authors tend to use other programs such as Microsoft Excel or SPSS and
import the data into Nlogit.) Other useful Nlogit functions may be accessed
via the Project File Box, such as the scientific calculator.
command should look like and perhaps help in locating the original error.
Nevertheless, as noted earlier, the command toolbars do not always offer the
full range of outputs that may be generated using the Text/command editor
and, as such, this error locating strategy may not always be possible.
A tip: you can “copy” commands that are echoed by the dialog boxes in the output
window, and then paste them into your Text Editor. This is useful if you want to estimate a
basic model using the dialog box; then add features to it, which will be easier in the Text
Editor format.
The full Nlogit package includes many features not listed above. These include
nearly 200 varieties of regression models, models for count data, discrete
choice models such as probit and logit, ordered discrete data models, limited
dependent variable models, panel data models, and so on. There are also a
large variety of other features for describing and manipulating data, writing
programs, computing functions and partial effects, graphics, etc. You can
learn more about these on the website for the program, www.NLOGIT.com.
As noted earlier, Nlogit is an expanded version of Limdep. There are
signatures in the program that will indicate this to you, if you have any
doubt. You will be able to see the name Nlogit on the program icons, and
find some additional data in the Help About box after you start the program.
Chapter 10
Data set up for Nlogit
(As a quick aside, the reader does not have to use the attribute names we have used below. In
Nlogit names are limited to eight characters and must begin with an alpha code, but are
otherwise unrestricted.)
alt    id  alti  cset  choice  comfort1  comfort2  ttime
Car    01   1     4      1        1         0       14
Bus    01   2     4      0        1         0       12
Train  01   3     4      0       -1        -1       12
Plane  01   4     4      0        0         1        2
Car    01   1     4      0        0         1       10
Bus    01   2     4      1        0         1       14
Train  01   3     4      0        0         1       12
Plane  01   4     4      0       -1        -1        1.5
Car    02   1     4      0        0         1       12
Bus    02   2     4      0        1         0       14
Train  02   3     4      0       -1        -1       12
Plane  02   4     4      1        1         0        1.5
Car    02   1     4      0       -1        -1       12
Bus    02   2     4      0        0         1       12
Train  02   3     4      1       -1        -1       10
Plane  02   4     4      0       -1        -1        1.5
Car    03   1     4      0       -1        -1       12
Bus    03   2     4      1        1         0       14
Train  03   3     4      0        0         1       14
Plane  03   4     4      0        0         1        2
Car    03   1     4      1        1         0       14
Bus    03   2     4      0        0         1       10
Train  03   3     4      0        1         0       14
Plane  03   4     4      0       -1        -1        1.5
high-speed rail proposal, called the “Very Fast Train” (VFT), between Sydney and
Melbourne. If we retain the fixed choice set size of four alternatives then within
each choice set, one alternative will have to fall out. In Table 10.2 the first
decision maker was presented with a choice set which consisted of a choice
between travel using a car, a bus, a plane, or a VFT. The second choice set for
this decision maker consisted of the alternatives, car, bus, train, and plane.
The choice set size does not have to be a fixed size. The variable cset is
designed to inform Nlogit of the number of alternatives within a particular
choice set. In both Tables 10.1 and 10.2 the choice set sizes were fixed at four
alternatives. With revealed preference (RP) data, some alternatives may not be
present at a particular physical distribution point at the time of purchase
(choice). The stated preference (SP) equivalent is to use availability designs
(see Louviere et al. 2000, Chapter 5). In either case, the number of alternatives
present varies across choice sets. In Table 10.3 the first choice set has only
alt    id  alti  cset  choice  comfort1  comfort2  ttime
Car    01   1     4      1        1         0       14
Bus    01   2     4      0        1         0       12
Plane  01   4     4      0        0         1        2
VFT    01   5     4      0        1         0        8
Car    01   1     4      0        0         1       10
Bus    01   2     4      1        0         1       14
Train  01   3     4      0        0         1       12
Plane  01   4     4      0       -1        -1        1.5

alt    id  alti  cset  choice  comfort1  comfort2  ttime
Car    01   1     3      1        1         0       14
Bus    01   2     3      0        1         0       12
VFT    01   5     3      0        1         0        8
Car    01   1     5      0        0         1       10
Bus    01   2     5      1        0         1       14
Train  01   3     5      0        0         1       12
Plane  01   4     5      0       -1        -1        1.5
VFT    01   5     5      0       -1        -1        6
three of the five alternatives present, while the second choice set has all five
alternatives present. The variable cset, which is repeated in each row of data in
the choice set, gives the number of choices in the choice set. (In general, a
variable such as cset is needed only when choice sets have differing numbers of
choices. If the choice set always has the same number of choices, this will be
indicated to the program in a different way.)
The choice variable indicates which alternative within a choice set was
chosen. A “1” indicates that an alternative was selected, while a “0” indicates
that it was not. As such, the choice variable should sum to 1 within each choice set and, within an individual, to the number of choice sets given to that individual. Across individuals, this variable should sum to the
total number of choice sets. Returning to Table 10.1, decision maker one
chooses alternative one, car, in the first choice set and alternative two, bus, in
the second. Decision maker two chose plane and train, respectively.
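These consistency conditions on the choice variable can be checked mechanically before estimation. A sketch using a small hypothetical data set in the same long format as the tables above:

```python
from collections import defaultdict

# Check that the choice variable sums to 1 within each choice set.
# Rows: (person_id, choice_set, alternative, choice); data are hypothetical.
rows = [
    (1, 1, "car", 1), (1, 1, "bus", 0), (1, 1, "train", 0), (1, 1, "plane", 0),
    (1, 2, "car", 0), (1, 2, "bus", 1), (1, 2, "train", 0), (1, 2, "plane", 0),
    (2, 1, "car", 0), (2, 1, "bus", 0), (2, 1, "train", 0), (2, 1, "plane", 1),
]

per_set = defaultdict(int)
for pid, cs, alt, choice in rows:
    per_set[(pid, cs)] += choice

# Exactly one alternative chosen per choice set ...
assert all(total == 1 for total in per_set.values())
# ... so the grand total of the choice variable equals the number of choice sets.
print(len(per_set))               # number of choice sets
print(sum(c for *_, c in rows))   # should equal the number of choice sets
```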
As an aside, for situations where every observation has the exact same alternatives (and
listed in the same order), it is not necessary to define alti and cset. Nlogit will count the
number of alternatives based on the names assigned to the choice alternatives in the
Nlogit input command syntax and assume that each observation has these alternatives
and, most importantly, that the alternatives are in the same order in each decision maker’s
choice sets.
The last three variables in our mock data set require some explanation.
Taking the case of comfort, we began with only one comfort attribute, but in
our data set we have two comfort variables. Comfort, being a qualitative
attribute, requires that words rather than numbers be attached as descriptors
at the time of survey. For analytical purposes, we are required to numerically
code these word descriptors. One possible way of coding qualitative data is to
attach a unique numeric value for each level of the attribute within one
variable. Thus, assuming three levels of comfort (low, medium, and high),
we could create a single variable (call it comfort) such that low = 0, medium = 1
and high = 2 (note that any unique values could have been used). Taking this
coding structure, for decision maker one, Table 10.1 becomes Table 10.4.
The reason we do not code qualitative (or any classification) data in the
manner suggested by Table 10.4 is simple. The use of such a coding structure
unnecessarily ascribes a linear relationship to the effects of the levels of the
attribute. That is, at the time of modeling, we will derive a single parameter
associated with the attribute comfort. Note that each alternative will have its
own β parameter if we allow for an alternative-specific model specification (as
discussed in Chapter 3). This is a problem that led us to effects and dummy
coding in Chapter 3.
alt    id  alti  cset  choice  comfort  ttime
Car    01   1     4      1        0      14
Bus    01   2     4      0        0      12
Train  01   3     4      0        2      12
Plane  01   4     4      0        1       2
Car    01   1     4      0        1      10
Bus    01   2     4      0        1      14
Train  01   3     4      0        1      12
Plane  01   4     4      1        2       1.5
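Effects coding replaces the single linearly coded column with two columns for a three-level attribute. The particular mapping of levels to codes below is illustrative (the base level takes −1 in both columns, as in the effects-coded tables above):

```python
# Effects-code a three-level qualitative attribute into two columns.
# The mapping of levels to codes is illustrative; the base level ("low")
# takes -1 in both columns.
EFFECTS = {"high": (1, 0), "medium": (0, 1), "low": (-1, -1)}

comfort = ["high", "low", "medium", "high"]
coded = [EFFECTS[level] for level in comfort]
print(coded)   # [(1, 0), (-1, -1), (0, 1), (1, 0)]
```

With two coded columns, each non-base level gets its own parameter, so no linear relationship is imposed across the levels.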
We did not effects code the travel time attribute. This is not to suggest that
we could not have effects or dummy coded the variable (by partitioning it into
a series of ranges). Indeed, to test for non-linear effects over the range of the
attribute, it is sometimes necessary to do so. Nevertheless doing so at this stage
will not add to our general discussion, and hence we will assume linearity over
the ranges of the travel time attributes.
Thus far we have said little about socio-demographic characteristics
(SECs). The SECs of a decision maker are invariant across decisions
provided that there is not a significant time lapse involved in the decision
making process. As such, when we enter socio-demographic data, the levels
of the data are constant for an individual (but vary across individuals). For
our example, let us assume that we have collected data on the age of each
decision maker. We show how this age variable is entered into Nlogit in
Table 10.5. Other socio-demographic characteristics would be entered in a
similar fashion.
alt    id  alti  cset  choice  comfort1  comfort2  ttime  age
Car    01   1     4      1        1         0       14     40
Bus    01   2     4      0        1         0       12     40
Train  01   3     4      0       -1        -1       12     40
Plane  01   4     4      0        0         1        2     40
Car    01   1     4      0        0         1       10     40
Bus    01   2     4      1        0         1       14     40
Train  01   3     4      0        0         1       12     40
Plane  01   4     4      0       -1        -1        1.5   40
Car    02   1     4      0        0         1       12     32
Bus    02   2     4      0        1         0       14     32
Train  02   3     4      0       -1        -1       12     32
Plane  02   4     4      1        1         0        1.5   32
Car    02   1     4      0       -1        -1       12     32
Bus    02   2     4      0        0         1       12     32
Train  02   3     4      1       -1        -1       10     32
Plane  02   4     4      0       -1        -1        1.5   32
Car    03   1     4      0       -1        -1       12     35
Bus    03   2     4      1        1         0       14     35
Train  03   3     4      0        0         1       14     35
Plane  03   4     4      0        0         1        2     35
Car    03   1     4      1        1         0       14     35
Bus    03   2     4      0        0         1       10     35
Train  03   3     4      0        1         0       14     35
Plane  03   4     4      0       -1        -1        1.5   35
for the attribute levels of the non-chosen alternatives for those who did not
choose them. Thus for each individual, while we retain the information on the
individual’s chosen alternative, we generate data on the non-chosen alternatives
by using the averages of the non-chosen alternative’s attribute levels as chosen
by the other decision makers. It is worth noting the risk that these averages suggest a better set of attribute levels than the levels actually available to the person for whom the alternative was not chosen. Indeed, we note that such a strategy certainly reduces the variance of the attribute level distribution in the sampled population.
The second method employs a similar approach. We sample across a dis-
tribution of decision makers such that we have a proportion of decision makers
observed to have chosen each of the alternatives. Rather than taking the average
of the observed attribute levels for each alternative and substituting these as the
attribute levels for the non-chosen alternatives, as in method one, we take the
observed levels unamended and distribute these as the attribute levels for those
who did not choose those alternatives. This distribution can be done randomly,
or alternatively we may attempt to match the non-chosen alternatives’ attribute
levels to specific decision makers through a matching of socio-demographic
characteristics. For transport studies, a matching of trip origin and destination is
also very useful. The benefit of this approach is the preservation of variability in
the attribute level distribution.
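Method one, substituting the average of the chosen levels, can be sketched as follows. The attribute levels below are hypothetical:

```python
# Method one: for each alternative, average the attribute levels observed
# among respondents who chose it, then substitute that average as the
# attribute level for respondents who did not. Data are hypothetical.
observed = {  # alternative -> attribute levels observed among its choosers
    "car":   [12.0, 14.0, 10.0],
    "train": [11.0, 13.0],
}
averages = {alt: sum(v) / len(v) for alt, v in observed.items()}

def impute(alternative, chosen_level=None):
    """Return the known level for a chooser, or the average otherwise."""
    return chosen_level if chosen_level is not None else averages[alternative]

print(impute("car", 12.0))   # chooser: keep the observed level
print(impute("train"))       # non-chooser: imputed average
```

As the text notes, every non-chooser receives the same imputed level, which is exactly why this method collapses the variance of the attribute level distribution.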
As an aside, both methods are far from desirable. We would prefer to capture information on
the attribute levels of non-chosen alternatives as they actually exist for the decision maker.
While the above represent strategies to estimate what these levels might actually be, it is
likely that the actual attribute levels for the non-chosen alternatives are somewhat different.
A better strategy may be to gain information from the decision maker based on their
perception of the attribute levels for the non-chosen alternative. This approach is likely to
produce data that will require much data cleansing. However, it is more likely that decision
makers base their choices on their perceptions of what attribute levels an alternative takes
rather than the actual levels (or some view from others) that that alternative takes. Thus it
may be argued that the capturing of perceptual data will produce more realistic behavioral
models. This is the third “solution” to the problem of capturing information on the non-
chosen alternative.
The fourth solution, similar to the first two, is to synthesize the data. This
requires expert knowledge as to how the data are synthesized. The norm is to
use known information such as travel distances or other socio-demographic
characteristics and to condition the synthesized data on these. But, like the
first two approaches, synthesizing the data leaves one open to the criticism
that the created data may not represent the alternatives actually faced by
decision makers and as such the estimation process will be tainted. If such
synthesized data can be developed from perceptual maps associated with non-
chosen alternatives, this may be an appealing solution.
alt    id  alti  cset  choice  rp  comfort1  comfort2  ttime  age
Car    01   1     4      1      1     1         0       10.5   40
Bus    01   2     4      0      1    -1        -1       11.5   40
Train  01   3     4      0      1    -1        -1       12     40
Plane  01   4     4      0      1     1         0        1.33  40
Car    01   5     4      1      0     1         0       14     40
Bus    01   6     4      0      0     1         0       12     40
Train  01   7     4      0      0    -1        -1       12     40
Plane  01   8     4      0      0     0         1        2     40
Car    01   5     4      0      0     0         1       10     40
Bus    01   6     4      1      0     0         1       14     40
Train  01   7     4      0      0     0         1       12     40
Plane  01   8     4      0      0    -1        -1        1.5   40
Car    02   1     4      0      1     0         1       10     32
Bus    02   2     4      0      1    -1        -1       11.5   32
Train  02   3     4      0      1    -1        -1       12     32
Plane  02   4     4      1      1     1         0        1.25  32
Car    02   5     4      0      0     0         1       12     32
Bus    02   6     4      0      0     1         0       14     32
Train  02   7     4      0      0    -1        -1       12     32
Plane  02   8     4      1      0     1         0        1.5   32
Car    02   5     4      0      0    -1        -1       12     32
Bus    02   6     4      0      0     0         1       12     32
Train  02   7     4      1      0    -1        -1       10     32
Plane  02   8     4      0      0    -1        -1        1.5   32
hour and twenty minutes for those choosing the plane. Individual two was
observed to have chosen to travel by plane.
The observant reader will note that if we combine the data sources as
suggested above, we have only one RP choice set but multiple SP choice
sets per individual traveller. Some researchers have suggested that,
when combined, the RP data should be weighted so as to have equal
representation with the SP data. Weighting each observation is really something
that should be decided by the analyst according to the behavioral
strengths of each data source. We call this Bayesian determination. If we
believe that the RP data are as useful as the SP data, then we may
wish to weight them equally. With, say, one RP observation and eight SP
observations, one would either weight the RP data by 8.0 or the SP data by
0.125. We are of the opinion that such weighting should not take place. We
reason that RP data are, by their very nature, ill conditioned (i.e., they may be
invariant and are likely to suffer from multicollinearity; see Chapter 4), while
SP data arguably provide better-quality inputs to estimation, especially on the
attributes of alternatives. As such, while we use the RP data to provide information
on market shares and to capture information on real choices, we believe
it is best to obtain the parameters or taste weights associated with each
attribute from the SP data (except for the alternative-specific constants in
labeled choice sets) and export the SP attribute parameters to the RP environment,
where the model is calibrated to reproduce the actual market shares of the
observed alternatives.
As an aside, calibration cannot and should not be undertaken on new alternatives, for
obvious reasons. What does the analyst calibrate against? In addition, choice-based
sampling is only valid for RP alternatives.
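The equal-representation arithmetic described above is simple enough to sketch outside Nlogit. The short Python fragment below is our own illustration (the function name is hypothetical, not an Nlogit command): with one RP and eight SP choice sets per respondent, equal representation means inflating each RP row by 8.0 or deflating each SP row by 0.125.

```python
def equal_representation_weights(n_rp, n_sp):
    """Per-observation weights that give the RP and SP subsets equal
    total weight: inflate RP rows by n_sp/n_rp, or deflate SP rows
    by n_rp/n_sp."""
    return n_sp / n_rp, n_rp / n_sp

# One RP choice set and eight SP choice sets per respondent:
rp_weight, sp_weight = equal_representation_weights(1, 8)
# rp_weight is 8.0; sp_weight is 0.125
```

As the text argues, we would not apply such weights in practice; the sketch merely makes the arithmetic explicit.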
weighting variable within the data set to be used to weight the data during
estimation. For example, we may wish to weight the data differently for men
and women; our example, however, has no gender variable, so let us
assume that the analyst wishes to weight the data on the age variable. For
example, assume that the analyst wishes to weight the data such that the data
for those 40 years and older are weighted by 1.5 and those under the age of 40 by
0.5. The weighting variable is shown in Table 10.7.
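Constructing such a weighting variable is a one-line recode. As a sketch (in Python, our own illustration rather than Nlogit syntax, which would use a CREATE command):

```python
def age_weight(age):
    """Weight observations for respondents 40 and older by 1.5,
    and those under 40 by 0.5, as in the example."""
    return 1.5 if age >= 40 else 0.5

# The age values of the three respondents in the example data:
ages = [40, 32, 35]
weights = [age_weight(a) for a in ages]
# weights is [1.5, 0.5, 0.5]
```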
Mode            Car        Bus        Train      Plane      I would not travel
Comfort Level   Medium     High       Low        Low
Travel Time     12 hours   12 hours   10 hours   1.5 hours
In our example we have ignored the no-choice alternative and constrained the
decision maker into making a choice from the listed alternatives. We call this a
conditional choice. However, what if the decision maker may elect not to
travel? Table 10.8 presents an example of a choice set for our travel example
with the elect-not-to-travel alternative.
As an aside, it is useful to think of any choice analysis in which the no-choice alternative
is excluded as a conditional choice. Given the definition of demand in Chapter 2, another
way of expressing this is that any choice analysis that ignores no choice is effectively a
conditional demand model. That is, conditional on choosing an alternative, we can
identify a probability of choosing it. The only circumstance in which conditional
demand is equivalent to unconditional demand is where the probability of making no
choice is zero.
If one elects not to travel, then we have no observable attribute levels for this
alternative. We see this in the choice set shown in Figure 10.1. Given that the
attribute levels are not observed, we treat them as missing values. As each row
of data represents an alternative we are required to insert a row of data for the
not-travel alternative in which the attribute levels are coded as missing. The
(default) missing value code for Nlogit is −999.
As an aside, when collecting data for use in Nlogit we strongly recommend that any missing
data be either imputed in the data set or assigned a −999 code (the default in Nlogit for
missing data). Nlogit will also accept other non-numeric data, such as the word “missing” to
indicate missing values. In any event, it is a good idea to have something in place to signify a
missing value. Some ambiguities as to how to interpret the data can arise if “blank” is used
to indicate missing values.
Car 01 1 5 1 1 0 14 40
Bus 01 2 5 0 1 0 12 40
Train 01 3 5 0 −1 −1 12 40
Plane 01 4 5 0 0 1 2 40
None 01 5 5 0 −999 −999 −999 40
Car 01 1 5 0 0 1 10 40
Bus 01 2 5 0 0 1 14 40
Train 01 3 5 0 0 1 12 40
Plane 01 4 5 0 −1 −1 1.5 40
None 01 5 5 1 −999 −999 −999 40
Car 02 1 5 0 0 1 12 32
Bus 02 2 5 0 1 0 14 32
Train 02 3 5 0 −1 −1 12 32
Plane 02 4 5 1 1 0 1.5 32
None 02 5 5 0 −999 −999 −999 32
Car 02 1 5 0 −1 −1 12 32
Bus 02 2 5 0 0 1 12 32
Train 02 3 5 1 −1 −1 10 32
Plane 02 4 5 0 −1 −1 1.5 32
None 02 5 5 0 −999 −999 −999 32
Car 03 1 5 0 −1 −1 12 35
Bus 03 2 5 1 1 0 14 35
Train 03 3 5 0 0 1 14 35
Plane 03 4 5 0 0 1 2 35
None 03 5 5 0 −999 −999 −999 35
Car 03 1 5 1 1 0 14 35
Bus 03 2 5 0 0 1 10 35
Train 03 3 5 0 1 0 14 35
Plane 03 4 5 0 −1 −1 1.5 35
None 03 5 5 0 −999 −999 −999 35
and cset variables. In the example, we now have five alternatives (ignoring
VFT as a possible alternative) and hence the cset variable will take the value
5. The alti variable will now take the values 1 to 5, with 5 equating to the
choose-not-to-travel alternative. We show this in Table 10.8, where the attribute
levels for the choose-not-to-travel alternative are set to the missing value code of
−999. Socio-demographic characteristics remain unchanged over the new
alternative; hence we are not required to treat such data as missing. Again,
the reader can see this in Table 10.8, where individual one elected not to
travel in the second choice set.
As an aside, in Nlogit the alti indexing must begin at 1 and include all values up to the
maximum number of alternatives. This does permit each individual to have a different
number of alternatives in their choice set. For example, individual one may have alternatives
1, 2, 4, 5 and individual two may have alternatives 1, 2, 3, 4, 5. The only situation in which
we do not need an alti variable (and a cset variable) is where all individuals have
choice sets with identical alternatives, presented in the data in the same order for
each individual. We call the latter a fixed choice set and the case of varying alti and cset a
variable choice set. Analysts who use RP and SP data in a combined data set must take this
into account when using a sub-data set (e.g., the SP data), as the alti values for the second data set
will begin at the value immediately following the last value of the first data set (e.g., in Table
10.6). They will need to be transformed back to 1, 2, etc. This is very easy: one simply creates a
new alti index (say, altz) equal to alti − z, where z is the highest RP alti value. The variable altz
then replaces alti in the SP stand-alone analysis after you have rejected the RP data lines.
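The altz transformation in this aside can be sketched as follows (Python, our own illustration; in Nlogit itself this would be a single CREATE command):

```python
def remap_sp_alti(alti, z):
    """Re-index SP alternatives in a combined RP-SP data set so they
    start at 1 again; z is the highest RP alti value."""
    return alti - z

# Four RP alternatives (z = 4), so the SP rows carry alti = 5..8:
altz = [remap_sp_alti(a, 4) for a in [5, 6, 7, 8]]
# altz is [1, 2, 3, 4]
```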
There are several methods of entering data into Nlogit. By far the most
common will be to import a data file prepared by some other program or
gathered from an external source, such as a website. A second method, which
you will very rarely use, is to enter the data directly into Nlogit’s spreadsheet-style
Data Editor. Finally, Nlogit, like other familiar programs, has its own
type of “save” file that can be used to transport data from one user to another
and one computer to another. We will refer to this as a project file as we
proceed. The advantage of the project file is that it enables you to transport
work from one session to another and easily from one computer to another. In
principle, you need only import a data set once. Thereafter, you will use the
project file to hold and move data. We consider each of these in turn.
For most functions in Nlogit, two mechanisms are available to the analyst to
initiate actions: the command menus and the Text/Document
Editor. The commands described in this and the following chapters are
instructions to be entered into the Text/Document Editor. While the
menus are perhaps more comfortable for the beginner, we have elected not to
discuss them in any further detail unless certain commands contained in the
command menus are absolutely necessary to access some other Nlogit function.
We made this decision because choice modelers will quickly learn
to use Nlogit through the more convenient and efficient Text/Document
Editor and leave the menus and dialog boxes behind.
You will usually analyze data that have been prepared by another program
such as Microsoft Excel, SAS, etc., or have been obtained from an external
source, such as a website. These files may come in a variety of formats. The
most common will be an ASCII character data set with a row of variable
names in the first line of the file and the data values in rows to follow, with
values separated by commas. The data, for example, might appear thus:
MODE,CHSET,COMFORT,TTIME,BLOCK,COMFORT1,COMFORT2
1,2,1,14,-1,1,0
2,2,1,12,-1,1,0
3,2,-1,12,-1,-1,-1
4,2,0,2,-1,0,1
1,4,0,10,-1,0,1
2,4,0,14,-1,0,1
3,4,0,12,-1,0,1
4,4,-1,1.5000000,-1,-1,-1
(We have changed it slightly by adding the variable CHSET and dropping the
repetition of TTIME.) This is a “comma separated values,” or CSV file, and it
is the most common interchange format used to transport data files. An
alternative format that was very common until about 2010 was the .XLS
format used by Microsoft Excel. Current versions of Excel use yet another
format, the .XLSX format. Still other data formats include “Fortran formatted”
data sets, binary, DIF, and space or tab separated ASCII files. NLOGIT can
read all of these and some others using a specialized command, READ, that is
documented in the manual for the program. For current purposes, we will
focus on the CSV format that is most common.
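To make the layout of such a file concrete, the short Python sketch below (our own illustration, independent of Nlogit) parses the CSV fragment shown above, treating the first line as the row of variable names:

```python
import csv
import io

# The CSV fragment from the text: one names row, then data rows.
raw = """MODE,CHSET,COMFORT,TTIME,BLOCK,COMFORT1,COMFORT2
1,2,1,14,-1,1,0
2,2,1,12,-1,1,0
3,2,-1,12,-1,-1,-1
4,2,0,2,-1,0,1
"""

# DictReader takes the variable names from the first line.
reader = csv.DictReader(io.StringIO(raw))
rows = [{name: float(value) for name, value in row.items()}
        for row in reader]
# rows[0]["TTIME"] is 14.0; there are 4 data rows
```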
We created a file, which we call 10A.csv (using Microsoft Excel). For this
example, we select Project:Import/Variables . . . in the desktop menu shown in
Figure 10.2 to open the Windows Explorer in Figure 10.3. The default format is
the .csv file. We select the file, then click Open (Figure 10.4), and the data file will
be read. The Project window will be updated to show the variables you have read.
You can read an .XLS data set the same way. Notice in Figure 10.4 that at the
bottom of the window, the file type *.csv is selected. You can select *.xls instead
by opening the menu in this small window. (The third type is *.* all files.)
A tip: Stat Transfer, published by Circle Systems (stattransfer.com), is a delightful utility
program that can be used to convert files written by about thirty different programs,
including Limdep and Nlogit as well as SAS, SPSS, Minitab, RATS, Stata, and so on. You
can convert a file written by almost any contemporary program directly into a project file
readable by Nlogit, and skip the importation step altogether.
The preceding should be useable for most situations. But there is an unlimited
variety of ways to write data files, and you may need some additional flexibility.
Nlogit provides a command, READ, that can be used to read a file (including
the files shown above). The READ command is used in the command editor
discussed above. It takes two forms, depending on whether the names for the
variables are in the data file (as shown above) or not (in which case, the data file
contains only the numeric data). For a data file that contains the names in one or
more lines at the top of the file, the command is:
READ ; Nvar = number of variables
; Nobs = number of observations
; Names = the number of lines of names at the top of the file (usually 1)
; File=the full path to the file. $
You could read a CSV file with this command, for example, with NVAR = 6,
Names = 1 and Nobs = whatever is appropriate. (Some special consideration
would be needed to make sure that the program recognizes the observation labels.
This is documented in detail in the manual.) The second type of file might have
no names in it. For this, you provide names in the READ command with:
READ ; Nvar = number of variables
; Nobs = number of observations
; Names = a list of Nvar names, separated by commas.
; File=the full path to the file. $
There are many types of data files. Nlogit can even read the internal file
formats for a few programs, such as Stata 10 and 11. (We make no guarantee
about future versions here. The people at Stata change their native format
from time to time.) For the many possibilities, we strongly recommend having
a copy of Stat Transfer at close reach.
As an aside, Nlogit is unable to read data saved in the .XLS file format if the data are saved as a
workbook (i.e., with more than one worksheet). As such, when using the .XLS file format, the
analyst is best advised to save the file as a single worksheet. Also note that Nlogit cannot read
Excel formulas (one of the most common problems in reading in data). The .XLSX format is
also unreadable. For transporting from recent versions of Excel, just use Save As. . . and
save the data file in the .csv format. The CSV format has proven to be the most portable
and reliable.
As an aside, Nlogit uses the value −999 as the default missing value. When you import data
from another source, it is a good idea to mark empty cells with something distinctive that
indicates missing values. Blanks will work in a CSV file – “,1, ,2” will be correctly read as 1,
−999, 2. But if the file is not comma delimited, blanks might be misread. Alphanumeric or
other non-numeric data will be treated as missing values by Nlogit, so “,1,missing,2” would be
more reliable. The convention of an isolated “.” is also respected, so “,1,.,2” will also be
read correctly. Again, if you are not using a comma-separated values file, it is always
necessary to be careful how missing values are indicated.
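The parsing convention in this aside (blank, “.”, and non-numeric tokens all mapping to −999) can be sketched in a few lines. This is our own Python illustration of the rule, not Nlogit’s actual reader:

```python
def to_numeric(field, missing_code=-999):
    """Map blank fields, '.', and non-numeric tokens to the
    missing-value code; pass numeric fields through as floats."""
    field = field.strip()
    if field in ("", "."):
        return missing_code
    try:
        return float(field)
    except ValueError:          # e.g. the word "missing"
        return missing_code

# ",1, ,2" and ",1,missing,2" both yield the same values:
blank = [to_numeric(f) for f in "1, ,2".split(",")]
word = [to_numeric(f) for f in "1,missing,2".split(",")]
# both are [1.0, -999, 2.0]
```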
That is, just CREATE the variables, with the names separated by commas. There
is also a dialog box that can be used: use Project:New/Variable. . . to open the
window, enter the names of the variables to be created,
separated by commas, and press OK (Figure 10.8).
Either approach opens up the empty columns in the data set. You can see
the data set (whether read, imported, loaded, or typed in directly) by viewing
the Data Editor (Figure 10.9).
Once all the variables have been named, the analyst next enters the data into
the Data Editor in a manner similar to most other statistics and spreadsheet
programs. To gain access to the Data Editor, the analyst double clicks on
the variable name in the Project dialogue box (Figure 10.10). This will open up
the Data Editor in which the data may be entered. An alternative way to open
the Data Editor is to press the Activate Data Editor button (second from right)
in the desktop menu, indicated below.
The project file, .lpj format, is used to save your work from session to session,
and to exchange data sets from one Nlogit program to another (e.g., on a
different computer). Use File:Save Project As. . . (Figure 10.11).
This will open a Windows explorer window where you can specify where
you want the project to be saved.
There are several ways to reload a project that you have saved:
• In Windows Explorer, when you double click a file name that has the suffix
.lpj, Windows will automatically launch Nlogit, which will then reload
the project file. The .lpj format is “registered” with Windows, so it is a
recognized file format.
• After you launch Nlogit, you can use File:Open Project. . . to open a
Windows Explorer window, navigate to the file, and open it.
• The recently used project files are listed at the end of the File menu. If
the project you wish to load was one of the four most recently used projects,
it will appear in the File menu and you can select it from the list.
If you wish to export data from Nlogit to some other program, there are two
ways to proceed. A very reliable way to do so is to use Stat Transfer to convert
the Nlogit project file (.lpj file) directly to the native format of the target
program. If you want to write a portable file that can be read by many other
programs, your best choice will be to create a new .csv file. The command is:
EXPORT ; list of variable names . . .
; File=<the file name and location where you want the file written> $
(There are other formats that can be used, but unless you have a compelling
reason to use one of them, we recommend the csv format.)
The data format described previously is the most commonly used format for
choice analysis using Nlogit. An alternative formatting approach is to enter
each choice set into a single line of data, as opposed to allocating each
alternative within a choice set to a separate row of data. Using the single
row per choice set data format, the total number of rows related to any given
individual is the total number of choice sets associated with that individual.
Table 10.9 shows how data are entered such that each choice set is represented
by a single row of data (for space reasons, Table 10.9 shows only the
first three individuals). In this format the choice variable no longer
consists of 0s and 1s but rather takes a unique number corresponding to
the alternative chosen. In our example, the first alternative is car; hence, if this
is the alternative chosen, the choice variable is coded 1. Similarly, the choice
variable is coded 2 for bus, 3 for train, and 4 for plane. We have created an index
variable, ind, which specifies to which individual the choice set belongs. (This is
the format used by some other programs. Note that id is the same as ind.)
Although Nlogit can estimate models using data in this format, we prefer
the data format as set out earlier in the chapter. The single line format severely
Table 10.9 Data entered into a single line
1 1 1 1 0 14 1 0 12 −1 −1 12 40
1 2 1 0 1 10 0 1 14 0 1 12 40
2 3 2 0 1 12 1 0 14 −1 −1 12 32
2 3 2 −1 −1 12 0 1 12 −1 −1 10 32
3 2 3 −1 −1 12 1 0 14 0 1 14 35
3 1 3 1 0 14 0 1 10 1 0 14 35
restricts the range of computations that can be done with the data in Nlogit. In
the interests of brevity we do not discuss the modeling procedures associated
with single line data analysis. Rather we refer those interested in modeling
with such data to the Nlogit user manual.
Nlogit provides an internal device to convert one line data sets to the
multiple line format. It is a command built to look like a model command,
but converts data instead. To illustrate, we will convert the data in Table 10.9.
Since it is a very small data set, we first import it using the procedure described
in Section 10.6.1. The editing window is shown in Figure 10.12.
The result in the Data Editor is shown in Figure 10.13.
The command to convert the data is NLCONVERT, as shown in Figure 10.14.
The LHS list is the choice variable. The command indicates that there are
three choices in the choice set. (This feature requires that the number of
choices in the choice set be fixed. This is one of the disadvantages of this
“wide” data format.) The sets of RHS variables are the attributes. Since there
are three choices, each attribute set provides a set of three variables, one for
each alternative. There are three sets of attribute variables, hence three
attributes in the final data set. The RH2 variables are variables that will be
replicated in each choice within the choice set. Notice that with 6 choice
situations and 3 choices in the choice set, the new data set will have 18 rows.
The result of the conversion is shown in Figure 10.15. (If instructed with
;CLEAR, the command will erase the original variables.) The response of the
program in Figure 10.16 shows the computations that were done.
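The logic of such a wide-to-long conversion is easy to state even without NLCONVERT. The Python sketch below is our own illustration under a hypothetical row layout (id, chosen alternative, attribute blocks, replicated variables); it mirrors the idea rather than NLCONVERT’s exact input format:

```python
def wide_to_long(rows, n_alts, attr_names):
    """Expand one-row-per-choice-set records into one row per
    alternative, recoding the chosen-alternative number into a
    0/1 choice variable."""
    out = []
    for ind, chosen, attr_blocks, rh2 in rows:
        for alt in range(1, n_alts + 1):
            rec = {"ind": ind, "alti": alt, "cset": n_alts,
                   "choice": 1 if alt == chosen else 0}
            # Each attribute block holds one value per alternative.
            for name, block in zip(attr_names, attr_blocks):
                rec[name] = block[alt - 1]
            rec.update(rh2)     # variables replicated across alternatives
            out.append(rec)
    return out

# One choice set over car/bus/train; car (alternative 1) was chosen:
wide = [(1, 1, ([1, 1, -1], [0, 0, -1], [14, 12, 12]), {"age": 40})]
long_data = wide_to_long(wide, 3, ["comfort1", "comfort2", "ttime"])
# 1 wide row with 3 alternatives becomes 3 long rows
```

With 6 choice situations and 3 alternatives, the same function would produce the 18 rows noted in the text.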
The final task for the analyst in entering data should, as always, be the cleaning
of the data. Data should always be checked for inaccuracies before analysis.
The simplest and quickest check of data is to perform an analysis of descrip-
tive statistics. The command to generate descriptive statistics is
Dstats ;rhs=*$
In Nlogit, the * signifies “all variables.” Thus the command will generate
descriptive statistics for all variables within the data set. Inserting specific
names (separated by commas) instead of * will generate descriptive statistics
only for those variables named. The descriptive statistics for the all-variables
case are shown above. What the analyst should look for is unusual looking
data. Examination of the output reveals that all variables are within their
expected ranges (for example, we would expect the choice variable to have a
minimum value of 0 and a maximum value of 1). A quick examination of such
output can save the analyst a significant amount of time and avoid problems at
the time of model estimation.
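The range check described here is trivially mechanized. As a sketch (Python, our own illustration; in Nlogit the Dstats output supplies the same numbers):

```python
def describe(values):
    """Minimal descriptive statistics for a data-cleaning pass."""
    n = len(values)
    return {"n": n, "mean": sum(values) / n,
            "min": min(values), "max": max(values)}

# A 0/1 choice variable should span exactly [0, 1]:
choice = [1, 0, 0, 0, 1, 0, 0, 0]
stats = describe(choice)
in_range = stats["min"] >= 0 and stats["max"] <= 1
# in_range is True; a value such as 2 or -1 would flag a data-entry error
```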
Examination of the descriptive statistics table is useful for locating possible
data entry errors, as well as possibly suspect data observations. The analyst,
however, needs to be aware that the output table generated above is inclusive
of observations for all alternatives. Examining the descriptive statistics for
alternatives independently may yield further interesting information. A com-
mand we could use for the example above is:
Dstats ; For [ alti ] ; rhs=comfort1,comfort2,ttime $
The addition of the ;For[alti] to the Dstats command has Nlogit produce
descriptive statistics for the observations that are specific to each value present
within the alti variable.
-------------------------------------------------------------------------
Setting up an iteration over the values of ALTI
The model command will be executed for 4 values
of this variable. In the current sample of 24
observations, the following counts were found:
Subsample Observations Subsample Observations
ALTI = 1 6 ALTI = 2 6
ALTI = 3 6 ALTI = 4 6
-------------------------------------------------------------------------
Actual subsamples may be smaller if missing values are
being bypassed. Subsamples with 0 observations will
be bypassed.
-------------------------------------------------------------------------
--------------------------------------------------------------------------------------
Subsample analyzed for this command is ALTI = 1
-----------+--------------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
-----------+--------------------------------------------------------------------------
COMFORT1| 0.0 .894427 -1.000000 1.0 6 0
COMFORT2| 0.0 .894427 −1.000000 1.0 6 0
TTIME| 12.33333 1.505545 10.0 14.0 6 0
-----------+--------------------------------------------------------------------------
Subsample analyzed for this command is ALTI = 2
-----------+--------------------------------------------------------------------------
COMFORT1| .500000 .547723 0.0 1.0 6 0
COMFORT2| .500000 .547723 0.0 1.0 6 0
TTIME| 12.66667 1.632993 10.0 14.0 6 0
-----------+--------------------------------------------------------------------------
-----------+--------------------------------------------------------------------------------------
Cor.Mat.| ID ALTI CSET CHOICE COMFORT1 COMFORT2 TTIME
-----------+--------------------------------------------------------------------------------------
ID| 1.00000 .00000 .00000 .00000 .06464 −.06071 .04228
ALTI| .00000 1.00000 .00000 −.17213 −.25963 −.15517 −.74871
CSET| .00000 .00000 .00000 .00000 .00000 .00000 .00000
CHOICE| .00000 −.17213 .00000 1.00000 .39613 −.02862 .17936
COMFORT1| .06464 −.25963 .00000 .39613 1.00000 .50491 .33613
COMFORT2| −.06071 −.15517 .00000 −.02862 .50491 1.00000 .16169
TTIME| .04228 −.74871 .00000 .17936 .33613 .16169 1.00000
As an aside, the analyst may wish to examine the covariance matrix. The command for this is
;output=1$ appended to the Dstats command; using ;output=3$ will generate both the covariance
and correlation matrices. The correlation matrix for our example is shown above. As with descriptive
statistics, it may be worthwhile examining the correlation matrix for each alternative
independently. In Nlogit, the correlation matrix is based on a Pearson product-moment
specification between two variables. Strictly, this is valid when contrasting ratio-scaled
variables (and is usually acceptable for interval-scaled variables); however, for classificatory
variables (e.g., ordinal scaled) other indicators of similarity are preferred. The Prelis pre-processor
in LISREL is useful for this task.
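For reference, the Pearson product-moment correlation reported in such output is computed as below (our own Python rendering of the standard formula, not Nlogit code):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

pearson([1, 2, 3, 4], [2, 4, 6, 8])    # close to 1.0 (perfect positive)
pearson([1, 2, 3, 4], [8, 6, 4, 2])    # close to -1.0 (perfect negative)
```

Note that a constant column (such as cset in this example) has zero standard deviation, so its correlation is undefined; this presumably explains the zero entries in the CSET row of the matrix above.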
true for the removal of rows (i.e., treatment combinations or choice sets). As
such, design orthogonality requires the return of all data for all blocks of the
design: there can be no missing data. For small samples (e.g., a total
sample of three respondents, each assigned to one of three blocks of a
design), maintaining orthogonality is relatively straightforward. For larger
samples (e.g., when the sample size is in the hundreds), maintaining
orthogonality is a non-trivial task.
For example, consider a design blocked into three. If 100 decision makers
complete the first block and 100 complete the second but only 99 complete the
third block, design orthogonality will be lost over the entire sample. The degree of
the loss of orthogonality is a question of correlation and hence multicollinearity.
As an aside, for an experimental design to remain orthogonal, either the entire design must be
given to each decision maker (with any missing data resulting in all data for that
individual being removed from the data set) or some sampling strategy must be put in place to
ensure that complete designs are returned across all decision makers. Unfortunately, any such
strategy is likely to raise questions about sampling bias. Computer
Assisted Personal Interviews (CAPI) or Internet-aided surveys (IASs) may help alleviate this
problem somewhat by detecting portions of a design (blocks) that have been under-utilized in
sampling and assigning new individuals to these. Alternatively, the randomization process might
involve randomly assigning the first decision maker to a block and subsequent decision
makers to other unused blocks without replacement. Once the entire design is complete,
the process repeats itself, starting with the next decision maker sampled. In either case, the
analyst must consider whether such strategies are strictly random. At the end of the day, it is
likely that the analyst will have to live with some design non-orthogonality, which with efficient
or optimal designs is less of a concern than with the more traditional orthogonal designs.
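The without-replacement assignment strategy described in this aside can be sketched as a simple cycling rule (Python, our own illustration):

```python
def assign_blocks(n_respondents, n_blocks):
    """Assign respondents to design blocks by cycling through the
    blocks without replacement: every block is handed out once
    before any block is repeated."""
    return [(i % n_blocks) + 1 for i in range(n_respondents)]

# With 299 respondents and 3 blocks, one block is inevitably short:
assignments = assign_blocks(299, 3)
counts = {b: assignments.count(b) for b in (1, 2, 3)}
# counts is {1: 100, 2: 100, 3: 99} - the imbalance discussed in the text
```

In practice the block order would be randomized within each pass; the deterministic cycle above just makes the balancing property easy to see.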
The commands to convert the data in Table 10.6 into multiple line choice data
are recreated below. We have also included the commands to generate the
additional descriptive tables for each alternative as well as the correlation
matrix, as described in the main text:
CREATE
;car=(choice=1);bus=(choice=2);train=(choice=3);plane=(choice=4)
;cset=4
;alt1=1
;alt2=2
;alt3=3
;alt4=4$
WRITE
;id, alt1, cset, car, comfort1, comfort2, ttime1, age,
id, alt2, cset, bus, comfort3, comfort4, ttime2, age,
id, alt3, cset, train, comfort5, comfort6, ttime3, age,
id, alt4, cset, plane, comfort7, comfort8, ttime4, age
;file= <wherever the analyst specifies.dat>
;format=((8(F5.2,1X)))$
reset
read; file = <specified file location .dat>; nvar = 8 ;nobs = 24
;names = id, alt, cset, choice, comfort1, comfort2, ttime, age$
dstats;rhs=*$
dstats;rhs=*
output=two$
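As an illustration of what the CREATE and WRITE commands accomplish, the following Python sketch reshapes hypothetical wide-format records (one row per choice set, as in Table 10.6) into the multiple-line format (one row per alternative). It is simplified to a single comfort attribute per alternative; the values are invented for illustration only:

```python
# One wide-format record per choice set (cf. Table 10.6); values hypothetical,
# simplified to a single comfort attribute per alternative.
wide = [
    {"id": 1, "choice": 1, "comfort": [3, 2, 4, 5], "ttime": [60, 90, 75, 30], "age": 40},
    {"id": 2, "choice": 4, "comfort": [2, 3, 3, 4], "ttime": [55, 95, 70, 35], "age": 25},
]

long_rows = []
for rec in wide:
    for alt in range(1, 5):            # altij = 1..4 (car, bus, train, plane); cset = 4
        long_rows.append({
            "id": rec["id"], "alt": alt, "cset": 4,
            "choice": 1 if rec["choice"] == alt else 0,  # mirrors car=(choice=1), etc.
            "comfort": rec["comfort"][alt - 1],
            "ttime": rec["ttime"][alt - 1],
            "age": rec["age"],
        })

print(len(long_rows))                          # 8 rows = 2 choice sets x 4 alternatives
print(sum(r["choice"] for r in long_rows))     # 2: one chosen alternative per set
```

Note how the alternative-invariant variable age is simply repeated on every row of a choice set, exactly as in the WRITE command above.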
Altogether, there are well over 1,000 specific conditions that are picked up
by the command translation and computation programs in Limdep and
Nlogit. Most diagnostics are self-explanatory and will be obvious. For
example:
82 ;Lhs – variable in list is not in the variable names table.
states that your Lhs variable in a model command does not exist. No doubt
this is due to a typographical error – the name must be mis-spelled. Other
diagnostics are more complicated and in many cases it is not quite possible to
be precise about the error. Thus, in many cases, a diagnostic will say some-
thing like “the following string contains an unidentified name” and a part of
your command will be listed – the implication is that the error is somewhere
in the listed string. Finally, some diagnostics are based on information that is
specific to a variable or an observation at the point at which it occurs. In that
case, the diagnostic may identify a particular observation or value. In the
listing below, we use the conventions:
Part III
The suite of choice models
Chapter 11
The workhorse – MNL
An economist is an expert who will know tomorrow why the things he predicted
yesterday didn’t happen today.
(Laurance J. Peter 1919–90)
11.1 Introduction
In this chapter we demonstrate, through the use of a labeled mode choice data
set (summarized in Appendix 11A to this chapter), how to model choice data
by means of Nlogit. In writing this chapter we have been very specific. We
demonstrate line by line the commands necessary to estimate a model in
Nlogit. We do likewise with the output, describing in detail what each line of
output means in practical terms. Knowing that “one must learn to walk before
one runs,” we begin with estimation of the most basic of choice models, the
multinomial logit (MNL). We devote Chapter 12 to additional output that
may be obtained for the basic MNL model and later chapters (especially
Chapters 21–22) to more advanced models.
The basic commands necessary for the estimation of choice models in Nlogit
are as follows:
NLOGIT
;lhs = choice, cset, altij
;choices =<names of alternatives>
;Model:
U(alternative 1 name) = <utility function 1>/
U(alternative 2 name) = <utility function 2>/
...
U(alternative i name) = <utility function i>$
We will use this command syntax with the labeled mode choice data described
in Chapter 10, shown here as:
Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
u(cr) = invccr*invc + invtcar*invt + TC*TC +
PC*PC + egtcar*egt $
While other command structures are possible (e.g., using RHS and RH2
instead of specifying the utility functions – we do not describe these here
and refer the interested reader to Nlogit’s help references), the above format
provides the analyst with the greatest flexibility in specifying choice models. It
is for this reason that we use this command format over the other formats
available.
The first line of the above command, as with all commands in Nlogit,
informs the program as to the specific function being undertaken by the
analyst. This is similar to the create and dstats commands discussed pre-
viously. The command NLOGIT informs the program that the analyst
intends to estimate a discrete choice model.
The next command line specifies the components of the LHS of the choice
model (lhs). The semi-colon is obligatory. The order of the command is
always the choice variable (choice in this instance) followed by the variable
representing the number of alternatives within each choice set (i.e., cset)
followed by the variable indicating the alternative represented within each
row of data (i.e., altij). If these names are placed in an order other than that
shown, Nlogit is likely to produce an error message such as:
Error: 1099: Obs. 1 responses should sum to 1.0. Sum is 2.0000.
indicating that there exists more than one choice per choice set somewhere
within the data set. Such an error is likely if (1) the data have been incorrectly
inputted or (2) the order of the command line has not been correctly entered
as suggested above.
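The condition behind this error is easy to check directly: within each choice set, the choice indicators must sum to exactly one. A minimal Python sketch of that check (hypothetical rows, not Nlogit syntax):

```python
from collections import defaultdict

def bad_choice_sets(rows):
    """Return ids of choice sets whose choice indicators do not sum to 1 --
    the condition behind Nlogit's 'responses should sum to 1.0' error."""
    sums = defaultdict(int)
    for r in rows:
        sums[r["id"]] += r["choice"]
    return sorted(cid for cid, s in sums.items() if s != 1)

rows = [
    {"id": 1, "choice": 1}, {"id": 1, "choice": 1},   # two alternatives chosen: invalid
    {"id": 2, "choice": 0}, {"id": 2, "choice": 1},   # exactly one chosen: valid
]
print(bad_choice_sets(rows))  # [1]
```

Running such a check before estimation quickly locates data-entry problems of the kind described above.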
The next command:
;choices = <names of alternatives>
requires the analyst to name each of the alternatives. It is important that the
names appear in the exact order given by the coding of the altij variable; otherwise,
the analyst is likely to misinterpret the resulting output. This is the only place
in the command syntax where order matters. For example, in the altij variable
for the case study, the bus alternative is coded 1 while the train alternative is
coded 2. As such, whatever names the analyst gives these two alternatives
should appear in the order of the bus followed by the train. The remaining
alternatives should also appear in the same order indicated by the altij
variable.
The remaining commands specify the utility functions (in any order) for
each of the alternatives:
;model:
U(<alternative 1 name>) = <utility function 1>/
U(<alternative 2 name>) = <utility function 2>/
...
U(<alternative i name>) = <utility function i>$
The utility specification begins with the command ;model: and each new
utility function is separated by a slash (/). The last utility function ends with
a dollar sign ($) informing Nlogit that the entire command sequence is
complete. Note the use of a colon (:) after the word model rather than a
semi-colon (;).
The utility function for an alternative represents a linear equation corre-
sponding to the functional relationship of attributes (and socio-demographic
characteristics, or SDCs) upon the utility level derived for that alternative.
Each utility function is equivalent to the utility function shown in Equation
(11.1):
Vi = β0i + β1i f(X1i) + β2i f(X2i) + β3i f(X3i) + . . . + βKi f(XKi),   (11.1)
where
β1i is the weight (or parameter) associated with attribute X1 and alternative i
β0i is a parameter not associated with any of the observed and measured
attributes, called the alternative-specific constant (ASC), which represents on
average the role of all the unobserved sources of utility.
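Equation (11.1) is straightforward to compute. A small Python sketch, using the bs constant (−1.88) and actpt parameter (−0.06) reported elsewhere in the chapter, together with a hypothetical attribute level of 10 minutes of access time:

```python
def utility(asc, betas, attrs):
    """Linear-in-parameters utility, Equation (11.1):
    V_i = beta_0i + sum_k beta_ki * X_ki (with f as the identity)."""
    return asc + sum(b * x for b, x in zip(betas, attrs))

# bs constant and actpt estimate from the chapter; the attribute level is hypothetical
v_bs = utility(asc=-1.88, betas=[-0.06], attrs=[10.0])  # 10 minutes of access time
print(round(v_bs, 2))  # -2.48
```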
As an aside, the constant β0i need not be made specific to each alternative (i.e., an
alternative-specific constant in the literature); however, it is debatable as to why an analyst
would ever wish to constrain a constant to be equal across two or more alternatives (known
as a generic parameter) when the alternatives are labeled. Given that the constant term is
representative of the average role of all the unobserved sources of utility, constraining the
constant terms of two or more labeled alternatives to be equal forces the average role of all
the unobserved sources of utility for those alternatives to be equal. In most cases, this
represents a questionable proposition for labeled alternatives.
The utility functions specified by the analyst need not be the same for
each alternative. Different attributes and SDCs may enter into one or more
utility functions or may enter into all or several utility functions but be
constrained in different ways or transformed differently across the utility
functions (e.g., with log transformations). Indeed, some utility functions
may have no attributes or SDCs enter into them at all. We have discussed
all the many ways of entering attributes into the utility expressions for each
alternative in Chapter 3, and will not repeat this here. We will focus on how
to use Nlogit in defining the functional form of the elements in a utility
expression.
In specifying a utility function, the analyst must define both the parameters
and the variables of the linear utility equation. This is done in a systematic
manner with the parameter specified first and the variable specified second.
Both are separated with an asterisk (*). We show this below:
;Model:
U(<alternative 1 name>) = <parameter>*<variable> /
The variable name must be consistent with a variable present within the
data set. A parameter may be given any name so long as the name is no
more than eight characters long and begins with an alpha code. In naming the
parameter, the analyst is best to choose names that represent some meaning to
the variable related to that parameter although, as mentioned, any name will
suffice.
If the same parameter name is used more than once across (and within)
alternatives, the parameter estimated will be the same in every utility function in
which that name appears. That is to say, the parameter will
be generic across those alternatives. For example:
;Model:
U(<alternative 1 name>) = <parameter 1>*<variable 1>/
U(<alternative 2 name>) = <parameter 1>*<variable 1>/. . ..
;Model:
U(<alternative 1 name>) = <parameter 1>*<variable 1>/
U(<alternative 2 name>) = <parameter 2>*<variable 1> /. . ..
will produce separate alternative-specific parameter estimates for the two alternatives.
A constant is specified by entering a parameter name on its own, not attached to any
variable; this produces an estimate of the constant term for that alternative. Note that only
one constant can be specified per utility function. Thus:
;Model:
U(<alternative 1 name>) = <constant> + <mistake> + <parame-
ter>*<variable> /
will produce an error in the output stating that Nlogit was unable to estimate
standard errors or reliable parameter estimates for the model specified. We
show this below:
+-------------+-------------------+----------------------+------------+-----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|
+-------------+-------------------+----------------------+------------+-----------+
CONSTANT .09448145 .70978218 .133 .8941
MISTAKE .09448145 ......(Fixed Parameter).......
PARAMETE -.08663454 ......(Fixed Parameter).......
Using a specific example, the utility function for an alternative named bs (i.e.,
bus) might look thus:
;Model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
If a second utility function were specified as shown below, the model output
will have a single generic parameter associated with all five attributes for both
alternatives bs and tn but will estimate constant terms specific to each alter-
native (known as ASCs):
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
We show this in the following Nlogit output table. For this example, a single
generic parameter named actpt is estimated for all three public transport
alternatives while separate ASC terms are estimated:
|-> reject;SPRP=0$
|-> Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr?/ 0.2,0.3,0.1,0.4
;show
;descriptives;crosstabs
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt +
trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt +
trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt +
trpt*trnf /
u(cr) = invccr*invc + invtcar*invt + TC*TC + PC*PC +
egtcar*egt $
As an aside, the name used for the parameter may be whatever the analyst desires so
long as the number of characters used in naming the parameter does not exceed eight
(although there are one or two reserved names, one being just such a name). While we
might name the parameter actpt for the access time design attribute we could have used act instead
(i.e., act*act). While the parameter can take any name, the variable name must be that of a
variable located within the data set. Thus should the analyst mistakenly type the command:
;Model:U(ba) = actpt*atc /
the following error message would appear, as no such variable exists (i.e., atc) within the
data set.
Error: 1085: Unidentified name found in atc
It is very important to check the spelling within each of the command lines to
avoid unwanted errors. Returning to Equation (11.1), the utility functions
may be written as Equations (11.2a) and (11.2b), focussing on one attribute
and the ASCs.
For the present, we note that the parameter estimates for the actpt attribute
are equal for both alternatives but the constant terms of each of the alter-
natives differ (i.e., −1.88 for bs and −1.680 for tn).
If the actpt parameter in the second utility function were given a different
name than that in the first utility function (e.g., acttn), separate alternative-
specific parameters would be estimated. For example:
;Model:
u(bs) = bs + actpt*act +. . . /
u(tn) = tn + acttn*act + . . . /
The parameter estimates for the access time attribute are allowed to vary
across the alternatives. We now have two parameter estimates, one for each
alternative, defined as alternative-specific parameters.
As an aside, if a parameter is given a name with nine or more characters (e.g., a parameter is
given the name parameter), Nlogit will make use of the first eight characters only. A not
uncommon mistake when specifying alternative-specific parameters is to provide two or
more parameters with names which are differentiated only after the eighth character (e.g.,
parameter1 and parameter2). As Nlogit makes use only of the first eight characters the
estimated model will produce a single generic parameter titled paramete rather than the two
or more alternative-specific parameters desired by the analyst (e.g., parameter1 and
parameter2).
One final note is necessary before the reader can begin estimating models.
The logit model, from which the basic choice model is derived, is homogeneous
of degree zero in the attributes. In layman's terms, this means that attributes
and SDCs that are invariant across alternatives, such as age, number of
vehicles, etc., will fall out of the probabilities, so their parameters will not be
estimable. This is true also of the constant term. The correct way to allow
for them, as shown in Chapter 3, is to include them in a maximum of J−1
alternatives, where J is the total number of alternatives (i.e., four in the data set
being used in this chapter). Importantly, J is the total that applies across the
sample and not the number of alternatives that any one individual may have
in their choice set.
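The "falling out" of alternative-invariant variables can be verified numerically: a term entered identically in all J utility functions, as an SDC in every alternative would be, shifts every utility by the same amount and leaves the MNL probabilities untouched. A Python sketch with hypothetical utilities:

```python
import math

def mnl_probs(v):
    """MNL choice probabilities: P_i = exp(V_i) / sum_j exp(V_j)."""
    exps = [math.exp(vi) for vi in v]
    s = sum(exps)
    return [e / s for e in exps]

v = [1.2, 0.4, -0.3, 0.8]               # hypothetical utilities for four alternatives
shifted = [vi + 5.0 for vi in v]        # same constant added to every alternative
same = all(abs(a - b) < 1e-12
           for a, b in zip(mnl_probs(v), mnl_probs(shifted)))
print(same)  # True: an alternative-invariant term cancels out of the probabilities
```

Entering the variable in at most J−1 utility functions breaks this symmetry, which is why its parameters then become estimable.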
In this section we will concentrate on interpreting the output from the basic
model reported above, and not concern ourselves with how to improve the overall model fit.
The choice variable used was choice, which is consistent with the commands
used for the analysis. We have not used any variable to weight the data (see
Chapter 13). The number of observations refers to the number of choice sets
used within the analysis and not the number of individual respondents. Since
we are using the revealed preference (RP) data in the case study, there is one
observation per respondent:
Number of obs.= 197, skipped 0 obs
As an aside, the analyst may specify the maximum number of iterations allowable by Nlogit
in estimating any given model. This is achieved by adding the command ;maxit = n (where
n = the maximum number of iterations). That is:
NLOGIT
;lhs = choice, cset, altij
;choices = <names of alternatives>
; maxit = n
;Model:
U(<alternative 1 name>) = <utility function 1>/
U(<alternative 2 name>) = <utility function 2>/
...
U(<alternative i name>) = <utility function i>$
Because we have used maximum likelihood estimation (MLE) and not ordinary least squares (OLS) as the
estimation procedure, we cannot rely upon the use of the statistical tests of
model fit commonly associated with OLS regression. We cannot use an
F-statistic to determine whether the overall model is statistically significant
or not.
To determine whether the overall model is statistically significant, the
analyst must compare the LL function of the choice model at convergence
to the LL function of some other, “base model.” To explain why this is so,
recall that values of LL functions closer to zero represent better model fits. For
this example, the LL function at convergence is −200.402, but we invite the
reader to consider just how close this is to zero if there exists no upper bound
on the value that a LL function can take. Unless we have some point of
comparison, there is no way to answer this question.
Traditionally, two points of comparison have been used. The first point
of comparison involves comparing the LL function of the fitted model
with the LL function of a model fitted independent of any information
contained within the data. The second point of comparison involves
comparing the LL function of the fitted model against the LL function
of a model fitted using only information on the market shares as they exist
within the data set. To explain the origin of these two base comparison
models (note that different literature has alternatively referred to these
p = e^1 / (e^1 + e^0) = 2.72 / (2.72 + 1) = 0.73.   (11.4)
An increase of one unit in the utility level associated with the first alternative
produces an increase in the probability of selecting that alternative to 0.88. We
show this calculation in Equation (11.5):
p = e^2 / (e^2 + e^0) = 7.39 / (7.39 + 1) = 0.88.   (11.5)
Increasing the utility of the first alternative from one to two produced an
increase in the probability of selecting that alternative of a magnitude of 0.15.
Now consider a further one unit increase in the utility associated with the first
alternative (i.e., the utility increases from two to three); the probability of
selecting the first alternative now becomes Equation (11.6):
p = e^3 / (e^3 + 1) = 0.95.   (11.6)
[Figure: the logistic curve is S-shaped, mapping Xi over (−∞, ∞) to a probability bounded by 0 and 1 and passing through 0.5.]
functional form suggested in Equation (11.4). That is, the utility functions
themselves are linear. Noting this point and following the same logical argu-
ment used to show that the base level of a dummy coded variable is perfectly
confounded with the average utility of a given alternative (see Chapter 3), it
can be shown that if the utility function for an alternative is estimated with
only an ASC (i.e., no other parameters are estimated), the ASC will be equal to
the average utility for that alternative.
Returning to our initial binary choice example, and assuming that the two
utilities discussed are utilities estimated from a model employing ASCs only,
the two utilities represent the average utilities for the two alternatives.
Assuming that the original utilities are the true utilities for each alternative,
the average utilities for alternatives 1 and 2 are one and zero, respectively
(noting that these are relative utilities and hence the average utility for the
second alternative is not strictly zero), and the probabilities of choice as
calculated from Equation (11.4) for these two alternatives are 0.73 and 0.27
respectively, ceteris paribus.
But what of the first base comparison model? This model is estimated
ignorant of any information contained within the data (hence it is sometimes
referred to as the no information model). For this model, the true choice
proportions are ignored and instead the model is estimated as if the choice or
market shares are equal across the alternatives. This is equivalent to estimating
a model with only a generic constant term for each of the J – 1 alternatives
(assuming fixed choice sets).
Independent of whether one uses the first or second model as a basis of
comparison, if the fitted model does not statistically improve the LL function
(i.e., the LL of the model is not statistically closer to zero than the comparison or
base model’s LL function value) then the additional attributes do not improve
the overall model fit beyond the comparison or base model. That suggests that
the best estimate available to the analyst is the market share assumed (i.e.,
either the actual market shares or equal market shares, dependent upon the
comparison model employed).
The first comparison model assuming equal market shares among the
alternatives has fallen out of favor. This is because an assumption of equal
market shares is likely to be unrealistic, and given information in the data
on choices, the analyst has available information on the actual sample
market shares. So why not use the information available? For this reason, it
is now more common to use the actual sample market shares available in
the data as the comparison or base model to test for improvements in
model fits.
A base model using the market shares within the data is equivalent to a
model estimated with ASCs only. The commands necessary in Nlogit to
generate this base model involve providing unique names for the constant
terms for each alternative. The Nlogit output for this model has to be
obtained using either the simple ;rhs=one command syntax, since the LL of this
base model is not reported in the model output associated with the ;model: command
syntax, or by simply adding ;asc to the full model. An example is given below. By
including ;choices = bs,tn,bw,cr, the parameter estimates will be named after
each alternative; otherwise, they will (as shown in the second output) be given
a name associated with the order of the alternatives such as A_Alt.1:
reject;SPRP=0$
Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;rhs=one$
Normal exit: 4 iterations. Status=0, F= 250.9728
----------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log-likelihood function -250.97275
Estimation based on N = 197, K = 3
Inf.Cr.AIC = 507.9 AIC/N = 2.578
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Response data are given as ind. choices
Number of obs.= 197, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
A_BS| -.80552*** .20473 -3.93 .0001 -1.20678 -.40425
A_TN| -.53207*** .19616 -2.71 .0067 -.91654 -.14760
A_BW| -.62947*** .20120 -3.13 .0018 -1.02381 -.23514
--------+-------------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 25, 2013 at 09:31:34 AM
----------------------------------------------------------------------------------------------------
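As an aside, the ASCs in the output above fully determine the fitted choice shares of this base model. A Python sketch using the reported estimates (car, cr, is the base alternative with its utility normalized to zero):

```python
import math

# ASCs transcribed from the ;rhs=one output above; cr is normalized to zero
asc = {"bs": -0.80552, "tn": -0.53207, "bw": -0.62947, "cr": 0.0}

exps = {k: math.exp(v) for k, v in asc.items()}
total = sum(exps.values())
shares = {k: e / total for k, e in exps.items()}

# an ASCs-only model reproduces the sample market shares
print({k: round(p, 3) for k, p in shares.items()})
```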
|-> Nlogit
;lhs = choice, cset, altij
;rhs=one$
Normal exit: 4 iterations. Status=0, F= 250.9728
-----------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
−2(LLbase model − LLestimated model) ~ χ²(number of new parameters estimated in the estimated model).   (11.7)
Taking the difference of the LL reported for the base model in the output, (i.e.,
−250.97275) and the LL of the estimated model (i.e., −200.40241) and multi-
plying this value by minus two, the minus two log-likelihood (−2LL) statistic
equals 101.14068. To determine whether an estimated model is superior to
its related base model, the −2LL value obtained is compared to a Chi-square critical
value whose degrees of freedom equal the number of new parameters estimated.
[Figure: Chi-square distribution with 10 degrees of freedom; the rejection region lies beyond the 5 percent critical value of 18.31, and the test statistic of 101.14 falls well inside it.]
Chi-squared[10] = 132.82111
Prob [ chi squared > value ] = .00000
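The −2LL statistic and its Chi-square comparison can be reproduced outside Nlogit. A Python sketch using the LL values reported above (the closed-form survival function used here is valid only for even degrees of freedom):

```python
import math

def chi2_sf_even(x, df):
    """P[chi-square(df) > x] via the closed-form series, valid for even df."""
    assert df % 2 == 0 and df > 0
    return math.exp(-x / 2.0) * sum((x / 2.0) ** i / math.factorial(i)
                                    for i in range(df // 2))

ll_base = -250.97275    # ASCs-only (market shares) base model, from the output above
ll_fit  = -200.40241    # fitted MNL model
stat = -2.0 * (ll_base - ll_fit)   # Equation (11.7)
print(round(stat, 5))              # 101.14068
# 10 new parameters were estimated; the chi-square(10) critical value at 5% is 18.31
print(chi2_sf_even(stat, 10) < 0.05)  # True: reject the base model
```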
From the model outputs, we can see the LRT has a value of 132.821, which is
greater than the 101.14 based on a comparison with the market shares base
model. We can deconstruct this value to obtain the equal shares LL by
dividing 132.82 by 2 to give 66.41, and then adding this (in absolute value) to the
LL of the fitted model: −200.402 − 66.41 = −266.81, the equal shares LL.
Using the LL ratio test we compute the −2LL ratio using the same procedure as
before, only this time the LL of the base comparison model is replaced by the
largest LL of the two models under comparison. If we naively use the first
estimated model’s LL function value first and the second estimated model’s LL
second, the possibility exists that the LL function for the first model will be
smaller than that of the second model. In such cases, the computed Chi-square
test statistic will be negative (i.e., the same problem when generic constants
are specified). Thus the test becomes:
−2(LLLargest − LLSmallest) ~ χ²(difference in the number of parameters estimated between the two models).   (11.8)
The LL of the previous model was −200.4024, while the LL of the new model is
−196.0186; hence, we replace the LL for the base model with the LL of the
second model. Substituting the LL values for the old and new model into
Equation (11.8), we obtain:
−2 × [−196.018 − (−200.40)] ~ χ²(12 − 10) d.f.
4.382 ~ χ²(2) d.f.
The degrees of freedom for the critical Chi-square statistic is equal to the
difference between the number of parameters estimated between the two
models. As the first model estimated 10 parameters and the new model 12
parameters, the degrees of freedom for the test is two. From Chi-square tables,
the Chi-square critical value with two degrees of freedom taken at a 95 percent
confidence level is 5.99.
Comparing the test statistic of 4.382 to the Chi-square critical value of 5.99,
we note that the test statistic is smaller than the critical value. Given this, the
analyst cannot reject the hypothesis that the new model offers no
statistical improvement in the LL over the previous model, and must conclude that the
LL of the new model is statistically no closer to zero than that of the previous
model.
R² = 1 − (LLEstimated model / LLBase model).   (11.9)
Note that some analysts reportedly use the algebraically equivalent Equation
(11.10) instead of Equation (11.9) to calculate the pseudo-R2. In either case,
the same R2 value will be calculated:
Substituting the values from estimated model output and base (known market
shares only) model output into Equation (11.9) we obtain:
R² = 1 − (−200.402 / −250.972) = 0.2015.
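The same substitution into Equation (11.9) in Python:

```python
ll_base = -250.972   # market shares (ASCs-only) base model
ll_fit  = -200.402   # fitted model
pseudo_r2 = 1.0 - ll_fit / ll_base   # Equation (11.9)
print(round(pseudo_r2, 4))           # 0.2015
```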
[Figure 11.3: Mapping between the pseudo-R² of a choice model (horizontal axis) and the R² of an equivalent linear regression model (vertical axis), both ranging from 0 to 1.]
For this example, we use the pseudo-R2 value of 0.2015. As noted previously,
the pseudo-R2 of a choice model is not exactly the same as the R2 of a linear
regression model. Fortunately, there exists a direct empirical relationship
between the two (Domencich and McFadden 1975). Figure 11.3 shows the
mapping of the relationship between the two indices.
Figure 11.3 suggests that a pseudo-R² value of 0.2015 represents a poor
model fit. This is not a surprising result given that the fitted model included
only a single attribute parameter.
Nlogit allows modeling of a number of different types of choice data. For this
book we concentrate solely on individual level data; however, it is also possible
to model choices based on proportions data, frequency data, and ranked data.
It is more likely that the beginner will be exposed to individual level choice
data. For those wishing to explore these other data formats, the Nlogit
reference manuals provide an excellent discussion on how to use them.
The number of observations used for modeling is reported a second time
in the output; this time, however, it is accompanied by a record of how
many bad observations were skipped in model estimation. For simple MNL
models, this record of bad observations becomes relevant when conducting
the test of the Independence of Irrelevant Alternatives (IIA) (discussed in
Chapter 7).
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
BS| -1.88276** .81887 -2.30 .0215 -3.48771 -.27781
ACTPT| -.06035*** .01845 -3.27 .0011 -.09651 -.02420
INVCPT| -.08584* .05032 -1.71 .0880 -.18447 .01279
INVTPT| -.01108 .00829 -1.34 .1817 -.02733 .00518
EGTPT| -.04119** .02048 -2.01 .0443 -.08134 -.00104
TRPT| -1.15456*** .39991 -2.89 .0039 -1.93837 -.37074
TN| -1.67956** .83234 -2.02 .0436 -3.31091 -.04821
BW| -1.87943** .81967 -2.29 .0219 -3.48595 -.27290
INVCCR| -.00443 .27937 -.02 .9873 -.55199 .54312
INVTCAR| -.04955*** .01264 -3.92 .0001 -.07433 -.02477
TC| -.11006 .09195 -1.20 .2313 -.29029 .07016
PC| -.01791 .01799 -1.00 .3195 -.05317 .01735
EGTCAR| -.05807* .03310 -1.75 .0793 -.12294 .00680
-----------+----------------------------------------------------------------------------------------
The first column of output provides the variable names supplied by the
analyst. The second column provides the parameter estimates for the variables
mentioned in the first column of output. Ignoring the statistical significance of
each of the parameters (or the lack thereof) for the present, we can use the
information gleaned from the above output to write out the utility functions
for each of the alternatives. Doing so requires knowledge of how the utility
functions were specified earlier. For the above example, writing out the utility
functions to conform to the earlier model specification yields Equations
(11.11a) through (11.11d):

V(bs) = −1.88276 − 0.06035 act − 0.08584 invc − 0.01108 invt2 − 0.04119 egt − 1.15456 trnf   (11.11a)
V(tn) = −1.67956 − 0.06035 act − 0.08584 invc − 0.01108 invt2 − 0.04119 egt − 1.15456 trnf   (11.11b)
V(bw) = −1.87943 − 0.06035 act − 0.08584 invc − 0.01108 invt2 − 0.04119 egt − 1.15456 trnf   (11.11c)
V(cr) = −0.00443 invc − 0.04955 invt − 0.11006 TC − 0.01791 PC − 0.05807 egt   (11.11d)
The utility derived from a choice model as shown above is meaningful only
when considered relative to that of the utility for a second alternative. Thus, a
utility of −1.968 is meaningful only when compared with the utility calculated
for that of each of the other alternatives. Assuming that the utility for the tn
alternative was estimated as −1.765, the utility of the bs alternative relative
to that of the tn alternative is given by the difference of the two; that is,
−1.968 − (−1.765) = −0.203.
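A minimal numerical check of this relative utility, using the two values assumed in the text:

```python
# Utilities are only interpretable relative to another alternative.
v_bs = -1.968                       # bus utility (from the text)
v_tn = -1.765                       # train utility (assumed in the text)
relative_utility = v_bs - v_tn
print(round(relative_utility, 3))   # -0.203
```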
Prob(i | j) = exp(V_i) / Σ_{j=1}^{J} exp(V_j),   j = 1, ..., i, ..., J;  i ≠ j.   (11.12)
To calculate the probability that an alternative will be selected over all other
available alternatives, the utility function for that alternative is treated as the
numerator in Equation (11.12) (i.e.,Vi). As such, to calculate the selection
probabilities of each of the alternatives will require as many equations as there
exist alternatives.1
Using a specific example, assuming that the analyst wishes to determine the
probability of selection of the bs alternative, expanding Equation (11.12) for
the mode case study we obtain:
Prob(bs | j) = e^{V_bs} / (e^{V_bs} + e^{V_tn} + e^{V_bw} + e^{V_cr}),   (11.13)

where, substituting the parameter estimates reported above (11.14),

V_tn = −1.6796 − 0.06035 act − 0.08584 invc − 0.01108 invt2 − 0.04119 egt − 1.15456 trnf,
V_bw = −1.8794 − 0.06035 act − 0.08584 invc − 0.01108 invt2 − 0.04119 egt − 1.15456 trnf,

with V_bs and V_cr formed in the same way from the bs constant and the car-specific estimates.
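Equation (11.12) is straightforward to evaluate numerically. The sketch below computes the full set of choice probabilities; the utility values used are illustrative, not estimates from the model:

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j), as in Equation (11.12)."""
    expv = [math.exp(v) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

# Illustrative utilities for bs, tn, bw, cr (hypothetical values)
probs = mnl_probabilities([-1.9, -1.7, -1.9, -0.5])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))   # probabilities sum to one
```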
As noted previously, while the utility functions derived from a discrete choice
model are linear, the probability estimates are not. It is possible to provide a
direct behavioral interpretation of the parameter estimates when discussing
¹ This is not strictly true, as the probabilities must sum to one, and hence one can calculate the probability of
the last alternative given knowledge of the probabilities of all the other alternatives.
utilities (although only in a relative sense) but not when discussing probabil-
ities. This is a result of the use of exponentials in Equation (11.14). In Chapter
12, we discuss the concepts of marginal effects and elasticities which provide a
direct and meaningful behavioral interpretation of the parameter estimates
when dealing with probabilities. The next column of output lists the standard
errors for the parameter estimates.
The parameter estimates obtained are subject to error. The amount of
error is given by the standard error of the coefficient. A common question
asked by analysts is whether a variable contributes to explaining the choice
response. What we are attempting to accomplish through modeling is an
explanation of the variation in the dependent variable (i.e., choice) observed
within the population of sampled individuals. Why do some individuals
choose alternative A over alternative B, while others ignore these alternatives
completely and choose alternative C? By adding variables to a model, the
analyst is attempting to explain this variation in the choice of alternative. If
an explanatory variable does not add to the analyst’s understanding of
choice, statistically the weight attached to that variable will equal zero.
That is:
β_i = 0.   (11.15)
In linear regression analysis, this test is usually performed via a t- or F-test. For
choice analysis based upon MNL models, neither the t- nor the F-statistic is
available. Fortunately, an asymptotically equivalent test is available. Known
as the Wald statistic, the test statistic is both calculated and interpreted in the
same manner as the t-test associated with linear regression models. The Wald
statistic (see also Chapter 7) for each parameter, given in the fourth column of
the output, is:
Wald = β_i / standard error_i.   (11.16)
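A sketch of Equation (11.16) in Python, applied to the BS constant reported in the output above; the two-sided p-value is obtained from the standard normal via the complementary error function:

```python
import math

def wald(beta, se):
    """Wald statistic (Equation (11.16)) and its two-sided normal p-value."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))   # 2 * (1 - Phi(|z|))
    return z, p

z, p = wald(-1.88276, 0.81887)   # BS constant and its standard error
print(round(z, 2), round(p, 4))  # -2.3 0.0215, matching the output
```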
If the absolute value of the Wald test statistic exceeds the critical Wald value, the analyst can
reject the hypothesis that the parameter equals zero and conclude that the
explanatory variable is statistically significant. If, on the other hand, the
absolute value of the Wald test statistic given in the output is less than the
critical Wald value, the analyst cannot reject the hypothesis that the para-
meter equals zero and therefore must conclude that the explanatory variable
is not statistically significant.
The final column of output provides the probability value (known as a
p-value) for the Wald test of the previous column.
As with the likelihood ratio Chi-square test, the analyst compares the p-value to
some pre-determined confidence level as given by alpha. Assuming a
95 percent confidence level, alpha equals 0.05. p-values less than the chosen
level of alpha suggest that the parameter is not statistically equal to
zero (i.e., the explanatory variable is statistically significant), while p-values
that exceed the level of alpha assigned by the analyst indicate that a
parameter is statistically equal to zero (and hence the explanatory variable is
not statistically significant). At the same level of confidence, the Wald
test and the p-value will always lead the analyst to the same conclusion.
As an aside, the output produced by Nlogit is best saved to a Word file in courier font with
point size 8. When you copy and paste the output as is, it will look very messy in other default
fonts and sizes (e.g., 12 point Times Roman).
The examples we have used to this point have assumed that there are no
significant interaction effects present within the data. However, attributes and
SDCs are not necessarily independent (and hence additive). For example, the
marginal utility (the estimated parameter) associated with in-vehicle cost may
vary according to an individual's personal income. We can test this by
interacting (or conditioning) invc with personal income (pinc). The following
Nlogit command syntax may be used to generate just such an interaction
variable:
Create; cst_pinc=invc*pinc$
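The Create command simply multiplies two data columns element by element. Outside Nlogit, the equivalent operation is a one-liner; the data values below are hypothetical:

```python
# Hypothetical in-vehicle cost and personal income values for three observations.
invc = [1.5, 2.0, 3.0]
pinc = [35, 50, 20]
# Interaction variable, analogous to: Create; cst_pinc = invc*pinc$
cst_pinc = [c * p for c, p in zip(invc, pinc)]
print(cst_pinc)   # [52.5, 100.0, 60.0]
```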
The Nlogit model with this interaction term associated with the cr alternative
is given below. We have included ;asc so that we can obtain all of the LL results
of interest:
Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr?/ 0.2,0.3,0.1,0.4
;asc
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt +
trpt*trnf +cpinc*cst_pinc/
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt +
trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt +
trpt*trnf /
u(cr) = invccr*invc + cpinc*cst_pinc+invtcar*invt +
TC*TC + PC*PC + egtcar*egt $
Normal exit: 4 iterations. Status=0, F= 250.9728
----------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log-likelihood function -250.97275
Estimation based on N = 197, K = 3
Inf.Cr.AIC = 507.9 AIC/N = 2.578
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only -266.8130 .0594 .0542
Response data are given as ind. choices
Number of obs.= 197, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
A_BS| -.80552*** .20473 -3.93 .0001 -1.20678 -.40425
A_TN| -.53207*** .19616 -2.71 .0067 -.91654 -.14760
A_BW| -.62947*** .20120 -3.13 .0018 -1.02381 -.23514
-----------+----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 26, 2013 at 08:57:38 AM
----------------------------------------------------------------------------------------------------
Normal exit: 6 iterations. Status=0, F= 198.2643
----------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log-likelihood function -198.26430
Estimation based on N = 197, K = 14
Inf.Cr.AIC = 424.5 AIC/N = 2.155
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only -250.9728 .2100 .1894
Chi-squared[11] = 105.41691
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 197, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
BS| -2.05677** .82596 -2.49 .0128 -3.67561 -.43792
ACTPT| -.06192*** .01863 -3.32 .0009 -.09844 -.02540
INVCPT| -.13789** .05430 -2.54 .0111 -.24430 -.03147
INVTPT| -.01216 .00834 -1.46 .1448 -.02851 .00419
EGTPT| -.04135** .02074 -1.99 .0462 -.08200 -.00071
TRPT| -1.12894*** .40268 -2.80 .0051 -1.91818 -.33970
CPINC| .00121** .00058 2.09 .0363 .00008 .00235
TN| -1.42844* .84540 -1.69 .0911 -3.08540 .22852
BW| -1.53740* .83780 -1.84 .0665 -3.17947 .10466
INVCCR| -.09990 .28535 -.35 .7263 -.65918 .45938
INVTCAR| -.04997*** .01258 -3.97 .0001 -.07463 -.02530
TC| -.10533 .09264 -1.14 .2555 -.28690 .07624
PC| -.01824 .01813 -1.01 .3145 -.05377 .01730
EGTCAR| -.05702* .03284 -1.74 .0825 -.12139 .00734
-----------+----------------------------------------------------------------------------------------
² As models of discrete choice are linear in the utility functions, the choice modeler is able to take advantage
of this fact.
As an aside, if VTTS are to be estimated for two or more alternatives and all attributes
used in the calculation of the WTP measure are specified as generic, the resulting VTTS will be
generic across all alternatives.
As an aside, WTP measures are calculated as the ratios of two parameters, and as such are
sensitive to the attribute level ranges used in the estimation of both parameters. Some
researchers have recently observed differences in WTP measures derived from SP and RP
data sources, and have claimed that these differences may be the result of the hypothetical
nature of SP data in which respondents are not bound by real life constraints in the choices
made. As such, many prefer WTP measures derived from RP data where such constraints
are binding. What is only now being acknowledged is that such differences may in part
be the result of different attribute level ranges being employed across different studies.
Even for WTP measures derived from different data sets of a similar type (e.g., SP and SP, or
RP and RP), differences in the attribute level ranges may account for some if not all of
any differences observed in the WTP measures derived. It is important, therefore, that
researchers report the attribute level ranges used in deriving WTP measures if any objective
comparison is to be made across studies.
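As an illustration of a ratio-based WTP calculation, the sketch below uses the public transport time and cost estimates from the first model reported above; note that the time parameter there was not statistically significant, so the resulting number is purely illustrative:

```python
# Value of in-vehicle travel time savings as a ratio of two parameters.
beta_invtpt = -0.01108   # in-vehicle time, public transport (per minute)
beta_invcpt = -0.08584   # in-vehicle cost, public transport (per dollar)
vtts_per_min = beta_invtpt / beta_invcpt   # dollars per minute of time saved
print(round(vtts_per_min * 60, 2))         # dollars per hour: 7.74
```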
Two underlying outputs of estimation of choice models are the overall utility
associated with an alternative and the associated choice probability, obtained
using the form of Equation (11.13).
The output for the first four respondents is summarized below. These data were
obtained by cutting and pasting into Excel from the data variable list in Nlogit:
click on the project file NW_SPRP.lpj, and then the variable list,
where the probabilities and utilities are stored under the names chosen in the
command syntax:
Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;utility=util
;prob=prob
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt
+ trpt*trnf /
u(cr) = invccr*invc + invtcar*invt + TC*TC +
PC*PC + egtcar*egt $
1 0 1 3 1 0.075 −5.500
1 0 2 3 0 0.819 −3.117
1 0 3 3 0 0.106 −5.163
2 0 1 4 1 0.157 −4.461
2 0 2 4 0 0.155 −4.472
2 0 3 4 0 0.015 −6.785
2 0 4 4 0 0.673 −3.004
3 0 1 4 0 0.018 −7.562
3 0 2 4 0 0.242 −4.968
3 0 3 4 1 0.287 −4.796
3 0 4 4 0 0.453 −4.342
4 0 1 4 0 0.054 −5.492
4 0 2 4 0 0.337 −3.651
4 0 3 4 1 0.426 −3.417
4 0 4 4 0 0.183 −4.262
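The reported probabilities can be reproduced from the reported utilities with the logit formula of Equation (11.13); for example, for respondent 1's three alternatives:

```python
import math

# Utilities for respondent 1 (three alternatives in the choice set).
utils = [-5.500, -3.117, -5.163]
expv = [math.exp(v) for v in utils]
probs = [e / sum(expv) for e in expv]
print([round(p, 3) for p in probs])   # close to the reported 0.075, 0.819, 0.106
```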
Appendix 11A: The labeled choice data set used in the chapter
³ The north-west sector is approximately 25 kilometres from the Sydney central business district (CBD). It
is the fastest-growing sector of Sydney in terms of residential population and traffic build-up. It is also one
of the wealthiest areas, with high car ownership and usage and a very poor public transport service, with
the exception of a busway system along the M2 tollroad into the CBD.
The principal aim of the study was to establish the preferences of residents
within the study area for private and public transport modes for commuting
and non-commuting trip purposes. Once known, the study called for the
preferences to be used to forecast patronage levels for various currently non-
existing transport modes, specifically possible new heavy rail, light rail, or
busway modes.
To capture information on the preferences of residents, an SC experiment
was generated and administered using CAPI technology. Sampled residents
were invited to review a number of alternative main and access modes (both
consisting of public and private transport options) in terms of levels of service
and costs within the context of a recent trip, and to choose the main mode
and access mode that they would use if faced with the same trip circumstance
in the future. Each sampled respondent completed 10 choice tasks under
alternative scenarios of attribute levels and, in each instance, choosing the
preferred main and access modes. The experiment was complicated by the
fact that alternatives available to any individual respondent undertaking a
hypothetical trip depended not only on the alternatives that that respondent
had available at the time of the “reference” trip, but also upon the destination
of the trip. If the trip undertaken was intra-regional, then the existing busway
(M2) and heavy rail modes could not be considered viable alternatives, as
neither mode travels within the bounds of the study area. If, on the other hand,
the reference trip was inter-regional (e.g., to the CBD), then respondents
could feasibly travel to the nearest busway or heavy rail train station (outside
of the origin region) and continue their trip using these modes. Further, not all
respondents have access to a private vehicle for the reference trip, due either to
a lack of ownership or that the vehicle was not available at the time when the
trip was made. Given that the objective of the study was to derive an estimate
of the patronage demand, the lack of availability of privately owned
vehicles (through either random circumstance or non-ownership) should be
accounted for in the SC experiment. Failure to account for the non-availability
of the private vehicle alternative would likely result in biased patronage
demand forecasts, in terms of both the main mode chosen and the mode
chosen to access the main mode.
The master experimental design for the mode SC study required a total of
47 attributes (46 in four levels and 1 in six levels for the blocks) and had
60 runs; that is, there are 6 blocks of 10 choice sets each. The design was
constructed using a procedure that simultaneously optimized the minimiza-
tion of the D-error of the design as well as the correlations (for a discussion
of D-error see for example, Huber and Zwerina, 1996). The final design had
correlations no greater than ± 0.06. The design generated allowed for the
estimation of all main mode and access mode alternative-specific main effects.
Within each block, the order of the choice sets has been randomized to control
for order effect biases. The experiment consisted of different task configura-
tions designed to reflect the alternatives realistically available to a respondent
given the reference trip circumstance reported by the respondent earlier in
the CAPI interview: the configurations consisted of (i) with/without car, (ii)
inter-/intra-regional trips, (iii) new light rail versus new heavy rail, new light
rail versus new busway, and new heavy rail versus new busway. These con-
figurations were included to provide more realism in the scenarios shown to
individual respondents. In order to maintain efficiency and minimize correla-
tions within the data set, a maximum number of complete designs have to
be filled within each configuration. Using the CAPI program, if the first
respondent has a car available for an intra-regional trip with new light rail
and heavy rail alternatives present, she is assigned to block 1 for that
configuration. If the second respondent is in the exact same configuration, she
is assigned to the second block; otherwise, she is assigned to block 1 of the
appropriate design configuration. Once a configuration has all blocks com-
pleted, the process starts at block 1 again.
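The block-assignment rule described above can be sketched as a small function. The function name and configuration keys below are illustrative, not taken from the study's CAPI code:

```python
from collections import defaultdict
from itertools import count

# Each task configuration cycles its respondents through blocks 1..6 in order.
_counters = defaultdict(count)

def assign_block(configuration, n_blocks=6):
    """Return the next block (1..n_blocks) for this configuration, cycling."""
    return next(_counters[configuration]) % n_blocks + 1

config = ("car available", "intra-regional", "light rail vs heavy rail")
print([assign_block(config) for _ in range(8)])   # [1, 2, 3, 4, 5, 6, 1, 2]
```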
The trip attributes associated with each mode are summarized in
Table 11A.1.
For currently existing modes, the attribute levels were pivoted off the attri-
bute levels captured from respondents for the reference trip (Figure 11A.1).
Respondents were asked to complete information regarding the reference trip
not only for the mode used for the reference trip, but also for the other modes
[Table 11A.1 column groups: for existing public transport modes; for new public transport modes; for the existing car mode.]
Figure 11A.1 Example screen to establish current car mode trip profile
[Screen lists candidate stations (Mungerie Park, Burns Road, N-W Business Park, Hills Centre, Castle Hill, Franklin Road) and asks how long it would take to reach the nominated station by each access mode.]
Figure 11A.2 Example screen to establish new public mode station and access profile
[Example choice screens: for each mode, respondents saw the one-way fare or car running cost, one-way toll cost, daily parking cost, in-vehicle travel time, service frequency per hour, and time spent transferring at a rail station, and were asked how they would get to each mode (walk, drive, catch a bus).]
they had available for that trip. While asking respondents to provide informa-
tion for non-chosen alternatives may potentially yield wildly incorrect
attribute levels, choices made by individuals are based on their perceptions
of the attribute levels of the available alternatives and not on the reality of
the attribute levels of those same alternatives. As such, it was felt that asking
respondents what they thought the levels were for the non-chosen alternatives
was preferable to imposing those levels on the experiment based on some
heuristic given knowledge of the attribute levels for the actual chosen alter-
native. A series of questions was asked to identify the candidate station for the
new public transport mode (see Figures 11A.2 and 11A.3). An illustrative
choice scenario screen is shown in Figure 11A.4.
Chapter 12: Unlabeled discrete choice data
12.1 Introduction
Before we continue to look at some of the richer sets of behavioral outputs from
the basic MNL model, we want to make an important diversion. Discrete choice
data may come in one of many forms. Aside from revealed preference (RP) and
stated preference (SP) data (see Chapter 6), discrete choice data may be further
categorized as being either labeled or unlabeled in nature. In labeled choice data,
the names of alternatives have substantive meaning to the respondent beyond
their relative order of appearance in a survey (e.g., the alternatives might be
labeled Dr House, Dr Cameron, Dr Foreman, Dr Chase). In unlabeled choice
data, the names of the alternatives convey only the relative order of their
appearance within each survey task (e.g., drug A, drug B, drug C). Aside
from affecting what outputs can appropriately be derived for the study (e.g.,
elasticities have no substantive meaning in unlabeled experiments), from the
perspective of the overall study this decision is important, as it might directly
impact upon the type and number of parameters that can or will be estimated as
part of the study. As we show below, typically, unlabeled experiments will
involve the estimation of generic parameters only, whereas labeled experiments
may involve the estimation of alternative-specific and/or generic parameter
estimates, hence potentially resulting in more parameter estimates than with
an identical, though unlabeled, experiment.
473 Unlabeled discrete choice data
Table 12.1 Attributes and priors used in the case 1 experimental design
To study route choice preferences, in November 2011 data were collected from 109 respondents
who were presented with a choice experiment involving the choice between two
unlabeled routes. Unlike previous route choice toll road studies, the 2011 study
presented respondents with alternative routes describing the amount of time
spent in congested and uncongested traffic conditions broken down into time
spent on free roads and time spent on toll roads (i.e., time spent on free public
roads that is in free-flow and slowed-down traffic conditions, and time spent on
toll roads in free-flow and slowed-down traffic conditions). Respondents were
asked to trade off these four time components against the number of traffic lights
along the entire route, car running costs (petrol, etc.), and a toll
payment. The attribute levels used in designing the choice experiment are
given in Table 12.1 and an illustrative choice screen in Figure 12.1.
If ordering effects exist, then, all else being equal, the choice shares for alternatives
presented first in a choice task (typically on the LHS or top of the survey)
should be greater than those shown later in the same choice task (typically
those presented on the RHS or bottom of the survey), and failure to account
for such data patterns will likely bias the remaining parameter estimates (of
course, the bias related to positioning is language specific, with English being
read top to bottom, left to right). Such effects may be present independent of
any possible experimental design biases that may exist, where one alternative
may be consistently more attractive than others throughout the design.
Independent of the cause, it is recommended that analysts include ASCs in
J−1 alternatives when dealing with unlabeled choice data, which can be later
removed if found to be statistically insignificant.
The second difference between how one should handle the estimation of
unlabeled choice data and labeled choice data is in how one should treat the
non-constant parameter estimates. Unlike ASCs, which may have some
[Table 12.2: Estimation results for Model 1 and Model 2 (Alternative A, Alternative B) and Model 3 (Alternative 1, Alternative 2).]
time). For Model 1, however, the opposite is true for time spent travelling on a
toll road (i.e., −0.069 for free-flow time relative to −0.059 for slowed-down
time). This anomaly is corrected in Model 2, where the marginal disutility for
slowed-down time is observed to be greater than that of time spent in free-flow
conditions (i.e., −0.067 for free-flow time and −0.078 for slowed-down time,
respectively).
The inclusion of ASCs in discrete choice models estimated on data collected
from unlabeled choice experiments, therefore, has behavioral meaning (such
as left to right survey response bias); however, the issue remains as to what to
do with any ASCs post-estimation. If one is interested only in calculating
effects such as the marginal willingness to pay (WTP) for certain attributes,
then the inclusion of the ASCs does not matter, as they are ignored in any such
calculation. If, on the other hand, one wishes to use the utility functions for
some form of predictive exercise, then the presence of an ASC may be
problematic. That said, it is doubtful whether unlabeled choice experiments
should be used for such an exercise at all, since most predictive exercises of
this nature involve branded alternatives or product classes. Even if the results
of an unlabeled choice experiment are used for something akin to prediction,
the ASCs can be handled in one of two ways. First, they may simply be
ignored, since they represent the average of the unobserved effects for the
survey task (often repeated over multiple hypothetical questions, with various
combinations of attribute levels), and this average would be expected to differ
from the average of the unobserved effects in a real market. A real market
often represents a single choice with a specific attribute level combination
that cannot possibly match all the combinations shown in a stated choice
experiment, although the levels may overlap. Second, the ASCs may be
recalibrated so that the base market shares of the model match those of the
real observed market. In either case, the ASCs from the modeling exercise are
effectively ignored.
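As an aside, the recalibration just described is typically performed by iteratively adjusting each ASC by the log of the ratio of the actual to the predicted market share. Below is a minimal sketch in Python rather than Nlogit (the function and variable names are ours, and this routine is a common calibration heuristic rather than something reported in this chapter):

```python
import math

def calibrate_ascs(ascs, V, actual_shares, iters=50):
    """Adjust ASCs so that average predicted MNL shares match observed
    market shares. V[i][j] is the systematic utility (excluding the ASC)
    of alternative j for observation i. A common calibration heuristic."""
    ascs = list(ascs)
    J = len(ascs)
    for _ in range(iters):
        shares = [0.0] * J
        for row in V:
            exps = [math.exp(ascs[j] + row[j]) for j in range(J)]
            total = sum(exps)
            for j in range(J):
                shares[j] += exps[j] / total / len(V)
        # nudge each ASC by the log of (actual share / predicted share)
        ascs = [a + math.log(t / s) for a, t, s in zip(ascs, actual_shares, shares)]
    return ascs
```

Note that only differences between ASCs are identified, so in practice one constant would be fixed (for example, at zero) and the others shifted accordingly.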
As well as estimating ASCs for unlabeled choice experiments, it is also
possible to estimate alternative-specific parameter estimates. Model 3 in Table
12.2 allows for alternative-specific parameter estimates (however, we have
removed the ASC in this model; as this model is behaviorally meaningless, the
removal of the ASC matters little). As can be seen from the model results, the
parameters for free-flow time on non-tolled roads and slowed-down time on
toll roads are statistically significant at the 95 percent level for the second
alternative but not for the first, as is the petrol cost parameter. Conversely,
the free-flow time parameter for the first alternative is statistically significant
at the 95 percent level but not for the second alternative. The slowed-down
time on free roads parameter is significant at the 95
percent level for both alternatives, although it is almost 2.5 times greater in
magnitude for alternative 2 than for alternative 1.
It is worth noting that Model 3 produces a better model fit for the data
relative to Model 1 (−2ΔLL = 111.715; χ²(7) = 14.067), although not relative to
Model 2 (−2ΔLL = 7.513; χ²(6) = 12.592). As such, if the choice was solely
between Model 1 and Model 3 on the grounds of model fit alone, Model 3
would be the preferred model. Nevertheless, as discussed previously, Model 3
is behaviorally meaningless and impossible to use in practice. For example,
assuming one were to use Model 3 to work out the relative utilities for
travelling along two well known roads, Elm Street and Wall Street, the analyst
would need to first figure out which estimates belong to which street (i.e., is
Elm Street represented by the estimates for Alternative A or B?), before
working out the attribute levels to apply. Note that this is not like the issue
of ASCs, where one can simply ignore them. In working out the relative
utilities for two routes, one has to assign a specific marginal (dis)utility for
the travel time and cost components of the trip.
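As an aside, the likelihood-ratio comparisons quoted above are easy to reproduce. The sketch below (Python, not Nlogit; all names are ours) hard-codes the 5 percent chi-squared critical values used in this chapter. In the usage example, the Model 2 log-likelihood of −790.919 is implied by the reported Model 4 value (−788.867) and the quoted −2ΔLL of 4.104; it is not reported directly in this extract.

```python
# 5 percent chi-squared critical values for the degrees of freedom used in
# this chapter; they match the values quoted in the text.
CHI2_CRIT_05 = {1: 3.841, 6: 12.592, 7: 14.067}

def lr_test(ll_restricted, ll_unrestricted, df):
    """Likelihood-ratio test: reject the restricted model when
    -2*(LL_r - LL_u) exceeds the chi-squared critical value."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    crit = CHI2_CRIT_05[df]
    return stat, crit, stat > crit

# Example: Model 4 (LL = -788.867) versus Model 2 (LL = -790.919, implied)
stat, crit, significant = lr_test(-790.919, -788.867, df=1)
```

Here the statistic of 4.104 exceeds the critical value of 3.841, matching the improvement reported for Model 4 over Model 2.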
12.4 Moving beyond design attributes when using unlabeled choice data
The models reported in Section 12.3 allowed only the attributes of the design
to enter into the various indirect utility functions. In this section, we extend
the discussion to explore how additional covariate information, such as socio-
demographic variables, may be used to enhance the performance of models
estimated on unlabeled choice data. There currently exist two main ways that
covariates may enter into the indirect utility functions of discrete choice
models, irrespective of whether the data is labeled or unlabeled in nature.
Discrete choice models require variables to differ across alternatives in order
to be able to estimate the model parameters (overlap between values is
possible in some choice observations, however, so long as each variable dis-
plays some degree of variation between the alternatives over the entire data
set). If a variable remains constant across all alternatives, then it becomes
impossible to isolate the specific influence that variable had in terms of the
utility derived from any one alternative. As such, variables such as socio-
demographics, which remain constant across all alternatives (a respondent
hopefully does not change gender because they are considering Alternative A
instead of Alternative B), cannot directly enter into the indirect utility func-
tions of all J alternatives of the model (as the respondent presumably remains
female when considering both alternatives, it is impossible to determine
whether their gender played any role in their observed choices; this differs
from an attribute such as price, which will likely differ across alternatives: if
a respondent (whether male or female) is observed to select the lowest-priced
alternative in the majority of cases, then price is likely to have influenced
their choice). The first way to introduce covariates into discrete choice models
is therefore to enter them into up to J−1 indirect utility functions. In this way,
the covariate is treated in effect as being zero for the indirect utility function(s)
that it is left out of, which artificially creates differences for the variable
between the alternatives. This then allows for estimation of the effect for the
indirect utility functions that it enters into. In such an instance, the parameter
is interpreted as the marginal utility that the covariate produces for the (up
to) J−1 indirect utility functions it is associated with, relative to the ones that
it does not enter.
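As an aside, this "relative to" interpretation can be seen directly by computing choice probabilities. In the Python sketch below (not Nlogit; the coefficient values loosely echo Model 4's estimates but are purely illustrative), age enters the utility of Alternative A only, so a positive age parameter raises the probability of choosing A relative to B as age increases:

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities for one choice situation."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients (loosely echoing Model 4; illustrative only).
def utilities(age, cost_a, cost_b, asc_a=0.17, b_age=0.009, b_cost=-0.5):
    u_a = asc_a + b_age * age + b_cost * cost_a  # age enters A only
    u_b = b_cost * cost_b                        # age normalized to zero for B
    return [u_a, u_b]
```

Holding the costs equal, a 65-year-old respondent has a higher probability of choosing Alternative A than a 25-year-old, which is exactly how the age parameter in Model 4 is read.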
When dealing with unlabeled choice experiments, it is often assumed that
one should enter covariates into the indirect utility functions of the model only
as interaction terms with the design attributes, or as some other transformation
involving the design attributes rather than directly as main effects. The reason-
ing behind such thinking is that the inclusion of covariates in up to J−1 indirect
utility functions is behaviorally meaningless when dealing with unlabeled choice
data, and that the model results cannot be applied meaningfully at a later stage.
Thus, the second way covariates may enter into the indirect utility functions of
discrete choice models is via interaction terms, or some other transformation
involving the variable and one or more other variables that vary across the
alternatives in the data. An interaction term represents the multiplication of two
or more variables, although one could also include effects for the summation (or
some other transformation, including division) of two or more variables,
provided at least one of them varies within the data. By relating a variable
that is constant within each choice observation with one or more variables that
vary, the resulting new variable will also vary across the alternatives. As the new
variable is no longer constant across the alternatives, it may enter into the
indirect utility functions of all J alternatives, although the analyst may also enter
it into less than J alternatives if so desired (although this should be avoided when
dealing with unlabeled choice experiments). When allocated to all J indirect
utility functions, interaction terms can be used not only for behavioral inter-
pretation, but also for other calculations such as marginal WTP, as well as for
predictive exercises, much like main effects. The more accepted approach to
including covariates into the indirect utility functions of models estimated on
unlabeled choice data is, therefore, via interaction terms with one or more of the
design attributes.
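As an aside, Appendix 12A builds such interactions with CREATE commands (for example, CREATE;TFRINC=(FFNT+SDTNT)*INC$). An equivalent construction in Python (the key names are ours, mirroring the Nlogit variables):

```python
def make_interactions(rows):
    """Add income-by-travel-time interactions to each observation.
    rows: list of dicts with keys ffnt, sdtnt, fft, sdtt, inc."""
    for r in rows:
        # total free-road time (free-flow + slowed-down) times income
        r["tfrinc"] = (r["ffnt"] + r["sdtnt"]) * r["inc"]
        # total toll-road time times income
        r["ttrinc"] = (r["fft"] + r["sdtt"]) * r["inc"]
    return rows
```

Because income is multiplied by attributes that vary across the alternatives, the resulting variables also vary across alternatives and can therefore enter all J utility functions.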
Table 12.4 Model results from unlabeled choice experiment with socio-demographic characteristics (Models 4 and 5)
Using the same data as before, Models 4 and 5 reported in Table 12.4 allow
for socio-demographic variables to enter into the indirect utility functions in
the two ways just mentioned. In Model 4, age enters into the indirect utility
function of the first alternative only as a main effect. Model 4 represents an
extension of Model 2 by allowing age to enter into the indirect utility function
of the first alternative only, in direct contravention to conventional thinking.
As can be seen, the parameter for age is statistically significant at the 95
percent level, and the model provides a statistically significant improvement
in terms of model fit relative to Model 2 (−2ΔLL = 4.104; χ²(1) = 3.841). Far
from having no behavioral interpretation, the positive age parameter suggests
that older respondents are more likely to select the first alternative relative to
the second, all else being equal, and hence be more subject to left-to-right
response bias when answering the survey tasks. The fact that the ASC is now
Table 12.5 Model results from unlabeled choice experiment with interaction terms (Models 6 and 7)
on the likelihood-ratio test (−2ΔLL = 6.000; χ²(1) = 3.841). Model 7 includes both types
of interaction terms in a single model. In this model, the three-way interaction
term between the total time spent on free roads, the total time spent on toll
roads, and the number of traffic lights, remains statistically insignificant at the
95 percent level of confidence; however, the interaction term between income
and total time spent on free roads switches sign and becomes positive. This
finding hints at the possible existence of an interaction between income, total
time spent on free roads, and the number of traffic lights, given that time spent
on free roads is common across the two estimated interaction terms (such a
model is reported in Appendix 12A as Model 7A, where this interaction is
found to be statistically significant).
Appendix 12A: Unlabeled discrete choice data Nlogit syntax and output
RESET
IMPORT;FILE=“N:\ITLS\Fittler\Johnr\Studies\DCM2\Data\Route.csv”$
Last observation read from data file was 2616
dstats;rhs=*$
Descriptive Statistics for 31 variables
--------+-------------------------------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
--------+-------------------------------------------------------------------------------------------
ID| 55.0 31.47028 1.0 109.0 2616 0
SET| 6.500000 3.452713 1.0 12.0 2616 0
CSET| 2.0 0.0 2.0 2.0 2616 0
ALTIJ| 1.500000 .500096 1.0 2.0 2616 0
CHOICE| .500000 .500096 0.0 1.0 2616 0
FFNT| 14.0 4.472991 8.0 20.0 2616 0
SDTNT| 10.50000 3.354743 6.0 15.0 2616 0
FFT| 5.266667 1.389110 3.600000 7.0 2616 0
SDTT| 5.266667 1.389110 3.600000 7.0 2616 0
LGHTS| 7.0 1.633305 5.0 9.0 2616 0
PC| 3.600000 .447299 3.0 4.200000 2616 0
TC| 1.800000 .223650 1.500000 2.100000 2616 0
GEN| .192661 .981453 -1.000000 1.0 2616 0
AGE| 48.73394 13.36184 24.0 70.0 2616 0
INC| 58.99083 42.11309 10.0 200.0 2616 0
LGHT5E| 0.0 .816653 -1.000000 1.0 2616 0
LGHT7E| 0.0 .816653 -1.000000 1.0 2616 0
LGHT5D| .333333 .471495 0.0 1.0 2616 0
LGHT7D| .333333 .471495 0.0 1.0 2616 0
FFNT20E| 0.0 .707242 -1.000000 1.0 2616 0
FFNT16E| 0.0 .707242 -1.000000 1.0 2616 0
------------+---------------------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------------------
Model 4: Model with age (MNL)
nlogit
;lhs=choice,cset,Altij
;choices=A,B
;model:
U(A) = SP1 + FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC + AGE*AGE /
U(B) = FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC $
Normal exit: 5 iterations. Status=0, F= 788.8667
----------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -788.86673
Estimation based on N = 1308, K = 9
Inf.Cr.AIC = 1595.7 AIC/N = 1.220
Model estimated: Jul 22, 2013, 16:41:09
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
Chi-squared[ 8] = 147.19806
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 1308, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+---------------------------------------------------------------------------------------
SP1| .16611 .22903 .73 .4683 -.28278 .61500
FFNT| -.07604*** .00959 -7.93 .0000 -.09483 -.05725
SDTNT| -.08732*** .01104 -7.91 .0000 -.10896 -.06567
FFT| -.06489** .03224 -2.01 .0441 -.12807 -.00170
SDTT| -.08234* .04206 -1.96 .0503 -.16478 .00009
LGHTS| -.03715 .02560 -1.45 .1466 -.08732 .01302
PC| -.54493*** .09427 -5.78 .0000 -.72970 -.36015
TC| -.64909*** .22016 -2.95 .0032 -1.08060 -.21758
AGE| .00923** .00457 2.02 .0432 .00028 .01818
-----------+----------------------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------------------
Model 5: Model with income and travel time interaction effects (MNL)
CREATE;TFRINC=(FFNT+SDTNT)*INC$
CREATE;TTRINC=(FFT+SDTT)*INC$
nlogit
;lhs=choice,cset,Altij
;choices=A,B
;model:
U(A) = SP1 + FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC + TFRINC*TFRINC + TTRINC*TTRINC /
U(B) = FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC + TFRINC*TFRINC + TTRINC*TTRINC $
Normal exit: 5 iterations. Status=0, F= 785.8665
----------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -785.86650
Estimation based on N = 1308, K = 10
Inf.Cr.AIC = 1591.7 AIC/N = 1.217
Model estimated: Jul 22, 2013, 16:55:23
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
Chi-squared[ 9] = 153.19852
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 1308, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
SP1| .62278*** .06246 9.97 .0000 .50037 .74520
FFNT| -.09609*** .01296 -7.42 .0000 -.12148 -.07070
SDTNT| -.10713*** .01398 -7.66 .0000 -.13453 -.07974
FFT| -.13630*** .04384 -3.11 .0019 -.22223 -.05038
SDTT| -.14823*** .05134 -2.89 .0039 -.24886 -.04760
LGHTS| -.03828 .02571 -1.49 .1366 -.08868 .01212
PC| -.54577*** .09447 -5.78 .0000 -.73092 -.36062
TC| -.64372*** .22030 -2.92 .0035 -1.07551 -.21194
TFRINC| .00033** .00014 2.38 .0173 .00006 .00061
TTRINC| .00117** .00050 2.36 .0183 .00020 .00215
-----------+----------------------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------------------
Model 6: Model with travel time interaction and traffic lights (MNL)
CREATE;INT=(FFNT+SDTNT)*(FFT+SDTT)*LGHTS$
Nlogit
;lhs=choice,cset,Altij
;choices=A,B
;model:
U(A) = SP1 + FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC + INT*INT + TFRINC*TFRINC + TTRINC*TTRINC /
U(B) = FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC + INT*INT + TFRINC*TFRINC + TTRINC*TTRINC $
Normal exit: 5 iterations. Status=0, F= 789.4044
----------------------------------------------------------------------------------------------------
Chi-squared[10] = 156.19418
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 1308, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
SP1| .62133*** .06253 9.94 .0000 .49878 .74388
FFNT| -.09164*** .01316 -6.96 .0000 -.11743 -.06584
SDTNT| -.29250*** .10834 -2.70 .0069 -.50483 -.08016
FFT| -.24314*** .07597 -3.20 .0014 -.39204 -.09423
SDTT| -.21521*** .06443 -3.34 .0008 -.34150 -.08893
LGHTS| -.27948** .14199 -1.97 .0490 -.55778 -.00119
PC| -.64940*** .11256 -5.77 .0000 -.87001 -.42878
TC| -.57785*** .22378 -2.58 .0098 -1.01646 -.13925
INT| .00080* .00046 1.73 .0841 -.00011 .00171
TFRINC| .00033** .00014 2.37 .0177 .00006 .00061
TTRINC| .00117** .00050 2.35 .0187 .00019 .00214
-----------+----------------------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------------------
Model 7A: Model with income, travel time and traffic lights (MNL)
nlogit
;lhs=choice,cset,Altij
;choices=A,B
;model:
U(A) = SP1 + FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC + TFRINC*TFRINC + TFRINCL*TFRINCL + TTRINC*TTRINC /
U(B) = FFNT*FFNT + SDTNT*SDTNT + FFT*FFT + SDTT*SDTT + LGHTS*LGHTS +
PC*PC + TC*TC + TFRINC*TFRINC + TFRINCL*TFRINCL + TTRINC*TTRINC $
Normal exit: 5 iterations. Status=0, F= 784.7284
----------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -784.72837
Estimation based on N = 1308, K = 11
Inf.Cr.AIC = 1591.5 AIC/N = 1.217
Model estimated: Jul 23, 2013, 11:15:55
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
Chi-squared[10] = 155.47478
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 1308, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
SP1| .61946*** .06254 9.91 .0000 .49689 .74202
FFNT| -.09649*** .01302 -7.41 .0000 -.12201 -.07096
SDTNT| -.10790*** .01399 -7.71 .0000 -.13531 -.08048
FFT| -.14047*** .04400 -3.19 .0014 -.22670 -.05423
SDTT| -.15297*** .05153 -2.97 .0030 -.25398 -.05197
LGHTS| -.07981** .03784 -2.11 .0350 -.15398 -.00563
PC| -.54678*** .09442 -5.79 .0000 -.73185 -.36172
TC| -.64716*** .22059 -2.93 .0033 -1.07951 -.21481
TFRINC| .00014 .00019 .74 .4564 -.00023 .00052
TFRINCL| .27610D-04 .1835D-04 1.50 .1325 -.83593D-05 .63580D-04
TTRINC| .00125** .00050 2.50 .0125 .00027 .00224
-----------+----------------------------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------------------
Chapter 13
Getting more from your model
13.1 Introduction
As an aside, note that some rows begin with a question mark (?) or a question mark is
inserted after a command. The question mark informs Nlogit to ignore everything that
follows to the right within that line of the command. In this way, the analyst may use the
493 Getting more from your model
question mark to make comments to the right of ? that may aid in understanding what it is
that the command is supposed to do, or allow the analyst to return to a previous command
without having to retype it, simply by deleting the question mark.
Transforming variables
Create;if(altij<4)invt2=wt+invt$
Saving the data including transformation as an .lpj file for future use
Save;FILE=“C:\Books\DCMPrimer\Second Edition 2010\Latest Version\Data
and nlogit set ups\SPRPLabelled\NW_SPRP.lpj”$
|-> Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;show
;descriptives;crosstabs
;effects:invc(*)/invt2(bs,tn,bw)/invt(cr)/act[bs,tn,bw]
;export=matrix
;pwt
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
As an aside, be careful when selecting names for parameters. For example, suppose we
accidentally used the same parameter name for transfer time and the train constant (i.e.,
tn). This model has an error in it. You would then be using the symbol tn in the first three
utility functions as the parameter that multiplies the attribute trnf, while also using tn as
the constant term in the utility function for the train alternative, thereby forcing tn to do
two jobs at once. The constant is unlikely to show up in the ;show table for train. The
reason it does not appear as a constant term in the train equation in ;show is that Nlogit
uses an internal code to impose the constraint that the tn that is the constant term in the
second equation also multiplies the attribute trnf in that same equation. Unfortunately,
the routine that tries to display the utility functions is confused by this extra (hidden)
notation.
Nlogit (August 2013 onwards) there is a new command CORR ; Rhs = <list of
variables> $. It is the same as Dstats but it skips the descriptives and goes right
to the correlation matrix. There is no need for ;Output=2.
As an aside, missing data are handled by listwise deletion. In order to compute the
correlations, the program goes through the observations; if any of the variables has
missing data, the observation is dropped.
As an aside, if you want to mean-center a variable, this is easily done using CREATE ;
CenteredX = Dev(x) $
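For reference, Dev(x) simply subtracts the sample mean from each observation; a plain-Python equivalent (the function name is ours):

```python
def mean_center(x):
    """Equivalent of Nlogit's Dev(x): subtract the sample mean so that
    the transformed variable has mean zero."""
    m = sum(x) / len(x)
    return [v - m for v in x]
```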
|-> dstats;rhs=*;output=2$
Descriptive Statistics for 21 variables
-----------+-----------------------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
-----------+-----------------------------------------------------------------------------------
ID| 99.44980 56.56125 1.0 197.0 747 0
SET| 0.0 0.0 0.0 0.0 747 0
ALTIJ| 2.456493 1.116136 1.0 4.0 747 0
ALTN| 2.456493 1.116136 1.0 4.0 747 0
CSET| 3.859438 .411368 2.0 4.0 747 0
CHOICE| .263722 .440945 0.0 1.0 747 0
ACT| 10.80245 12.18600 0.0 210.0 572 175
INVC| 5.559839 3.418899 0.0 42.0 747 0
INVT| 52.68407 27.28067 2.0 501.0 747 0
TC| 3.765714 2.705246 0.0 7.0 175 572
PC| 11.60571 13.55063 0.0 60.0 175 572
EGT| 8.551539 8.468263 0.0 100.0 747 0
TRNF| .316434 .465491 0.0 1.0 572 175
WT| 4.402098 7.893725 0.0 35.0 572 175
SPRP| 1.0 0.0 1.0 1.0 747 0
AGE| 42.98260 12.59956 24.0 70.0 747 0
PINC| 63.19277 41.61792 0.0 140.0 747 0
HSIZE| 3.755020 2.280048 1.0 30.0 747 0
KIDS| 1.005355 1.110502 0.0 4.0 747 0
GENDER| .500669 .500335 0.0 1.0 747 0
INVT2| 43.07497 36.47828 0.0 511.0 747 0
-----------+-----------------------------------------------------------------------------------
dstats;rhs=choice,act,invc,invt2,invt,egt,trnf,pinc,gender;output=2$
Note: if you only want the correlation matrix you can use, for example, corr;
rhs=choice,act,invc,invt2,invt,egt,trnf,pinc,gender$
|-> corr;rhs=choice,act,invc,invt2,invt,egt,trnf,pinc,gender$
Covariances and/or Correlations Using Listwise Deletion
Correlations computed for 9 variables.
Used 747 observations. Sum of weights = 572.0000
-----------+---------------------------------------------------------------------------------------------------
Cor.Mat.| CHOICE ACT INVC INVT2 INVT EGT TRNF PINC
-----------+---------------------------------------------------------------------------------------------------
CHOICE| 1.00000 -.08354 -.08363 -.10836 -.05707 -.06937 -.22558 -.01405
ACT| -.08354 1.00000 -.04605 -.15266 -.13956 .11016 -.10042 -.05123
INVC| -.08363 -.04605 1.00000 .21607 .15946 .12703 .23386 .06777
INVT2| -.10836 -.15266 .21607 1.00000 .97151 -.02915 .46712 -.01704
INVT| -.05707 -.13956 .15946 .97151 1.00000 -.02880 .29381 -.00967
EGT| -.06937 .11016 .12703 -.02915 -.02880 1.00000 -.04104 -.09068
TRNF| -.22558 -.10042 .23386 .46712 .29381 -.04104 1.00000 .01164
PINC| -.01405 -.05123 .06777 -.01704 -.00967 -.09068 .01164 1.00000
-----------+---------------------------------------------------------------------------------------------------
Cor.Mat.| CHOICE ACT INVC INVT2 INVT EGT TRNF PINC
-----------+---------------------------------------------------------------------------------------------------
GENDER| .04319 .04920 .04548 -.00660 -.00409 .02912 .00176 .22643
-----------+------------
Cor.Mat.| GENDER
-----------+------------
GENDER| 1.00000
Note that for this data set every observation has at least one missing value, and
so the command corr;rhs=*$ cannot compute a correlation matrix and instead
reports the following:
|-> corr;rhs=*$
Covariances and/or Correlations Using Listwise Deletion
See DSTAT for (dropped) variables with no valid observations.
Correlations computed for 21 variables.
Used 747 observations. Sum of weights = .0000
*********************************************************
After listwise deletion, your sample has no observations.
Use DSTAT;Rhs=<your list>$ to see counts of missing data.
Note:This can occur even if all variables have some data.
*********************************************************
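The note in the output is worth internalizing: listwise deletion can empty a sample even though every variable has some valid data, because it is enough for each row to be missing a value somewhere. A minimal Python illustration (names are ours):

```python
def listwise_delete(rows):
    """Keep only rows with no missing values (None), as listwise
    deletion does when computing a correlation matrix."""
    return [r for r in rows if all(v is not None for v in r)]

# Every variable has some data, yet every row is missing something:
rows = [
    [1.0, None],  # second variable missing
    [None, 2.0],  # first variable missing
]
```

Here both variables have one valid observation each, yet listwise deletion leaves no complete rows at all, which is exactly the situation the corr;rhs=*$ warning describes.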
13.2.2 ;Show
The ;show command may be used to generate output informative of both the
market shares and utility structures. The estimated model is set out below and
is followed by the ;show command output in two sections. The first section of
the ;show command output can be further divided into two segments. The
first segment details information on the nested structure of the model. For the
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
BS| -1.68661** .74953 -2.25 .0244 -3.15566 -.21756
ACTPT| -.04533*** .01667 -2.72 .0065 -.07800 -.01265
INVCPT| -.08405 .07151 -1.18 .2399 -.22421 .05611
INVTPT| -.01368 .00840 -1.63 .1033 -.03013 .00278
EGTPT| -.04892* .02934 -1.67 .0954 -.10642 .00858
TRPT| -1.07979*** .41033 -2.63 .0085 -1.88403 -.27555
TN| -1.39443* .72606 -1.92 .0548 -2.81748 .02862
BW| -2.48469*** .74273 -3.35 .0008 -3.94041 -1.02897
INVTCAR| -.04847*** .01032 -4.70 .0000 -.06870 -.02825
TC| -.09183 .08020 -1.14 .2522 -.24902 .06537
PC| -.01899 .01635 -1.16 .2457 -.05104 .01307
EGTCAR| -.05489* .03198 -1.72 .0861 -.11756 .00779
-----------+----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 08:43:34 AM
----------------------------------------------------------------------------------------------------
13.2.3 ;Descriptives
The next output is generated by the command ;descriptives. As with other
commands such as ;show, the ;descriptives command usually precedes the
utility function specification within the command syntax. To avoid repetition,
we discuss this series of output for the bus (BS) alternative only. The
remaining output is interpreted in exactly the same manner. The heading
informs the analyst which alternative the output is associated with.
After the heading, the ;descriptives command output is broken into three
segments. The first segment of this output gives the parameter estimates for the
variables assigned to that alternative via the utility function specification.
The second segment of the ;descriptives command output indicates the
mean and standard deviation for each of the variables as specified within the
utility function for that alternative for the entire sample used for the estima-
tion of the model. For the BS alternative, the mean and standard deviation for
the entire sample for invehicle time (invt) is 71.79 and 43.55 minutes,
respectively.
The last segment of this output details the mean and standard deviation for
the variables assigned to that alternative for those who chose that alternative
only. In this instance, there exist 197 respondents, of whom 38 chose BS. For
the invehicle time attribute, the mean and standard deviation for those
individuals who chose BS is 52.0 and 17.75 minutes, respectively.
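The distinction between the full-sample and chooser-only segments is simply a pair of conditional descriptive statistics. A minimal Python sketch (the numbers below are hypothetical, not drawn from the chapter's dataset):

```python
import numpy as np

def descriptives(x, chose):
    """Mean and std dev of an attribute for the full sample and for choosers only."""
    x, chose = np.asarray(x, dtype=float), np.asarray(chose, dtype=bool)
    return (x.mean(), x.std(ddof=1)), (x[chose].mean(), x[chose].std(ddof=1))

# Hypothetical in-vehicle times (minutes) for five respondents; two chose the alternative.
invt = [60.0, 45.0, 90.0, 30.0, 75.0]
chose = [1, 0, 0, 1, 0]
(all_mean, all_sd), (ch_mean, ch_sd) = descriptives(invt, chose)
print(all_mean, ch_mean)  # 60.0 45.0 -- choosers here face shorter times than the full sample
```

As in the ;descriptives output, comparing the two segments reveals whether those who chose an alternative face systematically different attribute levels than the sample as a whole.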
+------------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative BS |
| Utility Function | | 38.0 observs. |
| Coefficient | All 197.0 obs.|that chose BS |
| Name Value Variable | Mean Std. Dev. |Mean Std. Dev. |
| --------------------------- ----------- | ---------------------------+--------------------------- |
| BS -1.6866 ONE | 1.000 .000| 1.000 .000 |
| ACTPT -.0453 ACT | 5.944 4.662| 5.053 4.312 |
| INVCPT -.0840 INVC | 7.071 3.872| 7.237 6.015 |
| INVTPT -.0137 INVT2 | 71.797 43.551| 52.000 17.747 |
| EGTPT -.0489 EGT | 8.680 7.331| 9.105 10.467 |
| TRPT -1.0798 TRNF | .442 .498| .079 .273 |
+------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative TN |
| Utility Function | | 46.0 observs. |
| Coefficient | All 187.0 obs.|that chose TN |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| --------------------------- ----------- | ---------------------------+--------------------------- |
| ACTPT -.0453 ACT | 16.016 8.401| 15.239 6.651 |
| INVCPT -.0840 INVC | 4.947 2.451| 4.065 2.435 |
| INVTPT -.0137 INVT2 | 45.257 15.421| 43.630 9.903 |
| EGTPT -.0489 EGT | 8.882 6.788| 7.196 5.714 |
| TRPT -1.0798 TRNF | .230 .422| .174 .383 |
| TN -1.3944 ONE | 1.000 .000| 1.000 .000 |
+------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative BW |
| Utility Function | | 42.0 observs. |
| Coefficient | All 188.0 obs.|that chose BW |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| --------------------------- ----------- | ---------------------------+--------------------------- |
| ACTPT -.0453 ACT | 10.707 17.561| 5.405 4.854 |
| INVCPT -.0840 INVC | 7.000 3.599| 6.405 1.345 |
| INVTPT -.0137 INVT2 | 50.904 20.300| 54.643 15.036 |
| EGTPT -.0489 EGT | 10.027 9.811| 8.286 5.932 |
| TRPT -1.0798 TRNF | .271 .446| .095 .297 |
| BW -2.4847 ONE | 1.000 .000| 1.000 .000 |
+------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative CR |
| Utility Function | | 71.0 observs. |
| Coefficient | All 175.0 obs.|that chose CR |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| --------------------------- ----------- | ---------------------------+--------------------------- |
| INVTCAR -.0485 INVT | 55.406 24.166| 43.324 15.839 |
| TC -.0918 TC | 3.766 2.705| 2.592 2.708 |
| PC -.0190 PC | 11.606 13.551| 5.859 10.184 |
| EGTCAR -.0549 EGT | 6.469 9.348| 3.958 4.634 |
+------------------------------------------------------------------------------------------------------+
13.2.4 ;Crosstab
The pseudo-R2 we discussed in Chapter 12 is but one method of determining
how well a choice model is performing. An often more useful method of
assessing model performance is to examine a contingency table that compares
the choice outcomes predicted by the model with the actual choice outcomes
observed in the data. To generate such a contingency table, Nlogit uses the
command ;crosstab. If the contingency table generated by Nlogit is too large
(which is not the case in our application), it does not appear within the output
file along with the other output generated by Nlogit. To access the contingency
table, the analyst must then use the Matrix: Crosstab button, similar to the
LstOutp button described in Chapter 10.
+----------------------------------------------------------------------------+
| Cross tabulation of actual choice vs. predicted P(j) |
| Row indicator is actual, column is predicted. |
| Predicted total is F(k,j,i)=Sum(i=1,. . .,N) P(k,j,i). |
| Column totals may be subject to rounding error. |
+----------------------------------------------------------------------------+
-----------+-------------------------------------------------------------------------------------
NLOGIT Cross Tabulation for 4 outcome Multinomial Choice Model
XTab_Prb| BS TN BW CR Total
-----------+-------------------------------------------------------------------------------------
BS| 12.0000 12.0000 4.00000 10.0000 38.0000
TN| 10.0000 19.0000 5.00000 12.0000 46.0000
BW| 9.00000 18.0000 8.00000 7.00000 42.0000
CR| 8.00000 13.0000 5.00000 45.0000 71.0000
Total| 40.0000 61.0000 22.0000 74.0000 197.000
+----------------------------------------------------------------------------+
| Cross tabulation of actual y(ij) vs. predicted y(ij) |
| Row indicator is actual, column is predicted. |
| Predicted total is N(k,j,i)=Sum(i=1,. . .,N) Y(k,j,i). |
| Predicted y(ij)=1 is the j with largest probability. |
+----------------------------------------------------------------------------+
-----------+-------------------------------------------------------------------------------------
NLOGIT Cross Tabulation for 4 outcome Multinomial Choice Model
XTab_Frq| BS TN BW CR Total
-----------+-------------------------------------------------------------------------------------
BS| 13.0000 13.0000 .000000 12.0000 38.0000
TN| 8.00000 22.0000 2.00000 14.0000 46.0000
BW| 8.00000 24.0000 2.00000 8.00000 42.0000
CR| 5.00000 10.0000 .000000 56.0000 71.0000
Total| 34.0000 69.0000 4.00000 90.0000 197.000
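The first of the two tables above (summed probabilities) can be reproduced by summing each respondent's predicted probabilities within groups defined by the actual choice. A minimal numpy sketch with made-up probabilities for a two-alternative case:

```python
import numpy as np

# Predicted choice probabilities (rows: respondents; columns: alternatives BS, TN).
probs = np.array([[0.7, 0.3],
                  [0.4, 0.6],
                  [0.2, 0.8]])
actual = np.array([0, 1, 1])  # index of the alternative each respondent actually chose

# Row j sums predicted probabilities over the respondents who actually chose j.
xtab = np.vstack([probs[actual == j].sum(axis=0) for j in range(probs.shape[1])])
print(xtab)        # rows: actual choice; columns: predicted probability totals
print(xtab.sum())  # grand total equals the number of respondents
```

The second table replaces each probability row with a one in the column of the highest-probability alternative before summing, which is why its cells are integers.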
Within the contingency table produced by Nlogit, the rows represent the
number of choices made by those sampled for each alternative, while the
will commence the search for the constant coefficient at value_i and the
search for the variable parameter at value_j (NB: i can equal j). For
example,
NLOGIT
;lhs = choice, cset, altij
;choices = cart, carnt, bus, train, busway, LR
;model:
U(cart) = asccart(-0.5) + ptcst(-1)*fuel /
U(carnt) = asccarnt + pntcst*fuel /
U(bus) = ascbus + cst(-0.8)*fare /
U(train) = asctn + cst*fare/
U(busway) = ascbusw + cst*fare /
U(LR) = cst*fare $
Note that the starting point for the search for the ASC of the CART alternative
will be −0.5, and −1 for the CART fuel attribute. The command also specifies that
the search start point for the fare attribute of the BUS alternative will be –0.8.
The astute reader will note that in the above command the fare attribute has
been specified as generic across all of the public transport alternatives. In the
case of generic parameter estimates, it is only necessary to specify the start
point for one alternative. Thus, the start point for the fare attributes of the
train, busway, and light rail alternatives will also be −0.8. The start values for
the fuel parameter of the CARNT alternative will be zero, as will the constant
terms (save for the CART alternative), as no starting point was given for these.
We omit the results of this exercise, as they are no different from the results
obtained when no start point is given.
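Outside Nlogit, supplying start values corresponds to passing an initial parameter vector to the likelihood maximizer. A sketch in Python using scipy on a toy binary logit (data and values are illustrative, not the chapter's model):

```python
import numpy as np
from scipy.optimize import minimize

# Toy binary logit data (hypothetical; not the chapter's commuter dataset).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
beta_true = np.array([-0.5, -1.0])
p = 1.0 / (1.0 + np.exp(-(x @ beta_true)))
y = (rng.uniform(size=200) < p).astype(float)

def neg_loglik(b):
    eta = x @ b
    return -np.sum(y * eta - np.log1p(np.exp(eta)))

# x0 plays the role of Nlogit's bracketed start values, e.g. asccart(-0.5).
res = minimize(neg_loglik, x0=np.array([-0.5, -1.0]), method="BFGS")
print(res.x)  # a well-behaved likelihood converges to the same point from any sensible start
```

For a globally concave likelihood such as the MNL's, good start values mainly reduce the number of iterations; the converged estimates are unchanged, which is exactly what the omitted Nlogit results show.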
The two main methods of calculation are the arc elasticity method and the point
elasticity method. The default Nlogit output is a point elasticity (except where a
dummy variable is used, in which case an arc elasticity is provided, based on the
average of the before and after probabilities and attribute levels). We discuss arc
elasticities in Section 13.3.3 and how they can be derived for any measurement
unit (e.g., ratio or ordinal) using Nlogit’s simulation capability.
will not offer a guess as to what this 1 percent change in gender is). Thus, while Nlogit will
provide elasticity estimates for dummy and effects-coded variables, the output generated
cannot be meaningfully interpreted. Large changes in a variable, such as a 100 percent
change, do make sense when discussing categorically coded variables (e.g., moving from
dummy code 1 to 0). Such variables therefore have to be handled using the arc elasticity
formula discussed later if the analyst requires elasticities. We thus calculate and interpret
point elasticities only for continuous-level data.
To calculate elasticities using Nlogit, the analyst uses the command ;effects.
For this command, the analyst must specify which variable the elasticity is to
be calculated for (i.e., which Xik) and for which alternative (i.e., which alter-
native is i). The command looks thus:
;effects: <variablek(alternativei)>
For this command, the analyst types the variable name (not the parameter
name) for which the elasticity is to be calculated, followed by the desired
alternative(s), placed in round brackets ( ). It is possible using Nlogit
to calculate the elasticity for a single variable over several alternatives if that
variable relates to more than one alternative. Thus one could calculate the
point elasticities for the invehicle time attribute for both the BS and TN
alternatives. This is done by typing the name of the alternatives within the
round brackets separated by a comma, as shown below:
;effects: <variablek(alternativei, alternativej)>
It is also possible in Nlogit to calculate the point elasticities for more than
one variable at a time. For example, the analyst may calculate both the
elasticity for the invehicle attribute for the BS alternative and the elasticity
for the invehicle cost attribute for the TN alternative. In such cases, the
commands indicating the point elasticities to be estimated are divided by a
slash (/) thus:
;effects: <variablek(alternativei) / variableh(alternativei)>
In general form, the command syntax for point elasticities within the Nlogit
command for the data used in this chapter is:
?( ) is for elasticities, [ ] is for partial or marginal effects:
;effects:invc(*)/invt2(bs,tn,bw)/invt(cr)/act[bs,tn,bw]
;pwt
The elasticity output is presented below. We have used the asterisk symbol (*)
to denote all alternatives. As such, the command syntax above will produce
elasticities for all alternatives associated with a specific attribute. This applies
even if a specific attribute is not associated with every alternative. Another way
of defining the relevant alternatives is to list their names in the command, such
as invt2(bs,tn,bw). The diagonal estimates are direct elasticities and the off-
diagonal estimates are cross-elasticities. For example, −0.3792 is the direct
elasticity associated with invehicle cost and bus, and indicates that a 1 percent
increase in the invehicle cost of bus will, all other influences held constant, lead
to a 0.3792 percent reduction in the probability of choosing the bus. The value
0.1033 tells us that a 1 percent increase in the invehicle cost of bus will, all other
influences held constant, result in a 0.1033 percent increase in the probability of
choosing the train (be warned, however, about the value of cross-elasticities in a
model such as MNL where IID applies; as discussed above, any differences among
the cross-elasticities in a row arise from the ;pwt command, which uses probability-
weighted sample enumeration instead of averaging over observations using naive
pooling).
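Under MNL these point elasticities have familiar closed forms: beta_k * x_ik * (1 − P_i) for the direct elasticity and −beta_k * x_ik * P_i for the cross-elasticity, the latter identical for every other alternative under IID. A numpy sketch with illustrative values (not the chapter's estimates):

```python
import numpy as np

# Alternative-specific constants plus a generic in-vehicle cost term (illustrative values).
beta_invc = -0.08
invc = np.array([7.0, 5.0, 7.0, 3.8])
asc = np.array([-1.0, -0.5, -1.5, 0.0])
V = asc + beta_invc * invc
P = np.exp(V) / np.exp(V).sum()

# Closed-form MNL point elasticities with respect to bus (index 0) in-vehicle cost:
direct = beta_invc * invc[0] * (1.0 - P[0])   # own elasticity of the bus probability
cross = -beta_invc * invc[0] * P[0]           # identical for every other alternative (IID)
print(direct, cross)
```

The single `cross` value for all non-bus alternatives is the IID property the text warns about: the unweighted MNL offers no alternative-specific cross-substitution patterns.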
We show below the empirical implications when ;pwt is excluded and when it is
included. Note that the cross-elasticities in a row are identical under naive
pooling but differ when ;pwt is used. Although this difference may appear to be
of concern, the elasticities at the individual respondent level are the same; it is
the probability weighting applied in aggregation that produces the differences in
the reported results, and these differences can be substantial.
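The two aggregation rules differ only in the weights applied to the same individual-level elasticities; a minimal sketch with made-up values:

```python
import numpy as np

# Individual-level direct elasticities and choice probabilities (made-up values).
E = np.array([-0.40, -0.25, -0.10])   # point elasticity for each respondent
P = np.array([0.10, 0.30, 0.60])      # that respondent's probability of the alternative

naive = E.mean()                      # naive pooling: simple average
pwt = (P * E).sum() / P.sum()         # ;pwt: probability-weighted sample enumeration
print(naive, pwt)
```

Probability weighting gives more influence to respondents who are more likely to choose the alternative, so the aggregate elasticity is pulled toward their individual values, here toward the smallest-magnitude elasticity.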
No ;pwt:
Timer$
Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;show
;descriptives;crosstabs
?( ) is for elasticities, [ ] is for partial or marginal effects:
;effects:invc(*)/invt2(bs,tn,bw)/invt(cr)/act[bs,tn,bw];export=matrix
? ;export=tables
;export=both
?;pwt
?;wts=gender
;model:
;pwt included
Timer$
Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;show
;descriptives;crosstabs
?( ) is for elasticities, [ ] is for partial or marginal effects:
;effects:invc(*)/invt2(bs,tn,bw)/invt(cr)/act[bs,tn,bw];export=matrix
? ;export=tables
;export=both
;pwt
?;wts=gender
;model:
tables with standard errors. These are the same regardless of what you export.
The output associated with ;Full is given below.
;effects:invc(*)/invt2(bs,tn,bw)/invt(cr);full;pwt
+-----------------------------------------------------------------------+
| Elasticity averaged over observations.|
| Effects on probabilities of all choices in model: |
| * = Direct Elasticity effect of the attribute. |
+-----------------------------------------------------------------------+
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in BS
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -.36240*** .01748 -20.73 .0000 -.39666 -.32813
TN| .10679*** .00573 18.64 .0000 .09556 .11801
BW| .12445*** .00619 20.11 .0000 .11232 .13658
CR| .06967*** .00548 12.72 .0000 .05894 .08041
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 08:59:06 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in TN
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .10998*** .00549 20.04 .0000 .09923 .12073
TN| -.21858*** .01046 -20.90 .0000 -.23908 -.19809
BW| .14703*** .00647 22.71 .0000 .13434 .15972
CR| .07856*** .00575 13.67 .0000 .06730 .08982
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 08:59:07 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in BW
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .06621*** .00336 19.73 .0000 .05963 .07279
TN| .07187*** .00356 20.19 .0000 .06490 .07885
BW| -.44186*** .01090 -40.53 .0000 -.46323 -.42050
CR| .03508*** .00256 13.69 .0000 .03005 .04010
-----------+-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVT in CR
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .55295*** .03454 16.01 .0000 .48525 .62064
TN| .55344*** .03597 15.39 .0000 .48295 .62394
BW| .53235*** .03500 15.21 .0000 .46376 .60094
CR| -.91217*** .06191 -14.73 .0000 -1.03351 -.79084
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
The elasticity output above is presented on screen as part of the Nlogit output.
You can also save the output, at various levels of detail, as an Excel comma-
delimited (CSV) file. To do this, you have to open a file prior to running the
Nlogit model commands. An example used here is:
OPEN;export=“C:\Books\DCMPrimer\Second Edition 2010\Latest Version\Data
and nlogit set ups\SPRPLabelled\NWelall.csv”$
We suggest you use a file name that has meaning in terms of the elasticities
you want to export, and that each time you run the model you rename the file
so as to keep a separate set of files for the various model run outputs.
To be able to export findings to the spreadsheet, you have to add extra
commands within the model set up. There are three options available. In the
model command (Nlogit here, or, later, other forms such as RPlogit or LClogit),
in addition to the ;effects command you control the export of results by selecting
;Export = matrix to get the compact matrices, ;Export = tables to get the ;Full
output but no matrices, and ;Export = both to get all available outputs.
;Export = both mimics in the CSV file what you get with ;Full on your screen.
These features handle all model forms, and any number of choices. (The CSV file
is limited to 254 choices if you are exporting to Excel 2003; Excel 2007 allows
16,384 columns.) We illustrate the output by running Nlogit with ;Export = both.
OPEN;export=“C:\Books\DCMPrimer\Second Edition 2010\Latest Version\Data
and nlogit set ups\SPRPLabelled\NWelall.csv”$
Nlogit
...
;effects:invc(*)
;full
?;export=matrix
? ;export=tables
;export=both
;pwt
. . .$
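For readers working outside Nlogit, writing an elasticity matrix to a CSV file of the kind produced by ;export=matrix can be sketched with Python's csv module (the matrix values below are illustrative):

```python
import csv
import io

# A 2 x 2 elasticity matrix (illustrative values) written as CSV, mimicking the
# kind of file that OPEN;export=... followed by ;export=matrix produces.
alts = ["BS", "TN"]
elas = [[-0.362, 0.107], [0.110, -0.219]]

buf = io.StringIO()  # swap for open("elasticities.csv", "w", newline="") to write a real file
w = csv.writer(buf)
w.writerow([""] + alts)              # header row: column alternatives
for name, row in zip(alts, elas):
    w.writerow([name] + row)         # each row: alternative name then elasticities
print(buf.getvalue())
```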
\[
E^{P_{iq}}_{x_{ikq}} = \frac{0.6 - 0.55}{1 - 2} \cdot \frac{2}{0.55} = -0.182.
\]
\[
E^{P_{iq}}_{x_{ikq}} = \frac{0.6 - 0.55}{1 - 2} \cdot \frac{1}{0.6} = -0.08.
\]
Using the before change values for x_ikq and P_iq suggests that a 1 percent change in
price will yield a 0.08 percent decrease in the probability of selecting the
alternative for which the price change occurred, ceteris paribus. There thus exists
a discrepancy of some 0.1 percent between using the before change values and the
after change values to calculate the point elasticity for the above example.
Which is the correct elasticity to use? For multi-million dollar projects, the
answer to this question may prove critical. Rather than answer it directly,
economists prefer to answer a different question: is the magnitude of the
difference between elasticities calculated using the before and after change
values sufficiently large to warrant concern? If the difference is marginal, then it
matters not whether the before or after change values are used. If the magnitude
of difference is non-marginal, however, then the analyst may calculate the
elasticity using another method, known as the arc elasticity method. What
constitutes a marginal or non-marginal difference is up to the individual analyst.
The calculation of arc elasticity involves using the averages of the before and
after change values. Thus Equation (8.20) in Chapter 8 becomes:
\[
E^{P_{iq}}_{x_{ikq}} = \frac{\Delta P_{iq}}{\Delta x_{ikq}} \cdot \frac{\bar{x}_{ikq}}{\bar{P}_{iq}} \qquad (13.1)
\]
where the bars denote averages of the before and after change values.
For the previous example, using Equation (13.1), the calculated elasticity now
becomes:
\[
E^{P_{iq}}_{x_{ikq}} = \frac{0.6 - 0.55}{1 - 2} \cdot \frac{(1 + 2)/2}{(0.6 + 0.55)/2} = -0.130.
\]
Note that the arc elasticity will lie somewhere, but not necessarily halfway,
between the direct elasticities calculated using the before and after change
values. If you need to obtain an arc elasticity, then Nlogit can provide the
before and after outputs necessary to apply Equation (13.1) via the ;simula-
tion and ;scenario commands, which we discuss in Section 13.4. Before doing
so, we discuss marginal or partial effects, which are an alternative output to the
elasticity outputs (indeed, elasticities are calculated using the information
from the partial effects).
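The before-value, after-value, and arc calculations in this example can be reproduced directly:

```python
# Worked example from the text: price rises from 1 to 2, probability falls from 0.60 to 0.55.
x0, x1 = 1.0, 2.0
p0, p1 = 0.60, 0.55
dP_dx = (p0 - p1) / (x0 - x1)                    # discrete slope, -0.05

point_after = dP_dx * x1 / p1                    # about -0.182 (after change values)
point_before = dP_dx * x0 / p0                   # about -0.083 (before change values)
arc = dP_dx * ((x0 + x1) / 2) / ((p0 + p1) / 2)  # averages of before and after values
print(round(point_after, 3), round(point_before, 3), round(arc, 3))
```

The arc value of roughly −0.130 falls between the two point values, illustrating the "somewhere, but not necessarily halfway, between" observation above.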
;effects:act[bs,tn,bw];full;pwt
----------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in BS
-----------+----------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------
BS| -.00857*** .00020 -43.20 .0000 -.00896 -.00818
TN| .00350*** .00017 20.71 .0000 .00317 .00383
BW| .00184*** .00011 16.76 .0000 .00162 .00205
CR| .00277*** .00014 19.80 .0000 .00249 .00304
-----------+----------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:03:03 AM
----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in TN
-----------+----------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------
BS| .00416*** .00018 23.04 .0000 .00381 .00451
TN| -.00939*** .00014 -66.97 .0000 -.00966 -.00911
BW| .00316*** .00017 18.66 .0000 .00282 .00349
CR| .00385*** .00017 22.06 .0000 .00351 .00419
-----------+----------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:03:03 AM
----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in BW
-----------+----------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------
BS| .00195*** .00011 17.31 .0000 .00173 .00217
TN| .00276*** .00015 18.60 .0000 .00247 .00305
BW| -.00620*** .00018 -35.10 .0000 -.00654 -.00585
CR| .00131*** .7888D-04 16.58 .0000 .00115 .00146
-----------+----------------------------------------------------------------------------------
nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:03:03 AM
----------------------------------------------------------------------------------------------
As an aside, since the choice probabilities must sum to one, the marginal effects, which
represent the changes in the choice probabilities, are mathematically constrained to sum to
zero, representing a net zero change over all alternatives. This is not true of elasticities.
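This zero-sum constraint is easy to verify numerically from the MNL marginal-effect formulas (the probabilities below are illustrative; the coefficient is the ACT estimate reported earlier):

```python
import numpy as np

beta = -0.0453                            # generic access time (ACT) parameter
P = np.array([0.19, 0.23, 0.21, 0.37])    # illustrative choice probabilities; sum to one

# Marginal effects of alternative 1's attribute on every alternative's probability:
# own effect beta*P1*(1-P1); cross effects -beta*P1*Pj for each other alternative j.
dP = np.where(np.arange(P.size) == 0, beta * P[0] * (1.0 - P[0]), -beta * P[0] * P)
print(dP.sum())  # zero (up to floating point): a net zero change over all alternatives
```

The cancellation is exact because the own effect equals beta*P1*(1 − P1) while the cross effects sum to −beta*P1*(1 − P1).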
This is a binary logit model. We can fit this model by restricting focus to the
rows of data that apply to cr, and modeling the choice variable, choice,
which equals one for those who chose cr, and zero for those who did not.
Thus:
logit ; if[altij = 4] ; lhs = choice ; rhs = one,invt,tc,pc,egt $
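The same estimation can be replicated outside Nlogit; below is a self-contained Newton-Raphson sketch for a binary logit, fit to synthetic data rather than the chapter's dataset:

```python
import numpy as np

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Binary logit by Newton-Raphson; X should include a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (y - p)                  # score of the log-likelihood
        hess = -(X.T * (p * (1.0 - p))) @ X   # Hessian (negative definite)
        step = np.linalg.solve(hess, grad)
        b = b - step
        if np.max(np.abs(step)) < tol:
            break
    return b

# Synthetic data standing in for the 175 car observations (not the chapter's dataset).
rng = np.random.default_rng(1)
invt = rng.uniform(20.0, 100.0, size=175)
X = np.column_stack([np.ones(175), invt])
y = (rng.uniform(size=175) < 1.0 / (1.0 + np.exp(-(3.0 - 0.05 * invt)))).astype(float)
b_hat = fit_logit(X, y)
print(b_hat)  # roughly [3.0, -0.05], the values used to generate the data
```

Because the binary logit log-likelihood is globally concave, Newton's method converges in a handful of iterations from a zero start.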
The estimated model is shown below. Before examining the binary logit
model, it is interesting to compare the estimates with those that were
obtained as part of the earlier MNL model. They are shown in the table
below. They are strikingly similar, though this is to be expected. The
differences can be explained by two sources: sampling variability – 175
observations is not a very large sample – and the violation of the IIA
assumption that we explored in Chapter 7. In a larger sample, and under
the assumption of IIA, we would expect to get the same estimated model
whether based on the full MNL or a marginal model for just one of the
choices.
---------------------------------------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable CHOICE
Log likelihood function -88.60449
Restricted log likelihood -118.17062
Chi squared [ 4](P= .000) 59.13226
Significance level .00000
McFadden Pseudo R-squared .2501987
Estimation based on N = 175, K = 5
Inf.Cr.AIC = 187.2 AIC/N = 1.070
-----------+---------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+---------------------------------------------------------------------------------------
Constant| 3.01108*** .61209 4.92 .0000 1.81141 4.21075
INVT| -.04110*** .01042 -3.94 .0001 -.06152 -.02068
TC| -.14627* .07817 -1.87 .0613 -.29948 .00694
PC| -.02721 .01721 -1.58 .1139 -.06093 .00652
EGT| -.07195** .03287 -2.19 .0286 -.13638 -.00753
-----------+---------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
These are the estimates from the MNL shown earlier
INVTCAR| -.04847*** .01032 -4.70 .0000 -.06870 -.02825
TC| -.09183 .08020 -1.14 .2522 -.24902 .06537
PC| -.01899 .01635 -1.16 .2457 -.05104 .01307
EGTCAR| -.05489* .03198 -1.72 .0861 -.11756 .00779
How do the attributes in the model impact upon the probability of the
choice? The partial effects are:
\[
\frac{\partial\,\mathrm{Prob}(cr = 1 \mid x)}{\partial x}
= \Lambda(\alpha_{cr} + \beta'_{cr} x_{cr}) \times \left[1 - \Lambda(\alpha_{cr} + \beta'_{cr} x_{cr})\right] \beta_{cr}. \qquad (13.3)
\]
That is, they are a multiple of the coefficient vector. The general result is that,
in the choice model, the parameters are related to, but are not equal to, the
partial effects that we are interested in. Nlogit will include partial effects with
the model results if ;Partials is added to the model command. For this choice
model:
---------------------------------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics
Average partial effects for sample obs.
-----------+--------------------------------------------------------------------------------------
| Partial Standard Prob. 95% Confidence
CHOICE| Effect Error z |z|>Z* Interval
-----------+--------------------------------------------------------------------------------------
INVT| -.00691*** .00177 -3.91 .0001 -.01038 -.00345
TC| -.02461* .01311 -1.88 .0606 -.05031 .00109
PC| -.00458 .00291 -1.57 .1154 -.01027 .00112
EGT| -.01210** .00554 -2.18 .0290 -.02297 -.00124
-----------+--------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
---------------------------------------------------------------------------------------------------
Note that the results reported are “average partial effects”; they are com-
puted by averaging the partial effects over the sample observations. The
standard errors are computed using the delta method (see Chapter 7). The
necessary Jacobian is given as in Equation (13.4):
$$\Gamma = \frac{\partial^2\,\mathrm{Prob}(cr=1 \mid x)}{\partial x\,\partial(\alpha_{cr},\beta'_{cr})} = \Lambda(\alpha_{cr}+\beta'_{cr}x_{cr}) \times \left[1-\Lambda(\alpha_{cr}+\beta'_{cr}x_{cr})\right]\left[1-2\Lambda(\alpha_{cr}+\beta'_{cr}x_{cr})\right]\beta_{cr}\,(1, x'_{cr}) \quad (13.4)$$
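For readers who want to verify the computation outside Nlogit, the following Python sketch reproduces the average partial effect of Equation (13.3) and the delta-method Jacobian of Equation (13.4) for a binary logit. The data, coefficients, and covariance matrix below are invented purely for illustration; they are not the mode choice estimates reported above.

```python
import numpy as np

# Hypothetical data: a constant plus one attribute, with assumed estimates.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(0, 10, 200)])
beta = np.array([3.0, -0.15])

# Equation (13.3): partial effect = Lambda * (1 - Lambda) * beta,
# averaged over the sample observations.
lam = 1.0 / (1.0 + np.exp(-(X @ beta)))
ape = float(np.mean(lam * (1.0 - lam)) * beta[1])

# Equation (13.4): Jacobian of the APE w.r.t. (alpha, beta), averaged.
g = lam * (1.0 - lam)
common = g * (1.0 - 2.0 * lam) * beta[1]
G = np.mean(common[:, None] * X, axis=0)
G[1] += np.mean(g)  # direct d/d(beta) term for the attribute's own coefficient

# Delta method: with an (assumed) estimator covariance V, Var(APE) = G V G'.
V = np.array([[0.36, -0.004], [-0.004, 0.0001]])
se = float(np.sqrt(G @ V @ G))
```

Because $\Lambda(1-\Lambda) \le 0.25$, the average partial effect is always smaller in magnitude than the coefficient itself, which is the sense in which parameters are "related to, but not equal to" the partial effects.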
For example, a partial effect or the change in the probability per unit change
in the toll cost is estimated to be −0.02461. We need now to determine what
is a reasonable change in the toll cost. Earlier, when we fit the MNL, we
learned about the attributes for each alternative (with ;Describe). For car, we
found:
518 The suite of choice models
+-------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative CR |
| Utility Function | | 71.0 observs. |
| Coefficient | All 175.0 obs.|that chose CR |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ----------------------- ------------ | -------------------------+-------------------------- |
| INVTCAR -.0485 INVT | 55.406 24.166 | 43.324 15.839 |
| TC -.0918 TC | 3.766 2.705 | 2.592 2.708 |
| PC -.0190 PC | 11.606 13.551 | 5.859 10.184 |
| EGTCAR -.0549 EGT | 6.469 9.348 | 3.958 4.634 |
+------------------------------------------------------------------------------------------------ +
Thus, the toll cost has a mean of about 3.766 and varies from zero to about 9.
So, a one unit change in the toll cost, TC, is a reasonable experiment.
We thus infer from our results that if the toll cost rises by 1, the probability of
choosing to drive will fall by 1 × 0.02461. To complete this experiment, we
note that in the sample of 175 individuals, 71 (or 41 percent) chose cr. So, the
average probability is about 0.41, and increasing the toll by 1 would likely
decrease the probability to 0.385, or roughly 67 individuals. So, this is a fairly
substantial impact, though it is difficult to see that based on just the model
coefficients, or even just the partial effects. A visualization is much more
informative, which can be provided using the Simulate command in Nlogit,
which we present in Section 13.4.
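The arithmetic of this experiment can be checked in a few lines. The sketch below uses only the figures quoted in the text (sample size, number of car choosers, and the average partial effect of TC); it is a back-of-the-envelope calculation, not Nlogit output.

```python
# Back-of-the-envelope check of the toll experiment in the text.
n = 175                                # sample size
base_share = 71 / n                    # 71 individuals chose car, about 0.41
ape_tc = -0.02461                      # average partial effect of toll cost
new_share = base_share + 1.0 * ape_tc  # effect of a one unit toll increase
print(round(new_share, 3))             # 0.381
print(round(new_share * n))            # 67 individuals
```

The predicted share falls by about 0.025, which on this sample translates to roughly four fewer car choosers.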
The simulation capability of Nlogit allows the analyst to use an existing model
to test how changes in attributes and SDCs impact upon the choice probabil-
ities for each of the alternatives. This requires a two-step process:
1. Estimate the model as previously described (automatically saving outputs
in memory);
2. Apply the Simulation command (using the stored parameter estimates) to
test how changes in the attribute and SDC levels impact upon the choice
probabilities.
Step 1 involves the analyst specifying a choice model that will be used as a basis of comparison for subsequent simulations. Step 2 involves performing the simulation to test how changes in an attribute or SDC impact upon the choice probabilities for the model estimated in Step 1.
The ;simulation command may be used in one of two ways. Firstly, the
analyst may restrict the set of alternatives used in the simulation by specifying
which alternatives are to be included. For example, the command:
;Simulation = bs,tn
will restrict the simulation to changes in the bs (bus) and tn (train) alter-
natives. All other alternatives will be ignored.
The analyst may include all alternatives by not specifying any alternatives explicitly. Thus:
;Simulation
will have Nlogit perform the simulation on all alternatives specified within
the = <list of alternatives> command.
The remainder of the command syntax instructs Nlogit on what changes to simulate. Several points need to be made. First, the command begins with a semicolon (;), but the ;Scenario command is followed by a colon (:). Next, the variable specified must be included within at least one of the utility functions and must belong to the alternative specified in the round brackets. It is possible to simulate a change in an attribute belonging to more than one alternative by specifying each alternative within the round brackets, separated by commas. Thus:
;Scenario: invt2(bs,tn)
will simulate a change in the invehicle time attribute of both the bs and tn
alternatives.
The actions specifications are as follows:
= will be the specific value which the variable indicated is to take for each
decision maker (e.g., invt2(bs) = 20 will simulate invehicle time equal to
20 minutes for the bus alternative for all individuals); or
= [+] will add the value following to the observed value within the data for
each decision maker (e.g., invt2(bs) = [+]20 will add 20 minutes to the
observed value of the invehicle time attribute for the bus alternative for
each individual); or
= [-] will subtract the value following from the observed value within the data
for each decision maker (e.g., invt2(bs) = [-]20 will subtract 20 minutes
from the observed value of the invehicle time attribute for the bus alter-
native for each individual); or
= [*] will multiply the observed value within the data by the value following
for each individual (e.g., invt2 (bs) = [*]2.0 will double the observed
value of the invehicle time attribute for the bus alternative for each
decision maker); or
= [/] will divide the observed value within the data by the value following
for each individual (e.g., invt2 (bs) = [/]2.0 will halve the observed value
of the invehicle time attribute for the bus alternative for each individual).
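The five scenario actions are simple transformations of an attribute column. The following sketch mimics their effect in plain Python; it is an illustration of what Nlogit does to the data, not Nlogit syntax, and the attribute values are hypothetical.

```python
# Illustration of the scenario "actions": fix, add, subtract, multiply, divide.
def apply_action(values, action, magnitude):
    ops = {
        "=": lambda v: magnitude,      # fix the attribute at a new value
        "+": lambda v: v + magnitude,  # add to the observed value
        "-": lambda v: v - magnitude,  # subtract from the observed value
        "*": lambda v: v * magnitude,  # multiply the observed value
        "/": lambda v: v / magnitude,  # divide the observed value
    }
    return [ops[action](v) for v in values]

invt2_bs = [30.0, 45.0, 60.0]              # hypothetical invehicle times (bus)
print(apply_action(invt2_bs, "+", 20))     # [50.0, 65.0, 80.0]
print(apply_action(invt2_bs, "*", 2.0))    # [60.0, 90.0, 120.0]
```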
The Simulation command may specify that more than one attribute is to
change, and changes may be different across alternatives. To specify more
than one change, the command syntax is as above, however new scenarios are
separated with a slash (/). We show this below:
;Simulation = <list of alternatives>
;Scenario: <variable1(alternativei)> = <[action]magnitude of action> /
<variablek(alternativej)> = <[action]magnitude of action> $
We leave it to the reader to interpret the output above, but note that we use
this model as the basis for demonstrating the simulation capability that
follows. The output generated is as follows. Note that even though the
model is specified as before, Nlogit does not reproduce the standard results
shown above. Only the simulation results shown below are produced.
The first output indicates the total number of observations available for use
for the simulation. This should equate to the number of observations used in
the base choice model. The next output box informs the analyst that the
simulation may be performed on a subset of the available alternatives as
specified with the ;simulation = <list of alternatives> command. The remain-
der of the information provided by this output box informs the reader as to
how to interpret the remainder of the simulation output.
The third output box informs the analyst which simulation change(s) were modeled and which attributes and alternatives those changes apply to.
We leave it to the reader to discover which heading applies to which action.
The last Nlogit simulation output indicates how the actions specified in
the simulation impact upon the choice shares for each of the alternatives.
The first section of this output provides the base shares for the base or
constants-only model (not to be confused with the base choice calibration
model estimated at step 1 of the simulation). The third column of the output
demonstrates how the changes specified by the analyst impact upon these
base choice shares.
In the example, a reduction of invehicle time for bus, train, and busway to
90 percent of the before value will produce an estimated market share for the
bus alternative of 26.718, up from 20.203, ceteris paribus. The same change
will produce market shares of 32.596, 12.479, and 28.207 for the train, busway,
and car alternatives, respectively, ceteris paribus.
The final column provides the change in choice shares for each of the
alternatives, both as a percentage and in raw numbers for the sample. Thus, a
reduction of invehicle time for bus, train, and busway to 90 percent of the
before value, ceteris paribus, decreases the car share by 9.39 percentage
points, which translates to 18 of the original 74 choices for that alternative
switching to another alternative. Of these, 13, 3, and 3 choices are predicted
to switch to the bus, train, and busway alternatives, respectively. We ignore
any rounding errors.
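The switching arithmetic can be read directly off the simulator's share table. The sketch below uses the base and scenario counts from the output for this scenario to recover the "ChgNumber" column and the rounding discrepancy in the totals.

```python
# Base and scenario counts from the simulated shares table.
base     = {"bs": 40, "tn": 61, "bw": 22, "cr": 74}
scenario = {"bs": 53, "tn": 64, "bw": 25, "cr": 56}

change = {k: scenario[k] - base[k] for k in base}
print(change)                  # {'bs': 13, 'tn': 3, 'bw': 3, 'cr': -18}

# The 18 choices lost by car are (up to rounding) the 13 + 3 + 3 gained
# by bus, train, and busway; the totals differ by one rounding error.
print(sum(change.values()))    # 1
```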
As an aside, it is possible to conduct more than one simulation concurrently and compare the
results of each simulation using Nlogit. To do so, the analyst separates the simulations
within the scenario command by use of the & character, as we have done in the example. We
show this below for two simulations, although it is possible to perform more than two. The
comparisons generated are pairwise comparisons, and as such Nlogit will generate output
comparing each possible combination of simulation scenario specified by the analyst.
This will simulate the choice probability for values of TC ranging from 0 to
10 in steps of 0.5, tabulate the results, and plot the average predicted prob-
ability with a confidence interval. These are the results:
--------------------------------------------------------------------------------------------
Model Simulation Analysis for Logit Probability Function
--------------------------------------------------------------------------------------------
Simulations are computed by average over sample observations
--------------------------------------------------------------------------------------------
User Function Function Standard
(Delta method) Value Error |t| 95% Confidence Interval
--------------------------------------------------------------------------------------------
Avrg. Function .40571 .03101 13.09 .34495 .46648
TC = .00 .50437 .06555 7.69 .37589 .63286
TC = .50 .49078 .05928 8.28 .37459 .60698
Figure 13.1 expands on our earlier results. We can see that at the average toll
of about 3.8, the average predicted probability is about 0.4; we found 0.41
earlier. As the toll changes from 0 to 10, we can see the predicted probability
falling from a bit over 0.5 down to about 0.25. We can see clearly, with these
tools, the implication of the estimated model for the relationship between toll
cost and the choice to drive versus not drive.
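Conceptually, the grid simulation fixes TC at each value for every observation, averages the fitted probabilities, and traces the resulting curve. The sketch below reproduces that logic with invented data and assumed coefficient values (not the estimates above); only the shape of the result, a probability falling as TC rises, is the point.

```python
import numpy as np

# Hypothetical sample and assumed binary logit coefficients.
rng = np.random.default_rng(1)
n = 175
invt = rng.uniform(20, 100, n)
alpha, b_invt, b_tc = 3.0, -0.041, -0.146

def avg_prob(tc_value):
    """Average fitted probability when TC is fixed at tc_value for everyone."""
    u = alpha + b_invt * invt + b_tc * tc_value
    return float(np.mean(1.0 / (1.0 + np.exp(-u))))

grid = np.arange(0.0, 10.5, 0.5)          # TC from 0 to 10 in steps of 0.5
probs = [avg_prob(tc) for tc in grid]

# With a negative TC coefficient, the curve must fall monotonically.
assert all(a > b for a, b in zip(probs, probs[1:]))
```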
[Figure 13.1 Experiment I: simulated scenario with confidence intervals. Average simulated function and confidence interval plotted against TC (0.00 to 10.50).]
[Figure 13.2 Experiment II: simulated scenario with confidence intervals. Average simulated function and confidence interval plotted against TC (0.00 to 10.50).]
|-> Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt
;Simulation;arc
;Scenario: invt2(bs,tn,bw) = 0.9 & invt2(bs,tn,bw) = 0.8 $
+---------------------------------------------------------------+
| Discrete Choice (One Level) Model |
| Model Simulation Using Previous Estimates |
| Number of observations 197 |
+---------------------------------------------------------------+
+--------------------------------------------------------------------------- +
|Simulations of Probability Model |
|Model: Discrete Choice (One Level) Model |
|Simulated choice set may be a subset of the choices. |
|Number of individuals is the probability times the |
|number of observations in the simulated sample. |
|Column totals may be affected by rounding error. |
|The model used was simulated with 197 observations.|
+--------------------------------------------------------------------------- +
----------------------------------------------------------------------------------------------------------
Estimated Arc Elasticities Based on the Specified Scenario. Rows in the table
report 0.00 if the indicated attribute did not change in the scenario
or if the average probability or average attribute was zero in the sample.
Estimated values are averaged over all individuals used in the simulation.
Rows of the table in which no changes took place are not shown.
------------------------------------------------------------------------------------------------------------
Attr Changed in | Change in Probability of Alternative
------------------------------------------------------------------------------------------------------------
Choice BS | BS TN BW CR
x = INVTPT | -.236 -.064 -.102 .192
Choice TN | BS TN BW CR
x = INVTPT | -.218 -.065 -.104 .187
Choice BW | BS TN BW CR
x = INVTPT | -.217 -.065 -.102 .186
Note, results above aggregate more than one change. They are not elasticities.
------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
Specification of scenario 1 is:
Attribute Alternatives affected Change type Value
------------- ----------------------------------------- --------------------------------- ------------
INVT2 BS TN BW Fix at new value .900
------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
Specification of scenario 2 is:
Attribute Alternatives affected Change type Value
------------- ----------------------------------------- --------------------------------- ------------
INVT2 BS TN BW Fix at new value .800
------------------------------------------------------------------------------------------------------------
The simulator located 197 observations for this scenario.
Simulated Probabilities (shares) for this scenario:
+--------------+------------------- +------------------- +-------------------------+
|Choice | Base | Scenario | Scenario - Base |
| |%Share Number |%Share Number |ChgShare ChgNumber|
+--------------+------------------- +------------------- +-------------------------+
|BS | 20.203 40 | 26.725 53 | 6.522% 13 |
|TN | 31.126 61 | 32.604 64 | 1.478% 3 |
|BW | 11.075 22 | 12.482 25 | 1.407% 3 |
|CR | 37.596 74 | 28.189 56 | -9.407% -18 |
|Total |100.000 197 |100.000 198 | .000% 1 |
+--------------+------------------- +------------------- +-------------------------+
13.5 Weighting
It is not uncommon for an analyst to weight data so that it conforms to some
prior view of the world. Consider an example where an analyst has sample data
from a population for which census data (i.e., data on the entire population)
is also available. While the research objective may mean that the sample
contains variables not included in the census, any variables common to the two
data sets may be used to re-weight the sample data so that it corresponds with
the distributions of the total population as observed in the census data.
The information held by the analyst may be used in one of two ways to
weight the data. First, if the information pertains to the true market
shares of the alternatives, the weighting criterion to be applied is said to
be endogenous (meaning internal to the choice response). The market shares
for the alternatives are represented by the choice variable within the data
set. If the information held by the analyst relates to any variable other
than the choice variable, the weighting criterion is said to be exogenous
(meaning external to the system). The distinction between endogenous and
exogenous weighting is important, as they are handled differently by Nlogit.
We now discuss both forms of weighting.
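Exogenous weighting amounts to rescaling each observation by the ratio of the population share to the sample share of the cell it falls in. A minimal sketch, using hypothetical gender shares (not from any census or from the mode choice sample):

```python
# Hypothetical sample and census shares for a variable common to both.
sample_share = {"male": 0.60, "female": 0.40}   # observed in the sample
census_share = {"male": 0.49, "female": 0.51}   # known for the population

# Exogenous weight = population share / sample share for each cell.
weights = {g: census_share[g] / sample_share[g] for g in sample_share}

# Males are down-weighted, females up-weighted, so the weighted sample
# reproduces the census gender split.
print({g: round(w, 3) for g, w in weights.items()})  # {'male': 0.817, 'female': 1.275}
```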
+-------------------------------------------------------------------------------------+
| Model Specification: Table entry is the attribute that |
| multiplies the indicated parameter. |
+-----------+--------+--------------------------------------------------------------- +
| Choice |******| Parameter |
| |Row 1| BS ACTPT INVCPT INVTPT EGTPT |
| |Row 2| TRPT TN BW INVTCAR TC |
| |Row 3| PC EGTCAR |
+-----------+--------+--------------------------------------------------------------- +
|BS | 1| Constant ACT INVC INVT2 EGT |
| | 2| TRNF none none none none |
| | 3| none none |
|TN | 1| none ACT INVC INVT2 EGT |
| | 2| TRNF Constant none none none |
| | 3| none none |
|BW | 1| none ACT INVC INVT2 EGT |
| | 2| TRNF none Constant none none |
| | 3| none none |
|CR | 1| none none none none none |
| | 2| none none none INVT TC |
| | 3| PC EGT |
+-------------------------------------------------------------------------------------+
Normal exit: 6 iterations. Status=0, F= 190.4789
----------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -190.47891
Estimation based on N = 197, K = 12
Inf.Cr.AIC = 405.0 AIC/N = 2.056
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Chi-squared[ 9] = 152.66810
Prob [ chi squared > value ] = .00000
Vars. corrected for choice based sampling
Response data are given as ind. choices
Number of obs.= 197, skipped 0 obs
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -1.68661** .74953 -2.25 .0244 -3.15566 -.21756
ACTPT| -.04533*** .01667 -2.72 .0065 -.07800 -.01265
INVCPT| -.08405 .07151 -1.18 .2399 -.22421 .05611
INVTPT| -.01368 .00840 -1.63 .1033 -.03013 .00278
EGTPT| -.04892* .02934 -1.67 .0954 -.10642 .00858
TRPT| -1.07979*** .41033 -2.63 .0085 -1.88403 -.27555
TN| -1.39443* .72606 -1.92 .0548 -2.81748 .02862
+---------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative BS |
| Utility Function | | 38.0 observs. |
| Coefficient | All 197.0 obs.|that chose BS |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------ ----------- | -------------------------- +--------------------------- |
| BS -1.6866 ONE | 1.000 .000| 1.000 .000 |
| ACTPT -.0453 ACT | 5.944 4.662| 5.053 4.312 |
| INVCPT -.0840 INVC | 7.071 3.872| 7.237 6.015 |
| INVTPT -.0137 INVT2 | 71.797 43.551| 52.000 17.747 |
| EGTPT -.0489 EGT | 8.680 7.331| 9.105 10.467 |
| TRPT -1.0798 TRNF | .442 .498| .079 .273 |
+---------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative TN |
| Utility Function | | 46.0 observs. |
| Coefficient | All 187.0 obs.|that chose TN |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------ ----------- | -------------------------- +--------------------------- |
| ACTPT -.0453 ACT | 16.016 8.401| 15.239 6.651 |
| INVCPT -.0840 INVC | 4.947 2.451| 4.065 2.435 |
| INVTPT -.0137 INVT2 | 45.257 15.421| 43.630 9.903 |
| EGTPT -.0489 EGT | 8.882 6.788| 7.196 5.714 |
| TRPT -1.0798 TRNF | .230 .422| .174 .383 |
| TN -1.3944 ONE | 1.000 .000| 1.000 .000 |
+---------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative BW |
| Utility Function | | 42.0 observs. |
| Coefficient | All 188.0 obs.|that chose BW |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------ ----------- | -------------------------- +--------------------------- |
| ACTPT -.0453 ACT | 10.707 17.561| 5.405 4.854 |
| INVCPT -.0840 INVC | 7.000 3.599| 6.405 1.345 |
| INVTPT -.0137 INVT2 | 50.904 20.300| 54.643 15.036 |
| EGTPT -.0489 EGT | 10.027 9.811| 8.286 5.932 |
| TRPT -1.0798 TRNF | .271 .446| .095 .297 |
| BW -2.4847 ONE | 1.000 .000| 1.000 .000 |
+---------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------+
| Descriptive Statistics for Alternative CR |
| Utility Function | | 71.0 observs. |
| Coefficient | All 175.0 obs.|that chose CR |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------ ----------- | -------------------------- +--------------------------- |
| INVTCAR -.0485 INVT | 55.406 24.166| 43.324 15.839 |
| TC -.0918 TC | 3.766 2.705| 2.592 2.708 |
| PC -.0190 PC | 11.606 13.551| 5.859 10.184 |
| EGTCAR -.0549 EGT | 6.469 9.348| 3.958 4.634 |
+---------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------- +
| Cross tabulation of actual choice vs. predicted P(j) |
| Row indicator is actual, column is predicted. |
| Predicted total is F(k,j,i)=Sum(i=1,...,N) P(k,j,i). |
| Column totals may be subject to rounding error. |
+--------------------------------------------------------------------------- +
-----------+------------------------------------------------------------------------------------
NLOGIT Cross Tabulation for 4 outcome Multinomial Choice Model
XTab_Prb| BS TN BW CR Total
-----------+------------------------------------------------------------------------------------
BS| 12.0000 12.0000 4.00000 10.0000 38.0000
TN| 10.0000 19.0000 5.00000 12.0000 46.0000
BW| 9.00000 18.0000 8.00000 7.00000 42.0000
CR| 8.00000 13.0000 5.00000 45.0000 71.0000
Total| 40.0000 61.0000 22.0000 74.0000 197.000
+--------------------------------------------------------------------------- +
| Cross tabulation of actual y(ij) vs. predicted y(ij) |
| Row indicator is actual, column is predicted. |
| Predicted total is N(k,j,i)=Sum(i=1,...,N) Y(k,j,i). |
| Predicted y(ij)=1 is the j with largest probability. |
+--------------------------------------------------------------------------- +
-----------+------------------------------------------------------------------------------------
NLOGIT Cross Tabulation for 4 outcome Multinomial Choice Model
XTab_Frq| BS TN BW CR Total
-----------+------------------------------------------------------------------------------------
BS| 13.0000 13.0000 .000000 12.0000 38.0000
TN| 8.00000 22.0000 2.00000 14.0000 46.0000
BW| 8.00000 24.0000 2.00000 8.00000 42.0000
CR| 5.00000 10.0000 .000000 56.0000 71.0000
Total| 34.0000 69.0000 4.00000 90.0000 197.000
+----------------------------------------------------------------------- +
| Derivative averaged over observations.|
| Effects on probabilities of all choices in model: |
| * = Direct Derivative effect of the attribute. |
+----------------------------------------------------------------------- +
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in BS
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -.36240*** .01748 -20.73 .0000 -.39666 -.32813
TN| .10679*** .00573 18.64 .0000 .09556 .11801
BW| .12445*** .00619 20.11 .0000 .11232 .13658
CR| .06967*** .00548 12.72 .0000 .05894 .08041
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:42:27 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in TN
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .10998*** .00549 20.04 .0000 .09923 .12073
TN| -.21858*** .01046 -20.90 .0000 -.23908 -.19809
BW| .14703*** .00647 22.71 .0000 .13434 .15972
CR| .07856*** .00575 13.67 .0000 .06730 .08982
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:42:27 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in BW
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .06621*** .00336 19.73 .0000 .05963 .07279
TN| .07187*** .00356 20.19 .0000 .06490 .07885
BW| -.44186*** .01090 -40.53 .0000 -.46323 -.42050
CR| .03508*** .00256 13.69 .0000 .03005 .04010
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:42:27 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVT2 in BS
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -.53699*** .02234 -24.03 .0000 -.58078 -.49320
-----------------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in BS
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -.00857*** .00020 -43.20 .0000 -.00896 -.00818
TN| .00350*** .00017 20.71 .0000 .00317 .00383
BW| .00184*** .00011 16.76 .0000 .00162 .00205
CR| .00277*** .00014 19.80 .0000 .00249 .00304
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:42:28 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in TN
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .00416*** .00018 23.04 .0000 .00381 .00451
TN| -.00939*** .00014 -66.97 .0000 -.00966 -.00911
BW| .00316*** .00017 18.66 .0000 .00282 .00349
CR| .00385*** .00017 22.06 .0000 .00351 .00419
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:42:28 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in BW
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .00195*** .00011 17.31 .0000 .00173 .00217
TN| .00276*** .00015 18.60 .0000 .00247 .00305
BW| -.00620*** .00018 -35.10 .0000 -.00654 -.00585
CR| .00131*** .7888D-04 16.58 .0000 .00115 .00146
-----------+-----------------------------------------------------------------------------------------
nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:42:28 AM
-----------------------------------------------------------------------------------------------------
Elasticity wrt change of X in row choice on Prob[column choice]
-----------+---------------------------------------------------
INVC | BS TN BW CR
-----------+---------------------------------------------------
BS| -.3624 .1068 .1245 .0697
TN| .1100 -.2186 .1470 .0786
BW| .0662 .0719 -.4419 .0351
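The entries above follow the standard MNL point-elasticity formulas: the direct elasticity of P_j with respect to attribute x_jk is β_k·x_jk·(1 − P_j), and the cross elasticity for every other alternative is −β_k·x_jk·P_j. At a single evaluation point the cross elasticities are identical for all non-j alternatives (a consequence of IIA); the sample averages reported by Nlogit differ across alternatives because they are probability-weighted averages over observations. A minimal sketch, using the estimated invc coefficient but otherwise hypothetical utilities and an illustrative attribute level:

```python
import math

def mnl_probs(utils):
    """Multinomial logit choice probabilities from a list of utilities."""
    m = max(utils)  # subtract the max to stabilize the exponentials
    e = [math.exp(u - m) for u in utils]
    s = sum(e)
    return [x / s for x in e]

def mnl_point_elasticities(beta_k, x_jk, probs, j):
    """Point elasticities of each alternative's probability with respect to
    attribute k of alternative j: direct for i == j, cross otherwise."""
    return [beta_k * x_jk * ((1.0 - probs[j]) if i == j else -probs[j])
            for i in range(len(probs))]

# Illustrative inputs only (the utilities and x level are hypothetical);
# beta_invc is the invc coefficient from the weighted model above.
beta_invc = -0.07101
p = mnl_probs([-1.2, -0.9, -1.1, -0.5])   # bs, tn, bw, cr
e = mnl_point_elasticities(beta_invc, 7.0, p, 0)  # wrt invc in bs
```

Note that e[1], e[2], and e[3] are equal: at a point, a change in one alternative's attribute draws probability proportionally from all others.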
|-> Nlogit
;lhs = choice, cset, altij
;choices = bs,tn,bw,cr? /0.2,0.3,0.1,0.4
;show
;descriptives;crosstabs
;effects:invc(*)/invt2(bs,tn,bw)/invt(cr)/act[bs,tn,bw]
;full
;export=both
;pwt
;wts=gender
;model:
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invtcar*invt + TC*TC + PC*PC + egtcar*egt $
--------------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Weighting variable GENDER
Log likelihood function -100.03373
Estimation based on N = 197, K = 12
Inf.Cr.AIC = 224.1 AIC/N = 1.137
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Chi-squared[ 9] = 70.31989
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 197, skipped 0 obs
-----------+--------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+--------------------------------------------------------------------------------------------
BS| -2.83181** 1.33516 -2.12 .0339 -5.44867 -.21495
ACTPT| -.06195** .02505 -2.47 .0134 -.11105 -.01284
INVCPT| -.07101 .05536 -1.28 .1996 -.17952 .03750
INVTPT| -.00740 .01222 -.60 .5452 -.03136 .01657
EGTPT| -.04317 .02651 -1.63 .1035 -.09513 .00879
TRPT| -1.45832** .57124 -2.55 .0107 -2.57794 -.33870
TN| -2.60598** 1.30510 -2.00 .0459 -5.16394 -.04802
BW| -2.72118** 1.31273 -2.07 .0382 -5.29409 -.14828
INVTCAR| -.06989*** .02210 -3.16 .0016 -.11320 -.02659
TC| -.11222 .13550 -.83 .4075 -.37780 .15335
PC| -.00487 .02631 -.18 .8533 -.05643 .04670
EGTCAR| -.11837* .06617 -1.79 .0736 -.24805 .01132
-----------+--------------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:44:20 AM
--------------------------------------------------------------------------------------------------------
+---------------------------------------------------------------------------------------------------- +
| Descriptive Statistics for Alternative BS |
| Utility Function | | 38.0 observs. |
| Coefficient | All 197.0 obs.|that chose BS |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------- ----------- | -------------------------- +--------------------------- |
| BS -2.8318 ONE | 1.000 .000| 1.000 .000 |
| ACTPT -.0619 ACT | 5.944 4.662| 5.053 4.312 |
| INVCPT -.0710 INVC | 7.071 3.872| 7.237 6.015 |
| INVTPT -.0074 INVT2 | 71.797 43.551| 52.000 17.747 |
| EGTPT -.0432 EGT | 8.680 7.331| 9.105 10.467 |
| TRPT -1.4583 TRNF | .442 .498| .079 .273 |
+---------------------------------------------------------------------------------------------------- +
+---------------------------------------------------------------------------------------------------- +
| Descriptive Statistics for Alternative TN |
| Utility Function | | 46.0 observs. |
| Coefficient | All 187.0 obs.|that chose TN |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------- ----------- | -------------------------- +--------------------------- |
| ACTPT -.0619 ACT | 16.016 8.401| 15.239 6.651 |
| INVCPT -.0710 INVC | 4.947 2.451| 4.065 2.435 |
| INVTPT -.0074 INVT2 | 45.257 15.421| 43.630 9.903 |
| EGTPT -.0432 EGT | 8.882 6.788| 7.196 5.714 |
| TRPT -1.4583 TRNF | .230 .422| .174 .383 |
| TN -2.6060 ONE | 1.000 .000| 1.000 .000 |
+---------------------------------------------------------------------------------------------------- +
+---------------------------------------------------------------------------------------------------- +
| Descriptive Statistics for Alternative BW |
| Utility Function | | 42.0 observs. |
| Coefficient | All 188.0 obs.|that chose BW |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------- ----------- | -------------------------- +--------------------------- |
| ACTPT -.0619 ACT | 10.707 17.561| 5.405 4.854 |
| INVCPT -.0710 INVC | 7.000 3.599| 6.405 1.345 |
| INVTPT -.0074 INVT2 | 50.904 20.300| 54.643 15.036 |
| EGTPT -.0432 EGT | 10.027 9.811| 8.286 5.932 |
| TRPT -1.4583 TRNF | .271 .446| .095 .297 |
| BW -2.7212 ONE | 1.000 .000| 1.000 .000 |
+---------------------------------------------------------------------------------------------------- +
+---------------------------------------------------------------------------------------------------- +
| Descriptive Statistics for Alternative CR |
| Utility Function | | 71.0 observs. |
| Coefficient | All 175.0 obs.|that chose CR |
| Name Value Variable | Mean Std. Dev.|Mean Std. Dev. |
| ------------------------- ----------- | -------------------------- +--------------------------- |
| INVTCAR -.0699 INVT | 55.406 24.166| 43.324 15.839 |
| TC -.1122 TC | 3.766 2.705| 2.592 2.708 |
| PC -.0049 PC | 11.606 13.551| 5.859 10.184 |
| EGTCAR -.1184 EGT | 6.469 9.348| 3.958 4.634 |
+---------------------------------------------------------------------------------------------------- +
+--------------------------------------------------------------------------- +
| Cross tabulation of actual choice vs. predicted P(j) |
| Row indicator is actual, column is predicted. |
| Predicted total is F(k,j,i)=Sum(i=1,...,N) P(k,j,i). |
| Column totals may be subject to rounding error. |
+--------------------------------------------------------------------------- +
-----------+-------------------------------------------------------------------------------------
NLOGIT Cross Tabulation for 4 outcome Multinomial Choice Model
XTab_Prb| BS TN BW CR Total
-----------+-------------------------------------------------------------------------------------
BS| 12.0000 9.00000 9.00000 8.00000 38.0000
TN| 11.0000 15.0000 11.0000 8.00000 46.0000
BW| 8.00000 12.0000 17.0000 4.00000 42.0000
CR| 9.00000 10.0000 11.0000 41.0000 71.0000
Total| 40.0000 47.0000 49.0000 61.0000 197.000
+--------------------------------------------------------------------------- +
| Cross tabulation of actual y(ij) vs. predicted y(ij) |
| Row indicator is actual, column is predicted. |
| Predicted total is N(k,j,i)=Sum(i=1,...,N) Y(k,j,i). |
| Predicted y(ij)=1 is the j with largest probability. |
+--------------------------------------------------------------------------- +
-----------+-------------------------------------------------------------------------------------
NLOGIT Cross Tabulation for 4 outcome Multinomial Choice Model
XTab_Frq| BS TN BW CR Total
-----------+-------------------------------------------------------------------------------------
BS| 12.0000 9.00000 7.00000 10.0000 38.0000
TN| 9.00000 15.0000 11.0000 11.0000 46.0000
BW| 1.00000 9.00000 29.0000 3.00000 42.0000
CR| 2.00000 5.00000 11.0000 53.0000 71.0000
Total| 24.0000 38.0000 58.0000 77.0000 197.000
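The two cross tabulations are built differently: XTab_Prb accumulates each observation's predicted probabilities into the row of its actual choice, while XTab_Frq assigns each observation wholly to the alternative with the largest predicted probability. A small sketch of both calculations, with hypothetical data:

```python
def prob_crosstab(actual, probs, n_alt):
    """Accumulate predicted probabilities by actual choice (rows) and
    alternative (columns), as in the XTab_Prb table."""
    tab = [[0.0] * n_alt for _ in range(n_alt)]
    for a, p in zip(actual, probs):
        for j in range(n_alt):
            tab[a][j] += p[j]
    return tab

def modal_crosstab(actual, probs, n_alt):
    """Count observations by actual choice (rows) and the alternative with
    the largest predicted probability (columns), as in the XTab_Frq table."""
    tab = [[0] * n_alt for _ in range(n_alt)]
    for a, p in zip(actual, probs):
        tab[a][p.index(max(p))] += 1
    return tab

# Tiny illustrative example: 3 observations, 2 alternatives.
actual = [0, 1, 1]
probs = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
pt = prob_crosstab(actual, probs, 2)
ft = modal_crosstab(actual, probs, 2)
```

Each row of the probability table sums to the number of observations that actually chose that alternative, which is why the row totals of the two tables agree while the column totals generally do not.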
+----------------------------------------------------------------------- +
| Derivative averaged over observations.|
| Effects on probabilities of all choices in model: |
| * = Direct Derivative effect of the attribute. |
+----------------------------------------------------------------------- +
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in BS
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -.31580*** .01558 -20.27 .0000 -.34634 -.28526
TN| .09303*** .00644 14.44 .0000 .08041 .10566
BW| .09952*** .00505 19.71 .0000 .08963 .10942
CR| .05877*** .00684 8.59 .0000 .04537 .07218
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:44:21 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in TN
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .06945*** .00431 16.10 .0000 .06100 .07791
TN| -.21258*** .00930 -22.86 .0000 -.23080 -.19436
BW| .08800*** .00475 18.51 .0000 .07868 .09731
CR| .04771*** .00386 12.36 .0000 .04014 .05527
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:44:21 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVC in BW
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .11703*** .00566 20.68 .0000 .10593 .12812
TN| .12681*** .00560 22.65 .0000 .11584 .13779
BW| -.29347*** .00942 -31.14 .0000 -.31194 -.27500
CR| .05798*** .00473 12.24 .0000 .04870 .06726
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:44:21 AM
------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVT2 in BS
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| -.29476*** .01229 -23.97 .0000 -.31886 -.27066
TN| .08683*** .00398 21.84 .0000 .07904 .09463
BW| .09679*** .00430 22.51 .0000 .08836 .10521
CR| .05176*** .00316 16.39 .0000 .04557 .05795
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:44:21 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average elasticity of prob(alt) wrt INVT2 in TN
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .07183*** .00339 21.17 .0000 .06519 .07848
-----------------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in TN
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .00413*** .00020 20.41 .0000 .00373 .00453
TN| -.01225*** .00023 -53.37 .0000 -.01269 -.01180
BW| .00576*** .00028 20.80 .0000 .00522 .00630
CR| .00370*** .00021 17.76 .0000 .00329 .00410
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:44:22 AM
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Average partial effect on prob(alt) wrt ACT in BW
-----------+-----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
Choice| Coefficient Error z |z|>Z* Interval
-----------+-----------------------------------------------------------------------------------------
BS| .00513*** .00025 20.13 .0000 .00463 .00563
TN| .00594*** .00029 20.59 .0000 .00537 .00650
BW| -.01279*** .00021 -60.05 .0000 -.01321 -.01237
CR| .00311*** .00020 15.83 .0000 .00272 .00349
-----------+-----------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 16, 2013 at 09:44:22 AM
-----------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -230.45797
Estimation based on N = 197, K = 6
Inf.Cr.AIC = 472.9 AIC/N = 2.401
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Chi-squared[ 3] = 72.70997
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 197, skipped 0 obs
-----------+---------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+---------------------------------------------------------------------------------------
BS| -.52833** .26519 -1.99 .0463 -1.04809 -.00857
INVTZ| -.03639*** .00805 -4.52 .0000 -.05216 -.02061
INVTCZ| .00049 .00095 .51 .6098 -.00138 .00235
INVCQZ| -.00048 .00141 -.34 .7326 -.00325 .00229
TN| -.94074*** .22709 -4.14 .0000 -1.38582 -.49566
BW| -.87783*** .25289 -3.47 .0005 -1.37348 -.38218
-----------+---------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 23, 2013 at 10:44:19 AM
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
WALD procedure. Estimates and standard errors
for nonlinear functions and joint test of
nonlinear restrictions.
Wald Statistic = .22617
Prob. from Chi-squared[ 1] = .63438
Functions are computed at means of variables
-----------+---------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Function Error z |z|>Z* Interval
-----------+---------------------------------------------------------------------------------------
Fncn(1)| 1.66632 3.50379 .48 .6344 -5.20097 8.53361
-----------+---------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
---------------------------------------------------------------------------------------------------
The preceding syntax computes the function, and the asymptotic standard error for the function, shown in Equation 8.27 of Chapter 8 from Rose et al. (2012). It reports the function value, standard error, and confidence interval. The function can also be computed over a range of values of a variable. For example, to carry out this exercise for values of invc ranging from 5 to 50 and plot the results, one might use:
; Scenario: & invc = 5(5)50; Plot(ci)
which computes the result for invc = 5, 10, ..., 50 and plots the function value against the values of invc, with confidence limits. Note that the description there calls this the "mean WTP." It is not the mean WTP; it is the WTP at the means. To compute the mean WTP, the function would be computed for each observation in the sample and the results averaged. This calculation is obtained by removing ;Means from the command above. An alternative to the delta method is the Krinsky-Robb (K&R) method. The preceding can be changed to the K&R method (see Chapter 7) by adding:
;K&R ; Draws = number
to the preceding command. Some researchers suggest that the number of draws should be greater than 5,000; perhaps 1,000 will be sufficient, so test both if unsure. It is also possible to use K&R for the average WTP. This is a large amount of computation, though if the sample size is not too large it should be tolerable.
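The K&R method itself is simple to sketch: draw coefficient vectors from the estimated asymptotic (multivariate normal) distribution, evaluate the WTP function at each draw, and read the confidence limits off the simulated distribution. The following is a minimal two-coefficient illustration with hypothetical point estimates, hypothetical standard errors, and a zero covariance; Nlogit's ;K&R uses the full estimated covariance matrix:

```python
import random
import statistics

def krinsky_robb_wtp(b_time, b_cost, se_time, se_cost, cov, n_draws=5000, seed=1):
    """Krinsky-Robb simulation of a WTP ratio (b_time / b_cost): draw the two
    coefficients from their joint normal distribution and return the simulated
    WTP values. Illustrative two-parameter version only."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = se_time
    l21 = cov / se_time
    l22 = (se_cost ** 2 - l21 ** 2) ** 0.5
    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        bt = b_time + l11 * z1
        bc = b_cost + l21 * z1 + l22 * z2
        draws.append(bt / bc)
    return draws

# Hypothetical estimates (not taken from the models above).
wtp = krinsky_robb_wtp(-0.0699, -0.1122, 0.0221, 0.02, 0.0)
med = statistics.median(wtp)
s = sorted(wtp)
lo, hi = s[int(0.025 * len(s))], s[int(0.975 * len(s))]
```

The 95 percent interval (lo, hi) is read directly from the empirical 2.5 and 97.5 percentiles of the draws; no normality of the WTP ratio itself is assumed, which is the main attraction of K&R over the delta method.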
nlogit;lhs=mode
;choices=air,train,bus,car
;ivb=CSmode
;model:
U(air)=invc*invc+invt*invt/
U(train)=invc*invc+invt*invt/
U(bus)=invc*invc+invt*invt/
U(car)=invc*invc+invt*invt
;SIMULATION
;Scenario:invc(air)=[*]1.5$
calc;list;beta=b(1)$
matr;be=b(1:2)$
create
;csA=csmode/b(1)$
dstats;rhs=csA$
create;DeltaCS=csB-csA$
dstats;rhs=DeltaCS$
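The logic of this consumer surplus calculation (dividing the stored logsum by the cost coefficient) can be sketched directly: under MNL, expected consumer surplus for a decision maker is the logsum divided by the negative of the cost coefficient, and the change in surplus is the difference between the after- and before-scenario values. A minimal sketch with hypothetical utilities and a hypothetical cost coefficient:

```python
import math

def consumer_surplus(utils, beta_cost):
    """Expected consumer surplus for one decision maker under MNL:
    the logsum divided by the negative of the cost coefficient."""
    logsum = math.log(sum(math.exp(u) for u in utils))
    return logsum / (-beta_cost)

# Hypothetical utilities (air, train, bus, car) before and after a fare
# increase that lowers the air alternative's utility.
beta_cost = -0.02
v_before = [1.0, 0.6, 0.2, 0.8]
v_after = [0.4, 0.6, 0.2, 0.8]
delta_cs = consumer_surplus(v_after, beta_cost) - consumer_surplus(v_before, beta_cost)
```

Because the scenario makes one alternative worse and leaves the rest unchanged, delta_cs is negative: the fare rise is a welfare loss, measured in the cost units implied by beta_cost.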
The random regret model (RRM) is set out in Section 8.2 of Chapter 8, and interest in it as an alternative to the RUM is growing. The data used to investigate differences between RUM and RRM is drawn from a larger study undertaken in Sydney on the demand for alternative-fueled automobiles. Full details, including the properties of the design experiment, are given in Beck et al. (2012, 2013) and Hensher et al. (2012). The data was collected over a four-month period in 2009. The final sample used in model estimation here
¹ We have been asked whether "the RRM model has a higher requirement on the reliability of the SP data (depending on whether respondents seriously consider all alternatives in the SP game) because it uses the unchosen alternatives as well in estimation." We believe that although the information on attributes of alternatives is used in a different way in RRM compared to RUM, the very same issues in relation to how information on attributes is processed are present under RUM. Indeed, there are a number of studies under RUM that investigate deviations from a reference or status quo alternative that involve using data in a differencing manner (see, for example, Hess et al. 2008).
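For reference, the regret function at the heart of RRM (Section 8.2) sums, over each competing alternative j and each attribute k, the term ln(1 + exp(β_k(x_jk − x_ik))); choice probabilities then take the MNL form over the negatives of the regrets. A minimal sketch with hypothetical coefficients and attribute levels (not the estimates reported here):

```python
import math

def regret(i, X, beta):
    """Systematic regret for alternative i: sum over competing alternatives j
    and attributes k of ln(1 + exp(beta_k * (x_jk - x_ik)))."""
    return sum(math.log(1.0 + math.exp(b * (X[j][k] - X[i][k])))
               for j in range(len(X)) if j != i
               for k, b in enumerate(beta))

def rrm_probs(X, beta):
    """RRM choice probabilities: MNL form over the negatives of the regrets."""
    r = [regret(i, X, beta) for i in range(len(X))]
    m = min(r)  # shift for numerical stability
    e = [math.exp(-(ri - m)) for ri in r]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical attribute levels (price, fuel cost) for Pet, Die, Hyb.
beta = [-0.05, -0.10]
X = [[30.0, 8.0], [32.0, 6.0], [35.0, 5.0]]
p = rrm_probs(X, beta)
```

Because regret is driven by pairwise attribute comparisons, each unchosen alternative's attribute levels enter every alternative's regret, which is the sense in which RRM "uses the unchosen alternatives" more directly than RUM.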
the log-likelihood (LL) of each model by a quantity equal to the number of its parameters. Model selection using the AIC then simply consists of comparing the two models' AIC values: the model with the smaller AIC is preferred. On the AIC test, the RRM is marginally superior in statistical fit to the RUM. All parameters have the expected sign and are statistically significant at the 95 percent confidence level, except for registration fee. The fuel-specific constants show a preference for petrol vehicles, after controlling for the observed attributes.
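The AIC arithmetic is easily checked: AIC = 2K − 2LL, and the smaller value wins. The sketch below reproduces the AIC values printed in the two Nlogit outputs reported earlier in this section (the comparison between these two particular models is illustrative only, since they were estimated under different specifications and weighting):

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2K - 2LL (smaller is better)."""
    return 2.0 * k - 2.0 * log_likelihood

# Log-likelihoods and parameter counts from the outputs above.
aic_weighted = aic(-100.03373, 12)   # weighted model, K = 12 (Inf.Cr.AIC = 224.1)
aic_quadratic = aic(-230.45797, 6)   # 6-parameter model (Inf.Cr.AIC = 472.9)
preferred = "first" if aic_weighted < aic_quadratic else "second"
```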
Figures 13.3 to 13.6 depict the RUM (ProbRUM) and RRM (ProbRRM) probability distributions across the sample overall and for each of the alternative fuel types, as well as the differences (ProbDif) between RUM and RRM. The most notable evidence is the narrower range and more peaked distribution for RRM compared to the RUM, suggesting greater heterogeneity in predicted probabilistic choice under RUM. The incidence of greater
observation frequency around the mean and median is most stark under RRM compared to RUM, despite overall model fits being relatively similar. There are clear differences in the choice probabilities associated with each respondent, as highlighted in the ProbDif graphs. This suggests that the implied elasticities associated with one or more attributes are likely to differ, given their dependence on the choice probabilities (see below).
All the mean² elasticities obtained from the RUM and RRM are summarized in Table 13.3. Although the absolute magnitudes appear at first glance to be relatively similar, with some exceptions such as vehicle price, many of the elasticities are quite different in percentage terms (varying between 1.21 and 18.95 percent). The vehicle price elasticities for RRM are greater than for
² Mean elasticities are obtained by probability weighting the respondent-specific elasticities, where the probability weight relates to the probability of choosing a particular alternative in a given choice set.
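The probability weighting described in footnote 2 can be sketched as follows, with hypothetical respondent-specific elasticities and choice probabilities:

```python
def prob_weighted_mean_elasticity(elasticities, probs):
    """Mean elasticity across respondents, weighting each respondent's
    elasticity by his or her predicted probability of choosing the
    alternative, rather than taking a simple average."""
    num = sum(e * p for e, p in zip(elasticities, probs))
    den = sum(probs)
    return num / den

# Hypothetical direct elasticities and choice probabilities for 3 respondents.
e_i = [-0.30, -0.45, -0.20]
p_i = [0.6, 0.2, 0.2]
mean_e = prob_weighted_mean_elasticity(e_i, p_i)
```

The weighting pulls the mean toward the elasticities of respondents who are most likely to choose the alternative, which is why it generally differs from the naive average (here −0.317).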
[Figure 13.3 Profile of choice probabilities for RUM and RRM, sample overall: kernel density estimates for PROBDIF, PROBRUM, and PROBRRM]
Figure 13.4 Profile of petrol choice probabilities for RUM and RRM
Figure 13.5 Profile of diesel choice probabilities for RUM and RRM
RUM by between 4.22 and 12.39 percent, for fuel price they are greater by
between 1.21 and 9.5 percent, for fuel efficiency they are higher by between
5.31 and 18.95 percent, and for annual emissions surcharge they are higher by
between 1.90 and 10.2 percent. These differences are substantial, and they
[Table 13.3 (fragment): attribute, absolute difference (RUM − random regret), and percent difference; entries not recovered]
Figure 13.6 Profile of hybrid choice probabilities for RUM and RRM
|-> rrlogit
;choices=Pet,Die,Hyb
;lhs=choice,cset,alt
;effects:fuel(*)/aes(*)/price(*)/ves(*)/rego(*)/fe(*);pwt
;model:
U(Pet) = Petasc + price*price + fuel*fuel + rego*rego + AES*AES + VES*VES + FE*FE + EC*EC + SC*SC +
pricpage*pricpage+pricft*pricft+pricpinc*pricpinc+prichinc*prichinc+Kor*Kor /
U(Die) = Dieasc + price*price + fuel*fuel + rego*rego + AES*AES + VES*VES + FE*FE + EC*EC + SC*SC
+pricpage*pricpage+pricft*pricft+pricpinc*pricpinc+prichinc*prichinc+Kor*Kor /
U(Hyb) = price*price+fuel*fuel+rego*rego+AES*AES+VES*VES+FE*FE+EC*EC+SC*SC
+pricpage*pricpage+pricft*pricft+pricpinc*pricpinc+prichinc*prichinc+Kor*Kor + male*pgend$
³ It is important to note that an elasticity calculation embeds a number of estimates of parameters and probabilities (see Equation (8.16) in Chapter 8), and hence it is extremely complex (if not practically impossible) to derive the standard errors required to test a hypothesis about an elasticity. The delta method or Krinsky-Robb procedures could be implemented for that purpose, but even for a simple multinomial choice model the elasticity calculation is extremely complex to program, if it could be done at all. In any case, we would not trust a hypothesis test for an elasticity even if the standard errors were computed by the delta method.
As an aside, Maximize gives different standard errors from the Nlogit command because Maximize uses only first derivatives while Nlogit uses the Hessian; the differences in the standard error estimates will be small. The example below estimates a non-linear MNL.
Sample ; All $
Nlogit ; Lhs = Mode ; Choices = Air,Train,Bus,Car
; Rhs = ttme,invc,invt,gc ; rh2=one,hinc$
? Air,Train,Bus,Car
create ; da=mode ; dt=mode[+1] ; db=mode[+2] ; dc=mode[+3] $
create ; ttmea=ttme ; ttmet=ttme[+1] ; ttmeb = ttme[+2] ; ttmec=ttme[+3] $
create ; invca=invc ; invct=invc[+1] ; invcb = invc[+2] ; invcc=invc[+3] $
create ; invta=invt ; invtt=invt[+1] ; invtb = invt[+2] ; invtc=invt[+3] $
create ; gca =gc ; gct = gc[+1] ; gcb = gc[+2] ; gcc = gc[+3] $
Create ; J = Trn(-4,0) $
Reject ; J > 1 $
Maximize
; Labels = aa,at,ab,bttme,binvc,binvt,bgc, bha,bht,bhb
; Start = 4.375,5.914,4.463,-.10289,-.08044,-.01299,.07578,.00428,-.05907,-.02295
; Fcn = ua = aa + bttme*ttmea + binvc*invca + binvt*invta + bgc*gca + bha*hinc |
va = exp(ua) |
ut = at + bttme*ttmet + binvc*invct + binvt*invtt + bgc*gct + bht*hinc |
vt = exp(ut) |
ub = ab + bttme*ttmeb + binvc*invcb + binvt*invtb + bgc*gcb + bhb*hinc |
vb = exp(ub) |
When the data consists of two subsets, for example an RP data set and a
counterpart SP data set, it is sometimes useful to fit the model with one of the
data sets, then refit the second one while retaining the original coefficients,
and just adjusting the constants. One is often interested in using the parameter
estimates from one data set but re-estimating (or calibrating) on another data
set only the ASCs. The example below shows the command syntax, where we
have conveniently divided the data set into two “separate samples.” One
would normally use different data sets unless the analyst wishes to use part
of a single data set as a hold-out sample:
|-> LOAD;file=“C:\Projects\NWTptStudy_03\NWTModels\ACA Ch 15 ML_RPL models\nw15jul03-
3limdep.SAV.lpj”$
Project file contained 27180 observations.
create
;if(employ=1)ftime=1
;if(whopay=1)youpay=1$
sample;all$
reject;dremove=1$ Bad data
reject;altij=-999$
reject;ttype#1$ work =1
Timer
sample;1-12060$
Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
; Alg = BFGS
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*waitt+ acwt*acctim
+ accbusf*accbusf+eggT*egresst + ptinc*pinc + ptgend*gender + NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf + ptinc*pinc + ptgend*gender + NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost+ CReggT*egresst$
+---------------------------------------------------------------------------+
|WARNING: Bad observations were found in the sample. |
|Found 565 bad observations among 2201 individuals. |
|You can use ;CheckData to get a list of these points. |
+---------------------------------------------------------------------------+
-------------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -2315.02908
Estimation based on N = 1636, K = 20
Inf.Cr.AIC = 4670.1 AIC/N = 2.855
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Response data are given as ind. choices
Number of obs.= 2201, skipped 565 obs
-----------+-------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
RESP1| Coefficient Error z |z|>Z* Interval
-----------+-------------------------------------------------------------------------------------------
NLRASC| 3.09077*** .35051 8.82 .0000 2.40379 3.77776
COST| -.21192*** .01316 -16.10 .0000 -.23772 -.18612
INVT| -.03428*** .00193 -17.80 .0000 -.03806 -.03051
ACWT| -.02434*** .00511 -4.76 .0000 -.03436 -.01432
ACCBUSF| -.19927*** .03169 -6.29 .0000 -.26139 -.13716
EGGT| -.02650*** .00501 -5.28 .0000 -.03633 -.01667
PTINC| -.00954*** .00247 -3.86 .0001 -.01439 -.00470
PTGEND| .50243*** .16943 2.97 .0030 .17034 .83451
NLRINSDE| -1.87282*** .45534 -4.11 .0000 -2.76528 -.98037
TNASC| 2.70760*** .33840 8.00 .0000 2.04434 3.37086
NHRINSDE| -2.24667*** .55770 -4.03 .0001 -3.33974 -1.15361
NBWASC| 1.97710*** .39319 5.03 .0000 1.20645 2.74774
WAITTB| -.02656 .01950 -1.36 .1731 -.06478 .01165
ACCTB| -.04328*** .01003 -4.32 .0000 -.06294 -.02363
BSASC| 2.23452*** .33764 6.62 .0000 1.57275 2.89628
BWASC| 2.59449*** .34292 7.57 .0000 1.92238 3.26661
CRCOST| -.12599*** .02512 -5.02 .0000 -.17523 -.07676
CRINVT| -.01732*** .00323 -5.36 .0000 -.02366 -.01099
CRPARK| -.01335* .00707 -1.89 .0588 -.02720 .00049
CREGGT| -.02835** .01136 -2.50 .0125 -.05061 -.00609
-----------+-------------------------------------------------------------------------------------------
|-> sample;12061-27180$
|-> Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
; Alg = BFGS
;model:
U(NLRail)= NLRAsc + cost[]*tcost + invt[]*InvTime + acwt[]*waitt+
acwt[]*acctim + accbusf[]*accbusf+eggT[]*egresst
+ ptinc[]*pinc + ptgend[]*gender + NLRinsde[]*inside /
U(NHRail)= TNAsc + cost[]*Tcost + invt[]*InvTime + acwt[]*WaitT + acwt[]*acctim
+ eggT[]*egresst + accbusf[]*accbusf
+ ptinc[]*pinc + ptgend[]*gender + NHRinsde[]*inside /
+---------------------------------------------------------------------------- +
|WARNING: Bad observations were found in the sample. |
|Found 500 bad observations among 2672 individuals. |
|You can use ;CheckData to get a list of these points. |
+---------------------------------------------------------------------------- +
The model is first fit with the first half of the data set (sample;1–12060).
Then, for the second estimation, we want to refit the model, but only recompute
the constant terms and keep the previously estimated slope parameters.
The device to use for the second model is the “[ ]” specification, which
indicates that you wish to use the previously estimated parameters. The
commands above will, in principle, produce the desired result, with one
consideration. Newton’s method is very sensitive to the starting values for
this model, and with the constraints imposed in the second model it will
generally fail to converge. The practical solution is to change the algorithm
to BFGS, which will then produce the desired result. You can do this just by
adding ; Alg = BFGS to the second command. An additional detail is that the
second model will now replace the first as the “previous” model. So, if you
want to do a second calibration, you have to refit the first model. To pre-empt
this, you can use ;Calibrate in the second command. This specification
changes the algorithm and also instructs Nlogit not to replace the previous
estimates with the current ones.
As an aside, you may use this device with any discrete choice model that you fit with Nlogit.
The second sample must have the same configuration as the first, and the device can only
be used to fix the utility function parameters. The latter point implies that if you do this with a
random parameters model, the random parameters will become fixed; that is, the variances
will be fixed at zero.
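To see what calibrating only the ASCs accomplishes, note the classic iterative rule: hold all slope parameters fixed, and repeatedly add log(target share / predicted share) to each alternative's constant until the model reproduces the observed shares in the new sample. The sketch below is generic Python with made-up utilities, not Nlogit syntax; in Nlogit this arithmetic is handled by the "[ ]" device and ;Calibrate described above:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical fixed "slope" utilities: 5 individuals x 3 alternatives
# (in practice these come from the previously estimated parameters)
v_slope = np.array([[ 0.2,  0.1,  0.0],
                    [ 0.5, -0.3,  0.1],
                    [ 0.0,  0.4, -0.2],
                    [-0.1,  0.2,  0.3],
                    [ 0.3,  0.0,  0.1]])
target = np.array([0.5, 0.3, 0.2])  # observed choice shares in the new sample

asc = np.zeros(3)
for _ in range(200):
    pred = softmax(v_slope + asc).mean(axis=0)  # predicted shares
    asc += np.log(target / pred)                # share-calibration step
asc -= asc[-1]  # normalize: last alternative's constant fixed at zero

print("calibrated ASCs:  ", np.round(asc, 4))
print("reproduced shares:", np.round(softmax(v_slope + asc).mean(axis=0), 4))
```

The normalization in the last step changes nothing behaviorally, since shifting all utilities by the same constant leaves logit probabilities unchanged.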
Chapter 14
Nested logit estimation
In mathematics you don’t understand things. You just get used to them.
(John von Neumann, 1903–57)
14.1 Introduction
The majority of practical choice study applications do not progress beyond the
simple multinomial logit (MNL) model discussed in previous chapters. The
ease of computation, and the wide availability of software packages capable of
estimating the MNL model, suggest that this trend will continue. The ease
with which the MNL model may be estimated, however, comes at a price in
the form of the assumption of independently and identically distributed (IID)
error components. While the IID assumption and the behaviorally comparable
assumption of Independence of Irrelevant Alternatives (IIA) allow for
ease of computation (as well as providing a closed form solution1), as with any
assumption, violations both can and do occur. When violations do occur, the
cross-substitution effects (or correlation) observed between pairs of alterna-
tives are no longer equal given the presence or absence of other alternatives
within the complete list of available alternatives in the model (Louviere et al.
2000).
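The IIA property is easy to see from the closed-form MNL probabilities: the odds between any two alternatives depend only on those two alternatives' utilities, so adding or removing a third alternative leaves the pair's odds unchanged. A small numerical sketch with made-up utilities:

```python
import math

def mnl_probs(utils):
    # Closed-form MNL probabilities: P_i = exp(V_i) / sum_j exp(V_j)
    denom = sum(math.exp(v) for v in utils.values())
    return {alt: math.exp(v) / denom for alt, v in utils.items()}

v = {"car": 1.2, "bus": 0.4, "train": 0.7}

p_full = mnl_probs(v)
p_drop = mnl_probs({a: u for a, u in v.items() if a != "train"})

# IIA: the car/bus odds are exp(1.2 - 0.4) with or without the train alternative
print(p_full["car"] / p_full["bus"])
print(p_drop["car"] / p_drop["bus"])
```

Both ratios equal exp(0.8); it is exactly this rigidity in cross-substitution patterns that the nested logit model relaxes.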
The nested logit (NL) model represents a partial relaxation of the IID and
IIA assumptions of the MNL model. As discussed in Chapter 4, this relaxation
occurs in the variance components of the model, together with some correla-
tion within sub-sets of alternatives, and while more advanced models such as
mixed multinomial logit (see Chapter 15) relax the IID assumption more fully,
1 An equation is said to have a closed-form solution if it may be solved using standard mathematical
operations and does not require complex analytical calculations, such as integration, each time a
change occurs somewhere within the system.
As with Chapters 11 and 13, we use the labeled mode choice case study as our
point of reference in estimating these models. In contrast to Chapter 13, where
we used the revealed preference (RP) data, we use the stated preference (SP)
data in this chapter (chosen so as to show users the SP part of the data, which
will, in later chapters, be combined with the RP data in jointly estimating RP–
SP models). We begin by examining how NL tree structures are specified in
NLOGIT.
The majority of NL models estimated as part of choice studies typically have
only two levels. Very few NL models are estimated with three levels, and even
fewer with four levels. Nlogit has the capability to simultaneously estimate NL
models with up to four levels, with sequential estimation required for addi-
tional levels. Within the literature (see also Chapter 4), the three highest levels
of NL trees are named, from the highest level (level four) to the lowest level
(level two), as Trunks, Limbs, and Branches. At the lowest level of NL trees
(level one) reside the elemental alternatives (hereafter referred to simply as
alternatives), which are sometimes referred to in the literature as Twigs.
NL models estimated by Nlogit may have up to a maximum of five trunks,
10 limbs, 25 branches, and 500 alternatives. Any tree structure, provided that
it does not exceed the maximum number of trunks, limbs, branches, or
alternatives allowed, may be estimated. Thus, provided that the total number
of alternatives within the tree does not exceed 500, some branches may have
only one alternative (known as a degenerate branch; this is discussed in more
detail later), while other branches may have two or more alternatives.
Similarly, provided that the total number of branches does not exceed 25, some
limbs may have only a single branch, while others may have two or more
branches. Trunks may also have any number of limbs, provided that the total
number of limbs does not exceed 10 within the overall tree structure. Tree
structures in which there is only a single trunk, but two or more limbs, are
known as three-level NL models (we often omit the trunk level when we draw
such trees; however, the level is still there). Models with only one trunk and
one limb but multiple branches are, by implication, called two-level NL
models (once more, it is customary when drawing such tree structures to do
so without showing the single trunk and limb). Single-level models, where
there is only a single trunk, limb, and branch, but multiple alternatives, are
also possible.
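The two-level case can be made concrete. Under one common normalization, the NL probability of an alternative is the product of a branch-choice probability, driven by the branch's inclusive value scaled by its IV parameter, and a conditional probability within the branch. The Python sketch below is an illustration with hypothetical utilities and IV parameters, not Nlogit's internal code (Nlogit's RU1/RU2 normalizations differ in scaling details):

```python
import math

def nested_logit_probs(branches, iv):
    """branches: {branch: {alt: utility}}; iv: {branch: IV parameter in (0, 1]}."""
    # Inclusive value of each branch: log-sum of exp(V / lambda_b)
    inc = {b: math.log(sum(math.exp(v / iv[b]) for v in alts.values()))
           for b, alts in branches.items()}
    denom = sum(math.exp(iv[b] * inc[b]) for b in branches)
    probs = {}
    for b, alts in branches.items():
        p_branch = math.exp(iv[b] * inc[b]) / denom      # marginal branch probability
        within = sum(math.exp(v / iv[b]) for v in alts.values())
        for alt, v in alts.items():
            probs[alt] = p_branch * math.exp(v / iv[b]) / within
    return probs

tree = {"car": {"card": 0.6, "carp": 0.2},
        "PT": {"bus": 0.1, "train": 0.4, "busway": 0.0, "LR": 0.3}}
lam = {"car": 0.95, "PT": 0.51}  # hypothetical IV parameters

p = nested_logit_probs(tree, lam)
print({alt: round(pr, 4) for alt, pr in p.items()})
print(round(sum(p.values()), 6))  # probabilities sum to one
```

Setting both IV parameters to 1.0 collapses the expression to the simple MNL, which is why the test of IV = 1 discussed later in this chapter is a test of the nesting structure itself.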
The command syntax structure for the NL model is similar to the command
syntax for the MNL model discussed in Chapter 12. The addition of the
following command to the MNL command syntax will estimate a NL model
using Nlogit:
;tree =<tree structure>
Placing the tree specification command within the MNL command syntax,
the base NL model command will look as follows:
NLOGIT
;lhs = choice, cset, altij
;choices =<names of alternatives>
;tree = <tree structure>
;Model:
U(alternative 1 name) = <utility function 1>/
U(alternative 2 name) = <utility function 2>/
...
U(alternative i name) = <utility function i>$
B(2|2,1) represents the second branch in limb two within trunk one; Lmb[1:1]
represents limb one, trunk one; and Trunk[2] represents trunk two. The
naming of a trunk, limb, or branch is done by providing a name (eight
characters or less) outside the relevant brackets. The alternatives are specified
at the lowest level of the tree structure (i.e., at level one) and are entered within
the appropriate brackets as they exist within the tree structure. Alternatives
within the same trunk, limb, or branch are separated by a comma (,).
To demonstrate the above, consider the following example (not based on
the SP data we use in estimation):
;tree = car(card,carp), PT(bus,train,busway,LR)
The above tree specification will estimate a NL model with the tree structure
in Figure 14.1.
This structure has two branches and six alternatives, two belonging to the
Car branch, and four to the public transport (PT) branch, and hence is a two-
level NL model. This tree structure represents one of many possible tree
structures that may be explored by the analyst. For example, the analyst
may also specify the NL tree structure (using the same alternatives) as follows:
;tree = car(card,carnp), PTEX(bus,train), PTNW(busway,LR)
Graphically, the above NL tree structure would look as shown in Figure 14.2.
The tree structure in Figure 14.2 differs from that in Figure 14.1 in that
there now exist three branches, each with two alternatives. For the NL model
represented by Figure 14.2, we have placed the bus and train alternatives
within the same branch named PTEX (for existing modes) and the busway
and light rail modes in the same branch named PTNW (for new modes).
As an aside, although, as we have shown, it is possible to omit higher levels from NL models
if the higher levels have a single limb or trunk (which we have omitted from our tree
diagrams), it is also possible to acknowledge that a higher level exists by providing a name
for it. For example:
;tree = Limb[car(card,carp), PTEX(bus,train), PTNW(busway,LR)]
will produce exactly the same NL model as that shown in Figure 14.2. In such cases, the
inclusive value (IV) parameter of the highest level (called Limb above) is fixed at 1.0 (see
below).
Once again, the tree structure of Figure 14.2 represents but one of many
possible tree structures that may be of interest. In the following tree specifica-
tion, we demonstrate a third possible tree structure. In this particular struc-
ture, we have added an additional level (i.e., a limb) to the tree, thus making
this a three-level NL model:
;tree= CAR[card,carpt], PT[PTRail(bus,train), PTRoad(busway,LR)]
For example, assuming the tree structure from Figure 14.3, the following
command will constrain the inclusive value parameter of the two public
transport branches to be equal:
;ivset: (PTrail, PTroad)
For example, assuming a new branch existed (called D) for the tree struc-
ture, with two new alternatives, pushbike and motorbike, the following com-
mand would constrain the IV parameters for branches A and C and B and D:
For example, assuming the tree structure shown in Figure 14.3, the follow-
ing ;ivset command would constrain the two public transport IV parameters
to equal 0.75:
For example, the following will constrain the PTRail branch of Figure 14.3
to 0.75 while simultaneously fixing the PTRoad IV parameter to 0.5:
As an aside, by default, the Nlogit starting value for all IV parameter estimates is 1.0. The
specification of starting values is not limited to the IV parameters of NL models. The analyst
may also specify starting values for the remaining parameter estimates contained within the
model, although the default is the MNL estimates. This may be done in a similar manner to
the MNL model, where the analyst places the requested start value in round brackets (( ))
after the parameter name. The command syntax ;start=logit used in earlier versions of Nlogit
(pre-Nlogit4) is redundant, since the MNL estimates are always the default.
As a further aside, when utility functions are not specified at levels 2 to 4 of the NL model
(discussed later), the MNL model first estimated to obtain starting values will be equivalent to
an identically specified stand-alone MNL model.
Nlogit
;lhs = choice, cset, altij
;choices = NLR,NHR,NBW,bs,tn,bw,cr
;tree=ptnew(NLR,NHR,NBW),Allold(bs,tn,bw,cr)
;show
;RU2
;prob = margprob
;cprob = altprob
;ivb = ivbranch
;utility=mutilz
;model:
u(nlr) = nlr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nhr) = nhr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nbw) = nbw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invccar*invc+invtcar*invt + TC*TC + PC*PC + egtcr*egt $
+---------------------------------------------------------------------------+
|WARNING: Bad observations were found in the sample. |
|Found 104 bad observations among 1970 individuals. |
|You can use ;CheckData to get a list of these points. |
+---------------------------------------------------------------------------+
Tree Structure Specified for the Nested Logit Model
Sample proportions are marginal, not conditional.
Choices marked with * are excluded for the IIA test.
-----------------------+----------------------+----------------------+----------------------+--------+-----
Trunk (prop.) |Limb (prop.)|Branch (prop.)|Choice (prop.)|Weight|IIA
-----------------------+----------------------+----------------------+----------------------+--------+-----
Trunk{1} 1.00000 |Lmb[1|1] 1.00000|PTNEW .40997|NLR .17471| 1.000|
| | |NHR .18060| 1.000|
| | |NBW .05466| 1.000|
| |ALLOLD .59003|BS .11790| 1.000|
| | |TN .14094| 1.000|
| | |BW .20096| 1.000|
| | |CR .13023| 1.000|
-----------------------+----------------------+----------------------+----------------------+--------+-----
Normal exit: 7 iterations. Status=0, F= 2730.693
-------------------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -2730.69253
Estimation based on N = 1866, K = 16
Inf.Cr.AIC = 5493.4 AIC/N = 2.944
-----------+------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+------------------------------------------------------------------------------------------
|Attributes in the Utility Functions (beta)
NLR| 1.76991*** .28924 6.12 .0000 1.20300 2.33681
ACTPT| -.03635*** .00498 -7.29 .0000 -.04612 -.02658
INVCPT| -.21341*** .01648 -12.95 .0000 -.24572 -.18110
INVTPT| -.02617*** .00232 -11.29 .0000 -.03072 -.02163
EGTPT| -.00542 .00372 -1.46 .1454 -.01272 .00188
TRPT| .23064** .09036 2.55 .0107 .05355 .40774
NHR| 1.72411*** .28103 6.13 .0000 1.17329 2.27492
NBW| 1.19653*** .25980 4.61 .0000 .68734 1.70571
BS| -.59018** .24843 -2.38 .0175 -1.07710 -.10327
TN| -.28961 .24381 -1.19 .2349 -.76747 .18825
BW| -.02930 .23930 -.12 .9025 -.49831 .43971
INVCCAR| -.03454 .07066 -.49 .6250 -.17303 .10396
INVTCAR| -.01473*** .00325 -4.53 .0000 -.02110 -.00835
TC| -.07077** .03124 -2.26 .0235 -.13200 -.00953
PC| -.04475*** .00886 -5.05 .0000 -.06212 -.02738
EGTCR| -.10768*** .02641 -4.08 .0000 -.15943 -.05592
|IV parameters, RU2 form = mu(b|l),gamma(l)
PTNEW| .51010*** .05571 9.16 .0000 .40091 .61928
ALLOLD| .95074*** .08846 10.75 .0000 .77737 1.12411
-----------+------------------------------------------------------------------------------------------
Estimating the above model, Nlogit first provides the MNL output used to
locate the start values for the NL and ML estimation search. The interpreta-
tion of the majority of the NL output is the same as that provided for the MNL
model. We limit our discussion to new output that is either not presented with
the MNL model output or where the interpretation between that provided for
the MNL and NL models differs. The first difference of note is in the first line
of the output box, which we reproduce below:
FIML: Nested Multinomial Logit Model
The first line of output informs the analyst that a NL model was obtained
using an estimation technique known as full information maximum like-
lihood (FIML), which we discussed in Chapter 5. NL models may be estimated
either sequentially or simultaneously. Sequential estimation (known as limited
information maximum likelihood estimators, or LIML) involves the estima-
tion of separate levels of the NL tree in sequential order from the lowest level
to the highest level of the tree. Beginning at the branch level, LIML will
estimate the utility specifications of the alternatives present within each
branch, including the IV parameters, as well as the IV parameters for each
branch of the tree. Once the IV parameters are estimated, the IV parameters at
the branch level may be calculated. These IV parameters are then used as
explanatory variables for the next level of the tree. This process is repeated
until the entire tree structure of the NL model is estimated. Hensher
(1986) has shown that using LIML to estimate NL models is statistically
inefficient: the parameter estimates at levels three and higher are not
minimum-variance estimates, a consequence of using estimates to
estimate yet more estimates. For NL models with between two and four
levels, it is therefore more common to use simultaneous estimation proce-
dures that provide statistically efficient parameter estimates. The simulta-
neous estimation of the branches, limbs, and trunks of a NL model is
achieved using FIML. The sequential estimation of each partition of the
NL tree offers no advantages over the simultaneous estimation of the entire
NL model other than the possibility to estimate models with greater than
four levels (which will rarely be required); hence, we advise against this for
the beginner. Those interested in learning more about the differences
between the sequential and simultaneous estimation of NL models are
referred to Louviere et al. (2000, 149–52), while those interested in estimat-
ing such models are referred to the reference manuals that accompany the
Nlogit software.
Nlogit next reports the restricted and unrestricted LL functions for the
model. The LL functions of the NL model may be interpreted in exactly the
same manner as the LL functions of the MNL model. Indeed, if the two models
are estimated on the same sample, the LL functions for both are directly
comparable. The LL function reported is that for the model fitted as specified
through the utility functions, while the restricted LL function is the LL
function for a model estimated assuming equal choice shares (i.e., there is no
knowledge of sample shares). As with the MNL model, the test of model
significance for the NL model is the LL ratio test (see Chapter 7) using the
reported LL values discussed above. Nlogit performs this test automatically.
The LL ratio test is chi-square distributed with degrees of freedom equal to
the number of parameters estimated within the model. The number of para-
meters is inclusive of the IV parameters estimated, but not those that were
fixed (and hence not estimated). As with the MNL model, the Chi-square test
statistic for this test is:
−2(LLRestricted − LLUnrestricted) ~ χ² with degrees of freedom equal to the difference in the number of parameters estimated between the two models. (14.1)
For the model output above, the test has 18 degrees of freedom (i.e., 16 parameter
estimates and 2 IV parameters): −2 × (−3660.16 − (−2711.95)) ≈ 1896.43, which
is equal to the value we observe in the Nlogit output. To determine the
overall model fit, the analyst may compare the test statistic to the critical chi
square with 18 degrees of freedom, or use the p-value provided by Nlogit
which, for this example, is zero. As the p-value is less than alpha equal to 0.05
(i.e., 95 percent confidence level), we conclude that the estimated NL model
represents an improvement in the LL function over a model estimated with equal
market shares. As such, we conclude that the parameters estimated for the
attributes included in the utility functions improve the overall model fit.
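Using the equal-shares base log-likelihood (−3660.16) and the fitted model log-likelihood (−2711.95) reported by Nlogit, the statistic in Equation (14.1) can be reproduced by hand:

```python
ll_restricted = -3660.16    # base model: equal choice shares, no information
ll_unrestricted = -2711.95  # fitted NL model

lr_stat = -2.0 * (ll_restricted - ll_unrestricted)  # Equation (14.1)
df = 16 + 2  # 16 utility-function parameters plus 2 estimated IV parameters

print(f"LR statistic = {lr_stat:.2f} on {df} degrees of freedom")
# The 5 percent critical chi-squared value with 18 df is about 28.87, so the
# fitted model is a significant improvement over the equal-shares base
print(lr_stat > 28.87)
```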
Nlogit next estimates the pseudo-R2. As with the MNL model, the pseudo-
R2 for the NL model is estimated using the ratio of the LL function of the
model estimated here (i.e., −2711.95) over the LL function of a base model
estimated assuming equal choice shares across the alternatives (i.e. −3660.16).
The pseudo-R2 becomes 0.260, as reported in the output as McFadden Pseudo
R-squared:
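The same two log-likelihoods reproduce the McFadden pseudo-R²; the printed value of 0.259 differs from the 0.260 quoted above only by rounding:

```python
ll_base = -3660.16   # equal choice shares log-likelihood
ll_model = -2711.95  # fitted model log-likelihood

pseudo_r2 = 1.0 - ll_model / ll_base  # McFadden pseudo-R2
print(f"McFadden pseudo-R2 = {pseudo_r2:.3f}")
```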
When the choice set varies across individuals, as it does in the SP data we are
using (where each choice set has four of the seven alternatives), it is not
possible to compute the constants-only results from the market shares, and
the no coefficients model does not have probabilities equal to 1/J. If you want
those things calculated, you have to use ;RHS=one, as in the model below. The
LL of −3165.836 is based on the known sample choice shares, and no other
information:
|-> Nlogit
;lhs = choice, cset, altij
;choices = NLR,NHR,NBW,bs,tn,bw,cr
;rhs=one$
-----------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -3165.83600
Estimation based on N = 1866, K = 6
Inf.Cr.AIC = 6343.7 AIC/N = 3.400
As an aside, as with all other choice models, the utilities derived from the above utility
specifications are relative. Hence, to determine the utility for any one alternative, the analyst
must take the difference between the utility for that alternative and that of a second alternative.
The final results produced are the estimates of the IV parameters for each of
the trunks, limbs, and branches of the model. As with the parameter estimates
for the attributes specified within the utility functions, Nlogit reports for each IV
parameter a standard error, a Wald statistic and a p-value. An interesting
question arises as to what an insignificant IV parameter means. The test statistic,
the Wald statistic, is calculated by dividing the IV parameter estimate by its
associated standard error and comparing the resulting value to some critical
value (usually ±1.96, representing a 95 percent confidence level). This test is
exactly the same as the one-sample t-test and is used in this case to determine
whether the IV parameter is statistically equal to zero. If the parameter is found
to be statistically equal to zero (i.e., the parameter is not significant), it still
lies within the 0–1 bound (at the zero boundary). This is important: as
mentioned in Chapter 4, an IV parameter of zero implies two totally independent
choice models for the upper and lower levels, and hence there exists evidence for
a partition of the tree structure at this section of the model.
As an aside, an insignificant IV parameter (i.e., one that is statistically equal to zero) suggests that the two scale parameters taken from the different levels to form the IV parameter are statistically very different (e.g., 0.1 divided by 0.8 equals 0.125, which is closer to zero than 0.1 divided by 0.2, which equals 0.5; of course, the standard errors must also be accounted for). It does not mean that the variance is not statistically significant, or that there is no correlation between alternatives within a branch.
Wald test = (IV parameter − 1) / standard error.    (14.3)
For the above example, the IV parameter for the PTNEW branch is statistically different from zero. As such, it is necessary to undertake the test described in Equation (14.3) to determine whether the parameter is statistically different from one. We perform this test below:
Wald test = (0.5101 − 1) / 0.0557 = −8.79.
Comparing the test statistic of −8.79 to the critical value of ±1.96 (i.e., at alpha
equal to 0.05), we can reject the hypothesis that the PTNEW parameter is
statistically equal to one. This finding suggests that the nested structure is
indeed a statistically significant improvement over MNL, as well as being
consistent with global utility maximization in that it satisfies the 0–1 bounds
for the IV parameter. The same finding applies to the other branch.
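The two Wald tests above can be reproduced with simple arithmetic. A minimal sketch in Python, using the PTNEW IV estimate and standard error quoted from the output in this section:

```python
# Wald test for an IV parameter: (estimate - hypothesized value) / standard error.
def wald(estimate, std_error, hypothesis):
    return (estimate - hypothesis) / std_error

# PTNEW branch IV parameter and its standard error from the output above.
iv, se = 0.5101, 0.0557

w_zero = wald(iv, se, 0.0)  # about 9.16: reject H0 that the IV parameter equals zero
w_one = wald(iv, se, 1.0)   # about -8.80: reject H0 that the IV parameter equals one
```

Both statistics exceed ±1.96 in magnitude, matching the conclusions drawn in the text.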
If the IV parameter is statistically different from both zero and one, and does not lie within the 0–1 bound but rather is statistically greater than one, the global utility maximization assumption is no longer strictly valid, and cross-elasticities with the wrong sign will be observed. The analyst will be required to (1) explore new tree structures, (2) constrain a different IV parameter using the same tree structure and re-estimate the model, or (3) move to more advanced models (see Chapter 15) in order to proceed.
As an aside, the IV parameters are related to the correlation among alternatives in the same branch (see Chapter 4):
1 − (λ(i|j,l) / μ(i|j,l))² equals the correlation of the utility functions for any pair of alternatives present within the same nest or partition of a NL model. For the above example, the correlations between the alternatives within the ALLOLD branch (bs, tn, bw, cr) and within the PTNEW branch (NLR, NHR, NBW) may be calculated as follows:

Corr(bs, tn, bw, cr) = 1 − (0.95074)² = 0.0961
Corr(NLR, NHR, NBW) = 1 − (0.5101)² = 0.7398.
Thus, IV parameters closer to 1.0 not only indicate a smaller difference in the variance
between adjoining levels, but also smaller correlation structures between the utility functions
of the alternatives present within the lower level of the nest.
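The correlation implied by an IV parameter can be checked with one line of arithmetic. A minimal sketch using the two IV estimates quoted above:

```python
# Correlation between utilities of alternatives sharing a nest: 1 - IV**2.
def nest_correlation(iv_parameter):
    return 1.0 - iv_parameter**2

corr_allold = nest_correlation(0.95074)  # about 0.096: weak within-nest correlation
corr_ptnew = nest_correlation(0.5101)    # about 0.740: strong within-nest correlation
```

As the text notes, an IV parameter near 1.0 implies a correlation near zero, at which point the nest adds nothing over MNL.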
In the NL model, the utility expressions are connected across adjacent levels of the model via two means. Firstly, the utility functions of higher levels of NL models are connected to the level directly below via the inclusion of the lower level's IV parameter within the upper level's utility function.
connection occurs through the inclusion of the IV variable (i.e., the index of
expected maximum utility or EMU), which relates the utility expressions of
the level directly below to that of the upper level, as detailed in Chapter 4.
That is, the utility for the jth branch belonging to limb i of trunk l is equal to
the IV parameter multiplied by the IV variable (or EMU).
The unconditional probabilities for each outcome are calculated for the
preceding example as the product of the conditional and marginal probabilities:
P(cr,AllOld) = P(cr|AllOld) × P(AllOld)
P(bs,AllOld) = P(bs|AllOld) × P(AllOld)
P(tn,AllOld) = P(tn|AllOld) × P(AllOld)
P(bw,AllOld) = P(bw|AllOld) × P(AllOld)
P(nlr, PtNew) = P(nlr|PtNew) × P(PtNew)
P(nhr,PtNew) = P(nhr|PtNew) × P(PtNew)
P(nbw,PtNew) = P(nbw|PtNew) × P(PtNew)
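The product rule above can be illustrated numerically. The sketch below uses invented probabilities (not values from the Nlogit output) to show that conditional probabilities sum to one within each branch, while the unconditional probabilities sum to one across the whole tree:

```python
# Two-level NL probability rule: P(alt) = P(alt | branch) * P(branch).
# All probabilities here are invented for illustration.
p_branch = {"AllOld": 0.6, "PtNew": 0.4}
p_conditional = {
    "cr": ("AllOld", 0.40), "bs": ("AllOld", 0.20),
    "tn": ("AllOld", 0.25), "bw": ("AllOld", 0.15),
    "nlr": ("PtNew", 0.50), "nhr": ("PtNew", 0.30), "nbw": ("PtNew", 0.20),
}

# Unconditional (marginal) probability of each elemental alternative.
p_uncond = {alt: p_branch[b] * p for alt, (b, p) in p_conditional.items()}

# Conditional probabilities sum to 1.0 within each branch ...
for branch in p_branch:
    within = sum(p for b, p in p_conditional.values() if b == branch)
    assert abs(within - 1.0) < 1e-12
# ... while unconditional probabilities sum to 1.0 across the entire tree.
assert abs(sum(p_uncond.values()) - 1.0) < 1e-12
```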
These probabilities are automatically calculated by Nlogit using the formulae in Chapter 4. The predicted probabilities (the product of all relevant conditional probabilities) in NL models may be retained as a new variable in the data set by adding the command ;prob = <name of the variable to define the calculated probabilities>; this is the same command used for the MNL model to save the probabilities. Conditional probabilities for elemental alternatives (level 1 probabilities) are retained using the command syntax ;Cprob = <name>. The IV variables (not parameters), otherwise known as EMUs, may also be saved as new variables in the data set. The commands to save the IV variables at each level are: Branch level: ;IVB = <name>, Limb level: ;IVL = <name>, and Trunk level: ;IVT = <name>.
For example, we have added in ;prob=margprob to obtain the marginal
probabilities for each of the alternatives above. Note, however, that since each
respondent in the SP data set being used considered four of the seven alter-
natives, the marginal probabilities for each respondent are limited to the four
alternatives in their choice set. The first respondent (listed below) had alternatives (altij) 1, 3, 4, and 7 in their choice set, namely NLR, NBW, bs, and cr. The marginal probabilities associated with the elemental alternatives sum
to 1.0 across the entire tree. In contrast, the conditional probabilities (defined
by AltProb in Table 14.1) sum to 1.0 within each branch. The analyst can
In Section 14.3, we assumed that all attributes were associated with the utility expressions that defined each elemental alternative. It is possible, however, that some influences on choice act directly on the utility at higher levels in the tree. This can be accommodated using the following command syntax, where
the name provided for the utility expression is a name provided in the ;tree
specification:
U(<branch, limb or trunk name>) = <utility function 1>/
Table 14.1 Useful outputs stored under the project file (data, variables)
The only difference between the base NL model and this model is the
addition of the following output (as well as the influence that this has on the
other parameter estimates):
|Attributes of Branch Choice Equations (alpha)
PINCZ| -.00266** .00129 -2.06 .0398 -.00519 -.00012
GEND| -.25315** .11993 -2.11 .0348 -.48821 -.01809
Personal income and the gender dummy variable (male = 1) are statistically significant and, given the negative signs, this suggests that individuals on higher incomes and males tend to have a lower utility associated with the ptnew branch than other respondents, ceteris paribus. This finding will
condition the overall probability of choosing a branch and hence an alter-
native in the ptnew branch, which flows through to the allocation of
probabilities throughout the entire tree. Table 14.2 reports the change in
the marginal and conditional probabilities, as well as utility, for the first
Table 14.2 Comparison of findings in Table 14.1 with the NL model with upper level variables
respondent for each of the 10 choice sets when we add in the upper level
influences on ptnew compared to Table 14.1. While the differences might appear
to be small, when summed across the entire data set this could amount to a
noticeable change in the predicted modal shares. Interested readers could cut and
paste into a spreadsheet the entire output for all respondents, and calculate the
overall modal shares.
Within the degenerate branch C of Figure 14.4, which contains only the light rail (LR) alternative, the conditional choice probability is:

P(LR) = e^(VLR) / e^(VLR) = 1.    (14.4)

The utility for branch C is then:

VC = λC × IV(LR) = λC × (1/μC) × ln(e^(μC·VLR)) = λC × VLR.    (14.5)
As the utility for a degenerate alternative can only reside at one level of the NL
model (it does not matter whether we specify it at level 1 or 2), the variance
must be the same at each level of a degenerate nest. That is, VLR may be
specified at levels 1 or 2 in the above example; however, if specified at level 1, the scale parameter, μC, is normalized to 1.0, while if specified at level 2, the scale parameter, λC, is free to vary (see Figure 14.4). This is counterintuitive, since the variance (and hence scale parameters) for a degenerate alternative should be equal no matter at what level one specifies the utility function. That is, the variance structure of the NL model is such that higher level partitions incorporate the variance of lower adjoining partitions as well as that partition's own variance. With a degenerate alternative, higher level partitions should theoretically not have their own variance, as nothing is being explained at that level. As such, the only level at which the variance should be explained is the level at which the utility function for that alternative is placed.

Figure 14.4 An NL tree structure with a degenerate alternative (Level 2 branches: A, B, C; Level 1 alternatives: Car (toll), Car (no toll), Bus, Train, Busway, Light Rail)
Normalization of the NL model using RU2 yields the following LR utility
function:
VC = 1 × IV(LR) = (1/μC) × ln(e^(μC·VLR)) = (1/μC) × μC·VLR = VLR.    (14.7)
Under RU2, the scale parameters cancel each other. That is, the IV parameter
is no longer identifiable! Nlogit recognizes this.
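The cancellation under RU2 is easy to verify numerically: whatever the value of μC, the inclusive value passed up from a degenerate branch collapses back to the alternative's own utility. A minimal sketch with an arbitrary utility value:

```python
import math

# Under RU2 the utility passed up from a degenerate branch is
# (1/mu) * ln(exp(mu * V)) = V, so the scale mu cancels and the IV
# parameter for the degenerate branch cannot be identified.
def degenerate_branch_utility(v, mu):
    return (1.0 / mu) * math.log(math.exp(mu * v))

v_lr = -1.37  # arbitrary utility for the single alternative in the nest
for mu in (0.5, 1.0, 2.0):
    assert abs(degenerate_branch_utility(v_lr, mu) - v_lr) < 1e-9
```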
An important aspect of the above discussion, not recognized in the litera-
ture (at least that we know of), is that if a NL model has two degenerate
alternatives (such as Figure 14.5), scale parameters for both must be normal-
ized to one, which is equivalent to treating these alternatives as a single nest
(with MNL properties).
Taking the above into account, the following Nlogit command syntax will
estimate a NL model of the form suggested by Figure 14.4. The reader can
interpret the broader output; however, we note that the IV parameters all are
within the 0–1 range and are statistically significant, with the automatically
constrained IV parameter for the degenerate branch:
Figure 14.5 An NL tree structure with two degenerate alternatives (Level 2 branches: A, B, C, D; Level 1 alternatives: Car (toll), Car (no toll), Bus, Train, Busway, Light Rail)
|-> Nlogit
;lhs = choice, cset, altij
;choices = NLR,NHR,NBW,bs,tn,bw,cr
;tree=ptnew(NLR,NHR,NBW),PTold(bs,tn,bw),car(cr)
;show
;RU2
;model:
u(nlr) = nlr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nhr) = nhr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nbw) = nbw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invccar*invc+invtcar*invt + TC*TC + PC*PC + egtcr*egt$
+---------------------------------------------------------------------------+
|WARNING: Bad observations were found in the sample. |
|Found 104 bad observations among 1970 individuals. |
|You can use ;CheckData to get a list of these points. |
+---------------------------------------------------------------------------+
Tree Structure Specified for the Nested Logit Model
Sample proportions are marginal, not conditional.
Choices marked with * are excluded for the IIA test.
----------------------+----------------------+-----------------------+----------------------+--------+------
Trunk (prop.)|Limb (prop.)|Branch (prop.)|Choice (prop.)|Weight|IIA
----------------------+----------------------+-----------------------+----------------------+--------+------
Trunk{1} 1.00000|Lmb[1|1] 1.00000|PTNEW .40997|NLR .17471| 1.000|
| | |NHR .18060| 1.000|
| | |NBW .05466| 1.000|
| |PTOLD .45981|BS .11790| 1.000|
| | |TN .14094| 1.000|
| | |BW .20096| 1.000|
| |CAR .13023|CR .13023| 1.000|
----------------------+----------------------+-----------------------+----------------------+--------+------
Normal exit: 7 iterations. Status=0, F= 2730.693
--------------------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -2730.69253
Estimation based on N = 1866, K = 16
Inf.Cr.AIC = 5493.4 AIC/N = 2.944
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
Chi-squared[10] = 1588.10946
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
The above example demonstrates the case in which the degenerate nest is at the elemental alternative level of the model. It is possible that, for 3-level or 4-level NL models, a degenerate partition may occur at higher levels of the model. This represents a partial degeneration. Consider the 3-level NL model in Figure 14.6.
The NL model shown in Figure 14.6 places the CARNT and LR in branch
A2; however, the upper nest of this partition, A1, is degenerate in the sense
that A1 is the sole limb in the partition. Following the same reasoning as
before, the scale parameters (and hence variances) for A1 and A2 must be
equal. We show how to handle such cases in Section 14.6.
Figure 14.6 A 3-level NL tree structure with degenerate branches (Level 3 limbs: A1, B1; Level 2 branches: A2, B2, B3; Level 1 alternatives: Car (no toll), Light Rail, Car (toll), Train, Bus, Busway)
-------------------------------------------------------------------------------------------------------------
FIML Nested Multinomial Logit Model
Dependent variable CHOICE
Log likelihood function -2715.97371
Restricted log likelihood -3671.38073
Chi squared [ 21](P= .000) 1910.81404
Significance level .00000
McFadden Pseudo R-squared .2602310
Estimation based on N = 1866, K = 21
Inf.Cr.AIC = 5473.9 AIC/N = 2.934
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
At start values -3285.7820 .1734******
Response data are given as ind. choices
BHHH estimator used for asymp. variance
The model has 3 levels.
Random Utility Form 2:IVparms = Mb|l,Gl
Number of obs.= 1970, skipped 104 obs
-----------+------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+------------------------------------------------------------------------------------------
|Attributes in the Utility Functions (beta)
NLR| 1.61975*** .29299 5.53 .0000 1.04549 2.19401
ACTPT| -.03474*** .00436 -7.97 .0000 -.04328 -.02620
INVCPT| -.20829*** .01592 -13.08 .0000 -.23949 -.17708
INVTPT| -.02570*** .00248 -10.36 .0000 -.03056 -.02084
EGTPT| -.00456 .00366 -1.25 .2128 -.01172 .00261
TRPT| .23276** .09890 2.35 .0186 .03892 .42661
NHR| 1.68603*** .28310 5.96 .0000 1.13118 2.24089
NBW| 1.01965*** .26181 3.89 .0001 .50652 1.53279
BS| -.67008** .26696 -2.51 .0121 -1.19332 -.14684
TN| -.38195 .26071 -1.47 .1429 -.89294 .12904
BW| .09666 .24843 .39 .6972 -.39026 .58358
INVCCAR| -.03602 .07145 -.50 .6142 -.17607 .10402
INVTCAR| -.01446*** .00339 -4.27 .0000 -.02110 -.00782
TC| -.07219** .03172 -2.28 .0229 -.13436 -.01002
PC| -.04537*** .00906 -5.01 .0000 -.06313 -.02762
EGTCR| -.10956*** .02665 -4.11 .0000 -.16178 -.05733
|IV parameters, RU2 form = mu(b|l),gamma(l)
RAILN| 1.32941*** .17559 7.57 .0000 .98526 1.67357
BUSW| 2.13830*** .29297 7.30 .0000 1.56409 2.71251
CARBUS| 1.05539*** .11462 9.21 .0000 .83075 1.28003
MODEAU| 1.26311*** .12687 9.96 .0000 1.01445 1.51176
MODEBU| 1.05061 2.18548 .48 .6307 -3.23287 5.33408
------------------------------------------------------------------------------------------------------
The choice-invariant cross-effect associated with the MNL model is relaxed (in part at least) in the NL model, which provides an additional interesting aspect of substitution: switching within as well as between branches. To provide an example, the following NL model is requested by adding:
; Tree = (bs,tn),(bw,cr) ; RU2 ; Effects: act (*) ; Full
to the model command. The following are the model results: the partial effects are
accompanied by a detailed legend that describes the computations, then tables
that contain more information about the elasticities (Nlogit notices that act does
not appear in the utility function for cr, and does not produce a table for it):
-----------------------------------------------------------------------------
FIML Nested Multinomial Logit Model
Dependent variable CHOICE
Log likelihood function -197.95029
Restricted log likelihood -273.09999
Chi squared [ 14](P= .000) 150.29941
Significance level .00000
McFadden Pseudo R-squared .2751729
The model has 2 levels.
Random Utility Form 2:IVparms = Mb|l,Gl
Number of obs.= 197, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
|Attributes in the Utility Functions (beta)
BS| -2.02841** .83930 -2.42 .0157 -3.67340 -.38341
ACTPT| -.09771*** .03141 -3.11 .0019 -.15927 -.03614
INVCPT| -.07608 .04673 -1.63 .1035 -.16767 .01551
INVTPT| -.01555 .01107 -1.40 .1601 -.03724 .00615
EGTPT| -.03873 .02550 -1.52 .1288 -.08871 .01125
TRPT| -1.56494*** .58604 -2.67 .0076 -2.71357 -.41632
TN| -1.69240* .88414 -1.91 .0556 -3.42529 .04048
BW| -1.31136 .86139 -1.52 .1279 -2.99966 .37694
INVTCAR| -.05155*** .01529 -3.37 .0007 -.08152 -.02159
TC| -.09316 .08919 -1.04 .2962 -.26797 .08164
PC| -.01371 .02132 -.64 .5202 -.05550 .02808
EGTCAR| -.04927 .03045 -1.62 .1057 -.10896 .01041
|IV parameters, RU2 form = mu(b|l),gamma(l)
B(1|1,1)| .44168*** .15821 2.79 .0052 .13159 .75177
B(2|1,1)| 1.07293*** .32171 3.34 .0009 .44239 1.70348
-----------+----------------------------------------------------------------------------------------
+-----------------------------------------------------------------------------------+
| Partial effects = average over observations |
| |
| dlnP[alt=j,br=b,lmb=l,tr=r] |
| ---------------------------- = D(k:J,B,L,R) = delta(k)*F |
| dx(k):alt=J,br=B,lmb=L,tr=R] |
| |
| delta(k) = coefficient on x(k) in U(J|B,L,R) |
| F = (r=R) (l=L) (b=B) [(j=J)-P(J|BLR)] |
| + (r=R) (l=L) [(b=B) -P(B|LR)]P(J|BLR)t(B|LR) |
| + (r=R) [(l=L)-P(L|R)] P(B|LR) P(J|BLR)t(B|LR)s(L|R) |
| + [(r=R) -P(R)] P(L|R) P(B|LR) P(J|BLR)t(B|LR)s(L|R)f(R) |
| |
| P(J|BLR)=Prob[choice=J |branch=B,limb=L,trunk=R] |
| P(B|LR), P(L|R), P(R) defined likewise. |
| (n=N) = 1 if n=N, 0 else, for n=j,b,l,r and N=J,B,L,R. |
| Elasticity = x(k) * D(j|B,L,R) |
| Marginal effect = P(JBLR)*D = P(J|BLR)P(B|LR)P(L|R)P(R)D |
| F is decomposed into the 4 parts in the tables. |
+-----------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------+
| Elasticity averaged over observations. |
| Effects on probabilities of all choices in the model: |
| * indicates direct Elasticity effect of the attribute. |
+--------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------+
| Attribute is ACT in choice BS |
| Decomposition of Effect if Nest Total Effect|
| Trunk Limb Branch Choice Mean St.Dev|
| Trunk=Trunk{1} |
| Limb=Lmb[1|1] |
| Branch=B(1|1,1) |
| * Choice=BS .000 .000 -.173 -.131 -.304 .021 |
| Choice=TN .000 .000 -.173 .125 -.048 .010 |
| Branch=B(2|1,1) |
| Choice=BW .000 .000 .110 .000 .110 .008 |
| Choice=CR .000 .000 .110 .000 .110 .008 |
+--------------------------------------------------------------------------------------------------+
| Attribute is ACT in choice TN |
| Decomposition of Effect if Nest Total Effect|
| Trunk Limb Branch Choice Mean St.Dev|
| Trunk=Trunk{1} |
| Limb=Lmb[1|1] |
| Branch=B(1|1,1) |
| Choice=BS .000 .000 -.412 .311 -.101 .015 |
| * Choice=TN .000 .000 -.412 -.345 -.757 .035 |
| Branch=B(2|1,1) |
| Choice=BW .000 .000 .292 .000 .292 .014 |
| Choice=CR .000 .000 .292 .000 .292 .014 |
+--------------------------------------------------------------------------------------------------+
As an example, consider the attribute access time (act) for bus (bs) travel. A
change in act for bus can cause bus riders within the first branch to switch to
train. The elasticity effect shown in the first table is −0.131. The change can
also induce bus users to switch to one of the modes in the other branch (bw
and cr); this effect is −0.173. The total effect of the change in access time for
bus is the sum of these two values, namely −0.304. Note that −0.304 is the
value shown in the summary table at the end of the displayed results. Note the
cross-effects. Within the first branch, the branch effect is the same, namely
−0.173. The within branch effect on Prob(tn) is +0.125, so the total effect is negative, −0.048. Looking at the other branch, we see that the effect of the
change in access time for bus on the “alien” modes, bw and cr is the same,
0.110. There is no within branch effect in the second branch. The change in
the access time for bus does not cause travellers to substitute back and forth
between bw and cr. Altogether, something important has changed in this
model. In the MNL model, the cross-effects are equal, so of course they all
have the same sign. Here, we find that the cross-effect of act(bus) on the choice
of train is actually negative, while it is positive for the other two modes. The
implication is that when act(bus) increases, it induces some bus riders and
some train riders to switch to either bw or cr. The MNL cannot accommodate
a complicated substitution pattern such as this.
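The decomposition reported in the tables can be verified by adding the branch-level and within-branch components; the values below are the act(bus) elasticities taken from the tables above:

```python
# Decomposition of the act(bus) elasticities reported in the tables:
# total effect = branch-level component + within-branch (choice) component.
effects = {
    "bs": (-0.173, -0.131),  # direct effect on bus
    "tn": (-0.173, +0.125),  # same-branch cross-effect: net negative
    "bw": (+0.110, 0.000),   # other-branch cross-effect: positive
    "cr": (+0.110, 0.000),
}
totals = {alt: branch + choice for alt, (branch, choice) in effects.items()}
# totals: bs = -0.304, tn = -0.048, bw = cr = 0.110
```

Note that the cross-effects differ in sign across alternatives, which is exactly the substitution pattern the MNL model cannot produce.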
The analyst can specify a particular functional form for the covariate expression. This model can be formulated as a NL model with IV parameters multiplied by exponential functions of covariates. For choice k given branch j:

P(k|j) = exp(β′x_k|j) / Σ_s exp(β′x_s|j).    (14.10)

For branch j:

P(j) = exp(α′y_j + θ_j·I_j) / Σ_j exp(α′y_j + θ_j·I_j),    (14.11)

where

θ_j = τ_j × exp(γ′z_j).    (14.12)
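The covariance heterogeneity (CovHet) branch probability can be sketched numerically. The function below follows the structure of Equations (14.11)–(14.12), with the IV coefficient θj = τj × exp(γ′zj) varying with covariates zj; all input values are invented for illustration and are not taken from the estimated model:

```python
import math

# Covariance heterogeneity branch probability (eqs. (14.11)-(14.12)):
# the IV coefficient theta_j = tau_j * exp(gamma'z_j) varies with covariates z_j.
# All numbers are invented for illustration.
def covhet_branch_probs(alpha_y, inclusive_values, tau, gamma_z):
    theta = [t * math.exp(g) for t, g in zip(tau, gamma_z)]
    scores = [math.exp(a + th * iv)
              for a, th, iv in zip(alpha_y, theta, inclusive_values)]
    total = sum(scores)
    return [s / total for s in scores]

probs = covhet_branch_probs(
    alpha_y=[0.2, -0.1],          # alpha'y_j for two branches
    inclusive_values=[1.1, 0.8],  # inclusive values I_j
    tau=[0.6, 0.6],               # base IV parameters tau_j
    gamma_z=[0.3, 0.3],           # gamma'z_j (e.g., a male respondent)
)
assert abs(sum(probs) - 1.0) < 1e-12
```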
P_iq = exp(λ_iq·β′X_iq) / Σ_{j∈Cq} exp(λ_iq·β′X_jq).    (14.14)
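Equation (14.14) can be illustrated with a small sketch showing how the common scale λiq sharpens or flattens the choice probabilities; the utilities below are invented for illustration:

```python
import math

# Scaled MNL (eq. (14.14)): one scale factor lambda_iq multiplies every
# utility in the choice set. Utilities below are invented for illustration.
def scaled_mnl_probs(utilities, lam):
    scores = [math.exp(lam * v) for v in utilities]
    total = sum(scores)
    return [s / total for s in scores]

v = [1.0, 0.5, -0.2]
p_low = scaled_mnl_probs(v, 0.5)   # low scale (high variance): flatter shares
p_high = scaled_mnl_probs(v, 3.0)  # high scale (low variance): concentrates on the best
assert p_high[0] > p_low[0]
```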
An example of a CovHet nested logit using our SP mode choice data is given below. We have introduced one linear term, gender, to illustrate. We could have included ASCs, but did not in this example. The gender covariate has a positive utility parameter, which suggests that when a respondent is male, ceteris paribus, the scale parameter increases in value. Another way of saying this is that the standard deviation of the random component is greater for males compared to females.
|-> Nlogit
;lhs = choice, cset, altij
;choices = NLR,NHR,NBW,bs,tn,bw,cr
;tree=ptnew(NLR,NHR,NBW),Allold(bs,tn,bw,cr)
;show
;RU2
;prob = margprob
;cprob = altprob
;ivb = ivbranch
;utility=mutilz
;hfn=gender
;model:
u(nlr) = nlr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nhr) = nhr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nbw) = nbw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invccar*invc+invtcar*invt + TC*TC + PC*PC + egtcr*egt$
+----------------------------------------------------------------------------+
|WARNING: Bad observations were found in the sample. |
|Found 104 bad observations among 1970 individuals. |
|You can use ;CheckData to get a list of these points. |
+----------------------------------------------------------------------------+
Tree Structure Specified for the Nested Logit Model
Sample proportions are marginal, not conditional.
Choices marked with * are excluded for the IIA test.
----------------------+-----------------------+----------------------+----------------------+--------+-----
Trunk (prop.)|Limb (prop.)|Branch (prop.)|Choice (prop.)|Weight|IIA
----------------------+-----------------------+----------------------+----------------------+--------+-----
Trunk{1} 1.00000|Lmb[1|1] 1.00000|PTNEW .40997|NLR .17471| 1.000|
| | |NHR .18060| 1.000|
| | |NBW .05466| 1.000|
| |ALLOLD .59003|BS .11790| 1.000|
| | |TN .14094| 1.000|
| | |BW .20096| 1.000|
| | |CR .13023| 1.000|
----------------------+-----------------------+----------------------+----------------------+--------+-----
Line search at iteration 50 does not improve fn. Exiting optimization.
-------------------------------------------------------------------------------------------------------------
Covariance Heterogeneity Model
Dependent variable CHOICE
Previous versions of Nlogit allowed an alternative to be placed in only one location in a tree structure. It may be behaviorally meaningful to allow an
alternative to appear in more than one branch or limb as a way of recognizing
that such an alternative is related to other alternatives in different ways. That
is, the correlation between specific alternatives may indeed exist in more than
one part of a tree. Such a specification is known as a generalized nested logit
model. If an alternative appears in more than one location, then an allocation parameter must be estimated to recognize its contribution to specific sources of utility throughout the tree structure. In the example below, we have placed the bw alternative in two branches. We tested many candidate allocations and found that this one gave the best overall fit. In other words, we found that only the bw alternative resulted in model improvements, with an allocation (probability) of 0.8180 and 0.1820 between the PTA and PTB branches:
|-> Nlogit
;lhs = choice, cset, altij
;choices = NLR,NHR,NBW,bs,tn,bw,cr
;show
;RU2
;prob = margprob
;cprob = altprob
;ivb = ivbranch
;utility=mutilz
;gnl
;tree=PTA(NLR,NHR,NBW,bw),PTB(bs,tn,bw,cr)
;model:
u(nlr) = nlr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nhr) = nhr + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(nbw) = nbw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bs) = bs + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(tn) = tn + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(bw) = bw + actpt*act + invcpt*invc + invtpt*invt2 + egtpt*egt + trpt*trnf /
u(cr) = invccar*invc+invtcar*invt + TC*TC + PC*PC + egtcr*egt$
+----------------------------------------------------------------------------+
|WARNING: Bad observations were found in the sample. |
|Found 104 bad observations among 1970 individuals. |
|You can use ;CheckData to get a list of these points. |
+----------------------------------------------------------------------------+
Tree Structure Specified for the Nested Logit Model
In GNL model, choices are equally allocated to branches
Choices marked with * are excluded for the IIA test.
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:35:59 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.018
Cambridge Books Online © Cambridge University Press, 2015
598 The suite of choice models
----------------------+-----------------------+----------------------+----------------------+--------+-----
Trunk (prop.)|Limb (prop.)|Branch (prop.)|Choice (prop.)|Weight|IIA
----------------------+-----------------------+----------------------+----------------------+--------+-----
Trunk{1} 1.00000|Lmb[1|1] 1.00000|PTA .51045|NLR .17471| 1.000|
| | |NHR .18060| 1.000|
| | |NBW .05466| 1.000|
| | |BW .10048| 1.000|
| |PTB .48955|BS .11790| 1.000|
| | |TN .14094| 1.000|
| | |BW .10048| 1.000|
| | |CR .13023| 1.000|
----------------------+-----------------------+----------------------+----------------------+--------+-----
Normal exit: 7 iterations. Status=0, F= 2730.693
--------------------------------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -2730.69253
Estimation based on N = 1866, K = 16
Inf.Cr.AIC = 5493.4 AIC/N = 2.944
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
Chi-squared[10] = 1588.10946
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 1970, skipped 104 obs
-----------+------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
-----------+------------------------------------------------------------------------------------------
NLR| 1.84937*** .30793 6.01 .0000 1.24584 2.45291
ACTPT| -.04248*** .00467 -9.10 .0000 -.05163 -.03334
INVCPT| -.24053*** .01369 -17.57 .0000 -.26737 -.21370
INVTPT| -.03160*** .00234 -13.52 .0000 -.03618 -.02702
EGTPT| -.00414 .00400 -1.04 .3002 -.01198 .00369
TRPT| .28841** .12646 2.28 .0226 .04055 .53626
NHR| 1.95132*** .29157 6.69 .0000 1.37987 2.52278
NBW| .85378*** .28549 2.99 .0028 .29423 1.41333
BS| -.64770** .25958 -2.50 .0126 -1.15647 -.13893
TN| -.32632 .26261 -1.24 .2140 -.84102 .18838
BW| -.03503 .26455 -.13 .8946 -.55353 .48347
INVCCAR| -.05669 .06937 -.82 .4138 -.19266 .07928
INVTCAR| -.01635*** .00319 -5.12 .0000 -.02261 -.01009
TC| -.07601** .03093 -2.46 .0140 -.13664 -.01538
PC| -.04837*** .00882 -5.49 .0000 -.06565 -.03109
EGTCR| -.11525*** .02133 -5.40 .0000 -.15707 -.07344
-----------+------------------------------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
-------------------------------------------------------------------------------------------------------
599 Nested logit estimation
Chapter 15
Mixed logit estimation
The secret of greatness is simple: do better work than any other man in your field –
and keep on doing it.
(Wilfred A. Peterson)
15.1 Introduction
The ML model syntax commands build on the commands of the MNL model
discussed in Chapter 11. We begin with the basic ML syntax command,
¹ Other models exist, such as the multinomial probit model (which assumes a normally distributed error
structure), ordered logit and probit models (used when the order of the dependent choice variable has
some meaning), latent class models (used to uncover possible different preference patterns among
assumed respondent segments), and generalized nested logit (GNL). We have deferred discussion of these
models to other chapters.
As an aside, the term “fixed parameter” with reference to a non-random parameter within
the ML literature can at times be confusing. In the MNL and NL model frameworks, fixed
parameters are parameter estimates which are fixed at some specific value (such as zero) by
the analyst. That is, they are not estimates at all but rather some analyst-specified value
(although in some cases we may think of these as an analyst-inspired estimate of the true
parameter value). It is also possible to fix parameter estimates within the ML model framework
in a similar manner. Thus, in the ML model framework, a fixed parameter may refer either to a
parameter set to some pre-determined value by the analyst or to a non-random parameter. For
this reason, we use the phrase "non-random parameter" rather than "fixed parameter."
603 Mixed logit estimation
c nonstochastic βi = β
n normal βi = β + σvi,vi ~ N[0,1]
s skew normal βi = β + σvi + λ|wi|, vi, wi ~ N[0,1]
l lognormal βi = exp(β + σvi), vi ~ N[0,1]
z truncated normal βi = β + σvi, vi ~ truncated normal (−1.96 to 1.96)
u uniform βi = β + σvi, vi ~ U[−1,1]
f one sided uniform βi = β + βvi, vi ~ uniform[−1,1]
t triangular βi = β + σvi, vi ~ triangle[−1,1]
o one sided triangular βi = β + βvi, vi ~ triangle[−1,1]
d beta, dome βi = β + σvi, vi ~ 2 × beta(2,2) – 1
b beta, scaled βi = βvi, vi ~ beta(3,3)
e Erlang βi = β + σvi, vi ~ gamma(1,4) − 4
g gamma βi = exp(β + σvi), vi = log(-log(u1*u2*u3*u4))
w Weibull βi = β + σvi, vi = 2(-logui)√.5, ui~ U[0,1]
r Rayleigh βi = exp(βi (Weibull))
p exponential βi = β + σvi, vi ~ exponential – 1
q exponential, scaled βi = βvi, vi ~ exponential
x censored (left) βi = max(0, βi (normal))
m censored (right) βi = min(0, βi (normal))
v exp(triangle) βi = exp(βi (triangular))
i type I extreme value βi = β + σvi, vi ~ standard Gumbel
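The transformations in this table are easy to mimic outside Nlogit. The Python sketch below (an illustration, not Nlogit's internal code) generates parameter draws for a few of the codes; the mean and spread values used are hypothetical.

```python
import math
import random

def draw_beta(code, beta, sigma, rng):
    """Map one base draw into a parameter draw beta_i, following the
    table's definitions (a sketch; Nlogit's internals may differ)."""
    if code == "c":   # nonstochastic: beta_i = beta
        return beta
    if code == "n":   # normal: beta + sigma*v, v ~ N[0,1]
        return beta + sigma * rng.gauss(0.0, 1.0)
    if code == "u":   # uniform: beta + sigma*v, v ~ U[-1,1]
        return beta + sigma * rng.uniform(-1.0, 1.0)
    if code == "t":   # triangular: beta + sigma*v, v ~ triangle[-1,1]
        return beta + sigma * rng.triangular(-1.0, 1.0, 0.0)
    if code == "l":   # lognormal: exp(beta + sigma*v), v ~ N[0,1]
        return math.exp(beta + sigma * rng.gauss(0.0, 1.0))
    raise ValueError("unhandled distribution code: " + code)

rng = random.Random(12345)
draws = [draw_beta("t", -0.05, 0.02, rng) for _ in range(10_000)]
# the triangular spread is bounded: beta - sigma <= beta_i <= beta + sigma
print(min(draws) >= -0.07 and max(draws) <= -0.03)  # → True
```

Note how the one-sided codes in the table (f, o) simply replace σ with β itself, which forces the entire distribution to share the sign of the mean.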
As an aside, for versions of Nlogit after 17 September 2012, if you have a lognormal or
Johnson Sb distributed parameter, you may have to flip the sign of the variable if the
parameter being estimated is naturally negative, as on a cost variable. You can now build
this into the model command and leave the data alone. Use ;FCN = -name(L) for a
lognormal coefficient, or -name(J) for Sb. Note the minus sign before the variable name. In
RPlogit, if you want Sb, use (J) in the specification.
for parameters that vary around a fixed mean. A few deviate from this format.
For example, the lognormal model is of the form βk,i = exp(βk + σkVk,i), while the
skew normal is of the form βk,i = βk + σkVk,i + θk|Wk,i|, where both Vk,i and Wk,i are
distributed as standard normal and the latter term is an absolute value. θk may be
positive or negative, so the skewness can go in either direction. The range of this
parameter is infinite in both directions, but the distribution is skewed and therefore
asymmetric.
Any of the above distributions may be assigned to any random parameter
named in the fcn command syntax. For example, the command:
;fcn = invt(n)
will specify that a parameter named invt will be a random parameter drawn
from a normal distribution. Note that this command refers to the parameter
name and not the attribute name for an attribute entering a utility expression.
In estimating ML models, more than one parameter may be treated as a
random parameter. Indeed, it is possible that all parameter estimates be
treated as random. When more than one parameter is estimated as random,
there is no requirement that the distributions be the same. Multiple random
parameters are separated in the fcn command by commas (,). In the following
example, the invt parameter will be estimated as a random parameter esti-
mated from a normal distribution, while the cost parameter will be treated as a
random parameter distributed with a triangular distribution:
;fcn=invt(n),cost(t)
As an aside, Nlogit uses only the first four characters of the parameter names given in the
fcn specification. This may cause problems if the first four characters of two or more
random parameter names are exactly the same.
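Because only the first four characters matter, it can be worth checking candidate parameter names before estimation. A small hypothetical Python helper (not part of Nlogit) illustrates the check:

```python
def fcn_prefix_collisions(names):
    """Flag random-parameter names whose first four characters clash,
    since (per the text) Nlogit distinguishes fcn names by only the
    first four characters. Hypothetical helper, not an Nlogit feature."""
    seen = {}
    clashes = []
    for name in names:
        key = name[:4].lower()
        if key in seen and seen[key] != name:
            clashes.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return clashes

# 'invt' and 'invtcar' share the prefix 'invt' and would collide
print(fcn_prefix_collisions(["invt", "cost", "invtcar"]))  # → [('invt', 'invtcar')]
```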
The random parameters assigned over the sampled population are obtained
from repeated simulated draws (see Chapter 5). The number of replications of
simulated draws, R, from which the random parameters are derived, may be
specified by the analyst using the following command:
;pts= <number of replications>
Rather than use random or SHS draws, shuffled uniform vectors may be used
instead. The necessary command used to request shuffled uniform vectors is
as follows (note that it is possible to use only one draw method in the
estimation process; i.e., random, or SHS, or shuffled uniform vectors):
;shuffle
As an aside, in versions of Nlogit after 17 September 2012, we have added Modified Latin
Hypercube Sampling (MLHS) to Halton and pseudo-random draws as an option. Use ;MLHS
in the RPLogit command. In developing this option, we found that (i) MLHS gives essentially
the same answer as Halton draws; not identical to the digit, but close enough; and (ii) in
perfectly controlled experiments, it is much faster to generate the Halton or pseudo-random
draws in the data loop, that is, over and over again, than to generate them once before
estimation and fetch them from a reservoir as needed. That is, "compute each time" is much
faster than "compute once, then move each time." This was surprising; the comparison was
not close. Note, however, that it is not possible to compute the MLHS samples on the fly
because the full set of R draws must all be drawn at the same time. The upshot is that
although we expected MLHS to be faster than Halton, this appears not to be the case.
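For readers who want to see what a Halton sequence looks like, the standard construction is the base-p radical inverse. A minimal Python sketch follows (illustrative only; Nlogit's own generator may discard initial points or handle dimensions differently):

```python
def halton(index, base):
    """Radical-inverse Halton draw in (0,1) for a 1-based index."""
    f, h = 1.0, 0.0
    while index > 0:
        f /= base                 # shrink by one more power of the base
        h += (index % base) * f   # add the next base-p digit, reflected
        index //= base
    return h

# first few base-2 Halton points fill the unit interval systematically
seq = [halton(i, 2) for i in range(1, 5)]
print(seq)  # [0.5, 0.25, 0.75, 0.125]
```

Successive prime bases (2, 3, 5, . . .) are typically used for successive random parameters, which is one reason the ordering of parameters can matter.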
The analyst can specify any number in the above command, as the actual value
adopted is of no consequence. For consistency, it is suggested that the analyst
select one value and always use it (much as one would use the same PIN for a
bank account). Whatever value is used, the calc command must be given before
the ML model syntax, because resetting the random number generator is a
separate command from the ML command (hence the $ at the end).
Throughout this chapter, we will use the following calc command to reset
the random number generator but, as suggested above, any number could be
used (noting that there is no replicability problem if Halton or Shuffled draws
are used and the calc command below is not required):
calc;ran(12345)$
As an aside, the order of random parameters will matter because it affects the draws (be
they random or intelligent draws such as Halton) that are applied to the different parameters.
Setting the seed only starts the entire chain at a specific point. It does not set the chain for
each parameter. For example, think of three random parameters a,b,c, and 100 draws. The
chain of draws is v1 . . . v300. If ordered a,b,c, “a” gets v1-v100, “b” gets v101-v200, and
“c” gets v201-v300. If ordered c,b,a, then “c” gets v1-v100, etc. Setting the seed only
establishes that v1 . . . v300 are the same every time. This becomes a source of any
observed small differences. To ensure that the difference is not important, the analyst needs
to choose a large enough number of draws. Although it would be possible to let the user
supply specific seeds for each parameter, it is not worth the effort.
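The chain logic in this aside can be sketched in plain Python (a hypothetical illustration with pseudo-random rather than Halton draws, just to show why reordering parameters changes their draws while the seed fixes the chain):

```python
import random

R = 100                              # draws per random parameter
rng = random.Random(12345)           # one seed starts the whole chain
chain = [rng.gauss(0.0, 1.0) for _ in range(3 * R)]   # v1 .. v300

# ordered a, b, c: each parameter takes a consecutive segment of the chain
a, b, c = chain[:R], chain[R:2*R], chain[2*R:]

# reordered c, b, a: 'c' now takes v1..v100 and 'a' takes v201..v300
c2, b2, a2 = chain[:R], chain[R:2*R], chain[2*R:]

print(a == a2)   # → False: 'a' receives different draws under the new order
print(b == b2)   # → True: the middle parameter keeps its segment
```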
In this section we interpret the output from a ML model. As with Chapter 13,
we will concentrate on interpreting the output from this model and not
concern ourselves with how to improve the model’s overall performance
(this we leave to the reader). Subsequent sections will add to the output
generated through the estimation of more complex ML models.
The ML model shown in the following example is estimated using a com-
muter mode choice case study presented in Appendix 11A to Chapter 11.
Through the command ;halton, we have requested that standard Halton
sequence draws be used to estimate each of the random parameters. The
;fcn command specification is used to stipulate that the 11 attributes (drawn
from a normal distribution) be treated as random parameters. Other attributes
in the utility functions will be treated as non-random parameters. To reduce
the amount of time necessary for model convergence (which will be useful in
the classroom to reproduce the results), we have restricted the number of
replications to 100:
sample;all$
reject;dremove=1$ Removing data with errors
reject;ttype#1$ Selecting Commuter sample
reject;altij=-999$
Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;par
;rpl
;fcn=invt(n),cost(n),acwt(n) ,eggt(n), crpark(n),accbusf(n),
waittb(n),acctb(n),crcost(n), crinvt(n),creggt(n)
;halton;pts= 100
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*wait+ acwt*acctim
+ accbusf*accbusf+eggT*egresst + ptinc*pinc + ptgend*gender +
NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf + ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost+
CReggT*egresst$
As with the NL models, Nlogit will first estimate a MNL model to derive the
initial start values for each of the parameters in the ML model. In the case of
the ML model, the estimation of the MNL model to obtain starting values for
the parameter estimates is not optional and as such does not require the
addition of commands such as start=logit (see Chapter 13):
Start values obtained using MNL model
Dependent variable Choice
Log likelihood function -2487.36242
Estimation based on N = 1840, K = 20
Inf.Cr.AIC = 5014.7 AIC/N = 2.725
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
Response data are given as ind. choices
Number of obs.= 1840, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
RESP1| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
INVT| -.04940*** .00207 -23.87 .0000 -.05346 -.04535
COST| -.18921*** .01386 -13.66 .0000 -.21637 -.16205
ACWT| -.05489*** .00527 -10.42 .0000 -.06521 -.04456
EGGT| -.01157** .00471 -2.46 .0140 -.02080 -.00235
CRPARK| -.01513** .00733 -2.07 .0389 -.02950 -.00077
ACCBUSF| -.09962*** .03220 -3.09 .0020 -.16274 -.03650
WAITTB| -.07612*** .02414 -3.15 .0016 -.12343 -.02880
ACCTB| -.06162*** .00841 -7.33 .0000 -.07810 -.04514
CRCOST| -.11424*** .02840 -4.02 .0001 -.16990 -.05857
CRINVT| -.03298*** .00392 -8.42 .0000 -.04065 -.02531
CREGGT| -.05190*** .01379 -3.76 .0002 -.07894 -.02486
NLRASC| 2.69464*** .33959 7.93 .0000 2.02905 3.36022
PTINC| -.00757*** .00194 -3.90 .0001 -.01138 -.00377
PTGEND| 1.34212*** .17801 7.54 .0000 .99323 1.69101
NLRINSDE| -.94667*** .31857 -2.97 .0030 -1.57106 -.32227
TNASC| 2.10793*** .32772 6.43 .0000 1.46562 2.75024
NHRINSDE| -.94474*** .36449 -2.59 .0095 -1.65913 -.23036
NBWASC| 1.41575*** .36237 3.91 .0001 .70551 2.12599
BSASC| 1.86891*** .32011 5.84 .0000 1.24151 2.49630
BWASC| 1.76517*** .33367 5.29 .0000 1.11120 2.41914
This MNL model has 5 ASCs, 11 attributes (in bold) associated with
the set of available modes, two SECs (PTINC, PTGEND), and two
variables (NLRINSDE, NHRINSDE) indicating whether the commuter's
trip destination for the new rail modes is inside (1) or outside (0)
the study area.
As an aside, it is unusual to have an ASC that appears in more than one utility expression;
namely TNASC. However, this is appropriate in this model, since NHRAIL is an extension of
the TRAIN system in Sydney, and we found that this is the best representation of what is
essentially the same alternative, even though the choice experiment separated this new
infrastructure from the existing (i.e., TRAIN) network.
ASCs only:
|-> Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;maxit=100
;model:
U(NLRail)= NLRAsc/
U(NHRail)= NHRAsc/
U(NBway)= NBWAsc/
U(Bus)= BusAsc/
U(Train)= TnAsc/
U(Bway)= BwyAsc$
Normal exit: 5 iterations. Status=0, F= 3130.826
----------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -3130.82617
Estimation based on N = 1840, K = 6
Inf.Cr.AIC = 6273.7 AIC/N = 3.410
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .;RHS=ONE$
Response data are given as ind. choices
Number of obs.= 1840, skipped 0 obs
-----------+----------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
RESP1| Coefficient Error z |z|>Z* Interval
-----------+----------------------------------------------------------------------------------------
NLRASC| .34098*** .08886 3.84 .0001 .16683 .51514
NHRASC| .64197*** .08600 7.46 .0000 .47342 .81053
NBWASC| -.95132*** .14913 -6.38 .0000 -1.24362 -.65903
BUSASC| .00090 .08913 .01 .9920 -.17378 .17558
TNASC| .30541*** .08478 3.60 .0003 .13924 .47158
BWYASC| .02057 .09015 .23 .8195 -.15611 .19726
-----------+----------------------------------------------------------------------------------------
The pseudo-R2 is 0.316 when compared with equal choice shares, but only
0.016 when contrasted with the MNL model performance (the start values).
It is not unusual for ML to be only marginally better than MNL on overall
goodness of fit; however, one must be careful in relying on this single
measure of performance in selecting the preferred model, since there are
many behaviorally appealing outputs of ML that are not available in MNL.
Specifically, as discussed later in this chapter, we often find that the WTP
estimates do vary across the sample, and using a single estimate (from MNL)
is a behaviorally limiting condition, even if the model fit is very similar.
Also provided is a Chi-square statistic for the log-likelihood ratio test (LRT)
(using as the base comparison model, a model with equal choice shares only)
and information on the pseudo-R2. In the above example, the model is
statistically significant (Chi-square equal to 2283.33 with 31 degrees of free-
dom and a p-value equal to zero):
|-> Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;par
;rpl
;fcn=invt(n),cost(n),acwt(n) ,eggt(n), crpark(n),accbusf(n),
waittb(n),acctb(n),crcost(n), crinvt(n),creggt(n)
;halton;pts= 100
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*wait+ acwt*acctim
+ accbusf*accbusf+eggT*egresst + ptinc*pinc + ptgend*gender +
NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf + ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost+
CReggT*egresst$
Normal exit: 6 iterations. Status=0, F= 2487.362
----------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable RESP1
The output provides the analyst with information on the number of replica-
tions used in the simulated draws, as well as the type of draw used. In this
example, the output indicates that SHS draws were used in the estimation
process with 100 replications. No bad observations were removed during
model estimation.
The last section of the output provides information on the parameter
estimates of the ML model. The first and last output generated in this section
relates to the random parameters estimated as part of the ML model. The
first series of parameter estimate output relates to the random parameters
and is used to determine whether the mean of the sample population
random parameters obtained from the 100 SHS draws is statistically differ-
ent to zero.
Random parameters estimated within the most basic ML framework are
estimated over the sampled population from a number of draws (either
random draws or intelligent draws; SHS or shuffled uniform vectors). The
parameter estimates thus obtained are derived at the sample population level
only. This is not the same as estimating individual-specific parameter esti-
mates. Parameter estimates estimated at the sample population level are called
unconditional parameter estimates, as the parameters are not conditioned on
any particular individual’s choice pattern but rather on the sample population
as a whole. The process of estimating unconditional random parameters is
similar to the estimation process of non-random parameters in the MNL and
NL models; that is, maximization of the LL function over the data for the
sample population. In a later section, we demonstrate how to estimate indi-
vidual-specific or conditional parameter estimates. We leave it until then to
discuss the differences between the two types of estimates but note that the
two often produce widely varying results.
Each draw taken from some specified distribution will produce a unique
sample population parameter estimate for each random parameter estimated.
To avoid spurious results (for example, drawing a single observation from the
tail of the distribution), R replications of draws are used. It is from these R
replicated draws that the mean random parameter is derived. Simply put, the
mean of each random parameter is the average of the parameters drawn over
the R replications from the appropriate distribution. It is this value that is
given in the above output. For the invtime attribute, treated as generic across all
public transport alternatives, the parameter estimate of −0.0784 represents the
mean over the 100 SHS draws requested within the command syntax.
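The averaging idea can be sketched in a few lines of Python (illustrative only: in estimation, the mean and standard deviation are parameters found by simulated maximum likelihood, not by simply averaging draws; the values below come from the model output discussed here):

```python
import random

beta_mean, beta_sd, R = -0.0784, 0.042, 100   # invtime values from the output
rng = random.Random(12345)

# R simulated parameter draws from the fitted normal distribution
draws = [beta_mean + beta_sd * rng.gauss(0.0, 1.0) for _ in range(R)]
simulated_mean = sum(draws) / R

# with a finite R the simulated mean is close to, but not exactly,
# the underlying mean, which is why more draws stabilize the results
print(round(simulated_mean, 3))
```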
The interpretation of the output associated with the mean of a random
parameter estimate is much the same as with the non-random parameters
discussed in Chapter 11. The p-value for the invtime attribute random para-
meter is 0.00, which is less than alpha equal to 0.05 (i.e., 95 percent confidence
interval). As the p-value is less than the analyst determined critical value, we
reject the null hypothesis at the 95 percent level of confidence and conclude
that the mean of the random parameter is statistically different to zero. The
p-value for the cost parameter associated with all public transport alternatives
is similarly less than alpha equal to 0.05, suggesting that the mean parameter
estimate of −0.3626 for this random parameter is also statistically different to
zero. Both random parameters have means that are, at the sample population
level, statistically different to zero.
While the first set of output relates to the means of each of the random
parameters, the second series of output relates to the amount of dispersion
that exists around the sample population. The parameter estimates given in
the output are the derived standard deviations calculated over each of the R
draws. Insignificant parameter estimates for derived standard deviations
indicate that the dispersion around the mean is statistically equal to zero,
suggesting that all information in the distribution is captured within the mean.
Statistically significant parameter estimates for derived standard deviations
for a random parameter suggest the existence of heterogeneity in the para-
meter estimates over the sampled population around the mean parameter
estimate (i.e., different individuals possess individual-specific parameter esti-
mates that may be different from the sample population mean parameter
estimate).
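The test applied here is a simple Wald (z) test: the derived standard deviation divided by its standard error, compared against ±1.96 at the 5 percent level. A Python sketch (the standard errors below are hypothetical, chosen only to reproduce z-values like those discussed next):

```python
def wald_z(estimate, std_error):
    """Wald statistic: parameter estimate divided by its standard error."""
    return estimate / std_error

def dispersion_matters(sd_estimate, std_error, critical=1.96):
    """Reject H0 (sigma = 0) when |z| exceeds the critical value."""
    return abs(wald_z(sd_estimate, std_error)) > critical

# hypothetical standard errors chosen to give z = 7.52 and z = 0.67
print(dispersion_matters(0.04206, 0.04206 / 7.52))  # → True: keep it random
print(dispersion_matters(0.08363, 0.08363 / 0.67))  # → False: collapses to mean
```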
As an aside, the reader will note that the names of the parameters in the above output
are preceded by two letters. These letters are used to identify the analytical distribution
imposed on the random parameter estimate. Random parameters drawn from normal
distributions will have the letters Ns, lognormal Ls, uniform Us, triangular distributions Ts,
and non-stochastic distributions Cs (we discuss the special case of using a non-stochastic
distribution in a later section).
Figure 15.1 Distribution of the accbusf parameter collapsing to its mean (β = −0.1294)
For the above example, the dispersion of the invehicle time random para-
meter, represented by a derived standard deviation of 0.042, is statistically
significant, given a Wald statistic of 7.52 (outside the ±1.96 range) and a
p-value of 0.00 (which is less than our critical value of alpha equal to 0.05).
In this case, all individuals within the sample cannot be (statistically) repre-
sented by an invtime parameter of −0.0784. The case for a distribution of
parameter values to represent the entire sampled population is justified.
Dispersion of the access bus fare parameter (Nsaccbus) is statistically
insignificant, as suggested by a Wald statistic of 0.67 (inside the ±1.96 critical
value range) and a p-value of 0.5033. Unlike the invt parameter, the model
suggests that the accbusf parameter should collapse to a single point repre-
sentative of the entire sampled population. For the analyst, this suggests that
there is no heterogeneity over the sampled population in individual-level
accbusf parameter estimates. As such, a single parameter estimate of −0.1294
is sufficient to represent all sampled individuals. We show this in Figure 15.1.
At this point, the analyst may wish to respecify the accbusf parameter as a
non-random parameter or re-estimate the model maintaining the accbusf
parameter as a random parameter, but assign to it some other distribution
from which it may be derived. Despite supporting evidence that the invt
parameter should be treated as a normally distributed random parameter,
the analyst may also wish to assign different distributional forms to test for
better model fits. Also, other parameters formerly treated as non-random may
be estimated as random parameters in further exploratory work. Once the
analyst is content with the model results, the model should be re-estimated
with a greater number of draws to confirm stability in the results.
As an aside, the statistical significance of attributes does vary as the number of draws
changes, so one must exercise some judgment in the initial determination of statistically
significant effects. Practical experience suggests that an attribute with a z-value over 1.5 for
a small number of draws may indeed become statistically significant (i.e., over 1.96) with a
larger number of draws. This has been observed more for the standard deviation parameters
(i.e., those derived from normal and log-normal distributions).
In order to write out the utility functions for the above model, we need to
consider Equation (15.1), which accounts for the normal distribution assump-
tion placed upon the random parameters of the model. In the above example,
we have no associated heterogeneity in the mean parameter estimate (we
explore this option in a later section). As such, we may rewrite Equation
(15.1) as Equation (15.4):

βi = β + σ × N (15.4)
where N has a standard normal distribution. For the invtime random para-
meter, we observe a mean of −0.07845 and a standard deviation of 0.04206.
By way of Equation (15.1), we may write the marginal (dis)utility associated
with the invtime attribute as:

invtd = −0.07845 + 0.04206 × N
The commands used to obtain the estimate of invtd and to plot it (Figure 15.2)
are as follows:
[Kernel density plot: density (vertical axis) against INVTD from −.300 to .100 (horizontal axis); plot label: kernel density estimate for INVTD]
Figure 15.2 Unconstrained distribution of invehicle time for public transport modes
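The simulation behind Figure 15.2 can also be sketched in Python (a minimal illustration, not the authors' Nlogit code; the mean and standard deviation are the estimates reported above, and the sample size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unconstrained normal random parameter for invtime: mean and standard
# deviation taken from the model output above; sample size is illustrative.
mean, sd = -0.07845, 0.04206
invtd = mean + sd * rng.standard_normal(9060)

# With an unconstrained normal, some draws cross zero, implying a positive
# marginal utility of invehicle time for a small share of the sample.
share_positive = (invtd > 0).mean()
print(round(share_positive, 3))
```

The non-trivial share of draws with the "wrong" (positive) sign is exactly what motivates the constrained distributions discussed later in this chapter.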
As an aside, the assignment of the numerator and denominator estimate to each sampled
respondent is random, given the absence of any systematic influences interacting with this
parameterization. The ratio of two randomized allocations produces a VTTS for each
respondent, which may be problematic if the numerator happens to be a high (low)
estimate from the distribution and the denominator a low (high) value. This has been avoided in many
studies by using a fixed parameter in the denominator. While it resolves this issue, it comes
at the price of ignoring preference heterogeneity where it can be shown, as in the current
model, to exist. One way to minimize the risk of this occurring is to introduce heterogeneity
in the mean and/or variance of the random parameter, as shown in a later section of this
chapter.
Vnlrail = 2.8912+(−0.07845+0.04206×N)*invtime+(−0.36258+0.31629×N)
*tcost+(−0.08227+0.02742×N)*wait+(−0.08227+0.02742×N)*acctim
+(−0.02832+0.05633×N)*egresst
+(−0.12941+0.08363×N)*accbusf−0.0215*pinc+2.95546*gender
−1.35718*nlrinside
Vnhrail = 2.08897+(−0.07845+0.04206×N)*invtime+(−0.36258
+0.31629×N)*tcost+(−0.08227+0.02742×N)*wait+(−0.08227
+0.02742×N)*acctim+(−0.02832+0.05633×N)*egresst
+(−0.12941+0.08363×N)*accbusf−0.0215*pinc+2.95546*gender
−1.35718*nlrinside
Vnbway = 1.33874+(−0.07845+0.04206×N)*invtime+(−0.36258
+0.31629×N)*tcost+(−1.0341+0.06519×N)*wait+(−0.08388
+0.00453×N)*acctim+(−0.02832+0.05633×N)*egresst
+(−0.12941+0.08363×N)*accbusf−0.0215*pinc+2.95546*gender
Vbus = 1.59186+(−0.07845+0.04206×N)*invtime+(−0.36258+0.31629×N)
*tcost+(−1.0341+0.06519×N)*wait+(−0.08388+0.00453×N)*acctim
+(−0.02832+0.05633×N)*egresst
+(−0.12941+0.08363×N)*accbusf−0.0215*pinc+2.95546*gender
Vbway = 1.54923+(−0.07845+0.04206×N)*invtime+(−0.36258+0.31629×N)
*tcost+(−1.0341+0.06519×N)*wait+(−0.08388+0.00453×N)*acctim
+(−0.02832+0.05633×N)*egresst
+(−0.12941+0.08363×N)*accbusf−0.0215*pinc+2.95546*gender
Vtrain = 2.08897+(−0.07845+0.04206×N)*invtime+(−0.36258+0.31629×N)
*tcost+(−0.08227+0.02742×N)*wait+(−0.08227+0.02742×N)*acctim
+(−0.02832+0.05633×N)*egresst
+(−0.12941+0.08363×N)*accbusf−0.0215*pinc+2.95546*gender
Vcar = (−0.10051+0.04926×N)*invtime+(−0.31892+0.26923×N)*tcost
+(−0.08806+0.08274×N)*parkcost+(−0.12685+0.00363×N)*egresst
f(z_j) = (1/n) Σ_{i=1}^{n} K[(z_j − x_i)/h]/h,  j = 1, . . ., M.  (15.6)
The function is computed for a specified set of values of interest, zj, j = 1,. . .,M
where zj is a partition of the range of the attribute. Each value requires a
sum over the full sample of n values, xi, i = 1, . . ., n. The primary component of
the computation is the kernel, or weighting function, K[.], which takes a
number of forms. For example, the normal kernel is K[z]= φ(z) (normal
density). Thus, for the normal kernel, the weights range from φ(0) = 0.399
when xi = zj to values approaching zero when xi is far from zj. Thus, again,
what the kernel density function is measuring is the proportion of the sample
of values that is close to the chosen zj.
The other essential part of the computation is the smoothing (bandwidth)
parameter, h, to ensure a good plot resolution. The bandwidth parameter is
exactly analogous to the bin width in a common histogram. Thus, as noted
earlier, narrower bins (smaller bandwidths) produce unstable histograms
(kernel density estimators) because not many points are “in the neighbor-
hood” of the value of interest. Large values of h stabilize the function, but tend
to flatten it and reduce the resolution – imagine a histogram with only two or
three bins, for example. Small values of h produce greater detail, but also cause
the estimator to become less stable. An example of a bandwidth is given in
Equation (15.7), which is a standard form used in several contemporary
computer programs:
h = 0.9Q/n^0.2.  (15.7)
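Equations (15.6) and (15.7) can be sketched as follows (a minimal Python illustration, not a production implementation; taking Q to be the sample standard deviation is an assumption, as software typically uses a robust spread measure such as min(s, IQR/1.349)):

```python
import numpy as np

def kernel_density(x, z_grid, h=None):
    """Kernel density estimator of Equation (15.6) with a normal kernel."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if h is None:
        # Bandwidth of Equation (15.7); Q taken here as the sample standard
        # deviation (an assumption about Q's exact definition).
        h = 0.9 * x.std(ddof=1) / n ** 0.2
    z = np.asarray(z_grid, dtype=float)
    u = (z[:, None] - x[None, :]) / h                  # (z_j - x_i) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # normal kernel phi(u)
    return (K / h).mean(axis=1)                        # average over sample

rng = np.random.default_rng(1)
draws = rng.normal(-0.078, 0.042, size=5000)  # illustrative parameter draws
grid = np.linspace(-0.3, 0.1, 201)            # the z_j partition
f = kernel_density(draws, grid)
```

Note that the weight at u = 0 is φ(0) ≈ 0.399, as stated in the text, and decays toward zero for points far from z_j.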
If you need to force a parameter to be negative, rather than positive, you can
use these distributions anyway – just multiply the variable by −1 before
estimation. (Note, in Nlogit, what we have labeled the “Rayleigh” variable is
not actually a Rayleigh variable, though it does resemble one. It has a shape
similar to the log-normal; however, its tail is thinner, so it may be a more
plausible model.) If you specify these distributions for a parameter that would
be negative if unrestricted, the estimator will fail to converge, and issue a
diagnostic that it could not locate an optimum of the function (log-likelihood,
LL). In addition, the maximum and minimum specifications are not contin-
uous in the parameters, and will often not be estimable.
As an aside, the constrained triangular that uses name(o) can also be defined by name(t,1),
which indicates that the mean and standard deviation are set equal.
[Kernel density plot: density (vertical axis) against INVTD from −.128 to .000 (horizontal axis); plot label: kernel density estimate for INVTD]
Figure 15.3 Constrained distribution of invehicle time for public transport modes
(such that each side of the tent has area s × (1/s) × (1/2) = 1/2, and both sides
have area 1/2 + 1/2 = 1, as required for a density). The slope is 1/s^2.
The results below differ from the previous model only in the distributional
assumption of the random parameters. You will see that the standard devia-
tion parameter estimate for each random parameter is exactly equal to the
mean estimate of the random parameter. This constraint ensures that the full
distribution lies on one (negative) side of zero. We can show this with the
following commands and the graph in Figure 15.3:
create
;rna=rnn(0,1)
;V1=rnu(0,1)
;if(v1<=0.5)T=sqr(2*V1)-1;(ELSE) T=1-sqr(2*(1-V1))
;Invtd = -0.06368 + 0.06368*T $
reject;altij=7$
kernel;rhs=invtd;limits=-0.128,0$
Kernel Density Estimator for INVTD
Kernel Function = Logistic
Observations = 9060
Points plotted = 1008
Bandwidth = .003821
Statistics for abscissa values-----
Mean = -.063446
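The same transform can be checked outside Nlogit; here is a minimal Python sketch of the create commands above (sqr in Nlogit is the square root; the sample size mirrors the 9,060 observations in the kernel output):

```python
import numpy as np

rng = np.random.default_rng(2)
v1 = rng.uniform(0.0, 1.0, size=9060)

# Inverse-CDF transform of a uniform into a triangular variate on [-1, 1],
# mirroring the if(v1<=0.5) ... (ELSE) ... logic of the create commands.
t = np.where(v1 <= 0.5, np.sqrt(2 * v1) - 1, 1 - np.sqrt(2 * (1 - v1)))

# Constrained triangular parameter: the spread equals the mean, so every
# draw lies in [-0.12736, 0] and the (negative) sign condition always holds.
invtd = -0.06368 + 0.06368 * t
print(invtd.min(), invtd.max())  # the entire distribution is non-positive
```

The simulated mean is close to the −0.063446 reported in the kernel output, and no draw crosses zero, in contrast to the unconstrained normal of Figure 15.2.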
Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;par
;rpl
;fcn=invt(o),cost(o),acwt(o),eggt(o),crpark(o),
accbusf(o),waittb(o),acctb(o),crcost(o),crinvt(o),creggt(o)
;maxit=200
;halton;pts= 100
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*wait+ acwt*acctim
+ accbusf*accbusf+eggT*egresst + ptinc*pinc + ptgend*gender +
NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf + ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost+
CReggT*egresst$
Normal exit: 27 iterations. Status=0, F= 2465.753
----------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable RESP1
Log likelihood function -2465.75251
Restricted log likelihood -3580.47467
Chi squared [ 20](P= .000) 2229.44432
Significance level .00000
McFadden Pseudo R-squared .3113336
Estimation based on N = 1840, K = 20
Inf.Cr.AIC = 4971.5 AIC/N = 2.702
The model below is an extension of the previous model, with the addition of
the 11 parameters associated with heterogeneity in the mean of the random
parameters. The LL at convergence is −2444.458, compared to −2465.752 for
exactly the same model without these additional parameters. With 11 degrees
of freedom difference, the LRT gives −2 × (−21.294) = 42.588. This is greater than
the critical Chi-squared value of 19.68 for 11 degrees of freedom, and hence
we can reject the null hypothesis of no difference:
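The arithmetic of this likelihood-ratio test can be verified directly (a simple Python check using the log-likelihoods reported in the two sets of output):

```python
# Log-likelihoods at convergence, from the Nlogit output
ll_restricted = -2465.752    # model without heterogeneity in the means
ll_unrestricted = -2444.458  # model with the 11 extra parameters

# LRT statistic: -2 times the difference in log-likelihoods
lrt = -2 * (ll_restricted - ll_unrestricted)
print(round(lrt, 3))  # 42.588

# Critical Chi-squared value at alpha = 0.05 with 11 degrees of freedom
critical_value = 19.68
print(lrt > critical_value)  # True: reject the null of no difference
```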
Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;par
;rpl=pinc
;fcn=invt(t,1),cost(t,1),acwt(t,1),eggt(t,1),crpark(t,1),
accbusf(t,1),waittb(t,1),acctb(t,1),crcost(t,1),crinvt(t,1),creggt(t,1)
;maxit=200
;halton;pts= 100
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*wait+ acwt*acctim
+ accbusf*accbusf+eggT*egresst + ptinc*pinc + ptgend*gender +
NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf + ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost+
CReggT*egresst$
Line search at iteration 47 does not improve fn. Exiting optimization.
----------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable RESP1
Log likelihood function -2444.45824
Restricted log likelihood -3580.47467
Chi squared [ 31](P= .000) 2272.03287
Significance level .00000
McFadden Pseudo R-squared .3172810
Estimation based on N = 1840, K = 31
Inf.Cr.AIC = 4950.9 AIC/N = 2.691
The marginal utility associated with a specific variable now includes the
additional “interaction” term. For example, the marginal utility expression
for invt is:
MUinvt = −0.07373 + 0.0001*pinc + 0.07373*o, where o is the one-sided
(constrained) triangular distribution. This additional term indicates that as
personal income increases, the marginal utility of invt will increase; that is,
the marginal disutility (given the negative sign of the mean estimate) will
decrease. Readers can check this by implementing the following commands:
create
;rna=rnn(0,1)
;V1=rnu(0,1)
;if(v1<=0.5)T=sqr(2*V1)-1;(ELSE) T=1-sqr(2*(1-V1))
;Invtd = -0.07373+0.0001*pinc+0.07373*T$
List;invtd$
The variables in hfr may be any variables, but they must be choice invariant.
This specification will produce the same form of heteroskedasticity in each
parameter distribution – note that each parameter has its own parameter
vector, ωk.
In Section 15.3.4 we described the method of modifying the specification of
the heterogeneous means of the parameters so that some RPL variables in zi
may appear in the means of some parameters and not others. A similar
construction may be used for the variances. For any parameter specification
of the forms set out above, the specification may end with an exclamation
point, "!", to indicate that the particular parameter is to be homoskedastic even
The variance for invt includes all three variables, but the variance for acwt
excludes family.
We present a model below with three random parameters in which we
specify:
;rpl=pinc
;fcn=invt(n),cost(n),acwt(n!01)
;hfr=gender,pinc
This model allows for heterogeneity in the mean (pinc) for all three random
parameters and heteroskedasticity in variance for acwt only, but for only the
second systematic source of influence listed in ;hfr=gender,pinc. Any combi-
nation of heterogeneity in the mean and heteroskedasticity in the variance of
one or more random parameters is permissible, making this a very general
ML form. None of the five heteroskedasticity effects is statistically significant
at the 95 percent confidence level; however, the model is sufficient to show the
additional information obtained:
Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;par
;rpl=pinc
;fcn=invt(n),cost(n),acwt(n!01)
;hfr=gender,pinc
;maxit=100
;halton;pts= 100
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*wait+ acwt*acctim
+ accbusf*accbusf+eggT*egresst + ptinc*pinc + ptgend*gender +
NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf + ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost+
CReggT*egresst$
Line search at iteration 61 does not improve fn. Exiting optimization.
-------------------------------------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable RESP1
Log likelihood function -5015.34107
Restricted log likelihood -7376.94538
Chi squared [ 31](P= .000) 4723.20862
Significance level .00000
McFadden Pseudo R-squared .3201331
Estimation based on N = 3791, K = 31
Inf.Cr.AIC = 10092.7 AIC/N = 2.662
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
No coefficients -7376.9454 .3201 .3189
Constants only can be computed directly
Use NLOGIT ;. . .;RHS=ONE$
At start values -5079.7499 .0127 .0110
Response data are given as ind. choices
Replications for simulated probs. = 100
Used Halton sequences in simulations.
Heteroskedastic random parameters
BHHH estimator used for asymp. variance
Number of obs.= 3791, skipped 0 obs
-----------+------------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
RESP1| Coefficient Error z |z|>Z* Interval
-----------+------------------------------------------------------------------------------------------
|Random parameters in utility functions
INVT| -.04983*** .00294 -16.92 .0000 -.05560 -.04405
COST| -.39194*** .02686 -14.59 .0000 -.44459 -.33929
ACWT| -.06842*** .00631 -10.85 .0000 -.08078 -.05606
As an aside, for current versions of Nlogit, the correlation command syntax will not work in
conjunction with constraints imposed on any random parameter distributions.
As an aside, the model with both correlated parameters (;Correlated) and heteroskedastic
random parameters is not estimable. If your model command contains both ;Correlated and
;Hfr = list, the heteroskedasticity takes precedence, and ;Correlated is ignored.
As an aside, Nlogit versions post-24 October 2014 have replaced the report of the "Lower
triangle of the Cholesky Matrix" with covariances of the random parameters in the standard
output of the RPL model with correlated parameters.
Nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;par
;rpl
;corr
;fcn=invt(n),crinvt(n),cost(n)
;maxit=100
;halton;pts= 100
;model:
U(NLRail)= NLRAsc + cost*tcost + invt*InvTime + acwt*wait+ acwt*acctim
+ accbusf*accbusf+eggT*egresst + ptinc*pinc + ptgend*gender +
NLRinsde*inside /
U(NHRail)= TNAsc + cost*Tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf + ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Bus)= BSAsc + cost*frunCost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc + cost*Tcost + invt*InvTime + waitTb*WaitT
+ accTb*acctim + eggT*egresst + accbusf*accbusf+ ptinc*pinc +
ptgend*gender /
U(Train)= TNAsc + cost*tcost + invt*InvTime + acwt*WaitT + acwt*acctim
+ eggT*egresst + accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= CRcost*costs + CRinvt*InvTime + CRpark*parkcost+
CReggT*egresst$
cov(x, y) = (1/n) Σ_{i=1}^{n} (X_i − μ_X)(Y_i − μ_Y).  (15.10)
Positive covariances suggest that larger parameter estimates for individuals along
the distribution on one attribute are generally associated with larger parameter
estimates for that same individual in the parameter space for the second
attribute. For example, the covariance of 0.01348 between the cost and crinvt
random parameters suggests that individuals with larger (i.e., more negative –
the marginal utilities for this attribute are expected to be negative) sensitivities to
car invehicle time are likely to have larger (more negative) marginal utilities for
cost. The larger the covariance, the stronger the relationship between the
two random parameters. Hence, 0.00158 suggests a weaker (positive) relation-
ship between the invt and cost random parameters than between the crinvt and
cost random parameters, with a covariance of 0.01348 (also a positive
relationship; larger values of crinvt are associated with larger values of cost).
There exists a direct relationship between the variances and covariances
and the correlations observed. The correlation coefficient used to produce the
correlations is:
ρ = cov(X1, X2) / (σ_X1 × σ_X2),  (15.11)

which is the correlation reported between the crinvt and cost random parameters
by Nlogit (note that it is the standard deviations, and not the variances, that make
up the denominator of the correlation coefficient).
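As a numerical check, Equation (15.11) can be applied to the reported estimates (a minimal Python sketch; the covariance is the 0.01348 quoted above, and the standard deviations are those used in the create commands later in this section):

```python
# Equation (15.11): correlation = covariance divided by the product of the
# standard deviations (not the variances) of the two random parameters.
cov_crinvt_cost = 0.01348   # reported covariance between crinvt and cost
sd_crinvt = 0.07906         # standard deviation of the crinvt parameter
sd_cost = 0.28348           # standard deviation of the cost parameter

rho = cov_crinvt_cost / (sd_crinvt * sd_cost)
print(round(rho, 3))  # roughly 0.601
```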
We need to discuss the Cholesky matrix in more detail so that the calcula-
tions reported in the model output can be better understood. The Cholesky
decomposition matrix is a lower triangular matrix (meaning that the upper
off-diagonal elements of the matrix are all zero). The above output illustrates
the presence of correlated alternatives due to correlated random parameters,
all normally distributed. When we have more than one random parameter
and we permit correlated random parameters, then the standard deviations
are no longer independent. To assess this, we have to decompose the standard
deviation parameters into their attribute-specific (e.g., invt and cost) and
attribute-interaction (e.g., invt × cost) standard deviations. Cholesky decom-
position is the method used to do this, and has been set out in some detail
in Chapter 5 where the Cholesky matrix is obtained from the variance-
covariance matrix. The ML model is extended to accommodate this case by
allowing the set of random parameters to have an unrestricted covariance
matrix. The non-zero off-diagonal elements of this matrix carry the cross-
parameter correlations.
As noted, the standard deviations of random parameter estimates under
conditions of correlated parameters may not be independent. To establish the
independent contribution of each random parameter estimate, the Cholesky
decomposition matrix separates the contribution to each standard deviation
parameter made through correlation with other random parameter estimates
and the actual contribution made solely through heterogeneity around the
mean of each random parameter estimate, thus unconfounding the correla-
tion structure over random parameter estimates with their associated stan-
dard deviation parameters. This allows the parameters to be freely correlated
and have an unrestricted scale, as well, while ensuring that the covariance
matrix that we estimate is positive definite at all times.
The first element of the Cholesky decomposition matrix will always be
equal to the standard deviation parameter of the first specified random
coefficient.2 Subsequent diagonal elements of the Cholesky decomposition
2
Random parameters in Nlogit are created independent of the generation of other random parameter
estimates. Correlation between two random parameters is created by running the random parameter
estimates through a Cholesky matrix. The distribution of the resulting vector will differ depending on the
order that was specified for the coefficients in the fcn command. This means that different orderings of
random parameters can result in different parameterizations when non-normal distributions are used.
Using an example offered by Ken Train (in private correspondence with the authors), assume two
random parameters X1 and X2 specified with normal and uniform distributions with correlation. Nlogit
creates a standard normal and a standard uniform that are uncorrelated, N1 and U2, and multiplies these by
a Cholesky matrix. For the matrix C = [a 0; b c], the resulting coefficients are X1 = a×N1, which is normal,
and X2 = b×N1 + c×U2, which is the sum of a uniform and a normal. X2 is not uniform but has the
distribution defined by the sum of a uniform and normal. If the order is reversed, such that N1 is uniform
and U2 is normal, X1 will be uniform and X2 will be the sum of a uniform and normal. By ordering the
random parameters differently, the user is implicitly changing the distribution of the resulting
coefficients.
of list of ones and zeros>. The analyst must specify the entire Cholesky
matrix. For example:
;corr =
1,
1,1,
0,0,1,
0,0,0,1,
0,0,0,1,1$
This is written out in the model command as:
;corr=1,1,1,0,0,1,0,0,0,1,0,0,0,1,1$
As an aside, the “1” on the diagonals is mandatory, and will mean a 1.0 in the Cholesky
matrix. The “1” below the diagonal signals a free non-zero parameter, not necessarily 1.0.
The “0” below the diagonal means that that element of the Cholesky matrix will equal zero.
Specifying the Cholesky matrix is not the same as specifying the correlation matrix. That is
not possible, except for the case exemplified above. You can make a whole row of the
correlation matrix zero, but not a specific element.
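The implication of this pattern can be sketched numerically; the 0.6 used below for the free below-diagonal elements is an illustrative stand-in for whatever values Nlogit would estimate:

```python
import numpy as np

# Cholesky pattern from the ;corr specification above: "1" on the diagonal,
# a free element (illustrative value 0.6) where the pattern has a
# below-diagonal "1", and structural zeros elsewhere.
free = 0.6
C = np.array([
    [1.0,  0.0, 0.0, 0.0,  0.0],
    [free, 1.0, 0.0, 0.0,  0.0],
    [0.0,  0.0, 1.0, 0.0,  0.0],
    [0.0,  0.0, 0.0, 1.0,  0.0],
    [0.0,  0.0, 0.0, free, 1.0],
])

V = C @ C.T                  # implied covariance matrix of the parameters
sd = np.sqrt(np.diag(V))
R = V / np.outer(sd, sd)     # implied correlation matrix

# Zero rows in the Cholesky pattern zero out whole rows of correlations:
print(R[2, 0], R[2, 1])      # parameter 3 is uncorrelated with 1 and 2
print(round(R[1, 0], 2))     # parameters 1 and 2 are freely correlated
```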
To obtain the full distribution, now that the standard deviation parameter
estimates have accounted for correlation between the random parameters, we
use the same formula as before; namely (for the normal distribution):
create
;rna=rnn(0,1)
;Invtd =-0.07146+0.04154*rna
;Crinvtd=-0.12154+0.07906*rna
;Costd=-0.35631+0.28348*rna$
The inclusion of the distributional form (i.e., n, t, u) within the utility formula
for each random parameter estimate requires special treatment in establishing
the marginal utility possessed by any individual towards the alternative to
[Kernel density plot: density (vertical axis) against parameter values from −2.00 to 1.00 (horizontal axis)]
Figure 15.4 Random parameter distributions allowing for correlated random parameters
which the random parameter estimate belongs. In Section 15.5, we show you
how to estimate the conditional parameter estimates (i.e., individual specific
parameter estimates conditioned on the choices observed within the data)
that may be used to decide where on the distribution (of marginal utility) an
individual resides. These individual parameter estimates may then be used to
derive individual level outputs, such as WTP measures (which can themselves
be directly calculated as a distribution), elasticities, etc., or be exported to
other systems such as a larger network model. The difference between condi-
tional and unconditional estimates is presented in Section 8.1 of Chapter 8.
The ML output generated by Nlogit (as reported and discussed in previous
sections of this chapter), however, is that of the unconditional parameter
estimates. The output shown is representative of the entire sampled popula-
tion. The output provides the mean and standard deviation of each of the
random parameter distributions. As such, in using the unconditional para-
meter estimates, the specific location on the distribution for any given indi-
vidual is unknown. If one is interested in the population profile and not that
of specific individuals, this does not represent a problem. If, however, one is
also interested in determining the presence of heterogeneity in the sampled
population and the possible sources of heterogeneity, as shown in previous
sections, then the ML model is ideal. However, once we add in heterogeneity
The Nlogit commands, utilities and prob work within the ML model
framework in the same manner as for the MNL and NL models. The
Simulation command (see Chapter 13) may be used to test the policy
implications resulting from changes in attribute levels; however, the results
cannot be easily transferred to other domains without a robust mapping
capability at the individual-observation level. That is, the analyst requires
some method to map the probabilities and/or utilities obtained for sampled
individuals and to a domain outside of the sample, a difficult task given the
presence of the random parameter estimates. In the absence of such mapping
capability, it remains possible to use the information captured in the uncondi-
tional parameter estimates to construct hypothetical samples with the same
distributional information (i.e., mean, standard deviation, or spread), which
in turn may be easily exported to other systems.
In summary, the unconditional parameter estimates capture information on
(1) the distributional form of the marginal utilities of each random parameter
(specified by the analyst), (2) the means of the distributions, and (3) the
dispersion of the distributions provided in the output as the standard deviation
or spread parameters. With this knowledge, it is possible to reconstruct the
random parameter distribution out of sample such that the same distribution is
randomly assigned over a hypothetical sample of individuals. The process to do
this will depend on the distributional form of the random parameter.
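As a minimal illustration of this reconstruction (a sketch only, not Nlogit's internal procedure; the moments used here are chosen for the example), the unconditional mean and spread can be used to generate a hypothetical sample:

```python
import numpy as np

def draw_random_parameters(n, mean, spread, dist="triangular", seed=0):
    """Reconstruct a random parameter distribution over a hypothetical
    sample of n individuals from its unconditional moments."""
    rng = np.random.default_rng(seed)
    if dist == "triangular":
        # Symmetric triangular on [mean - spread, mean + spread],
        # drawn by inverting the CDF of a uniform variate.
        v = rng.uniform(size=n)
        t = np.where(v <= 0.5, np.sqrt(2 * v) - 1, 1 - np.sqrt(2 * (1 - v)))
        return mean + spread * t
    if dist == "normal":
        # For a normal, `spread` is interpreted as the standard deviation.
        return rng.normal(mean, spread, size=n)
    raise ValueError(f"unsupported distribution: {dist}")

# Constrained triangular (spread = |mean|): the whole distribution stays
# on one side of zero, so the sign is preserved for every individual.
draws = draw_random_parameters(10_000, mean=-0.064, spread=0.064)
```

The constrained triangular case shows why that specification preserves sign: every draw lies between twice the mean and zero.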
syntax is ;PR0 = <list of values>, noting that it is PR zero. The list of values is
the full set of parameters, given in the following order: β(means of random
parameters); β(non-random parameters); Δ heterogeneity in mean, by rows,
one for each random β; γ lower triangular Cholesky matrix; and σ = vector of
diagonal elements (sigmas) for the variance matrix. All values must be
provided. If there is no heterogeneity, there are no values for Δ; if there is
no ;CORR, then there are no values for γ. σ must be present if this is an RP
model.
As an aside, this feature is potentially dangerous: there is no way of knowing whether the values
you input are in the right order, or are valid at all. Nlogit has to trust the user.
The parameters saved by ;par are generated during estimation, not after.
They are saved in memory every time the functions are computed. The last
one computed is saved for the analyst to use.
As an aside, if the analyst runs a subsequent model without saving the par output, it will be
overwritten by the par output of the next model run. Thus, if that output is required, it is best
to cut and paste it into a spreadsheet.
Table 15.1 A matrix (BETA_I) with the stored conditional individual-specific mean random parameter estimates for the first 20 observations
1 2 3 4 5 6 7 8 9 10 11
1 −0.06525 −0.32382 −0.07104 0.019373 −0.18 −0.16461 −0.04422 −0.05362 −0.23926 −0.0919 −0.14548
2 −0.04744 −0.33883 −0.07225 0.021192 −0.19861 −0.16635 −0.04447 −0.05366 −0.24224 −0.11017 −0.15154
3 −0.06329 −0.10396 −0.07425 0.00044 −0.18121 −0.1671 −0.04294 −0.05438 −0.23042 −0.09626 −0.1433
4 −0.06473 −0.19812 −0.06952 0.022764 −0.17036 −0.16293 −0.04626 −0.05355 −0.23817 −0.1075 −0.13473
5 −0.05189 −0.28959 −0.07624 −0.01201 −0.18029 −0.16828 −0.04284 −0.05366 −0.23736 −0.10197 −0.13976
6 −0.04524 −0.10672 −0.0743 0.024378 −0.18661 −0.16202 −0.04282 −0.05485 −0.23612 −0.0991 −0.16997
7 −0.04847 −0.15896 −0.07727 −0.00558 −0.21356 −0.16462 −0.04536 −0.05384 −0.2442 −0.0949 −0.14407
8 −0.06662 −0.38811 −0.07153 0.005146 −0.18851 −0.16564 −0.04539 −0.05342 −0.24755 −0.09474 −0.14776
9 −0.06504 −0.40615 −0.07324 −0.01069 −0.17937 −0.16531 −0.04462 −0.05395 −0.24068 −0.1086 −0.14847
10 −0.0397 0.018877 −0.07135 −0.00669 −0.14756 −0.16768 −0.04616 −0.05319 −0.23374 −0.08921 −0.15658
11 −0.0492 −0.21267 −0.06753 −0.0448 −0.19797 −0.16468 −0.04341 −0.05343 −0.24613 −0.11663 −0.15166
12 −0.05327 −0.3638 −0.06743 −0.04518 −0.19022 −0.16632 −0.0442 −0.05361 −0.24344 −0.1107 −0.15172
13 −0.05213 −0.25684 −0.06836 −0.04347 −0.184 −0.16566 −0.04385 −0.05396 −0.23878 −0.11419 −0.14293
14 −0.05764 −0.33634 −0.0675 −0.0453 −0.21104 −0.16553 −0.04357 −0.05365 −0.24491 −0.1076 −0.13961
15 −0.05052 −0.32122 −0.06892 −0.04737 −0.211 −0.16661 −0.04484 −0.05372 −0.2419 −0.10422 −0.15075
16 −0.0545 −0.28329 −0.06769 −0.04617 −0.20192 −0.16576 −0.04483 −0.05348 −0.23865 −0.11591 −0.15911
17 −0.05237 −0.32554 −0.06852 −0.04462 −0.20299 −0.16582 −0.04399 −0.05391 −0.25099 −0.09826 −0.14926
18 −0.04894 −0.29964 −0.06789 −0.04388 −0.20783 −0.16479 −0.04407 −0.05377 −0.24728 −0.11476 −0.14587
19 −0.04699 −0.24266 −0.06695 −0.04402 −0.18382 −0.16641 −0.04499 −0.0533 −0.24253 −0.11608 −0.15123
20 −0.05252 −0.32377 −0.06861 −0.04607 −0.22832 −0.16563 −0.04269 −0.05388 −0.24311 −0.11891 −0.15281
Table 15.2 A matrix (SDBETA_I) with the stored conditional individual-specific standard deviation random parameter estimates for the first 20 observations
1 2 3 4 5 6 7 8 9 10 11
1 0.031199 0.29129 0.026459 0.032456 0.134795 0.011739 0.01228 0.00297 0.040093 0.050788 0.052108
2 0.031523 0.292775 0.022403 0.033743 0.146696 0.013283 0.011045 0.003158 0.041787 0.054324 0.051224
3 0.029507 0.238784 0.023505 0.040933 0.141564 0.011793 0.011718 0.003033 0.047856 0.047964 0.05878
4 0.03775 0.27758 0.024759 0.033894 0.140849 0.01237 0.012009 0.003412 0.038915 0.058443 0.054172
5 0.029791 0.286637 0.023594 0.049085 0.144547 0.012586 0.011659 0.003034 0.047303 0.051284 0.051805
6 0.030241 0.232267 0.021983 0.031668 0.115001 0.012237 0.013177 0.003444 0.036469 0.047533 0.058314
7 0.031804 0.233324 0.025085 0.04306 0.143234 0.01175 0.01205 0.003089 0.041015 0.053519 0.052101
8 0.033574 0.30703 0.025877 0.036702 0.142464 0.012636 0.011948 0.003224 0.045622 0.048812 0.058346
9 0.031489 0.301836 0.023955 0.042724 0.143112 0.012086 0.011937 0.003092 0.041313 0.056293 0.051834
10 0.030159 0.193701 0.020182 0.044783 0.159352 0.010272 0.011399 0.003135 0.045978 0.041521 0.044197
11 0.031925 0.274947 0.024472 0.050083 0.138463 0.01243 0.011975 0.003316 0.04827 0.053544 0.054153
12 0.032504 0.29965 0.024447 0.051149 0.144726 0.012121 0.012543 0.003031 0.041162 0.048821 0.055704
13 0.032653 0.283942 0.025301 0.050404 0.142918 0.011713 0.011456 0.003143 0.044205 0.054323 0.049717
14 0.034164 0.284894 0.024795 0.051315 0.140456 0.012674 0.011776 0.003101 0.044222 0.049731 0.053195
15 0.031874 0.289749 0.024964 0.050456 0.144128 0.012146 0.011795 0.003188 0.040099 0.0547 0.055036
16 0.033546 0.283511 0.024346 0.053076 0.14551 0.011902 0.011687 0.003164 0.045454 0.046459 0.054099
17 0.031721 0.299869 0.024215 0.050438 0.141713 0.01238 0.011844 0.00314 0.043902 0.053115 0.05605
18 0.031277 0.280814 0.025486 0.051678 0.136842 0.012195 0.012504 0.003266 0.040964 0.053192 0.05116
19 0.031531 0.274191 0.024205 0.050843 0.141069 0.012805 0.011439 0.002944 0.045576 0.05212 0.059923
20 0.032663 0.292846 0.023786 0.05288 0.138129 0.011757 0.012222 0.003054 0.042283 0.051859 0.058133
As an aside, the reason for this concern is that the choice distribution can be quite different in
the application sample, and imposing the choice distribution from the estimation sample is a
source of information that is a burden if the known sample choice distribution is so different.
Additional information is only useful if it is portable across data settings. The use of the
population moments associated with the unconditional estimates of parameters seems to be
more appealing when applying a model using another sample of individuals.
Figure 15.5 graphs the conditional mean for each sampled person. In the
figure, each vertical “leg” of the centipede plot shows the conditional confidence
interval for β_invt for that person. The dot is the midpoint of the
interval, which is the point estimate. The centre horizontal bar shows the
mean of the conditional means, which estimates the population mean. This
was reported earlier as –0.06073. The upper and lower horizontal bars show
the overall mean plus and minus twice the estimated population standard
deviation – this was reported earlier as 0.08235. Thus, the unconditional
population range of variation is estimated to be about 0.01375 to −0.175. In
this example, we have used a constrained triangular distribution (with no
heterogeneity in the mean or heteroskedasticity in the variance), and hence we
have fully satisfied the negative sign across the entire distribution.
The WTP for an attribute is the ratio of that attribute’s parameter estimate to
the parameter estimate of the cost parameter. For value of travel time savings
(VTTS), we multiply the resulting WTP measure by 60 if the time attribute
was measured in minutes. This converts the VTTS to a measure of WTP for
time per hour. We discussed in Chapters 11 and 13 how to derive measures of
WTP from the MNL model. If in the ML model, the two parameters used in
deriving measures of WTP are estimated as non-random parameters, the
methodology of calculating WTP remains unchanged. If, however, one or
the other of the parameters is estimated as a random parameter, then the WTP
calculations must take this into account.
VTTS and WTP measures may be constructed using either conditional
parameter estimates or the unconditional parameter estimates (population
moments).
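As a small worked example of the ratio calculation, using the INVT and COST point estimates reported in the model output in this section (−0.06368 and −0.24872):

```python
# Point estimates of the marginal utilities (from the Nlogit output):
beta_invt = -0.06368   # invehicle time (utility per minute)
beta_cost = -0.24872   # cost (utility per dollar)

# WTP is the ratio of the attribute's parameter to the cost parameter,
# giving $/min; multiplying by 60 expresses the VTTS in $/hr.
wtp_per_min = beta_invt / beta_cost
vtts_per_hr = 60 * wtp_per_min
```

Note that this simple point-estimate ratio will generally differ from the sample average of the conditional individual-specific WTP estimates reported later (e.g., in Table 15.3).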
[Figure: three centipede plots of Range against Person – the conditional estimates for β_invt over 200 persons (vertical scale roughly 0.025 to −0.175), and two further panels of 95 percent probability intervals (the first titled “95% Probability Intervals for wtpPT”) over 500 persons.]
Figure 15.5 Estimates of the marginal utility of invehicle time together with confidence intervals
The following Nlogit output is produced for the above model, with the first
20 results for wtp_i and sdwtp_i given in Table 15.3 (copied into Excel,
with extra columns used to convert the $/min. results from Nlogit to $/hr.). The
standard deviation estimates suggest a noticeable amount of heterogeneity
in the WTP estimates within the sample. The overall mean estimate for the
value of invehicle travel time savings for public transport is $21.64/hr., and for
car it is $38.07/hr. The equivalent standard deviation estimates are $4.97/hr.
and $8.16/hr.:
Table 15.3 A matrix with the stored conditional individual-specific WTP estimates for the first 20 observations (noting that an observation is a respondent, and not a
choice set, in the absence of recognizing the number of choice sets using ;pds = <number>)
$/min: MvttsinvtPT MvttsinvtCar $/hr: MvttsinvtPT MvttsinvtCar $/min: SDvttsinvtPT SDvttsinvtCar $/hr: SDvttsinvtPT SDvttsinvtCar
|-> Nlogit
Random Parameters Logit Model
Dependent variable RESP1
Log likelihood function -2465.75251
Restricted log likelihood -3580.47467
Chi squared [ 20](P= .000) 2229.44432
Significance level .00000
McFadden Pseudo R-squared .3113336
Estimation based on N = 1840, K = 20
Inf.Cr.AIC = 4971.5 AIC/N = 2.702
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
No coefficients -3580.4747 .3113 .3098
Constants only can be computed directly
Use NLOGIT ;. . .;RHS=ONE$
At start values -2497.0892 .0125 .0103
Response data are given as ind. choices
Replications for simulated probs. = 100
Used Halton sequences in simulations.
Number of obs.= 1840, skipped 0 obs
----------- +---------------------------------------------------------------------------------------
| Standard Prob. 95% Confidence
RESP1| Coefficient Error z |z|>Z* Interval
----------- +----------------------------------------------------------------------------------------
|Random parameters in utility functions
INVT| -.06368*** .00329 -19.37 .0000 -.07012 -.05723
COST| -.24872*** .01958 -12.70 .0000 -.28710 -.21033
ACWT| -.06976*** .00731 -9.55 .0000 -.08407 -.05544
EGGT| -.01435** .00565 -2.54 .0111 -.02543 -.00327
CRPARK| -.03559*** .01341 -2.65 .0079 -.06187 -.00931
ACCBUSF| -.10601*** .03622 -2.93 .0034 -.17701 -.03501
WAITTB| -.08739*** .02870 -3.04 .0023 -.14365 -.03113
ACCTB| -.07517*** .01089 -6.91 .0000 -.09651 -.05384
CRCOST| -.14957*** .04942 -3.03 .0025 -.24644 -.05271
CRINVT| -.07024*** .01107 -6.35 .0000 -.09193 -.04854
CREGGT| -.08194*** .02318 -3.53 .0004 -.12737 -.03650
|Nonrandom parameters in utility functions
NLRASC| 2.53832*** .46944 5.41 .0000 1.61824 3.45840
PTINC| -.01212*** .00290 -4.18 .0000 -.01781 -.00643
PTGEND| 1.87986*** .26115 7.20 .0000 1.36801 2.39171
NLRINSDE| -1.10737*** .35603 -3.11 .0019 -1.80518 -.40956
TNASC| 1.84015*** .45881 4.01 .0001 .94090 2.73940
NHRINSDE| -1.12297*** .40112 -2.80 .0051 -1.90915 -.33680
NBWASC| 1.14015** .48364 2.36 .0184 .19223 2.08807
BSASC| 1.51964*** .44718 3.40 .0007 .64318 2.39611
BWASC| 1.39054*** .46212 3.01 .0026 .48480 2.29629
|Distns. of RPs. Std.Devs or limits of triangular
TsINVT| .06368*** .00329 19.37 .0000 .05723 .07012
TsCOST| .24872*** .01958 12.70 .0000 .21033 .28710
TsACWT| .06976*** .00731 9.55 .0000 .05544 .08407
The centipede commands to plot the two distributions for the first 500
observations, together with their upper and lower confidence intervals for
95 percent probability intervals, are given as follows (noting that the matrices
of interest are wtp_i and sdwtp_i, not beta_i and sdbeta_i):
SAMPLE ; 1-500 $
create;bwtpPT=0;bwtpcar=0$
create;swtpPT=0;swtpcar=0$
name;rpi=bwtpPT,bwtpcar$
name;rpis=swtpPT,swtpcar$
create;rpi=wtp_i$
create;rpis=sdwtp_i$
CREATE ; lower = bwtppt - 2*swtppt
; upper = bwtppt + 2*swtppt $
create
;vttsptm=60*bwtppt
;vttscrm=60*bwtpcar$
dstats;rhs=bwtppt,bwtpcar,swtpPT,swtpcar,lower,upper,vttsptm,vttscrm$
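The band construction in the CREATE commands (point estimate plus and minus two conditional standard deviations, and the ×60 conversion) can be mirrored with numpy. The input arrays below are synthetic stand-ins for one column each of the wtp_i and sdwtp_i matrices; the values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-ins for conditional individual means and conditional
# standard deviations of wtpPT ($/min); illustrative values only.
bwtppt = rng.normal(0.36, 0.08, size=500)
swtppt = np.abs(rng.normal(0.05, 0.01, size=500))

# Mirror of the CREATE commands: 95% bands and the $/min -> $/hr conversion.
lower = bwtppt - 2 * swtppt
upper = bwtppt + 2 * swtppt
vttsptm = 60 * bwtppt

# Mirror of ;dstats - simple descriptive statistics for each series.
dstats = {name: (s.mean(), s.std(), s.min(), s.max())
          for name, s in [("lower", lower), ("upper", upper),
                          ("vttsptm", vttsptm)]}
```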
As an aside, we have found that a model estimated in WTP space (see below), in contrast
to utility space, often improves on sign preservation, even under unconstrained distributions,
and seems to reduce the long tail common in many unconstrained distributions in
utility space, such as the log-normal. Furthermore, we have also found that allowing
for attribute processing rules (such as attribute non-attendance – see Chapter 21) also
contributes to reducing the incidence of negative estimates of WTP where a positive
estimate is behaviorally plausible.
$38.07/hr.), with the car value almost identical and the public transport
estimate slightly lower, due essentially to sampling error. This supports the
proof of equivalence in Section 8.1 of Chapter 8:
As an aside, as shown below in the command syntax, when deriving estimates for each
respondent on the distribution, it is important to use different names for the random normal
and the triangular draws; otherwise you will obtain identical estimates for the numerator and
denominator.
sample;all$
reject;dremove=1$
reject;ttype#1$ ? work = 1
reject;altij=-999$
sample;1-500$
create
;rna=rnn(0,1)
;V1=rnu(0,1)
;V1d=rnu(0,1)
;if(v1<=0.5)T=sqr(2*V1)-1;(ELSE) T=1-sqr(2*(1-V1))
;if(v1d<=0.5)Td=sqr(2*V1d)-1;(ELSE) Td=1-sqr(2*(1-V1d))
;MUPTt=-0.06368+0.06368*T
;MUPTc=-0.24872+0.24872*Td
;VTTSPT = 60*(MUptt/muptc)$ ?60*((-0.06368+0.06368*T)/(-0.24872
+0.24872*Td))
reject;altij=7$
dstats;rhs=t,muptt,muptc,vttspt$
;if(v1d<=0.5)Td=sqr(2*V1d)-1;(ELSE) Td=1-sqr(2*(1-V1d))
;VTTSCAR = 60*(-0.07024+0.07024*T)/(-0.14957+0.14957*Td)$
reject;altij#7$
dstats;rhs=vttscar$
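The same draw construction can be replicated outside Nlogit. This Python sketch mirrors the inverse-CDF triangular draws above, using the public transport point estimates (−0.06368 and −0.24872) with independent draws for numerator and denominator, as the aside stresses:

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 500

def unit_triangular(v):
    # Inverse CDF of the symmetric triangular on (-1, 1), mirroring
    # Nlogit's if(v<=0.5) T=sqr(2*V)-1 ;(ELSE) T=1-sqr(2*(1-V)).
    return np.where(v <= 0.5, np.sqrt(2 * v) - 1, 1 - np.sqrt(2 * (1 - v)))

# Independent uniform draws for the time and cost parameters.
T = unit_triangular(rng.uniform(size=n))
Td = unit_triangular(rng.uniform(size=n))

# Constrained triangulars centred on the point estimates, spread = |mean|,
# so every draw keeps the negative sign.
mu_pt_time = -0.06368 + 0.06368 * T
mu_pt_cost = -0.24872 + 0.24872 * Td
vtts_pt = 60 * mu_pt_time / mu_pt_cost   # $/hr, positive for every draw
```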
where the W_{q,m} are normally distributed effects with zero mean, m = 1, ..., M ≤ J,
and c_{j,m} = 1 if component m appears in utility function j.4 This specification can
produce a simple “random effects” model if all J utilities share a single error
component:
3 The Ben-Akiva et al. (2002) paper was a reaction to the suggestion in Brownstone and Train, pointing out that identification can be difficult to assess in mixed models with these kinds of error components for alternatives and nests.
4 Issues of specification and identification are discussed in Ben-Akiva et al. (2002).
Second, we combine this specification with the full random parameters model
laid out earlier.5 Collecting all results, the full ML model is given by Equations
(15.21–15.26):
5 Ben-Akiva et al. (2002) extend the basic model somewhat by imposing a factor analytic structure on the set of kernels. This achieves a small amount of generality in allowing the variables that appear in the utility functions to be correlated. With respect to the behavioral model, little is actually obtained by this, since the assumed independent kernels above may be mixed in any fashion in the utility functions.
U_{q,j,t} = \beta_q' x_{q,j,t} + \varepsilon_{q,j,t} + \sum_{m=1}^{M} c_{j,m} W_{q,m}.  (15.21)
\beta_q = \beta + \Delta z_q + \Gamma_q v_q.  (15.22)
v_q = R \tilde{v}_q.  (15.23)
The unconditional choice probability is the expected value of this logit probability
over all the possible values of β_q and W_q – that is, integrated over these
values, weighted by the joint density of β_q and W_q. We assume that v_q and W_q
are independent, so the joint density is the product of the marginals. Thus, the
unconditional choice probability is:

P_{q,j,t} = \int_{W_q} \int_{\beta_q} L_{q,j,t}(\beta_q)\, f(v_q)\, f(W_q)\, dv_q\, dW_q.  (15.28)
P_q(X_q, \Omega, z_q, h_q) = \int_{W_q} \int_{\beta_q} \prod_{t=1}^{T} L_{q,j,t}(\beta_q \mid X_{q,t}, \Omega, z_q, h_q, v_q, W_q)\, f(v_q \mid \Omega, z_q, h_q)\, f(W_q \mid \Omega, h_q)\, dv_q\, dW_q.  (15.29)
\log L(\Omega) = \sum_{q=1}^{Q} \log \int_{W_q} \int_{\beta_q} \prod_{t=1}^{T} L_{q,j,t}(\beta_q \mid X_{q,t}, \Omega, z_q, h_q, v_q, W_q) \times f(v_q \mid \Omega, z_q, h_q)\, f(W_q \mid \Omega, h_q)\, dv_q\, dW_q.  (15.31)
\log L_S(\Omega) = \sum_{q=1}^{Q} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} L_{q,j,t}(\beta_{q,r} \mid X_{q,t}, \Omega, z_q, h_q, v_{q,r}, W_{q,r})
= \sum_{q=1}^{Q} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \frac{\exp[(\beta + \Delta z_q + \Gamma_q v_{q,r})' x_{q,j,t} + \sum_{m=1}^{M} c_{j,m} W_{q,m,r}]}{\sum_{i=1}^{J} \exp[(\beta + \Delta z_q + \Gamma_q v_{q,r})' x_{q,i,t} + \sum_{m=1}^{M} c_{i,m} W_{q,m,r}]}.  (15.32)
6 In our application, we use Halton sequences rather than random draws to speed up and smooth the simulations. See Bhat (2001), Train (2003), or Greene (2003) for a discussion.
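The structure of the simulated probability in Equation (15.32) can be made concrete with a compact sketch for a single triangular random coefficient, with draws taken from a base-2 Halton sequence; all names and attribute values below are illustrative assumptions, not part of the chapter's application:

```python
import numpy as np

def halton(n, base=2):
    """First n radical-inverse (Halton) draws in (0, 1) for `base`."""
    seq = np.empty(n)
    for i in range(1, n + 1):
        f, h, k = 1.0, 0.0, i
        while k > 0:
            f /= base
            h += f * (k % base)
            k //= base
        seq[i - 1] = h
    return seq

def simulated_probability(x, chosen, beta_mean, beta_spread, n_draws=100):
    """Simulated probability of `chosen` for one choice set with a single
    triangular random coefficient: average the logit probability over
    R coefficient draws, as in the simulated log-likelihood."""
    v = halton(n_draws)
    t = np.where(v <= 0.5, np.sqrt(2 * v) - 1, 1 - np.sqrt(2 * (1 - v)))
    betas = beta_mean + beta_spread * t          # (R,) coefficient draws
    util = np.outer(betas, x)                    # (R, J) utilities
    prob = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)
    return prob[:, chosen].mean()                # average over the R draws

# Three alternatives differing only in travel time; illustrative values.
p = simulated_probability(np.array([10.0, 20.0, 30.0]), chosen=0,
                          beta_mean=-0.06, beta_spread=0.06)
```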
model may be layered on top of the random parameters (mixed) logit model
by adding the ECM specification, with ; ECM = the specification of the error
components.
The full set of options and features for the ML model and the random
parameters model are used in this setting as well. That includes fitted prob-
abilities, inclusive values, all display options described, and the simulator
described in Chapter 13. Do note, however, that although this model is closely
related to the RP model, there is but one parameter vector and, hence, ; Par
has no effect here. The specification ; SDE = list of symbols or values can be
used in the same fashion as ; Rst = list to constrain the standard deviations
of the error components to equal each other or fixed values. For example, with
four components, the specification ; SDE = 1,1,ss,ss forces the first two to
equal one and the third and fourth to equal each other. Two other specifica-
tions are available. ; SDE = a single value forces all error components to be
equal to that value. Finally, in any specification, if the value is enclosed in
parentheses, then the value is merely used to provide the starting value for
the estimator; it does not impose any constraints on the final estimates.
To allow for heteroskedasticity in the variance of the error components, Var[E_{i,m}]
= exp(γ_m′h_i), we include the syntax ;hfe=<list of variables>. An example of the
error component commands that will be used most often is:
;ecm= (NLRail,NHRail,Train),(NBway,Bus,Bway),(Car)
; Hfe = pinc,gender
Suppose we wish to specify that only pinc appears in the first function, only
gender in the second, and both in the third. The ; ECM specification would be
modified to:
;ecm= (NLRail,NHRail,Train!10),(NBway,Bus,Bway!01),(Car!11)
An exclamation point inside the parentheses after the last name signals that a
specification of the heteroskedastic function is to follow. The succeeding
specification is a set of zeros and ones, where a one indicates that the variable
appears in the variance and a zero indicates that it does not. The number of
zeros and ones provided is exactly the number of variables that appear in the
Hfe list (already defined earlier).
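The “!” mask convention can be illustrated with a small hypothetical parser; parse_ecm_spec is not part of Nlogit, merely a sketch of how the zeros and ones map onto the ;Hfe list (the default of applying all Hfe variables when no mask is given is an assumption for the example):

```python
def parse_ecm_spec(groups, hfe_vars):
    """Illustrative parser for the ;ECM ... !pattern convention.
    groups: strings such as "NLRail,NHRail,Train!10"; the digits after '!'
    flag, in order, which ;Hfe variables enter that component's variance."""
    parsed = []
    for g in groups:
        alts, _, mask = g.partition("!")
        names = [a.strip() for a in alts.split(",")]
        if mask:
            active = [v for v, bit in zip(hfe_vars, mask) if bit == "1"]
        else:
            active = list(hfe_vars)  # assumption: no mask means all apply
        parsed.append((names, active))
    return parsed

spec = parse_ecm_spec(
    ["NLRail,NHRail,Train!10", "NBway,Bus,Bway!01", "Car!11"],
    hfe_vars=["pinc", "gender"])
```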
As an aside, it is permissible to allow an alternative to appear in more than one error
component, as a nested structure. For example:
;ecm= (NLRail,NHRail,Train!10),(NBway,Bus,Bway,NLRail!01),
(Bus,Train!11),(Car!11)
7 The constrained triangular has only one parameter, which serves as both its mean and its spread.
8 The mean weighted average elasticities were also statistically equivalent.
Table 15.4 Summary of empirical results9: commuter trips
Note: All public transport = (new heavy rail, new light rail, new busway, bus, train, busway); time is in min.; fares and cost are in dollars ($2003). T-values in brackets;
2230 observations, 200 Halton draws.
New light rail constant New light rail 2.411 (5.0) 3.313 (6.1) 2.978 (5.7) 4.442 (4.68) 5.011 (5.3)
New busway constant New busway 1.019 (2.1) 1.933 (3.5) 1.561 (2.8) 2.939 (3.1) 3.487 (3.7)
Existing bus constant Bus 1.393 (3.0) 2.273 (4.4) 1.852 (3.6) 3.255 (3.5) 3.808 (4.1)
Train constant Existing and new train 1.709 (3.6) 2.609 (4.9) 2.246 (4.4) 3.657 (3.9) 4.213 (4.5)
Existing busway Busway 1.266 (2.7) 2.183 (4.1) 1.801 (3.4) 3.178 (3.4) 3.714 (4.0)
constant
Random parameters –constrained triangular:
Main mode fares All public transport −0.2505 (−12.1) −0.3536 (−10.1) −0.3512 (−10.4) −0.3723 (−9.3) −0.3853 (−9.4)
Car mode running and Car −0.1653 (−3.3) −0.1764 (−3.3) −0.1876 (−3.4) −0.2152 (−2.8) −0.2182 (−2.9)
toll cost
Car parking cost Car −0.0340 (−2.7) −0.0377 (−2.7) −0.0443 (−3.0) −0.0571 (−2.7) −0.0558 (−2.8)
Main mode invehicle All public transport −0.0640 (−19.3) −0.0744 (−14.4) −0.0713 (−18.5) −0.0773 (−15.3) −0.0785 (−15.1)
time
Access and wait time All train and light rail −0.0699 (−9.5) −0.0716 (−9.5) −0.0762 (−9.8) −0.0811 (−9.1) −0.0828 (−9.2)
Access time All bus and busway −0.0756 (−6.9) −0.0808 (−7.1) −0.0839 (−6.5) −0.0929 (−6.4) −0.0942 (−6.3)
Wait time All bus and busway −0.0882 (−3.1) −0.0907 (−3.06) −0.1026 (−3.2) −0.1034 (−3.0) −0.1048 (−3.0)
Main mode invehicle Car −0.0728 (−6.2) −0.0791 (−6.2) −0.0859 (−7.0) −0.0796 (−5.2) −0.0732 (−4.8)
time
9 Multiple mixture runs must be conducted and a measure of variation in parameter coefficients must be reported (e.g., the standard deviation of the parameter estimates). Theoretically, we cannot make statistical inference from a distribution using a single point. Unfortunately, this does not address a relevant issue in our estimations. We used Halton draws to perform our integrations, so there is no simulation variance. If we repeated the estimation, we would get exactly the same estimates; in fact, there is no simulation variance because the estimates are not based on simulations. We have used the Halton technique to evaluate certain integrals. There will be an approximation error, of course. Our only control over that is to use many Halton draws, which we have done. The interpretation of the estimates as a sample of one is not correct, however. There are many other settings in which researchers must resort to approximations to evaluate integrals, such as random effects probit models, which use Hermite quadrature to approximate integrals, and even the most mundane univariate probit model, which uses a ratio of polynomials to approximate the standard normal cdf. The MLEs obtained in these settings are not samples of one; they are maximizers of an approximation to the LL function that cannot be evaluated exactly.
Egress travel time All public transport −0.0145 (−2.6) −0.0142 (−2.5) −0.0151 (−2.7) −0.0169 (−2.8) −0.0181 (−3.0)
Egress travel time Car −0.0814 (−3.5) −0.0876 (−3.5) −0.0892 (−2.8) −0.1193 (−2.5) −0.1084 (−2.5)
Access bus mode fare Where bus is access mode −0.1067 (−2.9) −0.1916 (−3.65) −0.1981 (−3.8) −0.2118 (−3.8) −0.2167 (−3.8)
Non-random parameters:
Inside study area New heavy rail −1.119 (−2.8) −1.207 (−2.9) −1.300 (−3.3) −1.497 (−3.4) −1.419 (−3.2)
Inside study area New light rail −1.104 (−3.1) −1.164 (−3.2) −1.249 (−3.6) −1.443 (−3.7) −1.383 (−3.5)
Personal income All public transport −0.0122 (−4.1) −0.0278 (−6.2) −0.0211 (−4.9) −0.0266 (−4.2) −0.0326 (−5.0)
Gender (male = 1) All public transport 1.905 (7.1) 1.969 (6.9) 2.662 (7.6) 3.437 (6.2) 3.493 (6.7)
Heterogeneity around mean:
Invehicle time * All public transport 0.000124 (2.6)
personal income
Main mode fares * All public transport 0.00135 (3.8) 0.00078 (1.90) 0.00079 (1.8) 0.00090 (2.0)
personal income
Access bus fare * All public transport except existing 0.00143 (2.4) 0.00146 (2.3) 0.00160 (2.3) 0.00164 (2.4)
personal income bus
Heterogeneity around standard deviation:
Invehicle time *gender All public transport 0.4007 (4.3) 0.4195 (4.5) 0.4270 (4.6)
Main mode fares * All public transport 0.6384 (4.4) 0.7049 (4.4) 0.6727 (4.2)
gender
Error components for alternatives and nests of alternatives parameters:
Standard deviation New light rail, new heavy rail, new 0.8659 (2.2) 1.010 (2.9)
busway, existing busway
Standard deviation Existing bus and heavy rail 0.2068 (0.32) 0.0814 (0.13)
Standard deviation Car 3.021 (4.0) 11.158 (2.3)
Heterogeneity around standard deviation of error components effect:
Age of commuter Car −0.0366 (−2.3)
LL at convergence −2464.3 −2451.7 −2442.1 −2435.9 −2428.7
Pseudo-R2 0.3101 0.3135 0.3161 0.3176 0.3195
Table 15.5 Mean and standard deviation of random parameter estimates for entire representation of each attribute from relatively simple to more complex models
Note: Except for ML1 which has a single parameter, the other models are complex representations of multiple parameters from Table 15.4.
Mean invtpt costpt acwt eggt crpark accbusf waittb acctb crcost crinvt creggt
Ave ML1 −0.0640 −0.2505 −0.0699 −0.0145 −0.0340 −0.1067 −0.0882 −0.0756 −0.1653 −0.0727 −0.0814
Std Dev ML1 0.0245 0.1003 0.0278 0.0059 0.0139 0.0435 0.0359 0.0306 0.0670 0.0283 0.0329
Ave ML2 −0.0680 −0.2744 −0.0733 −0.0148 −0.0407 −0.1125 −0.0916 −0.0831 −0.1821 −0.0869 −0.0907
Std Dev ML2 0.0287 0.1385 0.0291 0.0060 0.0166 0.1014 0.0373 0.0336 0.0738 0.0367 0.0366
Ave ML3 −0.0788 −0.3581 −0.0819 −0.0258 −0.0558 −0.1241 −0.1071 −0.0859 −0.2789 −0.1146 −0.1297
Std Dev ML3 0.0391 0.2771 0.0269 0.0465 0.0075 0.1112 0.0664 0.0002 0.1868 0.0556 0.0034
Ave ML4 −0.0923 −0.4232 −0.0942 −0.0286 −0.0763 −0.1366 −0.1231 −0.1006 −0.3866 −0.0864 −0.1735
Std Dev ML4 0.0476 0.3374 0.0299 0.0489 0.0373 0.1157 0.0979 0.0007 0.3003 0.0016 0.0161
Ave ML5 −0.0923 −0.4152 −0.0941 −0.0348 −0.1151 −0.1340 −0.1137 −0.1001 −0.2678 −0.1140 −0.1524
Std Dev ML5 0.0473 0.3236 0.0317 0.0620 0.1165 0.1051 0.0837 0.0006 0.1445 0.0602 0.0061
Notes: invtpt = invehicle time for public transport (PT); costpt = public transport fares; acwt = access and wait time for light and heavy rail; eggt = egress
time for PT; crpark = car parking cost; accbusf = access bus mode fare; waittb = wait time for bus and busway; acctb = access time for bus and busway;
crcost = car running cost; crinvt = car invehicle time; creggt = egress time from car.
Elasticity of invehicle time for With respect to ML1 ML2 ML3 ML4 ML5
New light rail New light rail −1.800 −1.778 −1.763 −1.781 −2.182
New heavy rail New heavy rail −1.759 −1.764 −1.764 −1.720 −1.909
New busway New busway −2.323 −2.311 −2.282 −1.366 −2.092
Existing bus Existing bus −1.829 −1.798 −1.771 −2.316 −3.079
Existing busway Existing busway −1.673 −1.676 −1.686 −2.379 −3.010
Existing heavy rail Existing heavy rail −1.486 −1.495 −1.500 −2.000 −2.639
Car Car −1.204 −1.214 −1.202 −1.129 −1.036
12 Model fit on its own is not the best indicator of the advantages of a more complex structure. Indeed, the improvements in fit may be quite small, but the findings in terms of elasticities can be quite different.
13 Evidence for other attributes is similar and available on request.
14 Elasticities are strictly meaningful, in a behavioral sense, only after a model has been calibrated to reproduce the known population shares. SC models whose ASCs have not been calibrated after estimation to reproduce population (in contrast to sample) shares relate to sample shares only.
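The calibration step described in footnote 14 can be sketched numerically. The following is a minimal illustration, with made-up utilities and target shares (not the book's data), of the standard adjustment ASC_j ← ASC_j + ln(S_j / Ŝ_j) that forces a simple MNL to reproduce known population shares:

```python
import numpy as np

# Sketch (assumed data, not the book's): iteratively adjust ASCs so a simple
# MNL reproduces known population shares, using the standard correction
# ASC_j <- ASC_j + ln(target_share_j / predicted_share_j).
V_fixed = np.array([[0.2, -0.1, 0.0],    # systematic utility net of ASCs,
                    [0.5,  0.3, 0.0],    # one row per sampled individual
                    [-0.4, 0.1, 0.0]])
target = np.array([0.5, 0.3, 0.2])       # known population shares
asc = np.zeros(3)

for _ in range(200):
    U = V_fixed + asc
    P = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)
    pred = P.mean(axis=0)                # predicted aggregate shares
    asc += np.log(target / pred)         # calibration step

print(np.round(P.mean(axis=0), 3))       # matches the target shares
```

One ASC is formally redundant (a common shift cancels in the logit), but the iteration is unaffected and converges to the target shares.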
heterogeneity (now aligned with the ML model). Papers by Breffle and Morey
(2000), Hess and Rose (2012), and Hensher and Greene (2010) are other
contributions.
The generalized ML model employed here builds on the specifications of
the mixed logit developed in Train (2003), Hensher and Greene (2003), and
Greene (2007), among others, and the “generalized multinomial logit model”
proposed in Fiebig et al. (2010). Full details are given in Chapter 4, but we
briefly summarize the main elements here as a prelude to the estimation of
models.
Briefly, the mixed multinomial logit model (MMNL) is given in
Equation (15.33):
\[
\text{Prob}(\text{choice}_{it} = j \mid x_{it,j}, z_i, v_i) = \frac{\exp(V_{it,j})}{\sum_{j=1}^{J_{it}} \exp(V_{it,j})}, \qquad (15.33)
\]
where
15 One can, however, allow for deterministic taste heterogeneity via interaction terms with respondent-specific characteristics.
βi = σiβ. (15.36b)
This generalized mixed model also provides a straightforward method of reparameterizing the model to estimate the taste parameters in WTP space, which has become a behaviorally appealing alternative way of directly obtaining an estimate of WTP (see Train and Weeks 2005; Fosgerau 2007; Scarpa, Thiene, and Hensher 2008; Scarpa, Thiene, and Train 2008; Sonnier et al. 2007; Hensher and Greene 2011). If γ = 0, Δ = 0 and the element of β
In the simple MNL case (σi = 1, Γ = 0), this is a one-to-one transformation of the parameters of the original model. Where the parameters are random, however, the transformation is no longer that simple. We, as well as Train and Weeks (2005), have found, in application, that this form of the transformed
model produces generally much more reasonable estimates of WTP for
individuals in the sample than the model in the original form, in which
WTP is computed using ratios of parameters (Hensher and Greene 2011).16
The full model, in the unrestricted form or in any of the modifications, is
estimated by maximum simulated likelihood (see Chapter 5). Fiebig et al.
(2010) note two minor complications in estimation. First, the parameter σ̄ in σi is not separately identified from the other parameters of the model. We will assume that the variance heterogeneity is normally distributed. Neglecting the observed heterogeneity (i.e., δ′hi) for the moment, it will follow from the general result for the expected value of a log-normal variable that E[σi] = exp(σ̄ + τ²/2). That is, σi = exp(σ̄)exp(τwi), where wi ~ N(0,1), so E[σi] = exp(σ̄)E[exp(τwi)] = exp(σ̄)exp(E[τwi] + ½Var[τwi]) = exp(σ̄ + τ²/2). It follows that σ̄ is not identified separately from τ, which appears nowhere else in the model. Some normalization is required. A natural normalization would be to set σ̄ = 0. However, it is more convenient to normalize σi so that E[σi] = 1, by setting σ̄ = −τ²/2 instead of 0.
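The log-normal mean identity and the effect of this normalization can be checked by simulation; a minimal sketch with an illustrative value of τ (not an estimate from the book):

```python
import numpy as np

# Sketch: verify E[exp(sigma_bar + tau*w)] = exp(sigma_bar + tau^2/2), and
# that the normalization sigma_bar = -tau^2/2 gives E[sigma_i] = 1.
rng = np.random.default_rng(42)
tau = 0.5                                 # illustrative value
sigma_bar = -tau**2 / 2                   # normalization in place of 0

w = rng.standard_normal(1_000_000)        # w_i ~ N(0,1)
sigma_i = np.exp(sigma_bar + tau * w)     # heterogeneous scale draws

print(np.mean(sigma_i))                   # simulated E[sigma_i], close to 1
print(np.exp(sigma_bar + tau**2 / 2))     # analytic E[sigma_i] = exactly 1
```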
A second complication concerns the variation in σi during the simulations. The log-normal distribution implied by exp(−τ²/2 + τwi) can produce extremely large draws and lead to overflows and instability of the estimator. To accommodate this concern, one might truncate the standard normal distribution of wi at −1.96 and +1.96. In contrast to Fiebig et al., who propose an acceptance/rejection method for the random draws, Nlogit uses a one-draw method, wir = Φ⁻¹[0.025 + 0.95Uir], where Φ⁻¹(t) is the inverse of the standard normal cdf and Uir
16 The paper by Hensher and Greene (2011), like Train and Weeks (2005), supports the WTP space framework for estimating WTP distributions, given that the evidence on the range is behaviorally more plausible, despite the overall goodness of fit being inferior to the utility space specifications.
is a random draw from the standard uniform population.17 This will maintain
the smoothness of the estimator in the random draws. The acceptance/rejection
approach requires, on average, 1/.95 draws to obtain an acceptable draw, while
the inverse probability approach always requires exactly 1.
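The one-draw device wir = Φ⁻¹[0.025 + 0.95Uir] can be sketched with the standard library's inverse normal cdf (a minimal illustration, not Nlogit's internals):

```python
import numpy as np
from statistics import NormalDist

# Sketch of the one-draw truncation device: map uniform draws through
# w = Phi^{-1}(0.025 + 0.95*U), so every draw lies inside [-1.96, +1.96]
# and exp(tau*w) cannot hit the extreme tail values that cause overflows.
rng = np.random.default_rng(7)
inv_cdf = NormalDist().inv_cdf            # standard normal Phi^{-1}

U = rng.uniform(size=1000)                # U_ir ~ Uniform(0,1)
w = np.array([inv_cdf(0.025 + 0.95 * u) for u in U])

print(w.min(), w.max())                   # both inside (-1.96, +1.96)
```

Because the map is smooth in U, the simulated likelihood remains smooth in the draws, unlike acceptance/rejection sampling.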
Finally, in order to impose the limits on γ (Equation (15.35)), γ is reparameterized in terms of α, where γ = exp(α)/[1 + exp(α)] and α is unrestricted.
Likewise, to ensure τ ≥ 0, the model is fit in terms of λ, where τ = exp(λ) and λ
is unrestricted. Restricted versions in which it is desired to restrict γ = 1 or 0
and/or τ = 0 are imposed directly during the estimation, rather than using
extreme values of the underlying parameters, as in previous studies. Thus, in
estimation, the restriction γ = 0 is imposed directly, rather than using, for
example, α = −10.0 or some other large value.
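These transforms are easy to verify; a small sketch (illustrative values only) of the mappings that keep γ in (0, 1) and τ positive while α and λ range freely:

```python
import numpy as np

# Sketch of the reparameterizations: gamma = exp(alpha)/(1 + exp(alpha))
# stays in (0,1), and tau = exp(lam) stays positive, for any real input.
def gamma_of(alpha):
    return np.exp(alpha) / (1.0 + np.exp(alpha))   # logistic transform

def tau_of(lam):
    return np.exp(lam)                             # positivity transform

for alpha in (-5.0, 0.0, 5.0):
    print(gamma_of(alpha))     # always strictly between 0 and 1
print(tau_of(-1.0), tau_of(1.0))   # always strictly positive
```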
Combining all terms, the simulated LL function for the sample of data is
shown in Equation (15.38):
\[
\log L = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} \prod_{j=1}^{J_{it}} P(j, X_{it}, \beta_{ir})^{d_{it,j}}, \qquad (15.38)
\]
where
βir = σir[β + Δzi ] + [γ + σir(1 – γ)] Γvir;
σir = exp[−τ²/2 + δ′hi + τwir];
vir and wir = the R simulated draws on vi and wi;
ditj = 1 if individual i makes choice j in choice situation t and 0 otherwise,
and
17 The default in Nlogit is to use −τ²/2 to center the draws on σi. However, in Nlogit version 5, we have removed the truncation device (forcing e(i) to lie in [−2,+2]) and instead use the normal draws without truncation. To use the new device, add ;CENTER to the GMXLogit command. To get a good comparison, we recommend using Halton draws for the simulation. Then, the two approaches use the same set of draws.
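As a toy sketch of the simulated log-likelihood in Equation (15.38), the following averages the sequence probability over R random draws of βir and sums the logs over individuals. The dataset, dimensions, and the scaled-MNL form of βir are illustrative assumptions, not the book's specification:

```python
import numpy as np

# Toy sketch of Equation (15.38) on synthetic panel data: for each person,
# average Prod_t Prod_j P(j)^d_itj over R draws of beta_ir, then sum logs.
rng = np.random.default_rng(0)
N, T, J, K, R = 50, 4, 3, 2, 100

X = rng.normal(size=(N, T, J, K))          # attributes (made up)
choice = rng.integers(0, J, size=(N, T))   # observed choices d_itj
beta, tau = np.array([-1.0, 0.5]), 0.3     # illustrative parameters

logL = 0.0
for i in range(N):
    seq_prob = np.zeros(R)
    for r in range(R):
        w = rng.standard_normal()
        b_ir = np.exp(-tau**2 / 2 + tau * w) * beta   # scaled-MNL style draw
        V = X[i] @ b_ir                                # utilities, T x J
        P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)
        seq_prob[r] = np.prod(P[np.arange(T), choice[i]])
    logL += np.log(seq_prob.mean())        # log of simulated sequence prob.

print(logL)    # a sum of logs of probabilities, hence negative
```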
The resulting parameter, λi, becomes the normalizing constant in the WTP
space representation. Thus, Equation (15.41) is the WTP space form of the
model in Equation (15.40).
To show the results associated with estimating various versions of the
generalized MMNL in utility and WTP space, we continue to use the same
data set as above. We model the parameters in preference space as λi = (λp +
σpwi) and the K elements of βi as βik = (βk + σkvik), where the K + 1 random
parameters are freely correlated. In the preference space representation, as
suggested, e.g., in Thiene and Scarpa (2009), we have constrained λi to have
one sign by writing λi = λpexp(λ0 + τwi). The sign of the full expression is
not imposed a priori, but the estimate of λ is negative as expected. As before,
(wi,vi) are K + 1 freely correlated random variables. Since the scale of λi is
provided by λp, a separate λ0 is not estimable. Note that we may write λi as
exp(logλp + λ0 + τwi), so that different combinations of λp and λ0 produce
the same λi. To remove the indeterminacy, we follow Fiebig et al.’s suggestion,
and (with a standard normality assumption for wi) set λ0 = −τ²/2, so that λi = λpexp(−τ²/2 + τwi) and, consequently, E[λi] = λp.
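A minimal numerical check of this construction, with illustrative values of λp and τ (not the book's estimates):

```python
import numpy as np

# Sketch of the lambda_i construction that removes the indeterminacy:
# with lambda_0 = -tau^2/2, lambda_i = lambda_p * exp(-tau^2/2 + tau*w_i)
# has mean lambda_p and a single sign (illustrative values assumed).
rng = np.random.default_rng(3)
lam_p, tau = -0.30, 0.49

w = rng.standard_normal(500_000)
lam_i = lam_p * np.exp(-tau**2 / 2 + tau * w)

print(np.mean(lam_i))        # close to lam_p = -0.30
print(np.all(lam_i < 0))     # one sign imposed, as in the text
```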
Equations (15.40) and (15.41) are both special cases of the Greene and
Hensher (2011) implementation of Fiebig et al.’s (2010) model. The model in
WTP space in Equation (15.41) is obtained by setting γ = 0, the row in Γ
corresponding to λp to zero, the coefficient λp on pijt in β equal to 1, and
relaxing the restriction λ0 = −τ²/2. Thus, the model in WTP space is estimated
by using a form of the generalized mixed logit model.
The models of interest are summarized in Table 15.8 (p. 691). Model 1 (M1)
is the base random parameter model in preference space. Model 2 (M2) is the
sample;all$
reject;dremove=1$
reject;ttype#1$
reject;altij=-999$
nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;pwt
;rpl
;pds=16;halton;pts=500
;fcn=invt(t),waitt(t), acct(t),eggt(t),cost(t)
;corr;par
;model:
U(NLRail)= NLRAsc+cost*tcost+invt*InvTime+waitt*waitt2+accT*acctim
+accbusf*accbusf
+ ptinc*pinc + ptgend*gender + NLRinsde*inside /
U(NHRail)= TNAsc+cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+accbusf*accbusf+ ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc +cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egress
+accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Bus)= BSAsc+cost*frunCost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+ ptinc*pinc + ptgend*gender/
18 The parameter on cost in Model 2 is implicitly −0.2956*exp(−0.4896² + 0.4896*w(i)). This is also equal to −exp(log(0.2956) − 0.4896² + 0.4896*w(i)) = −exp(−1.4585 + 0.4896*w(i)).
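The rearrangement in footnote 18 can be verified directly (a quick numerical check of the stated constants):

```python
import numpy as np

# Check the footnote's algebra: -0.2956*exp(-0.4896^2 + 0.4896*w) equals
# -exp(log(0.2956) - 0.4896^2 + 0.4896*w) = -exp(-1.4585 + 0.4896*w).
b, tau = 0.2956, 0.4896
print(np.log(b) - tau**2)          # about -1.4585, as stated

w = 0.7                            # any draw gives the same equality
lhs = -b * np.exp(-tau**2 + tau * w)
rhs = -np.exp(np.log(b) - tau**2 + tau * w)
print(np.isclose(lhs, rhs))        # the two forms agree
```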
U(Bway)= BWAsc+cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+ accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Train)= TNAsc+cost*tcost+invt*InvTime+waitT*WaitT+accT*acctim +
eggT*egresst
+ accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= cost*costs+invt*InvTime+CRpark*parkcost+CReggT*egresst $
Normal exit: 53 iterations. Status=0. F= 2043.845
+---------------------------------------------------------------+
| Random Parameters Logit Model |
| Dependent variable RESP1 |
| Log likelihood function -2043.845 |
| Restricted log likelihood -3580.475 |
| Chi squared [ 32 d.f.] 3073.26014 |
| Significance level .0000000 |
| McFadden Pseudo R-squared .4291694 |
| Estimation based on N = 1840, K = 32 |
| AIC = 2.2564 Bayes IC = 2.3523 |
| AICf.s. = 2.2570 HQIC = 2.2917 |
| Model estimated: Jun 14, 2009, 11:01:27 |
| Constants only. Must be computed directly. |
| Use NLOGIT ;. . .; RHS=ONE $ |
| At start values -2480.8579 .17615 ******* |
| Response data are given as ind. choice. |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Notes No coefficients=> P(i,j)=1/J(i). |
| Constants only => P(i,j) uses ASCs |
| only. N(j)/N if fixed choice set. |
| N(j) = total sample frequency for j |
| N = total sample frequency. |
| These 2 models are simple MNL models. |
| R-sqrd = 1 - LogL(model)/logL(other) |
| RsqAdj=1-[nJ/(nJ-nparm)]*(1-R-sqrd) |
| nJ = sum over i, choice set sizes |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Random Parameters Logit Model |
| Replications for simulated probs. = 500 |
| Halton sequences used for simulations |
| ------------------------------------------------------------ |
| RPL model with panel has 115 groups. |
| Fixed number of obsrvs./group= 16 |
| Random parameters model was specified |
| ------------------------------------------------------------ |
| RPL model has correlated parameters |
| Number of obs.= 1840, skipped 0 bad obs. |
+---------------------------------------------------------------+
+-----------+------------------+----------------------+------------+------------+
|Variable| Coefficient | Standard Error |b/St.Er.| P[|Z|>z] |
+-----------+------------------+----------------------+------------+------------+
+-----------+Random parameters in utility functions |
|INVT | -.07728*** .00550 -14.059 .0000 |
|WAITT | -.03579 .02230 -1.605 .1085 |
|ACCT | -.10950*** .01344 -8.145 .0000 |
|EGGT | -.06294*** .01714 -3.673 .0002 |
|COST | -.34306*** .03570 -9.609 .0000 |
+-----------+Nonrandom parameters in utility functions |
|NLRASC | 1.14253*** .42031 2.718 .0066 |
|ACCBUSF | -.06420 .04065 -1.579 .1143 |
|PTINC | -.00812** .00359 -2.258 .0239 |
|PTGEND | 1.50531*** .28253 5.328 .0000 |
|NLRINSDE| -1.45966** .60315 -2.420 .0155 |
|TNASC | 1.31882*** .33094 3.985 .0001 |
|NHRINSDE| -3.08258*** .82333 -3.744 .0002 |
|NBWASC | .66456* .35718 1.861 .0628 |
|BSASC | .78874** .31056 2.540 .0111 |
|BWASC | .77814** .32520 2.393 .0167 |
|CRPARK | -.04282*** .01191 -3.595 .0003 |
|CREGGT | -.09378*** .02168 -4.326 .0000 |
+-----------+Diagonal values in Cholesky matrix, L. |
|TsINVT | .11099*** .00890 12.471 .0000 |
|TsWAITT | .26302*** .03238 8.123 .0000 |
|TsACCT | .20772*** .03179 6.535 .0000 |
|TsEGGT | .27187*** .03497 7.774 .0000 |
|TsCOST | .51315*** .07388 6.945 .0000 |
+-----------+Below diagonal values in L matrix. V = L*Lt |
|WAIT:INV| -.21696*** .03767 -5.759 .0000 |
|ACCT:INV| -.04982 .03142 -1.585 .1129 |
|ACCT:WAI| -.09559*** .02858 -3.345 .0008 |
|EGGT:INV| -.12696*** .03412 -3.721 .0002 |
|EGGT:WAI| -.16096*** .02775 -5.800 .0000 |
|EGGT:ACC| -.06111** .02515 -2.430 .0151 |
|COST:INV| .44032*** .09948 4.426 .0000 |
|COST:WAI| .23525*** .06574 3.579 .0003 |
|COST:ACC| -.14034* .07950 -1.765 .0775 |
|COST:EGG| .23593*** .08671 2.721 .0065 |
+-----------+Standard deviations of parameter distributions |
|sdINVT | .11099*** .00890 12.471 .0000 |
|sdWAITT | .34095*** .04140 8.236 .0000 |
|sdACCT | .23403*** .03314 7.061 .0000 |
|sdEGGT | .34594*** .02574 13.440 .0000 |
|sdCOST | .76675*** .07299 10.505 .0000 |
(S-MNL specification)
Note: in WTP space, the sign of the cost parameter is no longer an issue.
GMXlogit;userp
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;pwt
;pds=16;halton;pts=500
;fcn=invt(t),waitt(t), acct(t),eggt(t),cost(*c)
;corr;par
;gamma=[0]
;tau=0.3
;model:
U(NLRail)= NLRAsc+cost*tcost+invt*InvTime+waitt*waitt2+accT*acctim
+accbusf*accbusf
+ ptinc*pinc + ptgend*gender + NLRinsde*inside /
U(NHRail)= TNAsc+cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+accbusf*accbusf+ ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc +cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egress
+accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Bus)= BSAsc+cost*frunCost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc+cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+ accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Train)= TNAsc+cost*tcost+invt*InvTime+waitT*WaitT+accT*acctim +
eggT*egresst
+ accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= cost*costs+invt*InvTime+CRpark*parkcost+CReggT*egresst $
+---------------------------------------------------------------+
| Generalized Mixed (RP) Logit Model |
| Dependent variable RESP1 |
| Log likelihood function -2108.366 |
| Restricted log likelihood -3580.475 |
| Chi squared [ 28 d.f.] 2944.21786 |
| Significance level .0000000 |
| McFadden Pseudo R-squared .4111491 |
| Estimation based on N = 1840, K = 28 |
| AIC = 2.3221 Bayes IC = 2.4061 |
| AICf.s. = 2.3226 HQIC = 2.3531 |
| Model estimated: Jun 13, 2009, 15:05:03 |
| Constants only. Must be computed directly. |
| Use NLOGIT ;. . .; RHS=ONE $ |
| At start values -3237.9705 .34886 ******* |
| Response data are given as ind. choice. |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Notes No coefficients=> P(i,j)=1/J(i). |
| Constants only => P(i,j) uses ASCs |
| only. N(j)/N if fixed choice set. |
| N(j) = total sample frequency for j |
| N = total sample frequency. |
| These 2 models are simple MNL models. |
| R-sqrd = 1 - LogL(model)/logL(other) |
| RsqAdj=1-[nJ/(nJ-nparm)]*(1-R-sqrd) |
| nJ = sum over i, choice set sizes |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Generalized Mixed (RP) Logit Model |
| Replications for simulated probs. = 500 |
| Halton sequences used for simulations |
| ------------------------------------------------------------ |
| RPL model with panel has 115 groups. |
| Fixed number of obsrvs./group= 16 |
| Random parameters model was specified |
| ------------------------------------------------------------ |
| RPL model has correlated parameters |
| Hessian was not PD. Using BHHH estimator. |
| Number of obs.= 1840, skipped 0 bad obs. |
+---------------------------------------------------------------+
+-----------+------------------+-----------------------+------------+-----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|
+-----------+------------------+-----------------------+------------+-----------+
+-----------+Random parameters in utility functions |
|INVT | .25713*** .02822 9.111 .0000 |
|WAITT | .17083** .08294 2.060 .0394 |
|ACCT | .45530*** .05277 8.628 .0000 |
|EGGT | .29543*** .06887 4.290 .0000 |
|COST | 1.00000 . . . . . .(Fixed Parameter). . . . . .. |
+-----------+Nonrandom parameters in utility functions |
|NLRASC | 1.14276*** .29200 3.914 .0001 |
|ACCBUSF | -.06004* .03448 -1.741 .0817 |
|PTINC | -.01312*** .00329 -3.989 .0001 |
|PTGEND | 1.56461*** .20692 7.562 .0000 |
|NLRINSDE| -1.28832* .72162 -1.785 .0742 |
|TNASC | 1.55597*** .21769 7.148 .0000 |
|NHRINSDE| -2.72308*** .77589 -3.510 .0004 |
|NBWASC | .81411*** .24155 3.370 .0008 |
|BSASC | 1.10003*** .20211 5.443 .0000 |
|BWASC | .97897*** .21523 4.548 .0000 |
|CRPARK | -.04267*** .01220 -3.496 .0005 |
|CREGGT | -.12115*** .01835 -6.601 .0000 |
+-----------+Diagonal values in Cholesky matrix, L. |
|TsINVT | .42316*** .05053 8.375 .0000 |
|TsWAITT | 1.01342*** .17256 5.873 .0000 |
|TsACCT | .77892*** .09478 8.218 .0000 |
|TsEGGT | .68827*** .14192 4.850 .0000 |
|CsCOST | .000 . . . . . .(Fixed Parameter). . . . . .. |
+-----------+Below diagonal values in L matrix. V = L*Lt |
|WAIT:INV| .67315*** .16945 3.973 .0001 |
|ACCT:INV| .30035** .14761 2.035 .0419 |
|ACCT:WAI| .54367*** .16706 3.254 .0011 |
|EGGT:INV| .65431*** .15691 4.170 .0000 |
sample;all$
reject;dremove=1$
reject;ttype#1$
reject;altij=-999$
GMXlogit;userp
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;effects:InvTime(NLRail,NHRail,NBway,Bus,Bway,Train,Car)
;pwt
;gmx
;pds=16;halton;pts=250
;fcn=invt(t),waitt(t), acct(t),eggt(t),cost(t)
;tau=0.1 ? starting values other than 0.1 (default)
;gamma=0.1 ? starting values other than 0.1 (default)
;corr;par
+---------------------------------------------------------------+
| Generalized Mixed (RP) Logit Model |
| Dependent variable RESP1 |
| Log likelihood function -2089.330 |
| Restricted log likelihood -3580.475 |
| Chi squared [ 36 d.f.] 2982.28859 |
| Significance level .0000000 |
| McFadden Pseudo R-squared .4164655 |
| Estimation based on N = 1840, K = 36 |
| AIC = 2.3101 Bayes IC = 2.4181 |
| AICf.s. = 2.3109 HQIC = 2.3499 |
| Model estimated: Jun 16, 2009, 10:25:02 |
| Constants only. Must be computed directly. |
| Use NLOGIT ;. . .; RHS=ONE $ |
| At start values -2425.3400 .13854 ******* |
| Response data are given as ind. choice. |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Notes No coefficients=> P(i,j)=1/J(i). |
| Constants only => P(i,j) uses ASCs |
| only. N(j)/N if fixed choice set. |
| N(j) = total sample frequency for j |
| N = total sample frequency. |
| These 2 models are simple MNL models. |
| R-sqrd = 1 - LogL(model)/logL(other) |
| RsqAdj=1-[nJ/(nJ-nparm)]*(1-R-sqrd) |
| nJ = sum over i, choice set sizes |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Generalized Mixed (RP) Logit Model |
| Replications for simulated probs. = 250 |
| Halton sequences used for simulations |
| ------------------------------------------------------------ |
| RPL model with panel has 115 groups. |
| Fixed number of obsrvs./group= 16 |
| Random parameters model was specified |
| ------------------------------------------------------------ |
| RPL model has correlated parameters |
| Hessian was not PD. Using BHHH estimator. |
| Number of obs.= 1840, skipped 0 bad obs. |
+---------------------------------------------------------------+
+-----------+------------------+----------------------+-----------+------------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
+-----------+------------------+----------------------+-----------+------------+
+-----------+Random parameters in utility functions |
|INVT | -.07039*** .00748 -9.411 .0000 |
|WAITT | -.05786** .02857 -2.025 .0428 |
|ACCT | -.11308*** .01764 -6.411 .0000 |
|EGGT | -.07669*** .02188 -3.506 .0005 |
|COST | -.32694*** .03399 -9.620 .0000 |
+-----------+Nonrandom parameters in utility functions |
|NLRASC | 3.24088*** .51691 6.270 .0000 |
|ACCBUSF | -.03795 .03658 -1.038 .2995 |
|PTINC | -.01704*** .00372 -4.575 .0000 |
|PTGEND | 1.39774*** .23413 5.970 .0000 |
|NLRINSDE| -1.03616 .93304 -1.111 .2668 |
|TNASC | 3.32005*** .46723 7.106 .0000 |
|NHRINSDE| -2.86157*** 1.05769 -2.705 .0068 |
|NBWASC | 2.63730*** .47806 5.517 .0000 |
|BSASC | 2.84240*** .44565 6.378 .0000 |
|BWASC | 2.79413*** .46306 6.034 .0000 |
|CRCOST | -.16531*** .05666 -2.918 .0035 |
|CRINVT | -.04816*** .00699 -6.893 .0000 |
|CRPARK | -.06677*** .01818 -3.672 .0002 |
|CREGGT | -.11379*** .01901 -5.985 .0000 |
+-----------+Diagonal values in Cholesky matrix, L. |
|TsINVT | .09863*** .01225 8.053 .0000 |
|TsWAITT | .35631*** .04133 8.622 .0000 |
|TsACCT | .19250*** .04823 3.991 .0001 |
|TsEGGT | .23144*** .03911 5.917 .0000 |
|TsCOST | .10869 .10289 1.056 .2908 |
+-----------+Below diagonal values in L matrix. V = L*Lt |
|WAIT:INV| -.27949*** .05096 -5.485 .0000 |
|ACCT:INV| -.11256*** .04108 -2.740 .0061 |
|ACCT:WAI| .12671*** .04650 2.725 .0064 |
|EGGT:INV| -.18610*** .04405 -4.224 .0000 |
|EGGT:WAI| .18536*** .04547 4.077 .0000 |
|EGGT:ACC| -.09697** .03838 -2.527 .0115 |
nlogit
;lhs=resp1,cset,Altij
;choices=NLRail,NHRail,NBway,Bus,Bway,Train,Car
;pwt
;rpl
;pds=16;halton;pts=500
;fcn=invt(t,1),waitt(t,1), acct(t,1),eggt(t,1),cost(t,1)
;par
;model:
U(NLRail)= NLRAsc+cost*tcost+invt*InvTime+waitt*waitt2+accT*acctim
+accbusf*accbusf
+ ptinc*pinc + ptgend*gender + NLRinsde*inside /
U(NHRail)= TNAsc+cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+accbusf*accbusf+ ptinc*pinc + ptgend*gender +
NHRinsde*inside /
U(NBway)= NBWAsc +cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egress
+accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Bus)= BSAsc+cost*frunCost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+ ptinc*pinc + ptgend*gender/
U(Bway)= BWAsc+cost*Tcost+invt*InvTime+waitT*WaitT+accT*acctim
+eggT*egresst
+ accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Train)= TNAsc+cost*tcost+invt*InvTime+waitT*WaitT+accT*acctim +
eggT*egresst
+ accbusf*accbusf+ ptinc*pinc + ptgend*gender /
U(Car)= cost*costs+invt*InvTime+CRpark*parkcost+CReggT*egresst $
+---------------------------------------------------------------+
| Random Parameters Logit Model |
| Dependent variable RESP1 |
| Log likelihood function -2257.958 |
| Restricted log likelihood -3580.475 |
| Chi squared [ 17 d.f.] 2645.03271 |
| Significance level .0000000 |
| McFadden Pseudo R-squared .3693690 |
| Estimation based on N = 1840, K = 17 |
| AIC = 2.4728 Bayes IC = 2.5238 |
| AICf.s. = 2.4730 HQIC = 2.4916 |
| Model estimated: Jun 14, 2009, 18:04:16 |
| Constants only. Must be computed directly. |
| Use NLOGIT ;. . .; RHS=ONE $ |
| At start values -2332.5131 .03196 ******* |
| Response data are given as ind. choice. |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Notes No coefficients=> P(i,j)=1/J(i). |
| Constants only => P(i,j) uses ASCs |
| only. N(j)/N if fixed choice set. |
| N(j) = total sample frequency for j |
| N = total sample frequency. |
| These 2 models are simple MNL models. |
| R-sqrd = 1 - LogL(model)/logL(other) |
| RsqAdj=1-[nJ/(nJ-nparm)]*(1-R-sqrd) |
| nJ = sum over i, choice set sizes |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Random Parameters Logit Model |
| Replications for simulated probs. = 500 |
| Halton sequences used for simulations |
| ------------------------------------------------------------ |
| RPL model with panel has 115 groups. |
| Fixed number of obsrvs./group= 16 |
| Random parameters model was specified |
| ------------------------------------------------------------ |
| Number of obs.= 1840, skipped 0 bad obs. |
+---------------------------------------------------------------+
+-----------+------------------+----------------------+-----------+------------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
+-----------+------------------+----------------------+-----------+------------+
+-----------+Random parameters in utility functions |
|INVT | -.06852*** .00387 -17.719 .0000 |
|WAITT | -.09419*** .01667 -5.649 .0000 |
|ACCT | -.11947*** .01025 -11.658 .0000 |
|EGGT | -.04954*** .01055 -4.697 .0000 |
|COST | -.29739*** .02281 -13.035 .0000 |
+-----------+Nonrandom parameters in utility functions |
|NLRASC | 2.45939*** .32302 7.614 .0000 |
|ACCBUSF | -.06846* .03712 -1.844 .0651 |
|PTINC | -.00865*** .00249 -3.478 .0005 |
|PTGEND | 1.60422*** .21467 7.473 .0000 |
|NLRINSDE| -1.22504*** .39846 -3.074 .0021 |
|TNASC | 1.76651*** .25499 6.928 .0000 |
|NHRINSDE| -1.15571*** .44472 -2.599 .0094 |
|NBWASC | .96813*** .28944 3.345 .0008 |
|BSASC | 1.27769*** .23769 5.375 .0000 |
|BWASC | 1.16536*** .25118 4.639 .0000 |
|CRPARK | -.00529 .00790 -.670 .5030 |
|CREGGT | -.05475*** .01556 -3.519 .0004 |
+-----------+Distns. of RPs. Std.Devs or limits of triangular.|
|TsINVT | .06852*** .00387 17.719 .0000 |
|TsWAITT | .09419*** .01667 5.649 .0000 |
|TsACCT | .11947*** .01025 11.658 .0000 |
Table 15.8 Summary of model results
Note: All public transport refers to new heavy rail, new light rail, new busway, bus, train, and busway; time is in min. and cost is in dollars ($2003). T-values are in
brackets in columns (3) to (6). Models are estimated with 500 Halton draws.
Attribute (1)   Alternatives (2)   M1: Preference space, mixed logit (3)   M2: WTP space (4)   M3: Preference space, generalized mixed logit (5)   M4: Preference space, mixed logit with constrained triangular distn (6)
Gender (male = 1) Public 1.5053 (5.33) 1.5646 (7.56) 1.3977 (5.97) 1.6042 (7.47)
transport
New light rail New light rail −1.4597 (−2.42) −1.2883 (−1.79) −1.0362 (−1.11) −1.2250 (−3.07)
inside trip
New heavy rail New heavy rail −3.0826 (−3.74) −2.7231 (−3.51) −2.8616 (−2.71) −1.1557 (−2.60)
inside trip
Random parameters: standard deviation
Main mode All modes 0.1109 (12.47) 0.4232 (8.38) 0.0986 (8.05) −0.0685 (−17.7)
invehicle time
Wait time All public 0.3409 (8.24) 1.0134 (5.87) 0.4529 (10.06) −0.0942 (−5.65)
modes
Access time All public 0.2340 (7.06) 0.7789 (8.22) 0.2565 (5.39) −0.1195 (−11.66)
modes
Egress travel time All public 0.3459 (13.44) 0.6883 (4.85) 0.3633 (8.69) −0.0495 (−4.70)
transport
Main mode All modes 0.7668 (10.51) 0.00 (fixed) 0.6669 (7.66) −0.2974 (−13.04)
invehicle cost
Cholesky matrix: diagonal
Main mode All modes 0.1109 (12.47) 0.4232 (8.38) 0.0986 (8.05) −
invehicle time
Wait time All public 0.2630 (8.12) 1.0134 (5.87) 0.3563 )8.62) −
modes
Access time All public 0.2077 (6.54) 0.7789 (8.22) 0.1925 (3.99) −
modes
Egress travel time All public 0.2719 (7.77) 0.6883 (4.85) 0.2314 (5.92) −
transport
Main mode All modes 0.5132 (6.95) 0.00 (fixed) 0.1087 (1.06) −
invehicle cost
Cholesky matrix: below-diagonal
Wait: invehicle time −0.2170 (5.76) 0.6732 (3.97) −0.2795 (5.49) −
Access: invehicle −0.0498 (1.59) 0.3004 (2.04) −0.1126 (2.74) −
time
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:37:39 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.019
Cambridge Books Online © Cambridge University Press, 2015
Access: wait time −0.0956 (−3.35) 0.5437 (3.25) 0.1267 (2.73) −
Egress: invehicle −0.1270 (−3.72) 0.6543 (4.17) −0.1861 (−4.22) −
time
Egress: wait time −0.1609 (−5.80) 0.6270 (3.23) 0.1854 (4.08) −
Egress: access time −0.0611 (−2.43) 0.6861 (4.65) −0.0970 (−2.53) −
Invehicle cost: 0.4403 (4.43) 0.00 (fixed) −0.0633 (−0.49) −
invehicle time
Invehicle cost: wait 0.2353 (3.58) 0.00 (fixed) −0.4869 (−5.11) −
time
Invehicle cost: −0.1403 (−1.78) 0.00 (fixed) −0.4328 (−5.32) −
access time
Invehicle cost: egress 0.2359 (2.72) 0.00 (fixed) −0.0669 (−0.83) −
time
Variance parameter − 0.4896 (7.06) 0.4103 (10.67) −
in scale (τ):
Weighting − 0.00 (fixed) 0.0015 (0.007) −
parameter
gamma (γ):
Parameter for cost − −0.2956 (−12.4) − −
(WTP space)
Sigma: − − −
Sample mean − 0.9663 0.9773 −
Sample standard − 0.4144 0.3499 −
deviation
Model fit:
LL at zero −3580.48
LL at convergence −2043.85 −2108.37 −2031.63 −2257.96
Information 4328.23 4427.22 4202.31 4643.79
criterion AIC
Pseudo-R2 0.429 0.411 0.433 0.369
Number of 37 38 39
parameters
Sample size 1840
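The fit statistics in Table 15.8 can be reproduced directly from the log-likelihoods. As a quick illustrative sketch (in Python rather than Nlogit), the pseudo-R2 reported in the table is one minus the ratio of the convergence LL to the LL at zero:

```python
def mcfadden_pseudo_r2(ll_model: float, ll_zero: float) -> float:
    """McFadden pseudo-R2: 1 - LL(model)/LL(0)."""
    return 1.0 - ll_model / ll_zero

# Model 1 in Table 15.8: LL at zero = -3580.48, LL at convergence = -2043.85
print(round(mcfadden_pseudo_r2(-2043.85, -3580.48), 3))  # 0.429, as reported
```

The same one-liner recovers the 0.433 reported for M3 from its convergence LL of −2031.63.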
19 Although we have chosen the unconstrained triangular distribution, we have found very similar evidence of long tails and/or sign changes when using a number of analytical distributions such as normal, log-normal, Rayleigh, and asymmetric normal.
Table 15.10 Lower and upper WTP estimates (ML = Model 1; WTPS = Model 2, GMX = Model 3)
invtML invtWTPS invtGMX waitML waitWTPS waitGMX accessML accessWTPS accessGMX egressML egressWTPS egressGMX
–313.09 –7.30 –222.93 –428.23 –45.84 –310.01 –475.92 –12.96 –419.77 –331.55 –33.17 –146.29
–162.36 –4.93 –105.74 –328.34 –39.20 –151.60 –406.29 –11.09 –316.98 –300.54 –31.37 –141.04
–149.48 –2.84 –74.72 –261.68 –31.63 –102.91 –237.82 –4.60 –284.73 –230.83 –28.64 –88.48
–81.69 –2.54 –69.59 –177.33 –21.42 –100.32 –192.72 –4.44 –207.14 –79.20 –27.77 –64.28
–40.79 0.11 –50.56 –158.86 –20.82 –97.58 –177.31 –3.93 –166.94 –77.01 –26.17 –43.76
–16.98 0.64 –36.22 –110.22 –20.40 –97.00 –129.31 –2.22 –123.99 –75.90 –23.36 –39.43
–13.24 0.65 –9.87 –81.37 –19.41 –77.11 –116.35 –1.27 –115.63 –54.61 –22.19 –37.96
–12.49 0.67 –8.37 –76.67 –19.31 –52.31 –99.36 –0.64 –111.72 –44.62 –21.84 –32.43
–5.22 0.91 –7.21 –67.64 –19.21 –30.87 –89.62 0.48 –5.38 –40.19 –15.57 –25.45
–2.94 1.38 –6.79 –44.67 –19.06 –27.87 –69.30 2.35 –4.57 –31.68 –15.32 –25.28
0.67 2.07 –1.94 –38.80 –18.55 –27.18 –45.36 2.36 –4.13 –27.36 –14.55 –22.59
…
17.35 21.63 19.95 10.07 14.53 12.31 23.52 33.66 34.56 17.89 23.75 16.34
17.48 21.86 20.59 10.27 15.07 13.19 24.07 34.08 34.64 18.94 25.65 16.51
17.49 22.10 20.97 10.93 15.10 13.26 24.50 35.73 34.69 19.01 26.72 18.09
18.14 22.15 21.39 11.77 16.77 15.83 27.26 35.74 35.03 19.03 28.52 19.07
18.37 22.24 22.10 12.71 18.01 16.07 27.41 36.45 35.32 19.63 29.44 19.71
18.53 22.41 22.47 15.50 18.96 19.68 29.19 37.43 35.75 20.62 30.05 20.33
19.42 22.52 23.42 16.20 20.05 21.56 30.73 38.06 36.18 20.86 30.63 22.44
19.43 23.08 23.56 16.88 20.48 22.00 32.08 39.61 37.28 21.02 32.28 22.86
19.95 23.51 23.94 18.15 20.96 22.37 33.75 39.80 38.22 27.85 32.84 25.81
20.84 23.63 25.33 20.64 22.46 22.64 34.21 39.98 39.72 28.82 33.35 27.92
21.07 23.79 27.54 20.91 22.98 27.47 34.31 40.30 40.49 29.86 33.40 29.24
21.87 23.96 27.86 20.91 23.56 27.86 34.64 40.68 41.28 30.15 33.55 30.25
22.20 24.04 28.42 23.22 24.46 28.65 37.97 41.81 43.43 30.47 33.59 31.45
22.22 24.10 29.18 27.53 27.11 29.18 38.17 42.17 43.51 32.93 35.05 31.74
22.73 24.24 30.62 27.61 27.95 30.44 38.46 42.22 45.89 34.40 35.75 32.85
24.96 24.25 30.63 29.87 30.39 34.40 40.18 42.69 47.13 36.05 37.87 33.94
25.53 24.43 30.97 31.11 31.29 37.65 41.13 43.32 50.02 37.12 37.96 33.96
25.92 24.50 31.07 32.25 31.58 38.56 44.29 44.45 53.03 37.27 38.18 34.88
26.21 24.51 31.25 34.27 31.75 41.31 46.29 45.37 53.80 37.40 39.27 36.10
27.63 24.66 32.53 34.59 32.23 45.15 46.54 47.85 58.93 53.30 44.71 36.98
27.75 25.09 35.25 35.85 34.87 47.82 52.39 48.23 61.81 57.93 45.67 41.84
29.22 25.22 39.91 36.68 36.42 48.87 56.12 48.41 63.91 60.82 46.27 42.82
29.48 25.46 42.68 38.44 37.01 49.48 56.44 51.46 78.78 63.84 47.49 44.81
30.36 26.10 43.42 39.42 39.17 61.99 58.21 51.73 84.01 64.95 49.10 48.00
31.04 26.30 50.65 51.81 44.74 68.52 69.14 52.14 86.22 65.21 50.01 56.48
34.73 26.33 58.06 52.70 46.91 80.23 73.21 54.27 95.08 71.11 50.35 59.56
35.23 26.40 59.44 53.33 47.16 83.70 78.17 54.50 101.80 75.33 54.96 59.94
37.96 26.48 61.50 54.01 50.13 103.79 90.56 56.65 162.19 78.94 55.52 66.96
38.97 26.55 64.33 140.38 59.39 137.37 111.07 61.53 177.26 97.21 63.42 78.93
42.08 27.39 93.52 157.04 59.52 176.00 116.23 66.08 192.59 130.60 64.13 103.18
56.05 28.38 101.26 157.46 63.18 244.44 136.66 69.05 211.45 148.38 77.43 233.66
61.21 29.57 116.45 160.58 64.05 374.80 151.11 75.21 298.92 162.58 79.45 381.36
107.47 30.72 276.16 184.47 68.57 406.47 224.87 75.72 437.25 275.43 83.91 411.82
120.62 31.07 392.64 394.87 77.17 460.66 393.93 78.94 869.15 474.97 99.63 417.57
Empirically the distribution is symmetrical about the mean,20 which not only
allows for ease of interpretation, but also avoids the problem of long tails often
associated with drawing from a log-normal distribution.
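The constrained triangular distribution referred to here pins the spread to the mean, so the support is (0, 2β̄) and the sign is preserved over the entire distribution. A minimal sketch of drawing from it (the inverse-CDF construction below is a standard device, not the book's Nlogit code):

```python
import numpy as np

def constrained_triangular(mean: float, u):
    """Draws from a triangular distribution whose spread is constrained to equal
    the mean, so the support is (0, 2*mean) and the sign never flips.
    u is an array of uniform(0, 1) numbers (e.g., Halton or pseudo-random draws)."""
    u = np.asarray(u)
    # Inverse CDF of the standard triangular on (-1, 1) with mode 0
    t = np.where(u < 0.5, np.sqrt(2.0 * u) - 1.0, 1.0 - np.sqrt(2.0 * (1.0 - u)))
    return mean * (1.0 + t)
```

For a negative mean such as a cost coefficient, every draw stays strictly negative, which is exactly the sign-preservation property discussed above.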
To enable a comparison of the WTP estimates above with a popular version
of a ML model with a constrained triangular distribution on all random
parameters, we also estimated Model 1 with constrained triangular distribu-
tions for each of the random parameters (Table 15.8, last column: M4), and
compare the mean WTP with Models 1–3 (Table 15.8). To be able to assess the
differences in the mean estimates, we calculate standard errors for the para-
meter estimates used to obtain the WTP estimates. Given the focus on the
mean, we can implement Equation (21) of Scarpa and Rose (2008), repro-
duced here as Equation (15.42):
Variance of mean WTP = (1/β^2)[Var(α) − 2(α/β)Cov(α, β) + (α/β)^2 Var(β)].   (15.42)
To obtain the standard errors around the mean, reported in Table 15.11, we take the square root of Equation (15.42), given the variance and covariance of the numerator (α) and the denominator (β) parameters. In WTP space, we only have a variance term for (α/β). There are no cross-overs between model pairs when calculated at the 95 percent confidence intervals, suggesting that the means of the distributions are statistically significantly different.

Table 15.11 Mean estimates of willingness to pay ($/person hr.); standard errors in brackets

Model | Invehicle time | Access time | Egress time
U-space constrained MXL (M1con) | 16.31 (2.047E−07) | 29.31 (4.093E−06) | 12.32 (2.114E−06)
U-space unconstrained MXL (M1) | 8.70 (1.335E−06) | 7.32 (1.055E−05) | 11.25 (9.258E−09)
WTP space unconstrained GMX (M2) | 16.35 (0.00079) | 24.07 (0.0028) | 15.71 (0.0047)
U-space unconstrained GMX (M3) | 17.68 (9.787E−07) | 23.05 (2.701E−05) | 18.58 (0.0057)

20 There has been a lot said about the appropriateness of the constrained triangular distribution. To place this in the correct context, there is support in the literature for the point that the sign across the distribution must make sense in terms of some acceptable hypothesis. That is the crucial initial point (regardless of the specific constraint imposed on the specific distribution), and there is a long history in economics of selecting a log-normal distribution precisely for this reason. This is a widely held position in the literature on risk. There is a literature supporting the sense of constrained distributions, where the constraint relates to an appropriate sign on the coefficient across the full distribution. The log-normal is one such example which, through its functional form, ensures that the sign is always the same; however, this comes at the high cost of a very long tail to the right. Some authors have investigated the triangular distribution and especially the constrained triangular distribution (referred to by them as the "Flared" Triangular Distribution), as a symmetric and an asymmetric distribution.
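Equation (15.42) is straightforward to apply once the covariance matrix of the estimates is available. A small sketch (the numbers below are illustrative, not those of Table 15.11):

```python
import math

def mean_wtp_and_se(alpha: float, beta: float,
                    var_a: float, var_b: float, cov_ab: float):
    """Delta-method mean and standard error of WTP = alpha/beta,
    following Eq. (15.42): Var = (1/beta^2)[Var(a) - 2(a/b)Cov + (a/b)^2 Var(b)]."""
    wtp = alpha / beta
    var = (var_a - 2.0 * wtp * cov_ab + wtp**2 * var_b) / beta**2
    return wtp, math.sqrt(var)

# Hypothetical time and cost coefficients with their (co)variances
wtp, se = mean_wtp_and_se(2.0, 4.0, var_a=0.04, var_b=0.09, cov_ab=0.01)
```

The standard error shrinks as the cost coefficient moves away from zero, which foreshadows the Daly et al. (2012) concern discussed later about cost distributions whose domain includes zero.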
We find a strong equivalence between the evidence on mean WTP for
invehicle time savings when imposing a constrained distribution (e.g., con-
strained triangular) in preference space (M4) and that obtained in a WTP
space GMX model with an unconstrained distribution (M2), and to a lesser
extent for the GMX model (M3). When scale heterogeneity is not allowed for
(M1) with an unconstrained distribution, we find significantly different evi-
dence (close to 50 percent lower mean WTP). The situation is similar for the
WTP for access time savings, although there is a slightly larger divergence
between M4 and both of M2 and M3. The evidence for the WTP for egress
time savings is less conclusive; although the GMX models in WTP and
preference space are the larger estimates (respectively, $15.71 and $18.58),
the WTP space estimate sits almost midway between the unconstrained GMX
and constrained ML models in utility space.
Given particular interest in the possibility that estimation in WTP space might become a popular alternative in the future (there is definitely growing interest) to the currently "popular" use of ML models with constrained distributions, there is encouraging evidence, at least for two of the three attributes, to speculate that appropriately constraining distributions in preference space might be an empirical approximation to the outcome in WTP space. While we do not claim to have found a rationale for justifying a constrained distribution in preference space when the WTP in WTP space is the benchmark reference, we might describe the evidence as potentially encouraging, but in need of further confirmation from other data sets and other analytical distributions.
In this final section, we present an SMNL and GMX model and contrast
them with the standard MNL and ML models. The four models of interest
are summarized in Table 15.12. Model 1 (M1) is the standard MNL model,
Model 2 (M2) is the base random parameter (or mixed logit) model (MXL)
in utility space, Model 3 (M3) is the generalized random parameter or mixed
logit model (GMXL) that accounts for taste and scale heterogeneity, and
Model 4 (M4) is the scale heterogeneity model (SMNL) without taste
heterogeneity.
Est. Avg. ∂log P_j / ∂log x_{k,l} = (1/N) Σ_{i=1}^{N} ∫_{β_i} [δ_{j,l} − P_l(β_i, X_i)] β_k x_{k,l,i} dβ_i,   (15.43)
21 We have found that using start values from ML for GMXL is preferable to using MNL start values.
22 We did not find any statistically significant "h" effects as per Equation (15.17).
23 AIC = 2k − 2 ln(L), where k is the number of parameters in the model, and L is the maximized value of the likelihood function for the estimated model.
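The AIC definition in footnote 23 is a one-liner; a sketch, taking the log-likelihood directly (lower AIC is preferred when comparing models on the same data):

```python
def aic(ll: float, k: int) -> float:
    """Akaike information criterion, AIC = 2k - 2*ln(L), where ll = ln(L)."""
    return 2 * k - 2 * ll
```

Note that some packages report rescaled or per-observation variants, so AIC values are only comparable when computed the same way for every model.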
Table 15.12 Summary of model results
Note: All public transport is new heavy rail, light rail, and busway; and existing bus, train, and busway; time is in min. and cost is in dollars ($2003). T-values are in brackets.

Attribute | Alternatives | M1: Multinomial logit (MNL) | M2: Mixed logit (MXL) | M3: Generalized mixed logit (GMXL) | M4: Scale MNL (SMNL)
Egress travel time | All public transport | − | 0.2447 (7.58) | 0.2741 (6.63) | −
Main mode invehicle cost | All public modes | − | 0.5325 (6.91) | 0.4146 (3.38) | −
Cholesky matrix: below-diagonal values
Wait: invehicle time | All public modes | − | 0.1667 (4.16) | −0.2814 (4.93) | −
Access: invehicle time | All public modes | − | 0.0104 (0.36) | −0.0893 (2.08) | −
Access: wait time | All public modes | − | −0.1658 (5.44) | 0.0703 (2.27) | −
Egress: invehicle time | All public transport | − | 0.1381 (3.41) | −0.2505 (4.66) | −
Egress: wait time | All public modes | − | −0.1077 (2.67) | 0.1240 (2.33) | −
Egress: access time | All public modes | − | −0.0129 (0.30) | 0.0684 (1.09) | −
Invehicle cost: invehicle time | All public modes | − | −0.0865 (0.75) | 0.0613 (0.47) | −
Invehicle cost: wait time | All public modes | − | 0.2490 (2.32) | −0.4509 (3.79) | −
Invehicle cost: access time | All public transport | − | 0.1192 (1.22) | 0.3006 (2.88) | −
Invehicle cost: egress time | All public modes | − | 0.2358 (2.48) | −0.1238 (1.02) | −
Variance parameter in scale (τ) | | − | − | 0.4109 (7.39) | 1.1418 (12.11)
Weighting parameter γ | | − | − | 0.00028 (0.007) | −
Sigma:
Sample mean | | − | − | 0.9758 | 0.8185
Sample standard deviation | | − | − | 0.3504 | 0.8347
Model fit:
LL at zero | | −3580.48
LL at convergence | | −2522.49 | −2156.88 | −2111.62 | −2415.54
McFadden pseudo-R2 | | 0.295 | 0.398 | 0.410 | 0.325
Info. criterion: AIC | | 5076.97 | 4375.75 | 4289.25 | 4865.07
Sample size | | 1840
VTTS ($/person hr.)
Main mode invehicle time | All public modes | 15.64 | 10.92 (16.92)# | 12.60 (6.58) | 14.66
Wait time | All public modes | 8.78 | 17.09 (33.84) | 13.01 (48.9) | 7.79
Access time | All public modes | 19.25 | 18.94 (19.94) | 20.94 (4.60) | 16.95
Egress travel time | All public transport | 4.88 | 18.80 (30.79) | 15.55 (30.25) | 4.99
Invehicle time | Car | 18.08 | 14.09 | 17.60 | 13.48
Note: * fixed parameters; # standard deviations in brackets for VTTS.
701 Mixed logit estimation
Table 15.13 Direct time and cost elasticities
Note: Uncalibrated models; standard deviations in brackets.* The four columns of estimates correspond to M1 (MNL), M2 (MXL), M3 (GMXL), and M4 (SMNL).

Attribute | Alternative | M1 (MNL) | M2 (MXL) | M3 (GMXL) | M4 (SMNL)
Invehicle time | New light rail (invt-NLR) | −1.674 (1.021) | −1.421 (0.758) | −1.481 (0.796) | −1.106 (0.462)
 | New heavy rail (invt-NHR) | −1.595 (0.945) | −1.399 (0.684) | −1.533 (0.752) | −1.172 (0.530)
 | New busway (invt-NBWY) | −2.133 (0.976) | −1.744 (0.652) | −1.936 (0.747) | −1.415 (0.465)
 | Bus (invt-Bus) | −1.773 (0.995) | −1.356 (0.581) | −1.475 (0.650) | −1.260 (0.456)
 | Busway (invt-Bway) | −1.540 (0.880) | −1.317 (0.703) | −1.465 (0.809) | −1.188 (0.530)
 | Train (invt-Train) | −1.344 (0.752) | −1.227 (0.609) | −1.340 (0.731) | −1.035 (0.469)
 | Car (invt-Car) | −1.215 (0.709) | −0.894 (0.648) | −0.763 (0.441) | −0.847 (0.853)
Cost | New light rail (cost-NLR) | −0.699 (0.446) | −0.883 (0.512) | −0.775 (0.475) | −0.493 (0.236)
 | New heavy rail (cost-NHR) | −0.704 (0.452) | −0.756 (0.391) | −0.733 (0.389) | −0.547 (0.272)
 | New busway (cost-NBWY) | −1.143 (0.496) | −0.917 (0.507) | −0.943 (0.468) | −0.806 (0.319)
 | Bus (cost-Bus) | −0.942 (0.486) | −0.826 (0.384) | −0.815 (0.389) | −0.770 (0.326)
 | Busway (cost-Bway) | −0.646 (0.414) | −0.758 (0.396) | −0.739 (0.427) | −0.522 (0.264)
 | Train (cost-Train) | −0.832 (0.483) | −0.713 (0.368) | −0.686 (0.351) | −0.626 (0.281)
 | Car (cost-Car) | −0.580 (0.339) | −0.537 (0.387) | −0.363 (0.209) | −0.528 (0.530)
Note: * The standard deviations are an artifact of different choice probabilities and not a result of preference heterogeneity.
where j and l index alternatives, x indexes the attribute, and i indicates the
individual. The integrals cannot be computed directly, so they are simulated in
the same fashion (and at the same time) as the LL function. Using R simulated
draws from the distribution of βi, we obtain the simulated values of the means
of the elasticities (Equation 15.44):
Est. Avg. ∂log P_j / ∂log x_{k,l} = (1/N) Σ_{i=1}^{N} (1/R) Σ_{r=1}^{R} [δ_{j,l} − P_l(β_{i,r}, X_i)] β_{k,i,r} x_{k,l,i}.   (15.44)
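Equation (15.44) replaces the integral in Equation (15.43) with an average over R draws. A schematic single-individual implementation (the attribute matrix, independent normal mixing, and the draw mechanism are illustrative assumptions; an application like the one above would use Halton draws and average over the sample):

```python
import numpy as np

def simulated_mean_elasticity(X, beta_mean, beta_sd, k, j, l, R=500, seed=0):
    """Simulated elasticity of P_j w.r.t. attribute k of alternative l, in the
    spirit of Eq. (15.44): average [delta_jl - P_l(beta_r)] * beta_kr * x_kl
    over R draws of the random parameters. X is (alternatives x attributes)."""
    rng = np.random.default_rng(seed)
    delta = 1.0 if j == l else 0.0
    total = 0.0
    for _ in range(R):
        beta_r = rng.normal(beta_mean, beta_sd)   # one draw of the random parameters
        v = X @ beta_r                            # utility of each alternative
        p = np.exp(v - v.max())
        p = p / p.sum()                           # logit choice probabilities
        total += (delta - p[l]) * beta_r[k] * X[l, k]
    return total / R
```

With a negative cost coefficient, the own elasticity (j = l) comes out negative and the cross elasticity positive, matching the sign pattern of Table 15.13.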
We take the MXL model (M2) as the "reference" (given the focus on contrasting GMXL and SMNL with MXL) and difference the mean estimates for each other model against this model.
Beginning with the invehicle time mean elasticities,24 the SMNL model
(M4) has the greatest consistently negative difference25 relative to MXL (M2),
remaining directionally negative for all modal alternatives, indicating that all
mean elasticities are higher for ML compared to scale MNL. In contrast, ML
has higher mean elasticity estimates than MNL (M1) and GMXL (M3), with
the one exception of car invehicle time for GMXL.
This evidence, albeit from one study, suggests that the SMNL model, which excludes consideration of attribute preference heterogeneity, produces noticeably lower mean estimates of the elasticities for invehicle travel time. For the
cost attribute, the same findings apply for ML compared to SMNL; however,
the directional implication is not clear in comparisons of MXL with MNL and
GMXL.
When we undertake a statistical test of differences (using the mean and
standard deviation) between various model pairs (see Table 15.14), we find on
the t-ratio of differences test that there is no statistically significant difference
between the mean estimates, without exception.26 Hence the extension from
MNL to ML to generalized mixed logit, and the focus only on scale hetero-
geneity, does not impact materially on the evidence on direct elasticities,
despite the actual mean estimates that are typically used in practice being
different in absolute terms.
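The t-ratio of differences test used above divides the difference in means by the pooled standard error. A minimal sketch, treating the two estimates as independent (the inputs below reuse the invt-NLR means and standard deviations for M1 and M2 from Table 15.13 purely for illustration):

```python
import math

def t_ratio_of_difference(mean1: float, sd1: float,
                          mean2: float, sd2: float) -> float:
    """t-ratio for the difference between two mean estimates,
    assuming the two estimates are independent."""
    return (mean1 - mean2) / math.sqrt(sd1**2 + sd2**2)

# invt-NLR: MXL (M2) vs. MNL (M1) from Table 15.13
t = t_ratio_of_difference(-1.421, 0.758, -1.674, 1.021)
```

The statistic is well below 1.96 in absolute value, consistent with the finding of no statistically significant difference between the mean elasticities.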
This empirical evidence suggests that although recognition of preference
and scale heterogeneity through observed attributes improves on the goodness
of fit of the models, and aligns the mean elasticity estimates “closer” to those of
the popular ML model (which assumes scale homogeneity), the differences are
not statistically significant. However, despite this evidence, practitioners tend
to focus on applying the mean estimates, and hence when only scale hetero-
geneity is accommodated the mean elasticity estimates are, with a few excep-
tions, noticeably lower than both ML and generalized mixed logit.
This evidence, admittedly from a single study, raises doubts about the
substantive empirical merits of allowing for scale heterogeneity in the absence
24 The elasticities are based on uncalibrated models and as such the numerical magnitudes are only valid in the comparisons across models. These models cannot be used to forecast patronage without calibration using revealed preference shares on existing modes.
25 Since all elasticities are negative, a lower value is an absolute lower value (e.g., −0.435 is lower than −0.650).
26 We also undertook a bootstrap calculation (as per Section 7.3.3) for two of the variables to ensure that the t-ratio test was a useful approximation. The resulting standard errors confirm that the t-ratios are a good approximation.
to MNL (recognizing that the value for egress time is very similar). The
behavioral implications are far from clear other than that the ML and general-
ized mixed logit models appear to produce higher mean estimates than the
models that assume preference homogeneity. This finding is known from
other studies (see Hensher 2010).
In the context of WTP, Daly et al. (2012) have expressed concern about the
properties of some common choices of distributions for the cost coefficient in
models estimated in preference space. In particular, the authors present a
mathematical proof to show that, when the domain of the distribution for the
cost coefficient includes zero, none of the moments of the WTP distribution
exists. If the distribution approaches zero, but does not include zero, then the
existence of the moments depends on the specific shape of the distribution as
it approaches zero. For the triangular distribution bounded at zero, the mean
of the inverse exists, while the variance does not.27 In experiments using a
finite number of draws in simulation, this problem is masked; indeed, Daly
et al. (2012) confirm their theoretical results using simulations with 10^7 draws.
It should be said that the proofs by Daly et al. are limited to the case of
uncorrelated coefficients (or to correlated normal coefficients), while the
present application allows for correlation between the time and cost coeffi-
cients which is essential when introducing scale heterogeneity, since it induces
correlation (as stated most eloquently by Train and Weeks 2005).
Nevertheless, Daly et al. (2012) also discuss how the same reasoning should
apply in the case of correlated coefficients, and from this perspective the
variances reported for the WTP indicators herein should be treated with
caution. Having investigated numerous distributions over many years, we
suggest that many distributions are controversial, especially when used as
ratios of parameters to obtain measures of WTP, and that the growing
popularity of estimating models in WTP space (in contrast to preference
space) may well be the focus in the future.
27 Albeit that problems with it going to infinity are slightly less pronounced than, say, with the uniform bounded at zero, so that a limited simulation may produce apparently reasonable results.
16 Latent class models
16.1 Introduction
Although the multinomial logit model (MNL) has provided the foundation
for the analysis of discrete choice modeling, its basic limitations, most notably
its assumption of independence from irrelevant alternatives (IIA), have moti-
vated researchers to consider alternative specifications. The mixed logit (ML)
model (see Chapter 15) is probably the most significant among a number of
innovations in terms of the range of behavior it can accommodate and its
overall flexibility. The latent class model (LCM) presented in this chapter is in
some respects a semi-parametric variant of the MNL model that resembles the
ML model. It is somewhat less flexible than the ML model in that it approx-
imates the underlying continuous distribution with a discrete one; however, it
does not require the analyst to make specific assumptions about the distribu-
tions of parameters across individuals. Thus, each model has its limitations
and virtues. However, as we will show below, the most advanced version of
LCM permits continuous distributions in each discrete class for the class-
specific parameters.
Latent class modeling provides an alternative approach to accommodating
heterogeneity in models such as MNL and ML (see Everitt 1988 and Uebersax
1999). The natural approach assumes that parameter vectors, βi, are distrib-
uted among individuals with a discrete distribution, rather than the contin-
uous distribution that lies behind the ML model. Thus, it is assumed that the
population consists of a finite number, Q, of groups of individuals. The groups
are internally homogeneous, with common parameters, βq, for the members of the
group, but the groups themselves are different from one another. We assume
that the classes are distinguished by the different parameter vectors, though
the fundamental data generating process, the probability density for the
interesting variable under study, is the same.
707 Latent class models
The analyst does not know from the data which observation is in which
class, hence the term latent classes. The model assumes that individuals are
distributed heterogenously with a discrete distribution in a population. In
this chapter, we will begin with the standard LCM, with fixed parameters
that are usually unconstrained so that they are different between classes, but
between-class restrictions can be imposed if they make sense. We then
introduce a more advanced LCM in which random parameters are imposed
within each class. In addition to the standard interpretation of latent classes,
we also recognize the growing popularity of the LCM to investigate attribute
processing rules (see Chapter 21), through the use of the restrictions
imposed on particular parameters within a class. When we do this for
fixed and/or random parameters, we are defining a class as having a specific
behavioral meaning and, as such, we refer to each class as a probabilistic
decision rule. The examples used in the chapter include both standard latent
class and latent class with random parameters, as well as the treatment of
attribute processing.
The LCM for the analysis of individual heterogeneity has a history in several
literatures. See Heckman and Singer (1984a, 1984b) for theoretical discussion.
However, a review of the literature suggests that the vast majority of the
received applications have been in the area of models for counts using the
Poisson or negative binomial models. Greene (2001) provides an early survey
of the literature. Swait (1994) and Bhat (1997) are early examples of the
application of LCM to the analysis of discrete choice among multiple
alternatives.
The underlying theory of the LCM posits that individual behavior depends
on observable attributes and on latent heterogeneity that varies with factors
that are unobserved by the analyst. We can analyze this heterogeneity through
a model of discrete parameter variation. Thus, it is assumed that individuals
are implicitly sorted into a set of Q classes, but which class contains any
particular individual, whether known or not to that individual, is unknown to
the analyst. The central behavioral model is a logit model for discrete choice
among Ji alternatives, by individual i observed in Ti choice situations, given in
Equation (16.1):

F(i, t, j | q) = Prob(choice_it = j | class = q) = exp(x′_{it,j} β_q) / Σ_{j=1}^{J_i} exp(x′_{it,j} β_q).   (16.1)
The number of observations and the size of the choice set may vary by
individual respondent. In principle, the choice set could vary by choice
situation as well. The probability for the specific choice made by an individual
can be formulated in several ways; for convenience, we allow yit to denote the
specific choice made, so that the model, given in Equation (16.2), provides:

P_{it|q} = F(i, t, y_it | q).   (16.2)
The class assignment is, however, unknown. Let Hiq denote the prior prob-
ability for a class q for individual i (we consider posterior probabilities below).
Various formulations have been used for this (see Greene 2001). A particu-
larly convenient form is the MNL model shown in Equation (16.4):
H_{iq} = exp(z_i′θ_q) / Σ_{q=1}^{Q} exp(z_i′θ_q),   q = 1, …, Q,   θ_Q = 0,   (16.4)
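Equation (16.4) is simply an MNL (softmax) over class-membership scores, with the last class's parameter vector normalized to zero for identification. A compact sketch:

```python
import numpy as np

def class_priors(z, thetas):
    """Prior class probabilities H_iq = exp(z'theta_q) / sum_q exp(z'theta_q),
    as in Eq. (16.4). thetas has one row per class; the last row should be
    zeros (theta_Q = 0) for identification."""
    v = np.asarray(thetas) @ np.asarray(z)
    e = np.exp(v - v.max())      # subtract the max for numerical stability
    return e / e.sum()
```

With no covariates (z a constant of 1), the θ_q are simple class-share constants, which is the fixed-shares special case often estimated first.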
1 The EM algorithm has become popular in LCM estimation. Implementing the expectation maximization (EM) algorithm (iterating between computing the posterior class probabilities and re-estimating the model parameters in each class using a probability-weighted LL function) has been employed for all manner of LC models, not just multinomial choice models, for many years. It is a generic algorithm, but it has little to recommend it beyond its simple elegance. It is slow and requires the analyst to go back into the computation at the end to compute the asymptotic covariance matrix for the estimators. Software such as Latent Gold, for example, has been using this method. There is a misconception on the part of some that the method is a new model, or somehow produces a different estimator from the MLE. Neither is the case. See http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm.
BIC(model) = −ln L + (model size × ln N) / N.   (16.7)
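The normalized BIC used here for selecting the number of classes penalizes the LL by (model size × ln N)/N; a one-line sketch (smaller values are preferred):

```python
import math

def bic_normalized(ll: float, model_size: int, n: int) -> float:
    """Normalized BIC: -ln L + (model size * ln N) / N; smaller is better."""
    return -ll + model_size * math.log(n) / n
```

In practice the analyst estimates the LCM for Q = 2, 3, 4, … classes and picks the Q at which this criterion stops improving.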
With the parameter estimates of θ_q in hand, the prior estimates of the class probabilities are Ĥ_{iq}. Using Bayes' theorem, we can obtain a posterior estimate of the latent class probabilities using Equation (16.8):

Ĥ_{q|i} = P̂_{i|q} Ĥ_{iq} / Σ_{q=1}^{Q} P̂_{i|q} Ĥ_{iq}.   (16.8)
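Equation (16.8) is a direct Bayes update: the posterior class probability is proportional to the class-conditional probability of the individual's observed choice sequence times the prior class share. A sketch:

```python
import numpy as np

def posterior_class_probs(p_i_given_q, h_iq):
    """Posterior class probabilities as in Eq. (16.8): elementwise product of
    the class-conditional sequence probabilities P_i|q and the priors H_iq,
    renormalized to sum to one."""
    w = np.asarray(p_i_given_q, dtype=float) * np.asarray(h_iq, dtype=float)
    return w / w.sum()
```

With equal priors, the posterior simply renormalizes the class-conditional probabilities; an informative prior shifts individuals toward the class that better predicts their choices.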
The same result can be used to estimate marginal effects in the logit model in Equation (16.10):

η_{km,it,j|q} = ∂ln F(i, t, j | q) / ∂ln x_{it,km} = x_{it,km} [1(j = k) − F(i, t, k | q)] β_{m|q}.   (16.10)
η̂_{km,j} = (1/N) Σ_{i=1}^{N} (1/T_i) Σ_{t=1}^{T_i} η̂_{km,t,j|i}.   (16.12)
In this section, we extend the LCM to allow for heterogeneity both within and
across groups. That is, we allow for variation of the parameter vector within
classes as well as between classes. The extended model is a straightforward
combination of the ML and latent class models. To accommodate the two
layers of heterogeneity, we allow for continuous variation of the parameters
within classes. The latent class aspect of the model is given as Equations (16.13) and (16.14); the within-class variation of the parameters is specified in Equation (16.15):
β_{i|q} = β_q + w_{i|q},   with E[w_{i|q} | X] = 0,   (16.15)

where the conditioning on X indicates that w_{i|q} is uncorrelated with all the exogenous data in the sample. We will assume below that the underlying distribution for the within-class heterogeneity is normal with mean 0 and covariance matrix Σ_q. In a given application, it may be appropriate to further assume that certain rows and corresponding columns of Σ_q equal zero, indicating that the variation of the corresponding parameter is entirely across classes.
The contribution of individual i to the LL for the model is obtained for each
individual in the sample by integrating out the within class heterogeneity and
then the class heterogeneity. We will allow for a panel data setting as is
common with SC or best–worst data. The observed vector of outcomes is
denoted yi and the observed data on exogenous variables are collected in
Xi = [Xi1,..,XiTi]. The individual is assumed to engage in Ti choice situations,
where Ti ≥ 1. The generic model is given in Equation (16.17):
f(\mathbf{y}_i \,|\, \mathbf{X}_i, \beta_1, \ldots, \beta_Q, \theta, \Sigma_1, \ldots, \Sigma_Q) = \sum_{q=1}^{Q} \pi_q(\theta) \int_{w_i} \left[ \prod_{t=1}^{T_i} f(y_{it} \,|\, (\beta_q + w_i), \mathbf{X}_{it}) \right] h(w_i \,|\, \Sigma_q)\, dw_i. \quad (16.17)

\pi_q(\theta) = \frac{\exp(\theta_q)}{\sum_{q=1}^{Q} \exp(\theta_q)}, \quad q = 1, \ldots, Q, \quad \theta_Q = 0. \quad (16.18)

\pi_{iq}(z_i, \theta) = \frac{\exp(\theta_q' z_i)}{\sum_{q=1}^{Q} \exp(\theta_q' z_i)}, \quad q = 1, \ldots, Q, \quad \theta_Q = 0. \quad (16.19)
where yit,j = 1 for the j corresponding to the alternative chosen and 0 for all
others, and xit,j is the vector of attributes of alternative j for individual i in
choice situation t.
Just like mixed logit, the integrals cannot be evaluated analytically. We use
maximum simulated likelihood (along the same lines as mixed logit) to
evaluate the terms in the LL expression. The contribution of individual i to
the simulated LL is the log of Equation (16.21):
f^S(\mathbf{y}_i \,|\, \mathbf{X}_i, \beta_1, \ldots, \beta_Q, \theta, \Sigma_1, \ldots, \Sigma_Q) = \sum_{q=1}^{Q} \pi_q(\theta)\, \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} f(y_{it} \,|\, (\beta_q + w_{i,r}), \mathbf{X}_{it}), \quad (16.21)
where wi,r is the rth of R random draws (Halton draws in our implementation)
on the random vector wi. Collecting all terms, the simulated LL is given as
Equation (16.22):
\log L_S = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} \pi_q(\theta)\, \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} f(y_{it} \,|\, (\beta_q + w_{i,r}), \mathbf{X}_{it}). \quad (16.22)
The functional forms for πq(θ) and f[yit|(βq+wi,r),Xit] are given in Equations
(16.18) or (16.19), and (16.20), respectively.
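The simulated LL of Equations (16.21)–(16.22) can be sketched generically. Here `kernel(q, w)` stands in for the product over t of f[y_it | (β_q + w), X_it], and a plain list of draws replaces the Halton sequence; all names are ours:

```python
import math

def simulated_loglik_i(class_shares, kernel, draws):
    """One individual's contribution to the simulated LL (log of Eq. 16.21).

    class_shares -- pi_q(theta), length Q
    kernel       -- kernel(q, w): product over t of f[y_it | (beta_q + w), X_it]
    draws        -- R draws w_{i,r} on the within class heterogeneity
    """
    R = len(draws)
    prob = 0.0
    for q, pi_q in enumerate(class_shares):
        sim = sum(kernel(q, w) for w in draws) / R  # inner simulation over draws
        prob += pi_q * sim                          # outer mixture over classes
    return math.log(prob)

def simulated_loglik(class_shares, kernels, draws):
    """Equation (16.22): sum the individual contributions over the sample."""
    return sum(simulated_loglik_i(class_shares, k, draws) for k in kernels)
```

As a sanity check, a kernel that ignores the draw collapses the simulation to the discrete (pure latent class) mixture.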
Willingness to pay (WTP) estimates are computed using the familiar result,
WTP = −βx/βcost. Since there is heterogeneity of the parameters within the
classes as well as across classes, the WTP result has to be averaged to produce
an overall estimate. The averaging is undertaken for the random parameters
within each class, then again across classes using the posterior probabilities as
weights. Collecting the results, the procedure is shown in Equation (16.23):
\widehat{\mathrm{WTP}} = \frac{1}{N} \sum_{i=1}^{N} \sum_{q_{APR}=1}^{Q_{APR}} \{\pi_{q_{APR}}(\hat\theta)\,|\,i\} \left[ \frac{\frac{1}{R} \sum_{r=1}^{R} L_{ir|q_{APR}}\, \dfrac{\hat\beta_{time,ir|q_{APR}}}{\hat\beta_{cost,ir|q_{APR}}}}{\frac{1}{R} \sum_{r=1}^{R} L_{ir|q_{APR}}} \right] = \frac{1}{N} \sum_{i=1}^{N} \sum_{q_{APR}=1}^{Q_{APR}} \{\pi_{q_{APR}}(\hat\theta)\,|\,i\}\, \frac{1}{R} \sum_{r=1}^{R} W_{ir|q_{APR}}\, \widehat{\mathrm{WTP}}_{time,ir|q_{APR}}. \quad (16.23)

where the class- and draw-specific time coefficient is built up from its class mean and its within class spread:

\hat\beta_{time,ir|q_{APR}} = \hat\beta_{time|q_{APR}} + \hat\sigma_{time|q_{APR}}\, w_{time,ir|q_{APR}}. \quad (16.24)
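Equation (16.23) nests two weighted averages: a likelihood-weighted average of the coefficient ratio over draws within each class, then a posterior-probability-weighted average over classes. A sketch for one individual, with hypothetical names (the inputs would come from the estimated model and Equation (16.24)):

```python
def individual_wtp(post_class_probs, draw_likelihoods, beta_time, beta_cost):
    """Per-individual WTP following the structure of Equation (16.23).

    post_class_probs -- {pi_q | i}: posterior class probabilities, length Q
    draw_likelihoods -- L_ir|q: Q lists of R likelihood weights
    beta_time, beta_cost -- Q lists of R draws of the time and cost coefficients
    """
    wtp = 0.0
    for q, p_q in enumerate(post_class_probs):
        num = sum(l * (bt / bc) for l, bt, bc in
                  zip(draw_likelihoods[q], beta_time[q], beta_cost[q]))
        den = sum(draw_likelihoods[q])
        wtp += p_q * num / den  # likelihood-weighted within class ratio
    return wtp
```

Averaging `individual_wtp` over the N sampled individuals then gives the overall estimate.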
We will illustrate the set ups in Nlogit for both the standard LCM and the LC_MMNL model, using data drawn from a study undertaken in the context of toll versus free roads. The study utilized a SC experiment involving two SC alternatives (i.e., Route A and Route B) that are pivoted around the knowledge base of travelers (i.e., the current trip). The trip attributes associated with each route are summarized in Table 16.1.
Each alternative has three travel scenarios – “arriving x minutes earlier than
expected,” “arriving y minutes later than expected,” and “arriving at the time
expected.” Each is associated with a corresponding probability2 of occurrence
to indicate that travel time is not fixed but varies from trip to trip. For all
attributes except the toll cost, minutes arriving early and late, and the prob-
abilities of arriving on time, early, or late, the values for the SC alternatives are
variations around the values for the current trip. Given the lack of exposure to
tolls for many travelers in the study catchment area, the toll levels are fixed
over a range, varying from no toll to $4.20, with the upper limit determined by
the trip length of the sampled trip.
In the choice experiment, the first alternative is described by attribute levels
associated with a recent trip, with the levels of each attribute for Routes A and
B pivoted around the corresponding level of actual trip alternative with the
probabilities of arriving early, on time, and late provided. Commuters and
Table 16.1 Trip attributes (Routes A and B)
- Free-flow travel time
- Slowed-down travel time
- Stop/start/crawling travel time
- Minutes arriving earlier than expected
- Minutes arriving later than expected
- Probability of arriving earlier than expected
- Probability of arriving at the time expected
- Probability of arriving later than expected
- Running cost
- Toll cost
2 The probabilities are designed and hence exogenously induced to respondents, similar to other travel time variability studies.
Game 5
Illustrative Choice Experiment Screen
Make your choice given the route features presented in this table, thank you.

                                                  Details of your
                                                  recent trip       Route A    Route B
Average travel time experienced
  Time in free flow traffic (minutes)                  20              14         12
  Time slowed down by other traffic (minutes)          20              18         20
  Time in stop/start/crawling traffic (minutes)        20              26         20
Probability of time of arrival
  Arriving 9 minutes earlier than expected             30%             30%        10%
  Arriving at the time expected                        30%             50%        50%
  Arriving 6 minutes later than expected               40%             20%        40%
Trip costs
  Running costs                                       $2.25           $3.26      $1.91
  Toll costs                                          $2.00           $2.40      $4.20
If you make the same trip again, which route
would you choose?                                 Current Road      Route A    Route B
16.4.1 Results
The findings are presented in Table 16.2 for four models. All of the Nlogit set
ups are presented in section 16.5.2. Although we have not yet introduced
attribute processing (see Chapter 21), LCMs have become popular in esti-
mated choice models where analysts are interested in investigating the role of
various processing heuristics such as attribute non-attendance (ANA) or
attribute aggregation when they are in common metric units (ACMA).
Although it is not necessary to read Chapter 21 at this juncture, given the
Table 16.2 Summary of models
Sample size N = 588 observations. For random parameter models, we used constrained t-distributions and 500 Halton draws. T-ratios are in parentheses.
Times in min. and costs in dollars (AU$2008). MNL model LL = −6729.90
Latent class, fixed parameters, no ANA, no ACMA (Model 1)
Attributes Class 1 Class 2 Class 3 Class 4 Class 5
Free-flow time −0.1945 (−7.81) −0.0743 (−3.58) −0.0398 (−4.41) −0.0312 (−1.45) −0.0033 (−0.26)
Slowed-down and Stop/start/crawling time −0.2360 (−10.5) −0.1728 (−7.53) −0.0782 (−6.46) −0.1521 (−6.81) −0.0559 (−4.96)
Running cost −0.2723 (−3.89) −2.2544 (−7.89) −0.4155 (−8.86) −1.5577 (−7.66) −0.3854 (−5.53)
Toll cost −0.2836 (−4.81) −2.6709 (−8.11) −0.3309 (−8.61) −1.2353 (−8.33) −0.1112 (−2.49)
Reference alt (1,0) −0.1727 (−0.76) −0.0823 (−0.38) 0.4211 (2.75) 3.9570 (9.26) −2.1696 (−6.92)
Commuter trip purpose (1,0) 0.2368 (0.89) 3.2134 (8.72) −2.8170 (−12.8) −3.9950 (−7.92) 3.5705 (9.52)
Probability of early arrival −0.0088 (−1.24) −0.0303 (−2.88) −0.0105 (−2.40) −0.0011 (−0.13) −0.0190 (−3.17)
Probability of late arrival −0.0222 (−2.98) −0.0398 (−3.43) −0.0198 (−4.17) −0.0349 (−4.22) −0.0250 (−3.78)
Stated choice alt 1 (1,0) −0.1022 (−0.77) 0.2673 (1.34) −0.0568 (−0.74) −0.2041 (−1.11) 0.0205 (0.20)
Class membership probability 0.124 (6.39) 0.309 (12.7) 0.184 (9.27) 0.277 (11.3) 0.107 (6.41)
Log-likelihood −4817.72
AIC/N 1.035
McFadden pseudo-R2 0.5339
Table 16.2 (cont.)
focus on estimating LCMs the examples may benefit from a quick read of
Chapter 21.
The first and third models in Table 16.2 assume full attribute attendance
(FAA), which is the standard fully compensatory assumption of the majority
of choice modeling applications, while the second and fourth models allow for
mixtures of FAA, ANA, and ACMA. The first two models assume fixed
parameters for all attributes, while the third and fourth models include
random parameters for the travel time and cost attributes. The random
parameters are defined by a constrained triangular distribution (see
Chapter 15 for details) with scale parameter equal to the mean estimate.3
The development of the models follows a natural sequence of behavioral
realism (or “complexity”) from Model 1 through to Model 4. Under the
assumption of FAA, the number of classes is selected along lines well recognized in the latent class literature, as explained below. We estimated Model
1 under alternative numbers of classes (ranging from two through to seven
classes), with five classes having the best overall goodness of fit (including
AIC). When we move to Model 2, which assumes fixed parameters, but
introduces ANA and ACMA, we have to define the ANA and ACMA classes
and investigate how many classes should remain as FAA. In a number of
studies (e.g., Scarpa et al. 2009; Campbell et al. 2011; McNair et al. 2012), users
of LCMs (including Hensher and Greene 2010) imposed only one FAA class
when investigating attribute processing rules; however, there are good reasons
why a number of classes might be considered (just as in Model 1), given that
taste heterogeneity can continue to exist between FAA classes in the presence
of elements of attribute processing. The introduction of multiple FAA classes
may also go some way to reducing the chance that the attribute processing
classes end up capturing taste heterogeneity as well as attribute processing,
which is a risk when the taste coefficients are not constrained across classes.
Model 2 is the final model under fixed parameters, varying the number of
FAA classes and specific ANA and ACMA classes. Model 3 overlays Model 1
with random parameters on the four time and cost attributes of the choice
experiment; however, the number of classes is reduced to four on overall
goodness of fit. Estimating this model takes many hours. Finally, we introduce
Model 4, which builds on all previous models and is also freely defined in
3 We investigated unconstrained distributions including log-normal, but models either failed to converge or
produced imprecise parameter estimates, most notably on the standard deviations of the random
parameters. This is consistent with Collins et al. (2013), who found that constraining the sign of the random
parameter distribution is necessary when ANA is handled through latent classes. Literally over 100 hours of
model estimation time was undertaken in the estimation of the random parameter versions of the models.
terms of the number of FAA classes, the ANA and ACMA classes, and the
distributional assumptions imposed on each random parameter.4 Estimation
of Model 4 also takes many hours with many models failing to converge (see
n. 3). The number of FAA classes under random parameters is reduced to one
compared to three under fixed parameters (Model 2), possibly suggesting that
some amount of preference heterogeneity that was accommodated through
three classes under fixed parameters in Model 2 has been captured in a single
class in Model 4 through within class preference heterogeneity. The final four
models reported below are the outputs of this process.
A question naturally arises: how can the analyst determine the number of
classes, Q? Since Q is not a free parameter, an LRT is not appropriate, even though, in fact, log L will increase as Q increases. Researchers typically use an information criterion, such as AIC, to guide them toward the appropriate value. For
Model 1, the AIC was the lowest for five classes, at 1.035 (LL of −4817.72);
whereas for Model 3, with random parameters, we found four classes had the
lowest AIC (1.033 and a LL of −4803.2, slightly better than Model 1). Heckman
and Singer (1984a) also suggest a practical guidepost in selecting the number of
classes; namely, that if the model is fit with too many classes, estimates will
become imprecise, even varying wildly. Signature features of a model that has
been overfit include exceedingly small estimates of the class probabilities, wild
values of the structural parameters, and huge estimated standard errors. For
the models that account for ANA and ACMA (Models 2 and 4), the number of
classes is pre-defined by the number of restrictions on parameters that are
imposed to distinguish the attribute processing strategies of interest; however,
the number of classes with full attribute attendance is free and can be deter-
mined along the same lines as Models 1 and 3.
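In practice, the search over Q is a loop: estimate the model for each candidate Q, compute AIC/N (the normalized criterion reported by Nlogit), and keep the minimizer. A sketch under the assumption that log likelihoods and parameter counts have already been collected; the fit values below are invented for illustration:

```python
def aic_per_obs(log_l, n_params, n_obs):
    """Normalized AIC (AIC/N), as reported in Table 16.2: (-2 ln L + 2K) / N."""
    return (-2.0 * log_l + 2.0 * n_params) / n_obs

def pick_n_classes(fits, n_obs):
    """fits maps Q -> (log_l, n_params); return the Q with the lowest AIC/N."""
    return min(fits, key=lambda q: aic_per_obs(*fits[q], n_obs))

# Hypothetical fits for Q = 2..4: log L keeps rising, but parameters accumulate.
fits = {2: (-5100.0, 19), 3: (-4900.0, 29), 4: (-4895.0, 39)}
best_q = pick_n_classes(fits, n_obs=9408)  # -> 3
```

The Heckman and Singer symptoms noted above (tiny class probabilities, wild estimates, huge standard errors) remain the practical cross-check on whatever Q the criterion suggests.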
With respect to the ANA and ACMA conditions that might be imposed,
authors have suggested that responses to supplementary questions on whether a respondent claims to have ignored specific attributes and/or added them up may be useful signals of specific attribute processing strategies. For the sample of 588 observations, the following incidence of reported
ANA was obtained: free-flow time (28 percent), slowed-down and stop/start time (27 percent), running cost (17 percent), and toll cost (11 percent). The
incidence of ACMA is as follows: total time (60.5 percent) and total cost (80
percent). The reliability of such data has been questioned in many papers (see,
e.g., Hess and Hensher 2010); however, despite the concerns about the
4 We investigated correlations among the random parameters in the unconstrained distributions; however,
such models were an inferior statistical fit over Model 4 reported in Table 16.2.
5 We also ran Model 3 as a standard ML model. Three models were estimated – an unconstrained
triangular distribution with and without correlated random parameters, and a constrained triangular
distribution that does not permit correlated parameters. The respective LLs (and AIC) at convergence
were −5512.22 (1.176), −5568.57 (1.187), and −6158.89 (1.311). In all cases, these models are statistically inferior to Models 1–4 in Table 16.2, although, as expected, an improvement over MNL (−6729.90, 1.433). Given that Model 4 outperforms the standard ML model, Model 4 is additionally an improvement over a model with preference heterogeneity accommodated through continuous random taste heterogeneity (as opposed to Model 1, with discrete preference heterogeneity).
Table 16.3 WTP estimates: weighted average VTTS (2008 AUD $/person hr.) using weights for components of time and components of cost (standard deviations in brackets)

Figure 16.2 Distribution of VTTS for all models (VTTS, in $/person hour, plotted against the sample sorted by VTTS)
ANA are instead associated with a very low marginal disutility and appear in
FAA; and small differences in marginal disutility are revealed under ACMA in
contrast to equal marginal disutility. There does, however, remain a sizeable
(but smaller) incidence of ANA and ACMA.
WTP estimates for the value of total travel time savings ($/hr.) based on
Equation (16.23), obtained for all four models, are summarized in Table 16.3
and in Figure 16.2. The averaging is undertaken for the random parameters as
per Equation (16.23). We find that the mean estimates increase as we account
for attribute processing. On a test of statistical differences of VTTS estimates,
the z values are greater than 1.96 (ranging from 13.72 for M1 versus M3 to
31.7 for M3 versus M4), except for the comparison of Models 2 and 4 (z =
0.85). Thus we can conclude that adding a layer of random parameters to the
model that accounts for FAA, ANA, and ACMA does not result in a statisti-
cally significant difference in the mean estimate of VTTS (M2 versus M4);
however, this is not the situation for comparisons between the fixed parameter
models M1 and M2, or between the random parameters models M3 and M4,
where attribute processing clearly influences VTTS in an upward direction.
We also observe that the incorporation of attribute processing reduces the
standard deviation of the VTTS quite considerably for both the fixed and
random parameter models, as well as increasing the mean estimate of VTTS.
What this suggests is that it is the allowance for attribute processing, and not the
allowance for preference heterogeneity within classes through random para-
meters, that is the key influence on the higher mean estimate of VTTS and
accompanying lower standard deviation. Model 3 is of particular interest, since it
suggests that in the absence of allowance for FAA, ANA, and ACMA, the mean
estimate of VTTS is significantly deflated but with an inflated standard deviation
when preference heterogeneity through random parameters is accommodated.
16.4.2 Conclusions
This section has introduced a generalization of the fixed parameter LCM
through a layering of random parameters within each class, and the redefinition
of classes as probabilistic decision rules associated with two specific attribute
processing rules. We implemented this extended model structure in the context
of a toll versus free road choice setting and estimated four models as a way of
seeking an understanding of the role of attribute processing in the presence of
fixed or random parameters within each probabilistic decision rule class.
What we find, for the data set analyzed, is that if attribute processing is
handled through discrete distributions defined in a sufficiently flexible way,
task, not only for the attribute processing strategies we have assessed, but for a broader set of heuristics on how attributes are processed (see Chapter 21). There is the possibility that our findings might be different for different data sets; this is, however, not a concern about our evidence, but rather a reminder that behavioral processes are often context dependent. If additional studies support the evidence here on many occasions, then there is a case for recognizing the practical value of selecting a latent class framework with fixed parameters for attribute processing, given that inclusion of random parameters adds very little in terms of predictive performance while adding significant complexity in estimation.
Other authors have used the latent class structure to compare processing heterogeneity with regard to several types of behavioral processes, with other types of heterogeneity (e.g., scale, see Thiene et al. 2012, and taste, see Hess et al. 2012). Although they deal with different decision processes and use different model specifications, they offer general findings on the confounding issue that is discussed in this chapter. They propose, like us, a latent class (or probabilistic decision process) approach with some conditions imposed on classes to reflect a decision process. They then layer additional heterogeneity on top (random taste or scale) to establish the robustness of both the specifications of heterogeneity and the alternative model specifications that represent the different decision processes. They conclude that the latent class approach has great merit as a framework within which to represent multiple decision processes, with and without a random parameter treatment.

6 Noting that in all of the estimated models we have preference heterogeneity of some kind, whether discrete or continuous.
7 We are aware of only two studies that have estimated random parameter latent class models allowing for ANA (Hess et al. 2012; Collins et al. 2013). They are not directly comparable with the current evidence in this chapter because they do not allow for ACMA and multiple FAA classes, using instead a single FAA class. The main finding, however, of both of these studies is that the inclusion of random parameters and ANA does improve the model fit. There is thus a consistent message under different assumptions re the role of random parameters. Studies that introduce random parameters into LCMs without attribute processing are Greene and Hensher (2012), who found clear improvement with random parameters added to a LCM; Bujosi et al. (2010), who also found an improvement, albeit with the random parameters making only a small contribution (in line with only a small improvement of a RPL model over the MNL); and Vij et al. (2012), who do not report the LCM without random parameters, so a comparison cannot be made.
load;file=C:\projects-active\Northlink\Modeling\Brisb08_30Oct.sav$
create
;time = ff+sdt+sst ? total travel time
;cost = rc+tc ? total trip cost
;if(tc#0)tollasc=1 ? dummy for the tolled alternatives
;sdst = sdt+sst ? slowed-down plus stop/start/crawling time
;ttime = ff+sdst ? total travel time (free-flow plus congested)
;tcost = rc+tc$ ? total trip cost (running plus toll)
;lcm
;model:
U(Curr) = FF*FF + SDST*SDT + sdst*sst+RC*rc +TC*Tc +ref
+commref*commute +prea*prea+prla*prla/
U(AltA) = FF*FF + SDST*SDT + sdst*sst+ RC*rc +TC*Tc+sc1
+prea*prea+prla*prla/
U(AltB) = FF*FF + SDST*SDT + sdst*sst+ RC*rc +TC*Tc
+prea*prea+prla*prla
;rst=
? FAA1:
b1ff,b2sdt,b3rc,b4tc,bref,bcomr,bpea,bpla,bsc1, ? class 1
? FAA 2:
bx1aff,bx2sdt,bx3rc,bx4tc,bxref,bxcomr,bxpea,bxpla,bxsc1, ? class 2
? FAA 3:
by1ff,by2sdt,by3rc,by4tc,byref,bycomr,bypea,bypla,bsc1, ? class 3
? ANA 1:
0,b2asdt,b3arc,b4atc,bref2,bcomr2,bpea2,bpla2,bsc2, ? class 4
? ANA 2:
b1bff,0,b3brc,b4btc,bref3,bcomr3,bpea3,bpla3,bsc3, ? class 5
? ACMA:
bffsdt,bffsdt,brctc,brctc,bref4,bcomr4,bpea4,bpla4,bsc4$ ? class 6
\Pr(j\,|\,q) = \frac{\exp(\lambda_q\, \mathbf{x}_{it,j}'\beta)}{\sum_{j=1}^{J_i} \exp(\lambda_q\, \mathbf{x}_{it,j}'\beta)}. \quad (16.25)
This scaled latent class estimator can be requested with any LCM specification by adding ;SLCL. An example is provided below. It is important to recognize that a comparison of parameter estimates between a MNL and a LCM is not possible, since each model is subject to a different scaling of the parameter estimates, related to the scale factor of the unobserved Gumbel error component. For each model, these scale parameters are normalized in estimation (essentially to 1.0), thereby preventing any meaningful comparison of parameter estimates between the two models. Comparison of the marginal rates of substitution (i.e., WTP) between attributes within one model does make sense, since the scale effect is neutralized.
As an aside, the scaled LC model is not a simulation-based estimator; the scale factors are fixed parameters, not random parameters.
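The scale point can be made concrete: multiplying all utility coefficients by a common λ_q, as in Equation (16.25), changes every parameter estimate but cancels from any coefficient ratio, which is why WTP comparisons survive while raw parameter comparisons do not. A minimal sketch with illustrative numbers:

```python
import math

def wtp_ratio(beta_attr, beta_cost):
    """Marginal rate of substitution, WTP = -beta_attr / beta_cost."""
    return -beta_attr / beta_cost

lam = 2.5  # an arbitrary class-specific scale factor
base = wtp_ratio(-0.06, -0.02)
scaled = wtp_ratio(lam * -0.06, lam * -0.02)
assert math.isclose(base, scaled)  # the scale factor cancels in the ratio
```

The same cancellation argument is what licenses comparing WTP, but not coefficients, across differently scaled models.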
---------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -199.97662
Estimation based on N = 210, K = 5
Inf.Cr.AIC = 410.0 AIC/N = 1.952
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
----------------------------------------------------------------------------------------
Scaled Latent Class MNL Model
Dependent variable MODE
Log likelihood function -195.43089
Restricted log likelihood -291.12182
Chi squared [ 7](P= .000) 191.38186
Significance level .00000
McFadden Pseudo R-squared .3286972
Estimation based on N = 210, K = 7
Inf.Cr.AIC = 404.9 AIC/N = 1.928
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
No coefficients -291.1218 .3287 .3212
Constants only -283.7588 .3113 .3035
At start values -199.9788 .0227 .0118
Response data are given as ind. choices
Number of latent classes = 2
Average Class Probabilities
.462 .538
LCM model with panel has 30 groups
Fixed number of obsrvs./group= 7
BHHH estimator used for asymp. variance
Number of obs.= 210, skipped 0 obs
----------+----------------------------------------------------------------------------
Standard Prob. 95% Confidence
MODE| Coefficient Error z |z|>Z* Interval
----------+----------------------------------------------------------------------------
|Random LCM parameters in latent class -->> 1
GC|1| -.00943** .00382 -2.47 .0136 -.01692 -.00194
TTME|1| -.05564*** .01988 -2.80 .0051 -.09461 -.01668
\mathrm{Prob}(i, j\,|\,q) = \frac{\exp(\beta_q' \mathbf{x}_{i,j})}{\sum_{j=1}^{J} \exp(\beta_q' \mathbf{x}_{i,j})}. \quad (16.27)
β_q is one of the 2^K possible vectors β in which m of the elements are zero and K − m are non-zero. Specifically, q can be thought of as a masking vector of the form (δ_1, δ_2, δ_3, δ_4, . . .), where each δ takes the possible values 0, 1. β_q is then the element-by-element product of this masking vector with the standard coefficient vector β, indicating that the masking vector interacts with the coefficient vector. For example, for two attributes (and hence four classes), the parameter vectors would appear as β_1 = (0, 0), β_2 = (β_A, 0), β_3 = (0, β_B), β_4 = (β_A, β_B).8 However, it is an important part of
the underlying theory that the class q is not defined by the attribute taking value
zero within the class but by the corresponding coefficient taking the value zero.
Thus the “random parameters” aspect of the model is a discrete distribution of
preference structures across individuals who are distinguished by whether they
pay attention to the particular attribute or not.
Since (in our case) the sorting is not observable, we cannot directly con-
struct the likelihood function for estimation of the parameters. In keeping
with the latent class approach, we need to estimate a set of probabilities (πq)
that each individual i falls into class q. While this could be conditioned on
individual characteristics, in this case we have assumed that the same set
applies equally to all respondents, so that the probabilities reflect the class
proportions.
Hence the marginal probability that individual i will choose alternative j is
found by averaging over the classes, as in Equation (16.28):
As formulated, this is a type of finite mixture, or LCM. It differs from more familiar
formulations in that the non-zero elements in βq are the same across the classes
and the classes have specific behavioral meaning, as opposed to merely being
groupings defined on the basis of responses, as in the strict latent class formulation,
hence the reference to a probabilistic decision process model. Estimation of the
probabilistic decision process model is as straightforward as a latent class MNL
model with linear constraints on the coefficients, as suggested above.
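The masking construction is easy to enumerate. A sketch that builds all 2^K candidate coefficient vectors β_q from a common β (the function name and numbers are ours):

```python
from itertools import product

def masking_classes(beta):
    """All 2^K class-specific coefficient vectors: each class applies a 0/1
    masking vector (delta_1, ..., delta_K) element by element to beta."""
    K = len(beta)
    return [tuple(d * b for d, b in zip(delta, beta))
            for delta in product((0, 1), repeat=K)]

classes = masking_classes((-0.05, -0.30))  # two attributes -> four classes
```

With K = 2 this reproduces the four vectors listed above, from the fully non-attended (0, 0) class through the full attendance class (β_A, β_B).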
It should be noted that although the 2^K approach offers plenty of scope to
investigate a number of attribute non-attendance profiles, we are of the view
that imposing behaviorally plausible conditions through the restriction
8 In this example, there is one unrestricted parameter vector in the model, shown as β4 = (βA,βB). The other
parameter vectors are constructed from the same two parameters, either by setting one or both elements
to zero or by equating elements to those in β4. Thus, β3 = (0, βB) is obtained as a linear restriction on β4,
namely that one element equal zero and a second element equal the corresponding element in β4.
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:37:43 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.020
Cambridge Books Online © Cambridge University Press, 2015
The 102, 103, or 104 means that in the RHS list, the first 2, 3, or 4 attributes
are endogenous, and there will be 2², 2³, or 2⁴ classes. Nlogit allows up to 4
endogenous attributes, which produces 16 classes. Altogether it allows up to
300 parameters. The specification does proliferate parameters very quickly. If
you have ;pts=104 and 3 other attributes, you would have 16*(3 + 4 + 1) = 128
parameters. This is the binding constraint, even though the parameters are
repeated in the formulation of the model. The model output below, estimated
with ;pts=103, has 2³ (or 8) classes in which various parameters are set to zero:
LClogit
;lhs=choice1,cset3,Alt3
;choices=Curr,AltA,AltB
;rhs=congt,rc,tc
;rh2=one
;lcm
;pts=103
;pds=16$
Normal exit: 5 iterations. Status=0. F= 3461.130
----------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -3461.12961
Estimation based on N = 4480, K = 5
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC 1.54738 6932.25923
Fin.Smpl.AIC 1.54738 6932.27264
Bayes IC 1.55453 6964.29612
Hannan Quinn 1.54990 6943.55032
Model estimated: Sep 16, 2010, 14:03:16
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .; RHS=ONE$
Chi-squared[ 3] = 467.23551
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 4480, skipped 0 obs
-----------+---------------------------------------------------------------------------
| Standard Prob.
CHOICE1| Coefficient Error z z>|Z|
-----------+---------------------------------------------------------------------------
CONGT|1| -.07263*** .00464 -15.65 .0000
RC|1| -.33507*** .03749 -8.94 .0000
TC|1| -.27047*** .02198 -12.31 .0000
A_CURR|1| .89824*** .05751 15.62 .0000
A_ALTA|1| -.05025 .05603 -.90 .3698
-----------+---------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------------------------
Chapter 17
Binary choice models
17.1 Introduction
This chapter introduces one of the fundamental pillars of choice modeling, the
canonical model for choice between two alternatives. At the most basic level,
the model describes the choice between taking an action and not taking that
action – i.e., whether or not to use public transport to commute to work,
whether or not to purchase a car, whether or not to accept an offered plan for
delivery of a utility service such as electricity, and so on. A straightforward
extension that provides a bridge to most of the choice models discussed
elsewhere in this book describes the choice between two specific alternatives –
i.e., whether to use public transport or drive one’s own car to commute to
work, whether to choose a new technology (e.g., electric) vehicle or a con-
ventionally powered vehicle, or whether to choose a utility plan that includes
time varying rates or one that does not (but includes other desirable features).
We begin with the essential binary choice between an outcome and “not.”
Issues of specification, estimation, and inference are detailed. We will then
extend the model in several directions, concluding with multiple equation
situations and analysis of panel data. Some of the econometric presentation is
an interpretation of material already covered in earlier chapters; however, we
believe it is useful to include the material here as a way of relating the essential
elements to the popular binary choice model.
We begin with two essential assumptions that underlie the choice modeling
strategy throughout this book:
Ui1 > Ui0

or

Ui1 − Ui0 > 0.
1 A technical fine point: for this formulation to be completely consistent, we will ultimately require β1 to contain a constant term. If not, then rather than zero, we will have to choose some arbitrary constant for Ui0, and that will end up being the constant term in β1.
where μ1 and σ1 are the mean and standard deviation of εi1 and Φ(.) is the
standard normal CDF.
We return to the role of the constant term in the model. If β1 contains a
constant term, say α, then β1′xi1 − μ1 = (α − μ1) + γ1′xi1*, where γ1 is the rest of
2 We are ultimately interested in rich specifications that involve complicated choice processes based on
numerous attributes and that take advantage of observable data on individual heterogeneity such as
income, age, gender, location, etc. The third path for model building, non-parametric analysis, for all its
generality, is difficult to extend to these multi-layered settings. We will not be considering non-parametric
analysis in this book. Readers are referred to more specialized sources such as Henderson and Parmeter
(2014) or Li and Racine (2010) for discussions of this topic.
3 Modern software packages such as NLOGIT and Stata provide menus of distributions for binary choice models
that include seven or more choices. Those beyond the probit and logit are easy to program, but not
particularly convincing methodologically. The availability of these exotic models (e.g., ArcTangent) is not
a persuasive motivation for using them as a modeling platform. For better or worse, the probit and logit
models remain the dominant choices. We will consider the choice between these two shortly.
β1 and xi1* is the rest of xi1 not including the constant. So, the model with a
constant term equal to α and mean of εi1 equal to μ is exactly the same as the
model with mean of εi1 equal to zero and a constant term equal to (α – μ1) = α*.
This means that, in our model, we must either drop α or we must drop μ1. It
turns out to be much more convenient later to normalize μ1 to zero and let the
constant term in the utility function be called α, or γ0, etc.
Now, consider the variance. At this point, our model is:

Prob(Ui1 > 0) = Prob(β1′xi1 + εi1 > 0) = Φ(β1′xi1 / σ1).   (17.8)
Bearing in mind that we do not observe utility, but only whether utility is positive
or not (i.e., only whether the individual chooses alternative 1 or not), consider
what happens if we scale the whole model by a positive constant, say C. Then:

Prob(CUi1 > 0) = Prob(Cβ1′xi1 + Cεi1 > 0) = Φ(Cβ1′xi1 / (Cσ1)) = Φ(β1′xi1 / σ1).   (17.9)
The implication is that our model is the same regardless of what σ1 is. To
remove the indeterminacy, we set σ1 to 1. This makes sense. It is important to
note that this is not an “assumption” (at least, not one with any content or
implication). This is a normalization based on how much information about
our model will be contained in the observed data. Intuitively, the data contain
no information about scaling of the utility function, only its sign as revealed by
whether choice 1 is made or not. This sign does not change if σ changes. (In
fact, when our model is based on the logistic distribution, the assumed
standard deviation is π/√3, not 1. The essential point is that it is a fixed
known value, not a parameter to be estimated.)
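The invariance to scaling can be verified directly. A minimal sketch (with hypothetical parameter values; the normal CDF is built from math.erf):

```python
import math

def phi_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta, x, sigma = 0.8, 1.5, 2.0   # hypothetical values
p = phi_cdf(beta * x / sigma)

for C in (0.5, 3.0, 10.0):
    # Scaling utility by C scales both beta'x and sigma, leaving the
    # choice probability - and hence the sign of utility - unchanged.
    p_scaled = phi_cdf((C * beta) * x / (C * sigma))
    assert abs(p_scaled - p) < 1e-12

print(round(p, 4))
```

Whatever value C takes, the data-identified quantity β1′xi1/σ1 is unchanged, which is why σ1 can be normalized to one.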
To complete this part of the specification, we return to the choice between
two specific alternatives:
Choose alternative 1 if Ui1 − Ui0 = (β1′xi1 + εi1) − (β0′xi0 + εi0)
                                  = β1′xi1 − β0′xi0 + εi1 − εi0
                                  = β′xi + εi > 0.   (17.10)
One detail left unconsidered thus far is the possibility of correlation between
εi1 and εi0. Denote that correlation ρ10. Note, once again, the information
contained in the sample. We only observe whether choice 1 or choice 0 is
made. The decision turns on the difference of the two utility functions.
Whether the correlation of εi1 and εi0 is non-zero (ρ) or not has no influence
on the observed outcome. The upshot is that when modeling the utility-
maximizing choice between two alternatives, we have no information on
correlation across the two utility functions, as we can only observe the sign
of the difference. Therefore, we normalize the correlation at zero. (Once again,
it is important to note, this is not a substantive assumption. It is a normal-
ization that is mandated by the fact that we only observe the one sign of the
difference of the utility functions.4)
where the indicator function 1[condition] equals one if the condition is true and
zero if not. With these normalizations, our model in terms of the observed data is:

Prob(Yi1 = 1 | xi1) = Φ(β1′xi1),
4 We have left one final loose end in this derivation. In the two specific-choice cases, we have two random terms. Could they have different variances – σ1² and σ0²? Under some additional assumptions, yes. Note, for example, with ρ = 0, we now require that σ1² + σ0² = 1. Obviously, it is not possible to estimate both, or to distinguish σ1 from σ0. However, one might think that if σ1 were fixed at some value, then we could estimate σ0. Under some circumstances, it is indeed possible to estimate the ratio, σ1/σ0. This is a complication of the model, a type of heteroskedasticity, that we will consider later. For the present, given the simple assumptions we have made so far, the data do not provide information about this type of heteroskedasticity either.
where Φ(.) denotes the standard normal CDF. In the statement above, we
have assumed that the mean and variance of ε are zero and one, respectively.
If, instead, we build our model around the logistic distribution, then
Prob(Yi1 = 1 | xi1) = Λ(β1,L′xi1), where Λ(.) denotes the standard logistic
CDF:

Λ(β′xi) = exp(β′xi) / [1 + exp(β′xi)].   (17.13)
where f(.) is the respective density (normal or logistic) and β1,k is the
corresponding coefficient. The two models do appear to involve different
coefficients, but at the same time the slopes, or partial effects, implied by the
two models are (as we would hope) essentially the same. If so, then for the
particular variable, xk, the partial effects:
Over much of the range of the probabilities that one encounters in practice,
say about 0.4 to about 0.6, this ratio is near 1.6 (or a bit less). This explains the
empirical regularity. The estimator, to the extent that it is able, scales the
coefficients so that the predicted partial effects are roughly the same.
Consider, finally, the second choice case we examined, two specific
alternatives:
Choose alternative 1 if Ui1 − Ui0 = (β1′xi1 + εi1) − (β0′xi0 + εi0)
                                  = β1′xi1 − β0′xi0 + εi1 − εi0
                                  = β′xi + εi > 0.   (17.17)
If the two random components have a type I Extreme Value distribution, with
CDF F(ε) = exp(−exp(−ε)) (see Chapter 4), then the choice probability takes the
familiar logit form:

Prob(Ui1 − Ui0 > 0 | xi1, xi0) = exp(β1′xi1) / [exp(β1′xi1) + exp(β0′xi0)].   (17.19)
We should note some special cases. First, consider a variable, zi, such as age or
income, and let the coefficients of zi be γ1 and γ0. The choice probability is:

Prob(Ui1 − Ui0 > 0 | xi1, xi0) = exp(β1′xi1 + γ1zi) / [exp(β1′xi1 + γ1zi) + exp(β0′xi0 + γ0zi)].   (17.20)
Suppose that the two coefficients for income are the same. Then the probability is:

Prob(Ui1 − Ui0 > 0 | xi1, xi0, zi)
  = exp(β1′xi1 + γzi) / [exp(β1′xi1 + γzi) + exp(β0′xi0 + γzi)]
  = [exp(γzi)exp(β1′xi1)] / [exp(γzi)exp(β1′xi1) + exp(γzi)exp(β0′xi0)]
  = exp(β1′xi1) / [exp(β1′xi1) + exp(β0′xi0)].   (17.21)
Income has disappeared from the probability. This implies a general result.
When comparing the utility functions of different alternatives, variables that
do not vary between (among) the alternatives must enter with different
coefficients in each utility function; otherwise, they cancel out of the
probability (also detailed in Chapter 3).
A second case to examine is that in which the coefficients on attributes of
the choices are the same – that is typical when the coefficients are interpreted
as marginal utilities that do not change from one choice to the next. The
relevant probability is:
Prob(Ui1 − Ui0 > 0 | xi1, xi0) = exp(β′xi1) / [exp(β′xi1) + exp(β′xi0)]
                               = exp(β′(xi1 − xi0)) / [1 + exp(β′(xi1 − xi0))].   (17.22)
The natural result is that when comparing the utilities of alternatives in which
the marginal utilities are the same, we base the comparison on the differences
between the attributes.
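Both special cases are easy to confirm numerically. The sketch below (with hypothetical attribute and coefficient values) checks that a variable entering both utilities with a common coefficient cancels, and that with common marginal utilities the probability depends only on the attribute differences:

```python
import math

def binary_logit_prob(v1, v0):
    """Prob(alt 1) when both error terms are type I Extreme Value."""
    return math.exp(v1) / (math.exp(v1) + math.exp(v0))

beta = [-0.4, 0.2]                  # common marginal utilities (assumed)
x1, x0 = [2.0, 1.0], [1.0, 3.0]     # attributes of the two alternatives

v1 = sum(b * x for b, x in zip(beta, x1))
v0 = sum(b * x for b, x in zip(beta, x0))
p = binary_logit_prob(v1, v0)

# 1) A variable common to both utilities with equal coefficients cancels.
gamma_z = 0.7 * 45.0                # e.g., gamma * income
assert abs(binary_logit_prob(v1 + gamma_z, v0 + gamma_z) - p) < 1e-12

# 2) With common coefficients, only attribute differences matter:
#    P = exp(beta'(x1 - x0)) / (1 + exp(beta'(x1 - x0))).
vd = sum(b * (a - c) for b, a, c in zip(beta, x1, x0))
assert abs(math.exp(vd) / (1 + math.exp(vd)) - p) < 1e-12

print(round(p, 4))
```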
The preceding shows different aspects of the functional form for binary choice
models. We will use a generic format, given in Equation (17.23), to indicate the
general result, and note the special cases when they arise in our applications:

Prob(yi = 1 | xi) = F(β′xi).   (17.23)
The two distributions we are interested in are symmetric. For both probit and
logit models:
qi = 2yi – 1. (17.26)
∂logL/∂β = Σ_{i=1}^{n} [(yi − Fi) / (Fi(1 − Fi))] fi xi = Σ_{i=1}^{n} gi xi.   (17.29)
This messy expression simplifies considerably for the two models we are
considering. Using the derivative results given earlier, for the probit model,
it reduces to:
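The simplification is cleanest in the logit case, where fi = Fi(1 − Fi), so the score collapses to gi = yi − Λi. A small sketch (hypothetical data and coefficients) builds the gradient this way; it is easy to verify against a finite-difference derivative of the log-likelihood:

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# Small hypothetical sample: x includes a constant and one regressor.
data = [([1.0, 0.5], 1), ([1.0, -1.2], 0), ([1.0, 2.0], 1), ([1.0, 0.3], 0)]
beta = [0.1, 0.4]

# General score: g_i = (y_i - F_i) f_i / (F_i (1 - F_i)); for the logit,
# f_i = Lambda_i (1 - Lambda_i), so the score collapses to y_i - Lambda_i.
grad = [0.0, 0.0]
for x, y in data:
    Li = logistic(sum(b * xk for b, xk in zip(beta, x)))
    gi = y - Li
    grad = [g + gi * xk for g, xk in zip(grad, x)]

print(grad)
```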
VROBUST = VE [Σ_{i=1}^{n} gi² xi xi′] VE.   (17.37)
Pseudo-R² = 1 − log L(model) / log L(base model).   (17.38)
The base model would be a model that contains only a constant term. It is easy
to show that for any binary choice model (probit, logit, other), the base model
would have:
ŷi = 1[F(β̂′xi) > 0.5].   (17.40)
The logic is that if the model predicts that y = 1 is more likely than y = 0, we
predict y = 1, and vice versa. Different tallies of the successes and failures of
this rule can then be constructed, such as Cramer's measure:
λ̂ = Σ_{i=1}^{N} yi F̂i / N1 − Σ_{i=1}^{N} (1 − yi) F̂i / N0
  = (Mean F̂i when y = 1) − (Mean F̂i when y = 0).   (17.41)
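Cramer's measure is simple to compute from fitted probabilities. A toy illustration (hypothetical fitted values and outcomes):

```python
# Hypothetical fitted probabilities and outcomes
y    = [1, 0, 1, 1, 0, 0, 1, 0]
Fhat = [0.8, 0.3, 0.6, 0.7, 0.4, 0.2, 0.9, 0.5]

ones  = [f for f, yi in zip(Fhat, y) if yi == 1]
zeros = [f for f, yi in zip(Fhat, y) if yi == 0]

# Cramer's measure: mean fitted probability among the y=1 observations
# minus the mean fitted probability among the y=0 observations.
lam = sum(ones) / len(ones) - sum(zeros) / len(zeros)
print(round(lam, 3))  # -> 0.4
```

A model that separates the two outcomes well pushes the measure toward one; a model with no discriminating power gives a value near zero.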
The derivative term is the density that corresponds to the probability model,
i.e., the normal density, φi for the probit model and Λi(1−Λi) for the logit
model. In all cases, the partial effects are scaled versions of the coefficients.
Two issues that arise in computing partial effects are:
• Since the partial effects involve the data, at which values of xi should δik be
computed? “Partial Effects at the Means” are often obtained by replacing xi
in the computation with the sample means of the data. More common in
the recent literature is the Average Partial Effect, which is computed by
evaluating δik at each sample point in the data set and averaging the results.
• Since the partial effects are non-linear functions of the parameter estimates
(and the data), some method is needed to calculate standard errors for the
estimates of δik(β,xi). The delta method and the Krinsky–Robb method
discussed in Chapter 7 are generally used for this purpose.
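The distinction between the two evaluation points can be sketched for a logit model (hypothetical coefficients and data):

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical estimates and data (constant plus one continuous variable)
beta = [-1.0, 0.5]
X = [[1.0, x] for x in (0.2, 1.5, -0.7, 2.3, 0.9)]

# Average partial effect: evaluate Lambda_i(1 - Lambda_i) * beta_k at each
# observation, then average over the sample.
dens = [logistic(sum(b * xk for b, xk in zip(beta, x))) for x in X]
ape = sum(d * (1 - d) for d in dens) / len(X) * beta[1]

# Partial effect at the means: evaluate once at the sample means of x.
xbar = [sum(col) / len(X) for col in zip(*X)]
Lbar = logistic(sum(b * xk for b, xk in zip(beta, xbar)))
pea = Lbar * (1 - Lbar) * beta[1]

print(round(ape, 4), round(pea, 4))
```

In this small sample the two values are close, but as the text notes, the difference can be substantive in small samples.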
In many applications, the exogenous variables, xi, include demographics such
as gender, marital status, age categories, and so on, that are coded as binary
variables. The derivative expression above would be inappropriate for mea-
suring the impact of a binary variable. The more common calculation is a first
difference:
Consider, now, the change in the odds that results from a change in a dummy
variable, z:
5 For any variable, say z, the ratio of odds ratios when z changes by one unit is OR(x, z+1)/OR(x, z) = exp(βz).
6 The data can be downloaded from the Journal of Applied Econometrics data archive.
that this is clearly not the case. We assume it for the present for simplicity.) The
sample of observations on the insurance takeup variables is shown in Table 17.2.
The variables we will examine are summarized in Table 17.3.
Tables 17.4 and 17.5 display estimated probit and logit models for the add
on insurance takeup variable. There is (as should be expected) little to
distinguish the two models. The LLs and other diagnostic statistics are the
same. Based on the chi squared value, the model as a whole is found to be
statistically significant. The patterns of signs and significance of the coeffi-
cients are the same for the two models as well. The predicted scaling effect
between the logit and probit coefficients is evident in the results as well,
though the difference is greater than the familiar 1.6 in our results. This is
to be expected. Recall, the scaling acts to (more or less) equalize the partial
effects predicted at the middle of the data for the two models. In our add on
insurance model, the average outcome is only 0.0188, which is far from 0.5
where the 1.6 result is most obvious. But, here the anticipated ratio would be:
Given this result, we should expect the logit coefficients in Table 17.5 to be
roughly 2.5 times the probit coefficients in Table 17.4, which they are. For
example, the AGE coefficient in Table 17.5, 0.01776, is about 2.6 times that in
Table 17.4, 0.00678. The results suggest that older, more educated, female, and
higher income individuals are more likely than others to take up the add on
insurance. Marital status and presence of young children in the household
matter less than the other variables. Predictably, those who perceive them-
selves as more healthy are less likely to take up the add on insurance. (The add
on insurance enhances the coverage for hospitalization.) Table 17.5 also shows
the “odds ratios” for the estimated logit model. Which of these results is more
informative would be up to the analyst. We find the coefficients and associated
partial effects generally more informative.
Table 17.6 displays several fit measures for the probit model (the results
are essentially the same for the logit model) and illustrates the difficulty of
contriving a scalar analog to the familiar R2 used for linear regression. As
noted, there are two types of measures suggested for the binary choice
model. The first is based on the LL, and modifies McFadden’s original
suggestion:
7 We note the availability of a Stata program called FitStat that will produce about twenty different “fit” measures for binary choice models.
Efron = 1 − [Σi (yi − P̂i)²] / [Σi (yi − n1/n)²]   (17.51)

and

Ben-Akiva and Lerman = Σi [yi P̂i + (1 − yi)(1 − P̂i)] / n.   (17.52)
Table 17.7 Estimated partial effects for logit and probit models
PARTIALS ; Effects : <x> ; Summary $
Semielasticities
--------------------------------------------------------------------------------------------------
Partial Effects for Logit:Probability(ADDON=1)
Partial Effects Computed at data Means. Log derivatives
*==> Partial Effect for a Binary Variable
-------------------------------------------------------------------------------------------------
Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
-------------------------------------------------------------------------------------------------
AGE .01748 .00472 3.71 .00824 .02672
EDUC .15989 .01550 10.31 .12950 .19027
FEMALE .32102 .09075 3.54 .14315 .49889
MARRIED .08501 .12064 .70 -.15145 .32146
HHKIDS .10366 .10369 1.00 -.09957 .30690
INCOME 1.47777 .16767 8.81 1.14915 1.80639
HEALTHY -.01222 .09432 .13 -.19708 .17264
--------------------------------------------------------------------------------------------------
observation in the sample separately, then averaging the effects. This usually
makes relatively little difference in the results, though that is not axiomatic –
the difference in a relatively small sample can be substantive. As a general rule,
researchers prefer to use the latter approach, average partial effects, when
possible. Table 17.8 illustrates another, practical complication. The sample
means of the data do not reveal which variables are dummy variables and
which are not, whereas that aspect is obvious when computing the average
partial effects. One might be interested in a combination of these two ways of
evaluating partial effects. In Table 17.9, we have computed the average partial
effect for 40-year-old individuals with average income and 16 years of
education.
The partial effect for AGE in Table 17.8 illustrates, once again, the
ambiguity of the simple derivatives as measures of the impact of the
variables on the outcome under study. The estimated effect of 0.0003
seems extremely small. But, again, the probability of takeup varies around
0.0188, and AGE ranges from 25 to 65 in the data set. A change in age of
10 years would be associated with a 0.003 impact on the probability,
which is about one-sixth of the average probability itself. Figure 17.1 and Table 17.10 trace the
estimated probability of takeup over the range of the sample data. We
find that the average occurs at about age 40, which we can see in
Table 17.3. But, over the range of the data, from 25 to 64, the estimated
takeup probability ranges from 0.014 to 0.026 – i.e., it nearly doubles.
This is not a trivial effect at all.
Figure 17.1 Model simulation (estimated probability of add on insurance takeup, roughly 0.010 to 0.025, plotted against age, 25 to 64)
There are many variants of the binary choice models that we have devel-
oped here. These are treated in a variety of sources such as Greene (2012) and
Cameron and Trivedi (2005). In the next two sections, we will describe two of
these that appear in many studies in the literature, some panel data models,
and three bivariate probit models.
The GSOEP data we are using for our applications are an unbalanced panel, as
described at the beginning of Section 17.2.8. The natural next step in the
analysis would be to accommodate, or take advantage of, the panel data nature
of the data set. At the start, this is occasionally a source of some confusion.
What distinguishes a “panel data” treatment from what we have already done
in Section 17.2? In what follows, we will take a working framework for panel
data analysis to mean some sort of treatment or specification that explicitly
recognizes the correlation of unobservable or unobserved heterogeneity across
the observations within a group. Consider the base case of our random utility
model in Equation (17.53):

Uit = β′xit + εit.   (17.53)
The double subscript “it” indicates that the observation applies to individual i
in period t. In particular, for our example of takeup of add on insurance, we
have observed the individual in several (up to seven) years. This framework
need not involve a time series of observations such as this, however. There
are many SC experiments, some described elsewhere in this book, in which the
sampled individual is offered a sequence of choice settings – for example, for
different configurations of travel mode, or road formats, or utility contracts.
Each of these is logically the same as the panel data case suggested.
Thus far, we have not distinguished a “panel data” approach from what we
have already done. The model above is precisely what we have used in our
example. But it would seem obvious that an element of the random utility
specification would be characteristics of the chooser that are intrinsic and
unchanging aspects of their preferences. A modification of the RUM that
would accommodate this possibility would be as in Equation (17.54):

Uit = β′xit + αi + εit.   (17.54)
The new term, αi, would include intrinsic characteristics of the individual i
that are unmeasured and unobserved by the analyst. (Observed heterogeneity,
such as gender, would present no interesting challenge here – observed
heterogeneity would simply be included in xit.)
As in the linear regression setting, in order to proceed with the analysis in the
presence of this possibly crucial new variable, assumptions are necessary. The two
standard cases that carry over from the regression case to this binary choice model are:
Fixed effects: E[αi | xi1, xi2, . . ., xiT] = g(Xi). The heterogeneity is correlated
with Xi.
Random effects: E[αi | xi1, xi2, . . ., xiT] = 0. The heterogeneity is not correlated
with Xi.
For each case, we consider the implication of the condition for the conven-
tional MLE, and then examine formal procedures that extend the familiar
model to this new setting.
VCLUSTER = H⁻¹ [ (N/(N−1)) Σ_{i=1}^{N} (Σ_{t=1}^{Ti} git xit)(Σ_{t=1}^{Ti} git xit)′ ] H⁻¹,   (17.57)
disadvantage of this estimator is that because the effects (the constant terms)
are conditioned out of the model and not estimated, it is not possible to
compute predicted probabilities or partial effects. As such, the fixed effects
approach is of limited usefulness, and only appears fairly rarely in received
applications.
Inserting the second equation into the first, then proceeding as before, we have
a random effects model to which the vector of group means (for the time
varying variables) is added to control for the correlation between the effects
and the other variables (Tables 17.13, 17.14).
The Mundlak approach (adding the group means to the model) suggests a
method of distinguishing between fixed and random effects. In Equation
(17.59), if the coefficients on the group means are all zero, then what remains
is a random effects model. The presence of the group means in the model is
necessitated by the conditions of the fixed effects model. Thus, a joint test of
the null hypothesis that the coefficients on the means are all zero is, de facto, a
8 Quadrature- and simulation-based estimation are discussed at length in Greene (2012). See also Chapter 5.
test of the null hypothesis of the random effects model against a broader
alternative. We can use a likelihood-ratio test (LRT). The LL for the random
effects model is −2074.52. That for the model that contains the means
is −2028.73. Twice the difference is 91.58. The critical value for the Chi-
square distribution with 6 degrees of freedom is 12.59. The null hypothesis
of random effects would be rejected in favor of a fixed effects specification.
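The arithmetic of this likelihood-ratio test is easy to reproduce; a sketch using scipy for the critical value:

```python
from scipy.stats import chi2

ll_re = -2074.52     # LL of the random effects model
ll_means = -2028.73  # LL of the model with the group means added

lr_stat = 2.0 * (ll_means - ll_re)   # likelihood-ratio statistic = 91.58
critical = chi2.ppf(0.95, df=6)      # 12.59 for the 6 group-mean coefficients
reject_random_effects = lr_stat > critical
```

Since 91.58 far exceeds 12.59, the random effects specification is rejected in favor of fixed effects, as stated in the text.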
results below illustrate how to introduce this sort of heterogeneity into the
binary choice model. (We have used the PUBLIC insurance takeup for this
exercise.) Table 17.15 shows a random parameters model. The fixed parameter
values of the coefficients are given in parentheses with the means of the
distributions. A random parameters approach suggests that one should
reinterpret the meaning of statistically different from zero. The fixed parameters
estimated fixed value is about 1.25 standard deviations from the estimated
mean. Zero is well over two standard deviations from the estimated mean,
which suggests nearly all of the mass of the distribution of the random
parameter on FEMALE is above zero. The null hypothesis of the fixed
(non-random) parameters model can be tested against the alternative of
the random parameters model using an LRT. The necessary values for this model are
shown in Table 17.15. The Chi-square statistic of 6,789.29 is far larger than
the critical value for 8 degrees of freedom, so the fixed parameter model is
rejected.
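The "mass above zero" reasoning can be reproduced for any estimated mean and standard deviation; the values below are hypothetical stand-ins (Table 17.15 itself is not reproduced here), chosen so that the mean lies 2.4 standard deviations above zero, consistent with the "well over two" in the text.

```python
from scipy.stats import norm

# Hypothetical estimates for a random parameter such as the one on FEMALE:
mean_hat, sd_hat = 0.60, 0.25            # illustrative, not the book's numbers

z_zero = (0.0 - mean_hat) / sd_hat       # standardized location of zero: -2.4
share_above_zero = 1.0 - norm.cdf(z_zero)  # P(beta_i > 0) across individuals
```

With zero 2.4 standard deviations below the mean, more than 99 percent of the implied distribution of the individual parameters is positive.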
9 See, for example, Greene and Hensher (2010), which includes nearly 100 pages specifically on binary choice modeling.
10 The model can be extended to more than two equations in analogous fashion. When the number of equations exceeds two, the probabilities become much more difficult to compute. See Section 4.3 for a discussion.
11 There is no natural form of the bivariate logit model, so from this point forward, we will focus on the probit model.
the correlation across the equations.12 For an example, our health care data
includes two measures of utilization of the health care system: the number of
doctor visits and the number of inpatient hospital visits. We have recoded these
to be DOCTOR = 1(DocVisits > 0) and HOSPITAL = 1(HospVisits > 0). One
would expect these to be correlated, though not perfectly so. Table 17.16
shows the mix of these two variables in our data.
Given that the two choice responses are binary, even after accounting for
the exogenous variables, ρ would not be defined as the familiar Pearson
product moment correlation as used for continuous variables; the so-called
“tetrachoric correlation”13 is used as the appropriate measure of the correlation between two binary variables. Looking at the data in Table 17.16, it is
unclear what to expect for the value of ρ. In view of the large off-diagonal
element, one might suspect it to be large and negative. We would measure the
simple, unconditional tetrachoric correlation for two binary variables as the
correlation coefficient in a bivariate probit model that contained only two
constant terms, and no regressors. For these data, that value is about +0.31. As
exogenous variables are added to the model, the correlation will move toward
zero. Some of the correlation across equations that is accounted for by omitted
variables is eliminated as the variables are added to the equations.
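As a sketch of the constants-only calculation, the tetrachoric correlation can be recovered from a 2×2 table alone by maximum likelihood, exactly as in a bivariate probit with only two constants; the cell counts below are hypothetical, not the book's data.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

# Hypothetical 2x2 counts for two binary indicators (e.g. DOCTOR x HOSPITAL):
n11, n10, n01, n00 = 350, 1800, 120, 2200
n = n11 + n10 + n01 + n00

# Thresholds implied by the marginals: y_k = 1 if the latent z_k > t_k.
t1 = norm.ppf((n01 + n00) / n)
t2 = norm.ppf((n10 + n00) / n)

def negloglik(rho):
    F = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]]).cdf
    p00 = F([t1, t2])                 # y1 = 0, y2 = 0
    p10 = norm.cdf(t2) - p00          # y1 = 1, y2 = 0
    p01 = norm.cdf(t1) - p00          # y1 = 0, y2 = 1
    p11 = 1 - norm.cdf(t1) - norm.cdf(t2) + p00
    p = np.clip([p00, p10, p01, p11], 1e-12, 1.0)
    return -np.dot([n00, n10, n01, n11], np.log(p))

res = minimize_scalar(negloglik, bounds=(-0.95, 0.95), method="bounded")
rho_hat = res.x        # positive for these counts, as with the text's +0.31
```

Adding exogenous variables to each equation would shrink this correlation toward zero, as the text notes.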
The LL for estimation of the bivariate probit model parameters is given in
Equation (17.61):
12 A technical motivation for fitting the two equations jointly is the possible gain in efficiency (reduction in standard errors) that might attend the FIML estimator compared to the two LIML estimators. This effect is likely to be minor, however, as suggested in our application.
13 The tetrachoric correlation is used when it is assumed that there are normally distributed latent continuous variables underlying the observed binary variables. The tetrachoric correlation estimates the correlation between the assumed underlying continuous variables. The formal definition of the tetrachoric correlation for two binary variables is exactly consistent with what we would define as the correlation of the two random terms in our bivariate probit model.
$$
\log L = \sum_{i=1}^{N} \log \Phi_2\left[ \left( q_{i1}\,\beta_1' x_{i1} \right),\; \left( q_{i2}\,\beta_2' x_{i2} \right),\; \left( q_{i1} q_{i2}\,\rho \right) \right]. \qquad (17.61)
$$
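Equation (17.61) translates almost line for line into code. The sketch below (not Nlogit's implementation) uses scipy's bivariate normal CDF for Φ₂; the demo data and coefficients are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_probit_loglik(beta1, beta2, rho, X1, X2, y1, y2):
    """Log-likelihood of Equation (17.61), with q_ij = 2*y_ij - 1."""
    q1, q2 = 2.0 * y1 - 1.0, 2.0 * y2 - 1.0
    w1, w2 = q1 * (X1 @ beta1), q2 * (X2 @ beta2)
    r = q1 * q2 * rho
    ll = 0.0
    for w1i, w2i, ri in zip(w1, w2, r):
        Phi2 = multivariate_normal(mean=[0.0, 0.0],
                                   cov=[[1.0, ri], [ri, 1.0]]).cdf
        ll += np.log(Phi2([w1i, w2i]))
    return ll

# Demo on a tiny synthetic sample (arbitrary coefficients):
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y1 = (rng.random(40) < 0.5).astype(float)
y2 = (rng.random(40) < 0.5).astype(float)
b = np.array([0.2, 0.5])
ll_rho0 = bivariate_probit_loglik(b, b, 0.0, X, X, y1, y2)
```

With ρ = 0 the bivariate CDF factors, so the value collapses to the sum of two univariate probit log-likelihoods — a useful numerical check on any implementation.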
For one example, Table 17.18 (based on Table 17.17) shows the partial effects
of the regressors on the conditional probability that the individual had at least
one hospital visit, given that they had at least one doctor visit.
There are many variants of the bivariate (and multivariate) probit models in
the received applications. We will consider two that are fairly common.
Table 17.17 Estimated bivariate probit model
15 Early treatments of this model, e.g., in Maddala (1983), wrote the formulation on the RHS in terms of the latent index rather than the observed outcomes. This does overcome the coherency problem, but it is not a natural formulation of the behavioral aspect of the specification.
y2 through its effect on y1 and thence to y2. For example, the conditional mean
for y2 given y1 is as given in Equation (17.66):
$$
E[y_2 \mid y_1 = 1, x_1, x_2] = \frac{\Phi_2\left( \beta_1' x_1,\; \beta_2' x_2 + \gamma y_1,\; \rho \right)}{\Phi\left( \beta_1' x_1 \right)}. \qquad (17.66)
$$
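A minimal numerical sketch of this conditional probability, with illustrative values standing in for the indexes β₁′x₁ and β₂′x₂ + γy₁:

```python
from scipy.stats import norm, multivariate_normal

def prob_y2_given_y1_eq1(idx1, idx2, rho):
    """Equation (17.66): Phi2(idx1, idx2, rho) / Phi(idx1),
    where idx1 = b1'x1 and idx2 = b2'x2 + gamma*y1."""
    Phi2 = multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, rho], [rho, 1.0]]).cdf
    return Phi2([idx1, idx2]) / norm.cdf(idx1)

p = prob_y2_given_y1_eq1(0.4, -0.2, 0.5)   # illustrative index values
```

Setting ρ = 0 reduces the expression to Φ(idx2), the unconditional probit probability, which is a quick sanity check.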
This decomposition is shown in Table 17.20 for the health care model
estimated in Table 17.19. Alternatively, the unconditional mean function is
as shown in Equation (17.67):
17 A common application in the received literature studies loan default (the binary outcome of interest) for those loan applicants whose application is accepted (i.e., are selected).
Table 17.21 displays the estimates of the model. In specifying the model, we
have defined the equation for PUBLIC based on the usual demographics. The
add-on insurance is associated with employer-provided health insurance, so
our equation for the add-on includes several variables related to that, such as
whether the individual is self-employed, is a public servant (BEAMT), and
whether they have a “blue collar” or “white collar” job. The model contains a
direct test of the “selection effect.” If ρ equals zero in the model, then the log
likelihood becomes:
$$
\begin{aligned}
\log L &= \sum_{y_{i1}=0} \log \Phi\left[ -\beta_1' x_{i1} \right] + \sum_{y_{i1}=1} \log\left\{ \Phi\left[ \beta_1' x_{i1} \right] \Phi\left[ q_{i2}\left( \beta_2' x_{i2} \right) \right] \right\} \\
&= \sum_{i=1}^{N} \log \Phi\left[ q_{i1}\,\beta_1' x_{i1} \right] + \sum_{y_{i1}=1} \log \Phi\left[ q_{i2}\left( \beta_2' x_{i2} \right) \right]. \qquad (17.70)
\end{aligned}
$$
This LL would be maximized by fitting separate probit models for yi1 and yi2,
using the observed data on yi2. This would not require any consideration of a
“selection” mechanism. In our results in Table 17.21, we find that the estimated correlation is significantly different from zero, which does suggest a
selection effect in the data.
17.4.3 Application I: model formulation of the ex ante link between acceptability and
voting intentions for a road pricing scheme
The model examined in this application is the recursive bivariate probit model
of Section 17.4.1. A road pricing scheme is proposed. Respondents indicate
their acceptance (y1=1) or rejection (y1=0) of the proposal, and their intention
to vote yes (y2=1|y1) or no (y2=0|y1). The model suggested is summarized in
Equation (17.71):
Observations on y1 and y2 are available for all individuals; y*ij (j=1,2) are the
unobserved variables representing the latent utility or propensity of choosing
to Accept or Vote for a specific RP scheme.
The endogenous nature of Accept is explicitly accounted for in the formulation
of the LL. The LL is expressed in terms of Prob(Vote = 1, Accept = 1) =
Prob(Vote = 1 | Accept = 1) × Prob(Accept = 1). The marginal probability for
Accept = 1 is Φ(β1′x1) and the conditional probability for (Vote = 1 | Accept = 1)
is Φ2(β1′x1, β2′x2 + γ·Accept, ρ)/Φ(β1′x1). Collecting terms, we have:
(We have included the terms γ(Accept=0) to make explicit the form of the
model. Of course, if Accept = 0, the whole term is zero.)
and expand upon the existing road network, to reduce income tax, to contribute to general government revenue, and to be used to compensate toll road
companies for loss of toll revenue. The cordon-based charging scheme and a
distance-based alternative were also described by either a peak and off-peak
cordon-based charging amount or a peak or off-peak per kilometre distance-
based charge. Both non-status quo alternatives were also described by the year
proposed when the scheme would commence.
A Bayesian D-efficient experimental design was implemented for the study
(see Rose et al. 2008). The design was generated in such a way that the
cost-related attribute levels for the status quo were first acquired from respondents
during preliminary questions in the survey, while the associated attributes for the
cordon-based and distance-based charging schemes were pivoted off these
as negative percentage shifts representing a reduction in such costs under these
schemes. Pivoted attributes included average fuel costs and annual registration
fees. Fuel costs were reduced by between zero and 50 percent of the
respondent-reported values, representing anywhere from no reduction in
fuel tax up to a potential 100 percent reduction in fuel taxes. Registration
fees were reduced by between zero and 100 percent of the
respondent-reported values (see Rose et al. 2008 and Chapter 6 for a description
of pivot-type designs). A toll was included only in the status quo alternative,
being set to zero for the non-status quo alternatives since it is replaced by the
road pricing regime.19
The allocation of revenues raised was fixed for the status quo alternative,
but varied in the cordon-based and distance-based charging schemes over
choice tasks. The allocation of revenue was varied from zero percent to 100
percent for a given revenue stream category. Within a charging scheme, the
allocation of revenue was such that the sum had to equal 100 percent across all
possible revenue allocations.
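The pivoting and revenue-allocation logic just described can be sketched as follows; the baseline costs, level grids, and variable names (`fuel_cost_pw`, `rego_fee_pa`, and so on) are hypothetical, chosen only to mirror the stated ranges.

```python
import numpy as np

rng = np.random.default_rng(42)

# Respondent-reported baselines (hypothetical values):
fuel_cost_pw, rego_fee_pa = 60.0, 700.0   # $ per week, $ per year

# Pivot the charging-scheme alternatives off the reported values:
# fuel cost cut by 0-50 percent, registration fee by 0-100 percent.
fuel_cut = rng.choice([0.0, 0.125, 0.25, 0.375, 0.50])
rego_cut = rng.choice([0.0, 0.25, 0.50, 0.75, 1.00])
fuel_shown = fuel_cost_pw * (1.0 - fuel_cut)
rego_shown = rego_fee_pa * (1.0 - rego_cut)

# Revenue allocation across three streams must sum to 100 percent:
raw = rng.choice([0.0, 25.0, 50.0, 75.0, 100.0], size=3)
alloc = 100.0 * raw / raw.sum() if raw.sum() > 0 else np.array([100.0, 0.0, 0.0])
```

Rescaling the raw draws so they sum to 100 is one simple way to honor the within-scheme constraint; the actual design used a Bayesian D-efficient generator (Rose et al. 2008).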
The cordon-based charging alternative was described by a peak and off-
peak cordon charge. The peak charge varied between $2.00 and $20.00, while
the off-peak charge varied between $0.00 and $15.00. Likewise, the distance-
based charge was described by two distance-based charging attributes, one for
trips taken during peak periods and the second for off-peak trips. The per
kilometre charge for the peak period ranged from $0.05 per kilometre to $0.40
per kilometre, while the off-peak distance-based charge varied between $0.00
19 The context here is one where tolls already exist, which might be replaced by a (more flexible) road pricing scheme. This context differs a lot from countries (i.e., several countries in Europe) that do not have tolls on a major scale, and where the issue is to introduce a form of road pricing that will not be replacing tolls. Getting support might then be more difficult.
and $0.30 per kilometre. The ranges selected were based on ranges that we
believed would contain the most likely levels if implemented. The design was
generated in such a way that the peak cordon-based and peak per-kilometre
charges were always equal to or greater than the associated off-peak
charges. Finally, the cordon-based and distance-based charging schemes were
described by the year the scheme would be implemented. In each case, this was
varied between 2013 (representing one year from the time of the survey) and
2016 (representing a four-year delay from the time of the survey). An example
of a choice screen is given in Figure 17.2.
20 Public acceptance can be achieved ex ante through a pilot scheme such as the Stockholm pilot, which is a real demonstration of the merits of RP reform (see Eliasson et al. 2009). Alternatively, we have to rely on identifying the extent of public acceptability of very specific RP schemes, ex ante, and ensure that the support is sufficient to obtain a positive outcome in a referendum.
[Map graphic of the Sydney CBD and surrounding harbour area (the cordon study area) — not reproduced in this extraction.]
Separate models were initially estimated for the voting response and the
acceptability response; then recursive bivariate models were estimated with
non-random parameters. The final two models (3 and 4) are recursive bivariate probits with random parameters for the RP scheme costs, distinguishing
the current cost components (i.e., registration and fuel costs) (Non-RP Cost)
and the proposed new costs associated with a cordon and a distance-based
charging regime (RP Costs). Models 3 and 4 differ by the inclusion in Model 4
of the Accept variable on the RHS of the Vote model. All models are summarized in Tables 17.22 and 17.23.21 The set of explanatory variables was guided
in part by the findings in Hensher et al. (2012) as well as an extensive
investigation of the rich array of data items available. It is notable that the
21 The alternatives defining each binary response are taken from four choice scenario screens. To account for the possibility that the response associated with a particular alternative is conditioned on the offered set of three alternatives, we included three dummy variables to represent the four choice scenarios. These variables were highly statistically non-significant and were excluded from the final models, giving us confidence in the approach we have adopted.
Table 17.22 Models of referendum voting and acceptance of road pricing schemes: 1
Sample = 2,400 observations from 200 individuals, with allowance for panel nature of data (i.e., 12 observations per individual). The covariance matrix is adjusted for
data clustering in Models 1 and 2; t-values in brackets.
Table 17.23 Models of referendum voting and acceptance of road pricing schemes: 2
estimated asymptotic covariance matrix which ignores the clustering. Let gij
denote the first derivatives of the LL with respect to all model parameters for
observation (individual) j in cluster i, and G the number of clusters. Then, the
corrected asymptotic covariance matrix is given in Equation (17.74), a variant
of Equation (17.57):
$$
\text{Est.Asy.Var}\left[ \hat{\beta} \right] = \frac{G}{G-1}\, V \left[ \sum_{i=1}^{G} \left( \sum_{j=1}^{n_i} g_{ij} \right) \left( \sum_{j=1}^{n_i} g_{ij} \right)' \right] V, \qquad (17.74)
$$
where V = H−1 OPG H−1 and H is the negative of the second derivatives, and
OPG is the sum of the outer products of the gradients of the terms in the LL
function.
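Equation (17.74) is mechanical once the per-observation gradients are in hand; in this sketch the gradients and the matrix V are random stand-ins used only to exercise the formula, not output from any estimated model.

```python
import numpy as np

def cluster_corrected_vcov(g, V):
    """Equation (17.74): (G/(G-1)) * V [sum_i (sum_j g_ij)(sum_j g_ij)'] V,
    where g has shape (G clusters, obs per cluster, parameters)."""
    G = g.shape[0]
    s = g.sum(axis=1)            # within-cluster sums of the gradient vectors
    meat = s.T @ s               # sum over clusters of the outer products
    return (G / (G - 1.0)) * V @ meat @ V

rng = np.random.default_rng(7)
G, n_i, k = 30, 4, 3                   # clusters, obs per cluster, parameters
g = rng.normal(size=(G, n_i, k))       # stand-in score vectors g_ij
V = np.linalg.inv(np.eye(k) * 50.0)    # stand-in for the uncorrected matrix
vc = cluster_corrected_vcov(g, V)
```

Because the "meat" is a sum of outer products, the corrected matrix is symmetric and positive semi-definite by construction.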
There is a noticeable improvement in the overall goodness of fit in moving
from the independent probit Model 1 (−2,802.0) to the bivariate probit Model
2 (−2,477.99) with non-random parameters. When we add random parameters
for the two cost variables, the LL for Model 3 improves even further
(−2,407.062), with an additional improvement in Model 4 (−2,402.12) when
endogeneity of Accept is statistically significant, suggesting that the accept-
ability of a RP scheme is a positive and important influence on the probability
of voting for the scheme in a referendum, after allowing for a set of exogenous
effects. The mean elasticity estimates discussed later reaffirm the strength of
the influence.
The estimate of the correlated disturbances (rho) is 0.919 in Model 3 with a
standard error of 0.01465, producing a very high t-ratio. The Wald statistic for
the test of the hypothesis that rho equals zero is (0.919/0.01465)2 = 3,943.84.
For a single restriction, the critical value from the Chi-square table is 3.84;
hence the hypothesis is well and truly rejected. Model 3 does not include the
endogenous effect of acceptance on referendum voting. When we allow for the
endogeneity of acceptance (Model 4), rho declines, as expected, but remains
sizeable at 0.6684 with a t-ratio of 6.67, again rejecting the null
hypothesis on the Wald test. The non-random parameters bivariate probit
Model 2 has a correlated disturbance of 0.8805, statistically significant but
slightly lower than the correlation in Model 3 with two random parameters,
which is interesting in itself and suggests that the inclusion of preference
heterogeneity appears to induce some increased association
between the unobserved influences. It is not clear why this is the case.
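The Wald computation reported for Model 3 can be verified directly (the small gap from the reported 3,943.84 reflects rounding in the published estimates):

```python
from scipy.stats import chi2

rho_hat, se_rho = 0.919, 0.01465    # Model 3 estimates from the text
wald = (rho_hat / se_rho) ** 2      # about 3,935 from these rounded inputs
critical = chi2.ppf(0.95, df=1)     # 3.84 for a single restriction
reject = wald > critical
```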
22 We did assess a log-normal, a constrained triangular, and a constrained normal, but the unconstrained normal gave the best fit (converged well), and identified very few non-negative values in the distribution, as confirmed by Figures 17.3 and 17.4.
23 The time benefits were not directly communicated. There are, however, clues as to how respondents perceive the benefits beyond monetary cost implications, notably potential travel time benefits. The response to how effective the scheme is in reducing congestion must have some link to a view of improved travel time. It was mentioned up front that these road pricing reforms are designed to reduce traffic congestion.
[Figures 17.3 and 17.4: panels plotting the range of the individual-specific random parameter estimates (“Range”) against each respondent (“Person,” 0–200); graphics not reproduced in this extraction.]
that it does not impact on kilometres undertaken outside of the CBD, which is
the great majority of daily kilometres.
Regardless of the merits of each reform package in terms of the impact on
levels of traffic congestion, there are very strong arguments opposing any
reform if it discriminates between individuals on vertical equity grounds (i.e.,
the impact on individuals in different personal income groups). There is a
large literature on the topic (e.g., Ison 1998; King et al. 2007; Levinson 2010;
and Peters and Kramer 2012). Despite the recognition that revenue allocation24 can be a major lever to gain community support for road pricing
reform, as shown by statistically significant parameters for the three sources
of funding hypothecation (i.e., public transport, roads, and reductions in
personal income tax), there is also a view and evidence that revenue redistribution cannot resolve all equity and fairness concerns. Initial travel patterns
also matter (Eliasson and Mattsson 2006), especially the concern that individuals undertaking most of the trips will be the ones most affected by any
change, even if the impact is higher levels of time benefits. Defining trip
exposure in terms of weekly peak and off-peak kilometres, we obtain positive
and statistically significant parameter estimates for both Accept and Vote
models. What this suggests is that car users who are more exposed to the
road network through higher kilometres are more accepting of RP reform, and
more likely to vote for reforms, compared to light users of the network. This
has important implications for the often-made claim that it is not fair to
impose such charges on those who use the network more intensively than
those who travel fewer kilometres. This appears, in general, not to be the
situation. The strength of the level of exposure is given in the elasticity
estimates below.
The key elasticity results are summarized in Table 17.24 and relate to the
percentage change in the probability of an RP scheme being acceptable and
that you would vote for it in a referendum (i.e., E[y1|y2=1]) with respect to a
percentage change in the variable of interest.25 It is very informative to
compare the elasticities associated with Vote and Accept in separate probit
24 Manville and King (2013) also raise the concern about credible commitment from government in using the revenue in line with community supports for reform. Hensher et al. (2013) found that only 22 percent of the sample had confidence that government would allocate revenue in the way they would like it allocated.
25 We have no basis for calibrating when the reform schemes are not in existence in real markets. Furthermore, there is only one market choice observed, and hence there is no revealed preference model. The evidence in Li and Hensher (2011), which includes a review of revealed preference evidence, focuses on changes in travel. It is not possible to contrast our evidence with other studies because the focus is on voting and acceptance elasticities that, as far as we are aware, do not exist in other studies.
Table 17.24 Summary of direct elasticities (t-values in brackets)

Variable | Model 1 (Vote) | Model 1 (Accept) | Model 2 | Model 3 | Model 4
Non-RP costs ($ per week) | −0.432 (4.95) | −0.267 (4.54) | −0.182 (2.75) | −0.226 (2.82) | −0.327 (3.48)
RP costs ($ per week) | −0.234 (5.38) | −0.126 (4.00) | −0.097 (3.00) | −0.130 (2.10) | −0.190 (2.77)
Peak km (per week) | 0.155 (7.46) | 0.070 (2.74) | 0.090 (4.23) | 0.058 (1.26) | 0.089 (1.45)
Off-peak km (per week) | 0.199 (5.75) | 0.096 (3.07) | 0.121 (3.80) | 0.121 (2.22) | 0.180 (2.36)
Improving public transport (0–100) | 0.131 (10.7) | 0.048 (6.46) | 0.082 (8.98) | 0.092 (8.39) | 0.123 (5.97)
Improving existing and constructing new roads (0–100) | 0.190 (5.11) | 0.098 (5.46) | 0.095 (3.12) | 0.101 (3.30) | 0.136 (3.17)
Reducing personal income tax (0–100) | 0.116 (7.42) | 0.038 (4.32) | 0.072 (5.91) | 0.079 (5.92) | 0.107 (5.07)
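The worked figures in the surrounding discussion come straight from the Model 4 column of the table: multiply the elasticity by the percentage change in the variable.

```python
elas_peak_km, elas_offpeak_km = 0.089, 0.180   # Model 4 column, Table 17.24
pct_change_km = 25.0                           # a 25 percent rise in weekly km

dprob_peak = elas_peak_km * pct_change_km        # 2.225, cited as about 2.25
dprob_offpeak = elas_offpeak_km * pct_change_km  # 4.5 percent
```

These are the approximate 2.25 and 4.5 percent increases in the joint probability of accepting and voting for the scheme discussed in the text.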
models with the bivariate probit models. In general, the direct price elasticities
are lower (i.e., less sensitivity) under joint estimation of Vote and Accept in
Models 2 and 3 compared to Model 1, where the jointness is captured through
correlated disturbances but with no RHS endogeneity. When Accept is
included as an endogenous influence on Vote in Model 4, the direct elasticity
estimate seems to move closer to the average of the independent probit
estimates of Model 1, being lower than the Vote mean estimate but higher
than the Accept mean estimate.
A particularly important finding is for road pricing costs per week (RPCost).
We have −0.130 for Model 3 and −0.190 for Model 4, whereas the elasticities
associated with Vote and Accept alone (Model 1) are, respectively, −0.234 and
−0.126. This suggests in Model 1 that if one focusses only on acceptability, we
obtain a much lower mean direct elasticity than if one just focusses on
referendum voting. What this indicates is that a scheme has to be acceptable
for it to receive a higher probability of voting for it in a referendum, given the
scheme costs and other contextual influences. This reinforces the well-argued
views that public acceptability is crucial to obtaining increased buy-in, and a
resultant higher probability of a yes vote in a referendum (Goodwin 1989;
Hensher et al. 2013; Schade et al. 2007; Ubbels and Verhoef 2006).
Without exception, all mean estimates of direct elasticities are inelastic and
below |0.5|. The lower direct elasticities for RP Cost compared to Non-RP
Cost reflect the relative cost of each source, which indicates that any additional
costs, if existing costs remain, are quite a lot less than 50 percent of the total
cost under a proposed scheme. The elasticities associated with current trip
exposure in terms of peak and off-peak kilometres are very informative,
suggesting for Model 4, our preferred model, that a 25 percent increase in
weekly peak and off-peak kilometres (chosen as a reasonable change, given
that average peak and off-peak kilometres per week are 70.68 and 145.9
kilometres, respectively) results, respectively, in a 2.25 and 4.5 percent
increase in the joint probability of accepting and voting for a proposed RP
scheme. The revenue allocation preferences are also informative; all other
influences being held constant, if all the RP scheme funds raised were
hypothecated to public transport (compared to none), the percentage change
in the probability of accepting and voting for a specific RP scheme would
increase by 12.3 percentage points; likewise, the equivalent impact if all funds
were allocated to improvements in existing and new roads is 13.6 percent,
with a 10.7 percent increase if all monies were hypothecated to reduced
personal income taxes. Given the “closeness” of these percentage changes,
any mix of revenue allocation that is hypothecated as a mixture of the three
[Figure 17.5: two panels of average simulated function values with 95 percent confidence limits, plotted against ARPC from 1 to 40: (a) partial effects of the cordon-based charge given PT100 = 100; (b) partial effects of RP distance-based charge costs given PT100 = 100.]
Figure 17.5 Impact of (a) cordon-based and (b) distance-based charging per week, given that all revenue is hypothecated to public transport
Notes: ARPC = road pricing cost per week; Avg. P.E. = average partial effect; PT100 = all revenue is allocated to public transport.
17.4.4 Application II: partial effects and scenarios for bivariate probit
In this section, we use another data set to illustrate the bivariate probit model.
The data was collected in 2013 from six Australian cities (Sydney, Melbourne,
Canberra, Adelaide, Brisbane, and Perth). The differences in preferences
between the six cities are of interest as one way of determining if there exist
contextual biases (including exposure through use of specific transport invest-
ments) in the preferences of populations towards or against the voting rule
and the earmarking of increased tax for transport investments.
The findings from the estimation of the bivariate probit model are summar-
ized in Table 17.25. We investigated the potential role of each of the available
socio-economic variables as well as a city-specific dummy variable and a
transport-related variable, namely the recent use or otherwise of public transport.
The Nlogit syntax is given below:
Bivariate Probit
;lhs=votegood,votetp
;rh1=one,age,ftime,can
;rh2=one,usept,male,pinc,ptime,retired,brs
;hf2=commute
;partial effects$
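As a cross-check on what Nlogit estimates here, the bivariate probit log-likelihood can be coded directly. Below is a minimal sketch in Python; the function and variable names are ours, not Nlogit's, and scipy's bivariate normal CDF stands in for Nlogit's internal routines:

```python
import numpy as np
from scipy.stats import multivariate_normal

def biprobit_loglik(theta, X1, X2, y1, y2):
    """Negative log-likelihood of a bivariate probit with indices
    X1 @ b1 and X2 @ b2 and error correlation rho (hypothetical names)."""
    k1, k2 = X1.shape[1], X2.shape[1]
    b1, b2 = theta[:k1], theta[k1:k1 + k2]
    rho = np.tanh(theta[-1])            # unconstrained parameter -> (-1, 1)
    q1, q2 = 2 * y1 - 1, 2 * y2 - 1     # map 0/1 outcomes to -1/+1 signs
    w1, w2 = q1 * (X1 @ b1), q2 * (X2 @ b2)
    r = q1 * q2 * rho
    # P(y1, y2) = Phi2(q1 * x1'b1, q2 * x2'b2, q1 * q2 * rho)
    ll = sum(np.log(multivariate_normal.cdf([w1[i], w2[i]], mean=[0.0, 0.0],
                                            cov=[[1.0, r[i]], [r[i], 1.0]]))
             for i in range(len(y1)))
    return -ll
```

Passing `biprobit_loglik` to `scipy.optimize.minimize` with a zero start vector would give maximum likelihood estimates up to optimizer tolerance; the estimates match those reported for this model only when run on the actual survey data.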
Looking at the voting model (Model 1), we find that age, full time employment
status, and living in Canberra are all negative and statistically significant
influences. The negative signs indicate, all other things held constant, that the
probability of supporting the voting mechanism decreases as a person's age
increases and if they are employed full time (compared to other employment
statuses, including not being in the workforce). One might speculate as to
whether older people are more disillusioned with the effectiveness of a refer-
endum style vote given that the historical record in Australia is, with rare
exception, one of rejection of support for a specific issue. Interestingly,
residents of Australia’s capital city, Canberra, have a very strong tendency
(relative to residents of the other five cities), to not support a voting mechan-
ism tied to the idea of using it for governments to decide which projects to
invest in. This is reinforced by the percent support (not reported here), which
is much lower than in all the other cities.
The evidence associated with earmarked taxation for transport investments
(Model 2) yields six statistically significant influences, with five having
Table 17.26 Summary of elasticities (t-ratios in parentheses; 95 percent confidence interval in second bracket set)

Model 1 (voting mechanism):
  Age (yr.)                    −0.0716 (2.01) (−0.141, −0.002)
  Full time employed (1,0)     −0.0485 (2.30) (−0.089, −0.007)
  Canberra (1,0)               −0.2398 (3.27) (−0.3836, −0.0961)

Model 2 (earmarked taxation):
  Use public transport (1,0)   −0.0114 (3.22) (−0.018, −0.004)
  Male (1,0)                   −0.0076 (2.70) (−0.0318, −0.0021)
  Personal income ($000)       −0.0048 (2.51) (−0.0085, −0.0010)
  Part time employed (1,0)     −0.0072 (2.29) (−0.0133, −0.0010)
  Retired (1,0)                −0.0071 (2.16) (−0.0136, −0.0007)
  Brisbane (1,0)               −0.0062 (1.97) (−0.0124, 0.00004)
Finally, the two equations are correlated, with a 0.136 correlation of the
error disturbances. Although the interpretation of the mean parameter esti-
mates is informative, what is of greater relevance are the implied elasticities
associated with each explanatory variable, since they provide evidence on the
extent of influence of a change in the level of an exogenous variable on the
preference probability for supporting a vote and the earmarking of taxation
increases. The results are summarized in Table 17.26, including the t-ratios
and 95 percent confidence intervals.
All the mean elasticity estimates are statistically significant, and all are
relatively inelastic. The elasticities associated with dummy variables (i.e., all
but age and personal income) are arc elasticities based on the average of the
before and after levels of an explanatory variable and the probability preference.
The most significant effect is the Canberra dummy variable, which
suggests, ceteris paribus, that when a resident lives in Canberra rather than in
any of the other five cities, the probability of supporting a voting mechanism
decreases by 23.98 percent. The next sizeable effect is much smaller,
namely the respondent's age, for which the elasticity of −0.0716 refers to a
1 percent change. Given an average age of 43 years, a 10 percent increase in age (to
47.3 years old) reduces the probability of supporting the voting mechanism by
0.716 percent. We ran a scenario on these two variables in which we predicted
the joint probability of voting support given support for earmarked tax
increases for transport investment. The results are shown in Table 17.27,
Table 17.27 Simulated scenario of role of Canberra and age on preference probability
Model Simulation Analysis for Bivariate Probit E[y1|y2=1] function
--------------------------------------------------------------------------------------------
Simulations are computed by average over sample observations
--------------------------------------------------------------------------------------------
User Function Function Standard
(Delta method) Value Error |t| 95% Confidence Interval
--------------------------------------------------------------------------------------------
Avrg. Function .81575 .01599 51.00 .78440 .84710
--------------------------------------------------------------------------------------------
CAN = .00 -----------------------------------------------------------------------
AGE = 22.00 .85132 .01874 45.43 .81459 .88805
AGE = 32.00 .83909 .01654 50.73 .80667 .87151
AGE = 42.00 .82622 .01585 52.14 .79516 .85728
AGE = 52.00 .81271 .01747 46.53 .77847 .84694
AGE = 62.00 .79857 .02141 37.30 .75660 .84053
AGE = 72.00 .78381 .02712 28.90 .73066 .83696
--------------------------------------------------------------------------------------------
CAN = 1.00 ----------------------------------------------------------------------
AGE = 22.00 .68838 .05158 13.35 .58730 .78947
AGE = 32.00 .66999 .05018 13.35 .57163 .76834
AGE = 42.00 .65117 .05005 13.01 .55307 .74926
AGE = 52.00 .63197 .05138 12.30 .53127 .73267
AGE = 62.00 .61244 .05422 11.30 .50617 .71870
AGE = 72.00 .59261 .05845 10.14 .47806 .70717
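The conditional function simulated above, E[y1|y2 = 1] = Φ2(β1′x1, β2′x2, ρ)/Φ(β2′x2), can be reproduced for arbitrary scenarios. A hedged sketch follows; the coefficient values below are placeholders chosen only to mimic the signs in the text, not the reported estimates:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def cond_prob_vote(idx1, idx2, rho):
    """E[y1 | y2 = 1]: joint bivariate normal probability divided by the
    marginal probability of the conditioning outcome."""
    joint = multivariate_normal.cdf([idx1, idx2], mean=[0.0, 0.0],
                                    cov=[[1.0, rho], [rho, 1.0]])
    return joint / norm.cdf(idx2)

# Hypothetical coefficients for illustration only (NOT estimated values).
const1, b_age, b_can, idx2, rho = 1.2, -0.01, -0.55, 0.8, 0.136
scenario = {(can, age): cond_prob_vote(const1 + b_age * age + b_can * can,
                                       idx2, rho)
            for can in (0, 1) for age in (22, 42, 62)}
```

As in Table 17.27, the probability declines with age and drops sharply for Canberra residents.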
Chapter 18
Ordered choices

18.1 Introduction

1 A number of authors have introduced random thresholds (e.g., Cameron and Heckman 1998; Cunha et al. 2007; Eluru et al. 2008) but have not integrated this into a generalized model with RP and/or decomposition of random thresholds by systematic sources.
The ordered probit model was proposed by Zavoina and McKelvey (1975) for
the analysis of categorical, non-quantitative choices, outcomes, and responses.
Familiar applications now include bond ratings, discrete opinion surveys such
as those on political questions, obesity measures (Greene et al. 2008), prefer-
ences in consumption, and satisfaction and health status surveys such as those
analyzed by Boes and Winkelmann (2004, 2007). The model foundation is an
underlying random utility or latent regression model (Equation (18.1)):
y_i^* = \beta' x_i + \varepsilon_i, \qquad (18.1)
in which the continuous latent utility, yi*, is observed in discrete form through
a censoring mechanism (Equation 18.2):
y_i = 0 \text{ if } \mu_{-1} < y_i^* < \mu_0,
y_i = 1 \text{ if } \mu_0 < y_i^* < \mu_1,
y_i = 2 \text{ if } \mu_1 < y_i^* < \mu_2, \qquad (18.2)
\quad \ldots
y_i = J \text{ if } \mu_{J-1} < y_i^* < \mu_J.
The model contains the unknown marginal utilities, β, as well as J+2 unknown
threshold parameters, μ_j, all to be estimated using a sample of n observations,
indexed by i = 1,. . .,n. The data consist of the covariates, x_i, and the observed
discrete outcome, y_i = 0,1,. . .,J. The assumptions about the properties of the
"disturbance," ε_i, complete the model specification. The conventional
assumptions are that ε_i is a continuous disturbance with conventional CDF,
F(ε_i|x_i) = F(ε_i), with support equal to the real line, and with density f(ε_i) = F′(ε_i).
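The latent regression and censoring mechanism of Equations (18.1) and (18.2) are easy to simulate, which is a useful way to check one's understanding of the thresholds. A minimal sketch with illustrative parameter values (not estimates from any model in the chapter):

```python
import numpy as np

def censor(y_star, mu):
    """Map latent utility y* into the ordered outcome y = 0, ..., J
    (Equation (18.2)): count how many thresholds lie below y*."""
    return np.searchsorted(mu, y_star)

# Illustrative parameters only.
rng = np.random.default_rng(1)
beta = np.array([0.5, -0.3])
mu = np.array([0.0, 1.0, 2.0])              # mu_0 = 0 normalization
x = rng.normal(size=(1000, 2))
y_star = x @ beta + rng.normal(size=1000)   # probit: standard normal error
y = censor(y_star, mu)                      # outcomes in {0, 1, 2, 3}
```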
\frac{\partial \text{Prob}[y_i = j \mid x_i]}{\partial x_i} = [f(\mu_{j-1} - \beta' x_i) - f(\mu_j - \beta' x_i)]\,\beta. \qquad (18.5)
The result shows that neither the sign nor the magnitude of a coefficient is
informative about the corresponding behavioral characteristic in the model,
so the direct interpretation of the coefficients (or their “significance”) is
fundamentally ambiguous. A counterpart result for a dummy variable in the
model would be obtained by using a difference of probabilities, rather than a
derivative (Boes and Winkelmann 2007; Greene 2008, Chapter E22). One
might also be interested in the cumulative values of the partial effects, such as
shown in Equation (18.6) (see, e.g., Brewer et al. 2006). The last term in this set
is zero by construction:
\frac{\partial \text{Prob}[y_i \le j \mid x_i]}{\partial x_i} = \sum_{m=0}^{j} [f(\mu_{m-1} - \beta' x_i) - f(\mu_m - \beta' x_i)]\,\beta. \qquad (18.6)
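Equation (18.5) can be evaluated directly for an ordered probit, where f is the standard normal density. A sketch with illustrative values (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def partial_effects(x_i, beta, mu):
    """Equation (18.5) for an ordered probit: one row per outcome j,
    one column per variable; mu excludes the two infinite end points."""
    cut = np.concatenate(([-np.inf], mu, [np.inf]))
    dens = norm.pdf(cut - x_i @ beta)   # f(mu_j - b'x); pdf(+/-inf) = 0
    # [f(mu_{j-1} - b'x) - f(mu_j - b'x)] * beta for each outcome j
    return np.outer(dens[:-1] - dens[1:], beta)

# Illustrative values only.
pe = partial_effects(np.array([1.0, 0.5]), np.array([0.4, -0.2]),
                     np.array([0.0, 1.0]))
```

The effects sum to zero across outcomes, the telescoping property that makes the last term in Equation (18.6) zero by construction.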
where Kj depends on Xj. This is a feature of the model that has been
labeled the “parallel regressions” assumption. Another way to view this
feature of the ordered choice model is through the J binary choices implied
by Equation 18.8. Let zij denote the binary variable defined by:
z_{ij} = 1 \text{ if } y_i > j, \quad j = 0, 1, \ldots, J-1;
\text{Prob}[z_{ij} = 1 \mid x_i] = F(\beta' x_i - \mu_j).
The threshold parameter can be absorbed into the constant term. In prin-
ciple, one can fit these J−1 binary choice models separately. That the same β
appears in all of the models is implied by the ordered choice model.
However, one need not impose this restriction; the binary choice models
can be fitted separately and independently. Thus, the null hypothesis of
the ordered choice model is that the βs in the binary choice equations are all
the same (apart from the constant terms). A standard test of this null
hypothesis, due to Brant (1990), is used to detect the condition that the βj
vectors are different. The Brant test frequently rejects the null hypothesis of
a common slope vector in the ordered choice model. It is unclear what the
alternative hypothesis should be in this context. The generalized ordered
choice model that might seem to be the natural alternative is, in fact,
internally inconsistent – it does not constrain the probabilities of the out-
comes to be positive. It would seem that the Brant test is more about
functional form or, perhaps, some other specification error. See Greene
and Hensher (2010, Chapter 6).
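The idea behind the Brant test can be illustrated by fitting the implied binary probits P(y > j|x) separately and comparing their slopes, which the ordered choice model constrains to be equal. The following is an informal sketch of that comparison (not Brant's chi-squared statistic itself), under simulated data with a genuinely common slope:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def fit_probit(X, z):
    """Maximum likelihood binary probit P(z = 1 | x) = F(b'x)."""
    q = 2 * z - 1
    nll = lambda b: -np.sum(norm.logcdf(q * (X @ b)))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

# Simulate an ordered outcome with one common slope, then fit
# P(y > j | x) separately for each j and compare the slope estimates.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = np.digitize(0.8 * X[:, 1] + rng.normal(size=500), [-0.5, 0.5])
slopes = [fit_probit(X, (y > j).astype(int))[1] for j in (0, 1)]
```

With a correctly specified common slope, the two estimates differ only by sampling error; Brant's test formalizes that comparison.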
Recent analyses, e.g., Long (1997), Long and Frees (2006), and
Williams (2006), have proposed a "generalized ordered choice model."
The extended form of the ordered choice model that has attracted much
(perhaps most) of the recent attention is the "generalized ordered logit"
(or probit) model (e.g., Williams 2006). This model is defined in
Equation (18.9):
where β−1 = 0 (see, e.g., Williams 2006; Long 1997; Long and Frees 2006). The
extension provides for a separate vector of marginal utilities for each jth
outcome. Bhat and Zhao (2002) introduce heteroskedasticity across observa-
tional units, in a spatial ordered response analysis context, along the lines of
the generalized ordered logit form.
The generalization of the model suggested above deals with both problems
(single crossing and parallel regressions), but it creates new ones. The hetero-
geneity in the parameter vector is an artifact of the coding of the dependent
variable, not a manifestation of underlying heterogeneity in the dependent
variable induced by behavioral differences. It is unclear what it means for the
marginal utility parameters to be structured in this way. Consider, for exam-
ple, that there is no underlying structure that could be written down in such a
way as to provide a means of simulating the data generating mechanism. By
implication, y_i* = β_j′ x_i + ε_i if y_i = j. That is, the model structure is endogenous –
one could not simulate a value of yi from the data generating mechanism
without knowing in advance the value being simulated. There is no reduced
form. The more difficult problem of this generalization is that the probabilities
in this model need not be positive, and there is no parametric restriction
(other than the restrictive model version we started with) that could achieve
this. The probability model is internally inconsistent. The restrictions would
have to be functions of the data. The problem is noted by Williams (2006), but
dismissed as a minor issue. Boes and Winkelmann (2007) suggest that the
problem could be handled through a “non-linear specification.” Essentially,
this generalized choice model does not treat the outcome as a single choice,
even though that is what it is.
To put a more positive view, we might interpret this as a semi-parametric
approach to modeling what is underlying heterogeneity. However, it is not
clear why this heterogeneity should be manifest in parameter variation across
the outcomes instead of across the individuals in the sample. One would
assume that the failure of the Brant test to support the model with parameter
homogeneity is, indeed, signaling some failure of the model. A shortcoming of
the functional form as listed above (compared to a different internally con-
sistent specification) is certainly a possibility. We hypothesize that it might
also be picking up unobserved heterogeneity across individuals. The model we
develop here accounts for individual heterogeneity in several possible forms.
\mu_{ij} = \mu_j + \delta' z_i. \qquad (18.10)
It is less than obvious whether the variables, zi, are actually in the threshold or
in the mean of the regression. Either interpretation is consistent with the
model. Pudney and Shields argue that the distinction is of no substantive
consequence for their analysis.
Formal modeling of heterogeneity in the parameters as representing a
feature of the underlying data also appears in Greene (2002) (version 8.0)
and Boes and Winkelmann (2004), both of whom suggest an RP approach to
the model. In Boes and Winkelmann, it is noted that the nature of an RP
specification induces heteroskedasticity, and could be modeled as such. The
model would appear as follows:
\beta_i = \beta + u_i, \qquad (18.12)
where ui ~ N[0,Ω]. Inserting this in the base case model and simplifying, we
obtain Equation 18.13:
\text{Prob}[y_i \le j \mid x_i] = \text{Prob}[\varepsilon_i + u_i' x_i \le \mu_j - \beta' x_i] = F\left(\frac{\mu_j - \beta' x_i}{\sqrt{1 + x_i' \Omega x_i}}\right), \qquad (18.13)
2 The authors' suggestion that this could be handled semi-parametrically without specifying a distribution for u_i is incorrect, because the resulting heteroskedastic probability written above only preserves the standard normal form assumed if u_i is normally distributed as well as ε_i.
Boes and Winkelmann (2004, 2007) did not pursue this approach. Greene
(2002) analyzes essentially the same model, but proposes to estimate the
parameters by maximum simulated likelihood.
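The scaled probability in Equation (18.13) is straightforward to compute once Ω is given. A minimal sketch with illustrative values (the function name and numbers are ours; this is the analytic special case, not the maximum simulated likelihood estimator):

```python
import numpy as np
from scipy.stats import norm

def rp_prob_le_j(x_i, beta, mu_j, Omega):
    """Equation (18.13): Prob[y <= j | x] when beta_i = beta + u_i with
    u_i ~ N[0, Omega]; the RP term inflates the error variance by x'Omega x."""
    scale = np.sqrt(1.0 + x_i @ Omega @ x_i)
    return norm.cdf((mu_j - x_i @ beta) / scale)

# Illustrative values only.
p = rp_prob_le_j(np.array([1.0, 2.0]), np.array([0.3, -0.1]),
                 1.0, np.diag([0.25, 0.04]))
```

Setting Omega to zero recovers the homoskedastic ordered probit probability.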
Curiously, none of the studies listed above focus on the issue of scaling,
although Williams (2006), citing Allison (1999), does mention it. A hetero-
skedastic ordered probit model with the functional form in Equation (18.14)
appears at length in Greene (1997), and is discussed in some detail in Williams
(2006):
E[\beta_i \mid x_i, z_i] = \beta + \Delta z_i. \qquad (18.17)
The model is formulated with Γvi rather than, say just vi with covariance
matrix Ω purely for convenience in setting up the estimation method. This is a
random parameters formulation that appears elsewhere, e.g., Greene (2002,
2005). The random effects model is a special case in which only the constant is
random. The Mundlak (1978) and Chamberlain (1980) approach to modeling
fixed effects is also accommodated by letting z_i = x̄_i (the group means of x_i) in the equation for the
overall constant term.
We are also interested in allowing the thresholds to vary across indivi-
duals. See, for example, King et al. (2004) for a striking demonstration of the
payoff to this generalization. The thresholds are modeled randomly and
non-linearly, as in Equation (18.19), \mu_{ij} = \mu_{i,j-1} + \exp(\alpha_j + \delta' r_i + \sigma_j w_{ij}),
with normalizations and restrictions \mu_{-1} = -\infty, \mu_0 = 0, \mu_J = +\infty. For the
remaining thresholds, we have Equation (18.20):

\mu_1 = \exp(\alpha_1 + \delta' r_i + \sigma_1 w_{i1}) = \exp(\delta' r_i)\,\exp(\alpha_1 + \sigma_1 w_{i1}),
\mu_2 = \exp(\delta' r_i)\,[\exp(\alpha_1 + \sigma_1 w_{i1}) + \exp(\alpha_2 + \sigma_2 w_{i2})], \qquad (18.20)
\quad \ldots
\mu_j = \exp(\delta' r_i) \sum_{m=1}^{j} \exp(\alpha_m + \sigma_m w_{im}), \quad j = 1, \ldots, J-1,
\mu_J = +\infty.
The thresholds, like the regression itself, are shifted by both observable (ri)
and unobservable (wij) heterogeneity. The model is fully consistent, in that
the probabilities are all positive and sum to one by construction. If δ = 0 and
σ_j = 0, then the original model is returned, with μ_1 = exp(α_1), μ_2 = μ_1 + exp(α_2),
and so on. Note that if the threshold parameters were specified as linear
functions rather than as in Equation (18.19), then it would not be possible to
identify separate parameters in the regression function and in the threshold
functions.
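The construction in Equation (18.20) guarantees positive, strictly increasing thresholds regardless of the values of the parameters or the draws. A sketch with illustrative values (names are ours):

```python
import numpy as np

def thresholds(alpha, delta, r_i, sigma, w_i):
    """Equation (18.20): thresholds built as cumulative sums of positive
    terms, so they are positive and strictly increasing by construction."""
    terms = np.exp(alpha + sigma * w_i)      # exp(alpha_m + sigma_m * w_im)
    return np.exp(delta @ r_i) * np.cumsum(terms)

# Illustrative values only.
mu = thresholds(alpha=np.array([-0.2, 0.1, 0.3]), delta=np.array([0.05]),
                r_i=np.array([1.0]), sigma=np.array([0.1, 0.1, 0.1]),
                w_i=np.zeros(3))
```

With delta = 0 and sigma = 0 this collapses to mu_1 = exp(alpha_1), mu_2 = mu_1 + exp(alpha_2), and so on, as in the text.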
Finally, we allow for individual heterogeneity in the variance of the
utility function as well as in the mean. This is likely to be an important
feature of data on individual behavior. The disturbance variance is allowed
to be heteroskedastic, now specified randomly as well as deterministically.
Thus:
\text{Prob}[y_i = j \mid x_i, z_i, h_i, r_i, v_i, w_i, e_i] = F\!\left[\frac{\mu_{ij} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] - F\!\left[\frac{\mu_{i,j-1} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right], \qquad (18.22)
where it is noted, once again, that both μij and βi vary with observed variables
and with unobserved random terms. The log-likelihood (LL) is constructed
from the terms in Equation (18.22). However, the probability in Equation
(18.22) contains the unobserved random terms, vi, wi, and ei. The term that
enters the LL function for estimation purposes must be unconditioned on the
unobservables. Thus, they are integrated out, to obtain the unconditional
probabilities:
\text{Prob}[y_i = j \mid x_i, z_i, h_i, r_i] = \int_{v_i, w_i, e_i} \left( F\!\left[\frac{\mu_{ij} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] - F\!\left[\frac{\mu_{i,j-1} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] \right) f(v_i, w_i, e_i)\, dv_i\, dw_i\, de_i. \qquad (18.23)
\log L_S(\beta, \Delta, \alpha, \delta, \gamma, \Gamma, \sigma, \tau) = \sum_{i=1}^{n} \log \frac{1}{M} \sum_{m=1}^{M} \left( F\!\left[\frac{\mu_{ij,m} - \beta_{i,m}' x_i}{\exp(\gamma' h_i + \tau e_{i,m})}\right] - F\!\left[\frac{\mu_{i,j-1,m} - \beta_{i,m}' x_i}{\exp(\gamma' h_i + \tau e_{i,m})}\right] \right). \qquad (18.24)
v_{i,m}, w_{i,m}, and e_{i,m} are a set of M multivariate random draws for the simulation.3
This is the model in its full generality. Whether a particular data set will be rich
enough to support this much parameterization, particularly the elements of
the covariances of the unobservables in Γ, is an empirical question that will
depend on the application.
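The simulated probability in Equation (18.23) can be sketched for a cut-down version of the model with a single random component; the full model simply adds more dimensions to the draws. Assumptions: one random error component v_i with standard deviation sigma_v, Halton-based normal draws (as in footnote 3), and illustrative parameter values:

```python
import numpy as np
from scipy.stats import norm, qmc

def sim_prob(j, x_i, beta, mu, sigma_v, M=200):
    """Simulated outcome probability in the spirit of Equation (18.23),
    cut down to one random component v_i ~ N[0, sigma_v^2], averaged
    over M Halton-based normal draws."""
    h = qmc.Halton(d=1, scramble=True, seed=0).random(M).ravel()
    v = norm.ppf(np.clip(h, 1e-12, 1 - 1e-12)) * sigma_v
    cut = np.concatenate(([-np.inf], mu, [np.inf]))
    idx = x_i @ beta + v                     # draw-specific utility index
    return np.mean(norm.cdf(cut[j + 1] - idx) - norm.cdf(cut[j] - idx))
```

The simulated log-likelihood of Equation (18.24) is then the sum over observations of the log of this quantity evaluated at each observed outcome.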
One is typically interested in estimation of parameters such as β in Equation
(18.24) to learn about the impact of the observed independent variables on the
outcome of interest. This generalized ordered choice model contains four
points at which changes in observed variables can induce changes in the
probabilities of the outcomes – in the thresholds, μij, in the marginal utilities,
β_i, in the utility function, x_i, and in the variance, σ_i^2. These could involve
different variables or they could have variables in common. Again, demo-
graphics such as age, sex, and income, could appear anywhere in the model. In
principle, then, if we are interested in all of these, we should compute all the
partial effects:
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, r_i, h_i)}{\partial x_i} = \text{direct effect of variables in the utility function;}
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, r_i, h_i)}{\partial z_i} = \text{indirect effect of variables that affect the parameters } \beta_i;
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, r_i, h_i)}{\partial h_i} = \text{indirect effect of variables that affect the variance of } e_i;
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, r_i, h_i)}{\partial r_i} = \text{indirect effect of variables that affect the thresholds.}
The four terms (in order) are the components of the partial effects (a) due
directly to change in xi, (b) indirectly due to change in the variables, zi, that
3 We use Halton sequences rather than pseudo-random numbers. See Train (2003, 2009) for a discussion.
influence βi, (c) due to change in the variables, hi, in the variance, and (d) due
to changes in the variables, ri, that appear in the threshold parameters,
respectively. The probability of interest is:
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, h_i, r_i)}{\partial x_i} = \int_{v_i, w_i, e_i} \frac{1}{\exp(\gamma' h_i + \tau e_i)} \left\{ f\!\left[\frac{\mu_{ij} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] - f\!\left[\frac{\mu_{i,j-1} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] \right\} (-\beta_i)\, f(v_i, w_i, e_i)\, dv_i\, dw_i\, de_i. \qquad (18.26a)
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, h_i, r_i)}{\partial z_i} = \int_{v_i, w_i, e_i} \frac{1}{\exp(\gamma' h_i + \tau e_i)} \left\{ f\!\left[\frac{\mu_{ij} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] - f\!\left[\frac{\mu_{i,j-1} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] \right\} (-\Delta' x_i)\, f(v_i, w_i, e_i)\, dv_i\, dw_i\, de_i. \qquad (18.26b)
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, h_i, r_i)}{\partial h_i} = \int_{v_i, w_i, e_i} \left\{ f\!\left[\frac{\mu_{ij} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] \left(\frac{\mu_{ij} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right) - f\!\left[\frac{\mu_{i,j-1} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] \left(\frac{\mu_{i,j-1} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right) \right\} (-\gamma)\, f(v_i, w_i, e_i)\, dv_i\, dw_i\, de_i. \qquad (18.26c)
\frac{\partial \text{Prob}(y_i = j \mid x_i, z_i, h_i, r_i)}{\partial r_i} = \int_{v_i, w_i, e_i} \left\{ f\!\left[\frac{\mu_{ij} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] \left(\frac{\mu_{ij}}{\exp(\gamma' h_i + \tau e_i)}\right) - f\!\left[\frac{\mu_{i,j-1} - \beta_i' x_i}{\exp(\gamma' h_i + \tau e_i)}\right] \left(\frac{\mu_{i,j-1}}{\exp(\gamma' h_i + \tau e_i)}\right) \right\} (\delta)\, f(v_i, w_i, e_i)\, dv_i\, dw_i\, de_i. \qquad (18.26d)
Effects for particular variables that appear in more than one part of the model
are added from the corresponding parts. Like the LL function, the partial
effects must be computed by simulation. If a variable appears only in xi, then
this formulation retains both the “parallel regressions” and “single crossing”
features of the original model. Nonetheless, the effects are highly non-linear in
any event. However, if a variable appears anywhere else in the specification,
then neither of these properties will necessarily remain.
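Because the partial effects in Equations (18.26a)-(18.26d) must themselves be simulated, a practical alternative is to differentiate the (simulated) probability numerically. A sketch of the finite-difference approach, checked here against a plain probit probability where the analytic answer is known (names and values are ours):

```python
import numpy as np
from scipy.stats import norm

def num_partial_effect(prob_fn, x_i, k, eps=1e-5):
    """Central finite-difference partial effect of variable k on any
    (possibly simulated) probability function of x."""
    up, dn = x_i.copy(), x_i.copy()
    up[k] += eps
    dn[k] -= eps
    return (prob_fn(up) - prob_fn(dn)) / (2 * eps)

# Check against a plain probit, where the analytic effect is known.
beta, mu = np.array([0.4, -0.2]), 0.5
prob0 = lambda x: norm.cdf(mu - x @ beta)        # P[y = 0]
pe = num_partial_effect(prob0, np.array([1.0, 0.3]), k=0)
```

The same wrapper applies unchanged when `prob_fn` is a simulated probability, provided the same draws are reused on both sides of the difference.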
The context of the application, using stated choice data from a larger study
reported in Hensher (2004, 2006), is an individual’s choice among unlabeled
attribute packages of alternative tolled and non-tolled routes for the car
commuting trip in Sydney (Australia) in 2002. In this chapter we are inter-
ested in one feature of the way in which individuals process attribute informa-
tion, namely attribute inclusion or exclusion, given a maximum of five
attributes per alternative. The dependent variable in the ordered choice
model is the number of ignored attributes, or the number of attributes
attended to from the full fixed set associated with each alternative package
of route attributes. The utility function is defined over the attribute informa-
tion processed by each individual, with candidate influences on each indivi-
dual’s decision heuristic, including the dimensions of the choice experiment
(e.g., number of alternatives, range of attributes), the framing of the design
attribute levels relative to a reference alternative (see below), an individual’s
socio-economic characteristics (SECs), and attribute accumulation where
attributes are in common units (see also Hensher 2006).
The establishment of attribute inclusion/exclusion (also referred to as
preservation/non-preservation)4 in making choices in a stated choice context
is often associated with design dimensionality and the so-called complexity of
the stated choice experiment (Hensher 2006). It is typically implied that
designs with more items to evaluate are more complex than those with
fewer items5 (for example, Arentze et al. 2003; Swait and Adamowicz 2001a,
2001b), impose a cognitive burden, and are consequently less reliable, in a
behavioral sense, in revealing preference information. This is potentially
misleading, since it suggests that complexity is an artefact of the quantity of
information, in contrast to the relevance of information (Hensher 2006). In
any setting where an individual has to process the information on offer and
make a choice, psychologists interested in human judgment theory have
studied numerous heuristics that are brought to bear in aiding simplification
of the decision task (Gilovich et al. 2002). The accumulating life experiences of
individuals are also often brought to bear as reference points to assist in
selectively evaluating the information placed in front of them. These features
of human processing and cognition are not new to the broad literature on
judgment and decision making, where heuristics are offered up as deliberative
analytic procedures intentionally designed to simplify choice. The presence of
a large amount of information, whether requiring active search and consid-
eration or simply assessment when placed in front of an individual (the latter
being the case in choice experiments), has elements of cognitive overload (or
burden) that results in the adoption of rules to make processing manageable
and acceptable (presumably implying that the simplification is worth it in
terms of trading off the benefits and costs of a consideration of all the
information on offer or potentially available with some effort). It is not easy
4 The chapter focusses on attribute preservation and non-preservation; however, it is important to recognize that one way in which the number of attributes is "reduced," without attribute elimination, is by adding up common metric attributes. Hence it is important that we consider this as well, and control for the possibility that some attributes are not eliminated but added up.
5 Complexity also includes attributes that are lowly correlated, in contrast to highly correlated, the latter supporting greater ease of assessment in that one attribute represents other attributes.
6 Hensher (2004), Train and Wilson (2008), and Rose et al. (2008) provide details of the design of pivot-based experiments.
7 This is an important point because we did not want the analysis to be confounded by extra attribute dimensions.
8 Interviews took between 20 and 35 minutes, with an interviewer present who entered an individual's responses directly into the CAPI instrument on a laptop.
Table 18.1 Attribute profiles for the design
Free-flow time: ±20 | −20, 0, +20 | −20, −10, +10, +20 | −20, +40 | −20, +10, +40 | −20, 0, +20, +40 | ±5 | −5, 0, +5 | −5, −2.5, +2.5, +5
Slow-down time: ±40 | −40, 0, +40 | −40, −20, +20, +40 | −30, +60 | −30, +15, +60 | −30, 0, +30, +60 | ±20 | −20, 0, +20 | −20, −2.5, +2.5, +20
Stop/start time: ±40 | −40, 0, +40 | −40, −20, +20, +40 | −30, +60 | −30, +15, +60 | −30, 0, +30, +60 | ±20 | −20, 0, +20 | −20, −2.5, +2.5, +20
Uncertainty of travel time: ±40 | −40, 0, +40 | −40, −20, +20, +40 | −30, +60 | −30, +15, +60 | −30, 0, +30, +60 | ±20 | −20, 0, +20 | −20, −2.5, +2.5, +20
Total costs: ±20 | −20, 0, +20 | −20, −10, +10, +20 | −20, +40 | −20, +10, +40 | −20, 0, +20, +40 | ±5 | −5, 0, +5 | −5, −2.5, +2.5, +5
Note: Column 1 refers to the number of choice sets. The four rows represent the set of designs. The number of
alternatives does not include the reference alternative.
[Survey screen: "Transport Study," Game 1 of 6. The respondent's recent trip is shown alongside Road Alternatives A, B, and C, each described by time in free-flow (mins: 15, 14, 16, 16), time slowed down by other traffic (mins: 10, 12, 8, 12), time in stop/start conditions (mins: 5, 4, 6, 4), and uncertainty in travel time (mins: +/−10, +/−12, +/−8, +/−8), followed by a "Go to Game 2 of 6" button.]
[Follow-up screen: "Ignored attributes — Please indicate which of the following attributes you ignored when considering the choices you made in the 10 games."]
theory for reference points, and (iii) the literature on heuristics that suggests
that attribute packaging or attribute accumulation9 is a legitimate rule for
some individuals in stage 1 editing under prospect theory (Gilovich et al.
2002).
The generalized ordered logit model has a better goodness of fit than the
traditional ordered logit model. With four degrees of freedom difference, the
likelihood ratio statistic of 181.92 is statistically significant at any acceptable
chi-squared test level. The generalized model includes a random parameter
form for congestion time framing and accounts for two systematic
sources of variation around the mean of the random threshold parameter
(i.e., the accumulation of travel time and gender).
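As a quick check on the likelihood ratio comparison reported above, the test can be sketched in Python (the statistic of 181.92 and the 4 degrees of freedom are taken from the text; the closed-form chi-squared survival function used here is valid for even degrees of freedom only):

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function P(X > x) for a chi-squared variate with even df."""
    assert df % 2 == 0 and x >= 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# LR statistic comparing the generalized and traditional ordered logit models
lr_stat, df = 181.92, 4
p_value = chi2_sf_even_df(lr_stat, df)
# p_value is far below any conventional significance level,
# so the generalized model is preferred
```
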
The evidence identifies a number of statistically significant influences on
the number of attributes attended to, given the maximum number of attributes
provided. The range of the attributes and the number of alternatives10 in
the choice set condition the mean of attribute preservation, while the number
of levels of an attribute has a systematic influence on the variance of the unobserved
9 Accumulation, grouping, and aggregation are essentially the same constructs; namely, where two or more attributes with a common metric unit are treated as a combined attribute.
10 The difference in the number of alternatives (from two to four, excluding the reference alternative) represents a range typically found in SC studies. The actual screens, with the reference alternative in place, have between three and five alternatives. The number of alternatives is fixed per respondent but varies across the sample.
[Table: parameter estimates by attribute (with units) for the ordered logit and generalized ordered logit models; table body not recovered]
effects (or the error term). We framed the level of each attribute relative to that
of the experienced car commute as (i) free-flow time for reference (or base)
minus the level associated with an alternative in the stated choice design, and
(ii) the congested travel time for the base minus the level associated with each
stated choice alternative's attribute level. The parameter estimates are
statistically significant and negative, suggesting that the more a stated choice
attribute level ("free-flow time" and "congested time" (= slowed-down plus
stop/start time)) deviates from the reference alternative's level, the more likely
an individual is to process an increased number of attributes. The attribute
packaging effect for travel time has a negative parameter, suggesting that those
individuals who add up components of travel time tend to preserve more
attributes; indeed, aggregation is a way of simplifying the choice task without
ignoring attributes. In the sample, 82 percent of observations undertook some
attribute packaging.
The evidence here cannot establish whether an attribute reduction strategy
is strictly linked to behavioral relevance, or to a coping strategy for handling
cognitive burden, both being legitimate paradigms. It does, however, provide
an indication of which features of a stated choice experiment influence
how many of the attributes provided within a specific context are processed. The
evidence is likely to be application-specific, but it is extremely useful when
analysts compare different studies and draw inferences about the role of
specific attributes.
The threshold parameter has a statistically significant mean and two
sources of systematic variation across the sample around the mean threshold
parameter estimate. Across the sample, there were three levels of the ordered
choice observed: level 0 is where all attributes are preserved, level 1 is where 4
of the 5 attributes were preserved, and level 2 is where 3 of the 5 attributes were
preserved. No respondent preserved only 1 or 2 attributes. Hence, given three
levels of the choice variable, there are two threshold parameters, one between
levels 0 and 1 and one between levels 1 and 2 (see the explanation in the paragraph
following Equation 18.3). As indicated in Section 2.1, a normalization is
required so that a constant can be identified. We set the threshold parameter
between levels 0 and 1 equal to zero (μ1) and estimate the parameter
between levels 1 and 2 (μ2).11
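The role of this normalization can be sketched in Python (the utility value and the value of μ2 below are illustrative assumptions, not the reported estimates):

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def ordered_logit_probs(v, mu1=0.0, mu2=3.1):
    """Three-level ordered logit probabilities from thresholds mu1 < mu2.

    mu1 is normalized to zero so that a constant in v is identified."""
    p0 = logistic_cdf(mu1 - v)                          # all attributes preserved
    p1 = logistic_cdf(mu2 - v) - logistic_cdf(mu1 - v)  # 4 of 5 preserved
    p2 = 1.0 - logistic_cdf(mu2 - v)                    # 3 of 5 preserved
    return p0, p1, p2

probs = ordered_logit_probs(v=1.2)
# the three probabilities are positive and sum to one
```
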
11 Estimation of the threshold parameters is not a main object of fitting the ordered choice model per se. The flexibility of the threshold parameters is there to accommodate the variety of ways that individuals will translate their underlying continuous preferences into the discrete outcome. The main objective of the estimation is the prediction of and analysis of the probabilities, e.g., the partial effects. The threshold parameters do not have any interesting interpretation of their numerical values in their own right.
Table 18.4 Marginal effects for three choice levels derived from ordered logit models

                                                       Ordered logit               Generalized ordered logit
Design dimensions:
  Narrow attribute range                               −0.4148, 0.3893, 0.0255     −0.2502, 0.2242, 0.0259
  Number of alternatives                               0.2779, −0.2608, −0.0171    0.1789, −0.1603, −0.0256
Framing around base alternative:
  Free-flow time for base minus SC alternative level   −0.0099, 0.0093, 0.0006     −0.0102, 0.0094, 0.0011
  Congested time for base minus SC alternative level   0.0025, −0.0024, −0.0002    −0.0134, 0.0119, 0.0014
Attribute packaging:
  Adding travel time components                        0.2237, −0.2099, −0.0137    0.1525, −0.1367, −0.0158
Variables in threshold:
  Add travel time components                           −                           0.0000, 0.06510, −0.06510
  Gender (male = 1)                                    −                           0.0000, 0.01785, −0.01785
Variance decomposition:
  Number of levels                                     −0.1104, 0.0249, 0.0856     −0.01740, 0.0103, 0.0071
  Free-flow time for base minus SC alternative level   −0.2386, 0.0537, 0.1849     0.0026, −0.0015, −0.0010
  Who pays (1 = individual, 0 = a business)            0.0740, −0.0167, −0.0573    0.0502, −0.0297, −0.0071
Notes: The three marginal effects per attribute refer to the levels of the dependent variable.
SC = Stated choice.
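The structure of Table 18.4 — three marginal effects per variable that sum to zero across the levels of the dependent variable — follows from the ordered logit form, as the following Python sketch shows (the coefficient, utility, and threshold values used here are illustrative assumptions):

```python
import math

def logistic_pdf(t):
    p = 1.0 / (1.0 + math.exp(-t))
    return p * (1.0 - p)

def ordered_logit_marginal_effects(beta_k, v, thresholds):
    """dP(y = j)/dx_k for a continuous attribute x_k.

    P(y = j) = L(mu_j - v) - L(mu_{j-1} - v), so the effect at level j is
    [l(mu_{j-1} - v) - l(mu_j - v)] * beta_k, with l the logistic density."""
    def dens(t):
        return 0.0 if math.isinf(t) else logistic_pdf(t)
    mus = [float("-inf")] + list(thresholds) + [float("inf")]
    return [(dens(mus[j] - v) - dens(mus[j + 1] - v)) * beta_k
            for j in range(len(mus) - 1)]

effects = ordered_logit_marginal_effects(beta_k=-0.74, v=1.2, thresholds=[0.0, 3.1])
# the three effects telescope, so they sum to zero across levels
```
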
12 This holds for continuous variables only. For dummy (1,0) variables, the marginal effects are computed as the change in the probabilities given a change in the level of the dummy variable.
13 For the "Free-flow time for base minus SC alternative level," we report this in variance decomposition to show its relatively small effect compared to the overall effect of this variable given in another row in the table.
number of attributes. This evidence was found for both the “free-flow time”
and “congested time” framing effects. Conversely, as the stated choice design
attribute level moves closer to the reference alternative's level, individuals
appear to use some approximation rule, in which closeness suggests similarity,
and hence ease of eliminating specific attributes, because their role in
differentiating the alternatives is limited.
Reference dependency not only has a direct (mean) influence on the
number of attributes ignored; it also plays a role via its contribution to
explaining heteroskedasticity in the variance of the unobserved effects. This
has already been accounted for in the GOCM marginal effects for free-flow
time framing. It is separated out in the TOCM. The effect of widening the gap
between the base and stated choice “free-flow time” reduces the heteroske-
dasticity of the unobserved effects across the respondents, increasing the
acceptability of the constant variance condition when simpler models are
specified.
In GOCM, the congested time framing effect is represented by a dis-
tribution across the sample. The random parameter has a statistically
significant standard deviation parameter estimate, resulting in the distribution
shown in Figure 18.3. The range is from −0.857 to 1.257; hence there
is a sign change around the mean of 0.70833 and standard deviation of
0.2657.

Figure 18.3 Distribution of preference heterogeneity for congested time framing (kernel density estimate for CONGD)

This results in the same mean marginal effect sign in GOCM as
free-flow time framing; however, when we treated congested time framing
as having fixed parameters (in TOCM, where the standard deviation
parameter was not statistically significant), the signs are swapped for all
levels of the choice variable. The evidence from the GOCM is intuitively
more plausible.
The attribute accumulation rule in stage 1 editing under prospect theory is
consistently strong for the aggregation of travel time components. The positive
marginal effect for the dummy variable "adding three travel time components"
indicates that, on average, respondents who add up the time
components in assessing the alternatives tend also to preserve more attributes.
There is clear evidence that a relevant simplification rule is the re-packaging of
the attribute set, where possible, through addition. This is not a cancellation
strategy, but a rational way of processing the information content of compo-
nent attributes, and then weighting this information (in some unobserved
way) in comparing alternatives.
The SECs of respondents proxy for other excluded contextual influences.
A respondent’s role in paying the toll was identified, through its influence
on the variance decomposition of the unobserved effects, as a statistically
significant socio-economic influence on the number of attributes consid-
ered. We have no priors on the likely sign of the influence on variance. The
positive marginal effect for who pays suggests that those who pay themselves
(in contrast to a business paying) are more likely to preserve more
attributes, although the influence is slightly smaller in GOCM than in TOCM.
This might mean that those paying personally care more about the time/cost
trade-off, in contrast to a situation where only time matters if someone
else pays for the travel. Gender was a systematic source
of influence on the threshold parameter, increasing its mean estimate for
males.
y*(i) = β(i)′x(i) + ε(i)
β(i) = β + Γw(i),  w(i) ~ N[0, I],  Γ = diagonal matrix of standard deviations
ε(i) ~ N[0, σ(i)²],  σ(i) = exp[γ′z(i) + τv(i)],  v(i) ~ N[0, 1]
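As a check on this specification, the data generating process written out above can be sketched as a short simulation in Python (all numerical values below are illustrative assumptions, not estimates from the model):

```python
import math
import random

def simulate_ordered_choice(x, z, beta, gamma_diag, gamma_var, tau, thresholds, rng):
    """One draw from y* = beta(i)'x(i) + eps(i) with random betas and a
    heteroskedastic disturbance, mapped to a discrete ordered outcome."""
    # beta(i) = beta + Gamma * w(i), w(i) ~ N(0, I), Gamma diagonal
    b_i = [b + s * rng.gauss(0.0, 1.0) for b, s in zip(beta, gamma_diag)]
    # sigma(i) = exp(gamma'z(i) + tau * v(i)), v(i) ~ N(0, 1)
    sigma_i = math.exp(sum(g * zk for g, zk in zip(gamma_var, z))
                       + tau * rng.gauss(0.0, 1.0))
    y_star = sum(bk * xk for bk, xk in zip(b_i, x)) + rng.gauss(0.0, sigma_i)
    # observed level = number of thresholds that y* exceeds
    return sum(y_star > mu for mu in thresholds)

rng = random.Random(7)
y = simulate_ordered_choice(x=[1.0, 0.5], z=[1.0], beta=[0.4, -0.9],
                            gamma_diag=[0.0, 0.37], gamma_var=[0.2], tau=0.1,
                            thresholds=[0.0, 3.1], rng=rng)
# y is one of the ordered levels 0, 1, or 2
```
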
831 Ordered choices
This specification allows a panel data treatment: random draws are fixed over
periods; nothing else need be. Note that the random parameters in the model are
the βs. The variance term, σ(i), is random because of τv(i) as well:
ORDE; Lhs = . . .
; Rhs = One,. . . (β)
; RTM(α, θ)
; RPM to request random betas (Γ)
; RVM to request random element in σ(i) (τ)
; LIMITS = list of variables for thresholds (δ)
; HET ; Hfn = list of variables (γ)
Load;file=c:\papers\wps2005\ARC_VTTS_0103\FullDataDec02\dodMay03_05.sav$
sample;all$
reject;altz<5$ To reject base alt
reject;naig=0$
reject;naig<6$
reject;naig>8$
Ordered;lhs=naign5
;rhs=one,?nlvls,
ntb,nalts1,fftd,congt1d,addtim
;het;hfn= fftd,nlvls,whopay ?,coycar,pinc
;RST=b1,b2,b3,b4,b5,b6,b7,0,b9,b10,0,0,0,0,b15,0,b17,b18,b19
;RTM
;RPM
;LIMITS=addtim,gender
;halton;pts=20
; maxit=31
; alg = bfgs ?bhhh
; tlg=0.001,tlb=0.001
;logit ;marginal effects$
Normal exit from iterations. Exit status=0.
+---------------------------------------------------------------+
| Ordered Probability Model |
| Maximum Likelihood Estimates |
| Dependent variable NAIGN5 |
| Weighting variable None |
+-----------+------------------+---------------------- +-----------+-----------+--------------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+-----------+------------------+---------------------- +-----------+-----------+--------------+
+-----------+Index function for probability |
|Constant| 2.96818*** .71141277 4.172 .0000 |
|NTB | 1.37380*** .38246834 3.592 .0003 .4098361|
|NALTS1 | -.92037*** .22538433 -4.084 .0000 3.7283372|
|FFTD | .03290*** .00819172 4.017 .0001 5.4910226|
|CONGT1D | -.00829* .00472072 -1.757 .0789 -7.7076503|
|ADDTIM | -.74074*** .17412229 -4.254 .0000 .8243560|
+-----------+Variance function |
|FFTD | -.01642*** .00597604 -2.748 .0060 5.4910226|
|NLVLS | .10434** .04445941 2.347 .0189 2.9086651|
|WHOPAY | -.30698*** .09874916 -3.109 .0019 1.3981265|
+-----------+Threshold parameters for index |
|Mu(1) | 3.09732*** .53996121 5.736 .0000 |
+-----------+---------------------------------------------------------------------------------- +
| Note: ***, **, * = Significance at 1%, 5%, 10% level. |
+-----------------------------------------------------------------------------------------------+
+----------------------------------------------------------- +
| Marginal Effects for OrdLogit |
+--------------+--------------+--------------+------------- +
| Variable | NAIGN5=0 | NAIGN5=1 | NAIGN5=2 |
+--------------+--------------+--------------+------------- +
| ONE | -.89617 | .84119 | .05498 |
| NTB | -.41479 | .38934 | .02545 |
| NALTS1 | .27788 | -.26084 | -.01705 |
| FFTD | -.00993 | .00933 | .00061 |
| CONGT1D | .00250 | -.00235 | -.00015 |
| ADDTIM | .22365 | -.20993 | -.01372 |
| FFTD | -.23861 | .05368 | .18493 |
| NLVLS | -.11044 | .02485 | .08559 |
| WHOPAY | .07399 | -.01665 | -.05734 |
+--------------+--------------+--------------+------------- +
+-------------------------------------------------------------------------------------------------+
| Cross tabulation of predictions. Row is actual, column is predicted. |
| Model = Logistic . Prediction is number of the most probable cell. |
+----------+---------+-------+------+------+-------+-------+------+------+-------+------+-----+
| Actual|Row Sum| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+----------+---------+-------+------+------+-------+-------+------+------+-------+------+-----+
| 0| 1416| 1157| 0| 259|
| 1| 1080| 707| 0| 373|
| 2| 66| 0| 0| 66|
+----------+---------+-------+------+------+-------+-------+------+------+-------+------+-----+
|Col Sum| 2562| 1864| 0| 698| 0| 0| 0| 0| 0| 0| 0|
+----------+---------+-------+------+------+-------+-------+------+------+-------+------+-----+
Maximum iterations reached. Exit iterations with status=1.
+-------------------------------------------------------------- +
| Random Thresholds Ordered Choice Model |
| Maximum Likelihood Estimates |
| Dependent variable NAIGN5 |
| Weighting variable None |
| Number of observations 2562 |
| Iterations completed 31 |
| Log likelihood function -1786.163 |
| Number of parameters 13 |
| Info. Criterion: AIC = 1.40450 |
| Finite Sample: AIC = 1.40455 |
| Info. Criterion: BIC = 1.43418 |
| Info. Criterion:HQIC = 1.41526 |
| Restricted log likelihood -1871.798 |
| McFadden Pseudo R-squared .0457499 |
| Chi squared 171.2693 |
| Degrees of freedom 9 |
| Prob[ChiSqd > value] = .0000000 |
| Model estimated: Jan 03, 2009, 10:15:43AM |
| Underlying probabilities based on Logistic |
+-------------------------------------------------------------- +
+-----------+-----------------+----------------------+-----------+-----------+--------------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+-----------+-----------------+----------------------+-----------+-----------+--------------+
+-----------+Latent Regression Equation |
|Constant| 3.74598*** 1.22345010 3.062 .0022 |
|NTB | 2.08729*** .70866123 2.945 .0032 .4098361|
|NALTS1 | -1.48161*** .45390074 -3.264 .0011 3.7283372|
|FFTD | .07748*** .02409329 3.216 .0013 5.4910226|
|CONGT1D | -.15712*** .05886616 -2.669 .0076 -7.7076503|
|ADDTIM | -.51850** .24831851 -2.088 .0368 .8243560|
+-----------+Intercept Terms in Random Thresholds |
|Alpha-01| 1.03162*** .25056918 4.117 .0000 |
+-----------+Standard Deviations of Random Thresholds |
|Alpha-01| .000*** . . . . . .(Fixed Parameter). . . . . .. |
+-----------+Variables in Random Thresholds |
|ADDTIM | 2.24714*** .14484003 15.515 .0000 |
|GENDER | .61618*** .08785327 7.014 .0000 |
+-----------+Standard Deviations of Random Regression Parameters |
|Constant| .000*** . . . . . .(Fixed Parameter). . . . . .. |
|NTB | .000*** . . . . . .(Fixed Parameter). . . . . .. |
|NALTS1 | .000*** . . . . . .(Fixed Parameter). . . . . .. |
|FFTD | .000*** . . . . . .(Fixed Parameter). . . . . .. |
|CONGT1D | .36898*** .14216626 2.595 .0094 |
|ADDTIM | .000*** . . . . . .(Fixed Parameter). . . . . .. |
+-----------+Heteroscedasticity in Latent Regression Equation |
|FFTD | -.03170*** .00904323 -3.505 .0005 |
|NLVLS | .21577*** .07512713 2.872 .0041 |
|WHOPAY | -.62202*** .09368992 -6.639 .0000 |
+-----------+---------------------------------------------------------------------------------+
| Note: ***, **, * = Significance at 1%, 5%, 10% level. |
+---------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------+
|Fixed Parameter. . . indicates a parameter that is constrained to equal |
|a fixed value (e.g., 0) or a serious estimation problem. If you did |
|not impose a restriction on the parameter, check for previous errors.|
+-------------------------------------------------------------------------------------------------+
========================================================================
||Summary of Marginal Effects for Ordered Probability Model (probit) ||
||Effects are computed by averaging over observs. during simulations.||
========================================================================
|| Regression Variable ONE Regression Variable NTB
|| ============================== ==============================
Outcome Effect dPy<=nn/dX dPy>=nn/dX Effect dPy<=nn/dX dPy>=nn/dX
======= ============================== ==============================
Y = 00 -.49177 -.49177 .00000 -.27402 -.27402 .00000
Y = 01 .45903 -.03274 .49177 .25577 -.01824 .27402
Y = 02 .03274 .00000 .03274 .01824 .00000 .01824
========================================================================
|| Regression Variable NALTS1 Regression Variable FFTD
|| ============================== ==============================
Outcome Effect dPy<=nn/dX dPy>=nn/dX Effect dPy<=nn/dX dPy>=nn/dX
======= ============================== ==============================
Y = 00 .19450 .19450 .00000 -.01017 -.01017 .00000
Y = 01 -.18155 .01295 -.19450 .00949 -.00068 .01017
Y = 02 -.01295 .00000 -.01295 .00068 .00000 .00068
========================================================================
|| Regression Variable CONGT1D Regression Variable ADDTIM
|| ============================== ==============================
Outcome Effect dPy<=nn/dX dPy>=nn/dX Effect dPy<=nn/dX dPy>=nn/dX
======= ============================== ==============================
Y = 00 .02063 .02063 .00000 .06807 .06807 .00000
Y = 01 -.01925 .00137 -.02063 -.06354 .00453 -.06807
Y = 02 -.00137 .00000 -.00137 -.00453 .00000 -.00453
========================================================================
Indirect Partial Effects for Ordered Choice Model
Variables in thresholds
Outcome ADDTIM GENDER
Y = 00 .000000 .000000
Y = 01 .065100 .017851
Y = 02 -.065100 -.017851
Variables in disturbance variance
Outcome FFTD NLVLS WHOPAY
Y = 00 .002556 -.017397 .050153
Y = 01 -.001512 .010293 -.029673
Y = 02 -.001044 .007104 -.020479
Chapter 19
Combining sources of data

19.1 Introduction
837 Combining sources of data
Figure 19.1 SP and RP data generation process (panel (a): RP data for bus, train, and car plotted against speed; panel (b): SP data for the same modes plus a new mode)
are generally a good thing, but some activities are directed towards such
constraints, and RP data may not contain sufficient variability to permit
identification of such effects.
RP data have high reliability and face validity in terms of the actual choice
reported (after all, these are real choices made by individuals who committed
their actual, limited resources to make the choices possible). There remains
concern about the reliability of data on attributes associated with non-chosen
alternatives. RP data are particularly well suited to short-term forecasting of
small departures from the current state of affairs, which emphasizes the
tactical nature of the support that RP-based models can give. On the other
hand, these same characteristics make RP data quite inflexible, and often
inappropriate, if we wish to forecast to a different market than the historical
one. Shifts in the technological frontier, as opposed to movements along the
frontier, call for different data. Figure 19.1(b) shows how SP data come into
their own. SP choice data can be used to model existing markets (including the
stretching of attribute levels of existing alternatives that are not observed in
real markets), but its strengths become even more apparent if we wish to
consider fundamentally different markets than existing ones. Some of the
characteristics of SP data are as follows:
Technological relationships: Within reason, SP data can cover a much wider
range of attributes and levels than RP data. Technological relationships can
be whatever the experiment designer wishes (although attribute correla-
tions often are built into SP experiments), so that SP models tend to be
more robust than RP models;
[Figure: flow diagram in which the respondent generates SP data (SP trade-offs, SP equilibrium) and RP data (RP trade-offs, RP equilibrium), which feed a choice prediction model]
produce a model that can forecast future scenarios in a real market. RP data are
collected that contain information about the equilibrium and attribute trade-
offs in a particular current market. The RP information (especially the attri-
bute trade-offs) may be deficient (i.e., identification may be problematic, or
efficiency low), and hence SP data are also collected, although the RP and SP
data may come from the same or different individuals. Significantly, the only
SP information used involves the attribute trade-offs, which are pooled with
the RP data to derive a final choice model.
The choice sets need not be the same across the two data sources (i.e., the
alternatives, attributes, and/or attribute levels may differ). The combination of
two data sources will allow the analyst to estimate models using information
that, if they had only one of the data sources available, they might not
otherwise have been able to estimate due to missing data on attributes or
attribute levels. The ability to include non-existent alternatives and manip-
ulate currently non-experienced attributes and attribute levels via an SP
choice experiment is appealing. In cases where an alternative is present in
[Figure: flow diagram in which the respondent generates SP data (SP trade-offs, SP equilibrium) and RP data (RP trade-offs, RP equilibrium), feeding the choice prediction model]
the RP data set but not the SP data set, the analyst will have no other option
but to use the RP data (ill conditioned or not) to estimate the preference
function for that alternative. Similarly, where an alternative is present within
the SP component of a data set but not within the RP component, the analyst
will have to use the SP data to obtain the preference function for that
alternative, including the SP ASCs. Indeed the only circumstance when the
SP ASCs are of relevance is when there is not an RP equivalent alternative.
A different view is represented by the work of Swait, Louviere, and Williams
(1994), and illustrated in Figure 19.3. Their view is that each source of data should
be used to capture those aspects of the choice process for which it is superior. For
example, RP data are used to obtain current market equilibria, but the trade-off
information contained in RP data is ignored because of its deficiencies. SP data
typically cover multiple “markets,” or at least a wider range than a single RP
market, hence the trade-off information in SP is used, but equilibrium informa-
tion is ignored. With regard to the latter, SP data provide information about
equilibria over a large range of situations not necessarily directly relevant to the
final objective, namely prediction in an actual RP market.
Suppose two preference data sources are available, one RP and another SP,
and both deal with the same form of behavior (say, choice of commuter
mode). Each data source has a vector of attributes, and at least some of
them are common to both data sets. For purposes of exposition, let the
common attributes be XRP and XSP in the RP and SP data sets, respectively,
and let there also be unique attributes Z and W, respectively, for each data set.
Invoking the now familiar random utility framework (see Chapter 3), we
assume that the latent utility underlying the choice process in both data sets
is given by Equations (19.1) and (19.2):
U_i^RP = α_i^RP + β^RP X_i^RP + ω Z_i + ε_i^RP,  ∀ i ∈ C^RP    (19.1)

U_i^SP = α_i^SP + β^SP X_i^SP + δ W_i + ε_i^SP,  ∀ i ∈ C^SP    (19.2)

Assuming IID extreme value errors within each data set, the corresponding MNL choice probabilities are:

P_i^RP = exp[λ^RP(α_i^RP + β^RP X_i^RP + ω Z_i)] / Σ_{j ∈ C^RP} exp[λ^RP(α_j^RP + β^RP X_j^RP + ω Z_j)],  ∀ i ∈ C^RP    (19.3)

P_i^SP = exp[λ^SP(α_i^SP + β^SP X_i^SP + δ W_i)] / Σ_{j ∈ C^SP} exp[λ^SP(α_j^SP + β^SP X_j^SP + δ W_j)],  ∀ i ∈ C^SP    (19.4)
The scale factor plays a crucial role in the data enrichment process. Equations
(19.3) and (19.4) make it clear that any particular scale factor and the parameters
of its associated choice model are inseparable and multiplicative (λ.κ), where κ is
some parameter vector. Thus, it is not possible to identify a scale factor within a
particular data source under MNL. Nonetheless, the scale factor associated with
any data source fundamentally affects the values of the estimated parameters,
such that the larger (smaller) the scale, the bigger (smaller) the parameters.
There is an identification problem because scale (λ) and utility (β) para-
meters are confounded and cannot be separated in any one data source which,
in turn, implies that one cannot directly compare parameters from different
choice models. For example, one cannot compare the travel time coefficients
from two data sources directly to determine whether one is larger than the
other. In particular, one cannot determine whether the observed difference is
the result of differences in scale, true parameters, or both. Even if two data
sources were generated by the same utility function (i.e., the same β para-
meters), but have different scale factors λ1 and λ2, the estimated parameters
will differ (in one case they are λ1β, and in the other λ2β).
Let us return to comparing two data sources that we believe reflect the same
utilities, but (potentially) different scales. For example, in combining RP and
SP data, the key question is whether (λ1β1) = (λ2β2). We can rearrange the
latter expression to obtain β1 = (λ2/λ1) β2. The scale factor in an MNL model is
inversely related to the variance of the error term as follows for all alternatives
and respondents (see Chapter 3):
σ² = π²/(6λ²).   (19.5)
Thus, the higher the scale, the smaller the variance, which in turn implies that
models that fit well will also display larger scales. The implication of these
observations about the behavior of the scale parameter is that it plays a role in
choice models that is rather unique compared to more familiar statistical
models like ordinary least squares (OLS) regression (with which many
researchers are acquainted). That is, the model parameters and the character-
istics of the error terms are intimately (even inextricably!) related. In the case
of choice models, it is necessary to think of the variance (or, equivalently,
scale) as an integral part of the model specification instead of being a nuisance
parameter. The relationship between mean and variance exhibited by the
MNL model also is a property shared by many other choice model forms,
such as nested logit (NL) and mixed logit (ML).
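The scale–β confound described above can be demonstrated numerically. The sketch below (Python; simulated data, all names illustrative) generates two binary-choice data sets from the same β but different scales λ, and shows that MNL estimation recovers the product λβ in each case:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)

def simulate_and_fit(beta, lam, n=40000):
    """Simulate binary logit choices with utility scale lam and fit a one-parameter MNL by MLE."""
    x = rng.normal(size=n)                      # attribute difference between the two alternatives
    p = 1.0 / (1.0 + np.exp(-lam * beta * x))   # choice probability of alternative 1
    y = rng.uniform(size=n) < p                 # observed choices

    def nll(b):                                 # negative log-likelihood of the logit
        eta = b * x
        return -np.sum(y * eta - np.log1p(np.exp(eta)))

    return minimize_scalar(nll, bounds=(-10, 10), method="bounded").x

beta = -1.0
b1 = simulate_and_fit(beta, lam=1.0)   # recovers roughly lam1 * beta = -1
b2 = simulate_and_fit(beta, lam=2.0)   # recovers roughly lam2 * beta = -2
print(b1, b2, b2 / b1)
```

Because only the product λβ is identified, the ratio of the two estimates recovers the relative scale λ2/λ1, which is exactly the quantity exploited in data enrichment.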
Our primary interest lies in testing the equality of the parameter vectors for
SP and RP data. The process of combining the two data sources involves
imposing the restriction that the common attributes have the same parameters
in both data sources, i.e., βRP=βSP=β. However, because of the scale factor,
things are not so simple. Since the estimated model parameters are confounded with the scale factors for each data set (see Equations (19.3) and (19.4)), even after imposing the restriction of common attribute parameter equality we must still account for the scale factors, as shown in Equations (19.6) and (19.7) (note the absence of the superscript on β compared with Equations (19.3) and (19.4)):
P_i^RP = exp[λ^RP (α_i^RP + β X_i^RP + ω Z_i)] / Σ_{j∈C^RP} exp[λ^RP (α_j^RP + β X_j^RP + ω Z_j)],  ∀ i ∈ C^RP,   (19.6)

P_i^SP = exp[λ^SP (α_i^SP + β X_i^SP + δ W_i)] / Σ_{j∈C^SP} exp[λ^SP (α_j^SP + β X_j^SP + δ W_j)],  ∀ i ∈ C^SP.   (19.7)
Equations (19.6) and (19.7) make it clear that if we wish to pool these two data
sources to obtain a better estimate of β, we cannot avoid controlling for the
scale factors. Data enrichment involves pooling the two choice data sources
under the restriction that common parameters are equal, while controlling for
the scale factors. Thus, the pooled data should enable us to estimate αRP, β, ω,
λRP, αSP, δ, and λSP. However, we cannot identify both scale factors, so one
must be normalized. It is conventional to assume that the scale of the RP data
set is one (λRP≡1), and so the estimate of λSP represents a relative scale with
respect to the RP data scale. Equivalently, we can view the problem as
estimating the SP variance relative to the RP variance (σ_RP² = π²/6).
The final parameter vector to be jointly estimated is ψ = (αRP, β, ω, αSP, δ,
λSP). Assuming that the two data sources come from independent samples, the
LL of the pooled data is simply the sum of the multinomial log likelihoods of
the RP and SP data:
L(ψ) = Σ_{n∈RP} Σ_{i∈C_n^RP} y_in ln P_in^RP(X_in, Z_in | α^RP, β, ω)
     + Σ_{n∈SP} Σ_{i∈C_n^SP} y_in ln P_in^SP(X_in, W_in | α^SP, β, δ, λ^SP).   (19.8)
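A minimal numerical sketch of this pooled estimation (Python; simulated binary-choice data, all variable names illustrative) normalizes λ^RP = 1, multiplies the SP utilities by a free λ^SP, and maximizes the summed log likelihood of Equation (19.8) over the common β and λ^SP:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
beta_true, lam_sp_true = -1.0, 0.5      # common taste parameter; SP scale relative to RP

def simulate(beta, lam, n):
    x = rng.normal(size=n)
    y = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-lam * beta * x))
    return x, y

x_rp, y_rp = simulate(beta_true, 1.0, 20000)          # RP scale normalized to one
x_sp, y_sp = simulate(beta_true, lam_sp_true, 20000)

def nll(theta):
    beta, lam_sp = theta
    eta_rp = beta * x_rp                  # lam_RP = 1 (normalization)
    eta_sp = lam_sp * beta * x_sp         # SP utilities rescaled by lam_SP
    ll = np.sum(y_rp * eta_rp - np.log1p(np.exp(eta_rp)))
    ll += np.sum(y_sp * eta_sp - np.log1p(np.exp(eta_sp)))
    return -ll

res = minimize(nll, x0=[-0.5, 1.0], method="Nelder-Mead")
beta_hat, lam_sp_hat = res.x
print(beta_hat, lam_sp_hat)   # roughly -1.0 and 0.5
```

The RP observations pin down β (since λ^RP is fixed at one), and the SP observations then identify the relative scale λ^SP, mirroring the identification argument in the text.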
P(i|C_1) = exp[V_i/θ_1] / Σ_{j∈C_1} exp[V_j/θ_1].   (19.9)

P(k|C_2) = exp[V_k/θ_2] / Σ_{j∈C_2} exp[V_j/θ_2].   (19.10)
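A small numerical sketch (Python, illustrative values) of Equations (19.9) and (19.10): dividing the utilities by a larger θ (i.e., a lower scale) flattens the within-cluster choice probabilities.

```python
import numpy as np

def cluster_probs(v, theta):
    """Within-cluster MNL probabilities with inclusive value parameter theta (Eqs 19.9/19.10)."""
    u = np.asarray(v, dtype=float) / theta
    e = np.exp(u - u.max())        # subtract the max for numerical stability
    return e / e.sum()

v = [1.0, 0.0]
print(cluster_probs(v, theta=1.0))   # sharper probabilities: scale one
print(cluster_probs(v, theta=2.0))   # flatter probabilities: larger theta, lower scale
```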
Figure 19.5 Combining SP and RP data using the NMNL model
Imagine that Cluster 1 in Figure 19.4 was renamed “RP” and Cluster 2
renamed “SP,” as in Figure 19.5. Thus, if we estimate a NL model from the
two data sources, we obtain an estimate of the scale factor of one data set
relative to that of the other, and our estimation objective is accomplished. This
approach was proposed by Bradley and Daly (1993) and Hensher and Bradley
(1993), who called the hierarchy in Figure 19.5 an artificial tree structure. That
is, the tree has no obvious behavioral meaning, but is a useful modeling
convenience. The NL model in Nlogit (see Chapter 14) can be used to obtain
FIML estimates of the inverse of the relative scale factors. One can identify
only one of the relative scale factors, so Figure 19.5 normalizes the inclusive
value of the RP data to unity.
The nested structure in Figure 19.5 assumes that the inclusive value para-
meter(s) associated with all SP alternatives are equal, and fixes the RP inclu-
sive value parameter to unity. This assumption allows one to identify and
estimate the variance, and hence scale parameter, of the SP data set relative to
the RP normalization, but forces within data set homoskedasticity.
¹ The NL structure is an econometric formulation to account for differences in variance of the unobserved effects, or scale differences. While analysts tend to use behavioral intuition in partitioning the nest, this is not the basis of nesting. Hence it is quite feasible for mixtures of SP and RP alternatives to reside in the one branch.
parameter estimates from the RP data set and retaining the constant terms, it
is necessary in creating the composite utility functions to recalibrate the RP
constant terms. To demonstrate why, consider Equation (19.12), the
equation used to calculate constant terms in discrete choice models:
β_0i^RP = V̄_i^RP − Σ_{k=1}^{K^RP} β_k^RP x̄_k^RP.   (19.12)
The latter part of Equation (19.12) accounts for the RP parameter estimates
that are to be discarded in constructing the composite utility functions while
failing to account for the SP parameter estimates that are to be used. So why
use the RP attributes in the first place? In estimating the initial SP–RP NL
model, the inclusion or exclusion of an attribute in either data set will affect all
other parameter estimates within the model. Hence, failure to include the
RP attributes in the RP component of the NL model (i.e., simply estimate the
RP model with constants only) will impact upon the SP parameter estimates
obtained from the model. Thus it is necessary, despite potential problems with
RP parameter estimates (given data issues), to include the RP attributes in the
model, otherwise all information for these components will be accounted for
solely by the unobserved effects of the RP utility functions, which enter through V̄_i^RP (nevertheless, at the same time, the constant terms will preserve information on the choice shares within the data set).
The calibration of the RP constant terms occurs through Equation (19.13):
β_0i^RP = V̄_i^RP − λ^SP Σ_{k=1}^{K^RP} β_k^SP x̄_k^RP.   (19.13)
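As a sketch of Equation (19.13) (Python; the numbers are purely illustrative), the recalibrated RP constant is the mean RP utility minus the scaled SP parameters applied to the mean RP attribute levels:

```python
import numpy as np

def recalibrate_rp_constant(v_bar_rp, betas_sp, x_bar_rp, lam_sp):
    """Equation (19.13): beta_0i^RP = Vbar_i^RP - lam_SP * sum_k beta_k^SP * xbar_k^RP."""
    return v_bar_rp - lam_sp * np.dot(betas_sp, x_bar_rp)

# Illustrative values: mean RP utility, SP parameter estimates,
# mean RP attribute levels, and the relative SP scale.
const = recalibrate_rp_constant(v_bar_rp=0.8,
                                betas_sp=np.array([-0.06, -0.58]),
                                x_bar_rp=np.array([25.0, 3.5]),
                                lam_sp=0.72)
print(const)
```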
The NL model is a member of the family of GEV models (McFadden 1981), which cannot accommodate a number of specification requirements of data that contain repeated observations from the same respondent. This arises with SP choice sets, which are potentially correlated because of the repeated observations. In addition
to potential observation correlation, joint SP–RP estimation can induce a “state
dependence” effect, defined as the influence of the actual (revealed) choice on
the stated choices (SCs) of the individual. State dependence can manifest itself
as a positive or negative effect of the choice of an alternative on the utility
associated with that alternative in the stated responses (Bhat and Castelar 2002).
It is a reflection of accumulated experience and the role that reference depen-
dency plays in choosing (Hensher 2006).
It is possible that the effect of state (reference) dependence is positive for
some individuals and negative for others (see Ailawadi et al., 1999), suggesting
that an unconstrained analytical distribution for the random parameteriza-
tion of state dependence is appropriate. A positive effect may be the result of
habit persistence, inertia to explore another alternative, or learning combined
with risk aversion. A negative effect could be the result of variety seeking or the
result of latent frustration with the inconvenience associated with the cur-
rently used alternative (Bhat and Castelar 2002).
Most SP–RP studies disregard state dependence and adopt fixed parameters
(i.e., homogeneity of attribute preference). Bhat and Castelar (2002) accom-
modate such unobserved heterogeneity in the state dependence effect of the
RP choice on SP choices. Brownstone and Train (1999), on the other hand,
accommodate observed heterogeneity in the state dependence effect by inter-
acting the RP choice dummy variable with the socio-demographic character-
istics of the individual and SP choice attributes.
This section outlines an ML model (as per Chapter 15) that can account for a between-alternative error structure including correlated choice sets, SP–RP scale differences, unobserved preference heterogeneity, and state or reference
dependency. It draws on contributions from Bhat and Castelar (2002). An
Prob(y_it = j) = exp(α_ji + β_i′ x_jit) / Σ_{q=1}^{J_i} exp(α_qi + β_i′ x_qit).   (19.14)
β_ki = β_k + σ_k v_ik  and  α_ji = α_j + σ_j v_ji,   (19.15)

ρ_i = ρ + Γ v_i,   (19.16)
where Γ is a diagonal matrix which contains σk on its diagonal. For conve-
nience, we gather the parameters, choice-specific or not, under the subscript
“k.” We can allow the random parameters to be correlated by allowing Γ to be a triangular matrix with non-zero elements below the main diagonal, producing the full covariance matrix of the random coefficients as Σ = ΓΓ′. The standard case of uncorrelated coefficients has Γ = diag(σ_1, σ_2, …, σ_k). If the coefficients are freely correlated, Γ is a full, unrestricted, lower triangular matrix and Σ will have non-zero off-diagonal elements.
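A short sketch (Python, illustrative numbers) of how the lower-triangular Γ induces the covariance matrix Σ = ΓΓ′ of the random parameters, and of how correlated individual parameters are generated as β_i = β + Γv_i:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([-0.6, -0.04])                 # means of the two random parameters
Gamma = np.array([[0.3, 0.0],                  # lower-triangular factor of the covariance
                  [0.1, 0.2]])

Sigma = Gamma @ Gamma.T                        # full covariance matrix of the coefficients
v = rng.standard_normal((100000, 2))           # v_i ~ N(0, I)
beta_i = beta + v @ Gamma.T                    # correlated individual-specific parameters

print(Sigma)                                   # off-diagonal element reflects the correlation
print(np.cov(beta_i, rowvar=False))            # sample covariance approximates Sigma
```

Setting the below-diagonal element of Γ to zero recovers the uncorrelated diag(σ_1, σ_2) case described in the text.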
An additional layer of individual heterogeneity may be added to the
model in the form of the error components that capture influences that
Thus, Ui4t has its own uncorrelated effect, but there is a correlation between
Ui1t and Ui2t and between Ui1t and Ui3t. This example is fully populated, so the
covariance matrix is block diagonal with the first three freely correlated. The
model might usefully be restricted in a specific application. A convenient way
to allow different structures is to introduce the binary variables, djm = 1, if the
random term Em appears in utility function j and zero otherwise. The full
model with all components is given in Equation (19.18), based on Greene and
Hensher (2007):
Prob(y_it = j) = exp[α_ji + β_i′ x_jit + Σ_{m=1}^{M} d_jm θ_m E_im] / Σ_{q=1}^{J_i} exp[α_qi + β_i′ x_qit + Σ_{m=1}^{M} d_qm θ_m E_im].   (19.18)
(α_ji, β_i) = (α_j, β) + ΓΩ_i v_i are random ASCs and taste parameters; Ω_i = diag(ω_i1, ω_i2, …); α_j and β are the constant terms in the distributions of the random taste parameters. The elements ω of the variance–covariance matrix represent the full generalized matrix. Uncorrelated parameters with homogeneous means and variances are obtained as β_ik = β_k + σ_k v_ik when Γ = I and Ω_i = diag(σ_1, …, σ_k); x_jit are observed choice attributes and individual characteristics; and v_i is random unobserved taste variation, with mean vector 0 and covariance matrix I. The individual-specific underlying random error components are introduced through the term
E_im, m = 1, …, M, with E_im ~ N[0,1], where d_jm = 1 if E_im appears in the utility of alternative j and 0 otherwise, and θ_m is a dispersion factor for error component m.
The probabilities defined above are conditioned on the random terms, vi,
and the error components, Ei. The unconditional probabilities are obtained by
integrating vik and Eim out of the conditional probabilities: Pj = Ev,E[P(j|vi,Ei)].
This multiple integral, which does not exist in closed form, is approximated by
sampling nrep draws from the assumed populations and averaging. See, for
example, Bhat (2003); Revelt and Train (1998); Train (2003); Brownstone
et al. (2000) for discussion. Parameters are estimated by maximizing the
simulated log likelihood given in Equation (19.19):
ln L_s(ψ) = Σ_{n=1}^{N} ln [ (1/R) Σ_{r=1}^{R} Π_{t=1}^{T_n} P(y_nt | v_nr, E_nr) ].   (19.19)
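The simulation step can be sketched as follows (Python; illustrative values): the conditional logit probability is averaged over draws of the random terms to approximate the unconditional probability, and the simulated log likelihood sums the logs of these averages over respondents.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulated_probs(x, beta_mean, beta_sd, n_draws=500):
    """Average conditional MNL probabilities over draws of a random coefficient."""
    draws = beta_mean + beta_sd * rng.standard_normal(n_draws)   # beta_ir = beta + sigma * v_ir
    util = np.outer(draws, x)                                    # (draws, alternatives)
    expu = np.exp(util - util.max(axis=1, keepdims=True))        # stabilized exponentials
    probs = expu / expu.sum(axis=1, keepdims=True)               # conditional probabilities
    return probs.mean(axis=0)                                    # unconditional estimate

p = simulated_probs(x=np.array([1.0, 2.0, 3.0]), beta_mean=-0.5, beta_sd=0.2)
print(p, p.sum())
```

In practice Halton or other quasi-random sequences replace the pseudo-random draws, as noted in the table notes later in the chapter.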
φ_q (1 − δ_{qt,RP}),   (19.20)
² We select the SP data set in the empirical application, but the RP data set could have been chosen.
The data were drawn from an SC experiment conducted in six Australian capital cities in the mid-1990s: Sydney, Melbourne, Brisbane, Adelaide, Perth, and
Canberra (Hensher et al. 2005). The universal choice set comprised the currently
available modes plus two “new” modes, light rail and bus-based transitway (often
referred to as a busway). Respondents evaluated scenarios describing ways to
commute between their current residence and workplace location using different
combinations of policy sensitive attributes and levels.
Four alternatives appeared in each travel choice scenario: (a) car drive
alone, (b) car ride share, (c) bus or busway, and (d) train or light rail.
Twelve types of showcards described scenarios involving combinations of
trip length (3) and public transport pairs (4): bus versus light rail, bus versus
train (heavy rail), busway versus light rail, and busway versus train.
Appearance of public transport pairs in each choice set shown to respondents
was based on an experimental design.
Five three-level attributes were used to describe public transport alternatives:
(a) total invehicle time, (b) frequency of service, (c) closest stop to home, (d)
closest stop to destination, and (e) fare. The attributes of the car alternatives were:
(a) travel time, (b) fuel cost, (c) parking cost, (d) travel time variability and, for toll
roads, (e) departure times and (f) toll charge.3 The design allows orthogonal
estimation of alternative-specific main effect models for each mode option.
In addition to the SC data, each respondent provided details of a current
commuting trip for the chosen mode and one alternative mode. This enabled
us to estimate a joint SP and RP model of choice of mode for the commuting
trip. The data and detailed descriptions of the sampling process and data
profile are provided in the first edition of this book – Hensher et al. (2005).
Final models for the NL “trick” model and the ML models have been
estimated with and without accounting for the differences in the sample and
population shares for the RP alternatives. The population shares for each
urban area are given in Table 19.1, and have been used in the models as
choice-based weights for the pooled cities’ data to adjust the parameters,
especially the ASCs for the RP utility expressions. This is necessary when
estimating elasticities, given that the formula includes the choice probability
³ In the empirical model here, we found that the aggregation of fuel and toll costs gave the best statistical fit. The departure time choice model based on these data is given in Louviere et al. (1999).
for the specific alternative. Deciding if the elasticities are sensitive to allowance
for choice-based weighting is an empirical matter, considered below.
The sample modal shares are relatively similar to the population modal
shares. In model estimation, we use the population shares based on the first
four alternatives only. These four alternatives represent, respectively, 91.58
percent, 91.4 percent, 93.33 percent, 91.28 percent, 91.7 percent, and 93.14
percent of all modal trips to work in the six capital cities.
While choice-based weighting is straightforward in an NL model (via the weighted exogenous sample maximum likelihood (WESML) method), where the estimator is exact and does not use simulation, it is not guaranteed to work with simulation-based estimators, because they do not compute a second derivatives matrix when the method invokes the BHHH estimator. Our software attempts to compute the WESML estimator; however, sometimes its approximation to the Hessian is not positive definite, and it then reverts to the adjustment of means with no correction of standard errors. An alternative is exogenous weighting; however, this fix also omits the covariance matrix correction, and hence the asymptotic standard errors (and the t-ratios) are not guaranteed to be efficient. The WESML approximation worked on our data due, we suspect, to sample modal shares that were relatively similar to the population (in brackets) modal shares; namely, 0.61 (0.63), 0.17 (0.22), 0.13 (0.08), and 0.09 (0.07), respectively, for the RP alternatives drive alone, ride share, bus, and train. However, we warn analysts against unwittingly assuming that choice-based weights apply without question to ML models.
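The WESML idea can be sketched simply (Python; the shares are those quoted above, the function names illustrative): each observation's log-likelihood contribution is weighted by the ratio of population to sample share of its chosen alternative.

```python
import numpy as np

sample_shares = np.array([0.61, 0.17, 0.13, 0.09])   # drive alone, ride share, bus, train
pop_shares    = np.array([0.63, 0.22, 0.08, 0.07])

wesml_weights = pop_shares / sample_shares            # one weight per alternative
print(wesml_weights)

def weighted_ll(log_probs_chosen, chosen_idx):
    """Weighted LL: each respondent's contribution is scaled by the weight
    attached to the alternative actually chosen."""
    return np.sum(wesml_weights[chosen_idx] * log_probs_chosen)

# Two illustrative respondents: one chose drive alone (index 0), one chose bus (index 2)
ll = weighted_ll(np.log([0.5, 0.3]), np.array([0, 2]))
print(ll)
```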
The final models are summarized in Table 19.2. The ML models are a
statistically significant improvement in overall goodness of fit after controlling
for different numbers of parameters. The level of service variables are generic
within the car and public transport modes, and travel cost is generic across all
⁴ The triangular distribution was first used for random coefficients by Train and Revelt (2000) and Train (2001), later incorporated into Train (2003). Hensher and Greene (2003) also used it, and it is increasingly being used in empirical studies. Let c be the center and s the spread. The density starts at c−s, rises linearly to c, and then drops linearly to c+s. It is zero below c−s and above c+s. The mean and mode are c. The standard deviation is the spread divided by √6; hence, the spread is the standard deviation times √6. The height of the tent at c is 1/s (such that each side of the tent has area s×(1/s)×(1/2)=1/2, and both sides have area 1/2+1/2=1, as required for a density). The slope is 1/s². The mean weighted average elasticities were also statistically equivalent.
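The triangular density described in n. 4 can be sampled by inverting its CDF (Python sketch, illustrative values; the constrained case sets the spread equal to the mean):

```python
import numpy as np

rng = np.random.default_rng(5)

def triangular_draws(c, s, n):
    """Inverse-CDF draws from the symmetric triangular density on [c-s, c+s]."""
    u = rng.uniform(size=n)
    t = np.where(u < 0.5,
                 np.sqrt(2.0 * u) - 1.0,          # rising side of the tent
                 1.0 - np.sqrt(2.0 * (1.0 - u)))  # falling side of the tent
    return c + s * t

d = triangular_draws(c=-0.58, s=0.58, n=200000)   # constrained: spread = |mean|, sign preserved
print(d.mean(), d.std())                          # mean near c; std near s / sqrt(6)
```

The constrained form (s = |c|) keeps every draw on the same side of zero, which is why it is popular for cost and time coefficients.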
Table 19.2 Model results for “nested logit” trick versus panel mixed logit for combined SP and RP choice data. (Two versions of each model were estimated, without and with choice-based weighting; see the text.)

| Attribute | Alternatives | NL “trick” (1) | NL “trick” (2) | Panel ML (1) | Panel ML (2) |
|---|---|---|---|---|---|
| Invehicle cost | All | −0.5802 (−14.7) | −0.5880 (−12.7) | R: −0.8534 (−14.17)* | R: −0.8551 (−14.46)* |
| Main mode time | SP and RP – DA, RS | −0.0368 (−6.4) | −0.0365 (−3.8) | R: −0.1119 (−13.7)* | R: −0.1123 (−13.4)* |
| Main mode time | SP and RP – BS, TN, LR, BWY | −0.0566 (−8.2) | −0.0598 (−8.2) | R: −0.0679 (−8.42)* | R: −0.0680 (−8.39)* |
| Access & egress mode time | SP and RP – BS, TN, LR, BWY | −0.0374 (−8.5) | −0.0370 (−7.3) | R: −0.0524 (−9.72)* | R: −0.0518 (−9.84)* |
| Personal income | SP and RP – DA | 0.0068 (2.30) | 0.0074 (2.4) | 0.01638 (3.46) | 0.01684 (3.72) |
| Drive alone constant | DA – RP | 0.7429 (2.48) | 1.1381 (3.1) | 2.3445 (7.26) | 2.3221 (7.31) |
| Ride share constant | RS – RP | −0.8444 (−3.1) | −0.2802 (0.86) | −0.9227 (−2.91) | −0.8301 (−2.65) |
| Drive alone constant | DA – SP | 0.0598 (0.36) | 0.0324 (0.18) | | |
| Ride share constant | RS – SP | −0.2507 (−1.8) | −0.2598 (−1.7) | | |
| Train-specific constant | TN – SP | 0.1585 (1.4) | 0.1655 (1.4) | | |
| Light rail-specific constant | LR – SP | 0.3055 (2.81) | 0.3119 (2.8) | | |
| Busway-specific constant | BWY – SP | −0.016 (−0.14) | −0.0171 (−0.14) | | |
| Bus-specific constant | BS – RP | 0.0214 (0.81) | −0.0716 (−0.22) | 0.1383 (0.51) | 0.0709 (0.26) |
| Random parameter standard deviations: | | | | | |
| Invehicle cost | All | | | 0.8534 (−14.17)* | 0.8551 (−14.46)* |
| Main mode time | SP and RP – DA, RS | | | 0.1119 (−13.7)* | 0.1123 (−13.4)* |
| Main mode time | SP and RP – BS, TN, LR, BWY | | | 0.0679 (−8.42)* | 0.0680 (−8.39)* |
| Access & egress mode time | SP and RP – BS, TN, LR, BWY | | | 0.0524 (−9.72)* | 0.0518 (−9.84)* |
| State dependence | DA, RS, BS, TN | | | 0.0917 (−0.81)* | 0.0834 (−0.75)* |
| SP to RP scale parameter | DA, RS | | | 2.963 (0.89) | 2.367 (1.16) |
| | BS, BWY | | | 1.077 (6.48) | 1.079 (6.64) |
| | TN, LR | | | 1.058 (6.31) | 1.200 (5.34) |
| Error component (alternative-specific heterogeneity) | SP and RP – DA | | | 2.877 (13.2) | 2.758 (13.7) |
| | SP and RP – RS | | | 1.845 (8.5) | 1.992 (9.1) |
| Scale parameters | SP and RP – DA, RS | 1.00 (fixed) | 1.00 (fixed) | | |
| | SP and RP – BS, TN, LR, BWY | 0.7321 (8.85) | 0.7218 (6.57) | | |
| Sample size | | 2688 | 2688 | 2688 | 2688 |
| LL at convergence | | −2668.1 | −2637.8 | −2324.7 | −2327.86 |

Notes: * = constrained triangular random parameter; R = random parameter mean estimate; DA = drive alone, RS = ride share, BS = bus, TN = train, LR = light rail, BWY = busway; EC = error components. We used 500 Halton draws to perform our integrations, so there is no simulation variance.
that the influencing attributes on car use are often more extensive (especially
when including socio-demographic conditioning) than the set that determines
public transport use. Given the generally dominant role of the car in many
cities (notably 70–85 percent modal share in Australian cities), one might
expect greater preference heterogeneity in the car choosers and hence an
increasing likelihood of greater unobserved heterogeneity. Interestingly, the
scale parameter in the NL model of 0.7218 for public transport suggests greater
unobserved heterogeneity for public transport modes than for the car; however,
while this may be the appropriate interpretation for this model, the failure to account for correlated choice sets and for random preference heterogeneity in
the attributes and in the alternatives makes the comparison somewhat trite.
;effects:fc(rda,rrs)/at(rda,rrs)/
pf(rbs,rtn)/mt(rbs,rtn)/ae(rbs,rtn)
;pwt
;tree= car(RDA,RRS,SDA,SRS),PT(RBS,RTN,SBS,STN,SLR,SBW)
;ivset: (car)=[1.0]
;ru1
;model:
U(RDA) =rdasc+ flptc*fc+tm*at +pinc*pincome/
U(RRS) = rrsasc+ flptc*fc+tm*at/
U(RBS) = rbsasc + flptc*pf+mt*mt+acegt*ae/
U(RTN) = flptc*pf+mt*mt+acegt*ae/
U(SDA) = sdasc + flptc*fueld+ tm*time+pinc*pincome/
U(SRS) = srsasc + flptc*fueld+ tm*time /
U(SBS) = flptc*fared+ mt*time+acegt*spacegtm/
U(STN) = stnasc + flptc*fared+ mt*time+acegt*spacegtm/
U(SLR) = slrasc + flptc*fared+ mt*time+acegt*spacegtm/
U(SBW) = sbwasc+ flptc*fared+ mt*time+acegt*spacegtm$
data sources. What appears absent from the literature is the capture of both
scale heterogeneity for a pooled data set and data-specific scale heterogeneity
effects. While this might appear to be a small extension, it is a crucial add-on, given the expected increase in the use of GMXL with multiple data sources.⁵
We now move beyond the NL “trick” and the extensions developed in
Section 19.4 to the most generalized version of a choice model to combine
multiple data sets, decomposing scale heterogeneity to identify data-specific
scale effects.
The extension of interest here is to allow τ (see Chapter 15) to be a function of a series of dummy variables that identify the presence of scale heterogeneity between different data sets, such as an SP and an RP data set. The use of the SMNL or GMX model permits a new variant of SP–RP model estimation. This is a simple but important extension, as follows: τ_s = τ + ηd_s, where η is a data set-specific scale parameter and d_s = 1 for data source s and zero otherwise, with s = 1, 2, …, S−1. Hence we allow for differences in the GMXL scale factor between the SP and RP data sets through the inclusion of a dummy variable d_s (d_s = 1 for SP, d_s = 0 for RP) associated with σ_irs, i.e., σ_irs = exp(−(τ + ηd_s)²/2 + (τ + ηd_s)w_ir).
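As a sketch of this specification (Python; τ and η are taken from Model 3 in Table 19.3, everything else is illustrative), the individual- and data-set-specific standard deviation is:

```python
import numpy as np

def gmxl_sigma(tau, eta, d_s, w):
    """sigma_irs = exp(-(tau + eta*d_s)^2 / 2 + (tau + eta*d_s) * w)."""
    tau_s = tau + eta * d_s        # data-set-specific scale dispersion
    return np.exp(-tau_s**2 / 2.0 + tau_s * w)

tau, eta = 0.8649, 1.6209          # Model 3 estimates (Table 19.3)
# At w = 0, sigma equals exp(-tau_s^2 / 2); the SP data (d_s = 1) has the larger dispersion
print(gmxl_sigma(tau, eta, d_s=0, w=0.0))   # RP
print(gmxl_sigma(tau, eta, d_s=1, w=0.0))   # SP
```

The −τ_s²/2 term is the usual GMXL normalization that centers the lognormal scale distribution, consistent with the exp(−τ²/2 + τw_ir) expression quoted later in the section.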
We use the same data set as in Section 19.4. The GMX command syntax in
Nlogit is given at the end of this section.
The results from the estimation of the GMXL models are summarized in
Table 19.3. We have estimated two models, in addition to a baseline ML model
(M1). Model 2 (M2) accounts for scale heterogeneity without distinguishing
data sets, and Model 3 (M3) is M2 plus an allowance for scale heterogeneity
between data sets (using the command ;hft=spdum).
The Bayes information criterion (BIC) is increasingly used as the preferred measure for comparing the overall fit of choice models. When estimating model parameters by maximum likelihood, it is always possible to increase the likelihood by adding parameters, which may result in over-fitting. BIC addresses this by introducing a penalty term for the number of parameters in the model: it is an increasing function of the error variance and of the number of free parameters estimated. Hence, a lower BIC implies fewer explanatory variables, better fit, or both, and the model with the lower value of BIC is the one to be preferred.
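The BIC values in Table 19.3 are reported per observation. A sketch of the calculation (Python; the parameter count K = 13 for Model 1 is our inference from the table, five attribute parameters plus eight constants, and is not stated in the text):

```python
import numpy as np

def bic_per_obs(ll, k, n):
    """Per-observation Bayes information criterion: (-2*LL + K*ln(N)) / N."""
    return (-2.0 * ll + k * np.log(n)) / n

# Model 1 in Table 19.3: LL = -2549.24, N = 2688; K = 13 reproduces the reported 1.9349
print(bic_per_obs(ll=-2549.24, k=13, n=2688))
```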
From the evidence in Table 19.3, we would conclude that Model 3 is the
preferred model but that Models 1 and 2 are virtually indistinguishable on overall
⁵ We have been asked on many occasions to advise researchers how to accommodate scale differences between data sets under conditions of scale heterogeneity.
Table 19.3 Summary of model results (t-values in brackets; 500 Halton draws, with panel structure accommodated; all random parameters are constrained triangular distributions⁶)

| Attribute | Alternatives | M1 (ML) | M2 (GMXL) | M3 (GMXL, data-specific scale) | Mean (std dev.) |
|---|---|---|---|---|---|
| Invehicle cost ($) | All | −0.6223 (−13.8) | −0.7243 (−12.8) | −0.1284 (−4.57) | 2.36 (1.92), 1.46 (1.07) |
| Main mode time (min.) | SP and RP – DA, RS | −0.1198 (−12.5) | −0.1447 (−12.7) | −0.0449 (−5.04) | 23.31 (17.55) |
| Main mode time (min.) | SP and RP – BS, TN, LR, BWY | −0.0838 (−10.5) | −0.0966 (−9.1) | −0.0127 (−4.09) | 15.96 (10.83) |
| Access & egress mode time (min.) | SP and RP – BS, TN, LR, BWY | −0.0459 (−9.26) | −0.0523 (−7.9) | −0.0060 (−4.20) | 18.86 (13.52) |
| Personal income ($000s) | SP and RP – DA | 0.0080 (2.25) | 0.0081 (2.26) | 0.0065 (2.61) | 34,600 (16,480) |
| Drive alone constant | DA – RP | 1.2438 (3.29) | 1.4345 (3.53) | 2.5409 (11.2) | |
| Ride share constant | RS – RP | −0.5641 (−1.61) | −0.3776 (−1.01) | 0.9096 (4.64) | |
| Drive alone constant | DA – SP | 0.2625 (2.59) | 0.8109 (2.74) | 2.7805 (11.18) | |
| Ride share constant | RS – SP | 0.4070 (1.71) | 0.5431 (2.00) | 2.4598 (10.34) | |
| Train-specific constant | TN – SP | 0.2271 (2.01) | 0.2382 (2.04) | 0.1648 (1.53) | |
| Light rail-specific constant | LR – SP | 0.3995 (3.99) | 0.4147 (4.00) | 0.3618 (3.67) | |
| Bus-specific constant | BS – SP | 0.0125 (0.10) | 0.0127 (0.10) | 0.0486 (0.34) | |
| Bus-specific constant | BS – RP | −0.1363 (−0.50) | −0.1484 (−0.53) | 0.3347 (1.61) | |
| Random parameter standard deviations: | | | | | |
| Invehicle cost | All | −0.6223 (−13.8) | −0.7243 (−12.8) | −0.1284 (−4.57) | |
| Main mode time | SP and RP – DA, RS | −0.1198 (−12.5) | −0.1447 (−12.7) | −0.0449 (−5.04) | |
| Main mode time | SP and RP – BS, TN, LR, BWY | −0.0838 (−10.5) | −0.0966 (−9.1) | −0.0127 (−4.09) | |
| Access & egress mode time | SP and RP – BS, TN, LR, BWY | −0.0459 (−9.26) | −0.0523 (−7.9) | −0.0060 (−4.20) | |
| Variance parameter in scale (τ) | | – | 0.5260 (11.67) | 0.8649 (14.95) | |
| Heterogeneity in GMXL scale factor (SP) | | – | – | 1.6209 (7.90) | |
| Sample size | | 2688 | 2688 | 2688 | |
| LL at zero | | −6189.45 | −6189.45 | −6189.45 | |
| LL at convergence | | −2549.24 | −2544.68 | −2518.67 | |
| Pseudo-R² | | 0.5881 | 0.5889 | 0.5931 | |
| Bayes information criterion (BIC) | | 1.9349 | 1.9345 | 1.9241 | |
| Value of travel time savings (AU1995$ per person hr): | | | | | |
| Main mode time | SP and RP – DA, RS | 11.55 (0.98) | 12.21 (1.32) | 20.99 (2.12) | |
| Main mode time | SP and RP – BS, TN, LR, BWY | 8.08 (0.54) | 7.69 (0.78) | 5.92 (0.45) | |
| Access & egress mode time | SP and RP – BS, TN, LR, BWY | 4.42 (0.13) | 4.32 (0.62) | 2.82 (0.72) | |

⁶ See n. 4.
fit.7 This latter evidence suggests that accounting for scale heterogeneity without
allowing for scale differences between data sources in the GMXL scale factor does
not improve the overall behavioral explanation offered by ML. This is reinforced
by the similarity of the mean values of travel time savings between Models 1 and 2.
Despite this finding, however, we have a statistically significant parameter
estimate for the coefficient on the overall unobserved scale heterogeneity, τ, in
Model 2 (t-ratio of 11.67). This suggests that, although we have identified the
presence of unobserved scale heterogeneity, when it is fed into the calculation
of the individual-specific standard deviation of the idiosyncratic error term,
σir = exp(−τ²/2 + τwir), where the unobserved heterogeneity wir is standard
normally distributed, the "mean of the standard deviation" and the "standard
deviation of the standard deviation" are such that the overall influence is not
significantly different from unity.
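The "not significantly different from unity" point can be illustrated by simulation: with wir standard normal, exp(−τ²/2 + τwir) is lognormal, and the −τ²/2 term is exactly the correction that pins its mean at 1 for any τ. A sketch using the Model 2 estimate τ = 0.5260 (everything else is our illustration):

```python
import math
import random

# Hedged sketch: sigma_ir = exp(-tau^2/2 + tau*w_ir) with w_ir ~ N(0, 1).
# The -tau^2/2 term offsets the lognormal mean, so E[sigma_ir] = 1 for any
# tau: scale heterogeneity spreads the scale factor around unity without
# shifting its mean.
random.seed(42)
tau = 0.5260  # Model 2 estimate of the variance parameter in scale

draws = [math.exp(-tau ** 2 / 2 + tau * random.gauss(0.0, 1.0))
         for _ in range(200_000)]
mean_sigma = sum(draws) / len(draws)   # close to 1.0
```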
When we allow for differences in the GMXL scale factor between the SP and
RP data sets in Model 3, through the inclusion of a dummy variable ds (ds = 1
for SP, 0 for RP) associated with σirs, i.e., σirs = exp(−(τ + ηds)²/2 + (τ + ηds)wir),
to capture data set-specific scale differences, we find a significant difference in
overall fit on BIC as well as in mean estimates of the value of travel time savings
(VTTS).8 The mean sigma for the SP data is 0.810 (with a standard deviation
of 1.058); the mean sigma for the RP data is 0.96542 (with a standard deviation of
0.9534). These distributions are plotted in Figure 19.6. We observe greater
variance in unobserved heterogeneity in the SP data (i.e., lower scale)
[Figure 19.6 Kernel density estimates for SIGSP (SP data, left panel) and SIGRP (RP data, right panel)]
7 The similarity of Models 1 and 2 in the model fit highlights the flexibility of the ML model; even the less flexible model performs equally well when systematic scale heterogeneity is present.
8 The standard errors associated with the VTTS are such that Models 1 and 2 are not significantly different in terms of mean estimates of VTTS, whereas Models 2 and 3 are.
865 Combining sources of data
compared to the RP data (noting that the numerical scale on the horizontal
axis differs). This seems plausible, given that the SP data induce greater
variation in the levels of the observed attributes and carry a real possibility of
greater uncertainty in choice, in contrast to the binary RP setting, where
experience has clarified the levels of the attributes of at least the chosen
alternative.
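The data set-specific scale expression can be sketched the same way: with ds = 1 for SP observations, the SP scale factor becomes τ + η, which, at the Model 3 estimates, yields a far more dispersed sigma distribution for the SP data than for the RP data, in line with Figure 19.6. Only τ = 0.8649 and η = 1.6209 below come from the table; the rest is our illustration:

```python
import math
import random

# Hedged sketch: sigma_irs = exp(-(tau + eta*d_s)^2/2 + (tau + eta*d_s)*w_ir),
# with d_s = 1 for SP rows and 0 for RP rows (Model 3: tau = 0.8649,
# eta = 1.6209).
random.seed(7)
tau, eta = 0.8649, 1.6209

def sigma_draws(scale: float, n: int = 100_000):
    return [math.exp(-scale ** 2 / 2 + scale * random.gauss(0.0, 1.0))
            for _ in range(n)]

def sample_std(x):
    m = sum(x) / len(x)
    return (sum((v - m) ** 2 for v in x) / (len(x) - 1)) ** 0.5

rp = sigma_draws(tau)         # d_s = 0: RP scale factor is tau
sp = sigma_draws(tau + eta)   # d_s = 1: SP scale factor is tau + eta
# sample_std(sp) far exceeds sample_std(rp): greater dispersion
# (i.e., lower scale) in the SP data, as the text concludes.
```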
The evidence, albeit from one data set, suggests that data source-specific scale
differences play a potentially important, and behaviorally meaningful, role when
data sets are pooled. Another way of interpreting this finding is that, although
scale heterogeneity may be of intrinsic value, there is a need to include data
source-specific scale conditioning through adjustments in the overall measure of
scale heterogeneity in pooled data sets. This reinforces the approach adopted in
earlier studies and in Section 19.4, which use closed form models (with fixed
parameters), such as the NL "trick" for combining data sets to reveal scale
differences. Allowing for scale heterogeneity alone across two or more data sets
appears not to be sufficient, at least for this one empirical application.
The Nlogit syntax is provided below, together with the model output, for GMX:
RESET
load;file=c:\spmaterial\sprpdemo\sprp.sav$
Project file contained 9408 observations.
sample;all$
reject;altij=1$
reject;altij=5$
reject;altij=6$
sample;all$
create;if(sprp=2)spdum=1$
gmxlogit;userp ? userp obtains mixed logit parameter estimates as
? starting values instead of MNL estimates
;lhs=chosen,cset,altij
;choices=RPDA,RPRS,RPBS,RPTN,SPDA,SPRS,SPBS,SPTN,SPLR,SPBW
;pwt
;tlf=.001;tlb=.001;tlg=.001
;gmx
;tau=0.5
;gamma=[0]
;hft=spdum
;maxit=50
;fcn=tm(t,1),mt(t,1),acegt(t,1),flptc(t,1)
;par
;halton;pts=500;pds=4
;model:
U(RPDA) = rdasc + flptc*fcost+tm*autotime+pinc*pincome/
U(RPRS) = rrsasc + flptc*fcost+tm*autotime/
--> dstats;rhs=vtm,vmt,vacegt,sigrp,sigsp$
Descriptive Statistics
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
-----------+----------------------------------------------------------------------------------------
VTM| 20.99509 .139574E-11 20.9951 20.9951 9408 0
VMT| 5.91781 .00610 5.74363 5.92104 9408 0
VACEGT| 2.81710 .745729E-13 2.81710 2.81710 9408 0
SIGRP| 1.01330 1.10680 .216893E-01 32.4036 9408 0
SIGSP| 1.24960 30.89010 .220529E-05 2927.59 9408 0
9 We use the phrase "choice experiment" (CE) to refer to the methods commonly adopted in transportation studies to evaluate packages of attributes, referred to as alternatives, and to then make a choice or to rank order the alternatives.
10 We use the phrase "marginal willingness to pay" (MWTP) to refer to the valuation of a specific attribute.
11 Total WTP is language common in health, environmental, and resource studies to represent the change in total consumer surplus between the null alternative and the application of interest. The estimate is based on the total utility difference in dollars between a base alternative and a scenario where an attribute takes a specific value (e.g., unconstrained moose hunting versus banning moose hunting).
An assessment of the evidence, including meta analyses (e.g., List and Gallet
2001; Murphy et al. 2004), points to a number of potentially key influences on
12 The literature in agricultural, resource, and environmental valuation does not calibrate the ASCs, even when the application has real market alternatives with a known market share (e.g., Lusk and Schroeder 2004). This may in part explain the significant differences in the total WTP, in contrast to the non-significant differences in MWTP.
13 The recognition of the role of the opt-out or null alternative has been described by Glenn Harrison as a potentially key insight into why conjoint choice experiments may allow analysts to do tight statistical calibration for hypothetical bias (personal communication, February 9, 2008).
the findings in respect of MWTP and TWTP. These include the nature of the
good being studied (private or public); any connotations in terms of environ-
mental consciousness (feel good, yea-saying); the presence or absence of an
opt-out alternative; the opportunity to calibrate ASCs on all, or on a subset of,
alternatives that are observed in actual markets; the role of supplementary
data to condition the choice outcome, hypothetical or real, that can be
encapsulated in the notion of information processing (e.g., identifying heuristics,
such as the imposition of a threshold on the way one processes attributes,
or ignoring certain attributes that impact on choices, that may be generic to an
individual or specific to the experimental circumstance, including referencing14
around a known experience15 – see Hensher 2006, 2008; Rose et al. 2008); and
items that identify "the confidence with which they would hypothetically
purchase the good at the stated alternative or attribute" (Harrison 2007).
The focus of the section is on the MWTP evidence from choice experiments
(CEs) in the transportation context, and the extent to which evidence on
hypothetical bias from a wider literature, including the more extensive litera-
ture on contingent valuation, can offer guidance on how CEs might be
structured to narrow the gap between actual market WTP and WTP derived
from hypothetical choice experiments.
In Section 19.6.1, we present a number of key themes to highlight
approaches used to estimate MWTP, the main focus of this section, and to
identify possible sources of hypothetical bias revealed in the literature. This is
followed by a limited empirical assessment, using a number of traditional RP
and CE data sets, given the absence of non-experiment real choices observed
in a natural environment,16 designed to suggest directional influences of
specific CE elements on the gap between RP and CE MWTP. We then
consider the role of the numerator and the denominator in the empirical
estimation of MWTP, which suggests that a closer look at referencing within a
14 Referencing is the extent to which an application has an identifiable real observation to benchmark against (e.g., choice among existing tolled and free routes used to establish market shares and MWTP for time savings), in contrast to the valuation of specific attributes, such as noise and safety, where a real observation of MWTP is not usually known or able to be assessed unambiguously.
15 The use of referencing the CE design to a real activity, as in toll road studies, is generally lacking in the literature outside of transportation. Glenn Harrison makes the valid point that this may be a two-edged frame, biasing responses. One way of establishing the presence of bias is to incorporate the reference into the design as a treatment which is present and absent across the within-subject experiment. This is also a way of assessing endogeneity (see Train and Wilson 2008).
16 Where the individuals do not know they are in an experiment (see Harrison and List 2004), or what I refer to as "at a distance."
17 One referee provided extensive comment in support of conventional RP choice models as evidence of WTP in real markets. This is controversial; however, given the extensive use of RP models as if they reproduce real market trading among attributes, we have incorporated this benchmark in the section.
18 One way of distinguishing salient and non-salient circumstances is that a salient economic commitment would be consistent with "I prefer X and Y and I actually chose X." A non-salient economic commitment would be "I prefer X to Y but there is no guarantee I will actually choose X."
19 As Ben-Akiva et al. (1994) state, "the possibilities to elicit real WTP measures are limited because they can be varied only on a small scale."
20 For example, in a high occupancy toll (HOT) lane context in California, the analyst is able to measure the travel times using third-party methods (e.g., car following) and to identify the toll as a posted price, as was the situation reported in Brownstone and Small (2005).
21 It is not yet clear whether the analytical methods implemented to identify the use of various process rules up to a probability are an improvement on the self-stated supplementary questions asked of respondents as to how they processed the attribute data in CEs (e.g., non-attendance to specific attributes, adding up common metric unit attributes). See also Chapter 21.
according to the alternative chosen (e.g., Isacsson 2007; Lusk and Schroeder
2004). A concern with this focus is that the participation fee may be driving
the outcome, as distinct from revealing true behavior under circumstances
where the financial means is internally derived by the individual and is a real
trade in terms of opportunity cost. Carlsson and Martinsson (2001) suggest
that:
our test of external validity may not be seen as a test of truthful revelation but rather as
a test of external validity between hypothetical and actual experiments.
22 To investigate the possibility of bias caused by systematic misperception of travel times, Ghosh (2001) used perceived time savings to help explain route, mode, and transponder choice in a tolled versus non-tolled lane choice setting. Perception error (defined as perceived minus actual time savings) was added as an explanatory variable. He found that commuters with larger positive perception errors are more likely to use the toll facility; however, the RP values of time savings are not changed by including this variable, suggesting that RP results may not be affected by perceptual problems. Ghosh was not able to identify whether or not SP results are so affected (see also Brownstone and Small 2005).
23 The term was apparently first introduced in 1947 by S.V. Ciriacy-Wantrup, who thought that the appropriate procedures employed interviews in which subjects are "asked how much money they are willing to pay for successive additional quantities of a collective extra-market good." One implicit assumption of this definition is that contingent value is not needed for ordinary market goods. But with respect to those goods that are not bought and sold, some device has to replace the set of prices that markets happily make explicit. Toward that end, the tester prepares an array of questions about some particular subject matter in order to elicit how much respondents would be prepared to pay – the so-called WTP – in order to secure the provision of some public good. Alternatively, they are asked how much money someone would have to pay them – the so-called WTA – to discontinue some public project that they hold dear. There is sharp disagreement as to how useful the best of these studies is in making value determinations for widespread studies on the valuation of a full set of public goods – the creation of a national park, the preservation of wildlife in an estuary, the control of epidemics, the pursuit of national security, or whatever.
19.6.2.1 CV evidence
The method of CV has been the subject of heavy criticism, with much of this
debate focusing on the validity of the results, in particular the hypothetical
nature of the experiments (see Carson et al. 1996). The accumulating evidence
suggests that individuals in such hypothetical CV studies exaggerate their
TWTP and MWTP for private and public goods, in part due to the problems
associated with the poor representation of other relevant attributes when the
CV focus is on one attribute only. Several attempts have been made to reduce
the influence of this hypothetical bias, of which cheap talk scripts appear to be
among the most successful. Initially suggested by Cummings et al. (1995,
1995a), cheap talk attempts to reduce hypothetical bias by describing and
discussing the propensity of respondents to exaggerate stated WTP for a
specific good at a specific price. In studies using private goods, classroom
experiments, or closely controlled field settings, cheap talk has proved
potentially successful (see Cummings and Taylor 1999). While the hypothetical
mean TWTP without cheap talk was significantly higher than the TWTP
using actual economic commitments, the hypothetical TWTP with a cheap
talk script could not be shown to be statistically significantly different from the
actual TWTP. In general, we would conclude that the evidence is mixed and
the debate is still wide open.
List and Gallet (2001)25 used a meta analysis to explore whether there are
any systematic relationships between various methodological differences and
hypothetical bias. Their results indicate that the magnitude of hypothetical
bias was statistically less for (a) WTP, as compared to willingness to accept
(WTA) applications, (b) private as compared to public goods, and (c) one
24 A process is said to be incentive compatible if all of the participants fare best when they truthfully reveal any private information the mechanism asks for. As an illustration, voting systems which create incentives to vote dishonestly lack the property of incentive compatibility. In the absence of dummy bidders, a second-price auction is an example of a mechanism that is incentive compatible. There are different degrees of incentive compatibility: in some games, truth-telling can be a dominant strategy. A weaker notion is that truth-telling is a Bayes–Nash equilibrium: it is best for each participant to tell the truth, provided that others are also doing so. See Harrison (2007).
25 Their empirical analysis is an update of Foster et al. (1997).
19.6.2.2 CE evidence
CEs are typically framed in a manner that adds realism, in that they closely
resemble individual purchasing or use decisions. There are surprisingly few
published studies that test for hypothetical bias in CE (exceptions being Alfnes
and Steine 2005; Lusk 2003; Lusk and Schroeder 2004; Cameron et al. 2002;
Carlsson and Martinsson 2001; List et al. 2001; Johansson-Stenman and
Svedsäter 2003; Brownstone and Small 2005; Isacsson 2007). Both Carlsson
and Martinsson (2001) and Cameron et al. (2002) fail to reject a hypothesis of
equal MWTP in both a real and a hypothetical setting, while Johansson-
Stenman and Svedsäter (2003) reject the equality of MWTPs, and Lusk and
Schroeder (2004) find that hypothetical TWTP for the good exceeds real
TWTP but fail to reject the equality of MWTPs for changes in the single
26 The increasing role that in-depth interviews and focus groups are playing in the definition of choice experiments has been found by the author to add substantial credibility to the experiments. Recent studies in the context of determining the MWTP for music in gym classes, and at nightclubs and discos, which were subsequently used in Federal Court of Australia arbitrations on music royalties, confirm this.
attributes. Carlsson et al. (2005) also conclude that they cannot reject the
hypothesis of a hypothetical bias for MWTP in choice experiments.
List et al. (2006) explore CEs that conveniently provide information on the
purchase decision as well as the attribute value vector. The empirical work
revolves around examining behavior in two very different field settings. In the
first field study, they explored hypothetical bias in the purchase decision by
eliciting contributions for a threshold public good in an actual capital cam-
paign. To extend the analysis a level deeper, in a second field experiment they
examined both the purchase decision and the marginal value vector via
inspection of consumption decisions in an actual marketplace. In support of
CEs, both field experiments provide some evidence that hypothetical choice
experiments combined with “cheap talk,” be it light or heavy, can yield
credible estimates of the purchase decision. Furthermore, they find no evidence
of hypothetical bias when estimating MWTP. Yet, they do find that the cheap
talk component might induce internal inconsistency of subjects’ preferences
in the choice experiment.
Lusk (2003) explored the effect of cheap talk on WTP that was elicited via a
mass mail survey (n = 4,900) for a novel food product, golden rice. Employing
a double-bounded dichotomous choice question, he found that estimated
WTP, calculated from hypothetical responses with cheap talk, was signifi-
cantly less than WTP estimated from hypothetical responses without cheap
talk. However, consistent with List (2001), he found that cheap talk does not
reduce WTP for experienced, or in our case knowledgeable, consumers. For all
consumers, average WTP for golden rice exceeds the price of traditional white
rice. The evidence from List (2001) and Lusk (2003) that cheap talk tends to
attenuate hypothetical bias only for subjects less familiar with the good being
valued reinforces the importance of referencing, a key focus of the current
discussion.
In addition, the potential effect of “realism” (Cummings and Taylor 1998)
or “consequentialism” (Landry and List 2007), or the role of “limit cards”
(Backhaus et al. 2005) further supports the appeal of referencing around an
experience good or alternative. The “limit cards” approach requests the
respondent to place an imaginary “limit card” behind the stimulus he con-
siders just sufficient to generate a choice. In this manner, the limit card
combines the first-preference response in CE studies with a ranking position
that separates acceptable stimuli from those that are not deemed capable of
leading to a choice. The underlying theoretical argument to support limit
cards is that individuals evaluate “decision” alternatives at a subjective level,
called the comparison level, which is not dissimilar to the idea of a reference
19.6.2.3 Summary
In summary, this section has identified a number of candidate influences on
the magnitude of hypothetical bias. The key influences are the use of cheap
talk to assist in attenuating bias, especially where there is a lack of experience;
the ability to opt out, in contrast to a forced choice, which is linked to
referencing in that the respondent's opt-out is maintenance of the status
quo; and the use of processing strategies, such as "limit cards" or questions to
establish the threshold limits for the set of alternatives treated as serious
decision alternatives.
What we appear to have is a strong recommendation for greater clarity of
the CE (i) in terms of a translation of offerings in real markets, (ii) in the
manner in which experience is embedded in the CE (through, in particular,
pivoting around an experienced good), and (iii) in the way that we capture
information to delineate the process heuristics that each individual uses in
evaluating the attributes and alternatives.
In the remaining sections, we focus on the role that referencing (linked to
opt-out) can play in reducing hypothetical bias, taking the Brownstone and
Small (2005) study as one influential and current benchmark in travel behavior
research for evidence on MWTP obtained from observing real behavioral
decisions, relative to CE evidence. We also acknowledge that many researchers
regard the MWTP from traditional RP studies (with all their known deficiencies)
as another "benchmark of interest," which often produces higher mean
estimates than CE studies. An investigation into the role of process heuristics
is provided in Hensher and Greene (2008).
27 Both of these features may well be at the center of the sources of hypothetical bias.
Table 19.4 Summary of illustrative Australian empirical evidence on VTTS: traditional CE versus RP

Study title | Context | Model and mean VTTS ($ per person hr.) | Ratio RP/CE (95 percent confidence limits on a symmetrical distribution)

RP versus CE evidence
Sydney–Melbourne 1987 | Long-distance non-commuting; labeled mode choice | Error component: RP 9.74±6.23; CE 5.81±3.01 | 1.67±2.1
Six Australian capital cities 1994 | Urban commuting; labeled mode choice | Error component: RP 3.51±1.47; CE 4.20±2.13 | 0.838±0.7
Sydney Pricing Tribunal 1995 | Urban commuting; labeled mode and ticket type choice | MNL: RP 6.73±3.94; CE 6.11±3.22 | 1.10±1.2
 | | Error component: RP 6.87±4.58; CE 6.26±2.95 | 1.09±1.56

Note: All studies used a face to face pencil and paper survey.
The evidence suggests that the ratio RP/CE is not significantly different
from 1.0 in any of the three studies28 and hence we cannot reject the null
hypothesis of no evidence of hypothetical bias. If the “truth” resided in the RP
model, which in these three studies has the usual concerns with the identifica-
tion of the relevant set of non-chosen alternatives (including the measurement
error problem), then we would indeed conclude that hypothetical bias is not
an issue. However, the influential paper by Brownstone and Small (2005),
especially in the United States, has convinced a growing number of researchers
and practitioners that MWTP from CE studies is significantly under-estimated
compared to revealed behavior studies, and hence there is a need to seek out
possible explanations for this. Brownstone and Small (2005) suggest that the
traditional RP estimate is not a benchmark compared to real market
observation.29 Importantly, Brownstone and Small's model looks like the usual
RP model, but the data are obtained from a sample of travellers actually
observed choosing between a variable-priced tolled lane and a free lane. The
attributes are measured by external procedures, so that the levels of times and
28 A referee described this evidence as "a good result."
29 We acknowledge personal communication with Ken Small and David Brownstone in early 2008.
costs actually experienced for both alternatives in the choice set are not subject
to the usual concerns associated with asking individuals.
Given the dominance of habit (in contrast to variety seeking) in much of
modal and route trip activity for a given trip purpose, there is appeal in
focussing on choice situations where preference inference (and hence
MWTP) might be better identified without “forcing” non-chosen alternatives
into the decision space, unless one can capture data along the lines of
Brownstone and Small (see n. 28). Our preference-revelation paradigm pro-
motes the use of a reference (or status quo) alternative in the design of CEs,
which is also the opt-out under conditions of actual experience. This appears
to offer great promise in the derivation of estimates of MWTP that have a
closer link to the real market activity of each individual.
Reference alternatives have an important role to play in giving sense to the
levels of the attributes offered in the CE choice scenarios. Using CAPI and
internet-based surveys, we can now automatically individualize the attribute
levels in CEs relative to a reference experience (e.g., a recent travel activity).
That is, the levels seen by each individual will differ according to the levels
associated with a reference alternative, even where the design levels (as
percentage variations around the reference point) are the same. What is not
clear is whether the reference alternative should be included in the choice set
used in model estimation. In Table 19.5 we investigated this issue, using a 2004
toll road study in Sydney. Respondents were asked (i) to make a choice among
the reference alternative and two CE alternatives, and (ii) to choose among the
CE alternatives. What we find suggests that there is a difference, albeit small,
in the mean VTTS, which is smaller when the error components specification
is used, in contrast to the simple MNL form. The confidence intervals30 for the
Table 19.5 Empirical evidence on CE-based VTTS (mean $ per person hr. and statistical uncertainty) for the pivot data paradigm, treating time and cost parameters as generic across all alternatives
30 These are important because the estimated VTTS are ratios of random variables, so they are also random.
above values were estimated using the t-ratio method equation derived by
Armstrong et al. (2001):
V_{S,I} = \frac{\theta_t t_c \left( t_t t_c - \rho t^2 \right)}{\theta_c t_t \left( t_c^2 - t^2 \right)} \mp \frac{\theta_t t_c \sqrt{\left( \rho t^2 - t_t t_c \right)^2 - \left( t_t^2 - t^2 \right)\left( t_c^2 - t^2 \right)}}{\theta_c t_t \left( t_c^2 - t^2 \right)},    (19.22)
where tt and tc correspond to the t-ratios for the travel time and cost parameter
estimates, θt and θc, respectively; t is the critical value of the statistic given
the degree of confidence required; and ρ is the coefficient of correlation
between the two parameter estimates. This expression ensures positive upper
and lower bounds for the VTTS if the parameters involved are statistically
different from zero.
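A direct transcription of Equation (19.22) in code may be useful; the numerical values below are our own illustration, not estimates from this chapter:

```python
import math

# Hedged sketch of the Armstrong et al. (2001) t-ratio bounds, Eq. (19.22).
# theta_t, theta_c: time and cost parameter estimates; tt, tc: their t-ratios;
# rho: correlation between the two estimates; t: critical value (e.g., 1.96).

def vtts_bounds(theta_t, theta_c, tt, tc, rho, t=1.96):
    disc = (rho * t**2 - tt * tc) ** 2 - (tt**2 - t**2) * (tc**2 - t**2)
    if disc < 0:
        raise ValueError("parameters not sufficiently significant")
    root = math.sqrt(disc)
    denom = theta_c * tt * (tc**2 - t**2)
    lo = theta_t * tc * (tt * tc - rho * t**2 - root) / denom
    hi = theta_t * tc * (tt * tc - rho * t**2 + root) / denom
    return min(lo, hi), max(lo, hi)

# Illustrative (hypothetical) values: both parameters highly significant.
lo, hi = vtts_bounds(theta_t=-0.10, theta_c=-0.50, tt=10.0, tc=12.0, rho=0.2)
# The point estimate theta_t / theta_c = 0.20 lies inside (lo, hi),
# and both bounds are positive, as the text notes.
```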
On a test of differences, the error component model findings are not
statistically different at the 95 percent confidence level. This test, however,
says nothing about the added value of the reference alternative as a way of
identifying the marginal disutility of time and cost associated with an alter-
native chosen in an actual market setting, complete with all the real world
constraints that an individual takes into account in choosing that alternative.
Constraining parameters across the reference and CE design alternatives,
as is common in the majority of CEs, may actually be clouding real
information on the difference between the marginal disutility of time and cost
of a real alternative and that of a hypothetical alternative, which may be a
source of differences in VTTS.
The WTP estimates derived from CEs reported in Tables 19.4 and 19.5 are
ratios of parameter estimates, and are typically sensitive to small changes in the
numerator and/or denominator estimates, which may be differentially
impacting on each alternative (although this is suppressed when parameters are
generic across the reference and CE alternatives). In other words, deviations
between RP and CE WTP estimates due to hypothetical bias might be con-
founded by deviations introduced by something as simple as adding another
attribute to one or more alternatives.31 In the following sections, we need to
take a closer look at the richness of the information in the numerator and
denominator of a WTP calculation, and the additional information offered in
31. A study by Steimetz and Brownstone (2005), cited in Brownstone and Small (2005), bootstraps the distribution of WTP and takes the mean of this distribution as a point estimate in an effort to accommodate this numerator and denominator sensitivity. I thank a referee for pointing this out.
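The bootstrap idea in this footnote can be sketched in a few lines: draw parameter vectors from their estimated asymptotic distribution, form the WTP ratio draw by draw, and summarize the resulting distribution. The coefficients and covariance matrix below are invented for illustration, and this is a parametric (Krinsky–Robb style) simulation rather than the Steimetz and Brownstone resampling procedure itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical point estimates (time, cost) and their covariance matrix
beta = np.array([-0.06, -0.20])
cov = np.array([[5.6e-05, 1.2e-05],
                [1.2e-05, 1.1e-03]])

# Draw from the asymptotic normal distribution of the estimator and
# form the WTP ratio in every draw
draws = rng.multivariate_normal(beta, cov, size=5000)
wtp_per_hr = 60.0 * draws[:, 0] / draws[:, 1]   # $/min converted to $/hr

point = wtp_per_hr.mean()                       # mean of the WTP distribution
lo, hi = np.percentile(wtp_per_hr, [2.5, 97.5])
print(round(point, 2), (round(lo, 2), round(hi, 2)))
```

The mean of the simulated distribution typically exceeds the simple ratio of point estimates, which is exactly the numerator and denominator sensitivity the footnote refers to.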
884 The suite of choice models
32. TWTP is predominantly a focus of environmental, health, marketing, and agricultural applications.
33. Brownstone and Small (2005) suggest that the mean VTTS in a toll HOT lane versus free route context is in the range USD$20–$40 per person hr., which is about 50 percent higher than the evidence from SP studies (i.e., USD$13–$16). The high end, USD$40, is a self-selected group who had already obtained a transponder that enabled them to use the express lane if they so chose, and hence they would be expected to have the highest VTTS.
34. Higher attribute levels tend, holding unit of measurement fixed, to result in lower parameter estimates, and hence with the cost parameter in the denominator, we obtain a higher mean MWTP for RP situations compared to SP situations.
35. David Brownstone (personal communication, February 28, 2008) advises that many of the respondents in the CE RP comparisons actually switched between the tolled and untolled alternatives regularly (at least once a week). He suggests that it would be interesting to repeat the estimation on the subset of these switchers, who are quite familiar with both alternatives.
with heavy traffic leads to exaggeration of actual delay time. These reasons are
then used to suggest that the same level of an attribute in a CE will lead to the
same reaction, and hence a lowering of the parameter estimate for time.
Hensher (2006, 2008) and Hensher and Greene (2008) promote the idea
of attribute processing as a behaviorally meaningful way of ensuring that
individuals use the heuristics that they also use in real markets (although
there may be additional CE-specific effects given the amount of information
being offered for processing, which may not change the heuristic set but
simply invoke a specific processing rule). We do not see Wardman’s point
(ii) as a CE-specific issue, since this happens also in RP settings.
Supplementary questions should be asked to reveal such processing rules
for CE and RP data, or model specifications defined to test for and capture
specific process heuristics, up to a probability (see Hensher and Greene
2008; Hensher and Layton 2008). Furthermore Wardman’s point (iii) is
linked implicitly to the promotion of pivot designs (see Rose et al. 2008) that
can, if carefully designed, reduce this feature of many poorly designed CEs
(see below).
Brownstone and Small's suggestion in the context of time savings realized by using the express lane (2005, 88) is controversial; namely, that "if people
experiencing a 10-min time delay remember it as 20 min, then they probably
react to a hypothetical question involving a 20-min delay in the same way that
they react to a real situation involving a 10-min delay. This would cause their
measured value of time savings in the hypothetical situation to be exactly half
the value observed in real situations.” Unlike RP data, where one is asked to
indicate the level (and in some cases the difference) or, as in the case of
Brownstone and Small, use some other means of measuring not related to a
specific individual’s actual trip, such as floating cars and loop detectors, in a
CE the level is actually given to each sampled respondent. Hence an individual
is processing a given level of an attribute, used in model estimation, which is
not the same as asking an individual for an attribute level or obtaining it from
a third-party source, for the non-chosen alternative, or the difference, and
then constructing an attribute level for the non-chosen alternative. In one
sense, this removes an element of uncertainty associated with a respondent
having to construct a level of an attribute associated with a non-chosen
alternative in an RP study.
The MWTP (e.g., VTTS) is shorthand for the ratio of two distinct quantities –
the marginal (dis)utility of an attribute of interest (e.g., travel time), and the
marginal (dis)utility of money (Hensher and Goodwin 2004). Both are con-
founded by changes in tastes, leisure activities, education, and opportunities or
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:45:15 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.023
Cambridge Books Online © Cambridge University Press, 2015
886 The suite of choice models
choice set open to sampled individuals, as well as the data collection paradigm
(Hensher 2006; Harrison 2006, 2007). Given the different context of a CE in
general, it must be recognized that CE studies should be annexures to RP studies
that can supplement where RP data is deficient. Pivot designs, discussed below,
may well be the way forward.
36. This is generally the case with the most popular transportation application of commuter mode or route choice.
37. We acknowledge discussions with Ken Small on this point.
38. Hensher et al. (2005) have suggested that one might estimate stand alone CE models to obtain robust parameter estimates on each attribute and then calibrate the constants to reproduce the base market shares observed in real markets. This removes the need to estimate RP models. This approach is conditional on assuming that the parameter estimates obtained from RP alternatives, be they from a stand alone RP model or a joint (rescaled) RP–CE model, are statistically and behaviorally less reliable than from CE alternatives, and especially the reference alternative.
39. In part to recognize the greater uncertainty about the stated choice designed alternatives relative to the reference alternative for each respondent.
40. The models used simulated MLE with 500 Halton draws and accounted for the correlation between 16 choice scenarios shown to each sampled respondent.
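Halton draws such as those mentioned in this footnote are deterministic low-discrepancy sequences. A minimal generator (our own illustrative implementation, not the one built into Nlogit) looks like this:

```python
from statistics import NormalDist

def halton(n, base):
    """First n points of the one-dimensional Halton sequence for a given
    (prime) base, computed by radical-inverse digit reversal."""
    points = []
    for i in range(1, n + 1):
        f, x, k = 1.0, 0.0, i
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        points.append(x)
    return points

# 500 uniform Halton points (base 2), mapped to standard normal deviates,
# as would be used to simulate a normally distributed random parameter
uniforms = halton(500, base=2)
normal_draws = [NormalDist().inv_cdf(u) for u in uniforms]
```

In practice each random parameter is assigned its own prime base (2, 3, 5, ...), and the first few points are often discarded to reduce correlation across dimensions.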
41. Sillano and Ortúzar (2005) argue that: "constraining a taste coefficient to be fixed over the population, may make it grow in a less than average proportion (i.e. the parameters that are allowed to vary grow more than the parameters that should vary over the population, but are constrained to be fixed)." If this is the case, then it would apply to both the reference and CE alternatives. In addition, the majority of empirical RP studies using ML also impose this condition.
42. This included unconstrained triangular and normal distributions for travel time, and random parameters specifications for travel cost. The MNL model had a significantly worse overall fit (see note to Table 19.3), and produced ratios of RP:CE mean VTTS of 1.46 and 1.05, respectively, for Sydney and New Zealand.
Note: LL for Sydney (912 observations) and New Zealand (1,840 observations) models are, respectively,
−662.51 and −1187.96 (MNL LLs are, respectively, −837.8 and −1630.2).
Practice Game

Make your choice given the route features presented in this table, thank you.

                                            Details of Your
                                            Recent Trip       Road A      Road B
Time in free-flow traffic (mins)            50                25          40
Time slowed down by other traffic (mins)    10                12          12
Travel time variability (mins)              +/- 10            +/- 12      +/- 9
Running costs                               $3.00             $4.20       $1.50
Toll costs                                  $0.00             $4.80       $5.60

If you make the same trip again, which road would you choose?
    [Current Road] [Road A] [Road B]
If you could only choose between the 2 new roads, which road would you choose?
    [Road A] [Road B]
For the chosen A or B road, HOW MUCH EARLIER OR LATER WOULD YOU BEGIN YOUR TRIP
to arrive at your destination at the same time as for the recent trip
(note: 0 means leave at the same time): ____ min(s) earlier / later
the mean for the CE alternatives is $17.92 (standard deviation of $7.82), derived
from the model that includes the reference alternative. The forced choice
43. We do not report the reference ASC and the SC dummy variable for choice scenario 1, both of which account for the mean influence of other attributes and context.
models produced a mean VTTS of $23.24 per person hr. (standard deviation of
$7.52). The ratio of the Reference to CE alternatives mean VTTS is 1.51. For the
New Zealand study, the mean VTTS for the reference alternative is $27.34 per
person hr. (standard deviation of $7.46); the mean for the CE alternatives is
$13.65 (standard deviation of $4.31), derived from the model that includes the
reference alternative. The forced choice models gave a mean VTTS of $11.28 per
person hr. (standard deviation of $5.35). The ratio of the Reference to CE
alternatives mean VTTS is 2.00. A t-ratio test of differences shows that the WTP estimates associated with the reference alternative and the CE alternatives are statistically different at the 95 percent confidence level.
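Under an assumption of independence between the two estimates, the t-ratio test of differences used here is simply the difference in means over the square root of the summed squared standard errors. A sketch with invented standard errors (the point estimates echo the New Zealand means reported above; the standard errors are our own illustrative values, not the reported standard deviations of the VTTS distributions):

```python
import math

def t_difference(v1, se1, v2, se2, cov=0.0):
    """t-ratio for the difference between two mean WTP estimates.
    cov is the covariance between the estimates (zero if independent);
    when both come from the same model it should be supplied."""
    return (v1 - v2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov)

# New Zealand means from the text; standard errors are illustrative
t = t_difference(27.34, 2.0, 13.65, 1.5)
print(round(t, 2), abs(t) > 1.96)  # reject equality at the 95 percent level
```

In practice the two WTP estimates come from the same estimated model and are correlated, so the covariance term should not be ignored.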
We find that the marginal disutility associated with travel time in the
reference alternative is substantially higher (especially for Sydney) than that
associated with the CE design alternatives, and is either similar (i.e., Sydney)
or lower (i.e. New Zealand) for cost; resulting in the higher mean VTTS for the
reference (or real market) alternative. The evidence from other studies by
Hensher and Louviere (see Hensher 2006; Louviere and Hensher 2001) that
the attribute range has a greater influence on MWTP than any other
dimension of choice experiments,44 with MWTP being higher with a reduced
attribute range, supports the findings here; the CE design alternatives have a
wider attribute range relative to the range of attributes of other alternatives
that people face in real choices, and hence a lower mean VTTS than the mean
VTTS from the real market alternative. If we take the Sydney sample as an
example, the ratio of the range of each attribute in the numerator and
denominator of the calculation of VTTS for the reference and CE alternatives
is 1.42 for time and 1.48 for cost. The ratio of the reference alternative to CE
VTTS is 1.51; hence, are we seeing a coincidence or something of empirical
interest as a statistical calibration (ex post) adjustment to “explain” the
difference between the VTTS?
To comment further on the influence of attribute range, which has been
found to be the major dimension of a CE influencing WTP, research in
marketing (e.g., Ohler et al. 2000) suggests that heterogeneity systematically
varies with attribute range and distribution, as do model ASCs and goodness
44. Hensher and Louviere have found, in many studies, that the MWTP increases as the range of attribute levels decreases, and vice versa. In CE studies it is common to have a wider range of an attribute to assess; that is essentially what CEs are all about, creating a behaviorally richer variance. However, this may come at a price, in that real markets are not so rich in variability, and hence when actual market data are used we observe after estimation higher MWTP compared to an SC experiment. This naturally begs the question: does the ratio of the range of each attribute in the numerator and denominator of the calculation of MWTP for the reference and CE observations account for part or all of the difference in the mean MWTP?
of fit measures (see also McClelland and Judd 1993), but preference model
parameters remain largely unaffected. Thus, it is unclear what to make of
empirical heterogeneity results because they may prove to be largely contex-
tual; that is, they are associated with particular patterns of attribute ranges and
samples of people, and cannot be generalized without taking differences in
attribute ranges and people into account. The need to take into account links
with characteristics of choosers and heterogeneity distributions has been
recognized (see Hensher 2006), but there has been little recognition of the
fact that if one changes the range and/or distribution of attributes in design
matrices, this can lead to significant differences in inferences about hetero-
geneity. Simply put, the greater relevance in preserving the attribute content
under a wider range will mean that such an attribute is relatively more
important to the outcome than it is under a narrow range specification, and
hence a higher mean WTP is inferred (Louviere and Hensher 2001).
The empirical evidence on VTTS from the two studies is in line with the
relative magnitudes of SC and RP mean MWTP found by Brownstone and
Small (2005)45 as long as we accept that under habitual behavior the reference
alternative has important information on the marginal disutility of attribute
levels associated with the experienced alternative. The difference between our
studies and those of Brownstone and Small is that we focussed on a known
trip, and assumed that most commuters had little idea about the non-chosen
alternative(s). The latter, one might argue, in an RP setting, exists to enable the
estimation of a choice model, and to give variability in trip attributes. Under
conditions of habitual behavior, a well designed pivot-based CE can deliver
the relevant market information as well as attribute variability, while avoiding
the problems in identifying meaningful data on non-chosen alternatives,
especially in contexts where habit and inertia are very strong elements of
real market behavior. The findings support the relative magnitudes of MWTP
found by Brownstone and Small (2005) and Isacsson (2007). If one desires to
use traditional RP MWTP as the benchmark, which in the non-transport
literature suggests that the MWTP from CE studies is on the low side, then the
findings here are consistent with closing the gap on hypothetical bias. If RP
and CE studies in transport cannot establish any evidence on hypothetical
bias, then one wonders why we have invested so much in CEs.46
45. Given the 2004 exchange rate of AUD$1 = USD$0.689, the Sydney evidence for the reference alternative is USD$39.48, compared to the SC estimate of (i) USD$19.93 for the model that includes the reference alternative, or (ii) USD$16.08 when the forced choice among two CE alternatives is used.
46. Except where the focus is on new alternatives and possibly very large attribute changes associated with existing alternatives that are outside the range of market experience.
The appeal of pivoting is not that time and cost parameters should be specified as generic across all alternatives, but that CE data has a role in generating variability about the real market experience (i.e., the pivot) in order to be able to estimate parameters. The argument is that this offers a richer attribute preference-revelation setting than either (i) the current view on RP, with problematic identification of non-chosen alternatives, or (ii) the treatment of the CE alternatives as having "equal" status as the pivot alternative in real market identification. Crucially,
however, we need the CE alternatives (without measurement error, but
subject to respondent perception), to provide the necessary variation in
attribute data to reveal preferences. The support for this approach is in part
reinforced by the evidence from Brownstone and Small (2005) and
Isacsson (2007) on the relativity of the market WTP against the CE
evidence from studies where actual trade-offs are being observed and
measured in real markets.
The empirical evidence here suggests that, for all the years of interest in
CEs, and the debate about the role of traditional RP and CE data, we may have
missed or masked an important message; namely, that CEs with referencing
back to a real market activity, especially where it is chosen on repeated
occasions, may provide a suitable specification, short of capturing data “at a
distance,” where the latter has evaded every single travel study to date.47 If we
recognize that the requirement to seek data on at least one non-chosen
alternative in RP modeling is linked to the creation of the variance necessary
to estimate a model, then this imposition in the context of habitual behavior
may be accommodated by variance revelation through a CE pivot design,
where the only information required from real markets relates to the habi-
tually selected alternative.
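The mechanics of a pivot design can be sketched simply: the levels shown in the CE are proportional (or absolute) shifts applied to each respondent's reported reference trip. The shift sets and function below are illustrative assumptions, not the design used in the Sydney or New Zealand studies, and a real design would select a D-efficient subset of the enumerated profiles rather than the full factorial:

```python
import itertools

# Illustrative pivot levels: proportional shifts around the reference trip
TIME_SHIFTS = (-0.25, 0.0, 0.25)
COST_SHIFTS = (-0.25, 0.0, 0.25)

def pivoted_profiles(ref_time_min, ref_cost):
    """Enumerate candidate CE alternatives pivoted off one respondent's
    reference alternative (full factorial of the shift sets)."""
    profiles = []
    for dt, dc in itertools.product(TIME_SHIFTS, COST_SHIFTS):
        profiles.append({"time": round(ref_time_min * (1 + dt), 1),
                         "cost": round(ref_cost * (1 + dc), 2)})
    return profiles

# A respondent reporting a 50-minute, $3.00 reference trip
profiles = pivoted_profiles(ref_time_min=50, ref_cost=3.00)
print(len(profiles), profiles[0])
```

Because every respondent sees levels anchored on their own experienced trip, the only information required from the real market is the habitually selected alternative itself.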
We strongly recommend further research into the proposition that future
CEs should consider using a real market reference alternative as a pivot in the
design of the choice scenarios.48 This not only grounds the experiment in
47. Brownstone and Small measured travel times of each alternative with floating cars (on SR91) and loop detectors on I-15, which is the closest we have come to real independent observation.
48. This should include, or at least consider, the development of models in which we can account for sign dependent preferences with respect to a reference point outcome (e.g., Hess et al. 2008), as suggested by cumulative prospect theory (CPT). Seror (2007), in the context of women's choices about pre-natal diagnosis of Down syndrome, concluded that CPT fitted the observed choices better than expected utility theory and rank dependent utility theory. Such a finding has been questioned by a number of researchers, claiming that many studies have been far too casual about what "the" reference point is, and have allowed their priors (that loss aversion is significant) to drive their specification of the reference point; see Andersen et al. (2007b). In general, the notion of a reference point makes good sense in typical transport applications.
19.6.5 Conclusions
This section on hypothetical bias has brought together elements of the
literature on revealed and SC studies (CV and CE) to identify the nature
and extent of hypothetical bias, and what might be sensible specifications of
data and models to reduce the gap between the MWTP estimates likely to exist
in actual markets, when observed “at a distance,” and estimates from CEs.
In suggesting that the mean MWTP for time savings is lower when trading
time and cost in utility expressions associated with SC alternatives compared
to RP alternatives, we recognize that there is limited (but powerful) evidence
promoting this relativity from the very influential paper by Brownstone and
Small (2005),49 reinforced by Isacsson (2007). A way forward within the
context of CEs, when the interest is on estimating the MWTP under condi-
tions of habit, which is common in many transport applications, is to recog-
nize the real market information present in a reference alternative. What we
find, empirically, is that when a pivoted design is used for constructing CEs,
and the model is specified to have estimated parameters of time and cost that
are different for the reference alternative than the hypothetical alternatives,
the estimated VTTS is higher for the reference alternative than for the
hypothetical alternatives. This model specification is not the specification
that researchers have generally used with data from pivoted experimental
designs. Usually, time and cost are specified to have the same parameters for
49. The Brownstone and Small paper is increasingly being referenced by bankers engaged in toll road project financing.
the reference and hypothetical alternatives. The proposal here for reducing
hypothetical bias (given the Brownstone–Small “benchmark”), is to use a
pivoted design and allow different parameters for the reference and hypothe-
tical alternatives.
Despite the importance of good experimental design, the disproportionate focus in recent years on the actual design of the CE, in terms of its statistical properties, may have come at the expense of attention to real behavioral influences on outcomes, which require a more considered assessment of process (see Chapter 7), especially referencing that is grounded in reality.
There are many suggestions from the literature, derived from mixtures of
empirical evidence, carefully argued theoretical and behavioral positions, and
speculative explanation. The main points to emerge, that appear to offer
sensible directions for specifications of future choice studies, are:
1. The inclusion of a well scripted presentation (including cheap talk scripts),
explaining the objectives of the choice experiment.
2. Inclusion of the opt-out or null alternative, avoiding a forced choice setting
unless an opt-out is not sensible.
3. Pivoting the attribute levels of a CE around a reference alternative that has been experienced, and/or of which there is substantial awareness, and estimating unique parameters for the reference alternative, in order to calculate estimates of the MWTP for an alternative that is actually chosen in a real market.
4. The ability to calibrate the ASCs through choice-based weights on alter-
natives where actual shares are known. This may not be feasible in many
applications, but where there is evidence of actual market shares on the
same alternatives, this is essential if a valid comparison of TWTP is to be
made.50
5. The inclusion of supplementary questions designed to identify the attribute
processing strategy adopted, as well as a question to establish “the con-
fidence with which an individual would hypothetically purchase or use the
good (or alternative) that is actually chosen in the choice experiment”; the
latter possibly being added into the CE after each choice scenario and after
an additional response in the form of a rating of the alternatives, possibly
50. Where the data relates to labeled alternatives (e.g., specific routes or modes), the pooling of data across individuals, who each evaluated the attribute packages around their chosen alternative, enables construction of a choice model that looks like the traditional RP model form. This can then be calibrated with choice-based weights.
along the lines of "limit cards." Fujii and Gärling (2003) offer some ideas on
the certainty scale question.
6. Identifying constraints that may impact on actual choices that might be
ignored in CEs, which encourage responses without commitment. Once
identified, these constraints should be used in revising choice responses.
How this might be defined is a challenge for ongoing research.
We also support future empirical studies that can confirm or deny the
growing body of evidence on hypothetical bias in CEs. Using a toll road
context as an example, an empirical study might be undertaken of the
following form:
1. The context is the choice among competing existing tolled and non-tolled
routes including the option to consider none-of-these.
2. The attributes of interest should be, as a minimum, door-to-door travel
time and cost, where the latter is running cost and toll cost for the tolled
route, and running cost for the non-tolled route.
3. The sampled individuals are persons who currently use one of the two
routes. This defines a reference alternative.
4. There are two groups:
a. Group A participates in an SC experiment with no endowment and no
randomly selected alternative for implementation, as is often the prac-
tice in CV studies.
b. Group B is given an endowment (e.g., a $20 subsidy voucher) and told
that the voucher is a subsidy towards the toll on any tolled route, which
is valid for up to two weeks. The money is not a reward for participation.
This is common practice in many CV and dichotomous choice studies
in environmental and agricultural applications.
We have selected the two groups as a way to test some of the imposed
conditions common in many of the studies outside of transportation, as
reported here.
5. For each choice scenario, the sampled individual is asked to choose
between (i) the reference alternative, two design alternatives, and an opt-
out alternative, (ii) the reference alternative and two design alternatives,
(iii) the two design alternatives and an opt-out alternative, and (iv) the two
design alternatives.
6. Where the travel time is earlier or later than when one normally travels, we
should identify the extent to which the individual is able to adjust their
commitments to commence and/or finish the trip. This is a way of
Part IV
Advanced topics
Chapter 20
Frontiers of choice analysis

20.1 Introduction
with the IID, type I Extreme value distribution assumed for the random terms
εitm. Conditioned on U(i,t,m), the choice probabilities take the familiar multinomial logit (MNL) form:
Prob(choice j is made in choice situation t by individual i):
$$
P(i,t,j) = \frac{\exp[U(i,t,j)]}{\sum_{m=1}^{J_{it}} \exp[U(i,t,m)]}. \qquad (20.2)
$$
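Equation (20.2) is the familiar logit (softmax) transformation of utilities, and can be sketched directly; the max-shift below is a standard numerical-stability device, not part of the formula:

```python
import math

def mnl_probabilities(utilities):
    """MNL choice probabilities, Equation (20.2):
    P(j) = exp(U_j) / sum_m exp(U_m).
    Utilities are shifted by their maximum before exponentiating;
    the probabilities are unchanged by the shift."""
    u_max = max(utilities)
    expu = [math.exp(u - u_max) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Illustrative utilities for a three-alternative choice situation
p = mnl_probabilities([-0.2, 0.1, -0.5])
print([round(x, 3) for x in p])
```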
The various parts allow several degrees of flexibility. In Equation (20.4), the
function hm(..) is an arbitrary non-linear function that defines the underlying
utilities (preferences) across alternatives with an error component structure
(shown as the last term). The fact that the mixed logit (ML) form set out is
extremely general, such that it could fit any specifications in any choice model,
is the appeal of the approach. The form of the utility function itself may vary
across the choices. Heterogeneity in the preference parameters of the model is
shown in Equation (20.5) in line with the ML model, where βi varies around
the overall constant, β, in response to observable heterogeneity through zi and
unobservable heterogeneity in vi. The parameters of the distribution of βi are
the overall mean (i.e., β), the structural parameters on the observed hetero-
geneity, Δ, and the Cholesky square root (lower triangle) of the covariance
matrix of the random components, Γ. The random components are assumed
to have known, fixed (usually at zero) means, constant known variances
(usually one), and to be uncorrelated. In the most common applications,
multivariate standard normality would be assumed for vi. The covariance
matrix of βi would then be Ω = ΓΓ′. Parameters that are not random are
901 Frontiers of choice analysis
The conditioning is on the unobservables w,v,u, and the observables, Xi, yi, zi,
ci where (X,z,c)i is the full data set of attributes and characteristics, xi,t,m, and
observed heterogeneity, zi and ci; and yi is a full set of binary indicators, yitm,
that marks which alternative is chosen, yitj = 1, and which are not, yitm = 0, in
each choice situation. In full:
$$
P(i,t,j) = \prod_{q=1}^{J_{it}} \left[ \frac{\exp[U(i,t,q)]}{\sum_{m=1}^{J_{it}} \exp[U(i,t,m)]} \right]^{y_{itq}}. \qquad (20.8)
$$
$$
\log L(\beta, \Delta, \Gamma, \theta, \delta, \varphi \mid X, y, z, c) = \sum_{i=1}^{N} \log \int_{w_i, v_i, u_i} \left[ \prod_{t=1}^{T_i} P(i,t,j \mid w_i, v_i, u_i) \right] f(w_i, v_i, u_i) \, dw_i \, dv_i \, du_i. \qquad (20.9)
$$
$$
P_{S,i} = \frac{1}{R} \sum_{r=1}^{R} P_{S,i}(r) = \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} P[i,t,j \mid w_i(r), v_i(r), u_i(r)], \qquad (20.15)
$$
so that
$$
\log L_S(\beta, \Delta, \Gamma, \theta, \delta, \varphi \mid X, y, z, c) = \sum_{i=1}^{N} \log P_{S,i}. \qquad (20.16)
$$
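Equations (20.15)–(20.16) translate directly into code: for each individual, average the product over choice situations of the conditional MNL probabilities across R draws of the random parameters, then sum the logs. The toy model below (a single normally distributed coefficient and synthetic data) is our own minimal illustration of the simulator, not the full model of this section:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulated_loglik(beta_mean, beta_sd, X, chosen, R=200):
    """Simulated log-likelihood for a one-attribute random parameter
    (mixed logit) model, Equations (20.15)-(20.16): average the product
    of conditional MNL probabilities over R draws of beta_i, take logs,
    and sum over individuals."""
    N, T, J = X.shape
    loglik = 0.0
    for i in range(N):
        draws = beta_mean + beta_sd * rng.standard_normal(R)   # beta_i(r)
        P_r = np.ones(R)
        for t in range(T):
            u = draws[:, None] * X[i, t, :]                    # R x J utilities
            u -= u.max(axis=1, keepdims=True)                  # stability shift
            p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
            P_r *= p[:, chosen[i, t]]          # prod over t of P(i,t,j | r)
        loglik += np.log(P_r.mean())           # log of (1/R) sum over r
    return loglik

# Tiny synthetic data set: 5 individuals, 2 situations, 3 alternatives
X = rng.normal(size=(5, 2, 3))
chosen = rng.integers(0, 3, size=(5, 2))
ll = simulated_loglik(beta_mean=-1.0, beta_sd=0.5, X=X, chosen=chosen)
```

In estimation this function would be maximized over (beta_mean, beta_sd), usually with Halton rather than pseudo-random draws.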
Denote by vec(Δ) and vec(Γ) the column vectors formed by stacking the rows
of Δ and Γ, respectively. Then:
$$
\frac{\partial \log L_S(\beta, \Delta, \Gamma, \theta, \delta, \varphi \mid X, y, z, c)}{\partial \begin{pmatrix} \beta \\ \mathrm{vec}(\Delta) \\ \mathrm{vec}(\Gamma) \end{pmatrix}}
= \sum_{i=1}^{N} \frac{1}{P_{S,i}} \frac{1}{R} \sum_{r=1}^{R} P_{S,i}(r) \sum_{t=1}^{T_i}
\begin{pmatrix} g_j[i,t,j,(r)] - g[i,t,(r)] \\ \left( g_j[i,t,j,(r)] - g[i,t,(r)] \right) \otimes z_i \\ \left( g_j[i,t,j,(r)] - g[i,t,(r)] \right) \otimes v_i \end{pmatrix}, \qquad (20.17)
$$
where

g[i,t,(r)] = \sum_{m=1}^{J_{it}} P[i,t,m \,|\, w_i(r), v_i(r), u_i(r)] \, g_m[i,t,m,(r)]. \quad (20.19)
Finally,

\frac{\partial s_i(r)}{\partial (\delta, \varphi)'} = s_i(r) \begin{pmatrix} c_i \\ u_i - \varphi \end{pmatrix}. \quad (20.22)
Partial effects and other derivatives of the probabilities are typically associated
with scaled versions of the parameters in the model. In the simple MNL
model, the elasticity of the probability of the jth choice with respect to a change
in the lth attribute of alternative m (see Chapter 10) is:

\frac{\partial \log P(i,t,j)}{\partial \log x_{l,itm}} = [\delta_{jm} - P(i,t,m)] \beta_l x_{l,itm}, \quad (20.23)

where δ_{jm} = 1[j = m]. In the model considered here, it is necessary to replace β_l
with ∂h_m(x_{itm}, β_i)/∂x_{l,itm} = D_{l,itm}. Since the utility functions may differ across
alternatives, this derivative need not be generic. In addition, the derivative
would have to be simulated, since the heterogeneity in β_i would have to be
averaged out. The estimated average partial effect, averaged across individuals
and periods, is estimated using:
\mathrm{APE}(l \,|\, j, m) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T_i} \sum_{t=1}^{T_i} \frac{1}{R} \sum_{r=1}^{R} \left[ \delta_{jm} - P[i,t,m \,|\, w_i(r), v_i(r), u_i(r)] \right] \frac{\partial h_m[x_{itm}, \beta_i(r)]}{\partial x_{l,itm}} \, x_{l,itm}. \quad (20.24)
where E(U) = \sum_{m=1}^{M} p_m x_m^r is the expected utility; m (= 1, . . ., M) indexes the possible outcomes for
an attribute, with M ≥ 2; p_m is the probability associated with the mth outcome;
x_m is the value of the mth outcome; and r is the parameter to be estimated,
which captures respondents’ attitudes towards risk (r < 1: risk averse; r = 1:
risk neutral, which implies a linear functional form; r > 1: risk loving).
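The expected utility calculation just described can be sketched in a few lines. A minimal illustration (the function name is ours):

```python
def expected_utility(p, x, r):
    """E(U) = sum_m p_m * x_m**r, for outcome values x_m > 0:
    r < 1 implies risk aversion, r = 1 risk neutrality (linear),
    r > 1 risk loving."""
    return sum(pm * xm ** r for pm, xm in zip(p, x))
```

With r = 1 the expression reduces to the expected value; with r = 0.5, for example, a sure 10 is preferred to a 50/50 gamble over 5 and 15 even though both have the same expected value, which is the sense in which r < 1 captures risk aversion.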
of attribute levels and the probability weighting function that Tversky and
Kahneman used in their later article (Tversky and Kahneman 1992). In the
1992 paper, they posited CPT not as a different theory but as “a new version of
Prospect Theory,” and they included even more restrictive simplifying
assumptions that may be used to describe observed behavior in particular
contexts (e.g., a linear approximation of the power function).
Tversky and Kahneman (1992) provided parametric formulae for the value
functions under a constant relative risk aversion (CRRA) assumption, as well
as a one-parameter probability weighting function. The value function in the
gain domain, for x ≥ 0, is V = x^α, and in the loss domain, where x < 0, it is
V = −λ(−x)^β; α and β are the exponents of the value function over gains and
losses, respectively, and λ is the coefficient of loss aversion, postulating that a
loss is treated as more serious than a gain of equal size.1 The probability
weighting function suggested by Tversky and Kahneman (1992) is given in
Equation (20.26). There are a number of alternative weighting functions, e.g.,
a two-parameter weighting function proposed by Goldstein and Einhorn
(1987), given in Equation (20.27), and another version of a one-parameter
weighting function derived by Prelec (1998), given in Equation (20.28):
w(p_m) = \frac{p_m^{\gamma}}{\left[ p_m^{\gamma} + (1 - p_m)^{\gamma} \right]^{1/\gamma}}. \quad (20.26)

w(p_m) = \frac{\tau p_m^{\gamma}}{\tau p_m^{\gamma} + (1 - p_m)^{\gamma}}. \quad (20.27)

w(p_m) = \exp\left[ -(-\ln p_m)^{\gamma} \right]. \quad (20.28)
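The three weighting functions are straightforward to implement. A minimal sketch (function names are ours; `p` is a single probability in (0, 1)):

```python
import math

def w_tk(p, gamma):
    """Tversky and Kahneman (1992) one-parameter weight, Equation (20.26)."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def w_ge(p, gamma, tau):
    """Goldstein and Einhorn (1987) two-parameter weight, Equation (20.27)."""
    return tau * p ** gamma / (tau * p ** gamma + (1 - p) ** gamma)

def w_prelec(p, gamma):
    """Prelec (1998) one-parameter weight, Equation (20.28)."""
    return math.exp(-((-math.log(p)) ** gamma))
```

For γ < 1 these functions produce the inverse S-shape discussed below: small probabilities are over-weighted (w(p) > p) and large probabilities under-weighted (w(p) < p); with γ = 1 (and τ = 1) each collapses to linear probability weighting.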
[Figure: the value function, v, for monetary gains and losses ($).]
the cumulative probability distribution, where all potential outcomes are typically
ranked in increasing order in terms of preference (from worst to best; see
Equation 20.4).2 Hence, the cumulative prospect value is defined as:

CP(V) = \sum_{m} \pi(p_m) V(x_m).
2 Outcomes can also be ranked from best to worst (see, e.g., Diecidue and Wakker 2001).
3 Risk-averse is where a sure alternative is preferred to a risky alternative (i.e., with multiple possible
outcomes) of equal expected value; risk-seeking is where a risky alternative is preferred to a sure
alternative of equal expected value.
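The cumulative prospect value defined above can be sketched as follows. This is a minimal illustration (function names are ours): the defaults α = β = 0.88 and λ = 2.25 are the Tversky and Kahneman (1992) estimates, and the sketch takes the decision weights π(p_m) as given rather than deriving them from the ranked cumulative distribution.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman value function: x**alpha for gains (x >= 0),
    -lam * (-x)**beta for losses (x < 0)."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

def prospect_value(weights, outcomes):
    """CP(V) = sum_m pi(p_m) * V(x_m), with decision weights supplied directly."""
    return sum(w * value(x) for w, x in zip(weights, outcomes))
```

With λ > 1 a loss of a given size is penalized more heavily than an equal-sized gain is rewarded, which is the loss-aversion property described in the text.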
Figure 20.2 Probability weighting functions for gains (W+) and losses (W−) from Tversky and Kahneman (1992);
smooth line is average, - - - is gains, and –.–. is losses. [Figure: w(p) plotted against p over the unit interval.]
Camerer and Ho (1994) suggest that the decision weighting function should
be used, rather than EUT linear probability weighting, given that the former is
able to capture individual subjective beliefs and improve model fit.
With regard to non-linear decision weighting, a common finding is that
people tend to over-weight outcomes with lower probabilities, and to under-
weight outcomes with higher probabilities (see, e.g., Tversky and Kahneman
1992; Camerer and Ho 1994; Tversky and Fox 1995). This is because prob-
abilities are weighted by an inverse S-shaped probability weighting function
(see Figure 20.2 when γ = 0.56). Roberts et al. (2006) have found opposite
results (i.e., over-weighting outcomes with higher probabilities and under-
weighting outcomes with lower probabilities), by applying decision weights in
the context of individuals’ preferences for environmental quality.
Constant absolute risk aversion (CARA) and constant relative risk aversion
(CRRA) are the two main options for analyzing the attitude towards risk,
where the CARA model form postulates an exponential specification for the
utility function, and the CRRA form is a power specification (e.g., U = x^α).
For the non-linear utility specification, the CRRA form rather than CARA is
used in this study, given that CARA is usually a less plausible description of
the attitude towards risk than CRRA (see Blanchard and Fischer 1989).
Blanchard and Fischer (1989, 44) further explain that “the CARA specification
is, however, sometimes analytically more convenient than the CRRA specifi-
cation, and thus also belongs to the standard tool kit.” CRRA has been widely
used in behavioral economics and psychology (see, e.g., Tversky and
Kahneman 1992; Holt and Laury 2002; Harrison and Rutström 2009) and
often delivers “a better fit than alternative families” (Wakker 2008, 1329). We
estimate the constant relative risk aversion (CRRA) model form as a general
power specification (i.e., U = x^{1−α}/(1−α)), which is more widely used than the
simple x^α form (Andersen et al. 2012; Holt and Laury 2002).
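The two specifications just contrasted can be sketched side by side. A minimal illustration (function names are ours):

```python
import math

def u_cara(x, a):
    """CARA: exponential specification, U = -exp(-a*x);
    absolute risk aversion is the constant a."""
    return -math.exp(-a * x)

def u_crra(x, alpha):
    """CRRA: general power specification, U = x**(1-alpha)/(1-alpha)
    for alpha != 1 (the alpha -> 1 limit is log(x));
    relative risk aversion is the constant alpha."""
    return x ** (1 - alpha) / (1 - alpha)
```

Both are increasing and concave for a > 0 and 0 < α < 1; the difference the text points to is whether the risk aversion measure held constant is absolute (CARA) or relative (CRRA).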
20.4 Case study: travel time variability and the value of expected
travel time savings
attributes that have risk (and uncertainty4) associated with them; for example,
the variability associated with travel time for the same repeated trip. Although
recent choice analysis research has focused on alternative treatments of
attributes under the broad rubric of heuristics and attribute processing (see
Hensher 2010 for an overview and Chapter 21), there has been a somewhat
limited effort to formally include, in travel choice analysis, the “obvious”
observation that attribute levels vary in repeated travel activity (e.g., travel
times for the daily commute), and hence attribute risk and perceptual conditioning5
are ever present; if ignored, these become yet another confounding
source of unobserved utility associated with alternatives on offer (see van de
Kaa 2008).
Travel time variability has become an important research focus in the
transportation literature, in particular traveler behavior research. Within a
linear utility framework, the scheduling model and the mean–variance model,
typically developed empirically within the SC theoretic framework, are two
dominant approaches to empirical measurement of the value of time varia-
bility (see e.g., Small et al. 1999; Bates et al. 2001; Li et al. 2010 for a review).
However, with a few exceptions, the majority of the existing travel time
variability studies have ignored two important components of decision making
under risk that are present in responses to travel time variability: non-linear
probability weighting (or perceptual conditioning), and risk attitudes,
although some of the studies recognized travel time variability in their SC
experiments in terms of a series of travel times for a trip (e.g., 5 or 10). These
traditional approaches to travel time variability are implemented under
“linear probability weighting” and “risk neutrality.”
Incorporating perceptual conditioning (through decision weights), borrowed
from prospect theory, into an EUT specification of particular attributes,
but staying within an overall RUM framework, offers a new variant on EU,
which we call attribute-specific extended EUT (EEUT). A number of para-
metric functional forms for such decision weights have been developed in the
4 Risk refers to a circumstance where the chooser knows with precision the probability distribution of
possible outcomes (e.g., when the analyst indicates the chance of specific travel times occurring over
repeated commuting trips). Uncertainty refers to a situation where a chooser is not offered such
information, and is required to assess the probabilities of potential outcomes with some degree of
vagueness and ambiguity (e.g., when an analyst indicates that a trip could take as long as x minutes and as
quick as y minutes, without any notion of likely occurrence).
5 The Allais paradox (Allais 1953) suggests that probabilities given in choice experiments are in reality
transformed by decision makers in the face of risky choices. To account for the perceptual translation of
agents, non-linear probability weighting was introduced by a number of authors to transform the analyst-
provided probabilities into chooser perceptions.
U = EEUT(U) + \sum_{z=1}^{Z} \beta_z S_z. \quad (20.32)
6 The experimental design and modeling framework accommodate decisions under risk, although travel
time variability is best described under uncertainty rather than risk. Research should also address choice
made under uncertainty (in the face of travel time variability) in terms of both experimental design and
modeling approaches.
undertaken in Australia in the context of toll versus free roads, which utilised a
SC experiment involving two SC alternatives (i.e., route A and route B)
pivoted around the knowledge base of travelers (i.e., the current trip). The
trip attributes associated with each route are summarized in Table 20.1.
Each alternative has three travel scenarios – “a quicker travel time than
recent trip time,” “a slower time than recent trip time,” and “the recent trip
time.”7 Respondents were advised that departure time remains unchanged.
Each is associated with a corresponding probability8 of occurrence to indicate
that travel time is not fixed but varies from time to time. For all attributes
except the toll cost, the minutes for quicker and slower trips, and the probabilities
associated with the three trip times, the values for the SC alternatives are
variations around the values for the most recent trip. Given the lack of
exposure to tolls for many travelers in the study catchment area, the toll levels
are fixed over a range, varying from no toll to $4.20, with the upper limit
determined by the trip length of the sampled trip. The variations used for each
attribute are given in Table 20.2, based on a range that we have shown in
various studies (see Li et al. 2010) to be meaningful to respondents, while still
delivering sufficient variability to identify attribute preference.
7 The data were not collected specifically to study trip time variability, hence the limit of three travel
times, in contrast to the five levels used by Small et al. (1999) and the 10 levels used by Bates et al. (2001),
where the latter studies focused specifically on travel time variability (or reliability).
8 The probabilities are designed, and hence exogenously induced to respondents, similar to other travel time
variability studies.
Table 20.2 Profile of the attribute range in the stated choice design

Free flow time                  −40%   −30%   −20%   −10%   0%     10%    20%    30%
Slowed down time                −40%   −30%   −20%   −10%   0%     10%    20%    30%
Stop/Start time                 −40%   −30%   −20%   −10%   0%     10%    20%    30%
Quicker trip time               −5%    −10%   −15%   −20%   –      –      –      –
Slower trip time                10%    20%    30%    40%    –      –      –      –
Prob. of quicker time           10%    20%    30%    40%    –      –      –      –
Prob. of most recent trip time  20%    30%    40%    50%    60%    70%    80%    –
Prob. of slower trip time       10%    20%    30%    40%    –      –      –      –
Running costs                   −25%   −15%   −5%    5%     15%    25%    35%    45%
Toll costs                      $0.00  $0.60  $1.20  $1.80  $2.40  $3.00  $3.60  $4.20
There are three versions of the experimental design, depending on the trip
length (10 to 30 minutes, 31 to 45 minutes, and more than 45 minutes, the
latter capped at 120 minutes), with each version having 32 choice situations
(or scenarios) blocked into two sub-sets of 16 choice situations each. An
example of a choice scenario is given in Figure 20.3. The first alternative is
described by attribute levels associated with a recent trip, with the levels of
each attribute for Routes A and B pivoted around the corresponding level of
the actual trip alternative.
In total, 280 commuters were sampled for this study. The experimental
design method of D-efficiency used here is specifically structured to increase
the statistical performance of the models with smaller samples than are
required for other less (statistically) efficient designs, such as orthogonal
designs (see Rose and Bliemer 2008 and Chapter 7).
The socio-economic profile of the data is given in Table 20.3, and the
descriptive overview of choice experiment attributes is given in Table 20.4.
The descriptive statistics for the time and probability variables are given in
Table 20.5.
The design assumes a fixed level for a shorter or longer trip within each
choice scenario. However, across the choice scenarios, we vary the probability
of a shorter, a longer, and a recent trip time, and hence recognize the
stochastic nature of the travel time distribution (see Table 20.2 where, for
example, the probability of travel time occurrence varies from 10 percent to 40
percent in the CE).
Illustrative Choice Experiment Screen (Game 5)

Make your choice given the route features presented in this table, thank you.

                                                 Details of your
                                                 recent trip      Route A   Route B
Average travel time experienced
  Time in free flow traffic (minutes)                 25             14        12
  Time slowed down by other traffic (minutes)         20             18        20
  Time in stop/start/crawling traffic (minutes)       35             26        20
Probability of travel time
  9 minutes quicker                                   30%            30%       10%
  As above                                            30%            50%       50%
  6 minutes slower                                    40%            20%       40%
Trip costs
  Running costs                                      $2.25          $3.26     $1.91
  Toll costs                                         $2.00          $2.40     $4.20
If you make the same trip again, which
route would you choose?                          Current Road      Route A   Route B
20.4.2 Empirical analysis: mixed multinomial logit model with non-linear utility functions
We focus on an MMNL model. MNL estimates are given in Hensher et al. (2011).
For the random parameters, unconstrained normal distributions are applied to
the Expected time parameter and the Cost parameter. Given that the distributions
for α and γ are quite likely to be asymmetrical, skewed normal distributions are
used for these two parameters. The skewed normal distribution is given as β_k,i =
β_k + σ_k V_k,i + θ_k |W_k,i|, where both V_k,i and W_k,i are distributed as standard
normal. This form is in line with Equation (20.5), except that we have not
included the covariate term, Δz_i (observable heterogeneity through z_i),9 but
have added an extra term to allow for skewness or asymmetry. The second
random term enters in absolute value; θ_k may be positive or negative, so the skewness can go
in either direction. The random parameter ranges over the entire real line, but
since the distribution is skewed, it is asymmetric.
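The skewed normal construction is easy to simulate. A minimal sketch (function name is ours), using the estimates for Alpha reported in the table below (mean 0.4727, standard deviation 1.5896, skew normal θ = −1.8673) as illustrative inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

def skew_normal_draws(beta, sigma, theta, n=100_000):
    """beta_{k,i} = beta_k + sigma_k * V + theta_k * |W|,
    with V and W iid standard normal."""
    V = rng.standard_normal(n)
    W = rng.standard_normal(n)
    return beta + sigma * V + theta * np.abs(W)

# Illustrative draws using the reported Alpha estimates:
draws = skew_normal_draws(0.4727, 1.5896, -1.8673)
```

Since E|W| = √(2/π), the mean of the draws is β + θ√(2/π), and a negative θ (as for Alpha here) skews the distribution to the left.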
We can derive the value of expected travel time savings (VETTS) as
given in Equation (20.33). The only difference across the four models is in the
form of the probability weighting functions:10
9 In the mixed logit specification, we did investigate the role of socioeconomic characteristics as sources of
systematic heterogeneity associated with the random parameters; however, we were unable to find any
statistically significant influences.
10 In the experimental design, there are three possible travel times for each alternative route within a choice set.
Notes: PS, PL and PMR are the probabilities for the quicker, slower, and recent trip
times; MRT is the most recent travel time (the sum of three components: free
flow, slowed down, and stop/start times); X(quicker) and Y(slower) are the
amounts of quicker and slower time compared with the most recent time, which
are designed and presented in the experiment. ST is the actual quicker (or
shorter) travel time (= MRT − X(quicker)); LT is the actual slower (or longer)
travel time (= MRT + Y(slower)); PTE (= PS * ST), PTL (= PL * LT) and PTMR
(= PMR * MRT) are the probability weighted values for the quicker, slower, and most
recent times, respectively.
Parameter                                    Estimate    t-ratio
Nonrandom parameters:
  Reference constant                           0.5129      2.69
  Tollasc                                     −0.6766     −4.32
  Age (years)                                  0.0305      7.26
Means for random parameters:
  Alpha (α)                                    0.4727     14.56
  Gamma (γ)                                    0.7355      2.33
  Expected Time (minutes)                     −0.3708     −4.69
  Cost ($)                                    −0.8554     −9.88
Standard deviations for random parameters:
  Alpha (α)                                    1.5896     18.21
  Gamma (γ)                                    1.3276      3.12
  Expected Time (minutes)                      0.6911      4.82
  Cost ($)                                     1.1720      9.53
Skew normal θ for Alpha                       −1.8673    −20.01
Skew normal θ for Gamma                        0.3469      0.57
No. of observations                            4480
Information Criterion: AIC                     5444.59
LL                                            −2709.29
VETTS                                          7.73 (0.53)
11 We ran models with 100 draws, 250 draws, and 500 draws. The model with 250 draws has a better model
fit than the other two (log-likelihood: −2731.55 for 100 draws; −2709.29 for 250 draws; −2745.48 for
500 draws). Models took between 10 and 25 hours to estimate and converge. Further details of Halton
draws are provided in Bhat (2001) and Halton (1970).
12 Under the probability weighting function with γ = 1.7419, within the range of our designed
probabilities, only when the raw probability is 0.8 is the transformed probability slightly higher (i.e.,
0.807).
Figure 20.4 Individual probability weighting function curves (MMNL): γ = 0.9261 (lowest), γ = 1.7419 (mean),
γ = 4.1734 (highest). [Figure: w(p) plotted against p over the unit interval.]
(see Figure 20.5), giving increased behavioral realism relative to the simple
MNL model. It is common to observe extreme values and sign changes in
distributions of WTP in many studies where unconstrained distributions are
used (see Hensher 2006). The log-normal circumvents a sign change but
13 Some applied studies remove the extreme tails when using unconstrained distributions.
14 The standard deviation in the MNL model is caused by different levels of probabilities and times (see
Equation 20.31).
15 This commentary is not to suggest that there are no other influences at play; attributing the
findings to preferences, probability weighting, and attitudes toward risk does not preclude the role of
other effects. The evidence nevertheless is very encouraging, and suggests that consideration of these
additional behavioral dimensions has merit.
As an aside, NLRPlogit, as a non-linear utility function routine, cannot know in advance where the
missing data are. Hence, regardless of whether a variable is included in a specific utility
expression or not, if it is coded −999 and associated with any alternative that is included in
the model, it will create an error message (error number 509) and it is almost certain
that the model will not estimate.
Timer$
? Generic for E, L, On and random parameters for risk attitude, time, cost
and Gammap
NLRPLogit
; Lhs = Choice1,cset3,alt3
; Choices = Curr,AltA,AltB
; checkdata
; maxit=10
; Labels = bref,betac, gammap,btolla,bage ,alphar, betatelo, ttau
; Start = 0.48, -0.33,0.21,-0.3,0.03 ,0.05,-0.34, 1.9
; Fn1 = earltr=(earlta^(1-alphar))/(1-alphar) ?equation 20.26
; Fn2 = latetr=(lateta^(1-alphar))/(1-alphar)
; Fn3 = ontr=(time^(1-alphar))/(1-alphar)
; Fn4 = wpo = (Ttau*pronp^gammap)/(Ttau*pronp^gammap + (1-pronp)^gammap)
; Fn5 = wpe = (Ttau*preap^gammap)/(Ttau*preap^gammap + (1-preap)^gammap)
; Fn6 = wpl = (Ttau*prlap^gammap)/(Ttau*prlap^gammap + (1-prlap)^gammap)
; Fn7 = Util1 = bref+wpe*(betatelo*earltr) +wpl*(betatelo*latetr) +betac*cost +
wpo*(betatelo*ontr) +btolla*tollasc +bage*age1
; Fn8 = Util2 = +wpe*(betatelo*earltr) +wpl*(betatelo*latetr) +betac*cost +
wpo*(betatelo*ontr) +btolla*tollasc
; Fn9 = Util3 = +wpe*(betatelo*earltr) +wpl*(betatelo*latetr) +betac*cost +
wpo*(betatelo*ontr) +btolla*tollasc
; Model: U(Curr)=Util1/U(AltA) = util2/U(AltB) = util3
;RPL;halton;draws=250;pds=16;parameters;fcn=alphar(s), gammap(s),ttau (s),
betatelo(n), betac(n)$? unconstrained, some risk averse(alpha<0), some = l
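The Fn1–Fn9 definitions in the command above can be mirrored outside Nlogit. A minimal Python sketch of one utility evaluation (function names and parameter values are illustrative; the toll dummy and age terms are omitted for brevity):

```python
def crra(t, alpha):
    """Fn1-Fn3: transformed travel time, t**(1 - alpha) / (1 - alpha)."""
    return t ** (1 - alpha) / (1 - alpha)

def ge_weight(p, gamma, tau):
    """Fn4-Fn6: Goldstein-Einhorn decision weight,
    tau * p**gamma / (tau * p**gamma + (1 - p)**gamma)."""
    return tau * p ** gamma / (tau * p ** gamma + (1 - p) ** gamma)

def eeut_utility(times, probs, cost, alpha, gamma, tau,
                 beta_time, beta_cost, asc=0.0):
    """Fn7-Fn9 (toll and age terms omitted): utility is the constant plus the
    decision-weighted, CRRA-transformed travel times plus the cost term."""
    weighted = sum(ge_weight(p, gamma, tau) * beta_time * crra(t, alpha)
                   for t, p in zip(times, probs))
    return asc + weighted + beta_cost * cost
```

With the early/late/on-time travel times and their probabilities as inputs, this reproduces the structure of Util1–Util3: each probability is transformed by the decision weight before multiplying the transformed time.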
Nonlinear Utility Mixed Logit Model
Dependent variable CHOICE1
Log likelihood function -2755.94897
Restricted log likelihood -4921.78305
Chi squared [ 16 d.f.] 4331.66816
Significance level .00000
McFadden Pseudo R-squared .4400507
Estimation based on N = 4480, K = 16
Information Criteria: Normalization=1/N
Normalized Unnormalized
***************************EEUT*****************************************************
? earlta, lateta, and time are the three possible travel times per trip when arriving
early, late, and on time; preap, prlap, and pronp are the corresponding
probabilities of occurrence.
[Figure: panel diagrams relating explanatory variables, latent variables, indicators, and attributes; the left panel depicts factor analysis.]
Figure 20.6 Incorporating latent variables in discrete choice models using different methods
[Figure: the latent variable model (structural equations, with disturbances Ψ, and measurement equations, with indicators) linked to the choice model (utility, with disturbances ε; the decision process, u, with disturbances v; and RP/SP choice indicators, d); latent (unobserved) variables are distinguished from observable variables.]
Figure 20.7 The integrated latent variable and discrete choice modeling framework
Sources: Walker and Ben-Akiva (2002), Bolduc et al. (2005).
general model system (which will be offered in Nlogit 6). The various elements
are presented below.
z^* = \Gamma w + \eta. \quad (20.35)
U = U(x, z^*, I) + \varepsilon, \quad (20.38)

where ε has the usual iid Extreme value distribution, producing a conditional
MNL model. More advanced models are also permissible, such as ML.
We can now define y generically as the indicator of the choice within a
maximum random utility setting, as per usual. z* at this stage has a very
general form and role in the choice model.
Likelihood function: the LL is composed, ultimately, of the joint densities
of the observed outcomes, the multinomial choice, and the observed indicators.
The form of the joint density is given in Equation (20.39):

P(y, I \,|\, w, z, x) = \int_{z^*} P_{\mathrm{choice}}(y \,|\, z^*, w, x, z, I) \, P_{\mathrm{indicators}}(I \,|\, z^*, w, x, z) \, f(z^* \,|\, w) \, dz^*. \quad (20.39)
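Because the integral over z* in Equation (20.39) generally has no closed form, it is evaluated by simulation in practice. A minimal sketch (function names are ours; the choice and indicator densities are passed in as callables of the latent variable):

```python
import numpy as np

rng = np.random.default_rng(2)

def hybrid_prob(p_choice, p_indicators, draw_zstar, R=500):
    """Equation (20.39) by simulation: average the product of the choice
    probability and the indicator density over R draws of z* from f(z* | w)."""
    total = 0.0
    for _ in range(R):
        zs = draw_zstar()                       # one draw of the latent variable
        total += p_choice(zs) * p_indicators(zs)
    return total / R
```

The log of this simulated probability, summed over respondents, gives the simulated LL, which is the integration-then-logarithm step described below.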
To obtain the contribution to the LL, we integrate z_i^* out of P(i | z_i^*, . . .), then
take logarithms.
The data arrangements are set out below as a way of assisting in understanding
the data requirements. We assume that a variable number of alternatives
and a variable number of choice tasks, while feasible, can be ignored, with both
fixed across the sample. Assume three alternatives in a choice set, and a SC
experiment with two choice tasks. The schema below in Table 20.7 suggests
one attribute in the choice model, up to three Qj indicators, one or two Ait task
level models, and one Ii person level indicator equation. In each case, there
could be multiple models, though a model with more than one Qj alternative
specific model equation would be somewhat complex.
Rows 1−3:
1 1 Y1,1 X1,1 Q1 h1 A1 f1 I g m
2 1 Y2,1 X2,1 Q2 h2 A1 f1 I g m
3 1 Y3,1 X3,1 Q3 h3 A1 f1 I g m
Rows 4−6:
4 2 Y1,2 X1,2 Q1 h1 A2 f2 I g m
5 2 Y2,2 X2,2 Q2 h2 A2 f2 I g m
6 2 Y3,2 X3,2 Q3 h3 A2 f2 I g m
Note 1: Block of 3 rows is repeated with each choice task. Only the data given with the first task are actually
used, since these indicator models use N observations to fit each of the J models.
Note 2: Each row is repeated within the choice task. Only the first row in each choice task block is actually
used. Each of the T Choice task indicator models is fit with N observations.
Note 3: Each row is repeated for every row of the individual’s data set. Only the first row is actually used. The
individual model is fit with N observations in total.
Note 4: Same configuration of Ii indicators. Equation for z is fit at the individual level.
The preceding addresses all the data and synchronization issues the analyst
is likely to want to specify in Nlogit, even if for any one application there are
redundant data in the table above, as shown in the notes. Examples of model
applications using the one common data setup are as follows, noting that the
entire data set for this person is rows 1–6.
1. To fit an individual level, Ii model, I will use row 1 only, and ignore rows 2–6.
2. To fit a choice task level model relating to choice task 1, I will use row 1. If
there is also a question about choice task 2, I would use row 4. Rows 2–3
and 5–6 would be ignored.
3. To fit a model about alternative 1, I would use row 1. To fit an equation
about alternative 2, I would use row 2. To fit an equation about alternative
3, I would use row 3. Rows 4–6 would be ignored.
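The three cases above can be sketched as simple row filters; the tuple layout below is invented for illustration:

```python
# A small sketch of the row-selection logic just described, for the six-row
# block (3 alternatives x 2 choice tasks) of one individual's data.
rows = [  # (row, task, alternative)
    (1, 1, 1), (2, 1, 2), (3, 1, 3),
    (4, 2, 1), (5, 2, 2), (6, 2, 3),
]

individual_rows = [r for r in rows if r[0] == 1]    # Ii model: row 1 only
task_rows = [r for r in rows if r[2] == 1]          # one row per choice task: rows 1 and 4
alternative_rows = [r for r in rows if r[1] == 1]   # alternative-level models: rows 1-3
```
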
Rows that are not used are filled with the corresponding data mainly to
“coerce” the user into always providing the correct data in the internal rows
of the block. This will provide a way for the user to keep the data straight. An
additional advantage of this structure is that the simulation that will be needed
to integrate out the random part of the attitude variables will run at the same
rate as the random parameters synchronization.
The proposed syntax for the command (in release Nlogit 6) is:

HybridLogit
    ; Lhs = choice variable
    ; Choices = list of choices
    [; specification of utility functions using the standard arrangements]
    [; RPL ; Fcn = the usual specification]  (allows some random parameter specification)
    ; Attitudes: name (choices in which it appears) [= list of variables] /
                 name (choices in which it appears) [= list of variables] . . .
    ; Indicators: name (level, type) = list /
                  name (level, type) = list . . . $

where level is Individual, Choice, or Task, and type is Continuous, Binary, or Scale.
21
Attribute processing, heuristics, and preference construction
This chapter was co-authored with Waiyan Leong and Andrew Collins.
21.1 Introduction
attribute, alternative, and choice set levels, with empirical evidence suggesting
that inclusion of process matters in a non-marginal way in the determination
of estimates of WTP, elasticities, and choice outcomes. This chapter focuses
on the role of heuristics in information processing in choice experiments
(CEs), since this is the setting within which the major contributions have
been made, but we remind readers that the heuristics also apply in the context
of revealed preference (RP) data.
Although there should be no suggestion that fully compensatory choice
rules are always invalid – indeed, they may be, in aggregate, an acceptable
representation of many process circumstances – there is a strong belief that
process heterogeneity exists. It arises as a consequence of mixtures of genuine
cognitive processing strategies that simplify decision making in real markets,
for all manner of reasons, and of new states introduced through the design of
CEs, which are no more than new circumstances to process. Whether the
processing rules adopted are natural to real choices, or are artefacts of the
design of an experiment or some other survey instrument (including RP
surveys) placed in front of an individual, is in some senses irrelevant; what is
relevant is the manner in which such choice assessments are processed, in
respect of the role that each design attribute and the mixture of attributes
and alternatives plays in the outcome. Yoon and Simonson (2008)
and Park et al. (2008)1 provide some interesting perspectives from marketing
research on preference revelation.
There is a substantial extant literature in the psychology domain as regards
the influence of various factors on the amount of information processed in
decision tasks. Evidence demonstrates the importance of such factors as time
pressure (e.g., Diederich 2003), cognitive load (e.g., Drolet and Luce 2004),
and task complexity (Swait and Adamowicz 2001a) in influencing the decision
strategy employed during complex decision tasks. There is also a great deal of
variability in the decision strategies employed in different contexts, and this
variability adds to the complexity in understanding the behavioral mechan-
isms involved in decision making and choice. There is also a debate on what
constitutes “complexity” in the eyes of the decision maker (in contrast to the
assumptions of the analyst), with some authors such as Hensher (2006)
suggesting that relevance is what matters and that what is complex to one
agent may not be so to another. We discuss this in more detail below.
1
Park et al. (2008) propose the idea of starting with a basic product profile and upgrading it one attribute
at a time, identifying the WTP for each additional attribute given the budgets available.
Table 21.1. Classic decision strategies classified by whether each is attribute- or alternative-based, the amount of information processed, and the consistency of processing (table not reproduced).
strategy within the same decision context. In other words, once a strategy is
selected for a given task (or choice), it does not change within the task.
This issue is further complicated by an influential psychological theory
which identifies two main stages in the decision process. Differentiation and
consolidation (Diff Con) theory, developed by Svenson and Malmsten (1996),
assumes that decision making is a goal oriented task which incorporates the
pre-decision process of differentiation and the post-decision process of con-
solidation. This theory is crucial in encouraging a disaggregation of the entire
decision process.
The two issues discussed above, regarding the adaptive nature of strategies
and the disaggregation of the decision process, can only be assessed
realistically within a paradigm that relaxes the deterministic assumption
of most rational and normative models of decision making. In other words,
what is needed is a stochastic specification of AP capable of accommodating the widespread
consensus in the decision making literature that decision making is an active
process which may require different strategies in different contexts and at
different stages of the decision process (e.g., Stewart et al. 2003). As the relevance
of attributes in a decision task changes so, too, must our approach to modeling
the strategies that individuals employ when adapting to such changes.
There is widespread evidence in the psychology literature concerning the
behavioral variability, unpredictability, and inconsistency regularly demon-
strated in decision making and choices (e.g., González-Vallejo 2002; Slovic
1995), reflecting an assumption that goes back at least to Thurstone’s law of
comparative judgment (1927). One of the particularly important advantages
of using a stochastic representation of decision strategies, as promoted here, is
that it enables a more behaviorally realistic analysis of variation in decision
strategies.
Recent research by Hensher (2006, 2008), Greene and Hensher (2008),
Layton and Hensher (2010), Hensher and Rose (2009), Hensher and Layton
(2010), Hess and Hensher (2010), Puckett and Hensher (2008), Swait (2001),
Cantillo et al. (2006), Cameron (2008), Scarpa et al. (2008), Beharry and
Scarpa (2008), Cantillo and Ortúzar (2005), and Hensher et al. (2009),
among others, exemplifies a growing interest in the way that individuals
evaluate a package of attributes associated with ordered or unordered
mutually exclusive alternatives in real or hypothetical markets, and make
choices.2 The accumulating empirical evidence suggests that individuals use
2
This chapter does not consider other aspects of process in CEs such as uncertainty in the choice response.
See Lundhede et al. (2009).
There is much in the psychology literature that points to the use of quick
mental processing rules known as heuristics that are relied on to manage the
vast number of decisions that must be made in everyday life. It is recognized
that the WADD rule, if followed strictly to the letter, is cognitively demanding
and time consuming (Payne et al. 1993). Furthermore, it implies an assump-
tion of stable, well articulated preferences which appears to hold only under
conditions where the choice task is familiar or when the respondent has
experience with the various alternatives that are presented. When these
conditions fail to apply, preferences are not determined in advance of the
choice situation, but are instead constructed in response to the characteristics
of the choice task. As Payne et al. (1999, 245) put it, the construction process
involves an interaction between “the properties of the human information
processing system and the properties of the choice task.”
Rather than static decision processes that are repeatedly applied to different
choice contexts, behavioral decision research tells us that “individuals have a
repertoire of decision strategies for solving decision problems” (Bettman et al.
1998, 194). Table 21.2 (based on Table 21.1) describes some classic decision
strategies that have been identified in the decision research literature (Payne
et al. 1993).
Bettman et al. (1998) propose a choice goals framework to understand how
individuals come to use particular decision strategies. They argue that respon-
dents attempt to trade-off between two conflicting goals: maximizing the
accuracy of a decision and minimizing the cognitive effort required to reach
that decision. The effort–accuracy trade-off is on view in the majority of
decision cases, although individuals may also pursue other goals like mini-
mizing negative emotions and maximizing the ease of justifying the decision.
Bettman et al.’s framework resonates with Jones’ (1999) thesis that people are
“intendedly rational,” but limits imposed by human cognitive and emotional
architecture constrain decision making behavior.
To assess cognitive effort, decision strategies can be decomposed into
elementary information processes (EIPs) – such as READ, COMPARE,
ADD, MULTIPLY, ELIMINATE, and so on. An EBA strategy can be thought
of as (i) reading the weight of each attribute; (ii) comparing the weight just
read with the largest weight found previously until the most important
attribute is found; (iii) reading a cut-off threshold for that attribute; (iv)
reading the attribute value across all alternatives; (v) comparing each value
against the cut-off; and (vi) eliminating alternatives whose attribute values fail
to meet the cut-off. Cognitive effort for the decision strategy is then expressed
as a function of the total number of EIPs and types of EIPs. The reason for
varying cognitive effort with EIP type is that in empirical estimates EIPs have
been found to differ in cognitive effort requirements – for example,
MULTIPLY takes over 2 seconds versus under half a second for COMPARE.
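The EIP accounting can be sketched in code. This is an illustrative sketch, not from the text: the per-EIP effort weights are assumed values, loosely anchored to the observation that MULTIPLY is far costlier than COMPARE.

```python
# Illustrative sketch (not from the text): counting elementary information
# processes (EIPs) to compare the cognitive effort of two decision strategies.
# The per-EIP effort weights (notional seconds) are assumed values.
EIP_COST = {"READ": 0.5, "COMPARE": 0.4, "ADD": 1.0, "MULTIPLY": 2.3, "ELIMINATE": 0.3}

def wadd_effort(n_alts, n_attrs):
    """WADD: read and multiply every weight/attribute pair, sum within each
    alternative, then compare the alternatives' totals."""
    eips = ["READ", "MULTIPLY"] * (n_alts * n_attrs)
    eips += ["ADD"] * (n_alts * (n_attrs - 1))
    eips += ["COMPARE"] * (n_alts - 1)
    return sum(EIP_COST[e] for e in eips)

def eba_effort(n_alts, n_attrs):
    """EBA on a single attribute, following steps (i)-(vi) in the text."""
    eips = ["READ"] * n_attrs + ["COMPARE"] * (n_attrs - 1)  # steps (i)-(ii)
    eips += ["READ"]                                          # step (iii): cut-off
    eips += ["READ", "COMPARE"] * n_alts                      # steps (iv)-(v)
    eips += ["ELIMINATE"] * (n_alts - 1)                      # step (vi), worst case
    return sum(EIP_COST[e] for e in eips)

# Effort grows far faster in the number of alternatives under WADD than EBA:
wadd_growth = wadd_effort(9, 4) - wadd_effort(3, 4)
eba_growth = eba_effort(9, 4) - eba_effort(3, 4)
```

Under these assumed weights, moving from three to nine alternatives raises WADD effort by an order of magnitude more than EBA effort, which is the effort side of the effort–accuracy trade-off discussed below.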
To define the accuracy of a decision strategy, Payne et al. (1993) suggest
comparing the WADD value of the choice in relation to the normative
WADD rule. Such a relative accuracy measure is proposed in Equation (21.1):
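Equation (21.1) falls on a page boundary and does not survive in this extract. Payne et al.'s (1993) relative accuracy measure is commonly written in the following form, reconstructed here as an assumption rather than copied from the book:

```latex
\text{Relative accuracy} \;=\;
\frac{\mathrm{WADD}_{\text{chosen}} - \mathrm{WADD}_{\text{random}}}
     {\mathrm{WADD}_{\text{max}} - \mathrm{WADD}_{\text{random}}},
\qquad (21.1)
```

where WADD_chosen is the weighted additive value of the alternative selected by the heuristic, WADD_max that of the best alternative in the choice set, and WADD_random the expected value of an alternative chosen at random.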
For the common heuristics described earlier, Payne et al. conclude that
relative accuracy does not decrease very much when the number of alterna-
tives increases, but the cognitive effort, measured in terms of the EIP work-
load, increases much more rapidly for the WADD strategy than for the
heuristics. Thus, as the number of alternatives grows in a choice task, the
heuristics appear to be more efficient from an effort–accuracy trade-off
perspective. This means that with, say, six or eight alternatives given to the
respondent, the effort–accuracy framework predicts a shift from compensa-
tory to non-compensatory choice strategies. Such shifts have indeed been
observed in empirical settings through process tracing methods (see Section
21.5.3). Respondents have been known to use attribute-based strategies like
EBA early in the process to reduce the number of alternatives before using an
alternative-based strategy such as additive utility to arrive at the final outcome,
in what has been called a phased decision strategy.
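A phased strategy of this kind can be sketched as follows; the alternatives, attribute weights, and cut-off below are all invented for illustration:

```python
# A minimal sketch (not from the text) of a phased decision strategy:
# an EBA-style screen on one attribute first reduces the choice set,
# then a weighted additive (WADD) rule chooses among the survivors.

def phased_choice(alts, screen_attr, cutoff, weights):
    """alts: dict name -> {attribute: value}, higher values better.
    Phase 1 (EBA): eliminate alternatives below `cutoff` on `screen_attr`.
    Phase 2 (WADD): maximize the weighted attribute sum over survivors."""
    survivors = {a: x for a, x in alts.items() if x[screen_attr] >= cutoff}
    if not survivors:          # nothing passed the screen: fall back to full set
        survivors = alts
    wadd = lambda x: sum(weights[k] * v for k, v in x.items())
    return max(survivors, key=lambda a: wadd(survivors[a]))

alts = {
    "A": {"comfort": 2, "reliability": 9, "fuel_economy": 8},
    "B": {"comfort": 7, "reliability": 6, "fuel_economy": 5},
    "C": {"comfort": 9, "reliability": 2, "fuel_economy": 7},
}
weights = {"comfort": 0.5, "reliability": 0.3, "fuel_economy": 0.2}

full_set_choice = phased_choice(alts, "reliability", 0, weights)  # no screening
phased = phased_choice(alts, "reliability", 5, weights)           # screen first
```

Note how the outcome changes: without screening, the compensatory rule favors one alternative; screening out low-reliability options first can shift the final choice, even though the second phase is identical.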
More generally, in relatively less complex choice tasks, where complexity,
according to Payne et al., refers to task characteristics such as the number of
alternatives, number of attributes, and time pressure,3 the effort–accuracy
perspective predicts that compensatory decision strategies such as the
WADD model tend to be more frequently employed. This idea of complexity
can be distinguished from Hensher's (2006d) notion of relevance, which
pertains to providing more complete descriptions of attributes in the choice
task and allowing respondents to form their own processing rules with regard
to relevance. Hence, a choice task that disaggregates, say, a time attribute into its
various components, such as free-flow time, slowed-down time, and stop-start
time, may be more relevant than one that aggregates these components into an overall
"time" attribute.
Given more attributes to process, someone using a fully compensatory
strategy is required to exert greater cognitive effort. When there are more
attributes, there is consistent evidence showing that respondents become
more selective in their information search, by reducing the proportion of
information searched (Sundstrom 1987; Olshavsky 1979; Payne 1976), but
evidence is mixed as to whether this represents a fundamental change in
decision strategy (Sundstrom 1987) or whether this is a case of different
weights being applied (Olshavsky 1979). Compared to the WADD rule, it is
also unclear how the relative efficiencies of the heuristics stack up in this
situation. Unlike what happens when the number of alternatives increases,
Payne et al. show that the relative accuracy of heuristics like LEX and SAT
3
Although in most CEs, time pressure is not experimentally manipulated.
decreases as the number of attributes increases, the exception to this being the
EBA rule.4 Neither has the question of whether too many attributes can
overload respondents and lead to a degradation of choice quality been
resolved. Some authors like Malhotra (1982) argue for this position, but
Bettman et al. (1998) suggest that increases in the amount of information
given to respondents need not be harmful as long as they select information
that reflects their values, rather than basing their decision on surface features
of the choice task such as salience or format.
So far, the discussion of Payne et al.’s effort–accuracy framework posits a
top down approach, where the respondent weighs the costs and benefits of
adopting each of the various decision strategies, and then chooses the one
which best meets the effort–accuracy trade-off for the required task. A com-
plementary view involves preference construction as “bottom up” or “data-
driven” (Payne et al., 1993, 171), where respondents shape or change decision
strategies by exploiting previously encountered problem structures. Decision
problems are subsequently restructured as an intermediate step, making them
more amenable to analysis using certain heuristics. Information in choice
tasks might be transformed through rounding or standardizing values in a
common metric. Information might also be rearranged or further simplified
by deeming certain attributes irrelevant. The restructuring serves to reduce the
perceived complexity of the choice task (Payne and Bettman 1992; Jones
1999).
4
With an increase in the number of attributes, the EBA rule requires the chosen alternative to surpass more
cut-off values.
P_{jq} = \sum_{C \in G} P_q(j \mid C)\, P_q(C), \qquad (21.2)
λjk and κjk are the respective penalties of violating the lower bound and upper
bound constraints. Let the lower bound cut-off and the upper bound cut-off
for attribute k be denoted ck and dk respectively, where ck and dk may be
allowed to vary across individuals. λjk and κjk may be defined in terms of ck and
dk as follows in expression (21.4):
\lambda_{jk} = \begin{cases} 0 & \text{if } c_k \text{ does not exist} \\ \max(0,\, c_k - X_{jk}) & \text{otherwise} \end{cases}
\qquad
\kappa_{jk} = \begin{cases} 0 & \text{if } d_k \text{ does not exist} \\ \max(0,\, X_{jk} - d_k) & \text{otherwise} \end{cases}
\qquad (21.4)
ωk and υk are the marginal disutilities of violating the lower and upper cut-offs
for attribute k.
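The penalty terms in Equation (21.4) can be sketched directly; the attribute range and marginal disutilities below are assumed values:

```python
# Illustrative sketch of the cut-off penalty terms in Equation (21.4):
# lambda_jk penalizes violating the lower-bound cut-off c_k, and kappa_jk
# the upper-bound cut-off d_k; None stands in for "cut-off does not exist".

def penalties(x_jk, c_k=None, d_k=None):
    lam = 0.0 if c_k is None else max(0.0, c_k - x_jk)   # lower-bound violation
    kap = 0.0 if d_k is None else max(0.0, x_jk - d_k)   # upper-bound violation
    return lam, kap

# E.g. a travel-time attribute with an acceptable range of [10, 40] minutes;
# the range and the marginal disutilities omega_k, upsilon_k are assumed values.
lam, kap = penalties(55.0, c_k=10.0, d_k=40.0)
omega_k, upsilon_k = 0.2, 0.5
penalty_utility = -(omega_k * lam + upsilon_k * kap)   # enters utility additively
```
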
To estimate his model, Swait obtains self-reported cut-off information from
respondents. However, if such information is unavailable, as might be
expected in most CE data where attribute level thresholds are not explicitly
accounted for in the modeling, it might still be worthwhile to consider using
attribute levels of the reference alternative as “pseudo-cut-off” values for ck
and dk. Such a representation would be consistent with both the reference
dependency and loss aversion concepts established in many behavioral stu-
dies. Reference alternatives may simply be the status quo or, in the spirit of
more recent work on reference point revision, those alternatives which were
chosen in previous choice sets. Another point of observation is that because ck
and dk are essentially thresholds the stochastic models of threshold formation
mentioned in Section 21.3.1 would also be relevant to this discussion and are a
possible extension of the model.
Hensher et al. (2013) have developed a model that incorporates the upper
and lower bound attribute thresholds in a choice model where the relevance
of an alternative is also taken into account. The model form for the utility
expression that encapsulates the thresholds is given in Equation (21.5), which
can be estimated using the NLRPLOGIT command set out in Chapter 20:5
5
An alternative form for the alternative acceptability conditioning is the exponential form:
\exp\big(\delta_j \sum_{h=1}^{H} (A_{jq} + \gamma R_{hq})\big). Empirically, the difference is negligible in terms of predictive power and
elasticity outputs.
U_{jq} = \Big(1 + \delta_j \sum_{h=1}^{H} (A_{jq} + \gamma R_{hq})\Big)\Big[\alpha_j + \sum_{k=1}^{K} \beta_{kj} X_{kjq}
  + \sum_{l=K+1}^{L} \beta_l \{0 : \max(0,\, X_{ljq} - X_{lq,\min})\}
  + \sum_{m=L+1}^{M} \beta_m \{0 : \max(0,\, X_{mq,\max} - X_{mjq})\}\Big] + e_j, \qquad (21.5)
the trade-off condition in the usual sense, or the alternative event representing
a rejection condition, where the utility for alternative j is not defined over
attribute values.
Swait’s (2009) model can be set up to embed the EBA heuristic as part of
choice set formation. The EBA heuristic states that an alternative is elimi-
nated if the attribute of that alternative fails to meet a certain threshold. This
allows pj, the probability of an alternative being in the rejection condition, to
be written as a function of a disjunctive screening rule: it takes just one
attribute to fail the threshold cut-off before the alternative is eliminated.
Conversely, qj, which is the probability that an alternative is in the usual
random utility-maximizing, fully compensatory trade-off condition, is writ-
ten in the conjunctive sense: all attributes must meet the threshold criteria
before subsequent processing takes place.
Using a similar concept employed by Cantillo and Ortúzar (2005), Swait
(2009) assumes that for each attribute of interest, individual-specific thresh-
olds are randomly distributed across the population. His specific assumption
for the distribution of τ is normal with mean \bar{\tau}_k and variance \sigma_k^2. In an
unlabeled experiment, τ may be assumed to be generic across alternatives. If
the EBA heuristic is applied to only one aspect – for example, cost – then
Equation (21.7) is obtained:

p_{jq} = \Pr(\tau_{qk} < X_{jqk}) = \Pr\!\left(Z < \frac{X_{jqk} - \bar{\tau}_k}{\sigma_k}\right) = \Phi\!\left(\frac{X_{jqk} - \bar{\tau}_k}{\sigma_k}\right)

q_{jq} = 1 - p_{jq} = 1 - \Phi\!\left(\frac{X_{jqk} - \bar{\tau}_k}{\sigma_k}\right). \qquad (21.7)
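Equation (21.7) can be sketched numerically using the standard normal CDF; the attribute value and threshold distribution parameters below are assumed:

```python
# Sketch of Equation (21.7): with attribute thresholds distributed normally
# with mean tau_bar_k and standard deviation sigma_k, p is the probability an
# alternative is screened out on that attribute (e.g., cost exceeds the
# threshold) and q the probability it is retained for compensatory trade-off.
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def screening_probs(x_jqk, tau_bar_k, sigma_k):
    p = Phi((x_jqk - tau_bar_k) / sigma_k)   # some threshold falls below x
    q = 1.0 - p                              # threshold met: full trade-off
    return p, q

# e.g. a cost of 30 against thresholds ~ N(25, 5^2) (assumed numbers):
p, q = screening_probs(30.0, 25.0, 5.0)
```
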
function 1(X_{jk} \succ X_{ik} \;\forall\; j \neq i). The symbol "≻" denotes a preference relationship
where the LHS variable is preferred to the RHS variable. This indicator
function equals 1 if the level of the kth attribute of the jth alternative is
"best" among all alternatives in the choice set. Ties could be included in the
definition of "best."
The indicator function needs to be weighted by the importance that the
respondent attaches to that attribute. The reason is that despite an alternative
scoring best on an attribute, if that attribute turns out to be relatively unim-
portant from the respondent's perspective, the probability of that alternative
being chosen on the basis of the lexicographic rule should still turn out to be
comparatively small. To weight the attributes, one could normalize by the
squared part-utilities, (\beta_k X_{jk})^2 / \sum_k (\beta_k X_{jk})^2, ensuring that all part-utilities are non-negative,
or by the logit function, \exp(\beta_k X_{jk}) / \sum_k \exp(\beta_k X_{jk}). To simplify the model, prior known values
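One possible reading of this logit weighting can be sketched as follows. This is an interpretation, not the authors' exact specification: the "best on attribute k" indicator is weighted by a logit-normalized importance built from assumed part-utilities β_k X_jk.

```python
# Sketch (one possible reading, not the book's exact model) of weighting the
# "best on attribute k" indicator by logit-normalized attribute importance.
from math import exp

def lex_choice_scores(U):
    """U[j][k]: part-utility beta_k * X_jk of attribute k for alternative j.
    Score(j) = sum_k 1(j is best on k) * w_jk, with logit weights
    w_jk = exp(U[j][k]) / sum_k exp(U[j][k])."""
    J, K = len(U), len(U[0])
    scores = []
    for j in range(J):
        denom = sum(exp(u) for u in U[j])
        s = 0.0
        for k in range(K):
            if all(U[j][k] >= U[i][k] for i in range(J) if i != j):  # j best on k
                s += exp(U[j][k]) / denom
        scores.append(s)
    return scores

# Alternative 1 is best on attribute 1, alternative 2 on attribute 2
# (part-utility values are assumed for illustration):
scores = lex_choice_scores([[2.0, 0.5], [1.0, 1.5]])
```
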
overall conclusion from this line of inquiry is that accounting for (decision)
process heterogeneity – that is, allowing heuristic use to vary by subgroups of
respondents (up to a probability) – leads to improvements in model fits
compared to the standard multinomial logit (MNL) model. In instances
where supplementary questions are a part of the survey instrument, account-
ing for self-stated responses to questions of whether attributes were added up
or certain attributes ignored has further improved the explanatory power of
the models.
As an aside, a simple latent class approach may be used to model the lexicographic rule.6 As
a non-compensatory rule, the lexicographic rule may be characterized by a high β for the
attribute of importance and low or zero βs for the rest of the attributes. This might suggest a
latent class structure that assigns respondents probabilistically to a fully compensatory
utility model or to various classes that constrain all but one of the βs to zero. This approach
interprets the lexicographic rule as an extreme form of attribute non-attendance. However,
this specification does not model the second stage, and any higher order considerations of
the lexicographic rule, which states that in case of a tie the second most important attribute
is considered, and so on.
6
Some exploratory analysis would also be useful to check if a respondent is consistently choosing the
alternative which is best according to a given attribute. Histograms of the frequency that such choices are
made can be plotted.
\pi_{jqt} = \frac{\exp(\beta X_{jqt})}{\sum_{j \in S} \exp(\beta X_{jqt})}. \qquad (21.9)
The entropy measure allows the degree of preference similarity (the
difficulty of making trade-offs) to influence complexity, as entropy reaches
its maximum when the J alternatives are indistinguishable and each has an
equal probability of being chosen. The scale of choice task t for individual q is
given in Equation (21.10):
The use of the quadratic form is to account for non-linear effects of complexity
on decision processes. Specifically, at low levels of complexity, easy decisions
requiring little cognitive effort lead to more preference consistency across
respondents (scale is high), while moderate levels of complexity lead to more
preference inconsistency (lower scale) as respondents resort to using simpli-
fying heuristics. At extreme levels of complexity, alternatives are all approxi-
mately similar in utility terms, thus the error variance should begin to decline
after a certain point. Writing the scale factor in the form above leads to a
specific form of heterogeneity across respondents. Swait and Adamowicz
(2001a) also use the same idea of entropy to model complexity; the difference
however, is the use of entropy as an explanatory variable for respondent
choice.
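The entropy construct can be illustrated numerically. Since Equation (21.10) does not survive in this extract, the quadratic-in-entropy scale function below, exp(θ1·H + θ2·H²) with assumed θ values, is an illustrative form consistent with the discussion rather than the book's exact equation:

```python
# Numerical sketch of the entropy-based complexity measure. Choice
# probabilities follow Equation (21.9); the scale function's quadratic form
# and theta values are assumed for illustration.
from math import exp, log

def choice_probs(utilities):
    """MNL probabilities as in Equation (21.9)."""
    e = [exp(u) for u in utilities]
    s = sum(e)
    return [x / s for x in e]

def entropy(probs):
    """H = -sum_j pi_j ln(pi_j): maximal when alternatives are indistinguishable."""
    return -sum(p * log(p) for p in probs if p > 0.0)

def scale(H, theta1=1.0, theta2=-0.6):   # theta values assumed
    return exp(theta1 * H + theta2 * H * H)

H_easy = entropy(choice_probs([3.0, 0.0, 0.0]))   # one alternative clearly best
H_hard = entropy(choice_probs([1.0, 1.0, 1.0]))   # utility-equivalent alternatives
```
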
Vj is the value of alternative j (given a choice set S), vjk(Xjk) is the utility of
attribute k of alternative j, λk is the loss aversion parameter for attribute k, and
Xrk indicates the value of attribute k at the reference point in choice set S.
The context concavity model takes the attribute value with the lowest part-
utility as the reference point and codes the utility of other attribute values as gains
against the reference. The model specification is shown in Equation (21.12):
V_j = \sum_k \big[v_{jk}(X_{jk}) - v_{rk}(X_{rk})\big]^{c_k}. \qquad (21.12)
Table 21.3. Illustration of the contextual concavity model (assumed part-utilities)

                                                        Alt 1   Alt 2   Alt 3
Attribute 1:
  Assumed value for v_j1(X_j1)                            5.4    10.2    20.3
  v_j1(X_j1) − v_r1(X_r1)                                 0       4.8    14.9
  [v_j1(X_j1) − v_r1(X_r1)]^0.5, assuming c_1 = 0.5       0       2.2     3.9
Attribute 2:
  Assumed value for v_j2(X_j2)                           30.3    23.7    15.2
  v_j2(X_j2) − v_r2(X_r2)                                15.1     8.5     0
  [v_j2(X_j2) − v_r2(X_r2)]^0.3, assuming c_2 = 0.3       2.3     1.9     0
V_j = [v_j1(X_j1) − v_r1(X_r1)]^0.5
      + [v_j2(X_j2) − v_r2(X_r2)]^0.3                     2.3     4.1     3.9
lowest utility on attribute k across all alternatives in the choice set. Using some
assumed values for the part-utilities, Table 21.3 illustrates how the contextual
concavity model leads to an increased relative preference for the intermediate
alternative (Alt 2). More generally, the concavity parameter implies diminish-
ing marginal sensitivity to gains, thus benefiting the in-between alternative
with its moderate gains on the attributes, compared to the extreme
alternatives.
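The Table 21.3 computation can be verified directly from Equation (21.12), using the part-utility values assumed in the table:

```python
# Reproducing the Table 21.3 illustration of the contextual concavity model
# (Equation (21.12)): part-utilities are measured as gains over the lowest
# part-utility on each attribute and raised to a concavity parameter c_k.

def context_concave_values(part_utils, c):
    """part_utils[k][j]: part-utility of attribute k for alternative j;
    c[k]: concavity parameter for attribute k."""
    n_alts = len(part_utils[0])
    V = [0.0] * n_alts
    for k, row in enumerate(part_utils):
        ref = min(row)                        # reference: lowest part-utility
        for j in range(n_alts):
            V[j] += (row[j] - ref) ** c[k]    # concave gain over the reference
    return V

# Part-utility values assumed in Table 21.3, with c1 = 0.5 and c2 = 0.3;
# the intermediate alternative (Alt 2) obtains the highest value.
V = context_concave_values([[5.4, 10.2, 20.3], [30.3, 23.7, 15.2]], [0.5, 0.3])
```
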
Tversky and Simonson (1993) have also proposed a componential context
model, also called a relative advantage model (RAM) by Kivetz et al. (2004).
This model is shown in Equation (21.13). The Nlogit commands for the RAM
model are given in Appendix 21A, using data described in Section 21.7:
V_j = \sum_k v_k(X_{jk}) + \theta \sum_{i \in S} R(j, i). \qquad (21.13)
R(j,i) denotes the relative advantage of alternative j over alternative i, and θ is the
weight given to the relative advantage component of the model. The parameter
θ can be taken as an indication of the strength of the choice context in
determining preferences. Using Tversky and Simonson’s notation, R(j,i) can
be defined as follows: first, for a pair of alternatives (j, i), consider the advantage
of j over i with respect to an attribute k, denoted in Equation (21.14) by
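The definition that follows falls on a page boundary and is lost in this extract. In Tversky and Simonson's (1993) componential context model, the advantage and relative advantage take approximately the form below, reconstructed from that paper as an assumption rather than copied from the book:

```latex
% Advantage of alternative j over i on attribute k (cf. Equation (21.14)):
A_k(j,i) = \max\{\, v_k(X_{jk}) - v_k(X_{ik}),\; 0 \,\},
\qquad D_k(j,i) = A_k(i,j),
% so that the relative advantage of j over i is
R(j,i) = \frac{\sum_k A_k(j,i)}{\sum_k A_k(j,i) + \sum_k D_k(j,i)}.
```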
10). An example of a non-alignable choice set is the choice among multiple car
models, with say one alternative having a high quality car stereo with rear seat
DVD entertainment (but no sun roof), and another alternative having the sun
roof, but no rear seat entertainment. Hence, the trade-off across attributes is
discrete, such that by choosing one alternative the desirable features of
another may have to be given up completely. In cases of non-alignable choices,
Gourville and Soman found that respondents displayed an increased tendency
to choose either of the extreme alternatives (i.e., a low price, basic model or a high
price, fully loaded model) when the size of the choice set is increased.
Consumers are posited to increasingly rely on an all-or-nothing strategy,
choosing the basic low priced alternative or the high priced, fully loaded
alternative.
Gourville and Soman do not reject extremeness aversion outright, but
qualify that such aversion occurs when the attributes are alignable, i.e.,
when attributes can be traded off incrementally. For example, a choice invol-
ving a low priced, low processing speed computer model and a medium
priced, medium processing speed model is alignable and the introduction of
an extreme high priced, high processing speed option causes the market share
of the intermediate option to go up. Gourville and Soman suggest that more
research needs to be done to investigate the impact of hybrid alignable/non-
alignable attributes in the choice set, which arguably characterizes most real
world decision making. Although an interesting heuristic, the context in
which many CEs are designed seems to preclude the wider applicability of
extremeness seeking. Rather, the value of this discussion about alignable and
non-alignable attributes serves to emphasize the possibility that in most
applications where the attributes that determine choice are alignable, extre-
meness aversion will prevail.
Indeed, the discrete choice literature has begun to amass evidence support-
ing the Simonson and Tversky hypothesis that the past matters. In responding
to sequences of choice tasks, there are signs that indicate reference point
revision (DeShazo 2002). Hensher and Collins (2011) find that if a non-
reference (i.e., non-status quo) alternative is chosen in the preceding choice
set n−1 the reference in the current choice set n is revised and its utility
increases. This suggests a shift in the value function around a new reference
point.
“Ordering anomalies,” where choice is biased by the sequence of attribute
values observed in the preceding choice set(s), are also not uncommon (Day
and Prades 2010). For example, if a price attribute of one alternative is seen to
increase from one choice set to another, the proportion of respondents
choosing that alternative in the second choice set is smaller than if the second
choice set was the first choice set, i.e., if there was no preceding choice set for
comparison. A proposed explanation is a “good deal/bad deal” heuristic
(Bateman et al. 2008) or a trade-off contrast (Simonson and Tversky 1992),
whereby current preferences are revised on the basis of previous price or cost
attributes. This finding may be viewed as a specific example of a more general
phenomenon of preference reversal (Tversky et al. 1990).
Strategic misrepresentation has also been invoked as one justification for
incorporating previous choices as a reference point in the current choice set.
The argument from a public goods provision context is that people aim to
increase the likelihood of their most preferred alternative being implemented
by deliberately withholding the truth about their preferences in the current
choice task if chosen alternatives in previous choice tasks have better attribute
values than those in the current choice task. Strategic misrepresentation
assumes that the respondents have stable and well formed preferences, but
that a discrepancy exists between stated preferences (SP) and underlying true
preferences. A weaker version of strategic misrepresentation allows respon-
dents to consider the likelihood that the good would not be provided if they
did not reveal their true preferences, and hence to only reject truth-telling
probabilistically (McNair et al. 2010).
The key to modeling strategic misrepresentation is to assume that the status
quo option is not only chosen when it is preferred to the other alternatives, but
also chosen when a previously chosen alternative is preferred to the alter-
natives in the current choice task. When the latter happens, the attributes of
the status quo are replaced by the attributes of this alternative, once such an
alternative has been chosen in a preceding choice set.
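The replacement rule just described lends itself to a simple sketch. The following Python fragment is illustrative only (the function name and data layout are ours, not the authors'): once a non-status-quo alternative has been chosen in a preceding task, its attributes stand in for the status quo in later tasks.

```python
def effective_status_quo(sq_attrs, history):
    """Return the attributes acting as the status quo in the current task.

    Under the strategic-misrepresentation rule, once a non-status-quo
    alternative has been chosen in a preceding choice set, its attributes
    replace those of the designed status quo in subsequent tasks.

    sq_attrs: dict of attribute -> level for the designed status quo.
    history: list of (chosen_was_sq, chosen_attrs) tuples, in task order.
    """
    current = dict(sq_attrs)
    for chosen_was_sq, chosen_attrs in history:
        if not chosen_was_sq:
            # A previously chosen non-SQ alternative becomes the reference.
            current = dict(chosen_attrs)
    return current
```

For example, with a designed status quo of cost 5.0, choosing an alternative with cost 4.0 in task 1 makes cost 4.0 the effective status quo in task 2, and re-choosing the status quo afterwards leaves that revised reference in place.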
where meta-utility is dependent on all past (static) utilities, V_{j,t−s}, each of which
depends only on the attributes in period t − s. The link between current
utility and historical observed utilities is achieved through a path dependence
parameter α_{js}, 0 ≤ α_{js} ≤ 1, α_{j0} = 1, where α_{js} might also be interpreted as the
weight associated with previous period s. Taking logs to obtain a linear
additive form and adding past and contemporaneous error terms results in
Equation (21.18):
ln(V̂_{jt}) = Σ_{s=0}^{t} V_{j,t−s} + Σ_{s=0}^{t} ln(α_{js}) + Σ_{s=0}^{t} e_{j,t−s}.   (21.18)
As the first RHS term of Equation (21.18) contains all past attribute levels, this
equation can also be seen to link “current utility to historical observed
attribute levels in a fashion that is consistent with learning about attributes
or updating” (Swait et al. 2004, 98). Attribute levels in previous periods
are combined with current attribute levels in a form of temporal averaging.
State dependence can be modeled using a dummy variable that equals 1 for
alternative j in choice set t if the same alternative had been chosen in choice set
t – 1. The variance structure of the disturbance term is allowed to vary over
time, providing a form of temporal heteroskedasticity. One final observation
to make is that in the repeated CE context, this model provides another way of
investigating the role of the value learning heuristic.
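Equation (21.18) can be sketched numerically. The fragment below is a minimal illustration under our own naming, assuming the path dependence weights α_{js} are supplied directly; it simply evaluates the three sums over s = 0, …, t.

```python
import math

def meta_utility(past_static_V, alphas, errors):
    """ln(V-hat_jt) per Equation (21.18): the sum of past static utilities,
    log path-dependence weights, and error terms over s = 0..t.

    past_static_V[s] is V_{j,t-s}; alphas[s] is the weight alpha_{js}
    (alpha_{j0} = 1, 0 < alpha_{js} <= 1); errors[s] is e_{j,t-s}.
    """
    assert abs(alphas[0] - 1.0) < 1e-12, "alpha_{j0} = 1 by definition"
    return (sum(past_static_V)
            + sum(math.log(a) for a in alphas)
            + sum(errors))
```

With one period of history and α_{j1} = 0.5, the weight enters as ln(0.5), down-weighting the contribution of the past relative to the present.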
An alternative specification proposed by Cantillo et al. (2006) assumes a just
noticeable difference heuristic in linking the attribute levels in the preceding
choice set to the attributes in the current choice set. A change in the attribute
level from choice task n−1 to choice task n is assumed to be perceptible only if
|ΔX_{k,n}| = |X_{k,n} − X_{k,n−1}| ≥ δ_k, for non-negative threshold values δ_k of attribute
k. Like several of the threshold formulations described earlier, thresholds can be
assumed to be individual-specific, randomly distributed across the population
and may also depend on socio-demographic characteristics.
Cantillo et al. assume that respondents only perceive the part of the
attribute level change that is bigger than the threshold, as in Equation (21.19):
U_{jqn} = V_{jqn} + e_{jqn} = Σ_{k=1}^{m} β_{jkq} X_{jkqn} + Σ_{k=m+1}^{K} β_{jkq} X_{jkqn} + e_{jqn}
        = Σ_{k=1}^{m} β_{jkq} [X_{jkq,n−1} + ΔX_{jkqn}(1 − δ_{kq}/|ΔX_{jkqn}|) I_{jkq}] + Σ_{k=m+1}^{K} β_{jkq} X_{jkqn} + e_{jqn},   (21.20)

where I_{jkq} = 1 if |ΔX_{jkqn}| ≥ δ_{kq}, and 0 otherwise.
To complete the model, a joint density function needs to be assumed for δk.
Cantillo et al. assume that all δk are independently distributed over a trian-
gular distribution. The value for m is determined exogenously, by allowing the
perception of one attribute at a time to be threshold constrained.
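The perceived-change rule embedded in Equation (21.20) can be illustrated as follows; the function is our reading of the rule, not code from Cantillo et al.: only the part of an attribute change that exceeds the threshold δ is perceived.

```python
def perceived_level(x_prev, x_curr, delta):
    """Perceived level of a threshold-constrained attribute under the just
    noticeable difference rule of Equation (21.20).

    Only the part of the change exceeding the non-negative threshold delta
    is perceived; a sub-threshold change leaves the previous level in place.
    """
    dx = x_curr - x_prev
    if dx == 0 or abs(dx) < delta:
        return x_prev
    # Perceive the change net of the threshold: delta_X * (1 - delta/|delta_X|).
    return x_prev + dx * (1.0 - delta / abs(dx))
```

For instance, a travel time rising from 10 to 14 minutes against a threshold of 3 is perceived as 11 minutes, while a rise from 10 to 12 is not perceived at all.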
Cantillo et al.’s just noticeable difference heuristic provides one way of
allowing respondents to “change” the attribute values presented to them in a
particular choice task, thereby relaxing the assumption in most choice models
that respondents take the attribute levels as given. In applications where
variability matters – for example, in transport where both travel times and
variability of travel times are important determinants of choice (see Hensher
and Li 2012) – the travel time attribute may itself be changed or edited by the
respondent, with the magnitude of the edit possibly depending on the varia-
bility attribute and any associated threshold.
The provision of an alternative may itself be a subject of uncertainty.
McNair et al. (2011) have attempted to model this aspect of the choice data
by assigning probability weights to each of the alternatives in the choice set.
These weights are determined on the basis that respondents expect a higher
cost alternative to be more readily provided. Moreover, a history of accept-
ing alternatives with the higher cost in previous choice sets also improves
the probability that the alternatives in the current choice set will be
provided.
Metric 2: correlation between attribute rank (AR) and the number of boxes opened
for each attribute (NBOX)
EBA and LEX strategies imply an attribute-wise search, with a selective
amount of information processed for each attribute. These strategies also
imply the elimination of alternatives prior to arriving at a final choice.
In this metric, define a box rank to be the nth box that is opened by the
respondent, hence, the first box that is opened gets a box rank of 1 and so on.
The attribute rank (AR) is defined as the mean of all box ranks for that
attribute. The earlier the attribute is considered during the decision process,
the lower its AR will be. NBOX is the number of boxes that were opened for
each attribute.
A typical EBA simulation is illustrated in Figure 21.2. Here a “+” sign
indicates that the attribute value exceeds the threshold value, while a “–” sign
indicates that the attribute value falls short of the threshold and thereby
eliminates that particular alternative from further consideration. In this
example, the (AR, NBOX) data pairs form the following sequence: (3,5),
(7,3), (9.5,2).
It will be noted that the EBA and LEX strategies imply a negative correlation
between AR and NBOX. In contrast, compensatory strategies like EQW and
WADD should give a zero correlation, as a consistent set of information is
processed per alternative, resulting in NBOX for each attribute being a con-
stant factor across all attributes.
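Metric 2 can be computed mechanically from a box-opening log. The sketch below uses our own helper names; applied to the Figure 21.2 sequence it reproduces the (AR, NBOX) pairs (3, 5), (7, 3), (9.5, 2) and a strongly negative correlation, as an EBA-style attribute-wise search implies.

```python
def ar_nbox(open_sequence):
    """Compute (AR, NBOX) per attribute from a box-opening sequence.

    open_sequence lists attribute labels in the order boxes were opened;
    box ranks start at 1. AR is the mean box rank for each attribute and
    NBOX the number of boxes opened for it.
    """
    ranks = {}
    for rank, attr in enumerate(open_sequence, start=1):
        ranks.setdefault(attr, []).append(rank)
    return {a: (sum(r) / len(r), len(r)) for a, r in ranks.items()}

def pearson(xs, ys):
    """Plain Pearson correlation, used for the AR-NBOX metric."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A compensatory (EQW/WADD) log, by contrast, opens the same number of boxes per attribute, making NBOX constant and the correlation zero (or undefined).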
One of the main objectives of CEs and the SP technique is to find “a robust and
reliable method for valuing the non-market impacts of public policies on
Table 21.4 Summary of candidate heuristics and example model forms testable on existing data sets
7 Assumptions will have to be made about latent constructs, for example, thresholds and reference points.
A significant portion of the literature has observed that model outputs such
as welfare estimates and WTP are substantially different when standard
assumptions are relaxed and more behaviorally plausible assumptions are put
in their place. It may therefore be worthwhile revisiting some of our existing
data sets to see how our results and conclusions would change if we were to
embed heuristics into our choice models.
of various travel times, in locations where toll roads currently exist.8 To ensure
some variety in trip length, an individual was assigned to one of the three trip
length segments based on a recent commuting trip: no more than 30 minutes,
31 to 60 minutes, and more than 61 minutes (capped at 2 hours). A telephone
call was used to establish eligible participants from households stratified
geographically, and a time and location agreed for a face-to-face Computer
Assisted Personal Interview (CAPI).
A statistically efficient design (see Rose and Bliemer 2008; Sándor and
Wedel 2002), pivoted around the knowledge base of travellers, is used
to establish the attribute packages in each choice scenario, in recognition of
supporting theories in behavioral and cognitive psychology and economics,
such as prospect theory. A pivot design recognizes the useful information
contained in an RP alternative, capturing the accumulated exposure to the
studied context. Further details of the design of the CE and merits of pivot or
referenced designs are provided in Hensher and Layton (2010) and Hensher
(2008).
The two SC alternatives are unlabeled routes. The trip attributes associated
with each route are free-flow time, slowed-down time, trip time variability,
running cost, and toll cost. All attributes of the SC alternatives are based on
the values of the current trip. Variability in travel time for the current
alternative was calculated as the difference between the longest and shortest
trip time provided in non-SC questions. The SC alternative values for this
attribute are variations around the total trip time. For all other attributes, the
values for the SC alternatives are variations around the values for the current
trip. The variations used for each attribute are given in Table 21.5.
The experimental design has one version of 16 choice sets (games). The
design has no dominance, given the assumption that less of each attribute is
better.9 The distinction between free-flow and slowed-down time is designed
to promote the differences in the quality of travel time between various
routes – especially a tolled route and a non-tolled route – and is separate from
the influence of total time. Free-flow time is interpreted with reference to a trip
at 3 a.m., when there are no delays due to traffic.10 An example
of an SC screen is shown in Figure 21.3.
8 Sydney has a growing number of operating toll roads; hence drivers have had a lot of exposure to paying tolls.
9 The survey designs are available from the author on request.
10 This distinction does not imply that there is a specific minute of a trip that is free flow per se, but it does tell respondents that a certain amount of the total time is slowed down due to traffic, etc., and hence the balance is not slowed down (i.e., it is free flow, as one typically observes at 3 a.m.).
Practice Games
Make your choice given the route features presented in this table, thank you.

                                           Details of Your
                                           Recent Trip      Road A    Road B
Time in free-flow traffic (mins)                 50            25        40
Time slowed down by other traffic (mins)         10            12        12
Travel time variability (mins)                +/– 10        +/– 12     +/– 9
Running costs                                 $ 3.00        $ 4.20    $ 1.50
Toll costs                                    $ 0.00        $ 4.80    $ 5.60

How would you PRIMARILY spend the time that you have saved travelling?
versus EBA, we allow for a mix of regimes across a sample. In contrast, most
studies impose the same rule on the entire estimation sample, and contrast
two regimes as two separate models. That is, they assume consistency in
attribute strategy within the same decision context for the entire sample.
Consider a utility function for alternative i defined in terms of two attri-
butes labeled x1 and x2 (these might be free-flow time and congestion time,
both in common units) and other attributes such as running cost and toll cost
x3 and x4:
U_i = f(x_{i1}, x_{i2}, x_{i3}, x_{i4}) + e_i,   (21.21)

where

f(x_{i1}, x_{i2}, x_{i3}, x_{i4}) = β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i3} + β_4 x_{i4}   if (x_{i1} − x_{i2})² > α
                                   β_{12}(x_{i1} + x_{i2}) + β_3 x_{i3} + β_4 x_{i4}   if (x_{i1} − x_{i2})² < α.   (21.22)
β1, β2, β3, β4, and β12 are parameters and α is an unknown threshold. The term
(xi1−xi2)2 represents the square of the distance between xi1 and xi2. A squared
form supports efficient computation, but another form could be used.
Intuitively, when the difference between the common metric attributes xi1
and xi2 is great enough, the agent’s process preserves attribute partitioning,
and thus treats each attribute as separate entities and evaluates their contribu-
tion to utility in the standard random utility model (RUM) manner with
parameters β1 and β2. On the other hand, when the difference between the
common metric attributes xi1 and xi2 is relatively small, the agent’s process
aggregates the attributes and thus treats the sum of xi1 and xi2 as a single
attribute with utility weight β12.
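The switching rule in Equation (21.22) is straightforward to evaluate. The sketch below is illustrative (the parameter values used in the usage note are arbitrary, not estimates from the text): it preserves the common-metric attributes when their squared distance exceeds α, and aggregates them otherwise.

```python
def utility_components(x, betas, beta12, alpha):
    """Deterministic utility under the attribute-aggregation heuristic of
    Equation (21.22). x = (x1, x2, x3, x4); betas = (b1, b2, b3, b4).

    When the squared distance between the common-metric attributes x1 and
    x2 exceeds the threshold alpha, they enter separately; otherwise their
    sum enters once with weight beta12.
    """
    x1, x2, x3, x4 = x
    b1, b2, b3, b4 = betas
    rest = b3 * x3 + b4 * x4
    if (x1 - x2) ** 2 > alpha:
        return b1 * x1 + b2 * x2 + rest   # attributes preserved
    return beta12 * (x1 + x2) + rest      # attributes aggregated
```

With α = 4, times of (10, 2) are partitioned (squared distance 64), while times of (3, 2) are aggregated (squared distance 1) and weighted by β12.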
We enrich the model by allowing αn for person n to be randomly distrib-
uted (with αn > 0). A useful candidate distribution is that αn is exponential
with mean 1/λ and density g(α) = λe−λα. This density allows for some fraction
of the population to behave essentially as standard utility maximizers at the
level of a very specific alternative (where for repeated observations, we impose
the condition of independence in respect of the way the heuristic operates).
Still others behave as standard utility maximizers when attributes are dissim-
ilar, but aggregate when attributes are similar. Importantly, this density also
allows for a tail of others who more frequently are aggregating the two
attributes. The probability conditions are given in Equation (21.23). In this
model, we assume that there is an exponentially distributed threshold
parameter, IID across alternatives and respondents, that indicates how the
respondent views the attribute components.11 This non-linear utility function
permits a probabilistic preservation or aggregation of each attribute:
P_i[(x_{i1} − x_{i2})² > α] = 1 − exp(−λ(x_{i1} − x_{i2})²).   (21.23)
Equation (21.24), together with the equivalent treatment of xi3 and xi4, implies
that:
U_i = (β_1 x_{i1} + β_2 x_{i2})[1 − exp(−λ_1(x_{i1} − x_{i2})²)] + β_{12}(x_{i1} + x_{i2}) exp(−λ_1(x_{i1} − x_{i2})²)
    + (β_3 x_{i3} + β_4 x_{i4})[1 − exp(−λ_2(x_{i3} − x_{i4})²)] + β_{34}(x_{i3} + x_{i4}) exp(−λ_2(x_{i3} − x_{i4})²) + e_i.   (21.25)
Equation (21.25) is a non-linear form in x_{i1}, x_{i2}, x_{i3}, x_{i4}. As λ_q, q = 1, 2, tends
toward ∞, the threshold distribution becomes degenerate at zero. In this case, all
individuals are always standard utility maximizers who partition the
common-metric attributes, and we obtain the linear additive form (21.26):
11 One can allow for the α_n s to be constant across alternatives for a given respondent. We discuss the formulation and report results for such a model later. At this juncture, we find it clearest to present the model in terms of uncorrelated α_n s.
12 As an example, imagine an experimental design with x_1 and x_2 being dummy variables, and the only combinations considered are (1,0) and (0,1). In both cases (x_1 − x_2)² = 1, and so we have: U = (β_1 x_1 + β_2 x_2)(1 − e^{−λ}) + β_{12}(x_1 + x_2) e^{−λ} + e. If x_1 = 1 and x_2 = 0, we have condition (a): U = β_1 x_1 (1 − e^{−λ}) + β_{12} x_1 e^{−λ} + e, equivalent to (b): U = β_1 x_1 + (β_{12} − β_1) x_1 e^{−λ} + e = {β_1 + (β_{12} − β_1) e^{−λ}} x_1 + e. The same functional expression applies for x_2. In both cases we have a co-mingling of parameters. If we include the combinations (1,1) and (0,0), then we have condition (c): U = β_{12}(x_1 + x_2) + e.
To compute Equation (21.28) one must utilize the sequence of choices and
integrate the resulting panel choice probability over the density of the thresh-
old parameter, αn.
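The probability weighting of Equations (21.23) and (21.25) can be checked numerically. The fragment below is a sketch under our own naming; note that as λ grows large the partitioned (standard RUM) form is recovered, and as λ → 0 the aggregated form dominates.

```python
import math

def prob_preserve(x1, x2, lam):
    """P[(x1 - x2)^2 > alpha] under an exponential threshold with rate lam,
    as in Equation (21.23)."""
    return 1.0 - math.exp(-lam * (x1 - x2) ** 2)

def expected_utility(x, betas, b12, b34, lam1, lam2):
    """Deterministic part of Equation (21.25): a probability-weighted mix of
    the partitioned and aggregated specifications for the time pair (x1, x2)
    and the cost pair (x3, x4)."""
    x1, x2, x3, x4 = x
    b1, b2, b3, b4 = betas
    p_t = prob_preserve(x1, x2, lam1)   # chance the time attributes stay separate
    p_c = prob_preserve(x3, x4, lam2)   # chance the cost attributes stay separate
    return ((b1 * x1 + b2 * x2) * p_t + b12 * (x1 + x2) * (1.0 - p_t)
            + (b3 * x3 + b4 * x4) * p_c + b34 * (x3 + x4) * (1.0 - p_c))
```

At λ = 0 every respondent aggregates (the β12 and β34 terms carry all the weight); at large λ the expression collapses to the familiar linear additive utility.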
We can derive the relevant WTP for travel time savings for free-flow
and slowed-down time, and a weighted average total time, and contrast it
with the results from the traditional linear models. The WTP function is
now highly non-linear. The derivative of the utility expression with respect to
a specific attribute is given in Equation (21.29), using free-flow time (defined
as x1) and in Equation (21.30) using slowed-down time, (x2), as examples of
the common form, suppressing the subscript for an alternative. The difference
is in the specific parameters and the sign change before the number “2.”
Exactly the same functional form for Equations (21.29) and (21.30) applies
to running cost and toll cost, respectively. The WTP for free-flow time, for
example, defined in terms of running cost would be:
(∂U/∂x_1)/(∂U/∂x_3).

∂U/∂x_1 = β_1[1 − exp(−λ(x_1 − x_2)²)] + 2(β_1 x_1 + β_2 x_2)λ(x_1 − x_2) exp(−λ(x_1 − x_2)²)
        + β_{12} exp(−λ(x_1 − x_2)²) − 2β_{12}(x_1 + x_2)λ(x_1 − x_2) exp(−λ(x_1 − x_2)²).   (21.29)

∂U/∂x_2 = β_2[1 − exp(−λ(x_1 − x_2)²)] − 2(β_1 x_1 + β_2 x_2)λ(x_1 − x_2) exp(−λ(x_1 − x_2)²)
        + β_{12} exp(−λ(x_1 − x_2)²) + 2β_{12}(x_1 + x_2)λ(x_1 − x_2) exp(−λ(x_1 − x_2)²).   (21.30)
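Equation (21.29) can be verified against a numerical derivative. The sketch below codes the common-metric (time) part of Equation (21.25) and the analytic ∂U/∂x_1; a central finite difference agrees with the closed form. The parameter values in the test are arbitrary illustrations, not estimates from the text.

```python
import math

def U_time(x1, x2, b1, b2, b12, lam):
    """Common-metric part of Equation (21.25) for the two time attributes."""
    e = math.exp(-lam * (x1 - x2) ** 2)
    return (b1 * x1 + b2 * x2) * (1 - e) + b12 * (x1 + x2) * e

def dU_dx1(x1, x2, b1, b2, b12, lam):
    """Analytic derivative of U_time with respect to x1, Equation (21.29)."""
    e = math.exp(-lam * (x1 - x2) ** 2)
    return (b1 * (1 - e)
            + 2 * (b1 * x1 + b2 * x2) * lam * (x1 - x2) * e
            + b12 * e
            - 2 * b12 * (x1 + x2) * lam * (x1 - x2) * e)
```

The WTP ratio then follows by evaluating the analogous derivative with respect to running cost and dividing, as in (∂U/∂x_1)/(∂U/∂x_3).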
dffsdtc,dffsdts1,dffsdts2,drctcc,drctcs1,drctcs2$
Maximise
; Labels = bff,bsdt,brc,btoll,bvar,nonsqasc,?btollasc,
betacc,betatt,acc1,att1
; Start = -.068,-.083,-.306,-.403,-.009,-.156,0,0,0,0
; maxit = 20
; Fcn = uc =
(brc*rccu + btoll*tccu)*(1-exp(-acc1*drctcc))
+ betacc*sumtcc*exp(-acc1*drctcc)
+ (bff*ffcu + bsdt*sdcu)*(1-exp(-att1*dffsdtc))
+betatt*sumttc*exp(-att1*dffsdtc) + bvar*varcu|
vc = exp(uc) |
us1 = nonsqasc +
(brc*rcs1 + btoll*tcs1)*(1-exp(-acc1*drctcs1))
+betacc*sumtcs1*exp(-acc1*drctcs1)
+ (bff*ffs1 + bsdt*sds1)*(1-exp(-att1*dffsdts1))
+betatt*sumtts1*exp(-att1*dffsdts1)
+ bvar*varcu|
vs1 = exp(us1) |
us2 = nonsqasc + (brc*rcs2 + btoll*tcs2)*(1-exp(-acc1*drctcs2))
+betacc*sumtcs2*exp(-acc1*drctcs2)
+ (bff*ffs2 + bsdt*sds2)*(1-exp(-att1*dffsdts2))
+betatt*sumtts2*exp(-att1*dffsdts2)
+ bvar*varcu|
vs2 = exp(us2) |
IV = vc+vs1+vs2|
P = (dcu*vc + ds1*vs1 + ds2*vs2)/IV |
log(P) $
Maximum iterations reached. Exit iterations with status=1.
+---------------------------------------------------------------+
| User Defined Optimization |
| Maximum Likelihood Estimates |
| Dependent variable Function |
| Weighting variable None |
| Number of observations 3568 |
| Iterations completed 21 |
| Log likelihood function 2969.138 |
| Number of parameters 0 |
| Info. Criterion: AIC = -1.66431 |
| Finite Sample: AIC = -1.66431 |
| Info. Criterion: BIC = -1.66431 |
| Info. Criterion:HQIC = -1.66431 |
| Restricted log likelihood .0000000 |
| Chi squared 5938.276 |
| Degrees of freedom 10 |
| Prob[ChiSqd > value] = .0000000 |
| Model estimated: May 12, 2008, 09:38:46AM |
+---------------------------------------------------------------+
+-----------+---------------------+----------------------+-----------+-----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|
+-----------+---------------------+----------------------+-----------+-----------+
|BFF | .29530 .93089447 .317 .7511 |
|BSDT | .27320 .83889932 .326 .7447 |
|BRC | -.02064 .01463281 -1.411 .1583 |
|BTOLL | -.36827*** .01514496 -24.317 .0000 |
|BVAR | -.00900 .122518D+09 .000 1.0000 |
|NONSQASC| -.61223*** .06787733 -9.020 .0000 |
|BETACC | -.21771*** .02015601 -10.801 .0000 |
|BETATT | -.06792*** .00292412 -23.227 .0000 |
|ACC1 | 20.8150*** 3.27880870 6.348 .0000 |
|ATT1 | .00021 .00067015 .317 .7510 |
+-----------+---------------------------------------------------------------------+
The class assignment is unknown. Let Hiq denote the prior probability for class
q for individual i. A convenient form is the MNL (Equation 21.32):
H_{iq} = exp(z_i′θ_q) / Σ_{q=1}^{Q} exp(z_i′θ_q),   q = 1, …, Q;  θ_Q = 0.   (21.32)
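Equation (21.32) is an ordinary MNL (softmax) over class-membership covariates with the last class normalized to zero. A minimal sketch, with illustrative inputs:

```python
import math

def class_probs(z, thetas):
    """Prior class-membership probabilities per Equation (21.32).

    z: covariate vector for individual i; thetas: list of Q coefficient
    vectors, with the last class normalized to zero (theta_Q = 0).
    """
    assert all(t == 0.0 for t in thetas[-1]), "theta_Q must be normalized to 0"
    scores = [math.exp(sum(zk * tk for zk, tk in zip(z, t))) for t in thetas]
    total = sum(scores)
    return [s / total for s in scores]
```

The probabilities sum to one by construction, and with all coefficients at zero every class receives equal prior weight.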
[Table column headings: Process rule | VTTS: free-flow time | VTTS: slowed-down time | VTTS: weighted average time | Reference]
trend here that, if reinforced by other data sets, sends a warning about the
under-estimation of VTTS when processing heuristics are not accounted for.
The extent of under-estimation appears significant; for the overall weighted
average travel time it ranges from a high of 34.7 percent for the full set of
process rules in the LCM to a low of 7.3 percent for attribute aggregation for
both time and cost.14
We take a closer look at the findings from the LCM, summarized in
Table 21.7. There is a range of mean estimates of VTTS across the latent
classes. The range is $1.35 to $42.19, after dividing the marginal disutility of
each time component by the weighted average cost parameter, where the
weights are the levels of running and toll cost. To obtain an overall sample
13 In order to estimate the model as a panel, Layton and Hensher (2010) used a combination of many start values and simulated annealing (code written by E. G. Tsionas, 9 April 1995, available at the American University Gauss Archive: www.american.edu/academic.depts/cas/econ/gaussres/GAUSSIDX.HTM). Using the maximum from the simulated annealing approach, we then computed one Newton–Raphson iteration using 500 replications of the simulator, and computed the covariance from all terms except for λ_t and λ_c.
14 It is worth noting that the attribute aggregation model (Process I) allowed for aggregation of both the time and the cost components. By contrast, the LCM (Process IV) only found time aggregation statistically significant, but did identify a significant effect from the heuristic that transferred the toll cost parameter to the running cost attribute. What this latter evidence suggests is that individuals do not tend to add up the cost components, but tend to re-weight their influence by the parameter transfer rule.
[Table 21.7 column headings: NAT = not attended to; ParT = parameter transfer | Class membership probability | Free-flow time | Slowed-down time | Total time]
Ignored attributes
1. Please indicate which of the following attributes you ignored when considering the choices you made in the 10 games.
15 This question was asked after completion of all 16 choice tasks. An alternative approach is to ask these questions after each choice task, as was the case in Puckett and Hensher (2009) and Scarpa et al. (2010). Our preference is for choice task-specific self-stated processing questions, especially where the attribute level matters; however, this comes at the risk of cognitive burden and the possibility that the number of choice tasks might have to be reduced. We also recognize the potential limitation of such questions, and the need to investigate question structure and the believability/plausibility of the evidence.
and (iv) all four attributes preserved as separate components. One LCM
defined four class memberships as per (i)–(iv) above without recourse to
information from the supplementary questions, whereas another LCM con-
ditioned class membership on conditions (i)–(iv). A base LCM assumed that
all attributes were treated separately but three classes were identified with
statistically significant latent class probabilities. The findings are summarized
in Table 21.8. MLs and LCMs are well documented in the literature.
16 The spread is the standard deviation times √6.
The overall goodness of fit for the models with allowance for self-stated
attribute processing strategy (APS) is statistically better than when self-stated
APS is not accounted for. The ML models differ in the way that the
time and cost attributes are included in the utility expressions, but in both
models all parameters have the expected negative signs and are statistically
significant at the 1 percent level. Given the different ways that free-flow and
slowed-down time are handled, the most sensible representation of VTTS is as
a weighted average estimate, with weights associated with the contribution of
each of the three specifications of cost and of time. The VTTS in Table 21.7
(p. 981) are based on conditional distributions (that is, conditional on the
alternative chosen). The VTTS in the ML model is significantly higher when
the self-stated APS is accounted for, i.e., $20.12 ($22.63 with error components)
per person hr. compared to $15.87 ($16.11 with error components) per person hr.
The LCM is based on four attribute addition rules (i)–(iv), and all time and
cost parameters are statistically significant at the 1 percent level and of the
expected sign when class membership is conditioned on the self-stated APS;
however, when the self-stated APS are not included, all but one parameter is
statistically significant at the 1 percent level, the exception being running cost
in the second latent class, which has a 10 percent significance level. The overall
log-likelihood (LL) at convergence is greatly improved over the ML model for
both LCMs, suggesting that the discrete nature of heterogeneity captured
through latent class is a statistical improvement over the continuous repre-
sentation of heterogeneity in the ML model. The weighted average VTTS are
derived first across classes for each attribute, based on conditional distribu-
tions associated with the probability of class membership of each respondent
within each class, and then a further weighting is undertaken using weights
that reflect the magnitudes of the components of time and cost.
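The two-stage weighting just described can be sketched numerically. A minimal Python illustration, with hypothetical (not estimated) time and cost parameters and class probabilities:

```python
# Sketch: probability-weighted mean VTTS across latent classes.
# beta_time in utils/min., beta_cost in utils/$; VTTS = 60 * (time/cost) $/hr.
# All numbers below are hypothetical, for illustration only.

def weighted_vtts(beta_time, beta_cost, class_probs):
    """Mean VTTS ($/person hr), weighting each class by its probability."""
    vtts_by_class = [60.0 * bt / bc for bt, bc in zip(beta_time, beta_cost)]
    return sum(p * v for p, v in zip(class_probs, vtts_by_class))

mean_vtts = weighted_vtts(beta_time=[-0.05, -0.08, -0.03],
                          beta_cost=[-0.20, -0.25, -0.15],
                          class_probs=[0.5, 0.3, 0.2])
# mean_vtts = 0.5*15 + 0.3*19.2 + 0.2*12 = 15.66
```

A further weighting by the magnitudes of the time and cost components, as in the text, would apply the same probability-weighted averaging over attribute components.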
The weighted average VTTS in the two LCMs that account for AP are
virtually identical. What this suggests is that once we have captured the
alternative processing rules through the definition of latent classes, the
inclusion of the self-stated APS rules as conditions on class membership does
not contribute additional statistically useful evidence to revise the findings in
the aggregate. This is consistent with the statistical non-significance of most of
the self-stated APS variables, with only three parameters (excluding the
constants) significant at the 10 percent level and none at the 1 or 5 percent
level. However, when we contrast this evidence to
the base LCM that makes no allowance for AP, the mean VTTS is only slightly
lower (i.e., $17.89 per person hr. compared to $18.02, and $14.07 for the MNL
model). What this suggests is that the latent class specification may have
done a good job in approximating the way in which attributes are processed.
These findings support the hypothesis that allowance for AP rules tends to
result, on average, in a higher mean estimate of WTP for travel time savings.
This is consistent directionally with other studies undertaken by Rose et al.
(2005) and Hensher and Layton (2010).
21.8 Case study II: the influence of choice response certainty, alternative
acceptability, and attribute thresholds
17
An interesting way of including response certainty into a model is to create a relative measure around a
reference alternative, where the latter has been chosen in a real market and hence its certainty value is 10
on the 1–10 scale. Deviations from 10 may be more informative than the actual certainty scale value.
18
This is not strictly scale heterogeneity – see the following paragraphs – although it appears like
deterministic scale as a function only of covariates. In contrast, scale heterogeneity as represented in
SMNL is a stochastic treatment which may be partially decomposed via the deterministic addition of
covariates.
measurement error, and also may be revised depending on the levels offered
by other attributes. That is, there is “softness” (in the language of Swait 2001)
in the binding nature of perceived threshold levels reported by the qth
individual. To capture the notion of threshold, we define a lower cut-off and
an upper cut-off. Accounting for attribute thresholds is equivalent to introdu-
cing functions that are incremental effects on the linear attribute effect
throughout an attribute's entire range, and are activated only if the corre-
sponding cut-off is in use. These cut-off penalties are a linear function of the
amount of constraint violation and are defined as: $\{0 : \max(0,\, X_{ljq} - X_{lq}^{\min})\}$,
the lower cut-off effect, the deviation of the attribute level from the minimum
cut-off attribute threshold where that cut-off exists, and zero otherwise (if the
cut-off does not exist); and $\{0 : \max(0,\, X_{mq}^{\max} - X_{mjq})\}$, the upper cut-off
effect, the deviation of the attribute level from the maximum cut-off attribute
threshold where that cut-off exists, and zero otherwise (if the cut-off does not
exist). Defining $X_{kjq}$ as the kth attribute associated with the jth
alternative and qth individual, with l = K+1,. . .,L attribute lower cut-offs; m =
L+1,. . .,M attribute upper cut-offs; q =1,. . .,Q respondents, and βl and βm
estimated penalty parameters, we write the threshold penalty expression as:
$$\sum_{l=K+1}^{L} \beta_l \left\{0 : \max\left(0,\, X_{ljq} - X_{lq}^{\min}\right)\right\} + \sum_{m=L+1}^{M} \beta_m \left\{0 : \max\left(0,\, X_{mq}^{\max} - X_{mjq}\right)\right\}. \qquad (21.33)$$
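A literal translation of the $\{0 : \max(0,\cdot)\}$ construct, where each term contributes only if the respondent reported the corresponding cut-off, can be sketched as follows (the parameter values are hypothetical, and the min/max ordering simply mirrors Equation (21.33)):

```python
def threshold_penalty(x, x_min=None, x_max=None, beta_l=0.0, beta_m=0.0):
    """Cut-off penalty of Equation (21.33) for a single attribute.

    None for x_min/x_max means the respondent reported no such cut-off,
    so the corresponding term is zero (the '0 :' branch).
    """
    penalty = 0.0
    if x_min is not None:                      # lower cut-off term
        penalty += beta_l * max(0.0, x - x_min)
    if x_max is not None:                      # upper cut-off term
        penalty += beta_m * max(0.0, x_max - x)
    return penalty

# Hypothetical: price level 20 ($000) with a reported lower cut-off of 15
p = threshold_penalty(20.0, x_min=15.0, beta_l=-0.1)   # -0.1 * 5 = -0.5
```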
In the current application, both upper and lower bounds are behaviorally
meaningful. For example, some individuals might only be interested in six-
cylinder cars and would not consider four- or eight-cylinder cars. Likewise, low
prices and very high prices might be rejected for different reasons, with
purchasers often looking within a specific price range given their preferences.
We also define Certcs to represent levels of surety. To allow for the influence
of response certainty, which is choice set (cs)-specific, we assume that the
entire utility function associated with each alternative must be exogenously
weighted by the index of certainty, defined here on a 10-point scale, where 1 is
the lowest level of certainty.
The model form for the utility expression that encapsulates the elements
presented above is given in Equation (21.34)19:
19
An alternative form for the alternative acceptability conditioning is the exponential form: $\exp\!\big(\delta_j\big(AC_{jq} + \sum_{h=1}^{H} \gamma_h R_{hq}\big)\big)$. Empirically, the difference is negligible in terms of predictive power and elasticity outputs.
$$U_{jq} = \left(1 + \delta_j \left(AC_{jq} + \sum_{h=1}^{H} \gamma_h R_{hq}\right)\right)\left[\alpha_j + \sum_{k=1}^{K} \beta_{kj} X_{kjq} + \sum_{l=K+1}^{L} \beta_l \left\{0 : \max\left(0,\, X_{ljq} - X_{lq}^{\min}\right)\right\} + \sum_{m=L+1}^{M} \beta_m \left\{0 : \max\left(0,\, X_{mq}^{\max} - X_{mjq}\right)\right\}\right] + e_j, \qquad (21.34)$$
with the IID, Type I EV distribution assumed for the random terms εjqt.
Conditioned on Vjqt, the choice probabilities take the familiar MNL form in
Equation (21.36):
$$\mathrm{Prob}_{jqt} = \frac{\exp V_{jqt}}{\sum_{j=1}^{J_{qt}} \exp V_{jqt}}. \qquad (21.36)$$
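Equation (21.36) is the standard logit transform; a small, numerically stable sketch:

```python
import math

def mnl_probs(v):
    """MNL choice probabilities of Equation (21.36) from a list of utilities."""
    m = max(v)                                  # subtract max for stability
    expv = [math.exp(u - m) for u in v]
    total = sum(expv)
    return [e / total for e in expv]

probs = mnl_probs([0.4, 0.1, -0.2])             # three alternatives
# probs sum to 1 and preserve the utility ordering
```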
heteroskedasticity. This model is the most general one we will estimate and is
called heteroskedastic Gumbel scale MNL (HG-SMNL). All parameters of the
models are estimated by maximum simulated likelihood. This most general
model is given in Equation (21.42):
$$\mathrm{Prob}_{jqt} = \frac{\exp\!\left[\sigma_q \left(1 + \delta_j \left(AC_{jq} + \sum_{h=1}^{H} \gamma_h R_{hq}\right)\right) V_{jqt}\right]}{\sum_{j=1}^{J_{qt}} \exp\!\left[\sigma_q \left(1 + \delta_j \left(AC_{jq} + \sum_{h=1}^{H} \gamma_h R_{hq}\right)\right) V_{jqt}\right]}. \qquad (21.42)$$
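In the same spirit, a simplified sketch of Equation (21.42) with a single covariate R per alternative and a common delta (sigma_q, delta, and gamma are hypothetical values, not estimates):

```python
import math

def hg_smnl_probs(v, accept, r, delta, gamma, sigma_q):
    """Sketch of Equation (21.42): each utility is multiplied by
    sigma_q * (1 + delta * (AC + gamma * R)) before the logit transform."""
    scaled = [sigma_q * (1.0 + delta * (ac + gamma * ri)) * u
              for u, ac, ri in zip(v, accept, r)]
    m = max(scaled)                             # stabilise the exponentials
    expv = [math.exp(s - m) for s in scaled]
    total = sum(expv)
    return [e / total for e in expv]

# With sigma_q = 1 and delta = 0 this collapses to the plain MNL form
probs_q = hg_smnl_probs([0.4, 0.1, -0.2], accept=[1, 0, 1],
                        r=[0.5, 0.2, 0.1], delta=0.0, gamma=0.3, sigma_q=1.0)
```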
In summary, in the section following the discussion of the CE, we present the
findings for six models. The utility expression that incorporates all
three features is non-linear, a form known as heteroskedastic MNL (Model
5), or heteroskedastic Gumbel scale MNL when scale heterogeneity is also allowed for (Model 6). To establish the
contribution of these features, we begin with the standard MNL (Model 1)
and ML models (Model 3), then move on to choice certainty weighted MNL
(Model 2) and ML (Model 4), and finish with heteroskedastic MNL (HMNL)
(Model 5) and the extension to heteroskedastic Gumbel scale MNL (HG-
SMNL) (Model 6).
Both of the surcharges are determined by the type of fuel a vehicle uses and
the fuel efficiency of that vehicle. For a given vehicle, if it is fueled by petrol,
owners would pay a higher surcharge than if it were fueled by diesel, which is
in turn more expensive than if it were a hybrid. Once the car has been specified
in terms of fuel type and efficiency, there are five levels of surcharge that could
be applied.
The CE is a D-efficient design, where the focus is on the asymptotic
properties of the standard errors of estimates, given the priors of attribute
parameters. Prior parameter estimates obtained from substantive pilot sur-
veys are used to minimize the asymptotic variance-covariance matrix which
leads to lower standard errors and more reliable parameter estimates, for a
given sample size (see Rose and Bliemer 2008 and Chapter 6 for details). The
methodology focusses not only on the design attributes which are expanded
out through treatment repetition, i.e., multiple choice sets, but also on the
non-expanded socio-demographics and other contextual variables that are
The variable emissions surcharge is determined by the type of fuel used by
the alternative and the fuel efficiency of that vehicle. For each fuel type and
fuel efficiency combination, there are five levels of surcharge that apply
(Table 21.9).
An internet-based survey with face-to-face assistance from an interviewer was
programmed. An eligible respondent had to have purchased a new vehicle in
2007, 2008, or 2009. Details of response rates and reasons for non-eligibility
are summarized in Beck et al. (2012). The survey was completed online at a
central location (varied throughout the Sydney metropolitan area to minimize
travel distance for respondents). Respondents provided details of the vehicles
within the household, and details of the most recent (or a potential) purchase.
Eight choice sets are provided (with an example shown in Figure 21.5), with all
Figure 21.5 Example choice scenario screen: respondents are asked to click the attribute label if an attribute is not relevant across all alternatives (or the cell if it is not relevant for a specific alternative), to indicate which vehicles are acceptable (Yes/No), and to rate on a 1–10 scale (Very Unsure to Very Sure) how certain they are that they would actually choose the vehicle they rated number one.
[Screen content: minimum and maximum acceptable levels for purchase price of the vehicle, registration (incl. CTP), fuel cost per 100km, fuel consumption (litres per 100km), engine capacity (cylinders), and seating capacity; selection of acceptable vehicle body types (Hatch, Sedan, Stationwagon, Coupe, Ute, Family Van, 4WD) and manufacturing countries/regions (Japan, Europe, South Korea, Australia, USA).]
Figure 21.6 Attribute threshold questions (preceding the choice set screens)
20
The survey is programmed so that respondents can click on various rows, columns, and cells within a
choice scenario if they find an attribute, alternative, or level to be irrelevant or wish to ignore it. This
information is stored so that, for each and every choice set completed by every respondent, data are
collected on what information was important in making a decision and what information was discarded.
We refer the reader to Beck et al. (2012) for details of the data set, confining the presentation to the data
elements relevant to the modeling undertaken below. Table 21.10 summarizes
the data used in the estimation of the six models.
Of particular interest is that 65 percent of the alternatives were perceived to
be acceptable, implying that a sizeable share (35 percent) were perceived as not
acceptable given the attribute levels offered. The mean certainty scale value is 7.14
on the 1–10 scale, suggesting that certainty is greater than the midpoint,
although many respondents are far from totally sure.
The attribute threshold responses are very illuminating. Taking the attri-
bute rejection evidence for the combined minimum and maximum cut-offs,
83.2 percent of the CE levels for vehicle price are outside of the upper and
lower bounds for the price attribute that respondents indicated they are
prepared to pay or accept. In contrast, the percentages for the other attributes
are, respectively, 52.5 percent for the price of fuel, 45.2 percent for the annual
registration charge, 49 percent for fuel consumption in litres/100km, 37.5
percent for engine capacity, and 57.9 percent for seating capacity. These are
sizeable percentages, and raise some fundamental questions about empirical
evidence if thresholds are ignored.
Separating out the lower and upper cut-off thresholds (defined by an
attribute rejection range dummy variable), we see that the percentages that
exceed the upper (i.e., maximum) cut-off are greater than those that fall below the
lower (i.e., minimum) cut-off, except for seating capacity. The rejection percentage is as
high as 71.4 percent for the upper vehicle price and 5 percent for the lower
engine capacity. The actual differences between the minimum (maximum)
perceived threshold levels and the levels shown in the CE are also summarized
in Table 21.10. For example, the average vehicle price in the CE is $16,780
above the threshold maximum; and fuel consumption is 1.38 litres per 100km
above the upper threshold, on average.
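The kind of tabulation behind these rejection percentages can be sketched as follows (the offered levels and thresholds are hypothetical):

```python
def rejection_rate(offered, min_cut, max_cut):
    """Share of offered design levels outside a respondent's reported
    [minimum, maximum] acceptance range (None = no cut-off reported)."""
    outside = 0
    for x, lo, hi in zip(offered, min_cut, max_cut):
        below = lo is not None and x < lo
        above = hi is not None and x > hi
        outside += int(below or above)
    return outside / len(offered)

# Three offered vehicle prices ($) against one respondent's 30k-45k range
rate = rejection_rate([25000, 38000, 52000],
                      [30000, 30000, 30000],
                      [45000, 45000, 45000])   # 2 of 3 levels outside
```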
This descriptive evidence on attribute thresholds raises the interesting
question as to whether future SC studies should take this information into
account in designing the range of attribute levels. The modeling evidence
below explicitly accounts for the influence that attribute thresholds and the
acceptability of each alternative (which are correlated) have on prediction success
and mean direct elasticity estimates.
The results for all six choice models are summarized in Table 21.11. Models
1 and 2 are basic MNL models, distinguished by the exogenous weighting of
the certainty scale, and models 3 and 4 are the equivalent ML models that have
associated random parameters for all eight design attributes to account for
preference heterogeneity, with an unconstrained normal distribution assumed
for all attribute parameters. The remaining two models are extensions of MNL
in which we account for the acceptability of each alternative at a choice set
level and the perceived attribute thresholds. Model 6 differs from Model 5 by
the estimation of an additional parameter to account for scale heterogeneity.
The overall goodness of fit of the models improves substantially as we move
from Model 1 to Model 6; however, the evidence from comparing Models 1
and 3 with Models 2 and 4 is that the exogenous weighting by the certainty
scale has a very small (almost negligible) influence on the overall model fit.
Models 5 and 6 are significantly better-fitting models (with exogenous weighting
included), with fits improving by close to a factor of two compared to the
ML models. The allowance for scale heterogeneity is significant in overall gain
in fit (given one degree of freedom difference between Models 5 and 6).
In-sample prediction success increases substantially when we allow for the
acceptability of each alternative and the attribute thresholds. In Table 21.11,
we report the percentage improvement in prediction of Models 5 and 6
21
It should be noted that the overall utility expression is negative, and hence the heteroskedastic effect
reduces the disutility when the alternative is acceptable, compared to not acceptable, as might be
expected.
;rpl =prrejz,flrejz,rgrejz,ferejz,ecrejz,screjz
;fcn=altac(c)
;halton;draws=50
;pds=panel
;smnl;tau=0.1
;wts=cert,noscale
;output=3;crosstab;printvc
;start = -.647,-.0993,-1.8085,-.00451,-0.00245,-.6388,-.15051,.06247,-.0709,
.18976,-.45376,-0.0893,0.02961,-1.13807,-.04491,0.12248,-0.00057,
-0.13019,.41258,-.69951,-.02448
;labels =altac,pric,fue,reg,ae,ve,fel,ecl,scl,petasc,dasc
,prcld,prchd,fuehd,feld,fehd,rghd,ecld,echd,scld,schd
;prob=probvw1;utility=utilvw1
;fn1= ealtacc=(1+altac*altaccz) ? linear
;fn2= Vaa=pric*price + fue*fuel+reg*rego +AE*AES+VE*VES +FEl*FE+ECl*EC+
SCl*SC
;fn3= vab= prcld*prcldf+prchd*prchdf+fuehd*fuelhdf+feld*feldf+fehd*fehdf
+rghd*rghdf+ecld*ecldf+echd*echdf+scld*scldf+schd*schdf
;fn4 = Util1 = ealtacc*(Petasc + Vaa+Vab)?+Vac)
;fn5 = Util2 = ealtacc*(Dasc + Vaa+Vab) ?+Vac)
;fn6 = Util3 = ealtacc*(Vaa+Vab) ?+Vac)?
;?ecm=(pet),(hybrid)
;model:
U(pet) = Util1 /U(die) = Util2 /U(hyb) = Util3 $
[Table 21.11 Model estimation results; columns: Alts | M1: MNL | M2: MNL | M3: Mixed Logit | M4: Mixed Logit | M5: H-MNL | M6: H-SMNL]
$$\mathrm{Elas}_{kjq} = (1 - \mathrm{Prob}_{jq}) \times X_{kjq} \times \frac{\partial V_{jq}}{\partial X_{kjq}}, \qquad (21.43a)$$
where $\partial V_{jq}/\partial X_{kjq}$ is the parameter estimate (or marginal disutility) associated with
the kth attribute in the jth alternative for the qth individual in Models 1 to 6,
and takes the form in Equation (21.43b) for Models 5 and 6. The general form
given in Equation (21.43b) is derived from Equation (21.34), where all terms
are as defined in previous equations:
$$\frac{\partial V_{jq}}{\partial X_{kjq}} = \beta_k \left(1 + 2\gamma_h X_{hjq} - \delta_j AC_{jq}\right) + \beta_l \left(1 + \delta_j AC_{jq} - \gamma_h X_{hl}^{\min} + 2\gamma_h X_{hjq}\right) + \beta_m \left(1 - \delta_j AC_{jq} + \gamma_h X_{hm}^{\max} - 2\gamma_h X_{hjq}\right). \qquad (21.43b)$$
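Equation (21.43a) applied to hypothetical values:

```python
def direct_elasticity(prob_j, x_kjq, dv_dx):
    """Direct point elasticity of Equation (21.43a):
    (1 - Prob_jq) * X_kjq * dV_jq/dX_kjq."""
    return (1.0 - prob_j) * x_kjq * dv_dx

# Hypothetical: choice probability 0.4, price level 30 ($000),
# marginal disutility -0.02 utils per $000
e_price = direct_elasticity(0.4, 30.0, -0.02)   # (1-0.4)*30*(-0.02) = -0.36
```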
In Equation (21.43b), the second (third) term falls out when the lower (upper)
cut-off is satisfied (i.e., when the attribute level for the hth (or kth) attribute
associated with the jth alternative and qth individual is outside of the per-
ceived attribute threshold acceptance level). The marginal disutility expression
(see Equation (21.44), which is an interpretation of Equation (21.43b)) for price
in Model 6 accounts for the mean non-penalized marginal disutility, the
penalized lower and upper bound cut-offs where the price level offered is
outside of the minimum and maximum threshold levels (zero otherwise), and
the conditioning of price on the acceptability of an alternative when price is in
the attribute rejection range:
Table 21.12 Summary of mean direct elasticity results (three estimates each, for petrol, diesel, and hybrid; all elasticities are probability weighted)
[Columns: Attribute | M1: MNL | M2: MNL | M3: ML | M4: ML | M5: H-MNL | M6: H-SMNL]
[Table columns: Marginal rate of substitution between vehicle price ($) and: | M5: H-MNL | M6: H-SMNL]
Note: The vehicle price marginal disutility is in the denominator. One can construct other ratios
by taking two of the marginal rates of substitution (MRS) and dividing one into the other. For
example, the MRS between fuel price and annual emission surcharge in Model 5 is 0.0011/0.649
= 0.00169. Standard deviations in brackets.
have a non-marginal impact on the vehicle price paid, and vice versa, in terms
of substitution and vehicle purchase decisions. The other substitution rates
are relatively low, with the possible exception of engine capacity.
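The table note's arithmetic can be reproduced directly: each MRS divides an attribute's marginal (dis)utility by the vehicle-price marginal disutility, and the ratio of two such MRS gives a substitution rate between two non-price attributes:

```python
def mrs(beta_attr, beta_price):
    """Marginal rate of substitution, vehicle price in the denominator."""
    return beta_attr / beta_price

# Ratio of two price-based MRS, as in the table note for Model 5:
# fuel price vs annual emissions surcharge = 0.0011 / 0.649
ratio = 0.0011 / 0.649   # approximately 0.00169
```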
21.8.4 Conclusions
This section has presented a framework within which important processing
influences on choice making at the respondent and choice set level can be
incorporated into a RUM. Drawing on the existing literature that suggests that
choice attribute thresholds and response certainty have behaviorally signifi-
cant influences on the probability of a choice, we extend the processing set to
investigate the role that perceived acceptance of an alternative also has. The
model specification that incorporates these three influences is referred to as a
heteroskedastic MNL model and modified as heteroskedastic scale MNL when
scale heterogeneity is accounted for.
A comparison of the models shows the significant improvement in pre-
dictive power as well as different mean direct elasticities for the HMNL and
HG-SMNL (compared to simple MNL and ML) models, due in large measure
to the “scaling” of the standard utility expression by a function that accounts
for the acceptability of each alternative and perceived attribute thresholds, as
well as accounting for scale heterogeneity. The evidence also suggests, in
particular, that alternative acceptance appears far more influential than
response certainty in improving the predictive performance of the choice
model.
The approach and evidence presented suggest that we should not ignore
supplementary information on how alternatives and attributes are processed
prior to a choice response, as well as the degrees of certainty of
actually making the choice. Indeed, the improvement in prediction perfor-
mance and the significantly different direct elasticities are sufficient to
recognize the role that supplementary data play, regardless of whether one
believes in the credibility of such information.
An ongoing research challenge is to gain a greater and more reliable under-
standing of how a range of supplementary questions can aid our under-
standing of how individuals view the information content of SC experiments,
in contrast to assuming, through ignorance, that all information is relevant
and that all attributes and alternatives are subject to the same trading
regime.
research into behavioral explanations for the preference changes that appear
to occur over a sequence of choice tasks, using parametric (Bateman et al.
2008; Day et al. 2009; McNair et al. 2010a) and non-parametric tests (Day and
Prades 2010) as well as equality constrained LCMs (McNair et al. 2012).
What we believe is not given enough emphasis is the extent to which we can
learn from an interrogation of each response at the choice set level, and set up
candidate rules, or heuristics (often referred to as “rules of thumb”) that align
with one or more possible processing rules used by an individual, within and
between sequentially administered choice sets, to reveal their choice response.
Specifically, the analysis here is looking for evidence that would be consistent
with respondents’ use of heuristics to make choices in SC experiments. This
matters because of the small but accumulating empirical evidence that
alternative AP strategies (APS) influence behavioral outputs such as
estimates of WTP and model predictive capability (see Hensher 2010 for an
overview). While we can never be certain that a specific rule is applied, we are
seeking a way to gain confidence in the evidence, given that some pundits
believe that respondents make choices that have no “rational”
attachment.
To illustrate the focus of this chapter, we reproduce, in Table 21.14, data
from one respondent in one of many CEs the authors have conducted,22 in the
context of choosing among three routes for a commuter trip, where the first
route description is the reference or status quo (SQ) alternative associated
with a recent trip. The design attributes are free-flow time (FF), slowed-down
time (SDT), running cost (Cost), toll if applicable (Toll), and overall trip time
variability (Var) (times are in min., costs in dollars, and time variability in plus
or minus min.). We begin with the most commonly assumed normative
processing rule that assumes (in the absence of any known AP heuristic)
that all attributes (and levels) are relevant, and that a fully compensatory
processing strategy is active at the choice set level. Focussing on these five
attributes only, we highlight in shaded grey the most attractive attribute level
(e.g., lowest FF), which varies across the attributes, and propose that if an
alternative had the most attractive level on one or more attributes, and that
alternative was chosen, then we can reasonably suggest that the respondent
was “plausible” in their choice, assuming that the heuristic being used to
process the choice set preserves (i.e., does not ignore) the attribute(s) with
the “most attractive level(s)” based, of course, on only the offered attributes.
22
We undertook exactly this same exercise on a number of data sets and a number of respondents in each
data set, and the message was the same or very similar.
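The screening rule described above can be sketched as a small function (the attribute names and levels below are hypothetical):

```python
def plausible_choice(levels, chosen, lower_is_better=None):
    """'Plausible choice' screen: does the chosen alternative hold the most
    attractive level on at least one offered attribute?

    levels: dict mapping attribute name -> list of levels (one per
    alternative); lower_is_better: attributes where smaller is better
    (default: all, as for times, costs, and variability here).
    """
    if lower_is_better is None:
        lower_is_better = set(levels)
    for attr, vals in levels.items():
        best = min(vals) if attr in lower_is_better else max(vals)
        if vals[chosen] == best:
            return True
    return False

# Hypothetical choice set (times in min., costs in $); alternative 1 is
# chosen and has the lowest Cost (and Toll), so the choice passes the screen.
passed = plausible_choice({"FF": [20, 25, 22],
                           "Cost": [3.0, 2.5, 2.8],
                           "Toll": [2.0, 0.0, 1.5]}, chosen=1)
```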
Choice scenario Alternative TotTime TotCost Var FF SDT Cost Toll Choice Plausible = Y
Applying the same logic across all of the sixteen choices that each respondent
made, we found that 51 of the 300 respondents were consistently selecting
options that were best on the same attribute, where the experimental design
did not allow them to consistently choose such that two or more attributes
were always best.
There could be other reasons why an alternative is chosen, regardless of
the attribute levels and their relative performance, such as satisfaction with
the status quo or the adoption of a minimum regret calculus, in contrast to
a utility maximization calculus (see Chorus 2010 and Hensher et al. 2013).
Indeed, if a respondent focuses on only one attribute, then we might be
observing a consistent EBA heuristic. However, on the face of the observed
attribute evidence, all 16 choice scenarios satisfy the “plausible choice” test.
Five of the choice scenarios show the status quo as the
preferred alternative (bolded in the choice column in Table 21.14). It may
also be that this example individual adopts one or more AP rules in
evaluating the choice scenarios, which may be the basis of choice in any
of the 16 choice sets, regardless of whether they have passed the “plausi-
bility” test used above. We investigate a number of these AP rules in the
following sections.
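The test itself is mechanical and can be sketched in a few lines. The following Python fragment is an illustrative reconstruction, not the authors' code; the attribute names and the convention that lower time, cost, and variability levels are better are assumptions:

```python
# Illustrative sketch of the "plausible choice" test: a chosen alternative
# passes if it has the best (lowest) level on at least one attended attribute.
# Attribute names and the lower-is-better convention are assumptions.

ATTRS = ["FF", "SDT", "Var", "Cost", "Toll"]  # free-flow, slowed-down, variability, running cost, toll

def plausible_choice(alternatives, chosen, attrs=ATTRS, ignored=()):
    """Return True if the chosen alternative is best (or tied for best)
    on at least one attended attribute across all alternatives."""
    attended = [a for a in attrs if a not in ignored]
    for a in attended:
        best = min(alt[a] for alt in alternatives)
        if alternatives[chosen][a] == best:
            return True
    return False

# Toy choice set: reference route plus two hypothetical routes.
alts = [
    {"FF": 30, "SDT": 30, "Var": 10, "Cost": 6.24, "Toll": 0.0},   # reference
    {"FF": 34, "SDT": 39, "Var": 7,  "Cost": 4.37, "Toll": 0.5},   # Route A
    {"FF": 34, "SDT": 26, "Var": 8,  "Cost": 8.11, "Toll": 3.0},   # Route B
]
print(plausible_choice(alts, 0))                    # True: reference is best on FF and Toll
print(plausible_choice(alts, 2, ignored=("SDT",)))  # False: Route B is best only on SDT
```

Note that ignoring an attribute can flip the outcome: an alternative that passes under full attribute attendance may fail once its only best attribute is dropped.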
Furthermore, supplementary data associated with the respondents’ percep-
tion of whether specific attributes were ignored or added up (where they have
a common metric) might also be brought to bear, to add additional insights
into the choice responses. No attributes were ignored by this respondent, as
reported by responses to supplementary questions. Looking at the possibility
that this individual may also have added up FF and SDT and/or Cost and Toll,
we cannot find any evidence within the “plausible choice” test that it would
have failed if attribute addition (TotTime, TotCost) had not been applied,
although the addition may have assisted in making the choice.
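The attribute-addition variant works the same way: first collapse the common-metric attributes (FF + SDT into TotTime, Cost + Toll into TotCost), then look for a best attribute on the chosen alternative. An illustrative sketch, with assumed field names and the lower-is-better convention:

```python
# Sketch: re-running a best-attribute check after "adding up" common-metric
# attributes (FF + SDT -> TotTime, Cost + Toll -> TotCost). Illustrative only.

def added_up(alt):
    return {"TotTime": alt["FF"] + alt["SDT"], "TotCost": alt["Cost"] + alt["Toll"]}

def best_on_some_attribute(alternatives, chosen):
    attrs = alternatives[0].keys()
    return any(alternatives[chosen][a] == min(alt[a] for alt in alternatives)
               for a in attrs)

alts = [
    {"FF": 30, "SDT": 30, "Cost": 6.24, "Toll": 0.0},   # reference
    {"FF": 34, "SDT": 39, "Cost": 4.37, "Toll": 0.5},   # Route A
    {"FF": 34, "SDT": 26, "Cost": 8.11, "Toll": 3.0},   # Route B
]
summed = [added_up(a) for a in alts]
print(best_on_some_attribute(summed, 0))  # True: reference ties for best TotTime (60)
```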
The following sections undertake a more formal inquiry using another data
set collected in 2007 in New Zealand, to delve more deeply into alternative
“plausible choice” tests as well as the role of non-compensatory heuristics in
aiding our understanding of how SC sets are processed in assisting the selection
of a choice outcome. We briefly describe the data, followed by a statistical
assessment of them in the search for possible rules (or heuristics) that explain
specific choice responses under specific assumptions. The investigated rules and
tests focus on the influence of the choice sequence on choice response, a
pairwise alternative plausibility test and the presence of dominance, the influ-
ence of non-trading, dimensional versus holistic AP, the influence of relative
attribute levels, and revision of the reference alternative as value learning across
sequenced choice sets. We then discuss the evidence, and conclude with a
proposal to include two new explanatory variables in choice models to capture
the number of attributes in an alternative that are “best” as well as value
learning, together with a statement of the degree of confidence one might
have in the behavioral sense of the data emanating from an SC experiment.
Attribute                                            Levels
Free-flow time (variation around reference level)    −30%, −15%, 0, 15%, 30%
Slowed-down time (variation around reference level)  −30%, −15%, 0, 15%, 30%
Trip time variability                                ±0%, ±5%, ±10%, ±15%
Running cost (variation around reference level)      −40%, −10%, 0, 20%, 40%
Toll cost                                            $0, $0.50, $1, $1.50, $2, $2.50, $3, $3.50, $4
Practice Game 1
Make your choice given the route features presented in this table, thank you.

                                              Details of your recent trip   Route A   Route B
Time in free-flow traffic (minutes)                      30                    34        34
Time slowed down by other traffic (minutes)              30                    39        26
Trip time variability (minutes)                        +/-10                  +/-7      +/-8
Running costs                                          $6.24                 $4.37     $8.11
Toll costs                                             $0.00                 $0.50     $3.00
If you make the same trip again, which route would you choose?  Current Road / Route A / Route B
of the design were optimized in accordance with efficient design theory, with a
d-error measure employed (see Rose and Bliemer 2008 and Chapter 6 for
details).
A few additional rules were imposed on the design:
(i) Free-flow and slowed-down times23 in the non-reference alternatives
were set to a base of 5 min. if the respondent entered zero for their
current trip.
23 The distinction between free-flow and slowed-down time is solely to promote the differences in the quality of travel time between various routes (especially a tolled route and a non-tolled route), and is separate from the influence of total time.
and (v) the revision of the reference alternative as value learning across
sequenced choice sets.
The “plausible choice” test presented above for one respondent can be
applied across the 6,048 observations in the New Zealand data. Appendix
21C details all 54 choice sets (or scenarios) where the test failed. An alternative
that would fail the test if chosen was present in 291 choice scenarios, resulting
in a failure rate of 18.6 percent. Note that the lack of a toll in some
alternatives (i.e., non-tolled routes) meant that the reference alternative always had at least
one best attribute, and so if it was chosen, the “plausible choice” test could not
fail. Table 21.16 also shows the proportion (and counts) of plausible choice
sets by choice task sequence number. When all attributes are assumed to be
relevant we find, across all 16 choice sets, that 99.12 percent of the observa-
tions pass the “plausible choice” test associated with one or more attributes
being best on the chosen alternative (with the percentage varying across the 16
choice sets from 100 percent to 98.4 percent). When we omit those attributes
which the respondent claimed not to have considered, i.e., they were ignored,
95.78 percent of the observations pass this test (with the percentage varying
Table 21.16 Plausible choice responses by choice task sequence number

Choice set   Full attribute relevance          Accounting for ignored attributes
             Proportion    Implausible         Proportion    Implausible
1            0.9894        4                   0.9471        20
2            1.0000        0                   0.9497        19
3            0.9894        4                   0.9392        23
4            0.9894        4                   0.9603        15
5            0.9974        1                   0.9524        18
6            0.9841        6                   0.9603        15
7            0.9921        3                   0.9841        6
8            0.9947        2                   0.9550        17
9            0.9841        6                   0.9444        21
10           0.9894        4                   0.9656        13
11           0.9894        4                   0.9603        15
12           0.9921        3                   0.9550        17
13           0.9894        4                   0.9550        17
14           0.9894        4                   0.9524        18
15           0.9894        4                   0.9709        11
16           1.0000        0                   0.9735        10
across the 16 choice sets from 98.41 percent to 94.44 percent), suggesting that
regardless of respondents’ claims of attributes being ignored or not, there is a
very high incidence of plausible choosing. The evidence also suggests that there
is no noticeable deterioration in plausible choice response as the respondent
works through the choice sets from set 1 to set 16. At the respondent level, we
find that the 54 choice observations that failed the “plausible choice” test were
spread across 49 respondents.
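Proportions like those in Table 21.16 are straightforward to tabulate. A sketch, assuming one record per (choice task, respondent) with a pass/fail flag; with 6,048 observations over 16 tasks there are 378 observations per task:

```python
# Sketch: tabulating the proportion of plausible responses by choice task
# sequence number, as in Table 21.16. Record layout is an assumption.

from collections import defaultdict

def plausibility_by_sequence(records):
    """records: iterable of (sequence_number, passed_test) pairs.
    Returns {sequence: (proportion plausible, number implausible)}."""
    counts = defaultdict(lambda: [0, 0])   # seq -> [passes, total]
    for seq, passed in records:
        counts[seq][0] += int(passed)
        counts[seq][1] += 1
    return {seq: (p / n, n - p) for seq, (p, n) in sorted(counts.items())}

# Toy check against the first row of Table 21.16: 4 failures out of 378.
data = [(1, True)] * 374 + [(1, False)] * 4
table = plausibility_by_sequence(data)
print(round(table[1][0], 4), table[1][1])  # 0.9894 4
```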
The structure of the design has an impact on the incidence of observations
that fail the “plausible choice” test. If full attribute attendance is assumed, then
the test cannot be failed if every alternative in the experimental design has at
least one best attribute. In this empirical setting, only one alternative in the
design did not have a best level (choice scenario 31 in Appendix 21B), which
might have had some role in keeping the incidence rate low (54 observations
out of a possible 291). Other choice scenarios also allowed the test to fail as a
consequence of the forced variability in slowed-down and free-flow time when
the recent trip values were less than 5 min. (rule (i) discussed earlier). Once
ignored attributes are taken into account, the number of scenarios in which
the test is failed can in no way be inferred from the experimental design. While
there are a finite (albeit large) number of combinations with which the
attributes can be ignored or preserved, the analyst does not know a priori
which of these will be chosen. Looking at the entire data set, it can be
determined that when accounting for the reported ignoring of attributes,
255 observations are implausible out of a possible 1,699 choice scenarios
where an implausible choice could have been made, spread across 99
respondents.
We also ran two simple logit models (not reported here) to explore the
possible influence of the commuter’s age, income and gender on whether
the choice response for each choice set was plausible (1) or not (0) under the
“plausible choice” test. One model assumed full attribute relevance and the
other accounted for the attributes that the respondents stated as ignored (or
not preserved24). Income and gender had no influence, but age had a statis-
tically significant impact when accounting for whether an attribute was
ignored or not, with the probability of satisfying the “plausible choice” test
increasing as the commuter ages.
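The two logit models themselves are not reported, but the form of such a binary logit is easy to sketch. The fragment below uses synthetic data and Newton-Raphson estimation; the data-generating values and all names are assumptions, chosen only so that the age effect is positive, as found in the text:

```python
# Generic sketch of a binary logit of plausibility (1/0) on a respondent
# characteristic, estimated by Newton-Raphson on synthetic data. The models
# in the text are not reported; this only illustrates the method.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(20, 70, n)
X = np.column_stack([np.ones(n), (age - 45) / 10])   # constant, scaled age
true_beta = np.array([2.0, 0.5])                     # assumption: older -> more plausible
p = 1 / (1 + np.exp(-X @ true_beta))
y = (rng.random(n) < p).astype(float)                # simulated pass/fail outcomes

beta = np.zeros(2)
for _ in range(25):                                  # Newton-Raphson iterations
    mu = 1 / (1 + np.exp(-X @ beta))                 # fitted probabilities
    grad = X.T @ (y - mu)                            # score vector
    hess = X.T @ (X * (mu * (1 - mu))[:, None])      # information matrix
    beta = beta + np.linalg.solve(hess, grad)

print(beta[1] > 0)  # estimated age effect is positive, mirroring the finding
```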
24 We are starting to see, in the literature, a number of ways of indicating that attributes are ignored. A popular language, especially in the environmental literature, is "attribute non-preservation" or "attribute non-attendance."
Choice task response latencies have been used by Haaijer et al. (2000) and
Rose and Black (2006) to improve the model fit of the final choice models of
interest. We took an alternative approach, investigating the relationship
between the “plausible choice” test (both under full attribute relevance and
allowing for attributes to be ignored) and the amount of time to complete each
of the 16 choice scenarios (i.e., the response latency). Statistically significant
relationships were found between the choice scenario completion time and
the “plausible choice” test, both under full attribute relevance and when AP
was taken into account, and are reported in Table 21.17 (i) at the choice set
level, and Table 21.17 (ii) at the respondent level. We find that for respondents
who satisfied the “plausible choice” test at the choice set level, the average time
to complete a choice set was 27.47 seconds, with a standard deviation of 26.03
seconds; however when we account for the choice set response being implau-
sible at the observation level, we find that the mean time decreases by 5.21
seconds under full attribute relevance and 5.58 seconds when ignoring attri-
butes is accounted for. When we do the same comparison at the respondent
level, we find for respondents who have at least one choice set not satisfying
the plausibility test that the average time to complete a choice screen decreases
by 4.84 and 3.66 seconds for full attribute relevance and attribute non-
attendance, respectively, relative to the respondents who pass the test. One
possible explanation for this difference in completion time is that those who
pass the “plausible choice” test are more engaged in the choice task.
Alternatively, those who fail the test might be employing some other heuristic
that allows them to make a more rapid choice. Clearly, no definitive causal
inferences can be drawn, despite speculative opinion that such respondents
might be less engaged in the task.
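The comparison of group means is equivalent to an OLS regression of completion time on a 0/1 plausibility dummy: the constant recovers the mean of the failing group and the slope recovers the difference in means, which is how the Nlogit regression output later in this section can be read. A minimal Python sketch of that equivalence, on synthetic data (all values illustrative):

```python
# Sketch: OLS on a 0/1 dummy recovers the two group means exactly.
# Completion times are synthetic; only the algebraic identity is the point.
import numpy as np

rng = np.random.default_rng(1)
time_fail = rng.normal(22.0, 5.0, 50)    # synthetic times, implausible responses
time_pass = rng.normal(27.5, 5.0, 950)   # synthetic times, plausible responses

y = np.concatenate([time_fail, time_pass])
d = np.concatenate([np.zeros(50), np.ones(950)])   # plausibility dummy
X = np.column_stack([np.ones_like(y), d])
b = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose(b[0], time_fail.mean()))         # constant = failing-group mean
print(np.allclose(b[0] + b[1], time_pass.mean()))  # constant + slope = passing-group mean
```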
The Nlogit set up for Table 21.17 is given below (together with the create
commands that apply to subsequent tables):
read;file=C:\papers\WPs2016\choicesequence\data\NZdata_rat.xls$
reject;respid=16110048$
create
;ratd=rat-ratig
;rat1d=rat1-ratig1
;rat2d=rat2-ratig2
;rat3d=rat3-ratig3
;if(alt3=1)refalt=1
;if(alt3=2)scalt2=1
;if(alt3=3)scalt3=1
;time=FF+Sdt$
create
;if(shownum=1)cseq1=1
;if(shownum=2)cseq2=1
;if(shownum=3)cseq3=1
;if(shownum=4)cseq4=1
;if(shownum=5)cseq5=1
;if(shownum=6)cseq6=1
;if(shownum=7)cseq7=1
;if(shownum=8)cseq8=1
;if(shownum=9)cseq9=1
;if(shownum=10)cseq10=1
;if(shownum=11)cseq11=1
;if(shownum=12)cseq12=1
;if(shownum=13)cseq13=1
;if(shownum=14)cseq14=1
;if(shownum=15)cseq15=1
;if(shownum=16)cseq16=1$
create
;alt2a=alt3-1
;if(refff<0)refffb=refff ?ref alt ff better (b) than sc
;if(refff=0)refffe=refff ?ref alt ff equal (e) to sc
;if(refff>0)refffw=refff$ ?ref alt ff worse (w) than sc
create
;if(refsdt<0)refsdtb=refsdt ?ref alt sdt better (b) than sc
;if(refsdt=0)refsdte=refsdt ?ref alt sdt equal (e) to sc
;if(refsdt>0)refsdtw=refsdt ?ref alt sdt worse (w) than sc
;if(refvar<0)refvarb=refvar ?ref alt var better (b) than sc
;if(refvar=0)refvare=refvar ?ref alt var equal (e) to sc
;if(refvar>0)refvarw=refvar$ ?ref alt var worse (w) than sc
create
;if(refrc<0)refrcb=refrc ?ref alt rc better (b) than sc
;if(refrc=0)refrce=refrc ?ref alt rc equal (e) to sc
;if(refrc>0)refrcw=refrc ?ref alt rc worse (w) than sc
;if(reftc<0)reftcb=reftc ?ref alt tc better (b) than sc
;if(reftc=0)reftce=reftc ?ref alt tc equal (e) to sc
;if(reftc>0)reftcw=reftc$ ?ref alt tc worse (w) than sc
create
;if(alt3=1&choice1=1)ref=1
;if(alt3=2&choice1=1)sc2=1
;if(alt3=3&choice1=1)sc3=1
;if(rat=1&choice1=1)ratref=1$
create
;bestt=bestff+bestsdt+bestvar+bestrc+besttc
;bet=beff+besdt+bevar+berc+betc
;if(bestffi=-888)bestffic=0;(else)bestffic=bestffi
;if(bestsdti=-888)bestsdic=0;(else)bestsdic=bestsdti
;if(bestvari=-888)bestvarc=0;(else)bestvarc=bestvari
;if(bestrci=-888)bestrcic=0;(else)bestrcic=bestrci
;if(besttci=-888)besttcic=0;(else)besttcic=besttci
;besttc=bestffic+bestsdic+bestvarc+bestrcic+besttcic
;if(beffi=-888)beffic=0;(else)beffic=beffi
;if(besdti=-888)besdic=0;(else)besdic=besdti
;if(bevari=-888)bevarc=0;(else)bevarc=bevari
;if(berci=-888)bercic=0;(else)bercic=berci
;if(betci=-888)betcic=0;(else)betcic=betci
;betc=beffic+besdic+bevarc+bercic+betcic$
create
;if(chSQ=16)allSQ=1;(else)allSQ=0
;if(chSQ=0)allHyp=1;(else)allHyp=0
;rpVar=WstLngth-BstLngth
;rpVarPct=rpVar/WstLngth
;rpCongPc=Slowed/TrpLngth$
Create
;if(income<0)income=-888;if(QuotaVeh=2)business=1;(else)business=0
;numIg=IgFFTime+IgSlowTm+IgTrpVar+IgRnCost+IgTlCost$
sample;all$
reject;respid=16110048$ ? 570 freeflow
reject;respid=2110012$ ? 270 freeflow
reject;respid=16110070$ ? 270 freeflow
reject;respid=2611008$ ? 240 freeflow
create
;if(alt3=1&choice1=1)ref=1
;if(alt3=2&choice1=1)sc2=1
;if(alt3=3&choice1=1)sc3=1
;refl=ref[-3]
;sc1l=sc2[-3]
;sc2l=sc3[-3]$
create
;if(refl=1)newref=1
;if(sc1l=1)newrefa=1
;if(sc2l=1)newrefa=1$
create
;if(shownum=1)newrefa=0
;if(shownum=1)refcs1=1$
sample;all$
reject;choice1=-999$
create
;if(alt3=1&choice1=1)ref=1
;if(alt3=2&choice1=1)sc2=1
;if(alt3=3&choice1=1)sc3=1
;if(rat=1&choice1=1)ratref=1$
crmodel;lhs=chtime;rhs=one,ratig;het$
------------------------------------------------------------------
Ordinary least squares regression . . . . . . . . . . . . . . . .
LHS=CHTIME Mean = 27.46613
Standard deviation = 26.03400
Number of observs. = 18336
Model size Parameters = 2
Degrees of freedom = 18334
Residuals Sum of squares = 12403768.05060
Standard error of e = 26.01047
Fit R-squared = .00186
Adjusted R-squared = .00181
Model test F[ 1, 18334] (prob) = 34.2(.0000)
White heteroskedasticity robust covariance matrix.
Br./Pagan LM Chi-sq [ 1] (prob) = 67.92 (.0000)
Model was estimated on Apr 29, 2010 at 09:37:58 AM
-----------+----------------------------------------------------------------
    CHTIME | Coefficient   Standard Error       z    Prob. z>|Z|    Mean of X
-----------+----------------------------------------------------------------
  Constant |  22.1163***       .71810        30.80      .0000
     RATIG |  5.58563***       .74490         7.50      .0000        .95779
-----------+----------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
------------------------------------------------------------------
--> crmodel;lhs=chtime;rhs=one,ratigre;het$
------------------------------------------------------------------
Ordinary least squares regression . . . . . . . . . . . . . . . .
25 All parameter estimates are statistically significant in all four models.
especially the weighted average VTTS (where the weights relate to the attri-
bute levels for free-flow and slowed-down time, and running and toll cost),
using the delta test to obtain standard errors. This is the case even when over 4
percent of the sample is removed due to a suspicion of implausible choice
behavior. This finding suggests that the underlying model is robust, and able
to cope with a small percentage of seemingly implausible decisions. However,
when we compare the mean VTTS at the respondent level (in Table 21.18(ii)),
as EBA, even when it contains no best attributes, if it has at least one better
attribute than the rejected alternative on a pairwise comparison. If the pair
includes the reference alternative, it may be that this contrast delivers an
outcome that passes a pairwise “plausibility choice” test on more occasions.
On closer inspection, of the 54 choice sets that failed the full choice set
“plausible choice” test from the 6,048 choice sets in the sample, all but one
satisfied the pairwise “plausible choice” test, with 20 of the chosen alternatives
having the better level on all five attributes, 17 on four attributes, 14 on three
attributes, and two on two attributes. This suggests that if a three-way and/or a
two-way assessment of alternatives are both candidate processing strategies,
then only one respondent failed both “plausibility choice” tests on only one
choice set.
Could it be that, just as some researchers suggest a bias towards the
reference alternative, there are circumstances where the bias is
reversed?26 For modeling, it may be appropriate to remove the reference
alternative and treat such respondents' processing strategy as
elimination-by-alternatives, allowing the reference alternative to be
specified as "non-existent." This is equivalent to ignoring an alternative
rather than an attribute. Within this
data set, 23 respondents chose the reference alternative for all 16 choice tasks,
while a further 17 respondents chose the alternative for 15 out of 16 choice
tasks. However, with 70 respondents never choosing the reference alternative,
total avoidance of the reference alternative was much more common than
total avoidance of the two hypothetical alternatives.
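Counts like these (23 respondents always choosing the reference, 70 never choosing it) come from a simple per-respondent tally across the 16 tasks, sketched below; the data layout is an assumption:

```python
# Sketch: classifying respondents by how often they chose the reference
# alternative across their 16 choice tasks (always / never / mixed).
from collections import Counter

def reference_trading_profile(choices_by_respondent):
    """choices_by_respondent: dict mapping respondent id to a list of
    booleans, True where the reference alternative was chosen."""
    profile = Counter()
    for resp, picks in choices_by_respondent.items():
        k = sum(picks)
        if k == len(picks):
            profile["always reference"] += 1   # pure reference non-trader
        elif k == 0:
            profile["never reference"] += 1    # total avoidance of the reference
        else:
            profile["mixed"] += 1
    return profile

toy = {
    "r1": [True] * 16,                 # always chooses the reference
    "r2": [False] * 16,                # never chooses the reference
    "r3": [True] * 10 + [False] * 6,   # trades across alternatives
}
print(reference_trading_profile(toy))
```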
At a choice set level, if a chosen alternative passes the pairwise comparison
test, that is, it is better on at least one attribute than the alternative to which it
is compared, we can state that it is not dominated by the other alternative.
Expressed another way, the alternative in question is dominated by the other
alternative if, for every attribute, the attribute level is equal to or worse than
the other alternative. While the pairwise “plausible choice” test applied above
to those who failed the three-way “plausible choice” test found only one case
of response dominance at the choice set level, an examination of all choice sets
for each respondent uncovered a wider pattern of choice of a dominated
alternative for 46 (out of 6,048) observations. Of the total of 6,048 choice
26 Within the environmental economics literature this is actually an often-quoted criticism of eliciting preferences through stated preference (SP) methods (i.e., that people act strategically in a hypothetical setting and are more likely to choose a non-reference alternative, as it provides them with an "option" to choose it, even though they would be unlikely to do so in reality). Related to this issue of strategic decision making is yeah-saying (especially in environmental economics case studies). Within the context of the transport application here, this is far less likely to be of concern; however, it is important to recognize this matter in applications more aligned to environmental economics.
27 The 667 choice scenarios containing dominance primarily stemmed from three choice scenarios containing dominance in the experimental design (see Appendix 21C, choice scenarios 15, 20, and 25). However, the application of various rules to ensure variation in the attribute levels of the hypothetical alternatives might have led to the presentation and capture of choice scenarios containing dominance that was not present in the experimental design.
28 The experimental design did not contain a scenario where the second alternative dominated the reference alternative. However, the application of various rules as outlined in n. 27 led to this condition in some of the choice scenarios in the data set.
other choice scenarios, when dominance is not present, to the detriment of the
quality of the data set. Care should be taken to minimize the chance of this
happening, via clear instructions to the respondent and, if relevant, appropriate
training of the interviewers administering the survey.
To be truly effective, the dominance check requires an unlabeled experi-
ment, such that the only points of comparison between alternatives are the
attributes. In this experiment, while the two hypothetical routes are unlabeled,
the reference alternative represents their current route, and thus other factors
might be influencing whether they choose the reference alternative or one of
the remaining two alternatives. For nine dominated observations, shown in
the last column of Table 21.19, the respondent always chose the reference
alternative over the 16 choice tasks. This suggests that they were not trading
over the attributes, such that a new alternative with superior attributes was not
preferred. Conversely, for 10 dominated observations, the respondent never
chose the reference alternative, instead trading only between the two hypothe-
tical alternatives. The respondent might have been dissuaded from the refer-
ence alternative by their actual experiences of it. Alternatively, inferences
might be made about omitted attributes, leading to seemingly implausible
choices being made (Lancsar and Louviere 2006). The remaining observations
were by respondents who chose the reference alternative and a hypothetical
alternative at least once each. We have no clear explanation for their choice of
a dominated alternative. A preference for, or aversion to, the reference alter-
native might still have been in effect, except with some trading across these
alternatives. Alternatively, the dominance might be the consequence of not
paying attention, for example to the third alternative, as discussed above.
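The pairwise dominance test applied in this section (one alternative dominates another if it is no worse on every attended attribute and strictly better on at least one) can be sketched as follows; the attribute names and the lower-is-better convention are assumptions:

```python
# Sketch of the pairwise dominance test: `a` dominates `b` if `a` is no
# worse on every attended attribute and strictly better on at least one.
# Lower attribute levels are assumed to be better.

def dominates(a, b, attrs, ignored=()):
    """True if alternative `a` dominates `b` over the attended attributes."""
    attended = [k for k in attrs if k not in ignored]
    no_worse = all(a[k] <= b[k] for k in attended)
    strictly = any(a[k] < b[k] for k in attended)
    return no_worse and strictly

ATTRS = ("FF", "SDT", "Var", "Cost", "Toll")
route_a = {"FF": 30, "SDT": 25, "Var": 5, "Cost": 4.0, "Toll": 1.0}
route_b = {"FF": 34, "SDT": 25, "Var": 8, "Cost": 5.5, "Toll": 1.0}

print(dominates(route_a, route_b, ATTRS))  # True: never worse, better on FF/Var/Cost
# Ignoring the attributes on which A is better turns dominance into a tie:
print(dominates(route_a, route_b, ATTRS, ignored=("FF", "Var", "Cost")))  # False
```

The second call illustrates the point made above: omitting attributes from the comparison can only retain the dominance or reduce it to a tie, never reverse it.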
The above examination of dominance assumed that none of the attributes was
ignored. Just as the number of alternatives in a data set that lead to failure in the
“plausible choice” test will be impacted by the particular APS of the individual, so
too will the presence of dominance in a choice task. If an alternative is already
dominated by another alternative, then the omission of attributes in the compar-
ison will either retain the dominance or lead to a tie between the two alternatives.
However, a pair of alternatives that, under full attribute attendance, present
trade-offs, with some attributes better and worse for each alternative, might
degenerate into a condition where one alternative dominates the other. Choice
of a dominated alternative in this scenario might be indicative of several things. A
genuine mistake might have been made either at the time of choice or when
revealing which attributes were ignored. Alternatively, the AP rules might vary
across choice tasks, even though they were gathered once after the completion of
the choice scenarios in this study (see Puckett et al. 2007 for a study where APSs
Normalized Unnormalized
AIC .45238 171.00088
Fin.Smpl.AIC .45342 171.39112
Bayes IC .53566 202.48003
Hannan Quinn .48544 183.49447
Model estimated: Apr 20, 2010, 11:37:55
Hosmer-Lemeshow chi-squared = 5.50370
P-value= .70263 with deg.fr. = 8
-----------+---------------------------------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
-----------+---------------------------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| -1.48841 .93467 -1.592 .1113
RPVARPCT| -1.60119 1.56593 -1.023 .3065 .35627
RPCONGPC| .50599 1.10958 .456 .6484 .24711
TRPLNGTH| -.02932** .01255 -2.337 .0194 47.2328
INCOME| .01021 .00827 1.235 .2168 48.7725
BUSINESS| -1.69986** .76525 -2.221 .0263 .33333
TOLLREXP| -.01474 .13872 -.106 .9154 3.23810
NUMIG| .18624 .19859 .938 .3483 1.06878
Next we ran a binary logit model where the dependent variable is whether
the reference alternative was chosen at the choice set level (i.e., a single task).
The actual levels and the two other alternatives did not enter the specification.
This is a move away from the focus on non-traders: there is now one
observation per choice task (Model 2):
sample;all$
reject;choice1=-999$
reject;alt3#1$
create;if(income<0)income=-888$
create;if(QuotaVeh=2)business=1;(else)business=0$
create;numIg=IgFFTime+IgSlowTm+IgTrpVar+IgRnCost+IgTlCost$
logit;lhs=choice1
;rhs=one,rpVarPct,rpCongPc,TrpLngth,income,chTime,business,
TollRExp,numIg$
Normal exit: 5 iterations. Status=0. F= 3930.201
-------------------------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable CHOICE1
Log likelihood function -3930.20098
Restricted log likelihood -4173.24521
Chi squared [ 8 d.f.] 486.08846
Significance level .00000
McFadden Pseudo R-squared .0582387
Estimation based on N = 6048, K = 9
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC 1.30265 7878.40197
Next we added these variables into the base model (as Model 3). The LL
improves from −5,428 to −5,331. The extra SQ attributes were highly sig-
nificant, and the SQ ASC became marginally significant and positive, suggest-
ing that we have accounted for many of the reasons why they do not like the
SQ. Similar improvements are found when attribute non-attendance is
accounted for, as reported below (LL from −5,265 to −5,173).
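The reported improvement can be checked with a likelihood-ratio test: twice the difference in log-likelihoods is chi-squared distributed with degrees of freedom equal to the number of added parameters (six SQ covariates here). A sketch using the rounded values from the text:

```python
# Sketch: likelihood-ratio test for the improvement reported above
# (base LL -5,428 vs. -5,331 after adding six status quo covariates).
# Log-likelihoods are the rounded values from the text.

ll_base, ll_full, extra_params = -5428.0, -5331.0, 6
lr = -2 * (ll_base - ll_full)
CHI2_CRIT_1PCT_DF6 = 16.812   # standard table value: chi-squared, 1% level, 6 d.f.

print(lr)                     # 194.0
print(lr > CHI2_CRIT_1PCT_DF6)  # True: the SQ covariates add significant explanatory power
```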
sample;all$
reject;choice1=-999$
nlogit;lhs=choice1,cset3,Alt3
;choices=Cur1,AltA1,AltB1
;checkdata
;model:
U(Cur1) = Rp1 + ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC + rpVarPct*rpVarPct +
rpCongPc*rpCongPc + TrpLngth*TrpLngth + income*income
+ business*business + TollRExp*TollRExp /
U(AltA1) = SP1 + ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC/
U(AltB1) = ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC$
No bad observations were found in the sample
Normal exit: 6 iterations. Status=0. F= 5331.118
-----------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -5331.11780
Estimation based on N = 6048, K = 13
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC 1.76723 10688.23560
Fin.Smpl.AIC 1.76724 10688.29592
| RCI 1120 |
| TCI 656 |
+--------------------------------------------------------------------------+
Normal exit: 6 iterations. Status=0. F= 5173.796
-----------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -5173.79563
Estimation based on N = 6048, K = 13
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC 1.71521 10373.59125
Fin.Smpl.AIC 1.71522 10373.65158
Bayes IC 1.72963 10460.78853
Hannan Quinn 1.72021 10403.86000
Model estimated: Apr 20, 2010, 11:58:13
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...; RHS=ONE$
Chi-squared[11] = 2503.05090
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 6048, skipped 0 obs
-----------+----------------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]
-----------+----------------------------------------------------------------------
RP1| 1.12992*** .12399 9.113 .0000
FFI| -.09039*** .00339 -26.645 .0000
SDTI| -.10809*** .00692 -15.623 .0000
VRI| -.01017 .00686 -1.481 .1387
RCI| -.44807*** .02141 -20.930 .0000
TCI| -.63032*** .02054 -30.689 .0000
RPVARPCT| -1.00122*** .20707 -4.835 .0000
RPCONGPC| .46208*** .15852 2.915 .0036
TRPLNGTH| -.01111*** .00124 -8.940 .0000
INCOME| -.00419*** .00113 -3.717 .0002
BUSINESS| -.39950*** .06305 -6.336 .0000
TOLLREXP| -.04157** .01788 -2.325 .0201
SP1| .06773* .04018 1.686 .0918
29 Accounting for ties did not materially affect the findings.
sample;all$
reject;choice1=-999$
nlogit ? Model 1
;lhs=choice1,cset3,Alt3
;choices=Cur1,AltA1,AltB1
;checkdata
;model:
U(Cur1) = Rp1 + ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC /
U(AltA1) = SP1 + ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC /
U(AltB1) = ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC $
Normal exit: 7 iterations. Status=0. F= 5428.170
--------------------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -5428.17018
Estimation based on N = 6048, K = 7
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC 1.79734 10870.34036
Fin.Smpl.AIC 1.79735 10870.35890
Bayes IC 1.80511 10917.29274
Hannan Quinn 1.80004 10886.63892
Model estimated: Oct 05, 2010, 14:45:23
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;. . .; RHS=ONE$
Chi-squared[ 5] = 1994.30179
Prob [ chi squared > value ] = .00000
Response data are given as ind. choices
Number of obs.= 6048, skipped 0 obs
-----------+-------------------------------------------------------------------
| Standard Prob.
CHOICE1| Coefficient Error z z>|Z|
-----------+-------------------------------------------------------------------
RP1| .00646 .04836 .13 .8937
FF| -.08992*** .00317 -28.34 .0000
SDT| -.09629*** .00598 -16.11 .0000
VR| -.01774*** .00577 -3.07 .0021
RC| -.41466*** .01864 -22.25 .0000
TC| -.53116*** .01928 -27.54 .0000
SP1| .07491* .03979 1.88 .0597
-----------+-------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-------------------------------------------------------------------------------
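The ratio of a time coefficient to a cost coefficient, scaled from minutes to hours, gives the implied VTTS. As a quick sketch (plain Python, not part of the Nlogit workflow, using the Model 1 estimates above):

```python
# Sketch: VTTS ($/person hr) implied by the Model 1 MNL estimates above.
# VTTS = (marginal disutility of time) / (marginal disutility of cost) * 60.
def vtts(beta_time, beta_cost, minutes_per_hour=60.0):
    return beta_time / beta_cost * minutes_per_hour

b_ff, b_sdt = -0.08992, -0.09629  # free-flow, slowed-down time (per min.)
b_rc, b_tc = -0.41466, -0.53116   # running cost, toll cost (per $)

print(round(vtts(b_ff, b_rc), 2))   # free-flow time valued via running cost
print(round(vtts(b_sdt, b_tc), 2))  # slowed-down time valued via toll cost
```

The same ratio logic applies to any pair of time and cost parameters reported in the tables that follow.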
nlogit ? Model 2
;lhs=choice1,cset3,Alt3
;choices=Cur1,AltA1,AltB1
;checkdata
;model:
U(Cur1) = Rp1 + ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC +bet*bet/
U(AltA1) = SP1 + ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC +bet*bet/
U(AltB1) = ff*ff +sdt*sdt+ VR*var + RC*RC +TC*TC +bet*bet$
Normal exit: 6 iterations. Status=0, F= 5417.552
-----------------------------------------------------------------------------
Discrete choice (multinomial logit) model
Dependent variable Choice
Log likelihood function -5417.55217
Estimation based on N = 6048, K = 8
Inf.Cr.AIC = 10851.1 AIC/N = 1.794
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
The same tests were performed after accounting for attributes stated as
being ignored (Models 4–6). Any ignored attributes were not included in the
count of the number of best attributes. Model 4 of Table 21.23 sets out the base
model that accounts for attribute ignoring, which itself fits the data better than
when all attributes are assumed to be attended to. Model 5 presents the model
that accounts for both heuristics, and Model 6 represents the inclusion of the
number of best attributes in the absence of explicit consideration of each
attribute, after allowing for attributes that are indicated as ignored. The BIC is
improved, at 1.7483 compared to 1.7514 for the base model, with the number
of the best attributes parameter being statistically significant and of the
expected sign.
We report the weighted average VTTS in Table 21.23 (where the weights
are the levels of each attribute, namely free-flow and slowed-down time, and
running and toll cost) which, at the mean estimate for the weighted average
total time, appear to vary sufficiently between full relevance and allowing for
attributes being ignored, but not between models within each of these AP
settings when allowance is made for the number of attributes that are best.
When confidence intervals are generated using a bootstrapping procedure
with 1,000 random draws from normal distributions for relevant parameters,
with moments set at their coefficient point estimates and standard errors
(Krinsky and Robb 1986 and Chapter 8), we find, as expected, that there are
no statistically significant differences between Models 1 and 2 (and between
Models 4 and 5); however, the differences are statistically significant at the 95
percent confidence level between the estimates for full relevance and attribute
non-attendance.
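The bootstrapping step can be sketched as follows. This is a simplified Python illustration (not the authors' code): each parameter is drawn independently from its estimated sampling distribution, whereas a full Krinsky-Robb implementation draws from the joint covariance matrix. The point estimates and standard errors are the Model 1 values reported above.

```python
import random

def krinsky_robb_ci(b_time, se_time, b_cost, se_cost,
                    n_draws=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) CI for VTTS = 60 * b_time / b_cost."""
    rng = random.Random(seed)
    draws = sorted(60.0 * rng.gauss(b_time, se_time) / rng.gauss(b_cost, se_cost)
                   for _ in range(n_draws))
    return draws[int(n_draws * alpha / 2)], draws[int(n_draws * (1 - alpha / 2)) - 1]

lo, hi = krinsky_robb_ci(-0.08992, 0.00317, -0.41466, 0.01864)
print(f"95% CI for free-flow VTTS: [{lo:.2f}, {hi:.2f}] $/hr")
```

Two VTTS estimates are then judged significantly different when their bootstrapped confidence intervals do not overlap.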
While Model 2 (Model 5) is an improvement on Model 1 (Model 4) in terms of
BIC, albeit a relatively small one, its underlying form implies that all
respondents simultaneously consider, and trade between, both the attribute
levels in a typical compensatory fashion (under full relevance, and after
ignoring some attributes where applicable) and the number of best
attributes in each alternative. More plausibly, a respondent might resort
solely to the MCD heuristic, or refrain from using it entirely. In recognition
that there may be two classes of respondent, with heuristic application
distinguishing between them, two LCMs30 were estimated (Table 21.23).
Two classes are defined,31 where the utility expressions in each class are
30 See Hensher and Greene (2010) for other examples of the identification of AP heuristics with the
LCM.
31 We investigated a three-class model in which the additional class was defined by all attributes plus the
number of best attributes. The overall fit of the model did not improve and many of the attributes were
not statistically significant. We also estimated a three-class model with class-specific parameter estimates
for attributes included in more than one class, but many parameters were not statistically significant. A
further model allowing for random parameters was investigated but did not improve on the two-class
model reported in Table 21.23.
Full relevance Ignored attributes
Class 1
Reference constant (1,0) −0.4207 (−0.67) −0.0676 (−1.06)
SC1 constant (1,0) 0.0674 (1.27) 0.0852 (1.51)
Free-flow time (min.) −0.1234 (−16.52) −0.1448 (−16.6)
Slowed-down time (min.) −0.1192 (−11.37) −0.1676 (−12.1)
Trip time variability (plus/minus min.) −0.0145 (−1.83) −0.0116 (−1.18)
Running cost ($) −0.5467 (−15.04) −0.6980 (−14.9)
Toll cost ($) −0.7159 (−12.92) −0.9038 (−18.0)
Class 2
# of attributes in an alternative that are best 0.2856 (2.76) 0.2665 (3.06)
Probability of class membership:
Class 1 0.8465 (6.25) 0.8206 (9.58)
Class 2 0.1535 (6.35) 0.1794 (8.17)
VTTS ($/person hour):
Free-flow time (based on running cost 13.54 12.45
parameter estimate)
Free-flow time (based on toll cost parameter 10.34 9.61
estimate)
Slowed-down time (based on running cost 13.08 14.41
parameter estimate)
Slowed-down time (based on toll cost 9.99 11.13
parameter estimate)
Weighted average VTTS: 12.60 12.17
Number of observations with attribute ignored:
Free-flow time (min.) – 944
Slowed-down time (min.) – 1504
Trip time variability (plus/minus min.) – 2240
Running cost ($) – 1120
Toll cost ($) – 656
Model fit:
BIC 1.7795 1.7287
LL at convergence −5402.47 −5218.52
Sample size 6048
constrained to represent one of the two heuristics. The first class contains
the attribute levels and ASCs, as per the base model, while the second class
contains only the number of best attributes. A further improvement in
model fit is obtained with this model, with BICs of 1.7795 under full
attribute relevance and 1.7287 when ignored attributes are accounted for (Table 21.23).
These results suggest that some respondents are employing the MCD
heuristic. Under the heuristic, trading is not occurring on the absolute attri-
bute levels. What matters instead is which alternative has the best level for
each attribute, where tallies of the number of best attributes appear to act as a
supplementary step when determining the best alternative. Overall, the mean
probability of class membership of each class in both models is over 80 percent
for processing of the constituent attributes and between 15 and 18 percent for
the number of attributes being the determining influence.
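The MCD tally can be illustrated with a small sketch (hypothetical Python with made-up attribute levels; all four attributes are "lower is better"):

```python
# Count, for each alternative, the number of attributes on which it offers
# the best (lowest) level in the choice set; ties credit every tied alternative.
def count_best(choice_set):
    attrs = next(iter(choice_set.values())).keys()
    counts = {alt: 0 for alt in choice_set}
    for a in attrs:
        best = min(levels[a] for levels in choice_set.values())
        for alt, levels in choice_set.items():
            if levels[a] == best:
                counts[alt] += 1
    return counts

# Illustrative levels: free-flow time, slowed-down time, running cost, toll.
choice_set = {
    "Cur":  {"ff": 20, "sdt": 0, "rc": 2.60, "tc": 0.0},
    "AltA": {"ff": 17, "sdt": 2, "rc": 2.21, "tc": 4.0},
    "AltB": {"ff": 26, "sdt": 3, "rc": 2.99, "tc": 1.0},
}
print(count_best(choice_set))  # {'Cur': 2, 'AltA': 2, 'AltB': 0}
```

Under attribute non-attendance, ignored attributes would simply be dropped from the inner loop before tallying.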
The implication is that the application of the choice model must recog-
nize that the trading among the attributes occurs up to a probability of 85
percent (or 82 percent when accounting for ignoring) on average, with the
number of best attribute levels having an influence up to a probability of 15
percent (or 18 percent) on average. This is an important finding that
downplays the contribution of the marginal disutility of each attribute in
the presence of the overall number of preferred attribute levels associated
with an alternative. When we compare the mean estimates of VTTS for
Model 2 (and Model 5) in Table 21.22 with the LCMs (Table 21.23), the
mean estimates are, respectively, $12.20 and $12.60 for full relevance and
$11.58 and $12.17 when attributes are ignored. The latent class mean
estimates have moved closer to the mean estimates in Table 21.22 when
we do not include allowance for the number of best attributes (i.e., Model 1
and 4 in Table 21.23 of $12.48 and $11.85, respectively). If the contrast is
with the base models in Table 21.22, we would conclude that the VTTS
estimates are not statistically significant in the presence and absence of
accounting for the MCD rule; however, differences are significant when
allowing for attributes to be ignored. This finding supports the evidence in
studies undertaken by Hensher and his colleagues (see Hensher 2010) that
allowing for attribute non-attendance has a statistically significant influence
on the mean estimates of VTTS.
Attributes defined as reference minus SC1 or minus SC2 | Percent of data | Parameter estimates
Full relevance
Revised reference (1,0) (which can be any of the three alternatives) 0.9358 (15.73)
Free-flow time (min.) −0.01033 (−52.3)
Slowed-down time (min.) −0.0972 (−17.4)
Trip time variability (plus/minus min.) −0.0178 (−2.96)
Running cost ($) −0.4810 (−36.8)
Toll cost ($) −0.6163 (−43.2)
BIC 1.7637
LL at convergence −5027.00
Sample size 5730
Ignored attributes
The weighted mean estimate of value of travel time savings in Table 21.26 is
$11.19 per person hr. This estimate can be contrasted with the findings of the
“base” model (reported in Table 21.22) which only included the design
attributes and constants for the existing reference alternative (without value
learning), namely $12.48 under full attribute relevance, or $11.85 when we
allowed for attributes being ignored. At the 95 percent level of confidence, the
weighted mean estimate of VTTS is significantly different and lower. The
Nlogit model command for Table 21.26 (using all the create commands listed
under Table 21.16) is:
dstats;rhs=newref,newrefa,refl,sc1l,sc2l$
Descriptive Statistics
All results based on nonmissing observations.
========================================================================
Variable Mean Std.Dev. Minimum Maximum Cases Missing
========================================================================
All observations in current sample
--------+-------------------------------------------------------------------------------------------
NEWREF| .152686 .359695 .000000 1.00000 18240 0
NEWREFA| .963268E-01 .295047 .000000 1.00000 18240 0
REFL| .152712 .359719 .000000 1.00000 18237 3
SC1L| .963426E-01 .295069 .000000 1.00000 18237 3
SC2L| .963426E-01 .295069 .000000 1.00000 18237 3
reject;choice1=-999$
reject;ratig=0$
nlogit
;lhs=choice1,cset3,Alt3
;choices=Cur1,AltA1,AltB1
;checkdata
;model:
U(Cur1) = refcs1+
ffi*ffi +sdti*sdti+ VRi*vari + RCi*RCi +TCi*TCi
+ rpVarPct*rpVarPct + rpCongPc*rpCongPc + TrpLngth*TrpLngth + income*income
+ business*business + TollRExp*TollRExp +betc*betc/?+numig*numig/
21.9.10 Conclusions
What does this evidence suggest for moving forward in the use of CE data?
We have identified a number of features of the choosing process that are
associated with the design of the CE, and the characteristics of respondents,
that influence the SC outcome. Some very specific heuristics appear to have
some systematic influence on choice, in particular the number of attributes
that offer the best levels for an alternative, and the revision of the reference
alternative as a result of value learning, reflected in a previous choice in the
choice set sequence. Building both of these features into the estimated choice
model seems to be a useful step forward in recognition of process rule
heterogeneity. We also believe that the simple “plausible choice” test
proposed here, applied to the entire choice set and to pairwise alternatives
at both the observation and respondent levels, is a useful tool for
eliminating, where required, data in which an individual chooses an
alternative that is better on no single attribute.
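The test can be sketched as a simple dominance check (hypothetical Python; attribute names illustrative, all attributes "lower is better"): a choice fails the screen if some other alternative in the set is at least as good on every attribute and strictly better on at least one.

```python
def is_plausible(chosen, rivals):
    """False if any rival weakly dominates the chosen alternative."""
    for rival in rivals:
        weakly_better = all(rival[a] <= chosen[a] for a in chosen)
        strictly_better = any(rival[a] < chosen[a] for a in chosen)
        if weakly_better and strictly_better:
            return False
    return True

# Choosing the first alternative here is implausible: the rival is better
# on both attributes.
print(is_plausible({"time": 26, "cost": 3.99}, [{"time": 20, "cost": 2.60}]))
# This choice is plausible: it is faster, though more expensive.
print(is_plausible({"time": 19, "cost": 6.21}, [{"time": 20, "cost": 2.60}]))
```

The pairwise version of the test applies the same check to each pair of alternatives rather than to the chosen alternative against the full set.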
Another avenue for reconciling seemingly implausible choice behavior
stems from the recognition that the choice might be plausible when a decision
or process rule is employed by the decision maker. We have handled several
decision rules in our analysis, namely the treatment of attributes the respon-
dent claimed not to have considered, the application of the MCD heuristic,
and revision of the reference alternative as value learning. However, other
processes might be employed by the respondents that are not consistent with
utility maximization. For example, Gilbride and Allenby (2004) estimated a
choice model that handled conjunctive and disjunctive screening rules, with
choice treated as a compensatory process on the remaining alternatives. Here,
a choice task that appears implausible might pass the plausibility test after
some alternatives have been eliminated in the screening stage. Swait (2009)
allowed the unobserved utility of the choice alternatives to be in one of several
discrete states. One of the states allowed conventional utility maximization,
while other states led to “alternative rejection” and “alternative dominance.”
Again, plausibility might prevail once the process rule is employed: in this
case, once rejection and dominance has been taken into account. We propose
that one way to assess these, and other new model forms, is to determine how
well they can explain decisions that appear implausible when viewed through
the conventional prism of utility maximization.
Of interest to the analyst are possible ways in which implausible behavior can
be minimized in an SC environment. In our data, there appeared to be no link
between the task order number and the rate of implausible behavior, which
suggested that the number of choice tasks might not have an impact, within
reasonable limits. Choice task complexity (as defined by dimensions such as
number of alternatives, attributes, and attribute levels) was not varied in this
analysis; however, the impact of task complexity on implausible behavior would
be an interesting area of research. Also of interest is the plausibility of choice in
market conditions, which may be affected by habit, mood, time pressure, and
the ease with which information can be compared. We anticipate that these
influences would lead to a decrease in the plausibility of choice, either through
an increase in errors or an increase in the use of decision rules and heuristics.
If the aim of an SC task is to successfully predict market choices, encouraging
plausible choice in the SC environment might not actually be the best way
forward; survey realism might instead be more important.
This section will hopefully engender an interest in further inquiry into the
underlying sources of process heterogeneity that should be captured explicitly in
the formulation of the utility expressions that represent the preference domain
of each respondent for each alternative. Including additional attribute and AP-
related explanatory variables appears to provide plausible explanations of utility
maximizing behavior in choice making. Testing of the ideas presented on other
data sets will enable us to establish the portability of the evidence.
decision rule and the non-linear worst level referencing (NLWLR)32 heuristic.
This example is very much in the spirit of Tversky and Simonson’s (1993)
componential contextual model, where utility comprises a context independent
effect (in this case LPLA) and a context dependent effect (in this case NLWLR).
For this model, define the LPLA and NLWLR specifications (respectively as H1
and H2) as illustrated in Equation (21.46). For ease of illustrating this multiple
heuristics approach, the utility function for each alternative is defined by only
two attributes: the trip cost (cost), defined as the fare for a public transport
option and as the sum of running cost, toll cost, and parking cost for the car
option; and the travel time (TT):
In the NLWLR model, respondents are assumed to make reference to the worst
attribute level of each choice set. This reference may be defined as the maximum
of each of the cost and TT attributes in the choice set, since higher levels of cost
and TT give rise to greater disutility. Moreover, as cost_max and TT_max precede the
minus sign, the prior expectation is for β̂k to be positive. If the NLWLR model is
a better representation of choice behavior, then the power parameter, φk, is
expected to satisfy the inequality 0 < φk < 1. This arises from one of the
predictions of prospect theory, which suggests that gains in utility, relative to
the reference, are best represented by a concave function.
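To see the concavity property, consider a one-attribute sketch in Python (β and φ here are placeholders, not estimates from the text): the utility gain relative to the worst level grows less than proportionally when 0 < φ < 1.

```python
def nlwlr_term(x, x_max, beta=0.1, phi=0.5):
    """NLWLR contribution: concave power of the gain over the worst level."""
    assert beta > 0 and 0 < phi < 1
    return (beta * (x_max - x)) ** phi

g_small = nlwlr_term(x=20, x_max=30)  # gain of 10 over the worst level
g_large = nlwlr_term(x=10, x_max=30)  # gain of 20 over the worst level
print(g_large < 2 * g_small)  # True: doubling the gain less than doubles utility
```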
For this example, the full data set is used. A choice set in the data may
comprise up to five alternatives. The utility functions for these alternatives can
be written in the form of Equation (21.47):
U_bus   = β0,bus   + H2 + e0
U_train = β0,train + W1·H1 + W2·H2 + e1
U_metro = β0,metro + W1·H1 + W2·H2 + e2          (21.47)
U_other = H2 + e3
U_taxi  = β0,taxi  + H2 + e4
32 This model was first introduced as a contextual concavity model by Kivetz et al. (2004), who use it to model a
specific phenomenon known as extremeness aversion. They make the prior assumption that, relative to the
worst performing attribute, utility is concave in the gains. This assumption is empirically testable, and we
find that it does not always hold (see Leong and Hensher 2012). Hence, it may be more useful to label such a
functional specification as a “non-linear worst level referencing” (NLWLR) model instead.
W1 = exp(γ0^H1 + γage^H1·age + γinc^H1·income) /
     [exp(γ0^H1 + γage^H1·age + γinc^H1·income) + exp(γ0^H2 + γage^H2·age + γinc^H2·income)]

W2 = exp(γ0^H2 + γage^H2·age + γinc^H2·income) /
     [exp(γ0^H1 + γage^H1·age + γinc^H1·income) + exp(γ0^H2 + γage^H2·age + γinc^H2·income)]
                                                                          (21.48)
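In Python, the binary logit weights of Equation (21.48) look like this (the γ values below are arbitrary placeholders, not estimates from the chapter):

```python
import math

def heuristic_weights(age, income,
                      g_h1=(0.5, -0.01, 0.002),  # placeholder γ0, γ_age, γ_inc for H1
                      g_h2=(0.0, 0.0, 0.0)):     # placeholder γ set for H2
    v1 = g_h1[0] + g_h1[1] * age + g_h1[2] * income
    v2 = g_h2[0] + g_h2[1] * age + g_h2[2] * income
    e1, e2 = math.exp(v1), math.exp(v2)
    return e1 / (e1 + e2), e2 / (e1 + e2)

w1, w2 = heuristic_weights(age=40, income=60)
print(round(w1 + w2, 6))  # 1.0 -- the weights sum to one by construction
```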
Table 21.27 Estimation of weighted LPLA and NLWLR decision rules in utility
(reporting β̂, φ̂, and γ̂, each with z-ratios)
approach suggested here may in fact be a preferred way of accounting for the
impact of SECs on decision making.
Appendix 21A Nlogit command syntax for NLWLR and RAM heuristics
create
; ffd = ff - maxff
; ctd = congtime-maxct
; rcd = rc-maxrc
; tcd = tc - maxtc$
? We can then implement the NLWLR heuristic using the NLRPlogit command.
? The choice of starting values is crucial here.
? NLWLR
NLRPlogit
;lhs=choice1,noalts,alt
;choices=Curr, AltA, AltB
;start= 0,0, -0.04, -0.06, -0.2, -0.2, 1.0, 1.0, 1.0, 1.0
;labels = ref, ASCA, ffh1, cth1, rch1, tch1, concff, concct, concrc,
conctc
;fn1 = NLWLR = (ffh1*ffd)^concff + (cth1*ctd)^concct +
(rch1*rcd)^concrc + (tch1*tcd)^conctc
;fn2 = Util1 = ref + NLWLR
;fn3 = Util2 = ASCA + NLWLR
;model:
U(curr) = Util1/
U(altA)= Util2/
U(altB) = NLWLR$
create
;if(alt=1)|dffasq = dffasq[+1]; dffbsq = dffbsq[+2]; dffab = dffab[+1];
dffba=dffba[+2]
;dctasq = dctasq[+1]; dctbsq = dctbsq[+2]; dctab = dctab[+1];
dctba=dctba[+2]
;drcasq = drcasq[+1]; drcbsq = drcbsq[+2]; drcab = drcab[+1];
drcba=drcba[+2]
;dtcasq = dtcasq[+1]; dtcbsq = dtcbsq[+2]; dtcab = dtcab[+1];
dtcba=dtcba[+2]$
create
;if(alt=2)|dffsqa = dffsqa[-1]; dffba = dffba[+1];dffsqb = dffsqb[-1];
dffbsq = dffbsq[+1]
;dctsqa = dctsqa[-1]; dctba = dctba[+1];dctsqb = dctsqb[-1];
dctbsq = dctbsq[+1]
;drcsqa = drcsqa[-1]; drcba = drcba[+1];drcsqb= drcsqb[-1];
drcbsq = drcbsq[+1]
;dtcsqa = dtcsqa[-1]; dtcba = dtcba[+1];dtcsqb= dtcsqb[-1];
dtcbsq = dtcbsq[+1]$
create
;if(alt=3)|dffsqb = dffsqb[-2]; dffab = dffab[-1];dffsqa = dffsqa[-2];
dffasq= dffasq[-1]
;dctsqb = dctsqb[-2]; dctab = dctab[-1];dctsqa = dctsqa[-2];
dctasq= dctasq[-1]
;drcsqb = drcsqb[-2]; drcab = drcab[-1];drcsqa = drcsqa[-2];
drcasq= drcasq[-1]
;dtcsqb = dtcsqb[-2]; dtcab = dtcab[-1];dtcsqa = dtcsqa[-2];
dtcasq= dtcasq[-1]$
? The following estimates the regret-RAM model
? adv_altj_altj’ denotes the advantage of altj over altj’
? dadv_altj_altj’ denotes the disadvantage of altj over altj’
? radv_altj_altj’ denotes the relative advantage of altj over altj’
Block 1 Block 2
Alt. Cset FF SDT Var Cost Toll Cset FF SDT Var Cost Toll
1 1 0 0 0 0 0 17 0 0 0 0 0
2 −0.15 0.3 −0.3 0.3 3.5 0.15 0.3 −0.3 −0.3 0.5
3 0.15 −0.3 −0.3 −0.3 0 0.15 −0.15 −0.15 0.3 3
1 2 0 0 0 0 0 18 0 0 0 0 0
2 −0.15 −0.3 0.3 −0.15 4 0.15 −0.15 −0.15 −0.15 2.5
3 0.3 −0.15 −0.3 0.15 1 0.15 −0.15 0.15 −0.3 1.5
1 3 0 0 0 0 0 19 0 0 0 0 0
2 0.3 −0.15 −0.15 −0.15 0 −0.3 0.15 0.15 0.3 0
3 −0.15 −0.3 −0.3 −0.3 0.5 0.3 −0.15 −0.3 0.3 0.5
1 4 0 0 0 0 0 20 0 0 0 0 0
2 −0.3 −0.15 0.3 −0.3 3.5 0.3 0.3 −0.3 −0.15 0.5
3 0.3 −0.3 0.3 −0.3 2.5 −0.15 −0.3 −0.15 −0.15 0
1 5 0 0 0 0 0 21 0 0 0 0 0
2 −0.15 −0.15 −0.3 −0.3 2.5 −0.15 0.15 0.3 −0.3 3
3 −0.3 −0.15 −0.3 −0.15 3 0.3 0.15 0.3 −0.3 2.5
1 6 0 0 0 0 0 22 0 0 0 0 0
2 0.3 0.15 −0.3 −0.3 3.5 −0.15 0.3 −0.3 0.3 1.5
3 −0.15 −0.3 −0.15 0.15 3.5 −0.15 −0.3 −0.15 −0.15 2
1 7 0 0 0 0 0 23 0 0 0 0 0
2 −0.3 0.15 0.15 −0.15 1.5 0.3 0.15 −0.15 −0.3 0
3 −0.3 0.3 −0.15 0.15 3 −0.3 0.3 0.3 −0.15 4
1 8 0 0 0 0 0 24 0 0 0 0 0
2 −0.15 −0.3 −0.3 −0.15 2.5 0.3 −0.3 0.3 0.15 0.5
3 −0.3 −0.15 0.3 0.15 2 0.15 0.3 0.15 −0.15 2.5
1 9 0 0 0 0 0 25 0 0 0 0 0
2 −0.3 −0.3 −0.15 0.15 2 −0.3 −0.3 −0.3 0.3 0
3 0.15 −0.15 −0.3 −0.3 4 −0.3 −0.15 0.3 0.3 4
1 10 0 0 0 0 0 26 0 0 0 0 0
2 −0.15 0.3 −0.15 0.15 4 −0.3 −0.15 0.15 0.3 1
3 0.3 −0.15 0.15 −0.3 1 −0.3 0.15 0.3 −0.3 1.5
1 11 0 0 0 0 0 27 0 0 0 0 0
2 0.15 −0.3 0.15 −0.3 4 0.15 −0.15 −0.15 −0.15 3
3 −0.15 0.3 −0.3 0.15 2 −0.15 0.15 −0.15 −0.15 3.5
1 12 0 0 0 0 0 28 0 0 0 0 0
2 −0.15 0.15 0.3 −0.3 2 −0.3 −0.3 −0.15 0.15 4
3 −0.3 −0.3 0.15 −0.3 3.5 −0.15 0.15 0.15 −0.15 1.5
1 13 0 0 0 0 0 29 0 0 0 0 0
2 −0.15 −0.15 0.3 −0.15 1.5 0.15 −0.15 −0.15 0.15 0
3 −0.3 −0.15 −0.3 −0.15 4 −0.15 0.15 −0.3 0.3 0.5
1 14 0 0 0 0 0 30 0 0 0 0 0
2 0.15 −0.3 −0.15 −0.15 1 0.3 −0.15 −0.15 0.15 1
3 0.3 0.15 −0.15 −0.3 1 −0.3 −0.3 −0.15 0.3 3.5
1 15 0 0 0 0 0 31 0 0 0 0 0
2 −0.15 −0.3 0.15 −0.15 0.5 −0.3 0.3 −0.3 0.3 3
3 0.15 −0.3 0.15 0.15 3 −0.15 0.3 −0.15 0.3 0.5
1 16 0 0 0 0 0 32 0 0 0 0 0
2 −0.3 −0.3 −0.3 −0.3 2 −0.3 −0.15 0.15 −0.3 3
3 −0.15 0.3 −0.15 −0.15 0 −0.3 −0.3 −0.3 −0.15 3.5
1 20 0 0 2.6 0 20 2.6 0
2 17 2 6 2.21 4 19 6.21 0
3 26 3 4 2.99 1 29 3.99 1
1 8 2 2 1.2 0 10 1.2 0
2 7 1 6 1.02 4 8 5.02 0
3 10 2 4 1.38 1 12 2.38 1
1 60 0 30 7.8 0 60 7.8 0
2 51 2 34 6.63 0.5 53 7.13 0
3 69 2 34 8.97 3 71 11.97 1
1 25 0 18 3.25 0 25 3.25 0
2 29 4 12 2.28 0.5 33 2.78 0
3 29 3 15 4.23 3 32 7.23 1
1068 Advanced topics
1 30 0 5 3.9 0 30 3.9 0
2 39 2 6 4.49 0.5 41 4.99 1
3 34 4 6 3.32 2.5 38 5.82 0
1 22 0 2 2.86 0 22 2.86 0
2 29 2 6 3.29 0.5 31 3.79 1
3 25 4 6 2.43 2.5 29 4.93 0
1 16 0 5 2.08 0 16 2.08 0
2 21 2 6 2.39 0.5 23 2.89 1
3 18 4 6 1.77 2.5 22 4.27 0
1 20 0 5 2.6 0 20 2.6 0
2 26 2 6 2.99 0.5 28 3.49 1
3 23 4 6 2.21 2.5 27 4.71 0
1 22 0 2 2.86 0 22 2.86 0
2 29 3 4 3.29 1 32 4.29 1
3 15 2 4 3.72 3.5 17 7.22 0
1 35 10 2 5.33 0 45 5.33 0
2 46 8 4 6.13 1 54 7.13 1
3 24 7 4 6.93 3.5 31 10.43 0
1 8 2 2 1.2 0 10 1.2 0
2 10 2 4 1.38 1 12 2.38 1
3 6 1 4 1.55 3.5 7 5.05 0
1 40 5 8 5.59 0 45 5.59 0
2 28 6 5 7.27 3 34 10.27 0
3 34 6 6 7.27 0.5 40 7.77 1
1 50 10 10 7.28 0 60 7.28 0
2 35 13 7 9.46 3 48 12.46 0
3 42 13 8 9.46 0.5 55 9.96 1
1 25 5 8 3.64 0 30 3.64 0
2 18 6 5 4.73 3 24 7.73 0
3 21 6 6 4.73 0.5 27 5.23 1
1 45 45 22 9.36 0 90 9.36 0
2 32 58 16 12.17 3 90 15.17 0
3 38 58 19 12.17 0.5 96 12.67 1
1 40 40 35 8.32 0 80 8.32 0
2 28 52 24 10.82 3 80 13.82 0
3 34 52 30 10.82 0.5 86 11.32 1
1 15 0 2 1.95 0 15 1.95 0
2 10 4 4 2.54 3 14 5.54 0
3 13 4 4 2.54 0.5 17 3.04 1
1 35 40 12 7.67 0 75 7.67 0
2 24 52 9 9.97 3 76 12.97 0
3 30 52 11 9.97 0.5 82 10.47 1
1 10 25 12 3.25 0 35 3.25 0
2 7 32 9 4.23 3 39 7.23 0
3 8 32 11 4.23 0.5 40 4.73 1
1 22 0 2 2.86 0 22 2.86 0
2 15 4 4 3.72 3 19 6.72 0
3 19 4 4 3.72 0.5 23 4.22 1
1 30 7 6 4.45 0 37 4.45 0
2 21 9 5 5.78 3 30 8.78 0
3 26 9 6 5.78 0.5 35 6.28 1
1 90 0 45 11.7 0 90 11.7 0
2 63 4 32 15.21 3 67 18.21 0
3 76 4 38 15.21 0.5 80 15.71 1
1 65 25 15 10.4 0 90 10.4 0
2 46 32 10 13.52 3 78 16.52 0
3 55 32 13 13.52 0.5 87 14.02 1
1 55 5 12 7.54 0 60 7.54 0
2 38 6 9 9.8 3 44 12.8 0
3 47 6 11 9.8 0.5 53 10.3 1
1 20 20 10 4.16 0 40 4.16 0
2 14 26 7 5.41 3 40 8.41 0
3 17 26 8 5.41 0.5 43 5.91 1
1 80 10 15 11.18 0 90 11.18 0
2 56 13 10 14.53 3 69 17.53 0
3 68 13 13 14.53 0.5 81 15.03 1
1 60 10 12 8.58 0 70 8.58 0
2 42 13 9 11.15 3 55 14.15 0
3 51 13 11 11.15 0.5 64 11.65 1
1 25 0 18 3.25 0 25 3.25 0
2 18 4 12 4.23 3 22 7.23 0
3 21 4 15 4.23 0.5 25 4.73 1
1 55 10 15 7.93 0 65 7.93 0
2 38 13 10 10.31 3 51 13.31 0
3 47 13 13 10.31 0.5 60 10.81 1
1 240 30 30 33.54 0 270 33.54 0
2 168 39 21 43.6 3 207 46.6 0
3 204 39 26 43.6 0.5 243 44.1 1
1 30 15 8 5.07 0 45 5.07 0
2 21 20 5 6.59 3 41 9.59 0
3 26 20 6 6.59 0.5 46 7.09 1
1 30 15 40 5.07 0 45 5.07 0
2 21 20 28 6.59 3 41 9.59 0
3 26 20 34 6.59 0.5 46 7.09 1
1 35 10 2 5.33 0 45 5.33 0
2 24 13 4 6.93 3 37 9.93 0
1 50 40 15 9.62 0 90 9.62 0
2 35 52 10 12.51 3 87 15.51 0
3 42 52 13 12.51 0.5 94 13.01 1
1 22 3 4 3.09 0 25 3.09 0
2 15 4 2 4.02 3 19 7.02 0
3 19 4 3 4.02 0.5 23 4.52 1
1 20 10 15 3.38 0 30 3.38 0
2 14 13 10 4.39 3 27 7.39 0
3 17 13 13 4.39 0.5 30 4.89 1
1 90 0 15 11.7 0 90 11.7 0
2 63 4 10 15.21 3 67 18.21 0
3 76 4 13 15.21 0.5 80 15.71 1
1 50 10 15 7.28 0 60 7.28 0
2 35 13 10 9.46 3 48 12.46 0
3 42 13 13 9.46 0.5 55 9.96 1
1 30 10 12 4.68 0 40 4.68 0
2 21 13 9 6.08 3 34 9.08 0
3 26 13 11 6.08 0.5 39 6.58 1
1 40 15 12 6.37 0 55 6.37 0
2 28 20 9 8.28 3 48 11.28 0
3 34 20 11 8.28 0.5 54 8.78 1
Chapter 22
Group decision making

22.1 Introduction
1073 Group decision making
Hensher and detailed in Brewer and Hensher (2000) and Rose and Hensher
(2004); and (ii) studies that develop ways of establishing the influence and power
of each agent in the joint choice outcome, which may or may not use an IACE
framework. Puckett and Hensher (2006) review this literature, which is
primarily in marketing and household economics and has, for example, been extended and implemented in the study of freight distribution chains by Hensher et al. (2008), in the study of partnering between bus operators and the regulator by Hensher and Knowles (2007) and, most recently, in the household purchase of alternative-fuel vehicles by Hensher et al. (2011) and Beck et al. (2012).
Schematically, the IACE structure involves a sequential engagement of two
or more agents seeking to establish a consensus on their individual preferences that can, through negotiation or “give-and-take,” result in an agreed or
non-agreed joint choice outcome. The representation of the role of each agent
is identified by an additional shadow value of the power or influence of each
agent, given their own individual preferences (see Arora and Allenby 1999;
Aribarg et al. 2002; Corfman 1991; Corfman and Lehmann 1987; Dosman and
Adamowicz 2006; Hensher et al. 2008). We now discuss the IACE method in
detail, and show how the method can be implemented in Nlogit.
$$V_{bi} = \alpha_{bi} + \sum_{k=1}^{K} \left(\beta_{bk} x_{ik}\right), \qquad (22.2)$$
where V_{ai} represents the observed utility derived by agent a for alternative i, α_{ai} represents a constant specific to alternative i (this value can also be generic across alternatives), x_{ik} is a vector of k design attributes associated with alternative i, and β_{ak} is the corresponding vector of marginal (dis)utility parameters. Note that the total utility would be a summation of this observed utility plus an error term that captures unobserved utility. Under the random utility-maximization (RUM) framework, the alternative with the highest total utility is the alternative chosen by that agent.
In the interactive agency process, the initial choices made by agents are
compared. If the same alternative has been selected by both agents, then
it is inferred that this would be the alternative chosen by the group.
Where agreement has been reached between the parties, the choice is said
to be in equilibrium. After each pass, choice tasks where no equilibrium
decision was reached are sent back to each agent for re-evaluation where
one or more of the agents may revise their choice. This process continues
until an equilibrium choice is reached, or the analyst terminates the
process.
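The round-and-pass loop described above can be sketched in a few lines. This is a minimal simulation under simple RUM assumptions, not the authors' method: the utility values are illustrative, and re-evaluation after a failed pass is modeled merely as a fresh Gumbel error draw for each agent.

```python
import math
import random

def gumbel(rng):
    # standard Gumbel (extreme value type I) draw via the inverse CDF
    return -math.log(-math.log(rng.random()))

def iace_task(v1, v2, max_passes=3, seed=0):
    """Run one choice task through a simplified IACE loop.

    v1, v2: observed utilities of each alternative for agents 1 and 2
    (illustrative values). In each pass, both agents choose the alternative
    with the highest total (observed + random) utility; matching choices
    constitute an equilibrium. Returns (chosen alternative or None, passes used).
    """
    rng = random.Random(seed)
    for p in range(1, max_passes + 1):
        c1 = max(range(len(v1)), key=lambda i: v1[i] + gumbel(rng))
        c2 = max(range(len(v2)), key=lambda i: v2[i] + gumbel(rng))
        if c1 == c2:            # agreement: the joint choice is in equilibrium
            return c1, p
    return None, max_passes     # stop rule imposed after the final pass

choice, passes = iace_task([5.0, 0.0], [5.0, 0.0])
```

With a large utility gap, both simulated agents usually select the first alternative in pass 1; tasks that fail to equilibrate within three passes return None, mirroring the analyst-imposed stop rule.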
For equilibrated choices, the same choice is observed for each member of
the group (i.e., ignoring tasks where no agreement was reached). As such, the
inferred utility of group g can be defined as:
$$V_{gi} = \alpha_{gi} + \sum_{k=1}^{K} \left(\beta_{gk} x_{ik}\right). \qquad (22.3)$$
However, if the assumption is made that the group utility is a function of the
individual preferences of each agent weighted by the level of influence of the
agent (or perhaps in the case of a cooperative household, the agent’s level of
responsibility for the decision or the importance of the decision for one agent
relative to the other), then it is possible to define the utility of group g as:
$$\lambda_a = \frac{e^{\theta}}{1 + e^{\theta}}. \qquad (22.6)$$
This modeling structure lends itself to the mixed logit (ML) model.
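Equation (22.6) maps an unbounded influence parameter θ to a power weight in (0, 1). The sketch below illustrates this; the weighted group utility form (λ applied to agent a, 1 − λ to agent b) is an assumed functional form for illustration, not taken verbatim from the text.

```python
import math

def power_weight(theta):
    """Eq. (22.6): logistic transform of theta; returns lambda_a in (0, 1)."""
    return math.exp(theta) / (1.0 + math.exp(theta))

def group_utility(v_a, v_b, theta):
    """Illustrative influence-weighted group utility for one alternative:
    lambda_a * V_a + (1 - lambda_a) * V_b (assumed weighted form)."""
    lam = power_weight(theta)
    return lam * v_a + (1.0 - lam) * v_b

# theta = 0 implies equal influence: lambda_a = 0.5
print(power_weight(0.0))   # 0.5
```

Because the two agents' weights are λ and 1 − λ, they sum to one by construction, which is what makes comparisons of agent influence straightforward.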
To assist in tracking the behavioral outputs of the IACE framework we
refer to each stage in which an agent reviews a choice set of alternatives
and indicates their preferred alternative as a round. When both agents
have completed a round in a sequence, we refer to the completion of a
pass. In the example in this chapter, we have designed the IACE such that
each agent can go up to three rounds before we impose a stop rule, and
hence there are three passes in total. Specific agents may stop after their
initial round (that is, at the completion of pass 1) if they both choose the
same alternative. Agents who do not agree will continue to pass 2 (that is,
rounds 3 and 4 for agents 1 and 2). In pass 2 we commence a process of
feedback, review, and revision or maintenance of pass 1 preferences. In
pass 2, both agents will have knowledge of the preferences of the other
agent; in contrast, this is only known in pass 1 if the analyst provides this
information to the subsequent agent.
If agreement occurs in pass 2 for some agents, then a subset will continue
into pass 3 (rounds 5 and 6). Previous studies have found that the majority of
agent pairs agree (cooperatively or non-cooperatively) by the end of round 6.
One of our hypotheses is that specific attributes and their levels become the
major reason for non-agreement in earlier passes, and as the negotiations
unfold there appears to be some degree of give-and-take in the interests of a
resolution, so that an “equilibrium” joint choice can be actioned in the market.
What is of particular interest here is identifying what exogenous drivers
influence agreement and non-agreement at the end of each pass, and how
this knowledge can be used to identify the relative influence, and hence power,
of each agent in decision making. The IACE framework, supplemented by
additional contextual data, enables the analyst to investigate possible influences on agreement, and the inferred power of each agent (Table 22.1).
We propose the following model system as a way of establishing the
preferences of each agent in an IACE framework and the role that each agent’s
individual preferences play in establishing the group preference function for
choice making.
Stage 1: Each agent participates in a stated choice experiment (CE) with
common choice sets. The behavioral process assumes that each agent acts as if
they are a utility maximizer. The agent-specific models define utility expressions of the form U(alt i, agent q), i = 1, . . ., J; q = 1, . . ., Q, where alt defines an
alternative package of attribute levels. For example, with two agents and three
alternatives we have U(a1q1), U(a2q1), U(a3q1) for agent 1 and U(a1q2),
U(a2q2), U(a3q2) for agent 2. An unlabeled stated choice (SC) design (in our
case study – see p. 1079) will be established to parameterize this independent
utility maximizing choice model.
Stage 1 involves a series of rounds and passes as described above, with all
agents participating in pass 1 and incrementally reducing as we move to the
next pass as a result of agreement between parties. Each pass defines a set of
alternatives for each agent that can be jointly modeled as ML. A by-product of
the estimation of pass-level models is a binary logit model for agree and non-agree. In the application setting of vehicle purchase set out in Section 22.5, there
are four identical alternatives assessed by each agent, giving eight alternatives to
be included in the estimation of the pass model for each agent pair.
Stage 2: This involves recognition of the final pass in which an agent pair
agreed and the estimation of a single model in which the utility functions for
Table 22.1 Schematic of the IACE pass structure (agent 1 → agent 2 choice pairings)

Pass 1 (AG1 → AG2):
Agree: Alt 1 – Alt 1; Alt 2 – Alt 2; Alt 3 – Alt 3
Not agree: Alt 1 – Alt 2; Alt 1 – Alt 3; Alt 2 – Alt 1; Alt 2 – Alt 3; Alt 3 – Alt 1; Alt 3 – Alt 2
Pass 2 (AG1 → AG2): same agree/not agree pairings, for pairs not agreeing in pass 1
Pass 3 (AG1 → AG2): same agree/not agree pairings, for pairs not agreeing in pass 2
(The original table also lists the entries Current, TC 1, and TC 2 against each pass.)
each agent are drawn from the pass at which the agents agreed. We refer to this phase as establishing group equilibrium preferences. Importantly, these preferences have benefitted from the sequential process undertaken across the passes, and hence the parameterization of each attribute to reveal the preferences of each agent in joint agent choice making space is enriched by the negotiation that has been completed to establish consensus in choice,
The power measures for agents q and −q sum to one, making comparisons of agent types straightforward. If the two power measures are equal for a given attribute mix defining a proposition (that is, λ_qp = (1 − λ_qp) = 0.5), then group choice equilibrium is not governed by a dominant agent with respect to
1 The ASCs may not be imported. One can also jointly estimate the attribute parameters and power weights. See Section 22.5.14.
The data used in the implementation of the IACE framework were collected from first- and second-year marketing undergraduate students. The focus is on choosing an automobile type, assuming the respondent is in the market today to acquire a vehicle (regardless of how many vehicles the household currently owns). The main part of the survey is an unlabeled SC experiment, administered to pairs of individuals from the same household who are asked to work through a series of CEs in an interactive way to arrive at a choice outcome.
Before commencing the SC experiment, respondents were asked a series of
questions related to their currently owned vehicle. This information was then
used to assign agent pairs to a vehicle type (small, medium, large, four-wheel
drive, or luxury vehicle) so as to provide context for the experiment. The
attributes and attribute levels of the SC experiment are shown in Table 22.2.
The list is not extensive, given our primary interest in testing the capability of
undertaking an IACE task using the internet. Considerable logistical effort was required to have both agents available at the same time to participate in the survey, far more than when surveying a single independent agent.
209 agent pairs participated; 31 were assigned to the small vehicle condition, 66 to the medium vehicle condition, 35 to the large, 31 to four-wheel drives, and the remaining 46 to the luxury vehicle experimental condition. We have selected a sub-sample of pairs comprising a male and a female.
where k represents the number of parameters for the design, LL(β) the log-
likelihood (LL) function of the discrete choice model under consideration, N
the sample size, and β the parameters to be estimated from the design. Given
that we are generating designs and not estimating parameters for an already
existing design, it is necessary to assume a set of priors for the parameter
estimates. Given uncertainty as to the actual population parameters, it is
typical to draw these priors from Bayesian distributions rather than assume
fixed parameter values.
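The Bayesian D-error calculation just described can be sketched for the MNL case. This is an illustrative implementation, not the authors' code: the design array layout, the normal priors, and the use of plain Monte Carlo draws (rather than, say, Halton sequences) are all assumptions made here.

```python
import numpy as np

def d_error(design, beta):
    """D-error = det(AVC)^(1/K) for an MNL model.

    design: array of shape (S, J, K) -- S choice sets, J alternatives,
    K attributes; beta: parameter vector of length K.
    """
    S, J, K = design.shape
    info = np.zeros((K, K))
    for s in range(S):
        x = design[s]                             # (J, K) attribute levels
        p = np.exp(x @ beta)
        p = p / p.sum()                           # MNL choice probabilities
        info += x.T @ (np.diag(p) - np.outer(p, p)) @ x
    avc = np.linalg.inv(info)                     # asymptotic variance-covariance
    return np.linalg.det(avc) ** (1.0 / K)

def db_error(design, prior_mean, prior_sd, draws=200, seed=0):
    """Bayesian D(b)-error: the D-error averaged over prior draws of beta."""
    rng = np.random.default_rng(seed)
    betas = rng.normal(prior_mean, prior_sd, size=(draws, len(prior_mean)))
    return float(np.mean([d_error(design, b) for b in betas]))

# tiny hypothetical design: 4 identical choice sets, 3 alternatives, 2 attributes
design = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]] * 4)
```

A design with a lower D(b)-error is preferred because the single scalar summarizes the whole AVC matrix, scaled by the number of parameters through the 1/K power.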
The D(b)-error is calculated by taking the determinant of the AVC matrix, scaled to take into account the number of parameters to be estimated. Taking the determinant involves a series of multiplications and subtractions over all the elements of the matrix (see, for example, Kanninen 2002). As such, the determinant (and by implication, the D(b)-error measure) summarizes all the elements of the matrix in a single
The analysis has been undertaken in a sequence that matches the IACE stages
presented in Section 22.2. We start with the empirical evidence for each of the
passes (Table 22.4), followed by the sources of influence on agreement versus
non-agreement (Table 22.5). Table 22.6 presents the evidence on the power
influence that agents play in each pass, followed by the findings for the
preference model estimated on the agreement pass for each agent pair,
referred to as the group equilibrium model (Table 22.7). The remaining two
tables present the probability contrasts between the sequenced pass models and the pooled equilibrium passes (Table 22.7) and the sources of agreement versus disagreement in the group equilibrium model (Table 22.8).
In Table 22.4, Alternatives A–C and E–G refer to the three unlabeled vehicle alternatives, respectively, for agents 1 and 2, while Alternatives D and H are the null alternative (that is, “none” in Table 22.3) for each agent. 404 agent pairs participated in pass 1, with a progression to 164 agent pairs in pass 2, and 91 agent pairs in pass 3.
In the ML pass 1 model, the marginal disutility of the price of the vehicle for
agent 1 and the fuel efficiency (litres/100km) of the vehicles for both agents
were identified as random parameters with constrained triangular distributions, while vehicle price is a fixed parameter for agent 2. This supports the
presence of preference heterogeneity for these two attributes for agent 1 and
fuel efficiency only for agent 2. Although the alternatives are unlabeled, a "generic" constant for the three vehicle alternatives for each of the agents was positive and statistically significant (although marginally so for agent 2),² suggesting that there are, on average, additional unobserved influences on relative utility that contribute more than three times to the utility of the alternatives for agent 1 compared to agent 2. The implication is that agent 1
2 The ASCs for agents 1 and 2 were estimated separately for A, B, and C, but we found that they were almost identical, suggesting no order bias after controlling for the explicit attributes of alternatives. We then treated them as generic constants across A–C and E–G.
Table 22.4 Pass model results
Table 22.4 (cont.)
Note: Alternative A–C (E–G) = automobile attribute packages for agent 1 (agent 2); Alternative D (H) = null alternative for agent 1 (agent 2)
Attribute Alternative(s) ML
Random parameters:
Vehicle price (000s) A–C −0.05148 (−2.1)
Fuel efficiency (litres/100km) A–C −0.19889 (−2.2)
Fixed parameters:
Air conditioning (1,0) A–C 1.4053 (8.1)
Manual transmission (1,0) A–C 1.1743 (6.7)
ABS brakes (1,0) A–C 0.6399 (3.8)
Agent 1 Male (1,0) D 0.6366 (1.8)
No. of cars in household D 0.3519 (3.2)
Null alternative constant D −2.4717 (−2.4)
Pass number D −0.8434 (−2.2)
Random parameter standard deviations
Vehicle price (000s) A–C 0.05148 (2.1)
Fuel efficiency (litres/100km) A–C 0.07956 (2.2)
Error component (alternative-specific heterogeneity) D 0.0153 (4.3)
Sample size 333
LL at zero −679.97
LL at convergence −358.01
Note: We only need to estimate models on A–D since both agents agreed and attribute levels are identical per alternative. We have, however, included the SECs of each agent in the model.
on the choice between the three vehicle alternatives and the null. We have
added in a variable to represent the number of passes leading up to agreement.
The negative sign indicates that, after controlling for the influence of all other
observed and unobserved effects, as the number of passes increases, the
probability of choosing the null decreases. This is intuitively plausible to the extent that each agent gathers more information on the preferences of
the other party and has more opportunity to negotiate through feedback and
review, resulting in an increasing probability of agreeing on a vehicle choice.
The final model provides empirical estimates of group equilibrium preferences. These enable us to establish a set of probabilities of choosing each
vehicle package and the null. What is particularly interesting is the extent to
which, on average, the application of the group equilibrium model results in
different mean probability estimates than those obtained for each of the
passes, especially pass 1, which is equivalent to the traditional one-pass SC
experiment. The comparisons are summarized in Table 22.7.
If we used pass 1 as the correct preference revelation setting (essentially equivalent to what we obtain when we run an agent-independent CE with no feedback and revision), then relative to group choice equilibrium, alternatives AE (1,5) and BF (2,6) would have an over-estimated mean choice probability for both agents; alternative CG (3,7) has a mean estimate that is identical for agent 1 but an over-estimate in pass 1 for agent 2. The greatest difference is for the null alternative DH (4,8), where there is a significant under-estimate in pass 1. Passes 2 and 3 are not so informative since they represent the agents who do not agree in pass 1, outcomes that are not usually captured in the traditional non-feedback revision CE setting.
To gain a more systematic appreciation of the influences on agent agreement in the initial pass, in contrast to agreement in subsequent passes, we ran a binary logit model. The findings, given in Table 22.8, suggest that
Table 22.8 Sources of agreement: passes 2 and 3 versus pass 1 group equilibrium
males are more likely to agree in pass 1 than females. The greater the age
difference, the more likely the negotiation will continue beyond pass 1.
Relative to the agent pair being a brother and sister (base), married, de facto
and non-related couples are more likely to disagree in pass 1 and continue to a
second or third pass. This reinforces the evidence for passes 1, 2, and 3 as
reported in the agree–non-agree models (Table 22.5).
The Nlogit set-ups are exactly the ones used to obtain the findings presented in Section 22.4. We have edited out some of the output that is not essential to guide users in setting up the command stream. Passes 1, 2, and 3 have the same command structure, except that different subsets of data are used, as shown by the reject commands we have listed.
Load;file =C:\Papers\WPs2011\IACECar\IACE_Car_MF.sav$
reject;pass#1$
reject;rnd>2$
reject;alt>8$ ? To eliminate observations that are not applicable
create;pricez=price/1000$
rplogit
;lhs=choice1,cset,altijz
;choices=altA1,altB1,altC1,altD1,altA2,altB2,altC2,altD2 ? 4 alternatives for agents 1 & 2
;rpl;fcn=price1(t,1),fuelef12(t,1);halton;pts=500
;par;utility=util1;prob=pass1p ? storing utilities and probabilities
;model:
U(altA1)=ASC1+price1*pricez+fuelef12*fuel+smallv1*smallv+ac1*ac+trans1*trans+abs1*abs/
U(altB1)=ASC1+price1*pricez+fuelef12*fuel+smallv1*smallv+ac1*ac+trans1*trans+abs1*abs/
U(altC1)=ASC1+price1*pricez+fuelef12*fuel+smallv1*smallv+ac1*ac+trans1*trans+abs1*abs/
U(altD1)=age1*ageA/
U(altA2)=asc2+price2*pricez+fuelef12*fuel+ac2*ac+trans2*trans+abs2*abs/
U(altB2)=asc2+price2*pricez+fuelef12*fuel+ac2*ac+trans2*trans+abs2*abs/
U(altC2)=asc2+price2*pricez+fuelef12*fuel+ac2*ac+trans2*trans+abs2*abs/
U(altD2)=genderd2*genderb$
Normal exit from iterations. Exit status=0.
+---------------------------------------------------------------+
| Random Parameters Logit Model |
| Maximum Likelihood Estimates |
| Dependent variable CHOICE1 |
| Number of observations 808 |
| Iterations completed 17 |
| Log likelihood function -949.0695 |
| Number of parameters 14 |
| Info. Criterion: AIC = 2.38384 |
| Info. Criterion: BIC = 2.46518 |
| Restricted log likelihood -1680.189 |
| McFadden Pseudo R-squared .4351412 |
| At start values -950.6900 .00170 |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Random Parameters Logit Model |
| Replications for simulated probs. = 500 |
| Halton sequences used for simulations |
+---------------------------------------------------------------+
+--------+--------------+----------------+--------+---------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+--------+--------------+----------------+--------+---------+
---------+Random parameters in utility functions
PRICE1 | -.08419594 .02759731 -3.051 .0023
FUELEF12| -.15328834 .06085694 -2.519 .0118
---------+Nonrandom parameters in utility functions
ASC1 | 3.49565266 1.06965972 3.268 .0011
SMALLV1 | -1.46097818 .62849023 -2.325 .0201
AC1 | 1.13049774 .14996959 7.538 .0000
+---------------------------------------------------------------+
| Binary Logit Model for Binary Choice |
| Maximum Likelihood Estimates |
| Dependent variable AGREE1 |
| Number of observations 3232 |
| Iterations completed 11 |
| Log likelihood function -1367.056 |
| Number of parameters 10 |
| Info. Criterion: AIC = .85214 |
| Restricted log likelihood -1502.479 |
| McFadden Pseudo R-squared .0901328 |
+---------------------------------------------------------------+
+--------+--------------+----------------+--------+---------+-----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X |
+--------+--------------+----------------+--------+---------+-----------+
---------+Characteristics in numerator of Prob[Y = 1]
AGEA | .03993165 .03720238 1.073 .2831 3.63366337
GENDERA | .83576694 .15363277 5.440 .0000 .54455446
AGEB | -.03620329 .03792487 -.955 .3398 3.86138614
GENDERB | .89030702 .15153326 5.875 .0000 .45544554
NUMCARS | -.09759253 .03868549 -2.523 .0116 -18.1980198
TOLD | .92658029 .12092513 7.662 .0000 .24381188
CHOOSE | 1.61992446 .11679118 13.870 .0000 .44925743
create
;passA1=pass1p
;passB1=pass1p[+1]
;passC1=pass1p[+2]
;passD1=pass1p[+3]
;passA2=pass1p[+16]
;passB2=pass1p[+17]
;passC2=pass1p[+18]
;passD2=pass1p[+19]$
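The [+k] offsets in the create block above pull values from rows further down the stacked data onto the current row. A plain Python sketch of the same operation follows; the 32-rows-per-pair layout and the probability values are illustrative assumptions.

```python
# Stacked layout: rows 0-3 hold agent 1's four alternative probabilities,
# rows 16-19 agent 2's, within a 32-row block per agent pair
# (illustrative values; zeros pad the unused rows).
pass1p = [0.40, 0.30, 0.20, 0.10] + [0.0] * 12 \
       + [0.35, 0.25, 0.25, 0.15] + [0.0] * 12

row = 0   # first row of the pair, as in the Nlogit create block
passA1, passB1, passC1, passD1 = (pass1p[row + k] for k in range(4))
passA2, passB2, passC2, passD2 = (pass1p[row + 16 + k] for k in range(4))
```

After this step, every quantity needed for the joint-outcome calculations sits on a single row per agent pair.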
create
;coopA=passA1*passA2 ? alt A agent 1 and alt A agent 2
;ncoop12=passA1*passB2
;ncoop13=passA1*passC2
;ncoop14=passA1*passD2
;ncoop21=passB1*passA2
;coopB=passB1*passB2
;ncoop23=passB1*passC2
;ncoop24=passB1*passD2
;ncoop31=passC1*passA2
;ncoop32=passC1*passB2
;coopC=passC1*passC2
;ncoop34=passC1*passD2
;ncoop41=passD1*passA2
;ncoop42=passD1*passB2
;ncoop43=passD1*passC2
;coopD=passD1*passD2$
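The create block above forms the 16 joint outcomes as products of the two agents' marginal choice probabilities, with independence within the pass implicit. In plain Python, with illustrative values:

```python
# Marginal pass-1 choice probabilities for each agent (illustrative values).
p1 = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}   # agent 1
p2 = {"A": 0.35, "B": 0.25, "C": 0.25, "D": 0.15}   # agent 2

# coopX terms are the matched (agreeing) outcomes; the ncoopXY terms the rest.
joint = {(i, j): p1[i] * p2[j] for i in p1 for j in p2}
coop = sum(joint[(i, i)] for i in p1)   # probability that the pair agrees
```

Since the 16 joint probabilities exhaust all pairings, they sum to one, and the agreement probability is simply the sum of the four diagonal (coop) terms.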
22.5.6 Removing all but line 1 of the four choice sets per person in pair
create;lined=dmy(32,1)$
reject;lined#1$ ? To use only line one of the 32
namelist;cprobs=coopA,ncoop12,ncoop13,ncoop14,ncoop21,coopB,ncoop23,ncoop24,
ncoop31,ncoop32,coopC,ncoop34,ncoop41,ncoop42,ncoop43,coopD$
namelist;passpr=passA1,passB1,passC1,passD1,passA2,passB2,passC2,passD2$
dstats;rhs=cprobs,passpr,rnd,pass,lined$
Descriptive Statistics
22.5.7 Getting utilities on 1 line (note: focusing only on overall utilities at this stage)
Sample;all$
reject;pass#1$
reject;rnd>2$
reject;alt>8$ ? To eliminate observations that are not applicable
create
;utilA1=util1
;utilB1=util1[+1]
;utilC1=util1[+2]
;utilD1=util1[+3]
;utilA2=util1[+16]
;utilB2=util1[+17]
;utilC2=util1[+18]
;utilD2=util1[+19]$
write;
pass,rnd,coopA,utilA1,utilA2,
pass,rnd,ncoop12,utilA1,utilB2,
pass,rnd,ncoop13,utilA1,utilC2,
pass,rnd,ncoop14,utilA1,utilD2,
pass,rnd,ncoop21,utilB1,utilA2,
pass,rnd,coopB,utilB1,utilB2,
pass,rnd,ncoop23,utilB1,utilC2,
pass,rnd,ncoop24,utilB1,utilD2,
pass,rnd,ncoop31,utilC1,utilA2,
pass,rnd,ncoop32,utilC1,utilB2,
pass,rnd,coopC,utilC1,utilC2,
pass,rnd,ncoop34,utilC1,utilD2,
pass,rnd,ncoop41,utilD1,utilA2,
pass,rnd,ncoop42,utilD1,utilB2,
pass,rnd,ncoop43,utilD1,utilC2,
pass,rnd,coopD,utilD1,utilD2
;format=(15(5F12.5/)5f12.5)
;file=C:\Papers\WPs2011\IACECar\Pass1Power.txt$
reset
read;file=C:\Papers\WPs2011\IACECar\Pass1Power.txt
;names= pass,rnd,prob,util1,util2
;format=(5f12.5);nobs= 1616 ;nvar=5$
dstats;rhs=*$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
All observations in current sample
--------------------------------------------------------------------------------------------------
PASS | 1.00000 .000000 1.00000 1.00000 1616
RND | 1.46535 .498952 1.00000 2.00000 1616
PROB | .658249E-01 .655178E-01 .730000E-03 .556130 1616
UTIL1 | .829769 1.09591 -1.73545 3.49634 1592
UTIL2 | .849197 1.07817 -1.73545 3.49634 1336
create;diffut=util1-util2;lprob=log(prob/(1-prob))$
crmodel
;lhs=lprob
;rhs=one,util1,util2
;cls:b(2)+b(3)=1$
+-------------------------------------------------------------------------+
| Ordinary least squares regression |
| LHS=LPROB Mean = -3.124691 |
| Standard deviation = 1.121428 |
| WTS=none Number of observs. = 1616 |
| Model size Parameters = 3 |
| Degrees of freedom = 1613 |
| Residuals Sum of squares = 2023.224 |
| Standard error of e = 1.119966 |
| Fit R-squared = .3841372E-02 |
| Adjusted R-squared = .2606210E-02 |
| Model test F[ 2, 1613] (prob) = 3.11 (.0449) |
| Diagnostic Log likelihood = -2474.593 |
| Restricted(b=0) = -2477.703 |
| Info criter. LogAmemiya Prd. Crt. = .2284510 |
| Akaike Info. Criter. = .2284510 |
+-------------------------------------------------------------------------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| -3.09768641 .03061562 -101.180 .0000
UTIL1 | .00030301 .00023926 1.266 .2054 -14.0191879
UTIL2 | .00013201 .764675D-04 1.726 .0843 -172.392001
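The `crmodel` command regresses the log-odds of the joint choice probability on the two agents' utilities, with the power weights constrained to sum to one via `;cls:b(2)+b(3)=1`. The restriction can equivalently be imposed by substitution, regressing lprob − util2 on util1 − util2; a sketch on simulated data (all values illustrative):

```python
import random

random.seed(0)
n = 500
util1 = [random.gauss(1.0, 1.0) for _ in range(n)]
util2 = [random.gauss(1.0, 1.0) for _ in range(n)]
# Log-odds generated with power weights 0.7 and 0.3 (summing to 1)
lprob = [-3.0 + 0.7 * u1 + 0.3 * u2 + random.gauss(0.0, 0.1)
         for u1, u2 in zip(util1, util2)]

# Impose b2 + b3 = 1 by substitution:
#   lprob - util2 = a + b2 * (util1 - util2)
y = [lp - u2 for lp, u2 in zip(lprob, util2)]
x = [u1 - u2 for u1, u2 in zip(util1, util2)]
mx, my = sum(x) / n, sum(y) / n
b2 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b3 = 1.0 - b2          # the restriction recovers the second power weight
a = my - b2 * mx
```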
create
;if(pass=1)chp1=choice1
;if(pass=2)chp2=choice1$
reject;pass#2$
create;if(rnd=3|rnd=4)rnd34=1$
reject;rnd34#1$
reject;alt>8$ ? To eliminate obs that are not applicable
nlogit
;lhs=choice1,cset,alt
;choices=altA1,altB1,altC1,altD1,altA2,altB2,altC2,altD2
;utility=util2;prob=pass2p
;ecm= (altD1),(altD2)
;model:
U(altA1)=ac1*ac+trans1*trans+abs1*abs+manualb1*manualb/
U(altB1)=ac1*ac+trans1*trans+abs1*abs+manualb1*manualb/
U(altC1)=ac1*ac+trans1*trans+abs1*abs+manualb1*manualb/
U(altD1)=choose*choose/
U(altA2)=smallv2*smallv+ac2*ac+trans2*trans/
U(altB2)=smallv2*smallv+ac2*ac+trans2*trans/
U(altC2)=smallv2*smallv+ac2*ac+trans2*trans/
U(altD2)=choose*choose$
+---------------------------------------------------------------+
| Error Components (Random Effects) model |
| Replications for simulated probs. = 500 |
| Number of obs.= 327, skipped 0 bad obs. |
+---------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
---------+Nonrandom parameters in utility functions
AC1 | .49005645 .18805095 2.606 .0092
TRANS1 | .64838425 .18805761 3.448 .0006
ABS1 | .68111715 .19230800 3.542 .0004
MANUALB1| -1.40274509 .48704108 -2.880 .0040
CHOOSE | .73906653 .33872551 2.182 .0291
SMALLV2 | 1.51774727 1.09623989 1.385 .1662
AC2 | .88020449 .20142269 4.370 .0000
TRANS2 | .75316259 .20134452 3.741 .0002
---------+Standard deviations of latent random effects
SigmaE01| .00668620 .00541920 1.234 .2173
SigmaE02| .09251584 .00552830 16.735 .0000
logit
;lhs=agree1
;rhs=agea,gendera,ageb,genderb,told,choose,marr,defacto,notrel$
Normal exit from iterations. Exit status=0.
+---------------------------------------------------------------+
| Binary Logit Model for Binary Choice |
| Maximum Likelihood Estimates |
| Dependent variable AGREE1 |
| Number of observations 1308 |
| Iterations completed 5 |
| Log likelihood function -798.1246 |
| Number of parameters 9 |
| Info. Criterion: AIC = 1.23414 |
| Restricted log likelihood -895.2948 |
| McFadden Pseudo R-squared .1085343 |
+---------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
---------+Characteristics in numerator of Prob[Y = 1]
AGEA | .04472296 .04338149 1.031 .3026 3.43425076
GENDERA | -.37357912 .19417657 -1.924 .0544 .52293578
AGEB | -.14802374 .04465762 -3.315 .0009 3.55351682
GENDERB | -.12024167 .19242782 -.625 .5321 .47706422
TOLD | .81835993 .14352136 5.702 .0000 .55351682
CHOOSE | 2.18160246 .20911924 10.432 .0000 .19266055
MARR | .71989585 .21491511 3.350 .0008 .12844037
Sample;all$
reject;pass#2$
create;if(rnd=3|rnd=4)rnd34=1$
reject;rnd34#1$
reject;alt>8$ ? To eliminate obs that are not applicable
create
;utilA1=util1
;utilB1=util1[+1]
;utilC1=util1[+2]
;utilD1=util1[+3]
;utilA2=util1[+16]
;utilB2=util1[+17]
;utilC2=util1[+18]
;utilD2=util1[+19]$
create;lined=dmy(32,1)$
reject;lined#1$ ? To use only line one of the 32
write;
pass,rnd,coopA,utilA1,utilA2,
pass,rnd,ncoop12,utilA1,utilB2,
pass,rnd,ncoop13,utilA1,utilC2,
pass,rnd,ncoop14,utilA1,utilD2,
pass,rnd,ncoop21,utilB1,utilA2,
pass,rnd,coopB,utilB1,utilB2,
pass,rnd,ncoop23,utilB1,utilC2,
pass,rnd,ncoop24,utilB1,utilD2,
pass,rnd,ncoop31,utilC1,utilA2,
pass,rnd,ncoop32,utilC1,utilB2,
pass,rnd,coopC,utilC1,utilC2,
pass,rnd,ncoop34,utilC1,utilD2,
pass,rnd,ncoop41,utilD1,utilA2,
pass,rnd,ncoop42,utilD1,utilB2,
pass,rnd,ncoop43,utilD1,utilC2,
pass,rnd,coopD,utilD1,utilD2
;format=(15(5F12.5/)5f12.5)
;file=C:\Papers\WPs2011\IACECar\Pass2Power.txt$
reset
read;file=C:\Papers\WPs2011\IACECar\Pass2Power.txt
;names= pass,rnd,prob,util1,util2
;format=(5f12.5);nobs= 1616 ;nvar=5$
Last observation read from data file was 752
dstats;rhs=*$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
All observations in current sample
--------------------------------------------------------------------------------------------------
PASS | 2.00000 .000000 2.00000 2.00000 752
RND | 3.36170 .480813 3.00000 4.00000 752
PROB | .666169E-01 .509216E-01 .166000E-02 .285910 736
UTIL1 | 1.11452 .717230 -.397410 2.55852 728
UTIL2 | 1.17962 .689951 -.274140 2.55852 472
create
;diffut=util1-util2
;lprob=log(prob/(1-prob))$
crmodel
;lhs=lprob
;rhs=one,util1,util2
;cls:b(2)+b(3)=1$
+-------------------------------------------------------------------------+
| Ordinary least squares regression |
| LHS=LPROB Mean = -24.12440 |
| Standard deviation = 143.8358 |
| WTS=none Number of observs. = 752 |
| Model size Parameters = 3 |
| Degrees of freedom = 749 |
| Residuals Sum of squares = 9857381. |
| Standard error of e = 114.7202 |
| Fit R-squared = .3655639 |
| Adjusted R-squared = .3638698 |
| Model test F[ 2, 749] (prob) = 215.79 (.0000) |
| Diagnostic Log likelihood = -4631.896 |
| Restricted(b=0) = -4802.983 |
| Chi-sq [ 2] (prob) = 342.17 (.0000) |
+-------------------------------------------------------------------------+
| Linearly restricted regression |
| Ordinary least squares regression |
| LHS=LPROB Mean = -24.12440 |
| Standard deviation = 143.8358 |
| WTS=none Number of observs. = 752 |
| Model size Parameters = 2 |
| Degrees of freedom = 750 |
| Residuals Sum of squares = .1561210E+08 |
| Standard error of e = 144.2780 |
| Fit R-squared = -.4818718E-02 |
| Adjusted R-squared = -.6158477E-02 |
| Diagnostic Log likelihood = -4804.790 |
| Restricted(b=0) = -4802.983 |
| Info criter. LogAmemiya Prd. Crt. = 9.946140 |
| Akaike Info. Criter. = 9.946140 |
+-------------------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant| 20.2625944 6.47789105 3.128 .0018
UTIL1 | .96009981 .01110134 86.485 .0000 -30.8040281
UTIL2 | .03990019 .01110134 3.594 .0003 -371.227687
reset
Load;file =C:\Papers\WPs2011\IACECar\IACE_Car_MF.sav$
reject;pass#3$
create;if(rnd=5|rnd=6)rnd56=1$
reject;rnd56#1$
reject;alt>8$ ? To eliminate obs that are not applicable
create
;pricez=price/1000$
create
;if(relation=1)marr=1
;if(relation=2)defacto=1
;if(relation=3)notrel=1$
nlogit
;lhs=choice1,cset,alt
;choices=altA1,altB1,altC1,altD1,altA2,altB2,altC2,altD2
;utility=util2;prob=pass2p
;model:
U(altA1)=ASC1+ac1*ac+trans1*trans+abs1*abs/
U(altB1)=ASC1+ac1*ac+trans1*trans+abs1*abs/
U(altC1)=ASC1+ac1*ac+trans1*trans+abs1*abs/
U(altD1)=marr*marr/
U(altA2)=ac2*ac/
U(altB2)=ac2*ac/ ?+trans2*trans
U(altC2)=ac2*ac/ ?+trans2*trans
U(altD2)=marr*marr$
Normal exit from iterations. Exit status=0.
+--------------------------------------------------------------+
| Discrete choice (multinomial logit) model |
| Maximum Likelihood Estimates |
| Dependent variable Choice |
| Number of observations 182 |
| Iterations completed 6 |
| Log likelihood function -237.1371 |
| Number of parameters 6 |
| Info. Criterion: AIC = 2.67184 |
| Info. Criterion: BIC = 2.77746 |
| R2=1-LogL/LogL Log-L fncn R-sqrd RsqAdj |
| Number of obs.= 182, skipped 0 bad obs. |
+--------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
ASC1 | -.88925055 .38018939 -2.339 .0193
AC1 | .55795018 .26762586 2.085 .0371
TRANS1 | .41948069 .26803721 1.565 .1176
ABS1 | .61454275 .28223996 2.177 .0295
MARR | -1.71948941 1.05072358 -1.636 .1017
AC2 | .94820487 .26548810 3.572 .0004
logit
;lhs=agree1
;rhs=genderb,gendera,numcars,marr,choose,ageb$
Normal exit from iterations. Exit status=0.
+---------------------------------------------------------------+
| Binary Logit Model for Binary Choice |
| Maximum Likelihood Estimates |
| Dependent variable AGREE1 |
| Number of observations 728 |
| Iterations completed 5 |
| Log likelihood function -358.5647 |
| Number of parameters 6 |
| Info. Criterion: AIC = 1.00155 |
| Restricted log likelihood -383.3864 |
| McFadden Pseudo R-squared .0647432 |
+---------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
---------+Characteristics in numerator of Prob[Y = 1]
GENDERB | -1.33326571 .22254840 -5.991 .0000 .43406593
GENDERA | -1.51788811 .22970120 -6.608 .0000 .56593407
NUMCARS | .22571487 .07816808 2.888 .0039 1.73076923
MARR | .78591991 .28140716 2.793 .0052 .10989011
CHOOSE | 1.13743684 .26059781 4.365 .0000 .10439560
AGEB | -.14127984 .04296746 -3.288 .0010 3.68681319
create
;passA1=pass3p
;passB1=pass3p[+1]
;passC1=pass3p[+2]
;passD1=pass3p[+3]
;passA2=pass3p[+16]
;passB2=pass3p[+17]
;passC2=pass3p[+18]
;passD2=pass3p[+19]$
create
;coopA=passA1*passA2
;ncoop12=passA1*passB2
;ncoop13=passA1*passC2
;ncoop14=passA1*passD2
;ncoop21=passB1*passA2
;coopB=passB1*passB2
;ncoop23=passB1*passC2
;ncoop24=passB1*passD2
;ncoop31=passC1*passA2
;ncoop32=passC1*passB2
;coopC=passC1*passC2
;ncoop34=passC1*passD2
;ncoop41=passD1*passA2
;ncoop42=passD1*passB2
;ncoop43=passD1*passC2
;coopD=passD1*passD2$
create;lined=dmy(32,1)$
reject;lined#1$ ? To use only line one of the 32
namelist;cprobs=coopA,ncoop12,ncoop13,ncoop14,ncoop21,coopB,ncoop23,ncoop24,
ncoop31,ncoop32,coopC,ncoop34,ncoop41,ncoop42,ncoop43,coopD$
namelist;passpr=passA1,passB1,passC1,passD1,passA2,passB2,passC2,passD2$
dstats;rhs=cprobs,passpr,rnd,pass,lined$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
--------------------------------------------------------------------------------------------------
All observations in current sample
--------------------------------------------------------------------------------------------------
COOPA | .728784E-01 .352189E-01 .254591E-01 .157090 17
NCOOP12 | .739366E-01 .357962E-01 .221670E-01 .142914 17
NCOOP13 | .976446E-01 .557435E-01 .321044E-01 .227205 17
NCOOP14 | .496361E-01 .304240E-01 .179073E-01 .130852 17
NCOOP21 | .719108E-01 .348934E-01 .166485E-01 .134326 17
COOPB | .701286E-01 .330785E-01 .229877E-01 .134326 17
NCOOP23 | .927623E-01 .488328E-01 .290864E-01 .201830 17
NCOOP24 | .477815E-01 .255676E-01 .109036E-01 .940584E-01 17
NCOOP31 | .638935E-01 .516668E-01 .166485E-01 .250758 17
NCOOP32 | .647819E-01 .483981E-01 .189139E-01 .213877 17
COOPC | .793098E-01 .467249E-01 .254111E-01 .213877 17
NCOOP34 | .439761E-01 .277870E-01 .693795E-02 .971526E-01 17
NCOOP41 | .408312E-01 .240431E-01 .650025E-02 .850882E-01 17
NCOOP42 | .431087E-01 .287565E-01 .498137E-02 .107663 17
NCOOP43 | .542407E-01 .319923E-01 .674357E-02 .112520 17
COOPD | .793098E-01 .467249E-01 .254111E-01 .213877 17
PASSA1 | .294096 .106673 .105533 .462469 17
PASSB1 | .282583 .113634 .130491 .462469 17
PASSC1 | .251961 .133596 .908731E-01 .542216 17
PASSD1 | .171360 .932685E-01 .282525E-01 .317525 17
PASSA2 | .249514 .812497E-01 .127583 .462469 17
PASSB2 | .251956 .844410E-01 .112318 .462469 17
PASSC2 | .323957 .109466 .179177 .585510 17
PASSD2 | .174573 .745316E-01 .439956E-01 .310449 17
RND | 5.35294 .492592 5.00000 6.00000 17
PASS | 3.00000 .000000 3.00000 3.00000 17
LINED | 1.00000 .000000 1.00000 1.00000 17
Sample;all$
reject;pass#3$
create;if(rnd=5|rnd=6)rnd56=1$
reject;rnd56#1$
reject;alt>8$ ? To eliminate obs that are not applicable
create
;utilA1=util1
;utilB1=util1[+1]
;utilC1=util1[+2]
;utilD1=util1[+3]
;utilA2=util1[+16]
;utilB2=util1[+17]
;utilC2=util1[+18]
;utilD2=util1[+19]$
create;lined=dmy(32,1)$
reject;lined#1$ ? To use only line one of the 32
write;
pass,rnd,coopA,utilA1,utilA2,
pass,rnd,ncoop12,utilA1,utilB2,
pass,rnd,ncoop13,utilA1,utilC2,
pass,rnd,ncoop14,utilA1,utilD2,
pass,rnd,ncoop21,utilB1,utilA2,
pass,rnd,coopB,utilB1,utilB2,
pass,rnd,ncoop23,utilB1,utilC2,
pass,rnd,ncoop24,utilB1,utilD2,
pass,rnd,ncoop31,utilC1,utilA2,
pass,rnd,ncoop32,utilC1,utilB2,
pass,rnd,coopC,utilC1,utilC2,
pass,rnd,ncoop34,utilC1,utilD2,
pass,rnd,ncoop41,utilD1,utilA2,
pass,rnd,ncoop42,utilD1,utilB2,
pass,rnd,ncoop43,utilD1,utilC2,
pass,rnd,coopD,utilD1,utilD2
;format=(15(5F12.5/)5f12.5)
;file=C:\Papers\WPs2011\IACECar\Pass3Power.txt$
reset
read;file=C:\Papers\WPs2011\IACECar\Pass3Power.txt
;names= pass,rnd,prob,util1,util2
;format=(5f12.5);nobs= 1616 ;nvar=5$
dstats;rhs=*$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
--------------------------------------------------------------------------------------------------
PASS | 3.00000 .000000 3.00000 3.00000 272
RND | 5.35294 .478766 5.00000 6.00000 272
PROB | .653828E-01 .416536E-01 .498000E-02 .250760 272
UTIL1 | .196654 .475090 -.889250 .948200 248
UTIL2 | .000000 .000000 .000000 .000000 4
create
;diffut=util1-util2
;lprob=log(prob/(1-prob))$
crmodel
;lhs=lprob
;rhs=one,util1,util2
;cls:b(2)+b(3)=1$
+-------------------------------------------------------------------------+
| Ordinary least squares regression |
| Model was estimated Feb 26, 2007 at 05:13:32PM |
| Standard deviation = .7395663 |
RESET
Load;file =C:\Papers\WPs2011\IACECar\IACE_Car_MF.sav$
create
;if(agea=0)ageaa=21
;if(agea=1)ageaa=27
;if(agea=2)ageaa=32
;if(agea=3)ageaa=37
;if(agea=4)ageaa=43
;if(agea=5)ageaa=48
;if(agea=6)ageaa=53
;if(agea=7)ageaa=58
;if(agea=8)ageaa=65
;if(agea=9)ageaa=75$
create
;if(ageb=0)agebb=21
;if(ageb=1)agebb=27
;if(ageb=2)agebb=32
;if(ageb=3)agebb=37
;if(ageb=4)agebb=43
;if(ageb=5)agebb=48
;if(ageb=6)agebb=53
;if(ageb=7)agebb=58
;if(ageb=8)agebb=65
;if(ageb=9)agebb=75$
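The `if(...)` chains above map each age category code to a representative mid-point age; a compact sketch of the same recode (mid-points taken from the commands above):

```python
# Category-code -> age mid-point map, mirroring the create block above
AGE_MIDPOINT = {0: 21, 1: 27, 2: 32, 3: 37, 4: 43,
                5: 48, 6: 53, 7: 58, 8: 65, 9: 75}

agea = [0, 3, 9]                        # example category codes
ageaa = [AGE_MIDPOINT[c] for c in agea]  # recoded mid-point ages
```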
create
;if(rnd=2 & rndagree=2)requi=1
;if(rnd=3 & rndagree=3)requi=2
;if(rnd=4 & rndagree=4)requi=3
;if(rnd=5 & rndagree=5)requi=4
;if(rnd=6 & rndagree=6)requi=5$
reject;requi=0$
reject;requi>5$
create
;if(requi=1)equiR2=1
;if(requi=2)equiR3=1
;if(requi=3)equiR4=1
;if(requi=4)equiR5=1
;if(requi=5)equiR6=1
;gendB=genderb[-4]
?to get gender of second agent (note one is M and one is F only)
;agB=agebb[-4]
?to get age of second agent
;agediff=ageaa-agebb$
reject;altij>4$
? Done because for equilibrium Agents 1 and 2 have the same attributes (not socios)
create
;pricez=price/1000
;pass23=pass2+pass3$
rplogit
;lhs=choice1,cset,altij
;choices=altA,altB,altC,altD
;rpl
;fcn=price (t,1),fuel(t,0.4)
;halton;pts=600 ?0
;par
;utility=utileq;prob=passeq
;ecm=(altd)
;model:
U(altA)=price*pricez+fuel*fuel+ac*ac+trans*trans+abs*abs/
U(altB)=price*pricez+fuel*fuel+ac*ac+trans*trans+abs*abs/
U(altC)=price*pricez+fuel*fuel+ac*ac+trans*trans+abs*abs/
U(altD)=ASCD+gendera*gendera+pass23*pass3+pass23*pass2+ncars*numcars$
Normal exit from iterations. Exit status=0.
+---------------------------------------------------------------+
| Random Parms/Error Comps. Logit Model |
| Maximum Likelihood Estimates |
| Dependent variable CHOICE1 |
| Number of observations 325 |
| Iterations completed 12 |
| Log likelihood function -358.0119 |
| Number of parameters 10 |
| Info. Criterion: AIC = 2.26469 |
| Restricted log likelihood -450.5457 |
| McFadden Pseudo R-squared .2053816 |
| Degrees of freedom 10 |
| At start values -358.3682 .00099 |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| Random Parms/Error Comps. Logit Model |
| Replications for simulated probs. = 600 |
| Halton sequences used for simulations |
| Number of obs.= 333, skipped 8 bad obs. |
+---------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
---------+Random parameters in utility functions
PRICE | -.05148758 .02452672 -2.099 .0358
FUEL | -.19889737 .09055701 -2.196 .0281
---------+Nonrandom parameters in utility functions
AC | 1.40531487 .17382527 8.085 .0000
TRANS | 1.17429432 .17521797 6.702 .0000
ABS | .63998532 .16686204 3.835 .0001
ASCD | -2.47166656 1.04154490 -2.373 .0176
GENDERA | .63657641 .34604309 1.840 .0658
PASS23 | -.84341903 .38167430 -2.210 .0271
NCARS | .35196772 .11157544 3.155 .0016
---------+Derived standard deviations of parameter distributions
TsPRICE | .05148758 .02452672 2.099 .0358
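In the `;fcn=price(t,1)` specification, the price parameter is triangular-distributed with its spread constrained to equal its mean (hence the TsPRICE estimate matching the PRICE coefficient in magnitude), which keeps every draw on one side of zero. A sketch of such constrained draws (numbers illustrative):

```python
import random

random.seed(1)

# Triangular random parameter with spread constrained to equal the mean,
# as in the (t,1) specification: support [mean - |mean|, mean + |mean|],
# so a negative price coefficient stays negative for every draw
mean_price = -0.05
spread = abs(mean_price) * 1.0          # the "1" in price(t,1)
draws = [random.triangular(mean_price - spread, mean_price + spread, mean_price)
         for _ in range(10_000)]
```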
RESET
Load;file =C:\Papers\WPs2011\IACECar\IACE_Car_MFZ.sav$
create
;if(relation=1)marr=1
;if(relation=2)defacto=1
;if(relation=3)notrel=1$
dstats;rhs=*$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
All observations in current sample
--------------------------------------------------------------------------------------------------
PASS23 | .348348 .476626 .000000 1.00000 1332
PASSEQ | .250000 .188431 .103374E-01 .852799 1300
UTILEQ | -1.50674 1.04679 -3.98054 1.19307 1300
MARR | .114114 .318069 .000000 1.00000 1332
DEFACTO | .600601E-01 .237687 .000000 1.00000 1332
NOTREL | .630631E-01 .243168 .000000 1.00000 1332
mlogit;lhs=pass23;rhs=one,gendera,agediff,marr,defacto,notrel$
Normal exit from iterations. Exit status=0.
+---------------------------------------------------------------+
| Binary Logit Model for Binary Choice |
| Maximum Likelihood Estimates |
| Dependent variable PASS23 |
| Number of observations 1332 |
| Iterations completed 4 |
| Log likelihood function -829.0678 |
| Number of parameters 6 |
| Info. Criterion: AIC = 1.25386 |
| Restricted log likelihood -861.0290 |
| McFadden Pseudo R-squared .0371198 |
+---------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
---------+Characteristics in numerator of Prob[Y = 1]
Constant| -.61707294 .09209337 -6.701 .0000
GENDERA | -.27962040 .12301604 -2.273 .0230 .53753754
AGEDIFF | .04027671 .00888108 4.535 .0000 -1.16216216
MARR | .63308687 .17752829 3.566 .0004 .11411411
DEFACTO | .27004987 .24010498 1.125 .2607 .06006006
NOTREL | 1.18134367 .25671565 4.602 .0000 .06306306
sample;all$
reject;altij#1$
dstats;rhs=altij,passeq,utileq,pass,choice1$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
All observations in current sample
--------------------------------------------------------------------------------------------------
ALTIJ | 1.00000 .000000 1.00000 1.00000 1317
PASSEQ | .258778 .187604 .186015E-01 .851210 325
UTILEQ | -1.23537 1.09126 -3.79724 1.35483 325
PASS | 1.52468 .725380 1.00000 3.00000 1317
CHOICE1 | .282460 .450367 .000000 1.00000 1317
sample;all$
reject;altij#2$
dstats;rhs=altij,passeq,utileq,pass,choice1$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
All observations in current sample
----------------------------------------------------------------------
ALTIJ | 2.00000 .000000 2.00000 2.00000 1317
PASSEQ | .266494 .200153 .192713E-01 .821879 325
UTILEQ | -1.20870 1.11740 -3.79555 1.35419 325
PASS | 1.52468 .725380 1.00000 3.00000 1317
CHOICE1 | .291572 .454659 .000000 1.00000 1317
sample;all$
reject;altij#3$
dstats;rhs=altij,passeq,utileq,pass,choice1$
Descriptive Statistics
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
All observations in current sample
--------------------------------------------------------------------------------------------------
ALTIJ | 3.00000 .000000 3.00000 3.00000 1317
PASSEQ | .274737 .201526 .171286E-01 .754001 325
UTILEQ | -1.21273 1.13896 -3.58578 1.35740 325
PASS | 1.52468 .725380 1.00000 3.00000 1317
CHOICE1 | .261200 .439455 .000000 1.00000 1317
sample;all$
reject;altij#4$
dstats;rhs=altij,passeq,utileq,pass,choice1$
Descriptive Statistics
All results based on nonmissing observations.
======================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
======================================================================
All observations in current sample
--------------------------------------------------------------------------------------------------
ALTIJ | 4.00000 .000000 4.00000 4.00000 1317
PASSEQ | .199990 .148926 .102940E-01 .836078 325
UTILEQ | -1.90069 .687080 -3.36625 .997002 325
PASS | 1.52468 .725380 1.00000 3.00000 1317
CHOICE1 | .164768 .371112 .000000 1.00000 1317
Select glossary
Terms marked * appear in the model output rather than the main text.
A-error Criterion used in designing choice experiments. Instead of taking the determinant of the AVC matrix (as the D-error does), the A-error takes its trace. The design with the lowest A-error is called A-optimal.
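Assuming the usual normalizations (K-th root of the determinant for the D-error, trace divided by K for the A-error), the two criteria can be computed as follows; the AVC matrix here is hypothetical:

```python
# Hypothetical 2x2 asymptotic variance-covariance (AVC) matrix
avc = [[0.04, 0.01],
       [0.01, 0.09]]
K = 2

det = avc[0][0] * avc[1][1] - avc[0][1] * avc[1][0]
trace = avc[0][0] + avc[1][1]

d_error = det ** (1.0 / K)   # D-error: K-th root of the determinant
a_error = trace / K          # A-error: trace averaged over parameters
```

Lower values of either error indicate a more efficient design.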
a priori Before the fact.
alternative hypothesis Outcome of the hypothesis test for which one wishes to find
supporting evidence.
alternatives Options containing specified levels of attributes.
alternative-specific constant (ASC) Parameter for a particular alternative that is
used to represent the role of unobserved sources of utility.
arc elasticity Elasticity calculated over a range of values for the reference variable.
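A common way to compute an arc elasticity is the mid-point formula, which evaluates the ratio of changes at the means of the two observed points; a sketch with illustrative numbers:

```python
# Mid-point (arc) elasticity between two observed (x, q) points:
#   E = (dq / dx) * (mean of x / mean of q)
def arc_elasticity(x0, x1, q0, q1):
    return ((q1 - q0) / (x1 - x0)) * ((x0 + x1) / (q0 + q1))

# Example: price rises from 2.00 to 2.50, demand falls from 100 to 90
e = arc_elasticity(2.00, 2.50, 100.0, 90.0)
```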
attribute Specific variable that is included in an estimated model as an explanatory
variable.
attribute invariance Limited variation in the levels of attributes observed in the
market.
attribute level label Narrative description corresponding to an attribute.
attribute levels Specific value taken by an attribute. Experimental designs require
that each attribute takes on two or more levels, which may be quantitative or
qualitative.
attribute non-attendance (ANA) Rule of not attending to (or ignoring) an
attribute in choosing an alternative.
attribute processing Set of rules used by respondents to assess attributes and make
choices.
attributes Characteristics of an alternative.
balanced design Design in which the levels of any given attribute appear the same
number of times as all other levels for that particular attribute.
1117 Select glossary
best–worst Way to use choice data where the focus is on identifying the best and
worst attribute or the best and worst alternative and modeling the choice with this
information using various methods.
bias Force that leads to incorrect inferences regarding behavior.
blocking Use of an additional design column to assign sub-sets of treatment
combinations to decision makers.
bootstrapping Resampling procedure used to approximate the sampling distribution of an estimator. It is often uncertain what formula should be used to compute the asymptotic covariance matrix of an estimator; a reliable and common strategy is to use a parametric bootstrap procedure.
branch Third division of alternatives in a nested model.
calibrate To adjust the constant terms in a model in order to replicate actual market
shares through model estimation.
calibration constant Constant used to allow the model to correspond to actual
choice shares.
cardinal Numerical value that is directly comparable to all other such values (i.e., a
value of ten is twice as good as a value of five).
ceteris paribus All other things held constant (Latin).
choice-based sampling Sampling method involving the deliberate over- and
under-sampling of groups that make particular choices.
choice outcome Observed choice behavior of an individual.
choice set generation Process of identifying the choices that are relevant to a
particular problem.
choice set Set of alternatives over which an agent makes a choice.
choice setting Scenario in which an agent’s choice takes place.
choice shares Proportion of the population that chooses a particular alternative.
Cholesky matrix A lower off-diagonal matrix L which is used in the factorization of
a matrix A, such that A = LL′.
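A minimal numerical illustration (Python/NumPy; the matrix A is an arbitrary positive definite example, not from the text):

```python
import numpy as np

# Factorize a positive definite matrix A into A = L L', with L lower triangular
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
L = np.linalg.cholesky(A)
# L has zeros above the diagonal, and L @ L.T reproduces A exactly
```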
closed-form Mathematically tractable; expressible using standard mathematical
operations, without recourse to numerical integration or simulation.
coding Use of numbers to designate a particular state of an attribute (e.g., zero
denotes male and one denotes female).
coefficient Scalar value by which a particular element in a model is multiplied in the
estimation process.
cognitive burden Level of difficulty faced by a respondent in considering a set of
choice menus.
column vector Matrix containing only one column.
design efficiency Designing choice experiments that use information on priors and
the asymptotic variance-covariance matrix (AVC) to obtain a determinant of the
AVC called the D-error; the lowest value is the D-efficient design.
discrete choice Selection of one alternative among a set of mutually exclusive
alternatives.
discrete Variable that can take only a finite number of values.
distribution Range over which the value of a variable may be, and the frequency
with which each of those values is, observed to occur.
dummy coding Denotes the existence of a particular attribute with a one and its
absence with a zero.
effect Impact of a particular treatment upon a response variable.
effects coding See orthogonal coding.
efficient design See design efficiency.
elasticity Percentage change in one variable with respect to a percentage change in
another.
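The two elasticity measures defined in this glossary (point and arc) can be contrasted in a small sketch (Python; the fare and trip numbers are made up for illustration):

```python
def point_elasticity(dq_dp, p, q):
    """Point elasticity: the marginal response dq/dp scaled at a single (p, q)."""
    return dq_dp * p / q

def arc_elasticity(p1, q1, p2, q2):
    """Arc (midpoint) elasticity: percentage changes computed over a range,
    using the averages of the two points as the base."""
    return ((q2 - q1) / ((q1 + q2) / 2)) / ((p2 - p1) / ((p1 + p2) / 2))

# Demand falls from 100 to 80 trips as the fare rises from 2.00 to 2.50
e_arc = arc_elasticity(2.0, 100, 2.5, 80)     # -1.0: unit elastic over the range
e_point = point_elasticity(-40, 2.0, 100)     # -0.8 at the initial point
```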
elemental alternatives Alternatives that are not composites of other alternatives
(e.g., choosing to drive a car, choosing to take a train).
elimination-by-aspects (EBA) The EBA heuristic states that an alternative is
eliminated if the attribute of that alternative fails to meet a certain threshold.
endogenous Within the control of the decision maker (e.g., which alternative to
choose).
endogenous weighting Weighting of choice data based on information regarding
true market shares.
error components Random components associated with each alternative which
may be defined with common or different variances for one or more of the
alternatives.
exogenous Outside of the control of the decision maker (e.g., gender or age).
exogenous weighting Weighting of any data besides choice.
expected utility theory (EUT) Recognizes that individual decision making is
made under uncertainty or risk (i.e., the outcome is not deterministic).
expected value Average value of a set of values observed for a particular variable.
experiment Manipulation of one variable with the purpose of observing the effect of
that manipulation upon a second variable.
experimental design Specification of attributes and attribute levels for use in an
experiment.
factor level Specific value taken by a factor. Experimental designs require that each
factor takes on two or more levels, which may be quantitative or qualitative.
fixed parameter Parameter with a constant value. Also refers to a non-random
parameter.
foldover Reproduction of a design in which the factor levels of the design are
reversed (e.g., replace 0 with 1 and replace 1 with 0).
full factorial design Design in which all possible treatment combinations are
enumerated.
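A full factorial over two hypothetical attributes can be enumerated directly (Python sketch; the attribute names and levels are illustrative, not from the text):

```python
from itertools import product

# Every possible treatment combination appears exactly once in a full factorial.
price_levels = [1.00, 1.50, 2.00]   # 3-level attribute
time_levels = [10, 20]              # 2-level attribute
full_factorial = list(product(price_levels, time_levels))
# 3 x 2 = 6 treatment combinations
```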
generalized cost Measure of cost that allows for the direct comparison of the costs
of all alternatives. This involves the conversion of attribute levels into a common
measure, generally a monetary value (e.g., converting travel time into a value of
travel time, VTTS).
generalized mixed logit Extension of random parameter (mixed) logit to allow for
heterogeneity in scale. See scale heterogeneity.
Hausman test Statistical test of whether the independence of irrelevant
alternatives (IIA) assumption holds.
heterogeneity Variation in behavior that can be attributed to differences in the
tastes and decision making processes of individuals in the population.
hypothesis testing Process by which one determines the worth of an estimate of a
population parameter.
hypothetical bias Extent to which individuals might behave inconsistently, when
they do not have to back up their choices with real commitments.
IID condition Assumption that the unobserved components of utility of all alter-
natives are uncorrelated with the unobserved components of utility for all other
alternatives, combined with the assumption that each of these error terms has the
exact same distribution.
importance weight Relative contribution of an attribute to utility.
inclusive value (IV) Parameter estimate used to establish the extent of dependence
or independence between linked choices. Also referred to as logsum and expected
maximum utility.
income effect Change in quantity demanded that can be attributed to a change in
an individual’s income.
independence of irrelevant alternatives (IIA) Restrictive assumption, which is
part of the multinomial logit (MNL) model. The IIA property states that the ratio of
the choice probabilities is independent of the presence or absence of any other
alternative in a choice set.
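The IIA property can be verified numerically for the MNL form (Python sketch; the utility values are arbitrary):

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

p3 = mnl_probs([0.5, 0.2, -0.1])   # choice set with three alternatives
p2 = mnl_probs([0.5, 0.2])         # third alternative removed
ratio3 = p3[0] / p3[1]
ratio2 = p2[0] / p2[1]
# IIA: the ratio equals exp(0.5 - 0.2) regardless of the third alternative
```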
indifference curves All combinations of two attributes that yield the same level of
utility.
indirect utility function Function used to estimate the utility derived from a
particular set of observed attributes.
insignificant Having no systematic influence.
intelligent draws Non-random draws with characteristics that can improve the
efficiency of estimates for a given sample. Examples include Halton sequences,
randomized and shuffled Halton sequences, and Modified Latin Hypercube
Sampling.
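A standard Halton sequence, one of the intelligent draws mentioned above, can be generated in a few lines (Python sketch, not Nlogit syntax):

```python
def halton(n, base):
    """First n terms of the Halton sequence for a given prime base: the
    radical-inverse of 1..n, filling (0, 1) more evenly than random draws."""
    seq = []
    for i in range(1, n + 1):
        f, r = 1.0, 0.0
        while i > 0:
            f /= base
            r += f * (i % base)
            i //= base
        seq.append(r)
    return seq

draws = halton(5, 2)   # base-2 sequence: 1/2, 1/4, 3/4, 1/8, 5/8
```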
interaction effect Effect upon the response variable obtained by combining two or
more attributes which would not have been observed had each of the attributes
been estimated separately.
inter-attribute correlation Subjective interrelation between two attributes (e.g., a
higher price may signal higher quality).
interactive agency choice experiments (IACE) Method to jointly model the
choices of more than one agent.
kernel density Smoothed plot used to describe the distribution of a sample of
observations.
Krinsky–Robb (KR) method Simulation method by which non-symmetric
confidence intervals can be obtained.
Krinsky–Robb (KR) test Method to obtain the standard errors associated with
parameter estimates and especially when the interest is in the standard errors
associated with ratios of parameters, as in WTP estimates.
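A minimal sketch of the KR procedure for a WTP ratio (Python; the parameter estimates and covariance matrix below are hypothetical, chosen only to illustrate the mechanics):

```python
import numpy as np

def krinsky_robb_ci(beta, cov, num=0, den=1, n_draws=10000, alpha=0.05, seed=0):
    """Krinsky-Robb: draw parameter vectors from the estimated asymptotic
    normal distribution and take percentiles of the ratio of interest."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(beta, cov, size=n_draws)
    ratio = draws[:, num] / draws[:, den]
    return np.percentile(ratio, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical estimates: beta_time = -0.04, beta_cost = -0.20, so WTP = 0.20
beta = np.array([-0.04, -0.20])
cov = np.array([[1.0e-5, 0.0],
                [0.0, 4.0e-4]])
lo, hi = krinsky_robb_ci(beta, cov)   # non-symmetric 95% interval around 0.20
```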
labeled experiment Contains a description of the alternative (e.g., naming a par-
ticular item model).
Lagrange multiplier (LM) test Statistical test of a simple null hypothesis that a
parameter of interest θ is equal to some particular value θ0.
latent class Modeling method that recognizes that the analyst does not know from
the data which observation belongs to which class; hence the term latent classes.
Latent class models (LCMs) can have fixed and/or random parameters, as well as
restrictions on parameters in each class.
limb Second division of alternatives in a nested model.
lower off-diagonal matrix Matrix in which all values above and to the right of the
diagonal are equal to zero.
main effect (ME) Direct independent effect of each factor upon a response variable.
For experimental designs, the main effect is the difference in the means of each level
of an attribute and the overall or grand mean.
ordinal Numerical value that is indirectly comparable to all other such values (i.e., a
value of ten is better than a value of five).
ordinal scaled data Data in which the values assigned to levels observed for an
object are both unique and provide an indication of order (i.e., a ranking).
orthogonal Independent of all other factors.
orthogonal coding Coding in which all values for a given attribute sum to zero.
In the case of even numbers of code levels, each positive code level is matched by its
negative value. In the case of odd numbers of code levels, the median level is
assigned the value zero. For example, in the two-level case, the levels assigned are
–1 and 1; in the three-level case, the levels assigned are –1, 0, and 1.
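The coding rule just described can be sketched as a small helper (Python; `effects_codes` is an illustrative function following the glossary's description, not part of any package):

```python
def effects_codes(n_levels):
    """Orthogonal (effects) codes: codes sum to zero, each positive code is
    matched by its negative, and the median level is zero when n_levels is odd."""
    half = n_levels // 2
    if n_levels % 2 == 0:
        return [c for c in range(-half, half + 1) if c != 0]
    return list(range(-half, half + 1))

two = effects_codes(2)     # [-1, 1]
three = effects_codes(3)   # [-1, 0, 1]
```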
orthogonal main effects only design Orthogonal design in which only the main
effects are estimated. All other interactions are assumed to be insignificant.
orthogonality Term that represents a situation of zero correlation between pairs of
attributes in a choice experiment.
overidentified Having too many variables to be estimated by the available
information.
panel data Data incorporating multiple observations per sampled individual.
parameter Unique weight used to describe the systematic contribution of a parti-
cular element in a model.
part-worth Proportion of utility that can be attributed to a specific attribute.
pivot design In a pivot design the attribute levels shown to the respondents are
pivoted from the reference alternatives of each respondent.
point elasticity Elasticity calculated at a particular point.
power functions Way of weighting the influence of each agent in a group choice
model involving two or more persons, so as to establish the overall role that each
agent's preferences play in defining the contribution of an attribute or an
alternative to choice making.
preference heterogeneity Differing preferences across the population.
preferences Forces leading an individual to select one alternative over another.
probability density function (PDF) Probability distribution over the various
values that a variable might take (bounded by zero and one, inclusively).
probability-weighted sample enumeration Calculation of marginal effects for
each decision maker, weighted by the decision maker’s associated choice
probability.
probit Choice model that assumes a normal distribution for the random errors (in
contrast to EV1 for logit).
References
Aadland, D. and Caplan, A. J. (2003) Willingness to pay for curbside recycling with detection
and mitigation of hypothetical bias, American Journal of Agricultural Economics, 85 (3),
492–502.
Accent Marketing and Research and Centre for Research in Environmental Appraisal &
Management (CREAM) (2002) Yorkshire Water Services, Final Report, Prepared for
Yorkshire Water, November.
ACT Government (2003a) Community Summit, Workbook, 27 August, Canberra.
(2003b) Community Water Summit, Workshop Groups’ Summary Reports, 27 August,
Canberra. Available at: www.thinkwater.act.gov.au/strategy/summit_transcripts.shtml.
ACTEW Corporation (2004) Community assistance makes water restrictions a success, Media
Release, 8 June.
Adamowicz, W., Hanemann, M., Swait, J., Johnson, R., Layton, D., Regenwetter, M., Reimer, T.
and Sorkin, R. (2005) Decision strategy and structure in households: a groups perspective,
Marketing Letters, 16, 387–399.
Ailawadi, K. L., Gedenk, K. and Neslin, S. A. (1999) Heterogeneity and purchase event feedback
in choice models: an empirical analysis with implications for model buildings,
International Journal of Research in Marketing, 16, 177–198.
Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on
Automatic Control 19 (6), 716–723.
Alfnes, F. and Steine, G. (2005) None-of-these bias in hypothetical choice experiments,
Discussion Paper DP-06/05, Department of Economics and Resources Management,
Norwegian University of Life Sciences, Aas.
Alfnes, F., Guttormsen, A., Steine, G. and Kolstad, K. (2006) Consumers’ willingness to pay for
the color of salmon: a choice experiment with real economic incentives, American Journal
of Agricultural Economics, 88 (4), 1050–1061.
Allais, M. (1953) Le comportement de l’homme rationnel devant le risque, Econometrica, 21
(4), 503–546.
Allenby, G. M., Shively, T. S., Yang, S. and Garratt, M. J. (2004) A choice model for packaged goods:
dealing with discrete quantities and quantity discounts, Marketing Science, 23 (1), 95–108.
Allison, P. (1999) Comparing logit and probit coefficients across groups, Sociological Methods
and Research, 28, 186–208.
Anderson, D. A. and Wiley, J. B. (1992) Efficient choice set designs for estimating cross effect
models, Marketing Letters, 3, 357–370.
Anderson, S., Harrison, G. W., Hole, A. R., Lau, M. and Rutström, E. E. (2012) Non-linear
mixed logit, Theory and Decision, 73, 77–96.
1129 References
Anderson, S., Harrison, G. W., Lau, M. and Rutström, E. (2007a) Valuation using multiple price
list formats, Applied Economics, 39 (4), 675–682.
(2007b) Dual criteria decisions, Working Paper 06–11, Department of Economics, College of
Business Administration, University of Central Florida.
Antonov, I. A. and Saleev, V. M. (1979) An economic method of computing LPtau sequences,
Zh, vychisl. Mat. mat. Fiz., 19, 243–245, English translation, USSR Comput. Maths. Math.
Phys., 19, 252–256.
Arendt, J. and Holm, A. (2007) Probit models with binary endogenous regressors, 6th
International Health Economics Association World Congress, Copenhagen, 8–11 July.
Arentze, T., Borgers, A., Timmermans, H. and DelMistro, R. (2003) Transport stated choice
responses: effects of task complexity, presentation format and literacy, Transportation
Research Part E, 39, 229–244.
Aribarg, A., Arora, N. and Bodur, H. O. (2002) Understanding the role of preference revision
and concession in group decisions, Journal of Marketing Research, 39, 336–349.
Armstrong, P. M., Garrido, R. A. and Ortúzar, J. de D. (2001) Confidence intervals to bound the
value of time, Transportation Research Part E, 7 (1), 143–161.
Arora, N. and Allenby, G. M. (1999) Measuring the influence of individual preference struc-
tures in group decision making, Journal of Marketing Research, 37, 476–487.
Asensio, J. and Matas, A. (2008) Commuters’ valuation of travel time variability,
Transportation Research Part E, 44 (6), 1074–1085.
Ashok, K., Dillon, W. R. and Yuan, S. (2002) Extending discrete choice models to incorporate
attitudinal and other latent variables, Journal of Marketing Research, 39 (1), 31–46.
Ashton, W. D. (1972) The logit transformation, Griffin, London.
Asmussen, S. and Glynn, P. W. (2007) Stochastic Simulation: Algorithms and Analysis,
Springer, New York.
Auger, P., Devinney, T. M. and Louviere, J. J. (2007) Best–worst scaling methodology to
investigate consumer ethical beliefs across countries, Journal of Business Ethics, 70,
299–326.
Backhaus, K., Wilken, R., Voeth, M. and Sichtmann, C. (2005) An empirical comparison of
methods to measure willingness to pay by examining the hypothetical bias, International
Journal of Market Research 47 (5), 543–562.
Balcombe, K., Fraser, I. and Chalak, A. (2009) Model selection in the Bayesian mixed logit:
misreporting or heterogeneous preferences?, Journal of Environmental Economics and
Management, 57 (2), 219–225.
Bateman, I. J., Carson, R. T., Day, B., Dupont, D., Louviere, J. J., Morimoto, S., Scarpa, R. and
Wang, P. (2008) Choice set awareness and ordering effects in discrete choice experiments,
CSERGE Working Paper EDM 08–01.
Bateman, I. J. and Munro, A. (2005) An experiment on risky choice amongst households,
Economic Journal, 115, 176–189.
Bates, J., Polak, J., Jones, P. and Cook, A. (2001) The valuation of reliability for personal travel,
Transportation Research Part E, 37 (2–3), 191–229.
Batley, R. and Ibáñez, N. (2009) Randomness in preferences, outcomes and tastes: an applica-
tion to journey time risk, International Choice Modelling Conference, Harrogate.
Beck, M., Rose, J. M. and Hensher, D. A. (2012) Comparison of group decision making models:
a vehicle purchasing case study, Paper presented at the International Association of
Traveller Behaviour Research (IATBR) Conference, Toronto, July 13–15.
(2013) Consistently inconsistent: the role of certainty, acceptability and scale in automobile
choice, Transportation Research Part E, 56 (3), 81–93.
Becker, G. (1991) A Treatise on the Family, Harvard University Press, Cambridge, MA.
(1993) A theory of marriage: Part 1, Journal of Political Economy, 81 (4), 813–846.
Beharry, N., Hensher, D. and Scarpa, R. (2009) An analytical framework for joint vs. separate
decisions by couples in choice experiments: the case of coastal water quality in Tobago,
Environmental and Resource Economics, 43, 95–117.
Beharry, N. and Scarpa, R. (2008) Who should select the attributes in choice-experiments for
non-market valuation? An application to coastal water quality in Tobago, Sustainability
Research Institute, University of Leeds.
Ben-Akiva, M. E. and Bolduc, D. (1996) Multinomial probit with a logit kernel and a general
parametric specification of the covariance structure, Unpublished Working Paper,
Department of Civil Engineering, MIT, Cambridge, MA.
Ben-Akiva, M. E., Bolduc, D. and Bradley, M. (1993) Estimation of travel choice models with
randomly distributed values of time, Transportation Research Record, 1413, 88–97.
Ben-Akiva, M. E., Bradley, M., Morikawa, T., Benjamin, J., Novak, T. P., Oppewal, H. and
Rao, V. (1994) Combining revealed and stated preferences data, Marketing Letters, 5 (4),
336–350.
Ben-Akiva, M. E. and Lerman, S. R. (1979) Disaggregate travel and mobility choice models and
measures of accessibility, in Hensher, D. A. and Stopher, P. R. (eds.), Behavioural Travel
Modelling, Croom Helm, London.
(1985) Discrete Choice Analysis: Theory and Application to Travel Demand, MIT Press,
Cambridge, MA.
Ben-Akiva, M., McFadden, D., Abe, M., Böckenholt, U., Bolduc, D., Gopinath, D.,
Morikawa, T., Ramaswamy, V., Rao, V., Revelt, D. and Steinberg, D. (1997) Modelling
methods for discrete choice analysis, Marketing Letters, 8 (3), 273–286.
Ben-Akiva, M., McFadden, D., Garling, T., Gopinath, D., Walker, J., Bolduc, D., Boersch-Supan,
A., Delquié, P., Larichev, O., Morikawa, T., Polydoropoulou, A. and Rao, V. (1999).
Extended framework for modelling choice behavior, Marketing Letters, 10 (3), 187–203.
Ben-Akiva, M., McFadden, D., Train, K., Walker, J., Bhat, C., Bierlaire, M., Bolduc, D., Boersch-
Supan, A., Brownstone, D., Bunch, D., Daly, A., de Palma, A., Gopinath, D., Karlstrom, A.
and Munizaga, M. A. (2002) Hybrid choice models: progress and challenges, Marketing
Letters, 13 (3), 163–175.
Ben-Akiva, M. E. and Morikawa, T. (1991) Estimation of travel demand models from multiple
data sources, in Koshi, M. (ed.), Transportation and Traffic Theory, Proceedings of the
11th ISTTT, Elsevier, Amsterdam, 461–476.
Ben-Akiva, M. E., Morikawa, T. and Shiroishi, F. (1991) Analysis of the reliability of preference
ranking data, Journal of Business Research, 23, 253–268.
Ben-Akiva, M. E., and Swait J. (1986) The Akaike likelihood ratio index, Transportation Science,
20 (2), 133–136.
Bentham, J. (1789) An Introduction to the Principles of Morals and Legislation, Clarendon Press,
Oxford.
Berkson, J. (1944) Application of the logistic function to bioassay, Journal of the American
Statistical Association, 39, 357–365.
Bernoulli, D. (1738) Specimen Theoriae Novae de Mensura Sortis, Commentarii Academiae
Scientiarum Imperialis Petropolitanae, Tomus V (Papers of the Imperial Academy of
Sciences in Petersburg), V, 175–192.
Berry, S., Levinsohn, J. and Pakes, A. (1995) Automobile prices in market equilibrium,
Econometrica, 63 (4), 841–890.
Bertrand, M. and Mullainathan, S. (2001) Do people mean what they say? Implications
for subjective survey data, American Economic Review Papers and Proceedings, 91(2),
67–72.
Bettman, J. R., Luce, M. F. and Payne, J. W. (1998). Constructive consumer choice processes,
Journal of Consumer Research, 25, 187–217.
Bhat, C. R. (1994) Imputing a continuous income variable from grouped and missing income
observations, Economics Letters, 46 (4), 311–320.
(1995) A heteroscedastic extreme value model of intercity travel mode choice,
Transportation Research Part B, 29 (6), 471–483.
(1997) An endogenous segmentation mode choice model with an application to intercity
travel, Transportation Science, 31, 34–48.
(2001) Quasi-random maximum simulated likelihood estimation of the mixed multinomial
logit model, Transportation Research Part B, 35 (7), 677–693.
(2003) Simulation estimation of mixed discrete choice models using randomized and
scrambled Halton sequences, Transportation Research Part B, 37 (9), 837–855.
(2008) The multiple discrete-continuous extreme value (MDCEV) model: role of utility
function parameters, identification considerations, and model extensions, Transportation
Research Part B, 42 (3), 274–303.
Bhat, C. R. and Castelar, S. (2002) A unified mixed logit framework for modeling revealed and
stated preferences: formulation and application to congestion pricing analysis in the San
Francisco Bay area, Transportation Research Part B, 36, 577–669.
Bhat, C. R. and Pulugurta, V. (1998) A comparison of two alternative behavioral mechanisms
for car ownership decisions, Transportation Research Part B, 32 (1), 61–75.
Bhat, C. R. and Zhao, H. (2002) The spatial analysis of activity stop generation, Transportation
Research Part B, 36 (6), 557–575.
Bickel, P. J. and Doksum, K. A. (1981) An analysis of transformations revisited, Journal of the
American Statistical Association, 76, 296–311.
Blackburn, M., Harrison, G. W. and Rutström, E. E. (1994) Statistical bias functions and
informative hypothetical surveys, American Journal of Agricultural Economics, 76 (5),
1084–1088.
Blanchard, O. and Fischer, S. (1989) Lectures on Macroeconomics, MIT Press, Cambridge, MA.
Bliemer, M. C. J. and Rose, J. M. (2005a) Efficiency and sample size requirements for stated
choice studies, Working Paper ITLS-WP-05-08, Institute of Transport and Logistics
Studies, The University of Sydney.
(2005b) Efficient designs for alternative specific choice experiments, Working Paper
ITLS-WP-05-04, Institute of Transport and Logistics Studies, the University of Sydney.
(2009) Designing stated choice experiments: the state of the art, in Kitamura, R., Yoshi, T.
and Yamamoto, T. (eds.), The Expanding Sphere of Travel Behaviour Research, Selected
Papers from the 11th International Conference on Travel Behaviour Research, Chapter 25,
495–538.
(2010a) Construction of experimental designs for mixed logit models allowing for correla-
tion across choice observations, Transportation Research Part B: Methodological, 44 (6)
720–34.
(2010c) Serial choice conjoint analysis for estimating discrete choice models, in Hess, S. and
Daly, A. (eds.), Choice Modelling: The State of the Art and the State of the Practice, Emerald
Group, Bingley, 139–161.
(2011) Experimental design influences on stated choice outputs: an empirical study in air
travel choice, Transportation Research Part A, 45 (1), 63–79.
(2013) Confidence intervals of willingness-to-pay for random coefficient logit models,
Transportation Research Part B, 58 (2), 199–214.
(2014) A unified theory of experimental design for stated choice studies, Paper presented at
the 10th International Conference on Transport Survey Methods, Leura.
Bliemer, M. C. J., Rose, J. M. and Hensher, D.A. (2009) Efficient stated choice experiments for
estimating nested logit models, Transportation Research Part B, 43 (1), 19–35.
Bliemer, M. C. J., Rose, J. M. and Hess, S. (2008) Approximation of Bayesian efficiency in
experimental choice designs, Journal of Choice Modelling, 1 (1), 98–127.
Blumenschein, K., Johanneson, M., Yokoyama, K. K. and Freeman, P. R. (2001) Hypothetical
versus real willingness to pay in the health care sector: results from a field experiment,
Journal of Health Economics, 20 (3), 441–457.
Blumenschein, K., Johannesson, M., Blomquist, G. C., Liljas, B. and O’Coner, R. M. (1998)
Experimental results on expressed certainty and hypothetical bias in contingent valuation,
Southern Economic Journal, 65 (1), 169–177.
Bock, R. D. and Jones, L. V. (1968) The Measurement and Prediction of Judgment and Choice,
Holden-Day, San Francisco, CA.
Boes, S. and Winkelman, R. (2004) Income and happiness: new results from generalized
threshold and sequential models, IZA Discussion Paper 1175, SOI Working Paper 0407.
(2007) Ordered response models, Allgemeines Statistisches Archiv, 90 (1), 165–180.
Bolduc, D. and Alvarez Daziano, R. (2010) On estimation of hybrid choice models, in Hess, S.
and Daly, A. (eds.), Choice Modelling: The State of the Art and the State of Practice,
Emerald Group, Bingley.
Bolduc, D., Ben-Akiva, M., Walker, J. and Michaud, A. (2005) Hybrid choice models with logit
kernel: applicability to large scale models, in Lee-Gosselin, M. and Doherty, S. (eds),
Integrated Land-Use and Transportation Models: Behavioural Foundations, Elsevier,
Oxford, 275–302.
Bonini, N., Tentori, K. and Rumiati, R. (2004) Contingent application of the cancellation
editing operation: the role of semantic relatedness between risky outcomes, Journal of
Behavioral Decision Making, 17, 139–152.
Bordley, R. (2013) Discrete choice with large choice sets, Economics Letters, 118, 13–15.
Börsch-Supan, A. and Hajivassiliou, V. (1993) Smooth unbiased multivariate probability simu-
lation for maximum likelihood estimation of limited dependent variable models, Journal
of Econometrics, 58 (3), 347–368.
Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations, Journal of the Royal
Statistical Society, Series B, 26, 211–252.
Box, G. E. P. and Draper, N. R. (1987) Empirical Model-Building and Response Surfaces, Wiley,
New York.
Bradley, M. A. (2006) Process data for understanding and modelling travel behaviour, in
Stopher, P. and Stecher, C. (eds.), Travel Survey Methods: Quality and Future Directions,
Elsevier, Oxford.
Bradley, M. A., and Daly, A. J. (1997) Estimation of logit choice models using mixed stated
preference and revealed preference information, in Stopher, P. R. and Lee-Gosselin, M.
(eds.), Understanding Travel Behaviour in an Era of Change, Pergamon, Oxford, 209–232.
Bradley, R. A. and Terry, M. E. (1952) Rank analysis of incomplete block designs. I: the method
of paired comparison, Biometrika, 39, 324–345.
Brant, R. (1990) Assessing proportionality in the proportional odds model for ordinal logistic
regression, Biometrics, 46, 1171–1178.
Bratley, P., Fox, B. L. and Niederreiter, H. (1992) Implementing Sobol's quasi-random sequence
generator, ACM Transactions on Modeling and Computer Simulation, 2 (3), 195–213.
Breffle, W. S. and Morey, E. R. (2000) Investigating preference heterogeneity in a repeated
discrete-choice recreation demand model of Atlantic salmon fishing, Marine Resource
Economics, 15, 1–20.
Brewer, A. and Hensher, D. A. (2000) Distributed work and travel behaviour: the dynamics
of interactive agency choices between employers and employees, Transportation, 27,
117–148.
Brewer, C., Kovner, C. T., Wu, Y., Greene, W., Liu, Y. and Reimers, C. (2006) Factors influen-
cing female registered nurses’ work behavior, Health Services Research, 43 (1), 860–866.
Briesch, R. A., Chintagunta, P. K. and Matzkin, R. L. (2010) Nonparametric discrete choice
models with unobserved heterogeneity, Journal of Business and Economic Statistics, 28 (2),
291–307.
Briesch, R. A., Krishnamurthi, L., Mazumdar, T. and Raj, S. P. (1997) A comparative analysis of
reference price models, Journal of Consumer Research, 24, 202–214.
Brown, T. C., Ajzen, I. and Hrubes, D. (2003) Further tests of entreaties to avoid hypothetical
bias in referendum contingent valuation, Journal of Environmental Economics and
Management, 46 (2), 353–361.
Browning, M. and Chiappori, P. A. (1998) Efficient intra-household allocations: a general
characterization and empirical tests, Econometrica, 66 (6), 1241–1278.
Brownstone, D. and Small, K. A. (2005) Valuing time and reliability: assessing the evidence
from road pricing demonstrations, Transportation Research Part A, 39 (4), 279–293.
Brownstone, D. and Train, K. (1999) Forecasting new product penetration with flexible
substitution patterns, Journal of Econometrics, 89 (1–2), 109–129.
Bunch, D. S., Louviere, J. J. and Anderson, D. A. (1996) A comparison of experimental design
strategies for multinomial logit models: the case of generic attributes, Working Paper,
Graduate School of Management, University of California at Davis.
Downloaded from Cambridge Books Online by IP 138.253.100.121 on Sun Jul 26 05:53:41 BST 2015.
http://dx.doi.org/10.1017/CBO9781316136232.029
Cambridge Books Online © Cambridge University Press, 2015
1134 References
Burgess, G., Burgess, H., Burton, L., Howe, C. W., Johnson, L., MacDonnell, L. J. and
Reitsma, R. F. (1992) Improving the environmental problem-solving process: lessons
from the 1980s California drought, Working Paper, University of Colorado.
Burgess, L. and Street, D. (2003) Optimal designs for 2^k choice experiments, Communications
in Statistics – Theory and Methods, 32 (11), 2185–2206.
(2005) Optimal designs for choice experiments with asymmetric attributes, Journal of
Statistical Planning and Inference, 134 (1), 288–301.
Burnett, N. (1997) Gender economics courses in Liberal Arts Colleges, Journal of Economic
Education, 28 (4), 369–377.
Cai, Y., Deilami, I. and Train, K. (1998) Customer retention in a competitive power market:
analysis of a ‘double-bounded plus follow-ups’ questionnaire, Energy Journal, 19 (2),
191–215.
Caflisch, R. E. (1998) Monte Carlo and quasi-Monte Carlo methods, Acta Numerica,
7, 1–49.
Camerer, C. and Ho, T. (1994) Violations of the betweenness axiom and non-linearity in
probability, Journal of Risk and Uncertainty, 8 (2), 167–196.
Cameron, A. and Trivedi, P. (2005) Microeconometrics: Methods and Applications, Cambridge
University Press.
Cameron, S. V. and Heckman, J. J. (1998) Life cycle schooling and dynamic selection bias:
models and evidence for five cohorts of American males, Journal of Political Economy, 106
(2), 262–333.
Cameron, T. A. and DeShazo, J. R. (2010) Differential attention to attributes in utility-theoretic
choice models, Journal of Choice Modelling, 3 (3), 73–115.
Cameron, T. A., Poe, G. L., Ethier, R. G. and Schulze, W. D. (2002) Alternative non-market
value-elicitation methods: are the underlying preferences the same?, Journal of
Environmental Economics and Management, 44 (3), 391–425.
Campbell, D., Hensher, D. A. and Scarpa, R. (2011) Non-attendance to attributes in environ-
mental choice analysis: a latent class specification, Journal of Environmental Planning and
Management, 54 (8), 1061–1076.
(2012) Cost thresholds, cut-offs and sensitivities in stated choice analysis: identification and
implications, Resource and Energy Economics, 34, 396–411.
Campbell, D., Hutchinson, W. and Scarpa, R. (2008) Incorporating discontinuous preferences
into the analysis of discrete choice experiments, Environmental and Resource Economics, 43
(1), 403–417.
Cantillo, V., Heydecker, B. and Ortúzar, J. de D. (2006) A discrete choice model incorporating
thresholds for perception in attribute values, Transportation Research Part B, 40 (9),
807–825.
Cantillo, V. and Ortúzar, J. de D. (2005) A semi-compensatory discrete choice model with
explicit attribute thresholds of perception, Transportation Research Part B, 39 (7),
641–657.
Carlsson, F. and Martinsson, P. (2001) Do hypothetical and actual marginal willingness to pay
differ in choice experiments?, Journal of Environmental Economics and Management, 41
(2), 179–192.
(2002) Design techniques for stated preference methods in health economics, Health
Economics, 12, 281–294.
Carlsson, F., Frykblom, P. and Lagerkvist, C-J. (2005) Using cheap-talk as a test of validity of
choice experiments, Economics Letters, 89 (5), 147–152.
Carlsson, F., Kataria, M. and Lampi, E. (2008) Ignoring attributes in choice experiments,
Proceedings of the EAERE Conference, 25–28 June, Gothenburg.
Carp, F. M. (1974) Position effects on interview responses, Journal of Gerontology, 29 (5),
581–587.
Carrasco, J. A. and Ortúzar, J. de D. (2002) A review and assessment of the nested logit model,
Transport Reviews, 22, 197–218.
Carroll, J. S. and Johnson, E. J. (1990) Decision Research: A Field Guide, Sage, Newbury
Park, CA.
Carson, R., Flores, E., Martin, K. and Wright, J. (1996) Contingent valuation and revealed
preference methodologies: comparing the estimates for quasi-public goods, Land
Economics, 72 (1), 80–99.
Carson, R., Groves, T., List, J. and Machina, M. (2004) Probabilistic influence and supplemental
benefits: a field test of the two key assumptions underlying stated preferences, Paper
presented at the European Association of Environmental and Resource Economists,
Budapest, June.
Carson, R., Groves, T. and Machina, M. (2007) Incentive and informational properties of
preference questions, Environmental and Resource Economics, 37, 181–210.
Carson, R., Louviere, J. J., Anderson, D., Arabie, P., Bunch, D., Hensher, D. A., Johnson, R.,
Kuhfeld, W., Steinberg, D., Swait, J., Timmermans, H. and Wiley, J. (1994) Experimental
analysis of choice, Marketing Letters, 5, 351–367.
Cassel, E. and Mendelsohn, R. (1985) The choice of functional forms for hedonic price
equations: comment, Journal of Urban Economics, 18 (2), 135–142.
Caussade, S., Ortúzar, J. de D., Rizzi, L. and Hensher, D. A. (2005) Assessing the influence of
design dimensions on stated choice experiment estimates, Transportation Research Part B,
39 (7), 621–640.
Chamberlain, G. (1980) Analysis of covariance with qualitative data, Review of Economic
Studies, 47, 225–238.
Cherchi, E., Meloni, I. and Ortúzar, J. de D. (2002) Policy forecasts involving new train services:
application of mixed rp/sp models with interaction effects, Paper presented at the XII
Panamerican Conference on Transport and Traffic Engineering, Quito.
Chiuri, M. C. (2000) Individual decisions and household demand for consumption and leisure,
Research in Economics, 54, 277–324.
Choice Metrics (2012) NGene 1.1.1 User Manual and Reference Guide, Choice Metrics, Sydney.
Chorus, C. G. (2010) A new model of random regret minimization, European Journal of
Transport and Infrastructure Research, 10, 181–196.
Chorus, C. G., Arentze, T. A. and Timmermans, H. J. P. (2008a) A random regret-minimization
model of travel choice, Transportation Research Part B, 42 (1), 1–18.
(2008b) A comparison of regret minimization and utility-maximization in the context of
travel mode-choices, Proceedings of the 87th Annual Meeting of the Transportation
Research Board, Washington, DC.
Cirillo, C., Lindveld, K. and Daly, A. (2000) Eliminating bias due to the repeated measurements
problem in SP data, in Ortúzar, J. de D. (ed.), Stated Preference Modelling Techniques:
PTRC Perspectives 4, PTRC Education and Research Services Ltd, London.
Cohen, E. (2009) Applying best–worst scaling to wine marketing, International Journal of Wine
Business Research, 21 (1), 8–23.
Collins, A. T. and Rose, J. M. (2011) Estimation of stochastic scale with best–worst data,
Manuscript, Institute of Transport and Logistics Studies, University of Sydney Business
School, 2nd International Choice Modelling Conference ICMC 2011, University of Leeds,
6 July.
Collins, A. T., Rose, J. M. and Hensher, D. A. (2013) Specification issues in a generalised
random parameters attribute nonattendance model, Transportation Research Part B,
56, 234–253.
Connolly, T. and Zeelenberg, M. (2002) Regret in decision making, Current Directions in
Psychological Science, 11, 212–216.
Cook, R. D. and Nachtsheim, C. J. (1980) A comparison of algorithms for constructing exact
D-optimal designs, Technometrics, 22, 315–324.
Cooper, B., Rose, J. M. and Crase, L. (2012) Does anybody like water restrictions? Some
observations in Australian urban communities, Australian Journal of Agricultural and
Resource Economics, 56 (1), 61–81.
Corfman, K. P. (1991) Perceptions of relative influence: formation and measurement, Journal of
Marketing Research, 28, 125–136.
Corfman, K. P. and Lehmann, D. R. (1987) Models of cooperative group decision-making and
relative influence, Journal of Consumer Research, 14, 1–13.
Coricelli, G., Critchley, H. D., Joffily, M., O’Doherty, J. P., Sirigu, A. and Dolan, R. J. (2005)
Regret and its avoidance: a neuroimaging study of choice behaviour, Nature Neuroscience,
8 (9), 1255–1262.
Creel, M. D. and Loomis, J. B. (1991) Confidence intervals for welfare measures with an applica-
tion to a problem of truncated counts, Review of Economics and Statistics, 73, 370–373.
Cummings, R. G., Harrison, G. W. and Osborne, L. L. (1995) Can the bias of contingent
valuation be reduced? Evidence from the laboratory, Economics Working Paper
B-95-03, Division of Research, College of Business Administration, University of South
Carolina. Available at: www.bus.ucf.edu/gharrison/wp/.
Cummings, R. G., Harrison, G. W. and Rutström, E. E. (1995) Homegrown values and hypothe-
tical surveys: is the dichotomous choice approach incentive compatible?, American
Economic Review, 85 (1), 260–266.
Cummings, R. G. and Taylor, L. O. (1998) Does realism matter in contingent valuation
surveys?, Land Economics, 74 (2), 203–215.
(1999) Unbiased value estimates for environmental goods: a cheap talk design for the
contingent valuation method, American Economic Review, 89 (3), 649–665.
Cunha, F., Heckman, J. J. and Navarro, S. (2007) The identification and economic content of
ordered choice models with stochastic cutoffs, International Economic Review, 48 (4),
1273–1309.
Daly, A. J. (1987) Estimating ‘tree’ logit models, Transportation Research Part B, 21, 251–267.
Daly, A. J., Hess, S. and Train, K. (2012) Assuring finite moments for willingness to pay in
random coefficient models, Transportation, 39 (1), 19–31.
Daly, A. J., Hess, S., Patruni, B., Potoglou, D. et al. (2013) Using ordered attitudinal indicators in
a latent variable choice model: a study of the impact of security on rail travel behaviour,
Transportation, 39 (2), 267–297.
Daly, A. J. and Ortúzar, J. de D. (1990) Forecasting and data aggregation: theory and practice,
Traffic Engineering & Control, 31, 632–643.
Daly, A. J. and Zachary, S. (1978) Improved multiple choice models, in Hensher, D. A. and
Dalvi, M. Q. (eds.), Determinants of Travel Choice, Saxon House, Farnborough.
Day, B., Bateman, I. J., Carson, R. T., Dupont, D., Louviere, J. J., Morimoto, S., Scarpa, R. et al.
(2009) Task independence in stated preference studies: a test of order effect explanations,
CSERGE Working Paper EDM 09–14.
Day, B. and Prades, J. P. (2010) Ordering anomalies in choice experiments, Journal of
Environmental Economics and Management, 59, 271–285.
Daykin, A. and Moffatt, P. (2002) Analyzing ordered responses: a review of the ordered probit
model, Understanding Statistics, 3, 157–166.
Daziano, R. and Bolduc, D. (2012) Covariance, identification, and finite sample performance
of the MSL and Bayes estimators of a logit model with latent attributes, Transportation,
40 (3), 647–670.
Debreu, G. (1960) Review of R. D. Luce, Individual Choice Behavior, American Economic
Review, 50, 186–188.
Dellaert, B. G. C., Prodigalidad, M. and Louviere, J. J. (1998) Family members’ projections of each
other’s preference and influence: a two-stage conjoint approach, Marketing Letters, 9, 135–145.
Dempster, A. P. (1967) Upper and lower probabilities induced by a multiple-valued mapping,
Annals of Mathematical Statistics, 38, 325–339.
DeShazo, J. R. (2002) Designing transactions without framing effects in iterative question
formats, Journal of Environmental Economics and Management, 43, 360–385.
DeShazo, J. R. and Fermo, G. (2002) Designing choice sets for stated preference methods: the
effects of complexity on choice consistency, Journal of Environmental Economics and
Management, 44, 123–143.
Diamond, P. and Hausman, J. (1994) Contingent valuation: is some number better than no
number?, Journal of Economic Perspectives, 8 (4), 45–64.
Diecidue, E. and Wakker, P. P. (2001) On the intuition of rank-dependent utility, Journal of
Risk and Uncertainty, 23 (3), 281–298.
Diederich, A. (2003) MDFT account of decision making under time pressure, Psychonomic
Bulletin & Review, 10 (1), 156–166.
Ding, M., Grewal, R. and Liechty, J. (2007) An incentive-aligned mechanism for conjoint
analysis, Journal of Marketing Research, 44 (2), 214–223.
Domencich, T. and McFadden, D. (1975) Urban Travel Demand, North-Holland, Amsterdam.
Dosman, D. and Adamowicz, W. (2006) Combining stated and revealed preference data to
construct an empirical examination of intrahousehold bargaining, Review of Economics of
the Household, 4, 15–34.
Drolet, A. and Luce, M. F. (2004) The rationalizing effects of cognitive load on emotion-based
trade-off avoidance, Journal of Consumer Research, 31 (1), 63–77.
Dubois, D. and Prade, H. (1987) Representation and combination of uncertainty with belief
functions and possibility measures, Computational Intelligence, 170 (11), 909–924.
(1988) Modelling uncertainty and inductive inference: a survey of recent non-additive
probability systems, Acta Psychologica, 68, 53–78.
Dworkin, J. (1973) Global trends in natural disasters 1947–1973, Natural Hazards Research
Working Paper 26, Institute of Behavioral Science, University of Colorado.
Einstein, A. (1921) Geometry and experience, Lecture at the Prussian Academy of Science,
Berlin, 27 January.
El Helbawy, A. T. and Bradley, R. A. (1978) Treatment contrasts in paired comparisons: large-
sample results, applications and some optimal designs, Journal of the American Statistical
Association, 73, 831–839.
Eliasson, J., Hultkrantz, L., Nerhagen, L. and Smidfelt Rosqvist, L. (2009) The Stockholm
congestion charging trial 2006: overview of effects, Transportation Research Part A, 43 (3),
240–250.
Eliasson, J. and Mattsson, L. G. (2006) Equity effects of congestion pricing: quantitative
methodology and a case study for Stockholm, Transportation Research Part A, 40 (7),
602–620.
Ellsberg, D. (1961) Risk, ambiguity, and the Savage axioms, Quarterly Journal of Economics, 75
(4), 643–669.
Elrod, T. (1988) Choice map: inferring a product-market map from panel data, Marketing
Science, 7 (1), 21–40.
Elrod, T. and Keane, M. P. (1995) A factor-analytic probit model for representing the market
structure in panel data, Journal of Marketing Research, 32 (1), 1–16.
Eluru, N., Bhat, C. R. and Hensher, D. A. (2008) A mixed generalized ordered response model
for examining pedestrian and bicyclist injury severity level in traffic crashes, Accident
Analysis and Prevention, 40 (3), 1033–1054.
Ericsson, K. A. and Simon, H. A. (1993) Protocol Analysis: Verbal Reports as Data, MIT Press,
Cambridge, MA.
Eto, J., Koomey, J., Lehman, B., Martin, N., Mills, E., Webber, C. and Worrell, E. (2001) Scoping
study on trends in the economic value of electricity reliability to the US economy,
Technical Report, Energy Analysis Department, Lawrence Berkeley Laboratory,
Berkeley, CA.
Everitt, B. (1988) A finite mixture model for the clustering of mixed-mode data, Statistics and
Probability Letters, 6, 305–309.
Fader, P. S., Lattin, J. M. and Little, J. D. C. (1992) Estimating nonlinear parameters in the
multinomial logit model, Marketing Science, 11 (4), 372–385.
Fang, K.-T. and Wang, Y. (1994) Number-Theoretic Methods in Statistics, Chapman & Hall,
London.
Ferrini, S. and Scarpa, R. (2007) Designs with a-priori information for nonmarket valuation
with choice-experiments: a Monte Carlo study, Journal of Environmental Economics and
Management, 53 (3), 342–363.
Fiebig, D. G., Keane, M., Louviere, J. J. and Wasi, N. (2010) The generalized multinomial logit:
accounting for scale and coefficient heterogeneity, Marketing Science, 29 (3), 393–421.
Fisher, R. A. (1935) The Design of Experiments, Hafner Press, New York.
Flynn, T. N., Louviere, J. J., Peters, T. J. and Coast, J. (2007) Best–worst scaling: what it can do
for health care research and how to do it, Journal of Health Economics, 26, 171–189.
(2008) Estimating preferences for a dermatology consultation using best–worst scaling:
comparison of various methods of analysis, BMC Medical Research Methodology, 8 (76),
1–12.
Fosgerau, M. (2006) Investigating the distribution of the value of travel time savings,
Transportation Research Part B, 40 (8), 688–707.
(2007) Using nonparametrics to specify a model to measure the value of travel time,
Transportation Research Part A, 41 (9), 842–856.
Fosgerau, M. and Bierlaire, M. (2009) Discrete choice models with multiplicative error terms,
Transportation Research Part B, 43 (5), 494–505.
Foster, V., Bateman, I. and Harley, D. (1997) Real and hypothetical willingness to pay for
environmental preservation: a non-experimental comparison, Journal of Agricultural
Economics, 48 (1), 123–138.
Fowkes, A. S. and Wardman, M. R. (1988) The design of stated preference travel choice
experiments with particular regard to inter-personal taste variations, Journal of
Transport Economics and Policy, 22, 27–44.
Fowkes, A. S., Wardman, M. and Holden, D. G. P. (1993) Non-orthogonal stated preference
design, Proceedings of the PTRC Summer Annual Meeting, 91–97.
Fox, C. R. and Poldrack, R. A. (2008) Prospect theory and the brain, in Glimcher, P., Fehr, E.,
Camerer, C. and Poldrack, R. (eds.) Handbook of Neuroeconomics, Academic Press, San
Diego, CA, 145–170.
Fox, C. R. and Tversky, A. (1998) A belief-based account of decision under uncertainty,
Management Science, 44 (7), 870–895.
Fox, J. A., Shogren, J. F., Hayes, D. J. and Kliebenstein, J. B. (1998) CVM-X: calibrating con-
tingent values with experimental auction markets, American Journal of Agricultural
Economics, 80, 455–465.
Frykblom, P. (1997) Hypothetical question modes and real willingness to pay, Journal of
Environmental Economics and Management, 34 (2), 274–287.
Fujii, S. and Gärling, T. (2003) Application of attitude theory for improved predictive accuracy
of stated preference methods in travel demand analysis, Transportation Research Part A,
37 (4), 389–402.
Galanti, S. and Jung, A. (1997) Low-discrepancy sequences: Monte Carlo simulation of option
prices, Journal of Derivatives, 5 (1), 63–83.
Garrod, G. D., Scarpa, R. and Willis, K. G. (2002) Estimating the benefits of traffic calming on
through routes: a choice experiment approach, Journal of Transport Economics and Policy,
36 (2), 211–232.
Georgescu-Roegen, N. (1954) Choice, expectations, and measurability, Quarterly Journal of
Economics, 68, 503–534.
Geweke, J. (1989) Bayesian inference in econometric models using Monte Carlo integration,
Econometrica, 57 (6), 1317–1339.
(1991) Efficient simulation from multivariate normal and Student-t distributions subject to
linear constraints, in Keramidas, M. E. (ed.), Computer Science and Statistics: Proceedings
of the Twenty-Third Symposium on the Inference, Interface Foundation of North America,
Inc., Fairfax, VA, 571–578.
Ghosh, A. (2001) Valuing time and reliability: commuters’ mode choice from a real time
congestion pricing experiment, PhD dissertation, Department of Economics, University
of California at Irvine.
Gilboa, I. and Schmeidler, D. (2001) A Theory of Case-Based Decisions, Cambridge University
Press.
Gilbride, T. J. and Allenby, G. M. (2004) A choice model with conjunctive, disjunctive, and
compensatory screening rules, Marketing Science, 23 (3), 391–406.
(2006) Estimating heterogeneous EBA and economic screening rule choice models,
Marketing Science, 25 (5), 494–509.
Gilovich, T., Griffin, D. and Kahneman, D. (eds.) (2002) Heuristics and Biases – The Psychology
of Intuitive Judgment, Cambridge University Press.
Goett, A., Hudson, K. and Train, K. (2000) Customers’ choice among retail energy suppliers:
the willingness-to-pay for service attributes, Energy Journal, 21 (4), 1–28.
Goldstein, W. M. and Einhorn, H. J. (1987) Expression theory and the preference reversal
phenomena, Psychological Review, 94 (2), 236–254.
Golob, T. (2001) Joint models of attitudes and behaviour in evaluation of the San Diego I–15
congestion pricing project, Transportation Research Part A, 35 (6), 495–514.
González, R. and Wu, G. (1999) On the shape of the probability weighting function, Cognitive
Psychology, 38 (1), 129–166.
González-Vallejo, C. (2002) Making trade-offs: a probabilistic and context-sensitive model of
choice behavior, Psychological Review, 109, 137–155.
Goodwin, P. (1989) The rule of three: a possible solution to the political problem of competing
objectives for road pricing, Traffic Engineering and Control, 30, 495–497.
(1997) Solving congestion, Inaugural Lecture of the Professorship of Transport Policy,
University College London. Available at: www.cts.ucl.ac.uk/tsu/pbginau.htm, retrieved
19 May 2012.
Gordon, J., Chapman, R. and Blamey, R. (2001) Assessing the options for the Canberra water
supply: an application of choice modelling, in Bennett, J. and Blamey, R. (eds.), The Choice
Modelling Approach to Environmental Evaluation, Edward Elgar, Cheltenham.
Gotwalt, C. M., Jones, B. A. and Steinberg, D. M. (2009) Fast computation of designs robust to
parameter uncertainty for nonlinear settings, Technometrics, 51, 88–95.
Gourieroux, C. and Monfort, A. (1996) Simulation-Based Econometric Methods,
Oxford University Press.
Gourville, J. T. and Soman, D. (2007) Extremeness seeking: when and why consumers prefer
the extreme, Harvard Business School Working Paper 07–092.
Greene, W. H. (1997) LIMDEP version 7.0 Reference Manual, Econometric Software, New York.
(1998a) Econometric Analysis, Prentice Hall, Upper Saddle River, NJ, 4th edn.
(1998b) Gender economics courses in Liberal Arts Colleges: further results, Journal of
Economic Education, 29 (4), 291–300.
(2001) Fixed and random effects in nonlinear models, Working Paper EC-01–01, Stern
School of Business, Department of Economics, New York University.
(2002) LIMDEP version 8.0 Reference Manual, Econometric Software, New York.
(2004a) The behavior of the fixed effects estimator in nonlinear models, Econometrics
Journal, 7 (1), 98–119.
(2004b) Fixed effects and bias due to the incidental parameters problem in the Tobit model,
Econometric Reviews, 23 (2), 125–147.
(2007) Nlogit 4, Econometric Software, New York and Sydney.
(2008) Econometric Analysis, Prentice Hall, Upper Saddle River, NJ, 6th edn.
(2012) Econometric Analysis, Prentice Hall, Upper Saddle River, NJ, 7th edn.
Greene, W. H., Harris, M., Hollingsworth, B. and Maitra, P. (2008) A bivariate latent class
correlated generalized ordered probit model with an application to modelling observed
obesity levels, Working Paper 08–18, Stern School of Business, New York University.
(2006b) Hypothetical bias over uncertain outcomes, in List, J. A. (ed.), Using Experimental
Methods in Environmental and Resource Economics, Edward Elgar, Northampton, MA, 41–69.
(2007) Making choice studies incentive compatible, in Kanninen, B. (ed.), Valuing
Environmental Amenities Using Stated Choice Studies, Springer, Dordrecht, 67–110.
Harrison, G. W., Humphrey, S. and Verschoor, A. (2010) Choice under uncertainty: evidence
from Ethiopia, India and Uganda, Economic Journal, 120 (543), 80–104.
Harrison, G. W. and List, J. A. (2004) Field experiments, Journal of Economic Literature, 42 (4),
1013–1059.
Harrison, G. W. and Rutström, E. E. (2008) Experimental evidence on the existence of hypothe-
tical bias in value elicitation methods, in Plott, C. R. and Smith, V. L. (eds.), Handbook of
Experimental Economics Results, North-Holland, Amsterdam.
(2009) Expected utility theory and prospect theory: one wedding and a decent funeral,
Experimental Economics, 12 (2), 133–158.
Hausman, J. (1978) Specification tests in econometrics, Econometrica, 46, 1251–1271.
Hausman, J. and McFadden, D. (1984) Specification tests for the multinomial logit model,
Econometrica, 52, 1219–1240.
Heckman, J. (1979) Sample selection bias as a specification error, Econometrica, 47, 153–161.
Heckman, J. and Singer, B. (1984a) A method for minimizing the impact of distributional
assumptions in econometric models, Econometrica, 52, 271–320.
(1984b) Econometric duration analysis, Journal of Econometrics, 24, 63–132.
Henderson, D. and Parmeter, C. (2014) Applied Nonparametric Econometrics, Cambridge
University Press.
Hensher, D. A. (1974) A probabilistic disaggregate model of binary mode choice, in
Hensher, D. A. (ed.), Urban Travel Choice and Demand Modelling, Special Report 12,
Australian Road Research Board, Melbourne, August, 61–99.
(1975) The value of commuter travel time savings: empirical estimation using an alternative
valuation model, Journal of Transport Economics and Policy, 10 (2), 167–176.
(1986) Sequential and full information maximum likelihood estimation of a nested-logit
model, Review of Economics and Statistics, 58 (4), 657–667.
(1994) Stated preference analysis of travel choices: the state of practice, Special Issue of
Transportation on The Practice of Stated Preference Analysis, 21 (2), 106–134.
(1998) Establishing a fare elasticity regime for urban passenger transport, Journal of
Transport Economics and Policy, 32 (2), 221–246.
(1999) HEV choice models as a search engine for specification of nested logit tree structures,
Marketing Letters, 10 (4), 333–343.
(2001) Measurement of the valuation of travel time savings, Journal of Transport Economics
and Policy, 35 (1), 71–98.
(2001a) The valuation of commuter travel time savings for car drivers in New Zealand:
evaluating alternative model specifications, Transportation, 28, 110–118.
(2002) A systematic assessment of the environmental impacts of transport policy: an end use
perspective, Environmental and Resource Economics, 22 (1–2), 185–217.
(2004) Accounting for stated choice design dimensionality in willingness to pay for travel
time savings, Journal of Transport Economics and Policy, 38 (2), 425–446.
(2006a) The signs of the times: imposing a globally signed condition on willingness to pay
distributions, Transportation, 33 (3), 205–222.
(2006b) Integrating accident and travel delay externalities in an urban context, Transport
Reviews, 26 (4), 521–534.
(2006c) Revealing differences in behavioral response due to the dimensionality of
stated choice designs: an initial assessment, Environmental and Resource Economics,
34 (1), 7–44.
(2006d) How do respondents handle stated choice experiments? – Attribute processing
strategies under varying information load, Journal of Applied Econometrics, 21 (5),
861–878.
(2008) Joint estimation of process and outcome in choice experiments and implications for
willingness to pay, Journal of Transport Economics and Policy, 42 (2), 297–322.
(2010a) Attribute processing, heuristics and preference construction in choice analysis, in
Hess, S. and Daly, A. (eds.), Choice Modelling: The State of Art and the State of Practice,
Emerald Group, Bingley, 35–70.
(2010b) Hypothetical bias, choice experiments and willingness to pay, Transportation
Research Part B, 44 (6), 735–752.
Hensher, D. A., Beck, M. J. and Rose, J. M. (2011) Accounting for preference and scale hetero-
geneity in establishing whether it matters who is interviewed to reveal household auto-
mobile purchase preferences, Environmental and Resource Economics, 49, 1–22.
Hensher, D. A. and Bradley, M. (1993) Using stated response data to enrich revealed preference
discrete choice models, Marketing Letters, 4 (2), 139–152.
Hensher, D. A. and Brewer, A. M. (2000) Transport and Economics Management, Oxford
University Press.
Hensher, D. A. and Collins, A. (2011) Interrogation of responses to stated choice experiments:
is there sense in what respondents tell us? A closer look at what respondents choose in
stated choice experiments, Journal of Choice Modelling, 4 (1), 62–89.
Hensher, D. A. and Goodwin, P. B. (2004) Implementation of values of time savings: the
extended set of considerations in a tollroad context, Transport Policy, 11 (2), 171–181.
Hensher, D. A. and Greene, W. H. (2002) Specification and estimation of the nested logit
model: alternative normalisations, Transportation Research Part B, 36 (1), 1–17.
(2003) Mixed logit models: state of practice, Transportation, 30 (2), 133–176.
(2010) Non-attendance and dual processing of common-metric attributes in choice analysis:
a latent class specification, Empirical Economics, 39 (2), 413–426.
(2011) Valuation of travel time savings in WTP and preference space in the presence of taste
and scale heterogeneity, Journal of Transport Economics and Policy, 45 (3), 505–525.
Hensher, D. A., Greene, W. H. and Chorus, C. (2013) Random regret minimisation or random
utility maximisation: an exploratory analysis in the context of automobile fuel choice,
Journal of Advanced Transportation, 47, 667–678.
Hensher, D. A., Greene, W. H. and Li, Z. (2011) Embedding risk attitude and decision weights
in non-linear logit to accommodate time variability in the value of expected travel time
savings, Transportation Research Part B, 45 (7), 954–972.
Hensher, D. A. and Johnson, L. W. (1981) Applied Discrete-Choice Modelling, Croom Helm,
London/John Wiley, New York.
Hensher, D. A. and King, J. (2001) Parking demand and responsiveness to supply, pricing and
location in the Sydney Central Business District, Transportation Research Part A, 35 (3),
177–196.
Hensher, D. A. and Knowles, L. (2007) Spatial alliances of public transit operators: establishing
operator preferences for area management contracts with government, in Macario, R.,
Viega, J. and Hensher, D. A. (eds.), Competition and Ownership of Land Passenger
Transport, Elsevier, Oxford, 517–546.
Hensher, D. A. and Layton, D. (2010) Common-metric attribute parameter transfer and cogni-
tive rationalisation: implications for willingness to pay, Transportation, 37 (3), 473–490.
Hensher, D. A. and Li, Z. (2012) Valuing travel time variability within a rank-dependent utility
framework and an investigation of unobserved taste heterogeneity, Journal of Transport
Economics and Policy, 46 (2), 293–312.
(2013) Referendum voting in road pricing reform: a review of the evidence, Transport Policy,
25 (1), 186–197.
Hensher, D. A., Li, Z. and Rose, J. M. (2013) Accommodating risk in the valuation of expected
travel time savings, Journal of Advanced Transportation, 47 (2), 206–224.
Hensher, D. A., Louviere, J. J. and Swait, J. (1999) Combining sources of preference data,
Journal of Econometrics, 89, 197–221.
Hensher, D. A. and Mulley, C. (2014) Complementing distance based charges with discounted
registration fees in the reform of road user charges: the impact for motorists and govern-
ment revenue, Transportation, doi: 10.1007/s11116-013-9473-6.
Hensher, D. A., Mulley, C. and Rose, J. M. (2014, in press) Understanding the relationship
between voting preferences for public transport and perceptions and preferences for bus
rapid transit versus light rail, Journal of Transport Economics and Policy.
Hensher, D. A. and Prioni, P. (2002) A service quality index for area-wide contract performance
assessment, Journal of Transport Economics and Policy, 36, 93–113.
Hensher, D. A. and Puckett, S. M. (2007) Congestion charging as an effective travel demand
management instrument, Transportation Research Part A, 41 (5), 615–626.
Hensher, D. A., Puckett, S. M. and Rose, J. M. (2007a) Extending stated choice analysis to
recognise agent-specific attribute endogeneity in bilateral group negotiation and choice:
a think piece, Transportation, 34 (6), 667–679.
Hensher, D. A., Puckett, S. and Rose, J. (2007b) Agency decision making in freight distribution
chains: revealing a parsimonious empirical strategy from alternative behavioural struc-
tures, Transportation Research Part B, 41 (9), 924–949.
Hensher, D. A. and Rose, J. M. (2007) Development of commuter and non-commuter mode
choice models for the assessment of new public transport infrastructure projects: a case
study, Transportation Research Part A, 41 (5), 428–433.
(2009) Simplifying choice through attribute preservation or non-attendance: implications
for willingness to pay, Transportation Research Part E, 45 (4), 583–590.
(2012) The influence of alternative acceptability, attribute thresholds and choice response
certainty on automobile purchase preferences, Journal of Transport Economics and Policy,
46 (3), 451–468.
Hensher, D. A., Rose, J. M. and Beck, M. J. (2012) Are there specific design elements of choice
experiments and types of people that influence choice response certainty?, Journal of
Choice Modelling, 5 (1), 77–97.
Hensher, D. A., Rose, J. M. and Collins, A. T. (2011) Identifying commuter preferences for
existing modes and a proposed Metro, Public Transport – Planning and Operation, 3,
109–147, DOI: 10.1007/s12469-010-0035-4.
Hensher, D. A., Rose, J. M. and Black, I. (2008) Interactive agency choice in automobile
purchase decisions: the role of negotiation in determining equilibrium choice outcomes,
Journal of Transport Economics and Policy, 42 (2), 269–296.
Hensher, D. A., Rose, J. M. and Collins, A. (2013) Understanding buy in for risky prospects:
incorporating degree of belief into the ex ante assessment of support for alternative road
pricing schemes, Journal of Transport Economics and Policy, 47 (3), 453–473.
Hensher, D. A., Rose, J. and Greene, W. (2005a) The implications on willingness to pay of
respondents ignoring specific attributes, Transportation, 32 (3), 203–220.
(2005b) Applied Choice Analysis: A Primer, Cambridge University Press.
(2012) Inferring attribute non-attendance from stated choice data: implications for
willingness to pay estimates and a warning for stated choice experiment design,
Transportation, 39 (2), 235–245.
Hensher, D. A., Shore, N. and Train, K. E. (2005) Households’ willingness to pay for water
service attributes, Environmental and Resource Economics, 32, 509–531.
Hess, S. and Hensher, D. A. (2010) Using conditioning on observed choices to retrieve
individual-specific attribute processing strategies, Transportation Research Part B, 44
(6), 781–790.
Hess, S., Hensher, D. A. and Daly, A. J. (2012) Not bored yet – revisiting respondent fatigue in
stated choice experiments, Transportation Research Part A, 46 (3), 626–644.
Hess, S. and Rose, J. M. (2007) A latent class approach to modelling heterogeneous information
processing strategies in SP studies, Paper presented at the Oslo Workshop on Valuation
Methods in Transport Planning, Oslo.
(2012) Can scale and coefficient heterogeneity be separated in random coefficients models?,
Transportation, 39 (6), 1225–1239.
Hess, S., Rose, J. M. and Bain, S. (2010) Random scale heterogeneity in discrete choice models,
Paper presented at the 89th Annual Meeting of the Transportation Research Board,
Washington, DC.
Hess, S., Rose, J. M. and Hensher, D. A. (2008) Asymmetrical preference formation in willingness
to pay estimates in discrete choice models, Transportation Research Part E, 44 (5), 847–863.
Hess, S., Stathopoulos, A. and Daly, A. (2012) Allowing for heterogeneous decision rules in
discrete choice models: an approach and four case studies, Transportation, 39 (3), 565–591.
Hess, S., Train, K. E. and Polak, J. W. (2004) On the use of randomly shifted and shuffled uniform
vectors in the estimation of the mixed logit model for vehicle choice, Paper presented at the
83rd Annual Meeting of the Transportation Research Board, Washington, DC.
(2006) On the use of a Modified Latin Hypercube Sampling (MLHS) approach in the
estimation of a mixed logit model for vehicle choice, Transportation Research Part B, 40
(2), 147–163.
Hole, A. R. (2011) A discrete choice model with endogenous attribute attendance, Economics
Letters, 110 (3), 203–205.
Hollander, Y. (2006) Direct versus indirect models for the effects of unreliability,
Transportation Research Part A, 40 (9), 699–711.
Holmes, C. (1974) A statistical evaluation of rating scales, Journal of the Market Research
Society, 16, 87–107.
Holt, C. A. and Laury, S. K. (2002) Risk aversion and incentive effects, American Economic
Review, 92 (5), 1644–1655.
Houston, D. A. and Sherman, S. J. (1995) Cancellation and focus: the role of shared and unique
features in the choice process, Journal of Experimental Social Psychology, 31, 357–378.
Houston, D. A., Sherman, S. J. and Baker, S. M. (1989) The influence of unique features and
direction of comparison on preferences, Journal of Experimental Social Psychology, 25,
121–141.
Howe, C. W. and Smith, M. G. (1994) The value of water supply reliability in urban water
systems, Journal of Environmental Economics and Management, 26, 19–30.
Huber, J. and Zwerina, K. (1996) The importance of utility balance in efficient choice designs,
Journal of Marketing Research, 33, 307–317.
Hudson, D., Gallardo, K. and Hanson, T. (2006) Hypothetical (non)bias in choice experiments:
evidence from freshwater prawns, Working Paper. Department of Agricultural
Economics, Mississippi State University.
Hull, C. L. (1943) Principles of Behavior, Appleton-Century, New York.
Idson, L. C., Krantz, D. H., Osherson, D. and Bonini, N. (2001) The relation between probability
and evidence judgment: an extension of support theory, Journal of Risk and Uncertainty,
22 (3), 227–249.
Isacsson, G. (2007) The trade off between time and money: is there a difference between real
and hypothetical choices?, Swedish National Road and Transport Research Institute,
Borlange.
Ison, S. (1998) The saleability of urban road pricing, Economic Affairs, 18 (4), 21–25.
Johannesson, M., Blomquist, G., Blumenschein, K., Johansson, P., Liljas, B. and O’Connor, R.
(1999) Calibrating hypothetical willingness to pay responses, Journal of Risk and
Uncertainty, 8 (1), 21–32.
Johansson-Stenman, O. and Svedsäter, H. (2003) Self image and choice experiments: hypothe-
tical and actual willingness to pay, Working Papers in Economics 94, Department of
Economics, Gothenburg University.
John, J. A. and Draper, N. R. (1980) An alternative family of transformations, Applied Statistics,
29, 190–197.
Johnson, F. R., Kanninen, B. J. and Bingham, M. (2006) Experimental design for stated choice
studies, in Kanninen, B. J. (ed.), Valuing Environmental Amenities Using Stated Choice
Studies: A Common Sense Approach to Theory and Practice, Springer, Dordrecht, 159–202.
Johnson, R. and Orme, B. (2003) Getting the most from CBC, Sawtooth Conference Paper,
Sawtooth ART Conference, Beaver Creek.
Jones, B. D. (1999) Bounded rationality, Annual Review of Political Science, 2, 297–321.
Jones, P. M. (1998) Urban road pricing: public acceptability and barriers to implementation,
in Button, K. J. and Verhoef, E. T. (eds.), Road Pricing, Traffic Congestion and the
Environment: Issues of Efficiency and Social Feasibility, Edward Elgar, Cheltenham, 263–284.
Jones, S. and Hensher, D. A. (2004) Predicting financial distress: a mixed logit model,
Accounting Review, 79 (4), 1011–1038.
Joreskog, K. G. and Goldberger, A. S. (1975) Estimation of a model with multiple indicators and
multiple causes of a single latent variable, Journal of the American Statistical Association,
70 (351), 631–639.
Jou, R. (2001) Modelling the impact of pre-trip information on commuter departure time and
route choice, Transportation Research Part B, 35 (10), 887–902.
Jovicic, G. and Hansen, C. O. (2003) A passenger travel demand model for Copenhagen,
Transportation Research Part A, 37 (4), 333–349.
Kahneman, D. and Tversky, A. (1979) Prospect theory: an analysis of decision under risk,
Econometrica, 47 (2), 263–292.
Kanninen, B. J. (2002) Optimal design for multinomial choice experiments, Journal of
Marketing Research, 39, 214–217.
(2005) Optimal design for binary choice experiments with quadratic or interactive terms,
Paper presented at the 2005 International Health Economics Association conference,
Barcelona.
Karmarkar, U. S. (1978) Weighted subjective utility: a descriptive extension of the
expected utility model, Organizational Behavior and Human Performance, 21 (1),
61–72.
Kates, R. W. (1979) The Australian experience: summary and prospect, in Heathcote, R. L. and
Thom, B. G. (eds.), Natural Hazards in Australia, Australian Academy of Science,
Canberra, 511–520.
Kaye-Blake, W. H., Abell, W. L. and Zellman, E. (2009) Respondents’ ignoring of attribute
information in a choice modelling survey, Australian Journal of Agricultural and Resource
Economics, 53, 547–564.
Keane, M. (1990) Four essays in empirical macro and labor economics, PhD thesis, Brown
University.
(1994) A computationally practical simulation estimator for panel data, Econometrica, 62
(1), 95–116.
(2006) The generalized logit model: preliminary ideas on a research program, Presentation at
Motorola–CenSoC Hong Kong Meeting, 22 October.
Keppel, G. and Wickens, T. D. (2004) Design and Analysis: A Researcher’s Handbook, Pearson
Prentice Hall, Upper Saddle River, NJ, 4th edn.
Kessels, R., Bradley, B., Goos, P. and Vandebroek, M. (2009) An efficient algorithm for
constructing Bayesian optimal choice designs, Journal of Business and Economic
Statistics, 27 (2), 279–291.
Kessels, R., Goos, P. and Vandebroek, M. (2006) A comparison of criteria to design efficient
choice experiments, Journal of Marketing Research, 43, 409–419.
King, D., Manville, M. and Shoup, D. (2007) The political calculus of congestion pricing,
Transport Policy, 14 (2), 111–123.
King, G., Murray, C., Salomon, J. and Tandon, A. (2004) Enhancing the validity and cross-
cultural comparability of measurement in survey research, American Political Science
Review, 98, 191–207.
King, G. and Wand, J. (2007) Comparing incomparable survey responses: new tools for
anchoring vignettes, Political Analysis, 15, 46–66.
Kivetz, R., Netzer, O. and Srinivasan, V. (2004) Alternative models for capturing the compro-
mise effect, Journal of Marketing Research, 41 (3), 237–257.
Klein, R. and Spady, R. (1993) An efficient semiparametric estimator for discrete choice models,
Econometrica, 61, 387–421.
Knight, F. H. (1921) Risk, Uncertainty and Profit, University of Chicago Press.
Koss, P. and Sami Khawaja, M. (2001) The value of water supply reliability in California: a
contingent valuation study, Water Policy, 3, 165–174.
(2001) Combining preference data, in Hensher, D. A. (ed.), The Leading Edge of Travel
Behaviour Research, Pergamon Press, Oxford, 125–144.
Louviere, J., Hensher, D. A. and Swait, J. (2000) Stated Choice Methods: Analysis and
Applications, Cambridge University Press.
Louviere, J. J. and Islam, T. (2008) A comparison of importance weights and willingness-to-pay
measures derived from choice-based conjoint, constant sum scales and best–worst scaling,
Journal of Business Research, 61, 903–911.
Louviere, J. J., Islam, T., Wasi, N., Street, D. and Burgess, L. (2008) Designing discrete choice
experiments: do optimal designs come at a price?, Journal of Consumer Research, 35 (2),
360–375.
Louviere, J. J. and Lancsar, E. (2009) Choice experiments in health: the good, the bad, and the
ugly and toward a brighter future, Health Economics, Policy and Law, 4 (4), 527–546.
Louviere, J. J., Lings, I., Islam, T., Gudergan, S. and Flynn, T. (2013) An introduction to the
application of (case 1) best–worst scaling in marketing research, International Journal of
Research in Marketing, 30, 292–303.
Louviere, J. J., Meyer, R. J., Bunch, D. S., Carson, R., Dellaert, B., Hanemann, W. A.,
Hensher, D. A. and Irwin, J. (1999) Combining sources of preference data for modelling
complex decision processes, Marketing Letters, 10 (3), 205–217.
Louviere, J., Oppewal, H., Timmermans, H. and Thomas, T. (2003) Handling large numbers of
attributes in conjoint applications, Working Paper 3.
Louviere, J. J., Street, D., Burgess, L., Wasi, N., Islam, T. and Marley, A. A. J. (2008) Modelling
the choices of individual decision makers by combining efficient choice experiment
designs with extra preference information, Journal of Choice Modelling, 1(1), 128–163.
Louviere, J. J. and Woodworth, G. (1983) Design and analysis of simulated consumer choice or
allocation experiments: an approach based on aggregate data, Journal of Marketing
Research, 20, 350–367.
Luce, R. D. (1959) Individual Choice Behavior, Wiley, New York.
Luce, R. D. and Suppes, P. (1965) Preference, utility and subjective probability, in Luce, R. D.,
Bush, R. R. and Galanter, E. (eds.), Handbook of Mathematical Psychology, Vol. III, Wiley,
New York.
Liu, Y. and Mahmassani, H. (2000) Global maximum likelihood estimation procedures for
multinomial probit (MNP) model parameters, Transportation Research Part B, 34 (5),
419–444.
Lundhede, T. H., Olsen, S. B., Jacobsen, J. B. and Thorsen, B. J. (2009) Handling respondent
uncertainty in choice experiments: evaluating recoding approaches against explicit mod-
elling of uncertainty, Faculty of Life Sciences, University of Copenhagen.
Lusk, J. L. (2003) Willingness to pay for golden rice, American Journal of Agricultural
Economics, 85 (4), 840–856.
Lusk, J. L. and Norwood, F. B. (2005) Effect of experimental design on choice-based conjoint
valuation estimates, American Journal of Agricultural Economics, 87 (3), 771–785.
Lusk, J. and Schroeder, T. (2004) Are choice experiments incentive compatible? A test with
quality differentiated beef steaks, American Journal of Agricultural Economics, 86 (2),
467–482.
Maddala, G. S. (1983) Limited-Dependent and Qualitative Variables in Econometrics,
Cambridge University Press.
(2001b) Economic choices, Nobel Lecture, December 2000, American Economic Review, 91
(3), 351–378.
McFadden, D. and Train, K. (2000) Mixed MNL models for discrete response, Journal of
Applied Econometrics, 15 (5), 447–470.
McNair, B. J., Bennett, J. and Hensher, D. A. (2010) Strategic response to a sequence of discrete
choice questions, 54th Annual Conference of the Australian Agricultural and Resource
Economics Society. Adelaide.
(2011) A comparison of responses to single and repeated discrete choice questions, Resource
and Energy Economics, 33, 544–571.
McNair, B., Hensher, D. A. and Bennett, J. (2012) Modelling heterogeneity in response beha-
viour towards a sequence of discrete choice questions: a probabilistic decision process
model, Environmental and Resource Economics, 51, 599–616.
Meral, G. H. (1979) Local drought-induced conservation: California experiences, Proceedings
of the Conference on Water Conservation: Needs and Implementing Strategies, American
Society of Civil Engineers, New York.
Meyer, R. K. and Nachtsheim, C. J. (1995) The coordinate-exchange algorithm for constructing
exact optimal experimental designs, Technometrics, 37 (1), 60–69.
Mongin, P. (1997) Expected utility theory, in Davis, J., Hands, W. and Mäki, U. (eds.),
Handbook of Economic Methodology, Edward Elgar, London, 342–350.
Morikawa, T. (1989) Incorporating stated preference data in travel demand analysis, PhD
dissertation, Department of Civil Engineering, MIT.
Morikawa, T., Ben-Akiva, M. and McFadden, D. (2002) Discrete choice models incorporating
revealed preferences and psychometric data, in Franses, P. H. and Montgomery, A. L.
(eds.), Econometric Models in Marketing, Vol. 16, Elsevier, Amsterdam, 29–55.
Morokoff, W. J. and Caflisch, R. E. (1995) Quasi-Monte Carlo integration, Journal of
Computational Physics, 122 (2), 218–230.
Mundlak, Y. (1978) On the pooling of time series and cross sectional data, Econometrica, 46
(1), 69–85.
Murphy, J., Allen, P., Stevens, T. and Weatherhead, D. (2004) A meta-analysis of hypothetical
bias in stated preference valuation, Department of Resource Economics, University of
Massachusetts, Amherst, MA, January.
(2005) Is cheap talk effective at eliminating hypothetical bias in a provision point mechan-
ism?, Environmental and Resource Economics, 30 (3), 313–325.
Nelson, J. O. (1979) Northern California rationing lessons, Proceedings of the Conference on
Water Conservation: Needs and Implementing Strategies, American Society of Civil
Engineers, New York.
Niederreiter, H. (1992) Random number generation and quasi-Monte Carlo methods, CBMS-
NSF Regional Conference Series in Applied Mathematics, 63, SIAM, Philadelphia, PA.
Noland, R. B. and Polak, J. W. (2002) Travel time variability: a review of theoretical and
empirical issues, Transport Reviews, 22 (1), 39–93.
Ohler, T., Li, A., Louviere, J. J. and Swait, J. (2000) Attribute range effects in binary response
tasks, Marketing Letters, 11 (3), 249–260.
Olshavsky, R. W. (1979) Task complexity and contingent processing in decision making:
a replication and extension, Organizational Behavior and Human Performance, 24,
300–316.
Orme, B. (1998) Sample size issues for conjoint analysis studies, Sawtooth Software Technical
Paper. Available at: www.sawtoothsoftware.com/technicaldownloads.shtml#ssize.
Ortúzar, J. de D., Iacobelli, A. and Valeze, C. (2000) Estimating demand for a cycle-way
network, Transportation Research Part A, 34 (5), 353–373.
Ortúzar, J. de D. and Willumsen, L. G. (2011) Modelling Transport, Wiley, Chichester, 4th edn.
Pareto, V. (1906) Manuale di economia politica, con una introduzione alla scienza sociale,
Societa Editrice Libraria, Milan.
Park, Y.-H., Ding, M. and Rao, V. (2008) Eliciting preference for complex products: a web-
based upgrading method, Journal of Marketing Research, 45, 562–574.
Paterson, R. W., Boyle, K. J., Parmeter, C. F., Beumann, J. E. and De Civita, P. (2008)
Heterogeneity in preferences for smoking cessation, Health Economics, 17 (12), 1363–1377.
Paulhus, D. L. (1991) Measurement and control of response bias, in Robinson, J. P.,
Shaver, P. R. and Wrightsman, L. S. (eds.), Measures of Personality and Social
Psychological Attitudes, Academic Press, San Diego, CA, 17–59.
Payne, J. D. (1972) The effects of reversing the order of verbal rating scales in a postal survey,
Journal of the Market Research Society, 14, 30–44.
Payne, J. W. (1976) Task complexity and contingent processing in decision making: an infor-
mation search and protocol analysis, Organizational Behavior and Human Performance,
16, 366–387.
Payne, J. W. and Bettman, J. R. (1992) Behavioural decision research: a constructive processing
perspective, Annual Review of Psychology, 43, 87–131.
Payne, J. W., Bettman, J. R. and Johnson, E. J. (1993) The Adaptive Decision Maker, Cambridge
University Press.
Payne, J. W., Bettman, J. R. and Schkade, D. A. (1999) Measuring constructed preferences:
towards a building code, Journal of Risk and Uncertainty, 19, 243–270.
Peeta, S., Ramos, J. L. and Pasupathy, R. (2000) Content of variable message signs and on-line
driver behavior, Transportation Research Record, 1725, 102–103.
Peirce, C. S. (1876) Note on the Theory of the Economy of Research, Coast Survey Report,
197–201.
Pendyala, R. and Bricka, S. (2006) Collection and analysis of behavioural process data:
challenges and opportunities, in Stopher, P. and Stecher, C. (eds.), Travel Survey
Methods: Quality and Future Directions, Elsevier, Oxford.
Peters, R. P. and Kramer, J. (2012) Just who should pay for what? Vertical equity, transit subsidy
and road pricing: the case of New York City, Journal of Public Transportation, 15 (2), 117–136.
Poe, G., Giraud, K. and Loomis, J. (2005) Simple computational methods for measuring the
difference of empirical distributions: application to internal and external scope tests in
contingent valuation, American Journal of Agricultural Economics, 87 (2), 353–365.
Polak, J. (1987) A more general model of individual departure time choice, Transportation
Planning Methods: Proceedings of Seminar C held at the PTRC Summer Annual Meeting,
P290, 247–258.
Polak, J., Hess, S. and Liu, X. (2008) Characterising heterogeneity in attitudes to risk in expected
utility models of mode and departure time choice, Paper presented at the Transportation
Research Board (TRB) 87th Annual Meeting, Washington, DC.
Portney, P. R. (1994) The contingent valuation debate: why economists should care, Journal of
Economic Perspectives, 8 (4), 3–17.
Powers, D. A. and Xie, Y. (2000) Statistical Methods for Categorical Data Analysis, Academic
Press, New York.
Powers, E. A., Morrow, P., Goudy, W. J. and Keith, P. (1977) Serial order preference in survey
research, Public Opinion Quarterly, 41 (1), 80–85.
Prato, C. G., Bekhor, S. and Pronello, C. (2012) Latent variables and route choice behaviour,
Transportation, 39 (3), 299–319.
Pratt, J. (1981) Concavity of the log likelihood, Journal of the American Statistical Association,
76, 103–106.
Prelec, D. (1998) The probability weighting function, Econometrica, 66 (3), 497–527.
Puckett, S. M. and Hensher, D. A. (2009) Revealing the extent of process heterogeneity in choice
analysis: an empirical assessment, Transportation Research Part A, 43 (2), 117–126.
(2006) Modelling interdependent behaviour utilising a sequentially-administered stated
choice experiment: analysis of urban road freight stakeholders, Conference for the
International Association of Transport Behaviour Research, Kyoto.
(2008) The role of attribute processing strategies in estimating the preferences of road
freight stakeholders under variable road user charges, Transportation Research Part E,
44, 379–395.
Puckett, S. M., Hensher, D. A., Rose, J. M. and Collins, A. (2007) Design and development of a
stated choice experiment for interdependent agents: accounting for interactions between
buyers and sellers of urban freight services, Transportation, 34 (4), 429–451.
Pudney, S. and Shields, M. (2000) Gender, race, pay and promotion in the British nursing
profession: estimation of a generalized ordered probit model, Journal of Applied
Econometrics, 15 (4), 367–399.
Pullman, M. E., Dodson, K. J. and Moore, W. L. (1999) A comparison of conjoint methods
when there are many attributes, Marketing Letters, 10, 1–14.
Quan, W., Rose, J. M., Collins, A. T. and Bliemer, M. C. J. (2011) A comparison of algorithms
for generating efficient choice experiments, Working Paper ITLS-WP-11-19, Institute of
Transport and Logistics Studies, University of Sydney.
Quandt, R. E. (1970) The Demand for Travel: Theory and Measurement, D.C. Heath,
Lexington, MA.
Quiggin, J. (1982) A theory of anticipated utility, Journal of Economic Behavior and
Organization, 3 (4), 323–343.
(1994) Regret theory with general choice sets, Journal of Risk and Uncertainty, 8 (2), 153–165.
(1998) Individual and household willingness to pay for public goods, American Journal of
Agricultural Economics, 80, 58–63.
Racevskis, L. and Lupi, F. (2008) Incentive compatibility in an attribute-based referendum
model, Paper presented at the American Agricultural Economics Association Annual
Meeting, Orlando, FL, 27–29 July.
Rasch, G. (1960) Probabilistic Models for Some Intelligence and Attainment Tests, Danmarks
Paedagogiske Institut, Copenhagen.
Rasouli, S. and Timmermans, H. (2014) Specification of regret-based models of choice beha-
vior: formal analyses and experimental design based evidence, Eindhoven University of
Technology.
Restle, F. (1961) Psychology of Judgment and Choice, Wiley, New York.
Revelt, D. and Train, K. (1998) Mixed logit with repeated choices: households’ choices of
appliance efficiency level, Review of Economics and Statistics, 80 (4), 647–657.
Riedl, R., Brandstätter, E. and Roithmayr, F. (2008) Identifying decision strategies: a
process- and outcome-based classification method, Behavior Research Methods, 40
(3), 795–807.
Riphahn, R., Wambach, A. and Million, A. (2003) Incentive effects in the demand for health
care: a bivariate panel count estimation, Journal of Applied Econometrics, 18 (4), 387–405.
Roberts, J. A., Hann, L.-H. and Slaughter, S. A. (2006) Understanding the motivations, parti-
cipation, and performance of open source software development: a longitudinal study of
the Apache Projects, Management Science, 52 (7), 984–999.
Roeder, K., Lynch, K. and Nagin, D. (1999) Modeling uncertainty in latent class membership: a
case study in criminology, Journal of the American Statistical Association, 94, 766–776.
Rose, J. M. (2014) Interpreting discrete choice models based on best–worst data: a matter of
framing, 93rd Annual Meeting of the Transportation Research Board TRB 2014,
Washington, DC, 16 January.
Rose, J. M., Bain, S. and Bliemer, M. C. J. (2011) Experimental design strategies for stated
preference studies dealing with non market goods, in Bennett, J. (ed.), International
Handbook on Non-Marketed Environmental Valuation, Edward Elgar, Cheltenham,
273–299.
Rose, J. M., de Bekker-Grob, E. and Bliemer, M. C. J. (2012) If theoretical frameworks matter,
then why are we ignoring their tenets? A (re)examination of random utility theory and
beyond, Working Paper, Institute of Transport and Logistics Studies, The University of
Sydney, November.
Rose, J. M. and Black, I. (2006) Means matter, but variances matter too: decomposing response
latency influences on variance heterogeneity in stated preference experiments, Marketing
Letters, 17 (4), 295–310.
Rose, J. M. and Bliemer, M. C. J. (2004) The design of stated choice experiments: the state of
practice and future challenges, Working Paper ITS-WP-04-09, Institute of Transport and
Logistics Studies, University of Sydney.
(2005) Sample optimality in the design of stated choice experiments, Report
ITLS-WP-05-13, Institute of Transport and Logistics Studies, University of Sydney.
(2006) Designing efficient data for stated choice experiments, Proceedings of the 11th
International Conference on Travel Behaviour Research, Kyoto.
(2008) Stated preference experimental design strategies, in Hensher, D. A. and Button, K. J.
(eds), Handbook of Transport Modelling, Elsevier, Oxford, 151–179.
(2009) Constructing efficient stated choice experimental designs, Transport Reviews, 29 (5),
587–617.
(2011) Stated preference experimental design strategies, in Hensher, D. A. (ed.), Transport
Economics: Critical Concepts in Economics, Vol. 1, Routledge, Oxford, 304–332.
(2012) Sample optimality in the design of stated choice experiments, in Pendyala, R. and
Bhat, C. (eds), Travel Behaviour Research in the Evolving World, IATBR, Jaipur, 119–145.
(2013) Sample size requirements for stated choice experiments, Transportation, 40 (5),
1021–1041.
(2014) Stated choice experimental design theory: the who, the what and the why, in Hess, S.
and Daly, A. (eds.), Handbook of Choice Modelling, Edward Elgar, Cheltenham.
Rose, J., Bliemer, M., Hensher, D. A. and Collins, A. (2008) Designing efficient stated choice
experiments in the presence of reference alternatives, Transportation Research Part B, 42
(4), 395–406.
Rose, J. and Hensher, D. A. (2004) Modelling agent interdependency in group decision making:
methodological approaches to interactive agent choice experiments, Transportation
Research Part E, 40 (1), 63–79.
(2014) Tollroads are only part of the overall trip: the error of our ways in past willingness to
pay studies, Transportation, 41 (4), 819–837.
Rose, J., Hensher, D. A. and Greene, W. (2005) Recovering costs through price and service
differentiation: accounting for exogenous information on attribute processing strategies in
airline choice, Journal of Air Transport Management, 11, 400–407.
Rose, J. M., Scarpa, R. and Bliemer, M. C. J. (2009) Incorporating model uncertainty into
the generation of efficient stated choice experiments: a model averaging approach,
International Choice Modelling Conference, March 30-April 1, Harrogate.
Rumelhart, D. L. and Greeno, J. G. (1968) Choice between similar and dissimilar objects: an
experimental test of the Luce and Restle choice models, presented at the Midwestern
Psychological Association meeting, Chicago, May.
Russell, C. S., Arey, D. G. and Kates, R. W. (1970) Drought and Water Supply: Implications of
the Massachusetts Experience for Municipal Planning, Johns Hopkins University Press for
Resources for the Future, Inc., Baltimore, MD.
Russo, J. E. and Dosher, B. A. (1983) Strategies for multiattribute binary choice, Journal of
Experimental Psychology: Learning, Memory, & Cognition, 9 (4), 676–696.
Sakia, R. M. (1992) The Box–Cox transformation technique: a review, The Statistician, 41 (2),
169–178.
Sándor, Z. and Train, K. (2004) Quasi-random simulation of discrete choice models,
Transportation Research Part B, 38 (4), 313–327.
Sándor, Z. and Wedel, M. (2001) Designing conjoint choice experiments using managers’ prior
beliefs, Journal of Marketing Research, 38 (4), 430–444.
(2002) Profile construction in experimental choice designs for mixed logit models, Marketing
Science, 21 (4), 455–475.
(2005) Heterogeneous conjoint choice designs, Journal of Marketing Research, 42, 210–218.
Savage, L. J. (1954) The Foundations of Statistics, Wiley, London.
Scarpa, R., Campbell, D. and Hutchinson, G. (2005) Individual benefit estimates for rural
landscape improvements: the role of sequential Bayesian design and response rationality
in a choice study, Paper presented at the 14th Annual Conference of the European
Association of Environmental and Resource Economics, Bremen.
(2007) Benefit estimates for landscape improvements: sequential Bayesian design and
respondents’ rationality in a choice experiment study, Land Economics, 83 (4), 617–634.
Scarpa, R., Ferrini, S. and Willis, K. G. (2005) Performance of error component models for
status-quo effects in choice experiments, in Scarpa, R., Ferrini, S. and Willis, K. G. (eds.),
Applications of Simulation Methods in Environmental and Resource Economics, Springer,
Dordrecht, 247–274.
Scarpa, R., Gilbride, T. J., Campbell, D. and Hensher, D. A. (2009) Modelling attribute non-
attendance in choice experiments for rural landscape valuation, European Review of
Agricultural Economics, 36 (2), 151–174.
Scarpa, R. and Rose, J. M. (2008) Design efficiency for non-market valuation with choice
modelling: how to measure it, what to report and why, Australian Journal of Agricultural
and Resource Economics, 52 (3), 253–282.
Scarpa, R., Thiene, M. and Hensher, D. A. (2010) Monitoring choice task attribute attendance
in non-market valuation of multiple park management services: does it matter?, Land
Economics, 86 (4), 817–839.
(2012) Preferences for tap water attributes within couples: an exploration of alternative
mixed logit parameterizations, Water Resources Research Journal, 48 (1), 1–11,
doi:10.1029/2010WR010148.
Scarpa, R., Thiene, M. and Marangon, F. (2008) Using flexible taste distributions to value
collective reputation for environmentally-friendly production methods, Canadian Journal
of Agricultural Economics, 56, 145–162.
Scarpa, R., Thiene, M. and Train, K. (2008) Utility in willingness to pay space: a tool to address
confounding random scale effects in destination choice to the Alps, American Journal of
Agricultural Economics, 90 (4), 994–1010. (See also Appendix: Utility in WTP space: a tool
to address confounding random scale effects in destination choice to the Alps. Available at:
http://agecon.lib.umn.edu/.).
Scarpa, R., Willis, K. G. and Acutt, M. (2004) Individual-specific welfare measures for public
goods: a latent class approach to residential customers of Yorkshire Water, in Koundouri,
P. (ed.), Econometrics Informing Natural Resource Management, Edward Elgar,
Cheltenham.
Schade, J. and Baum, M. (2007) Reactance or acceptance? Reactions towards the introduction of
road pricing, Transportation Research Part A, 41 (1), 41–48.
Schade, J. and Schlag, B. (eds.) (2003) Acceptability of Transport Pricing Strategies, Elsevier,
Oxford.
Schwanen, T. and Ettema, D. (2009) Coping with unreliable transportation when collecting
children: examining parents’ behavior with cumulative prospect theory, Transportation
Research Part A, 43 (5), 511–525.
Senna, L. A. D. S. (1994) The influence of travel time variability on the value of time,
Transportation, 21 (2), 203–228.
Seror, V. (2007) Fitting observed and theoretical choices – women’s choices about prenatal
diagnosis of Down syndrome, Health Economics, 14 (2), 161–167.
Shafer, G. (1976) A Mathematical Theory of Evidence, Princeton University Press.
Sillano, M. and Ortúzar, J. de D. (2005) Willingness-to-pay estimation with mixed logit models:
some new evidence, Environment and Planning A, 37 (5), 525–550.
Simon, H. (1978) Rational decision making in organisations, American Economic Review, 69
(4), 493–513.
Simonson, I. and Tversky, A. (1992) Choice in context: tradeoff contrast and extremeness
aversion, Journal of Marketing Research, 29 (3): 281–295.
Slovic, P. (1987) Perception of risk, Science, New Series, 236 (4799), 280–285.
(1995) The construction of preference, American Psychologist, 50, 364–371.
Small, K. A. (1992) Using the revenues from congestion pricing, Transportation, 19 (3), 359–381.
Small, K. A., Noland, R. B., Chu, X. and Lewis, D. (1999) Valuation of travel-time savings and
predictability in congested conditions for highway user-cost estimation, NCHRP Report
431, Transportation Research Board, National Research Council.
Smith, V. K. and Van Houtven, G. (1998) Non-market valuation and the household, Resources
for the Future, Discussion Paper 98–31, Washington, DC.
Sobol, I. M. (1967) Distribution of points in a cube and approximate evaluation of integrals,
USSR Computational Mathematics and Mathematical Physics, 7 (4), 784–802.
Sonnier, G., Ainslie, A. and Otter, T. (2003) The influence of brand image and product style on
consumer brand valuations, Working Paper, Anderson Graduate School of Management,
University of California, Los Angeles, CA.
(2007) Heterogeneity distributions of willingness-to-pay in choice models, Quantitative
Marketing and Economics, 5 (3), 313–331.
Starmer, C. (2000) Developments in non-expected utility theory: the hunt for a descriptive
theory of choice under risk, Journal of Economic Literature, 38, 332–382.
Starmer, C. and Sugden, R. (1993) Testing for juxtaposition and event splitting effects, Journal
of Risk and Uncertainty, 6, 235–254.
Steimetz, S. (2008) Defensive driving and the external costs of accidents and travel delays,
Transportation Research Part B, 42 (9), 703–724.
Steimetz, S. and Brownstone, D. (2005) Estimating commuters’ ‘value of time’ with noisy data: a
multiple imputation approach, Transportation Research Part B, 39 (7), 565–591.
Stewart, M. B. (2004) A comparison of semiparametric estimators for the ordered response
model, Computational Statistics and Data Analysis, 49, 555–573.
Stewart, N., Chater, N., Stott, H. P. and Reimers, S. (2003) Prospect relativity: how choice
options influence decision under risk, Journal of Experimental Psychology: General, 132,
23–46.
Stopher, P. R. and Lisco, T. (1970) Modelling travel demand: a disaggregate behavioral
approach, issues and applications, Transportation Research Forum Proceedings, 195–214.
Stott, H. P. (2006) Cumulative prospect theory’s functional menagerie, Journal of Risk and
Uncertainty, 32 (2), 101–130.
Street, D., Bunch, D. and Moore, B. (2001) Optimal designs for 2k paired comparison experi-
ments, Communications in Statistics – Theory and Method, 30, 2149–2171.
Street, D. J. and Burgess, L. (2004) Optimal and near-optimal pairs for the estimation of effects
in 2-level choice experiments, Journal of Statistical Planning and Inference, 118, 185–199.
Street, D. J., Burgess, L. and Louviere, J. J. (2005) Quick and easy choice sets: constructing
optimal and nearly optimal stated choice experiments, International Journal of Research
in Marketing, 22, 459–470.
Sugden, R. (2005) Anomalies and stated preference techniques: a framework for a discussion of
coping strategies, Environmental and Resource Economics, 32, 1–12.
Sundstrom, G. A. (1987) Information search and decision making: the effects of information
displays, Acta Psychologica, 65, 165–179.
Svenson, O. (1998) The perspective from behavioral decision theory on modelling travel choice,
in Garling, T., Laitila, T. and Westin, K. (eds.), Theoretical Foundations of Travel Choice
Modelling, Elsevier, Oxford, 141–172.
Svenson, O. and Malmsten, N. (1996) Post-decision consolidation over time as a function of
gain or loss of an alternative, Scandinavian Journal of Psychology, 37 (3), 302–311.
Swait, J. (1994) A structural equation model of latent segmentation and product choice for
cross-sectional revealed preference choice data, Journal of Retail and Consumer Services, 1
(2), 77–89.
Train, K. and Revelt, D. (2000) Customer-specific taste parameters and mixed logit,
Working Paper, Department of Economics, University of California, Berkeley.
Available at: http://elsa.berkeley.edu/wp/train0999.pdf.
Train, K. and Weeks, M. (2005) Discrete choice models in preference space and willingness-to-pay
space, in Scarpa, R. and Alberini, A. (eds.), Applications of Simulation Methods in
Environmental and Resource Economics, Springer, Dordrecht, 1–16.
Train, K. and Wilson, W. (2008) Estimation on stated-preference experiments constructed
from revealed-preference choice, Transportation Research Part B, 42 (3), 191–203.
Truong, T. P. and Hensher, D. A. (2012) Linking discrete choice to continuous demand models
within a computable general equilibrium framework, Transportation Research Part B, 46
(9), 1177–1201.
Tuffin, B. (1996) On the use of low-discrepancy sequences in Monte Carlo methods, Monte
Carlo Methods and Applications, 2 (4), 295–320.
Tukey, J. W. (1957) The comparative anatomy of transformations, Annals of Mathematical
Statistics, 28, 602–632.
(1962) The future of data analysis, Annals of Mathematical Statistics, 33 (1), 1–67.
Tversky, A. and Fox, C. (1995) Weighing risk and uncertainty, Psychological Review, 102 (2),
269–283.
Tversky, A. and Kahneman, D. (1981) The framing of decisions and the psychology of choice,
Science, 211 (4481), 453–458.
(1992) Advances in prospect theory: cumulative representations of uncertainty, Journal of
Risk and Uncertainty, 5 (4), 297–323.
Tversky, A. and Koehler, D. (1994) Support theory: a nonextensional representation of
subjective probability, Psychological Review, 101 (4), 547–567.
Tversky, A. and Simonson, I. (1993) Context-dependent preferences, Management Science, 39
(10), 1179–1189.
Tversky, A., Slovic, P. and Kahneman, D. (1990) The causes of preference reversal, American
Economic Review, 80 (1), 204–217.
Ubbels, B. and Verhoef, E. (2006) Acceptability of road pricing and revenue use in the
Netherlands, European Transport/Trasporti Europei, 32, 69–94.
Uebersax, J. (1999) Probit latent class analysis with dichotomous or ordered category measures:
conditional independence/dependence models, Applied Psychological Measurement, 23,
283–297.
Van Amelsfort, D. H. and Bliemer, M. C. J. (2005) Uncertainty in travel conditions related to
travel time and arrival time: some findings from a choice experiment, Proceedings of the
European Transport Conference (ETC), Strasbourg.
van de Kaa, E. J. (2008) Extended prospect theory, TRAIL Research School, Delft.
Verlegh, P. W., Schifferstein, H. N. and Wittink, D. R. (2002) Range and number-of-levels effects
in derived and stated measures of attribute importance, Marketing Letters, 13, 41–52.
Vermeulen, B., Goos, P. and Vandebroek, M. (2008) Models and optimal designs for conjoint
choice experiments including a no-choice option, International Journal of Research in
Marketing, 25 (2), 94–103.
Vermeulen, F. (2002) Collective household models: principles and main results, Journal of
Economic Surveys, 16, 533–564.
Viney, R., Savage, E. and Louviere, J. J. (2005) Empirical investigation of experimental design
properties of discrete choice experiments, Health Economics, 14 (4), 349–362.
von Neumann, J. and Morgenstern, O. (1947) Theory of Games and Economic Behavior,
Princeton University Press, 2nd edn.
Vuong, Q. H. (1989) Likelihood ratio tests for model selection and non-nested hypotheses,
Econometrica, 57 (2), 307–333.
Wakker, P. P. (2008) Explaining the characteristics of the power (CRRA) utility family, Health
Economics, 17 (12), 1329–1344.
Walker, J. L. and Ben-Akiva, M. E. (2002) Generalized random utility model, Mathematical Social Sciences, 43
(3), 303–343.
Walker, J. L., Ben-Akiva, M. and Bolduc, D. (2007) Identification of parameters in normal error
component logit-mixture (NECLM) models, Journal of Applied Econometrics, 22 (6),
1095–1125.
Wang, X. and Hickernell, F. J. (2000) Randomized Halton sequences, Mathematical and
Computer Modelling, 32 (7–8), 887–899.
Wardman, M. (2001) A review of British evidence on time and service quality valuations,
Transportation Research Part E, 37 (2–3), 107–128.
Watson, S. M., Toner, J. P., Fowkes, A. S. and Wardman, M. R. (2000) Efficiency properties of
orthogonal stated preference designs, in Ortúzar, J. de D. (ed.), Stated Preference Modelling
Techniques, PTRC Education and Research Services Ltd, 91–101.
Williams, H. C. W. L. (1977) On the formation of travel demand models and economic
evaluation measures of user benefit, Environment and Planning Part A, 9 (3),
285–344.
Williams, R. (2006) Generalized ordered logit/partial proportional odds models for ordinal
dependent variables, Stata Journal, 6 (1), 58–82.
Wilson, A. G., Hawkins, A. F., Hill, G. J. and Wagon, D. J. (1969) Calibrating and testing of the
SELNEC transport model, Regional Studies, 3(3), 340–345.
Winiarski, M. (2003) Quasi-Monte Carlo derivative valuation and reduction of simulation bias,
MSc Thesis, Royal Institute of Technology (KTH), Stockholm.
Wong, S. K. M. and Wang, Z. W. (1993) Qualitative measures of ambiguity, in Heckerman, D.
and Mamdani, A. (eds.), Proceedings of The Ninth Conference on Uncertainty in Artificial
Intelligence, Morgan Kaufmann, San Mateo, CA, 443–450.
Wooldridge, J. (2010) Econometric Analysis of Cross Section and Panel Data, MIT Press,
Cambridge, MA.
Yanez, M. F., Raveau, S., Rojas, M. and Ortúzar, J. de D. (2009) Modelling and forecasting with
latent variables in discrete choice panel models, Proceedings of the European Transport
Conference, Noordwijk Conference Centre, Leeuwenhorst.
Yanez, M. F., Bahamonde-Birke, F., Raveau, S. and Ortúzar, J. de D. (2010) The role of tangible
attributes in hybrid discrete choice models, Proceedings of the European Transport
Conference, Glasgow.
Yanez, M. F., Raveau, S. and Ortúzar, J. de D. (2010) Inclusion of latent variables in mixed
logit models: modelling and forecasting, Transportation Research Part A: Policy and
Practice, 44 (9), 744–753.
Yoon, S.-O. and Simonson, I. (2008) Choice set configuration as a determinant of preference
attribution and strength, Journal of Consumer Research, 35, 324–336.
Yu, J., Goos, P. and Vandebroek, M. (2006) The importance of attribute interactions in conjoint
choice design and modeling, Department of Decision Sciences and Information
Management Working Paper 0601.
(2008) A comparison of different Bayesian design criteria to compute efficient conjoint
choice experiments, Department of Decision Sciences and Information Management
Working Paper 0817.
(2009) Efficient conjoint choice designs in the presence of respondent heterogeneity,
Marketing Science, 28 (1), 122–135.
Zavoina, R. and McKelvey, W. (1975) A statistical model for the analysis of ordinal level
dependent variables, Journal of Mathematical Sociology, Summer, 103–120.
Zeelenberg, M. (1999) The use of crying over spilled milk: a note on the rationality and
functionality of regret, Philosophical Psychology, 12 (3), 325–340.
Zeelenberg, M. and Pieters, R. (2007) A theory of regret regulation 1.0, Journal of Consumer
Psychology, 17 (1), 3–18.
Index
attribute levels 192, 196–201, 206, 238, 267, 277–278, 284, 304–305, 961
  balance 307
  choice experiment 994
  design 998
  effect 270–275
  expanded alternatives 208, 828
  labels 199, 200–201, 206
  LCM (latent class models) 714–715
  and parameter estimates 248
  in pivot design 256
  range effect 270–275
  ranges 273
  reduction 207–208
  in stated choice 256
  in stated choice experiment 549
  survey design 282
attribute mean and standard deviation model summary 862–863
attribute non-attendance model 736–741, 977–979, 1054–1057
attribute package levels 934
attribute preservation/non-preservation 818–819
attribute processing 15, 120, 658, 724, 819, 874, 1012
  dimensional versus holistic 1015–1016
  multiple heuristics role in 1058–1062
attribute processing heuristics, through non-linear processing 968, 986–987
attribute processing strategy 874, 983, 986, 1010
attribute profiles 821
attribute range
  CE influence on WTP 890–891
  and heterogeneity 890–891
  and MWTP 890
  profile in choice experiment 1014
attribute reduction strategy 825
attribute risk 913
attribute strategy consistency 971
attribute thresholds 948–949, 987–1009
  accounting for 989–993
  responses 998
  upper/lower cut-off 998
attribute transformations 57–71
attribute-accumulation rule 830
attribute-based decision strategies 939–941
attribute-based processing 939, 940
attribute-interaction standard deviation 641–642
attribute-specific dummy variables 1051–1052
attribute-specific standard deviation 641–642
attribute-wise transition 964–965
attributes 4, 12–13, 192
  ACMA (attribute aggregation in common-metric units) 715–722, 723
  allocated to design columns 228–247
  and alternatives 208, 367–370, 828
  ANA (attribute non-attendance) 715–722, 723
  as blocking variable 227–228
  cost-related 786
  design columns 239, 243–244, 246–247
  elemental alternatives 577
  in experimental design 473
  FAA (full attribute attendance) 715–722, 723
  fixed attribute levels 305, 308, 309
  hybrid alignable/non-alignable 958
  ignored by respondents 826, 828
  influences on 823–825
  inter-attribute correlation 198–199
  narrow attribute range 827
  of non-chosen alternatives 887
  non-considered attributes 1057
  non-linear 57–71
  non-random parameters 618–619
  observable attributes and individual behavior 977–978
  observed 360
  pivoted 786
  predefined 282
  in public transport alternatives 853
  reference dependency 829–830
  refining list of 196–201
  relative attribute levels 1051–1052
  relevance 887
  SC (stated choice) 969, 1080
  single attribute utility 199–200
  statistical significance of 616–617
attributes and mode-specific constants model results summary 862–863
Auger, P. 259
Australian case study
  commuter service packages 968, 986–987
  ordered choice model 817–830
  stated choice experiment 853–860
Australian cities example, and bivariate probit model 800–803
Australian empirical evidence results summary 881
automobile purchases, case study data 1079–1082
automobile purchases, case study results 1082–1091
AVC (asymptotic variance-covariance) matrix 248, 249–251, 257, 258, 266–267, 304–306, 309–313, 314–315, 316, 317–319, 1081
Average Partial Effect 14, 346–347, 754
averaging 28
Generalized Ordered logit models 808–809, 823, 824
  marginal effects 826
Generalized Ordered probit model 808–809, 811
generating efficient designs 247–254
generic parameter estimates 504
generic parameters 49–51, 304–305, 306–307, 308, 309, 312–313, 314, 316, 317, 318–319, 439–440
  estimates 330–331
genetic algorithms 254
GEV (generalized Extreme value) distribution types 93–98, 848
Geweke, J. 170
GHK simulator 93, 170–176
Gilboa, I. 886
Gilbride, T.J. 953–954
Gilovich, T. 818, 822, 823, 942
Glynn, P.W. 155
GMM (generalized method of moments) method 321–323
GMNL (generalized multinomial logit model) 166
GMX (generalized mixed logit model) 676–697, 704–705, 861
  direct elasticity mean estimates 698
  model 1: utility space: RPL unconstrained distributions and correlated attributes 678–681, 690, 694, 696, 697, 861–865
  model 2: WTP space: unconstrained distributions and correlated attributes 681–685, 690, 694, 696, 697, 861–865
  model 3: U-specification: GMX unconstrained t’s with scale and taste heterogeneity and correlated attributes 685–688, 690, 694, 696, 697, 861–865
  model 4: RPL t,1 688, 697
  model results summary 862–863
  Nlogit syntax 865–868
  in utility space 697–704
  variance parameters 698
GMXL (generalized random parameter/mixed logit model) 697, 860–865
GOCM (generalized ordered choice model) 807–817, 826–828, 829–830
  Nlogit commands 830–835
Goldstein, W.M. 909
Golob, T. 928
Gonzales, R. 941
good deal/bad deal heuristic 959, 960
goodness of fit 702, 792, 888, 890–891, 986
Goodwin, P. 798, 884–886
Goos, P. 318–319
Gourville, J.T. 957–958
gradient matrices 176–179
Greene, W.H. 14, 73, 103, 110, 112–114, 353, 360, 622, 672–673, 674–675, 677, 694, 707, 708, 718, 736, 751, 768, 769–770, 777, 779, 780, 804, 805, 807, 808, 810, 811–812, 813, 850, 872–873, 878–879, 884–886, 901, 941, 952–953, 979–1002, 1009
group decision making 1072–1091
group equilibrium model results 1088
group equilibrium preferences 1076–1078, 1090–1091
GSOEP (German Socioeconomic Panel data) analysis 756–766, 767
Gumbel scale MNL 993
Haab, T.C. 352
Haaijer, R. 1018
habit persistence 961
Hajivassiliou, V. 170
;halton command 608–609
Halton draws 277, 606, 614–615
Halton, J. 136
Halton sequences 138–145, 157, 168, 254, 606, 608–609
  correlation structure 164
  SHS (shuffled Halton sequences) 147–148, 614–615
Hanemann, M. 875
Harrison, G. 868–870, 874, 875, 876–877, 879, 884–886, 905, 911, 912
Hausman, J. 875
HCM (Hybrid Choice Models)
  data arrangements 935
  latent attitude variables 928, 931
  likelihood function 932
  main elements of 931–936
  multinomial choice utility functions 932
  observed indicators 932
  overview of 927–931
  socio-demographic characteristics 927–928
  underlying perceptions/attitudes 928
health care utilisation cross-tabulation 776
Heckman, J. 707, 719, 782
Hensher, D.A. 10, 14, 26–27, 30–31, 73, 74, 76, 99, 100–101, 103, 110, 112–114, 190, 191, 207, 259, 264, 265, 272–273, 275, 301–303, 360, 402–403, 547–548, 571, 594, 622, 672–673, 674–675, 677, 694, 718, 719, 736, 773, 774, 787, 788, 793, 798, 804–805, 808, 817–818, 819–820, 822, 838, 846, 848, 850, 853, 869–870, 872–873, 874, 878–879, 884–886, 887, 890–891, 892–893, 894, 901, 913, 914, 917, 921, 938, 940, 941, 942, 945, 947, 948–949, 952–953, 959, 960, 962, 964–965, 969, 979–1002, 1009, 1010, 1012, 1051, 1059, 1072, 1073, 1087
;show command 496–499
aggregation method 505
basic data set up 401–405
binary choice application 522–524
calibration 410
choice data entered on single line 424–426
choice data modeling 456–457
choice probability 465–466
combining data sources 408–410
combining SP-RP data 409
command format 393–394
command methods 392–397
commands 395–396, 437, 450–451, 461–463, 466, 492–494, 501–502, 515, 554–555, 645, 724–741, 830–835, 858–860
  NL (nested logit) model 561–567, 600
concurrent simulations 522
conditional choice 411–414
contingency table 501–502
converting single line data commands 431–432
correlation command 637, 639
correlation matrix 493, 639
and covariance matrix 430
CSV (comma delimited file) 510
data cleaning 427–428
data entered in single line 425
data entering 414–415
data entering in data editor 421–422
data of interest subset 493
data melding 405
data stacking 405
data understanding 494–502
data weighting 527–543
default missing value 420
delay choice variable adding 413
descriptive statistics 493
;descriptives command 499–501
diagnostic messages 432–433
Dstats descriptive output 494–496
elasticities 503–504, 507–511, 512–513, 524–527
elasticities output 492
endogenous weighting 527–543
error messages 432–433
exogenous variable weighting 410–411
exogenous weighting 527
exogenous weights entered 411
export command 493, 507
fen parameter characters 604
FIML (full information maximum likelihood) estimators 570–571
functions performed by 391
general choice data 402
IACE 1091–1115
importing data from file 415–420
importing small data set from text editor 418–420
indirect utility functions 457–461
initial MNL model 493
installation 388
intelligent draw methods 606
Johnson Sb distributed parameter 603
kernel densities 621
labeled choice data set 437, 466–471
LCM (latent class models) 714–741
leaving session 391
LHS choice variable 426
limitations in 398
LIML (limited information maximum likelihood) estimators 570–571
Log-likelihood (LL) functions 571–573
lognormal distributed parameter 603
marginal effects output 492
Maximize command 554–555
mean centring variables 495
missing data 412, 495
MNL command 437–444
MNL output 570–571
model calibration 555–559
model parameters 502–518
model results summary 550
multiple data sets entering 405
naive pooling 373, 377, 505
NL ML estimation 570–571
NL (nested logit) see NL (nested logit)
NLWLR 1062–1066
no choice alternative 406–408, 411–414
no choice variable adding 413
observation removal 547
overview 387–388
parameter estimates 646–647
parameter names 440–444
parameter names selection 494
partial output 492
partial/marginal effects 14, 346–347, 374–378, 513–515
  binary choice 515–518
PARTIALS command 515
prediction success 492
probabilities calculation 576–577
program crashes 433
project file 390, 414
Project File Box 396–397
question mark (?) in commands 492–493
RAM heuristics 1062–1066
random parameter covariances 637
random parameter draws 604–606
random parameters 603, 604–606
Rayleigh variable 621
reading the data 388, 493
Scarpa, R. 112–114, 288–289, 313–318, 672–673, 677, 690, 696, 718, 952–953, 1025
;scenario command 519
Schade, J. 798
Schmeidler, D. 886
Schroeder, T. 868–869, 872–873, 877–878
SDCs (socio-demographic characteristics) 440, 461, 518
SDT (slowed down time) 381, 1010–1013
segmentation instrument 31
selected observations 783
selective processing 939
self-stated processing response 981–987
SELNEC transport model 9
SEM (structural equation modeling) 929–930
semi-compensatory behavior 367
semi-parametric probability representations 126
Senna, L.A.D.S. 906, 920
sequential estimation 929–930
SEUT (Subjective Expected Utility Theory) 907
shape of the preferences of individuals 20
Shields, M. 810, 811
Shiroishi, F. 838
;show command 496–499
SHS (shuffled Halton sequences) see Halton
shuffled uniform vectors 606
sigmoidal curve 44
significance value 453
significant parameter estimates 615
Simon, H. 937
Simonson, I. 937, 938, 956–957, 958, 959
simulated data 76–77
simulated log-likelihood estimation
cross-sectional model 130
panel model 132
simulated log-likelihood function 129–131, 134
simulated maximum likelihood 126–133, 675
;simulation command 518–527
simultaneous equations 777–782
Singer, B. 707, 719
single attribute utility 199–200
single crossing characteristic 807
Slovic, P. 907
Small, K.A. 1, 868–869, 877–878, 879, 880, 881–882, 884–886, 887, 891, 892, 893, 896, 907, 913
Smith, V.K. 1072
SMNL (scaled multinomial logit) model 111, 861
Model 1 (MNL multinomial model) 697, 698, 701–702
Model 2 (MXL mixed logit model) 697, 698, 701–702
Model 3 (GMXL generalized random parameter/mixed logit model) 697, 698, 701–702
Model 4 (SMNL scale heterogeneity model) 697, 698, 701–702, 704–705
in utility space 697–704
smoothed AR (accept–reject) simulator 169–170
smoothing parameter 620–621
Sobol, I.M. 150
Sobol draws 152–153
Sobol sequences 150–153, 175, 254
socio-demographic variables 478–483
socio-economic characteristics 4
Soman, D. 957–958
Sonnier, G. 112–114, 674–675, 677
sources of agreement 1086–1087, 1091
SP (stated preference) choice 12
SP (stated preference) contextual biases 594
SP (stated preference) data 464–465, 472, 527, 704–705, 843–848
data enrichment 838–839
market constraints 838
personal constraints 838
product sets 838
technological relationships 837
vector of attributes 841
and WTP 868
see also combining data sources
SP (stated preference) estimates 285
SP (stated preference) parameter estimates 847–848
SP (stated preference) t-ratios 285
SP–RP, see also combining data sources
SP–RP models 14–15, 409, 846–847, 848–849, 853–868
Spady, R. 126, 745
specific attribute processing heuristics 1009
specific designs optimal choice probability 314
specific parameter estimates 49–51, 90–91
specification tests 321, 330–333
SPRP dummy variable 408
SPSS program 223–224, 228, 242, 243–247
SQ (status quo) alternatives 1010–1013, 1025
Srinivasan, V. 955, 956–957
Stacey, E.C. 594
standard deviation parameters 641–643, 669
matrix 647
standard errors 247, 314–315, 669–670
bootstrapping 336–340
cluster correction 770–772
standard utility maximizers 971–973
Starmer, C. 256, 819, 940
Stat Transfer program 416
state dependence 829–830, 848, 851, 855, 961, 967–968
stated choice see SC
stated preference data see SP
stated preference experiments 14, 192–194, 202
uncertainty and risk 906–907
unconditional distribution 360–363
unconditional parameter estimates 614, 644–645
unconditional random parameters 614
under-sampling 12–13
underlying influences 18
underlying perceptions 928
univariate distributions 127
univariate draws 851
unlabeled alternatives 13
unlabeled choice data/experiments 14–15, 205–207
and ASCs 473–474, 475, 477–478, 480–481
beyond design attributes 478–483
and covariates 478–483
covariates in 478–483
descriptive statistics 476
discrete choice data 472–483
interaction terms 479, 482
introduction to 472–473
model results 475–478, 480–483
models 473–478, 483–491
and non-constant parameter estimates 474–475
socio-demographic variables 478–483
unlabeled discrete choice data, Nlogit syntax and output 483–491
unobserved effect, normalization of 88–90
unobserved preference heterogeneity 848–849
unobserved scale heterogeneity 992–993
unobserved utility coefficients 361
unobserved variability 28–29
utility 7, 45–48
ASC (alternative-specific constants) 51–52, 53–54
cardinal utility 45–46
current/historical 961–962
estimation of weighted LPLA and NLWLR decision rules 1062
as latent construct 83
linear utility function 49
marginal utility 49, 98, 199
meta-utility 961
non-linear utility function 49
observed components 45, 48–75, 81–83, 88–90, 91, 92
ordinal utility 45–46
part-worth utility 199, 210
single attribute utility 199–200
specific parameter estimates 49–51
standard utility maximizers 971–973
unobserved components 45, 81–86, 92
see also satisfaction
utility balance 312–313
utility components 278
utility expressions 120
utility functions 80, 90–91, 278, 439, 440
heuristics in 1059
utility heteroskedastic interpretation see EC (error components) model
utility level 19, 46
utility maximization calculus 1012
utility maximizing behavior 21
utility modeling 45, 48–75, 81–83
utility scale 46
utility space 19
utility specification 120, 439
value learning
heuristic 960, 967–968, 1052–1054
model 1054–1057
role of 1053
Van de Kaa, E.J. 908, 913
Van Houtven, G. 1072
Vandebroek, M. 318–319
variable metric algorithms 186
variable names 440–444
variables 32–39
variance estimation 333–340
variance-covariance matrix 639–640, 641, 850
variances of function 340–359
variety seeking 25
VC (variance-covariance) matrix 13–14, 46, 89–90, 91, 160–163, 181, 302, 303
Verhoef, E. 798
Vermeulen, F. 1072
VETTS 917, 920–922
Viney, R.E. Savage 277–278
Von Neumann, J. 905
VTTS (value of travel time savings)
APS influence 983
evidence on value 979–981
plausible choice implications 1024
summary 671, 981
weighted average 986–987, 1044
Vuong, Q.H. 327
Vuong statistic/test 328–330
WADD (weighted additive) decision strategy 939–941, 943, 944–945, 965–966
Wakker, P.P. 908, 912
Wald distance 323–324
Wald statistics/test/procedure 323–326, 331, 381, 460–461, 543, 573, 616, 753, 792
Walker, J. 89–90, 928
Wambach, A. 756
Wang, P. 137, 145–147
Wardman, M. 248, 304–305, 884–886, 887
Watson, S.M. 305
Wedel, M. 190, 248, 276, 308–309