Bayesian Statistics and Marketing, by Rossi and Allenby
Marketing Science
2003 INFORMS
Bayesian Statistics and Marketing
Peter E. Rossi Greg M. Allenby
Graduate School of Business, University of Chicago, 1101 E. 58th Street, Chicago, Illinois 60637
Fisher College of Business, Ohio State University, 2100 Neil Avenue, Columbus, Ohio 43210
Downloaded from informs.org by [128.59.106.102] on 02 March 2016, at 12:41 . For personal use only, all rights reserved.
[email protected] [email protected]
Bayesian methods have become widespread in marketing literature. We review the essence
of the Bayesian approach and explain why it is particularly useful for marketing problems.
While the appeal of the Bayesian approach has long been noted by researchers, recent
developments in computational methods and expanded availability of detailed marketplace
data have fueled the growth in application of Bayesian methods in marketing. We emphasize
the modularity and flexibility of modern Bayesian approaches. The usefulness of Bayesian
methods in situations in which there is limited information about a large number of units
or where the information comes from different sources is noted. We include an extensive
discussion of open issues and directions for future research.
(Bayesian Statistics; Decision Theory; Marketing Models; Critical Review)
We will see how the Bayesian approach provides a unified treatment of all three components. We will follow these three steps as the outline of the paper, and conclude the paper with a discussion of open issues and directions for future research. We have also included Annotated Citations of Bayesian Applications in Marketing in Appendix 1, which contains a list of published or accepted papers of the last ten years that tackle marketing problems using Bayesian methods. The annotations provide a brief description of the paper and how it relates to the topics discussed in this paper.

2. Bayesian Essentials
In this section, we introduce our notation for the Bayesian paradigm, and comment on the important distinctions between classical and Bayesian approaches. We feel that these distinctions are underappreciated by researchers in marketing. We do not attempt to provide a primer for Bayesian inference. For those interested in an introduction to Bayesian inference and modern Bayesian computing methods, there are many excellent texts, including Bernardo and Smith (1994), Gelman et al. (1995), Robert and Casella (1999), and Liu (2001).

All Bayesian analysis starts with the specification of the data-generating mechanism or the distribution of the data $y$, given the unobservable parameters $\theta$, $p(y \mid \theta)$. Viewed as a function of the parameters, this distribution is sometimes called the likelihood function, $l(\theta) = p(y \mid \theta)$. The Bayesian, therefore, subscribes to the likelihood principle that states that the likelihood function contains all relevant information regarding the model parameters. In addition, a probability distribution representing prior beliefs about $\theta$ is required, $p(\theta)$. Bayes' theorem provides the updating mechanism for how prior beliefs are translated into posterior (or after the data) beliefs.

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta)$$

$p(\theta \mid y)$ is called the posterior distribution and reflects both the prior beliefs, as well as sample information. We note, immediately, that the posterior is a conditional distribution that conditions on the data. This provides a marked contrast to the sampling theoretic view in which we consider the data random, and we investigate the behavior of test statistics or estimators over imaginary samples from $p(y \mid \theta)$. The Bayesian would regard the sampling distribution as irrelevant to the problem of inference because it considers events $y$ that have not occurred. Inference is the problem of making statements about the unobservables conditional on the data.

Since the posterior distribution can be a high-dimensional object, investigators typically summarize the posterior in terms of some lower dimensional summary statistics. Typically, the posterior mean $E[\theta] = \int \theta\, p(\theta \mid y)\, d\theta$ is used as an estimator and the posterior standard deviation is used as a measure of precision. Both of these quantities are the integrals of specific functions of the parameter vector, $E_{\theta \mid y}[h(\theta)]$. Other important examples include: (i) aspects of the marginal distribution of one element or a subset of the $\theta$ vector; (ii) posterior probabilities of intervals or regions of the parameter space (such as the posterior probability that a price coefficient is negative); and (iii) predictive distributions of the data, $p(y_f \mid y) = \int p(y_f \mid \theta)\, p(\theta \mid y)\, d\theta$. Thus, the Bayesian investigator is faced with the problem of computing a multidimensional integral of the posterior distribution. Methods for computing these integrals are at the core of the recent revolution in computing for Bayesian statistics.

The Bayesian framework is compelling in the sense that it provides a unified approach to modeling, incorporation of prior information, and inference. Inference here refers to making a posteriori statements about all unobservables including both parameters and, as yet unrealized, data (prediction). Bayesian inference adheres to the likelihood principle and is conducted using formal rules of probability theory. This means that, under mild conditions, Bayes estimators are consistent, asymptotically efficient, and admissible. As a practical matter, Bayesian inference is free from the use of asymptotic approximations and delivers exact, finite sample inference. This is particularly important in nonlinear models and models with discrete data. The intuition developed for regression models of the sample size required for asymptotic sampling theory to be accurate does not carry over well to many
of the models used with marketing data. In particular, choice models may require extremely large (as much as 1,000 observations per parameter) samples to ensure the adequacy of asymptotic approximations (cf. McCulloch and Rossi 1994).

In general, Bayesian methods provide a better approximation to the level of uncertainty or, conversely, the amount of information provided by the model and the data than other approaches. For example, consider two-step procedures in which a subset of parameters is estimated in the first stage, then the second stage estimates the remaining parameters, conditional on the first subset. Parameter uncertainty is difficult to account for in multistage analyses. Lenk and DeSarbo (2000) provide an example of how a full Bayesian procedure outperforms an approximate two-step procedure for clustering problems. Parameter uncertainty and model uncertainty are particularly important considerations in optimal decision theory. Optimal decision making should take into account uncertainty to avoid the problem of overconfidence (see Montgomery and Bradlow 1999). Bayesian decision theory provides a unified approach to inference, model choice, and uncertainty as discussed in §6 of this paper.

The advantages of Bayesian inference are not obtained without a cost, however. The Bayesian approach is likelihood-based and requires a prior. Some have criticized Bayesian methods as relying on subjective prior information. It is important to note that the basis of prior information can also be objective or data-based. In addition, all modeling assumptions are a form of prior information. The advantage of the Bayesian approach is that all prior assumptions are explicitly stated. Adherence to the principles of scientific inquiry does not rule out the use of subjective information but, rather, requires the specification of explicit and replicable procedures. It should be noted that in the practical domain of marketing, methods that make full use of prior information are required for reliable inference because information about unknown quantities is hard to come by. Prior information from experts (Sandor and Wedel 2001), theories (Montgomery and Rossi 1999), or other datasets (Lenk and Rao 1990, Putler et al. 1996, Kamakura and Wedel 1997, Wedel and Pieters 2000, Ansari et al. 2000a, Ter Hofstede et al. 2002) is important to marketing problems. Classical inference procedures are silent on how to incorporate information from sources other than the data.

Some contend that specification of the likelihood function is another drawback of the Bayesian approach. For some models, evaluation of the likelihood can be computationally demanding. In other situations, the investigator may be concerned with model specification error induced by specifying an inappropriate likelihood. Recent developments in statistical computing have opened up the possibility of analyzing likelihood functions once thought to be computationally intractable. Regarding prior and likelihood specification, we recommend that the investigator perform sensitivity analysis.

3. MCMC Simulation Methods
The general computational problem facing Bayesians is the computation of various integrals of functions with respect to the posterior distribution. Since these integrals can be written as the posterior expectation of a function of the parameters, simulation methods seem natural candidates for approximation. For example, if we could make i.i.d. draws from the posterior we could simply approximate the integrals by the sample mean

$$I = E_{\theta \mid y}[h(\theta)] = \int h(\theta)\, p(\theta \mid y)\, d\theta$$

$$\hat{I} = \frac{1}{R} \sum_{r=1}^{R} h(\theta^{r})$$

If draws from the posterior are available at low computational cost, we could simply use a very large sample to approximate $I$ to any desired degree of accuracy. However, the general problem of drawing from an arbitrary multivariate distribution is extremely difficult and there is no computationally feasible general method.

Instead of using i.i.d. draws, another approach could be to construct a Markov chain with the posterior as its stationary or equilibrium distribution. In practice, this means specifying a transition density that produces a sequence of draws. $\theta^{r}$ is a draw from $p(\theta^{r} \mid \theta^{r-1})$ given $\theta^{0}$. If $p(\theta \mid y)$ is the stationary distribution of this Markov chain, then we can simply iterate the chain long enough to dissipate the effects of the initial condition and then save these draws to

The sequence of draws converges in distribution to the joint posterior distribution of the model parameters, $p(\theta_1, \ldots, \theta_K \mid y)$. In addition, draws from the posterior distribution for any one parameter,
distributions. One very important example is random coefficient models, discussed below. Perhaps, more importantly, the modularity of the hierarchical modeling approaches, that dovetails so well with MCMC methods, has enlarged the class of priors and likelihoods available for use in marketing applications.

4. Within-Unit Analysis: Likelihoods and Marketing Data
Marketing is concerned with understanding and reacting to the behavior of individual consumers. Decisions are ultimately made at a disaggregate level, although for some types of decisions (e.g., setting store prices) an aggregate-level analysis is acceptable. We note that it is always possible to derive aggregate predictions of actions by integrating over the distribution of heterogeneity. Our discussion, therefore, focuses on data and models assuming that the unit of analysis is an individual respondent, consumer, or household.

Marketing data is sparse at the individual-unit level. In scanner panels of household purchases, for example, it is rare to have more than 20 observations per household in most product categories. Each observation is a vector response corresponding to the quantity purchased of a particular offering. The most frequent response value is zero, indicating no purchase of the offering, and the second most frequent response is one, indicating that one unit of the good is purchased. Responses also take on integer values in surveys where respondents are asked to choose between discrete alternatives, to rank order objects (Bradlow and Fader 2001), and to provide responses on five- and seven-point scales. Marketing data are typically very lumpy, and are not well-suited to standard distributional assumptions (e.g., normal, gamma, Poisson).

Latent variable models are often used to explain marketing data. A latent variable model typically assumes that there exists an unobserved continuous variable and a censoring mechanism that gives rise to the discrete outcome. In an economic model of choice between near-perfect substitutes, for example, consumers are assumed to select the offering with greatest value, measured as the ratio of marginal utility to price. In the analysis of survey response data, it is sometimes convenient to assume that responses on a fixed-point (e.g., five- and seven-point) scale are a censored realization of a latent, continuous variable. Finally, in the analysis of multiple response data (i.e., "pick any of J" data), each element of the vector of multivariate binomial responses can be thought of as being equal to one if a latent variable surpasses a threshold, and equal to zero if the latent variable is less than the threshold value.

The advantage of a latent variable approach to modeling marketing data is that it provides a flexible approach to specifying the likelihood function that is consistent with the observed, lumpy data (Rossi et al. 2001, Marshall and Bradlow 2002). Models for the latent utility can be continuous even though the range of the observed dependent variable is discrete. Many useful models can be constructed starting from an underlying multivariate normal regression model

$$z = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \Sigma)$$

Here, $z$ is an $m \times 1$ vector, which is multivariate normal conditional on $X$. The latent vector, $z$, is censored via some function which is not a function of the model parameters, $(\beta, \Sigma)$. Examples include:

Tobit Model: $m = 1$; $y = 0$ if $z < 0$; $y = z$ if $z \ge 0$
Ordered Probit: $m = 1$; $y = r$ if $c_{r-1} \le z < c_r$, $r = 1, \ldots, R$; $c_0 = -\infty$, $c_R = \infty$
Multinomial Probit (MNP): $y = j$ if $z_j = \max(z_1, \ldots, z_m)$
Multivariate Probit: $y_j = 1$ if $z_j > 0$; else $y_j = 0$

These four examples illustrate the flexibility of the latent framework. The Tobit model produces a discrete-continuous distribution for $y$ given $x$ that has a lump of probability at zero (the no-purchase option, for example). The ordered probit can be applied to ratings data in which the respondent provides ratings on a ratings scale. The MNP model is a very flexible general model that accommodates situations in which choices are made from a set of $m$ alternatives. Finally, the multivariate probit model can be used in situations such as the "pick j from J" alternatives or where binary choice is made in different time periods or categories of products.

Latent variable models can often be given an economic interpretation as a random utility model. Consider, for example, the MNP model. If consumers have
linear utility and can only choose one alternative, the utility-maximizing choice is the choice for which the ratio of marginal utility to price is the highest;

$$y = j \quad \text{if} \quad U_j / p_j = \max_i \{ U_i / p_i \}$$

where $U_i$ is the marginal utility of choice $i$. In the random utility model, marginal utilities are not fully observable. We only observe various attributes of the choice that are represented in the $x$ vector. If $\ln U_i = V_i + \varepsilon_i$ and $\varepsilon \sim N(0, \Sigma)$, then the model becomes

problem. Classical econometricians have focused on methods for approximating these integrals. The state of the art in this area is the so-called GHK algorithm (Keane 1993). The GHK algorithm uses importance sampling to approximate these probabilities. The current classical practice involves using simulation methods to approximate the likelihood (Huber and Train 2001) and then uses standard maximum likelihood procedures, ignoring the simulation error (this is often called the simulated maximum likelihood approach).
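
To make the latent variable reading of the MNP model concrete, the sketch below draws latent utilities $z = v + L e$, with $e \sim N(0, I)$ and $L$ a lower-triangular factor of $\Sigma$, records the index of the maximum as the observed choice, and approximates choice probabilities by simple frequency counts. This is a crude frequency simulator chosen here for brevity; it is not the GHK algorithm. The names `simulate_choice`, `frequency_simulator`, `v`, and `chol`, and the numerical values, are hypothetical.

```python
import random

def simulate_choice(v, chol, rng):
    """Draw z = v + L e with e ~ N(0, I) and return the index of the largest z_j."""
    m = len(v)
    e = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # chol is lower triangular, so only columns k <= i contribute to row i.
    z = [v[i] + sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
    return max(range(m), key=lambda j: z[j])

def frequency_simulator(v, chol, n_sims=20000, seed=0):
    """Approximate the m choice probabilities by relative frequencies of the argmax."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_sims):
        counts[simulate_choice(v, chol, rng)] += 1
    return [c / n_sims for c in counts]

# Illustrative inputs (assumed): three alternatives, identity covariance.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs_equal = frequency_simulator([0.0, 0.0, 0.0], identity)   # symmetric case
probs_shift = frequency_simulator([1.0, 0.0, 0.0], identity)   # alternative 0 favored
print(probs_equal, probs_shift)
```

Each probability here is an integral of the multivariate normal over the region where one alternative has the largest latent value, which is why likelihood evaluation becomes costly as $m$ grows.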
model, exact analytic results are available for the posterior of $(\beta, \Sigma)$. Draws from the truncated multivariate normal can easily be accomplished via one-by-one draws from a series of univariate truncated normal distributions (see McCulloch and Rossi 1994, Allenby et al. 1995). This amounts to defining a subchain to draw the truncated normal vector. What is important to note is that by augmenting with the latent variable, we have avoided evaluation of any choice probabilities or other integrals of the multivariate normal. The cost of computational simplification is an enlargement of the state space for the Markov chain. In general, this will cause the data-augmented MCMC method to converge more slowly and exhibit higher autocorrelation than the non-data-augmented sampler. In the case of the MNP model, Nobile (1998) has indicated that, under certain conditions, the standard augmented Gibbs sampler can exhibit very high autocorrelation and proposes an improved chain.

Thus, data augmentation provides a clever way of avoiding evaluation of various multivariate integrals at the possible expense of introducing high autocorrelation into the MCMC method. Our experience, however, has shown that the basic MNP Gibbs sampler works well and can handle problems for which the method of simulated maximum likelihood grinds to a halt.

4.2. Identification
The latent variable formulation provides a natural mechanism for understanding the identification problem in these models. Identification problems stem from the fact that various transformations of the latent variables leave the observed censored outcome variable unchanged. For example, recall that in the MNP model the choice is made with the highest latent value. There are two transformations that leave the index of the maximum unchanged: location and scale shifts (see McCulloch and Rossi 1999, for further details). Identification can be achieved either by imposing exact restrictions on the model parameters, or by employing informative priors on the full parameter space and marginalizing on the identified parameters.

In many classical and Bayesian approaches, the approach to this scaling problem is to fix one of the elements of the covariance matrix (typically, the (1, 1) element) to one. For Bayesian methods, the restriction of the covariance matrix makes it difficult to use standard conjugate priors such as the Wishart prior. McCulloch et al. (2000) show how to construct practical priors on the appropriate space of matrices.

However, Bayesians are not limited to exact restrictions as a way of solving various identification problems. Use of a proper prior distribution ensures that the posterior is proper, even if the likelihood is not identified. In a Bayesian analysis, the issue of statistical identification shifts from an identified/not identified dichotomy, to an issue of the degree of identification and to subspaces of the posterior distribution that are well identified. For example, we can use a proper but diffuse prior in the unidentified parameter space, and simply marginalize or project down on the space of identified parameters. The only added cost of this procedure is making sure that the induced prior on the identified quantities is sufficiently diffuse to be usable in those situations in which we want our inferences driven primarily by the data.

An even more striking example of the usefulness of this idea of navigating in the full, unidentified space can be found in the multivariate probit model. Here the identified parameters consist only of the correlation matrix of the latent variables because separate scaling constants can be used for each element. Until recently (see Barnard et al. 2000) convenient priors for correlation matrices have not been available. Standard MCMC methods, such as Metropolis-Hastings, are difficult to adapt to the highly restricted space of valid correlation matrices (Manchanda et al. 1999). In other words, it is hard to draw candidate correlation matrices. As Edwards and Allenby (2002) illustrate, all of this can be avoided by navigating in the unidentified space and projecting down to the space of correlation matrices (see also DeSarbo et al. 1999). These algorithms are fast and reliable.

We have seen that disaggregate marketing data is often lumpy, containing discrete mass points of probability. A natural framework for building models with discrete aspects is to use an underlying continuous latent variable, coupled with some sort of censoring mechanism. Not only are latent variables useful for generating models but also the new MCMC Bayesian inference methods nicely exploit the latent structure. Finally, identification problems that are common in latent variable models can be handled with great flexibility in the Bayesian approach.

means that the specification of the functional form and hyperparameter for the prior may be important in determining the inferences made for any one unit. A good example of this can be found in choice data
pooled model information matrix. The prior covariance is scaled back to represent the expected information in one observation to insure a relatively diffuse prior. Use of this sort of normal prior will induce a unit-level maximum likelihood estimates $\{\hat{\theta}_i\}$. For diffuse prior settings, the normal form of the prior will be responsible for the shrinkage effects. In particular, outliers will be shrunk dramatically toward the prior mean. For many applications, this is a very desirable feature of the normal form prior. We will shrink the outliers in toward the rest of the parameter estimates and leave the rest pretty much alone.

5.2. Hierarchical Models
In general, however, it may be desirable to have the amount of shrinkage induced by the priors driven by information in the data. That is, we should adapt the level of shrinkage to the information in the data regarding the dispersion in $\{\theta_i\}$. If, for example, we observe that the $\{\theta_i\}$ are tightly distributed about some location or that there is very little information in each unit-level likelihood, then we might want to increase the tightness of the prior so that the shrinkage effects are larger. This feature of adaptive shrinkage was the original motivation for work by Efron and Morris (1975) and others on empirical Bayes approaches in which prior parameters were estimated. These empirical Bayes approaches are an approximation to a full Bayes approach in which we specify a second-stage prior on the hyperparameters of the conditional independent prior. This specification is called a hierarchical Bayes model and consists of the unit-level likelihood and two stages of priors.

Likelihood: $p(y_i \mid \theta_i)$
First-stage prior: $p(\theta_i \mid \tau)$
Second-stage prior: $p(\tau \mid h)$.

The joint posterior for the hierarchical model is given by

$$p(\theta_1, \ldots, \theta_m, \tau \mid y_1, \ldots, y_m, h) \propto \left[ \prod_i p(y_i \mid \theta_i)\, p(\theta_i \mid \tau) \right] p(\tau \mid h)$$

In the hierarchical model, the prior induced on the unit-level parameters is not an independent prior. The unit-level parameters are conditionally, but not unconditionally, a priori independent.

If, for example, the second-stage prior on $\tau$ is very diffuse, the marginal priors on the unit-level parameters, $\theta_i$, will be highly dependent, as each parameter has a large common component.

The hierarchical model specifies that both prior and sample information will be used to make inferences about the common parameter, $\tau$. For example, in normal prior, $\theta_i \sim N(\bar{\theta}, V_\theta)$, the common parameters provide the location and the spread of the distribution of $\theta_i$. Thus, the posterior for the $\theta_i$ will reflect a level of shrinkage inferred from the data. It is important to remember, however, that the normal functional form will induce a great deal of shrinkage for outlying units, even if the posterior of $V_\theta$ is centered on large values.

5.3. Inference for Hierarchical Models
Hierarchical models for panel data structures are ideally suited for MCMC methods. In particular, a Gibbs-style Markov chain can often be constructed by considering the basic two sets of conditionals:

(1) $\theta_i \mid \tau, y_i$
and
(2) $\tau \mid \{\theta_i\}$

The first set of conditionals exploits the fact that the $\theta_i$ are conditionally independent. The second set exploits the fact that $\{\theta_i\}$ are sufficient for $\tau$. That is, once the $\{\theta_i\}$ are drawn from (1), these serve as data to the inferences regarding $\tau$. If, for example, the first-stage prior is normal, then standard natural conjugate priors can be used, and all draws can be done one-for-one and in logical blocks. This normal prior model is also the building block for other more complicated priors. The normal model is given by

$$\theta_i \sim N(\bar{\theta}, V_\theta)$$
$$\bar{\theta} \sim N(\bar{\bar{\theta}}, A^{-1})$$
$$V_\theta \sim IW(\nu, V)$$

In the normal model, the $\{\theta_i\}$ drawn from (1) are treated as a multivariate normal sample and standard conditionally conjugate priors are used. It is worth noting that in many applications the second-stage pri-

sion in the observables into the mean function

$$\theta = Bz + u$$
$$u \sim N(0, V_\theta)$$
has been very popular in marketing, due to the interpretation of each mixture point as representing a segment and to the ease of estimation. In addition, the finite mixture approach can be given the interpretation of a nonparametric method as in Heckman and Singer (1982). Critics of the finite mixture approach have pointed to the implausibility of the existence of a small number of homogeneous segments, as well as the fact that the finite mixture approach does not allow for extreme units whose parameters lie outside the convex hull of the support points. The mixture of normals approach avoids the drawbacks of the finite mixture model, while incorporating many of the more desirable features.

The MCMC algorithm for the normal heterogeneity model can easily be extended to handle the mixture of normals model by appending indicator variables for the mixture component to the state space. Conditional on the indicator variables, the draws of the normal component parameters are standard conjugate draws given the classification of the observations into one of the $K$ components. The indicator variables, conditional on all other parameters, have a multinomial distribution with probabilities proportional to the number of units assigned to the component and the likelihood that the unit's parameters are from the component distribution.

In mixture of components models, there is a generic identification problem, generally known as the label-switching problem. A model with a given sequence of component parameters is observationally equivalent to any permutation of this sequence of parameters. Component labels, therefore, require identifying restrictions for inference to occur. One solution to this problem is to put informative priors on the model parameters (e.g., $\mu_1 > \mu_2 > \cdots > \mu_K$), which works well when the data are in agreement with the restriction. However, if the data are not in agreement (e.g., the components primarily differ in $V$, not $\mu$), then the prior can lead to a chain that is slow to converge (Frühwirth-Schnatter et al. 2003). It should be noted, however, that the presence of label-switching does not affect inference about parameters of a particular unit, $\theta_i$. If the normal component mixing distribution is seen as a flexible device for approximating some unknown heterogeneity distribution, then inference about the distribution of heterogeneity can be made directly with the set of unit parameters, $\{\theta_i\}$, without attempting to identify or estimate the component parameters.

In many situations, we have prior information on the signs of various coefficients in the base model. For example, price parameters are negative and advertising effects are positive. In a Bayesian approach, this sort of prior information can be included by modifying the first-stage prior. We replace the normal distribution with a distribution with restricted support, corresponding to the appropriate sign restrictions. For example, we can use a log-normal distribution for a parameter which is restricted via sign by the reparameterization, $\theta^{*} = \ln \theta$. However, note that this change in the form of the prior can destroy some of the conjugate relationships which are exploited in the Gibbs sampler. However, if Metropolis-style methods are used to generate draws in the Markov chain, it is a simple matter to directly reparameterize the likelihood function, by substituting $\exp(\theta^{*})$ for $\theta$, rather than rely on the heterogeneity distribution to impose the range restriction. What is more important is to ask whether the log-normal prior is appropriate. The left tail of the log-normal distribution declines to zero, ensuring a mode for the log-normal distribution at a strictly positive value. For situations in which we want to admit zero as a possible value for the parameter, this prior may not be appropriate. Boatwright et al. (1999) explore the use of truncated normal priors as an alternative to the log-normal reparameterization approach. Truncated normal priors are much more flexible, allowing for mass to be piled up at zero.

Bayesian models can also accommodate structural heterogeneity, or changes in the likelihood specification for a unit of analysis. The likelihood is specified as a mixture of likelihoods:

$$p(y_{it} \mid \{\theta_{ik}\}) = r_1\, p_1(y_{it} \mid \theta_{i1}) + \cdots + r_K\, p_K(y_{it} \mid \theta_{iK})$$

and estimation proceeds by appending indicator variables for the mixture component to the state space. Conditional on the indicator variables, the datum, $y_{it}$, is assigned to one of $K$ likelihoods. The indicator variables, conditional on all other parameters, have a multinomial distribution with probabilities proportional to the number of observations assigned to the
component, and the probability that the datum arises from that likelihood. Models of structural heterogeneity have been used to investigate intraindividual change in the decision process due to environmental changes (Yang and Allenby 2000) and fatigue (Otter et al. 2003).

Finally, Bayesian methods have recently been used to relax the commonly made assumption that the unit parameters, $\theta_i$, are i.i.d. draws from the distribution of heterogeneity. Ter Hofstede et al. (2002) employ a conditional Gaussian field specification to study spatial patterns in response coefficients:

$$p(\theta_i \mid \tau) = p(\theta_i \mid \{\theta_j : j \in S_i\}, V)$$

where $S_i$ denotes units that are spatially adjacent to unit $i$. Since the MCMC estimation algorithm employs full conditional distributions of the model parameters, the draw of $\theta_i$ involves using a local average for the mean of the mixing distribution. Yang and Allenby (2002b) employ a simultaneous specification of the unit parameters to reflect the possible presence of interdependent effects, due to the presence of social and information networks.

$$\theta = \rho W \theta + u$$

$$u \sim N(0, \sigma^2 I)$$

where $W$ is a matrix that specifies the network, $\rho$ is a coefficient that measures the influence of the network, and $u$ is an innovation.

5.5. Diagnostic Checks of the First-Stage Prior
In the hierarchical model, the prior is specified in a two-stage process:

$$\theta_i \sim N(\bar{\theta}, V_\theta)$$

$$p(\bar{\theta}, V_\theta)$$

In the classical literature, the normal distribution of $\theta$ would be called the random effects model and would be considered part of the likelihood, rather than part of the prior. Typically, very diffuse priors are used for the second stage. Thus, it is the first-stage prior which is important, and will always remain important, as long as there are only a few observations available per household. Since the parameters of the first-stage prior are inferred from the data, the main focus of concern should be on the form of this distribution.

In the econometric literature, the use of parametric distributions of heterogeneity (e.g., normal distributions) is often criticized on the grounds that their misspecification leads to inconsistent estimates of the common model parameters (cf. Heckman and Singer 1982). For example, if the true distribution of household parameters were skewed or bimodal, our inferences based on a symmetric, unimodal normal prior could be misleading. One simple approach would be to plot the distribution of the posterior household means and compare this to the implied normal distribution evaluated at the Bayes estimates of the hyperparameters, $N(E[\bar{\theta} \mid \text{data}], E[V_\theta \mid \text{data}])$. The posterior means are not constrained to follow the normal distribution because the normal distribution is only part of the prior and the posterior is influenced by the unit-level data. This simple approach is in the right spirit but could be misleading because it does not properly account for uncertainty in the unit-level parameter estimates.

Allenby and Rossi (1999) provide a diagnostic check of the assumption of normality in the first stage of the prior distribution that properly accounts for parameter uncertainty. To handle uncertainty in our knowledge of the common parameters of the normal distribution, we compute the predictive distribution of $\theta_{i'}$ for unit $i'$, selected at random from the population of households with the random effects distribution. Using our data and model, we can define the predictive distribution of $\theta_{i'}$ as follows:

$$p(\theta_{i'} \mid \text{data}) = \int \phi(\theta_{i'} \mid \bar{\theta}, V_\theta)\, p(\bar{\theta}, V_\theta \mid \text{data})\, d\bar{\theta}\, dV_\theta$$

Here $\phi(\theta_{i'} \mid \bar{\theta}, V_\theta)$ is the normal prior distribution. We can use our MCMC draws of $\bar{\theta}, V_\theta$, coupled with draws from the normal prior, to construct an estimate of this distribution. The diagnostic check is constructed by comparing the distribution of the unit-level posterior means to the predictive distribution based on the model given above.

5.6. Findings and Influence on Marketing Practice
The last ten years of work on heterogeneity in marketing has yielded several important findings.
Researchers have explored a rather large set of first-stage models with a normal distribution of heterogeneity across units. In particular, investigators have considered a first-stage normal linear regression (Blattberg and George 1991), a first-stage logit model (Allenby and Lenk 1994, 1995), a first-stage probit (McCulloch and Rossi 1994), a first-stage Poisson (Neelamegham and Chintagunta 1999), and a first-stage generalized gamma distribution model (Allenby et al. 1999, Jen et al. 2003). The major conclusion is that there is a substantial degree of heterogeneity across units in various marketing data sets. This finding of a large degree of heterogeneity holds out substantial promise for the study of preferences, both in terms of substantive and practical significance (Ansari et al. 2000). There may be substantial heterogeneity bias in models that do not properly account for heterogeneity (Chang et al. 1999), and there is large value in customizing marketing decisions to the unit level (see Rossi et al. 1996).

Yang et al. (2002a) investigate the source of brand preference, and find evidence that variation in the consumption environment, and resulting motivations, leads to changes in a unit's preference for a product offering (see also, Arora and Allenby 1999). Motivating conditions are an interesting domain for research, as they preexist the marketplace, offering a measure of demand that is independent of marketplace offerings. Other research has documented evidence that the decision process employed by a unit is not necessarily constant throughout a unit's purchase (Yang and Allenby 2000) and response (Otter et al. 2003) history. This evidence indicates that the appropriate unit of analysis for marketing is at a level that is less aggregate than a person or respondent, although there is evidence that household sensitivity to marketing variables (Ainslie and Rossi 1998) and state dependence (Seetharaman et al. 1999) are constant across categories.

The normal continuous model of heterogeneity appears to do reasonably well in characterizing this heterogeneity, but there has not yet been sufficient experimentation with alternative models, such as the mixture of normals, to draw any definitive conclusions (see Allenby et al. 1998). With the relatively short panels typically found in marketing applications, it may be difficult to identify much more detailed structure beyond that afforded by the normal model. In addition, relatively short panels may produce a confounding of the finding of heterogeneity with various model misspecifications in the first stage. If only one observation is available for each unit, then the probability model for the unit level is the mixture of the first-stage model with the second-stage prior:

$$p(y \mid \tau) = \int p(y \mid \theta)\, p(\theta \mid \tau)\, d\theta$$

This mixing can provide a more flexible probability model. In the one-observation situation, we can never determine whether it is heterogeneity or lack of flexibility that causes the Bayesian hierarchical model to fit the data well. Obviously, with more than one observation per unit, this changes, and it is possible to separately diagnose first-stage model problems and deficiencies in the assumed heterogeneity distribution. However, with short panels there is unlikely to be a clean separation between these problems, and it may be the case that some of the heterogeneity detected in marketing data is really due to lack of flexibility in the base model.

There have been some comparisons of the normal continuous model with the discrete approximation approach of a finite-mixture model. It is our view that it is conceptually inappropriate to view any population of units as being comprised of only a small number of homogeneous groups and, therefore, the appropriate interpretation of the finite-mixture approach is as an approximation method. Allenby and Rossi (1999) and Lenk et al. (1996) show some of the shortcomings of the finite-mixture model, and provide some evidence that the finite-mixture model does not recover reasonable unit-level parameter estimates. In contrast, Andrews et al. (2002) use simulated data to suggest that unit-level recovery is comparable between the normal- and finite-mixture approaches.

At the same time that the Bayesian work in the academic literature has shown the ability to produce unit-level estimates, there has been increased interest on the part of practitioners in unit-level analysis. Conjoint researchers have always had an interest in
respondent-level part-worths and had various ad hoc schemes for producing these estimates. Recently, the Bayesian hierarchical approach to the logit model has been implemented in the popular Sawtooth conjoint

Parameter inference is a simple case of the general decision theory set-up, in which the loss is often taken to be quadratic. In this case, the optimal action is an estimator taken to be the posterior mean of the
great deal more spread out than the prior for Model 2, this may result in Bayes factors which favor Model 2 (this is certainly true in a limiting sense). In particular, diffuse and improper priors can result in undefined

y is driven by the explanatory variables $x$ and parameters $\theta$.

$$p(y \mid x, \theta)$$
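The remark on prior spread and Bayes factors can be illustrated with a small calculation. This is a sketch only, assuming a setting that is not taken from the text: both models share a normal likelihood for a sample mean with known variance, and they differ only in the standard deviation of a zero-centered normal prior on the mean. All names and numbers are hypothetical.

```python
import math


def log_marginal(ybar, n, sigma, tau):
    """Log marginal density of the sample mean when ybar | theta ~ N(theta, sigma^2 / n)
    and the prior is theta ~ N(0, tau^2), so that ybar ~ N(0, sigma^2 / n + tau^2)."""
    var = sigma ** 2 / n + tau ** 2
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * ybar ** 2 / var


def log_bayes_factor_2_vs_1(ybar, n, sigma, tau1, tau2):
    """Log Bayes factor in favor of Model 2 (prior sd tau2) over Model 1 (prior sd tau1)."""
    return log_marginal(ybar, n, sigma, tau2) - log_marginal(ybar, n, sigma, tau1)


# Hypothetical data: a sample mean 2.5 standard errors away from zero.
ybar, n, sigma = 0.5, 25, 1.0
tau2 = 0.5  # Model 2: moderately informative prior
prior_sds_model1 = [1.0, 10.0, 100.0, 1000.0]  # Model 1: increasingly diffuse prior
log_bfs = [log_bayes_factor_2_vs_1(ybar, n, sigma, tau1, tau2) for tau1 in prior_sds_model1]

for tau1, log_bf in zip(prior_sds_model1, log_bfs):
    print(f"tau1 = {tau1:7.1f}  log BF(Model 2 : Model 1) = {log_bf:6.2f}")
```

With the data held fixed, the log Bayes factor in favor of Model 2 keeps growing as the prior for Model 1 is made more diffuse, which is the limiting behavior referred to above.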
6.4. Use of Alternative Information Sets
One of the most appealing aspects of the Bayesian approach is the ability to incorporate a variety of different sources of information. All adaptive shrinkage

As emphasized in §3, Bayesian methods are ideally suited for inference about the individual or disaggregate parameters, as well as the common parameters. Recall the profit function for the disaggregate decision
6.5. Valuation of Disaggregate Information
Once a fully decision-theoretic approach has been specified, we can use the profit metric to value the information in disaggregate data. We compare profits that can be obtained via our disaggregate inferences about $\{\theta_i\}$ with profits that could be obtained using only aggregate information. The profit opportunities afforded by disaggregate data will depend on both the amount of heterogeneity across the units in the panel data and the level of information at the disaggregate level.

To make these notions explicit, we will lay out the disaggregate and aggregate decision problems.

Comparison of $\Pi_{agg}$ with $\Pi_{disagg}$ provides a metric for the achievable value of the disaggregate information.

7. Open Issues and Directions for Future Research
Researchers have long noted the conceptual appeal of the Bayesian framework for inference and decision making. However, the potential of the Bayesian approach was not realized due to computational constraints. Without modern simulation-based methods, researchers were restricted to a short list of likelihoods and associated conjugate priors. The developments
of the last 15 years have freed us from computational constraints, allowing for the analysis of virtually any model. We now can consider models once thought to be impossible to compute, and we can use priors of virtually any form. The only constraint now is the ability of the data to identify model parameters, rather than the ability of the analyst to conduct inference for this model. However, the recent developments have an even more profound impact than simply freeing us from computational constraints. The nature of the MCMC methods emphasizes a modularity in the construction of models, typically achieved through a combination of conditional distributions. These conditional distributions specify the nature of the relationships between observed variables and allow for the construction of more complicated relationships. Thus, the researcher can create a more complex model simply by adding layers to the hierarchy.

Consider, as a simple example, the relationship between sales and price. Much attention has been devoted to fitting the conditional distribution of sales $y$ given price $x$. However, the actual decision process is certainly not well represented by one conditional distribution. Many endorse the concept of a latent consideration set (Chiang et al. 1999) in which a product must first be included in the consideration set before a consumer evaluates the impact of price. If $w$ represents the consideration set, then the model has been enlarged to the two layers $y \mid w, x$ and $w \mid z$, where the consideration set is influenced by another variable $z$ (e.g., advertising). In the end, the hierarchical model specifies a special form for the conditional distribution of $y \mid x, z$ that allows exploration of the intermediary conditional relationships. Moreover, the specification of hierarchical conditional models is consistent with process models of consumer behavior (e.g., McFadden 2001).

Consideration sets are only one example of a latent process that intervenes between the measurements of the marketing mix variables and the sales outcome variable. Other important examples include price search and consumption. In typical demand data, we do not observe the consumption of goods but merely their purchases. In much demand modeling in marketing, this distinction is glossed over, and the demand model is based on a utility function defined directly on the purchase quantities. Models that explicitly recognize that purchases are made in anticipation of future consumption have recently received attention. For example, Dube (2003) explains simultaneous purchases of different varieties via anticipation of changes in tastes over future consumption occasions. Yang et al. (2002) consider a model in which the utility derived from goods is dependent on the context of consumption. Erdem and Keane (2003) consider dynamic models of consumer demand in which households stockpile goods for future consumption. All of these models are amenable to Bayesian analysis via data augmentation in which latent variables, such as consumption, are introduced into the inference procedures.

Price search models are another example of a latent process of great importance in marketing. Consumers are not always fully informed about the prices of choice alternatives and must engage in price search. We do not observe this price search process directly but only the outcomes. In a classical approach, such as Mehta and Srinivasan (2003), the likelihood for the search model must be evaluated by integrating over all possible search paths. In a data augmentation approach, this integration can be achieved by introduction of latent variables that represent search possibilities. In an MCMC method for navigating the posterior distribution of search parameters and latent variables, we do not enumerate all possible search paths but, instead, navigate among paths of high posterior probability. We believe that MCMC approaches, together with data augmentation, hold great promise for analyzing models with very large latent state spaces such as price search models and discrete dynamic programming models, in general.

Many models of consumer behavior include threshold-like effects. For example, some models of consideration set formation have screening rules in which a threshold level of an attribute is defined. The threshold levels are unobservable parameters, and the likelihood over these parameters has discontinuities. This rules out the use of standard derivative-based maximization methods. MCMC methods simply require draws from various conditional posterior distributions in order to navigate the parameter space. Drawing from a distribution with a density that is not
continuous poses no special difficulties. Gilbride and Allenby (2003) illustrate how this can be implemented for choice models with conjunctive and disjunctive screening rules. These developments open many possibilities for analysis of models with threshold components.

Thus, hierarchical modeling methods not only achieve great flexibility, as emphasized in the Bayesian statistics literature, but are also well suited to the elaboration of various latent process views of consumer behavior and decision making. We expect research in marketing to focus on a better understanding of the process by which the consumer makes buying decisions, in hopes of creating more realistic, yet still parsimonious, models of behavior.

A major challenge facing marketing practitioners is the merging of information acquired across a variety of different datasets. For example, a firm may have access to consumer purchase information, survey information on a subsample of consumers, and syndicated aggregate sales information. Marketplace and survey data cannot be combined without some view to the processes by which consumers make buying decisions and respond to survey instruments. Bayesian methods will facilitate the integration of these data sources through the specification of a common set of behavioral parameters and the processes by which these are translated into either survey responses or purchase decisions.

The observational data used in much of quantitative marketing is derived from an environment in which the outcome and input variables are jointly determined. Marketing mix variables are set by managers with a view toward optimizing some objective function that includes the dependent variable. For example, prices may be set with some knowledge of either price sensitivity or price demand shocks. Direct marketing response data is obtained from samples of consumers who were selected in a nonrandom fashion, with a view toward maximizing response rates or profitability. Sales forces are allocated using some sort of heuristic that attempts to create an optimal allocation in which the marginal benefit of further effort is equated to marginal cost. This means that we cannot model just the conditional distribution of the outcome variable, given the marketing mix variables, but that we must consider the joint distribution of all variables.

The joint determination of both outcome and input variables poses considerable challenges for statistical inference and modeling. Manchanda et al. (2003) consider sales force problems in which the level of sales force effort at a given account is a function of sales response parameters. Price endogeneity is another example of a challenging problem that involves deriving the joint distribution of price, sales, and possible exogenous variables. Computational difficulties have limited the use of likelihood-based methods and, instead, instrumental variables procedures have been commonly employed. We believe there is substantial room for improvement in this area by the use of likelihood-based Bayesian approaches. As an example, consider a model of demand and supply in which there are cost shocks and a common demand shock that is used by retailers in setting prices. This model has a likelihood that is the joint distribution of price and quantity sold. This joint distribution is derived from the distribution of cost shocks and demand shocks. While the mapping from shocks to observables is an implicit nonlinear system of equations, there is no conceptual difficulty with implementing a Metropolis algorithm for this system. The modularity of the Metropolis-style MCMC method means that elaborating the model by adding, for example, consumer heterogeneity, is straightforward (see Yang et al. 2003).

8. Conclusion
We have emphasized the value of Bayesian methods in situations with limited information. While the total amount of data available has exploded, the amount of information about any one consumer is likely to remain limited. The customization of marketing actions to finer and finer levels of aggregation requires the ability to make inferences in conditions of limited information and to characterize the level of uncertainty in these inferences. Thus, we expect Bayesian methods will play a critical role in realizing the potential of micromarketing and any analysis conducted at a microlevel.
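What inference under limited information looks like for a single consumer can be made concrete with a conjugate normal calculation. This is a minimal sketch under simplifying assumptions that are not part of the text: the observation noise and the population (first-stage) mean and standard deviation are treated as known, whereas in a full hierarchical model they would be inferred. All names and numbers are hypothetical.

```python
import math
import statistics


def unit_posterior(y, sigma, prior_mean, prior_sd):
    """Posterior mean and sd of a unit-level mean theta_i when
    y_it | theta_i ~ N(theta_i, sigma^2) and theta_i ~ N(prior_mean, prior_sd^2)."""
    n = len(y)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * statistics.fmean(y))
    return post_mean, math.sqrt(post_var)


# Hypothetical values: population distribution N(1.0, 0.5^2), observation noise sd 2.0.
prior_mean, prior_sd, sigma = 1.0, 0.5, 2.0
few = [3.1, 2.4]        # a consumer with two observations
many = [3.1, 2.4] * 50  # the same sample mean with 100 observations

mean_few, sd_few = unit_posterior(few, sigma, prior_mean, prior_sd)
mean_many, sd_many = unit_posterior(many, sigma, prior_mean, prior_sd)

print(f"2 observations  : posterior mean {mean_few:.3f}, posterior sd {sd_few:.3f}")
print(f"100 observations: posterior mean {mean_many:.3f}, posterior sd {sd_many:.3f}")
```

With two observations the estimate stays close to the population mean and the posterior standard deviation remains large. With many observations the unit's own data dominate and the uncertainty shrinks.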
Finally, there are a number of important problems in marketing that are essentially pure prediction problems. Given a set of information on a consumer, the prediction problem is to predict the response

tion of variables and in the structure of relationships between variables. However, structural theories are typically silent on the exact parametric form of functional relationships or distributions. Again, there is an opportunity for application of Bayesian nonparametric methods to the structural approach as well (Kalyanam and Shively 1998, Shively et al. 2000).

In summary, Bayesian statistical methods offer an appealing set of tools to researchers in marketing. The Bayesian approach offers an integrated view of inference and decision making that is applicable to both theoretical and applied analysis. Moreover, the hierarchical modeling structure that is exploited in MCMC estimation methods is congruent with theories of behavior and offers a means of integrating information across multiple data sources. Finally, the computational advantages of Bayesian methods allow for study of high-dimensional data and complex relationships that are common in marketing. We encourage our colleagues and students to experiment with and apply Bayesian methods.

Acknowledgments
Author Peter Rossi thanks the James M. Kilts Center for Marketing for support of this research. The authors thank Rob McCulloch and Eric Bradlow for useful comments. The authors were inspired by the writings and teachings of Arnold Zellner and Dennis Lindley throughout their careers.

Appendix: Annotated Citations of Bayesian Applications in Marketing
This annotated bibliography represents the results of a search for applications of Bayesian statistics in marketing. Only published

Allenby, Greg M., Robert P. Leone, Lichung Jen. 1999. A dynamic model of purchase timing with application to direct marketing. J. Amer. Statist. Assoc. 94 365-374.
Customer interpurchase times are modeled with a heterogeneous generalized gamma distribution, where the distribution of heterogeneity is a finite mixture of inverse generalized gamma components. The model allows for structural heterogeneity where customers can become inactive.

Allenby, Greg M., Neeraj Arora, James L. Ginter. 1998. On the heterogeneity of demand. J. Marketing Res. 35 384-389.
A normal component mixture model is compared to a finite mixture model using conjoint data and scanner panel data. The predictive results provide evidence that the distribution of heterogeneity is continuous, not discrete.

Allenby, Greg M., Lichung Jen, Robert P. Leone. 1996. Economic trends and being trendy: The influence of consumer confidence on retail fashion sales. J. Bus. Econom. Statist. 14 103-111.
A regression model with autoregressive errors is used to estimate the influence of consumer confidence on retail sales. Data are pooled across divisions of a fashion retailer to estimate a model where influence has a differential impact on preseason versus in-season sales.

Allenby, Greg M., Peter J. Lenk. 1995. Reassessing brand loyalty, price sensitivity, and merchandising effects on consumer brand choice. J. Bus. Econom. Statist. 13 281-289.
The logistic normal regression model of Allenby and Lenk (1994) is used to explore the order of the brand-choice process and to estimate the magnitude of price, display, and feature advertising effects across four scanner panel datasets. The evidence indicates that brand-choice is not zero order, and merchandising effects are much larger than previously thought.
Allenby, Greg M., James L. Ginter. 1995. Using extremes to design products and segment markets. J. Marketing Res. 32 392-403.
A heterogeneous random-effects binary choice model is used to estimate conjoint part-worths using data from a telephone survey. The individual-level coefficients available in hierarchical Bayes models are used to explore extremes of the heterogeneity distribution, where respondents are most and least likely to respond to product offers.

Allenby, Greg M., Neeraj Arora, James L. Ginter. 1995. Incorporating prior knowledge into the analysis of conjoint studies. J. Marketing Res. 32 152-162.
Ordinal prior information is incorporated into a conjoint analysis using a rejection sampling algorithm. The resulting part-worth estimates have sensible algebraic signs that are needed for deriving optimal product configurations.

Allenby, Greg M., Peter J. Lenk. 1994. Modeling household purchase behavior with logistic normal regression. J. Amer. Statist. Assoc. 89 1218-1231.
A discrete choice model with autocorrelated errors and consumer heterogeneity is developed and applied to scanner panel dataset of ketchup purchases. The results indicate substantial unobserved heterogeneity and autocorrelation in purchase behavior.

Allenby, Greg M. 1990a. Hypothesis testing with scanner data: The advantage of Bayesian methods. J. Marketing Res. 27 379-389.
Bayesian testing for linear restrictions in a multivariate regression model is developed and compared to classical methods.

Allenby, Greg M. 1990b. Cross-validation, the Bayes theorem, and small-sample bias. J. Bus. Econom. Statist. 8 171-178.
Cross-validation methods that employ plug-in point approximations to the average likelihood are compared to formal Bayesian methods. The plug-in approximation is shown to overstate the amount of statistical evidence.

Andrews, Rick, Asim Ansari, Imran Currim. 2002. Hierarchical Bayes versus finite mixture conjoint analysis models: A comparison of fit, prediction, and partworth recovery. J. Marketing Res. 87-98.
A simulation study is used to investigate the performance of continuous and discrete distributions of heterogeneity in a regression model. The results indicate that Bayesian methods are robust to the true underlying distribution of heterogeneity, and finite mixture models of heterogeneity perform well in recovering true parameter estimates.

Ansari, Asim, Skander Essegaier, Rajeev Kohli. 2000. Internet recommendation systems. J. Marketing Res. 37 363-375.
Random-effect specifications for respondents and stimuli are proposed within the same linear model specification. The model is used to pool information from multiple data sources.

Ansari, Asim, Kamel Jedidi, Sharan Jagpal. 2000. A hierarchical Bayesian methodology for treating heterogeneity in structural equation models. Marketing Sci. 19 328-347.
Covariance matrix heterogeneity is introduced into a structural equation model, in contrast to standard models in marketing, where heterogeneity is introduced into the mean structure of a model. The biasing effects of not accounting for covariance heterogeneity are documented.

Arora, Neeraj, Greg M. Allenby. 1999. Measuring the influence of individual preference structures in group decision making. J. Marketing Res. 36 476-487.
Group preferences differ from the preferences of individuals in the group. The influence of the group on the distribution of heterogeneity is examined using conjoint data on durable good purchases by a husband's, a wife's, and their joint evaluation.

Arora, Neeraj, Greg M. Allenby, James L. Ginter. 1998. A hierarchical Bayes model of primary and secondary demand. Marketing Sci. 17 29-44.
An economic discrete/continuous demand specification is used to model volumetric conjoint data. The likelihood function is structural, reflecting constrained utility maximization.

Blattberg, Robert C., Edward I. George. 1991. Shrinkage estimation of price and promotional elasticities: Seemingly unrelated equations. J. Amer. Statist. Assoc. 86 304-315.
Weekly sales data across multiple retailers in a chain are modeled using a linear model with heterogeneity. Price and promotional elasticity estimates are shown to have improved predictive performance.

Boatwright, Peter, Robert McCulloch, Peter E. Rossi. 1999. Account-level modeling for trade promotion: An application of a constrained parameter hierarchical model. J. Amer. Statist. Assoc. 94 1063-1073.
A common problem in the analysis of sales data is that price coefficients are often estimated with algebraic signs that are incompatible with economic theory. Ordinal constraints are introduced through the prior to address this problem, leading to a truncated distribution of heterogeneity.

Bradlow, Eric T., David Schmittlein. 1999. The little engines that could: Modeling the performance of World Wide Web search engines. Marketing Sci. 19 43-62.
A proximity model is developed for analysis of the performance of Internet search engines. The likelihood function reflects the distance between the engine and specific URLs, with the mean location of the URLs parameterized with a linear model.

Bradlow, Eric T., S. Fader. 2001. A Bayesian lifetime model for the Hot 100 Billboard songs. J. Amer. Statist. Assoc. 96 368-381.
A time series model for ranked data is developed using a latent variable model. The deterministic portion of the latent variable follows a temporal pattern described by a generalized gamma distribution, and the stochastic portion is extreme value.
Bradlow, Eric T., Vithala R. Rao. 2000. A hierarchical Bayes model for assortment choice. J. Marketing Res. 37 259-268.
A statistical measure of attribute assortment is incorporated into a random-utility model to measure consumer preference for assortment beyond the effects from the attribute levels themselves. The model is applied to choices between bundled offerings.

Chiang, Jeongwen, Siddartha Chib, Chakravarthi Narasimhan. 1999. Markov chain Monte Carlo and models of consideration set and parameter heterogeneity. J. Econometrics 89 223-248.
Consideration sets are enumerated and modeled with a Dirichlet prior in a model of choice. A latent state variable is introduced to indicate the consideration set, resulting in a model of structural heterogeneity.

Chang, Kwangpil, S. Siddarth, Charles B. Weinberg. 1999. The impact of heterogeneity in purchase timing and price responsiveness on estimates of sticker shock effects. Marketing Sci. 18 178-192.
A random utility model with reference prices is examined, with and without allowance for household heterogeneity. When heterogeneity is present in the model, the reference price coefficient is estimated to be close to zero.

DeSarbo, Wayne, Youngchan Kim, Duncan Fong. 1999. A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data. J. Econometrics 89 79-108.
The deterministic portion of a latent variable model is specified as a scalar product of consumer and brand coordinates to yield a spatial representation of revealed choice data. The model provides a graphical representation of the market structure of product offerings.

Edwards, Yancy, Greg M. Allenby. 2003. Multivariate analysis of multiple response data. J. Marketing Res. Forthcoming.
Pick any of J data is modeled with a multivariate probit model, allowing standard multivariate techniques to be applied to the parameter of the latent normal distribution. Identifying restrictions for the model are imposed by postprocessing the draws of the Markov chain.

Huber, Joel, Kenneth Train. 2001. On the similarity of classical and Bayesian estimates of individual mean partworths. Marketing Lett. 12 259-269.
Classical and Bayesian estimation methods are found to yield similar individual-level estimates. The classical methods condition on estimated hyperparameters, while Bayesian methods account for their uncertainty.

Kalyanam, Kirthi, Thomas S. Shively. 1998. Estimating irregular pricing effects: A stochastic spline regression approach. J. Marketing Res. 35 16-29.
Stochastic splines are used to model the relationship between price and sales, resulting in a more flexible specification of the likelihood function.

Kalyanam, Kirthi. 1996. Pricing decision under demand uncertainty: A Bayesian mixture model approach. Marketing Sci. 15 207-221.
Model uncertainty is captured in model predictions by taking a weighted average where the weights correspond to the posterior probability of the model. Pricing decisions are shown to be more robust.

Kamakura, Wagner A., Michel Wedel. 1997. Statistical data fusion for cross-tabulation. J. Marketing Res. 34 485-498.
Imputation methods are proposed for analyzing cross-tabulated data with empty cells. Imputation is conducted in an iterative manner to explore the distribution of missing responses.

Kim, Jaehwan, Greg M. Allenby, Peter E. Rossi. 2002. Modeling consumer demand for variety. Marketing Sci. 21 223-228.
A choice model with interior and corner solutions is derived from a utility function with decreasing marginal utility. Kuhn-Tucker conditions are used to relate the observed data, with utility maximization in the likelihood specification.

Lee, Jonathan, Peter Boatwright, Wagner Kamakura. 2003. A Bayesian model for prelaunch sales forecasting of recorded music. Management Sci. 49 179-196.
The authors study the forecasting of sales for new music albums prior to their introduction. A hierarchical logistic shaped diffusion model is used to combine a variety of sources of information on attributes of the album, effects of marketing variables, and dynamics of adoption.

Leichty, John, Venkatram Ramaswamy, Steven H. Cohen. 2001. Choice menus for mass customization. J. Marketing Res. 38 183-196.
A multivariate probit model is used to model conjoint data where respondents can select multiple items from a menu. The observed binomial data is modeled with a latent multivariate normal distribution.

Lenk, Peter, Ambar Rao. 1990. New models from old: Forecasting product adoption by hierarchical Bayes procedures. Marketing Sci. 9 42-53.
Jen, Lichung, Chien-Heng Chou, Greg M. Allenby. 2003. A Bayesian
The nonlinear likelihood function of the Bass model is com-
approach to modeling purchase frequency. Marketing Lett. 14
bined with a random-effects specication across new prod-
520.
uct introductions. The resulting distribution of heterogeneity is
A model of purchase frequency that combines a Poisson shown to improve early predictions of new product introduc-
likelihood with gamma mixing distribution is proposed, where tions.
the mixing distribution is a function of covariates. The covari-
ates are shown to be useful for customers with short purchase Lenk, Peter J., Wayne S. DeSarbo, Paul E. Green, Martin R.
histories or have infrequent interaction with the rm. Young. 1996. Hierarchical Bayes conjoint analysis: Recovery of
partworth heterogeneity from reduced experimental designs. Marketing Sci. 15 173–191.
Fractionated conjoint designs are used to assess the ability of the distribution of heterogeneity to bridge conjoint analyses across respondents to impute part-worths for attributes not examined.

Manchanda, Puneet, Asim Ansari, Sunil Gupta. 1999. The shopping basket: A model for multicategory purchase incidence decisions. Marketing Sci. 18 95–114.
Multicategory demand data are modeled with a multivariate probit model. Identifying restrictions in the latent error covariance matrix require use of a modified Metropolis-Hastings algorithm.

Marshall, Pablo, Eric T. Bradlow. 2002. A unified approach to conjoint analysis models. J. Amer. Statist. Assoc. 97 674–682.
Various censoring mechanisms are proposed for relating observed interval, ordinal, and nominal data to a latent linear conjoint model.

McCulloch, Robert E., Peter E. Rossi. 1994. An exact likelihood analysis of the multinomial probit model. J. Econometrics 64 217–228.
The multinomial probit model is estimated using data augmentation methods. Approaches to handling model identification are discussed.

Moe, Wendy, Peter Fader. 2002. Using advance purchase orders to track new product sales. Marketing Sci. 21 347–364.
A hierarchical model of product diffusion is developed for forecasting new product sales. The model features a mixture of Weibulls as the basic model, with a distribution of heterogeneity over related products. The model is applied to data on music album sales.

Montgomery, Alan L. 1997. Creating micro-marketing pricing strategies using supermarket scanner data. Marketing Sci. 16 315–337.
Bayesian hierarchical models are applied to store-level scanner data. The model specification involves store-level demographic variables. Profit opportunities for store-level pricing are explored using constraints on the change in average price.

Montgomery, Alan L., Eric T. Bradlow. 1999. Why analyst overconfidence about the functional form of demand models can lead to overpricing. Marketing Sci. 18 569–583.
The specification of a functional form involves imposing exact restrictions in an analysis. Stochastic restrictions are introduced via a more flexible model specification and prior distribution, resulting in less aggressive policy implications.

Montgomery, Alan L., Peter E. Rossi. 1999. Estimating price elasticities with theory-based priors. J. Marketing Res. 36 413–423.
The prior distribution is used to stochastically impose restrictions on price elasticity parameters that are consistent with economic theory. This proposed approach is compared to standard shrinkage estimators that employ the distribution of heterogeneity.

Neelamegham, Ramya, Pradeep Chintagunta. 1999. A Bayesian model to forecast new product performance in domestic and international markets. Marketing Sci. 18 115–136.
Alternative information sets are explored for making new product forecasts in domestic and international markets, using a Poisson model for attendance with log-normal heterogeneity.

Putler, Daniel S., Kirthi Kalyanam, James S. Hodges. 1996. A Bayesian approach for estimating target market potential with limited geodemographic information. J. Marketing Res. 33 134–149.
Prior information about correlation among variables is combined with data on the marginal distribution to yield a joint posterior distribution.

Rossi, Peter E., Zvi Gilula, Greg M. Allenby. 2001. Overcoming scale usage heterogeneity: A Bayesian hierarchical approach. J. Amer. Statist. Assoc. 96 20–31.
Consumer response data on a fixed-point rating scale are assumed to be censored outcomes from a latent normal distribution. Variation in the censoring cutoffs among respondents allows for scale use heterogeneity.

Rossi, Peter E., Robert E. McCulloch, Greg M. Allenby. 1996. The value of purchase history data in target marketing. Marketing Sci. 15 321–340.
The information content of alternative data sources is evaluated using an economic loss function of coupon profitability. The value of a household's purchase history is shown to be large relative to demographic information and other information sets.

Rossi, Peter E., Greg M. Allenby. 1993. A Bayesian approach to estimating household parameters. J. Marketing Res. 30 171–182.
Individual-level parameters are obtained with the use of an informative, but relatively diffuse, prior distribution. Methods of assessing and specifying the amount of prior information are proposed.

Sandor, Zsolt, Michel Wedel. 2001. Designing conjoint choice experiments using managers' prior beliefs. J. Marketing Res. 28 430–444.
The information from an experiment involving discrete choice models depends on the experimental design and the values of the model parameters. Optimal designs are determined with an information measure that is dependent on the prior distribution.

Seetharaman, P. B., Andrew Ainslie, Pradeep Chintagunta. 1999. Investigating household state dependence effects across categories. J. Marketing Res. 36 488–500.
Multiple scanner panel datasets are used to estimate a model of brand choice with state dependence. Individual-level estimates of state dependence effects are examined among categories.

Shively, Thomas A., Greg M. Allenby, Robert Kohn. 2000. A nonparametric approach to identifying latent relationships in hierarchical models. Marketing Sci. 19 149–162.
Stochastic splines are used to explore the covariate specification in the distribution of heterogeneity. Evidence of highly nonlinear relationships is provided.

Steenburgh, Thomas J., Andrew Ainslie, Peder H. Engebretson. 2002. Massively categorical variables: Revealing the information in zipcodes. Marketing Sci. 22 40–57.
The effects associated with massively categorical variables, such as zip codes, are modeled in a random-effects specification. Alternative loss functions are examined for assessing the value of the resulting shrinkage estimates.

Yang, Sha, Greg M. Allenby, Geraldine Fennell. 2002a. Modeling variation in brand preference: The roles of objective environment and motivating conditions. Marketing Sci. 21 14–31.
Intraindividual variation in brand preference is documented and associated with variation in the consumption context and motivations for using the offering. The unit of analysis is shown to be at the level of a person-occasion, not the person.

Yang, Sha, Greg M. Allenby. 2003. Modeling interdependent consumer preferences. J. Marketing Res. Forthcoming.
The distribution of heterogeneity is modeled using a spatial autoregressive process, yielding interdependent draws from the mixing distribution. Heterogeneity is related to multiple networks defined with geographic and demographic variables.

Bernardo, Jose, Adrian F. M. Smith. 1994. Bayesian Theory. John Wiley, New York.
Blattberg, Robert C., Edward I. George. 1991. Shrinkage estimation of price and promotional elasticities: Seemingly unrelated equations. J. Amer. Statist. Assoc. 86 304–315.
Boatwright, Peter, Robert McCulloch, Peter E. Rossi. 1999. Account-level modeling for trade promotion: An application of a constrained parameter hierarchical model. J. Amer. Statist. Assoc. 94 1063–1073.
Bradlow, Eric T., S. Fader. 2001. A Bayesian lifetime model for the Hot 100 Billboard songs. J. Amer. Statist. Assoc. 96 368–381.
Bradlow, Eric T., Vithala R. Rao. 2000. A hierarchical Bayes model for assortment choice. J. Marketing Res. 37 259–268.
Chang, Kwangpil, S. Siddarth, Charles B. Weinberg. 1999. The impact of heterogeneity in purchase timing and price responsiveness on estimates of sticker shock effects. Marketing Sci. 18 178–192.
Chiang, Jeongwen, Siddartha Chib, Chakravarthi Narasimhan. 1999. Markov chain Monte Carlo and models of consideration set and parameter heterogeneity. J. Econometrics 89 223–248.
Chib, Siddartha. 2003. Monte Carlo methods and Bayesian computation: Overview. S. E. Fienberg, J. B. Kadane, eds. International Encyclopedia of the Social and Behavioral Sciences: Statistics. Elsevier Science, Amsterdam, The Netherlands. In press.
DeSarbo, Wayne, Youngchan Kim, Duncan Fong. 1999. A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data. J. Econometrics 89 79–108.
Dube, Jean-Pierre. 2003. Multiple discreteness and product differentiation: Demand for carbonated soft drinks. Marketing Sci. Forthcoming.
Edwards, Yancy, Greg M. Allenby. 2002. Multivariate analysis of multiple response data. J. Marketing Res. Forthcoming.
Efron, Brad, Carl Morris. 1975. Data analysis using Stein's estimator and its generalizations. J. Amer. Statist. Assoc. 70 311–319.
Erdem, Tulin, Michael Keane. 2003. Brand and quantity choice dynamics under price uncertainty. Quantitative Marketing Econom. 1 5–64.
Frühwirth-Schnatter, Sylvia, Regina Tüchler, Thomas Otter. 2003. Bayesian analysis of the heterogeneity model. J. Bus. Econom. Statist. Forthcoming.
Gelfand, Alan E., Adrian F. M. Smith. 1990. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 87(June) 523–532.
Gelman, Andrew, John B. Carlin, Hal S. Stern, Donald B. Rubin. 1995. Bayesian Data Analysis. Chapman & Hall, London.
Gilbride, Tim, Greg Allenby. 2003. Attribute-based consideration sets. Working paper, Ohio State University.
Heckman, James, Bernard Singer. 1982. A method for minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52 271–320.
Huber, Joel, Kenneth Train. 2001. On the similarity of classical and Bayesian estimates of individual mean partworths. Marketing Lett. 12 259–269.
Jen, Lichung, Chien-Heng Chou, Greg M. Allenby. 2003. A Bayesian approach to modeling purchase frequency. Marketing Lett. 14 5–20.
Kalyanam, Kirthi. 1996. Pricing decision under demand uncertainty: A Bayesian mixture model approach. Marketing Sci. 15 207–221.
Kalyanam, Kirthi, Thomas S. Shively. 1998. Estimating irregular pricing effects: A stochastic spline regression approach. J. Marketing Res. 35 16–29.
Kamakura, Wagner A., Michel Wedel. 1997. Statistical data fusion for cross-tabulation. J. Marketing Res. 34 485–498.
Keane, Michael. 1993. Simulation estimation methods for limited dependent variable models. G. S. Maddala, C. R. Rao, H. D. Vinod, eds. Handbook of Statistics, Vol. 11. North Holland, Amsterdam, The Netherlands.
Kim, Jaehwan, Greg M. Allenby, Peter E. Rossi. 2002. Modeling consumer demand for variety. Marketing Sci. 21 223–228.
Leichty, John, Venkatram Ramaswamy, Steven H. Cohen. 2001. Choice menus for mass customization. J. Marketing Res. 38 183–196.
Lenk, Peter J., Wayne S. DeSarbo. 2000. Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika 65 93–119.
Lenk, Peter J., Ambar Rao. 1990. New models from old: Forecasting product adoption by hierarchical Bayes procedures. Marketing Sci. 9 42–53.
Lenk, Peter J., Wayne S. DeSarbo, Paul E. Green, Martin R. Young. 1996. Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Marketing Sci. 15 173–191.
Liu, Jun S. 2001. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York.
Manchanda, Puneet, Asim Ansari, Sunil Gupta. 1999. The shopping basket: A model for multicategory purchase incidence decisions. Marketing Sci. 18 95–114.
Manchanda, Puneet, Pradeep K. Chintagunta, Peter E. Rossi. 2003. Response modeling with non-random marketing mix variables. Working paper, Graduate School of Business, University of Chicago, Chicago, IL.
Marshall, Pablo, Eric T. Bradlow. 2002. A unified approach to conjoint analysis models. J. Amer. Statist. Assoc. 97 674–682.
McCulloch, Robert, Nicholas Polson, Peter Rossi. 2000. Bayesian analysis of the multinomial probit model with fully identified parameters. J. Econometrics 99 173–193.
McCulloch, Robert, Peter E. Rossi. 1994. An exact likelihood analysis of the multinomial probit model. J. Econometrics 64 217–228.
McCulloch, Robert, Peter E. Rossi. 1999. Bayesian analysis of multinomial probit model. Mariano, Weeks, Schuermann, eds. Simulation-Based Inference in Econometrics. Cambridge University, Cambridge, U.K.
McFadden, Daniel. 2001. Economic choices. Amer. Econom. Rev. 91 351–370.
Mehta, Nitin, Kannan Srinivasan. 2003. Price uncertainty and consumer search: A structural model of consideration set formation. Marketing Sci. 22 58–84.
Montgomery, Alan L. 1997. Creating micro-marketing pricing strategies using supermarket scanner data. Marketing Sci. 16 315–337.
Montgomery, Alan L., Eric T. Bradlow. 1999. Why analyst overconfidence about the functional form of demand models can lead to overpricing. Marketing Sci. 18 569–583.
Sawtooth Software. 2001. CBC hierarchical Bayes analysis technical paper. Sawtooth Software Technical Paper Series, www.sawtoothsoftware.com.
Schwarz, Gideon. 1978. Estimating the dimension of a model. Ann.
This paper was received July 28, 2002, and was with the authors 2 months for 2 revisions; processed by Pradeep Chintagunta.