
Nonparametric Applications of Bayesian Inference
Gary CHAMBERLAIN
Department of Economics, Harvard University, Cambridge, MA 02138 ([email protected])
Guido W. IMBENS
Department of Economics and Department of Agricultural and Resource Economics, University of California, Berkeley, CA 94720 ([email protected])
This article evaluates the usefulness of a nonparametric approach to Bayesian inference by presenting two applications. Our first application considers an educational choice problem. We focus on obtaining a predictive distribution for earnings corresponding to various levels of schooling. This predictive distribution incorporates the parameter uncertainty, so that it is relevant for decision making under uncertainty in the expected utility framework of microeconomics. The second application is to quantile regression. Our point here is to examine the potential of the nonparametric framework to provide inferences without relying on asymptotic approximations. Unlike in the first application, the standard asymptotic normal approximation turns out not to be a good guide.

KEY WORDS: Bayesian inference; Dirichlet distributions; Nonparametric models; Semiparametric models.

© 2003 American Statistical Association. Journal of Business & Economic Statistics, January 2003, Vol. 21, No. 1. DOI 10.1198/073500102288618711

This article evaluates in the context of two applications the usefulness of a nonparametric approach to Bayesian inference. The basic approach is due to Ferguson (1973, 1974) and Rubin (1981). It has three key features. First, it has the basic benefits of Bayesian inference in providing a well-defined posterior distribution that is an important ingredient in many decision problems. Second, it has some of the advantages of semiparametric models used in frequentist analyses by not relying on a tightly parameterized likelihood function, based, for example, on a normal distribution. Third, it avoids pitfalls arising in Bayesian analyses from using high-dimensional parameter spaces with flat or other conventional prior distributions by using a prior distribution that arguably reflects lack of prior knowledge. These three features are illustrated in the two applications.

Our first application considers an educational choice problem. Specifically, we look at an individual's decision on the level of schooling when the individual is uncertain about the return to schooling. Following Angrist and Krueger (1991) we allow for endogeneity of the schooling measure by using a quarter of birth dummy as an instrumental variable. A standard parametric model would require distributional assumptions on the joint distribution of earnings and schooling given the instrument. On the other hand, standard instrumental variables methods that do not require such assumptions do not lead to the predictive earnings distributions required for the educational choice problem. The Bayesian nonparametric approach discussed in this article allows us to obtain a predictive distribution for earnings corresponding to various levels of schooling that incorporates the parameter uncertainty, so that it is relevant for decision making under uncertainty in the expected utility framework of microeconomics. At the same time in this application this approach avoids strong distributional assumptions without introducing strong sensitivity to the prior distribution.

The second application is to quantile regression. Our point here is to examine the potential of the nonparametric Bayesian framework to provide inferences without making asymptotic approximations. Unlike in the first application, in this application the standard asymptotic normal distribution turns out to be a poor approximation to the sampling distribution of the estimator in some cases. If the standard normal distribution provides a good approximation to the finite sample distribution, posterior intervals obtained through the Bayesian nonparametric approach discussed in this article are close to confidence intervals. When the large sample normal approximation fails to provide a good approximation to the finite sample distribution, the interpretation of our posterior distribution is not affected.

1. DIRICHLET PRIOR DISTRIBUTIONS

Here we present a concise review of the basic theory, extended to allow for parameters defined by moment restrictions, that is sufficient to follow the applications. For more details, see the work of Ferguson (1973, 1974), Rubin (1981), Chamberlain and Imbens (1995), and Hirano (2002). There is a family of probability distributions $\{P_\theta : \theta \in \Theta\}$, and we observe $\{Z_i\}_{i=1}^{n}$, where the random variables $Z_i$ are independently and identically distributed according to $P_\theta$ for some unknown value of $\theta$ in the parameter space $\Theta$. To simplify notation, let $Z$ denote a random variable that is distributed according to $P_\theta$. We assume that the distributions $P_\theta$ have a common, finite support, $P_\theta(Z = a_j) = \theta_j$ $(j = 1, \ldots, J)$, where $\theta_j$ denotes the $j$th component of $\theta$, and we take $\Theta$ to be the unit simplex in $R^J$. Because $J$ can be arbitrarily large and our data are measured with finite precision, the finite support assumption is arguably not restrictive. In fact,


Ferguson's (1973) discussion does not rely on discreteness. See also Hirano (2002).

Typically we are interested in some function of $\theta$ rather than elements of $\theta$ itself: $\beta = g(\theta)$, where the function $g(\cdot)$ may depend on the points of support $\{a_j\}_{j=1}^{J}$. For example, we consider cases where $g(\cdot)$ is defined implicitly through moment restrictions,

$$E_\theta \psi(Z, \beta) = \sum_{j=1}^{J} \psi(a_j, \beta) \cdot \theta_j = 0, \tag{1}$$

where $\psi$ is a given function with dimension equal to that of $\beta$, and there is a unique solution for all $\theta \in \Theta$. Although it may appear to be restrictive to limit this discussion to the case with the dimension of $\beta$ equal to that of $\psi$, one can apply the same approach to overidentified gmm models where the dimension of $\psi$ is higher than the dimension of $\beta$ by augmenting the parameter vector and the moment functions. Specifically, let $\gamma = (\beta_0', \beta_1', \Gamma_0, \Gamma_1, \Delta)$, and let

$$\varphi(Z, \gamma) = \begin{pmatrix} \Gamma_0' \psi(Z, \beta_0) \\ \operatorname{vec}\left(\partial \psi(Z, \beta_0)/\partial \beta' - \Gamma_0\right) \\ \operatorname{vec}\left(\psi(Z, \beta_0)\psi(Z, \beta_0)' - \Delta\right) \\ \Gamma_1' \Delta^{-1} \psi(Z, \beta_1) \\ \operatorname{vec}\left(\partial \psi(Z, \beta_1)/\partial \beta' - \Gamma_1\right) \end{pmatrix}.$$

Then the solution to $\sum_{i=1}^{n} \varphi(z_i, \gamma) = 0$ gives the standard optimal two-step generalized method of moments estimator for $\beta_1$, motivating our interest in the posterior distribution for the parameter defined as the solution to $E[\varphi(Z, \gamma)] = 0$. Our proposed procedure will give a posterior distribution for this parameter given the data.

A second example concerns cases where $g(\cdot)$ is defined as the solution to an optimization problem,

$$\beta = \arg\min_t E_\theta[\rho(Z, t)] = \arg\min_t \sum_{j=1}^{J} \rho(a_j, t) \cdot \theta_j, \tag{2}$$

where $\rho$ is a given scalar-valued function and there is a unique solution for all $\theta \in \Theta$. In both cases we obtain draws from the posterior distribution of $\beta$ by first drawing from the posterior distribution of $\theta$ and then solving (1) or (2).

We limit ourselves to prior distributions in the Dirichlet family with density

$$p(\theta) \propto \prod_{j=1}^{J} \theta_j^{b_j - 1} \quad \text{for } \theta \in \Theta \ (b_j > 0), \tag{3}$$

which, with $J$ free parameters $b_j$, is fairly flexible. Similar to the way the Beta distribution is the conjugate prior distribution for the parameter of a binomial distribution, the Dirichlet distribution is the conjugate prior distribution for the parameters of a multinomial distribution. Let $d = \{z_i\}_{i=1}^{n}$ denote the data, that is, the observed values of the $Z_i$, and let $n_j = \sum_{i=1}^{n} 1(z_i = a_j)$ be the number of sample observations equal to $a_j$. The posterior density is proportional to the product of the prior density and the likelihood function,

$$p(\theta \mid d) \propto \prod_{j=1}^{J} \theta_j^{n_j + b_j - 1},$$

and thus also Dirichlet with parameters $n_j + b_j$, $j = 1, \ldots, J$. Within this family of Dirichlet prior distributions we focus on the improper prior distribution with all the $b_j \to 0$. There are three important features of this improper prior distribution.

First, the improper prior distribution avoids the potential pitfall in using the Dirichlet prior with large $J$ and all of the $b_j$ bounded away from zero. Because we rely on $J$ being large to make the model flexible, this potentially would be an important drawback of the method. To see the problem, let $\phi$ denote the probability that $Z$ is in some set $B$: $\phi = \sum_{j: a_j \in B} \theta_j$. Then the posterior distribution for $\phi$ is a beta distribution with

$$E(\phi \mid d) = \sum_{j: a_j \in B} (n_j + b_j) \Big/ \sum_{j=1}^{J} (n_j + b_j),$$

$$\operatorname{Var}(\phi \mid d) = E(\phi \mid d)\,[1 - E(\phi \mid d)] \Big/ \Big[1 + \sum_{j=1}^{J} (n_j + b_j)\Big].$$

Suppose $b_j = \epsilon > 0$ for all $j$, and consider increasing the number of support points while keeping the data $d$ fixed. Let the fraction of support points in $B$ approach a limit: $\frac{1}{J}\sum_{j=1}^{J} 1(a_j \in B) \to r$ as $J \to \infty$. Then $E(\phi \mid d) \to r$, $\operatorname{Var}(\phi \mid d) \to 0$, and both prior and posterior distribution of $\phi$ become concentrated at $r$, regardless of the data. In particular, this argument covers a flat prior for $\theta$ ($b_j = 1$), suggesting that a flat prior distribution does not capture a lack of prior information very well when $J$ is large.

The second point is computational. The algorithm for evaluation of $\beta = g(\theta)$ defined through moment functions takes a particularly simple form for the limiting posterior distribution that results from letting all the $b_j \to 0$ in (3). Then the $\theta_j$ corresponding to the support points $a_j$ not observed in the sample are all zero with posterior probability one. Let $\{V_i\}_{i=1}^{n}$ be independently distributed according to a standard exponential distribution [i.e., the gamma distribution $\mathcal{G}(1, 1)$]. Then, for a given function $\lambda(\cdot)$,

$$\sum_{i=1}^{n} \lambda(z_i) V_i \Big/ \sum_{i=1}^{n} V_i = \sum_{j: n_j > 0} \lambda(a_j) U_j \Big/ \sum_{j: n_j > 0} U_j,$$

where $U_j = \sum_{i: z_i = a_j} V_i \sim \mathcal{G}(n_j, 1)$, using the fact that a sum of independent exponential random variables has a gamma distribution. Thus to simulate the posterior distribution of $\beta$ based on (1), instead of drawing from the posterior distribution of $\theta$ and then solving

$$\sum_{j=1}^{J} \psi(a_j, \beta) \cdot \theta_j = 0,$$

we draw sets of iid exponential random variables $\{V_i^{(l)}\}_{i=1}^{n}$ and solve

$$\sum_{i=1}^{n} \psi(z_i, \beta^{(l)})\, V_i^{(l)} = 0, \tag{4}$$

and similarly for $\beta$ based on (2) we solve

$$\beta^{(l)} = \arg\min_t \sum_{i=1}^{n} \rho(z_i, t)\, V_i^{(l)}. \tag{5}$$
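As a rough illustration of the simulation in (4) and (5), the sketch below treats two simple special cases that are chosen here for convenience and are not taken from the article: $\psi(z, \beta) = z - \beta$, for which (4) gives a weighted mean, and $\rho(z, t) = |z - t|$, for which (5) gives a weighted median. The function name `bayesian_bootstrap_draws` and the toy data are hypothetical.

```python
import numpy as np


def bayesian_bootstrap_draws(z, n_draws=2000, seed=0):
    """Draw from the limiting posterior of the mean and the median of Z.

    Each draw uses iid standard exponential weights V_i, as in (4) and (5).
    The two target parameters are illustrative choices, not the article's.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    order = np.argsort(z)
    z_sorted = z[order]
    means = np.empty(n_draws)
    medians = np.empty(n_draws)
    for l in range(n_draws):
        v = rng.exponential(scale=1.0, size=z.size)  # V_i ~ standard exponential
        w = v / v.sum()                              # normalized weights
        # (4) with psi(z, beta) = z - beta: sum_i (z_i - beta) V_i = 0
        means[l] = np.dot(w, z)
        # (5) with rho(z, t) = |z - t|: the minimizer is a weighted median,
        # i.e. the smallest sorted value whose cumulative weight reaches 1/2
        cum = np.cumsum(w[order])
        idx = min(np.searchsorted(cum, 0.5), z.size - 1)
        medians[l] = z_sorted[idx]
    return means, medians


# Toy data, for illustration only
z = [1.0, 2.0, 3.0, 4.0, 10.0]
means, medians = bayesian_bootstrap_draws(z, n_draws=2000, seed=0)
print(np.quantile(means, [0.025, 0.5, 0.975]))
print(np.unique(medians))
```

Only observed sample points receive weight, which mirrors the statement that unobserved support points have posterior probability zero in the limit $b_j \to 0$.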

Repeating this for $l = 1, \ldots, L$ gives us $L$ independent draws from the posterior distribution of $\beta$. Rubin (1981) developed this simulation algorithm (using a representation for the ratio of exponentials to the sum of exponentials as gaps in order statistics from a uniform distribution), and it was applied by Lancaster (1994) in the analysis of choice-based samples.

The third issue is that the improper prior distribution for $\theta$ does not imply a unique prior distribution for the parameter of interest. Although for proper prior distributions for $\theta$ the prior distribution for $\beta$ is well defined, the limiting prior distribution for $\beta$ as the $b_j \to 0$ depends on the limits of the ratios $b_j/b_l$. To see this, consider the example discussed in which we are interested in $\phi$, the probability that $Z$ is in some set $B$: $\phi = \sum_{j: a_j \in B} \theta_j$. For fixed $b_j$ the prior mean of $\phi$ is $E(\phi) = \sum_{j: a_j \in B} b_j / \sum_{j=1}^{J} b_j$. As we let the $b_j \to 0$, the limiting mean depends on the limit of the ratios $b_j/b_l$. The posterior mean is $E(\phi \mid d) = \sum_{j: a_j \in B} (n_j + b_j) / \sum_{j=1}^{J} (n_j + b_j)$, which, after taking the limit $b_j \to 0$, equals $\sum_{j: a_j \in B} n_j / \sum_{j=1}^{J} n_j$, which does not depend on the limit of the ratios $b_j/b_l$. As this example illustrates, it is important to understand the implications of the choice of the limiting Dirichlet distribution. To measure the informativeness of the prior distribution for $\beta$, we propose calculating the expected posterior distribution given a small number $m$ of observations, where we take the expectation over the empirical distribution. Let $F_n$ denote the empirical distribution of our sample: $F_n(B) = \frac{1}{n}\sum_{i=1}^{n} 1(z_i \in B)$. Let $\pi(\cdot \mid \{t_i\}_{i=1}^{m})$ denote the posterior distribution for $\beta$ based on the $m$ observations $Z_i = t_i$ (and assume for a moment that this posterior distribution is proper). The expected posterior distribution for $\beta$ based on a random sample (with replacement) of size $m$ from $F_n$ is given by

$$\bar{\pi}_m(\cdot) = \int \pi(\cdot \mid \{t_i\}_{i=1}^{m}) \prod_{i=1}^{m} dF_n(t_i).$$

To allow for the possibility of an improper posterior distribution, we modify this formula as

$$\bar{\pi}_m(\cdot) = \frac{\int \pi(\cdot \mid \{t_i\}_{i=1}^{m})\, 1(\{t_i\}_{i=1}^{m} \in C_m) \prod_{i=1}^{m} dF_n(t_i)}{\int 1(\{t_i\}_{i=1}^{m} \in C_m) \prod_{i=1}^{m} dF_n(t_i)}, \tag{6}$$

where the set $C_m$ consists of the points $\{t_i\}_{i=1}^{m}$ such that $\pi(\cdot \mid \{t_i\}_{i=1}^{m})$ is a proper distribution. If the prior distribution is not very informative for $\beta$, different small samples $\{t_i\}_{i=1}^{m}$ could lead to very different posterior distributions, and thus the average posterior distribution should be relatively dispersed. If we find, therefore, that this average small sample posterior distribution is dispersed compared to the full posterior distribution, we interpret that as evidence that our prior distribution does not dominate the data.

2. INSTRUMENTAL VARIABLES

The first application illustrates how the described general method can generate posterior distributions without tightly parameterized models. Such a posterior distribution is called for to include parameter uncertainty in the decision making formulation; see, for example, the work of Rossi, McCulloch, and Allenby (1995), Kandel and Stambaugh (1996), and Barberis (2000). In this first example the large sample normal approximation to the sampling distribution can be used to approximate this posterior distribution fairly accurately. If, however, the objective is a posterior distribution for the parameter of interest, then our procedure is more direct than having to first approximate a sampling distribution by a normal distribution and then to argue that this normal distribution can be used to approximate a posterior distribution.

We use a very simple model relating earnings and schooling with a constant, additive treatment effect, linear in years of schooling. An individual may choose schooling levels by maximizing expected lifetime discounted utility, with utility depending on earnings at various schooling levels as well as costs associated with schooling. Such a decision requires the posterior distribution of earnings at the relevant schooling levels as one of the inputs. The potential outcome with treatment level $s$ is $Y_s = Y_0 + \gamma s$, where $Y_0$ is the potential outcome with treatment level 0 and $\gamma$ is the unknown return to schooling, common to all individuals and common to all schooling levels. The actual treatment level is $X$, which gives an actual outcome $Y$ of $Y = Y_0 + \gamma X$. Let $\alpha$ be the population mean of $Y_0$, and define the disturbance $U = Y_0 - \alpha$ so that $E_\theta(U) = 0$. The instrumental variable $W$ satisfies $E_\theta(WU) = 0$ and $\operatorname{Cov}_\theta(W, X) \neq 0$. We are abstracting from the presence of exogenous covariates; they could be incorporated into the presented analyses without any problems.

Let $Z = (Y, X, W)$ and $\beta' = (\alpha, \gamma)$. Then $\beta$ satisfies the moment condition $E_\theta \psi(Z, \beta) = 0$ with

$$\psi(Z, \beta) = (Y - \alpha - \gamma X) \cdot \begin{pmatrix} 1 \\ W \end{pmatrix}.$$

Assuming finite support for the distribution of $Z$, we use the improper Dirichlet prior [with all the $b_j \to 0$ in (3)] for the parameters of this distribution, and the posterior distribution of $\beta$ can be simulated as in (4).

Our data are a subset of the data used by Angrist and Krueger (1991) containing males born in either the first or the fourth quarters between 1930 and 1939. The sample size is $n = 162{,}515$. The outcome variable $Y$ is the log of weekly earnings in 1979. The treatment $X$ is years of schooling completed, and the instrumental variable $W$ is an indicator equal to one if the individual was born in the fourth quarter and equal to zero otherwise.

First we evaluate the information content of the prior distribution for the parameter of interest $\gamma$. To do so, we calculate the expected posterior distribution $\bar{\pi}_m$ as in (6), with $m = 10$ observations. We compare these expected posteriors with the actual posterior distribution based on the full sample with $n = 162{,}515$ observations. Here are some of the quantiles for the $\gamma$ distributions:

quantile:                 .025    .05    .25    .50    .75    .95   .975
$\bar{\pi}_{10}$:        -2.43  -1.02   -.09    .07    .23   1.22   2.51
$\pi(\cdot \mid d)$:      .047   .054   .075   .089   .104   .124   .132
$N(.089, .021^2)$:        .048   .055   .075   .089   .103   .124   .130

It appears that the prior distribution is reasonably uninformative for $\gamma$, so that the posterior distribution mainly reflects the sample information.

The instrumental-variables estimate $\hat{\gamma}$ [i.e., the solution to $\sum_{i=1}^{n} \psi(z_i, \hat{\beta}) = 0$, where $\hat{\beta}' = (\hat{\alpha}, \hat{\gamma})$] is .089. An asymptotic approximation to its sampling distribution (allowing for heteroscedasticity of unknown form) gives a normal distribution with mean $\gamma$ and standard deviation .021. A normal distribution with mean .089 and standard deviation .021 would provide a good approximation to our posterior distribution.

3. QUANTILE REGRESSION

The second application illustrates how the posterior distribution can be well defined when standard approximations to the sampling distribution are not appropriate. Let $Z = (X, Y)$, where $Y$ is scalar and $X$ is $K \times 1$. We can define a linear predictor corresponding to the $\tau$th quantile as follows: $E^*_\theta(Y \mid X = x) = \beta' x$, where

$$\beta = \arg\min_t E_\theta[c_\tau(Y - t'X)],$$

$$c_\tau(t) = |t| \cdot [(1 - \tau) \cdot 1(t < 0) + \tau \cdot 1(t > 0)].$$

($\beta$ in general depends on $\tau$, but this should be clear from the context.) If $\tau = .5$, then this reduces to minimizing the mean absolute error: $\min_t E_\theta(|Y - t'X|)$. By weighting the absolute error differently for positive and negative values, the check function $c_\tau(\cdot)$ extends this notion of linear predictor to other quantiles. The role of the check function in quantile regression was developed by Koenker and Bassett (1978, 1982).

Our simulation procedure produces independent draws $\{\beta^{(l)}\}_{l=1}^{L}$ from the posterior distribution of $\beta$. To obtain $\beta^{(l)}$, first take iid draws $\{V_i^{(l)}\}_{i=1}^{n}$ from a standard exponential distribution. Then solve

$$\beta^{(l)} = \arg\min_t \sum_{i=1}^{n} V_i^{(l)}\, c_\tau(y_i - t'x_i),$$

where the observed value of $Z_i$ is $z_i = (x_i, y_i)$. The computations are simplified by exploiting the fact that $r\,c_\tau(t) = c_\tau(rt)$ if $r > 0$. Thus define $Y_i^{(l)} = V_i^{(l)} y_i$ and $X_i^{(l)} = V_i^{(l)} x_i$. Then

$$\beta^{(l)} = \arg\min_t \sum_{i=1}^{n} c_\tau(Y_i^{(l)} - t'X_i^{(l)}).$$

This is a linear programming problem, and we use the Barrodale-Roberts (1973) modification of the standard simplex algorithm.

Our application is based on the work of Meyer, Viscusi, and Durbin (1995), who obtained data for two states, Kentucky and Michigan, on a random sample of indemnity claims. We focus on Kentucky. The claims were filed by workers seeking compensation for work-related injury or illness. Meyer et al. concentrate on temporary total disability claims. Such a claim is filed when the person is unable to work but is expected to recover fully and return to work. The data include date injured, duration of temporary total benefits, total medical costs, previous wage, weekly benefit amount, type of injury (body part affected and the type of damage), age, sex, marital status, and an industry code.

Table 1. Quantile Regression Coefficients for Log of Duration, Kentucky High and Low Earnings Groups Pooled

                                               Quantile
Variables                        .10       .25       .50       .75       .90       OLS
Intercept                     -5.555    -3.067    -1.749     -.811    -1.239    -1.994
                              (.817)    (.497)    (.403)    (.490)    (.692)    (.410)
After increase                  .136      .141      .164      .170      .137      .145
  × High earnings group       (.102)    (.057)    (.053)    (.060)    (.088)    (.051)
After increase                 -.008     -.039     -.029      .013      .074      .000
                              (.073)    (.042)    (.034)    (.040)    (.057)    (.033)
High earnings group            1.755      .525      .024     -.792    -3.191     -.696
                             (1.352)    (.931)    (.771)   (1.014)   (1.692)    (.806)

NOTE: The dependent variable is ln(.5 + duration). The sample size is 5,349. The additional regressors are Ln(previous wage), Ln(previous wage) × High earnings group, Male, Married, Ln(age), Ln(total medical costs), Hospital stay indicator; Industry indicators: Manufacturing, Construction; Injury type indicators: Head, Neck, Upper extremities, Trunk, Low back, Lower extremities, Occupational diseases. The omitted industry is other industries, and the omitted injury is other injuries.

Table 2. Quantile Regression Coefficients for Duration, Kentucky High and Low Earnings Groups Pooled

                                               Quantile
Variables                        .10       .25       .50       .75       .90       OLS
Intercept                     -6.199    -7.258    -8.972   -11.566   -19.848   -25.886
                             (1.157)   (1.441)   (1.779)   (3.310)   (7.254)   (8.412)
After increase                  .229      .302      .873     1.351     2.661     1.665
  × High earnings group       (.143)    (.165)    (.230)    (.554)   (1.339)   (1.043)
After increase                 -.052     -.032     -.116      .122      .498      .457
                              (.085)    (.097)    (.138)    (.289)    (.629)    (.674)
High earnings group             .051     -.356    -1.655   -11.541   -56.802   -41.783
                             (2.546)   (2.848)   (3.528)   (9.299)  (27.400)  (16.539)

NOTE: The dependent variable is duration (in weeks). The sample size is 5,349. The additional regressors are the same as those in Table 1. The omitted industry is other industries, and the omitted injury is other injuries.

The amount of the weekly benefit is based on a schedule that determines the benefit as a function of previous earnings. The schedule has a ceiling, with earnings levels above a threshold corresponding to the same weekly benefit. Kentucky raised the maximum benefit from $131 to $217 per week on July 15, 1980.

Meyer et al. worked with claims with injury dates during the year before or the year after the change in the benefit schedule. They also limited the sample to a high earnings group and a low earnings group. The weekly benefit amount for the high earnings group was affected by the increase in the benefit ceiling, whereas the benefit amount for the low earnings group was not affected. Thus the low earnings group can provide a control for period effects. The basic specification in the work of Meyer et al. is

$$E_\theta(Y \mid X = x) = \beta_1 + \beta_2 \cdot x_2 \cdot x_3 + \beta_3 \cdot x_2 + \beta_4 \cdot x_3 \tag{7}$$

($x_1 = 1$ denotes a constant). Here $Y$ = log of duration, with duration measured by weeks of temporary total benefits paid; $x_2 = 1$ if injured after the benefit increase, $x_2 = 0$ otherwise; $x_3 = 1$ if high earnings group, $x_3 = 0$ otherwise. The key coefficient is $\beta_2$, measuring the effect of the benefit increase on time out of work, with controls for period and for the earnings group:

$$\beta_2 = [E_\theta(Y \mid x_2 = 1, x_3 = 1) - E_\theta(Y \mid x_2 = 0, x_3 = 1)] - [E_\theta(Y \mid x_2 = 1, x_3 = 0) - E_\theta(Y \mid x_2 = 0, x_3 = 0)].$$

An appealing aspect of the Meyer et al. analysis is that it is plausible to regard the injury date, and hence the applicable benefit schedule, as if it were randomly assigned.

To account for possible changes in the composition of the sample after the benefit increase, Meyer et al. included regression controls for attributes of the individual, the job, and the injury: 16 regressors in addition to the 4 in (7). The last column of Table 1 presents least squares estimates (and conventional standard errors) corresponding to Table 6 in the Meyer et al. work. The first five columns of Table 1 present estimates of the linear predictor coefficients corresponding to the .10, .25, .50, .75, and .90 quantiles. These estimates are based on the simulation procedure described earlier. The point estimates are posterior medians, and the standard errors in parentheses are constructed so that the point estimate plus or minus 1.96 standard errors gives an interval with a .95 posterior probability. The key coefficients [corresponding to $\beta_2$ in (7)] are in the second row. The effect of the benefit increase is fairly constant across the quantiles, suggesting a location model in which the distribution of log duration shifts rigidly in response to the benefit increase.

Table 2 presents results using duration out of work (in weeks) instead of its logarithm. Now the estimates show a substantial increase as we go from low to high quantiles, suggesting that the effect of the benefit increase is concentrated on the upper half of the duration distribution. The estimated effect on the median of the distribution is .87 weeks, with a standard error of .23. In contrast, the least squares estimate of the effect on the mean of the distribution is quite imprecise, with a point estimate of 1.66 and a standard error of 1.04.

The histogram of the draws from the posterior distribution of $\beta_2$ is shown in Figure 1 for $\tau = .5$, using duration in weeks. The posterior mean is .87, and the posterior standard deviation is .23. Thus assuming the posterior distribution is normal and using $.87 \pm 1.96 \times .23$ gives a probability interval close to the one we constructed without assuming normality.

Figure 1. Posterior Histogram. q = .5 (long list for x). [Histogram; horizontal axis: Weeks.]

We examine the influence of the prior distribution by calculating the expected posterior distribution $\bar{\pi}_m$ as in (6), for $m = 21$ observations, and comparing this distribution with the posterior distribution $\pi(\cdot \mid d)$ based on the full sample with $n = 5{,}349$ observations. Here are some of the quantiles of the $\beta_2$ distributions for $\tau = .5$, using duration in weeks:

quantile:                  .025    .05    .25    .50    .75    .95   .975
$\bar{\pi}_{21,2}$:        -290   -157  -20.4   1.01   24.3    184    323
$\pi_2(\cdot \mid d)$:      .41    .49    .71    .87   1.03   1.25   1.32

The prior distribution is dominated by the sample information.

Now consider dropping all the predictor variables except for the four that appear in (7): 1, $x_2 \cdot x_3$, $x_2$, $x_3$. We compare

the expected posterior distribution for $m = 5$ observations with the posterior distribution based on the full sample. Here are quantiles of these distributions for $\beta_2$ with $\tau = .5$, using duration in weeks:

quantile:                  .025    .05    .25    .50    .75    .95   .975
$\bar{\pi}_{5,2}$:         -121    -36     -6      1      9     59    110
$\pi_2(\cdot \mid d)$:        0      0      1      1      2      2      2

The posterior histogram for $\beta_2$ is in Figure 2. It is concentrated on just four points: -1, 0, 1, and 2 weeks, with posterior probabilities of .01, .14, .55, and .30. This reflects the discreteness of the benefit duration distribution. The upper tail of that distribution is somewhat continuous, but 56% of the distribution is concentrated on the integers from 0 to 4 weeks. The (.5, .75, .9, .95, .975) quantiles are (4, 8, 15, 25, 49) weeks. Including the long list of predictor variables smoothes out this discreteness in the outcome variable, in the sense of producing a residual distribution (for $Y - \beta'X$) that is much closer to being continuous.

Figure 2. Posterior Histogram. q = .5 (short list for x). [Histogram; horizontal axis: Weeks.]

Here are the quantiles of the $\beta_2$ distributions for $\tau = .9$, using just the four regressors in (7) and duration in weeks:

quantile:                  .025    .05    .25    .50    .75    .95   .975
$\bar{\pi}_{5,2}$:         -145    -41     -7      1     10     72    124
$\pi_2(\cdot \mid d)$:        2      3      5      7      8     11     12

The posterior histogram for $\beta_2$ is in Figure 3. This is closer to a normal distribution, corresponding to the continuity in the upper tail of the duration distribution.

Figure 3. Posterior Histogram. q = .9 (short list for x). [Histogram; horizontal axis: Weeks.]

The standard asymptotic distribution theory for quantile regression requires that the distribution of the residual $Y - \beta'X$ (conditional on $\theta$) be absolutely continuous with a positive density in a neighborhood of zero. This requirement may be satisfied because the distribution of $Y$ conditional on $X$ is continuous. Alternatively, even if $Y$ is discrete, it may be satisfied because $X'\beta$ is continuous. For example, with $Y$ binary and $X$ uniform on [0, 1], and $E[Y \mid X] = X$, the limiting distribution of the coefficient in a quantile regression is normal despite the binary nature of $Y$. In our example $Y$ is discrete with most mass concentrated on a few values. With only three binary regressors, the resulting distribution of the residual is still highly discrete. With the long list of regressors, although many of them are discrete, the continuity requirement for the residual is much closer to being satisfied, and the standard large sample approximation to the sampling distribution is more accurate. In contrast, our posterior distributions provide straightforward inferences that do not rely on the approximate normality of a sampling distribution.

4. CONCLUSION

The Bayesian approach to inference provides an attractive conceptual framework because of its connection with optimization concepts in decision theory and its lack of reliance on large-sample approximations. In practice, its use has been limited by the requirement of a fully specified parametric model because many econometric models are only partly specified. In this article we presented two applications of a less parametric Bayes approach that are due to Ferguson (1973, 1974) and Rubin (1981). In the first application, the decision-theoretic nature of the underlying question forces the use of posterior distributions rather than sampling distributions. In the second application, the assumptions underlying the asymptotic normality of the sampling distributions are clearly violated, but inference based on posterior distributions is straightforward.

ACKNOWLEDGMENTS

The authors thank David Cox, Jinyong Hahn, and Neil Shephard for helpful comments and Alan Krueger and Bruce Meyer for making their data available to us. The National Science Foundation provided financial support.

[Received December 2000. Revised June 2001.]

REFERENCES

Angrist, J., and Krueger, A. (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?," Quarterly Journal of Economics, 106, 979-1014.
Barberis, N. (2000), "Investing for the Long Run when Returns are Predictable," Journal of Finance, 55, 225-264.
Barrodale, I., and Roberts, F. (1973), "An Improved Algorithm for Discrete $l_1$ Linear Approximation," SIAM Journal of Numerical Analysis, 10, 839-848.
Chamberlain, G., and Imbens, G. (1995), "Semiparametric Applications of Bayesian Inference," Discussion Paper 1716, Harvard Institute of Economic Research, Cambridge, MA.
Ferguson, T. (1973), "A Bayesian Analysis of Some Nonparametric Problems," The Annals of Statistics, 1, 209-230.
Ferguson, T. (1974), "Prior Distributions on Spaces of Probability Measures," The Annals of Statistics, 2, 615-629.

Hirano, K. (2002), "Semiparametric Bayesian Inference in Autoregressive Panel Data Models," Econometrica, 70, 781-800.
Kandel, S., and Stambaugh, R. (1996), "On the Predictability of Stock Returns: An Asset-Allocation Perspective," Journal of Finance, 51, 385-424.
Koenker, R., and Bassett, G. (1978), "Regression Quantiles," Econometrica, 46, 33-50.
Koenker, R., and Bassett, G. (1982), "Robust Tests for Heteroscedasticity Based on Regression Quantiles," Econometrica, 50, 43-61.
Lancaster, T. (1994), "Bayes WESML: Posterior Inference From Choice-Based Samples," unpublished manuscript, Brown University, Providence, RI.
Meyer, B., Viscusi, W. K., and Durbin, D. (1995), "Workers' Compensation and Injury Duration: Evidence From a Natural Experiment," American Economic Review, 85, 322-340.
Rossi, P., McCulloch, R., and Allenby, G. (1995), "Hierarchical Modelling of Consumer Heterogeneity: An Application to Target Marketing," in Case Studies in Bayesian Statistics (Vol. II), Lecture Notes in Statistics, 105, eds. C. Gatsonis, J. Hodges, R. Kass, and N. Singpurwalla, New York: Springer-Verlag, 323-349.
Rubin, D. (1981), "The Bayesian Bootstrap," The Annals of Statistics, 9, 130-134.
