
QUANTILE REGRESSION

Econometric Analysis of Cross Section and Panel Data, 2e


MIT Press
Jeffrey M. Wooldridge

1. Why Quantile Regression?


2. Review of Means, Medians, and Quantiles
3. Asymptotic Results
4. Quantile Regression with Endogenous Explanatory Variables
5. Quantile Regression for Panel Data
6. Quantile Regression for Corner Solutions

1
1. WHY QUANTILE REGRESSION?
∙ Often want to know the effect of changing a covariate – such as a
policy intervention – on features of the distribution other than the mean.
∙ For example, how does eligibility for a particular kind of pension plan
affect total wealth at different quantiles of the wealth distribution? The
mean effect, while useful, may mask very different effects in different
parts of the wealth distribution.

2
∙ Another reason to focus on the median (and quantiles more generally)
is that sometimes we can estimate parameters in underlying models
under weaker assumptions, using zero conditional median restrictions
rather than zero conditional mean restrictions. An important case is data
censoring, which we cover later.
∙ Manipulations with medians are useful for certain corner solution
models, too.

3
∙ When we apply different estimation methods (say, ordinary least
squares and least absolute deviations) to the same linear model, we
must remember that these methods generally identify different
quantities (mean versus median in this case).
∙ In the statistics literature the focus on LAD has often been on its
resistance to outliers (which it certainly has). But there are other
reasons OLS and LAD can give different results; it need have nothing
to do with extreme data points.

4
2. REVIEW OF MEANS, MEDIANS, AND QUANTILES
∙ Start with a linear population model, where β is K × 1:

y = α + xβ + u.  (1)

∙ Assume E(u²) < ∞, so that the distribution of u is not too spread out.
(So, for example, we rule out a Cauchy distribution for u, or a t
distribution with two degrees of freedom.)
∙ We call this equation a “linear model.” There are many different ways
to estimate the parameters of this model, and the goal is to evaluate the
quality of the estimation procedures under different assumptions.

5
∙ Ordinary Least Squares (OLS) Estimation:

min_{a,b} ∑_{i=1}^N (y_i − a − x_i b)².  (2)

∙ Least Absolute Deviations (LAD) Estimation:

min_{a,b} ∑_{i=1}^N |y_i − a − x_i b|.  (3)

∙ We should not refer to equation (1) as an “OLS model” or an
“LAD model.” OLS and LAD are estimation methods, not “models.”
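The contrast between (2) and (3) can be illustrated with a small simulation (design and numbers are made up, not from the slides): with errors symmetric about zero, OLS and LAD recover nearly the same intercept and slope. The LAD fit below simply minimizes the sum of absolute deviations numerically.

```python
# A minimal sketch: OLS via least squares, LAD via direct minimization of
# the sum of absolute deviations. Symmetric (Laplace) errors, so both
# methods estimate the same parameters.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 1000
x = rng.normal(size=N)
u = rng.laplace(scale=1.0, size=N)        # symmetric about zero
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(N), x])

# OLS: minimize the sum of squared residuals (closed form)
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# LAD: minimize the sum of absolute residuals, starting from the OLS fit
lad = minimize(lambda b: np.abs(y - X @ b).sum(), x0=ols,
               method="Nelder-Mead").x

print("OLS:", ols, "LAD:", lad)
```

With an asymmetric error distribution the two intercepts would separate, as discussed on the following slides.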

6
[Figure: The OLS and LAD objective functions, u² and |u|, plotted against u.]

∙ With a large random sample, when should we expect the slope


estimates to be similar? Two important cases.

7
(i) If

D(u|x) is symmetric about zero,  (4)

then OLS and LAD both consistently estimate α and β because, under
(4), E(u|x) = Med(u|x) = 0, and so

E(y|x) = α + xβ
Med(y|x) = α + xβ.

As we know, OLS consistently estimates the parameters in a
conditional mean and LAD consistently estimates the parameters in a
conditional median.

8
(ii) If

u is independent of x with E(u) = 0,  (5)

where E(u) = 0 is the normalization that identifies α, then OLS and
LAD both consistently estimate the slopes, β. By (5), E(u|x) = 0 and so
we still have E(y|x) = α + xβ.
∙ If u has an asymmetric distribution, then Med(u) ≡ η ≠ 0, and the
LAD intercept converges to α + η because
Med(y|x) = α + xβ + Med(u|x) = α + xβ + η = (α + η) + xβ. But the
slopes β are identified by LAD.

9
∙ In many applications, neither (4) nor (5) is likely to be true. For
example, y may be a measure of wealth, in which case the error
distribution is probably asymmetric and Var(u|x) is not constant.

[Figure: An asymmetric error density f(u).]
10
∙ It is important to remember that if D(u|x) is asymmetric and changes
with x, then we should not expect OLS and LAD to deliver similar
estimates of β, even for “thin-tailed” distributions. Therefore, claims
that substantive differences between OLS and LAD estimators must be
due to “outliers” may be unwarranted.

11
∙ Of course, LAD is much more resilient to changes in extreme values
because, as a measure of central tendency, the median is much less
sensitive than the mean to such changes. But it does not follow that a
large difference between OLS and LAD estimates means something is
“wrong” with OLS.
∙ OLS consistently estimates the parameters that provide the best
mean-square approximation to E(y|x). Unfortunately, characterizing
LAD under misspecification is harder, though possible (more on this
later).
∙ One case where LAD is clearly preferred is when E|u| < ∞ but
E(u²) = ∞.

12
∙ Advantage of the median over the mean: the median passes through
monotonic functions. For example, suppose log(y) = α + xβ + u and
Med(u|x) = 0. Therefore,

Med(log(y)|x) = α + xβ.

Write y = exp(α + xβ + u). Then

Med(y|x) = Med(exp(α + xβ + u)|x) = exp(α + xβ + Med(u|x))
= exp(α + xβ).

13
∙ By contrast, we cannot generally find
E(y|x) = exp(α + xβ)·E(exp(u)|x) because E(exp(u)|x) is an unknown
function of x even if we assume E(u|x) = 0 or Med(u|x) = 0.
∙ As we will see, being able to pass the median through monotonic
functions is very useful when data have been censored.

14
∙ Aside: Suppose u ~ Normal(0, σ²) and define w = exp(u), so w has a
lognormal distribution. Then Med(w) = exp(Med(u)) = exp(0) = 1
and E(w) = E(exp(u)) = exp(σ²/2) > 1. So

E(w) > Med(w),

as is often the case for asymmetric distributions in economics.
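This aside is easy to verify numerically; the sketch below uses σ = 1 and simulated draws (sample size is made up), and also checks that the sample median commutes with the monotone transformation exp(·).

```python
# Numerical check: if u ~ Normal(0, 1) and w = exp(u), then
# Med(w) = exp(Med(u)) ~ 1 while E(w) = exp(1/2) > 1.
import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(size=200_001)     # odd n: the sample median is an order statistic
w = np.exp(u)

med_w = np.median(w)
mean_w = np.mean(w)

# the sample median passes through the monotone function exp() exactly
assert np.isclose(med_w, np.exp(np.median(u)))
print(mean_w, med_w)             # mean near exp(0.5), median near 1
```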

15
∙ The previous derivation for finding the conditional median of y given
the conditional median of log(y) is just a special case of

Med(g(y)|x) = g(Med(y|x)),

where g is monotonically increasing or decreasing (and need not be
strictly increasing or decreasing) on the support of y.
∙ And we also know that for nonlinear functions g,
E(g(y)|x) ≠ g(E(y)|x).

16
∙ The expectation cannot be passed through monotonic functions, but it
has useful properties that the median does not, particularly linearity and
the law of iterated expectations. (The median operator is not linear; in
particular, Med(w + y) ≠ Med(w) + Med(y). Also, there is no “law of
iterated medians.”)
∙ Suppose

y_i = a_i + x_i b_i  (6)

where (a_i, b_i) is independent of x_i. Define the population averages
α = E(a_i) and β = E(b_i) (so the β_j are average partial effects).

17
∙ Ey i |x i  is easy to find:
Ey i |x i   Ea i |x i   x i Eb i |x i  ≡   x i . (7)

∙ Equation (7) immediatly shows OLS is consistent for  and  because


OLS is consistent for the parameters in a conditional mean linear in
those parameters.
∙ Generally, we cannot find Medy i |x i .

18
∙ What can we add so that LAD estimates something of interest? If r_i is
a vector, then its distribution conditional on x_i is centrally symmetric if
D(r_i|x_i) = D(−r_i|x_i), which implies that, if g_i is any vector function of
x_i, D(g_i′r_i|x_i) is a univariate distribution that is symmetric about zero.
This implies E(r_i|x_i) = 0.
∙ Write

y_i = α + x_i β + (a_i − α) + x_i(b_i − β).  (8)

If r_i = (a_i − α, b_i − β)′ given x_i is centrally symmetric, then LAD
applied to the usual model y_i = α + x_i β + u_i consistently estimates α
and β.

19
∙ Therefore, we can only be guaranteed that LAD consistently
estimates an interesting set of parameters under assumptions that imply
OLS would consistently estimate those same parameters. (Again, we
are ruling out the case of very fat-tailed distributions.)
∙ Generally, the problem of what LAD estimates when we deviate from
the model with a single source of heterogeneity appears unsolved,
unless we impose strong assumptions.

20
Quantiles
∙ For 0 < τ < 1, q_τ is the τth quantile of y_i if P(y_i ≤ q_τ) ≥ τ and
P(y_i ≥ q_τ) ≥ 1 − τ.
∙ In general, a quantile need not be unique. (Special case: a median
need not be unique.)
∙ In the common case where y_i is continuous with a strictly increasing
cdf, q_τ is the unique value such that

P(y_i ≤ q_τ) = τ
P(y_i ≥ q_τ) = P(y_i > q_τ) = 1 − τ.

21
∙ Let covariates affect quantiles. Index the parameters by τ. Under
linearity,

Quant_τ(y_i|x_i) = α_τ + x_i β_τ.  (9)

Under (9), consistent estimators of α_τ and β_τ are obtained by
minimizing the “check” function:

min_{a∈R, b∈R^K} ∑_{i=1}^N c_τ(y_i − a − x_i b),  (10)

where

c_τ(u) = [τ·1(u ≥ 0) + (1 − τ)·1(u < 0)]·|u| = [τ − 1(u < 0)]·u

and 1(·) is the indicator function.

22
[Figure: The check function c_τ(u) for τ = .75, plotted against u.]

23
∙ The check function identifies α_τ and β_τ in the sense that these
parameters solve the population problem

min_{a∈R, b∈R^K} E[c_τ(y_i − a − x_i b)],

and then we have to assume uniqueness.
∙ The proof proceeds by showing that, for any x_i, α_τ and β_τ
actually solve

min_{a∈R, b∈R^K} E[c_τ(y_i − a − x_i b)|x_i].

∙ Manski (1988, Analog Estimation Methods in Econometrics) contains
a proof.

24
3. ASYMPTOTIC RESULTS
What Happens if the Quantile Function is Misspecified?
∙ Recall a property of OLS: if α* and β* are the plims from the OLS
regression of y_i on 1, x_i, then these provide the smallest mean squared
error approximation to E(y|x) ≡ μ(x), in that (α*, β*) solve

min_{a,b} E{[μ(x) − a − xb]²}.  (11)

Under restrictive assumptions on the distribution of x, β*_j can be equal
to or proportional to average partial effects.

25
∙ The linear quantile formulation has been viewed by several authors as
an approximation. Recently, Angrist, Chernozhukov, and Fernández-Val
(2006) characterized the probability limit of the quantile regression
estimator. Absorb the intercept into x and let β_τ be the solution to the
population quantile regression problem. ACF show that β_τ solves

min_b E{w_τ(x, b)·[q_τ(x) − xb]²},  (12)

where the weight function w_τ(x, b) is

w_τ(x, b) = ∫₀¹ (1 − a)·f_{y|x}(a·xb + (1 − a)·q_τ(x) | x) da.  (13)

26
Computing Standard Errors
∙ For given τ, write

y_i = x_i β_τ + u_i,  Quant_τ(u_i|x_i) = 0,  (14)

where x_{i1} ≡ 1, and let β̂_τ be the quantile regression estimator. Define
quantile residuals û_i = y_i − x_i β̂_τ. Generally, √N(β̂_τ − β_τ) is
asymptotically normal with asymptotic variance A⁻¹BA⁻¹, where

A ≡ E[f_u(0|x_i)·x_i′x_i]  (15)

and

B ≡ τ(1 − τ)·E(x_i′x_i).  (16)

27
∙ If the quantile function is actually linear, a consistent estimator of B
is simply

B̂ = τ(1 − τ)·N⁻¹ ∑_{i=1}^N x_i′x_i.  (17)

Generally, a consistent estimator of A is (Powell (1991))

Â = (2Nh_N)⁻¹ ∑_{i=1}^N 1[|û_i| ≤ h_N]·x_i′x_i,  (18)

where h_N > 0 is a nonrandom sequence shrinking to zero as N → ∞
with √N·h_N → ∞. For example, h_N = aN^{−1/3} for any a > 0. Might
use a smoothed version so that all residuals contribute.

28
∙ If u_i and x_i are independent,

Avar[√N(β̂_τ − β_τ)] = [τ(1 − τ)/f_u(0)²]·E(x_i′x_i)⁻¹,  (19)

and Avar(β̂_τ) is estimated as

Avar̂(β̂_τ) = [τ(1 − τ)/f̂_u(0)²]·(∑_{i=1}^N x_i′x_i)⁻¹,  (20)

where, say, f̂_u(0) is the histogram estimator of f_u(0):

f̂_u(0) = (2Nh_N)⁻¹ ∑_{i=1}^N 1[|û_i| ≤ h_N].  (21)

The estimate in (20) is commonly reported (by, say, Stata).
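A sketch of formulas (17)-(21) on simulated data (the design and the bandwidth constant a = 1 are made up). The median regression fit here is computed by direct minimization rather than linear programming, so this is illustrative only:

```python
# Nonrobust (independence-based) standard errors for median regression:
# histogram estimator of f_u(0) as in (21), plugged into (20).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N = 2000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=N)

tau = 0.5
b0 = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat = minimize(lambda b: np.abs(y - X @ b).sum(), b0,
                    method="Nelder-Mead").x
uhat = y - X @ beta_hat                       # quantile residuals

h = N ** (-1 / 3)                             # h_N = a * N^(-1/3) with a = 1
f_hat = (np.abs(uhat) <= h).mean() / (2 * h)  # histogram estimator, as in (21)
avar = tau * (1 - tau) / f_hat**2 * np.linalg.inv(X.T @ X)  # as in (20)
se = np.sqrt(np.diag(avar))

print("f_hat(0):", f_hat, "SEs:", se)
```

With standard normal errors the target is f_u(0) = 1/√(2π) ≈ 0.399.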

29
∙ If the quantile function is misspecified, the “robust” form based on
(18) (with B̂ as in (17)) is not valid. In the generalized linear models
literature, there is a distinction between a “semi-robust” variance
estimator (mean correctly specified) and a “fully robust” estimator
(mean might be misspecified).
∙ For quantile regression, a fully robust variance estimator, which
allows the quantile function to be misspecified, requires a different
estimator of B.

30
∙ Kim and White (2002) and Angrist, Chernozhukov, and
Fernández-Val (2006) show

B̂ = N⁻¹ ∑_{i=1}^N [τ − 1(û_i < 0)]²·x_i′x_i  (22)

is consistent, and then Avar[√N(β̂_τ − β_τ)] is estimated by Â⁻¹B̂Â⁻¹
with Â given by (18).

31
∙ Hahn (1995, 1997) shows that the nonparametric bootstrap – that is,
just resample all variables – generally provides consistent estimates of
the fully robust variance, without requiring that the conditional quantile
be correctly specified. The bootstrap does not provide “asymptotic
refinements” for testing and confidence intervals.
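A minimal sketch of this pairs bootstrap for LAD on simulated heteroskedastic data (sample size and replication count are made up and kept small for speed):

```python
# Nonparametric pairs bootstrap for LAD: resample (y_i, x_i) rows jointly,
# re-fit, and use the spread of the bootstrap estimates as standard errors.
import numpy as np
from scipy.optimize import minimize

def lad(Xm, ym):
    b0 = np.linalg.lstsq(Xm, ym, rcond=None)[0]
    return minimize(lambda b: np.abs(ym - Xm @ b).sum(), b0,
                    method="Nelder-Mead").x

rng = np.random.default_rng(4)
N = 400
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N) * (1 + 0.5 * np.abs(x))

beta_hat = lad(X, y)
B = 200
boot = np.empty((B, 2))
for r in range(B):
    idx = rng.integers(0, N, size=N)   # resample all variables jointly
    boot[r] = lad(X[idx], y[idx])

se_boot = boot.std(axis=0)
print("LAD:", beta_hat, "bootstrap SEs:", se_boot)
```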
∙ Example using Abadie (2003). These are nonrobust standard errors.
nettfa is net total financial assets.

32
∙ Stata Output:
. use 401ksubs

. keep if fsize == 1
(7258 observations deleted)

. sum

Variable | Obs Mean Std. Dev. Min Max


---------------------------------------------------------------------
e401k | 2017 .3604363 .4802461 0 1
inc | 2017 29.44618 16.67356 10.008 143.067
marr | 2017 .0183441 .1342256 0 1
male | 2017 .5418939 .4983654 0 1
age | 2017 39.27814 10.82328 25 64
---------------------------------------------------------------------
fsize | 2017 1 0 1 1
nettfa | 2017 13.59498 47.59058 -143.5 1134.098
p401k | 2017 .2429351 .4289625 0 1
pira | 2017 .2141795 .4103536 0 1
incsq | 2017 1144.947 1581.761 100.1601 20468.17
---------------------------------------------------------------------
agesq | 2017 1659.857 922.5799 625 4096

33
. count if nettfa < 0
706

. tab e401k

1 if |
eligble for |
401(k) | Freq. Percent Cum.
-----------------------------------------------
0 | 1,290 63.96 63.96
1 | 727 36.04 100.00
-----------------------------------------------
Total | 2,017 100.00

34
. reg nettfa inc age agesq e401k, robust

Linear regression                                  Number of obs =    2017
                                                   F(  4,  2012) =   30.66
                                                   Prob > F      =  0.0000
                                                   R-squared     =  0.1273
                                                   Root MSE      =  44.502

------------------------------------------------------------------------------
             |               Robust
      nettfa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .7825921   .1041046     7.52   0.000     .5784281    .9867562
         age |  -1.567659   1.075848    -1.46   0.145    -3.677551    .5422324
       agesq |   .0283926   .0138173     2.05   0.040      .001295    .0554902
       e401k |   6.836563   2.173342     3.15   0.002     2.574328     11.0988
       _cons |   2.533552   19.26135     0.13   0.895    -35.24072    40.30782
------------------------------------------------------------------------------

35
. qreg nettfa inc age agesq e401k
Iteration  1:  WLS sum of weighted deviations =  34159.391

Iteration  1: sum of abs. weighted deviations =  34187.253

Iteration 26: sum of abs. weighted deviations =  30905.329

Median regression                                  Number of obs =    2017
  Raw sum of deviations 33151.39 (about 1.4)
  Min sum of deviations 30905.33                   Pseudo R2     =  0.0678

------------------------------------------------------------------------------
      nettfa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .3239283   .0116808    27.73   0.000     .3010205    .3468361
         age |  -.2443716   .1458544    -1.68   0.094     -.530413    .0416699
       agesq |   .0047983     .00171     2.81   0.005     .0014448    .0081518
       e401k |   2.597726   .4038806     6.43   0.000     1.805658    3.389794
       _cons |  -3.572832   2.897657    -1.23   0.218    -9.255555     2.10989
------------------------------------------------------------------------------

36
. qreg nettfa inc age agesq e401k, q(.25)
Iteration  1:  WLS sum of weighted deviations =  29542.707

Iteration  1: sum of abs. weighted deviations =  29403.746

Iteration 21: sum of abs. weighted deviations =  19568.944

.25 Quantile regression                            Number of obs =    2017
  Raw sum of deviations 19760.67 (about -.15000001)
  Min sum of deviations 19568.94                   Pseudo R2     =  0.0097

------------------------------------------------------------------------------
      nettfa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .0712858   .0072275     9.86   0.000     .0571118    .0854599
         age |   .0336287   .0954666     0.35   0.725     -.153595    .2208524
       agesq |    .000372    .001113     0.33   0.738    -.0018107    .0025547
       e401k |   1.281012   .2627072     4.88   0.000     .7658052    1.796218
       _cons |  -4.372772   1.895672    -2.31   0.021    -8.090457    -.6550865
------------------------------------------------------------------------------

37
. qreg nettfa inc age agesq e401k, q(.75)
Iteration  1:  WLS sum of weighted deviations =  35270.543

Iteration  1: sum of abs. weighted deviations =  35277.14

Iteration 22: sum of abs. weighted deviations =  33600.122

.75 Quantile regression                            Number of obs =    2017
  Raw sum of deviations 41098.57 (about 13.2)
  Min sum of deviations 33600.12                   Pseudo R2     =  0.1825

------------------------------------------------------------------------------
      nettfa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |    .797724   .0252319    31.62   0.000     .7482406    .8472075
         age |  -1.385644   .2865236    -4.84   0.000    -1.947558    -.8237297
       agesq |    .024192   .0033797     7.16   0.000     .0175639    .0308202
       e401k |   4.460003   .8006231     5.57   0.000     2.889866     6.03014
       _cons |   7.538962   5.732206     1.32   0.189    -3.702718    18.78064
------------------------------------------------------------------------------

38
Dependent Variable: nettfa

Explanatory Variable   Mean (OLS)   .25 Quantile   Median (LAD)   .75 Quantile
inc                       .783          .0713           .324           .798
                         (.104)        (.0072)         (.012)         (.025)
age                     −1.568          .0336          −.244         −1.386
                        (1.076)        (.0955)         (.146)         (.287)
age²                     .0284          .0004           .0048          .0242
                        (.0138)        (.0011)         (.0017)        (.0034)
e401k                    6.837          1.281           2.598          4.460
                        (2.173)         (.263)          (.404)         (.801)
N                        2,017          2,017           2,017          2,017

(Standard errors in parentheses.)

39
∙ Can use the bootstrap to get the fully robust standard errors (valid
with or without independence between u_i and x_i). Below is for LAD, and
the standard errors are substantially larger than the nonrobust ones.
. bsqreg nettfa inc age agesq e401k, reps(500)
(fitting base model)

Median regression, bootstrap(500) SEs              Number of obs =    2017
  Raw sum of deviations 33151.39 (about 1.4)
  Min sum of deviations 30905.33                   Pseudo R2     =  0.0678

------------------------------------------------------------------------------
      nettfa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .3239283   .0396347     8.17   0.000      .246199    .4016576
         age |  -.2443716   .1997378    -1.22   0.221    -.6360862     .147343
       agesq |   .0047983   .0025729     1.86   0.062    -.0002475    .0098441
       e401k |   2.597726   .5752944     4.52   0.000     1.469491    3.725961
       _cons |  -3.572832   3.819017    -0.94   0.350    -11.06247    3.916809
------------------------------------------------------------------------------

40
Efficiency Calculations
∙ Suppose that in the model

y = xβ + u,

x includes unity and u is independent of x with a distribution symmetric
about zero. Also, f_u(0) > 0 and σ² ≡ E(u²) < ∞.
∙ Then OLS and LAD are both consistent for β and √N-asymptotically
normal:

Avar[√N(β̂_OLS − β)] = σ²·E(x′x)⁻¹
Avar[√N(β̂_LAD − β)] = [1/(4f_u(0)²)]·E(x′x)⁻¹.

41
∙ We can compare asymptotic efficiency by comparing the scalars out
front: σ² versus 1/[4f_u(0)²].
∙ For example, suppose the population distribution is Normal(0, σ²).
Then f_u(0) = 1/√(2πσ²) and so

1/[4f_u(0)²] = 2πσ²/4 = (π/2)σ² ≈ 1.57σ².

∙ This shows that LAD is asymptotically inefficient compared with OLS
when the u_i come from a normal distribution: its asymptotic variance
is about 57% larger.

42
∙ Now suppose that u_i has a double exponential (Laplace) distribution
with mean zero, which has density

f(u) = (2η)⁻¹·exp(−|u|/η),

where Var(u_i) = 2η².
∙ Can show the MLE of β in this case is the LAD estimator. The log
likelihood (up to a constant) is

ℒ_N(b, η) = −N·log(η) − η⁻¹ ∑_{i=1}^N |y_i − x_i b|.

43
∙ Further, f(0) = 1/(2η) and so

1/[4f_u(0)²] = 4η²/4 = η²,

whereas

σ² = 2η².

∙ The asymptotic variance of the sample average is twice that of the
sample median.
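Both efficiency comparisons are easy to check by Monte Carlo; the variance ratios below should come out near π/2 ≈ 1.57 for normal data and near 2 for Laplace data (sample sizes are made up):

```python
# Monte Carlo check: relative efficiency of the sample mean and median
# under normal versus double exponential (Laplace) errors.
import numpy as np

rng = np.random.default_rng(5)
reps, n = 4000, 500

norm_draws = rng.normal(size=(reps, n))
lap_draws = rng.laplace(size=(reps, n))

# Var(median)/Var(mean): ~ pi/2 under normality (mean wins)
r_norm = np.median(norm_draws, axis=1).var() / norm_draws.mean(axis=1).var()
# Var(mean)/Var(median): ~ 2 under the Laplace distribution (median wins)
r_lap = lap_draws.mean(axis=1).var() / np.median(lap_draws, axis=1).var()

print(r_norm, r_lap)
```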

44
∙ For general asymmetric distributions it makes no sense to discuss
asymptotic efficiency of the two estimators unless u is independent of
x, in which case we can compare the slopes.
∙ If D(u|x) is asymmetric and depends on x, OLS and LAD generally
estimate different parameters.

45
Fully Parametric Approaches
∙ If we have fully specified a model for D(y|x) then we can learn
anything about D(y|x), including any quantile we want.
∙ If we start with

y = α + xβ + u

then specifying D(u|x) implies a model for D(y|x).
∙ As an example, suppose

D(u|x) = Normal(0, σ²·exp(2xγ)).

46
∙ Then Ey|x  Medy|x    x and, for any quantile ,
Quant  y|x    x  Quant  u|x.

∙ Now Quant  u|x is the value q  x such that


Pu ≤ q  x|x  

or

u q  x
P ≤ 
 expx  expx

47
∙ But e ≡ u/(σ·exp(xγ)) is independent of x with a standard normal
distribution, and so we must have

q_τ(x)/(σ·exp(xγ)) = a_τ,

where a_τ is the τth quantile of the standard normal distribution (which
we can easily find). Therefore,

q_τ(x) = a_τ·σ·exp(xγ)

and so

Quant_τ(y|x) = α + xβ + a_τ·σ·exp(xγ).

48
∙ If  ≠ 0, this quantile function is nonlinear in x except when  . 5, in
which case a   0.
∙ When   0 (homskedasticity), Quant  y|x    x  a   for any
. (u is independent of x in this case, and so all quantile functions have
the same slopes, , but with intercepts   a  .)
∙ The parameters , ,  2 , and  can be estimated by MLE using the
normal density that recognizes the mean and variance both depend on
x. One might try using only Quant  y|x    x  a   expx, but
there is an identification problem when   0.
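The closed-form quantile above can be checked against a brute-force simulated quantile at a single value of x; all parameter values below are made up:

```python
# Check Quant_tau(y|x) = alpha + x*beta + a_tau * sigma * exp(x*gamma)
# against the empirical quantile of simulated draws at a fixed x.
import numpy as np
from scipy.stats import norm

alpha, beta, sigma, gamma = 1.0, 2.0, 1.5, 0.3   # made-up values
tau, x = 0.75, 0.8

a_tau = norm.ppf(tau)                    # tau-quantile of the standard normal
q_formula = alpha + x * beta + a_tau * sigma * np.exp(x * gamma)

rng = np.random.default_rng(6)
u = sigma * np.exp(x * gamma) * rng.normal(size=1_000_000)
y = alpha + x * beta + u
q_sim = np.quantile(y, tau)

print(q_formula, q_sim)                  # should nearly coincide
```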

49
4. QUANTILE REGRESSION WITH ENDOGENOUS
EXPLANATORY VARIABLES
∙ Suppose

y₁ = z₁δ₁ + α₁y₂ + u₁,  (23)

where z is exogenous and y₂ is endogenous – whatever that means in
the context of quantile regression.
∙ Amemiya’s (1982) two-stage LAD estimator. Specify a reduced form
for y₂,

y₂ = zπ₂ + v₂.  (24)

50
∙ The first step applies OLS or LAD to (24) and gets fitted values,
ŷ_{i2} = z_iπ̂₂. These are inserted for y_{i2} to give LAD of y_{i1} on z_{i1}, ŷ_{i2}.
Consistency of 2SLAD relies on the median of the composite error
α₁v₂ + u₁ given z being zero, or at least constant.

51
∙ If Du 1 , v 2 |z is centrally symmetric, can use a control function
approach. Write

u1  1v2  e1, (25)

where e 1 given z, v 2  has a symmetric distribution. Get LAD residuals


v̂ i2  y i2 − z i ̂ 2 and do LAD of y i1 on z i1 , y i2 , v̂ i2 . Use t test on v̂ i2 to
test null that y 2 is exogenous.
∙ Interpretation of LAD in context of omitted variables is difficult
unless lots of joint symmetry assumed.
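A simulation sketch of the control function idea in (25) (design and parameter values made up): the first-stage residual v̂₂ enters the second-stage LAD, which removes the endogeneity bias visible in the naive LAD fit.

```python
# Control function for endogenous y2 in median regression: LAD of y1 on
# (1, z1, y2, v2_hat), where v2_hat are first-stage residuals.
import numpy as np
from scipy.optimize import minimize

def lad(Xm, ym):
    b0 = np.linalg.lstsq(Xm, ym, rcond=None)[0]
    return minimize(lambda b: np.abs(ym - Xm @ b).sum(), b0,
                    method="Nelder-Mead", options={"maxiter": 5000}).x

rng = np.random.default_rng(8)
N = 2000
z1, z2 = rng.normal(size=N), rng.normal(size=N)
v2 = rng.normal(size=N)
e1 = rng.normal(size=N)
y2 = z1 + z2 + v2                      # reduced form, as in (24)
u1 = v2 + e1                           # u1 correlated with v2: endogeneity
y1 = 1.0 + z1 + 1.0 * y2 + u1          # structural equation, alpha1 = 1

Z = np.column_stack([np.ones(N), z1, z2])
pi_hat = lad(Z, y2)
v2_hat = y2 - Z @ pi_hat               # first-stage LAD residuals

X_naive = np.column_stack([np.ones(N), z1, y2])
X_cf = np.column_stack([X_naive, v2_hat])

a1_naive = lad(X_naive, y1)[2]         # biased: ignores endogeneity
a1_cf = lad(X_cf, y1)[2]               # control function estimate

print("naive:", a1_naive, "control function:", a1_cf)
```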

52
∙ Little has been done on binary y₂ (except in the special case of
treatment effects). Clearly cannot just plug in, say, probit fitted values
and then use LAD. Similar comments hold for other discrete y₂.
∙ Control function approaches with “generalized residuals” may
provide good approximations.

53
5. QUANTILE REGRESSION FOR PANEL DATA
∙ Without unobserved effects, QR is easy on panel data:

Quant_τ(y_it|x_it) = x_it β_τ,  t = 1, ..., T.  (26)

Pooled QR, but account for serial correlation in the score,

s_it(β) = −x_it′·{τ·1[y_it − x_itβ ≥ 0] − (1 − τ)·1[y_it − x_itβ < 0]}.

Use a “cluster robust” variance matrix estimate for B:

B̂ = N⁻¹ ∑_{i=1}^N ∑_{t=1}^T ∑_{r=1}^T s_it(β̂_τ)·s_ir(β̂_τ)′  (27)

54
Â = (2Nh_N)⁻¹ ∑_{i=1}^N ∑_{t=1}^T 1[|û_it| ≤ h_N]·x_it′x_it.  (28)

∙ Explicitly allowing unobserved effects is harder.

Quant_τ(y_it|x_i, c_i) = Quant_τ(y_it|x_it, c_i) = x_it β_τ + c_i.  (29)

∙ A “fixed effects” approach, where D(c_i|x_i) is unrestricted, is
attractive. Honoré (1992) applies to the uncensored case: LAD on the
first differences is consistent when {u_it: t = 1, ..., T} is an i.i.d.
sequence conditional on (x_i, c_i) (symmetry not required). When T = 2,
LAD on the first differences is equivalent to estimating the c_i along
with β, but not with general T.

55
∙ An alternative suggested by Abrevaya and Dahl (2006) for T = 2. In
Chamberlain’s correlated random effects linear model,

E(y_t|x₁, x₂) = η_t + x_t β + x₁λ₁ + x₂λ₂,  t = 1, 2,  (30)

β = ∂E(y₁|x)/∂x₁ − ∂E(y₂|x)/∂x₁.  (31)

Abrevaya and Dahl suggest modeling Quant_τ(y_t|x₁, x₂) as in (30) and
then defining the partial effects as

β_τ = ∂Quant_τ(y₁|x)/∂x₁ − ∂Quant_τ(y₂|x)/∂x₁.  (32)

56
∙ Correlated RE approaches are difficult: quantiles of sums are not sums
of quantiles. If c_i = ψ + x̄_iξ + a_i,

y_it = ψ + x_it β + x̄_iξ + a_i + u_it.  (33)

Generally, v_it = a_i + u_it will not have a zero conditional quantile.
Nevertheless, might estimate (33) by pooled quantile regression for
different quantiles.

57
∙ More flexibility if we start with the median,

y_it = x_it β + c_i + u_it,  Med(u_it|x_i, c_i) = 0,  (34)

and make symmetry assumptions. Can apply LAD to the
time-demeaned equation ÿ_it = ẍ_itβ + ü_it, being sure to obtain fully
robust standard errors for pooled LAD.

58
∙ If we impose the Chamberlain-Mundlak device,
y_it = ψ + x_it β + x̄_iξ + a_i + u_it, we can get by with central symmetry
of D(a_i, u_it|x_i), so that D(a_i + u_it|x_i) is symmetric about zero; if this
holds for each t, pooled LAD of y_it on 1, x_it, and x̄_i consistently
estimates ψ, β, and ξ. (Remember, if we use pooled OLS with x̄_i
included along with x_it, we obtain the FE estimates.)
∙ Should use serial-correlation-robust inference.

59
6. QUANTILE REGRESSION FOR CORNER SOLUTIONS
∙ Suppose that y is a corner solution response with a corner at zero. We
know that a general model that captures this feature is

y = max(0, xβ + u).  (35)

∙ If we assume

Med(u|x) = 0  (36)

then

Med(y|x) = max(0, xβ).  (37)

60
∙ In other words, the zero conditional median restriction on u identifies
one feature of D(y|x), namely, Med(y|x).

[Figure: Med(y|x) = max(0, xβ) plotted against xβ: zero for xβ ≤ 0, the
45-degree line for xβ > 0.]

61
∙ The β_j measure the partial effects on Med(y|x) once Med(y|x) > 0.
For example, if x_j is continuous,

∂Med(y|x)/∂x_j = β_j·1[xβ > 0].  (38)

∙ As Honoré (2008) has recently shown, the average of these effects
(across the distribution of x) is easily estimated as ρ̂·β̂_j, where ρ̂ is
the fraction of strictly positive y_i.

62
∙ The so-called censored least absolute deviations (CLAD) estimator
solves

min_{b∈R^K} ∑_{i=1}^N |y_i − max(0, x_i b)|.  (39)

∙ The objective function is continuous in the parameters, so consistency
is relatively straightforward (under identification).
∙ The nonsmoothness makes asymptotic normality harder, but Powell
(1984) established it under general conditions. Estimation of the
asymptotic variance has been coded, too, and the bootstrap is often
used.
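A minimal sketch of CLAD on simulated corner-solution data (all values made up); it also forms the simple average-effect estimate from the slide above, the fraction of strictly positive outcomes times the slope estimate. The kinked objective in (39) is minimized numerically from a crude starting value, so this is illustrative rather than a production implementation.

```python
# CLAD: minimize sum |y_i - max(0, x_i b)| on censored data, then
# estimate the average effect on the median as (fraction y > 0) * beta_hat.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
N = 2000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y_star = X @ np.array([0.5, 1.0]) + rng.normal(size=N)
y = np.maximum(0.0, y_star)               # corner at zero

def clad_obj(b):
    return np.abs(y - np.maximum(0.0, X @ b)).sum()

# crude starting value: OLS on the strictly positive observations
b0 = np.linalg.lstsq(X[y > 0], y[y > 0], rcond=None)[0]
b_clad = minimize(clad_obj, b0, method="Nelder-Mead",
                  options={"maxiter": 5000}).x

ape = (y > 0).mean() * b_clad[1]          # average effect on the median
print("CLAD:", b_clad, "APE:", ape)
```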

63
∙ One way to view the CLAD approach is that it identifies
Med(y|x) = max(0, xβ) for a variety of shapes of D(u|x). But there is a
cost: other features of D(y|x), such as the mean, are not identified. So
CLAD does not allow us to aggregate the effects of a policy or
program. We can get the median effect for groups indexed by the
observed covariates.

64
∙ A model no more or less restrictive than y = max(0, xβ + u),
Med(u|x) = 0, is

y = a·exp(xβ),  E(a|x) = 1,  (40)

and E(y|x) = exp(xβ) is identified. Can have P(a = 0) > 0 to give
P(y = 0) > 0.

65
∙ The standard Tobit model, or extensions such as allowing for
heteroskedasticity in the latent error, can be restrictive, but they identify
all of D(y|x). In other words, there is a tradeoff between assumptions
and how much can be learned.

66
∙ In the panel case, we can start with

y_it = max(0, x_it β + c_i + u_it)
Med(u_it|x_i, c_i) = 0,  t = 1, ..., T.

∙ These imply that

Med(y_it|x_i, c_i) = max(0, x_it β + c_i).  (41)

67
∙ Notice that strict exogeneity is assumed because
x_i = (x_i1, x_i2, ..., x_iT) appears in the conditioning set.
∙ Honoré (1992) showed how to estimate β without restricting D(c_i|x_i)
by imposing “exchangeability” assumptions on {u_it: t = 1, ..., T};
independent and identically distributed is sufficient but not necessary.
Nonstationarity, including heteroskedasticity, is ruled out. So, it is like
a “fixed effects” method for corner solutions, but the c_i are not
parameters to estimate. And it does impose extra assumptions on
{u_it: t = 1, ..., T}.

68
∙ The partial effect of x_tj on Med(y_it|x_it = x_t, c_i = c) is

θ_tj(x_t, c) = 1[x_tβ + c > 0]·β_j.  (42)

∙ What values should we insert for c? Averaging (42) across D(c_i)
would give average partial effects (on the median):

δ_tj(x_t) ≡ E_{c_i}[θ_tj(x_t, c_i)] = [1 − G(−x_tβ)]·β_j,

where G is the unconditional cdf of c_i.
∙ The β_j give the relative effects of the APEs on the median.
∙ If c_i has a Normal(μ_c, σ_c²) distribution,

E_{c_i}[θ_tj(x_t, c_i)] = Φ[(μ_c + x_tβ)/σ_c]·β_j.

69
∙ Honoré (2008) can also be applied in this case. Write
y_t = max(0, x_tβ + v_t) where, in this case, we are thinking of
v_t = c + u_t. Then, if v_t has a continuous distribution, the probability
of being at the kink is zero. So

∂y_t/∂x_tj (x_t, v_t) = 1[x_tβ + v_t > 0]·β_j  (43)

and averaging across the joint distribution of (x_t, v_t) gives

E_{(x_t, v_t)}[∂y_t/∂x_tj] = P(y_t > 0)·β_j.  (44)

70
∙ Given β̂_j – available using methods summarized by Honoré (2008);
see also Arellano and Honoré (2001, Handbook of Econometrics,
Volume 5) – (44) is easily estimated by multiplying β̂_j by the fraction
of positive y_t in the sample.
∙ Remember, we can also use parametric models: the
Chamberlain-Mundlak version of the RE Tobit model (where
D(u_it|x_i, c_i) = Normal(0, σ_u²) and c_i = ψ + x̄_iξ + a_i,
D(a_i|x_i) = Normal(0, σ_a²)). In the parametric setting, we can easily
obtain APEs, and if we impose conditional independence on the u_it,
also other partial effects (such as the partial effect at the average).

71
∙ Generally, using median and quantile restrictions on D(y_it|x_it, c_i)
without restricting the distribution of c_i (either unconditionally or
conditional on x_i), we cannot obtain partial effects as a function of x_t;
that would require knowing at least the central tendency of the
distribution of c_i (say, the mean or median).
∙ With the semiparametric approach, it is unclear what to do about
discrete changes. If x_t¹ and x_t⁰ are two values of x_t, we would like to
study

max(0, x_t¹β + c) − max(0, x_t⁰β + c),

but we do not know what to plug in for c, or how to average out across
the distribution of c.

72
