
The Existence of Inefficiency: LASSO+SFA

Christopher F. Parmeter (Miami Herbert Business School)
Artem Prokhorov (University of Sydney Business School)
Valentin Zelenyuk (School of Economics and Centre for Efficiency and Productivity Analysis)

November 7th, 2024


From the End

I Combine machine learning with stochastic frontier analysis.
I Establish moment/parameter redundancy for use of post-double LASSO with MLE.
I Simple and effective step-wise estimator that preserves efficiency and valid inference.

X-inefficiency
X-inefficiency?
X-inefficiency!
Selective Attention
Stochastic Frontier Analysis (SFA)
The Stochastic Frontier Model

The stochastic frontier model we consider in this paper can be written as follows:

  y = x′β + v − u = x′β + ε,   (1)

where y is an n-vector of output, x is a p × 1 vector of production inputs including a constant, and ε = v − u is the n-vector of error terms ε_i composed of a Normal part v_i ∼ N(0, σ_v²) and a Half-Normal inefficiency component u_i ∼ N₊(0, σ_u²).

I Aside from the presence of u_i this is a trivial model to estimate.
I But we are interested in u_i (a small simulation sketch of the composed error follows).
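To fix ideas, here is a minimal simulation sketch of the composed error, assuming Python with numpy/scipy; the parameter values are illustrative, not taken from the paper:

```python
# Minimal sketch (illustrative values, not the paper's): simulate the composed
# error eps = v - u with v ~ N(0, sigma_v^2) and u ~ |N(0, sigma_u^2)| half-normal,
# then inspect its mean and skewness.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
n, sigma_v, sigma_u = 10_000, 0.7, 1.1        # illustrative scale parameters

v = rng.normal(0.0, sigma_v, n)               # symmetric noise
u = np.abs(rng.normal(0.0, sigma_u, n))       # half-normal inefficiency, u >= 0
eps = v - u

print(eps.mean())   # close to -sigma_u*sqrt(2/pi): inefficiency shifts the mean down
print(skew(eps))    # negative: the one-sided u component skews eps to the left
```

This left skew of ε is exactly what the residual-based diagnostics later in the deck look for.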


ML Formulation of Frontier Model

  y_i = x_i′β + z_i′δ + v_i − u_i,   i = 1, . . . , 2n,   (2)

where x_i are the inputs and z_i the confounders; together they make up the stochastic frontier.

I p (number of inputs) is small (and fixed).
I d (number of confounders) is possibly large (> 2n).
I β can be estimated at O(n^{−1/2}) if δ can be.
I Impossible to estimate δ at this rate when d is large.


Double Machine Learning

I Consider estimation of a treatment effect (not a frontier model):

  y_i = x_i β + z_i′δ + v_i,   i = 1, . . . , 2n,

  where x_i is a scalar treatment and z_i are confounders.

1. Use any ML tool to predict E[y|z] and E[x|z], using half of the sample for each (hence the 2n).
2. Obtain β̂ from the regression of ỹ on x̃, where w̃ = w − Ê[w|z] (a minimal sketch follows).
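A minimal sketch of this two-step partialling-out, assuming scikit-learn is available; the learner, the sample split and the data-generating process below are illustrative choices, not the paper's:

```python
# Minimal double-machine-learning sketch: residualize y and x on the
# confounders with an ML learner fit on one half of the sample, then
# regress the residuals on each other in the other half.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n2, d, beta = 2000, 50, 1.0                      # 2n observations, true beta = 1
z = rng.normal(size=(n2, d))
x = z[:, 0] + rng.normal(size=n2)                # treatment depends on a confounder
y = beta * x + z[:, 0] + rng.normal(size=n2)

half = n2 // 2
fit, hold = slice(0, half), slice(half, n2)      # one simple sample-splitting scheme

# Step 1: predict E[y|z] and E[x|z] with any ML tool.
my = RandomForestRegressor(n_estimators=200, random_state=0).fit(z[fit], y[fit])
mx = RandomForestRegressor(n_estimators=200, random_state=0).fit(z[fit], x[fit])
y_tilde = y[hold] - my.predict(z[hold])
x_tilde = x[hold] - mx.predict(z[hold])

# Step 2: regress the residualized y on the residualized x.
beta_hat = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)
print(beta_hat)                                  # should be close to 1
```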
Double Machine Learning

I β̂ is √n-consistent and asymptotically Normal even if the RMSE of the estimate of z′δ has rate O(n^{−1/4}) (so it can be estimated nonparametrically).
I The moment conditions from which β̂ is constructed imply Neyman orthogonality.
Neyman Orthogonality and M/P Redundancy

I In the context of asymptotically optimal testing, Neyman (1959) asked when errors in nuisance functions do not carry over into β̂.
I Let δ denote the functional nuisance parameter and let h₁*(β, δ) be the moment function implied by the FOC for β̂:

  E[h₁*(β, δ)] = 0.


Neyman Orthogonality and M/P Redundancy

I We say h₁*(·, ·) is Neyman orthogonal if the moment function remains valid under perturbations in δ:

  D₁₂[δ − δ₀] = ∂_δ E[h₁*(β, δ)][δ − δ₀] = 0.   (3)

I D₁₂[δ − δ₀] is the Gateaux derivative of the moment function in the direction δ − δ₀, evaluated at the true value δ₀.
I Neyman orthogonality is connected to, and best understood in, a GMM framework.
Neyman Orthogonality and M/P Redundancy

I GMM estimation of (β, δ) is based on moment conditions assumed to hold in the population:

  [A] for β : E[h₁(β, δ)] = 0   (4)
  [B] for δ : E[h₂(β, δ)] = 0   (5)

I We assume that [A] is enough to identify β given δ.
I Using more knowledge in the form of [B] and/or using δ₀ improves statistical efficiency asymptotically.
I Prokhorov and Schmidt (2009) asked when it is irrelevant for the estimation of β whether we know [B] and/or δ₀.
Neyman Orthogonality and M/P Redundancy

I Assume a finite-dimensional δ:

  [A] for β : E[h₁(β, δ)] = 0   (6)
  [B] for δ : E[h₂(β, δ)] = 0   (7)

I When is it irrelevant for the estimation of β whether we know [B] and/or δ₀?
I When the asymptotic variance of GMM based on [A] with known δ equals the asymptotic variance of GMM based on [A] and [B] with unknown δ.
Neyman Orthogonality and M/P Redundancy

  C = E [ h₁h₁′  h₁h₂′ ]  =  [ C₁₁  C₁₂ ]
        [ h₂h₁′  h₂h₂′ ]     [ C₂₁  C₂₂ ]

and

  D = E [ ∇_β h₁  ∇_δ h₁ ]  =  [ D₁₁  D₁₂ ]
        [ ∇_β h₂  ∇_δ h₂ ]     [ D₂₁  D₂₂ ]

M/P-Redundancy  ⇔  C₁₂ = 0 (moment redundancy of [B])  and  D₁₂ = 0 (parameter redundancy of δ)
Neyman Orthogonality and M/P Redundancy

I So start by specifying

  [A] for β : E[h₁(β, δ)] = 0   (8)
  [B] for δ : E[h₂(β, δ)] = 0   (9)

  and look for a valid moment function h₁*(β, δ) that is uncorrelated with h₂(·, ·) and such that

  D₁₂ = E[∇_δ h₁*(β, δ)] = 0.

I Then we can use any slowly converging ML tool (LASSO, GRF, etc.) to obtain δ̂, plug it into h₁*(β, δ), and obtain a √n-consistent and asymptotically Normal β̂.
Return to ML Formulation of Frontier Model

  y_i = x_i′β + z_i′δ + v_i − u_i,   i = 1, . . . , 2n,   (10)

where x_i are the inputs and z_i the confounders; together they make up the stochastic frontier.

I All ML tools give biased estimators.
I Inputs correlate with confounders: x_i = m(z_i) + η_i.
I Biases in δ̂ and m̂(z_i) affect β̂ and û_i.
I So what changes with the introduction of u ≥ 0 into the model, and what are the M/P redundant moments?
Return to ML Formulation of Frontier Model

  y_i = x_i′β + z_i′δ + v_i − u_i,   i = 1, . . . , 2n.

I Conventional estimation (COLS) (assume u ∼ |N(0, σ_u²)|, v symmetric with E[v] = 0):

  (β̂, δ̂) = argmin_{β,δ} Σ_{i=1}^{2n} (y_i − x_i′β − z_i′δ)²,

  accounting for E[u_i] = √(2/π) σ_u > 0 (a COLS sketch follows).

I Evidence of inefficiency is captured through negative skewness of the residuals ε̂_i = y_i − x_i′β̂ − z_i′δ̂.
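A minimal COLS sketch, assuming numpy; the intercept correction uses one common moment-matching recipe for the half-normal case, and the data-generating values are illustrative rather than the paper's:

```python
# Minimal COLS sketch: OLS first, then estimate sigma_u from the third central
# moment of the residuals (half-normal assumption) and shift the intercept.
import numpy as np

rng = np.random.default_rng(2)
n, sigma_v, sigma_u = 2000, 0.5, 1.0              # illustrative values
x = rng.normal(size=(n, 2))
y = (1.0 + x @ np.array([0.4, 0.6])
     + rng.normal(0.0, sigma_v, n) - np.abs(rng.normal(0.0, sigma_u, n)))

X = np.column_stack([np.ones(n), x])              # constant + inputs
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b_ols

# Under the half-normal assumption the third central moment of eps = v - u is
# -sqrt(2/pi)*(4/pi - 1)*sigma_u^3, so a negative m3 signals inefficiency.
m3 = np.mean((e - e.mean()) ** 3)
denom = np.sqrt(2 / np.pi) * (4 / np.pi - 1)
sigma_u_hat = (max(-m3, 0.0) / denom) ** (1 / 3)

b_cols = b_ols.copy()
b_cols[0] += np.sqrt(2 / np.pi) * sigma_u_hat     # shift the intercept up to the frontier
print(sigma_u_hat, b_cols)
```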
Return to ML Formulation of Frontier Model

  y_i = x_i′β + z_i′δ + v_i − u_i,   i = 1, . . . , 2n.

I Conventional estimation (maximum likelihood) (assume u ∼ |N(0, σ_u²)|, v ∼ N(0, σ_v²) and u ⊥ v):

  θ̂ = argmax_θ ln L(θ),   θ̂ = (β̂, δ̂, σ̂_v², σ̂_u²).

I Evidence of inefficiency: σ̂_u² ≫ 0 (an MLE sketch follows).
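A minimal sketch of the normal/half-normal maximum likelihood estimator, assuming numpy and scipy; the log-sigma parameterization and starting values are illustrative implementation choices, not necessarily the paper's:

```python
# Minimal normal/half-normal SFA MLE sketch using the standard density
# f(eps) = (2/sigma) * phi(eps/sigma) * Phi(-lambda*eps/sigma).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=(n, 2))
y = (1.0 + x @ np.array([0.4, 0.6])
     + rng.normal(0, 0.5, n) - np.abs(rng.normal(0, 1.0, n)))
X = np.column_stack([np.ones(n), x])

def neg_loglik(theta):
    beta, lsv, lsu = theta[:3], theta[3], theta[4]   # log-sigmas keep scales positive
    sv, su = np.exp(lsv), np.exp(lsu)
    sigma, lam = np.hypot(sv, su), su / sv
    eps = y - X @ beta
    ll = (np.log(2.0) - np.log(sigma)
          + norm.logpdf(eps / sigma)
          + norm.logcdf(-lam * eps / sigma))
    return -ll.sum()

res = minimize(neg_loglik, np.zeros(5), method="BFGS")
print(res.x[:3], np.exp(res.x[3:]))                  # beta_hat and (sigma_v, sigma_u)
```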


Does Inefficiency Exist?

  y_i = 1 + 0.3 x_{1i} + 0.4 x_{2i} + 0.38 x_{3i} + Σ_{j=1}^{d} δ_j z_{ij} + v_i − u_i

  True δ_j = 0, z_{ij} ∼ N(0, 1), d = cn,
  v_i ∼ N(0, 0.5), u_i ∼ |N(0, 1.2)|
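A minimal sketch of this experiment, assuming numpy/scipy; it runs a single replication per cell (the table below averages 1,000), and it treats 0.5 and 1.2 as standard deviations, which is an assumption about the notation above:

```python
# Minimal sketch of the "vanishing skewness" experiment: as the number of
# irrelevant confounders d = c*n grows, OLS residual skewness drifts toward 0
# even though inefficiency is present. One replication per c (illustrative).
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(4)
n = 400
for c in (0.0, 0.1, 0.3, 0.5, 0.9):
    d = int(c * n)
    x = rng.normal(size=(n, 3))
    z = rng.normal(size=(n, d))                       # true delta_j = 0 for all j
    y = (1 + x @ np.array([0.3, 0.4, 0.38])
         + rng.normal(0, 0.5, n)                      # 0.5, 1.2 taken as std devs (assumption)
         - np.abs(rng.normal(0, 1.2, n)))
    W = np.column_stack([np.ones(n), x, z])           # OLS on inputs and all confounders
    resid = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    print(c, round(float(skew(resid)), 3))
```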
Does Inefficiency Exist?

Average skewness of OLS residuals over 1,000 simulations
(columns are values of c, where d = cn irrelevant confounders are included)

  n \ c       0      0.01     0.1     0.2     0.3     0.5     0.9
    100    −0.494  −0.488  −0.420  −0.342  −0.267  −0.143  −0.001
    200    −0.525  −0.517  −0.445  −0.375  −0.299  −0.177  −0.011
    400    −0.536  −0.530  −0.454  −0.380  −0.308  −0.186  −0.012
    800    −0.547  −0.539  −0.466  −0.391  −0.319  −0.193  −0.016
  1,600    −0.549  −0.542  −0.468  −0.391  −0.319  −0.189  −0.016
Resort to ML? - Post-Single-LASSO

I LASSO ⇒ some elements of δ̂_LASSO are exactly 0; drop those confounders:

  (β̂_LASSO, δ̂_LASSO) = argmin_{β,δ} Σ_{i=1}^{2n} (y_i − x_i′β − z_i′δ)² + λ Σ_{j=1}^{d} |δ_j|

I COLS using only the confounders picked by LASSO ⇒ PSL-COLS (sketched below):

  (β̂_PSL, δ̂_PSL) = argmin_{β,δ} Σ_{i=1}^{2n} (y_i − x_i′β − z_i′δ)²,
  s.t. δ_j = 0 for any j ∉ supp(δ̂_LASSO)
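A minimal PSL-COLS sketch, assuming scikit-learn; note two simplifications relative to the slides: the cross-validated LASSO here penalizes all regressors (the slides penalize only δ), and the data-generating values are illustrative:

```python
# Minimal post-single-LASSO sketch: LASSO selects confounders, then OLS is
# refit on the inputs plus only the selected confounders.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, d = 400, 200
x = rng.normal(size=(n, 3))
z = rng.normal(size=(n, d))
delta = np.r_[0.5, 0.3, np.zeros(d - 2)]               # a few relevant confounders
y = (1 + x @ np.array([0.3, 0.4, 0.38]) + z @ delta
     + rng.normal(0, 0.7, n) - np.abs(rng.normal(0, 1.1, n)))

# Single LASSO of y on (x, z); keep confounders with nonzero coefficients.
W = np.column_stack([x, z])
lasso = LassoCV(cv=5, random_state=0).fit(W, y)
keep = np.flatnonzero(lasso.coef_[3:] != 0.0)          # indices into z

# Refit by OLS with only the selected confounders (delta_j = 0 otherwise).
W_psl = np.column_stack([np.ones(n), x, z[:, keep]])
b_psl, *_ = np.linalg.lstsq(W_psl, y, rcond=None)
print(keep, b_psl[:4])                                 # selected set, (const, beta)
```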
Resort to ML? - Post-Single-LASSO

I LASSO ⇒ some elements of δ̂_LASSO are exactly 0; drop those confounders:

  (β̂_LASSO, δ̂_LASSO) = argmin_{β,δ} Σ_{i=1}^{2n} (y_i − x_i′β − z_i′δ)² + λ Σ_{j=1}^{d} |δ_j|

I or MLE using only the confounders picked by LASSO ⇒ PSL-MLE:

  θ̂_PSL = argmax_θ ln L(θ),  s.t. δ_j = 0 for any j ∉ supp(δ̂_LASSO).
Inefficiency Exists!

Average skewness of PSL-OLS residuals over 1,000 simulations
(columns are values of c, where d = cn)

  n \ c       0      0.01     0.1     0.2     0.3     0.5     0.9
    100    −0.503  −0.404  −0.386  −0.374  −0.367  −0.359  −0.350
    200    −0.520  −0.452  −0.436  −0.430  −0.425  −0.420  −0.413
    400    −0.536  −0.479  −0.470  −0.465  −0.463  −0.459  −0.455
    800    −0.546  −0.506  −0.500  −0.498  −0.497  −0.494  −0.492
  1,600    −0.552  −0.522  −0.519  −0.517  −0.516  −0.516  −0.514
Another Problem: Inference for PSL-MLE

  y_i = β x_i + 0.8 Σ_{j=1}^{200} δ_j z_{ij} + v_i − u_i

  True δ_j = (1/j)², z_{ij} ∼ N(0, 1), 2n = 100, λ by CV,
  v_i ∼ N(0, 0.5), u_i ∼ |N(0, 1.2)|,

  x_i = 0.6 Σ_{j=1}^{200} δ_j z_{ij} + η_i,  η_i ∼ N(0, 1)

[Figure: sampling distribution of the standardized β̂_PSL over 1,000 simulations]
Why Does PSL Fail?

Look at MLE when y_i = x_i′β + z_i′δ + v_i − u_i with v_i ∼ N(0, σ_v²) ⊥ u_i ∼ |N(0, σ_u²)|:

  f_ε(ε_i) = (2/σ) φ(ε_i/σ) Φ(−λ ε_i/σ),   where σ² = σ_v² + σ_u² and λ = σ_u/σ_v,

  θ̂_MLE = argmax_θ Σ_{i=1}^{2n} ln f_ε(ε_i),   where ε_i = y_i − x_i′β − z_i′δ.

Recall: PSL zeros out some δ_j's ⇒ let δ_LASSO contain 0's for those j's; then

  ξ_i = y_i − x_i′β − z_i′δ_LASSO = ε_i + z_i′(δ − δ_LASSO) ≠ ε_i.


Why Does LASSO Break?

I For simplicity assume that (σ, λ) = (1, 1) and d = dim(δ) < 2n, and define r_i(ν_i) = φ(ν_i)/(1 − Φ(ν_i)) (the Inverse Mills Ratio).

I Moment equations implied by the FOCs from MLE:

  [A] for β : E[x_i′(ε_i + r_i(ε_i))] = 0
  [B] for δ : E[z_i′(ε_i + r_i(ε_i))] = 0.

I Moment equations implied by the FOCs from PSL-MLE:

  [A] for β : E[x_i′(ξ_i + r_i(ξ_i))] = 0
  [B] for δ_LASSO : E[z_i′(ξ_i + r_i(ξ_i))] = 0.

PSL-MLE is using invalid moment conditions; the LASSO regularization bias carries over to the estimation of β̂.
How to Conduct Valid Inference in SFA?

I Let ε̃_i := y_i − π_y′z_i − (x_i − π_x′z_i)′β − z_i′δ.

I Consider the moment conditions

  [A*] E[(x_i − π_x′z_i)′ (ε̃_i + r_i(ε̃_i))] = 0
  [B*] E[z_i′ (ε̃_i + r_i(ε̃_i))] = 0
  [C]  E[z_i′ (x_i − π_x′z_i)] = 0
  [D]  E[z_i′ (y_i − π_y′z_i)] = 0

I Under homoskedasticity, [A*] satisfies Neyman orthogonality.
I Equivalently, [B*], [C] and [D] are M/P redundant for the estimation of β.
Sketch of the Argument

I Look at

  [A*] E[(x_i − π_x′z_i)′ (ε̃_i + r_i(ε̃_i))] = 0
  [B*] E[z_i′ (ε̃_i + r_i(ε̃_i))] = 0

  [A*] ⊥ [B*] ⇒ C₁₂ = 0.

I Expected derivative with respect to δ:

  E[(x_i − π_x′z_i)′ (∂/∂δ)(ε̃_i + r_i(ε̃_i))] = E[(x_i − π_x′z_i)′ (−z_i + z_i r_i(ε̃_i)(ε̃_i + r_i(ε̃_i)))] = 0.

Note

I An identical result holds for both π_x and π_y.
I The idea is similar to partialing out from Frisch-Waugh-Lovell.
I [A*] and [B*] correspond to running MLE where the dependent variable is the part of y_i that is orthogonal to z_i and the explanatory variables are z_i and the part of x_i that is orthogonal to z_i.
Post-Double-LASSO

I LASSO of y_i on z_i:

  π̂^0_LASSO = argmin_{π^0} Σ_{i=1}^{n} (y_i − z_i′π^0)² + λ^0 Σ_{j=1}^{d} |π_j^0|

I LASSO of x_i (one-by-one) on z_i:

  π̂^ℓ_LASSO = argmin_{π^ℓ} Σ_{i=1}^{n} (x_{ℓi} − z_i′π^ℓ)² + λ^ℓ Σ_{j=1}^{d} |π_j^ℓ|
Post-Double-LASSO

I MLE using the union of confounders picked by LASSO in the first two steps ⇒ PDL-MLE (an end-to-end sketch follows):

  θ̂_PDL = argmax_θ Σ_{i=1}^{2n} ln f_ε(ε_i),
  s.t. δ_j = 0 for any j ∉ I = ∪_{ℓ=0}^{p} supp(π̂^ℓ_LASSO).

I I is called the amelioration set (Belloni, Chernozhukov and Hansen, 2013).
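A minimal end-to-end PDL-MLE sketch, assuming numpy, scipy and scikit-learn; the cross-validated penalties, the data-generating process and the log-sigma parameterization are illustrative choices rather than the paper's exact implementation:

```python
# Minimal PDL-MLE sketch: double-LASSO selection of confounders (y on z, each
# x_l on z), then normal/half-normal MLE with the unselected delta_j fixed at 0.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
n, d, p = 400, 200, 3
z = rng.normal(size=(n, d))
x = 0.6 * z[:, :p] + rng.normal(size=(n, p))            # inputs correlate with confounders
y = (1 + x @ np.array([0.3, 0.4, 0.38]) + 0.5 * z[:, 0]
     + rng.normal(0, 0.7, n) - np.abs(rng.normal(0, 1.1, n)))

# Steps 1-2: union of LASSO supports from y-on-z and each x_l-on-z (amelioration set).
keep = set(np.flatnonzero(LassoCV(cv=5, random_state=0).fit(z, y).coef_))
for l in range(p):
    keep |= set(np.flatnonzero(LassoCV(cv=5, random_state=0).fit(z, x[:, l]).coef_))
keep = sorted(keep)

# Step 3: normal/half-normal MLE using only the selected confounders.
X = np.column_stack([np.ones(n), x, z[:, keep]])

def neg_loglik(theta):
    k = X.shape[1]
    beta, sv, su = theta[:k], np.exp(theta[k]), np.exp(theta[k + 1])
    sigma, lam = np.hypot(sv, su), su / sv
    eps = y - X @ beta
    return -np.sum(np.log(2.0) - np.log(sigma)
                   + norm.logpdf(eps / sigma)
                   + norm.logcdf(-lam * eps / sigma))

res = minimize(neg_loglik, np.zeros(X.shape[1] + 2), method="BFGS")
print(res.x[1:1 + p], np.exp(res.x[-2:]))               # input elasticities, (sigma_v, sigma_u)
```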
[Figure: sampling distribution of the standardized β̂_PDL over 1,000 simulations]
Empirical Example

I 137 dairy farms in Spain from 1999-2010 (Alvarez & Arias, 2004).
I y is milk production (liters).
I x is labor (man-equivalent units), cows, feed (kg), land (hectares) and roughage (expenses incurred to produce roughage: fertilizer, machines, seed, silage additives, etc.).
Empirical Example

I z is year dummies, zone dummies, land ownership, bacteriological content of the milk, price of milk, price of feed, membership in an agricultural cooperative, milk quality indicators (fat, protein, somatic cell count), and something called AVGCOST (neither Antonio nor Carlos could remember what this variable captured).
I dim(z) = 50 with first-order terms [Cobb-Douglas]; dim(z) = 87 with second-order terms [translog].
Empirical Example: Cobb-Douglas
(coefficients with standard errors in parentheses beneath)

              OLS       SFA      OLS Large  SFA Large  SFA-PSL   SFA-PDL
Feedstuffs    0.386     0.360     0.464      0.464      0.439     0.401
             (0.012)   (0.013)   (0.011)    (0.011)    (0.011)   (0.013)
Cows          0.595     0.642     0.467      0.467      0.546     0.560
             (0.020)   (0.022)   (0.017)    (0.017)    (0.017)   (0.021)
Land         −0.010    −0.012     0.032      0.032      0.007     0.033
             (0.009)   (0.009)   (0.009)    (0.009)    (0.008)   (0.010)
Labor         0.035     0.032     0.013      0.013     −0.015     0.005
             (0.012)   (0.012)   (0.010)    (0.009)    (0.009)   (0.011)
Roughage      0.067     0.060     0.073      0.073      0.082     0.061
             (0.005)   (0.005)   (0.004)    (0.004)    (0.004)   (0.005)
RTS           1.074     1.082     1.048      1.048      1.059     1.059
Eff           0.930     0.892     1.000      0.999      0.999     0.926
Empirical Example: Translog
(coefficients with standard errors in parentheses beneath)

              OLS       SFA      OLS Large  SFA Large  SFA-PSL   SFA-PDL
Feedstuffs    0.341     0.319     0.457      0.457      0.409     0.342
             (0.014)   (0.014)   (0.013)    (0.013)    (0.013)   (0.014)
Cows          0.633     0.676     0.454      0.454      0.574     0.618
             (0.024)   (0.024)   (0.021)    (0.020)    (0.020)   (0.023)
Land         −0.011    −0.017    −0.017     −0.017     −0.014    −0.013
             (0.010)   (0.010)   (0.009)    (0.009)    (0.009)   (0.010)
Labor         0.021     0.014    −0.007     −0.007     −0.033     0.000
             (0.014)   (0.013)   (0.010)    (0.010)    (0.010)   (0.012)
Roughage      0.093     0.088     0.122      0.122      0.126     0.079
             (0.008)   (0.008)   (0.007)    (0.007)    (0.007)   (0.007)
RTS           1.076     1.080     1.008      1.008      1.062     1.025
Eff           0.932     0.887     1.000      0.999      0.927     0.915
Concluding remarks

I Neyman orthogonality is key to ensuring valid causal inference; it is equivalent to M/P redundancy.
I Abundance of data makes it harder to establish and address inefficiency of production.
I Machine learning tools are effective at reversing the spurious finding of full efficiency.
I Partialing out offers a way of conducting valid post-machine-learning causal inference.
I We derive and apply Neyman-orthogonal moment conditions for production frontier models.
