Machine Learning and Applied Econometrics
An Application: Double Machine Learning for Price Elasticity
4/22/2019
Structural and Treatment Effects
• The Model
    Y = f(D, Z) + u,  E(u | Z, D) = 0
    D = h(Z) + v,  E(v | Z) = 0
  – D is the target variable of interest (e.g., price) or the treatment variable (typically, D = 0 or 1)
  – Z is the set of exogenous covariates or control variables (instruments, confounders), which may be high-dimensional.
• Partial Linear Model: f(D, Z) = θD + g(Z)
• If D is a numeric structural variable:
    θ = ∂Y/∂D
• If D = 1 or 0:
  – Average Treatment Effect (ATE):
    θ = E[ f(1, Z) - f(0, Z) ]
  – Average Treatment Effect for the Treated (ATT):
    θ = E[ f(1, Z) - f(0, Z) | D = 1 ]
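As a numeric illustration of these two estimands, a minimal simulation can be used. Everything below is an assumption for illustration (the structural function f, the confounded treatment rule, and all numbers), not part of the wine application:

```python
import numpy as np

# Hypothetical structural function: f(D, Z) = (2 + Z)*D + Z**2,
# so the unit-level effect is f(1, Z) - f(0, Z) = 2 + Z.
rng = np.random.default_rng(0)
Z = rng.normal(size=100_000)

def f(D, Z):
    return (2.0 + Z) * D + Z**2

# Assumed confounded treatment: units with larger Z are more likely treated.
D = (rng.uniform(size=Z.shape) < 1.0 / (1.0 + np.exp(-Z))).astype(float)

# ATE = E[f(1, Z) - f(0, Z)]: average the contrast over everyone.
ate = np.mean(f(1, Z) - f(0, Z))

# ATT = E[f(1, Z) - f(0, Z) | D = 1]: the same contrast, among the treated only.
att = np.mean(f(1, Z[D == 1]) - f(0, Z[D == 1]))

print(ate, att)  # ATE near 2; ATT larger, because the treated have larger Z
```

The gap between the two estimands comes entirely from the confounding: conditioning on D = 1 selects units with higher Z, and hence a larger unit-level effect 2 + Z.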
• Based on the Partial Linear Model:
  – Frisch-Waugh-Lovell Theorem for θ̂:
      û = Y - θ̂D - ĝ(Z)
      ũ = Y - ℓ(Z), where ℓ(Z) = θh(Z) + g(Z); ũ = u + θv if g and h are linear
      ṽ = D - h(Z) = v
  – Machine Learning: learn g(Z) and h(Z)
  – OLS: θ̂ = ṽ'ũ / ṽ'ṽ
    • This estimate is biased and inefficient!
  – De-biased: θ̌ = ṽ'ũ / ṽ'D, in general
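The residual-on-residual recipe above can be sketched numerically. Everything in this toy example is assumed for illustration, not taken from the slides: the data-generating process, the true θ = 0.5, and cubic-polynomial least squares standing in for the machine learners.

```python
import numpy as np

# Toy partial linear model (assumed DGP):
#   Y = theta*D + g(Z) + u,   D = h(Z) + v,   true theta = 0.5
rng = np.random.default_rng(1)
n, theta = 5_000, 0.5
Z = rng.uniform(-2.0, 2.0, size=n)
v = rng.normal(scale=0.5, size=n)
u = rng.normal(scale=0.5, size=n)
D = np.cos(Z) + v              # h(Z) = cos(Z)
Y = theta * D + np.sin(Z) + u  # g(Z) = sin(Z)

# Stand-in "machine learners": cubic-polynomial least squares for the
# conditional means E[Y|Z] (= theta*h(Z) + g(Z)) and E[D|Z] (= h(Z)).
ell_hat = np.polyval(np.polyfit(Z, Y, 3), Z)
h_hat = np.polyval(np.polyfit(Z, D, 3), Z)

u_tilde = Y - ell_hat  # residualized outcome (approx. theta*v + u)
v_tilde = D - h_hat    # residualized treatment (approx. v)

theta_ols = (v_tilde @ u_tilde) / (v_tilde @ v_tilde)  # FWL / partialling-out
theta_db = (v_tilde @ u_tilde) / (v_tilde @ D)         # de-biased form

print(theta_ols, theta_db)  # both close to the true theta = 0.5
```

With an unregularized least-squares learner fit on the full sample, ṽ is exactly orthogonal to ĥ(Z), so the two ratios coincide here; the denominators ṽ'ṽ and ṽ'D differ once g and h come from regularized learners such as the Lasso, which is where the de-biased form matters.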
  – Sample Splitting
    • {1, ..., N} = set of all observations
    • I1 = main sample: a set of observation indices of size n, used to estimate θ; e.g., n = N/2.
    • I2 = auxiliary sample: the remaining observations, of size N - n, used to estimate g (and h).
    • I1 and I2 form a random partition of the set {1, ..., N}.
  – Cross Fitting on {I1, I2} and {I2, I1}
  – Machine Learning:
      ĝ1(Z) and ĥ1(Z) on (I1, I2): nuisances fit on I2, residuals formed on I1
      ĝ2(Z) and ĥ2(Z) on (I2, I1): nuisances fit on I1, residuals formed on I2
  – De-Biased Estimator:
      θ̌ = [ θ̌1(I1, I2) + θ̌2(I2, I1) ] / 2
  – θ̌ is √N-consistent and approximately centered normal (Chernozhukov et al., 2017)
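A minimal sketch of this two-fold cross-fitting scheme, under the same assumed toy data-generating process (true θ = 0.5) with cubic polynomials standing in for the machine learners:

```python
import numpy as np

# Assumed toy DGP: Y = theta*D + sin(Z) + u, D = cos(Z) + v, true theta = 0.5.
rng = np.random.default_rng(2)
n, theta = 4_000, 0.5
Z = rng.uniform(-2.0, 2.0, size=n)
D = np.cos(Z) + rng.normal(scale=0.5, size=n)
Y = theta * D + np.sin(Z) + rng.normal(scale=0.5, size=n)

idx = rng.permutation(n)
I1, I2 = idx[: n // 2], idx[n // 2 :]  # random partition of {1, ..., N}

def theta_check(main, aux):
    """Nuisances fit on the auxiliary sample; estimator evaluated on the main sample."""
    ell_fit = np.polyfit(Z[aux], Y[aux], 3)  # stand-in learner for E[Y|Z]
    h_fit = np.polyfit(Z[aux], D[aux], 3)    # stand-in learner for E[D|Z]
    u = Y[main] - np.polyval(ell_fit, Z[main])
    v = D[main] - np.polyval(h_fit, Z[main])
    return (v @ u) / (v @ D[main])           # de-biased estimator on this fold

# Swap the roles of the two folds and average the de-biased estimates.
theta_cf = 0.5 * (theta_check(I1, I2) + theta_check(I2, I1))
print(theta_cf)  # close to the true theta = 0.5
```

Fitting the nuisance functions on one fold and evaluating the estimator on the other removes the overfitting bias that arises when the same observations are used for both steps; averaging the two estimates restores full-sample efficiency.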
• Extensions
  – Based on sample splitting {1, ..., N} = {I1, I2}, the de-biased estimator may be obtained from the pooled data and ML residuals:
      θ̌ = (v̂1, v̂2)'(û1, û2) / (v̂1, v̂2)'(D1, D2)
  – Cross fitting can be k-fold, e.g., k = 2, 5, 10.
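The pooled k-fold variant can be sketched the same way. Again, the DGP, the true θ = 0.5, k = 5, and the polynomial stand-in learners are all assumptions for illustration:

```python
import numpy as np

# Assumed toy DGP: Y = theta*D + sin(Z) + u, D = cos(Z) + v, true theta = 0.5.
rng = np.random.default_rng(3)
n, theta, k = 6_000, 0.5, 5
Z = rng.uniform(-2.0, 2.0, size=n)
D = np.cos(Z) + rng.normal(scale=0.5, size=n)
Y = theta * D + np.sin(Z) + rng.normal(scale=0.5, size=n)

# k-fold version: each fold's residuals use nuisances fit on its complement.
folds = np.array_split(rng.permutation(n), k)
u_hat = np.empty(n)
v_hat = np.empty(n)
for fold in folds:
    aux = np.setdiff1d(np.arange(n), fold)  # all observations outside the fold
    u_hat[fold] = Y[fold] - np.polyval(np.polyfit(Z[aux], Y[aux], 3), Z[fold])
    v_hat[fold] = D[fold] - np.polyval(np.polyfit(Z[aux], D[aux], 3), Z[fold])

# Pooled de-biased estimator: stack the cross-fitted residuals from every
# fold and form a single ratio (v1,...,vk)'(u1,...,uk) / (v1,...,vk)'(D1,...,Dk).
theta_pooled = (v_hat @ u_hat) / (v_hat @ D)
print(theta_pooled)  # close to the true theta = 0.5
```

Compared with averaging per-fold estimates, the pooled ratio weights each fold by its realized residual variation, which is convenient when folds are small.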
Example: Table Wine Sales in Vancouver BC
Table Wine Sales in Vancouver BC
Double Machine Learning of Price Elasticity
    Y = θD + g(Z) + u,  E(u | Z, D) = 0
    D = m(Z) + v,  E(v | Z) = 0
• GLM (Lasso)
  K-fold CF   Y (Val. MSE)   D (Val. MSE)   θ̌ (Price Elast.)
      2          2.126           0.320           -1.238
      5          2.126           0.320           -1.238
     10          2.126           0.320           -1.238
• DL (20, 20)
  K-fold CF   Y (Val. MSE)   D (Val. MSE)   θ̌ (Price Elast.)
      2          1.977           0.273           -1.261
      5          1.984           0.273           -1.271
     10          1.983           0.274           -1.131
• DL (20, 10, 5)
  K-fold CF   Y (Val. MSE)   D (Val. MSE)   θ̌ (Price Elast.)
      2          1.966           0.273           -1.279
      5          1.982           0.274           -1.124
     10          1.973           0.273           -1.245
• Conclusion
  – The linear regression model may not explain and validate this dataset well; thus, the Lasso price elasticity estimate of about -1.24 may not be reliable.
  – The nonparametric Deep Learning neural networks and the Gradient Boosting Machine perform better in learning this dataset.
  – The price elasticity from the Gradient Boosting Machine, applied within the partial linear model framework, is about -1.19.
  – All computations are done with the R package H2O:
    • Darren Cook, Practical Machine Learning with H2O, O'Reilly Media, Inc., 2017.