Lecture Notes: Artificial Intelligence: The Value Added of Machine Learning to Causal Inference
Andrea A. Naghi, Department of Econometrics
Introduction

Motivation
This Paper
Literature
- Econometrics literature on causal ML (CML) methods:
  - ATE: Chernozhukov et al. (2018, ET and AER P&P), Athey, Imbens and Wager (2018, JASA), Farrell et al. (2021, ECTA), Colangelo and Lee (2019)
  - HTE: Wager and Athey (2018, JASA), Athey et al. (2019, Annals of Statistics), Chernozhukov et al. (2019), Semenova et al. (2018), Oprescu et al. (2019)
- Statistics and ML literatures: Hill (2011), Imai et al. (2013), van der Laan and Rose (2011), Su et al. (2009), Zeileis et al. (2008), among others
- Early applications: Davis and Heller (2017, AER P&P), Strittmatter (2019), Knaus et al. (2020, JHR), Bertrand et al. (2017), Deryugina et al. (2019, AER)
Outline

1. Introduction
4. Simulations
Methodology: ATE
Y = D θ0 + g0(X) + U    (1)
D = m0(X) + V           (2)
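To fix ideas, equations (1) and (2) can be simulated directly. The specific functional forms for g0 and m0 below are illustrative choices made for this sketch, not those of any paper discussed in these notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
theta0 = 0.5                                 # true treatment effect

X = rng.normal(size=(n, 10))
g0 = np.sin(X[:, 0]) + X[:, 1] ** 2          # illustrative choice of g0(X)
m0 = 1.0 / (1.0 + np.exp(-X[:, 0]))          # illustrative choice of m0(X)

U = rng.normal(size=n)
V = rng.normal(size=n)
D = m0 + V                                   # equation (2)
Y = D * theta0 + g0 + U                      # equation (1)

# A naive regression of Y on D alone is biased upward here, because D and
# g0(X) are positively correlated through X[:, 0]:
naive_slope = np.cov(Y, D)[0, 1] / np.var(D)
```

The confounding through X is exactly what the nuisance estimates ĝ0 and m̂0 are meant to remove.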
- √n (θ̌0 − θ0) = a* + b* + c*, where

  b* = (E V²)⁻¹ (1/√n) Σ_{i∈I} (m0(Xi) − m̂0(Xi)) (g0(Xi) − ĝ0(Xi))

  c* = (1/√n) Σ_{i∈I} Vi (g0(Xi) − ĝ0(Xi))
- Partition the data: an auxiliary sample (to obtain ĝ0 and m̂0) and a main sample (to obtain θ̌0); cross-fitting swaps the roles of the two samples and averages, so that no efficiency is lost.
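The sample-splitting recipe just described can be sketched in a few lines. This is a generic illustration, with scikit-learn random forests standing in for whichever ML learners of g0 and m0 one prefers; it is not the exact implementation behind the results that follow:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dml_plm(Y, D, X, n_folds=2, seed=0):
    """Cross-fitted DML estimate of theta0 in the partially linear model:
    nuisances are fit on the auxiliary folds, theta on the held-out fold."""
    perm = np.random.default_rng(seed).permutation(len(Y))
    folds = np.array_split(perm, n_folds)
    num = den = 0.0
    for k, main in enumerate(folds):
        aux = np.concatenate([f for j, f in enumerate(folds) if j != k])
        g_hat = RandomForestRegressor(random_state=0).fit(X[aux], Y[aux])
        m_hat = RandomForestRegressor(random_state=0).fit(X[aux], D[aux])
        V_hat = D[main] - m_hat.predict(X[main])     # residualized treatment
        num += V_hat @ (Y[main] - g_hat.predict(X[main]))
        den += V_hat @ D[main]
    return num / den

# illustrative data generated from the model of equations (1)-(2)
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 5))
D = 1 / (1 + np.exp(-X[:, 0])) + rng.normal(size=n)
Y = 0.5 * D + np.sin(X[:, 0]) + rng.normal(size=n)
theta_hat = dml_plm(Y, D, X)
```

Repeating the procedure over many random splits and aggregating, as in the tables that follow, uses the same building block.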
Results: ATE
Revisited Papers
- ATE:
  - Djankov et al., AEJ: Macro (2010), "The Effect of Corporate Taxes on Investment and Entrepreneurship"
  - Alesina et al., QJE (2013), "On the Origins of Gender Roles: Women and the Plough"
  - Nunn and Trefler, AEJ: Macro (2010), "The Structure of Tariffs and Long-Term Growth"
- HTE:
  - DellaVigna and Kaplan, QJE (2007), "The Fox News Effect: Media Bias and Voting"
  - Loyalka et al., AEJ: Applied (2019), "Does Teacher Training Actually Work? Evidence from a Large-Scale Randomized Evaluation of a National Teacher Training Program"
y_c = α + β taxes_c + X_c Γ + ε_c

- Revisit the "kitchen sink" regression, with all controls: other taxes (other taxes payable, VAT, sales tax, and the highest personal income tax rate); log number of tax payments made, index of tax evasion, and number of procedures to start a business; institutional variables (indices for property rights, rigidity of employment laws, openness to trade, log GDP per capita); inflation (average inflation over the past ten years, and seigniorage)
[Table: Reanalysis of Table 5D from Djankov et al. (2010); 61 observations and 12 raw covariates per specification. DML results are based on 100 splits; point estimates are the median over splits, with median standard errors across splits in parentheses. *, ** and *** indicate significance at the 10%, 5% and 1% levels.]
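The "median over splits" aggregation in the table notes can be written down explicitly. The adjustment below follows the median method of Chernozhukov et al. (2018); the input values are hypothetical:

```python
import numpy as np

def median_aggregate(thetas, ses):
    """Combine DML estimates from S sample splits: the point estimate is
    the median across splits, and the standard error is adjusted upward
    for the variability induced by the random splitting."""
    thetas = np.asarray(thetas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    theta_med = np.median(thetas)
    se_med = np.median(np.sqrt(ses ** 2 + (thetas - theta_med) ** 2))
    return theta_med, se_med

# hypothetical estimates and standard errors from three splits
theta, se = median_aggregate([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
```

The adjusted standard error is never smaller than the median per-split standard error, reflecting the extra uncertainty from the split choice.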
[Table: Reanalysis of Table 5D from Djankov et al. (2010) on subsamples of 60 and 50 observations, with 12 raw covariates. DML results are based on 100 splits; point estimates are the median over splits, with median standard errors across splits in parentheses. *, ** and *** indicate significance at the 10%, 5% and 1% levels.]
[Table: Analysis of Table 5D of Djankov et al. (2010) using the causal random forest; standard errors in parentheses; the number of covariates does not include the treatment variable. ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively.]
Traditional plough use   -10.434***  -9.287***   -11.948***  -11.243***  -11.271***  -11.501***  -11.388***  -12.401***
                         (3.195)     (2.817)     (3.129)     (3.181)     (3.227)     (3.182)     (3.180)     (2.964)
No. observations         165         165         165         165         165         165         165         165

Traditional plough use   -13.168***  -12.316***  -13.114***  -12.769**   -14.556***  -13.287**   -13.540**   -15.241***
                         (3.954)     (3.772)     (3.945)     (4.488)     (3.828)     (4.128)     (4.301)     (4.060)
No. observations         123         123         123         123         123         123         123         123

Traditional plough use   -5.196**    -3.617*     -4.290*     -5.410**    -4.480*     -4.817**    -5.029**    -4.821**
                         (1.946)     (1.707)     (1.745)     (1.777)     (1.853)     (1.804)     (1.825)     (1.782)
No. observations         144         144         144         144         144         144         144         144
Raw covariates           7           7           7           7           7           7           7           7
Notes: Reanalysis of Table 4 from Alesina et al. (2013), columns 1, 3, 5. DML results are based on 100 splits. Median standard errors across splits are reported in parentheses. The number of covariates reported does not include the treatment variable. * significant at 10 percent; ** significant at 5 percent; *** significant at 1 percent.
                         Lasso       Reg. Tree   Boosting    Forest      Neural Net. Ensemble    Best        2SLS
Panel A: DML, partially linear model. Outcome: female labour force participation
Traditional plough use   -5.922      -6.744      -6.445      -5.812      -5.682      -5.925      -5.224      -9.234**
                         (4.754)     (4.086)     (4.194)     (4.236)     (4.664)     (4.272)     (4.319)     (4.301)
No. observations         142         142         142         142         142         142         142         142
Raw covariates           36          36          36          36          36          36          36

Traditional plough use   -38.345*    -36.850**   -39.429**   -36.961**   -20.725     -33.645*    -38.712*    -28.516***
                         (14.966)    (13.161)    (13.574)    (13.063)    (18.414)    (16.626)    (19.116)    (7.559)
No. observations         160         160         160         160         160         160         160         160
Raw covariates           17          17          17          17          17          17          17
Notes: Reanalysis of the main robustness checks of Alesina et al. (2013). DML results are based on 100 splits. Median standard errors across splits are reported in parentheses. The number of covariates reported does not include the treatment variable. * significant at 10 percent; ** significant at 5 percent; *** significant at 1 percent.
Skill tariff correlation             0.018*    0.016     0.016     0.015     0.014     0.017     0.016     0.035***
                                     (0.010)   (0.011)   (0.012)   (0.011)   (0.013)   (0.012)   (0.011)   (0.010)
Tariff differential (low cut-off)    0.009*    0.006     0.007     0.008     0.013     0.009     0.008     0.016***
                                     (0.005)   (0.005)   (0.005)   (0.005)   (0.008)   (0.006)   (0.005)   (0.005)
Tariff differential (high cut-off)   0.011*    0.008     0.009     0.009     0.005     0.009     0.009     0.020***
                                     (0.005)   (0.005)   (0.006)   (0.006)   (0.008)   (0.006)   (0.006)   (0.004)
Observations                         63        63        63        63        63        63        63        63
Raw covariates                       17        17        17        17        17        17        17        17
Notes: Standard errors are reported in parentheses. Standard errors adjusted for variability across splits using the median method are reported for the DML estimates. The number of covariates does not include the treatment variable. ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively.
[Table: Analysis of Table 4 (columns 1, 2, 4) of Nunn and Trefler (2010) using the causal random forest; standard errors in parentheses; the number of covariates does not include the treatment variable. ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively.]
Methodology: HTE
Results: HTE
                                 (1)                  (2)
                                 District dummies     Cluster-robust
Fox News effect (ATE)            0.0065***            0.0065**
                                 (0.0016)             (0.0027)
Fox News effect above median     0.013***             0.0072**
                                 (0.0024)             (0.0028)
Fox News effect below median     -0.0033              0.0044
                                 (0.0021)             (0.0048)
95% CI for the difference        (0.01009, 0.02255)   (-0.00806, 0.01374)
Observations                     9256                 9256
Notes: This table reports the estimated average treatment effect and a test for overall heterogeneity using the causal forest. Standard errors are reported in parentheses. ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively.
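The above/below-median comparison in the table can be illustrated with a simple difference-in-means on simulated randomized data. This is a stylized stand-in for the causal-forest estimates, with entirely made-up data:

```python
import numpy as np

def subgroup_effects(y, d, z):
    """Difference-in-means treatment effect above and below the median of a
    moderator z (valid here because the treatment d is randomized)."""
    hi = z > np.median(z)
    def dim(mask):
        return y[mask & (d == 1)].mean() - y[mask & (d == 0)].mean()
    return dim(hi), dim(~hi)

# made-up RCT with a heterogeneous effect: tau = 0.5 above the median of z
rng = np.random.default_rng(0)
n = 4000
z = rng.normal(size=n)
d = rng.integers(0, 2, size=n)
tau = np.where(z > 0, 0.5, 0.0)
y = tau * d + rng.normal(size=n)
above, below = subgroup_effects(y, d, z)
```

The causal forest delivers this kind of comparison without pre-specifying the moderator or the cut-off, but the estimand for a fixed median split is the same.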
[Table: Estimates obtained using a neural network to produce the proxy predictor S(Z); the values reported are medians over 100 splits.]
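The role of the proxy predictor S(Z) can be made concrete through the best linear predictor (BLP) regression of the generic-ML approach. The sketch below is a simplification with simulated data: it assumes a randomized treatment with known propensity p and a proxy s already fitted on a held-out split:

```python
import numpy as np

def blp(y, d, s, p=0.5):
    """Weighted OLS of y on [1, s, d - p, (d - p)(s - mean(s))]: the
    coefficient on (d - p) estimates the ATE, and the coefficient on the
    interaction the loading of the true CATE on the proxy s."""
    w = 1.0 / np.sqrt(p * (1 - p))   # propensity weight (constant here)
    Z = np.column_stack([np.ones_like(y), s, d - p, (d - p) * (s - s.mean())])
    beta, *_ = np.linalg.lstsq(w * Z, w * y, rcond=None)
    return beta[2], beta[3]

# simulated RCT where the proxy equals the true CATE: tau(z) = 1 + z
rng = np.random.default_rng(0)
n = 4000
z = rng.normal(size=n)
d = rng.integers(0, 2, size=n).astype(float)
y = (1 + z) * d + z + rng.normal(size=n)   # baseline in z plus heterogeneous effect
ate_hat, hetero_hat = blp(y, d, z)
```

With a perfect proxy, the interaction coefficient is close to one; a coefficient near zero would indicate that S(Z) captures no systematic heterogeneity.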
[Table: Estimates obtained using a neural network to produce the proxy predictor S(Z); 90% confidence intervals in parentheses. The variables student math score at baseline and student baseline math anxiety are normalized; the values reported are medians over 100 splits.]
Simulation Study
Appendix
Skill tariff correlation             0.026     0.019     0.188*    0.080     0.035     0.153     0.146     0.064***
                                     (0.053)   (0.072)   (0.103)   (0.124)   (0.045)   (0.115)   (0.128)   (0.020)
Tariff differential (low cut-off)    0.011     0.013     0.078*    0.044     0.022     0.058     0.058     0.032***
                                     (0.034)   (0.028)   (0.042)   (0.071)   (0.022)   (0.072)   (0.078)   (0.010)
Tariff differential (high cut-off)   0.017     0.011     0.055     0.050     0.018     0.063     0.058     0.040***
                                     (0.036)   (0.031)   (0.035)   (0.064)   (0.030)   (0.058)   (0.065)   (0.009)
Notes: Standard errors are reported in parentheses. Standard errors adjusted for variability across splits using the median method are reported for the DML estimates. The number of covariates does not include the treatment variable. ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively.
Skill tariff correlation             0.046     0.020     0.164*    0.086     0.035     0.142     0.103     0.066***
                                     (0.045)   (0.063)   (0.091)   (0.109)   (0.051)   (0.105)   (0.111)   (0.019)
Tariff differential (low cut-off)    0.026     0.015     0.069*    0.048     0.022     0.073     0.055     0.033***
                                     (0.024)   (0.033)   (0.037)   (0.061)   (0.026)   (0.059)   (0.059)   (0.010)
Tariff differential (high cut-off)   0.023     0.013     0.068*    0.044     0.019     0.063     0.048     0.039***
                                     (0.021)   (0.029)   (0.040)   (0.059)   (0.021)   (0.051)   (0.056)   (0.009)
Notes: Standard errors are reported in parentheses. Standard errors adjusted for variability across splits using the median method are reported for the DML estimates. The number of covariates does not include the treatment variable. ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively.
[Table: Tests for the average treatment effect (mean forest prediction) and for heterogeneity (differential forest prediction), based on the approach of Chernozhukov et al. (2018), using the causal forest; standard errors in parentheses. * significant at 10 percent; ** significant at 5 percent; *** significant at 1 percent.]
[Table: Estimated average treatment effect and a test for overall heterogeneity using the causal forest; standard errors in parentheses. ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively.]
[Table: Effect of Fox News on the Republican vote share for towns with values below (column 1) and above (column 2) the median of each variable; column 3 reports the p-value for the null of no difference between the estimates in columns 1 and 2. Standard errors in parentheses.]
[Table: Comparison of the performance of the three ML methods used to produce the proxy predictors; the performance measures Best BLP and Best GATES are computed as medians over 100 splits.]
Parameter                                      Value
Minimum number of treated and control units    5
Number of covariates considered for a split    5
Number of trees                                1000
Notes: This table shows the values of the tuning parameters used in the generic machine learning method for the random forest.