Causal Inference in Python

The document discusses the framework of Rubin's potential outcome model and introduces the Causalinference Python package for causal analysis, highlighting its features such as propensity score estimation and treatment effect estimation. It provides a step-by-step guide on using the package, including data simulation, statistical summaries, and methods for improving covariate balance. The document emphasizes the importance of understanding treatment effects and the role of propensity scores in causal analysis.


1 Setting and Notation

As is standard in the literature, we work within the framework of Rubin’s potential outcome
model (Rubin, 1974). Let Y (0) denote the potential outcome of a subject in the absence of
treatment, and let Y (1) denote the unit’s potential outcome when it is treated. Let D denote
treatment status, with D = 1 indicating treatment and D = 0 indicating control, and let X be a K-
column vector of covariates or individual characteristics. For unit i, i = 1, 2, . . . , N, the observed
outcome can be written as Yi = (1 − Di)Yi(0) + Di Yi(1). The set of observables (Yi, Di, Xi),
i = 1, 2, . . . , N, forms the basic input data set for Causalinference. Causalinference is


appropriate for settings in which treatment can be said to be strongly ignorable, as defined in
Rosenbaum and Rubin (1983). That is, for all x in the support of X, we have

(i) Unconfoundedness: D is independent of (Y(0), Y(1)) conditional on X = x,


(ii) Overlap: c < P(D = 1|X = x) < 1 − c, for some c > 0.
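The observation rule above can be illustrated with a small stdlib simulation. This is a toy sketch (not part of the package); the constant treatment effect of 10 mirrors the simulated data described below, and here D is assigned at random so both potential outcomes have the same distribution in each group:

```python
import random

random.seed(0)

# Simulate potential outcomes under a constant treatment effect of 10,
# then apply the observation rule Y_i = (1 - D_i) Y_i(0) + D_i Y_i(1).
N = 1000
Y0 = [random.gauss(0, 1) for _ in range(N)]   # potential outcome without treatment
Y1 = [y + 10 for y in Y0]                     # potential outcome with treatment
D = [random.randint(0, 1) for _ in range(N)]  # treatment status
Y = [(1 - d) * y0 + d * y1 for d, y0, y1 in zip(D, Y0, Y1)]

# Only one potential outcome is ever observed per unit:
assert all(y == (y1 if d else y0) for y, d, y0, y1 in zip(Y, D, Y0, Y1))
```

Because treatment here is assigned independently of the potential outcomes, the raw difference in group means recovers the effect of 10; the simulated data used in the rest of this document deliberately breaks that simplicity.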

In the following, we illustrate the typical flow of a causal analysis using the tools of
Causalinference and a simulated data set. In simulating the data, we specified a constant
treatment effect of 10 for simplicity, and incorporated systematic overlap issues and
nonlinearities to highlight a number of tools in the package. We focus mostly on illustrating the
use of Causalinference; for details on methodology please refer to Imbens and Rubin (2015).

2 Causalinference
Causalinference is a Python package that provides various statistical methods for causal
analysis. It is a simple package, well suited for learning the basics of causal analysis. Its main
features include:

Propensity score estimation and subclassification


Improvement of covariate balance through trimming
Estimation of treatment effects
Assessment of overlap in covariate distributions
A longer explanation of each of these terms can be found on the package's web page.

Let’s try out the Causalinference package. For starters, we need to install the package.

In [1]: pip install causalinference

Defaulting to user installation because normal site-packages is not writeable


Collecting causalinference
Downloading CausalInference-0.1.3-py3-none-any.whl (51 kB)
-------------------------------------- 51.1/51.1 kB 869.9 kB/s eta 0:00:00
Installing collected packages: causalinference
Successfully installed causalinference-0.1.3
Note: you may need to restart the kernel to use updated packages.
Once the CausalModel instance causal has been created (as we do below), its available
methods can be listed:

In [5]: print(dir(causal))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'_post_pscore_init', 'blocks', 'cutoff', 'est_propensity', 'est_propensity_s',
'est_via_blocking', 'est_via_matching', 'est_via_ols', 'est_via_weighting', 'estimates',
'old_data', 'propensity', 'raw_data', 'reset', 'strata', 'stratify', 'stratify_s',
'summary_stats', 'trim', 'trim_s']

After the installation finishes, we can build a causal model for the analysis. We will use
random data generated by the causalinference package itself.

In [66]: import numpy as np

np.random.seed(7121)  # random_data draws from NumPy's global RNG, so seed it here
from causalinference import CausalModel
from causalinference.utils import random_data
# Y is the outcome, D is treatment status, and X is the covariate matrix
Y, D, X = random_data()
causal = CausalModel(Y, D, X)

The CausalModel class analyzes the data. A few more steps are needed to extract the
important information from the model.

First, let’s get the statistical summary.

In [67]: print(causal.summary_stats)

Summary Statistics

                       Controls (N_c=2503)        Treated (N_t=2497)
       Variable         Mean       S.d.            Mean       S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y       -0.926      1.774           4.986      3.014        5.912

                       Controls (N_c=2503)        Treated (N_t=2497)
       Variable         Mean       S.d.            Mean       S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0       -0.337      0.935           0.330      0.949        0.708
             X1       -0.320      0.974           0.313      0.935        0.664
             X2       -0.327      0.936           0.356      0.946        0.726

In [68]: causal.summary_stats.keys()

Out[68]: dict_keys(['N', 'K', 'N_c', 'N_t', 'Y_c_mean', 'Y_t_mean', 'Y_c_sd', 'Y_t_sd',
'rdiff', 'X_c_mean', 'X_t_mean', 'X_c_sd', 'X_t_sd', 'ndiff'])

In [69]: causal.summary_stats['X_t_mean']

Out[69]: array([0.32965893, 0.3130337 , 0.35620761])

In [70]: causal.summary_stats['ndiff']

Out[70]: array([0.70765718, 0.66358536, 0.7261009 ])

In [71]: causal.summary_stats['Y_t_mean']

Out[71]: 4.986076842982941

Here rdiff refers to the difference in average observed outcomes between treatment and
control groups. ndiff, on the other hand, refers to the normalized differences in average
covariates, defined as

    (x̄_{k,t} − x̄_{k,c}) / sqrt((s²_{k,t} + s²_{k,c}) / 2)

where x̄_{k,t} and s_{k,t} are the sample mean and sample standard deviation of the kth
covariate of the treatment group, and x̄_{k,c} and s_{k,c} are the analogous statistics for
the control group.

The normalized differences in average covariates provide a way to measure the covariate
balance between the treatment and the control groups. Unlike the t-statistic, its absolute
magnitude does not increase (in expectation) as the sample size increases.
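As a sanity check, the normalized difference can be computed by hand. The following is a toy stdlib sketch (not the package's routine), using the definition above:

```python
from math import sqrt
from statistics import mean, stdev

def normalized_diff(x_treated, x_control):
    """Normalized difference in average covariates:
    (mean_t - mean_c) / sqrt((s2_t + s2_c) / 2)."""
    s2_t = stdev(x_treated) ** 2
    s2_c = stdev(x_control) ** 2
    return (mean(x_treated) - mean(x_control)) / sqrt((s2_t + s2_c) / 2)

# Identical samples give 0; shifting one group by a full standard
# deviation gives a normalized difference of (approximately) 1.
x_c = [0.0, 1.0, 2.0, 3.0, 4.0]
x_t = [x + stdev(x_c) for x in x_c]
print(normalized_diff(x_t, x_c))  # approximately 1.0
```

Unlike a t-statistic, this quantity stays put as N grows, which is what makes it a useful balance diagnostic.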

The summary_stats attribute thus gives us all the basic information about the dataset.

The main goal of a causal analysis is to estimate the treatment effect. The simplest way to
do so is with the ordinary least squares (OLS) method.

3 Least Squares Estimation


One of the simplest treatment effect estimators is the ordinary least squares (OLS) estimator.
Causalinference provides several common regression specifications. By default, the method
est_via_ols will run the following regression:

    Yi = α + βDi + γ′(Xi − X̄) + δ′Di(Xi − X̄) + ϵi

To inspect any treatment effect estimates produced, we can simply invoke print on the attribute
estimates, as in below:

In [72]: causal.est_via_ols()
print(causal.estimates)
Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.934      0.034     85.182      0.000      2.866      3.001
           ATC      1.958      0.040     49.551      0.000      1.881      2.035
           ATT      3.911      0.040     97.372      0.000      3.833      3.990

C:\Users\moham\AppData\Roaming\Python\Python310\site-packages\causalinference\estimat
ors\ols.py:21: FutureWarning: `rcond` parameter will change to the default of machine
precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to
keep using the old, explicitly pass `rcond=-1`.
olscoef = np.linalg.lstsq(Z, Y)[0]

ATE, ATC, and ATT stand for Average Treatment Effect, Average Treatment Effect for Control
and Average Treatment Effect for Treated, respectively. Using this information, we could assess
whether the treatment has an effect compared to the control.

Including interaction terms between the treatment indicator D and covariates X implies that
treatment effects can differ across individuals. In some instances we may want to assume a
constant treatment effect, and only run

    Yi = α + βDi + γ′(Xi − X̄) + ϵi

This can be achieved by supplying a value of 1 to the optional parameter adj of est_via_ols
(its default value is 2). To compute the raw difference in average outcomes between treatment
and control groups, we can set adj=0. In this example, the least squares estimates are radically
different from the true treatment effect of 10. This is the result of the nonlinearity and non-
overlap issues intentionally introduced into the data simulation process. As we shall see, several
other tools exist in Causalinference that can better deal with a lack of overlap and that will
allow us to obtain estimates that are less sensitive to functional form assumptions.
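For reference, the adj=0 quantity is just the raw difference in mean observed outcomes, which can be sketched in plain Python. This is a hypothetical helper on toy data, not the package's implementation:

```python
from statistics import mean

def raw_diff(Y, D):
    """Raw difference in average observed outcomes between the
    treated and control groups (what adj=0 reports)."""
    treated = [y for y, d in zip(Y, D) if d == 1]
    controls = [y for y, d in zip(Y, D) if d == 0]
    return mean(treated) - mean(controls)

Y = [5.0, 6.0, 1.0, 2.0]
D = [1, 1, 0, 0]
print(raw_diff(Y, D))  # 4.0
```

Under confounding this quantity mixes the treatment effect with pre-existing differences between the groups, which is exactly why the covariate-adjusted specifications above exist.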

4 Propensity Score Estimation


The probability of getting treatment conditional on the covariates, p(Xi) = P(Di = 1|Xi), also
known as the propensity score, plays a central role in much of what follows. Two methods,
est_propensity and est_propensity_s, are provided for propensity score estimation. Both involve
running a logistic regression of the treatment indicator D on functions of the covariates.
est_propensity allows the user to specify the covariates to include linearly and/or
quadratically, while est_propensity_s will make this choice automatically based on a
sequence of likelihood ratio tests. In the following, we run est_propensity_s and display the
estimation results. In this example, the specification selection algorithm decided to include both
covariates and all the interaction and quadratic terms.

Using the estimated propensity score, we can read off each unit's probability of receiving
treatment conditional on the covariates.
In [74]: causal.est_propensity_s()
print(causal.propensity)

Estimated Parameters of Propensity Score

Coef. S.e. z P>|z| [95% Conf. int.]


--------------------------------------------------------------------------------
Intercept 0.044 0.042 1.057 0.291 -0.038 0.127
X2 1.017 0.042 24.311 0.000 0.935 1.099
X0 1.032 0.042 24.712 0.000 0.950 1.114
X1 0.986 0.041 23.890 0.000 0.905 1.067
X1*X1 -0.061 0.028 -2.153 0.031 -0.116 -0.005

There are many more methods you could explore and learn from. I suggest visiting the
causalinference web page to learn further.

The propensity attribute is again another dictionary-like container of results. The dictionary keys
of propensity can be found by running:

In [75]: causal.propensity.keys()

Out[75]: dict_keys(['lin', 'qua', 'coef', 'loglike', 'fitted', 'se'])

In [76]: causal.propensity['lin']

Out[76]: [2, 0, 1]

In [77]: causal.propensity['qua']

Out[77]: [(1, 1)]

In [78]: causal.propensity['coef']

Out[78]: array([ 0.04446577,  1.0166564 ,  1.03201622,  0.98577117, -0.06090652])

5 Improving Covariate Balance


When there is indication of covariate imbalance, we may wish to construct a sample where the
treatment and control groups are more similar than the original full sample. One way of doing
so is by dropping units with extreme values of propensity score. For these subjects, their
covariate values are such that the probability of being in the treatment (or control) group is so
overwhelmingly high that we cannot reliably find comparable units in the opposite group. We
may wish to forego estimating treatment effects for such units since nothing much can be
credibly said about them.
A good rule of thumb is to drop units whose estimated propensity score is less than α = 0.1 or
greater than 1 − α = 0.9. By default, once the propensity score has been estimated by running
either est_propensity or est_propensity_s, a value of 0.1 will be set for the attribute cutoff:

In [79]: causal.cutoff

Out[79]: 0.1

Calling causal.trim() at this point will drop every unit whose propensity score lies outside the
[α, 1 − α] interval. Alternatively, a procedure exists that will estimate the optimal cutoff that
minimizes the asymptotic sampling variance of the trimmed sample. The method trim_s will
perform this calculation, set the cutoff to the optimal α, and then invoke trim to construct the
subsample. For our example, the optimal α was estimated to be slightly more than 0.1:

In [80]: causal.trim_s()

In [81]: causal.cutoff

Out[81]: 0.10095500234207272

The complexity of this cutoff selection algorithm is only O(N log N), so in practice there is very
little reason to not employ it.
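The trimming rule itself is easy to state. The following is a hypothetical stdlib helper (not the package's trim implementation) that keeps only the units whose estimated propensity score falls inside [α, 1 − α]:

```python
def trim(pscores, alpha=0.1):
    """Return the indices of units kept after dropping those whose
    estimated propensity score lies outside [alpha, 1 - alpha]."""
    return [i for i, p in enumerate(pscores) if alpha <= p <= 1 - alpha]

pscores = [0.02, 0.15, 0.50, 0.85, 0.97]
print(trim(pscores))  # [1, 2, 3]
```

The units at 0.02 and 0.97 are exactly the ones for which no comparable units exist in the opposite group, so dropping them trades a narrower target population for more credible estimates.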

6 Stratifying the Sample


With the propensity score estimated, one may wish to stratify the sample into blocks that have
units that are more similar in terms of their covariates. This makes the treatment and control
groups within each propensity bin more comparable, and therefore treatment effect estimates
more credible. Causalinference provides two methods for subclassification based on
propensity score. The first, stratify, splits the sample based on what is specified in the attribute
blocks. The default value of blocks is set to 5, which means that stratify will split the sample into
5 equal-sized bins. In contrast, the second method, stratify_s, will use a data-driven procedure
for selecting both the number of blocks and their boundaries, with the expectation that the
number of blocks should increase with the sample size. Operationally this method is a divide-
and-conquer algorithm that recursively divides the sample into two until there is no significant
advantage of doing so. This algorithm also runs in O(N log N) time, so costs relatively little to
use. To inspect the results of the stratification, we can invoke print on the attribute strata to
display some summary statistics, as follows:

In [82]: causal.stratify_s()

In [83]: print(causal.strata)

Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.101     0.152       218        27     0.126     0.127     1.151
         2     0.152     0.197       201        42     0.175     0.178     1.430
         3     0.197     0.297       364       123     0.247     0.246     1.992
         4     0.297     0.392       311       176     0.342     0.346     2.417
         5     0.393     0.506       267       220     0.446     0.450     2.660
         6     0.506     0.548       121       123     0.527     0.526     3.052
         7     0.548     0.602        94       149     0.574     0.576     3.136
         8     0.602     0.653       106       138     0.627     0.629     3.445
         9     0.653     0.706        76       167     0.681     0.679     3.746
        10     0.706     0.802       110       377     0.753     0.755     4.001
        11     0.802     0.850        44       200     0.824     0.826     4.489
        12     0.851     0.899        34       209     0.871     0.876     4.672

Under the hood, the attribute strata is actually a list-like object that contains, as each of its
elements, a full instance of the class CausalModel, with the input data being those that
correspond to the units that are in the propensity bin. We can thus, for example, access each
stratum and inspect its summary_stats attribute, or as the following illustrates, loop through
strata and estimate within-bin treatment effects using least squares.

In [97]: for stratum in causal.strata:
             stratum.est_via_ols(adj=2)

In [100]: [stratum.estimates['ols']['att'] for stratum in causal.strata]

Out[100]: [1.1010525059277556,
 1.3936541440372463,
 2.004643746290025,
 2.4002774533086817,
 2.662716013620451,
 3.0475818205042002,
 3.122994358139151,
 3.424260986897368,
 3.7506272041228654,
 3.9854869677920286,
 4.443325713714909,
 4.67815605365512]

In [102]: stratum.estimates['ols']['att']

Out[102]: 4.67815605365512

Note that these estimates are much more stable and closer to the true value of 10 than the
within-bin raw differences in average outcomes that were reported in the stratification summary
table, highlighting the virtue of further controlling for covariates even within blocks. Taking the
sample-weighted average of the above within-bin least squares estimates results in a
propensity score matching estimator that is commonly known as the subclassification estimator
or blocking estimator. However, instead of manually looping through the strata attribute,
estimating within-bin treatment effects, and then averaging appropriately to arrive at an overall
estimate, we can simply call est_via_blocking, which will perform these operations and
collect the results in the attribute estimates. We will report these estimates in the next section
along with estimates obtained from other, alternative estimators.
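The averaging step that est_via_blocking automates can be sketched as a sample-size-weighted mean of the within-bin estimates. This is a toy helper with made-up bin sizes and effects, not the package's code:

```python
def blocking_estimate(bin_sizes, bin_effects):
    """Sample-weighted average of within-bin treatment effect
    estimates: the subclassification (blocking) estimator."""
    total = sum(bin_sizes)
    return sum(n * e for n, e in zip(bin_sizes, bin_effects)) / total

sizes = [100, 300, 100]     # number of units in each propensity bin
effects = [2.0, 3.0, 4.0]   # within-bin treatment effect estimates
print(blocking_estimate(sizes, effects))  # 3.0
```

Weighting by bin size means that an overall ATE-style estimate gives more say to the bins that contain more of the sample; weighting by treated counts instead would target an ATT-style quantity.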

7 Treatment Effect Estimation


In addition to least squares and the blocking estimator described in the last section,
Causalinference provides two alternative treatment effect estimators. The first is the nearest-
neighbor matching estimator of Abadie and Imbens (2006). Instead of relying on the
propensity score, this estimator pairs treatment and control units by matching directly on the
covariate vectors themselves. More specifically, each unit i in the sample is matched with a unit
m(i) in the opposite group, where

    m(i) = argmin_{j : Dj ≠ Di} ‖Xi − Xj‖

where ‖Xi − Xj‖ is some measure of distance between the covariate vectors Xi and Xj. The
method est_via_matching implements this estimator, as well as several extensions that can be
invoked through optional arguments.
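The core matching rule can be sketched in the stdlib. This toy version uses plain Euclidean distance as the measure ‖Xi − Xj‖ (one possible choice; the package supports other distance metrics), pairing each unit with its nearest neighbor in the opposite treatment group:

```python
from math import dist

def match(X, D):
    """For each unit i, return the index m(i) of the nearest unit
    in the opposite treatment group (Euclidean distance)."""
    matches = []
    for i in range(len(X)):
        candidates = [j for j in range(len(X)) if D[j] != D[i]]
        matches.append(min(candidates, key=lambda j: dist(X[i], X[j])))
    return matches

X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
D = [1, 0, 1, 0]
print(match(X, D))  # [1, 0, 3, 2]
```

The matched pair (i, m(i)) then stands in for the unobserved counterfactual: the outcome of m(i) is used as an imputation of what unit i would have experienced under the opposite treatment status.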

The last estimator is a version of the Horvitz-Thompson weighting estimator, modified to
further adjust for covariates. Mechanically, this involves running the following weighted least
squares regression:


    Yi = α + βDi + γ′Xi + ϵi

where the weight for unit i is 1/p̂(Xi) if i is in the treatment group, and 1/(1 − p̂(Xi)) if i is in
the control group. This estimator is also sometimes called the doubly robust estimator,
referring to the fact that it is consistent if either the specification of the propensity score
or the specification of the regression function is correct. We can invoke it by calling
est_via_weighting. Note that under this specification the treatment effect does not differ
across units, so the ATC and the ATT are both equal to the overall ATE.
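The weighting logic can be sketched as a plain Horvitz-Thompson style estimate. This is a simplified toy version without the covariate adjustment that the package's est_via_weighting adds:

```python
def ipw_ate(Y, D, pscores):
    """Inverse-propensity-weighted ATE: treated units get weight
    1 / p_hat(X_i), control units 1 / (1 - p_hat(X_i))."""
    n = len(Y)
    treated = sum(d * y / p for y, d, p in zip(Y, D, pscores)) / n
    control = sum((1 - d) * y / (1 - p) for y, d, p in zip(Y, D, pscores)) / n
    return treated - control

Y = [3.0, 1.0, 4.0, 2.0]
D = [1, 0, 1, 0]
p = [0.5, 0.5, 0.5, 0.5]
print(ipw_ate(Y, D, p))  # 2.0
```

Note the weights blow up as p̂(Xi) approaches 0 or 1, which is another way of seeing why trimming units with extreme propensity scores matters before weighting.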

In the following we invoke each of the four estimators (including least squares, since the input
data has changed now that the sample has been trimmed), and print out the resulting
estimates.

In [103]: causal.est_via_ols()

In [104]: causal.est_via_weighting()
C:\Users\moham\AppData\Roaming\Python\Python310\site-packages\causalinference\estimat
ors\weighting.py:23: FutureWarning: `rcond` parameter will change to the default of m
achine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to
keep using the old, explicitly pass `rcond=-1`.
wlscoef = np.linalg.lstsq(Z_w, Y_w)[0]

In [105]: causal.est_via_blocking()

In [106]: causal.est_via_matching(bias_adj=True)

C:\Users\moham\AppData\Roaming\Python\Python310\site-packages\causalinference\estimat
ors\matching.py:100: FutureWarning: `rcond` parameter will change to the default of m
achine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to
keep using the old, explicitly pass `rcond=-1`.
return np.linalg.lstsq(X, Y)[0][1:] # don't need intercept coef

In [107]: print(causal.estimates)

Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.938      0.036     82.237      0.000      2.868      3.008
           ATC      2.463      0.039     62.482      0.000      2.385      2.540
           ATT      3.412      0.039     86.655      0.000      3.334      3.489

Treatment Effect Estimates: Weighting

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.932      0.041     71.746      0.000      2.852      3.012

Treatment Effect Estimates: Blocking

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.932      0.037     79.554      0.000      2.860      3.004
           ATC      2.465      0.042     58.037      0.000      2.382      2.549
           ATT      3.398      0.040     84.079      0.000      3.319      3.477

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.911      0.069     42.171      0.000      2.775      3.046
           ATC      2.427      0.082     29.428      0.000      2.265      2.588
           ATT      3.394      0.081     42.106      0.000      3.236      3.552

As we can see above, despite the trimming, the least squares estimates are still severely biased,
as is the weighting estimator (since neither the propensity score nor the regression function is
correctly specified). The blocking and matching estimators, on the other hand, are less sensitive
to specification assumptions, and thus result in estimates that are closer to the true average
treatment effects.
References
Abadie, A., & Imbens, G. (2006). Large sample properties of matching estimators for
average treatment effects. Econometrica, 74, 235-267.
Crump, R., Hotz, V. J., Imbens, G., & Mitnik, O. (2009). Dealing with limited overlap in
estimation of average treatment effects. Biometrika, 96, 187-199.
Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical
sciences: An introduction. Cambridge University Press.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in
observational studies for causal effects. Biometrika, 70, 41-55.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and
nonrandomized studies. Journal of Educational Psychology, 66, 688-701.
