Causal ML: Python package for causal inference machine learning
SoftwareX 21 (2023) 101294
journal homepage: www.elsevier.com/locate/softx
https://doi.org/10.1016/j.softx.2022.101294

Article history: Received 18 September 2022; Received in revised form 4 December 2022; Accepted 8 December 2022.

Keywords: Causal ML; Causal inference; Machine learning

Abstract: "Causality" is a complex concept with roots in almost all subject areas, and it aims to answer the "why" question. Causal inference is one of the important branches of causal analysis; it assumes the existence of relationships between variables and attempts to examine and quantify the actual relationships in the available data. Machine learning (ML) and causal inference are two techniques that emerged and developed separately, but there is now an intersection between the two fields. Causal ML is a Python package that provides a set of uplift modeling and causal inference methods using machine learning algorithms based on recent research. It gives the user a standard interface for estimating conditional average treatment effects (CATE) or individual treatment effects (ITE) from experimental or observational data.

© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

∗ Corresponding author at: School of Economics and Management, Huainan Normal University, Dongshan West Road, Huainan, Anhui, China.
E-mail addresses: [email protected] (Yang Zhao), [email protected] (Qing Liu).
1. Motivation and significance

Nowadays, the study of causality covers a wide range of disciplines, such as economics, law, medicine, environmental science, computer science, and philosophy [1–3]. In almost all fields of research, there are numerous "why" questions, which are the path to the future of science, and causal analysis aims to explain the why. In the field of causal analysis, two main tasks can be distinguished: causal discovery and causal inference. Causal discovery is responsible for analyzing and creating models that account for the relationships inherent in the data, while the purpose of causal inference is to examine the possible effects of intervening in a particular system [4].

The Potential Outcome Model [5] and the Structural Causal Model (SCM) [6] are two popular frameworks in the field of causal inference. The two frameworks are essentially equivalent [7]: both infer causality from observed data, but they rest on different ideological principles. The Potential Outcome Model defines a causal effect as the difference between potential outcomes for the same subject [5] and links the intervention to the research object: the intervention is the cause, and the result of the intervention is the effect [8]. Compared with the Potential Outcome Model, the SCM is more precise in identifying confounders [9] and is becoming increasingly popular. The SCM uses graph theory, a mathematical tool, to formally express the causal assumptions behind the data. In contrast to the Potential Outcome Model, the SCM approach requires knowledge of the existing causal model but does not make strong assumptions about the model form.

Although correlations can be viewed as the result of underlying causal mechanisms [10], as a well-known saying in causal analysis puts it, correlation does not imply causation [11]. Additional assumptions about the underlying causal structure are often necessary to guide inference from observed data before causal inference can be made.

In [12], this process of establishing causality is divided into three stages: analysis of the variables involved in the research question; analysis of the causal map and causal mechanisms; and causal adequacy assumptions. In [13], a causal analysis approach based on causal maps was further elaborated with reference to the studies of others [14–17]. [13] argue that in causal analysis it is necessary first to identify the research questions, including the object of observation, the main participants, the main phenomena, and the scope of the examination. Then a causal model is specified and the research question is transformed into a causal problem. Finally, the variables' causal links must be investigated, recognized, and postulated. As shown in Fig. 1, the analysis process of [13] is consistent with [12]. The absence of an adequate causal analysis prior to causal inference will lead to fallacies in the results, which [18] describes as "causal model misspecification". [13], on the other hand, argue that such errors are due to incorrect causal models and should therefore be called "causal model neglect": i.e., a failure to use causal knowledge to carefully specify the target statistical parameters before performing estimation.

Most of the time, the work of causal inference starts from the premise that the causal variables are given [19]. The task of causal inference is founded on the concept of causal sufficiency, which is used to assess the causal influence of a specific variable (treatment) on an outcome of interest and thus causally explain the complete causal model [11,13]. There are many ways to estimate causal effects, e.g., matching, G-computation, inverse probability weighting, or augmented inverse probability weighting. The two fields of causal analysis and machine learning arose and developed separately. However, in recent years, the two fields have intersected. It has been argued that purely statistics-based AI is not truly intelligent because statistical models cannot generalize, whereas causal models allow distributional changes to be modeled through the concept of intervention [7,20]. Similarly, causal analysis can also benefit from machine learning. While traditional causal analysis is based on statistical models, current machine learning is also based on statistical models, and supervised learning provides a robust, principled, and well-structured way to carry out this kind of exploration, which can compensate for the limitations of traditional causal inference [13].

Causal ML is a Python package that offers a suite of uplift modeling and causal inference tools based on cutting-edge research [21]. Causal ML is mainly based on the Structural Causal Model to observe causal effects from the data; however, it also uses some Potential Outcome Model approaches, such as matching [22]. It provides a common interface through which users can estimate conditional average treatment effects (CATE) or individual treatment effects (ITE) using experimental or observational data.

2. Software description

2.1. Software architecture

2.1.1. Algorithms

There is a lot of exploratory research on the use of ML for causal inference [23,24], and there are some important practical applications in these studies. Causal ML aims to build a one-stop shop of machine learning for causal inference. This is an approach that can play an important role in business, science, and other fields. It can democratize uplift modeling approaches that currently exist only in academic papers or scattered statistical packages. The current version of Causal ML implements 15 state-of-the-art uplift modeling algorithms (see Fig. 2). With its good generality and ease of use, Causal ML is an excellent choice for causal inference.

For reasons of manuscript length and structure, we present only some of the algorithms in Fig. 2. The meta-learner algorithms build on a base algorithm to estimate the ATE [23,25]. Theoretically, meta-algorithms can use any base learner, such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks. Causal ML supports five main meta-algorithms: the S-learner, the T-learner, the X-learner, the R-learner, and the Doubly Robust (DR) learner. Each meta-algorithm has a different way of estimating the average outcome and defining the CATE. The S-learner and T-learner are used below to show the differences between them.

The S-learner uses a single machine learning model to estimate treatment effects, as follows:

Step 1. Use a machine learning model to estimate the average outcome µ(x) using the covariates X and a treatment indicator variable W:

µ(x) = E[Y | X = x, W = w].  (1)

Step 2. The estimated value of the CATE is defined as:

τ̂(x) = µ̂(x, W = 1) − µ̂(x, W = 0).  (2)
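As a minimal sketch of these two steps with Causal ML's own simulated data (LRSRegressor and BaseSRegressor are the package's S-learner implementations; exact return shapes may differ slightly across versions):

```python
from causalml.dataset import synthetic_data
from causalml.inference.meta import LRSRegressor, BaseSRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Simulated data: outcome y, covariates X, binary treatment flag, true CATE tau.
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=8, sigma=1.0)

# S-learner with a linear-regression base model: ATE with a confidence interval.
ate, ate_lb, ate_ub = LRSRegressor().estimate_ate(X, treatment, y)

# The same recipe with an arbitrary scikit-learn regressor as the single base
# learner; fit_predict returns the per-unit CATE tau_hat(x) of Eq. (2).
s_learner = BaseSRegressor(learner=GradientBoostingRegressor())
cate_s = s_learner.fit_predict(X=X, treatment=treatment, y=y)
print(ate, cate_s.shape)
```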
The T-learner is simple and easy to understand: it trains two base learners (the name "T" comes from the two base models), one for the control group and one for the treatment group. The estimation process consists of two steps.

Step 1. Estimate the average outcomes of the two models, µ0(x) and µ1(x):

µ0(x) = E[Y(0) | T = 0],  (3)

µ1(x) = E[Y(1) | T = 1].  (4)

Step 2. The estimate of the average treatment effect (ATE) is defined as:

τ̂(x) = µ̂1(x) − µ̂0(x).  (5)
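A corresponding T-learner sketch with the package's generic BaseTRegressor (XGBTRegressor in Table 1 is the XGBoost-specific variant); the two base models of Eqs. (3)–(4) are fit internally:

```python
from causalml.dataset import synthetic_data
from causalml.inference.meta import BaseTRegressor
from sklearn.ensemble import GradientBoostingRegressor

y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=8, sigma=1.0)

# Two copies of the base learner: mu_0 fit on control units, mu_1 on treated
# units; their difference gives the estimate of Eq. (5).
t_learner = BaseTRegressor(learner=GradientBoostingRegressor())
ate_t, ate_t_lb, ate_t_ub = t_learner.estimate_ate(X=X, treatment=treatment, y=y)
cate_t = t_learner.fit_predict(X=X, treatment=treatment, y=y)  # per-unit estimates
print(ate_t)
```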
The random forest algorithm, proposed by [26], is a widely used statistical learning algorithm. Statisticians usually study random forests as a practical method for nonparametric conditional mean estimation [27]. Causal ML also supports uplift-tree-based estimation of CATE. Uplift trees are a family of tree-based methods whose splitting criterion is based on the difference in uplift between the treatment and control groups. [28] proposed three different measures to quantify the gain in divergence due to a split [29]:

D_gain = D_after_split(P^T, P^C) − D_before_split(P^T, P^C),  (6)

where P^T and P^C stand for the probability distributions of the desired outcome in the treatment and control groups, respectively, and D denotes the divergence measure. Causal ML implements several different methods for quantifying divergence, such as Kullback–Leibler (KL) divergence, Euclidean Distance (ED), and Chi-squared divergence.

The Kullback–Leibler (KL) divergence is given by the following equation:

KL(P : Q) = Σ_{k=left,right} p_k log(p_k / q_k),  (7)

where p_k is the sample mean of the outcome in the treatment group, q_k is the sample mean in the control group, and k indicates the leaf in which p_k and q_k are computed (Liu & Shum, 2003).

The Euclidean Distance (ED) is given by the following equation:

ED(P : Q) = Σ_{k=left,right} (p_k − q_k)²,  (8)

where the symbols are the same as in the above equation.
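As a short sketch, an uplift tree grown with the KL criterion of Eq. (7) on the package's simulated classification data (the hyperparameter values below are illustrative, not recommendations):

```python
from causalml.dataset import make_uplift_classification
from causalml.inference.tree import UpliftTreeClassifier

# Simulated uplift data: feature columns listed in x_names, a
# 'treatment_group_key' column, and a binary 'conversion' outcome.
df, x_names = make_uplift_classification()

# Splits are chosen to maximize the divergence gain of Eq. (6);
# evaluationFunction selects the divergence measure ('KL', 'ED', 'Chi', ...).
uplift_tree = UpliftTreeClassifier(
    max_depth=4,
    min_samples_leaf=200,
    min_samples_treatment=50,
    evaluationFunction='KL',
    control_name='control',
)
uplift_tree.fit(
    df[x_names].values,
    treatment=df['treatment_group_key'].values,
    y=df['conversion'].values,
)
pred = uplift_tree.predict(df[x_names].values)  # per-group uplift predictions
```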
The general idea of matching methods is to find treatment and non-treatment units that are as similar as possible in terms of relevant characteristics. Thus, matching methods can be seen as part of a family of causal inference methods that attempt to mimic randomized controlled trials. Many scholars have explored causal inference pathways based on matching methods. [30] addressed the problem of matching methods by redefining the matching problem as a subset selection problem. [31] proposed a new approach to matching in observational studies; the essential ideas of this method are to balance the treatment groups with respect to a target population and to formulate the matching problem as a linear-sized mixed integer program.

Although there are several techniques to match treated and untreated units, the propensity score is the most widely used approach:

e_i(X_i) = P(W_i = 1 | X_i).  (9)

Following that, treated and untreated units are matched on e(X) using a distance criterion, such as k:1 nearest neighbors. Since control units are typically matched to the treated group, this method estimates the average treatment effect on the treated (ATT):

E[Y(1) | W = 1] − E[Y(0) | W = 1].  (10)

The advantages and disadvantages of different matching methods are discussed in [32].
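A hedged sketch of this propensity-score matching workflow with Causal ML, on a small synthetic frame (the column names are our own, and the matcher's exact options may differ between versions):

```python
import numpy as np
import pandas as pd
from causalml.propensity import ElasticNetPropensityModel
from causalml.match import NearestNeighborMatch

# Hypothetical observational data: covariates x1..x3, binary 'treatment', outcome 'y'.
rng = np.random.default_rng(0)
obs = pd.DataFrame(rng.normal(size=(1000, 3)), columns=['x1', 'x2', 'x3'])
obs['treatment'] = rng.binomial(1, 0.3, size=1000)
obs['y'] = obs['x1'] + 0.5 * obs['treatment'] + rng.normal(size=1000)

# Eq. (9): estimate the propensity score e(X), then 1:1 nearest-neighbor match on it.
pm = ElasticNetPropensityModel()
obs['ps'] = pm.fit_predict(obs[['x1', 'x2', 'x3']].values, obs['treatment'].values)

matcher = NearestNeighborMatch(replace=False, ratio=1, random_state=42)
matched = matcher.match(data=obs, treatment_col='treatment', score_cols=['ps'])

# Eq. (10): ATT as the mean outcome difference within the matched sample.
att = (matched.loc[matched['treatment'] == 1, 'y'].mean()
       - matched.loc[matched['treatment'] == 0, 'y'].mean())
print(att)
```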
The evolution of Causal ML has paralleled the continued maturation of causal inference theory and of machine learning for causal inference. Although Causal ML contains many mature causal inference algorithms, such as tree-based algorithms and meta-learning-based algorithms, it is not perfect, and many of the latest algorithms have not yet been implemented. The reasons for this are related both to Causal ML itself, such as the enthusiasm of the contributors and the scope of their knowledge, and to the maturity and industrialization of the algorithms. Causal ML does not usually adopt an algorithm first; instead, it waits until other platforms have shown that the algorithm is ready and reliable.

2.1.2. Package

Table 1 shows the main package structure of Causal ML; further details are available on the official website of Causal ML. The initial release (V0.2.0, 2019-08-12) mainly supported causal inference based on uplift trees and meta-learning algorithms. These algorithms were placed under the causalml.inference.tree and causalml.inference.meta packages, whose parent package is causalml.inference. In versions V0.7.0 (2020-02-28) and V0.8.0 (2020-07-17), Causal ML added the nn and iv subpackages of causalml.inference to support neural network-based and 2SLS-based inference algorithms, respectively. It is easy to see that causalml.inference was designed from the beginning mainly for organizing and archiving algorithms belonging to the Structural Causal Model (SCM) [6], while algorithms based on the Potential Outcome Model [5] were not placed under the causalml.inference package; for example, matching algorithms were placed under the causalml.match package.

Table 1
The main packages of Causal ML.

Package: causalml.inference.tree. Parent package: causalml.inference. Description: tree-based algorithms. Method examples: UpliftTreeClassifier, UpliftRandomForestClassifier.
Package: causalml.inference.meta. Parent package: causalml.inference. Description: meta-learning algorithms. Method examples: LRSRegressor, XGBTRegressor, MLPTRegressor.
Package: causalml.inference.iv. Parent package: causalml.inference. Description: algorithms based on the 2SLS method. Method example: IVRegressor.
Package: causalml.inference.nn. Parent package: causalml.inference. Description: neural network algorithms. Method example: CEVAE.
Package: causalml.optimize. Parent package: causalml. Description: optimization methods using counterfactual unit selection and a counterfactual value estimator. Method examples: CounterfactualValueEstimator, PolicyLearner, get_actual_value.
Package: causalml.match. Parent package: causalml. Description: matching methods. Method examples: NearestNeighborMatch, MatchOptimizer.
Package: causalml.propensity. Parent package: causalml. Description: propensity score models. Method example: ElasticNetPropensityModel.
Package: causalml.feature_selection. Parent package: causalml. Description: feature importance methods. Method example: FilterSelect.
Package: causalml.dataset. Parent package: causalml. Description: simulated data generation. Method examples: make_uplift_classification, synthetic_data.
Package: causalml.metrics. Parent package: causalml. Description: validation of causal inference effects, or support for such validation. Method examples: Sensitivity, SensitivitySubsetData.
Package: causalml.features. Parent package: causalml. Description: handling of label vectors. Method examples: LabelEncoder, OneHotEncoder.

The causalml.features, causalml.dataset, and causalml.metrics packages were also present in the first release of Causal ML; they provide the supporting data processing, feature processing, and evaluation needed to perform causal inference. causalml.feature_selection is another supporting toolkit, added in version V0.7.0 (2020-02-28), for interpreting the results of causal inference.

Since causal inference machine learning is still a rapidly evolving branch of technology and Causal ML is a young scientific tool, there are some implausibilities in its structural organization. For example, matching, propensity scores, and 2SLS are all traditional causal inference tools, yet only the 2SLS method is placed under the causalml.inference package. If this is done to distinguish the Structural Causal Model from the Potential Outcome Model, then algorithms that are not part of the Structural Causal Model should not be placed in that package. Another example: NearestNeighborMatch, a propensity score matching method, is placed under the causalml.match package, while ElasticNetPropensityModel, a propensity score estimation method, is placed under the causalml.propensity package. This appears to be a reasonable but not very well justified layout. The difficulty in explaining these organizational structures is certainly related to the rapid growth of the subject area and the collaboration of multiple developers, but it also suggests that the software's structure has not been thought through well enough. This cannot be fixed overnight, and we think these problems will be resolved as knowledge of causal inference machine learning grows and versions are updated.
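For orientation, the package layout in Table 1 corresponds roughly to the following import paths. This is only a sketch based on the table; exact module contents and re-exports can shift between releases, and some imports (e.g., CEVAE) require optional dependencies such as PyTorch.

```python
# Orientation sketch of the package map in Table 1 (paths may vary by version).
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.inference.meta import LRSRegressor, XGBTRegressor
from causalml.inference.iv import IVRegressor          # 2SLS-based inference
from causalml.inference.nn import CEVAE                # requires the optional torch dependency
from causalml.optimize import CounterfactualValueEstimator, PolicyLearner
from causalml.match import NearestNeighborMatch, MatchOptimizer
from causalml.propensity import ElasticNetPropensityModel
from causalml.feature_selection import FilterSelect
from causalml.dataset import make_uplift_classification, synthetic_data
from causalml.metrics.sensitivity import Sensitivity   # listed under causalml.metrics in Table 1
from causalml.features import LabelEncoder, OneHotEncoder
```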
2.2. Software functionalities

Essentially, Causal ML estimates the effect of an intervention T on the causal relationship X → Y without making strong assumptions about the form of the model. Causal ML implements multiple treatment effect evaluations, such as average treatment effect (ATE) estimates, conditional average treatment effect (CATE) estimates, and individual treatment effect (ITE) estimates. ATE and CATE are based on total treatment effect (TTE) estimation and conditional total treatment effect (CTTE) estimation. TTE can be defined as the comparison of interventions of a binary treatment T on Y:

TTE = E[Y | T = 1] − E[Y | T = 0].  (11)

CTTE can be defined as the comparison of interventions of a binary treatment T on Y when X = x is satisfied:

CTTE = E[Y | T = 1, X = x] − E[Y | T = 0, X = x].  (12)
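As a toy illustration, Eqs. (11) and (12) amount to simple differences of group means; the tiny frame below, with hypothetical columns y, t, and x, is only meant to make that reading explicit:

```python
import pandas as pd

# Hypothetical data: outcome y, binary treatment t, discrete covariate x.
df = pd.DataFrame({
    't': [0, 0, 1, 1, 0, 1],
    'x': ['a', 'b', 'a', 'b', 'a', 'a'],
    'y': [1.0, 2.0, 2.5, 3.0, 1.5, 3.5],
})

# Eq. (11): difference of means between treated and untreated units.
tte = df.loc[df.t == 1, 'y'].mean() - df.loc[df.t == 0, 'y'].mean()

# Eq. (12): the same difference restricted to the subpopulation with X = 'a'.
sub = df[df.x == 'a']
ctte = sub.loc[sub.t == 1, 'y'].mean() - sub.loc[sub.t == 0, 'y'].mean()
print(tte, ctte)
```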
In the process of supervised learning, prediction models can evaluate the importance of feature variables based on their contribution to prediction performance. Based on this property, Causal ML provides explainable causal analysis: it can evaluate the importance of x0, x1, ..., xm in the process of quantifying the effect of X on Y (X → Y) by considering a set of feature combinations X = [x0, x1, ..., xm] consisting of multiple feature variables.

Causal ML also provides methods to interpret the trained treatment effect models, such as Meta-Learner Feature Importances Visualization, Uplift Tree Visualization, and Uplift Tree Feature Importances Visualization.

Because the true treatment effect is not known outside of experimental data, estimation of the treatment impact cannot be validated in the same manner as ordinary ML predictions. Here, we concentrate on internal validation approaches based on the premise that potential outcomes and treatment status are unconfounded given the feature set at our disposal. Causal ML currently supports validation with multiple estimates, validation with synthetic data sets, validation with uplift curves (AUUC), and validation with sensitivity analysis.
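As a brief sketch of the AUUC-style validation on simulated data (plot_gain and auuc_score live in causalml.metrics; argument names may vary slightly between versions):

```python
import pandas as pd
from causalml.dataset import synthetic_data
from causalml.inference.meta import BaseSRegressor, BaseTRegressor
from causalml.metrics import plot_gain, auuc_score
from sklearn.ensemble import GradientBoostingRegressor

y, X, treatment, tau, b, e = synthetic_data(mode=1, n=5000, p=8, sigma=1.0)
cate_s = BaseSRegressor(GradientBoostingRegressor()).fit_predict(X, treatment, y)
cate_t = BaseTRegressor(GradientBoostingRegressor()).fit_predict(X, treatment, y)

# One row per unit: observed outcome, treatment flag, and each model's CATE score.
df_eval = pd.DataFrame({'y': y, 'w': treatment,
                        'S-learner': cate_s.flatten(),
                        'T-learner': cate_t.flatten()})

plot_gain(df_eval, outcome_col='y', treatment_col='w')          # cumulative gain (uplift) curves
print(auuc_score(df_eval, outcome_col='y', treatment_col='w'))  # area under the uplift curve
```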
3. Illustrative examples

We use an example, evaluating the impact of the COVID-19 lockdown in Wuhan on investor sentiment, to show how causal effects can be assessed using Causal ML. Both traditional statistics-based causal estimation and machine learning-based causal estimation rest on an established assumption of causal sufficiency [18]. In contrast, real-world problem analysis and causal adequacy assumptions rely almost exclusively on human knowledge and expertise. Neglecting causal adequacy analysis will lead to causal model errors, thus rendering causal explanations meaningless.

In our case, we use the general context of the Wuhan city closure at the beginning of the COVID-19 outbreak as an intervention. With reference to [33], we analyze the impact of Wuhan's closure on stock prices and investor sentiment. This paper constructs an investor confidence index (ICI) as a proxy for investor sentiment based on investor messages on social media, with reference to the method of Antweiler & Frank (2004). As [34] suggested, the agreement index was not created.

At the beginning of the study, we must make the necessary elaborations and causal assumptions about the real-world problem.

3.1. Relevant variables

We consider two variables, P_d and ICI_d, and a treatment T to evaluate the effect of the city closure on the causal relationship between stock price and investor sentiment.

(1) P_d is the stock price fundamental at date d. P_d consists of the opening price p_d^O, the closing price p_d^C, the highest price p_d^H, and the lowest price p_d^L: P_d = {p_d^O, p_d^C, p_d^H, p_d^L}.

(2) The investor confidence index ICI_d for date d is obtained from the number of positive messages N_Positive and the number of negative messages N_Negative among the messages published by investors on social media on date d (a sketch of one possible construction follows this list).

(3) Treatment T: we treat the closure of the city as an intervention.
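The text does not give the exact formula for ICI_d; one common construction in the spirit of Antweiler & Frank's bullishness index is sketched below, purely as an assumption of ours rather than the authors' actual definition.

```python
import numpy as np

def investor_confidence_index(n_positive: int, n_negative: int) -> float:
    """One possible daily sentiment index from message counts (assumed form)."""
    return float(np.log((1.0 + n_positive) / (1.0 + n_negative)))

print(investor_confidence_index(120, 45))  # a single day's ICI value
```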
3.2. Causal maps

We assume a causal graph as shown in Fig. 3, in which stock market fundamentals P and investor sentiment ICI are causally related to each other.

(1) Stock price fundamentals at time t0 (P_t0) influence investor sentiment at time t1 (ICI_t1): P_t0 → ICI_t1.

(2) Investor sentiment ICI_t1 at t1 influences stock market performance P_t2 at t2: ICI_t1 → P_t2. Moreover, t0, t1, and t2 can be very close to each other; the interaction between the stock price P and the investor sentiment ICI is high-frequency [35–37].

(3) An extraneous change in the environment interferes with the causal mechanism of P ↔ ICI. In this case, we assess the impact of the external intervention of the city closure on the causal mechanism between stock price and investor sentiment.

3.3. Causal sufficiency

In this example, we assume that stock market fundamentals (P) and investor sentiment (ICI) are causally related. Our hypothesis is based on the studies of others. [38] argue that investor sentiment is investors' estimate of future earnings and cash flows. Current stock market performance affects investors' estimates, so P → ICI. Investor sentiment likewise affects stock market fundamentals [38–40], so ICI → P. We assume that there are no unobserved confounding factors, i.e., that ICI ↔ P is causally sufficient. Although this is a strong assumption, it is necessary for reasoning about causal effects [12].

We consider the environmental context of the city's closure as an intervention. The effects of the city closure on stock price (P) and investor sentiment (ICI) are estimated by assessing the causal effects of stock price (P) and investor sentiment (ICI) before, during, and after the city closure.

We use the T-learner, a meta-learning algorithm, to perform the causal effect estimation. We estimate the effect of the city closure on the causal effect between stock price fundamentals and investor sentiment (P ↔ ICI) using LGBMRegressor, XGBRegressor, and RandomForestRegressor as the underlying base learners, respectively. As shown in Table 2, the average treatment effect of the city closure on P ↔ ICI is 0.7516, which implies a higher synergy between stock price and investor sentiment during the city closure: investors are more sensitive to changes in stock prices during the closure period.

Table 2
The causal effect of the lockdown in Wuhan.

                        LGBMRegressor    XGBRegressor    RandomForestRegressor    Mean
E (estimated effect)    0.5784           0.8658          0.8106                   0.7516

Note: The range of investor sentiment for the statistical sample is [−2.145, 2.124].
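To make the estimation step concrete, the sketch below mirrors the procedure described above on a hypothetical daily frame holding price fundamentals, the ICI, and a binary lockdown flag. The column names, the simulated data, and the simplification of treating ICI as the outcome with P as covariates are our own assumptions, not the authors' exact data schema or synergy measure.

```python
import numpy as np
import pandas as pd
from causalml.inference.meta import BaseTRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in for the real daily data.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    'p_open': rng.normal(10, 1, n), 'p_close': rng.normal(10, 1, n),
    'p_high': rng.normal(11, 1, n), 'p_low': rng.normal(9, 1, n),
    'lockdown': rng.binomial(1, 0.3, n),
})
df['ici'] = 0.2 * df['p_close'] + 0.7 * df['lockdown'] + rng.normal(0, 0.5, n)

X = df[['p_open', 'p_close', 'p_high', 'p_low']].values   # P_d
w = df['lockdown'].values                                  # treatment T
y = df['ici'].values                                       # ICI_d as outcome (a simplification)

effects = []
for base in (LGBMRegressor(), XGBRegressor(), RandomForestRegressor()):
    ate, lb, ub = BaseTRegressor(learner=base).estimate_ate(X=X, treatment=w, y=y)
    effects.append(float(np.ravel(ate)[0]))

print(effects, np.mean(effects))   # per-learner estimates and their mean, cf. Table 2
```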
4. Impact

Causal ML applies the latest advances in the field of causal inference machine learning from academic papers to the real world. It facilitates the practical application of machine learning in the field of causal inference by building a one-stop shop for machine learning causal inference. Causal ML was released in its first public version in August 2019 and has been updated 16 times.

Causal ML received a lot of attention from the academic community upon its publication. [4] listed Causal ML as one of the four currently available tools for causal inference. Another causal inference tool recommended by Yao et al., DoWhy, supports integration with the EconML and Causal ML packages [41].
Causal ML was used in the study by [42] to construct uplift trees. An overview by [43] summarizes the latest causal methods and includes an introduction to Causal ML. [44] used Causal ML to study the causal effect of dynamic quarantine policies on COVID-19.

In summary, Causal ML is a mature Python package for causal inference that has been adopted by numerous studies and is of great academic value.

5. Conclusions

In this paper, we introduce Causal ML, an open-source Python package that can be used for causal inference. It widely incorporates the latest research results in causal inference machine learning and applies these methods, which exist in the academic literature, to practice. It provides a unified interface for causal inference machine learning and a one-stop integration solution, which is very important for academic research.

We provide a simple case that illustrates the process of causal inference using Causal ML. It is important to emphasize that failure to perform adequate causal analysis prior to causal inference may lead to "causal model misspecification" [18].

The latest version of Causal ML is 0.13.0 (updated in September 2022), and it is still a causal inference tool in its infancy. As such, it still has many imperfections. Causal ML has not evolved in isolation, and its development is closely linked to that of other open-source machine learning products, such as scikit-learn (https://scikit-learn.org/stable/). Since Causal ML tends to use well-established algorithms that have been validated by other software, some advanced algorithms, such as GANITE (a causal inference method based on generative adversarial networks for learning the uncertainty of counterfactual distributions [45]), remain unimplemented. For a rapidly evolving scientific tool, Causal ML certainly has some problems, but many of them cannot be solved overnight and depend on the contributions of a larger number of scholars and the joint progress of academia and industry. However, Causal ML matures with each iteration as more and more positive academic results are absorbed into it. We believe that Causal ML will become one of the best tools for applying machine learning to causal inference.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References
[1] Collins SL, Glenn SM, Gibson DJ. Experimental analysis of intermediate disturbance and initial floristic composition: Decoupling cause and effect. Ecology 1995;76(2):486–92. http://dx.doi.org/10.2307/1941207.
[2] Umeda T, Kuriyama T, O'shima E, Matsuyama H. A graphical approach to cause and effect analysis of chemical processing systems. Chem Eng Sci 1980;35(12):2379–88. http://dx.doi.org/10.1016/0009-2509(80)85051-2.
[3] White AA, Pichert JW, Bledsoe SH, Irwin C, Entman SS. Cause and effect analysis of closed claims in obstetrics and gynecology. Obstet Gynecol 2005;105(5 Part 1):1031–8. http://dx.doi.org/10.1097/01.AOG.0000158864.09443.77.
[4] Yao L, Chu Z, Li S, Li Y, Gao J, Zhang A. A survey on causal inference. ACM Trans Knowl Discov Data 2021;15(5):1–46. http://dx.doi.org/10.1145/3444944.
[5] Rubin DB. Bayesian inference for causal effects: The role of randomization. Ann Statist 1978:34–58. https://www.jstor.org/stable/2958688.
[6] Pearl J. Causal diagrams for empirical research. Biometrika 1995;82(4):669–88. http://dx.doi.org/10.1093/biomet/82.4.669.
[7] Neuberg LG. Causality: Models, reasoning, and inference, by Judea Pearl, Cambridge University Press, 2000. Econometric Theory 2003;19(4):675–85.
[8] Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press; 2015.
[9] Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999:37–48. https://www.jstor.org/stable/3702180.
[10] Zeh H-D. The direction of time. Springer; 1989.
[11] Nogueira AR, Pugnana A, Ruggieri S, Pedreschi D, Gama J. Methods and tools for causal discovery and causal inference. Wiley Interdiscip Rev Data Min Knowl Discov 2022;12(2):e1449. http://dx.doi.org/10.1002/widm.1449.
[12] von Kügelgen J, Gresele L, Schölkopf B. Simpson's paradox in Covid-19 case fatality rates: A mediation analysis of age-related causal effects. IEEE Trans Artif Intell 2021;2(1):18–27. http://dx.doi.org/10.1109/TAI.2021.3073088.
[13] Balzer LB, Petersen ML. Invited commentary: Machine learning in causal inference—How do I love thee? Let me count the ways. Am J Epidemiol 2021;190(8):1483–7. http://dx.doi.org/10.1093/aje/kwab048.
[14] Balzer L, Petersen M, van der Laan MJ. Tutorial for causal inference. 2016.
[15] Petersen ML. Applying a causal road map in settings with time-dependent confounding. Epidemiology (Cambridge, Mass.) 2014;25(6):898. http://dx.doi.org/10.1097/EDE.0000000000000178.
[16] Petersen ML, van der Laan MJ. Causal models and learning from data: Integrating causal modeling and statistical estimation. Epidemiology (Cambridge, Mass.) 2014;25(3):418. http://dx.doi.org/10.1097/EDE.0000000000000078.
[17] Spirtes P. Introduction to causal inference. J Mach Learn Res 2010;11(5).
[18] Mooney SJ, Keil AP, Westreich DJ. Thirteen questions about using machine learning in causal research (you won't believe the answer to number 10!). Am J Epidemiol 2021;190(8):1476–82. http://dx.doi.org/10.1093/aje/kwab047.
[19] Schölkopf B, Locatello F, Bauer S, Ke NR, Kalchbrenner N, Goyal A, et al. Toward causal representation learning. Proc IEEE 2021;109(5):612–34. http://dx.doi.org/10.1109/JPROC.2021.3058954.
[20] Pearl J. Causality: Models, reasoning and inference. Cambridge, UK: Cambridge University Press; 2000.
[21] Chen H, Harinen T, Lee J-Y, Yung M, Zhao Z. Causal ML: Python package for causal machine learning. 2020, arXiv preprint arXiv:2002.11631. http://dx.doi.org/10.48550/arXiv.2002.11631.
[22] Abadie A, Imbens GW. Matching on the estimated propensity score. Econometrica 2016;84(2):781–807. http://dx.doi.org/10.3982/ECTA11293.
[23] Künzel SR, Sekhon JS, Bickel PJ, Yu B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proc Natl Acad Sci 2019;116(10):4156–65. http://dx.doi.org/10.1073/pnas.1804597116.
[24] Nie X, Wager S. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika 2021;108(2):299–319. http://dx.doi.org/10.1093/biomet/asaa076.
[25] Mishra N, Rohaninejad M, Chen X, Abbeel P. A simple neural attentive meta-learner. 2017, arXiv preprint arXiv:1707.03141.
[26] Breiman L. Random forests. Mach Learn 2001;45(1):5–32. http://dx.doi.org/10.1023/A:1010933404324.
[27] Athey S, Tibshirani J, Wager S. Generalized random forests. Ann Statist 2019;47(2):1148–78. http://dx.doi.org/10.1214/18-AOS1709.
[28] Rzepakowski P, Jaroszewicz S. Decision trees for uplift modeling with single and multiple treatments. Knowl Inf Syst 2012;32(2):303–27.
[29] Gutierrez P, Gérardy J-Y. Causal inference and uplift modelling: A review of the literature. In: International conference on predictive applications and APIs. PMLR; 2017, p. 1–13.
[30] Tam Cho WK, Sauppe JJ, Nikolaev AG, Jacobson SH, Sewell EC. An optimization approach for making causal inferences. Stat Neerl 2013;67(2):211–26. http://dx.doi.org/10.1111/stan.12004.
[31] Bennett M, Vielma JP, Zubizarreta JR. Building representative matched samples with multi-valued treatments in large observational studies. J Comput Graph Statist 2020;29(4):744–57. http://dx.doi.org/10.1080/10618600.2020.1753532.
[32] Stuart EA. Matching methods for causal inference: A review and a look forward. Stat Sci 2010;25(1):1. http://dx.doi.org/10.1214/09-STS313.
[33] Liu Q, Lee W-S, Huang M, Wu Q. Synergy between stock prices and investor sentiment in social media. Borsa Istanbul Rev 2022. http://dx.doi.org/10.1016/j.bir.2022.09.006.
[34] Liu Q, Zhou X, Zhao L. View on the bullishness index and agreement index. Front Psychol 2022;13. http://dx.doi.org/10.3389/fpsyg.2022.957323.
[35] Kleinnijenhuis J, Schultz F, Oegema D, van Atteveldt W. Financial news and market panics in the age of high-frequency sentiment trading algorithms. Journalism 2013;14(2):271–91. http://dx.doi.org/10.1177/1464884912468375.
[36] Sun L, Najand M, Shen J. Stock return predictability and investor sentiment: A high-frequency perspective. J Bank Financ 2016;73:147–64. http://dx.doi.org/10.1016/j.jbankfin.2016.09.010.
[37] Xing F, Hoang DH, Vo D-V. High-frequency news sentiment and its application to forex market prediction. SSRN Scholarly Paper 3711227. Rochester, NY: Social Science Research Network; 2020. Available at SSRN: https://ssrn.com/abstract=3711227.
[38] Baker M, Wurgler J. Investor sentiment in the stock market. J Econ Perspect 2007;21(2):129–52. http://dx.doi.org/10.1257/jep.21.2.129.
[39] McGurk Z, Nowak A, Hall JC. Stock returns and investor sentiment: Textual analysis and social media. J Econ Finance 2020;44(3):458–85.
[40] Sayim M, Rahman H. The relationship between individual investor sentiment, stock return and volatility: Evidence from the Turkish market. Int J Emerg Markets 2015. http://dx.doi.org/10.1108/IJoEM-07-2012-0060.
[41] Sharma A, Kiciman E. DoWhy: An end-to-end library for causal inference. 2020, arXiv preprint arXiv:2011.04216. http://dx.doi.org/10.48550/arXiv.2011.04216.
[42] Bozorgi ZD, Teinemaa I, Dumas M, La Rosa M, Polyvyanyy A. Process mining meets causal machine learning: Discovering causal rules from event logs. In: 2020 2nd international conference on process mining (ICPM). IEEE; 2020, p. 129–36. http://dx.doi.org/10.1109/ICPM49681.2020.00028.
[43] Xu G, Duong TD, Li Q, Liu S, Wang X. Causality learning: A new perspective for interpretable machine learning. 2020, arXiv preprint arXiv:2006.16789.
[44] Kristjanpoller W, Michell K, Minutolo MC. A causal framework to determine the effectiveness of dynamic quarantine policy to mitigate COVID-19. Appl Soft Comput 2021;104:107241. http://dx.doi.org/10.1016/j.asoc.2021.107241.
[45] Yoon J, Jordon J, van der Schaar M. GANITE: Estimation of individualized treatment effects using generative adversarial nets. In: International conference on learning representations, 2018.