Manual-3 X
Johannes Textor
DAGitty is a software for drawing and analyzing causal diagrams, also known as directed acyclic graphs
(DAGs). Functions include identification of minimal sufficient adjustment sets for estimating causal effects,
diagnosis of insufficient or invalid adjustment via the identification of biasing paths, identification of
instrumental variables, and derivation of testable implications.
DAGitty is provided in the hope that it is useful for researchers and students in Epidemiology, Sociology,
Psychology, and other empirical disciplines. The software should run in any modern web browser that
supports JavaScript, HTML, and SVG.
This is the user manual for DAGitty version 3.1. The manual is updated with every release of a new stable
version. DAGitty is available at dagitty.net. An R package ‘dagitty’ implementing the same functionality is
also available on CRAN and at github.com/jtextor/dagitty.
causal diagrams are described. A brief introduction to causal diagrams is given in Section 2. Advanced users
might also be interested in the R package ‘dagitty’ [19], which implements all functionality of the web-based
software and more.

1.1 Citing DAGitty
Developing and maintaining DAGitty takes time and effort. If you publish research results obtained with
the help of DAGitty, please consider giving us credit by citing our work. The main reference for DAGitty
is our paper describing the accompanying R package [19], which is based on the same software library, and
therefore also serves as a reference for the web-based program. We have also published several research
papers describing the specific algorithms used in DAGitty, such as for identification of biasing paths [18],
adjustment sets [21], and instrumental variables [22].

1.2 Running DAGitty online
There are two ways to run DAGitty: either from the internet or from your own computer. To run DAGitty
online, visit the URL dagitty.net. DAGitty should run in every modern browser. Specifically, I expect it to
work well on recent versions of Firefox, Chrome, Opera, and Safari as well as on Internet Explorer (IE) version
9.0 or later, which all support scalable vector graphics (SVG). If you encounter any problems, please send me
an e-mail so I can fix them (my contact information is at the end of this manual). Keep in mind that DAGitty
is used by hundreds of people per day from all over the world. These people all benefit if the problem you
found is fixed, so please do consider investing the time to notify me if you encounter any bugs.

1.3 Installing DAGitty on your own computer
DAGitty can be “installed” on your computer for use without an internet connection. To do this, download the
file dagitty.net/dagitty.zip which is a ZIP archive containing DAGitty’s source. Unpack this ZIP file
anywhere in your file system. To run DAGitty, just open the file dags.html in the unpacked folder.
Some features of DAGitty will not work in the offline version, because they are actually implemented on the
web server. Currently, these features are:
• Exporting model drawings as PDF, JPEG or PNG
• Publishing models on-line.

1.4 Migrating from earlier versions of DAGitty
The following two issues are important for users of older DAGitty versions. New users can skip this section.
• The model code syntax has been completely changed in DAGitty version 3.0; the old syntax based on the
DAG program by Sven Knüppel [6] was getting too limited to accommodate the new features that
were being added. Therefore, I decided to switch to a very different, but much more extensible syntax
closely based on the “dot” language used by graphviz. DAGitty will still be able to open model
code from older versions (with one small caveat for very old code, see below) and will automatically
convert this to the new syntax.
• Early versions of DAGitty supported only one exposure and one outcome variable. It has now
been possible for quite some time to have more than one exposure and/or outcome variable. This
means that the old model code convention where the variable in the first line is the exposure and
the variable in the second line is the outcome no longer works. Hence, if you open a model created
with an earlier version in DAGitty 2.0 or higher, exposure and outcome will appear like normal
variables. To fix this, simply set exposure and outcome again and save the new model code.

2 A brief introduction to causal diagrams
In this section, we will briefly review what causal diagrams are and how they can be applied in empirical
sciences. For a more detailed account, we recommend the book Causal Inference in Statistics: A Primer
by Pearl, Glymour and Jewell [9], or the chapter Causal Diagrams in the Epidemiology textbook of Rothman,
Greenland, and Lash [11]. Also take a look at the web page dagitty.net/learn/, where I am collecting
several tutorials (some of them interactive) on specific DAG-related topics.

1 The term “DAG” is somewhat confusing to computer scientists and

In Epidemiology, causal diagrams are also frequently called DAGs.1 In a nutshell, a DAG is a graphic model
that depicts a set of hypotheses about the causal process that generates a set of variables of interest. An arrow
X → Y is drawn if there is a direct causal effect of X on Y. Intuitively, this means that the natural process
determining Y is directly influenced by the status of X, and that altering X via external intervention would also
alter Y. However, an arrow X → Y only represents that part of the causal effect which is not mediated by any of
the other variables in the diagram. If one is certain that X does not have a direct causal influence on Y, then the
arrow is omitted. This has two important implications: (1) arrows should follow time order, or else the diagram
contradicts the basic principle that causes must precede their effects; (2) the omission of an arrow is a stronger
claim than the inclusion of an arrow: the presence of an arrow depicts merely the “causal null hypothesis” that
X might have an effect on Y.

Mathematically, the semantics of an arrow X → Y can be defined as follows. Given a DAG G and a variable
Y in G, let X_1, ..., X_n be all variables in G that have direct arrows X_i → Y (also called the parents of Y).
Then G claims that the causal process determining the value of Y can be modelled as a mathematical function
Y := f(X_1, ..., X_n, ϵ_Y), where ϵ_Y (the “causal residual”) is a random variable that is jointly independent
of all X_i.

For example, the sentence “smoking causes lung cancer” could be translated into the following simple
causal diagram:

smoking → lung cancer

We would interpret this diagram as follows: (1) The variable “smoking” refers to a person’s smoking habit
prior to a later cancer disease status in that same person; (2) the natural process by which a person develops
cancer might be influenced by the smoking habits of that person; (3) there exist no other variables that have
a direct influence on both smoking habits and cancer.

A slightly more complex version of this diagram might look as follows:

smoking → tar deposit in lungs → lung cancer

This diagram is about a person’s smoking habits at a time t_1, the tar deposit in her lungs at a later time t_2,
and finally the development of lung cancer at an even later time t_3. We claim that (1) the natural process which
determines the amount of tar in the lungs is affected by smoking; (2) the natural process by which lung cancer
develops is affected by the amount of tar in the lung; (3) the natural process by which lung cancer develops is not
affected by the person’s smoking other than indirectly via the tar deposit; and finally (4) no variables having
relevant direct influence on more than one variable of the diagram were omitted.

In an epidemiological context, we are often interested in the putative effect of a set of variables, called
exposures, on another set of variables called outcomes. A key question in Epidemiology (and many other empirical
sciences) is: how can we infer the causal effect of an exposure on an outcome of interest from an observational
study? Typically, a simple regression will not suffice due to the presence of confounding factors, which may
lead to an over- or underestimation of the causal effect from the observed data. If the assumptions encoded in
a given diagram hold, then it is sometimes possible to devise an identification strategy from that diagram, by
which it would be possible to derive an unbiased estimate of a causal effect from observed data. One example
identification strategy would be covariate adjustment. For example, consider the following diagram:

smoking → carry matches
smoking → cancer
carry matches →(?) cancer

If we were to perform an association study on the relationship between carrying matches in one’s pocket
and developing lung cancer, we would probably find a correlation between these two variables. However, as
the above diagram indicates, this correlation would not imply that carrying matches in your pocket causes lung
cancer: Smokers are more likely to carry matches in their pockets, and also more likely to develop lung cancer.
This is an example of a confounded association between two variables, which is mediated via the biasing path
(bold). Now let us assume (unrealistically, and solely for didactic purposes) that the simplistic diagram above is
an accurate representation of the process that generated our data. Under this assumption, if we were to adjust for
smoking, e.g. by weighted averaging of separate effect estimates for smokers and non-smokers or by including
smoking status as a covariate in a regression model, we would no longer find a correlation between carrying
matches and lung cancer. In other words, adjustment for smoking would close the biasing path. In general, any
set of covariates that closes all biasing paths (and does not open new ones or close causal paths in the process)
is called an adjustment set. Adjustment sets will be explained in more detail in Section 5.5.1.
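The effect of adjusting for smoking can be illustrated with a small simulation. The following Python sketch is not part of DAGitty; it assumes the simplistic diagram above is correct (smoking affects both carrying matches and cancer, and matches have no effect on cancer), and all probabilities in it are invented purely for illustration. It compares the crude association between matches and cancer with the smoking-adjusted one, using the weighted averaging of stratum-specific estimates mentioned in the text.

```python
import random

def simulate(n=20000, seed=1):
    """Draw n people from a toy model: smoking -> matches, smoking -> cancer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoking = rng.random() < 0.3  # invented prevalence
        # Each child variable depends only on its parent (smoking) plus noise.
        matches = rng.random() < (0.8 if smoking else 0.1)
        cancer = rng.random() < (0.2 if smoking else 0.02)
        rows.append((smoking, matches, cancer))
    return rows

def risk_difference(rows):
    """P(cancer | matches) - P(cancer | no matches) within the given rows."""
    with_m = [c for _, m, c in rows if m]
    without_m = [c for _, m, c in rows if not m]
    return sum(with_m) / len(with_m) - sum(without_m) / len(without_m)

rows = simulate()
crude = risk_difference(rows)

# Adjust for smoking: weighted average of the stratum-specific differences.
adjusted = 0.0
for s in (True, False):
    stratum = [r for r in rows if r[0] == s]
    adjusted += len(stratum) / len(rows) * risk_difference(stratum)

print(f"crude: {crude:.3f}  adjusted: {adjusted:.3f}")
```

Under these assumptions the crude risk difference is clearly positive, while the adjusted one is close to zero up to sampling noise, which is what closing the biasing path means in practice.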
In DAGitty we can distinguish between observed and unobserved (latent) variables. This distinction is
important when it comes to identifying causal effects: if there are many unobserved variables in a DAG, then
this can make identification difficult or impossible. A common situation is when one has so-called “latent
confounding factors” affecting two variables of interest; often, one does not know all these confounding factors
and just represents this situation as follows:

coffee ← U → smoking

Since this situation is so common, there is an abbreviated notation for this using a bi-directed arrow:

coffee ↔ smoking

Importantly, this means that bi-directed edges do not represent reciprocal causation (which is impossible to
represent in a DAG). A common use-case is to depict unknown or unobserved confounders without specifying
explicitly what those confounders are.

The purpose of DAGitty is to aid study design through devising identification strategies in (possibly
complex) causal diagrams and, more generally, through the identification of causal and biasing paths as well as
testable implications in a given diagram.

3 Loading, saving and sharing diagrams
This section covers the three basic steps of working with DAGitty: (1) loading a diagram; (2) manipulating
the graphical layout of the diagram; and (3) saving the diagram. First of all, any causal diagram consists of
vertices (variables) and arrows (direct causal effects). You can either create the diagram directly using DAGitty’s
graphical user interface (explained in the next section), or prepare a textual diagram description in a word
processor and then import this description into DAGitty. In addition, DAGitty contains some pre-defined examples
that you can use to become familiar with the program and with DAGs in general. To do so, just select one of
the pre-defined examples from the “Examples” menu.

3.1 DAGitty’s textual syntax for causal diagrams
The textual syntax in DAGitty is based on the ‘dot’ language by graphviz. In fact, many dot graphs should
work directly in dagitty without modifications, although most of the style attributes of the dot language are not
supported by dagitty. I believe it’s best to introduce the syntax by a series of examples. Let’s start by defining
the example used in the introduction above.

dag{
smoking
"carry matches" [exposure]
cancer [outcome]
smoking -> "carry matches"
smoking -> cancer
"carry matches" -> cancer
}

This example shows the three basic components of the syntax in action:
• The enclosing statement dag{ ... }, which is always there. The DAG can also be given a name
like so: dag Smoking { ... }
• The variable (vertex) statements. These consist of a variable name and a list of options enclosed
in square brackets. For instance, the options “exposure” and “outcome” set a variable to be an
exposure or outcome, respectively. Other relevant options are “latent” (for unobserved variables)
and “adjusted” (for variables that have been adjusted for in a statistical analysis). It is necessary
to double-quote the variable names if they contain spaces or other special characters, like for the
variable “carry matches”.
• The edge statements. These consist of a source variable, an edge type (which can be ->, <-, or
<->), and a target variable. As explained above, bi-directed edges (x<->y) are simply an equivalent
shorthand for typing x<-u->y; u[latent].

These three syntax components are in fact enough to define any DAG. We are now going to define the
same DAG in various different ways to showcase various convenient features of the syntax that make DAG
definitions more compact; it is not necessary to use any of these features, but they can save a lot of typing.

Variable statements can be omitted if the variable has no options, such as “smoking” in the above example.
Every time a variable is used in an edge statement, that
variable is automatically added as if there had been a corresponding node statement without an option.

dag{
"carry matches" [exposure]
cancer [outcome]
smoking -> "carry matches"
smoking -> cancer
"carry matches" -> cancer
}

White-space is optional and several statements can be combined on one line. For clarity, it is recommended
to insert semi-colons between different statements on the same line; however, this is not necessary. The following
two versions are equivalent:

dag{
"carry matches" [exposure]; cancer [outcome]
smoking -> "carry matches"; smoking -> cancer;
"carry matches" -> cancer
}

dag{
"carry matches" [exposure] cancer [outcome]
smoking -> "carry matches" smoking -> cancer
"carry matches" -> cancer
}

Edge statements can be chained together such that entire paths can be defined at once:

dag{
"carry matches" [exposure] ; cancer [outcome]
smoking -> "carry matches" -> cancer
smoking -> cancer
}

Arrows can also be written in reverse orientation, which is quite convenient when used together with edge
chaining:

dag{
"carry matches" [exposure] ; cancer [outcome]
cancer <- smoking -> "carry matches" -> cancer
}

Another very useful feature for short DAG descriptions is variable grouping using curly braces. This allows
you to define several arrows at once like so:

dag{
"carry matches" [exposure] ; cancer [outcome]
smoking -> {"carry matches" cancer}
"carry matches" -> cancer
}

The curly braces open a new scope in which a “sub-graph” is defined. An arrow pointing to a sub-graph
means that there will be arrows made to all variables in the sub-graph, and the sub-graph itself can also define
its own internal arrows. This means that we can also write the above as:

dag{
"carry matches" [exposure] ; cancer [outcome]
smoking -> {"carry matches" -> cancer}
}

To save even more typing, several option names can be abbreviated using single letters like so:

dag{
"carry matches" [e] ; cancer [o]
smoking -> {"carry matches" -> cancer}
}

As mentioned above, it is not necessary to use grouping or edge statement chaining; the only purpose
of these tricks is to save some typing. In fact, once your textual syntax is entered in DAGitty, it will be converted
back to a trivial form in which the variable and edge statements are all explicitly listed. (This is similar to
what would happen in graphviz.)

The above examples covered only the structure of the DAG, but gave no layout information. A simple layout
is automatically generated by DAGitty once you input a text description where DAGitty cannot detect any layout
coordinates. The coordinates are also updated when you move nodes around or bend edges. See Figure 1
for how the layout information is added to the variable statements. You could of course enter your own layout
information manually into the text syntax as well.

3.2 Loading a model text
To load a textually defined diagram into DAGitty, simply copy&paste the textual description into the “Model code”
text box. Then click on “Update DAG”. DAGitty will now generate a preliminary graphical layout for your
diagram on the canvas, which may not yet look the way you intended, but can be freely modified.
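To make the idea of the “trivial form” from Section 3.1 concrete, the following Python sketch expands a single chained edge statement into explicit edges. It is only an illustration and not DAGitty’s actual parser: the function name expand_edge_statement is hypothetical, and the sketch handles only -> and <- between plain or double-quoted variable names, leaving out curly-brace grouping, bi-directed edges, and variable options.

```python
import shlex

def expand_edge_statement(statement):
    """Expand one chained edge statement into explicit (source, target) pairs.

    Handles only '->' and '<-' between plain or double-quoted variable names;
    grouping with curly braces and '<->' are deliberately left out.
    """
    tokens = shlex.split(statement)  # keeps "carry matches" as one token
    edges = []
    # Variable names sit at even positions, edge types at odd positions.
    for i in range(1, len(tokens) - 1, 2):
        left, arrow, right = tokens[i - 1], tokens[i], tokens[i + 1]
        if arrow == "->":
            edges.append((left, right))
        elif arrow == "<-":
            edges.append((right, left))
        else:
            raise ValueError(f"unsupported edge type: {arrow}")
    return edges

for source, target in expand_edge_statement(
        'cancer <- smoking -> "carry matches" -> cancer'):
    print(f'"{source}" -> "{target}"')
```

Applied to the reverse-orientation example from Section 3.1, this yields the same three explicit edges as the very first version of the model code.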
3.3 Modifying the graphical layout of a diagram
To lay out the vertices and arrows of your diagram more
clearly than DAGitty did, simply drag the vertices with
your mouse on the canvas. You may notice that DAGitty
modifies the information in the “Model code” field on the
fly, and augments it with additional position information
for each vertex. In general, all changes you make to your
diagram within DAGitty are immediately reflected in
the model code.
“CTRL + A” to select the entire content of the text field, then pressing
“CTRL + C” to copy the content. You can then paste the content in
another program using “CTRL + V”.
Here’s how it works: Draw your DAG to full satisfaction, then choose “Publish on dagitty.net” from the “Model”
menu. You have two options for publishing your DAG: anonymously, or linking it to an e-mail address. If you
store the DAG anonymously, you will later on not be able to edit it or delete it from the server.

After choosing “Publish on dagitty.net” from the “Model” menu, a small form will appear where you can enter
some metadata on the DAG, and provide your e-mail address if you so wish. Upon clicking “Publish”, the
DAG will be sent to the dagitty.net server, and you will receive a URL under which the DAG is now available. If
you provided your e-mail address, you will also receive a message requesting you to confirm your ownership
of the DAG. This is simply done by clicking on a confirmation link. Only then will the DAG be linked to your
e-mail address, and you will receive a password to use when deleting or modifying the published DAG.

If you did link your DAG to your e-mail address, you can delete it by choosing “Delete on dagitty.net” from the
“Model” menu, which will prompt you to enter the DAG’s URL and the password. If the URL and password match,
the DAG will be deleted. Similarly, you can update a stored DAG using the “Load from dagitty.net” function
from the “Model” menu, modifying it, and saving it again. You can view published DAGs (if you know their
URL) by just putting the URL into your address bar of course, but you can also do so using the “Load from
dagitty.net” function.

Please note that all DAGs stored on dagitty.net are meant to be public information. Do not store any data
that you consider private or in any way secret. Once stored on dagitty.net, every person in the world who
knows your DAG’s URL can view it (but not your e-mail address if you provided one). Also note that there is
no guarantee that dagitty.net will keep running forever. Storing your DAGs is done at your own risk. Still,
you may find this feature useful, for instance to e-mail your DAGs to colleagues or to include links to DAGs in
papers under review. For archival purposes, it may be more appropriate to include the DAG or the model code
in the paper itself or its supporting information.

4 Editing diagrams using the graphical user interface
You are free to make changes directly to the textual description of your diagram, which will be reflected
on the canvas next time you click on “Update DAG”. However, you can also create, modify, and delete vertices
and arrows graphically using the mouse.

4.1 Creating a new diagram
To create a new diagram, select “New Model” from the “Model” menu. You will be asked for the names of the
exposure and the outcome variable, and an initial model containing just those variables and an arrow between
them will be drawn. Then you can add variables and arrows to the model as explained below.

4.2 Adding new variables
To add a new variable to the model, double-click on a free space in the canvas (i.e., not on an existing variable)
or press the “n” key. A dialog will pop up asking you for the name of the new variable. Enter the name into
the dialog and press the enter key or click “OK”. If you click “Cancel”, no new variable will be created.

4.3 Renaming variables
To rename an existing variable, move the mouse pointer over that variable and hit the “r” key. A dialog will pop
up allowing you to change the variable name.

4.4 Setting the status of a variable
Variables can have one of the following statuses:
• Exposure
• Outcome
• Adjusted
• Selected
• Unobserved (latent)
• Other

You can change these statuses when you click on the variable using the checkboxes in the “Variable” field.
There are also keyboard shortcuts available. For example, to turn a variable into an exposure, move the mouse
pointer over that variable and hit the “e” key; for an outcome, hit the “o” key instead. To toggle whether a
variable is observed or unobserved, hit the “u” key; to toggle whether it is adjusted, hit the “a” key. Changing
the status of variables may change the colors of the diagram vertices to reflect the new structure and
information flow in the diagram (see below).

At present, the statuses are mutually exclusive; e.g., a variable cannot be both unobserved and adjusted or
both exposure and unobserved. This could change in future versions of DAGitty.

4.5 Adding new arrows
To add a new arrow, double-click first on the source vertex (which will become highlighted) and then on
the target vertex. The arrow will be inserted. If an arrow existed before in the opposite direction, then a
bi-directed arrow will be created. If a bi-directed arrow already existed, then it will be deleted. This means it
is currently not possible to have both a directed and a bi-directed arrow between the same variables. If you
want to represent such structures, please represent the bi-directed arrow x ⇔ y explicitly as x ← u → y, where
u is a latent variable. (Remember that x ⇔ y is simply a shorthand for x ← u → y.)

Instead of double-clicking on a vertex, you can also move the mouse pointer over the vertex and press the key
“c”. Arrows are by default drawn using a straight line, but you can change that by moving the mouse pointer to the
line, pressing and holding down the left mouse button, and “bending” the line by dragging as appropriate.

4.6 Deleting variables
To delete a variable, move the mouse pointer over that variable and hit the “del” key on your keyboard, or
alternatively the “d” key (the latter comes in handy if you’re on a Mac, which has no real delete key). All
arrows to that variable will be deleted along with the variable. In contrast to DAGitty versions prior to 2.0, all
variables can now be deleted including exposure and outcome.

4.7 Deleting arrows
An arrow is deleted just like it has been inserted, i.e., by double-clicking first on the start variable and then on the
target variable. An arrow is also deleted automatically if a new one is inserted in the opposite direction (see
above).

4.8 Choosing the style of display
At present, you can choose between two DAG diagram styles: “classic”, where nodes and their labels are
separate from each other, and SEM-like, where labels are inside nodes. Both have their advantages and
disadvantages. By the way, “SEM” refers to structural equation modeling.

5 Analyzing diagrams

5.1 Paths
Causal diagrams contain two different kinds of paths between exposure and outcome variables.
• Causal paths start at the exposure, contain only arrows pointing away from the exposure, and
end at the outcome. That is, they have the form e → x_1 → ... → x_k → o.
• Biasing paths are all other paths from exposure to outcome. For example, such paths can have the
form e ← x_1 → ... → x_k → o.

With respect to a set Z of conditioning variables (that can also be empty if we are not conditioning on
anything), paths can be either open or closed (also called d-separated [8]). A path is closed by Z if one or both of
the following holds:
• The path p contains a chain x → m → y or a fork x ← m → y such that m is in Z.
• The path p contains a collider x → c ← y such that c is not in Z and furthermore, Z does not contain
any successor of c in the graph.

Otherwise, the path is open. The above criteria imply that paths consisting of only one arrow are always open,
no matter the content of Z. Also it is possible that a path is closed with respect to the empty set Z = {}.

5.2 Coloring
It is not easy to verify by hand which paths are open and which paths are closed, especially in larger diagrams.
DAGitty highlights all arrows lying on open biasing paths in red and all arrows lying on open causal paths in
green. This highlighting is optional and is controlled via the “highlight causal paths” and “highlight biasing paths”
checkboxes.

5.3 Effect analysis
As mentioned above, arrows in DAGs represent direct effects. That is, in a DAG with three variables X, M,
and Y, an arrow X → Y means that there is a causal effect of X on Y that is not mediated through the variable
M. Often when building DAGs, people tend to forget this aspect and think only about whether any kind of
causal effect exists, without paying attention to how it is mediated. This may result in DAGs with too many
arrows.
To aid users with this, George Ellison (Leeds University) suggested implementing a function that identifies
arrows for which a corresponding indirect pathway also exists. After drawing an initial DAG, one might
reconsider these arrows and judge whether they are really necessary given the indirect pathways already present
in the diagram.

For example, suppose after thinking about the pairwise causal relationships between our variables X, M, Y
we came up with this DAG:

X → M → Y (bold), X → Y (thin)

For the arrows drawn in bold, there is no corresponding indirect path: removing one of these arrows
from the diagram means that there will no longer be any causal effect between the corresponding variables.
These arrows are called atomic direct effects in DAGitty, and they can be highlighted, like in the above DAG,
by ticking the checkbox with that name. On the other hand, for the thin arrow X → Y, there is also the indirect
pathway X → M → Y. One may therefore reconsider whether the arrow X → Y is truly necessary; perhaps
the causal effect from X to Y is entirely mediated through M.

the sense that they entail exactly the given correlation graph [17].

5.4.2 The moral graph
To identify minimal sufficient adjustment sets, DAGitty uses the so-called “moral graph”, which results from a
transformation of the model to an undirected graph. This procedure is also highly recommended if you wish to
verify the calculation by hand. See the nice explanation by Shrier and Platt [14] for details on this procedure.
In DAGitty, you can switch between display of the model and its moral graph by choosing “moral graph” in
the “view mode” section on the left-hand side of the page.

5.5 Causal effect identification
Some of the most important features of DAGitty are concerned with the question: how can causal effects be
estimated from observational data? Currently, two types of causal effect identification are supported: adjustment
sets, and instrumental variables.

5.5.1 Adjustment sets
the path E ← A → Z ← B ← D: Because both E and
D depend on Z, adjusting for Z will induce additional
correlation between E and D.
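The way adjusting for Z opens this path follows directly from the rules for open and closed paths in Section 5.1. The following Python sketch applies those rules to one given path; it is an illustration under simplifying assumptions (only directed edges, and a path supplied as a list of adjacent variables), not the algorithm DAGitty itself uses. The edge list contains just the four arrows on the path above, and the function names are hypothetical.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for source, target in edges:
            if source == current and target not in found:
                found.add(target)
                stack.append(target)
    return found

def is_closed(path, edges, z):
    """True if `path` (a list of adjacent nodes) is closed by the set z."""
    edge_set = set(edges)
    for x, m, y in zip(path, path[1:], path[2:]):
        collider = (x, m) in edge_set and (y, m) in edge_set
        if collider:
            # A collider closes the path unless it or a descendant is in z.
            if m not in z and not (descendants(edges, m) & z):
                return True
        elif m in z:
            # Chains and forks close the path when the middle node is in z.
            return True
    return False

edges = [("A", "E"), ("A", "Z"), ("B", "Z"), ("D", "B")]
path = ["E", "A", "Z", "B", "D"]
print(is_closed(path, edges, set()))       # closed: Z is an unadjusted collider
print(is_closed(path, edges, {"Z"}))       # open: adjusting for Z opens the collider
print(is_closed(path, edges, {"Z", "A"}))  # closed again: the fork at A is adjusted
```

The last call shows why a set that contains a collider can still be a valid adjustment set, provided it also contains another variable that closes the newly opened path.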
The validity of an instrumental variable I depends on two causal conditions: exogeneity and exclusion
restriction. These two conditions can be expressed in the language of DAGs and paths as follows: (1) there
must be an open path between I and the exposure X; and (2) all paths between I and the outcome Y must be
closed in a modified graph where all edges out of X are removed. A variable that fulfills these two conditions is
called an instrumental variable or simply an instrument.

Instrumental variables can also be generalized such that the two conditions are required to hold conditional
on a set of covariates Z [3]. The two conditions then read as follows: (1) there must be a path between I and X
that is opened by Z; and (2) all paths between I and Y must be closed by Z in a modified graph where all edges
out of X are removed. A variable that fulfills these two conditions is called a conditional instrument.

DAGitty will find both “classic” and conditional instruments when the option “Instrumental Variable” is
selected under the “Causal effect identification” field. Note that DAGitty will not always list all possible instruments;
instead, it will restrict itself to a certain well-defined subset that we call “ancestral instruments”. However,
whenever any instrument or conditional instrument exists at all, then DAGitty is guaranteed to find one.
Note also that if there are several instruments available, then it is best to choose the one that is most strongly
correlated with X (conditional on Z in the case of a conditional instrument).

For details regarding ancestral instruments and how DAGitty computes them, please refer to the research
paper where we describe these methods [22].

that do not involve any unobserved variables. Note that the set of testable implications displayed by DAGitty
does not constitute a “basis set” [8]. Future versions will allow choosing between different basis sets.

In general, the fewer arrows a diagram contains, the more testable predictions it implies. For this reason,
“simpler” models with fewer arrows are in general easier to falsify (Occam’s razor).

6 Acknowledgements
I would like to thank my collaborators Maciej Liśkiewicz and Benito van der Zander (both at the Institute for
Theoretical Computer Science, University of Lübeck, Germany) for our collaborations on developing efficient
algorithms to analyze causal diagrams.

I also thank Michael Elberfeld, Juliane Hardt, Sven Knüppel, Keith Marcus, Judea Pearl, Sabine Schipf, and
Felix Thoemmes (in alphabetical order) for enlightening discussions (either in person, per e-mail, or on the
SEMnet discussion list) about DAGs that made this program possible. Furthermore, I thank Robert Balshaw,
George Ellison, Marlene Egger, Angelo Franchini, Ulrike Förster, Mark Gilthorpe, Dirk van Kampen, Jeff Martin,
Jillian Martin, Karl Michaëlsson, David Tritchler, Eric Vittinghof, and other users for sending feedback and
bug reports that greatly helped to improve DAGitty.

The development of DAGitty was sponsored by funding from the Institute of Genetics, Health and
Therapeutics at Leeds University, UK. I thank George Ellison for arranging this generous support.
the code small. Developed by the Prototype Core Team and licensed under the MIT license [16].

Furthermore, DAGitty contains some modified code from the Dracula Graph Library by Philipp Strathausen,
which is also licensed under the MIT license [15].

I am grateful to the authors of these libraries for their valuable work.

9 Bundled examples
DAGitty contains some built-in examples for didactic and illustrative purposes. Some of these examples are
taken from published papers or talks given at scientific meetings. These are, in inverse chronological order:
• van Kampen 2014 [23]
• Polzer et al., 2012 [10]
• Schipf et al., 2010 [12]
• Didelez et al., 2010 [4]
• Shrier & Platt, 2008 [14]
• Sebastiani et al.3, 2005 [13]
• Acid & de Campos, 1996 [1]

Another example was provided by Felix Thoemmes via personal communication (2013).

3 The example actually shows only a small part of their DAG.

10 Author contact
I would be glad to receive feedback from those who use DAGitty for research or educational purposes. Also, you
are welcome to send me your suggestions or requests for features that you miss in DAGitty.

Johannes Textor
Data Science group
Radboud University
Nijmegen, The Netherlands
[email protected]
johannes-textor.name
Mastodon: @johannes textor

References
[1] Silvia Acid and Luis M. De Campos. An algorithm for finding minimum d-separating sets in belief
networks. In Proceedings of the 12th Conference of Uncertainty in Artificial Intelligence, pages 3–10, 1996.
[2] Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin. Identification of causal effects using
instrumental variables. Journal of the American Statistical Association, 91(434):444–55, 1996.
[3] Carlos Brito and Judea Pearl. Generalized instrumental variables. In Proceedings of the 18th Conference
on Uncertainty in Artificial Intelligence, pages 85–93, 2002.
[4] Vanessa Didelez, Svend Kreiner, and Niels Keiding. Graphical models for inference under
outcome-dependent sampling. Statistical Science, 25(3), August 2010.
[5] Guido Imbens. Instrumental variables: An econometrician’s perspective. Statistical Science, 29(3):323–58, 2014.
[6] Sven Knüppel and Andreas Stang. DAG program: identifying minimal sufficient adjustment sets.
Epidemiology, 21(1):159, 2010.
[7] Steffen L. Lauritzen, A. Philip Dawid, Birgitte N. Larsen, and Hanns-Georg Leimer. Independence
properties of directed Markov fields. Networks, 20(5):491–505, 1990.
[8] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY,
USA, 2nd edition, 2009.
[9] Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell. Causal Inference in Statistics: A Primer.
Wiley, New York, NY, USA, 1st edition, 2016.
[10] Ines Polzer, Christian Schwahn, Henry Völzke, Torsten Mundt, and Reiner Biffar. The association
of tooth loss with all-cause and circulatory mortality. Is there a benefit of replaced teeth? A
systematic review and meta-analysis. Clinical Oral Investigations, 16(2):333–351, 2012.
[11] Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash. Modern Epidemiology. Wolters Kluwer,
2008.
[12] Sabine Schipf, Robin Haring, Nele Friedrich, Matthias Nauck, Katharina Lau, Dietrich Alte,
Andreas Stang, Henry Völzke, and Henri Wallaschofski. Low total testosterone is associated with
increased risk of incident type 2 diabetes mellitus in men: Results from the Study of Health in Pomerania
(SHIP). The Aging Male, 14(3):168–75, 2011.
[20] Jin Tian, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Technical Report R-254, UCLA,
1998.
International Joint Conference on Artificial Intelligence (IJCAI 2015), pages 3243–49. AAAI Press, 2015.
[23] Dirk van Kampen. The SSQ model of schizophrenic prodromal unfolding revised: An analysis of its
causal chains based on the language of directed graphs. European Psychiatry, 29(7):437–48, 2014.