
Drawing and Analyzing Causal DAGs with DAGitty
Johannes Textor
July 18, 2023
DAGitty is software for drawing and analyzing causal diagrams, also known as directed acyclic graphs
(DAGs). Functions include identification of minimal sufficient adjustment sets for estimating causal effects,
diagnosis of insufficient or invalid adjustment via the identification of biasing paths, identification of
instrumental variables, and derivation of testable implications.
DAGitty is provided in the hope that it is useful for researchers and students in Epidemiology, Sociology,
Psychology, and other empirical disciplines. The software should run in any modern web browser that
supports JavaScript, HTML, and SVG.
This is the user manual for DAGitty version 3.1. The manual is updated with every release of a new stable
version. DAGitty is available at dagitty.net. An R package ‘dagitty’ implementing the same functionality is
also available on CRAN and at github.com/jtextor/dagitty.
Contents

1 Introduction
1.1 Citing DAGitty
1.2 Running DAGitty online
1.3 Installing DAGitty on your own computer
1.4 Migrating from earlier versions of DAGitty
2 A brief introduction to causal diagrams
3 Loading, saving and sharing diagrams
3.1 DAGitty’s textual syntax for causal diagrams
3.2 Loading a model text
3.3 Modifying the graphical layout of a diagram
3.4 Saving the diagram
3.5 Exporting the diagram
3.6 Publishing diagrams online
4 Editing diagrams using the graphical user interface
4.1 Creating a new diagram
4.2 Adding new variables
4.3 Renaming variables
4.4 Setting the status of a variable
4.5 Adding new arrows
4.6 Deleting variables
4.7 Deleting arrows
4.8 Choosing the style of display
5 Analyzing diagrams
5.1 Paths
5.2 Coloring
5.3 Effect analysis
5.4 View mode
5.4.1 The correlation graph
5.4.2 The moral graph
5.5 Causal effect identification
5.5.1 Adjustment sets
5.5.2 Instrumental variables
5.6 Testable implications
6 Acknowledgements
7 Legal notice
8 Bundled libraries
9 Bundled examples
10 Author contact

1 Introduction

DAGitty is web-based software for analyzing causal diagrams. It contains some of the fastest algorithms available for this purpose.

This manual describes how causal diagrams can be created (Section 3) and manipulated (Section 4) using DAGitty. In Section 5, DAGitty’s capabilities to analyze
causal diagrams are described. A brief introduction to causal diagrams is given in Section 2. Advanced users might also be interested in the R package ‘dagitty’ [19], which implements all functionality of the web-based software and more.

1.1 Citing DAGitty

Developing and maintaining DAGitty takes time and effort. If you publish research results obtained with the help of DAGitty, please consider giving us credit by citing our work. The main reference for DAGitty is our paper describing the accompanying R package [19], which is based on the same software library and therefore also serves as a reference for the web-based program. We have also published several research papers describing the specific algorithms used in DAGitty, such as those for identifying biasing paths [18], adjustment sets [21], and instrumental variables [22].

1.2 Running DAGitty online

There are two ways to run DAGitty: either from the internet or from your own computer. To run DAGitty online, visit the URL dagitty.net. DAGitty should run in every modern browser. Specifically, I expect it to work well on recent versions of Firefox, Chrome, Opera, and Safari as well as on Internet Explorer (IE) version 9.0 or later, which all support scalable vector graphics (SVG). If you encounter any problems, please send me an e-mail so I can fix them (my contact information is at the end of this manual). Keep in mind that DAGitty is used by hundreds of people per day from all over the world; these people all benefit if the problem you found is fixed, so please do consider investing the time to notify me if you encounter any bugs.

1.3 Installing DAGitty on your own computer

DAGitty can be “installed” on your computer for use without an internet connection. To do this, download the file dagitty.net/dagitty.zip, which is a ZIP archive containing DAGitty’s source. Unpack this ZIP file anywhere in your file system. To run DAGitty, just open the file dags.html in the unpacked folder.

Some features of DAGitty will not work in the offline version, because they are actually implemented on the web server. Currently, these features are:
• Exporting model drawings as PDF, JPEG or PNG files.
• Publishing models online.

1.4 Migrating from earlier versions of DAGitty

The following two issues are important for users of older DAGitty versions. New users can skip this section.
• The model code syntax has been completely changed in DAGitty version 3.0; the old syntax based on the DAG program by Sven Knüppel [6] was getting too limited to accommodate the new features that were being added. Therefore, I decided to switch to a very different, but much more extensible syntax closely based on the “dot” language used by graphviz. DAGitty will still be able to open model code from older versions (with one small caveat for very old code, see below) and will automatically convert this to the new syntax.
• Early versions of DAGitty supported only one exposure and one outcome variable. It has now been possible for quite some time to have more than one exposure and/or outcome variable. This means that the old model code convention, where the variable in the first line is the exposure and the variable in the second line is the outcome, no longer works. Hence, if you open a model created with an earlier version in DAGitty 2.0 or higher, exposure and outcome will appear like normal variables. To fix this, simply set exposure and outcome again and save the new model code.

2 A brief introduction to causal diagrams

In this section, we will briefly review what causal diagrams are and how they can be applied in empirical sciences. For a more detailed account, we recommend the book Causal Inference in Statistics: A Primer by Pearl, Glymour and Jewell [9], or the chapter Causal Diagrams in the Epidemiology textbook of Rothman, Greenland, and Lash [11]. Also take a look at the web page dagitty.net/learn/, where I am collecting several tutorials (some of them interactive) on specific DAG-related topics.

In Epidemiology, causal diagrams are also frequently called DAGs.¹ In a nutshell, a DAG is a graphical model that depicts a set of hypotheses about the causal process that generates a set of variables of interest. An arrow X → Y is drawn if there is a direct causal effect of X on Y. Intuitively, this means that the natural process determining Y is directly influenced by the status of X, and that altering X via external intervention would also alter Y. However, an arrow X → Y only represents that part of the causal effect which is not mediated by any of the other variables in the diagram. If one is certain that X does not have a direct causal influence on Y, then the arrow is omitted. This has two important implications: (1) arrows should follow time order, or else the diagram contradicts the basic principle that causes must precede their effects; (2) the omission of an arrow is a stronger claim than the inclusion of an arrow, because the presence of an arrow depicts merely the “causal null hypothesis” that X might have an effect on Y.

¹ The term “DAG” is somewhat confusing to computer scientists and mathematicians, for whom a DAG is simply an abstract mathematical structure without specific semantics attached to it.

Mathematically, the semantics of an arrow X → Y can be defined as follows. Given a DAG G and a variable Y in G, let $X_1, \ldots, X_n$ be all variables in G that have direct arrows $X_i \to Y$ (also called the parents of Y). Then G claims that the causal process determining the value of Y can be modelled as a mathematical function

$$Y := f(X_1, \ldots, X_n, \epsilon_Y),$$

where $\epsilon_Y$ (the “causal residual”) is a random variable that is jointly independent of all $X_i$.

For example, the sentence “smoking causes lung cancer” could be translated into the following simple causal diagram:

[Diagram: smoking → lung cancer]

We would interpret this diagram as follows: (1) the variable “smoking” refers to a person’s smoking habit prior to a later cancer disease status in that same person; (2) the natural process by which a person develops cancer might be influenced by the smoking habits of that person; (3) there exist no other variables that have a direct influence on both smoking habits and cancer.

A slightly more complex version of this diagram might look as follows:

[Diagram: smoking → tar deposit in lungs → lung cancer]

This diagram is about a person’s smoking habits at a time t1, the tar deposit in her lungs at a later time t2, and finally the development of lung cancer at an even later time t3. We claim that (1) the natural process which determines the amount of tar in the lungs is affected by smoking; (2) the natural process by which lung cancer develops is affected by the amount of tar in the lung; (3) the natural process by which lung cancer develops is not affected by the person’s smoking other than indirectly via the tar deposit; and finally (4) no variables having relevant direct influence on more than one variable of the diagram were omitted.

In an epidemiological context, we are often interested in the putative effect of a set of variables, called exposures, on another set of variables, called outcomes. A key question in Epidemiology (and many other empirical sciences) is: how can we infer the causal effect of an exposure on an outcome of interest from an observational study? Typically, a simple regression will not suffice due to the presence of confounding factors, which may lead to an over- or underestimation of the causal effect from the observed data. If the assumptions encoded in a given diagram hold, then it is sometimes possible to derive an identification strategy from that diagram, by which an unbiased estimate of a causal effect could be obtained from observed data. One example of an identification strategy is covariate adjustment. For example, consider the following diagram:

[Diagram: smoking → carry matches, smoking → cancer, and an arrow marked “?” from carry matches to cancer]

If we were to perform an association study on the relationship between carrying matches in one’s pocket and developing lung cancer, we would probably find a correlation between these two variables. However, as the above diagram indicates, this correlation would not imply that carrying matches in your pocket causes lung cancer: smokers are more likely to carry matches in their pockets, and also more likely to develop lung cancer. This is an example of a confounded association between two variables, which is mediated via the biasing path carry matches ← smoking → cancer (drawn in bold in the original figure). Now let us assume (unrealistically, and solely for didactic purposes) that the simplistic diagram above is an accurate representation of the process that generated our data. Under this assumption, if we were to adjust for smoking, e.g. by weighted averaging of separate effect estimates for smokers and non-smokers or by including smoking status as a covariate in a regression model, we would no longer find a correlation between carrying matches and lung cancer. In other words, adjustment for smoking would close the biasing path. In general, any set of covariates that closes all biasing paths (and does not open new ones or close causal paths in the process) is called an adjustment set. Adjustment sets will be explained in more detail in Section 5.5.1.
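To make the structural-equation reading of the matches example concrete, here is a small Python simulation. It is only a sketch: all probabilities (30% smokers, and so on) are made-up assumptions rather than data, and the simulated process is assumed to contain no arrow from carrying matches to cancer. Each variable is computed from its parents plus its own independent random draw, mirroring $Y := f(X_1, \ldots, X_n, \epsilon_Y)$. Stratifying on smoking, one of the forms of adjustment mentioned above, should make the crude association between matches and cancer disappear up to sampling noise. The function names `simulate` and `risk_difference` are hypothetical.

```python
import random


def simulate(n=200_000, seed=1):
    """Sample n people from hypothetical structural equations for the diagram
    smoking -> carry matches, smoking -> cancer (no arrow matches -> cancer).
    All probabilities are illustrative assumptions, not real estimates."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoking = rng.random() < 0.30                         # no parents
        matches = rng.random() < (0.80 if smoking else 0.10)  # f(smoking, own residual)
        cancer = rng.random() < (0.20 if smoking else 0.05)   # f(smoking, own residual)
        rows.append((smoking, matches, cancer))
    return rows


def risk_difference(rows):
    """P(cancer | carries matches) - P(cancer | no matches) within `rows`."""
    with_matches = [cancer for _, matches, cancer in rows if matches]
    without = [cancer for _, matches, cancer in rows if not matches]
    return sum(with_matches) / len(with_matches) - sum(without) / len(without)


rows = simulate()
crude = risk_difference(rows)
# Adjust by stratifying on smoking: compare matches vs. no matches within each stratum.
stratified = {
    smoker: risk_difference([r for r in rows if r[0] == smoker])
    for smoker in (False, True)
}
print(f"crude risk difference: {crude:.3f}")
for smoker, rd in stratified.items():
    label = "smokers" if smoker else "non-smokers"
    print(f"risk difference among {label}: {rd:.3f}")
```

With these assumed numbers, the crude risk difference comes out at roughly 0.10, while both stratum-specific differences are close to zero. If the equation for `cancer` also depended on `matches`, a difference would remain within the strata.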
In DAGitty we can distinguish between observed and unobserved (latent) variables. This distinction is important when it comes to identifying causal effects: if there are many unobserved variables in a DAG, then this can make identification difficult or impossible. A common situation is when one has so-called “latent confounding factors” affecting two variables of interest; often, one does not know all these confounding factors and just represents this situation as follows:

[Diagram: coffee ← U → smoking, where U is an unobserved variable]

Since this situation is so common, there is an abbreviated notation for it using a bi-directed arrow:

[Diagram: coffee ↔ smoking]

Importantly, this means that bi-directed edges do not represent reciprocal causation (which is impossible to represent in a DAG). A common use case is to depict unknown or unobserved confounders without specifying explicitly what those confounders are.

The purpose of DAGitty is to aid study design through devising identification strategies in (possibly complex) causal diagrams and, more generally, through the identification of causal and biasing paths as well as testable implications in a given diagram.

3 Loading, saving and sharing diagrams

This section covers the three basic steps of working with DAGitty: (1) loading a diagram; (2) manipulating the graphical layout of the diagram; and (3) saving the diagram. First of all, any causal diagram consists of vertices (variables) and arrows (direct causal effects). You can either create the diagram directly using DAGitty’s graphical user interface (explained in the next section), or prepare a textual diagram description in a word processor and then import this description into DAGitty. In addition, DAGitty contains some pre-defined examples that you can use to become familiar with the program and with DAGs in general. To do so, just select one of the pre-defined examples from the “Examples” menu.

3.1 DAGitty’s textual syntax for causal diagrams

The textual syntax in DAGitty is based on the ‘dot’ language by graphviz. In fact, many dot graphs should work directly in DAGitty without modifications, although most of the style attributes of the dot language are not supported by DAGitty. I believe it’s best to introduce the syntax by a series of examples. Let’s start by defining the example used in the introduction above.

dag{
  "carry matches" [exposure]
  cancer [outcome]
  smoking -> "carry matches"
  smoking -> cancer
  "carry matches" -> cancer
}

This example shows the three basic components of the syntax in action:
• The enclosing statement dag{ ... }, which is always there. The DAG can also be given a name like so: dag Smoking { ... }
• The variable (vertex) statements. These consist of a variable name and a list of options enclosed in square brackets. For instance, the options “exposure” and “outcome” set a variable to be an exposure or outcome, respectively. Other relevant options are “latent” (for unobserved variables) and “adjusted” (for variables that have been adjusted for in a statistical analysis). Variable names must be enclosed in double quotes if they contain spaces or other special characters, as for the variable “carry matches”.
• The edge statements. These consist of a source variable, an edge type (which can be ->, <-, or <->), and a target variable. As explained above, bi-directed edges (x<->y) are simply an equivalent shorthand for typing x<-u->y; u[latent].

These three syntax components are in fact enough to define any DAG. We are now going to define the same DAG in various different ways to showcase convenient features of the syntax that make DAG definitions more compact; it is not necessary to use any of these features, but they can save a lot of typing.

Variable statements can be omitted if the variable has no options, such as “smoking” in the above example. Every time a variable is used in an edge statement, that
variable is automatically added as if there had been a corresponding node statement without an option.

dag{
  "carry matches" [exposure]
  cancer [outcome]
  smoking -> "carry matches"
  smoking -> cancer
  "carry matches" -> cancer
}

White-space is optional and several statements can be combined on one line. For clarity, it is recommended to insert semicolons between different statements on the same line; however, this is not necessary. The following two versions are equivalent:

dag{
  "carry matches" [exposure]; cancer [outcome]
  smoking -> "carry matches"; smoking -> cancer; "carry matches" -> cancer
}

dag{
  "carry matches" [exposure] cancer [outcome]
  smoking -> "carry matches" smoking -> cancer "carry matches" -> cancer
}

Edge statements can be chained together such that entire paths can be defined at once:

dag{
  smoking -> "carry matches" -> cancer
  smoking -> cancer
}

Arrows can also be written in reverse orientation, which is quite convenient when used together with edge chaining:

dag{
  "carry matches" [exposure] ; cancer [outcome]
  cancer <- smoking -> "carry matches" -> cancer
}

Another very useful feature for short DAG descriptions is variable grouping using curly braces. This allows you to define several arrows at once like so:

dag{
  "carry matches" [exposure] ; cancer [outcome]
  smoking -> {"carry matches" cancer}
  "carry matches" -> cancer
}

The curly braces open a new scope in which a “subgraph” is defined. An arrow pointing to a subgraph means that there will be arrows to all variables in the subgraph, and the subgraph itself can also define its own internal arrows. This means that we can also write the above as:

dag{
  smoking -> {"carry matches" -> cancer}
}

To save even more typing, several option names can be abbreviated using single letters like so:

dag{
  "carry matches" [e] ; cancer [o]
  smoking -> {"carry matches" -> cancer}
}

As mentioned above, it is not necessary to use grouping or edge statement chaining; the only purpose of these tricks is to save some typing. In fact, once your textual syntax is entered in DAGitty, it will be converted back to a trivial form in which the variable and edge statements are all explicitly listed. (This is similar to what would happen in graphviz.)

The above examples covered only the structure of the DAG, but gave no layout information. A simple layout is automatically generated by DAGitty once you input a text description in which DAGitty cannot detect any layout coordinates. The coordinates are also updated when you move nodes around or bend edges. See Figure 1 for how the layout information is added to the variable statements. You could of course enter your own layout information manually into the text syntax as well.

3.2 Loading a model text

To load a textually defined diagram into DAGitty, simply copy and paste the textual description into the “Model code” text box. Then click on “Update DAG”. DAGitty will now generate a preliminary graphical layout for your diagram on the canvas, which may not yet look the way you intended, but can be freely modified.
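The conversion of chaining, reversed arrows and curly-brace grouping into an explicit list of edges, described in Section 3.1, can be sketched in Python. This is a minimal sketch and not DAGitty’s own parser: `tokenize`, `ModelParser` and `parse_model` are hypothetical names, and only a subset of the syntax is assumed (variable statements with options, `->`, `<-`, `<->`, chaining, nested `{ }` groups, optional semicolons, bare names without hyphens). Option abbreviations such as `e` and `o` are kept as written rather than expanded, and `<->` edges are kept as bi-directed rather than rewritten into a latent variable.

```python
import re

# One token: an arrow, a quoted name, an option list [...], a brace or
# semicolon, or a bare name. Whitespace between tokens is skipped.
_TOKEN = re.compile(r'\s*(<->|->|<-|"[^"]*"|\[[^\]]*\]|[{};]|[^\s{}\[\];"<>-]+)')
_ARROWS = ("->", "<-", "<->")


def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse near {text[pos:pos + 20]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class ModelParser:
    """Hypothetical mini-parser for a small subset of the DAGitty model syntax."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0
        self.options = {}   # variable name -> set of option names
        self.edges = set()  # ("->", source, target) or ("<->", a, b) with a < b

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def statements(self, closing=None):
        """Parse statements up to `closing`; return every variable mentioned."""
        members = []
        while self.peek() not in (None, closing):
            if self.peek() == ";":          # semicolons are optional separators
                self.take()
            else:
                members += self.chain()
        if closing is not None and self.take() != closing:
            raise ValueError("unbalanced braces")
        return members

    def operand(self):
        token = self.take()
        if token == "{":                    # a subgraph acts as one operand
            return self.statements("}")
        if token is None or token in _ARROWS or token in ("}", ";") or token.startswith("["):
            raise ValueError(f"expected a variable, got {token!r}")
        name = token.strip('"')
        opts = self.options.setdefault(name, set())
        if (self.peek() or "").startswith("["):
            # keep option names only: 'outcome,pos="2,0"' -> {"outcome", "pos"}
            opts.update(re.findall(r'(\w+)\s*(?:=\s*"[^"]*")?', self.take()[1:-1]))
        return [name]

    def chain(self):
        """Parse a -> b <- c ...; connect every left member to every right one."""
        left = self.operand()
        members = list(left)
        while self.peek() in _ARROWS:
            arrow, right = self.take(), self.operand()
            for a in left:
                for b in right:
                    if arrow == "->":
                        self.edges.add(("->", a, b))
                    elif arrow == "<-":
                        self.edges.add(("->", b, a))
                    else:
                        self.edges.add(("<->",) + tuple(sorted((a, b))))
            members += right
            left = right
        return members


def parse_model(code):
    """Return (options, edges) for the text between the outermost braces."""
    body = code[code.index("{") + 1 : code.rindex("}")]
    parser = ModelParser(tokenize(body))
    parser.statements()
    return parser.options, parser.edges


options, edges = parse_model(
    'dag{ "carry matches" [e]; cancer [o] smoking -> {"carry matches" -> cancer} }'
)
for kind, source, target in sorted(edges):
    print(source, kind, target)
```

The usage lines print the three edges of the smoking DAG in explicit form. Connecting every member on the left of an arrow to every member on the right is what makes A -> {Z -> E} expand to A -> Z, A -> E and Z -> E: the group’s own statements are parsed first, so its internal arrows are kept. Under these assumptions, the three model codes (a), (b) and (c) in Figure 1 yield the same edge set.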
3.3 Modifying the graphical layout of a diagram

To lay out the vertices and arrows of your diagram more clearly than DAGitty did, simply drag the vertices with your mouse on the canvas. You may notice that DAGitty modifies the information in the “Model code” field on the fly, and augments it with additional position information for each vertex. In general, all changes you make to your diagram within DAGitty are immediately reflected in the model code.

(a)
dag{
  E -> D
  A -> E
  A -> Z
  B -> Z
  B -> D
  Z -> E
  Z -> D
}

(b)
dag{
  A -> {Z E}
  B -> {Z D}
  Z -> {E D}
  E -> D
}

(c)
dag{
  A -> {Z -> E}
  B -> {Z -> D}
  E -> D
}

(d)
dag {
  A [pos="0,-2"]
  B [pos="2,-2"]
  D [outcome,pos="2,0"]
  E [exposure,pos="0,0"]
  Z [pos="1,-1"]
  A -> { E Z }
  B -> { D Z }
  E -> D
  Z -> { D E }
}

(e) [Rendered layout of (d): A and B in the top row, Z in the middle, E and D in the bottom row.]

Figure 1: Example for a textual model definition with DAGitty using (a) simple model code; (b) shorter model code; and (c) very short model code of the graph in (a) using grouping operations. (d) When the diagram is edited within DAGitty, the variable statements (which hold each variable’s name and status) are augmented with layout coordinates, given as the pos option of the corresponding node. (e) Graphical layout corresponding to (d).

3.4 Saving the diagram

To save your diagram locally, just copy and paste the contents of the “Model code” field to a text file, and save that file locally on your computer.² When you wish to continue working on the diagram, copy the model code back into DAGitty as explained above.

² This is most easily done by clicking in the text field, pressing “CTRL + A” to select the entire content of the text field, then pressing “CTRL + C” to copy the content. You can then paste the content into another program using “CTRL + V”.

3.5 Exporting the diagram

DAGitty can export the diagram as a PDF or SVG vector graphic (publication quality) or a JPEG or PNG bitmap graphic (e.g. for inclusion in PowerPoint). Select the corresponding function from the “Model” menu. If you want to edit the graphical layout of the diagram or annotate it, it is recommended to export the diagram as an SVG file and open that in a vector graphics program such as Inkscape.

3.6 Publishing diagrams online

Part of the appeal of using DAGs is that the assumptions underlying one’s research are made explicit, and the conclusions drawn from the data can be re-checked later if some of the assumptions are found not to hold. Of course, this requires making the DAG available together with the data and interpretation. I have, however, seen many articles where people report having used DAGs but do not actually show them. If researchers, reviewers or editors deem it inappropriate to include the DAG (or its model code) in the manuscript itself, here’s another option: store the DAG on the DAGitty website and get a short URL under which this DAG will be accessible. Then include this URL in the manuscript, or its supporting information. For example, one of the DAGitty examples is stored at the URL dagitty.net/mvcFQ.
Here’s how it works: draw your DAG until you are fully satisfied with it, then choose “Publish on dagitty.net” from the “Model” menu. There are two ways to publish your DAG: anonymously, or linked to an e-mail address. If you store the DAG anonymously, you will not be able to edit it or delete it from the server later on.

After you choose “Publish on dagitty.net” from the “Model” menu, a small form will appear where you can enter some metadata on the DAG, and provide your e-mail address if you so wish. Upon clicking “Publish”, the DAG will be sent to the dagitty.net server, and you will receive a URL under which the DAG is now available. If you provided your e-mail address, you will also receive a message requesting you to confirm your ownership of the DAG. This is simply done by clicking on a confirmation link. Only then will the DAG be linked to your e-mail address, and you will receive a password to use when deleting or modifying the published DAG.

If you did link your DAG to your e-mail address, you can delete it by choosing “Delete on dagitty.net” from the “Model” menu, which will prompt you to enter the DAG’s URL and the password. If the URL and password match, the DAG will be deleted. Similarly, you can update a stored DAG by loading it with the “Load from dagitty.net” function from the “Model” menu, modifying it, and saving it again. You can of course view published DAGs (if you know their URL) by just entering the URL into your browser’s address bar, but you can also do so using the “Load from dagitty.net” function.

Please note that all DAGs stored on dagitty.net are meant to be public information. Do not store any data that you consider private or in any way secret. Once a DAG is stored on dagitty.net, every person in the world who knows its URL can view it (but not your e-mail address if you provided one). Also note that there is no guarantee that dagitty.net will keep running forever. Storing your DAGs is done at your own risk. Still, you may find this feature useful, for instance to e-mail your DAGs to colleagues or to include links to DAGs in papers under review. For archival purposes, it may be more appropriate to include the DAG or the model code in the paper itself or its supporting information.

4 Editing diagrams using the graphical user interface

You are free to make changes directly to the textual description of your diagram, which will be reflected on the canvas the next time you click on “Update DAG”. However, you can also create, modify, and delete vertices and arrows graphically using the mouse.

4.1 Creating a new diagram

To create a new diagram, select “New Model” from the “Model” menu. You will be asked for the names of the exposure and the outcome variable, and an initial model containing just those variables and an arrow between them will be drawn. Then you can add variables and arrows to the model as explained below.

4.2 Adding new variables

To add a new variable to the model, double-click on a free space in the canvas (i.e., not on an existing variable) or press the “n” key. A dialog will pop up asking you for the name of the new variable. Enter the name into the dialog and press the enter key or click “OK”. If you click “Cancel”, no new variable will be created.

4.3 Renaming variables

To rename an existing variable, move the mouse pointer over that variable and hit the “r” key. A dialog will pop up allowing you to change the variable name.

4.4 Setting the status of a variable

Variables can have one of the following statuses:
• Exposure
• Outcome
• Adjusted
• Selected
• Unobserved (latent)
• Other

You can change a variable’s status by clicking on the variable and using the checkboxes in the “Variable” field. There are also keyboard shortcuts available. For example, to turn a variable into an exposure, move the mouse pointer over that variable and hit the “e” key; for an outcome, hit the “o” key instead. To toggle whether a variable is observed or unobserved, hit the “u” key; to toggle whether it is adjusted, hit the “a” key. Changing the status of variables may change the colors of the diagram vertices to reflect the new structure and information flow in the diagram (see below).

At present, the statuses are mutually exclusive; e.g., a variable cannot be both unobserved and adjusted or
both exposure and unobserved. This could change in future versions of DAGitty.

4.5 Adding new arrows

To add a new arrow, double-click first on the source vertex (which will become highlighted) and then on the target vertex. The arrow will be inserted. If an arrow already existed in the opposite direction, then a bi-directed arrow will be created. If a bi-directed arrow already existed, then it will be deleted. This means it is currently not possible to have both a directed and a bi-directed arrow between the same variables. If you want to represent such structures, please represent the bi-directed arrow x ↔ y explicitly as x ← u → y, where u is a latent variable. (Remember that x ↔ y is simply a shorthand for x ← u → y.)

Instead of double-clicking on a vertex, you can also move the mouse pointer over the vertex and press the “c” key. Arrows are drawn as straight lines by default, but you can change that by moving the mouse pointer to the line, pressing and holding down the left mouse button, and “bending” the line by dragging as appropriate.

4.6 Deleting variables

To delete a variable, move the mouse pointer over that variable and hit the “del” key on your keyboard, or alternatively the “d” key (the latter comes in handy if you’re on a Mac, which has no real delete key). All arrows connected to that variable will be deleted along with the variable. In contrast to DAGitty versions prior to 2.0, all variables can now be deleted, including exposure and outcome.

4.7 Deleting arrows

An arrow is deleted just as it was inserted, i.e., by double-clicking first on the start variable and then on the target variable. An arrow is also removed automatically if a new one is inserted in the opposite direction; the two are then replaced by a bi-directed arrow (see Section 4.5).

4.8 Choosing the style of display

At present, you can choose between two DAG diagram styles: “classic”, where nodes and their labels are separate from each other, and “SEM-like”, where labels are inside nodes. Both have their advantages and disadvantages. By the way, “SEM” refers to structural equation modeling.

5 Analyzing diagrams

5.1 Paths

Causal diagrams contain two different kinds of paths between exposure and outcome variables.
• Causal paths start at the exposure, contain only arrows pointing away from the exposure, and end at the outcome. That is, they have the form $e \to x_1 \to \cdots \to x_k \to o$.
• Biasing paths are all other paths from exposure to outcome. For example, such paths can have the form $e \leftarrow x_1 \to \cdots \to x_k \to o$.

With respect to a set Z of conditioning variables (which can also be empty if we are not conditioning on anything), paths can be either open or closed (also called d-separated [8]). A path is closed by Z if at least one of the following holds:
• The path p contains a chain x → m → y or a fork x ← m → y such that m is in Z.
• The path p contains a collider x → c ← y such that c is not in Z and furthermore, Z does not contain any successor (descendant) of c in the graph.

Otherwise, the path is open. The above criteria imply that paths consisting of only one arrow are always open, no matter the content of Z. It is also possible for a path to be closed with respect to the empty set Z = {}; by the second criterion, this happens exactly when the path contains a collider.

5.2 Coloring

It is not easy to verify by hand which paths are open and which paths are closed, especially in larger diagrams. DAGitty highlights all arrows lying on open biasing paths in red and all arrows lying on open causal paths in green. This highlighting is optional and is controlled via the “highlight causal paths” and “highlight biasing paths” checkboxes.

5.3 Effect analysis

As mentioned above, arrows in DAGs represent direct effects. That is, in a DAG with three variables X, M, and Y, an arrow X → Y means that there is a causal effect of X on Y that is not mediated through the variable M. When building DAGs, people often forget this aspect and think only about whether any kind of causal effect exists, without paying attention to how it is mediated. This may result in DAGs with too many arrows.
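One way to find candidate superfluous arrows mechanically is to ask, for each arrow X → Y, whether Y can still be reached from X along directed paths once that particular arrow is removed. The Python sketch below does this with a depth-first search. It illustrates the idea under the assumption that the DAG is given as a plain list of directed edges (bi-directed edges are ignored); it is not DAGitty’s implementation, and `reachable_without` and `classify_arrows` are hypothetical names. Arrows without such an alternative directed path correspond to what DAGitty calls atomic direct effects.

```python
from collections import defaultdict


def reachable_without(children, source, target, skipped_edge):
    """True if `target` can be reached from `source` along directed edges
    without using `skipped_edge` (a (from, to) pair)."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        for child in children[node]:
            if (node, child) == skipped_edge or child in seen:
                continue
            if child == target:
                return True
            seen.add(child)
            stack.append(child)
    return False


def classify_arrows(edges):
    """Split arrows into those with no alternative directed path between their
    endpoints ("atomic") and those that also have an indirect pathway."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)
    atomic, with_indirect = [], []
    for edge in edges:
        if reachable_without(children, edge[0], edge[1], edge):
            with_indirect.append(edge)
        else:
            atomic.append(edge)
    return atomic, with_indirect


# The X, M, Y example: X -> M -> Y plus a direct arrow X -> Y.
atomic, with_indirect = classify_arrows([("X", "M"), ("M", "Y"), ("X", "Y")])
print("atomic direct effects:", atomic)                # [('X', 'M'), ('M', 'Y')]
print("arrows with an indirect path:", with_indirect)  # [('X', 'Y')]
```

Each arrow costs one graph search, so the whole check is quadratic in the number of arrows in the worst case, which is unproblematic for diagrams of the size usually drawn by hand.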
To aid users with this, George Ellison (Leeds University) suggested implementing a function that identifies arrows for which a corresponding indirect pathway also exists. After drawing an initial DAG, one might reconsider these arrows and judge whether they are really necessary given the indirect pathways already present in the diagram.
For example, suppose that after thinking about the pairwise causal relationships between our variables X, M, Y we came up with this DAG:

[Diagram: X → M, M → Y, and X → Y; the arrows X → M and M → Y are drawn in bold.]

For the arrows drawn in bold, there is no corresponding indirect path – removing one of these arrows from the diagram means that there will no longer be any causal effect between the corresponding variables. These arrows are called atomic direct effects in DAGitty, and they can be highlighted – like in the above DAG – by ticking the checkbox with that name. On the other hand, for the thin arrow X → Y, there is also the indirect pathway X → M → Y. One may therefore reconsider whether the arrow X → Y is truly necessary – perhaps the causal effect from X to Y is entirely mediated through M.

5.4 View mode
There are several ways to transform a given DAG such that it becomes better suited for a particular purpose. We call such a transformed DAG a derived graph. Currently DAGitty can display two kinds of derived graphs: correlation graphs and moral graphs. These derived graphs can be shown by clicking on the respective radio button in the “View mode” field on the left-hand side of the screen.

5.4.1 The correlation graph
The correlation graph is not a DAG, but a simple graph with lines instead of arrows. It connects each pair of variables that, according to the diagram, could be statistically dependent. In other words, variables not connected by a line in the correlation graph must be statistically independent. These pairwise independencies are also listed in the “Testable implications” field on the right-hand side of the screen, so the correlation graph can be seen as encoding a subset of those implications.
Although this is not implemented in DAGitty yet, it is also possible to take a given correlation graph (which can be obtained e.g. by thresholding a covariance matrix) and list all the DAGs that are “compatible” with it in the sense that they entail exactly the given correlation graph [17].

5.4.2 The moral graph
To identify minimal sufficient adjustment sets, DAGitty uses the so-called “moral graph”, which results from a transformation of the model to an undirected graph. This procedure is also highly recommended if you wish to verify the calculation by hand. See the nice explanation by Shrier and Platt [14] for details on this procedure. In DAGitty, you can switch between display of the model and its moral graph by choosing “moral graph” in the “View mode” section on the left-hand side of the page.

5.5 Causal effect identification
Some of the most important features of DAGitty are concerned with the question: how can causal effects be estimated from observational data? Currently, two types of causal effect identification are supported: adjustment sets and instrumental variables.

5.5.1 Adjustment sets
Finding sufficient adjustment sets is one main purpose of DAGitty. In a nutshell, a sufficient adjustment set Z is a set of covariates such that adjustment, stratification, or selection (e.g. by restriction or matching) will minimize bias when estimating the causal effect of the exposure on the outcome (assuming that the causal assumptions encoded in the diagram hold). You can read more about controlling bias and confounding in Pearl’s textbook, Section 3.3 and the Epilogue [8]. Moreover, Shrier and Platt [14] give a nice step-by-step tutorial on how to test whether a set of covariates is a sufficient adjustment set.
To identify adjustment sets, the diagram must contain at least one exposure and at least one outcome.

Total and direct effects. One can understand adjustment sets graphically by viewing an adjustment set as a set Z that closes all biasing paths while keeping the desired causal paths open (see previous section). DAGitty considers two kinds of adjustment sets:

• Adjustment sets for the total effect are sets that close all biasing paths and leave all causal paths open. In the literature, if the effect is not mentioned (e.g. [14, 6]), then usually this kind of adjustment set is meant.
[Figure 2: causal diagram with nodes X, Y, M, C1 and C2.]
Figure 2: A causal diagram where the total and direct effects of exposure X on outcome Y are not equal. The total effect is the effect mediated via all thick (both dashed and solid) arrows, while the direct effect is the effect mediated only via the thick solid arrow.

• Adjustment sets for the direct effect are sets that close all biasing paths and all causal paths, and leave only the direct arrow from exposure X to outcome Y (i.e., the path X → Y, if it exists) open.

In a diagram where the only causal path between exposure and outcome is the path X → Y, the total effect and the direct effect are equal. This is true e.g. for the diagram in Figure 1. An example diagram where the direct and total effects are not equal is shown in Figure 2.
As proved by Lauritzen et al. [7] (see also Tian et al. [20]), when identifying sufficient adjustment sets it suffices to restrict our attention to the part of the model that consists of the exposure, the outcome, and their ancestors. DAGitty indicates this by coloring irrelevant nodes in gray. The relevant variables are colored according to which node they are ancestors of (exposure, outcome, or both) – see the legend on the left-hand side of the screen. The highlighting may be turned on and off by toggling the “highlight ancestors” checkbox.

Minimal sufficient adjustment sets. A minimal sufficient adjustment set is a sufficient adjustment set of which no proper subset is itself sufficient. For example, consider again the causal diagram in Figure 1. The following three sets are sufficient adjustment sets for the total and direct effects, which are equal in this case:

{A, B, Z}
{A, Z}
{B, Z}

Each of these sets is sufficient because it closes all biasing paths and leaves the causal path open. The sets {A, Z} and {B, Z} are minimal sufficient adjustment sets, while the set {A, B, Z} is sufficient but not minimal. In contrast, the set {Z} is not sufficient, since it would open the path E ← A → Z ← B → D: because Z is a collider on this path (a common effect of A and B), adjusting for Z alone will induce additional correlation between E and D.

Finding minimal sufficient adjustment sets. To find minimal sufficient adjustment sets, select the option “Adjustment (total effect)” or “Adjustment (direct effect)” in the “Causal effect identification” field. DAGitty will then calculate all minimal sufficient adjustment sets and display them in that field. Any changes made to the diagram will be instantly reflected in the list of adjustment sets.

Forcing adjustment for specific covariates. You can also tell DAGitty that you wish a specific covariate to be included in every adjustment set. To do this, move the mouse over the vertex of that covariate and press the a key. DAGitty will then update the list of minimal sufficient adjustment sets accordingly – every set displayed is now minimal in the sense that removing any variable except those you specified will render that set insufficient. However, when you force adjustment for an intermediate or another descendant of the exposure, DAGitty will tell you that it is no longer possible to find a valid adjustment set.

Avoiding adjustment for unobserved covariates. You can tell DAGitty that a certain variable is unobserved (e.g. not measured at present, or not measurable because it is a latent variable) by moving the mouse over that covariate and pressing the u key. DAGitty will then only calculate adjustment sets that do not contain unobserved variables. However, if too many or some important variables are unobserved, then it may be impossible to close all biasing paths.

5.5.2 Instrumental variables
Sometimes it is not possible to estimate a causal effect by simple covariate adjustment. For example, this is the case whenever there is an unobserved confounder that directly affects both the exposure and the outcome. However, this does not necessarily mean that it is impossible to estimate the causal effect at all. Instrumental variable regression is a technique that is often used in situations with unobserved confounders. Note that this technique depends on linearity assumptions. For further information on instrumental variables, please refer to the literature [2, 5]. DAGitty can find instrumental variables in DAGs, as explained below.
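The instrumental variable idea described above (an unobserved confounder plus a linear model) can be illustrated with a small simulation. The sketch below is not DAGitty code: it assumes a hypothetical linear model in which an instrument I affects the exposure X, an unobserved confounder U affects both X and Y, and the true effect of X on Y is beta = 0.5. All coefficients and the helper names `simulate` and `cov` are made up for illustration. The naive regression slope of Y on X is biased by U, whereas the simple ratio estimator Cov(I, Y) / Cov(I, X) (the Wald estimator, the single-instrument special case of IV regression) recovers beta up to sampling error, but only because the simulated model is linear.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists of floats."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def simulate(n=50_000, beta=0.5, seed=1):
    """Draw n samples from a hypothetical linear model with arrows
    I -> X -> Y and an unobserved confounder U -> X, U -> Y."""
    rng = random.Random(seed)
    inst, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                   # unobserved confounder
        i = rng.gauss(0, 1)                   # instrument: no arrow into Y or U
        xi = 0.8 * i + u + rng.gauss(0, 1)    # exposure
        yi = beta * xi + u + rng.gauss(0, 1)  # outcome; true effect of X is beta
        inst.append(i)
        x.append(xi)
        y.append(yi)
    return inst, x, y

inst, x, y = simulate()
naive = cov(x, y) / cov(x, x)     # regression slope of Y on X, biased by U
iv = cov(inst, y) / cov(inst, x)  # single-instrument (Wald) IV estimate
print(f"naive: {naive:.3f}  IV: {iv:.3f}  true: 0.5")
```

In this model Var(X) = 0.8² + 1 + 1 = 2.64 and Cov(X, Y) = 0.5 · Var(X) + Cov(X, U) = 2.32, so the naive slope tends to 2.32 / 2.64 ≈ 0.88 instead of 0.5, while the IV ratio tends to (0.5 · 0.8) / 0.8 = 0.5.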
The validity of an instrumental variable I depends on two causal conditions – exogeneity and exclusion restriction. These two conditions can be expressed in the language of DAGs and paths as follows: (1) there must be an open path between I and the exposure X; and (2) all paths between I and the outcome Y must be closed in a modified graph where all edges out of X are removed. A variable that fulfills these two conditions is called an instrumental variable or simply an instrument.
Instrumental variables can also be generalized such that the two conditions are required to hold conditional on a set of covariates Z [3]. The two conditions then read as follows: (1) there must be a path between I and X that is open given Z; and (2) all paths between I and Y must be closed by Z in a modified graph where all edges out of X are removed. A variable that fulfills these two conditions is called a conditional instrument.
DAGitty will find both “classic” and conditional instruments when the option “Instrumental Variable” is selected under the “Causal effect identification” field. Note that DAGitty will not always list all possible instruments; instead, it restricts itself to a certain well-defined subset that we call “ancestral instruments”. However, whenever any instrument or conditional instrument exists at all, DAGitty is guaranteed to find one. Note also that if several instruments are available, it is best to choose the one that is most strongly correlated with X (conditional on Z in the case of a conditional instrument).
For details regarding ancestral instruments and how DAGitty computes them, please refer to the research paper where we describe these methods [22].

5.6 Testable implications
Any implications that are obtained from a causal diagram, such as possible adjustment sets or instrumental variables, are of course dependent on the assumptions encoded in the diagram. To some extent, these assumptions can be tested via the (conditional) independences implied by the diagram: if two variables X and Y are d-separated by a set Z, then X and Y should be conditionally independent given Z. The converse is not true: two variables X and Y can be independent given a set Z even though they are not d-separated in the diagram. Furthermore, two variables can also be d-separated by the empty set Z = ∅. In that case, the diagram implies that X and Y are unconditionally independent.
DAGitty displays all minimal testable implications in the “Testable implications” text field. Only implications that are in fact testable are displayed, i.e., those that do not involve any unobserved variables. Note that the set of testable implications displayed by DAGitty does not constitute a “basis set” [8]. Future versions will allow choosing between different basis sets.
In general, the fewer arrows a diagram contains, the more testable predictions it implies. For this reason, “simpler” models with fewer arrows are in general easier to falsify (Occam’s razor).

6 Acknowledgements
I would like to thank my collaborators Maciej Liśkiewicz and Benito van der Zander (both at the Institute for Theoretical Computer Science, University of Lübeck, Germany) for our collaborations on developing efficient algorithms to analyze causal diagrams.
I also thank Michael Elberfeld, Juliane Hardt, Sven Knüppel, Keith Marcus, Judea Pearl, Sabine Schipf, and Felix Thoemmes (in alphabetical order) for enlightening discussions (either in person, by e-mail, or on the SEMnet discussion list) about DAGs that made this program possible. Furthermore, I thank Robert Balshaw, George Ellison, Marlene Egger, Angelo Franchini, Ulrike Förster, Mark Gilthorpe, Dirk van Kampen, Jeff Martin, Jillian Martin, Karl Michaëlsson, David Tritchler, Eric Vittinghof, and other users for sending feedback and bug reports that greatly helped to improve DAGitty.
The development of DAGitty was sponsored by funding from the Institute of Genetics, Health and Therapeutics at Leeds University, UK. I thank George Ellison for arranging this generous support.

7 Legal notice
Use of DAGitty is (and will always be) freely permitted and free of charge. You can download DAGitty’s source code from github.com/jtextor/dagitty. The source code is available under the GNU General Public License (GPL), either version 2.0, or any later version, at the licensee’s choice; see the file LICENSE.txt in the download archive for details. In particular, the GPL permits you to modify and redistribute the source as you please as long as the result itself remains under the GPL.

8 Bundled libraries
DAGitty ships with the JavaScript library Prototype.js, a framework that makes life with JavaScript much easier. Only some parts of Prototype (mainly those focusing on data structures) are included to keep
the code small. Prototype is developed by the Prototype Core Team and licensed under the MIT license [16].
Furthermore, DAGitty contains some modified code from the Dracula Graph Library by Philipp Strathausen, which is also licensed under the MIT license [15].
I am grateful to the authors of these libraries for their valuable work.

9 Bundled examples
DAGitty contains some built-in examples for didactic and illustrative purposes. Some of these examples are taken from published papers or talks given at scientific meetings. These are, in reverse chronological order:
• van Kampen, 2014 [23]
• Polzer et al., 2012 [10]
• Schipf et al., 2010 [12]
• Didelez et al., 2010 [4]
• Shrier & Platt, 2008 [14]
• Sebastiani et al.³, 2005 [13]
• Acid & de Campos, 1996 [1]
Another example was provided by Felix Thoemmes via personal communication (2013).

10 Author contact
I would be glad to receive feedback from those who use DAGitty for research or educational purposes. Also, you are welcome to send me your suggestions or requests for features that you miss in DAGitty.

Johannes Textor
Data Science group
Radboud University
Nijmegen, The Netherlands
[email protected]
johannes-textor.name
Mastodon: @johannes textor

³ The example actually shows only a small part of their DAG.

References
[1] Silvia Acid and Luis M. De Campos. An algorithm for finding minimum d-separating sets in belief networks. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, pages 3–10, 1996.
[2] Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin. Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–55, 1996.
[3] Carlos Brito and Judea Pearl. Generalized instrumental variables. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pages 85–93, 2002.
[4] Vanessa Didelez, Svend Kreiner, and Niels Keiding. Graphical models for inference under outcome-dependent sampling. Statistical Science, 25(3), August 2010.
[5] Guido Imbens. Instrumental variables: An econometrician’s perspective. Statistical Science, 29(3):323–58, 2014.
[6] Sven Knüppel and Andreas Stang. DAG program: identifying minimal sufficient adjustment sets. Epidemiology, 21(1):159, 2010.
[7] Steffen L. Lauritzen, A. Philip Dawid, Birgitte N. Larsen, and Hanns-Georg Leimer. Independence properties of directed Markov fields. Networks, 20(5):491–505, 1990.
[8] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition, 2009.
[9] Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell. Causal Inference in Statistics: A Primer. Wiley, New York, NY, USA, 1st edition, 2016.
[10] Ines Polzer, Christian Schwahn, Henry Völzke, Torsten Mundt, and Reiner Biffar. The association of tooth loss with all-cause and circulatory mortality. Is there a benefit of replaced teeth? A systematic review and meta-analysis. Clinical Oral Investigations, 16(2):333–351, 2012.
[11] Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash. Modern Epidemiology. Wolters Kluwer, 2008.
[12] Sabine Schipf, Robin Haring, Nele Friedrich, Matthias Nauck, Katharina Lau, Dietrich Alte, Andreas Stang, Henry Völzke, and Henri Wallaschofski. Low total testosterone is associated with increased risk of incident type 2 diabetes mellitus in men: Results from the Study of Health in Pomerania (SHIP). The Aging Male, 14(3):168–75, 2011.
[13] Paola Sebastiani, Marco F. Ramoni, Vikki Nolan, Clinton T. Baldwin, and Martin H. Steinberg. Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nature Genetics, 37:435–40, 2005.
[14] Ian Shrier and Robert W. Platt. Reducing bias through directed acyclic graphs. BMC Medical Research Methodology, 8(70), 2008.
[15] Philipp Strathausen. Dracula graph layout and drawing framework. http://www.graphdracula.net, 2010.
[16] Prototype Core Team. Prototype – JavaScript library. http://www.prototypejs.org, 2010.
[17] Johannes Textor, Alexander Idelberger, and Maciej Liśkiewicz. Learning from pairwise marginal independencies. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, pages 882–91. AUAI Press, 2015.
[18] Johannes Textor and Maciej Liśkiewicz. Adjustment criteria in causal diagrams: an algorithmic perspective. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, pages 681–88. AUAI Press, 2011.
[19] Johannes Textor, Benito van der Zander, Mark S. Gilthorpe, Maciej Liśkiewicz, and George TH Ellison. Robust causal inference using directed acyclic graphs: the R package ‘dagitty’. International Journal of Epidemiology, 45(6):1887–1894, December 2016.
[20] Jin Tian, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Technical Report R-254, UCLA, 1998.
[21] Benito van der Zander, Maciej Liśkiewicz, and Johannes Textor. Constructing separators and adjustment sets in ancestral graphs. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages 907–16. AUAI Press, 2014.
[22] Benito van der Zander, Johannes Textor, and Maciej Liśkiewicz. Efficiently finding conditional instruments for causal inference. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), pages 3243–49. AAAI Press, 2015.
[23] Dirk van Kampen. The SSQ model of schizophrenic prodromal unfolding revised: An analysis of its causal chains based on the language of directed graphs. European Psychiatry, 29(7):437–48, 2014.
