0% found this document useful (0 votes)

40 views11 pages

Causal Inference: 36-350, Data Mining, Fall 2009 4 December 2009

The document discusses two problems in causal inference: 1) estimating causal effects given a known causal structure, and 2) discovering causal structure given observational data. For the first problem, the key is controlling for confounding variables to isolate the causal effect. Two sufficient criteria for doing so are the back-door criterion and front-door criterion. For the second problem, discovering causal structure involves testing for (in)dependence relationships between variables and using the results to orient causal edges in the graph.

Uploaded by

machinelearner

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views11 pages

Causal Inference: 36-350, Data Mining, Fall 2009 4 December 2009

Uploaded by

machinelearner

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Causal Inference

36-350, Data Mining, Fall 2009

4 December 2009

Contents
1 Estimating Causal Effects with Known Structure 1

2 Discovering Causal Structure 4

2.1 Causal Discovery with Known Variables . . . . . . . . . . . . . . 4
2.2 Causal Discovery with Hidden Variables . . . . . . . . . . . . . . 7
2.3 Note on Conditional Independence Tests . . . . . . . . . . . . . . 7
2.4 Limitations on Consistency of Causal Discovery . . . . . . . . . . 7

3 Exercises 9

A Pseudocode for the SGS Algorithm 10

There are two problems which are both known as “causal inference”:
1. Given the causal structure of a system, estimate the effects the variables
have on each other.
2. Given data about a system, find its causal structure.
The first problem is easier, so we’ll begin with it.

1 Estimating Causal Effects with Known Struc-

ture
Suppose we want to estimate the causal effect of X on Y , Pr (Y |do(X = x)).
If we can actually manipulate the system, then the statistical problem is triv-
ial: set X to x, measure Y , and repeat often enough to get an estimate of the
distribution. (As my mother says, “Why think when you can just do the exper-
iment?”) So suppose we can’t do the experiment. We can however get the joint
distribution Pr (X, Y, Z) for some collection of covariates Z, and we know the
causal graph. Is this enough to determine Pr (Y |do(X = x))? That is, does the
joint distribution identify the causal effect?

1
X Y

Figure 1: “Controlling for” additional variables can introduce bias into esti-
mates of causal effects. Here the effect of X on Y is directly identifiable,
Pr (Y |do(X = x)) = Pr (Y |X = x). If we also condition on Z however, because
it is a common effect of X and Y , we’d get Pr (Y |X = x, Z = z) 6= Pr (Y |X = x).
In fact, even if there were no arrow from X to Y , conditioning on Z would make
Y depend on X.

The answer is “yes” when the covariates Z contain all the other relevant
variables. The inferential problem is then trivial again, or at least no worse than
any other statistical estimation problem. In fact, if we know the causal graph
and get to observe all the variables, then we could (in principle) just use our
favorite non-parametric conditional density estimate at each node in the graph,
with its parent variables as the inputs and its own variable as the response.
Multiplying conditional distributions together gives the whole distribution of
the graph, and we can get any causal effects we want by surgery. If we’re
willing to assume a bit more, we can get away with just using non-parametric
regression or even just an additive model at each node. Assuming yet more, we
could use parametric models at each node; the linear-Gaussian assumption is
(alas) very popular.
If some variables are not observed, then the issue of which causal effects are
observationally identifiable is considerably trickier. Apparently subtle changes
in which variables are available to us and used can have profound consequences.
The basic principle underlying all considerations is that we would like to
condition on adequate control variables, which will block paths linking X and
Y other than those which would exist in the surgically-altered graph where all
paths into X have been removed. If other unblocked paths exist, then there is
some confounding of the causal effect of X on Y with their mutual association
with third parties. Just conditioning on everything possible does not give us
adequate control, or even necessarily bring us closer to it (Figure 1 and Exercise
1).
There are two main sufficient criteria we can use to get adequate control;

2
they are called the back-door criterion and the front-door criterion.
If we want to know the effect of X on Y and have a set of variables S as the
control, then S satisfies the back-door criterion if (i) S blocks every path from
X to Y that has an arrow into X (“blocks the back door”), and (ii) no node in
S is a descendant of X. Then
X
Pr (Y |do(X = x)) = Pr (Y |X = x, S = s) Pr (S = s) (1)
s

Notice that all the items on the right-hand side are observational conditional
probabilities, not counterfactuals.
On the other hand, S satisfies the front-door criterion when (i) S blocks all
directed paths from X to Y , (ii) there are no unblocked back-door paths from
X to S, and (iii) X blocks all back-door paths from S to Y . Then
X X
Pr (Y |do(X = x)) = Pr (S = s|X = x) Pr (Y |X = x0 , S = s) Pr (X = x0 )
s x0
(2)
A natural reaction to the front-door criterion is “Say what?”, but it becomes
more comprehensible if we take apart its. Because, by clause (i), S blocks all
directed paths from X to Y , any causal dependence of Y on X must be mediated
by a dependence of Y on S:
X
Pr (Y |do(X = x)) = Pr (Y |do(S = s)) Pr (S = s|do(X = x))
s

Clause (ii) says that we can estimate the effect of X on S directly,

Pr (S = s|do(X = x)) = Pr (S = s|X = x) .
Clause (iii) say that X satisfies the back-door criterion for estimating the effect
of S on Y , and the inner sum in Eq. 2 is just the back-door estimate (Eq. 1) of
Pr (Y |do(S = s)). So really we are using the back door criterion. (See Figure
2.)
Both the back-door and front-door criteria are sufficient for estimating causal
effects from probabilistic distributions, but are not necessary. Necessary and
sufficient conditions for the identifiablity of causal effects are in principle pos-
sible but don’t have a nice snappy form (Pearl, 2009, §§3.4–3.5). A necessary
condition for un-identifiability, however, is the presence of an unblockable back-
door path from X to Y . However, this is not sufficient for lack of identification
— we might, for instance, be able to use the front door criterion, as in Figure
2.
When identification — that is, adequate control of confounding — is not
possible, it may still be possible to bound causal effects. That is, even if we
can’t say exactly that Pr (Y |do(X = x)) must be, we can still say it has to fall
within a certain (non-trivial!) range of possibilities. The development of bounds
for non-identifiable quantities, what’s sometimes called partial identification,
is an active area of research, which I think is very likely to work its way back
into data-mining; the best introduction is probably Manski (2007).

3
U

X Z Y

Figure 2: Illustration of the front-door criterion, after Pearl (2009, Figure 3.5).
X, Y and Z are all observed, but U is an unobserved common cause of both X
and Y . X ← U → Y is a back-door path confounding the effect of X on Y with
their common cause. However, all of the effect of X on Y is mediated through
X’s effect on Z. Z’s effect on Y is, in turn, confounded by the back-door path
Z ← X ← U → Y , but X blocks this path. So we can use back-door adjustment
to find Pr (Y |do(Z = z)), and directly find Pr (Z|do(X = x)) = Pr (Z|X = x),
and putting these together gives Pr (Y |do(X = x)).

2 Discovering Causal Structure

2.1 Causal Discovery with Known Variables
Causal discovery is silly with just one variable, and too hard with just two for
us.1
So let’s start with three variables, X, Y and Z. By testing for independence
and conditional independence, we could learn that there had to be edges be-
tween X and Y and Y and Z, but not between X and Z.2 But conditional
independence is a symmetric relationship, so how could we orient those edges,
give them direction? Well, there are only four possible directed graphs corre-
sponding to that undirected graph:
• X → Y → Z (a chain);

• X ← Y ← Z (the other chain);

1 But see Janzing (2007); Hoyer et al. (2009) for some ideas on how you could do it if

you’re willing to make some extra assumptions. The basic idea of these papers is that the
distribution of effects given causes should be simpler, in some sense, than the distribution of
causes given effects.
2 Remember that an edge between X and Y means that either X is a parent of Y , X → Y ,

or Y is a parent of X, X ← Y . Either way, the two variables will be dependent no matter

what collection of other variables we might condition on. If X Y |S for some set of variables
S, then, and only then, is there no edge between X and Y .

4
• X ← Y → Z (a fork on Y );
• X → Y ← Z ( a collision at Y )

|=
With the fork or either chain, we have X Z|Y . On the other hand, with

|=
the collider we have X 6 Z|Y . (This is where the assumption of faithfulness

|=
comes in.) Thus X 6 Z|Y if and only if there is a collision at Y . By testing for
this conditional independence, we can either definitely orient the edges, or rule
out an orientations. If X − Y − Z is just a subgraph of a larger graph, we can

|=
still identify it as a collider if X 6 Z| {Y, S} for all collections of nodes S (not
including X and Z themselves, of course).
With more nodes and edges, we can induce more orientations of edges
by consistency with orientations we get by identifying colliders. For example,
suppose we know that X, Y, Z is either a chain or a fork on Y . If we learn that
X → Y , then the triple cannot be a fork, and must be the chain X → Y → Z.
So orienting the X − Y edge induces an orientation of the Y − Z edge. We can
also sometimes orient edges through background knowledge; for instance we
might know that Y comes later in time than X, so if there is an edge between
them it cannot run from Y to X.3 We can eliminate other edges based on
similar sorts of background knowledge: men tend to be heavier than women,
but changing weight does not change sex, so there can’t be an edge (or even a
directed path!) from weight to sex.
Orienting edges is the core of the basic causal discovery procedure, the SGS
algorithm (Spirtes et al., 2001, §5.4.1, p. 82). This assumes:
1. The data-generating distribution has the causal Markov property on a
graph G.
2. The data-generating distribution is faithful to G.
3. Every member of the population has the same distribution.
4. All relevant variables are in G.
5. There is only one graph G to which the distribution is faithful.
Abstractly, the algorithm works as follows:
• Start with a complete undirected graph on all variables.
3 Some have argued, or at least entertained the idea, that the logic here is backwards: rather

than order in time constraining causal relations, causal order defines time order. (Versions
of this idea are discussed by, inter alia, Russell (1927); Wiener (1961); Reichenbach (1956);
Pearl (2009); Janzing (2007) makes a related suggestion). Arguably then using order in time
to orient edges in a causal graph begs the question, or commits the fallacy of petitio principii.
But of course every syllogism does, so this isn’t a distinctively statistical issue. (Take the
classic: “All men are mortal; Socrates is a man; therefore Socrates is mortal.” How can
we know that all men are mortal until we know about the mortality of this particular man,
Socrates? Isn’t this just like asserting that tomatoes and peppers must be poisonous, because
they belong to the nightshade family of plants, all of which are poisonous?) While these
philosophical issues are genuinely fascinating, this footnote has gone on long enough, and it
is time to return to the main text.

5
• For each pair of variables, see if conditioning on some set of variables
makes them conditionally independent; if so, remove their edge.
• Identify all colliders by checking for conditional dependence; orient the
edges of colliders.
• Try to orient undirected edges by consistency with already-oriented edges;
do this recursively until no more edges can be oriented.
Pseudo-code is in the appendix.
Call the result of the SGS algorithm G. b If all of the assumptions above hold,
and the algorithm is correct in its guesses about when variables are conditionally
independent, then G b = G. In practice, of course, conditional independence
guesses are really statistical tests based on finite data, so we should write the
output as Gb n , to indicate that it is based on only n samples. If the conditional
independence test is consistent, then

lim Pr G b n 6= G = 0
n→∞

In other words, the SGS algorithm converges in probability on the correct causal
structure; it is consistent for all graphs G. Of course, at finite n, the probability
of error — of having the wrong structure — is (generally!) not zero, but this
just means that, like any statistical procedure, we cannot be absolutely certain
that it’s not making a mistake.
One consequence of the independence tests making errors on finite data can
be that we fail to orient some edges — perhaps we missed some colliders. These
unoriented edges in G b n can be thought of as something like a confidence region
— they have some orientation, but multiple orientations are all compatible with
the data.4 As more and more edges get oriented, the confidence region shrinks.
If the fifth assumption above fails to hold, then there are multiple graphs
G to which the distribution is faithful. This is just a more complicated version
of the difficulty of distinguishing between the graphs X → Y and X ← Y . All
the graphs in this equivalence class may have some arrows in common; in
that case the SGS algorithm will identify those arrows. If some edges differ in
orientation across the equivalence class, SGS will not orient them, even in the
limit. In terms of the previous paragraph, the confidence region never shrinks
to a single point, just because the data doesn’t provide the information needed
to do this.
If there are unmeasured relevant variables, we can get not just unoriented
edges, but actually arrows pointing in both directions. This is an excellent sign
that some basic assumption is being violated.
The SGS algorithm is statistically consistent, but very computationally in-
efficient; the number of tests it does grows exponentially in the number of vari-
ables p. This is the worst-case complexity for any consistent causal-discovery
procedure, but this algorithm just proceeds immediately to the worst case, not
4 I say “multiple orientations” rather than “all orientations”, because picking a direction

for one edge might induce an orientation for others.

6
taking advantage of any possible short-cuts. A refinement, called the PC algo-
rithm, tries to minimize the number of conditional independence tests performed
(Spirtes et al., 2001, §5.4.2, pp. 84–88). There is actually an implementation
of the PC algorithm in R (PCalg on CRAN), but it assumes linear-Gaussian
models (Kalisch and Bühlmnann, 2007).

2.2 Causal Discovery with Hidden Variables

Suppose that the set of variables we measure is not causally sufficient. Could
we at least discover this? Could we possibly get hold of some of the causal rela-
tionships? Algorithms which can do this exist (e.g., the CI and FCI algorithms
of Spirtes et al. (2001, ch. 6)), but they require considerably more graph-fu.
The results of these algorithms can succeed in removing some edges between
observable variables, and definitely orienting some of the remaining edges. If
there are actually no latent common causes, they end up acting like the SGS or
PC algorithms.
There is not, so far as I know, any implementation of the CI or FCI al-
gorithms in R. The FCI and PC algorithms, along with some other, related
procedures, are implemented in the stand-alone Java program Tetrad (http:
//www.phil.cmu.edu/projects/tetrad/). It would be a Good Thing if some-
one were to re-implement these algorithms in R.

2.3 Note on Conditional Independence Tests

The abstract algorithms for causal discovery assume the existence of consistent
tests for conditional independence. The implementations known to me assume
either that variables are discrete (so that one can basically use the χ2 test), or
that they are continuous, Gaussian, and linearly related (so that one can test
for vanishing partial correlations). It bears emphasizing that these restrictions
are not essential. As soon as you have a consistent independence test, you
are, in principle, in business. In particular, consistent non-parametric tests of
conditional independence would work perfectly well. An interesting example of
this is the paper by Chu and Glymour (2008), on finding causal models for the
time series, assuming additive but non-linear models.

2.4 Limitations on Consistency of Causal Discovery

There are some important limitations to causal discovery algorithms (Spirtes
et al., 2001, §12.4). They are universally consistent: for all causal graphs G,5

lim Pr G b n 6= G = 0 (3)
n→∞

The probability of getting the graph wrong can be made arbitrarily small by
using enough data. However, this says nothing about how much data we need
5 If the true distribution is faithful to multiple graphs, then we should read G as their

common graph pattern, which has some undirected edges.

7
to achieve a given level of confidence, i.e., the rate of convergence. Uniform
consistency would mean that we could put a bound on the probability of error
as a function of n which did not depend on the true graph G. Robins et al.
(2003) proved that no uniformly-consistent causal discovery algorithm can exist.
The issue, basically, is that the Adversary could make the convergence in Eq. 3
arbitrarily slow by selecting a distribution which, while faithful to G, came very
close to being unfaithful, making some of the dependencies implied by the graph
arbitrarily small. For any given dependence strength, there’s some amount of
data which will let us recognize it with high confidence, but the Adversary can
make the required data size as large as he likes by weakening the dependence,
without ever setting it to zero.6
The upshot is that so uniform, universal consistency is out of the question;
we can be universally consistent, but without a uniform rate of convergence;
or we can converge uniformly, but only on some less-than-universal class of
distributions. These might be ones where all the dependencies which do exist
are not too weak (and so not too hard to learn reliably from data), or the number
of true edges is not too large (so that if we haven’t seen edges yet they probably
don’t exist; Janzing and Herrmann, 2003; Kalisch and Bühlmnann, 2007).
It’s worth emphasizing that the Robins et al. (2003) no-uniform-consistency
result applies to any method of discovering causal structure from data. Invoking
human judgment, Bayesian priors over causal structures, etc., etc., won’t get
you out of it.

6 Roughly speaking, if X and Y are dependent given Z, the probability of missing this

conditional dependence with a sample of size n should go to zero like O(2−nI[X;Y |Z] ), I being
mutual information. To make this probability equal to, say, α we thus need n = O(− log α/I)
samples. The Adversary can thus make n extremely large by making I very small, yet positive.

8
3 Exercises
Not to hand in.
2
1. Take the model in Figure 1. Suppose that X ∼ N (0, σX ), Y = αX +
and Z = β1 X + β2 Y + η, where and η are mean-zero Gaussian noise. Set
this up in R and run regress Y twice, once on X alone and once on X and
Z. Can you find any values of the parameters where the coefficient of X
in the second regression is even approximately equal to α? (It’s possible
to solve this problem exactly through linear algebra instead.)
2. Take the model in Figure 2 and parameterize it as follows: U ∼ N (0, 1),
X = α1 U +, Z = βX +η, Y = γZ +α2 U +ξ, where , η, ξ are independent
Gaussian noises. If you regress Y on Z, what coefficient do you get? If
you regress Y on Z and X? If you do a back-door adjustment for X?
(Approach this either analytically or through simulation, as you like.)
3. Continuing in the set-up of the previous problem, what coefficient do you
get for X when you regress Y on Z and X? Now compare this to the
front-door adjustment for the effect of X on Y .

9
A Pseudocode for the SGS Algorithm
When you see a loop, assume that it gets entered at least once. “Replace” in
the sub-functions always refers to the input graph.

SGS = function(set of variables V) {

G
b = colliders(prune( complete undirected graph on V))
until (Gb == G0 ) {
G = G0
b
G0 = orient(G)b
}
return(G)b
}

prune = function(G) {
for each A, B ∈ V {
for each S ⊆ V \ {A, B} {
|=

if A B|S { G = G \ (A − B) }
}
}
return(G)
}

collliders = function(G) {
for each (A − B) ∈ G {
for each (B − C) ∈ G {
if (A − C) 6∈ G {
collision = TRUE
for each S ⊂ B ∩ V \ {A, C} {
|=

if A C|S { collision = FALSE }

}
if (collision) { replace (A − B) with (A → B), (B − C) with (B ← C) }
}
}
}
return(G)
}

orient = function(G) {
if ((A → B) ∈ G & (B − C) ∈ G & (A − C) 6∈ G) { replace (B − C) with (B → C) }
if ((directed path from A to B)∈ G & (A − B) ∈ G) { replace (A − B) with (A → B) }
return(G)
}

10
References
Chu, Tianjiao and Clark Glymour (2008). “Search for Additive Nonlinear Time
Series Causal Models.” Journal of Machine Learning Research, 9: 967–991.
URL http://jmlr.csail.mit.edu/papers/v9/chu08a.html.
Hoyer, Patrik O., Domink Janzing, Joris Mooij, Jonas Peters and Bernhard
Schölkopf (2009). “Nonlinear causal discovery with additive noise mod-
els.” In Advances in Neural Information Processing Systems 21 [NIPS 2008]
(D. Koller and D. Schuurmans and Y. Bengio and L. Bottou, eds.), pp. 689–
696. Cambridge, Massachusetts: MIT Press. URL http://books.nips.cc/
papers/files/nips21/NIPS2008_0266.pdf.
Janzing, Dominik (2007). “On causally asymmetric versions of Occam’s Razor
and their relation to thermodynamics.” E-print, arxiv.org. URL http://
arxiv.org/abs/0708.3411.
Janzing, Dominik and Daniel Herrmann (2003). “Reliable and Efficient Infer-
ence of Bayesian Networks from Sparse Data by Statistical Learning Theory.”
Electronic preprint. URL http://arxiv.org/abs/cs.LG/0309015.

Kalisch, Markus and Peter Bühlmnann (2007). “Estimating High-Dimensional

Directed Acyclic Graphs with the PC-Algorithm.” Journal of Machine Learn-
ing Research, 8: 616–636. URL http://jmlr.csail.mit.edu/papers/v8/
kalisch07a.html.
Manski, Charles F. (2007). Identification for Prediction and Decision. Cam-
bridge, Massachusetts: Harvard University Press.
Pearl, Judea (2009). Causality: Models, Reasoning, and Inference. Cambridge,
England: Cambridge University Press, 2nd edn.
Reichenbach, Hans (1956). The Direction of Time. Berkeley: University of
California Press. Edited by Maria Reichenbach.
Robins, James M., Richard Scheines, Peter Spirtes and Larry Wasserman (2003).
“Uniform Consistency in Causal Inference.” Biometrika, 90: 491–515. URL
http://www.stat.cmu.edu/tr/tr725/tr725.html.
Russell, Bertrand (1927). The Analysis of Matter . International Library of Phi-
losophy, Psychology and Scientific Method. London: K. Paul Trench, Trubner
and Co. Reprinted New York: Dover Books, 1954.
Spirtes, Peter, Clark Glymour and Richard Scheines (2001). Causation, Predic-
tion, and Search. Cambridge, Massachusetts: MIT Press, 2nd edn.

Wiener, Norbert (1961). Cybernetics: Or, Control and Communication in the

Animal and the Machine. Cambridge, Massachusetts: MIT Press, 2nd edn.
First edition New York: Wiley, 1948.

Causalidade Diagrama 2013
No ratings yet
Causalidade Diagrama 2013
71 pages
O Que É Casual Diagrama
No ratings yet
O Que É Casual Diagrama
70 pages
Lec 14
No ratings yet
Lec 14
40 pages
Buhlman 2020 PPT
No ratings yet
Buhlman 2020 PPT
88 pages
Valse 2
No ratings yet
Valse 2
89 pages
Lec 13
No ratings yet
Lec 13
35 pages
Shimizu 06 A
No ratings yet
Shimizu 06 A
28 pages
Variables & Sampling in Research
No ratings yet
Variables & Sampling in Research
17 pages
Causal Discovery in Machine Learning - Theories and Applications
No ratings yet
Causal Discovery in Machine Learning - Theories and Applications
29 pages
Causal Effects of Intervening Variables in Settings With Unmeasured Confounding
No ratings yet
Causal Effects of Intervening Variables in Settings With Unmeasured Confounding
54 pages
Wieczorek Roth 2019 Entropy
No ratings yet
Wieczorek Roth 2019 Entropy
26 pages
Kenneth Rothman - Timothy L. Lash - Modern Epidemiology-LWW (2020) - 96-142
No ratings yet
Kenneth Rothman - Timothy L. Lash - Modern Epidemiology-LWW (2020) - 96-142
47 pages
Econ 4
No ratings yet
Econ 4
92 pages
Switching Regression Models and Causal Inference - Potentially Bushit Paper Dressing Up Nicely in Math
No ratings yet
Switching Regression Models and Causal Inference - Potentially Bushit Paper Dressing Up Nicely in Math
46 pages
An Introduction To Causal Modelling: Gauranga Kumar Baishya and M. R. Srinivasan Chennai Mathematical Institute (CMI)
No ratings yet
An Introduction To Causal Modelling: Gauranga Kumar Baishya and M. R. Srinivasan Chennai Mathematical Institute (CMI)
52 pages
Neuberg Review
No ratings yet
Neuberg Review
11 pages
Score Matching Through The Roof Llinear, Nonlinear, and Latent Variables Causal Discovery 26th July 2024 (AAA)
No ratings yet
Score Matching Through The Roof Llinear, Nonlinear, and Latent Variables Causal Discovery 26th July 2024 (AAA)
27 pages
M Api
No ratings yet
M Api
17 pages
Bayesian Causal Tutorial Ohiostate June2019
No ratings yet
Bayesian Causal Tutorial Ohiostate June2019
56 pages
CIML2023
No ratings yet
CIML2023
87 pages
Imperial Causality
No ratings yet
Imperial Causality
124 pages
Scott-Cunningham-Causal-Inference-2020 TERCERA PARTE
No ratings yet
Scott-Cunningham-Causal-Inference-2020 TERCERA PARTE
22 pages
Aronow P.M., Miller B.T. - Foundations of Agnostic Statistics-Cambridge University Press (2019)
No ratings yet
Aronow P.M., Miller B.T. - Foundations of Agnostic Statistics-Cambridge University Press (2019)
318 pages
Causal Inference Intro
No ratings yet
Causal Inference Intro
16 pages
Introduction To Causal Inference-Aug25 2020-Neal
No ratings yet
Introduction To Causal Inference-Aug25 2020-Neal
61 pages
09 Causal Inference II: MSBA7003 Quantitative Analysis Methods
No ratings yet
09 Causal Inference II: MSBA7003 Quantitative Analysis Methods
34 pages
Causal Network
No ratings yet
Causal Network
3 pages
The International Journal of Biostatistics: An Introduction To Causal Inference
No ratings yet
The International Journal of Biostatistics: An Introduction To Causal Inference
62 pages
Annurev Statistics 033121 114601
No ratings yet
Annurev Statistics 033121 114601
30 pages
Causal AI Final
No ratings yet
Causal AI Final
71 pages
Peter Spirtes 2010
No ratings yet
Peter Spirtes 2010
20 pages
Causal Discovery Using Proxy Variables
No ratings yet
Causal Discovery Using Proxy Variables
13 pages
Additive Models: 36-350, Data Mining, Fall 2009 2 November 2009
No ratings yet
Additive Models: 36-350, Data Mining, Fall 2009 2 November 2009
16 pages
Sprites, Glymour, Scheines - 1991 - From Probability To Causality
No ratings yet
Sprites, Glymour, Scheines - 1991 - From Probability To Causality
36 pages
Random Sets Approach and Its Applications
No ratings yet
Random Sets Approach and Its Applications
12 pages
Nonlinear Causal Discovery With Additive Noise Models - Hoyer, P. O., Janzing, D., Mooij, J., Peters, J., & Schölkopf, B.
No ratings yet
Nonlinear Causal Discovery With Additive Noise Models - Hoyer, P. O., Janzing, D., Mooij, J., Peters, J., & Schölkopf, B.
8 pages
Causal Notes
No ratings yet
Causal Notes
17 pages
Pearl 10 A
No ratings yet
Pearl 10 A
20 pages
SAT Math: Master the Skills in 40 Pages
From Everand
SAT Math: Master the Skills in 40 Pages
Jennifer L Johnson
No ratings yet
Causal Inference in The Social Sciences
No ratings yet
Causal Inference in The Social Sciences
30 pages
Notes - EDA-Unit5
No ratings yet
Notes - EDA-Unit5
21 pages
Causality Bernhard Schölkopf
No ratings yet
Causality Bernhard Schölkopf
169 pages
Testing Identifiability of Causal Effects - David Galles Judea Pearl
No ratings yet
Testing Identifiability of Causal Effects - David Galles Judea Pearl
11 pages
Discovery of Non-Gaussian Linear Causal Models Using ICA
No ratings yet
Discovery of Non-Gaussian Linear Causal Models Using ICA
8 pages
A Sample Path Measure of Causal in Uence: June 2018
No ratings yet
A Sample Path Measure of Causal in Uence: June 2018
6 pages
Dev Unit 5
No ratings yet
Dev Unit 5
22 pages
Image Searches, Abstraction, Invariance: 36-350: Data Mining 2 September 2009
No ratings yet
Image Searches, Abstraction, Invariance: 36-350: Data Mining 2 September 2009
27 pages
Causal Inference: 1.1 Two Types of Causal Questions
No ratings yet
Causal Inference: 1.1 Two Types of Causal Questions
19 pages
Causal Inference in Statistics: An Overview
100% (1)
Causal Inference in Statistics: An Overview
51 pages
Information and Interaction Among Features: 36-350: Data Mining 9 September 2009
No ratings yet
Information and Interaction Among Features: 36-350: Data Mining 9 September 2009
16 pages
Lecture 21
No ratings yet
Lecture 21
8 pages
Causal Inference in Statistics: An Overview
100% (2)
Causal Inference in Statistics: An Overview
51 pages
The Statistics of Causal Inference: A View From Political Methodology
No ratings yet
The Statistics of Causal Inference: A View From Political Methodology
23 pages
Fast Phrase Querying With Combined Indexes
No ratings yet
Fast Phrase Querying With Combined Indexes
21 pages
Causal Inference: Yu Xie University of Michigan
No ratings yet
Causal Inference: Yu Xie University of Michigan
51 pages
Non-Gaussian Methods For Causal Structure Learning: Shohei Shimizu
No ratings yet
Non-Gaussian Methods For Causal Structure Learning: Shohei Shimizu
11 pages
Mathematical Foundations of Information Theory
From Everand
Mathematical Foundations of Information Theory
A. Ya. Khinchin
3.5/5 (9)
An Introduction To Causal Inference
No ratings yet
An Introduction To Causal Inference
67 pages
Causal Inference: 1.1 Two Types of Causal Questions
No ratings yet
Causal Inference: 1.1 Two Types of Causal Questions
8 pages
Modern Information Retrieval Amit Singhal
No ratings yet
Modern Information Retrieval Amit Singhal
9 pages
Lecture 2: More Similarity Searching Multidimensional Scaling
No ratings yet
Lecture 2: More Similarity Searching Multidimensional Scaling
8 pages
Thinking
No ratings yet
Thinking
16 pages
Causal Discovery With General Non-Linear Relationships Using Non-Linear ICA
No ratings yet
Causal Discovery With General Non-Linear Relationships Using Non-Linear ICA
10 pages
Lecture 3 - Page Rank
No ratings yet
Lecture 3 - Page Rank
7 pages
More PCA Latent Semantic Analysis and Multidimensional Scaling
No ratings yet
More PCA Latent Semantic Analysis and Multidimensional Scaling
7 pages
Estimating Distributions and Densities: 36-350, Data Mining, Fall 2009 23 November 2009
No ratings yet
Estimating Distributions and Densities: 36-350, Data Mining, Fall 2009 23 November 2009
7 pages
Pruning Policies For Two-Tiered Inverted Index With Correctness Guarantee
No ratings yet
Pruning Policies For Two-Tiered Inverted Index With Correctness Guarantee
8 pages
Pruning Policies For Two-Tiered Inverted Index With Correctness Guarantee
No ratings yet
Pruning Policies For Two-Tiered Inverted Index With Correctness Guarantee
8 pages
Linear Classifiers and The Perceptron Algorithm: 36-350, Data Mining, Fall 2009 16 November 2009
No ratings yet
Linear Classifiers and The Perceptron Algorithm: 36-350, Data Mining, Fall 2009 16 November 2009
5 pages
Causal Inference, Michael E. Sobel
No ratings yet
Causal Inference, Michael E. Sobel
3 pages
Internet Searching: Crawling Is Conceptually Quite Simple: Starting at Some Well-Known Sites On The Web
No ratings yet
Internet Searching: Crawling Is Conceptually Quite Simple: Starting at Some Well-Known Sites On The Web
4 pages
Multi-Dimensional Causal Discovery
No ratings yet
Multi-Dimensional Causal Discovery
7 pages
Learn Statistics Fast: A Simplified Detailed Version for Students
From Everand
Learn Statistics Fast: A Simplified Detailed Version for Students
Hesbon R.M
No ratings yet
The Past, Present and Future of Web Information Retrieval
No ratings yet
The Past, Present and Future of Web Information Retrieval
1 page
Extending Linear Regression: Weighted Least Squares, Heteroskedasticity, Local Polynomial Regression
No ratings yet
Extending Linear Regression: Weighted Least Squares, Heteroskedasticity, Local Polynomial Regression
15 pages
(2009) Mixture Models Latent Variables and The EM Algorithm PDF
No ratings yet
(2009) Mixture Models Latent Variables and The EM Algorithm PDF
11 pages
07www-Duplicates Main Paper For My Project
No ratings yet
07www-Duplicates Main Paper For My Project
9 pages
Three-Level Caching For Ef Cient Query Processing in Large Web Search
No ratings yet
Three-Level Caching For Ef Cient Query Processing in Large Web Search
10 pages
E Cient Passage Ranking For Document Databases
No ratings yet
E Cient Passage Ranking For Document Databases
26 pages
Justin Zobel-Inverted File For Text Search Engines
No ratings yet
Justin Zobel-Inverted File For Text Search Engines
56 pages

Causal Inference: 36-350, Data Mining, Fall 2009 4 December 2009

Uploaded by

Causal Inference: 36-350, Data Mining, Fall 2009 4 December 2009

Uploaded by

Causal Inference

36-350, Data Mining, Fall 2009

2 Discovering Causal Structure 4

A Pseudocode for the SGS Algorithm 10

1 Estimating Causal Effects with Known Struc-

Clause (ii) says that we can estimate the effect of X on S directly,

2 Discovering Causal Structure

• X ← Y ← Z (the other chain);

or Y is a parent of X, X ← Y . Either way, the two variables will be dependent no matter

for one edge might induce an orientation for others.

2.2 Causal Discovery with Hidden Variables

2.3 Note on Conditional Independence Tests

2.4 Limitations on Consistency of Causal Discovery

common graph pattern, which has some undirected edges.

SGS = function(set of variables V) {

if A C|S { collision = FALSE }

Kalisch, Markus and Peter Bühlmnann (2007). “Estimating High-Dimensional

Wiener, Norbert (1961). Cybernetics: Or, Control and Communication in the

You might also like