
Methods of Causal Inference

Assignment

Srijan Chakravorty
s2511480
13th March, 2024

Q1.

(a) The Python code below generates n points from a normal distribution (mean 0, variance 1); this was repeated for 5 iterations for each value of the sample size n:

import numpy as np

# Variable Definitions
sample_size = 3         # can take values 3, 6, 20, 100
mean = 0                # mean = 0 as given in the question
standard_deviation = 1  # standard deviation = 1 (variance = 1 as given in the question)

# Generating and printing normally distributed data
data = np.random.normal(mean, standard_deviation, sample_size)
print("Data from normal distribution is:", data)

# Computing mean and standard deviation of the generated sample data
mean_sample_data = np.mean(data)
standard_deviation_sample_data = np.std(data, ddof=1)  # ddof=1 for the sample standard deviation

# Displaying final results
print("Mean of sampled data:", mean_sample_data)
print("Standard deviation of sampled data:", standard_deviation_sample_data)

The values tabulated in Table 1 below show the mean and standard deviation of the data generated in each iteration for a given sample size n using the code above. Simulations were conducted with varying sample sizes to observe how the means and standard deviations fluctuated across multiple iterations. This empirical exploration provides an understanding of the impact of sample size on the stability of the statistical estimates.

n=3 First Time Second Time Third time Fourth Time Fifth Time
Mean 0.6039 -0.6727 -0.5464 0.7172 1.5084
Std Deviation 0.5046 1.2634 1.2565 0.8561 1.1876

n=6 First Time Second Time Third time Fourth Time Fifth Time
Mean 0.0881 0.5061 -0.6856 0.4553 0.1439
Std Deviation 1.0210 1.4198 0.8318 0.8178 0.6661

n = 20 First Time Second Time Third time Fourth Time Fifth Time
Mean -0.2198 -0.3253 -0.1094 -0.1844 -0.3634
Std Deviation 0.8274 1.0264 1.1429 0.9868 0.6898

n = 100 First Time Second Time Third time Fourth Time Fifth Time
Mean -0.0493 0.0983 0.1919 -0.1648 0.16919
Std Deviation 0.9837 1.0266 0.9712 1.0427 0.9135

Table 1: Output generated by the code snippet
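A possible way to reproduce a table like Table 1 in a single run is to loop over the sample sizes and the five repetitions; the sketch below is an addition built on the script above (the loop structure and formatting are my own, not part of the original code):

import numpy as np

sample_sizes = [3, 6, 20, 100]
runs = 5

for n in sample_sizes:
    for i in range(runs):
        data = np.random.normal(0, 1, n)
        print(f"n={n}, run {i + 1}: "
              f"mean={np.mean(data):.4f}, std={np.std(data, ddof=1):.4f}")

Because the draws are random, the printed values will differ from Table 1, but the same pattern of decreasing variability with increasing n should be visible.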

(b) The stability of the sample estimates (mean and standard deviation) refers to how consistent the sample statistics are as estimators of the corresponding population quantities across different sample sizes.
Sample statistics estimated from a smaller sample vary considerably (are less stable) across the random samples drawn from the distribution. Sensitivity to extreme values in the randomly generated data can skew the sample estimates, causing them to deviate from the true population values and leading to instability.
As we gradually increase the sample size, we observe that the variability in estimates decreases
due to the convergence of sample statistics to population statistics, leading to more consistent
and reliable estimates that better represent the underlying (Normal) population distribution.
Moreover, larger sample sizes reduce the impact of random fluctuations inherent in smaller
sample sizes, giving us the notion of stability.
Statistically, as the sample size n increases, the sample mean converges to the true population mean, which is 0 here, and the sample standard deviation converges to 1; this convergence is guaranteed by the Law of Large Numbers. The Central Limit Theorem additionally states that if we take sufficiently large random samples from the population (a common rule of thumb being n >= 30), the distribution of the sample means will be approximately normal, irrespective of the shape of the original distribution. For n = 3, 6, 20 and 100, the ranges of means were [-0.6, 1.5], [-0.6, 0.45], [-0.3, -0.1] and [-0.1, 0.1] respectively, and the ranges of standard deviations were [0.5, 1.2], [0.6, 1.4], [0.6, 1.1] and [0.9, 1.0] respectively. The range of sample statistics is clearly narrower and closer to the true population values for n = 100 than for n = 3.
The stability of the mean describes how consistently the sample mean estimates the population mean across different samples, while the standard error quantifies the precision, or uncertainty, attached to a single estimate. Both concepts relate to the accuracy and reliability of statistical estimates, but they answer different questions.
The standard error represents the variability (standard deviation) of the sampling distribution
of the statistic. For example, the standard error of the mean quantifies the variability of sample
means around the true population mean. Increasing the sample size generally leads to a more
stable mean and a smaller standard error, indicating a more reliable, precise and stable estimate
of the population parameter.
From the observations made in section (a), the standard deviation of the means for n = 3, 6, 20, 100 was calculated to be 0.92, 0.47, 0.10 and 0.15 respectively, and the standard deviation of the standard deviations to be 0.33, 0.29, 0.17 and 0.05 respectively. The standard deviation of the means actually increases slightly from 0.10 to 0.15 between n = 20 and n = 100 rather than decreasing. While the standard error of the mean decreases with increasing sample size in expectation, with only five runs per sample size random fluctuations, outliers, or the nature of the particular draws can still influence the estimated spread.
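As a check, the spread of the five means for a given n can be compared against the theoretical standard error σ/√n. The sketch below is an illustrative addition using the n = 3 means copied from Table 1 (the variable names are my own):

import numpy as np

# The five sample means observed for n = 3 (taken from Table 1)
means_n3 = [0.6039, -0.6727, -0.5464, 0.7172, 1.5084]

observed_spread = np.std(means_n3, ddof=1)  # approximately 0.92, as reported above
theoretical_se = 1 / np.sqrt(3)             # sigma / sqrt(n) = 1 / sqrt(3), approximately 0.58

print("Observed spread of the five means:", observed_spread)
print("Theoretical standard error of the mean:", theoretical_se)

With only five means per sample size, the observed spread is itself a noisy estimate, which is one reason the values above do not decrease perfectly monotonically.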

(c) The Python code below generates a fresh sample of 20 data points in each run and computes its mean, giving n estimates of the mean. A histogram of these n mean estimates is then plotted. The process was repeated for n = 10, 100 and 1000 runs.
import numpy as np
import matplotlib.pyplot as plt

# Variable Definitions
sample_size = 20        # value of n is 20
mean = 0                # mean = 0 as given in the question
standard_deviation = 1  # standard deviation = 1 (variance = 1 as given in the question)
array_means = []        # array to store the estimates of the mean
no_of_iterations = 1000 # set to 10, 100 or 1000; 1000 here for the last run

# Executing the required number of iterations
for _ in range(no_of_iterations):

    # Data generation
    new_data = np.random.normal(mean, standard_deviation, sample_size)

    # Mean calculation and storing in array_means
    estimated_mean = np.mean(new_data)
    array_means.append(estimated_mean)

# Histogram Plot
plt.hist(array_means, bins=100, color='skyblue', edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('No of iterations = ' + str(no_of_iterations))
plt.show()

Figure 1: Output showing the histogram for 10 estimates of sample means

Figure 2: Output showing the histogram for 100 estimates of sample means

Figure 3: Output showing the histogram for 1000 estimates of sample means

As the number of iterations increases from 10 to 1000, the histograms become smoother and more symmetric, with the overall bell-shaped structure becoming more pronounced, reflecting the underlying Normal distribution of the generated data. The means cluster more tightly around the true population mean of 0, consistent with the Central Limit Theorem.
With 10 simulations, we observe significant deviation in the estimates and the distribution of means is not centred on the true population mean. There is more variability across the few estimates available, leading to a wide, ragged histogram; individual estimates that happen to be higher or lower than the true mean skew the picture.
As we increase the number of simulations to 100, the shape of the distribution starts to stabilise and the histogram becomes less ragged. At 1000 simulations, the histogram is clearly concentrated around the true population mean, with its peak at 0. With more simulation runs, the histogram becomes a better approximation of the sampling distribution of the mean: it is smoother, more symmetric and clearly peaked around 0, taking on the bell shape of a Normal distribution.
This is consistent with the Law of Large Numbers, under which the sample mean converges to the population mean as the sample size grows, and with the Central Limit Theorem, under which the distribution of the sample mean becomes approximately normal irrespective of the shape of the population distribution (with n >= 30 often quoted as a rule of thumb).
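A quick sanity check on the scale of this spread, using the standard result for the sampling distribution of the mean: each estimate is the mean of 20 draws from N(0, 1), so its standard error is

σ/√n = 1/√20 ≈ 0.224,

and the histograms should therefore place most of their mass within roughly ±2 × 0.224 ≈ ±0.45 of zero, which is what becomes visible once enough estimates (100 or 1000) have been accumulated.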

Q2.
The Average Treatment Effect (ATE) of the treatment variable T in the presence of the confounder W can be expressed in terms of the model parameters. Let E[Y | W, T] denote the expected value of Y given W and T. The true parametric model is given by:

E[Y | W, T] = exp(T + (T + ε)W)

Rubin’s potential outcomes framework provides a way to quantify causal effects. For a hypo-
thetical intervention, it defines the causal effect for an individual as the difference between the
outcomes that would be observed for that individual with versus without the exposure or inter-
vention under consideration. Expressing ATE as the difference in expected outcomes between
treatment (t = 1) and control (t = 0) groups, we use the expectation:

E[Y | W, T = t] = exp(t + (t + ε)W)

Taking the expectation with respect to W (denoted E_W):

E_W[E[Y | W, T = t]] = E_W[exp(t + (t + ε)W)]

Here, W ∼ N(0, 1), so we write the probability density function of W as f_W(W), the standard normal PDF. Therefore, the expression becomes:

E_W[exp(t + (t + ε)W)] = ∫_{−∞}^{∞} exp(t + (t + ε)W) f_W(W) dW

The ATE is the difference in expected outcomes under treatment (T = 1) and control (T = 0):

ATE = E_W[exp(1 + (1 + ε)W)] − E_W[exp(εW)]

    = ∫_{−∞}^{∞} exp(1 + (1 + ε)W) f_W(W) dW − ∫_{−∞}^{∞} exp(εW) f_W(W) dW

To evaluate these integrals, we use the PDF of the univariate standard normal distribution:

f_W(W) = (1/√(2π)) exp(−W²/2)

Substituting this formula for f_W(W), each integral has the form ∫_{−∞}^{∞} exp(aW) (1/√(2π)) exp(−W²/2) dW = e^{a²/2}, the moment generating function of a standard normal evaluated at a. Applying this with a = 1 + ε (with the factor e¹ taken outside the integral) and with a = ε gives the ATE, based on the given model, as a function of ε:

ATE = e · e^{(1 + ε)²/2} − e^{ε²/2} = e^{(ε² + 2ε + 3)/2} − e^{ε²/2}
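As an optional numerical check (not part of the original derivation), the closed form can be compared against a Monte Carlo estimate of the two expectations; the sample size and the test value of ε below are arbitrary choices:

import numpy as np

eps = 0.5                       # arbitrary test value of epsilon
rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000)

# Monte Carlo estimates of E_W[exp(1 + (1 + eps) W)] and E_W[exp(eps W)]
ate_mc = np.mean(np.exp(1 + (1 + eps) * w)) - np.mean(np.exp(eps * w))

# Closed-form expression derived above
ate_closed = np.exp((eps**2 + 2 * eps + 3) / 2) - np.exp(eps**2 / 2)

print("Monte Carlo ATE:", ate_mc)
print("Closed-form ATE:", ate_closed)

The two values should agree up to Monte Carlo error, which can be noticeable here because exp((1 + ε)W) is heavy-tailed.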

Q3.(a) Following is the code for the simulation:
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.regression.linear_model as sm

# Variable Definitions
no_of_data_points = 100 # given to be 100
iterations = 50001      # runs

# Model Parameter Definitions
a_1 = 0.005 # given alpha value (weak instrument)
a_2 = 0.5   # given alpha value (strong instrument)
b = 0       # given beta value (true causal effect)

# Temporary variables - arrays
a_1_causal_effect_array = []
a_2_causal_effect_array = []

for _ in range(iterations):

    # Define u, z and the epsilons for the model as per the given configuration
    u = np.random.normal(0, 1, no_of_data_points)
    z = np.random.normal(1, 0.5, no_of_data_points)
    epsilon_1 = np.random.normal(0, 1, no_of_data_points)
    epsilon_2 = np.random.normal(0, 1, no_of_data_points)

    # Model for alpha = 0.005
    # given equations
    t_1 = a_1 * z + u + epsilon_1
    y_1 = b * t_1 + u + epsilon_2
    # estimation for alpha = 0.005 using the 2SLS estimator
    conf_1 = sm.OLS(t_1, z)
    fitted_model_1 = conf_1.fit()
    t1_predicted = fitted_model_1.predict()
    conf_1 = sm.OLS(y_1, t1_predicted)
    fitted_model_1 = conf_1.fit()

    a_1_causal_effect = fitted_model_1.params[0]       # causal effect estimate
    a_1_causal_effect_array.append(a_1_causal_effect)  # add to array for histogram

    # Model for alpha = 0.5
    # given equations
    t_2 = a_2 * z + u + epsilon_1
    y_2 = b * t_2 + u + epsilon_2
    # estimation for alpha = 0.5 using the 2SLS estimator
    conf_2 = sm.OLS(t_2, z)
    fitted_model_2 = conf_2.fit()
    t2_predicted = fitted_model_2.predict()
    conf_2 = sm.OLS(y_2, t2_predicted)
    fitted_model_2 = conf_2.fit()

    a_2_causal_effect = fitted_model_2.params[0]       # causal effect estimate
    a_2_causal_effect_array.append(a_2_causal_effect)  # add to array for histogram

# Plotting the histograms
plt.hist(a_1_causal_effect_array, bins=10000, edgecolor='black')
plt.xlabel('Causal Effect Estimates')
plt.ylabel('Frequency')
plt.title('Histogram of Causal Effect Estimates (2SLS)')
plt.xlim(-50, 50)
plt.grid(True)
plt.show()

plt.hist(a_2_causal_effect_array, bins=800, edgecolor='black')
plt.xlabel('Causal Effect Estimates')
plt.ylabel('Frequency')
plt.title('Histogram of Causal Effect Estimates (2SLS)')
plt.xlim(-1, 1)
plt.grid(True)
plt.show()

Figure 4(a): Histogram for α = 0.005 Figure 4(b): Histogram for α = 0.5

For the causal estimates we have used the two-stage least squares (2SLS) estimator. In the first stage, E[T | Z] is estimated by regressing T on Z to obtain T̂; in the second stage, E[Y | T̂] is estimated by regressing Y on T̂ to obtain τ̂, the required causal effect estimate. The parameter α controls the strength of the association between the instrument Z and the treatment T, and therefore how well the relevance assumption of instrumental variables is satisfied.
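As a side note on this implementation choice: with a single instrument and no intercept in either stage, the two explicit OLS regressions collapse to the ratio (zᵀy)/(zᵀt). The sketch below is an optional check of this equivalence on hypothetical data generated with α = 0.5 and β = 0 as in the question:

import numpy as np
import statsmodels.regression.linear_model as sm

rng = np.random.default_rng(1)
z = rng.normal(1, 0.5, 100)
u = rng.normal(0, 1, 100)
t = 0.5 * z + u + rng.normal(0, 1, 100)
y = 0 * t + u + rng.normal(0, 1, 100)

# Two explicit OLS stages without an intercept, as in the code above
t_hat = sm.OLS(t, z).fit().predict()
beta_2sls = sm.OLS(y, t_hat).fit().params[0]

# Equivalent single-instrument ratio for the no-intercept case
beta_ratio = (z @ y) / (z @ t)

print(beta_2sls, beta_ratio)  # the two estimates coincide (up to floating point)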
In the presence of weak instrument bias (a weak association between Z and T), the instrumental variable estimator tends to overestimate the causal effect β here, because the estimates are pulled towards the confounded OLS estimate. It is evident from the histogram for α = 0.005 that the distribution of estimated causal effects is centred above the true value of β = 0, indicating bias in the estimation. This illustrates the impact of weak instruments on the accuracy of causal effect estimation. The same histogram also shows a very large variance (spread), covering roughly [−20, 20], so the causal effect estimates carry considerable uncertainty.
With α = 0.5, the association between Z and T is much stronger than for α = 0.005, and hence the impact of weak instrument bias is far less pronounced. The peak of the histogram is observed almost exactly at 0, signifying that the instrumental variable method provides estimates of the causal effect that are close to the true value of 0. The concentration of most values around 0 implies reduced bias in estimation compared to α = 0.005. This indicates that a stronger association between the instrument and the treatment variable mitigates weak instrument bias and gives improved estimates with less bias. The variance is lower as well, with estimates ranging roughly over [−1, 1], suggesting that the estimates are more precise for α = 0.5.
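A common practical diagnostic for weak instruments, sketched here as an optional addition (not part of the original assignment code), is the first-stage F-statistic of the regression of T on Z; a frequently quoted rule of thumb is that values well below about 10 indicate a weak instrument. The exact values depend on the random draw, but the F-statistic grows with the instrument strength α:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
u = rng.normal(0, 1, n)
z = rng.normal(1, 0.5, n)
t_weak = 0.005 * z + u + rng.normal(0, 1, n)
t_strong = 0.5 * z + u + rng.normal(0, 1, n)

# First-stage regressions of T on Z (with an intercept) and their F-statistics
f_weak = sm.OLS(t_weak, sm.add_constant(z)).fit().fvalue
f_strong = sm.OLS(t_strong, sm.add_constant(z)).fit().fvalue

print("First-stage F for alpha = 0.005:", f_weak)  # typically close to 1
print("First-stage F for alpha = 0.5:", f_strong)  # noticeably larger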

Q4.
(a) Given the graph, the adjustment sets are {X1, X2, X3}, {X2, X3, Z}, {X1, Z} and {X1, X3}.
These adjustment sets satisfy the backdoor criterion, ensuring that there are no unblocked
backdoor paths from the treatment variable T to the outcome variable Y after adjustment.
Adjusting helps control for confounding and gives a more accurate estimation of the causal
effect of T on Y.
Adjusting for {X1, X2, X3} blocks all backdoor paths from T to Y, ensuring their conditional independence given X1, X2 and X3. The variables in {X2, X3, Z} directly influence both T and Y, so adjusting for them blocks all backdoor paths from T to Y through these variables. {X1, Z} adjusts for variables that influence T and directly influence Y, blocking the backdoor paths from T to Y through them. Similarly, adjusting for {X1, X3} blocks the backdoor paths from T to Y through these variables, ensuring conditional independence of T and Y.

(b) The adjustment set is {Z}.

If the variables X1, X2 and X3 are unobserved, we apply the "front-door adjustment" criterion using the available observed variables and the structure of the graphical model, provided the validity of the instrumental variable Z and the absence of uncontrolled confounding hold.

For the given causal graphical model, the adjustment set {Z} satisfies the front-door criterion: Z is affected by the treatment variable T and influences the outcome Y only through T, while not being directly affected by the unobserved variables X1, X2 and X3. Z also acts as an instrumental variable that helps identify the causal effect of T on Y, and conditioning on Z blocks all backdoor paths from T to Y.

(c) We apply the PC algorithm for constraint-based causal discovery to learn the structure of a causal graph from observed data. The algorithm iteratively explores conditional independence relationships between variables to identify the skeleton of the graph and the edge orientations. If a conditional independence test suggests that two variables are independent given a set of other variables, the corresponding edge is removed.
1. We start with a complete graph:

2. (i) There are no zero-order conditional independencies observed, so no edges are removed.
(ii) First-order conditional independence observed: W ⊥⊥ Y | X.
Since W and Y are conditionally independent given {X}, we remove the edge between W and Y. The resulting graph is:

(iii) Second-order conditional independence observed: X ⊥⊥ Q | {W, Y}.
Since X and Q are conditionally independent given {W, Y}, we remove the edge between X and Q. The resulting graph is:

There are no higher order conditional independencies.


3. To orient V-structures (colliders), we consider triples in which two non-adjacent nodes are both connected to a third. We observe that W ⊥̸⊥ Y | Q, i.e. Q is not in the separating set of W and Y, so we orient the collider W → Q ← Y. This gives the following directionalities in the graph:

Now orienting both W → X and Y → X is not possible, as X would then be a collider, and no such collider was detected in the previous step of orienting V-structures. Modifying the graph with the directed edges accordingly yields the following graph:
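The conditional independence tests driving these steps are not shown in the assignment. As an illustrative sketch only, a Fisher z-test based on partial correlation is one common choice for (approximately) Gaussian data; the helper function, variable names and the simulated data below are hypothetical and only meant to mirror the independencies used above:

import numpy as np
from scipy import stats

def ci_test(data, i, j, cond, alpha=0.05):
    # Fisher z-test of whether column i is independent of column j given the columns in cond
    x, y = data[:, i], data[:, j]
    if cond:
        c = np.column_stack([np.ones(data.shape[0]), data[:, cond]])
        # Residualise x and y on the conditioning set, then correlate the residuals
        x = x - c @ np.linalg.lstsq(c, x, rcond=None)[0]
        y = y - c @ np.linalg.lstsq(c, y, rcond=None)[0]
    r = np.corrcoef(x, y)[0, 1]
    z_stat = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(cond) - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
    return p_value > alpha  # True means "treat as conditionally independent"

# Hypothetical data whose columns are ordered [W, X, Y, Q]
rng = np.random.default_rng(0)
W = rng.normal(size=500)
X = W + rng.normal(size=500)
Y = X + rng.normal(size=500)
Q = W + Y + rng.normal(size=500)
data = np.column_stack([W, X, Y, Q])

print(ci_test(data, 0, 2, [1]))  # W ⊥⊥ Y | X  -> expected True
print(ci_test(data, 0, 2, [3]))  # W ⊥⊥ Y | Q  -> expected False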
