Data Analysis For Social Scientists Cheatsheet

14.310x Data Analysis for Social Scientists

This is a cheat sheet for data analysis based on the online course given by Prof. Esther Duflo and Prof. Sara Ellison. Compiled by Janus B. Advincula.

Last Updated December 3, 2019
Module 1

Introduction
• Data is plentiful.
• Data is beautiful.
• Data is insightful.
• Data is powerful.
• Data can be deceitful.

Causation vs. Correlation
• Correlation is not causality.
• A causal story is not causality either.
• Even more sophisticated data use may still not be causality.

What We Need to Learn
• How do we model the processes that might have generated our data?
  - Probability
• How do we summarize and describe data, and try to uncover what process may have generated it?
  - Statistics
• How do we uncover patterns between variables?
  - Exploratory data analysis
  - Econometrics
Module 2

Fundamentals of Probability

A sample space S is a collection of all possible outcomes of an experiment.
An event A is any collection of outcomes (including individual outcomes, the entire sample space, the null set).

Useful results:
• If A ⊂ B, then A ∪ B = B.
• If A ⊂ B and B ⊂ A, then A = B.
• If A ⊂ B, then A ∩ B = AB = A.
• A ∪ Aᶜ = S. (A and Aᶜ form a partition of S.)

A and B are mutually exclusive (disjoint) if they have no outcomes in common.
A and B are exhaustive (complementary) if their union is S.

Probability
We will assign to every event A a number P(A), which is the probability the event will occur (P : S → R). We require that:
1. P(A) ≥ 0 for all A ⊂ S
2. P(S) = 1
3. For any sequence of disjoint sets A1, A2, . . . , P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ)

A probability on a sample space S is a collection of numbers P(A) that satisfy axioms 1-3.

Counting
1. If an experiment has two parts, the first one having m possibilities and, regardless of the outcome in the first part, the second one having n possibilities, then the experiment has m × n possible outcomes.
2. Any ordered arrangement of objects is called a permutation. The number of different permutations of N objects is N!. The number of different permutations of n objects taken from N objects is N!/(N − n)!.
3. Any unordered arrangement of objects is called a combination. The number of different combinations of n objects taken from N objects is N!/[(N − n)! n!]. We typically denote this (N choose n).

Properties:
• P(Aᶜ) = 1 − P(A)
• P(∅) = 0
• If A ⊂ B, then P(A) ≤ P(B)
• For all A, 0 ≤ P(A) ≤ 1
• P(A ∪ B) = P(A) + P(B) − P(AB)
• P(ABᶜ) = P(A) − P(AB)

Independence  Events A and B are independent if P(AB) = P(A)P(B).
Theorem  If A and B are independent, A and Bᶜ are also independent.

Conditional Probability  The probability of A conditional on B is
P(A|B) = P(AB)/P(B),  P(B) > 0.
If A and B are independent and P(B) > 0, then
P(A|B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A).

Bayes' Theorem
P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]
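A quick numeric illustration of Bayes' theorem in R (a hypothetical testing example, not from the course; all numbers are made up):

p_A      <- 0.01   # P(A): prior probability of the event
p_B_A    <- 0.95   # P(B|A)
p_B_notA <- 0.10   # P(B|A^c)
# Bayes' theorem: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)]
p_B_A * p_A / (p_B_A * p_A + p_B_notA * (1 - p_A))   # about 0.088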
Random Variables, Distributions, and Joint Distributions

A random variable is a real-valued function whose domain is the sample space.
A discrete random variable can take on only a finite or countably infinite number of values.
A random variable that can take on any value in some interval, bounded or unbounded, of the real line is called a continuous random variable.

The probability function (PF) of X, where X is a discrete random variable, is the function fX such that for any real number x, fX(x) = P(X = x).
Properties:
• 0 ≤ fX(xᵢ) ≤ 1
• Σᵢ fX(xᵢ) = 1
• P(A) = P(X ⊂ A) = Σ_{xᵢ ∈ A} fX(xᵢ)
• P(X = x) = 0 for any x if X is continuous.

The density or probability density function (PDF) is the continuous analog to the discrete PF in many ways. A random variable X is continuous if there exists a non-negative function fX such that for any interval A ⊂ R,
P(X ⊂ A) = ∫_A fX(x) dx.
Properties:
• 0 ≤ fX(x)
• ∫ fX(x) dx = 1
• P(A) = P(a ≤ X ≤ b) = ∫_A fX(x) dx

The cumulative distribution function (CDF) FX of a random variable X is defined for each x as
FX(x) = P(X ≤ x).
Properties:
• 0 ≤ FX(x) ≤ 1
• FX(x) is non-decreasing in x
• lim_{x→−∞} FX(x) = 0
• lim_{x→∞} FX(x) = 1
• FX(x) is right continuous.

A PF/PDF and a CDF for a particular random variable contain exactly the same information about its distribution, just in a different form.
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(t) dt
F′X(x) = dFX(x)/dx = fX(x)

Binomial Distribution  X ∼ B(n, p)
fX(x) = (n choose x) p^x (1 − p)^(n−x),  x = 0, 1, . . . , n
The binomial distribution describes the number of "successes" in n trials where the trials are independent and the probability of success in each is p.

Hypergeometric Distribution  X ∼ H(N, K, n)
fX(x) = (K choose x)(N − K choose n − x) / (N choose n),  x = max(0, n + K − N), . . . , min(n, K)
The hypergeometric distribution describes the number of "successes" in n trials where you're sampling without replacement from a population of size N whose initial probability of success was K/N.

Joint Distributions
If X and Y are continuous random variables defined on the same sample space S, then the joint probability density function of X & Y, fXY(x, y), is the surface such that for any region A of the xy-plane,
P((X, Y) ⊂ A) = ∫∫_A fXY(x, y) dx dy.
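In R, dbinom() and dhyper() evaluate these probability functions directly; a minimal sketch (the numbers are arbitrary):

# Binomial: P(X = 3) for X ~ B(n = 10, p = 0.3)
dbinom(3, size = 10, prob = 0.3)
choose(10, 3) * 0.3^3 * 0.7^7                  # same value from the formula

# Hypergeometric with N = 20, K = 8, n = 5: P(X = 2)
# (R parameterizes dhyper by m = K successes, n = N - K failures, k = n draws)
dhyper(2, m = 8, n = 12, k = 5)
choose(8, 2) * choose(12, 3) / choose(20, 5)   # same value from the formula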
Gathering and Collecting Data

Research involves human subjects if the researcher obtains
1. data through intervention or interaction with the individual, or
2. identifiable private information.

Key Principles of the Belmont Report
1. Respect for Persons
• Respect individual autonomy
• Protect individuals with reduced autonomy

Summarizing and Describing Data

Histogram
[Figure: histogram of height in centimeters, Bihar females; counts up to about 1,500 over the range 120-180 cm.]

Kernel Density Estimation
[Figure: kernel density estimate of the same variable; density values up to about 0.04.]
You can get a smoothed version of the histogram with a kernel density estimate, and an empirical version of the CDF (using ecdf in R).
[Figure: empirical CDF, rising from 0.00 to 1.00.]
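A minimal R sketch producing these three summaries for a hypothetical numeric vector (heights is simulated here, not the course data):

set.seed(1)
heights <- rnorm(1000, mean = 150, sd = 8)   # hypothetical sample
hist(heights)                                # histogram
plot(density(heights))                       # kernel density estimate
plot(ecdf(heights))                          # empirical CDF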
Joint, Marginal and Conditional Distributions

Joint Distribution
Example:
fXY(x, y) = c x² y  for x² ≤ y ≤ 1,  and 0 otherwise.

Support:
[Figure: the support is the region between the parabola y = x² and the line y = 1.]

Requiring the density to integrate to 1 gives c = 21/4, and then, for instance,
P(X ≥ Y) = ∫₀¹ ∫_{x²}^{x} (21/4) x² y dy dx = 3/20.

Marginal Distribution
For continuous random variables:
fX(x) = ∫_y fXY(x, y) dy

Conditional Distribution
fY|X(y|x) = fXY(x, y) / fX(x)
(= P(Y = y | X = x) for X, Y discrete)

Conditional distributions and independence
fY|X(y|x) = fY(y)  iff  fXY(x, y) = fX(x) fY(y)  iff  X & Y independent
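A quick numerical sanity check of this example in R (a crude Riemann sum; the grid size is arbitrary):

f <- function(x, y) ifelse(x^2 <= y & y <= 1, 21/4 * x^2 * y, 0)
g <- expand.grid(x = seq(-1, 1, length.out = 400), y = seq(0, 1, length.out = 400))
cell <- (2 / 400) * (1 / 400)
sum(f(g$x, g$y)) * cell                     # close to 1
sum(f(g$x, g$y) * (g$x >= g$y)) * cell      # close to 3/20 = 0.15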
Module 4

Functions of Random Variables
X is a random variable with fX(x) known. We want the distribution of Y = h(X). Then,
FY(y) = ∫_{x : h(x) ≤ y} fX(x) dx
If Y is also continuous, then
fY(y) = dFY(y)/dy.

Example:
fX(x) = 1/2 for −1 ≤ x ≤ 1, and 0 otherwise.
Y = X². What is fY(y)? Note that the support of X is [−1, 1], which implies that the induced support of Y is [0, 1].
FY(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_{−√y}^{√y} (1/2) dx = √y  for 0 ≤ y ≤ 1
fY(y) = 1/(2√y) for 0 ≤ y ≤ 1, and 0 otherwise.
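A simulation sketch of this change-of-variables example in R: draw X uniformly on [−1, 1], square it, and compare the histogram of Y with the derived density 1/(2√y).

set.seed(1)
x <- runif(1e5, -1, 1)                       # f_X(x) = 1/2 on [-1, 1]
y <- x^2                                     # Y = X^2 has support [0, 1]
hist(y, breaks = 50, freq = FALSE)
curve(1 / (2 * sqrt(x)), from = 0.01, to = 1, add = TRUE)   # f_Y(y) = 1/(2*sqrt(y))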
Convolution
A convolution refers to the sum of independent random variables. Let X be continuous with PDF fX, Y continuous with PDF fY, and X and Y independent. Let Z be their sum. What is the PDF of Z?
fZ(z) = ∫_{−∞}^{∞} fX(z − y) fY(y) dy,  −∞ < z < ∞

Order Statistics
Let X1, . . . , Xn be continuous, independent, identically distributed, with PDF fX. Let Yn = max{X1, . . . , Xn}. This is called the nth order statistic.
Distribution:
Fn(y) = (FX(y))^n
fn(y) = dFn(y)/dy = n (FX(y))^(n−1) fX(y)
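A simulation sketch for the nth order statistic of Uniform(0, 1) draws, where FX(y) = y, so Fn(y) = y^n and fn(y) = n y^(n−1):

set.seed(1)
n <- 5
ymax <- replicate(1e4, max(runif(n)))      # nth order statistic of n Uniform(0,1) draws
hist(ymax, breaks = 50, freq = FALSE)
curve(n * x^(n - 1), add = TRUE)           # f_n(y) = n * y^(n-1)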
Moments of a Distribution
The mode is the point where the PDF reaches its highest value.
The median is the point above and below which the integral of the PDF is equal to 1/2.
The mean, or expectation, or expected value, is defined as
E[X] = ∫ x fX(x) dx.
For a function Y = g(X),
E[Y] = E[g(X)] = ∫ g(x) fX(x) dx

Expectation, Variance and an Introduction to Regression
Properties of Expectation:
1. E[a] = a, a constant
2. E[Y] = aE[X] + b,  Y = aX + b
3. E[Y] = E[X1] + · · · + E[Xn],  Y = X1 + · · · + Xn
4. E[Y] = a1E[X1] + · · · + anE[Xn] + b,  Y = a1X1 + · · · + anXn + b
5. E[XY] = E[X]E[Y] if X, Y independent

Variance
Var(X) = E[(X − µ)²]
6. Var(X) = E[X²] − (E[X])²

Standard Deviation
SD(X) = σ = √Var(X)

Conditional Expectation
E[Y|X] = ∫ y fY|X(y|x) dy
Note that E[Y|X] is a function of X and, therefore, a random variable.

Law of Iterated Expectations
E[E[Y|X]] = E[Y]

Law of Total Variance
Var(E[Y|X]) + E[Var(Y|X)] = Var(Y)

Covariance and Correlation
Covariance
Cov(X, Y) = E[(X − µX)(Y − µY)]
Correlation
ρ(X, Y) = E[(X − µX)(Y − µY)] / (√Var(X) √Var(Y))
• X & Y are "positively correlated" if ρ > 0.
• X & Y are "negatively correlated" if ρ < 0.
• X & Y are "uncorrelated" if ρ = 0.

Properties of Covariance:
1. Cov(X, X) = Var(X)
2. Cov(X, Y) = Cov(Y, X)
3. Cov(X, Y) = E[XY] − E[X]E[Y]
4. X, Y independent ⇒ Cov(X, Y) = 0
5. Cov(aX + b, cY + d) = ac Cov(X, Y)
6. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
7. |ρ(X, Y)| ≤ 1
8. |ρ(X, Y)| = 1 iff Y = aX + b, a ≠ 0

A Preview of Regression
Let β = Cov(X, Y)/Var(X) and α = µY − βµX.
Then U = Y − α − βX has the following properties:
• E[U] = 0
• Cov(X, U) = 0
α & β are the regression coefficients.

Inequalities
Markov Inequality  X is a random variable that is always non-negative. Then, for any t > 0,
P(X ≥ t) ≤ E[X]/t
Chebyshev Inequality  X is a random variable for which Var(X) exists. Then, for any t > 0,
P(|X − E[X]| ≥ t) ≤ Var(X)/t²
Module 5

Special Distributions
Bernoulli  Two possible outcomes: success or failure. The probability of success is p, of failure q (= 1 − p).
Binomial  If X1, . . . , Xn are i.i.d. random variables, all Bernoulli distributed with success probability p, then
Σ_{k=1}^{n} Xk ∼ B(n, p), the binomial distribution.
The binomial distribution is the number of successes in a sequence of n independent (success/failure) trials, each of which yields success with probability p.
Hypergeometric  The binomial distribution is used to model the number of successes in a sample of size n with replacement. If you sample without replacement, you get the hypergeometric distribution.
Negative Binomial  Consider a sequence of independent Bernoulli trials, and let X be the number of trials necessary to achieve r successes. X has a negative binomial distribution.
Geometric  A negative binomial distribution with r = 1 is a geometric distribution. It is the number of failures before the first success.
• The sum of r independent Geometric(p) random variables is a negative binomial (r, p) random variable.
• If the Xi are i.i.d. and negative binomial (ri, p), then Σ Xi is distributed as a negative binomial (Σ ri, p).
• Memorylessness
Poisson  The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time if:
1. the events can be counted in whole numbers,
2. the occurrences are independent, and
3. the average frequency of occurrence for a time period is known.
Relationship between Poisson and Binomial  For small values of p, the Poisson distribution can be used to approximate the Binomial distribution.
Exponential
• waiting time between two events in a Poisson process

The Sample Mean, Central Limit Theorem and Estimation
The sample mean is the arithmetic average of the n random variables (or realizations) from a random sample of size n.
X̄n = (1/n)(X1 + · · · + Xn) = (1/n) Σ_{i=1}^{n} Xi
Expectation of the sample mean:
E[X̄n] = µ
Variance of the sample mean:
Var(X̄n) = σ²/n

The Central Limit Theorem
Let X1, . . . , Xn form a random sample of size n from a distribution with finite mean and variance. Then for any fixed number x,
lim_{n→∞} P(√n (X̄ − µ)/σ ≤ x) = Φ(x)
where Φ(x) is the CDF of a standard normal random variable.
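A simulation sketch of the CLT in R: standardized means of i.i.d. Exponential(1) samples (µ = σ = 1) look approximately standard normal even for moderate n.

set.seed(1)
n <- 30
z <- replicate(1e4, sqrt(n) * (mean(rexp(n, rate = 1)) - 1) / 1)   # sqrt(n)(Xbar - mu)/sigma
hist(z, breaks = 50, freq = FALSE)
curve(dnorm(x), add = TRUE)                                        # standard normal limit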
Statistics
An estimator is a function of the random variables in a random sample.
A parameter is a constant indexing a family of distributions.
The function of the random sample is the estimator; the number, or realization, of that function is the estimate.

Theorem  The sample mean for an i.i.d. sample is unbiased for the population mean.
Theorem  The sample variance for an i.i.d. sample is unbiased for the population variance, where the sample variance is
S² = 1/(n − 1) Σ_{i=1}^{n} (Xi − X̄n)²

Given two unbiased estimators, θ̂1 & θ̂2, θ̂1 is more efficient than θ̂2 if, for a given sample size,
Var(θ̂1) < Var(θ̂2)

Mean Squared Error  Sometimes we are interested in trading off bias and variance/efficiency.
MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + (E[θ̂] − θ)²

θ̂ is a consistent estimator for θ if, for every δ > 0,
lim_{n→∞} P(|θ − θ̂n| < δ) = 1.
Roughly, an estimator is consistent if its distribution collapses to a single point at the true parameter as n → ∞.

Method of Moments
Population Moments (about the origin):
E[X], E[X²], E[X³], . . .
Sample Moments:
(1/n) Σ Xi, (1/n) Σ Xi², (1/n) Σ Xi³, . . .
To estimate parameters, set the sample moments equal to the corresponding population moments and solve.
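A minimal method-of-moments sketch in R (a standard exponential example, not necessarily the one used in the course): since E[X] = 1/λ, matching the first sample moment gives λ̂ = 1/X̄.

set.seed(1)
x <- rexp(500, rate = 2)     # true lambda = 2
1 / mean(x)                  # method-of-moments estimate, close to 2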
F Distribution
If X ∼ χ²n and Z ∼ χ²m and they're independent, then
(X/n)/(Z/m) ∼ F_{n,m}.

Confidence Intervals
Case 1  We are sampling from a normal distribution with a known variance and we want a confidence interval for the mean.
P(Φ⁻¹(α/2) < √n (X̄ − µ)/σ < −Φ⁻¹(α/2)) = 1 − α
CI_{1−α} = [X̄ + Φ⁻¹(α/2) σ/√n,  X̄ − Φ⁻¹(α/2) σ/√n]

Case 2  We are sampling from a normal distribution with an unknown variance and we want a confidence interval for the mean. With S the sample standard deviation,
P(t⁻¹_{n−1}(α/2) < √n (X̄ − µ)/S < −t⁻¹_{n−1}(α/2)) = 1 − α
CI_{1−α} = [X̄ + t⁻¹_{n−1}(α/2) S/√n,  X̄ − t⁻¹_{n−1}(α/2) S/√n]

What happens as n increases or decreases?
[Figure: the sampling distribution of X̄ for smaller n vs. larger n.]
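Both cases in R, using qnorm() and qt() for the Φ⁻¹ and t⁻¹ quantiles (the data are simulated):

set.seed(1)
x <- rnorm(25, mean = 10, sd = 2); n <- length(x); alpha <- 0.05
# Case 1: known sigma = 2 (lower then upper limit, since qnorm(alpha/2) < 0)
mean(x) + c(1, -1) * qnorm(alpha / 2) * 2 / sqrt(n)
# Case 2: unknown sigma, use the sample sd and the t distribution
mean(x) + c(1, -1) * qt(alpha / 2, df = n - 1) * sd(x) / sqrt(n)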
Hypothesis Testing
A hypothesis is an assumption about the distribution of a random variable in a population.
A maintained hypothesis is one that cannot or will not be tested.
A testable hypothesis is one that can be tested using evidence from a random sample.
The null hypothesis, H0, is the one that will be tested.
The alternative hypothesis, HA, is a possibility (or series of possibilities) other than the null.
We might want to perform a test concerning an unknown parameter θ where Xi ∼ f(x|θ):
H0: θ in Θ0
HA: θ in ΘA, where Θ0 and ΘA are disjoint.
We define the critical region of the test, C or CX, as the region of the support of the random sample for which we reject the null. The critical region will take the form X̄ > k for some k yet to be determined.

Statistical Power
[Figure: fX̄ under H0 and fX̄ under HA with cutoff k; β is the area under HA below k, α the area under H0 beyond k, and the "power" is 1 − β.]
Choice of any one of α, β, or k determines the other two. This involves an explicit trade-off between the probability of type I and type II errors.
• Increasing k means α ↓ and β ↑.
• Decreasing k means α ↑ and β ↓.

For a randomized experiment with Nt treatment and Nc control units and treatment effect τ,
(Ȳt_obs − Ȳc_obs − τ) / √(σ²/Nt + σ²/Nc) ≈ N(0, 1)
The power of a two-sided test of size α is
P(|T| > Φ⁻¹(1 − α/2)) ≈ Φ(−Φ⁻¹(1 − α/2) + τ/√(σ²/Nt + σ²/Nc)) + Φ(−Φ⁻¹(1 − α/2) − τ/√(σ²/Nt + σ²/Nc))
The second term is very small so we ignore it. We want the first term to be equal to 1 − β:
Φ⁻¹(1 − β) = −Φ⁻¹(1 − α/2) + τ√(N γ(1 − γ))/σ
where γ = Nt/N.
The required sample size is
N = (Φ⁻¹(1 − β) + Φ⁻¹(1 − α/2))² / [(τ²/σ²) γ(1 − γ)]
• With stratified design, the variance of the estimated treatment effect is lower.
• With clustered design, the variance of the estimated treatment effect is larger.
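The sample-size formula in R, under assumed values τ/σ = 0.2 (standardized effect size), α = 0.05, power 1 − β = 0.8, and γ = 0.5 (all hypothetical):

tau_over_sigma <- 0.2; alpha <- 0.05; beta <- 0.2; gamma <- 0.5
N <- (qnorm(1 - beta) + qnorm(1 - alpha / 2))^2 /
     (tau_over_sigma^2 * gamma * (1 - gamma))
ceiling(N)     # total sample size, roughly 785 (close to what power.t.test() gives)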
Stratified Design
• Take the difference in means within each stratum.
• Take a weighted average of the stratum-level treatment effects, with weights the size of the strata:
  Σ_g (Ng/N) τ̂g
• This will be an unbiased estimate of the average treatment effect.
• The variance will be calculated as:
  Σ_g (Ng/N)² V̂g
• Special case: the probability of assignment to the control group stays the same in each stratum. Then this coefficient is equal to the simple difference between treatment and control, but the variance is always weakly lower.
• Stratification will lower the required sample size for a given power.

Clustered Design
• We need to take into account the fact that the potential outcomes for units within randomization clusters are not independent.
• Conservative way to do this: just average the outcome by cluster and treat each cluster as an observation.
• The number of observations is then the number of clusters, and you can analyze this data exactly as a completely randomized experiment but with clusters as the units of observation.

Stable Unit Treatment Value Assumption (SUTVA)  The potential outcomes for any unit do not vary with the treatments assigned to other units, and, for each unit, there are no different forms or versions of each treatment level which lead to different potential outcomes.

The Assignment Mechanism  Let's assume we have a population of size N, indexed by i. Let the treatment indicator Wi take on the values 0 (the control treatment) and 1 (the active treatment). We have one realized (and possibly observed) potential outcome for each unit, denoted by Yi_obs:
Yi_obs = Yi(Wi) = Yi(0) if Wi = 0,  Yi(1) if Wi = 1.
For each unit we also have one missing potential outcome, Yi_mis:
Yi_mis = Yi(1 − Wi) = Yi(1) if Wi = 0,  Yi(0) if Wi = 1.
Comparisons of Yi(1) and Yi(0) are unit-level causal effects:
Yi(1) − Yi(0)

Missing data problem  Given any treatment assigned to an individual unit, the potential outcome associated with any alternate treatment is missing. A key role is therefore played by the missing data mechanism, or the assignment mechanism: how is it determined which units get which treatments or, equivalently, which potential outcomes are realized (and observed) and which are missing?

Analyzing Randomized Experiments

The Average Treatment Effect
ATE = E[Yi_obs | Wi = 1] − E[Yi_obs | Wi = 0]
Suppose we have a completely randomized experiment with Nt treatment units and Nc control units. The difference in sample averages is
τ̂ = (1/Nt) Σ_{i:Wi=1} Yi_obs − (1/Nc) Σ_{i:Wi=0} Yi_obs = Ȳt_obs − Ȳc_obs.
The variance of a difference of two statistically independent variables is the sum of their variances. Thus,
V(τ̂) = Sc²/Nc + St²/Nt.
To estimate the variance, V̂(τ̂), replace Sc² and St² by their sample counterparts:
sc² = 1/(Nc − 1) Σ_{i:Wi=0} (Yi_obs − Ȳc_obs)²
and analogously st² for the treated units.
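A minimal sketch of τ̂ and its estimated variance in R, for hypothetical vectors y (outcome) and w (treatment indicator):

set.seed(1)
w <- rbinom(200, 1, 0.5)
y <- 1 + 0.5 * w + rnorm(200)                 # hypothetical data, true effect 0.5
tau_hat <- mean(y[w == 1]) - mean(y[w == 0])
v_hat   <- var(y[w == 0]) / sum(w == 0) + var(y[w == 1]) / sum(w == 1)
c(estimate = tau_hat, std.error = sqrt(v_hat))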
Kolmogorov-Smirnov Test
Let X1, . . . , Xn be a random sample with CDF F and let Y1, . . . , Ym be a random sample with CDF G.
We are interested in testing the hypotheses
H0: F = G
Ha: F ≠ G
The Statistic
Dnm = max_x |Fn(x) − Gm(x)|
where Fn and Gm are the empirical CDFs of the first and second sample. The empirical CDF just counts the share of sample points below level x:
Fn(x) = Pn(X < x) = (1/n) Σ_{i=1}^{n} I(Xi < x)

First Order Stochastic Dominance: One-sided Kolmogorov-Smirnov Test
We are interested in testing the hypothesis
H0: F = G
against
Ha: F > G
(which would mean that G FSD F).
The one-sided KS statistic is:
D⁺nm = max_x [Fn(x) − Gm(x)]
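R's ks.test() computes these statistics; a minimal sketch on simulated samples:

set.seed(1)
x <- rnorm(100); y <- rnorm(100, mean = 0.5)
ks.test(x, y)                            # two-sided: Ha is F != G
ks.test(x, y, alternative = "greater")   # one-sided, based on D+ = max(Fn - Gm)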
Kernel Density Estimation and Kernel Regression
The kernel (Nadaraya-Watson) regression estimate of E[Y | X = x] is the ratio of:
Numerator:
(1/nh) Σ_{i=1}^{n} yi K((x − xi)/h)
Denominator:
f̂(x) = (1/nh) Σ_{i=1}^{n} K((x − xi)/h)
where h is the bandwidth, the denominator f̂(x) is the kernel estimate of the density of x, and K(·) is a density.

Large sample properties:
• As h goes to zero, the bias goes to zero.
• As nh goes to infinity, the variance goes to zero.
• As you increase the number of observations, you promise to decrease the bandwidth.

Choices to make:
• Choice of kernel
  1. Histogram (uniform): K(u) = 1/2 if |u| ≤ 1, K(u) = 0 otherwise.
  2. Epanechnikov: K(u) = (3/4)(1 − u²) if |u| ≤ 1, K(u) = 0 otherwise.
  3. Quartic: K(u) = (15/16)(1 − u²)² if |u| ≤ 1, K(u) = 0 otherwise.
• Choice of bandwidth: trade off bias and variance.
  – A large bandwidth will lead to more bias.
  – A small bandwidth will lead to more variance.
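In base R, density() implements the kernel density estimate with a choice of kernel and bandwidth, and ksmooth() gives a basic kernel regression; a minimal sketch (bandwidths chosen arbitrarily):

set.seed(1)
x <- rnorm(500); y <- sin(x) + rnorm(500, sd = 0.3)
plot(density(x, kernel = "epanechnikov", bw = 0.2))   # small bandwidth: less bias, more variance
lines(density(x, kernel = "epanechnikov", bw = 1))    # large bandwidth: more bias, less variance
plot(x, y)
lines(ksmooth(x, y, kernel = "normal", bandwidth = 0.5), col = 2)   # kernel regression fit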
Linear Regression
Under the assumptions of the Classical Linear Regression Model, OLS provides the minimum variance (most efficient) unbiased estimator of β0 and β1. It is the MLE under normality of errors, and the estimates are consistent and asymptotically normal.

Closed-form solutions:
β̂1 = [(1/n) Σ (Xi − X̄)(Yi − Ȳ)] / [(1/n) Σ (Xi − X̄)²],   β̂0 = Ȳ − β̂1 X̄

Fitted Value  Ŷi = β̂0 + β̂1 Xi
Residual  ε̂i = Yi − Ŷi
Regression Line or Fitted Line  Ŷ = β̂0 + β̂1 X

Let X̄ = (1/n) Σ Xi and σ̂²X = (1/n) Σ (Xi − X̄)².

Some comparative statics:
Mean: E[β̂0] = β0,  E[β̂1] = β1
Variance: Var(β̂0) = σ²/n + σ² X̄²/(n σ̂²X),  Var(β̂1) = σ²/(n σ̂²X)
Covariance: Cov(β̂0, β̂1) = −σ² X̄/(n σ̂²X)
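A quick check in R that lm() reproduces the closed-form solutions:

set.seed(1)
x <- rnorm(100); y <- 2 + 3 * x + rnorm(100)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)
coef(lm(y ~ x))    # same estimates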
Useful R Functions
chooseMatrix(n, m)  [package perm]  Creates a matrix of choose(n, m) rows and n columns; the matrix has unique rows with m ones in each row and the rest zeros.
confint()  Computes confidence intervals for one or more parameters in a fitted model.
ivreg()  [package AER]  Fits an IV regression by the two-stage least squares method.
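A usage sketch for confint() on a fitted model (with hypothetical data; ivreg() fits are handled the same way):

fit <- lm(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100)))
confint(fit, level = 0.95)   # 95% confidence intervals for the intercept and slope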
Recommended Resources
Distribution — PDF / PMF — Expectation and Variance

Binomial
pX(x) = (n choose x) p^x (1 − p)^(n−x),  x = 0, 1, . . . , n
E[X] = np,  Var(X) = np(1 − p)

Hypergeometric
pX(x) = (A choose x)(B choose n − x) / (A + B choose n)
E[X] = nA/(A + B),  Var(X) = nAB(A + B − n) / [(A + B)²(A + B − 1)]

Negative Binomial
pX(k) = (r + k − 1 choose k) p^r (1 − p)^k
E[X] = r(1 − p)/p,  Var(X) = r(1 − p)/p²

Geometric
pX(k) = (1 − p)^(k−1) p
E[X] = 1/p,  Var(X) = (1 − p)/p²

Poisson
pX(k) = λ^k e^(−λ) / k!
E[X] = λ,  Var(X) = λ

Exponential
fX(x) = λ e^(−λx)
E[X] = 1/λ,  Var(X) = 1/λ²

Uniform
fX(x) = 1/(b − a)
E[X] = (a + b)/2,  Var(X) = (b − a)²/12

Normal
fX(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²))
E[X] = µ,  Var(X) = σ²