03 - Causality PDF

The document discusses a lecture on causality in empirical finance methods. It reviews potential biases like omitted variable bias, measurement error bias, and simultaneity bias that can invalidate causal inferences. It also covers ways to address omitted variable bias, such as adding observable omitted variables as controls to the regression.

FNCE 926

Empirical Methods in CF
Lecture 3 – Causality

Professor Todd Gormley


Announcement
n  You should have uploaded Exercise #1
and your DO files to Canvas already
n  Exercise #2 due two weeks from today

2
Background readings for today
n  Roberts-Whited
q  Section 2
n  Angrist and Pischke
q  Section 3.2
n  Wooldridge
q  Sections 4.3 & 4.4
n  Greene
q  Sections 5.8-5.9

3
Outline for Today
n  Quick review
n  Motivate why we care about causality
n  Describe three possible biases & some
potential solutions
q  Omitted variable bias
q  Measurement error bias
q  Simultaneity bias

n  Student presentations of "Classics #2"

4
Quick Review [Part 1]
n  Why is adding irrelevant regressors a
potential problem?
q  Answer = It can inflate standard errors if the
irrelevant regressors are highly collinear with
variable of interest

n  Why is a larger sample helpful?


q  Answer = It gives us more variation in x,
which helps lower our standard errors

5
Quick Review [Part 2]
n  Suppose, β1 < 0 and β3 > 0 … what is the
sign of the effect of an increase in x1 for the
average firm in the below estimation?
y = β 0 + β1 x1 + β 2 x2 + β3 x1 x2 + u
q  Answer: It is the sign of
dy x2 = x2
| = β1 + β3 x2
dx1

6
Quick Review [Part 3]
n  How could we make the coefficients easier
to interpret in the prior example?
q  Shift all the variables by subtracting out their
sample mean before doing the estimation
q  It will allow the non-interacted coefficients to be
interpreted as effect for average firm

7
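The demeaning trick above can be verified with a short simulation (a sketch using numpy; the coefficient values and variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(2.0, 1.0, n)
x2 = rng.normal(5.0, 2.0, n)
# True model: y = 1 - 0.5*x1 + 0.3*x2 + 0.2*x1*x2 + u
y = 1.0 - 0.5 * x1 + 0.3 * x2 + 0.2 * x1 * x2 + rng.normal(0, 0.1, n)

def ols(cols, y):
    # OLS with an intercept; returns [const, slopes...]
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Raw regression: coefficient on x1 is the effect only when x2 = 0
b_raw = ols([x1, x2, x1 * x2], y)

# Demeaned regression: coefficient on x1 is the effect for the average firm
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
b_dem = ols([x1c, x2c, x1c * x2c], y)

# Same quantity computed by hand from the raw coefficients
effect_at_mean = b_raw[1] + b_raw[3] * x2.mean()
```

Because demeaning is just a reparametrization of the same regression, the non-interacted slope in the demeaned fit equals β1 + β3·x̄2 exactly; here that is roughly −0.5 + 0.2·5 = 0.5.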
Quick Review [Part 4]
n  Consider the following estimate:
ln(wage) = 0.32 − 0.11·female + 0.21·married
− 0.30·(female × married) + 0.08·education

q  Question: How much lower are wages of married


and unmarried females after controlling for
education, and who is this relative to?
n  Answer = unmarried females make 11% less than single
males; married females make –11%+21%–30%=20% less

8
Outline for Today
n  Quick review
n  Motivate why we care about causality
n  Describe three possible biases & some
potential solutions
q  Omitted variable bias
q  Measurement error bias
q  Simultaneity bias

n  Student presentations of "Classics #2"

9
Motivation
n  As researchers, we are interested in
making causal statements
q  Ex. #1 – what is the effect of a change in
corporate taxes on firms' leverage choice?
q  Ex. #2 – what is the effect of giving a CEO
more stock ownership in the firm on the
CEO's desire to take on risky investments?

n  I.e. we don't like to just say variables are


'associated' or 'correlated' with each other

10
What do we mean by causality?
n  Recall from earlier lecture, that if our linear
model is the following…

y = β 0 + β1 x1 + ... + β k xk + u

n  And, if we want to infer β1 as the causal
effect of x1 on y, holding all else equal, then
we need to make the following assumptions…

11
The basic assumptions
n  Assumption #1: E(u) = 0
n  Assumption #2: E(u|x1,…,xk) = E(u)
q  In words, average of u (i.e. unexplained portion
of y) does not depend on value of x
q  This is "conditional mean independence" (CMI)
n  Generally speaking, you need the estimation
error to be uncorrelated with all the x's

12
Tangent – CMI versus correlation
n  CMI (which implies x and u are
uncorrelated) is needed for unbiasedness
[which is again a finite sample property]
n  But, we only need to assume a zero
correlation between x and u for consistency
[which is a large sample property]
q  This is why I'll typically just refer to whether
u and x are correlated in my test of whether
we can make causal inferences

13
Three main ways this will be violated
n  Omitted variable bias
n  Measurement error bias
n  Simultaneity bias

n  Now, let's go through each in turn…

14
Omitted variable bias (OVB)
n  Probably the most common concern you
will hear researchers worry about
n  Basic idea = the estimation error, u,
contains other variable, e.g. z, that affects
y and is correlated with an x
q  Please note! The omitted variable is only
problematic if correlated with an x

15
OVB more formally, with one variable
n  You estimate: y = β 0 + β1 x + u
n  But, true model is: y = β 0 + β1 x + β 2 z + v

n  Then, βˆ1 = β1 + δ xz β2 , where δ xz is the


coefficient you'd get from regressing the
omitted variable, z, on x; and

cov( x, z )
δ xz =
var( x)

16
Interpreting the OVB formula
β̂1 = β1 + β2·cov(x, z)/var(x)

[Effect of x on y: β1; effect of z on y: β2;
regression of z on x: cov(x, z)/var(x);
the second term is the bias]

n  Easy to see, estimated coefficient is only unbiased
if cov(x, z) = 0 [i.e. x and z are uncorrelated] or z
has no effect on y [i.e. β2 = 0]

17
Direction and magnitude of the bias
β̂1 = β1 + β2·cov(x, z)/var(x)

n  Direction of bias given by signs of β2, cov(x, z)


q  E.g. If know z has positive effect on y [i.e. β2 > 0]
and x and z are positively correlated [cov(x, z) > 0],
then the bias will be positive

n  Magnitude of the bias will be given by


magnitudes of β2, cov(x, z)/var(x)

18
Example – One variable case
n  Suppose we estimate: ln( wage) = β 0 + β1educ + w
n  But, true model is:
ln( wage) = β 0 + β1educ + β 2 ability + u

n  What is likely bias on β̂1 ? Recall,

β̂1 = β1 + β2·cov(educ, ability)/var(educ)

19
Example – Answer
q  Ability & wages likely positively correlated, so β2 > 0
q  Ability & education likely positive correlated, so
cov(educ, ability) > 0
q  Thus, the bias is likely to positive! βˆ is too big!
1

20
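The OVB formula on the prior slides can be confirmed by simulation; a sketch where ability raises both wages and schooling (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta1, beta2 = 0.08, 0.05      # returns to educ and ability (illustrative)
ability = rng.normal(0, 1, n)
educ = 12 + 2 * ability + rng.normal(0, 2, n)  # educ rises with ability
log_wage = 1.5 + beta1 * educ + beta2 * ability + rng.normal(0, 0.3, n)

def slope(x, y):
    # Simple-regression slope: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b1_hat = slope(educ, log_wage)       # short regression, omitting ability
delta = slope(educ, ability)         # regression of omitted z on x
predicted = beta1 + delta * beta2    # the OVB formula
```

The short-regression slope matches β1 + δxz·β2, and since β2 > 0 and cov(educ, ability) > 0, the estimate is biased upward, exactly as the slide argues.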
OVB – General Form
n  Once move away from simple case of just one
omitted variable, determining sign (and
magnitude) of bias will be a lot harder
q  Let β be vector of coefficients on k included variables
q  Let γ be vector of coefficient on l excluded variables
q  Let X be matrix of observations of included variables
q  Let Z be matrix of observations of excluded variables

β̂ = β + (E[X'X])⁻¹(E[X'Z])γ

21
OVB – General Form, Intuition
β̂ = β + (E[X'X])⁻¹(E[X'Z])γ

[β̂ is the vector of regression coefficients;
γ is the vector of partial effects of the
excluded variables]

n  Same idea as before, but more complicated


n  Frankly, this can be a real mess!
[See Gormley and Matsa (2014) for example with
just two included and two excluded variables]

22
Eliminating Omitted Variable Bias

n  How we try to get rid of this bias will


depend on the type of omitted variable
q  Observable omitted variable
q  Unobservable omitted variable

How can we deal with an


observable omitted variable?

23
Observable omitted variables
n  This is easy! Just add them as controls
q  E.g. if the omitted variable, z, in my simple case
was 'leverage', then add leverage to regression
n  A functional form misspecification is a special
case of an observable omitted variable
Let's now talk about this…

24
Functional form misspecification
n  Assume true model is…
y = β0 + β1x1 + β2x2 + β3x2² + u

n  But, we omit the squared term, x2²
q  Just like any OVB, bias on (β0, β1, β2) will
depend on β3 and correlations among (x1, x2, x2²)
q  You get same type of problem if have incorrect
functional form for y [e.g. it should be ln(y) not y]
n  In some sense, this is a minor problem… Why?

25
Tests for correct functional form
n  You could add additional squared and
cubed terms and look to see whether
they make a difference and/or have
non-zero coefficients
n  This isn't as easy when the possible
models are not nested…

26
Non-nested functional form issues…
n  Two non-nested examples are:
y = β 0 + β1 x1 + β 2 x2 + u Let's use this
versus example and
y = β 0 + β1 ln( x1 ) + β 2 ln( x2 ) + u see how we can
try to figure out
which is right
y = β 0 + β1 x1 + β 2 x2 + u
versus
y = β 0 + β1 x1 + β 2 z + u

27
Davidson-MacKinnon Test [Part 1]
n  To test which is correct, you can try this…
q  Take fitted values, ŷ , from 1st model and add them
as a control in 2nd model
y = β0 + β1 ln( x1 ) + β2 ln( x2 ) + θ1 yˆ + u
q  Look at t-stat on θ1; if significant rejects 2nd model!
q  Then, do reverse, and look at t-stat on θ1 in
y = β 0 + β1 x1 + β 2 x2 + θ1 yˆˆ + u
where ŷˆ is predicted value from 2nd model… if
significant then 1st model is also rejected L

28
Davidson-MacKinnon Test [Part 2]
n  Number of weaknesses to this test…
q  A clear winner may not emerge
n  Both might be rejected
n  Both might be accepted [If this happens, you can
use the R2 to choose which model is a better fit]
q  And, rejecting one model does NOT imply
that the other model is correct L

29
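The Davidson-MacKinnon procedure can be sketched in numpy as follows (the data-generating process and all names are illustrative; here the log specification is assumed to be the true model):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x1 = rng.uniform(1.0, 10.0, n)
# Assumed true model for this sketch: y is linear in log(x1)
y = 1.0 + 2.0 * np.log(x1) + rng.normal(0, 0.5, n)

def ols(cols, y):
    # Returns coefficients and t-statistics (homoskedastic SEs)
    X = np.column_stack([np.ones(len(y))] + cols)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, b / se

# Model A (levels) and Model B (logs), each fit on its own
bA, _ = ols([x1], y)
yhat_A = bA[0] + bA[1] * x1
bB, _ = ols([np.log(x1)], y)
yhat_B = bB[0] + bB[1] * np.log(x1)

# Add fitted values from B into A: a significant theta rejects model A
_, tA = ols([x1, yhat_B], y)
t_reject_A = tA[2]
# Add fitted values from A into B: theta should add little here
_, tB = ols([np.log(x1), yhat_A], y)
t_reject_B = tB[2]
```

With the log model true, the fitted values from the log model are strongly significant inside the levels model (rejecting it), while the reverse addition contributes much less.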
Bottom line advice on functional form
n  Practically speaking, you hope that changes
in functional form won't effect coefficients
on key variables very much…
q  But, if it does… You need to think hard about
why this is and what the correct form should be
q  The prior test might help with that…

30
Eliminating Omitted Variable Bias

n  How we try to get rid of this bias will


depend on the type of omitted variable
q  Observable omitted variable
q  Unobservable omitted variable

Unobservable are much harder to deal with,


but one possibility is to find a proxy variable

31
Unobserved omitted variables
n  Again, consider earlier estimation

ln( wage) = β 0 + β1educ + β 2 ability + u

q  Problem: we don't observe & can't measure ability


q  What can we do? Ans. = Find a proxy variable that
is correlated with the unobserved variable, E.g. IQ

32
Proxy variables [Part 1]
n  Consider the following model…
y = β 0 + β1 x1 + β 2 x2 + β3 x3* + u
where x3* is unobserved, but we have proxy x3
n  Then, suppose x3* = δ0 + δ1x3 + v
q  v is error associated with proxy's imperfect
representation of unobservable x3
q  Intercept just accounts for different scales
[e.g. ability has different average value than IQ]

33
Proxy variables [Part 2]
n  If we are only interested in β1 or β2, we can just
replace x3* with x3 and estimate
y = β 0 + β1 x1 + β 2 x2 + β3 x3 + u

n  But, for this to give us consistent estimates of β1


and β2 , we need to make some assumptions
#1 – We've got the right model, and
#2 – Other variables don't explain unobserved
variable after we've accounted for our proxy

34
Proxy variables – Assumptions
#1 – E(u | x1, x2, x3*) = 0; i.e. we have the right
model and x3 would be irrelevant if we could
control for x1, x2, x3*, such that E(u | x3) = 0
q  This is a common assumption; not controversial

#2 – E(v | x1, x2, x3) = 0; i.e. x3 is a good proxy
for x3* such that, after controlling for x3, x3*
doesn't depend on x1 or x2
q  I.e. E(x3* | x1, x2, x3) = E(x3* | x3)

35
Why the proxy works…
n  Recall true model: y = β 0 + β x
1 1 + β x
2 2 + β 3 3 +u
x *

n  Now plug-in for x3*, using x3* = δ 0 + δ1 x3 + v

y = ( β0 + β3δ 0 ) + β1x1 + β 2 x2 + ( β3δ 1 ) x3 + ( u + β3v )


!#"#$ % ! #"# $
α0 α1 e

q  Prior assumptions ensure that E (e | x1 , x2 , x3 ) = 0


such that the estimates of (α 0 , β1 , β 2 , α1 ) are consistent
q  Note: β0 and β3 are not identified

36
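A simulation can illustrate the result above: regressing on the proxy x3 gives consistent estimates of β1 and β2, while the slope on the proxy recovers β3δ1 rather than β3 (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
b0, b1, b2, b3 = 1.0, 0.5, -0.3, 0.8
d0, d1 = 2.0, 1.5

x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
x3 = rng.normal(0, 1, n)                        # the proxy (e.g. IQ)
# Unobservable (e.g. ability); error v independent of x1, x2, x3,
# which is exactly proxy assumption #2
x3_star = d0 + d1 * x3 + rng.normal(0, 1, n)
y = b0 + b1 * x1 + b2 * x2 + b3 * x3_star + rng.normal(0, 1, n)

# Estimate with the proxy in place of the unobservable
X = np.column_stack([np.ones(n), x1, x2, x3])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
# coef[1], coef[2] recover b1, b2; coef[3] recovers b3*d1, not b3
```

Note the slope on the proxy converges to β3δ1 = 0.8·1.5 = 1.2, consistent with the slide's point that β3 itself is not identified.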
Proxy assumptions are key [Part 1]
n  Suppose assumption #2 is wrong such that
x = δ 0 + δ 1x3 + γ 1x1 + γ 2 x2 + w
*
3
!##"## $
v
where E ( w | x1 , x2 , x3 ) = 0

q  If above is true, E (v | x1 , x2 , x3 ) ≠ 0, and if you


substitute into model of y, you'd get…

37
Proxy assumptions are key [Part 2]
n  Plugging in for x3*, you'd get
y = α0 + α1x1 + α2x2 + α3x3 + e
where α0 = β0 + β3δ0
      α1 = β1 + β3γ1
      α2 = β2 + β3γ2
      α3 = β3δ1

q  E.g. α1 captures effect of x1 on y, β1, but also
its correlation with the unobserved variable

n  We'd get consistent estimates of (α0, α1, α2, α3)…
But that isn't what we want!

38
Proxy variables – Example #1
n  Consider earlier wage estimation
ln( wage) = β 0 + β1educ + β 2 ability + u

q  If use IQ as proxy for unobserved ability, what


assumption must we make? Is it plausible?
n  Answer: We assume E (ability | educ, IQ) = E (ability | IQ) ,
i.e. average ability does not change with education after
accounting for IQ… Could be questionable assumption!

39
Proxy variables – Example #2
n  Consider Q-theory of investment
investment = β 0 + β1Q + u

q  Can we estimate β1 using a firm's market-to-book


ratio (MTB) as proxy for Q? Why or why not?
n  Answer: Even if we believe this is the correct model
(Assumption #1) or that Q only depends on MTB
(Assumption #2), e.g. Q=δ0+δ1MTB, we are still not
getting estimate of β1… see next slide for the math

40
Proxy variables – Example #2 [Part 2]
n  Even if assumptions held, we'd only be getting
consistent estimates of
investment = α 0 + α1Q + e
where α 0 = β0 + β1δ 0
α1 = β1δ1

q  While we can't get β1, is there something we can


get if we make assumptions about sign of δ1?
q  Answer: Yes, the sign of β1

41
Proxy variables – Summary

n  If the coefficient on the unobserved variable


isn't what we are interested in, then a proxy
for it can be used to identify and remove
OVB from the other parameters
q  Proxy can also be used to determine sign of
coefficient on unobserved variable

42
Random Coefficient Model
n  So far, we've assumed that the effect of x on y
(i.e. β) was the same for all observations
q  In reality, this is unlikely true; model might look
more like yi = αi + βi xi + ui , where
α i = α + ci I.e. each observation's
βi = β + di relationship between x
and y is slightly different
E (ci ) = E (d i ) = 0

q  α is the average intercept and β is what we call the


"average partial effect" (APE)

43
Random Coefficient Model [Part 2]
n  Regression would seem to be incorrectly
specified, but if willing to make assumptions,
we can identify the APE
q  Plug in for α and β:
yi = α + βxi + (ci + dixi + ui)
[If you like, can think of the unobserved
differential intercepts and slopes as
omitted variables]
q  Identification requires
E(ci + dixi + ui | x) = 0
What does this imply?

44
Random Coefficient Model [Part 3]
n  This amounts to requiring
E ( ci | x ) = E ( ci ) = 0 ⇒ E (α i | x ) = E (α i )
E ( di | x ) = E ( di ) = 0 ⇒ E ( βi | x ) = E ( βi )

q  We must assume that the individual slopes and


intercepts are mean independent (i.e. uncorrelated
with the value of x) in order to estimate the APE
n  I.e. knowing x, doesn't help us predict the
individual's partial effect

45
Random Coefficient Model [Part 4]
n  Implications of APE
q  Be careful interpreting coefficients when
you are implicitly arguing elsewhere in paper
that effect of x varies across observations
n  Keep in mind the assumption this requires
n  And, describe results using something like…
"we find that, on average, an increase in x
causes a β change in y"

46
Three main ways this will be violated
n  Omitted variable bias
n  Measurement error bias
n  Simultaneity bias

47
Measurement error (ME) bias
n  Estimation will have measurement error whenever
we measure the variable of interest imprecisely
q  Ex. #1: Altman-z-score is noisy measure of default risk
q  Ex. #2: Avg. tax rate is noisy measure of marg. tax rate

n  Such measurement error can cause bias, and


the bias can be quite complicated

48
Measurement error vs. proxies
n  Measurement error is similar to proxy variable,
but very different conceptually
q  Proxy is used for something that is entirely
unobservable or measureable (e.g. ability)
q  With measurement error, the variable we don't
observe is well-defined and can be quantified… it's
just that our measure of it contains error

49
ME of Dep. Variable [Part 1]
n  Usually not a problem (in terms of bias); just
causes our standard errors to be larger. E.g. …
q  Let y* = β0 + β1x1 + ... + βkxk + u

q  But, we measure y* with error, e = y − y*

q  Because we only observe y, we estimate
y = β0 + β1x1 + ... + βkxk + (u + e)

Note: we always assume E(e) = 0; this
is innocuous because, if untrue, it
only affects the bias on the constant

50
ME of Dep. Variable [Part 2]
n  As long as E(e|x)=0, the OLS estimates
are consistent and unbiased
q  I.e. as long as the measurement error of y is
uncorrelated with the x's, we're okay
q  Only issue is that we get larger standard errors
when e and u are uncorrelated [which is what
we typically assume] because Var(u+e)>Var(u)

What are some common examples of ME?

51
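The claim above (measurement error in the dependent variable leaves coefficients unbiased but inflates standard errors) can be checked with a quick simulation; all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.normal(0, 1, n)
y_star = 2.0 + 0.7 * x + rng.normal(0, 1.0, n)  # true dependent variable
e = rng.normal(0, 1.0, n)                       # ME with E(e|x) = 0
y = y_star + e                                  # what we actually observe

def slope_and_se(x, y):
    # OLS slope and its (homoskedastic) standard error
    X = np.column_stack([np.ones(len(y)), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b[1], se[1]

b_star, se_star = slope_and_se(x, y_star)
b_obs, se_obs = slope_and_se(x, y)
# Slope is still ~0.7 either way, but with Var(u + e) = 1 + 1 = 2
# the standard error grows by roughly sqrt(2)
```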
ME of Dep. Variable [Part 3]
n  Some common examples
q  Market leverage – typically use book value
of debt because market value hard to observe
q  Firm value – again, hard to observe market
value of debt, so we use book value
q  CEO compensation – value of options are
approximated using Black-Scholes

Is assuming e and x are uncorrelated plausible?

52
ME of Dep. Variable [Part 4]
n  Answer = Maybe… maybe not
q  Ex. – Firm leverage is measured with error; hard to
observe market value of debt, so we use book value
n  But, the measurement error is likely to be larger when firm's
are in distress… Market value of debt falls; book value doesn't
n  This error could be correlated with x's if it includes things like
profitability (i.e. ME larger for low profit firms)
n  This type of ME will cause inconsistent estimates

53
ME of Independent Variable [Part 1]
n  Let's assume the model is y = β 0 + β1 x *
+u
n  But, we observe x* with error, e = x − x*
q  We assume that E(y|x*, x) = E(y|x*) [i.e. x
doesn't affect y after controlling for x*; this is
standard and uncontroversial because it is just
stating that we've written the correct model]

n  What are some examples in CF?

54
ME of Independent Variable [Part 2]
n  There are lots of examples!
q  Average Q measures marginal Q with error
q  Altman-z score measures default prob. with error
q  GIM, takeover provisions, etc. are all just noisy
measures of the nebulous "governance" of firm

Will this measurement error cause bias?

55
ME of Independent Variable [Part 3]
n  Answer depends crucially on what we assume
about the measurement error, e
n  Literature focuses on two extreme assumptions
#1 – Measurement error, e, is uncorrelated
with the observed measure, x
#2 – Measurement error, e, is uncorrelated
with the unobserved measure, x*

56
Assumption #1: e uncorrelated with x
n  Substituting x* with what we actually
observe, x* = x – e, into true model, we have
y = β 0 + β1 x + u − β1e
q  Is there a bias?
n  Answer = No. x is uncorrelated with e by assumption,
and x is uncorrelated with u by earlier assumptions

q  What happens to our standard errors?


n  Answer = They get larger; error variance is now σ u2 + β12σ e2

57
Assumption #2: e uncorrelated with x*
n  We are still estimating y = β 0 + β1 x + u − β1e ,
but now, x is correlated with e
q  e uncorrelated with x* guarantees e is correlated
with x; cov( x, e) = E ( xe) = E ( x*e) + E (e2 ) = σ e2
q  I.e. an independent variable will be correlated with
the error… we will get biased estimates!

n  This is what people call the Classical Error-


in-Variables (CEV) assumption

58
CEV with 1 variable = attenuation bias
n  If work out math, one can show that the
estimate of β1, βˆ1 , in prior example (which
had just one independent variable) is…
plim(β̂1) = β1·[σ²x* / (σ²x* + σe²)]

[This scaling factor is always
between 0 and 1]

q  The estimate is always biased towards zero; i.e. it
is an attenuation bias
n  And, if variance of error, σe², is small, then attenuation
bias won't be that bad

59
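The attenuation formula can be verified numerically; a sketch with assumed variances σ²x* = 4 and σ²e = 1, so the scaling factor is 0.8:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta1 = 1.0
var_xstar, var_e = 4.0, 1.0   # assumed variances of x* and the CEV error
x_star = rng.normal(0, np.sqrt(var_xstar), n)  # true regressor
x = x_star + rng.normal(0, np.sqrt(var_e), n)  # observed, with CEV error
y = 0.5 + beta1 * x_star + rng.normal(0, 1, n)

# OLS slope of y on the mismeasured x
b1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
attenuation = var_xstar / (var_xstar + var_e)  # scaling factor = 0.8 here
```

The OLS slope lands near β1 times the scaling factor, i.e. biased toward zero but with the correct sign.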
Measurement error… not so bad?
n  Under current setup, measurement error
doesn't seem so bad…
q  If error uncorrelated with observed x, no bias
q  If error uncorrelated with unobserved x*, we
get an attenuation bias… so at least the sign
on our coefficient of interest is still correct

n  Why is this misleading?

60
Nope, measurement error is bad news
n  Truth is, measurement error is
probably correlated a bit with both the
observed x and unobserved x*
q  I.e… some attenuation bias is likely

n  Moreover, even in CEV case, if there


is more than one independent variable,
the bias gets horribly complicated…

61
ME with more than one variable
n  If estimating y = β 0 + β1 x1 + ... + β k xk + u , and
just one of the x's is mismeasured, then…
q  ALL the β's will be biased if the mismeasured
variable is correlated with any other x
[which presumably is true since it was included!]
q  Sign and magnitude of biases will depend on all
the correlations between x's; i.e. big mess!
n  See Gormley and Matsa (2014) math for AvgE
estimator to see how bad this can be

62
ME example
n  Fazzari, Hubbard, and Petersen (1988) is
classic example of a paper with ME problem
q  Regresses investment on Tobin's Q (it's measure
of investment opportunities) and cash
q  Finds positive coefficient on cash; argues there
must be financial constraints present
q  But Q is noisy measure; all coefficients are biased!

n  Erickson and Whited (2000) argues the pos.


coeff. disappears if you correct the ME

63
Three main ways this will be violated
n  Omitted variable bias
n  Measurement error bias
n  Simultaneity bias

64
Simultaneity bias
n  This will occur whenever any of the supposedly
independent variables (i.e. the x's) can be
affected by changes in the y variable; E.g.
y = β 0 + β1 x + u
x = δ 0 + δ1 y + v
q  I.e. changes in x affect y, and changes in y affect x;
this is the simplest case of reverse causality
q  An estimate of y = β0 + β1 x + u will be biased…

65
Simultaneity bias continued…
n  To see why estimating y = β 0 + β1 x + u won't
reveal the true β1, solve for x
x = δ0 + δ1y + v
x = δ0 + δ1(β0 + β1x + u) + v
x = (δ0 + δ1β0)/(1 − δ1β1) + v/(1 − δ1β1) + [δ1/(1 − δ1β1)]·u

q  Easy to see that x is correlated with u! I.e. bias!

66
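A quick simulation of this two-equation system (illustrative parameters) confirms the OLS slope converges to β1 + cov(x, u)/var(x), not β1:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
b0, b1 = 1.0, 0.5   # structural effect of x on y
d0, d1 = 2.0, 0.4   # reverse effect of y on x
u = rng.normal(0, 1, n)
v = rng.normal(0, 1, n)

# Solve the two equations simultaneously (the reduced form above)
x = (d0 + d1 * b0 + v + d1 * u) / (1 - d1 * b1)
y = b0 + b1 * x + u

b1_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Analytic probability limit of the OLS slope: b1 + cov(x, u)/var(x)
cov_xu = d1 * 1.0 / (1 - d1 * b1)              # d1*var(u)/(1 - d1*b1)
var_x = (1.0 + d1 ** 2) / (1 - d1 * b1) ** 2   # (var(v) + d1^2*var(u))/(1 - d1*b1)^2
plim_b1 = b1 + cov_xu / var_x
```

Because x is mechanically correlated with u, OLS overshoots the structural β1 = 0.5 here (δ1 > 0), matching the analytic plim.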
Simultaneity bias in other regressors
n  Prior example is case of reverse causality; the
variable of interest is also affected by y
n  But, if y affects any x, their will be a bias; E.g.
y = β 0 + β1 x1 + β 2 x2 + u
x2 = γ 0 + γ 1 y + w
q  Easy to show that x2 is correlated with u; and there
will be a bias on all coefficients
q  This is why people use lagged x's

67
"Endogeneity" problem – Tangent
n  In my opinion, the prior example is
what it means to have an "endogeneity"
problem or and "endogenous" variable
q  But, as I mentioned earlier, there is a lot of
misusage of the word "endogeneity" in
finance… So, it might be better just saying
"simultaneity bias"

68
Simultaneity Bias – Summary
n  If your x might also be affected by the y
(i.e. reverse causality), you won't be able to
make causal inferences using OLS
q  Instrumental variables or natural experiments
will be helpful with this problem

n  Also can't get causal estimates with OLS if


controls are affected by the y

69
"Bad controls"
n  Similar to simultaneity bias… this is when
one x is affected by another x; e.g.
y = β 0 + β1 x1 + β 2 x2 + u
x2 = γ 0 + γ 1 x1 + v
q  Angrist-Pischke call this a "bad control", and it
can introduce a subtle selection bias when
working with natural experiments
[we will come back to this in later lecture]

70
"Bad Controls" – TG's Pet Peeve
n  But just to preview it… If you have an x
that is truly exogenous (i.e. random) [as you
might have in a natural experiment], do not put
in controls that are also affected by x!
q  Only add controls unaffected by x, or just
regress your various y's on x, and x alone!

We'll revisit this in later lecture…

71
What is Selection Bias?
n  Easiest to think of it just as an omitted
variable problem, where the omitted
variable is the unobserved counterfactual
q  Specifically, error, u, contains some unobserved
counterfactual that is correlated with whether
we observe certain values of x
q  I.e. it is a violation of the CMI assumption

72
Selection Bias – Example
n  Mean health of hospital visitors = 3.21
n  Mean health of non-visitors = 3.93
q  Can we conclude that going to the hospital
(i.e. the x) makes you less healthy?
n  Answer = No. People going to the hospital are
inherently less healthy [this is the selection bias]
n  Another way to say this: we fail to control for what
health outcomes would be absent the visit, and this
unobserved counterfactual is correlated with going
to hospital or not [i.e. omitted variable]

73
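The hospital example can be mimicked with simulated potential outcomes (all numbers are illustrative; here visits are assumed to genuinely help, yet the naive comparison still comes out negative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
health_0 = rng.normal(3.5, 0.8, n)      # health if no hospital visit
treatment_effect = 0.2                  # assumed true (positive) effect
health_1 = health_0 + treatment_effect  # health if visiting

# Sicker people select into visiting the hospital
visit = health_0 < 3.0

# We only ever observe one potential outcome per person
observed = np.where(visit, health_1, health_0)
naive_diff = observed[visit].mean() - observed[~visit].mean()
# naive_diff is negative even though the true effect is +0.2,
# because visitors' unobserved counterfactual health is lower
```

This is exactly the omitted-variable reading of selection bias: the comparison omits the counterfactual health of visitors, which is correlated with visiting.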
Selection Bias – More later
n  We'll treat it more formally later when
we get to natural experiments

74
Summary of Today [Part 1]
n  We need conditional mean independence
(CMI), to make causal statements
n  CMI is violated whenever an independent
variable, x, is correlated with the error, u
n  Three main ways this can be violated
q  Omitted variable bias
q  Measurement error bias
q  Simultaneity bias

75
Summary of Today [Part 2]
n  The biases can be very complex
q  If more than one omitted variable, or omitted
variable is correlated with more than one
regressor, sign of bias hard to determine
q  Measurement error of an independent
variable can (and likely does) bias all
coefficients in ways that are hard to determine
q  Simultaneity bias can also be complicated

76
Summary of Today [Part 3]
n  To deal with these problems, there are
some tools we can use
q  E.g. Proxy variables [discussed today]
q  We will talk about other tools later, e.g.
n  Instrumental variables
n  Natural experiments
n  Regression discontinuity

77
In First Half of Next Class
n  Before getting to these other tools, will first
discuss panel data & unobserved heterogeneity
q  Using fixed effects to deal with unobserved variables
n  What are the benefits? [There are many!]
n  What are the costs? [There are some…]

q  Fixed effects versus first differences


q  When can FE be used?

n  Related readings: see syllabus

78
Assign papers for next week…
n  Rajan and Zingales (AER 1998)
q  Financial development & growth

n  Matsa (JF 2010)


q  Capital structure & union bargaining

n  Ashwini and Matsa (JFE 2013)


q  Labor unemployment risk & corporate policy

79
Break Time
n  Let's take our 10 minute break
n  We'll do presentations when we get back

80
