AE 2023 Lecture10

The document discusses endogeneity problems in econometrics and methods for dealing with endogeneity using panel data analysis. It covers four typical cases of endogeneity: omitted variable bias, selection bias, simultaneity (reverse causality), and measurement error. It then describes how fixed effects models, instrumental variable approaches, and difference-in-differences analysis can be used to address endogeneity issues. Finally, it discusses models for panel data analysis including first differencing estimators, fixed effects models, random effects models, and instrumental variable regressions.

Lecture 10: Endogeneity and Panel Data Analysis
Applied Econometrics
Dr. Le Anh Tuan

Endogenous problems

► Variables that are correlated with the residual are called endogenous variables (as opposed to exogenous variables)

► Notation: $E[x_i u_i] = \mathrm{Cov}(x_i, u_i) \neq 0$, or $E[X'u] \neq 0$

Typical Cases Of Endogeneity

► Omitted variable bias
► An explanatory variable is omitted from the equation and becomes part of the error term
► Selection bias
► An unobservable characteristic has influence on both
dependent and explanatory variables
► Simultaneity (Reverse Causality)
► The causal relationship between the dependent variable
and the explanatory variable goes in both directions
► Measurement error
► Some of the variables are measured with error

Omitted variables

► True model: $y = \beta x_1 + \gamma x_2 + \varepsilon$

► Model as it looks when we omit variable $x_2$:
$y = \beta x_1 + v$, implying that $v = \gamma x_2 + \varepsilon$
► This gives
$\mathrm{Cov}(x_1, v) = \mathrm{Cov}(x_1, \gamma x_2 + \varepsilon) = \gamma\,\mathrm{Cov}(x_1, x_2) \neq 0$

► It can be remedied by including the variable in question, but sometimes we do not have data for it.
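
The covariance argument above can be checked numerically. The following is a minimal sketch on simulated data, not part of the original lecture: the coefficient values, the sample size, and the names `beta`, `gamma`, `b_short` are all assumptions chosen for illustration. It compares the bias of the regression that omits the second variable with the value gamma * Cov(x1, x2) / Var(x1) implied by the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta, gamma = 1.0, 2.0                  # hypothetical true coefficients

x2 = rng.normal(size=n)                 # the variable that will be omitted
x1 = 0.5 * x2 + rng.normal(size=n)      # x1 is correlated with x2
y = beta * x1 + gamma * x2 + rng.normal(size=n)

# Regression of y on x1 only (all variables have mean zero by construction,
# so the intercept is left out to keep the sketch short).
b_short = (x1 @ y) / (x1 @ x1)

# Bias implied by the covariance formula: gamma * Cov(x1, x2) / Var(x1)
predicted_bias = gamma * np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)

print("estimated bias:", b_short - beta)
print("predicted bias:", predicted_bias)
```

With these assumed values the theoretical bias is 2 × 0.5 / 1.25 = 0.8, so the short regression overstates the effect of the included variable.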

Selection bias
► You would like to investigate the effect of a pre-test on English test performance.
► In your class of 40 students, you pick 20 students to take the pre-test. After the test, you compare the performance of pre-test takers (treated group) and non-pre-test takers (control group).
► Sample selection bias occurs when your sample is not random, for example:
► Choosing students with a high English GPA as the treated group.
► Choosing students with an IELTS score > 7.0 as the treated group.

► There is no sample selection bias if students are selected at random, for example:
► Choosing students whose student ID ends in an even digit.
► Randomly choosing one student from each table.

Selection bias
► Very similar to omitted variable bias

► Suppose you want to study how education affects the wage an individual could potentially earn in the labour market, i.e. the wage offer: wage = a + b·education + u
► Suppose your sample contains a non-negligible proportion of unemployed individuals. For these individuals, there is no information on earnings, so the corresponding observations cannot be used when estimating the wage equation (missing values for the dependent variable). Thus you have to estimate the earnings equation based on a non-random sample, which we shall refer to as a selected sample.
► There are circumstances under which OLS estimates based on the selected sample will suffer from bias, specifically selectivity bias.

Reverse Causality

► Occurs in models where variables are jointly determined
$y = \beta x_1 + u$
$x_1 = \gamma y + v$
► Intuitively: a change in $x_1$ will cause a change in $y$, which will in turn cause $x_1$ to change again
► All coefficients will be biased.

► Example:
► Investment and Productivity
► Sales and Advertisement
► CSR and Firm Performance
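
To illustrate the simultaneity problem, here is a small simulation sketch (simulated data, not from the lecture). The two-equation system is solved for its reduced form so that both structural equations hold exactly; the values of `beta` and `gamma` and the unit error variances are assumptions. OLS of y on x then converges to (beta + gamma) / (1 + gamma²) rather than beta.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta, gamma = 0.5, 0.5          # hypothetical structural coefficients

u = rng.normal(size=n)          # error of the y equation
v = rng.normal(size=n)          # error of the x equation

# Reduced form of the system  y = beta*x + u,  x = gamma*y + v
x = (gamma * u + v) / (1 - beta * gamma)
y = (beta * v + u) / (1 - beta * gamma)

# OLS slope of y on x
b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Large-sample value of the OLS slope with unit error variances
plim_ols = (beta + gamma) / (1 + gamma**2)

print("true beta:", beta, "OLS estimate:", b_ols, "theoretical limit:", plim_ols)
```

Because x depends on u through the second equation, x is correlated with the error of the first equation, which is exactly the endogeneity condition stated at the start of the lecture.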

Measurement error

► Data is often measured with error:
► reporting errors.
► coding errors.

► When the measurement error is in the dependent variable, the zero conditional mean assumption is not violated, and thus there is no endogeneity (as long as the error is unrelated to the explanatory variables).

► In contrast, when the measurement error is in the independent variable, the problem of endogeneity arises.
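
The contrast between the two cases can be seen in a short simulation. This is a sketch under assumed conditions (classical measurement error that is independent of everything else, unit variances, a hypothetical true slope of 2); none of the numbers come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta = 100_000, 2.0                      # hypothetical sample size and true slope

x_true = rng.normal(size=n)
y_true = beta * x_true + rng.normal(size=n)

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

# Case 1: measurement error in the dependent variable only
y_noisy = y_true + rng.normal(size=n)
b_y_err = slope(x_true, y_noisy)            # stays close to beta, only noisier

# Case 2: measurement error in the independent variable only
x_noisy = x_true + rng.normal(size=n)
b_x_err = slope(x_noisy, y_true)            # shrinks toward zero (attenuation)

print("error in y:", b_y_err)
print("error in x:", b_x_err)
```

With equal variances for the true regressor and its measurement error, theory predicts the slope in the second case is cut in half (to about 1.0 here).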

How to deal with endogeneity?
Solving endogenous problems

► Omitted variable bias


► Fixed effects model, Instrumental variable approach

► Selection bias
► Inclusion of control variables, Difference-in-Differences
Analysis.

► Reverse Causality
► Instrumental variable approach

► Measurement error
► Robustness checks
Panel Data Analysis
► Panel data (also known as longitudinal or cross-sectional time-series data) is a dataset in which the behavior of entities is observed across time.

► These entities could be states, companies, individuals, countries, etc.
Panel Data Analysis
► Panel data allows us to control for:
► variables we cannot observe or measure, like cultural factors or differences in business practices across companies;
► variables that change over time but not across entities (e.g. national policies, city/state regulations, international agreements, etc.).
► Hence, it accounts for individual heterogeneity.

► Popular models:
► First-differenced estimator
► Fixed Effects Model
► Random Effects Model
► Instrumental Variable Regression
First-differenced estimator
► Consider a model in first differences:
$\Delta y_{it} = \beta_1 \Delta x_{it} + \Delta u_{it}$

► First-differenced estimates will be imprecise if explanatory variables vary only little over time (no estimate is possible if they are time-invariant)
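
A minimal sketch of the first-differenced estimator on a simulated panel follows. The data-generating process is an assumption made for illustration: a time-constant unobserved effect `a` that is correlated with the regressor, and a hypothetical true slope `beta1`. Differencing removes `a`, while pooled OLS on the levels does not.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta1 = 5_000, 3, 1.5                 # hypothetical panel dimensions and slope

a = rng.normal(size=(N, 1))                 # unobserved, time-constant effect
x = a + rng.normal(size=(N, T))             # regressor correlated with a
y = beta1 * x + a + rng.normal(size=(N, T))

# First differences within each entity: the effect a drops out
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()
b_fd = (dx @ dy) / (dx @ dx)

# Pooled OLS on the levels ignores a and is biased upward here
xc = x.ravel() - x.mean()
yc = y.ravel() - y.mean()
b_pooled = (xc @ yc) / (xc @ xc)

print("first-differenced:", b_fd)
print("pooled OLS:       ", b_pooled)
```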
Fixed Effects Model
► Consider a model:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$
$Y_{it}$ ($X_{it}$) is the dependent (independent) variable, where i = country and t = time.

► Here $Z_i$ is an unobserved variable that varies from one country to the next but does not change over time.

► We want to estimate $\beta_1$, the effect of X on Y holding constant the unobserved country characteristics Z.

► Because $Z_i$ varies from one country to the next but is constant over time, let $\alpha_i = \beta_0 + \beta_2 Z_i$. The equation becomes
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
Fixed Effects Model
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
$\alpha_i$ is the unknown intercept for each country (or country-specific intercept)
► Average this equation over time for each i (between estimator):
$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i$
► Subtract the second equation from the first for each t (within estimator):
$Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$
► This is the fixed-effects regression model. The important thing is that the $\alpha_i$ have disappeared. We no longer need the assumption that $\alpha_i$ is uncorrelated with $X_{it}$.
► Time-constant unobserved heterogeneity is no longer a problem → fixed effects model.
► The intercept $\alpha_i$ can be thought of as the "effect" of being in entity i. In the current application the entities are countries, so the terms $\alpha_i$ are known as country fixed effects.
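
The within transformation described above can be sketched in a few lines. This is an illustration on simulated data, not the lecture's own computation; the panel dimensions, the true slope, and the way the fixed effect `alpha` is correlated with the regressor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta1 = 2_000, 5, 1.0                  # hypothetical panel and true slope

alpha = rng.normal(size=(N, 1))              # entity fixed effects
X = 0.8 * alpha + rng.normal(size=(N, T))    # regressor correlated with alpha
Y = beta1 * X + alpha + rng.normal(size=(N, T))

# Within transformation: subtract each entity's time average
X_w = X - X.mean(axis=1, keepdims=True)
Y_w = Y - Y.mean(axis=1, keepdims=True)

# OLS on the demeaned data; alpha has been removed by the demeaning
b_within = (X_w * Y_w).sum() / (X_w ** 2).sum()

print("within (fixed effects) estimate:", b_within)
```

The estimate stays close to the assumed slope even though the fixed effect is correlated with the regressor, which is the point of the method.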
Fixed Effects Model

► The variation in the entity fixed effects comes from omitted variables that, like $Z_i$, vary across entities but not over time.

→ We use the fixed effects model to deal with endogeneity from omitted variables.
Fixed Effects Model
Set panel data

Estimate FEs models


Alternative : Fixed Effects by using dummy
variables
► How can we estimate the parameters $\alpha_i$?
► We can develop the fixed effects regression model using dummy variables.
► Let $D1_i$ be a dummy variable that equals 1 when i = 1 and equals 0 otherwise.
► Let $D2_i$ equal 1 when i = 2 and equal 0 otherwise.
► And so on.
► We choose entity 1 (dummy variable $D1_i$) as the base group, so $D1_i$ is omitted and the intercept $\beta_0$ belongs to the base group. Accordingly, the fixed effects regression model can be written equivalently as:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}$
► Thus there are two equivalent ways to write the fixed effects regression model
► In both formulations, the slope coefficient on X is the same from one country to the next.
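
The equivalence of the two formulations can be verified numerically. The sketch below uses a small simulated panel (all values are assumptions) and compares the slope from the dummy-variable regression, with entity 1 as the base group, against the slope from the demeaned (within) regression. The two agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 30, 4                                   # hypothetical number of entities and periods

alpha = rng.normal(size=n)
X = alpha[:, None] + rng.normal(size=(n, T))
Y = 2.0 * X + alpha[:, None] + rng.normal(size=(n, T))

# Stack the panel entity by entity
x, y = X.ravel(), Y.ravel()
entity = np.repeat(np.arange(n), T)

# Dummies D2..Dn; the first entity (index 0) is the omitted base group
D = (entity[:, None] == np.arange(1, n)[None, :]).astype(float)
Z = np.column_stack([np.ones(n * T), x, D])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
b_dummy = coef[1]                              # slope on X in the dummy-variable form

# Within (demeaned) form
X_w = X - X.mean(axis=1, keepdims=True)
Y_w = Y - Y.mean(axis=1, keepdims=True)
b_within = (X_w * Y_w).sum() / (X_w ** 2).sum()

print(b_dummy, b_within)
```

In practice the demeaned form is preferred when there are many entities, since the dummy-variable form needs one column per entity.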
Alternative : Fixed Effects by using dummy
variables
Discussion of fixed effects estimator

► Strict exogeneity in the original model has to be assumed


► The R-squared of the demeaned equation is inappropriate
► The effect of time-constant variables cannot be estimated
► But the effect of interactions with time-invariant variables can
be estimated (e.g. the interaction of education with time
dummies)
► If a full set of time dummies is included, the effect of variables whose change over time is constant cannot be estimated (e.g. experience)
► Degrees of freedom have to be adjusted because the N time
averages are estimated in addition (resulting degrees of
freedom = NT-N-k)
Fixed effects or first differencing?

► Remember that first differencing can also be used if T > 2


► In the case T = 2, fixed effects and first differencing are identical
► For T > 2, fixed effects is more efficient if classical assumptions
hold
► First differencing may be better in the case of severe serial
correlation in the errors, for example if the errors follow a
random walk.
► If T is very large (and N not so large), the panel has a
pronounced time series character and problems such as strong
dependence arise
► In these cases, it is probably better to use first differencing
► Otherwise, it is a good idea to compute both and check
robustness
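
The claim that the two estimators coincide when T = 2 can be checked directly. This is a sketch on simulated data with assumed values; it compares the two slopes in the simplest setting, where the first-differenced regression has no intercept and the fixed effects regression has no time dummy.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 500                                      # hypothetical number of entities, T = 2

a = rng.normal(size=(N, 1))                  # unobserved time-constant effect
x = a + rng.normal(size=(N, 2))
y = 1.0 * x + a + rng.normal(size=(N, 2))

# First differencing (no intercept)
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# Fixed effects via the within transformation (no time dummy)
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_fe = (x_w * y_w).sum() / (x_w ** 2).sum()

print(b_fd, b_fe)
```

With two periods the demeaned values are just plus and minus half the first difference, which is why the two slopes are numerically the same.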
Random effects model
Random effects model
► The rationale behind random effects model is that, unlike the
fixed effects model, the variation across entities is assumed to
be random and uncorrelated with the predictor or independent
variables included in the model.

Random effects assumption:

The composite error $a_i + u_{it}$ is uncorrelated with the explanatory variables, but it is serially correlated for observations coming from the same i:

[Equation: correlation of the composite errors across periods, derived under the assumption that the idiosyncratic errors are serially uncorrelated]
For example, in a wage equation, for a given individual the same unobserved ability appears in
the error term of each period. Error terms are thus correlated across periods for this individual.
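
For reference, the serial correlation of the composite error takes the following standard textbook form. The notation is an assumption, since the slide's own equation is not available: $\sigma_a^2$ denotes the variance of the individual effect and $\sigma_u^2$ the variance of the idiosyncratic error.

```latex
\operatorname{Corr}(a_i + u_{it},\; a_i + u_{is})
  = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2},
  \qquad t \neq s
```

The larger the variance of the individual effect relative to the idiosyncratic error, the stronger the correlation across periods for the same individual.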

Random effects model
► Under the random effects assumptions explanatory variables are
exogenous so that pooled OLS provides consistent estimates

► If OLS is used, standard errors have to be adjusted for the fact


that errors are correlated over time for given i (= clustered
standard errors)
► But, because of the serial correlation, OLS is not efficient
► One can transform the model so that it satisfies the Gauss-Markov (GM) assumptions:

[Equation: the model written in quasi-demeaned data]

The transformed error can be shown to satisfy the GM assumptions
Random effects model

► The quasi-demeaning parameter $\theta$ is unknown, but it can be estimated
► FGLS using the estimated $\theta$ is called random effects estimation
► If the random effect is relatively unimportant compared to the idiosyncratic error, FGLS will be close to pooled OLS (because $\theta \to 0$)
► If the random effect is relatively important compared to the idiosyncratic term, FGLS will be similar to fixed effects (because $\theta \to 1$)
► Random effects estimation works for time-invariant variables
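
The quasi-demeaning transformation and its parameter are usually written as below. This is the standard textbook form, given here as a sketch: the symbol $\theta$ for the quasi-demeaning parameter, the composite error $v_{it} = a_i + u_{it}$, and the variances $\sigma_a^2$ and $\sigma_u^2$ are notational assumptions.

```latex
y_{it} - \theta \bar{y}_i
  = \beta_0 (1 - \theta)
  + \beta_1 (x_{it1} - \theta \bar{x}_{i1}) + \dots
  + \beta_k (x_{itk} - \theta \bar{x}_{ik})
  + (v_{it} - \theta \bar{v}_i),
\qquad
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2}}
```

Setting the parameter to 0 gives pooled OLS and setting it to 1 gives the fixed effects (fully demeaned) regression, which matches the two limiting cases in the list above. Because the transformation removes only a fraction of each time average, time-invariant variables are not wiped out.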
Random effects

Random effects or fixed effects?

► In economics, unobserved individual effects are seldom uncorrelated with explanatory variables, so fixed effects is usually more convincing.

► To decide between fixed or random effects you can run a Hausman test (Greene, 2008, chapter 9)
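
As a reminder of what the test compares, the Hausman statistic is commonly written as follows. This is the textbook form, with notation assumed here: $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ are the fixed and random effects estimates of the coefficients on the k time-varying regressors.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[ \widehat{\operatorname{Var}}(\hat{\beta}_{FE})
         - \widehat{\operatorname{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\overset{a}{\sim}\; \chi^2_k
```

Under the null hypothesis that the individual effects are uncorrelated with the regressors, both estimators are consistent and the statistic should be small. A large value is evidence against random effects and in favour of fixed effects.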

Instrumental variable regression

IV regression

► Motivation: Omitted Variables in a Simple Regression Model

► We use IV regression (two-stage least squares estimation)

► Example: Education in a wage equation

►Definition of an instrumental variable:


1) It does not appear in the regression
2) It is highly correlated with the endogenous variable
3) It is uncorrelated with the error term

IV regression – Stages

► There are two stages:

► First stage: Regress the endogenous variable (X) on the instrumental variable → get the predicted value of X ($\hat{X}$)

► Second stage: Regress the dependent variable on $\hat{X}$.

► For example, we choose father's education (fedu) as an IV for education (educ).
► It doesn't appear as a regressor
► It is significantly correlated with educ
► It is uncorrelated with the error.
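
The two stages can be sketched by hand on simulated data. Everything in the example is an assumption made for illustration: the data are generated artificially, `ability` stands for the omitted variable, and the coefficient values are hypothetical. The variable names `educ` and `fedu` follow the example above.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

ability = rng.normal(size=n)                  # unobserved, ends up in the error term
fedu = rng.normal(size=n)                     # instrument: unrelated to ability here
educ = 0.5 * fedu + 0.5 * ability + rng.normal(size=n)
wage = 1.0 + 0.10 * educ + 0.30 * ability + rng.normal(size=n)

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Naive OLS of wage on educ: biased because educ is correlated with ability
b_ols = ols(wage, np.column_stack([const, educ]))[1]

# First stage: regress educ on the instrument and keep the fitted values
Z = np.column_stack([const, fedu])
educ_hat = Z @ ols(educ, Z)

# Second stage: regress wage on the fitted values of educ
b_2sls = ols(wage, np.column_stack([const, educ_hat]))[1]

print("OLS: ", b_ols)
print("2SLS:", b_2sls)
```

Running the second stage by hand gives the right coefficient but not the right standard errors, because the residuals are computed from the fitted values rather than from the actual regressor. Dedicated IV routines in statistical packages correct for this.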

IV regression – Stages

► Step 1:

IV regression – Stages

► Step 2:

The causal impact of education on wage after controlling for endogeneity
IV regression – Alternative command

Same results
More on policy analysis and program evaluation
Difference-in-Differences Analysis
Difference-in-Differences Analysis
► DiD is a quasi-experimental technique used to understand the effect of a sharp change in the economic environment or government policy.

► For example: In 2005, the Vietnamese government wanted to increase the investment activity of foreign-owned firms, so it decided to cut the corporate tax by 15% for foreign firms. In 2010, it evaluates whether this policy has been effective.

► How to evaluate? We use DiD analysis.

► The basic idea is to compare a treatment group (affected by the policy/program) with a control group (not affected by the policy/program) before and after the program.
Difference-in-Differences Analysis
► We construct a difference-in-differences model with an outcome y:
$y = \beta_0 + \beta_1\,treated + \beta_2\,post + \beta_3\,treated \times post + \varepsilon$
► $y$: outcome

► $Treated$: a dummy variable that equals 1 for the treatment group, 0 for the control group.

► $Post$: a dummy variable that equals 1 for the time period after the policy change, 0 otherwise.

► Our main interest is the DiD coefficient, $\beta_3$.
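
The model above can be sketched on simulated data as follows. The sample size, the coefficient values `b0` to `b3`, and the noise level are assumptions for illustration only. The sketch estimates the interaction coefficient by regression and also computes the difference of before-after differences from the four group means. With only the two dummies and their interaction as regressors, the two numbers coincide.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000

treated = rng.integers(0, 2, size=n)          # 1 = treatment group, 0 = control group
post = rng.integers(0, 2, size=n)             # 1 = after the policy change
b0, b1, b2, b3 = 1.0, 0.5, 0.3, 0.2           # hypothetical true coefficients
y = (b0 + b1 * treated + b2 * post + b3 * treated * post
     + rng.normal(scale=0.5, size=n))

# Regression with the interaction term
X = np.column_stack([np.ones(n), treated, post, treated * post])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
did_reg = coef[3]

def cell_mean(t, p):
    """Mean outcome for group t (treated or not) in period p (post or pre)."""
    return y[(treated == t) & (post == p)].mean()

# Difference of the two before-after differences
did_means = ((cell_mean(1, 1) - cell_mean(1, 0))
             - (cell_mean(0, 1) - cell_mean(0, 0)))

print("DiD from regression: ", did_reg)
print("DiD from group means:", did_means)
```

The regression form is what one extends in practice, because control variables can be added as extra columns.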


Difference-in-Differences Analysis
► For example: In 2005, the Vietnamese government wanted to increase the investment activity of foreign-owned firms, so it decided to cut the corporate tax by 15% for foreign firms. In 2010, it evaluates whether this policy has been effective.
► We have a dataset of firms in Vietnam between 2000 and 2010.

$Corporate\ Investment = \beta_0 + \beta_1\,treated + \beta_2\,post + \beta_3\,treated \times post + Controls + \varepsilon$

► $Treated$: a dummy variable that equals 1 for foreign firms, 0 for non-foreign firms.

► $Post$: a dummy variable that equals 1 for years after 2005, 0 otherwise.
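Difference-in-Differences Analysis
$Corporate\ Investment = \beta_0 + \beta_1\,treated + \beta_2\,post + \beta_3\,treated \times post + Controls + \varepsilon$

| | Pre | Post | Difference (Post − Pre) |
|---|---|---|---|
| Treated | $\beta_0 + \beta_1$ | $\beta_0 + \beta_1 + \beta_2 + \beta_3$ | $\beta_2 + \beta_3$ (before-after difference for 'treatment' group) |
| Control | $\beta_0$ | $\beta_0 + \beta_2$ | $\beta_2$ (before-after difference for 'control' group) |
| Difference (Treated − Control) | $\beta_1$ | $\beta_1 + \beta_3$ | $\beta_3$ |

The DiD coefficient, $\beta_3$, indicates the average treatment effect on the treated (ATT):

$\beta_3 = (Treated_{Post} - Treated_{Pre}) - (Control_{Post} - Control_{Pre})$

Each bracket is a before-after difference; the difference between the two is the difference-in-differences.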
Difference-in-Differences Analysis
$Corporate\ Investment = \beta_0 + \beta_1\,treated + \beta_2\,post + \beta_3\,treated \times post + \varepsilon$
Difference-in-Differences Analysis
Corporate Investment
= 0.123 + 0.083 treated + 0.047 post + 0.019 treated × post + 0.007 size
 (0.078)  (0.040)         (0.011)      (0.007)                (0.009)
+ 0.071 cash + 0.122 roe + 0.009 cf_volatility
 (0.102)       (0.019)     (0.001)

The coefficient of treated × post is positive (0.019) and statistically significant at 1%.

→ Compared to non-foreign firms, corporate investment of foreign firms is higher after the tax change in 2005.

→ This policy is effective in enhancing investment activity.
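
The significance claim can be checked with a quick calculation. This assumes that the value in parentheses under the interaction coefficient is its standard error and that large-sample normal critical values apply. Neither is stated on the slide.

```latex
t = \frac{\hat{\beta}_3}{\operatorname{se}(\hat{\beta}_3)}
  = \frac{0.019}{0.007} \approx 2.71 > 2.576
```

Since 2.576 is the two-sided 1% critical value of the standard normal distribution, the estimate is significant at the 1% level under these assumptions.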
