
BST281Micro-EconPractice Panel1

The document provides an introduction to panel data, including its structure, estimation methods such as Pooled OLS and Random Effects, and the implications of unobserved heterogeneity. It discusses the assumptions required for these estimation techniques and the challenges associated with omitted variable problems. The document emphasizes the importance of understanding the relationship between covariates and unobserved effects in micro-econometric applications.


Panel data:

Introduction, Pooled OLS and Random Effect

Serena Trucchi

Cardiff University – Cardiff Business School

BST281 Micro-econometric Practice

Panel data: Introduction, Pooled OLS and Random Effect Serena Trucchi
Roadmap

• Introduction to Panel data
• Motivation;
• Assumptions about the Covariates and the Unobserved Effect.

• Estimation methods
• Pooled OLS;
• Random effects.

Panel data

• Panel data (or longitudinal data) consist of repeated observations on the same cross section of, say, individuals, households, firms, or cities over time.

• Balanced panel: the same time periods are available for all cross
section units.
While the mechanics of the unbalanced case are similar to the
balanced case, a careful treatment of the unbalanced case requires
a formal description of why the panel may be unbalanced, and the
sample selection issues can be somewhat subtle.
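A minimal sketch of how one might check whether a long-format panel is balanced. The function name `is_balanced` and the example identifiers are hypothetical and not part of the course material.

```python
def is_balanced(unit_ids, periods):
    """Return True if every unit is observed exactly once in every period.

    unit_ids and periods are equal-length lists (one entry per observation).
    """
    pairs = list(zip(unit_ids, periods))
    if len(set(pairs)) != len(pairs):        # duplicated (unit, period) rows
        return False
    n_units = len(set(unit_ids))
    n_periods = len(set(periods))
    # With unique pairs, N * T rows means every combination is present.
    return len(pairs) == n_units * n_periods


# Hypothetical identifiers: unit 3 is missing period 2.
ids = [1, 1, 2, 2, 3]
years = [1, 2, 1, 2, 1]
print(is_balanced(ids, years))           # unbalanced panel
print(is_balanced(ids[:4], years[:4]))   # units 1 and 2 only: balanced
```

The check only describes the data; it says nothing about why periods are missing, which is the harder sample selection question.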

Unobserved heterogeneity I

• We consider a particular structure for the error term.


• We explicitly add a time-constant, unobserved effect, ci , to the model.
It is often called unobserved heterogeneity.
Depending on the application, ci is also called a latent effect or an
individual effect, firm effect, school effect, and so on.

Unobserved heterogeneity II

• Start with the balanced panel case, and assume random sampling
across i (the cross section dimension), with fixed time periods T .
So for each i we draw {(xit , yit ) : t = 1, ..., T } together with ci , where ci
is the unobserved effect drawn along with the observed data.
• ci is constant over time and unobservable (e.g. individual ability
or risk aversion; firm’s managerial quality).
• Note that ci is a random variable, and not a parameter to be
estimated.
• The unbalanced case is trickier because we must know why we
are missing some time periods for some units. We consider this
much later under missing data/sample selection issues.

Motivation: Omitted variable problem I

• For a random draw i from the population, the basic model is

yit = xit β + ci + uit , t = 1, ..., T ,


where {uit : t = 1, ..., T } are the idiosyncratic errors.
• The composite error at time t is

vit = ci + uit

• Because of ci , the sequence {vit : t = 1, ..., T } is almost certainly serially correlated, and definitely is if {uit } is serially uncorrelated.
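The serial correlation induced by ci can be illustrated with a small simulation. This is a sketch under assumed values (normal draws, unit variances, the sample size, and the function name are all illustrative choices, not part of the slides).

```python
import numpy as np


def simulate_composite_errors(n_units=20000, n_periods=4,
                              sigma_c=1.0, sigma_u=1.0, seed=0):
    """Draw v_it = c_i + u_it with u_it independent across t."""
    rng = np.random.default_rng(seed)
    c = rng.normal(0.0, sigma_c, size=(n_units, 1))          # time-constant effect
    u = rng.normal(0.0, sigma_u, size=(n_units, n_periods))  # idiosyncratic errors
    return c + u                                             # c_i broadcast over t


v = simulate_composite_errors()
# T x T correlation matrix of (v_i1, ..., v_iT), computed across units.
corr = np.corrcoef(v, rowvar=False)
print(np.round(corr, 2))
```

Even though the uit are serially uncorrelated by construction, every off-diagonal correlation is clearly positive, because the same ci enters each period.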

Motivation: Omitted variable problem II

• Useful to write a population version of the model in conditional expectation form:

E (yt |xt , c) = xt β + c, t = 1, ..., T .

Therefore,

$$\beta_j = \frac{\partial E(y_t \mid x_t, c)}{\partial x_{tj}},$$

so βj is the partial effect of xtj on E (yt |xt , c); that is, we are “holding c fixed.”
• The hope is that we can allow c to be correlated with xt .

Motivation: Omitted variable problem III

With a single cross section, there is nothing we can do unless


• we can find good observable proxies for c or
• IVs for the endogenous elements of xt .
But with two or more periods we have more options.

The framework I

We assume
• a balanced panel and
• all asymptotic analysis – implicit or explicit – is with fixed T and
N → ∞, where N is the size of the cross section.

The basic unobserved effects model is

yit = xit β + ci + uit , t = 1, ..., T ,


where xit is 1 × K and so β is K × 1.
The model is written with β not depending on time. But xit can
include time period dummies and interactions of variables with
time period dummies, so the model is quite flexible.

The framework II

• A general specification is

yit = gt θ + zi δ + wit γ + ci + uit


where gt is a vector of aggregate time effects (often time
dummies), zi is a set of time-constant observed variables, and wit
changes across i and t (for at least some units i and time periods
t). wit can include interactions among time-constant and time
varying variables.
• In microeconometric applications, best to avoid calling ci a
“random effect” or a “fixed effect.” We are treating ci always as a
random variable.

Assumptions about the Covariates and the Unobserved Effect

• In modern applications, “random effect” essentially means

Cov (xit , ci ) = 0, t = 1, ..., T ,


although we often will strengthen this.
• The term “fixed effect” means that no restrictions are placed on
the relationship between ci and {xit }.
• Recently, “correlated random effects” is used to denote situations
where we model the relationship between ci and {xit }, and it is
especially useful for nonlinear models (but also for linear models, as
we will see).

Assumptions about Covariates and Idiosyncratic Errors I

yit = xit β + ci + uit

1. Contemporaneous Exogeneity Conditional on Unobserved Effect:

E (uit |xit , ci ) = 0
or
E (yit |xit , ci ) = xit β + ci .
• Ideally, we could proceed with just this assumption.

Assumptions about Covariates and Idiosyncratic Errors II
2. Strict Exogeneity Conditional on the Unobserved Effect

E (yit |xi1 , ..., xiT , ci ) = E (yit |xit , ci ) = xit β + ci ,


only xit affects the expected value of yit once ci is controlled for:
once xit and ci are controlled for, xis has no partial effect on yit for s ≠ t.
When this assumption holds, we say that the {xit : t = 1, 2, ..., T }
are strictly exogenous conditional on the unobserved effect ci .

This assumption can also be stated in terms of the idiosyncratic errors:
E (uit |xi1 , ..., xiT , ci ) = 0
meaning that the error terms have mean zero conditional on past,
current and future values of the regressors.

Assumptions about Covariates and Idiosyncratic Errors III

Implications of E (uit |xi1 , ..., xiT , ci ) = 0:


• Zero unconditional mean: E (uit ) = 0
• Orthogonality conditions:
• E (ci uit ) = 0
• E (x′is uit ) = 0, s, t = 1, ..., T
The second orthogonality condition is much stronger than assuming zero
contemporaneous correlation, E (x′it uit ) = 0. Nevertheless, strict exogeneity
conditional on ci allows arbitrary correlation between ci and xit for all t.
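A short derivation, sketched here from the conditional mean assumption E (uit |xi1 , ..., xiT , ci ) = 0, shows where the orthogonality conditions come from. Both lines use the law of iterated expectations.

```latex
\begin{aligned}
E(x_{is}' u_{it})
  &= E\big[\, x_{is}'\, E(u_{it} \mid x_{i1}, \dots, x_{iT}, c_i) \,\big] = 0,
  \qquad s, t = 1, \dots, T, \\
E(c_i u_{it})
  &= E\big[\, c_i\, E(u_{it} \mid x_{i1}, \dots, x_{iT}, c_i) \,\big] = 0 .
\end{aligned}
```

Nothing in these steps restricts E (ci |xi1 , ..., xiT ), which is why ci and xit may still be correlated.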

Assumptions about Covariates and Idiosyncratic Errors IV
3. Strict Exogeneity Unconditional on the Unobserved Effect

E (yit |xi1 , ..., xiT ) = E (yit |xit ) = xit β,

• Strict exogeneity conditional on the unobserved effect (assumption 2) is weaker than strict exogeneity without conditioning on ci (assumption 3).
• More generally, it is easy to see that assumption (3) fails
whenever assumption (2) holds and the expected value of ci
depends on (xi1 , ..., xiT ).
Assuming the strict exogeneity condition holds conditional on ci ,
then
E (yit |xi1 , ..., xiT ) = xit β + E (ci |xi1 , ..., xiT ).
So assumption (3) fails if E (ci |xi1 , ..., xiT ) ≠ E (ci ).
In particular, it fails if ci is correlated with any of the xit .
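The failure of assumption (3) can be illustrated numerically. The sketch below assumes a scalar regressor built as xit = a·ci + noise with standard normal draws; the function name `pooled_slope` and all parameter values are illustrative, not from the slides. Under this design the pooled regression slope converges to β + a/(a² + 1) rather than β.

```python
import numpy as np


def pooled_slope(corr_with_c, n_units=50000, n_periods=3, beta=1.0, seed=1):
    """Slope from regressing y on x (with intercept), pooling all i and t."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=(n_units, 1))
    # x_it loads on c_i with weight corr_with_c; zero means no correlation.
    x = corr_with_c * c + rng.normal(size=(n_units, n_periods))
    u = rng.normal(size=(n_units, n_periods))
    y = beta * x + c + u
    xs, ys = x.ravel(), y.ravel()
    return np.cov(xs, ys)[0, 1] / np.var(xs, ddof=1)


print(pooled_slope(0.0))   # c_i unrelated to x_it: close to beta = 1
print(pooled_slope(1.0))   # c_i correlated with x_it: close to 1.5, not 1
```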

Estimation and Testing

The four most common estimation methods for unobserved effects models are:

1 Pooled OLS,
2 Random Effects,
3 Fixed Effects,
4 First Differencing.

Pooled OLS

Pooled OLS I
• Under certain assumptions, the pooled OLS estimator can be
used to obtain a consistent estimator of β in model

yit = xit β + vit


vit = ci + uit

• Consistency (fixed T , N → ∞) of the POLS estimator is ensured by E (x′it vit ) = 0, which holds if we assume

E (x′it ci ) = 0
E (x′it uit ) = 0, t = 1, ..., T .

Pooled OLS II

• Contemporaneous exogeneity E (x′it uit ) = 0 is weaker than strict exogeneity, but it buys us little in practice because POLS also uses E (x′it ci ) = 0, which cannot hold for lagged dependent variables and is unlikely for other variables that are not strictly exogenous.
• The composite errors will be serially correlated due to the
presence of ci in each time period.
Inference should be made robust to serial correlation and
heteroskedasticity.

• In Stata:
reg y x1 x2 ... xK, cluster(id)
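To make the clustered inference concrete, here is a NumPy sketch of pooled OLS with a sandwich variance clustered by unit. It is an illustration under stated assumptions, not a reproduction of Stata's output: no finite-sample correction is applied (packages typically apply one), and the data are simulated with illustrative parameter values.

```python
import numpy as np


def pooled_ols_cluster(y, X):
    """Pooled OLS with standard errors clustered by unit.

    y has shape (N, T); X has shape (N, T, K) and should include a constant.
    Returns (beta_hat, clustered SEs, usual non-robust SEs).
    """
    N, T, K = X.shape
    Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
    XtX_inv = np.linalg.inv(Xs.T @ Xs)
    beta = XtX_inv @ (Xs.T @ ys)
    resid = y - X @ beta                            # (N, T) POLS residuals
    scores = np.einsum('ntk,nt->nk', X, resid)      # X_i' v_i for each unit
    V_cluster = XtX_inv @ (scores.T @ scores) @ XtX_inv
    sigma2 = resid.ravel() @ resid.ravel() / (N * T - K)
    V_usual = sigma2 * XtX_inv                      # ignores serial correlation
    return beta, np.sqrt(np.diag(V_cluster)), np.sqrt(np.diag(V_usual))


# Simulated data: c_i is uncorrelated with x_it, but both persist within unit.
rng = np.random.default_rng(2)
N, T = 2000, 5
x = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
c = rng.normal(size=(N, 1))
y = 1.0 + 2.0 * x + c + rng.normal(size=(N, T))
X = np.stack([np.ones((N, T)), x], axis=2)

beta, se_cluster, se_usual = pooled_ols_cluster(y, X)
print(beta, se_cluster, se_usual)
```

In this design the clustered standard errors come out larger than the usual ones, which is the practical reason the usual POLS standard errors are not reliable when ci is in the error term.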

Random Effects

Random Effects Estimation I

ASSUMPTION RE.1:

(a) E (uit |xi1 , xi2 , ..., xiT , ci ) = 0, t = 1, ..., T


(b) E (ci |xi1 , xi2 , ..., xiT ) = E (ci )

• Assume xit includes (at least) unity, and probably time dummies
in addition. Then E (ci ) = 0 is without loss of generality.
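A one-line sketch of why the normalization is harmless. The symbols μc (the mean of ci), β0 (the intercept), and x̃it (the non-constant regressors) are introduced here only for illustration.

```latex
y_{it} = \beta_0 + \tilde{x}_{it}\tilde{\beta} + c_i + u_{it}
       = (\beta_0 + \mu_c) + \tilde{x}_{it}\tilde{\beta} + (c_i - \mu_c) + u_{it},
\qquad \mu_c = E(c_i).
```

The recentred effect ci − μc has mean zero, and only the intercept changes, so the slope coefficients are unaffected.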

Random Effects Estimation II

• A GLS approach also leaves ci in the error term:

yit = xit β + vit , t = 1, 2, ..., T


and we know that feasible GLS is consistent when

E (x′is vit ) = 0, all s, t = 1, ..., T .


• This weaker version of strict exogeneity is implied by Assumption
RE.1.

Random Effects Estimation III

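• Define the T × T matrix

Ω = E (vi v′i ) = Var (vi ).

Remember that the FGLS estimator is:

$$\hat{\beta}_{FGLS} = \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} y_i \right)$$

We exploit the panel dimension to recover the structure of the T × T matrix Ω.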

Random Effects Estimation IV

ASSUMPTION RE.2: Ω is nonsingular and rank E (X′i Ω⁻¹ Xi ) = K .


Needed for consistency of the GLS estimator.

Random Effects Estimation V

ASSUMPTION RE.3 (Conditional second moment assumptions):

(a) E (ui u′i |xi , ci ) = σu² I_T , where I_T is the T × T identity matrix

(b) E (ci² |xi ) = σc²

Note that the RE estimator is generally consistent with or without Assumption RE.3.

Random Effects Estimation VI

• RE imposes a special structure on Ω (which could be wrong!).


Under RE.1(a) and RE.3(a):

Var (uit ) = σu² , t = 1, ..., T

Cov (uit , uis ) = 0, t ≠ s

Under RE.1(b) and RE.3(b):

Var (ci ) = σc²

Random Effects Estimation VII

Then

Var (vit ) = Var (ci + uit ) = Var (ci ) + Var (uit )

σv² = σc² + σu²

• Further, for t ≠ s,

Cov (vit , vis ) = Cov (ci + uit , ci + uis )
= Var (ci ) + Cov (ci , uis ) + Cov (uit , ci ) + Cov (uit , uis )
= σc²

because Cov (ci , uit ) = 0 for all t under RE.1(a) and Cov (uit , uis ) = 0 under RE.3(a).

Random Effects Estimation VIII

• Written in terms of its elements,

$$\Omega = \begin{pmatrix}
\sigma_c^2 + \sigma_u^2 & \sigma_c^2 & \cdots & \sigma_c^2 \\
\sigma_c^2 & \sigma_c^2 + \sigma_u^2 & \cdots & \sigma_c^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_c^2 & \sigma_c^2 & \cdots & \sigma_c^2 + \sigma_u^2
\end{pmatrix},$$

so the T × T matrix depends on only two parameters, σc² and σu² or, more directly, σv² = σc² + σu² and σc² , regardless of the size of T .
• Feasible GLS requires estimating Ω, that is, the two parameters.

Random Effects Estimation IX

• The pairwise correlations are Corr (vit , vis ) = ρ = σc² /(σc² + σu² ).
Note that ρ is the fraction of the total variance accounted for by ci . We can also write Ω as

$$\Omega = \sigma_v^2 \begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \rho \\
\rho & \cdots & \rho & 1
\end{pmatrix},$$

which shows we only need to estimate ρ to proceed with FGLS.
• Typically, we estimate σv2 and σc2 , but ρ is useful for summarizing
the importance of ci .
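The two ways of writing Ω can be checked numerically. This is a small sketch with illustrative variance values; the function names are hypothetical.

```python
import numpy as np


def re_omega(sigma_c2, sigma_u2, T):
    """T x T RE covariance: sigma_c2 everywhere, plus sigma_u2 on the diagonal."""
    return sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T))


def re_omega_from_rho(sigma_v2, rho, T):
    """The same matrix written as sigma_v2 times an equicorrelation matrix."""
    return sigma_v2 * ((1.0 - rho) * np.eye(T) + rho * np.ones((T, T)))


sigma_c2, sigma_u2, T = 2.0, 1.0, 4     # illustrative values, not from the slides
sigma_v2 = sigma_c2 + sigma_u2
rho = sigma_c2 / sigma_v2               # share of total variance due to c_i

omega = re_omega(sigma_c2, sigma_u2, T)
print(omega)
print(np.allclose(omega, re_omega_from_rho(sigma_v2, rho, T)))
```

Whatever the value of T, the matrix is pinned down by two numbers, which is what makes FGLS feasible with fixed T.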

Random Effects Estimation X

• Note that the correlation between the composite errors vit and vis does not depend on the difference between t and s:
Corr (vit , vis ) = ρ = σc² /(σc² + σu² ) ≥ 0, s ≠ t.

• Therefore, ρ does not tend to zero as t and s get far apart under the RE covariance structure.

• Unlike standard models for serial correlation in time series settings, the random effects assumption implies strong persistence in the unobservables over time, due, of course, to the presence of ci .

Random Effects Estimation XI
For now, assume that we have consistent estimators of σu2 and σc2 .
We can substitute them into the Ω matrix.
In a panel data context, the FGLS estimator that uses the variance
matrix Ω̂ is what is known as the random effects estimator:

$$\hat{\beta}_{RE} = \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} y_i \right)$$

• Under RE.1 and RE.2 it is consistent (RE.3 is not needed for consistency).
• If RE.3 holds, the random effects estimator is asymptotically efficient in the class of estimators consistent under RE.1-RE.2, including the pooled OLS estimator and other weighted least squares estimators (this is because the RE estimator is asymptotically equivalent to GLS under assumptions RE.1-RE.3).
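The estimator can be sketched directly in NumPy, taking the two variance components as given, as the slide does at this stage. The function name `re_fgls`, the array layout, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def re_fgls(y, X, sigma_c2, sigma_u2):
    """Random effects (FGLS) estimate of beta for given variance components.

    y: (N, T) outcomes; X: (N, T, K) regressors including a constant.
    """
    N, T, K = X.shape
    omega = sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T))
    omega_inv = np.linalg.inv(omega)
    A = np.einsum('ntk,ts,nsl->kl', X, omega_inv, X)   # sum_i X_i' Om^-1 X_i
    b = np.einsum('ntk,ts,ns->k', X, omega_inv, y)     # sum_i X_i' Om^-1 y_i
    return np.linalg.solve(A, b)


# Simulated data satisfying RE.1-RE.3 (illustrative values).
rng = np.random.default_rng(3)
N, T = 5000, 4
x = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
c = np.sqrt(2.0) * rng.normal(size=(N, 1))         # Var(c_i) = 2, independent of x
y = 1.0 + 2.0 * x + c + rng.normal(size=(N, T))    # Var(u_it) = 1
X = np.stack([np.ones((N, T)), x], axis=2)

beta_re = re_fgls(y, X, sigma_c2=2.0, sigma_u2=1.0)
beta_pols = re_fgls(y, X, sigma_c2=0.0, sigma_u2=1.0)  # Omega = I: pooled OLS
print(beta_re, beta_pols)
```

Setting σc² = 0 makes Ω proportional to the identity, so the same function reproduces pooled OLS; the two estimators differ only through the weighting matrix.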
Random Effects Estimation XII
• In order to implement the RE procedure, we need to obtain σ̂u² and σ̂c² . Actually, it is easiest to first find σ̂v² = σ̂u² + σ̂c² .
• We use pooled OLS to get the residuals, v̌it , across all i and t.
• Then a consistent estimator of σv² (not generally unbiased), as N gets large for fixed T , is

$$\hat{\sigma}_v^2 = (NT - K)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \check{v}_{it}^2 = SSR/(NT - K),$$

the usual variance estimator from OLS regression.

This is based on, for each i, $\sigma_v^2 = T^{-1} \sum_{t=1}^{T} E(v_{it}^2)$, and then averaging across i, too.
Subtract K as a degrees-of-freedom adjustment.

Random Effects Estimation XIII

For σc² , note that

$$\sigma_c^2 = [T(T-1)/2]^{-1} \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} E(v_{it} v_{is}).$$

So a consistent “estimator” would be

$$\tilde{\sigma}_c^2 = [NT(T-1)/2]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} v_{it} v_{is}.$$

Random Effects Estimation XIV

• An actual estimator replaces vit with the POLS residuals,

$$\hat{\sigma}_c^2 = [NT(T-1)/2 - K]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} \check{v}_{it} \check{v}_{is},$$

which subtracts K from NT (T − 1)/2 as a df adjustment. By the usual argument,

$$\operatorname*{plim}_{N \to \infty} \hat{\sigma}_c^2 = \sigma_c^2$$

with T fixed.
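The two variance estimators can be sketched in NumPy as follows. The function name, the simulated design (σc² = 2, σu² = 1), and the use of `numpy.linalg.lstsq` for the pooled regression are illustrative assumptions.

```python
import numpy as np


def re_variance_components(resid, K):
    """Estimate (sigma_v2, sigma_c2, sigma_u2) from (N, T) POLS residuals."""
    N, T = resid.shape
    sigma_v2 = np.sum(resid ** 2) / (N * T - K)
    # For each i: sum over t < s of v_it * v_is
    #   = [(sum_t v_it)^2 - sum_t v_it^2] / 2
    cross = (resid.sum(axis=1) ** 2 - (resid ** 2).sum(axis=1)) / 2.0
    sigma_c2 = cross.sum() / (N * T * (T - 1) / 2.0 - K)
    return sigma_v2, sigma_c2, sigma_v2 - sigma_c2


rng = np.random.default_rng(4)
N, T, K = 20000, 4, 2
x = rng.normal(size=(N, T))
c = np.sqrt(2.0) * rng.normal(size=(N, 1))          # Var(c_i) = 2
y = 1.0 + 2.0 * x + c + rng.normal(size=(N, T))     # Var(u_it) = 1

# Pooled OLS on the stacked data, then residuals back in (N, T) layout.
Xs = np.column_stack([np.ones(N * T), x.ravel()])
beta_pols, *_ = np.linalg.lstsq(Xs, y.ravel(), rcond=None)
resid = (y.ravel() - Xs @ beta_pols).reshape(N, T)

sigma_v2, sigma_c2, sigma_u2 = re_variance_components(resid, K)
print(sigma_v2, sigma_c2, sigma_u2)
```

Nothing in the formula forces the estimate of σc² to be positive, so in small samples, or when the RE structure is badly wrong, it can come out negative.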

Random Effects Estimation XV

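• Now we can use

$$\hat{\Omega} = \begin{pmatrix}
\hat{\sigma}_v^2 & \hat{\sigma}_c^2 & \cdots & \hat{\sigma}_c^2 \\
\hat{\sigma}_c^2 & \hat{\sigma}_v^2 & \cdots & \hat{\sigma}_c^2 \\
\vdots & \vdots & \ddots & \vdots \\
\hat{\sigma}_c^2 & \hat{\sigma}_c^2 & \cdots & \hat{\sigma}_v^2
\end{pmatrix}
\quad \text{or} \quad
\hat{\Lambda} = \begin{pmatrix}
1 & \hat{\rho} & \cdots & \hat{\rho} \\
\hat{\rho} & 1 & \cdots & \hat{\rho} \\
\vdots & \vdots & \ddots & \vdots \\
\hat{\rho} & \hat{\rho} & \cdots & 1
\end{pmatrix},$$

where ρ̂ = σ̂c² /σ̂v² , in FGLS.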

Random Effects Estimation XVI

Fully robust inference is available for RE, and there are good
reasons for using it.
1 Ω may not have the special (and restrictive, especially for large T)
RE structure, that is, E (vi v′i ) need not have the RE form. Serial
correlation or changing variances in {uit : t = 1, ..., T } invalidate the
RE structure.
2 The system homoskedasticity requirement, E (vi v′i |Xi ) = E (vi v′i ),
might not hold.

Random Effects Estimation XVII

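• A fully robust estimator is

$$\widehat{\operatorname{Avar}}(\hat{\beta}_{RE}) = \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} \hat{v}_i \hat{v}_i' \hat{\Omega}^{-1} X_i \right) \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} X_i \right)^{-1},$$

where v̂i = yi − Xi β̂RE is the vector of RE (FGLS) residuals.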
• Sometimes, an iterative procedure is used. New residuals can be
used to obtain a new estimate of Ω, and so on.
• Using first-order asymptotics, no efficiency gain from iterating.
Might help with smaller N, though.
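A NumPy sketch of the robust sandwich variance next to the usual (non-robust) one. The function name and simulated data are illustrative, and for simplicity the true Ω is passed in where an estimate would be used in practice.

```python
import numpy as np


def re_fit_with_robust_se(y, X, omega_hat):
    """RE (FGLS) estimate with fully robust and usual standard errors.

    y: (N, T); X: (N, T, K); omega_hat: (T, T) working covariance matrix.
    """
    omega_inv = np.linalg.inv(omega_hat)
    A = np.einsum('ntk,ts,nsl->kl', X, omega_inv, X)         # sum X_i' Om^-1 X_i
    b = np.einsum('ntk,ts,ns->k', X, omega_inv, y)
    beta = np.linalg.solve(A, b)
    resid = y - X @ beta                                     # RE residuals, (N, T)
    scores = np.einsum('ntk,ts,ns->nk', X, omega_inv, resid)  # X_i' Om^-1 v_i
    A_inv = np.linalg.inv(A)
    V_robust = A_inv @ (scores.T @ scores) @ A_inv           # sandwich form
    return beta, np.sqrt(np.diag(V_robust)), np.sqrt(np.diag(A_inv))


# Simulated data in which the RE structure is correct (illustrative values).
rng = np.random.default_rng(5)
N, T = 5000, 4
x = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
c = rng.normal(size=(N, 1))
y = 1.0 + 2.0 * x + c + rng.normal(size=(N, T))
X = np.stack([np.ones((N, T)), x], axis=2)
omega = np.eye(T) + np.ones((T, T))      # sigma_u2 = 1, sigma_c2 = 1

beta_re, se_robust, se_usual = re_fit_with_robust_se(y, X, omega)
print(beta_re, se_robust, se_usual)
```

When the RE structure holds, as in this simulation, the two sets of standard errors are close. They can diverge when uit is serially correlated or heteroskedastic, which is the case the robust form is meant to cover.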

RE or POLS?
Why use RE instead of pooled OLS?

• RE removes a fraction of the unobserved heterogeneity from the error term, so when ci is correlated with the explanatory variables it typically has less bias (inconsistency) than POLS.

• The RE estimator is more efficient under RE.1-RE.3: RE accounts for some form of serial correlation in estimation, while POLS ignores serial correlation entirely.

Note that both RE and POLS are inconsistent if ci is correlated with the explanatory variables.
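The sense in which RE removes only a fraction of ci can be made precise with a standard result that is not stated on the slides and is added here as a sketch. Under the RE covariance structure, the RE estimator equals pooled OLS on quasi-time-demeaned data, where ȳi , x̄i and v̄i denote time averages for unit i:

```latex
\begin{aligned}
y_{it} - \lambda \bar{y}_i &= (x_{it} - \lambda \bar{x}_i)\beta + (v_{it} - \lambda \bar{v}_i), \\
\lambda &= 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_c^2} \right]^{1/2}, \\
v_{it} - \lambda \bar{v}_i &= (1 - \lambda)\, c_i + u_{it} - \lambda \bar{u}_i .
\end{aligned}
```

With λ = 0 this is pooled OLS. As λ approaches 1 (large T, or σc² large relative to σu²), almost all of ci is removed from the error. In practice λ is replaced by an estimate based on σ̂c² and σ̂u² .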

In Stata, fully robust inference uses the “cluster” option; for the
“usual” variance matrix estimator, drop this option:
xtreg y x1 x2 ... xK, re cluster(id)