Lecture 15
Panel Data Models
• Panel data layout: each column corresponds to a cross-sectional unit $i = 1, 2, \ldots, N$ and each row (the time-series dimension) to a period $t = 1, \ldots, T$:
$$\begin{pmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{pmatrix}$$
• A standard panel data set model stacks the $y$'s and the $\boldsymbol{x}$'s:
$$\boldsymbol{y} = X\beta + \boldsymbol{c} + \boldsymbol{\varepsilon}$$
– $X$ is a $(\sum_i T_i) \times k$ matrix
– $\beta$ is a $k \times 1$ vector
– $\boldsymbol{c}$ is a $(\sum_i T_i) \times 1$ vector, associated with unobservable variables
– $\boldsymbol{y}$ and $\boldsymbol{\varepsilon}$ are $(\sum_i T_i) \times 1$ vectors
– Indices:
- 𝑖: individuals –i.e., the unit of observation–,
- 𝑡: time period,
- 𝑗: observed explanatory variables,
- 𝑝 : unobserved explanatory variables.
– Time trend $t$ allows for a shift of the intercept over time, capturing
time effects (technological change, regulations, etc.). But, if the implicit
assumption of a constant rate of change (= $\delta$) is too strong, we use a set of
time dummy variables instead, one for each time period except a reference period.
$$y_{i,t} = \beta_0 + \sum_{j=1}^{k} \beta_j\, x_{i,t,j} + c_i + \delta t + \varepsilon_{i,t}$$
Note: If the $X$'s are so comprehensive that they capture all relevant
characteristics of individual $i$, $c_i$ can be dropped and, then, pooled OLS
may be used. But this situation is very unlikely.
$$E[y_{i,t}\,|\,\boldsymbol{x}_{i,t}, c_i] = \beta_0 + \sum_{j=1}^{k} \beta_j\, x_{i,t,j} + c_i + \delta t$$
The βj’s are partial effects holding 𝑐 constant.
Long history: Rao (1965) and Chow (1975) worked on these models.
Compact Notation
• Compact Notation: $\boldsymbol{y}_i = \boldsymbol{X}_i\beta + \boldsymbol{c}_i + \boldsymbol{\varepsilon}_i$
– $\boldsymbol{X}_i$ is a $T_i \times k$ matrix
– $\beta$ is a $k \times 1$ vector
– $\boldsymbol{c}_i$ is a $T_i \times 1$ vector
– $\boldsymbol{y}_i$ and $\boldsymbol{\varepsilon}_i$ are $T_i \times 1$ vectors
Or $\boldsymbol{y} = X^{*}\beta^{*} + \boldsymbol{\varepsilon}$, with $X^{*} = [X \;\; \iota]$, a $(\sum_i T_i) \times (k+1)$ matrix, and
$\beta^{*} = [\beta' \;\; c]'$, a $(k+1) \times 1$ vector.
• Simple idea: Aggregate over the clusters. The key is how (& when) to
cluster.
• These clustered standard errors are called Driscoll & Kraay SE (DK
SE’s). The clustered White-style SE are, sometimes, called Rogers SE.
They can all just be referred to as LZ SE!
• PCSE’s using HAR estimators (based on KVB SE) are possible, see
Hansen (2007). Bootstrapping SE is also possible -many approaches;
see Goncalves and Perron (2017) for a factor model application.
• Ibragimov and Muller (2016) have a test for the appropriate clustering
level. It is based on the observed variation across different clusters.
- When the data are correlated in more than one way, we have two cases:
- If nested (say, city and state), cluster at the highest level of aggregation.
- If not nested (e.g., time and industry), use “multi-level clustering.”
(A usage sketch of basic one-way clustering follows below.)
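For reference, a minimal sketch of one-way clustered (Rogers/LZ-type) SEs in Python with statsmodels; the data and column names (df, y, x1, x2, firm_id) are made up for illustration:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up panel: 50 firms observed 4 times each; firm_id is the cluster variable.
rng = np.random.default_rng(1)
df = pd.DataFrame({"firm_id": np.repeat(np.arange(50), 4),
                   "x1": rng.normal(size=200),
                   "x2": rng.normal(size=200)})
df["y"] = 1 + 0.5 * df["x1"] - 0.2 * df["x2"] + rng.normal(size=200)

# OLS point estimates with standard errors clustered by firm.
res = smf.ols("y ~ x1 + x2", data=df).fit(cov_type="cluster",
                                          cov_kwds={"groups": df["firm_id"]})
print(res.bse)   # clustered (LZ-type) standard errors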
Pooled Model
• General DGP: $y_{i,t} = \boldsymbol{x}_{i,t}'\beta + c_i + \varepsilon_{i,t}$, and (A2)-(A4) apply.
• We have the CLM, estimating $k + 1$ parameters:
$$y_{i,t} = \boldsymbol{x}_{i,t}'\beta + \alpha + \varepsilon_{i,t}$$
Pooled OLS is BLUE & consistent.
• In this context, OLS produces a BLUE and consistent estimator. In this
setting, we refer to it as pooled OLS estimation.
Remark: Under the usual assumptions, pooled OLS using the between
transformation is consistent and unbiased.
$$\sum_{i=1}^{N}\sum_{t=1}^{T_i} (z_{it} - \bar{\bar{z}})^2 = \sum_{i=1}^{N}\sum_{t=1}^{T_i} (z_{it} - \bar{z}_i^{*})^2 + \sum_{i=1}^{N} T_i\,(\bar{z}_i^{*} - \bar{\bar{z}})^2$$
where $\bar{z}_i^{*}$ is the group mean for individual $i$ and $\bar{\bar{z}}$ is the overall mean.
• Interpretation:
- Within group variation: Measures variation of individuals over time.
- Between group variation: Measures variation of the means across
individuals.
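A minimal pandas sketch of this decomposition on a made-up variable z (the numbers are arbitrary); total variation = within variation + between variation:

import numpy as np
import pandas as pd

# Hypothetical long-format panel: one row per (id, year) with a column "z".
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "year": [1, 2, 3, 1, 2, 3],
    "z":    [4.0, 5.0, 6.0, 10.0, 11.0, 12.0],
})

grand_mean = df["z"].mean()                           # overall mean (z double-bar)
group_mean = df.groupby("id")["z"].transform("mean")  # group mean, repeated within group

total   = ((df["z"] - grand_mean) ** 2).sum()
within  = ((df["z"] - group_mean) ** 2).sum()     # variation of individuals over time
between = ((group_mean - grand_mean) ** 2).sum()  # equals sum_i T_i (zbar_i - grand_mean)^2

print(total, within + between)   # the decomposition: total = within + between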
• For FGLS, use the pooled OLS residuals $\boldsymbol{e}_i$ and $\boldsymbol{e}_j$ to estimate the
covariances $\sigma_{ij}$. Note that
$$\hat{\Sigma} = \frac{1}{T}\sum_{t=1}^{T} \boldsymbol{e}_t\boldsymbol{e}_t' = \frac{1}{T}\,E'E$$
• Why? The error is no longer $\varepsilon_{i,t}$, but $u_{i,t} = \varepsilon_{i,t} - \varepsilon_{i,t-1}$. The Var[$\boldsymbol{u}_i$] is given by:
$$\mathrm{Var}\begin{pmatrix} \varepsilon_{i,2}-\varepsilon_{i,1} \\ \varepsilon_{i,3}-\varepsilon_{i,2} \\ \vdots \\ \varepsilon_{i,T}-\varepsilon_{i,T-1} \end{pmatrix}
= \sigma_\varepsilon^2 \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & & \vdots \\ 0 & -1 & 2 & \ddots & 0 \\ \vdots & & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix} \quad \text{(Toeplitz form)}$$
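For reference, this (T−1)×(T−1) covariance matrix is easy to build; a minimal sketch with arbitrary values for σ²_ε and T:

import numpy as np
from scipy.linalg import toeplitz

sigma2_eps = 1.0          # assumed variance of the level error eps_it
m = 5                     # number of differenced observations (T - 1)

# First column of the Toeplitz matrix: 2, -1, 0, ..., 0 (scaled by sigma2_eps).
first_col = np.zeros(m)
first_col[0], first_col[1] = 2.0, -1.0
var_u = sigma2_eps * toeplitz(first_col)

print(var_u)              # tridiagonal: 2 on the diagonal, -1 off the diagonal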
• With two periods –i.e., before and after– and strict exogeneity:
$$\Delta y_i = y_{i,2} - y_{i,1} = \delta_0 + \delta_1\,\text{Treatment}_i + (\boldsymbol{x}_{i,2} - \boldsymbol{x}_{i,1})'\beta + u_i$$
(This is a CLM. OLS is consistent and unbiased).
Then,
$$E[\Delta y_i\,|\,\text{Treatment}_i = 1] = \delta_0 + \delta_1 + E[\Delta\boldsymbol{x}_i'\beta\,|\,\text{Treatment}_i = 1]$$
$$E[\Delta y_i\,|\,\text{Treatment}_i = 0] = \delta_0 + E[\Delta\boldsymbol{x}_i'\beta\,|\,\text{Treatment}_i = 0]$$
- Cross-sectional difference:
$$E[y_{i,t}\,|\,Tr_i = 1, Post_t = 1] = \boldsymbol{x}_{i,t}'\beta + c_i + \gamma_0 + \gamma_1 + \gamma_2 + \delta$$
$$E[y_{i,t}\,|\,Tr_i = 0, Post_t = 1] = \boldsymbol{x}_{i,t}'\beta + c_i + \gamma_0 + \gamma_2$$
Note: From Lecture 8, we need to make sure that Treatment is the only
difference between the two groups. Thus, in the absence of treatment,
the average change in 𝑦 , would have been the same for both groups.
This is a key assumption behind the DD estimator, tested with t-tests or,
more commonly, by looking at a graph of the behavior of both groups before
treatment –see Redding & Sturm (2008) in Lecture 8.
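A minimal statsmodels sketch of the DD regression implied above, on made-up two-period data; the coefficient on the treat × post interaction is the DD estimate of the treatment effect:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic two-period panel just to illustrate the regression (not real data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "id":    np.repeat(np.arange(n), 2),
    "post":  np.tile([0, 1], n),
    "treat": np.repeat(rng.integers(0, 2, n), 2),
})
df["x1"] = rng.normal(size=2 * n)
df["y"] = (1 + 0.5 * df["treat"] + 0.3 * df["post"] + 1.0 * df["treat"] * df["post"]
           + 0.2 * df["x1"] + rng.normal(size=2 * n))

# The coefficient on treat:post is the DD estimate (delta_1); SEs clustered by individual.
did = smf.ols("y ~ treat + post + treat:post + x1", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["id"]})
print(did.params["treat:post"])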
• We have two periods: Before and after the natural experiment (the
treatment).
Note 1: Under the FEM, pooled OLS omits $c_i$ ⇒ biased & inconsistent estimates.
FEM: Estimation
• The FE model assumes $c_i = \alpha_i$ (constant; it does not vary with $t$):
$$\boldsymbol{y}_i = \boldsymbol{X}_i\beta + \boldsymbol{d}_i\alpha_i + \boldsymbol{\varepsilon}_i, \quad \text{for each individual } i.$$
• Stacking the $N$ individuals:
$$\begin{pmatrix} \boldsymbol{y}_1 \\ \boldsymbol{y}_2 \\ \vdots \\ \boldsymbol{y}_N \end{pmatrix}
= \begin{pmatrix} \boldsymbol{X}_1 & \boldsymbol{d}_1 & 0 & \cdots & 0 \\ \boldsymbol{X}_2 & 0 & \boldsymbol{d}_2 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ \boldsymbol{X}_N & 0 & 0 & \cdots & \boldsymbol{d}_N \end{pmatrix}
\begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \boldsymbol{\varepsilon}
= [\boldsymbol{X}, \boldsymbol{D}]\begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \boldsymbol{\varepsilon}
= \boldsymbol{Z}\delta + \boldsymbol{\varepsilon}$$
• The OLS estimates of $\beta$ and $\alpha$ are given by:
$$\begin{pmatrix} \boldsymbol{b} \\ \boldsymbol{a} \end{pmatrix}
= \begin{pmatrix} \boldsymbol{X}'\boldsymbol{X} & \boldsymbol{X}'\boldsymbol{D} \\ \boldsymbol{D}'\boldsymbol{X} & \boldsymbol{D}'\boldsymbol{D} \end{pmatrix}^{-1}
\begin{pmatrix} \boldsymbol{X}'\boldsymbol{y} \\ \boldsymbol{D}'\boldsymbol{y} \end{pmatrix}$$
Using the Frisch-Waugh theorem:
$$\boldsymbol{b} = [\boldsymbol{X}'\boldsymbol{M}_D\boldsymbol{X}]^{-1}\boldsymbol{X}'\boldsymbol{M}_D\boldsymbol{y}$$
$$\boldsymbol{M}_D = \begin{pmatrix} \boldsymbol{M}_D^1 & 0 & \cdots & 0 \\ 0 & \boldsymbol{M}_D^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \boldsymbol{M}_D^N \end{pmatrix} \quad \text{(the dummy variables are orthogonal)}$$
$$\boldsymbol{M}_D^i = \boldsymbol{I}_{T_i} - \boldsymbol{d}_i(\boldsymbol{d}_i'\boldsymbol{d}_i)^{-1}\boldsymbol{d}_i' = \boldsymbol{I}_{T_i} - (1/T_i)\,\boldsymbol{d}_i\boldsymbol{d}_i'$$
$$\boldsymbol{X}'\boldsymbol{M}_D\boldsymbol{X} = \sum_{i=1}^{N} \boldsymbol{X}_i'\boldsymbol{M}_D^i\boldsymbol{X}_i, \qquad (\boldsymbol{X}_i'\boldsymbol{M}_D^i\boldsymbol{X}_i)_{k,l} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(x_{it,l} - \bar{x}_{i.,l})$$
$$\boldsymbol{X}'\boldsymbol{M}_D\boldsymbol{y} = \sum_{i=1}^{N} \boldsymbol{X}_i'\boldsymbol{M}_D^i\boldsymbol{y}_i, \qquad (\boldsymbol{X}_i'\boldsymbol{M}_D^i\boldsymbol{y}_i)_{k} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(y_{it} - \bar{y}_{i.})$$
• That is, we subtract the group mean from each individual observation.
Then, the individual effects disappear. Now, OLS can easily be used to
estimate the 𝑘 β parameters, using the demeaned data.
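A minimal numpy sketch of this within (demeaning) estimator; the function and variable names are illustrative, not from the lecture:

import numpy as np

def within_estimator(y, X, groups):
    """FE (within) estimator: demean y and X by group, then run OLS on the demeaned data."""
    y_dm = y.astype(float).copy()
    X_dm = X.astype(float).copy()
    for g in np.unique(groups):
        idx = groups == g
        y_dm[idx] -= y_dm[idx].mean()
        X_dm[idx] -= X_dm[idx].mean(axis=0)
    b, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)   # the k slope coefficients only
    # Recover the individual effects: a_i = ybar_i - xbar_i'b
    a = {g: y[groups == g].mean() - X[groups == g].mean(axis=0) @ b
         for g in np.unique(groups)}
    return b, a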
Note:
– This is simple algebra –the estimator is just OLS
– Again, LS is an estimator, not a model.
– Note what $a_i$ is when $T_i = 1$. It follows that $y_{i,1} - a_i - \boldsymbol{x}_{i,1}'\boldsymbol{b} = 0$
if $T_i = 1$.
$$\Delta y_{i,t} = \sum_{j=1}^{k} \beta_j\,\Delta x_{i,t,j} + \delta + \Delta\varepsilon_{i,t}$$
• FD Estimator
– Each variable is differenced once over time, so we are effectively
estimating the relationship between changes of variables.
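Similarly, a minimal pandas sketch of the FD estimator (difference within each individual, then run OLS on the changes); the column-name arguments are placeholders:

import numpy as np
import pandas as pd

def first_difference_estimator(df, y_col, x_cols, id_col, time_col):
    """FD estimator: difference each variable within individuals, then OLS on the changes."""
    df = df.sort_values([id_col, time_col])
    dy = df.groupby(id_col)[y_col].diff()
    dX = df.groupby(id_col)[x_cols].diff()
    keep = dy.notna()                                 # drop each individual's first observation
    X = np.column_stack([np.ones(keep.sum()), dX[keep].to_numpy()])  # constant picks up delta
    b, *_ = np.linalg.lstsq(X, dy[keep].to_numpy(), rcond=None)
    return b  # [delta, beta_1, ..., beta_k]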
$$\hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i} (y_{it} - a_i - \boldsymbol{x}_{it}'\boldsymbol{b})^2}{\sum_{i=1}^{N} T_i - N - K}$$
(Note the degrees of freedom correction)
• Different tests:
– F-test based on the LSDV (dummy variable) model: constant or zero
coefficients for $\boldsymbol{D}$. The test follows an $F_{N-1,\;\sum_i T_i - N - K}$ distribution.
– F-test based on the FEM (the unrestricted model) vs. the pooled model (the
restricted model). The test follows an $F_{N-1,\;\sum_i T_i - N - K}$ distribution.
– An LR test can also be done, usually assuming normality. The test follows a
$\chi^2_{N-1}$ distribution.
• Calculations (pooled model RSS = 651.78; FEM RSS = 83.89):
$$F_{594,\,3566} = \frac{(651.78 - 83.89)/594}{83.89/3566} = 40.64 \quad (\text{reject } H_0)$$
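The same calculation in Python, with the p-value added via scipy (the RSS figures are the ones quoted above):

from scipy.stats import f

# Worked F-test from the slide: restricted (pooled) RSS vs. unrestricted (FEM) RSS.
rss_pooled, rss_fem = 651.78, 83.89
q, df_denom = 594, 3566                      # q = N - 1 restrictions; denominator df

F = ((rss_pooled - rss_fem) / q) / (rss_fem / df_denom)
p_value = f.sf(F, q, df_denom)
print(round(F, 2), p_value)                  # F ~ 40.64, p-value ~ 0 -> reject H0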
• Conditions:
(1) It is possible to treat each of the unobserved 𝑍 variables as being
drawn randomly from a given distribution.
(2) The $Z$ variables are distributed independently of all of the $\boldsymbol{X}$
variables: $E[\boldsymbol{z}_i'\boldsymbol{X}_i] = 0$.
Note: If condition (2) fails, we would have to use the FEM, even if the first
condition seems to be satisfied.
• If (1) and (2) are satisfied, we can use the REM, and OLS will work,
but there is a complication: the composite error $w_{i,t} = u_i + \varepsilon_{i,t}$ is not
spherical; observations for the same individual are correlated through $u_i$.
$$\begin{pmatrix} \boldsymbol{y}_1 \\ \boldsymbol{y}_2 \\ \vdots \\ \boldsymbol{y}_N \end{pmatrix}
= \begin{pmatrix} \boldsymbol{X}_1 \\ \boldsymbol{X}_2 \\ \vdots \\ \boldsymbol{X}_N \end{pmatrix}\beta
+ \begin{pmatrix} \boldsymbol{\varepsilon}_1 \\ \boldsymbol{\varepsilon}_2 \\ \vdots \\ \boldsymbol{\varepsilon}_N \end{pmatrix}
+ \begin{pmatrix} u_1\boldsymbol{i}_1 \\ u_2\boldsymbol{i}_2 \\ \vdots \\ u_N\boldsymbol{i}_N \end{pmatrix}
\quad \begin{matrix} T_1 \text{ observations} \\ T_2 \text{ observations} \\ \vdots \\ T_N \text{ observations} \end{matrix}$$
$$= \boldsymbol{X}\beta + \boldsymbol{\varepsilon} + \boldsymbol{u} \qquad \left(\textstyle\sum_{i=1}^{N} T_i \text{ observations}\right)$$
$$= \boldsymbol{X}\beta + \boldsymbol{w}$$
In all that follows, except where explicitly noted, X, X i
and xit contain a constant term as the first element.
To avoid notational clutter, in those cases, x it etc. will
simply denote the counterpart without the constant term.
Use of the symbol K for the number of variables will thus
be context specific but will usually include the constant term.
$$\mathrm{Var}[\boldsymbol{w}_i\,|\,\boldsymbol{X}_i] = \sigma_\varepsilon^2\,\boldsymbol{I}_{T_i} + \sigma_u^2\,\boldsymbol{i}_{T_i}\boldsymbol{i}_{T_i}' = \boldsymbol{\Omega}_i \qquad (T_i \times T_i)$$
$$\mathrm{Var}[\boldsymbol{w}\,|\,\boldsymbol{X}] = \begin{pmatrix} \boldsymbol{\Omega}_1 & 0 & \cdots & 0 \\ 0 & \boldsymbol{\Omega}_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \boldsymbol{\Omega}_N \end{pmatrix} \quad \text{(note these differ only in the dimension } T_i)$$
$$\frac{\boldsymbol{X}'\boldsymbol{X}}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i\,\frac{\boldsymbol{X}_i'\boldsymbol{X}_i}{T_i}, \qquad f_i = \frac{T_i}{\sum_{i=1}^{N} T_i} \quad \text{(a weighted sum of individual moment matrices)}$$
$$\frac{\boldsymbol{X}'\boldsymbol{\Omega}\boldsymbol{X}}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i\,\frac{\boldsymbol{X}_i'\boldsymbol{\Omega}_i\boldsymbol{X}_i}{T_i} \quad \text{(a weighted sum of individual moment matrices)}$$
$$= \sigma_\varepsilon^2 \sum_{i=1}^{N} f_i\,\frac{\boldsymbol{X}_i'\boldsymbol{X}_i}{T_i} + \sigma_u^2 \sum_{i=1}^{N} f_i\,T_i\,\bar{\boldsymbol{x}}_i\bar{\boldsymbol{x}}_i'$$
Note: asymptotics are with respect to $N$. Each matrix $\boldsymbol{X}_i'\boldsymbol{X}_i/T_i$ is the moment
matrix for the $T_i$ observations of individual $i$; it should be 'well behaved' in micro-level
data, and the average of $N$ such matrices should be likewise. $T$ or $T_i$ is assumed to be
fixed (and small).
• We can use pooled OLS, but for inferences we need the true
variance –i.e., the sandwich estimator:
$$\mathrm{Var}[\boldsymbol{b}\,|\,\boldsymbol{X}] = \frac{1}{\sum_{i} T_i}\left(\frac{\boldsymbol{X}'\boldsymbol{X}}{\sum_{i} T_i}\right)^{-1}\left(\frac{\boldsymbol{X}'\boldsymbol{\Omega}\boldsymbol{X}}{\sum_{i} T_i}\right)\left(\frac{\boldsymbol{X}'\boldsymbol{X}}{\sum_{i} T_i}\right)^{-1} \;\longrightarrow\; 0 \cdot \boldsymbol{Q}^{-1}\boldsymbol{Q}^{*}\boldsymbol{Q}^{-1} = \boldsymbol{0} \quad \text{as } N \to \infty, \text{ with our convergence assumptions.}$$
$$\frac{\boldsymbol{X}'\boldsymbol{\Omega}\boldsymbol{X}}{\sum_{i} T_i} = \sum_{i=1}^{N} f_i\,\frac{\boldsymbol{X}_i'\boldsymbol{\Omega}_i\boldsymbol{X}_i}{T_i}, \qquad \text{where } \boldsymbol{\Omega}_i = E[\boldsymbol{w}_i\boldsymbol{w}_i'\,|\,\boldsymbol{X}_i].$$
In the spirit of the White estimator, use
$$\sum_{i=1}^{N} f_i\,\frac{\boldsymbol{X}_i'\hat{\boldsymbol{w}}_i\hat{\boldsymbol{w}}_i'\boldsymbol{X}_i}{T_i}, \qquad \hat{\boldsymbol{w}}_i = \boldsymbol{y}_i - \boldsymbol{X}_i\boldsymbol{b}.$$
Hypothesis tests are then based on Wald statistics.
THIS IS THE 'CLUSTER' ESTIMATOR.
$$\widehat{\mathrm{Var}}[\boldsymbol{b}\,|\,\boldsymbol{X}] = (\boldsymbol{X}'\boldsymbol{X})^{-1}\left[\sum_{i=1}^{N} (\boldsymbol{X}_i'\hat{\boldsymbol{w}}_i)(\hat{\boldsymbol{w}}_i'\boldsymbol{X}_i)\right](\boldsymbol{X}'\boldsymbol{X})^{-1}$$
– $\hat{\boldsymbol{w}}_i$ = set of $T_i$ OLS residuals for individual $i$.
– $\boldsymbol{X}_i$ = $T_i \times K$ data on exogenous variables for individual $i$.
– $\boldsymbol{X}_i'\hat{\boldsymbol{w}}_i$ = $K \times 1$ vector of products.
– $(\boldsymbol{X}_i'\hat{\boldsymbol{w}}_i)(\hat{\boldsymbol{w}}_i'\boldsymbol{X}_i)$ = $K \times K$ matrix (rank 1, outer product).
– $\sum_{i=1}^{N} \boldsymbol{X}_i'\hat{\boldsymbol{w}}_i\hat{\boldsymbol{w}}_i'\boldsymbol{X}_i$ = sum of $N$ rank-1 matrices; rank $K$.
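A minimal numpy sketch of this cluster estimator exactly as written above (no finite-sample correction, which packaged routines typically add); the data objects in the usage comment are assumed to exist:

import numpy as np

def cluster_robust_vcov(X, resid, groups):
    """Cluster (Rogers/LZ) sandwich: (X'X)^-1 [sum_i X_i' w_i w_i' X_i] (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    K = X.shape[1]
    meat = np.zeros((K, K))
    for g in np.unique(groups):
        idx = groups == g
        Xg_w = X[idx].T @ resid[idx]          # K x 1 vector for cluster g
        meat += np.outer(Xg_w, Xg_w)          # rank-1 K x K contribution
    return XtX_inv @ meat @ XtX_inv

# Usage (hypothetical objects): b = OLS coefficients, so y - X @ b are the OLS residuals.
# se = np.sqrt(np.diag(cluster_robust_vcov(X, y - X @ b, ids)))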
REM: GLS
• Standard results for GLS in a GR model
- Consistent
- Unbiased
- Efficient (if functional form for Ω correct)
$$\hat{\beta} = [\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{X}]^{-1}[\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{y}] = \left[\sum_{i=1}^{N} \boldsymbol{X}_i'\boldsymbol{\Omega}_i^{-1}\boldsymbol{X}_i\right]^{-1}\left[\sum_{i=1}^{N} \boldsymbol{X}_i'\boldsymbol{\Omega}_i^{-1}\boldsymbol{y}_i\right]$$
$$\boldsymbol{\Omega}_i^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[\boldsymbol{I}_{T_i} - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,\boldsymbol{i}\boldsymbol{i}'\right] \quad \text{(note: it depends on } i \text{ only through } T_i\text{)}$$
• The matrix $\boldsymbol{\Omega}^{-1/2} = \boldsymbol{P}$ is used to transform the data. That is,
$$y_{it}^{*} = y_{it} - \theta_i\,\bar{y}_i, \qquad \boldsymbol{x}_{it}^{*} = \boldsymbol{x}_{it} - \theta_i\,\bar{\boldsymbol{x}}_i, \qquad \text{where } \theta_i = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}}$$
$$\mathrm{Asy.Var}[\hat{\beta}_{GLS}] = (\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{X})^{-1} = \sigma_\varepsilon^2\,(\boldsymbol{X}^{*\prime}\boldsymbol{X}^{*})^{-1}$$
Then, the bigger (smaller) the variance of the unobserved effect, i.e., the more
(less) individual heterogeneity, the closer $\theta_i$ is to 1 (to 0) and the closer GLS is
to FE (to pooled OLS). Also, when $T_i$ is large, GLS becomes more like FE.
$$\hat{\sigma}_u^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_{OLS} - \boldsymbol{x}_{it}'\boldsymbol{b}_{OLS})^2}{\sum_{i=1}^{N} T_i} - \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - \boldsymbol{x}_{it}'\boldsymbol{b}_{LSDV})^2}{\sum_{i=1}^{N} T_i} \;\geq\; 0$$
(In finite samples this difference can turn out negative; it is then usually set to zero.)
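Putting the pieces together, a minimal numpy sketch of feasible GLS by quasi-demeaning: estimate σ²_ε from the within (LSDV) residuals, back out σ²_u as above (truncated at zero), form θ_i, transform, and run OLS. The variance-component estimators used here are one simple choice among several, and all names are illustrative:

import numpy as np

def re_fgls(y, X, groups):
    """Random-effects FGLS by quasi-demeaning (theta transformation). X excludes the constant."""
    ids = np.unique(groups)
    # 1. Within (LSDV) residual variance -> sigma2_eps
    yw, Xw = y.astype(float).copy(), X.astype(float).copy()
    for g in ids:
        m = groups == g
        yw[m] -= yw[m].mean()
        Xw[m] -= Xw[m].mean(axis=0)
    b_w, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    sigma2_eps = np.sum((yw - Xw @ b_w) ** 2) / (len(y) - len(ids) - X.shape[1])
    # 2. Pooled OLS residual variance minus sigma2_eps -> sigma2_u (truncated at zero)
    Xp = np.column_stack([np.ones(len(y)), X])
    b_p, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    sigma2_u = max(np.mean((y - Xp @ b_p) ** 2) - sigma2_eps, 0.0)
    # 3. Quasi-demean with theta_i = 1 - sqrt(sigma2_eps / (sigma2_eps + T_i * sigma2_u))
    ys, Xs = y.astype(float).copy(), Xp.astype(float).copy()
    for g in ids:
        m = groups == g
        theta = 1.0 - np.sqrt(sigma2_eps / (sigma2_eps + m.sum() * sigma2_u))
        ys[m] -= theta * y[m].mean()
        Xs[m] -= theta * Xp[m].mean(axis=0)   # the constant column is transformed too
    b_re, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return b_re  # [constant, slopes]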
----------------------------------------------------------------------
Least Squares with Group Dummy Variables..........
LHS=LWAGE Mean = 6.67635
Residuals Sum of squares = 82.34912
Standard error of e = .15205
These 2 variables have no within group variation.
FEM ED
F.E. estimates are based on a generalized inverse.
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
EXP| .11346*** .00247 45.982 .0000 19.8538
EXPSQ| -.00042*** .544864D-04 -7.789 .0000 514.405
OCC| -.02106 .01373 -1.534 .1251 .51116
SMSA| -.04209** .01934 -2.177 .0295 .65378
MS| -.02915 .01897 -1.536 .1245 .81441
FEM| .000 ......(Fixed Parameter).......
UNION| .03413** .01491 2.290 .0220 .36399
ED| .000 ......(Fixed Parameter).......
--------+-------------------------------------------------------------
• Breusch and Pagan Lagrange Multiplier statistic. Assuming normality (and,
for convenience now, a balanced panel):
$$LM = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}\left(T\bar{e}_i\right)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2} - 1\right]^2 = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}\left[(T\bar{e}_i)^2 - \boldsymbol{e}_i'\boldsymbol{e}_i\right]}{\sum_{i=1}^{N}\boldsymbol{e}_i'\boldsymbol{e}_i}\right]^2$$
The statistic converges to chi-squared[1] under the null hypothesis of no common
effects. (For unbalanced panels, the scale in front becomes $(\sum_{i=1}^{N} T_i)^2 / [2\sum_{i=1}^{N} T_i(T_i - 1)]$.)
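A minimal numpy sketch of the balanced-panel LM statistic above, given the pooled OLS residuals (assumed ordered by individual and then by time):

import numpy as np
from scipy.stats import chi2

def breusch_pagan_lm(e, N, T):
    """Breusch-Pagan LM test for common (random) effects, balanced panel.
    e: pooled OLS residuals, ordered so that e.reshape(N, T) groups them by individual."""
    E = e.reshape(N, T)
    num = np.sum(E.sum(axis=1) ** 2)          # sum_i (sum_t e_it)^2 = sum_i (T * ebar_i)^2
    den = np.sum(E ** 2)                      # sum_i sum_t e_it^2
    lm = (N * T) / (2 * (T - 1)) * (num / den - 1) ** 2
    return lm, chi2.sf(lm, df=1)              # statistic and p-value under H0: no common effects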
FE vs. RE
• Q: RE estimation or FE estimation?
• Case for RE:
– If there are no omitted variables, or if the omitted variables are
uncorrelated with $\boldsymbol{x}_{i,t}$ in the model, then a REM is probably best: it
produces unbiased and efficient estimates, and uses all the data available.
– RE can deal with observed characteristics that remain constant for
each individual. In FE, they have to be dropped from model.
– In contrast with FE, RE estimates a small number of parameters
– We do not lose N degrees of freedom.
– Philosophically speaking, a REM is more attractive: Why should we
assume one set of unobservables fixed and the other random?
• Case against RE:
- If either of the conditions for using RE is violated, we should use FE.
• FE estimation is always consistent. On the other hand, a violation of
condition (2) causes inconsistency in the RE estimation.
That is, if there are omitted variables, which are correlated with the 𝒙 ,
in the model, then the FEM provides a way for controlling for omitted
variable bias. In a FEM, individuals serve as their own controls.
• Hausman test. Wald criterion:
$$\hat{\boldsymbol{q}} = \hat{\beta}_{FE} - \hat{\beta}_{RE}; \qquad W = \hat{\boldsymbol{q}}'\,[\mathrm{Var}(\hat{\boldsymbol{q}})]^{-1}\,\hat{\boldsymbol{q}}$$
$$\sqrt{nT}\,[\hat{\beta}_{FE} - \beta] \;\xrightarrow{d}\; N[0, V_{FE}] \quad \text{(inefficient)}$$
Note: $\hat{\boldsymbol{q}} = (\hat{\beta}_{FE} - \beta) - (\hat{\beta}_{RE} - \beta)$. The lemma states that, since $\hat{\beta}_{RE}$ is
efficient under $H_0$, it is uncorrelated with $\hat{\boldsymbol{q}}$, so that $\mathrm{Var}(\hat{\boldsymbol{q}}) = V_{FE} - V_{RE}$.
Note: Columns of zeroes will show up in $V_{FE}$ if there are time-invariant
variables in $\boldsymbol{x}_{i,t}$. (Also, $\beta$ does not contain the constant term.)
Note: Pooled OLS is consistent, but inefficient under H0. Then, the
RE estimation is GLS.
• Rejection at the 5% level, like in this case, indicates that βFE ≠ βRE.
- Usually, this result is taken as an indication of a FEM.
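A minimal numpy sketch of the Wald criterion above, given the FE and RE estimates and covariance matrices restricted to the time-varying regressors; the use of a pseudo-inverse when V_FE − V_RE is near-singular is a pragmatic choice, not part of the lecture's derivation:

import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman test: W = q' [V_FE - V_RE]^{-1} q, chi-squared with K degrees of freedom.
    Compare only coefficients estimated by both models (no constant, no time-invariant vars)."""
    q = b_fe - b_re
    V_diff = V_fe - V_re                      # may be near-singular; pinv is a pragmatic choice
    W = float(q @ np.linalg.pinv(V_diff) @ q)
    df = len(q)
    return W, chi2.sf(W, df)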
• Since the errors and the unobserved effect may not be i.i.d. white noise,
Wooldridge (2009) suggests using PCSE.
--> matr;bm=b(1:8);vm=varb(1:8,1:8)$
--> matr;list;wutest=bm'<vm>bm$
• Now, you cannot reject the REM at the 5% level. Here you can say,
“after accounting for cross-sectional and temporal dependence, the
Hausman test indicates that the coefficient estimates from pooled OLS
estimation are consistent.”
Steps:
– 1. Compute case-specific mean variables
– 2. Transform X variables into deviations (within transformation)
– 3. Do not transform the dependent variable Y
– 4. Include both X deviation & X mean variables
– 5. Estimate with a RE model
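A minimal pandas sketch of steps 1-4 above (group means plus within deviations of the regressors); column names are placeholders and the final RE estimation is left to whichever routine the course uses:

import pandas as pd

def mundlak_design(df, id_col, x_cols):
    """Steps 1-4: add group-mean and deviation (within-transformed) versions of each regressor."""
    out = df.copy()
    means = out.groupby(id_col)[x_cols].transform("mean")
    for c in x_cols:
        out[c + "_mean"] = means[c]           # case-specific mean variables (step 1)
        out[c + "_dev"] = out[c] - means[c]   # within transformation (step 2)
    return out  # regress the untransformed y on both *_dev and *_mean variables with a RE estimator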
Measurement Error
Heteroskedasticity - Review
• Given that there is a cross-section component to panel data, there will
always be a potential for heteroskedasticity.
Autocorrelation - Review
• Although different from autocorrelation in the usual univariate
models, a version of the Breusch-Pagan LM test can be used.
• To deal with autocorrelated errors, we can use the usual methods, say,
pseudo-differencing. In general, we will estimate ρ using the LSDV
residuals.
• As usual, OLS plus NW’s PCSE can help you to avoid a complicated
FGLS estimation. (The usual problems with HAC SE apply.)
PCSE - Review
• Key Assumption
– Correlations within a cluster (a group of firms, a region, different
years for the same firm, different years for the same region) are
the same for different observations.
• Procedure
– (1) Identify clusters using economic theory (industry, year, etc.)
– (2) Calculate clustered standard errors
– (3) Try different ways of defining clusters and see how the
estimated SE are affected. Be conservative, report largest SE.
• Performance
– Not a lot of studies –some simulations done for simple DGPs.
– PCSE coverage rates are not very good (typically below their
nominal level).
– PCSE using HAR estimators is a good idea.
• Criticisms:
- Angrist and Pischke (2009): Assumptions are not always plausible.
- Allison (2009)
- Bollen and Brand (2010): Hard to compare models.
• General remarks:
- Ignoring dynamics –i.e., lags– not a good idea: omitted variables
problem.
- It is important to think carefully about dynamic processes:
• How long does it take things to unfold?
• What lags does it make sense to include?
• With huge datasets, we can just throw lots in
– With smaller datasets, it is important to think things
through.
$$y_{it} = \boldsymbol{x}_{it}'\beta + \boldsymbol{Y}_{it}'\gamma + c_i + \varepsilon_{it}$$
• X = exogenous covariates
• Y = other endogenous covariates (may be related to εit)
• ci = unobserved unit-specific characteristic
• εit = idiosyncratic error
– Treat ci as random, fixed, or use differencing to wipe it out
– Use contemporaneous or lagged X and (appropriate) lags of Y as
instruments in two-stage estimation of $y_{it}$ (see the 2SLS sketch at the end of this section).
– Strategies:
• Tests for unit roots in time series & panel data
• Differencing as a solution
– A reason to try FD models.
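A minimal numpy sketch of the two-stage (IV) idea referenced above, with lagged variables supplied by the user as the instrument matrix Z; this is generic 2SLS under made-up names, not a specific dynamic-panel estimator such as Arellano-Bond:

import numpy as np

def two_stage_least_squares(y, X, Z):
    """Generic 2SLS: regress each column of X on the instruments Z, then regress y on the
    fitted values. X = [exogenous x_it, endogenous Y_it (e.g., lagged y)], Z = instruments."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage: fitted X from Z
    b, *_ = np.linalg.lstsq(X_hat, y, rcond=None)      # second stage: y on fitted X
    return b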