
RS-15 – Panel Data

Lecture 15
Panel Data Models

(For private use, not to be posted/shared online).

Panel Data Sets


• A panel, or longitudinal, data set is one where there are repeated
observations on the same units: individuals, households, firms,
countries, or any set of entities that remain stable through time.

• Repeated observations potentially create very large panel data sets.
With $N$ units and $T$ time periods ⇒ Number of observations: $NT$.
– Advantage: Large sample! Great for estimation.
– Disadvantage: Dependence! Observations are, very likely, not
independent.

• Modeling the potential dependence creates different models.

R.S, 2023 - Do not post/shared without written authorization 1



Panel Data Sets


• The National Longitudinal Survey (NLS) of Youth is an example.
The same respondents were interviewed every year from 1979 to 1994.
Since 1994 they have been interviewed every two years.

• Panel data allow a researcher to study cross-section effects –i.e., along $N$, variation across the firms– & time-series effects –i.e., along $T$, variation across time.
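Columns index the cross section ($i = 1, ..., N$); rows index the time series ($t = 1, ..., T$):

$$
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
$$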

Panel Data Sets


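• Notation:

$$
y_1 = \begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1t} \\ \vdots \\ y_{1T_1} \end{bmatrix}; \; \ldots; \;
y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{it} \\ \vdots \\ y_{iT_i} \end{bmatrix}
$$

$$
X_1 = \begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{k1} \\
x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots &        & \vdots \\
x_{1t} & x_{2t} & \cdots & x_{kt} \\
\vdots & \vdots &        & \vdots \\
x_{1T_1} & x_{2T_1} & \cdots & x_{kT_1}
\end{bmatrix}; \; \ldots; \;
X_i = \begin{bmatrix}
w_{11} & w_{21} & \cdots & w_{k1} \\
w_{12} & w_{22} & \cdots & w_{k2} \\
\vdots & \vdots &        & \vdots \\
w_{1t} & w_{2t} & \cdots & w_{kt} \\
\vdots & \vdots &        & \vdots \\
w_{1T_i} & w_{2T_i} & \cdots & w_{kT_i}
\end{bmatrix}
$$
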

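• A standard panel data set model stacks the $y_i$'s and the $x_i$'s:
$y = X\beta + c + \varepsilon$
$X$ is a $\sum_i T_i \times k$ matrix
$\beta$ is a $k \times 1$ matrix
$c$ is a $\sum_i T_i \times 1$ matrix, associated with unobservable variables.
$y$ and $\varepsilon$ are $\sum_i T_i \times 1$ matrices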


Balanced and Unbalanced Panels


• Notation:
$y_{i,t}$, $i = 1, ..., N$; $t = 1, ..., T_i$
• Mathematical and notational convenience:
- Balanced: $NT$ observations
(that is, every unit is surveyed in every time period.)
- Unbalanced: $\sum_{i=1}^{N} T_i$ observations
Q: Is the fixed $T$ assumption ever necessary? SUR models.

• The NLS of Youth is unbalanced because some individuals have not been interviewed in some years. Some could not be located, some refused, and a few have died. CRSP is also unbalanced: some firms are listed from 1962, others started to be listed later.
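
The balanced/unbalanced distinction can be checked mechanically. The following is a minimal sketch, assuming each observation is identified by a (unit, period) pair; the function name `panel_shape` and the toy data are hypothetical and are not part of the lecture.

```python
from collections import Counter


def panel_shape(index_pairs):
    """Summarize a panel given one (unit, period) pair per observation."""
    pairs = list(index_pairs)
    if len(set(pairs)) != len(pairs):
        raise ValueError("duplicate (unit, period) observation")
    t_i = Counter(unit for unit, _ in pairs)      # T_i for each unit i
    periods = {period for _, period in pairs}     # distinct periods in the data
    total_obs = sum(t_i.values())                 # sum_i T_i
    # Balanced: every unit is observed in every period that appears in the data.
    balanced = all(count == len(periods) for count in t_i.values())
    return {"N": len(t_i), "T": len(periods), "obs": total_obs, "balanced": balanced}


# Toy data (hypothetical): three units observed over three years.
balanced_panel = [(i, t) for i in ("A", "B", "C") for t in (2001, 2002, 2003)]
# Drop one observation to mimic a respondent who was not interviewed in one year.
unbalanced_panel = [p for p in balanced_panel if p != ("B", 2002)]

print(panel_shape(balanced_panel))
print(panel_shape(unbalanced_panel))
```

In the balanced case the observation count equals $NT$; in the unbalanced case it equals $\sum_i T_i$, which is smaller.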

Panel Data Models

• With panel data we can study different issues:


- Cross sectional variation (unobservable in time series data) vs.
Time series variation (unobservable in cross sectional data)
- Heterogeneity (observable and unobservable individual
heterogeneity)
- Hierarchical structures (say, zip code, city and state effects)
- Dynamics in economic behavior
- Individual/Group effects (individual effects)
- Time effects


Panel Data Models: Example 1 - SUR


• In Zellner’s SUR formulation (no linear dependence on $y_{i,t}$) we have:
(A1) $y_{i,t} = x_{i,t}'\beta_i + \varepsilon_{i,t}$ – the DGP
(A2) $E[\varepsilon_i|X] = 0$,
(A3’) $Var[\varepsilon_i|X] = \sigma_i^2 I_T = \sigma_{ii} I_T$ – groupwise heteroscedasticity.
$E[\varepsilon_{i,t}\varepsilon_{j,t}|X] = \sigma_{ij}$ – contemporaneous correlation
$E[\varepsilon_{i,t}\varepsilon_{j,s}|X] = 0$ when $t \neq s$
(A4) Rank(X) = full rank

• In (A1) – (A4), we have a GR model with heteroscedasticity. OLS in each equation is OK, but not efficient. GLS is efficient.
• We are not taking advantage of pooling –i.e., using $NT$ observations!
• Use LR or F tests to check if pooling (aggregation) can be done.

Panel Data Models: Example 2 - Pooling


• Assumptions
(A1) $y_{i,t} = x_{i,t}'\beta + z_i'\gamma + \varepsilon_{i,t}$ – the DGP
$i = 1, 2, ..., N$ - we have $N$ individuals, groups or firms.
$t = 1, 2, ..., T_i$ - usually, $N \gg T$.
(A2) $E[\varepsilon_i|X, z] = 0$, – $X$ and $z$: exogenous
(A3) $Var[\varepsilon_i|X, z] = \sigma^2 I$. – Heteroscedasticity can be allowed.
(A4) Rank(X) = full rank

• We think of X as a vector of observed characteristics. For example,


firm size, Market-to-book, Z-score, R&D expenditures, etc.

• We think of 𝒛 as a vector of unobserved characteristics (individual


effects). For example, quality of management, growth opportunities, etc.


Panel Data Models: Basic Model


• The DGP (A1) is linear:
$y_{i,t} = \beta_0 + \sum_j \beta_j x_{j,i,t} + \sum_p \gamma_p z_{p,i} + \delta t + \varepsilon_{i,t}$

– Indices:
- $i$: individuals –i.e., the unit of observation–,
- $t$: time period,
- $j$: observed explanatory variables,
- $p$: unobserved explanatory variables.
– Time trend $t$ allows for a shift of the intercept over time, capturing time effects –technological change, regulations, etc. But, if the implicit assumption of a constant rate of change (= $\delta$) is too strong, we use a set of dummy variables, one for each time period except the reference period.

Panel Data Models: Basic Model

– $X$: The variables of interest – $\beta$ is the vector of parameters of interest.
– $Z$: The variables responsible for unobserved heterogeneity (& dependence on the $y_i$’s). Usually, a nuisance component of the model.

• The $Z$ variables are unobserved: It is impossible to obtain information about the $\sum_p \gamma_p z_{p,i}$ component of the model. We define a term $c_i$, the unobserved effect, representing the joint impact of the $Z$ variables on $y_i$ – like an index of unobservables for individual $i$:
$c_i = \sum_p \gamma_p z_{p,i}$

• We can rewrite the regression model as:

$y_{i,t} = \beta_0 + \sum_j \beta_j x_{j,i,t} + c_i + \delta t + \varepsilon_{i,t}$


Panel Data Models: Basic Model

$y_{i,t} = \beta_0 + \sum_j \beta_j x_{j,i,t} + c_i + \delta t + \varepsilon_{i,t}$
Note: If the $X_j$’s are so comprehensive that they capture all relevant characteristics of individual $i$, $c_i$ can be dropped and, then, pooled OLS may be used. But this situation is very unlikely.

• In general, dropping $c_i$ leads to a missing variables problem: bias!

• We usually think of $c_i$ as contemporaneously exogenous to the conditional error. That is, $E[\varepsilon_{i,t}|c_i] = 0$, $t = 1, ..., T$

A stronger assumption: Strict exogeneity can also be imposed. Then,
$E[\varepsilon_{i,t}|x_{i,1}, x_{i,2}, ..., x_{i,T}, c_i] = 0$, $t = 1, ..., T$

Panel Data Models: Basic Model

• Strict exogeneity conditions on the whole history of $x_i$. Under this assumption:

$E[y_{i,t}|x_{i,t}, c_i] = \beta_0 + \sum_j \beta_j x_{j,i,t} + c_i + \delta t$
⇒ The $\beta_j$’s are partial effects holding $c_i$ constant.

• Violations of strict exogeneity are not rare. For example, if $x_{i,t}$ contains lagged dependent variables or if changes in $\varepsilon_{i,t}$ affect $x_{i,t+1}$ (a “feedback” effect).

• But to estimate $\beta$ we still need to say something about the relation between $x_{i,t}$ and $c_i$. Different assumptions will give rise to different models.


Panel Data Models: Types

• The basic DGP: $y_{i,t} = x_{i,t}'\beta + z_i'\gamma + \varepsilon_{i,t}$ (*)
& (A2)-(A4) apply.

Depending on how we model the heterogeneity in the panel, we have different models.

• Four Popular Models:

(1) Pooled (Constant Effect) Model
$z_i'\gamma$ is a constant: $z_i'\gamma = \alpha$ (and uncorrelated with $x_{i,t}$!). Dependence on the $y_{i,t}$ may enter through the variance. That is, repeated observations on individual $i$ are linearly independent. In this case,
$y_{i,t} = x_{i,t}'\beta + \alpha + \varepsilon_{i,t}$

⇒ OLS estimates $\alpha$ and $\beta$ consistently. We estimate $k+1$ parameters.

Panel Data Models: Types

(2) Fixed Effects Model (FEM)

The $z_i$’s are correlated with $X_i$. Fixed Effects:
$E[z_i|X_i] = g(X_i) = \alpha_i^*$;
the unobservable effects are correlated with included variables –i.e., pooled OLS will be inconsistent.

Assume $z_i'\gamma = \alpha_i$ (constant; it does not vary with $t$). Then,
$y_{i,t} = x_{i,t}'\beta + \alpha_i + \varepsilon_{i,t}$
⇒ the regression line is raised/lowered by a fixed amount for each individual $i$ (the dependence created by the repeated observations!). In econometrics terms, this is the source of the fixed-effects.

⇒ We have a lot of parameters: $k + N$. We have $N$ individual effects!

OLS can be used to estimate $\alpha$ and $\beta$ consistently.


Panel Data Models: Types

(3) Random Effects Model (REM)

The differences between individuals are random, drawn from a given distribution with constant parameters. We assume the $z_i$’s are uncorrelated with the $X_i$. That is,
$E[z_i|X_i] = \mu$ (if $X_i$ contains a constant term, $\mu = 0$ WLOG).

Add and subtract $E[z_i'\gamma] = \mu^*$ from (*):
$y_{i,t} = x_{i,t}'\beta + E[z_i'\gamma] + (z_i'\gamma - E[z_i'\gamma]) + \varepsilon_{i,t}$
$= x_{i,t}'\beta + \mu^* + u_i + \varepsilon_{i,t}$
We have a compound (“composed”) error –i.e., $u_i + \varepsilon_{i,t} = w_{i,t}$. Because $u_i$ is common to all periods, this $w_{i,t}$ introduces correlation across the observations within group $i$.

⇒ OLS estimates $\mu^*$ and $\beta$ consistently, but GLS will be efficient.

Panel Data Models: Types

(4) Random Parameters/Coefficients Model

We introduce heterogeneity through $\beta_i$. But, this may introduce additional $N$ parameters. A solution is to model $\beta_i$. For example,
$y_{i,t} = x_{i,t}'(\beta + h_i) + \alpha_i + \varepsilon_{i,t}$
$h_i$ is a random vector that induces parameter variation, where $h_i \sim D(0, \sigma_h^2)$. That is, we introduce heteroscedasticity.

Now, the coefficients are different for each individual. It is possible to complicate the model by making them different through time:
$\beta_{i,t} = (\beta + h_i) + \theta_t$ where $\theta_t \sim D(0, \sigma_\theta^2)$.

Estimation: GLS, MLE.

Long history: Rao (1965) and Chow (1975) worked on these models.


Compact Notation
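• Compact Notation: $y_i = X_i\beta + c_i + \varepsilon_i$
$X_i$ is a $T_i \times k$ matrix
$\beta$ is a $k \times 1$ matrix
$c_i$ is a $T_i \times 1$ matrix
$y_i$ and $\varepsilon_i$ are $T_i \times 1$ matrices

• Recall we stack the $y_i$’s and $X_i$’s: $y = X\beta + c + \varepsilon$
$X$ is a $\sum_i T_i \times k$ matrix
$\beta$ is a $k \times 1$ matrix
$c$, $y$ and $\varepsilon$ are $\sum_i T_i \times 1$ matrices

Or $y = X^*\beta^* + \varepsilon$, with $X^* = [X \; \iota]$ – $\sum_i T_i \times (k+1)$ matrix.
$\beta^* = [\beta \; c]'$ – $(k+1) \times 1$ matrix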

Assumptions for Asymptotics (Greene)


• Convergence of moments involving cross section $X_i$.
Usually, we assume $N$ increasing, $T$ or $T_i$ assumed fixed.
– “Fixed-T asymptotics” (see Greene)
– Time series characteristics are not relevant (may be nonstationary)
– If $T$ is also growing, need to treat as multivariate time series.

• Rank of matrices. $X$ must have full column rank.
⇒ $X_i$ may not, if $T_i < k$.

• Strict exogeneity and dynamics. If $x_{i,t}$ contains $y_{i,t-1}$, then $x_{i,t}$ cannot be strictly exogenous. $x_{i,t}$ will be correlated with the unobservables in period $t-1$. Inconsistent OLS estimates! (To be revisited later.)


Panel Data Models: (A3’) - No Homoscedasticity


• We can relax assumption (A3). The new DGP model:
$y = X^*\beta^* + \varepsilon$, with $X^* = [X \; \iota]$ – $\sum_i T_i \times (k+1)$ matrix.
$\beta^* = [\beta \; c]'$ – $(k+1) \times 1$ matrix
Now, we assume (A3’) $E[\varepsilon\varepsilon'|X] = \Sigma \neq \sigma^2 I_{\sum_i T_i}$
• Potentially, a lot of different elements in $E[\varepsilon\varepsilon'|X]$ in a panel:
- Individual heteroscedasticity. Usual groupwise heteroscedasticity.
- Autocorrelation (Individual/group/firm) effects. Errors have arbitrary correlation across time for a particular individual $i$.
- Temporal correlation (Time) effects. Errors have arbitrary correlation across individuals at a moment in time (SUR-type correlation).
- Persistent common shocks: Errors have some correlation between different firms in different time periods (but, these shocks are assumed to die out over time, and may be ignored after $L$ periods).

Panel Data Models: (A3’) – Error Structures


• To understand the different elements in $\Sigma$, consider the following DGP for the errors, $\varepsilon_{i,t}$’s:
$\varepsilon_{i,t} = \theta_i' f_t + \eta_{i,t}$, $f_t \sim D(0, \sigma_f^2)$
& $\eta_{i,t} = \phi\,\eta_{i,t-1} + \varsigma_{i,t}$, $\varsigma_{i,t} \sim D(0, \sigma_{\varsigma i}^2)$
$f_t$: vector of random factors common to all individuals/groups/firms.
$\theta_i$: vector of factor loadings, specific to individual $i$.
$\varsigma_{i,t}$: random shocks to individual $i$, uncorrelated across both $i$ and $t$.
$\eta_{i,t}$: random shocks to $i$. This generates autocorrelation effects in $i$.

• $\theta_i' f_t$ generates both contemporaneous (SUR) and time-varying cross-correlations between $i$ and $j$. (Autocorrelations die out after $L$ periods.)
- If $f_t$ is uncorrelated across $t$ ⇒ only contemporaneous (SUR) effects.
- If $f_t$ is persistent in $t$ ⇒ both SUR and persistent common effects.


Panel Data Models: (A3’) – Error Structures


• Different forms for $E[\varepsilon\varepsilon'|X]$:
- Individual heteroscedasticity: $E[\varepsilon_{i,t}^2|X] = \sigma_i^2$
⇒ standard groupwise heteroscedasticity driven by $\varsigma_{i,t}$.
- Autocorrelation (Individual) effects: $E[\varepsilon_{i,t}\varepsilon_{i,s}|X] \neq 0$ ($t \neq s$)
⇒ auto-/time-correlation for errors, $\varepsilon_{i,t}$, driven by $\eta_{i,t}$.
- Temporal correlation effects: $E[\varepsilon_{i,t}\varepsilon_{j,t}|X] \neq 0$ ($i \neq j$)
⇒ contemporaneous cross-correlation for errors driven by $f_t$.
- Persistent common shocks: $E[\varepsilon_{i,t}\varepsilon_{j,s}|X] \neq 0$ ($i \neq j$) and $|t - s| < L$
⇒ time-varying cross-correlation for errors driven by $f_t$.

• Remark: Heteroscedasticity points to GLS efficient estimation, but, as before, for consistent inferences we can use OLS with (adjusted for panels) White or NW SE’s.

Panel Data Models: (A3’) – Clustered SE


• For consistent inferences, we can use OLS with White or NW SE’s:
- White SE’s adjust only for heteroscedasticity:
$S_0 = (1/T) \sum_{t=1}^{T} e_t^2 x_t x_t'$
- NW SE’s adjust for heteroscedasticity and autocorrelation:
$S_T = S_0 + (1/T) \sum_{l=1}^{L} k(l) \sum_{t=l+1}^{T} (x_{t-l} e_{t-l} e_t x_t' + x_t e_t e_{t-l} x_{t-l}')$
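
The two “meat” matrices can be sketched numerically as follows. This is an illustration under stated assumptions, not the lecture's code: the kernel $k(l)$ is taken to be the Bartlett weight $1 - l/(L+1)$, the data are simulated with AR(1) errors, and the function names `white_meat` and `newey_west_meat` are hypothetical.

```python
import numpy as np


def white_meat(X, e):
    """S_0 = (1/T) sum_t e_t^2 x_t x_t'."""
    X = np.asarray(X, dtype=float)
    e = np.asarray(e, dtype=float)
    h = X * e[:, None]                      # row t holds x_t' e_t
    return h.T @ h / X.shape[0]


def newey_west_meat(X, e, L):
    """S_T = S_0 + (1/T) sum_l k(l) sum_t (cross terms), Bartlett k(l) assumed."""
    X = np.asarray(X, dtype=float)
    e = np.asarray(e, dtype=float)
    T = X.shape[0]
    h = X * e[:, None]
    S = h.T @ h / T                         # start from S_0
    for l in range(1, L + 1):
        weight = 1.0 - l / (L + 1.0)        # Bartlett kernel weight (an assumption)
        gamma = h[l:].T @ h[:-l] / T        # (1/T) sum_t x_t e_t e_{t-l} x_{t-l}'
        S = S + weight * (gamma + gamma.T)
    return S


# Simulated time series with AR(1) errors (hypothetical data).
rng = np.random.default_rng(0)
T = 400
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(T), x])
b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS coefficients
e = y - X @ b                               # OLS residuals

bread = np.linalg.inv(X.T @ X)
# Because S is scaled by 1/T, the sandwich is (X'X)^{-1} (T S) (X'X)^{-1}.
V_white = bread @ (T * white_meat(X, e)) @ bread
V_nw = bread @ (T * newey_west_meat(X, e, L=4)) @ bread
print(np.sqrt(np.diag(V_white)), np.sqrt(np.diag(V_nw)))
```

With `L=0` the NW meat collapses to $S_0$, which is a convenient sanity check. Both estimators treat the data as a single time series, so neither accounts for cross-sectional dependence.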

• But, cross-sectional (SUR) or “spatial” dependencies are ignored. If present, the White’s or NW’s HAC need to be adjusted.

• Simple intuition: Repeating a dataset 10 times should not increase the precision of parameter estimates. However, the i.i.d. assumption will do this: Now, we divide by $NT$, not $T$ or $N$.
⇒ We cannot ignore the dependence in the data.
Obvious solution: Aggregate the repeated data –i.e., aggregate in groups.


Panel Data Models: (A3’) – Clustered SE


• In general, the observations are not identical, but correlated within a cluster –i.e., a group that shares a certain characteristic. Depending on the data, the clusters may correspond to firms, industries, years, cities, etc.

• Simple idea: Aggregate over the clusters. The key is how (& when) to cluster.

Canonical example: We want to study the effect of class size on 1st graders' grades. The unobservables of 1st graders belonging to the same classroom will be correlated (say, teachers’ quality, recess routines), but they will not be correlated with those of 1st graders in far away classrooms. Then, we can cluster by school/teacher.

• In finance, it is reasonable to expect that shocks to firms in the same


industry are not independent. Then, we can cluster by industry.

Panel Data Models: (A3’) – Clustered SE


• We assume the existence of G disjoint clusters. Within the cluster, any
pattern of dependence and/or heteroscedasticity is allowed; but, there is
independence across the G clusters.

• Under the above assumption, it is easy to compute a counterpart to White or NW SE in panels. These SE are usually referred to as PCSE –panel clustered SE– or, more generally, just clustered SE or Liang-Zeger (LZ) SE.


Panel Data Models: (A3’) – Clustered SE


• We remove the dependence by assuming correlation within a cluster, but independence across clusters. That is, we think of the data as
$y_{i,g} = x_{i,g}'\beta + c_i + \delta t + \varepsilon_{i,g}$, $g = 1, ..., G$.
$E[\varepsilon_{i,g}\varepsilon_{j,g'}|X] = 0$ for $g \neq g'$
$= \sigma_{(ij)g}$ for $g = g'$

Or stacking the data by cluster:
$y_g = X_g\beta + c_g + \varepsilon_g$, $g = 1, ..., G$.

• Let $w_g = X_g'\varepsilon_g$. Then, let $E[w_g w_g'] = W_g$ be the $k \times k$ “meat” for the $g$-th cluster. We make inferences with OLS using the sandwich.
$Var_T[b|X] = (X'X)^{-1} \left[\sum_{g=1}^{G} W_g\right] (X'X)^{-1}$

$W_g$ needs to be estimated. There are different ways to do it.

Panel Data Models: (A3’) – Clustered SE

• $W_g$ needs to be estimated. There are different ways to do it. But, a natural estimator is to just replace $w_g = X_g'\varepsilon_g$ by $\hat{w}_g = X_g' e_g$:
Est $Var[b|X] = (X'X)^{-1} \left[\sum_{g=1}^{G} \hat{w}_g \hat{w}_g'\right] (X'X)^{-1}$

Corrections using degrees of freedom, using transformed residuals, etc., are common. We think of these clustered SE structures as clustered White SE.
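
A minimal numerical sketch of this sandwich follows, with no degrees-of-freedom correction. The simulated design (50 clusters of 8 observations, with a shared cluster shock) and the function name `cluster_robust_cov` are assumptions made for illustration only.

```python
import numpy as np


def cluster_robust_cov(X, resid, groups):
    """(X'X)^{-1} [sum_g w_g w_g'] (X'X)^{-1}, with w_g = X_g' e_g (no small-sample correction)."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    groups = np.asarray(groups)
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        w_g = X[mask].T @ resid[mask]       # k-vector X_g' e_g
        meat += np.outer(w_g, w_g)
    return bread @ meat @ bread


# Simulated data (hypothetical): errors share a common shock within each cluster.
rng = np.random.default_rng(0)
G, n_g = 50, 8
groups = np.repeat(np.arange(G), n_g)
x = rng.normal(size=G)[groups] + 0.5 * rng.normal(size=G * n_g)
u = rng.normal(size=G)[groups]              # cluster-level shock
y = 1.0 + 2.0 * x + u + rng.normal(size=G * n_g)

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)       # pooled OLS
e = y - X @ b

V_cluster = cluster_robust_cov(X, e, groups)
# Treating every observation as its own cluster reproduces White (HC0) SE.
V_white = cluster_robust_cov(X, e, np.arange(len(y)))
se_cluster = np.sqrt(np.diag(V_cluster))
se_white = np.sqrt(np.diag(V_white))
print(se_white, se_cluster)
```

The comparison with one-observation clusters makes the link to White SE explicit: the only difference is which residual cross-products are allowed into the meat.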

• Similarly, if we allow for autocorrelation in the structure of 𝒘 , then


we can also have clustered NW SE.

• Driscoll and Kraay (1998) provide an easy extension to estimate robust NW SE’s in panels with cross-sectional dependencies: Average the $x_{i,t} e_{i,t}$ over the clusters we suspect cause dependence.


Panel Data Models: (A3’) – PCSE


• Recall that we assume that the correlations within a cluster are the same for different observations.

We define $h_{i,t}(b) = x_{i,t} e_{i,t}$, which we average over the cluster $N_g$:
$h_g(b) = \sum_{i=1}^{N_g} h_{i,t}(b)$

• The $k \times k$ meat for cluster $g$ in the sandwich matrix is estimated as
$X'\hat{\Sigma}X = \sum_{g=1}^{G} h_g(b) h_g(b)'$

• Remark: The NW method is applied to the time series of cross-sectional averages of $h_{i,t}(b)$. We average over the $G$ clusters to get $G$ $h_g(b)$. We use the sandwich matrix to estimate clustered NW SE, using $w(l)$ as usual Bartlett or QS weights –other weights are OK.
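
A sketch of this construction, with time periods playing the role of clusters, is given below. It assumes Bartlett weights $w(l) = 1 - l/(L+1)$, sums (rather than normalized averages) of $x_{i,t} e_{i,t}$ within each period, and period labels that sort chronologically; the function name `driscoll_kraay_cov` and the simulated factor structure are hypothetical.

```python
import numpy as np


def driscoll_kraay_cov(X, e, period_ids, L):
    """NW-type sandwich applied to the period-level sums h_t = sum_i x_it e_it."""
    X = np.asarray(X, dtype=float)
    e = np.asarray(e, dtype=float)
    # np.unique sorts the labels, so row t of h follows chronological order
    # as long as the period labels themselves sort chronologically.
    periods, inverse = np.unique(np.asarray(period_ids), return_inverse=True)
    h = np.zeros((len(periods), X.shape[1]))
    np.add.at(h, inverse, X * e[:, None])   # accumulate x_it e_it within each period
    S = h.T @ h                             # lag-0 term: sum_t h_t h_t'
    for l in range(1, L + 1):
        weight = 1.0 - l / (L + 1.0)        # Bartlett weight (an assumption)
        gamma = h[l:].T @ h[:-l]            # sum_t h_t h_{t-l}'
        S = S + weight * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    return bread @ S @ bread


# Simulated panel (hypothetical): a common factor f_t hits every firm in period t.
rng = np.random.default_rng(0)
N, T = 30, 40
time = np.tile(np.arange(T), N)             # period label of each observation
x = rng.normal(size=N * T)
f = rng.normal(size=T)
y = 1.0 + 0.5 * x + f[time] + rng.normal(size=N * T)

X = np.column_stack([np.ones(N * T), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

V_dk = driscoll_kraay_cov(X, e, time, L=2)
print(np.sqrt(np.diag(V_dk)))
```

Setting `L=0` gives the clustered White-style estimator with one cluster per period, and assigning every observation its own period reproduces plain White SE.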

Panel Data Models: (A3’) – PCSE


• The NW method is applied to the time series of cross-sectional
averages of ℎ (b). By using cross-sectional averages, estimated SE are
consistent independently of the panel’s cross-sectional dimension N.

• Clearly, these clustered NW SE reduce to the usual NW SE if each


cluster only has one observation.

• If we do not suspect autocorrelation problems –not rare, given that many panel data sets have observations that are widely spaced in time–, we can rely on White SE ($S_0$).

• These clustered standard errors are called Driscoll & Kraay SE (DK SE’s). The clustered White-style SE are, sometimes, called Rogers SE. They can all be just referred to as LZ SE!


Panel Data Models: (A3’) – PCSE


• These PCSE’s are robust to very general forms of cross-sectional (and
temporal) dependence. But, the usual problems with NW SE apply
(downwards biased, poor performance in finite sample, etc.)

• PCSE’s using HAR estimators (based on KVB SE) are possible, see
Hansen (2007). Bootstrapping SE is also possible -many approaches;
see Goncalves and Perron (2017) for a factor model application.

• Consistency of the PCSE is discussed by White (1984) and Liang and Zeger (1986) for panels with a finite number of observations per cluster, as $G$ (or $N$) → ∞. Hansen (2007) shows that PCSE can be used with $N_g$ → ∞ –i.e., long panels–, in addition to $G$ → ∞.
Note: Asymptotic inference can be affected by small G, cluster
heterogeneity (different sizes, dominant cluster), and experiments where
“treatment” occurs only for a small number of clusters.

Panel Data Models: (A3’) – PCSE


• Lots of potential issues when $G$ is not large. PCSE can be very poor. Angrist and Pischke (2008) popularized “G > 42” for reliable inferences. There is some evidence that, in many situations, this is a “generous” rule.

• Technical point: In practice, clusters vary greatly in size. Hansen and Lee (2019) restrict the variation of $N_g$ relative to the total sample size, $N$. Now, not all $N_g$ → ∞, as $N$ → ∞ (in addition to $G$ → ∞).

Under this situation, the “normalizing factor,” for the application of the CLT for OLS $b$, is not $G$, but “unknown.” This justifies the wide use of the $t_{G-1}$ distribution for tests, which is more conservative.


Panel Data Models: PCSE – Clustering


• Before calculating the NW SE, we cluster the data to remove the
dependence caused by the within group correlation of the data.

• We can cluster the SE by one variable (say, industry) or by several


variables (say, year and industry) –“multi-level clustering.” If these several
variables are nested (say, industry and state), cluster at highest level.

• We assume that the correlations within a cluster (a group of firms, a


region, different years for the same firm, different years for the same
region) are the same for different observations.

• Different clusters can produce very different SE. We want to cluster


in groups that produce correlated errors. Usually, we cluster using
economic theory (clustering by industry, year, industry and year).

Panel Data Models: PCSE – Clustering Remarks


• Since we allow for correlation between observations, clustered SE will
increase CIs. The higher the clustering level, the larger the resulting SE.

• In simulations, MacKinnon and Webb (2020) show that clustering at


the wrong level has serious implications: “too fine” clustering leads to
serious over-rejections; while “too coarse” clustering leads to some
over-rejection and loss of power, especially when G is small.

• Ibragimov and Muller (2016) have a test for the appropriate clustering
level. It is based on the observed variation across different clusters.

• When G is small, PCSE tend to be small.

• The asymptotics of multi-level clustering (say, by firm and by year),


popular in economics and finance, are not well established.


Panel Data Models: PCSE – Clustering Remarks


• Practical rules
- Usual rules of thumb for picking the clustering level:
(1) Use the coarsest feasible level (Cameron and Miller, 2015), but this may not be reasonable when G is small or clusters are very different in size.
(2) Try different ways of defining clusters and see how the estimated SE
are affected. Be conservative, use the cluster with the largest SE
(Angrist & Pischke, 2008).

- If aggregate variables (say, by industry, or zip code) are used in the


model, clustering should be done at that level.

- When the data correlates in more than one way, we have two cases:
- If nested (say, city and state), cluster at highest level of aggregation
- If not nested (e.g., time and industry), use “multi-level clustering.”

Pooled Model
• General DGP: $y_{i,t} = x_{i,t}'\beta + c_i + \varepsilon_{i,t}$ & (A2)-(A4) apply.

• The pooled model assumes that unobservable characteristics are uncorrelated with $x_{i,t}$. We can rewrite the panel DGP as:
$y_{i,t} = x_{i,t}'\beta + v_{i,t}$, where $v_{i,t} = c_i + \varepsilon_{i,t}$ (compound error)

To get a consistent estimator of $\beta$, we need $E[x_{i,t}' v_{i,t}] = 0$.

Note: $E[x_{i,t}'\varepsilon_{i,t}] = 0$ is derived from (A2) $E[\varepsilon_{i,t}|x_{i,t}, c_i] = 0$. Then, to get consistency, we need $E[x_{i,t}' c_i] = 0$ for all $t$.

• Given the assumptions, we can assume $c_i = \alpha$ –a constant, independent of $i$. That is, no heterogeneity. Then:
$y_{i,t} = x_{i,t}'\beta + \alpha + \varepsilon_{i,t}$ ⇒ CLM, with $k + 1$ parameters.


Pooled Model
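• We have the CLM, estimating $k + 1$ parameters:
$y_{i,t} = x_{i,t}'\beta + \alpha + \varepsilon_{i,t}$ ⇒ Pooled OLS is BLUE & consistent.

• Stacking the variables in matrices, we have:
$y = X\beta + \alpha\iota + \varepsilon$
Dimensions:
− $y$, $\iota$ and $\varepsilon$ are $\sum_i T_i \times 1$
− $X$ is $\sum_i T_i \times k$
− $\beta$ is $k \times 1$

• We can re-write the pooled equation model as:
$y = X^*\beta^* + \varepsilon$, $X^* = [X \; \iota]$ − $\sum_i T_i \times (k+1)$ matrix:
$\beta^* = [\beta \; \alpha]'$ − $(k+1) \times 1$ matrix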

Pooled Model
• In this context, OLS produces a BLUE and consistent estimator. In this model, we refer to pooled OLS estimation.

• Of course, if our assumption regarding the unobservable variables is


wrong, we are in the presence of an omitted variable, c.

• Then, we have potential bias and inconsistency of pooled OLS. The


magnitude of these problems depends on how the true model behaves:
‘fixed’ or ‘random.’


Pooled Regression: Heterogeneity Bias


• In the pooled model, there is no model for group/individual $i$ heterogeneity. Thus, pooled regression may result in heterogeneity bias:

[Figure: scatter plot with $y$ on the vertical axis, showing the pooled regression line, $y_{i,t} = \beta_0 + \beta_1 x_{i,t} + \varepsilon_{i,t}$, together with a separate “True model” line for each of Firms 1 to 4.]

Pooled Regression: Within Transformation


• We can estimate $\beta$ ($\beta_0 = \alpha$) by centering the observations around their group/individual means. That is,
$y_{i,t} = \beta_0 + \sum_j \beta_j x_{j,i,t} + \delta t + \varepsilon_{i,t}$
Subtracting the mean:
$y_{i,t} - \bar{y}_i = \sum_j \beta_j (x_{j,i,t} - \bar{x}_{j,i}) + \delta (t - \bar{t}) + \varepsilon_{i,t} - \bar{\varepsilon}_i$

• This method is called the within-groups estimation because the model explains the variations about the mean of the dependent variable in terms of the variations about the means of the explanatory variables for the set of observations relating to a given unit, $i$.

• That is, this estimator uses the time-series, or within-individual $i$, information reflected in the changes within individuals across time.
⇒ $\beta$ is estimated using the time-series information in the data.


Pooled Regression: Within Transformation


• There is a cost in the simplicity of the within-groups estimation. First, the intercept $\beta_0$ and any $x$ variable that remains constant for each individual (say, gender or College degree) will drop out of the model.
The elimination of the intercept may not matter, but the loss of the unchanging explanatory variables may be frustrating.

⇒ Obviously, if we are interested in the effect of gender on CEO compensation, the within transformation will not work. But it will work well if we are interested in the effect of an independent Board of Directors, by looking at the compensation pre-/post-BOD.
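
The within transformation and its cost can be illustrated on simulated data. The sketch below assumes, beyond what the slide states, an unobserved effect $c_i$ that is correlated with the regressor, omits the time trend, and adds a time-invariant dummy (`female`); the helper `demean_by_unit` and all parameter values are hypothetical.

```python
import numpy as np


def demean_by_unit(values, ids):
    """Within transformation: subtract each unit's time average from its observations."""
    values = np.asarray(values, dtype=float)
    _, inverse = np.unique(np.asarray(ids), return_inverse=True)
    counts = np.bincount(inverse)                     # T_i for each unit
    cols = values.reshape(len(values), -1)            # treat 1-D input as one column
    means = np.column_stack(
        [np.bincount(inverse, weights=col) / counts for col in cols.T]
    )
    return (cols - means[inverse]).reshape(values.shape)


# Simulated panel (hypothetical): c_i is correlated with x, so pooled OLS is biased.
rng = np.random.default_rng(1)
N, T = 500, 5
ids = np.repeat(np.arange(N), T)
c = rng.normal(size=N)[ids]                           # unobserved individual effect
x = c + rng.normal(size=N * T)                        # regressor correlated with c_i
female = rng.integers(0, 2, size=N)[ids].astype(float)  # time-invariant regressor
y = 1.0 + 2.0 * x - 0.4 * female + c + rng.normal(size=N * T)

# Pooled OLS of y on a constant and x: the slope absorbs part of c_i.
X_pooled = np.column_stack([np.ones(N * T), x])
b_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# Within estimator: OLS on unit-demeaned data; c_i and the intercept drop out.
x_w, y_w = demean_by_unit(x, ids), demean_by_unit(y, ids)
b_within = (x_w @ y_w) / (x_w @ x_w)

# The time-invariant regressor is wiped out by the transformation.
female_w = demean_by_unit(female, ids)
print(b_pooled, b_within, np.abs(female_w).max())
```

In this design the pooled slope drifts away from the true value of 2 while the within slope stays close to it, and the demeaned `female` column is identically zero, which is why its coefficient cannot be estimated after the transformation.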

Pooled Regression: Between Transformation


• There is an additional alternative to estimate $\beta$, by expressing the model in terms of group/individual means. That is,
$y_{i,t} = \beta_0 + \sum_j \beta_j x_{j,i,t} + \delta t + \varepsilon_{i,t}$
Computing the mean:
$\bar{y}_i = \beta_0 + \sum_j \beta_j \bar{x}_{j,i} + \delta \bar{t} + \bar{\varepsilon}_i$

• It is called the between estimator because it relies on variations between individuals (say, $i$ & $j$). We are estimating $\beta$ using the cross-sectional information in the data (the time-series individual $i$ variation is gone!).

⇒ Obviously, if we are interested in the effect of a new independent BOD during the tenure of a CEO on that CEO’s compensation, the between transformation will not work. But it will work well if we study the effect of gender on CEO’s compensation.
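
As a rough sketch, the between estimator is OLS on the $N$ unit means. The example assumes no unobserved effect and no time trend, and the function name `between_estimator` and the simulated coefficients are hypothetical.

```python
import numpy as np


def between_estimator(y, X, ids):
    """OLS of the unit means of y on a constant and the unit means of X (N data points)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    _, inverse = np.unique(np.asarray(ids), return_inverse=True)
    counts = np.bincount(inverse)                          # T_i
    y_bar = np.bincount(inverse, weights=y) / counts
    X_bar = np.column_stack(
        [np.bincount(inverse, weights=col) / counts for col in X.T]
    )
    Z = np.column_stack([np.ones(len(counts)), X_bar])
    coef, *_ = np.linalg.lstsq(Z, y_bar, rcond=None)
    return coef                                            # [intercept, slopes...]


# Simulated panel (hypothetical) with a time-invariant dummy.
rng = np.random.default_rng(2)
N, T = 400, 4
ids = np.repeat(np.arange(N), T)
female = rng.integers(0, 2, size=N)[ids].astype(float)
x = rng.normal(size=N * T)
y = 1.0 + 0.5 * x - 0.4 * female + rng.normal(size=N * T)

coef = between_estimator(y, np.column_stack([x, female]), ids)
print(coef)   # intercept, slope on x, slope on female
```

Unlike the within transformation, the time-invariant dummy survives averaging, so its coefficient is estimable, but only $N$ data points are used.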


Pooled Regression: Between Transformation


• We lose observations (and power!): we have only N data points.

Remark: Under the usual assumptions, pooled OLS using the between
transformation is consistent and unbiased.

Useful Analysis of Variance Notation (Greene)


• The variance (total variation) quantifies the idea that each individual $i$ –say, each firm– differs from the overall average. We can decompose the variance into two parts: a within-group/individual part and a between group/individual part:

$\sum_{i=1}^{N}\sum_{t=1}^{T_i} (z_{it} - \bar{\bar{z}})^2 = \sum_{i=1}^{N}\sum_{t=1}^{T_i} (z_{it} - \bar{z}_i)^2 + \sum_{i=1}^{N} T_i (\bar{z}_i - \bar{\bar{z}})^2$

Total variation = Within groups variation + Between groups variation

• Interpretation:
- Within group variation: Measures variation of individuals over time.
- Between group variation: Measures variation of the means across
individuals.
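
The decomposition can be verified numerically. The sketch below uses a small unbalanced toy panel; the function name `variance_decomposition` and the numbers are hypothetical.

```python
import numpy as np


def variance_decomposition(z, ids):
    """Return (total, within, between) sums of squares for a panel variable z."""
    z = np.asarray(z, dtype=float)
    _, inverse = np.unique(np.asarray(ids), return_inverse=True)
    counts = np.bincount(inverse)                        # T_i
    unit_means = np.bincount(inverse, weights=z) / counts
    grand_mean = z.mean()
    total = np.sum((z - grand_mean) ** 2)
    within = np.sum((z - unit_means[inverse]) ** 2)      # variation over time, per unit
    between = np.sum(counts * (unit_means - grand_mean) ** 2)  # variation of the means
    return total, within, between


# Toy unbalanced panel (hypothetical): units a, b, c with 3, 2 and 4 observations.
ids = np.array(["a", "a", "a", "b", "b", "c", "c", "c", "c"])
z = np.array([1.0, 2.0, 3.0, 10.0, 12.0, 20.0, 21.0, 19.0, 20.0])

total, within, between = variance_decomposition(z, ids)
print(total, within + between)
```

In this toy panel most of the variation is between units, since the unit means differ far more than the observations within any one unit.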


WHO Data (Greene)

Note: The variability is driven by between groups variation

Pooled Model: Living with (A3’)


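• We start with the pooled model:
$y = X^*\beta^* + \varepsilon$, with $X^* = [X \; \iota]$ - $\sum_i T_i \times (k+1)$ matrix.
$\beta^* = [\beta \; \alpha]'$ - $(k+1) \times 1$ matrix
Now, we allow $E[\varepsilon_i\varepsilon_j'|X_i] = \sigma_{ij}\Omega_{ij}$

• Potentially a lot of different forms for $E[\varepsilon_i\varepsilon_j'|X_i]$ in a panel:
- Individual heteroscedasticity: $E[\varepsilon_{i,t}^2|X_i] = \sigma_i^2$
- Individual/group effects: $E[\varepsilon_{i,t}\varepsilon_{i,s}|X_i] \neq 0$ ($t \neq s$)
- Time (SUR or spatial) effects: $E[\varepsilon_{i,t}\varepsilon_{j,t}|X_i] \neq 0$ ($i \neq j$)
- Persistent common shocks: $E[\varepsilon_{i,t}\varepsilon_{j,s}|X_i] \neq 0$ ($i \neq j$) and $|t - s| < L$

• Heteroscedasticity points to GLS efficient estimation, but, for consistent inferences we can use OLS with clustered White/NW SE.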


Pooled OLS: Cornwell and Rupert Data (Greene)


Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
BLK = 1 if individual is black
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel
Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied
Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were
downloaded from the website for Baltagi's text.

Pooled OLS: Clustered SE – Results (Greene)


Ordinary least squares regression ............
LHS=LWAGE Mean = 6.67635
Residuals Sum of squares = 522.20082
Standard error of e = .35447
Fit R-squared = .41121
Model test F[ 8, 4156] (prob) = 362.8(.0000)
Panel Data Analysis of LWAGE [ONE way]
Unconditional ANOVA (No regressors)
Source Variation Deg. Free. Mean Square
Between 646.25374 594. 1.08797
Residual 240.65119 3570. .06741
Total 886.90494 4164. .21299
--------+-------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]
--------+-------------------------------------------------
EXP| .04085*** .00219 18.693 .0000
EXPSQ| -.00069*** .480428D-04 -14.318 .0000
OCC| -.13830*** .01480 -9.344 .0000
SMSA| .14856*** .01207 12.311 .0000
MS| .06798*** .02075 3.277 .0010
FEM| -.40020*** .02526 -15.843 .0000
UNION| .09410*** .01253 7.509 .0000
ED| .05812*** .00260 22.351 .0000
Constant| 5.40160*** .04839 111.628 .0000


Pooled OLS: Clustered SE – Results (Greene)


|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
Constant 5.40159723 .04838934 111.628 .0000
EXP .04084968 .00218534 18.693 .0000
EXPSQ -.00068788 .480428D-04 -14.318 .0000
OCC -.13830480 .01480107 -9.344 .0000
SMSA .14856267 .01206772 12.311 .0000
MS .06798358 .02074599 3.277 .0010
FEM -.40020215 .02526118 -15.843 .0000
UNION .09409925 .01253203 7.509 .0000
ED .05812166 .00260039 22.351 .0000
Clustered SE
Constant 5.40159723 .10156038 53.186 .0000
EXP .04084968 .00432272 9.450 .0000
EXPSQ -.00068788 .983981D-04 -6.991 .0000
OCC -.13830480 .02772631 -4.988 .0000
SMSA .14856267 .02423668 6.130 .0000
MS .06798358 .04382220 1.551 .1208
FEM -.40020215 .04961926 -8.065 .0000
UNION .09409925 .02422669 3.884 .0001
ED .05812166 .00555697 10.459 .0000

Note: Clustered SE’s tend to be bigger. The more correlation allowed, the higher the SE.

Pooled Model: PCSE – Remarks


• All the remarks that we have made before apply. Driscoll and Kraay SE –i.e., clustered NW SE– are almost universally applied. Then, the bigger the cross-sectional correlation, the bigger the SE.

• In simulations, it is found (as expected) that the PCSE perform better when there is cross-sectional dependence in the data. But, when there is no dependence in the cross-section, the standard White or NW SE do better. In some cases, these differences can be significant.

• Testing for cross-sectional dependence may be a good idea, especially


when results are not robust to different SE. LM tests can be easily
implemented. Pesaran (2004) proposes an easy test.


Pooled Model: PCSE – Remarks


• The computational issues are straightforward for balanced data. We need only the vector of residuals, the model matrix (X), and indicators for group (and, usually, time) to form the clusters.

• But for unbalanced data there are two approaches:
- Create a balanced subset of the panel to estimate Ω.
Advantage: Computationally simple.
- Loop over ($e_i$, $e_j$) pairs to estimate covariances over the available overlapping time frames (looping over all pairs can take a long time).
Advantage: Information is not thrown out.
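The second approach can be sketched as follows. This is only an illustration under stated assumptions: residuals are stored in a T x N array with NaN marking periods where a unit is not observed, residuals are treated as mean zero, and the function name `pairwise_cov` is hypothetical.

```python
import numpy as np

def pairwise_cov(E):
    """Covariances of residuals over overlapping periods.

    E: T x N array of residuals, NaN where unit i is not observed at time t.
    Residuals are assumed to have mean zero, so no demeaning is done.
    """
    T, N = E.shape
    S = np.full((N, N), np.nan)
    for i in range(N):
        for j in range(i, N):
            # periods in which both units are observed
            both = ~np.isnan(E[:, i]) & ~np.isnan(E[:, j])
            if both.any():
                S[i, j] = S[j, i] = np.mean(E[both, i] * E[both, j])
    return S

# toy unbalanced panel: unit 3 is missing in the first period
E = np.array([[1.0, 2.0, np.nan],
              [-1.0, 0.0, 1.0],
              [0.0, -2.0, -1.0]])
S = pairwise_cov(E)
print(S)
```

The double loop visits N(N+1)/2 pairs, which is why this route becomes slow for large N.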

Application: Bid-Ask Spread (Hoechle)


Pooled Model with (A3’) - GLS


• We start with the pooled model:
$y = X^{*}\beta^{*} + \varepsilon$,
where $X^{*} = [X \;\; \iota]$ is a $(\sum_i T_i) \times (k+1)$ matrix
and $\beta^{*} = [\beta' \;\; \alpha]'$ is a $(k+1) \times 1$ vector.
Now, we allow $E[\varepsilon_i \varepsilon_j' | X] = \sigma_{ij}\,\Omega_{ij}$.

• We can use OLS with PCSEs or we can do GLS.

Note: Why GLS? Efficiency.

• Suppose $\Omega_{ij} = I_T$. Then, we only have cross-equation correlation, not time correlation. We are back in the (aggregation) SUR framework.

Pooled Model with SUR - GLS


• Suppose $\Omega_{ij} = I_T$. We are in the (aggregation) SUR framework:

$$\hat\beta_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y = (X'[\Sigma \otimes I]^{-1}X)^{-1}X'[\Sigma \otimes I]^{-1}y = (X'[\Sigma^{-1} \otimes I]X)^{-1}X'[\Sigma^{-1} \otimes I]y$$

• For FGLS, use the pooled OLS residuals $e_i$ and $e_j$ to estimate the covariance $\sigma_{ij}$. Note that
$$\hat\Sigma = \frac{1}{T}\sum_{t=1}^{T} e_t e_t' = \frac{1}{T}E'E$$
where $E$ is a $T \times N$ matrix and $e_t = [e_{1,t}, e_{2,t}, \ldots, e_{N,t}]'$ is an $N \times 1$ vector. We need to invert $\hat\Sigma$ (an $N \times N$ matrix).

Note: In general, rank($E$) ≤ $T$. Then, when $T < N$, rank($\hat\Sigma$) ≤ $T < N$ ⇒ singularity: FGLS cannot be computed. This is a problem of the data, not the model.
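The singularity is easy to see numerically. The sketch below uses random numbers in place of actual pooled OLS residuals; the sizes T = 5 and N = 8 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 5, 8                              # fewer periods than units (illustrative)
E = rng.standard_normal((T, N))          # stand-in for the T x N residual matrix

Sigma_hat = E.T @ E / T                  # N x N estimate of Sigma
rank = np.linalg.matrix_rank(Sigma_hat)
print(rank, "of", N)                     # rank cannot exceed T, so Sigma_hat is singular
```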


Pooled Model with Heteroscedasticity - GLS


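• Now, suppose we have groupwise heteroscedasticity. That is,
$E[\varepsilon_i \varepsilon_j' | X] = 0$ for $i \neq j$
$E[\varepsilon_i \varepsilon_i' | X] = Var[\varepsilon_i | X] = \sigma_i^2 I_T$

• We do FGLS, as usual, using the pooled OLS residuals $e_i$ to estimate the variance $\sigma_i^2$ and, thus, to estimate Σ:
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_N^2 \end{pmatrix}$$

• We can test this model with H0: $\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_N^2$. We can use:
$W = \sum_i (s_i^2 - s^2)^2 / Var(s_i^2)$, which is asymptotically $\chi^2$-distributed under H0,
where $s_i^2$ is computed using the pooled OLS residuals $e_i$.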

Pooled Model with Autocorrelation - GLS


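• Now, suppose we have individual autocorrelation. That is,
$E[\varepsilon_{i,t}\,\varepsilon_{j,s} | X_i] = 0$ for $i \neq j$
$E[\varepsilon_{i,t}\,\varepsilon_{i,s} | X_i] \neq 0$ (for example, $\varepsilon_{i,t} = \rho_i\,\varepsilon_{i,t-1} + u_{i,t}$)
$Var[\varepsilon_{i,t} | X_i] = \sigma^2$

• We do FGLS, as usual, using the pooled OLS residuals $e_{i,t}$ to estimate $\rho_i$ and, thus, to estimate $\Sigma_i$:
$$\Sigma_i = \frac{\sigma_{u(i)}^2}{1-\rho_i^2}\begin{pmatrix} 1 & \rho_i & \cdots & \rho_i^{T-1} \\ \rho_i & 1 & \cdots & \rho_i^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_i^{T-1} & \rho_i^{T-2} & \cdots & 1 \end{pmatrix}$$
• We can test this model with H0: $\rho_1 = \rho_2 = \ldots = \rho_N = 0$. We can use an LM test to test H0.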


Pooled OLS with First Differences


• From the general DGP:
$y_{i,t} = x_{i,t}'\beta + c_i + \varepsilon_{i,t}$ & (A2)-(A4) apply.

It may still be possible to use OLS to estimate $\beta$ when we have individual heterogeneity. We can use OLS if we eliminate the cause of heterogeneity: $c_i$.

We can do this by taking first differences of the DGP. That is,
$\Delta y_{i,t} = y_{i,t} - y_{i,t-1} = (x_{i,t} - x_{i,t-1})'\beta + \Delta c_i + \Delta\varepsilon_{i,t}$
$= \Delta x_{i,t}'\beta + u_{i,t}$
Note: All time-invariant variables, including $c_i$, disappear from the model (one “diff”). If the model has a time trend (economic fluctuations), it also disappears as a regressor: it becomes the constant term (the other “diff”). Thus, this method is usually called “diffs in diffs” (DD or DiD).

Pooled OLS with First Differences


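• With strict exogeneity of ($X_i$, $c_i$), the OLS regression of $\Delta y_{i,t}$ on $\Delta x_{i,t}$ is unbiased and consistent, but inefficient.

• Why? The error is no longer $\varepsilon_{i,t}$, but $u_{i,t}$. The Var[u] is given by:
$$Var\begin{pmatrix} \varepsilon_{i,2}-\varepsilon_{i,1} \\ \varepsilon_{i,3}-\varepsilon_{i,2} \\ \vdots \\ \varepsilon_{i,T_i}-\varepsilon_{i,T_i-1} \end{pmatrix} = \begin{pmatrix} 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & 0 & \cdots & 0 \\ -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & \cdots & 0 \\ 0 & -\sigma_\varepsilon^2 & \ddots & \ddots & \vdots \\ \vdots & & \ddots & 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 \\ 0 & \cdots & 0 & -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 \end{pmatrix} \quad \text{(Toeplitz form)}$$

• That is, first differencing produces serial correlation: the variance is constant ($2\sigma_\varepsilon^2$), but adjacent differenced errors have covariance $-\sigma_\varepsilon^2$ (an MA(1) structure). Efficient estimation method: GLS.
• It turns out that GLS is complicated. Use OLS in first differences and use Newey-West SE/PCSE with one lag.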
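A small numerical check of the Toeplitz form, as a sketch: if the level errors are i.i.d. with variance $\sigma_\varepsilon^2$, the differenced errors are $u = D\varepsilon$, where $D$ is the $(T-1) \times T$ first-difference matrix, so Var[u] = $\sigma_\varepsilon^2 DD'$. The values of `T` and `sigma2` below are arbitrary.

```python
import numpy as np

T, sigma2 = 5, 1.5                       # illustrative values (assumptions)

# (T-1) x T first-difference matrix: row t has -1 in column t and +1 in column t+1
D = np.diff(np.eye(T), axis=0)

# Var[eps] = sigma2 * I  =>  Var[D eps] = sigma2 * D D'
V = sigma2 * D @ D.T
print(V)                                 # 2*sigma2 on the diagonal, -sigma2 next to it, 0 elsewhere
```

Only the first off-diagonal is nonzero, which is why one lag is enough for the Newey-West correction.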


OLS with First Diffs: Treatment Application


• Suppose there is random assignment to treatment and control groups, as in a typical medical experiment.

• We compare the change in outcomes across the treatment and control groups to estimate the treatment effect. (We used this method, the “natural experiment”, in Lecture 8 to deal with endogeneity.)

• With two periods (i.e., before and after) and strict exogeneity:
$\Delta y_{i,t} = y_{i,t} - y_{i,t-1} = \delta_0 + \delta_1\,Treatment_i + (x_{i,t} - x_{i,t-1})'\beta + u_{i,t}$
(This is a CLM. OLS is consistent and unbiased.)
Then,
$E[\Delta y_{i,t} | Treatment_i = 1] = \delta_0 + \delta_1 + E[\Delta x_{i,t}' | Treatment_i = 1]\,\beta$
$E[\Delta y_{i,t} | Treatment_i = 0] = \delta_0 + E[\Delta x_{i,t}' | Treatment_i = 0]\,\beta$

OLS with First Diffs: Treatment Application


• Assuming that controls are orthogonal to Treatment:
$\delta_1 = E[\Delta y_{i,t} | Treatment_i = 1] - E[\Delta y_{i,t} | Treatment_i = 0]$

$\delta_1$ is the difference in the average change over the two periods (i.e., before and after) between the treated and control groups. This is the diffs in diffs (DD, DiD) estimator.

• Typical problem: Exogeneity (randomness) of treatment. That is, we need
(A2) $E[u_{i,t} | Treatment_i] = 0$.

• In medical experiments, diffs in diffs estimation is routinely used to evaluate the effectiveness of a new treatment and/or medication. Usual H0: $\delta_1 = 0$. It can be tested with a t-test (using HAR/PCSEs).


OLS with First Diffs: Treatment Application


• The same result can be derived by looking at the levels DGP ($y_{i,t}$, $x_{i,t}$), including two dummies: one for Treatment, $Tr_i$, and one for after Treatment, $Post_t$:
$y_{i,t} = x_{i,t}'\beta + c_i + \gamma_0 + \gamma_1 Tr_i + \gamma_2 Post_t + \delta_1\,Tr_i \times Post_t + u_{i,t}$

Now, it is easy to separate cross-sectional differences from time-series differences.

- Cross-sectional difference
$E[y_{i,t} | Tr_i = 1, Post_t = 1] = x_{i,t}'\beta + c_i + \gamma_0 + \gamma_1 + \gamma_2 + \delta_1$
$E[y_{i,t} | Tr_i = 0, Post_t = 1] = x_{i,t}'\beta + c_i + \gamma_0 + \gamma_2$

Then, the cross-sectional difference is:
$E[y_{i,t} | Tr_i = 1, Post_t = 1] - E[y_{i,t} | Tr_i = 0, Post_t = 1] = \gamma_1 + \delta_1$

OLS with First Diffs: Treatment Application


- Cross-sectional difference (continuation)
$E[y_{i,t} | Tr_i = 1, Post_t = 1] - E[y_{i,t} | Tr_i = 0, Post_t = 1] = \gamma_1 + \delta_1$

Note: Unbiased for $\delta_1$ if $\gamma_1 = 0$ ⇒ No permanent difference between the treatment and control groups.

$E[y_{i,t} | Tr_i = 1, Post_t = 0] - E[y_{i,t} | Tr_i = 0, Post_t = 0] = \gamma_1$


OLS with First Diffs: Treatment Application


- Time-series difference
$E[y_{i,t} | Tr_i = 1, Post_t = 1] - E[y_{i,t} | Tr_i = 1, Post_t = 0] = \gamma_2 + \delta_1$

Note: Unbiased for $\delta_1$ if $\gamma_2 = 0$ ⇒ No common trend over the pre- and post-treatment times.
$E[y_{i,t} | Tr_i = 0, Post_t = 1] - E[y_{i,t} | Tr_i = 0, Post_t = 0] = \gamma_2$

Note: From Lecture 8, we need to make sure that Treatment is the only difference between the two groups. Thus, in the absence of treatment, the average change in $y_{i,t}$ would have been the same for both groups.

This is a key assumption behind the DD estimator, tested with t-tests or, more usually, by looking at a graph of the behavior of both groups before treatment (see Redding & Sturm (2008) in Lecture 8).
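The three differences can be compared on simulated data. The sketch below is purely illustrative: the parameter values, the sample size, and the absence of controls $x_{i,t}$ are all assumptions, and treatment is assigned at random so that the DD assumptions hold by construction.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000                                # individuals (illustrative)
g0, g1, g2, delta = 1.0, 0.5, 0.3, 2.0   # hypothetical gamma_0, gamma_1, gamma_2, delta

tr = rng.integers(0, 2, n)               # treatment-group dummy, randomly assigned
c = rng.standard_normal(n)               # individual effect, independent of treatment here
y_pre = g0 + g1 * tr + c + rng.standard_normal(n)
y_post = g0 + g1 * tr + g2 + delta * tr + c + rng.standard_normal(n)

def mean(y, group):
    """Average outcome in the treated (1) or control (0) group."""
    return y[tr == group].mean()

cross_section = mean(y_post, 1) - mean(y_post, 0)   # approx gamma_1 + delta
time_series = mean(y_post, 1) - mean(y_pre, 1)      # approx gamma_2 + delta
did = (mean(y_post, 1) - mean(y_pre, 1)) - (mean(y_post, 0) - mean(y_pre, 0))

print(cross_section, time_series, did)   # only the last one targets delta alone
```

With no controls, the DD estimate is the same number OLS would give for the coefficient on the treatment dummy in the regression of $\Delta y$ on a constant and that dummy.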

OLS with First Diffs: Natural Experiment


• In Finance & Economics, especially in Corporate Finance, we apply the DD method when we use natural experiments (a change in a law, policy or regulation) to study the effect of $x_{i,t}$ on $y_{i,t}$. (Recall Lecture 8.)

• We have two periods: Before and after the natural experiment (the treatment).

• If we also have a well-defined control group, where the treatment was not administered (i.e., the natural experiment never occurred), then we can use DD estimation.

• The number of groups, S, (treated & not treated) under consideration is usually small, typically 2. N is usually very large.


Diffs in Diffs: Natural Experiment - 1


Example 1: We are interested in the effect of labor supply shocks on wages and employment. Natural experiment: The 1980 Mariel boatlift, a temporary lifting of emigration restrictions in Cuba. Most of the marielitos (the 1980 Cuban immigrants) settled in Miami.

• Two periods: Before and after the 1980 Mariel boatlift.
• Control group: Low skilled workers in Houston, LA and Atlanta.
• Calculate unemployment and wages of low skilled workers in both periods. Then, regress $\Delta y_{i,t}$ against a set of control variables (industry, education, age, etc.) and a treatment dummy:
$\Delta y_{i,t} = y_{i,t} - y_{i,t-1} = \delta_0 + \delta_1 Tr_i + (x_{i,t} - x_{i,t-1})'\beta + u_{i,t}$

• H0: $\delta_1 = 0$. Card (1990) found no effect of massive immigration.

Diffs in Diffs: Natural Experiment - 2


Example 2: Suppose we are interested in the effect of a substantial increase in bank deposits on lending practices. We can use the shale revolution, which started around 2011, as a natural experiment.

• Two periods: Before and after the shale revolution (say, 2011).
• Control group: Banks in counties outside shale formation areas.
• Measure lending practices (amount lent, FICO scores of loans, etc.), $y_{i,t}$, in both periods & regress $\Delta y_{i,t}$ against a set of control variables (size of county, size of bank, experience of bank employees, etc.) and a treatment dummy:
$\Delta y_{i,t} = y_{i,t} - y_{i,t-1} = \delta_0 + \delta_1 Tr_i + (x_{i,t} - x_{i,t-1})'\beta + u_{i,t}$

• H0: $\delta_1 = 0$. Gilje (2011) rejects H0, especially for counties dominated by small banks.


Diffs in Diffs: Remarks


• We express the DGP in terms of $i$ (individuals), $s$ (groups), and $t$ (time):
$y_{i,s,t} = \delta_s + \delta_t + \delta_1 Tr_{s,t} + x_{i,s,t}'\beta + \varepsilon_{i,s,t}$

• Usually, we have small $S$ and $T$, but large $N$. Since, in general, we have within-group correlation (treated individuals show similar errors), the asymptotics of the t-test are driven by $S \times T$.

• Donald and Lang (2004): Under the usual (generous) assumptions, the t-statistic converges to a normal distribution (a $t_{ST-K}$ may work better).

• Intuition: Suppose that within ($s$, $t$) groups the errors are perfectly correlated. Then, we only have $S \times T$ independent observations!

• Given the potential (time-varying) correlations in the errors, OLS SE can be terrible. PCSE tend to do better.

Dealing with Attrition


• Attrition problem: If an unbalanced panel is the result of some selection process related to $\varepsilon_{i,t}$, then endogeneity is present and needs to be dealt with using some correction method. Otherwise, we have attrition bias.

• Example: In the "Quality of Life for cancer patients" study discussed in Greene, appearance for the second interview was low for people with initial low QOL (death or depression) or with initial high QOL (do not need the treatment).

• Solutions to the attrition problem
– Heckman selection model (used in the study)
• Prob[Present at exit|covariates] = Φ(z’θ) (Probit model)
• Additional variable added to the difference model: $\lambda_i = \phi(z_i'\theta)/\Phi(z_i'\theta)$
– The FDA solution: fill with zeros. (!)


Pooled Model: ML Estimation


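• In the pooled model, $y = X\beta + \varepsilon$, we assume $\varepsilon_t \sim N(0, \Sigma)$, where $\varepsilon_t = [\varepsilon_{1,t}, \varepsilon_{2,t}, \ldots, \varepsilon_{N,t}]'$ and Σ is an N×N matrix.

• We can write the log likelihood function as:
$$\log L(\beta, \Sigma | X) = -\frac{NT}{2}\ln(2\pi) - \frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{t=1}^{T}\varepsilon_t'\Sigma^{-1}\varepsilon_t$$

• The ML estimator is equal to the iterated FGLS estimator.

• Testing is straightforward with a likelihood ratio test.

Example: H0: No cross correlation across equations: The off-diagonal elements of Σ are zero.

$$LR = T\,(\ln|\hat\Sigma_R| - \ln|\hat\Sigma_U|) = T\left(\sum_{i=1}^{N}\ln s_i^2 - \ln|\hat\Sigma|\right) \xrightarrow{d} \chi^2_{N(N-1)/2}$$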

Main Models: FEM and REM


• Two main approaches to fitting models using panel data:
(1) Fixed effects regressions.
(2) Random effects regressions.

• The key difference between these two approaches is how the unobservable characteristics (the individual effects) are modeled.

• Terminology from experimental design (say, psychology or medicine), where the emphasis was on the kind of sample at hand and inferences:
- FE: The individuals are fixed. The differences between them are not of interest, only $\beta$ is interesting. No intent on generalizing the results.
- RE: The individuals come from a random sample drawn from a larger population, and the variance between them is interesting and can be informative about the larger population.


The Fixed Effects Model (FEM)


• The fixed effects (FE) model
(A1) $y_{i,t} = x_{i,t}'\beta + c_i + \varepsilon_{i,t}$ (observation for individual $i$ at time $t$).
(A2) $E[\varepsilon_{i,t} | x_{i,s}, c_i] = 0$, for all $t$, $s$ ($X_i$ and $c_i$ are strictly exogenous).

• The unobserved component, $c_i$, is arbitrarily correlated with $x_{i,t}$:
$E[c_i | X_i] = g(X_i)$, which is not a constant ⇒ $Cov[x_{i,t}, c_i] \neq 0$.

Note 1: Under the FEM, pooled OLS omits $c_i$ ⇒ biased & inconsistent.

• We summarize (“control for”) these unobservable effects with $\alpha_i$, a constant. All time-invariant characteristics of individual $i$ (location, gender, nationality, etc.) are swept away under this formulation.

Note 2: In a FEM, individuals serve as their own controls.

Estimation with Fixed Effects


• Whatever effects the omitted variables have on individual $i$ at one time, they will also have the same effect at a later time; thus, their effects will be constant, or “fixed.”

• For this, we need the omitted variables to have time-invariant values with time-invariant effects. Typical example: a CEO’s IQ/gender. We expect this variable to have the same effect at $t$=1 or $t$=10.

• As we will see, FEM are estimated using the within transformation. Thus, if individuals do not change much (or at all) across time, a FEM may not work very well. We need within-individuals variability in the variables if we are to use individuals as their own controls.


Estimation with Fixed Effects


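• Matrix notation
- In matrix notation for individual $i$:
$y_i = X_i\beta + c_i + \varepsilon_i$ ($c_i$ is a $T \times 1$ vector; each individual has $T$ observations).

- In matrix notation for all individuals (i.e., stacking):
$y = X\beta + c + \varepsilon$ (now, $c$, $y$, and $\varepsilon$ are $\sum_i T_i \times 1$ vectors).

• Dummy variable representation:
$y_{i,t} = x_{i,t}'\beta + \sum_{j=1}^{N} c_j\,d_{i,j} + \varepsilon_{i,t}$, with $d_{i,j} = 1$ if $i = j$ (and 0 otherwise).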

FEM: Estimation
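• The FE model assumes $c_i = \alpha_i$ (constant; it does not vary with $t$):
$y_i = X_i\beta + d_i\alpha_i + \varepsilon_i$, for each individual $i$.

• Stacking
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} X_1 & d_1 & 0 & \cdots & 0 \\ X_2 & 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_N & 0 & 0 & \cdots & d_N \end{pmatrix}\begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \varepsilon = [X, D]\begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \varepsilon = Z\delta + \varepsilon$$

• The FEM is the CLM, but with many independent variables: $k + N$.
⇒ OLS is unbiased, consistent, efficient, but impractical if $N$ is large.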


FEM: Estimation
• The OLS estimates of β and α are given by:
$$\begin{pmatrix} b \\ a \end{pmatrix} = \begin{pmatrix} X'X & X'D \\ D'X & D'D \end{pmatrix}^{-1}\begin{pmatrix} X'y \\ D'y \end{pmatrix}$$
Using the Frisch-Waugh theorem:
$$b = [X'M_D X]^{-1}[X'M_D y]$$

• In practice, we do not estimate $a$ (the $c_i$); they are not very interesting. Moreover, since we are in a fixed-T situation, $a$ is unbiased, but not consistent. In addition, there is the potential incidental parameter problem.

Note (Greene): LS is an estimator, not a model. Given the formulation with a lot of dummy variables, this particular LS estimator is called the Least Squares Dummy Variable (LSDV) estimator.

FEM: Estimation

$$M_D = \begin{pmatrix} M_D^1 & 0 & \cdots & 0 \\ 0 & M_D^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M_D^N \end{pmatrix} \quad \text{(The dummy variables are orthogonal)}$$
$$M_D^i = I_{T_i} - d_i(d_i'd_i)^{-1}d_i' = I_{T_i} - (1/T_i)\,d_i d_i'$$
$$X'M_D X = \sum_{i=1}^{N} X_i'M_D^i X_i, \qquad (X_i'M_D^i X_i)_{k,l} = \sum_{t=1}^{T_i}(x_{it,k} - \bar x_{i.,k})(x_{it,l} - \bar x_{i.,l})$$
$$X'M_D y = \sum_{i=1}^{N} X_i'M_D^i y_i, \qquad (X_i'M_D^i y_i)_{k} = \sum_{t=1}^{T_i}(x_{it,k} - \bar x_{i.,k})(y_{it} - \bar y_{i.})$$

• That is, we subtract the group mean from each individual observation. Then, the individual effects disappear. Now, OLS can easily be used to estimate the $k$ β parameters, using the demeaned data.

• We know this method: The within-groups estimation.
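The Frisch-Waugh result can be checked numerically: OLS on [X, D] and OLS on group-demeaned data should give the same slope estimates. The sketch below uses simulated balanced data; all sizes and parameter values are assumptions chosen for illustration, and `demean` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 30, 5, 2                       # illustrative sizes
ids = np.repeat(np.arange(N), T)         # group index of each stacked observation
c = rng.standard_normal(N)               # individual effects
X = rng.standard_normal((N * T, k)) + c[ids, None]   # regressors correlated with c_i
beta = np.array([1.0, -0.5])
y = X @ beta + c[ids] + 0.1 * rng.standard_normal(N * T)

# LSDV: regress y on [X, D], where D holds N group dummies (no common intercept)
D = (ids[:, None] == np.arange(N)).astype(float)
coef = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
b_lsdv = coef[:k]

# Within: subtract group means (i.e., apply M_D to X and y), then run OLS
def demean(Z):
    sums = np.zeros((N,) + Z.shape[1:])
    np.add.at(sums, ids, Z)              # group sums
    return Z - (sums / T)[ids]

b_within = np.linalg.lstsq(demean(X), demean(y), rcond=None)[0]

print(b_lsdv, b_within)                  # identical up to rounding
```

The within route never builds the N dummy columns, which is the practical reason to prefer it when N is large.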


FEM: Within Transformation Removes Effects


• The within-groups method estimates the parameters using demeaned data. That is,
$$y_{i,t} - \bar y_i = \sum_{j=2}^{k}\beta_j\,(x_{j,i,t} - \bar x_{j,i}) + \delta\,(t - \bar t) + \varepsilon_{i,t} - \bar\varepsilon_i$$

Recall: It is called the within-groups/individuals method because it relies on variations within individuals rather than between individuals.

• For the usual asymptotic results, we need:
– (A2) $E[\Delta\varepsilon_{i,t} | X_i] = 0$.
– (A3’) $E[\varepsilon_i\varepsilon_i' | X_i, c_i] = \Sigma$ (different formulations OK).
– (A4) $E[\Delta X_i'\Delta X_i]$ has full rank.

FEM: Within Transformation Removes Effects


• There are costs in the simplicity of the within-groups estimation:

1) All time-invariant variables (including the constant) for each individual $i$ drop out of the model. This eliminates all between-individuals variability (which may be contaminated by omitted variable bias) and leaves only the within-subject variability to analyze.

2) Dependent variables are likely to have smaller variances than in the original specification (they are measured as deviations from the individual $i$ mean).

3) The manipulation involves the loss of N degrees of freedom (we are estimating N means!).


FEM: LS Dummy Variable (LSDV) Estimator


• b is obtained by within-groups least squares (group mean deviations).
• Then, we can use the normal equations to estimate a:
D’Xb + D’Da = D’y
a = (D’D)$^{-1}$D’(y – Xb)
$a_i = \frac{1}{T_i}\sum_{t=1}^{T_i}(y_{i,t} - x_{i,t}'b) = \bar e_i$

Note:
– This is simple algebra: the estimator is just OLS.
– Again, LS is an estimator, not a model.
– Note what $a_i$ is when $T_i = 1$. Follow this with $y_{i,t} - a_i - x_{i,t}'b = 0$ if $T_i = 1$.

FEM: LSDV Estimator


• Avoid the dummy variable trap: If a constant is present in the model, the number of dummy variables should be N – 1. The omitted individual or group becomes the reference category.

• However, the choice of reference category is often arbitrary and, thus, the interpretation of the $\alpha_i$ will not be particularly interesting.

• Alternatively, we can drop the $\beta_1$ intercept and define dummy variables for all of the individuals. This is the more common approach. The $\alpha_i$ now become the intercepts for each of the $i$’s.

• If $E[\varepsilon_{i,t} | x_{i,t}, c_i] \neq 0$, then LSDV cannot be used. It is inconsistent. In this case, we need to use IVs. Or a good natural experiment.


FEM: First-Difference (FD) Method


• We can also eliminate the individual FE using the first-difference method.

• The unobserved effect is eliminated by subtracting the observation for the previous time period from the observation for the current time period, for all time periods:
$$y_{i,t} - y_{i,t-1} = \sum_{j=2}^{k}\beta_j\,(x_{j,i,t} - x_{j,i,t-1}) + \delta\,(t - (t-1)) + \varepsilon_{i,t} - \varepsilon_{i,t-1}$$
$$\Delta y_{i,t} = \sum_{j=2}^{k}\beta_j\,\Delta x_{j,i,t} + \delta + \Delta\varepsilon_{i,t}$$

• The error term is now ($\varepsilon_{i,t} - \varepsilon_{i,t-1}$). As before, differencing induces a moving average autocorrelation if $\varepsilon_{i,t}$ satisfies the CLM assumptions.
Note: If $\varepsilon_{i,t}$ is subject to AR(1) autocorrelation and $\rho$ is close to 1, taking first differences may approximately solve the problem.

FEM: Estimation – FE or FD?


• Summary:

• Fixed-effects (or Within) Estimator
– Each variable is demeaned, i.e., its individual average is subtracted from it.
– Dummy Variable Regression, i.e., include a dummy variable for each cross-sectional unit, along with the other explanatory variables. This may cause estimation difficulty when N is large.

• FD Estimator
– Each variable is differenced once over time, so we are effectively estimating the relationship between changes of variables.


FEM: Estimation – FE or FD?


• When N is large and T is small but greater than 2 (for T=2, FE=FD)
– FE is more efficient when the $\varepsilon_{i,t}$ are serially uncorrelated, while FD is more efficient when $\varepsilon_{i,t}$ follows a random walk (ρ=1).

• When T is large and N is small
– FD has an advantage for processes with large positive autocorrelation. (If $\rho$ is near 1, FD solves the nonstationarity problem!)
– FE is more sensitive to nonnormality, heteroskedasticity, and serial correlation in $\varepsilon_{i,t}$.
– On the other hand, FE is less sensitive to violation of the strict exogeneity assumption. Then, FE is preferred when the processes are weakly dependent over time.
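The claim that FE = FD when T = 2 can be verified directly. This is a sketch on simulated data with a single regressor and no time trend (so neither regression includes an intercept); the slope value 2.0 and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200                                           # individuals, each observed at t = 1, 2
c = rng.standard_normal(N)                        # individual effects
x = rng.standard_normal((N, 2)) + c[:, None]      # columns are the two periods
y = 2.0 * x + c[:, None] + rng.standard_normal((N, 2))

# FD: regress the change in y on the change in x
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# FE: regress within-demeaned y on within-demeaned x
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = (xd @ yd) / (xd @ xd)

print(b_fd, b_fe)                                 # equal up to rounding
```

With two periods the demeaned values are just plus and minus half the first difference, so both numerator and denominator are scaled by the same factor.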

FEM: Calculation of Var[b|X]


• Since we have assumed strict exogeneity, $Cov[\varepsilon_{i,t}, (x_{i,t}, c_i)] = 0$, we have OLS in the CLM. That is,
$$Asy.Var[b|X] = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^{N}T_i}\;plim\left[\frac{1}{\sum_{i=1}^{N}T_i}\sum_{i=1}^{N}X_i'M_D^i X_i\right]^{-1}$$
which is the usual estimator for OLS, with
$$\hat\sigma_\varepsilon^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^{N}T_i - N - K}$$
(Note the degrees of freedom correction.)

PCSE Remark: All previous remarks apply to the FEM.

• We build the SE according to the type of data we have:
- If we do not suspect autocorrelated errors (not a strange situation), we can rely on clustered White SEs (S0).
- If we suspect autocorrelated errors, then the Driscoll and Kraay SE should be used.


FEM: Testing for Fixed Effects


• Under H0 (No FE): $\alpha_i = \alpha$ for all $i$.
⇒ That is, we test whether to pool or not to pool the data.

• Different tests:
– F-test based on the LSDV dummy variable model: constant or zero coefficients for D. Test follows an $F_{N-1,\,NT-N-K}$ distribution.
– F-test based on FEM (the unrestricted model) vs. pooled model (the restricted model). Test follows an $F_{N-1,\,NT-N-K}$ distribution.
– An LR test can also be done (usually, assuming normality). Test follows a $\chi^2_{N-1}$ distribution.

FEM: Hypothesis Testing

• Based on estimated residuals of the fixed effects model.

(1) Estimate FEM:
$y_{i,t} = x_{i,t}'\beta + \alpha_i + \varepsilon_{i,t}$ ⇒ Keep residuals $e_{FE,i,t}$

(2) Tests as usual:
– Heteroscedasticity
• Breusch and Pagan (1980)
– Autocorrelation: AR(1)
• Breusch and Godfrey (1981)
$$LM = \frac{NT^2}{T-1}\left[\frac{e_{FE}'\,e_{FE,-1}}{e_{FE}'\,e_{FE}}\right]^2 \xrightarrow{d} \chi^2_1$$


Application: Cornwell and Rupert Data (Greene)


Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file (those in parentheses are not used in the regressions):
EXP = work experience, EXPSQ = EXP²
WKS = weeks worked
OCC = occupation, 1 if blue collar
(IND = 1 if manufacturing industry)
(SOUTH = 1 if resides in south)
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
(BLK = 1 if individual is black)
LWAGE = log of wage = dependent variable in regressions (Y)
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.

Application: Cornwell and Rupert (Greene)

(1) Returns to Schooling – Pooled OLS Results

[Regression output not recovered. Annotation: RSS & R² computed with X only.]


Application: Cornwell and Rupert (Greene)

(2) Returns to Schooling – LSDV Results

[Regression output not recovered. Annotations: N+K estimated parameters; RSS & R² computed with X and group effects.]

FEM: Testing for FE (and other formulations)

[Pooled and FEM regression output not recovered.]

• Calculations:
$F_{594,\,3566}$ = [(651.78 - 83.89)/594]/[83.89/3566] = 40.64 (reject H0)
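The arithmetic of the F-test can be reproduced in a few lines. The RSS values and degrees of freedom are the ones quoted in the calculation above; the variable names are just illustrative.

```python
rss_pooled, rss_lsdv = 651.78, 83.89     # restricted (pooled) and unrestricted (LSDV) RSS
J, df = 594, 3566                        # numerator (N - 1) and denominator degrees of freedom

F = ((rss_pooled - rss_lsdv) / J) / (rss_lsdv / df)
print(round(F, 2))                       # 40.64
```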


The Random Effects Model (REM)


• Recall the general DGP:
$y_{i,t} = x_{i,t}'\beta + z_i'\gamma + \varepsilon_{i,t}$ (observation for individual $i$ at time $t$).

When the observed characteristics are constant for each individual, a FEM is not an effective tool because such variables cannot be included.

• An alternative approach, known as the random effects model (REM), provides a solution to this problem, subject to two conditions.

• Conditions:
(1) It is possible to treat each of the unobserved $Z_p$ variables as being drawn randomly from a given distribution.
(2) The $Z_p$ variables are distributed independently of all of the $X_j$ variables. ⇒ $E[z_i'X_i] = 0$.

The Random Effects Model (REM)


• Conditions:
(1) Randomly drawn unobserved $Z_p$ variables.
⇒ The $c_i$ may be treated as a RV (thus, the name of this approach) drawn from a given distribution. Let’s call it $u_i$. Then,
$$y_{i,t} = \beta_1 + \sum_{j=2}^{k}\beta_j x_{j,i,t} + u_i + \delta t + \varepsilon_{i,t} = \beta_1 + \sum_{j=2}^{k}\beta_j x_{j,i,t} + \delta t + w_{i,t}, \qquad w_{i,t} = u_i + \varepsilon_{i,t}$$

• We deal with the unobserved effect by subsuming it into a compound disturbance term, $w_{i,t}$. We assume that $u_i \sim D(0, \sigma_u^2)$. Then,
$E[w_{i,t}] = E[u_i] + E[\varepsilon_{i,t}] = 0$

The zero mean assumption, $E[u_i] = 0$, is not crucial: any nonzero component is absorbed by the intercept, $\beta_1$.


The Random Effects Model (REM)


(2) $Z_p$ is independent of all of the $X_j$ variables.
⇒ Otherwise, $u_i$ (& $w_{i,t}$) will not be uncorrelated with $X_i$. The RE estimation will be biased and inconsistent.

Note: In that case, we would have to use the FEM, even if the first condition seems to be satisfied.

• If (1) and (2) are satisfied, we can use the REM, and OLS will work, but there is a complication: $w_{i,t}$ is correlated across time within each individual, since every $w_{i,t}$ for individual $i$ contains the same $u_i$.

REM: Error Components Model


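• REM Assumptions:
$y_{i,t} = x_{i,t}'\beta + c_i + \varepsilon_{i,t} = x_{i,t}'\beta + u_i + \varepsilon_{i,t} = x_{i,t}'\beta + w_{i,t}$
$E[\varepsilon_{i,t} | X_i] = 0$
$E[\varepsilon_{i,t}^2 | X_i] = \sigma_\varepsilon^2$
$E[u_i | X_i] = 0$
$E[u_i^2 | X_i] = \sigma_u^2$
$E[u_i\,\varepsilon_{j,t} | X_i] = 0$ for all $i$, $j$, $t$ ($u_i$ & $\varepsilon_{j,t}$ are independent).
$E[u_i u_j | X_i] = 0$ ($i \neq j$): no cross-correlation of RE.
$E[\varepsilon_{i,t}\,\varepsilon_{j,t} | X_i] = 0$ ($i \neq j$): no cross-correlation for errors, $\varepsilon_{i,t}$.
$E[\varepsilon_{i,t}\,\varepsilon_{i,s} | X_i] = 0$ ($t \neq s$): there is no autocorrelation for $\varepsilon_{i,t}$.
$$\sigma^2_{w_{i,t}} = \sigma^2_{u_i + \varepsilon_{i,t}} = \sigma^2_{u_i} + \sigma^2_{\varepsilon_{i,t}} + 2\sigma_{u_i,\varepsilon_{i,t}} = \sigma_u^2 + \sigma_\varepsilon^2$$
$$\sigma_{w_{i,t_1},\,w_{i,t_2}} = E[(u_i + \varepsilon_{i,t_1})(u_i + \varepsilon_{i,t_2})] = \sigma_u^2$$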


REM: Notation (Greene)

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix}\beta + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix} + \begin{pmatrix} u_1 i_1 \\ u_2 i_2 \\ \vdots \\ u_N i_N \end{pmatrix} \qquad \begin{matrix} T_1 \text{ observations} \\ T_2 \text{ observations} \\ \vdots \\ T_N \text{ observations} \end{matrix}$$
$= X\beta + \varepsilon + u$ ($\sum_{i=1}^{N}T_i$ observations)
$= X\beta + w$
In all that follows, except where explicitly noted, $X$, $X_i$ and $x_{it}$ contain a constant term as the first element. To avoid notational clutter, in those cases, $x_{it}$ etc. will simply denote the counterpart without the constant term. Use of the symbol K for the number of variables will thus be context specific but will usually include the constant term.

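REM: Notation (Greene)

$$Var[\varepsilon_i + u_i i] = \begin{pmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{pmatrix} = \sigma_\varepsilon^2 I_{T_i} + \sigma_u^2\,i i' = \Omega_i \quad (T_i \times T_i)$$
$$Var[w | X] = \begin{pmatrix} \Omega_1 & 0 & \cdots & 0 \\ 0 & \Omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega_N \end{pmatrix} \quad \text{(Note these differ only in the dimension } T_i\text{)}$$

• Note: If $E[\varepsilon_{i,t}\,\varepsilon_{j,t} | X_i] \neq 0$ ($i \neq j$) or $E[\varepsilon_{i,t}\,\varepsilon_{i,s} | X_i] \neq 0$ ($t \neq s$), we no longer have this nice diagonal-type structure for Var[w|X].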


REM: Assumptions - Convergence of Moments

$$\frac{X'X}{\sum_{i=1}^{N}T_i} = \sum_{i=1}^{N} f_i\,\frac{X_i'X_i}{T_i}, \qquad f_i = \frac{T_i}{\sum_{i=1}^{N}T_i}$$
a weighted sum of individual moment matrices.
$$\frac{X'\Omega X}{\sum_{i=1}^{N}T_i} = \sum_{i=1}^{N} f_i\,\frac{X_i'\Omega_i X_i}{T_i} = \sigma_\varepsilon^2\sum_{i=1}^{N} f_i\,\frac{X_i'X_i}{T_i} + \sigma_u^2\sum_{i=1}^{N} f_i\,T_i\,\bar x_i\bar x_i'$$
also a weighted sum of individual moment matrices.
Note: Asymptotics are with respect to N. Each matrix $X_i'X_i/T_i$ is the moments for the $T_i$ observations. It should be 'well behaved' in micro level data. The average of N such matrices should be likewise. T or $T_i$ is assumed to be fixed (and small).

REM: Pooled OLS Estimation (Greene)


• Standard results for the pooled OLS estimator b in the GR model
- Consistent and asymptotically normal
- Unbiased
- Inefficient

• We can use pooled OLS, but for inference we need the true variance, i.e., the sandwich estimator:
$$Var[b | X] = \frac{1}{\sum_{i=1}^{N}T_i}\left[\frac{X'X}{\sum_{i=1}^{N}T_i}\right]^{-1}\left[\frac{X'\Omega X}{\sum_{i=1}^{N}T_i}\right]\left[\frac{X'X}{\sum_{i=1}^{N}T_i}\right]^{-1}$$
$\to 0 \times Q^{-1} \times Q^{*} \times Q^{-1}$
$\to 0$ as $N \to \infty$ with our convergence assumptions


REM: Sandwich Estimator for OLS (Greene)

$$Var[b | X] = \frac{1}{\sum_{i=1}^{N}T_i}\left[\frac{X'X}{\sum_{i=1}^{N}T_i}\right]^{-1}\left[\frac{X'\Omega X}{\sum_{i=1}^{N}T_i}\right]\left[\frac{X'X}{\sum_{i=1}^{N}T_i}\right]^{-1}$$
$$\frac{X'\Omega X}{\sum_{i=1}^{N}T_i} = \sum_{i=1}^{N} f_i\,\frac{X_i'\Omega_i X_i}{T_i}, \quad \text{where } \Omega_i = E[w_i w_i' | X_i]$$
In the spirit of the White estimator, use
$$\frac{X'\hat\Omega X}{\sum_{i=1}^{N}T_i} = \sum_{i=1}^{N} f_i\,\frac{X_i'\hat w_i\hat w_i'X_i}{T_i}, \qquad \hat w_i = y_i - X_i b$$
Hypothesis tests are then based on Wald statistics.

THIS IS THE 'CLUSTER' ESTIMATOR

• Recall: Clustered standard errors or PCSE
There is a grouping, or “cluster,” within which the error term is possibly correlated, but outside of which (across groups) it is not.

REM: Sandwich Estimator – Mechanics (Greene)

$$Est.Var[b | X] = (X'X)^{-1}\left[\sum_{i=1}^{N}(X_i'\hat w_i)(\hat w_i'X_i)\right](X'X)^{-1}$$
$\hat w_i$ = set of $T_i$ OLS residuals for individual $i$.
$X_i$ = $T_i \times K$ data on exogenous variables for individual $i$.
$X_i'\hat w_i$ = $K \times 1$ vector of products.
$(X_i'\hat w_i)(\hat w_i'X_i)$ = $K \times K$ matrix (rank 1, outer product).
$\sum_{i=1}^{N}(X_i'\hat w_i)(\hat w_i'X_i)$ = sum of N rank 1 matrices. Rank ≤ K.

We could compute this as $\sum_{i=1}^{N}X_i'[\hat w_i\hat w_i']X_i = \sum_{i=1}^{N}X_i'\hat\Omega_i X_i$.
Why not do it that way?
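Both ways of building the middle matrix can be compared in a short sketch. The data below are simulated with a random individual effect; the sizes, coefficients, and variable names are illustrative assumptions. One plausible answer to the question above is cost: the first route only needs a K-vector per individual, while the second forms a $T_i \times T_i$ outer product for each individual before multiplying.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 50, 4, 3                       # illustrative sizes
ids = np.repeat(np.arange(N), T)
X = np.column_stack([np.ones(N * T), rng.standard_normal((N * T, K - 1))])
u = rng.standard_normal(N)               # random individual effect
y = X @ np.array([1.0, 0.5, -0.5]) + u[ids] + rng.standard_normal(N * T)

b = np.linalg.solve(X.T @ X, X.T @ y)    # pooled OLS
w = y - X @ b                            # pooled OLS residuals

# Route 1: sum of N rank-1 outer products of the K-vectors X_i' w_i
meat = np.zeros((K, K))
for i in range(N):
    g = X[ids == i].T @ w[ids == i]
    meat += np.outer(g, g)

# Route 2: same matrix via the T_i x T_i matrices w_i w_i'
meat2 = sum(X[ids == i].T @ np.outer(w[ids == i], w[ids == i]) @ X[ids == i]
            for i in range(N))

XtX_inv = np.linalg.inv(X.T @ X)
V_cluster = XtX_inv @ meat @ XtX_inv     # cluster-robust covariance (no small-sample correction)
se_cluster = np.sqrt(np.diag(V_cluster))
print(se_cluster)
```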


REM: GLS
• Standard results for GLS in a GR model
- Consistent
- Unbiased
- Efficient (if functional form for Ω correct)

$$\hat\beta = [X'\Omega^{-1}X]^{-1}[X'\Omega^{-1}y] = \left[\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right]^{-1}\left[\sum_{i=1}^{N}X_i'\Omega_i^{-1}y_i\right]$$
$$\Omega_i^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[I_{T_i} - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,i i'\right]$$
(note, depends on $i$ only through $T_i$)

• As usual, the matrix $\Omega^{-1/2} = P$ will be used to transform the data.
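The closed form for $\Omega_i^{-1}$ can be checked against the definition $\Omega_i = \sigma_\varepsilon^2 I_{T_i} + \sigma_u^2 ii'$. The sketch below uses arbitrary illustrative values for $T_i$ and the two variances.

```python
import numpy as np

T_i, s2_e, s2_u = 4, 0.5, 2.0            # illustrative values (assumptions)
i = np.ones((T_i, 1))                    # column of ones

Omega = s2_e * np.eye(T_i) + s2_u * (i @ i.T)
Omega_inv = (np.eye(T_i) - s2_u / (s2_e + T_i * s2_u) * (i @ i.T)) / s2_e

print(np.allclose(Omega @ Omega_inv, np.eye(T_i)))   # True
```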

REM: GLS
• The matrix $\Omega^{-1/2} = P$ is used to transform the data. That is,
$$y_{it} - \theta_i\,\bar y_i = (x_{it} - \theta_i\,\bar x_i)'\beta + \nu_{it}$$
where $\nu_{it}$ is the transformed error and
$$\theta_i = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T_i\,\sigma_u^2}}$$
$$Asy.Var[\hat\beta_{GLS}] = (X'\Omega^{-1}X)^{-1} = \sigma_\varepsilon^2\,(X^{*\prime}X^{*})^{-1}$$

• We call the transformed data: quasi time-demeaned data. As expected, GLS is just pooled OLS with the transformed data.

Note: The RE estimator can be seen as a mixture of two estimators:
- when θ = 0 ($\sigma_u$ = 0) ⇒ Pooled OLS estimator
- when θ = 1 ($\sigma_\varepsilon$ = 0 or $\sigma_u \to \infty$) ⇒ LSDV estimator (the $u_i$’s become the FE)

Then, the bigger (smaller) the variance of the unobserved effect, i.e., the bigger (smaller) the individual heterogeneity, the closer RE is to FE (pooled OLS). Also, when T is large, it becomes more like FE.
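That GLS is pooled OLS on quasi time-demeaned data can be verified numerically. This sketch assumes a balanced panel, treats the two variances as known (in practice they are estimated), and uses simulated data; `group_mean` is a hypothetical helper that relies on the data being stacked by individual.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 40, 6                             # balanced panel (illustrative)
s2_e, s2_u = 1.0, 2.0                    # variances treated as known here (assumption)
ids = np.repeat(np.arange(N), T)
X = np.column_stack([np.ones(N * T), rng.standard_normal(N * T)])
y = (X @ np.array([1.0, 2.0])
     + np.sqrt(s2_u) * rng.standard_normal(N)[ids]
     + np.sqrt(s2_e) * rng.standard_normal(N * T))

theta = 1.0 - np.sqrt(s2_e / (s2_e + T * s2_u))

def group_mean(Z):
    """Individual means, repeated for each of the T rows of that individual."""
    return Z.reshape(N, T, -1).mean(axis=1)[ids].reshape(Z.shape)

# Pooled OLS on quasi time-demeaned data
Xs = X - theta * group_mean(X)
ys = y - theta * group_mean(y)
b_quasi = np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Direct GLS with Omega = I_N kron (s2_e I_T + s2_u i i')
Omega_i = s2_e * np.eye(T) + s2_u * np.ones((T, T))
Oinv = np.kron(np.eye(N), np.linalg.inv(Omega_i))
b_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

print(theta, b_quasi, b_gls)             # the two coefficient vectors coincide
```

Setting `s2_u = 0` gives `theta = 0` (pooled OLS), and letting `s2_u` or `T` grow pushes `theta` toward 1 (the within transformation).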


REM: FGLS - Estimators for the Variances


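• To transform the data, we need to estimate $\sigma_\varepsilon^2$ and $\sigma_u^2$ consistently.
• Usual steps (assume a balanced panel):
(1) Start with a consistent estimator of β. For example, pooled OLS, b.
(2) Compute $\sum_i\sum_t (y_{i,t} - x_{i,t}'b)^2$, which estimates $\sum_i\sum_t (\sigma_\varepsilon^2 + \sigma_u^2)$.
(3) Divide by a function of NT. For example: NT – K – 1.
⇒ We estimate $\sigma^2$ with $s^2_{Pooled} = e'e/(NT - K - 1)$.
We will use $s^2_{Pooled}$ to estimate the sum: $\sigma_\varepsilon^2 + \sigma_u^2$.
(4) Use LSDV estimation to get $a_i$ and $b_{LSDV}$. Keep the residuals, $e_{LSDV,i,t}$.
(5) Compute $\sum_i\sum_t (y_{i,t} - a_i - x_{i,t}'b_{LSDV})^2$, which estimates $\sum_i\sum_t \sigma_\varepsilon^2$.
(6) To estimate $\sigma_\varepsilon^2$, divide by NT – K – N:
$s^2_{LSDV} = \sum_i\sum_t e^2_{LSDV,i,t}/(NT - K - N)$
(7) Estimate $\sigma_u^2$ as $s_u^2 = s^2_{Pooled} - s^2_{LSDV}$.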
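Steps (1)-(7) can be sketched on a simulated balanced panel. The true variances, sample sizes, and coefficient values are assumptions used only to generate the data; `rss` is a hypothetical helper, and K counts the slope regressors (the constant is added separately in the pooled regression).

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, K = 500, 5, 2                      # K slope regressors (no constant in x)
s2_e, s2_u = 1.0, 0.5                    # true values, used only to simulate
ids = np.repeat(np.arange(N), T)
x = rng.standard_normal((N * T, K))      # regressors independent of u_i (REM condition)
y = (1.0 + x @ np.array([0.5, -1.0])
     + np.sqrt(s2_u) * rng.standard_normal(N)[ids]
     + np.sqrt(s2_e) * rng.standard_normal(N * T))

def rss(Z, v):
    """Residual sum of squares from the OLS regression of v on Z."""
    resid = v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    return resid @ resid

# Steps (1)-(3): pooled OLS with a constant, estimates s2_e + s2_u
X_pooled = np.column_stack([np.ones(N * T), x])
s2_pooled = rss(X_pooled, y) / (N * T - K - 1)

# Steps (4)-(6): LSDV (x plus N dummies), estimates s2_e
D = (ids[:, None] == np.arange(N)).astype(float)
s2_lsdv = rss(np.column_stack([x, D]), y) / (N * T - K - N)

# Step (7): difference estimates s2_u (it can be negative in small samples)
s2_u_hat = s2_pooled - s2_lsdv
print(s2_pooled, s2_lsdv, s2_u_hat)
```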

REM: FGLS - Estimators for the Variances


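Feasible GLS requires (only) consistent estimators of $\sigma_\varepsilon^2$ and $\sigma_u^2$.
Candidates:
From the robust LSDV estimator:
$$\hat\sigma_\varepsilon^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^{N}T_i - K - N}$$
From the pooled OLS estimator:
$$\widehat{\sigma_\varepsilon^2 + \sigma_u^2} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_{OLS} - x_{it}'b_{OLS})^2}{\sum_{i=1}^{N}T_i - K - 1}$$
From the group means regression:
$$\widehat{\sigma_\varepsilon^2/T + \sigma_u^2} = \frac{\sum_{i=1}^{N}(\bar y_{i} - a - \bar x_i'b_{MEANS})^2}{N - K - 1}$$
(Wooldridge) Based on $E[w_{it}w_{is} | X_i] = \sigma_u^2$ if $t \neq s$:
$$\hat\sigma_u^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i-1}\sum_{s=t+1}^{T_i}\hat w_{it}\hat w_{is}}{\sum_{i=1}^{N}T_i - K - N}$$
There are many others.

Note: A slight change in notation: $x_{it}$ does not contain the constant term.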


REM: Practical Problems with FGLS


All of the preceding regularly produce negative estimates of σu².
Estimation is made very complicated in unbalanced panels.
A bulletproof solution (originally used in TSP, now LIMDEP and others).

From the robust LSDV estimator:
$\hat{\sigma}_\varepsilon^2 = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^{N}T_i}$

From the pooled OLS estimator:
$\widehat{\sigma_\varepsilon^2 + \sigma_u^2} = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_{OLS} - x_{it}'b_{OLS})^2}{\sum_{i=1}^{N}T_i}$

$\hat{\sigma}_u^2 = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_{OLS} - x_{it}'b_{OLS})^2 - \sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^{N}T_i} \geq 0$

• Bulletproof solution: Do not correct by degrees of freedom. Then,
given that the unrestricted RSS (LSDV) cannot exceed the
restricted (pooled OLS) RSS, the estimate of σu² will be nonnegative!
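The variance estimators above can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's code: the function name `variance_components` is hypothetical, LSDV is computed through the within transformation (which gives the same slopes and residuals as the dummy-variable regression), and the panel is simulated with made-up parameters.

```python
import numpy as np

def variance_components(y, X, groups, df_correction=True):
    """Estimate (s2_eps, s2_u) from pooled OLS and LSDV residual sums of squares.

    X holds the K regressors WITHOUT a constant; groups holds unit labels.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n_obs, K = X.shape
    labels = np.unique(groups)
    N = labels.size

    # Pooled OLS with a constant: its RSS estimates the sum s2_eps + s2_u.
    Z = np.column_stack([np.ones(n_obs), X])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    rss_ols = np.sum((y - Z @ b) ** 2)

    # LSDV via the within transformation: its RSS estimates s2_eps.
    yw, Xw = y.copy(), X.copy()
    for g in labels:
        rows = groups == g
        yw[rows] -= y[rows].mean()
        Xw[rows] -= X[rows].mean(axis=0)
    bw = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    rss_lsdv = np.sum((yw - Xw @ bw) ** 2)

    if df_correction:
        s2_total = rss_ols / (n_obs - K - 1)
        s2_eps = rss_lsdv / (n_obs - K - N)
    else:  # same divisor for both: the difference cannot be negative
        s2_total = rss_ols / n_obs
        s2_eps = rss_lsdv / n_obs
    return s2_eps, s2_total - s2_eps

# Simulated balanced panel (hypothetical parameters: sigma_u = 1, sigma_eps = 0.5).
rng = np.random.default_rng(0)
N, T = 200, 5
groups = np.repeat(np.arange(N), T)
x = rng.normal(size=(N * T, 1))
u = np.repeat(rng.normal(scale=1.0, size=N), T)
y = 1.0 + 0.5 * x[:, 0] + u + rng.normal(scale=0.5, size=N * T)

s2_eps, s2_u = variance_components(y, x, groups, df_correction=True)
s2_eps_raw, s2_u_raw = variance_components(y, x, groups, df_correction=False)
print(s2_eps, s2_u, s2_eps_raw, s2_u_raw)
```

With `df_correction=False` both sums of squares are divided by the same number, so the sign of the estimate of σu² depends only on the ordering of the two RSS values.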

Application: Fixed Effects Estimates (Greene)

----------------------------------------------------------------------
Least Squares with Group Dummy Variables..........
LHS=LWAGE Mean = 6.67635
Residuals Sum of squares = 82.34912
Standard error of e = .15205
These 2 variables have no within group variation.
FEM ED
F.E. estimates are based on a generalized inverse.
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
EXP| .11346*** .00247 45.982 .0000 19.8538
EXPSQ| -.00042*** .544864D-04 -7.789 .0000 514.405
OCC| -.02106 .01373 -1.534 .1251 .51116
SMSA| -.04209** .01934 -2.177 .0295 .65378
MS| -.02915 .01897 -1.536 .1245 .81441
FEM| .000 ......(Fixed Parameter).......
UNION| .03413** .01491 2.290 .0220 .36399
ED| .000 ......(Fixed Parameter).......
--------+-------------------------------------------------------------


REM: Computing Variance Estimators (Greene)

Using full list of variables (FEM and ED are time invariant)


OLS sum of squares = 522.2008.
σ̂ε² + σ̂u² = 522.2008 / (4165 - 9) = 0.12565.
Using full list of variables and a generalized inverse (same
as dropping FEM and ED), LSDV sum of squares = 82.34912.
σ̂ε² = 82.34912 / (4165 - 8 - 595) = 0.023119.
σ̂u² = 0.12565 - 0.023119 = 0.10253.
Both estimators are positive. We stop here. If σ̂u² were
negative, we would use estimators without DF corrections.
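The arithmetic above can be reproduced directly. The sketch below uses only the sums of squares and counts quoted on the slide; the value T = 7 is inferred from 4165 / 595 and assumes a balanced panel. The implied within-unit correlation and quasi-demeaning weight θ are computed as well.

```python
from math import sqrt

NT, N, K = 4165, 595, 8            # observations, individuals, slope coefficients
rss_ols, rss_lsdv = 522.2008, 82.34912

s2_total = rss_ols / (NT - K - 1)  # estimates sigma_eps^2 + sigma_u^2
s2_eps = rss_lsdv / (NT - K - N)   # estimates sigma_eps^2
s2_u = s2_total - s2_eps           # estimates sigma_u^2

T = NT // N                        # 7 periods per individual (balanced panel assumed)
rho = s2_u / (s2_u + s2_eps)       # Corr[v(i,t), v(i,s)]
theta = 1 - sqrt(s2_eps / (s2_eps + T * s2_u))

print(round(s2_total, 5), round(s2_eps, 6), round(s2_u, 5))
print(round(rho, 3), round(theta, 3))
```

The implied correlation, about 0.816, matches the `Corr[v(i,t),v(i,s)]` line in the random effects output that follows.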

REM: Application (Greene)


----------------------------------------------------------------------
Random Effects Model: v(i,t) = e(i,t) + u(i)
Estimates: Var[e] = .023119
Var[u] = .102531
Corr[v(i,t),v(i,s)] = .816006
Lagrange Multiplier Test vs. Model (3) =3713.07
( 1 degrees of freedom, prob. value = .000000)
(High values of LM favor FEM/REM over CR model)
Fixed vs. Random Effects (Hausman) = .00 (Cannot be computed)
( 8 degrees of freedom, prob. value = 1.000000)
(High (low) values of H favor F.E.(R.E.) model)
Sum of Squares 1411.241136
R-squared -.591198
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
EXP .08819204 .00224823 39.227 .0000 19.8537815
EXPSQ -.00076604 .496074D-04 -15.442 .0000 514.405042
OCC -.04243576 .01298466 -3.268 .0011 .51116447
SMSA -.03404260 .01620508 -2.101 .0357 .65378151
MS -.06708159 .01794516 -3.738 .0002 .81440576
FEM -.34346104 .04536453 -7.571 .0000 .11260504
UNION .05752770 .01350031 4.261 .0000 .36398559
ED .11028379 .00510008 21.624 .0000 12.8453782
Constant 4.01913257 .07724830 52.029 .0000


Testing for Random Effects: LM Test


• We want to test for RE. That is,
H0: σu² = 0.
• We can use the Breusch-Pagan (1980) Test for RE effects. Similar to
the LM-BP test for autocorrelation, it is based on the pooled OLS
residuals, e. It is easy to compute and is distributed as $\chi^2_1$ under H0:

Breusch and Pagan Lagrange Multiplier statistic
Assuming normality (and for convenience now, a balanced panel):

$LM = \dfrac{NT}{2(T-1)}\left[\dfrac{\sum_{i=1}^{N}(T\bar{e}_i)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}^2} - 1\right]^2 = \dfrac{NT}{2(T-1)}\left[\dfrac{\sum_{i=1}^{N}\left[(T\bar{e}_i)^2 - e_i'e_i\right]}{\sum_{i=1}^{N}e_i'e_i}\right]^2$

Converges to chi-squared[1] under the null hypothesis of no common
effects. (For unbalanced panels, the scale in front becomes
$(\sum_{i=1}^{N}T_i)^2 / [2\sum_{i=1}^{N}T_i(T_i-1)]$.)
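The balanced-panel statistic is simple to compute once the pooled OLS residuals are available. A minimal sketch follows; the function name `breusch_pagan_lm` is hypothetical, and it assumes the residuals are sorted by unit and then by time. The tiny residual matrix is made up for illustration.

```python
import numpy as np

def breusch_pagan_lm(resid, N, T):
    """LM statistic for H0: sigma_u^2 = 0 from pooled OLS residuals.

    resid is ordered by unit, then time, for a balanced N x T panel.
    """
    e = np.asarray(resid, dtype=float).reshape(N, T)
    num = np.sum(e.sum(axis=1) ** 2)   # sum_i (T * ebar_i)^2
    den = np.sum(e ** 2)               # sum_i sum_t e_it^2
    return N * T / (2.0 * (T - 1)) * (num / den - 1.0) ** 2

# Two units, two periods: residuals share the same sign within each unit.
lm = breusch_pagan_lm([1.0, 1.0, -1.0, -1.0], N=2, T=2)
print(lm)
```

The result would be compared with a chi-squared[1] critical value (about 3.84 at the 5% level).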

REM: LM Test Application – Cornwell-Rupert

Note: Check the different standard errors from both models.


FE vs. RE: Understanding Differences


• Suppose we want to study the effect of an MBA on stock trading,
controlling for other factors such as income and experience. We have a
panel, with individuals measured annually over 10 years. We expect
some year-to-year correlation within a given individual i (with
unobservable individual-level effects accounting for part of the
correlation between yearly trading within the same i).

• To understand the difference between FE & RE, we ask:

- Does a regression coefficient for an MBA represent a comparison of
two i’s, one with an MBA and one without one? ⇒ Between Effect (RE).
- Or does it compare two yearly trading records from the same i who
happened to receive an MBA in the interim? ⇒ Within Effect (FEM).

FE vs. RE: Understanding Differences


• Between Effect (REM): GLS ⇒ consistent and efficient (under H0)
- Between-individual effects and within-individual effects are assumed to be identical.
- Very efficient: no data are thrown away.

• Within Effect (FEM): OLS ⇒ consistent estimates.

- No confounding due to unmeasured i-level characteristics.
- Cost: All the between-individual comparisons in the data are thrown
away.

• Q: Are within and between MBA effects the same?


FE vs. RE
• Q: RE estimation or FE estimation?
• Case for RE:
– Under no omitted variables (or if the omitted variables are
uncorrelated with the xit in the model), a REM is probably best: It
produces unbiased and efficient estimates, and uses all the data available.
– RE can deal with observed characteristics that remain constant for
each individual. In FE, they have to be dropped from the model.
– In contrast with FE, RE estimates a small number of parameters.
– We do not lose N degrees of freedom.
– Philosophically speaking, a REM is more attractive: Why should we
assume one set of unobservables fixed and the other random?

FE vs. RE
• Case against RE:
- If either of the conditions for using RE is violated, we should use FE.

• Condition (1): Randomly drawn unobserved Zp variables.

This is a reasonable assumption in many cases: Many panels are
designed to be a random sample (for example, NLSY).
But it would not be a reasonable assumption if the units of
observation in the panel data set were the S&P 500 firms.

• Condition (2): Zp is independent of all of the xit variables.

A violation of condition (2) causes inconsistency in the RE estimation.


FE vs. RE
• FE estimation is always consistent. On the other hand, a violation of
condition (2) causes inconsistency in the RE estimation.

That is, if there are (time-invariant) omitted variables that are correlated
with the xit in the model, then the FEM provides a way of controlling for
omitted variable bias. In a FEM, individuals serve as their own controls.

• Q: How can we tell if condition (2) is violated?

A: A DHW test can help.

DHW (Hausman) Specification Test: FE vs. RE


True model:           Random Effects            Fixed Effects
Estimator             E[ci | xit] = 0           E[ci | xit] ≠ 0

FGLS                  Consistent and            Inconsistent
(Random Effects)      Efficient
LSDV                  Consistent,               Consistent,
(Fixed Effects)       Inefficient               Possibly Efficient

• Under H0 (RE is true), we have one estimator that is efficient (RE)
and one that is inefficient (LSDV). We can use a Durbin-Hausman-Wu test.
As in its other applications, the DHW test determines whether the
estimates of the coefficients, taken as a group, are significantly different
in the two regressions.


DHW (Hausman) Specification Test: FE vs. RE

Basis for the test: $\hat{\beta}_{FE} - \hat{\beta}_{RE}$

Wald Criterion: $\hat{q} = \hat{\beta}_{FE} - \hat{\beta}_{RE}$; $W = \hat{q}'[\text{Var}(\hat{q})]^{-1}\hat{q}$

A lemma (Hausman (1978)): Under the null hypothesis (RE)

$\sqrt{nT}\,[\hat{\beta}_{RE} - \beta] \xrightarrow{d} N[0, V_{RE}]$ (efficient)

$\sqrt{nT}\,[\hat{\beta}_{FE} - \beta] \xrightarrow{d} N[0, V_{FE}]$ (inefficient)

Note: $\hat{q} = (\hat{\beta}_{FE} - \beta) - (\hat{\beta}_{RE} - \beta)$. The lemma states that in the
joint limiting distribution of $\sqrt{nT}\,[\hat{\beta}_{RE} - \beta]$ and $\sqrt{nT}\,\hat{q}$, the
limiting covariance, $C_{Q,RE}$, is 0. But $C_{Q,RE} = C_{FE,RE} - V_{RE}$. Then,
$\text{Var}[\hat{q}] = V_{FE} + V_{RE} - C_{FE,RE} - C_{FE,RE}'$. Using the lemma, $C_{FE,RE} = V_{RE}$.
It follows that $\text{Var}[\hat{q}] = V_{FE} - V_{RE}$. Based on the preceding,

$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'\,[\text{Est.Var}(\hat{\beta}_{FE}) - \text{Est.Var}(\hat{\beta}_{RE})]^{-1}\,(\hat{\beta}_{FE} - \hat{\beta}_{RE})$

Note: β does not contain the constant term.

DHW (Hausman) Specification Test: FE vs. RE

• Then, following the structure of the DHW test we saw in Lecture 8:

$H = (b_{FEM} - b_{REM})'\,V^{-1}\,(b_{FEM} - b_{REM})$
where $V = V_{FEM} - V_{REM}$.

Note: Columns of zeroes will show in VFEM if there are time invariant
variables in xit. (Also, β does not contain the constant term.)
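The quadratic form for H is easy to evaluate once both sets of estimates are in hand. The sketch below is illustrative only: the function name `hausman` is hypothetical, the coefficient vectors and covariance matrices are made-up numbers, and the use of a pseudo-inverse is one possible way (an assumption here, not the lecture's prescription) to cope with a difference matrix that is singular or not positive definite in a finite sample.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """H = d' (V_fe - V_re)^(-1) d, with d = b_fe - b_re (slopes only, no constant)."""
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    # Pseudo-inverse: V may fail to be positive definite in finite samples.
    return float(d @ np.linalg.pinv(V) @ d)

# Hypothetical estimates for two time-varying regressors.
H = hausman([1.0, 2.0], [0.8, 2.1],
            np.diag([0.04, 0.02]), np.diag([0.03, 0.01]))
print(H)  # compare with a chi-squared critical value with 2 degrees of freedom
```

Time-invariant regressors should be dropped from both vectors before calling the function, since FE does not identify their coefficients.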


Computing the DHW Statistic

$\text{Est.Var}[\hat{\beta}_{FE}] = \hat{\sigma}_\varepsilon^2\left[\sum_{i=1}^{N} X_i'\left(I - \dfrac{1}{T_i}\,ii'\right)X_i\right]^{-1}$

$\text{Est.Var}[\hat{\beta}_{RE}] = \hat{\sigma}_\varepsilon^2\left[\sum_{i=1}^{N} X_i'\left(I - \dfrac{\hat{\gamma}_i}{T_i}\,ii'\right)X_i\right]^{-1}, \quad 0 \leq \hat{\gamma}_i = \dfrac{T_i\hat{\sigma}_u^2}{\hat{\sigma}_\varepsilon^2 + T_i\hat{\sigma}_u^2} \leq 1$

As long as $\hat{\sigma}_\varepsilon^2$ and $\hat{\sigma}_u^2$ are consistent, as N → ∞,
$\text{Est.Var}[\hat{\beta}_{FE}] - \text{Est.Var}[\hat{\beta}_{RE}]$
will be nonnegative definite. In a finite sample, to ensure this, both must
be computed using the same estimate of $\hat{\sigma}_\varepsilon^2$. The one based on LSDV will
generally be the better choice.

Note that columns of zeros will appear in $\text{Est.Var}[\hat{\beta}_{FE}]$ if there are time
invariant variables in X.

Note: Pooled OLS is consistent, but inefficient under H0. Then, the
RE estimation is GLS.

DHW Specification Test: Application (Hoechle)

• Bid-Ask Spread Panel estimation.

• Rejection at the 5% level, like in this case, indicates that βFE ≠ βRE.
- Usually, this result is taken as an indication of a FEM.


DHW Specification Test: Application (Greene)


+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i) |
| Estimates: Var[e] = .235236D-01 |
| Var[u] = .133156D+00 |
| Corr[v(i,t),v(i,s)] = .849862 |
| Lagrange Multiplier Test vs. Model (3) = 4061.11 |
| ( 1 df, prob value = .000000) |
| (High values of LM favor FEM/REM over CR model.) |
| Fixed vs. Random Effects (Hausman) = 2632.34 |
| ( 4 df, prob value = .000000) |
| (High (low) values of H favor FEM (REM).) |
+--------------------------------------------------+

• The DHW statistic is used to test the difference in coefficients
between the RE and FE models
- A rejection, like in this case, indicates that βFE ≠ βRE
- But, rejecting H0 does not necessarily imply that H1 is “accepted.”
- Either the model is misspecified or ui and xit are correlated
- Q: Is the model misspecified (any variable missing)?

Wu (Variable Addition) Test


• Under the FE assumptions, the common unobserved effect is
correlated with the group means.

Add the group means to the RE model. If they are statistically significant,
this suggests that the RE model is inappropriate.

• In a panel context, tests based on a regression can be
computationally more stable, since no problems with non-positive
definiteness are encountered.

• Since the errors and the unobserved effect may not be i.i.d. white noise,
Wooldridge (2009) suggests using PCSE.


Mundlak (Augmented) Regression (Greene)


+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|EXPBAR | -.08769*** .00162096 -54.099 .0000 19.853782|
|OCCBAR | -.14806*** .03623348 -4.086 .0000 .5111645|
|SMSABAR | .21707*** .03209640 6.763 .0000 .6537815|
|MSBAR | .14855*** .05087686 2.920 .0035 .8144058|
|UNYNBAR | .07831** .03257465 2.404 .0162 .3639856|
|WKSBAR | .00857** .00362039 2.367 .0179 46.811525|
|INDBAR | .03998 .02966215 1.348 .1777 .3954382|
|SOUTHBAR| -.05487 .04293224 -1.278 .2012 .2902761|
|EXP | .11448*** .00225862 50.684 .0000 19.853782|
|EXPSQ | -.00045*** .483957D-04 -9.304 .0000 514.40504|
|OCC | -.02122 .01380348 -1.537 .1243 .5111645|
|SMSA | -.04237** .01945829 -2.178 .0294 .6537815|
|MS | -.02969 .01901293 -1.561 .1184 .8144058|
|FEM | -.31359*** .05419945 -5.786 .0000 .1126050|
|UNION | .03268** .01494574 2.187 .0288 .3639856|
|ED | .05150*** .00550816 9.349 .0000 12.845378|
|BLK | -.15768*** .04463738 -3.533 .0004 .0722689|
|WKS | .00081 .00060031 1.354 .1759 46.811525|
|IND | .01909 .01546993 1.234 .2171 .3954382|
|SOUTH | -.00176 .03435229 -.051 .9592 .2902761|
|Constant| 5.15038*** .20122987 25.595 .0000 |
+--------+------------------------------------------------------------+

Wu Test: Application 1 (Greene)

--> matr;bm=b(1:8);vm=varb(1:8,1:8)$
--> matr;list;wutest=bm'<vm>bm$

Matrix WUTEST has 1 rows and 1 columns.


1
+--------------
1| 3006.13788
--> calc;list;ctb(.95,8)$
+------------------------------------+
| Listed Calculator Results |
+------------------------------------+
Result = 15.507313
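The matrix command above computes a Wald quadratic form in the coefficients on the eight group means. A rough Python equivalent is sketched below; the function name `wald_statistic` is hypothetical, and since the slides do not report the covariance matrix of the group-mean coefficients, the function is only exercised on a made-up example. The statistic and critical value are the ones reported in the output.

```python
import numpy as np

def wald_statistic(b, V):
    """Quadratic form b' V^(-1) b for a coefficient subvector b with covariance V."""
    b = np.asarray(b, dtype=float)
    return float(b @ np.linalg.solve(np.asarray(V, dtype=float), b))

# Made-up two-coefficient example, only to show the call.
w_example = wald_statistic([1.0, 2.0], np.diag([0.5, 2.0]))

CRIT_95_DF8 = 15.507313   # chi-squared critical value reported in the output above
wu = 3006.13788           # Wu statistic reported in the output above
reject_re = wu > CRIT_95_DF8
print(w_example, reject_re)
```

Since 3006.14 far exceeds 15.51, the group means are jointly significant and the RE specification is rejected in this application.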


Wu Test: Application 2 (Hoechle)


• Bid-Ask Spread Wu test estimation with PCSE’s. Stata code:

Wu Test: Application 2 (Hoechle)


• Bid-Ask Spread Wu test estimation with Driscoll and Kraay SE’s. Stata
code for auxiliary regression:

• Now, you cannot reject the REM at the 5% level. Here you can say,
“after accounting for cross-sectional and temporal dependence, the
Hausman test indicates that the coefficient estimates from pooled OLS
estimation are consistent.”

• Different PCSE’s can give different results.


DHW Specification Test: Remarks


• Issues with Hausman tests, as discussed in Wooldridge (2009):
(1) Failure to reject means either:
- FE and RE are similar, i.e., this is great!
- FE estimates are very imprecise
- Large differences from RE are nevertheless insignificant
- That can happen if the data are awful/noisy. Be careful.
(2) Watch for the difference between “statistical significance” and
“practical significance.”
- With a huge sample, the Hausman test may "reject" even though
RE is nearly the same as FE
- If differences are tiny, you can feel comfortable using the REM.
(3) PCSE’s matter ⇒ Q: Which ones to use?

Allison’s Hybrid Approach


• Allison (2009) suggests a ‘hybrid’ approach that provides the benefits
of FE and RE
– Also discussed in Gelman & Hill (2007) textbook
– Builds on idea of decomposing X into mean, deviation

Steps:
– 1. Compute case-specific mean variables
– 2. Transform X variables into deviations (within transformation)
– 3. Do not transform the dependent variable Y
– 4. Include both X deviation & X mean variables
– 5. Estimate with a RE model
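Steps 1 and 2 amount to splitting each regressor into a unit mean and a deviation from that mean. The sketch below shows only this decomposition, on made-up data; the function name `hybrid_decompose` is hypothetical, and the final RE estimation (step 5) is left to whatever panel routine is in use.

```python
import numpy as np

def hybrid_decompose(X, groups):
    """Split each column of X into a unit mean and a deviation from that mean."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    X_mean = np.empty_like(X)
    for g in np.unique(groups):
        rows = groups == g
        X_mean[rows] = X[rows].mean(axis=0)
    return X_mean, X - X_mean

groups = np.array([0, 0, 1, 1])
X = np.array([[1.0], [3.0], [10.0], [20.0]])
X_mean, X_dev = hybrid_decompose(X, groups)
print(X_mean.ravel(), X_dev.ravel())

# Step 4: both pieces enter the RE regression; y itself is left untransformed.
regressors = np.column_stack([X_dev, X_mean])
```

The deviation columns sum to zero within each unit, which is why their coefficients carry only within-unit information.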


Allison’s Hybrid Approach

• Benefits of hybrid approach:


– 1. Effects of “X-deviation” variables (within effects) are equivalent
to results from a FEM.
• All time-constant factors are controlled
– 2. Effects of time-constant X variables (between effects) can be estimated
– 3. You can build a general multilevel model
• Random slope coefficients; more than 2 level models…
– 4. You can directly test FE vs RE
– No Hausman test needed
• REM: X-mean and X-deviation coefficients should be equal
• Conduct a Wald test for equality of coefficients
– Also, differing X-mean & X-deviation coefficients are
informative.

Measurement Error

• It can have a severe effect on panel data models.


• It is no longer obvious that a panel data estimator is preferred to a
cross-section estimator.
• Measurement error often leads to “attenuation” of signal to noise
ratio in panels – biases coefficients towards zero.


Heteroskedasticity - Review
• Given that there is a cross-section component to panel data, there will
always be a potential for heteroskedasticity.

• Although there are various tests for heteroskedasticity, as with
autocorrelation there is a tendency to automatically use NW’s PCSE,
which addresses the problem for inference. This is fine for the FEM, but
not for the REM: the REM tells us the structure of the heteroskedasticity ⇒ GLS!

• Baltagi (1995) allows ui to be heteroskedastic. There is an efficiency
problem, however, since only one observation per individual (the estimated
error ui, repeated Ti times) will be used to estimate σ²u(i).

Autocorrelation - Review
• Although autocorrelation in panels differs from autocorrelation in the
usual univariate models, a version of the Breusch-Pagan LM test can be used.

• To deal with autocorrelated errors, we can use the usual methods, say
pseudo-differencing. In general, we will estimate ρ using the LSDV
residuals.

• If we allow ρ to vary with i, we lose power (in general, T is small).

• We can also reformulate the model by building a ‘Dynamic Model,’
which basically involves adding a lagged dependent variable.

• As usual, OLS plus NW’s PCSE can help you to avoid a complicated
FGLS estimation. (The usual problems with HAC SE apply.)


PCSE - Review
• Key Assumption
– Correlations within a cluster (a group of firms, a region, different
years for the same firm, different years for the same region) are
the same for different observations.
• Procedure
– (1) Identify clusters using economic theory (industry, year, etc.)
– (2) Calculate clustered standard errors
– (3) Try different ways of defining clusters and see how the
estimated SE are affected. Be conservative, report largest SE.
• Performance
– Not a lot of studies –some simulations done for simple DGPs.
– PCSE’s coverage rates are not very good (typically below their
nominal size).
– PCSE using HAR estimators is a good idea.

Dynamic Panel Models


• What if we have a dynamic process?
• Examples
– Cigarette consumption – lots of inertia.
– Behavioral finance/momentum models --lagged returns
matter.
– We might consider a model like:
$y_{it} = \gamma\,y_{it-1} + x_{it}'\beta + c_i + \varepsilon_{it}$
• Now, yit-1 is included as an explanatory variable ⇒ ci and yit-1
are correlated!
- Issue: FE and RE estimators are biased.
• Time-demeaned (or quasi-demeaned) yit-1 is correlated with the error.
• FE is biased for small T. It gets better as T gets bigger (30+).
• RE is also biased.
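The small-T bias of the within estimator can be seen in a short simulation. This is a sketch under assumed values (no x regressors, lag coefficient 0.5, unit-variance effects and errors, 5 periods), not a result from the lecture. Pooled OLS, which ignores ci, is included for comparison.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, gamma = 2000, 5, 0.5
burn = 50                                  # burn-in so y starts near stationarity

c = rng.normal(size=N)                     # unit effects c_i
y = np.zeros((N, T + burn))
for t in range(1, T + burn):
    y[:, t] = gamma * y[:, t - 1] + c + rng.normal(size=N)
y = y[:, burn:]                            # keep the last T periods

y_lag, y_cur = y[:, :-1], y[:, 1:]         # T - 1 usable periods per unit

# Within (FE) transformation: subtract unit means from both sides.
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
gamma_fe = np.sum(yl * yc) / np.sum(yl * yl)

# Pooled OLS with a constant, ignoring c_i.
gamma_ols = (np.cov(y_lag.ravel(), y_cur.ravel())[0, 1]
             / np.var(y_lag.ravel(), ddof=1))

print(gamma_fe, gamma, gamma_ols)
```

In this design the FE estimate falls well below the true value while pooled OLS lies above it, so the two bracket the true coefficient.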


Dynamic Panel Models


• One solution: Use FD and instrumental variables
– Strategy: If there is a problem between the error, εit, and the lag of yi,
let’s find a way to calculate a new version of the lag of yi that doesn’t
pose a problem
• Idea: Further lags of yi are not an issue in a FD model.
• Use them as “instrumental variables,” as a proxy for lag yi.
– Arellano-Bond (1991): GMM estimator
• A FD estimator.
• Lag of levels as an instrument for differenced yi.
– Arellano-Bover (1995)/Blundell-Bond: “System GMM”
• Expand on this by using lags of differences and levels as IVs.
• Generalized Method of Moments (GMM) estimation.
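The FD-plus-instrument idea can be illustrated with the simplest member of this family: a just-identified IV estimator that uses the level yi,t-2 as the instrument for the differenced lag (the Anderson-Hsiao approach). It is a simpler stand-in for, not an implementation of, the Arellano-Bond GMM estimator, which uses many more lags as instruments. The simulation design is assumed (no x regressors, lag coefficient 0.5).

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, gamma, burn = 5000, 8, 0.5, 50

c = rng.normal(size=N)                     # unit effects c_i
y = np.zeros((N, T + burn))
for t in range(1, T + burn):
    y[:, t] = gamma * y[:, t - 1] + c + rng.normal(size=N)
y = y[:, burn:]

dy = np.diff(y, axis=1)    # first differences remove c_i
dy_cur = dy[:, 1:]         # differenced y at time t
dy_lag = dy[:, :-1]        # differenced y at time t-1 (correlated with differenced error)
z = y[:, :-2]              # instrument: level of y at time t-2

# Just-identified IV: sum(z * dy_t) / sum(z * dy_{t-1}).
gamma_iv = np.sum(z * dy_cur) / np.sum(z * dy_lag)

# OLS on first differences, for comparison (inconsistent).
gamma_fd_ols = np.sum(dy_lag * dy_cur) / np.sum(dy_lag * dy_lag)

print(gamma_iv, gamma_fd_ols)
```

The level yi,t-2 is uncorrelated with the differenced error as long as the εit are serially uncorrelated, which is why it qualifies as an instrument here.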

Dynamic Panel Models: GMM


(Arellano/Bond/Bover, Journal of Econometrics, 1995)
$y_{i,t} = \beta'x_{i,t} + \delta\,y_{i,t-1} + \varepsilon_{i,t} + u_i$
Dynamic random effects model for panel data.
Can't use least squares to estimate consistently. Can't use FGLS without
estimates of parameters.
Many moment conditions: What is orthogonal to the period 1 disturbance?
$E[(\varepsilon_{i,1} + u_i)\,x_{i,1}] = 0$ = K orthogonality conditions, K+1 parameters
$E[(\varepsilon_{i,1} + u_i)\,x_{i,2}] = 0$ = K more orthogonality conditions, same K+1 parameters
...
$E[(\varepsilon_{i,1} + u_i)\,x_{i,T}] = 0$ = K orthogonality conditions, same K+1 parameters
The same variables are orthogonal to the period 2 disturbance.
There are hundreds, sometimes thousands of moment conditions, even for
fairly small models.


Dynamic Panel Models: GMM


• Key usual assumptions / issues
– Serial correlation of differenced errors limited to 1 lag
– No overidentifying restrictions (No Hansen - Sargan test)
– Q: How many instruments?

• Criticisms:
- Angrist and Pischke (2009): Assumptions are not always plausible.
- Allison (2009)
- Bollen and Brand (2010): Hard to compare models.

Dynamic Panel Models: Remarks I

• General remarks:
- Ignoring dynamics –i.e., lags– not a good idea: omitted variables
problem.
- It is important to think carefully about dynamic processes:
• How long does it take things to unfold?
• What lags does it make sense to include?
• With huge datasets, we can just throw lots in
– With smaller datasets, it is important to think things
through.


Dynamic Panel Models: IV Framework


• Traditional IV panel estimator:

$y_{it} = x_{it}'\beta + Y_{it}'\gamma + c_i + \varepsilon_{it}$

• X = exogenous covariates
• Y = other endogenous covariates (may be related to εit)
• ci = unobserved unit-specific characteristic
• εit = idiosyncratic error
– Treat ci as random, fixed, or use differencing to wipe it out
– Use contemporaneous or lagged X and (appropriate) lags of Y as
instruments in two-stage estimation of yit.

Note: This approach works well if lagged Y is plausibly exogenous.

Time Series Cross Section (TSCS) Data


• Time Series Cross Section (TSCS) Data
- Panel Data with large T, small N
- Example I: economic variables for industrialized countries
Often 10-30 countries
Often around 30 to 40 years of data
- Example II: financial variables
Often more than 1,000 firms
Often 40-50 years of data for well-established markets (10-30
for emerging markets).

– Beck’s (2001) advice:


• No specific minimum for T; but be suspicious of T<10
• Large N is not required (though, it does not hurt)


Time Series Cross Section (TSCS) Data


• Typical complications of TSCS Data
- Heteroskedasticity and autocorrelation
- Autocorrelation cannot be ignored.
- As N grows, the probability of cross-correlations
(contemporaneous and time-varying) also grows.
- Correlation at same time point across cases (world factor
affecting all markets)
- Correlation at different time points across cases (contagion
effects over time)

TSCS Data: OLS PCSE


• Beck and Katz (2001)
– “Old” view: Use FGLS to deal with heteroskedasticity &
correlated errors.
• Problem: This underestimates standard errors.

– New view: Use OLS regression
• With PCSE’s ⇒ To address panel heteroskedasticity
• With FE ⇒ To deal with unit heterogeneity
• With a lagged dependent variable in the model ⇒ To address serial
correlation


TSCS Data: Dynamics


• Beck and Katz (2009) examine dynamic models
– OLS PCSE with lagged Y and FE
• Still appropriate
• Better than some IV estimators
– But, did not compare to System GMM.

• Plumper, Troeger, and Manow (2005)


• FE is not theoretically justified and absorbs theoretically
important variance.
• Lagged Y absorbs theoretically important temporal variation
• Ideally, economic theory must guide model choices.

TSCS Data: Nonstationary Data

• Issue: Analysis of longitudinal (time-series) data is going through big
changes
• Realization that strongly trending data cause problems
– Random walk / unit root (ρ=1) / I(1) / non-stationary or
near integrated data.
– Mixing stationary, I(0) data, with I(1) data.
• The “spurious regression” problem.

– Strategies:
• Tests for unit roots in time series & panel data
• Differencing as a solution
– A reason to try FD models.


Panel Data: Final Remarks


1. Panel data strategies are taught as “fixes”
– How do I “fix” unobserved effects?
– How do I “fix” dynamics/serial correlation?
• But, the fixes really change what you are modeling
• A FE (within) model is a very different look at your data,
compared to pooled OLS.
• Goal: learn the “fixes.” But, think about interpretation.

2. Lots of disagreements in literature


• What is the best “fix”?
• Reformulation of model; final model should have “no
problems” –LSE approach.

Panel Data: Final Remarks


3. Very important: Try a wide range of models
• If your findings are robust, you are doing fine.
• If not, differences may help you figure out a better model.
• In both cases, you will not get “surprised” when your results go
away after following the suggestion of a referee!
