Panel Data Notes
Dattatrya Shanke
Panel data regression is primarily used to address the problem of omitted variable bias. In this
chapter, the model assumes that there exists a time-invariant unobserved effect. This effect is treated
as a random variable drawn from the population along with the observed explained and explanatory
variables, not as a parameter to be estimated.
Let y and $\mathbf{x} = (x_1, \ldots, x_K)$ be observable random variables and c an unobserved random
variable. We are interested in estimating the partial effects of the $x_j$ on y, holding c constant.
$$E(y \mid \mathbf{x}, c) = \beta_0 + \mathbf{x}\boldsymbol{\beta} + c$$
Our interest lies in the K×1 vector $\boldsymbol{\beta}$. If c is uncorrelated with every explanatory
variable $x_j$, then c can be treated as just another random factor affecting y that does not distort
the estimated effect of $x_j$ on y. On the other hand, if c is correlated with some $x_j$, i.e.
$\mathrm{Cov}(x_j, c) \neq 0$ for some j, then leaving c out of the model (an omitted variable) causes
serious problems: the estimator of $\boldsymbol{\beta}$ obtained by regressing y on x is not consistent.
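A minimal NumPy simulation sketch of this inconsistency follows. The data-generating process (y = βx + c + u with x = γc + e and standard normal c, e, u) and the function names are hypothetical, chosen only for illustration. With β = 1 and γ = 0.5 the slope from the regression that omits c should converge to β + Cov(x, c)/Var(x) = 1 + 0.5/1.25 = 1.4 rather than 1; including c as a regressor, which is possible here only because c is simulated, recovers β.

```python
import numpy as np


def ols_coefficients(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def simulate_ovb(n=100_000, beta=1.0, gamma=0.5, seed=0):
    """Return the OLS slope on x with c omitted and with c included.

    Hypothetical DGP: y = beta*x + c + u and x = gamma*c + e,
    with c, e, u ~ N(0, 1), so Cov(x, c) = gamma.
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)               # unobserved effect
    x = gamma * c + rng.normal(size=n)   # correlated with c whenever gamma != 0
    u = rng.normal(size=n)               # idiosyncratic error
    y = beta * x + c + u
    ones = np.ones(n)
    slope_omitted = ols_coefficients(np.column_stack([ones, x]), y)[1]
    slope_controlled = ols_coefficients(np.column_stack([ones, x, c]), y)[1]
    return slope_omitted, slope_controlled


if __name__ == "__main__":
    print(simulate_ovb())           # correlated case: omitted slope away from beta
    print(simulate_ovb(gamma=0.0))  # uncorrelated case: both slopes close to beta
```

Setting gamma to 0 is the case where c is "just another random variable": both slopes should then be close to β.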
In the case of cross-sectional data, we can include in the model,
However, in the case of panel data, there are other ways to account for the omitted variable
problem.
A few assumptions are necessary to estimate β in the presence of an unobserved effect:
• c is time invariant, that is, c has the same effect on the mean response in every time
period.
An unobserved, time-constant variable is called an unobserved effect in panel data analysis. When
t represents different time periods for the same individual, the unobserved effect is often interpreted as
capturing features of an individual, such as cognitive ability, motivation, or early family upbringing,
that are given and do not change over time. Similarly, if the unit of observation is the firm, c contains
unobserved firm characteristics, such as managerial quality or structure, that can be viewed as being
(roughly) constant over the period in question.
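Let us write the model in error form for two time periods:
$$y_t = \beta_0 + \mathbf{x}_t\boldsymbol{\beta} + c + u_t, \qquad t = 1, 2 \qquad (1)$$
where $E(u_t \mid \mathbf{x}_t, c) = 0$ and $E(\mathbf{x}_t' u_t) = \mathbf{0}$, $t = 1, 2$.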
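Taking the difference across the two time periods eliminates the time-invariant c (and the intercept
$\beta_0$):
$$y_2 - y_1 = (\mathbf{x}_2 - \mathbf{x}_1)\boldsymbol{\beta} + (u_2 - u_1)$$
$$\Delta y = \Delta\mathbf{x}\boldsymbol{\beta} + \Delta u$$
OLS on this differenced equation estimates $\boldsymbol{\beta}$ consistently under two conditions: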
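• the orthogonality condition, $E(\Delta\mathbf{x}'\Delta u) = \mathbf{0}$, which holds if $\mathbf{x}_t$
and $u_s$ are uncorrelated for all $s, t = 1, 2$.
• the rank condition, $\operatorname{rank} E(\Delta\mathbf{x}'\Delta\mathbf{x}) = K$. This means that there
must be some variation in each $x_{tj}$ over time to be able to estimate β consistently.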
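The differencing argument can be checked with a small simulation. This is a sketch under assumed values (β = 2, β_0 = 1, x_it = 0.8c_i + e_it, all draws standard normal); none of these numbers come from the notes, and the function names are hypothetical. Pooled OLS that leaves c_i in the error should drift towards 2 + 0.8/1.64 ≈ 2.49, while OLS of Δy on Δx (no intercept, since β_0 cancels) should be close to 2.

```python
import numpy as np


def simulate_two_period_panel(n=50_000, beta=2.0, seed=1):
    """Hypothetical two-period panel: y_it = 1 + beta*x_it + c_i + u_it.

    x_it = 0.8*c_i + e_it, so the regressors are correlated with c_i.
    Returns x and y as arrays of shape (n, 2).
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)                           # time-invariant unobserved effect c_i
    x = 0.8 * c[:, None] + rng.normal(size=(n, 2))   # x_i1, x_i2
    u = rng.normal(size=(n, 2))                      # idiosyncratic errors u_i1, u_i2
    y = 1.0 + beta * x + c[:, None] + u              # beta_0 = 1.0
    return x, y


def pooled_ols_slope(x, y):
    """Pooled OLS of y_it on (1, x_it), leaving c_i in the error term."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    return np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]


def first_difference_slope(x, y):
    """OLS of (y_i2 - y_i1) on (x_i2 - x_i1) without an intercept."""
    dx = x[:, 1] - x[:, 0]   # beta_0 and c_i have been differenced away
    dy = y[:, 1] - y[:, 0]
    return (dx @ dy) / (dx @ dx)


if __name__ == "__main__":
    x, y = simulate_two_period_panel()
    print("pooled OLS:", pooled_ols_slope(x, y))
    print("first difference:", first_difference_slope(x, y))
```

If x_it did not change over time, Δx would be identically zero and `dx @ dx` would be zero: this is the sample counterpart of the rank condition failing.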
A panel is balanced when the same time periods are available for all cross-sectional units. All
the methods discussed here assume a balanced panel.
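Whether a panel is balanced can be checked mechanically. The sketch below assumes the data are stored as (unit, period) pairs in long format; that layout, the unit labels and the function name `is_balanced` are illustrative choices, not part of the notes.

```python
from collections import defaultdict


def is_balanced(records):
    """True if every cross-sectional unit is observed in the same set of periods.

    `records` is an iterable of (unit_id, period) pairs in long format.
    """
    periods_by_unit = defaultdict(set)
    for unit, period in records:
        periods_by_unit[unit].add(period)
    period_sets = list(periods_by_unit.values())
    # Compare every unit's set of periods with the first unit's set.
    return all(periods == period_sets[0] for periods in period_sets)


if __name__ == "__main__":
    balanced = [("farm_A", 2023), ("farm_A", 2024), ("farm_B", 2023), ("farm_B", 2024)]
    unbalanced = [("farm_A", 2023), ("farm_A", 2024), ("farm_B", 2024)]
    print(is_balanced(balanced), is_balanced(unbalanced))  # True False
```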
Here N (the cross-sectional dimension) is assumed to be large so that we can focus on asymptotic
properties, and the cross-sectional observations are assumed to be independent, identically distributed
draws from the population. T (the time dimension) is assumed to be fixed.
Asymptotic analysis with $N \to \infty$ and T fixed is most plausible when N is sufficiently large
relative to T, and it does not require any extra explicit assumptions. If T is much larger than N, say
N = 5 companies and T = 40 years, the framework becomes multiple time series analysis: N can be held
fixed while $T \to \infty$.
2 Assumptions about the Unobserved Effects and Explanatory Variables
• $c_i$ varies across i only, not over t. It is also called an unobserved component, a latent variable
or unobserved heterogeneity.
In the traditional approach to panel data models, $c_i$ is called a random effect when it is treated as
a random variable and a fixed effect when it is treated as a parameter to be estimated for each
cross-section observation i. However, as discussed in the previous section, it always makes sense to treat
the unobserved effect $c_i$ as a random variable drawn from the population, just like the observed variables.
In modern econometrics, the key issue involving $c_i$ is whether or not it is correlated with the
explanatory variables. Accordingly, "random effect" is synonymous with zero correlation between the
unobserved effect and the observed explanatory variables, whereas "fixed effect" means that the
unobserved effect $c_i$ is allowed to be arbitrarily correlated with the observed explanatory variables $\mathbf{x}_{it}$.
Strict Exogeneity Assumptions on the Explanatory Variables
We treat the explanatory variables as random. The strict exogeneity assumption can be interpreted
as follows: once $\mathbf{x}_{it}$ and $c_i$ are controlled for, $\mathbf{x}_{is}$ ($s \neq t$) has no partial
effect on $y_{it}$. In simple words, explanatory variables observed in periods other than the one in which
the outcome is observed have no effect on the outcome once the unobserved effect and that period's
explanatory variables are controlled for.
In contrast, if we do not control for the unobserved effect, then the explanatory variables in some
other period may have some effect on the outcome variable in the current period.
Example: soyabean output in 2024 is not affected by input use in 2023 once input use in 2024 and
the unobserved effect (such as soil quality or the farmer's managerial skill) are controlled for. However,
if the unobserved effect is not controlled for, then input use in 2023 may have some effect on soyabean
output in 2024.
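Formally, strict exogeneity of the explanatory variables conditional on the unobserved effect is
$$E(y_{it} \mid \mathbf{x}_{i1}, \ldots, \mathbf{x}_{iT}, c_i) = E(y_{it} \mid \mathbf{x}_{it}, c_i) = \mathbf{x}_{it}\boldsymbol{\beta} + c_i, \qquad t = 1, \ldots, T.$$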
Also important is the assumption that the explanatory variables in each time period are uncorrelated
with the idiosyncratic error in every time period: $E(\mathbf{x}_{is}'u_{it}) = \mathbf{0}$ for all $s, t = 1, \ldots, T$.
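The soyabean example can be mimicked with simulated data. In this sketch (hypothetical numbers and names throughout) x_1 and x_2 stand for input use in 2023 and 2024, c stands for soil quality or managerial skill, and y_2 = βx_2 + c + u_2, so x_1 has no partial effect once x_2 and c are controlled for. Because c is simulated it can be included as a regressor, which is impossible with real data. When c is left out, x_1 picks up part of c, and with x_t = ac + e_t its coefficient should be near a/(2a² + 1) ≈ 0.35 for a = 0.8.

```python
import numpy as np


def strict_exogeneity_demo(n=100_000, beta=1.0, a=0.8, seed=4):
    """Coefficient on last period's input x_1 in a regression for y_2.

    Hypothetical DGP: x_t = a*c + e_t (t = 1, 2) and y_2 = beta*x_2 + c + u_2.
    Returns (coefficient on x_1 with c omitted, coefficient on x_1 with c controlled for).
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)                 # e.g. soil quality or managerial skill
    x1 = a * c + rng.normal(size=n)        # input use in the earlier period
    x2 = a * c + rng.normal(size=n)        # input use in the current period
    y2 = beta * x2 + c + rng.normal(size=n)
    ones = np.ones(n)
    without_c = np.linalg.lstsq(np.column_stack([ones, x2, x1]), y2, rcond=None)[0]
    with_c = np.linalg.lstsq(np.column_stack([ones, x2, x1, c]), y2, rcond=None)[0]
    return without_c[2], with_c[2]


if __name__ == "__main__":
    print(strict_exogeneity_demo())  # first value clearly nonzero, second close to 0
```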
In short, two crucial questions arise when dealing with panel data: first, is the unobserved effect
correlated with the explanatory variables? Second, is the strict exogeneity assumption on the
explanatory variables reasonable?
As with pooled OLS, a random effects analysis puts $c_i$ into the error term. The random effects model,
however, imposes more restrictions than pooled OLS needs: it requires strict exogeneity in addition to
orthogonality between $c_i$ and $\mathbf{x}_{it}$.
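The second assumption, orthogonality between $c_i$ and the explanatory variables, $E(c_i \mid \mathbf{x}_i) = E(c_i) = 0$,
is always implied by the assumption that the $\mathbf{x}_{it}$ are fixed and $E(c_i) = 0$, or by the assumption
that $c_i$ is independent of $\mathbf{x}_i$.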
Why do we need the first assumption (strict exogeneity)?
The random effects approach exploits the serial correlation in the composite error, $v_{it} = c_i + u_{it}$, in a
generalized least squares (GLS) framework. To ensure that feasible GLS is consistent, we need
some form of strict exogeneity between the explanatory variables and the composite error.
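Stacking the T time periods for unit i, the model is
$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{v}_i,$$
where $\mathbf{v}_i$ collects the composite errors $v_{it} = c_i + u_{it}$.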
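For consistency of GLS, we need the usual GLS rank condition:
$$\operatorname{rank} E(\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{X}_i) = K,$$
where $\boldsymbol{\Omega} = E(\mathbf{v}_i\mathbf{v}_i')$ is the T×T variance matrix of the composite error
(this is analogous to $\operatorname{rank} E(\mathbf{x}'\mathbf{x}) = K$ for OLS).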
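The rank condition above belongs to the GLS estimator $\hat{\boldsymbol\beta} = (\sum_i \mathbf{X}_i'\boldsymbol\Omega^{-1}\mathbf{X}_i)^{-1}\sum_i \mathbf{X}_i'\boldsymbol\Omega^{-1}\mathbf{y}_i$. The NumPy sketch below treats Ω as known (infeasible GLS, not the feasible version) and assumes the usual random effects structure Ω = σ_u² I_T + σ_c² 1_T 1_T'. The simulated data, parameter values and function names are hypothetical. The explicit rank check is the sample counterpart of the condition in the text.

```python
import numpy as np


def re_omega(T, sigma2_c, sigma2_u):
    """Assumed RE structure: Omega = sigma2_u * I_T + sigma2_c * 1_T 1_T'."""
    return sigma2_u * np.eye(T) + sigma2_c * np.ones((T, T))


def gls_beta(X, y, omega):
    """GLS: (sum_i X_i' Omega^{-1} X_i)^{-1} (sum_i X_i' Omega^{-1} y_i).

    X has shape (N, T, K) and y has shape (N, T).
    """
    omega_inv = np.linalg.inv(omega)
    A = np.einsum("ntk,ts,nsl->kl", X, omega_inv, X)   # K x K
    b = np.einsum("ntk,ts,ns->k", X, omega_inv, y)     # length K
    if np.linalg.matrix_rank(A) < X.shape[2]:
        raise ValueError("rank condition fails: sum_i X_i' Omega^{-1} X_i is singular")
    return np.linalg.solve(A, b)


def simulate_re_panel(n=20_000, T=4, beta=(1.0, 0.5), sigma2_c=1.0, sigma2_u=1.0, seed=2):
    """Hypothetical RE data: x_it independent of c_i, constant term included in X."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, T))
    X = np.stack([np.ones((n, T)), x], axis=2)             # shape (N, T, 2)
    c = rng.normal(scale=np.sqrt(sigma2_c), size=(n, 1))   # c_i, same in every period
    u = rng.normal(scale=np.sqrt(sigma2_u), size=(n, T))   # idiosyncratic errors
    y = X @ np.asarray(beta) + c + u
    return X, y


if __name__ == "__main__":
    X, y = simulate_re_panel()
    print(gls_beta(X, y, re_omega(4, 1.0, 1.0)))  # should be close to (1.0, 0.5)
```

Duplicating a regressor (or including two constants) makes the K×K matrix singular, and the function then raises instead of returning an estimate.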
The assumptions on the idiosyncratic error in the random effects model are:
• The first assumption is that the idiosyncratic errors have constant unconditional variance across t:
$E(u_{it}^2) = \sigma_u^2$, $t = 1, 2, \ldots, T$.
• The second assumption is that the idiosyncratic errors are serially uncorrelated: $E(u_{it}u_{is}) = 0$
for all $t \neq s$.
• The third assumption, $E(\mathbf{u}_i\mathbf{u}_i' \mid \mathbf{x}_i, c_i) = \sigma_u^2\mathbf{I}_T$, states that the
conditional variances are constant and the conditional covariances are zero.
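Because $c_i$ and $u_{it}$ are uncorrelated, the variance of the composite error is
$\sigma_v^2 = E(v_{it}^2) = \sigma_c^2 + \sigma_u^2$, where $\sigma_c^2 = E(c_i^2)$.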
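The variance of the composite error, and the serial correlation that random effects GLS exploits, can be checked by simulation. Under the assumptions listed, $\mathrm{Cov}(v_{it}, v_{is}) = \sigma_c^2$ for $t \neq s$, so $\mathrm{Corr}(v_{it}, v_{is}) = \sigma_c^2/(\sigma_c^2 + \sigma_u^2)$. The sketch uses hypothetical values σ_c² = 0.5 and σ_u² = 1.5 with normal draws, for which the average variance should be near 2, the average covariance near 0.5 and the correlation near 0.25.

```python
import numpy as np


def composite_error_moments(n=200_000, T=3, sigma2_c=0.5, sigma2_u=1.5, seed=3):
    """Simulate v_it = c_i + u_it; return (avg variance, avg covariance, correlation).

    Hypothetical normal draws; c_i is independent of u_it.
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(scale=np.sqrt(sigma2_c), size=(n, 1))   # one draw per unit, repeated over t
    u = rng.normal(scale=np.sqrt(sigma2_u), size=(n, T))   # serially uncorrelated
    v = c + u
    cov = np.cov(v, rowvar=False)                          # T x T sample covariance
    var_v = np.diag(cov).mean()                            # should be near sigma2_c + sigma2_u
    cov_ts = cov[~np.eye(T, dtype=bool)].mean()            # should be near sigma2_c
    return var_v, cov_ts, cov_ts / var_v


if __name__ == "__main__":
    print(composite_error_moments())
```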
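Given the exogeneity assumptions (strict exogeneity and orthogonality between $c_i$ and $\mathbf{x}_i$) and the
rank condition, the random effects estimator of $\boldsymbol{\beta}$ is consistent: $\hat{\boldsymbol{\beta}}$ converges in
probability to the true $\boldsymbol{\beta}$ as $N \to \infty$ with T fixed.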
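When the assumptions on the idiosyncratic error also hold, the random effects estimator is asymptotically
efficient within the class of estimators that are consistent under $E(\mathbf{v}_i \mid \mathbf{x}_i) = \mathbf{0}$, i.e.
when the composite error is uncorrelated with the explanatory variables; this class includes pooled OLS.
Under these assumptions, the feasible GLS (random effects) estimator is asymptotically equivalent to GLS
with known $\boldsymbol{\Omega}$.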
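When $\sigma_c^2$ is estimated from the pooled OLS residuals $\hat v_{it}$ by averaging the cross products
$\hat v_{it}\hat v_{is}$ over the $NT(T-1)/2$ distinct pairs with $t < s$, the degrees of freedom are $NT(T-1)/2 - K$.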
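A sketch of how σ_c² might be estimated with the NT(T − 1)/2 − K degrees-of-freedom correction, following the standard textbook approach as we understand it: run pooled OLS, form residuals v̂_it, estimate σ_v² by Σ_i Σ_t v̂_it² / (NT − K), estimate σ_c² by the sum of the cross products v̂_it v̂_is over t < s divided by NT(T − 1)/2 − K, and set σ̂_u² = σ̂_v² − σ̂_c². The simulated data and all function names are hypothetical. In finite samples σ̂_c² can come out negative; the sketch does not guard against that.

```python
import numpy as np


def simulate_re_panel(n=20_000, T=4, beta=(1.0, 0.5), sigma2_c=1.0, sigma2_u=1.0, seed=5):
    """Hypothetical RE data with a constant term: X has shape (N, T, 2), y has shape (N, T)."""
    rng = np.random.default_rng(seed)
    X = np.stack([np.ones((n, T)), rng.normal(size=(n, T))], axis=2)
    c = rng.normal(scale=np.sqrt(sigma2_c), size=(n, 1))
    u = rng.normal(scale=np.sqrt(sigma2_u), size=(n, T))
    return X, X @ np.asarray(beta) + c + u


def variance_components(X, y):
    """Estimate (sigma2_v, sigma2_c, sigma2_u) from pooled OLS residuals."""
    N, T, K = X.shape
    b_pols = np.linalg.lstsq(X.reshape(N * T, K), y.reshape(N * T), rcond=None)[0]
    v_hat = y - X @ b_pols                                  # (N, T) pooled OLS residuals
    sigma2_v = (v_hat ** 2).sum() / (N * T - K)
    # For each i: sum_{t<s} v_it * v_is = ((sum_t v_it)^2 - sum_t v_it^2) / 2
    cross = ((v_hat.sum(axis=1) ** 2 - (v_hat ** 2).sum(axis=1)) / 2).sum()
    sigma2_c = cross / (N * T * (T - 1) / 2 - K)            # df = NT(T-1)/2 - K
    return sigma2_v, sigma2_c, sigma2_v - sigma2_c


if __name__ == "__main__":
    print(variance_components(*simulate_re_panel()))  # roughly (2.0, 1.0, 1.0)
```

The estimates σ̂_c² and σ̂_u² are what feasible GLS plugs into Ω in place of the unknown variances.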