00 Panels1e
Robert M. Kunst
University of Vienna
February 2011
1 Introduction
The word panel is derived from Dutch and originally describes a rectangular
board. In econometrics, it denotes data sets that have a time dimension as
well as a non-time dimension. A genuine panel has the form
Xit , i = 1, . . . , N, t = 1, . . . , T. (1)
(N > T ). Time-series panels are common in macroeconomics, while cross-
section panels are dominant in microeconomics, particularly in labor eco-
nomics. Note, however, that many data sets in labor economics do not corre-
spond to the definition of genuine panels (for example, individual histories of
persons). Typical examples for macroeconomic time-series panels are com-
parative cross-country studies of macroeconomic relationships (consumption
function, Phillips curve). Then, i denotes a country.
Advantages of panel analysis vis-a-vis pure time-series studies are
rooted in the larger sample size, as N T > T for N > 1. The additional infor-
mation may imply an increase in the degrees of freedom for estimating model
parameters and for conducting hypothesis tests. However, this increase in
degrees of freedom requires assumptions on the constancy of such parameters
across the cross-section dimension. For example, the regression
yit = α + β ′ Xit + uit , (2)
with dim (β) = K, will yield more precise information than a regression
based on a single time series, if N T data points are available, as the degrees
of freedom increase from T − K − 1 to N T − K − 1. However, if one allows
α to differ across individuals, i.e. if one replaces α by αi , degrees of freedom
will fall to (T − 1) N − K. If one even replaces the fixed coefficients β by
individual-specific βi , degrees of freedom will be a mere (T − K − 1) N . If
α and β are even allowed to change over time or if correlation across indi-
viduals is permitted in the error term u, all benefits from panel analysis will
definitely disappear. Similar remarks hold when comparing panel analysis
to pure cross-section analysis. Some authors (see Baltagi, Wooldridge)
also argue that panels with an individual dimension generally are more in-
formative than aggregate time series. Note, however, that this argument
may depend on the research aim and that the analysis of macroeconomic
aggregate data can be valuable.
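The degrees-of-freedom counts quoted above can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and the dimensions N = 18, T = 19, K = 3 are chosen to match the gasoline example of Section 2.4.

```python
def degrees_of_freedom(N, T, K):
    """Residual degrees of freedom for the specifications discussed in the text."""
    return {
        "single time series": T - K - 1,
        "pooled, common alpha and beta": N * T - K - 1,
        "individual alpha_i, common beta": (T - 1) * N - K,
        "individual alpha_i and beta_i": (T - K - 1) * N,
    }

# Illustrative dimensions (assumed): 18 individuals, 19 periods, 3 regressors
df = degrees_of_freedom(N=18, T=19, K=3)
for name, value in df.items():
    print(f"{name}: {value}")
```

The gain from pooling shrinks as more coefficients are allowed to vary across individuals, but in this example it remains sizeable even with individual-specific slopes.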
Disadvantages of panel analysis are mirror images of the advantages.
If α is assumed constant, while it really varies across individuals as αi , this
will result in a biased estimate of β. The same holds with regard to the
assumptions on β. If β varies as βi across individuals, then an estimate β̂
can at best represent an approximation to an average of the true coefficients
βi . Therefore, treating data sets as panels will only make sense, if it is a priori
plausible to assume that (parts of the) relationships are constant across the
individual dimension. The situation is comparable to standard regression
analysis, which also needs some rudimentary assumption on the constancy
of the underlying linear relationship.
Most econometric textbooks contain sections on panels. More details are
found in the specialized literature. The book by Hsiao is a classic and has
recently been re-edited. The popular book by Baltagi contains more formal
derivations as well as hands-on empirical examples. Arellano collects many
recently developed methods for dynamic panel analysis. This book may be
less accessible to beginners and to many practitioners due to its emphasis on
mathematical formalism.
2 Fixed effects
2.1 The LSDV estimator
At first, we consider the regression model
yit = α + β ′ Xit + uit
as our basic model. Note that typically a third index may be added to the
subscripts i and t, as the vector of regressors Xit and thus β have dimension
K.
If the errors uit are independent across time and across individuals with
$\mathrm{E}\,u = 0$ and $\mathrm{var}\,u = \sigma^2$, then this is a traditional econometric regression model
that can be estimated via OLS. In panel analysis, this—usually quite restric-
tive and unrealistic—model is called ‘pooled regression’. The more common
assumption is that regression constants vary across individuals (countries,
firms, ...). In this case, many texts on panels use the notation αi in lieu
of α. Alternatively, one may subtract the mean across individuals from αi
and subsume the deviations $\mu_i = \alpha_i - N^{-1}\sum_{j=1}^{N}\alpha_j$ in the disturbances u.
Observe $\sum_{i=1}^{N}\mu_i = 0$. The individual-specific constants αi are called effects.
With these specifications, the model obtains the form
yit = α + β ′ Xit + µi + νit ,
where various assumptions can be used for the properties of µi and νit . The
simplest assumption is that all µi are fixed unknown values. Thus, the µi
become model parameters, together with α and β, which however can only
be estimated if T gets large, not for N → ∞. In statistics, such parameters
on which information does not increase as the sample size grows are called
incidental parameters. An important assumption is $\sum_{i=1}^{N}\mu_i = 0$, otherwise
the global intercept α is not identified. Note that this assumption does
not represent a restriction but it is necessary in order to make the model
empirically valid. Because the µi are fixed values, the model is called the
‘fixed-effects’ model (FE).
To keep the presentation simple, at first we assume that the remaining
errors νit are independent with $\mathrm{E}\,\nu_{it} = 0$ and $\mathrm{var}\,\nu_{it} = \sigma^2$. A useful generalization
would be to assume that $\mathrm{var}\,\nu_{it} = \sigma_i^2$, i.e. that νit are heteroskedastic
across the individual dimension. In some applications, it may also make sense
to allow for correlation of errors across individuals.
For this FE model, we wish to construct an efficient estimator. It will cer-
tainly not be simple OLS, as Euit = µi , which violates one of the assumptions
for the Gauss-Markov theorem. Rather, one may interpret µi as coefficients
of individual dummy variables that equal 1 for i and are 0 otherwise. The
N –vector that contains all these dummies will be denoted as Zµ,it . For every
given i and t, there will be exactly one element of 1 in this vector, while all
other elements are 0. With these definitions, we have the regression
yit = α + β ′ Xit + µ′ Zµ,it + νit ,
or, in matrix notation, after stacking all observations of each variable:
y = αιN T + Xβ + Zµ µ + ν.
The vector y has dimension N T . Its first T entries are the observations for the
first individual i = 1, followed by the T observations for the second individual, etc. The vector
ιN T simply consists of N T ones. The matrix X has dimension N T × K, and
the vector β has dimension K. The matrix Zµ has dimension N T × N and
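a very simple structure:
$$
Z_\mu = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}.
$$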
The vector µ has dimension N and contains the individual deviations from
the global level α, i.e. µi .
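The dummy matrix Zµ can be generated as a Kronecker product of an identity matrix and a column of ones. The following NumPy sketch uses small, arbitrarily chosen dimensions and hypothetical effect values; it only illustrates the structure described above.

```python
import numpy as np

N, T = 3, 4  # small illustrative dimensions (assumed)

# Each individual contributes a block of T ones in its own column
Z_mu = np.kron(np.eye(N), np.ones((T, 1)))

print(Z_mu.shape)         # (N*T, N)
print(Z_mu.sum(axis=1))   # every row contains exactly one 1
print(Z_mu.sum(axis=0))   # every column contains T ones

# Z_mu @ mu repeats each individual effect T times
mu = np.array([-1.0, 0.25, 0.75])  # hypothetical effects that sum to zero
print(Z_mu @ mu)
```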
Collecting the matrices X and Zµ in a big N T ×(K + N )–matrix Z allows
representing the OLS estimator as (Z′ Z)−1 Z′ y (without a constant, to avoid
the singularity problem). Because this estimator involves an inversion of an
(N + K) × (N + K)–matrix, which can be a really big matrix, it is more con-
venient to apply an algebraic result that is nicknamed the Frisch-Waugh the-
orem in the econometric literature. According to the Frisch-Waugh theorem,
the coefficient estimate β̂ for any given OLS regression (in this derivation,
the notation does not match the remainder of the text)
y = Xβ + Zγ + u
can also be obtained in two steps. First, y and each column xj of X are
regressed on Z:
y = Zδy + ỹ,
xj = Zδxj + x̃j .
Then, the residuals from the first-step regressions are regressed on each other
(X̃ collects the columns x̃j , j = 1, . . . , K)
ỹ = X̃β + u.
This two-step procedure yields the same (numerically identical) estimate
β̂ and the same residuals û.
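The Frisch-Waugh result is easily checked numerically. The sketch below uses simulated data with arbitrary, assumed dimensions and coefficients; the helper `ols` is a hypothetical convenience function, not part of any package.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, M = 60, 2, 3                       # illustrative sizes (assumed)
X = rng.normal(size=(n, K))              # regressors of interest
Z = np.column_stack([np.ones(n), rng.normal(size=(n, M - 1))])  # nuisance regressors
y = X @ np.array([1.0, -0.5]) + Z @ np.array([2.0, 0.3, -1.0]) + rng.normal(size=n)

def ols(A, b):
    """Least-squares coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

# One-step regression of y on [X, Z]; the first K coefficients belong to X
XZ = np.column_stack([X, Z])
full = ols(XZ, y)
beta_full = full[:K]
resid_full = y - XZ @ full

# Two-step procedure: purge Z from y and from each column of X,
# then regress the residuals on each other
y_tilde = y - Z @ ols(Z, y)
X_tilde = X - Z @ ols(Z, X)
beta_fw = ols(X_tilde, y_tilde)
resid_fw = y_tilde - X_tilde @ beta_fw

print(np.allclose(beta_full, beta_fw))    # identical coefficient estimates
print(np.allclose(resid_full, resid_fw))  # identical residuals
```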
If the Frisch-Waugh theorem is applied to the FE model, one may first
regress all variables y and X—these are essentially the interesting variables—
on the uninteresting constants and dummies. Then, the ‘purged’ ỹ and X̃
are regressed on each other. This two-step sequence will then yield the FE
estimator
$$\hat\beta_{FE} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y,$$
where only a K × K–matrix is inverted. Note that the residuals are just the
variables that have been adjusted for (or ‘purged from’) the individual time
averages.
In order to attain a direct closed form in the original variables, we have to
determine the matrix that performs the purging operation on the variables.
Baltagi calls this matrix Q. It has the form
$$Q = I_{NT} - T^{-1}\, I_N \otimes \iota_T \iota_T', \tag{6}$$
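where ⊗ denotes the Kronecker product. In a Kronecker product, the left
factor determines the coarse (block) structure and the right factor determines
the fine structure. [Mnemonic: most people are right-handed and can only
perform coarse work with their left hands]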
Here and in the following, ι denotes a vector of ones, I is an identity matrix.
Wherever it eases understanding, dimensions are denoted by subscript
indices. Thus, ιι′ is a square matrix of ones, I ⊗ ιι′ is a ‘block-diagonal’
matrix that repeats N blocks of form ιι′ along its main diagonal. Then, Q
has the representation
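$$
Q = \begin{pmatrix}
\tilde Q & 0 & \cdots & 0 \\
0 & \tilde Q & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \tilde Q
\end{pmatrix} = I_N \otimes \tilde Q
$$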
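The properties of the sweeping matrix Q can be verified on a small example. The sketch below assumes that each diagonal block equals the T × T matrix I − T⁻¹ιι′, which is what equation (6) implies; dimensions and data are arbitrary.

```python
import numpy as np

N, T = 3, 4                                   # illustrative dimensions (assumed)
iota_T = np.ones((T, 1))
Q_tilde = np.eye(T) - iota_T @ iota_T.T / T   # demeaning block for one individual
Q = np.kron(np.eye(N), Q_tilde)               # block-diagonal with N copies

# The same matrix written as in equation (6)
Q_eq6 = np.eye(N * T) - np.kron(np.eye(N), iota_T @ iota_T.T) / T

rng = np.random.default_rng(1)
y = rng.normal(size=N * T)                    # stacked: T observations per individual
Y = y.reshape(N, T)
y_demeaned = (Y - Y.mean(axis=1, keepdims=True)).ravel()

Z_mu = np.kron(np.eye(N), iota_T)             # individual dummies

print(np.allclose(Q, Q_eq6))                  # both forms agree
print(np.allclose(Q @ Q, Q))                  # Q is idempotent
print(np.allclose(Q @ y, y_demeaned))         # Q subtracts individual time averages
print(np.allclose(Q @ Z_mu, 0))               # the dummies are swept out
print(np.allclose(Q @ np.ones(N * T), 0))     # and so is the constant
```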
manipulations, note that the matrix Q eliminates the constant α as well as
the expression Zµ µ, such that Qy can be expressed by QXβ + Qν.
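$$
\begin{aligned}
\mathrm{var}\,\hat\beta_{FE} &= \mathrm{E}\,(\hat\beta_{FE}-\beta)(\hat\beta_{FE}-\beta)' \\
&= \mathrm{E}\,\{(X'QX)^{-1}X'Qy-\beta\}\{(X'QX)^{-1}X'Qy-\beta\}' \\
&= \mathrm{E}\,\{(X'QX)^{-1}X'(QX\beta+Q\nu)-\beta\}\{(X'QX)^{-1}X'Qy-\beta\}' \\
&= \mathrm{E}\,\{(X'QX)^{-1}X'Q\nu\}\{(X'QX)^{-1}X'Q\nu\}' \\
&= (X'QX)^{-1}X'Q\,(\mathrm{E}\,\nu\nu')\,Q'X\,(X'QX)^{-1} \\
&= \sigma^2 (X'QX)^{-1}
\end{aligned} \tag{9}
$$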
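As a hedged illustration of formula (9), the following sketch computes the FE estimator, an estimate of its variance matrix, and the implied t-values on simulated data. All dimensions and parameter values are assumptions; σ² is estimated from the within residuals with (T − 1)N − K degrees of freedom, in line with the count given in the introduction.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 8, 2                          # illustrative dimensions (assumed)
beta = np.array([1.5, -0.7])               # hypothetical true slopes
mu = rng.normal(size=N)
mu -= mu.mean()                            # effects sum to zero
# Regressors are made to correlate with the effects
X = rng.normal(size=(N * T, K)) + np.repeat(mu, T)[:, None]
y = 2.0 + X @ beta + np.repeat(mu, T) + 0.5 * rng.normal(size=N * T)

Q = np.kron(np.eye(N), np.eye(T) - np.ones((T, T)) / T)

XQX = X.T @ Q @ X
beta_fe = np.linalg.solve(XQX, X.T @ Q @ y)

# Residual variance from the within regression
resid = Q @ y - Q @ X @ beta_fe
sigma2_hat = resid @ resid / (N * (T - 1) - K)

var_beta_fe = sigma2_hat * np.linalg.inv(XQX)   # estimated version of (9)
std_err = np.sqrt(np.diag(var_beta_fe))
t_values = beta_fe / std_err
print(beta_fe, std_err, t_values)

# Cross-check: LSDV with N dummies and no constant gives the same slopes
Z = np.column_stack([X, np.kron(np.eye(N), np.ones((T, 1)))])
beta_lsdv = np.linalg.lstsq(Z, y, rcond=None)[0][:K]
print(np.allclose(beta_fe, beta_lsdv))
```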
2.4 An example
Baltagi analyzes a data set on modeling gasoline demand per car. The
dependent variable is explained by three covariates: per capita income, a
relative import price, cars per capita. Data are available for the years 1960–
1978 and for 18 countries. Thus, we have T = 19 and N = 18. The board is
nearly square.
Table 1 compares the results for OLS and for LSDV estimation. Both
estimation procedures entail very unsatisfactory Durbin-Watson statistics.
Country effects are not reported here. For example, Spain has a negative µ̂i ,
while Canada, Sweden, and the U.S. show the most positive values for µ̂i .
The three latter countries share the characteristic of regionally low population
density, where cars are used to travel long distances.
Apart from the t–values, which have been computed using $t_j = \hat\beta_j / \hat\sigma(\hat\beta_j)$
and the formulae described above for β̂ and its variance (we use $\hat\sigma(\hat\beta_j)$
for the estimated standard error of coefficient β̂j ), the table also shows
log-likelihood statistics. Note the considerable improvement of the likelihood
when the ‘pooled’ OLS regression is replaced by the FE model.
Obviously, the bias depends on µ. The estimator is unbiased for α, as the
row ι′ sums over µ. If µi = 0 for all i, the bias will disappear even for
β. The matrix X′ Zµ has dimension K × N and contains row sums of all
observations of the explanatory variables for each individual. Regressors
that are centered around zero will not cause any bias. For T → ∞, the
matrix $(X_\#' X_\#)^{-1} X_\#' Z_\mu$ converges to the coefficient estimate in a regression
of individual dummies on X# . If the covariates are sufficiently dispersed, the
bias should disappear for large T .
3 Random effects
3.1 The GLS estimator for the RE model
If N becomes large, such as in the typical microeconometric cross-section
panels, the number of estimated parameters of the FE model increases con-
siderably. This motivates the idea of viewing µi not as parameters, but
rather as unobserved variables with mean 0 and variance $\sigma_\mu^2$. For small N ,
it is less plausible to assume that individual characteristics have been gen-
erated randomly. For example, to some it may even seem absurd to explain
the differences of Sweden and Portugal by realizations from a probability
distribution.
The RE (random effects) model can be written as
that the dummy problem does not appear in the RE model by construction,
as the effects are specified to be stochastic with zero mean. Therefore, all
regressions can be conducted on the basis of the (N T × (K + 1))–matrix of
regressors X# that has been extended by a column of ones for the overall
intercept.
Using the assumption of independence for the error components, it follows
that
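$$
\begin{aligned}
\Omega &= \mathrm{E}\,(\mu+\nu)(\mu+\nu)' \\
&= \mathrm{E}\,\mu\mu' + \mathrm{E}\,\nu\nu' \\
&= \sigma_\mu^2\,(I_N \otimes J_T) + \sigma_\nu^2\, I_{NT}.
\end{aligned}
$$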
The two matrices P = IN ⊗J̄T and Q = IN T −IN ⊗J̄T are orthogonal to each
other and idempotent. Furthermore, they sum up to the identity matrix I:
$$PQ = QP = 0, \quad P^2 = P, \quad Q^2 = Q, \quad P + Q = I.$$
Therefore, Ω can be inverted componentwise, and the inverses are the original
matrices with inverted scales:
$$\Omega^{-1} = (T\sigma_\mu^2 + \sigma_\nu^2)^{-1}\,(I_N \otimes \bar J_T) + \sigma_\nu^{-2}\,(I_{NT} - I_N \otimes \bar J_T).$$
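The componentwise inversion can be confirmed numerically. In the sketch below, the variance components are hypothetical values, J_T is taken to be the T × T matrix of ones, and J̄_T = J_T /T.

```python
import numpy as np

N, T = 3, 4                          # illustrative dimensions (assumed)
sigma_mu2, sigma_nu2 = 2.0, 0.5      # hypothetical variance components

J_T = np.ones((T, T))
P = np.kron(np.eye(N), J_T / T)      # averages over time within each individual
Q = np.eye(N * T) - P                # subtracts these time averages

Omega = sigma_mu2 * np.kron(np.eye(N), J_T) + sigma_nu2 * np.eye(N * T)

# Decomposition of Omega along P and Q, and the componentwise inverse
Omega_decomposed = (T * sigma_mu2 + sigma_nu2) * P + sigma_nu2 * Q
Omega_inv = P / (T * sigma_mu2 + sigma_nu2) + Q / sigma_nu2

print(np.allclose(P @ Q, 0), np.allclose(P @ P, P), np.allclose(Q @ Q, Q))
print(np.allclose(Omega, Omega_decomposed))
print(np.allclose(Omega_inv, np.linalg.inv(Omega)))
```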
In summary, we obtain the following representation for the GLS estimator
in the RE model:
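$$
\begin{aligned}
&(X_\#'\Omega^{-1}X_\#)^{-1}X_\#'\Omega^{-1}y \\
&\quad= \left[X_\#'\left\{(T\sigma_\mu^2+\sigma_\nu^2)^{-1}P+\sigma_\nu^{-2}Q\right\}X_\#\right]^{-1}
 \times X_\#'\left\{(T\sigma_\mu^2+\sigma_\nu^2)^{-1}P+\sigma_\nu^{-2}Q\right\}y \\
&\quad= (X_\#'\tilde P'\tilde P X_\#)^{-1}X_\#'\tilde P'\tilde P y,
\end{aligned}
$$
where
$$\tilde P = \left(\sqrt{T\sigma_\mu^2+\sigma_\nu^2}\right)^{-1}P + \sigma_\nu^{-1}Q.$$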
It turns out that GLS in the RE model can be interpreted as a two-step procedure.
In the first step, all variables are transformed by P̃, and in a second
step one applies OLS to the transformed variables. The matrix P averages
the individual observations over time, while the matrix Q adjusts the obser-
vations for their time averages. The two coefficients in the error variances
indicate the weights of each of the two components. Noting that scales cancel
from the GLS expression, one may restrict attention to the relative weight
that is usually denoted by θ and is defined as
$$\theta = 1 - \frac{\sigma_\nu}{\sqrt{T\sigma_\mu^2 + \sigma_\nu^2}}.$$
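The two-step interpretation can be illustrated as follows. Multiplying P̃ by σν gives (1 − θ)P + Q = I − θP, so that each variable is transformed by subtracting θ times its individual time average. The sketch below treats the variance components as known, hypothetical values and compares direct GLS with OLS on the transformed data.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5, 6                              # illustrative dimensions (assumed)
sigma_mu2, sigma_nu2 = 1.0, 0.25         # hypothetical, treated as known

X_sharp = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, 2))])
mu = np.sqrt(sigma_mu2) * rng.normal(size=N)
nu = np.sqrt(sigma_nu2) * rng.normal(size=N * T)
y = X_sharp @ np.array([1.0, 0.8, -0.3]) + np.repeat(mu, T) + nu

P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P

# Direct GLS using the componentwise inverse of Omega
Omega_inv = P / (T * sigma_mu2 + sigma_nu2) + Q / sigma_nu2
beta_gls = np.linalg.solve(X_sharp.T @ Omega_inv @ X_sharp,
                           X_sharp.T @ Omega_inv @ y)

# OLS after subtracting theta times the individual time averages
theta = 1.0 - np.sqrt(sigma_nu2) / np.sqrt(T * sigma_mu2 + sigma_nu2)
X_star = X_sharp - theta * (P @ X_sharp)
y_star = y - theta * (P @ y)
beta_transformed = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

print(np.allclose(beta_gls, beta_transformed))
```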
1. If σµ2 = 0, then also θ = 0. The assumption Eµ = 0 yields µ = 0 and
the pooled OLS model is obtained. In this case, P̃1 = P + Q = I and
the GLS estimator becomes the OLS estimator, as it should be.
2. If σµ2 is very large, then θ approaches 1. The first term in the transfor-
mation P̃1 approaches 0 and P̃1 approaches Q, the fixed-effects sweep-
ing matrix. The RE estimator converges to the FE estimator. This
observation implies that fixed effects are equivalent to random effects
with very large variance.
3. Conversely, it is not possible in the RE model that only the first com-
ponent P appears in P̃1 . This would correspond to θ → ∞. The re-
gression uses the time averages of the data only. The above-mentioned
between estimator is obtained.
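The first two limiting cases can be traced numerically from the definition of θ. The function below is a hypothetical helper, and T = 19 is chosen only to match the gasoline example.

```python
import math

def theta(T, sigma_mu2, sigma_nu2):
    """Relative weight theta = 1 - sigma_nu / sqrt(T * sigma_mu^2 + sigma_nu^2)."""
    return 1.0 - math.sqrt(sigma_nu2) / math.sqrt(T * sigma_mu2 + sigma_nu2)

T = 19  # illustrative (assumed)
print(theta(T, 0.0, 1.0))   # no individual effects: theta = 0, pooled OLS
print(theta(T, 1.0, 1.0))   # intermediate case
print(theta(T, 1e8, 1.0))   # very large effect variance: theta close to 1, FE
```

For fixed positive variances, θ also increases with T , so that RE and FE estimates tend to be close in long panels.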
where û again come from preliminary OLS or LSDV regressions. From both
estimates, one may also compute an estimate for σµ2 according to
$$\hat\sigma_\mu^2 = T^{-1}\left(\hat\sigma_1^2 - \hat\sigma_\nu^2\right).$$
Occasionally, this value becomes negative, which then also results in a neg-
ative θ. For this comparatively rare case, the literature offers various and
occasionally conflicting recommendations.
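The mapping from the two variance estimates to σ̂µ² and θ̂ can be sketched as below. The sketch assumes that σ̂1² estimates T σµ² + σν², which is what the formula above implies; the function name and the numerical inputs are hypothetical.

```python
import math

def variance_components(sigma1_sq, sigma_nu_sq, T):
    """Return (sigma_mu_sq, theta) implied by estimates of sigma_1^2 and sigma_nu^2."""
    sigma_mu_sq = (sigma1_sq - sigma_nu_sq) / T
    theta = 1.0 - math.sqrt(sigma_nu_sq / sigma1_sq)
    return sigma_mu_sq, theta

# Regular case: sigma_1^2 exceeds sigma_nu^2
print(variance_components(sigma1_sq=4.0, sigma_nu_sq=1.0, T=6))

# Problematic case: sigma_1^2 falls below sigma_nu^2,
# so both the variance estimate and theta turn negative
print(variance_components(sigma1_sq=0.81, sigma_nu_sq=1.0, T=6))
```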
Baltagi considers an alternative procedure that follows Nerlove, who
estimates σµ2 directly from the fixed effects µ̂i of the LSDV estimation. Typ-
ically, all fGLS variants yield similar numerical results. Using iterations be-
tween estimating θ and fGLS regressions, one may approximate the maximum-
likelihood (ML) estimator. Many current computer programs use such iter-
ative ML estimators by default for the RE model.
if activated, causes an iteration to θ = 0.928. The value for the likelihood
function that is provided by STATA is considerably below the likelihood for
the FE model calculated by EViews, which may indicate that FE is a better
data description than RE (see also Part II of these notes).
Table 2: RE estimation for the gasoline panel according to Baltagi and Griffin,
using EViews. For comparison, LSDV is also reported.
4 Two-way panels
4.1 Fixed effects
In some cases, it may be attractive to extend the panel model by so-called
‘time effects’:
y = α + Xβ + Zµ µ + Zλ λ + ν. (12)
The matrix IN ⊗ J̄T is block-diagonal and subtracts time averages for each i.
The matrix J̄N ⊗ IT contains N × N identical blocks and subtracts, for each
time point t, the mean across individuals. The last term is an N T × N T
matrix whose elements all equal (N T )−1 .
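The following sketch assembles a two-way sweeping matrix from the three terms just described. It assumes the usual form Q2 = I − I_N ⊗ J̄_T − J̄_N ⊗ I_T + J̄_N ⊗ J̄_T and checks that it removes individual means and time means while adding back the overall mean; dimensions and data are arbitrary.

```python
import numpy as np

N, T = 3, 4                                # illustrative dimensions (assumed)
J_bar_N = np.ones((N, N)) / N
J_bar_T = np.ones((T, T)) / T

# Assumed form of the two-way within transformation
Q2 = (np.eye(N * T)
      - np.kron(np.eye(N), J_bar_T)        # subtract time averages for each i
      - np.kron(J_bar_N, np.eye(T))        # subtract cross-section means for each t
      + np.kron(J_bar_N, J_bar_T))         # add back the overall mean

rng = np.random.default_rng(4)
Y = rng.normal(size=(N, T))                # row i holds the T observations of individual i
double_demeaned = (Y - Y.mean(axis=1, keepdims=True)
                     - Y.mean(axis=0, keepdims=True) + Y.mean())

print(np.allclose(Q2 @ Y.ravel(), double_demeaned.ravel()))
print(np.allclose(Q2 @ Q2, Q2))                               # idempotent
print(np.allclose(np.kron(J_bar_N, J_bar_T), 1.0 / (N * T)))  # constant last term
```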
With this two-way Q2 , the FE estimator can be expressed as
$$\hat\beta = (X' Q_2 X)^{-1} X' Q_2 y. \tag{14}$$
Again, formally the FE estimator is reminiscent of a GLS estimator, although
Q2 is singular and cannot be the inverse of an error variance matrix.
Similarly, the variance matrix for the estimator is given as
$$\mathrm{var}\,\hat\beta = \sigma_\nu^2\,(X' Q_2 X)^{-1}, \tag{15}$$
where σν2 denotes the variance of the stochastic error νit . The FE estimator
is consistent for β and α, for µ (λ) only if T → ∞ (N → ∞).
Note that the expression for a4 is a bit involved. For N, T → ∞ the matrix
Ω−1 converges to the matrix Q2 that we found for the two-way FE model
in (13). It is easily checked directly that T a1 (T ) → a3 , N a2 (N ) → a3 ,
N T a4 (N, T ) → a3 , using some plausible notation. Thus, the GLS estimator
approaches the FE estimator. The exact proof that for (a) N → ∞, (b) T →
∞, (c) N/T → c ∈ R\ {0} both estimators are asymptotically equivalent,
including all distributional properties, is due to Wallace and Hussein.
Using estimates for all variances $\sigma_\lambda^2$, $\sigma_\mu^2$, $\sigma_\nu^2$ yields an estimator $\hat\Omega^{-1}$ for
the inverse variance matrix $\Omega^{-1}$. Then, the expression for the feasible GLS estimator
for the two-way RE–model reads:
$$\hat\beta_{\#,RE} = \left(X_\#' \hat\Omega^{-1} X_\#\right)^{-1} X_\#' \hat\Omega^{-1} y$$
The required variance estimates for the error components can be computed
from the (two-way) FE regression residuals, for example. By iteration, one
may again approximate the ML estimator.
4.3 An example
Many software routines, such as the older EViews version used here, do
not offer explicit methods for estimating two-way models. However, it is
typically easy to compute the FE estimate by generating dummy variables
for all time points. Table 3 compares the resulting estimator to a one-way
FE variant. By contrast, the two-way GLS estimator must be programmed
independently, which may require some programming skills.
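The dummy-variable route to the two-way FE estimate can be sketched in NumPy. The data are simulated with assumed dimensions and coefficients, and the comparison estimator uses the standard two-way within transformation, here taken to correspond to Q2. One time dummy is dropped because the full sets of individual and time dummies are collinear.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, K = 5, 6, 2                          # illustrative dimensions (assumed)
X = rng.normal(size=(N * T, K))
mu = rng.normal(size=N)                    # hypothetical individual effects
lam = rng.normal(size=T)                   # hypothetical time effects
y = (1.0 + X @ np.array([0.6, -1.2]) + np.repeat(mu, T) + np.tile(lam, N)
     + 0.3 * rng.normal(size=N * T))

D_ind = np.kron(np.eye(N), np.ones((T, 1)))    # individual dummies
D_time = np.kron(np.ones((N, 1)), np.eye(T))   # time dummies

# Regression on X, all individual dummies, and T - 1 time dummies
W = np.column_stack([X, D_ind, D_time[:, 1:]])
beta_dummy = np.linalg.lstsq(W, y, rcond=None)[0][:K]

# Same slopes from the two-way within transformation (assumed form of Q2)
J_bar_N = np.ones((N, N)) / N
J_bar_T = np.ones((T, T)) / T
Q2 = (np.eye(N * T) - np.kron(np.eye(N), J_bar_T)
      - np.kron(J_bar_N, np.eye(T)) + np.kron(J_bar_N, J_bar_T))
beta_q2 = np.linalg.solve(X.T @ Q2 @ X, X.T @ Q2 @ y)

print(np.allclose(beta_dummy, beta_q2))
```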
It is obvious that the two-way model estimation critically affects the
coefficients of the income and import price variables, which may be viewed
as dependent on the business cycle, whereas the influence of car ownership
is nearly unchanged. It is yet to be checked whether the improvement in
the likelihood justifies the transition to the two-way model. The reaction of
the DW statistic is disappointing. Adding time effects did not succeed in
reducing the autocorrelation in the errors appreciably. This suggests that
genuine dynamic modeling using lags of variables should be considered.
Table 3: FE and RE two-way estimates for the gasoline panel according to
Baltagi and Griffin. For a comparison, the one-way FE-estimator is also
reported.
References
[1] Arellano, M. (2003) Panel Data Econometrics, Oxford University
Press.
[4] Diggle, P., Heagerty, P., Liang, K.Y., and S. Zeger (2002)
Analysis of Longitudinal Data, 2nd Edition, Oxford University Press.
[6] Wallace, T.D., and A. Hussein (1969) ‘The use of error components
models in combining cross-section with time series data,’ Econometrica
37, 55–72.