Econometric Methods for Panel Data — Part I

Robert M. Kunst
University of Vienna
February 2011

1 Introduction
The word panel is derived from Dutch and originally describes a rectangular
board. In econometrics, it denotes data sets that have a time dimension as
well as a non-time dimension. A genuine panel has the form

\[ X_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T. \tag{1} \]

Thus, it can really be represented in a rectangular form, like a board. Dimension
i is called the ‘individual dimension’, t is the time dimension. X
can be a scalar (real) variable or also a vector-valued variable. Often, data
sets do not correspond exactly to this pattern, even though they have similar
dimensions i and t. For example, t may denote an individual time dimension
rather than a common time. Such data sets are sometimes called longitudi-
nal data (see Diggle et al.). It may also occur that there exists a common
time index but the physical identity of individuals changes over time. Such
data sets are called repeated cross sections (RCS) or pseudo panels (see also
Carraro et al.). If lengths of time series depend on i, thus violating the
rectangular shape, panels are called unbalanced.
The dimensions of width and length of the ‘board’ are not exchangeable
coordinates. The time index t is logically ordered, while the individual index
is not. Thus, it may make sense to admit a correlation structure over time,
for Xit and Xi,t−1 , while still assuming independence across individuals Xit
and Xi−1,t . The time index t has a quite similar meaning in diﬀerent panels:
it may denote years, days, hours etc. By contrast, the interpretation of the
subscript i varies a lot across applications. It may refer to persons, countries,
ﬁrms, municipalities, trees, lab animals.
According to the positioning of the board (portrait or landscape), one
may also distinguish time-series panels (T > N) and cross-section panels
(N > T ). Time-series panels are common in macroeconomics, while cross-
section panels are dominant in microeconomics, particularly in labor eco-
nomics. Note, however, that many data sets in labor economics do not corre-
spond to the definition of genuine panels (for example, individual histories of
persons). Typical examples for macroeconomic time-series panels are com-
parative cross-country studies of macroeconomic relationships (consumption
function, Phillips curve). Then, i denotes a country.
Advantages of panel analysis vis-a-vis pure time-series studies are
rooted in the larger sample size, as N T > T for N > 1. The additional infor-
mation may imply an increase in the degrees of freedom for estimating model
parameters and for conducting hypothesis tests. However, this increase in
degrees of freedom requires assumptions on the constancy of such parameters
across the cross-section dimension. For example, the regression
\[ y_{it} = \alpha + \beta' X_{it} + u_{it}, \tag{2} \]
with dim (β) = K, will yield more precise information than a regression
based on a single time series, if N T data points are available, as the degrees
of freedom increase from T − K − 1 to N T − K − 1. However, if one allows
α to differ across individuals, i.e. if one replaces α by αi , degrees of freedom
will fall to (T − 1) N − K. If one even replaces the fixed coefficients β by
individual-specific βi , degrees of freedom will be a mere (T − K − 1) N . If
α and β are even allowed to change over time, or if correlation across individuals
is permitted in the error term u, the benefits from panel analysis may
disappear altogether. Similar remarks hold when comparing panel analysis
to pure cross-section analysis. Some authors (see Baltagi, Wooldridge)
also argue that panels with an individual dimension generally are more in-
formative than aggregate time series. Note, however, that this argument
may depend on the research aim and that the analysis of macroeconomic
aggregate data can be valuable.
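The degrees-of-freedom counts above can be tabulated with a few lines of code. The following is only an illustrative sketch: the function name and dictionary keys are hypothetical, and the dimensions N = 18, T = 19, K = 3 are those of the gasoline example in Section 2.4.

```python
def degrees_of_freedom(N, T, K):
    """Residual degrees of freedom for the specifications discussed in the text."""
    return {
        "single_series": T - K - 1,            # one time series, one intercept
        "pooled": N * T - K - 1,               # common alpha and beta
        "fixed_effects": (T - 1) * N - K,      # individual alpha_i, common beta
        "individual_slopes": (T - K - 1) * N,  # individual alpha_i and beta_i
    }

dof = degrees_of_freedom(N=18, T=19, K=3)
for name, value in dof.items():
    print(f"{name}: {value}")
```

With fully individual-specific coefficients, the count equals N times the single-series count, so pooling yields no gain per individual.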
Disadvantages of panel analysis are mirror images of the advantages.
If α is assumed constant, while it really varies across individuals as αi , this
will generally result in a biased estimate of β. The same holds with regard to the
assumptions on β. If β varies as βi across individuals, then an estimate β̂
can at best represent an approximation to an average of the true coefficients
βi . Therefore, treating data sets as panels will only make sense, if it is a priori
plausible to assume that (parts of the) relationships are constant across the
individual dimension. The situation is comparable to standard regression
analysis, which also needs some rudimentary assumption on the constancy
of the underlying linear relationship.
Most econometric textbooks contain sections on panels. More details are
found in the specialized literature. The book by Hsiao is a classic and has recently
been re-edited. The popular book by Baltagi contains more formal derivations
as well as hands-on empirical examples. Arellano collects many
recently developed methods for dynamic panel analysis. This book is perhaps
less accessible to beginners and many practitioners because of its emphasis on
mathematical formalism.

2 Fixed eﬀects
2.1 The LSDV estimator
At ﬁrst, we consider the regression model

\[ y_{it} = \alpha + \beta' X_{it} + u_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T \]

as our basic model. Note that typically a third index may be added to the
subscripts i and t, as the vector of regressors Xit and thus β have dimension
K.
If the errors uit are independent across time and across individuals with
Eu = 0 and var u = σ², then this is a traditional econometric regression model
that can be estimated via OLS. In panel analysis, this model, which is usually quite
restrictive and unrealistic, is called ‘pooled regression’. The more common
assumption is that regression constants vary across individuals (countries,
firms, ...). In this case, many texts on panels use the notation αi in lieu
of α. Alternatively, one may subtract the mean across individuals from αi
and subsume the deviations
\[ \mu_i = \alpha_i - N^{-1} \sum_{j=1}^{N} \alpha_j \]
in the disturbances u. Observe $\sum_{i=1}^{N} \mu_i = 0$. The individual-specific
constants αi are called effects.
With these specifications, the model obtains the form

\[ y_{it} = \alpha + \beta' X_{it} + u_{it}, \]
\[ u_{it} = \mu_i + \nu_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T, \tag{3} \]

where various assumptions can be used for the properties of µi and νit . The
simplest assumption is that all µi are fixed unknown values. Thus, the µi
become model parameters, together with α and β, which however can only
be estimated if T gets large, not for N → ∞. In statistics, such parameters
on which information does not increase as the sample size grows are called
incidental parameters. An important assumption is $\sum_{i=1}^{N} \mu_i = 0$, otherwise
the global intercept α is not identified. Note that this assumption does
not represent a restriction but it is necessary in order to make the model
empirically valid. Because the µi are fixed values, the model is called the
‘fixed-effects’ model (FE).
To keep the presentation simple, at first we assume that the remaining
errors νit are independent with Eνit = 0 and var νit = σ². A useful generalization
would be to assume that var νit = σi², i.e. that νit are heteroskedastic
across the individual dimension. In some applications, it may also make sense
to allow for correlation of errors across individuals.
For this FE model, we wish to construct an efficient estimator. It will cer-
tainly not be simple OLS, as Euit = µi , which violates one of the assumptions
for the Gauss-Markov theorem. Rather, one may interpret µi as coefficients
of individual dummy variables that equal 1 for i and are 0 otherwise. The
N –vector that contains all these dummies will be denoted as Zµ,it . For every
given i and t, there will be exactly one element of 1 in this vector, while all
other elements are 0. With these definitions, we have the regression

\[ y_{it} = \alpha + \beta' X_{it} + \mu' Z_{\mu,it} + \nu_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T, \tag{4} \]

which fulﬁlls all standard assumptions for OLS estimation.

Technical escape from the ‘dummy-variables trap’. Note that the
regressor matrix inclusive of the constant does not have full column rank, as the
N dummies add up to the constant. The simplest solution is
to conduct the regression estimation without constant (‘homogeneous regression’)
and to determine α̂ from the average of the coefficients of Zµ . Then,
the final estimator for µ results after subtracting α̂. Alternatively, one may
consider a restricted OLS estimation under the restriction $\sum_{i=1}^{N} \mu_i = 0$. Both
methods yield numerically identical estimates α̂ and µ̂i . Another alternative
would be omitting the dummy of an arbitrary individual i∗ , and then adding the mean
of the coefficient estimates to α̂, while subtracting it from the coefficient estimates,
including the µi∗ that was originally set at zero. Again, this yields the same
coefficient estimates α̂ and µ̂ but the procedure is less appealing due to its
arbitrary asymmetry. In the following, we will focus on the first method and
the constant will be omitted in constructing estimators for technical reasons.
It is to be noted, however, that the considered regressions are inhomogeneous
rather than homogeneous in principle.
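The equivalence of the first and the third method can be checked numerically. The sketch below assumes NumPy and uses simulated data; all names and parameter values are illustrative and not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 4, 6, 2
alpha, beta = 1.5, np.array([0.8, -0.3])
mu = np.array([-1.0, 0.5, 1.5, -1.0])            # effects sum to zero
X = rng.normal(size=(N * T, K))
Z_mu = np.kron(np.eye(N), np.ones((T, 1)))       # (N*T) x N dummy matrix
y = alpha + X @ beta + Z_mu @ mu + 0.1 * rng.normal(size=N * T)

# Method 1: regression without a constant on [X, Z_mu]
coef, *_ = np.linalg.lstsq(np.hstack([X, Z_mu]), y, rcond=None)
beta_hat, level = coef[:K], coef[K:]
alpha_hat = level.mean()                         # average of dummy coefficients
mu_hat = level - alpha_hat                       # deviations from the average

# Method 3: constant plus dummies, omitting the dummy of individual 0
W = np.hstack([np.ones((N * T, 1)), X, Z_mu[:, 1:]])
c, *_ = np.linalg.lstsq(W, y, rcond=None)
d = np.concatenate([[0.0], c[1 + K:]])           # omitted effect set to zero
alpha_hat_3 = c[0] + d.mean()
mu_hat_3 = d - d.mean()

print(alpha_hat, alpha_hat_3)
print(mu_hat, mu_hat_3)
```

Both routes span the same column space, so the fitted values and hence the estimates of α, β, and µ coincide up to rounding.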
The OLS estimator for (4) is BLUE (best linear unbiased) and even efficient if errors
νit are Gaussian. This OLS estimator has several names. Some authors
(see Hsiao, among others) call it within estimator or within groups, as it
focuses on the variation within the observations on each individual after
purging variables from the level-dummy effects. Baltagi calls it LSDV
(least squares dummy variables), while computer programs often simply refer to
the ‘FE estimator’.
The counterpart to the within-estimator is the between estimator (or between
groups) that originates from a regression of N individual time averages
of each variable:

\[ \bar{y}_{i.} = \alpha + \beta' \bar{X}_{i.} + \bar{u}_{i.}, \quad i = 1, \dots, N, \tag{5} \]

where $\bar{y}_{i.} = T^{-1} \sum_{t=1}^{T} y_{it}$ etc. The estimator violates regression assumptions
because of Eūi. = µi ≠ 0. It is mainly of theoretical interest and is not
really considered to be of empirical importance.

2.2 The algebra of the LSDV estimator

The algebraic analysis of the LSDV estimator is convenient for later usage.
If the FE model is written in matrix form, it reads

\[ y = \alpha \iota_{NT} + X\beta + Z_\mu \mu + \nu. \]

The vector y has dimension N T . Below T observations for the ﬁrst individual
i = 1, it contains T observations for the second individual etc. The vector
ιN T simply consists of N T ones. The matrix X has dimension N T × K, and
the vector β has dimension K. The matrix Zµ has dimension N T × N and
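very simple structure:
\[
Z_\mu =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]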

The vector µ has dimension N and contains the individual deviations from
the global level α, i.e. µi .
Collecting the matrices X and Zµ in a big N T ×(K + N )–matrix Z allows
representing the OLS estimator as (Z′ Z)−1 Z′ y (without a constant, to avoid
the singularity problem). Because this estimator involves an inversion of an
(N + K) × (N + K)–matrix, which can be a really big matrix, it is more convenient
to apply an algebraic result that is nicknamed the Frisch-Waugh theorem
in the econometric literature. According to the Frisch-Waugh theorem,
the coeﬃcient estimate β̂ for any given OLS regression (in this derivation,
the notation does not match the remainder of the text)

y = Xβ + Zγ + u

can be obtained in two equivalent ways. Either X and Z are collected in a
large matrix and β̂ and γ̂ are evaluated directly, or one proceeds in two steps.
First, y as well as each regressor xj are regressed on Z in K + 1 regressions

y = Zδy + ỹ,
xj = Zδxj + x̃j .

Then, the residuals from the ﬁrst-step regression are regressed on each other
(X̃ collects the columns x̃j , j = 1, . . . , K)

ỹ = X̃β + u.

This two-step procedure yields the same (numerically identical) estimate for
β̂ and the same residuals û.
If the Frisch-Waugh theorem is applied to the FE model, one may ﬁrst
regress all variables y and X—these are essentially the interesting variables—
on the uninteresting constants and dummies. Then, the ‘purged’ ỹ and X̃
are regressed on each other. This two-step sequence will then yield the FE
estimator
\[ \hat\beta_{FE} = ( \tilde X' \tilde X )^{-1} \tilde X' \tilde y, \]
where only a K × K–matrix is inverted. Note that the residuals are just the
variables that have been adjusted for (or ‘purged from’) the individual time
averages.
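The numerical identity claimed by the Frisch-Waugh theorem can be illustrated as follows. This is a sketch under assumed simulated data, with NumPy; the helper `demean` is a hypothetical name for the first-step purging operation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 5, 8, 2
X = rng.normal(size=(N * T, K))
Z_mu = np.kron(np.eye(N), np.ones((T, 1)))
y = X @ np.array([1.0, -2.0]) + Z_mu @ rng.normal(size=N) + rng.normal(size=N * T)

# One step: regress y on [X, Z_mu] jointly (inverts an (N+K) x (N+K) matrix)
Z = np.hstack([X, Z_mu])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_lsdv = coef[:K]
resid_lsdv = y - Z @ coef

# Two steps: residuals from regressing on the dummies are just
# deviations from individual time averages
def demean(v):
    g = v.reshape(N, T, -1)
    return (g - g.mean(axis=1, keepdims=True)).reshape(v.shape)

X_t, y_t = demean(X), demean(y)
beta_fe = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)   # only K x K is inverted
resid_fe = y_t - X_t @ beta_fe

print(beta_lsdv, beta_fe)
```

The data are stacked individual by individual, as in the text, so that reshaping to (N, T) recovers the panel layout.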
In order to attain a direct closed form in the original variables, we have to
determine the matrix that performs the purging operation on the variables.
Baltagi calls this matrix Q. It has the form

\[ Q = I_{NT} - T^{-1} I_N \otimes \iota_T \iota_T' , \tag{6} \]

which contains a Kronecker product.

Kronecker products. We remember that the Kronecker product of
two matrices A and B of dimensions k × l and m × n is defined as the large
(km) × (ln) matrix that contains kl blocks shaped as aij B for i = 1, . . . , k
and j = 1, . . . , l, where aij denotes the typical element of the matrix A.
Note that some authors use the reverse concept for the definition of the
Kronecker product. According to our definition, the left matrix describes
the coarse structure of the resulting matrix, while the right matrix gives
the ﬁne structure. [Mnemonic: most people are right-handed and can only
perform coarse work with their left hands]
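NumPy's `np.kron` follows the same convention, with the left factor giving the coarse block structure. A small sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])          # k x l = 2 x 2, coarse structure
B = np.array([[0, 1, 0]])       # m x n = 1 x 3, fine structure
C = np.kron(A, B)               # (km) x (ln) = 2 x 6, blocks a_ij * B
print(C)

# I_N (x) iota iota' : block-diagonal with N blocks of ones
N, T = 2, 3
block_diag = np.kron(np.eye(N), np.ones((T, T)))
print(block_diag)
```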
Here and in the following, ι denotes a vector of ones, I is an identity matrix.
Wherever it eases understanding, dimensions are denoted by subscript
indices. Thus, ιι′ is a square matrix of ones, I ⊗ ιι′ is a ‘block-diagonal’
matrix that repeats N blocks of form ιι′ along its main diagonal. Then, Q
has the representation
\[
Q =
\begin{pmatrix}
\tilde Q & 0 & \cdots & 0 \\
0 & \tilde Q & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & \tilde Q
\end{pmatrix}
= I_N \otimes \tilde Q
\]

with Q̃ = I_T − T⁻¹ ι_T ι_T′. All diagonal elements of Q̃ are (T − 1)/T , while
all other (off-diagonal) elements are −1/T . Hsiao calls Q̃ the ‘sweepout
matrix’, as it purges the individuals from their time averages. It is easily
shown that the matrix Q is symmetric and ‘idempotent’, i.e. QQ = Q, as it is a
projection matrix. Therefore, one has for the FE estimator
\[ \hat\beta_{FE} = (X'Q'QX)^{-1} X'Q'Qy = (X'QX)^{-1} X'Qy. \tag{7} \]

The form of (7) corresponds to a GLS (generalized least squares) estimator,
i.e. a BLUE estimator for a regression model with correlated errors.
Adopting this interpretation, the FE estimator would be the GLS estimator
for the regression model
\[ y = X\beta + v, \tag{8} \]
with Evv′ = Q⁻¹. However, this is not a valid assumption for the correlation
of errors. Q does not have rank N T and cannot be inverted. The assumption
that N data segments of length T have individual-specific means transgresses
the framework of GLS models for errors with constant error mean 0.
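The stated properties of Q are easy to verify for small dimensions. The following sketch assumes NumPy; the dimensions N = 3 and T = 4 are arbitrary.

```python
import numpy as np

N, T = 3, 4
iota = np.ones((T, 1))
Q_tilde = np.eye(T) - iota @ iota.T / T          # T x T sweepout block
Q = np.kron(np.eye(N), Q_tilde)                  # I_N (x) Q_tilde

is_symmetric = np.allclose(Q, Q.T)
is_idempotent = np.allclose(Q @ Q, Q)
rank_Q = np.linalg.matrix_rank(Q)                # N*(T-1), hence singular

Z_mu = np.kron(np.eye(N), iota)
dummies_purged = np.allclose(Q @ Z_mu, 0.0)      # Q annihilates the dummies

print(is_symmetric, is_idempotent, rank_Q, dummies_purged)
```

One dimension is lost per individual, since each block removes one time average.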

2.3 Properties of the LSDV estimator

According to its derivation, the LSDV estimator is BLUE for the FE model
and eﬃcient for the FE model with normally distributed errors. In the pres-
ence of heteroskedasticity or correlation across errors, the estimator will not
be BLUE any more, while it will still be unbiased.
The (optimal=minimal because of the BLUE property) variance matrix
of the LSDV estimator in the FE model results from a calculation that completely
parallels the usual derivation of the GLS variance. In the algebraic
manipulations, note that the matrix Q eliminates the constant α as well as
the expression Zµ µ, such that Qy can be expressed by QXβ + Qν.
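\begin{align*}
\operatorname{var} \hat\beta_{FE}
&= E ( \hat\beta_{FE} - \beta ) ( \hat\beta_{FE} - \beta )' \\
&= E \{ (X'QX)^{-1} X'Qy - \beta \} \{ (X'QX)^{-1} X'Qy - \beta \}' \\
&= E \{ (X'QX)^{-1} X' (QX\beta + Q\nu) - \beta \} \{ (X'QX)^{-1} X'Qy - \beta \}' \\
&= E \{ (X'QX)^{-1} X'Q\nu \} \{ (X'QX)^{-1} X'Q\nu \}' \\
&= (X'QX)^{-1} X'Q ( E\nu\nu' ) Q'X (X'QX)^{-1} \\
&= \sigma^2 (X'QX)^{-1} \tag{9}
\end{align*}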

After replacing σ² by an empirical estimate, this variance formula admits
conducting t–tests and F –tests, in analogy to the standard regression model.
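As an illustration, formula (9) can be evaluated on simulated data. The sketch assumes NumPy; the degrees-of-freedom correction NT − N − K in the estimate of σ² is an assumption that follows the count given in the introduction, and all data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 10, 2
X = rng.normal(size=(N * T, K))
mu = rng.normal(size=N)
y = X @ np.array([0.5, -1.0]) + np.repeat(mu, T) + 0.3 * rng.normal(size=N * T)

Q = np.kron(np.eye(N), np.eye(T) - np.ones((T, T)) / T)
XQX = X.T @ Q @ X
beta_fe = np.linalg.solve(XQX, X.T @ Q @ y)

resid = Q @ (y - X @ beta_fe)                     # within residuals
sigma2_hat = resid @ resid / (N * T - N - K)      # assumed df correction
var_beta = sigma2_hat * np.linalg.inv(XQX)        # formula (9)
t_values = beta_fe / np.sqrt(np.diag(var_beta))

print(beta_fe, t_values)
```

The same variance matrix is obtained as the upper-left K × K block of the usual OLS variance in the full dummy regression.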
With respect to the property of consistency of the LSDV estimator, it is
obvious that β̂F E converges to β for T → ∞, as varβ̂F E → 0. For N → ∞,
varβ̂F E will also converge to 0. As N gets larger, the block-diagonal matrix
Q contains ever more blocks, whereas the blocks themselves will increase in
dimension, as T gets larger. This reflects the fact that more and more µi
have to be estimated, as N → ∞. Their estimates cannot converge, as only
T observations are available. Thus, β̂F E will be consistent for N → ∞ as
well as for T → ∞. The estimates for the effects µi , however, converge only in
the case of T → ∞.
Estimates α̂ and µ̂i may be obtained from the LSDV regression (4) di-
rectly. Alternatively, after calculating β̂F E , the resulting variable y − Xβ̂F E
can be regressed on a constant and on Zµ . Unfortunately, for large N both
methods require regressing on a multitude of individual dummies. Therefore,
in cross-section panels with N > T the estimation of individual effects is of-
ten omitted. Correspondingly, many programs that are specifically designed
for panel problems will not report estimates for the ‘fixed effects’ µi or will
at least use their omission as the default option.

2.4 An example
Baltagi analyzes a data set on modeling gasoline demand per car. The
dependent variable is explained by three covariates: per capita income, a
relative gasoline price, cars per capita. Data are available for the years 1960–
1978 and for 18 countries. Thus, we have T = 19 and N = 18. The board is
nearly square.
Table 1 compares the results for OLS and for LSDV estimation. Both
estimation procedures entail very unsatisfactory Durbin-Watson statistics.
Country eﬀects are not reported here. For example, Spain has a negative µ̂i ,
while Canada, Sweden, and the U.S. show the most positive values for µ̂i .
The three latter countries share the characteristic of regionally low population
density, where cars are used to travel long distances.

Table 1: Estimates for the gasoline panel by Baltagi and Griffin.

Regressor            OLS       t           LSDV      t
log(Y/N)             0.890     [24.86]     0.662     [9.02]
log(P_MG/P_GDP)     -0.892     [-29.42]   -0.322     [-7.29]
log(Car/N)          -0.763     [-41.02]   -0.640     [-21.58]
α                    2.391     [20.45]     2.403     [10.66]
R²                   0.855                 0.973
log L               50.493               340.334
Durbin-Watson        0.137                 0.327

Apart from the t–values, which have been computed using t_j = β̂_j / σ̂(β̂_j)
and the formulae described above for β̂ and its variance (we use σ̂(β̂_j)
for the estimated standard error of coefficient β̂_j), the table also shows
log-likelihood statistics. Note the considerable improvement of the likelihood
when the ‘pooled’ OLS regression is replaced by the FE model.

2.5 The bias of OLS in the FE model

Because of E (uit ) = µi ≠ 0, the ‘pooled’ OLS estimator in the FE model
is not merely inefficient, as in the traditional GLS model, but it is even
biased. The size of this bias can be determined easily by straightforward
calculation. Let X# denote the regressor matrix of dimension N T × (K + 1)
that has been extended by a vector of ones. Similarly, let β# denote the
vector of coefficients that has been extended by the regression intercept.
\begin{align*}
E ( \hat\beta_\# ) &= E ( X_\#' X_\# )^{-1} X_\#' y \\
&= E ( X_\#' X_\# )^{-1} X_\#' ( X\beta + \alpha \iota_{NT} + Z_\mu \mu + \nu ) \\
&= E ( X_\#' X_\# )^{-1} X_\#' ( X_\# \beta_\# + Z_\mu \mu + \nu ) \\
&= \beta_\# + ( X_\#' X_\# )^{-1} X_\#' Z_\mu \mu
\end{align*}
Obviously, the bias depends on µ. The estimator is unbiased for α, as the
row ι′ sums over µ. If µi = 0 for all i, the bias will disappear even for
β. The matrix X′ Zµ has dimension K × N and contains, for each individual,
the sums over time of all observations of the explanatory variables. Regressors
that are centered around zero for each individual will not cause any bias. For
T → ∞, the matrix (X#′ X#)⁻¹ X#′ Zµ converges to the coefficient estimate in a regression
of individual dummies on X# . If the covariates are sufficiently dispersed, the
bias should disappear for large T .
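The bias term can be evaluated directly for a given design. In the sketch below (NumPy, simulated regressor, illustrative values), the regressor is constructed so that its individual means move with µ; without noise, the pooled OLS error equals the bias term exactly. A second design with individually centered regressors shows a vanishing bias.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5, 20
mu = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])          # sums to zero
Z_mu = np.kron(np.eye(N), np.ones((T, 1)))
X = (Z_mu @ mu)[:, None] + rng.normal(size=(N * T, 1))   # correlated with effects
X_sharp = np.hstack([np.ones((N * T, 1)), X])
beta_sharp = np.array([1.0, 0.5])                   # (alpha, beta)

# Bias term (X#'X#)^{-1} X#' Z_mu mu
bias = np.linalg.solve(X_sharp.T @ X_sharp, X_sharp.T @ Z_mu @ mu)

# Noise-free y: pooled OLS minus the true coefficients equals the bias term
y = X_sharp @ beta_sharp + Z_mu @ mu
ols = np.linalg.solve(X_sharp.T @ X_sharp, X_sharp.T @ y)

# Regressors centered around zero for each individual: no bias
Q = np.eye(N * T) - np.kron(np.eye(N), np.ones((T, T)) / T)
X_c = np.hstack([np.ones((N * T, 1)), Q @ X])
bias_centered = np.linalg.solve(X_c.T @ X_c, X_c.T @ Z_mu @ mu)

print(bias, bias_centered)
```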

3 Random effects
3.1 The GLS estimator for the RE model
If N becomes large, such as in the typical microeconometric cross-section
panels, the number of estimated parameters of the FE model increases con-
siderably. This motivates the idea of viewing µi not as parameters, but
rather as unobserved variables with mean 0 and variance σµ2 . For small N ,
it is less plausible to assume that individual characteristics have been gen-
erated randomly. For example, to some it may even seem absurd to explain
the differences of Sweden and Portugal by realizations from a probability
distribution.
The RE (random effects) model can be written as

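\begin{align*}
y_{it} &= \alpha + \beta' X_{it} + u_{it}, \\
u_{it} &= \mu_i + \nu_{it}, \\
\mu_i &\sim \text{i.i.d.} ( 0, \sigma_\mu^2 ), \\
\nu_{it} &\sim \text{i.i.d.} ( 0, \sigma_\nu^2 ), \\
& i = 1, \dots, N, \quad t = 1, \dots, T. \tag{10}
\end{align*}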

The two error components µ and ν are assumed to be independent from
each other. In (10), uit follows a mean-zero probability distribution for all i
and t, therefore the RE model is a ‘regular’ GLS model. If both variances
σµ² and σν² are known, α and β can be estimated efficiently (with normal
errors efficiently among all unbiased estimators, otherwise BLUE) via the
GLS estimator
\[ \{ X_\#' ( E uu' )^{-1} X_\# \}^{-1} X_\#' ( E uu' )^{-1} y. \]
A drawback for the direct application of this method is the large matrix
Ω = Euu′ , which has dimension N T × N T and must be inverted. Note
that the dummy problem does not appear in the RE model by construction,
as the eﬀects are speciﬁed to be stochastic with zero mean. Therefore, all
regressions can be conducted on the basis of the (N T × (K + 1))–matrix of
regressors X# that has been extended by a column of ones for the overall
intercept.
Using the assumption of independence for the error components, it follows
that

\begin{align*}
\Omega &= E ( \mu + \nu ) ( \mu + \nu )' \\
&= E \mu\mu' + E \nu\nu' \\
&= \sigma_\mu^2 ( I_N \otimes J_T ) + \sigma_\nu^2 I_{NT} .
\end{align*}

According to the definition of the Kronecker product, the matrix I_N ⊗ J_T is
block-diagonal, with all N diagonal blocks consisting of T × T –matrices of
ones. Within an individual, the error component µi is correlated perfectly.
The second term expresses the increase in variance along the diagonal of the
large matrix Ω due to the other error component νit .
Inversion of the matrix Ω can be conducted following a suggestion of
Wansbeek & Kapteyn. First, J is replaced by T J̄, where J̄ consists of
T × T identical elements 1/T . Then, a matrix term is subtracted and added
back:
\begin{align*}
\Omega &= T \sigma_\mu^2 ( I_N \otimes \bar J_T ) + \sigma_\nu^2 I_{NT} \\
&= ( T \sigma_\mu^2 + \sigma_\nu^2 ) ( I_N \otimes \bar J_T ) + \sigma_\nu^2 ( I_{NT} - I_N \otimes \bar J_T )
\end{align*}

The two matrices P = I_N ⊗ J̄_T and Q = I_NT − I_N ⊗ J̄_T are orthogonal to each
other and idempotent. Furthermore, they sum up to the identity matrix I:

\[ PQ = QP = 0, \quad P^2 = P, \quad Q^2 = Q, \quad P + Q = I. \]

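This implies that, for any given λ1 and λ2 ,
\begin{align*}
( \lambda_1 P + \lambda_2 Q ) ( \lambda_1^{-1} P + \lambda_2^{-1} Q )
&= P^2 + \lambda_1 \lambda_2^{-1} PQ + \lambda_2 \lambda_1^{-1} QP + Q^2 \\
&= P + Q = I.
\end{align*}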

Therefore, Ω can be inverted componentwise, and the inverses are the original
matrices with inverted scales:
\[ \Omega^{-1} = ( T \sigma_\mu^2 + \sigma_\nu^2 )^{-1} ( I_N \otimes \bar J_T ) + \sigma_\nu^{-2} ( I_{NT} - I_N \otimes \bar J_T ) . \]
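The componentwise inverse can be checked against a brute-force inversion for small dimensions. This is a sketch with NumPy; the variance values are arbitrary illustrations.

```python
import numpy as np

N, T = 3, 4
sigma_mu2, sigma_nu2 = 2.0, 0.5                  # illustrative variances

J_bar = np.ones((T, T)) / T
P = np.kron(np.eye(N), J_bar)                    # averages over time
Q = np.eye(N * T) - P                            # deviations from time averages

Omega = sigma_mu2 * np.kron(np.eye(N), np.ones((T, T))) + sigma_nu2 * np.eye(N * T)
Omega_inv = P / (T * sigma_mu2 + sigma_nu2) + Q / sigma_nu2

print(np.allclose(Omega @ Omega_inv, np.eye(N * T)))
```

No N T × N T inversion is required to form `Omega_inv`; only two scalars are inverted.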
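In summary, we obtain the following representation for the GLS estimator
in the RE model:
\begin{align*}
( X_\#' \Omega^{-1} X_\# )^{-1} X_\#' \Omega^{-1} y
&= [ X_\#' \{ ( T\sigma_\mu^2 + \sigma_\nu^2 )^{-1} P + \sigma_\nu^{-2} Q \} X_\# ]^{-1} \\
&\quad \times X_\#' \{ ( T\sigma_\mu^2 + \sigma_\nu^2 )^{-1} P + \sigma_\nu^{-2} Q \} y,
\end{align*}
which looks complicated but succeeds in avoiding an N T × N T –matrix inversion.
Furthermore, it is obvious that
\[ ( \sqrt{\lambda_1} P + \sqrt{\lambda_2} Q ) ( \sqrt{\lambda_1} P + \sqrt{\lambda_2} Q ) = \lambda_1 P + \lambda_2 Q. \]
Using the fact that both P and Q are symmetric matrices, the estimator can
be re-written as
\[ \{ ( \tilde P X_\# )' ( \tilde P X_\# ) \}^{-1} ( \tilde P X_\# )' \tilde P y, \]
where
\[ \tilde P = ( \sqrt{ T\sigma_\mu^2 + \sigma_\nu^2 } )^{-1} P + \sigma_\nu^{-1} Q. \]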
It turns out that GLS in the RE model can be interpreted as a two-step procedure.
In the first step, all variables are transformed by P̃, and in a second
step one applies OLS to the transformed variables. The matrix P averages
the individual observations over time, while the matrix Q adjusts the observations
for their time averages. The two coefficients in the error variances
indicate the weights of each of the two components. Noting that scales cancel
from the GLS expression, one may restrict attention to the relative weight
that is usually denoted by θ and is defined as
\[ \theta = 1 - \frac{ \sigma_\nu }{ \sqrt{ T\sigma_\mu^2 + \sigma_\nu^2 } } . \]

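In this notation, one may replace the transformation matrix P̃ by
\begin{align*}
\tilde P_1 &= \sigma_\nu \tilde P \\
&= \frac{ \sigma_\nu }{ \sqrt{ T\sigma_\mu^2 + \sigma_\nu^2 } } P + Q \\
&= (1 - \theta) P + Q,
\end{align*}
again noting that multiplication by σν does not change the estimator.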
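Since P + Q = I, the transformation (1 − θ)P + Q equals I − θP, i.e. each variable has the fraction θ of its individual time average subtracted. The sketch below (NumPy, simulated data, assumed known variances) compares OLS on the transformed data with the direct GLS formula.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K = 6, 5, 2
sigma_mu2, sigma_nu2 = 1.0, 0.25                 # assumed known variances
X_sharp = np.hstack([np.ones((N * T, 1)), rng.normal(size=(N * T, K))])
u = np.repeat(rng.normal(size=N), T) + 0.5 * rng.normal(size=N * T)
y = X_sharp @ np.array([1.0, 0.5, -0.5]) + u

P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P
theta = 1.0 - np.sqrt(sigma_nu2 / (T * sigma_mu2 + sigma_nu2))
P1 = (1.0 - theta) * P + Q                       # same as I - theta * P

# Step 1: transform; step 2: OLS on transformed variables
beta_two_step, *_ = np.linalg.lstsq(P1 @ X_sharp, P1 @ y, rcond=None)

# Direct GLS with the full N*T x N*T matrix Omega (T * P equals I_N (x) J_T)
Omega = sigma_mu2 * T * P + sigma_nu2 * np.eye(N * T)
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X_sharp.T @ Omega_inv @ X_sharp,
                           X_sharp.T @ Omega_inv @ y)

print(theta, beta_two_step, beta_gls)
```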
Special cases.
1. If σµ² = 0, then also θ = 0. The assumption Eµ = 0 yields µ = 0 and
the pooled OLS model is obtained. In this case, P̃1 = P + Q = I and
the GLS estimator becomes the OLS estimator, as it should.

2. If σµ² is very large, then θ approaches 1. The first term in the transformation
P̃1 approaches 0 and P̃1 approaches Q, the fixed-effects sweeping
matrix. The RE estimator converges to the FE estimator. This
observation suggests that fixed effects can be viewed as random effects
with very large variance.

3. Conversely, it is not possible in the RE model that only the first component
P appears in P̃1 . This would correspond to θ → ∞. The regression
would then use the time averages of the data only, and the above-mentioned
between estimator would be obtained.
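The three limiting cases can be traced numerically by treating θ as a free parameter. This is a sketch with NumPy and a single simulated regressor; the function `slope` and the chosen values of θ are illustrative, with θ close to 1 and a very large θ standing in for the limits.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 5, 6
x = rng.normal(size=N * T) + np.repeat(rng.normal(size=N), T)
y = 2.0 + 1.0 * x + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)
X_sharp = np.column_stack([np.ones(N * T), x])
P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P

def slope(theta):
    """OLS slope after transforming all variables by (1 - theta) P + Q."""
    P1 = (1.0 - theta) * P + Q
    coef, *_ = np.linalg.lstsq(P1 @ X_sharp, P1 @ y, rcond=None)
    return coef[1]

# Reference estimators
slope_ols = np.linalg.lstsq(X_sharp, y, rcond=None)[0][1]
xq, yq = Q @ x, Q @ y
slope_fe = (xq @ yq) / (xq @ xq)
xm, ym = x.reshape(N, T).mean(axis=1), y.reshape(N, T).mean(axis=1)
slope_between = (np.sum((xm - xm.mean()) * (ym - ym.mean()))
                 / np.sum((xm - xm.mean()) ** 2))

print(slope(0.0), slope_ols)            # case 1: pooled OLS
print(slope(1.0 - 1e-6), slope_fe)      # case 2: close to FE
print(slope(1e6), slope_between)        # case 3: close to between
```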

3.2 Feasible GLS in the RE model

Unless the constant θ happens to be known, the GLS estimator in its form
derived above
\[ \{ ( \tilde P_1 X_\# )' ( \tilde P_1 X_\# ) \}^{-1} ( \tilde P_1 X_\# )' \tilde P_1 y, \]
\[ \tilde P_1 = \frac{ \sigma_\nu }{ \sqrt{ T\sigma_\mu^2 + \sigma_\nu^2 } } P + Q = (1 - \theta) P + Q, \]
cannot be calculated. Usually, θ must be estimated.

One suggestion that is found in the literature is to estimate σ1² = T σµ² + σν²
by
\[ \hat\sigma_1^2 = \frac{T}{N} \sum_{i=1}^{N} \left( \frac{1}{T} \sum_{t=1}^{T} \hat u_{it} \right)^2 . \]
The residuals ûit may be the residuals from a preliminary pooled OLS re-
gression or from an LSDV regression. In contrast to the FE model, the OLS
estimator is unbiased in the RE model, as errors have mean zero and µi are
mere mean-zero random variables. However, many authors (see Amemiya)
support the claim that preliminary LSDV estimation yields better properties.
Similarly, σν² is estimated by
\[ \hat\sigma_\nu^2 = \frac{ \sum_{i=1}^{N} \sum_{t=1}^{T} \left( \hat u_{it} - T^{-1} \sum_{s=1}^{T} \hat u_{is} \right)^2 }{ N (T - 1) } , \]
where û again come from preliminary OLS or LSDV regressions. From both
estimates, one may also compute an estimate for σµ² according to
\[ \hat\sigma_\mu^2 = T^{-1} ( \hat\sigma_1^2 - \hat\sigma_\nu^2 ) . \]

Occasionally, this value becomes negative, which then also results in a neg-
ative θ. For this comparatively rare case, the literature offers various and
occasionally conflicting recommendations.
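The variance-component formulas above translate into a few array operations. The following sketch assumes NumPy and residuals stacked individual by individual; the function name is hypothetical and the tiny residual vector is chosen only so that the arithmetic can be followed by hand.

```python
import numpy as np

def variance_components(resid, N, T):
    """Estimates of sigma_1^2, sigma_nu^2, sigma_mu^2 and theta from residuals."""
    u = resid.reshape(N, T)                      # row i: residuals of individual i
    u_bar = u.mean(axis=1)                       # individual time averages
    sigma1_sq = T / N * np.sum(u_bar ** 2)
    sigma_nu_sq = np.sum((u - u_bar[:, None]) ** 2) / (N * (T - 1))
    sigma_mu_sq = (sigma1_sq - sigma_nu_sq) / T  # may turn out negative
    theta = 1.0 - np.sqrt(sigma_nu_sq / sigma1_sq)
    return sigma1_sq, sigma_nu_sq, sigma_mu_sq, theta

# Hand-checkable example with N = 2 individuals and T = 2 periods
resid = np.array([1.0, 3.0, -1.0, -3.0])
s1, s_nu, s_mu, theta = variance_components(resid, N=2, T=2)
print(s1, s_nu, s_mu, theta)                     # 8.0 2.0 3.0 0.5
```

If the estimate of σ1² falls below the estimate of σν², the implied σµ² and θ are negative, which is the case mentioned in the text.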
Baltagi considers an alternative procedure that follows Nerlove, who
estimates σµ2 directly from the fixed effects µ̂i of the LSDV estimation. Typ-
ically, all fGLS variants yield similar numerical results. Using iterations be-
tween estimating θ and fGLS regressions, one may approximate the maximum-
likelihood (ML) estimator. Many current computer programs use such iter-
ative ML estimators by default for the RE model.

3.3 Properties of the RE estimator

By construction, the GLS estimator in the RE model is BLUE and even
efficient for Gaussian errors. Its variance matrix is given as
\[ \operatorname{var} ( \hat\beta_{\#,RE} ) = \{ X_\#' \Omega^{-1} X_\# \}^{-1} , \]
where Ω must be estimated, as it was described above. For large samples,
this minimal variance is also attained by the fGLS estimator. This holds for
T → ∞ as well as for N → ∞.
Note that, for T → ∞, the ratio θ converges to 1. Thus, the fGLS
estimator becomes more and more similar to the FE estimator. One can
show that, for T → ∞, the FE estimator becomes efficient even in the RE
model. Thus, the more complicated RE estimator shows advantages only for
limited T and larger N . Particularly, in time-series panels with T > N there
exist other arguments that discourage using the RE estimator that will be
addressed later.
Estimates of variances and therefore of standard errors permit calculating
the usual t–values for all regression coefficients. For the exemplary data on
gasoline demand, the result using the EViews program is given in Table 2.
Presumably, EViews uses an iterative fGLS procedure to approximate ML,
unfortunately θ is not provided. From other statistics, one may infer a value
of θ = 0.904. By contrast, the program STATA expressly gives θ = 0.892,
which exactly corresponds to one of the values reported by Baltagi for one
of the fGLS procedures. In both cases, the RE estimator is closer to the
FE estimator than to OLS. STATA also offers a separate ML option, which,
if activated, causes an iteration to θ = 0.928. The value for the likelihood
function that is provided by STATA is considerably below the likelihood for
the FE model calculated by EViews, which may indicate that FE is a better
data description than RE (see also Part II of these notes).

Table 2: RE estimation for the gasoline panel according to Baltagi and Griffin,
using EViews. For a comparison, also LSDV is reported.

Regressor            fGLS      t           LSDV      t
log(Y/N)             0.555     [9.39]      0.662     [9.02]
log(P_MG/P_GDP)     -0.420     [-10.52]   -0.322     [-7.29]
log(Car/N)          -0.607     [-23.78]   -0.640     [-21.58]
α                    1.997     [10.83]     2.403     [10.66]
R²                   0.973                 0.973
log L              216.743               340.334
Durbin-Watson        0.335                 0.327
σµ                   0.196                 0.348
σν                   0.092                 0.092
θ                    0.892                 1
4 Two-way panels
4.1 Fixed eﬀects
In some cases, it may be attractive to extend the panel model by so-called
‘time eﬀects’:

\[ y_{it} = \alpha + \beta' X_{it} + u_{it}, \]
\[ u_{it} = \mu_i + \lambda_t + \nu_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T. \tag{11} \]

For example, in a time-series panel for a cross-country comparison, λt may
denote the influence of an international business cycle that affects all countries
in the sample. Surprisingly, the model is more common in textbooks
than in econometric software or in reported empirical applications. To dis-
tinguish it from the traditional panel model (3), the model (11) is called
the two-way model. The more classical model (3) is also called the one-way
model. It would also be possible to consider models with time effects and
without individual effects but such models are apparently uncommon.
If the N + T constants µi and λt are interpreted as parameters, the
resulting model is the two-way fixed-effects model. It can be analyzed in
complete analogy to the one-way fixed-effects model. Firstly, it can be re-
written in compact notation

\[ y = \alpha \iota_{NT} + X\beta + Z_\mu \mu + Z_\lambda \lambda + \nu. \tag{12} \]

Matrices Zµ and Zλ have dimensions T N × N and T N × T . Note that
$\sum_{t=1}^{T} \lambda_t = \sum_{i=1}^{N} \mu_i = 0$ must be assumed, otherwise α is not identified.
For uncorrelated errors ν, the OLS estimator is again BLUE. This esti-
mator, however, requires inverting a matrix of dimension (K + N + T − 1)×
(K + N + T − 1) (avoiding the dummy trap; alternatively a constrained re-
gression with an inversion of a matrix of dimension (K + N + T + 1) ×
(K + N + T + 1)), which should be avoided. In analogy to the one-way
model, one may apply the Frisch-Waugh theorem. One may ﬁrst ‘purge’ all
individual and temporal means from all variables y and X, and then execute
an OLS regression that requires inversion of a (K × K)–matrix only.
Formally, the purging matrix is given as

\[ Q_2 = I_N \otimes I_T - I_N \otimes \bar J_T - \bar J_N \otimes I_T + \bar J_N \otimes \bar J_T . \tag{13} \]

The matrix IN ⊗ J̄T is block-diagonal and subtracts time averages for each i.
The matrix J̄N ⊗ IT contains N × N identical blocks and subtracts for each
time point t means across individuals. The last term is an N T × N T matrix
in which every element equals (N T )⁻¹.
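Applying the two-way purging matrix amounts to subtracting individual means and time means and adding back the overall mean. A sketch with NumPy and arbitrary small dimensions:

```python
import numpy as np

N, T = 3, 4
I_N, I_T = np.eye(N), np.eye(T)
Jb_N, Jb_T = np.ones((N, N)) / N, np.ones((T, T)) / T

Q2 = (np.kron(I_N, I_T) - np.kron(I_N, Jb_T)
      - np.kron(Jb_N, I_T) + np.kron(Jb_N, Jb_T))

rng = np.random.default_rng(6)
x = rng.normal(size=N * T)                       # stacked individual by individual
g = x.reshape(N, T)
two_way = (g - g.mean(axis=1, keepdims=True)     # remove individual time averages
             - g.mean(axis=0, keepdims=True)     # remove cross-section means per t
             + g.mean())                         # add back the overall mean

print(np.allclose(Q2 @ x, two_way.ravel()))
print(np.linalg.matrix_rank(Q2))                 # (N-1)*(T-1)
```

In practice, the array version avoids forming the N T × N T matrix altogether.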
With this two-way Q2 , the FE estimator can be expressed as

$$\hat\beta = (X' Q_2 X)^{-1} X' Q_2 y. \qquad (14)$$

Again, the FE estimator is formally reminiscent of a GLS estimator, although
Q2 is singular and cannot be the inverse of an error variance matrix.
Similarly, the variance matrix of the estimator is given as

$$\operatorname{var}\hat\beta = \sigma_\nu^2 (X' Q_2 X)^{-1}, \qquad (15)$$

where $\sigma_\nu^2$ denotes the variance of the stochastic error νit . The FE
estimator is consistent for β and α; it is consistent for µ (λ) only if
T → ∞ (N → ∞).
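
As an illustration of (14) and (15), the sketch below simulates a small balanced panel and computes the two-way FE estimate from double-demeaned data, so that only a K × K matrix is inverted. All names, the simulated design, and in particular the degrees-of-freedom correction (N − 1)(T − 1) − K used for the variance estimate are assumptions made for this example, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 6, 8, 2
beta_true = np.array([1.5, -0.7])

# Simulated balanced panel; the effects are allowed to be correlated with X.
mu = rng.normal(size=(N, 1))            # individual effects
lam = rng.normal(size=(1, T))           # time effects
X = rng.normal(size=(N, T, K)) + mu[:, :, None] + lam[:, :, None]
y = X @ beta_true + mu + lam + 0.1 * rng.normal(size=(N, T))

def double_demean(a):
    """Subtract individual and time means, add back the grand mean (axis 0 = i, axis 1 = t)."""
    return (a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True)
            + a.mean(axis=(0, 1), keepdims=True))

Xd = double_demean(X).reshape(N * T, K)  # plays the role of Q2 X
yd = double_demean(y).reshape(N * T)     # plays the role of Q2 y

XtX_inv = np.linalg.inv(Xd.T @ Xd)       # (X' Q2 X)^{-1}, only K x K
beta_fe = XtX_inv @ Xd.T @ yd            # equation (14)

resid = yd - Xd @ beta_fe
dof = (N - 1) * (T - 1) - K              # assumed degrees-of-freedom correction
sigma2_hat = resid @ resid / dof
var_beta = sigma2_hat * XtX_inv          # equation (15) with an estimated variance

# Cross-check against the explicit matrix formula, using
# Q2 = (I_N - Jbar_N) kron (I_T - Jbar_T).
Q2 = np.kron(np.eye(N) - 1.0 / N, np.eye(T) - 1.0 / T)
Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
beta_matrix = np.linalg.solve(Xs.T @ Q2 @ Xs, Xs.T @ Q2 @ ys)
```

Because Q2 is symmetric and idempotent, X′Q2X equals the cross-product of the demeaned regressors, which is why the two computations agree.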

4.2 Random eﬀects

The two-way random-effects model assumes that the error components µi
and λt are drawn from independent distributions with variances
$\sigma_\mu^2$ and $\sigma_\lambda^2$. The variance matrix of the total
unobserved errors u results as the sum of its three components (using the
explicit notation $\sigma_\nu^2 = E\nu_{it}^2$):

$$\Omega = E(uu') = \sigma_\mu^2 (I_N \otimes J_T) + \sigma_\lambda^2 (J_N \otimes I_T) + \sigma_\nu^2 I_{NT} \qquad (16)$$

Only the middle term is new as compared to the one-way model. It consists
of N × N blocks, each of which is an identity matrix $I_T$. The three terms
are not orthogonal to each other and Ω cannot be inverted on the basis of
these components. The orthogonal decomposition is not as easily found as in
the one-way model. It can be constructed by tentatively specifying the
inverse $\Omega^{-1}$ as a weighted sum of four plausible matrices, i.e. the
three components of Ω above and the N T × N T matrix of ones $J_{NT}$:

$$\Omega^{-1} = a_1 (I_N \otimes J_T) + a_2 (J_N \otimes I_T) + a_3 I_{NT} + a_4 J_{NT}$$

Equating $\Omega\Omega^{-1} = I$ yields the coefficients $a_j$ by a comparison
of coefficients. We just report the results of this exercise:
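$$
\begin{aligned}
a_1 &= -\frac{\sigma_\mu^2}{(\sigma_\nu^2 + T\sigma_\mu^2)\,\sigma_\nu^2},\\
a_2 &= -\frac{\sigma_\lambda^2}{(\sigma_\nu^2 + N\sigma_\lambda^2)\,\sigma_\nu^2},\\
a_3 &= \frac{1}{\sigma_\nu^2},\\
a_4 &= \frac{\sigma_\mu^2 \sigma_\lambda^2}{\sigma_\nu^2 (\sigma_\nu^2 + T\sigma_\mu^2)(\sigma_\nu^2 + N\sigma_\lambda^2)} \cdot \frac{2\sigma_\nu^2 + T\sigma_\mu^2 + N\sigma_\lambda^2}{\sigma_\nu^2 + T\sigma_\mu^2 + N\sigma_\lambda^2}. \qquad (17)
\end{aligned}
$$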
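
The coefficients in (17) can be verified numerically for small N and T. The sketch below, in Python with NumPy, builds Ω as in (16) and the candidate inverse from the four weighted matrices, then checks that their product is the identity. Function names and the chosen variance values are hypothetical; the arguments `s_mu`, `s_lam`, `s_nu` stand for the variances σµ², σλ², σν².

```python
import numpy as np

def omega(N, T, s_mu, s_lam, s_nu):
    """Omega from (16); the arguments s_* are variances, not standard deviations."""
    return (s_mu * np.kron(np.eye(N), np.ones((T, T)))
            + s_lam * np.kron(np.ones((N, N)), np.eye(T))
            + s_nu * np.eye(N * T))

def omega_inverse(N, T, s_mu, s_lam, s_nu):
    """Closed-form inverse built from the coefficients a1, ..., a4 in (17)."""
    a1 = -s_mu / ((s_nu + T * s_mu) * s_nu)
    a2 = -s_lam / ((s_nu + N * s_lam) * s_nu)
    a3 = 1.0 / s_nu
    a4 = (s_mu * s_lam * (2 * s_nu + T * s_mu + N * s_lam)
          / (s_nu * (s_nu + T * s_mu) * (s_nu + N * s_lam)
             * (s_nu + T * s_mu + N * s_lam)))
    return (a1 * np.kron(np.eye(N), np.ones((T, T)))
            + a2 * np.kron(np.ones((N, N)), np.eye(T))
            + a3 * np.eye(N * T)
            + a4 * np.ones((N * T, N * T)))

N, T = 3, 4
variances = dict(s_mu=0.5, s_lam=0.3, s_nu=1.2)   # arbitrary illustrative values
product = omega(N, T, **variances) @ omega_inverse(N, T, **variances)
is_identity = np.allclose(product, np.eye(N * T))
```

The point of the closed form is that no N T × N T inversion is needed; the explicit product here serves only as a check.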

Note that the expression for a4 is a bit involved. For N, T → ∞, the matrix
$\sigma_\nu^2 \Omega^{-1}$ converges to the matrix Q2 that we found for the
two-way FE model in (13). It is easily checked directly that
$T a_1(T) \to -a_3$, $N a_2(N) \to -a_3$, and $NT a_4(N,T) \to a_3$, using some
plausible notation; these limits reproduce the signs of the averaging terms
in Q2 , scaled by $a_3 = \sigma_\nu^{-2}$. Since this scalar factor cancels in
the GLS formula, the GLS estimator approaches the FE estimator. The exact
proof that for (a) N → ∞, (b) T → ∞, (c) N/T → c ∈ R\ {0} both estimators
are asymptotically equivalent, including all distributional properties, is
due to Wallace and Hussein.
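
The limiting behaviour of the coefficients in (17) can be inspected numerically. The sketch below evaluates the ratios T a1 /a3 , N a2 /a3 and N T a4 /a3 for large N and T; the function name and the variance values are illustrative assumptions. The first two ratios come out close to −1 and the third close to +1, which matches the signs of the three averaging terms in Q2 .

```python
def scaled_coefficient_ratios(N, T, s_mu=0.5, s_lam=0.3, s_nu=1.0):
    """Return (T*a1/a3, N*a2/a3, N*T*a4/a3) for the coefficients in (17).

    The arguments s_mu, s_lam, s_nu are the variances of the three error
    components; their default values are arbitrary illustrative choices.
    """
    a1 = -s_mu / ((s_nu + T * s_mu) * s_nu)
    a2 = -s_lam / ((s_nu + N * s_lam) * s_nu)
    a3 = 1.0 / s_nu
    a4 = (s_mu * s_lam * (2 * s_nu + T * s_mu + N * s_lam)
          / (s_nu * (s_nu + T * s_mu) * (s_nu + N * s_lam)
             * (s_nu + T * s_mu + N * s_lam)))
    return T * a1 / a3, N * a2 / a3, N * T * a4 / a3

# Ratios for a moderate and for a very large panel
r_small = scaled_coefficient_ratios(10, 10)
r_large = scaled_coefficient_ratios(10**6, 10**6)
```

For small panels the ratios are visibly different from ±1, which is the sense in which GLS and FE differ in finite samples.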
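Using estimates for all variances $\sigma_\lambda^2$, $\sigma_\mu^2$,
$\sigma_\nu^2$ yields an estimator $\hat\Omega^{-1}$ for the inverse variance
matrix $\Omega^{-1}$. Then, the expression for the feasible GLS estimator
for the two-way RE model reads:

$$\hat\beta_{\#,RE} = \left( X_\#' \hat\Omega^{-1} X_\# \right)^{-1} X_\#' \hat\Omega^{-1} y$$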

The required variance estimates for the error components can be computed
from the (two-way) FE regression residuals, for example. By iteration, one
may again approximate the ML estimator.

4.3 An example
Many software routines, such as the older EViews version used here, do
not offer explicit methods for estimating two-way models. However, it is
typically easy to compute the FE estimate by generating dummy variables
for all time points. Table 3 compares the resulting estimator to a one-way
FE variant. By contrast, the two-way GLS estimator must be programmed
by the user, which requires some additional effort.
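
The dummy-variable route to the two-way FE estimate can be sketched as follows. This is not the EViews procedure used for Table 3 but an assumed NumPy illustration on simulated data: it regresses y on X, N individual dummies and T − 1 time dummies (one dropped to avoid the dummy trap), and compares the slope estimates with those from the purging matrix Q2 . All variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 5, 7, 2

# Artificial balanced panel with individual and time effects
X = rng.normal(size=(N, T, K))
y = (X @ np.array([0.8, -0.3])
     + rng.normal(size=(N, 1))          # individual effects
     + rng.normal(size=(1, T))          # time effects
     + 0.1 * rng.normal(size=(N, T)))   # idiosyncratic error

# Stack individual by individual (t runs fastest)
Xs, ys = X.reshape(N * T, K), y.reshape(N * T)

# N individual dummies and T - 1 time dummies: K + N + T - 1 columns in total
D_ind = np.kron(np.eye(N), np.ones((T, 1)))
D_time = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
W = np.hstack([Xs, D_ind, D_time])

coef, *_ = np.linalg.lstsq(W, ys, rcond=None)
beta_lsdv = coef[:K]                     # slope coefficients from the dummy regression

# Two-way within estimator via Q2 = (I_N - Jbar_N) kron (I_T - Jbar_T)
Q2 = np.kron(np.eye(N) - 1.0 / N, np.eye(T) - 1.0 / T)
beta_within = np.linalg.solve(Xs.T @ Q2 @ Xs, Xs.T @ Q2 @ ys)

slopes_agree = np.allclose(beta_lsdv, beta_within)
```

The agreement of the two slope vectors is the Frisch-Waugh result in action; with software that already offers one-way FE, only the time dummies need to be added by hand.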
It is obvious that the two-way model estimation critically affects the
coefficients of the income and import price variables, which may be viewed
as dependent on the business cycle, whereas the influence of car ownership
is nearly unchanged. It is yet to be checked whether the improvement in
the likelihood justifies the transition to the two-way model. The reaction of
the DW statistic is disappointing. Adding time effects did not noticeably
reduce the autocorrelation in the errors. This suggests that genuine
dynamic modeling using lags of variables should be considered.

18
Table 3: FE and RE two-way estimates for the gasoline panel according to
Baltagi and Griffin. For comparison, the one-way FE estimator is also
reported.

Regressor          2-way FE         t    2-way RE         t    1-way FE

log(Y/N)              0.051    [0.56]       0.137    [1.62]       0.662
log(P_MG/P_GDP)      -0.193   [-4.43]      -0.220   [-5.23]      -0.322
log(Car/N)           -0.593  [-21.45]      -0.600  [-22.53]      -0.640
α                    -0.855                -0.407   [-1.16]       2.403
R²                    0.981                 0.978                 0.973
logL                394.21                                      340.334
Durbin-Watson         0.348                                       0.327

Note: FE estimates according to EViews. RE estimates use one GLS iteration
starting from two-way FE.

19
References

[1] Arellano, M. (2003) Panel Data Econometrics, Oxford University Press.

[2] Baltagi, B. (2005) Econometric Analysis of Panel Data, 3rd Edition, Wiley.

[3] Carraro, C., Peracchi, F. and G. Weber (1993) (eds.) 'The Econometrics
of Panels and Pseudo Panels', Journal of Econometrics 59, 1/2.

[4] Diggle, P., Heagerty, P., Liang, K.Y., and S. Zeger (2002) Analysis of
Longitudinal Data, 2nd Edition, Oxford University Press.

[5] Hsiao, C. (1986, 2004) Analysis of Panel Data, Cambridge University Press.

[6] Wallace, T.D., and A. Hussein (1969) 'The use of error components
models in combining cross-section with time series data,' Econometrica
37, 55–72.

[7] Wansbeek, T.J., and A. Kapteyn (1982) 'A class of decompositions of the
variance-covariance matrix of a generalized error components model,'
Econometrica 50, 713–724.

[8] Wooldridge, J.M. (2002) Econometric Analysis of Cross Section and Panel
Data, MIT Press.
