Chapter 11
Panel Data
Learning Outcomes
• Describe the key features of panel data and outline the advantages and
disadvantages of working with panels rather than other structures
• Contrast the fixed effect and random effect approaches to panel model
specification, determining which is the more appropriate in particular
cases
• Panel data, also known as longitudinal data, have both time series and cross-
sectional dimensions.
• They arise when we measure the same collection of people or objects over a
period of time.
• Econometrically, the setup is
$$y_{it} = \alpha + \beta x_{it} + u_{it}$$
where $y_{it}$ is the dependent variable, $\alpha$ is the intercept term, $\beta$ is a $k \times 1$ vector
of parameters to be estimated on the explanatory variables, $x_{it}$; $t = 1, \dots, T$;
$i = 1, \dots, N$.
• The simplest way to deal with this data would be to estimate a single, pooled
regression on all the observations together.
• But pooling the data assumes that there is no heterogeneity – i.e. the same
relationship holds for all the data.
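As a minimal sketch of what pooling means in practice, the following stacks all N × T observations and runs one OLS regression, ignoring which entity or period each observation came from. The panel dimensions, parameter values and simulated data are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 8                 # hypothetical number of entities and time periods
alpha, beta = 1.0, 2.0       # illustrative "true" parameters

x = rng.normal(size=(N, T))               # x[i, t]
u = rng.normal(scale=0.5, size=(N, T))    # homogeneous disturbances
y = alpha + beta * x + u                  # same relationship for every i and t

# Pooled regression: stack all N*T observations and estimate by OLS
X = np.column_stack([np.ones(N * T), x.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
alpha_hat, beta_hat = coef
print(f"pooled OLS: alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
```

Because the simulated data contain no heterogeneity, the pooled estimates land close to the assumed parameters; the pooled regression is only appropriate when that homogeneity assumption is plausible.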
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 3
• There are a number of advantages from using a full panel technique when a
panel of data is available.
• One approach to making fuller use of the structure of the data would be to use the
seemingly unrelated regression (SUR) framework initially proposed by Zellner (1962). This
has been used widely in finance where the requirement is to model several closely related
variables over time.
• A SUR is so-called because the dependent variables may seem unrelated across the
equations at first sight, but a more careful consideration would allow us to conclude
that they are in fact related after all.
• Under the SUR approach, one would allow for the contemporaneous relationships
between the error terms in the equations by using a generalised least squares (GLS)
technique.
• The idea behind SUR is essentially to transform the model so that the error terms
become uncorrelated.
• If the correlations between the error terms in the individual equations had been zero
in the first place, then SUR on the system of equations would have been equivalent to
running separate OLS regressions on each equation.
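The equivalence between SUR and equation-by-equation OLS under zero cross-equation error correlation can be checked numerically. The sketch below is an illustration under stated assumptions: the two-equation system, its regressors and coefficients are simulated, and the error covariance matrix `Sigma` is treated as known rather than estimated (a feasible GLS implementation would estimate it from first-stage residuals).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200  # hypothetical number of time periods

# Two equations with different regressors (all values are simulated)
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
Sigma = np.diag([1.0, 4.0])  # cross-equation error covariance: zero correlation
errors = rng.multivariate_normal(np.zeros(2), Sigma, size=T)
y1 = X1 @ np.array([1.0, 0.5]) + errors[:, 0]
y2 = X2 @ np.array([-1.0, 2.0, 0.3]) + errors[:, 1]

# Stack the system: block-diagonal X, error covariance Sigma kron I_T
X = np.block([[X1, np.zeros((T, 3))], [np.zeros((T, 2)), X2]])
y = np.concatenate([y1, y2])
Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))

# GLS estimator for the stacked system
b_sur = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Separate OLS regressions on each equation
b_ols = np.concatenate([
    np.linalg.lstsq(X1, y1, rcond=None)[0],
    np.linalg.lstsq(X2, y2, rcond=None)[0],
])
print(np.allclose(b_sur, b_ols))
```

With a diagonal `Sigma` the GLS weights cancel within each equation, so the stacked estimator collapses to the separate OLS estimates.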
• The fixed effects model for some variable $y_{it}$ may be written
$$y_{it} = \alpha + \beta x_{it} + \mu_i + v_{it}$$
• We can think of $\mu_i$ as encapsulating all of the variables that affect $y_{it}$
cross-sectionally but do not vary over time – for example, the sector that a firm
operates in, a person's gender, or the country where a bank has its
headquarters, etc. Thus we would capture the heterogeneity that is
encapsulated in $\mu_i$ by a method that allows for different intercepts for each
cross-sectional unit.
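A small simulation can illustrate how allowing a different intercept for each cross-sectional unit works. The sketch below, under assumed parameter values and simulated data, estimates the slope in two ways: with one dummy variable per entity (least squares dummy variables, LSDV) and by subtracting each entity's time mean from its observations (the within transformation). The two give the same slope, while pooled OLS is distorted because the simulated entity effects are correlated with the regressor.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta = 30, 6, 1.5                      # hypothetical dimensions and slope

mu = rng.normal(scale=2.0, size=(N, 1))      # time-invariant entity effects
x = rng.normal(size=(N, T)) + 0.8 * mu       # regressor correlated with the effects
y = beta * x + mu + rng.normal(scale=0.5, size=(N, T))

# LSDV: one dummy per entity (no common intercept, to avoid collinearity)
D = np.kron(np.eye(N), np.ones((T, 1)))      # (N*T, N) matrix of entity dummies
X_lsdv = np.column_stack([x.ravel(), D])
beta_lsdv = np.linalg.lstsq(X_lsdv, y.ravel(), rcond=None)[0][0]

# Within transformation: subtract each entity's time mean
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_within = (x_w * y_w).sum() / (x_w ** 2).sum()

# Pooled OLS for comparison (ignores the entity effects)
X_pool = np.column_stack([np.ones(N * T), x.ravel()])
beta_pooled = np.linalg.lstsq(X_pool, y.ravel(), rcond=None)[0][1]

print(f"LSDV={beta_lsdv:.3f}  within={beta_within:.3f}  pooled={beta_pooled:.3f}")
```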
• Time-variation in the intercept terms can be allowed for in exactly the same way as
with entity fixed effects. That is, a least squares dummy variable model could be
estimated
$$y_{it} = \beta x_{it} + \lambda_1 D1_t + \lambda_2 D2_t + \dots + \lambda_T DT_t + v_{it}$$
where $D1_t$, for example, denotes a dummy variable that takes the value 1 for the first
time period and zero elsewhere, and so on.
• The only difference is that now, the dummy variables capture time variation rather
than cross-sectional variation.
• Similarly, in order to avoid estimating a model containing all T dummies, a within
transformation can be conducted to subtract away the cross-sectional averages from
each observation.
• Finally, it is possible to allow for both entity fixed effects and time fixed effects
within the same model. Such a model would be termed a two-way error component
model, and the LSDV equivalent model would contain both cross-sectional and time
dummies.
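For a balanced panel, the two-way model can be estimated either with both sets of dummies or by a two-way within transformation that subtracts the entity mean and the period mean and adds back the overall mean. The following sketch compares the two on simulated data; the dimensions, parameter values and the helper name `two_way_demean` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 20, 5, 0.7                  # hypothetical balanced panel

mu = rng.normal(size=(N, 1))             # entity fixed effects
lam = rng.normal(size=(1, T))            # time fixed effects
x = rng.normal(size=(N, T)) + mu + lam
y = beta * x + mu + lam + rng.normal(scale=0.3, size=(N, T))

def two_way_demean(z):
    """Subtract entity and period means, add back the grand mean (balanced panel)."""
    return (z - z.mean(axis=1, keepdims=True)
              - z.mean(axis=0, keepdims=True) + z.mean())

x_w, y_w = two_way_demean(x), two_way_demean(y)
beta_within = (x_w * y_w).sum() / (x_w ** 2).sum()

# LSDV: N entity dummies plus T-1 time dummies (one dropped to avoid collinearity)
D_entity = np.kron(np.eye(N), np.ones((T, 1)))
D_time = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
X = np.column_stack([x.ravel(), D_entity, D_time])
beta_lsdv = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][0]

print(f"two-way within={beta_within:.3f}  LSDV={beta_lsdv:.3f}")
```

The equality of the two slope estimates relies on the panel being balanced; with missing observations the simple double-demeaning formula no longer reproduces the dummy variable regression.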
• The null hypothesis that the bank fixed effects are jointly zero ($H_0: \mu_i = 0$) is rejected at the 1%
significance level for the full sample and for the second sub-sample, but is not rejected at all for the
first sub-sample. Overall, however, this indicates the usefulness of the fixed effects panel model that
allows for bank heterogeneity.
• Unlike the fixed effects model, there are no dummy variables to capture the
heterogeneity (variation) in the cross-sectional dimension.
• Instead, this occurs via the $\epsilon_i$ terms.
• Note that this framework requires the assumptions that the new cross-sectional error
term, $\epsilon_i$, has zero mean, is independent of the individual observation error term $v_{it}$, has
constant variance, and is independent of the explanatory variables.
• The parameters ($\alpha$ and the $\beta$ vector) are estimated consistently but inefficiently by
OLS, and the conventional formulae would have to be modified as a result of the
cross-correlations between error terms for a given cross-sectional unit at different
points in time.
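• Define the ‘quasi-demeaned’ data as $y_{it}^* = y_{it} - \theta \bar{y}_i$, and similarly for $x_{it}$.
• $\theta$ will be a function of the variance of the observation error term, $\sigma_v^2$, and of
the variance of the entity-specific error term, $\sigma_\epsilon^2$:
$$\theta = 1 - \frac{\sigma_v}{\sqrt{T \sigma_\epsilon^2 + \sigma_v^2}}$$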
• This transformation will be precisely that required to ensure that there are no
cross-correlations in the error terms, but fortunately it should automatically
be implemented by standard software packages.
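To make the quasi-demeaning step concrete, the sketch below computes $\theta$ from assumed variance components and applies the transformation to a small array. The function names `theta` and `quasi_demean` and all numerical inputs are hypothetical; in applied work the variances are unknown and are estimated by the software.

```python
import numpy as np

def theta(sigma_v2, sigma_eps2, T):
    """theta = 1 - sigma_v / sqrt(T * sigma_eps^2 + sigma_v^2)."""
    return 1.0 - np.sqrt(sigma_v2) / np.sqrt(T * sigma_eps2 + sigma_v2)

def quasi_demean(z, th):
    """Subtract theta times each entity's time mean; z has shape (N, T)."""
    return z - th * z.mean(axis=1, keepdims=True)

# Illustrative variance components (assumed, not estimated)
th = theta(sigma_v2=1.0, sigma_eps2=1.0, T=3)     # 1 - 1/sqrt(4) = 0.5
z = np.array([[1.0, 2.0, 3.0]])                   # one entity, three periods
z_star = quasi_demean(z, th)                      # [[0.0, 1.0, 2.0]]
print(th, z_star)
```

Two limiting cases help with interpretation: when the entity-specific variance is zero, $\theta = 0$ and the data are left untouched, which corresponds to pooled OLS; as $T \sigma_\epsilon^2$ grows large relative to $\sigma_v^2$, $\theta$ approaches 1 and the transformation approaches the full within transformation used for fixed effects.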
• Just as for the fixed effects model, with random effects, it is also
conceptually no more difficult to allow for time variation than it is to allow
for cross-sectional variation.
• It is often said that the random effects model is more appropriate when the
entities in the sample can be thought of as having been randomly selected from
the population, but a fixed effect model is more plausible when the entities in the
sample effectively constitute the entire population.
• More technically, the transformation involved in the GLS procedure under the
random effects approach will not remove the explanatory variables that do not
vary over time, and hence their impact can be estimated.
• Also, since there are fewer parameters to be estimated with the random effects
model (no dummy variables or within transform to perform), and therefore
degrees of freedom are saved, the random effects model should produce more
efficient estimation than the fixed effects approach.
• However, the random effects approach has a major drawback which arises from
the fact that it is valid only when the composite error term $\omega_{it} = \epsilon_i + v_{it}$ is
uncorrelated with all of the explanatory variables.
• This assumption is more stringent than the corresponding one in the fixed effects
case, because with random effects we thus require both $\epsilon_i$ and $v_{it}$ to be independent of
all of the $x_{it}$.
The Data
• There may also be differences in policies for credit provision dependent upon
the nature of the formation of the subsidiary abroad – i.e. whether the
subsidiary's existence results from a take-over of a domestic bank or from the
formation of an entirely new startup operation (a ‘greenfield investment’).
• The data cover the period 1993-2000 and are obtained from BankScope.
The Model
• These are: weakness parent bank, defined as loan loss provisions made by the
parent bank; solvency, the ratio of equity to total assets; liquidity, the ratio
of liquid assets to total assets; size, the ratio of total bank assets to total
banking assets in the given country; profitability, the return on assets; and
efficiency, the net interest margin.
• $\alpha$ and the $\beta$'s are parameters (or vectors of parameters in the cases of $\beta_4$ and
$\beta_5$), $\mu_i$ is the unobserved random effect that varies across banks but not over
time, and $\epsilon_{it}$ is an idiosyncratic error term.
Estimation Options
• de Haas and van Lelyveld discuss the various techniques that could be
employed to estimate such a model.
• OLS is considered to be inappropriate since it does not allow for differences
in average credit market growth rates at the bank level.
• A model allowing for entity-specific effects (i.e. a fixed effects model that
effectively allowed for a different intercept for each bank) is ruled out on the
grounds that there are many more banks than time periods and thus too many
parameters would be required to be estimated.
• They also argue that these bank-specific effects are not of interest to the
problem at hand, which leads them to select the random effects panel model.
• This essentially allows for a different error structure for each bank. A
Hausman test is conducted, and shows that the random effects model is valid
since the bank-specific effects $\mu_i$ are found “in most cases not to be
significantly correlated with the explanatory variables.”
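The Hausman test mentioned above compares the fixed effects and random effects coefficient estimates: if the random effects assumptions hold, the two sets of estimates should differ only by sampling error. The sketch below computes the usual form of the statistic, $(b_{FE} - b_{RE})' [V_{FE} - V_{RE}]^{-1} (b_{FE} - b_{RE})$, for the coefficients on time-varying regressors. The function name `hausman` and the numerical inputs are hypothetical; they are not de Haas and van Lelyveld's estimates.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """Return the Hausman statistic and its degrees of freedom.

    b_fe, b_re: coefficient vectors on the same time-varying regressors.
    V_fe, V_re: their estimated covariance matrices.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    stat = float(d @ np.linalg.solve(V_diff, d))
    return stat, d.size

# Purely illustrative inputs (assumed values, one regressor)
stat, df = hausman(b_fe=[1.0], b_re=[0.8], V_fe=[[0.05]], V_re=[[0.01]])
print(f"Hausman statistic = {stat:.3f} with {df} degree(s) of freedom")
```

Under the null hypothesis that the random effects are uncorrelated with the explanatory variables, the statistic is asymptotically chi-squared with degrees of freedom equal to the number of coefficients compared. In this made-up example the statistic is about 1.0, below the 5% critical value of 3.84 for one degree of freedom, so the random effects specification would not be rejected.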
Results