Chapter 4
25-04-2025
What are panel data?
• Panel (or longitudinal) data combine
time-series and cross-sectional data
in a very specific way.
• Panel data include observations on
the same variables from the same
cross-sectional sample from two or
more different time periods.
– For example, if you surveyed 200
students when they graduated from
your school and then administered the
same questionnaire to the same
individuals five years later, you would
have created a panel data set.
• Not every data set that combines time-
series and cross-sectional data meets this
definition.
• In particular, if different variables are
observed in the different time periods or
if the data are drawn from different
samples in the different time periods,
then the data are not considered to be
panel data
Why use panel data?
• Panel data give “more informative data,
more variability, less collinearity among
variables, more degrees of freedom and
more efficiency.”
• Panel data also provide insight into analytical
questions that can’t be answered by using
time-series or cross-sectional data alone.
• For example, panel data can help policymakers
design programs aimed at reducing
unemployment by allowing researchers to
determine whether the same people are
unemployed year after year or whether
different
individuals are unemployed in different
years.
What types of variables do we use?
• There are four different kinds of variables
that we encounter when we use panel
data.
First, we have variables that can differ
between individuals but don’t
change over time, such as gender,
ethnicity, and race.
Second, we have variables that change
over time but are the same for all
individuals in a given time period,
such as the retail price index and the
national unemployment rate.
• If the number of observations differs among
panel members, we call such a panel an
unbalanced panel.
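A minimal Python sketch of this check, assuming the data arrive as (unit, period) pairs. The function name `is_balanced` and the toy observations are hypothetical and serve only to illustrate the definition.

```python
from collections import Counter


def is_balanced(observations):
    """Return True if every panel member has the same number of observations.

    `observations` is an iterable of (unit_id, period) pairs (assumed layout).
    """
    counts = Counter(unit for unit, _period in observations)
    return len(set(counts.values())) <= 1


# Hypothetical toy data: unit "B" is missing its 2021 observation below.
balanced = [("A", 2020), ("A", 2021), ("B", 2020), ("B", 2021)]
unbalanced = [("A", 2020), ("A", 2021), ("B", 2020)]

print(is_balanced(balanced))
print(is_balanced(unbalanced))
```

Counting observations per unit is enough here because the definition only compares how many observations each panel member contributes.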
Estimation of Panel Data
Regression
The Fixed Effects Approach
• The intercept values of the four
regions are statistically different:
−245.7924 for AM, −84.2202
(= −245.7924 + 161.5722) for OR,
93.8404 (= −245.7924 + 339.6328)
for TG, and −59.2258 (= −245.7924
+ 186.5666) for SNNP.
• These differences in the intercepts
may be due to unique features of
each region, such as differences in
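The region intercepts are obtained by adding each differential intercept (dummy coefficient) to the intercept of the base region, AM. A small Python sketch of that arithmetic, using only the coefficients reported on this slide; the variable names are illustrative.

```python
# Base-region intercept (AM) and the differential intercepts from the dummies,
# copied from the figures reported on the slide.
base_intercept = -245.7924
differentials = {"OR": 161.5722, "TG": 339.6328, "SNNP": 186.5666}

# Each region's intercept = base intercept + that region's dummy coefficient.
intercepts = {"AM": base_intercept}
for region, diff in differentials.items():
    intercepts[region] = round(base_intercept + diff, 4)

for region, value in intercepts.items():
    print(f"{region}: {value:.4f}")
```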
The Random Effects Approach
• The reasoning underlying the LSDV model is
that in specifying the regression model we
have failed to include relevant explanatory
variables that do not change over time (and
possibly others that do change over time but
have the same value for all cross-sectional
units), and that the inclusion of dummy
variables is a cover-up of our ignorance.
• If the dummy variables do in fact
represent a lack of knowledge about the
(true) model, why not express this
ignorance through the disturbance term $u_{it}$?
This is precisely the approach suggested
by the proponents of the so-called error
components model (ECM) or random
effects model (REM). The basic idea is
to start with (4.3):
• $Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \dots + \beta_k X_{kit} + u_{it}$ (4.6)
• Instead of treating $\beta_{1i}$ as fixed, we assume that it is a
random variable with a mean value of $\beta_1$ (no subscript
$i$ here). The intercept value for an individual region
can then be expressed as
• $\beta_{1i} = \beta_1 + \varepsilon_i$, $i = 1, 2, \dots, N$ (4.7)
• where $\varepsilon_i$ is a random error term with a mean value of
zero and variance of $\sigma_\varepsilon^2$.
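• What we are essentially saying is that the four
regions included in our sample are a drawing
from a much larger universe of such regions and
that they have a common mean value for the
intercept ($= \beta_1$) and the individual differences in
the intercept values of each region are reflected
in the error term $\varepsilon_i$. Substituting (4.7) into (4.6),
we obtain:
• $Y_{it} = \beta_1 + \beta_2 X_{2it} + \dots + \beta_k X_{kit} + \varepsilon_i + u_{it} = \beta_1 + \beta_2 X_{2it} + \dots + \beta_k X_{kit} + w_{it}$ (4.8)
• where $w_{it} = \varepsilon_i + u_{it}$ (4.9)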
• The composite error term $w_{it}$ consists of two
components: $\varepsilon_i$, which is the cross-section, or
individual-specific, error component, and $u_{it}$,
which is the combined time series and cross-
section error component.
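• The term error components model derives its
name because the composite error term $w_{it}$
consists of two (or more) error components. The
usual assumptions made by REM are that
• $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$
• $u_{it} \sim N(0, \sigma_u^2)$ (4.10)
• $E(\varepsilon_i u_{it}) = 0$, $E(\varepsilon_i \varepsilon_j) = 0$ $(i \neq j)$
• $E(u_{it} u_{is}) = E(u_{it} u_{jt}) = E(u_{it} u_{js}) = 0$ $(i \neq j;\ t \neq s)$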
• As a result of the assumptions stated in (4.10), it
follows that
• $E(w_{it}) = 0$ (4.11)
• $\operatorname{var}(w_{it}) = \sigma_\varepsilon^2 + \sigma_u^2$ (4.12)
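Both results follow in one step each from the assumptions in (4.10). A short derivation, writing the composite error as $w_{it} = \varepsilon_i + u_{it}$; the symbols follow the usual error components notation and are assumed here wherever the slide text lost them.

```latex
\begin{align*}
E(w_{it}) &= E(\varepsilon_i) + E(u_{it}) = 0 + 0 = 0, \\
\operatorname{var}(w_{it}) &= \operatorname{var}(\varepsilon_i)
  + \operatorname{var}(u_{it})
  + 2\operatorname{cov}(\varepsilon_i, u_{it}) \\
&= \sigma_\varepsilon^2 + \sigma_u^2 + 0
 = \sigma_\varepsilon^2 + \sigma_u^2 .
\end{align*}
```

The covariance term vanishes because the two error components are assumed to be uncorrelated with each other.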
• Now if $\sigma_\varepsilon^2 = 0$, there is no difference between models
(4.1) and (4.8), in which case we can simply pool
all the (cross-sectional and time series)
observations and just run the pooled regression,
as we did in (4.2). As (4.12) shows, the error
term $w_{it}$ is homoscedastic.
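The homoscedasticity claim can be checked with a small simulation. The sketch below assumes normally distributed components with hypothetical standard deviations (`SIGMA_EPS`, `SIGMA_U`) and an arbitrary panel size. It draws one unit-specific error per unit, adds a fresh idiosyncratic error in every period, and compares the variance of the composite error in each period with the sum of the two component variances.

```python
import random
import statistics

random.seed(0)

SIGMA_EPS = 1.0   # std. dev. of the unit-specific component (assumed value)
SIGMA_U = 2.0     # std. dev. of the idiosyncratic component (assumed value)
N_UNITS, N_PERIODS = 2000, 5

# by_period[t] collects the composite errors of all units in period t.
by_period = [[] for _ in range(N_PERIODS)]
for _unit in range(N_UNITS):
    eps_i = random.gauss(0.0, SIGMA_EPS)          # drawn once per unit
    for t in range(N_PERIODS):
        u_it = random.gauss(0.0, SIGMA_U)         # drawn for every unit-period
        by_period[t].append(eps_i + u_it)

theory = SIGMA_EPS ** 2 + SIGMA_U ** 2
variances = [statistics.pvariance(w_t) for w_t in by_period]
means = [statistics.fmean(w_t) for w_t in by_period]

for t, (m, v) in enumerate(zip(means, variances), start=1):
    print(f"period {t}: mean {m:.3f}, variance {v:.3f} (theory {theory:.3f})")
```

Under these assumptions the sample variance should be close to the same value in every period, which is what a homoscedastic composite error implies.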
Assumptions of fixed effects
I. The slopes of the regression lines
are the same across states
(countries)
II. The fixed effects entirely capture
the time-constant omitted
variables.
This means we can soak up unmodelled
heterogeneity across
individuals/regions/countries and thus
avoid misspecification error
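One way to see how fixed effects soak up time-constant heterogeneity is the within (demeaning) transformation, which yields the same slope estimates as including a dummy for every unit. The sketch below swaps that transformation in for the dummy-variable regression for brevity. The toy panel, the unit intercepts, and the true slope of 2 are hypothetical numbers chosen so the result is exact.

```python
from statistics import fmean

# Toy panel (hypothetical numbers): y = alpha_i + 2 * x, where alpha_i is a
# time-constant, unit-specific intercept that the analyst does not observe.
panel = {
    "unit_1": {"x": [1.0, 2.0, 3.0], "alpha": 10.0},
    "unit_2": {"x": [4.0, 6.0, 8.0], "alpha": -5.0},
}
TRUE_SLOPE = 2.0

x_dm, y_dm = [], []
for unit in panel.values():
    y = [unit["alpha"] + TRUE_SLOPE * x for x in unit["x"]]
    x_bar, y_bar = fmean(unit["x"]), fmean(y)
    # Subtracting each unit's own mean removes anything constant over time,
    # including alpha_i.
    x_dm += [x - x_bar for x in unit["x"]]
    y_dm += [v - y_bar for v in y]

# OLS slope through the origin on the demeaned data.
within_slope = sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)
print(within_slope)
```

Because the unit intercepts drop out after demeaning, the slope is recovered without ever measuring them. A variable that changes over time within a unit would not be removed this way.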
But if there are time-varying omitted
Several considerations will affect the choice
between a fixed effects and a random
effects model.
1. What is the nature of the variables that
have been omitted from the model?
a) If you think there are no omitted variables –
or if you believe that the omitted variables are
uncorrelated with the explanatory variables
that are in the model – then a random
effects model is probably best. It will produce
unbiased estimates of the coefficients, use all the
data available, and produce the smallest
standard errors. More likely, however,
omitted variables will produce at least some bias.
b) If there are omitted variables, and these
variables are correlated with the variables in
the model, then fixed effects models may provide a
means for controlling for omitted variable bias.
In a fixed-effects model, subjects serve as
their own controls.
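The idea that subjects serve as their own controls can be illustrated with a simulation. The sketch below assumes a hypothetical data-generating process in which an omitted, time-constant variable `a_i` is correlated with the regressor. It compares a pooled OLS slope, which ignores `a_i`, with a fixed effects (within) slope. Pooled OLS stands in for an estimator that treats the omitted variable as harmless; a full random effects (GLS) estimator is not implemented here.

```python
import random
from statistics import fmean

random.seed(1)
TRUE_SLOPE = 1.0                 # assumed true coefficient
N_UNITS, N_PERIODS = 500, 4      # assumed panel size


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit with an intercept."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


x_all, y_all, x_dm, y_dm = [], [], [], []
for _unit in range(N_UNITS):
    a_i = random.gauss(0.0, 1.0)                       # omitted, time-constant
    xs = [a_i + random.gauss(0.0, 1.0) for _t in range(N_PERIODS)]  # correlated with a_i
    ys = [TRUE_SLOPE * x + a_i + random.gauss(0.0, 1.0) for x in xs]
    x_all += xs
    y_all += ys
    x_bar, y_bar = fmean(xs), fmean(ys)
    x_dm += [x - x_bar for x in xs]                    # each unit compared with itself
    y_dm += [y - y_bar for y in ys]

pooled = ols_slope(x_all, y_all)
within = ols_slope(x_dm, y_dm)
print(f"pooled OLS slope:    {pooled:.3f}")
print(f"fixed effects slope: {within:.3f} (true value {TRUE_SLOPE})")
```

Under these assumptions the pooled slope is pushed upward by the omitted variable, while the within slope stays close to the true value because each unit's own mean has been differenced out.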