Chapter 4

Chapter four introduces panel data regression models, which combine time-series and cross-sectional data to analyze the same variables over different time periods. It discusses the advantages of using panel data, such as increased variability and efficiency, and outlines the two main estimation approaches: fixed effects and random effects. The chapter also highlights the assumptions and considerations for choosing between these two models, emphasizing the importance of understanding the nature of omitted variables and their correlation with the independent variables.

Uploaded by melkamu lobango
Copyright © All Rights Reserved

Chapter four

Introduction to Panel Data Regression Models

25-04-2025
What are panel data?
• Panel (or longitudinal) data combine time-series and cross-sectional data in a very specific way.
• Panel data include observations on the same variables from the same cross-sectional sample from two or more different time periods.
– For example, if you surveyed 200 students when they graduated from your school and then administered the same questionnaire to the same individuals five years later, you would have panel data.
• Not every data set that combines time-series and cross-sectional data meets this definition.
• In particular, if different variables are observed in the different time periods, or if the data are drawn from different samples in the different time periods, then the data are not considered to be panel data.
Why use panel data?
• Panel data give “more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency.”
• Panel data provide insight into analytical questions that cannot be answered by using time-series or cross-sectional data alone.
• For example, panel data can help policymakers design programs aimed at reducing unemployment by allowing researchers to determine whether the same people are unemployed year after year or whether different individuals are unemployed in different years.
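To see why it matters that the same individuals are linked across years, here is a minimal Python sketch on a tiny hypothetical two-wave panel. The person IDs and statuses are invented. The unemployment rate is 60% in both waves, and that is all two separate cross-sections could show. Following the same people reveals that two of the three people unemployed in wave 1 are still unemployed in wave 2.

```python
# Hypothetical two-wave panel: person ID -> (unemployed in wave 1, unemployed in wave 2)
panel = {
    "p1": (True, True),
    "p2": (True, False),
    "p3": (False, True),
    "p4": (False, False),
    "p5": (True, True),
}

def persistence_share(panel):
    """Share of people unemployed in wave 1 who are still unemployed in wave 2."""
    unemployed_w1 = [pid for pid, (w1, _) in panel.items() if w1]
    if not unemployed_w1:
        return 0.0
    still_unemployed = sum(1 for pid in unemployed_w1 if panel[pid][1])
    return still_unemployed / len(unemployed_w1)

# Cross-sectional view: the unemployment rate in each wave
rate_w1 = sum(w1 for w1, _ in panel.values()) / len(panel)
rate_w2 = sum(w2 for _, w2 in panel.values()) / len(panel)

print(rate_w1, rate_w2)            # 0.6 0.6
print(persistence_share(panel))    # 2 of the 3 wave-1 unemployed remain unemployed
```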
What types of variables do we use?
• There are four different kinds of variables that we encounter when we use panel data.
First, we have variables that can differ between individuals but don’t change over time, such as gender, ethnicity, and race.
Second, we have variables that change over time but are the same for all individuals in a given time period, such as the retail price index and the national unemployment rate.
• If the number of observations differs among panel members, we call such a panel an unbalanced panel.
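A minimal sketch follows. It assumes a long-format layout of (entity, period, values…) rows, and the entities, years and variable names (`gender`, `national_unemp`, `income`) are all invented. `classify` reports two things for a variable: whether it varies over time within an entity, and whether it varies across entities within a period. This separates the first two kinds of variables listed above. `is_balanced` checks whether every entity has the same number of observations.

```python
from collections import defaultdict

# Hypothetical long-format panel: (entity, period, gender, national_unemp, income)
rows = [
    ("A", 2020, "F", 5.1, 100.0),
    ("A", 2021, "F", 4.8, 110.0),
    ("B", 2020, "M", 5.1, 90.0),
    ("B", 2021, "M", 4.8, 95.0),
    ("C", 2020, "F", 5.1, 120.0),  # entity C is observed in one period only
]
columns = ["gender", "national_unemp", "income"]

def classify(rows, col):
    """Return (varies over time within an entity, varies across entities within a period)."""
    by_entity, by_period = defaultdict(set), defaultdict(set)
    for entity, period, *values in rows:
        by_entity[entity].add(values[col])
        by_period[period].add(values[col])
    varies_over_time = any(len(s) > 1 for s in by_entity.values())
    varies_across_entities = any(len(s) > 1 for s in by_period.values())
    return varies_over_time, varies_across_entities

def is_balanced(rows):
    """True if every entity has the same number of observations."""
    counts = defaultdict(int)
    for entity, *_ in rows:
        counts[entity] += 1
    return len(set(counts.values())) == 1

# gender: (False, True), national_unemp: (True, False), income: (True, True)
for col, name in enumerate(columns):
    print(name, classify(rows, col))
print("balanced:", is_balanced(rows))  # False, because C has fewer observations
```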

Estimation of Panel Data Regression

• The two main approaches are the fixed effects approach and the random effects approach.
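The Fixed Effects Approach

• The fixed effects model estimates panel data equations by including enough dummy variables to allow each cross-sectional entity (like a state or country) and each time period to have a different intercept:
$$Y_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (4.4)$$
• where $D_{2i} = 1$ if the observation belongs to OR, 0 otherwise; $D_{3i} = 1$ if the observation belongs to TG, 0 otherwise; and $D_{4i} = 1$ if the observation belongs to SNNP, 0 otherwise. AM is the reference region, so its intercept is $\alpha_1$.
• Since we are using dummies to estimate the fixed effects, model (4.4) is also known as the least-squares dummy variable (LSDV) model.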
• So, the terms fixed effects and LSDV can be used interchangeably. The results based on (4.4) are as follows:
$$\hat{Y}_{it} = -245.7924 + 161.5722\,D_{2i} + 339.6328\,D_{3i} + 186.5666\,D_{4i} + 0.1079\,X_{2it} + 0.3461\,X_{3it}$$
• se = (35.8112) (46.4563) (23.9863) (31.5068) (0.0175) (0.0266)
• t = (−6.8635) (3.4779) (14.1594) (5.9214) (6.1653) (12.9821)
• $R^2 = 0.9345$, $d = 1.5976$, and df = 74 (4.5)
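The LSDV regression in (4.4) is ordinary least squares on a design matrix containing a constant, three region dummies (AM omitted) and the two regressors. The sketch below uses simulated data, not the chapter's data set. The true intercepts, slopes, noise level and the 20 periods per region are all hypothetical. With 4 regions and 20 periods, the residual degrees of freedom are 80 − 6 = 74.

```python
import numpy as np

rng = np.random.default_rng(0)
regions = ["AM", "OR", "TG", "SNNP"]  # AM is the omitted reference region
# Hypothetical true intercepts and common slopes (not the chapter's estimates)
true_intercept = {"AM": -240.0, "OR": -80.0, "TG": 90.0, "SNNP": -60.0}
beta2, beta3 = 0.1, 0.35
T = 20  # hypothetical number of periods per region

region, x2, x3, y = [], [], [], []
for r in regions:
    x2_r = rng.uniform(0, 1000, T)
    x3_r = rng.uniform(0, 500, T)
    noise = rng.normal(0, 1.0, T)
    region += [r] * T
    x2.append(x2_r)
    x3.append(x3_r)
    y.append(true_intercept[r] + beta2 * x2_r + beta3 * x3_r + noise)
region = np.array(region)
x2, x3, y = np.concatenate(x2), np.concatenate(x3), np.concatenate(y)

# Design matrix for (4.4): constant, D2 (OR), D3 (TG), D4 (SNNP), X2, X3
X = np.column_stack([
    np.ones_like(y),
    (region == "OR").astype(float),
    (region == "TG").astype(float),
    (region == "SNNP").astype(float),
    x2,
    x3,
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a1, a2, a3, a4, b2, b3 = coef
df = len(y) - X.shape[1]  # 80 observations - 6 coefficients = 74
print(np.round(coef, 4), "df =", df)
```

The dummy coefficients `a2`, `a3`, `a4` are differential intercepts relative to AM. They should land near 160, 330 and 180 in this simulation.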

• The intercept values of the four regions are statistically different, being −245.7924 for AM, −84.2202 (= −245.7924 + 161.5722) for OR, 93.8404 (= −245.7924 + 339.6328) for TG, and −59.2258 (= −245.7924 + 186.5666) for SNNP.
• These differences in the intercepts may be due to unique features of each region, such as differences in …
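Each region's intercept is the base (AM) intercept plus that region's differential intercept. A short Python check using the coefficients reported in (4.5):

```python
# Coefficients reported in (4.5); AM is the reference region
base = -245.7924
differential = {"OR": 161.5722, "TG": 339.6328, "SNNP": 186.5666}

intercepts = {"AM": base}
for name, d in differential.items():
    intercepts[name] = round(base + d, 4)  # region intercept = base + differential

print(intercepts)  # OR: -84.2202, TG: 93.8404, SNNP: -59.2258
```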
The Random Effects Approach
• The reasoning underlying the LSDV model is that in specifying the regression model we have failed to include relevant explanatory variables that do not change over time (and possibly others that do change over time but have the same value for all cross-sectional units), and that the inclusion of dummy variables is a way of covering up our ignorance.
• If the dummy variables do in fact represent a lack of knowledge about the (true) model, why not express this ignorance through the disturbance term $u_{it}$? This is precisely the approach suggested by the proponents of the so-called error components model (ECM) or random effects model (REM). The basic idea is to start with (4.3):
$$Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (4.6)$$
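• Instead of treating $\beta_{1i}$ as fixed, we assume that it is a random variable with a mean value of $\beta_1$ (no subscript $i$ here). The intercept value for an individual region can then be expressed as
$$\beta_{1i} = \beta_1 + \varepsilon_i, \qquad i = 1, 2, \dots, N \qquad (4.7)$$
• where $\varepsilon_i$ is a random error term with a mean value of zero and variance $\sigma_\varepsilon^2$, and $N$ is the number of regions.
• What we are essentially saying is that the four regions included in our sample are a drawing from a much larger universe of such regions, that they have a common mean value for the intercept ($= \beta_1$), and that the individual differences in the intercept values of each region are reflected in the error term $\varepsilon_i$. Substituting (4.7) into (4.6), we obtain:
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \varepsilon_i + u_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + w_{it} \qquad (4.8)$$
where $w_{it} = \varepsilon_i + u_{it}$.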
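• The composite error term $w_{it}$ consists of two components: $\varepsilon_i$, which is the cross-section, or individual-specific, error component, and $u_{it}$, which is the combined time-series and cross-section error component.
• The error components model gets its name from the fact that the composite error term $w_{it}$ consists of two (or more) error components. The usual assumptions made by REM are that
$$\begin{aligned}
\varepsilon_i &\sim N(0, \sigma_\varepsilon^2)\\
u_{it} &\sim N(0, \sigma_u^2)\\
E(\varepsilon_i u_{it}) &= 0, \qquad E(\varepsilon_i \varepsilon_j) = 0 \quad (i \neq j)\\
E(u_{it} u_{is}) &= E(u_{it} u_{jt}) = E(u_{it} u_{js}) = 0 \quad (i \neq j;\ t \neq s)
\end{aligned} \qquad (4.10)$$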
• As a result of the assumptions stated in (4.10), it follows that
$$E(w_{it}) = 0 \qquad (4.11)$$
$$\operatorname{var}(w_{it}) = \sigma_\varepsilon^2 + \sigma_u^2 \qquad (4.12)$$
• Now if $\sigma_\varepsilon^2 = 0$, there is no difference between models (4.1) and (4.8). In that case we can simply pool all the (cross-sectional and time-series) observations and run the pooled regression, as we did in (4.2). As (4.12) shows, the error term $w_{it}$ is homoscedastic.
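The moments in (4.11) and (4.12) can be checked by simulation. The sketch below draws one $\varepsilon_i$ per entity and a fresh $u_{it}$ for every observation, then forms $w_{it} = \varepsilon_i + u_{it}$. The panel size and the two standard deviations are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 10                # hypothetical numbers of entities and periods
sigma_eps, sigma_u = 2.0, 1.0  # hypothetical standard deviations of eps_i and u_it

eps = rng.normal(0.0, sigma_eps, size=(N, 1))  # one draw per entity, shared by all its periods
u = rng.normal(0.0, sigma_u, size=(N, T))      # a fresh draw for every (i, t)
w = eps + u                                    # composite error w_it = eps_i + u_it

print("mean:", w.mean())  # close to 0, as in (4.11)
print("variance:", w.var(), "theory:", sigma_eps**2 + sigma_u**2)  # (4.12): 4 + 1 = 5
```

Setting `sigma_eps = 0.0` makes `w` identical to `u`. That is the pooled-regression case described above.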
Assumptions of fixed effects
I. The slopes of the regression lines are the same across states (countries).
II. The fixed effects entirely capture the time-constant omitted variables.
This means we can soak up unmodelled heterogeneity across individuals/regions/countries and thus avoid misspecification error.
But if there are time-varying omitted …
Several considerations will affect the choice between a fixed effects and a random effects model.
1. What is the nature of the variables that have been omitted from the model?
a) If you think there are no omitted variables, or if you believe that the omitted variables are uncorrelated with the explanatory variables in the model, then a random effects model is probably best. It will produce unbiased estimates of the coefficients, use all the data available, and produce the smallest standard errors. More likely, however, omitted variables will produce at least some bias.
b) If there are omitted variables, and these variables are correlated with the variables in the model, then fixed effects models may provide a means of controlling for omitted variable bias.
• In a fixed-effects model, subjects serve as their own controls.
• The idea/hope is that whatever effects the omitted variables have on the subject at one time, they will also have at a later time; hence their effects will be constant, or “fixed.”
• HOWEVER, for this to be true, the omitted variables must have time-invariant values with time-invariant effects:
I. By time-invariant values, we mean that the value of the variable does not change across time. Gender and race are obvious examples, but this can also include things like the educational level of the respondent’s father.
II. By time-invariant effects, we mean that the variable has the same effect across time, e.g. the effect of gender on the outcome at time 1 is the same as the effect of gender at time 5.
III. If either of these assumptions is violated, we need explicit measurements of the variables in question and must include them in the model.
• We also need explicit measurements of time-invariant variables if they are thought to interact with other variables in the model.
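A small simulation illustrates point b). All numbers in it are hypothetical. An unobserved time-invariant variable `z` that is correlated with `x` biases the pooled OLS slope. Subtracting each entity's own mean (the within transformation that fixed effects rely on) removes `z` and recovers the true slope of 1. In this design the pooled slope converges to about 2.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 5                        # hypothetical panel dimensions
z = rng.normal(size=(N, 1))          # unobserved, time-invariant variable
x = z + rng.normal(size=(N, T))      # x is correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=(N, T))  # true slope on x is 1

def ols_slope(a, b):
    """Slope from an OLS regression of b on a with an intercept."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return float((a_c * b_c).sum() / (a_c ** 2).sum())

# Pooled OLS ignores z, so the slope absorbs part of z's effect
pooled = ols_slope(x.ravel(), y.ravel())

# Within transformation: subtract each entity's own mean; z_i cancels out
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
within = ols_slope(x_within.ravel(), y_within.ravel())

print(f"pooled: {pooled:.3f}  within (fixed effects): {within:.3f}")
```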
2. How much variability is there within subjects?
a. If subjects change little, or not at all, across time, a fixed effects model may not work very well, or may not work at all. There needs to be within-subject variability in the variables if we are to use subjects as their own controls. If there is little variability within subjects, the standard errors from fixed effects models may be too large to tolerate.
b. Conversely, random effects models will often have smaller standard errors. The trade-off is that their coefficients are more likely to be biased.
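One rough way to gauge within-subject variability before choosing fixed effects is to compare two spreads: the spread of a variable around each entity's own mean (within), and the spread of the entity means themselves (between). The helper below is a hypothetical sketch using population standard deviations (`ddof=0`). A within SD near zero means fixed effects have little variation to work with.

```python
import numpy as np

def within_between_sd(x):
    """x has shape (entities, periods). Returns (within SD, between SD) with ddof=0."""
    entity_means = x.mean(axis=1, keepdims=True)
    within = float((x - entity_means).std())   # spread around each entity's own mean
    between = float(entity_means.std())        # spread of the entity means
    return within, between

stable = np.array([[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]])  # no change over time
mixed = np.array([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])   # changes over time

print(within_between_sd(stable))  # within SD is 0: nothing for fixed effects to use
print(within_between_sd(mixed))
```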
3. Do we wish to estimate the effects of variables whose values do not change across time, or do we merely wish to control for them?
a) With fixed effects models, we do not estimate the effects of variables whose values do not change across time. Rather, we control for them or “partial them out.” This is similar to an experiment with random assignment. We may not measure such variables, but whatever effects those variables have are (subject to sampling variability) assumed …
• The equation for the fixed effects model becomes:
$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it} \qquad \text{[eq.1]}$$
Where
– $\alpha_i$ ($i = 1, \dots, n$) is the unknown intercept for each entity ($n$ entity-specific intercepts),
– $Y_{it}$ is the dependent variable (DV), where $i$ = entity and $t$ = time,
– $X_{it}$ represents one independent variable (IV),
– $\beta_1$ is the coefficient for that IV,
– $u_{it}$ is the error term.
“The key insight is that if the unobserved variable
does not change over time, then any changes in
the dependent variable must be due to influences
other than these fixed characteristics.” (Stock
and Watson, 2003, p.289-290).
“In the case of time-series cross-sectional data, the interpretation of the beta coefficients would be ‘…for a given country, as X varies across time by one unit, Y increases or decreases by β units.’”
• Another way to see the fixed effects model is by using binary variables. So the equation for the fixed effects model becomes:
$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \gamma_2 E_2 + \dots + \gamma_n E_n + u_{it} \qquad \text{[eq.2]}$$
Where
– $Y_{it}$ is the dependent variable (DV), where $i$ = entity and $t$ = time,
– $X_{k,it}$ represents independent variables (IV),
– $\beta_k$ is the coefficient for the IVs,
– $u_{it}$ is the error term,
– $E_n$ is the entity n. Since they are binary (dummies), you have n − 1 entities included in the model,
– $\gamma_2$ is the coefficient for the binary regressors (entities).
• Both eq.1 and eq.2 are equivalent:
“the slope coefficient on X is the same from
one [entity] to the next. The [entity]-specific
intercepts in [eq.1] and the binary
regressors in [eq.2] have the same source:
the unobserved variable Zi that varies
across states but not over time.”
• You could add time effects to the entity effects model to obtain a time and entity fixed effects regression model:
$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \gamma_2 E_2 + \dots + \gamma_n E_n + \delta_2 T_2 + \dots + \delta_t T_t + u_{it} \qquad \text{[eq.3]}$$
• Where
– $Y_{it}$ is the dependent variable (DV), where $i$ = entity and $t$ = time,
– $X_{k,it}$ represents independent variables (IV),
– $\beta_k$ is the coefficient for the IVs,
– $u_{it}$ is the error term,
– $E_n$ is the entity n. Since they are binary (dummies), you have n − 1 entities included in the model,
– $\gamma_2$ is the coefficient for the binary regressors (entities),
– $T_t$ is time as a binary variable (dummy), so we have t − 1 time periods,
– $\delta_t$ is the coefficient for the binary time regressors.
• Control for time effects whenever unexpected variation or special events may affect the outcome variable.
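The sketch below uses simulated data (6 entities, 8 periods, hypothetical effects). It fits eq.2 with n − 1 entity dummies and checks that its slope equals the slope from the entity-demeaned version of eq.1. It then adds t − 1 time dummies to obtain eq.3. The names `dummies` and `demean` are illustrative helpers, not a library API.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 6, 8                                    # hypothetical: 6 entities, 8 periods
entity = np.repeat(np.arange(n), T)            # entity code for each row
period = np.tile(np.arange(T), n)              # period code for each row
alpha = rng.normal(0, 3, n)                    # hypothetical entity effects
delta = rng.normal(0, 1, T)                    # hypothetical time effects
x = rng.normal(size=n * T) + alpha[entity]     # x is correlated with the entity effects
y = 1.5 * x + alpha[entity] + delta[period] + rng.normal(0, 0.5, n * T)

def dummies(codes, k):
    """Binary columns for categories 1..k-1; category 0 is the omitted base."""
    return (codes[:, None] == np.arange(1, k)[None, :]).astype(float)

def demean(v, codes, k):
    """Subtract the group mean of v from every observation in that group."""
    means = np.bincount(codes, weights=v, minlength=k) / np.bincount(codes, minlength=k)
    return v - means[codes]

const = np.ones((n * T, 1))
X_eq2 = np.hstack([const, x[:, None], dummies(entity, n)])  # eq.2: n-1 entity dummies
X_eq3 = np.hstack([X_eq2, dummies(period, T)])              # eq.3: plus t-1 time dummies
b_eq2, *_ = np.linalg.lstsq(X_eq2, y, rcond=None)
b_eq3, *_ = np.linalg.lstsq(X_eq3, y, rcond=None)

# eq.1 estimated by demeaning x and y within each entity
x_w, y_w = demean(x, entity, n), demean(y, entity, n)
b_within = (x_w @ y_w) / (x_w @ x_w)

print("eq.2 slope:", b_eq2[1], " eq.1 (within) slope:", b_within)
print("eq.3 slope:", b_eq3[1], " columns:", X_eq3.shape[1])  # 1 + 1 + (n-1) + (T-1) = 14
```

The first two printed slopes agree up to rounding error. This is the equivalence of eq.1 and eq.2 quoted above.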
Summary
• “…The fixed-effects model controls for all time-invariant differences between the individuals, so the estimated coefficients of the fixed-effects models cannot be biased because of omitted time-invariant characteristics…[like culture, religion, gender, race, etc.] One side effect of the features of fixed-effects models is that they cannot be used to investigate time-invariant causes of the dependent variables. Technically, time-invariant characteristics of the individuals are perfectly collinear with the person [or entity] dummies.
• Substantively, fixed-effects models are designed to study the causes of changes within a person [or entity]. A time-invariant characteristic cannot cause such a change, because it is constant for each person [or entity].”
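The perfect collinearity mentioned in the summary can be seen directly. A time-invariant column such as `gender` is a sum of entity dummies, so adding it does not raise the rank of the design matrix. A minimal sketch with an invented 4-entity panel:

```python
import numpy as np

n, T = 4, 3                                       # hypothetical: 4 entities, 3 periods
entity = np.repeat(np.arange(n), T)
gender = np.array([0.0, 1.0, 1.0, 0.0])[entity]  # hypothetical time-invariant characteristic
x = np.arange(n * T, dtype=float)                # a regressor that changes over time

const = np.ones(n * T)
D = (entity[:, None] == np.arange(1, n)[None, :]).astype(float)  # n-1 entity dummies
X_without = np.column_stack([const, x, D])
X_with = np.column_stack([const, x, D, gender])

print(X_without.shape[1], np.linalg.matrix_rank(X_without))  # 5 columns, rank 5
print(X_with.shape[1], np.linalg.matrix_rank(X_with))        # 6 columns, still rank 5
```

Here `gender` equals the sum of the dummies for entities 1 and 2, so OLS cannot separate its effect from theirs.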
Random effects model
• The rationale behind the random effects model is that, unlike in the fixed effects model, the variation across entities is assumed to be random and uncorrelated with the predictor or independent variables included in the model:
• “…the crucial distinction between fixed and random effects is whether the unobserved individual effect embodies elements that are correlated with the regressors in the model, not whether these effects are stochastic or not.”
• If you have reason to believe that differences across entities have some influence on your dependent variable, then you should use random effects. An advantage of random effects is that you can include time-invariant variables (e.g. gender). In the fixed effects model these variables are absorbed by the intercept.
The random effects model is:
$$Y_{it} = \beta X_{it} + \alpha + u_{it} + \varepsilon_{it} \qquad \text{[eq.4]}$$
where $u_{it}$ is the between-entity error and $\varepsilon_{it}$ is the within-entity error.
• Random effects models assume that the entity’s error term is not correlated with the predictors. This allows time-invariant variables to play a role as explanatory variables.
• In random effects you need to specify those individual characteristics that may or may not influence the predictor variables.
• The problem with this is that some variables may not be available, which leads to omitted variable bias in the model. RE allows you to generalize the inferences beyond the sample used in the model.
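The chapter does not spell out how a random effects model is estimated. One standard approach is GLS via quasi-demeaning, shown here with the notation of (4.10). It assumes the panel is balanced and the variance components $\sigma_\varepsilon^2$ and $\sigma_u^2$ are known. The method subtracts only a fraction $\theta = 1 - \sigma_u/\sqrt{\sigma_u^2 + T\sigma_\varepsilon^2}$ of each entity mean. Because the time-invariant regressor `g` is not wiped out, its coefficient can be estimated, which is the advantage described above. All data are simulated with hypothetical parameters. In practice the variance components must be estimated.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 300, 5                                  # hypothetical balanced panel
sigma_eps, sigma_u = 1.5, 1.0                  # variance components, assumed known here
g = rng.integers(0, 2, size=(N, 1)).astype(float)  # time-invariant regressor (e.g. gender)
x = rng.normal(size=(N, T))                    # time-varying regressor
eps = rng.normal(0, sigma_eps, size=(N, 1))    # entity effect, independent of x and g
y = 2.0 + 0.8 * x + 1.2 * g + eps + rng.normal(0, sigma_u, size=(N, T))

# Fraction of each entity mean to subtract (theta = 1 would be the within transformation)
theta = 1 - sigma_u / np.sqrt(sigma_u**2 + T * sigma_eps**2)

def quasi_demean(v):
    """Return v_it - theta * mean_t(v_i) as a flat vector; v may be (N, 1) or (N, T)."""
    v = np.broadcast_to(v, (N, T))
    return (v - theta * v.mean(axis=1, keepdims=True)).ravel()

X_star = np.column_stack([
    quasi_demean(np.ones((N, 1))),  # the constant becomes 1 - theta
    quasi_demean(x),
    quasi_demean(g),                # survives because theta < 1
])
coef, *_ = np.linalg.lstsq(X_star, quasi_demean(y), rcond=None)
print("theta:", round(float(theta), 4), "estimates (const, x, g):", np.round(coef, 3))
```

With these values, $\theta = 1 - 1/3.5 \approx 0.714$. Setting $\theta = 0$ would give pooled OLS, and setting $\theta = 1$ would give the fixed effects (within) estimator, which drops `g`.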
The Random Effects Model
• An alternative to the fixed effects model
is called the random effects model.
• While the fixed effects model is based on
the assumption that each cross-sectional
unit has its own intercept, the random
effects model is based on the assumption
that the intercept for each cross-sectional
unit is drawn from a distribution that is
centered around a mean intercept.
• Thus each intercept is a random draw from that distribution.
Random Effects…
• The random effects model has several clear advantages over the fixed effects model.
• In particular, a random effects model will have quite a few more degrees of freedom than a fixed effects model. Rather than estimating an intercept for virtually every cross-sectional unit, all we need to do is estimate the parameters that describe the distribution of the intercepts.
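A rough count of the parameters spent on intercepts makes the degrees-of-freedom point concrete. Fixed effects need one intercept per cross-sectional unit. Random effects need only the mean and variance of the intercept distribution. The helper names and this way of counting parameters are simplifying assumptions.

```python
def intercept_params(n_units):
    """Parameters spent on intercepts under each model (a simplified count)."""
    return {
        "fixed effects": n_units,  # a constant plus n-1 dummies, i.e. one intercept per unit
        "random effects": 2,       # the mean intercept and the variance of eps_i
    }

def residual_df(n_units, n_periods, n_slopes):
    """Observations minus intercept parameters minus slope coefficients."""
    n_obs = n_units * n_periods
    return {model: n_obs - k - n_slopes for model, k in intercept_params(n_units).items()}

print(residual_df(100, 5, 2))  # {'fixed effects': 398, 'random effects': 496}
```

The gap grows with the number of cross-sectional units. With only 4 regions, the saving is small.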
Thank you!
