
Panel Data Regression Models

Panel data refers to data collected on the same cross-sectional units (e.g. individuals, firms, countries) over multiple time periods. Panel data has both cross-sectional and time dimensions. There are several advantages to using panel data over cross-sectional or time series data alone, including the ability to better account for heterogeneity across units and observe dynamics of change over time. When estimating models with panel data, common approaches include pooled OLS, fixed effects, and random effects models. The fixed effects model accounts for heterogeneity by including dummy variables for each cross-sectional unit, while the random effects model treats heterogeneity stochastically through the error term. A Hausman test can determine whether the fixed effects or random effects model is more appropriate.

Uploaded by Abdi Hiir
Copyright © All Rights Reserved

Panel Data Regression Models
Lecture 10
Basic Econometrics II
Instructor: Shahid Akbar
Panel Data
• In panel data, the same cross-sectional unit (say a family, a firm, or a state) is surveyed over time.
• In other words, panel data have both space and time dimensions.
Example: U.S. Eggs Production
Why Panel Data?
Advantages of panel data over cross-section or time series data:
• Since panel data relate to individuals, firms, states, countries, etc., over time, there is bound to be heterogeneity in these units. The techniques of panel data estimation can take such heterogeneity explicitly into account by allowing for subject-specific variables, as we shall show shortly. We use the term subject in a generic sense to include micro units such as individuals, firms, states, and countries.
• By combining time series of cross-section observations, panel data give “more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency.”
Why Panel Data?
• By studying the repeated cross section of observations, panel data are better suited to study the dynamics of change. Spells of unemployment, job turnover, and labor mobility are better studied with panel data.
• Panel data can better detect and measure effects that simply cannot be observed in pure cross-section or pure time series data. For example, the effects of minimum wage laws on employment and earnings can be better studied if we include successive waves of minimum wage increases in the federal and/or state minimum wages.
• Panel data enable us to study more complicated behavioral models. For example, phenomena such as economies of scale and technological change can be better handled by panel data than by pure cross-section or pure time series data.
Why Panel Data?
• By making data available for several thousand units, panel data can minimize the bias that might result if we aggregate individuals or firms into broad aggregates.
• In short, panel data can enrich empirical analysis in ways that may not be possible if we use only cross-section or time series data.
An Illustrative Example
• The data cover the costs of six airline firms (N = 6) over the period 1970–1984 (T = 15), for a total of 90 panel data observations (i.e., a balanced and long panel).
Balanced panel: a panel is said to be balanced if each subject (firm, individual, etc.) has the same number of observations.
Unbalanced panel: if the entities have different numbers of observations, the panel is unbalanced.
Short panel: N > T; long panel: N < T.
• The variables are defined as: I = airline id; T = year id; Q = output, in revenue passenger miles, an index number; C = total cost (TC), in $1,000; PF = fuel price; and LF = load factor, the average capacity utilization of the fleet.
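The balanced/unbalanced and short/long definitions above can be checked mechanically. Below is a minimal Python sketch that assumes the panel is supplied as a list of (unit, period) pairs, one per observation. The function name `describe_panel` is hypothetical. The slide does not classify the case N = T, so the sketch reports it as neither short nor long.

```python
from collections import Counter


def describe_panel(obs):
    """Classify a panel from a list of (unit_id, period_id) pairs, one per observation."""
    counts = Counter(unit for unit, _ in obs)       # observations per unit
    n_units = len(counts)
    n_periods = len({period for _, period in obs})  # distinct time periods
    balanced = len(set(counts.values())) == 1       # every unit has the same count
    if n_units > n_periods:
        length = "short"
    elif n_units < n_periods:
        length = "long"
    else:
        length = "neither (N = T)"
    return {"N": n_units, "T": n_periods, "balanced": balanced, "length": length}


# Layout of the airline example: 6 airlines, each observed every year 1970-1984.
airline_obs = [(i, year) for i in range(1, 7) for year in range(1970, 1985)]
print(describe_panel(airline_obs))
# {'N': 6, 'T': 15, 'balanced': True, 'length': 'long'}

# Dropping a single observation makes the panel unbalanced.
print(describe_panel(airline_obs[1:])["balanced"])  # False
```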
An Illustrative Example
• Suppose we wish to estimate an airline cost function.
• How do we go about estimating this function?
• Four possibilities:
1. Pooled OLS method
2. The fixed effects least squares dummy variable (LSDV) model
3. The fixed effects within-group model
4. The random effects model (REM)
Pooled OLS Method
In this estimation method, all observations are pooled and a “grand” regression is estimated, neglecting the cross-section and time series nature of our data.

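$$TC_{it} = \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it}, \qquad i = 1, 2, \dots, 6;\ t = 1, 2, \dots, 15 \qquad (16.3.1)$$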

In this model, we have assumed that:
• All 90 observations are pooled together.
• The regression coefficients are the same for all the airlines. That is, there is no distinction between the airlines: one airline is as good as the other, an assumption that may be difficult to maintain.
• The explanatory variables are nonstochastic.
• u_it ∼ iid(0, σ_u²).
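To make "pooling" concrete, here is a minimal Python sketch of a pooled OLS fit. It is simplified to a single regressor so the closed-form slope formula can replace a matrix solver; the airline model has three regressors. The (unit, period, x, y) tuple layout and the name `pooled_ols` are assumptions for illustration.

```python
def pooled_ols(panel):
    """Pooled OLS of y on a single x with one common intercept.

    panel: list of (unit, period, x, y) tuples. The unit and period ids are
    ignored, which is what pooling all observations into one regression means.
    Returns (intercept, slope).
    """
    xs = [x for _, _, x, _ in panel]
    ys = [y for _, _, _, y in panel]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical toy panel in which every unit lies on the same line y = 2 + 3x,
# so the common-coefficient assumption holds exactly.
toy = [(i, t, float(i + t), 2.0 + 3.0 * (i + t)) for i in range(3) for t in range(4)]
print(pooled_ols(toy))  # approximately (2.0, 3.0)
```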
Results of the Illustrative Example
Table 16.2

• All the regression coefficients are highly statistically significant and in accord with prior expectations, and the R² value is very high.
• However, the estimated Durbin–Watson statistic is quite low, suggesting that there may be autocorrelation and/or spatial correlation in the data.
• Of course, as we know, a low Durbin–Watson statistic could also be due to specification errors.
Major Problem in Using the Pooled OLS Method
• It does not distinguish between the various airlines (that is, it ignores their heterogeneity), nor does it tell us whether the response of total cost to the explanatory variables over time is the same for all the airlines.
• Another way of stating this is that the individuality of each subject is subsumed in the disturbance term u_it. As a consequence, it is quite possible that the error term is correlated with some of the regressors included in the model.
• If that is the case, the estimated coefficients in Eq. (16.3.1) may be biased as well as inconsistent.
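A small numerical sketch with hypothetical, noise-free data (not the airline data) shows how bad this can get. Every unit below has the same true slope of 1.0, but units with larger x have smaller intercepts, so the unit effect that pooled OLS pushes into the error term is correlated with the regressor. The function name `pooled_slope` and the data are illustrative assumptions.

```python
def pooled_slope(panel):
    """Slope from pooled OLS of y on x with a common intercept (ids ignored)."""
    xs = [x for _, _, x, _ in panel]
    ys = [y for _, _, _, y in panel]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical panel: 3 units, 5 periods, x_it = 10*i + t.
# True model: y_it = alpha_i + 1.0 * x_it with alpha_i = -20 * i,
# so units with higher x have lower intercepts.
panel = [
    (i, t, 10.0 * i + t, -20.0 * i + (10.0 * i + t))
    for i in range(3)
    for t in range(5)
]
print(pooled_slope(panel))  # about -0.94, although the true slope is 1.0
```

Pooling here even reverses the sign of the slope, because the between-unit pattern (higher x, much lower intercept) dominates the within-unit relationship.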
Fixed Effects Least Squares Dummy Variable (LSDV) Model
All observations are pooled, but each cross-section unit (i.e., each airline in our example) is allowed to have its own intercept (dummy) variable, thus allowing for heterogeneity.
$$TC_{it} = \beta_{1i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \qquad (16.4.1)$$
• In the literature, model (16.4.1) is known as the fixed effects (regression) model (FEM). The term “fixed effects” is due to the fact that, although the intercept may differ across subjects (here the six airlines), each entity’s intercept does not vary over time; that is, it is time-invariant.
Cost Function: Comparison Between Pooled OLS (POLS) and FEM
Figure 16.1: Bias from ignoring fixed effects

• You can see from Figure 16.1 how the pooled regression can bias the slope estimate.
• How do we actually allow for the (fixed effect) intercept to vary among the airlines?
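Differential Intercept Dummy Technique
Now we write Eq. (16.4.1) as:
$$TC_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \alpha_5 D_{5i} + \alpha_6 D_{6i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \qquad (16.4.2)$$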
• where D2i = 1 for airline 2, 0 otherwise; D3i = 1 for airline 3, 0 otherwise; and so on.
• Most importantly, since we have six airlines, we have introduced only five dummy variables, to avoid falling into the dummy-variable trap (i.e., the situation of perfect collinearity).
• Here, we are treating airline 1 as the base, or reference, category.
• The intercept α1 is the intercept value of airline 1, and the other α coefficients represent by how much the intercept values of the other airlines differ from the intercept value of the first airline.
• Keep in mind that if you want to introduce a dummy for each airline, you will have to drop the (common) intercept; otherwise, you will fall into the dummy-variable trap.
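The differential intercept dummy technique can be sketched in plain Python. The sketch builds a design matrix with a common intercept, N − 1 differential dummies (the first unit is the base category, which avoids the dummy-variable trap) and one slope regressor. It then solves the normal equations by Gauss–Jordan elimination. The single regressor, the helper names `solve` and `lsdv`, and the toy data are assumptions for illustration; real work would use a linear-algebra or econometrics library.

```python
def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lsdv(panel):
    """LSDV fit: common intercept + (N - 1) differential intercept dummies + one slope.

    panel: list of (unit, period, x, y). The first unit in sorted order is the
    base category, so only N - 1 dummies are created (no dummy-variable trap).
    """
    units = sorted({u for u, _, _, _ in panel})
    others = units[1:]
    rows = [[1.0] + [1.0 if u == d else 0.0 for d in others] + [x]
            for u, _, x, _ in panel]
    ys = [y for _, _, _, y in panel]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(xtx, xty)
    return {"alpha1": coef[0],
            "differentials": dict(zip(others, coef[1:-1])),
            "slope": coef[-1]}


# Hypothetical noise-free panel: true slope 1.0, unit intercepts 0, -20, -40.
panel = [(i, t, 10.0 * i + t, -20.0 * i + (10.0 * i + t)) for i in range(3) for t in range(5)]
print(lsdv(panel))  # alpha1 ≈ 0, differentials ≈ {1: -20, 2: -40}, slope ≈ 1
```

The differential coefficients are read exactly as on the slide: each shows how far that unit's intercept lies from the base unit's intercept α1.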
Results of Model (16.4.2) for the Airline Cost Function
Table 16.3

• All the differential intercept coefficients are individually highly statistically significant, suggesting that the six airlines are perhaps heterogeneous.
Problems in LSDV Model
• First, if you introduce too many dummy variables, you will run up against the degrees of freedom problem.
• Second, with many dummy variables in the model, both individual and interactive or multiplicative, there is always the possibility of multicollinearity, which might make precise estimation of one or more parameters difficult.
• Third, in some situations the LSDV model may not be able to identify the impact of time-invariant variables, because such variables are perfectly collinear with the subject dummies.
• Fourth, the classical assumption for u_it may have to be modified.
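The degrees-of-freedom cost in the first point can be made concrete with the airline example (N = 6, T = 15, three slope regressors). The pooled model estimates 4 parameters. LSDV estimates six intercept terms (α1 plus five differentials) and three slopes. The last line is a purely hypothetical panel of 1,000 firms observed for 3 years.

```latex
\begin{aligned}
\text{Pooled OLS (16.3.1):}\quad & df = NT - 4 = 90 - 4 = 86 \\
\text{LSDV (16.4.2):}\quad & df = NT - (N + 3) = 90 - (6 + 3) = 81 \\
\text{Hypothetical } N = 1000,\ T = 3:\quad & df_{\text{LSDV}} = 3000 - (1000 + 3) = 1997
\end{aligned}
```

In the hypothetical case, roughly one-third of the available degrees of freedom are spent on the intercepts alone.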
Fixed-Effects Within-Group (WG) Model
All observations are pooled, but for each cross-section (airline) we express each variable as a deviation from its mean value and then estimate an OLS regression on such mean-corrected or “de-meaned” values.
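• Letting tc_it, q_it, pf_it, and lf_it represent the mean-corrected values (for example, $tc_{it} = TC_{it} - \overline{TC}_i$, where $\overline{TC}_i$ is the mean of TC for airline i), we now run the regression
$$tc_{it} = \beta_2 q_{it} + \beta_3 pf_{it} + \beta_4 lf_{it} + u_{it} \qquad (16.5.1)$$
which has no intercept, because every mean-corrected variable has zero mean.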
• The WG estimator takes heterogeneity into account by expressing the sample observations as deviations from their unit-specific sample means rather than by introducing dummies.
• As a result, the WG estimator produces consistent estimates of the slope coefficients, but at the cost of a loss of efficiency.
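The within-group transformation takes only a few lines of Python. The minimal sketch below assumes a single regressor and (unit, period, x, y) tuples; the function name `within_slope` and the toy data are hypothetical. For the same data, its slope coincides with the LSDV slope, a standard algebraic result (the Frisch–Waugh–Lovell theorem).

```python
from collections import defaultdict


def within_slope(panel):
    """Within-group (fixed effects) slope for one regressor.

    Each x and y is expressed as a deviation from its own unit's mean; OLS on
    the de-meaned data needs no intercept because those variables have mean zero.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # unit -> [sum of x, sum of y, count]
    for u, _, x, y in panel:
        totals[u][0] += x
        totals[u][1] += y
        totals[u][2] += 1
    means = {u: (sx / n, sy / n) for u, (sx, sy, n) in totals.items()}
    dx = [x - means[u][0] for u, _, x, _ in panel]
    dy = [y - means[u][1] for u, _, _, y in panel]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Hypothetical noise-free panel: common slope 1.0, unit intercepts 0, -20, -40
# that are correlated with x (pooled OLS would be badly biased here).
panel = [(i, t, 10.0 * i + t, -20.0 * i + (10.0 * i + t)) for i in range(3) for t in range(5)]
print(within_slope(panel))  # ≈ 1.0
```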
Results of Model (16.5.1) for the Airline Cost Function
Random Effects Model (REM), also called the Error Components Model (ECM)
In this model, the intercept values are treated as random drawings from a much bigger population (here, the population of airlines) rather than as fixed for each individual.

• Here, the basic idea is to express our lack of knowledge about the true model through the disturbance term rather than through dummy variables.
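• The basic model to start with is
$$TC_{it} = \beta_{1i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it}, \qquad \beta_{1i} = \beta_1 + \varepsilon_i,$$
where β1 is the common mean intercept and ε_i is the random deviation of airline i's intercept from it. Substituting gives
$$TC_{it} = \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + w_{it}, \qquad w_{it} = \varepsilon_i + u_{it}.$$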
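REM: Assumptions
The usual assumptions made by the REM are:
$$\varepsilon_i \sim N(0, \sigma_\varepsilon^2), \qquad u_{it} \sim N(0, \sigma_u^2),$$
$$E(\varepsilon_i u_{it}) = 0, \qquad E(\varepsilon_i \varepsilon_j) = 0 \quad (i \neq j),$$
$$E(u_{it} u_{is}) = E(u_{it} u_{jt}) = E(u_{it} u_{js}) = 0 \qquad (i \neq j;\ t \neq s).$$
As a result, the composite error w_it = ε_i + u_it satisfies
$$E(w_{it}) = 0, \qquad \operatorname{var}(w_{it}) = \sigma_\varepsilon^2 + \sigma_u^2, \qquad \operatorname{corr}(w_{it}, w_{is}) = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \sigma_u^2} \quad (t \neq s),$$
so the errors of a given airline are correlated over time.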
The Results of REM: Airline Cost Function
(Table: average intercept value and intercept value of each airline)

• Here, the generalized least squares (GLS) method has been used in order to obtain efficient estimates.
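One standard way to carry out this GLS estimation in a balanced panel is OLS on quasi-demeaned data, y_it − θ·ȳ_i and x_it − θ·x̄_i. The weight is θ = 1 − √(σ_u² / (σ_u² + T·σ_ε²)). The minimal sketch below assumes the variance components σ_ε² and σ_u² are known (in practice they are estimated first) and uses a single regressor. The function names `re_theta` and `re_gls_slope` and the toy data are hypothetical. Setting θ = 0 reproduces pooled OLS and θ = 1 reproduces the within-group estimator, so the REM estimate lies between the two.

```python
import math
from collections import defaultdict


def re_theta(sigma2_eps, sigma2_u, T):
    """Quasi-demeaning weight theta for a balanced panel with T periods per unit."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_eps))


def re_gls_slope(panel, sigma2_eps, sigma2_u):
    """Random effects (GLS) slope via quasi-demeaning; assumes a balanced panel."""
    groups = defaultdict(list)
    for u, _, x, y in panel:
        groups[u].append((x, y))
    T = len(next(iter(groups.values())))
    theta = re_theta(sigma2_eps, sigma2_u, T)
    xs, ys = [], []
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / T
        y_bar = sum(y for _, y in obs) / T
        for x, y in obs:
            xs.append(x - theta * x_bar)
            ys.append(y - theta * y_bar)
    # OLS with an intercept on the transformed data; the transformed constant
    # column is (1 - theta), which an ordinary intercept covers.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Hypothetical noise-free panel: within slope 1.0, intercepts 0, -20, -40.
panel = [(i, t, 10.0 * i + t, -20.0 * i + (10.0 * i + t)) for i in range(3) for t in range(5)]
print(re_theta(1.0, 1.0, 5))          # ≈ 0.59
print(re_gls_slope(panel, 0.0, 1.0))  # theta = 0: pooled OLS slope, about -0.94
print(re_gls_slope(panel, 1.0, 0.0))  # theta = 1: within-group slope, 1.0
print(re_gls_slope(panel, 1.0, 1.0))  # in between, about -0.69
```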
Choice Between FEM and REM
• Hausman test: the test statistic has an asymptotic chi-square distribution.
• H0: REM is appropriate (the FEM and REM estimators do not differ substantially)
• H1: FEM is appropriate
• If the null hypothesis is rejected, the conclusion is that the REM is not appropriate, because the random effects are probably correlated with one or more regressors. In this case, FEM is preferred to REM.
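For reference, the statistic behind the test in its standard form compares the two sets of slope estimates. K is the number of slope coefficients compared; K = 3 in the airline example.

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\overset{a}{\sim}\; \chi^2_K
```

The distribution holds under the null hypothesis that the random effects are uncorrelated with the regressors. A large H means the two estimators differ by more than sampling error can explain.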
Results of Hausman Test: Airline Cost Function

• The Hausman test clearly rejects the null hypothesis: the estimated χ² value of 49.62 for 3 df is highly significant. If the null hypothesis were true, the probability of obtaining a chi-square value of 49.62 or greater would be practically zero. As a result, we can reject the ECM (REM) in favor of FEM.
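The "practically zero" probability can be checked directly. For 3 degrees of freedom the chi-square upper tail has a closed form, P(X ≥ x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2), so Python's standard `math` module is enough. The helper name `chi2_sf_3df` is hypothetical.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 df."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Sanity check: 7.8147 is the usual 5% critical value for chi-square(3).
print(round(chi2_sf_3df(7.8147), 3))  # 0.05
# Hausman statistic reported for the airline cost function.
print(f"{chi2_sf_3df(49.62):.1e}")    # on the order of 1e-10
```

If SciPy is available, `scipy.stats.chi2.sf(49.62, 3)` should give the same value.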
Guidelines: FEM and REM
• If it is assumed that ε_i and the X’s are uncorrelated, ECM may be appropriate, whereas if ε_i and the X’s are correlated, FEM may be appropriate.
• If T > N, there is likely to be little difference in the values of the parameters estimated by FEM and ECM. Hence the choice here is based on computational convenience; on this score, FEM may be preferable.
• If N > T and we strongly believe that the individual, or cross-sectional, units in our sample are not random drawings from a larger population, FEM is appropriate. If the cross-sectional units in the sample are regarded as random drawings, however, ECM is appropriate.
• If the individual error component ε_i and one or more regressors are correlated, the ECM estimators are biased, whereas those obtained from FEM are unbiased.
• If N is large and T is small, and if the assumptions underlying ECM hold, ECM estimators are more efficient than FEM estimators.
Thank You
