Topic 6 - Static Panel Data

This document discusses static panel data models. It defines panel data as combining cross-sectional and time-series data on the same units over multiple time periods. It explains the reasons for using panel data models and examines fixed effects, random effects, and the Hausman test for choosing between the two. It also evaluates issues like autocorrelation, heteroskedasticity, and provides an example of a panel data regression.

Uploaded by

shaibu amana

Topic 10: Static Panel Data

Dr. O. B. Aworinde
Department of Economics
Babcock University
Ilishan-Remo

These presentation notes are based on Applied Econometrics,
Second edition, by Dimitrios Asteriou and Stephen G. Hall
Introduction
• Describe what panel data is and the
reasons for using it in this format
• Assess the importance of fixed and
random effects
• Examine the Hausman test, which
determines if fixed or random effects
should be used.
• Evaluate some panel data models
Panel Data
• These are models that combine cross-section
and time-series data
• In panel data the same cross-sectional
unit (industry, firm, country) is surveyed
over time, so we have data which is
pooled over space as well as time.
Reasons for using Panel Data
1. Panel data can take explicit account of
individual-specific heterogeneity (“individual”
here means related to the microunit)
2. By combining data in two dimensions, panel
data gives more data variation, less collinearity
and more degrees of freedom.
3. Panel data is better suited than cross-sectional
data for studying the dynamics of change. It is
well suited, for example, to understanding
transition behaviour such as company
bankruptcy or merger.
4. Panel data is better at detecting and
measuring effects that cannot be observed
in either cross-section or time-series data.
5. Panel data enables the study of more
complex behavioural models – for example
the effects of technological change, or
economic cycles.
6. Panel data can minimise the effects of
aggregation bias, from aggregating firms
into broad groups.
If all the cross-sectional units have the same number of time
series observations, the panel is balanced; if not, it is
unbalanced.
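The definition above can be checked mechanically. The following is a minimal sketch, assuming the panel is stored in long format with one unit label per observation; the function name `is_balanced` and the labels are hypothetical.

```python
from collections import Counter


def is_balanced(unit_ids):
    """Return True if every cross-sectional unit has the same number of observations."""
    counts = Counter(unit_ids)
    return len(set(counts.values())) <= 1


# Hypothetical unit labels, one per observation (long format).
balanced_ids = ["A", "A", "A", "B", "B", "B"]
unbalanced_ids = ["A", "A", "A", "B", "B"]

print(is_balanced(balanced_ids))
print(is_balanced(unbalanced_ids))
```

The check follows the definition used here (equal observation counts per unit); it does not verify that the units are observed in the same calendar periods.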
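Columns index the cross section (i = 1…N); rows index the time series (t = 1…T):
$$
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
$$
- a matrix of balanced panel data observations on variable y,
N cross-sectional observations, T time series observations.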
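As a small illustration of this layout, the sketch below arranges long-format data into a T × N array. It assumes a balanced panel that is already sorted by unit and then by time; the values are hypothetical placeholders, not data from the notes.

```python
import numpy as np

N, T = 3, 4  # hypothetical: 3 cross-sectional units, 4 time periods

# Long format: all T observations of unit 0, then unit 1, and so on.
y_long = np.arange(N * T, dtype=float)

# Reshape to (N, T), then transpose so rows index time and columns index units.
Y = y_long.reshape(N, T).T

print(Y.shape)
print(Y[1, 2])  # observation for unit index 2 in period index 1
```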
Suppose y is investment and x is a measure of profit. We have
i = 1…N companies and t = 1…T time periods. Suppose we
specify a simple econometric model which says that
investment depends on profit:
$$y_{it} = a_0 + a_1 x_{it} + u_{it} \qquad (1)$$

$u_{it}$ is a random error term: $u_{it} \sim N(0, \sigma^2)$

Estimation of (1) depends on the assumptions that we make
about the intercept ($a_0$), the slope coefficient ($a_1$) and the
error term ($u_{it}$).
Several possible assumptions can be made in order to
estimate (1):
1. Assume that the intercept and slope coefficients are
constant across time and firms and that the error term
captures differences over time and over firms.
2. The slope coefficient is constant but the intercept varies
over firms.
3. The slope coefficient is constant but the intercept varies
over firms and over time.
4. All coefficients (intercept and slope) vary over firms.
5. The intercept as well as the slope vary over firms and time.
Pooled regression by OLS
This is estimation option 1 on the list. But pooled regression
may result in heterogeneity bias:
[Figure: scatter plot of y against x. The pooled regression line
$y_{it} = a_0 + a_1 x_{it} + u_{it}$ is drawn through all the observations,
alongside separate "True model" lines for Firm 1, Firm 2, Firm 3 and Firm 4.]
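Heterogeneity bias can be reproduced with simulated data. The sketch below is an illustration under assumed values (four firms, a true common slope of 1, and firm intercepts that are negatively related to x); none of these numbers come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 50                            # hypothetical: 4 firms, 50 periods
a0 = np.array([0.0, 5.0, 10.0, 15.0])   # assumed firm-specific intercepts
a1 = 1.0                                # assumed true common slope

firm = np.repeat(np.arange(N), T)
# Firms with higher intercepts have lower x, so the intercepts are correlated with x.
x = rng.normal(size=N * T) - 2.0 * firm
y = a0[firm] + a1 * x + rng.normal(scale=0.5, size=N * T)

# Pooled OLS: one intercept and one slope for all firms.
X = np.column_stack([np.ones(N * T), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pooled_slope = coef[1]

print("true slope:", a1, "pooled OLS slope:", pooled_slope)
```

In this design the pooled slope comes out negative although every firm's true slope is positive, because the ignored firm intercepts are absorbed into the error term and are correlated with x.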
Fixed Effects Estimation
The previous slide suggests that a better way to model the
data would be to allow each group (firm) to have its own
intercept:
$$y_{it} = a_{0i} + a_1 x_{it} + u_{it} \qquad (2)$$

This is known as the (One Way) Fixed Effects Model.


How do we estimate it?
The simplest way to allow each firm to have its own intercept
is to create a set of dummy (binary) variables, one for each
firm, and include them as regressors.
$$y_{it} = \sum_{i=1}^{N} a_{0i} D_{it} + a_1 x_{it} + u_{it} \qquad (3)$$

Consequently, this form of estimation is also known as Least
Squares Dummy Variables (LSDV). (Note that there is no
constant in this regression.)
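A minimal LSDV sketch on simulated data follows. The data-generating values (four firms, intercepts 0, 5, 10, 15, slope 1) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 50
firm = np.repeat(np.arange(N), T)
a0 = np.array([0.0, 5.0, 10.0, 15.0])   # assumed firm intercepts
x = rng.normal(size=N * T) - 2.0 * firm
y = a0[firm] + 1.0 * x + rng.normal(scale=0.5, size=N * T)

# One dummy column per firm: D[k, i] = 1 if observation k belongs to firm i.
D = (firm[:, None] == np.arange(N)[None, :]).astype(float)

# Regress y on the N dummies and x, with no separate constant.
X = np.column_stack([D, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercepts, slope = coef[:N], coef[N]

print("estimated firm intercepts:", intercepts)
print("estimated slope:", slope)
```

Leaving out the constant avoids perfect collinearity, since the N dummy columns already sum to a column of ones.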
However, if there are a lot of groups (firms) then it becomes
very tedious to create all the dummy variables needed. Some
econometric software (e.g. Limdep) is able to automate this.
The method used is called the covariance estimator and works
by “differencing” out the fixed effect by expressing variables as
deviations from their group means, $\bar{y}_i$, $\bar{x}_i$:
$$y_{it} - \bar{y}_i = a_{0i} - a_{0i} + a_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i) \qquad (3')$$

So:
$$y_{it} - \bar{y}_i = a_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i) \qquad (4)$$
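The group-mean deviation approach can be sketched as follows and compared with the dummy-variable regression. The simulated data and the helper name `demean_by_group` are assumptions for illustration.

```python
import numpy as np


def demean_by_group(values, groups, n_groups):
    """Subtract each group's mean from that group's observations."""
    sums = np.bincount(groups, weights=values, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return values - (sums / counts)[groups]


rng = np.random.default_rng(2)
N, T = 4, 50
firm = np.repeat(np.arange(N), T)
a0 = np.array([0.0, 5.0, 10.0, 15.0])   # assumed firm intercepts
x = rng.normal(size=N * T) - 2.0 * firm
y = a0[firm] + 1.0 * x + rng.normal(scale=0.5, size=N * T)

# Within estimator: OLS of demeaned y on demeaned x (no constant needed).
x_dm = demean_by_group(x, firm, N)
y_dm = demean_by_group(y, firm, N)
within_slope = (x_dm @ y_dm) / (x_dm @ x_dm)

# Dummy-variable regression on the same data, for comparison.
D = (firm[:, None] == np.arange(N)[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
lsdv_slope = coef[N]

print(within_slope, lsdv_slope)
```

The two slope estimates agree up to floating-point error, which is why demeaning is a convenient substitute for creating many dummies.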

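A further extension is to allow the intercept to vary across the
different time periods (Two Way Fixed Effects):
$$y_{it} = \sum_{i=1}^{N} a_{0i} D_{it} + \sum_{t=1}^{T} a_{2t} T_{it} + a_1 x_{it} + u_{it} \qquad (5)$$
(In practice one time dummy is dropped, because the full set of firm
dummies and the full set of time dummies are perfectly collinear.)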
The time dummy coefficients can allow the regression function
to shift over time to capture changes in technology,
government regulation, tax policy, external influences (wars…)
etc.
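For a balanced panel, the two-way fixed effects slope can be obtained without building any dummies by removing firm means and period means and adding back the overall mean. The sketch below uses simulated data with assumed firm and time effects.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 10, 20
firm_effect = rng.normal(scale=2.0, size=(N, 1))       # assumed firm effects
time_effect = np.linspace(0.0, 3.0, T).reshape(1, T)   # assumed common time shifts

# x is correlated with both the firm effects and the time effects.
x = rng.normal(size=(N, T)) + 0.5 * firm_effect + 0.5 * time_effect
y = firm_effect + time_effect + 1.0 * x + rng.normal(scale=0.5, size=(N, T))


def two_way_demean(a):
    """Remove unit means and period means from an (N, T) balanced panel array."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


x_dd = two_way_demean(x)
y_dd = two_way_demean(y)
two_way_slope = np.sum(x_dd * y_dd) / np.sum(x_dd ** 2)

print("two-way fixed effects slope:", two_way_slope)
```

This shortcut relies on the panel being balanced; with an unbalanced panel the simple double-demeaning formula no longer reproduces the dummy-variable estimate.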
Allowing intercept and slope coefficients to vary across groups
If we have a sufficiently long time dimension to the panel, we
could of course just estimate a separate OLS regression for
each group (firm). If the number of firms (cross-sectional
dimension) is small, then we could estimate a single
regression with interactions between x and the group dummy
variables D.
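The first option, a separate OLS regression per firm, can be sketched as below. The three firms and their intercepts and slopes are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 3, 100                                   # hypothetical: few firms, long time dimension
true_intercepts = np.array([1.0, 2.0, 3.0])     # assumed
true_slopes = np.array([0.5, 1.0, 1.5])         # assumed

slopes = np.empty(N)
intercepts = np.empty(N)
for i in range(N):
    x_i = rng.normal(size=T)
    y_i = true_intercepts[i] + true_slopes[i] * x_i + rng.normal(scale=0.5, size=T)
    # Degree-1 polynomial fit is simple OLS; coefficients come back as (slope, intercept).
    slopes[i], intercepts[i] = np.polyfit(x_i, y_i, deg=1)

print("firm slopes:", slopes)
print("firm intercepts:", intercepts)
```

Each firm's coefficients are estimated from its own T observations only, which is why this approach needs a long time dimension.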
Random Effects Estimation
The fixed effects model assumes that each group (firm) has a
non-stochastic group-specific component to y. Including
dummy variables is a way of controlling for unobservable
effects on y.
But these unobservable effects may be stochastic (i.e.
random). The Random Effects Model attempts to deal with
this:
$$y_{it} = a_0 + a_1 x_{it} + v_i + \varepsilon_{it} \qquad (6)$$

Here the unobservable component, $v_i$, is treated as a
component of the random error term. $v_i$ is the element of the
error which varies between groups but not within groups. $\varepsilon_{it}$ is
the element of the error which varies over group and time.
We assume that:
$E(v_i) = E(\varepsilon_{it}) = 0$
$E(v_i^2) = \sigma_v^2$
$E(\varepsilon_{it}^2) = \sigma_\varepsilon^2$ (both components homoscedastic)
$E(\varepsilon_{it} v_j) = 0 \;\; \forall\, i, t, j$ (independence of two components)
$E(\varepsilon_{it} \varepsilon_{js}) = 0$ if $t \neq s$ or $i \neq j$ (no autocorrelation)
$E(v_i v_j) = 0$ if $i \neq j$ (no across group correlation)
$E(v_i x_{it}) = E(\varepsilon_{it} x_{it}) = 0$ (both independent of regressor)

(We could also introduce an error component which varies
across time periods but not across groups, giving two way random
effects.)
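The assumptions listed for the two error components pin down the covariance structure of the composite error. As a short worked derivation, writing $w_{it} = v_i + \varepsilon_{it}$ (the symbol $w_{it}$ and the correlation $\rho$ are notation introduced here, not in the notes):

$$
\begin{aligned}
\operatorname{Var}(w_{it}) &= E(v_i^2) + 2E(v_i \varepsilon_{it}) + E(\varepsilon_{it}^2) = \sigma_v^2 + \sigma_\varepsilon^2 \\
\operatorname{Cov}(w_{it}, w_{is}) &= E(v_i^2) = \sigma_v^2 \quad (t \neq s) \\
\operatorname{Cov}(w_{it}, w_{js}) &= 0 \quad (i \neq j) \\
\rho = \operatorname{Corr}(w_{it}, w_{is}) &= \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2} \quad (t \neq s)
\end{aligned}
$$

So the composite errors of the same group are correlated across time periods, even though each component on its own is uncorrelated.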
Estimation of the random effects model by OLS is inefficient,
because the composite error $v_i + \varepsilon_{it}$ is correlated within
each group. Instead a technique known as generalised least
squares (GLS) is used.
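For a balanced panel, GLS for the random effects model is equivalent to OLS on quasi-demeaned data, where a fraction $\theta = 1 - \sqrt{\sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + T\sigma_v^2)}$ of each group mean is subtracted. The sketch below assumes the two variances are known, which is a simplification: in practice they are estimated first (feasible GLS). All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 5
sigma_v, sigma_e = 1.0, 1.0   # ASSUMPTION: variance components treated as known

firm = np.repeat(np.arange(N), T)
v = rng.normal(scale=sigma_v, size=N)            # group-level random effect
x = rng.normal(size=N * T)                       # regressor independent of v
y = 2.0 + 1.0 * x + v[firm] + rng.normal(scale=sigma_e, size=N * T)

# Share of the group mean to remove (between 0 = pooled OLS and 1 = fixed effects).
theta = 1.0 - np.sqrt(sigma_e**2 / (sigma_e**2 + T * sigma_v**2))


def group_means(values, groups, n_groups):
    """Return each observation's group mean, aligned with the observations."""
    sums = np.bincount(groups, weights=values, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return (sums / counts)[groups]


y_star = y - theta * group_means(y, firm, N)
x_star = x - theta * group_means(x, firm, N)
const_star = np.full(N * T, 1.0 - theta)         # the constant is transformed too

X_star = np.column_stack([const_star, x_star])
coef, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
re_intercept, re_slope = coef

print("theta:", theta)
print("RE intercept and slope:", re_intercept, re_slope)
```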
Choosing between Fixed Effects (FE) and Random Effects (RE)
1. With large T and small N there is likely to be little
difference, so FE is preferable as it is easier to compute
2. With large N and small T, estimates can differ significantly.
If the cross-sectional groups are a random sample of the
population, RE is preferable. If not, FE is preferable.
3. If the error component, vi , is correlated with x then RE is
biased, but FE is not.
4. For large N and small T and if the assumptions behind RE
hold then RE is more efficient than FE.
Hausman test:
Tests for the statistical significance of the difference between
the coefficient estimates obtained by FE and by RE, under
the null hypothesis that the RE estimates are efficient and
consistent, and the FE estimates are consistent but inefficient.
The test has a Wald test form, and is usually reported in Chi2
form with k-1 degrees of freedom (k is the number of
regressors including the constant, so k-1 is the number of
slope coefficients compared).
If W < critical value then random effects is the preferred
estimator; otherwise fixed effects is preferred.
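The Wald form of the statistic is $W = (\hat{b}_{FE} - \hat{b}_{RE})' [V_{FE} - V_{RE}]^{-1} (\hat{b}_{FE} - \hat{b}_{RE})$. The sketch below computes it for made-up estimates; the coefficient values and variances are hypothetical, and the function name `hausman_statistic` is an assumption. With one slope coefficient compared there is one degree of freedom (k-1 with k = 2 counting the constant), for which the 5% Chi2 critical value is 3.841.

```python
import numpy as np


def hausman_statistic(b_fe, b_re, V_fe, V_re):
    """Wald-form Hausman statistic from FE and RE slope estimates and covariances."""
    diff = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V_diff = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    return float(diff @ np.linalg.solve(V_diff, diff))


# Hypothetical slope estimates and their variances (not from the notes).
b_fe, V_fe = np.array([0.95]), np.array([[0.004]])
b_re, V_re = np.array([0.80]), np.array([[0.003]])

W = hausman_statistic(b_fe, b_re, V_fe, V_re)
CRITICAL_5PCT_DF1 = 3.841   # Chi2 critical value, 1 degree of freedom, 5% level

preferred = "random effects" if W < CRITICAL_5PCT_DF1 else "fixed effects"
print("W =", W, "->", preferred)
```

In finite samples the estimated difference $V_{FE} - V_{RE}$ may fail to be positive definite, in which case the statistic is not reliable; the sketch does not handle that case.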
Autocorrelation
• Although autocorrelation in panel data differs from that in
the usual OLS models, a version of the Durbin-Watson test
can be used in the usual way. (EViews reports this.)
• To remedy autocorrelation we can use the usual
methods, such as the Error Correction Model.
• ‘Dynamic Models’ are also often used, which basically
involves adding a lagged dependent variable.
• Recently, adjusting the standard errors has become
popular; the most common method is termed
‘Newey-West’ adjusted standard errors.
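One panel version of the Durbin-Watson statistic sums squared first differences of the residuals within each unit (never across units) and divides by the total sum of squared residuals. The sketch below implements that form on simulated residuals; it is an illustration under that assumed definition, and it is not claimed to match the exact statistic any particular package reports.

```python
import numpy as np


def panel_durbin_watson(resid, n_units, n_periods):
    """Panel DW: within-unit squared first differences over total squared residuals.

    Assumes a balanced panel with residuals sorted by unit, then by time.
    """
    e = np.asarray(resid, dtype=float).reshape(n_units, n_periods)
    numerator = np.sum(np.diff(e, axis=1) ** 2)   # differences taken along time only
    denominator = np.sum(e ** 2)
    return numerator / denominator


rng = np.random.default_rng(6)
N, T = 20, 10

# Case 1: serially uncorrelated residuals.
white = rng.normal(size=N * T)

# Case 2: AR(1) residuals within each unit, with an assumed rho of 0.8.
rho = 0.8
shocks = rng.normal(size=(N, T))
ar = np.zeros((N, T))
ar[:, 0] = shocks[:, 0]
for t in range(1, T):
    ar[:, t] = rho * ar[:, t - 1] + shocks[:, t]

dw_white = panel_durbin_watson(white, N, T)
dw_ar = panel_durbin_watson(ar.ravel(), N, T)
print("DW, no autocorrelation:", dw_white)
print("DW, positive autocorrelation:", dw_ar)
```

Values near 2 point to no first-order autocorrelation, while values well below 2 point to positive autocorrelation.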
Heteroskedasticity
• Given that there is a cross-section component to
panel data, there will always be a potential for
heteroskedasticity.
• Although there are various tests for
heteroskedasticity, as with autocorrelation there
is a tendency to automatically use adjusted
standard errors, which keep inference valid
without removing the heteroskedasticity itself.
• With heteroskedasticity, it is usually White’s
adjusted standard errors that are used.
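White's adjustment replaces the usual OLS covariance matrix with the "sandwich" $(X'X)^{-1} X' \operatorname{diag}(\hat{u}^2) X (X'X)^{-1}$. The sketch below shows the basic (HC0) version for a single regression on simulated heteroskedastic data; the data are hypothetical, and panel software often applies further variants (for example clustering by unit) that are not shown here.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
# ASSUMPTION for illustration: the error standard deviation grows with |x|.
u = rng.normal(size=n) * (0.1 + np.abs(x))
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Conventional OLS standard errors (valid only under homoskedasticity).
sigma2 = resid @ resid / (n - X.shape[1])
classical_se = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) standard errors: sandwich with squared residuals in the middle.
meat = (X * resid[:, None] ** 2).T @ X
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical SEs:", classical_se)
print("White SEs:    ", robust_se)
```

The coefficient estimates are unchanged; only the standard errors differ, and in this design the conventional slope standard error understates the sampling variability.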
Example
• The data consists of 20 countries over 10 years of
annual data, giving 200 observations in all (N = 20,
T = 10, NT = 200). This produces the following result,
where stock prices (s) are regressed against
expenditure on research (r):

$$\hat{s}_t = \underset{(0.4)}{0.7} + \underset{(0.3)}{0.9}\, r_t$$

$R^2 = 0.7$, DW = 1.98
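As a worked reading of these numbers, and assuming that the figures in parentheses are standard errors (the slide does not say whether they are standard errors or t-statistics), the sample size and t-ratios are:

$$
\begin{aligned}
NT &= 20 \times 10 = 200 \\
t_{a_0} &= \frac{0.7}{0.4} = 1.75 \\
t_{a_1} &= \frac{0.9}{0.3} = 3.0
\end{aligned}
$$

Under that assumption, and against an approximate 5% two-sided critical value of 1.96, the slope on research expenditure would be statistically significant while the intercept would not. A DW value of 1.98 is close to 2, which gives no indication of first-order autocorrelation.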
Example
• The results are interpreted in the usual way, however
you would need to decide whether you wished to use
fixed or random effects in this model.
• This requires estimating the model with random effects,
then using the Hausman test to determine whether
random effects is appropriate. If the null hypothesis is
rejected, then the fixed effects model is more appropriate.
• This is an example of a basic OLS panel data model.
You can apply all the other models to panel data too,
although this usually requires separate tests (e.g. the
test for cointegration).
Conclusion
• Panel data methods estimate models on data
which is both time series and cross
sectional
• They have both advantages and
disadvantages compared with OLS estimation
• They apply to many different techniques,
such as tests for stationarity
