Chapter 5

Panel Data Econometrics

Wondu A (UOG)
5.1 Introduction
Panel data econometrics is the application of econometric
analysis to panel data. A panel data set is a data set in which the
same sample units (individuals) are observed repeatedly
(more than once) at regular time intervals. Panel data have
the characteristics of both cross-sectional and time series data:
they have a space dimension as well as a time dimension.
In panel data the same cross-sectional unit (an individual, a
country, a region, a firm, a household, etc.) is surveyed
repeatedly over time.
There are other names for panel data, such as:
• Pooled data (pooling of time series and cross-sectional
observations),
• Cross-section – time series data
5.1 cont’d…
• Micro panel data - a panel for which the time dimension (T)
is much smaller than the individual dimension (N); e.g. three years of
observations on 1000 households.
• Macro panel data - a panel for which the time dimension (T) is
similar to the individual dimension (N); e.g. 30 years of GDP data for 30
countries.
• Longitudinal data (a study over time of a variable or group of
subjects),
• Event history analysis (e.g., studying the movement over time of
subjects through successive states or conditions),
• Cohort data / cohort analysis (e.g., following the career path of
graduates).
Although there are subtle variations, all these names essentially
connote movement over time of cross-sectional units.
5.1 cont’d…
Terminology and notations:
Cross-section units or panel units (individuals or subjects) are entities
such as countries, regions, states, firms, consumers, individuals, employees,
the unemployed, professionals, organizations, etc. which are
observed with respect to some variables.
Two (double) subscripts or indices are used, ‘i’ and ‘t’:
i - indexes the cross-sectional unit/individual, where i = 1, …, N;
N is the total number of cross-sectional units or sample subjects observed.
t - indexes time, where t = 1, …, T;
T is the total number of time periods (e.g. years) of observation.
N × T = total sample size of a balanced panel data set.
5.1 cont’d…
Balanced vs. Unbalanced Panel Data
Panel data are balanced when all individuals are observed in
all time periods (T is the same for all i) and unbalanced when
individuals are not observed in all time periods (T differs across
individuals).
A panel is said to be balanced if the observations for all individuals
are made for the same time periods: t = 1, 2, …, T is the
same for all cross-sectional units.
If each cross-sectional unit has the same number of time series
observations, then such a panel (data) is called a balanced panel.
If the number of time observations differs among panel
members, we call such a panel an unbalanced panel; for
example, when some individuals are observed for 3 years, some for 5
years and others for 2 years.
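The balanced/unbalanced distinction can be checked mechanically. The sketch below is only an illustration in Python: the unit ids, years and function names are hypothetical and not part of the lecture data.

```python
from collections import defaultdict

# Hypothetical long-format panel: one (unit id, year) pair per observation.
observations = [
    ("A", 2019), ("A", 2020), ("A", 2021),
    ("B", 2019), ("B", 2020), ("B", 2021),
    ("C", 2020), ("C", 2021),            # unit C is missing 2019
]

def periods_per_unit(obs):
    """Return {unit: set of periods in which the unit is observed}."""
    periods = defaultdict(set)
    for unit, year in obs:
        periods[unit].add(year)
    return dict(periods)

def is_balanced(obs):
    """A panel is balanced when every unit is observed in every period."""
    periods = periods_per_unit(obs)
    all_periods = set().union(*periods.values())
    return all(p == all_periods for p in periods.values())

# T_i, the number of time observations per unit
t_i = {unit: len(p) for unit, p in periods_per_unit(observations).items()}
print(t_i)
print(is_balanced(observations))      # unbalanced: unit C has only 2 periods
print(is_balanced(observations[:6]))  # units A and B alone form a balanced panel
```

For the balanced sub-panel (A and B) the sample size is N × T = 2 × 3 = 6 observations.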
5.1 cont’d…
Long vs. Short Panel Data
Short panel: a data set that contains many individuals /
cross-sectional units and few time periods.
Long panel: a panel data set with many time periods
and few individuals or cross-sectional units.
5.2 Advantages of Panel Data
What are the main advantages of the panel data sets and
the panel data models?
Advantage 1. Ensure a larger number of observations
Advantage 2. Address new economic questions (identification)
Advantage 3. Handle unobservable components
Advantage 4. Address individual heterogeneity in
cross-sectional units
Advantage 5. Reduce specification bias
5.2 cont’d…
Advantage 1. Larger number of observations
Panel data usually give the researcher a large number of data points
(sample size = N × T), where N is the number of cross-sectional units and T is
the number of time periods in which observations are made.
Combining time series and cross-sectional observations
increases the sample size and gives more information (data), increases the
degrees of freedom and reduces the collinearity among
explanatory variables, hence improving the efficiency of
econometric estimates.
Advantage 2: address new economic questions
(identification)
Longitudinal data allow a researcher to analyze a number of
important questions that cannot be addressed using cross-sectional
or time-series data sets alone.
5.2 cont’d…
Advantage 3. Handle unobservable components or
factors.
Panel data allow us to control for omitted variables or
unobserved factors. They provide a means of
resolving econometric problems associated with the effects of
omitted (unobserved) variables and individual-specific
effects that are correlated with the explanatory variables.
Advantage 4. Address heterogeneity among
cross-sectional units/individuals.
Since panel data relate to individuals, firms, states,
countries, etc., over time, there is bound to be
heterogeneity in these units. Panel data estimation can
take such heterogeneity explicitly into account.
5.3. The heterogeneity issue

Addressing heterogeneity across individuals and across time is
key in panel data analysis. In fact, panel data analysis is all about
dealing with heterogeneity.
Panel data contain observations on cross-sectional units over time.
When individuals, firms, states, countries, etc. are observed at some time interval,
say yearly, there is bound to be heterogeneity among the subjects/units. The
techniques of panel data estimation have to take such
heterogeneity into account explicitly by allowing for individual-specific variables.
Pooled OLS Regression Method (application of simple OLS regression to
panel data) - an OLS regression in which the time and individual dimensions of the
panel are ignored and the regression is run on the merged (stacked) data.
Pooled OLS regression doesn’t take into account the heterogeneity
that arises from individual-specific effects and/or time differences. It simply
assumes that the intercept and slope coefficients are constant over time and
the same for all cross-sectional units (individuals).
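To see why ignoring heterogeneity matters, here is a minimal Python sketch of pooled OLS. It assumes a hypothetical panel of 3 units and 3 periods, generated as y_it = a_i + 2·x_it with unit intercepts that rise with the level of x; none of these numbers come from the lecture.

```python
# Hypothetical panel: 3 units, 3 periods, y_it = a_i + 2 * x_it, where the
# unit intercepts a = (1, 5, 9) are larger for units with larger x.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = [1, 5, 9]
y = [[a[i] + 2 * x_it for x_it in x[i]] for i in range(3)]

def ols_slope_intercept(xs, ys):
    """One-regressor OLS; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Pooled OLS: stack all N*T observations and ignore which unit or
# period each observation belongs to.
pooled_x = [v for row in x for v in row]
pooled_y = [v for row in y for v in row]
slope, intercept = ols_slope_intercept(pooled_x, pooled_y)
print(slope, intercept)  # slope is about 3.2, although the true slope is 2
```

Because the omitted unit intercepts are correlated with x, the single pooled slope absorbs part of the intercept differences and overstates the effect of x.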
5.3 cont’d…
• For panel data we cannot assume that observations are
independently distributed across time and serial correlation of
residuals becomes an issue.
• We must be prepared that unobserved factors, while acting
differently on different cross-sectional units, may have a lasting
effect upon the same statistical unit when followed through time.
This makes the statistical analysis of panel data more difficult.
Panel data models are designed to deal with heterogeneity
that is due to individual-specific effects and/or time
effects.
The heterogeneity issue (in model specification) consists in specifying
and estimating the individual- and/or time-specific effects that
exist among cross-sectional units or time-series units.
5.4 Panel Data Models
A panel data regression model (or panel data model) is an
econometric model specifically designed for panel data.
There are a number of panel data models, such as:
1. Least Squares Dummy Variable (LSDV) model
2. Fixed Effects Model (FEM)
3. Random Effects Model, REM (GLS – Generalized
Least Squares)
4. The Between model
5. First Difference Model (FDM)
6. Difference-in-Differences model
5.4 cont’d…

Since panel data are observations across different
cross-sectional units and over time, the construction of a panel
data model is based on the assumptions we make
about the heterogeneity of the cross-sectional units in time and
space, and on how we capture this heterogeneity in the
model.
Example: consider a Cobb-Douglas production function with
two factors (labor and capital) for N countries
(cross-sectional units) and T periods. In logs, let us denote:
$$y_{it} = \alpha_i + \beta_i k_{it} + \gamma_i l_{it} + \varepsilon_{it}$$
where $y_{it}$ is the log of GDP (gross output) for country i at time t,
$k_{it}$ is the log of the total capital stock of country i at time t, and
$l_{it}$ is the log of total labor employment for country i at time t.
5.4 cont’d…
Several alternative specifications of this model can be
considered, depending on what we assume about the nature
(heterogeneity) of the coefficients/parameters
across countries and over time. Panel data analysis is all about
addressing heterogeneity over time and across individuals.
Assumption 1. All coefficients or parameters (intercept and slope
coefficients) of the model are constant, i.e. the same for all cross-sectional
units and across time, and the error term captures differences over time and
across individuals. Here we are assuming the cross-sectional units are homogeneous
(this is called pooled OLS regression).
The coefficients of our hypothetical production function are the same for all
cross-sectional units/countries; in this case we have a homogeneous
specification (the usual simple OLS regression):
$$y_{it} = \alpha + \beta k_{it} + \gamma l_{it} + \varepsilon_{it}, \qquad \alpha_i = \alpha,\ \beta_i = \beta,\ \gamma_i = \gamma$$
The subscript ‘i’ is dropped from the coefficients because there is no country-wise
difference in the parameters (the elasticity coefficients).
5.4 cont’d…
Assumption 2. The slope coefficients are constant but the
intercept varies over the observed units (across individual
units). The intercept term is heterogeneous across individuals: we don’t
have a single intercept in the model, each cross-sectional unit has its
own. The intercept is still time invariant; the individual-specific
intercepts don’t vary with time.
$$y_{it} = \alpha_i + \beta k_{it} + \gamma l_{it} + \varepsilon_{it}$$
A model of this form is called an unobserved effects model or fixed effects
model; $\alpha_i$ is generally called an unobserved or fixed effect. This is the
most commonly employed assumption in panel data regression: the
individual-specific effects are captured by a time-invariant individual
intercept (in such a case, we use the fixed effects model, FEM).
Other models assume these individual-specific effects are not fixed but
random, i.e. $\alpha_i$ is random; such models are called Random
Effects Models.
5.4 cont’d…
Assumption 3. The slope coefficients are constant but the
intercept varies over individuals and across time. Now the intercept
term varies not only across cross-sectional units but also over
time (it is individual- and time-variant), so it carries a time index as
well:
$$y_{it} = \alpha_{it} + \beta k_{it} + \gamma l_{it} + \varepsilon_{it}$$
Now the individual-specific intercept also varies over time.
Assumption 4. All coefficients (the intercept as well as the slope
coefficients) vary over individuals. All coefficients in the model are
individual-variant but time-invariant; note that they carry
the ‘i’ subscript but no time subscript:
$$y_{it} = \alpha_i + \beta_i k_{it} + \gamma_i l_{it} + \varepsilon_{it}$$
This is a more complex assumption under which individuals do not
have the same intercept and slope coefficients. It can be
captured by adding interactive (slope) dummies to the model.
5.4 cont’d…
Assumption 5. The intercept as well as the slope coefficients vary
over individuals and time. All the coefficients now carry
both ‘i’ and ‘t’ subscripts:
$$y_{it} = \alpha_{it} + \beta_{it} k_{it} + \gamma_{it} l_{it} + \varepsilon_{it}$$
This assumes a far more complex scenario where both the slope
coefficients and the intercept are not only individual-specific but also
vary over time.
5.4 cont’d…
Now we proceed to a brief discussion of panel regression
models.
In Stata we use the following command to declare the data as a
panel (given that the data are annual):
xtset id year
where ‘id’ is the cross-sectional unit identifier and ‘year’ is the
time variable.
Command to display a summary of the data for all variables:
xtsum
Panel data regression commands begin with: xtreg
In panel data analysis, statistical software (such as Stata)
provides three different types of statistics (mean, variance,
standard deviation, etc.) for the data: overall, between and within.
5.4 cont’d…
The overall statistics are ordinary statistics based
on all the observations we have (N × T).
The “between” statistics are calculated from the
means of the individual cross-sectional units (N entities),
regardless of time period.
The “within” statistics are calculated from the deviations of each
observation from its own unit’s mean, i.e. from the variation over
time (T periods) within each cross-sectional unit.
The next slide displays the computation of the three variances based on
hypothetical observations for three cross-sectional units (N = 3)
in three time periods (T = 3) for a variable X.
$\bar{x}_i$ is the mean within cross-sectional unit i; it is constant for
time-invariant variables, e.g. for dummies such as
gender, race, location, education level, etc.
Since there are only N such means, the number of
observations for the between statistics equals the number of individuals.
5.4 cont’d….
Time-invariant regressors (race, gender, education)
have zero within variance, because they don’t vary with
respect to time.
Individual-invariant regressors (time, economy-wide
trends) have zero between variance. Since such
regressors are invariant across individuals, the individual
mean equals the overall mean, $\bar{x}_i = \bar{x}$ and $\bar{x}_i - \bar{x} = 0$; hence the
between variance is zero.
Each panel regression model is based on one of these
variances in estimation.
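The overall/within/between split can be illustrated with a small Python sketch for N = 3 and T = 3. The numbers and variable names are hypothetical, and the sketch works with raw sums of squares rather than the standard deviations that xtsum reports.

```python
def decompose(panel):
    """Split the total variation of a balanced panel (one row per unit)
    into within-unit and between-unit sums of squares."""
    n, t = len(panel), len(panel[0])
    unit_means = [sum(row) / t for row in panel]
    grand_mean = sum(unit_means) / n
    total = sum((v - grand_mean) ** 2 for row in panel for v in row)
    within = sum((v - m) ** 2 for row, m in zip(panel, unit_means) for v in row)
    between = t * sum((m - grand_mean) ** 2 for m in unit_means)
    return total, within, between

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]       # varies over both i and t
gender = [[0, 0, 0], [1, 1, 1], [1, 1, 1]]  # time-invariant regressor
year = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]    # individual-invariant regressor

print(decompose(x))       # (60.0, 6.0, 54.0): total = within + between
print(decompose(gender))  # the within part is zero
print(decompose(year))    # the between part is zero
```

The time-invariant variable has no within variation and the individual-invariant variable has no between variation, matching the two statements above.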
5.4 cont’d…
1. Least Squares Dummy Variable (LSDV) Model
Also called the fixed effects LSDV model.
LSDV assumes:
• The intercept term is not the same across individuals, i.e. there are
individual-specific differences in the panel data.
• The individual-specific difference (intercept coefficient) is time
invariant, i.e. it doesn’t vary with time.
This is the OLS method applied to panel data with extensive use of
dummy variables to address the heterogeneity in the panel.
LSDV uses differential intercept dummies to capture
the fixed individual-specific effects in the panel data.
5.4 cont’d…
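This is the OLS estimation method customized to deal with
panel data heterogeneity. It is not available in Stata’s
menu of panel data regressions; estimation is
performed through the linear regression command.
The individual-specific heterogeneous intercept term, $\alpha_i$
(assumption 2), is estimated by introducing an intercept
dummy for each cross-sectional unit:
$$y_{it} = \alpha_1 + \sum_{j=2}^{N} \delta_j D_{ji} + \beta k_{it} + \gamma l_{it} + \varepsilon_{it}$$
where $D_{ji}$ is the dummy for cross-sectional unit j ($D_{ji} = 1$ if $i = j$, 0 otherwise).
With a common intercept in the model we use one dummy fewer than the
number of cross-sectional units (N − 1), to avoid the dummy variable trap.
The coefficient $\delta_j$ of $D_{ji}$ is the intercept
differential of unit j relative to the base unit.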
5.4 cont’d…
• We can also use time dummies to capture time-specific
effects, or interaction dummies for differences in
slope coefficients.
• LSDV can be implemented easily if we have few cross-sectional
units.
• Limitation: dummy variable regressions become
impractical when the number of cross-sectional units gets
large, because a large number of dummies complicates analysis
and interpretation and consumes more degrees of freedom.
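A minimal Python sketch of the LSDV idea follows. It assumes a hypothetical panel generated as y_it = a_i + 2·x_it with a = (1, 5, 9), and uses a small hand-written OLS solver (the helper name ols is illustrative; in practice a regression command such as Stata's regress does this step).

```python
# Hypothetical panel: y_it = a_i + 2 * x_it with unit intercepts a = (1, 5, 9).
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = [1, 5, 9]

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

rows, y = [], []
for i in range(3):
    for x_it in x[i]:
        # Columns: constant, dummy for unit 2, dummy for unit 3, regressor x
        rows.append([1.0, float(i == 1), float(i == 2), float(x_it)])
        y.append(a[i] + 2.0 * x_it)

const, d2, d3, slope = ols(rows, y)
print(const, d2, d3, slope)  # approximately 1, 4, 8, 2
```

The constant is the intercept of the base unit, the dummy coefficients are the intercept differentials of units 2 and 3 relative to it, and the slope is recovered without the bias of pooled OLS.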
5.4 cont’d…
2. Fixed Effects Model, FEM (within)
Assumptions of FEM
• FEM assumes the individual-specific effects (the intercept coefficients) vary
across individuals;
• however, it assumes the intercept coefficient is constant or fixed in
time, that is, time invariant; and
• it allows the intercept term / individual-specific effect to be correlated
with the regressors/explanatory variables in the model.
Hence, the fixed effects model examines differences in the intercept term
across the cross-sectional units.
• LSDV is also a variant of the fixed effects model; it uses dummy variables to
estimate the individual-specific effects (differences in the
intercept term across individual units). FEM (within) does not need
dummy variables; instead it estimates a transformed version of
the model.
5.4 cont’d…
• Since the individual effect ($\alpha_i$) is assumed time
invariant, it is allowed to correlate with the other regressors in
the model without violating the CLRM assumptions.
• We include $\alpha_i$ as the intercept term in FEM.
• Each individual has a different intercept term and the same
slope parameters.
• Time dummies and other dummy variables can be included
in FEM.
• If $\alpha_i$ is correlated with the explanatory variables and is left in
the error term, there is an endogeneity problem that biases the OLS
estimates; hence pooled OLS can’t be used.
• Fixed Effects (FE) obtains consistent estimates of the slope coefficients.
5.4 cont’d…
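• The FEM estimator solves the endogeneity problem by
transforming the original equation using the within
deviation of each observation, like $y_{it} - \bar{y}_i$.
The within estimator, or fixed effects estimator,
transforms the regression equation as follows.
Given the original equation: $y_{it} = \alpha_i + \beta k_{it} + \gamma l_{it} + \varepsilon_{it}$ … [1]
i) First compute the mean over time (t) of each variable in
the model and write it as:
$\bar{y}_i = \alpha_i + \beta \bar{k}_i + \gamma \bar{l}_i + \bar{\varepsilon}_i$ … [2]
where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{k}_i = \frac{1}{T}\sum_{t=1}^{T} k_{it}$ and $\bar{l}_i = \frac{1}{T}\sum_{t=1}^{T} l_{it}$ are the individual means,
and t = 1, 2, 3, …, T are the time periods.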
5.4 cont’d…
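ii) Then subtract equation [2] from equation [1]
(the original equation):
$y_{it} - \bar{y}_i = (\alpha_i - \alpha_i) + \beta (k_{it} - \bar{k}_i) + \gamma (l_{it} - \bar{l}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$
$y_{it} - \bar{y}_i = \beta (k_{it} - \bar{k}_i) + \gamma (l_{it} - \bar{l}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$
where $\ddot{y}_{it} = y_{it} - \bar{y}_i$, $\ddot{k}_{it} = k_{it} - \bar{k}_i$ and $\ddot{l}_{it} = l_{it} - \bar{l}_i$ are the time-demeaned data on Y, K
and L, so that
$\ddot{y}_{it} = \beta \ddot{k}_{it} + \gamma \ddot{l}_{it} + \ddot{\varepsilon}_{it}$ … [3]
This transformation is also called the within
transformation.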
5.4 cont’d…
• FE runs OLS on this transformed version of the original equation;
that is, the estimation is made in deviations of each
variable from its individual mean.
This transformation of the original equation, known as the within
transformation, eliminates $\alpha_i$ from the model, and
all variables are in deviation form.
$\alpha_i$ for the individual units can be recovered after estimation as: $\hat{\alpha}_i = \bar{y}_i - \hat{\beta}\bar{k}_i - \hat{\gamma}\bar{l}_i$
Stata command for FEM (within): xtreg Y L K, fe
where Y is the dependent variable, L and K are the explanatory variables and ‘fe’ stands for
fixed effects.
• A limitation of FEM (within) estimation is that time-invariant
variables are dropped from the model and their coefficients are not
identified (we can’t include time-invariant dummies such as
gender, race, location, education level, etc.).
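Steps i) and ii) of the within transformation can be sketched in Python. The example assumes a hypothetical one-regressor panel, y_it = a_i + 2·x_it with a = (1, 5, 9), so that the result is easy to verify by hand; it illustrates the idea and is not a reproduction of what xtreg does internally.

```python
# Hypothetical panel: y_it = a_i + 2 * x_it; the a_i are unknown in practice.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = [1, 5, 9]
y = [[a[i] + 2 * v for v in x[i]] for i in range(3)]

def mean(values):
    return sum(values) / len(values)

# Step i: individual means over time.
x_bar = [mean(row) for row in x]
y_bar = [mean(row) for row in y]

# Step ii: subtract each unit's mean from its own observations.
x_dd = [v - m for row, m in zip(x, x_bar) for v in row]
y_dd = [v - m for row, m in zip(y, y_bar) for v in row]

# OLS without intercept on the demeaned data is the within (FE) estimator.
beta_fe = sum(xd * yd for xd, yd in zip(x_dd, y_dd)) / sum(xd ** 2 for xd in x_dd)

# Recover the unit intercepts: alpha_i = y_bar_i - beta * x_bar_i.
alpha_hat = [yb - beta_fe * xb for yb, xb in zip(y_bar, x_bar)]
print(beta_fe, alpha_hat)  # slope 2 and intercepts 1, 5, 9
```

Demeaning removes a_i from every observation, so the slope is estimated only from the variation over time within each unit.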
5.4 cont’d…
3. Random Effects Model (REM)
Assumptions of REM
• The individual-specific effect varies across individuals.
• The individual-specific effect, $u_i$, is not correlated with the
regressors; it is distributed independently of the regressors.
• The individual-specific effects are randomly
distributed and are included in the error term rather than in the
intercept ($\alpha_i = \alpha + u_i$); hence the name Random Effects Model.
• Each individual has the same slope parameters and a composite
error term, given as:
$v_{it} = u_i + \varepsilon_{it}$
$y_{it} = \alpha + \beta k_{it} + \gamma l_{it} + v_{it}$ … [3.1], where $\alpha$ is now the average (common) intercept.
5.4 cont’d…..
• REM explores differences in the error variance components across
individuals.
• The Random Effects (RE) estimator is a GLS
(Generalized Least Squares) estimator that
takes into account the serial correlation in the composite error.
This works as follows.
The RE Transformation
• Using GLS involves transforming the original equation so
that the transformed equation fulfils the assumptions
underlying the classical linear regression model.
5.4 cont’d….
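Panel data model:
$y_{it} = \alpha + \beta k_{it} + \gamma l_{it} + v_{it}$, with $v_{it} = u_i + \varepsilon_{it}$ … [3.1]
Define λ as:
$\lambda = 1 - \sqrt{\dfrac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}}$, where $\sigma_u^2$ and $\sigma_\varepsilon^2$ are the variances of $u_i$ and $\varepsilon_{it}$, and T is the number of time periods.
Use λ to transform the original equation as follows.
5.4 cont’d…
Multiply the individual mean of the original equation by λ:
$\bar{y}_i = \alpha + \beta \bar{k}_i + \gamma \bar{l}_i + \bar{v}_i$
$\lambda\bar{y}_i = \lambda\alpha + \lambda\beta \bar{k}_i + \lambda\gamma \bar{l}_i + \lambda\bar{v}_i$ … [3.2]
Subtract [3.2] from [3.1]:
$y_{it} - \lambda\bar{y}_i = (1-\lambda)\alpha + \beta (k_{it} - \lambda\bar{k}_i) + \gamma (l_{it} - \lambda\bar{l}_i) + (v_{it} - \lambda\bar{v}_i)$ … [3.3]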
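The role of λ in the RE transformation can be sketched in Python. The function names and input values are hypothetical; the formula used is the standard balanced-panel weight λ = 1 − sqrt(σ_e² / (σ_e² + T·σ_u²)).

```python
import math

def re_lambda(sigma_u, sigma_e, t):
    """GLS weight: lambda = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_u^2))."""
    return 1 - math.sqrt(sigma_e ** 2 / (sigma_e ** 2 + t * sigma_u ** 2))

def quasi_demean(row, lam):
    """Subtract lambda times the unit's time mean from each observation."""
    m = sum(row) / len(row)
    return [v - lam * m for v in row]

lam_pooled = re_lambda(0.0, 1.0, 3)  # no unit effect: lambda = 0
lam_mixed = re_lambda(2.0, 1.0, 3)   # strictly between 0 and 1
print(lam_pooled, lam_mixed)

# lambda = 0 leaves the data unchanged (pooled OLS);
# lambda = 1 reproduces the within (FE) transformation.
print(quasi_demean([1, 2, 3], 0.0))
print(quasi_demean([1, 2, 3], 1.0))
```

So RE sits between pooled OLS and FE: the larger the variance of the unit effect relative to the idiosyncratic error (or the larger T), the closer λ is to 1 and the closer RE is to the within estimator.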
5.4 cont’d…..
Applying OLS to the transformed equation [3.3]
gives the random effects GLS estimator. Statistical
software transforms the variables/model automatically
and generates the GLS RE coefficients.
Stata command for REM (GLS): xtreg Y L K, re
Notes on RE:
1. More efficient than FE: as long as the key assumption (individual
effects uncorrelated with the regressors) is satisfied, RE is preferable.
2. Time-invariant variables can be included.
5.4 cont’d…
Hausman Specification Test
To decide between fixed and random effects there is a formal test,
the Hausman specification test, where the null hypothesis is that the
preferred model is random effects and the alternative is
fixed effects (see Greene, 2008, chapter 9).
Hypotheses of the test:
Null: REM is appropriate
Alt: FEM is appropriate
5.4 cont’d…
The test is conducted based on the following
procedures:
Estimate the FEM and store the estimated result
using these commands:
xtreg Y L K, fe
estimates store fixed
Do the same for REM:
xtreg Y L K, re
estimates store random
Then enter the following:
hausman fixed random
5.4 cont’d…
The Hausman test result will be displayed (it uses a
chi-square statistic).
Check the chi-square statistic and its p-value.
Decision:
• If the p-value of the chi-square statistic is less than 5%,
we reject the null hypothesis and conclude that FEM is
appropriate.
• If the p-value of the chi-square statistic is greater than 5%,
we can’t reject the null and we conclude that REM is
appropriate.
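The decision rule can be illustrated with a single-coefficient version of the Hausman statistic, H = (b_FE − b_RE)² / (Var(b_FE) − Var(b_RE)), which is chi-square with 1 degree of freedom. This Python sketch uses hypothetical numbers (not taken from the exercise) and a hypothetical function name; the general test uses all slope coefficients and their covariance matrices.

```python
import math

def hausman_one_coef(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient; chi-square with 1 df under the null."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # Upper-tail probability of a chi-square(1) variable: erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical estimates and variances, for illustration only.
h, p = hausman_one_coef(b_fe=0.50, b_re=0.80, var_fe=0.02, var_re=0.01)
decision = "FEM" if p < 0.05 else "REM"
print(round(h, 2), round(p, 4), decision)
```

Here the FE and RE estimates differ by much more than sampling error would explain, so the p-value falls below 5% and FEM is chosen.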
Exercise
xtreg liq asset loan rgdp, fe

Fixed-effects (within) regression    Number of obs    = 112
Group variable: id                   Number of groups = 14 (banks)
R-sq:                                Obs per group:
  within  = 0.4123                     min = 8
  between = 0.8198                     avg = 8.0
  overall = 0.4326                     max = 8
                                     F(3,95)  = 22.22
corr(u_i, Xb) = 0.2482               Prob > F = 0.0000

liq      Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
asset    .0000113    .0000464    0.24    0.808   -.0000808   .0001034
loan     -.0002016   .0001018    -1.98   0.051   -.0004037   4.91e-07
rgdp     -.0000162   2.28e-06    -7.10   0.000   -.0000207   -.0000117
_cons    43.45379    2.068154    21.01   0.000   39.34798    47.55959

sigma_u  6.2423997
sigma_e  9.5082737
rho      .3011989 (fraction of variance due to u_i)
F test that all u_i = 0: F(13, 95) = 1.93   Prob > F = 0.0357

estimates store fem   (a command to save the result)
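The rho in the fixed-effects output is the fraction of the composite error variance due to u_i, that is rho = sigma_u² / (sigma_u² + sigma_e²). A quick Python check using the sigma_u and sigma_e values reported in that output:

```python
sigma_u = 6.2423997   # reported standard deviation of the unit effects u_i
sigma_e = 9.5082737   # reported standard deviation of the idiosyncratic error

# Share of the total error variance that is due to the unit effects.
rho = sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)
print(round(rho, 4))  # 0.3012
```

About 30% of the unexplained variation in liq is attributed to differences between banks.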
Exercise
xtreg liq asset loan rgdp, re

Random-effects GLS regression        Number of obs    = 112
Group variable: id                   Number of groups = 14 (banks)
R-sq:                                Obs per group:
  within  = 0.3731                     min = 8
  between = 0.8742                     avg = 8.0
  overall = 0.5028                     max = 8
                                     Wald chi2(3) = 109.20
corr(u_i, X) = 0 (assumed)           Prob > chi2  = 0.0000

liq      Coef.       Std. Err.   z       P>|z|   [95% Conf. Interval]
asset    .0000427    .0000229    1.87    0.062   -2.08e-06   .0000875
loan     -.0005006   .000076     -6.59   0.000   -.0006496   -.0003517
rgdp     -.000016    2.23e-06    -7.15   0.000   -.0000203   -.0000116
_cons    45.54496    2.049274    22.22   0.000   41.52845    49.56146

sigma_u  0
sigma_e  9.5082737
rho      0 (fraction of variance due to u_i)

estimates store rem   (a command to save the result, entered after the regression)
Exercise
hausman fem rem   (Stata command)

         ---- Coefficients ----
         (b)         (B)         (b-B)        sqrt(diag(V_b-V_B))
         fem         rem         Difference   S.E.
asset    .0000113    .0000427    -.0000314    .0000403
loan     -.0002016   -.0005006   .000299      .0000677
rgdp     -.0000162   -.000016    -2.40e-07    4.78e-07

b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(3) = 23.44   Prob>chi2 = 0.0000
(V_b-V_B is not positive definite)
Exercise: LSDV
encode banks, gen(bankdumies)
reg liq asset loan rgdp i.bankdumies

Source      SS           df    MS            Number of obs = 112
                                             F(16, 95)     = 9.16
Model       13252.0292   16    828.251824    Prob > F      = 0.000
Residual    8588.69046   95    90.407268     R-squared     = 0.6068
                                             Adj R-squared = 0.5405
Total       21840.7196   111   196.76324     Root MSE      = 9.5083

liq Coef. Std. Err. t P>|t| [95% Conf. Interval]

asset .0000113 .0000464 0.24 0.808 -.0000808 .0001034


loan -.0002016 .0001018 -1.98 0.051 -.0004037 4.91e-07
rgdp -.0000162 2.28e-06 -7.10 0.000 -.0000207 -.0000117
bankdumies
BNIB 1.727913 4.874742 0.35 0.724 -7.949675 11.4055
BOA 1.791108 4.782213 0.37 0.709 -7.702786 11.285
BRIB 8.220132 4.875858 1.69 0.095 -1.459669 17.89993
CBE -4.171271 10.27665 -0.41 0.686 -24.57301 16.23047
CBO 5.135215 4.822893 1.06 0.290 -4.439438 14.70987
DB 2.566775 4.754817 0.54 0.591 -6.87273 12.00628
DBE -16.82429 5.464853 -3.08 0.003 -27.67339 -5.975184
LIB 6.595778 4.86241 1.36 0.178 -3.057327 16.24888
NIB .5790694 4.789504 0.12 0.904 -8.929298 10.08744
OIB 3.457992 4.840787 0.71 0.477 -6.152184 13.06817
UB 1.159528 4.783534 0.24 0.809 -8.336987 10.65604
WB 1.198211 4.790474 0.25 0.803 -8.312083 10.7085
ZB 8.282126 4.871599 1.70 0.092 -1.38922 17.95347
_cons 42.04534 3.832647 10.97 0.000 34.43657 49.65411
5.4 cont’d…..
First Difference Model
If there is feedback from the idiosyncratic error to the explanatory variables
that takes more than two periods to appear, FD will be consistent whereas FE will not
(FD requires a weaker form of strict exogeneity).
Rather than time-demeaning the data (which gives the FE estimator), we
now difference the data:
$\Delta y_{it} = y_{it} - y_{i,t-1} = \beta \Delta k_{it} + \gamma \Delta l_{it} + \Delta\varepsilon_{it}$
Clearly this removes the individual fixed effect $\alpha_i$, and so we can obtain
consistent estimates of β and γ by estimating the equation in first differences by
OLS.
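First differencing can be sketched in Python in the same way as the within transformation. The panel below is hypothetical (y_it = a_i + 2·x_it with a = (1, 5, 9) and one regressor), chosen so the answer can be checked by hand.

```python
# Hypothetical panel: y_it = a_i + 2 * x_it with unit intercepts a = (1, 5, 9).
x = [[1, 2, 4], [4, 5, 7], [7, 9, 10]]
a = [1, 5, 9]
y = [[a[i] + 2 * v for v in x[i]] for i in range(3)]

def first_differences(row):
    """Return the changes row[t] - row[t-1] for consecutive periods."""
    return [curr - prev for prev, curr in zip(row, row[1:])]

# Differencing within each unit cancels the time-invariant a_i.
dx = [d for row in x for d in first_differences(row)]
dy = [d for row in y for d in first_differences(row)]

# OLS without intercept on the differenced data is the FD estimator.
beta_fd = sum(dxi * dyi for dxi, dyi in zip(dx, dy)) / sum(dxi ** 2 for dxi in dx)
print(beta_fd)  # 2.0
```

Each unit contributes T − 1 differenced observations, so one period per unit is lost relative to the original data.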
5.4 cont’d…..
FE or FD which is better?
FE and FD are two alternative ways of removing the fixed
effect. Which method should we use?
• First of all, when T = 2 (i.e. we have only two time
periods), FE and FD are exactly equivalent, so in this
case it does not matter which one we use (try to prove
this).
• But when T ≥ 3, FE and FD are not the same.
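One way to sketch the T = 2 equivalence, assuming a single regressor x for brevity (an assumption made here; the argument with several regressors is analogous) and writing $\Delta x_i = x_{i2} - x_{i1}$, $\Delta y_i = y_{i2} - y_{i1}$:

```latex
\begin{align*}
\bar{x}_i &= \tfrac{1}{2}(x_{i1} + x_{i2}), \qquad
x_{i2} - \bar{x}_i = \tfrac{1}{2}\Delta x_i, \qquad
x_{i1} - \bar{x}_i = -\tfrac{1}{2}\Delta x_i, \\
\hat{\beta}_{FE}
 &= \frac{\sum_i \sum_{t=1}^{2} (x_{it}-\bar{x}_i)(y_{it}-\bar{y}_i)}
         {\sum_i \sum_{t=1}^{2} (x_{it}-\bar{x}_i)^2}
  = \frac{\sum_i 2 \cdot \tfrac{1}{4}\,\Delta x_i \,\Delta y_i}
         {\sum_i 2 \cdot \tfrac{1}{4}\,(\Delta x_i)^2}
  = \frac{\sum_i \Delta x_i \,\Delta y_i}{\sum_i (\Delta x_i)^2}
  = \hat{\beta}_{FD}.
\end{align*}
```

With two periods the demeaned observations are just plus and minus half of the first difference, so both estimators use exactly the same information. The FD estimator here is the one estimated without an intercept in the differenced equation.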
