C6 - English

Chapter 6.

Models with Panel Data


(Course: Econometrics)

Phuong Le

Faculty of Economic Mathematics


University of Economics and Law
Vietnam National University, Ho Chi Minh City
Contents

1 Models with panel data


Panel data
Model estimation methods

2 Panel regression with STATA


Estimating FE and RE models
Model selection
Panel data

Panel data generally have two characteristics:

• They combine cross-sectional and time-series data: the same
cross-sectional units are observed over several time periods;
• Observations on the same cross-sectional unit at different times
may be dependent.
Example: Survey data on three variables Y, X1, X2, recorded annually
from 2000 to 2009 for five provinces (Hai Duong, Thai Binh, Nam Dinh,
Ha Nam, Ninh Binh):
Y: total value of products in the province,
X1: total value of agricultural products of the province,
X2: total value of industrial products of the province.

We can present the data as follows:

Hai Duong province
Year    Y      X1       X2
2000    Y1     X1,1     X2,1
2001    Y2     X1,2     X2,2
···
2009    Y10    X1,10    X2,10

Ninh Binh province
Year    Y      X1       X2
2000    Y1     X1,1     X2,1
2001    Y2     X1,2     X2,2
···
2009    Y10    X1,10    X2,10

···

In this data set:

• The number of cross-sectional units (provinces surveyed) is n = 5;
• The number of time periods in which each cross-sectional unit is
observed (years) is T = 10;
• The panel data size is n × T = 5 × 10 = 50 observations.

Balanced and unbalanced panel data


A panel data set is said to be balanced if every cross-sectional unit
is observed in all T periods, so that its size is n × T, and unbalanced
if some observations are missing, so that its size is less than n × T.
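
The balanced/unbalanced distinction can be checked mechanically. The following is a minimal sketch, assuming the panel is stored in "long" format as a list of (province, year) pairs; the helper name is_balanced and the dropped observation are purely illustrative and are not part of the original data.

```python
from itertools import product

# The five provinces and ten years from the example (n = 5, T = 10).
provinces = ["Hai Duong", "Thai Binh", "Nam Dinh", "Ha Nam", "Ninh Binh"]
years = range(2000, 2010)  # 2000..2009


def is_balanced(observed, units, periods):
    """Return True when every unit is observed in every period."""
    full = set(product(list(units), list(periods)))
    return set(observed) == full


# Balanced case: all n * T unit-year pairs are present.
balanced = list(product(provinces, years))

# Unbalanced case: one observation is missing (hypothetical gap).
unbalanced = [obs for obs in balanced if obs != ("Ha Nam", 2005)]

print(len(balanced), is_balanced(balanced, provinces, years))      # 50 True
print(len(unbalanced), is_balanced(unbalanced, provinces, years))  # 49 False
```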

Depending on the data, the cross-sectional units may:

• have similar characteristics;
• have different characteristics;
• differ in the marginal effects of the explanatory variables;
• differ both in characteristics and in the marginal effects of the
explanatory variables;
• differ neither in characteristics nor in the marginal effects of
the explanatory variables.
Model estimation methods

The pooled OLS method

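$$Y_{it} = a + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_k X_{kit} + \varepsilon_{it} \qquad \text{(POLS)},$$

where $i = 1, \dots, n$ indexes the cross-sectional units and
$t = 1, \dots, T$ the time periods. The intercept $a$ and the
coefficients of the explanatory variables $\beta_j$ ($j = 1, \dots, k$)
are the same for all cross-sectional units and constant over time, and
$\varepsilon_{it} \sim N(0, \sigma^2)$.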
• This method is used when cross-sectional units have no
difference in characteristics and no difference in the marginal
effect of influencing factors (explanatory variables).
• This method is also called the common constant method.

The fixed effects method


$$Y_{it} = a_i + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_k X_{kit} + \varepsilon_{it} \qquad \text{(FE)}.$$
Each intercept $a_i$ corresponds to one cross-sectional unit, while the
slopes $\beta_j$ are constant over time and across cross-sectional units.

Estimation: Fixed effects estimation is least squares estimation with
dummy variables

$$D_i = \begin{cases} 1 & \text{if the observation belongs to cross-sectional unit } i, \\ 0 & \text{if the observation belongs to another cross-sectional unit.} \end{cases}$$

Each dummy variable allows estimation of the intercept of the
corresponding cross-sectional unit. Therefore, the model is also
known as the Least Squares Dummy Variable (LSDV) model.
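
To illustrate why unit-specific intercepts matter, here is a minimal sketch with one explanatory variable and made-up data for two units, generated exactly as Y = a_i + 2X with a_A = 0 and a_B = 10. Instead of building the dummies explicitly, it demeans Y and X within each unit (the "within" transformation), which is known to give the same slope estimate as the dummy variable regression. The data and helper names are assumptions for illustration only.

```python
def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """OLS slope of ys on xs (regression with an intercept, one regressor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def demean(values):
    """Subtract the unit-specific mean (within transformation)."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical panel: unit -> (X values, Y values), with Y = a_i + 2 * X.
panel = {
    "A": ([1, 2, 3], [2, 4, 6]),       # a_A = 0
    "B": ([4, 5, 6], [18, 20, 22]),    # a_B = 10
}

# Pooled OLS ignores the unit-specific intercepts.
pooled_x = [x for xs, _ in panel.values() for x in xs]
pooled_y = [y for _, ys in panel.values() for y in ys]
b_pooled = slope(pooled_x, pooled_y)

# Fixed effects slope: OLS on data demeaned within each unit.
within_x = [d for xs, _ in panel.values() for d in demean(xs)]
within_y = [d for _, ys in panel.values() for d in demean(ys)]
b_within = slope(within_x, within_y)

# Recover each unit's intercept: a_i = mean(Y_i) - b * mean(X_i).
intercepts = {u: mean(ys) - b_within * mean(xs) for u, (xs, ys) in panel.items()}

print(b_pooled, b_within, intercepts)
```

In this constructed example the within slope is exactly 2 and the intercepts 0 and 10 are recovered, while the pooled OLS slope (about 4.57) is distorted because it attributes the difference in intercepts to X.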
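Test: testing the fixed effects model
Hypotheses:
$H_0: a_1 = a_2 = \cdots = a_n$ (the pooled OLS model is more appropriate).
$H_1: a_i \neq a_j$ for some $i \neq j$ (the fixed effects (FE) model is
more appropriate).
F statistic:

$$F = \frac{(R^2_{FE} - R^2_{POLS})/(n - 1)}{(1 - R^2_{FE})/(nT - n - k)} \sim F(n - 1,\; nT - n - k) \quad \text{under } H_0,$$

where $R^2_{FE}$ and $R^2_{POLS}$ are the coefficients of determination
of the FE (dummy variable) regression and the pooled OLS regression.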
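
As a worked illustration of the F statistic, the sketch below evaluates the formula for the example dimensions n = 5, T = 10 with k = 2 regressors. The two R² values are invented for illustration and do not come from any real estimation; the function name is hypothetical.

```python
def fe_f_statistic(r2_fe, r2_pols, n, T, k):
    """F statistic for H0: a_1 = ... = a_n, with its degrees of freedom."""
    df1 = n - 1              # number of restrictions
    df2 = n * T - n - k      # residual degrees of freedom of the FE model
    f = ((r2_fe - r2_pols) / df1) / ((1 - r2_fe) / df2)
    return f, df1, df2


# Hypothetical R-squared values, for illustration only.
f, df1, df2 = fe_f_statistic(r2_fe=0.8, r2_pols=0.5, n=5, T=10, k=2)
print(round(f, 3), df1, df2)  # 16.125 4 43
```

The resulting value would then be compared with the critical value of the F(4, 43) distribution at the chosen significance level.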

Properties of the FE model:

• The effects are specific to each unit and do not change over time.
• When the panel contains thousands of cross-sectional units, the
number of dummy variables becomes very large and the computation
becomes more difficult, so other estimation methods can be used
instead.

The random effects method


$$Y_{it} = a + u_i + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_k X_{kit} + \varepsilon_{it} \qquad \text{(RE)}.$$
The component $w_{it} = u_i + \varepsilon_{it}$ is called the composite
random error (or compound error), and the RE model is also called the
Error Components Model.
Assumptions for the random effects model:
1. The errors $u_i$ and $\varepsilon_{it}$ are normally distributed with
mean 0 and constant variance, are not autocorrelated, and are
uncorrelated with each other;
2. $E(w_{it}^2 \mid X) = \sigma^2 = \sigma_\varepsilon^2 + \sigma_u^2$;
$E(w_{it} w_{is} \mid X) = \sigma_u^2$ for $s \neq t$;
$E(w_{it} w_{js} \mid X) = 0$ for $i \neq j$ and all $s, t$;
3. The values of the explanatory variables are deterministic.
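
To make assumption 2 concrete, the moments can be collected into the covariance matrix of the composite errors of one unit. The display below is a sketch for T = 3, where the stacked vector $w_i = (w_{i1}, w_{i2}, w_{i3})'$, the identity matrix $I_3$ and the vector of ones $\iota_3$ are notation introduced here, not taken from the slides.

```latex
\Omega = E(w_i w_i' \mid X) =
\begin{pmatrix}
\sigma_u^2 + \sigma_\varepsilon^2 & \sigma_u^2 & \sigma_u^2 \\
\sigma_u^2 & \sigma_u^2 + \sigma_\varepsilon^2 & \sigma_u^2 \\
\sigma_u^2 & \sigma_u^2 & \sigma_u^2 + \sigma_\varepsilon^2
\end{pmatrix}
= \sigma_\varepsilon^2 I_3 + \sigma_u^2 \iota_3 \iota_3',
\qquad
\operatorname{Corr}(w_{it}, w_{is}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
\quad (s \neq t).
```

Errors of the same unit are therefore equally correlated across all pairs of periods, which is why ordinary least squares is not efficient for this model.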
Model estimation (RE): Use the generalized least squares method
(GLS).
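
For a balanced panel, GLS for the RE model can be written as OLS on partially demeaned data. The display below sketches this standard result; the symbols $\theta$, $\bar{Y}_i$, $\bar{X}_{ji}$ and $\bar{w}_i$ (unit means over time) are notation introduced here, and in practice the variances are replaced by estimates.

```latex
Y_{it} - \theta \bar{Y}_i
  = a(1 - \theta) + \sum_{j=1}^{k} \beta_j \left( X_{jit} - \theta \bar{X}_{ji} \right)
  + \left( w_{it} - \theta \bar{w}_i \right),
\qquad
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_u^2}}.
```

When $\sigma_u^2 = 0$ we get $\theta = 0$ and the regression reduces to pooled OLS; as $T \sigma_u^2$ grows relative to $\sigma_\varepsilon^2$, $\theta$ approaches 1 and the estimator approaches the fixed effects (within) estimator.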
Testing the model with random effects:
Hausman test:
$H_0$: The random effects (RE) model is appropriate
$H_1$: The fixed effects (FE) model is appropriate
Statistic (Hausman test):

$$H = (b_{FE} - b_{RE})' \left[ \operatorname{Var}(b_{FE}) - \operatorname{Var}(b_{RE}) \right]^{-1} (b_{FE} - b_{RE}) \sim \chi^2(k),$$

where $b_{FE}$ is the estimate of the vector $\beta$ in the FE model and
$b_{RE}$ is the GLS estimate of the vector $\beta$ in the RE model.
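
As a small numerical illustration of the Hausman statistic, the sketch below covers only the special case k = 1, where the quadratic form reduces to a ratio of scalars. The coefficient estimates and variances are invented for illustration; 3.841 is the 5% critical value of the χ²(1) distribution.

```python
CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of the chi-square(1) distribution


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for a single coefficient (k = 1)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # The statistic is only defined here for a positive variance difference.
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive")
    return (b_fe - b_re) ** 2 / diff_var


# Hypothetical estimates, for illustration only.
h = hausman_scalar(b_fe=0.50, b_re=0.40, var_fe=0.004, var_re=0.002)
reject_re = h > CHI2_1_CRIT_5PCT
print(round(h, 3), reject_re)  # 5.0 True
```

With these made-up numbers H is about 5, which exceeds 3.841, so H0 would be rejected at the 5% level and the FE model preferred.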
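Breusch & Pagan test:
$H_0: \sigma_u^2 = 0$ (Pooled OLS)
$H_1: \sigma_u^2 > 0$ (RE)
Statistic:

$$LM = \frac{nT}{2(T - 1)} \left[ \frac{\sum_i (T \bar{e}_i)^2}{\sum_i \sum_t e_{it}^2} - 1 \right]^2 \sim \chi^2(1),$$

where $e_{it}$ are the pooled OLS residuals and $\bar{e}_i$ is their mean
over time for unit $i$.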
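
The LM statistic is simple to compute once residuals are available. The following is a minimal sketch for a balanced panel, assuming the residuals of a pooled OLS regression are stored per unit; the function name and the tiny residual set are illustrative assumptions, not real estimation output.

```python
def breusch_pagan_lm(residuals):
    """LM statistic from a dict mapping each unit to its list of T residuals."""
    n = len(residuals)
    lengths = {len(e) for e in residuals.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch requires a balanced panel")
    T = lengths.pop()
    if T < 2:
        raise ValueError("at least two time periods are required")
    # sum_i (T * ebar_i)^2, using T * ebar_i = sum_t e_it
    sum_sq_unit_totals = sum(sum(e) ** 2 for e in residuals.values())
    # sum_i sum_t e_it^2
    sum_sq_residuals = sum(v ** 2 for e in residuals.values() for v in e)
    ratio = sum_sq_unit_totals / sum_sq_residuals
    return n * T / (2 * (T - 1)) * (ratio - 1) ** 2


# Hypothetical residuals for n = 2 units and T = 2 periods.
lm = breusch_pagan_lm({"A": [1.0, 1.0], "B": [-1.0, -1.0]})
print(lm)  # 2.0
```

Here LM = 2.0, which is below the 5% critical value 3.841 of the χ²(1) distribution, so with these made-up residuals H0 (pooled OLS) would not be rejected.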
Estimating FE and RE models

Set up panel data


Method 1: Use the menu: Statistics > Longitudinal/panel data >
Setup and utilities > Declare dataset to be panel data.

Method 2: Use the command line:

xtset panelvar timevar
If panelvar is a string variable, it must first be replaced by a
numeric variable:
egen panelvarid = group(panelvar)
xtset panelvarid timevar

Some commands that can be used to explore panel data:

• xtline: plots a variable over time for each cross-sectional unit,
• xtsum: summary statistics of variables in the sample, decomposed
into overall, between, and within variation.

Model estimation
Method 1: Use the menu: Statistics > Longitudinal/panel data >
Linear models > Linear regression (FE, RE, PA, BE).

Method 2: Use the command line:

xtreg depvar [indepvars] [if] [in], re   (random effects model)
xtreg depvar [indepvars] [if] [in], fe   (fixed effects model)
Note: use the regress command to estimate by pooled OLS.

Example
use http://www.stata-press.com/data/r18/nlswork, clear
Fixed effects model:
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe
If we suspect heteroskedasticity or serial correlation in the errors, we
can specify the vce(robust) or vce(hc2) option:
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe vce(robust)

Example
Random effects model:
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, re
Model selection

Choose between pooled OLS and fixed effects

The choice between pooled OLS and fixed effects is based on an F-test.
When the model is estimated by fixed effects, Stata reports the F-test
result for the following pair of hypotheses:
H0: Pooled OLS is more appropriate,
H1: Fixed effects is more appropriate.

The conclusion is based on the p-value in the last line of the
xtreg ..., fe output: if p-value < 5% we choose fixed effects;
otherwise we choose pooled OLS.

Choose between fixed effects and random effects

Use the Hausman test:
1. Run the fixed effects estimation and store the results:
xtreg ..., fe
estimates store fe
2. Run the random effects estimation and store the results:
xtreg ..., re
estimates store re
3. Compare fixed effects and random effects using the Hausman test:
hausman fe re
If p-value < 5%, fixed effects is more suitable than random effects.
Otherwise, random effects is more suitable.

Choose between random effects and pooled OLS

The Breusch & Pagan Lagrangian multiplier test is often used to choose
between pooled OLS and random effects for panel data: in the command
window, type xttest0 after running xtreg ..., re.
If p-value < 5%, random effects is more suitable. Otherwise, pooled
OLS is more suitable.
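
The three pairwise decision rules can be combined into a single procedure. The sketch below shows one possible way to do so, assuming a 5% significance level throughout; the function name, the argument names, and the way the tests are combined are assumptions for illustration and are not prescribed by the slides.

```python
def choose_model(p_f, p_bp, p_hausman, alpha=0.05):
    """Combine the F-test, Breusch-Pagan LM test and Hausman test p-values."""
    fe_beats_pooled = p_f < alpha    # F-test: pooled OLS vs fixed effects
    re_beats_pooled = p_bp < alpha   # Breusch-Pagan: pooled OLS vs random effects
    if fe_beats_pooled and re_beats_pooled:
        # Both panel models beat pooled OLS: let the Hausman test decide.
        return "FE" if p_hausman < alpha else "RE"
    if fe_beats_pooled:
        return "FE"
    if re_beats_pooled:
        return "RE"
    return "Pooled OLS"


# Hypothetical p-values, for illustration only.
print(choose_model(p_f=0.0000, p_bp=0.0000, p_hausman=0.0000))  # FE
print(choose_model(p_f=0.40, p_bp=0.30, p_hausman=0.50))        # Pooled OLS
```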

Example
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe
Because Prob > F = 0.0000, the fixed effects model is more suitable
than pooled OLS.
estimates store fe
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, re
estimates store re
We then compare fixed effects and random effects using the Hausman
test:
hausman fe re
Because p-value < 5%, fixed effects is more suitable than random
effects.
Conclusion: we choose the fixed effects model.
Reference: https://www.stata.com/manuals/xtxtreg.pdf
