Lecture 14 - Panel data models

Uploaded by

Gia Bảo
Copyright © All Rights Reserved

PANEL DATA MODELS
Nguyen Quang
[email protected]
COVERED IN THIS LECTURE
1 - Panel data
2 - Pooled OLS estimator
3 - Fixed effects model
4 - Random effects model
5 - FE vs RE: Hausman test
6 - Between group estimator
PANEL DATA

• Cross-sectional data: observations from MANY units at a SINGLE time point
$y_i$ with $i = 1, \dots, N$
• Time-series data: observations from a SINGLE unit over MULTIPLE time periods
$y_t$ with $t = 1, \dots, T$
• Panel data: observations from MANY units over SEVERAL time periods
$y_{it}$ with $i = 1, \dots, N$ and $t = 1, \dots, T$
PANEL DATA: ADVANTAGES AND DISADVANTAGES
Advantages
• More observations
• More variability
• Less collinearity between regressors
• Control of individual heterogeneity
• Reduced bias
Disadvantages
• Requires more effort to collect the data
• Selectivity bias (e.g., self-selection, nonresponse, attrition)
PANEL DATA MODEL REQUIRES WITHIN GROUP VARIATION
• The fixed effects (FE) panel data model requires variation within each group.
• An example where the FE model does not work:
$y_{it} = \alpha + \beta x_{it} + u_{it}$
• $y_{it}$ is the export volume from VN to country $i$ in year $t$
• $x_{it}$ is the distance from VN to country $i$ in year $t$
• Because the distance from VN to country $i$ does not change from year to year, it is absorbed by the country fixed effect, so its coefficient cannot be estimated in the fixed effects model.
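A minimal sketch of why a time-invariant regressor drops out: the within (time-demeaning) transformation subtracts each unit's average over time, so a variable that is constant within a unit becomes identically zero. The helper `within_transform`, the countries `A` and `B`, and the distances are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def within_transform(rows):
    """Subtract each unit's time average from its values.

    rows: list of (unit, value) pairs, one per unit-period observation.
    Returns the demeaned values in the same order as rows.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for unit, value in rows:
        sums[unit] += value
        counts[unit] += 1
    return [value - sums[unit] / counts[unit] for unit, value in rows]


# Hypothetical distances (km) for two countries, constant over three years
distance = [("A", 1200.0), ("A", 1200.0), ("A", 1200.0),
            ("B", 3500.0), ("B", 3500.0), ("B", 3500.0)]
print(within_transform(distance))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

With every demeaned value equal to zero, there is no within-group variation left from which to estimate a slope for distance.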
EXAMPLE DATA
Viet Nam provincial data on:
• rgdp: provincial GDP (mil. VND)
• labfo: number of laborers in the province (1000 persons)
• rinvest: provincial gross investment (mil. VND)
• pci: Provincial Competitiveness Index, a 100-point composite index measuring and ranking Vietnam's provinces by the quality of their economic governance
• Data for 58 provinces over 5 years (2007-2011)
province  provincecode  year  rgdp      labfo  rinvest   pci
BRVT      11            2010  1.30E+08  531.1  2.60E+07  60.5507
BRVT      11            2009  1.30E+08  513    2.00E+07  64.2287
BRVT      11            2008  1.70E+08  519    1.80E+07  60.5126
BRVT      11            2011  1.40E+08  553.9  2.30E+07  66.13
BRVT      11            2007  1.20E+08  497.6  1.30E+07  65.6337
Ca Mau    12            2009  1.90E+07  675.6  8.20E+06  61.0756
Ca Mau    12            2010  2.20E+07  677.1  9.70E+06  53.5729
Ca Mau    12            2007  1.40E+07  625.5  1.20E+07  56.194
Ca Mau    12            2008  1.50E+07  654.1  8.70E+06  58.6385
Ca Mau    12            2011  2.40E+07  684.3  1.20E+07  59.43
Can Tho   13            2010  7.20E+07  680.7  1.60E+07  62.4605
Can Tho   13            2008  5.50E+07  684.4  1.00E+07  56.32
Can Tho   13            2011  9.00E+07  690.7  1.60E+07  62.66
Can Tho   13            2007  4.40E+07  680.6  9.70E+06  61.762
Can Tho   13            2009  6.00E+07  656    1.50E+07  52.3378
SUMMARY STATISTICS
MODEL SPECIFICATION
• In this lecture we consider the specification
$y_{it} = \alpha + \beta X_{it} + u_{it}$
• $y_{it}$ is the logarithm of real GDP of province $i$ in year $t$
• $X_{it}$ includes
  • Logarithm of the labor force
  • Logarithm of real investment
  • Provincial Competitiveness Index (PCI)
POOLED OLS ESTIMATOR
• Data from all groups are pooled together
• No differences between groups are modeled
$y_{it} = \alpha + \beta X_{it} + u_{it}$
• Coefficients are identical for all groups.
• Some assumptions:
  • The error term is homoscedastic and not autocorrelated
  • $X$ is nonstochastic and uncorrelated with $u$ ($X$ is strictly exogenous)
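A minimal pure-Python sketch of pooled OLS with a single regressor. It is not the R/plm code used in the lecture; the function name `pooled_ols` and the toy panel (generated exactly from $y = 1 + 2x$) are hypothetical. The point it shows is that pooling simply ignores the unit and time labels.

```python
def pooled_ols(rows):
    """Pooled OLS of y on a constant and a single regressor x.

    rows: list of (unit, t, x, y). The unit and t labels are ignored,
    which is exactly what pooling means.
    Returns (alpha_hat, beta_hat).
    """
    n = len(rows)
    x_bar = sum(x for _, _, x, _ in rows) / n
    y_bar = sum(y for _, _, _, y in rows) / n
    sxy = sum((x - x_bar) * (y - y_bar) for _, _, x, y in rows)
    sxx = sum((x - x_bar) ** 2 for _, _, x, _ in rows)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical toy panel: 2 units x 2 periods, generated from y = 1 + 2x
rows = [(1, 1, 0.0, 1.0), (1, 2, 1.0, 3.0),
        (2, 1, 2.0, 5.0), (2, 2, 3.0, 7.0)]
print(pooled_ols(rows))  # (1.0, 2.0)
```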
THE POOLED OLS IN R
POOLED OLS WITH ROBUST STANDARD ERRORS
CLUSTERED STANDARD ERRORS
• The Pooled OLS estimator (like the other panel data estimators in this lecture) assumes no correlation between the errors of the same group (no autocorrelation).
• If we relax this assumption, then
$\operatorname{cov}(u_{it}, u_{is}) \neq 0$ for $t \neq s$
• The errors may then be both heteroskedastic and autocorrelated within groups.
• In this case the Pooled OLS estimator is still consistent (provided $X$ is exogenous), but the usual standard errors are incorrect.
• We may then use cluster-robust standard errors, clustered by group.
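A minimal sketch of cluster-robust standard errors for pooled OLS with one regressor, using the sandwich form $V = (X'X)^{-1} \left(\sum_g s_g s_g'\right) (X'X)^{-1}$, where $s_g$ sums $[1, x_{it}]\,\hat{u}_{it}$ over the observations of cluster $g$. The function name and data are hypothetical; no finite-sample correction is applied, whereas software packages may apply one, so the numbers need not match R output exactly.

```python
import math
from collections import defaultdict


def pooled_ols_cluster_se(rows):
    """Pooled OLS of y on [1, x] with standard errors clustered by unit.

    rows: list of (unit, x, y).
    Sandwich: V = (X'X)^{-1} (sum_g s_g s_g') (X'X)^{-1}, where
    s_g = sum over t of [1, x_it] * u_it is the score of cluster g.
    No finite-sample correction is applied.
    Returns (alpha, beta, se_alpha, se_beta).
    """
    n = len(rows)
    sx = sum(x for _, x, _ in rows)
    sxx = sum(x * x for _, x, _ in rows)
    sy = sum(y for _, _, y in rows)
    sxy = sum(x * y for _, x, y in rows)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    alpha = inv[0][0] * sy + inv[0][1] * sxy
    beta = inv[1][0] * sy + inv[1][1] * sxy

    # Sum residuals (and x times residuals) within each cluster
    scores = defaultdict(lambda: [0.0, 0.0])
    for unit, x, y in rows:
        u = y - alpha - beta * x
        scores[unit][0] += u
        scores[unit][1] += x * u
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(2)]
            for a in range(2)]

    def matmul(p, q):
        return [[sum(p[a][k] * q[k][b] for k in range(2)) for b in range(2)]
                for a in range(2)]

    v = matmul(matmul(inv, meat), inv)
    return alpha, beta, math.sqrt(v[0][0]), math.sqrt(v[1][1])


# Hypothetical panel: 3 units observed for 2 periods each, as (unit, x, y)
rows = [("A", 0.0, 1.0), ("A", 1.0, 2.0), ("B", 2.0, 6.0),
        ("B", 3.0, 6.0), ("C", 4.0, 8.0), ("C", 5.0, 11.0)]
alpha, beta, se_alpha, se_beta = pooled_ols_cluster_se(rows)
print(alpha, beta, se_alpha, se_beta)
```

The point estimates are the ordinary pooled OLS ones; only the standard errors change. With very few clusters, as in this toy example, cluster-robust standard errors are unreliable; in practice the number of groups should be large.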
POOLED OLS WITH CLUSTERED STANDARD ERRORS
POOLED OLS USING PACKAGE PLM
FIXED EFFECTS MODEL
Within group estimator
THE FIXED EFFECTS MODEL
• The model
$y_{it} = \alpha_i + \beta X_{it} + u_{it}$
• The slopes are still identical for all groups.
• But each group has its own intercept.
• These intercepts are called fixed effects; they capture individual heterogeneity.
• Two estimators:
  • Fixed effects estimator (within group)
  • Least squares dummy variable (LSDV) estimator
• Note: these are two ways of estimating the same FE model, not two different models.
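WITHIN GROUP FIXED EFFECTS ESTIMATOR
• The model
$y_{it} = \alpha_i + \beta X_{it} + u_{it} \qquad (1)$
• We need to allow the intercept to vary across groups.
• Now average each variable over time; note that the parameters are time-invariant:
$\bar{y}_i = \alpha_i + \beta \bar{X}_i + \bar{u}_i \qquad (2)$
where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ and $\bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{it}$
• Then subtract (2) from (1):
$y_{it} - \bar{y}_i = (\alpha_i - \alpha_i) + \beta (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$
• which results in
$\tilde{y}_{it} = \beta \tilde{X}_{it} + \tilde{u}_{it}$
where $\tilde{y}_{it} = y_{it} - \bar{y}_i$, and similarly for $\tilde{X}_{it}$ and $\tilde{u}_{it}$.
• In this way we can estimate $\beta$, but the fixed effects are eliminated rather than estimated (if needed, they can be recovered afterwards as $\hat{\alpha}_i = \bar{y}_i - \hat{\beta} \bar{X}_i$).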
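The demeaning steps above can be sketched in a few lines of Python for a single regressor. This is a minimal illustration, not the plm implementation; the function name `within_estimator` and the toy panel (generated exactly from $y_{it} = \alpha_i + 0.5\,x_{it}$ with $\alpha_A = 1$, $\alpha_B = 4$) are hypothetical. It also recovers $\hat{\alpha}_i = \bar{y}_i - \hat{\beta}\bar{X}_i$.

```python
from collections import defaultdict


def within_estimator(rows):
    """Within group (fixed effects) estimator with a single regressor.

    rows: list of (unit, x, y). Demeans x and y by unit, then runs OLS of
    the demeaned y on the demeaned x without an intercept.
    Returns (beta_hat, alpha_hat), where alpha_hat maps each unit to its
    recovered fixed effect y_bar_i - beta_hat * x_bar_i.
    """
    sum_x, sum_y, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for unit, x, y in rows:
        sum_x[unit] += x
        sum_y[unit] += y
        count[unit] += 1
    x_bar = {g: sum_x[g] / count[g] for g in count}
    y_bar = {g: sum_y[g] / count[g] for g in count}
    num = sum((x - x_bar[g]) * (y - y_bar[g]) for g, x, y in rows)
    den = sum((x - x_bar[g]) ** 2 for g, x, _ in rows)
    beta = num / den
    alpha = {g: y_bar[g] - beta * x_bar[g] for g in count}
    return beta, alpha


# Hypothetical panel generated from y_it = alpha_i + 0.5 * x_it,
# with alpha_A = 1 and alpha_B = 4
rows = [("A", 0.0, 1.0), ("A", 2.0, 2.0), ("A", 4.0, 3.0),
        ("B", 1.0, 4.5), ("B", 3.0, 5.5), ("B", 5.0, 6.5)]
beta, alpha = within_estimator(rows)
print(beta, alpha)  # 0.5 {'A': 1.0, 'B': 4.0}
```

For comparison, pooled OLS on these same numbers gives a slope of about 0.76, because it attributes the difference between $\alpha_A$ and $\alpha_B$ partly to $x$.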
WITHIN GROUP FIXED EFFECTS ESTIMATOR
WITHIN GROUP FIXED EFFECTS ESTIMATOR
robust standard errors
WITHIN GROUP FIXED EFFECTS ESTIMATOR
clustered standard errors
LEAST SQUARES DUMMY VARIABLE ESTIMATOR
• For the model: $y_{it} = \alpha_i + \beta X_{it} + u_{it}$
• We can estimate the fixed effects and $\beta$ by introducing the dummy variables
$D_{ji} = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$
• We can then estimate the following model using OLS
$y_{it} = \sum_{j=1}^{N} \alpha_j D_{ji} + \beta X_{it} + u_{it}$
• This is the least squares dummy variable (LSDV) estimator.
• The LSDV slope estimates are identical to the within group FE estimates.
• However, LSDV also estimates the fixed effects.
• On the other hand, LSDV becomes cumbersome when $N$ is large, since it must estimate $N$ intercepts (and with $T$ fixed, the estimated $\alpha_i$ are not consistent as $N$ grows).
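A minimal sketch of LSDV: build one dummy column per unit plus the regressor, then solve the OLS normal equations $(X'X)b = X'y$. The helper names `solve` and `lsdv` are hypothetical, the small Gauss-Jordan solver stands in for a linear algebra library, and the toy panel is the same kind used for the within estimator ($y_{it} = \alpha_i + 0.5\,x_{it}$, $\alpha_A = 1$, $\alpha_B = 4$), so the slope should match the within group estimate.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # partial pivoting: move the largest remaining entry into place
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lsdv(rows):
    """LSDV: OLS of y on one dummy per unit plus x (no common intercept).

    rows: list of (unit, x, y). Returns (beta_hat, {unit: alpha_hat}).
    """
    units = sorted({g for g, _, _ in rows})
    k = len(units) + 1  # N dummy coefficients + 1 slope
    design = [[1.0 if g == u else 0.0 for u in units] + [x] for g, x, _ in rows]
    ys = [y for _, _, y in rows]
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    coef = solve(xtx, xty)
    return coef[-1], dict(zip(units, coef[:-1]))


# Hypothetical panel generated from y_it = alpha_i + 0.5 * x_it
rows = [("A", 0.0, 1.0), ("A", 2.0, 2.0), ("A", 4.0, 3.0),
        ("B", 1.0, 4.5), ("B", 3.0, 5.5), ("B", 5.0, 6.5)]
beta, alpha = lsdv(rows)
print(round(beta, 10), {g: round(a, 10) for g, a in alpha.items()})
```

The design matrix has $N + 1$ columns, which makes the cost of LSDV with many units concrete: the system to solve grows with $N$, while the within estimator only needs group means.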
LSDV FIXED EFFECTS ESTIMATOR
(some factors omitted)
LSDV WITH ROBUST STANDARD ERRORS
LSDV WITH CLUSTERED STANDARD ERRORS
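LSDV TWO-WAY FIXED EFFECTS MODEL
• The model now also includes time fixed effects
$y_{it} = \sum_{j=1}^{N} \alpha_j D_{ji} + \sum_{g=1}^{T} \gamma_g D_{gt} + \beta X_{it} + u_{it}$
where
$D_{gt} = \begin{cases} 1 & \text{if } g = t \\ 0 & \text{otherwise} \end{cases}$
• Because the unit dummies and the time dummies each sum to one for every observation, one time dummy must be dropped to avoid perfect collinearity.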
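As a swapped-in alternative to estimating all the dummies, a balanced panel allows a two-way within transformation, $\tilde{z}_{it} = z_{it} - \bar{z}_i - \bar{z}_t + \bar{z}$, which removes both $\alpha_i$ and $\gamma_t$ and yields the same slope as two-way LSDV. The sketch below assumes a balanced panel and a single regressor; the function name and the toy data (generated from $y_{it} = \alpha_i + \gamma_t + 2x_{it}$) are hypothetical.

```python
from collections import defaultdict


def two_way_within(rows):
    """Two-way fixed effects slope for a balanced panel with one regressor.

    rows: list of (unit, t, x, y); every unit must be observed in every period.
    Each variable z is transformed to z_it - z_bar_i - z_bar_t + z_bar,
    which removes both the unit effects and the time effects; the slope
    is then OLS without an intercept on the transformed data.
    """
    def demean(idx):
        by_unit, by_time = defaultdict(list), defaultdict(list)
        for r in rows:
            by_unit[r[0]].append(r[idx])
            by_time[r[1]].append(r[idx])
        grand = sum(r[idx] for r in rows) / len(rows)
        unit_mean = {g: sum(v) / len(v) for g, v in by_unit.items()}
        time_mean = {t: sum(v) / len(v) for t, v in by_time.items()}
        return [r[idx] - unit_mean[r[0]] - time_mean[r[1]] + grand for r in rows]

    xs, ys = demean(2), demean(3)
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


# Hypothetical balanced panel from y_it = alpha_i + gamma_t + 2 * x_it with
# alpha_A = 0, alpha_B = 10 and gamma_1 = 0, gamma_2 = 1, gamma_3 = 3
rows = [("A", 1, 1.0, 2.0), ("A", 2, 2.0, 5.0), ("A", 3, 4.0, 11.0),
        ("B", 1, 0.0, 10.0), ("B", 2, 3.0, 17.0), ("B", 3, 3.0, 19.0)]
print(two_way_within(rows))  # approximately 2.0
```

This shortcut relies on the panel being balanced; with unbalanced data the simple double demeaning no longer removes the time effects exactly.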
LSDV TWO-WAY FIXED EFFECTS MODEL
(some factors omitted)
RANDOM EFFECTS MODEL
• The random effects model is given by
$y_{it} = \alpha + \beta X_{it} + u_{it}$
• The error term now has two components:
$u_{it} = \mu_i + \epsilon_{it}$
  • $\mu_i \sim N(0, \sigma_\mu^2)$: the individual-specific random component
  • $\epsilon_{it} \sim N(0, \sigma_\epsilon^2)$: the idiosyncratic disturbance
• $\mu_i$ is assumed to be uncorrelated with the regressors.
• In the random effects model, regressors can be time-invariant.
• Estimation method: generalized least squares (GLS)
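For a balanced panel, GLS in this model can be carried out by quasi-demeaning: subtract a fraction $\theta = 1 - \sqrt{\sigma_\epsilon^2 / (\sigma_\epsilon^2 + T\sigma_\mu^2)}$ of each unit's mean and run OLS. The sketch below assumes the variance components are known (feasible GLS would estimate them first), a single regressor, and hypothetical data lying exactly on $y = 3 + 0.5x$; the function name is also hypothetical.

```python
import math
from collections import defaultdict


def random_effects_gls(rows, sigma2_mu, sigma2_eps):
    """Random effects GLS via quasi-demeaning, for a balanced panel.

    rows: list of (unit, x, y), each unit observed the same number T of times.
    sigma2_mu, sigma2_eps: variance components, ASSUMED known here
    (feasible GLS would estimate them first).
    Returns (alpha_hat, beta_hat, theta).
    """
    sum_x, sum_y, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for g, x, y in rows:
        sum_x[g] += x
        sum_y[g] += y
        count[g] += 1
    T = next(iter(count.values()))
    theta = 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_mu))
    # Subtract only a fraction theta of each unit's mean
    xs = [x - theta * sum_x[g] / count[g] for g, x, _ in rows]
    ys = [y - theta * sum_y[g] / count[g] for g, _, y in rows]
    # OLS of ys on a constant and xs; the constant estimates (1 - theta) * alpha
    n = len(rows)
    xm, ym = sum(xs) / n, sum(ys) / n
    beta = (sum((a - xm) * (b - ym) for a, b in zip(xs, ys))
            / sum((a - xm) ** 2 for a in xs))
    alpha = (ym - beta * xm) / (1.0 - theta)
    return alpha, beta, theta


# Hypothetical balanced panel (T = 2) lying exactly on y = 3 + 0.5 * x
rows = [("A", 0.0, 3.0), ("A", 2.0, 4.0), ("B", 1.0, 3.5), ("B", 5.0, 5.5)]
print(random_effects_gls(rows, sigma2_mu=1.0, sigma2_eps=1.0))
```

The weight $\theta$ shows how RE sits between the other estimators: $\theta = 0$ (no individual-effect variance) gives pooled OLS, and $\theta$ approaching 1 gives the within estimator.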
RANDOM EFFECTS MODEL
RANDOM EFFECTS MODEL
clustered standard errors
RANDOM VS. FIXED EFFECTS
• The main difference is that the individual effects are treated as fixed parameters in FE and as random variables in RE.
• The random effects model is preferred when the individual effects are uncorrelated with the regressors, because:
  • It is more efficient (more degrees of freedom, since it does not estimate $N$ intercepts)
  • It allows time-invariant regressors
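HAUSMAN TEST
• Null hypothesis: both the RE and FE estimates are consistent (and RE is efficient)
• Alternative hypothesis: the RE estimates are inconsistent, while the FE estimates remain consistent
• Test statistic
$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[ V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE}) \right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$
which asymptotically follows a $\chi^2$ distribution with degrees of freedom equal to the number of coefficients compared (the regressors common to both models).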
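With a single coefficient the Hausman statistic reduces to $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^2 / [V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE})]$, and the $\chi^2_1$ p-value can be computed as $\operatorname{erfc}(\sqrt{H/2})$. The sketch below covers only this one-regressor case; the function name and the input numbers are hypothetical, not results from the lecture's data.

```python
import math


def hausman_one_coef(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single slope coefficient.

    H = (b_fe - b_re)^2 / (var_fe - var_re), asymptotically chi-squared
    with 1 degree of freedom under the null. Requires var_fe > var_re.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe - var_re must be positive for this formula")
    h = (b_fe - b_re) ** 2 / diff_var
    # For 1 df: P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates and variances, for illustration only
h, p = hausman_one_coef(b_fe=0.5, b_re=0.3, var_fe=0.01, var_re=0.006)
print(round(h, 6), p)  # H is 10 up to rounding; p is below 0.01
```

With $K$ regressors the same idea uses the coefficient vectors and the inverse of the variance difference matrix, with $K$ degrees of freedom; a negative or non-positive-definite variance difference is a known practical problem of the test.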
HAUSMAN TEST IN R
NOTES ON HAUSMAN TEST
• The test can only be performed when both models have the same set of regressors.
• If we include a time-invariant regressor in the RE model (which is not possible in the FE model), the Hausman test cannot be applied directly.
• The Hausman test checks whether the two sets of estimates differ systematically.
• If we reject the null hypothesis, the RE model is mis-specified (its estimates are inconsistent) and the consistent FE estimates are preferred.
• If any regressor is correlated with the idiosyncratic error $\epsilon_{it}$, both estimates are biased.
BETWEEN ESTIMATOR
• The between estimator uses only the cross-sectional variation.
• Suppose we have a data set with $N$ units and $T$ time periods.
• Average all variables over time:
$\bar{y}_i = \alpha + \beta \bar{X}_i + \bar{u}_i$
• where
  • $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$
  • $\bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{it}$
• We then have a data set of $N$ observations, to which we apply OLS.
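A minimal sketch of the between estimator for one regressor: collapse the panel to $N$ unit means, then run OLS with a common intercept. The function name and toy data are hypothetical; the data are chosen so that the unit means lie on $\bar{y}_i = 1 + 2\bar{x}_i$ while the within-unit relationship is negative.

```python
from collections import defaultdict


def between_estimator(rows):
    """Between estimator: OLS of unit means y_bar_i on x_bar_i (N observations).

    rows: list of (unit, x, y). Returns (alpha_hat, beta_hat).
    """
    sum_x, sum_y, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for g, x, y in rows:
        sum_x[g] += x
        sum_y[g] += y
        count[g] += 1
    # One observation per unit: the time averages
    xb = [sum_x[g] / count[g] for g in count]
    yb = [sum_y[g] / count[g] for g in count]
    n = len(xb)
    xm, ym = sum(xb) / n, sum(yb) / n
    beta = (sum((a - xm) * (b - ym) for a, b in zip(xb, yb))
            / sum((a - xm) ** 2 for a in xb))
    return ym - beta * xm, beta


# Hypothetical panel: 3 units x 2 periods, as (unit, x, y)
rows = [("A", 0.0, 4.0), ("A", 2.0, 2.0),
        ("B", 1.0, 8.0), ("B", 5.0, 6.0),
        ("C", 4.0, 12.0), ("C", 6.0, 10.0)]
print(between_estimator(rows))  # (1.0, 2.0)
```

On these same numbers the within estimator gives a negative slope (about -0.67), which illustrates that the between and within estimators exploit different sources of variation and can disagree.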
BETWEEN ESTIMATOR
