Chapter 2: Panel Data Regression
Chapter 3
Endogeneity Problem
𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢
❖ Endogeneity arises when an independent variable is correlated with the
error term:
𝑐𝑜𝑣(𝑋, 𝑢) ≠ 0
❖ If the model contains endogenous variables, the coefficients estimated by
OLS are biased and inconsistent: 𝐸(𝛽̂1) ≠ 𝛽1
❖ Endogeneity is a frequent problem in economics and econometrics.
Introduction
Sources of endogeneity
❖ Omitted variables: Relevant independent variables that are not observed
end up in the error term. If they are correlated with the independent
variables used in the model, the error term is correlated with those variables
❖ Measurement error: An independent variable measured with error is
correlated with the error term, because the error term absorbs the
measurement error
❖ Simultaneity: The independent variable and the dependent variable are
determined jointly, so each affects the other at the same time
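The omitted-variable case can be illustrated with a small simulation. This is a minimal sketch in Python, not part of the Stata workflow of this chapter; the data-generating process (true slope 2, an omitted variable z with coefficient 3, and x = 0.8z + noise) is an assumption chosen for illustration. Under these assumptions the OLS slope converges to 2 + 3 · 0.8 / 1.64 ≈ 3.46 rather than to the true value 2.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_omitted_variable(n=20000, seed=1):
    """Regress y on x alone although y also depends on z (hypothetical DGP)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]            # omitted variable
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]       # x is correlated with z
    y = [1 + 2 * xi + 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return ols_slope(x, y)                             # z is left in the error term


slope = simulate_omitted_variable()
print(round(slope, 2))  # well above the true slope of 2
```

Because z sits in the error term and is correlated with x, increasing the sample size does not remove the gap: the estimator is inconsistent, not merely noisy.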
❖ Note:
The variables in a panel dataset fall into the following groups:
▪ Group 1: Variables that vary both across individuals and over time, such
as the output of an enterprise, personal consumption, etc.
▪ Group 2: Variables that vary across individuals but not over time, such
as the gender of the household head, religion, etc.
▪ Group 3: Variables that vary over time but not across individuals, such
as the exchange rate, the basic interest rate, the general macroeconomic
environment
Panel data therefore carry information along more dimensions than pure
cross-sectional or pure time-series data and are very useful in applied
research.
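The three groups can be told apart mechanically: check whether a variable changes over time within an individual, and whether it differs between individuals within a period. The sketch below does this in Python for data in long form; the function name and the (unit, period, value) layout are assumptions made for illustration.

```python
def classify_variable(rows):
    """Classify one variable from (unit_id, period, value) tuples."""
    values_by_unit = {}
    values_by_period = {}
    for unit_id, period, value in rows:
        values_by_unit.setdefault(unit_id, set()).add(value)
        values_by_period.setdefault(period, set()).add(value)
    # More than one distinct value inside a unit means it changes over time.
    varies_over_time = any(len(v) > 1 for v in values_by_unit.values())
    # More than one distinct value inside a period means it differs across units.
    varies_across_units = any(len(v) > 1 for v in values_by_period.values())
    if varies_over_time and varies_across_units:
        return "Group 1"
    if varies_across_units:
        return "Group 2"
    if varies_over_time:
        return "Group 3"
    return "constant"


# Hypothetical toy data: two units observed in 2020 and 2021.
output = [(1, 2020, 10), (1, 2021, 12), (2, 2020, 7), (2, 2021, 9)]
gender = [(1, 2020, "F"), (1, 2021, "F"), (2, 2020, "M"), (2, 2021, "M")]
exchange_rate = [(1, 2020, 23.1), (1, 2021, 23.5), (2, 2020, 23.1), (2, 2021, 23.5)]
print(classify_variable(output), classify_variable(gender), classify_variable(exchange_rate))
```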
Panel Data
❖ A balanced panel is a dataset in which every individual is observed in
every period
❖ An unbalanced panel is a dataset in which some individuals are missing in
some periods
❖ Sources of unbalance:
▪ Self-selection (enterprise bankruptcy, merger of provinces, death of an
individual, etc.)
▪ Random factors (data entry errors, data that could not be collected in a
given period)
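Whether a panel is balanced can be checked by counting: a balanced panel has exactly (number of individuals) × (number of periods) observations. The following Python sketch assumes the data are given as (unit, period) pairs; the function names are hypothetical.

```python
def is_balanced(observations):
    """True if every unit appears in every period seen in the data."""
    pairs = set(observations)
    units = {u for u, _ in pairs}
    periods = {t for _, t in pairs}
    return len(pairs) == len(units) * len(periods)


def missing_cells(observations):
    """List the (unit, period) combinations that are absent."""
    pairs = set(observations)
    units = sorted({u for u, _ in pairs})
    periods = sorted({t for _, t in pairs})
    return [(u, t) for u in units for t in periods if (u, t) not in pairs]


# Hypothetical example: unit 2 is not observed in 2021.
obs = [(1, 2020), (1, 2021), (2, 2020)]
print(is_balanced(obs), missing_cells(obs))
```

The list of missing cells is a starting point for asking whether the gaps look random or come from self-selection.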
❖ Practice in Stata
▪ Create a panel data file from annual data files
▪ Use the commands:
➢ merge
➢ reshape long, i(id) j(time)
➢ xtset id time
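To make clear what the reshape step does to the data, here is a minimal Python sketch of the wide-to-long transformation. It is only an illustration of the idea, not a replacement for the Stata command; the column names (id, y2019, y2020) are assumptions.

```python
def reshape_long(wide_rows, stub, id_key="id"):
    """One row per unit (stub2019, stub2020, ...) -> one row per unit and period."""
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            suffix = key[len(stub):]
            # Keep only columns named <stub><digits>, e.g. y2019.
            if key.startswith(stub) and suffix.isdigit():
                long_rows.append({id_key: row[id_key], "time": int(suffix), stub: value})
    long_rows.sort(key=lambda r: (r[id_key], r["time"]))
    return long_rows


# Hypothetical wide file: one row per unit, one column per year.
wide = [
    {"id": 1, "y2019": 5.0, "y2020": 6.0},
    {"id": 2, "y2019": 3.0, "y2020": 4.0},
]
for r in reshape_long(wide, "y"):
    print(r)
```

After this step each (id, time) pair identifies one row, which is the structure that xtset id time expects.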
Regression models with Panel data
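❖ Hypothesis testing:
\[
\begin{cases}
H_0: \operatorname{var}(c_i) = 0 \\
H_1: \operatorname{var}(c_i) \neq 0
\end{cases}
\]
❖ Test statistic
\[
\lambda_{LM} = \frac{(nT)^2}{2\,(nT^2 - nT)}
\left[ 1 - \frac{\sum_{i=1}^{n} \left( \sum_{t=1}^{T} v_{it} \right)^2}{\sum_{i=1}^{n} \sum_{t=1}^{T} v_{it}^2} \right]^2
\]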
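Here 𝑣𝑖𝑡 are the residuals of the pooled OLS regression. If 𝐻0 is true, 𝜆𝐿𝑀
follows a chi-squared distribution with one degree of freedom
Stata command: xttest0 (Breusch-Pagan LM test, run after xtreg with the re option)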
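The LM statistic can be computed by hand from pooled OLS residuals. The Python sketch below uses the standard textbook form nT / (2(T − 1)) · [Σᵢ(Σₜ vᵢₜ)² / ΣᵢΣₜ vᵢₜ² − 1]² for a balanced panel; the function name and the toy residuals are assumptions made for illustration. The p-value uses the fact that for one degree of freedom the chi-squared upper tail equals erfc(√(x/2)).

```python
import math


def breusch_pagan_lm(residuals):
    """LM statistic and p-value from pooled OLS residuals.

    residuals: list of n lists, each holding the T residuals of one individual.
    """
    n = len(residuals)
    T = len(residuals[0])
    if T < 2 or any(len(r) != T for r in residuals):
        raise ValueError("a balanced panel with T >= 2 is required")
    sum_sq_of_sums = sum(sum(r) ** 2 for r in residuals)   # sum_i (sum_t v_it)^2
    sum_of_sq = sum(v ** 2 for r in residuals for v in r)  # sum_i sum_t v_it^2
    lm = (n * T) / (2 * (T - 1)) * (sum_sq_of_sums / sum_of_sq - 1) ** 2
    p_value = math.erfc(math.sqrt(lm / 2))                 # chi-squared(1) upper tail
    return lm, p_value


# Hypothetical residuals for n = 3 individuals and T = 2 periods.
lm, p = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0], [0.5, -0.5]])
print(round(lm, 4), round(p, 4))
```

Residuals that keep the same sign within an individual push the ratio above 1 and the statistic up, which is the pattern an individual effect would produce.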
Estimation model selection Tests
Hausman Test
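❖ Hypothesis testing (is the individual effect correlated with the regressors?):
\[
\begin{cases}
H_0: \operatorname{cov}(c_i, X_{it}) = 0 \\
H_1: \operatorname{cov}(c_i, X_{it}) \neq 0
\end{cases}
\]
❖ Test statistic
\[
\chi^2_{obs} = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \, (V_{FE} - V_{RE})^{-1} \, (\hat{\beta}_{FE} - \hat{\beta}_{RE})
\]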
If 𝐻0 is true, the Hausman statistic follows a chi-squared distribution with k
degrees of freedom, where k is the number of coefficients being compared
Stata command: hausman fe re
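For a single coefficient the Hausman statistic reduces to a scalar: the squared difference between the FE and RE estimates divided by the difference of their variances. The Python sketch below covers only this one-coefficient case (k = 1); the function name and the numbers are hypothetical.

```python
import math


def hausman_one_coefficient(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value when a single coefficient is compared."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # Can happen in finite samples; the statistic is then not defined.
        raise ValueError("var_fe - var_re must be positive")
    stat = (b_fe - b_re) ** 2 / diff_var
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared(1) upper tail
    return stat, p_value


# Hypothetical estimates: FE = 1.5 (variance 0.04), RE = 1.2 (variance 0.03).
stat, p = hausman_one_coefficient(1.5, 1.2, 0.04, 0.03)
print(round(stat, 2), round(p, 4))
```

With several coefficients the same idea applies, but the difference of the variance matrices has to be inverted, which is what the hausman command does.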
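❖ Model selection flow
▪ RE vs. POLS (xttest0)
➢ P-value large (𝑐𝑖 = 0): POLS
➢ P-value small (𝑐𝑖 ≠ 0): FE or RE (Hausman)
▪ FE vs. RE (Hausman)
➢ P-value large: RE
➢ P-value small: FE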
Practice in Stata
❖ Step 1: Model selection between POLS and RE
xtreg Y X1…Xk, re
xttest0 => If the P-value is large, POLS is the preferred model
If the P-value is small, go to Step 2
❖ Step 2: Model selection between FE and RE
xtreg Y X1…Xk, fe
est store fe
xtreg Y X1…Xk, re
est store re
hausman fe re => If the P-value is small, FE is the preferred model
If the P-value is large, RE is the preferred model
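The two steps amount to a simple decision rule on the two P-values. The Python sketch below encodes it; the function name and the 5% significance level used as the cut-off between "small" and "large" are assumptions, since the slides do not fix a level.

```python
def choose_model(p_lm, p_hausman=None, alpha=0.05):
    """Return 'POLS', 'RE' or 'FE' from the xttest0 and Hausman P-values."""
    # Step 1: LM test. A large P-value means no evidence of individual effects.
    if p_lm >= alpha:
        return "POLS"
    # Step 2: Hausman test, only needed once POLS has been rejected.
    if p_hausman is None:
        raise ValueError("Hausman P-value required when the LM test rejects POLS")
    return "FE" if p_hausman < alpha else "RE"


print(choose_model(0.40))          # LM test does not reject
print(choose_model(0.001, 0.30))   # individual effects, Hausman does not reject
print(choose_model(0.001, 0.002))  # individual effects, Hausman rejects
```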
Some defects of the panel model
❖ In the FE model
▪ Autocorrelation
xtserial Y X If the P-value is small, the model has autocorrelation
=> xtregar Y X, fe
▪ Contemporaneous (cross-sectional) correlation
xttest2
If the P-value is small, the model has contemporaneous correlation
=> xtscc Y X, fe
▪ Heteroskedasticity
xttest3 If the P-value is small, the model has heteroskedasticity
=> xtreg Y X, fe robust
Note: xtserial, xttest2, xttest3 and xtscc are community-contributed commands
and must be installed before first use
❖ In the RE model
▪ Autocorrelation
xttest1 If the P-value is small, the model has autocorrelation
=> xtregar Y X, re
▪ Heteroskedasticity
xtreg Y X, re
predict res1, ue
robvar res1, by(id)
If the P-value is small, the model has heteroskedasticity
=> xtreg Y X, re robust
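The robvar step compares the spread of the residuals across individuals. Its first statistic, W0, is Levene's test centered at the group mean. The Python sketch below computes W0 only (not the median-based or trimmed variants, and no p-value); the function name and the toy residuals are assumptions made for illustration.

```python
def levene_w0(groups):
    """Levene's W0 statistic for equality of variances across groups.

    groups: dict mapping a panel id to the list of its residuals.
    """
    k = len(groups)
    # Absolute deviations of each residual from its own group mean.
    abs_dev = []
    for values in groups.values():
        mean = sum(values) / len(values)
        abs_dev.append([abs(v - mean) for v in values])
    n_total = sum(len(z) for z in abs_dev)
    grand_mean = sum(sum(z) for z in abs_dev) / n_total
    between = sum(len(z) * (sum(z) / len(z) - grand_mean) ** 2 for z in abs_dev)
    within = sum((v - sum(z) / len(z)) ** 2 for z in abs_dev for v in z)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical residuals for two individuals; the second is more dispersed.
w0 = levene_w0({1: [1.0, -1.0, 2.0, -2.0], 2: [1.0, -1.0, 5.0, -5.0]})
print(round(w0, 4))
```

Under equal variances W0 is compared with an F distribution with (k − 1, N − k) degrees of freedom, where k is the number of individuals and N the total number of residuals.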
Practice