Chapter 2_Panel Data Regression

The document discusses panel data regression, focusing on the endogeneity problem, its sources, and solutions, including the use of panel data models. It explains the structure and advantages of panel data, as well as different regression models such as pooled, random effects, and fixed effects estimation methods. Additionally, it covers estimation model selection tests like the Breusch-Pagan and Hausman tests, along with practical commands for analysis in Stata.

Uploaded by

Linh Phạm

National Economics University

Chapter 3

Panel Data Regression

Dr. Phung Minh Duc


Contents
1. Introduction
2. Panel data
3. Regression models with Panel data
4. Estimation model selection Tests
5. Some defects of the panel model
6. Commands on Stata
7. Practice
Introduction

Endogeneity Problem
𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢
❖ Endogeneity arises when an independent variable is correlated with the
error term:
𝑐𝑜𝑣(𝑋, 𝑢) ≠ 0
❖ If the model contains endogenous variables, the coefficients estimated by
the OLS method are biased and inconsistent: $E(\hat{\beta}_1) \neq \beta_1$
❖ Endogeneity is a frequent problem in economics and econometrics.
Introduction

Sources of endogeneity
❖ Omitted variables: Relevant independent variables that are not observed end
up in the error term. When they are correlated with the independent variables
used in the model, the error term is correlated with those variables
❖ Measurement error: Error in measuring an independent variable can cause
correlation between the mismeasured variable and the error term
❖ Simultaneity: The independent variable and the dependent variable are
determined jointly, so each one affects the other
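The omitted-variable case can be illustrated with a small simulation. This is only a sketch under assumed values: the true slope 2.0, the coefficient 3.0 on the unobserved factor, the sample size and the helper name `ols_slope` are all hypothetical choices made for the illustration.

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)                      # fixed seed so the run is reproducible
n, beta1 = 5000, 2.0                # assumed sample size and true slope
c = [random.gauss(0, 1) for _ in range(n)]    # unobserved factor
x = [ci + random.gauss(0, 1) for ci in c]     # X is correlated with c
y = [1.0 + beta1 * xi + 3.0 * ci + random.gauss(0, 1) for xi, ci in zip(x, c)]

# Regressing Y on X alone leaves c in the error term, so cov(X, u) != 0.
slope_omitting_c = ols_slope(x, y)
print(round(slope_omitting_c, 2))   # noticeably above the true slope 2.0
```

In this setup the slope converges to 2 + 3·cov(X, c)/var(X) = 3.5 rather than to 2, which is the bias described above.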
Introduction

Solutions for endogeneity


❖ Find a proxy variable and include it in the model
❖ Use the instrumental variable (IV) method
❖ Use econometric models for panel data
Panel Data

❖ Panel data is a set of data collected on the same set of individuals
(households, enterprises, provinces, etc.) over time, at equally spaced time
points.
❖ Panel data has two dimensions:
▪ The horizontal information across individuals at the same point in time
(characteristic of cross-sectional data)
▪ The vertical information on each individual over time (characteristic of
time series data).
Panel Data

❖ Panel data structure


Individual   Time   Depvar (Y)   Indepvar (X)
1            1      𝑦11          𝑥11
1            2      𝑦12          𝑥12
1            3      𝑦13          𝑥13
…            …      …            …
N            1      𝑦𝑁1          𝑥𝑁1
N            2      𝑦𝑁2          𝑥𝑁2
N            3      𝑦𝑁3          𝑥𝑁3
Panel Data

❖ Note:
The variables in a panel dataset fall into the following groups:
▪ Group 1: Variables that change in both directions, such as the output of
an enterprise, personal consumption, etc.
▪ Group 2: Variables that change horizontally (across individuals) but not
vertically (over time), such as the gender of the household head, religion, etc.
▪ Group 3: Variables that change vertically but not horizontally, such as
the exchange rate, the base interest rate, or the general macroeconomic environment
Panel data therefore carries information in more dimensions than other data
types, which makes it very useful in applied research.
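The three groups can be told apart mechanically by checking in which direction a variable varies. The sketch below is illustrative only: the function name `variation_group` and the sample values (gender codes, exchange rates) are hypothetical.

```python
def variation_group(rows):
    """rows: list of (individual_id, time, value). Returns 1, 2, 3 or None."""
    by_id, by_time = {}, {}
    for unit, t, value in rows:
        by_id.setdefault(unit, set()).add(value)     # values seen for each individual
        by_time.setdefault(t, set()).add(value)      # values seen in each period
    varies_over_time = any(len(vals) > 1 for vals in by_id.values())
    varies_across_units = any(len(vals) > 1 for vals in by_time.values())
    if varies_over_time and varies_across_units:
        return 1          # Group 1: changes in both directions
    if varies_across_units:
        return 2          # Group 2: changes across individuals only
    if varies_over_time:
        return 3          # Group 3: changes over time only
    return None           # constant everywhere

# Hypothetical data for two individuals observed in two periods.
gender = [(1, 1, "F"), (1, 2, "F"), (2, 1, "M"), (2, 2, "M")]
exchange_rate = [(1, 1, 23000), (1, 2, 23500), (2, 1, 23000), (2, 2, 23500)]
output = [(1, 1, 10.0), (1, 2, 12.0), (2, 1, 7.0), (2, 2, 6.5)]
print(variation_group(gender), variation_group(exchange_rate), variation_group(output))
```

Group 2 variables are the ones that drop out when only within-individual variation is used, which matters for the fixed effects estimator later in the chapter.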
Panel Data

❖ Balanced panel data is a data set in which every individual is observed in
every period
❖ Unbalanced panel data is a data set in which some individuals are missing
in some periods
❖ Sources of unbalance:
▪ Self-selection (enterprise bankruptcy, provincial mergers, death of an
individual, etc.)
▪ Random factors (data entry errors, data that could not be collected in a
certain period)
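Whether a panel is balanced can be checked directly from the list of (individual, time) pairs. A minimal sketch, assuming one pair per observation; the function name `is_balanced` and the sample years are hypothetical.

```python
from collections import defaultdict

def is_balanced(rows):
    """rows: iterable of (individual_id, time) pairs, one per observation."""
    periods_by_id = defaultdict(set)
    for unit, t in rows:
        periods_by_id[unit].add(t)
    # The panel is balanced when every individual covers the full set of periods.
    all_periods = set().union(*periods_by_id.values())
    return all(p == all_periods for p in periods_by_id.values())

balanced = [(1, 2019), (1, 2020), (2, 2019), (2, 2020)]
unbalanced = [(1, 2019), (1, 2020), (2, 2019)]      # individual 2 is missing in 2020
print(is_balanced(balanced), is_balanced(unbalanced))
```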
Panel Data

❖ The size of the dataset


Suppose the data set contains information on N individuals over T
observation periods. The following cases arise:
▪ N large and T small: The traditional panel data format
▪ N small and T large: Autocorrelation needs particular attention
▪ N small and T small: This data format is rarely used
▪ N large and T large: Of growing research interest (big data)
Panel Data

❖ Advantages of panel data


▪ Rich information: Horizontal (across individuals) and vertical (over time)
▪ Addresses endogeneity caused by omitted unobserved variables
(individual characteristic variables)
When only within-individual variation is used, the impact of unobserved factors
is removed, provided these individual characteristics do not change over time.
Panel Data

❖ Advantages of panel data


▪ Allows richer and more detailed analysis:
For example, in poverty reduction research, panel data shows not only
the number of poor households but also which households are chronically
poor, temporarily poor, or falling back into poverty.
Panel Data

❖ Advantages of panel data


▪ Reduces multicollinearity in distributed-lag models
▪ Increases degrees of freedom, which improves the precision of statistical
inference
▪ Suitable for datasets collected in developing countries
Panel Data

❖ Some typical panel datasets in Vietnam


▪ Vietnam Household Living Standard Survey (VHLSS)
▪ General Enterprise Survey (GES)
▪ Small and Medium Enterprise Survey (SMES)
▪ Provincial Competitiveness Index (PCI)
▪ …
Panel Data

❖ Practice on Stata
▪ Create a panel data file from annual data files
▪ Use the commands:
➢ merge
➢ reshape long, i(id) j(time)
➢ xtset id time
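The idea behind the wide-to-long step can be sketched outside Stata. The Python snippet below is an illustration only, not a replacement for the Stata commands: the function `wide_to_long`, the column names `y2019` and `y2020`, and the values are hypothetical.

```python
def wide_to_long(wide_rows, stub, periods):
    """Turn one row per individual (columns stub+period) into one row per (id, time)."""
    long_rows = []
    for row in wide_rows:
        for t in periods:
            long_rows.append({"id": row["id"], "time": t, stub: row[f"{stub}{t}"]})
    return long_rows

# Wide format: one row per individual, one column per year.
wide = [
    {"id": 1, "y2019": 10.0, "y2020": 11.5},
    {"id": 2, "y2019": 7.0, "y2020": 6.5},
]
long_rows = wide_to_long(wide, "y", [2019, 2020])
for r in long_rows:
    print(r)
```

The result has the Individual / Time / variable layout of the panel data structure shown earlier, which is the layout that `xtset id time` expects.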
Regression models with Panel data

❖ General Panel Regression Model


𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝑋1𝑖𝑡 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖𝑡 + 𝑐𝑖 + 𝑢𝑖𝑡 (1)
In which:
• 𝑖 is the individual index, 𝑡 is the time index;
• 𝑐𝑖 represents an unobserved factor (individual characteristic) that does
not change over time and has an impact on 𝑌.
Note: Since 𝑐𝑖 represents the difference between individuals in the set of
observations, and this difference does not depend on time, model (1) is
also called an individual-effects model.
Regression models with Panel data

Depending on the nature of 𝑐𝑖 , we have three models with different estimation
methods as follows:
▪ Pooled Estimation Model: 𝑐𝑖 is absent from (or omitted from) the model
▪ Random Effects Estimation: 𝑐𝑖 exists but is not correlated with
any independent variable 𝑋𝑘 in the model
▪ Fixed Effects Estimation: 𝑐𝑖 exists and is correlated with at
least one independent variable 𝑋𝑘 in the model
Regression models with Panel data

❖ Pooled Estimation Model (POLS)


𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝑋1𝑖𝑡 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖𝑡 + 𝑐𝑖 + 𝑢𝑖𝑡 (1)
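▪ If 𝑐𝑖 really does not exist, then OLS is the best estimator for (1) under the
following assumptions:
➢ POLS1: $E(u_{it} \mid X) = 0, \ \forall i, t$
➢ POLS2: The random error $u_{it}$ is not autocorrelated
➢ POLS3: $\mathrm{var}(u_{it} \mid X) = \sigma^2, \ \forall i, t$
▪ If 𝑐𝑖 exists (which is quite common), pooled OLS is biased whenever 𝑐𝑖 is
correlated with the independent variables; even when it is not, the standard
errors are unreliable because the error term becomes autocorrelated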
▪ Command on Stata: reg Y X1 … Xk
Regression models with Panel data

❖ Random Effects Estimation (RE)


𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝑋1𝑖𝑡 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖𝑡 + 𝑐𝑖 + 𝑢𝑖𝑡 (1)
▪ If 𝑐𝑖 exists but is not correlated with 𝑋𝑖𝑡 , then there is no
endogeneity problem. However, since 𝑐𝑖 is absorbed into the
random error, the new random error 𝑣𝑖𝑡 = 𝑐𝑖 + 𝑢𝑖𝑡 is
autocorrelated.
▪ The random effects estimation method focuses on solving the
autocorrelation problem of 𝑣𝑖𝑡
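The autocorrelation of the composite error can be derived in a few lines. The sketch assumes that the idiosyncratic error is not autocorrelated and is uncorrelated with the individual effect, and it writes their variances as $\sigma_u^2$ and $\sigma_c^2$.

```latex
\begin{aligned}
\mathrm{cov}(v_{it}, v_{is}) &= \mathrm{cov}(c_i + u_{it},\; c_i + u_{is}) = \mathrm{var}(c_i) = \sigma_c^2, \qquad t \neq s \\
\mathrm{var}(v_{it}) &= \sigma_c^2 + \sigma_u^2 \\
\mathrm{corr}(v_{it}, v_{is}) &= \frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2}
\end{aligned}
```

Under these assumptions the correlation is the same for every pair of periods and is positive whenever $\sigma_c^2 > 0$, because the same $c_i$ appears in every error of individual $i$.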
Regression models with Panel data

❖ Random Effects Estimation (RE)


𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝑋1𝑖𝑡 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖𝑡 + 𝑐𝑖 + 𝑢𝑖𝑡 (1)
The assumptions of the RE estimation method are as follows:
❖ RE1: $E(u_{it} \mid X, c_i) = 0$; $E(c_i \mid X) = 0$, $\forall i, t$
❖ RE2: The random error $u_{it}$ is not autocorrelated
❖ RE3: $\mathrm{var}(c_i \mid X) = \sigma_c^2$; $\mathrm{var}(u_{it} \mid X, c_i) = \sigma_u^2$, $\forall i, t$
Regression models with Panel data

❖ Random Effects Estimation (RE)


𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝑋1𝑖𝑡 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖𝑡 + 𝑐𝑖 + 𝑢𝑖𝑡 (1)
Estimation methods for RE model:
❖ Generalized Least Squares (GLS) Estimator
❖ Maximum Likelihood Estimation (MLE)

Command on Stata (for GLS estimator method):


xtreg Y X1 … Xk, re
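For reference, the GLS estimator of the RE model can be written as OLS on quasi-demeaned data. The form below is the standard textbook transformation for a balanced panel with T periods; it is given as a sketch, with bars denoting individual time averages and $v_{it} = c_i + u_{it}$. In practice the two variances are unknown and are replaced by estimates (feasible GLS).

```latex
\begin{aligned}
\theta &= 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_c^2}} \\
Y_{it} - \theta \bar{Y}_i &= \beta_0 (1 - \theta) + \beta_1 (X_{1it} - \theta \bar{X}_{1i}) + \dots + \beta_k (X_{kit} - \theta \bar{X}_{ki}) + (v_{it} - \theta \bar{v}_i)
\end{aligned}
```

When $\theta = 0$ (that is, $\sigma_c^2 = 0$) the transformation reduces to pooled OLS, and as $\theta$ approaches 1 it approaches full time-demeaning.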
Regression models with Panel data

❖ Fixed Effects Estimation (FE)


𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝑋1𝑖𝑡 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖𝑡 + 𝑐𝑖 + 𝑢𝑖𝑡 (1)
▪ If 𝑐𝑖 exists and is correlated with 𝑋𝑖𝑡 , then there is an endogeneity
problem.
▪ The assumptions of the FE estimation method are as follows:
❖ FE1: $E(u_{it} \mid X_i, c_i) = 0, \ \forall t$, which implies:
$\mathrm{cov}(u_{it}, X_i) = 0$ and $\mathrm{cov}(u_{it}, c_i) = 0$
❖ FE2: $\mathrm{rank}\, E(X'X) = k$
❖ FE3: $\mathrm{var}(u_{it} \mid X_{it}) = \sigma_u^2$; $\mathrm{cov}(u_i, u_j) = 0, \ \forall i \neq j$
Regression models with Panel data

❖ The within estimator with FE model


𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝑋𝑖𝑡 + 𝑐𝑖 + 𝑢𝑖𝑡 (1)
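▪ For each 𝑖, averaging equation (1) over time gives:
$\frac{1}{T}\sum_t Y_{it} = \beta_0 + \beta_1 \cdot \frac{1}{T}\sum_t X_{it} + c_i + \frac{1}{T}\sum_t u_{it}$
or $\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + c_i + \bar{u}_i$ (2)
▪ Subtracting (2) from (1), and because 𝑐𝑖 is fixed over time, we have:
$Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$ or $\ddot{Y}_{it} = \beta_1 \ddot{X}_{it} + \ddot{u}_{it}$ (3)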
A pooled OLS estimator that is based on the time-demeaned variables is called the fixed
effects estimator or the within estimator.
Command on Stata: xtreg Y X, fe
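The within transformation can be tried on simulated data. The sketch below is an illustration under assumed values: N = 500, T = 5, a true slope of 2.0, and an individual effect that is deliberately correlated with X. The helper names `slope` and `demean_by_group` are hypothetical, and the result is not meant to reproduce `xtreg` output (no standard errors are computed).

```python
import random

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def demean_by_group(values, ids):
    """Subtract each individual's time average from its own observations."""
    totals, counts = {}, {}
    for v, i in zip(values, ids):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, ids)]

random.seed(1)                       # fixed seed so the run is reproducible
N, T, beta1 = 500, 5, 2.0            # assumed panel size and true slope
ids, x, y = [], [], []
for i in range(N):
    c_i = random.gauss(0, 1)         # time-invariant individual effect
    for t in range(T):
        x_it = c_i + random.gauss(0, 1)          # X is correlated with c_i
        ids.append(i)
        x.append(x_it)
        y.append(1.0 + beta1 * x_it + 3.0 * c_i + random.gauss(0, 1))

pooled = slope(x, y)                                              # ignores c_i
within = slope(demean_by_group(x, ids), demean_by_group(y, ids))  # equation (3)
print(round(pooled, 2), round(within, 2))
```

Because 𝑐𝑖 is removed by the demeaning, the within slope should land close to the true value 2.0, while the pooled slope absorbs part of the effect of 𝑐𝑖.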
Estimation model selection Tests

Breusch – Pagan Test


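❖ Hypothesis testing:

𝐻0 : $\mathrm{var}(c_i) = 0$

𝐻1 : $\mathrm{var}(c_i) \neq 0$
❖ Test statistic
$\lambda_{LM} = \frac{(nT)^2}{2(nT^2 - nT)} \left[ 1 - \frac{\sum_{i=1}^{n} \left( \sum_{t=1}^{T} v_{it} \right)^2}{\sum_{i=1}^{n} \sum_{t=1}^{T} v_{it}^2} \right]^2$
where $v_{it}$ are the pooled OLS residuals.
If 𝐻0 is true, then $\lambda_{LM}$ follows a chi-squared distribution with one degree of freedom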
Command on Stata: xttest0
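The LM statistic is simple enough to compute by hand from pooled OLS residuals. The sketch below uses the usual form nT/(2(T−1))·[Σᵢ(Σₜ vᵢₜ)² / ΣᵢΣₜ vᵢₜ² − 1]², which is algebraically the same as writing the factor as (nT)²/(2(nT² − nT)). It assumes a balanced panel; the function name and the toy residuals are hypothetical, and it is not a substitute for `xttest0`.

```python
import math

def breusch_pagan_lm(resid):
    """resid[i][t]: pooled OLS residual of individual i in period t (balanced panel)."""
    n, T = len(resid), len(resid[0])
    sum_sq_of_sums = sum(sum(row) ** 2 for row in resid)        # sum_i (sum_t v_it)^2
    sum_of_squares = sum(v ** 2 for row in resid for v in row)  # sum_i sum_t v_it^2
    lm = n * T / (2 * (T - 1)) * (sum_sq_of_sums / sum_of_squares - 1) ** 2
    p_value = math.erfc(math.sqrt(lm / 2))   # upper tail of chi-squared with 1 df
    return lm, p_value

# Toy residuals: each individual's residuals share a sign, as an individual effect would cause.
lm, p = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
print(lm, round(p, 3))
```

With only two individuals and two periods the chi-squared approximation is of course not reliable; the toy input only shows how the pieces of the formula fit together.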
Estimation model selection Tests

Hausman Test
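❖ Hypothesis testing:
𝐻0 : $\mathrm{cov}(c_i, X_{it}) = 0$

𝐻1 : $\mathrm{cov}(c_i, X_{it}) \neq 0$
❖ Test statistic
$\chi^2_{qs} = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' (V_{FE} - V_{RE})^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$
If 𝐻0 is true, then $\chi^2_{qs}$ follows a chi-squared distribution with degrees of
freedom equal to the number of coefficients being compared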
Command on Stata: hausman fe re
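When a single coefficient is compared, the Hausman statistic reduces to a scalar ratio. The sketch below covers only that one-coefficient case; the function name and the numeric inputs are hypothetical, and with several coefficients the full matrix form (and `hausman` in Stata) is needed.

```python
import math

def hausman_one_coef(b_fe, b_re, var_fe, var_re):
    """Hausman statistic when a single coefficient is compared (scalar case)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe - var_re must be positive for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / diff_var
    p_value = math.erfc(math.sqrt(h / 2))    # upper tail of chi-squared with 1 df
    return h, p_value

# Hypothetical FE and RE estimates of the same slope and their variances.
h, p = hausman_one_coef(b_fe=0.50, b_re=0.30, var_fe=0.04, var_re=0.02)
print(round(h, 3), round(p, 3))
```

A large gap between the FE and RE estimates relative to the difference in their variances gives a large statistic and a small p-value, which points toward FE.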
Estimation model selection Tests

Step 1. RE or POLS? (xttest0)
▪ P-value large (𝑐𝑖 = 0): POLS
▪ P-value small (𝑐𝑖 ≠ 0): go to Step 2
Step 2. FE or RE? (Hausman)
▪ P-value large: RE
▪ P-value small: FE
Estimation model selection Tests

Practice on Stata
❖ Step 1: Model selection between POLS and RE
xtreg Y X1…Xk, re
xttest0 => If the P-value is large (above the chosen significance level), POLS is the preferred model
❖ Step 2: Model selection between FE and RE
xtreg Y X1…Xk, fe
est store fe
xtreg Y X1…Xk, re
est store re
hausman fe re => If the P-value is small, FE is the preferred model
If the P-value is large, RE is the preferred model
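The two-step selection rule can be summarized as a small function. This is a sketch of the decision logic only: the function name is hypothetical, and the 0.05 cut-off is an assumed conventional significance level, not a value given in the slides.

```python
def choose_model(p_breusch_pagan, p_hausman, alpha=0.05):
    """Return 'POLS', 'RE' or 'FE' from the two test p-values (alpha is an assumed cut-off)."""
    if p_breusch_pagan >= alpha:
        return "POLS"     # Step 1: no evidence of an individual effect
    if p_hausman >= alpha:
        return "RE"       # Step 2: individual effect not correlated with X
    return "FE"           # Step 2: individual effect correlated with X

print(choose_model(0.40, 0.90))   # POLS
print(choose_model(0.001, 0.30))  # RE
print(choose_model(0.001, 0.01))  # FE
```

Note that the Hausman p-value is only consulted when Step 1 rejects POLS.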
Some defects of the panel model

❖ In FE model
▪ Autocorrelation
xtserial Y X
If the P-value is small, the model has autocorrelation
=> xtregar Y X, fe
▪ Contemporaneous correlation
xttest2
If the P-value is small, the model has contemporaneous correlation
=> xtscc Y X, fe
▪ Heteroskedasticity
xttest3
If the P-value is small, the model has heteroskedasticity
=> xtreg Y X, fe robust
Some defects of the panel model

❖ In RE model
▪ Autocorrelation
xttest1
If the P-value is small, the model has autocorrelation
=> xtregar Y X, re
▪ Heteroskedasticity
xtreg Y X, re
predict res1, ue
robvar res1, by(id)
If the P-value is small, the model has heteroskedasticity
=> xtreg Y X, re robust
Practice
