Panel Data Analysis
Panel Data
➢ Panel data are constructed through surveys conducted at several points in time using the same cross-section units.
➢ A panel consists of a set of entities from which information on similar issues is collected over time.
➢ The cross-section units may be households, firms, countries and so on.
➢ Some explanatory variables are observed and can be controlled, but some information is unobserved and uncontrollable.
➢ Some variables, like age and wealth profile, are time-dependent, while others, like gender, are time-independent.
With N cross-section units observed over T time periods, the data file contains N·T records, each holding the values of the k variables for one unit in one period.
Data Matrix of a Single Variable X
Structure of Panel Data
In a balanced panel, there are no missing values in the data set.
In other words, in a balanced panel, all entities have measurements in all time periods.
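A balance check can be sketched in a few lines of plain Python. This is only an illustration, not a Stata command: it assumes the panel is stored as a list of hypothetical (unit, year) pairs and treats the panel as balanced when every unit appears in every year that occurs anywhere in the data.

```python
def is_balanced(records):
    """Return True if every unit is observed in every period present in the data.

    records: iterable of (unit, period) pairs, one per observation (hypothetical layout).
    """
    periods = {t for _, t in records}          # all periods seen anywhere in the panel
    by_unit = {}
    for unit, t in records:
        by_unit.setdefault(unit, set()).add(t)  # periods observed for this unit
    return all(observed == periods for observed in by_unit.values())


# Hypothetical panels: two units, two years.
balanced = [("A", 2000), ("A", 2001), ("B", 2000), ("B", 2001)]
unbalanced = [("A", 2000), ("A", 2001), ("B", 2000)]  # unit B is missing 2001

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```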
Types of Panel Data
➢ There are two types of economic panel data, based on the type of cross-section units: micro panels and macro panels.
➢ If the cross-section units are micro units, the panel data form a micro panel.
In a micro panel, the number of cross-section units is much larger than the number of time periods (N >> T).
Large surveys of households or firms over time form micro panels.
The micro panel is also called a cross-section panel or short panel.
On the other hand, if the cross-section units are macro units, the panel forms a macro panel.
In a macro panel, the number of cross-section units is much smaller than the number of time periods (N << T).
The macro panel is also called a long panel or time series panel.
Because the time dimension is large, time series properties dominate in a macro panel.
Variation in a Panel Data Set
➢ The use command in Stata reads a data set; the clear option first removes the data currently in memory and then loads the new data set.
➢ The list command displays the values of individual observations.
➢ Suppose that we want to inspect output per worker (output_per_worker), GDP growth (gdp_growth) and workers in wage employment (wage_workers) in the data set. The following command lists the first 35 observations, drawing a separator line every 28 observations:
list country_SA year output_per_worker gdp_growth wage_workers in 1/35, sep(28)
Data Description Using Stata
The following command reshapes the data from wide form to long form:
. reshape long output_per_worker gdp_growth wage_workers, i(country_SA) j(year)
The i(country_SA) option specifies the variable that identifies the cross-section units, and j(year) names the new variable that holds the time index taken from the suffixes of the wide-form variable names.
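To make the wide-to-long reshape concrete, here is a minimal plain-Python sketch of what reshape long does to the rows. It assumes, hypothetically, that the wide data hold one row per country with columns named <stub><year> (for example gdp_growth2000); the function name reshape_long and all numbers are made up for illustration.

```python
import re


def reshape_long(rows, stubs, i):
    """Mimic `reshape long <stubs>, i(<i>) j(year)` for rows stored as dicts.

    A column named <stub><digits> becomes the value of <stub> in the long
    record whose year equals <digits>.
    """
    long_rows = {}
    for row in rows:
        for col, value in row.items():
            for stub in stubs:
                match = re.fullmatch(re.escape(stub) + r"(\d+)", col)
                if match:
                    key = (row[i], int(match.group(1)))  # (unit id, year)
                    record = long_rows.setdefault(key, {i: row[i], "year": key[1]})
                    record[stub] = value
    # One record per (unit, year), sorted the way an xtset panel is ordered.
    return sorted(long_rows.values(), key=lambda r: (r[i], r["year"]))


# Hypothetical wide-form data: N = 2 countries, T = 2 years.
wide = [
    {"country_SA": "A", "gdp_growth2000": 5.0, "gdp_growth2001": 4.0,
     "wage_workers2000": 30, "wage_workers2001": 32},
    {"country_SA": "B", "gdp_growth2000": 6.0, "gdp_growth2001": 7.0,
     "wage_workers2000": 20, "wage_workers2001": 21},
]
long_rows = reshape_long(wide, ["gdp_growth", "wage_workers"], "country_SA")
for record in long_rows:
    print(record)
```

With N = 2 countries and T = 2 years, the long form has N·T = 4 records, one per country-year.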
The describe command displays basic information about the variables, such as storage type, display format and labels.
Descriptive statistics such as the mean, standard deviation, minimum and maximum of the variables in the data set are obtained with the summarize command.
➢ To use panel data commands, we need to declare which variable identifies the cross-section units and which one is the time variable.
➢ This is done with the xtset command, followed by the names of the cross-section variable and the time variable, in that order.
➢ For example, if country is the variable name for the cross-section units and year is the name of the time variable, we can use the following command:
xtset country year
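➢ xtset requires a numeric panel variable, so if the cross-section identifier country is stored as a string, we first convert it to a numeric variable:
encode country, gen(country_i)
Then execute: xtset country_i year
To describe the participation pattern of the panel: xtdescribe
To obtain overall, between and within descriptive statistics: xtsum
To plot a variable over time for each panel unit: xtline output_per_worker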
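The statistics reported by xtsum split the variation of a variable into overall, between and within components. The sketch below is our plain-Python reading of that decomposition, assuming the panel is a dict mapping each unit to its list of observations: overall uses all NT values, between uses the N unit means, and within uses x_it - x̄_i + x̄ (the global mean is added back so the values stay on the original scale). The helper name xtsum_like and the data are hypothetical.

```python
from statistics import mean, stdev


def xtsum_like(panel):
    """Return (overall, between, within) standard deviations of a panel variable.

    panel maps each unit to the list of its observations over time.
    """
    all_obs = [x for xs in panel.values() for x in xs]
    grand_mean = mean(all_obs)
    unit_means = {unit: mean(xs) for unit, xs in panel.items()}

    overall = stdev(all_obs)                    # variation across all N*T values
    between = stdev(list(unit_means.values()))  # variation of the N unit means
    within = stdev(                             # variation around each unit's own mean
        [x - unit_means[unit] + grand_mean for unit, xs in panel.items() for x in xs]
    )
    return overall, between, within


# Hypothetical variable observed for two units over two periods.
panel = {"A": [1.0, 3.0], "B": [5.0, 7.0]}
overall, between, within = xtsum_like(panel)
print(round(overall, 3), round(between, 3), round(within, 3))  # 2.582 2.828 1.155
```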
Benefits of Panel Data
Panel data have several advantages over pure cross-section or time series data:
1. Panel data increase the number of data points. If there are N cross-section units and T time periods, the total number of observations is NT, so there are more degrees of freedom.
2. They help in constructing and testing more complicated behavioural hypotheses, e.g., in the presence of unobserved heterogeneity.
5. Panel data are helpful in providing micro foundations for aggregate data analysis. If micro units are heterogeneous, the time series properties of aggregate data can be very different from those of disaggregate data. In this case, predicting aggregate outcomes from aggregate time series may be misleading. Panel data can resolve this problem by explicitly capturing the heterogeneity.
6. In panel data, if observations across cross-sectional units are independent, one can show by using a central limit theorem across units that the limiting distributions of many estimators remain asymptotically normal, even for nonstationary series.
Error Component Model
One way to restore homogeneity across i or over t, and to address the problem of endogeneity (when an explanatory variable is correlated with the error term), is to decompose the random error into components. The resulting model is known as the error component model.
If the error is decomposed in one way only, either cross-section-specific or time-specific, the model is called a one-way error component model.
If the error is decomposed into both cross-section- and time-specific components, it is a two-way error component model.
One-way Error Component model
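In the one-way model, the error is decomposed as
$u_{it} = \mu_i + \varepsilon_{it}$
or
$u_{it} = \lambda_t + \varepsilon_{it}$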
Under the one-way error component structure, the multiple linear regression takes the following form:
$y_{it} = \mu_i + x_{it}'\beta + \varepsilon_{it}$
or, with time-specific effects,
$y_{it} = \lambda_t + x_{it}'\beta + \varepsilon_{it}$
We impose the restriction that the slope coefficients are identical while the intercepts are not, and the model is estimated by applying OLS.
Two-way Error Component model
The random error can also be decomposed in two ways, into both cross-section-specific and time-specific errors:
$u_{it} = \mu_i + \lambda_t + \varepsilon_{it}$
➢ The error component model can be estimated with either a fixed effects or a random effects specification, depending on the nature of the error components.
In a fixed effects model, the cross-section- or time-specific errors are treated as the coefficients of dummy variables and become part of the intercept term.
The fixed effects error component model is therefore sometimes called the least squares dummy variable (LSDV) model.
➢ In a random effects model, the cross-section- or time-specific errors are treated as random and are combined with the idiosyncratic disturbance into a composite error term.
➢ A simple trick to eliminate the unobserved cross-section-specific error is the first-differenced estimator.
First-Differenced Estimator
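One inherent problem in estimating the model by OLS is that the error contains unobserved heterogeneity (μi), which cannot be estimated separately.
With panel data, we can difference out the cross-section-specific error by taking differences over time. For the model $y_{it} = \beta_0 + \mu_i + \beta_1 x_{it} + \varepsilon_{it}$, subtracting the previous period gives
$\Delta y_{it} = \beta_1 \Delta x_{it} + \Delta\varepsilon_{it}$, where $\Delta y_{it} = y_{it} - y_{i,t-1}$.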
The advantage of first-differenced (FD) estimation is that the fixed effects cancel out.
The intuition behind the FD estimator is that it uses only within-entity changes, bypassing between-entity variation.
Unobserved time-invariant differences between countries therefore no longer bias the estimator.
But in the first-differenced model, we cannot estimate the measure of heterogeneity, μi.
The fixed effects model, in contrast, can provide estimates of the cross-section-specific unobserved heterogeneity.
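A minimal plain-Python sketch of the first-differenced estimator for the single-regressor model yit = β0 + μi + β1xit + εit. The data are hypothetical and noise-free, with μi deliberately correlated with xit (unit A has both the larger effect and the larger x values), so pooled OLS is biased while the FD slope recovers the true β1 = 2. The regression on differences has no intercept, since β0 drops out when differencing.

```python
# Hypothetical noise-free panel: y_it = mu_i + 2 * x_it, observations ordered by time.
panel = {
    "A": [(5.0, 20.0), (6.0, 22.0), (8.0, 26.0)],  # mu_A = 10
    "B": [(1.0, -8.0), (2.0, -6.0), (3.0, -4.0)],  # mu_B = -10
}


def pooled_ols_slope(panel):
    """Slope from pooled OLS of y on x with one common intercept (ignores mu_i)."""
    xs = [x for obs in panel.values() for x, _ in obs]
    ys = [y for obs in panel.values() for _, y in obs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def first_difference_slope(panel):
    """OLS through the origin of dy_it on dx_it, where d is the change from t-1 to t."""
    dx, dy = [], []
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            dx.append(x1 - x0)  # mu_i and beta_0 cancel in these differences
            dy.append(y1 - y0)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


print(round(pooled_ols_slope(panel), 3))  # about 5.73, far from the true 2
print(first_difference_slope(panel))      # 2.0
```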
Fixed Effects Model
We assume that the individual effects are time-constant but not common across the entities.
The idiosyncratic error εit varies over both individuals and time.
In the fixed effects model, we can estimate each μi along with β.
There are several ways of estimating a fixed effects model.
One popular method is “within” estimation, or mean-corrected estimation.
Another method is the least squares dummy variable (LSDV) model, which uses dummy variables for the cross-section units.
The Within Estimation
The “within” estimation uses deviations from group (or time-period) means, that is, the variation within each individual or entity.
Let us start from the one-way error component model with a single regressor:
$y_{it} = \beta_0 + \mu_i + \beta_1 x_{it} + \varepsilon_{it}$ (i)
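➢ Taking the mean of this equation over time for each i (the “between” transformation), we have:
$\bar{y}_i = \beta_0 + \mu_i + \beta_1 \bar{x}_i + \bar{\varepsilon}_i$ (ii)
where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, and $\bar{x}_i$ and $\bar{\varepsilon}_i$ are defined similarly.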
Averaging equation (ii) across individuals gives the overall mean equation
$\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{\varepsilon}$
under the restriction $\sum_{i=1}^{N} \mu_i = 0$.
This restriction on the coefficients of the dummy variables is required to avoid the dummy variable trap. Only β1 and (β0 + μi) are estimable from equation (i), not β0 and μi separately, unless the restriction is imposed.
➢ On the other hand, subtracting (ii) from (i) for each t (the “within” transformation), we get
$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$
➢ Here, the incidental parameter μi is no longer a problem, and the model can be estimated by applying OLS. Time-constant unobserved heterogeneity is no longer an issue in “within” estimation.
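The within estimator can be sketched in plain Python: demean y and x unit by unit, then run OLS on the demeaned data (no intercept is needed because the demeaned variables have zero mean). The function name within_slope and the noise-free toy data (true β1 = 2, with μi correlated with xit) are assumptions made for illustration.

```python
# Hypothetical noise-free panel: y_it = mu_i + 2 * x_it with mu_A = 10, mu_B = -10.
panel = {
    "A": [(5.0, 20.0), (6.0, 22.0), (8.0, 26.0)],
    "B": [(1.0, -8.0), (2.0, -6.0), (3.0, -4.0)],
}


def within_slope(panel):
    """OLS of (y_it - ybar_i) on (x_it - xbar_i): the "within" estimator of beta_1."""
    num = den = 0.0
    for obs in panel.values():
        xbar = sum(x for x, _ in obs) / len(obs)  # unit mean of x over time
        ybar = sum(y for _, y in obs) / len(obs)  # unit mean of y over time
        for x, y in obs:
            num += (x - xbar) * (y - ybar)        # mu_i is removed by demeaning
            den += (x - xbar) ** 2
    return num / den


print(round(within_slope(panel), 6))  # 2.0
```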
Disadvantages of Within Estimation
The “within” estimation, however, has several disadvantages:
• First, it does not work well with data in which within-cluster variation is minimal.
• Second, the data transformation for “within” estimation wipes out all time-invariant variables, such as gender, citizenship and ethnic group, so it is not possible to estimate the coefficients of such variables in “within” estimation.
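A quick way to see the second disadvantage is to apply the within transformation to a time-invariant variable. In this sketch, the dummy female and the age values are hypothetical data: the demeaned dummy becomes a column of zeros, so it carries no variation and its coefficient cannot be estimated, while the time-varying age keeps its within variation.

```python
def within_transform(panel):
    """Subtract each unit's time mean from its observations (the "within" transformation)."""
    return {unit: [x - sum(xs) / len(xs) for x in xs] for unit, xs in panel.items()}


# Hypothetical data for three individuals over three periods.
female = {"p1": [1, 1, 1], "p2": [0, 0, 0], "p3": [1, 1, 1]}       # time-invariant
age = {"p1": [30, 31, 32], "p2": [45, 46, 47], "p3": [22, 23, 24]}  # time-varying

print(within_transform(female))  # every entry is 0.0
print(within_transform(age))     # each unit becomes [-1.0, 0.0, 1.0]
```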
The least squares dummy variable (LSDV) regression is the OLS regression of the dependent variable on the regressors and a set of cross-section dummies in the fixed effects framework.
In many cases, the unobserved characteristics of the cross-section units may be of interest to researchers.
However, the within-group method does not estimate the unobserved fixed effects: by construction, the within transformation sweeps them out of the model.
LSDV Regression
To estimate the fixed effects, we can treat them as the coefficients of binary (dummy) variables representing the cross-section units.
➢ The least squares dummy variable (LSDV) model provides the fixed effects estimates along with the slope parameter.
➢ We also obtain an estimate of each μi.
➢ The LSDV estimator is practical only when N is small, because it adds one dummy variable per cross-section unit.
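A minimal LSDV sketch in plain Python, assuming a single regressor and hypothetical noise-free toy data (true β1 = 2, μA = 10, μB = -10). It regresses y on x and one dummy per unit with no common intercept, which is another way to avoid the dummy variable trap; each dummy coefficient then estimates β0 + μi. The normal equations are solved with a small Gaussian elimination routine rather than a library call.

```python
def solve(a, b):
    """Solve the linear system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def lsdv(panel):
    """Regress y on x plus one dummy per unit; return (slope, {unit: dummy coefficient})."""
    units = list(panel)
    rows, ys = [], []
    for k, unit in enumerate(units):
        for x, y in panel[unit]:
            dummies = [1.0 if j == k else 0.0 for j in range(len(units))]
            rows.append([x] + dummies)
            ys.append(y)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]  # X'X
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]             # X'y
    coef = solve(xtx, xty)
    return coef[0], dict(zip(units, coef[1:]))


# Hypothetical noise-free panel: y_it = mu_i + 2 * x_it.
panel = {
    "A": [(5.0, 20.0), (6.0, 22.0), (8.0, 26.0)],
    "B": [(1.0, -8.0), (2.0, -6.0), (3.0, -4.0)],
}
beta, effects = lsdv(panel)
print(round(beta, 6), {k: round(v, 6) for k, v in effects.items()})  # 2.0 {'A': 10.0, 'B': -10.0}
```

Because N dummies enter the regression, the system grows with the number of cross-section units, which is why LSDV is practical only for small N.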
Random Effects Model
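The random effects model assumes $\mu_i \sim \text{IID}(0, \sigma_\mu^2)$ and $\varepsilon_{it} \sim \text{IID}(0, \sigma_\varepsilon^2)$, so that $E(\mu_i \mu_j) = 0$ for $i \neq j$ and $E(\varepsilon_{it} \varepsilon_{js}) = 0$ for $i \neq j$ or $t \neq s$.
The μi are independent of the error term εit and of the regressors xit, for all i and t.
Therefore, the composite error $u_{it} = \mu_i + \varepsilon_{it}$ has mean and variance
$E(u_{it}) = 0, \quad V(u_{it}) = V(y_{it} \mid x_{it}) = \sigma_\mu^2 + \sigma_\varepsilon^2$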
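A small simulation can check these moments. The sketch below assumes, purely for illustration, normally distributed μi and εit with σμ = 2 and σε = 1, draws the composite error uit = μi + εit, and compares the sample variance with σμ² + σε² = 5. It also reports the covariance between two periods of the same unit, which under these assumptions equals σμ² = 4 because both periods share μi; the ratio σμ²/(σμ² + σε²) is the implied within-unit correlation.

```python
import random

random.seed(42)
SIGMA_MU, SIGMA_EPS = 2.0, 1.0  # assumed standard deviations of mu_i and eps_it
N, T = 2000, 5                  # hypothetical panel dimensions

u = []
for _ in range(N):
    mu_i = random.gauss(0.0, SIGMA_MU)  # unit effect, constant over t
    u.append([mu_i + random.gauss(0.0, SIGMA_EPS) for _ in range(T)])


def sample_mean(values):
    return sum(values) / len(values)


all_u = [v for row in u for v in row]
m = sample_mean(all_u)
var_u = sample_mean([(v - m) ** 2 for v in all_u])  # close to 5.0

# Covariance between periods 0 and 1 of the same unit: close to sigma_mu^2 = 4.0
u0 = [row[0] for row in u]
u1 = [row[1] for row in u]
m0, m1 = sample_mean(u0), sample_mean(u1)
cov_01 = sample_mean([(a - m0) * (b - m1) for a, b in zip(u0, u1)])

rho = SIGMA_MU**2 / (SIGMA_MU**2 + SIGMA_EPS**2)
print(round(var_u, 2), round(cov_01, 2), rho)
```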