Panel Data Analysis

Notes

Uploaded by

Ivy Dasgupta
Copyright
© © All Rights Reserved
Panel Data Analysis

Dr. Ivy Das Gupta
Panel Data

➢ Panel data are constructed through surveys conducted at several points in time using the same cross section units.
➢ A panel consists of a set of multiple entities from which information on similar issues is collected over time.
➢ The cross-section units may be households, firms, countries and so on.
➢ Some explanatory variables are observed and can be controlled, but some information is unobserved and cannot be controlled.
➢ Some variables, like age and wealth profile, are time-dependent, while some variables, like gender, are time-independent.
Panel Data

➢ A formulation of the panel data model includes both observed and unobserved explanatory variables.
➢ Panel data can take care of inter-individual differences and intra-individual dynamics by mixing cross section and time series components.
➢ Panel data econometric models examine unobserved heterogeneity by estimating cross section-specific effects, time effects or both, which are analysed by fixed effects or random effects models.
Panel Data

➢ The analysis of panel data has been rapidly growing because of the availability of econometric and statistical programmes.
➢ Collecting panel data, however, is much more costly than collecting cross section or time series data.
➢ For this reason, panel data have not become widely available even in many developed countries.
➢ The National Longitudinal Surveys of Labour Market Experience in the US and the database prepared by the Michigan Panel Study of Income Dynamics (PSID) are the well-known panel data used by researchers.
Panel Data

In India, panel data are still not available in official statistics, although the industrial statistics wing of the Central Statistics Office (CSO) has been trying to prepare a factory-level panel from the Annual Survey of Industries (ASI).

The type of panel data econometric model depends on the cross-section dimension, the time dimension and the nature of the entities (cross section units).
Structure of Panel Data

 Panel data can be organised by taking three dimensions into account: the number of cross section units (i = 1, 2, 3, …, N), the number of time periods (t = 1, 2, 3, …, T) and the number of variables (v = 1, 2, …, k).
 We need to rearrange these three dimensions into a two-dimensional data matrix to estimate an econometric model by using software.
 The long format is the appropriate way of organising panel data in a computer programme.
 In this format, the data matrix has N·T rows and k columns.
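The rearrangement into long format can be sketched in a few lines of Python. This is only an illustration: the unit labels, the variable names `x` and `y`, and the wide column naming (`x2001`, `x2002`, …) are hypothetical, and the long table carries two identifier columns in addition to the k variables.

```python
# Hypothetical wide-format data: one row per unit, one column per (variable, year).
wide = {
    "A": {"x2001": 1.0, "x2002": 1.5, "y2001": 10.0, "y2002": 11.0},
    "B": {"x2001": 2.0, "x2002": 2.5, "y2001": 20.0, "y2002": 21.0},
}
variables = ["x", "y"]  # k = 2 variables
years = [2001, 2002]    # T = 2 time periods


def wide_to_long(wide, variables, years):
    """Stack the data so that each row is one (unit, year) observation."""
    long_rows = []
    for unit in sorted(wide):          # i = 1, ..., N
        for year in years:             # t = 1, ..., T
            row = {"id": unit, "year": year}
            for var in variables:      # v = 1, ..., k
                row[var] = wide[unit][f"{var}{year}"]
            long_rows.append(row)
    return long_rows


long_rows = wide_to_long(wide, variables, years)
for row in long_rows:
    print(row)
```

With N = 2 units and T = 2 periods, the result has N·T = 4 rows, each holding the identifiers and the k = 2 variables.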
Structure of Panel Data

 Each of the k variables has N·T records in the corresponding data file.
Data Matrix of a Single Variable X
Structure of Panel Data

 A particular row of the data matrix represents cross section data.
 The entries in a particular column present information on X from a particular entity over time, forming the time series data.
 In panel data, X varies across i and over t, and a particular entry in the data matrix is written as $X_{it}$.
 The means:
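A sketch of the means usually defined at this point, assuming a balanced panel with N units and T periods. The bar-and-dot notation is a common convention and is an assumption here, not necessarily the notation of the original data matrix:

```latex
\bar{X}_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} X_{it}
  \quad \text{(mean over time for unit } i\text{)}, \qquad
\bar{X}_{\cdot t} = \frac{1}{N}\sum_{i=1}^{N} X_{it}
  \quad \text{(mean across units in period } t\text{)}, \qquad
\bar{X} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}
  \quad \text{(overall mean)}
```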
Types of Panel Data

➢ Panel data may be balanced or unbalanced.
➢ If each cross-section unit is observed in each and every time period, the data are called a balanced panel.
➢ In a balanced panel, there will be no missing value in the data set.
➢ In other words, in a balanced panel, all entities have measurements in all time periods.
Types of Panel Data

On the other hand, in an unbalanced panel the information on some cross section units is not available for the entire time period.

If there are missing data, the number of measurements, $T_i$, varies between cross section units and the data set formed is called an unbalanced panel.

In other words, in this case, each cross-section unit does not appear in every time period, and there are missing values.
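A minimal Python sketch of this distinction, assuming the panel is given as a list of (unit, period) pairs. The unit labels and years are made up for illustration:

```python
from collections import defaultdict


def periods_per_unit(observations):
    """Return {unit: T_i}, the number of distinct periods observed for each unit."""
    seen = defaultdict(set)
    for unit, period in observations:
        seen[unit].add(period)
    return {unit: len(periods) for unit, periods in seen.items()}


def is_balanced(observations):
    """A panel is balanced when every unit is observed in every period."""
    all_periods = {period for _, period in observations}
    counts = periods_per_unit(observations)
    return all(t_i == len(all_periods) for t_i in counts.values())


balanced = [("A", 2001), ("A", 2002), ("B", 2001), ("B", 2002)]
unbalanced = [("A", 2001), ("A", 2002), ("B", 2001)]  # unit B is missing 2002

print(is_balanced(balanced), periods_per_unit(balanced))
print(is_balanced(unbalanced), periods_per_unit(unbalanced))
```

In the second data set T_i differs across units, which is exactly what makes the panel unbalanced.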
Types of Panel Data

➢ There are two types of economic panel data based on the types of cross section units: micro panel and macro panel.
➢ If the cross section units are micro units, the panel data form a micro panel.
 In a micro panel, the number of cross section units is much larger than the number of time periods (N >> T).
 Large surveys of households or firms over time form micro panels.
 The micro panel is also called a cross section panel or short panel.
Types of Panel Data

 On the other hand, if the cross section units are macro units, the panel is a macro panel.
 In a macro panel, the number of cross section units is much smaller than the number of time periods (N << T).
 The macro panel is also called a long panel or time series panel.
 As the time dimension is large, time series properties dominate in a macro panel.
Variation in a Panel Data Set

 Sources of variation in a Panel Dataset :


➢ Within Variation : Variable values change over time for any single unit (e.g., firm or individual
or household)
➢ Between Variation : Variable values change between units (i.e. variables differ across firms /
individuals / households)
➢ Overall or Total Variation = Within Variation + Between Variation
 Let us understand these sources of variation in a Panel Dataset with a labour-market example
This is an excerpt from a panel dataset used
for a labour-market study on determinants of
earnings:
● Dependent variable is log of wage
(lwage)
● Every Cross Section unit is given a
unique numeric identifier (person-id)
● This is the most standard arrangement
of panel data (long-form). All years
stacked in a column for each unit.
● Within variation indicated in a single
colour for each unit
● Between variation combines two
colours
● Total variation can be decomposed
into Within and Between variations
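The decomposition can be checked numerically. The following Python sketch uses sums of squared deviations and made-up lwage values for two hypothetical persons (these numbers are not taken from the dataset described above):

```python
def decompose(panel):
    """panel: {unit: [values over time]}.

    Returns (total, within, between) sums of squared deviations.
    """
    all_values = [v for series in panel.values() for v in series]
    grand_mean = sum(all_values) / len(all_values)
    total = sum((v - grand_mean) ** 2 for v in all_values)

    within = 0.0
    between = 0.0
    for series in panel.values():
        unit_mean = sum(series) / len(series)
        # Within: deviations of each observation from its own unit mean.
        within += sum((v - unit_mean) ** 2 for v in series)
        # Between: deviation of the unit mean from the grand mean,
        # counted once per observation of that unit.
        between += len(series) * (unit_mean - grand_mean) ** 2
    return total, within, between


# Hypothetical log wages for two persons over three years.
lwage = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
total, within, between = decompose(lwage)
print(total, within, between)
```

Here most of the variation is between persons: the two wage profiles sit at different levels but move little around their own means.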
Data Description by using STATA

➢ The . use command in Stata reads a data set; the clear option removes the data currently in memory before the new data set is loaded.
➢ The . list command lists the values of individual observations.
➢ Suppose that we want to look at output per worker (output_per_worker), GDP growth (gdp_growth) and workers in wage employment (wage_workers) in the data set. We can use the following command:
. list country_SA year output_per_worker gdp_growth wage_workers in 1/35, sep(28)
Data Description by using STATA

 The following command reshapes the data from the wide form to the long form.
. reshape long output_per_worker gdp_growth wage_workers, i(country_SA) j(year)
 The i(country_SA) option specifies the variable that identifies the cross section units, and j(year) names the variable that will hold the time period.
 The . describe command displays basic information about the variables.
 Descriptive statistics such as the mean, standard deviation, minimum and maximum of the variables in the sample data set are obtained by using . summarize
Data Description by using STATA

➢ To use panel data commands, we need to declare which variable identifies the cross section units and which one is the time variable.
➢ We do this with the . xtset command, followed by the names of the cross section and time variables, in that order.
➢ For example, if country is the variable name for cross section units and year is the name of the time variable, we can use the following command:
. xtset country year
Data Description by using STATA

 If the following error appears:

. xtset country year
string variables not allowed in varlist;
country is a string variable
r(109);

➢ we need to convert the cross section id ‘country’ to a numeric variable by using the command:
. encode country, generate(country_i)
 Then execute: . xtset country_i year
 To describe the pattern of the panel: . xtdescribe
 To explore descriptive statistics: . xtsum
 To plot the series: . xtline
Benefits of Panel Data

 Panel data have several advantages over cross section data or time series data:
1. Panel data increase the number of data points. If there are N cross section units and T time periods, the total number of observations is NT, so there are more degrees of freedom.
2. They are helpful in constructing and testing more complicated behavioural hypotheses, e.g., in the presence of unobserved heterogeneity.
Benefits of Panel Data

3. They contain intertemporal dynamics and may allow us to control for the effects of unobserved variables in estimating a model.

4. The collinearity between current and lagged variables can be reduced.
Benefits of Panel Data

5. Panel data are helpful in providing micro foundations for aggregate data analysis. If micro units are heterogeneous, the time series properties of aggregate data will be very different from those of disaggregate data. In this case, the prediction of aggregate outcomes by using aggregate time series may be misleading. The use of panel data can resolve this problem by capturing the heterogeneity.

6. In panel data, if observations among cross-sectional units are independent, one can show by using the central limit theorem that the limiting distributions of many estimators remain asymptotically normal even for nonstationary series.
Error Component model

 One way to restore homogeneity across i or over t, and to address the problem of endogeneity (when an explanatory variable is correlated with the error term), is to decompose the random error. The model developed in this way is known as the error component model.
 If the error is decomposed in one way, either cross section-specific or time-specific, it is called a one-way error component model.
 If the error is decomposed into both cross section-specific and time-specific components, it is a two-way error component model.
One-way Error Component model

 In the one-way error component model, the random disturbance is decomposed into a cross section-specific error $\mu_i$ (or a time-specific error $\lambda_t$) and a distinctive error $\varepsilon_{it}$:
$$u_{it} = \mu_i + \varepsilon_{it}$$
or
$$u_{it} = \lambda_t + \varepsilon_{it}$$
 With a one-way error component structure, the multiple linear regression takes the following form:
$$y_{it} = \mu_i + x_{it}'\beta + \varepsilon_{it}$$
One-way Error Component model

 or,
$$y_{it} = \lambda_t + x_{it}'\beta + \varepsilon_{it}$$

 We impose the restriction that the slope coefficients are identical while the intercepts differ across units (or time periods), and the model is estimated by applying OLS.
Two-way Error Component model

 The random error can also be decomposed in two ways, into both cross section-specific and time-specific errors:
$$u_{it} = \mu_i + \lambda_t + \varepsilon_{it}$$

 The two-way error component model is expressed as
$$y_{it} = \mu_i + \lambda_t + x_{it}'\beta + \varepsilon_{it}$$

➢ The error component model can be estimated by applying either fixed effects or random effects
specification depending on the nature of the error component
Error Component model

➢ When the error component is assumed to be non-stochastic, it will be a fixed effects


model
➢ When the error component is treated as random, it becomes random effects model

We have therefore four different types of error component models:

1) One-way error component fixed effects model,


2) One-way error component random effects model,
3) Two-way error component fixed effects model,
4) Two-way error component random effects model.
Error Component model

 In a fixed effects model, the cross section- or time-specific errors are treated as the coefficients of dummy variables and are part of the intercept term.
 The fixed effects error component model is therefore sometimes called the least squares dummy variable (LSDV) model.
➢ In a random effects model, the cross section- or time-specific errors are combined with the random disturbance.
➢ A simple way to eliminate the unobserved cross section-specific error is the first-differenced estimator.
First-Differenced Estimator

 One inherent problem in estimation by OLS is that the error contains unobserved heterogeneity, which cannot be estimated separately.
 With panel data we can difference out the cross section-specific error by taking differences over time. Subtracting the equation for period t−1 from the equation for period t cancels $\mu_i$:

$$\Delta y_{it} = \Delta x_{it}'\beta + \Delta\varepsilon_{it}, \qquad \Delta y_{it} = y_{it} - y_{i,t-1}$$

 This is a simple cross-sectional regression equation in differences, without a constant.
 The coefficient vector β can be estimated consistently by applying OLS.
First-Differenced Estimator (using STATA)

 We generate the first-differenced variables after declaring the data as a panel:

. xtset country_SA year                     /* declare the panel structure */
. gen d_lab = ln_lab - l.ln_lab             /* l. is the lag operator */
. gen d_pro = ln_lab_pro - l.ln_lab_pro     /* first difference of ln_lab_pro */
. gen d_growth = gdp_growth - l.gdp_growth  /* first difference of gdp_growth */
➢ Then we estimate an OLS regression (with no constant):
. reg d_lab d_pro d_growth, noconstant
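The same logic can be sketched outside Stata. The Python example below is an illustration under stated assumptions, not the estimation above: it uses a single regressor, made-up data for two hypothetical units, and no noise, so that the first-differenced OLS slope can be compared with the true slope used to build the data.

```python
def first_difference(series):
    """Return [x_2 - x_1, x_3 - x_2, ...] for one unit's time series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def fd_slope(x_panel, y_panel):
    """OLS without a constant on first differences, single regressor."""
    numerator = 0.0
    denominator = 0.0
    for unit in x_panel:
        dx = first_difference(x_panel[unit])
        dy = first_difference(y_panel[unit])
        numerator += sum(a * b for a, b in zip(dx, dy))
        denominator += sum(a * a for a in dx)
    return numerator / denominator


# Hypothetical data: y_it = mu_i + beta * x_it, with very different mu_i.
x = {"A": [1.0, 2.0, 4.0], "B": [2.0, 5.0, 6.0]}
mu = {"A": 10.0, "B": -3.0}
beta = 2.0
y = {unit: [mu[unit] + beta * value for value in x[unit]] for unit in x}

print(fd_slope(x, y))
```

Because the unit-specific term mu_i is constant over time, it drops out of every difference and the slope is recovered regardless of how different the units are.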
First-Differenced Estimator

 The advantage of FD estimation is that the fixed effects are cancelled out.
 The intuition behind the FD estimator is that it uses only within-entity changes bypassing the
between-entity change
 Unobserved differences between countries no longer bias the estimator
 But, in the first-differenced model we cannot estimate the measure of heterogeneity, μi
 The fixed effects model can incorporate the estimates of cross section-specific unobserved
heterogeneity
Fixed Effects Model

 We assume that the individual effects are time constant but are not common across the entities.
 The distinctive error varies over individuals and time.
 In the fixed effects model we can estimate each μi along with β
 There are several ways for estimating a fixed effects model
 One popular method is the “within” estimation or mean-corrected estimation
 Another method for estimating fixed effects is the least squares dummy variable (LSDV) model
that uses dummy variables for the cross section units
The Within Estimation

 The “within” estimation uses deviations from group (or time period) means, that is, the variation within each individual or entity.
 Let us start from the one-way error component model with a single regressor:
$$y_{it} = \beta_0 + \mu_i + \beta_1 x_{it} + \varepsilon_{it} \qquad \text{(i)}$$
➢ Taking the mean of this equation over time for each i (“between” transformation), we have:

$$\bar{y}_{i} = \beta_0 + \mu_i + \beta_1 \bar{x}_{i} + \bar{\varepsilon}_{i} \qquad \text{(ii)}$$
The Within Estimation

 Again, by taking the average of equation (ii) across individuals, we have the following mean equation:

$$\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{\varepsilon}$$

 The underlying assumption here is

$$\sum_{i=1}^{N} \mu_i = 0$$

 This restriction on the coefficients of the dummy variables is required to avoid the dummy variable trap. Only $\beta_1$ and $(\beta_0 + \mu_i)$ are estimable from equation (i), and not $\beta_0$ and $\mu_i$ separately, unless the restriction is imposed.
The Within Estimation

➢ On the other hand, subtracting (ii) from (i) for each t (“within” transformation) we get

$$y_{it} - \bar{y}_{i} = \beta_1 (x_{it} - \bar{x}_{i}) + (\varepsilon_{it} - \bar{\varepsilon}_{i})$$

➢ Here, the incidental parameter ($\mu_i$) is no longer a problem and the model can be estimated by applying OLS. Time-constant unobserved heterogeneity is no longer an issue in “within” estimation.
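A minimal Python sketch of the “within” transformation with a single regressor. The data are made up and noise-free, and the function names are illustrative only; the sketch also recovers the unit-specific intercepts (β0 + μi) from the unit means after the slope has been estimated.

```python
def mean(series):
    return sum(series) / len(series)


def demean(series):
    """Subtract the unit's own time mean from each observation."""
    m = mean(series)
    return [value - m for value in series]


def within_slope(x_panel, y_panel):
    """OLS slope of demeaned y on demeaned x, pooled over all units."""
    numerator = 0.0
    denominator = 0.0
    for unit in x_panel:
        xd = demean(x_panel[unit])
        yd = demean(y_panel[unit])
        numerator += sum(a * b for a, b in zip(xd, yd))
        denominator += sum(a * a for a in xd)
    return numerator / denominator


def unit_intercepts(x_panel, y_panel, slope):
    """Recover (beta_0 + mu_i) for each unit as ybar_i - slope * xbar_i."""
    return {
        unit: mean(y_panel[unit]) - slope * mean(x_panel[unit])
        for unit in x_panel
    }


# Hypothetical data: y_it = (beta_0 + mu_i) + beta_1 * x_it.
x = {"A": [1.0, 2.0, 4.0], "B": [2.0, 5.0, 6.0]}
intercept = {"A": 10.0, "B": -3.0}
beta_1 = 2.0
y = {unit: [intercept[unit] + beta_1 * value for value in x[unit]] for unit in x}

slope = within_slope(x, y)
print(slope, unit_intercepts(x, y, slope))
```

Only the sums (β0 + μi) are recovered, in line with the point that β0 and μi are not separately estimable without a restriction.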
Disadvantages of Within Estimation

The “within” estimation, however, has several disadvantages:
• First, it will not work well with data for which within-cluster variation is minimal.
• Second, data transformation for “within” estimation wipes out all time-invariant variables like gender, citizenship and ethnic group, and it is not possible to estimate coefficients of such variables in “within” estimation.
• Third, the “within” estimation does not report the estimated fixed effects.
LSDV Regression

 The least squares dummy variable (LSDV) regression is an OLS regression that includes a set of dummy variables for the cross section units within the fixed effects framework.
 In many cases, the unobserved characteristics of the cross section units may be of interest to the researchers.
 But the within-group method does not estimate the unobserved fixed effects: by construction, they are swept out of the model.
LSDV Regression

 To estimate the fixed effects, we can treat the unobserved fixed effects as
the coefficients of the binary variables representing the cross section units
➢ The least squares dummy variable (LSDV) model provides the fixed effects
estimators along with the slope parameter.
➢ We also get estimates for the μi
➢ The LSDV estimator is practical only when N is small
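A sketch of the LSDV regression in equation form, assuming a single regressor as in equation (i). The symbols $D_{j,it}$ and $\alpha_j$ are notation introduced here for illustration: $D_{j,it}$ is the dummy equal to 1 when observation (i, t) belongs to unit j and 0 otherwise, and $\alpha_j$ is its coefficient.

```latex
y_{it} = \sum_{j=1}^{N} \alpha_j D_{j,it} + \beta_1 x_{it} + \varepsilon_{it},
\qquad \alpha_j = \beta_0 + \mu_j
```

With all N dummies included the common constant is dropped, so N + 1 coefficients are estimated. This is why the approach becomes impractical when N is large.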
Random Effects Model

 In LSDV, there is a possibility of a loss of degrees of freedom, because a separate parameter is estimated for each cross section unit.
 The loss of degrees of freedom can be avoided if the unobserved effect $\mu_i$ is assumed to be random.
 If the unobserved effects are random, the error component model is a random effects model.
 The random effect of the unobserved heterogeneity is captured by the distribution of the intercepts.
 In the random effects model, there are more degrees of freedom because we do not need to estimate the parameters describing the cross section-specific or time-specific unobserved effects.
Random Effects Model

 The random effects model is an appropriate specification when the cross


section units in a panel are drawn randomly from a large population.
 Such type of sampling is more relevant for micro panel.
 The variation of unobserved effects across entities is assumed to be
random and uncorrelated with the independent variables included in the
model
Assumptions in Random Effect Model

For i ≠ j and t ≠ s

The components of the error term are not correlated, i.e.

$$E(\mu_i \varepsilon_{it}) = 0$$
Assumptions in Random Effect Model

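 The $\mu_i$ are independent of the error term $\varepsilon_{it}$ and the regressors $x_{it}$, for all i and t.
 Therefore, the mean and variance of the composite error are
$$E(u_{it}) = 0, \qquad V(u_{it}) = V(y_{it}) = \sigma_y^2 = \sigma_\mu^2 + \sigma_\varepsilon^2$$

 $\sigma_\mu^2$ and $\sigma_\varepsilon^2$ are called the variance components of $\sigma_y^2$.

 The random effects model is also known as the variance components model.

 The covariance of the composite error across different cross section units (i ≠ j) is

$$\mathrm{Cov}(u_{it}, u_{js}) = E(u_{it} u_{js}) = 0$$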
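For completeness, a sketch of the full covariance structure of the composite error $u_{it} = \mu_i + \varepsilon_{it}$. It assumes the standard random effects setup, in which $\mu_i$ and $\varepsilon_{it}$ are mutually uncorrelated, each with constant variance and no correlation across units or over time. The symbol $\rho$ is introduced here for illustration.

```latex
\mathrm{Cov}(u_{it}, u_{js}) =
\begin{cases}
\sigma_\mu^2 + \sigma_\varepsilon^2 & \text{if } i = j \text{ and } t = s, \\
\sigma_\mu^2 & \text{if } i = j \text{ and } t \neq s, \\
0 & \text{if } i \neq j,
\end{cases}
\qquad
\rho = \mathrm{Corr}(u_{it}, u_{is}) = \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_\varepsilon^2}
\quad (t \neq s)
```

The errors of the same unit are therefore correlated over time through the shared $\mu_i$, even though errors of different units are not.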
