Ecotrics (PR) Panel Data 2

1. The document discusses panel data models, which involve observing multiple variables over multiple time periods for individual units like firms.
2. A key issue is whether unobserved productivity factors that are constant over time for each firm, like managerial quality, are correlated with observable inputs in the model.
3. The document outlines different assumptions that can be made about this correlation and how they determine whether to use a fixed effects, random effects, or pooled regression model.

Uploaded by

Arka Das
Copyright © All Rights Reserved
Panel Data

Examples of Panel Data

Suppose that the population is all manufacturing firms in a country operating during a given three-year period. The production function describing output in the population of firms is

log(output_t) = δ_t + β_1 log(labour_t) + β_2 log(capital_t) + β_3 spillover_t + quality + u_t

Here spillover is a measure of foreign firm concentration in the region containing the firm.

The term quality contains unobserved factors, such as unobserved managerial or worker quality, which affect productivity and are constant over time.

The error term u represents the unobserved shocks in each time period. The parameters δ_t are separate intercepts for each time period, which allow aggregate productivity to change over time.

The coefficients of the regressors are assumed to be constant.

The important issue in this panel model is whether the unobserved productivity factors are correlated with the observable inputs.

Another issue is whether we can assume that, at any time point t, the spillover effect is uncorrelated with the error terms of all time periods.

This is known as the problem of endogeneity.

For panel data it is always useful to add a subscript i to indicate the cross-section observation.

In this example i represents a randomly sampled firm.

Let us rewrite the empirical model:

log(output_it) = δ_t + β_1 log(labour_it) + β_2 log(capital_it) + β_3 spillover_it + quality_i + u_it

Here the explanatory variable quality_i does not vary over time; it varies only across cross-section units, that is, across firms in this example. Thus quality_i has the same effect in each time period, while the term u_it varies over time as well as across cross-section units.

Each firm is randomly chosen from the population of all manufacturing firms (in this example).

Thus i indexes the cross-section unit and t indexes time in a panel regression.
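
As a minimal sketch of this indexing, a panel can be stored in "long" format with one record per (i, t) pair. The firm labels, the number of periods, and the field names below are hypothetical and are not taken from these notes.

```python
# Hypothetical long-format panel: one record per (firm i, period t).
firms = ["A", "B", "C"]      # cross-section index i (made-up labels)
periods = [1, 2, 3]          # time index t (a three-period panel)

# Each record would also carry the observed variables (output, labour, ...);
# they are left out here because no actual data are given in the notes.
panel = [{"i": i, "t": t} for i in firms for t in periods]

# Count the records per firm: a balanced panel has the same T for every i.
obs_per_firm = {i: sum(1 for row in panel if row["i"] == i) for i in firms}
is_balanced = all(n == len(periods) for n in obs_per_firm.values())

print(len(panel), obs_per_firm, is_balanced)
```

With N firms and T periods the stacked data set has N×T records, which is the sample that a pooled regression uses.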

Example (Panel Data Model)

Suppose that for each cross-section unit we observe data on the same set of variables for T time periods.

Let x_t be a vector of k exogenous variables which affect y_t. At any time point t the population model is

y_t = x_t β + u_t, t = 1, 2, …, T

Now consider a simple equation describing saving behaviour over a five-year span:

saving_t = β_0 + β_1 income_t + β_2 age_t + β_3 education + u_t

where
income: annual income
education: years of education of the head of the household
age: age of the head of the household

This is an example of a linear static panel data model. It is a static model because all explanatory variables are dated contemporaneously with saving in period t.

In the panel data model we observe saving and the explanatory variables over a five-year period.

The ordinary least squares method that you learnt in your last class, and its extensions, mainly assume that the explanatory variables are exogenous, that is, uncorrelated with the random error term.

What kind of exogeneity assumption about the explanatory variables do we use for panel data analysis?

One possibility is to assume that u_t and x_t are orthogonal in the conditional mean sense:

E(u_t | x_t) = 0, t = 1, 2, …, T

We call this contemporaneous exogeneity of x_t because it only restricts the relationship between the disturbance and the explanatory variables in the same time period.

It is important to distinguish the assumption E(u_t | x_t) = 0 from the assumption E(u_t | x_1, x_2, …, x_T) = 0.

E(u_t | x_t) = 0 does not place any restriction on the correlation between x_s and u_t for s not equal to t, whereas E(u_t | x_1, x_2, …, x_T) = 0 implies that u_t is uncorrelated with the explanatory variables in all time periods. The latter assumption is called strict exogeneity of the explanatory variables.
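
The difference between the two assumptions can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the notes: two periods, standard normal draws, and a feedback coefficient of 0.8 from the period-1 shock to the period-2 regressor.

```python
import random

rng = random.Random(0)
N = 20000  # number of simulated cross-section units (assumed)

def corr(a, b):
    """Sample correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

# Period-1 shock and period-1 regressor are drawn independently.
u1 = [rng.gauss(0, 1) for _ in range(N)]
x1 = [rng.gauss(0, 1) for _ in range(N)]
# Feedback (an assumption of this sketch): x in period 2 reacts to the period-1 shock.
x2 = [0.8 * u + rng.gauss(0, 1) for u in u1]

same_period = corr(x1, u1)   # near zero: contemporaneous exogeneity holds
next_period = corr(x2, u1)   # clearly nonzero: strict exogeneity fails
print(round(same_period, 3), round(next_period, 3))
```

In this design E(u_1 | x_1) = 0 holds, but u_1 is correlated with x_2, so E(u_1 | x_1, x_2) = 0 does not.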

The primary motivation for using panel data is to solve the omitted variables problem. We now consider a time-constant unobserved effect (like quality in our first example).

We assume that unobserved effects are random variables.

The key question is: is the unobserved effect uncorrelated with the explanatory variables?

Let y and x_1, x_2, …, x_K be observable random variables and let c be an unobservable random variable. We are often interested in the partial effect of an observable explanatory variable x_j on the dependent variable in the conditional expectation

E(y | x_1, x_2, …, x_K, c)

We would like to hold c constant when obtaining the partial effects of the observable explanatory variables.

Assuming a linear model with c entering additively along with the x_j, we have

E(y | x_1, x_2, …, x_K, c) = β_0 + xβ + c

If c is uncorrelated with each x_j, that is, cov(x_j, c) = 0, then c is just another unobserved factor affecting y that is not systematically related to the observable explanatory variables whose effects are of interest. On the other hand, if cov(x_j, c) ≠ 0 for some j, putting c into the error term can cause serious problems.

Without additional assumptions we cannot consistently estimate β, nor will we be able to determine whether there is a problem.
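
For the special case of a single regressor x, with c left in the error term and cov(x, u) = 0, the inconsistency can be written out explicitly. This is the standard omitted-variable result, added here for illustration rather than taken from the notes:

```latex
\operatorname{plim}\ \hat{\beta}_{OLS}
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
  = \beta + \frac{\operatorname{cov}(x, c)}{\operatorname{var}(x)}
```

The second term vanishes only when cov(x, c) = 0, so the sign and size of the bias depend on how c is related to x.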

There are ways to address the problem:

1. Find a proxy variable for c, so that the part of c left in the error term is uncorrelated with x.
2. Find an instrument for x, that is, a variable correlated with x but uncorrelated with c.

One of the important assumptions of the panel data model is that the unobserved effect c is constant over time. These effects are often interpreted as features of an individual or firm, such as cognitive ability, motivation, or early family upbringing, that are given and do not change over time.

Here we will study different estimation methods:

1. Pooled regression
2. Fixed effects
3. Random effects

The simplest way of classifying these models is in terms of the unobserved effects.

Consider the linear panel model

y_it = β_0 + x_it β + c_i + u_it, t = 1, 2

y_it = x_it β + v_it

By definition, E(u_t | x_t, c) = 0.

If we assume that the unobserved effects are present and that they are uncorrelated with the explanatory variables in all time periods, that is, if cov(x_it, c_i) = 0, then the random effects model is used.

If we assume that there is correlation between the unobserved effect and the explanatory variables, that is, cov(x_it, c_i) ≠ 0, then the fixed effects model is used.

The above model can be rewritten as

y_it = x_it β + v_it

where v_it = u_it + c_i, t = 1, 2, …, T, are the composite errors. For each t, v_it is the sum of the unobserved effect and the idiosyncratic error. Pooled OLS estimation is consistent if

E(x_it' v_it) = 0, t = 1, 2, …, T

This condition amounts to assuming E(x_it' u_it) = 0 and E(x_it' c_i) = 0. In that case, apply the pooled regression.

1. FE
2. RE and pooled regression: E(x_it' c_i) = 0

Pooled regression: E(x_it' v_it) = 0

An unobserved time-constant variable is called an unobserved effect in panel data analysis.

If the unit of analysis is the firm, then c contains unobserved firm characteristics, such as managerial quality, that can be viewed as being (roughly) constant over the period in question.

• When will you use panel data analysis?
• What is the data structure of a panel data model?
• What is the basic problem of panel data modelling?
• Write down the panel model.
• What is an unobserved effect?
• How can you use the correlation between the unobserved effect and the exogenous explanatory variables to identify which method of estimation should be followed?

If we were to assume E(x_t' c) = 0, we would apply pooled OLS.

If c is correlated with any element of x_t, then pooled OLS is biased and inconsistent.

Why?

Hint: Endogeneity problem

Explain the endogeneity problem in this case.

Consider the linear panel data model

y_it = β_0 + x_it β + c_i + u_it    (1)

with E(x_it' u_it) = 0 but E(x_it' c_i) ≠ 0.

The elements of x_t, being exogenous, are uncorrelated with u_t.

However, c_i is an unobserved random variable, so we do not know its value.

The model that we actually estimate is

y_it = β_0 + x_it β + v_it

where v_it = u_it + c_i.

Since cov(x_it, c_i) ≠ 0, it follows that cov(x_it, v_it) ≠ 0.

Thus the elements of x_it are no longer exogenous; they are endogenous. This is known as the problem of endogeneity.
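
The argument can be checked numerically. The sketch below simulates model (1) with a single regressor under assumed values that are not in the notes (N = 2000, T = 4, true slope 2, and x_it = 0.5 c_i + noise, so that cov(x_it, c_i) ≠ 0). It compares pooled OLS with the within (demeaning) estimator, a standard way of implementing fixed effects that these notes do not derive.

```python
import random

rng = random.Random(1)
N, T, beta = 2000, 4, 2.0   # assumed panel size and true slope

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x_all, y_all = [], []   # stacked observations for pooled OLS
x_dm, y_dm = [], []     # observations demeaned within each unit
for _ in range(N):
    c = rng.gauss(0, 1)                                   # unobserved effect c_i
    x = [0.5 * c + rng.gauss(0, 1) for _ in range(T)]     # x_it correlated with c_i
    y = [1.0 + beta * xt + c + rng.gauss(0, 1) for xt in x]
    xbar, ybar = sum(x) / T, sum(y) / T
    x_all += x
    y_all += y
    x_dm += [xt - xbar for xt in x]
    y_dm += [yt - ybar for yt in y]

pooled = slope(x_all, y_all)   # picks up cov(x_it, c_i) through v_it
within = slope(x_dm, y_dm)     # c_i is removed by demeaning
print(round(pooled, 3), round(within, 3))
```

Under these assumed parameters the pooled slope converges to 2 + cov(x, c)/var(x) = 2 + 0.5/1.25 = 2.4 rather than 2, while the within estimate stays close to the true value.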

The assumption cov(x_t, u_s) = 0 for all t and s puts no restrictions on the correlation between x_it and the unobserved effect c_i.

If c_i is allowed to be arbitrarily correlated with the elements of x_it, the effect of any variable that is constant across time cannot be distinguished from the effect of c_i.
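
One way to see this is through the within (demeaning) transformation commonly used for fixed effects estimation, which these notes do not derive: subtracting each unit's time average removes c_i, but it also removes every regressor that does not vary over time. The household values below are hypothetical.

```python
def demean(values):
    """Subtract the unit's time average from each observation."""
    avg = sum(values) / len(values)
    return [v - avg for v in values]

# Hypothetical five-year record for one household.
education = [12, 12, 12, 12, 12]            # constant over time
income = [30.0, 32.0, 35.0, 33.0, 40.0]     # varies over time

education_dm = demean(education)   # every entry becomes zero
income_dm = demean(income)         # within-household variation survives
print(education_dm)
print(income_dm)
```

After the transformation the time-constant regressor has no variation left, so its coefficient cannot be estimated separately from c_i.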

Assumptions about the unobserved effects and explanatory variables:

The basic unobserved effects model (UEM) can be written, for a randomly drawn cross-section observation i, as

y_it = β_0 + x_it β + c_i + u_it, t = 1, 2, …, T

where x_it is 1×k and can contain observable variables that change across t but not i.

c_i: unobserved effect. If i indexes individuals, then c_i is called the individual effect or individual heterogeneity.

u_it is called the idiosyncratic error or idiosyncratic disturbance because it changes across t as well as across i.

In the text you will find that c_i is treated either as a random effect or as a fixed effect.

In modern econometric analysis, “random effect” is synonymous with zero correlation between the observed explanatory variables and the unobserved effect:

cov(x_it, c_i) = 0

In applied papers, when c_i is referred to as, say, an “individual random effect”, then c_i is probably being assumed to be uncorrelated with x_it.

In microeconometric applications, the term “fixed effect” does not usually mean that c_i is being treated as non-random; rather, it means that one is allowing for arbitrary correlation between the unobserved effect c_i and the observed explanatory variables x_it.

That is, cov(x_it, c_i) may be nonzero.

We will discuss two different estimation methods: random effects and fixed effects estimation.

Example:

A standard model for estimating the effects of job training or other programmes on subsequent wages is

log(wage_it) = θ_t + z_it γ + δ_1 programme_it + c_i + u_it

where i indexes individuals, t indexes time periods, and θ_t is a time-varying intercept.

z_it is a set of observable characteristics that affect the wage and may also be correlated with programme participation.

Suppose at t = 1 no one has participated in the programme, implying programme_i1 = 0 for all i.

Then a subgroup is chosen to participate in the programme (or individuals choose to participate), and subsequent wages are observed for the control and the treatment groups in t = 2.

If individuals choose whether or not to participate in the programme, that choice could be correlated with ability. This possibility is often called the self-selection problem.

Administrators might also assign participants based on characteristics that econometricians cannot observe.

We may feel comfortable assuming that u_it is uncorrelated with programme_it. However, future programme participation could depend on u_it if people choose to participate in the future based on past shocks to their wage, or if administrators choose as participants at time t+1 people who had a low u_it. Such feedback may not be important, since c_i is being allowed for, but it could be.
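
The self-selection problem in the two-period setup can be sketched with a simulation. Everything numeric here is an assumption for illustration (a true effect δ_1 = 0.10, the intercepts, the variances, and the rule that lower-ability people tend to enrol), and the observable characteristics z_it are left out. First differencing is used as one standard way of removing c_i with two periods; it is not derived in these notes, and it relies on participation not depending on u_i1.

```python
import random

rng = random.Random(2)
N = 20000                      # assumed number of individuals
delta1 = 0.10                  # assumed true programme effect on log wage
theta = {1: 1.00, 2: 1.05}     # assumed period intercepts

def mean(values):
    return sum(values) / len(values)

wage2 = {0: [], 1: []}   # period-2 log wages by participation status
dwage = {0: [], 1: []}   # wage changes (t=2 minus t=1) by participation status
for _ in range(N):
    c = rng.gauss(0, 0.5)                           # unobserved ability c_i
    # Self-selection (assumed rule): low-ability people tend to enrol.
    prog2 = 1 if c + rng.gauss(0, 0.5) < 0 else 0
    y1 = theta[1] + c + rng.gauss(0, 0.2)           # programme_i1 = 0 for all i
    y2 = theta[2] + delta1 * prog2 + c + rng.gauss(0, 0.2)
    wage2[prog2].append(y2)
    dwage[prog2].append(y2 - y1)

naive = mean(wage2[1]) - mean(wage2[0])   # period-2 comparison, contaminated by c_i
fd = mean(dwage[1]) - mean(dwage[0])      # differencing removes c_i and the intercepts
print(round(naive, 3), round(fd, 3))
```

In this design the period-2 comparison of treated and control wages is strongly negative because participants have lower ability on average, while the comparison of wage changes is close to the assumed effect of 0.10.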

Estimating the unobserved effects model by pooled OLS

Consider the linear panel model

y_it = β_0 + x_it β + v_it, t = 1, 2, …, T

where v_it = u_it + c_i, t = 1, 2, …, T, are the composite errors. For each t, v_it is the sum of the unobserved effect and the idiosyncratic error. Pooled OLS estimation is consistent if

E(x_it' v_it) = 0, t = 1, 2, …, T

which amounts to assuming E(x_it' u_it) = 0 and E(x_it' c_i) = 0, t = 1, 2, …, T.

Even if the above assumption holds, the composite errors will be serially correlated due to the presence of c_i in each time period. Therefore, inference using pooled OLS requires robust test statistics.

Since v_it depends on c_i for all t, the correlation between v_it and v_is does not generally decrease as the distance |t - s| increases. Therefore, it is important that we be able to do large-N, fixed-T asymptotics when applying pooled OLS.
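
The persistence of this correlation can be illustrated with a short simulation. The sketch assumes, beyond what the notes state, that c_i and u_it are independent with unit variances and that u_it is serially uncorrelated, in which case corr(v_it, v_is) = σ_c² / (σ_c² + σ_u²) = 0.5 for every t ≠ s.

```python
import random

rng = random.Random(3)
N, T = 20000, 5   # assumed panel dimensions

def corr(a, b):
    """Sample correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

# v[t] holds the composite errors v_it = c_i + u_it for period t + 1.
v = [[] for _ in range(T)]
for _ in range(N):
    c = rng.gauss(0, 1)                      # unobserved effect, same in every period
    for t in range(T):
        v[t].append(c + rng.gauss(0, 1))     # add the idiosyncratic error u_it

corr_adjacent = corr(v[0], v[1])   # |t - s| = 1
corr_distant = corr(v[0], v[4])    # |t - s| = 4
print(round(corr_adjacent, 3), round(corr_distant, 3))
```

Both correlations are close to 0.5 under these assumptions, so the dependence does not die out with the time gap, which is why standard errors that ignore it can be misleading.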

If all explanatory variables are dated contemporaneously with the dependent variable y_t, the model is static.
