Pooled Cross-Section Time Series Data

The document discusses panel or longitudinal data, which involves observing unique cross-sectional units over multiple time periods. It provides examples of how to model panel data using individual fixed effects and time dummies to control for unobserved heterogeneity. A key challenge is that unobserved time-invariant characteristics of individuals may be correlated with the independent variables, biasing ordinary least squares estimates. The difference-in-differences estimator is presented as a way to estimate treatment effects using panel data.

Uploaded by

thiagarage

Pooled Cross-Section Time Series Data
Wooldridge Chapters 13 and 14

Types of Data
• Pooled Cross Sections: independent cross-section data at different points in time.
• Panel / Longitudinal: uniquely identified cross-section units (i) followed over time.
  • Balanced Panel: all i appear in every period.
  • Unbalanced Panel: some i are missing for some time periods.

Example: Two Period Panel Data
N=4, T=2
i   t   Consumption (Y)   Income (X)
1   1   72                98
1   2   75                102
2   1   31                40
2   2   26                39
3   1   55                66
3   2   62                70
4   1   41                59
4   2   45                60

Yit = B0 + B1Xit + eit

B1 = 0.75 (pooled OLS on the eight observations above), but how to interpret?
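As a quick check on the pooled slope, the least-squares formula can be applied directly to the eight observations in the example table. The sketch below is illustrative Python rather than part of the original Stata workflow, and the helper name `ols` is hypothetical. With the table values as transcribed here, the slope comes out at roughly 0.75.

```python
# Pooled OLS for the N=4, T=2 example (values copied from the example table).
y = [72, 75, 31, 26, 55, 62, 41, 45]      # consumption, Y
x = [98, 102, 40, 39, 66, 70, 59, 60]     # income, X

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

b0, b1 = ols(x, y)
print(round(b0, 3), round(b1, 3))
```

The pooled slope mixes variation across individuals and variation over time, which is why its interpretation is ambiguous.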

[Figure: scatter plot of y against x with the pooled OLS fitted values]

Interpreting Coefficients
• Yit = B0 + B1Xit + eit
• Change in Y across individuals at time t: $B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{it} - Y_{jt}}{X_{it} - X_{jt}}$
• Change in Y over time for a given individual: $B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{i,t+1} - Y_{it}}{X_{i,t+1} - X_{it}}$

Use intercept dummies to differentiate between “time” and “type” effects
• Time Dummies: the effect of being in time period 2 vs. time period 1 on the expected value of Yit, holding all else constant.
• Type Dummies: the effect of being of type B vs. type A on the expected value of Yit, holding all else constant.

Time Dummies
• Let D2,t = 0 if t = 1, and D2,t = 1 if t = 2.
$Y_{it} = B_0 + \delta_T D_{2,t} + e_{it}$
$\delta_T = \bar{Y}_2 - \bar{Y}_1$, where $\bar{Y}_2$ is the mean at time 2 across all i.

Example: Two Period Panel Data with Time Dummy
i   t   DT   Yit   Xit
1   1   0    72    98
1   2   1    75    102
2   1   0    31    40
2   2   1    26    39
3   1   0    55    66
3   2   1    62    70
4   1   0    41    59
4   2   1    45    60

Time Dummy Example
sum y if t==1

    Variable |   Obs    Mean
-------------+---------------
           y |     4   49.75

sum y if t==2

    Variable |   Obs    Mean
-------------+---------------
           y |     4      52

reg y dt
Coeff = 52 - 49.75 = 2.25

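The equality between the time-dummy coefficient and the difference in period means can be verified by hand. A minimal Python sketch (illustrative only; the slides themselves use Stata, and the helper `ols` is a hypothetical name) regresses y on the dummy and compares the slope with the gap in means:

```python
# y and the period-2 dummy for the N=4, T=2 example, in table order.
y = [72, 75, 31, 26, 55, 62, 41, 45]
dt = [0, 1, 0, 1, 0, 1, 0, 1]

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

mean_t1 = sum(yi for yi, d in zip(y, dt) if d == 0) / dt.count(0)
mean_t2 = sum(yi for yi, d in zip(y, dt) if d == 1) / dt.count(1)
intercept, delta_t = ols(dt, y)
print(mean_t1, mean_t2, delta_t)  # 49.75 52.0 2.25
```

The intercept equals the period-1 mean, and the dummy coefficient equals the period-2 mean minus the period-1 mean.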
The time dummy represents the shift of the regression line from period 1 to period 2 when Yit is regressed on the dummy along with Xit.
[Figure: scatter plot of y against x with fitted lines for t=1 and t=2; the vertical shift between the lines is labeled δT = 2.25]

Type Dummies
• Separate the cross-sectional dimension of the sample into qualitative “types” (e.g. male vs. female, rural vs. urban, foreign vs. domestic, treatment vs. control).
• Let DiB = 1 if individual i is Type B, and DiB = 0 otherwise.
$Y_{it} = B_0 + \delta_B D_{iB} + e_{it}$, with $\delta_B = \bar{Y}_B - \bar{Y}_A$
• When Xit is included in the regression, δB represents the shift in the intercept.

Example: Two Period Panel Data with Type Dummy
i   t   Type   DB   Yit   Xit
1   1   A      0    72    98
1   2   A      0    75    102
2   1   B      1    31    40
2   2   B      1    26    39
3   1   B      1    55    66
3   2   B      1    62    70
4   1   A      0    41    59
4   2   A      0    45    60

From Simple Example
reg y db
           y |    Coef.   Std. Err.       t
          db |   -14.75    12.51582   -1.18
       _cons |    58.25    8.850024    6.58

sum y if db==1
    Variable |   Obs    Mean
           y |     4    43.5

sum y if db==0
    Variable |   Obs    Mean
           y |     4   58.25

Coefficient = difference in means = 43.5 - 58.25 = -14.75

The type dummy represents the shift of the regression line from Type A to Type B when Yit is regressed on the dummy along with Xit.
[Figure: scatter plot of y against x with fitted lines for type=A and type=B; the vertical shift between the lines is labeled δB = -14.75]

Difference-in-Differences Estimator
• Estimates the difference across types, and over time, using a simple dummy variable framework.
• Excellent for policy analysis. Takes advantage of the “natural experiment” quality of panel data.
• Can be expanded beyond the two period framework.
• Examples: stadium construction, natural disaster, water treatment facility, tax cuts.

Use interaction term between type and time dummies.
$Y_{it} = B_0 + B_1 X_{it} + \delta_0 D_{2,t} + \delta_1 D_{iB} + \delta_{DD} D_{2,t} D_{iB} + e_{it}$
$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{A,2}) - (\bar{Y}_{B,1} - \bar{Y}_{A,1})$
(first term: difference “after”; second term: difference “before”)

Difference Coefficient
• Also known as the “Average Treatment Effect”.
• Can also be written as
$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$
(first term: treatment impact on ‘treated’; second term: treatment impact on control group)

D-in-D example
i   t   Type   DB   D2T   DB*D2T   Yit   Xit
1   1   A      0    0     0        72    98
1   2   A      0    1     0        75    102
2   1   B      1    0     0        31    40
2   2   B      1    1     1        26    39
3   1   B      1    0     0        55    66
3   2   B      1    1     1        62    70
4   1   A      0    0     0        41    59
4   2   A      0    1     0        45    60

From simple example
reg y db d2 dd
           y |    Coef.   Std. Err.       t
-------------+-------------------------------
          db |    -13.5     21.6015   -0.62
          d2 |      3.5     21.6015    0.16
          dd |     -2.5    30.54914   -0.08
       _cons |     56.5    15.27457    3.70

Mean of y for type b when t=2: 44.00
Mean of y for type a when t=2: 60.00
Mean of y for type b when t=1: 43.00
Mean of y for type a when t=1: 56.50

Coefficient = (44.00 - 60.00) - (43.00 - 56.50) = (-16) - (-13.5) = -2.5

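The difference-in-differences arithmetic can be reproduced from the four cell means alone. The following Python sketch is illustrative (the slides use Stata; the variable names are hypothetical) and shows that both ways of writing the estimator give the same number, and that the other regression coefficients are also simple functions of the cell means:

```python
# (type, period) -> outcomes, copied from the D-in-D example table.
cells = {
    ("A", 1): [72, 41], ("A", 2): [75, 45],
    ("B", 1): [31, 55], ("B", 2): [26, 62],
}
m = {k: sum(v) / len(v) for k, v in cells.items()}

# Difference across types "after" minus difference across types "before".
dd_across = (m["B", 2] - m["A", 2]) - (m["B", 1] - m["A", 1])
# Change over time for type B minus change over time for type A.
dd_over_time = (m["B", 2] - m["B", 1]) - (m["A", 2] - m["A", 1])

cons = m["A", 1]                 # _cons: type A in period 1
db = m["B", 1] - m["A", 1]       # db: type gap in period 1
d2 = m["A", 2] - m["A", 1]       # d2: change over time for type A
print(cons, db, d2, dd_across)   # 56.5 -13.5 3.5 -2.5
```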
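How much more did the treatment group (B) outcome increase than the control group (A) outcome from time 1 to time 2?
$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$
[Figure: group means $\bar{Y}_{A,1}$, $\bar{Y}_{A,2}$, $\bar{Y}_{B,1}$, $\bar{Y}_{B,2}$ plotted at t=1 and t=2]
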
Panel Data Problem! Unobserved Heterogeneity
• There exist characteristics of each individual that persist over time and cannot be included in the regression (they are unobservable in the available data), but which nonetheless affect the observed variation in our dependent variable.

Composite Errors
• These time-invariant unobserved effects are best modeled as a component in the regression error term.
• It is this “composite error” approach that sets apart panel regression from OLS.

Examples
• Unobservable motivational skills of firm manager in a production function.
• Skills, charisma, connections, nepotism in a wage model.
• Levels of unobserved macro-level institutional corruption or inefficiency in a cross-sectional growth model.

The Composite Error Model
• Yit = B0 + B1Xit + vit
• Where vit = uit + ai is the composite error, and…
• uit is the random, time-varying idiosyncratic error.
• ai is the time invariant error component.

The Composite Error Problems
• 1.) If COV(ai, Xit) ≠ 0, then OLS estimates will be biased.
• Very much like simultaneous equations (endogeneity) bias, but here covariance with the error term will only involve cross sectional variation.

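Composite Error Bias
$Y_{it} = B_0 + B_1 X_{it} + a_i + u_{it}$
$\hat{B}_1 = B_1 + \frac{\sum (X_{it} - \bar{X})(a_i + u_{it})}{\sum (X_{it} - \bar{X})^2}$
$E(\hat{B}_1) = B_1 + \frac{E[\sum (X_{it} - \bar{X}) a_i]}{\sum (X_{it} - \bar{X})^2} + \frac{E[\sum (X_{it} - \bar{X}) u_{it}]}{\sum (X_{it} - \bar{X})^2}$
In large samples (dividing numerator and denominator by the number of observations):
$E(\hat{B}_1) \approx B_1 + \frac{COV(X_{it}, a_i)}{VAR(X_{it})} + \frac{COV(X_{it}, u_{it})}{VAR(X_{it})}$
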
Examples:
1. Manager charisma correlated with firm size in production function.
2. Nepotism/networking correlated with education in wage equation.
3. Institutional quality associated with development in corruption equation.

• 2.) Since ai represents a time-invariant component of the error, composite errors will be correlated over time. Serial Correlation is the result:
Corr(vit, vi,t+s) ≠ 0
• Serial correlation by itself does not bias the coefficient estimates, but goodness of fit and significance of coefficients will be overstated.

How to deal with the Composite Error problem?
• Pooled OLS: do nothing about it.
• First Difference: eliminate ai.
• Dummy Variables: estimate the ai when N is small.
• Fixed Effects: remove ai by time de-meaning when N is large.
• Random Effects: account for serial correlation.

First Difference Transformation (two period panel) with Time dummy
$Y_{it} = B_0 + \delta_0 DT_t + B_1 X_{it} + a_i + u_{it}$
• For Period 2: $Y_{i2} = (B_0 + \delta_0) + B_1 X_{i2} + a_i + u_{i2}$
• For Period 1: $Y_{i1} = B_0 + B_1 X_{i1} + a_i + u_{i1}$
• First Difference: $\Delta Y_i = Y_{i2} - Y_{i1}$
$\Delta Y_i = \delta_0 + B_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$
$\Delta Y_i = \delta_0 + B_1 \Delta X_i + \Delta u_i$

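To make the first-difference transformation concrete, the sketch below applies it to the N=4, T=2 example data used earlier in the deck. This is illustrative Python, not the author's Stata code, and the helper `ols` is a hypothetical name. The intercept of the differenced regression plays the role of δ0 (the time dummy coefficient) and the slope is the first-difference estimate of B1. With only four differenced observations the numbers are purely for illustration.

```python
# (i, Y_i1, X_i1, Y_i2, X_i2) from the two period example table.
panel = [(1, 72, 98, 75, 102), (2, 31, 40, 26, 39),
         (3, 55, 66, 62, 70), (4, 41, 59, 45, 60)]

dy = [y2 - y1 for _, y1, _, y2, _ in panel]   # delta Y_i
dx = [x2 - x1 for _, _, x1, _, x2 in panel]   # delta X_i

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

delta0, b1_fd = ols(dx, dy)
print(dy, dx)                              # [3, -5, 7, 4] [4, -1, 4, 1]
print(round(delta0, 3), round(b1_fd, 3))   # -1.194 1.722
```

The ai terms never enter the calculation because they cancel when the period-1 equation is subtracted from the period-2 equation.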
First Difference
• Transformation eliminates ai terms.
• Corrects for heterogeneity bias and serial correlation.
• Problems:
  • 1. Eliminates all time invariant variables (type dummies).
  • 2. Eliminates the time dimension in a two period panel (reduces T by 1 in general).

“Type” Dummy Variables for each i
• If ai terms are viewed as coefficients to be estimated, a dummy can be constructed that uniquely identifies each individual in the sample.
• The dummy coefficient will represent the effect of the sum of all unobserved attributes.

Type Dummies
• Solves the ‘time invariant bias’ problem by removing ai from the error component, and directly estimating the effects.
• Obvious problem is that degrees of freedom are vastly reduced. Requires a large number of time periods relative to cross sectional units.

Example: 4 country panel over 250 months
Step 1: append the separate country data files:
  use c:/stata627/nfa/canada.dta
  append using c:/stata627/nfa/italy.dta
  append using c:/stata627/nfa/japan.dta
  append using c:/stata627/nfa/uk.dta
  tsset code time

Dummy Example – estimates of ai
xi:reg y cpi r er i.code
i.code _Icode_1-5 (naturally coded; _Icode_1 omitted)

Number of obs = 990

Prob > F = 0.0000
R-squared = 0.7464
Adj R-squared = 0.7448
------------------------------------------------------
y | Coef. Std. Err. t P>|t|
-------------+----------------------------------------
cpi | .3817633 .0117365 32.53 0.000
r | -.4944136 .0780945 -6.33 0.000
er | -.0196729 .0014589 -13.49 0.000
_Icode_3 | 26.41765 2.053128 12.87 0.000
_Icode_4 | -12.51685 .6298041 -19.87 0.000
_Icode_5 | -1.729212 .5753217 -3.01 0.003
_cons | 67.36739 1.632653 41.26 0.000
-------------------------------------------------------
• Code 1 = Canada, omitted
• Code 3 = Italy, positive estimate of ai
• Code 4 = Japan, negative ai
• Code 5 = UK, negative ai

Fixed Effects
• Assume CORR(ai, Xit) ≠ 0, but CORR(uit, Xit) = 0.
• An alternative to the first difference transformation is the “Time De-meaning” transformation of the fixed effects model.
• Results in a model essentially identical to the Dummy model, without having to estimate N-1 dummy coefficients.

Fixed Effects Transformation
(1) $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$ (original model)
Average for each individual over time: $\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$
(2) $\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{u}_i$
(2) is the "between" equation; it only shows variation between individuals, taking out the time element.
Subtract the mean from the level each time period:
$y_{it} - \bar{y}_i = (\beta_0 - \beta_0) + \beta_1 (x_{it} - \bar{x}_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$
(3) $(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$, or equivalently $\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$

Fixed Effects Regression is equivalent to running OLS on Equation 3:
(3) $(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$, or equivalently $\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$
$CORR(\ddot{x}_{it}, \ddot{u}_{it}) = 0$, so $\hat{\beta}_1^{FE}$ is unbiased.
This is also known as the “within” estimation equation, as it shows the variation within a group over time.

Fixed Effects Coefficients
• Will have same “two-dimension” interpretation as pooled OLS.
• Variation in the transformed variables is the same as in Yit and Xit.
$B_1 = \frac{\Delta \ddot{Y}_{it}}{\Delta \ddot{X}_{it}} = \frac{\Delta Y_{it}}{\Delta X_{it}}$

Fixed Effects Transformation With Time-Invariant Dummy Independent Variable
(1) $y_{it} = \beta_0 + \beta_1 x_{it} + \delta_0 D_{it} + a_i + u_{it}$
$D_{it} \in \{0, 1\}$, and time invariant for each i ($D_{it} = D_i$)
Average for each individual over time: $\bar{D}_i = \frac{1}{T} \sum_{t=1}^{T} D_{it} = \frac{1}{T} T D_i = D_i$
(2) $\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + \delta_0 D_i + a_i + \bar{u}_i$
$y_{it} - \bar{y}_i = (\beta_0 - \beta_0) + \beta_1 (x_{it} - \bar{x}_i) + \delta_0 (D_i - D_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$
(3) $(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$, or equivalently $\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$
Problem: Both ai and Di are eliminated.

Example: Two Period Panel Data
N=4, T=2
i   t   Yit   Ybar_i   Yit - Ybar_i (de-meaned Yit)
1   1   72    73.5     -1.5
1   2   75    73.5      1.5
2   1   31    28.5      2.5
2   2   26    28.5     -2.5
3   1   55    58.5     -3.5
3   2   62    58.5      3.5
4   1   41    43       -2
4   2   45    43        2

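The time de-meaning shown in the table can be reproduced in a few lines, and the same step applied to X gives the within (fixed effects) slope. This is a hedged Python sketch on the deck's example data; the dictionary layout and variable names are assumptions made for illustration.

```python
# i -> ([Y_i1, Y_i2], [X_i1, X_i2]) from the two period example.
panel = {1: ([72, 75], [98, 102]), 2: ([31, 26], [40, 39]),
         3: ([55, 62], [66, 70]), 4: ([41, 45], [59, 60])}

y_dd, x_dd = [], []   # de-meaned ("double dot") series
for i in sorted(panel):
    ys, xs = panel[i]
    ybar, xbar = sum(ys) / len(ys), sum(xs) / len(xs)
    y_dd += [v - ybar for v in ys]
    x_dd += [v - xbar for v in xs]

# OLS through the origin on de-meaned data: the within estimator of the slope.
b1_fe = sum(a * b for a, b in zip(x_dd, y_dd)) / sum(a * a for a in x_dd)
print(y_dd)             # [-1.5, 1.5, 2.5, -2.5, -3.5, 3.5, -2.0, 2.0]
print(round(b1_fe, 3))  # 1.441
```

No intercept is needed because every de-meaned series averages to zero within each individual.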
Goodness of Fit
• A fixed effects regression returns three “R-square” measures. Each is actually a squared correlation between predicted and observed values:
  • 1. Within R2: fitted de-meaned yit
  • 2. Between R2: fitted y_bari
  • 3. Overall R2: fitted yit (pooled OLS)

Panel Regressions in Stata
• XT = cross-section time series.
• “xtreg y x, fe” will run a panel fixed effects regression.
• Must declare your “i” and “t” identifiers:
  • tsset code time, for example.
• Unfortunately, Stata refers to the time-invariant error component (our ai) as u_i.

Fixed Effects Stata Example
xtreg y cpi r er,fe

Fixed-effects (within) regression Number of obs = 990

Group variable (i): code Number of groups = 4

R-sq: within = 0.7071 Obs per group: min = 244

between = 0.0335 avg = 247.5
overall = 0.1827 max = 250

F(3,983) = 791.14
corr(u_i, Xb) = -0.7495 Prob > F = 0.0000

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cpi | .3817633 .0117365 32.53 0.000 .3587318 .4047948
r | -.4944136 .0780945 -6.33 0.000 -.6476647 -.3411625
er | -.0196729 .0014589 -13.49 0.000 -.0225358 -.0168101
_cons | 70.49544 1.529625 46.09 0.000 67.49374 73.49715
-------------+----------------------------------------------------------------
sigma_u | 16.538008 (std. error of time-invariant error)
sigma_e | 6.3818613 (std. error of idiosyncratic error)
rho | .87038904 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3, 983) = 362.02 Prob > F = 0.0000
Random Effects
• Assumes CORR(ai, Xit) = 0.
• Therefore, OLS coefficients will not suffer the “composite error bias” that was assumed under Fixed Effects.
• We do not need to eliminate the ai terms.
• Although ai terms do not truly have to be “randomly” assigned, there is no structural relationship between ai and Xit in a correctly specified model.

Random Effects
• Even when CORR(ai, Xit) = 0, we still have to account for the serial correlation introduced by the ai error component.
• A “Quasi-demeaned” data transformation is used to accomplish this, wherein ai are altered but not eliminated.
• A bonus is that time-invariant dummies are not eliminated.

Random Effects Assumptions
• 1. $E(a_i | X_{it}) = E(a_i) = 0$
  • independence of ai’s and X’s: cov(ai, Xit) = 0
• 2. $E(u_{it} | X_{it}, a_i) = 0$
• 3. $E(u_{it} u_{is}) = cov(u_{it}, u_{is}) = 0$ for all t ≠ s.
• 4. $E(u_{it}^2 | X_{it}, a_i) = \sigma_u^2$ = constant
• 5. $E(a_i^2 | X_{it}) = Var(a_i) = \sigma_a^2$

Random Effects
• Under the preceding criteria, the composite error does not violate the OLS exogeneity assumption, so the coefficients are not biased.
• Unnecessarily eliminating the ai terms will cause estimates to be inefficient.
• Don’t use Fixed Effects unless warranted.

Random Effects
• However, running Pooled OLS will not be appropriate because the composite errors are still serially correlated over time.
• It can be shown that:
$corr(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s$
Where, again: vit = uit + ai
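As a numerical illustration of this correlation formula, the sketch below plugs in the two standard deviations reported in the fixed effects Stata output earlier in these slides (labeled sigma_u and sigma_e by Stata, which correspond to the deck's ai and uit components). The function name is hypothetical; the result should match the `rho` line of that output.

```python
# Standard deviations copied from the xtreg, fe output shown earlier.
sigma_a = 16.538008   # time-invariant component (Stata's sigma_u)
sigma_u = 6.3818613   # idiosyncratic component (Stata's sigma_e)

def composite_error_corr(sigma_a, sigma_u):
    """corr(v_it, v_is) for t != s when v_it = a_i + u_it."""
    return sigma_a ** 2 / (sigma_a ** 2 + sigma_u ** 2)

rho = composite_error_corr(sigma_a, sigma_u)
print(round(rho, 6))
```

A value this close to 1 means almost all of the composite error variance comes from the time-invariant component, so composite errors for the same unit are highly correlated across periods.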
Random Effects
• The random effects transformation is more complicated than FD or FE, but the basic idea is to eliminate serial correlation in the error term by using information on the variances of the time-invariant and idiosyncratic errors.

Random Effects (RE)
• Transformation results in a weighted average of the estimates provided by the “within” and “between” estimators.

RE Transformation
(1) $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$
Define a weighted average:
$\lambda \bar{y}_i = \lambda \beta_0 + \lambda \beta_1 \bar{x}_i + \lambda a_i + \lambda \bar{u}_i$, then subtract from (1):
$(y_{it} - \lambda \bar{y}_i) = (1 - \lambda) \beta_0 + \beta_1 (x_{it} - \lambda \bar{x}_i) + (1 - \lambda) a_i + (u_{it} - \lambda \bar{u}_i)$
where $\hat{\lambda}_i = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$

Given $\hat{\lambda}_i$ as defined above:
• It can be shown that the composite error term vit, once quasi-demeaned with the weighting term λ (lambda), will NOT suffer from serial correlation:
$Corr(v_{it} - \lambda \bar{v}_i, v_{is} - \lambda \bar{v}_i) = 0$ for t ≠ s

NOTE:
• If var(ai) = 0, meaning ai is always zero (no time-invariant effects), then lambda equals 0 and the RE regression is equivalent to the Pooled OLS equation (1): all lambda-weighted terms drop out.
• As $\sigma_a^2$ dominates $\sigma_u^2$, the ai terms become more important, λ goes to 1, and RE→FE.

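The two limiting cases in the note can be checked numerically. The sketch below assumes the standard square-root form of the weight, λ = 1 − sqrt(σ²u / (T σ²a + σ²u)); the function names and the variance values passed in are hypothetical and chosen only to illustrate the limits.

```python
import math

def re_lambda(var_a, var_u, T):
    """Quasi-demeaning weight: 1 - sqrt(var_u / (T * var_a + var_u))."""
    return 1.0 - math.sqrt(var_u / (T * var_a + var_u))

def quasi_demean(values, lam):
    """Subtract lam times the unit's own time average from each observation."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]

print(re_lambda(0.0, 4.0, 10))    # no time-invariant variance: weight is 0.0 (pooled OLS)
print(re_lambda(1e9, 1.0, 10))    # time-invariant variance dominates: weight close to 1 (FE)
print(quasi_demean([72, 75], 1.0))  # full de-meaning: [-1.5, 1.5]
```

With λ = 0 the data are left untouched, and with λ = 1 the transformation is exactly the fixed effects de-meaning; random effects sits in between.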
RE Stata Example (N=4)
xtreg y cpi r er
Random-effects GLS regression Number of obs = 990
Group variable (i): code Number of groups = 4
R-sq: within = 0.6252 Obs per group: min = 244
between = 0.7702 avg = 247.5
overall = 0.4662 max = 250
Random effects u_i ~ Gaussian Wald chi2(3) = 861.17
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
----------------------------------------------------------
y | Coef. Std. Err. z P>|z|
-------------+--------------------------------------------
cpi | .3468475 .0158341 21.91 0.000
r | .0072637 .1077631 0.07 0.946
er | -.0002592 .0005834 -0.44 0.657
_cons | 61.17895 2.168505 28.21 0.000
-------------+--------------------------------------------
sigma_u | 0
sigma_e | 6.3818613
rho | 0 (fraction of variance due to u_i)

Fixed vs. Random Effects
• As a practical matter, Random Effects is preferred when key explanatory variables are time-invariant.
• The Fixed Effects view is that the unobserved heterogeneity is in itself an explanatory variable that ideally would have a coefficient to be estimated.

Fixed vs. Random Effects
• The Random Effects view is that unobserved heterogeneity is “randomly assigned” to each cross sectional entity and not correlated with other explanatory variables.

When to use FE vs. RE? The Hausman Coefficient Test
• The logic of the test is the following:
  • If CORR(ai, Xit) ≠ 0, then RE is biased.
  • If CORR(ai, Xit) = 0, then both RE and FE are unbiased, but it can be shown that RE is more efficient (smaller standard errors of coefficients).
  • Therefore, if the FE coefficients are significantly different from the RE coefficients, then RE must be biased, so use FE.
  • If FE coefficients are not significantly different from RE, then neither is biased, so use RE.

General Hausman Test
• Test the equality of the vectors of coefficients:
$\hat{\beta}^{FE} = (\hat{\beta}_1^{FE}, \hat{\beta}_2^{FE}, \hat{\beta}_3^{FE})'$, $\hat{\beta}^{RE} = (\hat{\beta}_1^{RE}, \hat{\beta}_2^{RE}, \hat{\beta}_3^{RE})'$
$V(\hat{\beta}^{*E})$ = variance-covariance matrix of the coefficient estimates (FE or RE).
Test statistic:
$H = (\hat{\beta}^{FE} - \hat{\beta}^{RE})' [V(\hat{\beta}^{FE}) - V(\hat{\beta}^{RE})]^{-1} (\hat{\beta}^{FE} - \hat{\beta}^{RE})$
H is distributed Chi-square with k degrees of freedom.

Single Coefficient Version
• If we are primarily interested in a single parameter, there is a t-statistic version of the Hausman test.
• Let B1FE and B1RE be the fixed and random effects coefficients for X1,it:
$t = \frac{B_1^{FE} - B_1^{RE}}{[se(B_1^{FE})^2 - se(B_1^{RE})^2]^{1/2}}$
where t is asymptotically normally distributed.

Note: Hausman Test Problem
• Most of the time the Hausman test works fine, however…
• The test statistic is based on the assumption that RE is more efficient (estimates have a smaller variance) than FE.
• While this can be shown to be asymptotically true, it may not hold for a given sample.
• If this is the case, then the test statistic can be negative, and cannot be interpreted as a Chi-square.
• This is why it is important to type:
  • ‘hausman unbiased efficient’
• Where ‘unbiased’ is the vector of FE coefficients and ‘efficient’ is the vector of RE coefficients.

Hausman Test Interpretation
• H0: FE = RE (difference in coefficients is NOT systematic)
• HA: FE ≠ RE.
• If H > critical value, we reject H0:
  • conclude that since FE ≠ RE,
  • Random Effects is biased, therefore
  • CORR(ai, Xit) ≠ 0, and
  • Fixed Effects is the appropriate model.

Hausman Test in Stata
xtreg y cpi r er,fe
estimates store fe
xtreg y cpi r er
estimates store re
hausman fe re
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| fe re Difference S.E.
----+------------------------------------------------------------
cpi | .3817633 .3468475 .0349158 .
r | -.4944136 .0072637 -.5016774 .
er | -.0196729 -.0002592 -.0194137 .0013371
-----------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 162.38
Prob>chi2 = 0.0000
(V_b-V_B is not positive definite)

Reject H0 in this case, so go with Fixed Effects
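The single-coefficient version of the test can be worked through with the coefficients and standard errors printed in the FE and RE outputs in these slides. The Python sketch below is illustrative; `hausman_t` is a hypothetical helper, not a Stata function. It also shows why two rows of the S.E. column above are blank: for cpi and r the FE standard error is smaller than the RE standard error in this sample, so the variance difference is negative and no square root exists.

```python
import math

# (coefficient, standard error) pairs copied from the FE and RE outputs.
fe_est = {"cpi": (0.3817633, 0.0117365), "r": (-0.4944136, 0.0780945),
          "er": (-0.0196729, 0.0014589)}
re_est = {"cpi": (0.3468475, 0.0158341), "r": (0.0072637, 0.1077631),
          "er": (-0.0002592, 0.0005834)}

def hausman_t(b_fe, se_fe, b_re, se_re):
    """Single-coefficient Hausman statistic; None when se_fe^2 - se_re^2 <= 0."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        return None
    return (b_fe - b_re) / math.sqrt(var_diff)

results = {k: hausman_t(*fe_est[k], *re_est[k]) for k in fe_est}
se_diff_er = math.sqrt(fe_est["er"][1] ** 2 - re_est["er"][1] ** 2)
for name, t in results.items():
    print(name, "undefined (negative variance difference)" if t is None else round(t, 2))
```

For er the computed standard error of the difference agrees with the S.E. column of the Stata table to the printed precision, and the resulting statistic is far from zero, in line with the rejection of H0.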

Lagrange Multiplier Test for Random Effects
• Essentially, this is a derivation of a test for heteroskedasticity in a panel composite error setting, where vit = ai + uit.
• Assume var(uit) is constant, and uit is not correlated with Xit.
• Then any correlation between var(vit) and Xit must be due to the time-invariant error ai.

Stata Note for Panel Regressions
• You will notice that running FE / RE regressions with large N can be time consuming, which is really annoying during the specification search process.
• This is because each regression requires Stata to perform the ‘de-meaning’ transformation for each observation from the original data.

Stata Note
• The ‘xtdata’ command allows you to create a new data set of the transformed variables.
• Running OLS on the transformed variables is equivalent to the transformed FE/RE regression.
• Typing ‘xtdata y x1 x2,fe’ will create a new .dta file with the fixed effect de-meaned values of the specified variables for each observation.

Extensions to Panel Regression
• 1.) 2SLS/IV with panel
  • xtivreg y x1 (x2=z), fe
• 2.) Cluster effects for cross-sectional data.
• 3.) Auto-correlated idiosyncratic errors (uit)

Extension 1: IV Panel
• When an independent variable is endogenous in a panel regression, each stage of the two stage least squares process must take into account the composite error issue.
• i.e. the first stage and second stage will either be RE or FE regressions, depending on which is appropriate.

Yit = B0 + B1Xit + ai + uit
• The fixed effects transformation will address the issue of COV(Xit,ai) ≠ 0.
• But what about when COV(Xit,uit) ≠ 0?

Panel 2SLS
(1) $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$
$CORR(x_{it}, a_i) \neq 0$
$CORR(x_{it}, u_{it}) \neq 0$
Define variable zit such that
$CORR(z_{it}, a_i) \neq 0$ but $CORR(z_{it}, u_{it}) = 0$
Therefore zit will be exogenous in (1), but will require the fixed effects transformation to be used as an effective instrument.

First Stage FE
$\ddot{x}_{it} = \gamma_0 + \gamma_1 \ddot{z}_{it} + e_{it}$, where $\ddot{z}_{it} = (z_{it} - \bar{z}_i)$
$\hat{\gamma}_1^{FE}$ is unbiased.
Save the fitted, transformed values $\hat{\ddot{x}}_{it}$.

Second Stage FE
(1') $\ddot{y}_{it} = \beta_1 \hat{\ddot{x}}_{it} + \ddot{u}_{it}$
$CORR(\hat{\ddot{x}}_{it}, a_i) = 0$ (because of the umlaut)
$CORR(\hat{\ddot{x}}_{it}, u_{it}) = 0$ (because of the hat)
$\hat{\beta}_1^{FE,2SLS}$ will be unbiased.

Extension 2: Cluster Regression
• Allows for a Fixed Effects transformation with single period cross-section data.
• “Cluster-” or “group-” invariant errors replace “time-invariant” errors (ai).
• For example, there may be “within village effects” that will be the same for all households in Village A that differ from Village B.
• Often can be controlled for with “cluster dummy” variables.

Cross Section Cluster Example
Household (i)   Village (j)   Consumption (Yij)   Income (Xij)
1               1             500                 750
2               1             650                 1000
3               1             475                 725
1               2             600                 700
2               2             625                 750
3               2             550                 600
1               3             575                 1100
2               3             625                 1200
3               3             600                 1000

Cluster Regression
• Model:
  Yij = B0 + B1Xij + aj + uij
• Xij = observation for household i in village j
• The analogy to panel structure is that i acts like the time variable, and j acts like the cross-sectional identifier.
• Multiple observations for a given village j.
• aj is the “cluster invariant error” or “village level fixed effect”.

Fixed Effects for Cluster
• Again, if there is correlation between the “cluster-invariant” error (aj) and the independent variables (Xij), then the coefficient estimates will be biased.
• Fixed Effects transformation eliminates the aj by subtracting the cluster mean from each observation.
$(Y_{ij} - \bar{Y}_j) = B_1 (X_{ij} - \bar{X}_j) + (a_j - a_j) + (u_{ij} - \bar{u}_j)$
$\bar{Y}_j$ = village level mean
$\ddot{Y}_{ij} = B_1^{FE} \ddot{X}_{ij} + \ddot{u}_{ij}$

Cluster Effects Transformation
i   j   y     x      ybar_j   y_umlat_ij   xbar_j   x_umlat_ij
1   1   500   750    541.67   -41.67       825      -75
2   1   650   1000   541.67   108.33       825      175
3   1   475   725    541.67   -66.67       825      -100
1   2   600   700    591.67   8.33         683.33   16.67
2   2   625   750    591.67   33.33        683.33   66.67
3   2   550   600    591.67   -41.67       683.33   -83.33
1   3   575   1100   600      -25          1100     0
2   3   625   1200   600      25           1100     100
3   3   600   1000   600      0            1100     -100

Transformed OLS Regression
reg y_umlat x_umlat

Source | SS df MS Number of obs = 9

-------------+------------------------------ F( 1, 7) = 27.86
Model | 17649.2873 1 17649.2873 Prob > F = .0011
Residual | 4434.04639 7 633.435199 R-squared = .7992
-------------+------------------------------ Adj R-squared = .7705
Total | 22083.3337 8 2760.41671 Root MSE = 25.168

------------------------------------------------------------------------
y_umlat | Coef. Std. Err. t P>|t|
-------------+----------------------------------------------------------
x_umlat | .4759358 .0901646 5.28 0.001
_cons | 4.09e-07 8.38938 0.00 1.000
------------------------------------------------------------------------

FIXED EFFECTS
tsset j i
panel variable: j (strongly balanced)
time variable: i, 1 to 3
delta: 1 unit

xtreg y x,fe
Fixed-effects (within) regression Number of obs = 9
Group variable: j Number of groups = 3
within = 0.7992 Obs per group: min = 3
between = 0.0961 avg = 3.0
overall = 0.2517 max = 3

F(1,5) = 19.90
corr(u_i, Xb) = -0.8365 Prob > F = 0.0066
---------------------------------------------------------------------
y | Coef. Std. Err. t P>|t|
-------------+-------------------------------------------------------
x | .4759358 .1066842 4.46 0.007
_cons | 163.978 93.28558 1.76 0.139
-------------+-------------------------------------------------------
sigma_u | 95.865925
sigma_e | 29.779343
rho | .91199744 (fraction of variance due to u_i)
---------------------------------------------------------------------
F test that all u_i=0: F(2, 5) = 9.34 Prob > F = 0.0205
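The transformed OLS regression and the xtreg output report the same slope but different standard errors. A short Python sketch on the nine village observations suggests why: OLS on the de-meaned data counts 9 - 2 = 7 residual degrees of freedom, while the within estimator also accounts for the three village means, leaving 9 - 3 - 1 = 5. The code is illustrative, with hypothetical variable names, and uses only the data from the cluster example table.

```python
import math

# village j -> list of (consumption y, income x), from the cluster example.
villages = {1: [(500, 750), (650, 1000), (475, 725)],
            2: [(600, 700), (625, 750), (550, 600)],
            3: [(575, 1100), (625, 1200), (600, 1000)]}

y_dd, x_dd = [], []   # deviations from village means
for j, obs in villages.items():
    ybar = sum(y for y, _ in obs) / len(obs)
    xbar = sum(x for _, x in obs) / len(obs)
    y_dd += [y - ybar for y, _ in obs]
    x_dd += [x - xbar for _, x in obs]

sxx = sum(x * x for x in x_dd)
sxy = sum(x * y for x, y in zip(x_dd, y_dd))
syy = sum(y * y for y in y_dd)
b1 = sxy / sxx                 # within (fixed effects) slope
rss = syy - b1 * sxy           # residual sum of squares

n, g = len(y_dd), len(villages)
se_naive = math.sqrt(rss / (n - 2) / sxx)      # OLS on de-meaned data, df = 7
se_fe = math.sqrt(rss / (n - g - 1) / sxx)     # within estimator, df = 5
print(round(b1, 7), round(se_naive, 7), round(se_fe, 7))
```

The two standard errors differ by the factor sqrt(7/5), so inference from a manual de-meaned regression should be adjusted before it is compared with xtreg.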
Cluster (village) Dummies
xi:reg y x i.j

i.j _Ij_1-3 (naturally coded; _Ij_1 omitted)

Source | SS df MS Number of obs = 9

-------------+------------------------------ F( 3, 5) = 8.88
Model | 23621.5092 3 7873.8364 Prob > F = 0.0191
Residual | 4434.04635 5 886.809269 R-squared = 0.8420
-------------+------------------------------ Adj R-squared = 0.7471
Total | 28055.5556 8 3506.94444 Root MSE = 29.779

-----------------------------------------------------------------------
y | Coef. Std. Err. t P>|t|
-------------+---------------------------------------------------------
x | .4759358 .1066842 4.46 0.007
Vlg2 _Ij_2 | 117.4242 28.62912 4.10 0.009
Vlg3 _Ij_3 | -72.54902 38.10424 -1.90 0.115
Vlg1 _cons | 149.0196 89.678 1.66 0.157
-----------------------------------------------------------------------
“predict ai, u” to view the estimated ai

i j _Ij_2 _Ij_3 ai
1 1 0 0 -14.9584
2 1 0 0 -14.9584
3 1 0 0 -14.9584
1 2 1 0 102.4658
2 2 1 0 102.4658
3 2 1 0 102.4658
1 3 0 1 -87.5074
2 3 0 1 -87.5074
3 3 0 1 -87.5074
Aside. . . ”xtdes” command
xtdes
j: 1, 2, ..., 3 n = 3
i: 1, 2, ..., 3 T = 3
Delta(i) = 1 unit
Span(i) = 3 periods
(j*i uniquely identifies each observation)

Distribution of T_i:
min 5% 25% 50% 75% 95% max
3 3 3 3 3 3 3

Freq. Percent Cum. | Pattern

---------------------------+---------
3 100.00 100.00 | 111
---------------------------+---------
3 100.00 | XXX

Extension 3: Autocorrelation of uit’s
• Random Effects transformation eliminated autocorrelation amongst composite errors due to the presence of ai.
• Fixed Effects eliminated autocorrelation due to ai by eliminating the time-invariant error.
• What if, in addition, uit is autocorrelated? RE or FE alone will not address the issue.

Panel FE Regression with AC
(1) $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$
$CORR(x_{it}, a_i) \neq 0$
$u_{it} = \rho u_{i,t-1} + \varepsilon_{it}$
$-1 < \rho < 1$
$\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$
(2) $\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \rho u_{i,t-1} + \varepsilon_{it} - \bar{u}_i$
(2.1) $\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \rho (u_{i,t-1} - \bar{u}_{i,t-1}) + \ddot{\varepsilon}_{it}$
• Equation (2.1) is now a linear AR(1) model.
• To solve, we need to use the Cochrane-Orcutt method of estimating ρ, then use the generalized difference equation to eliminate the term $\rho (u_{i,t-1} - \bar{u}_{i,t-1})$.

STATA to the rescue again!
• The command: “xtregar y x,fe”
• Will simultaneously transform the data to eliminate the ai terms AND estimate ρ AND provide consistent standard errors with the generalized difference equation.

Xtregar Example from 4 country panel
xtregar y r cpi er,fe

FE (within) regression with AR(1) disturbances Number of obs =986

Group variable: code Number of groups =4

R-sq: within = 0.0155 Obs per group: min =243

between = 0.5840 avg =246.5
overall = 0.4567 max =249

F(3,979) = 5.13
corr(u_i, Xb) = -0.1308 Prob > F = 0.0016
------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t|
-------------+----------------------------------------------------------
r | -.0362285 .0875633 -0.41 0.679
cpi | .2832925 .076438 3.71 0.000
er | .0015201 .0029347 0.52 0.605
_cons | 68.766 .2288196 300.52 0.000
-------------+----------------------------------------------------------
rho_ar | .9718915
sigma_u | 6.3918957
sigma_e | 1.7246814
rho_fov | .93213626 (fraction of variance because of u_i)
------------------------------------------------------------------------
F test that all u_i=0: F(3,979) = 2.14 Prob > F = 0.094
Stata Note – balancing your panel
• It may be useful to use only those “entities” that appear in all time periods. Suppose T=20; use the following:
  sort entity time
  by entity: gen count=_N
  keep if count==20

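The same balancing logic (count the rows per entity, keep entities whose count equals T) can be sketched outside Stata. The Python below is illustrative only: the rows and entity labels are made up for the example, and, like the Stata version, it assumes each (entity, time) pair appears at most once.

```python
from collections import Counter

# Hypothetical (entity, time) rows; entity "B" is missing period 2.
rows = [("A", 1), ("A", 2), ("A", 3),
        ("B", 1), ("B", 3),
        ("C", 1), ("C", 2), ("C", 3)]
T = 3

# Equivalent of "by entity: gen count=_N".
counts = Counter(entity for entity, _ in rows)
# Equivalent of "keep if count==T".
balanced = [row for row in rows if counts[row[0]] == T]
print(sorted({entity for entity, _ in balanced}))  # ['A', 'C']
```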
Panel Data Management in STATA
• A common problem is that original data is stored in “wide” or “rectangular” form, wherein values for a given year are stored in a separate column.
• For example, in a cross-country panel, FDI in 2000 has one column, with each row representing a unique country. Likewise for FDI in 2001, etc.

Example of “wide” form data set
Countries    Code   fdi2000    fdi2001    fdi2002
Argentina    1      1.04E+10   2.17E+09   2.15E+09
Australia    2      1.36E+10   8.26E+09   1.77E+10
Austria      3      8.52E+09   5.91E+09   3.19E+08
Bangladesh   4      2.80E+08   7.90E+07   5.20E+07

Problem
• In order to run a panel regression in STATA, we need data to be stored in “long” form.
• Here, each row is identified by both a time period and country code. A variable like FDI will have a single column.

Example of “long” form data set
code   year   countries   fdi
1      2000   Argentina   1.040e+10
1      2001   Argentina   2.170e+09
1      2002   Argentina   2.150e+09
2      2000   Australia   1.360e+10
2      2001   Australia   8.260e+09
2      2002   Australia   1.770e+10

The “reshape” STATA command
• Instead of copying and pasting in Excel, load the data into STATA in “wide” form, then transform.
• The “reshape” command will generate the “time” variable for you, and combine separate time periods into a single column.

reshape long fdi, i(code) j(year)
• Keys on specified variable, here “fdi”.
• Must declare cross-section identifier i.
• Generates “within” group identifier j. Put new varname in parentheses. Typically j will represent time, but not necessarily.

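To show what the wide-to-long conversion does to the data, here is a small Python sketch that mimics the effect of `reshape long fdi, i(code) j(year)` on the FDI example. It is not how Stata implements the command; `reshape_long` and its arguments are hypothetical names, and the numeric suffix of each `fdi` column is assumed to be the year.

```python
# Wide-form rows copied from the example data set.
wide = [
    {"countries": "Argentina", "code": 1, "fdi2000": 1.04e10, "fdi2001": 2.17e9, "fdi2002": 2.15e9},
    {"countries": "Australia", "code": 2, "fdi2000": 1.36e10, "fdi2001": 8.26e9, "fdi2002": 1.77e10},
    {"countries": "Austria", "code": 3, "fdi2000": 8.52e9, "fdi2001": 5.91e9, "fdi2002": 3.19e8},
    {"countries": "Bangladesh", "code": 4, "fdi2000": 2.80e8, "fdi2001": 7.90e7, "fdi2002": 5.20e7},
]

def reshape_long(rows, stub, i, j):
    """Stack every column named stub+<number> into one column, adding j=<number>."""
    long_rows = []
    for row in rows:
        fixed = {k: v for k, v in row.items() if not k.startswith(stub)}
        for k, v in row.items():
            if k.startswith(stub):
                long_rows.append({**fixed, j: int(k[len(stub):]), stub: v})
    return sorted(long_rows, key=lambda r: (r[i], r[j]))

long_form = reshape_long(wide, "fdi", "code", "year")
print(len(long_form))   # 12 rows: 4 countries x 3 years
print(long_form[0])
```

Time-invariant columns such as the country name are simply repeated on every row for that country.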
Reshape Notes
• In general, list all variables that must be combined into a single column.
• You do not need to list time-invariant variables, but they will be converted to “long” as well.
• Note that “reshape wide” will convert data from long to wide format.
• Seems to be touchy about year values. ‘99 for 1999 is ok, but ‘00 for 2000 is a problem.

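Fixed Effects Logit
$y^*_{it} = \beta X_{it} + \alpha_i + u_{it}$, where $y_{it} \in \{0, 1\}$
$y_{it} = 1$ if $y^*_{it} > 0$
$y_{it} = 1$ if $\beta X_{it} + \alpha_i + u_{it} > 0$
$\Pr(y_{it} = 1) = \Pr(\beta X_{it} + \alpha_i + u_{it} > 0) = \Pr(u_{it} > -(\beta X_{it} + \alpha_i)) = 1 - G(-(\beta X_{it} + \alpha_i)) = G(\beta X_{it} + \alpha_i)$
$G(\beta X_{it} + \alpha_i) = \frac{e^{\beta X_{it} + \alpha_i}}{1 + e^{\beta X_{it} + \alpha_i}}$
$\log L = \sum \{ y_{it} \log G(\beta X_{it} + \alpha_i) + (1 - y_{it}) \log[1 - G(\beta X_{it} + \alpha_i)] \}$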
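The logistic function and the log-likelihood above translate directly into code. The following Python sketch is a plain evaluation of the formula for given parameter values, not an estimator; the function names, the data layout, and the toy observations are all assumptions made for illustration.

```python
import math

def G(z):
    """Logistic CDF, e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, alpha, data):
    """Sum of y*log G(beta*x + alpha_i) + (1 - y)*log(1 - G(beta*x + alpha_i)).

    data: iterable of (i, x, y) with y in {0, 1}; alpha: dict mapping i to alpha_i.
    """
    total = 0.0
    for i, x, y in data:
        p = G(beta * x + alpha[i])
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Hypothetical toy data: two units, two periods each.
toy = [(1, 0.5, 1), (1, 1.5, 0), (2, 0.2, 0), (2, 0.9, 1)]
print(log_likelihood(0.0, {1: 0.0, 2: 0.0}, toy))  # equals 4 * log(0.5) when all parameters are zero
```

The symmetry used in the derivation, 1 - G(-z) = G(z), can be confirmed numerically with the same function.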

12.1correlation and simple linear
45 pages
Topic 6B Regression
No ratings yet
Topic 6B Regression
13 pages
Regression Analysis
No ratings yet
Regression Analysis
47 pages
Lec16-Stata PanelData
No ratings yet
Lec16-Stata PanelData
39 pages
L'Analyse de Données de Panel
No ratings yet
L'Analyse de Données de Panel
40 pages
Panel Data Analysis: Fixed & Random Effects (Using Stata 10.x)
0% (1)
Panel Data Analysis: Fixed & Random Effects (Using Stata 10.x)
40 pages
L9.1_2023
No ratings yet
L9.1_2023
47 pages
Panel Ecmiic2
No ratings yet
Panel Ecmiic2
57 pages
Econometrics notes Heidelberg
No ratings yet
Econometrics notes Heidelberg
62 pages
Regression and Correlation Analysis
No ratings yet
Regression and Correlation Analysis
16 pages
Introduction To Data Analysis: Professor David Richardson IIT Stuart School of Business
No ratings yet
Introduction To Data Analysis: Professor David Richardson IIT Stuart School of Business
31 pages
Econometrics Final Exam Study Guide PDF
No ratings yet
Econometrics Final Exam Study Guide PDF
14 pages
Corelation and Reg.-12-27
No ratings yet
Corelation and Reg.-12-27
16 pages
Unit-2 Numericals
No ratings yet
Unit-2 Numericals
17 pages
Sample Question Econometrics
No ratings yet
Sample Question Econometrics
11 pages
Maths Decisions
No ratings yet
Maths Decisions
20 pages
Midterm Answer
No ratings yet
Midterm Answer
5 pages
Regn_lect_5
No ratings yet
Regn_lect_5
9 pages
Stats Set-1
No ratings yet
Stats Set-1
4 pages
LearningActivitySheetQ4Wk8 1
No ratings yet
LearningActivitySheetQ4Wk8 1
4 pages
Simple Linear Regression
No ratings yet
Simple Linear Regression
11 pages
Stat_Model _exam_2017_DBU
No ratings yet
Stat_Model _exam_2017_DBU
20 pages
lecture 6 linear regression
No ratings yet
lecture 6 linear regression
8 pages
Simple Linear Regression and Correlation Analysis: Chapter Five
No ratings yet
Simple Linear Regression and Correlation Analysis: Chapter Five
5 pages
Panel Vs Pooled Data
No ratings yet
Panel Vs Pooled Data
9 pages
Intergrated Problem
No ratings yet
Intergrated Problem
8 pages
Unit 4 Multiple Linear Regression
No ratings yet
Unit 4 Multiple Linear Regression
3 pages
AP Stats 3.2
No ratings yet
AP Stats 3.2
57 pages
Chapter 5 - 1
No ratings yet
Chapter 5 - 1
5 pages
questionbank_011020035933
No ratings yet
questionbank_011020035933
9 pages
Topic 1 class exercises
No ratings yet
Topic 1 class exercises
5 pages
October 25, 2011
No ratings yet
October 25, 2011
27 pages
Panel Data Analysis
No ratings yet
Panel Data Analysis
39 pages
Profitability Matrix of Standalone Health Insurance Companies in India
No ratings yet
Profitability Matrix of Standalone Health Insurance Companies in India
9 pages
Study On Factors Affecting Consumer Loyalty in The Restaurant Industry
No ratings yet
Study On Factors Affecting Consumer Loyalty in The Restaurant Industry
7 pages
Attitude, Subjective Norms, Perceived Behavior, Entrepreneurship Education and Self-Efficacy Toward Entrepreneurial Intention University Student in Indonesia
No ratings yet
Attitude, Subjective Norms, Perceived Behavior, Entrepreneurship Education and Self-Efficacy Toward Entrepreneurial Intention University Student in Indonesia
21 pages
CorporateGovernanceandAuditQuality SentAlaka
No ratings yet
CorporateGovernanceandAuditQuality SentAlaka
37 pages
The - Effect - of - PowerPoint - Presentations - O20161024 12795 z4c656 With Cover Page v2
No ratings yet
The - Effect - of - PowerPoint - Presentations - O20161024 12795 z4c656 With Cover Page v2
23 pages
SPSS Def + Example - New - 1!1!2011
No ratings yet
SPSS Def + Example - New - 1!1!2011
43 pages
Business Mathematics 1
No ratings yet
Business Mathematics 1
13 pages
Manipulation of Financial Information Through Creative Accounting: Case Study at Companies Listed On The Romanian Stock Exchange
No ratings yet
Manipulation of Financial Information Through Creative Accounting: Case Study at Companies Listed On The Romanian Stock Exchange
18 pages
The Linear Regression Model
No ratings yet
The Linear Regression Model
36 pages
Complete ML Notes
No ratings yet
Complete ML Notes
62 pages
PHD Econ, Applied Econometrics 2021/22 - Takehome University of Innsbruck
No ratings yet
PHD Econ, Applied Econometrics 2021/22 - Takehome University of Innsbruck
20 pages
1.+Nurbuana+Amir
No ratings yet
1.+Nurbuana+Amir
13 pages
Interview Questions
No ratings yet
Interview Questions
13 pages
Social Media Marketing and Business Performance of MSMEs During The COVID-19 Pandemic
No ratings yet
Social Media Marketing and Business Performance of MSMEs During The COVID-19 Pandemic
9 pages
Peer Group Pressure, Study Habit and Academic Achievement of Secondary School Students
No ratings yet
Peer Group Pressure, Study Habit and Academic Achievement of Secondary School Students
8 pages
Mra Exam Notes
No ratings yet
Mra Exam Notes
10 pages
Module 3 - Market Research
No ratings yet
Module 3 - Market Research
8 pages
UNSW - Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics
No ratings yet
UNSW - Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics
28 pages
Practical 9
No ratings yet
Practical 9
6 pages
ML MCQ
100% (2)
ML MCQ
31 pages
Josh Rombach Case 2
No ratings yet
Josh Rombach Case 2
5 pages
Lev&Thiagaraian 1993JAR
No ratings yet
Lev&Thiagaraian 1993JAR
27 pages
Tax Awareness Among Students 1
No ratings yet
Tax Awareness Among Students 1
14 pages
Chapter 3: Quantitative Demand Analysis Answers To Questions and Problems
No ratings yet
Chapter 3: Quantitative Demand Analysis Answers To Questions and Problems
14 pages
Midterm Fall2011
No ratings yet
Midterm Fall2011
13 pages
Service Quality - A Study of The Luxury Hotels in Malaysia
100% (3)
Service Quality - A Study of The Luxury Hotels in Malaysia
11 pages
Problem 1 (7 Points) :: T 2t 3t 4t T T
No ratings yet
Problem 1 (7 Points) :: T 2t 3t 4t T T
2 pages
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet
Student Solutions Manual to Accompany Modern Macroeconomics
From Everand
Student Solutions Manual to Accompany Modern Macroeconomics
Sanjay K. Chugh
No ratings yet
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
From Everand
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
Yue Jiang
4.5/5 (2)
Calculus: Maths of the Gods
From Everand
Calculus: Maths of the Gods
Bill Todorovich
No ratings yet
Multiple Integrals, A Collection of Solved Problems
From Everand
Multiple Integrals, A Collection of Solved Problems
Steven Tan
No ratings yet
Generalized Fermat Equation
From Everand
Generalized Fermat Equation
Ran Van Vo
No ratings yet
Computer Solved: Nonlinear Differential Equations
From Everand
Computer Solved: Nonlinear Differential Equations
Joe J. Ettl
No ratings yet
Capsule Calculus
From Everand
Capsule Calculus
Ira Ritow
No ratings yet
Geometric functions in computer aided geometric design
From Everand
Geometric functions in computer aided geometric design
Oscar Ruiz
No ratings yet
Shortcuts to College Calculus Refreshment Kit
From Everand
Shortcuts to College Calculus Refreshment Kit
Juan Acevedo
No ratings yet
Pre-Calculus Essentials
From Everand
Pre-Calculus Essentials
Ernest Woodward
No ratings yet
Mathematical Formulas for Economics and Business: A Simple Introduction
From Everand
Mathematical Formulas for Economics and Business: A Simple Introduction
K.H. Erickson
4/5 (4)
Inverse Trigonometric Functions (Trigonometry) Mathematics Question Bank
From Everand
Inverse Trigonometric Functions (Trigonometry) Mathematics Question Bank
Mohmmad Khaja Shareef
No ratings yet