Pooled Cross-Section Time Series Data
Example: Two Period Panel Data
N=4, T=2
i   t   Consumption (Y)   Income (X)
1   1   72                98
1   2   75                102
2   1   31                40
2   2   26                39
3   1   55                66
3   2   62                70
4   1   41                59
4   2   45                60
Yit = B0 + B1Xit + eit
[Figure: scatter of y against x with the fitted values from the pooled regression]
Interpreting Coefficients
Yit = B0 + B1Xit + eit
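$B_1 = \dfrac{\Delta Y_{it}}{\Delta X_{it}}$, where $\Delta Y_{it} = Y_{i,t+1} - Y_{it}$ and $\Delta X_{it} = X_{i,t+1} - X_{it}$
Change in Y over time for a given individual, per unit change in X.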
Use intercept dummies to
differentiate between “time” and
“type” effects
Example: Two Period Panel Data
with Time Dummy
i   t   DT   (Yit)   (Xit)
1   1   0    72      98
1   2   1    75      102
2   1   0    31      40
2   2   1    26      39
3   1   0    55      66
3   2   1    62      70
4   1   0    41      59
4   2   1    45      60
Time Dummy Example
sum y if t==1
sum y if t==2
reg y dt
Coeff on dt = (mean of y at t=2) - (mean of y at t=1) = 52 - 49.75 = 2.25
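The arithmetic behind the time dummy coefficient can be checked by hand. The following is a minimal sketch in Python rather than Stata, using the two-period example data; the helper names (`mean`, `ols`) are illustrative and not part of the original slides. It shows that regressing y on the dummy alone returns the period-1 mean as the intercept and the difference in period means as the slope.

```python
# Two-period example data: (i, t, y, x)
rows = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def ols(x, y):
    """One-regressor OLS with an intercept; returns (intercept, slope)."""
    xb, yb = mean(x), mean(y)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sxy / sxx
    return yb - slope * xb, slope

dt = [1 if t == 2 else 0 for (_, t, _, _) in rows]  # time dummy
y = [r[2] for r in rows]

mean_t1 = mean(yi for yi, d in zip(y, dt) if d == 0)
mean_t2 = mean(yi for yi, d in zip(y, dt) if d == 1)
intercept, coef_dt = ols(dt, y)

print("mean y, t=1:", mean_t1)
print("mean y, t=2:", mean_t2)
print("intercept:", intercept, "coefficient on dt:", coef_dt)
```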
The time dummy represents the shift of the regression line from period
1 to period 2 when Yit is regressed on DT along with Xit:
[Figure: y against x with fitted lines for t=1 and t=2; the vertical shift between the lines is labeled T = 2.25]
Type Dummies
Separate cross-sectional dimension of
sample into qualitative “types” (e.g. male vs.
female, rural vs. urban, foreign vs. domestic,
treatment vs. control, etc.)
sum y if db==1
Variable | Obs Mean
y | 4 43.5
sum y if db==0
Variable | Obs Mean
y | 4 58.25
[Figure: y against x with fitted lines for type=A and type=B; the vertical shift between the lines is labeled B = -14.25]
Difference-in-Differences Estimator
Estimates the difference across types, and
over time, using simple dummy variable
framework.
DD = Difference(“After”) - Difference(“Before”)
Difference Coefficient
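Also known as the “Average Treatment
Effect”:
$DD = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1}) = (\bar{Y}_{B,2} - \bar{Y}_{A,2}) - (\bar{Y}_{B,1} - \bar{Y}_{A,1})$
$= (-16) - (-13.5) = -2.5$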
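The difference-in-differences arithmetic can be reproduced from the two-period example data. This is a sketch in Python rather than Stata. The slides do not list the type variable, so the assignment of individuals 2 and 3 to type B is an assumption, inferred because it reproduces the reported type means of 43.5 and 58.25.

```python
# Two-period example data: (i, t, y, x)
rows = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

# Assumption: individuals 2 and 3 are type B (inferred from the reported means).
TYPE_B = {2, 3}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def cell_mean(is_b, period):
    """Mean outcome for one type in one period."""
    return mean(y for (i, t, y, _) in rows
                if (i in TYPE_B) == is_b and t == period)

yb1, yb2 = cell_mean(True, 1), cell_mean(True, 2)
ya1, ya2 = cell_mean(False, 1), cell_mean(False, 2)

# Change over time within each type, then differenced across types.
dd_over_time = (yb2 - yb1) - (ya2 - ya1)
# Gap between types in each period, then differenced over time.
dd_across_types = (yb2 - ya2) - (yb1 - ya1)

print("after gap:", yb2 - ya2, "before gap:", yb1 - ya1)
print("DD:", dd_over_time, dd_across_types)
```

Both groupings give the same number, since they only reorder the same four cell means.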
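How much more did the outcome of the treatment group (B) increase
than that of the control group (A) from time 1 to time 2?
[Figure: outcome paths from t=1 to t=2 for groups A and B, marking Y_{A,1}, Y_{A,2}, Y_{B,1}, Y_{B,2} and DD = (Y_{B,2} - Y_{B,1}) - (Y_{A,2} - Y_{A,1})]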
Panel Data Problem!
Unobserved Heterogeneity
There exist characteristics of each
individual that persist over time and
cannot be included in the regression
(they are unobservable in the available
data), but which nonetheless affect
the observed variation in our
dependent variable.
Composite Errors
These time-invariant unobserved
effects are best modeled as a
component in the regression error term.
The Composite Error Model
Yit = B0 + B1Xit + vit
where vit = ai + uit: ai is the time-invariant unobserved
component and uit is the idiosyncratic component.
The Composite Error Problems
1.) If COV(ai, Xit) ≠ 0, then OLS
estimates will be biased.
Composite Error Bias
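$Y_{it} = B_0 + B_1 X_{it} + a_i + u_{it}$
$\hat{B}_1 = B_1 + \dfrac{\sum (X_{it} - \bar{X})(a_i + u_{it})}{\sum (X_{it} - \bar{X})^2}$
$E(\hat{B}_1) = B_1 + \dfrac{E\left[\sum (X_{it} - \bar{X})\, a_i\right]}{\sum (X_{it} - \bar{X})^2} + \dfrac{E\left[\sum (X_{it} - \bar{X})\, u_{it}\right]}{\sum (X_{it} - \bar{X})^2}$
$E(\hat{B}_1) = B_1 + \dfrac{COV(X_{it}, a_i)}{\sum (X_{it} - \bar{X})^2} + \dfrac{COV(X_{it}, u_{it})}{\sum (X_{it} - \bar{X})^2}$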
Examples:
1. manager charisma correlated with
firm size in production function.
2. Nepotism/networking correlated
with education in wage equation.
First Difference Transformation (two
period panel) with Time dummy
Yit = B0 + δ0DTt + B1Xit + ai + uit
For Period 2:
Yi2 = (B0 + δ0) + B1Xi2 + ai + ui2
For Period 1:
Yi1 = B0 + B1Xi1 + ai + ui1
First Difference = ΔYi = Yi2 – Yi1 = δ0 + B1ΔXi + Δui
(ai drops out because it does not change over time.)
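As an illustration of the first-difference transformation, the sketch below (Python rather than Stata) differences the two-period example data and regresses the change in Y on the change in X. The intercept of the differenced regression plays the role of the time dummy coefficient and the slope is the coefficient on X. Variable names are illustrative.

```python
# Two-period example data keyed by individual: ((y1, x1), (y2, x2))
panel = {
    1: ((72, 98), (75, 102)),
    2: ((31, 40), (26, 39)),
    3: ((55, 66), (62, 70)),
    4: ((41, 59), (45, 60)),
}

# First differences: period 2 minus period 1 for each individual.
dy = [p2[0] - p1[0] for (p1, p2) in panel.values()]
dx = [p2[1] - p1[1] for (p1, p2) in panel.values()]

def mean(values):
    return sum(values) / len(values)

# OLS of dy on dx with an intercept.
dx_bar, dy_bar = mean(dx), mean(dy)
sxy = sum((a - dx_bar) * (b - dy_bar) for a, b in zip(dx, dy))
sxx = sum((a - dx_bar) ** 2 for a in dx)
b1_fd = sxy / sxx                      # slope on X
delta0_fd = dy_bar - b1_fd * dx_bar    # intercept = time effect

print("dy:", dy, "dx:", dx)
print("FD slope:", b1_fd, "FD intercept:", delta0_fd)
```

Any variable that is constant within an individual would difference to zero here, which is why type dummies cannot be estimated after this transformation.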
Problems:
1. Eliminates all time-invariant variables (type
dummies)
Type Dummies
Solves ‘time invariant bias’ problem
by removing ai from error
component, and directly estimating
the effects.
append using c:/stata627/nfa/italy.dta
append using c:/stata627/nfa/japan.dta
append using c:/stata627/nfa/uk.dta
Dummy Example – estimates of ai
xi:reg y cpi r er i.code
i.code _Icode_1-5 (naturally coded; _Icode_1 omitted)
$B_1 = \dfrac{\Delta Y_{it}}{\Delta X_{it}}$
Fixed Effects Transformation With Time-Invariant
Dummy Independent Variable
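(1) $y_{it} = \beta_0 + \beta_1 x_{it} + \delta_0 D_{it} + a_i + u_{it}$
$D_{it} \in \{0, 1\}$, and time invariant for each i
$\bar{D}_i$ = avg. for each individual over time $= \dfrac{1}{T}\sum_{t=1}^{T} D_{it} = \dfrac{1}{T}\, T D_i = D_i$
(2) $\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + \delta_0 \bar{D}_i + a_i + \bar{u}_i$
(1) - (2): $y_{it} - \bar{y}_i = (\beta_0 - \beta_0) + \beta_1 (x_{it} - \bar{x}_i) + \delta_0 (D_i - \bar{D}_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$
$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$
(3) $\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$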
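To make the within (time-demeaning) transformation concrete, here is a small sketch in Python rather than Stata, applied to the two-period example data with no time dummy. The type dummy D is an assumption for illustration (individuals 2 and 3 coded as 1). The sketch shows that the demeaned dummy is identically zero, so its coefficient cannot be estimated, while the slope on x is still recovered.

```python
# Two-period example data: (i, t, y, x)
rows = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

# Assumption: a time-invariant dummy equal to 1 for individuals 2 and 3.
TYPE_B = {2, 3}

def within(values):
    """Subtract each individual's time average from its observations."""
    sums, counts = {}, {}
    for (i, *_), v in zip(rows, values):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - sums[i] / counts[i] for (i, *_), v in zip(rows, values)]

y_dd = within([r[2] for r in rows])
x_dd = within([r[3] for r in rows])
d_dd = within([1.0 if r[0] in TYPE_B else 0.0 for r in rows])

# Demeaned variables have mean zero, so no intercept is needed.
b1_fe = sum(x * y for x, y in zip(x_dd, y_dd)) / sum(x * x for x in x_dd)

print("demeaned dummy:", d_dd)
print("within slope on x:", b1_fe)
```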
Fixed Effects Stata Example
xtreg y cpi r er,fe
F(3,983) = 791.14
corr(u_i, Xb) = -0.7495 Prob > F = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cpi | .3817633 .0117365 32.53 0.000 .3587318 .4047948
r | -.4944136 .0780945 -6.33 0.000 -.6476647 -.3411625
er | -.0196729 .0014589 -13.49 0.000 -.0225358 -.0168101
_cons | 70.49544 1.529625 46.09 0.000 67.49374 73.49715
-------------+----------------------------------------------------------------
sigma_u | 16.538008 (std. error of time-invariant error)
sigma_e | 6.3818613 (std. error of idiosyncratic error)
rho | .87038904 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3, 983) = 362.02 Prob > F = 0.0000
Random Effects
Assumes CORR(ai, Xit) = 0.
Therefore, OLS coefficients will not
suffer the “composite error bias” that
motivated Fixed Effects.
Random Effects
Under the preceding criteria, the
composite error does not violate OLS
assumptions.
a u
2
Where, again: vit = uit + ai
Random Effects
Random effects transformation is
more complicated than FD or FE, but
basic idea is to eliminate serial
correlation in the error term by using
information on variances of fixed and
idiosyncratic errors.
Random Effects (RE)
RE Transformation
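(1) $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$
$(y_{it} - \lambda \bar{y}_i) = (1 - \lambda)\beta_0 + \beta_1 (x_{it} - \lambda \bar{x}_i) + (1 - \lambda) a_i + (u_{it} - \lambda \bar{u}_i)$
where $\hat{\lambda}_i = 1 - \sqrt{\dfrac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$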
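Given $\hat{\lambda}_i = 1 - \sqrt{\dfrac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$, the transformed composite errors satisfy
Corr(vit, vis) = 0 for t ≠ s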
Given $\hat{\lambda}_i = 1 - \sqrt{\dfrac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$
NOTE:
If var(ai) = 0, meaning ai is always zero
(no time-invariant effects), then lambda
equals 0 and RE regression is equivalent
to Pooled OLS equation (1) - all lambda-
weighted terms drop out.
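The behaviour of lambda at its limits is easy to check numerically. The sketch below is in Python rather than Stata; the function names and the variance values passed to them are hypothetical and chosen only for illustration. With var(ai) = 0 lambda is 0 (pooled OLS), and as T or var(ai) grows lambda approaches 1, where the quasi-demeaned variable coincides with the fixed-effects (fully demeaned) variable.

```python
import math

def re_lambda(sigma_a2, sigma_u2, t_i):
    """lambda_i = 1 - sqrt(sigma_u^2 / (T_i * sigma_a^2 + sigma_u^2))."""
    return 1.0 - math.sqrt(sigma_u2 / (t_i * sigma_a2 + sigma_u2))

def quasi_demean(values, lam):
    """Subtract lambda times the individual's time average from each value."""
    avg = sum(values) / len(values)
    return [v - lam * avg for v in values]

# Hypothetical variance components, for illustration only.
lam_pooled = re_lambda(sigma_a2=0.0, sigma_u2=4.0, t_i=10)   # no a_i variation
lam_mid = re_lambda(sigma_a2=1.0, sigma_u2=1.0, t_i=3)
lam_large_t = re_lambda(sigma_a2=1.0, sigma_u2=1.0, t_i=10_000)

print("lambda with var(a)=0:", lam_pooled)
print("lambda with equal variances, T=3:", lam_mid)
print("lambda with very large T:", lam_large_t)
print("quasi-demeaned y for one individual:", quasi_demean([72.0, 75.0], lam_mid))
```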
Fixed vs. Random Effects
The Random Effects view is that
unobserved heterogeneity is
“randomly assigned” to each
cross sectional entity and not
correlated with other explanatory
variables.
When to use FE vs. RE?
The Hausman Coefficient Test
The logic of the test is the following:
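• If CORR(ai, Xit) ≠ 0, then RE is biased.
$\hat{\beta}^{FE} = (\hat{\beta}_1^{FE}, \hat{\beta}_2^{FE}, \hat{\beta}_3^{FE})'$, $\hat{\beta}^{RE} = (\hat{\beta}_1^{RE}, \hat{\beta}_2^{RE}, \hat{\beta}_3^{RE})'$
$V(\hat{\beta}^{FE})$, $V(\hat{\beta}^{RE})$ = variance-covariance matrix of the estimated coefficients under each estimator.
Test Statistic
$H = (\hat{\beta}^{FE} - \hat{\beta}^{RE})' \left[ V(\hat{\beta}^{FE}) - V(\hat{\beta}^{RE}) \right]^{-1} (\hat{\beta}^{FE} - \hat{\beta}^{RE})$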
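The test statistic is a quadratic form in the coefficient differences. A minimal sketch of the calculation in Python (rather than Stata's built-in test) follows. All coefficient and variance values are hypothetical, chosen to keep the arithmetic easy to follow, and the small `solve` helper is an illustrative stand-in for a matrix library.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

def hausman(b_fe, b_re, v_fe, v_re):
    """H = d' [V_FE - V_RE]^(-1) d, with d = b_FE - b_RE."""
    d = [f - r for f, r in zip(b_fe, b_re)]
    v_diff = [[f - r for f, r in zip(rf, rr)] for rf, rr in zip(v_fe, v_re)]
    z = solve(v_diff, d)
    return sum(di * zi for di, zi in zip(d, z))

# Hypothetical estimates for three coefficients (not from the slides).
b_fe = [0.40, -0.50, -0.02]
b_re = [0.35, -0.45, -0.02]
v_fe = [[0.0010, 0.0, 0.0], [0.0, 0.0100, 0.0], [0.0, 0.0, 0.0002]]
v_re = [[0.0005, 0.0, 0.0], [0.0, 0.0050, 0.0], [0.0, 0.0, 0.0001]]

h_stat = hausman(b_fe, b_re, v_fe, v_re)
print("Hausman statistic:", h_stat)
```

Under H0 the statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of coefficients tested (3 in this sketch).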
While this can be shown to be
asymptotically true, it may not hold for a
given sample.
Hausman Test Interpretation
H0: FE = RE (difference in coefficients
is NOT systematic)
HA: FE ≠ RE.
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 162.38
Prob>chi2 = 0.0000
(V_b-V_B is not positive definite)
Stata Note
The ‘xtdata’ command allows you to create
a new data set of the transformed
variables.
Extension 1: IV Panel
When an independent variable is
endogenous in a panel regression,
each stage of the two stage least
squares process must take into
account the composite error issue.
Panel 2SLS
(1) $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$
CORR(xit, ai) ≠ 0
CORR(xit, uit) ≠ 0
Second Stage FE
(1') $\ddot{y}_{it} = \beta_1 \hat{\ddot{x}}_{it} + \ddot{u}_{it}$
$\hat{\beta}_1^{FE,2SLS}$ will be unbiased.
Extension 2: Cluster Regression
Allows for a Fixed Effects transformation with
single period cross-section data.
dummy” variables.
Cross Section Cluster Example
Household (i)   Village (j)   Consumption (Yij)   Income (Xij)
1               1             500                 750
2               1             650                 1000
3               1             475                 725
1               2             600                 700
2               2             625                 750
3               2             550                 600
1               3             575                 1100
2               3             625                 1200
3               3             600                 1000
Cluster Regression
Model: $Y_{ij} = \beta_0 + \beta_1 X_{ij} + a_j + u_{ij}$
Xij = observation for household i in village j
Cluster Effects Transformation: $\ddot{Y}_{ij} = \beta_1 \ddot{X}_{ij} + \ddot{u}_{ij}$
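The cluster effects transformation can be sketched by hand: subtract each village's mean from its households' consumption and income, then regress the demeaned values on each other. The Python below (used in place of Stata) does this for the nine-household example; the variable names are illustrative.

```python
# Cross-section cluster example: (household i, village j, y, x)
rows = [
    (1, 1, 500, 750), (2, 1, 650, 1000), (3, 1, 475, 725),
    (1, 2, 600, 700), (2, 2, 625, 750), (3, 2, 550, 600),
    (1, 3, 575, 1100), (2, 3, 625, 1200), (3, 3, 600, 1000),
]

def village_means(col):
    """Mean of column `col` (2 = y, 3 = x) within each village."""
    totals, counts = {}, {}
    for r in rows:
        j = r[1]
        totals[j] = totals.get(j, 0.0) + r[col]
        counts[j] = counts.get(j, 0) + 1
    return {j: totals[j] / counts[j] for j in totals}

y_bar, x_bar = village_means(2), village_means(3)

# Village-demeaned ("umlaut") variables.
y_dd = [r[2] - y_bar[r[1]] for r in rows]
x_dd = [r[3] - x_bar[r[1]] for r in rows]

# Demeaned data have mean zero, so the slope is a simple ratio.
b1_cluster = sum(x * y for x, y in zip(x_dd, y_dd)) / sum(x * x for x in x_dd)
print("slope on demeaned income:", round(b1_cluster, 7))
```

The slope agrees with the coefficient on x reported in the regression output for this example (.4759358).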
Transformed OLS Regression
reg y_umlat x_umlat
------------------------------------------------------------------------
y_umlat | Coef. Std. Err. t P>|t|
-------------+----------------------------------------------------------
x_umlat | .4759358 .0901646 5.28 0.001
_cons | 4.09e-07 8.38938 0.00 1.000
------------------------------------------------------------------------
FIXED EFFECTS
tsset j i
panel variable: j (strongly balanced)
time variable: i, 1 to 3
delta: 1 unit
xtreg y x,fe
Fixed-effects (within) regression Number of obs = 9
Group variable: j Number of groups = 3
within = 0.7992 Obs per group: min = 3
between = 0.0961 avg = 3.0
overall = 0.2517 max = 3
F(1,5) = 19.90
corr(u_i, Xb) = -0.8365 Prob > F = 0.0066
---------------------------------------------------------------------
y | Coef. Std. Err. t P>|t|
-------------+-------------------------------------------------------
x | .4759358 .1066842 4.46 0.007
_cons | 163.978 93.28558 1.76 0.139
-------------+-------------------------------------------------------
sigma_u | 95.865925
sigma_e | 29.779343
rho | .91199744 (fraction of variance due to u_i)
---------------------------------------------------------------------
F test that all u_i=0: F(2, 5) = 9.34 Prob > F = 0.0205
Cluster (village) Dummies
xi:reg y x i.j
-----------------------------------------------------------------------
y | Coef. Std. Err. t P>|t|
-------------+---------------------------------------------------------
x | .4759358 .1066842 4.46 0.007
Vlg2 _Ij_2 | 117.4242 28.62912 4.10 0.009
Vlg3 _Ij_3 | -72.54902 38.10424 -1.90 0.115
Vlg1 _cons | 149.0196 89.678 1.66 0.157
-----------------------------------------------------------------------
“predict ai, u” to view the estimated ai
i j _Ij_2 _Ij_3 ai
1 1 0 0 -14.9584
2 1 0 0 -14.9584
3 1 0 0 -14.9584
1 2 1 0 102.4658
2 2 1 0 102.4658
3 2 1 0 102.4658
1 3 0 1 -87.5074
2 3 0 1 -87.5074
3 3 0 1 -87.5074
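The estimated village effects in this table can be recovered from the within slope. In the Python sketch below (not Stata's own routine), each village's intercept is its mean consumption minus the slope times its mean income, and the reported effect is that intercept minus the overall constant, which is computed from the grand means. Variable names are illustrative.

```python
# Cross-section cluster example: (household i, village j, y, x)
rows = [
    (1, 1, 500, 750), (2, 1, 650, 1000), (3, 1, 475, 725),
    (1, 2, 600, 700), (2, 2, 625, 750), (3, 2, 550, 600),
    (1, 3, 575, 1100), (2, 3, 625, 1200), (3, 3, 600, 1000),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

villages = sorted({r[1] for r in rows})
y_bar = {j: mean(r[2] for r in rows if r[1] == j) for j in villages}
x_bar = {j: mean(r[3] for r in rows if r[1] == j) for j in villages}

# Within (village-demeaned) slope.
num = sum((r[3] - x_bar[r[1]]) * (r[2] - y_bar[r[1]]) for r in rows)
den = sum((r[3] - x_bar[r[1]]) ** 2 for r in rows)
b1 = num / den

# Village intercepts, overall constant, and village effects.
alpha = {j: y_bar[j] - b1 * x_bar[j] for j in villages}
const = mean(r[2] for r in rows) - b1 * mean(r[3] for r in rows)
a_hat = {j: alpha[j] - const for j in villages}

print("overall constant:", round(const, 3))
for j in villages:
    print("village", j, "dummy vs village 1:", round(alpha[j] - alpha[1], 4),
          "estimated effect:", round(a_hat[j], 4))
```

The differences between village intercepts correspond to the village dummy coefficients, and the effects correspond to the `ai` column.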
Aside: the “xtdes” command
xtdes
j: 1, 2, ..., 3 n = 3
i: 1, 2, ..., 3 T = 3
Delta(i) = 1 unit
Span(i) = 3 periods
(j*i uniquely identifies each observation)
Distribution of T_i:
min 5% 25% 50% 75% 95% max
3 3 3 3 3 3 3
Extension 3: Autocorrelation of uit’s
Random Effects transformation eliminated
autocorrelation amongst composite errors
due to presence of ai.
(ui ,t 1 ui ,t 1 )
STATA to the rescue again!
The command:
“xtregar y x,fe”
F(3,979) = 5.13
corr(u_i, Xb) = -0.1308 Prob > F = 0.0016
------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t|
-------------+----------------------------------------------------------
r | -.0362285 .0875633 -0.41 0.679
cpi | .2832925 .076438 3.71 0.000
er | .0015201 .0029347 0.52 0.605
_cons | 68.766 .2288196 300.52 0.000
-------------+----------------------------------------------------------
rho_ar | .9718915
sigma_u | 6.3918957
sigma_e | 1.7246814
rho_fov | .93213626 (fraction of variance because of u_i)
------------------------------------------------------------------------
F test that all u_i=0: F(3,979) = 2.14 Prob > F = 0.094
Stata Note – balancing your panel
It may be useful to use only those
“entities” that appear in all time
periods. Suppose T=20 – use the
following:
Example of “wide” form data set
Problem
In order to run a panel regression in
STATA, we need data to be stored in
“long” form.
The “reshape” STATA command
Instead of copying and pasting in
Excel, load the data into STATA in
“wide” form, then transform it to “long” form.
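To show what the wide-to-long transformation does to the data, here is a sketch in Python; it illustrates the idea and is not the Stata command itself. The wide layout and its column names (`y1`, `y2`, `x1`, `x2`) are assumptions, filled with the two-period example values from earlier in the deck.

```python
# Hypothetical "wide" form: one row per individual, one column per variable-period.
wide = [
    {"i": 1, "y1": 72, "y2": 75, "x1": 98, "x2": 102},
    {"i": 2, "y1": 31, "y2": 26, "x1": 40, "x2": 39},
    {"i": 3, "y1": 55, "y2": 62, "x1": 66, "x2": 70},
    {"i": 4, "y1": 41, "y2": 45, "x1": 59, "x2": 60},
]

def wide_to_long(records, stubs, periods, id_key="i"):
    """Return one (id, t, value per stub...) tuple per individual-period."""
    long_rows = []
    for rec in records:
        for t in periods:
            values = tuple(rec[f"{stub}{t}"] for stub in stubs)
            long_rows.append((rec[id_key], t) + values)
    return long_rows

long_form = wide_to_long(wide, stubs=("y", "x"), periods=(1, 2))
for row in long_form:
    print(row)
```

Each individual now contributes one row per period, which is the layout panel commands expect.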