Linear Regression Models For Panel Data Using SAS, STATA, LIMDEP and SPSS
Introduction
Least Squares Dummy Variable Regression
Panel Data Models
The Fixed Group Effect Model
The Fixed Time Effect Model
The Fixed Group and Time Effect Model
Random Effect Models
The Poolability Test
Conclusion
1. Introduction
Panel data are both cross-sectional and longitudinal (time series). Examples include the cumulative
General Social Survey (GSS) and the Current Population Survey (CPS). Panel data may have
group effects, time effects, or both. These effects are analyzed with fixed effect and random
effect models.
1.1 Data Arrangement
A panel data set contains observations on n individuals (e.g., firms or states), each measured
at T points in time. In other words, each individual (subjects 1 through n) has T observations
(time periods 1 through T). Thus, the total number of observations is nT. Figure 1 illustrates the
data arrangement of a panel data set.
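Figure 1. Data Arrangement of Panel Data

Group   Time   Variable1   Variable2   Variable3
1       1      ...         ...         ...
1       2      ...         ...         ...
...     ...    ...         ...         ...
1       T      ...         ...         ...
2       1      ...         ...         ...
2       2      ...         ...         ...
...     ...    ...         ...         ...
2       T      ...         ...         ...
...     ...    ...         ...         ...
n       1      ...         ...         ...
n       2      ...         ...         ...
...     ...    ...         ...         ...
n       T      ...         ...         ...

http://www.indiana.edu/~statmath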
* $v_{it} \sim IID(0, \sigma_v^2)$
The random effect model, by contrast, estimates variance components for groups and error,
assuming the same intercept and slopes. The difference among groups (or time periods) lies in
the variance of the error term. This model is estimated by generalized least squares (GLS) when
the Ω matrix, a variance structure among groups, is known. The feasible generalized least
squares (FGLS) method is used to estimate the variance structure when Ω is not known. A
typical example is the groupwise heteroscedastic regression model (Greene 2003). There are
various estimation methods for FGLS, including maximum likelihood methods and simulations
(Baltagi and Cheng 1994).
Fixed effects are tested by the (incremental) F test, while random effects are examined by the
Lagrange multiplier (LM) test (Breusch and Pagan 1980). If the null hypothesis is not rejected,
the pooled OLS regression is favored. The Hausman specification test (Hausman 1978)
compares fixed effect and random effect models. Table 1 compares the fixed effect and random
effect models.
Group effect models create dummies using grouping variables (e.g., country, firm, and race). If
one grouping variable is considered, it is called a one-way fixed or random group effects model.
Two-way group effect models have two sets of dummy variables, one for a grouping variable
and the other for a time variable.
LSDV regression, the within effect model, the between effect model (group or time mean
model), GLS, and FGLS are all fundamentally based on OLS in terms of estimation. Thus, any
procedure or command for OLS can be used for these panel data models.
The REG procedure of SAS/STAT, STATA .regress (.cnsreg), LIMDEP regress$, and
SPSS regression commands all fit LSDV1 by dropping one dummy, and all have options to
suppress the intercept (LSDV2). SAS, STATA, and LIMDEP can estimate OLS with
restrictions (LSDV3), but SPSS cannot. Note that the STATA .cnsreg command requires
the .constraint command, which defines a restriction (Table 2).
Table 2. Procedures and Commands in SAS, STATA, LIMDEP, and SPSS

                  | SAS 9.1         | STATA 9.0    | LIMDEP 8.0                      | SPSS 13.0
Regression (OLS)  | PROC REG        | .regress     | Regress$                        | Regression
LSDV1             | w/o a dummy     | w/o a dummy  | w/o a dummy                     | w/o a dummy
LSDV2             | /NOINT          | Noconstant   | w/o One in Rhs                  | /Origin
LSDV3             | RESTRICT        | .cnsreg      | Cls:                            | N/A
Fixed effect      | TSCSREG /FIXONE | .xtreg w/ fe | Regress;Panel;Str=;Pds=;Fixed$  | N/A
 (within effect)  | PANEL /FIXONE   |              |                                 |
Two-way fixed     | TSCSREG /FIXTWO | N/A          | Regress;Panel;Str=;Pds=;Fixed$  | N/A
 (within effect)  | PANEL /FIXTWO   |              |                                 |
Between effect    | PANEL /BTWNG    | .xtreg w/ be | Regress;Panel;Str=;Pds=;Means$  | N/A
                  | PANEL /BTWNT    |              |                                 |
Random effect     | TSCSREG /RANONE | .xtreg w/ re | Regress;Panel;Str=;Pds=;Random$ | N/A
                  | PANEL /RANONE   |              |                                 |
Two-way random    | TSCSREG /RANTWO | N/A          | Problematic                     | N/A
                  | PANEL /RANTWO   |              |                                 |
SAS, STATA, and LIMDEP also provide procedures (commands) designed to
estimate panel data models conveniently. SAS/ETS has the TSCSREG and PANEL procedures
to estimate one-way and two-way fixed and random effect models.1 For the fixed effect model,
these procedures estimate LSDV1, which drops one of the dummy variables. For the random
effect model, they use the Fuller-Battese method (1974) by default to estimate variance
components for group, time, and error. These procedures also support other estimation methods,
such as the Parks (1967) autoregressive model and the Da Silva moving average method.
The TSCSREG procedure can handle balanced data only, whereas the PANEL procedure handles
both balanced and unbalanced data. Both provide one-way and two-way fixed and random effect
models, and PANEL also supports the between effect model and pooled OLS regression. Despite
the more advanced features of PANEL, output from the two procedures looks alike.
The STATA .xtreg command estimates within effect (fixed effect) models with the fe option,
between effect models with the be option, and random effect models with the re option. This
command, however, does not fit two-way fixed and random effect models. The LIMDEP
regress$ command with the panel; subcommand estimates panel data models, but this
command is not sufficiently stable. SPSS has limited ability to analyze panel data.

1 SAS recently announced PROC PANEL, an experimental procedure for panel data models.
1.4 Data Sets
This document uses two data sets. The cross-sectional data set contains research and
development (R&D) expenditure data of the top 50 information technology firms presented in
OECD Information Technology Outlook 2004. The panel data set has cost data for U.S. airlines
(1970-1984) from Econometric Analysis (Greene 2003). See the Appendix for the details.
The ordinary least squares (OLS) regression without dummy variables, a pooled regression
model, assumes a constant intercept and slope regardless of firm type. In the following
regression equation, $\beta_0$ is the intercept; $\beta_1$ is the slope of net income in 2000; and $\varepsilon_i$ is the
error term.
Model 1: $R\&D_i = \beta_0 + \beta_1 income_i + \varepsilon_i$
The pooled model has an intercept of 1,482.697 and a slope of .223. For a $1 million increase
in net income, a firm is likely to increase its R&D expenditure in 2002 by $.223 million.
. regress rnd income

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  1,    37) =    7.07
       Model |  15902406.5     1  15902406.5           Prob > F      =  0.0115
    Residual |  83261299.1    37  2250305.38           R-squared     =  0.1604
-------------+------------------------------           Adj R-squared =  0.1377
       Total |  99163705.6    38   2609571.2           Root MSE      =  1500.1

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2230523   .0839066     2.66   0.012     .0530414    .3930632
       _cons |   1482.697   314.7957     4.71   0.000     844.8599    2120.533
------------------------------------------------------------------------------
You may assume that equipment and software firms have more R&D expenditure than other
types of companies. Let us take this group difference into account.2 We have to drop one of the
two dummy variables in order to avoid perfect multicollinearity; that is, OLS does not work
with both dummies and the intercept in a model. The $\delta_1$ in Model 2 is the coefficient that applies
to equipment and software companies only.
Model 2: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d1_i + \varepsilon_i$
Unlike Model 1, this model results in two different regression equations for two groups. The
difference lies in the intercepts, but the slope remains unchanged.
. regress rnd income d1

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  2,    36) =    6.06
       Model |  24987948.9     2  12493974.4           Prob > F      =  0.0054
    Residual |  74175756.7    36  2060437.69           R-squared     =  0.2520
-------------+------------------------------           Adj R-squared =  0.2104
       Total |  99163705.6    38   2609571.2           Root MSE      =  1435.4

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2180066   .0803248     2.71   0.010     .0551004    .3809128
          d1 |   1006.626   479.3717     2.10   0.043     34.41498    1978.837
       _cons |   1133.579   344.0583     3.29   0.002     435.7962    1831.361
------------------------------------------------------------------------------
There is only a tiny difference in the slope (.223 versus .218) between Model 1 and Model 2.
The intercept of Model 1 (1,483), however, is quite different from the intercepts in Model 2:
2,140 for equipment and software companies (d1=1) and 1,134 for telecommunications and
electronics firms (d1=0). This result appears to support Model 2.
Figure 3 highlights the differences between Models 1 and 2 more clearly. The black line (pooled)
in the middle is the regression line of Model 1; the red line at the top is for equipment and
software companies (d1=1) in Model 2; finally, the blue line at the bottom is for
telecommunications and electronics firms (d2=1 or d1=0).
2 The dummy variable (firm type) and the regressor (net income) may or may not be correlated.
This plot shows that Model 1 ignores the group difference and thus reports a misleading
intercept. The difference in the intercept between the two groups of firms looks substantial,
while the two models have similar slopes. Consequently, Model 2, which considers fixed
group effects, seems better than the simple Model 1. Compare the goodness-of-fit statistics (e.g., F, t,
R2, and SSE) of the two models. See Sections 3.2.2 and 4.7 for formal hypothesis testing.
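As a quick check on this comparison, the incremental F test (Section 3.2.2) can be computed by hand from the Residual SS values in the two STATA outputs above. This is a minimal Python sketch using only those reported numbers; because Model 2 adds a single dummy, the F statistic should equal the squared t statistic of d1.

```python
# Residual sums of squares reported by STATA for the R&D example
sse_model1 = 83261299.1    # Model 1 (pooled), 37 error df
sse_model2 = 74175756.7    # Model 2 (adds dummy d1), 36 error df
df_num = 37 - 36           # one restriction: the d1 coefficient is zero
df_den = 36

# Incremental F statistic: loss of fit per restriction over the error variance of Model 2
f_stat = ((sse_model1 - sse_model2) / df_num) / (sse_model2 / df_den)

# t statistic of d1 in Model 2 (coefficient / standard error)
t_d1 = 1006.626 / 479.3717

print(f"F(1, 36) = {f_stat:.3f}, t^2 = {t_d1 ** 2:.3f}")
```

Both values come out at about 4.41, which is significant at the .05 level, consistent with the p-value of .043 for d1.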
The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with
dummy variables. The critical issue in LSDV is how to avoid perfect multicollinearity, the
so-called dummy variable trap. LSDV has three approaches to avoid this trap. They
produce different parameter estimates for the dummies, but their results are equivalent.
The first approach, LSDV1, drops a dummy variable as in Model 2 above. The second
approach, LSDV2, includes all dummies and, in turn, suppresses the intercept. The third approach,
LSDV3, includes the intercept and all dummies and then imposes a restriction that the sum of the
parameters of all dummies is zero. Take a look at the following functional forms to compare these
three LSDVs.
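LSDV1: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d1_i + \varepsilon_i$ or $R\&D_i = \beta_0 + \beta_1 income_i + \delta_2 d2_i + \varepsilon_i$
LSDV2: $R\&D_i = \beta_1 income_i + \delta_1 d1_i + \delta_2 d2_i + \varepsilon_i$
LSDV3: $R\&D_i = \beta_0 + \beta_1 income_i + \delta_1 d1_i + \delta_2 d2_i + \varepsilon_i$, subject to $\delta_1 + \delta_2 = 0$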
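To see that the three parameterizations describe the same fitted model, here is a minimal Python/NumPy sketch on simulated data. The data, sample size, and coefficient values are hypothetical and are not the OECD R&D data. The LSDV3 restriction is imposed by substituting delta2 = -delta1, which turns the model into ordinary OLS on the single column d1 - d2; this substitution is one simple way to fit the restricted model, not necessarily how SAS or STATA compute it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 firms in two groups (d1 = 1 for group 1, d2 = 1 for group 2)
n = 40
d1 = np.repeat([1.0, 0.0], n // 2)
d2 = 1.0 - d1
income = rng.uniform(0, 5000, n)
rnd = 1100 + 1000 * d1 + 0.2 * income + rng.normal(0, 500, n)
ones = np.ones(n)

def ols(X, y):
    """Return OLS coefficients and the sum of squared errors."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return b, float(e @ e)

# LSDV1: intercept + income + d1 (d2 dropped, so group 2 is the reference)
b1, sse1 = ols(np.column_stack([ones, income, d1]), rnd)
# LSDV2: income + d1 + d2 without an intercept
b2, sse2 = ols(np.column_stack([income, d1, d2]), rnd)
# LSDV3: intercept + income + d1 + d2 subject to delta1 + delta2 = 0
b3, sse3 = ols(np.column_stack([ones, income, d1 - d2]), rnd)
alpha_c, delta1_c = b3[0], b3[2]
delta2_c = -delta1_c

print("slopes:", b1[1], b2[0], b3[1])        # identical
print("SSE:   ", sse1, sse2, sse3)            # identical
# Group intercepts (group 1, group 2) implied by each parameterization
print("LSDV1: ", b1[0] + b1[2], b1[0])
print("LSDV2: ", b2[1], b2[2])
print("LSDV3: ", alpha_c + delta1_c, alpha_c + delta2_c)
```

The slope and SSE agree across all three fits, and every parameterization implies the same two group intercepts; only the meaning of the dummy coefficients changes.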
The main differences among these approaches lie in the meanings of the dummy variable
parameters. Each approach defines the coefficients of dummy variables in a different way.
The parameter estimates in LSDV2 are the actual intercepts of the groups, making them easy to
interpret substantively. LSDV1 reports differences from the reference point (the dropped dummy
variable). LSDV3 reports how far each group's parameter estimate is from the average group
effect. Accordingly, the null hypotheses of the t-tests differ across the three approaches. Keep in
mind that the R2 of LSDV2 is not correct. Table 3 contrasts the three LSDVs.
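Table 3. Three Approaches of Least Squares Dummy Variable Models

                       | LSDV1: Drop one dummy                          | LSDV2: Suppress the intercept      | LSDV3: Impose a restriction
Dummy included         | $\alpha^a, \delta_2^a, \dots, \delta_d^a$      | $\delta_1^*, \dots, \delta_d^*$    | $\alpha^c, \delta_1^c, \dots, \delta_d^c$
Intercept?             | Yes                                            | No                                 | Yes
All dummies?           | No (d-1)                                       | Yes (d)                            | Yes (d)
Restriction?           | No                                             | No                                 | Yes
Meaning of coefficient | Difference from the reference point (dropped dummy) | Actual group intercept        | Difference from the average group effect
Coefficients           | $\delta_i^* = \alpha^a + \delta_i^a$, $\delta_{dropped}^* = \alpha^a$ | $\delta_1^*, \delta_2^*, \dots, \delta_d^*$ | $\delta_i^* = \alpha^c + \delta_i^c$, where $\alpha^c = \frac{1}{d}\sum_i \delta_i^*$ and $\sum_i \delta_i^c = 0$
H0 of t-test           | $\delta_i^* - \delta_{dropped}^* = 0$          | $\delta_i^* = 0$                   | $\delta_i^c = \delta_i^* - \frac{1}{d}\sum_i \delta_i^* = 0$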
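The relations in Table 3 can be checked against the estimates reported in this section. Starting from the LSDV2 group intercepts (2140.205 for d1 and 1133.579 for d2), this minimal Python sketch derives the LSDV1 coefficients with d1 dropped (the SAS run below) and the LSDV3 coefficients (the constrained STATA and SAS runs below); it is plain arithmetic on those published numbers.

```python
# Group intercepts from LSDV2 (". regress rnd income d1 d2, noconstant")
delta_star = {"d1": 2140.205, "d2": 1133.579}

# LSDV1 with d1 dropped: intercept = reference group, dummy = difference from it
alpha_a = delta_star["d1"]
delta_a_d2 = delta_star["d2"] - alpha_a

# LSDV3: intercept = average group effect, dummies = deviations from that average
alpha_c = sum(delta_star.values()) / len(delta_star)
delta_c = {name: value - alpha_c for name, value in delta_star.items()}

print(alpha_a, round(delta_a_d2, 3))
print(round(alpha_c, 3), {name: round(v, 3) for name, v in delta_c.items()})
```

The results (2140.205 and -1006.626 for LSDV1; 1636.892, 503.313, and -503.313 for LSDV3) match the SAS and STATA outputs up to rounding.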
The SAS REG procedure, STATA .regress command, LIMDEP Regress$ command, and
SPSS Regression command all fit OLS, LSDV1, and LSDV2; all but SPSS can also fit LSDV3.
Let us estimate the three LSDVs using SAS and STATA.
2.5.1 LSDV 1 without a Dummy
LSDV 1 drops a dummy variable. The intercept is the actual parameter estimate of the dropped
dummy variable. The coefficient of the dummy included means how far its parameter estimate
is away from the reference point or baseline (i.e., the intercept).
Here we include d2 instead of d1 to see how a different reference point changes the result.
Check the signs of the included dummy coefficient and of the intercept. Dropping a different
dummy changes only the intercept and the dummy coefficients; the slope and the goodness-of-fit
statistics remain the same.
PROC REG DATA=masil.rnd2002;
MODEL rnd = income d2;
RUN;
Number of Observations Read                   50
Number of Observations Used                   39
Number of Observations with Missing Values    11

                          Analysis of Variance

                                 Sum of           Mean
Source              DF          Squares         Square    F Value    Pr > F
Model                2         24987949       12493974       6.06    0.0054
Error               36         74175757        2060438
Corrected Total     38         99163706

Root MSE             1435.42248    R-Square     0.2520
Dependent Mean       2023.56410    Adj R-Sq     0.2104
Coeff Var              70.93536

                          Parameter Estimates

                       Parameter       Standard
Variable     DF         Estimate          Error    t Value    Pr > |t|
Intercept     1       2140.20468      434.48460       4.93      <.0001
income        1          0.21801        0.08032       2.71      0.0101
d2            1      -1006.62593      479.37174      -2.10      0.0428
LSDV2 includes all dummy variables and suppresses the intercept. The STATA .regress
command has the noconstant option to fit LSDV2. The coefficients of the dummies are the actual
group intercepts; thus, you do not need to compute them. This LSDV, however, reports an
incorrect R2 (.7135 instead of .2520).
. regress rnd income d1 d2, noconstant

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  3,    36) =   29.88
       Model |   184685604     3  61561868.1           Prob > F      =  0.0000
    Residual |  74175756.7    36  2060437.69           R-squared     =  0.7135
-------------+------------------------------           Adj R-squared =  0.6896
       Total |   258861361    39  6637470.79           Root MSE      =  1435.4

------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2180066   .0803248     2.71   0.010     .0551004    .3809128
          d1 |   2140.205   434.4846     4.93   0.000     1259.029     3021.38
          d2 |   1133.579   344.0583     3.29   0.002     435.7962    1831.361
------------------------------------------------------------------------------
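Why is the LSDV2 R2 wrong? Without an intercept, the Total SS is measured around zero rather than around the mean of rnd, so the reported R2 is an uncentered one. This minimal Python sketch uses only figures reported in the outputs of this section (the Residual SS, both Total SS values, and the Dependent Mean 2023.56410 from the SAS output) to reproduce both numbers.

```python
n = 39
mean_rnd = 2023.56410           # Dependent Mean in the SAS output
sse = 74175756.7                # Residual SS, identical in all three LSDVs
ss_corrected = 99163705.6       # Total SS around the mean (Models 1 and 2)
ss_uncentered = 258861361       # Total SS around zero (LSDV2 output)

r2_uncentered = 1 - sse / ss_uncentered    # what the noconstant fit reports
r2_centered = 1 - sse / ss_corrected       # the correct R2 of the LSDV model

# The two totals differ by n * mean^2 (up to rounding of the reported mean)
gap = ss_uncentered - ss_corrected - n * mean_rnd ** 2

print(f"{r2_uncentered:.4f} vs {r2_centered:.4f}; gap = {gap:.1f}")
```

The uncentered value reproduces .7135, while the centered value reproduces the .2520 reported by LSDV1 and LSDV3.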
LSDV3 includes the intercept and all dummies and then imposes a restriction on the model:
the sum of all dummy parameters is zero. The STATA .constraint command defines a
constraint, and the .cnsreg command fits a constrained OLS using the constraint() option.
The number in parentheses is the constraint number defined in the .constraint command.
. constraint 1 d1 + d2 = 0
. cnsreg rnd income d1 d2, constraint(1)

Constrained linear regression                          Number of obs =      39
                                                       F(  2,    36) =    6.06
                                                       Prob > F      =  0.0054
                                                       Root MSE      =  1435.4

 ( 1)  d1 + d2 = 0
------------------------------------------------------------------------------
         rnd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .2180066   .0803248     2.71   0.010     .0551004    .3809128
          d1 |    503.313   239.6859     2.10   0.043     17.20749    989.4184
          d2 |   -503.313   239.6859    -2.10   0.043    -989.4184   -17.20749
       _cons |   1636.892   310.0438     5.28   0.000     1008.094     2265.69
------------------------------------------------------------------------------
Number of Observations Read                   50
Number of Observations Used                   39
Number of Observations with Missing Values    11

                          Analysis of Variance

                                 Sum of           Mean
Source              DF          Squares         Square    F Value    Pr > F
Model                2         24987949       12493974       6.06    0.0054
Error               36         74175757        2060438
Corrected Total     38         99163706

Root MSE             1435.42248    R-Square     0.2520
Dependent Mean       2023.56410    Adj R-Sq     0.2104
Coeff Var              70.93536

                          Parameter Estimates

                       Parameter       Standard
Variable     DF         Estimate          Error    t Value    Pr > |t|
Intercept     1       1636.89172      310.04381       5.28      <.0001
income        1          0.21801        0.08032       2.71      0.0101
d1            1        503.31297      239.68587       2.10      0.0428
d2            1       -503.31297      239.68587      -2.10      0.0428
RESTRICT     -1      1.81899E-12              0        .           .
Table 4 compares how SAS, STATA, LIMDEP, and SPSS fit the LSDVs. SPSS is not able
to fit LSDV3. In LIMDEP, b(k) in the Cls: subcommand denotes the kth coefficient in the Rhs
list, counting ONE as the first; the coefficients of d1 and d2 are therefore b(3) and b(4). In SPSS,
pay attention to the /ORIGIN option for LSDV2.

Table 4. Estimating Three LSDVs Using SAS, STATA, LIMDEP, and SPSS

SAS
  LSDV1:  PROC REG; MODEL rnd = income d2; RUN;
  LSDV2:  PROC REG; MODEL rnd = income d1 d2 /NOINT; RUN;
  LSDV3:  PROC REG; MODEL rnd = income d1 d2; RESTRICT d1 + d2 = 0; RUN;

STATA
  LSDV1:  . regress rnd income d2
  LSDV2:  . regress rnd income d1 d2, noconstant
  LSDV3:  . constraint 1 d1 + d2 = 0
          . cnsreg rnd income d1 d2, constraint(1)

LIMDEP
  LSDV1:  REGRESS; Lhs=rnd; Rhs=ONE,income, d2$
  LSDV2:  REGRESS; Lhs=rnd; Rhs=income, d1, d2$
  LSDV3:  REGRESS; Lhs=rnd; Rhs=ONE,income, d1, d2; Cls: b(3)+b(4)=0$

SPSS
  LSDV1:  REGRESSION
            /MISSING LISTWISE
            /STATISTICS COEFF R ANOVA
            /CRITERIA=PIN(.05) POUT(.10)
            /NOORIGIN
            /DEPENDENT rnd
            /METHOD=ENTER income d2.
  LSDV2:  REGRESSION
            /MISSING LISTWISE
            /STATISTICS COEFF R ANOVA
            /CRITERIA=PIN(.05) POUT(.10)
            /ORIGIN
            /DEPENDENT rnd
            /METHOD=ENTER income d1 d2.
  LSDV3:  N/A
There are several strategies for estimating fixed effect models. The least squares dummy
variable model (LSDV) uses dummy variables, whereas the within effect model does not. These
strategies produce identical slopes for the non-dummy independent variables. The between
effect model also does not use dummies, but it produces different parameter estimates. Each
strategy has its pros and cons (Table 5).
3.2.1 Estimations: LSDV, Within Effect, and Between Effect Model
As discussed in Chapter 2, LSDV is widely used because it is relatively easy to estimate and
interpret substantively. LSDV, however, becomes problematic when there are many
groups or subjects in the panel data. If T is fixed and $n \to \infty$, only the coefficients of the regressors
are consistent. The coefficients of the dummy variables, $\alpha + \mu_i$, are not consistent, since the
number of these parameters increases as n increases (Baltagi 2001). This is the so-called
incidental parameter problem. In this situation, LSDV becomes impractical, calling for another
strategy, the within effect model.
The within effect model does not use dummy variables but uses deviations from group means.
Thus, this model is the OLS of $(y_{it} - \bar{y}_{i\cdot}) = \beta'(x_{it} - \bar{x}_{i\cdot}) + (\varepsilon_{it} - \bar{\varepsilon}_{i\cdot})$ without an intercept.3 You
do not need to worry about the incidental parameter problem anymore. The parameter
estimates of the regressors are identical to those of LSDV. The within effect model, however, has
several disadvantages.
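Table 5. Three Strategies for Fixed Effect Models

                         | LSDV1                                        | Within Effect                                                                                  | Between Effect
Functional form          | $y_i = i\alpha_i + X_i\beta + \varepsilon_i$ | $y_{it} - \bar{y}_{i\cdot} = (x_{it} - \bar{x}_{i\cdot})'\beta + \varepsilon_{it} - \bar{\varepsilon}_{i\cdot}$ | $\bar{y}_{i\cdot} = \alpha + \bar{x}_{i\cdot}'\beta + \varepsilon_i$
Dummy                    | Yes                                          | No                                                                                             | No
Dummy coefficient        | Presented                                    | Need to be computed                                                                            | N/A
Transformation           | No                                           | Deviation from the group means                                                                 | Group means
Intercept (estimation)   | Yes                                          | No                                                                                             | Yes
R2                       | Correct                                      | Incorrect                                                                                      |
SSE                      | Correct                                      | Correct                                                                                        |
MSE                      | Correct                                      | Smaller                                                                                        |
Standard error of β      | Correct                                      | Incorrect (smaller)                                                                            |
DF error                 | nT-n-k                                       | nT-k (larger)                                                                                  | n-K
Observations             | nT                                           | nT                                                                                             | n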
Since this model does not report dummy coefficients, you need to compute them using the
formula $d_g^* = \bar{y}_g - \beta'\bar{x}_g$. Since no dummy is used, the within effect model has larger degrees
of freedom for error, resulting in a smaller MSE (mean square error) and incorrect (smaller)
standard errors of the parameter estimates. Thus, you have to adjust the standard errors using the
formula

$$se_k^* = se_k\sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k\sqrt{\frac{nT-k}{nT-n-k}}.$$

Finally, the R2 of the within effect model is not correct because the intercept is suppressed.
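The within effect model and its corrections can be checked against LSDV1 numerically. This is a minimal Python/NumPy sketch on a simulated balanced panel (six groups, 15 periods, one regressor; all data and effect sizes are hypothetical). It runs OLS on deviations from group means, recovers the group intercepts with the formula above, rescales the standard error by the degrees-of-freedom ratio, and compares everything with an LSDV1 fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, k = 6, 15, 1                   # hypothetical balanced panel with one regressor
g = np.repeat(np.arange(n), T)       # group index of each observation
x = rng.normal(size=n * T)
mu = rng.normal(scale=2.0, size=n)   # hypothetical fixed group effects
y = 1.0 + mu[g] + 0.8 * x + rng.normal(scale=0.5, size=n * T)

def gmean(v):
    """Return the n group means of v."""
    return np.bincount(g, weights=v) / np.bincount(g)

# Within effect model: OLS of deviations from group means, no intercept
xw, yw = x - gmean(x)[g], y - gmean(y)[g]
beta_w = (xw @ yw) / (xw @ xw)
e = yw - beta_w * xw
se_w = np.sqrt((e @ e) / (n * T - k) / (xw @ xw))   # uses df_error = nT - k

# Group intercepts recovered as d_g* = ybar_g - beta * xbar_g
d_star = gmean(y) - beta_w * gmean(x)

# Standard error adjusted to the LSDV degrees of freedom, nT - n - k
se_adj = se_w * np.sqrt((n * T - k) / (n * T - n - k))

# LSDV1 for comparison: intercept, x, and dummies for groups 1..n-1
D = (g[:, None] == np.arange(1, n)).astype(float)
X = np.column_stack([np.ones(n * T), x, D])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ b
se_lsdv = np.sqrt((r @ r) / (n * T - n - k) * np.linalg.inv(X.T @ X)[1, 1])
lsdv_intercepts = np.r_[b[0], b[0] + b[2:]]

print(beta_w, b[1])                  # same slope
print(se_w, se_adj, se_lsdv)         # unadjusted within SE is too small
```

The slopes and residuals coincide, the unadjusted within standard error is smaller than the LSDV one, and the adjusted value matches it.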
The between group effect model, also called the group mean regression, uses the group means of
the dependent and independent variables. It runs OLS of $\bar{y}_{i\cdot} = \alpha + \bar{x}_{i\cdot}'\beta + \varepsilon_i$. The number of
observations decreases to n. This model uses aggregated data to test effects between groups (or
individuals), assuming no group and time effects. Table 5 contrasts LSDV, the within effect
model, and the between group model. In the two-way fixed effect model, LSDV2 and the between
effect model are not valid.
3 To fit the within effect model, follow three steps: 1) compute the group means of the dependent and independent
variables; 2) transform the variables to get deviations of individual values from the group means; 3) run OLS with the
transformed variables without the intercept.
The null hypothesis is that all dummy parameters except one are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$.
This hypothesis is tested by the F test, which is based on the loss of goodness of fit. The robust
model in the following formula is LSDV, and the efficient model is the pooled regression.4

$$\frac{(e'e_{Efficient} - e'e_{Robust})/(n-1)}{(e'e_{Robust})/(nT-n-k)} = \frac{(R^2_{Robust} - R^2_{Efficient})/(n-1)}{(1 - R^2_{Robust})/(nT-n-k)} \sim F(n-1,\; nT-n-k)$$
If the null hypothesis is rejected, you may conclude that the fixed group effect model is better
than the pooled OLS model.
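A minimal Python sketch of this F test, applied to the airline data with figures reported later in this document: the Residual SS of the LSDV1 regression (0.292622872 with 81 df) and the pooled OLS regression, whose SSE is approximated here as Root MSE squared times its error df (0.12461² × 86) because its ANOVA table is not shown. Here n = 6 airlines, T = 15 years, and k = 3 regressors (output, fuel, and load).

```python
def fixed_group_f(sse_efficient, sse_robust, n, T, k):
    """Incremental F statistic for H0: mu_1 = ... = mu_(n-1) = 0."""
    df1 = n - 1
    df2 = n * T - n - k
    f = ((sse_efficient - sse_robust) / df1) / (sse_robust / df2)
    return f, df1, df2

sse_lsdv = 0.292622872             # robust model: LSDV1
sse_pooled = 0.12461 ** 2 * 86     # efficient model: pooled OLS (approximate)

f, df1, df2 = fixed_group_f(sse_pooled, sse_lsdv, n=6, T=15, k=3)
print(f"F({df1}, {df2}) = {f:.2f}")
```

The statistic, about 57.7, far exceeds conventional critical values of F(5, 81), so the null hypothesis of no fixed group effects would be rejected.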
3.2.3 Fixed Time Effect and Two-way Fixed Effect Models
For the fixed time effect model, you need to switch n and T, and i and t, in the formulas.

Model: $y_{it} = \alpha + \tau_t + \beta' X_{it} + \varepsilon_{it}$

$H_0: \tau_1 = \dots = \tau_{T-1} = 0$

F-test: $$\frac{(e'e_{Efficient} - e'e_{Robust})/(T-1)}{(e'e_{Robust})/(Tn-T-k)} \sim F(T-1,\; Tn-T-k)$$

$$se_k^* = se_k\sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k\sqrt{\frac{Tn-k}{Tn-T-k}}$$
The fixed group and time effect model uses slightly different formulas. The within effect model
of this two-way fixed model has four approaches for LSDV (see 6.1 for details).

Model: $y_{it} = \alpha + \mu_i + \tau_t + \beta' X_{it} + \varepsilon_{it}$

$$se_k^* = se_k\sqrt{\frac{df_{error}^{Within}}{df_{error}^{LSDV}}} = se_k\sqrt{\frac{nT-k}{nT-n-T-k+1}}$$
4 When comparing fixed effect and random effect models, the fixed effect estimates are considered the robust
estimates, and the random effect estimates the efficient estimates.
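When Ω is known (given), GLS based on the true variance components is BLUE, and all the
feasible GLS estimators considered are asymptotically efficient as either n or T approaches
infinity (Baltagi 2001). The Ω matrix looks like

$$\Omega_{T\times T} = \begin{pmatrix} \sigma_u^2+\sigma_v^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2+\sigma_v^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2+\sigma_v^2 \end{pmatrix} = \sigma_v^2 I_T + \sigma_u^2 i_T i_T',$$

where $i_T$ is a T × 1 vector of ones. The correlation between two disturbances of the same group
(i = j) at different times (t ≠ s) is therefore $\sigma_u^2/(\sigma_u^2+\sigma_v^2)$. Compute $\theta = 1 - \sqrt{\frac{\sigma_v^2}{T\sigma_u^2 + \sigma_v^2}}$.6 Then transform
the variables as follows.

$$y^*_{it} = y_{it} - \theta\bar{y}_{i\cdot}, \quad x^*_{it} = x_{it} - \theta\bar{x}_{i\cdot}, \quad \alpha^* = 1 - \theta$$

Finally, run OLS with the transformed variables: $y^*_{it} = \alpha^* + x^{*\prime}_{it}\beta^* + \varepsilon^*_{it}$. Since Ω is often
unknown, FGLS is used more frequently than GLS.

6 If θ = 0, run pooled OLS. If θ = 1 and $\sigma_v^2 = 0$, then run the within effect model.

3.3.2 Feasible Generalized Least Squares (FGLS)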
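FGLS replaces the unknown variance components with estimates:

$$\hat\theta = 1 - \sqrt{\frac{\hat\sigma_v^2}{T\hat\sigma_u^2 + \hat\sigma_v^2}} = 1 - \sqrt{\frac{\hat\sigma_v^2}{T\hat\sigma_{between}^2}}.$$

The $\hat\sigma_v^2$ is derived from the SSE (sum of squares due to error) of the within effect model or
from the deviations of residuals from the group means of residuals:

$$\hat\sigma_v^2 = \frac{SSE_{within}}{nT-n-k} = \frac{e'e_{within}}{nT-n-k} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(v_{it} - \bar{v}_{i\cdot})^2}{nT-n-k}$$

The $\hat\sigma_u^2$ comes from the between effect model (group mean regression):

$$\hat\sigma_u^2 = \hat\sigma_{between}^2 - \frac{\hat\sigma_v^2}{T}, \quad \text{where } \hat\sigma_{between}^2 = \frac{SSE_{between}}{n-K}.$$

Next, transform the variables using $\hat\theta$ and then run OLS: $y^*_{it} = \alpha^* + x^{*\prime}_{it}\beta^* + \varepsilon^*_{it}$, where

$$y^*_{it} = y_{it} - \hat\theta\bar{y}_{i\cdot}, \quad x^*_{it} = x_{it} - \hat\theta\bar{x}_{i\cdot}, \quad \alpha^* = 1 - \hat\theta.$$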
The estimation of the two-way random effect model is skipped here, since it is complicated.
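The one-way FGLS steps above can be sketched in Python/NumPy as follows. This is a minimal illustration on a simulated balanced panel with hypothetical random group effects, not a replacement for TSCSREG, PANEL, or .xtreg with the re option (which may use other variance-component estimators). K = k + 1 counts the intercept and slope of the between regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, k = 50, 10, 1                  # hypothetical balanced panel, one regressor
K = k + 1                            # parameters in the between model (intercept + slope)
g = np.repeat(np.arange(n), T)
x = rng.normal(size=n * T)
u = rng.normal(scale=1.0, size=n)    # hypothetical random group effects
y = 2.0 + 0.5 * x + u[g] + rng.normal(scale=1.0, size=n * T)

def gmean(v):
    """Return the n group means of v."""
    return np.bincount(g, weights=v) / np.bincount(g)

def ols(X, yy):
    """Return OLS coefficients and SSE."""
    b, *_ = np.linalg.lstsq(X, yy, rcond=None)
    e = yy - X @ b
    return b, float(e @ e)

# Step 1: sigma_v^2 from the within effect model
xw, yw = x - gmean(x)[g], y - gmean(y)[g]
_, sse_within = ols(xw[:, None], yw)
s2_v = sse_within / (n * T - n - k)

# Step 2: sigma_between^2 and sigma_u^2 from the between effect model
_, sse_between = ols(np.column_stack([np.ones(n), gmean(x)]), gmean(y))
s2_between = sse_between / (n - K)
s2_u = s2_between - s2_v / T

# Step 3: theta, quasi-demeaning, and OLS on the transformed data
theta = 1 - np.sqrt(s2_v / (T * s2_between))
X_star = np.column_stack([np.full(n * T, 1 - theta), x - theta * gmean(x)[g]])
y_star = y - theta * gmean(y)[g]
b_fgls, _ = ols(X_star, y_star)      # [intercept, slope]

print(s2_v, s2_u, theta, b_fgls)
```

Because the constant column is transformed into 1 - θ, the coefficient on that column is the intercept itself. In small samples the estimate of σ_u² can turn out negative, which this sketch does not guard against.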
3.3.3 Testing Random Effects (LM test)
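The null hypothesis is that the cross-sectional variance component is zero, $H_0: \sigma_u^2 = 0$. Breusch
and Pagan (1980) developed the Lagrange multiplier (LM) test (Greene 2003; Judge et al.
1988). In the following formulas, $\bar{e}$ is the n × 1 vector of the group-specific means of the pooled
regression residuals, D is the nT × n matrix of group dummy variables, and e'e is the SSE of the
pooled OLS regression. The LM statistic is distributed as chi-squared with one degree of freedom.

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{e'DD'e}{e'e} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{T^2\bar{e}'\bar{e}}{e'e} - 1\right]^2 \sim \chi^2(1)$$

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T} e_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{T} e_{it}^2} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}(T\bar{e}_{i\cdot})^2}{\sum_{i=1}^{n}\sum_{t=1}^{T} e_{it}^2} - 1\right]^2 \sim \chi^2(1)$$

The two-way random effect model has the null hypothesis that both the group and the time
variance components are zero, $H_0: \sigma_u^2 = 0$ and $\sigma_v^2 = 0$. The LM test combines the two one-way
tests for group and time:

$$LM_{uv} = LM_u + LM_v \sim \chi^2(2).$$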
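The one-way LM statistic needs only the pooled OLS residuals and the group index. A minimal Python/NumPy sketch, assuming a balanced panel; the simulated data in the usage part (with deliberately strong group effects) are hypothetical.

```python
import numpy as np

def breusch_pagan_lm(e, g, T):
    """Breusch-Pagan LM statistic for H0: sigma_u^2 = 0 (balanced panel).

    e : pooled OLS residuals, length nT
    g : integer group index (0..n-1) of each residual
    T : number of time periods per group
    """
    e = np.asarray(e, dtype=float)
    nT = e.size
    group_sums = np.bincount(g, weights=e)          # sum over t of e_it = T * ebar_i
    ratio = (group_sums @ group_sums) / (e @ e)     # e'DD'e / e'e
    return nT / (2 * (T - 1)) * (ratio - 1) ** 2

# Hypothetical example: pooled OLS on simulated data with strong group effects
rng = np.random.default_rng(3)
n, T = 30, 8
g = np.repeat(np.arange(n), T)
x = rng.normal(size=n * T)
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)[g] + rng.normal(size=n * T)
X = np.column_stack([np.ones(n * T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
lm = breusch_pagan_lm(y - X @ b, g, T)
print(lm, lm > 3.841)    # 3.841 is the 5% critical value of chi-squared(1)
```

Computing the group sums with bincount avoids building the nT × n dummy matrix D explicitly.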
The Hausman specification test compares fixed versus random effects under the null
hypothesis that the individual effects are uncorrelated with the other regressors in the model
(Hausman 1978). If they are correlated (H0 is rejected), a random effect model produces biased
estimators, violating one of the Gauss-Markov assumptions; so a fixed effect model is preferred.
Hausman's essential result is that the covariance of an efficient estimator with its difference
from an inefficient estimator is zero (Greene 2003).

$$m = (b_{Robust} - b_{Efficient})'\,\hat\Sigma^{-1}(b_{Robust} - b_{Efficient}) \sim \chi^2(k),$$

where $\hat\Sigma = Var[b_{Robust} - b_{Efficient}] = Var(b_{Robust}) - Var(b_{Efficient})$ is the difference between the estimated
covariance matrix of the parameter estimates in the LSDV model (robust) and that of the
random effect model (efficient). Note that the intercept and dummy variables must be
excluded from the computation.
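The Hausman statistic is a quadratic form in the difference of the slope vectors. A minimal Python/NumPy sketch; the two-slope estimates and covariance matrices in the usage part are hypothetical numbers, not results from this document.

```python
import numpy as np

def hausman(b_robust, b_efficient, v_robust, v_efficient):
    """Hausman statistic m = d' (V_robust - V_efficient)^(-1) d, d = b_robust - b_efficient.

    Pass slope estimates only; the intercept and dummy coefficients are excluded.
    """
    d = np.asarray(b_robust, dtype=float) - np.asarray(b_efficient, dtype=float)
    sigma = np.asarray(v_robust, dtype=float) - np.asarray(v_efficient, dtype=float)
    return float(d @ np.linalg.solve(sigma, d))

# Hypothetical fixed effect (robust) and random effect (efficient) slope estimates
m = hausman([1.0, 2.0], [0.9, 2.1],
            np.diag([0.02, 0.03]), np.diag([0.01, 0.01]))
print(m)   # compare with the chi-squared(2) 5% critical value, 5.991
```

Solving the linear system rather than inverting the difference matrix is numerically safer. In finite samples the difference matrix may fail to be positive definite, in which case the statistic can be negative and should be interpreted with care.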
What is poolability? It asks whether slopes are the same across groups or over time. Thus, the null
hypothesis of the poolability test is $H_0: \beta_{ik} = \beta_k$. Remember that slopes remain constant in
fixed and random effect models; only intercepts and error variances matter.

The poolability test is undertaken under the assumption that $\varepsilon \sim N(0, \sigma^2 I_{nT})$. This test uses the F
statistic

$$F_{obs} = \frac{\left(e'e - \sum_{i=1}^{n} e_i'e_i\right)/\left((n-1)K\right)}{\sum_{i=1}^{n} e_i'e_i/\left(n(T-K)\right)} \sim F\left[(n-1)K,\; n(T-K)\right],$$

where e'e is the SSE of the pooled OLS and $e_i'e_i$ is the SSE of the OLS regression for group i. If the null
hypothesis is rejected, the panel data are not poolable. In this case, you may turn to the random
coefficient model or the hierarchical regression model.

Similarly, the null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The F-test is

$$F_{obs} = \frac{\left(e'e - \sum_{t=1}^{T} e_t'e_t\right)/\left((T-1)K\right)}{\sum_{t=1}^{T} e_t'e_t/\left(T(n-K)\right)} \sim F\left[(T-1)K,\; T(n-K)\right],$$

where $e_t'e_t$ is the SSE of the OLS regression at time t.
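The group version of the poolability test compares the pooled SSE with the sum of the group-by-group SSEs. A minimal Python/NumPy sketch, assuming a balanced panel and a design matrix X whose K columns include the intercept; the simulated panel, in which the slope differs across three groups, is hypothetical.

```python
import numpy as np

def ols_sse(X, y):
    """Sum of squared OLS residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

def poolability_f(y, X, g):
    """F test of H0: beta_ik = beta_k for all groups (balanced panel).

    X has K columns, including the intercept column.
    """
    groups = np.unique(g)
    n, K = groups.size, X.shape[1]
    T = y.size // n
    sse_pooled = ols_sse(X, y)
    sse_groups = sum(ols_sse(X[g == i], y[g == i]) for i in groups)
    df1, df2 = (n - 1) * K, n * (T - K)
    f = ((sse_pooled - sse_groups) / df1) / (sse_groups / df2)
    return f, df1, df2

# Hypothetical panel in which the slope differs across three groups
rng = np.random.default_rng(4)
n, T = 3, 20
g = np.repeat(np.arange(n), T)
x = rng.normal(size=n * T)
slopes = np.array([0.5, 1.0, 2.0])
y = 1.0 + slopes[g] * x + rng.normal(scale=0.3, size=n * T)
X = np.column_stack([np.ones(n * T), x])
f, df1, df2 = poolability_f(y, X, g)
print(f"F({df1}, {df2}) = {f:.1f}")
```

For the test over time, pass the time index instead of the group index; the same code then computes the second F statistic, with n and T exchanging roles.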
                                                       Number of obs =      90
                                                       F(  3,    86) = 2419.34
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9883
                                                       Adj R-squared =  0.9879
                                                       Root MSE      =  .12461

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8827385   .0132545    66.60   0.000     .8563895    .9090876
        fuel |    .453977   .0203042    22.36   0.000     .4136136    .4943404
        load |   -1.62751    .345302    -4.71   0.000    -2.313948   -.9410727
       _cons |   9.516923   .2292445    41.51   0.000       9.0612    9.972645
------------------------------------------------------------------------------
LSDV1 drops a dummy variable to identify the model. LSDV1 produces correct ANOVA
information, goodness of fit, parameter estimates, and standard errors. Consequently, this
approach is commonly used in practice. LSDV1 produces six regression equations for the six groups
(airlines).
Group1: cost = 9.706 + .919*output + .417*fuel - 1.070*load
Group2: cost = 9.665 + .919*output + .417*fuel - 1.070*load
Group3: cost = 9.497 + .919*output + .417*fuel - 1.070*load
Group4: cost = 9.891 + .919*output + .417*fuel - 1.070*load
Group5: cost = 9.730 + .919*output + .417*fuel - 1.070*load
Group6: cost = 9.793 + .919*output + .417*fuel - 1.070*load
In SAS, the REG procedure fits the OLS regression model. Let us drop the last dummy, g6, which
serves as the reference point.
PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read    90
Number of Observations Used    90

                          Analysis of Variance

                                 Sum of           Mean
Source              DF          Squares         Square    F Value    Pr > F
Model                8        113.74827       14.21853    3935.79    <.0001
Error               81          0.29262        0.00361
Corrected Total     89        114.04089

Root MSE              0.06011    R-Square     0.9974
Dependent Mean       13.36561    Adj R-Sq     0.9972
Coeff Var             0.44970

                          Parameter Estimates

                       Parameter       Standard
Variable     DF         Estimate          Error    t Value    Pr > |t|
Intercept     1          9.79300        0.26366      37.14      <.0001
g1            1         -0.08706        0.08420      -1.03      0.3042
g2            1         -0.12830        0.07573      -1.69      0.0941
g3            1         -0.29598        0.05002      -5.92      <.0001
g4            1          0.09749        0.03301       2.95      0.0041
g5            1         -0.06301        0.02389      -2.64      0.0100
output        1          0.91928        0.02989      30.76      <.0001
fuel          1          0.41749        0.01520      27.47      <.0001
load          1         -1.07040        0.20169      -5.31      <.0001
Note that the parameter estimate of g6 is reported as the intercept (9.793). The other dummy
parameter estimates are measured relative to this reference point. The actual intercept of group 1,
for example, is computed as 9.706 = 9.793 + (-.087)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 +
(-.0630)*0, where 9.793 is the reference point.
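The same computation for all six airlines, as a short Python sketch using the SAS estimates above (plain arithmetic; the dictionary layout is only for illustration):

```python
# LSDV1 estimates from the SAS output above (g6 dropped, so the intercept is group 6)
intercept = 9.79300
dummies = {1: -0.08706, 2: -0.12830, 3: -0.29598, 4: 0.09749, 5: -0.06301}

# Actual group intercept = reference point + dummy coefficient
intercepts = {group: intercept + coef for group, coef in dummies.items()}
intercepts[6] = intercept            # the reference group itself

for group in sorted(intercepts):
    print(f"Group{group}: {intercepts[group]:.4f}")
```

The results agree, up to rounding, with the Group1 through Group6 equations above and with the intercept and dummy coefficients that STATA reports when g1 is dropped instead.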
STATA has the .regress command for OLS regression (LSDV).
. regress cost g1-g5 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0870617   .0841995    -1.03   0.304    -.2545924     .080469
          g2 |  -.1282976   .0757281    -1.69   0.094    -.2789728    .0223776
          g3 |  -.2959828   .0500231    -5.92   0.000     -.395513   -.1964526
          g4 |    .097494   .0330093     2.95   0.004     .0318159    .1631721
          g5 |   -.063007   .0238919    -2.64   0.010    -.1105443   -.0154697
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
------------------------------------------------------------------------------
Now, run the LIMDEP Regress$ command to fit LSDV1. Do not forget to include ONE in Rhs
for the intercept.
--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,OUTPUT,FUEL,LOAD$
+-----------------------------------------------------------------------+
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared= .997434, Adjusted R-squared =          .99718 |
| Model test: F[  8,    81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =   -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=    -2.691 |
| Autocorrel: Durbin-Watson Statistic =   1.02645,   Rho =      .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant   9.793021272       .26366104       37.142   .0000
 G1        -.8707201949E-01   .84199161E-01   -1.034   .3042    .16666667
 G2        -.1283060033       .75727781E-01   -1.694   .0940    .16666667
 G3        -.2959885994       .50022855E-01   -5.917   .0000    .16666667
 G4         .9749253376E-01   .33009146E-01    2.954   .0041    .16666667
 G5        -.6300770422E-01   .23891796E-01   -2.637   .0100    .16666667
 OUTPUT     .9192881432       .29889967E-01   30.756   .0000   -1.1743092
 FUEL       .4174910457       .15199071E-01   27.468   .0000   12.770359
 LOAD     -1.070395015        .20168924       -5.307   .0000    .56046016
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
What if you drop a different dummy variable, say g1, instead of g6? Because a different
reference point is used, you will get a different intercept and different dummy coefficients. Other
statistics, such as the goodness-of-fit measures and the slopes of output, fuel, and load, remain unchanged.
. regress cost g2-g6 output fuel load // LSDV1 dropping g1

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
          g3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
          g4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
          g5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
          g6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------
If you have not created dummy variables, take advantage of the .xi prefix command (footnote 7). Note that STATA by default drops the first dummy variable, whereas the SAS TSCSREG and PANEL procedures (section 4.5.2) drop the last one.
. xi: regress cost i.airline output fuel load
i.airline         _Iairline_1-6

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Iairline_2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
 _Iairline_3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
 _Iairline_4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
 _Iairline_5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
 _Iairline_6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------
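LSDV2 reports the actual intercept of each group as its dummy coefficient. Because LSDV2 suppresses the overall intercept, SAS and STATA compute R2 and F from the uncorrected total sum of squares, so these statistics are incorrect (inflated).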
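Why the no-intercept statistics are inflated can be seen from the sums of squares alone. The Python sketch below assumes the figures printed for the airline data in this section: model SS 16191.3043 and residual SS .292622872 with 81 residual degrees of freedom from LSDV2, and the corrected total SS of 114.040893 from LSDV1. It computes R2 and F once against the uncorrected total (the sum of y squared) and once against the corrected total (deviations from the mean).

```python
# Sums of squares reported for the airline LSDV models.
SSE = 0.292622872              # residual sum of squares (same in every LSDV)
DF_RESID = 81                  # nT - n - k = 90 - 6 - 3
SST_UNCORRECTED = 16191.5969   # sum of y^2, used when the intercept is suppressed
SST_CORRECTED = 114.040893     # sum of (y - ybar)^2, used in LSDV1
DF_MODEL_NOINT = 9             # all nine regressors count as model terms
DF_MODEL_CORRECTED = 8         # n - 1 + k terms beyond the common mean


def r_squared(sse, sst):
    """R2 = 1 - SSE / SST for the chosen definition of the total sum of squares."""
    return 1.0 - sse / sst


def f_stat(sse, sst, df_model, df_resid):
    """Overall F = ((SST - SSE) / df_model) / (SSE / df_resid)."""
    return ((sst - sse) / df_model) / (sse / df_resid)


r2_uncorrected = r_squared(SSE, SST_UNCORRECTED)
r2_corrected = r_squared(SSE, SST_CORRECTED)
f_uncorrected = f_stat(SSE, SST_UNCORRECTED, DF_MODEL_NOINT, DF_RESID)
f_corrected = f_stat(SSE, SST_CORRECTED, DF_MODEL_CORRECTED, DF_RESID)

print(f"uncorrected: R2 = {r2_uncorrected:.4f}, F = {f_uncorrected:,.0f}")
print(f"corrected:   R2 = {r2_corrected:.4f}, F = {f_corrected:.2f}")
```

The uncorrected figures reproduce the implausible R2 of 1.0000 and F of about 497,985, while the corrected ones give the .9974 and 3935.79 reported by LSDV1.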
In the SAS REG procedure, use the /NOINT option to suppress the intercept. Note that the F value of 497,985 and the R2 of 1 in the output are implausible.
PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load /NOINT;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

                                 Sum of           Mean
Source              DF          Squares         Square    F Value    Pr > F
Model                9            16191     1799.03381     497985    <.0001
Error               81          0.29262        0.00361
Uncorrected Total   90            16192

Root MSE            0.06011    R-Square     1.0000
Dependent Mean     13.36561    Adj R-Sq     1.0000
Coeff Var           0.44970

                        Parameter Estimates
                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|
g1            1        9.70594       0.19312      50.26      <.0001
g2            1        9.66471       0.19898      48.57      <.0001
g3            1        9.49702       0.22496      42.22      <.0001
g4            1        9.89050       0.24176      40.91      <.0001
g5            1        9.73000       0.26094      37.29      <.0001
g6            1        9.79300       0.26366      37.14      <.0001
output        1        0.91928       0.02989      30.76      <.0001
fuel          1        0.41749       0.01520      27.47      <.0001
load          1       -1.07040       0.20169      -5.31      <.0001

7 The STATA .xi command can be used either as an ordinary command or as a prefix command, like .bysort. It creates dummy variables from the categorical variable specified in the i. term and then runs the command following the colon.
STATA uses the noconstant option to suppress the intercept. Note that noc is its abbreviation.
. regress cost g1-g6 output fuel load, noc
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  9,    81) =       .
       Model |  16191.3043     9  1799.03381           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   9.705942    .193124    50.26   0.000     9.321686     10.0902
          g2 |   9.664706    .198982    48.57   0.000     9.268794    10.06062
          g3 |   9.497021   .2249584    42.22   0.000     9.049424    9.944618
          g4 |   9.890498   .2417635    40.91   0.000     9.409464    10.37153
          g5 |   9.729997   .2609421    37.29   0.000     9.210804    10.24919
          g6 |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
------------------------------------------------------------------------------
In LIMDEP, drop ONE from the Rhs; list to suppress the intercept. Unlike SAS and STATA, LIMDEP reports the correct R2 and F even for LSDV2.
--> REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD$
+-----------------------------------------------------------------------+
| Ordinary    least squares regression    Weighting variable = none    |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444    |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared= .997434, Adjusted R-squared =            .99718 |
| Model test: F[  8,    81] =  3935.82,    Prob value =            .00000 |
| Diagnostic: Log-L =   130.0865, Restricted(b=0) Log-L =       -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=       -2.691 |
| Model does not contain ONE. R-squared and F can be negative!          |
| Autocorrel: Durbin-Watson Statistic =   1.02645,    Rho =        .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 G1           9.705949253      .19312325      50.258   .0000    .16666667
 G2           9.664715269      .19898117      48.571   .0000    .16666667
 G3           9.497032673      .22495746      42.217   .0000    .16666667
 G4           9.890513806      .24176245      40.910   .0000    .16666667
 G5           9.730013568      .26094094      37.288   .0000    .16666667
 G6           9.793021272      .26366104      37.142   .0000    .16666667
 OUTPUT       .9192881432      .29889967E-01  30.756   .0000   -1.1743092
 FUEL         .4174910457      .15199071E-01  27.468   .0000   12.770359
 LOAD        -1.070395015      .20168924      -5.307   .0000    .56046016
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
LSDV3 imposes a restriction that the sum of the dummy parameters is zero. The SAS REG
procedure uses the RESTRICT statement to impose restrictions.
PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

                          Analysis of Variance
                               Sum of          Mean
Source              DF        Squares        Square    F Value    Pr > F
Model                8      113.74827      14.21853    3935.79    <.0001
Error               81        0.29262       0.00361
Corrected Total     89      114.04089

Root MSE            0.06011    R-Square     0.9974
Dependent Mean     13.36561    Adj R-Sq     0.9972
Coeff Var           0.44970

                        Parameter Estimates
                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|
Intercept     1        9.71353       0.22964      42.30      <.0001
g1            1       -0.00759       0.04562      -0.17      0.8683
g2            1       -0.04882       0.03798      -1.29      0.2023
g3            1       -0.21651       0.01606     -13.48      <.0001
g4            1        0.17697       0.01942       9.11      <.0001
g5            1        0.01647       0.03669       0.45      0.6547
g6            1        0.07948       0.04050       1.96      0.0532
output        1        0.91928       0.02989      30.76      <.0001
fuel          1        0.41749       0.01520      27.47      <.0001
load          1       -1.07040       0.20169      -5.31      <.0001
RESTRICT     -1    3.01674E-15   1.51088E-10       0.00      1.0000*
The dummy coefficients are deviations from the average group effect (9.714). The actual intercept of group 2, for example, is 9.665 = 9.714 + (-.049). Note that the RESTRICT estimate of 3.01674E-15 above is virtually zero.
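The arithmetic behind LSDV3 can be verified directly. The Python sketch below assumes the LSDV3 estimates printed by the STATA .cnsreg command in this section (constant 9.713528 and the six dummy deviations). It checks that the deviations satisfy the sum-to-zero restriction, rebuilds each group's actual intercept as constant plus deviation, and confirms that the LSDV3 constant is the average of those intercepts.

```python
# LSDV3 estimates for the airline data (STATA .cnsreg output).
CONSTANT = 9.713528
DEVIATIONS = [-.0075859, -.0488218, -.2165069, .1769698, .0164689, .0794759]

# The restriction imposed by LSDV3: the deviations add up to (about) zero.
restriction = sum(DEVIATIONS)

# Actual group intercepts: average group effect plus each group's deviation.
intercepts = [CONSTANT + d for d in DEVIATIONS]

# Because the deviations sum to zero, the constant equals the average intercept.
average = sum(intercepts) / len(intercepts)

print(f"sum of deviations = {restriction:.1e}")
for group, value in enumerate(intercepts, start=1):
    print(f"group {group}: {value:.6f}")
print(f"average intercept = {average:.6f}")
```

The rebuilt intercepts match the LSDV2 estimates (9.705942, 9.664706, and so on) to six decimal places.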
In STATA, you have to use the .cnsreg command rather than .regress. This command, however, does not provide an ANOVA table or R2, although it does report the F statistic.
. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)
Constrained linear regression                   Number of obs =      90
                                                F(  8,    81) = 3935.79
                                                Prob > F      =  0.0000
                                                Root MSE      =  .06011

 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0075859   .0456178    -0.17   0.868    -.0983509    .0831792
          g2 |  -.0488218   .0379787    -1.29   0.202    -.1243875    .0267439
          g3 |  -.2165069   .0160624   -13.48   0.000    -.2484661   -.1845478
          g4 |   .1769698   .0194247     9.11   0.000     .1383208    .2156189
          g5 |   .0164689   .0366904     0.45   0.655    -.0565335    .0894712
          g6 |   .0794759   .0405008     1.96   0.053     -.001108    .1600597
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
------------------------------------------------------------------------------
LIMDEP uses the Cls: subcommand to impose restrictions. Again, do not forget to include ONE in the Rhs; list.
--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD;
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0$
+-----------------------------------------------------------------------+
| Linearly restricted regression                                        |
| Ordinary    least squares regression    Weighting variable = none    |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444    |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared= .997434, Adjusted R-squared =            .99718 |
|             (Note: Not using OLS. R-squared is not bounded in [0,1]   |
| Model test: F[  8,    81] =  3935.82,    Prob value =            .00000 |
| Diagnostic: Log-L =   130.0865, Restricted(b=0) Log-L =       -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=       -2.691 |
| Note, when restrictions are imposed, R-squared can be less than zero. |
| F[  1,    80] for the restrictions =       .0000, Prob =        1.0000 |
| Autocorrel: Durbin-Watson Statistic =   1.02645,    Rho =        .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     12.12205614      .27886962      43.469   .0000
 G1          -2.416106889      .89836871E-01 -26.894   .0000    .16666667
 G2          -2.457340873      .82929154E-01 -29.632   .0000    .16666667
 G3          -2.625023469      .56175656E-01 -46.729   .0000    .16666667
 G4          -2.231542336      .41557714E-01 -53.697   .0000    .16666667
 G5          -2.392042574      .29995908E-01 -79.746   .0000    .16666667
 G6          -2.329034870      .33569388E-01 -69.380   .0000    .16666667
 OUTPUT       .9192881432      .29889967E-01  30.756   .0000   -1.1743092
 FUEL         .4174910457      .15199071E-01  27.468   .0000   12.770359
 LOAD        -1.070395015      .20168924      -5.307   .0000    .56046016
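LSDV3 in LIMDEP reports different dummy coefficients here. The reason is that b(1) refers to the first variable in the Rhs; list, ONE, so the Cls: restriction above actually forces the constant plus the coefficients of G1 through G5 to sum to zero (12.122 - 2.416 - 2.457 - 2.625 - 2.232 - 2.392 = 0), not the six dummy coefficients. Writing the restriction as b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0 should impose the intended LSDV3 restriction. In either case you can recover the actual group intercepts in the same way as in SAS and STATA. The actual intercept of group 3, for example, is 9.497 = 12.122 + (-2.625).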
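The LIMDEP estimates above can be checked in a few lines, assuming (consistent with these numbers) that b(k) denotes the k-th coefficient in the order of the Rhs; list. The Python sketch below shows that the restriction LIMDEP satisfied involves ONE and G1 through G5, that the six dummy coefficients do not sum to zero, and that the constant plus each dummy still gives the group intercepts reported by LSDV2.

```python
# LIMDEP restricted estimates, in Rhs order: ONE, G1, G2, G3, G4, G5, G6.
b = [12.12205614,                                  # b(1): ONE (constant)
     -2.416106889, -2.457340873, -2.625023469,     # b(2)-b(4): G1-G3
     -2.231542336, -2.392042574, -2.329034870]     # b(5)-b(7): G4-G6

# Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0 covers ONE and G1-G5 (Python indexes 0-5).
imposed = sum(b[0:6])

# Sum of the six dummy coefficients G1-G6, which LSDV3 is meant to set to zero.
dummy_sum = sum(b[1:7])

# Group intercepts: constant plus each group's dummy coefficient.
intercepts = [b[0] + g for g in b[1:7]]

print(f"ONE + G1..G5 = {imposed:.2e}")
print(f"G1..G6       = {dummy_sum:.4f}")
print("intercepts:", [round(a, 3) for a in intercepts])
```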
Because the within effect model does not estimate the dummy parameters, its residual degrees of freedom are larger than those of LSDV (nT - k instead of nT - n - k). As a result, its MSE and the standard errors of the parameter estimates are too small, and the standard errors need to be adjusted. This model does not report individual dummy coefficients either. The SAS TSCSREG procedure and the LIMDEP Regress$ command report the adjusted (correct) MSE, SEE (Root MSE), R2, and standard errors.
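The within transformation can be illustrated on a tiny artificial panel. The Python sketch below uses made-up numbers (two groups, one regressor, exact data with group intercepts 1 and 5 and a common slope of 2) and hypothetical helper names. It demeans x and y within each group, fits OLS without an intercept on the deviations, recovers each group intercept as the group mean of y minus the slope times the group mean of x, and compares the residual degrees of freedom the two approaches would use.

```python
from collections import defaultdict

# Hypothetical toy panel: (group, x, y) with y = alpha_g + 2 * x exactly.
DATA = [(1, 1.0, 3.0), (1, 2.0, 5.0), (1, 3.0, 7.0),
        (2, 1.0, 7.0), (2, 3.0, 11.0), (2, 4.0, 13.0)]


def group_means(rows):
    """Return {group: (mean of x, mean of y)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, x, y in rows:
        sums[g][0] += x
        sums[g][1] += y
        sums[g][2] += 1
    return {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}


def within_slope(rows):
    """OLS slope without an intercept on group-demeaned x and y."""
    means = group_means(rows)
    num = den = 0.0
    for g, x, y in rows:
        xbar, ybar = means[g]
        num += (x - xbar) * (y - ybar)
        den += (x - xbar) ** 2
    return num / den


beta = within_slope(DATA)
means = group_means(DATA)
# Group intercepts recovered from the group means: ybar_g - beta * xbar_g.
intercepts = {g: ybar - beta * xbar for g, (xbar, ybar) in means.items()}

n_obs, n_groups, k = len(DATA), len(means), 1
print(f"slope = {beta:.3f}, intercepts = {intercepts}")
print(f"residual df: within = {n_obs - k}, LSDV = {n_obs - n_groups - k}")
```

The within regression counts nT - k residual degrees of freedom, although n group means were estimated in the transformation; that is why its MSE and standard errors must be rescaled by the ratio of the two degrees of freedom.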
4.5.1 Estimating the Within Effect Model
First, let us manually estimate the within group effect model in STATA. You need to compute
group means and transform dependent and independent variables using group means (log is
skipped here).
. egen ...
. egen ...
. egen ...
. egen ...
. gen gw_cost = ...
. gen gw_output = ...
. gen gw_fuel = ...
. gen gw_load = ...
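Now we are ready to run the within effect model. Keep in mind that you have to suppress the intercept, because the transformed variables have zero group means. Carefully check the MSE, SEE, R2, and standard errors: they are not adjusted for the group means used in the transformation.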
. regress gw_cost gw_output gw_fuel gw_load, noc // within effect
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) = 3871.82
       Model |  39.0683861     3  13.0227954           Prob > F      =  0.0000
    Residual |  .292622861    87  .003363481           R-squared     =  0.9926
-------------+------------------------------           Adj R-squared =  0.9923
       Total |   39.361009    90  .437344544           Root MSE      =    .058

------------------------------------------------------------------------------
     gw_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gw_output |   .9192846    .028841    31.87   0.000       .86196    .9766092
     gw_fuel |   .4174918   .0146657    28.47   0.000     .3883422    .4466414
     gw_load |  -1.070396   .1946109    -5.50   0.000    -1.457206   -.6835858
------------------------------------------------------------------------------
You may compute the group intercepts using $d_g^* = \bar{y}_g - \hat{\beta}'\bar{x}_g$. For example, the intercept of airline 5 is computed as 9.730 = 12.363 - {.919*(-2.286) + .417*12.792 + (-1.070)*.566}, where the result uses the unrounded coefficients. To get the correct standard errors, you need to adjust them using the ratio of the degrees of freedom of the within effect model and LSDV. For example, the standard error of the logged output is computed as .0299 = .0288*sqrt(87/81).
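Both calculations in the paragraph above can be reproduced from the within effect output. The Python sketch below assumes the coefficients and the output standard error printed by STATA for the within model, and the airline 5 group means quoted in the text. It also rescales the within MSE by the same degrees-of-freedom ratio, which recovers the LSDV MSE of .003612628.

```python
import math

# Within effect estimates for the airline data (STATA, regress ..., noc).
B_OUTPUT, B_FUEL, B_LOAD = .9192846, .4174918, -1.070396
SE_OUTPUT_WITHIN = .028841
MSE_WITHIN = .003363481
DF_WITHIN, DF_LSDV = 87, 81     # nT - k = 90 - 3 and nT - n - k = 90 - 6 - 3

# Group means of airline 5 as quoted in the text.
YBAR_5 = 12.363
XBAR_5 = {"output": -2.286, "fuel": 12.792, "load": .566}

# d*_g = ybar_g - b'xbar_g
intercept_5 = YBAR_5 - (B_OUTPUT * XBAR_5["output"]
                        + B_FUEL * XBAR_5["fuel"]
                        + B_LOAD * XBAR_5["load"])

# Rescale by the ratio of residual degrees of freedom.
se_output = SE_OUTPUT_WITHIN * math.sqrt(DF_WITHIN / DF_LSDV)
mse_lsdv = MSE_WITHIN * DF_WITHIN / DF_LSDV

print(f"intercept of airline 5 = {intercept_5:.3f}")
print(f"adjusted SE of output  = {se_output:.4f}")
print(f"adjusted MSE           = {mse_lsdv:.9f}")
```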
4.5.2 Using the SAS TSCSREG and PANEL Procedures
The TSCSREG and PANEL procedures of SAS/ETS allow users to fit the within effect model conveniently. The procedures, in fact, report LSDV1, but you do not need to create dummy variables or compute deviations from the group means. These procedures report the correct MSE, SEE, R2, and standard errors, and they also conduct the F test for the fixed group effect.
PROC SORT DATA=masil.airline;
BY airline year;
PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;
The TSCSREG Procedure
Dependent Variable: cost

        Model Description
Estimation Method            FixOne
Number of Cross Sections          6
Time Series Length               15

          Fit Statistics
SSE         0.2926    DFE              81
MSE         0.0036    Root MSE     0.0601
R-Square    0.9974

Den DF    F Value    Pr > F
    81      57.73    <.0001

                          Parameter Estimates
                                  Standard
Variable     DF     Estimate         Error   t Value   Pr > |t|   Label
CS1           1     -0.08706        0.0842     -1.03     0.3042   Cross Sectional Effect 1
CS2           1      -0.1283        0.0757     -1.69     0.0941   Cross Sectional Effect 2
CS3           1     -0.29598        0.0500     -5.92     <.0001   Cross Sectional Effect 3
CS4           1     0.097494        0.0330      2.95     0.0041   Cross Sectional Effect 4
CS5           1     -0.06301        0.0239     -2.64     0.0100   Cross Sectional Effect 5
Intercept     1     9.793004        0.2637     37.14     <.0001   Intercept
output        1     0.919285        0.0299     30.76     <.0001
fuel          1     0.417492        0.0152     27.47     <.0001
load          1      -1.0704        0.2017     -5.31     <.0001
Note that the data set needs to be sorted in advance by the variables that appear in the ID statement of the TSCSREG and PANEL procedures. The following PANEL procedure returns the same output.
PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;
The STATA .xtreg command fits the within group effect model without creating dummy variables. The command reports correct standard errors and the F test for fixed group effects. It does not, however, provide an analysis of variance (ANOVA) table or the correct R2 and F statistics. Run the .tsset command, which specifies the grouping and time variables, before .xtreg.
. tsset airline year
       panel variable:  airline, 1 to 6
        time variable:  year, 1 to 15
The fe option of .xtreg requests the within effect model, and i(airline) specifies airline as the grouping variable. Note that this command reports adjusted (correct) standard errors.
. xtreg cost output fuel load, fe i(airline) // within group effect
Fixed-effects (within) regression               Number of obs      =        90
Group variable (i): airline                     Number of groups   =         6

R-sq:  within  = 0.9926                         Obs per group: min =        15
       between = 0.9856                                        avg =      15.0
       overall = 0.9873                                        max =        15

                                                F(3,81)            =   3604.80
corr(u_i, Xb)  = -0.3475                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
-------------+----------------------------------------------------------------
     sigma_u |   .1320775
     sigma_e |  .06010514
         rho |  .82843653   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5, 81) =    57.73               Prob > F = 0.0000
The last line of the output tests the null hypothesis that all dummy parameters in LSDV1 are zero (i.e., g1 = g2 = g3 = g4 = g5 = 0). Note that the intercept of 9.714 is that of LSDV3.
4.5.4 Using LIMDEP
In LIMDEP, you have to specify the panel data model and the stratification or time variable. The Panel; and Fixed$ subcommands request a fixed effect panel data model, and the Str= subcommand specifies the stratification variable.
--> REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Fixed$
+-----------------------------------------------------------------------+
| OLS Without Group Dummy Variables                                     |
| Ordinary    least squares regression    Weighting variable = none    |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444    |
| Model size: Observations =      90, Parameters =   4, Deg.Fr.=     86 |
| Residuals:  Sum of squares= 1.335449522    , Std.Dev.=         .12461 |
| Fit:        R-squared= .988290, Adjusted R-squared =            .98788 |
| Model test: F[  3,    86] =  2419.33,    Prob value =            .00000 |
| Diagnostic: Log-L =    61.7699, Restricted(b=0) Log-L =       -138.3581 |
|             LogAmemiyaPrCrt.=   -4.122, Akaike Info. Crt.=       -1.284 |
| Panel Data Analysis of COST          [ONE way]                        |
|           Unconditional ANOVA (No regressors)                         |
| Source      Variation   Deg. Free.     Mean Square                    |
| Between       74.6799          5.         14.9360                    |
| Residual      39.3611         84.         .468584                    |
| Total         114.041         89.         1.28136                    |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 OUTPUT       .8827386341      .13254552E-01  66.599   .0000   -1.1743092
 FUEL         .4539777119      .20304240E-01  22.359   .0000   12.770359
 LOAD        -1.627507797      .34530293      -4.713   .0000    .56046016
 Constant     9.516912231      .22924522      41.514   .0000
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
+-----------------------------------------------------------------------+
| Least Squares with Group Dummy Variables                              |
| Ordinary    least squares regression    Weighting variable = none    |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444    |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared= .997434, Adjusted R-squared =            .99718 |
| Model test: F[  8,    81] =  3935.82,    Prob value =            .00000 |
| Diagnostic: Log-L =   130.0865, Restricted(b=0) Log-L =       -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=       -2.691 |
| Estd. Autocorrelation of e(i,t)   .573531                             |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 OUTPUT       .9192881432      .29889967E-01  30.756   .0000   -1.1743092
 FUEL         .4174910457      .15199071E-01  27.468   .0000   12.770359
 LOAD        -1.070395015      .20168924      -5.307   .0000    .56046016
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
LIMDEP reports both the pooled OLS regression and the within effect model. Like the SAS
TSCSREG procedure, LIMDEP provides correct MSE, SEE, R2, and standard errors.
The between effect model uses aggregate information: the group means of the variables. In other words, the unit of analysis is not an individual observation but a group or subject, so the number of observations drops from nT to n. This group mean regression produces goodness-of-fit statistics and parameter estimates that differ from those of LSDV and the within effect model.
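The idea of the group mean regression can be sketched on a small artificial panel. The Python example below uses made-up numbers (three groups observed in two years) and hypothetical helper names. It collapses the long data to one row of means per group, which is conceptually what `.collapse (mean) ..., by()` does, and then fits a simple OLS line with an intercept on the n group means.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format panel: (group, year, cost, output).
PANEL = [(1, 1, 13.0, -1.0), (1, 2, 13.4, -0.8),
         (2, 1, 12.1, -2.0), (2, 2, 12.5, -1.6),
         (3, 1, 14.2, 0.4), (3, 2, 14.6, 0.6)]


def collapse_means(rows):
    """Return {group: (mean cost, mean output)}, one row per group."""
    by_group = defaultdict(list)
    for g, _year, cost, output in rows:
        by_group[g].append((cost, output))
    return {g: (mean(c for c, _ in vals), mean(o for _, o in vals))
            for g, vals in sorted(by_group.items())}


def ols_line(points):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in points)
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope


group_means = collapse_means(PANEL)
points = [(output, cost) for cost, output in group_means.values()]
intercept, slope = ols_line(points)
print(f"observations: {len(PANEL)} -> {len(group_means)}")
print(f"between regression: cost = {intercept:.3f} + {slope:.3f}*output")
```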
Let us compute the group means and run the OLS regression on them. The .collapse command computes summary statistics and replaces the data in memory with a new data set of group means. Note that /// continues a command on the next line.
. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
gm_load=load, by(airline)
. regress gm_cost gm_output gm_fuel gm_load
      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  3,     2) =  104.12
       Model |  4.94698124     3  1.64899375           Prob > F      =  0.0095
    Residual |  .031675926     2  .015837963           R-squared     =  0.9936
-------------+------------------------------           Adj R-squared =  0.9841
       Total |  4.97865717     5  .995731433           Root MSE      =  .12585

------------------------------------------------------------------------------
     gm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gm_output |   .7824568   .1087646     7.19   0.019     .3144803    1.250433
     gm_fuel |  -5.523904   4.478718    -1.23   0.343    -24.79427    13.74647
     gm_load |  -1.751072   2.743167    -0.64   0.589    -13.55397    10.05182
       _cons |    85.8081   56.48199     1.52   0.268    -157.2143    328.8305
------------------------------------------------------------------------------
The SAS PANEL procedure has the /BTWNG and /BTWNT options to estimate the between group and between time effect models, respectively. The TSCSREG procedure does not have these options.
PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /BTWNG;
RUN;
The PANEL Procedure
Between Groups Estimates

Dependent Variable: cost

        Model Description
Estimation Method           BtwGrps
Number of Cross Sections          6
Time Series Length               15

          Fit Statistics
SSE         0.0317    DFE               2
MSE         0.0158    Root MSE     0.1258
R-Square    0.9936

                        Parameter Estimates
                                 Standard
Variable     DF    Estimate         Error   t Value   Pr > |t|   Label
Intercept     1    85.80901       56.4830      1.52     0.2681   Intercept
output        1    0.782455        0.1088      7.19     0.0188
fuel          1    -5.52398        4.4788     -1.23     0.3427
load          1    -1.75102        2.7432     -0.64     0.5886
The STATA .xtreg command has the be option to fit the between effect model. This
command, however, does not report the ANOVA table.
. xtreg cost output fuel load, be i(airline)
Between regression (regression on group means)  Number of obs      =        90
Group variable (i): airline                     Number of groups   =         6

R-sq:  within  = 0.8808                         Obs per group: min =        15
       between = 0.9936                                        avg =      15.0
       overall = 0.1371                                        max =        15

                                                F(3,2)             =    104.12
sd(u_i + avg(e_i.))=  .1258491                  Prob > F           =    0.0095

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .7824552   .1087663     7.19   0.019     .3144715    1.250439
        fuel |  -5.523978   4.478802    -1.23   0.343    -24.79471    13.74675
        load |  -1.751016    2.74319    -0.64   0.589    -13.55401    10.05198
       _cons |   85.80901   56.48302     1.52   0.268    -157.2178    328.8358
------------------------------------------------------------------------------
LIMDEP has the Means subcommand to fit the between effect model.
--> REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Means$
+-----------------------------------------------------------------------+
| Group Means Regression                                                |
| Ordinary    least squares regression    Weighting variable = none    |
| Dep. var. = YBAR(i.) Mean=   13.36560933    , S.D.=   .9978636346    |
| Model size: Observations =       6, Parameters =   4, Deg.Fr.=      2 |
| Residuals:  Sum of squares= .3167277206E-01, Std.Dev.=         .12584 |
| Fit:        R-squared= .993638, Adjusted R-squared =            .98410 |
| Model test: F[  3,     2] =   104.13,    Prob value =            .00953 |
| Diagnostic: Log-L =     7.2185, Restricted(b=0) Log-L =         -7.9538 |
|             LogAmemiyaPrCrt.=   -3.635, Akaike Info. Crt.=       -1.073 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 OUTPUT       .7824472689      .10876126       7.194   .0000   .23025612E-11
 FUEL        -5.524437466     4.4786519       -1.234   .2174    .18642891
 LOAD        -1.750947653     2.7430470        -.638   .5233    .32541105
 Constant    85.81483169     56.481148         1.519   .1287
How do we know whether there are fixed group effects? The null hypothesis is that all dummy parameters except one are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$.
To conduct an F test, take the SSE (e'e) of 1.3354 from the pooled OLS regression and .2926 from the LSDVs (LSDV1 through LSDV3) or the within effect model. Alternatively, you may use the R2 of .9974 from LSDV1 or LSDV3 and .9883 from the pooled OLS. Do not, however, use the R2 of LSDV2 or of the within effect model.
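The F statistic is computed as
$$F(n-1,\; nT-n-k) = \frac{(e'e_{Pooled} - e'e_{LSDV})/(n-1)}{e'e_{LSDV}/(nT-n-k)} = \frac{(1.3354 - .2926)/(6-1)}{.2926/(90-6-3)} \approx 57.73$$
The large F statistic rejects the null hypothesis in favor of the fixed group effect model (p < .0001).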
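The same F statistic can be computed from either pair of fit statistics. The Python sketch below assumes the unrounded SSE and R2 values printed by LIMDEP for the pooled OLS and LSDV models of the airline data (n = 6 airlines, T = 15 years, k = 3 regressors). Both forms should agree with the 57.73 reported by SAS and STATA, up to rounding of the R2 values.

```python
# Fit statistics for the airline data: pooled OLS versus LSDV (or within).
SSE_POOLED, SSE_LSDV = 1.335449522, .2926207777
R2_POOLED, R2_LSDV = .988290, .997434
n, T, k = 6, 15, 3              # groups, time periods, regressors

df_num = n - 1                  # number of restrictions (dummy parameters)
df_den = n * T - n - k          # residual degrees of freedom of LSDV

f_sse = ((SSE_POOLED - SSE_LSDV) / df_num) / (SSE_LSDV / df_den)
f_r2 = ((R2_LSDV - R2_POOLED) / df_num) / ((1 - R2_LSDV) / df_den)

print(f"F({df_num}, {df_den}) from SSE = {f_sse:.2f}")
print(f"F({df_num}, {df_den}) from R2  = {f_r2:.2f}")
```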
The SAS TSCSREG and PANEL procedures and STATA .xtreg command by default conduct
the F test. Alternatively, you may conduct the same test with LSDV1. In SAS, add the TEST
statement in the REG procedure and run the procedure again (other outputs are skipped).
PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
TEST g1 = g2 = g3 = g4 = g5 = 0;
RUN;
The REG Procedure
Model: MODEL1

Test 1 Results for Dependent Variable cost

                             Mean
Source          DF         Square    F Value    Pr > F
Numerator        5        0.20856      57.73    <.0001
Denominator     81        0.00361
In STATA, run the .test command, a follow-up command for the Wald test, right after
estimating the model.
. quietly regress cost g1-g5 output fuel load // LSDV1
. test g1 g2 g3 g4 g5
 ( 1)  g1 = 0
 ( 2)  g2 = 0
 ( 3)  g3 = 0
 ( 4)  g4 = 0
 ( 5)  g5 = 0

       F(  5,    81) =   57.73
            Prob > F =    0.0000
4.8 Summary
Table 6 summarizes the estimation of panel data models in SAS, STATA, and LIMDEP. The
SAS REG and TSCSREG procedures are generally preferred to STATA and LIMDEP
commands.
Table 6 Comparison of the Fixed Effect Model in SAS, STATA, LIMDEP*

                    SAS 9.1                 STATA 9.0                  LIMDEP 8.0
OLS estimation      PROC REG;               .regress (.cnsreg)         Regress$
  LSDV1             Correct                 Correct                    Correct (slightly different F)
  LSDV2             Incorrect F,            Incorrect F,               Correct (slightly different F),
                    (adjusted) R2           (adjusted) R2              correct R2
  LSDV3             Correct                 .cnsreg command:           Correct (slightly different F),
                                            no R2 or ANOVA table,      different dummy coefficients
                                            but F
Panel estimation    PROC TSCSREG;           .xtreg                     Regress; Panel$
                    PROC PANEL;
  Estimation type   LSDV1                   Within and between effect  Within effect
  SSE (e'e)         Correct                 No                         Correct
  MSE or SEE        Correct (adjusted)      No                         Correct (adjusted) SEE
  Model test (F)    No                      Incorrect                  Slightly different F
  (adjusted) R2     Correct                 Incorrect                  Correct
  Intercept         Correct                 LSDV3 intercept            No
  Coefficients      Correct                 Correct                    Correct
  Standard errors   Correct (adjusted)      Correct (adjusted)         Correct (adjusted)
  Effect test (F)   Yes                     Yes                        No
  Between effect    Yes (PROC PANEL;)       Yes (the be option)        N/A

* Yes/No indicates whether the software reports the statistic. Correct/Incorrect indicates whether the statistic agrees with that of LSDV1, the least squares dummy variable model that drops one dummy variable.
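For the fixed time effect model, the least squares dummy variable (LSDV) approach produces fifteen regression equations, one for each year, which share the same slopes and differ only in their intercepts. This section does not present all outputs, but only one or two for each LSDV approach.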
Time01: cost = 20.496 + .868*output - .484*fuel - 1.954*load
Time02: cost = 20.578 + .868*output - .484*fuel - 1.954*load
Time03: cost = 20.656 + .868*output - .484*fuel - 1.954*load
Time04: cost = 20.741 + .868*output - .484*fuel - 1.954*load
Time05: cost = 21.200 + .868*output - .484*fuel - 1.954*load
Time06: cost = 21.412 + .868*output - .484*fuel - 1.954*load
Time07: cost = 21.503 + .868*output - .484*fuel - 1.954*load
Time08: cost = 21.654 + .868*output - .484*fuel - 1.954*load
Time09: cost = 21.830 + .868*output - .484*fuel - 1.954*load
Time10: cost = 22.114 + .868*output - .484*fuel - 1.954*load
Time11: cost = 22.465 + .868*output - .484*fuel - 1.954*load
Time12: cost = 22.651 + .868*output - .484*fuel - 1.954*load
Time13: cost = 22.617 + .868*output - .484*fuel - 1.954*load
Time14: cost = 22.552 + .868*output - .484*fuel - 1.954*load
Time15: cost = 22.537 + .868*output - .484*fuel - 1.954*load
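The fifteen intercepts above follow directly from LSDV1. The Python sketch below assumes the SAS REG estimates for the time effect model shown next (intercept 22.53677 with t15 as the dropped reference year, and the coefficients of t1 through t14). Adding each dummy coefficient to the intercept gives the year-specific intercept, and printing it next to the common slopes regenerates the equations.

```python
# SAS LSDV1 estimates for the fixed time effect model (t15 is the reference).
INTERCEPT = 22.53677
T_DUMMIES = [-2.04096, -1.95873, -1.88103, -1.79601, -1.33693, -1.12514,
             -1.03341, -0.88274, -0.70719, -0.42296, -0.07144, 0.11457,
             0.07979, 0.01546]            # coefficients of t1-t14

# Year-specific intercepts: reference intercept plus each dummy; t15 adds nothing.
year_intercepts = [INTERCEPT + d for d in T_DUMMIES] + [INTERCEPT]

for year, a in enumerate(year_intercepts, start=1):
    print(f"Time{year:02d}: cost = {a:.3f} + .868*output - .484*fuel - 1.954*load")
```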
Let us begin with LSDV1 in the SAS REG procedure, which drops the last time dummy (t15). As with the fixed group effects, a TEST statement can be added to examine the fixed time effects.
PROC REG DATA=masil.airline;
MODEL cost = t1-t14 output fuel load;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Read          90
Number of Observations Used          90

                          Analysis of Variance
                               Sum of          Mean
Source              DF        Squares        Square    F Value    Pr > F
Model               17      112.95270       6.64428     439.62    <.0001
Error               72        1.08819       0.01511
Corrected Total     89      114.04089

Root MSE            0.12294    R-Square     0.9905
Dependent Mean     13.36561    Adj R-Sq     0.9882
Coeff Var           0.91981

                        Parameter Estimates
                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|
Intercept     1       22.53677       4.94053       4.56      <.0001
t1            1       -2.04096       0.73469      -2.78      0.0070
t2            1       -1.95873       0.72275      -2.71      0.0084
t3            1       -1.88103       0.72036      -2.61      0.0110
t4            1       -1.79601       0.69882      -2.57      0.0122
t5            1       -1.33693       0.50604      -2.64      0.0101
t6            1       -1.12514       0.40862      -2.75      0.0075
t7            1       -1.03341       0.37642      -2.75      0.0076
t8            1       -0.88274       0.32601      -2.71      0.0085
t9            1       -0.70719       0.29470      -2.40      0.0190
t10           1       -0.42296       0.16679      -2.54      0.0134
t11           1       -0.07144       0.07176      -1.00      0.3228
t12           1        0.11457       0.09841       1.16      0.2482
t13           1        0.07979       0.08442       0.95      0.3477
t14           1        0.01546       0.07264       0.21      0.8320
output        1        0.86773       0.01541      56.32      <.0001
fuel          1       -0.48448       0.36411      -1.33      0.1875
load          1       -1.95440       0.44238      -4.42      <.0001
The following are the corresponding STATA and LIMDEP commands for LSDV1 (outputs are
skipped).
. regress cost t1-t14 output fuel load
REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,OUTPUT,FUEL,LOAD$
Let us use LIMDEP to fit LSDV2 because it reports correct (although slightly different) F and
R2 statistics.
--> REGRESS;Lhs=COST;Rhs=T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD$
+-----------------------------------------------------------------------+
| Ordinary    least squares regression    Weighting variable = none    |
| Dep. var. = COST     Mean=   13.36560929    , S.D.=   1.131971002    |
| Model size: Observations =      90, Parameters =  18, Deg.Fr.=     72 |
| Residuals:  Sum of squares= 1.088190223    , Std.Dev.=         .12294 |
| Fit:        R-squared= .990458, Adjusted R-squared =            .98820 |
| Model test: F[ 17,    72] =   439.62,    Prob value =            .00000 |
| Diagnostic: Log-L =    70.9837, Restricted(b=0) Log-L =       -138.3581 |
|             LogAmemiyaPrCrt.=   -4.010, Akaike Info. Crt.=       -1.177 |
| Model does not contain ONE. R-squared and F can be negative!          |
| Autocorrel: Durbin-Watson Statistic =   2.93900,    Rho =       -.46950 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 T1           20.49580478     4.2095283        4.869   .0000  .66666667E-01
 T2           20.57803885     4.2215262        4.875   .0000  .66666667E-01
 T3           20.65573100     4.2241771        4.890   .0000  .66666667E-01
 T4           20.74075857     4.2457497        4.885   .0000  .66666667E-01
 T5           21.19983202     4.4403312        4.774   .0000  .66666667E-01
 T6           21.41162082     4.5386212        4.718   .0000  .66666667E-01
 T7           21.50335085     4.5713968        4.704   .0000  .66666667E-01
 T8           21.65402827     4.6228858        4.684   .0000  .66666667E-01
 T9           21.82957108     4.6569062        4.688   .0000  .66666667E-01
 T10          22.11380260     4.7926483        4.614   .0000  .66666667E-01
 T11          22.46532734     4.9499089        4.539   .0000  .66666667E-01
 T12          22.65133704     5.0085924        4.522   .0000  .66666667E-01
 T13          22.61655508     4.9861391        4.536   .0000  .66666667E-01
 T14          22.55222832     4.9559418        4.551   .0000  .66666667E-01
 T15          22.53676562     4.9405321        4.562   .0000  .66666667E-01
 OUTPUT       .8677267843      .15408184E-01  56.316   .0000   -1.1743092
 FUEL        -.4844835367      .36410849      -1.331   .1875   12.770359
 LOAD        -1.954404328      .44237771      -4.418   .0000    .56046015
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
The following is the corresponding SAS REG procedure for LSDV2 (output is skipped).
PROC REG DATA=masil.airline;
MODEL cost = t1-t15 output fuel load /NOINT;
RUN;
The following SAS REG output is for LSDV3, which keeps the intercept and restricts the time dummy coefficients to sum to zero (note the Intercept and RESTRICT rows).

Number of Observations Read          90
Number of Observations Used          90

                             Analysis of Variance
                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                    17      112.95270        6.64428     439.62    <.0001
Error                    72        1.08819        0.01511
Corrected Total          89      114.04089

Root MSE              0.12294    R-Square     0.9905
Dependent Mean       13.36561    Adj R-Sq     0.9882
Coeff Var             0.91981

                          Parameter Estimates
                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       21.66698        4.62405       4.69      <.0001
t1            1       -1.17118        0.41783      -2.80      0.0065
t2            1       -1.08894        0.40586      -2.68      0.0090
t3            1       -1.01125        0.40323      -2.51      0.0144
t4            1       -0.92622        0.38177      -2.43      0.0178
t5            1       -0.46715        0.19076      -2.45      0.0168
t6            1       -0.25536        0.09856      -2.59      0.0116
t7            1       -0.16363        0.07190      -2.28      0.0258
t8            1       -0.01296        0.04862      -0.27      0.7907
t9            1        0.16259        0.06271       2.59      0.0115
t10           1        0.44682        0.17599       2.54      0.0133
t11           1        0.79834        0.32940       2.42      0.0179
t12           1        0.98435        0.38756       2.54      0.0132
t13           1        0.94957        0.36537       2.60      0.0113
t14           1        0.88524        0.33549       2.64      0.0102
t15           1        0.86978        0.32029       2.72      0.0083
output        1        0.86773        0.01541      56.32      <.0001
fuel          1       -0.48448        0.36411      -1.33      0.1875
load          1       -1.95440        0.44238      -4.42      <.0001
RESTRICT     -1     -3.946E-15              .          .           .
In STATA, define the restriction with the .constraint command and specify the restriction
using the constraint() option of the .cnsreg command.
. constraint define 3 t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0
. cnsreg cost t1-t15 output fuel load, constraint(3)
Constrained linear regression                     Number of obs =        90
                                                  F( 17,    72) =    439.62
                                                  Prob > F      =    0.0000
                                                  Root MSE      =    .12294

 ( 1)  t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |  -1.171179   .4178338    -2.80   0.007    -2.004115   -.3382422
          t2 |  -1.088945   .4058579    -2.68   0.009    -1.898008   -.2798816
          t3 |  -1.011252   .4032308    -2.51   0.014    -1.815078   -.2074266
          t4 |  -.9262249   .3817675    -2.43   0.018    -1.687265   -.1651852
          t5 |  -.4671515   .1907596    -2.45   0.017    -.8474239   -.0868791
          t6 |  -.2553627   .0985615    -2.59   0.012    -.4518415   -.0588839
          t7 |  -.1636326   .0718969    -2.28   0.026    -.3069564   -.0203088
          t8 |  -.0129552   .0486249    -0.27   0.791    -.1098872    .0839768
          t9 |   .1625876   .0627099     2.59   0.012     .0375776    .2875976
         t10 |   .4468191    .175994     2.54   0.013     .0959814    .7976568
         t11 |   .7983439   .3294027     2.42   0.018     .1416916    1.454996
         t12 |   .9843536   .3875583     2.54   0.013     .2117702    1.756937
         t13 |   .9495716   .3653675     2.60   0.011     .2212248    1.677918
         t14 |   .8852448   .3354912     2.64   0.010     .2164554    1.554034
         t15 |   .8697821   .3202933     2.72   0.008     .2312891    1.508275
      output |   .8677268   .0154082    56.32   0.000     .8370111    .8984424
        fuel |  -.4844835   .3641085    -1.33   0.188    -1.210321    .2413535
        load |  -1.954404   .4423777    -4.42   0.000    -2.836268    -1.07254
       _cons |   21.66698   4.624053     4.69   0.000      12.4491    30.88486
------------------------------------------------------------------------------
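As a quick check on how the LSDV parameterizations relate, the sketch below takes the fifteen LSDV2 year intercepts from the LIMDEP output and rebuilds the LSDV3 estimates: under the sum-to-zero restriction the intercept is their average and each time effect is a deviation from that average. Subtracting the last year's intercept instead gives effects relative to year 15, the form that PROC PANEL /FIXONE reports later in this section. This is a plain-Python illustration; the function names are ours and are not part of SAS, Stata, or LIMDEP.

```python
# Sketch: convert LSDV2 time intercepts into LSDV3 and LSDV1-style time effects.
# The 15 values are the T1..T15 coefficients from the LIMDEP LSDV2 output.
LSDV2_INTERCEPTS = [
    20.49580478, 20.57803885, 20.65573100, 20.74075857, 21.19983202,
    21.41162082, 21.50335085, 21.65402827, 21.82957108, 22.11380260,
    22.46532734, 22.65133704, 22.61655508, 22.55222832, 22.53676562,
]


def lsdv3_effects(intercepts):
    """Sum-to-zero restriction: common intercept = mean, effects = deviations."""
    common = sum(intercepts) / len(intercepts)
    return common, [d - common for d in intercepts]


def lsdv1_effects(intercepts):
    """Last dummy as reference: intercept = last value, effects = differences."""
    reference = intercepts[-1]
    return reference, [d - reference for d in intercepts[:-1]]


if __name__ == "__main__":
    const3, eff3 = lsdv3_effects(LSDV2_INTERCEPTS)
    print(f"LSDV3: _cons = {const3:.5f}, t1 = {eff3[0]:.6f}, t15 = {eff3[-1]:.6f}")
    const1, eff1 = lsdv1_effects(LSDV2_INTERCEPTS)
    print(f"LSDV1: Intercept = {const1:.5f}, effect 1 = {eff1[0]:.5f}")
```

Running it prints an LSDV3 intercept of 21.66698 and a t1 effect of -1.171179, in line with the .cnsreg output, and a first-year effect of -2.04096 relative to year 15.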
The following is the corresponding LIMDEP command for LSDV3 (output is skipped).
REGRESS;Lhs=COST;Rhs=ONE,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,OUTPUT,FUEL,LOAD;
Cls:b(2)+b(3)+b(4)+b(5)+b(6)+b(7)+b(8)+b(9)+b(10)+b(11)+b(12)+b(13)+b(14)+b(15)+b(16)=0$
The within effect model for the fixed time effects requires deviations from the time means. Keep in mind that the intercept should be suppressed.
5.2.1 Estimating the Time Effect Model
. egen tm_cost = mean(cost), by(year)
. egen tm_output = mean(output), by(year)
. egen tm_fuel = mean(fuel), by(year)
. egen tm_load = mean(load), by(year)
+---------------------------------------------------+
| year    tm_cost   tm_output    tm_fuel    tm_load |
|---------------------------------------------------|
|    1   12.36897   -1.790283   11.63606   .4788587 |
|    2   12.45963   -1.744389   11.66868   .4868322 |
|    3   12.60706   -1.577767   11.67494     .52358 |
|    4   12.77912   -1.443695   11.73193   .5244486 |
|    5   12.94143   -1.398122   12.26843   .5635266 |
|    6    13.0452   -1.393002   12.53826   .5541809 |
|    7   13.15965   -1.302416   12.62714   .5607425 |
|    8   13.29884   -1.222963   12.76768   .5670587 |
|    9    13.4651   -1.067003   12.86104   .6179098 |
|   10   13.70187   -.9023156   13.23183   .6233943 |
|   11   13.91324   -.9205539   13.66246   .5802577 |
|   12   14.05984   -.8641667   13.82315   .5856243 |
|   13   14.12841   -.7923916   13.75979   .5803183 |
|   14   14.23517   -.6428015   13.67403   .5804528 |
|   15   14.32062   -.5527684   13.62997   .5797168 |
+---------------------------------------------------+
. gen tw_cost = cost - tm_cost
. gen tw_output = output - tm_output
. gen tw_fuel = fuel - tm_fuel
. gen tw_load = load - tm_load
. regress tw_cost tw_output tw_fuel tw_load, noc

                                                  Number of obs =      90
                                                  F(  3,    87) = 2015.95
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.9858
                                                  Adj R-squared =  0.9853
                                                  Root MSE      =  .11184
------------------------------------------------------------------------------
     tw_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   tw_output |   .8677268   .0140171    61.90   0.000     .8398663    .8955873
     tw_fuel |  -.4844836   .3312359    -1.46   0.147    -1.142851    .1738836
     tw_load |  -1.954404   .4024388    -4.86   0.000    -2.754295   -1.154514
------------------------------------------------------------------------------
If you want to get the intercepts of the years, use $d_t^* = \bar{y}_t - \beta'\bar{x}_t$. For example, the intercept of year 7 is 21.503 = 13.1597 - {.8677*(-1.3024) + (-.4845)*12.6271 + (-1.9544)*.5607}. As discussed previously, the standard errors of the within effect model need to be adjusted, because the demeaned regression uses 87 = 90 - 3 error degrees of freedom while LSDV uses 72 = 90 - 15 - 3. For instance, the correct standard error of fuel price is .364 = .3312*sqrt(87/72).
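The year-intercept formula can be checked numerically. The minimal Python sketch below (an illustration, not part of any package) plugs the within slopes and two rows of the tm_ table (years 1 and 7) into $d_t^* = \bar{y}_t - \beta'\bar{x}_t$; the results should match the LSDV2 intercepts T1 and T7 up to rounding of the inputs.

```python
# Sketch: year intercepts d*_t = ybar_t - b' xbar_t from the within slopes.
# Slopes: tw_ regression above. Means: years 1 and 7 of the tm_ table above.
SLOPES = {"output": 0.8677268, "fuel": -0.4844836, "load": -1.954404}

TIME_MEANS = {
    # year: (tm_cost, {regressor: time mean of that regressor})
    1: (12.36897, {"output": -1.790283, "fuel": 11.63606, "load": 0.4788587}),
    7: (13.15965, {"output": -1.302416, "fuel": 12.62714, "load": 0.5607425}),
}


def time_intercept(y_mean, x_means, slopes):
    """Return ybar_t minus the slope-weighted sum of the regressor time means."""
    return y_mean - sum(slopes[name] * x_means[name] for name in slopes)


if __name__ == "__main__":
    for year, (y_mean, x_means) in sorted(TIME_MEANS.items()):
        print(f"year {year}: intercept = {time_intercept(y_mean, x_means, SLOPES):.4f}")
```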
5.2.2 Using the TSCSREG and PANEL procedures
You need to sort the data set by the variables that appear in the ID statement of the TSCSREG and PANEL procedures (here, year and airline).
PROC SORT DATA=masil.airline;
BY year airline;
PROC PANEL DATA=masil.airline;
ID year airline;
MODEL cost = output fuel load /FIXONE;
RUN;
                          The PANEL Procedure
                        Fixed One Way Estimates

                        Dependent Variable: cost

                           Model Description
                Estimation Method                 FixOne
                Number of Cross Sections              15
                Time Series Length                     6

                             Fit Statistics
                SSE          1.0882    DFE             72
                MSE          0.0151    Root MSE    0.1229
                R-Square     0.9905

                Num DF    Den DF    F Value    Pr > F
                    14        72       1.17    0.3178

                          Parameter Estimates

                                 Standard
Variable    DF    Estimate          Error    t Value    Pr > |t|    Label
CS1          1    -2.04096         0.7347      -2.78      0.0070    Cross Sectional Effect 1
CS2          1    -1.95873         0.7228      -2.71      0.0084    Cross Sectional Effect 2
CS3          1    -1.88103         0.7204      -2.61      0.0110    Cross Sectional Effect 3
CS4          1    -1.79601         0.6988      -2.57      0.0122    Cross Sectional Effect 4
CS5          1    -1.33693         0.5060      -2.64      0.0101    Cross Sectional Effect 5
CS6          1    -1.12514         0.4086      -2.75      0.0075    Cross Sectional Effect 6
CS7          1    -1.03341         0.3764      -2.75      0.0076    Cross Sectional Effect 7
CS8          1    -0.88274         0.3260      -2.71      0.0085    Cross Sectional Effect 8
CS9          1    -0.70719         0.2947      -2.40      0.0190    Cross Sectional Effect 9
CS10         1    -0.42296         0.1668      -2.54      0.0134    Cross Sectional Effect 10
CS11         1    -0.07144         0.0718      -1.00      0.3228    Cross Sectional Effect 11
CS12         1    0.114571         0.0984       1.16      0.2482    Cross Sectional Effect 12
CS13         1    0.079789         0.0844       0.95      0.3477    Cross Sectional Effect 13
CS14         1    0.015463         0.0726       0.21      0.8320    Cross Sectional Effect 14
Intercept    1    22.53677         4.9405       4.56      <.0001    Intercept
output       1    0.867727         0.0154      56.32      <.0001
fuel         1    -0.48448         0.3641      -1.33      0.1875
load         1     -1.9544         0.4424      -4.42      <.0001
The STATA .xtreg command uses the fe option for the fixed effect model.
. xtreg cost output fuel load, fe i(year)
Fixed-effects (within) regression               Number of obs      =        90
Group variable (i): year                        Number of groups   =        15

R-sq:  within  = 0.9858                         Obs per group: min =         6
       between = 0.4812                                        avg =       6.0
       overall = 0.5265                                        max =         6

                                                F(3,72)            =   1668.37
corr(u_i, Xb)  = -0.1503                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8677268   .0154082    56.32   0.000     .8370111    .8984424
        fuel |  -.4844835   .3641085    -1.33   0.188    -1.210321    .2413535
        load |  -1.954404   .4423777    -4.42   0.000    -2.836268    -1.07254
       _cons |   21.66698   4.624053     4.69   0.000      12.4491    30.88486
-------------+----------------------------------------------------------------
     sigma_u |   .8027907
     sigma_e |  .12293801
         rho |  .97708602   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(14, 72) =     1.17              Prob > F = 0.3178
 OUTPUT     .8677268093   .15408179E-01   56.316     .0000  -1.1743092
 FUEL      -.4844946699       .36410984   -1.331     .1868  12.770359
 LOAD      -1.954414378       .44237791   -4.418     .0000  .56046016
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
+------------------------------------------------------------------------+
|                Test Statistics for the Classical Model                 |
|                                                                        |
|  Model                     Log-Likelihood   Sum of Squares  R-squared  |
|  (1) Constant term only        -138.35814  .1140409821D+03   .0000000  |
|  (2) Group effects only        -120.52864  .7673414157D+02   .3271354  |
|  (3) X - variables only          61.76991  .1335449522D+01   .9882897  |
|  (4) X and group effects         70.98362  .1088193393D+01   .9904579  |
|                                                                        |
|                            Hypothesis Tests                            |
|               Likelihood Ratio Test               F Tests              |
|              Chi-squared d.f.   Prob.         F num. denom. Prob value |
|  (2) vs (1)       35.659   14  .00117     2.605   14     75     .00404 |
|  (3) vs (1)      400.256    3  .00000  2419.329    3     86     .00000 |
|  (4) vs (1)      418.684   17  .00000   439.617   17     72     .00000 |
|  (4) vs (2)      383.025    3  .00000  1668.364    3     72     .00000 |
|  (4) vs (3)       18.427   14  .18800     1.169   14     72     .31776 |
+------------------------------------------------------------------------+
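The likelihood ratio and F statistics in this LIMDEP box can be reproduced from the reported log-likelihoods and R-squared values. The minimal Python sketch below assumes the usual formulas LR = 2(logL_u - logL_r) and F = [(R2_u - R2_r)/q] / [(1 - R2_u)/df]; the helper names are illustrative. Here the "group effects" are the 14 year dummies, as the 14 numerator degrees of freedom indicate, so (4) vs (3) is the test of time effects.

```python
# Sketch: reproduce rows of LIMDEP's "Test Statistics for the Classical Model".
LOGLIK = {1: -138.35814, 2: -120.52864, 3: 61.76991, 4: 70.98362}
RSQ = {1: 0.0, 2: 0.3271354, 3: 0.9882897, 4: 0.9904579}


def lr_stat(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio chi-squared: 2 * (logL_u - logL_r)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def f_from_r2(r2_restricted, r2_unrestricted, df_num, df_den):
    """Incremental F computed from R-squared (same dependent variable)."""
    return ((r2_unrestricted - r2_restricted) / df_num) / ((1.0 - r2_unrestricted) / df_den)


if __name__ == "__main__":
    print("(4) vs (3): LR =", round(lr_stat(LOGLIK[3], LOGLIK[4]), 3),
          "F =", round(f_from_r2(RSQ[3], RSQ[4], 14, 72), 3))
    print("(2) vs (1): LR =", round(lr_stat(LOGLIK[1], LOGLIK[2]), 3))
```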
The between effect model regresses the time means of the dependent variable on those of the independent variables. See also 3.2 and 4.6.
. collapse (mean) tm_cost=cost (mean) tm_output=output (mean) tm_fuel=fuel ///
(mean) tm_load=load, by(year)
. regress tm_cost tm_output tm_fuel tm_load // between time effect
      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) = 4074.33
       Model |  6.21220479     3  2.07073493           Prob > F      =  0.0000
    Residual |  .005590631    11  .000508239           R-squared     =  0.9991
-------------+------------------------------           Adj R-squared =  0.9989
       Total |  6.21779542    14  .444128244           Root MSE      =  .02254

------------------------------------------------------------------------------
     tm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   tm_output |   1.133337   .0512898    22.10   0.000     1.020449    1.246225
     tm_fuel |   .3342486   .0228284    14.64   0.000     .2840035    .3844937
     tm_load |  -1.350727   .2478264    -5.45   0.000    -1.896189   -.8052644
       _cons |   11.18505   .3660016    30.56   0.000     10.37949    11.99062
------------------------------------------------------------------------------
The SAS PANEL procedure has the /BTWNT option to estimate the between effect model.
PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /BTWNT;
RUN;
                          The PANEL Procedure

                           Model Description
                Estimation Method                BtwTime
                Number of Cross Sections               6
                Time Series Length                    15

                             Fit Statistics
                SSE          0.0056    DFE             11
                MSE          0.0005    Root MSE    0.0225
                R-Square     0.9991

                          Parameter Estimates

                                 Standard
Variable    DF    Estimate          Error    t Value    Pr > |t|    Label
Intercept    1    11.18504         0.3660      30.56      <.0001    Intercept
output       1    1.133335         0.0513      22.10      <.0001
fuel         1    0.334249         0.0228      14.64      <.0001
load         1    -1.35073         0.2478      -5.45      0.0002
You may use the be option in the STATA .xtreg command and the Means; subcommand in
LIMDEP (outputs are skipped).
. xtreg cost output fuel load, be i(year) // between time effect model
--> REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=YEAR;Means$
The null hypothesis is that all time dummy parameters except one are zero: $H_0: \tau_1 = \dots = \tau_{T-1} = 0$. The F statistic is
$F = \frac{(1.3354 - 1.0882)/(15 - 1)}{1.0882/(6 \times 15 - 15 - 3)} \sim 1.1683\,[14, 72]$.
The p-value of .3180 does not reject the null hypothesis.
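The same F statistic as a small Python function, computed from the two SSEs and the degrees of freedom implied by the panel dimensions (6 airlines, 15 years, 3 regressors). The function and constant names are illustrative.

```python
def incremental_f(sse_restricted, sse_unrestricted, df_num, df_den):
    """F = [(SSE_r - SSE_u) / df_num] / [SSE_u / df_den]."""
    return ((sse_restricted - sse_unrestricted) / df_num) / (sse_unrestricted / df_den)


N_AIRLINES, N_YEARS, K = 6, 15, 3             # panel dimensions and slopes
DF_NUM = N_YEARS - 1                          # 14 time dummies tested
DF_DEN = N_AIRLINES * N_YEARS - N_YEARS - K   # 72 error df of the LSDV model

F_TIME = incremental_f(1.3354, 1.0882, DF_NUM, DF_DEN)

if __name__ == "__main__":
    print(f"F({DF_NUM}, {DF_DEN}) = {F_TIME:.4f}")
```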
The SAS TSCSREG and PANEL procedures and the STATA .xtreg command conduct the
Wald test. You may get the same test using the TEST statement in LSDV1 and the
STATA .test command (the output is skipped).
PROC REG DATA=masil.airline;
MODEL cost = t1-t14 output fuel load;
TEST t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;
There are four approaches to avoid perfect multicollinearity (the dummy variable trap). You may not suppress the intercept in any of them.
Drop one cross-section dummy and one time-series dummy.
Drop one cross-section dummy and impose a restriction on the time-series dummies, $\sum_t \tau_t = 0$.
Drop one time-series dummy and impose a restriction on the cross-section dummies, $\sum_g \mu_g = 0$.
Include all dummy variables and impose two restrictions on the cross-section and time-series dummies, $\sum_g \mu_g = 0$ and $\sum_t \tau_t = 0$.
. regress cost g1-g5 t1-t14 output fuel load

                                                  Number of obs =      90
                                                  F( 22,    67) = 1960.82
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.9984
                                                  Adj R-squared =  0.9979
                                                  Root MSE      =  .05138
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1742825   .0861201     2.02   0.047     .0023861     .346179
          g2 |   .1114508   .0779551     1.43   0.157    -.0441482    .2670499
          g3 |   -.143511   .0518934    -2.77   0.007    -.2470907   -.0399313
          g4 |   .1802087   .0321443     5.61   0.000     .1160484    .2443691
          g5 |  -.0466942   .0224688    -2.08   0.042    -.0915422   -.0018463
          t1 |  -.6931382   .3378385    -2.05   0.044    -1.367467   -.0188098
          t2 |  -.6384366   .3320802    -1.92   0.059    -1.301271    .0243983
          t3 |  -.5958031   .3294473    -1.81   0.075    -1.253383    .0617764
          t4 |  -.5421537   .3189139    -1.70   0.094    -1.178708    .0944011
          t5 |  -.4730429   .2319459    -2.04   0.045    -.9360088   -.0100769
          t6 |  -.4272042     .18844    -2.27   0.027    -.8033319   -.0510764
          t7 |  -.3959783   .1732969    -2.28   0.025    -.7418804   -.0500762
          t8 |  -.3398463   .1501062    -2.26   0.027    -.6394596    -.040233
          t9 |  -.2718933   .1348175    -2.02   0.048    -.5409901   -.0027964
         t10 |  -.2273857   .0763495    -2.98   0.004      -.37978   -.0749914
         t11 |  -.1118032   .0319005    -3.50   0.001     -.175477   -.0481295
         t12 |   -.033641   .0429008    -0.78   0.436    -.1192713    .0519893
         t13 |  -.0177346   .0362554    -0.49   0.626    -.0901007    .0546315
         t14 |  -.0186451    .030508    -0.61   0.543    -.0795393     .042249
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |   12.94004   2.218231     5.83   0.000     8.512434    17.36765
------------------------------------------------------------------------------
The following is the corresponding SAS REG procedure (output is skipped).
PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t14 output fuel load;
RUN;
The LIMDEP example is skipped here, since many dummy variables need to be listed in the
Regress$ command.
The second and third approaches each drop one dummy from one set (group or time) and restrict the other set. The following drops one time dummy, includes all group dummies, and imposes a restriction on the group dummies.
PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t14 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;
                            The REG Procedure
                              Model: MODEL1
                         Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Read          90
Number of Observations Used          90

                             Analysis of Variance
                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                    22      113.86404        5.17564    1960.82    <.0001
Error                    67        0.17685        0.00264
Corrected Total          89      114.04089

Root MSE              0.05138    R-Square     0.9984
Dependent Mean       13.36561    Adj R-Sq     0.9979
Coeff Var             0.38439

                          Parameter Estimates
                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       12.98600        2.22540       5.84      <.0001
g1            1        0.12833        0.04601       2.79      0.0069
g2            1        0.06549        0.03897       1.68      0.0975
g3            1       -0.18947        0.01561     -12.14      <.0001
g4            1        0.13425        0.01832       7.33      <.0001
g5            1       -0.09265        0.03731      -2.48      0.0155
g6            1       -0.04596        0.04161      -1.10      0.2733
t1            1       -0.69314        0.33784      -2.05      0.0441
t2            1       -0.63844        0.33208      -1.92      0.0588
t3            1       -0.59580        0.32945      -1.81      0.0750
t4            1       -0.54215        0.31891      -1.70      0.0938
t5            1       -0.47304        0.23195      -2.04      0.0454
t6            1       -0.42720        0.18844      -2.27      0.0266
t7            1       -0.39598        0.17330      -2.28      0.0255
t8            1       -0.33985        0.15011      -2.26      0.0268
t9            1       -0.27189        0.13482      -2.02      0.0477
t10           1       -0.22739        0.07635      -2.98      0.0040
t11           1       -0.11180        0.03190      -3.50      0.0008
t12           1       -0.03364        0.04290      -0.78      0.4357
t13           1       -0.01773        0.03626      -0.49      0.6263
t14           1       -0.01865        0.03051      -0.61      0.5432
output        1        0.81725        0.03185      25.66      <.0001
fuel          1        0.16861        0.16348       1.03      0.3061
load          1       -0.88281        0.26174      -3.37      0.0012
RESTRICT     -1    -1.9387E-16              .          .           .
Alternatively, you may run the STATA .cnsreg command with the second constraint (output
is skipped).
. cnsreg cost g1-g6 t1-t14 output fuel load, constraint(2)
The following drops one group dummy and imposes a restriction on time dummies.
. cnsreg cost g1-g5 t1-t15 output fuel load, constraint(3)
Constrained linear regression                     Number of obs =        90
                                                  F( 22,    67) =   1960.82
                                                  Prob > F      =    0.0000
                                                  Root MSE      =    .05138

 ( 1)  t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1742825   .0861201     2.02   0.047     .0023861     .346179
          g2 |   .1114508   .0779551     1.43   0.157    -.0441482    .2670499
          g3 |   -.143511   .0518934    -2.77   0.007    -.2470907   -.0399313
          g4 |   .1802087   .0321443     5.61   0.000     .1160484    .2443691
          g5 |  -.0466942   .0224688    -2.08   0.042    -.0915422   -.0018463
          t1 |  -.3740245    .191872    -1.95   0.055    -.7570026    .0089536
          t2 |  -.3193228   .1860877    -1.72   0.091    -.6907554    .0521097
          t3 |  -.2766893   .1833501    -1.51   0.136    -.6426576    .0892789
          t4 |  -.2230399   .1729671    -1.29   0.202    -.5682837    .1222038
          t5 |  -.1539291   .0864404    -1.78   0.079    -.3264649    .0186066
          t6 |  -.1080904   .0448591    -2.41   0.019    -.1976296   -.0185513
          t7 |  -.0768646   .0319336    -2.41   0.019    -.1406043   -.0131248
          t8 |  -.0207326   .0204506    -1.01   0.314     -.061552    .0200869
          t9 |   .0472205   .0290822     1.62   0.109    -.0108278    .1052688
         t10 |   .0917281   .0811525     1.13   0.262    -.0702531    .2537092
         t11 |   .2073105   .1491443     1.39   0.169    -.0903829    .5050039
         t12 |   .2854727   .1756365     1.63   0.109    -.0650993    .6360447
         t13 |   .3013791   .1660294     1.82   0.074     -.030017    .6327752
         t14 |   .3004686   .1536212     1.96   0.055    -.0061606    .6070978
         t15 |   .3191137   .1474883     2.16   0.034     .0247259    .6135015
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |   12.62093   2.074302     6.08   0.000     8.480603    16.76125
------------------------------------------------------------------------------
You may run the following SAS REG procedure to get the same result (output is skipped).
PROC REG DATA=masil.airline; /* LSDV3 */
MODEL cost = g1-g5 t1-t15 output fuel load;
RESTRICT t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0;
RUN;
The last approach includes all group and time dummies and imposes two restrictions, one on the group dummies and one on the time dummies.
. cnsreg cost g1-g6 t1-t15 output fuel load, constraint(2 3)
Constrained linear regression                     Number of obs =        90
                                                  F( 22,    67) =   1960.82
                                                  Prob > F      =    0.0000
                                                  Root MSE      =    .05138

 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
 ( 2)  t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11 + t12 + t13 + t14 + t15 = 0

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   .1283264   .0460126     2.79   0.007     .0364849    .2201679
          g2 |   .0654947   .0389685     1.68   0.097    -.0122867    .1432761
          g3 |  -.1894671   .0156096   -12.14   0.000     -.220624   -.1583102
          g4 |   .1342526   .0183163     7.33   0.000      .097693    .1708121
          g5 |  -.0926504   .0373085    -2.48   0.016    -.1671184   -.0181824
          g6 |  -.0459561   .0416069    -1.10   0.273    -.1290038    .0370916
          t1 |  -.3740245    .191872    -1.95   0.055    -.7570026    .0089536
          t2 |  -.3193228   .1860877    -1.72   0.091    -.6907554    .0521097
          t3 |  -.2766893   .1833501    -1.51   0.136    -.6426576    .0892789
          t4 |  -.2230399   .1729671    -1.29   0.202    -.5682837    .1222038
          t5 |  -.1539291   .0864404    -1.78   0.079    -.3264649    .0186066
          t6 |  -.1080904   .0448591    -2.41   0.019    -.1976296   -.0185513
          t7 |  -.0768646   .0319336    -2.41   0.019    -.1406043   -.0131248
          t8 |  -.0207326   .0204506    -1.01   0.314     -.061552    .0200869
          t9 |   .0472205   .0290822     1.62   0.109    -.0108278    .1052688
         t10 |   .0917281   .0811525     1.13   0.262    -.0702531    .2537092
         t11 |   .2073105   .1491443     1.39   0.169    -.0903829    .5050039
         t12 |   .2854727   .1756365     1.63   0.109    -.0650993    .6360447
         t13 |   .3013791   .1660294     1.82   0.074     -.030017    .6327752
         t14 |   .3004686   .1536212     1.96   0.055    -.0061606    .6070978
         t15 |   .3191137   .1474883     2.16   0.034     .0247259    .6135015
      output |   .8172487    .031851    25.66   0.000     .7536739    .8808235
        fuel |     .16861    .163478     1.03   0.306    -.1576935    .4949135
        load |  -.8828142   .2617373    -3.37   0.001    -1.405244   -.3603843
       _cons |   12.66688   2.081068     6.09   0.000     8.513054    16.82071
------------------------------------------------------------------------------
The following SAS REG procedure gives you the same result (output is skipped).
PROC REG DATA=masil.airline;
MODEL cost = g1-g6 t1-t15 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RESTRICT t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15=0;
RUN;
The two-way within group and time effect model requires transforming the data as $y_{it}^* = y_{it} - \bar{y}_i - \bar{y}_t + \bar{y}$ and $x_{it}^* = x_{it} - \bar{x}_i - \bar{x}_t + \bar{x}$. The following commands do this task.
. gen w_cost =
. gen w_output =
. gen w_fuel =
. gen w_load =
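To see what this transformation does, here is a small Python sketch of two-way demeaning for a balanced panel stored as rows (groups) by columns (periods). It uses made-up numbers rather than the airline data. After the transformation every group mean and every time mean is zero, and data that are purely additive in group and time effects are mapped to zero.

```python
def two_way_demean(values):
    """Return y*_it = y_it - ybar_i - ybar_t + ybar for a balanced panel.

    values[i][t] holds the observation for group i in period t.
    """
    n, T = len(values), len(values[0])
    group_means = [sum(row) / T for row in values]
    time_means = [sum(values[i][t] for i in range(n)) / n for t in range(T)]
    grand_mean = sum(group_means) / n  # balanced panel: mean of group means
    return [
        [values[i][t] - group_means[i] - time_means[t] + grand_mean for t in range(T)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[1.0, 2.0, 6.0], [3.0, 5.0, 4.0]]  # hypothetical 2 groups x 3 periods
    print(two_way_demean(toy))
```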
Now, run the OLS with the transformed variables. Do not forget to suppress the intercept.
. regress w_cost w_output w_fuel w_load, noc // within effect
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) =  307.86
       Model |  1.87739643     3  .625798811           Prob > F      =  0.0000
    Residual |  .176848774    87  .002032745           R-squared     =  0.9139
-------------+------------------------------           Adj R-squared =  0.9109
       Total |  2.05424521    90  .022824947           Root MSE      =  .04509

------------------------------------------------------------------------------
      w_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    w_output |   .8172487   .0279512    29.24   0.000     .7616927    .8728048
      w_fuel |     .16861   .1434621     1.18   0.243    -.1165364    .4537565
      w_load |  -.8828142   .2296907    -3.84   0.000    -1.339349    -.426279
------------------------------------------------------------------------------
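Note again that the R2, MSE, standard errors, and error degrees of freedom of this regression are not correct. The dummy variable coefficients are computed as $d_g^* = (\bar{y}_g - \bar{y}) - \beta'(\bar{x}_g - \bar{x})$ and $d_t^* = (\bar{y}_t - \bar{y}) - \beta'(\bar{x}_t - \bar{x})$. The standard errors also need to be adjusted using the error degrees of freedom of the demeaned regression (87 = 90 - 3) and of the two-way LSDV model (67 = 90 - 6 - 15 + 1 - 3); for instance, the standard error of the load factor is .2617 = .2297*sqrt(87/67).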
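The degrees-of-freedom correction works the same way in the one-way and two-way cases: multiply the standard error from the demeaned regression by sqrt(df of the demeaned OLS / df of the LSDV model). A minimal Python sketch with the document's numbers (6 airlines, 15 years, 3 regressors); the names are illustrative.

```python
import math


def corrected_se(se_demeaned, df_demeaned, df_lsdv):
    """Rescale an SE from the demeaned OLS to the LSDV error degrees of freedom."""
    return se_demeaned * math.sqrt(df_demeaned / df_lsdv)


N, T, K = 6, 15, 3                        # airlines, years, slope coefficients
DF_DEMEANED = N * T - K                   # 87: demeaned OLS counts only the slopes
DF_TIME_LSDV = N * T - T - K              # 72: one-way time effect LSDV
DF_TWO_WAY_LSDV = N * T - N - T + 1 - K   # 67: two-way LSDV

if __name__ == "__main__":
    print(round(corrected_se(0.3312359, DF_DEMEANED, DF_TIME_LSDV), 4))     # fuel, one-way
    print(round(corrected_se(0.2296907, DF_DEMEANED, DF_TWO_WAY_LSDV), 4))  # load, two-way
```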
The SAS TSCSREG and PANEL procedures have the /FIXTWO option to fit the two-way
fixed effect model.
PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXTWO;
RUN;
                           Model Description
                Estimation Method                 FixTwo
                Number of Cross Sections               6
                Time Series Length                    15

                             Fit Statistics
                SSE          0.1768    DFE             67
                MSE          0.0026    Root MSE    0.0514
                R-Square     0.9984

                Num DF    Den DF    F Value    Pr > F
                    19        67      23.10    <.0001

                          Parameter Estimates

                                 Standard
Variable    DF    Estimate          Error    t Value    Pr > |t|    Label
CS1          1    0.174283         0.0861       2.02      0.0470    Cross Sectional Effect 1
CS2          1    0.111451         0.0780       1.43      0.1575    Cross Sectional Effect 2
CS3          1    -0.14351         0.0519      -2.77      0.0073    Cross Sectional Effect 3
CS4          1    0.180209         0.0321       5.61      <.0001    Cross Sectional Effect 4
CS5          1    -0.04669         0.0225      -2.08      0.0415    Cross Sectional Effect 5
TS1          1    -0.69314         0.3378      -2.05      0.0441    Time Series Effect 1
TS2          1    -0.63844         0.3321      -1.92      0.0588    Time Series Effect 2
TS3          1     -0.5958         0.3294      -1.81      0.0750    Time Series Effect 3
TS4          1    -0.54215         0.3189      -1.70      0.0938    Time Series Effect 4
TS5          1    -0.47304         0.2319      -2.04      0.0454    Time Series Effect 5
TS6          1     -0.4272         0.1884      -2.27      0.0266    Time Series Effect 6
TS7          1    -0.39598         0.1733      -2.28      0.0255    Time Series Effect 7
TS8          1    -0.33985         0.1501      -2.26      0.0268    Time Series Effect 8
TS9          1    -0.27189         0.1348      -2.02      0.0477    Time Series Effect 9
TS10         1    -0.22739         0.0763      -2.98      0.0040    Time Series Effect 10
TS11         1     -0.1118         0.0319      -3.50      0.0008    Time Series Effect 11
TS12         1    -0.03364         0.0429      -0.78      0.4357    Time Series Effect 12
TS13         1    -0.01773         0.0363      -0.49      0.6263    Time Series Effect 13
TS14         1    -0.01865         0.0305      -0.61      0.5432    Time Series Effect 14
Intercept    1    12.94004         2.2182       5.83      <.0001    Intercept
output       1    0.817249         0.0319      25.66      <.0001
fuel         1     0.16861         0.1635       1.03      0.3061
load         1    -0.88281         0.2617      -3.37      0.0012
The STATA .xtreg command does not fit the two-way fixed or random effect model. The
following LIMDEP command fits the two-way fixed model. Note that this command has Str$
and Period$ specifications to specify stratification and time variables. This command presents
the pooled model and one-way group effect model as well, but reports the incorrect intercept in
the two-way fixed model, 12.667 (2.081).
REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Period=YEAR;Fixed$
The null hypothesis is that the parameters of the group and time dummies are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$ and $\tau_1 = \dots = \tau_{T-1} = 0$. The F test compares the pooled regression with the two-way group and time effect model:
$F = \frac{(1.3354 - .1768)/(6 + 15 - 2)}{.1768/(6 \times 15 - 6 - 15 - 3 + 1)} \sim 23.1085\,[19, 67]$
The F statistic of 23.1085 rejects the null hypothesis at the .01 significance level (p < .0001).
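The two-way F statistic above, with both degrees of freedom built from the panel dimensions, as a Python sketch (the function name is illustrative):

```python
def two_way_f_test(sse_pooled, sse_two_way, n, T, k):
    """Incremental F for H0: no group and no time effects (balanced panel).

    Returns (F, numerator df, denominator df).
    """
    df_num = n + T - 2                # (n - 1) group + (T - 1) time dummies
    df_den = n * T - n - T - k + 1    # error df of the two-way LSDV model
    f_value = ((sse_pooled - sse_two_way) / df_num) / (sse_two_way / df_den)
    return f_value, df_num, df_den


if __name__ == "__main__":
    f_value, d1, d2 = two_way_f_test(1.3354, 0.1768, n=6, T=15, k=3)
    print(f"F({d1}, {d2}) = {f_value:.4f}")
```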
The SAS TSCSREG and PANEL procedures conduct the F-test for the group and time effects.
You may also run the following SAS REG procedure and .regress command to perform the
same test.
PROC REG DATA=masil.airline;
MODEL cost = g1-g5 t1-t14 output fuel load;
TEST g1=g2=g3=g4=g5=t1=t2=t3=t4=t5=t6=t7=t8=t9=t10=t11=t12=t13=t14=0;
RUN;
When the omega matrix ($\Omega$) is not known, you have to estimate $\theta$ using the SSEs of the between effect model (.0317) and the fixed effect model (.2926).
The variance component of error is $\hat{\sigma}_e^2 = .00361263 = .292622872/(6 \times 15 - 6 - 3)$.
The variance component of group is $\hat{\sigma}_u^2 = .01559712 = .031675926/(6 - 4) - .00361263/15$.
Thus, $\hat{\theta} = .87668488 = 1 - \sqrt{\frac{.00361263}{15 \times .031675926/(6 - 4)}}$.
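The variance-component arithmetic can be packaged as one small function. This Python sketch assumes the formulas used above: the error variance is the within SSE over its df, the effect variance is the .031675926-type SSE over its df minus the error variance divided by T, and $\hat{\theta} = 1 - \sqrt{\hat{\sigma}_e^2 / (T \cdot SSE/df)}$. The function name is ours.

```python
import math


def random_effect_theta(sse_between, df_between, sse_within, df_within, T):
    """Return (sigma2_e, sigma2_u, theta) from between and within SSEs.

    sigma2_e = SSE_within / df_within
    sigma2_u = SSE_between / df_between - sigma2_e / T
    theta    = 1 - sqrt(sigma2_e / (T * SSE_between / df_between))
    """
    sigma2_e = sse_within / df_within
    between_var = sse_between / df_between      # estimates sigma2_u + sigma2_e / T
    sigma2_u = between_var - sigma2_e / T
    theta = 1.0 - math.sqrt(sigma2_e / (T * between_var))
    return sigma2_e, sigma2_u, theta


# Group (airline) effects: 6 airlines, 15 years, 3 regressors.
GROUP = random_effect_theta(0.031675926, 6 - 4, 0.292622872, 6 * 15 - 6 - 3, T=15)
# Time (year) effects: 15 years, 6 airlines per year, 3 regressors.
YEAR = random_effect_theta(0.005590631, 15 - 4, 1.08819022, 15 * 6 - 15 - 3, T=6)

if __name__ == "__main__":
    print("group: sigma2_e=%.8f sigma2_u=%.8f theta=%.8f" % GROUP)
    print("year:  sigma2_e=%.8f sigma2_v=%.8f theta=%.6f" % YEAR)
```

The second call uses the time-effect SSEs that appear later in this section; it returns a negative variance component and a negative theta of about -1.2263.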
Now, transform the dependent and independent variables including the intercept.
. gen
. gen
. gen
. gen
. gen
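The transformation subtracts $\hat{\theta}$ times each group mean from every variable and replaces the intercept column of ones with $1 - \hat{\theta}$. Below is a Python sketch of that step for one variable. It assumes a balanced panel stored as one list per group, and the helper and toy numbers are hypothetical.

```python
def quasi_demean(values, theta):
    """Random-effects (FGLS) transformation for one variable.

    values[i][t] is group i in period t. Returns (transformed values,
    transformed intercept), where y*_it = y_it - theta * ybar_i and the
    column of ones becomes 1 - theta.
    """
    transformed = []
    for row in values:
        group_mean = sum(row) / len(row)
        transformed.append([v - theta * group_mean for v in row])
    return transformed, 1.0 - theta


if __name__ == "__main__":
    toy = [[1.0, 3.0], [2.0, 6.0]]         # hypothetical 2 groups x 2 periods
    print(quasi_demean(toy, 0.5))           # partial demeaning
    print(quasi_demean(toy, 0.87668488))    # theta estimated above
```

With $\hat{\theta}$ = .87668488 the intercept column becomes .12331512. A theta of 0 leaves the data unchanged (pooled OLS), and a theta of 1 gives the within transformation.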
Finally, run the OLS with the transformed variables. Do not forget to suppress the intercept.
This is the groupwise heteroscedastic regression model (Greene 2003).
. regress rg_cost rg_int rg_output rg_fuel rg_load, noc
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  4,    86) = 19642.72
       Model |  284.670313     4  71.1675783           Prob > F      =  0.0000
    Residual |  .311586777    86  .003623102           R-squared     =  0.9989
-------------+------------------------------           Adj R-squared =  0.9989
       Total |    284.9819    90  3.16646556           Root MSE      =  .06019

------------------------------------------------------------------------------
     rg_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      rg_int |   9.627911   .2101638    45.81   0.000     9.210119     10.0457
   rg_output |   .9066808   .0256249    35.38   0.000     .8557401    .9576215
     rg_fuel |   .4227784   .0140248    30.15   0.000      .394898    .4506587
     rg_load |    -1.0645   .2000703    -5.32   0.000    -1.462226   -.6667731
------------------------------------------------------------------------------

8 Baltagi and Cheng (1994) introduce various ANOVA estimation methods, such as a modified Wallace and Hussain method, the Wansbeek and Kapteyn method, the Swamy and Arora method, and Henderson's method III. They also discuss maximum likelihood (ML) estimators, restricted ML estimators, minimum norm quadratic unbiased estimators (MINQUE), and minimum variance quadratic unbiased estimators (MIVQUE). Based on a Monte Carlo simulation, they argue that ANOVA estimators are best quadratic unbiased estimators of the variance components for the balanced model, whereas ML, restricted ML, MINQUE, and MIVQUE are recommended for unbalanced models.
The SAS TSCSREG and PANEL procedures have the /RANONE option to fit the one-way
random effect model. These procedures by default use the Fuller and Battese (1974) estimation
method, which produces slightly different estimates from FGLS.
PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANONE;
RUN;
                         The TSCSREG Procedure
                        Dependent Variable: cost

                           Model Description
                Estimation Method                 RanOne
                Number of Cross Sections               6
                Time Series Length                    15

                             Fit Statistics
                SSE          0.3090    DFE             86
                MSE          0.0036    Root MSE    0.0599
                R-Square     0.9923

                      Variance Component Estimates
                Variance Component for Cross Sections    0.018198
                Variance Component for Error             0.003613

                     Hausman Test for Random Effects
                m Value    Pr > m
                   0.92    0.8209

                          Parameter Estimates

                                 Standard
Variable    DF    Estimate          Error    t Value    Pr > |t|
Intercept    1       9.637         0.2132      45.21      <.0001
output       1    0.908024         0.0260      34.91      <.0001
fuel         1    0.422199                     29.95      <.0001
load         1    -1.06469                     -5.34      <.0001
The PANEL procedure has the /VCOMP=WK option for the Wansbeek and Kapteyn (1989) method, which is close to the groupwise heteroscedastic regression. The BP option of the MODEL statement, which is not available in the TSCSREG procedure, conducts the Breusch-Pagan LM test for random effects. Note that the two procedures estimate the same variance component for error (.0036) but different variance components for groups (.0182 versus .0160).
PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /RANONE BP VCOMP=WK;
RUN;
                          The PANEL Procedure
           Wansbeek and Kapteyn Variance Components (RanOne)

                        Dependent Variable: cost

                           Model Description
                Estimation Method                 RanOne
                Number of Cross Sections               6
                Time Series Length                    15

                             Fit Statistics
                SSE          0.3111    DFE             86
                MSE          0.0036    Root MSE    0.0601
                R-Square     0.9923

                      Variance Component Estimates
                Variance Component for Cross Sections    0.016015
                Variance Component for Error             0.003613

                     Hausman Test for Random Effects
                m Value    Pr > m
                   1.63    0.4429

             Breusch Pagan Test for Random Effects (One Way)
                m Value    Pr > m
                 334.85    <.0001

                          Parameter Estimates

                                 Standard
Variable    DF    Estimate          Error    t Value    Pr > |t|
Intercept    1    9.629513         0.2107      45.71      <.0001
output       1    0.906918         0.0257      35.30      <.0001
fuel         1    0.422676         0.0140      30.11      <.0001
load         1    -1.06452         0.2000      -5.32      <.0001
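For reference, the one-way Breusch and Pagan (1980) LM statistic that the BP option reports can be computed from pooled OLS residuals as LM = nT/(2(T-1)) [Σ_i(Σ_t e_it)² / Σ_iΣ_t e_it² - 1]², which is compared with a chi-squared distribution with 1 df. The Python sketch below implements that formula. The residuals in the usage lines are made up, because the airline residuals are not listed in this document, so it does not reproduce 334.85.

```python
def breusch_pagan_lm(residuals):
    """One-way Breusch-Pagan LM statistic for random group effects.

    residuals[i][t] are pooled OLS residuals for group i, period t
    (balanced panel). Compare the result with chi-squared(1).
    """
    n, T = len(residuals), len(residuals[0])
    total_sq = sum(e * e for row in residuals for e in row)
    group_sq = sum(sum(row) ** 2 for row in residuals)
    return n * T / (2.0 * (T - 1)) * (group_sq / total_sq - 1.0) ** 2


if __name__ == "__main__":
    # Hypothetical residuals: strongly shared within groups vs. no group pattern.
    print(breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]]))
    print(breusch_pagan_lm([[2.0, 0.0], [0.0, -2.0]]))
```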
The STATA .xtreg command has the re option to produce FGLS estimates. The .iis
command specifies the panel identification variable, such as a grouping or cross-section
variable that is used in the i() option.
. iis airline
. xtreg cost output fuel load, re i(airline) theta
Random-effects GLS regression                   Number of obs      =        90
Group variable (i): airline                     Number of groups   =         6

R-sq:  within  = 0.9925                         Obs per group: min =        15
       between = 0.9856                                        avg =      15.0
       overall = 0.9876                                        max =        15

                                                Wald chi2(3)       =  11091.33
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9066805    .025625    35.38   0.000     .8564565    .9569045
        fuel |   .4227784   .0140248    30.15   0.000     .3952904    .4502665
        load |  -1.064499   .2000703    -5.32   0.000    -1.456629    -.672368
       _cons |   9.627909    .210164    45.81   0.000     9.215995    10.03982
-------------+----------------------------------------------------------------
     sigma_u |  .12488859
     sigma_e |  .06010514
         rho |  .81193816   (fraction of variance due to u_i)
------------------------------------------------------------------------------
The theta option reports the estimated theta (.8767). The sigma_u and sigma_e values are the square roots of the variance components for groups and error (e.g., .0036 = .0601^2).
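The sigma_u, sigma_e, and rho lines of the .xtreg output are linked by rho = σ_u²/(σ_u² + σ_e²), and theta follows from θ = 1 - sqrt(σ_e²/(Tσ_u² + σ_e²)). Here is a small Python check using the reported values and T = 15 (the helper name is ours):

```python
import math


def rho_and_theta(sigma_u, sigma_e, T):
    """rho = su2/(su2+se2); theta = 1 - sqrt(se2/(T*su2 + se2))."""
    su2, se2 = sigma_u ** 2, sigma_e ** 2
    rho = su2 / (su2 + se2)
    theta = 1.0 - math.sqrt(se2 / (T * su2 + se2))
    return rho, theta


if __name__ == "__main__":
    rho, theta = rho_and_theta(0.12488859, 0.06010514, T=15)
    print(f"rho = {rho:.6f}, theta = {theta:.4f}, sigma_e^2 = {0.06010514 ** 2:.4f}")
```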
In LIMDEP, you have to specify the Panel$ and Het$ subcommands for the groupwise heteroscedastic model. Note that LIMDEP presents the pooled OLS regression and the least squares dummy variable model as well.
--> REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Het=AIRLINE$
+-----------------------------------------------------------------------+
| OLS Without Group Dummy Variables
|
| Ordinary
least squares regression
Weighting variable = none
|
| Dep. var. = COST
Mean=
13.36560933
, S.D.=
1.131971444
|
| Model size: Observations =
90, Parameters =
4, Deg.Fr.=
86 |
| Residuals: Sum of squares= 1.335449522
, Std.Dev.=
.12461 |
| Fit:
R-squared= .988290, Adjusted R-squared =
.98788 |
| Model test: F[ 3,
86] = 2419.33,
Prob value =
.00000 |
| Diagnostic: Log-L =
61.7699, Restricted(b=0) Log-L =
-138.3581 |
|
LogAmemiyaPrCrt.=
-4.122, Akaike Info. Crt.=
-1.284 |
http://www.indiana.edu/~statmath
http://www.indiana.edu/~statmath
+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates:  Var[e]              =  .361260D-02   |
|             Var[u]              =  .119159D-01   |
|             Corr[v(i,t),v(i,s)] =  .767356       |
| Lagrange Multiplier Test vs. Model (3) = 334.85  |
| ( 1 df, prob value =  .000000)                   |
| (High values of LM favor FEM/REM over CR model.) |
| Fixed vs. Random Effects (Hausman)     =    .00  |
| ( 3 df, prob value = 1.000000)                   |
| (High (low) values of H favor FEM (REM).)        |
| Reestimated using GLS coefficients:              |
| Estimates:  Var[e]              =  .362491D-02   |
|             Var[u]              =  .392309D-01   |
| Var[e] above is an average. Groupwise            |
| heteroscedasticity model was estimated.          |
|             Sum of Squares         .147779D+01   |
+--------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 OUTPUT      .9041238041     .24615477E-01    36.730    .0000  -1.1743092
 FUEL        .4238986905     .13746498E-01    30.837    .0000   12.770359
 LOAD       -1.064558659     .19933132        -5.341    .0000   .56046016
 Constant    9.610634379     .20277404        47.396    .0000
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
Like the SAS TSCSREG and PANEL procedures, LIMDEP estimates a slightly different variance
component for groups (.0119, compared with STATA's .0156 = .1249^2), and thus produces slightly
different parameter estimates. In addition, the Hausman test fails in this example: LIMDEP
reports H = .00 with a p-value of 1.000000.
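To see how much LIMDEP's smaller group variance component matters, the sketch below plugs both sets of variance components into the same one-way formulas, again assuming T = 15. It reproduces LIMDEP's Corr[v(i,t),v(i,s)] of .767356 and implies a smaller theta than STATA's, which is consistent with the slightly different GLS estimates. The function name corr_and_theta is hypothetical.

```python
import math


def corr_and_theta(var_u, var_e, T):
    """Return (Corr[v(i,t), v(i,s)], theta) for a one-way random group effect model."""
    corr = var_u / (var_u + var_e)                      # correlation of errors within a group
    theta = 1 - math.sqrt(var_e / (T * var_u + var_e))  # GLS quasi-demeaning weight
    return corr, theta


# LIMDEP: Var[u] = .119159D-01, Var[e] = .361260D-02
limdep = corr_and_theta(var_u=0.0119159, var_e=0.0036126, T=15)
# STATA: sigma_u = .12488859, sigma_e = .06010514
stata = corr_and_theta(var_u=0.12488859 ** 2, var_e=0.06010514 ** 2, T=15)

print("LIMDEP corr = %.6f, theta = %.4f" % limdep)
print("STATA  corr = %.6f, theta = %.4f" % stata)
```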
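Let us compute $\theta$ for the one-way time random effect model using the SSEs of the between
effect model (.0056) and the fixed effect model (1.0882), with n = 6 airlines, T = 15 years, and
K = 4 parameters (an intercept and three slopes).

$$\hat\sigma_\varepsilon^2 = \frac{e'e_{\text{within}}}{nT - T - (K-1)} = \frac{1.08819022}{15 \times 6 - 15 - 3} = .01511375$$

$$\hat\sigma_v^2 = \frac{e'e_{\text{between}}}{T - K} - \frac{\hat\sigma_\varepsilon^2}{n} = \frac{.005590631}{15 - 4} - \frac{.01511375}{6} = -.00201072$$

$$\hat\theta = 1 - \sqrt{\frac{\hat\sigma_\varepsilon^2}{n \cdot e'e_{\text{between}}/(T-K)}} = 1 - \sqrt{\frac{.01511375}{6 \times .005590631/(15-4)}} = -1.226263$$

The transformed variables rt_cost, rt_int, rt_output, rt_fuel, and rt_load are then created with
five .gen commands and regressed by OLS without a constant term.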
                                                       Number of obs =      90
                                                       F(  4,    86) =       .
                                                       Prob > F      =  0.0000
                                                       R-squared     =  1.0000
                                                       Adj R-squared =  1.0000
                                                       Root MSE      =  .14438
------------------------------------------------------------------------------
     rt_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      rt_int |   9.516098    .1489281    63.90   0.000     9.220038    9.812157
   rt_output |   .8883838    .0143338    61.98   0.000     .8598891    .9168785
     rt_fuel |   .4392731    .0129051    34.04   0.000     .4136186    .4649277
     rt_load |  -1.279176    .2482869    -5.15   0.000    -1.772754   -.7855982
------------------------------------------------------------------------------
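The variance-component arithmetic and the transformation behind the rt_ variables can be sketched as follows. The formulas are the ones given above (n = 6, T = 15, K = 4). quasi_demean is a hypothetical helper showing the assumed form of the transformation, y_it - theta * ybar_t, applied to every variable; an intercept column of ones becomes 1 - theta. Because theta is negative here (a direct consequence of the negative sigma_v^2 estimate), the transformation adds rather than subtracts the period means.

```python
import math


def time_re_components(sse_within, sse_between, n, T, K):
    """Variance components and theta for a one-way time random effect model.

    n: groups per period, T: periods, K: parameters including the intercept.
    """
    var_e = sse_within / (n * T - T - (K - 1))  # error variance (fixed time effect model)
    between_mse = sse_between / (T - K)         # MSE of the between (time means) model
    var_v = between_mse - var_e / n             # time variance component; can be negative
    theta = 1 - math.sqrt(var_e / (n * between_mse))
    return var_e, var_v, theta


def quasi_demean(values, periods, theta):
    """Return y_it - theta * ybar_t, where ybar_t is the mean of y in period t."""
    sums, counts = {}, {}
    for y, t in zip(values, periods):
        sums[t] = sums.get(t, 0.0) + y
        counts[t] = counts.get(t, 0) + 1
    return [y - theta * sums[t] / counts[t] for y, t in zip(values, periods)]


var_e, var_v, theta = time_re_components(1.08819022, 0.005590631, n=6, T=15, K=4)
print(f"var_e = {var_e:.8f}, var_v = {var_v:.8f}, theta = {theta:.6f}")

# Toy data (not the airline data): two periods with two observations each.
print(quasi_demean([1.0, 3.0, 2.0, 6.0], [1, 1, 2, 2], theta=0.5))  # [0.0, 2.0, 0.0, 4.0]
# An intercept column of ones becomes 1 - theta in every row.
print(quasi_demean([1.0, 1.0], [1, 1], theta=theta))
```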
However, a negative estimate of the variance component for time is not plausible, since a
variance cannot be negative. This section therefore presents the procedures and commands for the
one-way time random effect model without their outputs.
In SAS, use the TSCSREG or PANEL procedure with the /RANONE option.
PROC SORT DATA=masil.airline;
   BY year airline;
RUN;
/* Listing year first in ID makes year the cross-sectional unit. */
PROC TSCSREG DATA=masil.airline;
   ID year airline;
   MODEL cost = output fuel load /RANONE;
RUN;
PROC PANEL DATA=masil.airline;
   ID year airline;
   MODEL cost = output fuel load /RANONE BP;
RUN;
In STATA, switch the grouping and time variables using the .tsset command.
. tsset year airline
       panel variable:  year, 1 to 15
        time variable:  airline, 1 to 6
The random group and time effect model is formulated as
$y_{it} = \alpha + \beta' X_{it} + u_i + v_t + \varepsilon_{it}$. Let us first estimate the
two-way FGLS using the SAS PANEL procedure with the /RANTWO option. The BP2 option conducts the
Breusch-Pagan LM test for the two-way random effect model.
PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /RANTWO BP2;
RUN;
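                     Model Description
        Estimation Method                   RanTwo
        Number of Cross Sections                 6
        Time Series Length                      15

                       Fit Statistics
        SSE               0.2322    DFE                 86
        MSE               0.0027    Root MSE        0.0520
        R-Square          0.9829

               Variance Component Estimates
        Variance Component for Cross Sections    0.017439
        Variance Component for Time Series       0.001081
        Variance Component for Error             0.00264

              Hausman Test for Random Effects
               DF      m Value      Pr > m
                3         6.93      0.0741

      Breusch Pagan Test for Random Effects (Two Way)
               DF      m Value      Pr > m
                2       336.40      <.0001

                     Parameter Estimates
                                 Standard
 Variable     DF    Estimate        Error    t Value    Pr > |t|
 Intercept     1    9.362677       0.2440      38.38      <.0001
 output        1    0.866448       0.0255      33.98      <.0001
 fuel          1    0.436163       0.0172      25.41      <.0001
 load          1    -0.98053       0.2235      -4.39      <.0001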
Similarly, you may run the TSCSREG procedure with the /RANTWO option.
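The Breusch-Pagan Lagrange multiplier (LM) test is designed to test random effects. The null
hypothesis of the one-way random group effect model is that the variance of the group effects is
zero: $H_0: \sigma_u^2 = 0$. If the null hypothesis is not rejected, the pooled regression model
is appropriate. The e'e of the pooled OLS is 1.33544153, and $\bar{e}'\bar{e}$, the sum of the
squared group means of the pooled OLS residuals, is .0665147.

$$LM = \frac{nT}{2(T-1)}\left[\frac{T^2\,\bar{e}'\bar{e}}{e'e} - 1\right]^2 = \frac{6 \times 15}{2(15-1)}\left[\frac{15^2 \times .0665}{1.3354} - 1\right]^2 = 334.8496 \sim \chi^2(1), \quad p < .0001$$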
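The arithmetic can be reproduced in a few lines. The sketch takes the formula exactly as written above, with n = 6, T = 15, e'e = 1.33544153, and ebar'ebar = .0665147; the p-value uses the identity that the upper tail of a chi-squared(1) distribution at x equals erfc(sqrt(x/2)). The function name bp_lm_group is hypothetical.

```python
import math


def bp_lm_group(n, T, ebar_sq, sse):
    """Breusch-Pagan LM statistic for H0: sigma_u^2 = 0 (one-way group effects).

    ebar_sq is ebar'ebar, the sum of squared group means of the pooled OLS
    residuals; sse is e'e, the pooled OLS residual sum of squares.
    """
    return n * T / (2 * (T - 1)) * (T ** 2 * ebar_sq / sse - 1) ** 2


lm = bp_lm_group(n=6, T=15, ebar_sq=0.0665147, sse=1.33544153)
p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-squared with 1 df
print(f"LM = {lm:.4f}, p = {p_value:.3g}")
```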
With this large chi-squared statistic, we reject the null hypothesis in favor of the random group
effect model. The SAS PANEL procedure with the BP option and the LIMDEP Panel$ and Het$
subcommands report the LM statistic. In STATA, run the .xttest0 command right after
estimating the one-way random effect model.
. quietly xtreg cost output fuel load, re i(airline)
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects:

        cost[airline,t] = Xb + u[airline] + e[airline,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                    cost |   1.281358       1.131971
                       e |   .0036126       .0601051
                       u |   .0155972       .1248886

        Test:   Var(u) = 0
                              chi2(1) =   334.85
                          Prob > chi2 =   0.0000
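The null hypothesis of the one-way random time effect model is that the variance component for
time is zero, $H_0: \sigma_v^2 = 0$. The following LM test uses Baltagi's formula, where
$\sum_t (n\bar{e}_t)^2 = .7817$ is the sum over years of the squared yearly totals of the pooled
OLS residuals. The small chi-squared of 1.5472 does not reject the null hypothesis (p = .2135).

$$LM = \frac{Tn}{2(n-1)}\left[\frac{\sum_t (n\bar{e}_t)^2}{e'e} - 1\right]^2 = \frac{15 \times 6}{2(6-1)}\left[\frac{.7817}{1.3354} - 1\right]^2 = 1.5472 \sim \chi^2(1)$$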
        Test:   Var(u) = 0
                              chi2(1) =     1.55
                          Prob > chi2 =   0.2135
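The two-way random effect model has the null hypothesis that the variance components for groups
and time are both zero, $H_0: \sigma_u^2 = \sigma_v^2 = 0$. The LM statistic with two degrees of
freedom is 336.3968 = 334.8496 + 1.5472 (p < .0001), which agrees with the two-way
Breusch-Pagan statistic of 336.40 reported by the SAS PANEL procedure.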
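Both one-way statistics and their sum can be checked with one helper. The sketch assumes, consistent with the two formulas above, that each statistic has the form nT/(2(m - 1)) [S/e'e - 1]^2, where m is the number of observations per unit (T for groups, n for years) and S is the sum over units of the squared residual totals (15^2 x .0665147 for groups, .7817 for years). The chi-squared tail probabilities use closed forms that hold only for 1 and 2 degrees of freedom; the function names are hypothetical.

```python
import math


def bp_lm_one_way(n_obs, per_unit, sum_sq_totals, sse):
    """One-way Breusch-Pagan LM statistic.

    n_obs: nT; per_unit: observations per unit (T for groups, n for time);
    sum_sq_totals: sum over units of (sum of pooled OLS residuals in the unit)^2.
    """
    return n_obs / (2 * (per_unit - 1)) * (sum_sq_totals / sse - 1) ** 2


def chi2_upper_tail(x, df):
    """Upper-tail chi-squared probability for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")


sse = 1.33544153
lm_group = bp_lm_one_way(90, 15, 15 ** 2 * 0.0665147, sse)
lm_time = bp_lm_one_way(90, 6, 0.7817, sse)
lm_two_way = lm_group + lm_time  # two-way LM with 2 df

print(f"group:   LM = {lm_group:.4f}, p = {chi2_upper_tail(lm_group, 1):.3g}")
print(f"time:    LM = {lm_time:.4f}, p = {chi2_upper_tail(lm_time, 1):.4f}")
print(f"two-way: LM = {lm_two_way:.4f}, p = {chi2_upper_tail(lm_two_way, 2):.3g}")
```

The time statistic comes out near 1.5474 rather than 1.5472, a gap that likely reflects the rounded input .7817.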
How do we compare a fixed effect model and its counterpart random effect model? The Hausman
specification test examines whether the individual effects are uncorrelated with the regressors
in the model. Since the computation is complicated, let us conduct the test in STATA.
. tsset airline year
       panel variable:  airline, 1 to 6
        time variable:  year, 1 to 15
The Hausman statistic of 2.12 differs from the PANEL procedure's 1.63 and Greene (2003)'s 4.16
because SAS, STATA, and LIMDEP use different estimation methods that produce slightly different
parameter estimates. None of these tests, however, rejects the null hypothesis, so the random
effect model is preferred.
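STATA's .hausman command performs this computation for you. For readers who want to see what is being computed, the sketch below evaluates the usual Hausman (1978) form H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^{-1} (b_FE - b_RE) over the slope coefficients, compared with chi-squared(K) where K is the number of slopes. The inputs in the example are made-up illustration values, not estimates from the airline data, and the function names are hypothetical. In finite samples the variance difference need not be positive definite, which can make the statistic unstable.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("Var(b_FE) - Var(b_RE) is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hausman(b_fe, b_re, v_fe, v_re):
    """H = d' [Var(b_FE) - Var(b_RE)]^{-1} d with d = b_FE - b_RE (slopes only)."""
    d = [f - r for f, r in zip(b_fe, b_re)]
    v_diff = [[f - r for f, r in zip(row_fe, row_re)] for row_fe, row_re in zip(v_fe, v_re)]
    x = solve(v_diff, d)
    return sum(di * xi for di, xi in zip(d, x))


# Made-up illustrative inputs (two slopes), NOT estimates from the airline data.
h = hausman(b_fe=[0.9, 0.4], b_re=[0.8, 0.6],
            v_fe=[[0.02, 0.0], [0.0, 0.05]], v_re=[[0.01, 0.0], [0.0, 0.01]])
print(f"H = {h:.4f}")  # compare with chi-squared(2)
```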
7.7 Summary
Table 7 summarizes random effect estimation in SAS, STATA, and LIMDEP. The SAS PANEL
procedure is highly recommended.
Table 7. Comparison of the Random Effect Model in SAS, STATA, and LIMDEP*

                       SAS 9.1                                   STATA 9.0          LIMDEP 8.0
Procedure/Command      PROC TSCSREG         PROC PANEL           .xtreg             Regress; Panel$
One-way                /RANONE              /RANONE WK           re                 Str=;Pds=;Het;Random$
Two-way                /RANTWO              /RANTWO              No                 Problematic
SSE (e'e)              Slightly different   Correct              No                 No
MSE or SEE             Slightly different   Correct              No                 No
Model test (F)         No                   No                   Wald test          No
(adjusted) R2          Slightly different   Slightly different   Incorrect          No
Intercept              Slightly different   Correct              Correct            Slightly different
Coefficients           Slightly different   Correct              Correct            Slightly different
Standard errors        Slightly different   Correct              Correct            Slightly different
Variance for group     Slightly different   Correct              Correct (sigma)    Slightly different
Variance for error     Correct              Correct              Correct (sigma)    Correct
Theta                  No                   No                   theta              No
Breusch-Pagan (LM)     No                   BP option            .xttest0           Yes
Hausman Test (H)       Incorrect            Yes                  .hausman           Yes (unstable)

* Yes/No indicates whether the software reports the statistic. Correct, Slightly different, and
Incorrect indicate how the statistic compares with that of the groupwise heteroscedastic
regression.
In SAS, use the BY statement in the REG procedure. Do not forget to sort the data set in
advance.
PROC SORT DATA=masil.airline;
   BY airline;
RUN;
PROC REG DATA=masil.airline;
   MODEL cost = output fuel load;
   BY airline;   /* one OLS regression per airline */
RUN;
OLS regression for group 1
                                                       Number of obs =      15
                                                       F(  3,    11) = 1843.46
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9980
                                                       Adj R-squared =  0.9975
                                                       Root MSE      =  .02486
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |    1.18318    .0968946    12.21   0.000     .9699164    1.396444
        fuel |   .3865867    .0181946    21.25   0.000     .3465406    .4266329
        load |  -2.461629    .4013571    -6.13   0.000     -3.34501   -1.578248
       _cons |     10.846    .2972551    36.49   0.000     10.19174    11.50025
------------------------------------------------------------------------------

OLS regression for group 2
      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) = 3129.50
       Model |  6.47622084     3  2.15874028           Prob > F      =  0.0000
    Residual |  .007587838    11  .000689803           R-squared     =  0.9988
-------------+------------------------------           Adj R-squared =  0.9985
       Total |  6.48380868    14  .463129191           Root MSE      =  .02626
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   1.459104    .0792856    18.40   0.000     1.284597     1.63361
        fuel |   .3088958    .0272443    11.34   0.000     .2489315      .36886
        load |  -2.724785    .2376522   -11.47   0.000    -3.247854   -2.201716
       _cons |   11.97243    .4320951    27.71   0.000     11.02139    12.92346
------------------------------------------------------------------------------

OLS regression for group 3
                                                       Number of obs =      15
                                                       F(  3,    11) =  608.10
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9940
                                                       Adj R-squared =  0.9924
                                                       Root MSE      =   .0456
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .7268305    .1554418     4.68   0.001     .3847054    1.068956
        fuel |   .4515127    .0381103    11.85   0.000     .3676324    .5353929
        load |  -.7513069    .6105989    -1.23   0.244    -2.095226    .5926122
       _cons |   8.699815    .8985786     9.68   0.000     6.722057    10.67757
------------------------------------------------------------------------------

OLS regression for group 4
      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) =  777.86
       Model |  7.37252558     3  2.45750853           Prob > F      =  0.0000
    Residual |  .034752343    11  .003159304           R-squared     =  0.9953
-------------+------------------------------           Adj R-squared =  0.9940
       Total |  7.40727792    14  .52909128            Root MSE      =  .05621
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9353749    .0759266    12.32   0.000     .7682616    1.102488
        fuel |   .4637263     .044347    10.46   0.000     .3661192    .5613333
        load |  -.7756708    .4707826    -1.65   0.128    -1.811856    .2605148
       _cons |   9.164608    .6023241    15.22   0.000     7.838902    10.49031
------------------------------------------------------------------------------

OLS regression for group 5
      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) = 1999.89
       Model |  7.08313716     3  2.36104572           Prob > F      =  0.0000
    Residual |  .012986435    11  .001180585           R-squared     =  0.9982
-------------+------------------------------           Adj R-squared =  0.9977
       Total |  7.09612359    14  .506865971           Root MSE      =  .03436
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   1.076299    .0771255    13.96   0.000     .9065471    1.246051
        fuel |   .2920542    .0434213     6.73   0.000     .1964845    .3876239
        load |  -1.206847    .3336308    -3.62   0.004    -1.941163   -.4725305
       _cons |   11.77079    .7430078    15.84   0.000     10.13544    13.40614
------------------------------------------------------------------------------

OLS regression for group 6
      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) = 2602.49
       Model |  11.1173565     3  3.70578551           Prob > F      =  0.0000
    Residual |  .015663323    11  .001423938           R-squared     =  0.9986
-------------+------------------------------           Adj R-squared =  0.9982
       Total |  11.1330199    14  .795215705           Root MSE      =  .03774
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9673393    .0321728    30.07   0.000     .8965275    1.038151
        fuel |   .3023258    .0308235     9.81   0.000     .2344839    .3701678
        load |   .1050328    .4767508     0.22   0.830    -.9442886    1.154354
       _cons |   10.77381    .4095921    26.30   0.000     9.872309    11.67532
------------------------------------------------------------------------------
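The null hypothesis of the poolability test across groups is $H_0: \beta_{ik} = \beta_k$. The e'e
is 1.3354, the SSE of the pooled OLS regression. The $\sum_i e_i'e_i$, the sum of the SSEs of the
six group-by-group regressions, is .1007 = .0068 + .0076 + .0229 + .0348 + .0130 + .0157.
Thus, the F statistic is

$$F = \frac{(e'e - \sum_i e_i'e_i)/((n-1)K)}{\sum_i e_i'e_i/(n(T-K))} = \frac{(1.3354 - .1007)/((6-1) \times 4)}{.1007/(6 \times (15-4))} = 40.4812 \sim F(20, 66)$$

The large F of 40.4812 rejects the null hypothesis of poolability (p < .0001). We conclude that
the panel data are not poolable with respect to group.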
The null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The sum of
$e_t'e_t$ is computed from the 15 time-by-time regressions.
. di .044807673 + .023093978 + .016506613 + .012170358 + .014104542 + ///
.000469826 + .063648817 + .085430285 + .049329439 + .077112957 + ///
.029913538 + .087240016 + .143348297 + .066075346 + .037256216
.7505079
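$$F = \frac{(e'e - \sum_t e_t'e_t)/((T-1)K)}{\sum_t e_t'e_t/(T(n-K))} = \frac{(1.3354 - .7505)/((15-1) \times 4)}{.7505/(15 \times (6-4))} = .4175 \sim F(56, 30)$$

The small F statistic does not reject the null hypothesis, so the panel data are poolable with
respect to time (p = .9991).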
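Both poolability statistics follow the same Chow-type formula, sketched below. The sketch assumes K = 4 parameters per regression and takes the SSEs as reported above (1.33544153 pooled, .1007 for the group-by-group sum, and .7505079 for the year-by-year sum). The function name poolability_f is hypothetical, and p-values are left to a statistics library.

```python
def poolability_f(sse_pooled, sse_separate, units, obs_per_unit, K):
    """F test of H0: all units share the same K coefficients.

    sse_separate is the sum of SSEs from the unit-by-unit regressions.
    Returns (F, numerator df, denominator df).
    """
    df1 = (units - 1) * K              # restrictions imposed by pooling
    df2 = units * (obs_per_unit - K)   # residual df of the separate regressions
    f = ((sse_pooled - sse_separate) / df1) / (sse_separate / df2)
    return f, df1, df2


sse_pooled = 1.33544153
f_group = poolability_f(sse_pooled, 0.1007, units=6, obs_per_unit=15, K=4)
f_time = poolability_f(sse_pooled, 0.7505079, units=15, obs_per_unit=6, K=4)
print("across groups: F = %.4f ~ F(%d, %d)" % f_group)
print("over time:     F = %.4f ~ F(%d, %d)" % f_time)
```

With the rounded .1007, the group statistic comes out near 40.46 rather than 40.4812, which presumably was computed from unrounded SSEs.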
9. Conclusion
Panel data models investigate group and time effects using fixed effect and random effect
models. The fixed effect models ask how group and/or time affect the intercept, while the
random effect models analyze error variance structures affected by group and/or time. Slopes
are assumed unchanged in both fixed effect and random effect models.
Fixed effect models are estimated by least squares dummy variable (LSDV) regression, the
within effect model, and the between effect model. LSDV has three approaches to avoid perfect
multicollinearity: LSDV1 drops a dummy, LSDV2 suppresses the intercept, and LSDV3
includes all dummies and imposes restrictions instead. LSDV1 is commonly used since it
produces correct statistics. LSDV2 provides actual parameter estimates of group intercepts, but
reports incorrect R2 and F statistics. Note that the dummy parameters of the three LSDV
approaches have different meanings and thus different t-tests.
The within effect model does not use dummy variables but deviations from the group means.
Thus, this model is useful when there are many groups and/or time periods in the panel data set
(no incidental parameter problem at all). The dummy parameter estimates need to be computed
afterward. Because it does not subtract the degrees of freedom used by the group means, the
within effect model overstates the error degrees of freedom and thus produces an incorrect MSE
and incorrect standard errors of parameters. As a result, you need to adjust the standard errors
to conduct correct t-tests.
Random effect models are estimated by generalized least squares (GLS) and feasible
generalized least squares (FGLS). When the variance structure is known, GLS is used; when it
is unknown, FGLS estimates theta. Parameter estimates may vary depending on the estimation
method.
Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange
multiplier test. The Hausman specification test compares a fixed effect model and a random
effect model. If the null hypothesis that the individual effects are uncorrelated with the
regressors is rejected, the fixed effect model is preferred. Poolability is tested by running
group-by-group or time-by-time regressions.
Among the four statistical packages addressed in this document, I would recommend SAS and
STATA. In particular, the SAS PANEL procedure, although experimental now, provides
various ways of analyzing panel data. STATA is very handy for manipulating panel data, but it
does not fit two-way effect models. LIMDEP is able to estimate various panel data models, but
it is not stable enough. SPSS is not recommended for panel data models.
Data set 2: Cost data for U.S. airlines (1970-1984) presented in Greene (2003).
URL: http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
airline, 1 to 6
year, 1 to 15
References
Baltagi, Badi H. 2001. Econometric Analysis of Panel Data. John Wiley & Sons.
Baltagi, Badi H., and Young-Jae Chang. 1994. "Incomplete Panels: A Comparative Study of
Alternative Estimators for the Unbalanced One-way Error Component Regression
Model." Journal of Econometrics, 62(2): 67-89.
Breusch, T. S., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and its Applications to
Model Specification in Econometrics." Review of Economic Studies, 47(1):239-253.
Fox, John. 1997. Applied Regression Analysis, Linear Models, and Related Methods. Newbury
Park, CA: Sage.
Freund, Rudolf J., and Ramon C. Littell. 2000. SAS System for Regression, 3rd ed. Cary, NC:
SAS Institute.
Fuller, Wayne A. and George E. Battese. 1973. "Transformations for Estimation of Linear
Models with Nested-Error Structure." Journal of the American Statistical
Association, 68(343) (September): 626-632.
Fuller, Wayne A. and George E. Battese. 1974. "Estimation of Linear Models with Crossed-Error Structure." Journal of Econometrics, 2: 67-78.
Greene, William H. 2002. LIMDEP Version 8.0 Econometric Modeling Guide, 4th ed.
Plainview, New York: Econometric Software.
Greene, William H. 2003. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
Hausman, J. A. 1978. "Specification Tests in Econometrics." Econometrica, 46(6):1251-1271.
SAS Institute. 2004. SAS/ETS 9.1 User's Guide. Cary, NC: SAS Institute.
SAS Institute. 2004. SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute.
http://www.sas.com/
STATA Press. 2005. STATA Base Reference Manual, Release 9. College Station, TX: STATA
Press.
STATA Press. 2005. STATA Longitudinal/Panel Data Reference Manual, Release 9. College
Station, TX: STATA Press.
STATA Press. 2005. STATA Time-Series Reference Manual, Release 9. College Station, TX:
STATA Press.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel
Data. Cambridge, MA: MIT Press.
Acknowledgements
I have to thank Dr. Heejoon Kang in the Kelley School of Business and Dr. David H. Good in
the School of Public and Environmental Affairs, Indiana University at Bloomington, for their
insightful lectures. I am also grateful to Jeremy Albright and Kevin Wilhite at the UITS Center
for Statistical and Mathematical Computing for comments and suggestions.
Revision History
2005.11  First draft