LECTURE NOTES FOR ADVANCED MICROECONOMETRICS WITH STATA (V. 15.1)

Kim P. Huynh, David T. Jacho-Chávez, Robert J. Petrunia and Marcel C. Voia

Copyright 2015 Kim P. Huynh, David T. Jacho-Chávez, Robert J. Petrunia and Marcel C. Voia.
Published by Huynh, Jacho-Chávez, Petrunia and Voia
These lecture notes are provided as-is, without warranty of any kind, express or implied, including but
not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. They
are drawn from a plethora of sources and we attempt to cite and acknowledge when possible. In no event
shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort
or otherwise, arising from, out of or in connection with these lecture notes or the use or other dealings in
these lecture notes.
First printing, December 2015
Introduction
Course Objectives: The purpose of this course is to provide students
with the necessary tools to manage and work with large administrative databases using STATA programming tools. The course is
designed for new and intermediate STATA users who want to acquire
advanced skills in data management and programming in STATA. Besides tools for data management, this course exposes participants to
current empirical work along with microeconometric topics and techniques common to the analysis using large administrative datasets. In
addition to the emphasis on the statistical inference of these models,
we will stress their empirical relevance. After taking this course, the
participants should be able to:
1. Perform database management and estimation tasks using STATA.
2. Leverage STATA programming routines and user-contributed .ado
files.
3. Understand empirical research using microeconometrics, and
choose appropriate models and estimators for given economic
applications.
4. Interpret model estimates and diagnose potential problems with
models and know how to remedy them.
Prerequisite: Undergraduate Econometrics and Matrix Algebra
Lecture Notes: These lecture notes are provided to participants as-is
and without guarantees. They are drawn from a plethora of sources
and we attempt to cite and acknowledge when possible. Mainly the
notes are derived from the following sources:
Cameron, A. C. and P. K. Trivedi (2005) Microeconometrics: Methods
and Applications,1st edition, Cambridge University Press.
Wooldridge, J. M. (2010) Econometric Analysis of Cross Section and
Panel Data, 2nd edition, MIT Press.
Stata Data Analysis and Statistical Software (various manuals).
Course Organization
The course will consist of ten 75 minute lectures. Participants are
encouraged to ask questions.
              Monday        Tuesday       Wednesday     Thursday      Friday
              4-January     5-January     6-January     7-January     8-January
-----------------------------------------------------------------------------
09:00-10:30   Lecture 1     Lecture 3     Lecture 5     Lecture 7     Lecture 9
10:30-10:45   Break         Break         Break         Break         Break
10:45-12:15   Lecture 2     Lecture 4     Lecture 6     Lecture 8     Lecture 10
12:15-14:00   Lunch         Lunch         Lunch         Lunch         Lunch
14:00-17:15   Office Hours  Office Hours  Office Hours  Office Hours  Viernes
                                                                      Económico
Stata basics
This chapter covers directories, file types, basic commands and the steps taken prior to loading data.
Directories
Stata stores executable files in different folders on your computer.
Find the executable file locations by typing
. sysdir
STATA:
UPDATES:
BASE:
SITE:
PLUS:
c:\ado\plus\
PERSONAL:
c:\ado\personal\
OLDPLACE:
c:\ado\
Files are accessible from the working directory without the need to specify the directory path. Stata has a default working directory, which can be changed with the cd command.
. cd c:\course
c:\course
Command Syntax
The basic command syntax generally follows the form:

command [command specifics] [qualifiers] [, options]

where

command is the Stata command.

command specifics fill in details necessary for the command. These specifics include variables and mathematical expressions.

qualifiers are if statements imposing any necessary restrictions and using statements specifying files.

options set options specific to a command. Stata has defaults for all available options.
For example, the summarize command provides summary statistics (means, standard deviations, counts) for a list of variables:

summarize [varlist] [if statements] [, options]

Commands may be abbreviated; for example, sum is a valid abbreviation of summarize. Help on a command is obtained with

. help summarize
(output omitted)

or

. man summarize
(output omitted)

The help command opens the help viewer, while the man command presents the help text in the Stata output window. A search option is available if the command name is unknown.
. search ols
(output omitted)
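As a concrete illustration of the syntax above, the following is a minimal sketch using the auto dataset shipped with Stata (the variable choices are ours):

. sysuse auto, clear
. summarize price mpg if foreign==1, detail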
File Types
This section discusses the various file types associated with Stata.
These include:
1. ADO
These are files which contain routines to execute Stata commands.
Stata provides regular updates to the Ado files with the update
command through the web. We advise you contact your system
administrator about this process.
Individual Stata users often create and make available to other
users their own ado files. These unofficial ado files are usable after
being copied to an executable file directory such as
c:\ado\personal\
Alternatively, the adopath command adds a directory to the list of directories Stata searches for ado files:

. adopath + C:\course\user_ado
  [1]  (UPDATES)
  [2]  (BASE)
  [3]  (SITE)
  [4]  "."
  [5]  (PERSONAL)  "c:\ado\personal/"
  [6]  (PLUS)      "c:\ado\plus/"
  [7]  (OLDPLACE)  "c:\ado/"
  [8]  "C:\course\user_ado"

Ado files stored in C:\course\user_ado are now usable by Stata. Users also create help files for their ado files. These help files should also be added to the folder so that they can be accessed in Stata using the help command. A good source for
these user created ado procedure files is
http://ideas.repec.org/s/boc/bocode.html
2. DO

These are user-created program files. A user may wish to execute a series of Stata commands. These commands may be submitted individually; however, a do file allows a batch submission of these commands. The command

do "filedirectory\filename.do"

executes the commands contained in the do file.
3. DTA

Stata stores data in dta files. The use command accesses a dta file:
. use "C:\course\auto.dta", clear
(1978 Automobile Data)
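For instance, a dataset can be saved, and then saved again with the replace option (a minimal sketch; the path and file name are placeholders):

. save "C:\course\auto_copy.dta"
. save "C:\course\auto_copy.dta", replace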
Notice the second time the data is saved that the replace option
is used. The replace option overwrites the file. The browse allows
the user to look at the data in a spreadsheet form, while the edit
command allows the user to look at and manually change the
data.
4. LOG

Files containing output results are called log files. A log file must be started or opened prior to saving results:

log using "filedirectory/filename.log", options

The default extension for Stata log files is smcl, but the above command changes the file extension to log. The using clause is necessary to specify the log file location and name. The option replace overwrites the file, while the option append adds new results to the original log file.
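A typical session might look like this (a sketch; the file name is a placeholder):

. log using "C:\course\session1.log", replace
. sysuse auto, clear
. summarize price mpg
. log close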
Data Handling
This section provides details on managing data sets.
1. Data Preliminaries
In Stata, the whole data set is held in memory. Therefore, some preliminaries are necessary with large data sets to ensure proper loading into Stata. The command set memory value sets the memory available to Stata, while the command set maxvar value sets the maximum number of variables. The memory allocation and maximum number of variables must be large enough to load the relevant data set. These settings cannot be changed while Stata has data in active memory. Restrictions on the maximum possible memory and number of variables vary across versions of Stata (MP versus SE versus IC). Increasing the memory allows larger data sets to be loaded and maintained in Stata, at the cost of reducing the amount of memory available to other software applications.

The last relevant size command is set matsize value. The matsize refers to the matrix size, or the maximum number of variables that can be included in any of Stata's estimation commands. The user is able to change matsize at any time, both before and after loading data. The maximum matsize possible varies across versions of Stata. The following example shows an OLS regression that fails because matsize is too small:

. set matsize 100
. regress y x1-x400
matsize too small
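Raising the limit to at least the number of regressors plus one resolves the error; a sketch:

. set matsize 800
. regress y x1-x400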
2. Loading data

The use command, discussed previously, loads dta files into Stata's active memory. An alternative method to load a Stata dta file is to use File > Open in the drop-down menu.
3. Describing data

The describe command reports, for each variable in memory, its name, storage type, display format, value label (if any) and variable label:

. describe

Contains data from C:\course\auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
make            str18   %-18s
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g
displacement    int     %8.0g
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-----------------------------------------------------------------------------
Sorted by: foreign
A subset of variables can be described by listing them explicitly:

. describe make price gear_ratio

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
make            str18   %-18s
price           int     %8.0gc                Price
gear_ratio      float   %6.2f                 Gear Ratio
-----------------------------------------------------------------------------
4. Labelling
Stata allows for two types of labelling associated with variables.
The first type or variable label places a label on each variable to
better describe a variable. This label is the variable label given
in the describe command. The following sequence changes a
variables label and changes the label back to its original.
. describe price

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
price           int     %8.0gc                Price

. label variable price "price in dollars"
. describe price

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
price           int     %8.0gc                price in dollars

. label variable price "Price"
. describe price

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
price           int     %8.0gc                Price
The second type of labelling attaches value labels to the values of a numeric variable. In the auto data the value label origin is attached to the variable foreign, as the describe output shows:

. describe

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
make            str18   %-18s
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g
displacement    int     %8.0g
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-----------------------------------------------------------------------------
Sorted by: foreign
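A value label is created and attached with two commands of roughly this form (a sketch):

. label define origin 0 "domestic" 1 "foreign"
. label values foreign origin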
The first command creates the label, while the second command
attaches the label to a particular variable. Finally, three command
lines are combined to change a value label
. label drop origin
. label define origin 0 "Domestic" 1 "Foreign"
. label values foreign origin
5. Missing values
Stata denotes missing values as ., .a, .b, ..., .z; thus, there are 27 missing-value codes. In the above data, the variable rep78 has some missing values. A warning is necessary when using the if qualifier in a command statement: Stata treats missing values as extremely large values, with . < .a < .b < ... < .z. Thus, any >= or > qualifier also includes the missing observations. To illustrate, the count command provides the number of observations satisfying a qualifier condition. There are two cases. The first case uses the > qualifier:
. count if rep78>3
34
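The second case excludes the missing codes explicitly (a sketch; in the auto data this counts only the nonmissing values greater than 3):

. count if rep78>3 & rep78<.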
6. Sorting
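The sort command orders the observations; the example referred to below takes this form (a sketch with placeholder variable names):

. sort var1 var2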
In the above, the sort command sorts on the variable var1 first and
then on variable var2 within var1.
7. Stata and Database Management
Stata allows the accessing, loading, writing or viewing of data stored in database management systems via the odbc command and its various functions. ODBC stands for Open DataBase Connectivity. Examples of database management system files include a dBASE file, an Excel file, or an Access file. Situations for using the odbc command include:

To read Office files on a machine without Office installed

To read and manipulate an Excel file with many tabs

To read and manipulate an Access database file with many tables

Database management

To export a data file to a database

Stata requires a data-source set-up step. You must register the ODBC database file as a Data Source prior to using the odbc commands to read an Excel or Access file into Stata. The process varies depending on the platform, but our example shows how to do it for Windows 7.
(a) In Start Menu select Control Panel
(b) In the Control Panel select System and Security and then Administrative Tools which will bring up the ODBC Data Source
Administrator, see Figure 1.
(c) Within Administrative Tools click on Data Sources (ODBC), see
Figure 2
The odbc list command lists the data sources accessible by Stata.
. odbc list

Data Source Name                    Driver
-----------------------------------------------------------------------------
dBASE Files
Excel Files
MS Access Database
testdb
testxl                              Microsoft Excel Driver (*.xls, *.xlsx, *.xl
-----------------------------------------------------------------------------
The odbc query "DataSourceName" command reports the sheet-name information for an Excel data source file and the table-name information for an Access data source file.

. odbc query "testxl"
DataSource: testxl
Path      : C:\Users\robert\Documents\work related\Statcan\CMFE\stc2014\intro\cdn4.xlsx
-----------------------------------------------------------------------------
full$
partial$
Sheet3$
-----------------------------------------------------------------------------

. odbc query "testdb"
DataSource: testdb
Path      : C:\Users\robert\Documents\work related\Statcan\CMFE\stc2014\intro\cdn4.accdb
-----------------------------------------------------------------------------
cdn4
-----------------------------------------------------------------------------
The odbc load command now loads the data source into Stata for
use.
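For example, the Access table listed above could be loaded with a command of this form (a sketch):

. odbc load, table("cdn4") dsn("testdb") clear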
Notes:
(a) The command odbc desc describes output for a specified table.
Loading the table into Stata and using the describe command
provides similar output.
(b) Stata allows: (i) point and click to execute the odbc query following the odbc list command; (ii) point and click to execute the odbc describe following the odbc query command; and (iii) point and click to execute the odbc load following the odbc query command.

(c) The odbc load command allows a choice of variables to load and accepts if qualifiers.

(d) The exec(sqlstmt) option for the odbc load command allows the user to submit an SQL SELECT statement.

(e) The odbc insert command exports data to an existing ODBC data source. This command allows data to be added to an existing table, data in an existing table/sheet to be modified, or a new table to be created. The option create creates a new table/sheet, overwrite overwrites the existing table/sheet, while specifying no option augments the table/sheet by adding data. For the overwrite or no-option case, the specified variables must be a subset of the variables in the table being modified.
8. Data manipulation commands

Stata allows variables and observations to be dropped using the keep and drop commands. For variables,

keep varlist

keeps only the variables in varlist, while drop varlist drops them. The rename command changes the name of a variable, but the renamed variable retains its original description.
Data Manipulation
In this section, we discuss the creation of new variables, rewriting
variable values, dummy variables, string variables and dealing with
longitudinal variables.
1. The generate and replace commands
The generate command creates a new variable, while the replace
command causes a replacement of a variables values. Consider
the following command sequence
. generate x1=.
(74 missing values generated)
. replace x1=5
(74 real changes made)
. generate y = 11
. replace y = 3
(74 real changes made)
2. Dummy variables
Dummy or indicator variables are useful when dealing with conditional states or categorical information. There are a number of ways to create indicator variables.
(a) The generate and replace command
Use these two commands and if conditional statements to
generate the indicator variable.
. gen d=1 if rep78>3 & rep78~=.
(45 missing values generated)
. replace d=0 if rep78<=3
(40 real changes made)
(b) The xi command

The xi command creates indicator variables from a categorical variable, and from its interactions with other variables:

. xi i.foreign
i.foreign          _Iforeign_0-1

. xi i.foreign*i.make
i.foreign          _Iforeign_0-1
i.make             _Imake_1-74
i.for~n*i.make     _IforXmak_#_#     (coded as above)

. xi i.foreign*mpg
i.foreign          _Iforeign_0-1
i.foreign*mpg      _IforXmpg_#
   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00
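The make1 variable discussed next is produced by the encode command, roughly as follows (a sketch):

. encode make, gen(make1)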
In the above, the variable make is the make of car, while make1 is a set of numerical values representing each make of car. The data look as follows. In the first picture, the original make variable appears; the variable is red in text colour, indicating that it is a string variable. The second picture shows the make1 variable, which appears to contain the make names of the cars. However, the blue text colour indicates that the make1 variable has an underlying numerical value. The decode command creates a string variable from a numeric variable.
4. Longitudinal formats
Longitudinal data contain multiple observations for a series of individuals, firms or countries, among other things. Stata has two ways to handle such data: wide format and long format. To illustrate, we consider data across multiple years for a set of firms. In the wide format, the data have one firm identifier variable. The other variables may have one value, or a value for each year of observation. The variable name for such multi-year variables is varname followed by the year (e.g. profit1998). In the long format, each firm has an identification variable and a variable indicating the year of observation.

Figure 8 illustrates these two formats. For the second panel, we have a variable, profit, that varies by year and a variable, naics, that does not. Thus, we have a yearly profit variable in the wide format and only one naics variable.
The reshape command converts data from wide to long format
and vice versa. To convert from wide to long, we have
. reshape long profit, i(id) j(year)
(note: j = 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                            ->
Number of variables                  16   ->   4
j variable (14 values)                    ->   year
xij variables:
       profit1998 profit1999 ... profit2011   ->   profit
-----------------------------------------------------------------------------
To convert back from long to wide:

. reshape wide profit, i(id) j(year)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                            ->
Number of variables                   4   ->   16
j variable (14 values)             year   ->   (dropped)
xij variables:
                                 profit   ->   profit1998 profit1999 ... profit2011
-----------------------------------------------------------------------------
Figure 8: Reshape
Single quotes around a macro name are used to access the information stored in a local. After defining the local var, we can use `var' in any Stata command requiring a variable list:

. local var "make price foreign"
. describe `var'
              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
make            str18   %-18s
price           int     %8.0gc                Price
foreign         byte    %8.0g      origin     Car type

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
foreign         byte    %8.0g      origin     Car type
(b) Globals

Global macros are accessible globally, across multiple Stata do files or throughout a Stata session. Defining a global is similar to defining a local; however, a $ followed by the global macro name calls the macro:

. global var1 "make foreign"
. describe $var1

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
make            str18   %-18s
foreign         byte    %8.0g      origin     Car type
Stata's macro notation implies that global and local macros can have the same name; further, a variable and a macro can have the same name. The user must therefore keep careful track of macro definitions.
2. Scalars
Scalar stores a number or string.
. scalar c=5
. scalar b = "2 plus 3 is"
. display b
2 plus 3 is
. display c
5
3. Loops

The while loop executes a series of commands repeatedly while a condition holds:

local i 1
while `i' <= 5 {
    command
    command
    ...
    local i = `i' + 1
}

The first line sets the initial value of the macro i. The second line provides the condition under which the series of commands is executed; it must end with {, which opens the loop and allows the series of commands to be fed in. The next lines provide the commands to be executed. The line local i = `i' + 1 updates the value of i. The last line closes the while loop with }. A while loop must be opened and closed to work, so every { must have a corresponding }. For example, the while loop

. local i 1
. while `i' <= 3 {
  2.   display `i'
  3.   local i = `i' + 1
  4. }
1
2
3
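The same loop can be written more compactly with forvalues (a sketch):

forvalues i = 1/3 {
    display `i'
}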
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         52    19.82692    4.743297         12         34

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         22    24.77273    6.611187         14         41

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |         74    .2972973    .4601885          0          1

and

. foreach t in 1 2 {
  2.   sum mpg if foreign==`t'
  3. }

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         22    24.77273    6.611187         14         41

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |          0
4. Matrices
Matrices store numerical values. Stata provides two ways to handle matrices: (i) the built-in matrix language in Stata; and (ii) Mata, a self-contained matrix programming language.

(a) Built-in matrix language in Stata

Stata's matrix language provides a method to complete matrix calculations necessary for estimators, or to store results. Let us consider the following:
. matrix c=(1,2\3,4\5,6)
. matrix list c

c[3,2]
    c1  c2
r1   1   2
r2   3   4
r3   5   6

. display c
type mismatch
r(109);

. scalar b=c[2,2]
. display b
4
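Matrices can be combined with the usual operators; for instance, the 2 x 2 cross-product of c can be formed as follows (a sketch):

. matrix d = c'*c
. matrix list d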
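(b) Mata. The Mata OLS computation shown next presumes that the regressor matrix has already been read into Mata along the following lines (a sketch; a column of ones is appended as the last column of X):

. mata
: X = st_data(., ("mpg", "trunk"))
: X = X, J(rows(X), 1, 1)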
: y = st_data(., ("price"))
: beta_hat = invsym(X'X)*(X'y)
: e_hat = y - X*beta_hat
: s2 = (1/(rows(X)-cols(X)))*(e_hat'e_hat)
: V_ols = s2*invsym(X'X)
: se_ols = sqrt(diagonal(V_ols))
: beta_hat
                  1
    +----------------+
  1 |  -220.1648801  |
  2 |   43.55851009  |
  3 |   10254.94983  |
    +----------------+

: se_ols
                 1
    +---------------+
  1 |  65.59262431  |
  2 |  88.71884015  |
  3 |   2349.08381  |
    +---------------+

: end
-----------------------------------------------------------------------------
. reg price mpg trunk

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   10.14
       Model |   141126459     2  70563229.4           Prob > F      =  0.0001
    Residual |   493938937    71  6956886.44           R-squared     =  0.2222
-------------+------------------------------           Adj R-squared =  0.2003
       Total |   635065396    73  8699525.97           Root MSE      =  2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------
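The Mata function st_data() reads Stata variables into a Mata matrix; for instance, a line of this form (a sketch)

: x = st_data(., ("mpg", "trunk"))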
creates matrix x where the ith row is the ith observation of mpg
and trunk variables.
Stata commands are accessible when in Mata using the stata()
command. For example
: stata("summarize price")
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906

: end
-----------------------------------------------------------------------------

. sum price

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906

. return list
scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  6165.256756756757
                r(Var) =  8699525.974268789
                 r(sd) =  2949.495884768919
                r(min) =  3291
                r(max) =  15906
                r(sum) =  456229
. sum mpg, d

                        Mileage (mpg)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           12             12
 5%           14             12
10%           14             14       Obs                  74
25%           18             14       Sum of Wgt.          74

50%           20                      Mean            21.2973
                        Largest       Std. Dev.      5.785503
75%           25             34
90%           29             35       Variance       33.47205
95%           34             35       Skewness       .9487176
99%           41             41       Kurtosis       3.975005
. return list

scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  21.2972972972973
                r(Var) =  33.47204738985561
                 r(sd) =  5.785503209735141
           r(skewness) =  .9487175964588155
           r(kurtosis) =  3.97500459645325
                r(sum) =  1576
                r(min) =  12
                r(max) =  41
                 r(p1) =  12
                 r(p5) =  14
                r(p10) =  14
                r(p25) =  18
                r(p50) =  20
                r(p75) =  25
                r(p90) =  29
                r(p95) =  34
                r(p99) =  41

. dis r(p10)
14
Estimation commands store their results in e(); for example:

. regress profit revenue

      Source |       SS       df       MS              Number of obs =   14000
-------------+------------------------------           F(  1, 13998) = 5079.93
       Model |  7.8589e+14     1  7.8589e+14           Prob > F      =  0.0000
    Residual |  2.1656e+15 13998  1.5471e+11           R-squared     =  0.2663
-------------+------------------------------           Adj R-squared =  0.2662
       Total |  2.9515e+15 13999  2.1083e+11           Root MSE      =  3.9e+05

------------------------------------------------------------------------------
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0657316   .0009222    71.27   0.000     .0639239    .0675393
       _cons |  -6042.726   3466.723    -1.74   0.081    -12837.96    752.5133
------------------------------------------------------------------------------

. ereturn list

scalars:
                  e(N) =  14000
               e(df_m) =  1
               e(df_r) =  13998
                  e(F) =  5079.92696289744
                 e(r2) =  .2662724819513582
               e(rmse) =  393325.8937651275
                e(mss) =  785891415003325.8
                e(rss) =  2165564211368496
               e(r2_a) =  .266220065354841
                 e(ll) =  -200217.6525015677
               e(ll_0) =  -202384.9753383144
               e(rank) =  2

macros:
            e(cmdline) : "regress profit revenue"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "profit"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2

functions:
            e(sample)

. matrix list e(b)

e(b)[1,2]
       revenue       _cons
y1   .06573159  -6042.7258

. matrix list e(V)

symmetric e(V)[2,2]
             revenue       _cons
revenue    8.505e-07
  _cons   -.90726892    12018166
The above example shows that Stata temporarily stores the estimated coefficients and their variance-covariance matrix as matrices.
2. Using and storing results in Stata
The temporary r-class or e-class results can be stored for later use, for example in scalars or matrices, so that they survive subsequent commands that would otherwise overwrite them.
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         52    6072.423    3097.104       3291      15906

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         22    6384.682    2621.915       3748      12990

. dis rm0
6072.4231

            c1          c2
r1   6072.4231   6384.6818
r2   3097.1043   2621.9151
The egen command does not replace the active data with new data,
but creates a new variable. The new variable is a summary statistic
for a specified variable. For all three commands, the by option
allows statistics to be generated across groups.
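For example (a sketch):

. egen mean_price = mean(price), by(foreign)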
Introduction

The validity of the OLS estimator and any inference based on this estimator rely on the following assumptions. These assumptions provide the data-generating process for an $n \times 1$ vector of random variables $y$, and an $n \times K$ matrix of characteristics $X$:

Assumption A:

(A1) The true model is $y = X\beta + \varepsilon$.
(A2) $X$ is a nonstochastic and finite $n \times K$ matrix such that $n \geq K$.
(A3) $X^{\top}X$ is nonsingular.
(A4) $E(\varepsilon) = 0$.
(A5) $\operatorname{var}(\varepsilon) = E[\varepsilon\varepsilon^{\top}] = \sigma^2 I_n$, $\sigma^2 < \infty$.
(A6) $\varepsilon \sim N(0, \operatorname{var}(\varepsilon))$.

We now discuss the implications of violations of each of these assumptions and methods to handle these violations within Stata.

Non-Spherical Disturbances

Non-spherical disturbances refers to violation of the classical assumption A5. In particular, let us replace A5 by:

(A5$'$) $E[\varepsilon_i \varepsilon_j] = \sigma_{ij}$, $\sigma_{ii} \equiv \sigma_i^2 > 0$, and $\max_{1 \leq i,j \leq n} |\sigma_{ij}| < +\infty$.
In matrix form,
$$
\Omega \equiv E[\varepsilon\varepsilon^{\top}] =
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{pmatrix}. \tag{1}
$$

Example 1 (Heteroskedasticity) If the disturbances satisfy $E[\varepsilon_t] = 0$, $E[\varepsilon_t^2] = \sigma_t^2 < \infty$, and $E[\varepsilon_t \varepsilon_{\tau}] = 0$ for $t \neq \tau$, then $\Omega$ is diagonal:
$$
\Omega =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix}.
$$

Example 2 (Autocorrelation) If instead $E[\varepsilon_t] = \mu < \infty$ for all $t$ and $E[(\varepsilon_t - \mu)(\varepsilon_{t-s} - \mu)] = \gamma(s)$ for all $t$, then
$$
\Omega =
\begin{pmatrix}
\gamma(0) & \gamma(1) & \cdots & \gamma(n-1) \\
\gamma(1) & \gamma(0) & \cdots & \gamma(n-2) \\
\vdots & \vdots & \ddots & \vdots \\
\gamma(n-1) & \gamma(n-2) & \cdots & \gamma(0)
\end{pmatrix}.
$$
In the special case of stationary AR(1)-type correlation, $\gamma(s) = \sigma^2 \rho^{s}$ and
$$
\Omega =
\begin{pmatrix}
1 & \rho & \cdots & \rho^{n-1} \\
\rho & 1 & \cdots & \rho^{n-2} \\
\vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \cdots & 1
\end{pmatrix}
\sigma^2 . \tag{2}
$$
Example 3 (Panel Data with Random Effects) Let $\{\{y_{it}\}_{t=1}^{T}, \{x_{it}^{\top}\}_{t=1}^{T}\}_{i=1}^{n}$ represent panel data. Another potential model for the response $y_{it}$ is a Random Effects model:
$$
\begin{aligned}
y_{it} &= x_{it}^{\top}\beta + u_{it}, \\
u_{it} &= \alpha_i + \varepsilon_{it}, \qquad i = 1, \dots, n, \quad t = 1, \dots, T,
\end{aligned}
$$
where $\alpha_i$ and $\varepsilon_{it}$ are mutually uncorrelated with variances $\sigma_{\alpha}^2$ and $\sigma_{\varepsilon}^2$. Stacking the observations by individual, $\Omega$ is block diagonal: each $T \times T$ diagonal block equals $\sigma_{\varepsilon}^2 I_T + \sigma_{\alpha}^2 \iota_T \iota_T^{\top}$, with $\sigma_{\alpha}^2 + \sigma_{\varepsilon}^2$ on the diagonal and $\sigma_{\alpha}^2$ off the diagonal, while all blocks corresponding to different individuals are zero. Compactly,
$$
\Omega = \sigma_{\varepsilon}^2 I_{nT} + \sigma_{\alpha}^2 \left[ I_n \otimes \iota_T \iota_T^{\top} \right]
       = \sigma_{\varepsilon}^2 I_{nT} + T \sigma_{\alpha}^2 V, \qquad \text{where } V = T^{-1}\left[ I_n \otimes \iota_T \iota_T^{\top} \right].
$$
Estimation
In this section we discuss estimation strategies in the presence of non-spherical disturbances. The discussion is somewhat informal; we aim for understanding rather than precision. In what follows, we maintain assumptions A1-A4, A5$'$, and A6.
1 Maximum Likelihood Estimation (MLE)

Assumptions A1-A4, A5$'$, and A6 imply that the joint density of $\varepsilon$ is multivariate normal5, i.e.
$$
f(\varepsilon) = (2\pi)^{-n/2} \left| \sigma^2 \Omega \right|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, \varepsilon^{\top} (\sigma^2 \Omega)^{-1} \varepsilon \right\}.
$$
After noticing that $|\sigma^2 \Omega| = \sigma^{2n} |\Omega|$, the log-likelihood of the observations $\{y_i, x_i^{\top}\}_{i=1}^{n}$ is
$$
\begin{aligned}
l(\beta, \sigma^2 \mid y, X) &= \log f(y - X\beta \mid \beta, \sigma^2) \\
&= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log \sigma^2 - \frac{1}{2}\log|\Omega| - \frac{1}{2\sigma^2}(y - X\beta)^{\top}\Omega^{-1}(y - X\beta).
\end{aligned}
$$
The score is
$$
\nabla l(\beta, \sigma^2 \mid y, X) =
\begin{bmatrix}
\dfrac{1}{\sigma^2}\left( X^{\top}\Omega^{-1}y - X^{\top}\Omega^{-1}X\beta \right) \\[1ex]
-\dfrac{n}{2\sigma^2} + \dfrac{1}{2(\sigma^2)^2}\,(y - X\beta)^{\top}\Omega^{-1}(y - X\beta)
\end{bmatrix}.
$$
If the second-order conditions are satisfied, solving $\nabla l(\widehat{\beta}_{\mathrm{MLE}}, \widehat{\sigma}^2_{\mathrm{MLE}} \mid y, X) = 0$ gives
$$
\widehat{\beta}_{\mathrm{MLE}} = (X^{\top}\Omega^{-1}X)^{-1} X^{\top}\Omega^{-1} y, \tag{3}
$$
$$
\widehat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{n}\,(y - X\widehat{\beta}_{\mathrm{MLE}})^{\top}\Omega^{-1}(y - X\widehat{\beta}_{\mathrm{MLE}}). \tag{4}
$$

2 Generalized Least Squares (GLS)

Since $\Omega$ is symmetric and positive definite, there exists a nonsingular matrix $P$ such that
$$
\Omega^{-1} = P^{\top}P, \qquad \Omega = (P^{\top}P)^{-1} = P^{-1}(P^{\top})^{-1}. \tag{5}
$$
Premultiplying the model by $P$ yields transformed variables $\widetilde{y} = Py$ and $\widetilde{X} = PX$ with spherical errors, so the GLS estimator is OLS on the transformed data,
$$
\widehat{\beta}_{\mathrm{GLS}} = (\widetilde{X}^{\top}\widetilde{X})^{-1}\widetilde{X}^{\top}\widetilde{y} = (X^{\top}\Omega^{-1}X)^{-1} X^{\top}\Omega^{-1} y, \tag{6}
$$
with variance
$$
\operatorname{var}(\widehat{\beta}_{\mathrm{GLS}}) = \sigma^2 \left( (PX)^{\top}PX \right)^{-1} = \sigma^2 ( X^{\top}P^{\top}PX )^{-1} = \sigma^2 ( X^{\top}\Omega^{-1}X )^{-1}. \tag{7}
$$
Testing the linear hypothesis
$$
H_0: R\beta = c
$$
results in the test statistic7
$$
\frac{1}{s^2_{\mathrm{GLS}}\, J}\,( R\widehat{\beta}_{\mathrm{GLS}} - c )^{\top} \left[ R ( \widetilde{X}^{\top}\widetilde{X} )^{-1} R^{\top} \right]^{-1} ( R\widehat{\beta}_{\mathrm{GLS}} - c ),
$$
where $J$ is the number of restrictions. In the pure heteroskedasticity case, GLS reduces to weighted least squares,
$$
\widehat{\beta}_{\mathrm{GLS}} = \left( \sum_{i=1}^{n} \sigma_i^{-2}\, x_i x_i^{\top} \right)^{-1} \sum_{i=1}^{n} \sigma_i^{-2}\, x_i y_i .
$$
For AR(1) disturbances, $\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$ with $|\rho| < 1$,
$$
\Omega^{-1} \propto
\begin{pmatrix}
1 & -\rho & 0 & \cdots & 0 \\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & -\rho & 1+\rho^2 & -\rho \\
0 & \cdots & 0 & -\rho & 1
\end{pmatrix} \tag{8}
$$
and
$$
P =
\begin{pmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\
-\rho & 1 & 0 & \cdots & 0 \\
0 & -\rho & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & -\rho & 1
\end{pmatrix}. \tag{9}
$$
Therefore,
$$
\widetilde{y} = Py =
\begin{pmatrix}
\sqrt{1-\rho^2}\; y_1 \\
y_2 - \rho y_1 \\
\vdots \\
y_T - \rho y_{T-1}
\end{pmatrix},
\qquad
\widetilde{X} = PX =
\begin{pmatrix}
(1-\rho^2)^{1/2}\, x_1^{\top} \\
x_2^{\top} - \rho x_1^{\top} \\
\vdots \\
x_T^{\top} - \rho x_{T-1}^{\top}
\end{pmatrix}.
$$
A similar transformation applies to the random-effects model: stacking the quasi-demeaned observations $(y_{11} - (1-\theta)\bar{y}_1, \ldots, y_{1T} - (1-\theta)\bar{y}_1, \ldots, y_{n1} - (1-\theta)\bar{y}_n, \ldots, y_{nT} - (1-\theta)\bar{y}_n)^{\top}$ gives a transformed system of the form $\widetilde{y} = W\delta + V$. In other words, the random-effects model becomes
$$
y_{it} - (1-\theta)\bar{y}_i = \left[ x_{it}^{\top} - (1-\theta)\bar{x}_i^{\top} \right]\beta + \left\{ \varepsilon_{it} - (1-\theta)\bar{\varepsilon}_i \right\}. \tag{10}
$$
Feasible GLS

All the above results require knowledge of $\Omega$ to be fully operational. In reality, we are hardly blessed with such information, and therefore GLS cannot be computed. Instead, we replace $\Omega$ by $\widehat{\Omega}$ (a consistent estimator9 of $\Omega$). Thus an estimator of $\beta$ is:
$$
\widehat{\beta}_{\mathrm{FGLS}} = ( X^{\top} \widehat{\Omega}^{-1} X )^{-1} X^{\top} \widehat{\Omega}^{-1} y. \tag{11}
$$

Claim 1 Let assumptions A1, A2*, A3, A4*, A5$'$ hold. Then a sufficient condition for $\widehat{\beta}_{\mathrm{FGLS}}$ and $\widehat{\beta}_{\mathrm{GLS}}$ to have the same asymptotic distribution is
$$
n^{-1} X^{\top} ( \widehat{\Omega}^{-1} - \Omega^{-1} ) X \xrightarrow{p} 0,
\quad \text{and} \quad
n^{-1/2} X^{\top} ( \widehat{\Omega}^{-1} - \Omega^{-1} ) \varepsilon \xrightarrow{p} 0.
$$
For the random-effects model, write
$$
y_{it} = x_{it}^{\top}\beta + \alpha_i + \varepsilon_{it}, \tag{12}
$$
$$
\bar{y}_i = \bar{x}_i^{\top}\beta + \alpha_i + \bar{\varepsilon}_i, \tag{13}
$$
$$
y_{it} - \bar{y}_i = ( x_{it} - \bar{x}_i )^{\top}\beta + ( \varepsilon_{it} - \bar{\varepsilon}_i ), \tag{14}
$$
and it should be clear that the fitted residuals from equation (14) are informative about $\{\{\varepsilon_{it}\}_{t=1}^{T}\}_{i=1}^{n}$ only, while the fitted residuals from equation (13) are informative about both $\{\alpha_i\}_{i=1}^{n}$ and $\{\{\varepsilon_{it}\}_{t=1}^{T}\}_{i=1}^{n}$. Therefore, again, the following algorithm can be shown to provide a consistent estimator:

Algorithm 3 (Feasible Estimation: Random Effects)

(a) Regress $y_{it}$ on $x_{it}^{\top}$ using the fixed effects estimator, i.e. regress $y_{it} - \bar{y}_i$ on $x_{it} - \bar{x}_i$ in (14) by least squares. Then obtain the fitted residuals $\widehat{e}_{it}$ and construct
$$
\widehat{\sigma}_{\varepsilon}^2 = ( nT - n - K )^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} \widehat{e}_{it}^{\,2} .
$$
(b) Regress $y_{it}$ on $x_{it}^{\top}$ using the between estimator, i.e. regress $\bar{y}_i$ on $\bar{x}_i^{\top}$ in (13) by least squares. Then obtain the fitted residuals and construct10
$$
\widehat{\sigma}_{\alpha}^2 = ( n - K - 1 )^{-1} \sum_{i=1}^{n} \left[ \bar{y}_i - \bar{x}_i^{\top}\widehat{\beta}_{B} \right]^2 - T^{-1}\widehat{\sigma}_{\varepsilon}^2 .
$$
(c) Construct
$$
\widehat{\theta} = \sqrt{ \frac{\widehat{\sigma}_{\varepsilon}^2}{\widehat{\sigma}_{\varepsilon}^2 + T\widehat{\sigma}_{\alpha}^2} } .
$$

10 Notice that $\widehat{\sigma}_{\alpha}^2$ can be negative, in which case it is common practice to set $\widehat{\sigma}_{\alpha}^2 = 0$.
Inefficient:

It can be shown that
$$
\operatorname{var}(\widehat{\beta}) = \sigma^2 ( X^{\top}X )^{-1} X^{\top} \Omega X ( X^{\top}X )^{-1} , \tag{15}
$$
which differs from the classical formula $\sigma^2 (X^{\top}X)^{-1}$; provided
$$
Q_n \equiv n^{-1} X^{\top} \Omega X \to Q > 0,
$$
it can be shown11 that OLS is less efficient than GLS.

Corrections to OLS:

Since OLS is still consistent and unbiased, we could still use it in a given empirical study, provided we are willing to accept the loss of efficiency. However, $\widehat{\operatorname{var}}(\widehat{\beta}) = s^2 ( X^{\top}X )^{-1}$ is no longer a consistent estimator of $\operatorname{var}(\widehat{\beta})$. The following procedures can be implemented:

(a) Heteroskedasticity (Eicker-White): Under quite general conditions, White [1980] showed that
$$
n^{-1} \sum_{i=1}^{n} \widehat{\varepsilon}_i^{\,2}\, x_i x_i^{\top} - n^{-1} \sum_{i=1}^{n} \sigma_i^{2}\, x_i x_i^{\top} \xrightarrow{p} 0,
$$
where $\{\widehat{\varepsilon}_i\}_{i=1}^{n}$ are the OLS fitted residuals. Therefore, a consistent estimator of (15) is
$$
\widehat{\operatorname{var}}(\widehat{\beta}) = ( X^{\top}X )^{-1} X^{\top} \operatorname{diag}( \widehat{\varepsilon}_1^{\,2}, \ldots, \widehat{\varepsilon}_n^{\,2} ) X ( X^{\top}X )^{-1} .
$$
Example 4 Suppose we have found evidence of heteroskedasticity and would like to correct the OLS standard errors. There are two ways to correct the OLS standard errors in Stata: (i) the regress command plus the robust option; or (ii) the regress command plus the vce(robust) option. The following shows both options:
. reg profit revenue emp

      Source |       SS       df       MS              Number of obs =   13164
-------------+------------------------------           F(  2, 13161) = 2781.37
       Model |  8.7206e+14     2  4.3603e+14           Prob > F      =  0.0000
    Residual |  2.0632e+15 13161  1.5677e+11           R-squared     =  0.2971
-------------+------------------------------           Adj R-squared =  0.2970
       Total |  2.9353e+15 13163  2.2299e+11           Root MSE      =  4.0e+05

------------------------------------------------------------------------------
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0013408    66.78   0.000     .0869093    .0921658
         emp |  -10.55602    .424665   -24.86   0.000    -11.38842    -9.72361
       _cons |   1516.313   3619.056     0.42   0.675    -5577.559    8610.184
------------------------------------------------------------------------------
. reg profit revenue emp, robust

Linear regression                                      Number of obs =   13164
                                                       F(  2, 13161) =  158.28
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      =  4.0e+05

------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0050334    17.79   0.000     .0796714    .0994037
         emp |  -10.55602   1.192198    -8.85   0.000    -12.89289   -8.219136
       _cons |   1516.313   2450.219     0.62   0.536    -3286.469    6319.094
------------------------------------------------------------------------------

. reg profit revenue emp, vce(robust)

Linear regression                                      Number of obs =   13164
                                                       F(  2, 13161) =  158.28
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      =  4.0e+05

------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0050334    17.79   0.000     .0796714    .0994037
         emp |  -10.55602   1.192198    -8.85   0.000    -12.89289   -8.219136
       _cons |   1516.313   2450.219     0.62   0.536    -3286.469    6319.094
------------------------------------------------------------------------------
(b) Autocorrelation (HAC / Newey-West): With serially correlated errors, a heteroskedasticity- and autocorrelation-consistent estimator replaces the middle matrix by12
$$
S_T = \sum_{t=1}^{T} \widehat{\varepsilon}_t^{\,2}\, x_t x_t^{\top}
    + \sum_{s=1}^{p} \sum_{t=s+1}^{T} w_p(s)\, \widehat{\varepsilon}_t \widehat{\varepsilon}_{t-s} \left[ x_t x_{t-s}^{\top} + x_{t-s} x_t^{\top} \right],
$$
where $w_p(s)$ are weights, e.g. Bartlett weights $w_p(s) = 1 - s/(p+1)$. The implementation of this estimator requires us to select13 $p$.
(c) Random Effects: To obtain the correct standard errors for the
OLS estimates, the cluster(varname) or vce(cluster varname)
options for the regress command can be used. Both options
do exactly the same correction to the OLS standard errors. The
cluster option adjusts standard errors to allow for within
group correlation. The variable given by varname provides the
categorical or group variable by which to cluster observations.
For the random effects model, the within group correlation
is at the individual or firm level, so the cluster option uses
a firm or individual identifier variable. However, alternative
random effect correlation structures are possible. For example,
firms within an industry may experience some correlation. In
this situation, the cluster option uses an industry identifier
variable.
. reg profit revenue emp, cluster(id)

Linear regression                                      Number of obs =   13164
                                                       F(  2,  2857) =   41.97
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      =  4.0e+05

------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0098886     9.05   0.000      .070148    .1089272
         emp |  -10.55602    1.96084    -5.38   0.000    -14.40082    -6.71121
       _cons |   1516.313   4771.025     0.32   0.751    -7838.687    10871.31
------------------------------------------------------------------------------
Clustering instead on an industry identifier gives

Linear regression                                      Number of obs =   13164
                                                       F(  2,   199) =   16.58
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      =  4.0e+05

------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0156597     5.72   0.000     .0586573    .1204179
         emp |  -10.55602   2.571695    -4.10   0.000    -15.62729   -5.484744
       _cons |   1516.313   6387.663     0.24   0.813    -11079.88    14112.51
------------------------------------------------------------------------------
Heteroskedasticity
The choice of the most appropriate test for heteroskedasticity is determined by how explicit we want to be about the form of heteroskedasticity. In general, the more explicit we are, the more powerful the test will be.
Breusch-Pagan/Godfrey Test
This is a Lagrange Multiplier test for the hypothesis
$$
H_0: \sigma_i^2 = \sigma^2, \qquad
H_A: \sigma_i^2 = \sigma^2 f( \gamma_0 + \gamma^{\top} z_i ).
$$
The test statistic is computed from an auxiliary regression of $\widehat{\varepsilon}_i^{\,2}/\bar{\sigma}^2$ on $z_i$, where $\bar{\sigma}^2 = n^{-1}\sum_{i=1}^{n} \widehat{\varepsilon}_i^{\,2}$. The resulting test is asymptotically chi-squared with degrees of freedom equal to the number of variables in $z_i$ under $H_0$.
Example 5 Suppose we are interested in testing for the presence of heteroskedasticity in our Canadian firm-level data. In Stata, the Breusch-Pagan test is a post-regression diagnostic test implemented with the following set of commands:

. quietly reg profit revenue emp
. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of profit

         chi2(1)      =  93143.73
         Prob > chi2  =    0.0000

. estat hettest, iid

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of profit

         chi2(1)      =     81.72
         Prob > chi2  =    0.0000

(A further variant of the command reports chi2 = 105.20 with Prob > chi2 = 0.0000.)
The iid option on the estat hettest command calculates the nR2
from the auxiliary version of the test. The estat hettest command
also allows for choice of variables, z, affecting the heteroskedasticity.
Without specifying any variables, the default is to perform the test with the fitted values serving as the z variable.
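For instance, the regressors themselves can be supplied as the z variables (a sketch):

. estat hettest revenue emp, iid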
White Test
This test does not require additional structure on the alternative
hypothesis. Specifically, it tests the general hypothesis of the form
$$
H_0: \sigma_i^2 = \sigma^2, \qquad
H_A: \text{Not } H_0 .
$$
A simple operational version of this test is carried out by computing $nR^2$ in the regression of $\widehat{\varepsilon}_i^{\,2}$ on a constant and all (unique) first moments, second moments, and cross-products of the original regressors. The test is asymptotically distributed as chi-squared with degrees of freedom equal to the number of regressors in the auxiliary regression, excluding the intercept.
Example 6 We now apply the White test instead. Besides manually calculating the test statistic, Stata allows two ways to perform the White test. The first method uses estat imtest with the white option specified. The second method uses estat hettest with the iid option, letting the z variables in the auxiliary regression equal all the right-hand side variables along with their squares and cross-products:

. gen rev2=revenue*revenue
. gen emp2=emp*emp
. gen revemp=revenue*emp
. quietly reg profit revenue emp
. estat imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(5)      =    131.21
         Prob > chi2  =    0.0000
Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df         p
---------------------+-----------------------------
  Heteroskedasticity |     131.21      5    0.0000
            Skewness |  -2.15e+09      2    1.0000
            Kurtosis |  -4.78e+25      1    1.0000
---------------------+-----------------------------
               Total |  -4.78e+25      8    1.0000
---------------------------------------------------
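The second method mentioned above can be implemented by passing the regressors together with their squares and cross-products to estat hettest (a sketch):

. estat hettest revenue emp rev2 emp2 revemp, iid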
Other tests can also be found in the literature, e.g. see Johnston
and DiNardo [1997, Sections 6.2.3 and 6.2.4, pg. 168-169].
Serial Correlation
In order to illustrate these tests, we will fabricate a data set14 $\{y_t, x_t\}_{t=1}^{36}$, where
$$
\begin{aligned}
\varepsilon_t &= \rho\, \varepsilon_{t-1} + u_t, \\
x_t &= \zeta_t + \delta\, \zeta_{t-1}, \\
y_t &= \beta_0 + x_t \beta_1 + \varepsilon_t,
\end{aligned}
$$
with $\{u_t\}_{t=1}^{36}$ and $\{\zeta_t\}_{t=1}^{36}$ i.i.d. $N(0,1)$, $\beta_0 = 0$, $\beta_1 = 1$, $\rho = 0.9$, and $\delta = 0.9$. The code to generate this data set is:

clear
set obs 37
gen t=_n-1
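One way to generate the remaining series under these parameter values is the following sketch (the seed is arbitrary, and the recursion for eps exploits the fact that replace updates observations in order):

set seed 10101
gen zeta = rnormal()
gen u = rnormal()
tsset t
gen x = zeta + 0.9*L.zeta
gen eps = u
replace eps = 0.9*L.eps + u if t>0
gen y = 0 + 1*x + eps
drop if t==0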
Durbin-Watson Test

The Durbin-Watson statistic is15
$$
d = \frac{\sum_{t=2}^{T} ( \widehat{\varepsilon}_t - \widehat{\varepsilon}_{t-1} )^2}{\sum_{t=2}^{T} \widehat{\varepsilon}_t^{\,2}} .
$$
In Stata, the statistic is available after regress via estat dwatson:

. reg y x

      Source |       SS       df       MS              Number of obs =      36
-------------+------------------------------           F(  1,    34) =   16.57
       Model |  109.129232     1  109.129232           Prob > F      =  0.0003
    Residual |  223.877824    34  6.58464187           R-squared     =  0.3277
-------------+------------------------------           Adj R-squared =  0.3079
       Total |  333.007055    35  9.51448729           Root MSE      =  2.5661

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.320137   .3242759     4.07   0.000     .6611294    1.979145
       _cons |   .2189512   .4313634     0.51   0.615    -.6576847    1.095587
------------------------------------------------------------------------------

. estat dwatson

Durbin-Watson d-statistic(  2,    36) =  .3790313
Breusch-Godfrey Test
This is a Lagrange multiplier test of
$$
H_0: \text{No Autocorrelation}, \qquad
H_A: \{\varepsilon_t\}_{t=1}^{T} \sim AR(p), \ \text{or} \ \{\varepsilon_t\}_{t=1}^{T} \sim MA(p).
$$
Again, suppose that in the model $y_t = x_t^{\top}\beta + \varepsilon_t$ one suspects that the disturbance follows an $AR(1)$ process, namely
$$
\varepsilon_t = \rho\, \varepsilon_{t-1} + u_t .
$$
Stata calculates the Breusch-Godfrey test statistic with the post-regress estimation command estat bgodfrey. Alternatively, this Lagrange multiplier test can be applied as follows:

Algorithm 4

Step 1 Regress $\{y_t\}_{t=1}^{T}$ on $\{x_t^{\top}\}_{t=1}^{T}$ by OLS, and obtain the fitted residuals $\{\widehat{\varepsilon}_t\}_{t=1}^{T}$.

Step 2 Regress $\{\widehat{\varepsilon}_t\}_{t=2}^{T}$ on $1$, $x_t^{\top}$, and $\widehat{\varepsilon}_{t-1}$, and find $R^2$.

Step 3 Then, $TR^2 \xrightarrow{d} \chi^2(1)$ under $H_0$.
. quietly reg y x
. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |        30.871                1                   0.0000
---------------------------------------------------------------------------
Remark 10 A remarkable feature of this test is that it also tests against the
alternative hypothesis of an MA ( p) process for the disturbance.
Remark 11 See Greene [2012, Section 20.7.1, pg. 922] and Johnston and
DiNardo [1997, Section 6.6.4, pg. 185] for more information.
Other popular tests are Box and Pierces test and Ljungs refinement17 , e.g. Greene [2012, Section 20.7.2, pg. 922-923] and Johnston
and DiNardo [1997, Section 6.6.5, pg. 187].
Further Reading

Chapter 9, and Sections 20.1-20.9 (pg. 903-930) in Greene [2012].

Chapter 6 in Johnston and DiNardo [1997].

Sections 1.6, 2.5-2.8, and 2.10 in Hayashi [2000].

Chapters 27 and 28 in Goldberger [1991].

Section 7 in Davidson and MacKinnon [2004].

17 Caution: These tests are designed to work with $\varepsilon_t$, and not with $\widehat{\varepsilon}_t$; the effect of having estimated residuals rather than the series itself is unknown.
Quantile Regression

In the same way that the sample mean solves the problem
$$
\min_{\mu \in \mathbb{R}} \sum_{i=1}^{n} ( y_i - \mu )^2 ,
$$
the sample median solves18
$$
\min_{\eta \in \mathbb{R}} \sum_{i=1}^{n} | y_i - \eta | .
$$

18 You may have already been familiar with the 0.5-th quantile, i.e. the median.

Given a sample $\{y_i, x_i'\}_{i=1}^{n}$ from the joint density $f_{yx} = f_{y|x} f_x$, where $x \in \mathcal{X} \subseteq \mathbb{R}^k$, suppose we specify the $\tau$-th conditional quantile function as
$$
Q_y( \tau \mid x ) = x'\beta_0(\tau) .
$$
With
$$
Q(\beta) := E[ \rho_{\tau}( y - x'\beta ) ],
$$
where $\rho_{\tau}(u) := u( \tau - \mathbb{I}(u < 0) )$ is the check function (we also use the short-hand notation $\beta := \beta(\tau)$ and $\beta_{0,\tau} := \beta_0(\tau)$), the quantile regression estimator of $\beta_{0,\tau}$ is defined as
$$
\widehat{\beta} := \arg\min_{\beta} \; E_n[ \rho_{\tau}( y - x'\beta ) ] . \tag{17}
$$
Notice that
$$
E[ s(\beta) \mid x ] = x\left[ \tau - E[ \mathbb{I}( y - x'\beta < 0 ) \mid x ] \right] .
$$
Remark 15 If $u$ is independent of $x$, then $\text{Asy.Var}(\sqrt{n}(\widehat{\beta} - \beta_{0,\tau}))$ simplifies to $\tau(1-\tau)\, f_u^{-2}(0)\, E[xx']^{-1}$. In any case, the construction of asymptotically consistent standard errors is cumbersome because of the presence of the unknown (conditional) density of $u$ (given $x$). See Wooldridge [2010, pp. 456-457] for an accessible account.

Remark 16 The coefficient estimates $\widehat{\beta}$ have an interpretation (almost) like those in least squares regression.24

24 http://www.ats.ucla.edu/stat/stata/faq/quantreg.htm
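In Stata, conditional quantiles are estimated with qreg (or sqreg for several quantiles at once); a generic sketch, with y and the regressors standing in for the variables used in the table below:

. qreg y x1 x2, quantile(.50)
. sqreg y x1 x2, quantiles(.25 .50 .75)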
------------------------------------------------------
    Variable |
-------------+----------------------------------------
             |    0.257     0.386     0.277     0.149
             |    0.046     0.055     0.047     0.060
      totchr |    0.445     0.459     0.394     0.374
             |    0.018     0.022     0.018     0.022
         age |    0.013     0.016     0.015     0.018
             |    0.004     0.004     0.004     0.005
      female |   -0.077    -0.016    -0.088    -0.122
             |    0.046     0.054     0.047     0.060
       white |    0.318     0.338     0.499     0.193
             |    0.141     0.166     0.143     0.182
       _cons |    5.898     4.748     5.649     6.600
             |    0.296     0.363     0.300     0.381
------------------------------------------------------
                                        legend: b/se
Introduction

We assume that there is an unobserved (possibly vector-valued) latent variable $y^*$ that we wish to model based on an observed vector-valued set of covariates $x$. An observed (possibly vector-valued) response $y$ is observed instead, which can be written as
$$
y = \tau( y^* ),
$$
for some many-to-one transformation $\tau(\cdot)$.25

25 For scalar $y^*$:
$\tau(u) = \mathbb{I}(u > 0)$;
$\tau(u) = \max\{u, 0\}$;
$\tau(u) = u$ iff $u > 0$;
$\tau(u_1, u_2) = u_2\, \mathbb{I}(u_1 > 0)$;
$\tau(u_1, u_2) = \mathbb{I}(u_1 > 0)\mathbb{I}(u_2 > 0)$;
$\tau(u_1, u_2) = \max\{u_2, 0\}\, \mathbb{I}(u_1 > 0)$;
$\tau(u_1, u_2) = \max\{u_1, u_2\}$.
The outcome of interest is an indicator of private insurance coverage (ins) from many sources. Explanatory variables include self retirement status (retire) and spouse retirement status (sretire), age (age), self-assessed health status (hstatusg) reported as good, very good or excellent, household income (hhincome), years of education (educyear), a marriage dummy (married), ethnicity dummies (hisp and white), a gender dummy (female), activities of daily living (adl) and the total number of chronic conditions (chronic).
For binary choice,
$$
y = \tau( y^* ) :=
\begin{cases}
1 & y^* > 0; \\
0 & y^* \leq 0,
\end{cases}
$$
so that27
$$
\Pr\{ y = 1 \mid x \} = 1 - F_{\varepsilon|x}( -x'\beta ) .
$$
The log-likelihood function is
$$
L_n(\beta) := \ln L( \beta \mid y, X ) = \sum_{i=1}^{n} \left\{ y_i \ln\left[ 1 - F_{\varepsilon|x}( -x_i'\beta ) \right] + ( 1 - y_i ) \ln F_{\varepsilon|x}( -x_i'\beta ) \right\} .
$$
Letting $q_i := 2y_i - 1$, when $F_{\varepsilon|x}$ is symmetric around zero this can be written compactly as
$$
L_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \ln F_{\varepsilon|x}\left( q_i\, x_i'\beta \right) .
$$
Therefore
$$
\frac{\partial L_n(\beta)}{\partial \beta} = \frac{1}{n} \sum_{i=1}^{n} \frac{ q_i\, f_{\varepsilon|x}( q_i x_i'\beta ) }{ F_{\varepsilon|x}( q_i x_i'\beta ) }\, x_i
=: \frac{1}{n} \sum_{i=1}^{n} s( y_i, x_i'\beta ),
\qquad
\frac{\partial^2 L_n(\beta)}{\partial \beta\, \partial \beta'} = \frac{1}{n} \sum_{i=1}^{n} x_i\, \frac{\partial s( y_i, x_i'\beta )}{\partial \beta'} .
$$
Once one chooses a specific functional form for $F_{\varepsilon|x}$, the (C)ML estimator can be defined as28 the maximizer of $L_n(\beta)$, with
$$
\text{Asy.Var}\left( \sqrt{n}\left( \widehat{\beta}_{(C)ML} - \beta \right) \right) = -E\left[ \frac{\partial^2 L_n(\beta)}{\partial \beta\, \partial \beta'} \right]^{-1} .
$$
Name       $F_{\varepsilon|x}(e)$
---------------------------------------------------------------
Probit     $\int_{-\infty}^{e} \phi(t)\,dt := \Phi(e)$
Logit      $\exp(e)/[1 + \exp(e)] := \Lambda(e)$
Linear     $0$ if $e < 0$; $e$ if $0 \leq e < 1$; $1$ if $e \geq 1$
Gumbel     $\exp[-\exp(-e)]$
Cloglog    $1 - \exp[-\exp(e)]$
---------------------------------------------------------------
. *--Data preparation
. global xlist retire age hstatusg hhincome educyear married hisp
. generate linc = ln(hhinc)
(9 missing values generated)
.
. *--Estimating & Comparing various models
. quietly probit ins $xlist
. estimates store bprobit
. quietly logit ins $xlist
. estimates store blogit
. quietly cloglog ins $xlist
. estimates store bcloglog
. estimates table bprobit blogit bcloglog, t stats(N ll) b(%7.3f) stfmt(%8.2f)

----------------------------------------------------
    Variable |  bprobit      blogit     bcloglog
-------------+--------------------------------------
      retire |    0.118       0.197        0.136
             |     2.31        2.34         2.10
         age |   -0.009      -0.015       -0.010
             |    -1.29       -1.29        -1.19
    hstatusg |    0.198       0.312        0.250
             |     3.56        3.41         3.39
    hhincome |    0.001       0.002        0.001
             |     3.19        3.02         2.75
    educyear |    0.071       0.114        0.089
             |     8.34        8.05         8.23
     married |    0.362       0.579        0.464
             |     6.47        6.20         6.14
        hisp |   -0.473      -0.810       -0.726
             |    -4.28       -4.14        -4.14
       _cons |   -1.069      -1.716       -1.735
             |    -2.33       -2.29        -3.01
-------------+--------------------------------------
           N |     3206        3206         3206
          ll | -1993.62    -1994.88     -2001.45
----------------------------------------------------
                                        legend: b/t
Remark 17 For testing linear or non-linear hypotheses about the coefficients, the standard W, LM or LR tests are handy.

Remark 18 If $\varepsilon|x$ has a logistic distribution, $\sigma^2_{\varepsilon|x} = \pi^2/3$, and if it has a standard normal distribution, $\sigma^2_{\varepsilon|x} = 1$; so the logit estimates of $\beta$ should be roughly $\pi/\sqrt{3} \approx 1.8$ times larger than the probit estimates.

29 Greene writes: "But there is no guarantee that the [(C)ML] will converge to anything interesting or useful. Simply computing a robust covariance matrix for an otherwise inconsistent estimator does not give it redemption. Consequently, the virtue of a robust covariance matrix in this setting is unclear."

Marginal Effects

Let $x := [x^c, x^d]'$, where $x^c$ and $x^d$ denote the sub-vectors containing the continuous and discrete elements of $x$ respectively. Let $\beta^c$ and $\beta^d$ be defined accordingly. Then
$$
\frac{\partial \Pr\{ y = 1 \mid x \}}{\partial x^c} = \frac{\partial E[ y \mid x ]}{\partial x^c} = f_{\varepsilon|x}( x'\beta )\, \beta^c ,
$$
and, for a discrete change $\Delta$ in $x^d$,
$$
\Delta E[ y \mid x ] = E[ y \mid x^c, x^d + \Delta ] - E[ y \mid x^c, x^d ]
= \Pr\{ y = 1 \mid x^c, x^d + \Delta \} - \Pr\{ y = 1 \mid x^c, x^d \} .
$$
By the delta method, the asymptotic variance of an estimated marginal effect is
$$
\text{Asy.Var}\left( \sqrt{n}\left( \widehat{E}[y\mid x] - E[y\mid x] \right) \right)
= \left\{ \frac{\partial E[y\mid x]}{\partial \beta} \right\}' \text{Asy.Var}\left( \sqrt{n}( \widehat{\beta} - \beta ) \right) \left\{ \frac{\partial E[y\mid x]}{\partial \beta} \right\} ,
$$
where, for the continuous-derivative and discrete-change effects respectively,
$$
\frac{\partial E[y\mid x]}{\partial \beta} = f_{\varepsilon|x}( x'\beta )
\begin{pmatrix} x^c \\ x^d \end{pmatrix} ,
\qquad
\frac{\partial \Delta E[y\mid x]}{\partial \beta} = f_{\varepsilon|x}\!\left( x^{c\prime}\beta^c + ( x^d + \Delta )'\beta^d \right)
\begin{pmatrix} x^c \\ x^d + \Delta \end{pmatrix}
- f_{\varepsilon|x}( x'\beta )
\begin{pmatrix} x^c \\ x^d \end{pmatrix} .
$$
. *------------------------------*
. * Calculating Marginal Effects *
. *------------------------------*
.
. *--Note: "i." operator indicates finite-difference method
.
. *--Marginal Effect @ a representative value
. quietly probit ins i.retire age i.hstatusg hhincome educyear i.married i.hisp
. margins, dydx(*) at (retire=1 age=75 hstatusg=1 hhincome=35 educyear=12 marrie
> d=1 hisp=0) noatlegend
Conditional marginal effects                      Number of obs   =       3206
Model VCE    : OIM

Expression   : Pr(ins), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.retire |   .0460306   .0196708     2.34   0.019     .0074767    .0845846
         age |  -.0034912   .0026889    -1.30   0.194    -.0087614    .0017789
  1.hstatusg |    .076091    .020875     3.65   0.000     .0351767    .1170053
    hhincome |   .0004853   .0001516     3.20   0.001     .0001882    .0007825
    educyear |   .0278473   .0033257     8.37   0.000      .021329    .0343656
   1.married |   .1355362   .0199649     6.79   0.000     .0964057    .1746667
      1.hisp |  -.1728396   .0365783    -4.73   0.000    -.2445318   -.1011474
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
. mata: dydx_probit_MER = st_matrix("r(b)")
. quietly logit ins i.retire age i.hstatusg hhincome educyear i.married i.hisp
. quietly margins, dydx(*) at (retire=1 age=75 hstatusg=1 hhincome=35 educyear=1
> 2 married=1 hisp=0) noatlegend
. mata: dydx_logit_MER = st_matrix("r(b)")
. quietly cloglog ins i.retire age i.hstatusg hhincome educyear i.married i.hisp
. quietly margins, dydx(*) at (retire=1 age=75 hstatusg=1 hhincome=35 educyear=1
> 2 married=1 hisp=0) noatlegend
. mata: dydx_cloglog_MER = st_matrix("r(b)")
. mata: dydx_MER = dydx_probit_MER, dydx_logit_MER, dydx_cloglog_MER
. mata: dydx_MER[(2,3,5,6,7,9,11),1..3]
                   1              2              3
    +----------------------------------------------+
  1 |    .046030642    .0475583906    .0424805527  |
  2 |  -.0034912069   -.0035828756   -.0033310914  |
  3 |   .0760909676    .0744857137    .0756515623  |
  4 |   .0004853401     .000565483    .0002844106  |
  5 |   .0278472908    .0280488975    .0284452284  |
  6 |   .1355362013    .1331654045    .1326820986  |
  7 |  -.1728395842   -.1794231966   -.1925148858  |
    +----------------------------------------------+
Consider the model31
$$
\begin{aligned}
y^* &= x'\beta + w\gamma + \varepsilon, \\
w &= z'\delta + u,
\end{aligned}
\qquad
\begin{bmatrix} \varepsilon \\ u \end{bmatrix}\Big|\, x, z \;\sim\; N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 & \sigma_{\varepsilon u} \\ \sigma_{\varepsilon u} & \sigma_u^2 \end{bmatrix} \right),
$$
where $w$ is an endogenous regressor and $z$ contains the instruments. The likelihood combines the probit contribution for $y$ given $w$ with the normal density of $w$ given $z$.

Remark 20 Notice that all the parameters are identified by the form of the likelihood.

Theorem 5 If
$$
\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},
\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right),
$$
then $z_1 \mid z_2 \sim N\!\left( \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}( z_2 - \mu_2 ),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right)$.

Using this result, a two-step (control-function) objective of the form
$$
L_n( \beta, \gamma, \widetilde{\delta}, \widetilde{\sigma}_u, \lambda ) := \frac{1}{n}\sum_{i=1}^{n} \ln \Phi\!\big( ( 2y_i - 1 )\, m_i \big)
$$
can be constructed, where $m_i$ depends on the first-stage residuals. The corresponding 2-step estimators of $\beta$, $\gamma$ and $\lambda$ maximize $L_n( \beta, \gamma, \widetilde{\delta}, \widetilde{\sigma}_u, \lambda )$.
. *------------------------------*
. *
Endogenous Regressors
*
. *------------------------------*
. global xlist female age age2 educyear married hisp white chronic adl hstatusg
.
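The two sets of estimates that follow come from a standard probit (treating linc as exogenous) and from ivprobit; the commands are roughly of this form (a sketch, with $zlist standing in for the instrument list, which is a placeholder):

. probit ins linc $xlist, vce(robust)
. ivprobit ins $xlist (linc = $zlist), vce(robust)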
Probit regression                                 Number of obs   =       3197
                                                  Wald chi2(11)   =     366.94
                                                  Prob > chi2     =     0.0000
                                                  Pseudo R2       =     0.0946

------------------------------------------------------------------------------
             |               Robust
         ins |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        linc |   .3466893   .0402173     8.62   0.000     .2678648    .4255137
      female |  -.0815374   .0508549    -1.60   0.109    -.1812112    .0181364
         age |   .1162879   .1151924     1.01   0.313     -.109485    .3420608
        age2 |  -.0009395   .0008568    -1.10   0.273    -.0026187    .0007397
    educyear |   .0464387   .0089917     5.16   0.000     .0288153    .0640622
     married |   .1044152   .0636879     1.64   0.101    -.0204108    .2292412
        hisp |  -.3977334   .1080935    -3.68   0.000    -.6095927   -.1858741
       white |  -.0418296   .0644391    -0.65   0.516     -.168128    .0844687
     chronic |   .0472903   .0186231     2.54   0.011     .0107897    .0837909
         adl |  -.0945039   .0353534    -2.67   0.008    -.1637953   -.0252125
    hstatusg |   .1138708   .0629071     1.81   0.070    -.0094248    .2371664
       _cons |  -5.744548   3.871615    -1.48   0.138    -13.33277    1.843677
------------------------------------------------------------------------------
Probit model with endogenous regressors           Number of obs   =       3197
                                                  Wald chi2(11)   =     382.34
                                                  Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        linc |  -.5338185   .3852354    -1.39   0.166    -1.288866     .221229
      female |  -.1394069   .0494475    -2.82   0.005    -.2363223   -.0424915
         age |   .2862283   .1280838     2.23   0.025     .0351887    .5372678
        age2 |  -.0021472   .0009318    -2.30   0.021    -.0039736   -.0003209
    educyear |   .1136877   .0237927     4.78   0.000     .0670549    .1603205
     married |   .7058269   .2377729     2.97   0.003     .2398005    1.171853
        hisp |  -.5094513   .1049488    -4.85   0.000    -.7151473   -.3037554
       white |    .156344   .1035713     1.51   0.131     -.046652      .35934
     chronic |   .0061943   .0275259     0.23   0.822    -.0477556    .0601441
         adl |  -.1347663     .03498    -3.85   0.000    -.2033259   -.0662067
    hstatusg |   .2341782   .0709769     3.30   0.001     .0950661    .3732904
       _cons |  -10.00785   4.065795    -2.46   0.014    -17.97666    -2.03904
-------------+----------------------------------------------------------------
     /athrho |     .67453   .3599913     1.87   0.061    -.0310399      1.3801
    /lnsigma |   -.331594   .0233799   -14.18   0.000    -.3774178   -.2857703
-------------+----------------------------------------------------------------
         rho |   .5879518   .2355468                     -.0310299    .8809736
       sigma |   .7177787   .0167816                      .6856296    .7514352
------------------------------------------------------------------------------
Instrumented:  linc
Instruments:
Recall that
$$
E[ Y \mid X = x ] = E[ Y \mid x ] = \sum_{j} y_j \Pr[ Y = y_j \mid X = x ]
\quad \text{or} \quad \int y\, dF_{Y|X}( y \mid x ),
$$
and define the two parameters of interest as
$$
\tau_{\text{ATE}} := E[ y_{1i} - y_{0i} ], \qquad
\tau_{\text{ATT}} := E[ y_{1i} - y_{0i} \mid d_i = 1 ]. \tag{18}
$$
The ATE describes the expected effect of treatment for an arbitrary observation $i$ chosen at random from the population, while the ATT is the mean effect for those that actually participate in the programme, i.e. the Average Treatment Effect in the treated subpopulation.36

Let re78 represent real earnings in 1978, age is age in years, educ is the number of years of formal education, black and hisp are ethnicity dummies, married is a marital status indicator, nodegr equals 1 if the person does not have a high school diploma, while re# and u# represent real earnings and unemployment indicators for # in {74, 75}.

Remark 22 Since for each observation $i$ one only observes either $y_{1i}$ or $y_{0i}$, but not both (the missing data problem), the joint distribution $F_{10}( y_1, y_0 )$ is not identified. One can only identify the marginals $F_1( y_1 )$ and $F_0( y_0 )$. It turns out that even when one cannot identify $F_{10}(\cdot, \cdot)$, certain features of it such as the ATE can be identified under less restrictive conditions than independence between $y_1$ and $y_0$.
Relationship between ATE & ATT:

Let $E[ y_{1i} ] =: \mu_1$ and $E[ y_{0i} ] =: \mu_0$; then by construction $y_{1i} = \mu_1 + v_{1i}$ and $y_{0i} = \mu_0 + v_{0i}$, such that $E[ v_{1i} ] = E[ v_{0i} ] = 0$ and
$$
y_{1i} - y_{0i} = \mu_1 - \mu_0 + [ v_{1i} - v_{0i} ] .
$$
Conditionally on $x$,39
$$
E[ y_1 - y_0 \mid x, d = 1 ] = \mu_1( x ) - \mu_0( x ) + E[ v_1 - v_0 \mid x, d = 1 ] . \tag{19}
$$

39 So by construction $y_1 = \mu_1( x ) + v_1$, $y_0 = \mu_0( x ) + v_0$, such that $E[ v_1 \mid x ] = E[ v_0 \mid x ] = 0$.

Estimation:

If $\widehat{m}_1( x_i )$ and $\widehat{m}_0( x_i )$ represent consistent estimators of $m_1( x )$ and $m_0( x )$, then using the entire random sample of size $n$ one has, by the analogy principle,
$$
\widehat{\tau}_{\text{ATE}} = n^{-1} \sum_{i=1}^{n} \left[ \widehat{m}_1( x_i ) - \widehat{m}_0( x_i ) \right],
\qquad
\widehat{\tau}_{\text{ATT}} = \Big( \sum_{j=1}^{n} d_j \Big)^{-1} \sum_{i=1}^{n} d_i \left[ \widehat{m}_1( x_i ) - \widehat{m}_0( x_i ) \right] .
$$
. gen re7578=re78-re75
. gen re74sq=re74^2
. gen agesq=age^2
. gen educsq=educ^2
. global xlist age agesq educ educsq black hisp married nodegr re74 re74sq u74 u75
.
. teffects ra (re78 $xlist, linear) (treat), ate

Iteration 0:   EE criterion =  1.396e-20
Iteration 1:   EE criterion =  7.677e-24

Treatment-effects estimation                    Number of obs      =       445
Estimator      : regression adjustment
Outcome model  : linear

------------------------------------------------------------------------------
             |               Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   1624.517   645.9279     2.52   0.012     358.5212    2890.512
-------------+----------------------------------------------------------------
POmean       |
       treat |
          0  |   4546.023   341.5606    13.31   0.000     3876.577     5215.47
------------------------------------------------------------------------------
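The ATET version reported next is obtained by replacing the ate option with atet (a sketch):

. teffects ra (re78 $xlist, linear) (treat), atet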
Iteration 0:   EE criterion =  1.396e-20
Iteration 1:   EE criterion =  4.622e-24

Treatment-effects estimation                    Number of obs      =       445
Estimator      : regression adjustment
Outcome model  : linear

------------------------------------------------------------------------------
             |               Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
       treat |
   (1 vs 0)  |   1815.461   702.8208     2.58   0.010     437.9571    3192.964
-------------+----------------------------------------------------------------
POmean       |
       treat |
          0  |   4533.685   383.3647    11.83   0.000     3782.304    5285.066
------------------------------------------------------------------------------
Let $p( x ) := \Pr\{ d = 1 \mid x \}$ denote the propensity score. Then
$$
E\left[ \frac{d\,y}{p( x )} \,\Big|\, x \right] = E\left[ \frac{d\,y_1}{p( x )} \,\Big|\, x \right] = \frac{E[ d \mid x ]}{p( x )}\, \mu_1( x ) = \mu_1( x ),
$$
and similarly
$$
E\left[ \frac{( 1 - d )\,y}{1 - p( x )} \,\Big|\, x \right]
= E\left[ E\left[ \frac{( 1 - d )\,y_0}{1 - p( x )} \,\Big|\, x, d \right] \,\Big|\, x \right]
= \mu_0( x )\, \frac{1 - E[ d \mid x ]}{1 - p( x )} = \mu_0( x ).
$$
Recall
$$
\tau_{\text{CATE}}( x ) := \mu_1( x ) - \mu_0( x )
= E\left[ \frac{d\,y}{p( x )} - \frac{( 1 - d )\,y}{1 - p( x )} \,\Big|\, x \right]
= E\left[ \frac{( d - p( x ) )\,y}{p( x )( 1 - p( x ) )} \,\Big|\, x \right],
\text{ so}
$$
$$
\tau_{\text{ATE}} := E[ \tau_{\text{CATE}}( x ) ] = E\left[ \frac{( d - p( x ) )\,y}{p( x )( 1 - p( x ) )} \right]. \tag{20}
$$
Now, notice that42
$$
\frac{( d - p( x ) )\,y}{1 - p( x )} = \frac{( d - p( x ) )\,y_0}{1 - p( x )} + d\,( y_1 - y_0 ).
$$
Write
$$
E\left[ \frac{( d - p( x ) )\,y}{1 - p( x )} \,\Big|\, x \right] = E\left[ d ( y_1 - y_0 ) \mid x \right],
\qquad
E\left[ \frac{( d - p( x ) )\,y}{1 - p( x )} \right] = E\left[ d ( y_1 - y_0 ) \right]. \tag{21}
$$
Now
$$
\begin{aligned}
E[ d( y_1 - y_0 ) ] &= E\big[ E[ d( y_1 - y_0 ) \mid d ] \big] \\
&= E[ d( y_1 - y_0 ) \mid d = 1 ]\Pr\{ d = 1 \} + E[ d( y_1 - y_0 ) \mid d = 0 ]\Pr\{ d = 0 \} \\
&= E[ ( y_1 - y_0 ) \mid d = 1 ]\Pr\{ d = 1 \},
\end{aligned}
$$
so replacing this on the right-hand side of (21) one obtains
$$
E\left[ \frac{( d - p( x ) )\,y}{1 - p( x )} \right] = E[ ( y_1 - y_0 ) \mid d = 1 ]\,\Pr\{ d = 1 \}
\quad\Longrightarrow\quad
\tau_{\text{ATT}} = E\left[ \frac{( d - p( x ) )\,y}{\rho\,[ 1 - p( x ) ]} \right],
\quad \text{where } \rho := \Pr\{ d = 1 \}. \tag{22}
$$
Estimation:

If $\widehat{p}( x_i )$ represents a consistent estimator of $p( x )$, and we estimate $\rho$ by $\widehat{\rho} = n^{-1} \sum_{j=1}^{n} d_j$, then using the entire random sample of size $n$ one has, by the analogy principle,
$$
\widehat{\tau}_{\text{ATE}} = n^{-1} \sum_{i=1}^{n} \frac{( d_i - \widehat{p}( x_i ) )\, y_i}{\widehat{p}( x_i )\,( 1 - \widehat{p}( x_i ) )},
\qquad
\widehat{\tau}_{\text{ATT}} = n^{-1} \sum_{i=1}^{n} \frac{( d_i - \widehat{p}( x_i ) )\, y_i}{\widehat{\rho}\,[ 1 - \widehat{p}( x_i ) ]} .
$$
. quie teffects ipw (re78) (treat $xlist, probit), atet
. estimate store ipwPROBIT
.
. quie teffects ipw (re78) (treat $xlist, logit), atet
. estimate store ipwLOGIT
.
. estimates table ipwPROBIT ipwLOGIT, b(%9.3f) p
--------------------------------------
    Variable | ipwPROBIT     ipwLOGIT
-------------+------------------------
ATET         |
     r.treat |
          1  |  1788.526     1785.963
             |    0.0100       0.0101
-------------+------------------------
POmean       |
     r.treat |
          0  |  4560.619     4563.182
             |    0.0000       0.0000
-------------+------------------------
TME1         |
         age |     0.019        0.025
             |    0.7350       0.7879
       agesq |    -0.000       -0.000
             |    0.7437       0.7991
        educ |    -0.537       -0.861
             |    0.0284       0.0252
      educsq |     0.028        0.044
             |    0.0393       0.0433
       black |    -0.158       -0.258
             |    0.4869       0.4819
        hisp |    -0.524       -0.863
             |    0.0951       0.0956
     married |     0.134        0.220
             |    0.4333       0.4282
      nodegr |    -0.252       -0.409
             |    0.2888       0.2871
        re74 |    -0.000       -0.000
             |    0.9310       0.9576
      re74sq |    -0.000       -0.000
             |    0.5947       0.6303
         u74 |    -0.036       -0.070
             |    0.8952       0.8748
         u75 |    -0.273       -0.438
             |    0.1501       0.1567
       _cons |     2.598        4.252
             |    0.0570       0.0686
--------------------------------------
                          legend: b/p
$$\widehat y_{1i} = \begin{cases} y_i & ;\; d_i = 1 \\[2pt] \dfrac{1}{M}\displaystyle\sum_{j\in J_M(i)} y_j & ;\; d_i = 0\end{cases},
\qquad
\widehat y_{0i} = \begin{cases} \dfrac{1}{M}\displaystyle\sum_{j\in J_M(i)} y_j & ;\; d_i = 1 \\[2pt] y_i & ;\; d_i = 0\end{cases}.$$
Therefore$^{43}$
$$\widehat\tau_{ATE} = n^{-1}\sum_{i=1}^{n}(\widehat y_{1i} - \widehat y_{0i}),
\qquad
\widehat\tau_{ATT} = \widehat\rho^{-1} n^{-1}\sum_{i=1}^{n} d_i(\widehat y_{1i} - \widehat y_{0i}).$$
$^{43}$ These estimators can achieve the same precision as those previously discussed only when the number of continuous covariates is equal to 1. When the number of continuous covariates is at least 3, the speed at which they converge to the true values is slower than for those previously discussed. When the number is 2, the estimates will be biased.
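The original command lines for the matching output below were not preserved in this copy of the notes; commands along the following lines (our guess at the exact syntax) produce nearest-neighbor and propensity-score matching estimates of the ATE:

. teffects nnmatch (re78 $xlist) (treat), ate
. teffects psmatch (re78) (treat $xlist, probit), ate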
Treatment-effects estimation                    Number of obs     =        445
Estimator      : nearest-neighbor matching      Matches: requested =
Outcome model  : matching                                    min =
                                                             max =
------------------------------------------------------------------------------
             |              AI Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   2062.063   734.4061     2.81   0.005      622.654    3501.473
------------------------------------------------------------------------------

Treatment-effects estimation                    Number of obs     =        445
Estimator      : propensity-score matching      Matches: requested =
Outcome model  : matching                                    min =
                                                             max =
------------------------------------------------------------------------------
             |              AI Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   1765.745   673.3646     2.62   0.009     445.9744    3085.515
------------------------------------------------------------------------------
$$E[\varepsilon(i,t)\,|\,D(i,1), t] = 0, \qquad (23)$$
$$Y(i,t) = \alpha + \beta\,D(i,1) + \gamma\,t + \tau\,D(i,t) + \varepsilon(i,t). \qquad (24)$$
The condition in (23) implies that $E[(1, D(i,1), t, D(i,t))^{\top}\varepsilon(i,t)] = 0$; that is, the orthogonality conditions of a simple regression model hold for this linear model, and therefore the parameters $[\alpha, \beta, \gamma, \tau]^{\top}$ will be consistently estimated by a simple OLS regression of $Y(i,t)$ on $D(i,1)$, $t$, and $D(i,t)$ (including a constant).
Why is it called DiD?:
Model (24) is called Difference-in-Differences because, under assumption (23), we have$^{46}$
$$\tau = \underbrace{\{E[Y(i,1)|D(i,1)=1] - E[Y(i,1)|D(i,1)=0]\}}_{\text{Difference 1}}
- \underbrace{\{E[Y(i,0)|D(i,1)=1] - E[Y(i,0)|D(i,1)=0]\}}_{\text{Difference 2}}. \qquad (25)$$
$^{46}$ Note that for $t\in\{0,1\}$, any function of $t$ satisfies $g(t) = g(0) + [g(1)-g(0)]\,t$.
. global xlist age agesq educ educsq black hisp married nodegr re74 re74sq u74 u75
.
. quie reg re7578 treat, robust
. estimates store DiDnoX
.
. quie reg re7578 treat $xlist, robust
. estimates store DiDwithX
. estimates table DiDnoX DiDwithX, k(treat) b(%9.3f) p
--------------------------------------
    Variable |   DiDnoX    DiDwithX
-------------+------------------------
       treat |  1529.197    1489.351
             |    0.0330      0.0321
--------------------------------------
                           legend: b/p
Abadie [2005] shows that if assumptions Step 1 and Step 2 hold for each value of $X$, then
$$\tau_{ATT} := E[Y^1(1) - Y^0(1)\,|\,D=1]
= E\!\left[\frac{Y(1)-Y(0)}{\Pr\{D=1\}}\cdot\frac{D - \Pr\{D=1|X\}}{1-\Pr\{D=1|X\}}\right].$$
Estimation I:
If $\widehat p(X(i))$ represents a consistent estimator of $p(X(i)) := \Pr\{D=1|X\}$, and we estimate $\Pr\{D=1\}$ with $\widehat\rho = n^{-1}\sum_{j=1}^{n}D(j)$, then using the entire sample of size $n$, one has by the analogy principle
$$\widehat\tau_{ATT} = n^{-1}\sum_{i=1}^{n}\frac{Y(i,1)-Y(i,0)}{\widehat\rho}\cdot\frac{D(i)-\widehat p(X(i))}{1-\widehat p(X(i))}. \qquad (26)$$
In practice one can specify $p(X,\theta) := \Pr\{D=1|X\}$ as a Probit or Logit model that can be estimated by (conditional) maximum likelihood. With the (C)ML estimate $\widehat\theta$,
$$\widehat\tau_{ATT} = n^{-1}\sum_{i=1}^{n}\frac{Y(i,1)-Y(i,0)}{\widehat\rho}\cdot\frac{D(i)-p(X(i),\widehat\theta)}{1-p(X(i),\widehat\theta)}.$$
Standard errors can be calculated as described in Wooldridge [2010, pp. 922-924].
. quie summarize treat, meanonly
. scalar rho=r(mean)
. quie probit treat $xlist
. predict phat, pr
. g tauATT=re7578*(treat-phat)/(rho*(1-phat))
. quie summarize tauATT, meanonly
. di r(mean)
1631.7815
Now that we have computed $\widehat\tau_{ATT}$, the next question is how to compute standard errors. The next section describes how to use the bootstrap for this purpose.
The Bootstrap Method
Suppose we have a random sample $\{Z_i; i=1,\ldots,n\}$ from a random variable with CDF $F_0$. Let's say we are interested in the distribution of the statistic $T_n = T_n(Z_1,\ldots,Z_n)$, which has a finite-sample distribution $G_n(\cdot, F_0) = \Pr(T_n \le \cdot)$. This distribution is generally unknown and depends on $F_0$. In our Statistics & Econometrics courses we learnt to use $G_\infty(\cdot, F_0)$, the asymptotic distribution, in its place.
The basic idea of the bootstrap is to approximate $G_n(\cdot, F_0)$ with $G_n(\cdot, F_n)$ instead, where $F_n$ is the empirical distribution function of the data.
Unlike the calculation of $G_\infty(\cdot, F_0)$, which requires knowledge of asymptotic theory and approximation theorems, the calculation of $G_n(\cdot, F_n)$ can be performed on a computer as follows. Treat the original random sample, $\{Z_i; i=1,\ldots,n\}$, as if it were the population [in which case the original statistic $T_n = T_n(Z_1,\ldots,Z_n)$ becomes the population parameter we wish to estimate and make inference about]. We resample (with replacement) from this population a pseudo-sample of the same size, say $\{Z_i^{*b}; i=1,\ldots,n\}$, and recalculate the original statistic using this pseudo-sample, i.e. $T_n^{*b} = T_n(Z_1^{*b},\ldots,Z_n^{*b})$. We can do this $B$ many times, i.e. $b = 1,\ldots,B$, and we end up with another pseudo-sample $\{T_n^{*b}: b=1,\ldots,B\}$. Then $G_n(\cdot, F_n)$ can be estimated simply as the empirical distribution function of $\{T_n^{*b}: b=1,\ldots,B\}$, i.e.
$$\widehat G_n(\cdot) = \frac{1}{B}\sum_{b=1}^{B} I\big(T_n^{*b} \le \cdot\big).$$
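Before the $\widehat\tau_{ATT}$ bootstrap program below, note that for simple statistics this recipe is already packaged in Stata's bootstrap prefix. A minimal sketch in a fresh session (the variable z, B = 200 and the seed are arbitrary illustrative choices):

clear
set obs 100
set seed 12345
generate double z = rnormal()
bootstrap Tn=r(mean), reps(200) nodots: summarize z, meanonly
estat bootstrap, percentile        // percentile CI built from the B replications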
  3.     gen re7578=re78-re75
  4.     gen re74sq=re74^2
  5.     gen agesq=age^2
  6.     gen educsq=educ^2
  7.     bsample
  8.     quie summarize treat, meanonly
  9.     scalar rho=r(mean)
 10.     quie probit treat $xlist
 11.     predict phat, pr
 12.     g tauATTv=re7578*(treat-phat)/(rho*(1-phat))
 13.     quie summarize tauATTv, meanonly
 14.     return scalar tauATT=r(mean)
 15. end
.
. use lalonde.dta, clear
. gen re7578=re78-re75
. gen re74sq=re74^2
. gen agesq=age^2
. gen educsq=educ^2
. global xlist age agesq educ educsq black hisp married nodegr re74 re74sq u74 u75
. quie summarize treat, meanonly
. scalar rho=r(mean)
. quie probit treat $xlist
. predict phat, pr
. g tauATT=re7578*(treat-phat)/(rho*(1-phat))
. quie summarize tauATT, meanonly
. scalar tauATT0 = r(mean)
.
. simulate tauATTb=r(tauATT), seed(10101) reps(999) nodots saving(bdata,replace): onebootrep
      command:  onebootrep
      tauATTb:  r(tauATT)

. summarize tauATTb

    Variable |        Obs        Mean    Std. Dev.        Min         Max
-------------+------------------------------------------------------------
     tauATTb |        999    1624.884    751.8149   -817.0814    4326.195
The bootstrap estimate of the bias of $T_n$ is
$$\widehat{\mathrm{bias}}(T_n) = \frac{1}{B}\sum_{b=1}^{B} T_n^{(b)} - T_n.$$
The bias-corrected version of the estimator, $(T_n - \mathrm{bias}(T_n))$, can therefore be estimated as
$$2\,T_n - \frac{1}{B}\sum_{b=1}^{B} T_n^{(b)}.$$
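Applied to the replications generated above (tauATT0 and the simulated draws tauATTb are the objects created in the listings above), the bootstrap standard error, bias estimate and bias-corrected estimate are one summarize away:

quietly summarize tauATTb
display "bootstrap std. error    = " r(sd)
display "bootstrap bias estimate = " r(mean) - tauATT0
display "bias-corrected tauATT   = " 2*tauATT0 - r(mean)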
. global xlist y81 nearinc age agesq lintst lland larea rooms baths
.
. quie reg lrprice y81 nearinc y81nrinc, robust
. estimates store DiDnoX
.
. quie reg lrprice y81nrinc $xlist, robust
. estimates store DiDwithX
. estimates table DiDnoX DiDwithX, k(y81nrinc) b(%9.3f) p
--------------------------------------
    Variable |   DiDnoX    DiDwithX
-------------+------------------------
    y81nrinc |    -0.063      -0.132
             |    0.5086      0.0289
--------------------------------------
                           legend: b/p
Let us assume that for each individual in the joint sample we observe $(Y, D, T, X^{\top})^{\top}$, where $T$ is the temporal indicator that takes the value 1 if the individual belongs to the post-treatment sample.$^{47}$ Using this notation we make the following assumption:
3. Given $T=0$, the data is a random sample from the joint distribution of $(Y(0), D, X^{\top})^{\top}$; given $T=1$, the data is also a random sample from the joint distribution of $(Y(1), D, X^{\top})^{\top}$.
Denote by $\lambda \in (0,1)$ the proportion out of the total number of observations sampled in period $t=1$.
Abadie [2005] also shows that if assumptions Step 1, 3 and $0 < \Pr\{D=1|X\} < 1$ hold for each value of $X$, then
$$\tau_{ATT} := E[Y^1(1) - Y^0(1)\,|\,D=1]
= E\!\left[\frac{\Pr\{D=1|X\}}{\Pr\{D=1\}}\,\phi_0\,Y\right],
\qquad
\phi_0 := \frac{T-\lambda}{\lambda(1-\lambda)}\cdot\frac{D-\Pr\{D=1|X\}}{\Pr\{D=1|X\}\,[1-\Pr\{D=1|X\}]}.$$
Estimation II:
Based on the joint sample $\{Y_i, D_i, T_i, X_i\}_{i=1}^{n}$ we can estimate this quantity as
$$\widehat\phi_i = \frac{T_i - \widehat\lambda}{\widehat\lambda(1-\widehat\lambda)}\cdot\frac{D_i - \widehat p(X_i)}{\widehat p(X_i)[1-\widehat p(X_i)]}, \qquad (27)$$
$$\widehat\tau_{ATT} = n^{-1}\sum_{i=1}^{n}\frac{\widehat p(X_i)}{\widehat\rho}\,\widehat\phi_i\,Y_i, \qquad (28)$$
where $\widehat\lambda = n^{-1}\sum_{j=1}^{n}T_j$. When the propensity score is modelled parametrically as a Probit or Logit, $p(X,\theta)$, with (C)ML estimate $\widehat\theta$, the weight becomes
$$\widehat\phi_i = \frac{T_i - \widehat\lambda}{\widehat\lambda(1-\widehat\lambda)}\cdot\frac{D_i - p(X_i,\widehat\theta)}{p(X_i,\widehat\theta)[1-p(X_i,\widehat\theta)]}.$$
  3.     bsample
  4.     quie summarize nearinc, meanonly
  5.     scalar rho=r(mean)
  6.     quie summarize y81, meanonly
  7.     scalar lambda=r(mean)
  8.     quie probit nearinc larea
  9.     predict phat, pr
 10.     g phi=((y81 - lambda)/(lambda*(1-lambda)))*((nearinc-phat)/(phat*(1-phat)))
 11.     g tauATT=lrprice*phi*phat/rho
 12.     quie summarize tauATT, meanonly
 13.     return scalar tauATT=r(mean)
 14. end
.
. use KIELMC.dta, clear
. quie summarize nearinc, meanonly
. scalar rho=r(mean)
. quie summarize y81, meanonly
. scalar lambda=r(mean)
. quie probit nearinc larea
. predict phat, pr
. g phi=((y81 - lambda)/(lambda*(1-lambda)))*((nearinc-phat)/(phat*(1-phat)))
. g tauATT=lrprice*phi*phat/rho
. quie summarize tauATT, meanonly
. scalar tauATT0 = r(mean)
.
. simulate tauATTb=r(tauATT), seed(10101) reps(999) nodots saving(bdata,replace): onebootrep
      command:  onebootrep
      tauATTb:  r(tauATT)

. summarize tauATTb

    Variable |        Obs        Mean    Std. Dev.        Min         Max
-------------+------------------------------------------------------------
     tauATTb |        999   -.2311108    2.913331    -10.3286    8.815673
Introduction
We assume we have a repeated sample on cross-section unit $i$ through time, i.e. $(y_{i1}, x_{i1}'), \ldots, (y_{iT_i}, x_{iT_i}')$. That is, the entire sample can be written as $\{\{y_{it}, x_{it}'\}_{t=1}^{T_i}\}_{i=1}^{n}$ and has two dimensions: $n$ (cross-sectional) and $T := \max_{1\le i\le n} T_i$ (time series). For notational simplicity, we are only concerned with balanced panels, i.e. $T_i = T$, $1 \le i \le n$.$^{48}$ We will only be concerned with situations where $T$ is small and $n$ is large.$^{49}$ As in standard regression analysis, when $x_{it}$ contains lagged values of the dependent variable, i.e. $y_{it-1},\ldots,y_{it-p}$, we will refer to this case as dynamic; when it does not, it will be labeled static.
The data corresponds to a random sample of 595 individuals that
were drawn from the Panel Study of Income Dynamics (PSID) with
T = 7. We are interested in modeling earnings (lwage) with years of
full-time work experience (exp), number of weeks worked (wks) and
years of education (ed).
Stata performs panel data analysis in the long data format rather than the wide data format. After loading the above data, Stata may require the user to set the panel identifier and time variables. There are two command options: the tsset and xtset commands both perform this task. The tsset command is also used to set the time variable and structure when using time-series data.
Figure 11: Time-series plots for each of
the first 20 individuals
         lwage    exp   exp2    wks    ed
  1.   5.56068                   32
  2.   5.72031            16     43
  3.   5.99645            25     40
.
. *--Declaring individual & time identifier
. xtset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 1 to 7
                delta:  1 unit

. tsset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 1 to 7
                delta:  1 unit
. xtdes

      id:  1, 2, ..., 595                                    n =        595
       t:  1, 2, ..., 7                                      T =          7
           Delta(t) = 1 unit
           Span(t)  = 7 periods

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         7       7       7         7         7       7       7

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
      595    100.00  100.00 |  1111111
 ---------------------------+---------
      595    100.00         |  XXXXXXX
The xtdescribe command provides the distribution of cross-sectional participation patterns. The command details whether the panel is balanced or not. The xtdescribe output shows that the above data form a balanced panel with an observation for each participant in every year.
In the second example, we use data on Canadian firms. The
xtdescribe command shows the panel is unbalanced but the most
popular pattern is the 201 firms with observations in every year.
. use cdn4.dta
. xtdes

      id:  1, 2, ..., 2981                                   n =       2977
    year:                                                    T =         14
           Delta(year) = 1 unit
           Span(year)  = 14 periods

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                                                    14      14

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------------
      201      6.75    6.75 |  11111111111111
      153      5.14   11.89 |  1.............
      133      4.47   16.36 |  11............
      129      4.33   20.69 |  .............1
      109      3.66   24.35 |  111...........
      102      3.43   27.78 |  ............11
       74      2.49   30.27 |  ......1.......
       73      2.45   32.72 |  .........11111
       72      2.42   35.14 |  ...........111
     1931     64.86  100.00 |  (other patterns)
 ---------------------------+----------------
     2977    100.00         |  XXXXXXXXXXXXXX
$^{50}$ A more general specification also allows for time effects, $u_{it} = \alpha_i + \gamma_t + \varepsilon_{it}$; see Hsiao (2003) and Baltagi (2008) for details.

In what follows we keep only the individual effect, so that
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \quad i = 1,\ldots,n,\ t = 1,\ldots,T, \qquad (29)$$
$$u_{it} := \alpha_i + \varepsilon_{it}, \qquad (30)$$
where $\{\alpha_i\}_{i=1}^{n}$ represent random variables that capture unobserved heterogeneity, and $\varepsilon_{it}$ is i.i.d. over $i$ and $t$ and uncorrelated with $\alpha_i$, such that $E[\varepsilon_{it}\,|\,\alpha_i, x_{i1},\ldots,x_{iT}] = 0$,$^{51}$ $\sigma_\alpha^2 := V(\alpha_i)$, and $\sigma_\varepsilon^2 := V(\varepsilon_{it})$.
Basic Framework
The simple linear regression model in (29) can be written in matrix form as
$$y = [I_n \otimes \iota_T]\,\alpha + X\beta + \varepsilon, \qquad (31)$$
where $I_n$ is the $n\times n$ identity matrix, $\iota_T$ is the $T\times 1$ vector of ones, $\otimes$ represents the Kronecker product,$^{52}$ $y = [y_1', \ldots, y_n']'$, $X = [X_1', \ldots, X_n']'$, $\varepsilon = [\varepsilon_1', \ldots, \varepsilon_n']'$,$^{53}$ and $\alpha = [\alpha_1, \ldots, \alpha_n]'$, $\beta = [\beta_1, \ldots, \beta_k]'$. We will also use two idempotent and symmetric matrices
$$P = [I_n \otimes T^{-1}\iota_T\iota_T'], \qquad Q = I_{nT} - P. \qquad (32)$$

$^{52}$ If $A$ is an $m\times n$ matrix and $B$ is a $p\times q$ matrix, then the Kronecker product $A\otimes B$ is the $mp\times nq$ block matrix
$$A\otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}.$$

$^{53}$ $y_i = \begin{bmatrix} y_{i1}\\ y_{i2}\\ \vdots\\ y_{iT}\end{bmatrix}$, $\quad \varepsilon_i = \begin{bmatrix} \varepsilon_{i1}\\ \varepsilon_{i2}\\ \vdots\\ \varepsilon_{iT}\end{bmatrix}$, $\quad X_i = \begin{bmatrix} x_{i1}'\\ x_{i2}'\\ \vdots\\ x_{iT}'\end{bmatrix}$.

$^{54}$ This is known as the Population-Averaged model in Statistics.
In the random-effects (population-averaged$^{54}$) treatment of (29), define the composite error
$$v_{it} := \alpha_i + \varepsilon_{it},$$
where $E[v_{it}\,|\,x_{i1},\ldots,x_{iT}] = 0$, and therefore$^{55}$
$$V(v) = E[vv'] := \Omega = \sigma_\varepsilon^2 I_{nT} + \sigma_\alpha^2\,[I_n \otimes \iota_T\iota_T'],$$
that is, $\Omega$ is block diagonal with $n$ identical $T\times T$ blocks,
$$\Omega = I_n \otimes \Sigma, \qquad
\Sigma = \begin{bmatrix}
\sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2\\
\sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_\varepsilon^2
\end{bmatrix},
\qquad (33)$$
and all blocks off the diagonal of $\Omega$ are zero.
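As a quick numerical illustration of these objects (our sketch; n, T and the variance values are arbitrary), the following Mata lines build P, Q and Omega with Kronecker products and check that P and Q are idempotent:

. mata:
: n = 3
: T = 4
: iota = J(T, 1, 1)                       // T x 1 vector of ones
: P = I(n) # (iota*iota'/T)               // # is Mata's Kronecker product
: Q = I(n*T) - P
: mreldif(P*P, P), mreldif(Q*Q, Q)        // both (numerically) zero: idempotent
: sa2 = 2
: se2 = 1                                 // sigma_alpha^2 and sigma_eps^2
: Omega = se2*I(n*T) + sa2*(I(n) # (iota*iota'))
: end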
Between Estimator
Notice that by taking the sample means across time in (29), we obtain the representation
$$\bar y_i = \bar x_i'\beta + (\alpha_i + \bar\varepsilon_i).$$
The BE estimator is the OLS estimator in this model. In matrix form, it is simply the OLS regression of the transformed model $\tilde y = \widetilde W\theta + \tilde v$, where $\tilde y = Py$, $\widetilde W = PW$, $\tilde v = Pv$, i.e.
$$\widehat\theta_{BE} = (W'PW)^{-1}W'Py, \qquad (34)$$
$$V(\widehat\theta_{BE}\,|\,X) = (W'PW)^{-1}W'P\,\Omega\,PW(W'PW)^{-1}. \qquad (35)$$
The (random-effects) GLS estimator is
$$\widehat\theta_{GLS} = (W'\Omega^{-1}W)^{-1}W'\Omega^{-1}y,
\quad\text{and}\quad
V(\widehat\theta_{GLS}\,|\,X) = (W'\Omega^{-1}W)^{-1}.$$
Nerlove [1971] showed that
$$\sigma_\varepsilon\,\Omega^{-1/2} = I_{nT} - (1-\psi)P = Q + \psi P,
\quad\text{where}\quad
\psi := \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}. \qquad (36)$$
--------------------------------------------
    Variable |    OLS        BE         RE
-------------+------------------------------
         exp |   0.045      0.038      0.089
             |   0.005      0.006      0.003
        exp2 |  -0.001     -0.001     -0.001
             |   0.000      0.000      0.000
         wks |   0.006      0.013      0.001
             |   0.002      0.004      0.001
          ed |   0.076      0.074      0.112
             |   0.005      0.005      0.006
       _cons |   4.908      4.683      3.829
             |   0.140      0.210      0.094
--------------------------------------------
                               legend: b/se

The statistic computed by xttest0 is the Breusch and Pagan (1980) Lagrange multiplier statistic for $H_0:\sigma_\alpha^2=0$,
$$LM = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\big(\sum_{t=1}^{T}\widehat v_{it}\big)^{2}}{\sum_{i=1}^{n}\sum_{t=1}^{T}\widehat v_{it}^{2}} - 1\right]^{2}
\;\xrightarrow{\ d\ }\; \chi^2_1.$$
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        lwage[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                   lwage |   .2129935       .4615122
                       e |   .0231658       .1522032
                       u |   .1020921       .3195186

        Test:   Var(u) = 0
                            chibar2(01) =   5192.13
                         Prob > chibar2 =    0.0000
Within Estimator
As pointed out above, the fixed-effects model permits identification of $\beta$ for the time-varying regressors by treating the $\alpha_i$ as fixed parameters when estimating equation (29), or in matrix form (31). The W estimator can therefore be obtained by simple OLS regression on (31), i.e. $\widehat\delta_W = (V'V)^{-1}V'y$ and $V(\widehat\delta_W\,|\,X) = \sigma_\varepsilon^2(V'V)^{-1}$, where $V := [I_n\otimes\iota_T,\, X]$ and $\delta := [\alpha', \beta']'$. Both $\widehat\delta_W$ and $V(\widehat\delta_W\,|\,X)$ require inverting the $(n+k)\times(n+k)$ matrix $V'V$, which might be an undesirable feature when $n$ is large. Fortunately, one can compute the estimator of $\beta$ by means of the partitioned-regression result:$^{58}$
$$\widehat\beta_W = [X'M_{[I_n\otimes\iota_T]}X]^{-1}X'M_{[I_n\otimes\iota_T]}y, \qquad (37)$$
where $M_{[I_n\otimes\iota_T]} = I_{nT} - [I_n\otimes\iota_T]\big([I_n\otimes\iota_T]'[I_n\otimes\iota_T]\big)^{-1}[I_n\otimes\iota_T]' = Q$.
First-Difference Estimator
Define the first-difference transformation $D := I_n \otimes d$, where $d$ is of dimension $(T-1)\times T$.$^{60}$ Then the FD estimator can be obtained by running OLS on the transformed model $\tilde y = \widetilde X\beta + \tilde\varepsilon$, where $\tilde y = Dy$, $\widetilde X = DX$, $\tilde\varepsilon = D\varepsilon$:$^{61}$
$$\widehat\beta_{FD} = [\widetilde X'\widetilde X]^{-1}\widetilde X'\tilde y,
\qquad
V(\widehat\beta_{FD}\,|\,X) = (\widetilde X'\widetilde X)^{-1}\widetilde X'\,D\,V(\varepsilon|X)\,D'\,\widetilde X(\widetilde X'\widetilde X)^{-1}.$$
$^{60}$ $d = \begin{bmatrix}
-1 & 1 & 0 & 0 & \cdots & 0 & 0\\
0 & -1 & 1 & 0 & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & & \vdots\\
0 & 0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix}.$
$^{61}$ Note that $D\,[I_n\otimes\iota_T]\,\alpha = 0$, so first-differencing removes the individual effects.
estimator, or the Covariance estimator. Both the W/FE and FD estimators are consistent under both the random-effects and fixed-effects models. However, they are inefficient in the random-effects case.
Remark 26 For $T = 2$, the FD and W/FE estimators are equal because $\bar y_i = (y_{i1}+y_{i2})/2$, so $(y_{i1}-\bar y_i) = -(y_{i2}-y_{i1})/2$ and $(y_{i2}-\bar y_i) = (y_{i2}-y_{i1})/2$, and similarly for $x_i$; but when $T > 2$ the two estimators differ.
Remark 27 The GLS estimator of the transformed model $\tilde y = \widetilde X\beta + \tilde\varepsilon$, where $\tilde y = Dy$, $\widetilde X = DX$, $\tilde\varepsilon = D\varepsilon$, equals the W/FE estimator. Therefore, since the FD estimator is simply OLS on this transformed model, the FD estimator is less efficient than the W/FE estimator.
Remark 28 Unless further assumptions are made, the W/FE and FD estimators do not identify the coefficients of time-invariant regressors. However, this apparent shortcoming also provides a robustness property: the W/FE and FD estimators are robust to time-invariant omitted-variable bias.
. *--Fixed-Effect Estimator
. quietly xtreg lwage exp exp2 wks ed, fe
. predict alpha_fe, u
. predict e_fe, e
. estimates store FE
. mata: beta_FE = st_matrix("e(b)")
.
. *--First-Difference Estimator
. sort id t
. quietly regress D.(lwage exp exp2 wks ed), vce(cluster id) noconstant
. mata: beta_FD = (st_matrix("e(b)"),0)
.
. *--Comparison of W/FE & FD estimates
. mata: beta_coef = beta_FE,beta_FD
. mata: beta_coef
                  1              2
    +-------------------------------+
  1 |   .1137878577    .1170653986  |
  2 |  -.0004243693   -.0005321208  |
  3 |   .0008358763   -.0002682652  |
  4 |             0              0  |
  5 |   4.596396197              0  |
    +-------------------------------+
Figure 13: Histogram and estimated density of the estimated individual-specific effects $\{\widehat\alpha_i\}$; time-series plots, $t = 1,\ldots,7$, of the residuals $\{\widehat\varepsilon_{it}\}$ for each of the first 20 individuals, implied by the W/FE estimator.
$$\widehat q_2'\,[\mathrm{Asy.\,Var}(\widehat q_2)]^{-1}\widehat q_2
= [\widehat\beta_{BE}-\widehat\beta_{RE}]'\,[\mathrm{Asy.\,Var}(\widehat\beta_{BE}) - \mathrm{Asy.\,Var}(\widehat\beta_{RE})]^{-1}[\widehat\beta_{BE}-\widehat\beta_{RE}],$$
$$\widehat q_3'\,[\mathrm{Asy.\,Var}(\widehat q_3)]^{-1}\widehat q_3
= [\widehat\beta_{W}-\widehat\beta_{BE}]'\,[\mathrm{Asy.\,Var}(\widehat\beta_{W}) + \mathrm{Asy.\,Var}(\widehat\beta_{BE})]^{-1}[\widehat\beta_{W}-\widehat\beta_{BE}].$$
These three versions can be shown to be numerically identical if all the regressors are time-varying. However, in practice, because of rounding errors and the fact that we have to use feasible GLS for the RE estimation, the three versions may give slightly different results.
. *--Hausman test assuming RE is fully efficient under H0
. hausman FE RE, sigmamore
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       FE           RE         Difference          S.E.
-------------+----------------------------------------------------------------
         exp |    .1137879     .0888609        .0249269        .0012778
        exp2 |   -.0004244    -.0007726        .0003482        .0000285
         wks |    .0008359     .0009658       -.0001299        .0001108
------------------------------------------------------------------------------

           Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B) =    1513.02
                Prob>chi2 =      0.0000

A regression-based alternative augments the RE specification with the time averages of the time-varying regressors (here mexp, mexp2 and mwks) and tests their joint significance:

 ( 1)  mexp = 0
 ( 2)  mexp2 = 0
 ( 3)  mwks = 0

           chi2(  3) =  1792.41
         Prob > chi2 =   0.0000
Endogeneity
So far we have made the strict exogeneity assumption, $E[\varepsilon_{it}\,|\,\alpha_i, x_{i1},\ldots,x_{iT}] = 0$, which can be very strong and is often violated in economic problems.
We are now interested in modeling earnings (lwage) with a larger set of covariates. Time-varying regressors include exp, exp2, wks, marital status (ms), an indicator for whether wages were set by a union contract (union), an indicator for blue-collar occupation (occ), indicators for residence (south and smsa), and an indicator for whether the person works in a manufacturing industry (ind). Time-invariant covariates are ed, a gender indicator (fem) and a race indicator (blk).
$$y_{it} = x_{1it}'\beta_1 + x_{2it}'\beta_2 + z_{1i}'\gamma_1 + z_{2i}'\gamma_2 + \alpha_i + \varepsilon_{it}, \qquad (39)$$
where
$x_{1it}$ represents $k_1$ variables that are time-varying and uncorrelated with $\alpha_i$;
$z_{1i}$ represents $l_1$ variables that are time-invariant and uncorrelated with $\alpha_i$;
$x_{2it}$ represents $k_2$ variables that are time-varying and correlated with $\alpha_i$;
$z_{2i}$ represents $l_2$ variables that are time-invariant and correlated with $\alpha_i$.
The assumptions about the random terms in the model are$^{64}$
$$E[\alpha_i\,|\,x_{1it}, z_{1i}] = 0 \quad\text{though}\quad E[\alpha_i\,|\,x_{2it}, z_{2i}] \ne 0,$$
$$V[\alpha_i\,|\,x_{1it}, z_{1i}, x_{2it}, z_{2i}] = \sigma_\alpha^2,$$
$$\mathrm{cov}(\alpha_i, \varepsilon_{it}\,|\,x_{1it}, z_{1i}, x_{2it}, z_{2i}) = 0,$$
$$V[\alpha_i + \varepsilon_{it}\,|\,x_{1it}, z_{1i}, x_{2it}, z_{2i}] = \sigma_\alpha^2 + \sigma_\varepsilon^2 =: \sigma^2,$$
$$\mathrm{corr}(\alpha_i + \varepsilon_{it},\, \alpha_i + \varepsilon_{is}\,|\,x_{1it}, z_{1i}, x_{2it}, z_{2i}) = \sigma_\alpha^2/\sigma^2 =: \rho.$$
$^{64}$ Note the crucial underlying assumption that one can distinguish between the sets of variables (both time-varying and time-invariant) that are correlated or not with $\alpha_i$.
$^{65}$ We set $x_{1it} = [\text{occ}_{it}, \text{south}_{it}, \text{smsa}_{it}, \text{ind}_{it}]'$, $x_{2it} = [\text{exp}_{it}, \text{exp2}_{it}, \text{wks}_{it}, \text{ms}_{it}, \text{union}_{it}]'$, $z_{1i} = [\text{fem}_i, \text{blk}_i]'$ and $z_{2i} = [\text{ed}_i]$.
Then, the GLS transformation of (39) becomes$^{65}$
$$y_{it} - (1-\psi)\bar y_i = (x_{1it} - (1-\psi)\bar x_{1i})'\beta_1 + (x_{2it} - (1-\psi)\bar x_{2i})'\beta_2
+ \psi\,z_{1i}'\gamma_1 + \psi\,z_{2i}'\gamma_2 + \psi\,\alpha_i + \varepsilon_{it} - (1-\psi)\bar\varepsilon_i.$$
and define $W := [W_1', \ldots, W_n']'$, where $W_i = [w_{i1}, \ldots, w_{iT}]'$. Since the RE estimator is inconsistent, because $x_{2it}$ and $z_{2i}$ are correlated with the $\alpha_i$ still present in the above specification, Hausman and Taylor [1981] proposed using the following instrumental variables$^{66}$
$$v_{it} := \big[(x_{1it} - \bar x_{1i})',\, (x_{2it} - \bar x_{2i})',\, z_{1i}',\, \bar x_{1i}'\big]'.$$
Let $y^{*} = [y_1^{*\prime}, \ldots, y_n^{*\prime}]'$ with $y_i^{*} := [\,y_{i1} - (1-\psi)\bar y_i, \ldots, y_{iT} - (1-\psi)\bar y_i\,]'$, and let $W^{*}$ denote the analogously quasi-demeaned regressor matrix. Then the instrumental-variables estimator is$^{67}$
$$[\widehat\beta_{HT}',\, \widehat\gamma_{HT}']' = \big[W^{*\prime}V(V'V)^{-1}V'W^{*}\big]^{-1}W^{*\prime}V(V'V)^{-1}V'y^{*}.$$
$^{67}$ Notice that the IV estimator is consistent but inefficient if the data are not weighted, that is, if $W$ rather than $W^{*}$ is used in the computation.
Hausman-Taylor estimation                       Number of obs     =      4165
Group variable: id                              Number of groups  =       595

                                                Obs per group: min =        7
                                                               avg =        7
                                                               max =        7

                                                Wald chi2(12)     =   6891.87
                                                Prob > chi2       =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
TVexogenous  |
         occ |  -.0207047   .0137809    -1.50   0.133    -.0477149    .0063055
       south |   .0074398    .031955     0.23   0.816    -.0551908    .0700705
        smsa |  -.0418334   .0189581    -2.21   0.027    -.0789906   -.0046761
         ind |   .0136039   .0152374     0.89   0.372    -.0162608    .0434686
TVendogenous |
         exp |   .1131328    .002471    45.79   0.000     .1082898    .1179758
        exp2 |  -.0004189   .0000546    -7.67   0.000    -.0005259   -.0003119
         wks |   .0008374   .0005997     1.40   0.163    -.0003381    .0020129
          ms |  -.0298508     .01898    -1.57   0.116    -.0670508    .0073493
       union |   .0327714   .0149084     2.20   0.028     .0035514    .0619914
TIexogenous  |
         fem |  -.1309236    .126659    -1.03   0.301    -.3791707    .1173234
         blk |  -.2857479   .1557019    -1.84   0.066    -.5909179    .0194221
TIendogenous |
          ed |    .137944   .0212485     6.49   0.000     .0962977    .1795902
             |
       _cons |   2.912726   .2836522    10.27   0.000     2.356778    3.468674
-------------+----------------------------------------------------------------
     sigma_u |  .94180304
     sigma_e |  .15180273
         rho |  .97467788   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note: TV refers to time varying; TI refers to time invariant.
$$y_{it} = \gamma\,y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad (40)$$
$$y_{it} - \bar y_i = \gamma\,(y_{i,t-1} - \bar y_{i,-1}) + (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i), \qquad (41)$$
$$\Delta y_{it} = \gamma\,\Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}. \qquad (42)$$
The specifications given in (41) and (42) remove the unobserved heterogeneity parameter. Unfortunately, running OLS or GLS on (40), (41) or (42) yields inconsistent estimates.$^{68}$ Therefore IV and GMM estimation can be used instead. Specification (42) is the most popular version of (40) as it gets rid of $\alpha_i$ (as well as any other time-invariant covariate).
$^{68}$ This is so because
$$(40):\ \mathrm{cov}(y_{it-1},\, \alpha_i + \varepsilon_{it}) \ne 0, \qquad
(41):\ \mathrm{cov}(y_{it-1} - \bar y_i,\, \varepsilon_{it} - \bar\varepsilon_i) \ne 0, \qquad
(42):\ \mathrm{cov}(y_{it-1} - y_{it-2},\, \varepsilon_{it} - \varepsilon_{it-1}) \ne 0.$$
$$\widehat\delta_{IV} = \left[\Big(\sum_{i=1}^{n}\widetilde X_i'Z_i\Big)\Big(\sum_{i=1}^{n}Z_i'Z_i\Big)^{-1}\Big(\sum_{i=1}^{n}Z_i'\widetilde X_i\Big)\right]^{-1}
\Big(\sum_{i=1}^{n}\widetilde X_i'Z_i\Big)\Big(\sum_{i=1}^{n}Z_i'Z_i\Big)^{-1}\Big(\sum_{i=1}^{n}Z_i'\widetilde y_i\Big).^{69}$$
$^{69}$ The available sequential moment conditions on the differenced errors are
$$t=3:\ E[\Delta\varepsilon_{i3}\,|\,y_{i1}] = 0$$
$$t=4:\ E[\Delta\varepsilon_{i4}\,|\,y_{i1}] = 0,\ E[\Delta\varepsilon_{i4}\,|\,y_{i2}] = 0$$
$$t=5:\ E[\Delta\varepsilon_{i5}\,|\,y_{i1}] = 0,\ E[\Delta\varepsilon_{i5}\,|\,y_{i2}] = 0,\ E[\Delta\varepsilon_{i5}\,|\,y_{i3}] = 0$$
$$\vdots$$
$$t=T:\ E[\Delta\varepsilon_{iT}\,|\,y_{i1}] = 0,\ \ldots,\ E[\Delta\varepsilon_{iT}\,|\,y_{iT-2}] = 0.$$
When $x_{it}$ is strictly exogenous, the instrument matrix is
$$Z_i = \begin{bmatrix}
(y_{i1},\, x_{i1}',\ldots,x_{iT}') & 0 & \cdots & 0\\
0 & (y_{i1}, y_{i2},\, x_{i1}',\ldots,x_{iT}') & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & (y_{i1},\ldots,y_{iT-2},\, x_{i1}',\ldots,x_{iT}')
\end{bmatrix}.$$
When $x_{it}$ is predetermined,
$$Z_i = \begin{bmatrix}
(y_{i1},\, x_{i1}', x_{i2}') & 0 & \cdots & 0\\
0 & (y_{i1}, y_{i2},\, x_{i1}', x_{i2}', x_{i3}') & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & (y_{i1},\ldots,y_{iT-2},\, x_{i1}',\ldots,x_{iT-1}')
\end{bmatrix}.$$
Note: lags of $x_{it}$ and $\Delta x_{it}$ can additionally be used as instruments, and that is exactly what Stata does.
When $x_{it}$ is endogenous,
$$Z_i = \begin{bmatrix}
(y_{i1},\, x_{i1}') & 0 & \cdots & 0\\
0 & (y_{i1}, y_{i2},\, x_{i1}', x_{i2}') & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & (y_{i1},\ldots,y_{iT-2},\, x_{i1}',\ldots,x_{iT-2}')
\end{bmatrix}.$$
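The xtabond call behind the next set of results was lost in this copy; based on the option description that follows the output and the parallel xtdpdsys call later, it was of the form (reconstructed, treat as an assumption):

. xtabond lwage occ south smsa ind, lags(2) maxldep(3) pre(wks,lag(1,2)) endoge
> nous(ms,lag(0,2)) endogenous(union,lag(0,2)) twostep vce(robust) artests(3)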
Arellano-Bond dynamic panel-data estimation     Number of obs     =      2380
Group variable: id                              Number of groups  =       595
Time variable: t
                                                Obs per group: min =        4
                                                               avg =        4
                                                               max =        4

Number of instruments =     40                  Wald chi2(10)     =   1287.77
                                                Prob > chi2       =    0.0000
Two-step results
                             (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .611753   .0373491    16.38   0.000     .5385501    .6849559
         L2. |   .2409058   .0319939     7.53   0.000     .1781989    .3036127
             |
         wks |
         --. |  -.0159751   .0082523    -1.94   0.053    -.0321493     .000199
         L1. |   .0039944   .0027425     1.46   0.145    -.0013807    .0093695
             |
          ms |   .1859324    .144458     1.29   0.198       -.0972    .4690649
       union |  -.1531329   .1677842    -0.91   0.361    -.4819839    .1757181
         occ |  -.0357509   .0347705    -1.03   0.304    -.1038999     .032398
       south |  -.0250368   .2150806    -0.12   0.907     -.446587    .3965134
        smsa |  -.0848223   .0525243    -1.61   0.106     -.187768    .0181235
         ind |   .0227008   .0424207     0.54   0.593    -.0604422    .1058437
       _cons |   1.639999   .4981019     3.29   0.001     .6637377    2.616261
------------------------------------------------------------------------------
Options: lags(2) means that lwage$_{it-1}$ and lwage$_{it-2}$ are regressors; maxldep(3) means that at most 3 lags of lwage$_{it}$ are used as instruments; pre(wks,lag(1,2)) means that wks$_{it-1}$ is a predetermined regressor and that up to 2 lags are used as instruments. If wks$_{it-1}$ is predetermined, then this variable should not itself serve as an instrument, due to the resulting correlation in (42); the two lag instruments are then wks$_{it-2}$ and wks$_{it-3}$. Option endogenous(,lag(0,2)) means that the named variable is an endogenous regressor that appears on the right-hand side at its level (lag 0) and that up to 2 lags are used as instruments.
$$z_j = \frac{\bar r_j}{[\widehat V(\bar r_j)/n]^{1/2}},
\qquad
\bar r_j = \frac{1}{T-3-j}\sum_{t=4+j}^{T}\widehat r_{tj},
\qquad
\widehat r_{tj} = n^{-1}\sum_{i=1}^{n}\widehat{\Delta\varepsilon}_{it}\,\widehat{\Delta\varepsilon}_{i,t-j},$$
where $\bar r_j$ is the sample counterpart of $r_j$ based on the first-difference residuals $\widehat{\Delta\varepsilon}_{it}$ and $\widehat{\Delta\varepsilon}_{i,t-j}$, and the expression for $\widehat V(\bar r_j)$ is given by equation 6.158, page 122, in Arellano [2003].$^{70}$ Under $H_0$, Arellano & Bond (1991) showed that $z_j \rightarrow_d N(0,1)$.

. estat abond

Arellano-Bond test for zero autocorrelation in first-differenced errors
  +-----------------------+
  |Order |  z     Prob > z |
  |------+----------------|
  |   1  |-4.5244  0.0000  |
  |   2  |-1.6041  0.1087  |
  |   3  | .35729  0.7209  |
  +-----------------------+
   H0: no autocorrelation
. estat sargan

Sargan test of overidentifying restrictions
        H0: overidentifying restrictions are valid

        chi2(29)     =  39.87571
        Prob > chi2  =    0.0860

The Blundell and Bond (1998) system GMM estimator uses moment conditions for both the differenced equation (42) and the level equation (40). This is particularly important for very persistent data, as the instruments available for (42) are then more likely to be weak and to induce finite-sample bias; indeed, the (42)-based moment conditions cannot be used to identify $\gamma$ when there is a unit root. The additional moment conditions create instruments that are not weak, reducing this potential bias.
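To be concrete about what the system estimator adds: under mean stationarity of the initial conditions, lagged differences become valid instruments for the levels equation (40), giving extra moment conditions of the form (our statement of the standard Blundell-Bond conditions, not reproduced from the original notes):
$$E[\Delta y_{i,t-1}\,(\alpha_i + \varepsilon_{it})] = 0, \qquad E[\Delta x_{it}\,(\alpha_i + \varepsilon_{it})] = 0, \qquad t = 3, \ldots, T.$$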
. *--Blundell & Bonds (1998) Estimator
. xtdpdsys lwage occ south smsa ind, lags(2) maxldep(3) pre(wks,lag(1,2)) endoge
> nous(ms,lag(0,2)) endogenous(union,lag(0,2)) twostep vce(robust) artests(3)
System dynamic panel-data estimation            Number of obs     =      2975
Group variable: id                              Number of groups  =       595
Time variable: t
                                                Obs per group: min =        5
                                                               avg =        5
                                                               max =        5

Number of instruments =     60                  Wald chi2(10)     =   2270.88
                                                Prob > chi2       =    0.0000
Two-step results
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |   .6017533   .0291502    20.64   0.000     .5446199    .6588866
         L2. |   .2880537   .0285319    10.10   0.000     .2321322    .3439752
             |
         wks |
         --. |  -.0014979   .0056143    -0.27   0.790    -.0125017     .009506
         L1. |   .0006786   .0015694     0.43   0.665    -.0023973    .0037545
             |
          ms |   .0395337   .0558543     0.71   0.479    -.0699386    .1490061
       union |  -.0422409   .0719919    -0.59   0.557    -.1833423    .0988606
         occ |  -.0508803   .0331149    -1.54   0.124    -.1157843    .0140237
       south |  -.1062817    .083753    -1.27   0.204    -.2704346    .0578713
        smsa |  -.0483567   .0479016    -1.01   0.313    -.1422422    .0455288
         ind |   .0144749    .031448     0.46   0.645    -.0471621    .0761118
       _cons |   .9584113   .3632287     2.64   0.008     .2464961    1.670327
------------------------------------------------------------------------------
The options match those described in the previous discussion of the xtabond command.
When the cross-section dimension is small, the Blundell and Bond system GMM estimator typically yields downward-biased standard errors, which affects inference. A finite-sample correction can be used to address this problem: Windmeijer$^{71}$ proposes a variance estimator that allows for heteroskedasticity-consistent standard errors and corrects for this potential finite-sample bias. The option vce(robust) provides the Windmeijer-corrected standard errors.
Serially uncorrelated $\varepsilon_{it}$ and overidentification are also considerations when using the Blundell and Bond system GMM estimator. Fortunately, the post-estimation Stata commands to test for serial correlation and overidentification extend to the xtdpdsys command. Therefore, estat abond provides the test statistics for autocorrelation of various orders, while estat sargan provides the test statistic for the Sargan overidentification test. However, the Sargan statistic is unavailable when using the Windmeijer-corrected standard errors.
Notice that we must specify all the regressors of the model, including the lagged dependent variable and any other endogenous variables. Option noleveleq specifies not to instrument the level equation, which means that the results correspond to the Arellano-Bond estimator. Option iv() lists the exogenous variables in the model and any other variables serving as instruments. Option gmm(,lag(a b)) indicates that the named variable is an endogenous or predetermined regressor and that its lags a through b are used as GMM-style instruments. Option gmm() also has a suboption collapse, which restricts the instrument set to one instrument for each variable and lag distance, rather than one for each time period, variable, and lag distance. This restriction is useful for two reasons. First, a bias arises in small samples as the number of instruments moves towards the number of observations, due to overfitting of the model. Second, the suboption collapse reduces the width of the instrument matrix, which lowers the computational burden and prevents the instrument matrix from exceeding Stata's size limits. The above syntax treats the variable wks$_{it-1}$ as predetermined and instruments it with its two lags (wks$_{it-2}$, wks$_{it-3}$). Equivalently, we could treat the variable wks$_{it}$ as predetermined and instrument with its second and third lags (wks$_{it-2}$, wks$_{it-3}$). The same estimation can also be carried out with the user-written xtabond2 command.
.
. xtabond2 lwage l(1/2).lwage l(0/1).wks ms union occ south smsa ind, gmm(ms,lag
> (2 3)) gmm(union,lag(2 3)) twostep artests(3) noleveleq iv(occ south smsa ind)
Favoring space over speed. To switch, type or click on mata: mata set matafavor
> speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step
  estimation.
  Difference-in-Sargan statistics may be negative.
Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =      2380
Time variable : t                               Number of groups   =       595
Number of instruments = 39                      Obs per group: min =         4
Wald chi2(10) =    1287.77                                     avg =      4.00
Prob > chi2   =      0.000                                     max =         4
------------------------------------------------------------------------------
             |              Corrected
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .611753   .0373491    16.38   0.000     .5385501    .6849559
         L2. |   .2409058   .0319939     7.53   0.000     .1781989    .3036127
             |
         wks |
         --. |  -.0159751   .0082523    -1.94   0.053    -.0321493     .000199
         L1. |   .0039944   .0027425     1.46   0.145    -.0013807    .0093695
             |
          ms |   .1859324    .144458     1.29   0.198       -.0972    .4690649
       union |  -.1531329   .1677842    -0.91   0.361    -.4819839    .1757181
         occ |  -.0357509   .0347705    -1.03   0.304    -.1038999     .032398
       south |  -.0250368   .2150806    -0.12   0.907     -.446587    .3965134
        smsa |  -.0848223   .0525243    -1.61   0.106     -.187768    .0181235
         ind |   .0227008   .0424207     0.54   0.593    -.0604422    .1058437
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.52  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -1.60  Pr > z =  0.109
Arellano-Bond test for AR(3) in first differences: z =   0.36  Pr > z =  0.721
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(29)   =  59.55  Prob > chi2 =  0.001
Hansen test of overid. restrictions: chi2(29)   =  39.88  Prob > chi2 =  0.086

Difference-in-Hansen tests of exogeneity of instrument subsets:
  Hansen test excluding group:     chi2(21)   =  32.35  Prob > chi2 =  0.054
  Difference (null H = exogenous): chi2(8)    =   7.52  Prob > chi2 =  0.481
  Hansen test excluding group:     chi2(21)   =  21.15  Prob > chi2 =  0.450
  Difference (null H = exogenous): chi2(8)    =  18.73  Prob > chi2 =  0.016
  Hansen test excluding group:     chi2(21)   =  31.14  Prob > chi2 =  0.071
  Difference (null H = exogenous): chi2(8)    =   8.74  Prob > chi2 =  0.365
  Hansen test excluding group:     chi2(18)   =  23.59  Prob > chi2 =  0.169
  Difference (null H = exogenous): chi2(11)   =  16.29  Prob > chi2 =  0.131
  Hansen test excluding group:     chi2(25)   =  28.00  Prob > chi2 =  0.308
  Difference (null H = exogenous): chi2(4)    =  11.87  Prob > chi2 =  0.018
> ) twostep
Dynamic panel-data estimation                   Number of obs     =      2975
Group variable: id                              Number of groups  =       595
Time variable: t
                                                Obs per group: min =        5
                                                               avg =        5
                                                               max =        5

Number of instruments =     40                  Wald chi2(10)     =   1640.91
                                                Prob > chi2       =    0.0000
Two-step results
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .611753   .0251464    24.33   0.000      .562467     .661039
         L2. |   .2409058   .0217815    11.06   0.000      .198215    .2835967
             |
         wks |
         --. |  -.0159751   .0067113    -2.38   0.017     -.029129   -.0028212
         L1. |   .0039944   .0020621     1.94   0.053    -.0000472     .008036
             |
          ms |   .1859324   .1263155     1.47   0.141    -.0616413    .4335062
       union |  -.1531329   .1345067    -1.14   0.255    -.4167613    .1104955
         occ |  -.0357509   .0303114    -1.18   0.238    -.0951602    .0236583
       south |  -.0250368   .1537619    -0.16   0.871    -.3264046    .2763309
        smsa |  -.0848223   .0477614    -1.78   0.076    -.1784329    .0087884
         ind |   .0227008     .03597     0.63   0.528    -.0477991    .0932006
       _cons |   1.639999   .3656413     4.49   0.000     .9233556    2.356643
------------------------------------------------------------------------------
Dynamic panel-data estimation                   Number of obs     =      2975
Group variable: id                              Number of groups  =       595
Time variable: t
                                                Obs per group: min =        5
                                                               avg =        5
                                                               max =        5

Number of instruments =     60                  Wald chi2(10)     =   2270.88
                                                Prob > chi2       =    0.0000
Two-step results
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |   .6017533   .0291502    20.64   0.000     .5446199    .6588866
         L2. |   .2880537   .0285319    10.10   0.000     .2321322    .3439752
             |
         wks |
         --. |  -.0014979   .0056143    -0.27   0.790    -.0125017     .009506
         L1. |   .0006786   .0015694     0.43   0.665    -.0023973    .0037545
             |
          ms |   .0395337   .0558543     0.71   0.479    -.0699386    .1490061
       union |  -.0422409   .0719919    -0.59   0.557    -.1833423    .0988606
         occ |  -.0508803   .0331149    -1.54   0.124    -.1157843    .0140237
       south |  -.1062817    .083753    -1.27   0.204    -.2704346    .0578713
        smsa |  -.0483567   .0479016    -1.01   0.313    -.1422422    .0455288
         ind |   .0144749    .031448     0.46   0.645    -.0471621    .0761118
       _cons |   .9584113   .3632287     2.64   0.008     .2464961    1.670327
------------------------------------------------------------------------------
When using xtdpd, the distinction between the difference GMM estimator of Arellano-Bond and the system GMM estimator of Blundell-Bond is made by instrumenting either the difference equation (42) alone or both the difference and levels (40) equations, through the options. The option dgmmiv(,lagrange(a b)) specifies that lags a through b of the named variable are used as GMM-type instruments in the difference equation. The option lgmmiv(,lag(a)) specifies that the a-th lag of the differences of the named variable serves as an instrument in the levels equation. Not specifying lgmmiv() results in the Arellano-Bond estimator.
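For concreteness, the two estimators can be requested from xtdpd along the following lines. This is a sketch only: the exact lag ranges and instrument lists are our assumptions, chosen to mirror the xtabond and xtdpdsys runs above, and are not the calls that produced the earlier output.

* Arellano-Bond (difference GMM) via xtdpd
xtdpd L(0/2).lwage L(0/1).wks ms union occ south smsa ind,           ///
    dgmmiv(lwage, lagrange(2 4)) dgmmiv(wks ms union, lagrange(2 3)) ///
    div(occ south smsa ind) twostep vce(robust)

* Blundell-Bond (system GMM): add lagged differences as level-equation instruments
xtdpd L(0/2).lwage L(0/1).wks ms union occ south smsa ind,           ///
    dgmmiv(lwage, lagrange(2 4)) dgmmiv(wks ms union, lagrange(2 3)) ///
    div(occ south smsa ind) lgmmiv(lwage wks ms union) twostep vce(robust)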
Introduction
One standard approach to estimating a fixed-effects model with panel data is to use the within estimator. This approach appeals to the Frisch-Waugh-Lovell theorem by demeaning the data at the individual level to remove the individual effect. The within estimator avoids two problems which arise when trying to estimate a fixed-effects model with a set of dummy variables for individuals. The first is a statistical problem: the inclusion of the individual dummy variables leads to the estimation of an excessive number of nuisance parameters (the incidental-parameters problem), which can cause inconsistency in the estimates of the parameters of interest. The second is a practical problem: the inclusion of individual dummy variables as regressors leads to too many right-hand-side variables for Stata to handle given its matrix limitations.
      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =16924.27
       Model |  34105.8511     2  17052.9256           Prob > F      =  0.0000
    Residual |  1004.57925   997  1.00760205           R-squared     =  0.9714
-------------+------------------------------           Adj R-squared =  0.9713
       Total |  35110.4303   999  35.1455759           Root MSE      =  1.0038

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.019581   .0313022   160.36   0.000     4.958156    5.081007
        firm |   .4994599   .0055049    90.73   0.000     .4886574    .5102625
       _cons |   4.862866   .1699874    28.61   0.000     4.529291     5.19644
------------------------------------------------------------------------------
       panel variable:  firm (balanced)
      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F( 20,   979) = 1692.71
       Model |  34123.6364    20  1706.18182           Prob > F      =  0.0000
    Residual |  986.793935   979  1.00796112           R-squared     =  0.9719
-------------+------------------------------           Adj R-squared =  0.9713
       Total |  35110.4303   999  35.1455759           Root MSE      =   1.004

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.021169   .0315942   158.93   0.000     4.959169    5.083169
             |
        firm |
          2  |   .3819136   .2008479     1.90   0.058    -.0122283    .7760555
          3  |   .7969512   .2009165     3.97   0.000     .4026747    1.191228
          4  |   1.306139    .200848     6.50   0.000     .9119964    1.700281
          5  |   1.794757   .2014689     8.91   0.000     1.399396    2.190117
          6  |   2.637599   .2008385    13.13   0.000     2.243476    3.031723
          7  |   2.699761   .2007947    13.45   0.000     2.305723    3.093798
          8  |   3.361955   .2010312    16.72   0.000     2.967454    3.756457
          9  |   3.956463   .2012489    19.66   0.000     3.561534    4.351392
         10  |   4.187189   .2010149    20.83   0.000      3.79272    4.581659
         11  |    4.74716    .200897    23.63   0.000     4.352921    5.141398
         12  |   5.314797   .2010879    26.43   0.000     4.920184     5.70941
         13  |   5.845618   .2011525    29.06   0.000     5.450879    6.240358
         14  |   6.114079   .2008724    30.44   0.000     5.719889    6.508269
         15  |    6.98943   .2009823    34.78   0.000     6.595024    7.383835
         16  |   7.436002   .2009108    37.01   0.000     7.041737    7.830267
         17  |   7.718788    .200799    38.44   0.000     7.324742    8.112834
         18  |   8.298004   .2008403    41.32   0.000     7.903877    8.692131
         19  |   9.109146   .2008015    45.36   0.000     8.715095    9.503196
         20  |   9.361374   .2009232    46.59   0.000     8.967084    9.755664
             |
       _cons |   5.496404   .2074166    26.50   0.000     5.089372    5.903436
------------------------------------------------------------------------------

. xtreg y x, fe
Fixed-effects (within) regression               Number of obs      =      1000
Group variable: firm                            Number of groups   =        20

R-sq:  within  = 0.9627                         Obs per group: min =        50
       between = 0.0415                                        avg =      50.0
       overall = 0.7351                                        max =        50

                                                F(1,979)           =  25257.78
corr(u_i, Xb)  = -0.0039                        Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.021169   .0315942   158.93   0.000     4.959169    5.083169
       _cons |   10.09926   .1610984    62.69   0.000     9.783122     10.4154
-------------+----------------------------------------------------------------
     sigma_u |  2.958017
     sigma_e |  1.0039727
         rho |  .89670228   (fraction of variance due to u_i)
------------------------------------------------------------------------------
      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =25747.97
       Model |  25458.8598     1  25458.8598           Prob > F      =  0.0000
    Residual |  986.793937   998   .98877148           R-squared     =  0.9627
-------------+------------------------------           Adj R-squared =  0.9626
       Total |  26445.6537   999  26.4721259           Root MSE      =  .99437

------------------------------------------------------------------------------
          yi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xi |   5.021169    .031292   160.46   0.000     4.959763    5.082574
       _cons |   3.06e-09   .0314447     0.00   1.000    -.0617054    .0617054
------------------------------------------------------------------------------
The algorithm generates the variable fe1, which accounts for the effect of the firm variable on the value of y. Therefore, we have:
. reg y x fe1
      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =17238.28
       Model |  34123.6364     2  17061.8182           Prob > F      =  0.0000
    Residual |  986.793935   997  .989763225           R-squared     =  0.9719
-------------+------------------------------           Adj R-squared =  0.9718
       Total |  35110.4303   999  35.1455759           Root MSE      =  .99487

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.021169   .0310239   161.85   0.000     4.960289    5.082048
         fe1 |              .0109121    91.64   0.000     .9785868    1.021413
       _cons |   8.30e-08   .1931887     0.00   1.000     -.379103    .3791032
------------------------------------------------------------------------------
The following generates a random data set of 1,000,000 observations. The x variable is an integer random variable. This new data set has two fixed effects, which for exposition we call (i) a firm effect and (ii) an industry effect. Given the setup of the random data set, firms do switch industries.
global tot_obs "1000000"
global nfirms "20000"
*block1 = tot_obs/nfirms
global block1 "50"
global ind "200"
*block2 = tot_obs/ind
global block2 "5000"
set obs $tot_obs
set seed 20140317
* Create a data set
*************************************************
gen rnd=uniform()
gen x=int(10*uniform())
egen firm=seq(), from(1) to($nfirms) block($block1)
sort rnd
egen ind=seq(), from(1) to($ind) block($block2)
gen y=0.5*firm+5*x-0.5*ind+5*uniform()
sum
*************************************************
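The alternating-demeaning ("zigzag") idea that underlies Guimaraes and Portugal [2010] can be sketched directly in Stata: repeatedly sweep out the firm means and then the industry means of each variable until nothing changes, and then run OLS on the transformed variables. The following is a minimal sketch under our own naming and tolerance choices (it ignores the degrees-of-freedom correction for the standard errors):

foreach v of varlist y x {
    generate double `v'_t = `v'
    local change = 1
    while `change' > 1e-8 {
        generate double old = `v'_t
        bysort firm: egen double m1 = mean(`v'_t)    // sweep out firm means
        replace `v'_t = `v'_t - m1
        bysort ind:  egen double m2 = mean(`v'_t)    // then industry means
        replace `v'_t = `v'_t - m2
        egen double diff = max(abs(`v'_t - old))     // largest change this pass
        local change = diff[1]
        drop m1 m2 diff old
    }
}
regress y_t x_t, noconstant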
      Source |       SS           df        MS            Number of obs =  1000000
-------------+--------------------------------------      F(  3,999996) =
       Model |  8.3344e+12         3   2.7781e+12         Prob > F      =   0.0000
    Residual |  2040669.99    999996   2.04067814         R-squared     =   1.0000
-------------+--------------------------------------      Adj R-squared =   1.0000
       Total |  8.3344e+12    999999   8334370.46         Root MSE      =   1.4285

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.000167   .0004976  1.0e+04   0.000     4.999192    5.001142
         fe1 |              4.95e-07  2.0e+06   0.000      .999999    1.000001
         fe2 |              .0000495  2.0e+04   0.000      .999903    1.000097
       _cons |   1.28e-08   .0036142     0.00   1.000    -.0070837    .0070837
------------------------------------------------------------------------------
r; t=0.17
Transforming variable: x
Variable x converged after 5 Iterations
Checking if model converged - Coefficients for fixed effects should equal 1
Coefficient for id1 --> 1
Coefficient for id2 --> 1.0000002
                                                Number of obs    =     1000000
                                                F(20199, 979800) =    1.98e+08
                                                Prob > F         =      0.0000
                                                R-squared        =      1.0000
                                                Adj R-squared    =      1.0000
                                                Root MSE         =      1.4432

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    5.00006   .0005079  9844.74   0.000     4.999064    5.001055
------------------------------------------------------------------------------
r; t=44.49
Bibliography
Alberto Abadie. Semiparametric difference-in-differences. Review of Economic Studies, 72:1-19, 2005.
T. W. Anderson and Cheng Hsiao. Formulation and estimation of dynamic models using panel data. Journal of Econometrics, 18(1):47-82, 1982.
Manuel Arellano. Panel Data Econometrics. Advanced Texts in Econometrics. Oxford University Press, 2003. ISBN 9780199245291.
Manuel Arellano and Stephen Bond. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies, 58(2):277-297, April 1991.
Badi H. Baltagi. Econometric Analysis of Panel Data. John Wiley & Sons Ltd, 5th edition, 2013.
Badi H. Baltagi and Sophon Khanti-Akom. On efficient estimation with panel data: An empirical comparison of instrumental variables estimators. Journal of Applied Econometrics, 5(4):401-406, Oct.-Dec. 1990.
T. S. Breusch and A. R. Pagan. The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies, 47(1):239-253, January 1980.
A. Colin Cameron and Pravin K. Trivedi. Microeconometrics Using Stata. Stata Press, revised edition, 2010.
Russell Davidson and James G. MacKinnon. Econometric Theory and Methods. Oxford University Press, New York, Oxford, 2004.
Arthur S. Goldberger. A Course in Econometrics. Harvard University Press, Cambridge and London, 1991.
William H. Greene. Econometric Analysis. Prentice Hall, 7th edition, 2012. ISBN 0130600383.
Paulo Guimaraes and Pedro Portugal. A simple feasible procedure to fit models with high-dimensional fixed effects. The Stata Journal, 10(4):628-649, 2010.