
KIM P. HUYNH, DAVID T. JACHO-CHÁVEZ, ROBERT J. PETRUNIA AND MARCEL C. VOIA

LECTURE NOTES FOR ADVANCED MICROECONOMETRICS WITH STATA (V. 15.1)

Copyright 2015 Kim P. Huynh, David T. Jacho-Chávez, Robert J. Petrunia and Marcel C. Voia.
Published by Huynh, Jacho-Chávez, Petrunia and Voia

These lecture notes are provided as-is, without warranty of any kind, express or implied, including but
not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. They
are drawn from a plethora of sources and we attempt to cite and acknowledge when possible. In no event
shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort
or otherwise, arising from, out of or in connection with these lecture notes or the use or other dealings in
these lecture notes.
First printing, December 2015

"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."
Sherlock Holmes

"It is a poor musician who blames his instrument."
Unknown

Introduction
Course Objectives: The purpose of this course is to provide students with the necessary tools to manage and work with large administrative databases using STATA programming tools. The course is designed for new and intermediate STATA users who want to acquire advanced skills in data management and programming in STATA. Besides tools for data management, this course exposes participants to current empirical work, along with microeconometric topics and techniques common to the analysis of large administrative datasets. In addition to the emphasis on the statistical inference of these models, we will stress their empirical relevance. After taking this course, the participants should be able to:
1. Perform database management and estimation tasks using STATA.
2. Leverage STATA programming routines and user-contributed .ado
files.
3. Understand empirical research using microeconometrics, and
choose appropriate models and estimators for given economic
applications.
4. Interpret model estimates and diagnose potential problems with
models and know how to remedy them.
Prerequisite: Undergraduate Econometrics and Matrix Algebra
Lecture Notes: These lecture notes are provided to participants as-is and without guarantees. They are drawn from a plethora of sources and we attempt to cite and acknowledge when possible. The notes are mainly derived from the following sources:
Cameron, A. C. and P. K. Trivedi (2005) Microeconometrics: Methods and Applications, 1st edition, Cambridge University Press.

Wooldridge, J. M. (2010) Econometric Analysis of Cross Section and Panel Data, 2nd edition, MIT Press.

Stata Data Analysis and Statistical Software (various manuals).

Course Organization
The course will consist of ten 75-minute lectures. Participants are encouraged to ask questions.

               Monday      Tuesday     Wednesday   Thursday    Friday
               4-January   5-January   6-January   7-January   8-January
09:00 - 10:30  Lecture 1   Lecture 3   Lecture 5   Lecture 7   Lecture 9
10:30 - 10:45  Break       Break       Break       Break       Break
10:45 - 12:15  Lecture 2   Lecture 4   Lecture 6   Lecture 8   Lecture 10
12:15 - 14:00  Lunch       Lunch       Lunch       Lunch       Lunch
14:00 - 17:15  Office      Office      Office      Office      Viernes
               Hours       Hours       Hours       Hours       Económico

The first part provides an introduction to Stata. This introduction will take roughly four lectures. Introductory topics (Chapter 1) include: (i) types of Stata files; (ii) command structure; (iii) loading data and database management; (iv) matrix commands; and (v) summary statistics and regression. Day one will conclude with a discussion of regression diagnostics (Chapter 2). Then we will proceed with quantile regression (Chapter 3), binary choice models (Chapter 4), program evaluation (Chapter 5) and parametric and semiparametric difference-in-differences methods (Chapter 6). We will conclude with panel data models (Chapter 7) and, if time permits, a specialized application to high-dimensional fixed-effects models (Chapter 8).

Disclaimer: The course outline provided is a guide. Some topics may be expanded or omitted depending on time constraints.

Chapter 1: Introduction to Stata


These lecture notes are meant to guide a Stata beginner through an introductory process. Topics include: (i) Stata basics; (ii) data handling; (iii) introduction to Stata programming; (iv) Stata programming: macros and looping; (v) data manipulation; and (vi) summary statistics and regression.

Stata basics
This subsection describes directories, file types, basic commands and the steps taken prior to loading data.

Directories
Stata stores executable files in different folders on your computer. Find the executable file locations by typing

. sysdir
   STATA:  C:\Program Files (x86)\Stata13\
 UPDATES:  C:\Program Files (x86)\Stata13\ado\updates\
    BASE:  C:\Program Files (x86)\Stata13\ado\base\
    SITE:  C:\Program Files (x86)\Stata13\ado\site\
    PLUS:  c:\ado\plus\
PERSONAL:  c:\ado\personal\
OLDPLACE:  c:\ado\

The files in these directories execute various Stata commands. Thus, these directories are generally left alone to prevent accidental damage to or deletion of a file.

The working directory is given by:

. pwd
C:\Users\user\Documents


Files are accessible from the working directory without the need to specify the directory path. Stata starts in a default working directory, which is changeable:

. cd c:\course
c:\course

New directories can be created and entered:


. mkdir examples
. cd examples
c:\course\examples
. pwd
c:\course\examples

Command Syntax
The basic command syntax generally follows the form:
command [command specifics] [qualifiers] [, options]

where

command is the Stata command.

command specifics fills in details necessary for the command. These specifics include variables and mathematical expressions.

qualifiers are if statements for any necessary restrictions and using statements to specify files.

options sets any options available for the command. Stata has defaults for all available options.
For example, the summarize command provides summary statistics
(means, standard deviations, counts) for a list of variables.
summarize [varlist] [if statements] [, options]

Generally, we do not have to type the whole command name for Stata to recognize the desired command. For example, the summarize command works by typing

. sum ...

The Help and Manual Commands


Stata offers built in help for its various commands. If the command
is known, the help and man commands access the help file for the
command. For example, the help for the summarize is given when
typing:
. help summarize
(output omitted)

or
. man summarize
(output omitted)

The help command accesses the help viewer, while man presents the text in the Stata output window. A search option is available if the command name is unknown.
. search ols
(output omitted)

File Types
This section discusses the various file types associated with Stata.
These include:
1. ADO
These files contain routines to execute Stata commands. Stata provides regular updates to the official ado files over the web via the update command. We advise you to contact your system administrator about this process.
Individual Stata users often create their own ado files and make them available to other users. These unofficial ado files are usable after being copied to an executable file directory such as

c:\ado\personal\

As an alternative, the user may create a folder to store these user-created ado files. The following command gives Stata access to this new folder's contents:

11

12

lecture notes for advanced microeconometrics with stata (v.15.1)

. adopath +C:\course\user_ado
  [1]  (UPDATES)   "C:\Program Files (x86)\Stata12\ado\updates/"
  [2]  (BASE)      "C:\Program Files (x86)\Stata12\ado\base/"
  [3]  (SITE)      "C:\Program Files (x86)\Stata12\ado\site/"
  [4]  "."
  [5]  (PERSONAL)  "c:\ado\personal/"
  [6]  (PLUS)      "c:\ado\plus/"
  [7]  (OLDPLACE)  "c:\ado/"
  [8]  "C:\course\user_ado"

Any ado files added to the folder C:\course\user_ado are now usable by Stata. Users also create help files for their ado files. These help files should be added to the same folder so they can be accessed within Stata using the help command. A good source for user-created ado procedure files is http://ideas.repec.org/s/boc/bocode.html

2. DO
These are user-created program files. A user may wish to execute a series of Stata commands. These commands may be submitted individually; however, a do file allows a batch submission of these commands. The command

do "filedirectory\filename.do"

runs the do file (quotation marks are only necessary when the filedirectory and/or filename contains spaces or special characters). The default end-of-line delimiter in Stata is the carriage return; however, the command #delimit ; sets the end-of-line delimiter to a semi-colon. There are two options for commenting within a do file. The first option comments out a line by placing an asterisk at the beginning of the line. The second option comments out a section within a do file: typing /* starts the comment section, while */ ends it. Any text editor can be used to write a do file. Stata contains a built-in editor, which is accessible from the Window drop-down menu or by pressing Ctrl+9.
Stata allows scrolling to be on or off when executing a do file. When scrolling is off, Stata will partially run a do file and ask for more; any key stroke causes the do file to continue. The command set more off enables scrolling with no breaks, while the command set more on turns more back on.

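For illustration, a minimal do file might look as follows (the file name is hypothetical; the data set is the one used throughout these notes):

* myfile.do
* A line comment starts with an asterisk.
/* A comment section can
   span several lines. */
set more off
use "C:\course\auto.dta", clear
summarize price mpg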


3. DTA
Stata stores data in dta files. The use command accesses a dta file:

. use "C:\course\auto.dta", clear
(1978 Automobile Data)

The clear option clears any data currently in memory so that Stata can load the new data. The save command saves the data in use:

. save "C:\course\examples\auto.dta"
file C:\course\examples\auto.dta saved
. save "C:\course\examples\auto.dta", replace
file C:\course\examples\auto.dta saved

Notice that the second time the data are saved, the replace option is used; the replace option overwrites the file. The browse command allows the user to look at the data in spreadsheet form, while the edit command allows the user to look at and manually change the data.
4. LOG
Files containing output results are called log files. A log file must be started or opened prior to saving results:

log using "filedirectory/filename.log", options

The default log file extension in Stata is smcl, but the above command changes the file extension to log. The using clause is necessary to specify the log file location and name. The option replace overwrites the file, while the option append adds new results to the original log file. (For the do, use and log commands, filedirectory does not have to be specified if the file location is the working directory.)
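A typical session might be (a sketch; the file name is illustrative):

. log using "C:\course\examples\mylog.log", replace
. summarize price
. log close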

Data Handling
This section provides details on managing data sets.
1. Data Preliminaries
In Stata, the whole data set is held in memory. Therefore, some preliminaries are necessary with large data sets to ensure proper loading into Stata. The command set memory value sets the memory available to Stata, while the command set maxvar value sets the maximum number of variables. The memory allocation and maximum number of variables must be large enough to load the relevant data set. These are not changeable while Stata has data in active memory. Restrictions on the maximum possible memory and variables vary across different versions of Stata (MP versus SE versus IC). Increasing the memory allows larger data sets to be loaded and maintained in Stata, at the cost of reducing the amount of memory available to other software applications.
The last relevant size command is set matsize value. The matsize refers to the matrix size, or the maximum number of variables that can be included in any of Stata's estimation commands. The user is able to change the matsize at any time, both before and after loading data. The maximum matsize possible varies across versions of Stata. The following example shows an OLS regression with a problem.
. set matsize 100
. regress y x1-x400
matsize too small

The regression contains 400 independent variables, but the matsize is too small. This problem is corrected by changing the matsize:
. set matsize 500
. regress y x1-x400
(output omitted)

2. Loading data
The use command, discussed previously, loads dta files into Stata's active memory. An alternative method to load a Stata dta file is to use File > Open in the drop-down menu.


3. Data types and memory requirements
Stata allows the numerical variable data types: (i) byte; (ii) int; (iii) long; (iv) float; and (v) double. Stata also allows the storage of alpha, numeric and alphanumeric variables as strings.
The recast command changes the data type of numeric variables:

. recast type varlist

The advantage of changing a variable's data type is the potential to eliminate inefficient memory usage and reduce memory requirements. Data type double requires the most memory, while data type byte requires the least. The compress command attempts to reduce the memory requirements of the data in active memory by demoting each variable to its least taxing data type while maintaining the variable's level of precision.
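For instance, one might demote a variable known to hold small integers and then let compress handle the rest (the variable name is illustrative):

. recast int myvar
. compress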
The describe command provides details on the whole data set and
individual variables.
. describe
Contains data from C:\Program Files\Stata11\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2009 17:45
 size:         3,774 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign

Details on individual variables are also available:

. describe make price gear_ratio
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
gear_ratio      float  %6.2f                  Gear Ratio

4. Labelling
Stata allows two types of labelling associated with variables. The first type, the variable label, places a label on each variable to better describe it. This label is the variable label given in the describe command. The following sequence changes a variable's label and then changes it back to its original:

. describe price
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           int    %8.0gc                 Price

. label variable price "price in dollars"
. describe price
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           int    %8.0gc                 price in dollars

. label variable price "Price"
. describe price
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           int    %8.0gc                 Price

Changing the variable label is useful for giving a more detailed description of the variable. The second type, the value label, places a label on a particular value of a variable. This type of labelling occurs for integer variables whose values indicate particular categories. Examples include: (i) sex (Male=0, Female=1); (ii) industry codes; (iii) age categories; and (iv) occupation categories.
. describe
Contains data from C:\Program Files\Stata11\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2009 17:45
 size:         3,774 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign

In the above, foreign is the only variable with a value label. To obtain a full list of value labels, we type
. label dir
origin

The meaning of a specific value label is given by typing
. label list origin
origin:
0 Domestic
1 Foreign

The value label is removed when typing


. label drop origin

We use the following two command lines to create a value label:
. label define quality 1 "poor" 2 "fair" 3 "average" 4 "good" 5 "excellent"
. label values rep78 quality

The first command creates the label, while the second command
attaches the label to a particular variable. Finally, three command
lines are combined to change a value label
. label drop origin
. label define origin 0 "Domestic" 1 "Foreign"
. label values foreign origin

5. Missing values
Stata denotes missing values as ., .a, .b, ..., .z. Thus, there are 27 missing values.
In the above data, the variable rep78 has some missing values. A warning is necessary when using the if qualifier in a command statement. Stata treats missing values as extremely large values, with . < .a < .b < ... < .z. Thus, any >= or > qualifier also includes missing observations. To illustrate, the count command provides the number of observations satisfying a qualifier condition. There are two cases. The first case uses the > qualifier:


. count if rep78>3
34

The second case adds a not-equal-to qualifier to exclude missing values:


. count if rep78>3 & rep78~=.
29

The resulting counts are different.


6. Merging additional data
Stata allows additional data to be merged with the data already in active memory. The append and merge commands are the two commands used in this process. The append command adds one or more data sets from disk to the end of the data currently in active memory.

append using "filedirectory1\filename1.dta" "filedirectory2\filename2.dta" ...

The merge command allows additional variables to be added from external data sets when there is a common identifier or categorical variable. For example, the merge command can be used to add yearly variables; individual, household or firm information; and industry information, among others, from external data to the data in active memory.


merge varlist using "filedirectory\filename.dta"

varlist is the list of individual or categorical variables on which to merge the active data set with the external data set; varlist can include more than one variable. The number of observations does not have to be the same for the active and merge data sets. However, there are two requirements for the merge command. First, both data sets must contain the variable(s) in varlist. Second, both data sets must have their observations sorted according to varlist. The sort command sorts the data according to a specified list of variables:
sort var1 var2

In the above, the sort command sorts on the variable var1 first and
then on variable var2 within var1.
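A minimal sketch of a full merge, assuming two hypothetical files firms.dta and industry.dta that share the categorical variable naics:

. use "c:\course\firms.dta", clear
. sort naics
. save "c:\course\firms.dta", replace
. use "c:\course\industry.dta", clear
. sort naics
. save "c:\course\industry.dta", replace
. use "c:\course\firms.dta", clear
. merge naics using "c:\course\industry.dta"

After the merge, the automatically created variable _merge records the source of each observation.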
7. Stata and Database Management
Stata allows the accessing, loading, writing and viewing of data stored in database management systems via the odbc command and its various functions. ODBC stands for Open DataBase Connectivity. Examples of database management system files include a dBASE file, an Excel file, or an Access file. Situations for using the odbc command include:
reading Office files on a machine without Office installed;
reading and manipulating an Excel file with many tabs;
reading and manipulating an Access database file with many tables;
database management; and
exporting a data file to a database.
Stata requires a data source setup step. You must register the ODBC database file as a Data Source prior to using the odbc commands and reading an Excel or Access file into Stata. The process varies depending on the platform, but our example shows how to do it for Windows 7.
(a) In the Start Menu, select Control Panel.
(b) In the Control Panel, select System and Security and then Administrative Tools, which brings up the ODBC Data Source Administrator; see Figure 1.
(c) Within Administrative Tools, click on Data Sources (ODBC); see Figure 2.
(d) Click on Add, which brings up the driver list; see Figure 3.
(e) Select the relevant driver, which is Microsoft Access Driver (*.mdb, *.accdb) in this example, and click Finish. This step brings up Figure 4.
(f) Choose an arbitrary name for the Data Source Name; here the name is testdb.
(g) Click Select and select the data source file. In Figure 5 we select the database. Figure 6 shows that testdb has now been added as an ODBC data source.
(h) The same can be done for Excel files, as shown in Figure 7.
Now both the Access and Excel data sources are accessible in Stata via the odbc command.

Figures 1-7: ODBC setup (screenshots of the Windows ODBC Data Source Administrator steps)

The odbc list command lists the data sources accessible by Stata.

. odbc list
Data Source Name                    Driver
-------------------------------------------------------------------------------
dBASE Files                         Microsoft Access dBASE Driver (*.dbf, *.ndx
Excel Files                         Microsoft Excel Driver (*.xls, *.xlsx, *.xl
MS Access Database                  Microsoft Access Driver (*.mdb, *.accdb)
testdb                              Microsoft Access Driver (*.mdb, *.accdb)
testxl                              Microsoft Excel Driver (*.xls, *.xlsx, *.xl
-------------------------------------------------------------------------------

The odbc query command reports the sheet-name information for an Excel data source file and the table-name information for an Access data source file.

. odbc query "testxl"
DataSource: testxl
Path      : C:\Users\robert\Documents\work related\Statcan\CMFE\stc2014\intro\cdn4.xlsx
-------------------------------------------------------------------------------
full$
partial$
Sheet3$
-------------------------------------------------------------------------------

. odbc query "testdb"
DataSource: testdb
Path      : C:\Users\robert\Documents\work related\Statcan\CMFE\stc2014\intro\cdn4.accdb
-------------------------------------------------------------------------------
cdn4
-------------------------------------------------------------------------------

The odbc load command now loads the data source into Stata for use.
Notes:
(a) The command odbc desc describes the output for a specified table. Loading the table into Stata and using the describe command provides similar output.
(b) Stata allows: (i) point and click to execute odbc query following the odbc list command; (ii) point and click to execute odbc describe following the odbc query command; and (iii) point and click to execute odbc load following the odbc query command.
(c) The odbc load command allows a choice of which variables to load and accepts if qualifiers.


(d) The exec(sqlstmt) option for the odbc load command allows the user to submit an SQL SELECT statement.
(e) The odbc insert command exports data to an existing ODBC data source. This command allows data to be added to an existing table, data in an existing table/sheet to be modified, or a new table to be created. The option create creates a new table/sheet, overwrite overwrites the existing table/sheet, while specifying no option augments the table/sheet by adding data. For the overwrite or no-option case, the specified variables must be a subset of the variables in the table being modified.
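For example, loading the Access table cdn4 from the testdb data source registered above (a sketch):

. odbc load, table("cdn4") dsn("testdb") clear

An Excel sheet is loaded the same way, e.g. table("full$") with dsn("testxl").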
8. Data manipulation commands
Stata allows variables and observations to be dropped using the keep and drop commands. For variables,

keep varlist

tells Stata to keep the variables in varlist in active memory, while

drop varlist

tells Stata to drop the varlist variables from active memory. For observations,

keep if qualifiers

tells Stata to keep any observations satisfying the if qualifier statements, while

drop if qualifiers

tells Stata to drop any observations satisfying the if qualifier statements.
The rename command changes the name of a variable. For example,

. rename foreign for

changes the name of the variable; the renamed variable retains its original variable and value labels.

chapter 1: introduction to stata

Data Manipulation
In this section, we discuss the creation of new variables, rewriting
variable values, dummy variables, string variables and dealing with
longitudinal variables.
1. The generate and replace commands
The generate command creates a new variable, while the replace command replaces a variable's values. Consider the following command sequence:
. generate x1=.
(74 missing values generated)
. replace x1=5
(74 real changes made)
. generate y = 11
. replace y = 3
(74 real changes made)

The first command creates a new variable x1 consisting exclusively of missing values. The second command changes the value of x1 to equal 5. Similarly, we create the variable y and change its value. Stata reports the number of missing values generated, when relevant, and the number of changes made. The general syntax for these two commands is
gen/replace [type] newvarname = exp [if qualifier statements]

2. Dummy variables
Dummy or indicator variables are useful when dealing with conditional states or categorical information. There are a number of ways to create indicator variables.
(a) The generate and replace commands
Use these two commands with if conditional statements to generate the indicator variable:
. gen d=1 if rep78>3 & rep78~=.
(45 missing values generated)
. replace d=0 if rep78<=3
(40 real changes made)


Alternatively, the above two commands could be accomplished with the following single command:
. gen d_rep78= (rep78>3) if rep78~=.
(5 missing values generated)

The above statement generates the variable d_rep78, which is equal to 1 if the condition is true and 0 otherwise. Notice we add the additional condition because missing values are treated as extremely large numbers.
(b) xi command
The second method to create a series of indicators is the xi command. If a variable consists of n categories, then the xi command creates n - 1 indicator variables. Further, the xi command can be used to create interactions between multiple indicator variables, or an interaction between an indicator and a continuous variable. The command structure is one of the following:
xi i.varname
xi i.varname1*i.varname2
xi i.varname1*varname3

where varname, varname1 and varname2 are numerical or string categorical variables, while varname3 is a continuous numerical variable.
. xi i.foreign
i.foreign         _Iforeign_0-1      (naturally coded; _Iforeign_0 omitted)

. xi i.foreign*i.make
i.foreign         _Iforeign_0-1      (naturally coded; _Iforeign_0 omitted)
i.make            _Imake_1-74        (_Imake_1 for make==AMC Concord omitted)
i.for~n*i.make    _IforXmak_#_#      (coded as above)

. xi i.foreign*mpg
i.foreign         _Iforeign_0-1      (naturally coded; _Iforeign_0 omitted)
i.foreign*mpg     _IforXmpg_#        (coded as above)

The variable naming structure of the xi command is important. Every time the xi command runs, any indicator variable previously generated by the xi command is dropped, unless the variable has been renamed. In the above examples, the variable _Iforeign_1 is dropped by the xi i.foreign*i.make command line.
Finally, the xi command can be used in conjunction with other commands where indicator variables are used. For example, the regress command completes an OLS regression. To complete a regression of variable y on a series of indicator variables, we type
xi: regress y i.varname
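For instance, using the auto data (a sketch):

. xi: regress price i.rep78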

(c) tabulate command
The tabulate command calculates the frequencies of the distinct values of a variable. We can use this command with the generate(newvarname) option to create a series of indicator variables newvarname1, newvarname2, ...
. tabulate foreign, gen(dum)

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

3. Numerical string variables
We can convert string variables to numerical variables and vice versa. If a string variable is essentially numerical, then the destring command converts the string variable to a numerical variable. The replace option replaces the variable's original string values with the numerical values, while generate(newvar) creates a new numerical variable. The ignore() option tells Stata to ignore any specified characters in the destringing process. The tostring command converts numerical values into a string variable.
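A sketch with hypothetical string variables yearstr and pricestr:

. destring yearstr, generate(year)
. destring pricestr, replace ignore("$,")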
The encode command also generates a new numerical variable
from a string variable. The advantage of the encode command is
that the string variable does not have to contain any numerical
characters.
. encode make, gen(make1)

In the above, the variable make is the make of car, while make1 is a set of numerical values representing each make of car. In the Data Browser, the original make variable appears in red text, indicating that it is a string variable. The make1 variable appears to show the make names of the cars; however, its blue text colour indicates that make1 has an underlying numerical value. The decode command creates a string variable from a numeric variable.


4. Longitudinal formats
Longitudinal data contain multiple observations for a series of individuals, firms or countries, among other units. Stata has two ways to handle such data: wide format and long format. To illustrate, we consider data across multiple years for a set of firms. In the wide format, the data have one firm identifier variable. The other variables may have one value, or a value for each year of observation; the variable name for multi-year variables is varname followed by the year. In the long format, each firm has an identification variable and a variable indicating the year of observation.
Figure 8 illustrates these two formats. We have a variable, profit, that varies by year and a variable, naics, that does not. Thus, we have a yearly profit variable in the wide format and only one naics variable. The reshape command converts data from wide to long format and vice versa. To convert from wide to long, we have
. reshape long profit, i(id) j(year)
(note: j = 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011)

Data                                    wide   ->   long
-------------------------------------------------------------------------------
Number of obs.                            14   ->   196
Number of variables                       16   ->   4
j variable (14 values)                         ->   year
xij variables:
        profit1998 profit1999 ... profit2011   ->   profit
-------------------------------------------------------------------------------

A new variable, year, is created to identify the year. Alternatively, we convert from long back to wide with
. reshape wide profit, i(id) j(year)
(note: j = 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011)

Data                                    long   ->   wide
-------------------------------------------------------------------------------
Number of obs.                           196   ->   14
Number of variables                        4   ->   16
j variable (14 values)                  year   ->   (dropped)
xij variables:
                                      profit   ->   profit1998 profit1999 ... profit2011
-------------------------------------------------------------------------------


Figure 8: Reshape


The reshape wide command specifies the variable profit, since this variable has yearly observations. These data still contain the naics variable.

Macros, Loops, Matrices and Scalars


This section provides useful information to aid in programming
within Stata.
1. Macros
A macro is a string of characters that provides a short-hand, with one thing standing for another. There are two types: local and global.
(a) Locals
Local macros are only accessible locally, within a given do file or program. A local macro can hold anything, including a list of variables, an if qualifying statement, a directory path or a number. The command local defines a local macro. For example,

. local var "make price foreign"
. local i = 1
. local direc "c:\"
. local des "describe foreign"

A local macro's contents are accessed by enclosing its name between a backtick and a single quote, as in `var'. After defining the local var, we can use it in any Stata command requiring a variable list:
. describe `var'
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
foreign         byte   %8.0g       origin     Car type

Similarly, the local des completes a Stata command:

. `des'
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
foreign         byte   %8.0g       origin     Car type


(b) Globals
Global macros are accessed globally across multiple Stata do
files or throughout a Stata session. Defining a global is similar
to defining a local. However, a $ followed by the global macro
name calls the macro
. global var1 "make foreign"
. describe $var1
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
foreign         byte   %8.0g       origin     Car type

Stata's macro notation implies that global and local macros can have the same name. Further, a variable and a macro can have the same name. The user must therefore keep careful track of macro definitions.
2. Scalars
A scalar stores a number or a string.
. scalar c=5
. scalar b = "2 plus 3 is"

. display b
2 plus 3 is
. display c
5

Scalars and macros are interchangeable, but scalars are simpler. When storing numbers, scalars allow a precision of 16 digits, while macros allow only 12 digits.
3. Looping
There are three methods to loop in Stata: while, forvalues and foreach.
(a) while loops
The while loop executes and continues until a user-specified condition is no longer true. The structure of a while loop is


local i 1
while `i' <= 5 {
    command
    command
    ...
    local i = `i' + 1
}

The first line sets the initial value of the macro i. The second line provides the condition for executing the series of commands; it must end with {, which opens the loop and allows the series of commands to be fed in. The next lines provide the commands to be executed. The line local i = `i' + 1 updates the value of i. The last line closes the while loop with }. A while loop must be opened and closed to work, so a { must have a corresponding }. For example, the while loop
. local i 1
. while `i' <= 3 {
  2. display `i'
  3. local i = `i' + 1
  4. }

gives the output

1
2
3

We could also use conditional statements related to the looping value:

. local i = 0
. while `i' <= 2 {
  2. summarize mpg if foreign==`i'
  3. local i = `i' + 1
  4. }
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        52    19.82692    4.743297         12         34

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |         0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        22    24.77273    6.611187         14         41

(b) forvalues loop
The forvalues loop executes Stata commands over a consecutive series of integers. To complete the above example using a forvalues loop, we would type

. forvalues i = 1/3 {
  2. display `i'
  3. }

Again, we can use the forvalues loop for conditioning.


(c) foreach loop
The foreach loop does not require a numeric loop counter and accepts numbers or text as inputs. Thus, the foreach loop allows looping over values and conditions on numerical variables, but also over variables and conditions on string variables. The structure of foreach loops is
foreach i in a b c d ... {
command
command
...
}

The foreach loop eliminates the need to loop over redundant values. For example, 2012 North American Industry Classification System (NAICS) codes at the two-digit level include some, but not all, integers between 11 and 91; the foreach loop can be used to specify only the relevant NAICS codes. Examples of foreach loops and their resulting output include
. foreach i in mpg foreign {
  2. sum `i'
  3. }

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        74     21.2973    5.785503         12         41

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     foreign |        74    .2972973    .4601885          0          1

and

. foreach t in 1 2 {
  2. sum mpg if foreign==`t'
  3. }

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        22    24.77273    6.611187         14         41

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |         0

4. Matrices
Matrices store numerical values. Stata provides two ways to handle matrices: (i) the built-in matrix language in Stata; and (ii) Mata, a self-contained matrix programming language.
(a) Built-in matrix language in Stata
Stata's matrix language provides a method to complete matrix calculations necessary for estimators, or to store results. Let us consider the following:
. matrix c = (1,2\3,4\5,6)
. matrix list c

c[3,2]
    c1  c2
r1   1   2
r2   3   4
r3   5   6

. display c
type mismatch
r(109);

. scalar b = c[2,2]
. display b
4

The first command line defines the matrix c; a comma denotes a new column, while a backslash denotes a new row. The second command line uses the matrix list command to display the matrix c. The third command line shows that the display command does not work for displaying matrices. The fifth command line creates a scalar from the row 2, column 2 element of matrix c.
The svmat command creates new variables from the columns of a matrix, while the mkmat command creates a matrix from variables in the active data, with each column corresponding to a particular variable.
(b) Mata
Mata is a self-contained matrix programming language that is built into Stata. Mata is comparable to other matrix programming languages such as Gauss and Matlab. The advantages of Mata are: (i) computational speed; (ii) the only restrictions on matrix size are computer specific, so Mata handles larger matrices; (iii) additional matrix commands are available; and (iv) ease of implementation of estimators based on matrix commands, including the generalized method of moments (GMM) and special cases of maximum likelihood (ML). Type mata to enter Mata from Stata. The following set of commands enters Mata, loads data in memory from Stata into matrices, performs an OLS regression in Mata, exits Mata, and uses the regress command to perform the same regression:
mata
x = st_data(., ("mpg", "trunk"))
cons = J(rows(x), 1, 1)
X = (x, cons)
y = st_data(., ("price"))
beta_hat = invsym(X'X)*(X'y)
e_hat = y - X*beta_hat
s2 = (1/(rows(X) - cols(X)))*(e_hat'e_hat)
V_ols = s2*invsym(X'X)
se_ols = sqrt(diagonal(V_ols))
beta_hat
se_ols
end
reg price mpg trunk

The results from this series of commands is

. mata
------------------------------- mata (type end to exit) -----------------------
: x = st_data(., ("mpg", "trunk"))
: cons = J(rows(x), 1, 1)
: X = (x, cons)
: y = st_data(., ("price"))
: beta_hat = invsym(X'X)*(X'y)
: e_hat = y - X*beta_hat
: s2 = (1/(rows(X) - cols(X)))*(e_hat'e_hat)
: V_ols = s2*invsym(X'X)
: se_ols = sqrt(diagonal(V_ols))
: beta_hat
                  1
    +----------------+
  1 |  -220.1648801  |
  2 |   43.55851009  |
  3 |   10254.94983  |
    +----------------+
: se_ols
                 1
    +---------------+
  1 |  65.59262431  |
  2 |  88.71884015  |
  3 |   2349.08381  |
    +---------------+
: end
--------------------------------------------------------------------------------

. reg price mpg trunk

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   10.14
       Model |   141126459     2  70563229.4           Prob > F      =  0.0001
    Residual |   493938937    71  6956886.44           R-squared     =  0.2222
-------------+------------------------------           Adj R-squared =  0.2003
       Total |   635065396    73  8699525.97           Root MSE      =  2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------

The st_data() function loads data in memory from Stata to create a matrix. In the above, the command line

x = st_data(., ("mpg", "trunk"))

creates the matrix x, whose ith row is the ith observation of the mpg and trunk variables.
Stata commands are accessible from within Mata using the stata() function. For example:
: stata("summarize price")

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906

Summary statistics, regression and using results from Stata

The summarize and regress commands have already been introduced in a previous section. This section looks at possible manipulations using these commands, alternative ways to capture summary statistics, and ways to store results in Stata.
1. Accessing results in Stata
The summarize command provides summary statistics. The option detail, or its short form d, provides a more detailed list of summary statistics. The summarize command is an r-class command. Stata temporarily stores the results of r-class commands as scalars. These stored results are replaced when a new r-class command is issued. The command return list displays the scalars from the r-class command currently in active memory:
. sum price

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906

. return list

scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  6165.256756756757
                r(Var) =  8699525.974268789
                 r(sd) =  2949.495884768919
                r(min) =  3291
                r(max) =  15906
                r(sum) =  456229

. sum mpg, d

                        Mileage (mpg)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           12             12
 5%           14             12
10%           14             14       Obs                  74
25%           18             14       Sum of Wgt.          74

50%           20                      Mean            21.2973
                        Largest       Std. Dev.      5.785503
75%           25             34
90%           29             35       Variance       33.47205
95%           34             35       Skewness       .9487176
99%           41             41       Kurtosis       3.975005

. return list

scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  21.2972972972973
                r(Var) =  33.47204738985561
                 r(sd) =  5.785503209735141
           r(skewness) =  .9487175964588155
           r(kurtosis) =  3.97500459645325
                r(sum) =  1576
                r(min) =  12
                r(max) =  41
                 r(p1) =  12
                 r(p5) =  14
                r(p10) =  14
                r(p25) =  18
                r(p50) =  20
                r(p75) =  25
                r(p90) =  29
                r(p95) =  34
                r(p99) =  41

. dis r(p10)
14
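These stored results can also be used directly in later commands; for instance (a sketch):

. sum price
. generate price_dev = price - r(mean)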

The scalars stored by return are accessed using r(scalar_name), as done above. The regress command is an example of an e-class, or estimation-class, command. With e-class commands, Stata temporarily stores results as both scalars and matrices. The command ereturn list displays the results from the e-class command currently in active memory.
. reg profit revenue

      Source |       SS       df       MS              Number of obs =   14000
-------------+------------------------------           F(  1, 13998) = 5079.93
       Model |  7.8589e+14     1  7.8589e+14           Prob > F      =  0.0000
    Residual |  2.1656e+15 13998  1.5471e+11           R-squared     =  0.2663
-------------+------------------------------           Adj R-squared =  0.2662
       Total |  2.9515e+15 13999  2.1083e+11           Root MSE      =  3.9e+05

------------------------------------------------------------------------------
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0657316   .0009222    71.27   0.000     .0639239    .0675393
       _cons |  -6042.726   3466.723    -1.74   0.081    -12837.96    752.5133
------------------------------------------------------------------------------

. return list

. ereturn list

scalars:
                  e(N) =  14000
               e(df_m) =  1
               e(df_r) =  13998
                  e(F) =  5079.92696289744
                 e(r2) =  .2662724819513582
               e(rmse) =  393325.8937651275
                e(mss) =  785891415003325.8
                e(rss) =  2165564211368496
               e(r2_a) =  .266220065354841
                 e(ll) =  -200217.6525015677
               e(ll_0) =  -202384.9753383144
               e(rank) =  2

macros:
            e(cmdline) : "regress profit revenue"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "profit"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2

functions:
            e(sample)

. matrix list e(b)

e(b)[1,2]
      revenue      _cons
y1  .06573159 -6042.7258

. matrix list e(V)

symmetric e(V)[2,2]
            revenue       _cons
revenue   8.505e-07
  _cons  -.90726892    12018166

The above example shows that Stata temporarily stores the estimated coefficients and their variance-covariance matrix as matrices.
2. Using and storing results in Stata
The temporary r-class or e-class results can be stored for longer through matrix and scalar manipulation. The following foreach loop calculates summary statistics for the price variable across the domestic and foreign classes of car, and stores the means and standard deviations as new scalars.
. foreach i in 0 1 {
  2. sum price if foreign==`i'
  3. scalar rm`i' = r(mean)
  4. scalar rs`i' = r(sd)
  5. }

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        52    6072.423    3097.104       3291      15906

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        22    6384.682    2621.915       3748      12990

. dis rm0
6072.4231

Next, we create a matrix to store these summary statistics.

. matrix stat = (rm0, rm1 \ rs0, rs1)
. matrix list stat

stat[2,2]
           c1         c2
r1  6072.4231  6384.6818
r2  3097.1043  2621.9151

The svmat command can now be used to convert the matrix of summary statistics into data.
3. The collapse, statsby and egen commands
These three commands generate statistics. The collapse command replaces the data in active memory with a new data set containing specified summary statistics for a set of specified variables. The statsby command has two additional features. First, statsby is able to collect statistics from both r-class and e-class commands. Second, statsby has a saving option, which saves the newly created data to a file.


The egen command does not replace the active data with new data, but instead creates a new variable. The new variable is a summary statistic for a specified variable. For all three commands, the by option allows statistics to be generated across groups, as in the sketch below.
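A minimal sketch of all three commands using the auto data (the saved file name is illustrative):

. egen mprice = mean(price), by(foreign)
. statsby mean=r(mean) sd=r(sd), by(foreign) saving(pstats, replace): summarize price
. collapse (mean) price mpg, by(foreign)

The egen line adds a group-mean variable, statsby saves the group statistics to pstats.dta, and collapse replaces the data in memory with one observation per car type.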


Chapter 2: Regression Diagnostics

Model misspecification and non-spherical errors are both violations of OLS assumptions. This chapter looks at diagnostic tests and solutions to these problems.

Introduction
The validity of the OLS estimator and any inference based on this estimator rely on the following assumptions. These assumptions provide the data-generating process for an $n \times 1$ vector of random variables $y$, and an $n \times K$ matrix of characteristics $X$:

Assumption A:

(A1) The true model is $y = X\beta + \varepsilon$.

(A2) $X$ is a nonstochastic and finite $n \times K$ matrix such that $n \geq K$.

(A3) $X^{\top}X$ is nonsingular.

(A4) $E(\varepsilon) = 0$.

(A5) $\operatorname{var}(\varepsilon) = E[\varepsilon\varepsilon^{\top}] = \sigma^2 I_n$, $\sigma^2 < \infty$.

(A6) $\varepsilon \sim N(0, \operatorname{var}(\varepsilon))$.

We now discuss the implications of violations of each of these assumptions and methods to handle these violations within Stata.

Non-Spherical Disturbances
Non-spherical disturbances refers to a violation of the classical assumption A5. In particular, let us replace A5 by:

(A5$'$) $E[\varepsilon_i \varepsilon_j] = \sigma_{ij}$, $\sigma_{ii} \equiv \sigma_i^2 > 0$, and $\max_{1 \le i,j \le n} |\sigma_{ij}| < +\infty$.


In particular, assumption A5$'$ implies that

$$E[\varepsilon\varepsilon^{\top}] = \sigma^2 \Omega = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \ldots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \ldots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \ldots & \sigma_n^2 \end{bmatrix} > 0, \qquad (1)$$

where we have used the convention $\sigma_i^2 \equiv \sigma_{ii}$ and $\sigma_{12} = \sigma_{21}$.


Example 1 (Heteroskedasticity) This refers to the specific case where $\sigma^2\Omega$ takes the form $\sigma^2\Omega = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, i.e.

$$\sigma^2\Omega = \begin{bmatrix} \sigma_1^2 & 0 & \ldots & 0 \\ 0 & \sigma_2^2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \sigma_n^2 \end{bmatrix}.$$

Example 2 (Serial Correlation) Let us assume that $\{\varepsilon_t\}_{t=1}^n$ follows an AutoRegressive process of order 1, i.e. an AR(1),

$$\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t,$$

where $\{u_t\}_{t=1}^n$ is white noise[3], and $|\rho| < 1$, i.e. $\{\varepsilon_t\}_{t=1}^n$ is covariance stationary[4]. It then follows that

$$\sigma^2\Omega = \begin{bmatrix} \gamma(0) & \gamma(1) & \ldots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \ldots & \gamma(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \ldots & \gamma(0) \end{bmatrix},$$

where $\gamma(s) = (\sigma_u^2/(1-\rho^2))\,\rho^{|s|}$, for $s = 0, 1, \ldots, n-1$. In particular,

$$\sigma^2\Omega = \frac{\sigma_u^2}{1-\rho^2}\begin{bmatrix} 1 & \rho & \ldots & \rho^{n-1} \\ \rho & 1 & \ldots & \rho^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \ldots & 1 \end{bmatrix}. \qquad (2)$$

[3] A white noise process is defined as $E[u_t] = 0$, $E[u_t^2] = \sigma_u^2 < \infty$, and $E[u_t u_\tau] = 0$ for $t \neq \tau$.

[4] The stochastic process $\{\varepsilon_t\}_{t=1}^n$ is covariance stationary if $E[\varepsilon_t] = \mu < \infty$ for all $t$, and $E[(\varepsilon_t - \mu)(\varepsilon_{t-s} - \mu)] = \gamma(s)$ for all $t$.

Example 3 (Panel Data with Random Effects) Let $\{\{y_{it}\}_{t=1}^T, \{x_{it}^{\top}\}_{t=1}^T\}_{i=1}^n$ represent panel data. Another potential model for the response $y_{it}$ is a Random Effects model:

$$y_{it} = x_{it}^{\top}\beta + u_{it}, \qquad u_{it} = \alpha_i + \varepsilon_{it}, \quad \text{for } i = 1, \ldots, n \text{ and } t = 1, \ldots, T,$$

where there are $K$ parameters of interest, $\{\alpha_i\}_{i=1}^n$ are i.i.d.$(0, \sigma_\alpha^2)$, independent of $\{\{\varepsilon_{it}\}_{t=1}^T, \{x_{it}^{\top}\}_{t=1}^T\}_{i=1}^n$ for all $i$, $t$, and $\{\{\varepsilon_{it}\}_{t=1}^T\}_{i=1}^n$ are i.i.d.$(0, \sigma_\varepsilon^2)$. It can be shown that $\sigma^2\Omega$ is block diagonal, with the $T \times T$ block

$$\Sigma = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_\alpha^2 & \sigma_\alpha^2 & \ldots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\varepsilon^2 + \sigma_\alpha^2 & \ldots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \ldots & \sigma_\varepsilon^2 + \sigma_\alpha^2 \end{bmatrix}$$

repeated for each unit $i$ and zeros elsewhere. Equivalently,

$$\sigma^2\Omega = \sigma_\varepsilon^2 I_{nT} + \sigma_\alpha^2 \left[I_n \otimes \iota_T \iota_T^{\top}\right] = \sigma_\varepsilon^2 I_{nT} + T\sigma_\alpha^2 V, \quad \text{where } V = T^{-1}\left[I_n \otimes \iota_T \iota_T^{\top}\right]$$

and $\iota_T$ is a $T \times 1$ vector of ones. The structure of $\sigma^2\Omega$ above implies that $\{\{u_{it}\}_{t=1}^T\}_{i=1}^n$ are homoskedastic, but serially correlated.
Remark 1 Another example of a model where assumption A5 is violated is the Seemingly Unrelated Regression (SUR) model; see Greene [2012, Section 10.2, pg. 292-314].

Estimation
In this section we discuss estimation strategies in the presence of non-spherical disturbances. The discussion is somewhat informal, and we aim for understanding rather than precision.
In what follows, we maintain assumptions A1-A4, A5$'$, and A6.
1. Maximum Likelihood Estimation (MLE)
Assumptions A1-A4, A5$'$, and A6 imply that the joint density of $\varepsilon$ is a multivariate normal[5], i.e.

$$f(\varepsilon) = (2\pi)^{-n/2} \left|\sigma^2\Omega\right|^{-1/2} \exp\left(-\tfrac{1}{2}\,\varepsilon^{\top}(\sigma^2\Omega)^{-1}\varepsilon\right).$$

After noticing that $\left|\sigma^2\Omega\right| = \sigma^{2n}|\Omega|$, the log-likelihood of the observations $\{y_i, x_i^{\top}\}_{i=1}^n$ is

$$l(\beta, \sigma^2 \mid y, X) = \log f(y - X\beta \mid \beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2}\log|\Omega| - \frac{1}{2\sigma^2}(y - X\beta)^{\top}\Omega^{-1}(y - X\beta).$$

[5] See Greene [2012, Section B.11, pg. 1041-1047].


Differentiating with respect to the $K \times 1$ vector $\beta$ and the scalar $\sigma^2$ gives

$$\nabla l(\beta, \sigma^2 \mid y, X) = \begin{bmatrix} \partial l(\beta, \sigma^2 \mid y, X)/\partial\beta \\ \partial l(\beta, \sigma^2 \mid y, X)/\partial\sigma^2 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sigma^2}\left(X^{\top}\Omega^{-1}y - X^{\top}\Omega^{-1}X\beta\right) \\ -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y - X\beta)^{\top}\Omega^{-1}(y - X\beta) \end{bmatrix}.$$

If the second-order conditions are satisfied, solving $\nabla l(\hat{\beta}_{\text{MLE}}, \hat{\sigma}^2_{\text{MLE}} \mid y, X) = 0$ gives

$$\hat{\beta}_{\text{MLE}} = (X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y, \qquad (3)$$

$$\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\,(y - X\hat{\beta}_{\text{MLE}})^{\top}\Omega^{-1}(y - X\hat{\beta}_{\text{MLE}}). \qquad (4)$$

Remark 2 If assumption A5 holds, i.e. $\operatorname{var}(\varepsilon) = \sigma^2 I_n$, then $\hat{\beta} = \hat{\beta}_{\text{MLE}}$ and $s^2 = (n/(n-K))\,\hat{\sigma}^2_{\text{MLE}}$. Since maximum likelihood estimators attain the Cramer-Rao efficiency bound, it then follows that the Least Squares Estimator, $\hat{\beta}$, is efficient if all the Classical Assumptions (Assumption A) hold.

Remark 3 The Jacobian of $\nabla l(\beta, \sigma^2 \mid y, X)$, i.e. the Hessian of $l(\beta, \sigma^2 \mid y, X)$, is needed in order to calculate the variance-covariance matrix of $(\hat{\beta}^{\top}_{\text{MLE}}, \hat{\sigma}^2_{\text{MLE}})^{\top}$.

Remark 4 Estimators (3) and (4) are only operational if $\Omega$ is known.
2. Generalized Least Squares (GLS)
Since $\Omega > 0$ (by assumption), its inverse $\Omega^{-1}$ is positive definite, and it is possible to write it as $\Omega^{-1} = C\Lambda^{-1}C^{\top}$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ and the $\lambda_l$ are the eigenvalues[6] of $\Omega$. The columns of $C$ are their associated eigenvectors, such that $CC^{\top} = I_n$. Therefore, choosing $P = \Lambda^{-1/2}C^{\top}$ gives us

$$\Omega^{-1} = C\Lambda^{-1}C^{\top} = C\Lambda^{-1/2}\Lambda^{-1/2}C^{\top} = P^{\top}P, \quad \text{and} \quad \Omega = (P^{\top}P)^{-1} = P^{-1}(P^{\top})^{-1}.$$

It follows from assumption A1 that

$$Py = PX\beta + P\varepsilon, \quad \text{with } \Omega^{-1} = P^{\top}P,$$

$$y^{*} = X^{*}\beta + \varepsilon^{*}. \qquad (5)$$

Therefore, the transformed linear regression model satisfies the classical Assumption A, i.e.

$$E[\varepsilon^{*}] = E[P\varepsilon] = P\,E[\varepsilon] = 0,$$

$$\operatorname{var}(\varepsilon^{*}) = E[\varepsilon^{*}(\varepsilon^{*})^{\top}] = E[P\varepsilon\varepsilon^{\top}P^{\top}] = P\,E[\varepsilon\varepsilon^{\top}]\,P^{\top} = \sigma^2 P\Omega P^{\top} = \sigma^2 PP^{-1}(P^{\top})^{-1}P^{\top} = \sigma^2 I_n.$$

[6] Since $\Omega$ is also symmetric, its eigenvalues are $\lambda_l > 0$, for $l = 1, \ldots, n$.


Performing OLS on model (5) gives the Generalized Least Squares estimator

$$\hat{\beta}_{\text{GLS}} = ((X^{*})^{\top}X^{*})^{-1}(X^{*})^{\top}y^{*} = (X^{\top}P^{\top}PX)^{-1}X^{\top}P^{\top}Py = (X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y, \qquad (6)$$

with variance

$$\operatorname{var}(\hat{\beta}_{\text{GLS}}) = \sigma^2((X^{*})^{\top}X^{*})^{-1} = \sigma^2(X^{\top}P^{\top}PX)^{-1} = \sigma^2(X^{\top}\Omega^{-1}X)^{-1}. \qquad (7)$$

Furthermore, an unbiased estimate of the unknown $\sigma^2$ above is readily obtained from direct application of OLS to model (5), i.e.

$$s^2_{\text{GLS}} = \frac{1}{n-K}(y^{*} - X^{*}\hat{\beta}_{\text{GLS}})^{\top}(y^{*} - X^{*}\hat{\beta}_{\text{GLS}}) = \frac{1}{n-K}[P(y - X\hat{\beta}_{\text{GLS}})]^{\top}[P(y - X\hat{\beta}_{\text{GLS}})] = \frac{1}{n-K}(y - X\hat{\beta}_{\text{GLS}})^{\top}\Omega^{-1}(y - X\hat{\beta}_{\text{GLS}}).$$

Finally, since model (5) satisfies the classical Assumption A, an exact, finite-sample test of the linear restrictions

$$H_0 : R\beta = c$$

results in the test statistic[7]

$$\frac{1}{s^2_{\text{GLS}}\,J}\,(R\hat{\beta}_{\text{GLS}} - c)^{\top}\left(R((X^{*})^{\top}X^{*})^{-1}R^{\top}\right)^{-1}(R\hat{\beta}_{\text{GLS}} - c),$$

which is distributed under $H_0$ as $F(J, n-K)$.

[7] This implies that all the standard hypothesis test principles of OLS apply to model (5).


Example 1 (Heteroskedasticity) Since $\sigma^2\Omega = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, it follows that $\sigma^2\Omega = \sigma^2\operatorname{diag}(\sigma_1^2/\sigma^2, \ldots, \sigma_n^2/\sigma^2)$, and therefore $P = \operatorname{diag}(\sigma/\sigma_1, \ldots, \sigma/\sigma_n)$, and model (5) becomes

$$\sigma_i^{-1}y_i = \sigma_i^{-1}x_i^{\top}\beta + \sigma_i^{-1}\varepsilon_i,$$

giving the GLS estimator[8]

$$\hat{\beta}_{\text{GLS}} = \left[\sum_{i=1}^n \sigma_i^{-2}\,x_i x_i^{\top}\right]^{-1} \sum_{i=1}^n \sigma_i^{-2}\,x_i y_i. \qquad (8)$$

[8] It is also called the Weighted Least Squares estimator, with weights $\sigma_i^{-2}$.

Example 2 (Serial Correlation) It can be shown that for an AR(1) process, the matrix \Omega^{-1} is

\Omega^{-1} = \begin{bmatrix}
1 & -\rho & 0 & \cdots & 0 \\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & -\rho & 1+\rho^2 & -\rho \\
0 & \cdots & 0 & -\rho & 1
\end{bmatrix}    (8)

and

P = \begin{bmatrix}
\sqrt{1-\rho^2} & 0 & \cdots & 0 & 0 \\
-\rho & 1 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & \cdots & -\rho & 1 & 0 \\
0 & \cdots & 0 & -\rho & 1
\end{bmatrix}.

Therefore,

y^* = Py = \begin{bmatrix} \sqrt{1-\rho^2}\, y_1 \\ y_2 - \rho y_1 \\ \vdots \\ y_T - \rho y_{T-1} \end{bmatrix}, \quad
x^* = Px = \begin{bmatrix} (1-\rho^2)^{1/2} x_1^\top \\ x_2^\top - \rho x_1^\top \\ \vdots \\ x_T^\top - \rho x_{T-1}^\top \end{bmatrix},    (9)

that is, apart from its first element, y^* is (almost) the (T-1) x 1 vector of partial differences \{y_t - \rho y_{t-1}\}_{t=2}^T.
Example 3 (Panel Data) Define W = I_{nT} - V, and note that VW = 0, and

Wy = \left[\, y_{11}-\bar{y}_1 \;\; \ldots \;\; y_{1T}-\bar{y}_1 \;\; \ldots \;\; y_{n1}-\bar{y}_n \;\; \ldots \;\; y_{nT}-\bar{y}_n \,\right]^\top.

In this case, it can be shown that the GLS transformation is

P = I_{nT} - (1-\theta)V = W + \theta V, where \theta = \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}.

In other words, model 5 becomes

y_{it} - (1-\theta)\bar{y}_i = [x_{it} - (1-\theta)\bar{x}_i]^\top \beta + \{\varepsilon_{it} - (1-\theta)\bar{\varepsilon}_i\}.    (10)

Feasible GLS

All the above results require knowledge of \Omega to be fully operational. In reality, we are hardly blessed with such information, and therefore GLS cannot be computed. Instead, we replace \Omega by \hat{\Omega} (a consistent estimator^9 of \Omega). Thus an estimator of \beta is:

\hat{\beta}_{FGLS} = (X^\top \hat{\Omega}^{-1} X)^{-1} X^\top \hat{\Omega}^{-1} y.    (11)

^9 Notice that \Omega has in general n(n+1)/2 + 1 unknown parameters. However, it is clear from the above examples that we can re-parameterize \Omega(\theta), where \theta is a small set of parameters, and therefore \hat{\Omega} = \Omega(\hat{\theta}).

Remark 5 We should also notice that 11 is no longer a linear estimator.

Remark 6 The finite sample properties of 11 are very difficult to evaluate. We will again rely on asymptotic approximations.

The following claim will show the asymptotic equivalence between \hat{\beta}_{FGLS} and \hat{\beta}_{GLS}:
Claim 1 Let assumptions A1, A2*, A3, A4*, A5* hold. Then a sufficient condition for \hat{\beta}_{FGLS} and \hat{\beta}_{GLS} to have the same asymptotic distribution is

n^{-1} X^\top (\hat{\Omega}^{-1} - \Omega^{-1}) X \to_p 0, and
n^{-1/2} X^\top (\hat{\Omega}^{-1} - \Omega^{-1}) \varepsilon \to_p 0.

Proof. Available upon request.


The above claim has very important implications:

Remark 7 Firstly, it can be shown that var(\hat{\beta}_{GLS}) in 7 is equal to var(\hat{\beta}_{MLE}) if assumption A6 also holds. Therefore, \hat{\beta}_{GLS} is efficient (Cramér-Rao bound), and since in the limit, as n \to \infty, \hat{\beta}_{FGLS} has the same asymptotic distribution as \hat{\beta}_{GLS}, it follows that \hat{\beta}_{FGLS} is asymptotically efficient.

Remark 8 Secondly, because \hat{\beta}_{FGLS} allows us to estimate \beta efficiently (in the limit only!), with or without knowledge of \Omega (a nuisance parameter), this estimator is called adaptive.
Example 1 (Heteroskedasticity) Under assumption A5*, it should be clear that \hat{\varepsilon}_i^2 should be informative about E[\varepsilon_i^2] = \sigma_i^2. Therefore, it can be shown that with a random sample \{y_i, x_i^\top, z_i^\top\}_{i=1}^n, the following algorithm provides a consistent estimator:

Algorithm 1 (Feasible Estimation: Heteroskedasticity)
(a) Regress y_i on x_i^\top and obtain the fitted residuals, i.e. \hat{\varepsilon}_i = y_i - x_i^\top \hat{\beta}.
(b) Using \{\hat{\varepsilon}_i\}_{i=1}^n, obtain a consistent estimate of \sigma_i^2:
    i. Parametric: Assume that \sigma_i^2 = h(z_i^\top \gamma), where h(\cdot) is a known function, e.g. h(u) = u, or h(u) = \exp(u). Then, using Non-Linear Least Squares, regress \hat{\varepsilon}_i^2 on z_i, obtain \hat{\gamma}, and set \hat{\sigma}_i^2 = h(z_i^\top \hat{\gamma}).
    ii. Nonparametric: Assume that \sigma_i^2 = m(z_i), where m(\cdot) is an unknown but smooth function. Then, using Nonparametric Regression Methods, regress \hat{\varepsilon}_i^2 on z_i, obtain \hat{m}(z_i), and set \hat{\sigma}_i^2 = \hat{m}(z_i).
(c) Replace \sigma_i^2 by \hat{\sigma}_i^2 in the Weighted Least Squares estimator of Example 1 above, as in the sketch below. If necessary, repeat till convergence.
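A minimal Stata sketch of the parametric branch of Algorithm 1, under the common choice h(u) = exp(u) (so fitted variances stay positive) and replacing the Non-Linear Least Squares step with OLS on the log of the squared residuals; all variable names (y, x1, x2, z1) are hypothetical placeholders, not from any data set used in these notes:

quietly regress y x1 x2               // step (a): OLS
predict ehat, residuals               //           fitted residuals
generate lne2 = ln(ehat^2)            // step (b): parametric, h(u) = exp(u)
quietly regress lne2 z1               //           OLS on the log scale
predict lns2hat, xb
generate s2hat = exp(lns2hat)         //           fitted sigma_i^2
regress y x1 x2 [aweight=1/s2hat]     // step (c): weighted least squares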


Example 2 (Serial Correlation) Again, assumption A5* means that \{\hat{\varepsilon}_t\}_{t=1}^T is informative about the unobserved process \{\varepsilon_t\}_{t=1}^T. When T is large, it can be shown that with a sample \{y_t, x_t^\top\}_{t=1}^T, the following algorithm (Cochrane-Orcutt) provides a consistent estimator:

Algorithm 2 (Feasible Estimation: Serial Correlation)
(a) Regress y_t on x_t^\top and obtain the fitted residuals, i.e. \hat{\varepsilon}_t = y_t - x_t^\top \hat{\beta}.
(b) Using \{\hat{\varepsilon}_t\}_{t=1}^T, regress \hat{\varepsilon}_t on \hat{\varepsilon}_{t-1} and obtain a consistent estimate of \rho, i.e. \hat{\rho}.
(c) Replace \rho by \hat{\rho} in 9, that is, using \{y_t - \hat{\rho} y_{t-1}, (x_t - \hat{\rho} x_{t-1})^\top\}_{t=2}^T, regress y_t - \hat{\rho} y_{t-1} on (x_t - \hat{\rho} x_{t-1})^\top. If necessary, repeat till convergence.
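A hedged sketch of one Cochrane-Orcutt pass (the time variable t and the variables y and x are hypothetical placeholders); in practice Stata's prais command with the corc option automates the iteration:

tsset t
quietly regress y x                       // step (a)
predict ehat, residuals
quietly regress ehat L.ehat, noconstant   // step (b): rho-hat
scalar rhohat = _b[L.ehat]
generate ystar = y - rhohat*L.y           // step (c): partial differences
generate xstar = x - rhohat*L.x
regress ystar xstar
* prais y x, corc                         // built-in equivalent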
Example 3 (Panel Data) Firstly, notice that

y_{it} = x_{it}^\top \beta + \alpha_i + \varepsilon_{it},    (12)
\bar{y}_i = \bar{x}_i^\top \beta + \alpha_i + \bar{\varepsilon}_i,    (13)
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)^\top \beta + (\varepsilon_{it} - \bar{\varepsilon}_i),    (14)

and it should be clear that the fitted residuals from equation 14 are informative about \{\{\varepsilon_{it}\}_{t=1}^T\}_{i=1}^n only, while the fitted residuals from equation 13 are informative about both \{\alpha_i\}_{i=1}^n and \{\{\varepsilon_{it}\}_{t=1}^T\}_{i=1}^n. Therefore, again, the following algorithm can be shown to provide a consistent estimator:

Algorithm 3 (Feasible Estimation: Random Effects)
(a) Regress y_{it} on x_{it}^\top using the fixed effects estimator, i.e. regress y_{it} - \bar{y}_i on x_{it} - \bar{x}_i in 14 by least squares. Then obtain the fitted residuals, and construct

\hat{\sigma}_\varepsilon^2 = (nT - n - K)^{-1} \sum_{i=1}^n \sum_{t=1}^T [(y_{it} - \bar{y}_i) - (x_{it} - \bar{x}_i)^\top \hat{\beta}_{FE}]^2.

(b) Regress y_{it} on x_{it}^\top using the between estimator, i.e. regress \bar{y}_i on \bar{x}_i^\top in 13 by least squares. Then obtain the fitted residuals, and construct^10

\hat{\sigma}_\alpha^2 = (n - K - 1)^{-1} \sum_{i=1}^n [\bar{y}_i - \bar{x}_i^\top \hat{\beta}_B]^2 - \hat{\sigma}_\varepsilon^2 / T.

(c) Construct

\hat{\theta} = \sqrt{\frac{\hat{\sigma}_\varepsilon^2}{\hat{\sigma}_\varepsilon^2 + T\hat{\sigma}_\alpha^2}},

and replace \theta by \hat{\theta} in 10. Run the regression.

^10 Notice that \hat{\sigma}_\alpha^2 can be negative, in which case it is common practice to set \hat{\sigma}_\alpha^2 = 0.
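These three steps are what Stata's xtreg, re performs internally; a hedged sketch (the panel identifiers id and year, and the variables y and x, are hypothetical placeholders):

xtset id year
quietly xtreg y x, fe        // step (a): within (fixed-effects) regression
quietly xtreg y x, be        // step (b): between regression
xtreg y x, re theta          // steps (a)-(c): feasible GLS, reporting theta-hat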

3 Ordinary Least Squares

If assumptions A1-A4 and A5* hold, the OLS estimator is still unbiased and consistent, but inefficient.

Unbiased:
Recall that assumptions A1 and A3 imply \hat{\beta} = \beta + (X^\top X)^{-1} X^\top \varepsilon; then assumptions A2 and A4 imply

E[\hat{\beta}] = \beta + (X^\top X)^{-1} X^\top E[\varepsilon] = \beta.

Inefficient:
It can be shown that

var(\hat{\beta}) = \sigma^2 (X^\top X)^{-1} X^\top \Omega X (X^\top X)^{-1}.    (15)

In order to show that var(\hat{\beta}) - var(\hat{\beta}_{GLS}) \ge 0, it is sufficient to show

X^\top \Omega^{-1} X - (X^\top X)(X^\top \Omega X)^{-1}(X^\top X) \ge 0
X^\top [\Omega^{-1} - X(X^\top \Omega X)^{-1} X^\top] X \ge 0
X^\top P^\top [I_n - (P^\top)^{-1} X (X^\top \Omega X)^{-1} X^\top P^{-1}] P X \ge 0
X^\top P^\top [I_n - (P^\top)^{-1} X [((P^\top)^{-1} X)^\top ((P^\top)^{-1} X)]^{-1} ((P^\top)^{-1} X)^\top] P X \ge 0,

and the last inequality is true because

[I_n - (P^\top)^{-1} X [((P^\top)^{-1} X)^\top ((P^\top)^{-1} X)]^{-1} ((P^\top)^{-1} X)^\top]

is an idempotent matrix.
Consistency:
In the presence of stochastic regressors such that

Q_n := n^{-1} X^\top X \to_p Q > 0,

it can be shown^11 that \hat{\beta} \to_p \beta.

^11 See Greene [2012, Section 9.2.1, pg. 259-261].
Corrections to OLS:
Since OLS is still consistent and unbiased, we could still use it in a given empirical study, provided we are willing to accept the loss of efficiency. However, \widehat{var}(\hat{\beta}) = s^2 (X^\top X)^{-1} is no longer a consistent estimator of var(\hat{\beta}). The following procedures can be implemented:

(a) Heteroskedasticity (Eicker-White): Under quite general conditions, White [1980] showed that

n^{-1} \sum_{i=1}^n [\hat{\varepsilon}_i^2 - \sigma_i^2]\, x_i x_i^\top \to_p 0,
where \{\hat{\varepsilon}_i\}_{i=1}^n are the OLS fitted residuals. Therefore, a consistent estimator of 15 is

\widehat{var}(\hat{\beta}) = (X^\top X)^{-1} X^\top diag(\hat{\varepsilon}_1^2, \ldots, \hat{\varepsilon}_n^2) X (X^\top X)^{-1}.
Example 4 Suppose we have found evidence of heteroskedasticity and would like to correct the OLS standard errors. There are two ways to do this in Stata: (i) the regress command plus the robust option; or (ii) the regress command plus the vce(robust) option. The following shows both options:
. reg profit revenue emp

      Source |       SS       df       MS              Number of obs =   13164
-------------+------------------------------           F(  2, 13161) = 2781.37
       Model |  8.7206e+14     2  4.3603e+14           Prob > F      =  0.0000
    Residual |  2.0632e+15 13161  1.5677e+11           R-squared     =  0.2971
-------------+------------------------------           Adj R-squared =  0.2970
       Total |  2.9353e+15 13163  2.2299e+11           Root MSE      = 4.0e+05

------------------------------------------------------------------------------
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0013408    66.78   0.000     .0869093    .0921658
         emp |  -10.55602    .424665   -24.86   0.000    -11.38842    -9.72361
       _cons |   1516.313   3619.056     0.42   0.675    -5577.559    8610.184
------------------------------------------------------------------------------

. reg profit revenue emp, robust

Linear regression                                      Number of obs =   13164
                                                       F(  2, 13161) =  158.28
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      = 4.0e+05

------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0050334    17.79   0.000     .0796714    .0994037
         emp |  -10.55602   1.192198    -8.85   0.000    -12.89289   -8.219136
       _cons |   1516.313   2450.219     0.62   0.536    -3286.469    6319.094
------------------------------------------------------------------------------

. reg profit revenue emp, vce(robust)

Linear regression                                      Number of obs =   13164
                                                       F(  2, 13161) =  158.28
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      = 4.0e+05

------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0050334    17.79   0.000     .0796714    .0994037
         emp |  -10.55602   1.192198    -8.85   0.000    -12.89289   -8.219136
       _cons |   1516.313   2450.219     0.62   0.536    -3286.469    6319.094
------------------------------------------------------------------------------

(b) Serial Correlation (Newey-West)^12: Newey and West [1987] proposed the following Heteroskedasticity and Autocorrelation Consistent (HAC) estimator of the covariance matrix 15:

\widehat{var}(\hat{\beta}) = \left[\sum_{t=1}^T x_t x_t^\top\right]^{-1} S_T \left[\sum_{t=1}^T x_t x_t^\top\right]^{-1},

S_T = \sum_{t=1}^T \hat{\varepsilon}_t^2 x_t x_t^\top + \sum_{s=1}^{p}\sum_{t=s+1}^T w_p(s)\, \hat{\varepsilon}_t \hat{\varepsilon}_{t-s} [x_t x_{t-s}^\top + x_{t-s} x_t^\top],

where w_p(s) are weights, e.g. Bartlett weights w_p(s) = 1 - s/(p+1). The implementation of this estimator requires us to select^13 p.

^12 This is what the STATA command newey implements.
^13 This corresponds to selecting the option lag() in the STATA command newey.
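A hedged usage sketch (the time variable t and the variables y and x are hypothetical placeholders); lag() is the choice of p:

tsset t
newey y x, lag(4)     // HAC (Newey-West) standard errors with p = 4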

(c) Random Effects: To obtain the correct standard errors for the OLS estimates, the cluster(varname) or vce(cluster varname) options for the regress command can be used. Both options do exactly the same correction to the OLS standard errors. The cluster option adjusts standard errors to allow for within-group correlation. The variable given by varname provides the categorical or group variable by which to cluster observations. For the random effects model, the within-group correlation is at the individual or firm level, so the cluster option uses a firm or individual identifier variable. However, alternative random effect correlation structures are possible. For example, firms within an industry may experience some correlation. In this situation, the cluster option uses an industry identifier variable.
. reg profit revenue emp, cluster(id)

Linear regression                                      Number of obs =   13164
                                                       F(  2,  2857) =   41.97
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      = 4.0e+05

                                  (Std. Err. adjusted for 2858 clusters in id)
------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0098886     9.05   0.000      .070148    .1089272
         emp |  -10.55602    1.96084    -5.38   0.000    -14.40082    -6.71121
       _cons |   1516.313   4771.025     0.32   0.751    -7838.687    10871.31
------------------------------------------------------------------------------

. reg profit revenue emp, cluster(naics)

Linear regression                                      Number of obs =   13164
                                                       F(  2,   199) =   16.58
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2971
                                                       Root MSE      = 4.0e+05

                                (Std. Err. adjusted for 200 clusters in naics)
------------------------------------------------------------------------------
             |               Robust
      profit |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     revenue |   .0895376   .0156597     5.72   0.000     .0586573    .1204179
         emp |  -10.55602   2.571695    -4.10   0.000    -15.62729   -5.484744
       _cons |   1516.313   6387.663     0.24   0.813    -11079.88    14112.51
------------------------------------------------------------------------------

Non-Spherical Disturbances: Detection


Tests designed to detect heteroskedasticity and serial correlation
will, in most cases, be applied to the OLS residuals. This is the case
because they are based on the principle that if the true disturbances
are heteroskedastic or correlated, then the fitted residuals will show
evidence of this. In some cases, a simple visual inspection of the
residuals can help us choose the appropriate alternative.

Heteroskedasticity
The choice of the most appropriate test for heteroskedasticity is determined by how explicit we want to be about the form of heteroskedasticity. In general, the more explicit we are, the more powerful the test will be.
Breusch-Pagan/Godfrey Test
This is a Lagrange Multiplier test for the hypothesis

H_0: \sigma_i^2 = \sigma^2,
H_A: \sigma_i^2 = \sigma^2 f(\gamma_0 + \gamma^\top z_i).

The test is equivalent to testing \gamma = 0, but it does not require us to specify the unknown, continuously differentiable function f(\cdot) at all. The simplest variant of the Breusch-Pagan test can be computed by multiplying the number of observations with the un-centered R^2 in the auxiliary regression of \hat{\varepsilon}_i^2/\hat{\sigma}^2 - 1 on z_i alone (without a constant), where \hat{\sigma}^2 = n^{-1}\sum_{i=1}^n \hat{\varepsilon}_i^2. The resulting test is asymptotically chi-squared with degrees of freedom equal to the number of variables in z_i under H_0.
Example 5 Suppose we are interested in testing for the presence of heteroskedasticity in our Canadian firm-level data. In Stata, the Breusch-Pagan test is a post-regression diagnostic implemented with the following set of commands:

. quietly reg profit revenue emp
. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of profit

         chi2(1)      = 93143.73
         Prob > chi2  =   0.0000

. estat hettest, iid

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of profit

         chi2(1)      =    81.72
         Prob > chi2  =   0.0000

. estat hettest revenue emp, iid

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: revenue emp

         chi2(2)      =   105.20
         Prob > chi2  =   0.0000

The iid option of the estat hettest command calculates the nR^2 statistic from the auxiliary version of the test. The estat hettest command also allows for a choice of the variables, z, affecting the heteroskedasticity. Without specifying any variables, the default is to use the fitted values of the regression as the z variable.
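The nR^2 variant can also be computed by hand along the lines of the auxiliary regression described above; a hedged sketch with the same firm-level variables, which should essentially reproduce estat hettest revenue emp, iid (up to the exact variant implemented):

quietly regress profit revenue emp
predict ehat, residuals
generate e2 = ehat^2
quietly summarize e2
generate g = e2/r(mean) - 1               // ehat_i^2/sigma2-hat - 1
quietly regress g revenue emp, noconstant
display "n x R2 = " e(N)*e(r2)            // R2 is un-centered under noconstant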
White Test
This test does not require additional structure on the alternative hypothesis. Specifically, it tests the general hypothesis of the form

H_0: \sigma_i^2 = \sigma^2,
H_A: Not H_0.

A simple operational version of this test is carried out by computing nR^2 in the regression of \hat{\varepsilon}_i^2 on a constant and all (unique) first moments, second moments, and cross-products of the original regressors. The test is asymptotically distributed as chi-squared with degrees of freedom equal to the number of regressors in the auxiliary regression, excluding the intercept.
Example 6 We now apply the White test instead. Besides manually calculating the test statistic, Stata allows two ways to perform the White test. The first method uses estat imtest with the white option specified. The second method uses estat hettest with the iid option, letting the z variables in the auxiliary regression equal all the right-hand-side variables along with their products and cross-products:

. gen rev2=revenue*revenue
. gen emp2=emp*emp
. gen revemp=revenue*emp
. quietly reg profit revenue emp
. estat imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(5)      =    131.21
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df          p
---------------------+-----------------------------
  Heteroskedasticity |     131.21      5     0.0000
            Skewness |  -2.15e+09      2     1.0000
            Kurtosis |  -4.78e+25      1     1.0000
---------------------+-----------------------------
               Total |  -4.78e+25      8     1.0000
---------------------------------------------------

. estat hettest revenue emp rev2 emp2 revemp, iid

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: revenue emp rev2 emp2 revemp

         chi2(5)      =   131.21
         Prob > chi2  =   0.0000

Other tests can also be found in the literature, e.g. see Johnston
and DiNardo [1997, Sections 6.2.3 and 6.2.4, pg. 168-169].

Serial Correlation
In order to illustrate these tests, we will fabricate a data set^14 \{y_t, x_t\}_{t=1}^{36}, where

\varepsilon_t = \rho \varepsilon_{t-1} + u_t,
x_t = \nu_t + \psi \nu_{t-1},
y_t = \beta_0 + x_t \beta_1 + \varepsilon_t,

with \{u_t\}_{t=1}^{36} and \{\nu_t\}_{t=1}^{36} i.i.d. N(0,1), \beta_0 = 0, \beta_1 = 1, \rho = -0.9, and \psi = 0.9.

^14 A data set created using the computer's pseudo-random number generation capabilities.

The code to generate this data set is:

clear
set obs 37
gen t=_n-1
set seed 21324
gen u=rnormal()
gen v=rnormal()
tsset t
gen x=v+0.9*l.v                 // MA(1) regressor
drop if t==0
gen e=u if t==1
replace e=-.9*l.e+u if t>1      // AR(1) disturbance with rho = -0.9
gen y=x+e

Durbin-Watson Test: AR(1)
Suppose that in the model y_t = x_t^\top \beta + \varepsilon_t one suspects that the disturbance follows an AR(1) process, namely

\varepsilon_t = \rho \varepsilon_{t-1} + u_t.

The null hypothesis of zero correlation is then

H_0: \rho = 0,
H_A: \rho \ne 0.

A diagnostic test for autocorrelation called the Durbin-Watson test is:

d = \frac{\sum_{t=2}^T (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^T \hat{\varepsilon}_t^2},

which for large T gives

d \simeq 2(1 - \hat{\rho}),

where \{\hat{\varepsilon}_t\}_{t=1}^T are the OLS fitted residuals and \hat{\rho} = \sum_{t=2}^T \hat{\varepsilon}_t \hat{\varepsilon}_{t-1} / \sum_{t=2}^T \hat{\varepsilon}_t^2, i.e. the coefficient in the OLS regression of \{\hat{\varepsilon}_t\}_{t=2}^T on \{\hat{\varepsilon}_{t-1}\}_{t=2}^T.

Because any computed d value depends on the associated X matrix, exact critical values of d cannot be tabulated. Instead, Durbin and Watson established upper (d_U) and lower (d_L) bounds for the critical values^15.

^15 These bounds only depend on the sample size and the number of regressors. See Greene [2012, Section 20.7.3, pg. 923] for a description of how to perform the test.
. reg y x

      Source |       SS       df       MS              Number of obs =      36
-------------+------------------------------           F(  1,    34) =   16.57
       Model |  109.129232     1  109.129232           Prob > F      =  0.0003
    Residual |  223.877824    34  6.58464187           R-squared     =  0.3277
-------------+------------------------------           Adj R-squared =  0.3079
       Total |  333.007055    35  9.51448729           Root MSE      =  2.5661

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.320137   .3242759     4.07   0.000     .6611294    1.979145
       _cons |   .2189512   .4313634     0.51   0.615    -.6576847    1.095587
------------------------------------------------------------------------------

. estat dwatson

Durbin-Watson d-statistic(  2,    36) =  3.790313

There are two important issues to be considered when using this test:

(A1) It is necessary to include a constant term in the regression.
(A2) It is invalid when X is stochastic, i.e. it is not applicable when lagged values of the dependent variable appear among the regressors.

See Johnston and DiNardo [1997, Section 6.6.3, pg. 182-184] and Greene [2012, Section 20.7.4, pg. 923-924] for a discussion about how to perform the test when the regression contains a lagged dependent variable.

Breusch-Godfrey Test
This is a Lagrange multiplier test of

H_0: No Autocorrelation,
H_A: \{\varepsilon_t\}_{t=1}^T \sim AR(p), or \{\varepsilon_t\}_{t=1}^T \sim MA(p).

Again, suppose that in the model y_t = x_t^\top \beta + \varepsilon_t one suspects that the disturbance follows an AR(1) process, namely \varepsilon_t = \rho \varepsilon_{t-1} + u_t. Stata calculates the Breusch-Godfrey test statistic with the post-regress estimation command estat bgodfrey. Alternatively, this Lagrange multiplier test can be applied as follows:

Algorithm 4
Step 1 Regress \{y_t\}_{t=1}^T on \{x_t^\top\}_{t=1}^T by OLS, and obtain the fitted residuals \{\hat{\varepsilon}_t\}_{t=1}^T.
Step 2 Regress \{\hat{\varepsilon}_t\}_{t=1}^T on \{1, x_t^\top, \hat{\varepsilon}_{t-1}\} and find R^2.
Step 3 Then, TR^2 \to_d \chi^2(1) under H_0.

. quietly reg y x
. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |         30.871               1                   0.0000
---------------------------------------------------------------------------
                        H0: no serial correlation

. predict ehat, resid
. quietly reg ehat x l.ehat
. dis 36*e(r2)
31.293603

Remark 9 This procedure can be extended to testing for higher orders of autocorrelation. In particular, for general p, one simply adds further-lagged fitted residuals^16 to the regressors in Step 2.

^16 We must fill in missing values for lagged residuals with zeroes.

Remark 10 A remarkable feature of this test is that it also tests against the alternative hypothesis of an MA(p) process for the disturbance.

Remark 11 See Greene [2012, Section 20.7.1, pg. 922] and Johnston and DiNardo [1997, Section 6.6.4, pg. 185] for more information.

Other popular tests are Box and Pierce's test and Ljung's refinement^17, e.g. Greene [2012, Section 20.7.2, pg. 922-923] and Johnston and DiNardo [1997, Section 6.6.5, pg. 187].

Further Reading
Chapter 9, and Sections 20.1-20.9 (pg. 903-930) in Greene [2012].
Chapter 6 in Johnston and DiNardo [1997].
Sections 1.6, 2.5-2.8, and 2.10 in Hayashi [2000].
Chapters 27 and 28 in Goldberger [1991].
Section 7 in Davidson and MacKinnon [2004].

^17 Caution: These tests are designed to work with \varepsilon_t, and not with \hat{\varepsilon}_t; the effect of having estimated residuals rather than the series itself is unknown.

Chapter 3 Quantile Regression

Introduction
The classical linear regression model can be thought of as a model for the conditional mean of y given a set of covariates x. That is, under the classical set of assumptions A, we have

E[y|x] = x^\top \beta,

and therefore inference and testing are defined around conditional average responses. Koenker and Bassett (1978), in "Regression Quantiles," Econometrica, vol. 46, 33-50, proposed modeling conditional quantile responses instead. This approach has become very popular in recent years. This popularity has been partly driven by the wide range of policy-related questions quantile regression can answer.

Quantiles and Optimization

Any real-valued random variable Y may be characterized by its (right-continuous) distribution function

F_Y(y) = \Pr\{Y \le y\},

whereas for any 0 < \tau < 1, the \tau-th quantile^18 of Y is defined as

F_Y^{-1}(\tau) = \inf\{y \mid F_Y(y) \ge \tau\}.

^18 You may already be familiar with the 0.5-th quantile, i.e. the median.

Given a random sample \{y_i\}_{i=1}^n, the \tau-th sample quantile, \hat{y}_\tau, can be thought of as the solution

\hat{y}(\tau) = \arg\min_{\xi \in R} \sum_{i=1}^n \rho_\tau(y_i - \xi), where \rho_\tau(u) = u(\tau - I(u < 0)).

Remark 12 Unlike the least squares approach, where \bar{y} = \arg\min_{\xi \in R} \sum_{i=1}^n (y_i - \xi)^2 is unique by the projection theorem, \hat{y}(\tau) may be unique or represent an interval of \tau-th quantiles^19.

^19 In this case, the smallest element must be chosen to adhere to the convention that the empirical quantile function is left-continuous.
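Since the \tau-th sample quantile is the minimizer above, running qreg on a constant alone recovers it; a quick hedged sketch (the variable name y is hypothetical):

quietly qreg y, quantile(.25)
display _b[_cons]          // the 0.25-th sample quantile
centile y, centile(25)     // should agree, up to the interval issue in Remark 12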


Quantile Regression
In the same way, the fact that the sample mean solves the problem

\min_{\mu \in R} \sum_{i=1}^n (y_i - \mu)^2

suggests that, if we are willing to express the conditional mean of y given x as \mu(x) = x^\top \beta_0, then \beta_0 may be estimated by solving

\min_{\beta \in R^K} \sum_{i=1}^n (y_i - x_i^\top \beta)^2.

Similarly, since the \tau-th sample quantile solves the problem

\min_{\xi \in R} \sum_{i=1}^n \rho_\tau(y_i - \xi),

if we specify the \tau-th conditional quantile function of y given x as Q_y(\tau|x) = x^\top \beta_0, then \beta_0 may be estimated by solving

\min_{\beta \in R^K} \sum_{i=1}^n \rho_\tau(y_i - x_i^\top \beta).    (16)

Remark 13 The objective function 16 is NOT differentiable and can be re-formulated as a linear programme. Therefore, standard methods and asymptotic theory are not directly applicable.

Remark 14 For each \tau-th quantile of y, an associated \beta(\tau) can be estimated, i.e. the covariates could have different effects at different quantiles of the distribution of y.
We are interested in analyzing the medical expenditures (ltotexp) of individuals 65 years and older who qualify for health care under the U.S. Medicare program. The original data source is the MEPS; see Cameron and Trivedi [2010]. In particular, we are interested in how an indicator for supplementary private insurance (suppins), one health-status variable (totchr), and three sociodemographic variables (age, female, white) affect medical expenditures at its different conditional quantiles.

. use mus03data.dta, clear
. drop if ltotexp==.
(109 observations deleted)
. keep ltotexp suppins totchr age female white

Given a sample \{y_i, x_i^\top\}_{i=1}^n from the joint density f_{yx} = f_{y|x} f_x, where x \in \mathcal{X} \subset R^k, suppose we specify the \tau-th conditional quantile function as

Q_y(\tau|x) = x^\top \beta_0(\tau).

This can be understood as coming from the following linear model

y = x^\top \beta_{0,\tau} + u, where Q_u(\tau|x) = 0.

We will denote F_{u|x} the conditional cumulative distribution function of u given x, which we assume is continuously differentiable with conditional density f_{u|x}.^20 Then under suitable conditions \beta_{0,\tau} is identified as the minimizer of

Q_\tau(\beta) := E[\rho_\tau(y - x^\top \beta)],

where \rho_\tau(u) := u(\tau - I(u < 0)) is the check function introduced earlier on; we have also used the short-hand notation \beta_\tau := \beta(\tau) and \beta_{0,\tau} := \beta_0(\tau). The quantile regression estimator of \beta_{0,\tau} is defined as

\hat{\beta}_\tau := \arg\min_\beta E_n[\rho_\tau(y - x^\top \beta)].

^20 Notice that by construction \tau = F_{u|x}(Q_u(\tau|x)) = F_{u|x}(0).

Firstly, notice we can rewrite^21

\rho_\tau(y - x^\top\beta) = (y - x^\top\beta)[\tau I(y - x^\top\beta \ge 0) - (1-\tau) I(y - x^\top\beta < 0)]
    = (y - x^\top\beta)[\tau - I(y - x^\top\beta < 0)].    (17)

^21 Because \rho_\tau(u) = u(\tau - I(u < 0)) = u\tau I(u \ge 0) - u(1-\tau)I(u < 0).

Define the score function^22

s_\tau(\beta) = x[\tau - I(y - x^\top\beta < 0)].

^22 This can be thought of as the derivative of (17) with respect to \beta while treating I(y - x^\top\beta < 0) as a constant.

Notice that

E[s_\tau(\beta)|x] = x[\tau - E[I(y - x^\top\beta < 0)|x]]
    = x[\tau - E[I(y - x^\top\beta_{0,\tau} - x^\top(\beta - \beta_{0,\tau}) < 0)|x]]
    = x[\tau - E[I(u < x^\top(\beta - \beta_{0,\tau}))|x]]
    = x[\tau - F_{u|x}(x^\top(\beta - \beta_{0,\tau}))],

and therefore E[s_\tau(\beta_{0,\tau})|x] = 0 by construction. Now define

\frac{\partial}{\partial \beta^\top} E[s_\tau(\beta)|x] = -f_{u|x}(x^\top(\beta - \beta_{0,\tau}))\, x x^\top.

Finally, one can show that^23

H_0 = -E[f_{u|x}(0)\, x x^\top],

and that by choosing s_{0,n} = E_n[s_\tau(\beta_{0,\tau})], the following asymptotic result holds:

\sqrt{n}(\hat{\beta}_\tau - \beta_{0,\tau}) \to_d N(0, \tau(1-\tau)\, E[f_{u|x}(0) x x^\top]^{-1} E[x x^\top] E[f_{u|x}(0) x x^\top]^{-1}).

^23 The intuition here is that since E[s_\tau(\beta)] can be thought of as the first derivative of Q_\tau(\beta), then \partial E[s_\tau(\beta)]/\partial\beta^\top can be thought of as the second derivative of Q_\tau(\beta).

Remark 15 If u is independent of x, then Asy. Var(\sqrt{n}(\hat{\beta}_\tau - \beta_{0,\tau})) simplifies to \tau(1-\tau) f_u(0)^{-2} E[x x^\top]^{-1}. In any case, the construction of asymptotically consistent standard errors is cumbersome because of the presence of the unknown (conditional) density of u (given x). See Wooldridge [2010, pp. 456-457] for an accessible account.

Remark 16 The coefficient estimates \hat{\beta}_\tau have interpretations (almost) like those in least squares regression.^24

^24 http://www.ats.ucla.edu/stat/stata/faq/quantreg.htm
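Because of the unknown density f_{u|x}(0) in the asymptotic variance, bootstrapped standard errors are a common practical alternative; a hedged sketch using the MEPS variables already loaded:

bsqreg ltotexp suppins totchr age female white, quantile(.5) reps(400)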

. global x suppins totchr age female white
.
. quietly regress ltotexp $x                //Performing OLS regression
. estimates store OLS
. quietly qreg ltotexp $x, quantile(.25)    //Performing QR @ tau=0.25
. estimates store QR_25
. quietly qreg ltotexp $x, quantile(.50)    //Performing QR @ tau=0.50
. estimates store QR_50
. quietly qreg ltotexp $x, quantile(.75)    //Performing QR @ tau=0.75
. estimates store QR_75
.
. *--Comparison of estimates across different quantiles
. estimates table OLS QR_25 QR_50 QR_75, b(%7.3f) se

------------------------------------------------------
    Variable |   OLS      QR_25     QR_50     QR_75
-------------+----------------------------------------
     suppins |   0.257     0.386     0.277     0.149
             |   0.046     0.055     0.047     0.060
      totchr |   0.445     0.459     0.394     0.374
             |   0.018     0.022     0.018     0.022
         age |   0.013     0.016     0.015     0.018
             |   0.004     0.004     0.004     0.005
      female |  -0.077    -0.016    -0.088    -0.122
             |   0.046     0.054     0.047     0.060
       white |   0.318     0.338     0.499     0.193
             |   0.141     0.166     0.143     0.182
       _cons |   5.898     4.748     5.649     6.600
             |   0.296     0.363     0.300     0.381
------------------------------------------------------
                                        legend: b/se

Let \beta^* := [\beta(\tau_1)^\top, \ldots, \beta(\tau_m)^\top]^\top; then null hypotheses of the form

H_0: R\beta^* = r

can be tested by the Wald approach. In particular, H_0: \beta(0.25) = \beta(0.75) can be tested using the simultaneous quantile regression command sqreg followed by the command test; see e.g. Section 7.3.7 in Cameron and Trivedi (2010).

Figure 9: This graph was obtained using the user-written grqreg command.

Chapter 4 Binary Choice Models

A Binary Choice Model is a type of qualitative response model that belongs to the more general class of models in which the dependent variable takes only a limited range of values. The latter are called Limited Dependent Variable Models (LDV). LDV models include discrete responses: binary responses taking on 0 or 1, and ordered discrete responses taking ordered categories 1, \ldots, J. LDV models also include those with non-discrete responses, such as censored or truncated responses.

Introduction
We assume that there is an unobserved (possibly vector-valued) latent variable y^* that we wish to model based on an observed vector-valued set of covariates x. An observed (possibly vector-valued) response y is observed instead, which can be written as

y = \tau(y^*),

where \tau(\cdot) is assumed to be known, and is in general non-invertible.^25

^25 For scalar y^*:
\tau(u) = I(u > 0)
\tau(u) = \max\{u, 0\}
\tau(u) = u iff u > 0
For bivariate y^* = [y_1^*, y_2^*]^\top:
\tau(u_1, u_2) = u_2 I(u_1 > 0)
\tau(u_1, u_2) = I(u_1 > 0) I(u_2 > 0)
\tau(u_1, u_2) = \max\{u_2, 0\} I(u_1 > 0)
\tau(u_1, u_2) = \max\{u_1, u_2\}

Conditional on x, the stochastic nature of y^* is assumed to be driven by an unobserved (possibly vector-valued) latent error \varepsilon with conditional CDF F_{\varepsilon|x} and absolutely continuous conditional pdf f_{\varepsilon|x}. For scalar-valued y^*, we assume a linear model

y^* = x^\top \beta + \varepsilon.

For multivariate y^* = [y_1^*, \ldots, y_J^*]^\top, we will assume J (possibly distinct) linear models y_j^* = x_j^\top \beta_j + \varepsilon_j, j = 1, \ldots, J.

The practitioner is assumed to have a random sample \{y_i^\top, x_i^\top\}_{i=1}^n. In this set of lecture notes, we will use data on supplementary health insurance coverage. This sample comes from the 2002 wave of the Health & Retirement Study (HRS). The sample is restricted to Medicare beneficiaries. The elderly can purchase supplementary

private insurance coverage (ins) from many sources. Explanatory variables include own retirement status (retire) and spouse's retirement status (sretire), age (age), self-assessed health status (hstatusg) reported as good, very good or excellent, household income (hhincome), years of education (educyear), a marriage dummy (married), ethnicity dummies (hisp and white), a gender dummy (female), activities of daily living (adl), and the total number of chronic conditions (chronic).

Binomial Discrete Response

This type of LDV can be written as^26

y = \tau(y^*) := \begin{cases} 1 & y^* > 0; \\ 0 & y^* \le 0. \end{cases}

^26 Rewrite \tau(y^*) = I(y^* > 0), and notice that I(y^* > 0) = I((y^* + q)/\sigma > q/\sigma) for any \sigma > 0 and q > 0. Now define \tilde{y}^* := (y^* + q)/\sigma, and notice that \tau(y^*) and \tilde{\tau}(\tilde{y}^*) := I(\tilde{y}^* > q/\sigma) are observationally equal, i.e. both will produce the same y. So, from the model point of view, x^\top\beta + \varepsilon = \beta_0 + x_1\beta_1 + \ldots + x_k\beta_k + \varepsilon is observationally equivalent to the model x^\top\tilde{\beta} + \tilde{\varepsilon} = (\beta_0 + q)/\sigma + x_1(\beta_1/\sigma) + \ldots + x_k(\beta_k/\sigma) + \varepsilon/\sigma. Therefore F_{\tilde{\varepsilon}|x}(\cdot) = F_{\varepsilon|x}(\cdot) and \beta = \tilde{\beta} iff there is a location (q = 0) and scale (\sigma = 1) normalization.

Therefore, one can write

\Pr\{y = 1|x\} = \Pr\{y^* > 0|x\} = \Pr\{x^\top\beta + \varepsilon > 0|x\}
    = \Pr\{\varepsilon > -x^\top\beta|x\} = 1 - \Pr\{\varepsilon \le -x^\top\beta|x\}
    = 1 - F_{\varepsilon|x}(-x^\top\beta),

and consequently \Pr\{y = 0|x\} = F_{\varepsilon|x}(-x^\top\beta).^27 Similarly, notice that

E[y|x] = 1 \cdot \Pr\{y = 1|x\} + 0 \cdot \Pr\{y = 0|x\} = \Pr\{y = 1|x\}.

^27 Notice that if \varepsilon has a symmetric pdf,
\Pr\{y = 1|x\} = F_{\varepsilon|x}(x^\top\beta),
\Pr\{y = 0|x\} = 1 - F_{\varepsilon|x}(x^\top\beta).

The log-likelihood function is

L_n(\beta) := \ln L(\beta|y, X) = \sum_{i=1}^n \left\{ y_i \ln[1 - F_{\varepsilon|x}(-x_i^\top\beta)] + (1 - y_i)\ln F_{\varepsilon|x}(-x_i^\top\beta) \right\}.

Notice that if \varepsilon has a symmetric pdf, then after defining q_i := 2y_i - 1 one can rewrite L_n(\beta) as

L_n(\beta) = \frac{1}{n}\sum_{i=1}^n \ln F_{\varepsilon|x}(q_i x_i^\top\beta).

Therefore

\frac{\partial L_n(\beta)}{\partial \beta} = \frac{1}{n}\sum_{i=1}^n \frac{q_i f_{\varepsilon|x}(q_i x_i^\top\beta)}{F_{\varepsilon|x}(q_i x_i^\top\beta)} x_i =: \frac{1}{n}\sum_{i=1}^n g(y_i, x_i^\top\beta)\, x_i,

\frac{\partial^2 L_n(\beta)}{\partial \beta \partial \beta^\top} = \frac{1}{n}\sum_{i=1}^n x_i \frac{\partial g(y_i, x_i^\top\beta)}{\partial \beta^\top}.

Once one chooses a specific functional form for F_{\varepsilon|x}, the (C)ML estimator can be defined as^28

\hat{\beta}_{(C)ML} = \arg\max_{\beta \in B} L_n(\beta).

^28 If F_{\varepsilon|x}(e) = \Phi(e) the (C)ML estimator is called the probit estimator, and if F_{\varepsilon|x}(e) = \Lambda(e) it is called the logit estimator.

Sufficient conditions can be provided to establish \hat{\beta}_{(C)ML} \to_P \beta, and its asymptotic normality, i.e.

Asy. Var(\sqrt{n}(\hat{\beta}_{(C)ML} - \beta)) = -\left\{ E\left[\frac{\partial^2 L_n(\beta)}{\partial\beta\partial\beta^\top}\right] \right\}^{-1}.

Name      F_{\varepsilon|x}(e)
-------   -------------------------------------------------------
Probit    \int_{-\infty}^e \phi(t)\,dt := \Phi(e)
Logit     \exp(e)/[1 + \exp(e)] := \Lambda(e)
Linear    0 if e < 0;  e if 0 \le e < 1;  1 if e \ge 1
Gumbel    \exp[-\exp(-e)]
Cloglog   1 - \exp[-\exp(e)]

. *--Loading the data set
. use mus14data.dta, clear
.
. *--Data preparation
. global xlist retire age hstatusg hhincome educyear married hisp
. generate linc = ln(hhinc)
(9 missing values generated)
.
. *--Estimating & Comparing various models
. quietly probit ins $xlist
. estimates store bprobit
. quietly logit ins $xlist
. estimates store blogit
. quietly cloglog ins $xlist
. estimates store bcloglog
. estimates table bprobit blogit bcloglog, t stats(N ll) b(%7.3f) stfmt(%8.2f)
-----------------------------------------------
    Variable | bprobit     blogit    bcloglog
-------------+---------------------------------
      retire |   0.118      0.197      0.136
             |    2.31       2.34       2.10
         age |  -0.009     -0.015     -0.010
             |   -1.29      -1.29      -1.19
    hstatusg |   0.198      0.312      0.250
             |    3.56       3.41       3.39
    hhincome |   0.001      0.002      0.001
             |    3.19       3.02       2.75
    educyear |   0.071      0.114      0.089
             |    8.34       8.05       8.23
     married |   0.362      0.579      0.464
             |    6.47       6.20       6.14
        hisp |  -0.473     -0.810     -0.726
             |   -4.28      -4.14      -4.14
       _cons |  -1.069     -1.716     -1.735
             |   -2.33      -2.29      -3.01
-------------+---------------------------------
           N |    3206       3206       3206
          ll | -1993.62   -1994.88   -2001.45
-----------------------------------------------
                                   legend: b/t
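The probit column above can also be reproduced by coding the log-likelihood from the previous section by hand with Stata's ml suite (method lf); a hedged sketch, where the program name myprobit_lf is a hypothetical choice:

program define myprobit_lf
        args lnf xb
        quietly replace `lnf' = ln(normal(`xb'))  if $ML_y1 == 1   // y=1 term
        quietly replace `lnf' = ln(normal(-`xb')) if $ML_y1 == 0   // y=0 term
end
ml model lf myprobit_lf (ins = $xlist)
ml maximize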

Remark 17 For testing linear or non-linear hypotheses about the coefficients, the standard W, LM or LR tests are handy.

Remark 18 If \varepsilon|x has a logistic distribution, \sigma^2_{\varepsilon|x} = \pi^2/3, and if it has a standard normal distribution, \sigma^2_{\varepsilon|x} = 1; the logit estimates of \beta should be multiplied by \sqrt{3}/\pi to be comparable to the probit estimates.

Remark 19 Misspecification of F_{\varepsilon|x} due to heteroskedasticity, omitted variables, endogenous regressors, or a wrong functional form will usually yield inconsistent estimates of \beta. However, if the (C)ML estimator has a probability limit, then the asymptotic variance of these pseudo-parameters can be consistently estimated; see Greene (2012, p. 693).^29

^29 Greene writes: "But there is no guarantee that the [(C)ML] will converge to anything interesting or useful. Simply computing a robust covariance matrix for an otherwise inconsistent estimator does not give it redemption. Consequently, the virtue of a robust covariance matrix in this setting is unclear."

Marginal Effects
Let x := [x^{c\top}, x^{d\top}]^\top, where x^c and x^d denote the sub-vectors containing the continuous and discrete elements of x respectively. Let \beta^c and \beta^d be defined accordingly. Then

\frac{\partial \Pr\{y = 1|x\}}{\partial x^c} = \frac{\partial E[y|x]}{\partial x^c} = f_{\varepsilon|x}(x^\top\beta)\,\beta^c,

\Delta_\delta E[y|x] = E[y|x^c, x^d + \delta] - E[y|x^c, x^d]
    = \Pr\{y = 1|x^c, x^d + \delta\} - \Pr\{y = 1|x^c, x^d\}.

Calling \widehat{\partial E[y|x]/\partial x^c} := f_{\varepsilon|x}(x^\top\hat{\beta}_{(C)ML})\hat{\beta}^c_{(C)ML} and \widehat{\Delta}_\delta E[y|x] := F_{\varepsilon|x}(x^{c\top}\hat{\beta}^c_{(C)ML} + (x^d + \delta)^\top\hat{\beta}^d_{(C)ML}) - F_{\varepsilon|x}(x^\top\hat{\beta}_{(C)ML}), then by the delta method^30

Asy. Var(\sqrt{n}(\widehat{\partial E[y|x]/\partial x^c} - \partial E[y|x]/\partial x^c))
    = \left\{\frac{\partial f_{\varepsilon|x}(x^\top\beta)\beta^c}{\partial\beta^\top}\right\} Asy. Var(\sqrt{n}(\hat{\beta} - \beta)) \left\{\frac{\partial f_{\varepsilon|x}(x^\top\beta)\beta^c}{\partial\beta^\top}\right\}^\top,

Asy. Var(\sqrt{n}(\widehat{\Delta}_\delta E[y|x] - \Delta_\delta E[y|x]))
    = \left\{\frac{\partial \Delta_\delta E[y|x]}{\partial\beta^\top}\right\} Asy. Var(\sqrt{n}(\hat{\beta} - \beta)) \left\{\frac{\partial \Delta_\delta E[y|x]}{\partial\beta^\top}\right\}^\top.

^30 Notice that

\frac{\partial \Delta_\delta E[y|x]}{\partial\beta} = f_{\varepsilon|x}(x^{c\top}\beta^c + (x^d + \delta)^\top\beta^d)\begin{bmatrix} x^c \\ x^d + \delta \end{bmatrix} - f_{\varepsilon|x}(x^\top\beta)\begin{bmatrix} x^c \\ x^d \end{bmatrix}.
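The session below evaluates these formulas at a representative value via the at() option of margins. Average marginal effects over the estimation sample, the other common summary, are obtained simply by omitting at(); a hedged sketch:

quietly probit ins i.retire age i.hstatusg hhincome educyear i.married i.hisp
margins, dydx(*)        // average marginal effects over the sample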

. *------------------------------*
. * Calculating Marginal Effects *
. *------------------------------*
.
. *--Note: "i." operator indicates finite-difference method
.
. *--Marginal Effect @ a representative value
. quietly probit ins i.retire age i.hstatusg hhincome educyear i.married i.hisp
. margins, dydx(*) at(retire=1 age=75 hstatusg=1 hhincome=35 educyear=12 married=1 hisp=0) noatlegend

Conditional marginal effects                      Number of obs   =       3206
Model VCE    : OIM

Expression   : Pr(ins), predict()
dy/dx w.r.t. : 1.retire age 1.hstatusg hhincome educyear 1.married 1.hisp

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.retire |   .0460306   .0196708     2.34   0.019     .0074767    .0845846
         age |  -.0034912   .0026889    -1.30   0.194    -.0087614    .0017789
  1.hstatusg |    .076091    .020875     3.65   0.000     .0351767    .1170053
    hhincome |   .0004853   .0001516     3.20   0.001     .0001882    .0007825
    educyear |   .0278473   .0033257     8.37   0.000      .021329    .0343656
   1.married |   .1355362   .0199649     6.79   0.000     .0964057    .1746667
      1.hisp |  -.1728396   .0365783    -4.73   0.000    -.2445318   -.1011474
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. mata: dydx_probit_MER = st_matrix("r(b)")
. quietly logit ins i.retire age i.hstatusg hhincome educyear i.married i.hisp
. quietly margins, dydx(*) at(retire=1 age=75 hstatusg=1 hhincome=35 educyear=12 married=1 hisp=0) noatlegend
. mata: dydx_logit_MER = st_matrix("r(b)")
. quietly cloglog ins i.retire age i.hstatusg hhincome educyear i.married i.hisp
. quietly margins, dydx(*) at(retire=1 age=75 hstatusg=1 hhincome=35 educyear=12 married=1 hisp=0) noatlegend
. mata: dydx_cloglog_MER = st_matrix("r(b)")
. mata: dydx_MER = dydx_probit_MER', dydx_logit_MER', dydx_cloglog_MER'
. mata: dydx_MER[(2,3,5,6,7,9,11),1..3]
                   1              2              3
    +----------------------------------------------+
  1 |    .046030642    .0475583906    .0424805527  |
  2 |  -.0034912069   -.0035828756   -.0033310914  |
  3 |   .0760909676    .0744857137    .0756515623  |
  4 |   .0004853401     .000565483    .0002844106  |
  5 |   .0278472908    .0280488975    .0284452284  |
  6 |   .1355362013    .1331654045    .1326820986  |
  7 |  -.1728395842   -.1794231966   -.1925148858  |
    +----------------------------------------------+

Endogenous Probit Model

Consider the model

y^* = x^\top\beta + \gamma w + \varepsilon,
w = z^\top\delta + u,
\begin{bmatrix} \varepsilon \\ u \end{bmatrix} \Big| x, z \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho\sigma_u \\ \rho\sigma_u & \sigma_u^2 \end{bmatrix} \right).

Application of Theorem 5 yields the following log-likelihood function for the assumed sample:

L_n(\beta, \gamma, \delta, \sigma_u, \rho) := \frac{1}{n}\sum_{i=1}^n \ln \Phi\left[ (2y_i - 1)\frac{x_i^\top\beta + \gamma w_i + (\rho/\sigma_u)(w_i - z_i^\top\delta)}{\sqrt{1 - \rho^2}} \right]
    + \frac{1}{n}\sum_{i=1}^n \ln\left[ \frac{1}{\sigma_u}\phi\left(\frac{w_i - z_i^\top\delta}{\sigma_u}\right) \right].

Remark 20 Notice that all the parameters are identified by the form of the likelihood.

Theorem 5 If

\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right),

then the marginal distributions are z_1 \sim N(\mu_1, \Sigma_{11}) and z_2 \sim N(\mu_2, \Sigma_{22}), and the conditional distribution of z_1 given z_2 is normal as well, z_1|z_2 \sim N[\mu_{1.2}, \Sigma_{11.2}], where

\mu_{1.2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(z_2 - \mu_2),
\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.

Proof. See Greene [2012, Theorem B.7, pp. 1041-1042].

Remark 21 Similarly, notice that the likelihood has two components. The parameters in the second component, i.e. \delta and \sigma_u^2, can be identified and therefore estimated using the sample \{w_i, z_i^\top\}_{i=1}^n only. Let \tilde{\delta} and \tilde{\sigma}_u be the corresponding estimators; the log-likelihood in the second step will then be^32

L_n(\beta, \gamma, \tilde{\delta}, \tilde{\sigma}_u, \rho) := \frac{1}{n}\sum_{i=1}^n \ln \Phi\left[ (2y_i - 1)\frac{x_i^\top\beta + \gamma w_i + (\rho/\tilde{\sigma}_u)(w_i - z_i^\top\tilde{\delta})}{\sqrt{1 - \rho^2}} \right].

^32 You will need to correct the standard errors accordingly. See Wooldridge [2010, pp. 922-924].

The corresponding 2-step estimators of \beta, \gamma, and \rho will maximize L_n(\beta, \gamma, \tilde{\delta}, \tilde{\sigma}_u, \rho).
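Both approaches are available in Stata: the default ivprobit (used in the session below) is the joint MLE, while the twostep option implements a two-step estimator (Newey's minimum chi-squared, in the same spirit as, though not identical to, the plug-in likelihood above) with appropriately corrected standard errors; a hedged sketch:

ivprobit ins $xlist (linc = $zlist), twostep first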
. *------------------------------*
. *     Endogenous Regressors    *
. *------------------------------*
. global xlist female age age2 educyear married hisp white chronic adl hstatusg
. global zlist retire sretire

. probit ins linc $xlist, vce(robust) nolog

Probit regression                                 Number of obs   =      3197
                                                  Wald chi2(11)   =    366.94
                                                  Prob > chi2     =    0.0000
Log pseudolikelihood = -1933.4275                 Pseudo R2       =    0.0946

------------------------------------------------------------------------------
             |               Robust
         ins |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        linc |   .3466893   .0402173     8.62   0.000     .2678648    .4255137
      female |  -.0815374   .0508549    -1.60   0.109    -.1812112    .0181364
         age |   .1162879   .1151924     1.01   0.313     -.109485    .3420608
        age2 |  -.0009395   .0008568    -1.10   0.273    -.0026187    .0007397
    educyear |   .0464387   .0089917     5.16   0.000     .0288153    .0640622
     married |   .1044152   .0636879     1.64   0.101    -.0204108    .2292412
        hisp |  -.3977334   .1080935    -3.68   0.000    -.6095927   -.1858741
       white |  -.0418296   .0644391    -0.65   0.516     -.168128    .0844687
     chronic |   .0472903   .0186231     2.54   0.011     .0107897    .0837909
         adl |  -.0945039   .0353534    -2.67   0.008    -.1637953   -.0252125
    hstatusg |   .1138708   .0629071     1.81   0.070    -.0094248    .2371664
       _cons |  -5.744548   3.871615    -1.48   0.138    -13.33277    1.843677
------------------------------------------------------------------------------

. ivprobit ins $xlist (linc = $zlist), vce(robust) nolog

Probit model with endogenous regressors           Number of obs   =      3197
                                                  Wald chi2(11)   =    382.34
Log pseudolikelihood = -5407.7151                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        linc |  -.5338185   .3852354    -1.39   0.166    -1.288866     .221229
      female |  -.1394069   .0494475    -2.82   0.005    -.2363223   -.0424915
         age |   .2862283   .1280838     2.23   0.025     .0351887    .5372678
        age2 |  -.0021472   .0009318    -2.30   0.021    -.0039736   -.0003209
    educyear |   .1136877   .0237927     4.78   0.000     .0670549    .1603205
     married |   .7058269   .2377729     2.97   0.003     .2398005    1.171853
        hisp |  -.5094513   .1049488    -4.85   0.000    -.7151473   -.3037554
       white |    .156344   .1035713     1.51   0.131     -.046652      .35934
     chronic |   .0061943   .0275259     0.23   0.822    -.0477556    .0601441
         adl |  -.1347663     .03498    -3.85   0.000    -.2033259   -.0662067
    hstatusg |   .2341782   .0709769     3.30   0.001     .0950661    .3732904
       _cons |  -10.00785   4.065795    -2.46   0.014    -17.97666     -2.03904
-------------+----------------------------------------------------------------
     /athrho |     .67453   .3599913     1.87   0.061    -.0310399      1.3801
    /lnsigma |   -.331594   .0233799   -14.18   0.000    -.3774178   -.2857703
-------------+----------------------------------------------------------------
         rho |   .5879518   .2355468                     -.0310299    .8809736
       sigma |   .7177787   .0167816                      .6856296    .7514352
------------------------------------------------------------------------------
Instrumented:  linc
Instruments:   female age age2 educyear married hisp white chronic adl
               hstatusg retire sretire
------------------------------------------------------------------------------
Wald test of exogeneity (/athrho = 0): chi2(1) =     3.51 Prob > chi2 = 0.0610

Chapter 5 Program Evaluation

Introduction

Conditional Expectation

Definition 1 Suppose we know a random variable, X, has taken a particular numerical value, say x. Then, one can calculate the expected value of another random variable, Y, given that X = x. This expected value is called the Conditional Expectation of Y given X = x and is defined as follows.

When Y is a discrete random variable taking on finite values \{y_1, \ldots, y_J\}, then

E[Y|X = x] = E[Y|x] = \sum_{j=1}^J y_j \Pr[Y = y_j | X = x].

When Y is a continuous random variable with conditional distribution F_{Y|X}, then

E[Y|X = x] = E[Y|x] = \int y \, dF_{Y|X}.

We should notice that when x changes, so does E[Y|X = x]; therefore the conditional expectation is a function of x. Furthermore, when the conditional expectation is seen as a function of the random variable X, i.e. E[Y|X], the conditional expectation is also a random variable.

Properties of Conditional Expectations:

1. E[c(X)|X] = c(X), for any function c(\cdot).
2. For any two functions a(\cdot) and b(\cdot), we have E[a(X)Y + b(X)|X] = a(X)E[Y|X] + b(X).
3. If two random variables Y and X are statistically independent, then E[Y|X] = E[Y].^33
4. E[E[Y|X]] = E[Y].
5. E[Y|X] = E[E[Y|X, Z]|X].
6. If E[Y|X] = E[Y], then cov(Y, X) = 0. In fact, all functions of X are uncorrelated with Y.^34
7. If E[Y^2] < \infty and E[g(X)^2] < \infty for a function g(\cdot), then

E\{[Y - E[Y|X]]^2 | X\} \le E\{[Y - g(X)]^2 | X\}, and
E\{[Y - E[Y|X]]^2\} \le E\{[Y - g(X)]^2\}.

^33 Recall that Y and X are statistically independent iff F_{YX}(y, x) = F_Y(y) F_X(x).
^34 The covariance between two random variables Y and X is defined as cov(Y, X) = E[YX] - E[Y]E[X].

This last property is used a lot in the context of forecasting. The first inequality says that, if we were to measure the forecasting error as the Mean Squared Forecasting Error, conditioned on X, then the conditional expected value is better than any other function of X when predicting Y.

Program Evaluation Techniques

Another area where sample selection plays a crucial role is in the estimation of treatment effects. Suppose that if observation i with characteristics x_i receives a certain treatment (such as a government-sponsored training programme) then an outcome y_{1i} is observed; if this observation does not receive the treatment, one only observes y_{0i}. Let y_i denote the observed outcome (such as wage earnings) and d_i an indicator of whether individual i received treatment or not. Then the observed outcome can be written in terms of a Roy's model, i.e.

y_i = d_i y_{1i} + (1 - d_i) y_{0i}.    (18)

For any two random variables (Y, X^\top)^\top one can write Y = E[Y|X = x] + \varepsilon, where E[\varepsilon|X = x] = 0. A linear regression model implies E[Y|X = x] = x^\top\beta. A Probit or Logit model implies that E[Y|X = x] = F_{\varepsilon|x}(x^\top\beta), with F_{\varepsilon|x} equal to the CDF of a standard normal for a Probit and the CDF of a logistic for a Logit. Similarly, a Poisson model implies that E[Y|X = x] = \exp(x^\top\beta).

The literature has concentrated on the following two parameters of interest:^35

Average Treatment Effect: \tau_{ATE} := E[y_{1i} - y_{0i}],
Average Treatment on the Treated Effect: \tau_{ATT} := E[y_{1i} - y_{0i} | d_i = 1].

The \tau_{ATE} describes the expected effect of treatment for an arbitrary observation i chosen at random from the population, while the \tau_{ATT} is the mean effect for those that actually participate in the programme, i.e. the Average Treatment Effect in the treated subpopulation.^36

^35 One can also define the conditional Average Treatment Effect as \tau_{CATE}(x_i) := E[y_{1i} - y_{0i}|x_i]. Notice that by the definitions of \tau_{ATE} and \tau_{ATT}, we have \tau_{ATE} = E[\tau_{CATE}(x_i)] and \tau_{ATT} = E[\tau_{CATE}(x_i)|d_i = 1].
^36 Imbens & Angrist (1994) defined a third parameter of interest they called the Local Average Treatment Effect, i.e. \tau_{LATE}, which measures the effect of treatment upon observations at the margin of being treated.

Let re78 represent real earnings in 1978; age is age in years, educ is the number of years of formal education, black and hisp are ethnicity dummies, married is a marital status indicator, nodegr equals 1 if the person does not have a high school diploma, while re# and u# represent real earnings and unemployment indicators for # \in \{74, 75\}.

Remark 22 Since for each observation i one only observes either y_{1i} or y_{0i}, but not both (a missing data problem), the joint distribution F_{10}(y_1, y_0) is not identified. One can only identify the marginals F_1(y_1) and F_0(y_0). It turns out that even when one cannot identify F_{10}(\cdot, \cdot), certain features of it, such as \tau_{ATE}, can be identified under less restrictive conditions than independence between y_1 and y_0.
Relationship between ATE & ATT:
Let E[y_{1i}] =: \mu_1 and E[y_{0i}] =: \mu_0; then by construction y_{1i} = \mu_1 + v_{1i} and y_{0i} = \mu_0 + v_{0i}, such that E[v_{1i}] = E[v_{0i}] = 0 and

y_{1i} - y_{0i} = \mu_1 - \mu_0 + [v_{1i} - v_{0i}] = \tau_{ATE} + [v_{1i} - v_{0i}], and
E[y_{1i} - y_{0i}|d_i = 1] = \tau_{ATE} + E[v_{1i} - v_{0i}|d_i = 1],
\tau_{ATT} = \tau_{ATE} + E[v_{1i} - v_{0i}|d_i = 1].

Therefore \tau_{ATT} differs from \tau_{ATE} by the expected person-specific gain for those who participated.

Unconfoundedness based Methods

Suppose we observe a sample \{y_i, d_i, x_i^\top\}_{i=1}^n from the joint distribution of (y, d, x^\top)^\top \in \mathcal{Y} \times [0,1] \times \mathcal{X} such that:

Assumption 1 (Ignorability) (y_1, y_0) and d are independent conditional on x.
Assumption 2 (Overlap) For all x \in \mathcal{X}, 0 < \Pr\{d = 1|x\} := p(x) < 1.

If subject i is randomly assigned to the treatment group or the control group, then Assumption 1 holds by construction and Assumption 2 is also bound to hold.

Assumption 1 is known in the statistical literature as ignorability of treatment, unconfoundedness, or simply conditional independence. The idea is the following: if we were to observe enough information (contained in x) that determines treatment, then (y_1, y_0) might be independent of d (conditional on x); even though (y_1, y_0) and d might be correlated, they become independent once we partial out x.^37 Because of the latter, Assumption 1 is also known in econometrics as selection on observables, i.e. once x is observed and conditioned on, there is no unobserved factor that influences both the outcomes (y_1, y_0) and the treatment d simultaneously.^38

Assumption 2 guarantees that one observes individuals with the same characteristics x in both the control (d = 0) and treatment (d = 1) groups. p(x) is known as the propensity score.

^37 Good candidates for inclusion in x are pre-treatment variables whose values do not change during the time treatment takes effect.
^38 Selection on Observables is inherently a non-testable restriction.
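A common empirical check of Assumption 2 is to compare the estimated propensity scores across the two groups; a hedged sketch with hypothetical variable names (a treatment dummy d and covariates x1, x2):

quietly logit d x1 x2
predict phat, pr              // estimated propensity score p-hat(x_i)
bysort d: summarize phat      // ranges should overlap and stay inside (0,1)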

Regression based Methods

Define \mu_1(x) := E[y_1|x] and \mu_0(x) := E[y_0|x].^39 Notice that \mu_1(x) and \mu_0(x) are generally unknown, but Assumptions 1 and 2 provide a way to identify them as follows. Notice that one can rewrite Roy's model in (18) as

y = y_0 + d(y_1 - y_0), so    (19)

E[y|x, d] = E[y_0|x, d] + d\{E[y_1|x, d] - E[y_0|x, d]\}
    = E[y_0|x] + d\{E[y_1|x] - E[y_0|x]\} by Assumption 1
    = \mu_0(x) + d\{\mu_1(x) - \mu_0(x)\}.

Therefore^40

\mu_0(x) \equiv E[y|x, d = 0] =: m_0(x),
\mu_1(x) \equiv E[y|x, d = 1] =: m_1(x).

Based on representation (19) and under Assumptions 1 and 2, one can write

E[y|x, d = 1] - E[y|x, d = 0] = \{E[y_0|x, d = 1] - E[y_0|x, d = 0]\} + \tau_{CATE}(x),
m_1(x) - m_0(x) = \tau_{CATE}(x) by Assumption 1.

In conclusion,

\tau_{ATE} := E[\tau_{CATE}(x)] = E[m_1(x) - m_0(x)],
\tau_{ATT} := E[\tau_{CATE}(x)|d = 1] = E[m_1(x) - m_0(x)|d = 1].

^39 So by construction y_1 = \mu_1(x) + v_1 and y_0 = \mu_0(x) + v_0, such that E[v_1|x] = E[v_0|x] = 0 and

E[y_1 - y_0|x, d = 1] = \tau_{CATE}(x) + E[v_1 - v_0|x, d = 1]
    = \tau_{CATE}(x) + E[v_1 - v_0|x] by Assumption 1
    = \tau_{CATE}(x).

^40 Since a sample from the joint distribution of (y, x^\top, d) is observed, m_1(\cdot) and m_0(\cdot) are said to be nonparametrically identified.

Estimation:
If \hat{m}_1(x_i) and \hat{m}_0(x_i) represent consistent estimators of m_1(x) and m_0(x),^41 then using the entire random sample of size n one has, by the analogy principle,

\hat{\tau}_{ATE} = n^{-1}\sum_{i=1}^n [\hat{m}_1(x_i) - \hat{m}_0(x_i)],
\hat{\tau}_{ATT} = \left(\sum_{j=1}^n d_j\right)^{-1}\sum_{i=1}^n d_i[\hat{m}_1(x_i) - \hat{m}_0(x_i)].

^41 Let m_1(x, \theta_1) and m_0(x, \theta_0) be parametric known functions, and let \hat{\theta}_1 be estimated using those observations for which d_i = 1 and \hat{\theta}_0 using those for which d_i = 0; then

\hat{\tau}_{ATE} = n^{-1}\sum_{i=1}^n [m_1(x_i, \hat{\theta}_1) - m_0(x_i, \hat{\theta}_0)],
\hat{\tau}_{ATT} = \left(\sum_{j=1}^n d_j\right)^{-1}\sum_{i=1}^n d_i[m_1(x_i, \hat{\theta}_1) - m_0(x_i, \hat{\theta}_0)],

and we have already discussed the way to obtain expressions for the asymptotic variances. See Wooldridge [2010, pp. 918-919].
. gen re7578=re78-re75
. gen re74sq=re74^2
. gen agesq=age^2
. gen educsq=educ^2
. global xlist age agesq educ educsq black hisp married nodegr re74 re74sq u74 u75
.
. teffects ra (re78 $xlist, linear) (treat), ate

Iteration 0:   EE criterion =  1.396e-20
Iteration 1:   EE criterion =  7.677e-24

Treatment-effects estimation                   Number of obs      =       445
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none

------------------------------------------------------------------------------
             |               Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   1624.517   645.9279     2.52   0.012     358.5212    2890.512
-------------+----------------------------------------------------------------
POmean       |
       treat |
          0  |   4546.023   341.5606    13.31   0.000     3876.577     5215.47
------------------------------------------------------------------------------

. teffects ra (re78 $xlist, linear) (treat), atet

Iteration 0:   EE criterion =  1.396e-20
Iteration 1:   EE criterion =  4.622e-24

Treatment-effects estimation                   Number of obs      =       445
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none

------------------------------------------------------------------------------
             |               Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
       treat |
   (1 vs 0)  |   1815.461   702.8208     2.58   0.010     437.9571    3192.964
-------------+----------------------------------------------------------------
POmean       |
       treat |
          0  |   4533.685   383.3647    11.83   0.000     3782.304    5285.066
------------------------------------------------------------------------------

The teffects ra command in Stata allows you to specify linear (linear) as well as nonlinear parametric models for the outcome equation, i.e. m(\cdot, \cdot), such as logit, probit, hetprobit, and poisson.
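The regression-adjustment point estimates above can also be replicated by hand, fitting the linear outcome model separately in each arm and averaging differences in fitted values; a hedged sketch (teffects ra additionally delivers valid standard errors, which this does not):

quietly regress re78 $xlist if treat==1
predict m1hat, xb                    // mhat_1(x_i) for every i
quietly regress re78 $xlist if treat==0
predict m0hat, xb                    // mhat_0(x_i) for every i
generate tehat = m1hat - m0hat
summarize tehat                      // mean = tau_ATE-hat
summarize tehat if treat==1          // mean = tau_ATT-hat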

Propensity Score based Methods

Another way to establish identification is to use inverse propensity score weighting as follows:

E\left[\frac{dy}{p(x)}\Big|x\right] = E\left[\frac{dy_1}{p(x)}\Big|x\right] = E\left[E\left[\frac{dy_1}{p(x)}\Big|x, d\right]\Big|x\right] = E\left[\frac{d\,E[y_1|x, d]}{p(x)}\Big|x\right]
    = E\left[\frac{d\,E[y_1|x]}{p(x)}\Big|x\right] = \mu_1(x)\frac{E[d|x]}{p(x)} = \mu_1(x),

E\left[\frac{(1-d)y}{1-p(x)}\Big|x\right] = E\left[\frac{(1-d)y_0}{1-p(x)}\Big|x\right] = E\left[E\left[\frac{(1-d)y_0}{1-p(x)}\Big|x, d\right]\Big|x\right] = E\left[\frac{(1-d)E[y_0|x]}{1-p(x)}\Big|x\right]
    = \mu_0(x)\frac{1 - E[d|x]}{1-p(x)} = \mu_0(x).

Recall

\tau_{CATE}(x) := \mu_1(x) - \mu_0(x) = E\left[\frac{dy}{p(x)} - \frac{(1-d)y}{1-p(x)}\Big|x\right] = E\left[\frac{(d-p(x))y}{p(x)(1-p(x))}\Big|x\right], so

\tau_{ATE} := E[\tau_{CATE}(x)] = E\left[\frac{(d-p(x))y}{p(x)(1-p(x))}\right].    (20)

Now, notice that^42

\frac{(d-p(x))y}{1-p(x)} = \frac{(d-p(x))y_0}{1-p(x)} + d(y_1 - y_0).

^42 Write, using d^2 = d,

(d-p(x))y = [d-p(x)][y_0 + d(y_1-y_0)]
    = [d-p(x)]y_0 + d[d-p(x)](y_1-y_0)
    = [d-p(x)]y_0 + d[1-p(x)](y_1-y_0).

We now show that E[(d-p(x))y_0/(1-p(x))|x] = 0 by showing that

E[(d-p(x))y_0|x] = E[E[(d-p(x))y_0|x, d]|x] = E[(d-p(x))E[y_0|x, d]|x]
    = E[(d-p(x))E[y_0|x]|x] = \{E[d|x] - p(x)\}E[y_0|x] = 0.

Therefore

E\left[\frac{(d-p(x))y}{1-p(x)}\Big|x\right] = E[d(y_1-y_0)|x], and

E\left[\frac{(d-p(x))y}{1-p(x)}\right] = E[d(y_1-y_0)].    (21)

Now

E[d(y_1-y_0)] = E[E[d(y_1-y_0)|d]]
    = E[d(y_1-y_0)|d=1]\Pr\{d=1\} + E[d(y_1-y_0)|d=0]\Pr\{d=0\}
    = E[(y_1-y_0)|d=1]\Pr\{d=1\},

so replacing this on the right-hand side of (21) one obtains

E\left[\frac{(d-p(x))y}{1-p(x)}\right] = E[(y_1-y_0)|d=1]\Pr\{d=1\},

\tau_{ATT} = E\left[\frac{(d-p(x))y}{\nu[1-p(x)]}\right], where \nu := \Pr\{d=1\}.    (22)

Estimation:
If \hat{p}(x_i) represents a consistent estimator of p(x), and we estimate \nu by \hat{\nu} = n^{-1}\sum_{j=1}^n d_j, then using the entire random sample of size n one has, by the analogy principle,

\hat{\tau}_{ATE} = n^{-1}\sum_{i=1}^n \frac{(d_i - \hat{p}(x_i))y_i}{\hat{p}(x_i)(1-\hat{p}(x_i))},

\hat{\tau}_{ATT} = n^{-1}\sum_{i=1}^n \frac{(d_i - \hat{p}(x_i))y_i}{\hat{\nu}[1-\hat{p}(x_i)]}.
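A hand-rolled version of \hat{\tau}_{ATE} matching (20), with a logit propensity score (the teffects ipw session below automates this and provides valid standard errors); a hedged sketch:

quietly logit treat $xlist
predict phat, pr
generate ipwterm = (treat - phat)*re78/(phat*(1 - phat))
summarize ipwterm                    // mean = tau_ATE-hat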
. quie teffects ipw (re78) (treat $xlist, probit), atet
. estimate store ipwPROBIT
.
. quie teffects ipw (re78) (treat $xlist, logit), atet
. estimate store ipwLOGIT
.
. estimates table ipwPROBIT ipwLOGIT, b(%9.3f) p
--------------------------------------
    Variable | ipwPROBIT    ipwLOGIT
-------------+------------------------
ATET         |
     r.treat |
           1 |  1788.526    1785.963
             |    0.0100      0.0101
-------------+------------------------
POmean       |
     r.treat |
           0 |  4560.619    4563.182
             |    0.0000      0.0000
-------------+------------------------
TME1         |
         age |     0.019       0.025
             |    0.7350      0.7879
       agesq |    -0.000      -0.000
             |    0.7437      0.7991
        educ |    -0.537      -0.861
             |    0.0284      0.0252
      educsq |     0.028       0.044
             |    0.0393      0.0433
       black |    -0.158      -0.258
             |    0.4869      0.4819
        hisp |    -0.524      -0.863
             |    0.0951      0.0956
     married |     0.134       0.220
             |    0.4333      0.4282
      nodegr |    -0.252      -0.409
             |    0.2888      0.2871
        re74 |    -0.000      -0.000
             |    0.9310      0.9576
      re74sq |    -0.000      -0.000
             |    0.5947      0.6303
         u74 |    -0.036      -0.070
             |    0.8952      0.8748
         u75 |    -0.273      -0.438
             |    0.1501      0.1567
       _cons |     2.598       4.252
             |    0.0570      0.0686
--------------------------------------
                         legend: b/p

Let $p(x,\gamma)$ be a Logit or Probit model estimated via (conditional) maximum likelihood, i.e. $\hat{\gamma}$; then
\[
\hat{\tau}_{\text{ATE}} = n^{-1}\sum_{i=1}^{n}\frac{(d_i - p(x_i,\hat{\gamma}))y_i}{p(x_i,\hat{\gamma})(1 - p(x_i,\hat{\gamma}))},
\qquad
\hat{\tau}_{\text{ATT}} = n^{-1}\sum_{i=1}^{n}\frac{(d_i - p(x_i,\hat{\gamma}))y_i}{\hat{\rho}[1 - p(x_i,\hat{\gamma})]},
\]
and we have already discussed the way to obtain expressions for the asymptotic variances; see Wooldridge [2010, pp. 922-924]. The teffects ipw command in Stata allows you to specify three parametric models for the propensity score $p(\cdot,\gamma)$: logit, probit, and hetprobit.
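Because $\hat{\tau}_{\text{ATE}}$ and $\hat{\tau}_{\text{ATT}}$ above are simple sample averages, the teffects ipw point estimates can be reproduced by hand. A minimal sketch, assuming lalonde.dta is in memory with treat, re78 and $xlist defined as above (phat_ipw and the tau* variable names are illustrative; output omitted):

. quie probit treat $xlist
. predict phat_ipw, pr
. quie summarize treat, meanonly
. scalar rho = r(mean)
. gen tauATEi = (treat - phat_ipw)*re78/(phat_ipw*(1 - phat_ipw))
. gen tauATTi = (treat - phat_ipw)*re78/(rho*(1 - phat_ipw))
. summarize tauATEi tauATTi

The sample means of tauATEi and tauATTi are the analogy-principle estimates; the mean of tauATTi should be close to the ATET reported by teffects ipw with the probit treatment model.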

Nearest-Neighbor based Methods

Another popular (yet inefficient) way to estimate ATE and ATT is based on direct imputation of the missing outcomes: for those treated (or those in the control group), one finds a member (or members) of the other group with similar characteristics and imputes their outcome (or an average of their outcomes) as the missing value. For each individual $i$, the basic idea is to find the index $\ell_m(i)$, which is the index of the unit in the opposite treatment group that is the $m$-th closest to unit $i$ in terms of the distance measure based on the norm $\|\cdot\|_W$.

In particular, $\ell_1(i)$ is the first nearest match, $\ell_2(i)$ is the second nearest match, \ldots, $\ell_M(i)$ corresponds to the $M$-th nearest match. Collect these indices into $J_M(i) = \{\ell_1(i),\ldots,\ell_M(i)\}$ and define the imputed values as
\[
\hat{y}_{1i} = \begin{cases} y_i & ;\ d_i = 1 \\ \frac{1}{M}\sum_{j\in J_M(i)} y_j & ;\ d_i = 0 \end{cases}
\qquad
\hat{y}_{0i} = \begin{cases} \frac{1}{M}\sum_{j\in J_M(i)} y_j & ;\ d_i = 1 \\ y_i & ;\ d_i = 0 \end{cases}
\]
This is called the Mahalanobis metric and is used to define a distance in a metric space. For example, for vectors $v_1$ and $v_2$, the Mahalanobis metric is defined as $(v_1 - v_2)'W^{-1}(v_1 - v_2)$. Stata allows the usage of the sample covariate covariance matrix ($\hat{\Sigma}_X$), its diagonal ($\text{diag}(\hat{\Sigma}_X)$), and the identity matrix, in which case the Mahalanobis metric is simply the standard Euclidean metric.

Therefore^{43}
\[
\hat{\tau}_{\text{ATE}} = n^{-1}\sum_{i=1}^{n}(\hat{y}_{1i} - \hat{y}_{0i}),
\qquad
\hat{\tau}_{\text{ATT}} = (n\hat{\rho})^{-1}\sum_{i=1}^{n} d_i(\hat{y}_{1i} - \hat{y}_{0i}).
\]

^{43} These estimators can achieve the same precision as those previously discussed only when the number of continuous covariates equals 1. When the number of continuous covariates is at least 3, the speed at which they converge to the true values is slower than for those previously discussed. When the number is 2, the estimates will be biased.

. teffects nnmatch (re78 $xlist) (treat), ate metric(euclidean)

Treatment-effects estimation                   Number of obs      =       445
Estimator      : nearest-neighbor matching     Matches: requested =
Outcome model  : matching                                     min =
Distance metric: Euclidean                                    max =
------------------------------------------------------------------------------
             |              AI Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   2062.063   734.4061     2.81   0.005      622.654    3501.473
------------------------------------------------------------------------------
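As a hedged variation on the command above, teffects nnmatch documents the nneighbor(#) and metric() options, so averaging over, say, M = 3 matches per unit under the Mahalanobis metric would be requested as follows (output omitted):

. teffects nnmatch (re78 $xlist) (treat), atet nneighbor(3) metric(mahalanobis)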

Rosenbaum and Rubin [1983] showed that ignorability given $x$ implies ignorability given $p(x)$, that is, $(y_1, y_0)$ and $d$ are independent given $p(x)$. Using this result, one can match on the basis of the propensity score instead (which is continuous and always has dimensionality of one).
. teffects psmatch (re78) (treat $xlist), ate

Treatment-effects estimation                   Number of obs      =       445
Estimator      : propensity-score matching     Matches: requested =
Outcome model  : matching                                     min =
Treatment model: logit                                        max =
------------------------------------------------------------------------------
             |              AI Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   1765.745   673.3646     2.62   0.009     445.9744    3085.515
------------------------------------------------------------------------------

Even in situations when nearest-neighbor matching is performed over one single continuous covariate or over the estimated propensity score, the asymptotic variances of $\hat{\tau}_{\text{ATE}}$ and $\hat{\tau}_{\text{ATT}}$ calculated using nearest-neighbor matching cannot be smaller than those based on inverse probability weighting. The popularity of nearest-neighbor matching is due to its natural construction and to the availability of a user-written Stata command before the appearance of the teffects command in later releases of Stata.

Chapter 6 Difference-in-Difference (DiD)


Consider a situation where one observes the outcome and characteristics of an individual i before (t = 0) and after (t = 1) receiving a
treatment that happens between both periods. A simple comparison
between pre and posttreatment outcomes for such individuals,
Y (i, 1) Y (i, 0), cannot in general be used to identify and estimate
the usual parameters of interest. The reason is simple: The pre and
posttreatment outcomes are contaminated with temporal tendencies, as well as the effects of events (unrelated to the treatment) that
happens between periods.
The solution is also simple. If only a fraction of the target population of interest has been exposed to the treatment, one can use
the group that was not exposed to the treatment between periods
to identify the temporal variation as well as the effect of unrelated
events on the outcome variables that are not due to the treatment.

Figure 10: Figure taken from: http://en.wikipedia.org/wiki/Difference_in_differences

The Classical DiD Parametric Estimator

Let $Y(i,t)$ be the outcome of interest for individual $i$ at time $t$. The target population is observed between the pre-treatment period, $t = 0$, and the post-treatment period, $t = 1$. Between both periods, just a fraction of the population is exposed to the treatment. Denote as $D(i,t) = 1$ if individual $i$ has been exposed to the treatment before period $t$ (as before, we will refer to them as the treated), and as $D(i,t) = 0$ if they were not (as before, we will call them the control).^{44}

Let's assume that the outcome variable is generated by the following model
\[
Y(i,t) = \beta(t) + \alpha D(i,t) + \eta(i) + v(i,t),
\]
where $\beta(t)$ is a time-specific parameter, $\alpha$ represents the treatment effect, $\eta(i)$ is an individual-specific effect, and $v(i,t)$ is an individual-specific transitory shock with mean zero for each period $t = 0,1$, and possibly autocorrelated.^{45}

^{44} Since the treatment only happens after the first period, note that $D(i,0) = 0$ for all $i$.
^{45} Since we only observe $Y(i,t)$ and $D(i,t)$, the treatment effect, $\alpha$, is not identified unless we make further assumptions.

A sufficient condition that ensures that $\alpha$ is identified is that selection into the treatment group does not depend on the transitory individual shock:
\[
\Pr\{D(i,1) = 1|v(i,t)\} = \Pr\{D(i,1) = 1\}, \qquad (23)
\]
for $t = 0,1$. Adding and subtracting $E[\eta(i)|D(i,1)]$ from the original equation, we obtain
\[
Y(i,t) = \beta(t) + \alpha D(i,t) + E[\eta(i)|D(i,1)] + \varepsilon(i,t),
\]
where $\varepsilon(i,t) = \eta(i) - E[\eta(i)|D(i,1)] + v(i,t)$. This last equation can be rewritten as^{46}
\[
Y(i,t) = \mu + \gamma D(i,1) + \delta t + \alpha D(i,t) + \varepsilon(i,t). \qquad (24)
\]
Condition (23) implies that $E[(1, D(i,1), t, D(i,t))'\varepsilon(i,t)] = 0$, that is, the orthogonality conditions of a simple linear regression model hold, and therefore the parameters $[\mu, \gamma, \delta, \alpha]'$ will be consistently estimated by a simple OLS regression of $Y(i,t)$ on $D(i,1)$, $t$, and $D(i,t)$ (including a constant).
Why is it called DiD?:
Model (24) is called Difference-in-Difference because under assumption (23), we have
\[
\alpha = \{E[Y(i,1)|D(i,1)=1] - E[Y(i,1)|D(i,1)=0]\} - \{E[Y(i,0)|D(i,1)=1] - E[Y(i,0)|D(i,1)=0]\}
\]
\[
= \underbrace{E[Y(i,1) - Y(i,0)|D(i,1)=1]}_{\text{Difference 1}} - \underbrace{E[Y(i,1) - Y(i,0)|D(i,1)=0]}_{\text{Difference 2}}.
\]

The traditional way to include individual-specific characteristics to control for the intrinsic outcome dynamics is to include them linearly:
\[
Y(i,t) = \mu + X(i)'\beta(t) + \gamma D(i,1) + \delta t + \alpha D(i,t) + \varepsilon(i,t),
\]
where $X(i)$ is assumed uncorrelated with $\varepsilon(i,t)$. Taking the first difference with respect to $t$, one obtains
\[
Y(i,1) - Y(i,0) = \delta + X(i)'\psi + \alpha D(i,1) + [\varepsilon(i,1) - \varepsilon(i,0)], \qquad (25)
\]
where $\psi = \beta(1) - \beta(0)$. The parameter $\alpha$ can then be estimated by a simple regression of $Y(i,1) - Y(i,0)$ on $D(i,1)$. Note that this implies that $v(i,1) - v(i,0)$ is mean independent of $D(i,1)$, and therefore, in the absence of treatment, the average outcome change for those treated would have been the same as for those individuals that did not participate in the program.

^{46} Note that
\[
\beta(t) \equiv \beta(0) + [\beta(1) - \beta(0)]t,
\]
\[
E[\eta(i)|D(i,1)] \equiv E[\eta(i)|D(i,1)=0] + \{E[\eta(i)|D(i,1)=1] - E[\eta(i)|D(i,1)=0]\}D(i,1),
\]
\[
\mu := E[\eta(i)|D(i,1)=0] + \beta(0), \qquad
\gamma := E[\eta(i)|D(i,1)=1] - E[\eta(i)|D(i,1)=0], \qquad
\delta := \beta(1) - \beta(0).
\]

. gen re7578=re78-re75
. gen re74sq=re74^2
. gen agesq=age^2
. gen educsq=educ^2

. global xlist age agesq educ educsq black hisp married nodegr re74 re74sq u74 u75
.
. quie reg re7578 treat, robust
. estimates store DiDnoX
.
. quie reg re7578 treat $xlist, robust
. estimates store DiDwithX
. estimates table DiDnoX DiDwithX, k(treat) b(%9.3f) p
--------------------------------------
    Variable |   DiDnoX    DiDwithX
-------------+------------------------
       treat |  1529.197    1489.351
             |    0.0330      0.0321
--------------------------------------
                         legend: b/p

Nonparametric Identification and Estimation

As discussed above, the classical DiD model depends on many assumptions, in particular linearity and separability, among other things. We are now going to discuss how one can identify ATT in this type of model without some of these conditions (at least explicitly).

We start by extending the same notation we used previously: $Y^0(i,t)$ represents the outcome individual $i$ would have obtained at time $t$ in the absence of treatment. Let us also denote as $Y^1(i,t)$ the outcome individual $i$ would have obtained if treated at time $t$. Recall we only observe one of these two potential outcomes for each individual, i.e.
\[
Y(i,1) = Y^0(i,1)(1 - D(i,1)) + Y^1(i,1)D(i,1).
\]
Since the treatment is only administered after period $t = 0$, we can call $D(i) := D(i,1)$. Suppressing the subscript $i$ for simplicity, we will assume the following two conditions:

Step 1 $E[Y^0(1) - Y^0(0)|X, D = 1] = E[Y^0(1) - Y^0(0)|X, D = 0]$.
Step 2 $\Pr\{D = 1\} > 0$ and, with probability one, $0 < \Pr\{D = 1|X\} < 1$.

Condition Step 1 is crucial in DiD models. Basically, it implies that, conditional on characteristics $X$, the average outcome for the treated follows the same temporal tendency as the average outcome in the absence of treatment.


Abadie [2005] shows that if assumptions Step 1 and Step 2 hold for each value of $X$, then
\[
\tau_{\text{ATT}} := E[Y^1(1) - Y^0(1)|D=1] = E\left[\frac{Y(1) - Y(0)}{\Pr\{D=1\}}\cdot\frac{D - \Pr\{D=1|X\}}{1 - \Pr\{D=1|X\}}\right].
\]
Estimation I:
If $\hat{p}(X(i))$ represents a consistent estimator of $p(X(i)) := \Pr\{D=1|X\}$, and we estimate $\Pr\{D=1\}$ with $\hat{\rho} = n^{-1}\sum_{j=1}^{n} D(j)$, then using the entire sample of size $n$, one has by the analogy principle
\[
\hat{\tau}_{\text{ATT}} = n^{-1}\sum_{i=1}^{n}\frac{Y(i,1) - Y(i,0)}{\hat{\rho}}\cdot\frac{D(i) - \hat{p}(X(i))}{1 - \hat{p}(X(i))}. \qquad (26)
\]
. quie summarize treat, meanonly
. scalar rho=r(mean)
. quie probit treat $xlist
. predict phat, pr
. g tauATT=re7578*(treat-phat)/(rho*(1-phat))
. quie summarize tauATT, meanonly
. di r(mean)
1631.7815

Now that we have computed $\hat{\tau}_{\text{ATT}}$, the next question is how to compute standard errors. The next section describes how to use the bootstrap to compute them.
The Bootstrap Method
Suppose we have a random sample $\{Z_i;\ i = 1,\ldots,n\}$ from a random variable with CDF $F_0$. Let's say we are interested in the distribution of the statistic $T_n = T_n(Z_1,\ldots,Z_n)$, which has a finite sample distribution $G_n(\cdot, F_0) = \Pr(T_n \le \cdot)$; this distribution is generally unknown and depends on $F_0$. In our Statistics & Econometrics courses we learned to use $G_\infty(\cdot, F_0)$ in its place (the asymptotic distribution).

The basic idea of the bootstrap is to approximate $G_n(\cdot, F_0)$ with $G_n(\cdot, F_n)$ instead, where $F_n$ is the empirical distribution function of the data. Unlike the calculation of $G_\infty(\cdot, F_0)$, which requires knowledge of asymptotic theory and approximation theorems, the calculation of $G_n(\cdot, F_n)$ can be performed on a computer as follows: The idea is to treat the original random sample, $\{Z_i;\ i = 1,\ldots,n\}$, as if it were the population [in which case the original statistic $T_n = T_n(Z_1,\ldots,Z_n)$ becomes the population parameter we wish to estimate and make inference about]. We resample (with replacement) from this population a pseudo-sample of the same size, say $\{Z_{ib}^{*};\ i = 1,\ldots,n\}$, and recalculate the original statistic using this pseudo-sample, i.e. $T_{nb}^{*} = T_n(Z_{1b}^{*},\ldots,Z_{nb}^{*})$. We can do this $B$ many times, i.e. $b = 1,\ldots,B$, and we end up with another pseudo-sample $\{T_{nb}^{*} : b = 1,\ldots,B\}$. Then, $G_n(\cdot, F_n)$ can be estimated simply as the empirical distribution function of $\{T_{nb}^{*} : b = 1,\ldots,B\}$, i.e.
\[
\frac{1}{B}\sum_{b=1}^{B} I\{T_{nb}^{*} \le \cdot\}.
\]

Define as $p(X,\gamma) := \Pr\{D=1|X\}$ a Probit or Logit model that can be estimated by (conditional) maximum likelihood methods. Let $\hat{\gamma}$ be the (C)ML estimate; then
\[
\hat{\tau}_{\text{ATT}} = n^{-1}\sum_{i=1}^{n}\frac{Y(i,1) - Y(i,0)}{\hat{\rho}}\cdot\frac{D(i) - p(X(i),\hat{\gamma})}{1 - p(X(i),\hat{\gamma})}.
\]
Standard errors can be calculated as described in Wooldridge [2010, pp. 922-924].

Bootstrap Confidence Intervals


. program drop onebootrep
.
. program onebootrep, rclass
  1.         drop _all
  2.         use lalonde.dta, clear
  3.         gen re7578=re78-re75
  4.         gen re74sq=re74^2
  5.         gen agesq=age^2
  6.         gen educsq=educ^2
  7.         bsample
  8.         quie summarize treat, meanonly
  9.         scalar rho=r(mean)
 10.         quie probit treat $xlist
 11.         predict phat, pr
 12.         g tauATTv=re7578*(treat-phat)/(rho*(1-phat))
 13.         quie summarize tauATTv, meanonly
 14.         return scalar tauATT=r(mean)
 15. end
.
. use lalonde.dta, clear
. gen re7578=re78-re75
. gen re74sq=re74^2
. gen agesq=age^2
. gen educsq=educ^2
. global xlist age agesq educ educsq black hisp married nodegr re74 re74sq u74 u75
. quie summarize treat, meanonly
. scalar rho=r(mean)
. quie probit treat $xlist
. predict phat, pr
. g tauATT=re7578*(treat-phat)/(rho*(1-phat))
. quie summarize tauATT, meanonly
. scalar tauATT0 = r(mean)

.
. simulate tauATTb=r(tauATT), seed(10101) reps(999) nodots saving(bdata,replace): onebootrep
      command:  onebootrep
      tauATTb:  r(tauATT)

. summarize tauATTb

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
     tauATTb |        999    1624.884    751.8149  -817.0814   4326.195
. scalar mtauATT = r(mean)


. scalar tauATTbcorrected = 2*tauATT0 - mtauATT
. _pctile tauATTb, p(2.5)
. scalar tauATT025 = r(r1)
. _pctile tauATTb, p(97.5)
. scalar tauATT975 = r(r1)
.
. display "Bootstrap Bias-Corrected ATT = " tauATTbcorrected
Bootstrap Bias-Corrected ATT = 1638.6792
. display "Bootstrap 95% Confidence Interval = (" tauATT025
>"," tauATT975 ")"
Bootstrap 95% Confidence Interval = (179.08388,3151.1758)

Remember that for any statistic, $T_n$, its bias is theoretically defined as $\text{bias}(T_n) = E(T_n) - T_0$, where $T_0$ represents the population parameter that $T_n$ is trying to estimate. Then, based on the empirical distribution function of $\{T_{nb}^{*} : b = 1,\ldots,B\}$, this bias can be estimated as
\[
\frac{1}{B}\sum_{b=1}^{B} T_n^{*(b)} - T_n.
\]
The bias-corrected version of the estimator, $(T_n - \text{bias}(T_n))$, can then be estimated as
\[
2T_n - \frac{1}{B}\sum_{b=1}^{B} T_n^{*(b)}.
\]
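The same replicate file can also be used for a bootstrap standard error and a normal-approximation interval. A minimal sketch, assuming bdata.dta was saved by simulate above and that the scalar tauATT0 is still in memory (scalars survive a use command):

. use bdata, clear
. quie summarize tauATTb
. scalar se_boot = r(sd)
. display "Bootstrap SE = " se_boot
. display "Normal-approx 95% CI = (" tauATT0 - 1.96*se_boot "," tauATT0 + 1.96*se_boot ")"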

Two Cross-Sectional Samples


The previous discussion assumes that one has one cross-sectional sample and that we observe the outcome for each individual in the sample in both periods $t = 0$ and $t = 1$. This might not be the case in some applications. In this section we discuss how we can adapt the previous methods to situations where one has two cross-sectional samples, one taken in period $t = 0$ and another one taken in period $t = 1$.

Let lrprice be the logarithm of the price of a house in 1978 dollars, y81 equal 1 if the observation belongs to year 1981, nearinc equal 1 if the house is near the incinerator, y81nrinc equal 1 if the observation belongs to year 1981 and is near the incinerator, age the age of the house in years, agesq age squared, lintst the logarithm of the distance from the interstate measured in feet, lland the logarithm of the square footage of the lot, larea the logarithm of the square footage of the house, rooms the number of rooms, and baths the number of bathrooms.

The rumors about the construction of a new incinerator next to the neighborhood started after 1978. The data set contains 321 observations on houses sold in 1978 and 1981.


. global xlist y81 nearinc age agesq lintst lland larea rooms baths
.
. quie reg lrprice y81 nearinc y81nrinc, robust
. estimates store DiDnoX
.
. quie reg lrprice y81nrinc $xlist, robust
. estimates store DiDwithX
. estimates table DiDnoX DiDwithX, k(y81nrinc) b(%9.3f) p
--------------------------------------
    Variable |   DiDnoX    DiDwithX
-------------+------------------------
    y81nrinc |    -0.063      -0.132
             |    0.5086      0.0289
--------------------------------------
                         legend: b/p

Let's assume that for each individual in the joint sample we observe $(Y, D, T, X')'$, where $T$ is the temporal indicator that takes on the value 1 if the individual belongs to the post-treatment sample.^{47} Using this notation we make the following assumption:

3. Given $T = 0$, the data are a random sample from the joint distribution of $(Y(0), D, X')'$; given $T = 1$, the data are also a random sample from the joint distribution of $(Y(1), D, X')'$.

Denote as $\lambda \in (0,1)$ the proportion out of the total number of observations sampled in period $t = 1$. Abadie [2005] also shows that if assumptions Step 1, 3, and $0 < \Pr\{D=1|X\} < 1$ hold for each value of $X$, then
\[
\tau_{\text{ATT}} := E[Y^1(1) - Y^0(1)|D=1] = E\left[\frac{\Pr\{D=1|X\}}{\Pr\{D=1\}}\,\phi_0\,Y\right],
\]
\[
\phi_0 := \frac{T - \lambda}{\lambda(1-\lambda)}\cdot\frac{D - \Pr\{D=1|X\}}{\Pr\{D=1|X\}[1 - \Pr\{D=1|X\}]}.
\]
Estimation II:
Based on the joint sample $\{Y_i, D_i, T_i, X_i\}_{i=1}^{n}$ we can estimate this quantity as
\[
\hat{\phi}_i = \frac{T_i - \hat{\lambda}}{\hat{\lambda}(1-\hat{\lambda})}\cdot\frac{D_i - \hat{p}(X_i)}{\hat{p}(X_i)[1 - \hat{p}(X_i)]}, \qquad (27)
\]
\[
\hat{\tau}_{\text{ATT}} = n^{-1}\sum_{i=1}^{n}\frac{\hat{p}(X_i)}{\hat{\rho}}\,\hat{\phi}_i\,Y_i, \qquad (28)
\]
where $\hat{\lambda} = n^{-1}\sum_{j=1}^{n} T_j$.

^{47} This assumption implies that the researcher knows the treatment status (in period $t = 1$) of individuals sampled in period $t = 0$, as well as the characteristics, $X$, of individuals sampled in period $t = 1$.


. quie summarize nearinc, meanonly
. scalar rho=r(mean)
. quie summarize y81, meanonly
. scalar lambda=r(mean)
. quie probit nearinc larea
. predict phat, pr
. g phi=((y81 - lambda)/(lambda*(1-lambda)))*((nearinc-phat)/(phat*(1-phat)))
. g tauATT=lrprice*phi*phat/rho
. quie summarize tauATT, meanonly
. di r(mean)
-.30975839

Define as $p(X,\gamma) := \Pr\{D=1|X\}$ a Probit or Logit model that can be estimated by (conditional) maximum likelihood methods. Let $\hat{\gamma}$ be the (C)ML estimate; then
\[
\hat{\tau}_{\text{ATT}} = n^{-1}\sum_{i=1}^{n}\frac{p(X_i,\hat{\gamma})}{\hat{\rho}}\,\phi(X_i,\hat{\gamma})\,Y_i,
\qquad
\phi(X_i,\hat{\gamma}) = \frac{T_i - \hat{\lambda}}{\hat{\lambda}(1-\hat{\lambda})}\cdot\frac{D_i - p(X_i,\hat{\gamma})}{p(X_i,\hat{\gamma})[1 - p(X_i,\hat{\gamma})]}.
\]
Standard errors can be calculated as described in Wooldridge [2010, pp. 922-924].

Bootstrap Confidence Intervals

. program drop onebootrep
.
. program onebootrep, rclass
  1.         drop _all
  2.         use KIELMC.dta, clear
  3.         bsample
  4.         quie summarize nearinc, meanonly
  5.         scalar rho=r(mean)
  6.         quie summarize y81, meanonly
  7.         scalar lambda=r(mean)
  8.         quie probit nearinc larea
  9.         predict phat, pr
 10.         g phi=((y81 - lambda)/(lambda*(1-lambda)))*((nearinc-phat)/(phat*(1-phat)))
 11.         g tauATT=lrprice*phi*phat/rho
 12.         quie summarize tauATT, meanonly
 13.         return scalar tauATT=r(mean)
 14. end
.
. use KIELMC.dta, clear
. quie summarize nearinc, meanonly
. scalar rho=r(mean)
. quie summarize y81, meanonly
. scalar lambda=r(mean)
. quie probit nearinc larea
. predict phat, pr
. g phi=((y81 - lambda)/(lambda*(1-lambda)))*((nearinc-phat)/(phat*(1-phat)))
. g tauATT=lrprice*phi*phat/rho
. quie summarize tauATT, meanonly
. scalar tauATT0 = r(mean)


.
. simulate tauATTb=r(tauATT), seed(10101) reps(999) nodots saving(bdata,replace): onebootrep
      command:  onebootrep
      tauATTb:  r(tauATT)

. summarize tauATTb

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     tauATTb |        999   -.2311108    2.913331   -10.3286   8.815673

. scalar mtauATT = r(mean)


.
. scalar tauATTbcorrected = 2*tauATT0 - mtauATT
. _pctile tauATTb, p(2.5)
. scalar tauATT025 = r(r1)
. _pctile tauATTb, p(97.5)
. scalar tauATT975 = r(r1)
.
. display "Bootstrap Bias-Corrected ATT = " tauATTbcorrected
Bootstrap Bias-Corrected ATT = -.38840597
. display "Bootstrap 95% Confidence Interval = (" tauATT025 "," tauATT975 ")"
Bootstrap 95% Confidence Interval = (-6.1330314,5.238349)

Chapter 7 Panel Data


Panel data are repeated observations on the same cross-section
unit. In Economics, these units can be households, individuals,
firms, countries or industries. This set of lecture notes provides
an introduction to static and dynamic linear panel data modeling.
Extensive discussion of the Econometrics of panel data can be found
in Hsiao [2014], Arellano [2003] and Baltagi [2013].

Introduction
We assume to have a repeated sample on the cross-section unit $i$ through time, i.e. $(y_{i1}, x_{i1}'),\ldots,(y_{iT_i}, x_{iT_i}')$. That is, the entire sample can be written as $\{\{y_{it}, x_{it}'\}_{t=1}^{T_i}\}_{i=1}^{n}$ and has 2 dimensions: $n$ (cross-sectional) and $T := \max_{1\le i\le n} T_i$ (time series). For notational simplicity, we are only concerned with balanced panels, i.e. $T_i = T$, $1 \le i \le n$.^{48} We will only be concerned with situations where $T$ is small and $n$ is large.^{49} Like in standard regression analysis, when $x_{it}$ contains lagged values of the dependent variable, i.e. $y_{it-1},\ldots,y_{it-p}$, we will refer to this case as dynamic, and when it does not, it will be labeled static.

The data correspond to a random sample of 595 individuals drawn from the Panel Study of Income Dynamics (PSID) with $T = 7$. We are interested in modeling earnings (lwage) with years of full-time work experience (exp), number of weeks worked (wks) and years of education (ed).

^{48} When $T_i$ differ across $i$, one needs to worry about why observations (people, firms, households, etc.) drop out after a few years. Is this related to the phenomenon we are trying to explain? The latter will cause a sample attrition bias.
^{49} This is commonly known as large $n$, small $T$ asymptotics. From the practical point of view, when $T$ is small, we do not generally need to worry about non-stationarity of the regressors.

See Baltagi and Khanti-Akom [1990] for details.

Stata performs panel data analysis in the long data format rather than the wide data format. After loading the above data, Stata may require the user to set the panel identifier and time variables. There are two command options: the tsset or xtset commands both perform this task. The tsset command is also used to set the time variable and structure when using time series data.

Figure 11: Time-series plots for each of the first 20 individuals.


. *--Loading the data set
. use mus08psidextract.dta, clear
(PSID wage data 1976-82 from Baltagi and Khanti-Akom (1990))
.
. *--Organization of data set (Long Form)
. list id t lwage exp exp2 wks ed in 1/3, clean

         id   t     lwage   exp   exp2   wks   ed
    1.    1   1   5.56068     3      9    32
    2.    1   2   5.72031     4     16    43
    3.    1   3   5.99645     5     25    40

.
. *--Declaring individual & time identifier
. xtset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 1 to 7
                delta:  1 unit

. tsset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 1 to 7
                delta:  1 unit

. xtdes

      id:  1, 2, ..., 595                                    n =        595
       t:  1, 2, ..., 7                                      T =          7
           Delta(t) = 1 unit
           Span(t)  = 7 periods
           (id*t uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         7       7       7         7         7       7       7

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
      595    100.00  100.00 |  1111111
 ---------------------------+---------
      595    100.00         |  XXXXXXX

The xtdescribe command provides the distribution of cross-sectional participation patterns. The command details whether the panel is balanced or not. The xtdescribe output shows that the above data is a balanced panel with an observation for each participant in every year.

In the second example, we use data on Canadian firms. The xtdescribe command shows the panel is unbalanced, but the most popular pattern is the 201 firms with observations in every year.
. use cdn4.dta
. xtdes

      id:  1, 2, ..., 2981                                   n =       2977
    year:  1998, 1999, ..., 2011                             T =         14
           Delta(year) = 1 unit
           Span(year)  = 14 periods
           (id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                                                    14      14

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------------
      201      6.75    6.75 |  11111111111111
      153      5.14   11.89 |  1.............
      133      4.47   16.36 |  11............
      129      4.33   20.69 |  .............1
      109      3.66   24.35 |  111...........
      102      3.43   27.78 |  ............11
       74      2.49   30.27 |  ......1.......
       73      2.45   32.72 |  .........11111
       72      2.42   35.14 |  ...........111
     1931     64.86  100.00 |  (other patterns)
 ---------------------------+----------------
     2977    100.00         |  XXXXXXXXXXXXXX

Let us consider a simple linear regression model
\[
y_{it} = x_{it}'\beta + u_{it}, \quad i = 1,\ldots,n,\ t = 1,\ldots,T.
\]
Unlike cross-sections, the fact that repeated realizations of the same r.v.'s are observed for the same observation through time allows the researcher to model heterogeneity explicitly by means of modeling the error process $\{\{u_{it}\}_{t=1}^{T}\}_{i=1}^{n}$. One such assumption is a one-way effects model for the error process,^{50}

^{50} A more general model is the two-way effects model, where $u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}$. See Hsiao (2003) and Baltagi (2008) for details.


\[
u_{it} = \alpha_i + \varepsilon_{it},
\]
where $\{\alpha_i\}_{i=1}^{n}$ represent random variables that capture unobserved heterogeneity, and $\varepsilon_{it}$ is i.i.d. over $i$ and $t$, uncorrelated with $\alpha_i$, such that $E[\varepsilon_{it}|\alpha_i, x_{i1},\ldots,x_{iT}] = 0$,^{51} $\sigma_\alpha^2 := V(\alpha_i)$, and $\sigma_\varepsilon^2 := V(\varepsilon_{it})$. Therefore,
\[
y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \quad i = 1,\ldots,n,\ t = 1,\ldots,T, \qquad (29)
\]
is also known as the individual-specific effects model. Notice that
\[
E[y_{it}|x_{i1},\ldots,x_{iT}] = E[\alpha_i|x_{i1},\ldots,x_{iT}] + x_{it}'\beta. \qquad (30)
\]
If $E[\alpha_i|x_{i1},\ldots,x_{iT}] = \alpha < \infty$, we say that (29) is a Random Effects Model, and if $E[\alpha_i|x_{i1},\ldots,x_{iT}]$ is not constant, we will call (29) a Fixed Effects Model.

^{51} This assumption is known as strong exogeneity or strict exogeneity.

Basic Framework

The simple linear regression model in (29) can be written in matrix form as
\[
y = [I_n \otimes \iota_T]\alpha + X\beta + \varepsilon, \qquad (31)
\]
where $I_n$ is the $n \times n$ identity matrix, $\iota_T$ is the $T \times 1$ vector of ones, $\otimes$ represents the Kronecker product,^{52} $y = [y_1',\ldots,y_n']'$, $X = [X_1',\ldots,X_n']'$, $\varepsilon = [\varepsilon_1',\ldots,\varepsilon_n']'$,^{53} $\alpha = [\alpha_1,\ldots,\alpha_n]'$, and $\beta = [\beta_1,\ldots,\beta_k]'$. We will also use two idempotent and symmetric matrices
\[
P = [I_n \otimes T^{-1}\iota_T\iota_T'], \qquad Q = I_{nT} - P.
\]
Notice that $P$ and $Q$ are orthogonal, and
\[
Py = [\bar{y}_1,\ldots,\bar{y}_1,\ldots,\bar{y}_n,\ldots,\bar{y}_n]', \qquad
Qy = [y_{11}-\bar{y}_1,\ldots,y_{1T}-\bar{y}_1,\ldots,y_{n1}-\bar{y}_n,\ldots,y_{nT}-\bar{y}_n]'
\]
are $nT \times 1$ vectors.

^{52} If $A$ is an $m \times n$ matrix and $B$ is a $p \times q$ matrix, then the Kronecker product $A \otimes B$ is the $mp \times nq$ block matrix
\[
A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B\\ \vdots & \ddots & \vdots\\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}.
\]
^{53} $y_i = [y_{i1}, y_{i2},\ldots,y_{iT}]'$, $\varepsilon_i = [\varepsilon_{i1}, \varepsilon_{i2},\ldots,\varepsilon_{iT}]'$, and $X_i = [x_{i1}, x_{i2},\ldots,x_{iT}]'$.
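A minimal Mata sketch (a numerical check under the assumed small dimensions n = 2 and T = 3, not part of the original derivation) confirms that P and Q are idempotent and orthogonal; # is Mata's Kronecker product:

. mata:
:     n = 2
:     T = 3
:     iota = J(T, 1, 1)                  // T x 1 vector of ones
:     P = I(n) # (iota*iota'/T)          // P = I_n (x) T^-1*iota*iota'
:     Q = I(n*T) - P
:     mreldif(P*P, P), mreldif(Q*Q, Q)   // both zero: idempotent
:     max(abs(P*Q))                      // zero: P and Q are orthogonal
: end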

Random Effects Model

Notice that we can rewrite (29) as^{54}
\[
y_{it} = \alpha + x_{it}'\beta + v_{it}, \quad i = 1,\ldots,n,\ t = 1,\ldots,T, \qquad v_{it} := (\alpha_i - \alpha) + \varepsilon_{it}, \qquad (32)
\]
where $E[v_{it}|x_{i1},\ldots,x_{iT}] = 0$ and therefore^{55}
\[
V(v) = E[vv'] := \Omega = \sigma_\varepsilon^2 I_{nT} + \sigma_\alpha^2[I_n \otimes \iota_T\iota_T'].
\]
That is, $\Omega$ is block-diagonal with $n$ identical $T \times T$ blocks that have $\sigma_\alpha^2 + \sigma_\varepsilon^2$ on the diagonal and $\sigma_\alpha^2$ everywhere off the diagonal.

^{54} This is known as the Population-Averaged model in Statistics.
^{55} Errors $v_{it}$ are homoskedastic but serially correlated.
Let $W := [\iota_{nT}, X]$ and $\delta := [\alpha, \beta']'$; then model (32) can be rewritten as
\[
y = W\delta + v. \qquad (33)
\]

(Pooled) Ordinary Least Squares Estimator

Notice that as long as $E[v_{it}|x_{it}] = 0$, the OLS estimator in (33) will be unbiased and consistent, but inefficient, i.e. $\Omega \ne \sigma_v^2 I_{nT}$. In this case
\[
\hat{\delta}_{OLS} = (W'W)^{-1}W'y, \qquad
V(\hat{\delta}_{OLS}|X) = (W'W)^{-1}W'\Omega W(W'W)^{-1}.
\]
The (pooled) OLS estimator is obtained by simply regressing $y_{it}$ on a constant and $x_{it}$.

Between Estimator

Notice that by taking the sample means across time in (29), we obtain the representation
\[
\bar{y}_i = \alpha + \bar{x}_i'\beta + [(\alpha_i - \alpha) + \bar{\varepsilon}_i].
\]
The BE estimator is the OLS estimator in this model. In matrix form, it is simply the OLS regression of the transformed model $\tilde{y} = \tilde{W}\delta + \tilde{v}$, where $\tilde{y} = Py$, $\tilde{W} = PW$, $\tilde{v} = Pv$, i.e.
\[
\hat{\delta}_{BE} = (W'PW)^{-1}W'Py, \qquad
V(\hat{\delta}_{BE}|X) = (W'PW)^{-1}W'P\Omega PW(W'PW)^{-1}.
\]
The BE estimator is obtained by simply regressing $\bar{y}_i$ on a constant and $\bar{x}_i$, where $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$ and $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$.


Random Effects Estimator

Recall that the Generalized Least Squares estimator is efficient and can be obtained by simple OLS regression on the transformed model $\tilde{y} = \tilde{W}\delta + \tilde{v}$, where $\tilde{y} = Ry$, $\tilde{W} = RW$, $\tilde{v} = Rv$ and $R$ is such that $R'R = \sigma_\varepsilon^2\Omega^{-1}$, i.e.
\[
\hat{\delta}_{GLS} = (W'\Omega^{-1}W)^{-1}W'\Omega^{-1}y, \qquad V(\hat{\delta}_{GLS}|X) = (W'\Omega^{-1}W)^{-1}.
\]
Nerlove [1971] showed that
\[
R = I_{nT} - (1-\theta)P = Q + \theta P, \quad\text{where}\quad
\theta := \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}. \qquad (34)
\]
That is, one can write^{56}
\[
\hat{\delta}_{GLS} = [W'QW + \theta^2 W'PW]^{-1}[W'Qy + \theta^2 W'Py], \qquad (35)
\]
\[
V(\hat{\delta}_{GLS}|X) = \sigma_\varepsilon^2[W'QW + \theta^2 W'PW]^{-1}. \qquad (36)
\]
The GLS estimator is obtained by simply regressing $y_{it} - (1-\theta)\bar{y}_i$ on a constant and $[x_{it} - (1-\theta)\bar{x}_i]$. The Feasible GLS or RE estimator is obtained by replacing the unknown $\theta$ in (35) by $\hat{\theta} = \sqrt{\hat{\sigma}_\varepsilon^2/(\hat{\sigma}_\varepsilon^2 + T\hat{\sigma}_\alpha^2)}$, where $\hat{\sigma}_\varepsilon^2$ can be obtained from the fixed effects estimator (see below) and $\hat{\sigma}_\alpha^2$ can be obtained from the between-group residuals (see below).

^{56} Recall $Q$ and $P$ are orthogonal, i.e. $QP = 0$. The RE estimator is obtained by simply regressing $y_{it} - (1-\hat{\theta})\bar{y}_i$ on a constant and $[x_{it} - (1-\hat{\theta})\bar{x}_i]$, where $\hat{\theta}$ is an estimate of $\theta$ defined in (34), $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$, and $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$.
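To inspect the $\hat{\theta}$ that Stata uses in this quasi-demeaning, the documented theta option of xtreg, re redisplays the estimation results together with $\hat{\theta}$; a one-line hedged check for the PSID example used below (output omitted):

. xtreg lwage exp exp2 wks ed, re theta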
. *--(Pooled) OLS regression with correct s.e.
. quietly reg lwage exp exp2 wks ed, vce(cluster id)
. estimates store OLS
.
. *--Between-Group Estimator
. quietly xtreg lwage exp exp2 wks ed, be
. estimates store BE
.
. *-- RE regression
. quietly xtreg lwage exp exp2 wks ed, re
. estimates store RE
.
. *--Comparison of OLS, BE & RE estimates
. estimates table OLS BE RE, b(%7.3f) se
--------------------------------------------
    Variable |   OLS        BE        RE
-------------+------------------------------
         exp |   0.045     0.038     0.089
             |   0.005     0.006     0.003
        exp2 |  -0.001    -0.001    -0.001
             |   0.000     0.000     0.000
         wks |   0.006     0.013     0.001
             |   0.002     0.004     0.001
          ed |   0.076     0.074     0.112
             |   0.005     0.005     0.006
       _cons |   4.908     4.683     3.829
             |   0.140     0.210     0.094
--------------------------------------------
                              legend: b/se

Figure 12: Time-series plots for each of the first 20 residuals.

If serial correlation in the errors is suspected, various variants of FGLS are provided by Stata. Type help xtreg from inside Stata and look at the pa option corr(). See Cameron and Trivedi [2010].

Testing for a RE Error Structure

Breusch and Pagan [1980] proposed an LM test for $H_0: v_{it} = \varepsilon_{it}$ ($\sigma_\alpha^2 = 0$) vs $H_1: v_{it} = \alpha_i + \varepsilon_{it}$. They showed that under $H_0$
\[
\frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T}\hat{u}_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{T}\hat{u}_{it}^2} - 1\right]^2 \xrightarrow{d} \chi^2_1.
\]

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        lwage[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                   lwage |   .2129935       .4615122
                       e |   .0231658       .1522032
                       u |   .1020921       .3195186

        Test:   Var(u) = 0
                             chibar2(01) =  5192.13
                          Prob > chibar2 =   0.0000
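Since the LM statistic only involves pooled OLS residuals, it can be reproduced by hand. A minimal sketch, assuming the PSID extract is in memory with n = 595 and T = 7 (uhat and the scalar names are illustrative; the result should match the chibar2 value above up to rounding):

. quie regress lwage exp exp2 wks ed
. predict uhat, resid
. bysort id: egen sumu = total(uhat)
. bysort id: gen byte first = (_n == 1)
. gen sumu_sq = sumu^2 if first
. gen u_sq = uhat^2
. quie summarize sumu_sq
. scalar num = r(sum)
. quie summarize u_sq
. scalar den = r(sum)
. display "Breusch-Pagan LM = " (595*7)/(2*(7-1))*(num/den - 1)^2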

Remarks on the Random Effects Model

Remark 23 All three estimators, the (pooled) OLS, BE and RE estimators, are consistent for $\beta$ in (29), but only the RE estimator is efficient under the random effects assumption, i.e. $E[\alpha_i|x_{i1},\ldots,x_{iT}] = \alpha < \infty$. If the true model is the fixed effects one, then all three estimators become inconsistent.

Remark 24 These three estimation procedures allow for the inclusion of a constant as well as time-invariant regressors, such as religion, culture, race, gender, etc.


Fixed Effects Model

Recall that in the fixed effects model $E[\alpha_i|x_{i1},\ldots,x_{iT}]$ is assumed not to be constant, so $E[y_{it}|x_{i1},\ldots,x_{iT}]$ is not identified. However, if we were to condition on the individual-specific effect $\alpha_i$ as well,^{57} then
\[
E[y_{it}|\alpha_i, x_{i1},\ldots,x_{iT}] = \alpha_i + x_{it}'\beta,
\]
and $\beta$ in (30) is identified as $\partial E[y_{it}|\alpha_i, x_{i1},\ldots,x_{iT}]/\partial x_{it}$. That is, the FE model permits only identification of the marginal effect $\partial E[y_{it}|\alpha_i, x_{i1},\ldots,x_{iT}]/\partial x_{it}$, and even then only for time-varying covariates, so the marginal effect of race or gender is not identified.

^{57} Basically, treating the $\{\alpha_i\}_{i=1}^{n}$ as fixed or non-random.

Within Estimator

As pointed out above, the fixed effects model permits identification of the $\beta$ for time-varying regressors by treating the $\alpha_i$ as fixed in estimating equation (29), or in matrix form (31). Therefore the W estimator can be obtained by simple OLS regression on (31), i.e. $\hat{\lambda}_W = (V'V)^{-1}V'y$ and $V(\hat{\lambda}_W|X) = \sigma_\varepsilon^2(V'V)^{-1}$, where $V := [I_n \otimes \iota_T, X]$ and $\lambda := [\alpha', \beta']'$. Both $\hat{\lambda}_W$ and $V(\hat{\lambda}_W|X)$ require inverting the $(n+k)\times(n+k)$ matrix $V'V$, which might be an undesirable feature. Fortunately, one can compute the estimator for $\beta$ by means of the partitioned regression result:^{58}
\[
\hat{\beta}_W = [X'M_{[I_n\otimes\iota_T]}X]^{-1}X'M_{[I_n\otimes\iota_T]}y, \qquad (37)
\]
\[
V(\hat{\beta}_W|X) = (X'M_{[I_n\otimes\iota_T]}X)^{-1}X'M_{[I_n\otimes\iota_T]}V(\varepsilon|X)M_{[I_n\otimes\iota_T]}X(X'M_{[I_n\otimes\iota_T]}X)^{-1}. \qquad (38)
\]
One can show that $M_{[I_n\otimes\iota_T]} = Q$,^{59} and therefore the WE can be obtained by running OLS on the transformed model $\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}$, where $\tilde{y} = Qy$, $\tilde{X} = QX$, $\tilde{\varepsilon} = Q\varepsilon$.

^{58} See Greene [2012, Section 3.3, p. 32] for details. The W estimator is obtained by simply regressing $y_{it} - \bar{y}_i$ on $x_{it} - \bar{x}_i$ without a constant, where $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$ and $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$. However, notice that Stata actually fits the model
\[
y_{it} - \bar{y}_i + \bar{\bar{y}} = \alpha + (x_{it} - \bar{x}_i + \bar{\bar{x}})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i + \bar{\bar{\varepsilon}}),
\]
where $\bar{\bar{y}} = n^{-1}\sum_{i=1}^{n}\bar{y}_i$, for example, is the grand mean of $y_{it}$. This parameterization has the advantage of providing an intercept estimate (the average of the individual effects $\alpha_i$), while yielding the same slope estimate $\hat{\beta}_W$.
^{59} Because $M_{[I_n\otimes\iota_T]}[I_n\otimes\iota_T] = 0$ by definition.
First-Difference Estimator

Define the first-difference transformation $D := I_n \otimes d$, where $d$ is of dimension $(T-1) \times T$.^{60} Then, the FD estimator can be obtained by running OLS on the transformed model $\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}$, where $\tilde{y} = Dy$, $\tilde{X} = DX$, $\tilde{\varepsilon} = D\varepsilon$:^{61}
\[
\hat{\beta}_{FD} = [\tilde{X}'\tilde{X}]^{-1}\tilde{X}'\tilde{y}, \qquad
V(\hat{\beta}_{FD}|X) = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'DV(\varepsilon|X)D'\tilde{X}(\tilde{X}'\tilde{X})^{-1}.
\]
The FD estimator is obtained by simply regressing $y_{it} - y_{it-1}$ on $x_{it} - x_{it-1}$ without a constant. If a constant $\delta$ is included, this would imply that the original model had a time trend $\delta t$, because $\delta t - \delta(t-1) = \delta$.

^{60}
\[
d := \begin{bmatrix}
-1 & 1 & 0 & 0 & \cdots & 0 & 0\\
0 & -1 & 1 & 0 & \cdots & 0 & 0\\
0 & 0 & -1 & 1 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix}.
\]
^{61} Because $[I_n \otimes d][I_n \otimes \iota_T] = I_n \otimes d\iota_T = 0$.

Remarks on the Fixed Effects Model

Remark 25 The Within estimator is also known in the literature as the Least Squares Dummy Variable (LSDV) estimator, the Fixed Effects (FE) estimator, or the Covariance estimator. Both the W/FE and FD estimators are consistent under both random effects and fixed effects models. However, they are inefficient in the random effects case.

Remark 26 For $T = 2$, the FD and W/FE estimators are equal because $\bar{y}_i = (y_{i1}+y_{i2})/2$, so $(y_{i1} - \bar{y}_i) = -(y_{i2}-y_{i1})/2$ and $(y_{i2} - \bar{y}_i) = (y_{i2}-y_{i1})/2$, and similarly for $\bar{x}_i$; but when $T > 2$ the two estimators differ.

Remark 27 The GLS estimator of the transformed model $\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}$, where $\tilde{y} = Dy$, $\tilde{X} = DX$, $\tilde{\varepsilon} = D\varepsilon$, equals the W/FE estimator. Therefore, since the FD estimator is simply the OLS of this transformed model, the FD estimator is less efficient than the W/FE estimator.

Remark 28 Unless further assumptions are made, the W/FE and the FD estimators do not identify the coefficients of time-invariant regressors. However, this somewhat of a shortcoming turns out to also provide a robustness property, i.e. the W/FE and FD estimators are robust to time-invariant omitted-variable bias.

. *--Fixed-Effect Estimator
. quietly xtreg lwage exp exp2 wks ed, fe
. predict alpha_fe, u
. predict e_fe, e
. estimates store FE
. mata: beta_FE = st_matrix("e(b)")
.
. *--First-Difference Estimator
. sort id t
. quietly regress D.(lwage exp exp2 wks ed), vce(cluster id) noconstant
. mata: beta_FD = (st_matrix("e(b)"),0)
.
. *--Comparison of W/FE & FD estimates
. mata: beta_coef = beta_FE,beta_FD
. mata: beta_coef
                   1               2
    +---------------------------------+
  1 |   .1137878577    .1170653986    |
  2 |  -.0004243693   -.0005321208    |
  3 |   .0008358763   -.0002682652    |
  4 |             0               0   |
  5 |   4.596396197               0   |
    +---------------------------------+

Figure 13: Histogram and estimated density of the estimated individual-specific effects $\{\hat{\alpha}_i\}$. Time-series plots, $t = 1,\ldots,7$, for each of the first 20 $\{\hat{\varepsilon}_{it}\}$ implied by the W/FE estimator.


Fixed Effects or Random Effects?

Recall that the salient distinction between the two models is whether the time-invariant effects are correlated with the regressors or not. The specification test devised by Hausman [1978] is used to test for orthogonality of the common effects and the regressors. The test is based on the idea that the two estimators above have different properties depending on the correlation between $\alpha_i$ and the regressors. Specifically,

If $H_0: E[\alpha_i|x_{i1},\ldots,x_{iT}] = \alpha < \infty$, then the RE estimator is consistent and efficient. The FE estimator is also consistent but inefficient.

If $H_1: E[\alpha_i|x_{i1},\ldots,x_{iT}] \ne \alpha$, the FE estimator is still consistent, but the RE estimator is now inconsistent.

Define $\hat{q}_1 := \hat{\beta}_W - \hat{\beta}_{RE}$; Hausman's test statistic is^{62}
\[
\hat{q}_1'[\text{Asy. Var}(\hat{q}_1)]^{-1}\hat{q}_1 =
[\hat{\beta}_W - \hat{\beta}_{RE}]'[\text{Asy. Var}(\hat{\beta}_W) - \text{Asy. Var}(\hat{\beta}_{RE})]^{-1}[\hat{\beta}_W - \hat{\beta}_{RE}] \xrightarrow{d} \chi^2_{\dim(\beta)}.
\]
Hausman and Taylor [1981] showed that $H_0$ can also be tested using either of the following two paired differences: $\hat{q}_2 := \hat{\beta}_{BE} - \hat{\beta}_{RE}$, or $\hat{q}_3 := \hat{\beta}_W - \hat{\beta}_{BE}$. The corresponding test statistics can be computed as
\[
\hat{q}_2'[\text{Asy. Var}(\hat{q}_2)]^{-1}\hat{q}_2 = [\hat{\beta}_{BE} - \hat{\beta}_{RE}]'[\text{Asy. Var}(\hat{\beta}_{BE}) - \text{Asy. Var}(\hat{\beta}_{RE})]^{-1}[\hat{\beta}_{BE} - \hat{\beta}_{RE}],
\]
\[
\hat{q}_3'[\text{Asy. Var}(\hat{q}_3)]^{-1}\hat{q}_3 = [\hat{\beta}_W - \hat{\beta}_{BE}]'[\text{Asy. Var}(\hat{\beta}_W) + \text{Asy. Var}(\hat{\beta}_{BE})]^{-1}[\hat{\beta}_W - \hat{\beta}_{BE}].
\]
These three versions can be shown to be numerically exactly identical if all the regressors are time-varying. However, in practice, because of rounding errors and the fact that we have to use feasible GLS for RE estimation, these three versions may give slightly different results.

^{62} Hausman's essential result is that the covariance of an efficient estimator ($\hat{\beta}$) with its difference from an inefficient estimator ($\tilde{\beta}$) is zero, i.e. $\text{cov}(\hat{\beta}, \tilde{\beta} - \hat{\beta}) = 0$. This implies that $V[\tilde{\beta} - \hat{\beta}] = V[\tilde{\beta}] - V[\hat{\beta}]$.
. *--Hausman test assuming RE is fully efficient under H0
. hausman FE RE, sigmamore

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       FE           RE         Difference          S.E.
-------------+----------------------------------------------------------------
         exp |    .1137879     .0888609        .0249269        .0012778
        exp2 |   -.0004244    -.0007726        .0003482        .0000285
         wks |    .0008359     .0009658       -.0001299        .0001108
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =    1513.02
                Prob>chi2 =     0.0000

Extending the Unobserved Effects Model: Mundlak's Approach

Mundlak [1978] suggested the specification^{63}
\[
E[\alpha_i|x_{i1},\ldots,x_{iT}] = \bar{x}_i'\pi.
\]
Substituting this in the RE model, we obtain
\[
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
= x_{it}'\beta + \bar{x}_i'\pi + [\alpha_i - E[\alpha_i|x_{i1},\ldots,x_{iT}]] + \varepsilon_{it}
= x_{it}'\beta + \bar{x}_i'\pi + \omega_i + \varepsilon_{it}.
\]
This preserves the specification of the RE model, but it aims to deal directly with the problem of $E[\alpha_i|x_{i1},\ldots,x_{iT}] \ne \alpha < \infty$ via modeling. This model can be understood as a compromise between the FE and RE models. An important side benefit of the specification is that it provides a robust approach to the Hausman Test; see the discussion in Wooldridge [2010, pp. 332-333].

^{63} The vector $\bar{x}_i$ should be understood as the vector containing the time-average of all time-varying regressors in $x_{it}$ only.
. *--Mundlaks (1978) Approach
. bysort id: egen mexp = mean(exp)
. bysort id: egen mexp2 = mean(exp2)
. bysort id: egen mwks = mean(wks)
. quietly xtreg lwage exp exp2 wks ed mexp mexp2 mwks, re vce(cluster id)
.
. *--Wooldridges (2010) Robust Hausman Test
. test mexp mexp2 mwks
 ( 1)  mexp = 0
 ( 2)  mexp2 = 0
 ( 3)  mwks = 0

           chi2(  3) =  1792.41
         Prob > chi2 =    0.0000


Endogeneity

So far we have made the strict exogeneity assumption, $E[\varepsilon_{it}|\alpha_i, x_{i1},\ldots,x_{iT}] = 0$, which can be very strong and is often violated in economic problems.

We are now interested in modeling earnings (lwage) with a larger set of covariates. Time-varying regressors include exp, exp2, wks, marital status (ms), an indicator if wages were set by a union contract (union), an indicator for blue-collar occupation (occ), indicators for residence (south and smsa), and an indicator if the person works in a manufacturing industry (ind). Time-invariant covariates are ed, a gender indicator (fem) and a race indicator (blk).

Hausman & Taylors (1981) IV Estimator


Hausman and Taylor [1981] proposed a way to allow for endogenous
regressors while allowing the identification of coefficients multiplying time-invariant covariates. Their model is of the form:
0
0
0
0
yit = x1it
1 + x2it
2 + z1i
1 + z2i
2 + i + it ,

(39)

where
x1it represents k1 variables that are time-varying and uncorrelated
with i .
z1i represents l1 variables that are time invariant and uncorrelated
with i .
x2it represents k2 variables that are time-varying and correlated with
i .
z2i represents l2 variables that are time invariant and correlated with
i .
The assumptions about the random terms in the model are64
E[i |x1it , z1i ] = 0 though E[i |x2it , z2i ] 6= 0,
V[i |x1it , z1i , x2it , z2i ] = 2 ,
cov(i , it |x1it , z1i , x2it , z2i ) = 0,
V[i + it |x1it , z1i , x2it , z2i ] = 2 + 2 =: 2 ,
correlation(i + it , i + is |x1it , z1i , x2it , z2i ) = 2 /2 =: .

Note the crucial underlying assumption that one can distinguish between
the set of variables (both time-varying
and invariants) that are correlated or
not with i

64

We set x1it = [occit ,southit ,smsait ,indit ]0 , x2it = [expit ,exp2it ,wksit ,msit ,unionit ]0 ,
z1i = [femi ,blki ]0 and z2i = [edi ].
Then, the GLS transformation of (39) becomes^{65}
\[
y_{it} - (1-\theta)\bar{y}_i = (x_{1it} - (1-\theta)\bar{x}_{1i})'\beta_1 + (x_{2it} - (1-\theta)\bar{x}_{2i})'\beta_2 + \theta z_{1i}'\gamma_1 + \theta z_{2i}'\gamma_2 + [\theta\alpha_i + \varepsilon_{it} - (1-\theta)\bar{\varepsilon}_i].
\]
Let the full set of variables in the model be
\[
w_{it}^{*} := [(x_{1it} - (1-\theta)\bar{x}_{1i})', (x_{2it} - (1-\theta)\bar{x}_{2i})', z_{1i}', z_{2i}']',
\]
and define $W^{*} := [W_1^{*\prime},\ldots,W_n^{*\prime}]'$, where $W_i^{*} = [w_{i1}^{*},\ldots,w_{iT}^{*}]'$. Since the RE estimator is inconsistent because functionals of $x_{2it}$, $z_{2i}$ and $\alpha_i$ are still present in the above specification, Hausman and Taylor [1981] proposed using the following instrumental variables^{66}
\[
v_{it} := [(x_{1it} - \bar{x}_{1i})', (x_{2it} - \bar{x}_{2i})', z_{1i}', \bar{x}_{1i}']',
\]
and define $V := [V_1',\ldots,V_n']'$, where $V_i = [v_{i1},\ldots,v_{iT}]'$, $y^{*} := [y_1^{*\prime},\ldots,y_n^{*\prime}]'$ and $y_i^{*} := [y_{i1} - (1-\theta)\bar{y}_i,\ldots,y_{iT} - (1-\theta)\bar{y}_i]'$. Then the instrumental variable estimator would be^{67}
\[
[\hat{\beta}_{HT}', \hat{\gamma}_{HT}']' = [W^{*\prime}V(V'V)^{-1}V'W^{*}]^{-1}W^{*\prime}V(V'V)^{-1}V'y^{*}.
\]

^{65} A consistent estimator for $\theta$ can be obtained as explained in Steps 1-3 in Greene [2012, p. 396].
^{66} Therefore, for identification purposes one requires that $k_1 \ge l_2$.
^{67} Notice that the IV estimator is consistent but inefficient if the data are not weighted, that is, if $W$ rather than $W^{*}$ is used in the computation.

. *--Hausman & Taylors (1981) Estimator


. xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp
>

exp2 wks ms union ed)

Hausman-Taylor estimation

Number of obs

4165

Group variable: id

Number of groups

595

Obs per group: min =

avg =

max =

Wald chi2(12)

6891.87

Prob > chi2

0.0000

Random effects u_i ~ i.i.d.

-----------------------------------------------------------------------------lwage |

Coef.

Std. Err.

P>|z|

[95% Conf. Interval]

-------------+---------------------------------------------------------------TVexogenous

occ |

-.0207047

.0137809

-1.50

0.133

-.0477149

.0063055

south |

.0074398

.031955

0.23

0.816

-.0551908

.0700705

smsa |

-.0418334

.0189581

-2.21

0.027

-.0789906

-.0046761

ind |

.0136039

.0152374

0.89

0.372

-.0162608

.0434686

exp |

.1131328

.002471

45.79

0.000

.1082898

.1179758

exp2 |

-.0004189

.0000546

-7.67

0.000

-.0005259

-.0003119

wks |

.0008374

.0005997

1.40

0.163

-.0003381

.0020129

ms |

-.0298508

.01898

-1.57

0.116

-.0670508

.0073493

union |

.0327714

.0149084

2.20

0.028

.0035514

.0619914

TVendogenous |

114

lecture notes for advanced microeconometrics with stata (v.15.1)

TIexogenous

fem |

-.1309236

.126659

-1.03

0.301

-.3791707

.1173234

blk |

-.2857479

.1557019

-1.84

0.066

-.5909179

.0194221

ed |

.137944

.0212485

6.49

0.000

.0962977

.1795902

|
_cons |

2.912726

.2836522

10.27

0.000

2.356778

3.468674

TIendogenous |

-------------+---------------------------------------------------------------sigma_u | .94180304
sigma_e |

.15180273

rho |

.97467788

(fraction of variance due to u_i)

-----------------------------------------------------------------------------Note:

TV refers to time varying; TI refers to time invariant.

Figure 14: Histogram and estimated


density of the estimated individualspecific effects {b
i }4160
i =1 . Time-series
plots, t = 1, . . . , 7, for each of the
first {b
it }20
i =1 implied by Hausman &
Taylors (1981) estimator.

Dynamic Panel Data Models

Consider a dynamic panel data model,
\[
y_{it} = \gamma y_{it-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad (40)
\]
where $\alpha_i$ is again an individual-specific unobserved heterogeneity term that may or may not be correlated with $x_{it}$. However, by construction the lagged dependent variable $y_{it-1}$ is correlated with the unobserved individual-specific effect $\alpha_i$. Due to this correlation, the pooled OLS estimator is upward biased, while the WG estimator is downward biased in short panels (see Nickell (1981)).

Now consider the following versions of model (40):
\[
y_{it} - \bar{y}_i = \gamma(y_{it-1} - \bar{y}_i) + (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i), \qquad (41)
\]
\[
y_{it} - y_{it-1} = \gamma(y_{it-1} - y_{it-2}) + (x_{it} - x_{it-1})'\beta + (\varepsilon_{it} - \varepsilon_{it-1}). \qquad (42)
\]
Specifications (41) and (42) remove the unobserved heterogeneity parameter. Unfortunately, running OLS or GLS on (40), (41) and (42) yields inconsistent estimates.^{68} Therefore IV and GMM estimation can be used instead. Specification (42) is the most popular version of (40) as it gets rid of $\alpha_i$ (as well as any other time-invariant covariate).

^{68} This is so because
(40): $\text{cov}(y_{it-1}, \alpha_i + \varepsilon_{it}) \ne 0$;
(41): $\text{cov}(y_{it-1} - \bar{y}_i, \varepsilon_{it} - \bar{\varepsilon}_i) \ne 0$;
(42): $\text{cov}(y_{it-1} - y_{it-2}, \varepsilon_{it} - \varepsilon_{it-1}) \ne 0$.

Anderson and Hsiao's (1981) IV estimator

Anderson and Hsiao [1982] suggested using $y_{it-2}$ ($T \ge 3$) or $\Delta y_{it-2} := y_{it-2} - y_{it-3}$ ($T \ge 4$) as instruments for the endogenous covariate $\Delta y_{it-1} := y_{it-1} - y_{it-2}$, while $\Delta x_{it} := x_{it} - x_{it-1}$ is used as an instrument for itself.^{69} Then, the IV estimator becomes
\[
\hat{\theta}_{IV} = \left[\left(\sum_{i=1}^{n}\tilde{X}_i'Z_i\right)\left(\sum_{i=1}^{n}Z_i'Z_i\right)^{-1}\left(\sum_{i=1}^{n}Z_i'\tilde{X}_i\right)\right]^{-1}
\left(\sum_{i=1}^{n}\tilde{X}_i'Z_i\right)\left(\sum_{i=1}^{n}Z_i'Z_i\right)^{-1}\left(\sum_{i=1}^{n}Z_i'\tilde{y}_i\right).
\]

^{69} However, notice that
$t = 3$: $E[\Delta\varepsilon_{i3}|y_{i1}] = 0$;
$t = 4$: $E[\Delta\varepsilon_{i4}|y_{i1}] = 0$, $E[\Delta\varepsilon_{i4}|y_{i2}] = 0$;
$t = 5$: $E[\Delta\varepsilon_{i5}|y_{i1}] = 0$, $E[\Delta\varepsilon_{i5}|y_{i2}] = 0$, $E[\Delta\varepsilon_{i5}|y_{i3}] = 0$;
\ldots
$t = T$: $E[\Delta\varepsilon_{iT}|y_{i1}] = 0$, $E[\Delta\varepsilon_{iT}|y_{i2}] = 0$, \ldots, $E[\Delta\varepsilon_{iT}|y_{iT-2}] = 0$.
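The Anderson and Hsiao estimator can be computed with standard IV commands using time-series operators on the xtset data; a minimal sketch with illustrative regressor choices, instrumenting LD.lwage with L2.lwage (output omitted):

. ivregress 2sls D.lwage D.(wks ms union occ south smsa ind) (LD.lwage = L2.lwage), vce(cluster id) noconstant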

Arellano and Bond's (1991) Estimator

Arellano and Bond (1991) proposed an alternative instrument set for (42) with the following choices: If the regressors are strictly exogenous, i.e. $E[\varepsilon_{it}|x_{i1},\ldots,x_{iT}] = 0$, then
\[
Z_i = \begin{bmatrix}
y_{i1}, x_{i1}', x_{i2}',\ldots,x_{iT}' & 0 & \cdots & 0\\
0 & y_{i1}, y_{i2}, x_{i1}', x_{i2}',\ldots,x_{iT}' & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & y_{i1}, y_{i2},\ldots,y_{iT-2}, x_{i1}', x_{i2}',\ldots,x_{iT}'
\end{bmatrix}.
\]
Alternatively, if the regressors are predetermined instead, i.e. $E[\varepsilon_{it}|x_{i1},\ldots,x_{it}] = 0$, then
\[
Z_i = \begin{bmatrix}
y_{i1}, x_{i1}', x_{i2}' & 0 & \cdots & 0\\
0 & y_{i1}, y_{i2}, x_{i1}', x_{i2}', x_{i3}' & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & y_{i1}, y_{i2},\ldots,y_{iT-2}, x_{i1}', x_{i2}',\ldots,x_{iT-1}'
\end{bmatrix}.
\]
Note: Lags of $x_{it}$ and $\Delta x_{it}$ can additionally be used as instruments, and that is exactly what Stata does.

Alternatively, one could also set
\[
Z_i = \begin{bmatrix}
y_{i1}, x_{i1}', x_{i2}' & 0 & \cdots & 0\\
0 & y_{i1}, y_{i2}, x_{i1}', x_{i2}' & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & y_{i1}, y_{i2},\ldots,y_{iT-2}, x_{i1}', x_{i2}'
\end{bmatrix}.
\]

. *--Arellano & Bond's (1991) Estimator
. xtabond lwage occ south smsa ind, lags(2) maxldep(3) pre(wks,lag(1,2)) endogen
> ous(ms,lag(0,2)) endogenous(union,lag(0,2)) twostep vce(robust) artests(3)

Arellano-Bond dynamic panel-data estimation    Number of obs      =      2380
Group variable: id                             Number of groups   =       595
Time variable: t
                                               Obs per group: min =         4
                                                              avg =         4
                                                              max =         4

Number of instruments =     40                 Wald chi2(10)      =   1287.77
                                               Prob > chi2        =    0.0000
Two-step results
                             (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .611753   .0373491    16.38   0.000     .5385501    .6849559
         L2. |   .2409058   .0319939     7.53   0.000     .1781989    .3036127
             |
         wks |
         --. |  -.0159751   .0082523    -1.94   0.053    -.0321493     .000199
         L1. |   .0039944   .0027425     1.46   0.145    -.0013807    .0093695
             |
          ms |   .1859324    .144458     1.29   0.198       -.0972    .4690649
       union |  -.1531329   .1677842    -0.91   0.361    -.4819839    .1757181
         occ |  -.0357509   .0347705    -1.03   0.304    -.1038999     .032398
       south |  -.0250368   .2150806    -0.12   0.907     -.446587    .3965134
        smsa |  -.0848223   .0525243    -1.61   0.106     -.187768    .0181235
         ind |   .0227008   .0424207     0.54   0.593    -.0604422    .1058437
       _cons |   1.639999   .4981019     3.29   0.001     .6637377    2.616261
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/4).lwage L(1/2).L.wks L(2/3).ms L(2/3).union
        Standard: D.occ D.south D.smsa D.ind
Instruments for level equation
        Standard: _cons

Option lags(2) means that lwage_{it-1} and lwage_{it-2} are regressors; maxldep(3) means that at most 3 lags of lwage_{it} are used as instruments; pre(wks,lag(1,2)) means that wks_{it-1} is a predetermined regressor and that up to 2 lags are used as instruments. If wks_{it-1} is predetermined, then this variable should not serve as an instrument due to the resulting correlation in (42); the 2 lag instruments are then wks_{it-2} and wks_{it-3}. Option endogenous(.,lag(0,2)) means that the listed variable is an endogenous regressor that appears on the right-hand side at its level (0) and that up to 2 lags are used as instruments.

Test of Serial Correlation

If $\varepsilon_{it}$ are serially uncorrelated, then $\Delta\varepsilon_{it}$ are correlated with $\Delta\varepsilon_{it-1}$, but $\Delta\varepsilon_{it}$ should not be correlated with $\Delta\varepsilon_{it-j}$ for $j \ge 2$. Therefore, Arellano and Bond [1991] proposed a test statistic based on the $j$th-order autocovariance $r_j$:
\[
r_j := \frac{1}{T-3-j}\sum_{t=4+j}^{T} r_{tj},
\]
where $r_{tj} = E[\Delta\varepsilon_{it}\,\Delta\varepsilon_{it-j}]$. Their null is therefore $H_0: r_j = 0$ and the test statistic is given by
\[
z_j = \frac{\hat{r}_j}{[\hat{V}(\hat{r}_j)/n]^{1/2}},
\]
where $\hat{r}_j$ is the sample counterpart of $r_j$ based on the first-difference residuals $\Delta\hat{\varepsilon}_{it}$, i.e. $\hat{r}_{tj} = n^{-1}\sum_{i=1}^{n}\Delta\hat{\varepsilon}_{it}\,\Delta\hat{\varepsilon}_{it-j}$, and the expression for $\hat{V}(\hat{r}_j)$ is given by equation 6.158, page 122, in Arellano [2003].^{70} Under $H_0$, Arellano & Bond (1991) showed that $z_j \xrightarrow{d} N(0,1)$.

^{70} Since (generated) residuals $\Delta\hat{\varepsilon}_{it}$ are used instead of the true errors $\Delta\varepsilon_{it}$ in $\hat{r}_j$, its variance $V(\hat{r}_j)$ contains a term to account for the estimation effect.

. *--Test whether error is serially correlated
. estat abond

Arellano-Bond test for zero autocorrelation in first-differenced errors
  +-----------------------------+
  |Order |    z        Prob > z |
  |------+----------------------|
  |   1  | -4.5244      0.0000  |
  |   2  | -1.6041      0.1087  |
  |   3  |  .35729      0.7209  |
  +-----------------------------+
   H0: no autocorrelation


Test for Over-Identifying Restrictions

Notice that 40 instruments were used to estimate 11 parameters in our example above, so there are 29 overidentifying restrictions. The joint validity of these extra restrictions can be tested using a standard Sargan test for overidentifying restrictions.

. *--Test for Overidentifying restrictions
. *--No vce(robust) option is allowed
. quietly xtabond lwage occ south smsa ind, lags(2) maxldep(3) pre(wks,lag(1,2))
>  endogenous(ms,lag(0,2)) endogenous(union,lag(0,2)) twostep artests(3)

. estat sargan

Sargan test of overidentifying restrictions
        H0: overidentifying restrictions are valid

        chi2(29)     =  39.87571
        Prob > chi2  =    0.0860

Blundell and Bond's (1998) Estimator

With highly persistent data, Arellano and Bond's (1991) first-difference GMM estimator suffers from a weak instrument problem. The system GMM estimator of Blundell and Bond (1998) provides an alternative, which uses additional moment conditions to overcome the weak instrument problem occurring when the data are highly persistent. This estimator requires weaker assumptions on the initial conditions. Additional moment conditions result from assumptions regarding the correlation between the $x_{it}$ and the error term $(\varepsilon_{it} - \varepsilon_{it-1})$ in the first-difference transformation (42). For example, the observables ($x_{it}$) can be assumed to be correlated with: a) past shocks only; b) current shocks only; c) current and past shocks; d) past, current and future shocks; or e) none of the shocks. Further assumptions between the $x_{it}$ and the fixed effects generate additional moment conditions. When there is high persistence in the data, lagged differences $\Delta y_{it-1}$ form valid instruments for the level equation given by (40). Blundell and Bond (1998) exploit a stationarity assumption on the initial observation $y_{i0}$, which requires the moment condition
\[
E\left(y_{i0} - \frac{\alpha_i}{1-\gamma}\right) = 0. \qquad (43)
\]
This condition implies that the initial observations do not deviate systematically from the stationary value $\alpha_i/(1-\gamma)$ for all $i$. Consequently, the estimation depends on both the first-differenced equation (42) and the level equation (40). This is particularly important for very persistent data, as the instruments available for (42) are more likely to be weak and induce finite sample bias. The resulting moment conditions cannot be used for the identification of $\gamma$ when we have a unit root. The additional moment conditions create instruments that are not weak, reducing this potential bias.
. *--Blundell & Bond's (1998) Estimator
. xtdpdsys lwage occ south smsa ind, lags(2) maxldep(3) pre(wks,lag(1,2)) endoge
> nous(ms,lag(0,2)) endogenous(union,lag(0,2)) twostep vce(robust) artests(3)

System dynamic panel-data estimation           Number of obs      =      2975
Group variable: id                             Number of groups   =       595
Time variable: t
                                               Obs per group: min =         5
                                                              avg =         5
                                                              max =         5

Number of instruments =     60                 Wald chi2(10)      =   2270.88
                                               Prob > chi2        =    0.0000
Two-step results
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |   .6017533   .0291502    20.64   0.000     .5446199    .6588866
         L2. |   .2880537   .0285319    10.10   0.000     .2321322    .3439752
             |
         wks |
         --. |  -.0014979   .0056143    -0.27   0.790    -.0125017     .009506
         L1. |   .0006786   .0015694     0.43   0.665    -.0023973    .0037545
             |
          ms |   .0395337   .0558543     0.71   0.479    -.0699386    .1490061
       union |  -.0422409   .0719919    -0.59   0.557    -.1833423    .0988606
         occ |  -.0508803   .0331149    -1.54   0.124    -.1157843    .0140237
       south |  -.1062817    .083753    -1.27   0.204    -.2704346    .0578713
        smsa |  -.0483567   .0479016    -1.01   0.313    -.1422422    .0455288
         ind |   .0144749    .031448     0.46   0.645    -.0471621    .0761118
       _cons |   .9584113   .3632287     2.64   0.008     .2464961    1.670327
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/4).lwage L(1/2).L.wks L(2/3).ms L(2/3).union
        Standard: D.occ D.south D.smsa D.ind
Instruments for level equation
        GMM-type: LD.lwage LD.wks LD.ms LD.union
        Standard: _cons

The options match those discussed previously for the xtabond command.
When the cross-section dimension is small, the Blundell and Bond
system GMM estimator typically yields downward-biased standard
errors, which affects inference. The finite-sample features of the data can be
used to address the latter problem. Windmeijer71 proposes a variance estimator,
which allows for heteroskedasticity-consistent standard errors and
corrects for this potential finite-sample bias. The option vce(robust)
provides the Windmeijer-corrected standard errors.
Serially uncorrelated ε_it and overidentification are also
considerations when using the Blundell and Bond system GMM estimator.
Fortunately, the post-estimation Stata commands to test for serial
correlation and overidentification extend to the xtdpdsys command.
Therefore, estat abond provides the test statistics for autocorrelation
of various orders, while estat sargan provides the test statistic for
the Sargan overidentification test. However, the Sargan statistic is
unavailable when using the Windmeijer-corrected standard errors.
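A sketch of the resulting workflow (re-using the specification above;
because estat sargan is not allowed after vce(robust), the model is
re-estimated without the robust option before requesting the Sargan test):

* Serial-correlation tests are available after the robust two-step fit
estat abond

* Re-estimate without vce(robust), then obtain the Sargan statistic
quietly xtdpdsys lwage occ south smsa ind, lags(2) maxldep(3)      ///
    pre(wks,lag(1,2)) endogenous(ms,lag(0,2))                      ///
    endogenous(union,lag(0,2)) twostep artests(3)
estat sargan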

Frank Windmeijer. A finite sample


correction for the variance of linear
efficient two-step GMM estimators.
Journal of Econometrics, 126(1):2551,
May 2005
71

Other Stata options for Arellano-Bond and Blundell-Bond Estimators

The Stata commands xtdpd and xtabond2 also allow one to
obtain the Arellano-Bond and Blundell-Bond estimators. These
commands have a more complicated syntax, but allow for models
with more complicated structures on the correlation of the error terms
and the predetermined variables. For example, xtabond2 can be used
to estimate a wide variety of GMM models, including the standard
ordinary least squares estimator.
The previous Arellano-Bond estimates can be obtained using xtabond2:
. xtabond2 lwage l(1/2).lwage l(0/1).wks ms union occ south smsa ind, gmm(ms,lag
> (2 3)) gmm(union,lag(2 3)) twostep artests(3) noleveleq iv(occ south smsa ind)
>  gmm(l.wks,lag(1 2)) gmm(lwage,lag(2 4)) robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor
>  speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step
>  estimation.
  Difference-in-Sargan statistics may be negative.

Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =      2380
Time variable : t                               Number of groups   =       595
Number of instruments = 39                      Obs per group: min =         4
Wald chi2(10) =   1287.77                                      avg =      4.00
Prob > chi2   =     0.000                                      max =         4
------------------------------------------------------------------------------
             |              Corrected
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .611753   .0373491    16.38   0.000     .5385501    .6849559
         L2. |   .2409058   .0319939     7.53   0.000     .1781989    .3036127
             |
         wks |
         --. |  -.0159751   .0082523    -1.94   0.053    -.0321493     .000199
         L1. |   .0039944   .0027425     1.46   0.145    -.0013807    .0093695
             |
          ms |   .1859324    .144458     1.29   0.198       -.0972    .4690649
       union |  -.1531329   .1677842    -0.91   0.361    -.4819839    .1757181
         occ |  -.0357509   .0347705    -1.03   0.304    -.1038999     .032398
       south |  -.0250368   .2150806    -0.12   0.907     -.446587    .3965134
        smsa |  -.0848223   .0525243    -1.61   0.106     -.187768    .0181235
         ind |   .0227008   .0424207     0.54   0.593    -.0604422    .1058437
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(occ south smsa ind)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).ms
    L(2/3).union
    L(1/2).L.wks
    L(2/4).lwage
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.52  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -1.60  Pr > z =  0.109
Arellano-Bond test for AR(3) in first differences: z =   0.36  Pr > z =  0.721
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(29)   =  59.55  Prob > chi2 =  0.001
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(29)   =  39.88  Prob > chi2 =  0.086
  (Robust, but can be weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  gmm(ms, lag(2 3))
    Hansen test excluding group:     chi2(21)   =   32.35  Prob > chi2 =  0.054
    Difference (null H = exogenous): chi2(8)    =    7.52  Prob > chi2 =  0.481
  gmm(union, lag(2 3))
    Hansen test excluding group:     chi2(21)   =   21.15  Prob > chi2 =  0.450
    Difference (null H = exogenous): chi2(8)    =   18.73  Prob > chi2 =  0.016
  gmm(L.wks, lag(1 2))
    Hansen test excluding group:     chi2(21)   =   31.14  Prob > chi2 =  0.071
    Difference (null H = exogenous): chi2(8)    =    8.74  Prob > chi2 =  0.365
  gmm(lwage, lag(2 4))
    Hansen test excluding group:     chi2(18)   =   23.59  Prob > chi2 =  0.169
    Difference (null H = exogenous): chi2(11)   =   16.29  Prob > chi2 =  0.131
  iv(occ south smsa ind)
    Hansen test excluding group:     chi2(25)   =   28.00  Prob > chi2 =  0.308
    Difference (null H = exogenous): chi2(4)    =   11.87  Prob > chi2 =  0.018

Notice that we must specify all the regressors of the model, including
the lagged dependent variable and any other endogenous variables. Option
noleveleq specifies that the level equation is not instrumented, which
means that the results provide the Arellano-Bond estimator. Option
iv() includes the exogenous variables of the model and
any other variables serving as instruments. Option gmm(varname, lag(a b))
specifies that varname is an endogenous or predetermined regressor
appearing on the right-hand side, and that lags a through b of varname
are used as GMM-style instruments. Option gmm() also accepts a suboption
collapse, which restricts the instrument set to one instrument for
each variable and lag distance, rather than one for each time period,
variable, and lag distance. This restriction is useful for two reasons.
First, a bias arises in small samples as the number of instruments
approaches the number of observations, due to overfitting of the
model. Second, the suboption collapse reduces the width of the
instrument matrix, which lowers the computational burden and prevents
the instrument matrix from exceeding Stata's size limits. The
above syntax treats the variable wks_{i,t-1} as predetermined and instruments
it with its first two lags (wks_{i,t-2}, wks_{i,t-3}). Equivalently, we could treat
the variable wks_it as predetermined and instrument with its second
and third lags (wks_{i,t-2}, wks_{i,t-3}), as the second run below confirms;
a short sketch using the collapse suboption follows that run.

See Roodman [2009] for details on the xtabond2 command.

. xtabond2 lwage l(1/2).lwage l(0/1).wks ms union occ south smsa ind, gmm(ms,lag
> (2 3)) gmm(union,lag(2 3)) twostep artests(3) noleveleq iv(occ south smsa ind)
>  gmm(wks,lag(2 3)) gmm(lwage,lag(2 4)) robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor
>  speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step
>  estimation.
  Difference-in-Sargan statistics may be negative.

Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =      2380
Time variable : t                               Number of groups   =       595
Number of instruments = 39                      Obs per group: min =         4
Wald chi2(10) =   1287.77                                      avg =      4.00
Prob > chi2   =     0.000                                      max =         4
------------------------------------------------------------------------------
             |              Corrected
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .611753   .0373491    16.38   0.000     .5385501    .6849559
         L2. |   .2409058   .0319939     7.53   0.000     .1781989    .3036127
             |
         wks |
         --. |  -.0159751   .0082523    -1.94   0.053    -.0321493     .000199
         L1. |   .0039944   .0027425     1.46   0.145    -.0013807    .0093695
             |
          ms |   .1859324    .144458     1.29   0.198       -.0972    .4690649
       union |  -.1531329   .1677842    -0.91   0.361    -.4819839    .1757181
         occ |  -.0357509   .0347705    -1.03   0.304    -.1038999     .032398
       south |  -.0250368   .2150806    -0.12   0.907     -.446587    .3965134
        smsa |  -.0848223   .0525243    -1.61   0.106     -.187768    .0181235
         ind |   .0227008   .0424207     0.54   0.593    -.0604422    .1058437
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(occ south smsa ind)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).ms
    L(2/3).union
    L(2/3).wks
    L(2/4).lwage
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.52  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -1.60  Pr > z =  0.109
Arellano-Bond test for AR(3) in first differences: z =   0.36  Pr > z =  0.721
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(29)   =  59.55  Prob > chi2 =  0.001
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(29)   =  39.88  Prob > chi2 =  0.086
  (Robust, but can be weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  gmm(ms, lag(2 3))
    Hansen test excluding group:     chi2(21)   =   32.35  Prob > chi2 =  0.054
    Difference (null H = exogenous): chi2(8)    =    7.52  Prob > chi2 =  0.481
  gmm(union, lag(2 3))
    Hansen test excluding group:     chi2(21)   =   21.15  Prob > chi2 =  0.450
    Difference (null H = exogenous): chi2(8)    =   18.73  Prob > chi2 =  0.016
  gmm(wks, lag(2 3))
    Hansen test excluding group:     chi2(21)   =   31.14  Prob > chi2 =  0.071
    Difference (null H = exogenous): chi2(8)    =    8.74  Prob > chi2 =  0.365
  gmm(lwage, lag(2 4))
    Hansen test excluding group:     chi2(18)   =   23.59  Prob > chi2 =  0.169
    Difference (null H = exogenous): chi2(11)   =   16.29  Prob > chi2 =  0.131
  iv(occ south smsa ind)
    Hansen test excluding group:     chi2(25)   =   28.00  Prob > chi2 =  0.308
    Difference (null H = exogenous): chi2(4)    =   11.87  Prob > chi2 =  0.018
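As a sketch of the collapse suboption discussed earlier (same specification
as the run above, with each GMM-style instrument set collapsed to one column
per lag distance; the instrument count, and hence the Hansen degrees of
freedom, would fall accordingly; output omitted):

* Collapsed GMM-style instruments: one column per variable and lag
* distance, rather than one per period, variable, and lag distance
xtabond2 lwage l(1/2).lwage l(0/1).wks ms union occ south smsa ind,  ///
    gmm(ms, lag(2 3) collapse) gmm(union, lag(2 3) collapse)         ///
    gmm(wks, lag(2 3) collapse) gmm(lwage, lag(2 4) collapse)        ///
    iv(occ south smsa ind) noleveleq twostep robust artests(3)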

The Arellano-Bond estimator using xtdpd is:

. xtdpd l(0/2).lwage l(0/1).wks ms union occ south smsa ind, dg(lwage,l(2 4)) dg(
> ms, l(2 3)) dg(union,l(2 3)) div(occ south smsa ind) dg(wks,l(2 3)
> ) twostep artests(3)

Dynamic panel-data estimation                   Number of obs      =      2975
Group variable: id                              Number of groups   =       595
Time variable: t
                                                Obs per group: min =         5
                                                               avg =         5
                                                               max =         5

Number of instruments =     40                  Wald chi2(10)      =   1640.91
                                                Prob > chi2        =    0.0000
Two-step results
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .611753   .0251464    24.33   0.000      .562467     .661039
         L2. |   .2409058   .0217815    11.06   0.000      .198215    .2835967
             |
         wks |
         --. |  -.0159751   .0067113    -2.38   0.017     -.029129   -.0028212
         L1. |   .0039944   .0020621     1.94   0.053    -.0000472     .008036
             |
          ms |   .1859324   .1263155     1.47   0.141    -.0616413    .4335062
       union |  -.1531329   .1345067    -1.14   0.255    -.4167613    .1104955
         occ |  -.0357509   .0303114    -1.18   0.238    -.0951602    .0236583
       south |  -.0250368   .1537619    -0.16   0.871    -.3264046    .2763309
        smsa |  -.0848223   .0477614    -1.78   0.076    -.1784329    .0087884
         ind |   .0227008     .03597     0.63   0.528    -.0477991    .0932006
       _cons |   1.639999   .3656413     4.49   0.000     .9233556    2.356643
------------------------------------------------------------------------------
Warning: gmm two-step standard errors are biased; robust standard
         errors are recommended.
Instruments for differenced equation
        GMM-type: L(2/4).lwage L(2/3).ms L(2/3).union L(2/3).wks
        Standard: D.occ D.south D.smsa D.ind
Instruments for level equation
        Standard: _cons

Similarly, the Blundell-Bond estimator using xtdpd is:

. xtdpd l(0/2).lwage l(0/1).wks ms union occ south smsa ind, dg(lwage,l(2 4)) dg(
> ms, l(2 3)) dg(union,l(2 3)) div(occ south smsa ind) dg(wks,l(2 3)
> ) twostep lg(lwage wks ms union) vce(robust) artests(3)

Dynamic panel-data estimation                   Number of obs      =      2975
Group variable: id                              Number of groups   =       595
Time variable: t
                                                Obs per group: min =         5
                                                               avg =         5
                                                               max =         5

Number of instruments =     60                  Wald chi2(10)      =   2270.88
                                                Prob > chi2        =    0.0000
Two-step results
                                    (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |   .6017533   .0291502    20.64   0.000     .5446199    .6588866
         L2. |   .2880537   .0285319    10.10   0.000     .2321322    .3439752
             |
         wks |
         --. |  -.0014979   .0056143    -0.27   0.790    -.0125017     .009506
         L1. |   .0006786   .0015694     0.43   0.665    -.0023973    .0037545
             |
          ms |   .0395337   .0558543     0.71   0.479    -.0699386    .1490061
       union |  -.0422409   .0719919    -0.59   0.557    -.1833423    .0988606
         occ |  -.0508803   .0331149    -1.54   0.124    -.1157843    .0140237
       south |  -.1062817    .083753    -1.27   0.204    -.2704346    .0578713
        smsa |  -.0483567   .0479016    -1.01   0.313    -.1422422    .0455288
         ind |   .0144749    .031448     0.46   0.645    -.0471621    .0761118
       _cons |   .9584113   .3632287     2.64   0.008     .2464961    1.670327
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/4).lwage L(2/3).ms L(2/3).union L(2/3).wks
        Standard: D.occ D.south D.smsa D.ind
Instruments for level equation
        GMM-type: LD.lwage LD.wks LD.ms LD.union
        Standard: _cons

When using xtdpd, the distinction between the difference GMM estimator of Arellano-Bond and the system
GMM estimator of Blundell-Bond occurs through the specification of instruments for the difference equation
(42) only, or for both the difference (42) and levels (40) equations, as part of the options. The option
dg(varname, l(a b)) specifies that varname is an endogenous or predetermined regressor appearing on the
right-hand side and that lags a through b of varname are used as instruments in the difference equation.
The option lg(varlist, l(a)) specifies that the a-th lag of the differences of the variables in varlist
serve as instruments in the levels equation (by default, the first lag). Not specifying the option lg
results in the Arellano-Bond estimator.
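A minimal skeleton of this distinction (hypothetical variables y and x;
the full option names dgmmiv(), lgmmiv(), and div() are written out,
of which dg(), lg(), and div() in the listings above are abbreviations):

* Arellano-Bond (difference GMM): GMM-type instruments for the
* differenced equation only
xtdpd L(0/1).y x, dgmmiv(y) div(x)

* Blundell-Bond (system GMM): additionally use lagged differences
* of y as instruments for the levels equation
xtdpd L(0/1).y x, dgmmiv(y) div(x) lgmmiv(y)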

Chapter 8 High-Dimensional Fixed-Effects


Estimation of a standard fixed-effects model in a panel data setting
is a straightforward process in Stata using the xtreg command
with the fe option. Difficulty arises when we have fixed effects in
multiple dimensions and large administrative data. In these notes, we
demonstrate the process to account for these fixed effects and obtain
estimates of the parameters of interest.

Introduction
One standard approach to estimating a fixed-effects model with panel
data is to use the within estimator. This approach appeals to the
Frisch-Waugh-Lovell theorem by demeaning the data at the individual
level to remove the individual effect. The within estimator avoids two
problems which arise when trying to estimate a fixed-effects model
with a set of dummy variables for individuals. The first is a statistical
problem: the inclusion of the individual dummy variables
leads to the estimation of an excessive number of nuisance parameters,
which causes inconsistency in the estimates of the parameters of
interest. The second is a practical problem: the inclusion of
individual dummy variables as regressors leads to too many right-hand-side
variables for Stata to handle given its matrix limitations.

High-Dimensional Fixed Effects Models


We begin discussing High-Dimensional Fixed Effects (HDFE) models
with a simple example from Guimaraes and Portugal [2010], which
demonstrates the estimation process. A useful Stata command
to track the time elapsed per command is set rmsg on. After each
command, the elapsed time is displayed.
. clear
. set more off
. set rmsg on
. set obs 1000

. * Generates a random data set
. gen x=rnormal()+5
. egen firm=seq(), from(1) to(20) block(50)
. gen y=5+0.5*firm+5*x+rnormal()

In this example, a random data set of 1000 observations is created.
The random variable x is generated with draws from a normal distribution
with mean equal to five. The variable firm is an identifier
for individual firms and, thus, is our fixed effect. This variable is
created by putting observations into 20 sets of 50-observation
blocks. Finally, the x and firm variables, along with a standard normal
random disturbance, enter the data generating
process of the variable y. Given that the data generating process for
the variable y is known, we can perform a standard regression of
variable y on the x and firm variables:
. reg y x firm

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =16924.27
       Model |  34105.8511     2  17052.9256           Prob > F      =  0.0000
    Residual |  1004.57925   997  1.00760205           R-squared     =  0.9714
-------------+------------------------------           Adj R-squared =  0.9713
       Total |  35110.4303   999  35.1455759           Root MSE      =  1.0038

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.019581   .0313022   160.36   0.000     4.958156    5.081007
        firm |   .4994599   .0055049    90.73   0.000     .4886574    .5102625
       _cons |   4.862866   .1699874    28.61   0.000     4.529291     5.19644
------------------------------------------------------------------------------

However, we do not know the underlying data generating process
for real data. In practice, estimation of the coefficient on the variable
x would be done using a fixed-effects regression in one of the following
ways:
. xtset firm
       panel variable:  firm (balanced)

. reg y x i.firm

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F( 20,   979) = 1692.71
       Model |  34123.6364    20  1706.18182           Prob > F      =  0.0000
    Residual |  986.793935   979  1.00796112           R-squared     =  0.9719
-------------+------------------------------           Adj R-squared =  0.9713
       Total |  35110.4303   999  35.1455759           Root MSE      =   1.004

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.021169   .0315942   158.93   0.000     4.959169    5.083169
             |
        firm |
          2  |   .3819136   .2008479     1.90   0.058    -.0122283    .7760555
          3  |   .7969512   .2009165     3.97   0.000     .4026747    1.191228
          4  |   1.306139    .200848     6.50   0.000     .9119964    1.700281
          5  |   1.794757   .2014689     8.91   0.000     1.399396    2.190117
          6  |   2.637599   .2008385    13.13   0.000     2.243476    3.031723
          7  |   2.699761   .2007947    13.45   0.000     2.305723    3.093798
          8  |   3.361955   .2010312    16.72   0.000     2.967454    3.756457
          9  |   3.956463   .2012489    19.66   0.000     3.561534    4.351392
         10  |   4.187189   .2010149    20.83   0.000      3.79272    4.581659
         11  |    4.74716    .200897    23.63   0.000     4.352921    5.141398
         12  |   5.314797   .2010879    26.43   0.000     4.920184     5.70941
         13  |   5.845618   .2011525    29.06   0.000     5.450879    6.240358
         14  |   6.114079   .2008724    30.44   0.000     5.719889    6.508269
         15  |    6.98943   .2009823    34.78   0.000     6.595024    7.383835
         16  |   7.436002   .2009108    37.01   0.000     7.041737    7.830267
         17  |   7.718788    .200799    38.44   0.000     7.324742    8.112834
         18  |   8.298004   .2008403    41.32   0.000     7.903877    8.692131
         19  |   9.109146   .2008015    45.36   0.000     8.715095    9.503196
         20  |   9.361374   .2009232    46.59   0.000     8.967084    9.755664
             |
       _cons |   5.496404   .2074166    26.50   0.000     5.089372    5.903436
------------------------------------------------------------------------------

. xtreg y x, fe

Fixed-effects (within) regression               Number of obs      =      1000
Group variable: firm                            Number of groups   =        20

R-sq:  within  = 0.9627                         Obs per group: min =        50
       between = 0.0415                                        avg =      50.0
       overall = 0.7351                                        max =        50

                                                F(1,979)           =  25257.78
corr(u_i, Xb)  = -0.0039                        Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.021169   .0315942   158.93   0.000     4.959169    5.083169
       _cons |   10.09926   .1610984    62.69   0.000     9.783122     10.4154
-------------+----------------------------------------------------------------
     sigma_u |  2.958017
     sigma_e |  1.0039727
         rho |  .89670228   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(19, 979) =   434.03            Prob > F = 0.0000

. egen ybar = mean(y), by(firm)
. egen xbar = mean(x), by(firm)
. gen yi=y-ybar
. gen xi=x-xbar

. reg yi xi

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =25747.97
       Model |  25458.8598     1  25458.8598           Prob > F      =  0.0000
    Residual |  986.793937   998   .98877148           R-squared     =  0.9627
-------------+------------------------------           Adj R-squared =  0.9626
       Total |  26445.6537   999  26.4721259           Root MSE      =  .99437

------------------------------------------------------------------------------
          yi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xi |   5.021169    .031292   160.46   0.000     4.959763    5.082574
       _cons |   3.06e-09   .0314447     0.00   1.000    -.0617054    .0617054
------------------------------------------------------------------------------

The implementation of the fixed-effects estimation is possible here because
there is only one fixed-effect dimension along with a relatively small
number of firms (20). Often these conditions do not hold, which
makes estimation difficult. Guimaraes and Portugal [2010] provide an
algorithm to estimate the parameters of interest while accounting for the
presence of fixed effects. For our example, the algorithm becomes:
* Iterate, alternating between the slope on x and the firm means,
* until the regression log likelihood stops changing
local dif=1
local ll1=0
gen fe1=0
local i=1
while abs(`dif')>epsdouble() {
    qui regress y x fe1, nocons
    local ll2=`ll1'
    local ll1=e(ll)
    local dif=`ll2'-`ll1'
    capture drop fe1
    gen double temp1=y-_b[x]*x
    egen double fe1=mean(temp1), by(firm)
    capture drop temp1
    local i=`i'+1
}
di "Total Number of Iterations --> " `i'

The algorithm generates the variable fe1, which accounts for the
effect of the firm variable on the value of y. Therefore, we have:
. reg y x fe1

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =17238.28
       Model |  34123.6364     2  17061.8182           Prob > F      =  0.0000
    Residual |  986.793935   997  .989763225           R-squared     =  0.9719
-------------+------------------------------           Adj R-squared =  0.9718
       Total |  35110.4303   999  35.1455759           Root MSE      =  .99487

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.021169   .0310239   161.85   0.000     4.960289    5.082048
         fe1 |   .9999999   .0109121    91.64   0.000     .9785868    1.021413
       _cons |   8.30e-08   .1931887     0.00   1.000     -.379103    .3791032
------------------------------------------------------------------------------

For this regression, we have the same coefficient estimate on the
variable x as we do using the xtreg command with the fe option.
Thus, the algorithm accounts for the unobserved firm effects regardless
of the nature of these effects.
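For a single fixed-effect dimension, the same estimate is also available
from Stata's built-in areg, which absorbs the firm indicators; a minimal
sketch on the same simulated data:

* areg absorbs the firm fixed effects and reports only the remaining
* coefficients; the estimate on x should match the xtreg, fe result
areg y x, absorb(firm)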


The following generates a random data set of 1,000,000 observations.
The x variable is an integer-valued random variable. This new data set
has two fixed effects, which we label for exposition: (i) a firm effect;
and (ii) an industry effect. Given the setup of the random data set,
firms do switch industries.
global tob_obs "1000000"
global nfirms "20000"
*block1 = tot_obs/nfirms
global block1 "50"
global ind "200"
*block2= tot_obs/ind
global block2 "5000"
set obs $tob_obs
set seed 20140317
* Create a data set
*************************************************
gen rnd=uniform()
gen x=int(10*uniform())
egen firm=seq(), from(1) to($nfirms) block($block1)
sort rnd
egen ind=seq(), from(1) to(200) block($block2)
gen y=0.5*firm+5*x-0.5*ind+5*uniform()
sum
*************************************************

In this situation, the combination of the number of observations and
two fixed effects makes it difficult for Stata to control for both fixed
effects within the various standard regression or within-estimation
approaches, which was not the case in the previous example. However,
the algorithm from the previous example can be modified to
account for both the firm and industry fixed effects.
* Initialize as in the one-effect case, now with a second effect fe2
local dif=1
local ll1=0
gen fe1=0
gen fe2=0
local i=1
while abs(`dif')>epsdouble() {
    qui regress y x fe1 fe2, nocons
    local ll2=`ll1'
    local ll1=e(ll)
    local dif=`ll2'-`ll1'
    capture drop fe1
    capture drop temp1
    gen double temp1=y-_b[x]*x-fe2
    egen double fe1=mean(temp1), by(firm)
    capture drop fe2
    capture drop temp2
    gen double temp2=y-_b[x]*x-fe1
    egen double fe2=mean(temp2), by(ind)
    local i=`i'+1
}
r; t=70.73
. regress y x fe1 fe2

      Source |       SS       df       MS              Number of obs = 1000000
-------------+------------------------------           F(  3,999996) =
       Model |  8.3344e+12     3  2.7781e+12           Prob > F      =  0.0000
    Residual |  2040669.98  999996  2.04067814         R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  8.3344e+12  999999  8334370.46         Root MSE      =  1.4285

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   5.000167   .0004976  1.0e+04   0.000     4.999192    5.001142
         fe1 |          1   4.95e-07  2.0e+06   0.000      .999999    1.000001
         fe2 |          1   .0000495  2.0e+04   0.000      .999903    1.000097
       _cons |   1.28e-08   .0036142     0.00   1.000    -.0070837    .0070837
------------------------------------------------------------------------------
r; t=0.17

The algorithm generates two variables. The variable fe1 controls
for the firm effect, while the variable fe2 controls for the industry
effect. There are several user-contributed .ado files to efficiently
estimate high-dimensional fixed-effects models. A recent implementation
is reg2hdfe by Guimaraes and Portugal [2010]. To estimate the same
model as above we use this command:
. reg2hdfe y x, id1(firm) id2(ind)
==============================================================
Tolerance Level for Iterations: 1.00e-06
Transforming variable: y
Variable y converged after 6 Iterations
Checking if model converged - Coefficients for fixed effects should equal 1
Coefficient for id1 --> 1
Coefficient for id2 --> 1
Transforming variable: x
Variable x converged after 5 Iterations
Checking if model converged - Coefficients for fixed effects should equal 1
Coefficient for id1 --> 1
Coefficient for id2 --> 1.0000002

********** Linear Regression with 2 High-Dimensional Fixed Effects **********

Number of obs     =   1000000
F(20199, 979800)  =  1.98e+08
Prob > F          =    0.0000
R-squared         =    1.0000
Adj R-squared     =    1.0000
Root MSE          =    1.4432

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    5.00006   .0005079  9844.74   0.000     4.999064    5.001055
------------------------------------------------------------------------------
r; t=44.49

Notice that the manual implementation takes about 70.90 seconds in total,
while reg2hdfe takes about 44.49 seconds. Other user-contributed commands
useful for estimation of high-dimensional fixed-effects models include
gpreg and a2reg; like reg2hdfe, these are not built in to Stata. For a
thorough review of these different methods, the reader should consult
McCaffrey et al. [2012]. This exercise is left to the reader.

Bibliography

Alberto Abadie. Semiparametric difference-in-differences. Review of
Economic Studies, 72(1):1-19, 2005.

T. W. Anderson and Cheng Hsiao. Formulation and estimation of
dynamic models using panel data. Journal of Econometrics, 18(1):47-82,
1982.

Manuel Arellano. Panel Data Econometrics. Advanced Texts in
Econometrics. Oxford University Press, 2003. ISBN 9780199245291.

Manuel Arellano and Stephen Bond. Some tests of specification for
panel data: Monte Carlo evidence and an application to employment
equations. Review of Economic Studies, 58(2):277-297, April 1991.

Badi H. Baltagi. Econometric Analysis of Panel Data. John Wiley & Sons
Ltd, 5 edition, 2013.

Badi H. Baltagi and Sophon Khanti-Akom. On efficient estimation
with panel data: An empirical comparison of instrumental variables
estimators. Journal of Applied Econometrics, 5(4):401-406, Oct.-Dec. 1990.

Richard Blundell and Stephen Bond. Initial conditions and moment
restrictions in dynamic panel data models. Journal of Econometrics,
87(1):115-143, 1998.

T. S. Breusch and A. R. Pagan. The Lagrange multiplier test and its
applications to model specification in econometrics. Review of
Economic Studies, 47(1):239-253, January 1980.

A. Colin Cameron and Pravin K. Trivedi. Microeconometrics Using
Stata. Stata Press, revised edition, 2010.

Russell Davidson and James G. MacKinnon. Econometric Theory and
Methods. Oxford University Press, New York, Oxford, 2004.

Arthur S. Goldberger. A Course in Econometrics. Harvard University
Press, Cambridge and London, 1991.

William H. Greene. Econometric Analysis. Prentice Hall, 7 edition,
2012. ISBN 0130600383.

Paulo Guimaraes and Pedro Portugal. A simple feasible procedure
to fit models with high-dimensional fixed effects. The Stata Journal,
10(4):628-649, 2010.

Jerry A. Hausman. Specification tests in econometrics. Econometrica,
46(6):1251-1271, 1978.

Jerry A. Hausman and William E. Taylor. Panel data and unobservable
individual effects. Journal of Econometrics, 16(1):155, May 1981.

Fumio Hayashi. Econometrics. Princeton University Press, December
2000. ISBN 0691010188.

C. Hsiao. Analysis of Panel Data. Econometric Society Monographs.
Cambridge University Press, 3 edition, 2014. ISBN 9781107038691.

Jack Johnston and John DiNardo. Econometric Methods. McGraw-Hill,
4 edition, 1997.

Daniel F. McCaffrey, J. R. Lockwood, Kata Mihaly, and Tim R. Sass. A
review of Stata routines for fixed effects estimation in normal linear
models. The Stata Journal, 12(3):406-432, 2012.

Yair Mundlak. On the pooling of time series and cross section data.
Econometrica, 46(1):69-85, January 1978.

Marc Nerlove. A note on error components models. Econometrica,
39(2):383-396, March 1971.

Whitney K. Newey and Kenneth D. West. A simple, positive
semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica, 55(3):703-708, May 1987.

D. Roodman. How to do xtabond2: An introduction to "difference"
and "system" GMM in Stata. The Stata Journal, 9(1):86-136, 2009.

P. Rosenbaum and D. Rubin. The central role of the propensity score
in observational studies for causal effects. Biometrika, 70(1):41-55,
1983.

Halbert White. A heteroskedasticity-consistent covariance matrix
estimator and a direct test for heteroskedasticity. Econometrica,
48(4):817-838, 1980.

Frank Windmeijer. A finite sample correction for the variance of
linear efficient two-step GMM estimators. Journal of Econometrics,
126(1):25-51, May 2005.

Jeffrey M. Wooldridge. Econometric Analysis of Cross Section and Panel
Data. The MIT Press, Cambridge and London, 2 edition, 2010. ISBN
0262232197.
