
14.170: Programming for Economists
1/12/2009-1/16/2009

Melissa Dell
Matt Notowidigdo
Paul Schrimpf
Lecture 3, Maximum Likelihood Estimation in Stata
Introduction to MLE
• Stata has a built-in language for writing ML estimators. It uses this
language to implement many of its own commands
– e.g. probit, tobit, logit, clogit, glm, xtpoisson, etc.

• I find the language very easy to use. For simple log-likelihood
functions (especially those in “linear form,” where the overall log
likelihood is just the sum of each observation’s log likelihood),
implementation is trivial and the built-in maximization routines are good

• Why should you use Stata ML?


– Stata will automatically calculate numerical gradients for you during
each maximization step
– You have access to Stata’s syntax for dealing with panel data sets (for
panel MLE this can result in very easy-to-read code)
– You can use it as a first pass to quickly evaluate whether numerical
gradients/Hessians are going to work, or whether the likelihood surface
is too difficult to maximize

• Why shouldn’t you use Stata ML?


– Maximization options are limited (standard Newton-Raphson and BHHH
are included, but more recent algorithms not yet programmed)
– Tools to guide search over difficult likelihood functions aren’t great
ML with linear model and normal errors

The model is y_i = x_i'\beta + \varepsilon_i with \varepsilon_i \sim N(0, \sigma^2), so the log likelihood is

\ln L(\beta, \sigma) = \sum_i \ln\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]
Basic Stata ML

program drop _all

program mynormal_lf
args lnf mu sigma
* per-observation log likelihood: ln[(1/sigma)*phi((y - mu)/sigma)]
qui replace `lnf' = log((1/`sigma')*normalden(($ML_y1 - `mu')/`sigma'))
end

clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen y = 2*x + invnormal(uniform())
* first equation: mu = b0 + b1*x; second (empty) equation: sigma, a constant
ml model lf mynormal_lf (y = x) ()
ml maximize
reg y x
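As an aside, normalden() also accepts mean and standard-deviation arguments directly, so an equivalent evaluator (same likelihood, slightly less algebra) could be written as:

program mynormal_lf2
args lnf mu sigma
* normalden(y, m, s) is the N(m, s^2) density evaluated at y
qui replace `lnf' = ln(normalden($ML_y1, `mu', `sigma'))
end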
ML with linear regression
program drop _all
program mynormal_lf
args lnf mu sigma
qui replace `lnf' = log((1/`sigma')*normalden(($ML_y1-`mu')/`sigma'))
end

clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen y = 2*x + x*x*invnormal(uniform())
gen keep = (uniform() > 0.1)
gen weight = uniform()
* weights, if restrictions, and robust standard errors work just as in reg
ml model lf mynormal_lf (y = x) () [aw=weight] if keep == 1, robust
ml maximize
reg y x [aw=weight] if keep == 1, robust
What’s going on in the background?
• We just wrote a 3 (or 5) line program. What
does Stata do with it?
• When we call “ml maximize” it performs the following
steps:
– Initializes the parameters (the “betas”) to all zeroes
– As long as it has not declared convergence:
• Calculates the gradient at the current parameter value
• Takes a step
• Updates the parameters
• Tests for convergence (based on either the gradient, the Hessian, or a
combination)
– Displays the parameters as regression output
(ereturn!)
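To watch these steps as they happen, ml maximize accepts the standard maximize display options; a minimal sketch using the mynormal_lf model from earlier:

ml model lf mynormal_lf (y = x) ()
ml maximize, trace gradient   // prints the parameter vector and gradient at each iteration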
How does it calculate gradient?
• Since we did not program a gradient, Stata will calculate the gradient
numerically, one numerical derivative per parameter.

• Review:
– The analytic derivative is defined as:

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

– So that leads to a simple approximation formula for “suitably small but large
enough h”; this is the numerical derivative of the function:

f'(x) \approx \frac{f(x + h) - f(x)}{h}

– Stata knows how to choose a good “h” and in general it gets it right

• Stata updates its parameter guess using the numerical derivatives as
follows (i.e. it takes a “Newton” step):

\theta_{k+1} = \theta_k - H(\theta_k)^{-1}\, g(\theta_k)

where g is the (numerical) gradient and H is the (numerical) Hessian of the log likelihood.
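A minimal sketch of the finite-difference idea in plain Stata (illustrative only, not Stata’s internal code): a central-difference approximation to the derivative of f(b) = -(b - 2)^2 at b = 1, where the analytic answer is 2.

local b = 1
local h = 1e-6
local f_up   = -((`b' + `h') - 2)^2
local f_down = -((`b' - `h') - 2)^2
* central difference: (f(b+h) - f(b-h)) / (2h)
display "numerical derivative: " (`f_up' - `f_down')/(2*`h')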
probit

The model is y_i = \mathbf{1}\{x_i'\beta + \varepsilon_i > 0\} with \varepsilon_i \sim N(0,1), so the log likelihood is

\ln L(\beta) = \sum_i \left[ y_i \ln \Phi(x_i'\beta) + (1 - y_i) \ln(1 - \Phi(x_i'\beta)) \right]
Back to Stata ML (myprobit)
program drop _all
program myprobit_lf
args lnf xb
qui replace `lnf' = ln(normal( `xb')) if $ML_y1 == 1
qui replace `lnf' = ln(normal(-1*`xb')) if $ML_y1 == 0
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model lf myprobit_lf (y = x)
ml maximize
probit y x
TMTOWTDI! (“There’s more than one way to do it”)
program drop _all
program myprobit_lf
args lnf xb
qui replace `lnf' = ///
$ML_y1*ln(normal(`xb')) + (1-$ML_y1)*ln(1 - normal(`xb'))
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model lf myprobit_lf (y = x)
ml maximize
probit y x
What happens here?
program drop _all
program myprobit_lf
args lnf xb
qui replace `lnf' = ln(normal( `xb')) if $ML_y1 == 1
qui replace `lnf' = ln(normal(-1*`xb')) if $ML_y1 == 0
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model lf myprobit_lf (y = x) ()
ml maximize
probit y x
Difficult likelihood functions?

• Stata will give up if it cannot calculate numerical derivatives (you get
the “could not calculate numerical derivatives” error). This can be
a big pain, especially in a long-running process where the failure
happens only after a long time. If this is not a bug in your code (like
the last slide), a lot of errors like this are a sign to leave Stata so
that you can get better control of the maximization process.

• A key skill is figuring out whether such an error reflects a bug in your
program or a likelihood function that is genuinely difficult to maximize.
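One way to tell the two apart is Stata’s built-in diagnostics; a minimal sketch (using the myprobit_lf evaluator from above): ml check exercises the evaluator to catch programming errors, ml search hunts for better starting values, and the difficult option makes the stepper more careful in flat or non-concave regions.

ml model lf myprobit_lf (y = x)
ml check                  // runs the evaluator at test points; catches program bugs
ml search                 // searches for better starting values
ml maximize, difficult    // more careful stepping in difficult regions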
Transforming parameters
program drop _all
program mynormal_lf
args lnf mu ln_sigma
tempvar sigma
* estimate ln(sigma); sigma = exp(ln_sigma) is then positive by construction
gen double `sigma' = exp(`ln_sigma')
qui replace `lnf' = log((1/`sigma')*normalden(($ML_y1-`mu')/`sigma'))
end

clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen y = 2*x + 0.01*invnormal(uniform())
ml model lf mynormal_lf (y = x) /ln_sigma
ml maximize
reg y x
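A hedged follow-up (assuming the equation is named ln_sigma as above): since the fitted parameter is ln(sigma), you can recover sigma and a delta-method standard error after convergence with nlcom:

nlcom (sigma: exp(_b[ln_sigma:_cons]))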
From “lf” to “d0”, “d1”, and “d2”
• In some (rare) cases you will want to code the gradient
(and possibly the Hessian) by hand. If there are simple
analytic formulas for them, and/or you need more speed,
and/or the numerical derivatives are not working out very
well, this can be a good thing to do.

• Every ML estimator we have written so far has been of
type “lf”. In order to calculate analytic gradients, we
need to use a “d1” or a “d2” ML estimator

• But before we can implement the analytic formulas for
the gradient and Hessian in CODE, we need to derive
the analytic formulas themselves.
gradient and Hessian for probit

Define the generalized residual \lambda_i = y_i \frac{\phi(x_i'\beta)}{\Phi(x_i'\beta)} - (1 - y_i)\frac{\phi(x_i'\beta)}{1 - \Phi(x_i'\beta)}. Then

g(\beta) = \sum_i \lambda_i x_i \qquad -H(\beta) = \sum_i \lambda_i(\lambda_i + x_i'\beta)\, x_i x_i'

which is exactly what the d1 and d2 programs below implement.
More probit (d0)
program drop _all
program myprobit_d0
args todo b lnf
tempvar xb l_j
mleval `xb' = `b'
qui {
gen `l_j' = normal( `xb') if $ML_y1 == 1
replace `l_j' = normal(-1 * `xb') if $ML_y1 == 0
mlsum `lnf' = ln(`l_j')
}
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model d0 myprobit_d0 (y = x)
ml maximize
probit y x
More probit (d0), now with gen double
program drop _all
program myprobit_d0
args todo b lnf
tempvar xb l_j
mleval `xb' = `b'
qui {
gen double `l_j' = normal( `xb') if $ML_y1 == 1
replace `l_j' = normal(-1 * `xb') if $ML_y1 == 0
mlsum `lnf' = ln(`l_j')
}
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model d0 myprobit_d0 (y = x)
ml maximize
probit y x
Still more probit (d1)
program drop _all
program myprobit_d1
args todo b lnf g
tempvar xb l_j g1
mleval `xb' = `b'
qui {
gen double `l_j' = normal( `xb') if $ML_y1 == 1
replace `l_j' = normal(-1 * `xb') if $ML_y1 == 0
mlsum `lnf' = ln(`l_j')

* analytic score: lambda_i = +/- phi(xb)/l_j
gen double `g1' = normalden(`xb')/`l_j' if $ML_y1 == 1
replace `g1' = -normalden(`xb')/`l_j' if $ML_y1 == 0
* mlvecsum multiplies by x and sums over observations to form the gradient
mlvecsum `lnf' `g' = `g1', eq(1)
}
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model d1 myprobit_d1 (y = x)
ml maximize
probit y x
Last probit, I promise (d2)
program drop _all
program myprobit_d2
args todo b lnf g negH
tempvar xb l_j g1
mleval `xb' = `b'
qui {
gen double `l_j' = normal( `xb') if $ML_y1 == 1
replace `l_j' = normal(-1 * `xb') if $ML_y1 == 0
mlsum `lnf' = ln(`l_j')

gen double `g1' = normalden(`xb')/`l_j' if $ML_y1 == 1
replace `g1' = -normalden(`xb')/`l_j' if $ML_y1 == 0
mlvecsum `lnf' `g' = `g1', eq(1)

* negative Hessian: sum of lambda_i*(lambda_i + xb)*x*x'
mlmatsum `lnf' `negH' = `g1'*(`g1'+`xb'), eq(1,1)
}
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model d2 myprobit_d2 (y = x)
ml search
ml maximize
probit y x
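When you supply analytic derivatives, it is worth checking them against Stata’s numerical ones. The d1debug and d2debug evaluator types do exactly this: they maximize using numerical derivatives while reporting how far your analytic gradient (and Hessian) are from them. A minimal sketch:

ml model d2debug myprobit_d2 (y = x)
ml maximize    // the log reports mreldif() between analytic and numerical derivatives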
Beyond linear-form likelihood fn’s
• Many ML estimators I write down do NOT satisfy the
linear-form restriction, but OFTEN they have a simple
panel structure (e.g. think of any “xt*” command in Stata
that is implemented in ML)

• Stata has nice, intuitive commands for dealing with panels
(e.g. the “by” command!) that work inside ML programs

• As an example, let’s develop a random-effects estimator
in Stata ML. This likelihood function does NOT satisfy
the linear-form restriction (i.e. the overall log-likelihood
function is NOT just the sum of the individual log-
likelihood functions)

• This has two purposes:
– More practice going from MATH to CODE
– A good example of a panel data ML estimator implementation
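For reference, the log likelihood the code below implements is the standard random-effects result. Writing z_{it} = y_{it} - x_{it}'\beta for the residual, T_i for the panel length, and a_i = \sigma_u^2 / (T_i \sigma_u^2 + \sigma_e^2), each panel i contributes

\ln L_i = -\frac{1}{2}\left[ \frac{\sum_t z_{it}^2 - a_i\left(\sum_t z_{it}\right)^2}{\sigma_e^2} + \ln\!\left(\frac{T_i\,\sigma_u^2}{\sigma_e^2} + 1\right) + T_i \ln\left(2\pi\sigma_e^2\right) \right]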
Random effects in ML
program drop _all
program define myrereg_d0
args todo b lnf
tempvar xb z T S_z2 Sz_2 S_temp a first
tempname sigma_u sigma_e ln_sigma_u ln_sigma_e
mleval `xb' = `b', eq(1)
mleval `ln_sigma_u' = `b', eq(2) scalar
mleval `ln_sigma_e' = `b', eq(3) scalar
scalar `sigma_u' = exp(`ln_sigma_u')
scalar `sigma_e' = exp(`ln_sigma_e')

** hack! pass the panel variable in through a global, since ml has no option for it
sort $panel
qui {
gen double `z' = $ML_y1 - `xb'
by $panel: gen `T' = _N
gen double `a' = (`sigma_u'^2) / (`T'*(`sigma_u'^2) + `sigma_e'^2)
by $panel: egen double `S_z2' = sum(`z'^2)
by $panel: egen double `S_temp' = sum(`z')
by $panel: gen double `Sz_2' = `S_temp'^2
by $panel: gen `first' = (_n == 1)
* sum one log-likelihood term per panel (the likelihood is per-group, not per-obs)
mlsum `lnf' = -.5 * ///
( (`S_z2' - `a'*`Sz_2')/(`sigma_e'^2) + ///
log(`T'*`sigma_u'^2/`sigma_e'^2 + 1) + ///
`T'*log(2* _pi * `sigma_e'^2) ///
) if `first' == 1
}
end
Random effects in ML

clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen id = 1 + floor((_n - 1)/10)
bys id: gen fe = invnormal(uniform())
bys id: replace fe = fe[1]
gen y = x + fe + invnormal(uniform())
global panel = "id"
ml model d0 myrereg_d0 (y = x) /ln_sigma_u /ln_sigma_e
ml search
ml maximize
xtreg y x, i(id) mle
“my” MLE RE vs. xtreg, mle

[estimation output from the two commands omitted]

Point estimates are identical but the standard errors are
different; why?
Exercises
(A) Implement logit as a simple (i.e. “lf”) ML
estimator using Stata’s ML language; one possible
lf solution is sketched after exercise (B).
(If you have extra time, implement it as a d2
estimator, calculating the gradient and Hessian
analytically.)

(B) Implement the conditional logit maximum likelihood
estimator using Stata’s ML language.
(NOTE: This is HARD; see the hints on-line and the
likelihood on the next slide.)
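A minimal sketch of one possible answer to (A), assuming simulated data like the probit examples above; invlogit(z) is Stata’s built-in logistic CDF e^z/(1 + e^z):

program drop _all
program mylogit_lf
args lnf xb
qui replace `lnf' = ln(invlogit( `xb')) if $ML_y1 == 1
qui replace `lnf' = ln(invlogit(-`xb')) if $ML_y1 == 0
end

clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))   // binary outcome
ml model lf mylogit_lf (y = x)
ml maximize
logit y x    // compare against the built-in command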
conditional logit

For group i with outcomes y_{i1}, \ldots, y_{iT_i}, condition on the number of ones in the group; the contribution is

P\!\left(y_{i1},\ldots,y_{iT_i} \,\middle|\, \sum_t y_{it}\right) = \frac{\exp\left(\sum_t y_{it}\, x_{it}'\beta\right)}{\sum_{d \in S_i} \exp\left(\sum_t d_t\, x_{it}'\beta\right)}

where S_i is the set of all 0/1 vectors of length T_i with \sum_t d_t = \sum_t y_{it}.
