Lecture 3
1/12/2009-1/16/2009
Melissa Dell
Matt Notowidigdo
Paul Schrimpf
Lecture 3, Maximum Likelihood Estimation in Stata
Introduction to MLE
• Stata has a built-in language for writing ML estimators, and Stata uses this language to implement many of its own commands
– e.g. probit, tobit, logit, clogit, glm, xtpoisson, etc.
ML with linear regression
program drop _all
program mynormal_lf
args lnf mu sigma
qui replace `lnf' = log((1/`sigma')*normden(($ML_y1-`mu')/`sigma'))
end
clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen y = 2*x + invnormal(uniform())
ml model lf mynormal_lf (y = x) ()
ml maximize
reg y x
clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen y = 2*x + x*x*invnormal(uniform())
gen keep = (uniform() > 0.1)
gen weight = uniform()
ml model lf mynormal_lf (y = x) () [aw=weight] if keep == 1, robust
ml maximize
reg y x [aw=weight] if keep == 1, robust
What’s going on in the background?
• We just wrote a 3 (or 5) line program. What does Stata do with it?
• When we call “ml maximize”, it performs the following steps (sketched in code below):
– Initializes the parameters (the “betas”) to all zeroes
– As long as it has not declared convergence:
• Calculates the gradient at the current parameter values
• Takes a step
• Updates the parameters
• Tests for convergence (based on the gradient, the Hessian, or a combination)
– Displays the parameters as regression output (ereturn!)
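• A minimal sketch of that loop (not ml’s actual implementation): Newton-Raphson on the log likelihood of a N(mu, 1) sample, where the gradient is N*(ybar - mu) and the Hessian is -N:
* Sketch of the maximize loop above: Newton-Raphson for the mean of a N(mu,1) sample
clear
set obs 100
set seed 12345
gen y = 3 + invnormal(uniform())
quietly summarize y
local mu = 0                               // initialize the parameter to zero
local converged = 0
while !`converged' {                       // as long as convergence not declared
    local g = r(N)*(r(mean) - `mu')        // gradient at current parameter value
    local mu = `mu' + `g'/r(N)             // take a Newton step: mu - g/H, with H = -N
    if abs(`g') < 1e-8 local converged = 1 // test for convergence on the gradient
}
display "ML estimate of mu = " `mu'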
How does it calculate the gradient?
• Since we did not program a gradient, Stata calculates one numerically, using numerical derivatives.
• Review:
– The analytic derivative is the following:
f'(x) = lim (h -> 0) of [f(x+h) - f(x)] / h
– So that leads to a simple approximation formula for “suitably small but large enough h”; this is a numerical derivative of a function:
f'(x) ≈ [f(x+h) - f(x)] / h
– Stata knows how to choose a good “h” and in general it gets it right
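• To make the approximation concrete, a sketch of a forward difference for f(x) = x^2 at x = 2, where the true derivative is 4 (h is picked by hand here; ml chooses it automatically):
* Forward-difference approximation to f'(x) for f(x) = x^2 at x = 2
local x = 2
local h = 1e-6
display "numerical derivative: " ((`x' + `h')^2 - (`x')^2) / `h'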
TMTOWTDI! (There’s More Than One Way To Do It)
program drop _all
program myprobit_lf
args lnf xb
qui replace `lnf' = ///
$ML_y1*ln(norm(`xb')) + (1-$ML_y1)*ln(1 - norm(`xb'))
end
clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model lf myprobit_lf (y = x)
ml maximize
probit y x
What happens here?
program drop _all
program myprobit_lf
args lnf xb
qui replace `lnf' = ln(norm( `xb')) if $ML_y1 == 1
qui replace `lnf' = ln(norm(-1*`xb')) if $ML_y1 == 0
end
clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model lf myprobit_lf (y = x) ()
ml maximize
probit y x
Difficult likelihood functions?
• A key skill is figuring out whether the error above is a bug in your program or a sign that the likelihood function is genuinely difficult to maximize.
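• One diagnostic that helps separate the two cases: ml check runs a battery of tests on your evaluator program before maximizing and reports the problems it finds. A sketch, reusing the model above:
* ml check tests the evaluator program for errors before any maximization
ml model lf myprobit_lf (y = x) ()
ml check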
Transforming parameters
• sigma must be positive, and the optimizer does not know that; estimating log(sigma) instead makes the parameter unconstrained, since sigma = exp(log_sigma) is positive automatically
program drop _all
program mynormal_lf
args lnf mu ln_sigma
tempvar sigma
gen double `sigma' = exp(`ln_sigma')
qui replace `lnf' = log((1/`sigma')*normden(($ML_y1-`mu')/`sigma'))
end
clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen y = 2*x + 0.01*invnormal(uniform())
ml model lf mynormal_lf (y = x) /log_sigma
ml maximize
reg y x
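• Because the estimated parameter is log(sigma), you can recover sigma itself (with a delta-method standard error) after ml maximize; a sketch, assuming the equation label /log_sigma used above:
* Recover sigma = exp(log_sigma) and its delta-method standard error
nlcom (sigma: exp(_b[log_sigma:_cons]))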
From “lf” to “d0”, “d1”, and “d2”
• In some (rare) cases you will want to code the gradient (and possibly the Hessian) by hand. If there are simple analytic formulas, if you need more speed, or if the numerical derivatives are not working out well, this can be a good thing to do.
More probit (d0)
program drop _all
program myprobit_d0
args todo b lnf
tempvar xb l_j
mleval `xb' = `b'        // expand the coefficient vector into the linear index xb
qui {
gen double `l_j' = norm( `xb') if $ML_y1 == 1
replace `l_j' = norm(-1 * `xb') if $ML_y1 == 0
mlsum `lnf' = ln(`l_j')  // sum the observation-level log likelihoods into `lnf'
}
end
clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model d0 myprobit_d0 (y = x)
ml maximize
probit y x
Still more probit (d1)
program drop _all
program myprobit_d1
args todo b lnf g
tempvar xb l_j g1
mleval `xb' = `b'
qui {
gen double `l_j' = norm( `xb') if $ML_y1 == 1
replace `l_j' = norm(-1 * `xb') if $ML_y1 == 0
mlsum `lnf' = ln(`l_j')
if (`todo' == 0 | `lnf' >= .) exit
* analytic score: normden(xb)/norm(xb) if y==1, -normden(xb)/norm(-xb) if y==0
gen double `g1' = normden(`xb')/`l_j' if $ML_y1 == 1
replace `g1' = -normden(`xb')/`l_j' if $ML_y1 == 0
mlvecsum `lnf' `g' = `g1'
}
end
clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model d1 myprobit_d1 (y = x)
ml maximize
probit y x
Last probit, I promise (d2)
program drop _all
program myprobit_d2
args todo b lnf g negH
tempvar xb l_j g1
mleval `xb' = `b'
qui {
gen double `l_j' = norm( `xb') if $ML_y1 == 1
replace `l_j' = norm(-1 * `xb') if $ML_y1 == 0
mlsum `lnf' = ln(`l_j')
if (`todo' == 0 | `lnf' >= .) exit
gen double `g1' = normden(`xb')/`l_j' if $ML_y1 == 1
replace `g1' = -normden(`xb')/`l_j' if $ML_y1 == 0
mlvecsum `lnf' `g' = `g1'
if (`todo' == 1 | `lnf' >= .) exit
* analytic negative Hessian for probit: g1*(g1 + xb) times x*x'
mlmatsum `lnf' `negH' = `g1'*(`g1' + `xb')
}
end
clear
set obs 1000
set seed 12345
gen x = invnormal(uniform())
gen y = (0.5 + 0.5*x > invnormal(uniform()))
ml model d2 myprobit_d2 (y = x)
ml search
ml maximize
probit y x
Beyond linear-form likelihood fn’s
• Many ML estimators I write down do NOT satisfy the linear-form restriction, but OFTEN they have a simple panel structure (e.g. think of any “xt*” command in Stata that is implemented in ML)
Random effects in ML
program drop _all
program define myrereg_d0
args todo b lnf
tempvar xb z T S_z2 Sz_2 S_temp a first
tempname sigma_u sigma_e ln_sigma_u ln_sigma_e
mleval `xb' = `b', eq(1)
mleval `ln_sigma_u' = `b', eq(2) scalar
mleval `ln_sigma_e' = `b', eq(3) scalar
scalar `sigma_u' = exp(`ln_sigma_u')
scalar `sigma_e' = exp(`ln_sigma_e')
** hack! -- the panel variable is passed in through the global $panel, and we sort inside the evaluator
sort $panel
qui {
gen double `z' = $ML_y1 - `xb'
by $panel: gen `T' = _N
gen double `a' = (`sigma_u'^2) / (`T'*(`sigma_u'^2) + `sigma_e'^2)
by $panel: egen double `S_z2' = sum(`z'^2)
by $panel: egen double `S_temp' = sum(`z')
by $panel: gen double `Sz_2' = `S_temp'^2
by $panel: gen `first' = (_n == 1)
mlsum `lnf' = -.5 * ///
( (`S_z2' - `a'*`Sz_2')/(`sigma_e'^2) + ///
log(`T'*`sigma_u'^2/`sigma_e'^2 + 1) + ///
`T'*log(2* _pi * `sigma_e'^2) ///
) if `first' == 1
}
end
Random effects in ML
clear
set obs 100
set seed 12345
gen x = invnormal(uniform())
gen id = 1 + floor((_n - 1)/10)
bys id: gen fe = invnormal(uniform())
bys id: replace fe = fe[1]
gen y = x + fe + invnormal(uniform())
global panel = "id"
ml model d0 myrereg_d0 (y = x) /ln_sigma_u /ln_sigma_e
ml search
ml maximize
xtreg y x, i(id) mle
“my” MLE RE vs. XTREG, MLE
Point estimates are identical but the standard errors are different; why?
Exercises
(A) Implement logit as a simple (i.e. “lf”) ML estimator using Stata’s ML language
(If you have extra time, implement it as a d2 estimator, calculating the gradient and Hessian analytically)