Maximum Likelihood Programming in Stata

Marco Steenbergen

August 2003
Contents

1 Introduction
2 Features
3 Syntactic Structure
3.1 Program Instructions
3.2 Ml Model
3.2.1 Equations
3.2.2 Additional Examples
3.2.3 Options
3.3 Ml Check
3.4 Ml Search
3.5 Ml Maximize
3.6 Monitoring Convergence and Ml Graph
4 Output
9 References
1 Introduction
Maximum likelihood-based methods are now so common that most statistical
software packages have “canned” routines for many of those methods. Thus, it
is rare that you will have to program a maximum likelihood estimator yourself.
However, if this need arises (for example, because you are developing a new
method or want to modify an existing one), then Stata offers a user-friendly
and flexible programming language for maximum likelihood estimation (MLE).
In this document, I describe the basic syntax elements that allow you to
write and execute MLE routines in Stata (Versions 7 and 8). I do not intend to
offer an in-depth discussion of all of Stata’s many features—for this you should
consult Gould and Sribney (1999) and Gould, Pitblado, and Sribney (2003).
My objective is merely to provide you with enough tools that you can write a
simple MLE program and implement it.
2 Features
Stata has many nice features, including: (1) quick convergence (under most
circumstances) using the Newton-Raphson algorithm (Stata 8 also offers quasi-
Newton algorithms); (2) a conservative approach to declaring convergence, which
leads to more trustworthy estimates; (3) simplifying features that allow
implementing MLE with a minimum of calculus; (4) robust variance estimation;
(5) Wald and likelihood ratio test procedures; (6) a search routine that chooses
improved starting values; (7) estimation under linear constraints; and (8) post-
estimation commands. These features make Stata one of the easiest MLE
programs to work with.
3 Syntactic Structure
Programming and executing MLE routines in Stata requires a specific sequence
of commands. These may be part of an ado file, or they can be entered
interactively. The following shows the sequence of commands and explains their
meaning. Optional commands are indicated by an asterisk.
1. Program instructions: The program specifies the parameters and log-
likelihood function. This is done in general terms, so that the commands
can be used in any application where they are relevant. (The program
may be kept in a separate ado file.)
2. ml model:1 This command specifies the model that is to be estimated
(i.e., dependent variable and predictors), as well as the MLE program
that should be run and the way in which it should be run. This command
is application-specific: it specifies the model in terms of the particular set
of variables that is loaded into memory.
1 In this document, I indicate Stata commands in print type.
3. ml check*: This command checks the program syntax for mistakes. While
optional, it is extremely useful for debugging MLE routines. Beginning
programmers are advised to use this command.
4. ml search*: This optional command causes Stata to search for better
starting values for the numerical optimization algorithm.
5. ml maximize: This command starts the execution of the estimation
commands and generates the output.
6. ml graph*: This is an optional command that produces a graph showing
the iteration path of the numerical optimization algorithm. I recommend
using this command so that one can monitor convergence.
3.1 Program Instructions

The first element of an MLE routine is the program itself. Throughout, I
assume that the observations are independent, so that the log-likelihood
function satisfies the linear form restriction.2 A program is delimited by the
keywords

        program define name
                ...
        end
In between these keywords, the user has to declare the parameters and the
log-likelihood function. First, the log-likelihood function and its parameters
have to be labeled. This is done through the command args (which is an
abbreviation for the computer term “arguments”). Next, the log-likelihood
function has to be defined; this is done using the quietly replace command.3
In addition to these specifications, it is often useful to declare the program
version, especially if you are planning to make changes to the program over
time.
2 That is, the maximization method is lf (see the discussion of ml model below). Please
note that lf is the easiest approach in Stata but not always the most accurate. However, in
my programming experience I have never encountered an instance in which the results from
lf were misleading.
3 “Replace” indicates that the user is substituting a new expression. “Quietly” implies that
Stata does not echo this substitution—i.e., it is not displayed on the screen or in the output.
Example 1: To show the use of these commands, consider the simple example
of the Poisson distribution:

        f(y \mid \mu) = \frac{\mu^y e^{-\mu}}{y!}

Here \mu is the parameter that we want to estimate. For a sample of n
independent observations, this distribution produces the following
log-likelihood function:

        l(\mu \mid y_1, y_2, \ldots, y_n) = \sum_i y_i \ln(\mu) - n\mu - \sum_i \ln(y_i!)

This function can be programmed as follows:

        program define poisson
                version 1.0
                args lnf mu
                quietly replace `lnf' = $ML_y1*ln(`mu') - `mu' - lnfact($ML_y1)
        end
Let us analyze what this program does. In the first line we define the
program, calling it poisson. In the second line, we show the version of the
program (version 1.0). The third line provides a name for the log-likelihood
function (lnf) and its one parameter (mu). The fourth line specifies the
log-likelihood function and the fifth line ends the program.
The action, of course, is in the fourth line. This line is based on the
arguments specified in args. Because we are referring to arguments, they
should be enclosed in quotes. (The opening quote is a backtick, which is
typically located on the same key as the tilde; the closing quote is a
straight apostrophe, which is typically located on the same key as the double
quotation mark.) The fourth line also contains the variable $ML_y1, which is
the internal label for the (first) dependent variable. Stata will replace
this with an appropriate variable from the data set after the ml model
command has been specified.4 Finally, the fourth line specifies a function.
(The last term in this expression, lnfact($ML_y1), stands for ln(y!).)
A careful inspection of the fourth line of code shows that it looks a lot like
the log-likelihood function, except that it does not include summations. In fact,
this line gives the log-likelihood function for a single observation:
        l(\mu \mid y_i) = y_i \ln(\mu) - \mu - \ln(y_i!)
As long as the observations are independent (i.e., the linear form restriction on
the log-likelihood function is met), this is all you have to specify. Stata knows
that it should evaluate this function for each observation in the data and then
sum the results. This greatly simplifies programming log-likelihood functions.5
4 By not referring to a specific variable name, the program can be used for any data set.
This is quite useful, as you do not have to go back into the program to change variable names.
5 Keep in mind, however, that this will only work if the observations are independent, so
that the linear form restriction on the log-likelihood function is met.
Example 2: As a second example, consider the normal probability density
function:

        f(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{y - \mu}{\sigma} \right)^2 \right\} = \frac{1}{\sigma} \phi(z)

where z = (y - \mu)/\sigma and \phi(\cdot) denotes the standard normal
density function.6 If we draw a sample of n independent observations from the
normal distribution, then the log-likelihood function is given by

        l(\mu, \sigma^2 \mid y_1, y_2, \ldots, y_n) = -n \ln(\sigma) + \sum_i \ln[\phi(z_i)]
3.2 Ml Model
To apply a program to a data set, you need to issue the ml model command.
This command also controls the method of maximization that is used, but I will
assume that this method is lf—i.e., the linear form restrictions hold and the
derivatives are obtained numerically.7
The syntax for ml model is:
ml model lf name equations [if] [, options]
Here ml model may be abbreviated to ml mod, name is the name of the MLE
program (e.g., poisson), and equations specifies the model that should be
estimated through the program. A subset of the data may be selected through the
if statement. It is also possible to specify various options, as will be discussed
below.
3.2.1 Equations
To perform MLE, Stata needs to know the model that you want to estimate.
That is, it needs to know the dependent and, if relevant, the predictor variables.
These variables are declared by specifying one or more equations. The user
can specify these equations before running ml model by using an alias. It is
also possible to specify the equations in the ml model command, placing each
equation in parentheses.
The general rule in Stata is that a separate equation is specified for each mean
model and for each (co)variance model. For example, if we wanted to estimate
the mean and variance of a normal distribution, we would need an equation for
the mean and an equation for the variance. In a linear regression model, we
would need an equation for the conditional mean of y (i.e., E[y_i | x_i b]) and for
the variance (the latter model would include only a constant, unless we specify
a heteroskedastic regression model). In a simultaneous equations model, there
would be as many equations as endogenous variables, plus additional equations
to specify the covariance structure.
7 If the linear form restrictions do not hold, then the user may choose from three other
maximization methods: d0, d1, and d2. The difference between these methods lies in the way
in which the first and second (partial) derivatives are obtained. Both the first and second
derivatives are obtained numerically with d0. With d1, the user has to derive the first
derivative analytically (i.e., through calculus), while the second derivative is obtained
numerically. With d2, both derivatives are obtained analytically; this is generally the most
accurate approach. You should note that the use of d0, d1, and d2 necessitates additional
lines of code in the program to specify the log-likelihood function more completely and, if
necessary, to specify the first and second derivatives (see Gould and Sribney 1999; Gould,
Pitblado, and Sribney 2003).
Example 1: Suppose that we have defined an equation under the name or alias
"mean" that specifies the model for µ in the Poisson distribution. The equation
specifies the dependent variable on the left-hand side; this is the name of a
variable in the data set. The right-hand side is empty because there are no
predictors of µ (other than the constant).8
To estimate this model we type:
ml model lf poisson mean
Here lf is the maximization method, poisson is the name of the maximum
likelihood program, and mean is the alias for the equation specifying the mean
model. The alias will appear in the output and can make it easier to read.
3.2.2 Additional Examples

Example 2: Now imagine that the normal density describes the conditional
distribution of Y given two predictors, X and Z. We assume that the conditional
variance of Y is constant and given by σ 2 . We also assume that the conditional
mean of Y is given by β0 + β1 X + β2 Z. In other words, we are considering the
classical linear regression model under the assumption of normality. To estimate
this model, one should issue the following command:
ml model lf normal (Y=X Z) (Y=)
The first equation calls for the estimation of the conditional mean of Y, which
is a function of the predictors X and Z. The second equation pertains to the
estimation of σ, which is constant so that no predictors are specified.
One sees that the estimation of a normal regression model requires no
additional programming compared to the estimation of the mean and variance of
a normal distribution. This minimizes the burden of programming and gives
MLE routines a great deal of "portability." Both of these features are important
benefits of Stata.

8 If there are predictor variables, these should be specified after the equal sign.
9 The direct and aliasing methods may be combined. For details see Gould and Sribney
(1999) and Gould, Pitblado, and Sribney (2003).
3.2.3 Options
There are two options that can be specified with ml model, both of which
produce robust variance estimates (also known as Huber-White or sandwich estimates).
(1) robust generates heteroskedasticity-corrected standard errors.
(This may be abbreviated as rob.)
(2) cluster(varname) generates cluster-corrected standard errors,
where varname is the name of the clustering variable. (This may be
abbreviated as cl.)
Both of these commands may be specified with the lf maximization method.10
A discussion of robust variance estimation can be found in Gould and Sribney
(1999), Gould, Pitblado, and Sribney (2003), and in advanced econometrics
textbooks (Davidson and MacKinnon 1993; Greene 2000).
3.3 Ml Check
It is useful to check an MLE program for errors. In Stata, you can do this by
issuing the command ml check. This command evaluates whether the program can
compute the log-likelihood function and its first and second derivatives. If there
is a problem with the log-likelihood function, or with its derivatives, ml check
will let the user know. Stata will not be able to estimate the model before these
problems are fixed.
3.4 Ml Search
The Newton-Raphson algorithm needs an initial guess of the parameter
estimates to begin the iterations. These initial guesses are the so-called starting
values. In Stata, the user has two options: (1) use a default procedure for
starting values or (2) do a more extensive search.
The default procedure in Stata is to set the initial values to 0. If the log-
likelihood function cannot be evaluated for this choice of starting values, then
Stata uses a pseudo-random number generator to obtain the starting values. (It
will regenerate numbers until the log-likelihood function can be evaluated.) This
procedure is a quick-and-dirty way to start the Newton-Raphson algorithm.
Through ml search (which may be abbreviated as ml sea) the selection
of starting values can be improved. The ml search command searches for
starting values based on equations. A nice feature here is that the user can
specify boundaries on the starting values. For example, before estimating the
Poisson distribution, we could specify
ml search 1 3
This causes the search command to pick starting values for µ that lie between
1 and 3. If the ML estimate lies within these bounds, beginning the iterations
there can speed up estimation considerably. Thus, I recommend using the ml
search command (even if you do not specify bounds), although it can be
bypassed.

10 However, other maximization methods may not allow these options or may require
additional programming.
3.5 Ml Maximize
None of the commands discussed so far actually causes Stata to generate a table
of parameter estimates. To do this, the MLE program has to be executed and
this is done through the ml maximize command, which may be abbreviated as
ml max. You simply type

ml maximize [, options]
One useful option is difficult. On some iterations, the log-likelihood
function may not be concave. This makes it impossible to compute the
direction vector, which is
used to update estimates. The difficult option prompts Stata
to determine if the direction vector exists. If not, then the pro-
gram supplements the Newton-Raphson algorithm with a variation
on the steepest ascent method to obtain new estimates (see Gould
and Sribney 1999; Gould, Pitblado, and Sribney 2003). Specifying
difficult increases estimation time, so I do not suggest using it
by default. However, if Stata generates many warnings about non-
concavity, especially on later iterations, it may be worthwhile to
repeat estimation using this option.
4 Output
After running an MLE program, Stata will produce the following output.
Figure 1: Stata MLE Commands and Output
(1) An iteration log, showing the iterations and the value of the
log-likelihood at each iteration. (This log will not be shown if you
specified nolog as an option for the ml maximize command.)
(2) The final value of the log-likelihood function. (This is always
shown, even if you have specified the nolog option.)
(3) The number of observations on which the estimation is based.
(4) The Wald chi-square test and its (asymptotic) p-value.
(5) For each equation, the parameter estimates, their estimated
standard errors, the test statistics and their p-values, and the upper and
lower bounds of the 95% confidence intervals. If the option robust
or cluster was specified on the ml model command, then robust
standard errors are reported. The test statistic is the ratio of the
estimate to its standard error.
Figure 1 shows the output generated by the “poisson” program that we
created earlier (in Example 1 of section 3.1) as applied to some artificial data.
Following the ml max command we first see the iteration history (or iteration
log). It took the program 3 iterations to find the ML estimate for µ. Notice that
the initial iteration produced an error message because Stata started by setting
µ̂ = 0 and for this value ln(µ), which is part of the log-likelihood function, is
not defined. On the final iteration, the log-likelihood function was -20.908921;
this value is repeated underneath the iteration log. To the right of the (final)
log-likelihood we find the number of observations and the Wald chi-squared and
its associated p-value (see below). The bottom portion of the output shows
the estimate (1.5), the estimated standard error (.3872988), the z-test statistic
(3.87) and its associated p-value (0.000), and the lower and upper bounds of the
95% confidence interval (.7409092 and 2.259091, respectively).
The optional ml graph command produces the output in Figure 2 (Stata will
show this output in a separate window). The horizontal axis of this graph shows
the iteration number, and the vertical axis, labeled ln L(), gives the value of the
log-likelihood function at that iteration. The iteration path shown in Figure 2
is precisely what we would like to see: as the iterations progress, changes in
the log-likelihood function become ever smaller. (In fact, this example shows
no change from the 2nd to 3rd iteration because there is a closed form solution
for the ML estimator.)
Figure 2: Graph Produced by ml graph
Figure 3: Wald Test Examples
To perform a likelihood ratio test of nested models, take the following steps:

(1) Estimate the unconstrained model.
(2) Type

lrtest, saving(name)

where name is an arbitrary name of no more than 4 characters.
(3) Estimate the constrained model.
(4) Type lrtest to perform the likelihood ratio test.
For example, consider the regression model shown in Figure 3. The following
sequence of commands allows one to perform the likelihood ratio test for the
null hypothesis that β1 = β2 = 0.
1. ml model lf normal (Y=X1 X2 X3) (Y=)
2. ml max
3. lrtest, saving(0)
4. ml model lf normal (Y=X3) (Y=)
5. ml max
6. lrtest
The results are shown in Figure 4.
Figure 4: Likelihood Ratio Test Examples
9 References
Davidson, Russell, and James G. MacKinnon. 1993. Estimation and Inference
in Econometrics. New York: Oxford University Press.
Gould, William, and William Sribney. 1999. Maximum Likelihood Estimation
with Stata. College Station, TX: Stata Press.
Gould, William, Jeffrey Pitblado, and William Sribney. 2003. Maximum
Likelihood Estimation with Stata. 2nd ed. College Station, TX: Stata Press.
Greene, William H. 2000. Econometric Analysis. 4th ed. Upper Saddle River:
Prentice Hall.
Stata. 2001. Stata Programming Manual, Release 7. College Station, TX: Stata
Press.
13 It is also possible to switch between different algorithms. For details see Gould,
Pitblado, and Sribney (2003).
14 Stata 8 allows several simplifications. First, although the normal program that we
defined earlier will work, Stata 8 also allows a more simplified version of this program:
        program normal2
                version 8.1
                args lnf mu sigma
                quietly replace `lnf' = ln(normden($ML_y1,`mu',`sigma'))
        end
Notice that we can now write program instead of program define and that the log-likelihood
function is simpler. Second, to apply this program we can issue the following command:
ml model lf normal2 (Y=X Z) /sigma
The instruction /sigma will cause Stata to output an estimate of σ.