Making Regression Tables From Stored Estimates
1 Introduction
Statistical packages are usually very good at fitting all kinds of regression models, but
they are rather poor at keeping the results for those models organized or processing
them for publication. This is a real problem because gathering the relevant figures by
hand from the large amount of statistical output that is usually produced and arranging
the results in clear and presentable tables can be very inefficient and error-prone.
Furthermore, results must often be processed repeatedly, for example, when
operationalizations are modified or mistakes are detected. To reduce transcription
errors and avoid having to repeat the laborious tasks by hand, it makes sense to
automate the processing of results as much as possible.
Fortunately, Stata provides the basis for such an automation. One of the great
features in Stata is that, after an estimation command has been carried out, all the
relevant results are not only displayed onscreen but are returned in places where they can
be accessed by the user. This storage of results provides the user with the opportunity
to further process the results in a more-or-less automated manner. Furthermore, Stata 8
saw the introduction of the estimates command (see [R] estimates), which facilitates
the handling of the estimation results for multiple models. More specifically, results from
up to 20 models can be stored at a time. Stata also provides a utility for compiling a
table of the coefficients for all stored models called estimates table. Although the
estimates table command is rather limited and cannot be used to translate the table
© 2005 StataCorp LP st0085
B. Jann 289
to spreadsheet formats or LaTeX code, it does a good job assembling a raw matrix of
models and parameters that can be used as a starting point for creating a complex and
well-formatted regression table.
In the remainder of this paper, I will present the new estout package, a program
that makes use of the possibilities provided by Stata and produces regression tables
in what I believe is a very flexible and functional way. Other user-written programs
that produce tables from regression results are also available. John Luke Gallup’s
outreg is probably the most widely used package of this kind (Gallup 1998, 1999, 2000,
2001). Among the other packages are outtex by Antoine Terracol, est2tex by Marc
Muendler, and mktab by Nicholas Winter. Also see Newson (2003) for a very appealing
approach. However, estout represents a good compromise between functionality and
usability.
where namelist is a list of the names of stored estimates (the namelist can be entered
as * to refer to all stored estimates). The cells() and stats() options determine the
primary contents of the table. The style() option determines the basic formatting of
the table.
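As a sketch of how the three options combine, and assuming some models have already been stored with estimates store, a call might look like this:

. estout *, cells(b se) stats(r2 N) style(fixed)

Here cells(b se) requests coefficients and standard errors, stats(r2 N) appends the R-squared and the number of cases, and style(fixed) aligns the columns for the Results window.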
Basic usage
The procedure for using estout is to first store several models using the estimates
store command and then apply estout to save or display a table of the estimates. By
default, estout produces a plain, tab-separated table of the coefficients of the models
indicated by the command:
. sysuse auto
(1978 Automobile Data)
. replace price = price/1000
price was int now float
(74 real changes made)
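The tables that follow contain two models named m1 and m2 with the regressors weight, mpg, forXmpg, and foreign, so the setup might continue roughly as sketched below. The rescaling of weight and the exact model specifications are assumptions inferred from those tables; example.txt is the file name mentioned in the text.

. replace weight = weight/1000
. quietly regress price weight mpg
. estimates store m1
. generate forXmpg = foreign*mpg
. quietly regress price weight mpg forXmpg foreign
. estimates store m2
. estout * using example.txt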
290 Making regression tables
The table produced by the estout command looks messy in the Stata Results window
or the Stata log because the columns are tab-separated (note that tab characters are not
preserved in the Results window or the log). However, the stored example.txt would
look better if it were opened, for example, in a spreadsheet program.
Choosing a style
To align the columns in Stata’s Results window, fixed widths can be specified for the
columns and tab characters can be removed. This is most easily done via the style()
option, which provides a style called fixed:
. estout *, style(fixed)
                      m1          m2
                       b           b
weight          1.746559    4.613589
mpg            -.0495122    .2631875
forXmpg                    -.3072165
foreign                     11.24033
_cons           1.946068   -14.44958
Other predefined styles are tab (the default), tex, and html, but it is also possible to
define one’s own styles (see appendix 4.3). The tex style, for example, modifies the
output table for use with LaTeX’s tabular environment:
. estout *, style(tex) varlabels(_cons \_cons)
& m1& m2\\
& b& b\\
weight & 1.746559& 4.613589\\
mpg & -.0495122& .2631875\\
forXmpg & & -.3072165\\
foreign & & 11.24033\\
\_cons & 1.946068& -14.44958\\
Note that _cons has been replaced by its LaTeX equivalent \_cons in the example above using
the varlabels() option (since the underscore character produces an error in LaTeX
unless it is preceded by a backslash). For more information on the varlabels() option,
consult estout’s online help.
Use the cells() option to specify the parameter statistics to be tabulated and how they
are to be arranged. The parameter statistics available are b (coefficients, the default), se
(standard errors), t (t/z statistics), p (p-values), ci (confidence intervals; to display the
lower and upper bounds in separate cells, use ci_l and ci_u), as well as any additional
parameter statistics included in the e()-returns for the models (also see section 3.7).
For example, cells(b se) reports raw coefficients and standard errors:
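A minimal sketch of such a call, assuming the models m1 and m2 from the basic-usage example are still stored (output omitted):

. estout *, cells(b se) style(fixed)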
Multiple statistics are placed in separate rows beneath one another by default, as in
the example above. However, elements that are listed in quotes are placed beside one
another. For example, specifying cells("b se t p") produces the following table:
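Assuming the same stored models, the side-by-side layout would be requested along these lines (output omitted):

. estout *, cells("b se t p") style(fixed)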
The two approaches can be combined. For example, cells("b p" se) would produce
a table with raw coefficients and standard errors beneath one another in the first
column and p-values in the top row of the second column for each model.
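A sketch of the combined layout, again assuming stored models m1 and m2. The quoted elements share a row, and se starts a new row beneath b (output omitted):

. estout *, cells("b p" se) style(fixed)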
Note that for each statistic named in the cells() option, a set of suboptions may
be specified in parentheses. For example, in social sciences, it is common to report
standard errors or t statistics in parentheses beneath the coefficients and to indicate the
significance of individual coefficients with stars. Furthermore, the results are rounded.
Such a table can be created using the following procedure:
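One possible call is sketched here, assuming that the star suboption attaches significance stars, that par wraps a statistic in parentheses, and that fmt() sets the display format. The fmt() suboption is not discussed in the text above, so consult estout's online help before relying on it.

. estout *, cells(b(star fmt(%9.3f)) se(par fmt(%9.2f))) style(fixed)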
The estout default is to display * for p < .05, ** for p < .01, and *** for p < .001.
However, note that the significance thresholds and symbols are fully customizable (see
the starlevels option in appendix 4.1).
Finally, use the stats() option to specify scalar statistics to be displayed in the last
rows of each model’s table. The available scalar statistics are aic (Akaike’s information
criterion), bic (Schwarz’s information criterion), rank (the rank of e(V), i.e., the number
of free parameters in the model), p (the p-value of the model), as well as any scalar
contained in the e()-returns for the models (also see section 3.7). For example, specify
stats(r2 bic N) to add the R-squared, BIC, and the number of cases to the bottom
of the table:
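A sketch of such a call, assuming the stored models m1 and m2 are linear regressions that provide e(r2) and e(N) (output omitted):

. estout *, cells(b se(par)) stats(r2 bic N) style(fixed)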
3 Advanced applications
The estout package has many features, and it is beyond the scope of this text to provide
examples for all of these options. The following presentation is therefore restricted to a
few selected examples illustrating the spectrum of estout’s capabilities and introducing
some of its less-obvious applications.
in the LaTeX document for this article after having run the following command:
Note that most of the options in the above command could also have been provided
via defaults files (see appendix 4.3). Working with defaults files can be very efficient if
you want to produce a large number of similar tables.
Furthermore, the parameter statistics reported for the various models can be specified
using the pattern() suboption within the cells() option (for example, it is possible
to print the t statistics for, say, the second model only; an example can be found in
section 3.6).
Note that in the example the models’ overall significance is denoted by stars attached
to values of the adjusted R-squared (both models are significant at the 0.001 level).
In the case of the multiple-equation models reg3, sureg, and mvreg, summary statistics
for all the model’s equations will be printed in separate columns in the same row. For
all other models, the summary statistics will be placed in the first column.
. generate record = 0
. replace record = 1 if rep > 3
(34 real changes made)
. logit foreign mpg record
(output omitted )
. estimates store raw
. mfx
(output omitted )
. estimates store mfx
. estout raw mfx, cells("b Xmfx_X(pattern(0 1))" se(par)) margin legend
> style(fixed)
                     raw         mfx
                    b/se        b/se      Xmfx_X
mpg             .1079219    .0184528     21.2973
              (.0565077)  (.0101674)
record (d)      2.435068    .4271707    .4594595
              (.7128444)  (.1043178)
_cons          -4.689347
              (1.326547)
(d) marginals for discrete change of dummy variable from 0 to 1
With single-equation models, the incorporation of results from mfx in the table
is straightforward. However, matters become more complicated for multiple-equation
models. Marginal effects have nothing to do with the equations per se, so it is not clear
where to report the mfx results if some variables appear in several different equations.
The default in estout is to print the mfx coefficients in each row that relate to the
variable in question. This default can be changed with the meqs() option, which specifies
that the mfx results be printed only in select equations. For example, proceed as follows
to report the marginal effects for the probability of only the main outcome in heckprob:
. set seed 6630
. generate u = uniform() > 0.5
. heckprob u headroom, select(foreign = turn headroom) nolog
(output omitted )
. estimates store raw
. mfx
(output omitted )
. estimates store mfx
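A sketch of the corresponding call, assuming that meqs() accepts the name of the equation in which the marginal effects should appear (here u, the outcome equation). The exact argument format is an assumption, so check estout's online help.

. estout raw mfx, cells(b se(par)) margin meqs(u) style(fixed)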
Taking the additional step of inserting the marginal effects for the selection probability
in the example above is rather involved because the marginal effects for the two
functions must be saved in different models. The solution is to print only the main
equation in a first estout call and then append the rest of the table in a second call:
. mfx, predict(psel)
(output omitted )
. estimates store mfx2
. tempfile foo
. estout raw mfx using "`foo'", cells(b se(par)) margin keep(u:)
> style(fixed) notype
. estout raw mfx2 using "`foo'", cells(b se(par)) margin
> keep(foreign:) mlabels(, none) collabels(, none)
> style(fixed) notype append
. type "`foo'"
                     raw         mfx
                    b/se        b/se
u
headroom       -1.003445   -.2843565
              (.6077779)  (.2326952)
_cons           2.176479
              (1.923797)
foreign
turn           -.2954961    -.068597
              (.0675027)  (.0158482)
headroom       -.1261772    -.029291
              (.2919013)  (.0665186)
_cons           11.05306
              (2.479492)
may be used, for example, to add standardized coefficients or the means and standard
deviations of the regressors to the e()-returns for the stored models. However, estadd’s
basic capabilities can be extended by writing subroutines to allow for additional statistics.
The basic syntax of estadd is
estadd [namelist], stats(statslist)
where namelist is again a list of stored estimates (if namelist is empty, estadd will be
applied to the current estimates). Use stats() to specify the statistics to be added to
the e()-returns of the indicated models. For more details, see estadd’s online help.
Table of descriptives
estadd is equipped with a few predefined statistics, such as beta (standardized coefficients),
mean (means of regressors), and sd (standard deviations of regressors).¹ The
latter can be used, for example, to produce a table of descriptives for the variables in
the models in our examples:
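A sketch of how such a table might be requested, assuming model m2 from the earlier examples is still stored and that the added statistics can be referenced in cells() under the names mean and sd (output omitted):

. estadd m2, stats(mean sd)
. estout m2, cells("mean sd") style(fixed)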
Writing new estadd subroutines to add user-defined statistics is not overly complicated,
as we will illustrate below. In general, a new subroutine should be called estadd_mystat.
mystat will be available to the stats() option of the estadd command after the program
code has been executed or the subroutine file has been saved as estadd_mystat.ado
in either the current directory or somewhere else in the ado path ([P] sysdir). The
subroutine will be called once for each model with the model’s estimates restored. The
e()-returns for the model in question may therefore be used to calculate new statistics.
Within a subroutine, use the ereturn command ([P] ereturn) to append new statistics
to the existing e()-returns. New summary statistics should be returned as scalars
using the ereturn scalar command, whereas new parameter statistics (e.g., transformations
of the regression coefficients) should be returned as matrices (row vectors, to
be precise) using the ereturn matrix command. Note that the columns of the added
matrices should be named according to the column names of the coefficients vector e(b) in
order to ensure estout’s ability to tabulate the new parameter statistics. Use the examples
below or the estadd_beta, estadd_mean, and estadd_sd subroutines, which are
supplied with the estadd package, as a starting point for programming new routines.
1 More functions are provided by the estadd_plus package (available from the SSC archive).
To report the Cox and Snell pseudo-R-squared, for example, define the estadd
subroutine
program estadd_coxsnell, eclass
    // Cox and Snell pseudo-R-squared, 1 - (L0/L1)^(2/N), computed from the log likelihoods
    ereturn scalar coxsnell = 1 - exp(e(ll_0)-e(ll))^(2/e(N))
end
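Once the program has been defined, it might be used as sketched below. The name logit1 is a hypothetical label for a stored logit model, and record is the dummy variable generated in the marginal-effects example. The model must provide e(ll_0), e(ll), and e(N), as logit does.

. quietly logit foreign mpg record
. estimates store logit1
. estadd logit1, stats(coxsnell)
. estout logit1, stats(coxsnell N)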
New parameter statistics can be added in a similar manner. For example, the following
lines of code comprise a subroutine to insert the standardized factor change
coefficients, or exp(β_j S_j), where S_j is the standard deviation of regressor j, which are
sometimes reported for logistic regression (see Long 1997):
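The program below is a sketch of what such a subroutine could look like; it is not the author's code. It assumes a single-equation model, assumes that the estimation data are still in memory, and stores the result under the hypothetical name e(ebsd). Coefficients without a matching variable, such as _cons, are set to missing.

program estadd_ebsd, eclass
    // sketch: exp(b_j * s_j) for each coefficient in e(b)
    tempname b ebsd
    matrix `b' = e(b)
    matrix `ebsd' = `b'                  // copy, so the column names match e(b)
    local names : colnames `b'
    local j = 0
    foreach v of local names {
        local ++j
        matrix `ebsd'[1,`j'] = .         // default: missing (e.g., for _cons)
        if "`v'" != "_cons" {
            capture summarize `v' if e(sample)
            if _rc == 0 {
                matrix `ebsd'[1,`j'] = exp(`b'[1,`j'] * r(sd))
            }
        }
    }
    ereturn matrix ebsd = `ebsd'
end

Copying e(b) before overwriting its elements keeps the column names aligned with the coefficients, which is what estout needs in order to tabulate the new statistic.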
2 Also see the eret2 package (available from the SSC Archive). The eret2 command allows you to
add statistics to the e()-returns of a model without having to program subroutines. However, eret2
can be applied only to the currently active estimates.
If the program is saved in the ado path as estadd_ebsd.ado, it can, for example, be
called as follows:
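A sketch of such a call, assuming a logit model stored under the hypothetical name logit1 and assuming that the subroutine returns its results in a matrix called e(ebsd) (output omitted):

. estadd logit1, stats(ebsd)
. estout logit1, cells("b ebsd") style(fixed)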
4 Appendix
4.1 Full syntax of estout
estout [namelist] [using filename] [, parameter_statistics_options
summary_statistics_option significance_stars_options layout_options
labeling_options output_options defaults_option]
where namelist is either _all or * or name [name ...], and name is the name of
stored estimates. The results estimated last may be indicated by a period (.), even if
they have not yet been stored. For a detailed discussion of estout’s options, see the
online help. A brief list of the options is provided below.
where array is
row [row ...]
row is
` " element [element ...] " '
element is
el[(el_subopts)]
where levelslist is
symbol # [symbol # ...]
@span Returns the value of a count variable for the total number of physical
columns of the table if used in the labels in the blist() and elist()
suboptions of varlabels() or in the text specified in prehead(),
posthead(), prefoot(), or postfoot().
@span Returns the number of spanned columns if used in the text specified
in the prefix() and suffix() suboptions of mgroups(), mlabels(),
eqlabels(), or collabels(), or in the labels specified in these options.
@span Returns the range of spanned columns (e.g., 2-4 if columns 2, 3, and 4
are spanned) if used in the text specified in the erepeat() suboption
of mgroups(), mlabels(), eqlabels(), or collabels().
@M Returns the number of models in the table if used in the text specified
in prehead(), posthead(), prefoot(), or postfoot().
@title Returns the title specified with the title() option if used in the text
specified in prehead(), posthead(), prefoot(), or postfoot().
@discrete Returns the explanations provided by the discrete() option (if the
margin option is activated) if used in the text specified in prehead(),
posthead(), prefoot(), or postfoot().
@starlegend Returns a legend explaining the significance symbols if used in the text
specified in prehead(), posthead(), prefoot(), or postfoot().
settings      styles
              tab       fixed     tex       html
begin                                       <tr><td>
delimiter     tab       " "       &         </td><td>
end                               \\        </td></tr>
varwidth      0         12        12        12
modelwidth    0         12        12        12
abbrev        off       on        off       off
to open one of the existing defaults files (where style is the name of the defaults set,
e.g., tab; the estoutdef command is provided with the estout package), make the
desired modifications, and save the file as estout_newstyle.def in the current directory
or elsewhere in the ado path (see [P] sysdir). To use the new option settings, type
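Assuming the file was saved as estout_newstyle.def, the new settings would presumably be selected through the style() option, as in this sketch:

. estout *, style(newstyle)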
estout has two main types of options, which are treated differently in defaults files.
On the one hand, there are simple on/off options without arguments, such as legend
or showtabs. To turn such an option on, enter the option followed by the option’s name
as an argument; i.e., add the line
option option
to the defaults file. For example,
legend legend
specifies that a legend be printed in the table footer. Otherwise, if you want to turn
the option off, just delete or comment out the line that contains it (or specify option
without an argument).
To temporarily turn off an option that has been activated in a defaults file, specify
nooption in the command line (do not, however, use nooption in defaults files). For
example, if the legend has been turned on in the defaults file, but you want to suppress
it in a specific call of estout, type
. estout ..., nolegend
would be referred to as
statslabelsprefix args
in the defaults file. The cells() option represents an exception to this rule. It may be
defined in the defaults file using only a simple array of cells elements without suboptions,
e.g.
cells "b se" p
However, the suboptions of the cells elements may be referred to as el_suboption, for
example
b_star star
or
se_par [ ]
Be aware that the support for comments in defaults files is limited. In particular,
the /* and */ comment indicators cannot be used. The other comment indicators work
(more or less) as usual; that is,
• Empty lines and lines beginning with * (with or without preceding blanks) will
be ignored.
• // preceded by one or more blanks indicates that the rest of the line should
be ignored. Lines beginning with // (with or without preceding blanks) will be
ignored.
• /// preceded by one or more blanks indicates that the rest of the line should be
ignored and the part of the line preceding it should be added to the next line. In
other words, /// can be used to split commands into two or more lines of code.
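Putting these rules together, a small defaults file might look like the sketch below. The file name estout_newstyle.def and the particular settings are hypothetical; each line uses only constructs described in this section.

* estout_newstyle.def: a hypothetical defaults set
cells "b se" p          // coefficients and standard errors side by side, p-values beneath
legend legend           // turn the legend on

// this line is ignored entirely
varwidth 12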
5 Acknowledgments
Some of the estout code has been adapted from the official est_table.ado. I would
like to thank Kit Baum, Elisabeth Coutts, Henriette Engelhardt, Jonathan Gardner,
Friedrich Huebler, Maren Kandulla, Clive Nicholas, Fredrik Wallenberg, Ian Watson,
and Vince Wiggins for their comments and suggestions.
6 References
Gallup, J. L. 1998. sg97: Formatting regression output for published tables. Stata
Technical Bulletin 46: 28–30. In Stata Technical Bulletin Reprints, vol. 8, 200–202.
College Station, TX: Stata Press.
—. 1999. sg97.1: Revision of outreg. Stata Technical Bulletin 49: 23. In Stata Technical
Bulletin Reprints, vol. 9, 170–171. College Station, TX: Stata Press.
—. 2000. sg97.2: Update to formatting regression output. Stata Technical Bulletin 58:
9–13. In Stata Technical Bulletin Reprints, vol. 10, 137–143. College Station, TX:
Stata Press.
—. 2001. sg97.3: Update to formatting regression output. Stata Technical Bulletin 59:
23. In Stata Technical Bulletin Reprints, vol. 10, 143. College Station, TX: Stata
Press.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage.
Newson, R. 2003. Confidence intervals and p-values for delivery to the end user. Stata
Journal 3(3): 245–269.