Proc Tabulate
Paper 171-2008
ABSTRACT
PROC TABULATE is used to build tabular reports containing descriptive statistical information, including
hierarchical relationships among variables. The code that is used to invoke PROC TABULATE is
complicated, and much of it looks quite different from other SAS procedures. Nevertheless, it is well worth
the necessary investment of time and effort for learning the intricacies and subtleties of its syntax. This
paper provides a simplified, step-by-step approach for coding PROC TABULATE.
INTRODUCTION
PROC TABULATE is used to build tabular reports containing descriptive statistical information, including
hierarchical relationships among variables. PROC TABULATE is the SAS® System’s implementation of TPL
(Table Producing Language), which was developed at the U.S. Bureau of Labor Statistics during the 1970s,
for generating tabular reports of descriptive statistics involving employment data.
PROC TABULATE is more powerful for producing tabulations than PROC FREQ, and it is a more flexible
statistical report writer than PROC MEANS. Although PROC TABULATE and PROC REPORT are both
capable of generating similar tabular reports in many situations, each of these procedures has strengths and
weaknesses. PROC TABULATE seems to be better for displaying hierarchical relationships. The syntax
used to invoke PROC TABULATE differs from that of PROC REPORT, and both are complicated.
However, it is well worth the necessary investment of time and effort for learning the intricacies and
subtleties of coding both procedures. This paper will cover the fundamentals of coding PROC TABULATE.
The examples in this paper make use of the SAS® data files, prdsale (from the SAS 9.1 sashelp data
library), and empldata (an inner join of the empinfo, jobcodes, and salary SAS® data sets, from the SAS 9.1
sample data library).
SAS Global Forum 2008 Foundations and Fundamentals
Unfortunately, a lot of the code that is used to invoke PROC TABULATE looks quite different from the code
that is used for other SAS procedures. Here is the basic syntax for coding PROC TABULATE:
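A minimal sketch of that general form follows. It assumes the sashelp.prdsale data set mentioned above; the choice of country and product as class variables and actual as the analysis variable is ours, made only for illustration.

```sas
/* General shape of a PROC TABULATE step: options on the PROC statement,
   then CLASS, VAR, and one or more TABLE statements. */
proc tabulate data=sashelp.prdsale;
   class country product;      /* classification (grouping) variables */
   var actual;                 /* analysis (numeric) variable */
   table country,              /* row dimension */
         product*actual*sum;   /* column dimension */
run;
```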
Here are some options that are frequently used in PROC TABULATE statements:
DATA=SAS-dataset-name
FORMAT=formatname
MISSING
NOSEPS
The DATA= option specifies the SAS dataset to be used.
The FORMAT= option specifies a default format for each table cell. The default format is overridden by any
format specified in a subsequent TABLE statement.
The MISSING option requests that missing values be regarded as valid levels for classification variables.
Unless the MISSING option is specified, observations with missing values for class variables will not be
included in the analysis.
The NOSEPS option removes the interior horizontal lines from the printed report.
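As an illustration, the four options might be combined as in the sketch below. The data set is sashelp.prdsale; the variables and the COMMA12. format are assumptions chosen for the example. NOSEPS affects only the traditional listing output.

```sas
proc tabulate data=sashelp.prdsale
              format=comma12.   /* default format for every table cell */
              missing           /* treat missing class values as valid levels */
              noseps;           /* drop interior horizontal lines (listing output) */
   class country product;
   var actual;
   table country, product*actual;
run;
```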
Analysis variables are numeric variables that are used to compute statistics that are reported in the body of
the table.
In a TABLE statement, the comma is a very important symbol, because it separates the dimensions of the
table.
• If two commas were specified, then the table would have three dimensions, and the order
would be pages, rows, and columns.
• If only one comma was specified, then the table would have two dimensions, and the order
would be rows, columns.
• No comma would be interpreted to mean that the table’s only dimension would be the column
dimension. The table would only have one row.
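The three cases can be sketched as follows, assuming sashelp.prdsale with year, country, and product used as class variables (our choice, for illustration only):

```sas
proc tabulate data=sashelp.prdsale;
   class year country product;
   table year, country, product;   /* two commas: pages, rows, columns */
   table country, product;         /* one comma: rows, columns */
   table product;                  /* no comma: columns only, a single row */
run;
```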
In this context, an expression can consist of variables, statistics, operators, format specifications, and label
assignments.
Because our immediate concern is only to define the table’s dimensions, we are pleased to discover that
only three operators are needed to specify the page, row, and column headings that identify the structure of
a table.
• An asterisk (*) can be used to cross the classification variables; that is, to arrange them in a
nested manner, according to the order listed (top, middle, and lower).
• A blank space is used to concatenate two classification variables (which will appear in the
table: top-to-bottom for row headings, left-to-right for column headings).
• Parentheses ( ) are used to group the elements of an expression, and to associate an adjacent
operator with each concatenated element inside the parentheses.
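A short sketch of the three operators, again assuming sashelp.prdsale and an arbitrary choice of class variables:

```sas
proc tabulate data=sashelp.prdsale;
   class country region division prodtype product;
   table country, prodtype*product;           /* crossing: product nested under prodtype */
   table country region, prodtype;            /* concatenation: country rows, then region rows */
   table country*(region division), prodtype; /* grouping: country is crossed with region
                                                 and with division */
run;
```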
Whenever you cross a variable with a keyword for a statistic, you are identifying the statistic to be applied to
that variable (which tells PROC TABULATE what type of calculation to perform). You can cross
classification variables only with the N or PCTN statistics. By default, if the TABLE statement does not
include an analysis variable or a statistic, then PROC TABULATE automatically crosses the N statistic with
the indicated class variables. Analysis variables can be crossed with any statistic. By default, if the TABLE
statement includes an analysis variable but without crossing it with any statistic, PROC TABULATE
automatically crosses it with SUM.
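These defaults can be seen side by side in the sketch below, which assumes sashelp.prdsale with actual as the analysis variable (an illustrative choice):

```sas
proc tabulate data=sashelp.prdsale;
   class country product;
   var actual;
   table country, product;               /* no statistic named: N is implied */
   table country, product*(n pctn);      /* class variables crossed with N and PCTN */
   table country, product*actual;        /* analysis variable alone: SUM is implied */
   table country, product*actual*mean;   /* analysis variable crossed with MEAN */
run;
```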
The first TABLE statement would generate a hierarchical breakdown of frequency counts in the data set,
according to values of jobcode (the rows) and the nested values of location and gender (the columns).
The second and third TABLE statements would generate a hierarchical breakdown of percentages
represented in each cell, according to values of jobcode and the nested values of location and gender.
The fourth TABLE statement would generate a hierarchical breakdown of frequency counts represented in
each cell, according to values of jobcode and the nested values of location and gender. It would include an
additional row that would represent the percentage of the total population of the data set included in each
column.
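Four TABLE statements of the kind just described might look like the sketch below. It assumes that the empldata data set is available in the WORK library with class variables named jobcode, location, and gender; the statements are reconstructed from the descriptions, so treat them as one plausible reading rather than a definitive listing.

```sas
proc tabulate data=empldata;
   class jobcode location gender;
   table jobcode, location*gender;         /* 1: frequency counts (N implied) */
   table jobcode*pctn, location*gender;    /* 2: percentages, PCTN in the row expression */
   table jobcode, location*gender*pctn;    /* 3: percentages, PCTN in the column expression */
   table jobcode pctn, location*gender;    /* 4: counts, plus an extra PCTN row */
run;
```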
The first TABLE statement would generate a hierarchical breakdown of the sum of the salary amounts
represented in each cell, according to values of jobcode and the nested values of location and gender.
The second TABLE statement would generate a hierarchical breakdown of the percentage of the total of
salary amounts represented in each cell, according to values of jobcode and the nested values of location
and gender.
The third TABLE statement would generate a hierarchical breakdown of the average salary represented in
each cell, according to values of jobcode and the nested values of location and gender.
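The three salary tables could be requested along these lines. The sketch assumes that empldata contains a numeric variable named salary in addition to jobcode, location, and gender; the statements are inferred from the descriptions above.

```sas
proc tabulate data=empldata;
   class jobcode location gender;
   var salary;
   table jobcode, location*gender*salary;          /* 1: sum of salary (SUM implied) */
   table jobcode, location*gender*salary*pctsum;   /* 2: percentage of total salary */
   table jobcode, location*gender*salary*mean;     /* 3: average salary */
run;
```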
PROC TABULATE has a universal class variable, ALL, which can be used to generate totals for any
specified class variable. Just concatenate the keyword ALL into the row or column expression of a TABLE
statement.
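For instance, the following sketch (using sashelp.prdsale, with variables chosen by us for illustration) adds totals first to the rows only, and then to both the rows and the columns:

```sas
proc tabulate data=sashelp.prdsale;
   class country product;
   var actual;
   table country all, product*actual;         /* adds a total row */
   table country all, (product all)*actual;   /* adds a total row and a total column */
run;
```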
What is the difference between the tables produced by these two TABLE statements? Here are the results
from the first TABLE statement:
And here are the results from the second TABLE statement:
Angle brackets < > are used to explicitly specify the denominator that is to be used in the calculation of
percentages.
The first TABLE statement specifies that the display should include the number of instances of values for
gender occurring with each value of jobcode, and the percentages of those numbers to the total across all
combinations of values of jobcode and gender.
The second TABLE statement specifies that the display should include the number of occurrences of values
for gender, and the percentage of that number to the total for all values of gender in each jobcode (that is, a
row-percentage).
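A sketch of two such statements, assuming empldata with class variables jobcode and gender (the statements are inferred from the descriptions, not copied from a listing):

```sas
proc tabulate data=empldata;
   class jobcode gender;
   table jobcode, gender*(n pctn);           /* 1: percent of the grand total */
   table jobcode, gender*(n pctn<gender>);   /* 2: percent of each jobcode row's total */
run;
```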
This TABLE statement specifies that the display should include breakdowns of the total salary amounts, and
the associated percentages, for each combination of values of jobcode and gender, where the percentages
are calculated in a column-wise manner.
Observe that, to obtain percentages by row, we use the column-expression in the “denominator definition”;
and to obtain percentages by column, we use the row-expression in the “denominator definition.”
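The rule can be sketched with PCTSUM as follows, assuming empldata with class variables jobcode and gender and the analysis variable salary:

```sas
proc tabulate data=empldata;
   class jobcode gender;
   var salary;
   /* row-expression in the denominator: column percentages */
   table jobcode, gender*salary*(sum pctsum<jobcode>);
   /* column-expression in the denominator: row percentages */
   table jobcode, gender*salary*(sum pctsum<gender>);
run;
```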
And here are two more examples involving percentages, but these examples include the universal class
variable, ALL.
Notice that whenever row- or column-percentages are to be produced for a column- or row-expression that
includes the ALL universal class variable, then ALL also must be included in the “denominator definition.”
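A sketch of that pattern, assuming empldata with class variables jobcode and gender:

```sas
proc tabulate data=empldata;
   class jobcode gender;
   table jobcode all, (gender all)*(n pctn<gender all>);    /* row percentages */
   table jobcode all, (gender all)*(n pctn<jobcode all>);   /* column percentages */
run;
```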
As in many other SAS procedures, you can use a LABEL statement to replace variable names with more
descriptive headings for your class variables. There also is a way to specify temporary labels in a TABLE
statement.
Similarly, TITLE and FOOTNOTE statements also can be used to enhance the tabular reports generated by
PROC TABULATE.
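These labeling techniques might be combined as in the sketch below. It uses sashelp.prdsale; the title, footnote, and label texts are invented for illustration.

```sas
title 'Product Sales by Country';
footnote 'Source: sashelp.prdsale';

proc tabulate data=sashelp.prdsale;
   class country product;
   var actual;
   label country = 'Country of Sale';                  /* LABEL statement */
   table country, product='Product Line'*actual=' ';   /* labels for this table only;
                                                          a blank label suppresses the heading */
run;

title;      /* clear the title */
footnote;   /* clear the footnote */
```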
To assign labels to procedure-generated statistics and the universal class variable, we use the KEYLABEL
statement.
KEYLABEL N    = 'Count'
         ALL  = 'Total'
         PCTN = 'Percent';
As in other SAS procedures, formats can be used to substitute labels for values of the classification
variables. Formats also can be used to combine many values of the classification variables into a much
smaller number of values to be printed in the report. We create custom formats by using PROC FORMAT,
and we invoke those formats in PROC TABULATE either through a FORMAT statement, or by crossing
F=format-name. in the TABLE statement with the particular variable.
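Both uses of formats can be sketched as follows. The example assumes that quarter in sashelp.prdsale is a numeric variable with the values 1 through 4; the format name halffmt and its labels are hypothetical.

```sas
proc format;
   value halffmt 1, 2 = 'First half'
                 3, 4 = 'Second half';
run;

proc tabulate data=sashelp.prdsale;
   class quarter product;
   var actual;
   format quarter halffmt.;                          /* groups four quarters into two levels */
   table quarter, product*actual*sum*f=dollar14.2;   /* cell format crossed in the TABLE statement */
run;
```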
The default for displaying cells with missing numeric values is a period. You can change the way missing
values are displayed by using the MISSTEXT= option on the TABLE statement to define the text (up to 256
characters in SAS 9) that will print in the table cells whenever a particular combination of class variable
values is not found in the input data set.
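A minimal sketch, assuming empldata with class variables jobcode and location and the analysis variable salary; the replacement text 'n/a' is an arbitrary choice:

```sas
proc tabulate data=empldata;
   class jobcode location;
   var salary;
   table jobcode, location*salary*mean / misstext='n/a';   /* empty cells print n/a */
run;
```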
Here are a couple of useful TABLE statement options that can be used for customizing the appearance of
tables:
• The RTSPACE= (or RTS=) option defines the total amount of space for the row headings. If there
are several levels of headings for rows, then the space is divided equally among the levels, after
subtracting the spaces that are needed for the vertical lines.
• Whenever a table produced by PROC TABULATE is too wide to fit on a single page, the procedure
automatically splits the table, to span as many separate pages as are necessary for printing. For
short, wide tables, the CONDENSE option could be specified on the TABLE statement, in order to
print as many logical pages as possible on a single page, one below the other.
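Both options go after a slash at the end of the TABLE statement, as in the sketch below. It uses sashelp.prdsale with a page dimension so that there are several logical pages to condense; the width of 30 is an arbitrary value, and both options matter mainly for traditional listing output.

```sas
proc tabulate data=sashelp.prdsale;
   class year country region product;
   var actual;
   table year, country*region, product*actual
         / rts=30      /* 30 print positions for the two levels of row headings */
           condense;   /* print several logical pages on one physical page */
run;
```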
Some people think that traditional SAS output is ugly. Beginning with Version 7, the SAS System provided
an ability to deliver procedure output in a flexible variety of file types and formats, through the SAS Output
Delivery System (ODS). Under SAS 9.1.3, ODS can be used to generate results as SAS data sets, output
listings, PostScript, HTML, RTF, PDF, PCL, XML, Excel, and other output file types. ODS can be used to
enhance tabular reports, by wrapping the PROC TABULATE code in ODS destinations, by changing fonts,
colors, and other style attributes, and by adding graphics. For further information about ODS and
TABULATE, consult the software documentation provided by SAS Institute.
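As a simple illustration of wrapping a step in an ODS destination, the sketch below writes a table to an HTML file. The file name is hypothetical, and SASWEB is only one of the styles shipped with SAS.

```sas
ods html file='prdsale_table.html' style=sasweb;   /* open the HTML destination */

proc tabulate data=sashelp.prdsale;
   class country product;
   var actual;
   table country all, product*actual*sum*f=dollar14.;
run;

ods html close;                                    /* close the destination to finish the file */
```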
Here is a final example, which illustrates several of the “Labeling & Formatting” techniques that I have
described:
PROC FORMAT;
   VALUE salfmt low-<12000   = 'Less than $12,000'
                12000-<24000 = '$12,000 - $23,999'
                24000-<48000 = '$24,000 - $47,999'
                48000-<72000 = '$48,000 - $71,999'
                72000-<96000 = '$72,000 - $95,999'
CONCLUSION
PROC TABULATE is a very useful and very powerful procedure for constructing tabular reports containing
descriptive statistical information, including hierarchical relationships among variables. It is well worth the
necessary investment of time and effort for learning the intricacies and subtleties of its syntax. Concentrating
on the five steps makes it much easier to learn how to code PROC TABULATE. We have only just
“scratched the surface” of this wonderful procedure! But now you know enough to continue learning about it
on your own. Happy tabulating!
CONTACT INFORMATION
Thomas Winn, Ph.D.
U.S. Department of Veterans Affairs
DSS Support Office – Austin OI&T
1615 Woodward Street, 776 / 19F-4
Austin, TX 78772-0001
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute
Inc. in the USA and other countries. ® indicates USA registration.