Paper PO22 PROC SQL: An Efficient Tool For Creating Macro Variables
ABSTRACT
Are you a PROC SQL lover? Are you interested in creating multiple macro variables
with a few lines of code? Are you interested in shortening your SAS code significantly
and running your SAS files efficiently? PROC SQL offers a lot of useful features, which
include, but are not limited to: 1) combining the functionality of DATA and PROC steps
into one single step, 2) sorting, summarizing, joining (merging) and concatenating datasets, 3)
constructing in-line views using the FROM and SELECT clauses, 4) lining up multiple macro
variables using the INTO clause. In my programming practice, I use PROC SQL to solve
a lot of programming problems; sometimes it is even impossible to solve a programming
problem without using PROC SQL. Since the applications of PROC SQL in SAS
programming are so broad, this paper will only focus on creating macro variables from
SAS data using PROC SQL. Concrete examples are provided to demonstrate the
advantages of PROC SQL in creating macro variables over the CALL SYMPUT routine.
INTRODUCTION
You often want to manipulate your SAS data and convert a list of unique values of one
variable, or the unique combinations of several variables by concatenation, or a list of
unique variables of one data file, or even a list of unique data file names of the entire
library into macro variables in order to efficiently run and maintain your SAS programs.
The CALL SYMPUT routine and DATA _NULL_ are the traditional methods to create
macro variables from SAS data. However, PROC SQL is much more powerful and
efficient in creating macro variables by taking advantage of the in-line view capability
and the SELECT, INTO, GROUP BY, HAVING, and ORDER BY clauses. There are
many efficient ways to create macro variables using PROC SQL. The purpose of this
presentation is to demonstrate some useful tricks and skills through a few practical examples.
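As a minimal sketch of the INTO clause forms used throughout this paper, the code below assumes a hypothetical data file work.demog with the variables surgeon_id and subject_id. These names are illustrative only and do not come from the figures.

```sas
/* Assumption: work.demog, surgeon_id and subject_id are hypothetical names */
proc sql noprint;
   /* Form 1: a single value stored in one macro variable */
   select count(distinct surgeon_id)
      into :n_surgeons
      from work.demog;

   /* Form 2: all unique values stored in one macro variable as a list */
   select distinct surgeon_id
      into :surgeon_list separated by ' '
      from work.demog;

   /* Form 3: one macro variable per unique value; the upper bound 999 is
      only a ceiling, and SAS creates as many variables as rows returned */
   select distinct surgeon_id
      into :surgeon1 - :surgeon999
      from work.demog;
   %let n_created = &sqlobs;   /* number of rows stored by the last SELECT */
quit;

/* Re-assigning with %LET strips the leading blanks of a numeric value */
%let n_surgeons = &n_surgeons;

%put n_surgeons=&n_surgeons n_created=&n_created;
%put surgeon_list=&surgeon_list;
```

The SEPARATED BY and range forms trim leading and trailing blanks by default, so only the single-value form needs the extra %LET.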
EXAMPLE ONE
Suppose you are asked to generate multiple graphs for the surgeons who have 100 or
more subjects. Your graphs will be generated from multiple SAS data files. The common
variable in these data files is the surgeon ID. However, the subject information is only
available from the demog data file. In order to avoid tedious coding, dynamically control
your output, and minimize your workload on each revision, it is helpful to create the
macro variables found in Table 1 by using the SAS code listed in Figure 1.
Figure 1. SAS Coding Used to Create the Macro Variables displayed in Table 1:
Comparison of PROC SQL and CALL SYMPUT in DATA _NULL_
As you notice in Table 1 and Figure 1, the obvious advantages of PROC SQL are: 1)
multiple macro variables are created in one step; 2) your code is significantly
shorter; 3) the data values are summarized by an in-line view (highlighted in yellow)
in PROC SQL. You have to rely on one PROC FREQ and one PROC SORT to
summarize the data if you use the CALL SYMPUT routine. In addition, the DATA
_NULL_ step requires more code to create the same types of macro variables.
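Because Figure 1 and Table 1 are not reproduced here, the following is only a sketch of how the PROC SQL side of this example could look. The variable names surgeon_id and subject_id and the macro variable names surg1, nsub1 and nsurg are assumptions, not the names used in the original figure.

```sas
/* Assumption: work.demog has one row per subject visit, with the
   hypothetical variables surgeon_id and subject_id */
proc sql noprint;
   select surgeon_id, n_subj
      into :surg1 - :surg999,      /* surgeon IDs with 100 or more subjects */
           :nsub1 - :nsub999       /* matching subject counts */
      from (select surgeon_id,
                   count(distinct subject_id) as n_subj
               from work.demog
               group by surgeon_id
               having count(distinct subject_id) >= 100)   /* in-line view */
      order by surgeon_id;
   %let nsurg = &sqlobs;           /* how many surgeons qualified */
quit;

%put nsurg=&nsurg;
```

The in-line view does the counting and filtering that would otherwise take a PROC FREQ and a PROC SORT, and the outer SELECT turns the result into macro variables in the same step.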
EXAMPLE TWO
Clients often ask you to add some summarized information to a
well-refined table or graph in order to make the presentation more informative. Of
course, your favorite tools are macro variables, because you can conveniently display the
information in a footnote or title by referencing macro variables. As presented in Figure 2,
two macro variables are created for the surgeons who have the most and least patients.
You have to count the number of patients by surgeon and identify the maximum and
minimum counts before you make the macro variables. PROC SQL can do the job
nicely with only a few lines of code. However, the CALL SYMPUT routine requires two
SORT procedures and two DATA steps (Figure 2).
Figure 2. Converting the Numbers of Most and Least Patients into Macro Variables
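data _null_;
   /* counted is assumed to be sorted by descending ptnum */
   set counted end=last;
   if _n_=1 then
      call symput('maxnum',left(ptnum));   /* first row: largest count */
   if last then
      call symput('minnum',left(ptnum));   /* last row: smallest count */
run;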
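The PROC SQL half of Figure 2 is not reproduced here. A minimal sketch of the same task, assuming a hypothetical data file work.demog with the variables surgeon_id and subject_id, might look like this:

```sas
/* Assumption: work.demog, surgeon_id and subject_id are hypothetical names */
proc sql noprint;
   select max(ptnum), min(ptnum)
      into :maxnum, :minnum
      from (select surgeon_id,
                   count(distinct subject_id) as ptnum
               from work.demog
               group by surgeon_id);   /* in-line view: patients per surgeon */
quit;

/* Strip the leading blanks left by the numeric-to-character conversion */
%let maxnum = &maxnum;
%let minnum = &minnum;

footnote "Patients per surgeon ranged from &minnum to &maxnum";
```

No sorting is needed, because MAX and MIN summarize the counts produced by the in-line view directly.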
EXAMPLE THREE
The SAS code displayed in Figure 3 creates five sets of macro variables, which are
used to create patient profiles for 53 subjects. The treatment start and end dates vary from
subject to subject. Therefore, you have to convert the date values into macro variables.
Set one (&pt1 to &pt53) contains the Subject Identifications. Set two (&bday1 to &bday53)
contains the value of Treatment Start Date. Set three (&eday1 to &eday53) contains the
value of Treatment End Date. The information for each subject is displayed within each
profile by creating an annotation dataset. The other two sets of macro variables specify
the tick marks of the horizontal axis in steps of 30 days. Set four (&bdy1 to &bdy53) contains the
value of Treatment Start Date minus 3 days. The reason for doing this is to keep the first
date value a little away from the vertical axis. Set five (&edy1 to &edy53) contains the
value of Treatment End Date plus 54 days. The reason for doing this is to ensure that the
Treatment End Date is included in the horizontal axis.
As you review the code in Figure 3, you will realize that the value calculation and
grouping in PROC SQL are completed by a short nested query (marked in yellow).
However, you need a PROC MEANS and a DATA step to perform the same task when
you use the CALL SYMPUT routine to create the macro variables. Furthermore, the code
used to create the macro variables in the DATA _NULL_ step is more complicated.
Figure 3. Comparison of SAS Code to Create Multiple Sets of Macro Variables
(PROC SQL and CALL SYMPUT)
data _null_;
   set data2 end=last;
   by pt;
   /* total number of subjects */
   if last then call symput('cnt',_n_);
   /* one macro variable per subject in each of the five sets */
   call symput(compress('pt'||_n_),pt);
   call symput(compress('bday'||_n_),put(firstday,date9.));
   call symput(compress('eday'||_n_),put(lastday,date9.));
   call symput(compress('bdy'||_n_),firstday2);
   call symput(compress('edy'||_n_),lastday2);
run;
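The PROC SQL half of Figure 3 is not reproduced here. The sketch below shows one way the five sets could be created in a single step. It assumes a hypothetical data file work.trt with the subject variable pt and a SAS date variable trtdt; the macro variable names follow the text above.

```sas
/* Assumption: work.trt and trtdt are hypothetical names */
proc sql noprint;
   select pt,
          firstday format=date9.,   /* stored as text such as 30MAR2003 */
          lastday  format=date9.,
          firstday - 3,             /* stored as the numeric SAS date value */
          lastday + 54
      into :pt1   - :pt999,
           :bday1 - :bday999,
           :eday1 - :eday999,
           :bdy1  - :bdy999,
           :edy1  - :edy999
      from (select pt,
                   min(trtdt) as firstday,
                   max(trtdt) as lastday
               from work.trt
               group by pt)         /* nested query: first and last day per subject */
      order by pt;
   %let cnt = &sqlobs;              /* number of subjects stored */
quit;

%put cnt=&cnt pt1=&pt1 bday1=&bday1 eday1=&eday1 bdy1=&bdy1 edy1=&edy1;
```

The upper bound 999 is only a ceiling; SAS creates one macro variable per row returned, so 53 subjects yield &pt1 to &pt53.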
EXAMPLE FOUR
This example shows an efficiency advantage of PROC SQL in creating macro variables beyond
shortening the code. Two macro variables are created for two different
purposes. Macro variable trtdt2 is used to specify the tick marks of the horizontal axis by day,
whereas macro variable trtdt is used to label the tick marks on the horizontal axis. The SAS code
used to create these two macro variables with PROC SQL and the CALL SYMPUT routine
is shown in Figure 4.
Macro trtdt:
"30MAR03" "31MAR03" "01APR03" "02APR03" "03APR03" "04APR03" "05APR03" "06APR03"
"07APR03" "08APR03" "09APR03" "10APR03" "11APR03" "12APR03" "13APR03" "14APR03"
"15APR03" "16APR03" "17APR03" "18APR03"
Macro trtdt2:
"30MAR03"d "31MAR03"d "01APR03"d "02APR03"d "03APR03"d "04APR03"d "05APR03"d
"06APR03"d "07APR03"d "08APR03"d "09APR03"d "10APR03"d "11APR03"d "12APR03"d
"13APR03"d "14APR03"d "15APR03"d "16APR03"d "17APR03"d "18APR03"d
Figure 4. Example of Programming Efficiency Other than Shortening SAS Code
Let us start with the DATA _NULL_ step and the CALL SYMPUT routine. You need a
PROC SORT step to sort the data, but this is not the only shortcoming. First of all,
the length of the dummy variables listed1 and listed2 is set to $200 by guessing.
When you check the log window, you are surprised to find that the macro
variable trtdt2 is missing, yet the log shows no error or warning messages.
The warning and error messages appear in the log window only if you execute the code
one more time. The messages say “WARNING: The quoted string currently being
processed has become more than 262 characters long. You may have unbalanced
quotation marks.” and “ERROR: Open code statement recursion detected.” This tells
you that the character string for the dummy variable listed2 is longer
than 200 characters, and that truncating the string inside a quoted date
left unbalanced quotation marks. You can figure out
the actual length by checking the length of one quoted treatment date: in the values listed
above, each date literal such as "30MAR03"d takes 10 characters, so 20 of them separated
by single blanks need 219 characters. Alternatively, you
can avoid the problem by setting the length generously, for example by setting the length of
listed1 and listed2 to $1000. Unfortunately, this lesson is expensive, because you will not
learn it until after a number of trials. Please understand that the LENGTH statement in the
DATA _NULL_ step can’t be omitted. Otherwise the dummy variables listed1 and listed2
would become blanks.
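Figure 4 is not reproduced here, so the following is only a sketch of the DATA _NULL_ approach with the generous length applied. The data file work.trt and the date variable trtdt are assumptions; listed1 and listed2 are the dummy variables named above.

```sas
/* Assumption: work.trt and trtdt are hypothetical names */
proc sort data=work.trt(keep=trtdt) out=work.days nodupkey;
   by trtdt;                          /* one row per treatment date, in order */
run;

data _null_;
   length listed1 listed2 $1000;      /* generous length avoids truncation */
   retain listed1 listed2;
   set work.days end=last;
   /* append the quoted date, e.g. "30MAR03", to the running list */
   listed1 = catx(' ', listed1, quote(put(trtdt, date7.)));
   /* append the date literal, e.g. "30MAR03"d, to the running list */
   listed2 = catx(' ', listed2, cats(quote(put(trtdt, date7.)), 'd'));
   if last then do;
      call symputx('trtdt',  listed1);
      call symputx('trtdt2', listed2);
   end;
run;
```

With $200 in place of $1000, the second list would be cut off in the middle of a date literal once it passed 200 characters, which is the source of the unbalanced quotation marks described above.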
However, you can easily create these two long macro variables with PROC SQL without
having to worry about the actual length of the concatenated text, and you avoid
the problems discussed above.
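A minimal sketch of the PROC SQL approach, under the same assumptions (a hypothetical data file work.trt with a SAS date variable trtdt), could be:

```sas
/* Assumption: work.trt and trtdt are hypothetical names */
proc sql noprint;
   select quote(put(trtdt, date7.)),               /* "30MAR03"  */
          cats(quote(put(trtdt, date7.)), 'd')     /* "30MAR03"d */
      into :trtdt  separated by ' ',
           :trtdt2 separated by ' '
      from (select distinct trtdt from work.trt)   /* unique dates only */
      order by trtdt;
quit;

%put trtdt=&trtdt;
%put trtdt2=&trtdt2;
```

No LENGTH statement or separate sort step is involved, so there is no declared length to guess. SAS writes a note to the log because the query is ordered by a column that is not in the SELECT clause; the ordering still applies.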
CONCLUSION
PROC SQL is one of the great joys of SAS programming. This paper demonstrates only a few
practical examples of creating macro variables with PROC SQL. These examples show
its flexibility and offer some hints for creating macro variables
in your own work.
CONTACT INFORMATION
You can send your comments, questions, and inquiries to:
Louie Huang, Senior Technical Specialist
Baxter BioScience, Baxter Healthcare Corporation
One Baxter Way
Westlake Village, CA 91362
Tel: 805-372-3487
Fax: 805-372-3462
Email: [email protected]
TRADEMARK INFORMATION
SAS and all other SAS Institute Inc. product or service names are registered trademarks
or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA
registration.