Quick Hits - My Favorite SAS Tricks: Marje Fecht, Prowerk Consulting
ABSTRACT
Are you time-poor and code-heavy?
It's easy to get into a rut with your SAS code, and learning and implementing improved techniques can be
time-consuming.
This presentation is designed to share quick improvements that take 5 minutes to learn and about the same time to
implement. The quick hits are applicable across versions of SAS and require only BASE SAS knowledge.
Included are:
- simple macro tricks
- little-known functions that get rid of messy coding
- dynamic conditional logic
- data summarization tips to reduce data and processing
- generation data sets to improve data access and rollback.
INTRODUCTION
Many SAS programmers learn SAS “by example” - using the programs they have inherited or a co-worker’s
examples. Some examples are better than others! Sometimes, the examples we learn from reflect coding
practices from earlier SAS versions, prior to the introduction of the large function library and language extensions that
exist today.
When you are tasked with producing results, there isn’t always time to learn and implement new features. This paper
shares quick tricks that are easy to learn and implement.
CODE REDUCTION
The SAS function library grows with each release of SAS and provides built-in functionality for accomplishing many
common tasks. To help you relate the described functionality to your existing code, “before and after” examples are
included.
CONCATENATION
If you join strings of data together, then you have likely used the concatenation operator ( || ) as well as other
operators to obtain the desired results. Concatenation and managing delimiters and blanks can be frustrating and
may involve a lot of steps including:
• TRIM
• LEFT
• STRIP
• ||
• Adding delimiters.
The old way of joining the contents of the variables n (numeric), with a, b, c (all character) might look like:
old = put(n,1.) || ' ' || trim(a) || ' ' || trim(b) || ' ' || c;
Unfortunately the above code won’t always produce a pleasing result, and thus could require even more complication.
Consider the case when the variable a contains all blanks. This would result in 3 blanks in a row in the variable old
since trim(a) would reduce to a single blank and then add the additional blanks used as delimiters. Furthermore, a
LEFT or STRIP function would be needed to ensure that leading blanks don’t cause issues.
Marje Fecht, Prowerk Consulting Ltd, 2013
The SAS 9 family of CAT functions reduces complexity when concatenating strings!
Note:
• If any of n, a, b, c are blank, CATX will not include the blanks and is smart enough to therefore not include
the delimiters.
• If any arguments are numeric, the CAT functions will handle the numeric to character conversion without
LOG messages.
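As a minimal sketch of the CATX approach, the step below rebuilds the earlier result from the same variables n, a, b, and c. The variable name new, its length of 40, and the sample values are assumptions made for illustration.

```sas
data _null_;
  /* sample values invented for illustration; a is deliberately blank */
  n = 5; a = ' '; b = 'red'; c = 'blue';
  length new $40;               /* assumed length, large enough for this example */
  new = catx(' ', n, a, b, c);  /* strips each argument, skips blanks, converts n */
  put new=;
run;
```

Because a is blank, the result should contain only a single blank between each of the remaining values.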
CONDITIONAL CONCATENATION
Consider an example where you have a series of indicators that represent marketing channels used in a campaign.
You want to build a single string (see channels_CATX ) showing all channels used for each client. For this
example, we start with 4 indicators in the data set.
Example 1 - Solution 1:
A classic solution would be to create a variable that corresponds to each indicator with either a blank or the two letter
code that is desired for the result. Then concatenate the 4 new variables.
This solution is
• wordy since it requires creating an extra set of variables
• problematic
o extra underscores in result since delimiters are included regardless of “missing data”
o TRIM or STRIP could be needed if variables had differing length strings.
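A sketch of what the classic solution might look like follows. Only ch_dm and the variables dm, on, cc, st appear in this paper; the indicator names ch_on, ch_cc, ch_st, the codes ON, CC, ST, and the result name channels_old are assumptions made for illustration.

```sas
data _null_;
  /* sample indicator values invented for illustration */
  ch_dm = 'Y'; ch_on = 'N'; ch_cc = 'Y'; ch_st = 'N';
  length dm on cc st $2 channels_old $12;
  /* one extra variable per indicator: the code or a blank */
  if ch_dm = 'Y' then dm = 'DM'; else dm = ' ';
  if ch_on = 'Y' then on = 'ON'; else on = ' ';
  if ch_cc = 'Y' then cc = 'CC'; else cc = ' ';
  if ch_st = 'Y' then st = 'ST'; else st = ' ';
  /* delimiters are added whether or not a code is present */
  channels_old = dm || '_' || on || '_' || cc || '_' || st;
  put channels_old=;
run;
```

With these sample values the result keeps the underscores and blanks for the unused channels, which is the problem described above.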
Example 1 - Solution 2:
Improve upon Solution 1 by using the CATX function. A LENGTH statement is needed since, without one, the variable
that receives the CATX result is assigned a length of 200.
/*** Using CATX – ignores missing values and handles STRIP ***/
length channels_CATX $12;
channels_CATX = CATX ( '_' , dm , on , cc , st);
Solution 2 still uses the extra 4 variables but it behaves when some values are missing, so you don’t need COMPRESS
or special logic.
Note that
• the expression ch_dm = 'Y' can ONLY result in TRUE or FALSE and thus two result arguments are provided
• the result 'DM' is returned only when ch_dm = 'Y'
• any value of ch_dm other than 'Y' results in a blank since the expression is FALSE.
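The notes above describe an IFC call with one result for TRUE and one for FALSE. A minimal sketch, assuming the indicator ch_dm and the result variable dm used elsewhere in this example:

```sas
data _null_;
  ch_dm = 'Y';                        /* sample indicator value */
  length dm $2;                       /* avoid the default result length */
  dm = ifc(ch_dm = 'Y', 'DM', ' ');   /* TRUE result, FALSE result */
  put dm=;
run;
```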
Example 1 - Solution 3:
Improve upon Example 1, Solution 2 by removing the need to create the extra 4 variables. Note that as with most
function calls in SAS, IFC can easily be embedded within CATX.
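A sketch of IFC embedded within CATX might look like the step below. The indicator names ch_on, ch_cc, ch_st and the codes ON, CC, ST are assumed by analogy with ch_dm and DM; they are not taken from the original data.

```sas
data _null_;
  /* sample indicator values invented for illustration */
  ch_dm = 'Y'; ch_on = 'N'; ch_cc = 'Y'; ch_st = 'N';
  length channels_CATX $12;
  /* each IFC returns a code or a blank; CATX drops the blanks and their delimiters */
  channels_CATX = catx('_',
      ifc(ch_dm = 'Y', 'DM', ' '),
      ifc(ch_on = 'Y', 'ON', ' '),
      ifc(ch_cc = 'Y', 'CC', ' '),
      ifc(ch_st = 'Y', 'ST', ' '));
  put channels_CATX=;
run;
```

No intermediate variables are created, and only the channels flagged 'Y' appear in the result.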
Example 2 - Solution 1
If grade ge 70 then status = "Pass";
else if grade = . then status = "Pending";
else status = "Fail";
This coding works fine but is wordier than necessary.
Example 2 - Solution 2
Recall that IFC returns a character value based on whether an expression is
• true ( not 0 or . )
• false ( 0 )
• or missing ( . ).
The previous example will only result in Pass and Fail, since the logical expression will only result in True (1) or
False (0). BEWARE!! It is helpful to note that when SAS evaluates an expression, a value of ZERO denotes FALSE
and any other non-missing numeric value denotes TRUE.
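A sketch of an IFC call that behaves as just described, assuming the grade and status variables from Solution 1 and three invented sample grades:

```sas
data _null_;
  length status $7;
  do grade = 85, 40, .;
    /* grade ge 70 evaluates to 1 or 0, never missing, so 'Pending' is never returned */
    status = ifc(grade ge 70, 'Pass', 'Fail', 'Pending');
    put grade= status=;
  end;
run;
```

A missing grade compares as lower than 70, so it is reported as Fail rather than Pending.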
Example 2 - Solution 3
Correct the above logic by using a mechanism to ensure that a missing value of grade will result in a missing value
for the expression. A solution I commonly employ is to multiply the logical expression ( grade ge 70 ) by the
variable being evaluated ( grade ).
For example, the expression grade * (grade ge 70) is
• TRUE when grade ge 70
since grade * (grade ge 70) = grade * 1 = grade TRUE
• FALSE when grade lt 70 and grade NE .
since grade * (grade ge 70) = grade * 0 = 0 FALSE
• MISSING when grade = .
since grade * (grade ge 70) = . * 0 = . MISSING.
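The three cases above can be sketched with the same invented sample grades, passing grade * (grade ge 70) to IFC as the expression:

```sas
data _null_;
  length status $7;
  do grade = 85, 40, .;
    /* the product is grade, 0, or missing, so all three IFC results are reachable */
    status = ifc(grade * (grade ge 70), 'Pass', 'Fail', 'Pending');
    put grade= status=;
  end;
run;
```

Expect SAS to write a NOTE to the log that missing values were generated, since the multiplication is performed on a missing value.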
• TRIMN removes trailing blanks – returns null for values that are all blank
• INDEX, INDEXC, INDEXW locates position of first occurrence of string, Character, or Word
• PROPCASE handles upper / lower case to assist with Proper Names and Addresses, etc.
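A short sketch of the functions listed above; the strings and variable names are invented for illustration.

```sas
data _null_;
  length name $30;
  name  = 'marje FECHT';
  blank = ' ';
  proper   = propcase(name);                  /* capitalize the first letter of each word */
  pos_str  = index('SAS tricks', 'tri');      /* position of a substring */
  pos_char = indexc('SAS tricks', 'kt');      /* position of the first k or t */
  pos_word = indexw('my SAS tricks', 'SAS');  /* position of a whole word */
  with_trim  = '[' || trim(blank)  || ']';    /* TRIM leaves one blank */
  with_trimn = '[' || trimn(blank) || ']';    /* TRIMN returns a null string */
  put proper= pos_str= pos_char= pos_word= with_trim= with_trimn=;
run;
```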
CODE GENERALIZATION
If you suffer from copy-paste syndrome or if every request seems to require a “from scratch” build, then you need to
work on changing your approach!! Before you begin writing a program, ask what could change about this
request in the future. If the request is time-specific or if it focuses on just a subset of the available population,
chances are good that you can generalize the code so that it can be easily adapted to future requests. Also ask
what you have written before that could be leveraged now.
In computer science and software engineering, reusability is the likelihood that a segment of source code
can be used again to add new functionalities with slight or no modification.
My programs commonly include a comment line of /***** NO CHANGES BELOW HERE *****/
since I pass parameters to my source code and use generalized coding practices with
• User Defined Macro variables for “static” information
• Data-Driven values via metadata or functions for “dynamic” information
• Generalized location / naming / etc to enable easy changes
• File names and locations that are parameterized
• System locations, options, settings that are generalized and parameterized.
My source code remains “un-touched”, unless upgrades are implemented, which then roll-out to all programs that call
the source code.
• PROJECT974_20120507_1329.log
• PROJECT974_20120507_1329.lst
Solution:
Create a program that calls the latest version of the source, and call that “control program” from your drivers.
Now, when source changes, your drivers always call the most current version.
Helpful Hint:
• There is no limit to the # of %include statements you can use
• If you want the code from a %include to display in your log, use the SOURCE2 option
• In addition to %include, reusable modules may be
• Macros
• Format Libraries.
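A sketch of the control program idea follows. All file names and paths are hypothetical; only %include and the SOURCE2 option come from the hints above.

```sas
/* --- control program (hypothetical file: /projects/p974/control/run_extract.sas) --- */
/* Edit only this line when a new version of the source is released.                    */
%include "/projects/p974/source/extract_v3.sas" / source2;

/* --- driver program (hypothetical) --- */
/* Drivers call the control program and never name a source version directly. */
%include "/projects/p974/control/run_extract.sas";
```

With this arrangement a source upgrade requires a change in one place, and the SOURCE2 option writes the included code to the log of every driver.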
CODE GENERALIZATION : DYNAMIC CODE
Reporting and analytics requests frequently revolve around lists of dates, campaigns, products, departments, etc.
Thus, generalizing your code to dynamically create and accept LISTS is worth the effort. If a list can be
programmatically generated, you avoid manual input and thus the introduction of typos and errors.
Example: Extract and summarize data for all marketing within a given date range and focus.
Solution 1: Create a generalized program that expects a list of all campaign codes for the date range with the
focus of interest. Manually input the list of campaign codes within the dates of interest.
%let codes =
2010307ABC
,2010337ABC
,2011003ABC
... Etc ...
;
Problem: Someone has to manually locate the campaign codes of interest and correctly input them for the program.
Solution 2: Use SQL to build a macro variable ( codes ) that contains a comma-delimited list of all of the campaign
codes
• with campaign “drop” between a specified begin and end date
• that end in ‘ABC’ since that identifies the campaign focus.
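/** PROC SQL and SELECT ... INTO lines assumed from the description above **/
proc sql noprint;
select cmpgn_code
into :codes separated by ','
from mktg_metadata
where
drop_dt between %str(%')&start%str(%')
and %str(%')&end%str(%')
and substr(cmpgn_code , 8 , 3) = 'ABC'
;
%let NumCodes = &SQLOBS; /** # of rows returned from SQL query **/
quit;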
DYNAMIC VARIABLE NAMING
Suppose you need to summarize the latest 3 months of data, and you want monthly variables representing the totals
for each month of data.
You plan to use SQL with CASE to create the monthly amounts, and you don’t want to manually intervene with the
program. Instead, you want the program to determine the current month and generate the latest three months of data
and names.
The SQL clause to bucket the monthly data might look like:
Example: Create 3 macro variables that contain the current and two previous months in the format: yymm
%let M0 = %sysfunc( today() , yymmN4.);
Note: when INTNX is used in %sysfunc, do not use quotes for the arguments of INTNX.
The above code produces three macro variables with values such as
• M0 = 1205
• M1 = 1204
• M2 = 1203
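A sketch of how the two previous months might be derived with INTNX inside %sysfunc, following the pattern of M0; note that MONTH is written without quotes.

```sas
/* shift today's date back 1 and 2 months, then format the result as yymm */
%let M1 = %sysfunc( intnx( MONTH , %sysfunc( today() ) , -1 ) , yymmN4.);
%let M2 = %sysfunc( intnx( MONTH , %sysfunc( today() ) , -2 ) , yymmN4.);
%put M1=&M1 M2=&M2;
```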
Example: Create 2 macro variables per month that contain the 1st and last day of the month, in SAS date format.
%let M2_beg = %sysfunc( intnx( MONTH
, %sysfunc( today() ) , -2 , B) /** return Beginning of month **/
, date9.);
Note: Use whatever date format is appropriate for your data. For the example data, a SAS Date value is used.
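The matching end-of-month value can be sketched the same way, assuming the name M2_end and using the E (end) alignment of INTNX in place of B:

```sas
%let M2_end = %sysfunc( intnx( MONTH
   , %sysfunc( today() ) , -2 , E)   /** return End of month **/
   , date9.);
%put M2_end=&M2_end;
```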
If you are inclined to copy-paste and you expand this for the three months requested, the below code creates 3 macro
variables per month to create the variable suffix (yymm) and the beginning and end date ranges for each month. This
works but can obviously be improved upon.
The 3 macro variables for each of the 3 months could be used to compute the monthly totals in code such as:
select
sum(case when txn_date between
"&M0_beg"d and "&M0_end"d
then txn_amt else 0 end ) as Tot_Amt_&m0
from amts
Helpful Hint:
The previous code assumed that SAS Date Values are needed (ddMMMyy).
Notice that the code could easily be generated in a macro loop as long as MaxMonth (the maximum number of
monthly computations) is defined.
Developing the appropriate code requires planning and understanding of the desired outcome.
Macro variable   Value        Variable name
M0               1205         Tot_Amt_1205
M0_Beg           1May2012
M0_End           31May2012
M1               1204         Tot_Amt_1204
M1_Beg           1Apr2012
M1_End           30Apr2012
. . .            . . .        . . .
M11              1106         Tot_Amt_1106
M11_Beg          01Jun2011
M11_End          30Jun2011
Example: Create a macro to generate the Macro Variable Names and Values, using as input only today’s date and
the number of months desired ( MaxMonth ).
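%macro Monthly;
%do Num = 0 %to &MaxMonth;
/*** use %global if macro variables needed outside macro ***/
%global M&Num. M&Num._beg M&Num._end;
/*** assignments below are assumed, following the earlier M0 and M2_beg examples ***/
%let M&Num. = %sysfunc( intnx( MONTH , %sysfunc( today() ) , -&Num ) , yymmN4.);
%let M&Num._beg = %sysfunc( intnx( MONTH , %sysfunc( today() ) , -&Num , B) , date9.);
%let M&Num._end = %sysfunc( intnx( MONTH , %sysfunc( today() ) , -&Num , E) , date9.);
%end;
%mend Monthly;
%Monthly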
%put _user_; /** write all user defined macro variables and values to LOG **/
The above example dynamically generates the macro variables needed for the extensible solution. In a similar macro
%DO loop, the SQL to create the case statements could be accomplished.
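A sketch of such a loop is shown below. It assumes that MaxMonth and the macro variables M0, M0_beg, M0_end, and so on already exist, and that the data set amts contains txn_date and txn_amt as in the earlier SQL; the macro name MonthlyTotals and the output table monthly_totals are hypothetical.

```sas
%macro MonthlyTotals;
  proc sql;
    create table monthly_totals as
    select
    %do Num = 0 %to &MaxMonth;
      %if &Num > 0 %then , ;   /* comma before every column except the first */
      sum(case when txn_date between
            "&&M&Num._beg"d and "&&M&Num._end"d
          then txn_amt else 0 end) as Tot_Amt_&&M&Num
    %end;
    from amts;
  quit;
%mend MonthlyTotals;
%MonthlyTotals
```

The double ampersand delays resolution, so &&M&Num._beg first becomes &M0_beg and then resolves to the stored date.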
REUSE SPACE !!
Your use (or abuse) of SAS Work Space could crash other jobs (including yours)! If you are processing large data
files, please delete intermediate data sets when they are no longer needed. For example, once joins are complete
with other data, delete the intermediate data sets that comprised the join.
PROC DATASETS is handy to manage your SAS datasets including deleting files, modifying attributes, changing
names, etc.
Note: Before deleting intermediate data, you should confirm there are no error conditions, etc.
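A minimal sketch of deleting intermediate data sets with PROC DATASETS; the data set names are hypothetical and are created here only so the example is self-contained.

```sas
/* stand-ins for intermediate data sets left over from a join */
data work.join_part1 work.join_part2;
  x = 1;
run;

proc datasets library=work nolist;
  delete join_part1 join_part2;   /* frees the WORK space they occupied */
quit;
```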
MINIMIZE THE AMOUNT OF DATA YOU READ
You do not have to read data to change many data set attributes. Many SAS programmers rely on the DATA step to
handle changes to variable attributes (labels, formats, renaming, etc.). However, the DATA step reads every record
which is problematic if your data include millions of records. Instead, the following PROC DATASETS example
• changes a data set name
• changes variables names
• assigns a variable format.
No data values are read!
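A sketch of what such a PROC DATASETS step might look like; the data set and variable names are hypothetical, and the small DATA step exists only to make the example self-contained.

```sas
/* small stand-in for a large data set */
data work.sales_tmp;
  amt = 100;
  txn_date = '07MAY2012'd;
run;

proc datasets library=work nolist;
  change sales_tmp = sales_final;   /* change the data set name */
  modify sales_final;
    rename amt = txn_amt;           /* change a variable name */
    format txn_date date9.;         /* assign a variable format */
quit;
```

Only the descriptor portion of the data set is updated, so the run time does not depend on the number of records.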
Each data set in a SAS generation data group has the same member name (data set name) but has a different
version number. Every time the SAS data set is updated, a new generation data set is created and the version
numbers of the older versions are incremented. The DEFAULT version is called the base version, and is the most
recent version of the data. What are the advantages of using generation data sets?
• The most current version of the generation data group is referenced using the base data set name. Thus,
downstream programs do not need to worry about date-time stamps in the data set name.
• Older versions of the data are available for PROC COMPARE testing when validating the data.
• You don’t have to remember to save the data before creating a new version.
For further information on creating and using generation data sets, see the genmax and gennum data set options in
SAS online documentation. Or, reference Lisa Eckler’s SAS Global Forum Paper on Generation Data Sets:
http://support.sas.com/resources/papers/proceedings12/051-2012.pdf .
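A minimal sketch of the genmax and gennum data set options, assuming a hypothetical data set work.daily_extract; consult the documentation above for the full rules.

```sas
/* request up to 3 generations when the data set is first created */
data work.daily_extract (genmax=3);
  amount = 100;
run;

/* replacing the data set keeps the prior version as a historical generation */
data work.daily_extract;
  amount = 125;
run;

/* compare the previous generation (gennum=-1) with the current base version */
proc compare base=work.daily_extract (gennum=-1)
             compare=work.daily_extract;
run;
```

Downstream programs keep referring to work.daily_extract and always receive the most recent version.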
CONCLUSION
Deadlines and deliverables leave little time for learning new tricks and making changes to existing programs. But
there are improvements that can be made that save you time and headaches.
This paper provides techniques you may want to consider to update and improve your programs. In the long run, the
changes could generalize your code and decrease your maintenance efforts.
Your comments and questions are valued and encouraged. Contact the author at:
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the
USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective
companies.