Top Ten SAS Performance Techniques PDF
Introduction
When developing SAS program code and/or applications, efficiency is not always given the attention it deserves, particularly in
the early phases of development. System performance requirements can greatly affect the behavior an application exhibits.
Active user participation is crucial to understanding application and performance requirements.
Attention should be given to each individual program function to assess performance criteria. Understanding user expectations
(preferably during the early phases of the application development process) often results in a more efficient application.
Consequently, the difficulty associated with improving efficiency as coding nears completion is often minimized. This paper
highlights several areas where a program's performance can be improved when using SAS software.
Efficiency Objectives
Efficiency objectives are best achieved when implemented as early as possible, preferably during the design phase. But when
this is not possible, for example when customizing or inheriting an application, efficiency and performance techniques can still
be "applied" to obtain some degree of improvement. Efficiency and performance strategies can be classified into five areas:
CPU Time, Data Storage, Elapsed Time, I/O, and Memory.
Jeffrey A. Polzin of SAS Institute Inc. shared his thoughts about measuring efficiency, "CPU time and elapsed time are baseline
measurements, since all the other measurements impact these in one way or another." He continues by saying, "... as one
measurement is reduced or increased, it influences the others in varying degrees."
The simplest of requests can fall prey to one or more efficiency violations, such as retaining unwanted data sets in WORK space,
not subsetting early to eliminate undesirable observations, or reading wanted as well as unwanted variables. Much of an
application's inefficiency can be avoided with better planning and by knowing what works and what does not before beginning
the coding process. Most people do not plan to fail; they just fail to plan. Fortunately, efficiency gains can be realized by
following a few guidelines.
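As a minimal sketch of avoiding the three violations named above, the following step reads only the needed variables, subsets as the data is read, and deletes the intermediate data set once it is no longer required. The library MYLIB, the data set SALES, and the variables REGION, ITEM, and SALES are hypothetical names used only for illustration.

```sas
/* Hypothetical names: MYLIB.SALES with variables REGION, ITEM, SALES */
data work.ca_sales;
  set mylib.sales (keep=region item sales    /* read only wanted variables      */
                   where=(region = 'CA'));   /* subset as observations are read */
run;

/* ... steps that use WORK.CA_SALES go here ... */

/* Remove the intermediate data set from WORK space when finished */
proc datasets library=work nolist;
  delete ca_sales;
quit;
```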
The following suggestions are not meant as an exhaustive list of all known efficiency techniques, but as a sampling of proven
methods that can provide some measure of efficiency. Performance tuning techniques are presented for the following resource
areas: CPU time, data storage, I/O, memory, and programming time. Selective coding examples are illustrated in Table 1.
MWSUG 2014
CPU Time
1)
2)
3)
4)
Data Storage
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
I/O
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11) Use the OUT= option with PROC SORT to reduce I/O operations.
12) The BUFNO= option can be specified to adjust the number of open page buffers when processing SAS datasets.
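The two I/O suggestions above might be sketched as follows. The data set MYLIB.SALES and its variables are hypothetical, and the BUFNO= value shown is an arbitrary illustration rather than a recommended setting; a suitable value depends on the operating environment and data set size.

```sas
/* Hypothetical names: MYLIB.SALES with variables ITEM and SALES */

/* OUT= writes the sorted result to a new, smaller data set   */
/* instead of replacing the original data set in place.       */
proc sort data=mylib.sales (keep=item sales)
          out=work.sales_sorted;
  by item;
run;

/* BUFNO= adjusts the number of page buffers used for this data set. */
/* The value 10 is an arbitrary example, not a tuned setting.        */
data _null_;
  set mylib.sales (bufno=10);
run;
```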
Memory
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
Programming Time
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
Survey Results
A survey was conducted to elicit responses from participants on efficiency and performance. The Efficiency and Performance
Survey is illustrated in Table 2. Analyzing the responses from each participant provided a better appreciation for what users
and application developers look for as they apply efficiency methods and strategies.
The survey was constructed to assess the general level of understanding that people have of various efficiency methods and
techniques. What was found was quite interesting. The majority of users and application developers want their applications
to be as efficient as possible. Many go to great lengths to implement sound strategies and techniques, achieving splendid
results. Unfortunately for others, a lack of familiarity with effective techniques often results in a situation where the
application works, but may not realize its true potential.
Survey participants often indicated that efficiency and performance tuning is not only important, but essential to their
application. Many cite response time as a critical objective and are always looking for ways to improve this benchmark. Charles
Edwin Shipp of Shipp Consulting offers these comments on applying efficiency techniques, "Efficiency shouldn't be considered
as a one-time activity. It is best to treat it as a continuing process of reaching an optimal balance between competing resources
and activities."
4. (Continued)
data _null_;
  length pageno rptdate 4;
  set sales;
  file report header=h;
  put @10 item $20.
      @35 sales comma6.2;
  return;
h:
  rptdate=today();
  pageno + 1;
  put @20 'Sales Report'
    / @1 rptdate mmddyy10.
    / @30 'Page ' pageno 4. //;
  return;
run;
data af_users;
  set sands.members
      (keep=name company phone user);
  if user = 'SAS/AF';
run;
2. The CLASS statement provides the ability to perform BY-group processing without the need for data to be sorted
first in a separate step. Consequently, CPU time can be
saved when data is not already in the desired order. The
CLASS statement can be used in the MEANS and SUMMARY
procedures.
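A minimal sketch of the CLASS statement in the MEANS procedure follows. The data set MYLIB.SALES and the variables REGION and SALES are hypothetical; the point is only that no preceding PROC SORT step is required, as it would be for a BY statement.

```sas
/* Hypothetical names: MYLIB.SALES with variables REGION and SALES */
/* CLASS groups the statistics by REGION without requiring the     */
/* input data to be sorted by REGION first.                        */
proc means data=mylib.sales sum mean;
  class region;
  var sales;
run;
```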
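data capitols;
  set states;
  length capitol $ 11;   /* long enough for 'Tallahassee'; otherwise the first assignment sets length 10 */
  if state='CA' then capitol = 'Sacramento';
  else if state='FL' then capitol = 'Tallahassee';
  else if state='TX' then capitol = 'Austin';
run;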
proc sql;
  title1 'SAS/AF Programmers/Users';
  select * from sands.members
    where user = 'SAS/AF'
    order by name;
quit;
Table 1.
Other universally accepted findings include using WHERE, LENGTH, and CLASS statements and KEEP=/DROP= data set options to
process only the data necessary to the application; avoiding unnecessary sorting; verifying the efficiency of simple and/or
composite indexes using the IDXNAME= or IDXWHERE= data set option; using SAS functions; and constructing DATA _NULL_ steps
as effective techniques to improve the efficiency of an application.
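One way to check index usage with these data set options might look like the sketch below. It assumes, purely for illustration, that a simple index named USER is created on the SANDS.MEMBERS data set used in Table 1; the MSGLEVEL=I system option asks SAS to write notes about index selection to the log so the two runs can be compared.

```sas
/* Assumption: a simple index on USER is created here for illustration */
proc datasets library=sands nolist;
  modify members;
  index create user;
quit;

options msglevel=i;   /* write index-usage notes to the SAS log */

/* IDXWHERE=YES asks SAS to use an index to satisfy the WHERE expression */
data work.af_users_idx;
  set sands.members (idxwhere=yes);
  where user = 'SAS/AF';
run;

/* IDXNAME= names the specific index to use */
data work.af_users_named;
  set sands.members (idxname=user);
  where user = 'SAS/AF';
run;

/* IDXWHERE=NO forces a sequential read, giving a baseline for comparison */
data work.af_users_seq;
  set sands.members (idxwhere=no);
  where user = 'SAS/AF';
run;
```

Comparing the CPU and elapsed times reported in the log for each step indicates whether the index actually helps for a given subset.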
Techniques receiving "strong" (between Sometimes and Always), but not unanimous, support among survey participants
include using system options to control resources; deleting unwanted WORK datasets; combining two or more steps into a
single step; storing and using formats and informats; creating and using simple and composite indexes consisting of
discriminating variables; using the APPEND procedure to concatenate two data sets; constructing IF-THEN/ELSE statements to
improve conditional processing; and saving intermediate files, especially for large multi-step jobs.
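The APPEND procedure technique mentioned above can be sketched as follows, using the hypothetical data sets WORK.ALL_SALES and WORK.NEW_SALES. Unlike a DATA step with two data sets on the SET statement, PROC APPEND adds the new observations to the end of the base data set without reading the base data set's existing observations.

```sas
/* Hypothetical names: WORK.ALL_SALES (base) and WORK.NEW_SALES (additions) */
proc append base=work.all_sales
            data=work.new_sales;
run;
```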
Sunil Kumar Gupta of Gupta Programming offers these suggestions on assigning informats, formats, and labels, "Informats,
formats, and labels are stored with many of our important SAS datasets to minimize processing time. A reason for using this
technique is that many popular procedures use stored formats and labels as they produce output, eliminating the need to assign
them in each individual step. This provides added incentives and value for programmers and end-users, especially since
reporting requirements are usually time critical."
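A minimal sketch of storing formats and labels with a data set follows, assuming a hypothetical data set MYLIB.SALES with variables SALES and SALEDATE. The DATASETS procedure is used here because it updates the descriptor information without rewriting the data; this is one possible approach and not necessarily the one the quoted programmer uses.

```sas
/* Hypothetical names: MYLIB.SALES with variables SALES and SALEDATE */
proc datasets library=mylib nolist;
  modify sales;
    format sales    comma10.2
           saledate mmddyy10.;
    label  sales    = 'Sales Amount'
           saledate = 'Date of Sale';
quit;

/* Procedures now pick up the stored formats and labels automatically */
proc print data=mylib.sales label;
run;
```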
A very interesting approach being used by more users to achieve greater efficiency is to use the SQL Pass-Through Facility to
access data stored in one or more database environments. The advantage for users is that the query is processed on the host
database (e.g., Oracle, DB2, or Access), which is where it should be. Processing costs are also shifted from the SAS software
to the host database, which can yield even greater efficiencies.
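A sketch of the SQL Pass-Through Facility against an Oracle database is shown below. The connection values (MYUSER, MYPASS, MYPATH), the table MEMBERS, and its columns are all hypothetical placeholders, and the example assumes that SAS/ACCESS Interface to Oracle is licensed and installed. The inner query is sent to the database as written, so only the subset of rows is returned to SAS.

```sas
proc sql;
  /* Hypothetical connection values; replace with site-specific settings */
  connect to oracle (user=myuser password=mypass path=mypath);

  /* The query in parentheses is executed by the host database */
  create table work.af_users as
    select *
      from connection to oracle
        (select name, company, phone
           from members
          where user_type = 'SAS/AF');

  disconnect from oracle;
quit;
```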
The techniques cited by survey participants as "Sometimes" being used to achieve efficiency include using data set options,
using data compression, conserving memory by turning off unnecessary components and/or options, using the SQL procedure
to consolidate and simplify multiple operations, using the Stored Program Facility, creating and using DATA and SQL views to
control environments where duplication of data is rampant, and using the DATASETS procedure COPY statement for databases
with one or more indexes.
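Three of the techniques listed above (data compression, DATA step views, and SQL views) might be sketched as follows. The library MYLIB, the data set SALES, and the variables REGION, ITEM, and SALES are hypothetical. Views store only the instructions for retrieving data, so no duplicate copy of the data is created.

```sas
/* Hypothetical names: MYLIB.SALES with variables REGION, ITEM, SALES */

/* Data compression: trades some CPU time for reduced storage and I/O */
data work.sales_compressed (compress=yes);
  set mylib.sales;
run;

/* DATA step view: the subset is produced only when the view is read */
data work.ca_sales_v / view=work.ca_sales_v;
  set mylib.sales (keep=region item sales);
  where region = 'CA';
run;

/* SQL view: summarizes the data without storing a second copy */
proc sql;
  create view work.item_totals as
    select item, sum(sales) as total_sales
      from mylib.sales
     group by item;
quit;
```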
2)
3)
Insufficient time and inadequate budgets can often be attributed to ineffective planning and implementation of efficiency
strategies.
Conclusion
The value of implementing efficiency and performance strategies into an application cannot be over-emphasized. Careful
attention should be given to individual program functions, since one or more efficiency techniques can often affect the
architectural characteristics and/or behavior an application exhibits.
Efficiency techniques are learned in a variety of ways. Many learn valuable techniques through formal classroom instruction,
while others find value in published guidelines such as books, manuals, articles, and videotapes. But the greatest value comes
from others' experiences, as well as their own, by word of mouth and on the job. Whatever the means, a little efficiency goes
a long way.
References
Fournier, Roger (1991), Practical Guide to Structured System Development and Maintenance, Yourdon Press Series, Englewood
Cliffs, N.J.: Prentice-Hall, Inc., 136-143.
Hardy, Jean E. (1992), "Efficient SAS Software Programming: A Version 6 Update," Proceedings of the Seventeenth Annual SAS
Users Group International Conference, 207-212.
Lafler, Kirk Paul (2014), "Top Ten SAS Performance Tuning Techniques," MWSUG 2014 and SESUG 2014 Conferences.
Lafler, Kirk Paul (2012), "Top Ten SAS Performance Tuning Techniques," KCASUG 2012 Conference.
Lafler, Kirk Paul (2012), "Top Ten SAS Performance Tuning Techniques," MWSUG 2012, SCSUG 2012, NESUG 2012 Conferences.
Lafler, Kirk Paul (2012), "Essential SAS Coding Techniques for Gaining Efficiency," MWSUG 2012 Conference.
Lafler, Kirk Paul (2009), "SAS Performance Tuning Techniques," Twin Cities Area SAS Users Group (TCASUG) 2009 Meeting.
Lafler, Kirk Paul (2007), "SAS Performance Tuning Techniques," WUSS 2007 Conference.
Lafler, Kirk Paul (2000), "Efficient SAS Programming Techniques," MWSUG 2000 Conference.
Lafler, Kirk Paul (1985), "Optimization Techniques for SAS Applications," Proceedings of the Tenth Annual SAS Users Group
International Conference, 530-532.
Polzin, Jeffrey A. (1994), "DATA Step Efficiency and Performance," Proceedings of the Nineteenth Annual SAS Users Group
International Conference, 1574-1580.
SAS Institute Inc. (1990), SAS Programming Tips: A Guide to Efficient SAS Processing, Cary, NC, USA.
Valentine-Query, Paige (1991), "Introduction to Efficient Programming Techniques," Proceedings of the Sixteenth Annual SAS
Users Group International Conference, 266-270.
Wilson, Steven A. (1994), "Techniques for Efficiently Accessing and Managing Data," Proceedings of the Nineteenth Annual SAS
Users Group International Conference, 207-212.
Acknowledgments
The author thanks Brian Varney and Misty Johnson, MWSUG 2014 SAS 101 Section Chairs, for accepting my abstract and paper;
as well as Cindy Lee, MWSUG 2014 Academic Chair, Craig Wildeman, MWSUG 2014 Operations Chair, and the MWSUG
Executive Committee for organizing a great conference!
Trademark Citations
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the
USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective
companies.
Author Information
Kirk Paul Lafler is consultant and founder of Software Intelligence Corporation and has been using SAS since 1979. He is a SAS
Certified Professional, provider of IT consulting services, trainer to SAS users around the world, mentor, and sasCommunity.org
emeritus Advisory Board member. As the author of six books including Google Search Complete (Odyssey Press, 2014); PROC
SQL: Beyond the Basics Using SAS, Second Edition (SAS Press, 2013); and PROC SQL: Beyond the Basics Using SAS (SAS Press, 2004),
Kirk has written more than five hundred papers and articles, been an invited speaker and trainer at four hundred-plus SAS
international, regional, special-interest, local, and in-house user group conferences and meetings, and is the recipient of 23
best contributed paper, hands-on workshop (HOW), and poster awards.
Comments and suggestions can be sent to:
Kirk Paul Lafler
Senior SAS Consultant, Application Developer, Data Scientist, Trainer and Author
Software Intelligence Corporation
E-mail: [email protected]
LinkedIn: http://www.linkedin.com/in/KirkPaulLafler
Twitter: @sasNerd
Organization: ____________________________________________
Contact Date: ____________________________________________
"I am conducting a survey for a regional SAS user group paper that I am writing. The topic of the paper is efficiency and how it relates to the SAS
Software. Could you spare a few minutes to answer a few questions on this subject?"
1.
2. Have you received any training (formal or informal) in efficiency and performance strategies?
3.
4. Rate whether the following efficiency measurement categories have importance in your environment.
(Use the following rating scale: 1=Not Important, 2=Somewhat Important, 3=Very Important.)
a. _____ CPU Time
d. _____ I/O
e. _____ Memory
5. In response to question #4, which measurement has the greatest importance in your environment? ________________
Why?: _____________________________________________________________________________________________
6. At what time(s) during the application development process do you consider using efficiency and performance techniques?
____
____
____
____
____ Testing Phase
____ Implementation Phase
____ Maintenance/Enhancement Phase
7. Rate the following techniques and/or strategies that you have used in your environment to improve a program's/application's
efficiency and/or performance? (Use the following rating scale: 1=Never, 2=Sometimes, 3=Always.)
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
_____
Other: