0% found this document useful (0 votes)

68 views12 pages

Efficiency Considerations Using The SAS System: Rick Langston, SAS Institute Inc., Cary, NC

This paper examines a few of the most common questions regarding efficient SAS code. The main focus is to examine cpu usage differences in the areas listed below. If statements versus where statements in data steps if statements versus SELECT statements data step views Functions versus arrays versus direct SAS coding data step stored programs threaded versus non-threaded performance for PROC.

Uploaded by

itsmesaru

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

68 views12 pages

Efficiency Considerations Using The SAS System: Rick Langston, SAS Institute Inc., Cary, NC

Uploaded by

itsmesaru

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

SUGI 30 Applications Development

Paper 002-30

Efficiency Considerations Using the SAS® System

Rick Langston, SAS Institute Inc., Cary, NC

ABSTRACT
There are many products being released by SAS Institute that either use an abundance of SAS code, or generate
SAS code, or both. We have begun an internal process to assist product developers in reviewing this SAS code for
issues of efficiency, host compatibility, performance, and so on. We also realize that these issues are of interest and
benefit to all SAS programmers. This paper is a collection of some of our initial performance analyses in writing
efficient SAS code.

INTRODUCTION
This paper examines a few of the most common questions regarding efficient SAS code. Its main focus is to
examine CPU usage differences in the areas listed below. Except where noted, all examples were tested on
production versions of SAS on UNIX platforms.
• IF statements versus WHERE statements in DATA steps
• IF statements versus SELECT statements
• PROC SORT versus SQL ORDER BY
• Issues involving DATA step views
• Functions versus arrays versus direct SAS coding
• DATA step stored programs
• Threaded versus non-threaded performance for PROC SUMMARY
• Table lookup comparing PROC FORMAT, merging, and the hash table object

It is important to understand that the real times and CPU times shown in the logs in this paper are not necessarily
indicative of the times you may see if you try the same tests. These times vary greatly between operating systems
and between machine configurations. The times are used here primarily for purposes of comparison to help you to
make decisions on SAS coding practices.

IF OR WHERE
A common question regarding efficiency is "Which do I use: IF or WHERE?" These examples compare the
performance of three different methods of subsetting a third of the observations in a 500,000-observation data set.

SUBSETTING IF STATEMENT
data temp; set mydata.x500000;
if mod(obs,3)=0;
run;

NOTE: There were 500000 observations read from

the data set MYDATA.X500000.
NOTE: The data set WORK.TEMP has 166666 observations
and 11 variables.
NOTE: DATA statement used:
real time 8.44 seconds
cpu time 1.73 seconds

WHERE CLAUSE
data temp; set mydata.x500000;
where mod(obs,3)=0;
run;

NOTE: There were 166666 observations read

from the data set MYDATA.X500000.
WHERE MOD(obs, 3)=0;
NOTE: The data set WORK.TEMP has 166666
observations and 11 variables.

1
SUGI 30 Applications Development

NOTE: DATA statement used:

real time 8.66 seconds
cpu time 2.18 seconds

WHERE SPECIFICATION IN PARENTHETICAL OPTIONS

data temp; set mydata.x500000(where=(mod(obs,3)=0));
run;

NOTE: There were 166666 observations read

from the data set MYDATA.X500000.
WHERE MOD(obs, 3)=0;
NOTE: The data set WORK.TEMP has 166666 observations and 11 variables.
NOTE: DATA statement used:
real time 8.89 seconds
cpu time 2.18 seconds

In the three previous examples, the subsetting IF statement is slightly faster, but not significantly so. Interestingly,
the WHERE statement is a little faster than the parenthetical WHERE statement.

Performance in these examples is close enough that the choice of an IF statement versus a WHERE statement
versus a WHERE option is arbitrary. However, if the subsetting is to be performed using a LIBNAME engine against
a database that is optimized for WHERE processing, then the WHERE choice is preferred. A subsetting IF
statement requires all observations to be retrieved from the database, with the DATA step code determining whether
the observation is to be kept. Using a WHERE statement in these circumstances retrieves only those observations
that match the criteria for subsetting.

IF-THEN/ELSE OR SELECT
To generate a significant number of IF-THEN/ELSE statements and SELECT/WHEN statements, we use the
following SAS code to create programs with 100 clauses of each type. The code assigns a random value to y based
on increasing values of x. The file SASCODE1 will contain IF-THEN statements, and the file SASCODE2 will contain
SELECT/WHEN statements. SASCODE1 and SASCODE2 are both TEMP files that are referenced by a %INCLUDE
statement when each DATA step program is run.

filename sascode1 temp;

filename sascode2 temp;
data _null_;
file sascode2; put 'select(x1);';
do i=1 to 100;
j=ranuni(43431);
file sascode1;
if i>1 then put 'else ' @;
put 'if x1=' i ' then ' j= ';';
file sascode2;
put 'when(' i ')' j= ';';
end;
file sascode2; put 'otherwise; end;';
run;

IF-THEN/ELSE STATEMENTS
19 data temp; set mydata.x500000;
20 %include sascode1/source2;
21 +if x1=1 then j=0.1093699425 ;
22 +else if x1=2 then j=0.9230094435 ;
23 +else if x1=3 then j=0.7694377665 ;
...
119 +else if x1=99 then j=0.3876921946 ;
120 +else if x1=100 then j=0.9119509677 ;
121 run;
...
NOTE: DATA statement used:
real time 9.17 seconds
cpu time 2.61 seconds

2
SUGI 30 Applications Development

SELECT/WHEN STATEMENTS
122 data temp; set mydata.x500000;
123 %include sascode2/source2;
124 +select(x1);
125 +when(1 )j=0.1093699425 ;
126 +when(2 )j=0.9230094435 ;
...
223 +when(99 )j=0.3876921946 ;
224 +when(100 )j=0.9119509677 ;
225 +otherwise; end;
226 run;
...
NOTE: DATA statement used:
real time 9.08 seconds
cpu time 2.51 seconds

The SAS log shows that 100 IF-THEN/ELSE statements took 2.61 CPU seconds to run, while the corresponding
SELECT statement with 100 WHEN clauses took a little less time, CPU 2.51 seconds.

The SELECT statement is only slightly better in this example. WHEN clauses can have scalar values only, whereas
IF-THEN/ELSE statements don’t have that limitation.

Remember that the ELSE clause makes a big difference in IF-THEN/ELSE performance. If the ELSE clause had
been omitted here, the example would have run much slower. Consider another example:

if x=1 and y=2 then z=3;

else if x=3 and y=4 then z=7;

In the first case, if x=1 and y=2, then it is not possible for x=3 and y=4, so it is appropriate to provide an ELSE
clause. If the ELSE clause were omitted, and the condition of x=1 and y=2 were true, then the execution of the
second IF statement would be unnecessary and cause additional execution time.

But consider this example:

if x=1 and y=2 then z=3;

if z=3 and q=7 then z=7;

The conditions in the two IF statements are not mutually exclusive. Therefore, adding an ELSE clause might cause z
to not be properly set. You must examine your specifications carefully when using IF-THEN/ELSE statements.

Remember also that you can't always substitute a SELECT/WHEN statement for an IF-THEN/ELSE statement. Here
is an example:

if x=1 and y=2 then z=3;

else if x=3 and y=4 then z=7;

This expression cannot be represented using a simple SELECT statement. You could use the following scenario,
which has an implied ELSE clause, but it has no advantage over the IF-THEN/ELSE statement.

select(1);
when(x=1 and y=2) do; z=3; end;
when(x=3 and y=4) do; z=7; end;
end;

SORT OR SQL
This example demonstrates the performance difference between using SQL to sort a data set and using the
traditional PROC SORT approach. Examples show three ways to create a data set TEMP1 with the variables obs
and x3, to sort by those variables, and then to subset.

DATA STEP AND PROC SORT

6 data temp1; set mydata.x500000(keep=obs x3);

NOTE: DATA statement used:

3
SUGI 30 Applications Development

real time 8.31 seconds

cpu time 1.54 seconds

7 proc sort; by x3 obs;

NOTE: PROCEDURE SORT used:

real time 4.62 seconds
cpu time 4.53 seconds

PROC SORT WITH KEEP OPTION

8 proc sort data=mydata.x500000(keep=obs x3)
9 out=temp2; by x3 obs;

NOTE: PROCEDURE SORT used:

real time 14.41 seconds
cpu time 5.67 seconds

The execution time is slightly longer for the multi-step process. You might find it faster to combine the subsetting and
sorting into a single step.

PROC SQL WITH ORDER BY CLAUSE

11 proc sql;
12 create table temp3 as
13 select obs,x3 from mydata.x500000
14 order by x3,obs;
15 run;
16

NOTE: PROCEDURE SQL used:

real time 16.33 seconds
cpu time 7.96 seconds

Here we perform the same subsetting in PROC SQL using SELECT ... FROM and perform the sorting via the
ORDER BY clause. The execution time is greater, just under eight seconds.

Deciding whether to use PROC SQL depends partly on what engine is involved. SAS/ACCESS engines might be
able to pass SQL code directly to the database for faster execution. PROC SORT can be very competitive in its
sorting techniques. Often you will either be intimately familiar with SAS code and choose PROC SORT, or you will
choose SQL based on that familiarity.

USING DATA STEP VIEWS

DATA step views can be handy, but you must proceed with caution when using them. Using the DATA step to create
a new data set might be less costly in terms of performance because the user of the view might not be aware of the
code that is used to define the view. In the following examples, one DATA step reads a view that is created via the
INFILE and INPUT statements. The other DATA step creates a permanent SAS data set.

DATA STEP VIEWS

1 data fromfile/view=fromfile;
2 infile 'mydata.dat';
3 input x y z;
4 run;
...
5
6 data new; set fromfile;
7 run;

NOTE: View WORK.FROMFILE.VIEW used:

real time 5.69 seconds
cpu time 3.94 seconds

NOTE: DATA statement used:

real time 5.72 seconds
cpu time 1.62 seconds

4
SUGI 30 Applications Development

SAS DATA SET

9 data new;
10 infile 'mydata.dat';
11 input x y z;
12 run;

NOTE: DATA statement used:

real time 4.24 seconds
cpu time 4.02 seconds
13
14 data new2; set new;
15 run;

NOTE: DATA statement used:

real time 1.81 seconds
cpu time 1.70 seconds

Notice that the overall CPU execution time for the DATA step view example is 3.94 seconds (for the view execution)
+ 1.62 seconds (for the DATA step execution). By contrast, if we create the SAS data set and use a SET statement
on the data set instead of the view, the CPU time is considerably less, only 1.7 seconds. You need to be careful
when using views, especially when dealing with external files. The dynamic nature of external files makes them
powerful but CPU intensive.

FUNCTION OR HARDCODING OR ARRAY

Here we consider several ways to accomplish the same task—determining the sum of a list of variables in a
500,000-observation data set. The first example uses the SUM function. The second uses a DIY approach—Do It
Yourself—with SAS operators to see if the code runs faster. The third example tests the use of an ARRAY statement
to accumulate sums. The execution time should be greater due to the loop iteration.

SUM FUNCTION
5 data new; set mydata.x500000;
6 sum = sum(of x1-x10);
7 run;

NOTE: DATA statement used:

real time 9.84 seconds
cpu time 2.04 seconds

SAS OPERATORS
9 data new; set mydata.x500000;
10 sum = x1+x2+x3+x4+x5+x6+x7+x8+x9+x10;
11 run;

NOTE: DATA statement used:

real time 9.43 seconds
cpu time 1.93 seconds

ARRAY STATEMENT

13 data new; set mydata.x500000;

14 array x x1-x10;
15 sum = 0;
16 do over x;
17 sum + x;
18 end;
19 run;

NOTE: DATA statement used:

real time 10.37 seconds
cpu time 2.88 seconds

5
SUGI 30 Applications Development

Using the SUM function to sum a list of variables takes CPU 2.04 seconds, while the DIY method with SAS operators
takes CPU 1.93 seconds. The ARRAY statement uses CPU 2.88 seconds to complete the task. There’s not a great
deal of performance improvement gained in the DIY method. If the function is performing more complex operations,
it’s certainly better to rely on the function because the code is simpler and easier to maintain.

STORED PROGRAM CONSIDERATIONS

This section considers DATA step stored programs as a means to improve efficiency. Stored programs are pre-
compiled DATA steps that are subsequently executed with only a resolution phase and no subsequent compilation
phase.

The following SAS macro program helps demonstrate stored program usage. The macro produces SAS code with a
specified number of IF statements and with a different specified number of passes. This macro generates SAS code
that in turn generates other SAS code. The program is run as a single DATA step and as a stored program for
comparison.
filename sascode temp;
%macro runtest(dsname,npasses,nlines);
data _null_; file sascode;
put "do npasses=1 to &npasses;";
put " x=ceil(ranuni(13131)*&nlines);";
do i=1 to &nlines;
put @5 'if x=' i ' then y=' i ';';
end;
put @5 'end;';
put 'run;';
data &dsname.; %include sascode;
data &dsname./pgm=&dsname.; %include sascode;
data pgm=&dsname.; execute; run;
%mend;

RUNNING THE SAS CODE WITH 1 PASS AND 10 IF STATEMENTS

22 %runtest(new1,1,10);
MPRINT(RUNTEST): data _null_;
MPRINT(RUNTEST): file sascode;
MPRINT(RUNTEST): put "do npasses=1 to 1;";
MPRINT(RUNTEST): put " x=ceil(ranuni(13131)*10);";
MPRINT(RUNTEST): do i=1 to 10;
MPRINT(RUNTEST): put @5 'if x=' i ' then y=' i ';';
MPRINT(RUNTEST): end;
MPRINT(RUNTEST): put @5 'end;';
MPRINT(RUNTEST): put 'run;';

RUNNING THE SAS CODE WITH 1000 PASSES

MPRINT(RUNTEST): data new3;
MPRINT(RUNTEST): do npasses=1 to 1000;
MPRINT(RUNTEST): x=ceil(ranuni(13131)*10);
MPRINT(RUNTEST): if x=1 then y=1 ;
MPRINT(RUNTEST): if x=2 then y=2 ;
...
MPRINT(RUNTEST): if x=10 then y=10 ;
MPRINT(RUNTEST): end;
MPRINT(RUNTEST): run;

NOTE: DATA statement used:

real time 0.03 seconds
cpu time 0.01 seconds

COMPILING THE SAS CODE AS A STORED PROGRAM

MPRINT(RUNTEST): data new3/pgm=new3;
...

6
SUGI 30 Applications Development

STORED PROGRAM WITH 1000 PASSES AND 10 STATEMENTS

MPRINT(RUNTEST): data pgm=new3;
MPRINT(RUNTEST): execute;
MPRINT(RUNTEST): run;

NOTE: DATA statement used:

real time 0.03 seconds
cpu time 0.02 seconds

With 1000 passes, but only 10 statements, the compile-and-run DATA step is a little faster than the stored program.
The compilation time isn’t counted toward the final execution time. This example shows that the compile phase for a
small program is a tiny fraction of the overall execution, so creating a stored program is not very beneficial to overall
performance.

STORED PROGRAM WITH 1000 PASSES AND MANY STATEMENTS

MPRINT(RUNTEST): data new3;
MPRINT(RUNTEST): do npasses=1 to 1000;
MPRINT(RUNTEST): x=ceil(ranuni(13131)*1000);
MPRINT(RUNTEST): if x=1 then y=1 ;
MPRINT(RUNTEST): if x=2 then y=2 ;
...
MPRINT(RUNTEST): if x=999 then y=999 ;
MPRINT(RUNTEST): if x=1000 then y=1000 ;
MPRINT(RUNTEST): end;
MPRINT(RUNTEST): run;

NOTE: DATA statement used:

real time 0.46 seconds
cpu time 0.23 seconds

MPRINT(RUNTEST): data new3/pgm=new3;

MPRINT(RUNTEST): data pgm=new3;

MPRINT(RUNTEST): execute;
MPRINT(RUNTEST): run;

NOTE: DATA statement used:

real time 0.18 seconds
cpu time 0.09 seconds

Here the compile-and-execute time is greater than when running only the stored program. In those rare cases where
the program size is indirectly proportional to execution passes, the stored program might be useful.

Be aware that if a stored program contains macros, they are resolved at compilation time and have no further
bearing. In this example, the &MYVALUE macro variable is reset to second, but it has no bearing when the stored
program is run a second time.

1 %let myvalue=first;
2 data xyz/pgm=xyz;
3 x="&myvalue.";
4 put x=;
5 run;

7 data pgm=xyz; run;

x=first

8
9 %let myvalue=second;
10 data pgm=xyz; run;

x=first

We can conclude that the larger the compiled code, with fewer execution passes, the more beneficial the stored
program can be.
7
SUGI 30 Applications Development

COMPARISON OF VARIOUS KEY LOOKUP METHODS

This section examines different key lookup methods:
• formats invoked via the PUT function
• sorting and merging
• the KEY= option
• the hash object
The first step is to create a demonstration key data set where the keys go from 1 to 5000, with the corresponding
label being only the Z4 representation of the key, such as 15 having the label of 0015. The result is stored as a
CNTLIN= data set for subsequent use by PROC FORMAT. The second step is to create the format, a one-time
operation. The final step is to create some random test data. The test data set has 100,000 observations and three
key variables, each getting random values between 1 and 5000.

data cntlin;
fmtname='testfmt';
do start=1 to 5000;
label=put(start,z4.);
output;
end;
run;

8 proc format cntlin=cntlin;

NOTE: Format TESTFMT has been output.
8 ! run;

NOTE: PROCEDURE FORMAT used (Total process time):

real time 0.26 seconds
cpu time 0.21 seconds
11 data keytest;
12 array keys key1-key3;
13 do i=1 to 100000;
14 do over keys;
15 keys=ceil(ranuni(13131)*5000);
16 end;
17 output;
18 end;
19 run;

FORMATS AND THE PUT FUNCTION

This example illustrates how to use the format for key lookup. The PUT function uses the format to create a
character value. The variable key1x contains the label for the corresponding key1 value.

22 data new1a; set keytest;

23 key1x=put(key1,testfmt.);
24 run;

NOTE: DATA statement used (Total process time):

real time 0.56 seconds
cpu time 0.54 seconds

MERGING

The more traditional way to perform key lookup is a merge. To perform a merge, the input data set must first be
sorted by the key. The sorting step is included in the overall CPU time.

26 proc sort data=keytest out=keytestx; by key1;

NOTE: PROCEDURE SORT used (Total process time):

real time 2.96 seconds
cpu time 0.65 seconds

27 data new1b; merge keytestx(in=want)

28 cntlin(keep=start label rename=(start=key1));
29 by key1;
8
SUGI 30 Applications Development

30 if want;
31 run;

NOTE: There were 100000 obs read from WORK.KEYTESTX.

NOTE: There were 5000 obs read from WORK.CNTLIN.
NOTE: The data set WORK.NEW1B has 100000 obs and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.39 seconds
cpu time 0.29 seconds

Note that the CPU time using the PUT function was .54 seconds. The sort, along with the merge, takes .65 + .29
seconds, or .94 CPU seconds, as compared to .54 CPU seconds for the format usage. But you might need an
additional sort. Remember that if the necessary sort disrupted the observations so that they needed to be sorted
again, more CPU time is consumed as shown here:

32 proc sort data=new1b; by i;

NOTE: PROCEDURE SORT used (Total process time):

real time 0.48 seconds
cpu time 0.48 seconds

The format and PUT function method does not disrupt the observation order. Multiple keys can be handled within the
same pass using the PUT function.

34 data new1a; set keytest;

35 key1x=put(key1,testfmt.);
36 key2x=put(key2,testfmt.);
37 key3x=put(key3,testfmt.);
38 run;

But for SORT/MERGE, you must sort and merge by each key separately, adding greatly to CPU time usage.

proc sort data=keytest out=new1b; by key1;

data new1b; merge new1b(in=want)
cntlin(keep=start label
rename=(start=key1 label=key1x));
by key1;
if want;
run;
proc sort data=new1b; by key2;

INDEXING

To illustrate using an indexed data set, the next step is to create the CNTLINX data set from the CNTLIN data set,
adding an index so that a key lookup can be performed to locate the corresponding label value.

58 data cntlinx(index=(start)); set cntlin;

60 data new1b; set keytest;

61 array keys key1-key3;
62 length key1x key2x key3x $4;
63 array keyx key1x key2x key3x;
64 do over keys;
65 start=9999; link getkey; _iorc_=0; _error_=0;
66 start=keys; link getkey; keyx=label;
67 end;
68 return;
69 getkey: set cntlinx key=start;
70 return;
71 run;

NOTE: DATA statement used (Total process time):

real time 26.85 seconds
cpu time 26.53 seconds

9
SUGI 30 Applications Development

Looking for the same key two times in a row is a problem that requires the introduction of a dummy key. Adding the
dummy key adds to the execution time. The real time increases dramatically: 26.85 seconds as compared to less
than 1 second using the other approaches.

HASH OBJECT

The hash object is a new feature in SAS®9 and a foray into object-oriented programming.

74 data new1c; set keytest;

75 array keys key1-key3;
76 length key1x key2x key3x $4;
77 array keyx key1x key2x key3x;
78 length start 8 label $4;
79 if _n_=1 then do;
80 declare hash h(hashexp: 4,
dataset:"work.cntlin", ordered: 'a');
81 h.defineKey('start');
82 h.defineData('label');
83 h.defineDone();
84 end;
85
86 do over keys;
87 start=keys; rc = h.find();
88 if rc=0 then keyx=label;
89 end;
90 run;
NOTE: DATA statement used (Total process time):
real time 0.82 seconds
cpu time 0.82 seconds

Populating the hash table along with all the searching took only .82 seconds. Although a little longer than the other
methods, the hash object method is faster when you consider that the setup time is included. Notice that the hash
table does not persist across DATA steps and must be re-created each time it is used.

THREADS OR NOTHREADS
To demonstrate threading in PROC SUMMARY, the first step is to create a large data set with over 7 million
observations. There are about 500 different values of the CLASS variable and up to 30,000 observations per class.
The values of x are random, with y and z being 1 and 2 larger, respectively.

data temp;
do cv=1 to 500;
classvar=ceil(ranuni(13131)*500);
nobs = ceil(ranuni(13131)*30000);
do i=1 to nobs;
x = ceil(ranuni(13131)*1000);
y = x + 1;
z = x + 2;
output;
end;
end;
keep classvar x y z;
run;

NOTHREADS OPTION
23 proc summary data=temp nothreads;
24 class classvar;
25 format classvar z5.;
26 var x y z;
27 output out=new
28 mean=mx my mz min=minx miny minz
29 var=varx vary varz;
30 run;

NOTE: PROCEDURE SUMMARY used (Total process time):

real time 16.32 seconds
cpu time 16.31 seconds

10
SUGI 30 Applications Development

THREADS OPTION
31 proc summary data=temp threads;
32 class classvar;
33 format classvar z5.;
34 var x y z;
35 output out=new
36 mean=mx my mz min=minx miny minz
37 var=varx vary varz;
38 run;

NOTE: PROCEDURE SUMMARY used (Total process time):

real time 13.93 seconds
cpu time 31.16 seconds

The first example ran explicitly with the NOTHREADS option. CPU time and real time match at 16.31 seconds. Note
that we use a FORMAT statement, multiple variables, and multiple statistics to compute this. Run with the THREADS
option, the real time for the same example is less than the CPU time because multiple threads run simultaneously,
and the CPU time is the sum of the thread execution time. The real time, 13.93 seconds, is a reduction over the
NOTHREADS run, which is the desired result.

The following examples show how different hardware platforms can influence performance and how each platform
can have significantly different performance indicators. To set up the example, the number of CLASS variables is
increased and the examples are run with the NOTHREADS and THREADS options on two different UNIX machines.

26 proc summary data=temp [threads] [nothreads];

27 class classvar01-classvar10;
28 format classvar01-classvar10 z5.;
...

UNIX 1, USING NOTHREADS

NOTE: PROCEDURE SUMMARY used (Total process time):
real time 1:10.76
cpu time 1:00.62

UNIX 1, USING THREADS

NOTE: PROCEDURE SUMMARY used (Total process time):
real time 32.36 seconds
cpu time 1:38.40

UNIX 2, USING NOTHREADS

NOTE: PROCEDURE SUMMARY used (Total process time):
real time 1:19.89
cpu time 1:19.68

UNIX 2, USING THREADS

NOTE: PROCEDURE SUMMARY used (Total process time):
real time 37.17 seconds
cpu time 2:58.69

Using the THREADS option on UNIX 1 is a considerable improvement in real time, going from 1:10.76 down to
32.36, using only 38.40 more seconds of CPU. While UNIX 2 has somewhat similar improvements in real time, the
THREADS option used more than twice the CPU time as the NOTHREADS option.

To benefit from threading, you need large amounts of data and you need to run on a multiprocessing machine. Not
all SAS procedures are threaded, although more procedures are being converted to support threading capability.
The THREADS/NOTHREADS global option can control whether any procedure will run with threading enabled.

11
SUGI 30 Applications Development

CONCLUSION
This paper provides examples of several choices that the SAS programmer can make regarding efficiency. In each
case, there are advantages and disadvantages to the choices. The SAS programmer should consider all aspects of
the application to make the appropriate choice.

RESOURCES
Ray, Robert. 2000. “Version 8 Base SAS® Performance: How Does It Stack Up?” Proceedings of the Twenty-fifth
Annual SAS Users Group International Conference. Available
http://www2.sas.com/proceedings/sugi25/25/aa/25p009.pdf.

Ray, Robert. 2003. “An Inside Look at Version 9 and 9.1 Threaded Base SAS® Procedures.” Proceedings of the
Twenty-eighth Annual SAS Users Group International Conference. Available
http://www2.sas.com/proceedings/sugi28/282-28.pdf.

Shamlin, David. 2004. “Threads Unraveled: A Parallel Processing Primer.” Proceedings of the Twenty-ninth Annual
SAS Users Group International Conference. Available http://www2.sas.com/proceedings/sugi29/217-29.pdf.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author:
Rick Langston
SAS Institute Inc.
SAS Campus Drive
Cary, NC 27513
Phone: (919) 677-8000
E-mail: [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

SAS Advanced Exam Questions
100% (1)
SAS Advanced Exam Questions
11 pages
Advanced Programmingsas 94
No ratings yet
Advanced Programmingsas 94
5 pages
Sascheatsheet 170401221255
100% (1)
Sascheatsheet 170401221255
29 pages
Sas 1
100% (2)
Sas 1
50 pages
SAS Ex
No ratings yet
SAS Ex
34 pages
Sas Sample Syntax
No ratings yet
Sas Sample Syntax
34 pages
Base Five 08
No ratings yet
Base Five 08
19 pages
Vibeeshcsa15031sas1 150920141056 Lva1 App6891
No ratings yet
Vibeeshcsa15031sas1 150920141056 Lva1 App6891
44 pages
Notes For SAS Programming Fall2009
No ratings yet
Notes For SAS Programming Fall2009
88 pages
Class 5 Notes
No ratings yet
Class 5 Notes
33 pages
LBSIM Business Analytics Slides - Day 6
No ratings yet
LBSIM Business Analytics Slides - Day 6
33 pages
Base Programming Ref Sheet
No ratings yet
Base Programming Ref Sheet
4 pages
Merge
0% (1)
Merge
16 pages
SAS Material
No ratings yet
SAS Material
75 pages
Effective Use of SQL in SAS Programming
100% (1)
Effective Use of SQL in SAS Programming
8 pages
ST521 Practice FinalKeyV3
No ratings yet
ST521 Practice FinalKeyV3
11 pages
Set B
No ratings yet
Set B
9 pages
Sas Programming
No ratings yet
Sas Programming
30 pages
Sas Library Dataset Creation
No ratings yet
Sas Library Dataset Creation
44 pages
Mainframe Sas Online Training 3
No ratings yet
Mainframe Sas Online Training 3
18 pages
BD SAS Test Part 1 - 20 Mins
No ratings yet
BD SAS Test Part 1 - 20 Mins
6 pages
57 Paper
No ratings yet
57 Paper
5 pages
DATA Cars1
No ratings yet
DATA Cars1
6 pages
SAS If and Where
No ratings yet
SAS If and Where
12 pages
Efficiency Techniques and Methods Kelley Weston Q2 2009
No ratings yet
Efficiency Techniques and Methods Kelley Weston Q2 2009
46 pages
A Few Alternatives To IF-THEN-ELSE
No ratings yet
A Few Alternatives To IF-THEN-ELSE
2 pages
Keiko I. Powers, PH.D., J. D. Power and Associates, Westlake Village, CA
No ratings yet
Keiko I. Powers, PH.D., J. D. Power and Associates, Westlake Village, CA
4 pages
Imelda C. Go, Lexington County School District One, Lexington, SC
No ratings yet
Imelda C. Go, Lexington County School District One, Lexington, SC
4 pages
SAS Chapter 10
No ratings yet
SAS Chapter 10
5 pages
SAS Chapter 03
No ratings yet
SAS Chapter 03
6 pages
Advanced SAS Interview Questions and Answers
No ratings yet
Advanced SAS Interview Questions and Answers
11 pages
Sas Data Statement
No ratings yet
Sas Data Statement
17 pages
Sas 201
No ratings yet
Sas 201
17 pages
Sas Faq V1.3
No ratings yet
Sas Faq V1.3
56 pages
Sastest
No ratings yet
Sastest
7 pages
Sas 101
No ratings yet
Sas 101
17 pages
Sas 201
No ratings yet
Sas 201
17 pages
Sas 101
No ratings yet
Sas 101
17 pages
Sas 101
No ratings yet
Sas 101
17 pages
Here Is The Output Produced by The Proc Print Statement Above
No ratings yet
Here Is The Output Produced by The Proc Print Statement Above
6 pages
Sas 101
No ratings yet
Sas 101
17 pages
Sas 101
No ratings yet
Sas 101
17 pages
SAS Virtual Learning Environment Sessions
No ratings yet
SAS Virtual Learning Environment Sessions
5 pages
SAS Adv 63 No Key
No ratings yet
SAS Adv 63 No Key
35 pages
Day 1
No ratings yet
Day 1
13 pages
Sas Base & Advance
No ratings yet
Sas Base & Advance
4 pages
PSTAT 130 Final Solutions
No ratings yet
PSTAT 130 Final Solutions
10 pages
Gits Academy-: Test-Mcq'S DATE-14 Jan 2012 Q.1
No ratings yet
Gits Academy-: Test-Mcq'S DATE-14 Jan 2012 Q.1
6 pages
SAS Programming by Example (14) : Chapter 14 Efficiency Making Your Programs More Efficient
No ratings yet
SAS Programming by Example (14) : Chapter 14 Efficiency Making Your Programs More Efficient
9 pages
Sas fAQ'S 1
No ratings yet
Sas fAQ'S 1
114 pages
Carpenter's Complete Guide To The SAS (R) Macro Language
No ratings yet
Carpenter's Complete Guide To The SAS (R) Macro Language
55 pages
F054 Meet Dodiya Lab 5 jveAAqJfpG
No ratings yet
F054 Meet Dodiya Lab 5 jveAAqJfpG
6 pages
SAS Cheat Sheet
No ratings yet
SAS Cheat Sheet
2 pages
Approach, Method, and Technique
100% (1)
Approach, Method, and Technique
23 pages
Quick Reference: SAS Programming 1: Essentials
No ratings yet
Quick Reference: SAS Programming 1: Essentials
10 pages
Sas Interview
No ratings yet
Sas Interview
12 pages
Math C4 Practice
No ratings yet
Math C4 Practice
53 pages
Ericsson The Bss To Cloud Journey
No ratings yet
Ericsson The Bss To Cloud Journey
26 pages
Nomination Facility Provided by Banks MCQs With Case Study
No ratings yet
Nomination Facility Provided by Banks MCQs With Case Study
8 pages
Logiq 500 GE
No ratings yet
Logiq 500 GE
407 pages
F5 Got It Pass Class Notes 2021 June
No ratings yet
F5 Got It Pass Class Notes 2021 June
221 pages
RM 64
No ratings yet
RM 64
632 pages
The Ergonomic Posture Assessment by Comparing REBA With RULA & OWAS: A Case Study in A Gas Springs Factory
No ratings yet
The Ergonomic Posture Assessment by Comparing REBA With RULA & OWAS: A Case Study in A Gas Springs Factory
23 pages
1 Vemm Regalit .
No ratings yet
1 Vemm Regalit .
4 pages
Bourdon Pressure - Gauges PDF
No ratings yet
Bourdon Pressure - Gauges PDF
2 pages
Supported Upgrade Paths For FortiOS Firmware 5.2
0% (1)
Supported Upgrade Paths For FortiOS Firmware 5.2
20 pages
IJCRT2310639
No ratings yet
IJCRT2310639
9 pages
Bacte - Medically Significant Fungi
No ratings yet
Bacte - Medically Significant Fungi
4 pages
S4 Writing 3 - Letter of Advice - Language Focus T (Updated)
No ratings yet
S4 Writing 3 - Letter of Advice - Language Focus T (Updated)
6 pages
RTE 1503 Unit 3 Self Test
No ratings yet
RTE 1503 Unit 3 Self Test
15 pages
x100 Pad 2 User Manual PDF
No ratings yet
x100 Pad 2 User Manual PDF
29 pages
Chapter 3 Abstract
No ratings yet
Chapter 3 Abstract
19 pages
11月13日文献（陈洁）a Neural Circuit for Bergamot Essential Oil‐Induced Anxiolytic Effects
No ratings yet
11月13日文献（陈洁）a Neural Circuit for Bergamot Essential Oil‐Induced Anxiolytic Effects
13 pages
B 2 B Sales Manager Checklist
No ratings yet
B 2 B Sales Manager Checklist
1 page
Foto Electrici2
No ratings yet
Foto Electrici2
98 pages
National Apprenticeship Training Scheme (NATS)
No ratings yet
National Apprenticeship Training Scheme (NATS)
5 pages
Ejercicios Recuperacion Presente Simple 4c2ba
No ratings yet
Ejercicios Recuperacion Presente Simple 4c2ba
4 pages
Designation
No ratings yet
Designation
12 pages
CMPC Pulp: Insulation Requirement: Heat Conservation (For Personnel Protection, See Notes 3 & 4) Service
No ratings yet
CMPC Pulp: Insulation Requirement: Heat Conservation (For Personnel Protection, See Notes 3 & 4) Service
3 pages
Sabino 2017
No ratings yet
Sabino 2017
15 pages
MVP Comprehensive Resource Impacts Agreement
No ratings yet
MVP Comprehensive Resource Impacts Agreement
16 pages
Fascinating Photos of Afghanistan in The 1960s Show Life Before The Taliban
No ratings yet
Fascinating Photos of Afghanistan in The 1960s Show Life Before The Taliban
1 page
Fis2014 Fish Biology Individual Assignment (50 Marks)
No ratings yet
Fis2014 Fish Biology Individual Assignment (50 Marks)
2 pages
Chapter-5 Push and Pull Model
No ratings yet
Chapter-5 Push and Pull Model
12 pages
Housekeeping
No ratings yet
Housekeeping
1 page
Projects With Microcontrollers And PICC
From Everand
Projects With Microcontrollers And PICC
Guillermo Perez Guillen
5/5 (1)
DRBD-Cookbook: How to create your own cluster solution, without SAN or NAS!
From Everand
DRBD-Cookbook: How to create your own cluster solution, without SAN or NAS!
Joerg Christian Seubert
No ratings yet
Interview Questions for IBM Mainframe Developers
From Everand
Interview Questions for IBM Mainframe Developers
Robert Wingate
1/5 (1)
Amazing Java: Learn Java Quickly
From Everand
Amazing Java: Learn Java Quickly
Andrei Besedin
No ratings yet
Advanced SAS Interview Questions You'll Most Likely Be Asked
From Everand
Advanced SAS Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet

Efficiency Considerations Using The SAS System: Rick Langston, SAS Institute Inc., Cary, NC

Uploaded by

Efficiency Considerations Using The SAS System: Rick Langston, SAS Institute Inc., Cary, NC

Uploaded by

SUGI 30 Applications Development

Efficiency Considerations Using the SAS® System

NOTE: There were 500000 observations read from

NOTE: There were 166666 observations read

NOTE: DATA statement used:

WHERE SPECIFICATION IN PARENTHETICAL OPTIONS

NOTE: There were 166666 observations read

filename sascode1 temp;

if x=1 and y=2 then z=3;

But consider this example:

if x=1 and y=2 then z=3;

if x=1 and y=2 then z=3;

DATA STEP AND PROC SORT

NOTE: DATA statement used:

real time 8.31 seconds

7 proc sort; by x3 obs;

NOTE: PROCEDURE SORT used:

PROC SORT WITH KEEP OPTION

NOTE: PROCEDURE SORT used:

PROC SQL WITH ORDER BY CLAUSE

NOTE: PROCEDURE SQL used:

USING DATA STEP VIEWS

DATA STEP VIEWS

NOTE: View WORK.FROMFILE.VIEW used:

NOTE: DATA statement used:

SAS DATA SET

NOTE: DATA statement used:

NOTE: DATA statement used:

FUNCTION OR HARDCODING OR ARRAY

NOTE: DATA statement used:

NOTE: DATA statement used:

13 data new; set mydata.x500000;

NOTE: DATA statement used:

STORED PROGRAM CONSIDERATIONS

RUNNING THE SAS CODE WITH 1 PASS AND 10 IF STATEMENTS

RUNNING THE SAS CODE WITH 1000 PASSES

NOTE: DATA statement used:

COMPILING THE SAS CODE AS A STORED PROGRAM

STORED PROGRAM WITH 1000 PASSES AND 10 STATEMENTS

NOTE: DATA statement used:

STORED PROGRAM WITH 1000 PASSES AND MANY STATEMENTS

NOTE: DATA statement used:

MPRINT(RUNTEST): data new3/pgm=new3;

MPRINT(RUNTEST): data pgm=new3;

NOTE: DATA statement used:

7 data pgm=xyz; run;

COMPARISON OF VARIOUS KEY LOOKUP METHODS

8 proc format cntlin=cntlin;

NOTE: PROCEDURE FORMAT used (Total process time):

FORMATS AND THE PUT FUNCTION

22 data new1a; set keytest;

NOTE: DATA statement used (Total process time):

26 proc sort data=keytest out=keytestx; by key1;

NOTE: PROCEDURE SORT used (Total process time):

27 data new1b; merge keytestx(in=want)

NOTE: There were 100000 obs read from WORK.KEYTESTX.

32 proc sort data=new1b; by i;

NOTE: PROCEDURE SORT used (Total process time):

34 data new1a; set keytest;

proc sort data=keytest out=new1b; by key1;

58 data cntlinx(index=(start)); set cntlin;

60 data new1b; set keytest;

NOTE: DATA statement used (Total process time):

74 data new1c; set keytest;

NOTE: PROCEDURE SUMMARY used (Total process time):

NOTE: PROCEDURE SUMMARY used (Total process time):

26 proc summary data=temp [threads] [nothreads];

UNIX 1, USING NOTHREADS

UNIX 1, USING THREADS

UNIX 2, USING NOTHREADS

UNIX 2, USING THREADS

You might also like