A Hands-On Introduction To SAS Programming: Casey Cantrell, Clarion Consulting, Los Angeles, CA
ABSTRACT
This workshop is intended to give the new programmer hands-on experience working with SAS. Although we will use
tools available in the SAS windowing environment, the workshop will address basics common to SAS running on all
operating systems. Topics include how to read data into SAS, how to work with data in SAS, and how to extract
information from a SAS system file. Where applicable, we will demonstrate both programming and graphical methods
to accomplish these tasks.
INTRODUCTION
SAS is a highly sophisticated information delivery system that can perform complex statistical analysis and advanced
data management tasks. However, even the inexperienced programmer can quickly acquire the skills necessary to
convert data into information. The SAS windowing environment provides an excellent opportunity for the new
programmer to gain firsthand experience working in SAS. Programs written to run under Windows can be ported to
other operating systems.
While including data within the program works well for small files, most of the time you will want to read data that are external to your
programs. In Windows, you can do this interactively or by writing the necessary code in your program. The obvious
advantage to writing the code is that your program then documents the source of your input file.
The keyword DATALINES tells SAS that the data are internal to the program. The analogous keyword, INFILE,
directs SAS to read the input data from the file specified in the INFILE statement. There are two ways to do this.
In the example shown in Figure 2, the INFILE statement includes the fully defined file name.
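As a minimal sketch of this first method, the DATA step below names the input file in full on the INFILE statement. The path, the output file name PUPS, and the variable layout are hypothetical; substitute the location and layout of your own file.

```sas
/* Hypothetical path and variable layout, for illustration only. */
data pups;
    infile "C:\mydata\pupdat.txt";   /* fully defined input file name */
    input name $ breed $ weight;     /* assumed space-delimited layout */
run;
```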
Typically, programmers will use the second method, which involves defining a nickname, or FILEREF. The
FILEREF serves as an abbreviated means of referring to the complete path and filename. The association is defined
through the FILENAME statement, which is like saying "When I use the name Mike, I am talking about Michael Smith
who lives at 123 Main St, Apt B, San Diego, California."
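The second method might look like the sketch below. We assume a FILEREF named PUPDAT; the path and the variable layout are hypothetical.

```sas
/* Associate the nickname (FILEREF) with the full path. */
/* The path shown is hypothetical.                       */
filename pupdat "C:\mydata\pupdat.txt";

data pups;
    infile pupdat;                   /* refer to the file by its FILEREF */
    input name $ breed $ weight;     /* assumed space-delimited layout */
run;
```

If the file moves, only the FILENAME statement needs to change; every INFILE statement that uses the FILEREF stays the same.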
Once you've defined a FILEREF, you may use it for the duration of your SAS
session. When you click on the File Shortcuts icon in the Explorer window, the
FILEREF, or nickname, will appear in the Active File Shortcuts list.
Since our example file, Pupdat, is a text file, clicking on the FILEREF
icon will open the file in Notepad.
Figure 4 Active File Shortcuts
You may also define a FILEREF, or shortcut, interactively from the Explorer window.
To do this, first click on the File Shortcuts icon in the Explorer Window. This opens the Active File Shortcuts
window as shown in Figure 4.
The new shortcut will now appear in the list of Active File Shortcuts.
To start the Wizard, select Import Data from the File menu.
This opens the dialogue box shown in Figure 8. Select the data source
type for your input file.
To store the file permanently, we must provide an explicit output destination, which we do in Figure 11 by pointing to
the appropriate library. Since we have already defined it, the library we nicknamed PUPS will appear in the list of
available libraries. Had we not defined it previously, we would need to do this first.
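For readers who prefer code to the Wizard, a roughly equivalent program can be sketched with the LIBNAME statement and PROC IMPORT. This is an illustration under assumptions: the folder, the CSV file name, and the output file name are hypothetical, and only the library nickname PUPS comes from the example above.

```sas
/* Define the library nickname; the folder is hypothetical. */
libname pups "C:\mydata\sasfiles";

/* Read a delimited file and store it permanently in PUPS. */
proc import datafile="C:\mydata\pupdat.csv"   /* hypothetical input file */
            out=pups.pupdat                   /* two-part name: library.file */
            dbms=csv
            replace;
    getnames=yes;                             /* first row holds variable names */
run;
```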
To export a file from SAS into a different file format, we would select Export Data from the File menu and reverse the
process.
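The export can also be written as code with PROC EXPORT. In this sketch the output path is hypothetical, and we export SASHELP.CLASS only because it is available in every SAS session.

```sas
/* Write a SAS file out as a comma-separated text file. */
proc export data=sashelp.class
            outfile="C:\mydata\class.csv"   /* hypothetical output path */
            dbms=csv
            replace;
run;
```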
SAS PROGRAMS
SAS programs are built using two key components: the DATA step and the PROC step. The DATA step is used to
create SAS files and/or modify their contents. PROC steps invoke prewritten procedures typically used to perform
statistical analysis. DATA steps produce SAS files, while PROCs most often generate results. The process is
illustrated in Figure 12.
RAW DATA -> DATA Step -> PROC Step -> RESULTS
DATA Step: DATA statement; programming statements...;
PROC Step: PROC statement; procedure statements...;
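There are two important things to keep in mind. First, your data must be in SAS system file format before you can run
any SAS procedures. Second, you may not mix DATA and PROC steps: each step must end before the next one begins,
so DATA step statements cannot appear inside a PROC step, or vice versa.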
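A small, self-contained sketch of the two components follows. The file name SCORES and its values are made up for illustration. The DATA step creates the SAS file and ends at its RUN statement; only then does the PROC step begin.

```sas
/* DATA step: turn raw data into a SAS file (made-up values). */
data scores;
    input name $ score;
    datalines;
Ann 90
Bob 85
;
run;

/* PROC step: use the SAS file to generate results. */
proc print data=scores;
run;
```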
SAS PROCEDURES
SAS procedures begin with the keyword PROC followed by the name of the procedure and the name of the file you
want to use in the procedure. Procedures may include options and/or optional statements specific to the procedure.
Although there are myriad procedures in the SAS system, we will discuss the following five, which you are certain to
use:
PROC CONTENTS - Display information about a file and its contents
PROC PRINT - Print the data in a file
PROC SORT - Rearrange the order of records in a file
PROC UNIVARIATE - Generate basic statistics for numeric variables
PROC FREQ - Generate frequency listings and cross tabulations
For the next set of exercises, we will be working with a SAS system file named CLASS which is stored in the
SASHELP library. The SASHELP library is automatically defined each time SAS is started. The two-part name for the
file is then SASHELP.CLASS.
PROC CONTENTS
First, let's examine the contents of the file. We can do this interactively using FSVIEW as previously discussed, or we
can write a program to provide similar information by running PROC CONTENTS as shown in Figure 13.
PROC CONTENTS lists variables in the file in alphabetical order (Figure 14). We may request an additional list
showing variables in the order they appear in the file by including the POSITION option in our program (Figure 15).
Figure 15 Using the POSITION option in PROC CONTENTS
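The two forms of PROC CONTENTS described above can be sketched as follows, using the SASHELP.CLASS file.

```sas
/* Default: variables are listed in alphabetical order. */
proc contents data=sashelp.class;
run;

/* POSITION adds a second list in the order the variables */
/* appear in the file.                                     */
proc contents data=sashelp.class position;
run;
```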
PROC PRINT
While PROC CONTENTS provides information about
the file, the PRINT procedure actually prints the data.
The default action for PROC PRINT is to print every
variable for every record in the file, plus an observation
number. Figures 17 and 18 show a PROC PRINT
program and the output it generates.
We can control both content and format of PROC PRINT output by using any of several optional statements.
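As a sketch, the first step below prints the file with all defaults. The second uses two optional statements: VAR to choose the variables and FORMAT to control how values are displayed. The particular variables and the 5.1 format are our own choices for illustration.

```sas
/* Default: every variable for every record, plus an observation number. */
proc print data=sashelp.class;
run;

/* VAR selects the variables; FORMAT controls how values are displayed. */
proc print data=sashelp.class;
    var name sex height weight;
    format height weight 5.1;   /* width 5, one decimal place */
run;
```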
PROC UNIVARIATE
Since it is always a good idea to run exploratory analysis
before working with a file, we'll run PROC UNIVARIATE to
get some additional information about our data. As seen in
Figure 22, UNIVARIATE provides several basic statistics,
including mean, mode, median, and standard deviation.
When we run UNIVARIATE without any options or optional
statements, the procedure generates statistics for every
numeric variable in the file. We may request statistics
for specific variables by listing them in the VAR statement.
An example is shown in Figure 21.
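Both ways of running the procedure can be sketched as follows. Choosing HEIGHT for the VAR statement is our assumption, based on its use later in the workshop.

```sas
/* No options: statistics for every numeric variable in the file. */
proc univariate data=sashelp.class;
run;

/* VAR statement: statistics for the listed variable only. */
proc univariate data=sashelp.class;
    var height;
run;
```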
Callouts in Figure 23: new output file, temporary, input SAS file, constant, result of operation.
Figure 23 - Creating new variables in SAS
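The figure itself is not reproduced here. As a sketch of the same idea, the DATA step below reads an input SAS file, writes a new temporary file, and creates two variables: one set to a constant and one holding the result of an operation. The names CLASS2, SCHOOL, and HEIGHT_CM, and the value assigned to SCHOOL, are hypothetical.

```sas
data class2;                      /* new output file, temporary (WORK.CLASS2) */
    set sashelp.class;            /* input SAS file */
    school = "Lincoln";           /* new variable set to a constant */
    height_cm = height * 2.54;    /* new variable holding the result of an operation */
run;
```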
To open a program into the Program Editor, click on the File menu and select Open
program to search for the desired file.
The File menu will also list recently used files, so be sure to check there first.
CONDITIONAL STATEMENTS
Let's add another variable to our file. Since we know from running PROC UNIVARIATE that the mean height for the
students in our class is 62.3 inches, we'll use a conditional assignment statement to create a new variable, which
we'll call Tall.
This time, instead of printing the entire file, we'll use the optional VAR
statement with PROC PRINT to print only the variables we are
interested in seeing.
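A sketch of these two steps follows. Coding Tall as 1 for students above the mean and 0 otherwise is our assumption, as is the output file name CLASS2; the original figure may code the variable differently.

```sas
data class2;
    set sashelp.class;
    /* Conditional assignment: compare each height with the mean of 62.3. */
    if height > 62.3 then tall = 1;   /* assumed coding: 1 = above the mean */
    else tall = 0;                    /* 0 = at or below the mean */
run;

/* Print only the variables of interest. */
proc print data=class2;
    var name height tall;
run;
```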
If we wanted to restrict our analysis to only boys, we might use a subsetting IF statement to keep only observations
for males in our file as shown in Figure 30.
We might want to check that there are in fact 10 males in the file by running PROC FREQ on the variable SEX.
PROC PRINT shows that we do in fact have all males in our file.
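The subsetting IF and the two checks can be sketched as follows. The output file name BOYS is hypothetical; in SASHELP.CLASS, the variable SEX takes the values M and F.

```sas
data boys;
    set sashelp.class;
    if sex = "M";        /* subsetting IF: keep only observations for males */
run;

/* Check the count of males in the new file. */
proc freq data=boys;
    tables sex;
run;

/* List the records to confirm that all are male. */
proc print data=boys;
run;
```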
ADDING TITLES
Although it might be obvious to us now why the listings shown in Figures 29 and 32 differ, it may not be obvious six
months from now. It's always a good idea to include titles on any output we produce. We can do this interactively or
by adding the appropriate statements to our program.
This opens the Titles window. You may also reach the Titles window by typing "titles" into the command box.
Title1 already contains the value The
SAS System, which you may have
noticed appeared on previous listings.
To add or change titles, simply type the
desired text onto the line number. Close
the window and accept changes.
When we rerun the previous program, titles now appear in the listings.
Note that blank lines are printed where title lines were left blank.
Titles entered using the TITLES window are global, meaning they remain in effect for the duration of the session
and will appear in every listing. Since this may not be appropriate for every table, you may prefer to add TITLE
statements to your program instead.
The TITLE statement begins with the keyword TITLE followed by the appropriate line number, then the desired text
enclosed in double quotation marks. Don't forget the semi-colon!
TITLEn "Title for line number n";
Note that when you change a TITLE (TITLEn), all titles which came after it (TITLE>n) will be cleared.
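A short sketch of TITLE statements in a program follows; the title text is made up. The second PROC PRINT illustrates the note above: resetting TITLE1 clears TITLE2.

```sas
title1 "Class Roster";
title2 "Source: SASHELP.CLASS";

proc print data=sashelp.class;    /* listing shows both title lines */
run;

title1 "Class Roster, Revised";   /* changing TITLE1 also clears TITLE2 */

proc print data=sashelp.class;    /* listing shows only the new first line */
run;

title;                            /* a TITLE statement with no text clears all titles */
```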
PROC SORT
The SORT procedure allows us to rearrange the order of records in the file based on values for the variables named
in the BY statement.
PROC SORT DATA = filename;
BY variable;
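As a sketch, the step below sorts the CLASS file by HEIGHT. We add the OUT= option, with the hypothetical name CLASS_SORTED, so that the sorted records are written to a new temporary file rather than back to the SASHELP library.

```sas
proc sort data=sashelp.class out=class_sorted;   /* OUT= names the sorted copy */
    by height;                                   /* ascending order by default */
run;

proc print data=class_sorted;
run;
```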
THE BY STATEMENT
The BY statement is also available as an optional statement in several other SAS procedures. When used in
procedures other than SORT, the BY statement will generate a separate analysis for every level of the variable named in the
BY statement.
In a previous example, we used PROC UNIVARIATE to look at the distribution of HEIGHT in our CLASS file. Since
there are differences in height across gender, it might prove interesting to run separate analyses for boys and for girls.
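A sketch of this analysis follows. The file is sorted by SEX first, because BY-group processing expects the records to be in BY-variable order; the output file name CLASS_BY_SEX is hypothetical.

```sas
/* Sort by the BY variable before using it in another procedure. */
proc sort data=sashelp.class out=class_by_sex;
    by sex;
run;

/* One set of statistics for girls and one for boys. */
proc univariate data=class_by_sex;
    var height;
    by sex;
run;
```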
PROC FREQ
Although we have used PROC PRINT to look at our data, PROC FREQ will give us greater detail and more practical
information. Although this procedure is typically used to generate frequency listings and cross tabulations, FREQ also
generates useful statistics, such as chi-square values, odds ratios, and kappa coefficients.
No preliminary analysis should be considered complete until we have looked at the distribution of variables by
running PROC FREQ.
Like PROC PRINT, when we run PROC FREQ without including any options or optional statements, we will get
frequency listings for every variable in the file. To select specific tables, we will use the optional statement TABLES
followed by the variable names. The TABLES statement is similar in this way to the VAR statement used in PRINT
and UNIVARIATE. An example is shown in Figure 43.
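The TABLES statement might be used as sketched below. The choice of SEX and AGE is our assumption; the original figure may use other variables.

```sas
/* No TABLES statement: a frequency listing for every variable. */
proc freq data=sashelp.class;
run;

/* TABLES statement: listings for the named variables only. */
proc freq data=sashelp.class;
    tables sex age;
run;
```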
To generate a three-way cross tabulation, we'll add a third variable to our TABLES statement. Our output will include
two tables, one for females and one for males.
We may also control the content and format of output from PROC FREQ by using any of several options and/or
optional statements. Since n-way tables can be difficult to read, we might use the LIST option to condense the output
so it will print in a single table. Note that column and row percentages are no longer printed.
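A three-way request, with and without the LIST option, can be sketched as follows. We assume the third variable is a Tall indicator coded 1/0 and recreate it here so the example stands alone; the variables used in the original figures may differ.

```sas
/* Recreate a file with the assumed Tall indicator. */
data class2;
    set sashelp.class;
    if height > 62.3 then tall = 1;
    else tall = 0;
run;

proc freq data=class2;
    /* Three-way request: one AGE by TALL table for each value of SEX. */
    tables sex*age*tall;
    /* LIST condenses the same counts into a single table. */
    tables sex*age*tall / list;
run;
```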
Another alternative is to use the BY statement. Remember that the BY statement will generate separate tables for
each value of the variable named in the BY statement. The file must be sorted by the variables named in the BY
statement.
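A sketch of the BY statement alternative follows. As before, the Tall indicator and the file names CLASS2 and CLASS2_SORTED are assumptions made for illustration.

```sas
/* Recreate a file with the assumed Tall indicator. */
data class2;
    set sashelp.class;
    if height > 62.3 then tall = 1;
    else tall = 0;
run;

/* The file must be sorted by the BY variable first. */
proc sort data=class2 out=class2_sorted;
    by sex;
run;

/* A separate AGE by TALL table for each value of SEX. */
proc freq data=class2_sorted;
    by sex;
    tables age*tall;
run;
```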
By default, PROC FREQ leaves missing values out of the table and reports only how many there are. The MISSING option will add them to the table. In the
example in Figure 48, one student has missing values for SEX and AGE, while another is missing SEX.
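To try the MISSING option without the workshop file, we can build a small file with missing values. The file name ROSTER and its records are made up; in list input, a single period stands for a missing value.

```sas
/* Made-up data: Cal is missing SEX and AGE, Dee is missing SEX. */
data roster;
    input name $ sex $ age;
    datalines;
Ann F 12
Bob M 13
Cal . .
Dee . 12
;
run;

/* MISSING treats missing values as a category in each table. */
proc freq data=roster;
    tables sex age / missing;
run;
```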
To request statistics, we include the keyword for the desired statistic as an option on the TABLES statement. In
the program below, we have requested Chi-square tests as shown in Figure 49.
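A request for chi-square tests might be sketched as follows. The choice of SEX by AGE is our assumption, and with a file this small SAS may warn that the test is unreliable because of low cell counts.

```sas
proc freq data=sashelp.class;
    tables sex*age / chisq;   /* CHISQ requests chi-square tests for the table */
run;
```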
CONCLUSION
In this workshop, we have given you the opportunity to try your hand at SAS programming. Although one may
achieve a certain mastery of the SAS language, good programmers never stop learning. And, as any musician,
athlete or foreign language specialist knows, the best way to learn is by doing. In this workshop, we have covered
some of the basics and seen a few of the powerful features available in the SAS System. The rest is up to you.
TRADEMARK
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are
trademarks of their respective companies.
AUTHOR CONTACT
Casey Cantrell
Clarion Consulting
4404 Grand View Blvd.
Los Angeles, CA 90066