A Hands-On Introduction to SAS Programming

Casey Cantrell, Clarion Consulting, Los Angeles, CA

ABSTRACT
This workshop is intended to give the new programmer hands-on experience working with SAS. Although we will use
tools available in the SAS windowing environment, the workshop will address basics common to SAS running on all
operating systems. Topics include how to read data into SAS, how to work with data in SAS, and how to extract
information from a SAS system file. Where applicable, we will demonstrate both programming and graphical methods
to accomplish these tasks.

INTRODUCTION
SAS is a highly sophisticated information delivery system that can perform complex statistical analysis and advanced
data management tasks. However, even the inexperienced programmer can quickly acquire the skills necessary to
convert data into information. The SAS windowing environment provides an excellent opportunity for the new
programmer to gain firsthand experience working in SAS. Programs written to run under Windows can be ported to
other operating systems.

GETTING DATA INTO SAS


If you are working with non-SAS data, before you can perform any analysis in SAS, you will need to create a SAS
system file. Although there are various ways to do this, we will concentrate on two: reading text data into SAS and
working with the Import/Export Wizard.
SAS needs the following information to create a SAS file:
1 - Where to find the input data
2 - How to read the input data
3 - Where to put the output file
In Figure 1, the DATALINES statement informs SAS that the data are instream, meaning they are included in the
program itself. The INPUT statement provides instructions for reading the data. The DATA statement tells SAS
where to store the file and what to name it. Since we are using a one-part name, SAS will create a temporary file and
write it to the WORK folder.

Figure 1 Reading instream data (callouts: output file destination, input data attributes, location of input data)
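
Since the figure itself is not reproduced here, the following is a minimal sketch of a program of this kind. The data set name, variable names and data values are hypothetical, chosen only for illustration.

```sas
/* Hypothetical example: names and values are not from the original figure */
data pups;                        /* output file destination: WORK.PUPS  */
    input name $ breed $ weight;  /* input data attributes (list input)  */
    datalines;                    /* location of input data: instream    */
Rex Beagle 22
Molly Poodle 14
Duke Boxer 61
;
run;

proc print data=pups;             /* list the records that were read */
run;
```

The lone semicolon after the last data line marks the end of the instream data.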

While this method works well for small files, most of the time you will want to read data that are external to your
programs. In Windows, you can do this interactively or by writing the necessary code in your program. The obvious
advantage to writing the code is that your program then documents the source of your input file.
The keyword DATALINES tells SAS that the data are internal to the program. The analogous keyword, INFILE,
directs SAS to read the input data from the file specified in the INFILE statement. There are two ways to do this.
In the example shown in Figure 2, the INFILE statement includes the fully defined file name.

Figure 2 Fully defined INFILE statement (callout: fully defined input file name)
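
A sketch of this first method is shown below. The path and the variable layout are assumptions made for illustration; substitute the location and layout of your own text file.

```sas
/* Hypothetical path and layout, for illustration only */
data pups;
    infile 'C:\ClassData\pupdat.txt';   /* fully defined input file name */
    input name $ breed $ weight;        /* how to read each record       */
run;
```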

Typically, programmers will use the second method, which involves defining a nickname, or FILEREF. The
FILEREF serves as an abbreviated means of referring to the complete path and filename. The association is defined
through the FILENAME statement, which is like saying, "When I use the name Mike, I am talking about Michael Smith
who lives at 123 Main St, Apt B, San Diego, California."

An example using the FILENAME statement is shown in Figure 3.

Figure 3 The FILENAME statement (callouts: FILEREF, location of input data)
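
The second method might look like the sketch below. The FILEREF name, path and variable layout are hypothetical.

```sas
/* Hypothetical FILEREF, path and layout, for illustration only */
filename pupdat 'C:\ClassData\pupdat.txt';   /* FILEREF and location of input data */

data pups;
    infile pupdat;                 /* refer to the file by its nickname */
    input name $ breed $ weight;
run;
```

If the file moves, only the FILENAME statement needs to change; every INFILE statement that uses the FILEREF stays the same.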

Once you've defined a FILEREF, you may use it for the duration of your SAS
session. When you click on the File Shortcuts icon in the Explorer window, the
FILEREF, or nickname, will appear in the Active File Shortcuts list.

Since our example file, Pupdat, is a text file, clicking on the FILEREF
icon will open the file in Notepad.

Figure 4 Active File Shortcuts (callouts: File Shortcuts icon, FILEREF icon)

You may also define a FILEREF, or shortcut, interactively from the Explorer window.
To do this, first click on the File Shortcuts icon in the Explorer Window. This opens the Active File Shortcuts
window as shown in Figure 4.

Select New from the File drop-down menu.

This opens a dialogue box where you may now define your shortcut.
If you want the shortcut defined each time you initiate a SAS session, check the Enable at
Startup box.
Press ENTER to save the shortcut.

Figure 5 Creating a filename shortcut

The new shortcut will now appear in the list of Active File Shortcuts.

Figure 6 Active Shortcuts

THE IMPORT/EXPORT WIZARD


Although there are various ways to read foreign file types into SAS, the Import/Export Wizard is among the easiest to
use.

To start the Wizard, select Import Data from the File menu.

This opens the dialogue box shown in Figure 8. Select the data source
type for your input file.

Figure 7 Starting the Import/Export Wizard

The Wizard supports several common formats, including comma-delimited files (.csv),
Excel files and Microsoft Access files.
Note that this implicitly provides the information SAS needs to know how to read the input data.

Figure 8 Selecting input file type

Next, we need to tell SAS where to find the data, which we'll do using
the Browse function.

Figure 9 Selecting the input file

Finally, we need to tell SAS what to do with the output file.
If we want to create a temporary file, we need only to provide a file name, since the
data will be written to the default WORK library.

Figure 10 Saving a temporary file

To store the file permanently, we must provide an explicit output destination, which we do in Figure 11 by pointing to
the appropriate library. Since we have already defined it, the library we nicknamed PUPS will appear in the list of
available libraries. Had we not defined it previously, we would need to do this first.

Figure 11 Saving a permanent file

To export a file from SAS into a different file format, we would select Export Data from the File menu and reverse the
process.
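
The same import and export can also be written as code with PROC IMPORT and PROC EXPORT. The sketch below assumes a comma-delimited file; the folder, file names and output data set name are hypothetical, and only the library nickname PUPS comes from the example above.

```sas
/* Hypothetical folder and file names, for illustration only */
libname pups 'C:\ClassData';              /* define the PUPS library */

proc import datafile='C:\ClassData\pupdat.csv'
            out=pups.pupdat               /* permanent file: two-part name */
            dbms=csv                      /* data source type              */
            replace;
    getnames=yes;                         /* first row holds variable names */
run;

proc export data=pups.pupdat
            outfile='C:\ClassData\pupdat_out.csv'
            dbms=csv
            replace;
run;
```

Using a one-part name such as OUT=PUPDAT would instead write a temporary file to the WORK library.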

SAS PROGRAMS
SAS programs are built using two key components: the DATA step and the PROC step. The DATA step is used to
create SAS files and/or modify their contents. PROC steps invoke prewritten procedures typically used to perform
statistical analysis. DATA steps produce SAS files, while PROCs most often generate results. The process is
illustrated in Figure 12.

Figure 12 SAS data processing (flow: raw data; DATA step [DATA statement; programming statements;]; SAS data set; PROC step [PROC statement; procedure statements;]; results)

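There are two important things to keep in mind. First, your data must be in SAS system file format before you can run
any SAS procedures. Second, you may not mix DATA and PROC steps: each step is a separate block of statements, and
DATA step programming statements cannot be placed inside a PROC step, or vice versa.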

THE DATA STEP


DATA steps are made up of programming statements, which may include assignment statements, conditional
operations and/or subsetting operations. DATA steps always begin with the keyword DATA, followed by the name you
want to give the file you are building. Remember that all SAS data files have two-part names. If you want to create a
permanent file, you need to provide both the file name and the library name.
Assignment statements assign values to new or existing variables. These values may be:
  - A constant
  - The value of another variable
  - The results of a mathematical expression

Conditional operations perform operations on:
  - Some, but not all, records
  - Some, but not all, conditions
  IF condition is met THEN action

Subsetting operations:
  - Include only specific records in the output file
  IF condition is met THEN include record

SAS PROCEDURES
SAS procedures begin with the keyword PROC followed by the name of the procedure and the name of the file you
want to use in the procedure. Procedures may include options and/or optional statements specific to the procedure.
Although there are myriad procedures in the SAS system, we will discuss the following five, which you are certain to
use:
PROC CONTENTS   - Display information about a file and its contents
PROC PRINT      - Print some or all records, some or all variables
PROC SORT       - Rearrange the order of records
PROC UNIVARIATE - Generate descriptive statistics
PROC FREQ       - Generate frequency tables and cross-tabs

For the next set of exercises, we will be working with a SAS system file named CLASS which is stored in the
SASHELP library. The SASHELP library is automatically defined each time SAS is started. The two-part name for the
file is then SASHELP.CLASS.

PROC CONTENTS
First, let's examine the contents of the file. We can do this interactively using FSVIEW as previously discussed, or we
can write a program to provide similar information by running PROC CONTENTS as shown in Figure 13.

Figure 13 PROC CONTENTS (callouts: PROC name, SAS file name)

PROC CONTENTS lists variables in the file in alphabetical order (Figure 14). We may request an additional list
showing variables in the order they appear in the file by including the POSITION option in our program (Figure 15).

Figure 14 PROC CONTENTS listing

Figure 15 Using the POSITION option in PROC CONTENTS (callout: POSITION option)

Figure 16 Variables listed in POSITION order
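
As the program figures are not reproduced here, the two PROC CONTENTS runs described above can be sketched as follows, using the SASHELP.CLASS file named in the text.

```sas
/* Default listing: variables in alphabetical order */
proc contents data=sashelp.class;
run;

/* POSITION adds a second list in the order the variables are stored */
proc contents data=sashelp.class position;
run;
```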

PROC PRINT
While PROC CONTENTS provides information about
the file, the PRINT procedure actually prints the data.
The default action for PROC PRINT is to print every
variable for every record in the file, plus an observation
number. Figures 17 and 18 show a PROC PRINT
program and the output it generates.

Figure 17 PROC PRINT

Figure 18 PROC PRINT listing

We can control both content and format of PROC PRINT output by using any of several optional statements.

The program shown in Figure 19 suppresses the observation number and uses the variable NAME instead
by using the ID statement.

Figure 19 Using the ID statement in PROC PRINT

To select which variables are printed, we will use the VAR statement. In Figure 20, we have elected to print only two
variables: NAME and AGE.

Figure 20 Using the VAR statement in PROC PRINT
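
The three PROC PRINT variations described above can be sketched as follows. This is a reconstruction from the text, not a copy of the original figures.

```sas
/* Default: every variable for every record, plus an observation number */
proc print data=sashelp.class;
run;

/* ID statement: NAME replaces the observation number */
proc print data=sashelp.class;
    id name;
run;

/* VAR statement: print only NAME and AGE */
proc print data=sashelp.class;
    var name age;
run;
```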

PROC UNIVARIATE
Since it is always a good idea to run exploratory analysis
before working with a file, we'll run PROC UNIVARIATE to
get some additional information about our data. As seen in
Figure 22, UNIVARIATE provides several basic statistics,
including mean, mode, median, and standard deviation.
When we run UNIVARIATE without any options or optional
statements, the procedure generates statistics for every
numeric variable in the file. We may request statistics
for specific variables by listing them in the VAR statement.
An example is shown in Figure 21.

Figure 21 The VAR statement in PROC UNIVARIATE

Figure 22 Output from PROC UNIVARIATE
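
A minimal sketch of such a program is shown below. The choice of HEIGHT in the VAR statement is an assumption; the original figure may have named a different variable.

```sas
/* No VAR statement: statistics for every numeric variable */
proc univariate data=sashelp.class;
run;

/* VAR statement: statistics for HEIGHT only (assumed variable) */
proc univariate data=sashelp.class;
    var height;
run;
```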

CREATING NEW VARIABLES


Now that we have an idea what our data set looks like, we are ready to work with the file. First, we'll create some new
variables. Since we are changing the data file, our program must include two statements: the DATA statement, which
names the new file and specifies its output destination, and the SET statement, which names the SAS input data set
and indicates its location.
The SET statement also provides implicit instructions about how to read the data, since SET is the keyword that tells
SAS we are reading an existing SAS data set. Information about the data structure is already stored in the descriptor
portion of the file. We need only to tell SAS where the file is stored.
In the program below we are creating a new file named students. Since we have not given it a two-part name, SAS
will store it in the WORK library and delete the file when we terminate the SAS session. Our input file is the existing
SAS file named CLASS, found in the SASHELP library folder. We are adding three new variables to the file.

Figure 23 Creating new variables in SAS (callouts: new temporary output file, input SAS file, value from existing variable, constant, result of operation)
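
A sketch of a program of this shape follows. The three new variable names and the constant value are hypothetical; only the data set names STUDENTS and SASHELP.CLASS come from the text.

```sas
/* New variable names and the constant are hypothetical */
data students;                    /* new temporary output file: WORK.STUDENTS */
    set sashelp.class;            /* input SAS file                           */
    orig_height = height;         /* value from existing variable             */
    grade       = 7;              /* constant                                 */
    height_cm   = height * 2.54;  /* result of operation (inches to cm)       */
run;
```

Because SET reads an existing SAS data set, no INPUT statement is needed.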

SAVING YOUR PROGRAM


Since there were no syntax errors in our program, let's save it before we continue. Remember that we must explicitly
save anything we want to keep since we are running interactively.

First, make sure the Program Editor is the active window, then select Save as from
the File menu.
This opens a dialogue box where you may browse to the desired destination folder.

Figure 24 Saving a program

In this example, we will save the file in a folder called ClassData. Since the
file is a SAS program, we will name it Height.sas.
Click Save to complete the process.

Figure 25 Saving a SAS program

Note that the program name now appears at the top of the Program Editor window.

Figure 26 Program name shown in the Program Editor


To open a program into the Program Editor, click on the File menu and select Open
program to search for the desired file.

The File menu will also list recently used files, so be sure to check there first.

Figure 27 Opening a program into the Program Editor (callout: recently used programs)

CONDITIONAL STATEMENTS
Let's add another variable to our file. Since we know from running PROC UNIVARIATE that the mean height for the
students in our class is 62.3 inches, we'll use a conditional assignment statement to create a new variable, which
we'll call Tall.

Figure 28 Using a conditional assignment statement

This time, instead of printing the entire file, we'll use the optional VAR
statement with PROC PRINT to print only the variables we are
interested in seeing.

Figure 29 PROC PRINT with VAR statement
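
One way to write this step is sketched below. Coding Tall as 1 and 0 and comparing against 62.3 with "greater than" are assumptions; the original figure may use different values or a different comparison.

```sas
/* The 1/0 coding and the > comparison are assumptions */
data students;
    set sashelp.class;
    if height > 62.3 then tall = 1;   /* taller than the class mean */
    else tall = 0;
run;

/* Print only the variables of interest */
proc print data=students;
    var name height tall;
run;
```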


If we wanted to restrict our analysis to only boys, we might use a subsetting IF statement to keep only observations
for males in our file as shown in Figure 30.

Note that while we read in 19 records, our output file contains only 10.

Figure 30 Controlling output using the subsetting IF Statement

We might want to check that there are in fact 10 males in the file by running PROC FREQ on the variable SEX.

Figure 31 PROC FREQ using the TABLE statement

PROC PRINT shows that we do in fact have all males in our file.

Figure 32 PROC PRINT using the VAR statement
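
The subsetting step and the two checks can be sketched as follows. The output data set name BOYS and the variables listed in the VAR statement are hypothetical.

```sas
/* Output data set name and VAR list are hypothetical */
data boys;
    set sashelp.class;
    if sex = 'M';          /* subsetting IF: keep only males */
run;

/* Confirm the count of males */
proc freq data=boys;
    tables sex;
run;

/* Confirm that every record kept is male */
proc print data=boys;
    var name sex;
run;
```

A subsetting IF has no THEN clause: records that fail the condition are simply not written to the output file.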


ADDING TITLES
Although it might be obvious to us now why the listings shown in Figures 29 and 32 differ, it may not be obvious six
months from now. It's always a good idea to include titles on any output we produce. We can do this interactively or
by adding the appropriate statements to our program.

To add titles interactively, click in the Output window to make it the active window.

Then open the Tools menu and select Options. From there select Titles.

Figure 33 Adding titles through the TITLES window

This opens the Titles window. You may also reach the Titles window by typing titles into the command box.
Title1 already contains the value The
SAS System, which you may have
noticed appeared on previous listings.
To add or change titles, simply type the
desired text onto the line number. Close
the window and accept changes.

Figure 34 Adding titles through the TITLES window

When we rerun the previous program, titles now appear in the listings.
Note that blank lines are printed where title lines were left blank.

Figure 35 Adding titles


Titles entered using the TITLES window are global, meaning they remain in effect for the duration of the session
and will appear in every listing. Since this may not be appropriate for every table, you may prefer to add TITLE
statements to your program instead.
The TITLE statement begins with the keyword TITLE followed by the appropriate line number, then the desired text
enclosed in double quotation marks. Don't forget the semicolon!
TITLEn "Title for line number n";

You may add up to 10 titles. In the example shown in Figure 36, titles print on lines 1, 2 and 4, leaving a
blank line since no title was specified for title 3.

Figure 36 Writing TITLES statements

Note that when you change a TITLE (TITLEn), all titles which came after it (TITLE>n) will be cleared.
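
A sketch of TITLE statements that print on lines 1, 2 and 4 is shown below. The title text is hypothetical.

```sas
/* Title text is hypothetical */
title1 "Class Roster";
title2 "Source: SASHELP.CLASS";
title4 "All Students";        /* no TITLE3, so line 3 prints blank */

proc print data=sashelp.class;
run;

title;                        /* a TITLE statement with no text clears all titles */
```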

PROC SORT
The SORT procedure allows us to rearrange the order of records in the file based on values for the variables named
in the BY statement.
PROC SORT DATA = filename;
BY variable;

The program shown in Figure 37 sorts the file by Sex. Character
variables are, of course, sorted alphabetically.

Figure 37 PROC SORT


We may also sort by multiple variables.
The sort shown in Figure 38 provides a list sorted by NAME within SEX.

Figure 38 Sorting by two variables

The default sort order is by ascending values. To sort in descending order, we
need to add the keyword DESCENDING to the BY statement before
the sort variable name.

Figure 39 Sorting in DESCENDING order

In Figure 40, we are sorting the file in ascending order by SEX, and by
descending values for NAME within SEX.

Figure 40 Nested sort
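
The sorts described above can be sketched as follows. Writing the result to a new file with the OUT= option is an assumption made here so that the SASHELP copy is left untouched; the output data set names are hypothetical.

```sas
/* OUT= data set names are hypothetical */

/* Sort by SEX */
proc sort data=sashelp.class out=class_by_sex;
    by sex;
run;

/* Sort by NAME within SEX */
proc sort data=sashelp.class out=class_by_sex_name;
    by sex name;
run;

/* Ascending SEX, descending NAME within SEX */
proc sort data=sashelp.class out=class_nested;
    by sex descending name;
run;
```

DESCENDING applies only to the variable that immediately follows it.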


THE BY STATEMENT
The BY STATEMENT is also available as an optional statement in several other SAS procedures. When used in
procedures other than SORT, the BY STATEMENT will generate analysis for every level of the variable named in the
BY STATEMENT.
In a previous example, we used PROC UNIVARIATE to look at the distribution of HEIGHT in our CLASS file. Since
there are differences in height across gender, it might prove interesting to run separate analysis for boys and for girls.

Note that your data must be sorted by the variables named in the
BY STATEMENT or your program will fail as shown in Figure 41.

Figure 41 Using a BY statement without sorting

After sorting our file BY SEX, we can now run separate
analysis for boys and girls.

Figure 42 PROC UNIVARIATE with a BY statement
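
The sort-then-analyze sequence can be sketched as follows. The sorted data set name is hypothetical.

```sas
/* Sorted data set name is hypothetical */
proc sort data=sashelp.class out=class_sorted;
    by sex;
run;

/* One set of HEIGHT statistics for each value of SEX */
proc univariate data=class_sorted;
    var height;
    by sex;
run;
```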


PROC FREQ
Although we have used PROC PRINT to look at our data, PROC FREQ will give us greater detail and more practical
information. Although this procedure is typically used to generate frequency listings and cross tabulations, FREQ also
generates useful statistics, such as chi-square values, odds ratios, and kappa coefficients.
No preliminary analysis should be considered complete until we have looked at the distribution of variables by
running PROC FREQ.
Like PROC PRINT, when we run PROC FREQ without including any options or optional statements, we will get
frequency listings for every variable in the file. To select specific tables, we will use the optional statement TABLES
followed by the variable names. The TABLES statement is similar in this way to the VAR statement used in PRINT
and UNIVARIATE. An example is shown in Figure 43.

Figure 43 Simple frequencies

To generate cross tabs, we need only to insert an asterisk between the two variable names.
In addition to simple frequencies, our listing includes column and row percentages as well.

Figure 44 Cross tabulations using PROC FREQ
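
Simple frequencies and a two-way cross tabulation can be sketched as follows. The choice of SEX and AGE is an assumption; the original figures may use other variables.

```sas
/* Variable choices are assumptions */

/* Simple frequencies: one table per variable */
proc freq data=sashelp.class;
    tables sex age;
run;

/* Cross tabulation: SEX forms the rows, AGE the columns */
proc freq data=sashelp.class;
    tables sex*age;
run;
```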


To generate a three-way cross tabulation, we'll add a third variable to our TABLES statement. Our output will include
two tables, one for females and one for males.

Figure 45 Three way cross tabulation using PROC FREQ

We may also control the content and format of output from PROC FREQ by using any of several options and/or
optional statements. Since n-way tables can be difficult to read, we might use the LIST option to condense the output
so it will print in a single table. Note that column and row percentages are no longer printed.

Figure 46 Using the LIST option (callout: LIST option)
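
A three-way table and its LIST form can be sketched as follows. The variables AGE and Tall are assumptions, and the Tall variable is rebuilt here (with an assumed 1/0 coding) so the example runs on its own.

```sas
/* Variable choices and the 1/0 coding of TALL are assumptions */
data students;
    set sashelp.class;
    if height > 62.3 then tall = 1;
    else tall = 0;
run;

/* Three-way table: one AGE by TALL table for each value of SEX */
proc freq data=students;
    tables sex*age*tall;
run;

/* LIST option: the same counts condensed into a single listing */
proc freq data=students;
    tables sex*age*tall / list;
run;
```

Options on the TABLES statement follow a slash.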


Another alternative is to use the BY STATEMENT. Remember the BY STATEMENT will generate separate tables for
each value of the variable named in the BY STATEMENT. The file must be sorted by the variables named in the BY
statement.

Figure 47 Using the BY statement with PROC FREQ
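
A sketch of the BY statement approach follows. The sorted data set name and the choice of AGE as the table variable are hypothetical.

```sas
/* Sorted data set name and table variable are hypothetical */
proc sort data=sashelp.class out=class_sorted;
    by sex;
run;

/* A separate AGE frequency table for each value of SEX */
proc freq data=class_sorted;
    tables age;
    by sex;
run;
```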

By default, PROC FREQ does not print missing values. The MISSING option will add them to the table. In the
example in Figure 48, one student has missing values for SEX and AGE, while another is missing SEX.

Figure 48 Using the MISSING option (callout: MISSING option)
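
SASHELP.CLASS has no missing values, so the sketch below first makes a copy with a few values blanked out. Which students are altered is purely hypothetical and is not taken from the original figure.

```sas
/* The altered records are hypothetical, for illustration only */
data class_miss;
    set sashelp.class;
    if name = 'Alfred' then do;
        sex = ' ';          /* missing character value */
        age = .;            /* missing numeric value   */
    end;
    if name = 'Alice' then sex = ' ';
run;

/* MISSING treats missing values as a valid level in the tables */
proc freq data=class_miss;
    tables sex age / missing;
run;
```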


To request statistics, we include the KEYWORD for the desired statistic as an option on the TABLES statement. In
the program below, we have requested Chi-square tests as shown in Figure 49.

Figure 49 Requesting optional statistics from PROC FREQ
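
A request for chi-square tests might look like the sketch below. The choice of SEX by AGE is an assumption; with only 19 records spread over many cells, SAS is likely to warn that the chi-square test may not be valid for this table.

```sas
/* Variable choice is an assumption */
proc freq data=sashelp.class;
    tables sex*age / chisq;   /* CHISQ requests chi-square tests */
run;
```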

CONCLUSION
In this workshop, we have given you the opportunity to try your hand at SAS programming. Although one may
achieve a certain mastery of the SAS language, good programmers never stop learning. And, as any musician,
athlete or foreign language specialist knows, the best way to learn is by doing. In this workshop, we have covered
some of the basics and seen a few of the powerful features available in the SAS System. The rest is up to you.

REFERENCES
SAS Institute (1999) SAS Companion for the Microsoft Windows Environment, Version 8, Cary, NC: SAS Institute Inc.

TRADEMARK
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are
trademarks of their respective companies.

AUTHOR CONTACT
Casey Cantrell
Clarion Consulting
4404 Grand View Blvd.
Los Angeles, CA 90066
[email protected]
