Training Handouts
The collection of software that is routinely included in the SAS system allows the end user to perform
a wide range of tasks that cover just about every aspect of business administration and function.
Essentially, the SAS system takes a one-stop-shopping approach, bringing all the programs you
need together under one simple umbrella.
The software included in the typical SAS system provides tools for all sorts of projects and daily tasks.
Writing reports and creating graphics are easy using the tools provided. Research and project
management software aids in creating both operational and marketing strategies. Tools that allow for
quick and efficient data entry and retrieval make it possible to gather statistics or other information for
reports in no time at all. The SAS system usually includes components that will aid in departmental
functions that range from information systems support to human resources management and even
customer care protocols.
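DATA steps typically create or modify SAS data sets. They can also be used to produce custom-
designed reports. For example, you can use DATA steps to read raw data into a SAS data set,
compute new variables, check for and correct errors in your data, and create new SAS data sets by
subsetting or combining existing ones.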
PROC (procedure) steps are pre-written routines that enable you to analyze and process the data in a
SAS data set and to present the data in the form of a report. PROC steps sometimes create new SAS
data sets that contain the results of the procedure. PROC steps can list, sort, and summarize data.
For example, you can use PROC steps to print a listing of the data (PROC PRINT), sort observations
(PROC SORT), or compute descriptive statistics (PROC MEANS).
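A minimal sketch of a program with one DATA step followed by two PROC steps, assuming the SASHELP.CLASS sample data set that ships with SAS is available. The data set name TEENS and the BMI calculation are illustrative choices, not part of the original handout.

```sas
/* DATA step: create a new data set from SASHELP.CLASS,
   keeping students aged 13 or older and computing a new variable */
data teens;
   set sashelp.class;
   where age >= 13;
   bmi = weight / height**2 * 703;   /* Weight in pounds, Height in inches */
run;

/* PROC step 1: sort the new data set by BMI, highest first */
proc sort data=teens;
   by descending bmi;
run;

/* PROC step 2: list selected variables as a report */
proc print data=teens;
   var name age bmi;
run;
```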
SAS programs consist of SAS statements. A SAS statement has two important characteristics:
It usually begins with a SAS keyword.
It always ends with a semicolon.
Note: You can specify SAS statements in uppercase or lowercase, but text that is enclosed in
quotation marks is case sensitive.
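A short sketch of that rule, using a small hypothetical data set named DEMO: keywords and variable names can be written in any case, but the quoted value in the WHERE statement must match the stored text exactly.

```sas
data demo;
   input Name $ Sex $;
   datalines;
Alice F
Bob M
Carol f
;
run;

/* PROC PRINT and proc print are the same, and so are Sex and sex ... */
PROC PRINT DATA=demo;
   WHERE sex = 'F';   /* ... but 'F' matches only an uppercase F, so Carol is not listed */
RUN;
```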
SAS Names
SAS names follow a simple naming rule: All SAS variable names and data set names can be no
longer than 32 characters and must begin with a letter or the underscore (_) character. The remaining
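characters in the name may be letters, digits, or the underscore character. Characters such as dashes
and spaces are not allowed. For example, Amount, _Total, and Qtr2_2003 are valid SAS names,
whereas 2ndQtr (begins with a digit), Net-Sales (contains a dash), and Total Sales (contains a space)
are not.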
Temporary SAS libraries last only for the current SAS session.
Permanent SAS libraries are available to you during subsequent SAS sessions.
For example, by specifying the library name sasdata when you create a file, you specify that the file is
to be stored in a permanent SAS data library until you delete it.
Two-Level Names
To reference a permanent SAS data set in your SAS programs, you use a two-level name:
libref.filename
In the two-level name, libref is the name of the SAS data library that contains the file, and
filename is the name of the file itself. A period separates the libref and filename.
For example, in our sample program, ABC.Student is the two-level name for the SAS data set
Student, which is stored in the library named ABC.
Alternatively, you can use a one-level name (the filename only) to reference a file in a temporary
SAS library. When you specify a one-level name, the default libref Work is assumed. For
example, the one-level name Test also references the SAS data set named Test that is stored in
the temporary SAS library Work.
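A sketch of assigning the libref ABC with a LIBNAME statement and then using two-level and one-level names. The folder path is a placeholder assumption; point it at any folder you can write to. SASHELP.CLASS is used only as convenient input data.

```sas
/* Placeholder path: replace with a folder you can write to */
libname abc 'C:\SASData';

/* Two-level name: ABC.Student is stored permanently in that folder */
data abc.student;
   set sashelp.class;
run;

/* One-level name: Test is stored in the temporary WORK library */
data test;
   set sashelp.class;
run;

/* work.test and test refer to the same data set */
proc print data=work.test;
run;
```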
Overview of Data Sets
Conceptually, a SAS data set is a file that consists of two parts: a descriptor portion and a data
portion.
Descriptor Portion
The descriptor portion of a SAS data set contains information about the data set, including
the name of the data set
the date and time that the data set was created
the number of observations
the number of variables.
In addition to general information about the data set, the descriptor portion contains information about
the attributes of each variable in the data set. The attribute information includes the variable's name,
type, length, format, informat, and label.
Data Portion
The data portion of a SAS data set is a collection of data values that are arranged in a rectangular
table.
Engine               V9         Indexes      0
Protection                      Compressed   NO
Label
Data Representation  WINDOWS
Encoding             wlatin1  Western (Windows)

Engine/Host Dependent Information

Alphabetic List of Variables and Attributes
#   Variable   Type   Len
8   ActLevel   Char     4
4   Age        Num      8
6   Height     Num      8
1   ID         Char     4
2   Name       Char    14
3   Sex        Char     1
7   Weight     Num      8
Note: Here the variables are listed in alphabetical order.
A more useful way to list variable information is in the order in which the variables are
stored in the SAS data set, rather than alphabetically. To create such a list, use the
VARNUM option of PROC CONTENTS. The VARNUM listing for the same data set looks like this:
Engine               V9         Indexes      0
Protection                      Compressed   NO
Label
Data Representation  WINDOWS
Encoding             wlatin1  Western (Windows)

Engine/Host Dependent Information

Variables in Creation Order
#   Variable   Type   Len
1   ID         Char     4
2   Name       Char    14
3   Sex        Char     1
4   Age        Num      8
6   Height     Num      8
7   Weight     Num      8
8   ActLevel   Char     4
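A sketch of the two PROC CONTENTS calls that produce listings like the ones above. The original data set name is not shown, so SASHELP.CLASS is used here as an assumed stand-in; its variables differ from the listing above.

```sas
/* Default: variables listed alphabetically */
proc contents data=sashelp.class;
run;

/* VARNUM: variables listed in the order they are stored */
proc contents data=sashelp.class varnum;
run;
```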
Directory
Libref   YUNUS
Engine   V9

#   Name   Member Type   File Size   Last Modified
When SAS encounters a syntax error, SAS identifies the error and writes the location and an
explanation of the error to the SAS log. Syntax errors generally cause SAS to stop processing the
step in which the error is encountered. Common syntax errors include misspelled keywords, missing
semicolons, unmatched quotation marks, and invalid options.
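A small illustration of a misspelled keyword, assuming SASHELP.CLASS is available. SAS cannot find a procedure named PRNT, so it writes an error to the log and does not run the first step; the second step is the corrected version.

```sas
/* Misspelled keyword: PRNT instead of PRINT */
proc prnt data=sashelp.class;
run;

/* Corrected step */
proc print data=sashelp.class;
run;
```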
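The following DATA step reads instream data, that is, data lines placed in the program itself after the
DATALINES statement:
DATA uspresidents;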
INPUT President $ Party $ Number;
DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
RUN;
The following program uses the INFILE statement to read the same data from an external text file:
DATA uspresidents;
INFILE '/wns/Training/President.txt';
INPUT President $ Party $ Number;
RUN;
Long records
In some operating environments, SAS assumes external files have a record length of 256 or less.
(The record length is the number of characters, including spaces, in a data line.) If your data lines
are long, and it looks like SAS is not reading all your data, then use the LRECL= option in the INFILE
statement to specify a record length at least as long as the longest record in your data file.
INFILE '/wns/Training/President.txt' LRECL=2000;
SAS OPTIONS
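DATE|NODATE: print or suppress the date and time at the top of each page of output
NUMBER|NONUMBER: print or suppress page numbers on output
PAGENO=n: start numbering output pages at n
PAGESIZE=n: maximum number of lines per page of output
LINESIZE=n: maximum number of characters per line of output
OBS=n: read only through observation n of each input data set (OBS=MAX, the default, reads all)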
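A sketch of OPTIONS statements that use the options above; the particular values are arbitrary examples, and SASHELP.CLASS is assumed to be available. System options stay in effect for the rest of the session, which is why the last statement restores OBS=MAX.

```sas
options nodate number pageno=1;   /* no date/time; number pages starting at 1 */
options pagesize=60 linesize=80;  /* 60 lines per page, 80 characters per line */
options obs=5;                    /* read only the first 5 observations */

proc print data=sashelp.class;
run;

options obs=max;                  /* restore the default: read all observations */
```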
INFORMATS
Informats are useful anytime you have non-standard data. (Standard numeric data contain only
numerals, decimal points, minus signs, and E for scientific notation.) Numbers with embedded
commas or dollar signs are examples of non-standard data. SAS has informats for reading these
types of data as well.
Dates are perhaps the most common non-standard data. Using date informats, SAS will convert
conventional forms of dates like 10-31-2003 or 31OCT03 into a number, the number of days since
January 1, 1960. This number is referred to as a SAS date value. This turns out to be extremely
useful when you want to do calculations with dates. For example, you can easily find the number of
days between two dates by subtracting one from the other.
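A minimal sketch of that subtraction, using the INPUT function with the MMDDYY10. informat to convert two hypothetical date strings into SAS date values. The DATA _NULL_ step writes the results to the log without creating a data set.

```sas
data _null_;
   start = input('10-31-2003', mmddyy10.);   /* SAS date value 16009 */
   stop  = input('12-25-2003', mmddyy10.);   /* SAS date value 16064 */
   days  = stop - start;                     /* 55 days apart */
   put start= stop= days=;
run;
```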
There are three general types of informats: character, numeric, and date.
Character: $informatw.
Numeric: informatw.d
Date: informatw.
The $ indicates character informats, informat is the name of the informat, w is the total width, and
d is the number of decimal places (numeric informats only). The period is a very important part of the
informat name. Without a period, SAS may try to interpret the informat as a variable name, which, by
default, cannot contain any special characters except the underscore. Two informats do not have
names: $w., which reads standard character data, and w.d, which reads standard numeric data.
DATA contest;
INFILE DATALINES;
/* Formatted input reads fixed columns, so the data lines must stay aligned */
INPUT Name $16. +1 Age 2. +1 Type $1. +1 Date MMDDYY10. (Score1 Score2
Score3 Score4 Score5) (4.1);
DATALINES;
Alicia Grossman  13 c 10-28-2003 7.8 6.5 7.2 8.0 7.9
Matthew Lee       9 D 10-30-2003 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia 10 C 10-29-2003 8.9 7.9 8.5 9.0 8.8
Lori Newcombe     6 D 10-30-2003 6.7 5.6 4.9 5.2 6.1
Jose Martinez     7 d 10-31-2003 8.9 9.510.0 9.7 9.0
Brian Williams   11 C 10-29-2003 7.8 8.4 8.5 7.9 8.0
;
RUN;
The variable Name has an informat of $16., meaning that it is a character variable 16 columns wide.
Variable Age has an informat of 2., meaning that it is numeric, two columns wide, and has no decimal
places. The +1 skips over one column. Variable Type is character, and it is one column wide. Variable
Date has an informat MMDDYY10. and reads dates in the form 10-31-2003 or 10/31/2003, each 10
columns wide. The remaining variables, Score1 through Score5, all require the same informat, 4.1
(four columns wide, with one decimal place). By putting the variables and the informat in separate
sets of parentheses, you only have to list the informat once.
FORMATS
Formats affect only the way that the data values appear in output, not the actual data values as they
are stored in the SAS data set.
DATA contest2;
INFILE DATALINES;
INPUT Name $16. +1 Age 2. +1 Type $1. +1 Date MMDDYY10. (Score1 Score2
Score3 Score4 Score5) (4.1);
/* Display Date in the form 28OCT2003; the stored value is unchanged */
FORMAT Date DATE9.;
DATALINES;
Alicia Grossman  13 c 10-28-2003 7.8 6.5 7.2 8.0 7.9
Matthew Lee       9 D 10-30-2003 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia 10 C 10-29-2003 8.9 7.9 8.5 9.0 8.8
Lori Newcombe     6 D 10-30-2003 6.7 5.6 4.9 5.2 6.1
Jose Martinez     7 d 10-31-2003 8.9 9.510.0 9.7 9.0
Brian Williams   11 C 10-29-2003 7.8 8.4 8.5 7.9 8.0
;
RUN;
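To see that a format changes only how a value is displayed, here is a sketch with a tiny hypothetical data set named VISITS: the same stored values print once with the DATE9. format and once without any format.

```sas
data visits;
   input Visit :mmddyy10.;
   datalines;
10/28/2003
10/31/2003
;
run;

/* A FORMAT statement in a PROC step applies to this listing only: 28OCT2003, 31OCT2003 */
proc print data=visits;
   format Visit date9.;
run;

/* No format: the stored values 16006 and 16009 are printed */
proc print data=visits;
run;
```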
A SAS date is a numeric value equal to the number of days since January 1, 1960.
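For example, January 1, 1959 has the SAS date value -365, January 1, 1960 has the value 0,
January 1, 1961 has the value 366, and January 1, 2003 has the value 15706.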
SAS has special tools for working with dates: informats for reading dates, functions for
manipulating dates, and formats for printing dates.
SAS has a variety of date informats for reading dates in many different forms. All of these informats
convert your data to a number equal to the number of days since January 1, 1960.
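A sketch that reads one date, October 31, 2003, written four different ways, each with a different date informat; the variable names D1 to D4 are hypothetical. All four variables should hold the same SAS date value, 16009.

```sas
data date_forms;
   /* The colon modifier lets list input use each informat */
   input D1 :date9. D2 :mmddyy10. D3 :ddmmyy10. D4 :yymmdd10.;
   datalines;
31OCT2003 10/31/2003 31/10/2003 2003-10-31
;
run;

/* Without a format, each variable prints as 16009 */
proc print data=date_forms;
run;
```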
Setting the default century
When SAS sees a date with a two-digit year like 07/04/76, SAS has to decide in which century the
year belongs. Is the year 1976, 2076, or perhaps 1776? The system option YEARCUTOFF= specifies
the first year of a hundred-year span for SAS to use. The default value for this option is 1920 (1926
starting with SAS 9.4), but you can change this value with the OPTIONS statement.
To avoid problems, you may want to specify the YEARCUTOFF= option whenever you have data
containing two-digit years. This statement tells SAS to interpret two-digit dates as occurring between
1950 and 2049:
OPTIONS YEARCUTOFF = 1950;
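A sketch showing how YEARCUTOFF= changes the century assigned to the two-digit year in 07/04/76. The value 1980 in the second step is an arbitrary example; each DATA _NULL_ step writes its result to the log.

```sas
options yearcutoff=1950;            /* two-digit years fall in 1950-2049 */
data _null_;
   d = input('07/04/76', mmddyy8.);
   put d= date9.;                   /* d=04JUL1976 */
run;

options yearcutoff=1980;            /* two-digit years fall in 1980-2079 */
data _null_;
   d = input('07/04/76', mmddyy8.);
   put d= date9.;                   /* d=04JUL2076 */
run;
```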
Dates in SAS expressions
Once a variable has been read with a SAS date informat, it can be
used in arithmetic expressions like other numeric variables. For example, if a library book is due in
three weeks, you could find the due date by adding 21 days to the date it was checked out:
DateDue = DateCheck + 21;
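You can use a date as a constant in a SAS expression by writing it in the form ddMMMyyyy (for
example, 01JAN2004), enclosing it in quotation marks, and following it with the letter D: '01JAN2004'D.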
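A sketch combining the library-book calculation with a date constant; the data set CHECKOUTS, the book titles, and the variable DueNextYear are hypothetical.

```sas
data checkouts;
   input Title $ DateCheck :mmddyy10.;
   DateDue = DateCheck + 21;                /* due three weeks after checkout */
   DueNextYear = (DateDue > '31DEC2003'd);  /* 1 if due after 31DEC2003, else 0 */
   format DateCheck DateDue date9.;
   datalines;
Emma 12/01/2003
Dune 12/15/2003
;
run;
```

Here DateDue is 22DEC2003 for the first book and 05JAN2004 for the second, so only the second has DueNextYear = 1.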
Input Styles:
1. List input
2. Formatted input
3. Column input
4. Mixed input
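List Input style
With list input, values are separated by at least one blank, and a $ after a variable name marks a
character variable (8 characters long by default). CARDS; is another name for the DATALINES statement.
data list_input;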
input name $ age sal ;
cards;
venu 24 456.09
inder 25 467.17
reddy 21 766.36
hanu 26 765.89
;
run;
Column Input style
With column input, each variable name in the INPUT statement is followed by the column or range of
columns that holds its value. The INPUT statement below lists the variables with their column locations
in order from left to right. However, one of the features of column input is the capability to read fields
in any order. For example, you could read the value for Score1 before the values for Name and Age.
DATA column_input_1;
INFILE DATALINES;
/* Each value must sit in the columns named in the INPUT statement */
INPUT Name $ 1-16 Age 18-19 Type $ 21 Score1 23-25;
DATALINES;
Alicia Grossman  13 c 7.8
Matthew Lee       9 D 6.5
Elizabeth Garcia 10 C 8.9
Lori Newcombe     6 D 6.7
Jose Martinez     7 d 8.9
Brian Williams   11 C 7.8
;
RUN;
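A sketch of reading the same fixed columns in a different order; the data set name COLUMN_INPUT_2 is hypothetical. The values read are the same as when the fields are read from left to right.

```sas
data column_input_2;
   /* Fields are read out of left-to-right order */
   input Score1 23-25 Type $ 21 Name $ 1-16 Age 18-19;
   datalines;
Alicia Grossman  13 c 7.8
Matthew Lee       9 D 6.5
Elizabeth Garcia 10 C 8.9
;
run;
```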
Formatted Input style
Formatted input is a very powerful method for reading both standard and nonstandard data in fixed
fields.
The @n pointer control is an absolute pointer control: it moves the input pointer to column n, which is
the first column of the field that is being read.
DATA PATIENT;
INFILE DATALINES;
/* @n moves the pointer to column n before each field is read */
INPUT @1 ID $4. @6 Name $14. @21 Gender $1. @23 Age 2. @26 Date MMDDYY8.
@35 Height 2. @38 Weight 3. @42 ActLevel $4. @47 Fee 6.2 ;
datalines;
2588 Ivan, H        F 22 06/02/97 63 139 LOW   85.20
2586 Derber, B      M 25 06/04/97 75 188 HIGH  85.20
2458 Murray, W      M 27 06/05/97 72 168 HIGH  85.20
2572 Oberon, M      F 28 06/05/97 62 118 LOW   85.20
2544 Jones, M       M 29 06/07/97 76 193 HIGH 124.80
2574 Peterson, V    M 30 06/08/97 69 147 MOD  149.75
2501 Bonaventure, T F 31 06/09/97 61 123 LOW  149.75
2552 Reberson, P    F 32 06/10/97 67 151 MOD  149.75
;
run;
The +n pointer control moves the input pointer forward n columns, relative to its current position.
DATA formatted_input_rel_1;
INFILE DATALINES;
/* +1 skips the blank column between fields; Weight starts at column 38 */
INPUT ID $4. +1 Name $14. +1 Gender $1. +1 Age 2. +1 Date MMDDYY8.
+1 Height 2. +1 Weight 3. +1 ActLevel $4. +1 Fee 6.2 ;
datalines;
2588 Ivan, H        F 22 06/02/97 63 139 LOW   85.20
2586 Derber, B      M 25 06/04/97 75 188 HIGH  85.20
2458 Murray, W      M 27 06/05/97 72 168 HIGH  85.20
2572 Oberon, M      F 28 06/05/97 62 118 LOW   85.20
2544 Jones, M       M 29 06/07/97 76 193 HIGH 124.80
2574 Peterson, V    M 30 06/08/97 69 147 MOD  149.75
2501 Bonaventure, T F 31 06/09/97 61 123 LOW  149.75
2552 Reberson, P    F 32 06/10/97 67 151 MOD  149.75
;
run;
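The fourth input style, mixed input, combines the other three in one INPUT statement. A minimal sketch, assuming the same column layout as the PATIENT data: ID and Name use column input, Gender uses formatted input after an @21 pointer, and Age uses list input. The data set name MIXED_INPUT is hypothetical.

```sas
data mixed_input;
   /* column input, then formatted input, then list input */
   input ID $ 1-4 Name $ 6-19 @21 Gender $1. Age;
   datalines;
2588 Ivan, H        F 22
2586 Derber, B      M 25
2501 Bonaventure, T F 31
;
run;
```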
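Creating a single observation from multiple records
When the values for one observation are spread over several data lines, use line pointer controls in the
INPUT statement. The #n pointer moves the input pointer to line n of the current group of lines, and the
slash (/) moves it to the beginning of the next line.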
data line_pointer_1;
input #1 Name $ Age Gender $ #2 City $ salary #3 State $;
datalines;
Raj 22 m
Delhi 22000
Delhi
rahul 25 m
Gurgaon 25000
Haryana
;
run;
data line_pointer_rel;
input Name $ Age Gender $ / City $ salary / State $;
datalines;
Raj 22 m
Delhi 22000
Delhi
rahul 25 m
Gurgaon 25000
Haryana
;
run;
The DLM= option
If you read your data using list input, the DATA step expects your file to have
spaces between your data values. The DELIMITER=, or DLM=, option in the INFILE statement allows
you to read data files with other delimiters. The comma and tab characters are common delimiters
found in data files, but you can read data files with any delimiter character by enclosing the
delimiter character in quotation marks after the DLM= option (for example,
DLM='&').
If your data have tab characters between values, then you can use the
DLM='09'X option, which specifies the tab character by its hexadecimal code.
By default, SAS interprets two or more delimiters in a row as a single delimiter. If your file has missing
values, and two delimiters in a row indicate a missing value, then you will also need the DSD option in
the INFILE statement.
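A sketch of the DLM= option with an ampersand delimiter, using hypothetical data. The second step, also hypothetical, reads a tab-delimited external file; the path C:\MyRawData\Scores.txt is a placeholder.

```sas
data scores_amp;
   infile datalines dlm='&';
   input Name $ Score1 Score2;
   datalines;
Ann&9&8
Bob&7&10
;
run;

/* Placeholder tab-delimited file: '09'x is the hexadecimal code for a tab */
data scores_tab;
   infile 'C:\MyRawData\Scores.txt' dlm='09'x;
   input Name $ Score1 Score2;
run;
```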
The DSD option
The DSD (Delimiter-Sensitive Data) option for the INFILE statement does three things for you:
It ignores delimiters in data values enclosed in quotation marks.
It does not read quotation marks as part of the data value.
It treats two delimiters in a row as a missing value.
The DSD option assumes that the delimiter is a comma. If your delimiter is not a comma then you can
use the DLM= option with the DSD option to specify the delimiter.
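A sketch of DSD combined with DLM= for a pipe (|) delimiter, using hypothetical data based on the band example: the quoted band name contains the delimiter and is read as one value without its quotation marks, and two pipes in a row give NinePM a missing value in the second observation.

```sas
data music_pipe;
   infile datalines dlm='|' dsd;
   input BandName :$30. EightPM NinePM TenPM;
   datalines;
Lupine Lights|45|63|70
"Stop| Drop| and Rock-N-Roll"|34||77
;
run;
```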
CSV files
Comma-separated values files, or CSV files, are a common type of file that can be read
with the DSD option. Many programs, such as Microsoft Excel, can save data in CSV format. These
files have commas for delimiters and consecutive commas for missing values; if there are commas in
any of the data values, then those values are enclosed in quotation marks.
DATA music_2;
INFILE datalines DLM = ',' DSD ;
INPUT BandName :$30. GigDate :MMDDYY10. EightPM NinePM TenPM ElevenPM;
datalines;
Lupine Lights,12/3/2003,45,63,70,32
Awesome Octaves,12/15/2003,17,28,44,12
"Stop, Drop, and Rock-N-Roll",1/5/2004,34,62,77,91
The Silveyville Jazz Quartet,1/18/2004,38,30,42,43
Catalina Converts,1/31/2004,56,,65,34
;
run;
The colon modifier tells SAS to read for the length of the informat (30 for BandName and 10 for
GigDate) or until it encounters a delimiter, whichever comes first. Because the band names
are longer than the default length of 8 characters, we use the :$30. informat for BandName to read
up to 30 characters.
Missover
By default, SAS will go to the next data line to read more data if SAS has reached the end of the data
line and there are still more variables in the INPUT statement that have not been assigned values.
The MISSOVER option tells SAS that if it runs out of data, don’t go to the next data line. Instead,
assign missing values to any remaining variables.
Example file missover_1.txt (the first line has no value for ElevenPM):
Lupine Lights,12/3/2003,45,63,70
Awesome Octaves,12/15/2003,17,28,44,12
"Stop, Drop, and Rock-N-Roll",1/5/2004,34,62,77,91
The Silveyville Jazz Quartet,1/18/2004,38,30,42,43
Catalina Converts,1/31/2004,56,,65,34
DATA music_3;
INFILE 'C:\Users\Yunus khwaja\Desktop\missover_1.txt' dlm = ',' DSD
MISSOVER;
INPUT BandName :$30. GigDate :MMDDYY10. EightPM NinePM TenPM ElevenPM ;
run;
proc print ; run;
Truncover
You need the TRUNCOVER option when you are reading data using column or formatted input and some data
lines are shorter than others. If a variable’s field extends past the end of the data line, then, by default, SAS will
go to the next line to start reading the variable’s value. This option tells SAS to read data for the variable until it
reaches the end of the data line, or the last column specified in the format or column range, whichever comes
first.
This program uses column input to read the address file. Because some of the addresses stop
before the end of the variable Street’s field (columns 20 through 35), you need the TRUNCOVER
option. Without the TRUNCOVER option, SAS would try to go to the next line to read the data for
Street on the first and third records.
Example file truncover.txt:
John Garcia   114  Maple Ave.
Sylvia Chung  1302 Washington Drive
Martha Newton 45   S.E. 14th St.
DATA homeaddress;
INFILE 'C:\Users\Yunus khwaja\Desktop\truncover.txt' truncover;
INPUT Name $ 1-13 Number 15-18 Street $ 20-35;
RUN;
Note: TRUNCOVER is similar to MISSOVER. Both will assign missing values to variables if the data
line ends before the variable’s field starts. But when the data line ends in the middle of a variable field,
TRUNCOVER will take as much as is there, whereas MISSOVER will assign the variable a missing
value.
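A sketch comparing the two options. Instream data lines are padded with blanks, so the difference shows up only with an external file; here a temporary file (FILENAME with the TEMP device) written with PUT statements stands in for truncover.txt. The fileref ADDR and the data set names are hypothetical.

```sas
filename addr temp;

/* Write the three address lines; the first and third end before column 35 */
data _null_;
   file addr;
   put 'John Garcia   114  Maple Ave.';
   put 'Sylvia Chung  1302 Washington Drive';
   put 'Martha Newton 45   S.E. 14th St.';
run;

data with_missover;
   infile addr missover;
   input Name $ 1-13 Number 15-18 Street $ 20-35;
run;

data with_truncover;
   infile addr truncover;
   input Name $ 1-13 Number 15-18 Street $ 20-35;
run;

proc print data=with_missover;   /* Street is missing for Garcia and Newton */
run;

proc print data=with_truncover;  /* Street holds the partial values */
run;
```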
IMPORT Procedure
There are a few things that PROC IMPORT does for you that make it easy to read certain types of
data files. PROC IMPORT will scan your data file and automatically determine the variable types
(character or numeric), will assign proper lengths to the character variables, and can recognize some
date formats. PROC IMPORT will treat two consecutive delimiters in your data file as a missing
value, will read values enclosed by quotation marks, and assign missing values to variables when it
runs out of data on a line. Also, if you want, you can use the first line in your data file for the variable
names. The IMPORT procedure actually writes a DATA step for you, and after you submit your
program, you can look in the Log window to see the DATA step it produced.
In the PROC IMPORT statement, the file you want to read follows the DATAFILE= option, and the name
of the SAS data set you want to create follows the OUT= option:
PROC IMPORT DATAFILE = 'filename' OUT = data-set;
SAS determines the file type from the file extension: a file ending in .csv is read as comma-delimited,
and a file ending in .txt is read as tab-delimited.
If your file does not have the proper extension, or your file is of type DLM, then you must use the
DBMS= option in the PROC IMPORT statement. Use the REPLACE option if you already have a SAS
data set with the name you specified in the OUT= option and you want to overwrite it. Here is the
general form of PROC IMPORT with both the REPLACE and the DBMS options:
PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier REPLACE;
The IMPORT procedure will, by default, get variable names from the first line in your data file. If you
do not want this, then add the GETNAMES=NO statement after the PROC IMPORT statement;
PROC IMPORT then assigns the variables the names VAR1, VAR2, VAR3, and so on. Also, if your data
file is type DLM, PROC IMPORT assumes that the delimiter is a space. If you have a different
delimiter, then specify it in the DELIMITER= statement. The following shows both of these
statements:
PROC IMPORT DATAFILE = 'filename' OUT = data-set
DBMS = DLM REPLACE;
GETNAMES = NO;
DELIMITER = 'delimiter-character';
RUN;
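A sketch of two PROC IMPORT steps; the file names Bands.csv and Bands.dat in C:\MyRawData are hypothetical. The first relies on the .csv extension plus DBMS=CSV and a header row; the second reads an ampersand-delimited file without a header row.

```sas
/* Hypothetical comma-delimited file with variable names in the first row */
proc import datafile = 'C:\MyRawData\Bands.csv'
            out      = work.bands
            dbms     = csv
            replace;
   getnames = yes;
run;

/* Hypothetical ampersand-delimited file with no header row:
   the variables are named VAR1, VAR2, ... */
proc import datafile = 'C:\MyRawData\Bands.dat'
            out      = work.bands_raw
            dbms     = dlm
            replace;
   getnames  = no;
   delimiter = '&';
run;
```

Checking the log afterward shows the DATA step that PROC IMPORT generated for each file.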
Microsoft Access Files
If you want to read Microsoft Access files, then instead of using the DATAFILE= option, you need the
DATABASE= and DATATABLE= options.
EXPORT Procedure
The general form of PROC EXPORT is
PROC EXPORT DATA = data-set OUTFILE = 'filename';
where data-set is the SAS data set you want to export, and filename is the name you make up for
the output data file. For example, to read a temporary SAS data set named HOTELS and write a
comma-delimited file named Hotels.csv in a directory named MyRawData on the C drive (Windows),
specify DATA = hotels and OUTFILE = 'c:\MyRawData\Hotels.csv'.
SAS uses the last part of the filename, called the file extension, to decide what type of file to create.
You can also specify the file type by adding the DBMS= option to the PROC EXPORT statement. With
Base SAS software, the .csv extension corresponds to DBMS=CSV (comma-delimited), the .txt
extension corresponds to DBMS=TAB (tab-delimited), and DBMS=DLM writes a delimited file (space-
delimited by default) with any extension. If you specify the DBMS= option, then it takes precedence
over the file extension.
Notice that for space-delimited files there is no standard extension, so you must use the DBMS=
option. Adding DBMS = DLM to a PROC EXPORT statement whose output file is Hotels.spc tells SAS
to create a space-delimited file named Hotels.spc. The REPLACE option tells SAS to replace any file
with the same name.
If you want to create a file with a delimiter other than a comma, tab, or space, then you can add the
DELIMITER statement. If you use the DELIMITER statement, then it does not matter what file
extension you use or what DBMS identifier you specify; the file will have the delimiter that you specify
in the DELIMITER statement. For example, adding the statement DELIMITER = '&'; to a PROC
EXPORT step that writes Hotels.txt produces a file that has the ampersand (&) as the delimiter.
A PROC EXPORT step whose output file name ends with .txt, and that has no DELIMITER statement,
writes a plain text, tab-delimited file that you can read with any text editor or word processor. When
you run such a program, SAS writes a note about the output file to the log.
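A sketch that puts these export cases together. The small HOTELS data set created here is a hypothetical stand-in, the C:\MyRawData folder must already exist, and Hotels_tab.txt is an illustrative file name.

```sas
/* Hypothetical stand-in for the HOTELS data set */
data hotels;
   input Name $ Rooms Rate;
   format Rate dollar8.2;    /* PROC EXPORT writes Rate with this format */
   datalines;
Grand 120 89.5
Harbor 64 112
;
run;

/* .csv extension: comma-delimited */
proc export data = hotels outfile = 'c:\MyRawData\Hotels.csv' replace;
run;

/* No standard extension for space-delimited files, so DBMS=DLM is required */
proc export data = hotels outfile = 'c:\MyRawData\Hotels.spc' dbms = dlm replace;
run;

/* DELIMITER statement: ampersand-delimited, whatever the extension */
proc export data = hotels outfile = 'c:\MyRawData\Hotels.txt' dbms = dlm replace;
   delimiter = '&';
run;

/* .txt extension and no DELIMITER statement: tab-delimited */
proc export data = hotels outfile = 'c:\MyRawData\Hotels_tab.txt' replace;
run;
```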
Any format that you have assigned to variables in the SAS data set will be applied by PROC
EXPORT. If you want to change a format, use a FORMAT statement in a DATA step before running
PROC EXPORT.