SAS® 9.4 Programmer’s Guide: Essentials
SAS® Documentation
August 8, 2024
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2019. SAS® 9.4 Programmer’s Guide: Essentials. Cary, NC:
SAS Institute Inc.
SAS® 9.4 Programmer’s Guide: Essentials
Copyright © 2019, SAS Institute Inc., Cary, NC, USA
PART 1 Introduction
PART 2 Syntax
Chapter 5 / Variables
  Definition of SAS Variables
  Ways to Create Variables
  Creating a New Variable in a Formatted INPUT Statement
  Manage Variables
  Variable Attributes
  Data Types
  Variable Type Conversions
  Automatic Variables
  SAS Variable Lists
  Missing Variable Values
  Numeric Precision
  Examples: Create and Modify SAS Variables
  Examples: Control Output of Variables
  Examples: Reorder and Align Variables
  Examples: Convert Variable Types
  Examples: Use Automatic Variables
  Examples: Use Variable Lists
  Examples: Manage Missing Variable Values
  Examples: Manage Problems Related to Precision
  Examples: Encrypt Variable Values
Appendix 3 / Updating Data Using the MODIFY Statement and the KEY= Option
  Updating Data Using the MODIFY Statement and the KEY= Option
Audience
This document is appropriate for all users of the Base SAS programming language.
New users can run examples to quickly demonstrate each feature. Experienced
users can refer to the associated concepts topics for advanced details.
This document covers the common concepts for Base SAS syntax. See also these
reference books:
• Base SAS Procedures Guide
PART 1
Introduction
Chapter 1
The SAS Language
Chapter 2
SAS Processing
Chapter 3
DATA Step Processing
Chapter 1
The SAS Language
The SAS System is the primary programming environment in the SAS 9.4 software
product and industry-specific solutions from SAS.
In SAS Viya products, the SAS System is one of several programming environments.
The SAS System is an addition to the action-based cloud analytic programming
environment named SAS Cloud Analytic Services (CAS). The SAS System is also a
client of the CAS server. Using the CAS engine provided with SAS Viya, you can
connect to the CAS server and use Base SAS language statements to read and
manipulate data from a CAS session in your SAS session. In addition, you can load
data from a SAS session to a CAS session. However, you cannot submit CAS
actions to a CAS server with the Base SAS language.
• applications development
SAS Files
When you work with the Base SAS language, you use files that are created and
maintained by SAS, as well as files that are created and maintained by other
systems that are not related to SAS.
Files that are created and maintained by SAS are referred to as SAS files. Table 1.1
summarizes the key SAS file types.
SAS data set
a set of variables and observations that are stored as a unit. A SAS data set is
similar to a relational table, which consists of columns and rows, except that a
SAS data set stores descriptive information about the variables and observations
in addition to the data.
SAS view
a virtual data set that extracts data values from other files in order to provide
a customized and dynamic representation of the data.
There are two types of views:
DATA step view
is a stored DATA step program.
SAS catalog
a file that stores many different types of information that are used in a SAS job.
Examples include instructions for reading and printing data values, or function
key settings for SAS user interfaces.
SAS stored program
a type of SAS file that contains compiled code that you create and save for
repeated use.
The most commonly used SAS file is the SAS data set. In most cases, the
functionality that is available for a SAS data set is available for a SAS view.
External Files
Data files that you use to read and write data, but which are in a structure unknown
to SAS, are called external files. External files can store any of the following:
• data that has not been processed by SAS
• PROC steps
A DATA step consists of a group of statements in the SAS language that create or
manipulate SAS data sets. It is called a DATA step because every step begins with a
DATA statement. The following figure shows how a DATA step creates a SAS data
set in its simplest form:
You typically use a DATA step to read data from an input source, process it, and
create a SAS data set. In addition, you can perform the following tasks:
• compute the values for new variables
• produce new SAS data sets by subsetting, merging, and updating existing data
sets
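The tasks above can be sketched in a short DATA step. This is only an illustration: it assumes that the Sashelp.Class sample data set is available in your installation, and the names Work.Teens and HeightCm are hypothetical.

```sas
/* Sketch only: Work.Teens and HeightCm are hypothetical names. */
data work.teens;
   set sashelp.class;            /* read an existing SAS data set     */
   where age >= 13;              /* subset the observations           */
   HeightCm = height * 2.54;     /* compute a value for a new variable */
run;
```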
Once your data is accessible as a SAS data set, you can analyze the data and write
reports by using SAS procedures. The following figure shows how a PROC step can
be used to operate on a SAS data set in its simplest form:
A group of procedure statements is called a PROC step. Every PROC step begins
with a statement whose name begins with the word PROC and is followed by a
keyword that describes the purpose of the procedure. Using a PROC step, you can
perform simple tasks such as sorting data in a data set (PROC SORT) and printing a
data set (PROC PRINT). In addition, you can analyze data in SAS data sets to
produce statistics, tables, reports, charts, and plots; create SQL queries; and
perform other analyses and operations on your data.
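As a hedged illustration of the PROC steps mentioned above, the following sketch sorts, prints, and summarizes a data set. It assumes that the Sashelp.Class sample data set is available; Work.Class_Sorted is a hypothetical output name.

```sas
/* Sort the data, writing the result to a new (hypothetical) data set */
proc sort data=sashelp.class out=work.class_sorted;
   by descending age;
run;

/* Print the sorted data set */
proc print data=work.class_sorted;
run;

/* Produce simple statistics for two numeric variables */
proc means data=work.class_sorted mean min max;
   var height weight;
run;
```

Each step begins with a PROC statement and ends at the RUN statement, so the three procedures run one after another on the same data.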
Global Statements
Global statements are SAS language elements that are specified outside the DATA
step and PROC step and that provide information to both DATA steps and PROC steps.
The primary use of global statements is defining data access for a SAS session. The
secondary use is setting system options that modify the default behavior of the
SAS System or one of its subsystems.
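A minimal sketch of global statements placed outside any step follows. The libref MyLib and the path are placeholders, and the title and footnote text are invented for illustration.

```sas
/* Global statements: they appear outside DATA and PROC steps */
libname mylib 'your-library-path';   /* data access (placeholder path) */
options nodate pageno=1;             /* system options                 */
title 'Class Listing';               /* applies to later output        */
footnote 'Source: Sashelp.Class';

proc print data=sashelp.class;
run;
```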
SAS Engines
Base SAS provides several engines to read and write SAS data sets and selected
external files. For a summary of engines, see Chapter 13, “SAS Engines,” on page
289.
In SAS®9 and SAS Viya, the default Base SAS engine is the V9 engine. Base SAS
software also includes an alternate storage engine, the SAS Scalable Performance
Data (SPD) engine. Most SAS language features, but not all, are supported by both
the V9 engine and the SPD engine.
The SAS language functionality described in this book applies to both storage
engines. Functionality that is specific to the SAS V9 engine is provided in SAS V9
LIBNAME Engine: Reference. Functionality that is specific to SPD engine is provided
in SAS Scalable Performance Data Engine: Reference.
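As a rough sketch, an engine can be named explicitly in the LIBNAME statement. The librefs and paths below are placeholders; V9 and SPDE are the engine names for the two storage engines described above.

```sas
/* Placeholder paths: replace with directories that exist at your site */
libname v9lib  v9   'your-v9-library-path';
libname spdlib spde 'your-spde-library-path';

/* The same DATA step code writes to either library */
data v9lib.class;
   set sashelp.class;
run;

data spdlib.class;
   set sashelp.class;
run;
```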
Additional Languages
Base SAS software includes the following SAS languages in addition to the DATA
step and PROC step. These language statements are submitted through SAS
procedures. The languages can query and manipulate files created with the V9
engine, the SPD engine, and third-party databases that are accessed with
SAS/ACCESS software.
SAS SQL (PROC SQL)
a SAS implementation of the ANSI SQL:1992 core standard. A benefit of SAS SQL is
that it is fully integrated with the SAS language: it uses SAS data types, can be
managed with SAS system options, and fully supports SAS data set options, formats,
and informats. It supports explicit SQL pass-through to an external DBMS when the
appropriate SAS/ACCESS software is licensed.
SAS SQL statements are submitted from a SAS session by using the SQL procedure.
For more information, see SAS SQL Procedure User’s Guide.
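For instance, a SAS SQL query can be submitted through the SQL procedure as in this sketch. It assumes that the Sashelp.Class sample data set is available; the column aliases are hypothetical.

```sas
proc sql;
   /* Count rows and average a numeric column within each group */
   select sex,
          count(*)    as n,
          avg(height) as avg_height format=6.1
      from sashelp.class
      group by sex;
quit;
```

Note that the FORMAT= column modifier applies a SAS format inside the query, which is one example of the integration with the SAS language described above.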
%macro mymacro;                        /* Macro language */
   <statements>;
%mend;
data mylibref.mydata;                  /* DATA step */
   <statements>;
run;
proc tabulate data=mylibref.mydata;    /* PROC step */
   <statements>;
run;
/* Procedures for other languages */
proc sql;                              /* SAS SQL */
   <statements>;
quit;
proc ds2;                              /* DS2 */
   <statements>;
run;
quit;
proc fedsql;                           /* FedSQL */
   <statements>;
quit;
1 System options determine how SAS initializes its interfaces with your computer
hardware and the operating environment, how it reads and writes data, and
other global functions. These can be set in your SAS program, or in a
configuration file.
2 The Output Delivery System controls how DATA step and procedure output is
formatted and displayed. The default output for an interactive SAS program is
HTML. Optional output formats include PostScript (PDF), an output data set,
and RTF (for Word files), among others. ODS options can be set in your SAS
program, or in a configuration file.
3 The SAS macro language enables you to define SAS programming shortcuts.
Using the SAS macro language is optional. Character values, numeric values,
and SAS statements are associated with a variable that is defined in the
%MACRO statement. Later, the variable can be specified in a DATA or PROC
step with an ampersand (&). When the SAS System encounters the macro
variable, it inserts the associated character value, numeric value, or SAS
statements and includes them in its processing. The macro facility provides an
easy way to substitute values or code segments in a SAS program. To execute a
macro, you must precede the name of the macro by a percent (%) sign (for
example, %test).
6 The PROC step analyzes the data set MyLibref.MyData and returns an output.
The title and footnote that were defined earlier are included in the output.
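The numbered notes above can be illustrated with a short program. This is a sketch under stated assumptions: the file name report.pdf, the macro variable dsname, and the macro print_first are hypothetical, and the Sashelp.Class sample data set is assumed to be available.

```sas
options nodate;                     /* a system option set in the program  */

%let dsname = sashelp.class;        /* macro variable, referenced with &   */

%macro print_first(n);              /* macro definition with one parameter */
   proc print data=&dsname (obs=&n);
   run;
%mend print_first;

ods pdf file='report.pdf';          /* ODS: also send output to a PDF file */
title 'First Rows';
%print_first(5)                     /* execute the macro with a percent sign */
ods pdf close;
```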
SAS Studio
a web-based graphical user interface that enables you to write and run SAS code
from your browser and display the results.
Documentation: Getting Started with Programming in SAS Studio and SAS Studio:
User’s Guide
SAS Windowing Environment
a desktop graphical user interface that enables you to write and run SAS code and
display the results.
Documentation: Getting Started Under Windows in SAS Companion for Windows;
Working in the SAS Windowing Environment in SAS 9.4 Companion for UNIX
Environments; Overview of Windows in the z/OS Environment in SAS 9.4 Companion
for z/OS
SAS Enterprise Guide
a Windows client application and graphical user interface for writing and
submitting SAS code. It uses menus and wizards to guide you through code
development.
Documentation: SAS Enterprise Guide Online Documentation and SAS Enterprise
Guide Tutorial
Non-interactive batch mode
a mode in which you place your SAS statements in a file and submit them for
execution along with the control statements and system commands required at your
site.
Documentation: Running SAS in Batch Mode in SAS Companion for Windows
Environments; “Noninteractive and Batch Modes in UNIX Environments” in SAS
Companion for UNIX Environments; Destinations of SAS Output Files and Directing
SAS Log and Procedure Output in SAS Companion for z/OS
Interactive line mode
a mode in which you enter program
Documentation: Use SAS Interactively or in Batch Mode in SAS Companion for
Windows Environments
When you customize your SAS session, we recommend that you set system options
as follows:
1 Store system options with the settings that you want in a configuration file.
When you start SAS, these settings are in effect. See the SAS documentation
for your operating environment for more information about the configuration
file.
In some operating environments, you can use both a system-wide and a user-
specific configuration file.
2 Specify system options in your SAS session to override the configured system
options. System options that are specified directly in your SAS session are
available for the duration of the SAS session.
By placing SAS system options in a configuration file, you can avoid having to
specify the options every time you start SAS. For example, you can specify the
NODATE system option in your configuration file to prevent the date from
appearing at the top of each page of your output.
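As a small sketch, the same option can be set for the current session with the OPTIONS statement and then checked with PROC OPTIONS. How the option is written in a configuration file depends on your operating environment, so that part is not shown here.

```sas
/* Override the configured setting for the rest of this SAS session */
options nodate;

/* Write the current setting of the DATE option to the SAS log */
proc options option=date;
run;
```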
To execute SAS statements automatically each time you start SAS, store them in
an autoexec file. SAS executes the statements automatically after the system is
initialized. You can activate this file by specifying the AUTOEXEC= system option.
See the SAS documentation for your operating environment for information about
how autoexec files should be set up so that they can be located by SAS.
Any SAS statement can be included in an autoexec file. For example, you can use an
autoexec file to set report titles and footnotes or to create macros and macro
variables automatically.
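The contents of an autoexec file might look like the following sketch. Every name here (the libref Proj, the path, the title and footnote text, and the macro variable report_year) is hypothetical.

```sas
/* Hypothetical autoexec file: these statements run when SAS starts */
libname proj 'your-project-library-path';   /* placeholder path */
title 'Weekly Quality Report';
footnote 'Internal use only';
%let report_year = 2024;                    /* macro variable for later use */
```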
on the host operating environment. The characteristics of external files also vary
according to the host environment. See the following for host-specific information:
• SAS Companion for UNIX Environments
Chapter 2
SAS Processing
The following figure shows a high-level view of SAS processing using a DATA step
and a PROC step. The figure focuses primarily on the DATA step.
• In the DATA step, you include SAS statements that contain instructions for
processing the data.
• As each DATA step in a SAS program is compiling or executing, SAS generates a
log that contains processing messages and error messages. These messages can
help you debug a SAS program.
• PROC steps typically analyze and process data in the form of a SAS data set,
but they can sometimes be used to create SAS data sets.
Remote access
enables you to read input data from nontraditional sources such as a TCP/IP
socket or a URL. SAS treats this data as if it were coming from an external file.
Remote access is available for the following data sources:
Azure
specifies the access method that enables you to access data in Microsoft
Azure Data Lake Storage. The Azure access method is supported only for
SAS Viya.
SAS catalog
specifies the access method that enables you to reference a SAS catalog as
an external file.
Clipboard
specifies the access method that enables you to read or write text data to
the clipboard on the host computer.
DATAURL
specifies the access method that enables you to access remote files by using
the DATAURL access method.
EMAIL
specifies the access method that enables you to send electronic mail
programmatically from SAS using the SMTP (Simple Mail Transfer Protocol)
email interface.
FILESRV
specifies the access method that enables you to store and retrieve user
content using the SAS Viya Files service. The FILESRV access method is
supported only for SAS Viya.
FTP
specifies the access method that enables you to use File Transfer Protocol
(FTP) to read from or write to a file from any host computer that is
connected to a network with an FTP server running.
Hadoop
specifies the access method that enables you to access files on a Hadoop
Distributed File System (HDFS) whose location is specified in a
configuration file.
S3
specifies the access method that enables you to access Amazon S3 files.
The S3 access method is supported only for SAS Viya.
SFTP
specifies the access method that enables you to use Secure File Transfer
Protocol (SFTP) to read from or write to a file from any host computer that
is connected to a network with an OpenSSH SSHD server running.
TCP/IP socket
specifies the access method that enables you to read from or write to a
Transmission Control Protocol/Internet Protocol (TCP/IP) socket.
URL
specifies the access method that enables you to use the uniform resource
locator (URL) to read from and write to a file from any host computer that is
connected to a network with a URL server running.
WebDAV
specifies the access method that enables you to use the WebDAV protocol
to read from or write to a file from any host computer that is connected to a
network with a WebDAV server running.
ZIP
specifies the access method that enables you to access ZIP files by using
zlib services.
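An access method is named in the FILENAME statement, after which the fileref can be read like any external file. The sketch below uses the URL and ZIP access methods; the URL, the archive name, the member name, and the variable layout are all hypothetical.

```sas
/* URL access method: the address is a hypothetical example */
filename webdata url 'https://example.com/data.txt';

data work.from_url;
   infile webdata;
   input IDnumber $ week1 week16;
run;

/* ZIP access method: archive and member names are placeholders */
filename zipdata zip 'your-archive.zip' member='data.txt';

data work.from_zip;
   infile zipdata;
   input IDnumber $ week1 week16;
run;
```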
For more information about these output types, see Chapter 27, “Output,” on page
669.
General Information
If you are running SAS in a client/server environment (for example, you are using
SAS Enterprise Guide), the SAS server administrator can restrict access to files and
directories on the host system. In addition, when a SAS session is in a locked-down
state, certain access methods, functions, CALL routines, and procedures are
restricted by default. For more information, see “Sign On to Locked-Down SAS
Sessions” in SAS/CONNECT User’s Guide.
When SAS is in a locked-down state, the following SAS language elements are not
available by default:
Functions and CALL Routines
PEEK function
PEEKC function
PEEKCLONG function
PEEKLONG function
Access Methods
FTP
EMAIL
HADOOP (enables PROC HADOOP)
HTTP (enables PROC HTTP and PROC SOAP)
SOCKET
TCPIP
URL (enables PROC HTTP and PROC SOAP)
If you attempt to use a resource that is locked down, SAS issues an error message
to the SAS log. If the SAS session is configured for the SAS logging facility, SAS
issues an error message to the Audit.Lockdown logger.
For more information, see the following resources. For SAS 9.4:
• LOCKDOWN System Option and LOCKDOWN Statement in SAS Intelligence
Platform: Application Server Administration Guide
• Locked-Down Servers in SAS Intelligence Platform: Security Administration
Guide
For Viya 3.5, see “LOCKDOWN System Option and LOCKDOWN Statement” in SAS
Viya Administration: Programming Run-Time Servers.
To see the procedures that do not execute when the SAS server is in a locked-down
state, see “Restrictions” and “Interactions” syntax information for the individual
procedures in the Base SAS Procedures Guide.
Restricted Features
Access to permanent z/OS data sets and UFS files and directories is not permitted
unless enabled in the lockdown list. This restriction applies to all SAS features,
most notably FILENAME and LIBNAME statements in SAS programs that are
submitted for execution on the server. This restriction also applies to the ability to
list files on the server through SAS clients such as SAS Enterprise Guide. When
SAS is in the locked-down state, SAS does not permit access to uncataloged z/OS
data sets except through externally allocated ddnames that are established by the
server administrator. However, there are no restrictions on creating temporary z/OS
data sets and UFS files, and processing them within the context of a single client
session. The z/OS data sets are considered temporary if they are allocated
DISP=(NEW,DELETE). External files are considered temporary if they are assigned
using the FILENAME device of TEMP. All members of the client WORK library are
considered temporary.
The SAS server administrator at your organization is responsible for the content of
the lockdown list. Therefore, if you need to access a z/OS data set or UFS file that
is unavailable in the locked-down state, contact your server administrator.
Disabled Features
The following SAS procedures, which are specific to z/OS, cannot be executed
when SAS is in the locked-down state:
PDS SOURCE
PDSCOPY TAPECOPY
RELEASE TAPELABEL
The following DATA step functions, which are specific to z/OS, cannot be executed
when SAS is in the locked-down state:
ZVOLLIST ZDSATTR
ZDSLIST ZDSRATT
ZDSNUM ZDSXATT
ZDSIDNM ZDSYATT
The following access method, which is specific to z/OS, cannot be executed when
SAS is in the locked-down state:
VTOC
The SYSMSG function can be placed after the function call in a DATA step to
display lockdown-related file access errors.
This condition is true for the following functions, as well as for any other functions
that take physical pathname locations as input:
• DCREATE
• FILEEXIST
• FILENAME
• RENAME
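A minimal sketch of this pattern follows, using FILEEXIST as the function that takes a physical pathname. The path is a placeholder, and the variable names rc and msg are arbitrary.

```sas
data _null_;
   /* Placeholder path: replace with a location on your server */
   rc = fileexist('your-path/your-file.txt');

   /* SYSMSG returns the text of the most recent message, if any */
   msg = sysmsg();
   if msg ne ' ' then put 'Message: ' msg;

   put rc=;
run;
```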
Chapter 3
DATA Step Processing
You can use the DATA step to perform the following tasks:
• subset, modify, combine, and update existing SAS data sets
• write reports
• retrieve information
The following DATA statement creates a data set named Test_Results:
data test_results;
• data that you can remotely access through a SAS catalog entry, clipboard, data
URL, email, FTP protocol, Hadoop Distributed File System, TCP/IP socket, URL,
WebDAV protocol, or through zlib services
• data that is stored in a Database Management System (DBMS) or other vendor's
data files
Usually, DATA steps read input data records from only one of the first three sources
of input. However, DATA steps can use a combination of some or all of the sources.
Example Code
The components of a DATA step that produce a SAS data set from raw data stored
in an external file are outlined here.
data Weight;                          /* 1 */
   infile 'your-input-file';          /* 2 */
   input IDnumber $ week1 week16;     /* 3 */
   WeightLoss=week1-week16;           /* 4 */
run;                                  /* 5 */
proc print data=Weight;               /* 6 */
run;                                  /* 7 */
1 Begin the DATA step and create the SAS data set Weight.
2 Specify the external file that contains your data.
3 Read a record and assign values to three variables.
4 Calculate a value for the variable WeightLoss.
5 Execute the DATA step.
6 Print the data set Weight using the PRINT procedure.
7 Execute the PRINT procedure.
Key Ideas
• The DATA statement specifies a name for the SAS data set.
• The INFILE statement specifies the path to the input file, within quotation marks.
See Also
Concepts
• Chapter 15, “Raw Data,” on page 353
Reference
• “DATA Statement” in SAS DATA Step Statements: Reference
• “INFILE Statement” in SAS DATA Step Statements: Reference
• “INPUT Statement” in SAS DATA Step Statements: Reference
Example Code
This example reads raw data from instream data lines.
data Weight2;                         /* 1 */
   input IDnumber $ week1 week16;     /* 2 */
   WeightLoss2=week1-week16;          /* 3 */
   /* 4 */
   datalines;
2477 195 163
2431 220 198
2456 173 155
2412 135 116
;                                     /* 5 */
proc print data=Weight2;              /* 6 */
run;
1 Begin the DATA step and create the SAS data set Weight2.
2 Read a data line and assign values to three variables.
3 Calculate a value for the variable WeightLoss2.
4 Begin the data lines.
5 Signal the end of data lines with a semicolon and execute the DATA step.
6 Print the data set Weight2 using the PRINT procedure.
Key Ideas
• Values for the variables are supplied following a DATALINES statement as data
lines.
See Also
Concepts
• Chapter 15, “Raw Data,” on page 353
Reference
• “DATALINES Statement” in SAS DATA Step Statements: Reference
• “INPUT Statement” in SAS DATA Step Statements: Reference
Example Code
You can also take advantage of options in the INFILE statement when you read
instream data lines. This example shows the use of the MISSOVER option, which
assigns missing values to variables for records that contain no data for those
variables.
data weight2;                         /* 1 */
   infile datalines missover;         /* 2 */
   input IDnumber $ Week1 Week16;     /* 3 */
   WeightLoss2=Week1-Week16;
   /* 4 */
   datalines;
2477 195 163
2431
2456 173 155
2412 135 116
;                                     /* 5 */
1 Begin the DATA step and create the SAS data set Weight2.
2 Specify the INFILE statement with the DATALINES file specification and the
MISSOVER option.
3 Read a data line and assign values to three variables.
4 Begin the data lines.
5 Signal the end of data lines with a semicolon and execute the DATA step.
Key Ideas
• The INFILE statement provides options that control what SAS does when the INPUT
statement does not find values in the current input line for all the variables
that it reads.
• The MISSOVER option can be used with the INFILE statement to assign missing
values to variables that do not contain values in records.
See Also
Concepts
• Chapter 15, “Raw Data,” on page 353
Reference
• “INFILE Statement” in SAS DATA Step Statements: Reference
Example Code
This example shows how to use multiple input files as instream data to your
program.
data all_errors;                                 /* 1 */
   length filelocation $ 60;                     /* 2 */
   input filelocation;                           /* 3 */
   infile daily filevar=filelocation             /* 4 */
          filename=daily                         /* 5 */
          end=done;                              /* 6 */
   do while (not done);                          /* 7 */
      input Station $ Shift $ Employee $ NumberOfFlaws;
      output;
   end;
   put 'Finished reading ' daily=;
   /* 8 */
   datalines;
pathmyfile_A
pathmyfile_B
pathmyfile_C
;
proc sort data=all_errors out=sorted_errors;     /* 9 */
   by Station;
run;
proc print data=sorted_errors;                   /* 10 */
run;
1 Begin the DATA step and create the SAS data set All_Errors.
2 Use the LENGTH statement to specify a length for the variable Filelocation.
3 Use the INPUT statement to read the content of the Filelocation variable.
4 Use the INFILE statement to define temporary variables for the input data. In
the FILEVAR= option, specify the Filelocation variable. The FILEVAR= option
enables you to specify a physical filename whose change in value causes the
INFILE statement to close the current input file and open a new one. When the
next INPUT statement executes, it reads from the new file that the FILEVAR=
variable specifies.
5 In the FILENAME= option of the INFILE statement, specify a variable that SAS
sets to the physical name of the currently opened input file. In this example,
the variable is named Daily.
6 Use the END= option of the INFILE statement to specify a variable that SAS
sets to 1, when the current input data record is the last in the input file. In this
example, the name of the variable is Done. Until SAS processes the last data
record, the END= variable is set to 0.
7 Use the DO WHILE statement to execute the INPUT statement in a DO loop
repetitively while the NOT DONE condition is true. The INPUT statement reads
the input data and assigns values to four variables.
8 Specify the file pathnames following the DATALINES statement.
9 The program then sorts the observations by Station, and creates a sorted data
set called Sorted_Errors.
10 The print procedure prints the results.
Key Ideas
The DATALINES statement does not provide input options for reading data. However,
you can specify INFILE options to manipulate how data is supplied to the DATALINES
statement.
See Also
Concepts
• Chapter 15, “Raw Data,” on page 353
Reference
• “DO WHILE Statement” in SAS DATA Step Statements: Reference
• “INFILE Statement” in SAS DATA Step Statements: Reference
Reading Data from SAS Data Sets
Example Code
This example reads data from one SAS data set, generates a value for a new
variable, and creates a new data set.
data average_loss;                              /* 1 */
   set weight;                                  /* 2 */
   Percent=round((WeightLoss * 100) / Week1);   /* 3 */
run;                                            /* 4 */
1 Begin the DATA step and create the SAS data set Average_Loss.
2 Specify the name of the existing SAS data set Weight in the SET statement.
3 Calculate a value for the variable Percent.
4 Execute the DATA step.
Key Ideas
The SET statement enables you to create a new SAS data set from an existing SAS
data set.
See Also
Concepts
• Chapter 14, “SAS Data Sets,” on page 321
Reference
• “SET Statement” in SAS DATA Step Statements: Reference
Example Code
1 Begin the DATA step and create a SAS data set Investment.
2 Calculate a value based on a $2,000 capital investment and 7% interest each
year from 1990 through 2009. Calculate variable values for one observation per
iteration of the DO loop.
3 Write each observation to the data set Investment.
4 Write a note to the SAS log proving that the DATA step iterates only once.
5 Execute the DATA step.
6 To see your output, print the Investment data set with the PRINT procedure.
7 Use the FORMAT statement to write numeric values with dollar signs, commas,
and decimal points.
8 Execute the PRINT procedure.
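The program that these notes describe does not appear above. The following is a hedged reconstruction from the notes, not the original listing: the variable names begin_date, end_date, year, and capital, the dates, the log message text, and the DOLLAR12.2 format are assumptions chosen to match the description.

```sas
data investment;                                        /* 1 */
   begin_date = '01JAN1990'd;       /* assumed start date */
   end_date   = '31DEC2009'd;       /* assumed end date   */
   do year = year(begin_date) to year(end_date);        /* 2 */
      /* Sum statement: add $2,000 plus 7% interest on the new balance */
      capital + 2000 + 0.07 * (capital + 2000);
      output;                                           /* 3 */
   end;
   put 'The number of DATA step iterations is ' _n_;    /* 4 */
run;                                                    /* 5 */

proc print data=investment;                             /* 6 */
   format capital dollar12.2;                           /* 7 */
run;                                                    /* 8 */
```

Because the step reads no input, it iterates only once, and the OUTPUT statement inside the DO loop is what writes one observation per year.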
Key Ideas
• You can use assignment statements and SAS language statements to create
calculated variables.
• This example defines beginning and end dates for the calculation with assignment
statements. Then it uses a DO loop and a SUM statement to define the calculation.
The SAS data set Investment contains the results of the calculation.
See Also
Concepts
• Chapter 14, “SAS Data Sets,” on page 321
• Chapter 12, “SAS Libraries,” on page 247
• Chapter 13, “SAS Engines,” on page 289
• Chapter 16, “Database and PC Files,” on page 385
Reference
• “Assignment Statement” in SAS DATA Step Statements: Reference
• “DO WHILE Statement” in SAS DATA Step Statements: Reference
• “Sum Statement” in SAS DATA Step Statements: Reference
The two phases do not occur simultaneously: that is, the DATA step compiles first,
and then it executes. For more information about these phases, see “How the DATA
Step Processes Data” on page 863.
Understanding these processing times and how they relate to the structure of your
SAS programs might be helpful when you are looking for ways to improve
performance. In general, the more statements a DATA step processes, the longer
the compilation time. Conversely, DATA steps that process large numbers of
observations tend to have longer execution times because they are more
I/O-intensive.
For example, a very large DATA step job that is not I/O-intensive (that is, it must
process a relatively small number of observations) might need to be rewritten to
reduce complexity and to eliminate repetitive and unused code. DO loops and user-
defined functions created with PROC FCMP can reduce compilation time by
decreasing the amount of code that must be compiled. For more information about
how to improve performance when running CPU-intensive programs, see
“Techniques for Optimizing CPU Performance” on page 647.
If most of the time used by the DATA step is for processing hundreds of
observations, then other techniques designed to optimize I/O might be more useful.
For more information about how to improve performance when running I/O-
intensive programs, see “Techniques for Optimizing I/O” on page 638.
Several SAS system options provide information that can help you minimize
processing time and optimize performance. For example, the FULLSTIMER option
collects and displays performance statistics on each DATA step so that you can
determine which resources were used for each step of data processing. For more
information about this option and about optimization in general, see Chapter 25,
“Optimizing System Performance,” on page 635.
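As a small sketch, FULLSTIMER can be turned on around a step whose resource use you want to inspect in the SAS log. The data set name Work.Squares and the loop size are arbitrary choices for illustration.

```sas
options fullstimer;        /* write detailed resource statistics to the log */

data work.squares;
   do i = 1 to 1000000;
      square = i * i;
      output;
   end;
run;

options nofullstimer;      /* return to the shorter set of statistics */
```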
The following example shows how to estimate the compilation time for a very large
DATA step job that has a small number of observations. The program uses the
DATETIME function with the %PUT macro statement to calculate the compilation
start time. It then uses the _N_ automatic variable to find the execution start time
(SAS always sets this variable to 1 at the start of the execution phase). By
calculating the difference between the two times, the program returns the total
compilation time of the DATA step.
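/* Macro statements run before the DATA step is compiled */
%put Starting compilation of DATA step: %sysfunc(datetime(), datetime20.3);
%let startTime=%sysfunc(datetime());
data a;
   if _N_ = 1 then do;
      endTime = datetime();
      put 'Starting execution of DATA step: ' endTime:DATETIME20.3;
      timeDiff = endTime - &startTime;
      put 'The Compile time for this DATA Step is approximately ' timeDiff:time20.6;
   end;
   /* Lots of DATA step code */
run;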
Example Code 3.1 Log for Finding Compilation and Execution Time
Note: Macro statements and macro variables are resolved at compilation time and
do not affect the time it takes to execute the DATA step. For information, see
“Getting Started with the Macro Facility” in SAS Macro Language: Reference, and
“SAS Programs and Macro Processing” in SAS Macro Language: Reference.
Stored Compiled DATA Step Programs
Stored compiled programs are available for the V9 engine and DATA step
applications only. For more information, see “Stored, Compiled DATA Step
Programs” in SAS V9 LIBNAME Engine: Reference.
PART 2
Syntax
Chapter 4: Words and Names
Chapter 5: Variables
Chapter 6: Data Types
Chapter 7: SAS Expressions
Chapter 8: SAS Constants
Chapter 9: Operators
Chapter 10: Dates and Times
Chapter 11: Component Objects
Chapter 4
Words and Names
SAS Words
  Definition of a SAS Word
  Types of Words or Tokens
  Placement and Spacing of Words
SAS Names
  Definition of a SAS Name
  Rules for Most SAS Names
  Length Rules for Names
  Extending SAS Names
  See Also
SAS Name Literals
  Definition of SAS Name Literals
  Important Restrictions
  Using Name Literals in BY Groups
  Avoiding Errors When Using Name Literals
  See Also
Variable Names
  Rules for Variable Names
  Extended Rules for SAS Variables
  See Also
Member Names
  Rules for Member Names
  Extended Rules for SAS Member Names
  See Also
Data Set Names
  Definition of a Data Set Name
  Structure of a Data Set Name
  Special Data Set Names
  Rules for Naming SAS Data Sets and Variables
  Extended Rules for Naming SAS Data Sets and Variables
  See Also
Examples: SAS Words and Names
  Example: Create a Variable Name Containing Blanks
  Example: Create a SAS Data Set Name Containing a Special Character
  Example: Manage Placement and Spacing of Words
  Example: Specify the FIRST. and LAST. BY Variables as Name Literals
SAS Words
_new
yearcutoff
year_99
descending
_n_
literal
consists of 1 to 32,767 bytes enclosed in single or double quotation marks. Here
are some examples of literals:
• 'Chicago'
• "1990-91"
• 'Amelia Earhart'
Note: The surrounding quotation marks identify the character string as a literal,
but SAS does not store these marks as part of the literal string.
number
in general, is composed entirely of numeric digits, with an optional decimal point
and a leading plus or minus sign. SAS recognizes numeric values in the following
forms as number tokens: scientific (E−) notation, hexadecimal notation, missing
value symbols, and date and time literals. Here are some examples of numbers:
• 5683
• 2.35
• 0b0x
• -5
• 5.4E-1
• '24aug90'd
special character
is usually any single keyboard character other than letters, numbers, the
underscore, and the blank. In general, each special character is a single word,
although some two-character operators, such as ** and <=, form a single word.
The blank can end a name or a number, but it is not a word. Here are some
examples of special characters:
• =
• ;
• '
• +
• @
• /
Placement and Spacing of Words
• You can begin a statement on one line and continue it on another line, but you
  cannot split a word between two lines.
• A blank is not treated as a character in a SAS statement unless it is enclosed in
  quotation marks as a literal or part of a literal. Therefore, you can put multiple
  blanks any place in a SAS statement where you can put a single blank. It has no
  effect on the syntax.
• The rules for recognizing the boundaries of words or tokens determine the use
  of spacing between them in SAS programs. If SAS can determine the beginning
  of each token due to cues such as operators, you do not need to include blanks.
  If SAS cannot determine the beginning of each token, you must use blanks.
SAS Names
• user-supplied names
Note: The rules are more flexible for SAS variable names, data set names, view
names, and item store names than for other language elements. See Rules for
Variable Names and Rules for Member Names.
• The length of a SAS name depends on which element it is assigned to. Many SAS
  names can be 32 bytes long; others have a maximum length of 8 bytes. For a list
  of SAS names and their maximum length, see Table 4.1.
• The first character must be an English letter (A, a, B, b, C, c, ..., Z, z) or an
  underscore (_). Subsequent characters can be letters, numeric digits (0, 1, ..., 9),
  or underscores.
• Special characters, except for the underscore, are not allowed. In filerefs only,
  you can use the dollar sign ($), the number sign (#), and the at sign (@).
• SAS reserves a few names for automatic variables and variable lists, SAS data
  sets, and librefs.
  ◦ When creating variables, do not use the names of special SAS automatic
    variables (for example, _N_ and _ERROR_) or special variable list names (for
    example, _CHARACTER_, _NUMERIC_, and _ALL_).
  ◦ When associating a libref with a SAS library, do not use these libref names:
    • Sashelp
    • Sasmsg
    • Sasuser
    • Work
  ◦ When you create SAS data sets, do not use these names:
    • _NULL_
    • _DATA_
    • _LAST_
• When assigning a fileref to an external file, do not use the file name SASCAT.
• When you create a macro variable, do not use names that begin with SYS.
Language Element Names    Maximum Length in Bytes
Array names               32
Catalog names             32
Engine names              8
Fileref names             8
Function names            16
Index names               32
Macro names               32
Password names            8
View names                32
Window names              32
1. Some examples of SAS library member names are data set names, view names, and catalog names. For more
information, see “Member Types” in Base SAS Procedures Guide.
When these system option values are set, the maximum number of characters that
you can use for a SAS member name is determined by the number of bytes that are
used to store one character. This value is set by the SAS encoding value for your
SAS session. For more information, see “Determining the Encoding of a SAS Data
Set” in SAS National Language Support (NLS): Reference Guide.
VALIDVARNAME=ANY or VALIDMEMNAME=EXTEND must be set to allow the
use of National Language Support (NLS) characters.
When these system options are not set to ANY or EXTEND, the default session
encoding is used and only one-byte characters are allowed. For more information
about the default session encoding in SAS, see “Default SAS Session Encoding” in
SAS National Language Support (NLS): Reference Guide.
Note: A SAS member name includes names of data sets, DATA step views, PROC
SQL views, catalogs, stored, compiled DATA step programs, and item stores.
The SAS encodings for western languages use one byte of storage to store one
character. Therefore, in western languages, you can use 32 characters for these SAS
names. The SAS encodings for some Asian languages use one to two bytes of
storage to store one character. The Unicode encoding, UTF-8, supports one to four
bytes of storage for a single character. When the SAS encoding uses four bytes to
store one character, the maximum length of one of these SAS names is eight
characters.
All SAS encodings support the characters A–Z and a–z as one-byte characters.
Follow these instructions for finding the maximum number of characters that can
be used for a SAS name:
2 In the table “SBCS, DBCS, and Unicode Encoding Values Used to Transcode
Data,” find the maximum number of bytes per character for the SAS encoding.
This table is in SAS National Language Support (NLS): Reference Guide.
3 Find the maximum number of bytes for a SAS name from Table 4.1. Divide this
number by the bytes per character. The result is the maximum number of
characters that you can use for the SAS name.
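The division in the last step can be sketched in a short DATA step. The values are assumptions for illustration: 32 bytes is the limit for many names in Table 4.1, and 4 bytes per character is the worst case described above for UTF-8.

```sas
data _null_;
   maxBytes     = 32;   /* maximum length in bytes for the name, from Table 4.1 */
   bytesPerChar = 4;    /* assumed maximum bytes per character for the encoding */
   maxChars     = floor(maxBytes / bytesPerChar);
   put maxChars=;       /* writes maxChars=8 to the log */
run;
```

With a single-byte encoding, setting bytesPerChar to 1 gives the full 32 characters.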
See Also
Examples
• “Example: Create a Variable Name Containing Blanks”
• “Example: Manage Placement and Spacing of Words”
Statements
• “OPTIONS Statement” in SAS Global Statements: Reference
System Options
• “VALIDMEMNAME= System Option” in SAS System Options: Reference
• “VALIDVARNAME= System Option” in SAS System Options: Reference
When the VALIDVARNAME= system option is set to V7, the first character of a SAS
variable name must be an English letter (A, a, B, b, C, c, ..., Z, z) or an underscore
(_). Subsequent characters can be letters, numeric digits (0, 1, ..., 9), or
underscores.
Name literals enable you to use many more characters in the name, including
blanks, national characters, and numerals as the first character in the name. To use
a numeral as the first character in the name, however, you must set the
VALIDVARNAME= option to ANY:
options validvarname=any;
data test;
"1ABC"n="hello";
run;
• DBMS table
• item store
• SAS view
• statement label
To use characters in a name literal other than _, A–Z, or a–z, you must set either the
VALIDVARNAME=ANY or the VALIDMEMNAME=EXTEND system option. The
following table specifies the options that you must set to use SAS name literals.
Name literals are especially useful for expressing DBMS column and table names
that contain special characters and for including national characters in SAS names.
Important Restrictions
• You can use a name literal only for variables, statement labels, DBMS column
  and table names, SAS data sets, SAS views, and item stores.
• When the name literal of a SAS data set name, a SAS view name, or an item
  store name contains any characters that are not allowed when
  VALIDMEMNAME=COMPAT, then you must set the system option
  VALIDMEMNAME=EXTEND. See “VALIDMEMNAME= System Option” in SAS
  System Options: Reference.
• When the name literal of a variable, DBMS table, or DBMS column contains any
  characters that are not allowed when VALIDVARNAME=V7, then you must set
  the system option VALIDVARNAME=ANY. See “VALIDVARNAME= System
  Option” in SAS System Options: Reference.
• If you use either the percent sign (%) or the ampersand (&), then you must use
  single quotation marks in the name literal in order to avoid interaction with the
  SAS Macro Facility.
• When the name literal of a DBMS table or column contains any characters that
  are not valid for SAS rules, you might need to specify a SAS/ACCESS LIBNAME
  statement option. For more details and examples about the SAS/ACCESS
  LIBNAME statement and about using DBMS table and column names that do
  not conform to SAS naming conventions, see SAS/ACCESS for Relational
  Databases: Reference.
• In a quoted string, SAS preserves and uses leading blanks, but SAS ignores and
  trims trailing blanks.
• Blanks between the closing quotation mark and the n are not valid when you
  specify a name literal.
• Note that even if you set VALIDVARNAME=ANY, the V6 engine does not
  support names that have intervening blanks.
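The restriction about the percent sign and the ampersand can be sketched as follows. The data set name, variable names, and values are hypothetical. The name literals use single quotation marks so that the macro facility does not try to resolve the % and & characters.

```sas
options validvarname=any;      /* required for special characters in variable names */

data work.rates;               /* hypothetical data set used only for illustration */
   'Growth %'n  = 4.2;         /* single quotation marks keep % away from the macro facility */
   'R&D Spend'n = 1500;        /* single quotation marks keep &D from being read as a macro variable */
run;
```

Writing the second name with double quotation marks would let the macro processor treat &D as a macro variable reference, which is the interaction that the restriction is meant to prevent.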
See Also
Examples
• “Example: Create a Variable Name Containing Blanks”
• “Example: Specify the FIRST. and LAST. BY Variables as Name Literals”
Statements
• “BY Statement” in SAS DATA Step Statements: Reference
• “OPTIONS Statement” in SAS Global Statements: Reference
System Options
• “VALIDMEMNAME= System Option” in SAS System Options: Reference
• “VALIDVARNAME= System Option” in SAS System Options: Reference
Variable Names
The setting of the VALIDVARNAME= system option determines which rules apply to the variables that you create in your SAS session,
as well as to variables that you want to read from existing data sets.
The VALIDVARNAME= system option has three settings (V7, UPCASE, and ANY),
each with varying degrees of flexibility for variable names. If you do not specify the
VALIDVARNAME= system option in your SAS session, the default value, V7, is
automatically assigned to your SAS session.
The table below shows a summary of the rules for naming SAS variables when the
VALIDVARNAME= system option is set to V7. These are the default settings in
Base SAS. In some SAS applications, such as SAS Visual Analytics and SAS Cloud
Analytic Services, the VALIDVARNAME= system options are set by default to allow
the most flexibility for naming SAS variables. These rules are summarized in Table
4.5.
Variable Names
(with VALIDVARNAME=V7)
• cannot contain special characters (except for the underscore), blanks, or national
  characters.
• must begin with a letter of the Latin alphabet (A–Z, a–z) or the underscore.
• can contain mixed-case letters. SAS stores and writes the variable name in the
  same case that is used in the first reference to the variable.
  Note that SAS internally converts all variable names to uppercase.
  For example, the names cat, Cat, and CAT all represent the same variable.
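The case rule can be sketched with a short step. The data set name is hypothetical; the three spellings below all refer to one variable, and the name is stored as Cat because that is the case used in the first reference.

```sas
data work.pets;        /* hypothetical data set used only for illustration */
   Cat = 1;            /* first reference: the name is stored as Cat */
   cat = cat + 1;      /* same variable, now 2 */
   CAT = CAT * 10;     /* same variable, now 20 */
   put Cat=;           /* writes Cat=20 to the log */
run;
```

The resulting data set contains a single variable, not three.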
Variable Names
(with VALIDVARNAME=UPCASE)
• cannot contain special characters (except for the underscore), blanks, or national
  characters.
• must begin with a letter from the Latin alphabet (A–Z, a–z) or the underscore.
• can contain mixed-case letters. SAS stores and writes the variable name in the
  same case that is used in the first reference to the variable.
  Note that SAS internally converts all variable names to uppercase.
  For example, the names cat, Cat, and CAT all represent the same variable.
The following table shows a summary of the rules for naming SAS variables when
the highest level of flexibility is allowed (that is, when the VALIDVARNAME=
system option is set to ANY).
Variable Names
(with VALIDVARNAME=ANY)
• can contain special characters including / \ * ? " < > | : - . A name that contains
  special characters must be specified as a name literal.
• can begin with any character, including blanks, national characters, special
  characters, and multibyte characters.
• preserves leading blanks, but trailing blanks are ignored.
• can contain mixed-case letters. SAS stores and writes the variable name in the
  same case that is used in the first reference to the variable.
  Note that SAS converts all variable names to uppercase.
  For example, the names cat, Cat, and CAT all represent the same variable.
• cannot contain all blanks.
IMPORTANT Throughout SAS, using the name literal syntax with variable names
that exceed the 32-byte limit or have excessive embedded quotation marks might
cause unexpected results. The intent of the VALIDVARNAME=ANY system option is
to enable compatibility with other DBMS variable (column) naming conventions, such
as allowing embedded blanks and national characters.
Note: If you use any characters other than letters, numerals, or underscores in a
variable name, then you must express the variable name as a name literal and you
must set VALIDVARNAME=ANY, because VALIDVARNAME=V7 does not allow these
characters. If the name includes either the percent sign
(%) or the ampersand (&), then you must use single quotation marks in the name
literal in order to avoid interaction with the SAS Macro Facility. See SAS Name
Literals and Avoiding Errors When Using Name Literals.
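One way to check a candidate name against these rules is the NVALID function, which returns 1 when a string is a valid variable name under the specified setting and 0 otherwise. The following is a sketch only; the candidate names are arbitrary examples.

```sas
data _null_;
   /* Names tested under the default V7 rules */
   v7_plain  = nvalid('year_99', 'v7');      /* letters, digits, underscore */
   v7_digit  = nvalid('1ABC', 'v7');         /* begins with a numeral */
   v7_blank  = nvalid('First Name', 'v7');   /* contains a blank */
   /* The same names tested under the ANY rules */
   any_digit = nvalid('1ABC', 'any');
   any_blank = nvalid('First Name', 'any');
   put v7_plain= v7_digit= v7_blank= any_digit= any_blank=;
run;
```

Under the rules summarized above, the first V7 test should return 1, the other two V7 tests should return 0, and both ANY tests should return 1.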
See Also
Examples
• “Example: Create a Variable Name Containing Blanks”
• “Example: Specify the FIRST. and LAST. BY Variables as Name Literals”
• “Example: Create a SAS Data Set Name Containing a Special Character”
Statements
• “OPTIONS Statement” in SAS Global Statements: Reference
System Options
• “VALIDMEMNAME= System Option” in SAS System Options: Reference
• “VALIDVARNAME= System Option” in SAS System Options: Reference
Member Names
The rules for member names have expanded to provide more functionality. The
setting of the VALIDMEMNAME= system option determines which rules apply to
the member names that you create in your SAS session. See “VALIDMEMNAME=
System Option” in SAS System Options: Reference. The VALIDMEMNAME= system
option has two settings (COMPATIBLE and EXTEND), each with a different degree
of flexibility for member names. If you do not specify the VALIDMEMNAME=
system option in your SAS session, the default is COMPATIBLE.
Table 4.6 Summary of Default Rules for Naming SAS Member Names
Member Names
(with VALIDMEMNAME=COMPATIBLE)
• Names must begin with a letter from the Latin alphabet (A–Z, a–z) or an
  underscore. Subsequent characters can be letters from the Latin alphabet,
  numerals, or underscores.
• Names cannot contain blanks or special characters except for the underscore.
• Names can contain mixed-case letters. SAS internally converts the member name
  to uppercase. Note that the names customer, Customer, and CUSTOMER all
  represent the same member name.
Table 4.7 Summary of Extended Rules for Naming SAS Member Names
Member Names
(with VALIDMEMNAME=EXTEND)
• The name can include special characters, except for the / \ * ? " < > | : - . characters.
  Note: The SPD Engine does not allow ‘.’ (the period) anywhere in the member
  name.
• The name must contain at least one character (letters, numbers, valid special
  characters, and national characters).
• The length of the name can be up to 32 bytes.
• Null bytes are not allowed. Names cannot begin with a blank or a ‘.’ (the period).
  Note: The SPD Engine does not allow ‘$’ as the first character of the member
  name.
• Leading and trailing blanks are deleted when the member is created.
• Names can contain mixed-case letters. SAS internally converts the member name
  to uppercase. Note that the names customer, Customer, and CUSTOMER all
  represent the same member name.
IMPORTANT Throughout SAS, using the name literal syntax with member names
that exceed the 32-byte limit or have excessive embedded quotation marks might
cause unexpected results. The intent of the VALIDMEMNAME=EXTEND system option is
to enable compatibility with other DBMS member naming conventions, such
as allowing embedded blanks and national characters.
Note: Regardless of the value of the VALIDMEMNAME= system option, a member
name cannot end in the special character # followed by three digits. This is because it
would conflict with the naming conventions for generation data sets. Using such a
member name results in an error.
Note: When VALIDMEMNAME=EXTEND, SAS data set names, SAS data view
names, and item store names must be written as a SAS name literal if the name
includes blank spaces, special characters, or national characters. If you use either the
percent sign (%) or the ampersand (&), then you must use single quotation marks in
the name literal in order to avoid interaction with the SAS Macro Facility. For more
information, see “SAS Name Literals”.
See Also
Statements
• “OPTIONS Statement” in SAS Global Statements: Reference
System Options
• “VALIDMEMNAME= System Option” in SAS System Options: Reference
• “VALIDVARNAME= System Option” in SAS System Options: Reference
Follow the rules for naming SAS member names when naming SAS data sets. See
Rules for Member Names on page 54 for more information.
If you do not specify a name for the output data set in a DATA statement, SAS
automatically assigns a default data set name. If you do not specify a name for the
input data set in a SET statement, SAS automatically uses the last data set that
was created. For more information, see “Special Data Set Names” on page 59.
Here are some language elements that might require you to supply a name for a data set:
• DATA= option
• MERGE statement
• MODIFY statement
• OPEN function
• SET statement
• SQL procedure
• UPDATE statement
Note: SAS data sets and SAS views that share the same library cannot have the
same name.
Levels
The complete name of every SAS data set has three levels. You assign the first two
levels and SAS supplies the third. The form for a SAS data set name is as follows:
libref.SAS-data-set.membertype
When you refer to SAS data sets in your program statements, use a one- or two-
level name. You can use a one-level name when the data set is in the temporary
library Work. In addition, if the reserved libref User is assigned, you can use a one-
level name when the data set is in the permanent library User. Use a two-level
name when the data set is in some other permanent library that you have
established.
One-level Names
A one-level name consists of just the data set name. You can omit the libref, and
refer to data sets with a one-level name in the following form:
SAS-data-set
Data sets with one-level names are automatically assigned to one of two SAS
libraries:
Work
a SAS library that is used for temporarily saving data sets with one-level names.
Temporarily means that the contents of the library are deleted at the end of the
SAS job or session. Data sets with one-level names are stored in the Work
library by default.
User
a SAS library that is used for permanently saving data sets with one-level
names.
Most commonly, they are assigned to the temporary library Work and are deleted
at the end of a SAS job or session. If you have associated the libref User with a SAS
library or used the USER= system option to set the User library, data sets with one-
level names are stored in that library.
See “User Library” and “Work Library (Temporary)” for more information about
using the Work and User libraries.
Two-level Names
A two-level name consists of both the libref and the data set name. The form most
commonly used to create, read, or write to SAS data sets in permanent SAS
libraries is the two-level name as shown here:
libref.SAS-data-set
When you create a new SAS data set, the libref indicates where it is to be stored.
When you reference an existing data set, the libref tells SAS where to find it.
• _LAST_
• _NULL_
For example, when the following program executes, SAS creates three data sets in
the WORK library, naming them DATA1, DATA2, and DATA3:
data _data_;
x=1;
run;
data;
y=2;
run;
data _data_;
z=3;
run;
This feature enables you to automate the naming of output data sets without the
risk of overwriting your data. The names are unique when they are written to the
WORK library, and they continue to increment numerically for the duration of the
SAS session.
See Splitting a data set into smaller data sets for some practical examples that
show how to use the SAS DATAn naming convention.
The _LAST_= system option enables you to designate a data set as the _LAST_ data
set.
See “Example: Use the _LAST_= System Option to Specify the Default Input Data
Set” on page 72 for an example that shows how to use the _LAST_= system option.
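As a minimal sketch of the idea, the following program creates two data sets and then uses the _LAST_= system option so that a procedure without a DATA= option reads the first one instead of the most recently created one. The data set names and values are hypothetical.

```sas
data work.scores;              /* hypothetical data sets used only for illustration */
   x=1;
run;

data work.other;
   y=2;
run;

options _last_=work.scores;    /* designate work.scores as the _LAST_ data set */

proc print;                    /* no DATA= option, so the _LAST_ data set is read */
run;
```

Without the OPTIONS statement, the PROC PRINT step would read work.other, because it is the data set that was created last.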
Using _NULL_ causes SAS to execute the DATA step as if it were creating a new
data set, but no observations or variables are written to an output data set. This
process can be a more efficient use of computer resources if you are using the
DATA step for some function, such as report writing, for which the output of the
DATA step does not need to be stored as a SAS data set.
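A minimal sketch of report writing with _NULL_ follows. It assumes that the sample data set Sashelp.Class is available in your SAS session; the step writes one line per observation to the log and creates no output data set.

```sas
data _null_;
   set sashelp.class;      /* sample data set assumed to be available */
   put name= age=;         /* write each observation to the SAS log */
run;
```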
Table 4.8 Summary of Default Rules for Naming SAS Data Sets and Variables
IMPORTANT In the Linux operating environment, SAS reads only data set
names that are written in all lowercase characters.
Table 4.9 Summary of Extended Rules for Naming SAS Data Sets and Variables
IMPORTANT In the Linux operating environment, SAS reads only data set
names that are written in all lowercase characters.
1. In the UNIX operating environment, SAS reads only data set names that are written in all lowercase characters.
See Also
Examples
• “Example: Create a SAS Data Set Name Containing a Special Character”
• “Example: Create a One-Level Data Set Name”
• “Example: Use the Automatic Naming Feature for Naming Data Sets”
• “Example: Use the _LAST_= System Option to Specify the Default Input Data
  Set”
Functions
• OPEN function
Statements
• DATA statement
• SET statement
• MERGE statement
• MODIFY statement
• “OPTIONS Statement” in SAS Global Statements: Reference
• UPDATE statement
System Options
• _LAST_= system option
• USER= system option
• “VALIDMEMNAME= System Option” in SAS System Options: Reference
• “VALIDVARNAME= System Option” in SAS System Options: Reference
• VIEW= option
Example: Create a Variable Name Containing Blanks
Example Code
The following example illustrates creating a variable name that contains blanks:
options validvarname=any; /* 1 */
data mydata;
'First Name'n='John'; /* 2 */
run;
Key Ideas
See Also
• “Definition of SAS Name Literals”
Example: Create a SAS Data Set Name Containing a Special Character
Example Code
The following example illustrates creating a new SAS data set name that contains
the special character $:
options validmemname=extend; /* 1 */
data 'my$data'n; /* 2 */
x=1;
y=3;
sum=x+y;
run;
Key Ideas
See Also
• “Definition of SAS Name Literals”
Example: Manage Placement and Spacing of Words
Example Code
In the following example, blanks are not required because SAS can determine the
boundary of every word by examining the beginning of the next word.
total=x+y;
In the following example, the equal sign marks the end of the word total. The plus
sign, another special character, marks the end of the word x. The last special
character, the semicolon, marks the end of the y word. Though blanks are not
needed to end any words in this example, you can add them for readability.
total = x + y;
In the following example, blanks are required because SAS cannot recognize the
individual words without the spaces. Without blanks, the entire statement up to the
semicolon fits the rules for a name. Therefore, the statement requires blanks to
distinguish individual names and numbers.
input group 15 room 20;
Key Ideas
See Also
• “Definition of a SAS Word” on page 42
Example: Specify the FIRST. and LAST. BY Variables as Name Literals
Example Code
The following example illustrates specifying FIRST. and LAST. BY variables as name
literals:
options validvarname=any; /* 1 */
data sedanTypes;
set cars;
by 'Sedan Types'n; /* 2 */
if 'first.Sedan Types'n then type=1; /* 3 */
else type=2;
run;
Key Ideas
See Also
• “BY Statement” in SAS DATA Step Statements: Reference
Example: Create a One-Level Data Set Name
Example Code
The following example illustrates the use of one-level names in SAS statements:
options user='c:\temp'; /* 1 */
data test; /* 2 */
x=1;
run;
data samptest; /* 3 */
x=1;
y=6;
s=x*y+y;
run;
1 USER=system option specifies the name of the default permanent SAS library.
When you specify the USER= system option, any data set that you create with a
one-level name is permanently stored in the specified library.
2 The DATA statement creates the permanent data set, Test, in the SAS system
library, User. Note that you do not have to reference User as a part of your data
set name.
3 The DATA statement creates the permanent data set, Samptest, in the User
library.
Note: You do not have to reference the USER= system option prior to the DATA
step or reference User as a part of your data set name in order for SAS to create
Samptest in the User library.
85 options user='C:\temp';
86 data test;
87 x=1;
88 run;
89 data samptest;
90 x=1;
91 y=6;
92 s=x*y+y;
93 run;
Key Ideas
n Data sets with one-level names are automatically assigned either the Work or
User library. You can omit the libref and refer to data sets with a one-level name.
n Most commonly, data sets are assigned to the temporary library, Work, and are
deleted at the end of a SAS job or session.
n If you have associated the libref User with a SAS library or used the USER= system
option to set the User library, data sets with one-level names are stored in that
library.
See Also
n DATA Statement
n One-Level Names
n OPTIONS Statement
Example Code
The following example illustrates the use of two-level names in SAS statements:
libname finance 'C:\finance'; /* 1 */
data finance.balances; /* 2 */
rate=0.03;
months=12;
balance=1000;
endbal=(rate*months)+balance;
run;
1 The LIBNAME statement associates a SAS library with the libref, Finance. You
can reference the libref in subsequent DATA or PROC steps to create, read, or
write SAS data sets in the permanent library.
2 The DATA statement references the Finance libref and tells SAS to create a new
SAS data set in the Finance library and name the new data set Balances. The
new data set contains four variables and one observation.
Key Ideas
n A two-level data set name is the most commonly used form to create, read, or
write SAS data sets in a permanent SAS library.
n When you create a new SAS data set, the libref indicates where it is to be stored.
When you reference an existing data set, the libref tells SAS where to find it.
See Also
n DATA Statement
n LIBNAME Statement
n Two-Level Names
Example Code
The following example illustrates the use of the DATAn automatic naming feature
in the SAS DATA step. If you do not specify a name for the output data set or
specify it as DATA (or data as it is case insensitive), SAS automatically assigns
names to the data sets that you create as DATA1, DATA2, and so on up to
DATA9999. SAS assigns a successive name to every unnamed data set.
The example contains three unnamed output data sets and one named data set.
data;
set sashelp.class;
run;
data DATA;
set sashelp.air;
run;
data sample;
set sashelp.cars;
run;
data data;
set sashelp.flags;
run;
Output 4.3 SAS Log: Use the DATAn Automatic Naming Feature
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.DATA1 has 19 observations and 5 variables.
NOTE: There were 144 observations read from the data set SASHELP.AIR.
NOTE: The data set WORK.DATA2 has 144 observations and 2 variables.
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.SAMPLE has 428 observations and 15 variables.
NOTE: There were 220 observations read from the data set SASHELP.FLAGS.
NOTE: The data set WORK.DATA3 has 220 observations and 7 variables.
Key Ideas
n If you do not specify a SAS data set name in a DATA statement, SAS automatically
assigns the names DATA1, DATA2, and so on, to successive data sets.
n The automatically named data sets are stored in the Work or User library.
n If you do not save your automatically named data sets to a permanent library, they
will be removed at the end of your SAS session.
See Also
n _DATA_ Data Set
n DATA Statement
n SET Statement
Example Code
The following example illustrates the use of the _LAST_= system option to specify
the default input data set for the current SAS session:
options _last_=mysas.class; /* 1 */
libname mysas 'C:\Users\';
data mysas.class; /* 2 */
set sashelp.class;
run;
data mysas.test; /* 3 */
set;
run;
1 The _LAST_= system option enables you to designate a data set as the _LAST_
data set. The name that you specify is used as the default data set until you
create a new data set. You can use the _LAST_= system option when you want
to use an existing permanent data set for a SAS job that contains a number of
procedure steps. Issuing the _LAST_= system option enables you to avoid
specifying the SAS data set name in each procedure statement.
2 The first DATA step creates a data set named mysas.class that is referenced in
the _LAST_= system option.
3 The second DATA step creates a permanent data set named mysas.test and
does not specify a data set in the SET statement. The SET statement therefore
reads the _LAST_ data set, which is mysas.class.
NOTE: There were 19 observations read from the data set MYSAS.CLASS.
NOTE: The data set MYSAS.TEST has 19 observations and 5 variables.
Key Ideas
n SAS keeps track of the most recently created SAS data set through the reserved
name _LAST_.
n When you execute a DATA or PROC step without specifying an input data set, by
default, SAS uses the _LAST_ data set. Some functions use the _LAST_ default as
well.
n The _LAST_= system option enables you to designate a data set as the _LAST_
data set. The name that you specify is used as the default data set until you create
a new data set.
n You can use the _LAST_= system option when you want to use an existing
permanent data set for a SAS job that contains a number of DATA step or
procedure steps.
n Specifying the _LAST_= system option enables you to avoid specifying the SAS
data set name in each DATA step or procedure statement.
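The ideas above can be sketched with procedure steps that omit the DATA= argument. This is a minimal illustration, not part of the original example: the choice of Sashelp.Class and of the PRINT and MEANS procedures is an assumption.

```sas
/* Designate an existing data set as the _LAST_ data set. */
options _last_=sashelp.class;

/* Neither step names an input data set, so both read SASHELP.CLASS. */
proc print;
run;

proc means;
run;
```

Because neither step creates a data set, the _LAST_ setting is still in effect after both steps finish.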
See Also
n Data Set Names
n LIBNAME Statement
n OPTIONS Statement
n SET Statement
Chapter 5 / Variables
Note: There are more ways to create variables. For example, the SET, MERGE,
MODIFY, and UPDATE statements can also create variables.
If a variable appears for the first time in a formatted INPUT statement, then SAS defines the variable
and its attributes based on the category of the informat specified in the INPUT
statement:
n Associating a numeric informat with a variable when it is created for the first
time in a DATA step using formatted input causes the variable to be created as a
numeric type, with a default length of 8.
n Associating a character informat with a variable when it is created for the first
time in a DATA step using formatted input causes the variable to be created as a
character type. The length matches the width specified in the informat in the
INPUT statement. If you do not specify a length with the informat or anywhere
else in the DATA step, then SAS assigns the default length of 8 bytes.
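The two rules above can be sketched as follows. The data set name, variable names, and data lines are hypothetical; only the $CHARw. and COMMAw.d informats are standard SAS.

```sas
data work.fruit;
   /* $CHAR10. is a character informat: Name becomes character, length 10. */
   /* COMMA8. is a numeric informat: Amount becomes numeric, length 8.     */
   input @1 Name $char10. @11 Amount comma8.;
   datalines;
Cherry       1,250
Grape          300
;
run;

/* Display the type and length that SAS assigned to each variable. */
proc contents data=work.fruit;
run;
```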
Manage Variables
Modify Variables
You can modify SAS variables using the following methods:
n FORMAT or INFORMAT statement
n LENGTH statement
n ATTRIB statement
n INFORMAT=
n LABEL=
n LENGTH=
1 See “Specifying a New Variable in a FORMAT or INFORMAT Statement” for information about how
SAS determines variable attributes when variables are created using the FORMAT or INFORMAT
statement.
See Formats by Category and Informats by Category for a list of these categories.
Example Code 5.1 Specifying New Variables Using the FORMAT Statement
(Without Specifying Lengths)
data lollipops;
format Flavor $upcase. Amount comma.;
Flavor='Cherry';
Amount=10;
run;
proc contents data=lollipops; run;
Output 5.1 PROC CONTENTS Output for Creating New Variables Using the
FORMAT Statement (Without Specifying Lengths)
The next example is identical except that a length is specified for both variables in
the FORMAT statement along with the format:
Example Code 5.2 Specifying New Variables Using the FORMAT Statement (With
Length Specified)
data lollipops;
format Flavor $upcase10. Amount comma10.;
Flavor='Cherry';
Amount=10;
run;
proc contents data=lollipops; run;
Output 5.2 PROC CONTENTS Output for Specifying New Variables Using the
FORMAT Statement (With Length Specified)
In Example Code 5.3, the variables appear for the first time in assignment
statements rather than in the FORMAT statement. So, in this case, the assignment
statements create the variables. When a variable appears for the first time on the
left side of an assignment statement, SAS automatically sets its type and length
based on the expression on the right side of the assignment statement. Because the
expression on the right is the 6-letter character string Cherry, SAS assigns a length
of 6 bytes and the type character to the variable Flavor. SAS assigns a length of
8 bytes to the numeric variable, Amount.
Example Code 5.3 Changing the Format of an Existing Variable Using the FORMAT
Statement
data lollipops;
Flavor='Cherry';
Amount=10;
format Flavor $upcase10. Amount comma10.;
run;
proc contents data=lollipops; run;
Output 5.3 PROC CONTENTS Output for Changing the Format of an Existing
Variable Using the FORMAT Statement
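Because the first assignment fixes the length of a character variable, a longer value assigned later is truncated. The following sketch illustrates that consequence. The data set and the variable Fruit are hypothetical additions, not part of the original example.

```sas
data work.length_demo;
   Flavor='Cherry';        /* first reference: character, length 6         */
   Flavor='Watermelon';    /* longer value is truncated to 6 characters    */
   length Fruit $ 10;      /* declare the length before the first assignment */
   Fruit='Watermelon';     /* all 10 characters are kept                   */
run;

proc contents data=work.length_demo;
run;
```

Placing a LENGTH statement before the first assignment is one way to avoid the truncation.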
See Also
n Table 5.5 on page 88 for more information about how SAS determines variable
attributes with assignment statements.
n SAS Formats and Informats: Reference
See Formats by Category and Informats by Category for a list of these formats and
informats.
The following table summarizes the general differences between the DROP, KEEP,
and RENAME statements and the DROP=, KEEP=, and RENAME= data set options.
Table 5.3 Statements versus Data Set Options for Dropping, Keeping, and Renaming
Variables
Statements                           Data Set Options
apply to output data sets only       apply to output or input data sets
can be used in DATA steps only       can be used in DATA steps and PROC steps
can appear anywhere in DATA steps    must immediately follow the name of each data set to which they apply
is one that is specified in the DATA statement.) Consider the following facts when
you make your decision:
n If variables are not written to the output data set and they do not require any
processing, using an input data set option to exclude them from the DATA step
is more efficient.
n If you want to rename a variable before processing it in a DATA step, you must
use the RENAME= data set option in the input data set.
n If the action applies to output data sets, you can use either a statement or a
data set option in the output data set.
The following table summarizes the action of data set options and statements
when they are specified for input and output data sets. The last column of the table
tells whether the variable is available for processing in the DATA step. If you want
to rename the variable, use the information in the last column.
Table 5.4 Status of Variables and Variable Names When Dropping, Keeping, and
Renaming Variables
Where Specified    Data Set Option or Statement    Purpose    Status of Variable or Variable Name
Order of Application
Your program might require that you use more than one data set option or a
combination of data set options and statements. It is helpful to know that SAS
drops, keeps, and renames variables in the following order:
n First, options on input data sets are evaluated left to right within SET, MERGE,
and UPDATE statements. DROP= and KEEP= options are applied before the
RENAME= option.
n Next, DROP and KEEP statements are applied, followed by the RENAME
statement.
n Finally, options on output data sets are evaluated left to right within the DATA
statement. DROP= and KEEP= options are applied before the RENAME= option.
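The order of application described above can be sketched in a single DATA step. The output data set name and the new variable names (Student, HeightIn, Initial) are hypothetical; Sashelp.Class is used only because it ships with SAS.

```sas
data work.class_out (drop=Weight rename=(Height=HeightIn));   /* applied last */
   /* Applied first: Sex is dropped, and then Name is renamed to Student. */
   set sashelp.class (drop=Sex rename=(Name=Student));

   /* The new name from the input RENAME= option is used inside the step. */
   Initial=substr(Student, 1, 1);

   /* Applied second: the DROP statement removes Age from the output. */
   drop Age;
run;
```

Under these assumptions, the output data set contains Student, HeightIn, and Initial.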
See Also
Examples
n Using the DROP= Data Set Option and DROP Statement
n Using the DROP= and RENAME= Data Set Options
Statements
n DROP Statement
Variable Attributes
Definition
SAS variables are containers that you create within a program to store and use
character and numeric values. Variables have the following attributes:
Length
refers to the number of bytes used to store each of the variable's values in a SAS
data set. You can use a LENGTH statement to set the length of both numeric and
character variables.
Possible values: 2 to 8 bytes for numeric; 1 to 32,767 bytes for character.
Default value: 8 bytes for numeric and character.
Informat
refers to the instructions that SAS uses when reading data values. If no informat is
specified, the default informat is w.d for a numeric variable, and $w. for a character
variable. You can assign SAS informats to a variable in the INFORMAT or ATTRIB
statement. You can also assign an informat in the INPUT statement or INPUT
function.
Possible values: See About Informats in SAS Formats and Informats: Reference.
Default value: w.d for numeric, $w. for character.
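As a minimal sketch, several of these attributes can be set together in one ATTRIB statement. The data set name, labels, and values below are assumptions chosen for illustration.

```sas
data work.lollipops;
   /* Set length, label, format, and informat for each variable in one statement. */
   attrib Flavor length=$10 label='Lollipop flavor' format=$upcase10. informat=$char10.
          Amount length=8   label='Units on hand'   format=comma10.   informat=comma10.;
   Flavor='Cherry';
   Amount=1250;
run;

/* Display the attributes that were assigned. */
proc contents data=work.lollipops;
run;
```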
Note: Starting with SAS 9.1, the maximum number of variables can be greater than
32,767. The maximum number depends on your environment and the file's
attributes. For example, the maximum number of variables depends on the total
length of all the variables and cannot exceed the maximum page size.
See Also
Examples
n View Variable Attributes
n Change Attributes of a Variable Using the ATTRIB Statement
Statements
n ATTRIB Statement
n FORMAT Statement
n INFORMAT Statement
Procedures
n CONTENTS Procedure
Data Types
Definitions
A data type is an attribute of every SAS variable that specifies what type of data
the variable stores. The data type identifies a piece of data as a character string, an
integer, a floating-point number, or a date or time, for example. The data type also
determines how much memory to allocate for the variable’s values. The default SAS
engine is the V9 engine, or the Base SAS engine. In Base SAS, the following data
types are supported by the DATA step.
data values
are character or numeric values.
numeric value
contains only numbers, and sometimes a decimal point, a minus sign, or both.
When they are read into a SAS data set, numeric values are stored in the
floating-point format native to the operating environment. Nonstandard
numeric values can contain other characters in addition to numbers; you can use
formatted input to enable SAS to read them.
character value
is a sequence of characters.
standard data
are character or numeric values that can be read with list, column, formatted, or
named input.
nonstandard data
is data that can be read only with the aid of informats. Examples of nonstandard
data include numeric values that contain commas, dollar signs, or blanks; date
and time values; and hexadecimal and binary values.
Numeric Data
Numeric data can be represented in several ways. SAS can read standard numeric
values without any special instructions. To read nonstandard values, SAS requires
special instructions in the form of informats. Table 5.7 on page 92 shows standard,
nonstandard, and invalid numeric data values and the special tools, if any, that are
required to read them. For complete descriptions of all SAS informats, see SAS
Formats and Informats: Reference.
23 -
minus sign follows number. Put the minus sign before the number or solve
programmatically. It might be possible to use the S370FZDTw.d informat, but
positive values require the trailing plus sign (+). You can also use the TRAILSGN
informat.
J23
not a number. Read as a character value, or edit the raw data to change it to a
valid number.
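The following sketch shows informats reading nonstandard numeric values. The data lines are invented for illustration, and the sign is written without an intervening blank (23-) so that list input treats each value as one field.

```sas
data work.nonstandard;
   /* The colon modifier applies each informat to a blank-delimited field. */
   /* COMMA12. removes dollar signs and commas; TRAILSGN8. reads a         */
   /* trailing plus or minus sign.                                         */
   input Amount :comma12. Balance :trailsgn8.;
   datalines;
$1,234.50 23-
5,000 17+
;
run;

proc print data=work.nonstandard;
run;
```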
Character Data
A value that is read with an INPUT statement is assumed to be a character value if
one of the following is true:
n A dollar sign ($) follows the variable name in the INPUT statement.
n The variable has been previously defined as character. For example, a value is
assumed to be a character value if the variable has been previously defined as
character in a LENGTH statement, in the RETAIN statement, by an assignment
statement, or in an expression.
Input data that you want to store in a character variable can include any character.
Use the guidelines in the following table when your raw data includes leading
blanks and semicolons.
Table 5.9 Reading Instream Data and External Files Containing Leading Blanks and
Semicolons
leading or trailing blanks that you want to preserve
Use formatted input and the $CHARw. informat. List input trims leading and
trailing blanks from a character value.
delimiters, blank characters, or quoted strings
Use the DSD option, with the DLM= or DLMSTR= option, in the INFILE statement.
These options enable SAS to read a character value that contains a delimiter
within a quoted string; these options can also treat two consecutive delimiters as
a missing value and remove quotation marks from character values.
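A minimal sketch of the DSD behavior described above follows. The variable names and data lines are hypothetical.

```sas
data work.dsd_demo;
   /* DSD keeps the comma inside the quoted string, removes the quotation */
   /* marks, and treats two consecutive commas as a missing value.        */
   infile datalines dsd dlm=',' truncover;
   length City $ 20 Note $ 30;
   input City $ Note $ Sales;
   datalines;
"Cary, NC","first, visit",100
Raleigh,,250
;
run;

proc print data=work.dsd_demo;
run;
```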
In the example below, the character variable Rate appears in a numeric context. It is
multiplied by the numeric variable Hours to create a new variable named Salary.
Salary=Rate*Hours;
When this step executes, SAS automatically attempts to convert the character
values of Rate to numeric values so that the calculation can occur. This conversion
is completed by creating a temporary numeric value for each character value of
Rate. This temporary value is used in the calculation. The character values of Rate
are not replaced by numeric values.
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
9248:8
No conversion messages appear in the SAS log when the INPUT function is used.
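The Salary calculation can be written with an explicit conversion, as in the following sketch. The data values and the 8. informat are assumptions chosen for illustration.

```sas
data work.pay;
   Rate='12.50';                      /* character variable                        */
   Hours=40;
   Salary=input(Rate, 8.)*Hours;      /* explicit character-to-numeric conversion  */
   put Salary=;
run;
```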
In the example below, you want to create a new character variable named Loc that
concatenates the values of the numeric variable Site and the character variable
Dept. The new variable values must contain the value of Site with a slash, and then
the value of Dept.
Loc=Site||'/'||Dept;
Submitting this statement in a DATA step causes SAS to automatically convert the
numeric values of Site to character values because Site is used in a character
context. The variable Site appears with the concatenation operator, which requires
character values.
No conversion messages appear in the SAS log when you use the PUT function.
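The Loc concatenation can be written with an explicit conversion, as in the following sketch. The data values and the 2. format are assumptions. Choosing the format yourself also controls the width of the converted value, whereas the automatic conversion uses the BEST12. format and can leave leading blanks.

```sas
data work.locations;
   Site=26;                           /* numeric variable                          */
   Dept='ENG';
   Loc=put(Site, 2.)||'/'||Dept;      /* explicit numeric-to-character conversion  */
   put Loc=;
run;
```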
See Also
Examples
n “Example: Convert Character Variables to Numeric”
n “Example: Convert Numeric Variables to Character”
n “Example: Use Automatic Type Conversions”
Functions
n “INPUT Function” in SAS Functions and CALL Routines: Reference
n PUT Function
Automatic Variables
Automatic variables are system variables that SAS creates automatically when a
session is started. SAS creates both automatic macro variables and automatic
DATA step variables. The names of automatic variables are reserved for the
system-generated variables that are created when the DATA step executes. It is
recommended that you do not use names that start and end with an underscore in
your own applications.
_CMD_
contains the last command from the window’s command line that was not
recognized by the window. This automatic variable is created in the DATA step
when you use the WINDOW statement to create customized windows for your
applications.
_ERROR_
is 0 by default but is set to 1 whenever an error is encountered. Examples of
errors include an input data error, a conversion error, or a math error, as in
division by 0 or a floating point overflow. You can use the value of this variable
to help locate errors in data records and to print an error message to the SAS
log.
_IORC_
The automatic variable _IORC_ contains the return code for each I/O operation
that the MODIFY statement attempts to perform. The best way to test for
values of _IORC_ is with the mnemonic codes that are provided by the SYSRC
autocall macro. Each mnemonic code describes one condition. The mnemonics
provide an easy method for testing problems in a DATA step program. This
automatic variable is created in the DATA step when you use the MODIFY
statement to replace, delete, and append observations in an existing SAS data
set.
_N_
is initially set to 1. Each time the DATA step loops past the DATA statement, the
variable _N_ increments by 1. The value of _N_ represents the number of times
the DATA step has iterated.
_MSG_
contains a message that you specify to be displayed in the message area of the
window. This automatic variable is created in the DATA step when you use the
WINDOW statement to create customized windows for your applications.
FIRST.variable
are variables that SAS creates for each BY variable. SAS sets FIRST.variable
to 1 when it is processing the first observation in a BY group, and to 0 otherwise.
LAST.variable
are variables that SAS creates for each BY variable. SAS sets LAST.variable
to 1 when it is processing the last observation in a BY group, and to 0 otherwise.
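The following sketch uses _N_, FIRST.variable, and LAST.variable together to count the observations in each BY group. The data set names and the Count variable are hypothetical.

```sas
/* BY-group processing requires sorted (or indexed) input. */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

data work.sex_counts;
   set work.class_sorted;
   by Sex;
   if _n_=1 then put 'First iteration of the DATA step';
   if first.Sex then Count=0;    /* reset at the start of each BY group */
   Count+1;                      /* sum statement retains Count         */
   if last.Sex then output;      /* write one row per BY group          */
   keep Sex Count;
run;
```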
See Also
Examples
n “Example: Use the _N_ Automatic Variable”
n “Example: Use the _ERROR_ and _INFILE_ Automatic Variable with the IF/THEN
Statement”
Statements
n “_INFILE_=variable” in SAS DATA Step Statements: Reference
SAS Variable Lists
SAS keeps track of active variables in the order in which the compiler encounters
them within a DATA step. With name variable lists, you refer to the variables in a
variable list in the same order.
You can use variable lists when you define variables. You can use variable lists in
SAS statements and data set options in the DATA step. Variable lists are useful
because they provide a quick way to reference existing groups of data.
Note: Only numbered range lists are supported in the RENAME= option.
variables that begin with a specified character string.
This character string tells SAS to calculate the sum of all the variables that begin
with “Sales,” such as Sales_Jan, Sales_Feb, and Sales_Mar.
Definition
The OF operator enables you to specify SAS variable lists or SAS arrays as
arguments to functions. Here is the syntax for functions used with the OF operator:
The following table shows the types of SAS variable lists that are valid with the OF
operator:
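As an illustration only, the following sketch passes three kinds of lists to functions with the OF operator: a name prefix list, a numbered range list, and an array. All variable names and values are assumptions.

```sas
data work.sales;
   Sales_Jan=100; Sales_Feb=150; Sales_Mar=.;
   Q1=10; Q2=20; Q3=30;
   array q{3} Q1-Q3;

   TotalSales=sum(of Sales:);    /* name prefix list: variables that begin with Sales */
   TotalQ=sum(of Q1-Q3);         /* numbered range list                               */
   MeanQ=mean(of q{*});          /* all elements of an array                          */
   put TotalSales= TotalQ= MeanQ=;
run;
```

The SUM function ignores the missing value of Sales_Mar, so the missing value does not propagate to TotalSales.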
See Also
Examples
n “Example: Create a Variable Numbered Range List”
n “Example: Create a Variable Name Range List”
n “Example: Use the OF Operator with a Variable List”
n “Example: Use the OF Operator to Create Multiple Variable Lists”
Statements
n ARRAY Statement
n INPUT Statement, List
n KEEP Statement
n PUT Statement
n VAR Statement
Functions
n SUM Function
1. Requires you to have a series of variables with the same name except for the last character or characters, which are
consecutive numbers.
2. If array-name is a temporary array, there are limitations. See “Using the OF Operator with Temporary Arrays” in SAS
Functions and CALL Routines: Reference.
Missing Variable Values
n character
n special numeric
Definition
A special missing value is a type of numeric missing value that enables you to
represent different categories of missing data by using the letters A–Z or an
underscore.
n When SAS prints a special missing value, it prints only the letter or underscore.
n When data values contain characters in numeric fields that you want SAS to
interpret as special missing values, use the MISSING statement to specify those
characters.
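A minimal sketch of the MISSING statement follows. The letters A and R and the survey data are hypothetical. Here A might stand for "absent" and R for "refused".

```sas
data work.survey;
   /* Treat the letters A and R in numeric fields as special missing values. */
   missing A R;
   input ID Answer;
   datalines;
1 5
2 A
3 R
4 .
;
run;

proc print data=work.survey;
run;
```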
Numeric Variables
Within SAS, a missing value for a numeric variable is smaller than all numbers. If
you sort your data set by a numeric variable, observations with missing values for
that variable appear first in the sorted data set. For numeric variables, you can
compare special missing values with numbers and with each other. The following
table shows the sorting order of numeric values.
smallest ._ underscore
. period
-n negative numbers
0 zero
For example, the numeric missing value (.) is sorted before the special numeric
missing value .A, and both are sorted before the special missing value .Z. SAS does
not distinguish between lowercase and uppercase letters when sorting special
numeric missing values.
Note: The numeric missing value sort order is the same regardless of whether your
system uses the ASCII or EBCDIC collating sequence.
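The sort order can be checked with a small sketch such as the following. The data set and values are invented. After sorting, the expected order is ._, ., .A, .Z, and then the numbers from smallest to largest.

```sas
data work.miss_order;
   /* Mix ordinary numbers with missing and special missing values. */
   do Value=3, .Z, 0, ., -5, .A, ._;
      output;
   end;
run;

proc sort data=work.miss_order;
   by Value;
run;

proc print data=work.miss_order;
run;
```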
Character Variables
Missing values of character variables are smaller than any printable character
value. Therefore, when you sort a data set by a character variable, observations
with missing (blank) values of the BY variable always appear before observations in
which values of the BY variable contain only printable characters. However, some
unprintable characters (for example, machine carriage-control characters and real
or binary numeric data that have been read in error as character data) have values
less than the blank. Therefore, when your data includes unprintable characters,
missing values might not appear first in a sorted data set.
n automatic variables
When all rows in a data set in a match-merge operation (with a BY statement) are
processed, the variables in the output data set retain their values as described
earlier. That is, as long as there is no change in the BY value in effect when all of the
rows in the data set have been processed, the variables in the output data set
retain their values from the final observation. FIRST.variable and LAST.variable, the
automatic variables that are generated by the BY statement, both retain their
values. Their initial value is 1.
When the BY value changes, the variables are set to missing and remain missing
because the data set contains no additional observations to provide replacement
values. When all of the rows in a data set in a one-to-one merge operation (without
a BY statement) have been processed, the variables in the output data set are set
to missing and remain missing.
Invalid Operations
SAS prints a note in the log and assigns a missing value to the result if you try to
perform an invalid operation, such as the following:
n dividing by zero
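The following sketch contrasts an unguarded division with a guarded one. The variable names and data lines are hypothetical.

```sas
data work.ratios;
   input Num Den;
   Unsafe=Num/Den;                  /* when Den=0: note in the log, missing result */
   if Den ne 0 then Safe=Num/Den;   /* guard against the invalid operation         */
   else Safe=.;
   datalines;
10 2
7 0
;
run;
```

Testing the divisor first produces the same missing result without the log note.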
See Also
Examples
n “Example: Automatically Replacing Missing Values”
n “Example: Creating Special Missing Values”
n “Example: Preventing Propagation of Missing Values”
Functions
n MISSING Function
Statements
n MISSING Statement
Numeric Precision
Overview
In any number system, whether it is binary or decimal, there are limitations to how
precisely numbers can be represented. As a result, approximations have to be made.
For example, in the decimal number system, the fraction 1/3 cannot be perfectly
represented as a finite decimal value because it contains infinitely repeating digits
(.333...). On computers, because of finite precision, this number cannot be
represented with perfect precision. Numerical precision is the accuracy with which
numbers are approximated or represented.
Furthermore, although computers do allow the use of decimal numbers and decimal
arithmetic via human-centric software interfaces, all numbers and data are
ultimately stored in binary.
For example, the decimal value 1/10 has a finite decimal representation (0.1), but in
binary it has an infinitely repeating representation. In binary, the value converts to
0.000110011001100110011 ...
where the pattern 0011 is repeated indefinitely. As a result, the value is rounded
when stored on a computer.
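The effect of this rounding can be observed with a sketch such as the following. The variable names and the rounding unit 1e-9 are assumptions, and the exact values written to the log depend on the operating environment.

```sas
data _null_;
   a=0.1+0.2;
   diff=a-0.3;
   put a= hex16. / diff=;       /* show the stored bits and the difference */

   if a=0.3 then put 'a equals 0.3 exactly';
   else put 'a differs from 0.3 in its stored binary value';

   /* Rounding both sides to the same unit before comparing is one workaround. */
   if round(a, 1e-9)=round(0.3, 1e-9) then put 'equal after rounding both sides';
run;
```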
There are many decimal fractions whose binary equivalents are infinitely repeating
binary numbers, so be careful when interpreting results from general rational
numbers in decimal. There are some rational numbers that do not present problems
in either number system. For example, 1/2 can be finitely represented in both the
decimal and binary systems.
To understand better why a simple calculation such as this one can go wrong, or
how a number can be out of range, it is important to understand in more detail how
SAS stores binary numbers.
On any computer, there are limits to how large the absolute value of an integer can
be. In SAS, this maximum integer value depends on two factors:
n the number of bytes that you explicitly specify for storing the variable (using
the LENGTH statement)
n the operating environment on which SAS is running
If you have not explicitly specified the number of storage bytes, then SAS uses the
default length of 8 bytes, and the maximum integer then depends solely on what
operating system you are using.
The following table lists the largest integer that can be reliably stored by a SAS
variable in the mainframe, UNIX, and Windows operating environments.
Table 5.13 Largest Integer That Can Be Safely Stored in a Given Length
When Variable Length Equals ...    Largest Integer (z/OS)    Largest Integer (Windows or UNIX)
3                                  65,536                    8,192
4                                  16,777,216                2,097,152
5                                  4,294,967,296             536,870,912
6                                  1,099,511,627,776         137,438,953,472
7                                  281,474,976,710,656       35,184,372,088,832
the length of variables, make sure that the values are less than or equal to the
largest integer allowed for that specified length.
For example, in the UNIX operating environment, if you know that the values of
your numeric variables are always integers between -8192 and 8192, then you
can safely specify a length of 3 to store the number:
data myData;
length num 3;
num=8000;
run;
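The following sketch shows what can happen when a value exceeds the largest safe integer for its length. The data set and variable names are hypothetical. On Windows or UNIX, the value read back for Short is expected to differ from 8193, because 8193 needs more mantissa bits than a length of 3 provides.

```sas
data work.trunc_demo;
   length Short 3 Full 8;
   Short=8193;     /* one more than the largest safe integer for length 3 */
   Full=8193;
run;

data _null_;
   /* Values are truncated when they are written, so read them back to compare. */
   set work.trunc_demo;
   put Short= Full=;
run;
```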
CAUTION
Use the full 8 bytes to store variables that contain real numbers.
Floating-Point Representation
SAS stores numeric values in 8 bytes of data. The way that the numbers are stored
and the space available to store them also affects numerical accuracy. Although
there are various ways to store binary numbers internally, SAS uses floating-point
representation to store numeric values. Floating-point representation supports a
wide range of values (very large or very small numbers) with an adequate amount
of numerical accuracy.
987 = .987 x 10^3 (mantissa x base^exponent)
n the mantissa is the number that is being multiplied by the base. In the example,
the mantissa is .987.
n the base is the number that is being raised to a power. In the example, the base
is 10.
n the exponent is the power to which the base is raised. In the example, the
exponent is 3.
The following figure shows the decimal value 987 written in the IEEE 754 binary
floating-point format. Because it is a small value, no rounding is needed.
Figure 5.1 on page 111 shows the byte layout for a double-precision binary floating-
point number. This layout uses the first bit to encode the sign of the number, the
next 11 bits to encode the exponent, and the final 52 bits to encode the mantissa. If
the sign bit is 1, then the number is negative and if the sign bit is 0, the number is
positive.
Figure 5.1 Byte layout of a double-precision floating-point number: sign (1 bit),
exponent (11 bits), mantissa (52 bits)
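As a sketch, the stored bits of 987 can be displayed with the HEX16. format, which writes the floating-point representation as 16 hexadecimal digits. The expected value in the comment assumes an IEEE platform (Windows or UNIX).

```sas
data _null_;
   x=987;
   put x= hex16.;   /* on IEEE platforms: x=408ED80000000000 */
run;
```

Reading the digits from the left: the leading 408 holds the sign bit 0 and the exponent field 1032, which is 9 after subtracting the IEEE double-precision bias of 1023. The remaining digits ED8 followed by zeros hold the mantissa bits 111011011. With the implied leading 1, the value is binary 1.111011011 x 2^9 = 987.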
Different host computers can have different formats and specifications for floating-
point representation. All platforms on which SAS runs use 8-byte floating-point
representation.
When you shorten a numeric variable with the LENGTH statement, SAS stores a truncated
floating-point number, which reduces the number of mantissa bits. The following table shows some differences
between floating-point formats for the IBM mainframe and the IEEE standard. The
IEEE standard is used by the Windows and UNIX operating systems.
Specifications    IBM Mainframe    IEEE Standard (Windows/UNIX)    Affects
Base              16               2                               magnitude
The following bullet points describe the table above in more detail:
n Base 16 – uses digits 0-9 and letters A-F (to represent the values 10-15).
For example, to convert the decimal value 3000 to hexadecimal, you use the
base 16 number system:
Base 16
16^7 ... 16^4 16^3 16^2 16^1 16^0
For example, to convert the decimal value 184 to binary, you use the base 2
number system:
Base 2
2^7 ... 2^4 2^3 2^2 2^1 2^0
128 ... 16 8 4 2 1
n exponent bits – the number of bits reserved for storing the exponent, which
determines the magnitude of the number that you can store. The number of
exponent bits varies between operating systems. IEEE systems yield numbers of
greater magnitude because they use more bits for the exponent.
n mantissa bits – the number of bits reserved for storing the mantissa, which
determines the precision of the number. Because there are more bits reserved
for the mantissa on mainframes, you can expect greater precision on a
mainframe compared to a PC.
n round or truncate – the chosen conversion method used for handling two or
more digits. Because there is room for only two hexadecimal characters in the
mantissa, a convention must be adopted on how to handle more than two digits.
One convention is to truncate the value at the length that can be stored. This
convention is used by IBM Mainframe systems.
An alternative is to round the value based on the digits that cannot be stored,
which is done on IEEE systems. There is no right or wrong way to handle this
dilemma since neither convention results in an exact representation of the
value.
In SAS, the LENGTH statement works by truncating the number of mantissa
bits. For more information about the effects of truncated lengths, see “Using the
TRUNC Function When Comparing Values”.
n bias – an offset used to enable both negative and positive exponents with the
bias representing 0. If a bias is not used, an additional sign bit for the exponent
must be allocated. For example, if a system uses a bias of 64, a characteristic
with the value 66 represents an exponent of +2, whereas a characteristic of 61
represents an exponent of –3.
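The base 16 and base 2 conversions described in the list above can be checked in SAS itself. The following is a minimal sketch that assumes the HEXw. and BINARYw. formats are applied to integer values; the variable names dec1 and dec2 are illustrative only.

```sas
data _null_;
   dec1=3000;            /* decimal value to show in base 16 */
   dec2=184;             /* decimal value to show in base 2  */
   put dec1= hex4.;      /* 3000 = 11*256 + 11*16 + 8, written as 0BB8 */
   put dec2= binary8.;   /* 184 = 128 + 32 + 16 + 8, written as 10111000 */
run;
```

A width of less than 16 is used with HEXw. here so that the value is written as an integer rather than as its floating-point storage image.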
Although the IEEE platforms use the same set of specifications, you might
occasionally see varying results between the platforms due to compiler differences
and math library differences. Also, because the IEEE standard allows for some
variations in how the standard is implemented, there might be differences in how
different platforms perform calculations even though they are following the same
standard. Hosts might yield different results because the underlying instructions
that each operating system uses to perform calculations are slightly different.
Results also differ between hosts whose floating-point representation components
differ. For example, there are differences between the z/OS and Windows operating
systems and between the z/OS and UNIX operating systems.
On Windows, an extended-precision format allows storage of numbers larger than
those in the basic IEEE floating-point format used by operating systems such as
UNIX. This is one reason why you might see slightly different values from operating
systems that use the same IEEE standard. Extended-precision formats provide greater
precision and more exponent range than the basic floating-point formats.
Storage Format
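The byte layout for a 64-bit, double-precision number on Windows is as follows:
SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM
byte 1   byte 2   byte 3   byte 4   byte 5   byte 6   byte 7   byte 8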
This representation corresponds to bytes of data with each character being 1 bit, as
follows:
n The S in byte 1 is the sign bit of the number. A value of 0 in the sign bit is used to
represent positive numbers.
Numeric Precision 115
The exponent has a base associated with it. Do not confuse this with the base in
which the exponent is represented; the exponent is always represented in binary
format, but the exponent is used to determine how many times the base should be
multiplied by the mantissa.
Log Output
not equal
data _null_;
x=.5000000000000000;
y=.500000000000000;
if x=y then put 'equal';
else put 'not equal';
put x=hex16./
y=hex16.;
run;
Log Output
equal
x=3FE0000000000000
y=3FE0000000000000
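The hexadecimal value written to the log can be decoded by hand. The following sketch assumes an IEEE host (Windows or UNIX), where the first 12 bits hold the sign bit and an 11-bit exponent stored with a bias of 1023. The bias value comes from the IEEE standard rather than from this section, and the variable names are illustrative. On z/OS the layout differs and this decoding does not apply.

```sas
data _null_;
   x=.5;
   hex=put(x,hex16.);                      /* 3FE0000000000000 on IEEE hosts */
   signexp=input(substr(hex,1,3),hex3.);   /* first 12 bits: sign + exponent */
   sign=floor(signexp/2048);               /* top bit of the 12 */
   characteristic=mod(signexp,2048);       /* remaining 11 exponent bits */
   exponent=characteristic-1023;           /* remove the assumed IEEE bias */
   put hex= sign= characteristic= exponent=;
run;
```

For .5 the first three hexadecimal digits are 3FE (1022 in decimal), so the exponent is 1022 - 1023 = -1, which is consistent with .5 being 1 multiplied by 2 to the power of -1.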
Storage Format
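SAS for z/OS uses the traditional IBM mainframe floating-point representation as
follows:
SEEEEEEE MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM
byte 1   byte 2   byte 3   byte 4   byte 5   byte 6   byte 7   byte 8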
This representation corresponds to bytes of data with each character being 1 bit, as
follows:
n The S in byte 1 is the sign bit of the number. A value of 0 in the sign bit is used to
represent positive numbers.
n The seven E characters in byte 1 represent a binary integer known as the
characteristic. The characteristic represents a signed exponent and is obtained
by adding the bias to the actual exponent. The bias is an offset used to enable
both negative and positive exponents with the bias representing 0. If a bias is
not used, an additional sign bit for the exponent must be allocated. For example,
if a system uses a bias of 64, a characteristic with the value of 66 represents an
exponent of +2, whereas a characteristic of 61 represents an exponent of –3.
n The remaining M characters in bytes 2 through 8 represent the bits of the
mantissa. There is an implied radix point before the left-most bit of the
mantissa. Therefore, the mantissa is always less than 1. The term radix point is
used instead of decimal point because decimal point implies that you are
working with decimal (base 10) numbers, which might not be the case. The radix
point can be thought of as the generic form of decimal point.
Computational Considerations
Regardless of how much precision is available, there are still some numbers that
cannot be represented exactly. Most rational numbers (for example, .1) cannot be
represented exactly in base 2 or base 16. This is why it is often difficult to store
fractions in floating-point representation.
Notice that in the base 16 representation of .1 (0.1999 …) there is an infinitely
repeating 9 digit, similar to the trailing 3 digit in the attempted decimal
representation of one-third (.3333 …). This lack of precision can be compounded
when arithmetic operations are performed on these values repeatedly.
For example, when you add .33333 to .99999 (five-digit approximations of one-third
and 1), the theoretical answer is 1.33333, but the stored operands sum to 1.33332.
The sums become more imprecise as the values continue to be calculated.
For example, consider the following DATA step:
data _null_;
do i=-1 to 1 by .1;
put i=;
if i=0 then put 'AT ZERO';
end;
run;
The AT ZERO message in the DATA step is never printed because the accumulation
of the imprecise number introduces enough errors that the exact value of 0 is never
encountered. The calculated result is close to 0, but never exactly equal to 0.
Therefore, when numbers cannot be represented exactly in floating point,
performing mathematical operations with other non-exact values can compound
the imprecision.
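One common way to make the comparison succeed is to round the accumulated value before testing it, or to drive the loop with integers. The following is a sketch under those assumptions; the variable names r and k are illustrative and do not come from the original example.

```sas
/* Option 1: compare a rounded copy of the accumulated value */
data _null_;
   do i=-1 to 1 by .1;
      r=round(i,.1);              /* rounded copy; the loop index i is left alone */
      if r=0 then put 'AT ZERO';  /* the rounded value can equal 0 exactly */
   end;
run;

/* Option 2: iterate over integers and derive the fraction */
data _null_;
   do k=-10 to 10;                /* integer steps are represented exactly */
      i=k/10;
      if k=0 then put 'AT ZERO';
   end;
run;
```

The second form avoids accumulating the imprecise increment at all, because each value of i is computed from an exact integer.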
For example, the IBM mainframe representation uses 8 bytes for full precision, but
you can store as few as 2 bytes on disk. The value 1 is represented as
41 10 00 00 00 00 00 00
in 8 bytes. In 2 bytes, it is truncated to 41 10. In this case, you still have the full
range of magnitude because the exponent remains intact, but there are fewer digits
involved. A decrease in the number of digits means either fewer digits to the right
of the decimal place or fewer digits to the left of the decimal place before trailing
zeros must be used.
For example, consider the number 1234567890, which is .1234567890 multiplied by
10 to the 10th power in base 10 floating-point notation. If you have only five
digits of precision, the number becomes 1234600000 (rounding up). Note that this is
the case regardless of the power of 10 that is used (.12346, 12.346, .0000012346,
and so on).
In addition, you must be careful in your choice of lengths, as the previous discussion
shows. Consider a length of 2 bytes on an IBM mainframe system. This value
enables 1 byte to store the exponent and sign, and 1 byte for the mantissa. The
largest value that can be stored in 1 byte is 255. Therefore, if the exponent is 0
(meaning 16 to the 0th power, or 1 multiplied by the mantissa), then the largest
integer that can be stored with complete certainty is 255. However, some larger
integers can be stored because they are multiples of 16.
For example, consider the 8-byte representation of the numbers 256 to 272 in the
following table:
Value    Sign or Exp    Mantissa 1    Mantissa 2-7      Considerations
258      43             10            200000000000
259      43             10            300000000000
271      43             10            F00000000000
x = 1/3
However, adding the TRUNC function makes the comparison true, as in the
following:
if x=trunc(1/3,3) then ...;
See “TRUNC Function” in SAS Functions and CALL Routines: Reference for more
information about this function.
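The effect of TRUNC can be seen with a value that has been stored at a reduced length. The sketch below assumes that a numeric length of 3 is valid on your host and uses a hypothetical data set named short. Two steps are used because the LENGTH truncation is applied when the value is written to the data set, not while it is in the program data vector.

```sas
data short;
   length x 3;        /* x is stored in 3 bytes */
   x=1/3;
run;

data _null_;
   set short;         /* x is read back with the truncated mantissa */
   if x=1/3 then put 'equal without TRUNC';
   else put 'not equal without TRUNC';
   if x=trunc(1/3,3) then put 'equal with TRUNC';
   else put 'not equal with TRUNC';
run;
```

Under these assumptions, the first comparison is expected to fail and the second to succeed, because TRUNC applies the same 3-byte truncation to the full-precision value of 1/3.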
An 8-byte floating-point number that is truncated to 4 bytes might not be the same
as a float in a C program. In the C language, an 8-byte floating-point number is
called a double. In Fortran, it is a REAL*8. In IBM PL/I, it is a FLOAT BINARY(53). A
4-byte floating-point number is called a float in the C language, REAL*4 in Fortran,
and FLOAT BINARY(21) in IBM PL/I.
Consider transporting data between an IBM mainframe and a PC, for example. The
IBM mainframe has a range limit of approximately .54E−78 to .72E76 (and their
negative equivalents and 0) for its floating-point numbers.
Other computers, such as the PC, have wider limits (the PC has an upper limit of
approximately 1E308). Therefore, if you are transferring numbers in the magnitude
of 1E100 from a PC to a mainframe, you lose that magnitude. During data transfer,
the number is set to the minimum or maximum allowable on that operating system,
so 1E100 on a PC is converted to a value that is approximately .72E76 on an IBM
mainframe.
CAUTION
Transfer of data between computers can affect numerical precision.
If you are transferring data from an IBM mainframe to a PC, notice that the number
of bits for the mantissa is 4 less than that for an IBM mainframe. This means that
you lose 4 bits when moving to a PC.
This precision and magnitude difference is a factor when moving from one
operating environment to any other where the floating-point representation is
different.
An alternative solution, and probably the safest way to avoid numerical precision
problems when transferring data between operating systems, is to convert the
numbers in your data to integers.
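One way to apply the integer suggestion is to rescale fractional values before the transfer. The sketch below assumes hypothetical dollar amounts that are stored as whole cents; the data set and variable names are illustrative only.

```sas
data amounts_int;
   input Dollars;
   Cents=round(Dollars*100);   /* whole number of cents, stored as an integer */
   drop Dollars;               /* transfer only the integer column */
   datalines;
19.99
0.10
1234.56
;
run;
```

After the transfer, the original scale can be recovered on the receiving host with an expression such as Cents/100.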
For more information about moving data between operating systems, see Moving
and Accessing SAS Files.
See Also
Examples
n “Example: Compare Imprecise Values in SAS”
n “Example: Convert a Decimal Value to a Floating Point Representation”
n “Example: Convert a Decimal Value to a Hexadecimal Floating-Point
Representation”
n “Example: Round Values to Avoid Computational Errors”
n “Example: Use the LENGTH Statement to Compare Values”
n “Example: Compare Values That Have Imprecise Representations”
n “Example: Confirm Precision Errors Using Formats”
n “Example: Determine How Many Bytes Are Needed to Store a Number
Accurately”
Functions
n “ROUND Function” in SAS Functions and CALL Routines: Reference
n “TRUNC Function” in SAS Functions and CALL Routines: Reference
Examples: Create and Modify SAS Variables 121
Statements
n “LENGTH Statement” in SAS DATA Step Statements: Reference
Example Code
This example shows how to create a numeric variable and a character variable by
using assignment statements. For the numeric variable, SAS evaluates the expression
on the right side as a number, and the variable type is implicitly set to numeric.
To create the character variable, enclose the value in quotation marks, which
specifies that the variable type is character. The length of the variable is set to
the length of the character value, so the length of the variable Patient is 12.
data newvar;
Insurance=100;
Patient="John Whitman";
run;
Key Ideas
n You can create a new variable and assign it a value by using it for the first time on
the left side of an assignment statement.
n The variable gets the same type and length as the expression on the right side of
the assignment statement.
See Also
n Assignment Statement
Example Code
This example demonstrates using simple list input to create a SAS data set, named
gems, and define four variables based on the data provided.
data gems;
input Name $ Color $ Carats Owner $;
datalines;
emerald green 1 smith
sapphire blue 2 johnson
ruby red 1 clark
;
run;
The INPUT statement reads in four variables: Name, Color, Carats, and Owner. Name,
Color, and Owner are character variables, as indicated by the $, and Carats is a
numeric variable.
Key Ideas
n When reading raw data into SAS, you can use the INPUT statement to define your
variables based on positions within the raw data.
n You can use one of the following methods with the INPUT statement to provide
information about the raw data organization:
o column input
o list input (simple or modified)
o formatted input
o named input
See Also
n INPUT Statement, List
n Assignment Statement
Example Code
This example shows how to modify a SAS variable using the LENGTH statement.
data sales;
length Salesperson $25;
Salesperson='Jonathon Mark Walker';
run;
Key Ideas
n For character variables, you must use the longest possible value in the first
statement that uses the variable. The reason is that you cannot change the length
with a subsequent LENGTH statement within the same DATA step. The maximum
length of any character variable in SAS is 32,767 bytes.
n For numeric variables, you can change the length of the variable by using a
subsequent LENGTH statement.
n When SAS assigns a value to a character variable, it pads the value with blanks or
truncates the value on the right side, if necessary, to make it match the length of
the target variable.
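The length rule for character variables can be seen when the first statement that uses the variable supplies a short value. This is a minimal sketch; the data set name sales_short and the value 'Ann Lee' are hypothetical.

```sas
data sales_short;
   Salesperson='Ann Lee';                /* first use sets the length to 7 */
   Salesperson='Jonathon Mark Walker';   /* longer value is truncated on the right */
   put Salesperson=;                     /* writes Salesperson=Jonatho */
run;
```

Placing a LENGTH statement before the first assignment, as in the example above, avoids the truncation.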
See Also
n LENGTH Statement
n Assignment Statement
Example Code
This example demonstrates how to read raw data and create new variables in an
output data set by using both simple list input and formatted input. The INPUT
statement creates the variable Name as a character variable with a default length
of 8 by using simple list input (no length is specified). The INPUT statement also
creates a numeric variable, Carats, that also has a length of 8. Because the last
variable in the INPUT statement, Color, specifies an informat, it is read by using
formatted input, and SAS defines it as a character variable with a length of 10.
data gems;
input Name $ Carats comma3.1 Color $char10.;
informat Name $char10.;
format Carats comma3.1;
datalines;
emerald 15 green
aquamarine 20 blue
;
proc print data=gems; run;
proc contents data=gems; run;
data gems;
input Name $char10. Carats comma3.1 Color $char.;
datalines;
emerald 15 green
aquamarine 20 blue
;
proc print data=gems; run;
proc contents data=gems; run;
Key Ideas
n If a variable does not already exist and you create it for the first time in a
formatted INPUT statement, then SAS defines the variable and its attributes
based on the category of the informat specified in the INPUT statement:
o Associating a numeric informat with a variable when it is created for the first
time in a DATA step using formatted input causes the variable to be created as
a numeric type, with a default length of 8.
o Associating a character informat with a variable when it is created for the first
time in a DATA step using formatted input causes the variable to be created as
a character type. The length matches the width specified in the informat in the
INPUT statement. If you do not specify a length with the informat or anywhere
else in the DATA step, then SAS assigns the default length of 8 bytes.
n You can modify a variable and specify its format or informat with a FORMAT or
INFORMAT statement.
See Also
n FORMAT Statement
n INFORMAT Statement
Example Code
The following DATA step creates a variable named Flavor in a data set named
lollipops.
data lollipops;
attrib Flavor format=$10.;
Flavor="Cherry";
run;
Key Ideas
n The ATTRIB statement enables you to specify one or more of the following
variable attributes for an existing variable:
o FORMAT=
o INFORMAT=
o LABEL=
o LENGTH=
n If the variable does not already exist, one or more of the FORMAT=, INFORMAT=,
and LENGTH= attributes can be used to create a new variable.
n You cannot create a new variable by using a LABEL statement or the ATTRIB
statement's LABEL= attribute by itself. Labels can be applied only to existing
variables.
See Also
n ATTRIB Statement
Example Code
This example shows you how to use the CONTENTS procedure to view
Sashelp.Cars variable names, types, and attributes.
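A minimal step that matches this description, assuming the default PROC CONTENTS options, might look like the following.

```sas
/* Describe the structure of Sashelp.Cars rather than its data values */
proc contents data=sashelp.cars;
run;
```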
The CONTENTS procedure output includes the following information:
n The variables' names, types, and attributes (including formats, informats, and
labels)
n If your data is sorted by any variable, then the sort information is also displayed.
The alphabetic list of variables and attributes table generated by the CONTENTS
procedure shows #, Variable, Type, Len, Format, Informat, and Label.
#
The original order of the variable in the columns of the data set. PROC
CONTENTS prints the variables in alphabetical order with respect to the
name, instead of the order in which they appear in the data set.
Variable
Identifies the variable name. This might be different from what you see as the
name displayed in the output.
Type
Whether the variable is numeric (Num) or character (Char).
Len
Short for “Length”. Represents the width of the variable.
Format
The assigned format that will be used when the variables are printed in the
Results window.
Informat
The original format of the variable when it was read into SAS.
Label
The assigned variable label that will be used when the name of the variable is
printed in the Output window. If your variables do not have labels, this column is
identical to the Variable column.
Key Ideas
n PROC CONTENTS describes the structure and displays the variable attributes of
the data set, rather than the data values.
n This procedure is especially useful if you have imported your data from a file and
want to check that your variables have been read correctly, and have the
appropriate variable type and format.
See Also
n CONTENTS Procedure
n Variable Attributes
Example Code
This example shows how to change variable attributes using the ATTRIB
statement. The ATTRIB statement associates a format, informat, label, and length
with one or more variables.
data flightmiles;
attrib
Atlanta label='ATL' length=8 format=comma8.
Chicago label='ORD' length=8 format=comma8.
Denver label='DEN' length=8 format=comma8.
Houston label='IAH' length=8 format=comma8.
LosAngeles label='LAX' length=8 format=comma8.
Miami label='MIA' length=8 format=comma8.
NewYork label='JFK' length=8 format=comma8.
SanFrancisco label='SFO' length=8 format=comma8.
Seattle label='SEA' length=8 format=comma8.
WashingtonDC label='DCA' length=8 format=comma8.
;
set sashelp.mileages;
run;
title1 'Flying Miles Between Ten US Cities';
proc print data=flightmiles label;
run;
If you add a PROC CONTENTS step, you can see how the ATTRIB statement
changed your variable attributes. Each variable has a format, a length, and a
label.
Key Ideas
n For character variables, you must use the longest possible value in the first
statement that uses the variable. The reason is that you cannot change the length
with a subsequent LENGTH statement within the same DATA step. The maximum
length of any character variable in SAS is 32,767 bytes.
n Using the ATTRIB statement in the DATA step permanently associates attributes
with variables by changing the descriptor information of the SAS data set that
contains the variables.
n You can use either an ATTRIB statement or an individual attribute statement such
as FORMAT, INFORMAT, LABEL, and LENGTH to change an attribute that is
associated with a variable.
See Also
n ATTRIB Statement
n Variable Attributes
Example Code
This example uses the DROP statement and the DROP= data set option to control
the output of variables to the new SAS data sets. The DROP statement drops the
MSRP variable from the new data set, newcars, while the DROP= data set option
drops seven variables from the original data set, sashelp.cars, and does not bring
those variables into the new data set when reading from sashelp.cars.
There are 428 observations and 15 variables in sashelp.cars. The new data set,
newcars, contains 17 observations and 7 variables.
data newcars;
set sashelp.cars(drop=origin enginesize cylinders horsepower weight
wheelbase length);
if MSRP>75000 then output;
drop msrp;
run;
proc print data=newcars;
run;
Examples: Control Output of Variables 131
Key Ideas
n The DROP statement and the DROP= data set option control which variables are
processed during the DATA step.
n If you use the DROP, KEEP, or RENAME statement, the action always occurs as the
variables are written to the output data set.
n With SAS data set options, where you use the option determines when the action
occurs. If the option is used on an input data set, the variable is dropped, kept, or
renamed before it is read into the program data vector.
n If used on an output data set, the data set option is applied as the variable is
written to the new SAS data set.
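The difference in timing between input and output data set options can be sketched as follows. The output data set name expensive is hypothetical; the input data set and the MSRP threshold follow the example above.

```sas
data expensive(drop=MSRP);                          /* output option: applied as observations are written */
   set sashelp.cars(keep=Make Model MSRP Invoice);  /* input option: only these four variables are read */
   if MSRP>75000;                                   /* MSRP is still available to program statements */
run;

proc print data=expensive;
run;
```

Because DROP= is specified on the output data set, MSRP can still be used in the subsetting IF statement even though it is not written to the new data set.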
See Also
n DROP Statement
Example Code
This example uses the DROP= and RENAME= data set options and the INPUT
function to convert the variable Day from character to numeric. The variable name
Day is changed to Weekday before processing so that a new numeric variable named
Day can be written to the output data set. Note that the variable Process is dropped
from the output data set and that the new name Weekday is used in the program
statements.
data fails (drop=Process);
length Day 8;
set sashelp.failure(rename=(Day=Weekday));
Day=input(Weekday,8.);
run;
proc print data=fails;
run;
Key Ideas
n The DROP= data set options control which variables are processed or output
during the DATA step.
n With SAS data set options, where you use the option determines when the action
occurs. If the option is used on an input data set, the variable is dropped, kept, or
renamed before it is read into the program data vector. If used on an output data
set, the data set option is applied as the variable is written to the new SAS data
set.
See Also
n RENAME= Data Set Option
Example Code
In the following example, the data set sashelp.class contains variables Name, Sex,
Age, Height, and Weight (in that order). The ATTRIB statement is specified before
the SET statement so that the variable Sex is moved to the first position in the
output data set.
data class3;
attrib Sex length=$8;
set sashelp.class;
if Sex='M' then Sex='Male';
else Sex='Female';
run;
proc print data=class3;
run;
The output displays the Sex variable listed first because the ATTRIB statement
preceded the SET statement.
Key Ideas
n You can control the order in which variables are displayed in SAS output by using
the ATTRIB statement.
n Use the ATTRIB statement prior to the SET, MERGE, or UPDATE statement in
order for you to reorder the variables.
n Variables not listed in the ATTRIB statement retain their original position.
Examples: Reorder and Align Variables 135
See Also
n ARRAY Statement
n ATTRIB Statement
n FORMAT Statement
n INFORMAT Statement
n LENGTH Statement
n RETAIN Statement
n CONTENTS Procedure
Example Code
In the following example, the data set sashelp.class contains variables Name, Sex,
Age, Height, and Weight (in that order). The LENGTH statement is specified before
the SET statement so that the variable Height is moved to the first position in the
output data set.
data class1;
length Height 3;
set sashelp.class;
run;
proc print data=class1;
run;
The output displays the Height variable listed first because the LENGTH
statement preceded the SET statement.
TIP You can use the CONTENTS procedure on the Class1 data set to view
the length of each variable.
Key Ideas
n You can control the order in which variables are displayed in SAS output by using
the LENGTH statement.
n Use the LENGTH statement prior to the SET, MERGE, or UPDATE statement in
order for you to reorder the variables.
n Variables not listed in the LENGTH statement retain their original position.
See Also
n ARRAY Statement
n ATTRIB Statement
n FORMAT Statement
n INFORMAT Statement
n LENGTH Statement
n RETAIN Statement
n CONTENTS Procedure
Example Code
In the following example, the RETAIN statement causes the variables Weight and
Age to be listed first in the output data set. The data set sashelp.class contains
variables Name, Sex, Age, Height, and Weight (in that order).
data class2;
retain Weight Age;
set Sashelp.Class;
run;
proc print data=class2;
run;
The output displays the Weight and Age variables listed first because the RETAIN
statement preceded the SET statement.
Key Ideas
n The RETAIN statement is most often used to reorder variables simply because no
other variable attribute specifications are required.
n The RETAIN statement has no effect on retaining values of existing variables being
read from the data set.
n Only the variables whose positions are relevant need to be listed. Variables not
listed in the RETAIN statement retain their original position.
n Use the RETAIN statement prior to the SET, MERGE, or UPDATE statement in
order for you to reorder the variables.
See Also
n ARRAY Statement
n ATTRIB Statement
n FORMAT Statement
n INFORMAT Statement
n LENGTH Statement
n RETAIN Statement
n CONTENTS Procedure
Example Code
In the following example, the first DATA step creates the data set investment
where the variables Material, Item, Investment, and Profit are defined.
The FORMAT statement causes the variable Item to be listed first in the output
data set. The data set investment contains variables Material, Item, Investment,
and Profit (in that order).
data investment;
input Material $1-7 Item $9-15 Investment Profit;
datalines;
cotton shirts 2256354 83952175
silk ties 498678 2349615
silk suits 9482146 69839563
leather belts 7693 14893
leather shoes 7936712 22964
;
run;
data invest01;
format Item Material $upcase9. Investment Profit dollar15.2;
set investment;
run;
proc print data=invest01;
run;
In the output, the Item variable is listed first. The FORMAT statement also changes
how the values are displayed: the Item and Material values are uppercased, and the
Investment and Profit values include dollar signs, commas, and two decimal places.
Key Ideas
n You can control the order in which variables are displayed in SAS output by using
the FORMAT statement.
n Use the FORMAT statement prior to the SET, MERGE, or UPDATE statement in
order for you to reorder the variables.
n A single FORMAT statement can associate the same format with several variables,
or it can associate different formats with different variables.
n When you use the FORMAT statement to reorder your variables, you do not have
to associate a format with your variables.
n You can also use the INFORMAT statement to reorder your variables.
See Also
n ARRAY Statement
n ATTRIB Statement
n FORMAT Statement
n INFORMAT Statement
n LENGTH Statement
n RETAIN Statement
n CONTENTS Procedure
Examples: Convert Variable Types 141
Example Code
This example shows how to explicitly convert character data values to numeric
values. The INPUT function explicitly converts the character variable Rate to
numeric. Because Rate has a length of 2, the numeric informat 2. is used to read the
values of the variable. You can print the data set to see your new variable pay_chk
with the numeric values.
data work.weeksal;
set work.payscale;
pay_chk=input(rate,2.)*Hours;
run;
proc print data=work.weeksal;
run;
Note: No conversion messages appear in the SAS log when the INPUT function is
used.
Key Ideas
n Explicit conversions help you to control the data type and to avoid
conversion errors.
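The example above reads an existing work.payscale data set. To try the conversion without that data set, the following self-contained sketch builds a small stand-in. The data set names payscale_demo and weeksal_demo and the data values are hypothetical.

```sas
data work.payscale_demo;
   length Rate $2;                 /* character variable with a length of 2 */
   input Rate $ Hours;
   datalines;
10 40
12 35
;
run;

data work.weeksal_demo;
   set work.payscale_demo;
   pay_chk=input(Rate,2.)*Hours;   /* 10*40=400 and 12*35=420 */
run;

proc print data=work.weeksal_demo;
run;
```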
See Also
n “INPUT Function” in SAS Functions and CALL Routines: Reference
Example Code
This example shows how to explicitly do numeric to character conversion using the
PUT function. Use the PUT function in an assignment statement, where Site is the
source variable. Because Site has a length of 2, choose 2. as the numeric format.
The DATA step adds the new variable named Loc from the assignment statement to
the data set. You use PROC PRINT to view your Loc variable.
data work.dept;
set work.payscale;
Loc=catx('/',put(Site,2.),Dept);
run;
proc print data=work.dept;
run;
Note: No conversion messages appear in the SAS log when you use the PUT
function.
Key Ideas
See Also
n “PUT Function” in SAS Functions and CALL Routines: Reference
Example Code
By default, if you reference a character variable in a numeric context such as an
arithmetic operation, SAS tries to convert the variable values to numeric. In the
example, Rate is a character variable, but it is used in a numeric context. Therefore,
SAS automatically converts the values of Rate to numeric to complete the
arithmetic operation.
NOTE: Character values have been converted to numeric values at the places given
by: (Line):(Column).
75:10
NOTE: Numeric values have been converted to character values at the places given
by: (Line):(Column).
76:7
NOTE: There were 9 observations read from the data set WORK.PAYSCALE.
NOTE: The data set WORK.AUTOSAL has 9 observations and 7 variables.
Key Ideas
o It uses the w. informat, where w is the width of the character value that is being
converted.
o It produces a numeric missing value from any character value that does not
conform to standard numeric notation (digits with an optional decimal point,
leading sign, or scientific notation).
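The two behaviors listed above can be observed in a short DATA step. This is a sketch with hypothetical variable names and values, not the example that produced the log shown earlier.

```sas
data _null_;
   Rate='12';
   Bad='1,200';          /* the comma is not standard numeric notation */
   Hours=40;
   Pay=Rate*Hours;       /* Rate is converted automatically: 12*40=480 */
   BadPay=Bad*Hours;     /* the conversion fails, so BadPay is missing */
   put Pay= BadPay=;
run;
```

In addition to the conversion note, the log is expected to report invalid numeric data for the value of Bad.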
See Also
n Automatic Character-to-Numeric Conversion
Example Code
The following example illustrates how to use the _N_ automatic variable with the
PUT statement.
data test;
input x $ y;
put 'This is row ' _n_;
datalines;
a 1
b 2
c 3
x 24
z 26
;
run;
proc print data=test;
run;
This is row 1
This is row 2
This is row 3
This is row 4
This is row 5
Key Ideas
n Each time the DATA step loops past the DATA statement, the variable _N_
increments by 1.
n The value of _N_ represents the number of times the DATA step has iterated.
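Because _N_ counts iterations, it is often used to run a statement only once. The following sketch assumes the sashelp.class data set that is used elsewhere in this chapter; the message text is illustrative.

```sas
data _null_;
   set sashelp.class;
   if _n_=1 then put 'Processing the first observation';   /* first iteration only */
   put _n_= Name=;                                         /* every iteration */
run;
```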
See Also
n Automatic Variables
Example Code
The following example illustrates the use of _ERROR_ and _INFILE_= to write to the
SAS log, during each iteration of the DATA step, the contents of an input record in
which an input error is encountered.
data testerr;
input x $ y;
if _error_=1 then put 'Error before row ' _n_ 'which contains '
_infile_;
put _infile_;
datalines;
a 1
*^*#
b 2
c3
;
run;
Key Ideas
n Automatic variables are created automatically by the DATA step or by DATA step
statements. These variables are added to the program data vector but are not sent
as output to the data set being created.
n _INFILE_ is an automatic character variable that gets created automatically by the
DATA step when you use the INFILE statement. It contains the value of the current
input record (row) read in either a file or in data in the DATALINES statement.
n Use the value of _ERROR_ to help locate errors in data records and to print an error
message to the SAS log.
See Also
n Automatic Variables
Example Code
This example shows you how to create variable numbered range lists. You can
begin with any number and end with any number as long as you do not violate the
rules for user-supplied names and the numbers are consecutive. Use the INPUT
statement to write out a numbered range list. You can also use a numbered range
list in an ARRAY statement.
data temperatures;
input Day1-Day7;
datalines;
74 75 82 84 85 86 89
;
run;
proc print data=temperatures noobs;
title "Average Daily Low Temperature";
run;
data tempCelsius(drop=i);
set temperatures;
array celsius{7} Day1-Day7;
do i=1 to 7;
celsius{i}=(celsius{i}-32)*5/9;
end;
run;
proc print data=tempCelsius;
title "Average Daily Low Temperature in Celsius";
run;
The PROC PRINT output for temperatures contains the variables in the numbered
range list Day1–Day7.
Examples: Use Variable Lists 149
Key Ideas
n Numbered range variable lists are lists consisting of variables that are prefixed
with the same name but that have different numeric values for the last characters.
n The numeric values must be consecutive numbers.
n In a numbered range list, you can refer to variables that were created in any order,
provided that their names have the same prefix.
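The last point can be illustrated by creating the variables out of numeric order and then referring to them with a numbered range list. This is a minimal sketch; the array name temps and the values are hypothetical.

```sas
data _null_;
   Day3=82; Day1=74; Day2=75;       /* created out of numeric order */
   array temps{3} Day1-Day3;        /* the list resolves to Day1, Day2, Day3 */
   do i=1 to 3;
      put temps{i}=;                /* writes each variable name and value */
   end;
run;
```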
See Also
n ARRAY Statement
Example Code
In the example, the name range list specified in the KEEP statement keeps all
numeric variables between and including nAtBat and nOuts. The name range list
specified in the ARRAY statement reads all variables between nAtbat and nRuns. In
this case that also includes variables nHits and nHome.
Note: All variables in the range list must be the same case.
The name range list specified in the VAR statement in the PRINT procedure
specifies that only the variables between and including Name and nBB are printed in
the PROC PRINT output.
data changeStats(where=(YrMajor>18));
set sashelp.baseball;
keep Name nAtBat-numeric-nOuts YrMajor;
array stats(4) nAtBat--nRuns;
do i=1 to 4;
stats{i} = stats{i}*10;
end;
run;
proc print data=changeStats;
var Name--nBB;
run;
Key Ideas
n You can use a name range list in an ARRAY declaration as long as you have already
defined the variables prior to declaring the array. The variables can be defined in
the same DATA step or in a previous DATA step.
n Notice that name range lists use a double hyphen ( -- ) to designate the range
between variables, and numbered range lists use a single hyphen to designate the
range.
See Also
n ARRAY Statement
n KEEP Statement
n VAR Statement
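The difference between a plain name range list and a typed name range list can be sketched as follows. The variables a, code, b, and c are hypothetical; the LENGTH statement is used only to fix their order in the program data vector.

```sas
data _null_;
   /* Hypothetical variables; the LENGTH statement fixes their order */
   length a 8 code $ 8 b 8 c 8;
   a=1;
   code='skip';
   b=2;
   c=3;
   /* a--c would include the character variable code; a-numeric-c does not */
   total=sum(of a-numeric-c);
   put total=;
run;
```

Because the typed list selects only the numeric variables between a and c, total should be 6.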
Example Code
In the following example, arguments are passed in as numbered range lists, both
with and without the use of the OF operator.
data _null_;
x1=30; x2=20; x3=10;
T=sum(x1-x3); /* #1 */
T2=sum(OF x1-x3); /* #2 */
put T=; /* #3 */
put T2=; /* #4 */
run;
Key Ideas
n The OF operator enables you to specify SAS variable lists or SAS arrays as
arguments to functions.
n The OF operator is important when used with functions whose arguments are in
the form of a numbered-range list. For example, an argument in the form of a
numbered range (x1 – xn) is read in as a range of values only if the list is preceded
by the OF operator.
n If the same list is not preceded by the OF operator and it is used with the SUM
function, the (-) character is treated as a subtraction sign. The function returns the
difference between the variables rather than the sum of the range of values.
See Also
n SUM Function
n PUT Statement
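The first key idea also mentions arrays. The following is a small sketch, assuming a hypothetical array x over x1-x3, that passes both an array reference and a numbered range list to functions with the OF operator.

```sas
data _null_;
   /* Hypothetical array over x1-x3 with initial values */
   array x{3} x1-x3 (30 20 10);
   total=sum(of x{*});       /* array reference as the argument list */
   largest=max(of x1-x3);    /* numbered range list with another function */
   put total= largest=;
run;
```

With these initial values, total should be 60 and largest should be 30.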
Example Code
The following example illustrates a ranged variable list in which the entire list is
preceded by the OF operator and the lists are separated by spaces or commas.
data _null_;
T=sum(OF x1-x3 y1-y3 z1-z3);
run;
or
data _null_;
T=sum(OF x1-x3, OF y1-y3, OF z1-z3);
run;
Key Ideas
n The OF operator enables you to specify SAS variable lists or SAS arrays as
arguments to functions.
n The OF operator is also used to distinguish one variable list from another when
multiple ranged lists are used as arguments in functions.
See Also
n Types of Variable Lists
Example Code
SAS replaces the missing values as it encounters values that you assign to the
variables. Thus, if you use program statements to create new variables, their values
in each observation are missing until you assign the values in an assignment
statement, as shown in the following DATA step:
data new;
input x;
if x=1 then y=2;
datalines;
4
1
3
1
;
run;
proc print data=new;
run;
Key Ideas
n SAS replaces the missing values as it encounters values that you assign to the
variables.
n At the beginning of each iteration of the DATA step, SAS sets the value of each
variable that you create in the DATA step to missing.
See Also
n When Reading Raw Data
Example Code
The following example uses data from a marketing research company. Five testers
were hired to test five different products for ease of use and effectiveness. If a
tester was absent, there is no rating to report, and the value is recorded with an X
for “absent.” If the tester was unable to test the product adequately, there is no
rating, and the value is recorded with an I for “incomplete test.” The following
program reads the data and displays the resulting SAS data set. Note the special
missing values in the first and third data lines:
data period_a;
missing X I;
input Id $4. Foodpr1 Foodpr2 Foodpr3 Coffeem1 Coffeem2;
datalines;
1001 115 45 65 I 78
1002 86 27 55 72 86
1004 93 52 X 76 88
1015 73 35 43 112 108
1027 101 127 39 76 79
;
run;
proc print data=period_a;
run;
Key Ideas
n When data values contain characters in numeric fields that you want SAS to
interpret as special missing values, use the MISSING statement to specify those
characters.
n If you do not begin a special numeric missing value with a period, SAS identifies it
as a variable name. Therefore, to use a special numeric missing value in a SAS
expression or assignment statement, you must begin the value with a period,
followed by the letter or underscore. Here is an example: x=.d;
See Also
n MISSING Statement
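The second key idea can be illustrated with a short sketch that tests for the special missing values in expressions. It assumes that the data set period_a from the example above exists; the data set name absences and the variables n_absent and n_incomplete are hypothetical.

```sas
data absences;
   set period_a;
   /* All five rating variables, in the order in which they were read */
   array ratings{*} Foodpr1--Coffeem2;
   n_absent=0;
   n_incomplete=0;
   do i=1 to dim(ratings);
      /* The leading period marks .X and .I as missing values, not variable names */
      if ratings{i}=.X then n_absent=n_absent+1;
      else if ratings{i}=.I then n_incomplete=n_incomplete+1;
   end;
   drop i;
run;
```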
Example Code
If you do not want missing values to propagate in your arithmetic expressions, you
can omit missing values from computations by using the sample statistic functions.
In the following DATA step, the value of a is missing because x is missing, but the
SUM function ignores the missing value, so the value of b is 5. The SUM statement
also ignores missing values, so the value of c is also 5:
data test;
x=.;
y=5;
a=x+y;
b=sum(x,y);
c=5;
c+x;
put a= b= c=;
run;
Output 5.20 SAS Log Results for a Missing Value in a Statistic Function
Key Ideas
n The SUM function and SUM statement ignore missing values when making the
calculation.
n If you use a missing value in an arithmetic calculation, SAS sets the result of that
calculation to missing.
n SAS prints notes in the log to notify you which arithmetic expressions have
missing values and when they were created.
Examples: Manage Problems Related to Precision 157
See Also
n MISSING Function
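Other sample statistic functions behave the same way as SUM. The following sketch uses the N, NMISS, and MEAN functions; the variable names n_present, n_missing, and average are hypothetical.

```sas
data _null_;
   x=.;
   y=5;
   n_present=n(x,y);       /* count of nonmissing arguments */
   n_missing=nmiss(x,y);   /* count of missing arguments */
   average=mean(x,y);      /* mean of the nonmissing arguments only */
   put n_present= n_missing= average=;
run;
```

Here n_present and n_missing should both be 1, and average should be 5 because the missing value is left out of the calculation.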
Example Code
In the example, SAS sets the variables point_three and three_times_point_one to
0.3 and (3 x 0.1), respectively. It then compares the two values by subtracting
one from the other and writing the result to the SAS log:
data a;
point_three=0.3;
three_times_point_one=3*0.1;
difference=point_three - three_times_point_one;
put 'The difference is ' difference;
run;
The log output shows that (3 x 0.1) - 0.3 does not equal 0, as it does in decimal
arithmetic. This is because the values are stored in floating-point
representation and cannot be stored precisely. For more information, see “Storage
Format”.
Key Ideas
n The numbers that are imprecise in decimal are not always the same ones that are
imprecise in binary.
n Performing calculations and comparisons on imprecise numbers in SAS can lead to
unexpected results.
n There are many decimal fractions whose binary equivalents are infinitely repeating
binary numbers, so be careful when interpreting results from general rational
numbers in decimal. There are some rational numbers that do not present
problems in either number system.
See Also
n Numeric Precision
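One way to guard a comparison against this kind of representation error is to round both sides to the same unit first. The following is a sketch under that assumption; the rounding unit 1e-9 is an arbitrary choice for illustration.

```sas
data _null_;
   point_three=0.3;
   three_times_point_one=3*0.1;
   /* Exact comparison of the stored floating-point values */
   if point_three=three_times_point_one then put 'exact comparison: equal';
   else put 'exact comparison: not equal';
   /* Comparison after rounding both values to the same unit */
   if round(point_three,1e-9)=round(three_times_point_one,1e-9)
      then put 'rounded comparison: equal';
   else put 'rounded comparison: not equal';
run;
```

The exact comparison is expected to fail, and the rounded comparison is expected to succeed.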
Example Code
This example shows the conversion process for the decimal value 255.75 to
floating-point representation.
1 Use the base 2 number system to write out the value 255.75 in binary.
Note: Each bit in the mantissa represents a fraction whose numerator is 1 and
whose denominator is a power of 2. The mantissa is the sum of a series of
fractions such as 1/2, 1/4 , 1/8 , and so on. Therefore, for any floating-point
number to be represented exactly, you must express it as the previously
mentioned sum.
Base 2 place value:  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2
Binary digit:         1    1    1    1    1    1    1    1   .   1     1
2 Move the decimal over until there is only one digit to the left of it. This process
is called normalizing the value. Normalizing a value in scientific notation is the
process by which the exponent is chosen so that the absolute value of the
mantissa is at least one but less than ten. For this number, you move the
decimal point 7 places:
1.111 1111 11
Because the decimal point was moved 7 places, the exponent is now 7.
4 Convert the decimal value, 1030, to hexadecimal using the base 16 number
system:
Base 16
16^7 ... 16^4 16^3 16^2 16^1 16^0
The converted hexadecimal value for 1030 is placed in the exponent portion of
the final result.
If the value that you are converting is negative, change the first bit to 1:
1100 0000 0110
6 In Step 2 above, delete the first digit and decimal (the implied one-bit):
11111111
1111 1111 1
8 To have a complete nibble at the end, add enough zeros to complete 4 bits:
1111 1111 1000
9 Convert 1111 1111 1000 to hexadecimal, which gives FF8.
Key Ideas
See Also
n Floating-Point Representation on Windows
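The result of the manual conversion can be checked with the HEX16. format. This is a sketch that assumes a platform using IEEE double-precision representation, such as Windows or UNIX; the output on an IBM mainframe differs.

```sas
data _null_;
   x=255.75;
   y=-255.75;
   /* HEX16. writes the 8-byte floating-point representation in hexadecimal */
   put x=hex16.;
   put y=hex16.;
run;
```

Under that assumption, x should be written as 406FF80000000000: 406 is the hexadecimal form of the biased exponent 1030, and FF8 is the mantissa 1111 1111 1000. For the negative value, the sign bit changes the first nibble from 4 to C, giving C06FF80000000000.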
Example Code
The following example shows the conversion process for the decimal value 512.1
to hexadecimal floating-point representation. This example illustrates how values
that can be precisely represented in decimal cannot be precisely represented in
hexadecimal floating point.
1 Because the base is 16, you must first convert the value 512.1 to hexadecimal
notation.
2 First, convert the integer portion, 512, to hexadecimal using the base 16 number
system:
Base 16
16^7 ... 16^4 16^3 16^2 16^1 16^0
4 Convert the fraction portion (.1) of the original number, 512.1 to hexadecimal:
.1 = 1/10 = 1.6/16
The numerator cannot be a fraction, so keep the 1 and convert the .6 portion
again.
.6 = 6/10 = 9.6/16
Again, there cannot be fractions in the numerator, so keep the 9 and reconvert
the .6 portion.
The .6 continues to repeat as 9.6, which means that you keep the 9 and
reconvert. The closest that .1 can be represented in hexadecimal is
.1 = .1999999 × 16^0
5 The exponent for the value is 3 (Step 2 above). To determine the actual
exponent that will be stored, take the exponent value and add the bias to it:
true exponent + bias = 3 + 40 = 43 (hexadecimal) = stored exponent
Key Ideas
See Also
n Floating-Point Representation on Windows
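The repeated "keep the integer part and reconvert the fraction" procedure for the fraction .1 can be sketched in a DATA step. The variable names frac, digit, and digits are hypothetical, and the loop stops after seven digits to match the approximation shown above.

```sas
data _null_;
   length digits $ 7;
   frac=0.1;
   do i=1 to 7;
      frac=frac*16;                 /* express the fraction in sixteenths */
      digit=floor(frac);            /* keep the integer part as the next hexadecimal digit */
      digits=cats(digits, put(digit, hex1.));
      frac=frac-digit;              /* reconvert the remaining fraction */
   end;
   put digits=;
run;
```

The log should show digits=1999999, with the 9 repeating for as long as the loop continues.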
Example Code
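The following example shows how you can use the ROUND function to round the
index variable in each iteration of the DO loop, so that accumulated
representation error does not prevent the comparison with 0 from succeeding.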
data _null_;
do i=-1 to 1 by .1;
i=round(i, .1);
put i=;
if i=0 then put 'AT ZERO';
end;
run;
i=-1
i=-0.9
i=-0.8
i=-0.7
i=-0.6
i=-0.5
i=-0.4
i=-0.3
i=-0.2
i=-0.1
i=0
AT ZERO
i=0.1
i=0.2
i=0.3
i=0.4
i=0.5
i=0.6
i=0.7
i=0.8
i=0.9
i=1
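An alternative to rounding inside the loop is to iterate over integers and derive the fractional value from the integer index. This is a sketch of that technique, not the approach used in the example above; the index variable k is hypothetical.

```sas
data _null_;
   /* Integer index values are represented exactly, so no error accumulates */
   do k=-10 to 10;
      i=k/10;
      put i=;
      if i=0 then put 'AT ZERO';
   end;
run;
```

Because 0/10 is exactly 0, the AT ZERO message is written without any call to ROUND.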
Example Code
You can avoid comparison errors by explicitly rounding the values before
performing the comparison. The next example compares the calculated result of
1/3 to the assigned value .33333. Because 1/3 is an imprecise number, the value
is not equal to .33333, and the PUT statement is not executed. However, if you add
the ROUND function, as in the following example, the PUT ‘MATCH’ statement is
executed:
data _null_;
x=1/3;
if round(x, .00001)=.33333 then put 'MATCH';
run;
Output 5.23 Log Output: Using the ROUND Function to Avoid Comparison Errors
MATCH
Key Ideas
See Also
n “ROUND Function” in SAS Functions and CALL Routines: Reference
Example Code
The numbers from 257 to 271 cannot be stored exactly in the first 2 bytes; a third
byte is needed to store the number precisely. As a result, the following code
produces misleading results:
data ab;
length x 2;
x=257;
y1=x+1;
run;
data abc;
set ab;
if x=257 then put 'FOUND';
y2=x+1;
run;
339 x=257;
340 y1=x+1;
NOTE: There were 0 observations read from the data set WORK.AB.
NOTE: The data set WORK.ABC has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
The PUT statement is never executed because the value of X is actually 256 (the
value 257 truncated to 2 bytes). Recall that 256 is stored in 2 bytes as 4310, but
257 is also stored in 2 bytes as 4310, with the third byte of 10 truncated.
Note, however, that Y1 has the value 258 because the values of X are kept in full, 8-
byte floating-point representation in the program data vector. The value is
truncated only when stored in a SAS data set. Y2 has the value 257 because X is
truncated before the number is read into the program data vector.
CAUTION
Do not use the LENGTH statement if your variable values are not integers.
Fractional numbers lose precision if truncated. Also, use the LENGTH statement to
truncate values only when disk space is limited. Refer to the length table in the SAS
documentation for your operating environment for maximum values.
Key Ideas
n You can use the LENGTH statement to control the number of bytes that are used
to store variable values. However, you must use it carefully to avoid errors and
significant data loss.
n During compilation SAS allocates as many bytes of storage space as there are
characters in the first value that it encounters for that variable.
See Also
n “LENGTH Statement” in SAS DATA Step Statements: Reference
Example Code
In decimal arithmetic, the expression 15.7 - 11.9 = 3.8 is true. But, in SAS, if
you compare the literal value of 3.8 to the calculated value of 15.7 - 11.9 and
output the result to the SAS log, you will get a result of 'not equal'.
data a;
x=15.7-11.9;
if x=3.8 then put 'equal';
else put 'not equal';
run;
proc print data=a;
run;
The log output indicates that the values 3.8 and (15.7 – 11.9) are not
equivalent. This is because the values involved in the computation cannot be
precisely represented in floating-point representation.
Output 5.24 Log Output for Comparing Values That Have Imprecise
Representations
not equal
The PROC PRINT statement displays the value for x as 3.8 rather than the actual
stored value because the procedure automatically applies a format and rounds the
results before displaying them. This example shows how non-explicit rounding can
cause confusion because, in this case, PROC PRINT rounds only the final results
after they are calculated.
Key Ideas
See Also
n Floating-Point Representation
Example Code
In the next example, two different formats are applied to the results given and
displayed in the SAS log. The first format, 10.8 shows that the value of x is 3.8;
however, displaying the value using the 18.16 format indicates that x is slightly less
than 3.8.
data a;
x=15.7-11.9;
if x=3.8 then put 'equal';
else put 'not equal';
put x=10.8;
put x=18.16;
run;
not equal
x=3.80000000
x=3.7999999999999900
Example Code
You can also use the width of 16 with the HEXw.d format to show floating-point
representation.
data a;
x=15.7-11.9;
if x=3.8 then put 'equal';
else put 'not equal';
put x=hex16.;
run;
not equal
x=400E666666666664
Key Ideas
n The HEXw. format is a special format that can be used to show floating-point
representation.
n When comparing non-integer values that do not have precise floating-point
representations you can sometimes encounter surprising results.
See Also
n “HEXw. Format” in SAS Formats and Informats: Reference
n Floating-Point Representation
Example Code
You can also use the TRUNC function to determine the minimum number of bytes
that are needed to store a value accurately. The following program finds the
minimum length in bytes (MinLen) that is needed for each number stored in a native
SAS data set named Numbers in an IBM mainframe environment. The data set
Numbers contains the variable Value, which contains the numbers from 269
to 272:
data numbers;
input value;
datalines;
269
270
271
272
;
data temp;
set numbers;
x=value;
do L=8 to 1 by -1;
if x NE trunc(x,L) then
do;
minlen=L+1;
output;
return;
end;
end;
run;
proc print data=temp noobs;
var value minlen;
run;
Output 5.28 Determining How Many Bytes Are Needed to Store a Number
Accurately
Key Ideas
n The minimum length required for the value 271 is greater than the minimum
required for the value 272.
n This fact illustrates that it is possible for the largest number in a range of numbers
to require fewer bytes of storage than a smaller number.
n If precision is needed for all numbers in a range, you should obtain the minimum
length for all the numbers, not just the largest one.
See Also
n “TRUNC Function” in SAS Functions and CALL Routines: Reference
Example Code
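The following example adds the same set of values in ascending order and in
descending order. The two sums differ slightly because rounding error accumulates
differently depending on the order of the additions.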
data _null_;
lo2hi =0;
hi2lo =0;
do i = -10 to 10 by 0.1;
lo2hi = lo2hi + 10**i;
end;
do i = 10 to -10 by -0.1;
hi2lo = hi2lo + 10**i;
end;
diff = hi2lo-lo2hi;
put lo2hi=;
put hi2lo=;
put diff=;
run;
The following output shows the log output from this example:
Example Code 5.4 SAS Log
lo2hi=48621160939
hi2lo=48621160939
diff=0.0041885376
Key Ideas
See Also
n “DO Statement: Iterative” in SAS DATA Step Statements: Reference
Example Code
This example shows how to use a simple 1-byte-to-1-byte swap with the
TRANSLATE function.
data sample1; /* 1 */
input @1 name $;
length encrypt decrypt $ 8;
/*ENCRYPT*/
do i = 1 to 8; /* 2 */
encrypt=strip(encrypt)||translate(substr(name,i,1),
'0123456789!@#$%^&*()-=,./?<','ABCDEFGHIJKLMNOPQRSTUVWXYZ');
end;
/*DECRYPT*/
do j = 1 to 8; /* 3 */
decrypt=strip(decrypt)||translate(substr(encrypt,j,1),
'ABCDEFGHIJKLMNOPQRSTUVWXYZ','0123456789!@#$%^&*()-=,./?<');
end;
drop i j;
datalines;
ROBERT
JOHN
GREG
;
proc print;
run;
Examples: Encrypt Variable Values 173
1 The DATA step reads a name that is 8 or fewer characters in length and uses a
DO loop to process the TRANSLATE function and SUBSTR function 1 byte at a
time.
2 The first DO loop creates the encrypted value.
3 The second DO loop creates the decrypted value by reversing the order and
returning the original value.
Key Ideas
n SAS provides encryption for SAS data sets with the ENCRYPT= data set option,
but this option is typically used to encrypt data at the data set level.
n To encrypt data at the SAS variable level, you can use a combination of DATA step
functions and logic to create your own encryption and decryption algorithms.
n However, if you create your own algorithms, it is important that you create a
program that not only is secure and hidden from public view, but that also contains
methods to both encrypt and decrypt the data.
See Also
n SUBSTR-left Function
n SUBSTR-right Function
n TRANSLATE Function
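Because TRANSLATE operates on every character of its first argument, the same 1-byte-to-1-byte swap can also be sketched without a DO loop. This is an alternative formulation for illustration, not the method used in the example above; it reuses the same two character strings.

```sas
data _null_;
   name='ROBERT';
   /* Encrypt: replace each letter with the character in the same position of the first string */
   encrypt=translate(name,
      '0123456789!@#$%^&*()-=,./?<','ABCDEFGHIJKLMNOPQRSTUVWXYZ');
   /* Decrypt: swap the two strings to reverse the mapping */
   decrypt=translate(encrypt,
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ','0123456789!@#$%^&*()-=,./?<');
   put encrypt= decrypt=;
run;
```

The decrypted value should match the original name, provided that the input contains only uppercase letters.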
Example Code
This example shows how to encrypt values using a 1-byte-to-2-byte swap with the
TRANWRD function. In the following sample code, the DATA step reads an ID that
is 6 or fewer characters in length. However, the variable is assigned a length of 12 to
double the character length, because this is a 1-byte-to-2-byte exchange. A DO
loop processes the TRANWRD function 1 byte at a time.
data sample2; /* 1 */
input @1 id $12.;
/*ENCRYPT*/
encrypt=id;
i=21;
do from_1 = "C","F","E","A","D","B"; /* 2 */
to_1=put(i,2.);
encrypt=tranwrd(encrypt,from_1,to_1);
i+1;
end;
/*DECRYPT*/
decrypt=encrypt;
j=21;
do to_2 = "C","F","E","A","D","B"; /* 3 */
from_2=put(j,2.);
decrypt=tranwrd(decrypt,from_2,to_2);
j+1;
end;
drop i j to_1 from_1 to_2 from_2;
datalines;
ABCDEF
FEDC
ACE
BDFA
CAFDEB
BADCF
ABC
;
proc print;
run;
1 The DATA step reads an ID that is 6 or fewer characters in length. However, the
variable is assigned a length of 12 to double the character length, because this is
a 1-byte-to-2-byte exchange.
Key Ideas
n SAS provides encryption for SAS data sets with the ENCRYPT= data set option,
but this option is typically used to encrypt data at the data set level.
n To encrypt data at the SAS variable level, you can use a combination of DATA step
functions and logic to create your own encryption and decryption algorithms.
n However, if you create your own algorithms, it is important that you create a
program that not only is secure and hidden from public view, but that also contains
methods to both encrypt and decrypt the data.
See Also
n TRANWRD Function
Example Code
This example shows how to encrypt a numeric value to create a character value
using a different character every third time. This method uses the PUT, SUBSTR,
INDEXC, TRANSLATE, CATS, and INPUT functions, as well as array processing.
data sample3; /* 1 */
input num;
array from(3) $ 10 from1-from3
('0123456789','0123456789','0123456789'); /* 2 */
array to(3) $ 10 to1-to3 ('ABCDEFGHIJ','KLMNOPQRST','UVWXYZABCD');
array old(5) $ old1-old5;
array new(5) $ new1-new5;
char_num=put(num,5.); /* 3 */
do i = 1 to 5; /* 4 */
old(i)=substr(char_num,i,1);
end;
j=1;
do k = 1 to 5; /* 5 */
if indexc(old(k),from(j)) > 0 then do;
new(k)=translate(old(k),to(j),from(j));
j+1;
if j=4 then j=1;
end;
end;
encrypt_num=cats(of new1-new5); /* 6 */
keep num encrypt_num;
datalines;
12345
70707
99
1111
;
run;
data sample4; /* 7 */
set sample3;
proc print;
run;
1 The first DATA step reads numeric values that are 5 digits or fewer. The numeric
variable is converted to a character variable and is split into five separate
values.
2 Four ARRAY statements are used: the first array sets up the from values; the
second sets up the to values; the third holds the five separate numeric values;
and the fourth holds the five new, separate encrypted values. The from and to
arrays are each created with three elements. The from ARRAY is assigned the
same string of numbers for all three elements, and the to ARRAY is assigned a
different string of letters for each of the three elements to build the every-third-
time rotating pattern.
3 The PUT function converts the numeric value to a character value.
4 The first DO loop uses the SUBSTR function to split the value into five separate
values and assigns each to the old ARRAY.
5 The second DO loop translates each value by using the INDEXC function to find
the original number in the from ARRAY and, if found, translates the value using
the from ARRAY, and rotates through the list of elements every third time.
6 The encrypted value is created by using the CATS function to concatenate the
five translated values. The same process that is used to encrypt the values is
also used to decrypt the values. The only differences are that the encrypted
variable is passed to the SUBSTR function, and the final decrypted variable is
passed to the INPUT function following the CATS function. This is done so that
the final values are numeric values.
7 The second DATA step reverses the encryption done in the first DATA step,
converting the values back to their original values. If you compare the two DATA
steps, you can see that the values in the to and from arrays are reversed. This is
because the second DATA step reverses the encryption done in the first DATA
step, converting the values back to their original values.
Output 5.31 PROC PRINT Result: Use Different Functions to Encrypt Numeric
Values as Character Strings
Key Ideas
n SAS provides encryption for SAS data sets with the ENCRYPT= data set option,
but this option is typically used to encrypt data at the data set level.
n To encrypt data at the SAS variable level, you can use a combination of DATA step
functions and logic to create your own encryption and decryption algorithms.
n However, if you create your own algorithms, it is important that you create a
program that not only is secure and hidden from public view, but that also contains
methods to both encrypt and decrypt the data.
See Also
n CATS Function
n INDEXC Function
n INPUT Function
n PUT Function
n SUBSTR-left Function
n SUBSTR-right Function
n TRANSLATE Function
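Callouts 6 and 7 describe the decryption step in prose. The following is a minimal sketch of such a step, written from that description rather than taken from the original program. It assumes that the data set sample3 from the first DATA step exists; the data set name sample4_sketch and the variable decrypt_num are hypothetical.

```sas
data sample4_sketch;
   set sample3;
   /* The from and to values are reversed relative to the encryption step */
   array from(3) $ 10 from1-from3 ('ABCDEFGHIJ','KLMNOPQRST','UVWXYZABCD');
   array to(3) $ 10 to1-to3 ('0123456789','0123456789','0123456789');
   array old(5) $ 1 old1-old5;
   array new(5) $ 1 new1-new5;
   /* Split the encrypted value into five separate characters */
   do i = 1 to 5;
      old(i)=substr(encrypt_num,i,1);
   end;
   /* Translate each character back, rotating through the three mappings */
   j=1;
   do k = 1 to 5;
      if indexc(old(k),from(j)) > 0 then do;
         new(k)=translate(old(k),to(j),from(j));
         j+1;
         if j=4 then j=1;
      end;
   end;
   /* INPUT converts the concatenated digits back to a numeric value */
   decrypt_num=input(cats(of new1-new5),5.);
   keep num encrypt_num decrypt_num;
run;
```

If the sketch is consistent with the encryption step, decrypt_num should equal num in every observation.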
6
Data Types
1 If a shorter length is specified, then 8 bytes are used and a note is printed to the SAS log.
2 The CAS engine is required for storing VARCHAR variables in the output table of a DATA step. The DATA step can read
CAS tables containing VARCHAR variables, but it cannot store them unless a CAS engine libref is specified on the output
table.
The CHAR data type does not support zero-length character strings. Strings
defined with no blank spaces (double or single quotation marks with no characters
or blank spaces) have a length of 1.
Data Types in SAS Viya 181
For more information about data types in CAS, see “Data Types” in SAS Cloud
Analytic Services: User’s Guide.
The SAS V9 engine supports only the CHAR and NUMERIC types. For more
information about data types that are supported by the SAS V9 Engine, see “Data
Types” on page 91.
SAS Cloud Analytic Services does not support the LENGTH statement for numeric
data types and does not support DOUBLE columns that use less than 8 bytes. If
you use the LENGTH statement to store numeric data with less than 8 bytes and
then load that data into CAS, the server stores the data with 8 bytes. If you have
several numeric columns that use less than 8 bytes, the in-memory table size can
be much larger than what is required for a .sas7bdat file.
Definition
VARCHAR(length | * )
Syntax
LENGTH (variable-name) VARCHAR(length|*);
ARRAY array-name[N] VARCHAR(length|*);
ARRAY array-name[*] VARCHAR-variables;
variable-name
specifies one or more variables that are assigned the type VARCHAR.
182 Chapter 6 / Data Types
length
specifies a numeric constant that is the user-defined maximum number of
characters that can be stored in the VARCHAR variable. This value can be up
to 536,870,911 characters in length. Uninitialized VARCHAR variables are
given a length of 1 by default. This value is based on the defined range.
length xyz varchar(32);
*
specifies that SAS uses the maximum length allowed, which is 536,870,911
characters. When assigning a character constant to a VARCHAR variable, the
character constant is limited to 32,767 bytes.
length xyz varchar(*);
array-name
specifies the name of the array. Defines the elements in an array as a list of
VARCHAR variables.
When using a list of VARCHAR variables with the ARRAY statement, you
can use the hyphen ( – ), colon / prefix, and double-dash lists:
array arr1[*] v1-v5;
array arr2[*] v:;
array arr3[*] v1--v5;
N
describes the number and arrangement of elements in the array
*
specifies the maximum length allowed, 536,870,911 characters. When
assigning a character constant to a VARCHAR variable, the character
constant is limited to 32767 bytes.
array myArray{*} varchar(*) a1 a2 a3 ('a','b','c');
Details
The VARCHAR type is a varying length character data type whose length
represents the maximum number of characters you want to store in a column.
VARCHAR variables have the following characteristics:
n their length is measured in terms of characters rather than bytes
These characteristics are in contrast to those of the CHAR data type, in which
length is fixed and measured in bytes.
For example, a VARCHAR(100) can store up to 100 characters, but the actual
storage used in any given row depends on the lengths of the individual values in the
row and column.
run;
In most cases, you should take advantage of VARCHAR support. However, if values
are consistently short, such as an ID column of airport codes, then a fixed-width
CHAR variable uses less memory and runs faster. This is because VARCHAR values
require 16 bytes plus the memory needed to store the VARCHAR value. So, if your
values are always smaller than 16 bytes, you can save memory and processing time
by using a CHAR type variable instead.
Range
SAS defines the length of a VARCHAR data type in terms of characters rather than
bytes. The maximum length of a VARCHAR variable is 536,870,911 Unicode
characters, or approximately 2^31 bytes. This means that up to 536,870,911
characters, or 2,147,483,644 bytes of data, can be stored in a VARCHAR variable.
The maximum length in bytes is calculated by multiplying 536,870,911 by the maximum
length that any one character in the UTF-8 character set can be, which is 4 bytes.
You can use the MISSING function to test whether a VARCHAR variable is missing.
if missing(var2) then var2 = "missing";
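In context, the MISSING test might be used as in the following sketch. It assumes an active CAS session and a CAS engine libref named mycas, as in the other examples in this chapter; the table name vc_missing and the variable var1 are hypothetical.

```sas
libname mycas cas;   /* assumes an active CAS session */
data mycas.vc_missing;
   length var1 varchar(20) var2 varchar(*);
   var1='abc';
   /* var2 has not been assigned a value, so MISSING returns 1 */
   if missing(var2) then var2 = "missing";
run;
```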
The DATA step supports the processing of VARCHAR data. However, only the CAS
engine supports the VARCHAR data type. This means that the DATA step can read
in and process VARCHAR data, but the data is converted to a CHAR when it is
stored as a SAS data set.
When you convert between CAS data and SAS data, the supported data type
conversions are defined by the engine.
If character strings declared as VARCHAR data types are converted to the CHAR
data type, values that are too long for the CHAR data type are truncated.
Data types can be converted from one type to another either implicitly or explicitly.
In implicit conversions SAS automatically converts data from one type to another
and the conversions are not visible to the user. An example is when you save a CAS
table containing a VARCHAR as a SAS data set. The VARCHAR is implicitly
converted to a CHAR in the output data set.
SAS language elements that can explicitly convert one data type to another are the
PUT function and the INPUT function. The following converts the numeric value of
a VARCHAR to a DOUBLE data type and writes the output to a CAS table.
data mycas.new;
length vc varchar(40);
vc = '5000';
num = input(vc,8.);
run;
proc contents data=mycas.new; run;
Output 6.1 PROC CONTENTS Output Showing Explicit Data Type Conversion
n Len displays the length specified by the user when the variable is declared. The
defined length is given in 2 sub-columns, Chars and Bytes:
o Chars displays the defined length in characters, or the length that the user
specifies in the VARCHAR(n | *) statement when the variable is declared.
o Bytes displays a length in bytes that SAS calculates based on the defined
length and the SAS session encoding. SAS calculates the length by
multiplying the defined length by the largest possible number of bytes
required to store any one character in the character set encoding. The
defined length is the value defined by the user in the VARCHAR(n)
statement.
For example, if your SAS session encoding is UTF-8, then the Len Bytes
value is calculated as (n x 4), where n is the length defined in the
VARCHAR(n) statement and 4 is the largest possible number of bytes
required to store any character in the UTF-8 character set. Similarly, if the
local SAS session encoding is Latin1, then the Len Bytes value is calculated
as (n x 1), where n is the length defined in the VARCHAR(n) statement and 1
is the largest number of bytes required to store any character in the Latin1
character set.
n Max Bytes Used displays the number of bytes that are used to store the longest
string value in the column.
data mycas.new;
/* The defined lengths of vc1, vc2, and vc3 are shown in the "Len Chars" column. */
length vc1 varchar(*)
vc2 varchar(100)
vc3 varchar(10);
/* The actual bytes used for vc2 and vc3 are shown in the "Max Bytes Used" column. */
vc2 = "123456789";
vc3 = "abc";
run;
proc contents data=mycas.new; run;
If you specify VARCHAR(*), then SAS defines the length as the maximum allowable
length.
When converting a variable from a VARCHAR to a CHAR, the length of the CHAR
depends on how the VARCHAR is originally defined.
n VARCHAR(*) – If a table that contains a VARCHAR(*) definition is saved as a
SAS data set, the VARCHAR is automatically converted to a CHAR that has a
length equal to the Max Bytes Used for the original VARCHAR.
n VARCHAR(n) – If a table that contains a VARCHAR(n) definition is converted to
a SAS data set, then the length of the variable depends on the local SAS session
encoding. The length is calculated as follows: SAS multiplies the current length
of the VARCHAR by the maximum value that a character’s length can be in the
local SAS session encoding.
o If the local SAS session encoding uses single-byte characters, then the
VARCHAR is converted to a CHAR with a length of (n x 1). n is the length of
the original VARCHAR and 1 is the largest number of bytes required to store
any character in the character set.
o If the local SAS session encoding uses double-byte characters, then the
VARCHAR is converted to a CHAR with length (n x 2). n is the length of the
original VARCHAR and 2 is the largest number of bytes required to store any
character in the character set.
o If the local SAS session encoding uses UTF-8 encoding, then the VARCHAR
is converted to a CHAR with length (n x 4). n is the length of the original
VARCHAR and 4 is the largest number of bytes required to store any
character in the character set.
Table 6.2 Restrictions and Notable Behaviors for the VARCHAR Data Type in the
CAS Engine
Feature                              Description
KEY= on SET and MODIFY statements    VARCHAR variables are not supported by the KEY= option
                                     in either the SET or MODIFY statements.
PUT statement (to ODS output)        VARCHAR variables are not supported with the PUT
                                     statement when the DATA step writes output using ODS.
Implicit declaration means that you do not have to explicitly declare a variable’s
type or length before using it. You can create a new variable and use it for the first
time in an assignment statement without having to explicitly declare its type or
length. When you create a variable in this way, SAS determines the type based on
the values that you assign to the variable.
n Variables that are assigned a character string value are implicitly defined as
CHAR types with a default length of 8 bytes.
n Variables that are assigned an integer value are implicitly defined as DOUBLE
types with a default length of 8 bytes.
Note: This is different from the V9 engine, which supports a length range of 1 -
8 bytes for NUMERIC types.
In the following DATA step, the type and length for variables x and y are set
implicitly:
libname mycas cas;
data mycas.datatypes;
x=1;
y='hello';
run;
For information about data types supported by the SAS V9 engine, see “Data
Types” on page 91.
7
SAS Expressions
SAS Expressions
Definitions
expression
is a sequence of operands and operators that form a set of instructions that are
performed to produce a resulting value.
operands
are constants or variables that can be numeric or character.
operators
are symbols that represent a comparison, arithmetic calculation, or logical
operation; a SAS function; or grouping parentheses.
simple expression
is an expression with no more than one operator. A simple expression can
consist of one of the following single operators:
n constant
n variable
n function
compound expression
is an expression that includes several operators. When SAS encounters a
compound expression, it follows rules to determine the order in which to
evaluate each part of the expression.
WHERE expression
is a type of SAS expression that is used within a WHERE statement or WHERE=
data set option to specify a condition for selecting observations for processing
in a DATA or PROC step.
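The definitions above can be illustrated with a short sketch. The variable names are arbitrary, and SASHELP.CLASS is used only because it is a sample table that ships with SAS.

```sas
data _null_;
   x = 3;                 /* simple expression: a single constant */
   y = x;                 /* simple expression: a single variable */
   z = sqrt(16);          /* simple expression: a single function call */
   total = x * 2 + z;     /* compound expression: * is evaluated before + */
   put x= y= z= total=;
run;

proc print data=sashelp.class;
   where age > 13;        /* WHERE expression: selects observations */
run;
```

The PUT statement writes x=3 y=3 z=4 total=10 to the log.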
See Also
n Operators
n Conditionally Selecting Data
n WHERE Statement
8
SAS Constants
Definitions
A SAS constant is a number or a character string that indicates a fixed value.
Constants can be used as expressions in many SAS statements, including variable
assignment and IF-THEN statements. Here are some examples of constants used in
SAS expressions:
n x=10;
n name="James";
n date='23jan2018'd;
They can also be used as values for certain options. Constants are also called
literals. The following are types of SAS constants:
n character constants
n numeric constants
Character Constants
Definition
A character constant consists of 1 to 32,767 characters and must be enclosed in
quotation marks. The quotation marks can be either single or double quotation
marks. 'Tom' and "Tom" are equivalent.
In the second set of examples, SAS searches for variables named ABC and SMITH,
instead of constants.
CAUTION
Matching quotation marks correctly is important. Missing or extraneous quotation
marks cause SAS to misread both an erroneous statement and the statements that
follow it. For example, in name='O'Brien';, O is the character value of Name, Brien is
extraneous, and '; begins another quoted string.
Commas can be used to make the string more readable, but they are not part of and
do not alter the hexadecimal value, as in this example:
'31,32,33,34'x
Note: Trailing or leading blanks within the quotation marks cause an error message
to be written to the log.
See Also
Examples
n “Example: Use Character Constants in Expressions” on page 203
n “Example: Compare Character Constants with Character Variables”
n “Example: Use Quotation Marks Within Strings”
n “Example: Define Character Constants in Hexadecimal Notation”
SAS Constants in Expressions 199
Numeric Constants
Definition
A numeric constant is a number that appears in a SAS statement. Numeric
constants can be presented in many forms, including these:
n standard notation
n hexadecimal notation
1 is an unsigned integer
Numeric constants that are larger than (10**32)-1 must be written using scientific
notation. For example, a value such as 2 x 10**40 must be written as 2E40.
See Also
Examples
n “Example: Define Numeric Constants in Standard Notation”
n “Example: Define Numeric Constants in Scientific Notation”
n “Example: Define Numeric Constants in Hexadecimal Notation”
Statements
n FORMAT Statement
Constant   Syntax                            Example
Date       "ddmmm<yy>yy"D                    "01jan18"D
Time       "hh:mm<:ss.s>"T                   "9:25:19pm"T
Datetime   "ddmmm<yy>yy:hh:mm<:ss.s>"DT      "18jan2018:9:27:05am"DT
IMPORTANT UTC or ISO 8601 datetime constants, which carry the Zulu time zone
indicator or a numeric offset from Coordinated Universal Time (UTC), are
converted to local time by adjusting the internal value. This adjustment
always occurs. If you have specified the TIMEZONE= system option, then SAS
converts UTC and ISO 8601 datetime constants based on the value of that
option. Otherwise, SAS converts them based on the system UTC time zone
offset.
Trailing blanks or leading blanks that are included within the quotation marks do
not affect the processing of the date constant, time constant, or datetime constant.
See Also
“Example: Define Date, Time, and Datetime Values in Date Constants” on page 211
When SAS tests a character value, it aligns the left-most bit of the mask with
the left-most bit of the string; the test proceeds through the corresponding bits,
moving to the right.
When SAS tests a numeric value, the value is truncated from a floating-point
number to a 32-bit integer. The right-most bit of the mask is aligned with the
right-most bit of the number, and the test proceeds through the corresponding
bits, moving to the left.
comparison-operator
compares an expression with the bit mask. For more information, see Operators.
bit-mask
is a string of 0s, 1s, and periods in quotation marks that is immediately followed
by a B, such as '..1.0000'b. Zeros test whether the bit is off; ones test whether
the bit is on; and periods ignore the bit. Commas and blanks can be inserted in
the bit mask for readability without affecting its meaning.
CAUTION
Truncation can occur when SAS uses a bit mask. If the expression is longer than
the bit mask, SAS truncates the expression before it compares it with the bit mask. A
false comparison might result. An expression's length (in bits) must be less than or equal
to the length of the bit mask. If the bit mask is longer than a character expression, SAS
generates a warning in the log, stating that the bit mask is truncated on the left, and
continues processing.
Note: Bit masks cannot be used as bit literals in assignment statements. For
example, the following statement is not valid:
x='0101'b; /* incorrect*/
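Because a bit mask is valid only in a comparison, a value written in binary digits has to be created another way. One possible approach, shown here as a sketch rather than as the documented recommendation, reads the digits with the BINARYw. informat through the INPUT function and then tests the result with a bit mask.

```sas
data _null_;
   x = input('0101', binary4.);    /* reads the binary digits 0101 as the number 5 */
   put x=;
   if x = '0101'b then put 'The four rightmost bits of x are 0101.';
run;
```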
See Also
Examples
n “Example: Bit Test a Variable’s Value”
Examples: Expressions and Constants 203
Formats
n $BINARYw. Format
Example Code
The following example illustrates how to create character constants using single
and double quotation marks:
data example;
char_const = 'Tom'; /* 1 */
apostrophe = "Tom's"; /* 2 */
singlequote = 'Tom''s'; /* 3 */
run;
proc print data=example;
run;
1 A character constant is created using single quotation marks around the string
value.
2 A character constant contains an apostrophe, which is created using double
quotation marks on the outside of the string. A single quotation mark on the
inside represents the apostrophe.
3 You can also use two single quotation marks in the string with single quotation
marks around the whole value.
Key Ideas
n Missing or extraneous quotation marks cause SAS to misread the statement and
generate an error.
n If a character constant contains an apostrophe (or single quotation mark), it could
be created by enclosing the whole string in double quotation marks.
See Also
n “Character Constants and Character Variables”
Example Code
The following example illustrates the difference between character constants and
character variables:
data compare;
var1 = 'abc'; /* 1 */
var2 = 'def'; /* 2 */
name1 = var1; /* 3 */
name2 = var2; /* 4 */
run;
1 Create the character variable var1 by assigning it a character constant. The
value abc is a constant because it is enclosed in quotation marks, which can be
single or double.
2 Create the character variable var2 in the same way, from the character
constant 'def'.
3 The variable name1 is a character variable whose value is the value of var1,
which is abc. Because var1 is not enclosed in quotation marks, SAS reads it as a
variable name.
4 The variable name2 is a character variable whose value is the value of var2,
which is def.
The variables name1 and name2 are not created from character constants. The
expressions assigned to them are not enclosed in quotation marks, so SAS reads
var1 and var2 as the names of the previously created variables and copies their
values. If you assign a character string to a variable and do not use
quotation marks around the value, SAS expects the value to be the name of a
variable.
Key Ideas
n Character constants are enclosed in quotation marks, but variable names are not.
See Also
n “Character Constants and Character Variables”
Example Code
The following example illustrates using quotation marks, both double and single,
within character constants:
data titles;
book1 = "Uncle Tom's Cabin"; /* 1 */
book2 = 'Uncle Tom''s Cabin'; /* 2 */
book3 = '"Ben Hur"'; /* 3 */
book4 = """Ben Hur"""; /* 4 */
run;
proc print data=titles;
run;
Key Ideas
n You can use one of two methods to escape quotation marks in character
constants. You can use alternating double and single quotation marks, or you can
double up on the quotation marks to escape the character.
n If a constant contains an apostrophe or a single quotation mark, then use double
quotation marks around the entire constant value and use the single quotation
mark inside the value. Alternatively, you can “escape” the single quotation mark by
using another single quotation mark.
See Also
n “Character Constants and Character Variables”
Example Code
The following example illustrates character constants in hexadecimal notation:
data _null_;
value1='534153'x; /* 1 */
value2='53,41,53'x; /* 2 */
put value1 "is identical to " value2;
run;
Key Ideas
See Also
n “Character Constants Expressed in Hexadecimal Notation”
Example Code
The following example defines numeric constants expressed in standard notation.
The numeric constants assigned to num1–num5 have different representations. num1
is an unsigned integer, num2 contains a leading zero, num3 and num4 contain a plus
sign, and num5 is the signed decimal number -1.25. The minus sign (-) is
used to represent negative numbers, whether or not they are integers.
Note: Even if you specify a leading zero when creating your variables, leading zeros
are dropped by default for numeric variables.
data _null_;
num1=1;
num2=01;
num3=+1;
num4=+01;
put num1 "= " num2 "= " num3 "= " num4;
num5=-1.25;
put num5;
run;
1 = 1 = 1 = 1
-1.25
Key Ideas
n You can express a numeric constant in standard notation using plus and minus
signs to indicate positive and negative numbers.
n You can also express numeric constants in scientific and hexadecimal notation.
See Also
n Numeric Constants Expressed in Standard Notation
Example Code
The following example defines numeric constants in scientific notation:
data _null_;
large1=1.2e23;
med1=0.5e-10;
put "large1=" large1;
put "med1=" med1;
run;
large1=1.2E23
med1=5E-11
Key Ideas
n In scientific notation, the number before the E is multiplied by the power of ten,
which is indicated by the number after the E.
n For numeric constants that are larger than (10**32)-1, you must use scientific
notation.
n Use E and an exponent after the value to signify scientific notation.
n You can also express numeric constants in standard and hexadecimal notation.
See Also
n “Numeric Constants Expressed in Standard Notation”
Example Code
The following example defines numeric constants in hexadecimal notation:
data _null_;
hex1=0c1x;
hex2=9x;
put "hex1=" hex1;
put "hex2=" hex2;
run;
hex1=193
hex2=9
Key Ideas
See Also
n “Numeric Constants Expressed in Standard Notation”
Example Code
The following example contains three illustrations of date, time, and datetime
constants:
data dtconsts;
date='1jan2018'd; /* 1 */
time="9:25:19pm"t; /* 2 */
datetime='18jan2018:9:27:05am'dt; /* 3 */
run;
1 The variable date is expressed as a date constant, which has single quotation
marks and is followed by a d signifying date.
2 The variable time is expressed as a time constant, which has double quotation
marks and is followed by a t signifying time.
3 The variable datetime is expressed as a datetime constant, which has single
quotation marks and is followed by a dt signifying datetime.
4 PROC PRINT uses the FORMAT statement to format the date, time, and
datetime constants. Without the FORMAT statement the result is displayed as a
number, which represents the SAS date, time, and datetime values.
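Callout 4 refers to a PROC PRINT step that does not appear in the listing above. A minimal sketch of such a step follows. The specific formats (DATE9., TIMEAMPM11., and DATETIME20.) are assumptions chosen for illustration, and any date, time, or datetime format could be used instead.

```sas
proc print data=dtconsts;
   /* assumed formats; without this statement the raw numeric values are printed */
   format date date9. time timeampm11. datetime datetime20.;
run;
```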
Key Ideas
n Trailing blanks or leading blanks that are included within the quotation marks do
not affect the processing of the date constant, time constant, or datetime
constant.
See Also
n “Dates, Times, and Intervals”
Example Code
The following example uses bit masks to do bit testing. The example tests whether
bit 4 is 1 (TRUE) for the values 8 and 7.
data _null_;
var=8;
if var="1..."b then state="1";
else state="0";
put "For value " var "bit 4 is " state;
var=7;
if var="1..."b then state="1";
else state="0";
put "For value " var "bit 4 is " state;
run;
Key Ideas
n A bit testing constant is a bit mask that is used in bit testing to compare internal
bits in a value's representation.
n You can perform bit testing on both character and numeric variables.
n Enclose the bit mask in quotation marks and use b to signify a bit mask.
See Also
n “Bit Testing Constants”
n $BINARYw. Format
Example Code
The following example illustrates a common error that can occur when a quoted
string is immediately followed by a keyword or variable name, with no space
between the closing quotation mark and the name. Here, '821't is evaluated as a
time constant because there is no space between the character constant '821' and
the t in THEN.
data aireuro;
set sasuser.europe;
if flight='821'then flight='230';
run;
To correct this error, insert a blank space between the ending quotation mark and
the t in the THEN statement. This eliminates the misinterpretation. No error
message is generated and all observations with a FLIGHT value of 821 are replaced
with a value of 230.
if flight='821' then flight='230';
Key Ideas
n Always insert a blank space between ending quotation marks and variable names
to avoid errors.
n Without the blank space, SAS misinterprets a character constant followed by a
letter as a special SAS constant.
See Also
n Avoiding a Common Error with Character Constants
9
Operators
SAS Operators
n infix operators
+y
n Constants:
-25
n Functions:
-cos(angle1)
n Parenthetical expressions:
+(x*y)
a=b+c
n Comparison:
Weight>150
n Logical or Boolean:
if x or y eq 1
n Concatenation:
Name=Firstname || Lastname
SAS also provides several other operators that are used only with certain SAS
statements. The WHERE statement uses a special group of SAS operators, valid
only when used with WHERE expressions. For a discussion of these operators, see
WHERE Statement.
Arithmetic Operators
Arithmetic operators perform calculations, as shown in the following table.
1 The asterisk (*) is always necessary to indicate multiplication; 2Y and 2(Y) are not valid expressions.
See “Order of Operation in Compound Expressions” on page 225 for the order in
which SAS evaluates these operators.
Note: When a value that is used with an arithmetic operator is missing, the result
is a missing value. See “Missing Variable Values” on page 102 for information about
how to prevent the propagation of missing values.
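The note above can be seen in a short sketch that contrasts the + operator with the SUM function, which is one common way to keep a missing value from propagating. The variable names are illustrative.

```sas
data _null_;
   a = 5;
   b = .;                     /* numeric missing value */
   plus_result = a + b;       /* arithmetic operator: the result is missing */
   sum_result  = sum(a, b);   /* SUM function ignores missing arguments */
   put plus_result= sum_result=;
run;
```

The log shows plus_result=. and sum_result=5.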
Comparison Operators
Comparison operators express a condition. If the comparison is true, the result is 1.
If the comparison is false, the result is 0.
You can add a colon (:) modifier to any of the operators to compare only a specified
prefix of a character string. See “Character Comparisons” in SAS Language
Reference: Concepts for details.
= or EQ equal to a=3
1 The symbol that you use for NE depends on your operating environment.
2 The symbol => is also accepted for compatibility with previous releases of SAS. It is not supported
in WHERE clauses or in PROC SQL.
3 The symbol =< is also accepted for compatibility with previous releases of SAS. It is not supported
in WHERE clauses or in PROC SQL.
Numeric Comparisons
SAS makes numeric comparisons that are based on true and false values. The
expression evaluates to 1 if the expression is true and the expression evaluates to 0
if the expression is false. For example, in the expression A<B, if A has the value 4
and B has the value 3, then A<B has the value 0, or false.
You might get an incorrect result when you compare numeric values of different
lengths because values stored in fewer than 8 bytes have less precision than values
stored in the full 8 bytes. Rounding also affects the outcome of numeric comparisons. See “SAS
Variables” in SAS Language Reference: Concepts for a complete discussion of
numeric precision.
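The effect of rounding on a comparison can be sketched as follows. Rounding both sides to a tolerance before comparing is one common workaround, and the tolerance of 1E-9 is an arbitrary choice for this illustration.

```sas
data _null_;
   x = 0.1 + 0.2;
   exact   = (x = 0.3);                             /* compares the stored binary values */
   rounded = (round(x, 1e-9) = round(0.3, 1e-9));   /* compares after rounding both sides */
   put exact= rounded=;
run;
```

On hosts that store numbers in IEEE 754 double precision, exact is expected to be 0 because 0.1 + 0.2 is not stored as exactly 0.3, while rounded is expected to be 1.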
A missing numeric value is smaller than any other numeric value, and missing
numeric values have their own sort order. See “Missing Variable Values” on page
102 for more information.
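Because a missing value is lower than any number, a test such as temp < 0 is true for a missing value. The following sketch, with illustrative variable names, shows the effect and one way to exclude missing values first.

```sas
data _null_;
   temp = .;                                        /* missing reading */
   naive_low = (temp < 0);                          /* 1: missing is lower than any number */
   safe_low  = (not missing(temp) and temp < 0);    /* 0: missing values are excluded first */
   put naive_low= safe_low=;
run;
```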
Character Comparisons
Character variable values are compared character by character from left to right.
Character order depends on the collating sequence used by your computer, and
your SAS session encoding option.
For example, in the Latin 1 (cp1252 West European) and UTF-8 (UTF-8 Unicode)
collating sequences, G is greater than A. Therefore, this expression is true:
'Gray'>'Adams'
There are several important notes to keep in mind when making character
comparisons:
n In ASCII-based collating sequences, a blank is smaller than any other printable
character, and a period is smaller than any letter or digit. For example, the
following expressions are true:
'C.Jones'<'CharlesJones'
'C Jones' < 'CJones'
'C Jones' < 'C.Jones'
n Trailing blanks are ignored in a comparison. For example, 'fox ' is equivalent to
'fox'.
n A colon modifier after the comparison operator limits the comparison to a
prefix. SAS truncates the longer value to the length of the shorter value and
then compares the two values. For an example, see “Example: Compare a Specified
Prefix of a Character Expression” on page 236.
n Character comparisons are case sensitive.
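The points in the list above can be checked with a small sketch. The variable names are illustrative, and UPCASE is shown as one way to make a comparison ignore case.

```sas
data _null_;
   blanks = ('fox   ' = 'fox');                /* 1: trailing blanks are ignored */
   mixed  = ('Fox' = 'fox');                   /* 0: uppercase F differs from lowercase f */
   folded = (upcase('Fox') = upcase('fox'));   /* 1: compare after converting case */
   prefix = ('Passed' =: 'Pa');                /* 1: colon modifier compares only the prefix */
   put blanks= mixed= folded= prefix=;
run;
```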
IN Operator
The IN operator checks whether a value exists in a list and selects records that
match the search. The value that is checked against the list can be the result of an
expression. Individual values in the list can be separated by commas or spaces. You
can use a colon to specify a range of sequential integers.
For more information and examples of using the IN operator, see “The IN Operator
in Numeric Comparisons” in SAS Language Reference: Concepts and “Example:
Search an Array of Numeric Variable Values Using the IN Operator”.
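As a brief sketch of the list forms described above, the following DATA step tests a character value against a comma-separated list, a numeric value against a range, and a numeric value against a space-separated list. The variable names and values are illustrative.

```sas
data _null_;
   state = 'NJ';
   n = 7;
   in_states = (state in ('NY', 'NJ', 'PA'));   /* character list separated by commas */
   in_range  = (n in (1:10));                   /* colon gives the integers 1 through 10 */
   in_list   = (n in (1 3 5 20:25));            /* values separated by spaces, with a range */
   put in_states= in_range= in_list=;
run;
```

The log shows in_states=1 in_range=1 in_list=0.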
The following table shows how to use the IN operator with numeric and character
variables.
Logical Operators
Logical operators, also called Boolean operators, are usually used in expressions to
link a sequence of expressions into compound expressions.
A numeric expression without any logical operators can serve as a logical numeric
expression.
! or OR
¦ or OR
¬ or NOT (footnote 2) not(a>b)
^ or NOT
~ or NOT
1 The symbol that you use for OR depends on your operating environment.
2 The symbol that you use for NOT depends on your operating environment.
See “Order of Operation in Compound Expressions” for the order in which SAS
evaluates these operators.
AND Operator
Two comparisons with a common variable linked by AND can be condensed with an
implied AND. The following two subsetting IF statements produce the same result:
n if 16<=age and age<=65;
n if 16<=age<=65;
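The equivalence of the two forms can be confirmed with a short sketch that evaluates both for several illustrative ages.

```sas
data _null_;
   do age = 10, 16, 40, 70;
      explicit = (16 <= age and age <= 65);
      implied  = (16 <= age <= 65);       /* shorthand for the line above */
      put age= explicit= implied=;
   end;
run;
```

Both variables are 0 for ages 10 and 70 and 1 for ages 16 and 40.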
OR Operator
A comparison using the OR operator resolves as true if at least one of the operands is
true. Any nonzero, nonmissing constant is always evaluated as true. Therefore, in
the list below, the first subsetting IF statement is always true and the second is not
necessarily true:
n if x=1 or 2;
n if x=1 or x=2;
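The difference between the two statements is easier to see when both conditions are stored as values. In this sketch, with illustrative variable names, the first condition is parsed as (x=1) or (2).

```sas
data _null_;
   do x = 1, 2, 3;
      always_true = (x=1 or 2);     /* parsed as (x=1) or (2); the constant 2 is true */
      intended    = (x=1 or x=2);   /* true only when x is 1 or 2 */
      put x= always_true= intended=;
   end;
run;
```

For x=3, always_true is still 1 while intended is 0.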
NOT Operator
The NOT operator is a prefix operator and a logical operator. The NOT operator
inverts the truth value of an expression. The result of negating a false expression (0)
is true (1). The result of negating a true expression (1) is false (0).
Comparisons that use the NOT operator can be written differently and yield the
same results. The following two expressions are equivalent:
n not(a=b & c>d) is the same as a ne b | c le d
For example, see “Example: Use the NOT Operator to Reverse the Logic of a
Comparison” on page 238.
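The equivalence of the two expressions can be checked with a short sketch that evaluates both forms for two illustrative sets of values.

```sas
data _null_;
   a = 1; b = 1; d = 3;
   do c = 1, 5;
      negated   = not(a=b & c>d);       /* negation of the combined condition */
      rewritten = (a ne b | c le d);    /* each comparison reversed, AND replaced by OR */
      put c= negated= rewritten=;
   end;
run;
```

Both variables are 1 when c=1 and 0 when c=5.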
For example, suppose that you want to assign values to the variable Remarks
depending on whether the value of Cost is present for a given observation. You can
write the IF-THEN statement as follows:
if cost then remarks='Ready to budget';
The numeric value that is returned by a function is also a valid numeric expression:
if index(address,'Avenue') then do;
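The following sketch shows how a numeric value behaves as a condition. A missing value and 0 are both treated as false, and any other value is treated as true. The text assigned in the ELSE branch is a hypothetical addition for illustration.

```sas
data _null_;
   length remarks $ 16;
   do cost = ., 0, 250;
      if cost then remarks = 'Ready to budget';   /* true only for nonzero, nonmissing */
      else remarks = 'No cost entered';           /* hypothetical text for the false case */
      put cost= remarks=;
   end;
run;
```

Only the iteration with cost=250 writes Ready to budget.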
>< or MIN (footnote 1): Returns the lower of the two values.
n where x = (b min c);
n if x = (y><z);
<> or MAX (footnote 2): Returns the higher of the two values.
n where a=(b max c)
n if x = (a<>b)
1 In a WHERE expression, the symbol representation >< is not supported. The MIN mnemonic is converted to >< in the LOG.
2 In a WHERE expression, the symbol representation <> is interpreted as “not equal to”.
If missing values are part of the comparison, SAS uses the sorting order for missing
values that is described in “Order of Missing Values” on page 103.
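A short DATA step sketch of both operators follows, including a missing operand. The MIN function is shown only for contrast, because unlike the >< operator it ignores missing arguments. The variable names are illustrative.

```sas
data _null_;
   y = 4; z = 9; m = .;
   low  = (y >< z);            /* MIN operator: 4 */
   high = (y <> z);            /* MAX operator in a DATA step: 9 */
   low_missing = (m >< z);     /* missing sorts lowest, so the result is missing */
   fn_min = min(m, z);         /* MIN function ignores the missing argument: 9 */
   put low= high= low_missing= fn_min=;
run;
```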
Concatenation Operators
The concatenation operator combines character values. It is indicated by the
double vertical bar ||. The results of a concatenation operation are usually stored in
a variable with an assignment statement, as in level='grade '||'A'. The length of
the resulting variable is the sum of the lengths of each
variable or constant in the concatenation operation. You can use a LENGTH or an
ATTRIB statement to specify a different length for the new variable.
The concatenation operator does not trim leading or trailing blanks. If variables are
padded with trailing blanks, check the lengths of the variables and use the TRIM
function to trim trailing blanks from values before concatenating them.
Task: Concatenate the value of a variable with a character constant.
Example: newname='Mr. or Ms. ' ||oldname; If the value of OldName is 'Jones', then NewName has the value 'Mr. or Ms. Jones'.
Task: Eliminate trailing blanks using the TRIM function in a concatenation operation.
Example: See “Use the TRIM function in a Concatenation Operation to Eliminate Trailing Blanks” on page 240.
n CATX Function: removes leading and trailing blanks and inserts delimiters
n SAS does not guarantee the order in which subsequent expressions are
evaluated. For more information, see “Short-Circuit Evaluation in SAS”.
Order of Symbol/
Priority Evaluation Mnemonic Action Example
n x=2**3**4 is evaluated
as x=(2**(3**4))
n -3><-3 is evaluated as
-(3><-3). These are
equal to -(-3), which
equals +3.
/ division f=g/h;
- subtraction f=g-h;
= or EQ equal to if y eq (x+a)
then output;
¬= or NE not equal to if x ne z
then output;
y = x in (1:10);
Group 6 left to right & or AND logical and if a=b & c=d
then x=1;
1 The plus (+) sign can be either a prefix or arithmetic operator. A plus sign is a prefix operation only when it appears at the
beginning of an expression or when it is immediately preceded by an open parenthesis or another operator.
2 The minus (−) sign can be either a prefix or arithmetic operator. A minus sign is a prefix operator only when it appears at
the beginning of an expression or when it is immediately preceded by an open parenthesis or another operator.
3 Depending on the characters available on your keyboard, the symbol can be the not sign (¬), tilde (~), or caret (^). The
SAS system option CHARCODE allows various other substitutions for unavailable special characters.
4 Depending on the characters available on your keyboard, the symbol that you use as the concatenation operator can be a
double vertical bar (||), broken vertical bar (¦¦), or exclamation mark (!!).
5 Group 5 operators are comparison operators. The result of a comparison operation is 1 if the comparison is true and 0 if it
is false. Missing values are the lowest in any comparison operation. The symbols =< (less than or equal to) and => (greater
than or equal to) are also allowed for compatibility with previous versions of SAS.
6 An exception to this rule occurs when two comparison operators surround a quantity. For example, the expression x<y<z
is evaluated as (x<y) and (y<z).
7 Depending on the characters available on your keyboard, the symbol that you use for the logical or can be a single vertical
bar (|), broken vertical bar (¦), or exclamation mark (!). You can also use the mnemonic equivalent OR.
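Several of the rules in the table and its footnotes can be checked with a short sketch. The variable names and values are illustrative.

```sas
data _null_;
   a = 2**3**2;        /* right to left: 2**(3**2) = 2**9 = 512 */
   b = (2**3)**2;      /* parentheses force the other order: 8**2 = 64 */
   c = 2 + 3 * 4;      /* multiplication before addition: 14 */
   x = 1; y = 5; z = 3;
   d = (x < y < z);    /* evaluated as (x<y) and (y<z): 1 and 0 gives 0 */
   put a= b= c= d=;
run;
```

When the intended order is in doubt, adding parentheses as in the assignment to b makes it explicit.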
Note: When a value that is used with an arithmetic operator is missing, the result
is a missing value. See “Missing Variable Values” on page 102 for information about
how to prevent the propagation of missing values.
SAS does not guarantee short-circuit evaluation. When using Boolean operators to
join expressions, you might get undesired results if your intention is to short circuit,
or to avoid the evaluation of the second expression. To guarantee the order in
which SAS evaluates an expression, you can rewrite the expression using nested IF
statements. The following examples show how SAS might use short-circuit
evaluation at some times and not at others. The final example shows how you can
use nested IF statements to guarantee the order of evaluation.
In the first example below, SAS uses short-circuit evaluation when it evaluates the
first argument of the condition a>0. The expression evaluates to FALSE, and as a
result, SAS does not evaluate the second expression a=1/a, which contains an
invalid, division-by-zero operation. Since SAS does not evaluate this second
expression, the program does not return the error.
data test;
a=0;
if (a>0 AND a=1/a) then put 'hello';
else put 'goodbye';
run;
goodbye
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
In the next example, short-circuit evaluation is not used. Even though the first
argument in the condition, a, evaluates to FALSE, the second argument, a=1/a, is
evaluated and a division-by-zero error is returned.
data test;
a=0;
if (a AND a=1/a) then put 'hello';
else put 'goodbye';
run;
To guarantee the order in which SAS evaluates an expression such as this one, you
can rewrite the expression using nested IF statements.
data test;
a=0;
IF a>0 THEN
DO;
IF a=1/a THEN put 'hello';
ELSE put 'goodbye';
END;
ELSE
put 'goodbye again';
run;
goodbye again
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
Summary of Ways to Use Operators 229
Arithmetic Operators
Comparison Operators
= or EQ equal to n a=3
n 'Jones' EQ 'Jones'
n 'Jones' NE 'JONES'
n 'CharlesJones'>'C.Jones'
n state in ('NY','NJ','PA')
Logical Operators
& or AND If both of the quantities linked by an AND are (a>b & c>d)
1 (true), then the result of the AND operation
is 1. Otherwise, the result is 0.
! or OR
¦ or OR
¬ or NOT not(a>b)
^ or NOT
~ or NOT
>< or MIN Returns the lower of the two values. n where x = (b min c);
n if x = (y><z);
<> or MAX Returns the higher of the two values. n where a=(b max c)
n if x = (a<>b)
Concatenation Operator
Examples: Operators
Example Code
This example demonstrates how comparison operators can be used in an IF-THEN/
ELSE Statement to subset data.
data greencars;
set sashelp.cars;
MPG_AVG=(MPG_City + MPG_Highway)/2;
if MPG_AVG>=30 then Green_Rating=1; /* 1 */
else if MPG_AVG>=25 then Green_Rating=2; /* 2 */
else delete; /* 3 */
run;
proc print data=greencars;
var Make Model MPG_City MPG_Highway MPG_AVG Green_Rating;
run;
Key Ideas
n You can use arithmetic operators in an Assignment Statement when creating new
variables.
n Numeric comparisons yield a 1 if the result is true and a 0 if the result is false.
See Also
n “Summary of Ways to Use Operators Tables” on page 229
Example Code
The following example shows how to use comparisons in the Assignment
Statement to create a variable.
data test; /* 1 */
x=6;
y=8;
c=5*(x<y)+12*(x>=y); /* 2 */
put c; /* 3 */
run;
Key Ideas
See Also
n “Summary of Ways to Use Operators Tables” on page 229
Example Code
This example shows how you can use the IN operator to search an array of numeric
values. The following code creates an array a, defines a constant x, and uses the IN
operator to search for the value of x in the array.
data test;
array a{10} (2*1:5); /* 1 */
x=99; /* 2 */
y= x in a; /* 3 */
put y=;
run;
proc print data=test;run;
The code above creates the array and searches it for the value 99. No element
contains 99, so y is 0. The code below assigns 99 to a{5} before the search, so y is 1.
data test;
array a{10} (2*1:5);
x=99;
a{5} = 99; /* 1 */
y = x in a;
put y=;
run;
proc print data=test;
run;
1 The Assignment statement assigns the value 99 to the fifth element in the array,
overwriting the original value of 5 that was assigned in the ARRAY statement.
Key Ideas
n You can assign values to the variables in an array when you create it by specifying
the values in parentheses.
n PROC SQL does not support using the IN operator to search an array.
See Also
n “IN Operator” on page 219
Example Code
This example creates a character array, a, defines a constant, x, and then uses the IN
operator to search for x in the array, both before and after x is stored in an element.
data _null_;
array a{5} $ (5*'');
x='b1';
y = x in a;
put y=;
a{5} = 'b1';
y = x in a;
put y=;
run;
Example Code 9.1 Results from Using the IN Operator to Search an Array of Character
Values (Partial Output)
Key Ideas
n You can also use the IN operator to search an array of numeric values.
See Also
n “Why Use the IN Operator?” on page 220
Example Code
This example shows how you can use a colon (:) after the comparison operator to
compare only a specified prefix of a character expression.
data restaurantratings;
length Action $10;
input Name $1-18 Eval_1 20-21 Eval_2 Eval_3 Status $;
if Status=:'F' then Action='Contact';
if Status=:'P' then Action='Print Card';
datalines;
Lilac and Lavender 97 95 99 Passed
The Salty Pearl 85 70 65 F
Taste the Range 90 92 90 Pass
The Underground 85 90 90 P
When Pigs Fly 70 70 67 Fail
Basil 85 90 90 Passed
;
Key Ideas
n In this example, the colon modifier after the equal sign tells SAS to compare only the
first character, because the quoted string contains one character. The comparison
uses as many characters as are placed in quotation marks after the colon modifier.
n SAS truncates the longer value to the length of the shorter value during the
comparison.
n If you compare a zero-length character value with any other character value in
either an IN: comparison or an EQ: comparison, the two character values are not
considered equal. The result always evaluates to 0, or false.
See Also
n SAS Functions and CALL Routines: Reference
Example Code
This example shows how you can use Boolean operators to compare variables.
data inventory;
length Restock $ 3;
input ItemNum 1-4 Quantity 5-7 PriorityLev 8-10;
if Quantity<=10 and PriorityLev>=5
or Quantity<=10 then Restock='Yes';
else Restock='No';
datalines;
6530 10 8
8759 3 1
4573 22 10
9237 2 10
4329 12 4
9831 9 7
9830 15 4
9458 25 1
5673 4 10
7562 7 3
3291 3 10
;
Key Ideas
n If both of the quantities linked by AND are 1 (true), then the result of the AND
operation is 1. Otherwise, the result is 0.
n If either of the quantities linked by an OR is 1 (true), then the result of the OR
operation is 1 (true). Otherwise, the OR operation produces a 0.
See Also
n “Boolean Numeric Expressions” in SAS Language Reference: Concepts.
Example Code
This example shows how you can use the NOT operator with the IN operator to
reverse the logic of a comparison.
data productreviews;
length Category $ 10;
input ProdID Satisfaction Quality Safety Usability;
Avg= round((Satisfaction+Quality+Safety+Usability)/4,0.1);
if (avg>=3) then Category='Pass'; /* 1 */
else if (avg) NOT IN (3,4,5) then Category='Fail';
datalines;
8954 5 3 2 5
9183 5 5 5 5
6839 1 1 1 1
3493 2 1 3 2
2908 3 2 3 3
5419 5 4 5 5
3759 3 4 3 3
5301 4 3 4 4
;
run;
proc print data=productreviews;
var ProdID avg Category;
run;
Key Ideas
n You can use the NOT operator with other operators to reverse the logic of the
comparison.
See Also
n “NOT Operator” on page 222
Example Code
In this example, the TRIM function is used with the concatenation operator to
remove the trailing blanks that are visible after the concatenation of the variables
color and name.
data namegame;
length color name $8 game $12;
color='black'; /* 1 */
name='jack';
game=trim(color)||name;/* 2 */
put game=;
run;
1 The length of the color variable is eight, so the value black has three trailing
blanks.
2 The TRIM function removes the trailing blanks from color before the
concatenation, so the value of game is blackjack.
Key Ideas
n You can use the CAT Functions to perform concatenation operations without
needing to use the TRIM and PUT functions.
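As a brief sketch of the CAT-function alternative, the following step reuses the variables from the example above and replaces the TRIM call and concatenation operator with the CATS function, which removes leading and trailing blanks from each argument before concatenating.

```sas
data namegame;
   length color name $8 game $12;
   color='black';
   name='jack';
   game=cats(color,name);   /* no TRIM needed: CATS strips blanks from each argument */
   put game=;
run;
```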
See Also
n SAS Functions and CALL Routines: Reference
10
Dates and Times
n “About Date and Time Intervals” in SAS Formats and Informats: Reference
242 Chapter 10 / Dates and Times
11
Component Objects
SAS provides five predefined component objects for use in a DATA step: hash, hash
iterator, logger, appender, and Java objects.
PART 3
Accessing Data
Chapter 12 / SAS Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Chapter 13 / SAS Engines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Chapter 14 / SAS Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Chapter 15 / Raw Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Chapter 16 / Database and PC Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Chapter 17 / SAS Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Chapter 18 / SAS Dictionary Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
12
SAS Libraries
Each SAS library is associated with a libref, an engine, a physical location, and
options that are specific to the engine or environment.
The libref is a shortcut name or a nickname that you can use to reference the
physical location of the data. For example, in the two-level name mylib.myfile,
the libref is mylib, and the library member is myfile.
Library members can be any of the SAS file types in the following table. Other files
can be stored in the directory, but only SAS files are recognized as part of the SAS
library.
File Type           Member Type   Examples
data set or table   DATA          n “Example: Create and Print a V9 Engine Data Set”
                                    in SAS V9 LIBNAME Engine: Reference
                                  n “Example: Access DBMS Data as a SAS Library”
In output from the DATASETS procedure that lists the contents of a library, you
might notice that additional files are listed below a data set and have a different
member type. These files are attributes of the data set and are stored in separate
files. These member types are associated with a data set and cannot be specified in
a MEMTYPE= option.
Table 12.2 Data Set Attributes That Are Stored in Separate Files
index INDEX
Each SAS member type has a distinctive file extension. Therefore, a library can
contain files with the same name but with different member types. File extensions
vary, depending on the operating environment:
n “File Extensions and Member Types in UNIX Environments” in SAS Companion
for UNIX Environments
n “File Extensions for SAS Files” in SAS Companion for Windows
New Library window: Check the Enable at start-up box in the SAS windowing
environment or the Re-create this library at start-up box in SAS Studio.
The libref is a shortcut name or a nickname that you can use to reference the
physical location of the data. For example, in the two-level name mylib.myfile,
the libref is mylib, and the library member is myfile.
n A libref is valid only for the current SAS session unless you choose a method of
persisting the library assignment.
n You can deassign (clear) a libref before the session ends.
n When a libref is deassigned or the session ends, the library members are not
deleted from the physical storage location. (However, contents of the Work
library are deleted when the session ends.)
n You can reference a libref repeatedly within a SAS session.
252 Chapter 12 / SAS Libraries
n If you use a macro variable as a libref name, do not enclose the variable in
quotation marks. Place a delimiter after the variable. For example, the two-level
name &mylib..mydata references the &mylib variable to find the libref for the
library where the mydata file is located. See “Creating a Period to Follow
Resolved Text” in SAS Macro Language: Reference.
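The macro variable rule in the last item can be sketched as follows. This is a minimal illustration that assumes a data set named quarter1 already exists in c:\myfiles; the macro variable name mylib matches the text, and the libref sales is borrowed from the libref examples in this chapter.

```sas
libname sales 'c:\myfiles';
%let mylib=sales;                    /* macro variable holds the libref name */

/* The first period ends the macro variable reference.         */
/* The second period separates the libref from the member name. */
proc print data=&mylib..quarter1;
run;
```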
Library Engine
The engine name might not be required in a library assignment, but specifying the
engine is a best practice:
libname libref engine 'location' options;
SAS provides a number of engines to access SAS files or files formatted by another
application or DBMS. The shipped default Base SAS engine is BASE. In SAS®9 and
SAS® Viya®, the BASE engine is an alias for the V9 engine. If you do not specify an
engine name when you create a new library, and if you have not specified the
ENGINE system option, then the V9 engine is automatically selected. If the library
location already contains SAS files, then SAS might be able to assign the correct
engine based on those files. For example, if the location contains V9 data sets only,
then SAS assigns the V9 engine. However, if a library location contains a mix of
different engine files, then SAS might not assign the engine you want. Therefore,
specifying the engine is a best practice. For more information, see Chapter 13, “SAS
Engines,” on page 289. See also “Example: Set a Default Engine” on page 307.
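As a small sketch of this best practice, the following statements name the V9 engine explicitly and then list the library attributes so that you can confirm the engine in the log. The libref mylib and the location c:\example are illustrative.

```sas
libname mylib v9 'c:\example';   /* V9 engine named explicitly */
libname mylib list;              /* writes the library attributes, including the engine, to the log */
```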
Library Location
The physical location is required in a library assignment:
libname libref engine 'location' options;
Specify the path name of the physical location where you want to create or access
data. If you specify a relative path name, it references your current working
directory, which depends on the SAS interface and your deployment.
Enclose the physical location in single or double quotation marks. If you are
concatenating libraries, the syntax for library location is slightly different.
If the library’s physical location does not exist before you submit the LIBNAME
statement, SAS writes a note to the log. In some cases, if you create the location
after you submit such a LIBNAME statement, you can then successfully use the
libref. However, in many processing modes and interfaces, you must resubmit the
LIBNAME statement after you create the physical location.
You can set the DLCREATEDIR system option to automatically create a new
subdirectory for the library.
Elements of a Library Assignment 253
Library Options
Library options might be required in a library assignment, depending on the engine
and environment:
libname libref engine 'location' options;
When you access data that is stored in a DBMS or application other than SAS,
usually you must specify connection information. You might need additional
LIBNAME statement options that are specific to the engine.
In addition, some SAS data set options are also available as LIBNAME statement
options. Note that LIBNAME statement options take precedence over system
options. Data set options take precedence over LIBNAME statement options.
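The order of precedence can be sketched with COMPRESS=, which is used here as one example of a setting that is available as a system option, a LIBNAME statement option, and a data set option. The libref, location, and data set names are hypothetical.

```sas
options compress=no;                          /* system option */
libname mylib v9 'c:\example' compress=yes;   /* LIBNAME option takes precedence for this library */

data mylib.a;                 /* COMPRESS=YES is requested through the LIBNAME option */
   x=1;
run;

data mylib.b(compress=no);    /* data set option takes precedence over the LIBNAME option */
   x=1;
run;
```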
For LIBNAME statement options that are specific to an operating environment, see
the SAS companions:
n “LIBNAME Statement: UNIX” in SAS Companion for UNIX Environments
For LIBNAME statement options that are specific to a SAS engine, see documents
such as these:
n SAS V9 LIBNAME Engine: Reference
Here are the rules for specifying a physical location instead of a libref:
n Enclose the path name and file name in quotation marks. You can omit the file
extension if the file is a SAS data set.
n The quoted path name and file name must conform to the naming conventions
of your operating environment.
n Specifying a libref is preferred to specifying the physical location, due to the
benefits of library assignment.
n A quoted physical location is not supported for the following SAS features:
Library Concatenation
For a more detailed example, see “Example: Concatenate SAS Libraries” on page
262.
n If any library in the concatenation is sequential, then all of the libraries are
treated as sequential.
The order in which you list the libraries determines how SAS files are processed:
n When a data set is opened for input or update, the concatenated libraries are
searched in the order in which they are listed in the concatenation. The first
occurrence of the data set is used.
n When a data set is created, it is created in the first library that is listed in the
concatenation, even if a file exists with the same name in another library in the
concatenation.
n When you delete or rename a SAS file, only the first occurrence of the file is
affected.
n When a list of SAS files is displayed, only the first occurrence of a file name is
shown.
n A SAS file that is logically associated with another file (such as an index to a
data set) is included in the concatenation only if the parent file resides in that
same library. This rule is affected by the rule that only the first occurrence of a
file is used. For example, two libraries are concatenated. Both libraries contain a
data set that is named Mytable. In the second library, Mytable is indexed. The
index is not included in the concatenation.
n The attributes of the first library determine the attributes of the concatenation.
For example, if the first library is Read-Only, then the entire concatenated
library is Read-Only.
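The rules above can be illustrated with a short sketch. The librefs, the directory names, and the data set name are hypothetical, and both directories are assumed to exist.

```sas
libname lib1 'c:\example\first';    /* hypothetical locations */
libname lib2 'c:\example\second';
libname both (lib1 lib2);           /* lib1 is listed first, so it is searched first */

data both.mytable;                  /* created in lib1, the first library in the list */
   x=1;
run;

proc datasets library=both;         /* shows only the first occurrence of each file name */
run;
quit;
```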
Work Library (Temporary)
The Work library enables you to specify one-level names for temporary storage.
Specifying a one-level name means that you omit a libref and specify the file name
only, without quotation marks. See “Example: Use the Work Library for Temporary
Data” on page 272.
By default, the Work library is deleted at the end of each SAS session if the session
terminates normally. To change the default behavior for the Work library, see the
WORK=, WORKINIT, and WORKTERM system options in SAS System Options:
Reference.
In contrast to the Work library, most SAS libraries are permanent, not temporary.
User Library
The User library enables you to specify one-level names for permanent storage. If
you assign a User library, then all files that are created with a one-level name are
stored in the User library instead of the temporary Work library. When you refer to
a file by a one-level name, SAS looks for the file in the User library. If you have not
assigned a User library, then SAS looks for the file in the Work library.
You can assign the User library by using one of the methods in the User library
example on page 273. See also “USER= System Option” in SAS System Options:
Reference.
Files that are stored in the User library are not deleted by SAS when the session
terminates. Data files that SAS creates internally are still stored in the Work library.
After you assign the User library, if you want to create a temporary file in Work, you
must specify Work in a two-level name, such as Work.MyFile.
For the SPD Engine, you can use the TEMP=YES LIBNAME statement option with
the USER= system option to store temporary data sets that can be referenced with
a one-level name. See “TEMP= LIBNAME Statement Option” in SAS Scalable
Performance Data Engine: Reference.
Sashelp Library
SAS assigns the Sashelp library automatically. The Sashelp library contains the
following items:
n sample data that is used for some examples in SAS documentation.
n catalogs, item stores, and other files that store SAS settings for your entire site.
The defaults in the Sashelp library can be customized by your on-site SAS
support personnel. (Many settings are stored in the SAS Registry, which consists
of two item stores. See Chapter 35, “The SAS Registry,” on page 761.)
Use PROC DATASETS or PROC CATALOG to list the catalogs in a library. The
SASHELP system option specifies the location of the Sashelp library. The option is
set during the installation process and normally is not changed after installation.
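For instance, a step along the following lines lists only the catalogs in the Sashelp library. It is a minimal sketch that uses the MEMTYPE= option of PROC DATASETS to restrict the listing.

```sas
proc datasets library=sashelp memtype=catalog;
run;
quit;
```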
Sasuser Library
SAS assigns the Sasuser library automatically. The Sasuser library contains
catalogs and other files that store your personal settings and customizations. For
example, in Base SAS, you can store your defaults for function key settings or
window attributes. These values are stored in a catalog named Sasuser.Profile. See
“Sasuser.Profile Catalog” in SAS V9 LIBNAME Engine: Reference. (Many settings are
stored in the SAS Registry, which consists of two item stores. See Chapter 35, “The
SAS Registry,” on page 761.)
The SASUSER system option specifies the location of the Sasuser library.
The RSASUSER system option enables the system administrator to control the
mode of access to the Sasuser library. RSASUSER is helpful for installations that
have one Sasuser library for multiple users, to prevent those users from modifying
it.
Examples: Access Data by Using a Libref 259
Example Code
This example assigns a libref and then references it in a DATA step and a PROC
step.
1 The LIBNAME statement assigns the libref sales to the physical location
c:\myfiles.
2 The DATA step creates the data set sales.quarter1 and stores it in the library’s
physical location.
3 The PROC PRINT step references the data set by its two-level name,
sales.quarter1.
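A listing that is consistent with the numbered notes might look like the following. This is a minimal sketch: the libref, location, and data set name come from the notes, but the variables month and revenue and the two data rows are hypothetical.

```sas
libname sales 'c:\myfiles';        /* 1 */

data sales.quarter1;               /* 2 */
   input month $ revenue;          /* hypothetical variables */
   datalines;
Jan 1200
Feb 1350
;
run;

proc print data=sales.quarter1;    /* 3 */
run;
```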
Key Ideas
See Also
n “LIBNAME Statement: V9 Engine” in SAS V9 LIBNAME Engine: Reference
n “Example: Assign the User Library for Permanent Data” on page 273
Example Code
This example creates a macro named test that uses functions to assign a libref and
verify the assignment.
%macro test;
%let mylibref=new; /* 1 */
%let mydirectory=c:\example; /* 2 */
%if %sysfunc(libname(&mylibref,&mydirectory)) %then /* 3 */
%put %sysfunc(sysmsg());
%else %put success;
%if %sysfunc(libref(&mylibref)) %then /* 4 */
%put %sysfunc(sysmsg());
%else %put library &mylibref is assigned to &mydirectory;
%mend test;
%test
Here is the output in the log from running the macro test:
12 %test
success
library new is assigned to c:\example
Key Ideas
n Some programmers prefer using SAS functions rather than statements. The
LIBNAME function can assign or deassign (clear) a library assignment. The LIBREF
function can verify a library assignment.
n The behavior of the LIBNAME function depends on the number of arguments.
n Be aware of additional rules when you use DATA step functions within macro
functions.
See Also
n “LIBNAME Function” in SAS Functions and CALL Routines: Reference
n “Using DATA Step Functions within Macro Functions” in SAS Functions and
CALL Routines: Reference
n “Elements of a Library Assignment” on page 250
Example Code
This LIBNAME statement concatenates two SAS libraries:
libname lib3 (lib1 lib2);
Notice that the index for apples does not appear in the concatenation. The
lib2.apples data set has an index. However, the lib1.apples data set does not
have an index, and lib1 is listed first in the concatenation. SAS suppresses the
index when its associated data set is not part of the concatenation.
If multiple catalogs have the same name, their entries are concatenated. The
lib3.formats catalog combines the entries of the lib1.formats and
lib2.formats catalogs. For details, see “Catalog Concatenation” in SAS V9
LIBNAME Engine: Reference.
Key Ideas
n Library concatenation enables you to reference multiple libraries that are stored in
different physical locations.
n When a data set is opened for input or update, the concatenated libraries are
searched and the first occurrence of the data set is used. When a data set is
created, it is created in the first library that is listed in the concatenation, even if a
file exists with the same name in another library in the concatenation. Unwanted
behavior could occur if data sets exist with the same name in the different
locations.
See Also
n “Library Concatenation” on page 255
Example Code
This example uses a SAS/CONNECT spawner to access a SAS library that is stored
on a remote computer.
options comamid=tcp; /* 1 */
%let myserver=host.name.com; /* 2 */
signon myserver.__1234 user=userid password='mypw'; /* 3 */
libname reports '/myremotedata' server=myserver.__1234; /* 4 */
proc datasets library=reports; /* 5 */
run;
quit;
signoff myserver.__1234;
The following output from PROC DATASETS shows the REMOTE engine is
assigned to the directory. In this case, the SAS client is running on Windows, the
spawner is running on z/OS, and the data is located on UNIX.
Output 12.2 Portion of PROC DATASETS Output Showing the Remote Directory
Information
Key Ideas
See Also
n “Types of Sign-ons” in SAS/CONNECT User’s Guide
Example Code
The following LIBNAME statement accesses a WebDAV server:
1 The LIBNAME statement assigns the libref davdata to the URL location of a
WebDAV server.
2 The WEBDAV option is required in order to access a WebDAV server.
Key Ideas
See Also
n “LIBNAME Statement: WebDAV Server Access” in SAS Global Statements:
Reference
Example Code
In this example, a SAS DATA step creates a Teradata table. To run this example, you
must license a SAS/ACCESS interface.
1 The LIBNAME statement specifies the mytddata libref and TERADATA, which is
the engine nickname for SAS/ACCESS Interface to Teradata. The statement also
specifies connection options for Teradata. Change these options to specify your
SAS/ACCESS connection values and any other options you need.
2 The DATA step creates a table named grades. The table is in the Teradata
DBMS and is not a SAS data set.
3 The PROC DATASETS output for the mytddata library shows that the engine is
Teradata. For the grades table, the SAS member type is DATA and the DBMS
member type is TABLE.
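The steps in the notes might be sketched as follows. The connection values (SERVER=, USER=, PASSWORD=) are placeholders, and the variable student and the grade values are hypothetical; only the libref, the engine nickname, the table name grades, and the variable final come from the example text.

```sas
libname mytddata teradata server=myserver user=myuserid password=mypw;   /* 1: placeholder connection values */

data mytddata.grades;                 /* 2: creates a table in the Teradata DBMS */
   input student $ final;             /* hypothetical variable student and values */
   datalines;
Fred 66
Wilma 97
;
run;

proc datasets library=mytddata;       /* 3 */
run;
quit;
```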
Output 12.3 PROC DATASETS Output Showing the DBMS Directory Information
Output 12.4 PROC DATASETS Output Showing the Library Contents for DBMS
Content
Key Ideas
n If you license a SAS/ACCESS interface for your DBMS data, you can submit a
LIBNAME statement in SAS to run SAS code against the DBMS data. SAS can
create or process a DBMS table as if it were a SAS data set.
n Some SAS/ACCESS interfaces are case sensitive. You might need to change the
case of table or column names to comply with the requirements of your DBMS.
n When you access data that is stored in a DBMS or application other than SAS,
usually you must specify connection information. You might need additional
LIBNAME statement options that are specific to the engine.
See Also
n SAS/ACCESS documentation
Example Code
This example creates a SAS view that references the Teradata table that was
created in “Example: Access DBMS Data as a SAS Library” on page 266. This code
creates a SAS view, not a native Teradata view.
run;
1 In the LIBNAME statement, the libref target is assigned with a Windows path
name, not the Teradata server. No engine is specified, so SAS assigns the
default V9 engine.
2 This is the same SAS/ACCESS LIBNAME statement as in “Example: Access
DBMS Data as a SAS Library” on page 266.
3 The DATA step creates a SAS view named highgrades that references the
Teradata table named grades.
4 The view includes rows where the final variable is greater than 80.
5 PROC PRINT executes the view. Be aware that DATA step views do not retain
the LIBNAME statement. Therefore, when you reference this view, you must
first submit LIBNAME statements for the mytddata library as well as the target
library.
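The five notes can be matched to a listing along these lines. It is a sketch under stated assumptions: the Windows path c:\example and the Teradata connection values are placeholders, and the mytddata.grades table is assumed to exist already.

```sas
libname target 'c:\example';                                             /* 1: placeholder path, default engine */
libname mytddata teradata server=myserver user=myuserid password=mypw;   /* 2: placeholder connection values */

data target.highgrades / view=target.highgrades;   /* 3: defines a DATA step view */
   set mytddata.grades;
   where final gt 80;                               /* 4 */
run;

proc print data=target.highgrades;                  /* 5: executes the view */
run;
```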
The PROC PRINT output shows that Wilma’s final grade is greater than 80. Fred’s
final grade is not greater than 80, so Fred is not included in the view.
Output 12.5 PROC PRINT of a SAS View That References a Teradata Table
If you run PROC DATASETS, you can see that the member type of highgrades is
VIEW.
proc datasets library=target;
run;
quit;
For comparison, you could create a native table and view in Teradata by using the
Teradata BTEQ tool. Submit the CREATE TABLE and CREATE VIEW commands in
BTEQ to create a table named gradessql and a view named gradessqlview.
If you run PROC DATASETS against the mytddata library, you can see that the SAS
member types differ from the DBMS member types.
proc datasets library=mytddata;
run;
quit;
Key Ideas
n Many customers use a DBMS as a large, ongoing data store. Rather than process
an entire DBMS table, you can create a SAS view to query a subset of the data.
n After you submit a SAS/ACCESS LIBNAME statement, you can create a DATA step
view or a PROC SQL view. Another option is a PROC SQL view that uses the SQL
pass-through facility, which submits DBMS-specific syntax.
n A SAS view reflects the current state of data in the underlying data source.
Alternatively, if the underlying DBMS data is not expected to change, you could
read the DBMS data to create a permanent SAS data set.
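The PROC SQL alternative mentioned above can be sketched as follows. The view name highgrades_sql is hypothetical, and the librefs target and mytddata are assumed to be assigned already, as in the DATA step view example.

```sas
proc sql;
   create view target.highgrades_sql as   /* hypothetical view name */
      select *
         from mytddata.grades
         where final > 80;
quit;

proc print data=target.highgrades_sql;    /* the query runs when the view is referenced */
run;
```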
See Also
n “SAS Views of DBMS Data” in SAS/ACCESS for Relational Databases: Reference
Example Code
This example sets the DLCREATEDIR system option in order to create a
subdirectory in the file system for a library.
In the example, the directory c:\example exists in the file system, but the
subdirectory project does not. Because the DLCREATEDIR system option is set,
SAS creates project.
options dlcreatedir;
libname mynewlib 'c:\example\project';
The SAS log includes a note that the library was created:
Key Ideas
n You can set the DLCREATEDIR system option to create a subdirectory for the SAS
library that is specified in the LIBNAME statement if that directory does not exist.
n SAS can create one new subdirectory of an existing directory. In other words, SAS
creates only the final component in the path name.
n The shipped default is NODLCREATEDIR for all environments except z/OS.
See Also
n “DLCREATEDIR System Option” in SAS System Options: Reference
Example Code
The following statement deassigns the libref mylib from its physical location:
libname mylib clear;
Use the _ALL_ keyword in the LIBNAME statement to deassign all library
assignments (other than system libraries):
libname _all_ clear;
You can also use the LIBNAME function. The following code deassigns the libref
new, which was assigned in “Example: Assign a Libref by Using a Function” on page
260:
%macro test;
%if (%sysfunc(libname(new))) %then
%put %sysfunc(sysmsg());
%mend test;
%test
Key Ideas
n Deassigning a libref can be useful for freeing up resources, especially for shared
data.
n By default, SAS deassigns librefs automatically at the end of each SAS session.
n You can request to deassign a libref before the end of the session:
o In the LIBNAME statement, specify the libref name and CLEAR to deassign a
single libref. Specify _ALL_ and CLEAR to deassign all currently assigned librefs.
System libraries such as Sashelp and Sasuser are not deassigned.
o In the LIBNAME function, use the one-argument form to deassign the libref. In
certain operating environments, you can deassign the libref by specifying a
blank between quotation marks for the library location. However, in some
operating environments, a blank for the library location assigns a libref to the
current directory. Therefore, the one-argument form is recommended.
n If you use a method to persist a library assignment beyond the current session, you
cannot permanently deassign the libref by using the LIBNAME statement or
LIBNAME function. The libref is deassigned for the current session only and is
reassigned when you start a new session.
See Also
n “LIBNAME Statement: V9 Engine” in SAS V9 LIBNAME Engine: Reference
Example Code
If you have not set a User library, the following code writes mytable to the Work
library.
Notice the one-level name mytable does not have a libref. This code behaves the
same if you specify work.mytable.
data mytable;
x=1;
run;
proc contents data=mytable;
run;
Here is a portion of the PROC CONTENTS output, showing that the Work library is
used.
Output 12.8 Portion of PROC CONTENTS Output for a Data Set in Work
Examples: Access Data without Using a Libref 273
Key Ideas
n You can use the Work library for intermediate or temporary results.
n By default, if you specify a one-level name when you create a data set, the data
set is stored temporarily in the Work library. If you want to specify a one-level
name to create and use permanent files instead of temporary files, then assign a
User library.
n The Work library is automatically defined by SAS at the beginning of each SAS
session. Typically, files in the Work library are deleted at the end of each SAS
session if the session terminates normally.
See Also
n “Work Library (Temporary)” on page 257
n “Example: Assign the User Library for Permanent Data” on page 273
Example Code
If you want to specify a one-level name to create and use permanent files instead
of temporary files, then assign a User library. This example references the data set
quarter1, which was created in “Example: Assign a Libref by Using the LIBNAME
Statement” on page 259.
1 The LIBNAME statement assigns the libref sales to a physical location for the
library.
2 The USER= system option specifies the sales library as the default for one-
level names.
3 The PROC PRINT step references the data set by its one-level name, quarter1.
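A listing that is consistent with the three notes might look like the following sketch. It assumes that the data set quarter1 already exists in c:\myfiles.

```sas
libname sales 'c:\myfiles';   /* 1 */
options user=sales;           /* 2: one-level names now resolve to the sales library */

proc print data=quarter1;     /* 3 */
run;
```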
The log output confirms the data is read from sales.quarter1, even though sales
is not specified in the PROC PRINT:
NOTE: There were 2 observations read from the data set SALES.QUARTER1.
Instead of setting the USER= system option, you can use a LIBNAME statement or
LIBNAME function to assign the User library. Try these two examples:
libname user 'c:\myfiles';
proc print data=quarter1;
run;
data _null_;
x=libname ('user', 'c:\myfiles');
run;
proc print data=quarter1;
run;
The behavior is the same as setting the USER= system option, except the log shows
the libref as user:
NOTE: There were 2 observations read from the data set USER.QUARTER1.
Key Ideas
n By default, if you specify a one-level name when you create a data set, the data
set is stored temporarily in the Work library. If you want to specify a one-level
name to create and use permanent files instead of temporary files, then assign a
User library.
n You can use the USER= system option to set a User library. You can also use the
common ways to assign libraries, such as the LIBNAME statement. Specify a
previously assigned libref or a physical location.
n After you set the User library, if you want to store a temporary data set in the
Work library, you must specify the Work libref in a two-level name.
See Also
n “User Library” on page 257
n “Example: Use the Work Library for Temporary Data” on page 272
Example Code
This example prints a data set that is identified by its full path name and file name
instead of a libref and data set name. The data set quarter1 is created in “Example:
Assign a Libref by Using the LIBNAME Statement” on page 259. A Windows path
name is used for this demonstration.
proc print data='c:\myfiles\quarter1.sas7bdat';
run;
Key Ideas
n Instead of using a two-level name libref.file-name, you can omit the libref and
hardcode the full path name and file name, enclosed in quotation marks. You can
omit the file extension if the file is a SAS data set.
n Many language elements do not accept a physical location. For the restrictions,
see “Accessing Data without Using a Libref” on page 254.
See Also
n “LIBNAME Statement: V9 Engine” in SAS V9 LIBNAME Engine: Reference
n “Example: Assign the User Library for Permanent Data” on page 273
Example Code
This example uses a fileref to identify the location of raw data. The file
sampdata.txt is an external text file.
1 The FILENAME statement specifies the location of a file. The fileref is test, the
access method is URL, and the location is the full URL of the file.
If you download sampdata.txt to a file system that is accessible from your SAS
session (such as an NFS-mounted directory), then the URL access method is not
necessary.
2 The DATA step creates a temporary data set named credit in the Work library.
3 The INFILE statement specifies the fileref test, which was assigned in the
FILENAME statement. The FIRSTOBS= and OBS= options in the INFILE statement
specify the lines to read from the external file. The file sampdata.txt contains
many sample data sets and programs. This example uses only lines 945–954, which
contain raw data that is delimited by spaces.
4 PROC PRINT verifies that the external data is imported as a SAS data set.
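The four notes can be matched to a listing along these lines. This is a sketch only: the URL is a placeholder, and the variable names in the INPUT statement are hypothetical because the layout of lines 945–954 is not shown here.

```sas
filename test url 'https://www.example.com/sampdata.txt';   /* 1: placeholder URL */

data credit;                                /* 2: temporary data set in Work */
   infile test firstobs=945 obs=954;        /* 3: read lines 945 through 954 only */
   input account $ name $ type $ amount;    /* hypothetical variables, list input */
run;

proc print data=credit;                     /* 4 */
run;
```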
Output 12.9 PROC PRINT Output Showing Successful Import from the External File
Key Ideas
n A text file that contains delimited data (also called raw data) is not a SAS library
member. Therefore, you cannot use a libref to refer to the location of the file.
Instead, use the FILENAME statement to assign a SAS fileref.
n The FILENAME statement can assign a fileref to an external file or an output
device, deassign a fileref and external file, or list attributes of external files.
n If you do not need to reuse a fileref, and if you do not need an access method such
as URL, you can omit the FILENAME statement. Instead, you can choose to specify
the quoted path name and file name in the INFILE statement. However, filerefs can
be helpful for many of the same reasons as librefs.
See Also
n Chapter 15, “Raw Data,” on page 353
Example Code
The DATASETS procedure prints a list, or directory, of the members in a SAS library.
libname myfiles 'c:\example';
proc datasets library=myfiles;
run;
quit;
In the example output from PROC DATASETS, the directory information includes
the libref, the physical location, and other attributes of the library. The second
section of the output shows the name of each member and its member type,
followed by any associated files. In the output, notice that the flowers data set has
an index.
Key Ideas
See Also
n “DATASETS Procedure” in Base SAS Procedures Guide
Example Code
This example uses the LIST argument in the LIBNAME statement. The LIST
argument prints the library’s attributes in the log.
libname myfiles 'c:\example';
libname myfiles list;
Key Ideas
n Specify the libref name to list the attributes of a single SAS library. Specify _ALL_
to list the attributes of all SAS libraries that have librefs in your current session.
n If you specify _ALL_, then librefs that are defined as environment variables appear
only if you have already used those librefs in a SAS statement.
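The _ALL_ form described above is a one-line variation of the example:

```sas
libname _all_ list;   /* writes the attributes of every currently assigned library to the log */
```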
See Also
• “LIBNAME Statement: V9 Engine” in SAS V9 LIBNAME Engine: Reference
Example Code
This example uses the PATHNAME function to return the physical location that is
assigned to the libref myfiles.
libname myfiles 'c:\example';
data _null_;
length path $ 100;
path=pathname('myfiles');
put path;
run;
The PUT statement writes the path name to the SAS log:
c:\example
Key Ideas
• The PATHNAME function returns the physical location that is assigned to a libref.
You can also view the physical location by using PROC DATASETS or by using the
LIST argument in the LIBNAME statement.
• Several other SAS functions are available to return information about a SAS library
or a library member.
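As one illustration of these other functions, the following sketch uses OPEN, ATTRC, and CLOSE to report the engine and libref of a library member. It assumes that the myfiles library contains the flowers data set that is mentioned earlier in this chapter; substitute any member that exists in your library.

```sas
libname myfiles 'c:\example';

data _null_;
   length engine lib $ 32;
   dsid = open('myfiles.flowers');      /* returns 0 if the member cannot be opened */
   if dsid > 0 then do;
      engine = attrc(dsid, 'ENGINE');   /* engine that accesses the data set */
      lib    = attrc(dsid, 'LIB');      /* libref of the data set */
      put engine= lib=;
      rc = close(dsid);                 /* always close what you open */
   end;
   else put 'Could not open myfiles.flowers.';
run;
```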
See Also
• “ATTRC Function” in SAS Functions and CALL Routines: Reference
Example Code
The following COPY procedure example copies the entire myfiles library to the
target library. PROC COPY has many useful options, but none are specified, so the
default behavior such as CLONE is used.
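The code for this example is not reproduced in this copy of the text. A minimal sketch that matches the description might look like the following; the location of the target library is an assumption borrowed from the PROC MIGRATE example later in this section.

```sas
libname myfiles 'c:\example';
libname target 'd:\new';     /* hypothetical target location */

/* Copy every member of myfiles to target, using default options. */
proc copy in=myfiles out=target;
run;
```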
Key Ideas
• SAS file management utilities such as PROC COPY have the following capabilities:
See Also
• “COPY Procedure” in Base SAS Procedures Guide
Example Code
The following MIGRATE procedure example migrates members in a SAS library to
take advantage of features that are provided in a newer SAS release. This example
does not use a SAS/CONNECT or SAS/SHARE server, which is required in some
cases.
Run this code in a session of the SAS release that you are migrating to.
libname myfiles 'c:\example';
libname target 'd:\new';
proc migrate in=myfiles out=target;
run;
The SAS log shows the results of the migration. Notice that PROC MIGRATE calls
PROC CPORT to migrate catalogs.
NOTE: The BUFSIZE= option was not specified with the MIGRATE procedure.
The migrated library members will use the current value for BUFSIZE. For more
information, see the PROC MIGRATE documentation.
NOTE: Migrating MYFILES.CONTAINERS to TARGET.CONTAINERS (memtype=VIEW).
NOTE: Migrating MYFILES.FLOWERS to TARGET.FLOWERS (memtype=DATA).
NOTE: Simple index plantname has been defined.
NOTE: There were 7 observations read from the data set MYFILES.FLOWERS.
NOTE: The data set TARGET.FLOWERS has 7 observations and 3 variables.
NOTE: Migrating MYFILES.RESTOCK to TARGET.RESTOCK (memtype=DATA).
NOTE: There were 7 observations read from the data set MYFILES.RESTOCK.
NOTE: The data set TARGET.RESTOCK has 7 observations and 4 variables.
An error message like the following is also due to CEDA, and it indicates that the
formats catalog was not migrated. To migrate catalogs with PROC MIGRATE to an
incompatible operating environment, you must use a SAS/CONNECT or
SAS/SHARE server to access the IN= library.
Key Ideas
• PROC MIGRATE is usually the best way to migrate members in a SAS library to the
current SAS version. PROC MIGRATE is a one-step copy procedure that retains the
data attributes that most users want in a data migration.
• If either of the following issues is present, then a SAS/CONNECT or SAS/SHARE
server is required, and different syntax is used. See “Migrating from a SAS®9
Release by Using SAS/CONNECT” in Base SAS Procedures Guide.
◦ if you do not have direct access to the source library from the target session via
a Network File System (NFS)
◦ if the source library contains catalogs and if the processing invokes CEDA on
the target session
• Alternatively, if you have direct access to the source library through NFS, then you
can use cross-environment data access (CEDA) instead of migrating. This Read-Only
access is automatic and transparent, but you must be aware of the
restrictions. See Chapter 33, “Cross-Environment Data Access,” on page 737.
• If you are changing to a different character encoding that uses more bytes to
represent the characters, you might want to use the CVP engine as part of the
copy or migration process. See “Example: Avoid Truncation When Migrating a SAS
Library by Using a Two-Step Process” in SAS V9 LIBNAME Engine: Reference.
See Also
• “MIGRATE Procedure” in Base SAS Procedures Guide
Example Code
This example copies a SAS library across environments by using the CPORT and
CIMPORT procedures. A multistep process is necessary:
1 In the source environment, use PROC CPORT to create a transport file.
2 Use communication software (such as FTP) or a storage device to move the
transport file to the target environment. If you use FTP, transfer the file in
binary mode.
3 In the target environment, use PROC CIMPORT to import the library from the
transport file.
For step 1, PROC CPORT creates the transport file mytransfer, which is referenced
by the fileref tranfile.
libname source 'c:\example';
filename tranfile 'c:\myfiles\mytransfer';
proc cport library=source file=tranfile;
run;
In the log, notice that containers is not ported, because it is a SAS view.
For step 2, the user copies the mytransfer file from their Windows environment to
a UNIX environment.
For step 3, PROC CIMPORT creates the target library by importing the contents of
mytransfer.
libname target '/mydata/example';
filename tranfile '/mydata/mytransfer';
proc cimport library=target infile=tranfile;
run;
Key Ideas
• PROC CPORT and PROC CIMPORT have several limitations as compared to PROC
MIGRATE. Use this method for migration only if PROC MIGRATE would require
SAS/CONNECT or SAS/SHARE software, and you do not have access to that
software. PROC MIGRATE migrates an entire library, including data sets, catalogs,
and most other member types. See “Example: Migrate a SAS Library across
Environments by Using SAS/CONNECT” in SAS V9 LIBNAME Engine: Reference.
• PROC CPORT supports SAS data sets and catalogs but not other member types.
• If you use FTP to move the transport file to the target environment, transfer the
file in binary mode.
• When you are transcoding to a new encoding, truncation could occur. If truncation
occurs, you must expand variable lengths. You can either use the CVP engine with
PROC CPORT or use the EXTENDVAR= option with PROC CIMPORT.
• Transport files that are created by the CPORT procedure are not interchangeable
with transport files that are created by the XPORT engine.
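The EXTENDVAR= alternative mentioned above can be sketched as a variation of step 3. The multiplier 2.5 is an illustrative assumption; choose a value that suits the source and target encodings.

```sas
libname target '/mydata/example';
filename tranfile '/mydata/mytransfer';

/* EXTENDVAR= expands character variable lengths during the import. */
/* The factor 2.5 is an assumption chosen for illustration.         */
proc cimport library=target infile=tranfile extendvar=2.5;
run;
```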
See Also
• “CPORT Procedure” in Base SAS Procedures Guide
• “PROC CPORT and PROC CIMPORT” in Moving and Accessing SAS Files
Chapter 13 / SAS Engines
Most engines are referred to as library engines, because they access a group of SAS
files that are used as a SAS library. For more information, see Chapter 12, “SAS
Libraries,” on page 247.
The SAS Multi Engine Architecture enables you to access a variety of file formats:
• Certain engines process SAS data only. See “Examples: Use a SAS Engine to
Process SAS Data” on page 298.
• Other engines interpret data from other applications (for example, DBMS, XML,
JSON, or Microsoft Excel). These engines apply a layer of abstraction so that
SAS can process the external data as if it were a SAS data set or a SAS library of
data sets. See “Examples: Use a SAS Engine to Process External Data” on page
308.
How Engines Work with Files
1 A SAS data set or table is stored in one or more physical files, depending on the
engine and attributes. If you have licensed the appropriate SAS engine, SAS can
read and write data that is created by other applications, such as a DBMS.
Base SAS can read some raw data. For example, the DATA step or the IMPORT
procedure can read comma-separated data from a text file. The DATA step or
procedure provides the data to the V9 engine for output to a SAS data set. See
Chapter 15, “Raw Data,” on page 353.
2 When you specify a SAS data set name, the engine locates the stored file or
files to obtain metadata. V9 engine data sets contain metadata (also known as
descriptor information) within the data set file. Other file types store metadata
in a separate file. Although SAS can determine the metadata for many external
file types, you might be required to provide additional instructions. The
metadata provides information such as variable names and attributes, and
whether the file has special processing characteristics such as indexes or
compressed observations.
Note that more than one engine might be involved in processing. For example, in
a DATA step, one engine could be used to read data, and a different engine used
to write data.
3 The engine uses the metadata to organize the data in the standard logical form
for SAS processing. This standard form is the SAS data set model. A SAS data
set consists of data values that are organized into variables (columns) and
observations (rows).
Similar to the SAS data set model, the SAS library model is a group of data sets
and other library members that are organized in a logical form for processing.
When files are accessed as a SAS library, you can use SAS utilities such as the
DATASETS procedure to list their contents and to manage them.
4 SAS procedures and DATA step statements process the data in this logical form,
the SAS data set model. During processing, the engine passes down whatever
instructions are necessary to open and close physical files and to read and write
data. Processing can occur in the SAS data set model without the data ever
being physically stored as a SAS data set. If the data is stored in an external
application, such as a DBMS, some SAS procedures can pass processing to that
application.
Engine Characteristics
LIBNAME Engine Nickname: V9 or BASE
Uses: Shipped default Base SAS engine; SAS data sets
Examples and Documentation:
• “Example: Assign the V9 Engine in a LIBNAME Statement” on page 298
• “Example: Read a Comma-Delimited File” on page 308
• “Example: Set a Default Engine” on page 307

LIBNAME Engine Nickname: SAS/ACCESS engines (multiple)
Uses: External data from other applications
Examples and Documentation:
• “Example: Read Microsoft Excel Data by Using a SAS/ACCESS Engine and PROC IMPORT” on page 310
• “Example: Create Data in a DBMS by Using a SAS/ACCESS Engine” on page 311
• “Example: Embed a SAS/ACCESS LIBNAME Statement in a PROC SQL View” on page 313
• “Example: Access DBMS Data by Using the SQL Pass-Through Facility” on page 315

LIBNAME Engine Nickname: CAS
Uses: Big data; Multi-threaded distributed processing; Cloud computing
Examples and Documentation:
• “Example: Load a SAS Data Set to a CAS Server” on page 305
• “CAS LIBNAME Engine Overview” in SAS Cloud Analytic Services: User’s Guide

LIBNAME Engine Nickname: SPDE
Uses: Alternative Base SAS engine; SPD Engine data sets; Big data; Multi-CPU threaded processing; Optional Hadoop storage; Cloud storage
Examples and Documentation:
• “Example: Assign the SPD Engine in a LIBNAME Statement” on page 300
• “Example: Read and Write SAS Data in Hadoop by Using the SPD Engine” on page 301
• SAS Scalable Performance Data Engine: Reference
• SAS SPD Engine: Storing Data in the Hadoop Distributed File System

LIBNAME Engine Nickname: CVP
Uses: National language support
Examples and Documentation:
• “Example: Avoid Truncation by Using the CVP Engine with the V9 Engine” on page 302

LIBNAME Engine Nickname: XMLV2
Uses: Import and export XML in Base SAS
Examples and Documentation:
• “Example: Import XML Data by Using the XMLV2 Engine” on page 316

LIBNAME Engine Nickname: JSON
Uses: Import and export JSON in Base SAS
Examples and Documentation:
• “Example: Import JSON Data by Using the JSON Engine” on page 318

LIBNAME Engine Nickname: SASIOLA and SASHDAT
Uses: SAS LASR Analytic Server; Multi-threaded distributed processing; Optional Hadoop storage
Examples and Documentation:
• SAS LASR Analytic Server: Reference Guide

LIBNAME Engine Nickname: FEDSVR
Uses: External data or SAS data by using the SAS Federation Server
Examples and Documentation:
• SAS LIBNAME Engine for SAS Federation Server: User’s Guide
The file format for SAS 9, SAS 8, and SAS 7 data sets is very similar, so SAS does
not differentiate between them. See “Cross-Release and Cross-Environment
Compatibility” in SAS V9 LIBNAME Engine: Reference. However, new file features
can make a data set unusable in an earlier release. See “File Features That Are Not
Supported in a Previous Release” in SAS V9 LIBNAME Engine: Reference.
If you see an unexpected engine name in the log or in output from PROC
CONTENTS or PROC DATASETS, the engine was probably called internally from
the V9 engine. These internal engines are not valid for a user to specify. For
example, when you create a DATA step view, SAS calls the SASDSV engine. When
you create a PROC SQL view, SAS calls SQLVIEW.
When you use SAS/CONNECT or SAS/SHARE software, SAS calls the REMOTE
engine. Usually, you should not specify the REMOTE engine in a LIBNAME
statement, because the client automatically determines which engine to use.
Legacy Engines
V6 Compatibility Engine
The SAS 6 compatibility engine can automatically support some processing of SAS
6 files in SAS 9 without requiring you to convert the file to the SAS 9 format. For
more information, see the Migration Focus Area at support.sas.com/migration.
Tape Engines
The tape engines are sequential engines and are typically used for legacy data
storage.
A tape engine processes SAS files on storage media that do not provide random
access methods (for example, tape or sequential format on disk). The tape engines
require less overhead than the V9 engine because sequential access is simpler than
random access. The following tape engines are available:
• V9TAPE (alias TAPE) processes SAS 9, SAS 8, and SAS 7 files.
Before you store SAS libraries in sequential format, consider the following
restrictions:
• DATA is the only member type that is useful for purposes other than backup and
restore. Member types CATALOG, VIEW, and MDDB are supported for backup
and restore purposes only.
• You cannot use random (direct) access with sequential SAS data sets. Some
examples of direct access are the use of indexes, or the use of the POINT= or
KEY= options in the SET or MODIFY statements.
• The DATASETS procedure is not supported.
◦ You can access a SAS file during one DATA or PROC step. You can then
access another SAS file in the same sequential library or on the same tape
during a later DATA or PROC step.
◦ You can use the OPEN=DEFER option of the SET statement in the DATA
step to delay opening multiple data sets. OPEN=DEFER opens the first data
set during the compilation phase and opens subsequent data sets during the
execution phase. See the SET statement.
• The tape engines are not supported on Windows.
Operating Environment Information: For more details about storing and accessing
SAS files in sequential format, see the following documentation:
• SAS Companion for UNIX Environments
Transport Engine
The XPORT engine creates files in transport format, which uses an environment-
independent standard for character encoding and numeric representation.
Transport files that are created by the XPORT engine can be transferred across
operating environments. The transport file can be read on the target environment
by using the XPORT engine with the DATA step or the COPY procedure (or the
COPY statement of the DATASETS procedure).
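The round trip described above can be sketched as follows. The file paths and the librefs xptout and xptin are hypothetical. The sketch uses sashelp.class because the XPORT engine writes an older transport format that limits data set and variable names to eight characters.

```sas
/* Source environment: write sashelp.class to a transport file. */
libname xptout xport 'c:\myfiles\class.xpt';   /* hypothetical path */

proc copy in=sashelp out=xptout memtype=data;
   select class;
run;

libname xptout clear;

/* Target environment: after moving the file (in binary mode), read it back. */
libname xptin xport '/mydata/class.xpt';       /* hypothetical path */

proc copy in=xptin out=work;
run;
```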
The XPORT engine is not a best practice for moving SAS files across environments.
For other methods, see “Strategies for Moving and Accessing SAS Files” in Moving
and Accessing SAS Files. If you are migrating to a newer SAS release, see “MIGRATE
Procedure” in Base SAS Procedures Guide.
SPSS Engine
The SPSS engine reads data that was created in the external application SPSS. The
engine reads the SPSS portable file format, which has a .por extension. This file
format is analogous to the transport format for SAS data sets. An SPSS portable
file (also called an export file) must be created by using the SPSS EXPORT
command. Under z/OS, the SPSS engine also reads SPSS Release 9 files and SPSS-
X files in either compressed or uncompressed format. The SPSS engine is a
sequential engine.
• For examples, see also Sample 34629: Coming to SAS from SPSS.
Access Patterns
SAS procedures and statements can read observations in SAS data sets in one of
four general patterns:
sequential access
processes observations one after the other, starting at the beginning of the file
and continuing in sequence to the end of the file. For example, the XMLV2 and
JSON engines perform sequential processing only.
random access
processes observations according to the value of some indicator variable
without processing previous observations. For example, the POINT= option in
the SET statement requires random access to observations as well as the ability
to calculate observation numbers from record identifiers within the file.
BY-group access
groups and processes observations in order of the values of the variables that
are specified in a BY statement.
multiple-pass
performs two or more passes on data when required by SAS statements or
procedures.
If a statement or procedure tries to access a data set whose engine does not
support the required access pattern, SAS prints an appropriate error message in the
SAS log.
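To make the random access pattern concrete, the following sketch reads three observations directly by observation number with the POINT= option. The observation numbers are arbitrary, and the sketch assumes that the engine for the input data set (here, the V9 engine behind sashelp.class) supports random access.

```sas
data work.picked;
   do obsnum = 2, 5, 10;                 /* arbitrary observation numbers */
      set sashelp.class point=obsnum;    /* read that observation directly */
      output;
   end;
   stop;    /* POINT= never reaches end of file, so end the step explicitly */
run;
```

The same SET statement against a sequential engine, such as a tape engine, would produce an error in the log, because POINT= requires random access.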
Levels of Locking
When a SAS data set can be opened concurrently by more than one SAS session or
by more than one statement or procedure within a single session, the level of
locking is important. The level of locking determines how many sessions,
procedures, or statements can read and write to the file at the same time. Some
engines do not support locking at all, or do not support some of these levels. For
details, see the engine documentation.
Library-level locking
specifies that concurrent access is controlled at the library level. This locking
restricts concurrent access to only one Update process to the library.
Member-level (data set) locking
enables Read access to many sessions, statements, or procedures. This locking
restricts all other access to the SAS data set when a session, statement, or
procedure acquires Update or Output access.
By default, SAS provides the greatest possible level of concurrent access, while
guaranteeing the integrity of the data. Here are the ways of controlling the locking
level:
• Although controlling the access level yourself is usually not needed, the
CNTLLEV= data set option enables locking at the library, record, or member
level.
• The LOCK statement acquires, lists, or releases an exclusive lock on an existing
SAS file.
• Some SAS products, such as SAS/ACCESS and SAS/SHARE, contain engines
that support enhanced session-management services and file-locking
capabilities.
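The first two controls can be sketched as follows. This is an illustration under assumptions: it reuses the myfiles.myclass data set from the V9 engine example in this chapter, and the update itself is arbitrary.

```sas
libname myfiles v9 'c:\examples';

/* Request member-level control while updating the data set in place. */
data myfiles.myclass;
   modify myfiles.myclass(cntllev=mem);
   if name = 'Alfred' then age = 15;    /* arbitrary update for illustration */
run;

lock myfiles.myclass;          /* acquire an exclusive lock */
lock myfiles.myclass list;     /* report the lock status in the log */
lock myfiles.myclass clear;    /* release the lock */
```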
Example Code
This library assignment specifies the V9 engine.
1 The LIBNAME statement assigns the myfiles libref and the V9 engine to a
physical location.
2 The DATA step creates the myclass data set in the myfiles library by copying
the class data set in the sashelp library.
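The code listing for this example is not reproduced in this copy of the text. A sketch that is consistent with the two callouts might look like the following; the path c:\examples is taken from the PROC CONTENTS example later in this chapter.

```sas
libname myfiles v9 'c:\examples';   /* 1 */

data myfiles.myclass;               /* 2 */
   set sashelp.class;
run;
```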
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set MYFILES.MYCLASS has 19 observations and 5 variables.
Key Ideas
See Also
• “Examples: Access Data by Using a Libref” on page 259
Example Code
The following LIBNAME statement for the SPD Engine is very similar to a LIBNAME
statement for the V9 engine.
1 This portion of the LIBNAME statement assigns the mylib libref and the SPD
Engine to a primary path name. The first (and usually only) metadata file for a
data set is always stored in the library’s primary path.
2 Optionally, you can assign one or more path names in the DATAPATH= option to
store data partitions. Otherwise, the data partition files are stored in the
primary path.
3 Optionally, you can assign one or more path names in the INDEXPATH= option
to store index files. Otherwise, the index files are stored in the primary path.
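The LIBNAME statement that these callouts describe is not reproduced in this copy of the text. A sketch of its general shape follows; every path is hypothetical.

```sas
libname mylib spde 'c:\spde\meta'                   /* 1: primary path      */
   datapath=('d:\spde\data1' 'e:\spde\data2')       /* 2: data partitions   */
   indexpath=('f:\spde\index');                     /* 3: index files       */
```

If you omit DATAPATH= and INDEXPATH=, the statement reduces to the libref, the engine name, and the primary path, which is the form that most closely resembles a V9 engine assignment.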
Example Code 13.2 SAS Log Showing a Successful SPD Engine Library Assignment
Key Ideas
• The SPD Engine is designed for high-speed processing of very large tables. The
engine uses threads to read data very rapidly and in parallel, executing on multiple
CPUs. Contributing to this performance is the partitioned file format, which can
take advantage of distributed environments.
• Although SPD Engine stores a data set in multiple files, you can process an SPD
Engine data set very similarly to a V9 engine data set. Most of the Base SAS
language works very well with an SPD Engine data set. However, the engine
supports some language elements that are specific to its processing and storage
optimizations. For differences from V9 engine capabilities, see the documentation.
See Also
• SAS Scalable Performance Data Engine: Reference
Example Code
The following example assigns a Base SAS library to a Hadoop cluster.
options set=SAS_HADOOP_CONFIG_PATH='\\myconfigpath'; /* 1 */
options set=SAS_HADOOP_JAR_PATH='\\myconfigpath';
1 The SET= system option defines environment variables for Hadoop. If these
environment variables are already set (for example, in the SAS configuration file
or SAS invocation), do not submit these lines of code. If these environment
variables are not correctly set, then the following LIBNAME statement produces
errors in the SAS log.
2 The LIBNAME statement assigns the mydata libref to the SPD Engine and a
directory in the Hadoop cluster. The HDFS=YES argument specifies to connect
to the Hadoop cluster that is defined in the Hadoop cluster configuration files.
The ACCELWHERE=YES option requests that data subsetting be performed by
a MapReduce program in the Hadoop cluster.
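The LIBNAME statement that callout 2 describes is not reproduced in this copy of the text. A sketch that matches the description follows; the HDFS directory /user/abcdef is a hypothetical value.

```sas
/* 2: assign the SPD Engine to a directory in the Hadoop cluster */
libname mydata spde '/user/abcdef'    /* hypothetical HDFS directory */
   hdfs=yes
   accelwhere=yes;
```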
Key Ideas
• The SPD Engine is an alternative Base SAS engine that can read and write SAS
data on a traditional file system or on Hadoop. The engine does not require you to
license additional SAS products such as SAS/ACCESS, but you must be running a
supported Hadoop distribution.
• Customers often choose Hadoop for low-cost storage of very large data. The
distributed storage and processing of the SPD Engine works well with the Hadoop
file system (HDFS). In addition, the engine can optimize most WHERE expressions
by automatically submitting a MapReduce program in the Hadoop cluster.
• The engine can read SPD Engine data sets on Hadoop. After you use the engine to
store a data set on Hadoop, you can use most of the Base SAS language for
processing the data. However, the engine supports some language elements that
are specific to its processing and storage on Hadoop.
See Also
• SAS SPD Engine: Storing Data in the Hadoop Distributed File System
Example Code
To run this example, first create a data set named myclass as in “Example: Assign
the V9 Engine in a LIBNAME Statement” on page 298. Run PROC CONTENTS to
see the length of the variables:
libname myfiles v9 'c:\examples';
proc contents data=myfiles.myclass;
run;
In the PROC CONTENTS output, notice the two character variables. Name has a
length of 8, and Sex has a length of 1.
The example below uses the CVP engine with the V9 engine to expand the size of
character variables. The CVP engine can help you avoid truncation if you copy a
data set to an encoding that uses more bytes to represent the characters.
1 This LIBNAME statement assigns the srclib library to the CVP engine and the
location of the data that you want to copy. The CVPENGINE= option specifies
the V9 engine as the underlying engine to process the data. The CVPMULT=
option specifies a multiplication factor of 2.5 to expand all character variables.
2 This LIBNAME statement assigns the target library to contain the copied data.
3 The COPY procedure copies the srclib library to the target library. During the
copy, the CVP engine expands the character variable lengths 2.5 times larger.
4 The CONTENTS procedure shows that the lengths of the character variables
have been multiplied by 2.5:
For Name, 8 × 2.5 = 20.
For Sex, 1 × 2.5 = 2.5, which is 3 when rounded up to a whole number.
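The code that the four callouts describe is not reproduced in this copy of the text. The following sketch is consistent with them. The location of the target library is hypothetical, and the NOCLONE option is an assumption added so that the copy takes on the encoding of the target session instead of cloning the source attributes.

```sas
libname srclib cvp 'c:\examples' cvpengine=v9 cvpmult=2.5;   /* 1 */
libname target 'c:\target';           /* 2: hypothetical location */

proc copy in=srclib out=target noclone;                      /* 3 */
run;

proc contents data=target.myclass;                           /* 4 */
run;
```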
Key Ideas
• When you copy a data set to an encoding that uses more bytes to represent the
characters, truncation might occur if the column length does not accommodate the
larger character size. For example, a character might be represented in wlatin1
encoding as one byte but in UTF-8 as two bytes.
• If an error in the log states character data was lost during transcoding, it
usually indicates that truncation has occurred. You can troubleshoot the error by
using the CVP engine to expand the length of character variables.
See Also
• SAS National Language Support (NLS): Reference Guide
Example Code
The following example uses the DATA step to load a SAS data set into memory as a
SAS Cloud Analytic Services (CAS) table.
1 The CAS statement starts a CAS session and specifies casauto as the CAS
session name. Use your connection information in the PORT= and HOST=
options.
2 The LIBNAME statement assigns the mycas libref to the CAS engine. The
SESSREF= LIBNAME option is not specified, so the engine uses the casauto
session.
3 The DATA step copies the SAS data set sashelp.cars to the CAS session. The
PROMOTE=YES data set option promotes the table with global scope.
4 PROC CONTENTS shows the mycas.cars table is available on the CAS server
for the duration of the session. After data is loaded into memory, subsequent
steps can process the data in memory. Loading and processing are done in
separate steps.
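The code that the four callouts describe is not reproduced in this copy of the text. A sketch that follows them is shown below; the host name and port number are placeholders for your own connection information.

```sas
cas casauto host="cloud.example.com" port=5570;   /* 1: placeholder host and port */

libname mycas cas;                                /* 2 */

data mycas.cars(promote=yes);                     /* 3 */
   set sashelp.cars;
run;

proc contents data=mycas.cars;                    /* 4 */
run;
```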
Key Ideas
• You can submit a LIBNAME statement that uses the CAS engine to connect your
SAS session to a CAS session. You must have access to a CAS server and an
existing CAS session.
• The CAS LIBNAME engine with the DATA step is one way of loading SAS data to
the CAS server as an in-memory table. Other methods might be more efficient for
large tables.
• After you load data to the CAS server, you can execute SAS procedures or the
DATA step from your SAS session by referencing the SAS libref and table name.
You do not process the table in memory in the same DATA step that you use to
load the table into memory. Loading and processing are done in separate steps.
• Tables are not automatically saved when they are loaded to a caslib. You can use
the CASUTIL procedure to save tables. Native CAS tables have the file
extension .sashdat.
See Also
• “LIBNAME Statement: CAS Engine” in SAS Cloud Analytic Services: User’s Guide
• “Example: Load SAS Data to CAS by Using the CASUTIL Procedure” in SAS V9
LIBNAME Engine: Reference
• An Introduction to SAS Viya Programming
Example Code
The following invocation command is an example for the Windows file system. The
path to the executable might be different in your deployment.
In this example, the ENGINE system option assigns the SPD Engine as the default
for new data sets.
"c:\program files\SASHome\SASFoundation\9.4\sas.exe" -engine spde
Key Ideas
• The ENGINE system option cannot be set interactively. You must specify ENGINE
in a configuration file, at invocation, or in an environment variable.
• The shipped default Base SAS engine is BASE. In SAS®9 and SAS® Viya®, the BASE
engine is an alias for the V9 engine. If you do not specify an engine name when you
create a new library, and if you have not specified the ENGINE system option, then
the V9 engine is automatically selected. If the library location already contains
SAS files, then SAS might be able to assign the correct engine based on those files.
For example, if the location contains V9 data sets only, then SAS assigns the V9
engine. However, if a library location contains a mix of different engine files, then
SAS might not assign the engine you want. Therefore, specifying the engine is a
best practice.
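Because the option cannot be changed during a session, it can be useful to check its current value. A minimal sketch:

```sas
/* Write the current setting of the ENGINE system option to the log. */
proc options option=engine;
run;
```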
See Also
• “ENGINE= System Option” in SAS System Options: Reference
Example Code
In this example, the IMPORT procedure imports a file that contains comma-
separated values.
1 The FILENAME statement assigns the chol fileref. The TEMP option specifies
that the file is temporary, so a path name is not necessary.
2 The HTTP procedure specifies the URL of the cholesterol.csv input file. The
data is written to the chol fileref.
3 PROC IMPORT reads the comma-delimited data and creates the mycholesterol
data set.
4 The PRINT procedure prints the data set. The output shows that the file was
imported correctly, including the column names.
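The code that the four callouts describe is not reproduced in this copy of the text. A sketch that follows them is shown below. The URL is the download location that this chapter gives for cholesterol.csv; the data set name work.mycholesterol and the REPLACE option are assumptions.

```sas
filename chol temp;                                    /* 1 */

proc http                                              /* 2 */
   url="http://support.sas.com/documentation/onlinedoc/viya/exampledatasets/cholesterol.csv"
   out=chol;
run;

proc import datafile=chol                              /* 3 */
   out=work.mycholesterol
   dbms=csv
   replace;
run;

proc print data=work.mycholesterol;                    /* 4 */
run;
```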
Key Ideas
• Base SAS software can import and export some external text files without using a
SAS/ACCESS engine. These files are usually referred to as raw data or delimited
data. The DATA step or PROC IMPORT can read data from a text file and provide
that data to the V9 engine for output to a SAS data set.
• If you want Base SAS to read or write a Microsoft Excel file, the file must have
a .csv extension. To import a file that has an Excel file extension of .xls or .xlsx,
you must have a license to SAS/ACCESS Interface to PC Files. In Excel, you can
save an Excel file as a .csv file.
• If you create a SAS data set from an Excel file, the data is static and does not
change to reflect the underlying Excel data. If you want to keep data in Excel as an
ongoing data store that you can query from SAS, then use the XLSX or EXCEL
engine. You must have a license to SAS/ACCESS Interface to PC Files.
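Using Excel as an ongoing data store, as the last key idea describes, can be sketched with a LIBNAME statement for the XLSX engine. The workbook path and the sheet name cholesterol are assumptions; in a workbook, each sheet is referenced as a member of the library.

```sas
/* Requires a license to SAS/ACCESS Interface to PC Files. */
libname myxl xlsx 'C:\examples\cholesterol.xlsx';   /* hypothetical workbook */

proc print data=myxl.cholesterol;   /* sheet name is an assumption */
run;

libname myxl clear;                 /* release the workbook */
```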
See Also
• “Default Base SAS Engine (V9 Engine)” on page 294
Example Code
This example uses the XLSX engine and PROC IMPORT to import Excel data. To
create the data for this example, start Microsoft Excel and open the cholesterol.csv
file that you used in “Example: Read a Comma-Delimited File” on page 308. (The
cholesterol.csv file can be downloaded from
http://support.sas.com/documentation/onlinedoc/viya/exampledatasets/cholesterol.csv.) In Excel, save
the file with an .xlsx extension. The following code imports cholesterol.xlsx as a
SAS data set, mycholesterol.
options validvarname=v7; /* 1 */
proc import datafile='C:\examples\cholesterol.xlsx' /* 2 */
dbms=xlsx /* 3 */
out=work.mycholesterol /* 4 */
replace; /* 5 */
run;
Key Ideas
• The XLSX engine enables you to read data directly from Excel .xlsx files. You must
have a license to SAS/ACCESS Interface to PC Files. The XLSX engine supports
connections to Microsoft Excel 2007, 2010, and later files.
• SAS/ACCESS Interface to PC Files also provides the EXCEL engine for earlier
releases of Excel.
• Excel has different naming conventions than SAS. A best practice for reading Excel
data is to set VALIDVARNAME=V7. This option converts spaces to underscores,
and truncates names greater than 32 characters.
See Also
• SAS/ACCESS Interface to PC Files: Reference
Example Code
In this example, a SAS DATA step creates a Teradata table. To run this example, you
must license a SAS/ACCESS interface.
1 The LIBNAME statement specifies the mytddata libref and TERADATA, which is
the engine nickname for SAS/ACCESS Interface to Teradata. The statement also
specifies connection options for Teradata. Change these options to specify your
SAS/ACCESS connection values and any other options you need.
2 The DATA step creates a table named grades. The table is in the Teradata
DBMS and is not a SAS data set.
3 The PROC DATASETS output for the mytddata library shows that the engine is
Teradata. For the grades table, the SAS member type is DATA and the DBMS
member type is TABLE.
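The code that the three callouts describe is not reproduced in this copy of the text. The following sketch is consistent with them. The connection values match those used in the SQL pass-through example later in this chapter; the column names other than final, and all of the data values, are hypothetical.

```sas
libname mytddata teradata server=mytera user=myid password=mypw;   /* 1 */

data mytddata.grades;                                               /* 2 */
   input student $ test1 test2 final;   /* hypothetical columns and values */
   datalines;
Fred 66 80 70
Wilma 97 91 98
;
run;

proc datasets library=mytddata;                                     /* 3 */
run;
quit;
```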
Output 13.5 PROC DATASETS Output Showing the DBMS Directory Information
Output 13.6 PROC DATASETS Output Showing the Library Contents for DBMS
Content
Key Ideas
• If you license a SAS/ACCESS interface for your DBMS data, you can submit a
LIBNAME statement in SAS to run SAS code against the DBMS data. SAS can
create or process a DBMS table as if it were a SAS data set.
• Most SAS/ACCESS engines fully support the Base SAS language. A few Base SAS
features, such as catalogs, are specific to the V9 engine and are not available in
SAS/ACCESS engines. The SAS/ACCESS engines support some language
elements that are specific to the data source. For details, see the documentation.
• Some SAS/ACCESS interfaces are case sensitive. You might need to change the
case of table or column names to comply with the requirements of your DBMS.
See Also
• SAS/ACCESS documentation
Example Code
The following example embeds a SAS/ACCESS LIBNAME statement in a PROC
SQL view.
1 The V9 engine LIBNAME statement assigns the viewlib libref to the location
where the SAS view will be stored.
2 The CREATE VIEW statement in the SQL procedure creates the mygrades view
in the viewlib library.
3 The mytddata.grades table is referenced in the view.
4 The USING argument embeds the LIBNAME statement for the SAS/ACCESS
Interface to Teradata engine.
5 The PRINT procedure executes the viewlib.mygrades view, which references
the mytddata.grades table by using the embedded LIBNAME statement.
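A sketch of code that matches the numbered steps above might look like this. The directory in the first LIBNAME statement is hypothetical, and the Teradata connection values are the placeholders used elsewhere in this chapter.

```sas
libname viewlib v9 'c:\examples\views';        /* 1: hypothetical directory */

proc sql;
   create view viewlib.mygrades as             /* 2 */
      select *
         from mytddata.grades                  /* 3 */
         using libname mytddata teradata       /* 4 */
            server=mytera user=myid password=mypw;
quit;

proc print data=viewlib.mygrades noobs;        /* 5 */
run;
```

Because the LIBNAME statement is stored in the view, the mytddata libref is assigned only while the view executes.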
Key Ideas
See Also
n “LIBNAME Statement: External Databases” in SAS/ACCESS for Relational
Databases: Reference
n “CREATE VIEW” in SAS SQL Procedure User’s Guide
n SAS/ACCESS documentation
Example Code
This PROC SQL example uses the SQL pass-through facility to send a query to a
Teradata table.
proc sql;
   connect to teradata as myconn (server=mytera
      user=myid password=mypw);          /* 1 */
   select *
      from connection to myconn          /* 2 */
         (select *
             from grades
             where final gt 90);
   disconnect from myconn;               /* 3 */
quit;
Key Ideas
n The SQL pass-through facility is an extension of PROC SQL that enables you to
use DBMS-specific SQL syntax instead of SAS SQL syntax.
n You can use SQL pass-through facility statements in a PROC SQL query or store
them in a PROC SQL view.
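n The pass-through facility consists of three statements and one component:
o The CONNECT statement establishes a connection with the DBMS.
o The EXECUTE statement sends dynamic, non-query, DBMS-specific SQL statements to the DBMS.
o The CONNECTION TO component in the FROM clause of a PROC SQL SELECT statement retrieves data directly from the DBMS.
o The DISCONNECT statement terminates the connection with the DBMS.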
See Also
n “SQL Pass-Through Facility” in SAS/ACCESS for Relational Databases: Reference
n SAS/ACCESS documentation
Prerequisites
To run the XMLV2 example code, first create a file named nhl.xml that contains
the following XML data. Store this file in a location that is accessible by your SAS
session.
<NHL>
  <CONFERENCE> Western
    <DIVISION> Pacific
      <TEAM name="Stars" abbrev="DAL" />
      <TEAM name="Kings" abbrev="LA" />
      <TEAM name="Ducks" abbrev="ANA" />
      <TEAM name="Coyotes" abbrev="PHX" />
      <TEAM name="Sharks" abbrev="SJ" />
    </DIVISION>
  </CONFERENCE>
</NHL>
Example Code
The following example uses the XMLV2 engine to import XML data. You can
process XML data as SAS data sets in memory, without creating permanent SAS
data sets.
1 The first FILENAME statement assigns the file reference nhl to the physical
location of the XML document nhl.xml that will be imported.
2 The second FILENAME statement assigns the file reference map to a physical
location to store the XMLMap nhlgenerate.map that will be generated by SAS.
3 The LIBNAME statement assigns the nhl library to the XMLV2 engine. The
AUTOMAP=REPLACE option specifies to automatically generate an XMLMap
and to replace the XMLMap if it already exists. The XMLMAP= option specifies
the fileref of the XMLMap.
4 PROC PRINT produces output, verifying that the import was successful.
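The numbered steps above could be coded as in the following sketch. Both file paths are hypothetical. The data set name team in the PROC PRINT step is an assumption based on the TEAM element in nhl.xml; run PROC DATASETS on the nhl library to confirm the names that the generated XMLMap defines.

```sas
filename nhl 'c:\examples\nhl.xml';              /* 1: hypothetical path */
filename map 'c:\examples\nhlgenerate.map';      /* 2: hypothetical path */

libname nhl xmlv2 automap=replace xmlmap=map;    /* 3: the libref matches the nhl fileref */

proc print data=nhl.team;                        /* 4: assumed data set name */
run;
```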
Key Ideas
n The Base SAS XMLV2 engine provides sequential access to read (import) XML
data. The engine can also write (export) SAS data sets as XML data.
n The engine creates temporary data sets in memory. To create permanent data sets,
reference the temporary data sets in the DATA step or in PROC COPY.
n One XML file can contain multiple related data sets. See the documentation for an
example that imports an XML document as multiple SAS data sets.
n The engine can automatically create an XMLMap of the data in an XML file. To
save the map, specify the XMLMAP= option and AUTOMAP=REPLACE in the LIBNAME
statement. Then you can open the map in a text editor to customize it. After you
edit a map, specify the XMLMAP= option and AUTOMAP=REUSE to avoid overwriting
your customizations.
n You can also generate and customize an XMLMap by using SAS XML Mapper, a
graphical interface.
See Also
n SAS XMLV2 and XML LIBNAME Engines: User’s Guide
Example Code
The following example uses the SAS JSON engine to read JSON data. The example
creates temporary, in-memory SAS data sets for analysis.
To run this example, first copy the JSON code for example.json from “LIBNAME
Statement: JSON Engine” in SAS Global Statements: Reference. Create the file in a
location that is accessible by your SAS session.
libname mydata json 'c:\examples\example.json';
proc datasets lib=mydata;
run;
quit;
Here is the PROC DATASETS output, showing the SAS data sets imported from
example.json.
Output 13.10 PROC DATASETS Output of SAS Data Sets Imported from JSON
Key Ideas
n The Base SAS JSON engine provides read-only, sequential access to JSON data.
n The engine creates temporary data sets in memory. To create permanent data sets,
reference the temporary data sets in the DATA step or in PROC COPY.
n One JSON file can contain multiple related data sets. See the documentation for
an example that merges the imported data sets into a single data set.
n The engine automatically creates a map of the data in a JSON file. To save the
map, specify the MAP= option and AUTOMAP=CREATE in the LIBNAME
statement. Then you can open the map in a text editor to customize it. After you
edit a map, specify the MAP= option and AUTOMAP=REUSE to avoid overwriting
your customizations.
n To export a JSON file from a SAS data set, use the JSON procedure.
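The ideas above can be sketched as follows: save the generated map, copy the in-memory data sets to a permanent library, and export a SAS data set as JSON. All file paths and the mysas directory are hypothetical, and the directory must already exist.

```sas
/* Save the automatically generated map so that it can be edited later */
libname mydata json 'c:\examples\example.json'
        map='c:\examples\example.map' automap=create;

/* Copy the temporary data sets to a permanent library */
libname mysas 'c:\examples\permanent';   /* hypothetical directory */
proc copy in=mydata out=mysas;
run;

/* Export a SAS data set as a JSON file */
proc json out='c:\examples\class.json' pretty;
   export sashelp.class;
run;
```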
See Also
n “LIBNAME Statement: JSON Engine” in SAS Global Statements: Reference
14
SAS Data Sets
For more information about SAS data views, see “Definitions for SAS Views” on
page 393.
n For the SPD engine, see “Differences between the Default Base SAS Engine
Data Sets and the SPD Engine Data Sets” in SAS Scalable Performance Data
Engine: Reference and “Organizing SAS Data Using the SPD Engine” in SAS
Scalable Performance Data Engine: Reference.
n “Data” in SAS Cloud Analytic Services: Fundamentals.
To view the descriptor information (metadata) about a SAS data set, you can use
the CONTENTS procedure or the CONTENTS statement in the DATASETS
procedure:
proc contents data=sashelp.air;
run;
If a data set is sorted, then additional sort information is added to the data set’s
metadata. For more information about sorting SAS data sets, see BY-Group
Processing on page 403.
Managing SAS Data Sets 323
Read and Create a SAS data set on page 329 V9 Engine LIBNAME statement
Many of the methods that you use to manage data sets are identical to the ones
that you use to manage SAS libraries on page 282.
Table 14.2 Ways to Control the Reading and Writing of Variables and Rows
Category | Task | More Information
Table 14.3 Ways to Control the Reading and Writing of Variables and Rows
Category | Task | More Information
Control rows | Conditionally select rows by using the WHERE statement; conditionally select rows by using the IF statement | WHERE statement and subsetting IF statement
Sequential access
reads rows sequentially, in the order in which they appear in the physical file.
The SET, MERGE, UPDATE, and MODIFY statements read rows sequentially by
default. The SAS functions OPEN, FETCH, and FETCHOBS also read rows
sequentially.
Direct access
by row number
In the DATA step, to access rows directly by their row number, use the
POINT= option in the SET or MODIFY statements. The POINT= option
names a temporary variable whose current value determines which row that
a SET or MODIFY statement reads.
You can subset rows from one data set and combine them with rows from
another data set by using direct access methods, as follows:
data south;
set revenue;
if region=4;
set expense point=_n_;
run;
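As a self-contained sketch of direct access by row number, the following DATA step reads every other row of Sashelp.Class. The variable names rownum and total and the output data set name are hypothetical. The NOBS= option supplies the row count when the step is compiled, so it can serve as the loop bound, and the STOP statement is needed because POINT= processing never reaches an end of file.

```sas
data work.every_other;
   do rownum = 1 to total by 2;                   /* rows 1, 3, 5, ... */
      set sashelp.class point=rownum nobs=total;  /* read the row whose number is in rownum */
      output;
   end;
   stop;                                          /* prevent continuous looping */
run;
```

Because rownum and total are named in the POINT= and NOBS= options, they are temporary and are not written to the output data set.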
by index
To directly access rows that are based on the values of one or more specified
variables, you must first create an index for the variables. An index is a
separate structure that contains the data values of the key variable or
variables, paired with a location identifier for the rows that contain the
value.
Once the index is created, you can then use the DATA step with the KEY=
option in the SET or MODIFY statement to directly access rows based on the
values of the indexed variable.
For example, suppose that you need to match information in one data set
with a specific value in a second data set. If the second data set is properly
indexed, you can use the KEY= option in the SET statement to perform a
“table lookup” on the indexed table to combine only those rows with the first
data set.
data combine;
set invtory(keep=partno instock price);
set partcode(keep=partno desc) key=partno;
run;
Indexes can be created on SAS data sets that are created using the Base SAS
V9 Engine. For more information about creating indexes on SAS data sets,
see “Indexes” in SAS V9 LIBNAME Engine: Reference.
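The following sketch shows one way to build a small indexed lookup table and then use the KEY= option against it. The data values are hypothetical. The automatic variable _IORC_ is checked so that a part number with no match does not carry over the description from the previous successful lookup.

```sas
/* Lookup table with a simple index on partno (hypothetical data) */
data work.partcode(index=(partno));
   input partno $ desc $;
   datalines;
P100 Bolt
P200 Nut
P300 Washer
;
run;

data work.invtory;
   input partno $ instock price;
   datalines;
P200 40 0.15
P300 75 0.05
P999 10 1.25
;
run;

data work.combine;
   set work.invtory;
   set work.partcode key=partno / unique;   /* direct access through the index */
   if _iorc_ ne 0 then do;                  /* no matching part number was found */
      _error_ = 0;                          /* suppress the error condition */
      desc = ' ';                           /* clear the value retained from the last match */
   end;
run;
```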
n “Example: Create a SAS Data Set Name Containing a Special Character” on page
64 and “Example: Create a Two-Level Data Set Name” on page 70
A data set name list can be either a numbered range list or a name prefix list.
Numbered range list
data combine;
set sales0 sales1 sales2 sales3 sales4 sales5;
run;
data combine;
set sales0-sales5;
run;
n data set names in a numbered range list have the same name except for the
last character or characters, which are consecutive numbers.
328 Chapter 14 / SAS Data Sets
n data set names in a numbered range list can begin with any number and end
with any number as long as the numbers are consecutive. For example, the
following SET statements refer to the same data sets:
proc datasets;
copy in=work out=mysas;
select d1-d3;
run; quit;
Note: If the numeric suffix contains leading zeros, the number of digits in the
suffix of the last data set name must be greater than or equal to the number of
digits in the first data set name. For example, the data set list
sales001-sales99 causes an error. The data set list sales001-sales999 is valid.
data combine;
set sal:;
run;
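Both list types can be tried with the following sketch, which first creates three small data sets in the Work library. The data set names and values are hypothetical. Note that the name prefix list sal: matches every data set in the library whose name begins with sal, so the output data sets are deliberately named differently.

```sas
data sales1; amount=100; run;
data sales2; amount=200; run;
data sales3; amount=300; run;

data by_range;
   set sales1-sales3;   /* numbered range list */
run;

data by_prefix;
   set sal:;            /* name prefix list */
run;
```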
Data set name lists are typically used in these SAS language elements:
n REPAIR statement
n SET statement
See “SAS Variable Lists” on page 99 for more information about creating character
and numeric variable lists in SAS code.
Examples: Create and Read SAS Data Sets 329
See Also
“Definitions for SAS Data Sets” on page 322
Example Code
This example reads a SAS data set from the Sashelp library and writes the output
to the SAS Work library.
The SET statement reads the Sashelp.shoes data set into the DATA step where it
is processed by the WHERE statement. The WHERE statement selects only those
observations that contain a value greater than 500,000 for the variable sales. The
DATA step then writes the output to the data set that is specified in the DATA
statement (work.shoes).
data work.shoes;
set sashelp.shoes;
where sales>500000;
run;
proc print data=shoes; run;
Output 14.1 PROC PRINT Output for the Work.shoes Data Set Read from the
Sashelp Library
Key Ideas
n The SET statement reads SAS data sets into the DATA step for processing. You
can also use the MERGE statement, the MODIFY statement, and the UPDATE
statement to read SAS data sets into a DATA step.
n The DATA statement writes out SAS data sets that have been processed by the
DATA step.
n If you do not specify a location for the output data set, the DATA statement
automatically writes the output to the SAS Work library. Data sets written to the
Work library are saved only for the duration of the current SAS session. To specify
a permanent output location, you must create a SAS library (libref) by using the
LIBNAME statement. See Chapter 12, “SAS Libraries,” on page 247 for more
information about libraries in SAS.
n By default, the DATA step reads observations from a SAS data set using sequential
access. For more information about sequential and direct data access, see “Ways
to Read Rows in SAS Data Sets” on page 326.
See Also
n SET statement, DATA statement, and “WHERE Statement” in SAS DATA Step
Statements: Reference
n “Sashelp Library” on page 258 in SAS Programmer’s Guide: Essentials
Example Code
This example reads three data sets from the Sashelp library and concatenates
them into a single output data set named concat. Because no SAS library or
output location is specified, the output data set, concat, is saved temporarily
in the SAS Work library.
data concat;
set sashelp.nvst1 sashelp.nvst2 sashelp.nvst3;
run;
proc print data=concat; run;
The output data set consists of observations from all three data sets. The order in
which the data sets are concatenated in the output data set is based on how the
data sets are listed in the SET statement. Observations from sashelp.nvst1 are
first, followed by observations from sashelp.nvst2, followed by observations from
sashelp.nvst3.
Here is the PROC PRINT output for the data set concat, annotated to show how
the DATA step concatenates multiple input data sets:
Output 14.3 Log Output for Reading Multiple SAS Data Sets
NOTE: There were 6 observations read from the data set SASHELP.NVST1.
NOTE: There were 6 observations read from the data set SASHELP.NVST2.
NOTE: There were 6 observations read from the data set SASHELP.NVST3.
NOTE: The data set WORK.CONCAT has 18 observations and 2 variables.
Key Ideas
n You can read from multiple SAS data sets and combine and modify data in
different ways. See “Summary of Ways to Combine SAS Data Sets” on page 478 for
more information.
n The SET statement reads SAS data sets into the DATA step for processing. You
can also use the MERGE statement, the MODIFY statement, and the UPDATE
statement to read SAS data sets into a DATA step.
n The DATA statement writes out SAS data sets that have been processed by the
DATA step.
n If you do not specify a location for the output data set, the DATA statement
automatically writes the output to the SAS Work library. Data sets written to the
Work library are saved only for the duration of the current SAS session. To specify
a permanent output location, you must create a SAS library (libref) by using the
LIBNAME statement. See Chapter 12, “SAS Libraries,” on page 247 for more
information about libraries in SAS.
n By default, the DATA step reads observations from a SAS data set using sequential
access. For more information about sequential and direct data access, see “Ways
to Read Rows in SAS Data Sets” on page 326.
See Also
n “SET Statement” in SAS DATA Step Statements: Reference
Example Code
This example reads the permanent data set, quakes, from the user-defined library,
mysas, and then writes it out as quakes_mag in the same library.
To set up this example, first use the LIBNAME statement to create a user-defined
library (libref), mysas. Then read the data set sashelp.quakes and specify
mysas.quakes as the output data set.
libname mysas "c:\Users\demo"; /* 1 */
data mysas.quakes; /* 2 */
   set sashelp.quakes;
run;
proc sort data=mysas.quakes; /* 3 */
   by Magnitude;
run;
data mysas.quakes_mag; /* 4 */
   set mysas.quakes; /* 5 */
   by Magnitude; /* 6 */
run;
proc print data=mysas.quakes_mag(obs=10); /* 7 */
   title "Earthquakes by Magnitude";
run;
1 The LIBNAME statement creates the libref, mysas. The path to the library is
specified in single or double quotation marks.
2 To create the permanent data set for the example, the SET statement reads the
quakes data set from the Sashelp library. The DATA statement writes the data
set quakes to the mysas library, where it is saved to disk.
3 The SORT procedure sorts the mysas.quakes data set by the values of the
variable Magnitude.
4 The DATA statement writes the quakes data set to the data set quakes_mag in
the same library.
5 The SET statement reads the data set quakes from the mysas library.
6 The BY statement groups and orders the observations by the values of the
variable Magnitude.
7 The PRINT procedure prints the results for the quakes_mag data set. The OBS=
data set option in the PROC PRINT statement causes the PRINT procedure to display
only the first 10 observations of the output data set.
Note: The output is only a portion of the output data set. The OBS= data set
option in the PROC PRINT statement limits the number of observations that are
displayed.
Key Ideas
n If you do not specify a location for the output data set, the DATA statement
automatically writes the output to the SAS Work library. Data sets written to the
Work library are saved only for the duration of the current SAS session. To specify
a permanent output location, you must create a SAS library (libref) by using the
LIBNAME statement. See Chapter 12, “SAS Libraries,” on page 247 for more
information about libraries in SAS.
n The SET statement reads SAS data sets into the DATA step for processing. You
can also use the MERGE statement, the MODIFY statement, and the UPDATE
statement to read SAS data sets into a DATA step.
n The DATA statement writes out SAS data sets that have been processed by the
DATA step.
n By default, the DATA step reads observations from a SAS data set using sequential
access. For more information about sequential and direct data access, see “Ways
to Read Rows in SAS Data Sets” on page 326.
See Also
n Chapter 12, “SAS Libraries,” on page 247
Example Code
This example reads a permanent SAS data set that has been saved to the Windows
directory, demo, and writes it out to a new data set in a different directory (the
demo2 directory). Instead of using a libref, the pathname to the SAS data set is
specified in quotation marks in the DATA statement.
The SET statement reads the data set, shoesales, in the folder demo. The DATA
statement uses the same syntax to write the output to shoesales2, in the folder
demo2.
data "c:\Users\demo2\shoesales2.sas7bdat";
   set "c:\Users\demo\shoesales.sas7bdat";
run;
data "c:\Users\Jdoe\courses2.sas7bdat";
set "c:\Users\Jdoe\sasuser\courses.sas7bdat";
run;
proc print data="c:\Users\Jdoe\courses2.sas7bdat"; run;
NOTE: There were 395 observations read from the data set c:\Users\demo
\shoesales.sas7bdat.
NOTE: The data set c:\Users\demo2\shoesales2.sas7bdat has 395 observations and 8
variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Key Ideas
n If you do not specify a location for the output data set, the DATA statement
automatically writes the output to the SAS Work library. Data sets written to the
Work library are saved only for the duration of the current SAS session. To specify
a permanent output location, you must create a SAS library (libref) by using the
LIBNAME statement. See Chapter 12, “SAS Libraries,” on page 247 for more
information about libraries in SAS.
n The SET statement reads SAS data sets into the DATA step for processing. You
can also use the MERGE statement, the MODIFY statement, and the UPDATE
statement to read SAS data sets into a DATA step.
n The DATA statement writes out SAS data sets that have been processed by the
DATA step.
n By default, the DATA step reads observations from a SAS data set using sequential
access. For more information about sequential and direct data access, see “Ways
to Read Rows in SAS Data Sets” on page 326.
See Also
n Chapter 12, “SAS Libraries,” on page 247
Example Code
This example uses the KEEP statement to control which variables are written to the
output data set.
The LIBNAME statement specifies the location for writing the output data set, and
it associates that location with the name mysas. The SET statement reads the input
data set, Sashelp.Cars and the DATA statement writes the results to the output
data set mysas.cars.
This example uses the KEEP statement to include only the variables Make, Mpg, and
MSRP in the output data set. SAS reads all the variables from the Sashelp.Cars data
set into memory and then removes the unwanted variables when it creates the
output data set. The variables MPG_City and MPG_Highway are read into memory by
the DATA step and are used to calculate the weighted average MPG, but they are
not included in the output.
libname mysas "c:\Users\demo";
data mysas.cars;
   set sashelp.cars;
   keep make mpg MSRP;
   Mpg=(MPG_City*.45)+(MPG_Highway*.55);
run;
proc print data=mysas.cars(obs=10); run;
Output 14.6 PROC PRINT Output for the Mysas.Cars Data Set Showing Only the
Variables Specified in the KEEP Statement
Note: The output is only a portion of the output data set. The OBS= data set
option in the PROC PRINT statement limits the number of observations that are
displayed.
An alternative way to control the selection of variables is to use the KEEP= data
set option. When KEEP= is specified for the input data set in the SET statement,
it controls which variables are read into the DATA step and are therefore
available to be written to the output data set. When KEEP= is specified for the
output data set in the DATA statement, it controls only which variables are
written to the output data set.
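The two placements of the KEEP= data set option can be compared with the following sketch. The output data set names are hypothetical. In the second step, MPG_City and MPG_Highway must still be read so that Mpg can be calculated, so KEEP= is placed on the output data set instead of the input data set.

```sas
/* KEEP= on the input data set: only these four variables are read */
data work.cars_in;
   set sashelp.cars(keep=Make MSRP MPG_City MPG_Highway);
run;

/* KEEP= on the output data set: all variables are read, three are written */
data work.cars_out(keep=Make MSRP Mpg);
   set sashelp.cars;
   Mpg = (MPG_City*.45) + (MPG_Highway*.55);
run;
```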
Key Ideas
n If you do not instruct it to do otherwise, SAS writes all variables and all
observations from input data sets to output data sets.
n You can control which variables and observations are read and written by using
SAS statements, data set options, and functions. See “Ways to Manage
Variables in SAS Data Sets” on page 323 for a list of these language elements.
n The DROP statement and DROP= data set option work the same way that the
KEEP statement and KEEP= data set option work except that the selected
variables are dropped rather than kept.
See Also
n Comparing the DROP= data set option and the DROP statement
n Comparing the KEEP= data set option and the KEEP statement
n “Processing Variables without Writing Them to a Data Set” in SAS Data Set
Options: Reference
Examples: Control Variables and Observations in Data Sets 339
Example Code
This example uses the WHERE statement to select observations that are based on
the values of the variables age and height.
data class;
set sashelp.class;
where age>12 and height>=67;
run;
proc print data=class; run;
When the DATA step runs this program, it does not read every row in the input data
set, as the following log shows:
NOTE: There were 3 observations read from the data set SASHELP.CLASS.
WHERE (age>12) and (height>=67);
Using the WHERE statement can improve the efficiency of your SAS program when
the DATA step has to read fewer observations.
You can also use the WHERE= data set option to conditionally select observations
in either the input data set or the output data set.
data class;
set sashelp.class(where=(age>12 and height >=67));
run;
Like the WHERE statement, the WHERE= data set option, when specified in the
input data set, does not require the DATA step to read all the observations. Here
is the log output when the WHERE= data set option is specified for the input data
set in the SET statement:
NOTE: There were 3 observations read from the data set SASHELP.CLASS.
WHERE (age>12) and (height>=67);
When the WHERE= data set option is specified in the DATA statement, in the
output data set, the DATA step reads all the rows of the input data set and then
writes only those observations that meet the criteria to the output data set:
data class(where=(age>12 and height >=67));
set sashelp.class;
run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 3 observations and 5 variables.
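For comparison, the same selection can be made with a subsetting IF statement, as in this sketch (the output data set name is hypothetical). Like the WHERE= data set option in the DATA statement, the subsetting IF statement is evaluated after each observation has been read into the DATA step, so all the rows of the input data set are read.

```sas
data work.class_if;
   set sashelp.class;
   if age>12 and height>=67;   /* keep the observation only when the condition is true */
run;
```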
Key Ideas
n If you do not instruct it to do otherwise, SAS writes all variables and all
observations from input data sets to output data sets.
n You can control which variables and observations are read and written by using
SAS statements, data set options, and functions. See “Controlling the
Reading and Writing of Variables and Observations” in SAS Language Reference:
Concepts for a list of these language elements.
n The WHERE statement controls which observations are read by the SET
statement based on the value of a variable.
n When the WHERE statement is used in a DATA step, the DATA step does not read
every observation in the input data set. Therefore, using the WHERE statement
can improve the efficiency of your SAS programs.
n The WHERE= data set option in the DATA statement controls which observations
are written to the output data set by the DATA statement. When the WHERE=
data set option is specified in the DATA statement, the DATA step reads all the
observations in the input data set.
See Also
n “WHERE Statement” in SAS DATA Step Statements: Reference and “Specify the
WHERE Statement in a SAS DATA Step” in SAS DATA Step Statements:
Reference
n “WHERE= Data Set Option” in SAS Data Set Options: Reference
Example Code
In this example, the DATA step directly accesses a specific observation in the input
data set and writes the output to a new data set starting with that specified
observation. The FIRSTOBS= data set option specifies that the observations are
written to the output data set, quakes2, beginning with observation number 5 of
the input data set, quakes (a subset of Sashelp.Quakes).
To create an input data set for this example, the following DATA step is used to
create a subset of the Sashelp.Quakes data set. The DATA step creates a subset by
conditionally selecting observations in which the values of the variable Magnitude
are greater than 6.0.
data quakes;
set sashelp.quakes(where=(Magnitude>6.0));
keep Depth Type Magnitude;
run;
proc print data=quakes; run;
Output 14.10 PROC PRINT Output for the Quakes Data Set
In the next DATA step, the FIRSTOBS= data set option is specified in the SET
statement to create a new output data set that begins with observation 5 from the
input data set. The output data set contains all of the remaining observations from
the input data set.
data quakes2;
set quakes(firstobs=5);
run;
proc print data=quakes2; run;
Output 14.11 PROC PRINT Output Showing Row 5 from the Quakes Data Set as the
First Row in the Quakes2 Data Set
You can also use the following functions to access observations directly by
observation number: the NOTE function, the CUROBS function, the POINT
function, and the FETCHOBS function.
Key Ideas
n Whereas the OBS= data set option specifies an ending point for processing, the
FIRSTOBS= data set option specifies a starting point. The two options are often
used together to define a range of observations to be processed.
n The OBS= data set option enables you to select observations from SAS data sets.
You can select observations to be read from external data files by using the OBS=
option in the INFILE statement.
See Also
n “FIRSTOBS= System Option” in SAS System Options: Reference and “OBS=
System Option” in SAS System Options: Reference
n “FIRSTOBS= Data Set Option” in SAS Data Set Options: Reference and “OBS=
Data Set Option” in SAS Data Set Options: Reference
Example Code
In this example, the DATA step accesses a range of observations in a data set by
using the FIRSTOBS= and OBS= data set options together. The FIRSTOBS= data set
option specifies that the observations are written to the output data set, quakes2,
beginning with observation number 2 from the input data set, quakes (a subset of
Sashelp.Quakes). The OBS= data set option specifies the number of the last
observation to process, so SAS stops reading after observation 4.
SAS uses the following formula to determine the number of observations to read
when using the OBS= and FIRSTOBS= data set options together: (obs -
firstobs) + 1 = number of rows. With FIRSTOBS=2 and OBS=4, SAS reads
(4 - 2) + 1 = 3 rows.
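The formula can be checked with a short sketch that uses Sashelp.Class. The macro variable names and the output data set name are hypothetical. The %PUT statement writes the expected count to the log, and the DATA step note that follows reports how many observations were actually read.

```sas
%let firstobs = 2;
%let obs = 4;

/* (obs - firstobs) + 1 = number of rows */
%put NOTE: Expecting %eval(&obs - &firstobs + 1) rows.;

data work.range;
   set sashelp.class(firstobs=&firstobs obs=&obs);
run;
```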
To create the input data set for this example, the first DATA step creates a subset
of the Sashelp.Quakes data set. The second DATA step creates a range of
observations.
data quakes;
set sashelp.quakes(where=(Magnitude>6.0));
keep Depth Type Magnitude;
run;
proc print data=quakes; run;
data quakes2;
set quakes(firstobs=2 obs=4);
run;
proc print data=quakes2; run;
Output 14.12 PROC PRINT Output Showing a Range of Observations Selected from
Quakes and Written to Quakes2
You can also use the following functions to access observations directly by
observation number: the NOTE function, the CUROBS function, the POINT
function, and the FETCHOBS function.
Key Ideas
n Whereas the OBS= data set option specifies an ending point for processing, the
FIRSTOBS= data set option specifies a starting point. The two options are often
used together to define a range of observations to be processed.
n The OBS= data set option enables you to select observations from SAS data sets.
You can select observations to be read from external data files by using the OBS=
option in the INFILE statement.
See Also
n “FIRSTOBS= System Option” in SAS System Options: Reference and “OBS=
System Option” in SAS System Options: Reference
n “FIRSTOBS= Data Set Option” in SAS Data Set Options: Reference and “OBS=
Data Set Option” in SAS Data Set Options: Reference
Example Code
In this example, the DATA step directly accesses the third row in the
Sashelp.Comet data set.
A temporary numeric variable, num, holds the number of the observation to be
directly accessed in the Sashelp.Comet data set. The assignment statement
creates the variable and assigns a value of 3, which represents the third
observation in the Sashelp.Comet data set. The POINT= option in the SET
statement is then set equal to the temporary variable num. The OUTPUT
statement writes the current observation to the output data set Comet. The STOP
statement prevents continuous processing of the DATA step.
The CALL SYMPUT routine assigns the value for the row number to a macro
variable that can be used in the TITLE statement to denote which row is being
accessed.
The first PRINT procedure below is used to show the first few rows of the input
data set in which the third row is directly accessed by the DATA step.
run;
data comet;
num=3;
set sashelp.comet point=num;
call symput('num',num);
output;
stop;
run;
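A complete version of this program might look like the following sketch. The OBS=5 value and the TITLE text are assumptions for illustration. The sketch also swaps in the CALL SYMPUTX routine for CALL SYMPUT, because CALL SYMPUTX converts the numeric value without a log note and removes the leading blanks that would otherwise appear in the title.

```sas
proc print data=sashelp.comet(obs=5);   /* assumed number of rows to show for comparison */
   title "First Rows of Sashelp.Comet";
run;

data comet;
   num=3;                               /* number of the row to access directly */
   set sashelp.comet point=num;
   call symputx('num',num);             /* store the row number in the macro variable num */
   output;
   stop;                                /* required with POINT= */
run;

proc print data=comet;
   title "Row &num of Sashelp.Comet";
run;
title;                                  /* clear the title */
```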
Output 14.13 Partial PROC PRINT Output for the Sashelp.Comet Data Set (for
Comparison)
Output 14.14 PROC Print Output for the Comet Data Set Showing the Directly
Accessed Row 3
Key Ideas
n If you do not instruct it to do otherwise, SAS writes all variables and all
observations from input data sets to output data sets.
n You can control which variables and observations are read and written by using
SAS statements, data set options, and functions. See “Controlling the
Reading and Writing of Variables and Observations” in SAS Language Reference:
Concepts for a list of these language elements.
n Because SAS does not detect an end-of-file when directly accessing an
observation using the POINT= option, you must specify the STOP statement with
the POINT= option to prevent continuous processing of the DATA step.
n The WHERE statement controls which observations are read by the SET
statement based on the value of a variable.
See Also
n POINT= option
n STOP statement
n OUTPUT statement
Example Code
In this example, the CONTENTS procedure displays information about the SAS
data set Sashelp.Snacks.
proc contents data=sashelp.snacks;
run;
Examples: View Descriptor and Sort Information for Data Sets 347
Output 14.15 PROC CONTENTS Output Showing the Descriptor Information for the Sashelp.Snacks
Data Set
Key Ideas
n The descriptor information is the part of a SAS data set that contains information
about the contents of the data set.
n Descriptor information includes the number of observations, the observation
length, the date that the data set was last modified, and other facts.
See Also
n “CONTENTS Procedure” in Base SAS Procedures Guide
Example Code
In this example, the CONTENTS procedure is used to view information about the
Sashelp.Air data set. The Sorted field in the PROC CONTENTS output indicates
that the data set is not sorted. Therefore, there is no additional Sort Indicator table
in the output.
proc contents data=sashelp.air; run;
Output 14.16 PROC CONTENTS Output Showing That the Data Set Sashelp.Air is
Not Sorted
In the DATA step below, data set air is created from the Sashelp.Air data set, and
the SORTEDBY= data set option marks it as sorted by the variable air. The
CONTENTS procedure is used again to view the sort information.
data air(sortedby=air);
   set sashelp.air;
run;
proc contents data=air; run;
The PROC CONTENTS output indicates that the sort order was specified by using the
SORTEDBY= data set option. The Sort Information table (the Sort Indicator) is
included now. Notice that the Validated field is set to NO. This is because the
SORTEDBY= data set option was used to specify the sort order. Sort information
indicates a validated sort only when the data set is sorted using either PROC SORT or
PROC SQL.
Output 14.17 PROC CONTENTS Output Showing That the Data Set Was Sorted
Using SORTEDBY
Next, the SORT procedure is used to sort the data set by the values of the variable
air, in descending order.
proc sort data=air; by descending air; run;
proc contents data=air; run;
The PROC CONTENTS output now shows that the data set is sorted and that the
sort is validated. YES in the Validated field indicates that the data was sorted by
SAS using PROC SORT or PROC SQL.
Key Ideas
n The sort indicator is set when a data set is sorted by any of the following methods:
o SORT procedure
o ORDER BY clause in PROC SQL
o MODIFY statement in PROC DATASETS
o SORTEDBY= data set option in the DATA step DATA statement
n PROC SORT and PROC SQL generate validated sorts, which can be seen in the
Validated field of the descriptor information.
n The SORTEDBY= data set option and the SORTEDBY= option in the DATASETS
procedure generate sorts that are not validated.
n SAS procedures that require data to be sorted read the sort indicator field in the
descriptor information for the data set to determine if the data set is sorted.
n You can specify the SORTVALIDATE system option in the OPTIONS statement to
validate sorts and to ensure that the data set is sorted according to the variables
in the BY statement. If the data set is not sorted correctly, SAS sorts the data set.
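The following sketch shows one way the SORTVALIDATE system option might be combined with a sort order that was only asserted by SORTEDBY=. The data set name work.air_claimed is an assumption made for illustration.

```sas
/* Ask PROC SORT to verify a user-specified sort order instead of trusting it. */
options sortvalidate;

/* SORTEDBY= only asserts an order; Sashelp.Air is not actually ordered by AIR. */
data work.air_claimed(sortedby=air);
   set sashelp.air;
run;

/* With SORTVALIDATE in effect, PROC SORT checks the asserted order
   and sorts the data set if the order does not hold. */
proc sort data=work.air_claimed;
   by air;
run;

/* Inspect the Sorted and Validated fields in the descriptor information. */
proc contents data=work.air_claimed;
run;

options nosortvalidate;   /* restore the default setting */
```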
See Also
n “CONTENTS Procedure” in Base SAS Procedures Guide
15
Raw Data
n external files
Raw data does not include Database Management System (DBMS) files. You must
license SAS/ACCESS software to access data stored in DBMS files. For more
information about SAS/ACCESS features, see “About SAS/ACCESS Software” in
SAS Language Reference: Concepts.
n SAS I/O functions, such as the FOPEN, FGET, and FCLOSE functions.
For a description of available functions, see the SAS File I/O and External File
categories in “SAS Functions and CALL Routines by Category” in SAS Functions
and CALL Routines: Reference. See “Using Functions to Manipulate Files” in SAS
Functions and CALL Routines: Reference for information about how statements
and functions manipulate files.
n External File Interface (EFI)
If your operating environment supports a graphical user interface, you can use
the EFI or the Import Wizard to read raw data. The EFI is a point-and-click
graphical interface that you can use to read and write data that is not in SAS
software's internal format. By using EFI, you can read data from an external file
and write it to a SAS data set. You can also read data from a SAS data set and
write it to an external file. For more information about EFI, see SAS/ACCESS
Interface to PC Files: Reference.
n Import Wizard
The Import Wizard guides you through the steps to read data from an external
data source and write it to a SAS data set. As a wizard, it is a series of windows
that present simple choices to guide you through a process. For more
information about the wizard, see SAS/ACCESS Interface to PC Files: Reference.
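As an illustration of the SAS I/O functions mentioned above, the following sketch writes two records to a temporary file and reads them back with FOPEN, FREAD, FGET, and FCLOSE. The fileref rawin, the variable names, and the record contents are assumptions made for this example.

```sas
filename rawin temp;   /* temporary external file, illustrative only */

/* Create a small raw data file to read. */
data _null_;
   file rawin;
   put 'first record';
   put 'second record';
run;

data _null_;
   length line $ 80;
   fid = fopen('rawin');             /* open the fileref for input */
   if fid > 0 then do;
      do while (fread(fid) = 0);     /* load the next record into the file data buffer */
         rc = fget(fid, line, 80);   /* copy up to 80 bytes of the record into LINE */
         put line=;
      end;
      rc = fclose(fid);              /* release the file */
   end;
run;
```

For most raw data, the INFILE and INPUT statements are simpler. The functions are useful mainly when file access must be controlled programmatically.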
Reading Raw Data 355
Note: If the data file that you are passing to EFI is password protected, you are
prompted multiple times for your logon ID and password.
Operating Environment Information: Using external files with your SAS jobs
requires that you specify filenames with syntax that is appropriate to your
operating environment. For information about operating system documentation,
see “Operating Environment Information” on page 15.
Note: A semicolon appearing alone on the line immediately following the last data
line is the convention that is used in this example. However, a PROC statement,
DATA statement, or a global statement ending in a semicolon on the line
immediately following the last data line also submits the previous DATA step.
data weight;
input PatientID $ Week1 Week8 Week16;
loss=Week1-Week16;
datalines4;
24;77 195 177 163
24;31 220 213 198
24;56 173 166 155
24;12 135 125 116
;;;;
n Formatted input
n Column input
n Named input
You can also combine styles of input in a single INPUT statement. For details about
the styles of input, see the INPUT statement.
List Input
List input uses a scanning method for locating data values. Data values are not
required to be aligned in columns but must be separated by at least one blank (or
other defined delimiter). List input requires only that you specify the variable
names and a dollar sign ($), if defining a character variable. You do not have to
specify the location of the data fields.
data scores;
   length name $ 12;
   input name $ score1 score2;
   datalines;
Riley 1132 1187
Henderson 1015 1102
;
For more examples, see “Reading Unaligned Data with Simple List Input” in SAS
DATA Step Statements: Reference.
List input has several restrictions on the type of data that it can read:
n Input values must be separated by at least one blank (the default delimiter) or
by the delimiter specified with the DLM= or DLMSTR= option in the INFILE
statement. If you want SAS to read consecutive delimiters as if there is a
missing value between them, specify the DSD option in the INFILE statement.
n Blanks cannot represent missing values. A real value, such as a period, must be
used instead.
n To read and store a character value that is longer than 8 bytes, you must
explicitly define its length. You can define a variable’s length by using the
LENGTH statement, the INFORMAT statement, or the ATTRIB statement. When
you define a character variable using any of these statements, you must place
the statement that defines the variable’s length first in the DATA step, before
any other references to that variable. You can also specify the variable’s length
by using modified list input, which consists of an informat and the colon
modifier in the INPUT statement.
n Character values cannot contain embedded blanks when the file is delimited by
blanks.
n Fields must be read in order.
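Several of these restrictions can be seen together in a short sketch. The data set name work.staff, the variable names, and the data values are assumptions made for illustration.

```sas
data work.staff;
   infile datalines dsd;            /* comma is the default delimiter with DSD */
   length name $ 15 city $ 20;      /* define lengths before the INPUT statement */
   input name $ city $ salary;
   datalines;
Montgomery,San Francisco,52000
Lee,,48000
;

proc print data=work.staff;
run;
```

Because the DSD option is in effect, the two consecutive commas in the second record are read as a missing value for city. Because the delimiter is a comma rather than a blank, the value San Francisco can contain an embedded blank.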
Note: Nonstandard numeric values, such as packed decimal data, must use the
formatted style of input. For more information, see “Formatted Input” on page 359.
n ~ (tilde) format modifier enables you to read and retain single quotation marks,
double quotation marks, and delimiters within character values.
Here is an example of the : and ~ format modifiers. You must use the DSD option in
the INFILE statement. Otherwise, the INPUT statement ignores the ~ format
modifier. This example reads raw instream data using modified list input:
data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
proc print data=scores;
run;
For another example, see Reading Delimited Data with Modified List Input.
Column Input
Column input enables you to read standard data values that are aligned in columns
in the data records. Specify the variable name, followed by a dollar sign ($) if it is a
character variable, and specify the columns in which the data values are located in
each record:
data scores;
infile datalines truncover;
input name $ 1-12 score2 17-20 score1 27-30;
datalines;
Riley           1132       987
Henderson       1015      1102
;
For more examples, see “Read Input Records with Column Input” in SAS DATA Step
Statements: Reference.
Note: Use the TRUNCOVER option in the INFILE statement to ensure that SAS
handles data values of varying lengths appropriately.
n Placeholders, such as a single period (.), are not required for missing data.
n Input values can be read in any order, regardless of their position in the record.
n Both leading and trailing blanks within the field are ignored.
CAUTION
If you insert tabs while entering data in the DATALINES statement in column
format, you might get unexpected results. This issue exists when you use the
SAS Enhanced Editor or SAS Program Editor. To avoid the issue, do one of
the following actions:
n Replace all tabs in the data with single spaces using another editor outside of
SAS.
n Specify the %INCLUDE statement from the SAS editor to submit your code.
n If you are using the SAS Enhanced Editor, select Tools > Options > Enhanced
Editor to change the tab size from 4 to 1.
Formatted Input
Formatted input combines the flexibility of using informats with many of the
features of column input. By using formatted input, you can read nonstandard data
for which SAS requires additional instructions. Formatted input is typically used
with pointer controls that enable you to control the position of the input pointer in
the input buffer when you read data.
The INPUT statement in the following DATA step uses formatted input and pointer
controls to read the raw, instream data listed in the DATALINES statement.
Note that $12. and COMMA5. are informats; +4 and +6 are column pointer controls.
data scores;
   infile datalines truncover;
   input name $12. +4 score1 comma5. +6 score2 comma5.;
   datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;
For more examples, see “Formatted Input with Pointer Controls” in SAS DATA Step
Statements: Reference.
Note: You can also use informats to read data that is not aligned in columns. See
“Modified List Input” in SAS Language Reference: Concepts for more information.
n Placeholders, such as a single period (.), are not required for missing data.
n With the use of pointer controls to position the pointer, input values can be read
in any order, regardless of their positions in the record.
n Values or parts of values can be reread.
n Formatted input enables you to read data stored in nonstandard form, such as
packed decimal or numbers with commas.
Named Input
Named input enables you to read records in which data values are preceded by the
name of the variable and an equal sign (=). The following INPUT statement reads
the data lines containing equal signs.
data games;
input name=$ score1= score2=;
datalines;
name=riley score1=1132 score2=1187
;
For more examples, see “Using List and Named Input” in SAS DATA Step
Statements: Reference.
Note: When an equal sign follows a variable in an INPUT statement, SAS expects
that data remaining on the input line contains only named input values. You cannot
switch to another form of input in the same INPUT statement after using named
input. Also, note that any variable that exists in the input data but is not defined in
the INPUT statement generates a note in the SAS log indicating a missing field.
variable-length data fields and records | read delimited data | list input with or without a format modifier in the INPUT statement and the TRUNCOVER, DLM=, DLMSTR=, or DSD options in the INFILE statement
For more information about data-reading features, see the INPUT and INFILE
statements in SAS DATA Step Statements: Reference.
n 1166.42
nonstandard data
is data that can be read only with the aid of informats. Examples of nonstandard
data include numeric values that contain commas, dollar signs, or blanks; date
and time values; and hexadecimal and binary values.
Numeric Data
Numeric data can be represented in several ways. SAS can read standard numeric
values without any special instructions. To read nonstandard values, SAS requires
special instructions in the form of informats. “Reading Nonstandard Numeric Data”
in SAS Language Reference: Concepts shows standard, nonstandard, and invalid
numeric data values and the special tools, if any, that are required to read them. For
complete descriptions of all SAS informats, see SAS Formats and Informats:
Reference.
J23 | not a number | Read as a character value, or edit the raw data to change it to a valid number.
Character Data
A value that is read with an INPUT statement is assumed to be a character value if
one of the following conditions is true:
n A dollar sign ($) follows the variable name in the INPUT statement.
n The variable has been previously defined as character. For example, a value is
assumed to be a character value if the variable has been previously defined as
character in a LENGTH statement, in the RETAIN statement, by an assignment
statement, or in an expression.
Input data that you want to store in a character variable can include any character.
Use the guidelines in the following table when your raw data includes leading
blanks and semicolons.
366 Chapter 15 / Raw Data
Table 15.5 Reading Instream Data and External Files Containing Leading Blanks and
Semicolons
leading or trailing blanks that you want to preserve | formatted input and the $CHARw. informat | List input trims leading and trailing blanks from a character value before the value is assigned to a variable.
delimiters, blank characters, or quoted strings | DSD option, with DLM= or DLMSTR= option in the INFILE statement | These options enable SAS to read a character value that contains a delimiter within a quoted string; these options can also treat two consecutive delimiters as a missing value and remove quotation marks from character values.
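The difference between the $w. and $CHARw. informats can be sketched by reading the same columns twice. The data set name work.blanks, the variable names trimmed and kept, and the data values are assumptions made for illustration.

```sas
data work.blanks;
   infile datalines truncover;
   input @1 trimmed $8.        /* $w. removes leading blanks and left-aligns */
         @1 kept    $char8.;   /* $CHARw. preserves leading blanks */
   /* Write both values between brackets so the blanks are visible in the log. */
   put '[' trimmed $char8. '] [' kept $char8. ']';
   datalines;
   abc
  de
;
```

The absolute pointer control @1 returns to the first column so that both informats read the same eight columns of each record.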
n It does not match the input style used. For example, the value is read as
standard numeric data (no dollar sign or informat) but does not conform to the
rules for standard SAS numbers.
n It is out of range (too large or too small).
If SAS reads a data value that is incompatible with the type specified for that
variable, SAS tries to convert the value to the specified type. If conversion is not
possible, an error occurs, and SAS performs the following actions:
n sets the value of the variable being read to missing or to the value specified
with the INVALIDDATA= system option.
n prints an invalid data note in the SAS log.
n prints the input line and column number containing the invalid value in the SAS
log. If a line contains unprintable characters, it is printed in hexadecimal form. A
scale is printed above the input line to help determine column numbers.
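A small sketch can make this behavior concrete. The data set name work.readings and the data values are assumptions; the second record deliberately contains a value that is not a standard SAS number.

```sas
data work.readings;
   input id reading;
   datalines;
1 12.5
2 1x.7
3 8.1
;

/* READING is missing in the second observation; the SAS log
   contains an invalid data note and a listing of the input line. */
proc print data=work.readings;
run;
```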
For more information about missing values, see “Reading Column-Binary Data” on
page 377.
The following example shows how to code missing values by using a MISSING
statement in a DATA step:
data test_results;
missing a b c;
input name $8. Answer1 Answer2 Answer3;
datalines;
Smith    2 5 9
Jones    4 b 8
Carter   a 4 7
Reed     3 5 c
;
Note that you must use a period when you specify a special missing numeric value
in an expression or assignment statement, as in the following:
x=.d;
However, you do not need to specify each special missing numeric data value with
a period in your input data. For example, the following DATA step, which uses
periods in the input data for special missing values, produces the same result as the
input data without periods:
data test_results;
missing a b c;
input name $8. Answer1 Answer2 Answer3;
datalines;
Smith    2 5 9
Jones    4 .b 8
Carter   .a 4 7
Reed     3 5 .c
;
proc print;
run;
Definitions for External Files 369
Note: SAS displays and prints special missing values that use letters in
uppercase.
external files
files that are managed and maintained by your operating system, not by SAS.
They contain data or text as input to SAS jobs, or they are
External files as input to a SAS session include the following:
n records of raw data in external files that you want to read into SAS as input,
including data in the form of plain ASCII text files and binary files.
n programming statements in external files that you want to submit to SAS for
execution.
n files created as the result of running a SAS program, including SAS log files
and results from SAS procedures. For example, the PRINTTO procedure
enables you to direct procedure output to an external file. Every SAS job
creates at least one external file, the SAS log. SAS catalog files and ODS
output destinations are also examples of external files.
Note: External files do not include database management system (DBMS) files or
PC files. DBMS and PC files are a special category of files that can be read with
SAS/ACCESS software.
n For information about DBMS files, see Chapter 16, “Database and PC Files,” on
page 385 and SAS/ACCESS for Relational Databases: Reference.
n For information about access to PC files, see Chapter 16, “Database and PC
Files,” on page 385 and SAS/ACCESS Interface to PC Files: Reference.
run;
1. In some operating environments, you can also use the command '&' to assign a fileref.
Reading External Files 373
run;
Specify the file that the PUT statement writes values to: FILE statement
filename myoutput 'c:\Users\sasuser\lake.txt';
data _null_;
   set sashelp.lake;
   file myoutput;
   put Width Length Depth;
run;
“Reading from Multiple Input Files” in SAS DATA Step Statements: Reference
on a WebDAV
server.
accessed by specifier
using Zlib
services.
See SAS DATA Step Statements: Reference for detailed information about each of
these statements.
Different computer platforms store numeric binary data in different forms. The
ordering of bytes differs by platforms that are referred to as either “big endian” or
“little endian.” For more information, see “Byte Ordering for Integer Binary Data on
Big Endian and Little Endian Platforms” in SAS Formats and Informats: Reference.
SAS provides a number of informats for reading binary data and corresponding
formats for writing binary data. Some of these informats read data in native mode,
that is, by using the byte-ordering system that is standard for the system on which
SAS is running. Other informats force the data to be read by the IBM 370 standard,
regardless of the native mode of the system on which SAS is running. The informats
that read in native or IBM 370 mode are listed in the following table.
If you write a SAS program that reads binary data and that is run on only one type
of system, you can use the native mode informats and formats. However, if you
want to write SAS programs that can be run on multiple systems that use different
byte-storage systems, use the IBM 370 informats. The IBM 370 informats enable
you to write SAS programs that can read data in this format and that can be run in
any SAS environment, regardless of the standard for storing numeric data.
For example, using the IBM 370 informats, you could download data that contain
binary integers from a mainframe to a PC and then use the S370FIB informats to
read the data.
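The difference between the two families of informats can be sketched with the INPUT function. The four-byte value below is an assumption chosen for illustration; it holds the integer 258 in big-endian (IBM 370) byte order.

```sas
data _null_;
   raw = '00000102'x;                      /* bytes 00 00 01 02 */
   big_endian = input(raw, s370fib4.);     /* IBM 370 order on every host: 258 */
   native     = input(raw, ib4.);          /* result depends on the host byte order */
   put big_endian= native=;
run;
```

On a big-endian host the two results agree. On a little-endian host only the S370FIB4. result matches the value that was written on the mainframe.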
Note: Anytime a text file originates from anywhere other than the local encoding
environment, it might be necessary to specify the ENCODING= option on either
EBCDIC or ASCII systems. When you read an EBCDIC text file on an ASCII
platform, it is recommended that you specify the ENCODING= option in the
FILENAME or INFILE statement. However, if you use the DSD and the DLM= or
DLMSTR= options in the INFILE statement, the ENCODING= option is a
requirement because these options require certain characters in the session
encoding (such as quotation marks, commas, and blanks). Reserve encoding-
specific informats for use with true binary files that contain both character and
non-character fields.
For complete descriptions of all SAS formats and informats, including how numeric
binary data is written, see SAS Formats and Informats: Reference.
n how to set the RECFM= and LRECL= options in the INFILE statement
To read column-binary data, you must set two options in the INFILE statement:
n Set RECFM= to F for fixed.
n Set the LRECL= to 160, because each card column of column-binary data is
expanded to two bytes before the fields are read.
For example, to read column-binary data from a file, use an INFILE statement in the
following form before the INPUT statement that reads the data:
data out;
infile file-specification or path-name recfm=f lrecl=160;
input var1;
run;
Note: The expansion of each column of column-binary data into two bytes does
not affect the position of the column pointer. You use the absolute column pointer
control @, as usual, because the informats automatically compute the true location
on the doubled record. If a value is in column 23, use the pointer control @23 to
move the pointer there.
Example Code
In this example, the IMPORT procedure imports a file that contains comma-
separated values.
1 The FILENAME statement assigns the chol fileref. The TEMP option specifies
that the file is temporary, so a path name is not necessary.
2 The HTTP procedure specifies the URL of the cholesterol.csv input file. The
data is written to the chol fileref.
3 PROC IMPORT reads the comma-delimited data and creates the mycholesterol
data set.
4 The PRINT procedure prints the data set. The output shows that the file was
imported correctly, including the column names.
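The numbered steps above can be sketched as follows. The URL is a hypothetical placeholder, not the location used by the original example; the fileref chol and the data set name mycholesterol come from the description.

```sas
/* 1: temporary fileref, so no path name is needed */
filename chol temp;

/* 2: download the CSV file into the fileref (placeholder URL) */
proc http
   url="https://example.com/data/cholesterol.csv"
   method="GET"
   out=chol;
run;

/* 3: read the comma-delimited data into a SAS data set */
proc import datafile=chol out=work.mycholesterol dbms=csv replace;
   getnames=yes;
run;

/* 4: print the imported data set */
proc print data=work.mycholesterol;
run;
```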
Examples: Read External Files Using PROC IMPORT 379
Key Ideas
n Base SAS software can import and export some external text files without using a
SAS/ACCESS engine. These files are usually referred to as raw data or delimited
data. The DATA step or PROC IMPORT can read data from a text file and provide
that data to the V9 engine for output to a SAS data set.
n If you want Base SAS to read or write a Microsoft Excel file, the file must have
a .csv extension. To import a file that has an Excel file extension of .xls or .xlsx,
you must have a license to SAS/ACCESS Interface to PC Files. In Excel, you can
save an Excel file as a .csv file.
n If you create a SAS data set from an Excel file, the data is static and does not
change to reflect the underlying Excel data. If you want to keep data in Excel as an
ongoing data store that you can query from SAS, then use the XLSX or EXCEL
engine. You must have a license to SAS/ACCESS Interface to PC Files.
See Also
n “Default Base SAS Engine (V9 Engine)” on page 294
Example Code
The following example shows how to write data from a Sashelp data set to an
external, space-delimited text file. It then reads the data from the external text file
into a SAS data set named air.
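A program along the following lines would perform the two steps that are described. It is a sketch under the assumption that PROC EXPORT is used to write the file; the path is the one that appears in the log output for this example.

```sas
/* Write Sashelp.Air to a space-delimited text file with a header row. */
proc export data=sashelp.air
            outfile='c:\Users\sasuser\txt\air.txt'
            dbms=dlm replace;
   delimiter=' ';
run;

/* Read the text file back into the SAS data set Work.Air. */
proc import datafile='c:\Users\sasuser\txt\air.txt'
            out=work.air
            dbms=dlm replace;
   delimiter=' ';
   getnames=yes;      /* the first record contains the column names */
run;

proc print data=work.air(obs=5);
run;
```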
PROC IMPORT builds a DATA step to read the external file and writes the DATA
step code to the SAS log:
Output 15.4 Partial PROC PRINT Output for Reading a Space-Delimited Text File
Using PROC IMPORT
Output 15.5 Log Output for Reading a Space-Delimited Text File Using PROC
IMPORT
1430 /**********************************************************************
1431 * PRODUCT: SAS
1432 * VERSION: 9.4
1433 * CREATOR: External File Interface
1434 * DATE: 17OCT19
1435 * DESC: Generated SAS Datastep Code
1436 * TEMPLATE SOURCE: (None Specified.)
1437 ***********************************************************************/
1438 data WORK.AIR; 1
1439 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
1440 infile 'c:\Users\sasuser\txt\air.txt' delimiter = ' '
MISSOVER 2
DSD 3
lrecl=32767
1440! firstobs=2; 4
1441 informat DATE MONYY7.; 5
1442 informat AIR best32. ;
1443 format DATE MONYY7. ; 6
1444 format AIR best12. ;
1445 input 7
1446 DATE
1447 AIR
1448 ;
1449 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection
macro variable */
1450 run;
1 PROC IMPORT automatically generates this DATA step and prints the step to
the SAS Log.
2 The MISSOVER option prevents the INPUT statement from reading a new input
data record if it does not find values in the current input line for all the variables
in the statement.
3 The DSD option specifies that when data values are enclosed in quotation
marks, delimiters within the value are treated as character data.
4 The FIRSTOBS= option in the INFILE statement specifies the first record to
read from the input file. Here, FIRSTOBS=2 skips the first record, which
contains the column names.
5 The MONYYw. informat reads month and year date values in the form mmmyy or
mmmyyyy.
6 The MONYYw. format writes date values in the form mmmyy or mmmyyyy, where
mmm is the first three letters of the month name.
7 The INPUT statement assigns input values to the corresponding SAS variables.
Key Ideas
n You can use the DATAROW statement to specify which row to begin reading data
from. For example, you specify DATAROW=1 for external files that do not contain
column names.
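For a file without a header row, the DATAROW= statement might be used as in this sketch. The fileref nohdr, the data set name air_nohdr, and the two records are assumptions made for illustration.

```sas
filename nohdr temp;

/* Create a space-delimited file that has no row of column names. */
data _null_;
   file nohdr;
   put 'JAN49 112';
   put 'FEB49 118';
run;

proc import datafile=nohdr out=work.air_nohdr dbms=dlm replace;
   delimiter=' ';
   getnames=no;    /* do not treat the first record as column names */
   datarow=1;      /* begin reading data at the first record */
run;

proc print data=work.air_nohdr;
run;
```

Because no names are read from the file, PROC IMPORT assigns default variable names such as VAR1 and VAR2.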
See Also
n “PROC IMPORT Statement” in Base SAS Procedures Guide
16
Database and PC Files
Note: To use the SAS/ACCESS features described in this section, you must license
SAS/ACCESS software. See the SAS/ACCESS documentation for your DBMS for
full documentation of the features described in this section.
You can use a DATA step, SAS procedures, or the Explorer window to view and
update the DBMS data associated with the libref, or use the DATASETS and
CONTENTS procedures to view information about the DBMS objects.
proc sql;
select *
from mydb2lib.employees(drop=salary)
where dept='Accounting';
quit;
SQL Procedure Pass-Through Facility 387
The LIBNAME statement connects to DB2. You can reference a DBMS object, in
this case, a DB2 table, by specifying a two-level name that consists of the libref and
the DBMS object name. The DROP= data set option causes the SALARY column of
the EMPLOYEES table on DB2 to be excluded from the data that is returned by the
query.
See your SAS/ACCESS documentation for a full listing of the SAS/ACCESS data
set options and the Base SAS data set options that can be used on data sets that
refer to DBMS data.
proc sql;
create view viewlib.emp_view as
select *
from mydblib.employees
using libname mydblib oracle user=smith password=secret
path='myoraclepath';
quit;
When PROC SQL executes the SAS view, the SELECT statement assigns the libref
and establishes the connection to the DBMS. The scope of the libref is local to the
SAS view and does not conflict with identically named librefs that might exist in
the SAS session. When the query finishes, the connection is terminated and the
libref is unassigned.
Note: You can also embed a Base SAS LIBNAME statement in a PROC SQL view.
pass-through facility. You can use pass-through facility statements in a PROC SQL
query or store them in a PROC SQL view.
select *
from connection to myconn
(select empid, lastname, firstname, salary
from employees
where salary>75000);
To store the same query in a PROC SQL view, use the CREATE VIEW statement:
libname viewlib
'SAS-library';
proc sql;
connect to oracle as myconn (user=smith password=secret
path='myoracleserver');
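A complete step that stores the pass-through query in a view might look like the following sketch. The view name viewlib.highpay is a hypothetical name chosen for illustration; the connection options repeat the values used above, and 'SAS-library' remains a placeholder for a real library path.

```sas
libname viewlib 'SAS-library';

proc sql;
   /* Open a named connection to the DBMS. */
   connect to oracle as myconn (user=smith password=secret
      path='myoracleserver');

   /* Store the pass-through query as a PROC SQL view. */
   create view viewlib.highpay as
      select *
         from connection to myconn
            (select empid, lastname, firstname, salary
                from employees
                where salary>75000);

   /* Close the connection when it is no longer needed. */
   disconnect from myconn;
quit;
```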
The following example creates an access descriptor and a view descriptor in the
same PROC step to retrieve data from a DB2 table:
libname adlib 'SAS-library';
libname vlib 'SAS-library';
create vlib.custord.view;
select ordernum stocknum shipto;
format ordernum 5.
stocknum 4.;
run;
When you want to use access descriptors and view descriptors, both types of
descriptors must be created before you can retrieve your DBMS data. The first step,
creating the access descriptor, enables SAS to store information about the specific
DBMS table that you want to query.
After you have created the access descriptor, the second step is to create one or
more view descriptors to retrieve some or all of the DBMS data described by the
access descriptor. In the view descriptor, you select variables and apply formats to
manipulate the data for viewing, printing, or storing in SAS. You use only the view
descriptors, and not the access descriptors, in your SAS programs.
The interface view engine enables you to reference your SAS view with a two-level
SAS name in a DATA or PROC step, such as the PROC PRINT step in the example.
See “SAS Views” in SAS V9 LIBNAME Engine: Reference for more information about
SAS views. See the SAS/ACCESS documentation for your DBMS for more detailed
information about creating and using access descriptors and SAS/ACCESS views.
DBLOAD Procedure
The DBLOAD procedure enables you to create and load data into a DBMS table
from a SAS data set, data file, SAS view, or another DBMS table, or to append rows
to an existing table. It also enables you to submit non-query DBMS-specific SQL
statements to the DBMS from your SAS session.
The following example appends data from a previously created SAS data set named
INVDATA into a table in an ORACLE database named INVOICE:
proc dbload dbms=oracle data=invdata append;
user=smith;
password=secret;
path='myoracleserver';
table=invoice;
load;
run;
See the SAS/ACCESS documentation for your DBMS for more detailed information
about the DBLOAD procedure.
n The INPUT statement is used with the INFILE statement to issue a GET call to
retrieve DBMS data.
n The FILE statement identifies the database or message queue to be updated, if
writing to the DBMS is supported.
n The PUT statement is used with the FILE statement to issue an UPDATE call, if
writing to the DBMS is supported.
The following example updates data in an IMS database by using the FILE and
INFILE statements in a DATA step. The statements generate calls to the database
in the IMS native language, DL/I. The DATA step reads Bank.Customer, an existing
SAS data set that contains information about new customers, and then it updates
the ACCOUNT database with the data in the SAS data set.
data _null_;
set bank.customer;
length ssa1 $9;
infile accupdt dli call=func dbname=db ssa=ssa1;
file accupdt dli;
func = 'isrt';
db = 'account';
ssa1 = 'customer';
put @1 ssnumber $char11.
@12 custname $char40.
@52 addr1 $char30.
@82 addr2 $char30.
@112 custcity $char28.
@140 custstat $char2.
@142 custland $char20.
@162 custzip $char10.
@172 h_phone $char12.
@184 o_phone $char12.;
if _error_ = 1 then
abort abend 888;
run;
In SAS/ACCESS products that provide a DATA step interface, the INFILE statement
has special DBMS-specific options that enable you to specify DBMS variable
values and to format calls to the DBMS appropriately. See the SAS/ACCESS
documentation for your DBMS for a full listing of the DBMS-specific INFILE
statement options and the Base SAS INFILE statement options that can be used
with your DBMS.
392 Chapter 16 / Database and PC Files
17
SAS Views
n In most cases, you can use a SAS view as if it were a SAS data set.
SAS views are supported by the V9 engine. SAS views can also reference DBMS
data if the appropriate SAS/ACCESS engine is licensed. SAS views are not
supported by the CAS engine or the SPD Engine. For more information, see SAS V9
LIBNAME Engine: Reference.
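As a brief illustration of using a view as if it were a data set, the following sketch creates a DATA step view and passes it to a procedure. The view name work.teens is an assumption made for this example.

```sas
/* Define a DATA step view; no observations are stored at this point. */
data work.teens / view=work.teens;
   set sashelp.class;
   where age >= 13;
run;

/* The view is executed when a step reads it, as if it were a data set. */
proc means data=work.teens mean;
   var height;
run;
```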
394 Chapter 17 / SAS Views
18
SAS Dictionary Tables
When you access a DICTIONARY table, SAS determines the current state of the
SAS session and returns the desired information accordingly. This process is
performed each time a DICTIONARY table is accessed, so that you always have
current information.
n use any SAS procedure or the DATA step, referring to the PROC SQL view of the
table in the Sashelp library
Some DICTIONARY tables can become quite large. In this case, you might want to
view a part of a DICTIONARY table that contains only the data that you are
interested in. The best way to view part of a DICTIONARY table is to subset the
table using a PROC SQL WHERE clause.
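For example, a query like the following sketch returns only the rows of DICTIONARY.TABLES that describe data sets in the Sashelp library. The choice of library and columns is an assumption made for illustration.

```sas
proc sql;
   select memname, nobs, nvar
      from dictionary.tables
      where libname = 'SASHELP' and memtype = 'DATA';
quit;
```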
The following steps describe how to use the VIEWTABLE or FSVIEW utilities to
view a DICTIONARY table in a windowing environment.
2 Select the Sashelp library. A list of members in the Sashelp library appears.
3 Select a SAS view with a name that starts with V (for example, VMEMBER).
A VIEWTABLE window appears that contains its contents. (For z/OS, type the
letter 'O' in the command field for the desired member and press Enter. The
FSVIEW window appears with the contents of the view.)
In the VIEWTABLE window the column headings are labels. To see the column
names, select View > Column Names.
The result of the DESCRIBE TABLE statement appears in the SAS log:
NOTE: SQL table DICTIONARY.INDEXES was created like:
• The first word on each line is the column (or variable) name. You need to use
this name when you write a SAS statement that refers to the column (or
variable).
• Following the column name is the specification for the type of variable and the
width of the column.
• The name that follows label= is the column (or variable) label.
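As a minimal sketch, a log listing of this kind can be requested with the DESCRIBE TABLE statement in PROC SQL. DICTIONARY.INDEXES is the table named in the log note above; the exact columns that are listed depend on your SAS release.

```sas
proc sql;
   /* Writes the column definitions of the DICTIONARY table to the SAS log */
   describe table dictionary.indexes;
quit;
```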
After you know how a table is defined, you can use the processing ability of the
PROC SQL WHERE clause in a PROC SQL step to extract a portion of a SAS view.
Note that many character values in the DICTIONARY tables are stored as all-
uppercase characters; you should design your queries accordingly.
CAUTION
Do not confuse the GENNUM variable value in the CONTENTS OUT= data set with
the GEN variable value from DICTIONARY tables. GENNUM from a CONTENTS
procedure or statement refers to a specific generation of a data set. GEN from
DICTIONARY tables refers to the total number of generations for a data set.
For example, the following programs both produce the same result, but the PROC
SQL step runs much faster because the WHERE clause is processed before SAS opens
the tables that are used by the Sashelp.VCOLUMN view:
data mytable;
set sashelp.vcolumn;
where libname='WORK' and memname='SALES';
run;
proc sql;
create table mytable as
select * from sashelp.vcolumn
where libname='WORK' and memname='SALES';
quit;
Note: SAS does not maintain DICTIONARY table information between queries.
Each query of a DICTIONARY table launches a new discovery process.
If you are querying the same DICTIONARY table several times in a row, you can get
even faster performance by creating a temporary SAS data set and running your
query against that data set. You can create the temporary data set by using the
DATA step SET statement or PROC SQL CREATE TABLE AS statement.
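The snapshot approach described above might look like the following sketch. The data set name work.col_snapshot is hypothetical, and the choice of the Sashelp library is only for illustration. Keep in mind that the snapshot is not refreshed when the session changes.

```sas
/* Take one snapshot of the column metadata (one discovery pass) */
proc sql;
   create table work.col_snapshot as
      select libname, memname, name, type, length
         from dictionary.columns
         where libname = 'SASHELP';
quit;

/* Later queries read the snapshot, not the DICTIONARY table */
proc sql;
   select name, type, length
      from work.col_snapshot
      where memname = 'CLASS';

   select memname, count(*) as n_columns
      from work.col_snapshot
      group by memname;
quit;
```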
PART 4
Manipulating Data
Chapter 19: Grouping Data
Chapter 20: Loops and Conditionals
Chapter 21: Combining Data
Chapter 22: Using Indexes
Chapter 23: Using Arrays
Chapter 24: Debugging Errors
Chapter 25: Optimizing System Performance
Chapter 26: Using Parallel Processing
19
Grouping Data
For more information about BY-Group processing, see “Reading, Combining, and
Modifying SAS Data Sets” in SAS Language Reference: Concepts. See also
Combining and Modifying SAS Data Sets: Examples.
Syntax
DATA step BY-groups are created and managed using the BY statement in SAS. See
“BY Statement” in SAS DATA Step Statements: Reference for complete syntax
information.
Understanding BY Groups
The first BY group contains all observations with the smallest value for the BY
variable zipCode. The second BY group contains all observations with the next
smallest value for the BY variable, and so on.
You can then specify the BY variable in the DATA step using the following code:
Example Code 19.2 Sort and Group the zipCode Data Set by a Single Variable
proc sort data=zip;
by zipcode;
run;
data zip;
set zip; by zipcode;
run;
The figure shows three BY groups. The data set is shown with the BY variables
State and City printed on the left for easy reading. The position of the BY variables
in the observations does not affect how the values are grouped and ordered.
The observations are arranged so that the observations for Arizona occur first. The
observations within each value of State are arranged in order of the value of City.
Each BY group has a unique combination of values for the variables State and City.
For example, the BY value of the first BY group is AZ Tucson, and the BY value of
the second BY group is FL Lakeland.
Here is the code for creating the output shown in the figure “BY Groups with
Multiple BY Variables” on page 407:
Example Code 19.4 Sort and Group the zipCode Data Set by Multiple BY Variables
proc sort data=zip;
by State City;
run;
data zip;
set zip;
by State City;
run;
proc print data=zip noobs;
title 'BY Groups with Multiple BY Variables: State City';
run;
• PROC step (For information about BY-group processing with procedures, see
“Creating Titles That Contain BY-Group Information” in Base SAS Procedures
Guide.)
The following DATA step program uses the SET statement to combine observations
from three SAS data sets by interleaving the files. The data is ordered by State,
City, and Zip.
data all_sales;
set region1 region2 region3;
by State City Zip;
… more SAS statements …
run;
If the observations are not in the order that you want, you must either sort the data
set or create an index for it before using BY-group processing.
If you use the MODIFY statement in BY-group processing, you do not need to
presort the input data. Presorting, however, can make processing more efficient and
less costly.
You can use PROC SQL views in BY-group processing. For complete information,
see SAS SQL Procedure User’s Guide.
Note: SAS/ACCESS Users: If you use SAS views or librefs, see SAS/ACCESS for
Relational Databases: Reference for information about using BY groups in your SAS
programs.
The following PROC SORT step sorts the data set information by ascending values
of the variables State and ZipCode, and replaces the original data set.
proc sort data=information;
by State ZipCode;
run;
As a general rule, specify the variables in the PROC SORT BY statement in the
same order that you specify them in the DATA step BY statement. For a detailed
description of the default sorting orders for numeric and character variables, see
the SORT procedure in Base SAS Procedures Guide.
Note: The BY statement honors the linguistic collation of sorted data when you
use the SORT procedure with the SORTSEQ=LINGUISTIC option.
Note: Because creating and maintaining indexes require additional resources, you
should determine whether using them significantly improves performance.
Depending on the nature of the data in your SAS data set, using PROC SORT to
order data values can be more advantageous than indexing. For an overview of
indexes, see “Understanding SAS Indexes” in SAS Language Reference: Concepts.
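As an illustration of the indexing alternative, the following sketch builds a composite index and then uses BY-group processing without sorting. The sample rows, the index name StateZip, and the output data set name are hypothetical; only the data set name information and the BY variables come from the example above.

```sas
/* Hypothetical sample data, deliberately not in sorted order */
data work.information;
   input State $ ZipCode;
   datalines;
FL 33801
AZ 85730
FL 33133
;

/* Create a composite index whose key variables match the BY variables */
proc datasets library=work nolist;
   modify information;
   index create StateZip=(State ZipCode);
quit;

/* SAS uses the index to return the rows in BY order */
data work.by_index;
   set work.information;
   by State ZipCode;
   if first.State then put 'First row for ' State=;
run;
```

Whether the index is worthwhile depends on the size of the data and how often it is updated, as the note above explains.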
• LAST.variable
For example, if the DATA step specifies the variable state in the BY statement,
then SAS creates the temporary variables FIRST.state and LAST.state.
These temporary variables are available for DATA step programming but are not
added to the output data set. Their values indicate whether an observation is in
one of the following positions:
• the first one in a BY group
• both first and last, as is the case when there is only one observation in a BY
group
You can take actions conditionally, based on whether you are processing the first or
the last observation in a BY group. See “Processing BY-Groups Conditionally” on
page 416 for more information.
• For the last observation in a data set, the values of all LAST.variable variables
are set to 1.
data sedanTypes;
set cars;
by 'Sedan Types'n;
if 'first.Sedan Types'n then type=1;
run;
For more information about BY-Group Processing and how SAS creates the
temporary variables, FIRST and LAST, see “How SAS Determines FIRST.variable
and LAST.variable” in SAS Language Reference: Concepts and “How SAS Identifies
the Beginning and End of a BY Group” in SAS DATA Step Statements: Reference.
• FIRST. and LAST. variables are referenced in the DATA step, but they are not
part of the output data set.
• Six temporary variables are created, two for each BY variable: FIRST.State,
LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.
data zip;
   input State $ City $ ZipCode Street $20-29;
   datalines;
FL Miami    33133  Rice St
FL Miami    33133  Thomas Ave
FL Miami    33133  Surrey Dr
FL Miami    33133  Trade Ave
FL Miami    33146  Nervia St
FL Miami    33146  Corsica St
FL Lakeland 33801  French Ave
FL Lakeland 33809  Egret Dr
AZ Tucson   85730  Domenic Ln
AZ Tucson   85730  Gleeson Pl
;
proc sort data=zip;
   by State City ZipCode;
run;
data zip2;
   set zip;
   by State City ZipCode;
   put _n_= City State ZipCode
       first.city= last.city=
       first.state= last.state=
       first.ZipCode= last.ZipCode= ;
run;
Example Code 19.1 Grouping Observations by State, City, and ZIP Code
Note: This is a chart used to display the contents of the log more clearly. It is not
the output data set.
data _null_;
set fruit; by x y z;
if _N_=1 then put 'Grouped by X Y Z';
put _N_= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;
data _null_;
set fruit; by y x z;
if _N_=1 then put 'Grouped by Y X Z';
put _N_= first.y= last.y= first.x= last.x= first.z= last.z= ;
run;
Grouped by X Y Z
_N_=1 FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
_N_=4 FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
Grouped by Y X Z
_N_=1 FIRST.y=1 LAST.y=0 FIRST.x=1 LAST.x=0 FIRST.z=1 LAST.z=0
_N_=2 FIRST.y=0 LAST.y=1 FIRST.x=0 LAST.x=1 FIRST.z=0 LAST.z=1
_N_=3 FIRST.y=1 LAST.y=0 FIRST.x=1 LAST.x=1 FIRST.z=1 LAST.z=1
_N_=4 FIRST.y=0 LAST.y=1 FIRST.x=1 LAST.x=1 FIRST.z=1 LAST.z=1
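The fruit data set is not shown in this section. The following sketch defines a hypothetical four-row version whose grouping pattern is consistent with the FIRST. and LAST. values in the log above; the actual values in the original data set might differ.

```sas
/* Hypothetical data: rows 1-2 are identical, row 3 changes y and z, */
/* row 4 changes x. The rows are in order for both BY x y z and      */
/* BY y x z, so no sort is needed before either DATA step.           */
data fruit;
   input x $ y $ z $;
   datalines;
apple banana coconut
apple banana coconut
apple cherry date
berry cherry date
;
```

In the second grouping, FIRST.z and LAST.z are both 1 for rows 3 and 4 even though z has the same value in both rows, because a change in an earlier BY variable (x) starts a new group for every later BY variable.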
Overview
The most common use of BY-group processing is to combine data sets by using the
BY statement with the SET, MERGE, MODIFY, or UPDATE statements. (If you use a
SET, MERGE, or UPDATE statement with the BY statement, your observations
must be grouped or ordered.) When processing these statements, SAS reads one
observation at a time into the program data vector. With BY-group processing, SAS
selects the observations from the data sets according to the values of the BY
variable or variables. After processing all the observations from one BY group, SAS
expects the next observation to be from the next BY group.
The BY statement modifies the action of the SET, MERGE, MODIFY, or UPDATE
statement by controlling when the values in the program data vector are set to
missing. During BY-group processing, SAS retains the values of variables until it has
copied the last observation that it finds for that BY group in any of the data sets.
Without the BY statement, the SET statement sets variables to missing when it
reads the last observation. The MERGE statement does not set variables to missing
after the DATA step starts reading observations into the program data vector.
This example assumes that the data is grouped by the character variable MONTH.
The subsetting IF statement conditionally writes an observation, based on the
value of LAST.month. This DATA step writes an observation only after processing
the last observation in each BY group.
data sales;
input month
data total_sale(drop=sales);
set region.sales;
by month notsorted;
total+sales;
if last.month;
run;
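Here is a self-contained sketch of the same technique. The data set work.sales and its values are hypothetical. The line that resets total at the start of each BY group is an addition for this sketch; without it, total is a running total across all groups.

```sas
/* Hypothetical input: rows are grouped by month but not sorted alphabetically */
data work.sales;
   input month $ sales;
   datalines;
Jan 100
Jan 250
Feb 300
Mar 50
Mar 75
;

data work.total_sale(drop=sales);
   set work.sales;
   by month notsorted;             /* grouped, not sorted: NOTSORTED is required */
   if first.month then total = 0;  /* reset the accumulator for each BY group    */
   total + sales;                  /* sum statement: total is retained           */
   if last.month;                  /* write one row per BY group                 */
run;
```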
The GROUPFORMAT option is valid only in the DATA step that creates the SAS
data set. It is particularly useful with user-defined formats. The following examples
illustrate the use of the GROUPFORMAT option.
data _null_;
format height range.;
set sorted_class;
by height groupformat;
if first.height then
put 'Shortest in ' height 'measures ' height:best12.;
run;
Shortest in Under 55 measures 51.3
Shortest in 55 to 60 measures 56.3
Shortest in 60 to 65 measures 62.5
Shortest in 65 to 70 measures 65.3
Shortest in Over 70 measures 72
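The format range. and the data set sorted_class are not defined in this section. The following setup is an assumption that is consistent with the output above: the heights match Sashelp.Class, and the boundary handling in the format (for example, whether 55 belongs to the first or second range) is a guess.

```sas
/* Assumed user-defined format; range boundaries are illustrative */
proc format;
   value range
      low-<55 = 'Under 55'
      55-<60  = '55 to 60'
      60-<65  = '60 to 65'
      65-<70  = '65 to 70'
      70-high = 'Over 70';
run;

/* Assumed input: Sashelp.Class sorted by the unformatted height values */
proc sort data=sashelp.class out=sorted_class;
   by height;
run;
```

Because GROUPFORMAT groups by the formatted values, FIRST.height is set once per range rather than once per distinct height.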
20
Loops and Conditionals
Conditional processing
statements or expressions in SAS that perform different computations or
actions depending on whether a programmer-specified condition is true or false.
For a list of these statements, see “Summary of Statements for Conditional
Processing in SAS” on page 423.
Control flow
programming constructs in SAS that control the sequence and flow of program
execution. For example, conditional processing statements and DO loops are
types of control flow statements. For a list of additional control flow
statements in SAS, see “Other Control Flow Statements” on page 447.
For more information, see “Altering the Flow for a Given Observation” on page
874.
DO group
a group of SAS DATA step statements that begins with the DO statement and
ends with the END statement. DO group statements are executed as a unit. A
DO group can consist of an iterative DO statement (also known as a DO loop),
or it can consist of a simple DO statement. Simple DO statements are often
used with conditional IF-THEN/ELSE statements.
DO loop
instructions that are continually repeated until a certain condition is reached.
The iterative DO statement enables you to create DO loops. For more
information, see “Types of DO Loops” on page 425.
DOLIST syntax
a type of syntax that is based on the iterative DO loop, in which the loop
iterates over the elements of a list. The list serves as the start-value in the DO
statement and the items in the list are separated by commas. DOLIST syntax
can also be specified in the iterative DO WHILE and DO UNTIL statements. For
more information, see “Types of DO Loops” on page 425.
WHERE expression
a WHERE expression is a type of SAS expression that defines a condition for
selecting rows to be processed in a SAS data set. The following WHERE
statement defines a single condition for selecting rows:
where sales > 600000;
WHERE clause in the PROC SQL step
conditionally selects rows that meet specified conditions.
Example: proc sql; select state from crime where murder > 7; quit;
Table 20.2 Base SAS Procedures That Support WHERE Expression Processing
RANK procedure
DO Loops
Types of DO Loops
In programming, you might need to execute the same set of statements repeatedly
or execute the same set of statements for a specific number of times. This type of
processing requires the use of loops. In SAS, you create loops by specifying the DO
statement. There are four basic DO loops used in SAS programming:
Log output
x=1 i=1
x=2 i=2
x=3 i=3
Log output
x=8
x=9
x=10
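As a rough sketch of how iterative and conditional DO loops differ, the following DATA step runs one loop of each kind and writes the loop variables to the log. The variable names are arbitrary and are not taken from the examples above.

```sas
data _null_;
   /* Iterative DO: runs a fixed number of times */
   do i = 1 to 3;
      put 'iterative: ' i=;
   end;

   /* DO WHILE: the condition is tested at the top of the loop */
   x = 0;
   do while (x < 3);
      x = x + 1;
      put 'while: ' x=;
   end;

   /* DO UNTIL: the condition is tested at the bottom of the loop, */
   /* so the body runs at least once even though y starts at 10    */
   y = 10;
   do until (y >= 10);
      y = y + 1;
      put 'until: ' y=;
   end;
run;
```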
WHERE Expressions
WHERE Statement
You can use the WHERE statement in SAS to select rows from a SAS data set that
meet a particular condition.
data cars;
set sashelp.cars;
where weight>6000;
run;
For complete syntax information, see “WHERE Statement” in SAS DATA Step
Statements: Reference.
Note: By default, a WHERE expression does not evaluate added and modified
rows. To specify whether a WHERE expression should evaluate updates, you can
specify the WHEREUP= data set option. See the “WHEREUP= Data Set Option” in
SAS Data Set Options: Reference.
See Also
“Example: Conditionally Select Rows Using the WHERE Statement” on page 457
This example creates a SAS data set that selects only rows in which the date is
later than January 1, 1953. This value is an example of a SAS date constant and it is
also used with the comparison operator, greater than (>) to perform a comparison
of date values.
data air;
set sashelp.air;
where date>'01jan1953'd;
run;
For example, in the following DATA step, the WHERE expression returns all rows
from the sashelp.heart data set in which the numeric values for AgeCHDdiag are
not missing or zero and the character values for DeathCause are not blank or
missing:
data heart;
set sashelp.heart;
where AgeCHDdiag and DeathCause;
run;
The following DATA step uses the SUBSTR function to produce a SAS data set that
contains only rows in which name begins with Jan:
data class; set sashelp.class; run;
data J_class;
set class;
where substr (name,1,3) = 'Jan';
run;
proc print data=J_class;
run;
Output 20.1 PROC PRINT Output Showing the SUBSTR Function Used in a WHERE
Statement
The following SAS functions are commonly used with WHERE expressions:
• SUBSTRN function to extract a substring.
OF syntax is permitted in some SAS functions. For example, in the following DATA
step, OF is used with the RANGE function to return the difference between the
largest and smallest values in the list n1–n3. This is permitted as long as the
function containing the OF operator is not specified in a WHERE expression.
data test;
n1=1;
n2=2;
n3=3;
num = range(of n1-n3);
run;
If you specify this same function with OF in a WHERE expression, SAS returns an
error to the log:
Instead of using an OF in a WHERE clause, you can enumerate all the variables
from the OF list:
where num < range(n1,n2,n3);
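The following sketch puts the two forms side by side. The data set names and the value of num are hypothetical.

```sas
data test;
   n1 = 1;
   n2 = 2;
   n3 = 3;
   num = 1;
   spread = range(of n1-n3);   /* OF is allowed in a DATA step expression */
run;

data subset;
   set test;
   /* In a WHERE expression, list the variables explicitly instead of using OF */
   where num < range(n1, n2, n3);
run;
```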
Note: You can use the TRIM function and the SUBSTR function in a WHERE
expression to improve performance on indexed data. For more information, see
“Indexes and WHERE Expressions” on page 442.
See Also
• “The OF Operator with Functions and Variable Lists” on page 100
• SAS Functions and CALL Routines: Reference
Constant
Type Description Example
Arithmetic Operators
Arithmetic operators enable you to perform a mathematical operation. The
arithmetic operators include the following:
Operator Symbol   Description   Example
/                 division      where f = g / h;
Comparison Operators
Comparison operators (also called binary operators) compare a variable with a
value or another variable.
For example, the following WHERE expression accesses only those rows that have
the value 78753 for the numeric variable zipcode:
where zipcode eq 78753;
When you do character comparisons, you can use the colon (:) modifier to compare
only a specified prefix of a character string. For example, in the following WHERE
expression, the colon modifier, used after the equal sign, tells SAS to look at only
the first character in the values for variable LastName and to select the rows with
names beginning with the letter S:
where lastname=: 'S';
Note that in the SQL procedure, the colon modifier that is used in conjunction with
an operator is not supported. You can use the LIKE operator instead.
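For illustration, the following sketch selects the same rows in two ways: with the colon modifier in a DATA step WHERE statement, and with the LIKE operator in PROC SQL. It assumes that the Sashelp.Class sample data set is available; the output data set name is arbitrary.

```sas
/* DATA step: the colon modifier compares only the first character */
data j_names;
   set sashelp.class;
   where name =: 'J';
run;

/* PROC SQL: use LIKE with a trailing % for the same prefix match */
proc sql;
   select name
      from sashelp.class
      where name like 'J%';
quit;
```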
Logical Operators
You can combine or modify WHERE expressions by using the Boolean logical
operators AND, OR, and NOT. The basic syntax of a compound WHERE expression
is as follows:
The logical operators and their equivalent symbols are shown in the following
table:
Mnemonic
Symbol Equivalent Description Example
3 OR – processed last.
For example, suppose that you want a list of all the Canadian sites that have either
SAS/GRAPH or SAS/STAT software. You issue the following expression without
using parentheses:
where product='GRAPH' or product='STAT' and country='Canada';
The result, however, includes all sites that license SAS/GRAPH software along with
the Canadian sites that license SAS/STAT software. To obtain the correct results,
you can use parentheses, which causes SAS to evaluate the comparisons within the
parentheses first, providing a list of sites with either product licenses, then the
result is used for the remaining condition:
where (product='GRAPH' or product='STAT') and country='Canada';
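The effect of the parentheses can be checked with a small test. The data set sites and its four rows are hypothetical.

```sas
data sites;
   input product $ country $;
   datalines;
GRAPH Canada
GRAPH USA
STAT Canada
STAT USA
;

/* AND is evaluated before OR: every GRAPH site is kept, plus Canadian STAT sites (3 rows) */
data no_parens;
   set sites;
   where product='GRAPH' or product='STAT' and country='Canada';
run;

/* The parentheses force the OR to be evaluated first: Canadian sites only (2 rows) */
data with_parens;
   set sites;
   where (product='GRAPH' or product='STAT') and country='Canada';
run;
```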
IN Operator
The IN operator is also a comparison operator. It searches for character and
numeric values that are equal to one of the values from a list of values. The list of
values must be in parentheses, with each character value in quotation marks and
separated by either a comma or blank.
For example, suppose that you want all sites that are in North Carolina or Texas.
You could specify:
where state = 'NC' or state = 'TX';
However, it is easier to use the IN operator, which selects any state in the list:
where state in ('NC','TX');
In addition, you can use the NOT logical operator to exclude a list:
where state not in ('CA', 'TN', 'MA');
• y = x in (1:10);
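A short sketch of both uses follows, assuming the Sashelp.Class sample data set. The output data set names and the variable in_range are hypothetical.

```sas
/* Numeric and character lists in a WHERE expression */
data selected;
   set sashelp.class;
   where age in (13, 14) and sex in ('F');
run;

/* Integer range shorthand in a DATA step expression: */
/* in_range is 1 when age is 11, 12, or 13, and 0 otherwise */
data flagged;
   set sashelp.class;
   in_range = age in (11:13);
run;
```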
See Also
“Types of Variable Lists” on page 99
Interval Comparisons
An interval comparison (or, fully bounded range condition) consists of a variable
between two comparison operators, specifying both an upper and lower limit. For
example, the following expression returns the employee numbers that fall within
the range of 500 to 1000. In this case, since the comparison operator, less than ( < )
is accompanied by an equal sign, this comparison is considered an inclusive
interval:
where 500 <= empnum <= 1000;
Note that the previous range condition expression is equivalent to the following:
where empnum >= 500 and empnum <= 1000;
You can combine the NOT logical operator with a fully bounded range condition to
select rows that fall outside the range. Note that parentheses are required:
where not (500 <= empnum <= 1000);
BETWEEN-AND Operator
The BETWEEN-AND operator is also considered a fully bounded range condition
that selects rows in which the value of a variable falls within an inclusive range of
values.
You can specify the limits of the range as constants or expressions. Any range that
you specify is an inclusive range, so that a value equal to one of the limits of the
range is within the range. The general syntax for using BETWEEN-AND is as
follows:
For example:
where empnum between 500 and 1000;
where taxes between salary*0.30 and salary*0.50;
You can combine the NOT logical operator with the BETWEEN-AND operator to
select rows that fall outside the range:
where empnum not between 500 and 1000;
Note: The BETWEEN-AND operator and a fully bounded range condition produce
the same results. That is, the following WHERE expressions are equivalent:
where 500 <= empnum <= 1000;
where empnum between 500 and 1000;
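As a quick check of this equivalence, the following sketch applies both forms, and the negated form, to the Sashelp.Class sample data set. The variable height and the limits 60 and 65 are chosen only for illustration.

```sas
/* Fully bounded range condition */
data range_cond;
   set sashelp.class;
   where 60 <= height <= 65;
run;

/* BETWEEN-AND: selects the same rows as the step above */
data between_cond;
   set sashelp.class;
   where height between 60 and 65;
run;

/* NOT BETWEEN-AND: selects the remaining rows with nonmissing height */
data outside;
   set sashelp.class;
   where height not between 60 and 65;
run;
```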
CONTAINS Operator
The most common usage of the CONTAINS (?) operator is to select rows by
searching for a specified set of characters within the values of a character variable.
The position of the string within the variable's values does not matter. However, the
operator is case sensitive when making comparisons.
The following examples select rows that have the values Mobay and Brisbayne for
the variable Company, but they do not select rows that contain Bayview:
where company contains 'bay';
where company ? 'bay';
You can combine the NOT logical operator with the CONTAINS operator to select
rows that do not contain a specified string:
where company not contains 'bay';
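The case sensitivity described above can be seen with the three company names from the example. The data set names are hypothetical.

```sas
data companies;
   length company $ 20;
   input company $;
   datalines;
Mobay
Brisbayne
Bayview
;

/* Selects Mobay and Brisbayne; Bayview has an uppercase B and is not selected */
data has_bay;
   set companies;
   where company contains 'bay';
run;

/* Selects only Bayview */
data no_bay;
   set companies;
   where company not contains 'bay';
run;
```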
You can also use the CONTAINS operator with two variables to determine whether
one variable is contained in another. When you specify two variables, keep in mind
the possibility of trailing spaces, which can be resolved using the TRIM function.
proc sql;
select *
from table1 as a, table2 as b
where a.fullname contains trim(b.lastname) and
a.fullname contains trim(b.firstname);
quit;
In addition, the TRIM function is helpful when you search on a macro variable.
proc print;
where fullname contains trim("&lname");
run;
And the following is equivalent for numeric data. This statement differentiates
missing values with special missing value characters:
where idnum <= .Z;
You can combine the NOT logical operator with IS NULL or IS MISSING to select
nonmissing values, as follows:
where salary is not missing;
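The following sketch shows how ordinary and special missing values are treated. The data set staff and its values are hypothetical; the MISSING statement is used so that the letter A in the input data is read as the special missing value .A.

```sas
data staff;
   missing A;              /* read A in the raw data as the special missing value .A */
   input idnum salary;
   datalines;
1 52000
2 .
3 A
;

/* Keeps only row 1: both . and .A count as missing */
data paid;
   set staff;
   where salary is not missing;
run;

/* Keeps rows 2 and 3: every numeric missing value sorts at or below .Z */
data unpaid;
   set staff;
   where salary <= .Z;
run;
```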
LIKE Operator
The LIKE operator selects rows by pattern matching; that is, it selects rows by
comparing the values of a character variable to a specified pattern.
The LIKE operator is case sensitive and it uses two special characters for specifying
a pattern:
underscore (_)
matches just one character in the value for each underscore character. You can
specify more than one consecutive underscore character in a pattern, and you
can specify a percent sign and an underscore in the same pattern. For example,
you can use different forms of the LIKE operator to select character values from
this list of first names:
• Diana
• Diane
• Dianna
• Dianthus
• Dyan
The following table shows which of these names is selected by using various forms
of the LIKE operator:
You can use a SAS character expression to specify a pattern, but you cannot use a
SAS character expression that uses a SAS function.
You can combine the NOT logical operator with LIKE to select values that do not
have the specified pattern, such as the following:
where frstname not like 'D_an%';
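To experiment with these patterns, the list of first names can be loaded into a small data set. The data set names are hypothetical, and the comments describe the matches that the pattern rules above imply.

```sas
data names;
   input frstname $;
   datalines;
Diana
Diane
Dianna
Dianthus
Dyan
;

/* One underscore after D, nothing after "an": selects Dyan */
data four_letters;
   set names;
   where frstname like 'D_an';
run;

/* One more character after "an": selects Diana and Diane */
data five_letters;
   set names;
   where frstname like 'D_an_';
run;

/* % matches any number of characters: selects all five names */
data any_tail;
   set names;
   where frstname like 'D_an%';
run;
```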
The % and _ characters have a special meaning for the LIKE operator. When
searching for patterns that contain the % and _ characters, you must use an escape
character.
For example, if the variable X contains the values abc, a_b, and axb, the following
LIKE operator with an escape character selects only the value a_b. The escape
character (/) specifies that the pattern searches for a literal underscore (_) that
is surrounded by the characters a and b. The escape character (/) is not part of
the search.
where x like 'a/_b' escape '/';
Without an escape character, the following LIKE operator selects the values a_b
and axb. The underscore in the search pattern matches any single character,
including the underscore itself:
where x like 'a_b';
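Both patterns can be tried on the three values from the example. The data set names are hypothetical.

```sas
data patterns;
   input x $;
   datalines;
abc
a_b
axb
;

/* The escaped underscore is matched literally: selects a_b only */
data literal_match;
   set patterns;
   where x like 'a/_b' escape '/';
run;

/* The unescaped underscore is a wildcard: selects a_b and axb */
data wildcard_match;
   set patterns;
   where x like 'a_b';
run;
```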
Sounds-like Operator
The sounds-like ( =*) operator selects rows that contain a spelling variation of a
specified word or words. The operator uses the Soundex algorithm to compare the
variable value and the operand. For more information, see the SOUNDEX function
in SAS Functions and CALL Routines: Reference.
Note: The SOUNDEX algorithm is English-biased, and is less useful for languages
other than English.
Although the sounds-like operator is useful, it does not always select all possible
values. For example, consider that you want to select rows from the following list
of names that sound like Smith:
• Schmitt
• Smith
• Smithson
• Smitt
• Smythe
The following WHERE expression selects all the names from this list except
Smithson:
where lastname=* 'Smith';
You can combine the NOT logical operator with the sounds-like operator to select
values that do not contain a spelling variation of a specified word or words, as in
the following example:
where lastname not =* 'Smith';
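The list of names above can be tested with a sketch like the following. The data set names are hypothetical; the comment restates the result that the text above describes.

```sas
data surnames;
   input lastname $;
   datalines;
Schmitt
Smith
Smithson
Smitt
Smythe
;

/* Per the discussion above, every name except Smithson is selected */
data sounds_like_smith;
   set surnames;
   where lastname =* 'Smith';
run;
```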
SAME-AND Operator
Use the SAME-AND operator to add more conditions to an existing WHERE
expression later in the program without retyping the original conditions. This
capability is useful with the following:
• interactive SAS procedures
Use the SAME-AND operator with a WHERE expression when you want to insert
additional conditions. The SAME-AND operator has the following form:
WHERE where-expression-1;
... SAS statements ...
WHERE SAME AND where-expression-2;
... SAS statements ...
SAS selects rows that satisfy the conditions after the SAME-AND operator, in
addition to any previously defined conditions. SAS treats all of the existing
conditions as if they were conditions separated by AND operators in a single
WHERE expression.
The following example shows how to use the SAME-AND operator within RUN
groups in the GPLOT procedure. The SAS data set YEARS has three variables and
contains quarterly data for the 2009–2011 period:
proc gplot data=years;
plot unit*quar=year;
run;
where year > '01jan2009'd;
run;
where same and year < '01jan2012'd;
run;
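For sites without SAS/GRAPH, the same idea can be sketched with PROC PRINT and the Sashelp.Class sample data set. This is an illustration under the assumption that a second WHERE statement with SAME AND augments the first one within a single step; the conditions are arbitrary.

```sas
proc print data=sashelp.class;
   where age > 12;
   where same and sex = 'F';   /* equivalent to: where age > 12 and sex = 'F'; */
run;
```

Without the SAME AND keywords, the second WHERE statement would replace the first one instead of adding to it.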
For example, if A is less than B, then the following would return the value of A:
where x = (a min b);
Note: The symbol representation >< is not supported, and <> is interpreted as “not
equal to.”
Concatenation Operator
The concatenation operator concatenates character values. You indicate the
concatenation operator as follows:
• || (two OR symbols)
For example:
where name = 'John'||'Smith';
Prefix Operators
The plus sign (+) and the minus sign (-) can be either prefix operators or arithmetic
operators. They are prefix operators when they appear at the beginning of an
expression or immediately preceding an open parenthesis. A prefix operator is
applied to the variable, constant, SAS function, or parenthetic expression.
where z = -(x + y);
Guideline: Avoid using the LIKE operator with a pattern that begins with % or _.
Efficient: where country like 'A%INA';
Inefficient: where country like '%INA';

Guideline: Avoid using arithmetic expressions that contain only constants.
Efficient: where salary > 48000;
Inefficient: where salary > 12*4000;
For example, the following DATA step creates the data set prdIndx from
sashelp.prdsal2 with an index on the variable Actual. The index optimizes the
selection of rows based on the values of Actual:
data prdIndx(index=(Actual));
set sashelp.prdsal2;
run;
proc print data=prdIndx(where=(Actual<10));
run;
Assume that the data set sashelp.prdsal2, which has over 23,000 rows, is sorted
by the variable Actual, without an index. To process the expression where
Actual<10, SAS starts reading rows sequentially, beginning with the first row in the
data set, and stops reading rows only after it finds a row in which Actual is not
less than 10.
SAS can determine when to stop reading rows, but without an index, there is no
indication of where to begin. So, SAS begins with the first row. This can require
reading a lot of rows.
Having an index enables SAS to determine which rows satisfy the criteria. This is
called optimizing the WHERE expression.
Note: However, by default, SAS decides whether to use the index or to read the
entire data set sequentially.
For information about SAS indexes, see “Using an Index for WHERE Processing” in
SAS V9 LIBNAME Engine: Reference.
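To see which choice SAS makes, or to override it, you can try a sketch like the following. It assumes that the MSGLEVEL=I system option and the IDXWHERE= data set option behave as described in the SAS system options and data set options references.

```sas
options msglevel=i;   /* write INFO notes about index usage to the SAS log */

data prdIndx(index=(Actual));
   set sashelp.prdsal2;
run;

/* Ask SAS to use the index for the WHERE expression */
proc print data=prdIndx(idxwhere=yes);
   where Actual < 10;
run;

/* Ask SAS to ignore the index and read the data set sequentially */
proc print data=prdIndx(idxwhere=no);
   where Actual < 10;
run;
```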
The following example uses the FIRSTOBS= and OBS= data set options in the
PROC PRINT statement to print a subset of the data returned by the WHERE
statement.
data shoes;
set sashelp.shoes;
keep Product Sales;
run;
proc print data=shoes(firstobs=2 obs=5);
title 'Subset of Shoes WHERE Sales<800';
title2 'Print rows 2 - 5';
where Sales<800;
run;
Note: Notice that the values for the data set options are 2 and 5 and not 176 and
297. This is because the PRINT output preserves the physical numbers from the
original Shoes input data set. The data set option numbers represent the logical row
numbers from the filtered data set and not the physical row numbers.
If you are processing a SAS view that is a view of another view (nested views),
applying OBS= and FIRSTOBS= to a subset of data might produce unexpected
results. For nested views, OBS= and FIRSTOBS= processing is applied to each SAS
view, starting with the root (lowest-level) view, and then filtering rows for each
SAS view. The result might be that no rows meet the criteria.
See Also
“Example: Data Set Options” on page 469
IF Statements
IF statements execute code only if a condition is satisfied. The subsetting IF and
the WHERE statements both test a condition to determine whether SAS should
process a row. However, the two statements are not equivalent.
For a comparison of these two statements, see “Subsetting IF Statement versus the
WHERE Statement” on page 444.
Types of IF Statements
There are two types of IF statements in SAS:
Statement
Type Description Example
statement is required. The following table provides tasks that require you to use
either one of the statements.
Make the selection at some point during a DATA step subsetting IF statement
rather than at the beginning.
WHERE statement (the condition is tested before the row is read):
n If the condition is true, SAS reads the row into the PDV and processes the row.
n If the condition is false, SAS does not read the row into the PDV and
continues processing with the next row.
n Testing the condition before the row is read can yield substantial savings
when rows contain many variables or very long character variables (up to
32K bytes).
Subsetting IF statement (the condition is tested after the row is read):
n If the condition is true, SAS processes the row.
n If the condition is false, SAS removes the row from the PDV and continues
processing with the next row.
n Not testing the condition before the row is read can be expensive, especially
when rows contain many variables or very long character variables.
n If the data set contains rows with very few variables or short character
variables, moving data to the PDV is likely to be fast. However, a variable
containing 32K bytes of character data takes longer to move even though the
data is contained in only one variable. In this case, a WHERE expression is
generally more efficient because it avoids reading unnecessary data to the PDV.
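The difference in timing can be sketched with three short DATA steps. This is a minimal illustration rather than an example from this guide: the output data set names, the variable Ratio, and the threshold 0.6 are hypothetical. The first two steps select the same rows from sashelp.class. The third step shows a selection that must be made with a subsetting IF, because the tested variable is created during the DATA step and does not exist in the input data set.

```sas
/* WHERE: the condition is tested before a row is read into the PDV. */
data work.teens_where;
   set sashelp.class;
   where Age >= 13;
run;

/* Subsetting IF: every row is read into the PDV and then tested. */
data work.teens_if;
   set sashelp.class;
   if Age >= 13;
run;

/* Ratio (hypothetical) is created in the DATA step, so the selection  */
/* is made partway through the step with a subsetting IF statement.    */
data work.by_ratio;
   set sashelp.class;
   Ratio = Height / Weight;
   if Ratio > 0.6;
run;
```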
See Also
n “Examples: IF Statements” on page 464
n “Other Control Flow Statements” on page 447
n “Creating the Input Buffer and the Program Buffer” on page 867.
otherwise delete;
end;
run;
proc print;
See Also
“Example: SELECT WHEN Statement” on page 467
Statement Description
Examples: DO Loops
Example Code
This example uses an iterative DO statement to repeatedly decrement the values
for the variable balance and write these values to the output data set.
data loan;
balance=10000;
do payment_number=1 to 10;
balance=balance-1000;
output;
end;
run;
proc print data=loan;
run;
Output 20.2 PROC PRINT Output Showing Balance Decrease in a SAS Data Set
Using an Iterative DO Loop
Key Ideas
See Also
“DO Statement: Iterative” in SAS DATA Step Statements: Reference
Example Code
The following example shows how you can place one DO loop within another to
compute the value of a 10-year investment that earns 7.5% annual interest,
compounded monthly.
data earn;
Capital=2000;
do Year=1 to 10;
do Month=1 to 12;
Interest=Capital*(.075/12);
Capital+Interest;
output;
end;
end;
run;
proc print data=earn; run;
Output 20.3 Partial PROC PRINT Output Showing Compound Interest Earned Using
Nested DO Loops
Key Ideas
See Also
“DO Statement: Iterative” in SAS DATA Step Statements: Reference
Example Code
This example uses DOLIST syntax to iterate over a list of values.
data do_list;
x=-5;
do i=5, /*a single value*/
5, 4, /*multiple values*/
x + 10, /*an expression*/
80 to 90 by 5, /*a sequence*/
60 to 40 by x; /*a sequence with a variable*/
output;
end;
run;
proc print data=do_list; run;
Key Ideas
See Also
n “Types of DO Loops” on page 425.
n “DO Statement: Iterative” in SAS DATA Step Statements: Reference
Example Code
This example uses the iterative DO TO-value syntax to repeatedly increment the
values of x. The loop also specifies the OUTPUT statement to write each
incremented value of x to the output data set.
data do_to;
x=0;
do i=0 to 10;
x=x+1;
output;
end;
run;
proc print data=do_to;
run;
Key Ideas
n With DO TO-value syntax, the TO value specifies the ending value for the
index-variable in an iterative DO loop.
See Also
n “DO Statement: Iterative” in SAS DATA Step Statements: Reference
n “Using Various Forms of the Iterative DO Statement” in SAS DATA Step
Statements: Reference
Example Code
This example uses the iterative DO TO-value BY-increment syntax to repeatedly
increment values of x by 1. The BY-increment value specifies that the index variable
increments by 2 each time that the loop executes.
The loop also specifies that the OUTPUT statement writes each incremented value
of x to the output data set.
data do_to_by;
x=0;
do i=0 to 10 by 2;
x=x+1;
output;
end;
run;
proc print data=do_to_by;
run;
Key Ideas
See Also
n “DO Statement: Iterative” in SAS DATA Step Statements: Reference
n “Using Various Forms of the Iterative DO Statement” in SAS DATA Step
Statements: Reference
Example Code
This example uses a DO loop to execute a group of statements repetitively while a
condition is true. The program calculates the number of payments that must be
made for a specified loan amount by iterating repeatedly to decrement Balance
while the balance is greater than zero.
data loan;
balance=10000;
payment=0;
do while (balance>0);
balance=balance-1000;
payment=payment+1;
output;
end;
run;
proc print data=loan;
run;
Output 20.7 PROC PRINT Output Showing Balance Decrease in a SAS Data Set
Using a DO WHILE Statement
Key Ideas
See Also
n “DO UNTIL Statement” in SAS DATA Step Statements: Reference
n “DO WHILE Statement” in SAS DATA Step Statements: Reference
n “DO Statement: Iterative” in SAS DATA Step Statements: Reference
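For comparison, the same loan calculation can be sketched with a DO UNTIL loop. A DO WHILE loop evaluates its condition at the top of the loop, whereas a DO UNTIL loop evaluates its condition at the bottom, so the body of a DO UNTIL loop always executes at least once. The data set name work.loan_until is hypothetical.

```sas
data work.loan_until;
   balance = 10000;
   payment = 0;
   /* The condition is evaluated after each pass through the loop. */
   do until (balance <= 0);
      balance = balance - 1000;
      payment = payment + 1;
      output;
   end;
run;

proc print data=work.loan_until;
run;
```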
Examples: WHERE Processing 457
Example Code
This example uses the WHERE statement to conditionally select rows from the
sashelp.shoes data set and writes the selected rows to the SAS data set shoes.
The first DATA step runs in SAS and subsets the data, selecting only the rows in
which Sales is greater than or equal to $500,000. The filtered data is loaded to
shoes.
The second DATA step runs on the shoes data set to further subset the data. This
DATA step selects only the rows in which the value of the variable Region is
Canada. The selected rows are then written to the output SAS data set shoes2.
data shoes;
set sashelp.shoes;
where Sales>=500000;
run;
proc print data=shoes;
var Region Product Sales;
run;
data shoes2;
set shoes;
where Region="Canada";
run;
Output 20.8 PROC Print Output for the shoes Table in Which Sales Are Greater
Than $500,000
Output 20.9 PROC Print Output for the shoes2 Table in Which Region Is Canada
Key Ideas
See Also
“WHERE Statement” in SAS DATA Step Statements: Reference
Example Code
This example uses a compound WHERE expression to select a subset of data from
the SAS data set sashelp.shoes.
The DATA step conditionally selects rows in which Sales is greater than or equal
to $500,000 and Region is Canada. The compound WHERE expression consists of two
WHERE expressions that are joined by the AND operator.
data shoes;
set sashelp.shoes;
where Sales>=500000 and Region="Canada";
keep Region Product Sales;
run;
Output 20.10 PROC PRINT Output for Conditionally Selecting Rows Using a
Compound WHERE Expression
Key Ideas
3 OR expressions last.
See Also
n “WHERE Statement” in SAS DATA Step Statements: Reference
n compound WHERE expression
Example Code
In the following example, the AND operator is used in the WHERE statement to
find rows based on conditions for Age and Sex.
data class;
set sashelp.class;
where sex="M" and age >= 15;
run;
proc print data=class;
run;
The output data set contains information about males who are 15 years or older:
Output 20.11 PROC PRINT Output for Conditionally Selecting Rows Based on
Multiple Conditions
Output 20.12 PROC PRINT Output for Conditionally Selecting Rows Based on
Multiple Conditions
In the following example, the less than symbol ( < ) finds rows that satisfy the
criteria for age and sex.
data class;
set sashelp.class;
where age < 15 and sex NE "M";
run;
proc print data=class;
title 'Finds Females Younger Than 15 Years';
run;
Output 20.13 PROC PRINT Output for Conditionally Selecting Rows Based on
Multiple Conditions
The order in which SAS processes expressions that are joined by Boolean operators
affects the output. The default order of operations is to process the NOT
expressions first, the AND expressions next, and the OR expressions last:
data class;
set sashelp.class;
where age>15 or height<60 and sex="F";
run;
proc print data=class;
title 'age > 15 OR height < 60 AND sex = F';
run;
Output 20.14 PROC PRINT Output for Conditionally Selecting Rows Based on
Multiple Conditions Using Multiple Boolean Operators
To control the order of evaluation, you use parentheses. Here is the same example
except parentheses are used to specify that the OR expression is evaluated first
and then the AND expression.
data class;
set sashelp.class;
where (age>15 or height<60) and sex="F";
run;
proc print data=class;
title '(age > 15 OR height < 60) AND sex = F';
run;
Output 20.15 PROC PRINT Output for Conditionally Selecting Rows and Controlling
the Order of Operations
Key Ideas
3 OR expressions last.
Example Code
This example uses the WHERE= data set option to conditionally select rows from
the sashelp.shoes data set.
data sales;
set sashelp.shoes(where=(Region="Canada" and Sales<2000));
run;
proc print data=sales; run;
Key Ideas
n The WHERE= data set option cannot be used with the POINT= option in
the SET and MODIFY statements.
See Also
n “WHERE= Data Set Option” in SAS Data Set Options: Reference
Example Code
In the following example, the first DATA step creates an output data set,
mybaseball, and the index= option adds a simple index to the team variable. The
second DATA step reads the indexed data set mybaseball, selects for processing
only those rows in which the team name is Atlanta, and writes them to the data
set atlanta.
data mybaseball(index=(team));
set sashelp.baseball;
run;
data atlanta;
set mybaseball;
where Team="Atlanta";
keep Name Team Position;
run;
proc print data=atlanta; run;
Key Ideas
See Also
“Indexes” in SAS V9 LIBNAME Engine: Reference
Examples: IF Statements
Example Code
This example shows how to use a subsetting IF statement to subset data from an
input SAS data set and write out the rows to an output data set. The SET
statement reads each row, and the DATA step continues to process a row only if
the condition Age=13 is true. Therefore, the program selects only the 3 rows in
which Age equals 13.
data class;
set sashelp.class;
if Age = 13;
run;
proc print data=class; run;
Output 20.16 PROC PRINT Output Showing How to Filter Data Using a Subsetting
IF Statement
The following DATA step continues to process only the rows in which the value
for the variable Age is missing. Because the input data does not contain any
missing values for Age, the IF condition evaluates to false for every row and no
rows are written to the output data set.
data class;
set sashelp.class;
if Age = .;
run;
proc print data=class; run;
Example Code 20.1 Log Output for Using the Subsetting IF Statement to Subset Data
Key Ideas
n The subsetting IF statement filters data from an input SAS data source
and writes out only the rows that satisfy a specified condition.
n If the expression is false (its value is 0 or missing), SAS stops processing
the current row: the remaining program statements in the DATA step are not
executed for that row, and the row is not written to the output. SAS then
returns to the beginning of the DATA step to process the next row.
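As a point of comparison, the same selection can be written with an IF-THEN statement and a DELETE statement. The following sketch is an illustration that is not taken from this guide; the output data set name work.class13 is hypothetical.

```sas
data work.class13;
   set sashelp.class;
   /* Same effect as the subsetting IF statement:  if Age = 13;       */
   /* DELETE stops processing the row and does not write it to output. */
   if not (Age = 13) then delete;
run;

proc print data=work.class13;
run;
```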
See Also
n subsetting IF statement.
n “IF-THEN/ELSE Statement” in SAS DATA Step Statements: Reference
Example Code
This example shows how to use an IF-THEN/ELSE statement to conditionally
select rows in which Region is Canada and Product is Slipper.
data slipper;
set sashelp.shoes;
format commission comma6.;
if Region = "Canada" and Product="Slipper"
then commission = (Sales - Returns) * .10;
else delete;
keep Region Product Sales Returns Commission;
run;
proc print data=slipper; run;
Output 20.17 PROC PRINT Output Showing How to Conditionally Select Data Using
an IF-THEN/ELSE Statement
Key Ideas
See Also
n subsetting IF statement
n “IF-THEN/ELSE Statement” in SAS DATA Step Statements: Reference
n “SELECT Statement” in SAS DATA Step Statements: Reference
Example Code
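This example shows how to use the DATA step SELECT statement to apply a
different calculation to Sales for each region. The WHERE= data set option first
subsets the input SAS data set to three regions.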
data shoes;
set sashelp.shoes(where=(Region='Canada' or Region='Pacific' or
Region='Asia'));
keep Region Sales Product;
select(Region);
when('Canada') sales = sales * .10;
when('Pacific') sales = sales * .09;
when('Asia') sales = sales * .07;
end;
run;
proc print data=shoes; run;
Output 20.18 Partial PROC PRINT Output Showing How to Use the SELECT
Statement to Conditionally Select Rows from a SAS Data Set
Key Ideas
See Also
n “SELECT Statement” in SAS DATA Step Statements: Reference
n subsetting IF statement
n “IF-THEN/ELSE Statement” in SAS DATA Step Statements: Reference
Example Code
The following example shows how to use the FIRSTOBS= and OBS= data set
options with the WHERE statement to specify a segment of data to process
conditionally.
In this example, the DATA step creates a data set named Example that contains 10
rows and two variables, i and x:
data example;
do i=1 to 10;
x=i + 1;
output;
end;
run;
The following PROC step contains two separate subsetting actions: the WHERE
statement and the data set options FIRSTOBS= and OBS=. The WHERE statement
is processed first on the original input data set example, and then the two data set
options are processed on the results of the WHERE statement.
WHERE i > 5 tells SAS to select only the rows in which i is greater than 5 from
the original input data set. This is a subset of the original data set containing only 5
rows, which are rows 6 through 10 of the original input data set.
SAS then processes the data set options, FIRSTOBS=2 and OBS=4, on the resulting 5
rows and prints the 2nd through 4th rows of the WHERE results. The PRINT
procedure prints rows 7, 8, and 9 from the original input data set.
proc print data=example (firstobs=2 obs=4);
where i > 5;
run;
Output 20.20 PROC PRINT Selects Rows Where i>5 and Then from That Subset,
Rows 2 through 4
Output 20.21 PROC PRINT Output Showing Data Set Example Containing Rows 7,
8, and 9
Key Ideas
See Also
n “FIRSTOBS= Data Set Option” in SAS Data Set Options: Reference
n “OBS= Data Set Option” in SAS Data Set Options: Reference
21
Combining Data
Terms
The following terms are used throughout this chapter and are defined here in the
context of combining data:
Overview of Combining Data 475
common variable
a variable that is present in all of the input data sets that are being combined. A
common variable can be specified as the BY variable, but it does not have to be.
Variables are recognized as the same, or common to multiple data sets if they
have the same name regardless of the use of upper or lowercase letters in the
name. For example, SAS considers a variable called name in one data set to be
identical to a variable called NAME in another data set. In addition to having the same
name, common variables must also have the same data type. See “Example:
Prepare Data in Which Common Variables Have Different Data Types” on page
501 for more information.
BY variable
a variable (or variables) whose values serve as the basis for how rows are
merged when you combine multiple data sets into one. BY variables are named in
the BY statement that accompanies a SET, MERGE, UPDATE, or MODIFY statement.
In the context of combining data, BY variables are always common variables.
See “Example: Examine the Data to Be Combined” on page 495
and “Example: Creating Unique BY Values When Data Contains Duplicate BY
Values ” on page 499 for more information.
BY value
the value that a BY variable has in a particular row or observation.
BY group
a group of rows that have the same value for a variable that is specified in a BY
statement. If more than one variable is specified in a BY statement, then the BY
group is a group of rows that have a unique combination of values for those
variables.
data relationships
associations between data sets or tables that exist in one of three ways: one-
to-one, one-to-many (or, many-to-one), and many-to-many. The data sets are
associated by common data, either at the physical or logical level. For example,
in a business database, employee data and department data could be related
through an Employee_ID variable that shares common values. Another data set
could contain a numeric sequence of numbers whose partial values logically
relate it to a separate data set by row number. See “Data Relationships” for
more information about data relationships and how they relate to combining
data.
matched rows
rows in input data sets that are selected for output when the values match the
selected criteria. The matching criteria can be based on row number (or position
within the data set) or it can be based on the values of variables that are
specified by the user in a BY statement.
unmatched rows
rows in input data sets that are not selected for output because the values do
not match the specified criteria. The matching criteria can be based on row
number (or position within the data set) or it can be based on the values of
variables that you specify in a BY statement.
476 Chapter 21 / Combining Data
Data Relationships
The following categories are characterized by how rows relate among the data
sets (one-to-many and many-to-one are described together). All related data fall
into one of these categories. You must be able to identify
the existing relationships in your data, because this knowledge is crucial to
understanding how input data can be processed to produce desired results.
One-to-One Relationship
In a one-to-one relationship, a single row in one data set is related to only one row
in a second data set, and a single row in the second data set is related to only one
row in the first data set. The relationship is based on the values of one or more
selected variables. A one-to-one relationship implies that there are no duplicate
values of the selected variable in each data set. When you work with multiple
selected variables, this relationship implies that each combination of values occurs
no more than once in each data set.
In the figure, rows in data sets Animal and Plant are related by matching values for
the variable Common. The values for the variable Common are unique in both data sets.
One-to-Many Relationship
A one-to-many relationship between input data sets implies that one data set has
at most one row with a specific value of the selected variable, but the other input
data set can have more than one, or duplicates, of each value. When you work with
multiple selected variables, this relationship implies that each combination of
values occurs no more than once in one data set. The combination can occur more
than once in the other data set. The order in which the input data sets are
processed determines whether the relationship is one-to-many or many-to-one.
In the figure, rows in data sets Animal and Plant are related by common values for
the variable Common. Values for the variable Common are unique in the data set
Animal but not in data set Plant.
In the figure, rows in data sets Animal, Plant, and Mineral are related by common
values for the variable Common. Values for Common are unique in data sets Animal
and Mineral but not in Plant. A one-to-many relationship exists between rows in
data sets Animal and Plant and a many-to-one relationship exists between rows in
data sets Plant and Mineral.
Many-to-Many Relationship
The many-to-many category implies that multiple rows from each input data set
can be related based on the values of one or more common variables.
In the figure, rows in data sets Animal and Plant are related by common values for
the variable Common. Values for the variable Common are not unique in either data set.
A many-to-many relationship exists between rows in these data sets for values a
and c.
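One way to identify which relationship exists is to look for duplicate values of the common variable in each input data set. The following sketch is a hedged illustration: the data values are hypothetical stand-ins for the Animal and Plant data sets in the figures, and N_Rows is a name chosen here for the count column.

```sas
/* Hypothetical data: Common is unique in Animal but not in Plant. */
data work.animal;
   input Common $ Animal $;
   datalines;
a Ant
b Bird
c Cat
;

data work.plant;
   input Common $ Plant $;
   datalines;
a Apple
b Banana
b Beet
c Celery
;

/* List the values of Common that occur more than once in Plant. */
proc sql;
   select Common, count(*) as N_Rows
      from work.plant
      group by Common
      having count(*) > 1;
quit;
```

If the same query returns no rows for both data sets, the relationship is one-to-one. If it returns rows for only one of them, the relationship is one-to-many or many-to-one.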
Statement or
Type Method Procedure Simplified Example Code
Concatenating
Definition Concatenating Two Data Sets
In the figure, the rows from Data2 are appended to the rows from
Data1.
Details
n Concatenating using the SET statement processes the input data sets
sequentially, in the order in which they are listed in the SET statement.
n Concatenating does not require that input data sets contain the same variables.
n Concatenating creates output that contains all the rows from all the input data
sets and all the columns from all the input data sets.
n Concatenating sets the values of variables that do not exist in all input data
sets to missing.
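The points above can be sketched as follows. The data values are hypothetical, and the two input data sets deliberately contain different variables to show how missing values are assigned.

```sas
/* Hypothetical input data sets that do not contain the same variables. */
data work.data1;
   input Year X $;
   datalines;
2005 x1
2006 x2
;

data work.data2;
   input Year Y $;
   datalines;
2007 y1
2008 y2
;

/* Rows from data1 are read first, followed by the rows from data2.   */
/* Y is missing in rows from data1; X is missing in rows from data2.  */
data work.combined;
   set work.data1 work.data2;
run;

proc print data=work.combined;
run;
```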
See Also
n “Examples: Concatenate Data” on page 510
Interleaving
Definition Interleaving Two Data Sets
Interleaving combines two or more SAS data sets
into a single data set by interspersing their rows
with one another based on the values of common
BY variables. In some programming languages, the
term merge means to interleave rows. However,
rows that are interleaved in SAS data sets are not
merged; they are copied from the original data sets
in the order of the values of the BY variable.
Syntax
Details
n Interleaving requires the SET statement together with the BY statement, which
specifies one or more common variables by which rows are matched.
n The input data sets must be indexed on or sorted by the BY variables.
n Interleaving returns rows that are ordered within each BY group by their
positions in the original input data set.
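A minimal sketch of interleaving, using hypothetical data, might look like the following. Both input data sets are sorted by the BY variable before the DATA step runs. Year 2007 occurs in both data sets, so that BY group contains one row from each, with the row from the data set listed first in the SET statement written first.

```sas
data work.data1;
   input Year X $;
   datalines;
2009 x3
2005 x1
2007 x2
;

data work.data2;
   input Year Y $;
   datalines;
2008 y3
2006 y1
2007 y2
;

/* Interleaving requires input that is sorted or indexed by the BY variable. */
proc sort data=work.data1; by Year; run;
proc sort data=work.data2; by Year; run;

/* SET with BY interleaves the rows in order of Year. */
data work.interleaved;
   set work.data1 work.data2;
   by Year;
run;

proc print data=work.interleaved;
run;
```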
See Also
n “Examples: Interleave Data” on page 518
Match-Merging
Definition Match–Merging Two Data Sets
Match-Merging combines rows
from two or more input data sets
into a single row in an output data
set based on the values of the BY
variable.
Syntax
MERGE data-set-1 <data-set-2><…data-set-n>;
BY variable-1 <…variable-n>;
data Combined;
merge Data1 Data2;
by Year;
run;
In the figure, rows from Data2 are merged with rows from Data1
based on the values of the BY variable, Year.
Details
n Match-merging requires the MERGE statement together with the BY statement,
which specifies one or more common variables by which rows are matched.
n Match-merging requires that input data sets contain indexes or that they are
sorted by the values of the BY variables.
n The variables in the BY statement must be common to all input data sets.
n Only one BY statement can accompany each MERGE statement in a DATA step.
n The MERGE, UPDATE, and MODIFY statements can be used to combine and
update rows in SAS data sets. For a comparison of these methods, see
“Updating and Modifying Comparison” on page 489
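The requirements above can be sketched with hypothetical data. The IN= data set option is not discussed in this section; it is used here only to flag whether each input data set contributed to a row, and the variable names in1, in2, and InBoth are chosen for illustration.

```sas
data work.data1;
   input Year X $;
   datalines;
2007 x3
2005 x1
2006 x2
;

data work.data2;
   input Year Y $;
   datalines;
2008 y4
2006 y2
2007 y3
;

/* Match-merging requires input that is sorted or indexed by the BY variable. */
proc sort data=work.data1; by Year; run;
proc sort data=work.data2; by Year; run;

data work.combined;
   merge work.data1(in=in1) work.data2(in=in2);
   by Year;
   /* InBoth is 1 when both data sets contain the current Year, else 0. */
   InBoth = (in1 and in2);
run;

proc print data=work.combined;
run;
```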
See Also
n “Examples: Match-Merge Data” on page 525
One-to-One Merging
Definition One-to-One Merging of Two Data Sets
Merging data one-to-one combines
rows from multiple input data sets
into a single row in a new output
data set. Rows are combined based
on their positions in the input data
sets. The first row in the first input
data set is combined with the first
row in the second input data set,
and so on.
Syntax
MERGE data-set-1 <data-set-2><…data-set-n>;
data Combined;
merge Data1 Data2;
run;
Data1     Data2     Combined
X         Y         X     Y
x1        y1        x1    y1
x2    +   y2    =   x2    y2
x3        y3        x3    y3
          y4              y4
          y5              y5
In the figure, rows from Data2 are merged with rows from Data1
based on row number.
Details
n One-to-one merging requires the MERGE statement without the BY statement.
There is no key variable on which to base the merge. Instead, rows are merged
implicitly by row number.
n If the input data sets contain common variables, SAS replaces the values from
the first data set that is read with values from the last data set that is read.
n SAS does not stop reading rows until it has read all rows from all input data
sets. Compare this to one-to-one reading, in which SAS stops reading rows
when it has read the last row from the smallest input data set.
n The output data set contains all variables from all input data sets. The number
of rows in the output data set equals the number of rows in the largest input
data set.
See Also
n “Examples: Merge Data One-to-One” on page 542
One-to-One Reading
Definition One-to-One Reading of Two Data Sets
One-to-one reading combines rows
from multiple input data sets into a
single row in a new data set. Rows are
matched based on their positions in
the data sets. The first row in the first
In the figure, rows from Data2 are merged with rows from
Data1 based on row number.
Details
n One-to-one reading requires multiple SET statements without a BY statement.
There is no key BY variable on which to merge the data sets. As in one-to-one
merging, rows are combined implicitly by row number rather than by the value of a
BY variable.
n If the input data sets contain common variables, SAS replaces the values of the
common variables from the first data set that is read with values from the last
data set that is read.
n SAS stops reading rows from input data sets after it has read the last row from
the smallest input data set. Compare this to One-to-One Merging, in which SAS
does not stop reading until it has read all rows from all input data sets.
n The output data set contains all variables from all input data sets. The number
of rows in the output data set equals the number of rows in the smallest input
data set.
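The difference between one-to-one reading and one-to-one merging can be sketched with two hypothetical data sets of unequal length. With the data below, the step that uses two SET statements stops after the three rows of the smaller data set, and the MERGE step continues until all five rows of the larger data set are read.

```sas
data work.data1;
   input X $;
   datalines;
x1
x2
x3
;

data work.data2;
   input Y $;
   datalines;
y1
y2
y3
y4
y5
;

/* One-to-one reading: processing stops when the first SET statement */
/* reaches the end of the smaller data set (3 rows in the output).   */
data work.read_1to1;
   set work.data1;
   set work.data2;
run;

/* One-to-one merging: all rows from all inputs are read (5 rows).   */
/* X is missing in the last two rows.                                */
data work.merge_1to1;
   merge work.data1 work.data2;
run;
```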
See Also
n “Examples: Combine Data One-to-One” on page 550
Details
n Hash objects are available only for the duration of the DATA step.
n You can output the contents of a hash object to a data table by using the
OUTPUT method.
n The data set that is written by the DATA statement contains the observations
from the data set that is being read, along with the data variables named in the
DEFINEDATA method for those observations that match the key variable or
variables.
n Sorting is not required when using hash tables to merge data sets.
n Hash-table merging enables fast processing on large tables, as long as the hash
table can be held in memory.
n The same variable name does not have to be used for the key or keys in both
data sets. The match can be done on different variables.
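A minimal sketch of a hash-table merge follows. The data sets work.customers and work.sales, and the variables CustID, Name, and Amount, are hypothetical. The lookup table is loaded into a hash object on the first iteration of the DATA step, and neither data set is sorted.

```sas
data work.customers;
   input CustID Name $;
   datalines;
101 Avery
102 Blake
103 Casey
;

data work.sales;
   input CustID Amount;
   datalines;
103 250
101 75
104 40
101 120
;

data work.sales_named;
   /* Define Name in the PDV so that the hash object can return it. */
   length Name $ 8;
   if _n_ = 1 then do;
      /* Load the lookup table into memory once. */
      declare hash h(dataset: 'work.customers');
      h.defineKey('CustID');
      h.defineData('Name');
      h.defineDone();
      call missing(Name);
   end;
   set work.sales;
   /* FIND returns 0 when the key exists; Name is then filled in. */
   /* Rows without a matching CustID are not written to output.   */
   if h.find() = 0;
run;

proc print data=work.sales_named;
run;
```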
See Also
n “Example: Merge Data Using a Hash Table” on page 552
Updating
Definition Updating a Master Data Set from a Transaction Data Set
Details
n The UPDATE statement requires a BY statement that specifies one or more
common variables by which rows are matched.
n The input data sets must be indexed on or sorted by the BY variables.
n Values of the BY variable must be unique for each observation in the master
data set. If the master data set contains multiple rows with the same value of
the BY variable, the first observation is updated and the remaining observations
in the BY group are ignored. SAS writes a warning message to the log when the
DATA step executes.
n The transaction data set can contain more than one observation with the same
BY value. Multiple transaction rows are all applied to the master observation
before it is written to the output file.
n Updating creates a new output data set. It does not update the existing master
input data set the way that modifying does.
n Usually, the master data set and the transaction data set contain the same
variables. However, to reduce processing time, you can create a transaction data
set that contains only those variables that are being updated. The transaction
data set can also contain new variables to be added to the output data set.
n The MERGE, UPDATE, and MODIFY statements can be used to combine and
update rows in SAS data sets. For a comparison of these methods, see
“Updating and Modifying Comparison” on page 489
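The behavior described above can be sketched with a small hypothetical master data set and transaction data set. Both are already in order of the BY variable Year. The transaction data set contains missing values (entered as periods), two rows for 2013, and a row for 2015 that has no match in the master data set.

```sas
data work.master;
   input Year X $ Y $;
   datalines;
2011 X1 Y1
2012 X1 Y1
2013 X1 Y1
2014 X1 Y1
;

data work.trans;
   input Year X $ Y $;
   datalines;
2011 X2 .
2012 X2 Y2
2013 X2 .
2013 . Y2
2015 X2 Y2
;

/* UPDATE writes a new data set; work.master itself is not changed.  */
/* By default, missing transaction values do not replace master      */
/* values, and both 2013 transactions are applied before the row is  */
/* written. The unmatched 2015 transaction becomes a new row.        */
data work.updated;
   update work.master work.trans;
   by Year;
run;

proc print data=work.updated;
run;
```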
See Also
n “Example: Update Data Using the UPDATE Statement” on page 556
n Updating a Table with Values from Another Table Using PROC SQL
Modifying
Definition Modifying a Master Data Set from a Transaction Data Set
The MODIFY statement matches the values of one or more BY variables in a master
data set against the same variables in a transaction data set. The existing
master data set is modified. A new data set is not created.
Syntax
MODIFY master-data-set transaction-data-set <(data-set-options)>;
BY by-variable;
data Master;
modify Master Trans;
by Year;
run;
Master (before)       Trans                 Master (after)
Year  X   Y           Year  X   Y           Year  X   Y
2005  X1  Y1          2011  X2              2005  X1  Y1
2006  X1  Y1          2012  X2  Y2          2006  X1  Y1
2007  X1  Y1     +    2013  X2         =    2007  X1  Y1
2008  X1  Y1          2013      Y2          2008  X1  Y1
2009  X1  Y1          2015  X2  Y2          2009  X1  Y1
2010  X1  Y1                                2010  X1  Y1
2011  X1  Y1                                2011  X2  Y1
2012  X1  Y1                                2012  X2  Y2
2013  X1  Y1                                2013  X2  Y2
2014  X1  Y1                                2014  X1  Y1
                                            2015  X2  Y2
In the figure, rows in the Master data set are modified based
on the values of the specified BY variable Year in the Trans
data set.
Details
n The MODIFY statement requires that the input data sets share a common
variable that is specified in a BY statement.
n Modifying does not require that the input data sets are sorted or indexed.
n Both the master data set and the transaction data set can have rows with
duplicate values of the BY variables.
n If duplicate values of the BY variable exist in the master data set, only the first
occurrence is updated.
n If duplicate values of the BY variable exist in the transaction data set, the
duplicates are applied one on top of another so that only the value in the last
transaction appears in the master observation.
n Modifying does not permit you to create new variables or change variable
attributes.
n The MERGE, UPDATE, and MODIFY statements can all be used to combine and
update rows in SAS data sets. For a comparison of these methods, see
“Updating and Modifying Comparison” on page 489
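A minimal sketch of modifying a master data set in place follows. The data is hypothetical, and every transaction row has a matching Year in the master data set. A transaction row without a match would need explicit handling (for example, by testing the automatic variable _IORC_), which this sketch deliberately leaves out.

```sas
data work.master;
   input Year X $ Y $;
   datalines;
2011 X1 Y1
2012 X1 Y1
2013 X1 Y1
2014 X1 Y1
;

data work.trans;
   input Year X $ Y $;
   datalines;
2011 X2 .
2012 X2 Y2
2013 X2 .
2013 . Y2
;

/* The DATA statement names the master data set itself, so the       */
/* existing data set is changed in place and no new data set is      */
/* created. Neither input needs to be sorted or indexed.             */
data work.master;
   modify work.master work.trans;
   by Year;
run;

proc print data=work.master;
run;
```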
See Also
n “Example: Modify a Data Set by Adding an Observation” on page 563
Comparing Methods
When values of the common variables differ between data sets that are being
merged, the rows that contain them are often referred to as “unmatched rows” or
“unmatched records.” Missing values also result in unmatched rows.
The DATA step and PROC SQL treat unmatched rows differently.
The yellow highlights in the match-merge indicate the unmatched rows, which are not
included in the SQL output. For another example that compares the DATA step with PROC
SQL, see “Comparing PROC SQL with the SAS DATA Step” in SAS SQL Procedure User’s Guide.
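The difference can be sketched with hypothetical data in which only some values of Year occur in both data sets. The sketch assumes that the PROC SQL step uses an inner join, which returns only rows whose Year values match. The DATA step match-merge keeps the unmatched rows as well, with missing values for the variables that come from the other data set.

```sas
data work.data1;
   input Year X $;
   datalines;
2005 x1
2006 x2
2007 x3
;

data work.data2;
   input Year Y $;
   datalines;
2006 y2
2007 y3
2008 y4
;

/* DATA step match-merge: matched and unmatched rows are written. */
data work.merged;
   merge work.data1 work.data2;
   by Year;
run;

/* PROC SQL inner join: only rows with matching Year are written. */
proc sql;
   create table work.joined as
   select a.Year, a.X, b.Y
      from work.data1 as a
           inner join work.data2 as b
           on a.Year = b.Year;
quit;
```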
Comparing Methods 489
When to use n You have a master data set that n You want to change the master data set
needs to be changed or updated based on the values of another data set
based on the values of a without creating an additional, new output
transaction data set. data set. MODIFY creates an updated
n You want to process most of the
version of the original master data set.
data in the master data set. n You want to process or change only a small
portion of the master data set.
Number of UPDATE and MODIFY when UPDATE and MODIFY when specified with
data sets that specified with the BY statement can the BY statement can combine rows from
can be combine rows from exactly two data exactly two data sets.
processed sets.
Disk space
   UPDATE: “Updating” requires more disk space because it produces an updated copy of
   the data set.
   MODIFY: “Modifying” saves disk space because it updates the existing table without
   creating a new table.
BY statement requirements
   UPDATE: The BY statement is required.
   MODIFY: The BY statement is required when using the MODIFY statement to update a
   master data set based on values in a separate, transaction data set when the input
   data sets are related in some way by one or more common variables. *
Unique BY or key value requirements
   UPDATE:
   - Duplicate values for BY variables are allowed in the transaction data set.
     Multiple transaction rows are all applied to the master observation before it is
     written to the output file.
   - Unique values for BY variables are required in the master data set. If duplicate
     values for the BY variable exist in the master data set, then only the first
     observation with that value in the BY group is updated and SAS issues a warning.
   MODIFY:
   - Duplicate values for BY variables are allowed in either the master data set or
     the transaction data set.
   - If duplicates exist in the master data set, only the first occurrence is updated
     because the generated WHERE statement always finds the first occurrence in the
     master data set.
   - If duplicates exist in the transaction data set, the duplicates are applied one
     on top of another unless you write an accumulation statement to add all of them
     to the master observation. Without the accumulation statement, the values in the
     duplicates overwrite each other so that only the value in the last transaction
     is the result in the master observation.
Sort or index requirements
   UPDATE:
   - Sorting or indexing is required on input data sets.
   - If an index exists on the input data set, then the UPDATE statement does not
     maintain it on the updated data set. You must rebuild the index.
   MODIFY:
   - Sorting or indexing is not required for any data set.
   - Neither the master data set nor the transaction data set requires sorting or
     indexing because the BY statement, when used with the MODIFY statement, triggers
     dynamic WHERE processing.
   - If an index exists on the input data set, then the MODIFY statement maintains it
     on the updated data set.
Treatment of missing values in the input data sets or master data set
   UPDATE and MODIFY:
   - MODIFY and UPDATE do not overwrite values in the master data set with missing
     ones in the transaction data set by default.
   - To cause missing values in the transaction data set to replace existing values
     in the master data set, specify NOMISSINGCHECK in the UPDATEMODE= option. For an
     example, see “Example: Update a Data Set with Missing Values” on page 560.
Data set integrity
   UPDATE: No data loss occurs because the UPDATE statement works on a copy of the
   data.
   MODIFY: Data might be only partially updated due to an abnormal task termination.
* The MODIFY statement, unlike the UPDATE statement, can be used in its other forms to
make changes to a single input data set. In these cases (syntax Forms 2 through 4), a
BY statement is not required. However, for the purposes of this topic (combining
data), the MODIFY statement requires the Form 1 version of its syntax, which does
require the BY statement.
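The difference between the two statements can be sketched as follows. This is a minimal illustration, not code from this guide: the data set names master and transaction and the BY variable id are hypothetical, and the MODIFY step assumes that every BY value in the transaction data set also exists in the master data set.

```sas
/* Hypothetical names: master, transaction, id. */

/* UPDATE needs both inputs sorted (or indexed) by the BY variable. */
proc sort data=master;      by id; run;
proc sort data=transaction; by id; run;

/* UPDATE writes an updated copy; the original master is left unchanged. */
data master_updated;
   update master transaction;
   by id;
run;

/* UPDATEMODE=NOMISSINGCHECK lets missing transaction values           */
/* replace existing master values.                                     */
data master_updated2;
   update master transaction updatemode=nomissingcheck;
   by id;
run;

/* MODIFY changes the master data set in place; no new copy is made.   */
/* Assumes that every transaction BY value has a match in master.      */
data master;
   modify master transaction;
   by id;
run;
```

Because MODIFY rewrites the master data set directly, it can be worth working on a backup copy while you test the step.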
Characteristic: How rows are matched
   One-to-one reading and one-to-one merging: Rows are matched by row number. For
   example, row 1 of data set 1 is matched with row 1 of data set 2, row 2 of data
   set 1 is matched with row 2 of data set 2, and so on.
Characteristic: Treatment of “unmatched rows”
   One-to-one reading: Only matched rows are selected for output. That is, only rows
   that have the identical row number values in both data sets are included in the
   output. This means that combining data sets with different numbers of rows results
   in an output data set that has the same number of rows as the smallest input data
   set.
   One-to-one merging: Both matched and unmatched rows are selected for output. That
   is, all rows from all data sets are included in the output even when the input
   data sets have a different number of rows.
Characteristic: Treatment of common variables
   One-to-one reading and one-to-one merging: Values of common variables are
   overwritten. That is, values of common variables in the last data set that is
   named in the SET or MERGE statement overwrite the values in the previous data
   sets.
Characteristic: Sort requirements
   One-to-one reading and one-to-one merging: None. There are no sort requirements
   for either of these methods.
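The two methods compared above differ only in the statements that read the input. A minimal sketch, assuming two hypothetical input data sets named ds1 and ds2:

```sas
/* Hypothetical input data sets: ds1 and ds2. */

/* One-to-one reading: one SET statement per input data set.           */
/* The step stops when the shortest input data set is exhausted.       */
data read_one_to_one;
   set ds1;
   set ds2;
run;

/* One-to-one merging: MERGE without a BY statement.                   */
/* All rows from both input data sets are written to the output.       */
data merge_one_to_one;
   merge ds1 ds2;
run;
```

In both steps, a variable that exists in ds1 and ds2 takes its value from ds2, because ds2 is read last.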
Language element: BY
   Purpose:
   - The BY statement controls the operation of MERGE, MODIFY, SET, and UPDATE
     statements and creates and orders groups.
   - The BY statement enables you to process rows that contain variables with equal
     values.
   Access method (sequential): NA
   Access method (direct): NA
   Use with BY statement: NA
Example Code
In this example, the CONTENTS, SORT, and PRINT procedures are used to examine
the data in three data sets before combining them.
The following table shows the three input data sets to be combined: Inventory,
Sales, and Sales2019.
1 View the descriptive information about the input data sets to determine
whether they share any common variables. In the PROC CONTENTS output,
ensure that the common variable, PartNumber, has the same attributes and
represents the same data in each of the input data sets.
2 Sort the input data sets by the values of the common variable. Because the
variable PartNumber is common to all three input data sets, it can be used as a
BY variable for merging the data.
3 Print the data sets to examine the values of the common variable and to
determine the relationship between the data sets. The PROC PRINT output
shows that there is a one-to-many relationship between the data sets
Inventory and Sales, a many-to-one relationship between the data sets Sales
and Sales2019, and a one-to-one relationship between the data sets Inventory
and Sales2019.
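The three steps above can be sketched as follows. This is a minimal illustration that assumes the Inventory, Sales, and Sales2019 data sets already exist in the WORK library and that each contains the variable PartNumber.

```sas
/* 1 View descriptive information about each input data set. */
proc contents data=inventory; run;
proc contents data=sales;     run;
proc contents data=sales2019; run;

/* 2 Sort each input data set by the common variable. */
proc sort data=inventory; by partNumber; run;
proc sort data=sales;     by partNumber; run;
proc sort data=sales2019; by partNumber; run;

/* 3 Print each data set to examine the values of the common variable. */
proc print data=inventory; run;
proc print data=sales;     run;
proc print data=sales2019; run;
```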
Figure 21.2 Partial PROC CONTENTS Output Comparing the Inventory, Sales, and Sales2019 Data
Sets
Examples: Prepare Data 497
Figure 21.3 PROC PRINT Output for the Inventory, Sales, and Sales2019 Data Sets Showing One-to-
Many and Many-to-One Relationships
Note: Some BY variable values in the Inventory data set do not exist in the
Sales2019 data set. For example, PartNumber JD03 appears in the Inventory data set
but not in the Sales2019 data set. Observations whose BY values have no match in the
other data sets that are being merged are often referred to as “unmatched
observations” or “unmatched records.” Missing values are also considered unmatched
observations. Different merge methods treat unmatched observations differently. See
“Comparing Methods” on page 488 and “PROC SQL versus Match-Merging” on page 488 for
more information.
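One way to see which observations are unmatched in a match-merge is the IN= data set option. The following is a sketch only: the output data set names matched and unmatched and the flag variables inInv and inSales are hypothetical, and both input data sets are assumed to be sorted by PartNumber already.

```sas
/* Hypothetical names: matched, unmatched, inInv, inSales. */
data matched unmatched;
   merge inventory(in=inInv) sales2019(in=inSales);
   by partNumber;
   /* IN= variables are 1 when the data set contributed to the row. */
   if inInv and inSales then output matched;
   else output unmatched;
run;
```

The IN= variables are temporary, so they are not written to either output data set.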
Key Ideas
- The PRINT procedure lets you examine the structure and the contents of the data
  sets to be combined.
- The CONTENTS procedure displays descriptive information about data sets,
  including variable information, formats, number of observations, file size, and
  other summary information about the data.
- When merging data sets that have a one-to-many or a many-to-one relationship,
  you can use PROC SQL, a DATA step MERGE with a BY statement (match-merge, see
  page 530), the MODIFY or UPDATE statements, or a hash-table merge (see page 552).
- When combining data sets using the DATA step SET, MERGE, MODIFY, and
  UPDATE statements, attributes of common variables must match. See “Example:
  Prepare Data in Which Common Variables Have Different Data Types” on page 501,
  “Example: Prepare Data in Which Common Variables Have Different Lengths” on
  page 504, and “Example: Prepare Data in Which Common Variables Have Different
  Formats and Labels” on page 506 for more information.
See Also
- “COMPARE Procedure” in Base SAS Procedures Guide
- “Example: Find the Common Variables in Multiple Input Data Sets” on page 498
Example Code
In this example, PROC SQL uses the metadata in SAS session Dictionary tables to
determine variables that are in common to three input data sets: Inventory, Sales,
and Sales2019.
proc sql;
   title "Variables Common to Inventory, Sales, and Sales2019";
   /* 1: collect the variable names of the three WORK data sets */
   create table commonvars as
      select memname, upcase(name) as name
         from dictionary.columns
         where libname='WORK' and
               memname in ('INVENTORY', 'SALES', 'SALES2019');
   /* 2: keep the names that occur in every one of the data sets */
   select name
      from commonvars
      group by name
      having count(*)=(select count(distinct(memname)) from commonvars);
quit;
Key Ideas
- You can also use the COMPARE procedure to compare two data sets.
- Identifying common variables across data sets is useful when you want to merge
  data based on matching values rather than concatenating or appending data.
  PROC COMPARE and PROC SQL can be used to identify common variables
  between two data sets. PROC SQL can be used to identify common variables
  across multiple data sets.
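For the two-data-set case, a PROC COMPARE call might look like the following sketch. It assumes that the Inventory and Sales data sets exist in the WORK library. LISTVAR reports variables that are found in only one of the data sets, and NOVALUES suppresses the value-by-value comparison report.

```sas
/* Report the variables that Inventory and Sales do not share, */
/* without comparing individual values.                        */
proc compare base=inventory compare=sales listvar novalues;
run;
```

Variables that are not listed as belonging to only one data set are common to both.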
See Also
- “COMPARE Procedure”
Example Code
This example uses the FREQ procedure to check for duplicate values of the variable
partNumber in the Sales data set. It then uses the _N_ DATA step automatic variable
with the CATX function to create a unique row ID for each observation, including
those with duplicate values.
proc sort data=sales;               /* 1 */
   by partNumber;
run;
proc print data=sales; run;
data SalesUnique;                   /* 3 */
   set Sales;
   uniqueID = catx('.',partNumber,_n_);
run;
1 Sort the data by the variable, partNumber, which is used as the BY variable.
2 Use PROC FREQ to determine whether there are any duplicate values for
partNumber. Create a data set, SalesDupes, which includes only the BY variable,
partNumber, and Count. Count is a variable that is automatically generated by
PROC FREQ and represents the frequency of each value of partNumber in the
Sales data set. The value of partNumber is written to SalesDupes only if the
value appears more than once in Sales (Count > 1).
3 Create a data set to contain the unique values of the BY variable partNumber.
Modify the value of partNumber to create unique values by appending a unique
number to each value. The CATX function appends the value of _N_ to each
value of partNumber and stores the new value in uniqueID.
4 PROC PRINT shows the values of SalesUnique.
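Callouts 2 and 4 describe steps that can be sketched as follows. This is one plausible reading of those descriptions rather than a verbatim listing: it assumes that the Sales data set is in the WORK library and that SalesUnique has been created by step 3.

```sas
/* 2 Write partNumber values that occur more than once to SalesDupes. */
/*   Count is generated by PROC FREQ in the OUT= data set.            */
proc freq data=sales noprint;
   tables partNumber / out=SalesDupes(keep=partNumber Count
                                      where=(Count > 1));
run;

/* 4 Show the values of SalesUnique, including the new uniqueID. */
proc print data=SalesUnique; run;
```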
Output 21.1 Output of PROC FREQ That Shows Duplicate Values for the Common
Variable, PartNumber, Found in the Sales Data Set
Output 21.2 PROC PRINT Output Showing a New Variable Containing Unique
Values for the Common Variable PartNumber
Key Ideas
See Also
- For more information about the _N_ automatic variable, see “Automatic
  Variables” on page 96
- For more information about the CATX function, see “CATX Function” in SAS
  Functions and CALL Routines: Reference
- “Data Relationships” on page 476
- For more information about automatic variables that are generated by PROC
  FREQ, see Output Data Sets in the FREQ procedure documentation.
Example Code
In this example, the two data sets contain a common variable that has different
data types.
This example uses the following input data sets: CarsSmall and a subset of the
Sashelp.Cars data set.
data cars;                          /* 1 */
   set sashelp.cars;
run;
data combineCars;
   merge cars CarsSmall;            /* 3 */
   by make model;
   keep Make DriveTrain Model MakeModelDrive Weight;
run;
1 Create some input data for the example from the Sashelp.Cars data set.
2 Display descriptive information about each of the input data sets to identify
common variables. The CONTENTS procedure output shows that the common
variable, Weight, is a CHAR type variable in the CarsSmall data set and it is a
NUMERIC type variable in the Cars data set.
3 Merge the data sets by the values of the BY variables, Make and Model. Because
the type attribute of the common variable, Weight, is different in each of the
input data sets, SAS prints an error message in the log stating that the variable
type is incompatible.
Figure 21.4 Comparing PROC CONTENTS Output for the Data Sets Vehicles and VehiclesSmall
Showing Different Data Types for the Common Variable
Example Code 21.1 Log Error Showing that the Type Attribute of the BY Variable is Different
in the Input Data Sets
ERROR: Variable WeightLBS has been defined as both character and numeric.
1274 run;
To fix the problem, re-create the common variable, Weight, in the carsSmall data
set by using the INPUT function in a new DATA step.
data CarsSmallNum;                  /* 1 */
   set CarsSmall;
   weightNum=input(weight, 8.);
   drop weight;
   rename WeightNum=weight;
run;
data combineCars;
   merge cars CarsSmallNum;         /* 3 */
   by make model;
   keep Make DriveTrain Model MakeModelDrive Weight;
run;
proc contents data=combineCars; run;   /* 4 */
proc print data=combineCars; run;
1 The INPUT function converts the variable, Weight, into a numeric data type and
stores the value in a new variable WeightNum. The original Weight character
variable in CarsSmall is dropped, and the new numeric variable WeightNum is
renamed to Weight. Renaming the variable ensures that the Cars and CarsSmallNum
data sets share the common variable, Weight, when the data sets are merged.
2 The Cars and the CarsSmallNum data sets are sorted by the same variables so
that they can be merged.
3 The Cars and the CarsSmallNum data sets are merged by make and model.
Because the data type for the Weight variable in each data set matches, the
data sets merge successfully to create the combineCars data set.
4 Only a portion of the combineCars data set is shown in Figure 21.5.
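The sorting described in callout 2 can be sketched as follows. The sketch assumes that Cars and CarsSmallNum have been created as described above, and the two steps would run before the DATA step that merges them.

```sas
/* 2 Sort both input data sets by the BY variables used in the merge. */
proc sort data=cars;         by make model; run;
proc sort data=CarsSmallNum; by make model; run;
```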
Figure 21.5 Partial PROC PRINT Output for Match-Merged Data Sets after Normalizing the Common
Variable’s Type Attribute
Key Ideas
- When combining data sets, variables in the input data sets that have the same
  name are referred to as common variables. The common variable in one input data
  set does not always share the same attributes as the same variable in another data
  set.
- Common variables with the same data but different attributes can cause problems
  when the data sets are combined.
- If the type attribute is different, SAS stops processing the DATA step and issues an
  error message stating that the variables are incompatible. To correct this error, you
  must use a DATA step to re-create the variables.
- If the length attribute is different, SAS takes the length from the first data set that
  is specified in the SET, MERGE, or UPDATE statement.
See Also
- “Converting Character Values to Numeric Values” in SAS Functions and CALL
  Routines: Reference
- “Converting Numeric Values to Character Value” in SAS Functions and CALL
  Routines: Reference
Example Code
In this example, the data sets quarter1, quarter2, quarter3, and quarter4 are
match-merged into one output data set using the MERGE statement with the BY
statement. The data sets are pre-sorted and merged on the common variable
account.
This example uses the Quarter1, Quarter2, Quarter3, and Quarter4 data sets.
proc contents data=quarter1; run;   /* 1 */
proc contents data=quarter2; run;
proc contents data=quarter3; run;
proc contents data=quarter4; run;
data yearly;
   merge quarter1 quarter2 quarter3 quarter4;   /* 2 */
   by Account;
run;
data yearly;                        /* 3 */
   length Mileage 6;
   merge quarter1 quarter2 quarter3 quarter4;
   by Account;
run;
proc contents data=yearly; run;     /* 4 */
1 View the descriptive information about the input data sets to compare whether
the attributes of the common variables are the same in each of the input data
sets. The PROC CONTENTS output shows that the length of the common
variable, mileage, is four bytes in the quarter1 data set, eight bytes in the
quarter2 data set, and six bytes in the quarter3 and quarter4 data sets.
2 Merge the data sets by the BY variable, Account. Because the common variable,
mileage, has different lengths in the input data sets, SAS issues a nonzero
return code and prints a warning in the log output.
3 Change the length of the mileage variable by specifying the appropriate length
in the LENGTH statement before specifying the MERGE statement. In general, the
LENGTH statement must come before any SET, MERGE, or UPDATE statement
that is used.
4 View the descriptive information for the merged output data set.
Example Code 21.2 Log Output for Match-Merge of Data Sets Containing Variables with
Different Lengths
WARNING: Multiple lengths were specified for the variable mileage by input data set(s). This can cause truncation of data.
You can also use the ATTRIB statement to change the length attribute on the
variable:
data yearly;
   merge quarter1 quarter2 quarter3 quarter4;
   by Account;
   attrib Mileage
      length = 6;
run;
If you expect truncation of data (for example, when removing insignificant blanks
from the end of character values), the warning is expected and you do not want
SAS to issue a nonzero return code. In this case, you can turn this warning off by
setting the VARLENCHK= system option to NOWARN.
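A minimal sketch of that setting, reusing the quarter1 through quarter4 data sets from this example. Restoring VARLENCHK=WARN afterward is a suggestion, not a requirement, so that later steps still report unexpected truncation.

```sas
/* Suppress the multiple-lengths warning for this step only. */
options varlenchk=nowarn;

data yearly;
   length Mileage 6;
   merge quarter1 quarter2 quarter3 quarter4;
   by Account;
run;

/* Restore the default behavior. */
options varlenchk=warn;
```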
Key Ideas
- When combining data sets, variables in the input data sets that have the same
  name are referred to as common variables. The common variable in one input data
  set does not always share the same attributes as the same variable in another data
  set.
- Common variables with the same data but different attributes can cause problems
  when the data sets are combined.
- When combining data sets, if the length of the common variable is different, SAS
  takes the length from the first data set that is specified in the SET, MERGE, or
  UPDATE statement, and then prints a warning message in the SAS log (see
  page 505).
See Also
- “LENGTH Statement” in SAS DATA Step Statements: Reference
Example Code
In this example, the data sets class and classfit are match-merged using the
MERGE statement with the BY variable, Name.
The ATTRIB statement is used to control the length, label, and format of the
common variable in the final output data set.
Before merging the data sets, PROC CONTENTS is run on each data set to identify
the common variable and to look for any differences between the attributes of the
common variables.
proc contents data=class; run;      /* 1 */
proc contents data=classfit; run;
data merged;
   merge class classfit;
   by Name;
   attrib Weight
      label = "Weight";
   attrib Height Weight Predict format=comma8.2;   /* 3 */
run;
proc print data=merged;             /* 4 */
run;
proc contents data=merged; run;
1 View the descriptive information about the input data sets to determine
whether they share any common variables and to compare their attributes. The
PROC CONTENTS output shows that the common variables Height and Weight
have different labels in each of the input data sets.
2 Sort the input data sets by the values of the BY variable.
3 Change the attributes by specifying the ATTRIB statement. Here, the label of
Weight and the format of Height, Weight, and Predict are set.
4 Print the descriptor information for the merged output data set.
Output 21.3 PROC CONTENTS Output for the Data Sets Class and Classfit Showing
the Differences in Formats and Labels
Output 21.4 PROC CONTENTS Output for the Default Behavior When Formats and
Labels Are Different in the Merged Data Sets
Output 21.5 PROC CONTENTS Output Showing How the ATTRIB Statement
Changes the Format and Label on the Merged Output Data Set
Key Ideas
- If the label, format, or informat attributes of the common variable in the input data
  sets are different, SAS takes the attribute from the first data set that is listed in
  the SET, MERGE, MODIFY, or UPDATE statement.
- Any label, format, or informat that you explicitly specify overrides the default. If
  all data sets contain explicitly specified attributes, the one specified in the first
  data set listed in the SET or MERGE statement overrides the others.
- You can ensure that the new output data set has the attributes that you want by
  using the ATTRIB statement in the DATA step.
- You can also use the VLABEL and VLABELX functions to retrieve the label that is
  associated with a variable.
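As an illustration of the VLABEL function, the following sketch writes the label of the Weight variable to the SAS log. It assumes that the merged data set from this example exists. The variable name weightLabel is hypothetical.

```sas
/* Read one observation and report the label of Weight in the log. */
data _null_;
   set merged(obs=1);
   length weightLabel $ 256;        /* hypothetical variable name */
   weightLabel = vlabel(Weight);
   put weightLabel=;
run;
```

VLABEL returns the label. It does not change it, so use the ATTRIB or LABEL statement when the attribute itself needs to change.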
See Also
- “ATTRIB Statement” in SAS DATA Step Statements: Reference
Example Code
In this example, the data sets to be merged contain variables that have the same
name but they represent completely different data. In this sense, they are not
meant to be common variables.
To resolve this problem, rename one of the variables in one of the input data sets.
Here, the RENAME= data set option is used to rename the variable weight to
weightLBS in the vehicles data set:
data vehicles(rename=(weight=weightLBS));
   set vehicles;
run;
Key Ideas
- If the label, format, or informat attributes of the common variable in the input data
  sets are different, SAS takes the attribute from the first data set that is listed in
  the SET, MERGE, MODIFY, or UPDATE statement.
- Any label, format, or informat that you explicitly specify overrides the default. If
  all data sets contain explicitly specified attributes, the one specified in the first
  data set listed in the SET or MERGE statement overrides the others.
- You can ensure that the new output data set has the attributes that you want by
  using the ATTRIB statement in the DATA step.
- You can also use the “VLABEL Function” in SAS Functions and CALL Routines:
  Reference and the “VLABELX Function” in SAS Functions and CALL Routines:
  Reference to retrieve the label that is associated with a variable.
See Also
- “RENAME Statement” in SAS DATA Step Statements: Reference
Example Code
In this example, the SET statement is used to concatenate two data sets into a
single output data set. The input data sets share one common variable (common).
Concatenation does not require that the data sets share any variables or that the
data is related in any way, because it simply places one intact data set after the
other.
The following table shows the input data sets that are used in this example: animal
and plant.
data concatenate;
   set animal plant;                /* 1 */
run;
proc print data=concatenate;        /* 2 */
run;
1 The SET statement reads all of the observations sequentially from both input
data sets. It writes the observations from the first data set that is read,
followed by the observations from the second data set that is read. The results
are stored in a new output data set named concatenate. As a result, observations
from the animal data set are displayed first in the output, followed by the
observations from the plant data set.
2 Print the results (Output 21.6 on page 511).
Output 21.6 PROC PRINT Output Showing Concatenated Data Sets Using the SET
Statement
Notice that the number of observations in the output data set is 12, which is the
sum of the observations from both input data sets.
Key Ideas
- Concatenation is the combining of two or more data sets, one after the other, into
  a single data set.
- In the output data set, observations from the data set that is listed first in the SET
  statement are followed by observations from the data set that is listed second in
  the SET statement, and so on.
- The output data set contains all of the variables from both input data sets. Values
  of variables that are found in one data set but not in another are set to missing.
- Generally, you concatenate SAS data sets that have the same variables.
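When it matters which input data set each concatenated observation came from, the INDSNAME= option of the SET statement can record it. A minimal sketch, assuming the animal and plant data sets from this example. The names concatenate_src, dsname, and source are hypothetical.

```sas
/* Hypothetical names: concatenate_src, dsname, source. */
data concatenate_src;
   length source $ 41;
   set animal plant indsname=dsname;
   /* The INDSNAME= variable is temporary, so copy it to keep it. */
   source = dsname;
run;
```

The source variable holds the name of the contributing data set in libref.member form.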
See Also
- “Concatenating” on page 479
Example Code
In this example, the SQL procedure is used to concatenate two data sets together
into a new output SAS data set and an SQL table. During the concatenation, SQL
reads all the rows from both input data sets and creates a new output data set
named combined.
The following table shows the input data sets that are used in this example: animal
and plant:
proc sql;
   create table combined as         /* 1 */
      select * from animal          /* 2 */
      outer union corresponding     /* 3 */
      select * from plant;          /* 4 */
quit;
Output 21.7 Concatenated animal and plant Data Sets Using PROC SQL
The resulting output consists of a SAS data set and an SQL table. Both tables have
12 rows (observations) each. The total number of rows in the output is equal to the
sum of the rows from the combined data sets. Values of variables that are found in
one data set but not in another are set to missing.
In the output data set, the observations from the plant data set are appended to
the observations from the animal data set. Because the animal data set is named in
the first SELECT clause of the query, its rows appear first in the output. The
output data set contains all the variables from both input data sets.
See Also
n “Concatenating Query Results (OUTER UNION)” in SAS SQL Procedure User’s
Guide
n “Union Joins” in SAS SQL Procedure User’s Guide
n “Comparing PROC SQL with the SAS DATA Step” in SAS SQL Procedure User’s
Guide
Example Code
In this example, the APPEND procedure is used to add observations from the Year2
data set to the end of the Year1 data set. During the processing, the procedure
reads only the rows from the Year2 data set. The procedure then updates the Year1
data set by appending the observations from the Year2 data set to it. The
procedure does not generate a new output data set. It simply updates the existing
Year1 data set.
The following table shows the input data sets that are used in this example: Year1
and Year2:
Year1        Year2
1  2009      1  2010
2  2010      2  2011
3  2011      3  2012
4  2012      4  2013
             5  2014
The Year1 data set contains all the observations from both data sets.
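The step described above can be sketched as follows, assuming that Year1 and Year2 exist in the WORK library and contain the same variables:

```sas
/* Append the observations in Year2 to the end of Year1. */
/* Year1 is updated in place; no new data set is created. */
proc append base=year1 data=year2;
run;

proc print data=year1; run;
```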
Note: You cannot use PROC APPEND to add observations to a SAS data set in a
sequential library.
Key Ideas
- When you concatenate data sets using the SET statement, SAS reads every row of
  all input data sets and creates a new output data set. When you use the APPEND
  procedure, SAS reads only the rows in the data set specified in the DATA= option
  and it does not create a new data set. See “Choosing between the SET Statement
  and the APPEND Statement” in Base SAS Procedures Guide for more information
  about choosing between the two methods.
- If the DATA= data set contains variables that are not in the BASE= data set, you
  must specify the FORCE option in the APPEND statement. See “Appending to
  Data Sets with Different Variables” in Base SAS Procedures Guide for more
  information.
- If no additional processing is necessary, using PROC APPEND or the APPEND
  statement in PROC DATASETS is more efficient than using a DATA step to
  concatenate data sets.
- Generally, you concatenate SAS data sets that share one or more common
  variables.
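The FORCE option mentioned above might be used as in the following sketch. It reuses the Year1 and Year2 names from this example and assumes, for illustration only, that Year2 contains a variable that Year1 does not.

```sas
/* FORCE lets the append proceed when the variables differ.          */
/* Variables that exist only in the DATA= data set are not added to  */
/* the BASE= data set.                                               */
proc append base=year1 data=year2 force;
run;
```

Check the SAS log after a forced append, because it notes which variables were dropped or truncated.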
See Also
- “Concatenating Two SAS Data Sets” in Base SAS Procedures Guide
Example Code
In this example, the OPEN=DEFER option in the SET statement causes a note to
be written to the SAS log when variables in one or more of the input data sets are
not present in the first input data set that is read.
The following table shows the input data sets that are used in this example. Notice
how the variables predict and lowermean, which are defined in Table2, are not
present in Table1.
When you concatenate the data sets without specifying the OPEN=DEFER option
and you print the results, the output shows a typical concatenation in which the
values for the variables that are not in all input data sets appear as missing:
data concat;
   set table1 table2;
run;
proc print data=concat;
   title "Concatenate Table1 and Table2";
run;
Compare this output with the output that is produced when you use the OPEN=DEFER
option. Notice how only the variables from the data set that is listed first in the
SET statement appear in the final output data set.
data concat2;
   set table1 table2 open=defer;
run;
proc print data=concat2;
   title "Concatenate with OPEN=DEFER";
run;
Output 21.9 Partial Log Output Showing Missing Variables When Using
OPEN=DEFER
NOTE: There were 4 observations read from the data set WORK.TABLE1.
NOTE: For OPEN=DEFER processing, all variables processed should be specified by the first data set listed in the SET statement.
NOTE: Variable predict, found on WORK.TABLE2, is being ignored.
NOTE: Variable lowermean, found on WORK.TABLE2, is being ignored.
NOTE: There were 4 observations read from the data set WORK.TABLE2.
NOTE: The data set WORK.CONCAT2 has 8 observations and 5 variables.
Key Ideas
- In most cases, if the set of variables defined by any subsequent data set differs
  from the variables defined by the first data set, SAS prints a warning message to
  the log but does not stop execution.
- When you concatenate data sets using the SET statement, SAS reads every row of
  all input data sets and creates a new output data set. When you use the APPEND
  procedure, SAS reads only the rows in the data set specified in the DATA= option
  and it does not create a new data set. See “Choosing between the SET Statement
  and the APPEND Statement” in Base SAS Procedures Guide for more information
  about choosing between the two methods.
- If the DATA= data set contains variables that are not in the BASE= data set, you
  must specify the FORCE option in the APPEND statement. See “Appending to
  Data Sets with Different Variables” in Base SAS Procedures Guide for more
  information.
- If no additional processing is necessary, using PROC APPEND or the APPEND
  statement in PROC DATASETS is more efficient than using a DATA step to
  concatenate data sets.
- Generally, you concatenate SAS data sets that have the same variables.
See Also
- “SET Statement” in SAS DATA Step Statements: Reference
- FORCE option
Example Code
The following program sorts each of the input data sets by their common variable
using the SORT procedure, interleaves the data sets, and then prints the results.
The common variable, common, is specified as the BY variable.
The following input data sets are used in this example: animal and plant.
proc sort data=animal; by common; run;
proc sort data=plant; by common; run;
data interleave;
   set animal plant;
   by common;
run;
proc print data=interleave; run;
Notice that the input data sets are sorted by the same variable that is specified in
the DATA step BY statement.
The output data set contains all the variables from all data sets, as well as
variables created by the DATA step. Values of variables that are found in one data
set but not in another are set to missing. The number of observations in the output
data set is 12, which is the sum of the observations from both data sets.
Examples: Interleave Data 519
Key Ideas
- Input data sets must first be indexed or sorted by the values of the BY variables.
- The observations in the interleaved data sets are not combined; they are copied
  from the original data sets in the order of the values of the BY variable.
- Input data sets are processed sequentially, in the order in which they are listed in
  the SET statement.
- Observations in the output data set are arranged by the values of the BY variable.
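Because interleaving uses a BY statement, the DATA step also creates the temporary FIRST.variable and LAST.variable flags for each BY group. The following sketch copies them into permanent variables so that the BY-group boundaries are visible in the output. It assumes that animal and plant are already sorted by common. The names interleave_groups, firstInGroup, and lastInGroup are hypothetical.

```sas
/* Hypothetical names: interleave_groups, firstInGroup, lastInGroup. */
data interleave_groups;
   set animal plant;
   by common;
   firstInGroup = first.common;   /* 1 on the first row of each BY group */
   lastInGroup  = last.common;    /* 1 on the last row of each BY group  */
run;

proc print data=interleave_groups; run;
```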
See Also
- “Interleaving” on page 480
Example Code
The following program creates the input data sets, sorts each of the data sets by its
common variable, interleaves the data sets, and then prints the results.
The program first sorts the input data sets using the SORT procedure, which groups
and sorts the rows in each data set by the values of the common variable, common.
Notice that there are duplicate values for the shared BY variable in both input data
sets.
The following table shows the input data sets animalDupes and plantDupes, with
the duplicate values highlighted in each data set:
animalDupes          plantDupes
1  a  Ant            1  a  Apple
2  a  Ape            2  b  Banana
3  b  Bird           3  c  Coconut
4  c  Cat            4  c  Celery
5  d  Dog            5  d  Dewberry
6  e  Eagle          6  e  Eggplant
data interleave;
   set animalDupes plantDupes;
   by common;
run;
The output data set contains all the variables from both input data sets. Values of
variables that are found in one data set but not in another are set to missing. The
number of observations in the output data set is 12, which is the sum of the
observations from the input data sets. The observations are written to the output
data set in the order in which they occur in the original data sets.
Output 21.11 PROC PRINT Output for Interleaved Data Sets with Duplicate Values
of the BY Variable
Notice that observations from the input data set animalDupes are listed first,
followed by observations from the second input data set, plantDupes. This is
because the DATA step processes data sets sequentially, in the order in which they
are listed in the SET statement. For example, if you change the order of the input
data sets so that the data set plantDupes is listed first in the SET statement, the
observations from plantDupes would be listed first in the output data set.
data interleave;
   set plantDupes animalDupes;
   by common;
run;
proc print data=interleave; run;
Output 21.12 PROC PRINT Output for Interleaved Data Sets with the SET
Statement Order Changed
Key Ideas
- Input data sets must first be indexed or sorted by the values of the BY variables.
- The observations in the interleaved data sets are not combined; they are copied
  from the original data sets in the order of the values of the BY variable.
- Input data sets are processed sequentially, in the order in which they are listed in
  the SET statement.
- Observations in the output data set are arranged by the values of the BY variable.
See Also
- “Interleaving” on page 480
Example Code
The following program creates the input data sets, sorts each of the data sets by
the common variable, interleaves the data sets, and then prints the results.
The following table shows the input data sets used in this example: animalDupes
and plantMissing2 with the different BY values highlighted:
animalDupes          plantMissing2
1  a  Ant            1  a  Apple
2  a  Ape            2  b  Banana
3  b  Bird           3  c  Coconut
4  c  Cat            4  e  Eggplant
5  d  Dog            5  f  Fig
6  e  Eagle
Both input data sets contain values for the variable common that are not present in
the other data set. For example, the value “d” in the animalDupes data set is not
present in the plantMissing2 data set. The value “f” in the plantMissing2 data set
is not present in the animalDupes data set.
proc sort data=animalDupes; by common; run; /* 1 */
proc sort data=plantMissing2; by common; run;
data interleave; /* 2 */
set animalDupes plantMissing2;
by common;
run;
proc print data=interleave; run;
1 Each input data set, animalDupes and plantMissing2, contains the variable
common. The SORT procedure sorts the observations in each data set in order of
the values of the BY variable, common.
2 The DATA step interleaves the data sets based on the values of common.
The output data set contains all the variables from both input data sets. Values of
variables that are found in one data set but not in another are set to missing. The
number of observations in the output data set is 11, which is the sum of the
observations from the input data sets (6 + 5). The observations are arranged by
the values of common. Within each BY group, they are written in the order in which
they occur in the original data sets, with observations from animalDupes first.
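The program above assumes that animalDupes and plantMissing2 already exist. The
following is a minimal sketch of DATA steps that would create them from the values
in the table. It assumes list input and the variable names common, animal, and
plant, as used in other examples in this chapter.

```sas
/* Sketch only: values are taken from the table for this example. */
data animalDupes;
  input common $ animal $;
  datalines;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;

data plantMissing2;
  input common $ plant $;
  datalines;
a Apple
b Banana
c Coconut
e Eggplant
f Fig
;
```

Run these steps before the PROC SORT steps. The values are already in sorted
order, so sorting leaves the order of the observations unchanged.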
Output 21.13 PROC PRINT Output for Interleaved Data Sets with Different Values
for the BY Values
Key Ideas
n Input data sets must first be indexed or sorted by the values of the BY variables.
n The observations in the interleaved data sets are not combined; they are copied
from the original data sets in the order of the values of the BY variable.
n Input data sets are processed sequentially, in the order in which they are listed in
the SET statement.
n Observations in the output data set are arranged by the values of the BY variable.
See Also
n “Interleaving” on page 480
Example Code
In this example, the MERGE statement is used with the BY statement to merge two
data sets. The observations from the first data set are merged with the
observations from the second data set based on the values of a common BY
variable.
The input data sets animal and plant both contain the variable common, which is
specified as the BY variable in this example.
This example shows a one-to-one merge because there are no duplicate values for
the BY variable in either of the input data sets.
The following table shows the input data sets that are used in this example: animal
and plant:
animal plant
1 a Ant 1 a Apple
2 b Bird 2 b Banana
3 c Cat 3 c Coconut
4 d Dog 4 d Dewberry
5 e Eagle 5 e Eggplant
6 f Frog 6 f Fig
The following program first sorts both input data sets using the SORT procedure,
and then match-merges the data sets by the common BY variable, common. PROC
PRINT is used to print the results:
proc sort data=animal; by common; run;
proc sort data=plant; by common; run;
data matchmerge;
merge animal plant;
by common;
run;
proc print data=matchmerge; run;
Key Ideas
n Match-merging is used for merging data sets that have one or more common
variables and you want to merge the data sets based on the values of the common
variables.
n Input data sets must be indexed or sorted on the BY variable prior to merging.
n A match-merge in SAS is comparable to an inner join in PROC SQL when all of the
values of the BY variable match and there are no duplicate values of the BY
variable. See “Inner Joins” in SAS SQL Procedure User’s Guide for more information.
n SAS retains the values of all variables in the program data vector even if the value
is missing (or unmatched).
n When SAS reads the last observation from a BY group in one data set, SAS retains
its values in the program data vector for all variables that are unique to that data
set until all observations for that BY group have been read from all data sets. The
total number of observations in the output data set is the sum, over all BY groups,
of the largest number of observations that any one input data set contributes to
the BY group.
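The comparison with an inner join can be tried directly. The following is a
minimal sketch, assuming that animal and plant contain the values shown in the
table for this example. The table name sqljoin and the aliases a and p are
illustrative.

```sas
/* Input data sets; @@ reads several observations from one data line. */
data animal;
  input common $ animal $ @@;
  datalines;
a Ant  b Bird  c Cat  d Dog  e Eagle  f Frog
;
data plant;
  input common $ plant $ @@;
  datalines;
a Apple  b Banana  c Coconut  d Dewberry  e Eggplant  f Fig
;

/* Inner join on the common variable; no prior sorting is required. */
proc sql;
  create table sqljoin as
    select a.common, a.animal, p.plant
      from animal as a inner join plant as p
        on a.common = p.common
      order by a.common;
quit;

proc print data=sqljoin; run;
```

Because every value of common occurs exactly once in each input data set, the
join returns the same six rows as the match-merge.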
See Also
n “Match-Merging” on page 481
n “Comparing DATA Step Match-Merges with PROC SQL Joins” in SAS SQL
Procedure User’s Guide
Examples: Match-Merge Data 527
Example Code
In this example, the MERGE statement is used with the BY statement to merge two
data sets in a one-to-many merge. The observations from the first data set are
merged with the observations from the second data set based on the values of a
common BY variable. Because there are duplicate values for the BY variable in one
of the input data sets, this is a one-to-many merge.
This example shows what happens to the values of a common variable when it is
not specified as the BY variable.
The input data sets that are used in this example, one and many, both contain the
variables ID and state. The variable ID is the BY variable and its values are unique
within the one data set. However, its values are not unique within the many data set.
There are multiple observations for values of ID in the many data set. The variable
state is common to both input data sets, but it is not specified as a BY variable.
The purpose for the merge is to replace the incorrect abbreviations for state in the
many data set with the correct, two–letter abbreviations shown in the data set one.
The following table shows the one and many data sets that are used in this example:
one          many
ID state     ID city      state
1  AZ        1  Phoenix   Ariz
2  MA        2  Boston    Mass
3  WA        2  Foxboro   Mass
4  WI        3  Olympia   Mass
             3  Seattle   Wash
             3  Spokane   Wash
             4  Madison   Wis
             4  Milwaukee Wis
             4  Madison   Wis
             4  Hurley    Wis
Output 21.15 PROC PRINT Output Showing Input Data Sets That Contain a
Common BY Variable and a Common Non-BY Variable
When these data sets are merged, the value of the common variable in data set one
overwrites the values from data set many because the one data set is listed second
in the MERGE statement. However, on subsequent iterations of the MERGE
statement for the same BY group, the one data set is not read again. Therefore, the
values from the one data set do not replace the remaining values in the BY group.
This means that the first value in each BY group is replaced but the remaining
values are not.
data three;
merge many one;
by ID;
run;
proc print data=three noobs; run;
title;
Output 21.16 PROC PRINT Output for Match Merging Observations with a Common
Variable That is Not the BY Variable
To replace the values for state for all observations within the BY group, the
common (non-BY) variable from the many data set must be dropped or renamed:
data solution;
merge many(drop=state) one;
by ID;
run;
proc print data=solution noobs; run;
Output 21.17 PROC PRINT Output for Solution to One-to-many Merge with
Common Variables That Are Not the BY Variables
SAS reads the value of state from data set one on the first iteration of the merge
for each BY group. Because variables that are read from data sets are automatically
retained throughout a BY group, all observations in each BY group of ID contain
the value of state from data set one.
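Renaming is the alternative to dropping when you want to keep the original
abbreviations for comparison. The following is a minimal sketch of the RENAME=
variant, assuming that one and many contain the values shown in the table for this
example. The names state_old and solution2 are illustrative.

```sas
data one;
  input ID state $;
  datalines;
1 AZ
2 MA
3 WA
4 WI
;

data many;
  /* :$12. allows city names longer than the default 8 characters */
  input ID city :$12. state $;
  datalines;
1 Phoenix Ariz
2 Boston Mass
2 Foxboro Mass
3 Olympia Mass
3 Seattle Wash
3 Spokane Wash
4 Madison Wis
4 Milwaukee Wis
4 Madison Wis
4 Hurley Wis
;

/* state from many is kept as state_old, so state comes only from one */
/* and is retained for every observation in the BY group.             */
data solution2;
  merge many(rename=(state=state_old)) one;
  by ID;
run;

proc print data=solution2 noobs; run;
```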
Key Ideas
n For the first matching observation in a one-to-many match-merge, the value of the
common variable in the last data set specified in the MERGE statement overwrites
the values from the previous data sets. However, on subsequent iterations of the
MERGE statement for the same BY group, a data set that has no more observations
in the BY group is not read again. So, the remaining values of the common non-BY
variable come from the data set that still contributes observations rather than
from the last data set specified.
n Match-merging is used for merging data sets that have one or more common
variables and you want to merge the data sets based on the values of the common
variables.
n Input data sets must be indexed or sorted on the BY variable prior to merging.
n A match-merge in SAS is comparable to an inner join in PROC SQL when all of the
values of the BY variable match and there are no duplicate values of the BY
variable. See “Inner Joins” in SAS SQL Procedure User’s Guide for more information.
n SAS retains the values of all variables in the program data vector even if the value
is missing (or unmatched).
See Also
n “Match-Merging” on page 481
n “Comparing DATA Step Match-Merges with PROC SQL Joins” in SAS SQL
Procedure User’s Guide
Example Code
In this example, the MERGE statement is used with the BY statement to merge two
data sets. The observations from the first data set are merged with the
observations from the second data set based on the values of a common BY
variable. The input data sets contain duplicate values of the BY variable (common),
so this is an example of a many-to-one merge.
The following table shows the input data sets that are used in this example:
animalDupes and plantDupes:
animalDupes plantDupes
1 a Ant 1 a Apple
2 a Ape 2 b Banana
3 b Bird 3 c Coconut
4 c Cat 4 c Celery
5 d Dog 5 d Dewberry
6 e Eagle 6 e Eggplant
The following program first sorts both input data sets using the SORT procedure,
and then match-merges the data sets by the common variable, common. PROC
PRINT is used to print the results:
proc sort data=animalDupes; by common; run;
proc sort data=plantDupes; by common; run;
data matchmerge;
merge animalDupes plantDupes;
by common;
run;
proc print data=matchmerge; run;
Key Ideas
n Match-merging is used for merging data sets that have one or more common
variables and you want to merge the data sets based on the values of the common
variables.
n Input data sets must be indexed or sorted on the BY variable prior to merging.
n A match-merge in SAS is comparable to an inner join in PROC SQL when all of the
values of the BY variable match and there are no duplicate values of the BY
variable. See “Inner Joins” in SAS SQL Procedure User’s Guide for more information.
n SAS retains the values of all variables in the program data vector even if the value
is missing (or unmatched).
n When SAS reads the last observation from a BY group in one data set, SAS retains
its values in the program data vector for all variables that are unique to that data
set until all observations for that BY group have been read from all data sets. The
total number of observations in the output data set is the sum, over all BY groups,
of the largest number of observations that any one input data set contributes to
the BY group.
n When the input data sets contribute unequal numbers of observations to a BY
group, the common variables get their values from the data set that contributes
the current observation. Variables that are unique to one data set retain their
values until the end of the BY group.
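Because the result of a match-merge depends on which BY values repeat, it can help
to list the repeated BY values in an input data set before you merge. The following
is a minimal sketch that uses the FIRST. and LAST. automatic variables, assuming
that animalDupes contains the values shown in the table for this example. The data
set name animalDupKeys is illustrative.

```sas
data animalDupes;
  input common $ animal $ @@;
  datalines;
a Ant  a Ape  b Bird  c Cat  d Dog  e Eagle
;

proc sort data=animalDupes; by common; run;

/* An observation is the only one in its BY group when it is both the */
/* first and the last; keep the observations for which that is false. */
data animalDupKeys;
  set animalDupes;
  by common;
  if not (first.common and last.common);
run;

proc print data=animalDupKeys; run;
```

With these values, animalDupKeys contains the two observations for the BY value a.
The same step can be run against plantDupes to find its repeated BY values.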
See Also
n “Match-Merging” on page 481
n “Comparing DATA Step Match-Merges with PROC SQL Joins” in SAS SQL
Procedure User’s Guide
Example Code
In this example, the MERGE statement is used with the BY statement to perform a
match-merge on two input data sets. The observations from one data set are
merged together with the observations from another data set by a common
variable.
The two input data sets have different values for their common variable, which
results in unmatched observations. An unmatched observation is an observation
whose value for the BY variable does not occur in the other input data set.
The following table shows the input data sets that are used in this example:
animalMissing and plantMissing2:
animalMissing plantMissing2
1 a Ant 1 a Apple
2 c Cat 2 b Banana
3 d Dog 3 c Coconut
4 e Eagle 4 e Eggplant
5 f Fig
In the input data sets, data set animalMissing does not contain a value of “b” or “f”
for the common variable, but data set plantMissing2 does. Data set
plantMissing2 does not contain the value “d” for the common variable, but data
set animalMissing does.
The following program first sorts both input data sets using the SORT procedure,
and then match-merges the data sets by the common variable, common. PROC
PRINT is used to print the results:
proc sort data=animalMissing; by common; run;
proc sort data=plantMissing2; by common; run;
data matchmerge;
merge animalMissing plantMissing2;
by common;
run;
proc print data=matchmerge; run;
Notice how SAS retains the values of all variables from both input data sets in the
final output, even if the value is missing in one data set.
Output 21.19 PROC PRINT Output for Match-Merge with Unmatched Observations
Key Ideas
n Match-merging is used for merging data sets that have one or more common
variables and you want to merge the data sets based on the values of the common
variables.
n Input data sets must be indexed or sorted on the BY variable prior to merging.
n A match-merge in SAS is comparable to an inner join in PROC SQL when all of the
values of the BY variable match and there are no duplicate values of the BY
variable. See “Inner Joins” in SAS SQL Procedure User’s Guide for more information.
n SAS retains the values of all variables in the program data vector even if the value
is missing (or unmatched).
n When SAS reads the last observation from a BY group in one data set, SAS retains
its values in the program data vector for all variables that are unique to that data
set until all observations for that BY group have been read from all data sets. The
total number of observations in the output data set is the sum, over all BY groups,
of the largest number of observations that any one input data set contributes to
the BY group.
See Also
n “Match-Merging” on page 481
n “Comparing DATA Step Match-Merges with PROC SQL Joins” in SAS SQL
Procedure User’s Guide
Example Code
In this example, the MERGE statement is used with the BY statement to combine two
data sets that have unmatched observations. This example is identical to “Example:
Match-Merge Observations with Different Values of the BY Variable” on page 532
except that in this example, the IN= data set option is used to remove the
unmatched observations from the output data set. Unmatched observations are
observations whose value for the shared BY variable does not occur in both input
data sets.
The following table shows the input data sets that are used in this example:
animalMissing and plantMissing2:
animalMissing plantMissing2
1 a Ant 1 a Apple
2 c Cat 2 b Banana
3 d Dog 3 c Coconut
4 e Eagle 4 e Eggplant
5 f Fig
In the input data sets, data set animalMissing does not contain the value b or f for
the common variable, but data set plantMissing2 does. Data set plantMissing2 does
not contain the value d for the common variable, but data set animalMissing does.
Neither data set contains all values of the BY variable common.
The following program first sorts both input data sets using the SORT procedure,
and then match-merges the data sets by the common variable, common. PROC
PRINT is used to print the results:
proc sort data=animalMissing; by common; run;
proc sort data=plantMissing2; by common; run;
data matchmerge;
merge animalMissing plantMissing2;
by common;
run;
proc print data=matchmerge; run;
In the next example, the program match-merges the two data sets and uses the IN=
data set option on the input data sets to remove the unmatched observations from
the output data set.
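The IN= data set option creates a temporary Boolean variable, which has a value of
1 if the data set contributes to the current observation in the output and a value
of 0 if the data set does not contribute to the current observation in the output.
The subsetting IF statement in the following program keeps only the observations
to which both data sets contribute.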
data matchmerge2;
merge animalMissing(in=i) plantMissing2(in=j);
by common;
if (i=1) and (j=1);
run;
proc print data=matchmerge2; run;
Output 21.21 PROC PRINT Output for Match-Merge Using the IN= Data Set Option
to Remove Unmatched Observations
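The same IN= variables can also route the unmatched observations to separate data
sets instead of discarding them. The following is a minimal sketch, assuming that
animalMissing and plantMissing2 contain the values shown in the table for this
example. The names inAnimal, inPlant, both, animalOnly, and plantOnly are
illustrative.

```sas
data animalMissing;
  input common $ animal $ @@;
  datalines;
a Ant  c Cat  d Dog  e Eagle
;
data plantMissing2;
  input common $ plant $ @@;
  datalines;
a Apple  b Banana  c Coconut  e Eggplant  f Fig
;

/* Write each observation to one of three output data sets, */
/* depending on which inputs contributed to it.             */
data both animalOnly plantOnly;
  merge animalMissing(in=inAnimal) plantMissing2(in=inPlant);
  by common;
  if inAnimal and inPlant then output both;
  else if inAnimal then output animalOnly;
  else output plantOnly;
run;
```

With these values, both receives the BY values a, c, and e, animalOnly receives d,
and plantOnly receives b and f.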
Key Ideas
n Match-merging is used for merging data sets that have one or more common
variables and you want to merge the data sets based on the values of the common
variables.
n Input data sets must be indexed or sorted on the BY variable prior to merging.
n A match-merge in SAS is comparable to an inner join in PROC SQL when all of the
values of the BY variable match and there are no duplicate values of the BY
variable. See “Inner Joins” in SAS SQL Procedure User’s Guide for more information.
n SAS retains the values of all variables in the program data vector even if the value
is missing (or unmatched).
n When SAS reads the last observation from a BY group in one data set, SAS retains
its values in the program data vector for all variables that are unique to that data
set until all observations for that BY group have been read from all data sets. The
total number of observations in the output data set is the sum, over all BY groups,
of the largest number of observations that any one input data set contributes to
the BY group.
See Also
n “Match-Merging” on page 481
n “Comparing DATA Step Match-Merges with PROC SQL Joins” in SAS SQL
Procedure User’s Guide
Example Code
In this example, the MERGE statement is used with the BY statement to perform a
match-merge on two input data sets. The observations from one data set are
merged with observations from another data set based on a common BY variable.
The data set fruit is merged with the data set color based on the values of the
common variable, ID.
fruit color
1 a apple 1 a amber
2 c apricot 2 b brown
3 d banana 3 b blue
4 e blueberry 4 b black
5 c cantaloupe 5 b beige
6 c coconut 6 b bronze
7 c cherry 7 c cocoa
8 c crabapple 8 c cream
9 c cranberry
Notice that the data set fruit has duplicate values, a and c for ID. The data set
color has duplicate values, b and c for ID.
Note: In this example, it is assumed that the data sets are pre-sorted or indexed on
the BY variable, ID.
data merged;
merge fruit color;
by id;
run;
Output 21.22 PROC PRINT Output for Merging Observations by a Common Variable
When Duplicates Exist in More Than One Input Data Set
Notice that, in the BY group c, the values for the variable fruit differ from one
observation to the next, but the value for color repeats. The value cocoa is
overwritten in the PDV with cream when the second observation in the BY group is
read from the data set color. The value cream then carries down the rest of the BY
group because, when a BY statement is used with the MERGE statement, variables
are not reinitialized to missing until the BY group changes.
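Because the last value is carried down, a match-merge does not pair every
observation in one data set with every observation in the other when both contain
repeated BY values. For comparison, a PROC SQL join does. The following is a
minimal sketch that uses two small hypothetical data sets, fruitC and colorC, which
hold only values for the BY group c. The names mergedC and joinedC are also
illustrative.

```sas
data fruitC;
  /* :$12. allows fruit names longer than the default 8 characters */
  input id $ fruit :$12. @@;
  datalines;
c cantaloupe  c coconut  c cherry  c crabapple  c cranberry
;
data colorC;
  input id $ color $ @@;
  datalines;
c cocoa  c cream
;

/* Match-merge: 5 observations, the larger of the two BY-group counts. */
data mergedC;
  merge fruitC colorC;
  by id;
run;

/* SQL join: 10 rows, every fruit paired with every color (5 x 2). */
proc sql;
  create table joinedC as
    select f.id, f.fruit, c.color
      from fruitC as f inner join colorC as c
        on f.id = c.id;
quit;
```

If you need every combination within a BY group, the join is the closer fit. If
you need one output observation per input observation, the match-merge behavior
shown above applies.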
Key Ideas
n Match-merging is intended for merging data sets that have one or more common
variables and you want to merge the data sets based on the values of the common
variables.
n Input data sets must be indexed or sorted on the BY variable prior to merging.
n A match-merge in SAS is comparable to an inner join in PROC SQL when all of the
values of the BY variable match and there are no duplicate values of the BY
variable. See “Inner Joins” in SAS SQL Procedure User’s Guide for more information.
n SAS retains the values of all variables in the program data vector even if the value
is missing (or unmatched).
n When SAS reads the last observation from a BY group in one data set, SAS retains
its values in the program data vector for all variables that are unique to that data
set until all observations for that BY group have been read from all data sets. The
total number of observations in the output data set is the sum, over all BY groups,
of the largest number of observations that any one input data set contributes to
the BY group.
See Also
n “Match-Merging” on page 481
n “Comparing DATA Step Match-Merges with PROC SQL Joins” in SAS SQL
Procedure User’s Guide
Example Code
In this example, the MERGE statement is used with the BY statement to perform a
match-merge on two input data sets when one data set contains missing data. The
example shows how missing values are treated the same as nonmissing values.
The following table shows the input data sets that are used in this example. Note
that data set one has missing values for age after the first value for each ID.
one two
1 1 8 90 1 1 Sarah 11
2 1 . 100 2 2 John 10
3 1 . 95
4 2 9 80
5 2 . 100
Doing a simple merge of these two data sets by ID results in missing values for the
variable age in the merged output data set:
data merge1;
merge one two;
by id;
run;
proc print data=merge1; title 'Merged by ID'; run;
After the first observation from each data set is combined, SAS reads data set one
for the next observation for the BY group. It overwrites the values in the PDV with
those values, including the missing value for age. It does not read from data set two
again since the BY group is complete. Therefore, the next observation contains a
missing value for age.
To get the values for age in data set two to replace the missing values, you can
retain the last nonmissing value of age within each BY group. The RETAIN statement
creates a temporary variable, temp_age, which is reset to missing at the start of
each BY group. If the value for age is missing, age is assigned the value of
temp_age. Otherwise, temp_age is updated with the current value of age.
data merge2 (drop=temp_age);
merge one two;
by id;
retain temp_age;
if first.id then temp_age = .;
if age = . then age = temp_age;
else temp_age = age;
run;
proc print data=merge2; title 'Merged by ID with Age Retained'; run;
Key Ideas
n Match-merging is intended for merging data sets that have one or more common
variables and you want to merge the data sets based on the values of the common
variables.
n Input data sets must be indexed or sorted on the BY variable prior to merging.
n A match-merge in SAS is comparable to an inner join in PROC SQL when all of the
values of the BY variable match and there are no duplicate values of the BY
variable. See “Inner Joins” in SAS SQL Procedure User’s Guide for more information.
n SAS retains the values of all variables in the program data vector even if the value
is missing (or unmatched).
n When SAS reads the last observation from a BY group in one data set, SAS retains
its values in the program data vector for all variables that are unique to that data
set until all observations for that BY group have been read from all data sets. The
total number of observations in the output data set is the sum, over all BY groups,
of the largest number of observations that any one input data set contributes to
the BY group.
See Also
n “Match-Merging” on page 481
n “Comparing DATA Step Match-Merges with PROC SQL Joins” in SAS SQL
Procedure User’s Guide
Example Code
In this example, the MERGE statement is used without a BY statement to perform a
one-to-one merge of two data sets that have an equal number of rows. The input
data sets animal and plantG contain a common variable, named common. The values
for the shared variable are equal except in row 6, where the value is f in the animal
data set and g in the plantG data set.
The following table shows the input data sets that are used in this example: animal
and plantG:
animal plantG
1 a Ant 1 a Apple
2 b Bird 2 b Banana
3 c Cat 3 c Coconut
4 d Dog 4 d Dewberry
5 e Eagle 5 e Eggplant
6 f Frog 6 g Fig
The following program merges these data sets and prints the results:
data merged;
merge animal plantG;
run;
proc print data=merged; run;
Output 21.23 PROC PRINT Output for One-to-One Merge of Data Sets with an
Equal Number of Observations
The output data set contains all the variables from both input data sets. Notice
that the value for the variable common in observation 6 in the output data set is g.
The values for the common variable, common, in the data set plantG replace the
values for common in the animal data set.
Key Ideas
n The MERGE statement recognizes common variables in the input data sets, but,
without a BY statement, it does not merge observations based on variable values.
Rather, it matches observations implicitly based on row number, regardless of the
variable values.
n If the column names in the input data sets are the same and no BY statement is
specified, then the merge overwrites the values of the common columns. The
values of the common variables in the data set specified last in the MERGE
statement overwrite the values in the previously specified data sets. Column
names (variables) that are not shared by all the input data sets are added as new
columns.
n The DATA step reads the first observation from the first data set and then reads
the first observation from the second data set, and so on.
n The resulting output data set contains all the observations from all the input data
sets, regardless of whether they have the same number of observations.
n You can use the MERGENOBY system option to control log messaging when
performing a one-to-one merge.
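The MERGENOBY= system option accepts the values NOWARN, WARN, and ERROR. The
following is a minimal sketch that asks SAS to write a warning to the log whenever
a MERGE statement is used without a BY statement. It assumes that animal and
plantG contain the values shown in the table for this example.

```sas
data animal;
  input common $ animal $ @@;
  datalines;
a Ant  b Bird  c Cat  d Dog  e Eagle  f Frog
;
data plantG;
  input common $ plant $ @@;
  datalines;
a Apple  b Banana  c Coconut  d Dewberry  e Eggplant  g Fig
;

/* Ask for a log warning when MERGE is used without BY. */
options mergenoby=warn;

data merged;
  merge animal plantG;   /* still runs, but the log now flags the missing BY */
run;

/* Restore the default setting. */
options mergenoby=nowarn;
```

Setting MERGENOBY=ERROR instead stops the DATA step with an error, which can be
useful when a one-to-one merge is never intended in a program.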
See Also
n “MERGE Statement” in SAS DATA Step Statements: Reference
Example Code
In this example, the MERGE statement is used without a BY statement to perform a
one-to-one merge of two data sets that have an unequal number of rows.
The following table shows the input data sets that are used in this example: animal
and plantMissing:
animal plantMissing
1 a Ant 1 a Apple
2 b Bird 2 b Banana
3 c Cat 3 c Coconut
4 d Dog
5 e Eagle
6 f Frog
The data sets animal and plantMissing both contain the variable common, and the
observations are arranged by the values of common. The plantMissing data set has
fewer observations than the animal data set.
The following program merges these unequal data sets and prints the results:
data animal;
input common $ animal $;
datalines;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;
data plantMissing;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
;
data merged;
merge animal plantMissing;
run;
proc print data=merged; run;
Output 21.24 PROC PRINT Output for Merging Data Sets with an Unequal Number
of Observations
Compare the program above to the one-to-one reading of the same data sets using
the SET statement:
data combine;
set animal;
set plantMissing;
run;
proc print data=combine; run;
Output 21.25 PROC PRINT Output for Combining Data Sets with an Unequal
Number of Observations Using the SET Statement
The DATA step stops selecting observations for output after it reads the last
observation in the data set with the fewest observations. Therefore, the
number of observations in the resulting output data set is the number of
observations in the smallest original data set. In this example, the DATA step stops
selecting observations when it reads the last observation in plantMissing, so the
output data set contains three observations.
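One way to confirm the difference between the two techniques is to compare the
observation counts of the output data sets. The following is a minimal sketch that
uses the NOBS= option in the SET statement, assuming that animal and plantMissing
contain the values shown above. The variable names nMerged and nCombine are
illustrative.

```sas
data animal;
  input common $ animal $ @@;
  datalines;
a Ant  b Bird  c Cat  d Dog  e Eagle  f Frog
;
data plantMissing;
  input common $ plant $ @@;
  datalines;
a Apple  b Banana  c Coconut
;

/* One-to-one merge: continues until all input data sets are exhausted. */
data merged;
  merge animal plantMissing;
run;

/* Two SET statements: stops at the first end of file. */
data combine;
  set animal;
  set plantMissing;
run;

data _null_;
  /* NOBS= is assigned at compile time, so no observations are read. */
  if 0 then set merged  nobs=nMerged;
  if 0 then set combine nobs=nCombine;
  put nMerged= nCombine=;   /* writes nMerged=6 nCombine=3 to the log */
  stop;
run;
```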
Key Ideas
n The MERGE statement recognizes common variables in the input data sets, but,
without a BY statement, it does not merge observations based on variable values.
Rather, it matches observations implicitly based on row number, regardless of the
variable values.
n If the column names in the input data sets are the same and no BY statement is
specified, then the merge overwrites the values of the common columns. The
values of the common variables in the data set specified last in the MERGE
statement overwrite the values in the previously specified data sets. Column
names (variables) that are not shared by all the input data sets are added as new
columns.
n The DATA step reads the first observation from the first data set and then reads
the first observation from the second data set, and so on.
n The resulting output data set contains all the observations from all the input data
sets, regardless of whether they have the same number of observations.
n You can use the MERGENOBY system option to control log messaging when
performing a one-to-one merge.
See Also
n “MERGE Statement” in SAS DATA Step Statements: Reference
Example Code
The following example shows how you can get undesirable results when you merge
data sets that contain duplicate values of common variables. In this case, the
merging is done without specifying a BY variable. This type of merging should be
reserved for data sets that have a one-to-one relationship. The input data sets in
this example do not have a one-to-one relationship.
The following table shows the input data sets that are used in this example:
animalDupes and plantDupes:
animalDupes plantDupes
1 a Ant 1 a Apple
2 a Ape 2 b Banana
3 b Bird 3 c Coconut
4 c Cat 4 c Celery
5 d Dog 5 d Dewberry
6 e Eagle 6 e Eggplant
Examples: Merge Data One-to-One 547
The data sets animalDupes and plantDupes contain the variable common, and each
data set contains observations with duplicate values of common.
The following program merges the data sets and prints the results.
/* This program illustrates undesirable results. */
data animalDupes;
input common $ animal $;
datalines;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
data plantDupes;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
data merged;
merge animalDupes plantDupes;
run;
proc print data=merged; run;
Output 21.26 PROC PRINT Output for Undesirable Results When Merging Data Sets
That Have Duplicate Values of Common Variables
This method works as expected on data that has a one-to-one relationship. Because
the data in this example has both one-to-many and many-to-one relationships, you
might get unwanted results.
Key Ideas
n The MERGE statement recognizes common variables in the input data sets, but,
without a BY statement, it does not merge observations based on variable values.
Rather, it matches observations implicitly based on row number, regardless of the
variable values.
n If the column names in the input data sets are the same and no BY statement is
specified, then the merge overwrites the values of the common columns. The
values of the common variables in the data set specified last in the MERGE
statement overwrite the values in the previously specified data sets. Column
names (variables) that are not shared by all the input data sets are added as new
columns.
n The DATA step reads the first observation from the first data set and then reads
the first observation from the second data set, and so on.
n The resulting output data set contains all the observations from all the input data
sets, regardless of whether they have the same number of observations.
n You can use the MERGENOBY system option to control log messaging when
performing a one-to-one merge.
See Also
n “MERGE Statement” in SAS DATA Step Statements: Reference
Example Code
The following example shows the undesirable results obtained from using the one-
to-one merge to combine data sets that have different values for their common
variable.
In this example, the data sets animalMissing and plantMissing2 have different
values for the variable common.
The following table shows the input data sets that are used in this example:
animalMissing and plantMissing2:
animalMissing plantMissing2
1 a Ant 1 a Apple
2 c Cat 2 b Banana
3 d Dog 3 c Coconut
4 e Eagle 4 e Eggplant
5 f Fig
The following program produces the data set merged and prints the results:
data merged;
merge animalMissing plantMissing2;
run;
proc print data=merged; run;
Output 21.27 PROC PRINT Output Showing Undesirable Results with a One-to-One
Merge of Data Sets with Different Values for the Common Variable
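Before relying on a one-to-one merge, you can check whether the like-named variable
agrees row by row. The following is a minimal sketch that renames common in each
input so that both values survive the merge, assuming that animalMissing and
plantMissing2 contain the values shown in the table for this example. The names
commonAnimal, commonPlant, and keyMismatch are illustrative.

```sas
data animalMissing;
  input common $ animal $ @@;
  datalines;
a Ant  c Cat  d Dog  e Eagle
;
data plantMissing2;
  input common $ plant $ @@;
  datalines;
a Apple  b Banana  c Coconut  e Eggplant  f Fig
;

/* Pair the data sets by row number and keep only the rows */
/* in which the two values of common disagree.             */
data keyMismatch;
  merge animalMissing(rename=(common=commonAnimal))
        plantMissing2(rename=(common=commonPlant));
  if commonAnimal ne commonPlant;
run;

proc print data=keyMismatch; run;
```

With these values, keyMismatch contains rows 2, 3, and 5 of the pairing. In row 5,
commonAnimal is blank because animalMissing has only four observations. An empty
keyMismatch data set would suggest that a one-to-one merge pairs the rows as
intended.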
Key Ideas
n The MERGE statement recognizes common variables in the input data sets, but,
without a BY statement, it does not merge observations based on variable values.
Rather, it matches observations implicitly based on row number, regardless of the
variable values.
n If the column names in the input data sets are the same and no BY statement is
specified, then the merge overwrites the values of the common columns. The
values of the common variables in the data set specified last in the MERGE
statement overwrite the values in the previously specified data sets. Column
names (variables) that are not shared by all the input data sets are added as new
columns.
n The DATA step reads the first observation from the first data set and then reads
the first observation from the second data set, and so on.
n The resulting output data set contains all the observations from all the input data
sets, regardless of whether they have the same number of observations.
n You can use the MERGENOBY system option to control log messaging when
performing a one-to-one merge.
See Also
n “MERGE Statement” in SAS DATA Step Statements: Reference
Example Code
In this example, two SET statements are used to combine observations from one
data set together with the observations from another data set. The input data sets
animal and plantG contain a common variable, named common. The values for the
shared variable are equal except in row 6, where the value is f in the animal data
set and g in the plantG data set.
The following table shows the input data sets that are used in this example, animal
and plantG:
animal               plantG
1   a   Ant          1   a   Apple
2   b   Bird         2   b   Banana
3   c   Cat          3   c   Coconut
4   d   Dog          4   d   Dewberry
5   e   Eagle        5   e   Eggplant
6   f   Frog         6   g   Fig
Examples: Combine Data One-to-One 551
The following program combines the data sets and prints the results:
data combine;
   set animal;
   set plantG;
run;
proc print data=combine; run;
Output 21.28 PROC PRINT Output for Combine Data Sets That Contain an Equal
Number of Observations Using the SET Statement
Because the SET statement does not merge observations by matching the values of
a common variable, using this method on data sets that have different values for
the like-named variables can cause undesired results.
In this example, the animal data set has a value of f for the like-named variable,
common, in row 6. The plantG data set has a value of g for common in row 6. The data
set plantG is specified in the last SET statement, so the values for common in the
plantG data set overwrite the values for common in the animal data set.
Key Ideas
• The values of common variables from the data set specified last in the SET
statement replace the values of the common variables from previous data sets.
• The DATA step reads the first observation from the first data set and then reads
the first observation from the second data set, and so on.
• The resulting output data set contains all the variables from all the input data sets.
• The DATA step stops selecting observations for output after it reads the last
observation in the data set with the fewest observations. Therefore, the
number of observations in the resulting output data set is the number of
observations in the smallest original data set.
• To use the SET statement to combine data sets that have an unequal number of
observations, you can use the KEY= option to directly access and match the
observations by a common variable.
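A keyed lookup is one way to match by value rather than by position with the SET statement. The following is a minimal sketch, not the guide's own example: it assumes the non-key variables are named animal and plant, builds plantG with a simple index on common, and uses the hypothetical output name combineByKey.

```sas
/* Input data from the tables above; the names animal and plant are assumed */
data animal;
   input common $ animal $;
   datalines;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;

/* Create the lookup data set with a simple index on common */
data plantG(index=(common));
   input common $ plant $;
   datalines;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
g Fig
;

data combineByKey;
   set animal;
   set plantG key=common;      /* indexed lookup on the current value of common */
   if _iorc_ ne 0 then do;     /* no observation in plantG has this value */
      _error_ = 0;             /* clear the error flag set by the failed lookup */
      call missing(plant);     /* do not carry forward the previous match */
   end;
run;

proc print data=combineByKey; run;
```

Resetting _ERROR_ and clearing plant after a failed lookup keeps an unmatched row (here, the row where common is f) from inheriting the plant value of the previous match.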
552 Chapter 21 / Combining Data
See Also
• KEY= data set option
Example Code
In this example, a hash table is used to merge two sets of data that have a common
variable. The hash table is created from a SAS data set that is loaded into memory
and is available for use by the DATA step that created it. The common variable in
the hash table is unique and is used as a key that provides very fast lookup into the
internal memory table.
The second data set is used as the base data set. The DATA step reads
observations from the base data set and uses the common variable to find a match
in the hash table. Matching observations are written out to the output data set
named in the DATA statement.
The following tables show the input data sets product_list and supplier, with the
common variable Supplier_ID highlighted in each of the data sets:
product_list
supplier
In this program, the product_list data set is used as the base data set. The
DECLARE statement names the in-memory location of the supplier data set. The
DEFINEKEY method identifies the unique key variable that is used to join the data
sets. The DEFINEDATA method lists additional variables that we want loaded into
memory.
Note: The base data set contains duplicates of the key. The observations are read
and processed one at a time, and the duplicate values do not affect processing.
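data supplier_info;
   drop rc;
   /* 1: define the data variables that the hash object returns */
   length Supplier_Name $40 Supplier_Address $45 Country $2;
   if _N_=1 then do;
      /* 2: load work.supplier into memory once, keyed by Supplier_ID */
      declare hash S(dataset:'work.supplier');
      S.definekey('Supplier_ID');
      S.definedata('Supplier_Name','Supplier_Address','Country');
      S.definedone();
      /* 3: initialize the variables to avoid uninitialized-variable notes */
      call missing(Supplier_Name,Supplier_Address,Country);
   end;
   /* 4: read the base data set one observation at a time */
   set work.product_list;
   /* 5: look up Supplier_ID; on a match, the data values are copied to the PDV */
   rc=S.find();
run;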
proc print data=supplier_info;
var Product_ID Supplier_ID Supplier_Name
Supplier_Address Country;
title "Product Information";
run;
title;
Key Ideas
• A SAS hash table contains rows (hash entries) and columns (hash variables).
• Each hash entry must have at least one key column and one data column. Values
can be hardcoded or loaded from a SAS data set.
• A hash table resides completely in memory, making its operations fast. The data
does not need to be pre-sorted.
• A hash table is temporary: once the DATA step has stopped execution, it ceases to
exist. Thus, it cannot be reused in any subsequent step. However, its content can
be saved in a SAS data set or external database.
• The hash object is sized dynamically.
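The hash lookup can be tried on a small scale with the sketch below. The rows are hypothetical stand-ins (the real product_list and supplier data sets are not reproduced here and have more columns), and the output name matched_only is made up. Unlike the program above, this variant uses a subsetting IF so that only observations whose key is found in the hash table are written out.

```sas
/* Hypothetical rows, for illustration only */
data work.supplier;
   input Supplier_ID Supplier_Name :$40. Country :$2.;
   datalines;
50 Acme US
109 Borealis CA
;

data work.product_list;
   input Product_ID Supplier_ID;
   datalines;
1001 50
1002 109
1003 50
1004 999
;

data matched_only;
   length Supplier_Name $40 Country $2;
   if _N_=1 then do;
      declare hash S(dataset:'work.supplier');   /* load the lookup table once */
      S.definekey('Supplier_ID');
      S.definedata('Supplier_Name','Country');
      S.definedone();
      call missing(Supplier_Name,Country);
   end;
   set work.product_list;
   if S.find()=0;   /* FIND returns 0 on a match; other rows are not output */
run;

proc print data=matched_only; run;
```

The base data set repeats Supplier_ID 50, which is fine: each observation triggers its own lookup against the unique key in the hash table.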
See Also
• “OUTPUT Method” in SAS Component Objects: Reference
Example Code
In this example, the UPDATE statement is used to update a master data set based
on new values in a transaction data set. The data set, master, contains the original
values of the shared variables common and plant. The transaction data set,
plantNew, contains the new values for the shared variable plant. The goal is to
update the master data set with the new values for plant. Specifically, the value
Eggplant for plant in row 5 should be replaced with the value Escarole from the
plantNew data set.
The following table shows the input data set, master, and the transaction data set,
plantNew.
master plantNew
The program first creates the two input data sets, then updates the master data set
based on the values of the BY variable, common. The PRINT procedure prints the
results:
Examples: Update Data 557
data master2;
update master plantNew;
by common;
run;
proc print data=master2; run;
Output 21.29 PROC PRINT Output for Using the UPDATE Statement to Update
Data
Key Ideas
• The UPDATE statement enables you to update values in one data set (the master
data set) based on the values in another data set (transaction data set). The
UPDATE statement creates a new output data set.
• The BY statement is required and the input data sets must contain an index or be
sorted by the values of the BY variables.
• Because the UPDATE statement creates a new file when it generates output, you
can add, delete, or rename variables when you perform an update.
• By default, missing values in the transaction data set do not replace existing
values in the master data set. You can change this behavior so that missing values
in the transaction data set replace values in the master data set by specifying
NOMISSINGCHECK in the UPDATEMODE= option.
• If the transaction data set contains duplicate values of the BY variable, then the
values from the transaction data set replace the values in the master data set.
• If an observation in the transaction data set does not have a corresponding
observation in the master data set, then SAS adds an observation to the master
output data set. Observations are matched based on the value of the BY variable.
• If no changes need to be made to an observation in the master data set, then that
observation does not need to be included in the transaction data set.
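Because the input tables are not reproduced here, the following sketch creates small stand-in data sets so that the update can be run end to end. The rows are hypothetical and modeled on the description above (Eggplant in the fifth row of master, Escarole in plantNew); only the variables common and plant are included.

```sas
/* Hypothetical master data, sorted by the BY variable common */
data master;
   input common $ plant $;
   datalines;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
f Fig
;

/* Hypothetical transaction data: only the observation that changes */
data plantNew;
   input common $ plant $;
   datalines;
e Escarole
;

/* Apply the transaction to the master and write a new data set */
data master2;
   update master plantNew;
   by common;
run;

proc print data=master2; run;
```

Only the observation that changes appears in the transaction data set, which illustrates the last key idea: unchanged master observations do not need a matching transaction.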
See Also
• “Updating” on page 485
Example Code
In this example, the UPDATE statement is used to update a master data set based
on new values in a transaction data set. The transaction data set contains duplicate
values of the common variable in observations 4 and 5. The data sets also share the
common variable plant. This example shows what happens to common variables
that are not the BY variable when there are duplicate values for the BY variable.
The following table shows the master input data set, master, and the transaction
data set, plantNewDupes:
master plantNewDupes
The following program first creates the two input data sets, then updates the
master data set based on the values of the BY variable, common. The
transaction data set contains duplicate values for the BY variable, common. Because
plant was not specified as a BY variable, its values in the master data set
are replaced by the values in the transaction data set. The value Dewberry in
the master data set is replaced by Dill, which is the last value for plant in the
transaction data set. The PRINT procedure prints the results:
data master;
update master plantNewDupes;
by common;
run;
proc print data=master; run;
CAUTION
Values of the BY variable must be unique for each observation in the master data set. If
the master data set contains duplicate values for the BY variable, then only the first
observation containing the variable is updated and subsequent observations containing
the variable are ignored. SAS writes a warning message to the log when the DATA step
executes.
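One way to check a master data set for duplicate BY values before running an update is to let PROC SORT separate them. This is a sketch under assumptions: the rows and the output names masterUnique and masterDupes are hypothetical.

```sas
/* Hypothetical master data with a repeated BY value (d) */
data master;
   input common $ plant $;
   datalines;
a Apple
d Dewberry
d Dill
;

/* NODUPKEY keeps the first observation for each value of common; */
/* DUPOUT= collects the observations that were removed.           */
proc sort data=master out=masterUnique nodupkey dupout=masterDupes;
   by common;
run;

/* Any rows printed here are duplicates that UPDATE would not apply changes to */
proc print data=masterDupes; run;
```

If masterDupes is empty, every BY value in the master data set is unique and the caution above does not apply.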
Output 21.30 PROC PRINT Output for Update Data Sets with Duplicate Values of
the BY Variable
Key Ideas
• The UPDATE statement enables you to update values in one data set (the master
data set) based on the values in another data set (transaction data set). The
UPDATE statement creates a new output data set.
• The BY statement is required and the input data sets must contain an index or be
sorted by the values of the BY variables.
• Because the UPDATE statement creates a new file when it generates output, you
can add, delete, or rename variables when you perform an update.
• By default, missing values in the transaction data set do not replace existing
values in the master data set. You can change this behavior so that missing values
in the transaction data set replace values in the master data set by specifying
NOMISSINGCHECK in the UPDATEMODE= option.
• If the transaction data set contains duplicate values of the BY variable, then the
values from the transaction data set replace the values in the master data set.
See Also
• “Updating” on page 485
Example Code
In this example, the UPDATE statement is used with the BY statement to update a
master data set based on new values in a transaction data set.
The master data set, master, contains a missing value for the variable plant in the
first observation. Not all of the values of the common BY variable (common) are
included.
The transaction data set, minerals, contains a new variable (mineral), a new value
for the BY variable, common, and missing values for several observations.
The following table shows the input data sets, master and minerals:
master                          minerals
4   e   Eagle   Eggplant        4   e   .        .
5   f   Frog    Fig             5   f   Fennel   .
                                6   g   Grape    Garnet
The following program updates the master data set based on the values in the
minerals data set and prints the results:
data master;
update master minerals;
by common;
run;
Output 21.31 PROC PRINT Output for Using UPDATE for Processing Unmatched
Observations, Missing Values, and New Variables
Note the following points about the updated master data set:
• The variable mineral was added to the master output data set and is set to
missing for some observations.
• Values in observations 2 and 6 in the transaction data set do not have
corresponding values in the master data set. They are added to the master data
set as new observations.
• The value for plant in observation 4 is not changed to missing even though it is
missing in the transaction data set.
• Three observations in the new data set have updated values for the variable
plant.
If you want the values that are missing in the transaction data set to be updated as
missing in the master output data set, then specify the UPDATEMODE= option in
the UPDATE statement:
data master;
   update master minerals updatemode=nomissingcheck;
   by common;
run;
proc print data=master;
   title "Updated Data Set master";
run;
In the following PROC PRINT output, the value of plant in observation 5 is set to
missing because it is missing in the transaction data set and the
UPDATEMODE=NOMISSINGCHECK option is in effect.
Output 21.32 PROC PRINT Output for the Updated Master Data Set with Values
Updated to Missing
Key Ideas
• The UPDATE statement enables you to update values in one data set (the master
data set) based on the values in another data set (transaction data set). The
UPDATE statement creates a new output data set.
• The BY statement is required and the input data sets must contain an index or be
sorted by the values of the BY variables.
• Because the UPDATE statement creates a new file when it generates output, you
can add, delete, or rename variables when you perform an update.
• By default, missing values in the transaction data set do not replace existing
values in the master data set. You can change this behavior so that missing values
in the transaction data set replace values in the master data set by specifying
NOMISSINGCHECK in the UPDATEMODE= option.
• If the transaction data set contains duplicate values of the BY variable, then the
values from the transaction data set replace the values in the master data set.
• If an observation in the transaction data set does not have a corresponding
observation in the master data set, then SAS adds an observation to the master
output data set. Observations are matched based on the value of the BY variable.
• If no changes need to be made to an observation in the master data set, then that
observation does not need to be included in the transaction data set.
Example: Modify Data 563
See Also
• “Updating” on page 485
Example Code
In this example, the MODIFY statement is used to update a master data set based
on values contained in a transaction data set. The observations in the transaction
data set are matched to the observations in the master data set by matching the
values of the common variable, partNumber.
The data in this example represents inventory for a warehouse that stores tools
and hardware. Each tool is uniquely identified by its part number. The master data
set, Inventory, holds a record of the warehouse’s inventory. It is updated to reflect
changes when a warehouse receives a new shipment of items. The InventoryAdd
data set is the transaction data set. The transaction data set contains information
about new items that are being added to the inventory (new kinds of tools). The
transaction data set also adds inventory (newStock) to existing items and changes
the price (newPrice) for existing items.
The following table shows the master input data set, Inventory, and the transaction
data set, InventoryAdd:
Inventory
InventoryAdd
To begin this example, first sort and print the Inventory and InventoryAdd data
sets for comparison.
proc sort data=Inventory; by partNumber; run;
proc sort data=InventoryAdd; by partNumber; run;
proc print data=Inventory; title "Inventory"; run;
proc print data=InventoryAdd; title "InventoryAdd"; run;
Note: The SORT procedure is not required when modifying a data set using the
MODIFY statement. The data sets in this example are sorted to better show the
differences between the two data sets.
Output 21.33 PROC PRINT Output for the Master Inventory Data Set Sorted by
partNumber
Output 21.34 PROC PRINT Output for the Transaction Data Set InventoryAdd
Sorted by partNumber
Now, modify the master data set based on the new information in the transaction
data set, matching the observations by the unique values of the variable
partNumber:
data Inventory;
   modify Inventory InventoryAdd;   /* 1 */
   by partNumber;
   select (_iorc_);                 /* 2 */
1 The MODIFY statement loads the data from the master and transaction data
sets. The BY statement matches observations from each data set based on the
unique values of the variable partNumber.
2 If matches for partNumber from the transaction data set are found for
partNumber in the master data set, then the _IORC_ automatic variable is
automatically set to a code of _SOK.
3 The %SYSRC autocall macro checks to see whether the value of _IORC_ is
_SOK. If the value is _SOK, then the SELECT statement executes the first DO
statement block. Because the observation in the transaction data set matches
the observation in the master data set, the values in the observation can be
updated by being replaced.
4 The REPLACE statement updates the master data set by replacing its
observation with the observation from the transaction data set. The REPLACE
statement updates observations 4, 7, and 8, (highlighted in blue in the output)
with new values for stock and price. The stock values are updated based on
the values for newStock in the transaction data set. The price values are
updated based on the values for newPrice in the transaction data set. The
receivedDate values for these observations are not updated because these are
existing items that were received in the past.
5 If no matches for partNumber in the transaction data set are found for
partNumber in the master data set, then the _IORC_ automatic variable is
automatically set to a code of _DSENMR, which means that no match was
found. The %SYSRC autocall macro checks to see whether the value of _IORC_
is _DSENMR. If the value is _DSENMR, then the SELECT statement executes the
second DO block. Because the observation in the transaction data set does not
exist in the master data set, the values cannot simply be replaced. An entire
observation is created and added to the master data set.
6 The OUTPUT statement writes the new observation to the master data set. The
OUTPUT statement adds observations 1, 2, and 5 to the master data set (see
the observations highlighted in yellow in the output). The receivedDate values
for these observations are updated based on the returned value for the TODAY
function.
7 If neither condition is met, the OTHERWISE statement executes the last DO
block and the PUT statement writes an error message to the log.
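The numbered notes above can be sketched as one complete DATA step. This is an illustration consistent with notes 1 through 7, not a reproduction of the guide's program: the rows and part numbers are hypothetical, the data sets are reduced to a few variables, and adding newStock to the existing stock is an assumption (the notes say only that stock is updated based on newStock).

```sas
/* Hypothetical master data */
data Inventory;
   input partNumber $ stock price receivedDate :date9.;
   format receivedDate date9.;
   datalines;
K89R 5 4.50 01JAN2024
M4J7 10 20.00 01JAN2024
;

/* Hypothetical transaction data: one existing part, one new part */
data InventoryAdd;
   input partNumber $ newStock newPrice;
   datalines;
K89R 3 4.75
XC88 6 12.00
;

data Inventory;
   modify Inventory InventoryAdd;
   by partNumber;
   select (_iorc_);
      when (%sysrc(_sok)) do;          /* match found in the master data set */
         stock = stock + newStock;     /* assumption: new stock is added to old */
         price = newPrice;
         replace;                      /* rewrite the existing observation */
      end;
      when (%sysrc(_dsenmr)) do;       /* no match: this is a new part */
         stock = newStock;
         price = newPrice;
         receivedDate = today();
         output;                       /* append a new observation */
         _error_ = 0;                  /* clear the error flag set by the non-match */
      end;
      otherwise do;                    /* any other return code is unexpected */
         put "ERROR: unexpected value of _IORC_: " _iorc_=;
         _error_ = 0;
         stop;
      end;
   end;
run;

proc print data=Inventory; run;
```

Resetting _ERROR_ in the non-match branch keeps the expected "no match" condition from being reported as an error in the log.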
In the output below, the transaction data set contains three new items: hammer,
wrench, and socket. Because some observations do not exist in the master data set
and are being added from the transaction data set, an explicit OUTPUT statement
is needed. For those observations that already exist in the master data set, the
REPLACE statement is needed to update the values for these observations.
The program uses the OUTPUT statement to add observations 1, 2, and 5 to the
master data set, and it uses the REPLACE statement to update observations 4, 7,
and 8 with new values for stock and price.
Output 21.35 PROC PRINT Output for the Modified Inventory Master Data Set
Sorted by partNumber
Key Ideas
• The MODIFY statement updates the existing data set without creating a new
output data set. Unlike the MODIFY statement, the SET, MERGE, and UPDATE
statements create a new output data set. For more information, see Table 21.2 on
page 489 and Table 21.4 on page 492.
• With MODIFY, you cannot add, delete, rename, or change variables.
• Both the master data set and the transaction data set can have observations with
duplicate values of the BY variables. MODIFY treats the duplicates as follows:
◦ If duplicate values exist in the master data set, only the first occurrence is
updated.
◦ If duplicates exist in the transaction data set, the last value of the duplicated
variable overwrites the previous duplicated value. If you specify an
See Also
• “Modifying” on page 487
22
Using Indexes
Indexes in SAS
A SAS index is an optional component of a SAS data set that enables SAS to access
observations in the SAS data set quickly and efficiently. The purpose of SAS
indexes is to optimize WHERE-clause processing and to facilitate BY-group
processing.
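As a brief illustration of both purposes, the sketch below creates a simple index with PROC DATASETS and then uses the data set in a WHERE expression and in BY-group processing without sorting it first. The data set work.parts and its rows are hypothetical.

```sas
/* Hypothetical data, deliberately not in sorted order */
data work.parts;
   input partNumber $ stock;
   datalines;
XC88 6
K89R 5
M4J7 10
;

/* Create a simple index on partNumber */
proc datasets library=work nolist;
   modify parts;
   index create partNumber;
quit;

/* SAS can consider the index when it evaluates this WHERE expression */
proc print data=work.parts;
   where partNumber='M4J7';
run;

/* BY-group processing on unsorted data: the index supplies the order */
data _null_;
   set work.parts;
   by partNumber;
   put partNumber= stock=;
run;
```

Whether SAS actually uses the index for a given WHERE expression depends on its own cost estimate, so the index is an opportunity for optimization rather than a guarantee.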
For information about SAS indexes, see the documents that are listed in the
following table.
SPD Engine
See “Features That Boost Processing Performance” in SAS Scalable Performance
Data Engine: Reference.
You can use the same language elements as with the V9 engine to create or use SAS
indexes. Indexes are stored differently than with the V9 engine. Additional language
elements for the SPD Engine enable more control over index usage. In addition, the
engine provides an automatic sort for BY processing, so an unsorted data set does
not require an index.
CAS engine
See “Indexing” in SAS Cloud Analytic Services: Fundamentals.
You can use one of several CAS actions to create an index. The syntax is different
from the V9 engine. You can use an index to improve WHERE processing, especially
on very large tables.
23
Using Arrays
Note: Arrays in SAS are different from those in many other programming
languages. In SAS, an array is not a data structure. An array is just a convenient
way of temporarily identifying a group of variables.
array processing
is a method that enables you to perform the same tasks for a series of related
variables.
array reference
is a method to reference the elements of an array.
one-dimensional array
is a simple grouping of variables that, when processed, results in output that can
be represented in simple row format.
multidimensional array
is a more complex grouping of variables that, when processed, results in output
that could have two or more dimensions, such as columns and rows.
• repeating an action
Rules for Referencing Arrays 575
An array definition is in effect only for the duration of the DATA step. If you want to
use the same array in several DATA steps, you must redefine the array in each step.
You can, however, redefine the array with the same variables in a later DATA step
by using a macro variable. A macro variable is useful for storing the variable names
that you need, as shown in this example:
%let list=NC SC GA VA;
data one;
array state{*} &list;
… more SAS statements …
run;
data two;
array state{*} &list;
… more SAS statements …
run;
One-Dimensional Array
The following figure is a conceptual representation of two one-dimensional arrays,
Misc and Mday.
Arrays   Variables
         1      2      3      4      5      6      7      8
MISC     misc1  misc2  misc3  misc4  misc5  misc6  misc7  misc8
MDAY     mday1  mday2  mday3  mday4  mday5  mday6  mday7
Misc contains eight elements, the variables Misc1 through Misc8. To reference the
data in these variables, use the form Misc{n}, where n is the element number in the
array. For example, Misc{6} is the sixth element in the array.
Mday contains seven elements, the variables Mday1 through Mday7. Mday{3} is the
third element in the array.
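A short sketch of the Misc array in use follows. The data values and the variables total and sixth are hypothetical; the point is only that Misc{i} and the variable in position i are the same thing.

```sas
data miscTotals(drop=i);
   input misc1-misc8;
   array misc{8} misc1-misc8;     /* misc{1} is misc1, ..., misc{8} is misc8 */
   total = 0;
   do i = 1 to 8;
      /* skip missing values so that they do not make the total missing */
      if misc{i} ne . then total = total + misc{i};
   end;
   sixth = misc{6};               /* same value as the variable misc6 */
   datalines;
1 2 3 4 5 6 7 8
10 . 30 40 50 60 70 80
;

proc print data=miscTotals; run;
```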
Two-Dimensional Array
The following figure is a conceptual representation of the two-dimensional array
Expenses.
Syntax for Defining and Referencing an Array 577
First Dimension:          Second Dimension:
Expense Categories        Days of the Week (1-7) and Total (8)
                          1        2        3        4        5        6        7        8
 3  Pers. Auto            peraut1  peraut2  peraut3  peraut4  peraut5  peraut6  peraut7  peraut8
 5  Airfare               airlin1  airlin2  airlin3  airlin4  airlin5  airlin6  airlin7  airlin8
 7  Registration Fees     regfee1  regfee2  regfee3  regfee4  regfee5  regfee6  regfee7  regfee8
 8  Other                 other1   other2   other3   other4   other5   other6   other7   other8
 9  Tips (non-meal)       tips1    tips2    tips3    tips4    tips5    tips6    tips7    tips8
10  Meals                 meals1   meals2   meals3   meals4   meals5   meals6   meals7   meals8
The Expenses array contains ten groups of eight variables each. The ten groups
(expense categories) comprise the first dimension of the array, and the eight
variables (days of the week) comprise the second dimension. To reference the data
in the array variables, use the form Expenses{m,n}, where m is the element number
in the first dimension of the array, and n is the element number in the second
dimension of the array. Expenses{6,4} references the value of dues for the fourth
day (the variable is Dues4).
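To define a simple or a multidimensional array, use the ARRAY statement. An
ARRAY statement has the following form:
ARRAY array-name{number-of-elements} <$> <length> <array-elements> <(initial-value-list)>;
where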
array-name
is a SAS name that identifies the group of variables.
number-of-elements
is the number of variables in the group. You must enclose this value in either
parentheses (), braces {}, or brackets [].
$
specifies that the elements in the array are character elements.
length
specifies the length of the elements in the array that have not been previously
assigned a length.
array-elements
is a list of the names of the variables in the group. All variables that are defined
in a given array must be of the same type, either all character or all numeric.
initial-value-list
is a list of the initial values for the corresponding elements in the array.
For complete information, see the “ARRAY Statement” in SAS DATA Step
Statements: Reference.
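The pieces of the ARRAY statement described above can be seen together in a small sketch. The array names grade and weight, their elements, and the initial values are all hypothetical.

```sas
data _null_;
   /* Character array: $ and a length of 1, elements g1-g3, initial values A B C */
   array grade{3} $ 1 g1-g3 ('A' 'B' 'C');

   /* Numeric array with initial values and no element list: */
   /* SAS creates the variables weight1-weight3              */
   array weight{3} (0.5 0.3 0.2);

   do i = 1 to 3;
      put i= grade{i} weight{i};   /* write each pair of elements to the log */
   end;
run;
```

Each array holds elements of a single type, which is why the character and numeric values are kept in separate arrays.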
To reference an array that was previously defined in the same DATA step, use an
Array Reference statement. An array reference has the following form:
array-name {subscript}
where
array-name
is the name of an array that was previously defined with an ARRAY statement in
the same DATA step.
subscript
specifies the subscript, which can be a numeric constant, the name of a variable
whose value is the number, a SAS numeric expression, or an asterisk (*).
Note: Subscripts in SAS are 1-based by default, and not 0-based as they are in
some other programming languages.
For complete information, see the Array Reference statement in the SAS DATA Step
Statements: Reference.
When you define an array, SAS assigns each array element an array reference with
the form array-name{subscript}, where subscript is the position of the variable in the
list. The following table lists the array reference assignments for the previous
ARRAY statement:
Variable        Array Reference
Reference       books{1}
Usage           books{2}
Introduction    books{3}
Later in the DATA step, when you want to process the variables in the array, you
can refer to a variable by either its name or its array reference. For example, the
names Reference and Books{1} are equivalent.
DO index-variable=1 TO number-of-elements-in-array;
… more SAS statements …
END;
To execute the loop as many times as there are variables in the array, specify that
the values of index-variable are 1 TO number-of-elements-in-array. SAS increases
the value of index-variable by 1 before each new iteration of the loop. When the
value exceeds the number-of-elements-in-array, SAS stops processing the loop. By
default, SAS automatically includes index-variable in the output data set. Use a
DROP statement or the DROP= data set option to prevent the index variable from
being written to your output data set.
An iterative DO loop that executes three times and has an index variable named
count has the following form:
do count=1 to 3;
… more SAS statements …
end;
The first time that the loop processes, the value of count is 1; the second time, 2;
and the third time, 3. At the beginning of the fourth iteration, the value of count is
4, which exceeds the specified range and causes SAS to stop processing the loop.
The following example uses the index variable count as the subscript of array
references inside a DO loop:
array books{3} Reference Usage Introduction;
do count=1 to 3;
if books{count}=. then books{count}=0;
end;
When the value of count is 1, SAS reads the array reference as Books{1} and
processes the IF-THEN statement on Books{1}, which is the variable Reference.
When count is 2, SAS processes the statement on Books{2}, which is the variable
Usage. When count is 3, SAS processes the statement on Books{3}, which is the
variable Introduction.
Processing Simple Arrays 581
• replace the array subscript count with the current value of count for each
iteration of the IF-THEN statement
• locate the variable with that array reference and process the IF-THEN
statement on it
• replace missing values with zero if the condition is true.
The following DATA step defines the array Book and processes it with a DO loop.
options linesize=80 pagesize=60;
data changed(drop=count);
input Reference Usage Introduction;
array book{3} Reference Usage Introduction;
do count=1 to 3;
if book{count}=. then book{count}=0;
end;
datalines;
45 63 113
. 75 150
62 . 98
;
you use the asterisk to designate the number of elements. In the following example,
the array C1Temp references five variables with temperature measures.
array c1temp{*} c1t1 c1t2 c1t3 c1t4 c1t5;
If you specify the number of elements explicitly, you can omit the names of the
variables or array elements in the ARRAY statement. SAS then creates variable
names by concatenating the array name with the numbers 1, 2, 3, and so on. If a
variable name in the series already exists, SAS uses that variable instead of
creating a new one. In the following example, the array c1t references five variables:
c1t1, c1t2, c1t3, c1t4, and c1t5.
array c1t{5};
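The naming rule can be checked with a small sketch. The data set name readings and the value 99 are hypothetical; c1t3 is created before the ARRAY statement so that the array picks up the existing variable as its third element.

```sas
data readings(drop=i);
   c1t3 = 99;          /* already exists, so the array reuses it as element 3 */
   array c1t{5};       /* elements are c1t1, c1t2, c1t3, c1t4, and c1t5 */
   do i = 1 to 5;
      if c1t{i} = . then c1t{i} = 0;   /* fill only the newly created elements */
   end;
run;

proc print data=readings; run;
```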
Variations on Basic Array Processing 583
DIMn(array-name)
You can also use the DIM function when you specify the number of elements in the
array with an asterisk. Here are some examples of the DIM function:
• do i=1 to dim(days);
• do i=1 to dim4(days) by 2;
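A minimal sketch of the first form in a complete DATA step follows. The variables d1 through d7 and the doubling operation are hypothetical; the point is that the loop bound comes from DIM rather than from a hardcoded count.

```sas
data week(drop=i);
   input d1-d7;
   array days{*} d1-d7;        /* asterisk: SAS counts the elements */
   do i = 1 to dim(days);      /* dim(days) returns 7 for this array */
      days{i} = days{i} * 2;
   end;
   datalines;
1 2 3 4 5 6 7
;

proc print data=week; run;
```

If the element list later changes to, say, d1-d10, the loop adjusts without any edit to the DO statement.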
• _NUMERIC_
• _ALL_
You can use these variable list names to reference variables that have been
previously defined in the same DATA step. The _CHARACTER_ variable lists
character values only. The _NUMERIC_ variable lists numeric values only. The
_ALL_ variable lists either all character or all numeric values, depending on how you
previously defined the variables.
You can use the _NUMERIC_ variable list in your program when, for example, you
need to convert currency. In this application, you do not need to know the variable
names. You need only to convert all values to the new currency.
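The currency case can be sketched as follows. The data set prices, its variables, and the rate 0.92 are all made up for illustration. Note that _NUMERIC_ covers only the numeric variables that are defined before the ARRAY statement, so the loop index i, which is created afterward, is not part of the array.

```sas
/* Hypothetical input data */
data prices;
   input item $ cost price;
   datalines;
pen 1.00 2.50
ink 4.00 9.00
;

data converted(drop=i);
   set prices;
   array amounts{*} _numeric_;          /* cost and price; i does not exist yet */
   do i = 1 to dim(amounts);
      amounts{i} = amounts{i} * 0.92;   /* 0.92 is a made-up exchange rate */
   end;
run;

proc print data=converted; run;
```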
For more information about variable lists, see the “ARRAY Statement” in SAS DATA
Step Statements: Reference.
From right to left, the rightmost dimension represents columns; the next dimension
represents rows. Each position farther left represents a higher dimension. The
following ARRAY statement defines a two-dimensional array with two rows and
five columns. The array contains ten variables: five temperature measures (t1
through t5) from two cities (c1 and c2):
array temprg{2,5} c1t1-c1t5 c2t1-c2t5;
SAS places variables into a multidimensional array by filling all rows in order,
beginning at the upper left corner of the array (known as row-major order). You can
think of the variables as having the following arrangement:
c1t1 c1t2 c1t3 c1t4 c1t5
c2t1 c2t2 c2t3 c2t4 c2t5
To refer to the elements of the array later with an array reference, you can use the
array name and subscripts. The following table lists some of the array references
for the previous example:
c1t1 temprg{1,1}
c1t2 temprg{1,2}
c2t2 temprg{2,2}
c2t5 temprg{2,5}
DO index-variable-1=1 TO number-of-rows;
DO index-variable-2=1 TO number-of-columns;
... more SAS statements ...
END;
END;
An array reference can use two or more index variables as the subscript to refer to
two or more dimensions of an array. Use the following form:
array-name {index-variable-1, …, index-variable-n}
The following example creates an array that contains ten variables: five
temperature measures (t1 through t5) from two cities (c1 and c2). The DATA step
contains two DO loops.
• The outer DO loop (DO I=1 TO 2) processes the inner DO loop twice.
• The inner DO loop (DO J=1 TO 5) applies the ROUND function to all the
variables in one row.
For each iteration of the DO loops, SAS substitutes the value of the array element
corresponding to the current values of I and J.
options linesize=80 pagesize=60;
data temps;
array temprg{2,5} c1t1-c1t5 c2t1-c2t5;
input c1t1-c1t5 /
c2t1-c2t5;
do i=1 to 2;
do j=1 to 5;
temprg{i,j}=round(temprg{i,j});
end;
end;
datalines;
89.5 65.4 75.3 77.7 89.3
73.7 87.3 89.9 98.2 35.6
75.8 82.1 98.2 93.5 67.7
101.3 86.5 59.2 35.6 75.7
;
The following data set Temps contains the values of the variables rounded to the
nearest whole number.
The previous example can also use the DIM function to produce the same result:
do i=1 to dim1(temprg);
   do j=1 to dim2(temprg);
      temprg{i,j}=round(temprg{i,j});
   end;
end;
In the following ARRAY statement, the bounds of the first dimension are 1 and 2
and those of the second dimension are 1 and 5:
array test{2,5} test1-test10;
Bounded array dimensions have the following form:
{<lower-1:>upper-1<,…<lower-n:>upper-n>}
Therefore, you can also write the previous ARRAY statements as follows:
array new{1:4} Jackson Poulenc Andrew Parson;
array test{1:2,1:5} test1-test10;
For most arrays, 1 is a convenient lower bound, so you do not need to specify the
lower bound. However, specifying both the lower and the upper bounds is useful
when the array dimensions have beginning points other than 1.
In the following example, ten variables are named Year76 through Year85. The
following ARRAY statements place the variables into two arrays named First and
Second:
array first{10} Year76-Year85;
array second{76:85} Year76-Year85;
In the first ARRAY statement, the element first{4} is variable Year79, first{7} is
Year82, and so on. In the second ARRAY statement, element second{79} is Year79
and second{82} is Year82.
To process the array named Second in a DO group, make sure that the range of the
DO loop matches the range of the array as follows:
do i=76 to 85;
if second{i}=9 then second{i}=.;
end;
The LBOUND and HBOUND functions have the following form:
LBOUNDn(array-name)
HBOUNDn(array-name)
where
n
   is the specified dimension and has a default value of 1.
You can use the LBOUND and HBOUND functions to specify the starting and
ending values of the iterative DO loop to process the elements of the array named
Second:
do i=lbound(second) to hbound(second);
   if second{i}=9 then second{i}=.;
end;
In this example, the index variable in the iterative DO statement ranges from 76 to
85.
To process the array named YEARS in an iterative DO loop, make sure that the
range of the DO loop matches the range of the array as follows:
do i=lbound(years) to hbound(years);
if years{i}=99 then years{i}=.;
end;
For this example, the DIM function would return a value of 5, the total count of
elements in the array YEARS. Therefore, if you used the DIM function instead of the
HBOUND function for the upper bound of the array, the statements inside the DO
loop would not have executed.
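The difference between DIM and HBOUND can be sketched as follows. The ARRAY statement for YEARS is not shown in this section, so the definition below (five elements with bounds 72 through 76), the initial values, and the counter variables are assumptions made only for illustration.

```sas
data _null_;
   /* Hypothetical definition: five elements, lower bound 72, upper bound 76 */
   array years{72:76} first second third fourth fifth (99 10 20 99 30);

   count_dim=0;
   do i=lbound(years) to dim(years);      /* 72 to 5: the body never executes */
      count_dim+1;
   end;

   count_hbound=0;
   do i=lbound(years) to hbound(years);   /* 72 to 76: the body executes five times */
      if years{i}=99 then years{i}=.;
      count_hbound+1;
   end;

   put count_dim= count_hbound=;
   put first= second= third= fourth= fifth=;
run;
```

Under these assumptions, the loop that stops at DIM is skipped because its stop value is lower than its start value, while the loop that stops at HBOUND visits every element.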
The following ARRAY statement arranges the variables in an array by decades. The
rows range from 6 through 9, and the columns range from 0 through 9.
array X{6:9,0:9} X60-X99;
In array X, variable X63 is element X{6,3} and variable X89 is element X{8,9}. To
process array X with iterative DO loops, use one of these methods:
• Method 1:
   do i=6 to 9;
      do j=0 to 9;
         if X{i,j}=0 then X{i,j}=.;
      end;
   end;
• Method 2:
   do i=lbound1(X) to hbound1(X);
      do j=lbound2(X) to hbound2(X);
         if X{i,j}=0 then X{i,j}=.;
      end;
   end;
Both examples change all values of 0 in variables X60 through X99 to missing. The
first example sets the range of the DO groups explicitly. The second example uses
the LBOUND and HBOUND functions to return the bounds of each dimension of
the array.
Examples
data text;
array names{*} $ n1-n5; /* 1 */
array capitals{*} $ c1-c5;
input names{*}; /* 2 */
do i=1 to 5;
capitals{i}=upcase(names{i}); /* 3 */
end;
datalines;
smithers michaels gonzalez hurth frank
;
1 The dollar sign ($) tells SAS to create the elements as character variables. If the
variables have already been declared as character variables, a dollar sign in the
array is not necessary.
2 The INPUT statement reads all the variables in array NAMES.
3 The statement inside the DO loop uses the UPCASE function to change the
values of the variables in array NAMES to uppercase. The statement then stores
the uppercase values in the variables in the CAPITALS array.
data score1(drop=i);
array test{3} t1-t3 (90 80 70); /* 1 */
array score{3} s1-s3;
input id score{*}; /* 2 */
do i=1 to 3;
if score{i}>=test{i} then /* 3 */
do;
NewScore=score{i};
output; /* 4 */
end;
end;
datalines;
1234 99 60 82
5678 80 85 75
;
1 Assign the initial values 90, 80, and 70 to the Test array and assign the variables
s1, s2, and s3 to the Score array.
2 The INPUT statement reads a value for the variable named ID and then reads
values for all the variables in the Score array.
3 If the value of the element in Score is greater than or equal to the value of the
element in Test, the variable NewScore is assigned the value in the element
Score.
4 The OUTPUT statement writes the observation to the SAS data set.
data score2(drop=i);
array test{3} _temporary_ (90 80 70);
array score{3} s1-s3;
input id score{*};
do i=1 to 3;
if score{i}>=test{i} then
do;
NewScore=score{i};
output;
end;
end;
datalines;
1234 99 60 82
5678 80 85 75
;
data sales;
infile datalines;
input Value1 Value2 Value3 Value4;
datalines;
11 56 58 61
22 51 57 61
22 49 53 58
;
data convert(drop=i);
set sales;
array test{*} _numeric_;
do i=1 to dim(test);
test{i} = (test{i}*3);
end;
run;
Chapter 24 / Debugging Errors
You can debug SAS programs by reading the processing messages in the SAS log
and then fixing your code. You can use the
DATA Step Debugger to detect logic errors in a DATA step during execution.
Syntax Errors
Syntax errors occur when program statements do not conform to the rules of the
SAS language. Here are some examples of syntax errors:
• a misspelled SAS keyword
• a missing semicolon
In the following example, the DATA statement is misspelled, and SAS prints a
warning message to the log. Because SAS could interpret the misspelled word, the
program runs and produces output.
date temp; /* 1 */
x=1;
run;
Example Code 24.1 SAS Log: Syntax Error (Misspelled Key Word)
39 date temp;
----
14
WARNING 14-169: Assuming the symbol DATA was misspelled as date.
40 x=1;
41 run;
NOTE: The data set WORK.TEMP has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
42
43 proc print data=temp;
44 run;
NOTE: There were 1 observations read from the data set WORK.TEMP.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Some errors are explained fully by the message that SAS prints in the log. Other
error messages are not as easy to interpret because SAS is not always able to
detect exactly where the error occurred. For example, when you fail to end a SAS
statement with a semicolon, SAS does not always detect the error at the point
where it occurs. The free-format nature of SAS statements (they can begin and end
anywhere) can cause this issue. In the following example, the semicolon at the end
of the DATA statement is missing. SAS prints the word ERROR in the log, identifies
the possible location of the error, prints an explanation of the error, and stops
processing the DATA step.
data temp
x=1;
run;
67 data temp
68 x=1;
-
22
76
ERROR 22-322: Syntax error, expecting one of the following: a name,
a quoted string, (, /, ;, _DATA_, _LAST_, _NULL_.
69 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
70
71 proc print data=temp;
72 run;
NOTE: There were 1 observations read from the data set WORK.TEMP.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Whether subsequent steps are executed depends on which method you use to run
SAS, as well as your operating environment.
Note: You can add these lines to your code to fix unmatched comment tags,
unmatched quotation marks, and missing semicolons:
/* '; * "; */;
quit;
run;
Semantic Errors
Semantic errors occur when the form of the elements in a SAS statement is correct,
but the elements are not valid for that usage. Semantic errors are detected at
compile time and can cause SAS to enter syntax check mode. (For a description of
syntax check mode, see “Syntax Check Mode” on page 606.)
In the following example, SAS detects an invalid reference to the array all at
compile time.
data _null_;
array all{*} x1-x5;
all=3;
datalines;
1 1.5
. 3
2 4.5
3 2 7
3 . .
;
run;
Example Code 24.3 SAS Log: Semantic Error (Invalid Reference to an Array)
81 data _null_;
82 array all{*} x1-x5;
ERROR: invalid reference to the array all.
83 all=3;
84 datalines;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.15 seconds
cpu time 0.01 seconds
90 ;
91
92 run;
93 proc printto; run;
Here is another example of a semantic error that occurs at compile time. In this
DATA step, the libref SomeLib has not been previously assigned in a LIBNAME
statement.
data test;
set somelib.old;
run;
Example Code 24.4 SAS Log: Semantic Error (Libref Not Previously Assigned)
SAS writes a message like "NOTE: SAS went to a new line when input statement
reached past the end of a line." when a semantic error occurs at execution time.
This note is written to the SAS log when FLOWOVER is used and all the variables in
the INPUT statement cannot be fully read.
When a variable is not initialized and the system option VARINITCHK=ERROR has
been issued, SAS stops processing a DATA step and writes an error message to the
SAS log.
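The following minimal sketch shows one way to trigger this behavior. The data set name Work.Demo and the variables Total and Price are hypothetical. The step is expected to stop with an error because Price is never assigned or read.

```sas
options varinitchk=error;   /* treat uninitialized variables as errors */

data work.demo;
   /* Price is never assigned or read, so it is uninitialized here */
   Total = Price * 2;
run;

options varinitchk=note;    /* switch back to writing a note instead */
```

Setting the option back afterward keeps the stricter check from affecting later steps in the same session.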
Execution-Time Errors
Definition
Execution-time errors occur when SAS executes a program that processes data
values. Most execution-time errors produce warning messages or notes in the SAS
log but allow the program to continue executing. The location of an execution-time
error is usually given as line and column numbers in a note or error message.
Note: When you run SAS in noninteractive mode, more serious errors can cause
SAS to enter syntax check mode and stop processing the program.
Out-of-Resources Condition
An execution-time error can also occur when you encounter an out-of-resources
condition, such as a full disk, or insufficient memory for a SAS procedure to
complete. When these conditions occur, SAS attempts to find resources for current
use. For example, SAS might ask the user for permission to perform these actions in
out-of-resource conditions:
• Delete temporary data sets that might no longer be needed.
How SAS responds to an out-of-resources condition depends on
your operating environment. For more information, see “CLEANUP System Option”
in SAS System Options: Reference and in the SAS documentation for your operating
system.
Examples
In the following example, an execution-time error occurs when SAS uses data
values from the second observation to perform the division operation in the
assignment statement. Division by 0 is an invalid mathematical operation and
causes an execution-time error.
data inventory;
input Item $ 1-14 TotalCost 15-20
UnitsOnHand 21-23;
UnitCost=TotalCost/UnitsOnHand;
datalines;
Hammers 440 55
Nylon cord 35 0
Ceiling fans 1155 30
;
123 ;
124
125 proc print data=inventory;
126 format TotalCost dollar8.2 UnitCost dollar8.2;
127 run;
SAS executes the entire step, assigns a missing value for the variable UnitCost in
the output, and writes the following to the SAS log:
• a note that describes the error
• the contents of the program data vector when the error occurred
Note that the values that are listed in the program data vector include the _N_ and
_ERROR_ automatic variables. These automatic variables are assigned temporarily
to each observation and are not stored with the data set.
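One way to avoid the division-by-zero note is to test the divisor before dividing. The sketch below assumes that the Inventory data set from the example above already exists. The output data set name Inventory_Checked is hypothetical.

```sas
data inventory_checked;
   set inventory(drop=UnitCost);
   /* Divide only when the divisor is nonmissing and nonzero */
   if not missing(UnitsOnHand) and UnitsOnHand ne 0 then
      UnitCost=TotalCost/UnitsOnHand;
   else UnitCost=.;   /* assign missing explicitly instead of relying on the error path */
run;
```

With this guard, the observation with zero units on hand still gets a missing UnitCost, but the missing value is assigned deliberately rather than as the result of an invalid operation.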
172 ;
173
174 proc print data=test;
175 run;
NOTE: No variables in data set WORK.TEST.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Data Errors
Definition
Data errors occur when some data values are not appropriate for the SAS
statements that you have specified in the program. For example, if you define a
variable as numeric, but the data value is actually character, SAS generates a data
error. SAS detects data errors during program execution and continues to execute
the program, and does the following:
• writes an invalid data note to the SAS log.
• prints the input line and column numbers that contain the invalid value in the
  SAS log. Unprintable characters appear in hexadecimal. To help determine
  column numbers, SAS prints a rule line above the input line.
• prints the observation under the rule line.
In this example, a character value in the Number variable results in a data error
during program execution:
data age;
input Name $ Number;
datalines;
Sue 35
Joe xx
Steve 22
;
The SAS log shows that there is an error in line 8, position 5–6 of the program.
240 ;
241
242 proc print data=age;
243 run;
You can also use the INVALIDDATA= system option to assign a value to a variable
when your program encounters invalid data. For more information, see the
INVALIDDATA= system option in SAS System Options: Reference.
• input x ? 10-12;
  _error_=0;
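In the INPUT statement, the ? format modifier suppresses the invalid data note, and the ?? modifier also prevents the automatic variable _ERROR_ from being set to 1. As a sketch, the Age example shown earlier could be read with ?? as follows. The data set name Age_Quiet is hypothetical.

```sas
data age_quiet;
   /* ?? suppresses the invalid data note and leaves _ERROR_ at 0 */
   input Name $ Number ??;
   datalines;
Sue 35
Joe xx
Steve 22
;
```

The value of Number for the invalid record is still set to missing. Only the log messages and the error flag are affected, so use these modifiers when invalid values are expected and acceptable.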
Macro-related Errors
Several types of macro-related errors exist:
• macro compile-time and macro execution-time errors, generated when you use
  the macro facility itself
• errors in the SAS code produced by the macro facility
For more information about macros, see SAS Macro Language: Reference.
In syntax check mode, SAS internally sets the OBS= option to 0 and the REPLACE/
NOREPLACE option to NOREPLACE. When these options are in effect, SAS acts as
follows:
• reads the remaining statements in the DATA step or PROC step
• creates the descriptor portion of any output data sets that are specified in
  program statements
• does not write any observations to new data sets that SAS creates
• does not execute most of the subsequent DATA steps or procedures in the
  program (exceptions include PROC DATASETS and PROC CONTENTS)
Note: Any data sets that are created after SAS has entered syntax check mode do
not replace existing data sets with the same name.
When syntax checking is enabled, SAS underlines the point where it detects a
syntax or semantic error in a DATA step and identifies the error by number. SAS
then enters syntax check mode and remains in this mode until the program finishes
executing. When SAS enters syntax check mode, all DATA step statements and
PROC step statements are validated.
You can place an OPTIONS statement that contains the SYNTAXCHECK system
option or the DMSSYNCHK system option before the step that you want to apply it
to. If you place the OPTIONS statement inside a step, then SYNTAXCHECK or
DMSSYNCHK does not take effect until the beginning of the next step.
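The placement rule can be sketched as follows for a batch or noninteractive session, where SYNTAXCHECK is the relevant option. The data set Work.Step1 is hypothetical, and turning syntax checking off is shown only to illustrate where the OPTIONS statement goes.

```sas
options nosyntaxcheck;   /* placed between steps, so it applies to the step that follows */

data work.step1;
   x=1;
run;

options syntaxcheck;     /* re-enable syntax check mode before the next step */
```

Because each OPTIONS statement sits outside a step, the setting is in effect from the start of the step that follows it.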
A labeled code section is the SAS code that begins with label: outside of a DATA or
PROC step and ends with the RUN statement that precedes the next label: that is
outside of a DATA or PROC step. Labels must be unique. Consider using labeled
code sections when you want to group DATA or PROC steps that might need to be
grouped together because the data for one is dependent on the other.
The following example program has two labeled code sections. The first labeled
code section begins with the label readSortData: and ends with the run;
statement for proc sort data=mylib.mydata;. The second labeled code section
starts with the label report: and ends with the run; statement for proc report
data=mylib.mydata;.
readSortData:
data mylib.mydata;
   ...more sas code...
run;
proc sort data=mylib.mydata;
   ...more sas code...
run;
report:
proc report data=mylib.mydata;
   ...more sas code...;
run;
endReadSortReport:
Note: The use of label: in checkpoint mode and restart mode is valid only outside
of a DATA or PROC statement. Checkpoint mode and restart mode for labeled code
sections are not valid for labels within a DATA step or macros.
Checkpoint mode and restart mode can be enabled for either DATA and PROC
steps or for labeled code sections, but not both simultaneously. To use checkpoint
mode and restart mode on a step-by-step basis, use the step checkpoint mode and
the step restart mode. To use checkpoint mode and restart mode based on groups
of code sections, use the label checkpoint mode and the label restart mode. Each
group of code is identified by a unique label. If you use labels, all steps in a SAS
program must belong to a labeled code section.
When checkpoint mode is enabled, SAS records information about DATA and PROC
steps or labeled code sections in a checkpoint library. When a batch program
terminates prematurely, you can resubmit the program in restart mode to complete
execution. In restart mode, global statements are re-executed, macro definitions
are recompiled, and macros are re-executed. SAS reads the data in the checkpoint
library to determine which steps or labeled code sections completed. Program
execution resumes with the step or the label that was executing when the failure
occurred.
The checkpoint-restart data contains information only about the DATA and PROC
steps or the labeled code sections that completed and the step or labeled code
sections that did not complete. The checkpoint-restart data does not contain the
following information:
n information about macro variables and macro definitions
n information that might have been processed in the step or labeled code section
that did not complete
Note: Checkpoint mode is not valid for batch programs that contain the DM
statement to submit commands to SAS. If checkpoint mode is enabled and SAS
encounters a DM statement, checkpoint mode is disabled and the checkpoint
catalog entry is deleted.
As a best practice, if you use labeled code sections, add a label at the end of your
program. When the program completes successfully, the label is recorded in the
checkpoint-restart data. If the program is submitted again in restart mode, SAS
knows that the program has already completed successfully.
If a DATA or PROC step must be re-executed, you can add the global statement
CHECKPOINT EXECUTE_ALWAYS immediately before the step. This statement
tells SAS to always execute the following step without considering the checkpoint-
restart data. It is applicable only to the step that follows the statement. For more
information, see “CHECKPOINT EXECUTE_ALWAYS Statement” in SAS Global
Statements: Reference.
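As a sketch of the placement, the following fragment puts the statement immediately before the one step that must always run. The data set Work.Lookup is hypothetical, and Mylib.Mydata is borrowed from the labeled code section example.

```sas
checkpoint execute_always;   /* applies only to the step that follows */
data work.lookup;
   set mylib.mydata;
run;

/* This step has no CHECKPOINT statement, so in restart mode it is still
   subject to the checkpoint-restart data */
proc print data=work.lookup;
run;
```

A typical reason to use the statement is a step that rebuilds temporary data that later steps depend on but that does not survive a failed run.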
You enable checkpoint mode and restart mode for DATA and PROC steps by using
system options when you start the batch program in SAS.
• STEPCHKPT system option enables checkpoint mode, which indicates to SAS
  to record checkpoint-restart data.
• STEPCHKPTLIB system option identifies a user-specified checkpoint-restart
  library.
• STEPRESTART system option enables restart mode, ensuring that execution
  resumes with the DATA or PROC step indicated by the checkpoint-restart
  library.
You enable checkpoint mode and the restart mode for labeled code sections by
using these system options when you start the batch program in SAS:
• LABELCHKPT system option enables checkpoint mode for labeled code
  sections, which indicates to SAS to record checkpoint-restart data.
• LABELCHKPTLIB system option identifies a user-specified checkpoint-restart
  library.
• LABELRESTART system option enables restart mode, ensuring that execution
  resumes with the labeled code section indicated by the checkpoint-restart
  library.
If you use the Work library as your checkpoint-restart library, you can use the
CHKPTCLEAN system option to have the files in the Work library erased after a
successful execution of your batch program.
For information, see the following system options in SAS System Options:
Reference:
• “STEPCHKPT System Option” in SAS System Options: Reference
The labels for labeled code sections must be unique. If SAS enters restart mode for
a label that is a duplicate label, SAS starts at the first label. The code between the
duplicate labels might rerun needlessly.
Once the batch program has been modified, you start the program using the
appropriate system options:
• For checkpoint-restart data that is saved in the Work library, start a batch SAS
  session that specifies these system options:
  ◦ SYSIN, if required in your operating environment, names the batch program.
  ◦ STEPCHKPT or LABELCHKPT enables checkpoint mode.
  ◦ NOWORKTERM saves the Work library when SAS ends.
  ◦ NOWORKINIT does not initialize the Work library when SAS starts.
  ◦ ERRORCHECK STRICT puts SAS in syntax-check mode when an error occurs
    in the LIBNAME, FILENAME, %INCLUDE, and LOCK statements.
  ◦ ERRORABEND specifies whether SAS terminates for most errors.
  ◦ CHKPTCLEAN specifies whether to erase files in the Work library and delete
    the Work library if the batch program runs successfully.
In the Windows operating environment, the following SAS command starts a
batch program in checkpoint mode using the Work library as the checkpoint-
restart library:
sas -sysin 'c:\mysas\myprogram.sas' -stepchkpt -noworkterm -noworkinit
-errorcheck strict -errorabend -chkptclean
• CHKPTCLEAN specifies whether to erase files in the Work library if the batch
  program runs successfully.
To resubmit a batch SAS session using the checkpoint-restart data that is saved in
a user-specified library, include these system options when SAS starts:
• SYSIN, if required in your operating environment, names the batch program.
• NOWORKINIT does not initialize the Work library when SAS starts.
BYERR
specifies whether SAS produces errors when the SORT procedure attempts to
process a _NULL_ data set.
CHKPTCLEAN
in checkpoint mode or restart mode, specifies whether to erase files in the Work
directory if a batch program executes successfully.
DKRICOND=
specifies the level of error detection to report when a variable is missing from
an input data set during the processing of a DROP=, KEEP=, and RENAME= data
set option.
DKROCOND=
specifies the level of error detection to report when a variable is missing from
an output data set during the processing of a DROP=, KEEP=, and RENAME=
data set option.
DSNFERR
when a SAS data set cannot be found, specifies whether SAS issues an error
message.
ERRORABEND
specifies whether SAS responds to errors by terminating.
ERRORCHECK=
specifies whether SAS enters syntax-check mode when errors are found in the
LIBNAME, FILENAME, %INCLUDE, and LOCK statements.
ERRORS=
specifies the maximum number of observations for which SAS issues complete
error messages.
FMTERR
when a variable format cannot be found, specifies whether SAS generates an
error or continues processing.
INVALIDDATA=
specifies the value that SAS assigns to a variable when invalid numeric data is
encountered.
LABELCHKPT
specifies whether SAS checkpoint-restart data is to be recorded for a batch
program that contains labeled code sections.
LABELCHKPTLIB
specifies the libref of the library where checkpoint-restart data is saved for
labeled code sections.
LABELRESTART
specifies whether to execute a batch program by using checkpoint-restart data
for labeled code sections.
MERROR
specifies whether SAS issues a warning message when a macro-like name does
not match a macro keyword.
QUOTELENMAX
if a quoted string exceeds the maximum length allowed, specifies whether SAS
writes a warning message to the SAS log.
SERROR
specifies whether SAS issues a warning message when a macro variable
reference does not match a macro variable.
STEPCHKPT
specifies whether checkpoint-restart data is to be recorded for a batch program.
STEPCHKPTLIB=
specifies the libref of the library where checkpoint-restart data is saved.
STEPRESTART
specifies whether to execute a batch program by using checkpoint-restart data.
VARINITCHK=
specifies whether to stop or continue processing a DATA step when a variable is
not initialized. You can also specify the type of message that is written to the
SAS log.
VNFERR
specifies whether SAS issues an error or warning when a BY variable exists in
one data set but not another data set. SAS issues only these errors or warnings
when processing the SET, MERGE, UPDATE, or MODIFY statements.
For more information about SAS system options, see SAS System Options:
Reference.
• the SYSERR automatic macro variable to detect major system errors, such as
  out of memory or failure of the component system
• log control options:
MSGLEVEL=
controls the level of detail in messages that are written to the SAS log.
PRINTMSGLIST
controls the printing of extended lists of messages to the SAS log.
SOURCE
controls whether SAS writes source statements to the SAS log.
SOURCE2
controls whether SAS writes source statements included by %INCLUDE to
the SAS log.
Error-Checking Tools
Two tools have been created to make error checking easier when you use the
MODIFY statement or the SET statement with the KEY= option to process SAS
data sets:
• _IORC_ automatic variable
• SYSRC autocall macro
_IORC_ is created automatically when you use the MODIFY statement or the SET
statement with KEY=. The value of _IORC_ is a numeric return code that indicates
the status of the I/O operation from the most recently executed MODIFY or SET
statement with KEY=. Checking the value of this variable enables you to detect
abnormal I/O conditions and to direct execution down specific code paths instead
of having the application terminate abnormally. For example, if the KEY= variable
value does match between two observations, you might want to combine them and
output an observation. If they do not match, however, you might want to only write
a note to the log.
Because the values of the _IORC_ automatic variable are internal and subject to
change, the SYSRC macro was created to enable you to test for specific I/O
conditions while protecting your code from future changes in _IORC_ values. When
you use SYSRC, you can check the value of _IORC_ by specifying one of the
mnemonics listed in the following table.
Table 24.2 Most Common Mnemonic Values of _IORC_ for DATA Step Processing
Overview
This example shows how to prevent an unexpected condition from terminating the
DATA step. The goal is to update a master data set with new information from a
transaction data set. This application assumes that there are no duplicate values
for the common variable in either data set.
Note: This program works as expected only if the master and transaction data sets
contain no consecutive observations with the same value for the common variable.
For an explanation of the behavior of MODIFY with KEY= when duplicates exist,
see the MODIFY statement in SAS DATA Step Statements: Reference.
Master                           Transaction
Obs  PartNumber  Quantity        Obs  PartNumber  AddQuantity
1    1           10              1    4           14
2    2           20              2    6           16
3    3           30              3    2           12
4    4           40
5    5           50
Original Program
The objective is to update the Master data set with information from the
Transaction data set. The program reads Transaction sequentially. Master is read
directly, not sequentially, using the MODIFY statement and the KEY= option. Only
observations with matching values for PartNumber, which is the KEY= variable, are
read from Master.
data master;                            /* 1 */
   set transaction;                     /* 2 */
   modify master key=PartNumber;        /* 3 */
   Quantity = Quantity + AddQuantity;   /* 4 */
run;
Resulting Log
This program has correctly updated one observation but it stopped when it could
not find a match for PartNumber value 6. The following lines are written to the SAS
log:
ERROR: No matching observation was found in Master data set.
PartNumber=6 AddQuantity=16 Quantity=70 _ERROR_=1
_IORC_=1230015 _N_=2
NOTE: The SAS System stopped processing this step because
of errors.
NOTE: The data set WORK.MASTER has been updated. There were
1 observations rewritten, 0 observations added and 0
observations deleted.
Revised Program
The objective is to apply two updates and one addition to Master. This action
prevents the DATA step from stopping when it does not find a match in Master for
the PartNumber value 6 in Transaction. By adding error checking, this DATA step is
allowed to complete normally and produce a correctly revised version of Master.
This program uses the _IORC_ automatic variable and the SYSRC autocall macro in
a SELECT group to check the value of the _IORC_ variable. If a match is found, the
program executes the appropriate code.
data master;                            /* 1 */
   set transaction;                     /* 2 */
   modify master key=PartNumber;        /* 3 */
   select(_iorc_);                      /* 4 */
      when(%sysrc(_sok)) do;            /* match found: update Master */
         Quantity = Quantity + AddQuantity;
         replace;
      end;
      when(%sysrc(_dsenom)) do;         /* no match: add a new observation */
         Quantity = AddQuantity;
         _error_ = 0;
         output;
      end;
      otherwise do;                     /* any other return code: stop */
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         put 'Program terminating. DATA step iteration # ' _n_;
         put _all_;
         stop;
      end;
   end;
run;
Resulting Log
The DATA step executed without error and observations were appropriately
updated and added. The following lines are written to the SAS log:
NOTE: The data set WORK.MASTER has been updated. There were
2 observations rewritten, 1 observations added and 0
observations deleted.
Overview
This example shows how important it is to use error checking on all statements
that use the KEY= option when reading data.
Master                           Order
Obs  PartNumber  Quantity        Obs  PartNumber
1    1           10              1    2
2    2           20              2    4
3    3           30              3    1
4    4           40              4    3
5    5           50              5    8
                                 6    5
                                 7    6
Description
Obs  PartNumber  PartDescription
1    4           Nuts
2    3           Bolts
3    2           Screws
4    6           Washers
The program reads the Order data set sequentially and then uses SET with the
KEY= option to read the Master and Description data sets directly. This reading is
based on the key value of PartNumber. When a match occurs, an observation that
contains all the necessary information for each value of PartNumber in Order is
written. This first attempt at a solution uses error checking for only one of the two
SET statements that use KEY= to read a data set.
data combine;                           /* 1 */
   length PartDescription $ 15;
   set order;                           /* 2 */
   set description key=PartNumber;      /* 2 */
   set master key=PartNumber;           /* 2 */
   select(_iorc_);                      /* 3 */
      when(%sysrc(_sok)) do;
         output;
      end;
      when(%sysrc(_dsenom)) do;
         PartDescription = 'No description';
         _error_ = 0;
         output;
      end;
      otherwise do;
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         put 'Program terminating.';
         put _all_;
         stop;
      end;
   end;
run;
Resulting Log
This program creates an output data set but executes with one error. The following
lines are written to the SAS log:
PartNumber=1 PartDescription=Nuts Quantity=10 _ERROR_=1
_IORC_=0 _N_=3
PartNumber=5 PartDescription=No description Quantity=50
_ERROR_=1 _IORC_=0 _N_=6
NOTE: The data set WORK.COMBINE has 7 observations and 3 variables.
2 4 Nuts 40
3 1 Nuts 10
4 3 Bolts 30
5 8 No description 30
6 5 No description 50
7 6 No description 50
Revised Program
To create an accurate output data set, this example performs error checking on
both SET statements that use the KEY= option:
data combine(drop=Foundes);             /* 1 */
   length PartDescription $ 15;
   set order;                           /* 2 */
   Foundes = 0;                         /* 3 */
   set description key=PartNumber;      /* 4 */
   select(_iorc_);                      /* 5 */
      when(%sysrc(_sok)) do;
         Foundes = 1;
      end;
      when(%sysrc(_dsenom)) do;
         PartDescription = 'No description';
         _error_ = 0;
      end;
      otherwise do;
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         put 'Program terminating. Data set accessed is Description';
         put _all_;
         _error_ = 0;
         stop;
      end;
   end;
   set master key=PartNumber;           /* 6 */
   select(_iorc_);                      /* 7 */
      when(%sysrc(_sok)) do;
         output;
      end;
      when(%sysrc(_dsenom)) do;
         if not Foundes then do;
            _error_ = 0;
            put 'WARNING: PartNumber ' PartNumber 'is not in'
                ' Description or Master.';
         end;
         else do;
            Quantity = 0;
            _error_ = 0;
            output;
         end;
      end;
      otherwise do;
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         put 'Program terminating. Data set accessed is Master';
         put _all_;
         _error_ = 0;
         stop;
      end;
   end; /* ends the SELECT group */
run;
Resulting Log
The DATA step executed without error. Six observations were correctly created and
the following message was written to the log:
WARNING: PartNumber 8 is not in Description or Master.
NOTE: The data set WORK.COMBINE has 6 observations
and 3 variables.
1 2 Screws 20
2 4 Nuts 40
3 1 No description 10
4 3 Bolts 30
5 5 No description 50
6 6 Washers 0
Examples
Example Code
The following example shows how to invoke the DATA step debugger. The DEBUG
option is specified in the DATA statement after the forward slash (/).
data mydata2 / debug;
set mydata1;
run;
Key Ideas
• The DATA step debugger is a set of commands that enables you to debug logic
  errors in the SAS DATA step.
• To use the DATA step debugger, specify the DEBUG option in the DATA statement.
• The DATA step debugger enables you to issue commands to execute DATA step
  statements one by one and then pause to display the resulting variables' values in
  a window.
• The DATA step debugger is not supported for a DATA step that is running in CAS.
See Also
n Using the DATA Step Debugger: Examples
n DEBUG option
n DATA statement
Example Code
This example shows how to prevent an unexpected condition from terminating the
DATA step. The goal is to update a master data set with new information from a
transaction data set. This application assumes that there are no duplicate values
for the common variable in either data set.
Note: This program works as expected only if the master and transaction data sets
contain no consecutive observations with the same value for the common variable.
For an explanation of the behavior of MODIFY with KEY= when duplicates exist,
see the MODIFY statement in SAS DATA Step Statements: Reference.
The Transaction data set contains three observations: two updates to information
in Master and a new observation about PartNumber value 6 that needs to be
added. Master is indexed on PartNumber. There are no duplicate values of
PartNumber in Master or Transaction. The following shows the Master and the
Transaction input data sets:
Master                         Transaction
Obs  PartNumber  Quantity      Obs  PartNumber  AddQuantity
1    1           10            1    4           14
2    2           20            2    6           16
3    3           30            3    2           12
4    4           40
5    5           50
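To try the programs in this example, the Master and Transaction data sets shown above could be created as in the following sketch. It assumes that both data sets live in the WORK library and uses DATALINES purely for illustration. The INDEX= data set option builds the index on PartNumber that MODIFY with KEY= relies on.

```sas
/* Master: indexed on PartNumber so that KEY=PartNumber can read it directly */
data master(index=(PartNumber));
   input PartNumber Quantity;
   datalines;
1 10
2 20
3 30
4 40
5 50
;
run;

/* Transaction: two updates (parts 4 and 2) and one new part (6) */
data transaction;
   input PartNumber AddQuantity;
   datalines;
4 14
6 16
2 12
;
run;
```

Re-create Master before each run of the update programs, because MODIFY changes the data set in place.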
The objective is to update the Master data set with information from the
Transaction data set. The program reads Transaction sequentially. Master is read
directly, not sequentially, using the MODIFY statement and the KEY= option. Only
observations with matching values for PartNumber, which is the KEY= variable, are
read from Master.
data master; /* 1 */
set transaction; /* 2 */
modify master key=PartNumber; /* 3 */
Quantity = Quantity + AddQuantity; /* 4 */
run;
This program has correctly updated one observation but it stopped when it could
not find a match for PartNumber value 6. The following lines are written to the SAS
log:
ERROR: No matching observation was found in Master data set.
PartNumber=6 AddQuantity=16 Quantity=70 _ERROR_=1
_IORC_=1230015 _N_=2
NOTE: The SAS System stopped processing this step because
of errors.
NOTE: The data set WORK.MASTER has been updated. There were
1 observations rewritten, 0 observations added and 0
observations deleted.
Resulting Data Set: The Master file was incorrectly updated. The updated master
has five observations. One observation was updated correctly, a new one was not
added, and a second update was not made. The following shows the incorrectly
updated Master data set:
Master
The objective is to apply two updates and one addition to Master without letting
the DATA step stop when it does not find a match in Master for the PartNumber
value 6 in Transaction. With error checking added, the DATA step completes
normally and produces a correctly revised version of Master. This program uses
the _IORC_ automatic variable and the SYSRC autocall macro in a SELECT group to
check the value of _IORC_ after each keyed read. If a match is found, the program
updates the existing observation. If no match is found, it resets _ERROR_ and
adds a new observation. Any other value stops the step with a message.
data master; /* 1 */
set transaction; /* 2 */
modify master key=PartNumber; /* 3 */
select(_iorc_); /* 4 */
when(%sysrc(_sok)) do;
Quantity = Quantity + AddQuantity;
replace;
end;
when(%sysrc(_dsenom)) do;
Quantity = AddQuantity;
_error_ = 0;
output;
end;
otherwise do;
put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
put 'Program terminating. DATA step iteration # ' _n_;
put _all_;
stop;
end;
end;
run;
Overview
It is important to use error checking on all statements that use the KEY= option
when reading data.
Master                         Order
Obs  PartNumber  Quantity      Obs  PartNumber
1    1           10            1    2
2    2           20            2    4
3    3           30            3    1
4    4           40            4    3
5    5           50            5    8
                               6    5
                               7    6

Description
Obs  PartNumber  PartDescription
1    4           Nuts
2    3           Bolts
3    2           Screws
4    6           Washers
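The three input data sets shown above could be set up as in the following sketch, so that the programs in this example can be run. It assumes the WORK library and DATALINES input. Master and Description are given an index on PartNumber because SET with KEY= needs an index on the data set that it reads directly.

```sas
/* Master and Description are read with KEY=PartNumber, so both are indexed */
data master(index=(PartNumber));
   input PartNumber Quantity;
   datalines;
1 10
2 20
3 30
4 40
5 50
;
run;

data description(index=(PartNumber));
   length PartDescription $ 15;
   input PartNumber PartDescription $;
   datalines;
4 Nuts
3 Bolts
2 Screws
6 Washers
;
run;

/* Order is read sequentially, so no index is required */
data order;
   input PartNumber;
   datalines;
2
4
1
3
8
5
6
;
run;
```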
The program reads the Order data set sequentially and then uses SET with the
KEY= option to read the Master and Description data sets directly. This reading is
based on the key value of PartNumber. When a match occurs, an observation that
contains all the necessary information for each value of PartNumber in Order is
written. This first attempt at a solution uses error checking for only one of the two
SET statements that use KEY= to read a data set.
data combine; /* 1 */
length PartDescription $ 15;
set order; /* 2 */
set description key=PartNumber; /* 2 */
set master key=PartNumber; /* 2 */
select(_iorc_); /* 3 */
when(%sysrc(_sok)) do;
output;
end;
when(%sysrc(_dsenom)) do;
PartDescription = 'No description';
_error_ = 0;
output;
end;
otherwise do;
put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
put 'Program terminating.';
put _all_;
stop;
end;
end;
run;
Resulting Log
This program creates an output data set but executes with one error. The following
lines are written to the SAS log:
Revised Program
To create an accurate output data set, this example performs error checking on
both SET statements that use the KEY= option:
data combine(drop=Foundes); /* 1 */
length PartDescription $ 15;
set order; /* 2 */
Foundes = 0; /* 3 */
set description key=PartNumber; /* 4 */
select(_iorc_); /* 5 */
when(%sysrc(_sok)) do;
Foundes = 1;
end;
when(%sysrc(_dsenom)) do;
PartDescription = 'No description';
_error_ = 0;
end;
otherwise do;
put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
put 'Program terminating. Data set accessed is Description';
put _all_;
_error_ = 0;
stop;
end;
end;
set master key=PartNumber; /* 6 */
select(_iorc_); /* 7 */
when(%sysrc(_sok)) do;
output;
end;
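when(%sysrc(_dsenom)) do;
if not Foundes then do;
_error_ = 0;
put 'WARNING: PartNumber ' PartNumber 'is not in'
' Description or Master.';
end;
else do;
Quantity = 0;
_error_ = 0;
output;
end;
end;
otherwise do;
put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
put 'Program terminating. Data set accessed is Master';
put _all_;
_error_ = 0;
stop;
end;
end; /* ends the SELECT group */
run;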
Resulting Log
The DATA step executed without error. Six observations were correctly created and
the following message was written to the log:
WARNING: PartNumber 8 is not in Description or Master.
NOTE: The data set WORK.COMBINE has 6 observations
and 3 variables.
Obs  PartNumber  PartDescription  Quantity
1    2           Screws           20
2    4           Nuts             40
3    1           No description   10
4    3           Bolts            30
5    5           No description   50
6    6           Washers          0
Example Code
Depending on the type and severity of the error, the method that you use to run
SAS, and your operating environment, SAS either stops program processing or flags
errors and continues processing. SAS continues to check individual statements in
procedures after it finds certain types of errors. In some cases SAS can detect
multiple errors in a single statement and might issue more error messages for a
given situation. This is likely to occur if the statement containing the error creates
an output SAS data set.
data temporary;
Item1=4;
run;
276
277 proc print data=temporary;
ERROR: Variable ITEM2 not found.
ERROR: Variable ITEM3 not found.
278 var Item1 Item2 Item3;
279 run;
NOTE: The SAS System stopped processing this step because of
errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.52 seconds
cpu time 0.00 seconds
SAS displays two error messages, one for the variable Item2 and one for the
variable Item3.
Chapter 25 / Optimizing System Performance
The following output shows an example of the FULLSTIMER output in the SAS log,
as produced in a UNIX operating environment.
Example Code 25.1 Sample Results of Using the FULLSTIMER Option in a UNIX Operating
Environment
The STIMER option reports a subset of the FULLSTIMER statistics. The following
example shows STIMER output in the SAS log.
Example Code 25.2 Sample Results of Using the STIMER Option in a UNIX Operating
Environment
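As a minimal sketch of how these statistics might be requested, the following program turns on FULLSTIMER with an OPTIONS statement and then runs two steps whose resource usage is written to the log. The data set name and the loop size are hypothetical, and whether the option can be set in an OPTIONS statement depends on your operating environment.

```sas
options fullstimer;   /* request the full set of resource statistics */

/* Hypothetical work load: build and sort a data set of random values */
data work.timing_demo;
   do i = 1 to 1000000;
      x = ranuni(0);
      output;
   end;
run;

proc sort data=work.timing_demo;
   by x;
run;

options nofullstimer; /* turn the detailed statistics off again */
```

The statistics appear in the log after each DATA step or procedure, so comparing the notes for the two steps shows which one consumes more time or memory.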
To do this, you can modify your SAS programs to process only the necessary
variables and observations by:
n using WHERE processing
You can also modify your programs to reduce the number of times they process the
data internally by:
n creating SAS data sets
n using indexes
You can reduce the number of data accesses by processing more data each time a
device is accessed by:
n setting the ALIGNSASIOFILES, BUFNO=, BUFSIZE=, CATCACHE=,
COMPRESS=, DATAPAGESIZE=, STRIPESIZE=, UBUFNO=, and UBUFSIZE=
system options
n using the SASFILE global statement to open a SAS data set and allocate enough
buffers to hold the entire data set in memory
When using SAS DATA step views, you can improve performance by:
n specifying the VBUFSIZE= system option
Note: Sometimes you might be able to use more than one method, making your
SAS job even more efficient.
For example, the following DATA step creates the data set Seatbelt. This data set
contains only those observations from the Auto.Survey data set for which the value
of Seatbelt is yes. The new data set is then printed.
libname auto 'SAS-library';
data seatbelt;
set auto.survey;
if seatbelt='yes';
run;
proc print data=seatbelt;
run;
However, you can get the same output from the PROC PRINT step without creating
a data set if you use a WHERE statement in the PRINT procedure, as in the
following example:
proc print data=auto.survey;
where seatbelt='yes';
run;
The WHERE statement can save resources by reducing the number of times that
you process the data. In this example, you might be able to use less time and
memory by eliminating the DATA step. Also, you use less I/O because there is no
intermediate data set. Note that you cannot use a WHERE statement in a DATA
step that reads raw data.
The extent of savings that you can achieve depends on many factors, including the
size of the data set. It is recommended that you test your programs to determine
the most efficient solution. For more information, see “WHERE Expressions” on
page 428.
SAS DATA Step Statements: Reference and “KEEP Statement” in SAS DATA Step
Statements: Reference.
Another consideration involves whether you are using data sets created with
previous releases of SAS. If you frequently process data sets created with previous
releases, it is sometimes more efficient to convert that data set to a new one by
creating it in the most recent version of SAS. See “Cross-Release Compatibility and
Migration” in SAS Language Reference: Concepts for more information.
Using Indexes
An index is an optional file that you can create for a SAS data file to provide direct
access to specific observations. The index stores values in ascending value order
for a specific variable or variables and includes information as to the location of
those values within observations in the data file. In other words, an index enables
you to locate an observation by the value of the indexed variable.
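The following sketch shows one way to create a simple index and then select observations by the indexed variable. The Survey data set and its values are hypothetical. MSGLEVEL=I is set so that SAS can report in the log whether an index was used. For a data set this small, SAS might decide that a sequential read is cheaper.

```sas
/* Hypothetical data set for illustration */
data work.survey;
   input id seatbelt $;
   datalines;
1 yes
2 no
3 yes
;
run;

/* Create a simple index on the variable Seatbelt */
proc datasets library=work nolist;
   modify survey;
   index create seatbelt;
quit;

options msglevel=i;   /* write index-usage information to the log */

proc print data=work.survey;
   where seatbelt='yes';
run;
```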
For information about optimizing system performance with SAS views, see “Setting
VBUFSIZE= and OBSBUF= for SAS DATA Step Views” on page 644.
The following statement does not explicitly specify an engine. In the output, notice
the Note about mixed engine types that is generated:
/* Engine not specified. */
libname fruits 'SAS-library';
Example Code 25.3 SAS Log Output from the LIBNAME Statement
NOTE: Directory for library FRUITS contains files of mixed engine types.
NOTE: Libref FRUITS was successfully assigned as follows:
Engine: V9
Physical Name: SAS-library
z/OS Specifics: In the z/OS operating environment, you do not need to specify an
engine for certain types of libraries.
See Chapter 13, “SAS Engines” for more information about SAS engines.
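For comparison, a LIBNAME statement that names the engine explicitly might look like the following sketch. It assumes that the V9 engine is the one intended, and it points the libref at the WORK directory only as a stand-in for a real library path.

```sas
/* Engine specified explicitly; the path is a stand-in for a real library */
libname fruits v9 "%sysfunc(pathname(work))";

/* Release the libref when it is no longer needed */
libname fruits clear;
```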
Note: You can also use the CBUFNO= system option to control the number of
extra page buffers to allocate for each open SAS catalog.
For more information, see “BUFNO= System Option” in SAS System Options:
Reference and the SAS documentation for your operating environment.
BUFSIZE=
When the BASE engine creates a data set, it uses the BUFSIZE= option to set
the permanent page size for the data set. The page size is the amount of data
that can be transferred for an I/O operation to one buffer. The default value for
BUFSIZE= is determined by your operating environment. Note that the default is
Whether you use your operating environment's default value or specify a value,
the engine always writes complete pages regardless of how full or empty those
pages are.
If you know that the total amount of data is going to be small, you can set a
small page size with the BUFSIZE= option, so that the total data set size
remains small and you minimize the amount of wasted space on a page. In
contrast, if you know that you are going to have many observations in a data
set, you should optimize BUFSIZE= so that as little overhead as possible is
needed. Note that each page requires some additional overhead.
Large data sets that are accessed sequentially benefit from larger page sizes
because sequential access reduces the number of system calls that are required
to read the data set. Note that because observations cannot span pages,
typically there is unused space on a page.
“Calculating Data Set Size” on page 649 discusses how to estimate data set
size.
For more information, see “BUFSIZE= System Option” in SAS System Options:
Reference and the SAS documentation for your operating environment.
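As an illustration of the buffer settings described above, the following sketch uses the BUFSIZE= and BUFNO= data set options when a data set is created and then runs PROC CONTENTS to show the resulting page size. The data set name and the values 64k and 10 are arbitrary choices for the example, not recommendations.

```sas
/* BUFSIZE= sets the permanent page size; BUFNO= applies to this step only */
data work.bigpages(bufsize=64k bufno=10);
   do id = 1 to 100000;
      value = id * 2;
      output;
   end;
run;

/* The engine/host section of the output reports the data set page size */
proc contents data=work.bigpages;
run;
```

Timing the same step with different values is the most reliable way to find a setting that suits a particular data set.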
CATCACHE=
SAS uses this option to determine the number of SAS catalogs to keep open at
one time. Increasing its value can use more memory, although this might be
warranted if your application uses catalogs that are needed relatively soon by
other applications. (The catalogs closed by the first application are cached and
can be accessed more efficiently by subsequent applications.)
For more information, see “CATCACHE= System Option” in SAS System Options:
Reference and the SAS documentation for your operating environment.
COMPRESS=
One further technique that can reduce I/O processing is to store your data as
compressed data sets by using the COMPRESS= data set option. However,
storing your data this way means that more CPU time is needed to decompress
the observations as they are made available to SAS. But if your concern is I/O
and not CPU usage, compressing your data might improve the I/O performance
of your application.
For more information, see “COMPRESS= System Option” in SAS System Options:
Reference.
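A minimal sketch of compression, assuming a hypothetical data set with a long character variable that is mostly blank, might look like this. When a compressed data set is created, the log typically includes a note that reports how much the size changed.

```sas
/* COMPRESS=YES requests compression for this output data set */
data work.notes_compressed(compress=yes);
   length comment $ 200;
   do id = 1 to 10000;
      comment = 'short text';   /* mostly trailing blanks, which compress well */
      output;
   end;
run;
```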
DATAPAGESIZE=
Beginning with SAS 9.4, the optimal buffer page size is increased to improve I/O
performance. The increase in page size might increase the size of the data set or
utility file. If you find that the current optimization processes are not ideal for
your SAS session, you can use DATAPAGESIZE=COMPAT93 to use the
optimization processes that were used prior to SAS 9.4.
STRIPESIZE=
When data is stored in a RAID (Redundant Array of Independent Disks) device,
you can use the STRIPESIZE= system option to set the I/O buffer size for a
directory to be the size of a RAID stripe. SAS data sets or utility files that are
created in the directory have a page size that matches the RAID stripe size.
Using this option can improve the performance of individual disks.
For more information, see “STRIPESIZE= System Option” in SAS System Options:
Reference.
UBUFNO=
The UBUFNO= system option sets the number of utility buffers that SAS uses
to process data sets.
For more information, see “UBUFNO= System Option” in SAS System Options:
Reference.
UBUFSIZE=
The UBUFSIZE= option sets the page size for utility files that SAS uses to
process data sets. You can reduce the number of disk accesses when values of
the UBUFSIZE= option and the BUFSIZE= option are the same.
For more information, see “UBUFSIZE= System Option” in SAS System Options:
Reference.
VBUFSIZE=
The VBUFSIZE= option sets the size of the view buffer. View performance can be
improved by setting the size of the view buffer large enough to hold more
generated observations. For more information, see “VBUFSIZE= System Option”
in SAS System Options: Reference and “Setting VBUFSIZE= and OBSBUF= for
SAS DATA Step Views” on page 644.
For more information about the VBUFSIZE= system option, see “VBUFSIZE=
System Option” in SAS System Options: Reference. For more information about the
OBSBUF= data set option, see “OBSBUF= Data Set Option” in SAS Data Set
Options: Reference.
If your SAS program consists of steps that read a SAS data set multiple times and
you have an adequate amount of memory so that the entire file can be held in real
memory, the program should benefit from using the SASFILE statement. Also,
SASFILE is especially useful as part of a program that starts a SAS server such as a
SAS/SHARE server. For more information about the SASFILE global statement, see
the SAS DATA Step Statements: Reference.
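The following sketch shows the pattern described above: a data set is loaded into memory once with SASFILE, read by two procedures, and then released. The data set name and contents are hypothetical, and the example assumes that enough memory is available to hold the file.

```sas
/* Hypothetical data set that several steps will read */
data work.readmany;
   do id = 1 to 100000;
      group = mod(id, 5);
      value = id / 7;
      output;
   end;
run;

sasfile work.readmany load;    /* open the file and read it into memory */

proc means data=work.readmany;
   var value;
run;

proc freq data=work.readmany;
   tables group;
run;

sasfile work.readmany close;   /* free the buffers and close the file */
```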
System Options
If memory is a critical resource, several techniques can reduce your dependence on
increased memory. However, most of them also increase I/O processing or CPU
usage.
You can use the MEMSIZE= system option to increase the amount of memory
available to SAS and therefore decrease processing time. By increasing memory,
you reduce processing time because the amount of time spent on paging, or reading
pages of data into memory, is reduced.
The SORTSIZE= and SUMSIZE= system options enable you to limit the amount of
memory that is available to sorting and summarization procedures.
You can also make tradeoffs between memory and other resources, as discussed in
“Reducing CPU Time By Modifying Program Compilation Optimization” on page
648. To use the I/O subsystem most effectively, you must use more and larger
buffers. However, these buffers share space with the other memory demands of
your SAS session.
Operating Environment Information: The MEMSIZE= system option is not
available in some operating environments. If MEMSIZE= is available in your
operating environment, it might not increase memory. See the documentation for
your operating environment for more information.
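To see the current memory-related settings and limit the memory that a sort can use, something like the following sketch could be used. The value 256M is purely illustrative. MEMSIZE= is only displayed here because it is typically set when SAS starts rather than during a session.

```sas
/* Display the current values of the memory-related options */
proc options option=(memsize sortsize sumsize);
run;

/* Limit the memory available to sorting (illustrative value) */
options sortsize=256M;

proc sort data=sashelp.class out=work.class_sorted;
   by age;
run;
```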
For more information, see “Comparison of the BY and CLASS Statements” in Base
SAS Procedures Guide.
Also, because the CPU performs all the processing that is needed to perform an I/O
operation, an option or technique that reduces the number of I/O operations can
also have a positive effect on CPU usage.
Note: Alignment can be especially important when you process a data set by
selecting only specific variables or when you use WHERE processing.
If you have a large DATA step program, performing code generation optimization
can result in a significant increase in compilation time and overall execution time.
You can reduce or turn off the code generation optimization by using the
CGOPTIMIZE= system option. Set the code generation optimization that you want
SAS to perform using these CGOPTIMIZE= system option values:
For more information, see “CGOPTIMIZE= System Option” in SAS System Options:
Reference.
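As a sketch of how the option might be applied around a single large DATA step, the following program turns optimization off and then restores it. It assumes that a value of 0 disables code generation optimization and that 3 requests full optimization. Check the option's reference entry for the values that your release supports.

```sas
/* Assumption: 0 turns code generation optimization off */
options cgoptimize=0;

data work.cg_demo;              /* hypothetical stand-in for a large DATA step */
   do i = 1 to 1000;
      y = i ** 2;
      output;
   end;
run;

/* Assumption: 3 requests full optimization */
options cgoptimize=3;
```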
You can estimate the size of a data set by creating a dummy data set that contains
the same variables as your data set. Run the CONTENTS procedure, which shows
the size of each observation. Multiply the size by the number of observations in
your data set to obtain the total number of bytes that must be processed. You can
compare processing statistics with smaller data sets to determine whether the
performance of the large data sets is in proportion to their size. If not, further
optimization might still be possible.
Note: When you use this technique to calculate the size of a data set, you obtain
only an estimate. Internal requirements, such as the storage of variable names,
might cause the actual data set size to be slightly different.
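The estimate described above can be sketched as follows. The dummy data set, its three variables, and the figure of one million observations are all assumptions made for the example. The observation length is read from DICTIONARY.TABLES so that the multiplication can be done in macro code.

```sas
/* Dummy data set with the same variables as the real one (hypothetical) */
data work.dummy;
   length id 8 name $ 20 amount 8;
   call missing(id, name, amount);
   output;
run;

/* PROC CONTENTS reports the observation length */
proc contents data=work.dummy;
run;

/* Capture the observation length in a macro variable */
proc sql noprint;
   select obslen into :obslen trimmed
      from dictionary.tables
      where libname='WORK' and memname='DUMMY';
quit;

%let nobs = 1000000;                       /* assumed number of observations */
%let est_bytes = %eval(&obslen * &nobs);   /* bytes = length x observations */
%put NOTE: Estimated size is &est_bytes bytes for &nobs observations.;
```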
Chapter 26 / Using Parallel Processing
Overview
What Is Threading Technology in SAS?
How Is Threading Controlled in SAS?
Threading in Base SAS
SAS/ACCESS Engines
SAS Scalable Performance Data Server
SAS Intelligence Platform
SAS High-Performance Analytics Portfolio of Products
SAS Grid Manager
SAS In-Database Technology
SAS In-Memory Analytics Technology
SAS High-Performance Analytics Product Integration
SAS Viya
Overview
SAS introduced threading technology starting in SAS 9 with the introduction of
several Base SAS procedures that had been enhanced to execute, in part, in
multiple threads. SAS has continued to develop and enhance products and
components that take advantage of the threaded processing capabilities provided
by proprietary internal subsystems. Threading is available on a variety of platforms
from a local desktop with multiple CPUs to high-performance platform servers.
These high-performance servers include large multi-core symmetric multi-processor
(SMP) systems and massively parallel processing (MPP) appliances typically
configured as a distributed cluster. Many SAS components that execute on these
platforms take advantage of threading technology.
With SAS 9.4M5, when you license SAS Viya, you can access SAS Cloud Analytic
Services (CAS), a distributed server environment that supports multithreaded,
in-memory processing. See “What is SAS Cloud Analytic Services?” for more
information.
Previous releases of Base SAS 9.4 support programs written in the SAS DS2
programming language or the SAS Federated SQL language. These languages can
take advantage of threading. Many other SAS products also use threading
technology. For example, the SAS High-Performance Analytics procedures, SAS
Stored Processes, and SAS Embedded Process either execute or generate code that
executes in high-performance distributed computing environments.
Threaded execution in SAS software includes one or both of these two general
techniques.
n Threaded I/O means that data (frequently in very high volume) is delivered to an
application in threads so that the application is continually processing, not
waiting on data. In Base SAS and SAS/STAT, several procedures take advantage
of threaded reads. Also Base SAS includes the SPD engine that reads from a
data set that is partitioned to optimize for threaded input to the application. The
SAS High-Performance Analytics procedures require very rapid data delivery.
They require threaded reads from data distributed across a computing cluster to
deliver huge amounts of data to the application (which is also processing on the
cluster) and then write the data in parallel to the data storage appliance. SAS
9.4M5 includes access to SAS Viya, which supports distributed, in-memory,
multithreaded processing. See “What is SAS Cloud Analytic Services?” in SAS
Language Reference: Concepts for more information about SAS Cloud Analytic
Services with SAS Viya.
n Threaded application processing means that the application itself is structured
to perform certain tasks in parallel on multiple-CPU machines. Threaded
application processing enables large amounts of data to be processed more
quickly because multiple threads execute on smaller segments of data.
Applications can be designed to take advantage of machines with multiple CPUs,
whether the machine is a local four-way desktop or a server-class machine. The
SAS High-Performance Analytics Server executes on appliances that distribute
both the data and copies of the application across the appliance nodes so that
the data is co-located with the application processing.
With SAS 9.4 and SAS Analytics 12.1, customers can access a wide variety of
products and components that use threading to support ever-increasing amounts of
data as well as computationally intensive algorithms and models. Base SAS and
Foundation SAS threading technologies support all of these.
n REPORT
n SORT
n SUMMARY
n TABULATE
n SQL
For details, see “Threaded Processing for Base SAS Procedures” in Base SAS
Procedures Guide. For details of the thread-enabled SQL procedure, see the SAS
SQL Procedure User’s Guide. For details of SAS system options, see SAS System
Options: Reference.
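As a sketch of how threading might be controlled for one of these procedures, the following program sets the THREADS and CPUCOUNT= system options and then runs PROC SORT once with threading requested and once without. The data set names and size are hypothetical. Whether threading helps depends on the data and the hardware.

```sas
/* Allow threaded processing and let SAS use the number of CPUs it detects */
options threads cpucount=actual;

/* Hypothetical data set to sort */
data work.to_sort;
   do id = 1 to 500000;
      key = ranuni(1);
      output;
   end;
run;

/* Threaded sort requested at the procedure level */
proc sort data=work.to_sort out=work.sorted threads;
   by key;
run;

/* Same sort with threading turned off, for comparison in the log */
proc sort data=work.to_sort out=work.sorted_single nothreads;
   by key;
run;
```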
n FMM
n GLM
n GLMSELECT
n LOESS
n MIXED
n QUANTLIFE
n QUANTREG
n QUANTSELECT
n ROBUSTREG
See the SAS/STAT Procedures Guide for details for each procedure.
SAS Scalable Performance Data Engine
The SAS Scalable Performance Data Engine, which is included in Base SAS, is
engineered to exploit SMP hardware capabilities. The SAS Scalable
Performance Data Engine uses partitioned data sets that are optimized for
reading data in threads. The partition size can be configured with the SAS
Scalable Performance Data Engine PARTSIZE option. THREADNUM and
SPDEMAXTHREADS control threading for optimum threaded reads. The Base
SAS NOTHREADS and CPUCOUNT system options have no effect on SPD
Engine threaded reads. They remain in effect for the SAS thread-enabled
procedures executing on the SPD Engine data set. SPD Engine indexes are also
created in threads in parallel automatically without regard to NOTHREADS, if
set. You can use SPDEINDEXSORTSIZE= to optimize threaded index creation.
The SPD Engine is described in the SAS Scalable Performance Data Engine:
Reference.
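A minimal sketch of the options mentioned above follows. It assigns an SPD Engine library with a PARTSIZE= value and reads the data set with the THREADNUM= data set option. The libref, the data set, the 64M partition size, and the use of the WORK directory as the library path are all assumptions made so that the example is self-contained.

```sas
/* SPD Engine library; the WORK path is a stand-in for a real location */
libname spdelib spde "%sysfunc(pathname(work))" partsize=64m;

/* Hypothetical data set stored through the SPD Engine */
data spdelib.parts;
   do PartNumber = 1 to 100000;
      Quantity = mod(PartNumber, 50);
      output;
   end;
run;

/* THREADNUM= limits the number of I/O threads for this read */
proc means data=spdelib.parts(threadnum=2);
   var Quantity;
run;

libname spdelib clear;
```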
SAS FedSQL Language
SAS FedSQL is a SAS proprietary SQL implementation based on the ANSI
SQL:1999 standard. It provides support for ANSI SQL data types and other ANSI
compliance features. The core strength of SAS FedSQL is its ability to execute
federated queries across a heterogeneous database environment and return a
single result set. FedSQL queries are automatically optimized with multi-
threaded algorithms in order to resolve large-scale operations. In addition,
FedSQL can execute outside of a SAS session, for example in the SAS
Federation Server and SAS Scalable Performance Data Server environments.
The NOTHREADS and CPUCOUNT options have no effect on FedSQL
processing.
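Within a SAS session, FedSQL statements can be submitted through the FEDSQL procedure. The following sketch queries a small, hypothetical table in the WORK library. A federated query would reference tables in more than one data source in the same way.

```sas
/* Hypothetical table for illustration */
data work.parts;
   input PartNumber Quantity;
   datalines;
1 10
2 20
3 30
4 40
5 50
;
run;

proc fedsql;
   select PartNumber, Quantity
      from work.parts
      where Quantity > 20;
quit;
```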
defined methods, and packages. The DS2 SET statement accepts embedded
FedSQL syntax and the runtime-generated queries can exchange data
interactively between DS2 and any supported database. This allows SQL
preprocessing of input tables which effectively combines the power of the two
languages.
The DS2 procedure, which submits thread-enabled DS2 programs to the SAS
Embedded Process for execution, is also included. A high-performance version
of the DS2 procedure, PROC HPDS2, submits DS2 language statements to the
separately licensed High-Performance Analytics Server for processing. See the
SAS High-Performance Analytics Server Usage Guide for documentation on this
and other high-performance versions of certain SAS procedures.
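The embedded FedSQL syntax in the DS2 SET statement can be sketched as follows. The Parts table and the output table name are hypothetical. The query inside the braces is run as FedSQL, and its result set becomes the input to the DS2 program.

```sas
/* Hypothetical input table */
data work.parts;
   input PartNumber Quantity;
   datalines;
1 10
2 20
3 30
4 40
5 50
;
run;

proc ds2;
   /* OVERWRITE=YES replaces the output table if it already exists */
   data work.large_parts(overwrite=yes);
      method run();
         /* FedSQL query embedded in the SET statement */
         set {select PartNumber, Quantity from work.parts where Quantity > 20};
      end;
   enddata;
   run;
quit;
```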
SAS/ACCESS Engines
SAS/ACCESS engines are LIBNAME engines that provide Read, Write, and Update
access to more than 60 relational and nonrelational databases, PC files, data
warehouse appliances, and distributed file systems. These engines are not part of
Base SAS but they depend on Base SAS. They are licensed separately or are
included in many product bundles such as SAS BI Server or SAS Activity-Based
Management. Many bundles offer the customer a choice of two out of the many
SAS/ACCESS engines available.
In SAS/ACCESS, threaded reads partition the result set across multiple threads.
Unlike threaded processing in Base SAS procedures, threaded reads in
SAS/ACCESS are not dependent on the number of processors on a machine.
Instead, the result set is retrieved on multiple connections between SAS and the
DBMS. SAS causes the DBMS to partition the result set by appending a WHERE
clause to the SQL statement. When this happens, a single SQL statement becomes
multiple SQL statements, one on each thread. The DBMS reads the partitions one
per thread also.
The amount of scalability that is provided with the SAS/ACCESS engines depends
on the efficiency of parallelization implemented in the DBMS itself. However,
SAS/ACCESS engines have options available in the LIBNAME statement that
enable tuning of the threaded implementation within the SAS/ACCESS engines.
The options that control threaded reads in SAS/ACCESS are DBSLICE,
DBSLICEPARM, THREADS|NOTHREADS, and whether BY, OBS, or KEY options are
used in a PROC or DATA step. Refer to the SAS/ACCESS for Relational Databases
documentation for more information.
n SAS Business Intelligence Server and SAS Data Integration Server technologies
o the SAS servers and supporting services such as SAS Stored Process Servers
o application messaging
o SAS BI Web Services
o publishing framework
o SAS Foundation Services
Each server is initiated with a pool of active threads. These threads are controlled
by the server and are used by server processes (for example, handling incoming
requests). If the NOTHREADS and CPUCOUNT options are specified, they are
ignored, except during the execution of submitted code that includes a SAS
procedure that honors these options.
For the SAS Metadata Server, thread usage is controlled by default settings for the
object server parameters (THREADSMAX and THREADSMIN) and for the
metadata server configuration option, MAXACTIVETHREADS. Administrators can
override these settings in order to fine-tune performance. See the SAS Intelligence
Platform: System Administrator’s Guide for details and examples. The THREADSMAX
and THREADSMIN object server parameters are rarely used for servers other than
the SAS Metadata Server.
Some SAS High-Performance Analytics products can work in concert with, or are
directly integrated with, other SAS applications and solutions. For example, you can
configure SAS Grid Manager to distribute the workload from SAS Enterprise Guide
executing on the SAS Intelligence Platform. The SAS Grid Manager can be used to
manage the workload of SAS jobs on the SAS High-Performance Analytics Server
running on a DBMS appliance such as EMC Greenplum or Teradata. And SAS Visual
Analytics can be used to explore data that is consumed by SAS Enterprise Miner
executing in a SAS Grid environment.
n SAS In-Database
SAS programs that are grid-enabled to run in multiple independent steps within the
overall program flow execute in threads. The threaded execution typically occurs
at the DATA step or procedure boundaries. Because the programs are specifically
written to execute in threads, it is assumed that the THREADS option is set and the
CPUCOUNT option is greater than one. SAS Grid Manager enables subtasks of
individual SAS jobs to run in parallel on different parts of the grid and share a pool
of resources. The threaded subtasks can further benefit from threaded processing
performed by any of the thread-enabled procedures executed from within the code.
Programs that have been analyzed for parallel processing using the SAS Code
Analyzer can be run on the grid with SAS Grid Manager. This automatically assigns
the identified subtasks to a grid node.
Many SAS products, such as SAS Enterprise Guide, SAS Enterprise Miner, SAS Data
Integration Studio, SAS Web Report Studio, SAS Marketing Automation, and SAS
Marketing Optimization are integrated with SAS Grid Manager. An option within the
application or in the SAS Metadata enables the integration. SAS Data Integration
Studio and SAS Enterprise Miner have code generation engines that can recognize
opportunities for parallelization and generate the appropriate code to submit to
SAS Grid Manager to execute in parallel across the grid.
Other SAS components, such as the SAS Intelligence Platform, use SAS Grid
Manager to determine the optimal SAS server for processing by distributing server
workloads across multiple computers on a network. SAS Grid Manager divides jobs
into separate processes that run in parallel across multiple servers. SAS Grid
Manager is documented in Grid Computing for SAS.
In-Database processes are divided into multiple parallel tasks within the database,
each working with small subsets of the overall data to be processed. As the results
of the smaller subsets are derived, they are consolidated and returned to the SAS
application. Typically, the execution times are much shorter than if the data were
transferred to the SAS server.
loaded into memory before the analytic procedure begins. This is in distinct
contrast to traditional processing where data is loaded in blocks as they are
needed. In addition, SAS High-Performance Analytics procedures are engineered to
execute on hundreds of threads and each thread is responsible for a small subset of
the overall data to be processed. Faster analysis of large data sets results in greater
refinement of analytic models.
statistics and use more specialized analytics in a very scalable way. This
provides more accurate insights into the future and aids decision making. For
example, users can visually explore data, execute analytic correlations on
billions of rows of data in just seconds, and visually present results. This helps
quickly identify patterns, trends, and relationships in data that were not evident
before.
SAS Visual Analytics relies on the SAS LASR Analytic Server to provide a highly
scalable analytics infrastructure that is optimized for large volumes of data and
complex computations.
SAS LASR Analytic Server
The SAS LASR Analytic Server is an analytic platform that provides a secure
environment for concurrent access to data. It loads the data into memory across
the computing nodes of a SAS High-Performance Analytics Server. The SAS
LASR Analytic Server executes on the SAS High-Performance Analytics Server
root node with worker nodes across the appliance that read data into memory in
parallel at high speed. If the data is not from a co-located data provider, then the
data is read from the DBMS appliance or Hadoop cluster and transferred to the
root node of the SAS High-Performance Analytics Server. Then, it is loaded into
the memory of the worker nodes. The SAS LASR Analytic Server is not
influenced by CPUCOUNT or NOTHREADS. Instead, the NTHREADS option in
the PERFORMANCE statement throttles thread usage. Refer to the SAS LASR
Analytic Server: Administration Guide for details.
For more SAS In-Memory Analytics products, see Products & Solutions/In-Memory
Analytics.
SAS Data Integration Studio can execute with in-memory products SAS High-
Performance Analytics Server and SAS Visual Analytics Server configured with
the SAS Intelligence Platform as an MPP environment, which is always
threaded. SAS Data Integration Studio provides High-Performance Analytics
transformations for SAS LASR Analytic Servers or HDFS. NOTHREADS and
CPUCOUNT options have no effect. See the SAS Data Integration Studio: User’s
Guide for details. The SAS Data Integration Server is administered as part of the
SAS Intelligence Platform.
SAS Risk Dimensions
The iterative workflow in SAS Risk Dimensions is similar to that in SAS Data
Integration Studio; they both execute the same analysis over different subsets
of the data. This workflow makes them ideal for taking advantage of SAS Grid
Manager to distribute the processing across the grid. For more information, see
SAS Risk Dimensions User’s Guide.
SAS Enterprise Miner
SAS Enterprise Miner can automatically generate SAS applications that are
enabled to execute on a SAS Grid Manager grid. These SAS applications detect
the presence of a SAS Grid Manager environment at run time and distribute the
execution accordingly. The applications can also be saved as SAS stored
processes and subsequently executed by the SAS Business Intelligence
components such as SAS Web Report Studio. The DMINE, DMREG, and DMDB
SAS Viya
With SAS 9.4M5, you can license SAS Viya, software that offers a variety of high
performance products and access to SAS Cloud Analytic Services. For more
information, see An Introduction to SAS Viya Programming.
PART 5
Creating Output
Chapter 27: Output
Chapter 28: Printing
27
Output
program results
contain the programmatic results from SAS procedures and SAS DATA step
applications. These results can be sent to a file or printed as a report. There are
a variety of options, formats, statements, and commands available in SAS to
customize your output. The Output Delivery System enables you to specify
output destinations to control how your output is stored, table definitions to
control how your output is structured, and style templates to control the
stylistic elements of your output. For more information, see SAS Output
Delivery System: User’s Guide.
Here are a few examples of the types of output that you can get from running
SAS programs:
- a SAS data set
You can write specific information to the SAS log (such as variable values or
text strings) by using the SAS statements that are described in “Writing to
the Log in All Modes” on page 681.
The log is also used by some of the SAS procedures that perform utility
functions, for example, the DATASETS and OPTIONS procedures. See the
Base SAS Procedures Guide.
Note: For more information about the destination of the SAS console log,
see the SAS documentation for your operating environment.
Appenders are defined for the duration of a macro program or a DATA step.
Loggers are defined for the duration of the SAS session.
The following table shows the default destinations for each method of operation
based on SAS version:
SAS Version: SAS 9.3 and later
Mode of Running SAS: SAS Windowing Environment
Viewer: SAS Results Viewer or browser window
ODS Destination: HTML
For more information about the new defaults and ODS destinations, see the SAS
Output Delivery System: User’s Guide.
The following list describes some of the commonly used ODS statements and other
SAS language elements that are used for routing output.
SAS system options
    redefine the destination of log and output for an entire SAS program. These system options are specified when you invoke SAS. The system options used to route output are the
For conceptual information about global ODS statements, see the following
resources:
- “Destination Category Table” in SAS Output Delivery System: User’s Guide.
For information about changing the default output location for the z/OS and UNIX
operating environments, see the following resources:
- z/OS: “Directing SAS Log and SAS Procedure Output” in SAS Companion for
z/OS and “Changing the Default Destination” in SAS Companion for z/OS
- UNIX: “Changing the Default Routings in UNIX Environments” in SAS Companion
for UNIX Environments
Customize Output
The following list describes some of the statements and SAS system options that
you can use:
DATE | NODATE system option
    controls printing of date and time values. When this option is enabled, SAS prints on the

LINESIZE= and PAGESIZE= system options
    changes the default number of lines per page (page size) and characters per line (line size) for printed output. The default depends on the method of running SAS and the settings of certain SAS system options. Specify new page and line sizes in the OPTIONS statement or OPTIONS window. You can also specify line and page size for DATA step output in the FILE statement.

    The values that you use for the LINESIZE= and PAGESIZE= system options can significantly affect the appearance of the output that is produced by some SAS procedures.
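As a minimal sketch of the two places where line and page sizes can be set, the following program changes the session-wide values in an OPTIONS statement and then overrides them for a single DATA step report in the FILE statement. The values 80, 60, 64, and 24 and the use of SASHELP.CLASS are illustrative assumptions, not recommended settings.

```sas
/* Session-wide settings for subsequent output (illustrative values) */
options linesize=80 pagesize=60;

/* Override line and page size for this DATA step report only */
data _null_;
   set sashelp.class;
   file print linesize=64 pagesize=24;
   put name $10. age 3. weight 6.1;
run;
```

The FILE statement values apply only to the output that this DATA step writes; later steps continue to use the values from the OPTIONS statement.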
For information about the Output Delivery System, see SAS Output Delivery
System: User’s Guide.
The FORMAT procedure enables you to design your own formats and informats,
giving you added flexibility in displaying values. See “FORMAT Procedure” in Base
SAS Procedures Guide for more information about the FORMAT procedure, and SAS
System Options: Reference for information about all other SAS system options.
The MISSING= system option enables you to specify a character to print in place of
the period for ordinary missing numeric values. For more information, see the
“MISSING= System Option” in SAS System Options: Reference.
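The following sketch shows one way this might look in practice. The data set name WORK.MISSDEMO and the hyphen are assumptions chosen for illustration; any single character can be used.

```sas
/* Print a hyphen instead of a period for ordinary missing numeric values */
options missing='-';

data work.missdemo;
   input id score;
   datalines;
1 90
2 .
3 75
;

proc print data=work.missdemo;
run;

/* Restore the default character for later steps */
options missing='.';
```

Only the printed representation changes; the stored value of SCORE in the second observation is still a missing value.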
NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA. 1
NOTE: SAS (r) Proprietary Software 9.4 (TS1B0) 2 Licensed to SAS Institute
Inc., Site 1. 3
NOTE: This session is executing on the W32_7PRO platform. 4
NOTE: SAS initialization used:
real time 4.19 seconds
cpu time 0.85 seconds
1 options pagesize=24
2 linesize=64 pageno=1 nodate; 5
3 data logsample; 6
5 infile
5 ! '\\myserver\my-directory-path\sampledata.dat'; 7
6 input LastName $ ID $ Gender $ Birth : date7. score1
6 ! score2 score3 score4 score5 score6 score7 score8;
7 format Birth mmddyy8.;
8 run;
NOTE: The infile
'\\myserver\my-directory-path\sampledata.dat' is: 8
Filename=\\myserver\my-directory-path\sampledata.dat,
RECFM=V,LRECL=256,File Size (bytes)=296,
Last Modified=08Jun2009:15:42:26,
Create Time=08Jun2009:15:42:26
NOTE: 5 records were read from the infile 9
'\\myserver\my-directory-path\sampledata.dat'.
The minimum record length was 58.
The maximum record length was 59.
NOTE: The data set WORK.LOGSAMPLE has 5 observations and 12
variables. 10
NOTE: DATA statement used (Total process time):
real time 0.21 seconds 11
cpu time 0.03 seconds
9
10 proc sort data=logsample; 12
11 by LastName;
12 run;
NOTE: There were 5 observations read from the data set
WORK.LOGSAMPLE.
NOTE: The data set WORK.LOGSAMPLE has 5 observations and 12
variables. 13
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
13
14 proc print data=logsample; 14
15 by LastName;
16 run;
NOTE: There were 5 observations read from the data set
WORK.LOGSAMPLE.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
The following list corresponds to the circled numbers in the SAS log shown above:
1 copyright information
2 SAS release used to run this program
3 name and site number of the computer installation where the program ran
4 platform used to run the program
Note: Copyright information, licensing and site information, and the number of
observations and variables in the data set can be suppressed by using the NOTES |
NONOTES system option.
5 The OPTIONS statement sets a page size of 24, a line size of 64, sets the SAS
output to page 1, and suppresses the date in the output.
6 SAS statements that make up the program (if the SAS system option SOURCE
is enabled)
7 long statement continued to the next line. Note that the continuation line is
preceded by an exclamation point (!), and that the line number does not change.
8 input file information: notes or warning messages about the raw data and where
they were obtained (if the SAS system option NOTES is enabled)
9 the number and record length of records read from the input file (if the SAS
system option NOTES is enabled)
10 SAS data set that your program created; notes that contain the number of
observations and variables for each data set created (if the SAS system option
NOTES is enabled)
11 reported performance statistics when the STIMER option or the FULLSTIMER
option is set
12 procedure that sorts your data set
13 note about the sorted SAS data set
14 procedure that prints your data set
The following sections discuss the log options that you can configure using the
LOGPARM= system option and how you would name the SAS log for those options
when the logging facility has not been initiated. The LOG= system option names the
SAS log. The LOGPARM= system option enables you to append to or replace the
SAS log, determine when to write to the SAS log, and start a new SAS log under
certain conditions.
In the following SAS command, both the LOG= and LOGPARM= system options are
specified in order to replace an existing SAS log that is more than one day old:
sas -sysin "my-batch-program" -log "c:\sas\SASlogs\mylog"
-logparm open=replaceold
The OPEN= option is ignored when the ROLLOVER= option of the LOGPARM=
system option is set to a specific size, n.
You use the WRITE= option of the LOGPARM= system option to configure when
the SAS log contents are written. Set LOGPARM="WRITE=IMMEDIATE" for the log
content to be written as it is produced and set LOGPARM="WRITE=BUFFERED" for
the log content to be written when the buffer is full.
The LOGPARM= system option controls when log files are opened and closed and
the LOG= system option names the SAS log file. Logs can be rolled over
automatically, when a SAS session starts, when the log has reached a specific size,
or not at all. By using formatting directives in the SAS log name, each SAS log can
be named with unique identifiers. For more information about how to roll over the
SAS log, see the following examples:
- “Example: Use Directives to Name the SAS Log” on page 688
- “Example: Automatically Roll Over the SAS Log When Directives Change” on
page 689
- “Example: Roll Over the SAS Log by SAS Session” on page 690
- “Example: Roll Over the SAS Log by the Log Size” on page 691
To prevent the log from rolling over, specify the LOGPARM="ROLLOVER=NONE" option
when SAS starts. Directives are not resolved and no rollover occurs. For example, if
LOG="March#b.log", the directive #b does not resolve and the log name is
March#b.log.
Table 27.4 Other Statements That Write Information to the SAS Log
LIST Statement
    Writes to the SAS log the input data record for the observation that is being processed. See “LIST Statement” in SAS DATA Step Statements: Reference.

PUT Statement
    Writes lines to the SAS log, to the SAS output window, or to an external location that is specified in the most recent FILE statement. See “PUT Statement” in SAS DATA Step Statements: Reference.
Use the PUT, PUTLOG, LIST, DATA, and ERROR statements in combination with
conditional processing to debug DATA steps by writing selected information to the
log.
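A minimal sketch of this debugging pattern follows. The conditions, message text, and the output data set name WORK.CHECKED are assumptions made up for illustration; only the statements themselves come from the list above.

```sas
data work.checked;
   set sashelp.class;

   /* PUTLOG always writes to the SAS log, regardless of any FILE statement */
   if weight > 100 then putlog 'NOTE: weight over 100 for ' name= weight=;

   /* ERROR writes the message and sets _ERROR_ to 1 for this observation */
   if age < 12 then error 'Unexpected age for ' name= age=;

   /* PUT writes to the log here because no FILE statement is in effect */
   if _n_ = 1 then put 'First observation read: ' _all_;
run;
```

Because the ERROR statement sets _ERROR_ to 1, SAS also lists the variable values for that observation in the log, which is often what you want while debugging but can expose sensitive values.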
The following list describes some of the SAS system options that you can use to
alter the contents of the log. For examples of how some of these options can be
used, see “Examples: Suppress Output to the SAS Log” .
ECHO (Windows and UNIX)
    Specifies a message to be written to the SAS log while SAS initializes. The ECHO system option is valid only under the Windows and UNIX operating environments.

LOGPARM= "OPEN=APPEND | REPLACE | REPLACEOLD"
    When a log file already exists and SAS is opening the log, the LOGPARM= option specifies whether to append to the existing log or to replace the existing log. The REPLACEOLD option specifies to replace logs that are more than one day old.

OPLIST (Windows, z/OS, UNIX)
    Specifies whether the settings of the SAS system options are written to the SAS log.

PAGEBREAKINITIAL
    Specifies whether the SAS log and the listing file begin on a new page.

RTRACE (Windows, UNIX)
    Produces a list of resources that are read or loaded during a SAS session and writes them to the SAS log if a location is not specified for the RTRACELOC= system option. The RTRACE system option is valid only for the Windows and UNIX operating environments.
See SAS System Options: Reference for more information about how to use these
and other SAS system options.
Table 27.6 SAS Statements and System Options to Customize the Log
DTRESET system option
    Specifies whether to update the date and time in the SAS log and in the listing file.

LINESIZE= system option
    Specifies the line size (printer line width) for the SAS log and SAS output that are used by the DATA step and procedures.

MSGCASE system option (Windows, UNIX, z/OS)
    Specifies whether to display notes, warning, and error messages in uppercase letters or lowercase letters.

NUMBER system option
    Controls whether the page number is printed on the first title line of each page of printed output.

PAGESIZE= system option
    Specifies the number of lines that you can print per page of SAS output.

STIMEFMT system option (Windows, UNIX)
    Specifies the format to use for displaying the real and CPU processing times when the STIMER system option is set. The STIMEFMT system option is valid under Windows and UNIX operating environments.
Operating Environment Information: The range of values for the FILE statement
and for SAS system options depends on your operating environment. See the SAS
documentation for your operating environment for more information.
Table 27.7 Other SAS System Options That Affect the SAS Log
ALTLOG system option (Windows, UNIX, z/OS)
    Specifies the destination for a copy of the SAS log.

LOG system option (Windows, UNIX); LOG= system option (z/OS)
    Specifies the destination for the SAS log when SAS is run in batch mode.
Example Code
The default output destination is HTML when running SAS programs. In SAS Studio,
the results are displayed in the Results window.
title 'Student Weight';
proc print data=sashelp.class;
where weight>100;
run;
Note: If you previously closed the HTML destination, then your output is sent to
the WORK directory by default. If you close the HTML destination and reopen it in
the same SAS session, then all output goes to the current directory. You do not
have to specify ods html close; to view your output.
Key Ideas
- The destination of your output depends on your operating environment, your mode
of running SAS, and your version of SAS.
Example Code
If you want to send your output to the LISTING destination, you can use ODS
statements in your SAS programs to change the destination.
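ods html close;      /* stop sending output to the HTML destination */
ods listing;         /* open the LISTING destination */
title 'Students';
proc print data=sashelp.class;
where weight>100;
run;
ods html;            /* reopen the HTML destination */
ods listing close;   /* close the LISTING destination */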
Key Ideas
- If you want a more permanent solution, you can change your settings so that every
time you run SAS, your output is sent to the LISTING destination by default.
See Also
- “Understanding ODS Destinations” in SAS Output Delivery System: User’s Guide
Example Code
The following example shows how to use directives in the LOG= system option to
include the year, the month, and the day in the SAS log name:
-log "c:\saslog\#Y#b#dsas.log"
When the SAS log is created on February 2, 2019, the name of the log is
2019Feb02sas.log.
Key Ideas
- Directives enable you to add information to the SAS log name such as the day, the
hour, the system node name, or a unique identifier. You can include one or more
directives in the name of the SAS log when you specify the log name in the LOG=
system option.
- A directive is a processing instruction that is used to uniquely name the SAS log.
- Directives resolve only when the value of the ROLLOVER= option of the
LOGPARM= system option is set to AUTO or SESSION.
- If directives are specified in the log name and the value of the ROLLOVER= option is
NONE or a specific size, n, the directive characters, such as #b or #Y, become part
of the log name.
- Using the example above for the LOG= system option, if the LOGPARM= system
option specifies ROLLOVER=NONE, the name of the SAS log is #Y#b#dsas.log.
See Also
- For a complete list of directives, see “LOGPARM= System Option” in SAS
System Options: Reference
Example Code
When the SAS log name contains one or more directives and the ROLLOVER=
option of the LOGPARM= system option is set to AUTO, SAS closes the log and
opens a new log when the directive values change. The new SAS log name contains
the new directive values.
The table below shows some of the log names that are created when SAS is started
on the second of the month at 6:15 AM, using this SAS command:
sas -objectserver -log "london#n#d#%H.log"
-logparm "rollover=auto"
Key Ideas
- The directive #n inserts the system node name into the log name. #d adds the day
of the month to the log name. #H adds the hour to the log name. The node name
for this example is Thames.
- Under the Windows and UNIX operating environments, you can begin directives
with either the % symbol or the # symbol, and use both symbols in the same
directive.
- The log for this SAS session rolls over when the hour changes and when the day
changes.
See Also
- For a complete list of directives, see “LOGPARM= System Option” in SAS
System Options: Reference
Example Code
To roll over the log at the start of a SAS session, specify the
LOGPARM="ROLLOVER=SESSION" option when SAS starts. The example below
creates a log filename that contains the user name that started the SAS session.
sas -log "%l.log"
-logparm "rollover=session"
Key Ideas
- SAS resolves the system-specific directives that are specified in the LOG= system
option and uses the resolved value to name the new log file.
- Under the Windows and UNIX operating environments, you can begin directives
with either the % symbol or the # symbol, and use both symbols in the same
directive.
- No rollover occurs during the SAS session, and the log file is closed at the end of
the SAS session.
See Also
- “LOGPARM= System Option” in SAS System Options: Reference
Example Code
To roll over the log when the log reaches a specific size, specify the
LOGPARM="ROLLOVER=n" option when SAS starts.
sas -log "test%H%M.log"
-logparm "rollover=80K"
Key Ideas
- n is the maximum size of the log, in bytes, and it cannot be smaller than 10K
(10,240) bytes.
- Under the Windows and UNIX operating environments, you can begin directives
with either the % symbol or the # symbol, and use both symbols in the same
directive.
- When the log reaches the specified size, SAS closes the log and appends the text
“old” to the filename (for example, londonold.log). SAS opens a new log using the
value of the LOG= option for the log name and ignores the OPEN= option
of the LOGPARM= system option.
- This is useful so that SAS never writes over an existing log file. Directives in log
names are ignored for logs that roll over based on log size.
- To ensure unique log filenames between servers, SAS creates a lock file that is
based on the log filename. The lock filename is logname.lck, where logname is the
value of the LOG= option.
- If a lock file exists for a server log and another server specifies the same log name,
a number is appended to the log and lock filenames for the second server. The
numbers begin with 2 and increment by 1 for subsequent requests for the same log
filename. For example, if a lock exists for the log file london.log, the second server
log would be london2.log and the lock file would be london2.lck.
See Also
- “LOGPARM= System Option” in SAS System Options: Reference
Example Code
In this example, the first DATA step prints the source code to the SAS log. The
source code in this case contains a password, which is displayed in the SAS log
when the DATA step executes:
data mytable;
password="ABC123";
x = 1;
run;
Example Code 27.2 SAS Log output that shows how the password is printed to the SAS log
in the source code
73 data mytable;
74 password="ABC123";
75 x = 1;
76 run;
You can suppress the printing of the source program to the log by using the
NOSOURCE system option.
options nosource;
data mytable;
password="ABC123";
run;
Example Code 27.3 SAS Log output that shows how the source code is suppressed from
printing to the log
Key Ideas
- The SOURCE | NOSOURCE system option specifies whether SAS writes source
statements to the SAS log.
- When NOSOURCE is in effect, SAS does not write source statements to the SAS
log.
See Also
- “SOURCE System Option” in SAS System Options: Reference
Example Code
In this example, the NOSOURCE option suppresses the printing of the source
program to the SAS log. However, because the second DATA step generates an
error, the password data is printed to the SAS log in the log note.
options nosource;
data mytable;
password="123ABC";
run;
data mytable2;
set mytable;
password=input(password,datetime.);
run;
Example Code 27.4 Log output that shows how the password is printed to the SAS log in the
NOTE
NOTE: Numeric values have been converted to character values at the places given
by:
(Line):(Column).
123:14
NOTE: Invalid argument to function INPUT at line 123 column 14.
password=. _ERROR_=1 _N_=1
NOTE: Mathematical operations could not be performed at the following places. The
results of the
operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 123:14
NOTE: There were 1 observations read from the data set WORK.MYTABLE.
NOTE: The data set WORK.MYTABLE2 has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
You can suppress the printing of variable data when there is an error by specifying the
NOLIST option in the DATA statement.
data mytable2 / NOLIST;
set mytable;
password=input(password,datetime.);
run;
Example Code 27.5 Log output that shows how the NOLIST option suppresses the printing
of variable data to the SAS log when there is an error
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
127:14
NOTE: Invalid argument to function INPUT at line 127 column 14.
NOTE: NOLIST option on the DATA statement suppressed output of variable listing.
NOTE: Mathematical operations could not be performed at the following places. The
results of the
operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 127:14
NOTE: There were 1 observations read from the data set WORK.MYTABLE.
NOTE: The data set WORK.MYTABLE2 has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
Key Ideas
- The NOLIST DATA statement option suppresses the output of all variables to the
SAS log when the value of _ERROR_ is 1.
- NOLIST must be the last option in the DATA statement.
See Also
- “DATA Statement” in SAS DATA Step Statements: Reference
Example Code
These examples use the NONOTES system option to suppress the printing of
notes to the SAS log.
data example;
password="ABC123";
run;
Example Code 27.6 Log output showing how notes are printed to the SAS log
17 data example;
18 password="ABC123";
19 run;
You can suppress the printing of notes by using the NONOTES system option.
options NONOTES;
data example;
password="ABC123";
run;
Example Code 27.7 Log output showing how notes are suppressed in the SAS log
20 options NONOTES;
21 data example;
22 password="ABC123";
23 run;
Key Ideas
- The NOTES system option specifies whether notes are written to the SAS log.
- NONOTES specifies that SAS does not write notes to the SAS log.
See Also
- “NOTES System Option” in SAS System Options: Reference
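The two suppression options shown in these examples can be combined around a sensitive step and then turned back on. The following is a sketch under the assumption that SOURCE and NOTES were in effect before the step, which is the usual default; the data set name WORK.EXAMPLE is illustrative.

```sas
/* Turn off source echo and notes around a sensitive step */
options nosource nonotes;

data work.example;
   password="ABC123";
run;

/* Turn both options back on for the rest of the session */
options source notes;
```

Error and warning messages are still written to the log while NONOTES is in effect, so this pattern does not hide problems in the step itself.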
28
Printing
Universal Printing
You can define printers with Universal Printing, and set options to control the
printed output. In addition to creating the various document and graphic output
types, you can send output to a printer.
SAS routes all printing through Universal Printing services. All Universal Printing
features are controlled by system options, thereby enabling you to control many
print features, even in batch mode.
PART 6
Managing Files
Chapter 29: Protecting Files
Chapter 30: Repairing SAS Files
Chapter 31: Compressing SAS Data Sets
Chapter 32: Moving SAS Files
Chapter 33: Cross-Environment Data Access
Chapter 34: Managing SAS Catalogs
29
Protecting Files
Definition of a Password
SAS software enables you to restrict access to members of SAS libraries by
assigning passwords to the members. You can assign passwords to all member
types except catalogs. You can specify three levels of protection: Read, Write, and
Alter. When a password is assigned, it appears as uppercase Xs in the log.
Note: This document uses the terms SAS data file and SAS view to distinguish
between the two types of SAS data sets. Passwords work differently for type VIEW
than they do for type DATA. The term “SAS data set” is used when the distinction is
not necessary.
read
protects against reading the file.
write
protects against changing the data in the file. For SAS data files, write
protection prevents adding, modifying, or deleting observations.
alter
protects against deleting or replacing the entire file. For SAS data files, alter
protection also prevents modifying variable attributes and creating or deleting
indexes.
Alter protection does not require a password for Read or Write access; write
protection does not require a password for Read access. For example, you can read
an alter-protected or write-protected SAS data file without knowing the Alter or
Write password. Conversely, read and write protection do not prevent any
operation that requires alter protection. For example, you can delete a SAS data set
that is read- or write-protected only without knowing the Read or Write password.
To protect a file from being read, written to, deleted, or replaced by anyone who
does not have the proper authority, assign read, write, and alter protection. To allow
others to read the file without knowing the password, but not change its data or
delete it, assign just write and alter protection. To completely protect a file with
one password, use the PW= data set option. For more information, see “Assigning
Complete Protection with the PW= Data Set Option” on page 707.
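The two arrangements described in this paragraph might be coded as in the following sketch. The data set names WORK.SHARED and WORK.LOCKED and the passwords are hypothetical; each password option is placed on its own line.

```sas
/* Others can read without a password, but need a password
   to change the data or to replace or delete the file */
data work.shared(
      write=yellow
      alter=red
   );
   set sashelp.class;
run;

/* Complete protection: one password for Read, Write, and Alter access */
data work.locked(
      pw=blue
   );
   set sashelp.class;
run;
```

With PW=, the same password is required for every level of access, which is simpler to manage but gives up the ability to hand out read-only access separately.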
Note: Because of how SAS opens files, you must specify the Read password to
update a SAS data set that is only read-protected.
Note: The levels of protection differ somewhat for the member type VIEW. See
“Using Passwords with Views” on page 708.
Assigning Passwords
Syntax
To set a password, first specify a SAS data set in one of the following:
- a DATA statement
- the ToolBox
Then assign one or more password types to the data set. The data set might
already exist, or the data set might be one that you create. The following is an
example of syntax:
(password-type=password)
where password is a valid eight-character SAS name and password-type can be one
of the following SAS data set options:
- ALTER=
- PW=
- READ=
- WRITE=
TIP Each password option must be coded on a separate line to ensure that
they are properly blotted in the SAS log.
CAUTION
Keep a record of any passwords that you assign! If you forget or do not know the
password, you cannot get the password from SAS.
This example prevents deletion or modification of the data set without a password.
/* assign a write and an alter password to MYLIB.STUDENTS */
data mylib.students(write=yellow alter=red);
input name $ sex $ age;
datalines;
Amy f 25
… more data lines …
;
Note: When you replace a SAS data set that is alter-protected, the new data set
inherits the Alter password. To change the Alter password for the new data set, use
the MODIFY statement in the DATASETS procedure.
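The password change mentioned in the note can be sketched as follows. This assumes that MYLIB.STUDENTS exists with the Alter password red from the example above; the new password blue is hypothetical. In the MODIFY statement, the old and new passwords are separated by a slash.

```sas
/* change the Alter password of MYLIB.STUDENTS from red to blue */
/* (blue is a hypothetical new password)                        */
proc datasets library=mylib nolist;
   modify students(alter=red/blue);
run;
quit;
```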
Passwords are hierarchical in terms of gaining access. For example, specifying the
ALTER password gives you Read and Write access. The following example creates
the data set States, with three different passwords, and then reads the data set to
produce a plot:
data mylib.states(read=green write=yellow alter=red);
input density crime name $;
datalines;
151.4 6451.3 Colorado
… more data lines …
;
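The step that reads the data set is not shown above. A minimal sketch of such a step, assuming PROC PLOT and the variables created in the DATA step, might look like this. Because passwords are hierarchical, supplying the Alter password is enough to read the file.

```sas
/* the Alter password also grants Read access, so ALTER=red is sufficient here */
proc plot data=mylib.states(alter=red);
   plot crime*density;
run;
```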
If you are using batch or noninteractive mode, you receive an error message in the
SAS log if you try to access a password-protected member without specifying the
correct password.
If you are using interactive line mode, you are also prompted for the password if
you do not specify the correct password. When you enter the password and press
the Enter key, processing continues. If you cannot give the correct password, you
receive an error message in the SAS log.
Encoded Passwords
Encoding a password enables you to write SAS programs without having to specify
a password in plain text. The PWENCODE procedure uses encoding to disguise
passwords. With encoding, one character set is translated to another character set
through some form of table lookup. An encoded password is intended to prevent
casual, non-malicious viewing of passwords. You should not depend on encoded
passwords for all your data security needs; a determined and knowledgeable
attacker can decode the encoded passwords.
When an encoded password is used, the syntax parser decodes the password and
accesses the file. The encoded password is never written in plain text to the SAS
log. SAS does not accept passwords longer than eight characters. If an encoded
password is decoded and is longer than eight characters, SAS reads it as an
incorrect password and sends an error message to the SAS log. For more
information, see “PWENCODE Procedure” in Base SAS Procedures Guide.
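As a small illustration of the encoding step, the following sketch runs PROC PWENCODE on a hypothetical password. The encoded string is written to the SAS log, from where it can be pasted into a program in place of the plain-text value.

```sas
/* green is a hypothetical password used only for illustration */
proc pwencode in='green';
run;
```

The decoded value must still satisfy the eight-character limit described above.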
Levels of Protection
The levels of protection for SAS views and stored programs are similar to the levels
of protection for other types of SAS files. However, with SAS views, passwords
affect not only the underlying data, but also the view’s definition (or source
statements).
You can specify three levels of protection for SAS views: Read, Write, and Alter.
The following section describes how these data set options affect the underlying
data as well as the view’s descriptor information. Unless otherwise noted, the term
“view” refers to any type of SAS view and the term “underlying data” refers to the
data that is accessed by the SAS view:
Read
n protects against reading of the SAS view's underlying data
n prevents the display of source statements in the SAS log when using
DESCRIBE
n allows replacement of the SAS view
Write
n protects the underlying data associated with a SAS view by requiring that a
Write password be given
n prevents the display of source statements in the SAS log when using
DESCRIBE
n allows replacement of the SAS view
Alter
n prevents the display of source statements in the SAS log when using
DESCRIBE
n protects against replacement of the SAS view
For example, to DESCRIBE a view that has both Read and Write protection, you
must specify its Write password. Similarly, to DESCRIBE a view that has both Read
and Alter protection, you must specify its Alter password (since Alter is the more
restrictive of the two).
The following program shows how to use the DESCRIBE statement to view the
descriptor information for a Read-protected and Alter-protected view:
/*create a view with read and alter protection*/
data exam / view=exam(read=read alter=alter);
set grades;
run;
/*describe the view by specifying the most restrictive password */
data view=exam(alter=alter);
describe;
run;
Example Code 29.1 Password-protected View
For more information, see “DESCRIBE Statement” in SAS DATA Step Statements:
Reference and “DATA Statement” in SAS DATA Step Statements: Reference.
In most DATA and PROC steps, the way you use password-protected views is
consistent with how you use other types of password-protected SAS files. For
example, the following PROC PRINT prints a Read-protected view:
proc print data=mylib.grade(read=green);
run;
Note: You might experience unexpected results when you place protection on a
SAS view if some type of protection is already placed on the underlying data set.
Note: You can create a PROC SQL view from password-protected SAS data sets
without specifying their passwords. When you use the view, you are prompted for
the passwords of the SAS data sets named in the FROM clause. If you are running
SAS in batch or noninteractive mode, you receive an error message.
SAS/ACCESS Views
SAS/ACCESS software enables you to edit view descriptors and, in some
interfaces, the underlying data. To prevent someone from editing or reading
(browsing) the view descriptor, assign Alter protection to the view. To prevent
someone from updating the underlying data, assign Write protection to the view.
For more information, see the SAS/ACCESS documentation for your DBMS.
Note that you can use the SAS view without a password, but access to the
underlying data requires a password. This is one way to protect a particular column
of data. In the above example, proc print data=mylib.emp; executes, but proc
print data=mylib.employee; fails without the password.
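The example that this paragraph refers to is not reproduced in this section. A minimal sketch of such a setup follows. The library path, the password orange, and the columns of mylib.employee are hypothetical; only the names mylib.emp and mylib.employee come from the text.

```sas
libname mylib 'c:\examples';   /* hypothetical path */

/* hypothetical protected data set with a column that should stay hidden */
data mylib.employee(pw=orange);
   input name $ dept $ salary;
   datalines;
Amy sales 50000
Ben ops 45000
;

/* the view carries the password and drops the protected column */
data mylib.emp / view=mylib.emp;
   set mylib.employee(pw=orange drop=salary);
run;

/* executes: the view can be used without a password */
proc print data=mylib.emp;
run;

/* fails without the password for the underlying data set */
proc print data=mylib.employee;
run;
```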
Encryption does not affect file access. However, SAS recognizes all host security
mechanisms that control file access and can extend host security mechanisms by
binding the data sets to metadata. You can use encryption and those security
mechanisms together.
There are three types of algorithms that SAS uses for encrypting data files:
n SAS Proprietary Encryption on page 712 is implemented with the
ENCRYPT=YES data set option.
n AES (Advanced Encryption Standard) encryption on page 713 is implemented
with the ENCRYPT=AES or ENCRYPT=AES2 data set option.
Beginning in SAS 9.4M1, a metadata-bound library administrator can require that all
data files in the bound library be encrypted with one of the three algorithms. For
more information, see “Requiring Encryption for Metadata-Bound Data Sets” in
Base SAS Procedures Guide and SAS Guide to Metadata-Bound Libraries.
Feature              SAS Proprietary   AES             AES2
License required     No                No              No
SAS version support  8 and later       9.4 and later   9.4M5 and later
See Also
“AUTHLIB Procedure” in Base SAS Procedures Guide
SAS Proprietary Encryption for SAS data sets is implemented with the ENCRYPT=
data set option. You can use the ENCRYPT= data set option only when you are
creating a SAS data file. You must also assign a password when encrypting a data
file with SAS Proprietary Encryption. At a minimum, you must specify the READ=
data set option or the PW= data set option at the same time you specify
ENCRYPT=YES. Because passwords are used in this encryption technique, you
cannot change any password on an encrypted data set without re-creating the data
set.
n You cannot encrypt SAS data views, because they contain no data.
n If the data file is encrypted, all associated indexes are also encrypted.
The following example creates a SAS data set with SAS Proprietary Encryption:
data salary(encrypt=yes read=green);
input name $ yrsal bonuspct;
datalines;
Muriel 34567 3.2
Bjorn 74644 2.5
Freda 38755 4.1
Benny 29855 3.5
Agnetha 70998 4.1
;
TIP Each password option must be coded on a separate line to ensure that
it is properly blotted in the SAS log.
See Also
“AUTHLIB Procedure” in Base SAS Procedures Guide
AES Encryption
Beginning in SAS 9.4, AES encryption of data sets is available. You specify
ENCRYPT=AES when creating a data set. AES produces strong encryption by
using a key value that can be up to 64 characters long. Beginning in SAS 9.4M5,
a stronger AES key generation algorithm is available. To use it, specify the
ENCRYPT=AES2 data set option. Instead of passwords that are stored in the data
set (as with SAS Proprietary Encryption), AES and AES2 use a key value that is not
stored in the data set. The key value is created using the ENCRYPTKEY= data set option
when the data set is created. You cannot change the ENCRYPTKEY= key value on
an AES encrypted data set without re-creating the data set or using PROC
The following rules apply to AES and AES2 encryption of data sets:
n You use SAS/SECURE software, which is licensed with Base SAS software and
is available in all deployments.
n You must use the ENCRYPTKEY= data set option when creating or accessing an
AES encrypted data set unless the metadata-bound library administrator has
securely recorded the encryption key in metadata to which the data set is
bound. For more information, see “AUTHLIB Procedure” in Base SAS Procedures
Guide and SAS Guide to Metadata-Bound Libraries.
n To copy an AES-encrypted data file, the output engine must support AES
encryption. Otherwise, the data file is not copied.
n Releases before SAS 9.4 cannot use an AES-encrypted data file.
n Releases before SAS 9.4M5 cannot use an AES-encrypted file that uses the AES2
key generation algorithm.
n SAS Viya cannot access data sets created with ENCRYPT=AES2.
n If two or more data files are referentially related and any of them are AES
encrypted, then all must be AES encrypted. The encryption key for all of the
files must be the same unless the files are bound to metadata with the key
securely recorded in the metadata. For more information about metadata-bound
libraries, see “Metadata-Bound Library” in Base SAS Procedures Guide.
n If the data file has AES encryption, all associated indexes have AES encryption.
The ENCRYPTKEY= data set option does not protect the AES encrypted file from
deletion or replacement. AES encrypted data sets can be deleted by using either of
the following scenarios without having to specify an encrypt key value:
n the KILL option in PROC DATASETS
The encrypt key only prevents access to the contents of the file. To protect the file
from unauthorized deletion or replacement with the SAS system, the file must also
contain an ALTER= password or be bound to metadata.
The following example creates an encrypted data set using AES encryption:
data salary(encrypt=aes encryptkey=green);
input name $ yrsal bonuspct;
datalines;
Muriel 34567 3.2
Bjorn 74644 2.5
Freda 38755 4.1
Benny 29855 3.5
Agnetha 70998 4.1
;
run;
quit;
TIP Each password and encryption key option must be coded on a separate
line to ensure that it is properly blotted in the SAS log.
If you omit the ENCRYPTKEY= key value when accessing an AES secured data set,
a dialog box appears and prompts you to add the ENCRYPTKEY= key value. If the
data set is metadata-bound and the key has been stored in the metadata for the
library, the dialog box does not appear.
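The points above about access and deletion can be combined in one sketch. The data set name salary_protected, the key value green, and the Alter password red are hypothetical. The ENCRYPTKEY= value controls access to the contents, and the ALTER= password is what guards against replacement or deletion.

```sas
/* each option is on its own line, as the tip above recommends */
data salary_protected(
   encrypt=aes
   encryptkey=green
   alter=red
   );
   input name $ yrsal bonuspct;
   datalines;
Muriel 34567 3.2
Bjorn 74644 2.5
;

/* reading the contents requires the key value */
proc print data=salary_protected(
   encryptkey=green
   );
run;
```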
See Also
“AUTHLIB Procedure” in Base SAS Procedures Guide
In most cases, placing the password=value pair on a separate line blots the value:
data &ds(
read=secret
encrypt=aes
encryptkey=evenmoreso
);
x=1;
run;
%let ds=dataset;
data &ds(read=secret);
x=1
;
run;
n Typing errors cause the following passwords to show in the SAS log:
or
proc print data=library.abc(ERAD=secret);
run;
n If the code causes an ERROR message, the password is not blotted. For
example, in the following code the libref is misspelled causing SAS to issue the
message: "ERROR: Libref MYLUB is not assigned." and the password is not
blotted.
libname mylib 'c:\';
data mylub.abc(
read=secret
);
x=1;
run;
NOTE: The SAS System stopped processing this step because of errors.
Using Macros
When a password is assigned within a macro, the password is not blotted in the
SAS log when the macro executes. To prevent the password from being revealed in
the SAS log, you can redirect the SAS log to a file. For more information, see
“PRINTTO Procedure” in Base SAS Procedures Guide.
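A minimal sketch of this redirection follows. The log file path and the macro name make_protected are hypothetical, and the target directory is assumed to exist. The redirected file still contains the unblotted password, so it should be stored where only authorized users can read it.

```sas
/* route the log to a file; NEW replaces any existing file (hypothetical path) */
proc printto log='c:\logs\protected_step.log' new;
run;

/* hypothetical macro that assigns a password */
%macro make_protected;
   data work.abc(read=secret);
      x=1;
   run;
%mend make_protected;
%make_protected

/* restore the default log destination */
proc printto;
run;
```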
Length of Passwords
In some cases, the length of the displayed password is fixed at eight blotted
characters. In other cases, the number of blotted characters is the length of the
password. Output from the OPTIONS procedure, VERBOSE option, and OPLIST
option have a fixed length of eight.
When a password value is being reported, its length is fixed at eight. But when a
password value is simply being echoed from an input statement, it retains its input
length. This example shows the length of the passwords:
options pdfpassword=(open=a owner=b );
proc options option=pdfpassword;
run;
PDFPASSWORD=XXXXXXXX
Specifies the password to use to open a PDF document and the
password used by a PDF document owner.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
Metadata-Bound Libraries
A metadata-bound library is a physical library that is tied to a corresponding
metadata secured table object. Each physical table within a metadata-bound
library has information in its header that points to a specific metadata object. The
pointer creates a security binding between the physical table and the metadata
object. The binding ensures that SAS universally enforces metadata-layer access
requirements for the physical table—regardless of how a user requests access from
SAS. For more information, see SAS Guide to Metadata-Bound Libraries.
30
Repairing SAS Files
Overview
SAS can repair structural damage to SAS data sets and SAS catalogs resulting from
certain system errors. It can also rebuild simple indexes and integrity constraints
that were inadvertently deleted or damaged by using operating system commands.
These repairs can be made with the DLDMGACTION= system option, the
DLDMGACTION= data set option, or the PROC DATASETS REPAIR statement. For
more information about the damage that SAS can repair and how, see
“Understanding Damage That Can Be Repaired with SAS” on page 721.
n data sets whose audit files are damaged or have been deleted
n SAS views
n complex indexes
n indexes that were deleted when using the FORCE option in the SORT procedure.
The FORCE option overwrites the original file.
For these situations, take the action recommended in the table that follows.
Situation                                   Action
DATA step views and PROC SQL views          Re-create the file.
Truncated SAS data sets, or SAS data sets   Recover from a backup device. (1)
whose audit file is damaged or deleted
(1) When an audit file is damaged or accidentally deleted, recover the data set, its
index, and the audit file.
n The disk where the data set (including the index file) or catalog is stored
becomes full before the file is completely written to it.
n An input/output error occurs while writing to the data set, index file, or catalog.
SAS repairs the structure of a data set by copying and re-creating the data set. A
data set’s indexes and integrity constraints are reconstructed from metadata in the
data set header. SAS attempts to restore the data in the data set, but in cases
where the disk is full before the data is written to it, the data set might not contain
the last several updates that occurred before the system failed. When this occurs,
it is recommended that you recover the data set from a backup device.
SAS repairs damaged catalogs as follows. SAS checks the catalog to see which
entries are damaged. If there is an error reading an entry, the entry is copied. If an
error occurs during the copy process, then the entry is automatically deleted. For
severe damage, the entire catalog is copied to a new catalog.
The DLDMGACTION=NOINDEX and REPAIR options should be used only when you
have seen an actual problem and are trying to rectify it. DLDMGACTION=REPAIR
repairs the data set and all of its indexes to the extent that it can. When
DLDMGACTION=REPAIR is set as a system option, damaged data sets and
catalogs are automatically repaired the next time that the data set or catalog is
accessed. This can prevent you from examining a data set for lost observations.
DLDMGACTION=REPAIR also does not distinguish between regular data sets and
generation data sets. Repairing a generation data set can be tricky. An automated
repair is not appropriate. For information about repairing generation data sets, see
“Generation Data Sets” in SAS V9 LIBNAME Engine: Reference.
A damage log is maintained for every SAS data set that indicates how many times
the data set has been repaired and the date of the last repair. You can view the
repair log with PROC CONTENTS. See “Example: Determine How Many Times a
Data Set Was Repaired” on page 730.
Task: Determine the DLDMGACTION= system option setting for your SAS session
Code: proc options option=dldmgaction value; run;
Description: Prints the setting of the DLDMGACTION= system option to the SAS log.
Example: “Example: Determine the Default Repair Action for Your SAS Session” on page 724.

Task: View a data set’s repair history
Code: proc contents data=filename; run;
Description: Lists information about a SAS data set. Check the “Number of Data Set
Repairs” and “Last Repaired” entries in the Engine/Host Dependent Information
section of the output.
Example: “Example: Determine How Many Times a Data Set Was Repaired” on page 730
Examples
Example Code
The DLDMGACTION= system option controls the default repair action for your SAS
session. You can determine the setting of the DLDMGACTION= system option by
using PROC OPTIONS.
proc options option=dldmgaction value;
run;
Key Ideas
See Also
n “OPTIONS Procedure”
n “Example: Repair a SAS Data Set with the REPAIR Statement” on page 726.
n “Example: Override the Default Repair Action with the DLDMGACTION= Data
Set Option” on page 727.
Example Code
The following code sets the DLDMGACTION= system option to PROMPT and
verifies the setting with PROC OPTIONS.
options dldmgaction=prompt;
proc options option=dldmgaction value;
run;
73 options dldmgaction=prompt;
74
75 proc options option=dldmgaction value;
76 run;
Key Ideas
n The OPTIONS statement changes the repair setting for all interactions in the
current SAS session. The system option can also be specified at SAS invocation, in
See Also
n “OPTIONS Statement” in SAS Global Statements: Reference
Example Code
The following code specifies to repair a damaged SAS data set named
mylib.myfile that is encrypted. The PROC DATASETS REPAIR statement is used
to repair the data set.
libname mylib "C:\dldmgactiontest";
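The REPAIR step itself is not shown above. A minimal sketch of what it might look like follows. The key value secretkey is hypothetical, and supplying ENCRYPTKEY= in parentheses after the member name is an assumption based on how data set options are written elsewhere in this chapter; check the REPAIR statement reference before relying on it.

```sas
libname mylib "C:\dldmgactiontest";

proc datasets library=mylib nolist;
   /* ENCRYPTKEY= placement is an assumption; secretkey is a hypothetical key value */
   repair myfile(encryptkey=secretkey);
run;
quit;
```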
After the process completes, the following message is written to the log:
Example Code 30.2 Log Message from the PROC DATASETS REPAIR Statement
Key Ideas
See Also
n “REPAIR Statement” in Base SAS Procedures Guide.
Example Code
The following code sets the DLDMGACTION= data set option for a single request on
the damaged data set mylib.myfile. For this example, the default action for the SAS
session is DLDMGACTION=REPAIR. The data set option overrides that default by
specifying NOINDEX for a PROC PRINT request on the data set.
proc print data=mylib.myfile(dldmgaction=noindex);
run;
Example Code 30.3 Log Message Returned for a Data Set Repaired with the
DLDMGACTION=NOINDEX Option
Key Ideas
n The DLDMGACTION= data set option must be specified on an existing data set.
The option is not valid in statements that create a data set, except for the SET
statement.
n The NOINDEX option specifies to repair a damaged data set without the indexes
and integrity constraints. The repair also deletes the index file and updates the
data set to reflect the disabled indexes and integrity constraints. If the data set is
not damaged, the data set option has no effect.
n Because this data set is damaged, a warning is written to the SAS log instructing
you to execute the PROC DATASETS REBUILD statement to correct or delete the
disabled indexes and integrity constraints. The data set can be opened only in
input mode until you make the change.
n When used in the SET statement, the DLDMGACTION= data set option simply
omits indexes from the new data set.
See Also
n “DLDMGACTION= Data Set Option” in SAS Data Set Options: Reference
n “Example: Rebuild a Damaged SAS Data Set with the REBUILD Statement” on
page 729
Example Code
The following code rebuilds data set indexes and integrity constraints that were
disabled by the DLDMGACTION=NOINDEX data set option. The request rebuilds
indexes and integrity constraints that were removed in “Example: Override the
Default Repair Action with the DLDMGACTION= Data Set Option” on page 727.
proc datasets library=mylib;
rebuild myfile;
run;
quit;
Key Ideas
n The PROC DATASETS REBUILD statement enables you to specify whether to restore
or delete the indexes and integrity constraints that were disabled with the
DLDMGACTION=NOINDEX option.
n The REBUILD statement restores disabled indexes and integrity constraints by
default (shown here). To delete the indexes and integrity constraints from the data
set, specify the NOINDEX option.
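The alternative described in the last key idea, deleting the disabled indexes and integrity constraints instead of restoring them, can be sketched as follows. It assumes the same mylib.myfile data set as the example above.

```sas
proc datasets library=mylib nolist;
   /* NOINDEX deletes the disabled indexes and integrity constraints */
   rebuild myfile / noindex;
run;
quit;
```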
See Also
n “PROC DATASETS REBUILD Statement”
n “Example: Determine How Many Times a Data Set Was Repaired” on page 730.
Example Code
To determine how many times a data set has been repaired, use the CONTENTS
procedure. This example shows how many times data set mylib.myfile has been
repaired.
proc contents data=mylib.myfile;
run;
Key Ideas
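n The CONTENTS procedure has two fields that report information about repairs:
“Number of Data Set Repairs” and “Last Repaired”. Both appear in the Engine/Host
Dependent Information section of the output.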
See Also
"CONTENTS Procedure"
31
Compressing SAS Data Sets
Compression in SAS
Compressing a SAS data set is a process that reduces the number of bytes required
to represent each observation. In a compressed data set, each observation is a
varying-length record, while in an uncompressed data set, each observation is a
fixed-length record.
n fewer I/O operations necessary to read from or write to the data during
processing
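As a brief illustration, compression can be requested with the COMPRESS= data set option when a data set is created. The data set name and variables below are hypothetical. For a file this small, SAS might report that compression increased the file size; the benefit typically appears with long character values that contain repeated characters or blanks.

```sas
/* hypothetical data set; COMPRESS=YES stores observations as varying-length records */
data work.survey_compressed(compress=yes);
   length comment $ 200;
   input id comment $;
   datalines;
1 good
2 fine
;

/* the Compressed attribute in the PROC CONTENTS output reports the setting */
proc contents data=work.survey_compressed;
run;
```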
32
Moving SAS Files
For details about this subject, see Moving and Accessing SAS Files.
33
Cross-Environment Data Access
CEDA supports files that were created with SAS 7 and later releases. CEDA is a
Base SAS feature.
data representation
is the form in which data is stored in a particular operating environment.
Different operating environments use different standards or conventions for
storing data. (See “Compatible Data Representations” on page 741.)
n Floating-point numbers can be represented in IEEE floating-point format or
IBM floating-point format.
n Data alignment can be on a 1-byte, 4-byte, or 8-byte boundary, depending on
data type requirements for the operating environment.
n Data type lengths can be 8 bits or more for a character data type, 16 bit, 32
bit, or 64 bit for an integer data type, 32 bit for a single-precision floating-
point data type, and 64 bit for a double-precision floating-point data type.
n The ordering of bytes can be big Endian or little Endian.
encoding
is a set of characters (letters, logograms, digits, punctuation, symbols, control
characters, and so on) that have been mapped to numeric values (called code
points) that can be used by computers. The code points are assigned to the
characters in the character set by applying an encoding method. Some examples
of encodings are Wlatin1 and Danish EBCDIC. (See “Encoding Combinations
That Do Not Need CEDA Processing for Transcoding” in SAS National Language
Support (NLS): Reference Guide.)
incompatible
describes a file that has a different data representation or encoding than the
current SAS session. CEDA enables access to many types of incompatible files.
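To see these two properties for a particular file, one option is PROC CONTENTS, whose attribute listing includes Data Representation and Encoding entries. The following sketch uses a hypothetical work.sample data set and also writes the session encoding to the log for comparison.

```sas
/* hypothetical data set created in the current session */
data work.sample;
   x=1;
run;

/* check the Data Representation and Encoding entries in the output */
proc contents data=work.sample;
run;

/* session encoding, for comparison with the file's encoding */
%put Session encoding: &sysencoding;
```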
Advantages of CEDA
CEDA offers these advantages:
n You can transparently process a supported SAS file with no knowledge of the
file's data representation or encoding.
n No transport files are created. CEDA requires a single translation to the current
session's data representation, rather than multiple translations from the source
representation to the transport file to the target representation.
n CEDA eliminates the need to perform multiple steps in order to process the file.
CEDA supports SAS 7 and later SAS files that are created in directory-based
operating environments like UNIX and Windows. CEDA provides the following SAS
file processing for these SAS engines:
BASE
shipped default Base SAS engine and alias for the V9 engine in SAS 9, V8 in SAS
8, and V7 in SAS 7. Referred to as the V9 engine in the CEDA topics.
SASESOCK
TCP/IP port engine for SAS/CONNECT software.
SPDE
SAS Scalable Performance Data Engine, with some exceptions. For more
information, see “Accessing SPD Engine Files on Another Host” in SAS Scalable
Performance Data Engine: Reference. (Support was added in SAS 9.4M5.)
TAPE
sequential engine and alias for V9TAPE in SAS 9 , V8TAPE in SAS 8, V7TAPE in
SAS 7.
1 For output processing that replaces an existing SAS data set, there are behavioral differences. For
more information, see “How Output Processing Affects Encoding and Data Representation” on page
743.
2 CEDA supports SAS 8 and later MDDB files.
n Indexes are not supported. Therefore, WHERE optimization with an index is not
supported.
n Extended attributes cannot be updated, but they can be read.
n Other files that are not supported include DATA step views, SAS/ACCESS views
that are not for SAS/ACCESS for Oracle or SAP, stored compiled DATA step
programs, item stores, DMDB files, FDB files, or any SAS file that was created
prior to SAS 7.
n On z/OS, members of UNIX file system libraries can be created using any SAS
data representation. However, when bound libraries are created, they are
assigned the data representation of the SAS session that creates the library.
SAS does not allow the creation of bound library members with a data
representation that differs (except for the encoding) from the data
representation of the library. For example, if you create a bound library with 31-
bit SAS on z/OS, the library has a data representation of MVS_32 for the
duration of its existence, and you cannot use the OUTREP option of the
LIBNAME statement to create a member in the library with a data
representation other than MVS_32. For more information about library
implementation types for V9 and sequential engines on z/OS, see SAS
Companion for z/OS.
n SAS translates between data representations or transcodes between encodings
as the data is read. When you use multiple procedures, SAS must translate or
transcode the data multiple times, which can affect system performance.
n If a data set is damaged, CEDA cannot process the file in order to repair it. CEDA
does not support update processing, which is required in order to repair a
damaged data set. To repair the file, you must move it back to the environment
where it was created or to a compatible environment that does not invoke
CEDA processing. For information about how to repair a damaged data set, see
the REPAIR statement in the DATASETS procedure in Base SAS Procedures
Guide.
n Transcoding could result in character data loss when encodings are
incompatible. See “Transcoding Considerations” in SAS National Language
Support (NLS): Reference Guide.
n Loss of precision can occur in numeric variables when you move data between
operating environments. If a numeric variable is defined with a short length, you
can try increasing the length of the variable. Full-size numeric variables are less
likely to encounter a loss of precision with CEDA. For more information, see
“Numeric Precision” on page 107.
n Numeric variables have a minimum length of either 2 or 3 bytes, depending on
the operating environment. In an operating environment that supports a
minimum of 3 bytes (such as Windows or UNIX), CEDA cannot process a
numeric variable that was created with a length of 2 bytes (for example, in
z/OS). If you encounter this restriction, then use the XPORT engine or the
CPORT and CIMPORT procedures instead of CEDA.
Note: If you encounter these restrictions because your files were created under a
previous version of SAS, consider using the MIGRATE procedure, which is
documented in the Base SAS Procedures Guide. PROC MIGRATE retains many
features, such as integrity constraints, indexes, and audit trails.
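For the last restriction in the list above, the XPORT engine alternative can be sketched as follows. The transport file path and the data set work.mydata are hypothetical. On the receiving host, a matching PROC COPY step with IN=trans and a target library as OUT= would read the file back.

```sas
/* hypothetical transport file path */
libname trans xport 'c:\examples\mydata.xpt';

/* hypothetical data set to transport */
data work.mydata;
   x=1;
run;

/* write WORK.MYDATA to the transport file */
proc copy in=work out=trans memtype=data;
   select mydata;
run;

libname trans clear;
```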
Data representations: ALPHA_TRU64 (Tru64 UNIX); LINUX_IA64 (Linux for Itanium-based
systems); LINUX_X86_64 (Linux for x64); SOLARIS_X86_64 (Solaris for x64);
LINUX_POWER_64 (Linux on the Power Architecture)
Compatibility notes: Although all of the environments in this group are compatible,
catalogs are an exception. Catalogs are compatible between Tru64 UNIX and Linux for
Itanium. Catalogs are compatible between Linux for x64, Solaris for x64, and Linux on
the Power Architecture. Linux on the Power Architecture is added in SAS Viya 3.5 and
is not supported in SAS 9.

Data representations: ALPHA_VMS_32 (OpenVMS Alpha); ALPHA_VMS_64 (OpenVMS Alpha);
VMS_IA64 (OpenVMS on HP Integrity)
Compatibility notes: Although the 32-bit and 64-bit OpenVMS environments have
different data representations for some compiler types, SAS data sets that are created
by the V9 engine do not store the data types that are different. Therefore, if the
encoding is compatible, CEDA is not used between these environments. However, note
that SAS 9 does not support SAS 8 catalogs from OpenVMS. You can migrate the catalogs
with the MIGRATE procedure. For more information, see the Base SAS Procedures Guide.

Data representations: HP_IA64 (HP-UX for the Itanium Processor Family Architecture);
HP_UX_64 (HP-UX for PA-RISC, 64-bit); RS_6000_AIX_64 (AIX); SOLARIS_64 (Solaris for
SPARC)
Compatibility notes: All of the environments in this group are compatible.

Data representations: HP_UX_32 (HP-UX for PA-RISC); MIPS_ABI (MIPS ABI);
RS_6000_AIX_32 (AIX); SOLARIS_32 (Solaris for SPARC)
Compatibility notes: All of the environments in this group are compatible.

Data representations: LINUX_32 (Linux for Intel architecture); INTEL_ABI (ABI for Intel
architecture)
Compatibility notes: All of the environments in this group are compatible.

Data representations: WINDOWS_32 (32-bit SAS on Microsoft Windows); WINDOWS_64
(64-bit SAS on Microsoft Windows, for both Itanium-based systems and x64)
Compatibility notes: SAS data sets are compatible in these Windows environments. Other
file types such as catalogs are not compatible between 32-bit and 64-bit SAS for
Windows.
Compatible Encodings
Compatible encodings do not require CEDA processing for transcoding. See
“Encoding Combinations That Do Not Need CEDA Processing for Transcoding” in
SAS National Language Support (NLS): Reference Guide.
However, even when encodings are compatible, CEDA processing could be invoked
by an incompatible data representation.
In addition, some Microsoft Word characters such as smart quotation marks could
cause truncation errors. The characters require more than one byte in UTF-8
encoding.
If the data contains 7-bit ASCII characters (U.S. English) only, then the data is
compatible in any other ASCII session, including UTF-8. Other ASCII encodings use
the high-order bit for different national characters.
NOCLONE option and the OUTREP= option. When you use PROC COPY
with SAS/SHARE or SAS/CONNECT, the default behavior is to use the data
representation of the client session (not the server session).
n The SPD Engine uses the data representation of the current SAS session.
The CLONE option of PROC COPY is not supported.
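The interaction of the NOCLONE option and the OUTREP= option mentioned above can be sketched as follows. The target path and the data set work.mytest are hypothetical. With NOCLONE, PROC COPY does not carry over the input file's data representation; the output takes the OUTREP= value of the target library, or the representation of the current session if OUTREP= is not set.

```sas
/* hypothetical target path; OUTREP= requests a specific data representation */
libname target v9 'c:\examples\linux' outrep=linux_x86_64;

/* hypothetical data set created in the current session */
data work.mytest;
   x=1;
run;

/* NOCLONE: use the target library's OUTREP= value for the copy */
proc copy in=work out=target noclone memtype=data;
   select mytest;
run;
```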
The XPORT engine converts data representations but does not perform
transcoding. Use the XPORT engine to transport data sets between single-byte
encodings only. The XPORT engine expects that the bytes in a character
variable are single-byte, and emits them to the transport file as is if on an ASCII
Examples: CEDA
Example Code
The following PRINT procedure generates a CEDA note because the SAS session
has a different data representation than that of the data set. The mytest data set
was created on Linux in “Example: Specify a Data Representation to Avoid CEDA”
on page 748.
libname myfiles v9 'c:\examples';
proc print data=myfiles.mytest;
run;
Here is the CEDA note in the SAS log. The note does not indicate an error.
Example Code 33.1 Log Output Showing CEDA Informational Note
The following SQL procedure attempts to update the mytest data set.
proc sql;
insert into myfiles.mytest
values ('other data string');
quit;
As shown by the following error message, CEDA does not support update
processing. The update fails.
746 Chapter 33 / Cross-Environment Data Access
Key Ideas
• CEDA processing is transparent and automatic, but most users want to know when
CEDA processing occurs. CEDA has several restrictions. For example, update
processing is not supported.
• Compatible encodings do not require CEDA processing for transcoding.
When SAS writes a CEDA note to the log, this note is informational. The note does
not indicate an error.
However, transcoding could result in character data loss when encodings are
incompatible. For example, a code point in one encoding could represent a
different character in another encoding. Therefore, always check your output when
you process a data set under a different encoding.
See Also
• “Restrictions for CEDA” on page 740
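Because CEDA does not support update processing, one common workaround is to write a copy of the data set in the session's native data representation and then update the copy. The following is a minimal sketch, assuming the myfiles.mytest data set from the example above. The output name work.mytest_native is hypothetical.

```sas
/* Copy the foreign-format data set into the Work library.      */
/* A data set that is created without OUTREP= or ENCODING=      */
/* takes the data representation and encoding of the session.   */
data work.mytest_native;
   set myfiles.mytest;
run;

/* Update processing is now directed at the native copy. */
proc sql;
   insert into work.mytest_native
      values ('other data string');
quit;
```

The original data set is left unchanged. If the updated data must replace it, the copy has to be written back as a separate step.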
Example Code
This example creates the mytrunctest data set in a double-byte character set
(DBCS) session of SAS. The session encoding is SHIFT-JIS. The length of the a
variable is 22 bytes.
libname myfiles v9 'c:\examples';
data myfiles.mytrunctest;
a='サンプルテキスト文字列';
run;
proc print data=myfiles.mytrunctest;
run;
A Unicode SAS session is started. The session encoding is UTF-8. In this session,
the PROC PRINT output is truncated.
libname myfiles v9 'c:\examples';
proc print data=myfiles.mytrunctest;
run;
WARNING: Some character data was lost during transcoding in the dataset
MYFILES.MYTRUNCTEST. Either the data contains characters that are not
representable in the new encoding or truncation occurred during transcoding.
The truncation occurs because the a variable requires more bytes in UTF-8
encoding than in SHIFT-JIS encoding. To prevent truncation, use the CVP engine to
expand the variable length. See “Example: Avoid Truncation When Copying a SAS
Library” in SAS V9 LIBNAME Engine: Reference and “Example: Avoid Truncation
When Migrating a SAS Library by Using a Two-Step Process” in SAS V9 LIBNAME
Engine: Reference.
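The CVP approach can be sketched as follows. This is a minimal illustration, assuming that the default expansion applied by the CVP engine is large enough for the data. If it is not, the CVPMULTIPLIER= LIBNAME option can request a larger expansion. The CVP engine is read-only, so this libref is suitable for reading or copying the data, not for updating it.

```sas
/* Assign the library through the CVP engine. The engine        */
/* expands character variable lengths when the data is read,    */
/* which leaves room for values that grow during transcoding.   */
libname myfiles cvp 'c:\examples';

proc print data=myfiles.mytrunctest;
run;
```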
Key Ideas
See Also
• “Restrictions for CEDA” on page 740
Example Code
In this example, the user is running 64-bit SAS on Microsoft Windows, which has a
data representation of WINDOWS_64. The following DATA step uses the OUTREP=
data set option to create a data set that has a data representation of LINUX_X86_64.
libname myfiles v9 'c:\examples';
data myfiles.mytest (outrep=linux_x86_64);
a='sample data string';
run;
proc contents data=myfiles.mytest;
run;
The SAS log displays the following message when the data set is created. The
message indicates that CEDA is invoked to create the data set. However, CEDA is
not invoked when the mytest data set is accessed in a SAS session on 64-bit Linux.
(CEDA is invoked if the encoding is not compatible.)
The CEDA message is written in the log again when the CONTENTS procedure
runs. Below is the output from PROC CONTENTS, showing LINUX_X86_64 as the
data representation.
Also, notice that SAS assigns the encoding latin1 Western (ISO). Without an
OUTREP= specification, the user’s session would assign wlatin1 Western
(Windows).
Key Ideas
See Also
• “OUTREP= Data Set Option” in SAS Data Set Options: Reference
Example Code
In this example, the user is running SAS on Linux for x64. The session encoding is
latin1 Western (ISO). The following DATA step uses the ENCODING= data set
option to create a data set that has a UTF-8 encoding.
libname mylnx v9 '/mydata';
data mylnx.mytest (encoding=utf8);
a='sample data string';
run;
proc contents data=mylnx.mytest;
run;
The SAS log displays the following message when the data set is created. The
message indicates that CEDA is invoked to create the data set. However, CEDA is
not invoked when the mytest data set is accessed in a UTF-8 session encoding.
(CEDA is invoked if the operating environment is not compatible.)
The CEDA message is written in the log again when the CONTENTS procedure
runs. Below is the output from PROC CONTENTS, showing utf-8 Unicode
(UTF-8) as the encoding.
Key Ideas
See Also
• “ENCODING= Data Set Option” in SAS National Language Support (NLS):
Reference Guide
• “INENCODING=, OUTENCODING= LIBNAME Statement Options” in SAS V9
LIBNAME Engine: Reference
• “Encoding for NLS” in SAS National Language Support (NLS): Reference Guide
Example Code
This example shows the correct way to specify a nondefault encoding such as
UTF-8 when you use the OUTREP= data set option or LIBNAME statement option.
When you specify the OUTREP= option, SAS ignores your session encoding and
assigns a default encoding. If you do not want the default encoding, you must
specify an encoding option.
In this example, the user is running SAS for Windows, and they want to create a
data set for a target session that is on Linux for x64. The user’s encoding and the
target encoding are both set to UTF-8 in their SAS configuration files. As noted
above, however, the user’s UTF-8 session encoding is ignored when they specify the
OUTREP= option. Therefore, the ENCODING=UTF8 option is needed below.
The following DATA step specifies both data representation and encoding for the
target environment.
libname myfiles 'C:\examples';
data myfiles.test (outrep=linux_x86_64 encoding=utf8);
x=1;
run;
proc contents data=myfiles.test;
run;
The SAS log displays the CEDA message when the data set is created and when the
CONTENTS procedure runs:
The PROC CONTENTS output shows that the data representation and encoding are
correctly assigned. Now the user can transport the data set to the target
environment, where it will not invoke CEDA.
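The same result can be sketched with LIBNAME statement options, so that every data set written through the libref gets the target data representation and encoding. This is an illustration only: the libref target and the data set name test2 are hypothetical, and it assumes that the OUTREP= and OUTENCODING= LIBNAME options are acceptable for your library.

```sas
/* Set the data representation and encoding once, on the libref. */
libname target 'C:\examples' outrep=linux_x86_64 outencoding='utf-8';

data target.test2;
   x=1;
run;

/* Check the Data Representation and Encoding fields. */
proc contents data=target.test2;
run;
```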
Key Ideas
• SAS has a default session encoding that is based on the following two values:
See Also
• “Default Values for DFLANG, DATESTYLE, and PAPERSIZE System Options
Based on the LOCALE= System Option” in SAS National Language Support
(NLS): Reference Guide
• “ENCODING System Option: UNIX, Windows, and z/OS” in SAS National
Language Support (NLS): Reference Guide
• “ENCODING= Data Set Option” in SAS National Language Support (NLS):
Reference Guide
• “INENCODING=, OUTENCODING= LIBNAME Statement Options” in SAS V9
LIBNAME Engine: Reference
• “Compatibility and Migration” in SAS V9 LIBNAME Engine: Reference
Example Code
The following statements return the session encoding and locale:
%put %sysfunc(getoption(encoding));
%put %sysfunc(getoption(locale));
Here is an example of the information that is written in the SAS log. The session
encoding is wlatin1, and the locale is en_us.
1 %put %sysfunc(getoption(encoding));
WLATIN1
2 %put %sysfunc(getoption(locale));
EN_US
The OPTIONS procedure is another way to check the session encoding and locale:
proc options option=(encoding locale) define value;
run;
PROC OPTIONS returns detailed information about the option settings. Many lines
in the example output below are omitted to highlight the relevant lines.
To determine the data representation for your session, create a temporary data set
and submit the CONTENTS procedure or the CONTENTS statement of the
DATASETS procedure.
data sessiontest;
x=1;
run;
proc contents data=sessiontest;
run;
The sessiontest data set is created in the temporary Work library. Below is a
portion of the PROC CONTENTS output. The OUTREP= option was not specified
when the data set was created, so sessiontest has the data representation of the
session. The output also shows the session encoding.
Output 33.5 Portion of PROC CONTENTS Output to Learn the Session Encoding
and Data Representation
Key Ideas
• When you share data among different environments, you might need to check the
encoding and data representation of the SAS session. Knowing the locale can also
be useful.
• Use PROC OPTIONS or the GETOPTION function to query the session encoding.
• Use PROC CONTENTS (or the CONTENTS statement of PROC DATASETS) to see
the data representation and encoding of a data set.
• You cannot use PROC OPTIONS or the GETOPTION function to query the
session’s data representation value, because the OUTREP= option is not available
as a system option. Instead, create a temporary data set, without using OUTREP=
or ENCODING= options, so that the temporary data set has the session’s data
representation and encoding. Submit PROC CONTENTS (or the CONTENTS
statement of the DATASETS procedure). For an example that uses PROC SQL, see
Sample 55054.
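A PROC SQL query along these lines might look like the sketch below. It is not taken from Sample 55054. It assumes that the DICTIONARY.TABLES view in your release includes the DATAREPNAME and ENCODING columns, which you can confirm by submitting DESCRIBE TABLE DICTIONARY.TABLES in PROC SQL.

```sas
/* Create a temporary data set that inherits the session's      */
/* data representation and encoding.                            */
data work.sessiontest;
   x=1;
run;

/* Query the metadata for that data set. The column names       */
/* DATAREPNAME and ENCODING are assumptions to verify.          */
proc sql;
   select memname, datarepname, encoding
      from dictionary.tables
      where libname='WORK' and memname='SESSIONTEST';
quit;
```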
See Also
• “GETOPTION Function” in SAS System Options: Reference
34
Managing SAS Catalogs
PART 7
Chapter 35 / The SAS Registry
Chapter 36 / The SAS Windowing Environment
Chapter 37 / Cloud Analytic Services
Chapter 38 / Industry Protocols Used in SAS
35
The SAS Registry
This configuration data is stored in a hierarchical form. The form works in a manner
similar to how directory-based file structures work under the operating
environments in UNIX and Windows, and under the z/OS UNIX System Services
(USS).
CAUTION
If you make a mistake when you edit the registry, your system might become
unstable or unusable.
Wherever possible, use the administrative tools, such as the New Library window,
the PRTDEF procedure, Universal Print windows, and the Explorer Options window,
to make configuration changes, rather than editing the registry directly. Using the
administrative tools ensures that values are stored properly in the registry when
you change the configuration.
CAUTION
If you use the Registry Editor to change values, you are not warned if any
entry is incorrect. Incorrect entries can cause errors, and can even prevent you from
starting a SAS session.
shortcuts or libraries that are assigned at start-up, and any other configuration
defaults for your site.
• The Sasuser library registry file contains the user defaults. When you change
your configuration information through a specialized window such as the Print
Setup window or the Explorer Options window, the settings are stored in the
Sasuser library.
This method prints the registry to the SAS log, and it produces a large list that
contains all registry entries, including subkeys. Because of the large size, it
might take a few minutes to display the registry using this method.
For more information about how to view the SAS registry, see “REGISTRY
Procedure” in Base SAS Procedures Guide.
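As a brief sketch of viewing the registry from a program, the LIST and LISTUSER options of PROC REGISTRY write registry entries to the SAS log. The second step limits the listing to the Sasuser portion, which is usually much shorter.

```sas
/* Write the contents of the entire registry to the SAS log. */
proc registry list;
run;

/* Write only the Sasuser portion of the registry to the log. */
proc registry listuser;
run;
```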
The key can be a place holder without values or subkeys associated with it, or it
can have many subkeys with associated values. Subkeys are delimited with a
backslash (\). The length of a single key name or a sequence of key names
cannot exceed 255 characters (including the square brackets and the
backslash). Key names can contain any character except the backslash and are
not case sensitive.
The SAS Registry contains only one top-level key, called SAS_REGISTRY. All the
keys under SAS_REGISTRY are subkeys.
subkey
A key inside another key. Subkeys are delimited with a backslash (\). Subkey
names are not case sensitive. The following key contains one root key and two
subkeys: [SAS_REGISTRY\HKEY_USER_ROOT\CORE]
SAS_REGISTRY
is the root key.
HKEY_USER_ROOT
is a subkey of SAS_REGISTRY. In the SAS registry, there is one other subkey
at this level: HKEY_SYSTEM_ROOT.
CORE
is a subkey of HKEY_USER_ROOT, containing many default attributes for
printers, windowing, and so on.
link
a value whose contents reference a key. Links are designed for internal SAS use
only. These values always begin with the word “link:”.
value
the names and content associated with a key or subkey. There are two
components to a value, the value name and the value content, also known as a
value datum.
Figure 35.1 Section of the Registry Editor Showing Value Names and Value Data for
the Subkey 'HTML'
.SASXREG file
a text file with the file extension .SASXREG that contains the text
representation of the actual binary SAS Registry file.
Managing the SAS Registry 765
CAUTION
If you use the Registry Editor to change values, you are not warned if any
entry is incorrect. Incorrect entries can cause errors, and can even prevent you from
starting a SAS session.
• modifying printer settings from the default printer settings that your system
administrator provides for you
• changing localization settings
1. The Sashelp part of the registry contains settings that are common to all users at your site. Sashelp is write-protected and
can be updated only by a system administrator.
• If you delete the registry file called regstry.sas7bitm, which is located in the
Sasuser library, then SAS restores the Sasuser registry to its default settings.
CAUTION
Do not delete the registry file that is located in Sashelp. Deleting it prevents SAS
from starting.
1 Start SAS Explorer with the EXPLORER command, or select View → Explorer.
5 Click Unhide.
If there is no icon associated with ITEMSTOR in the Type list, then you are
prompted to select an icon.
8 Select Copy from the pop-up menu and copy the Regstry file. SAS assigns the
name Regstry_copy to the file.
Operating Environment Information: You can also use a copy command from
your operating environment to make a copy of your registry file for backup
purposes. When viewed from outside SAS Explorer, the filename is
regstry.sas7bitm. Under z/OS, you cannot use the environment copy
command to copy your registry file unless your Sasuser library is assigned to an
HFS directory.
2 Select the top-level key in the left pane of the registry window.
4 Enter a name for your registry backup file in the filename field. (SAS applies the
proper file extension name for your operating system.)
5 Click Save.
This saves the registry backup file in Sasuser. You can control the location of your
registry backup file by specifying a different location in the Save As window.
To install the registry backup file that was created using SAS Explorer or an
operating system copy command:
2 Rename your backup file to regstry.sas7bitm, which is the name of your registry
file.
3 Copy your renamed registry file to the Sasuser location where your previous
registry file was located.
4 Click Open.
5 Restart SAS.
1 Open the Program editor and submit the following program to import the
registry file that you created previously.
proc registry import=<registry file specification>;
run;
2 If the file is not already properly named, then use Explorer to rename the
registry file to regstry.sas7bitm:
3 Restart SAS.
1 Rename the damaged registry file to something other than “registry” (for
example, temp).
5 Start the Registry Editor with the REGEDIT command. Select Solutions →
Accessories → Registry Editor → View All.
7 Close your SAS session and rename the modified registry back to the original
name.
8 Open a new SAS session to see whether the changes fixed the problem.
The registry values for color are located in the COLORNAMES\HTML subkey.
4 Enter the color name in the Value Name field and the RGB value in the Value
Data field.
5 Click OK.
The easiest way is to first write the color values to a file in the layout that the
REGISTRY procedure expects. Then you import the file by using the REGISTRY
procedure. In this example, Spanish color names are added to the registry.
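filename mycolors temp;
data _null_;
/* Write to the MYCOLORS fileref. A quoted name would create a file named "mycolors". */
file mycolors;
put "[colornames\html]";
put ' "rojo"=hex:ff,00,00';
put ' "verde"=hex:00,ff,00';
put ' "azul"=hex:00,00,ff';
put ' "blanco"=hex:ff,ff,ff';
put ' "negro"=hex:00,00,00';
put ' "anaranjado"=hex:ff,a5,00';
run;
/* Import the color definitions from the fileref into the registry. */
proc registry import=mycolors;
run;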
After you add these colors to the registry, you can use these color names anywhere
that you use the color names supplied by SAS. For example, you could use the color
name in the GOPTIONS statement as shown in the following code:
goptions cback=anaranjado;
proc gtestit;
run;
Many of the windows in the SAS windowing environment update the registry for
you when you make changes to such items as your printer setting or your color
preferences. Because these windows update the registry using the correct syntax
and semantics, it is often best to use these alternatives when making adjustments
to SAS.
2 Enter all or part of the text string that you want to find, and click Options to
specify whether you want to find a key name, a value name, or data.
3 Click Find.
CAUTION
Before modifying registry values, always back up the regstry.sas7bitm file
from Sasuser.
1 In the left pane of the Registry Editor window, click the key that you want to
change. The values contained in the key appear in the right pane.
Figure 35.3 Example Window for Changing a Value in the SAS Registry
2 From the pop-up menu, select the New menu item with the type that you want
to create.
3 Enter the values for the new key or value in the window that is displayed.
Figure 35.4 Registry Editor with Pop-up Menu for Adding New Keys and Values
2 Select Rename from the pop-up menu and enter the new name.
3 Click OK.
1 Select Tools → Options → Registry Editor. This opens the Select Registry
View group box.
2 Select View All to display the Sasuser and Sashelp items separately in the
Registry Editor's left pane.
• The Sashelp portion of the registry is listed under the HKEY_SYSTEM_ROOT
folder in the left pane.
• The Sasuser portion of the registry is listed under the HKEY_USER_ROOT
folder in the left pane.
Note: To create the backup registry file, you can use the REGISTRY
procedure or the Export Registry File menu choice in the Registry Editor.
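A backup made with the REGISTRY procedure might look like the following sketch. The path and filename are hypothetical. The EXPORT= option writes the Sasuser portion of the registry to a text file, and the IMPORT= option can later read that file back.

```sas
/* Export the Sasuser portion of the registry to a text file. */
/* The path below is a placeholder for your own location.     */
proc registry export='C:\backup\sasuser_registry.sasxreg';
run;
```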
1 In the left pane of the Registry Editor, select the key that you want to export to
a SASXREG file.
To export the entire registry, select the top key.
4 Click Save.
data is stored in the registry under CORE\Explorer. The following table outlines the
location of the most commonly used Explorer configuration data.
Table 35.1 Registry Locations for Commonly Used Explorer Configuration Data
When you do this, they are stored in the SAS registry, where it is possible to modify
or delete them, as follows:
However, it is best to delete your library reference by using the SAS Explorer.
This removes this key automatically when you delete the file shortcut.
Note: You need special permission to write to the Sashelp part of the SAS
registry.
4 Click OK.
5 Verify that the file shortcut was created successfully and enter the REGEDIT
command.
8 Edit the exported file and replace all instances of HKEY_USER_ROOT with
HKEY_SYSTEM_ROOT.
Note: You need special permission to write to the Sashelp part of the SAS
registry.
5 Click OK.
6 Issue the REGEDIT command after verifying that the library was created
successfully.
12 Right-click the file and select Edit in NOTEPAD to edit the file.
13 Edit the exported file and replace all instances of “HKEY_USER_ROOT” with
“HKEY_SYSTEM_ROOT”.
14 To apply your changes to the site's Sashelp, use PROC REGISTRY. The
following code imports the file:
proc registry import="yourfile.sasxreg" usesashelp;
run;
If any permanent libref that is stored in the SAS Registry fails at startup, then the
following note appears in the SAS Log:
NOTE: One or more library startup assignments were not restored.
• Required field values for libref assignment in the SAS Registry are invalid. For
example, library names are limited to eight characters, and engine values must
match actual engine names.
• Encrypted password data for a libref has changed in the SAS Registry.
Note: You can also use the New Library window to add librefs. You can open this
window by typing DMLIBASSIGN in the toolbar, or selecting File → New from the
Explorer window.
CAUTION
You can correct many libref assignment errors in the SAS Registry Editor. If
you are unfamiliar with librefs or the SAS Registry Editor, then ask for technical support.
Errors can be made easily in the SAS Registry Editor, and they can prevent your
libraries from being assigned at startup.
or
CORE\OPTIONS\LIBNAMES\CONCATENATED
Note: These corrections are possible only for permanent librefs, that is, those that
are created at startup by using the New Library or File Shortcut Assignment
window.
For example, if you determine that a key for a permanent, concatenated library has
been renamed to something other than a positive whole number, then you can
rename that key again so that it is in compliance. Select the key, and then select
Rename from the pop-up menu to begin the process.
36
The SAS Windowing
Environment
The SAS windowing environment contains the windows that you use to create SAS
programs. However, you also find other windows that enable you to manipulate
data or change your SAS settings without writing a single line of code.
You might find the SAS windowing environment a convenient alternative to writing
a SAS program when you want to work with a SAS data set, or control some aspect
of your SAS session.
features of the SAS windowing environment, including toolbars, icons, menus, and
so on.
The five main windows in the SAS windowing environment are the Explorer,
Results, Enhanced Editor, Log, and Output windows.
When you first invoke SAS, the Enhanced Editor, Log, Output, and Explorer
windows are displayed. When you execute a SAS program, the default output
(HTML) is displayed in the Results window. If you use a PUT statement in your
program, then output is written to the SAS Log by default.
Note: The Microsoft Windows operating environment was used to create the
examples in this section. Menus and toolbars in other operating environments have
a similar appearance and behavior.
Windows Specifics: If you are using Microsoft Windows, the active window
determines which items are available on the main menu bar.
The following display shows one example of the arrangement of SAS windows. The
Explorer window shows active libraries.
Note: You can resize the Explorer window by dragging an edge or a corner of the
window. You can resize the left and right panes of the Explorer window by clicking
the split bar between the two panes and dragging it to the right or left.
Main Windows in the SAS Windowing Environment 785
Note: To open your SAS programs in the SAS windowing environment, you can
drag and drop them onto the Enhanced Editor window.
Program Editor window, follow the same steps for opening the Enhanced Editor
window, except select View → Program Editor from the main menu. Alternatively,
you can enter PROGRAM or PGM in the command line and press Enter.
Log Window
Note: To keep the lines of your log from wrapping when your window is maximized,
use the LINESIZE= system option.
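As a small sketch, the LINESIZE= system option can be set in an OPTIONS statement. The value 132 is only an example. Choose a value that suits the width of your window.

```sas
/* Set the line size for the SAS log and procedure output. */
options linesize=132;

/* Confirm the current setting in the log. */
proc options option=linesize;
run;
```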
Results Window
Menu:
Select View → Results.
Output Window
Note: To keep the lines of your output from wrapping when your window is
maximized, use the LINESIZE= system option.
Menu:
Select View → Output.
The following example shows a program that produces LISTING output. There is an
ODS statement before the DATA statement and after the RUN statement:
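A program of that shape can be sketched as follows. The data set work.scores and its values are hypothetical and serve only to give the PRINT procedure something to display.

```sas
/* Open the LISTING destination before the DATA step. */
ods listing;

data work.scores;
   input name $ score;
   datalines;
Ann 90
Bob 85
;
run;

proc print data=work.scores;
run;

/* Close the LISTING destination after the last RUN statement. */
ods listing close;
```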
The following is an example of the Preferences dialog box, with the Results tab
selected:
Several default values are selected in the Results tab. Under HTML, Create HTML
is the default output type, and HTMLBlue is the default output style. Use ODS
Graphics is also selected by default. When the Use ODS Graphics box is checked,
you are able to automatically generate graphs when running procedures that
support ODS graphics. Checking or unchecking this box enables you to turn on or
turn off ODS graphics when you invoke SAS.
To produce LISTING output, check the Create listing box under Listing. If you
deselect Create HTML and leave the Create listing box checked, your program
produces listing output only.
Menus in SAS
Menus contain lists of options that you can select.
The following example shows the menu options that are available when you select
Help from the menu bar:
Navigating in the SAS Windowing Environment 793
Menu choices change as you change the windows that you are using. For example, if
you select Explorer from the View menu, and then select View again, the menu
lists the View options that are available when the Explorer window is active.
The following display shows the View menu when the Explorer window is active:
If you select Program Editor from the View menu, and then select View again, the
menu lists the View options that are available when the Program Editor window is
active.
The following display shows the View menu when the Program Editor window is
active:
Figure 36.10 View Options When the Program Editor Window Is Active
You can also access menus when you right-click an item. For example, when you
select View → Explorer and then right-click Libraries in the Explorer window, the
following menu appears:
The menu remains visible until you make a selection from the menu or until you
click an area outside of the menu area.
Toolbars in SAS
A toolbar displays a block of window buttons or icons. When you click items in the
toolbar, a function or an action is started. For example, clicking a picture of a printer
in a toolbar starts a print process. The toolbar displays icons for many of the
actions that you perform most often in a particular window.
z/OS Specifics: SAS in the z/OS operating environment does not have a toolbar.
See SAS Companion for z/OS for more information.
The toolbar that you see depends on which window is active. For example, when
the Program Editor window is active, the following toolbar is displayed:
Figure 36.12 Example of the SAS Toolbar When the Enhanced Editor Window Is
Active
When you position your cursor at one of the items in the toolbar, a text window
appears that identifies the purpose of the icon.
Getting Help in SAS 797
for the item that you entered. The following window is displayed when you enter
Help footnote in the command line of a SAS session:
Figure 36.15 Results of Using Help in the Command Line of a SAS Session
Related items are displayed, along with the documents that contain the
information. Click a topic to view Help for that item.
• Send feedback.
About SAS®9
provides version and release information about SAS.
Note: You can use the DM statement in SAS to submit window commands as SAS
statements when running SAS in SAS Display Manager (DM).
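For instance, the following sketch uses the DM statement to clear the Log and Output windows from within a program. The window commands are enclosed in quotation marks and separated by semicolons.

```sas
/* Activate the Log window and clear it, then do the same */
/* for the Output window.                                 */
dm 'log; clear; output; clear;';
```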
The following table lists all portable SAS windows, window descriptions, and the
commands that open the windows.
SAS/ACCESS ACCESS
Note: Some additional SAS windows that are specific to your operating
environment might also be available. For more information, see the SAS
documentation for your operating environment.
If you are not familiar with SAS or with writing code in the SAS language, then you
might find the windowing environment helpful. With the windowing environment,
you can open a data set and point to rows and columns in your data. Then, you can
click menu items to reorganize and perform analyses on the information.
For more information about the SAS windowing environment, select SAS Help and
Documentation from the Help menu after you invoke a SAS session.
Work
is a library that is created by SAS at the beginning of each SAS session or SAS
job. Unless you have specified a User library, any newly created SAS file with a
one-level name is placed in the Work library by default. The newly created file is
deleted at the end of the current session or job.
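The default placement can be illustrated with a short sketch. It assumes that no User library has been specified. The data set name scores is hypothetical.

```sas
/* A one-level name is stored in the Work library, so this */
/* step creates WORK.SCORES.                               */
data scores;
   x=1;
run;

/* The two-level name refers to the same data set. */
proc print data=work.scores;
run;
```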
• To view small icons, select Small Icons from the View menu.
• To view data sets in a list, select List from the View menu.
The following example uses large icons to show the contents of Sashelp:
If you select the Sashelp library and then select View → Details from the menu bar,
the contents of the Sashelp library are displayed, along with the size and type of the
data sets:
If you double-click a table in this list, the data set opens. The VIEWTABLE window,
which is a SAS table viewer and editor, appears and is populated with the data from
the table.
2 In the File Shortcut Assignment window that appears, enter the name of the
fileref that you want to use in the Name field.
3 Enter the full pathname for the file in the File field.
The following display shows the File Shortcut Assignment window:
Managing Data with SAS Explorer 807
By default, filerefs that you create are temporary and can be used in the current
SAS session only. Selecting Enable at Start-up from the File Shortcut Assignment
window, however, assigns the fileref to the file whenever you start a new SAS
session.
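In a program, a temporary fileref of this kind is typically assigned with the FILENAME statement. The sketch below is an illustration only. The fileref mydata and the path are hypothetical.

```sas
/* Assign the fileref MYDATA to an external file for the */
/* current SAS session.                                  */
filename mydata 'C:\examples\rawdata.txt';

/* List the filerefs that are currently assigned in the log. */
filename _all_ list;
```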
3 Select Rename from the menu, and enter the new name of the data set.
4 Click OK.
3 From the menu that is displayed, choose Copy to copy a data set to another
library or catalog, or choose Duplicate to copy the data set to the same library
or catalog.
a Click the library in the left pane of SAS Explorer to select the library or
catalog into which the data set will be copied.
b In the right pane, right-click the mouse and select Paste from the menu that
appears.
A copy of the data set now resides in the new directory.
5 If you choose Duplicate, then the Duplicate window appears. In the Duplicate
window, SAS appends _copy to the data set name (for example,
data-set-name_copy).
Do one of the following:
• Keep the name and click OK.
• Create another name for your duplicated data set and click OK.
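The same two operations can be sketched in code. This is not what SAS Explorer runs internally. It is a programmatic equivalent, assuming a hypothetical data set work.mydata and an already assigned libref named mylib.

```sas
/* Copy one data set from Work to another library. */
proc copy in=work out=mylib memtype=data;
   select mydata;
run;

/* Duplicate a data set within the same library under a */
/* new name.                                            */
data work.mydata_copy;
   set work.mydata;
run;
```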
4 In the Description field of the General tab, you can enter a description of the
data set. To save the description, click OK.
5 Select other tabs to display additional information about the data set.
Overview of VIEWTABLE
To manipulate data interactively, you can use the SAS table editor, VIEWTABLE. In
the VIEWTABLE window, you can create a new table, and view or edit an existing
table.
Here are the steps for using the SAS Explorer window to open a SAS data set in a
VIEWTABLE window:
1 Open SAS Explorer and double-click on the icon for the library that contains the
target data set.
3 The VIEWTABLE window should appear, populated with data from the data set.
Working with VIEWTABLE 811
4 Use the scroll bar on the VIEWTABLE window to view all of the data.
1 Specify the VIEWTABLE command in the SAS Display Manager command line
using the following syntax:
2 Here is an example:
viewtable cars
• Using the VIEWTABLE pop-up menu to change the way table headers are
displayed:
1 Open a data set in VIEWTABLE (to access the VIEWTABLE pop-up menu,
you must have an active VIEWTABLE window open).
3 Select View → Column Names or View → Column Labels from the drop-down
View menu.
4 Once this selection is made, the opened table, and all tables that are
subsequently opened, will display table headers based on this setting in the
VIEWTABLE pop-up menu. When you exit VIEWTABLE, or exit SAS, the
preference for column labels or column names is saved. When you open
VIEWTABLE or invoke SAS again, the preference that you chose is
automatically selected.
This feature is available in SAS 9.4M1 and later releases.
• Using the VIEWTABLE command to change the way table headers are displayed
when a table is opened:
2 Here is an example:
viewtable cars colheading=names
opened from the SAS Explorer window. To do this, add the COLHEADING= option
to the Action Command in the SAS Explorer Options dialog box.
1 With the SAS Explorer window active, select Tools ⇨ Options ⇨ Explorer to
open the Explorer Options window.
3 Select Table in the list of registered types, and then click Edit to open the
TABLE Options dialog box.
4 Select the &Open Action Command in the list of actions, and then click Edit to
open the Edit Action dialog box.
5 In the Edit Action dialog box, add COLHEADING=<value> to the end of the
VIEWTABLE command:
VIEWTABLE %8b.'%s'.DATA colheading=names
6 When you are finished making changes, click OK three times to exit all of the
open dialog boxes. From this point on, when you use the SAS Explorer Window
to open the VIEWTABLE window, SAS displays the table headers according to
what you specified in this SAS Explorer dialog box.
Note: These steps only affect how tables are displayed when they are opened from
the SAS Explorer Window (either by double-clicking on the icon or by right-clicking
on the icon and selecting "Open"). They do not affect how tables are opened when
you use the VIEWTABLE command to open a table.
If you open a table using the VIEWTABLE colheading=<value> command, SAS will
display the column headings according to the COLHEADING value, regardless of
how column headings are set in the VIEWTABLE pop-up menu. The setting in the
VIEWTABLE pop-up menu will reflect the COLHEADING= value. In other words,
COLHEADING= overrides the setting specified in the VIEWTABLE pop-up menu.
For information about the LABEL statement in SAS, see “LABEL Statement” in SAS
DATA Step Statements: Reference.
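As a minimal sketch, the VIEWTABLE command can also be issued from a program by using the DM statement, which submits a windowing environment command. The example assumes an interactive SAS windowing session and uses the Sashelp.Cars table as a stand-in for your own table.

```sas
/* Assumption: the program runs in the SAS windowing environment. */

/* Open Sashelp.Cars in VIEWTABLE with column names as the headings. */
dm 'viewtable sashelp.cars colheading=names';

/* Open the same table with column labels as the headings instead. */
dm 'viewtable sashelp.cars colheading=labels';
```

Because the COLHEADING= value is given on the command, it takes precedence over the setting in the VIEWTABLE pop-up menu, as described above.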
1 Select Tools ⇨ Options ⇨ Keys from the SAS menu. The Keys window will
appear.
2 In the Keys window, select the F-Key that you want to assign to the
VIEWTABLE command and place the cursor in the Definition field of the
selected F-Key.
3 Type the VIEWTABLE command with the desired option. Here is an example:
VIEWTABLE %8b.'%s'.DATA colheading=names
For more information about using VIEWTABLE, see Doing More with the SAS®
Display Manager: From Editor to ViewTable - Options and Tools You Should Know
(PDF).
1 Right-click the heading for the column that you want to change, and then select
Column Attributes from the menu.
2 In the Label field of the Column Attributes window, enter the new name of the
column heading and then click Apply.
In this example, the Name heading is replaced by the Name of Player label.
When you press Apply, the column heading in VIEWTABLE changes to the new
name.
In this example, the label was changed to Name of Player.
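A programmatic counterpart to changing a column label is sketched below. It assumes that the table in this example is a copy of Sashelp.Baseball stored in the Work library (the table name is an assumption, not stated in the steps above) and uses PROC DATASETS to store the new label in the data set.

```sas
/* Assumption: work on a copy so that the Sashelp table is not modified. */
data work.baseball;
  set sashelp.baseball;
run;

/* Change the label of the Name column to "Name of Player". */
proc datasets library=work nolist;
  modify baseball;
  label name = 'Name of Player';
quit;
```

When the modified table is opened in VIEWTABLE with column labels displayed, the new label appears as the column heading.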
1 Click a column heading for the column that you want to move.
1 Right-click the heading of the column on which you want to sort, and select Sort
from the menu.
3 When the following warning message appears, click Yes to create a sorted copy
of the table.
Note: If you selected Edit Mode after opening the table and clicking a data cell,
this window does not appear. SAS updates the original table.
4 In the Sort window, enter the name of the new sorted table.
In this example, the name of the sorted table is BaseballStatisticsList.
5 Click OK.
The rows in the new table are sorted in ascending order by values of Team at
the End of 1986.
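The interactive sort above roughly corresponds to a PROC SORT step that writes a sorted copy. This sketch assumes that the table is Sashelp.Baseball and that the column labeled Team at the End of 1986 is the variable Team; both names are assumptions.

```sas
/* Create a sorted copy; the original table is left unchanged. */
proc sort data=sashelp.baseball out=work.BaseballStatisticsList;
  by team;   /* ascending order is the default */
run;
```

Using the OUT= option mirrors the choice to create a sorted copy rather than sorting the original table in place.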
1 With the table open, select Edit ⇨ Edit Mode from the Edit menu.
2 Click a cell in the table, and the value in the cell is highlighted.
In this example, the third cell in the fifth row is highlighted.
5 When prompted to save pending changes to the table, click Yes to save your
changes or No to disregard changes.
Note: If you make changes in one row and then edit cells in another row, the
changes in the first row are automatically saved. When you select File ⇨ Close, you
are prompted to save the pending changes to the second row.
1 In the Explorer window, open a library and double-click the table that you want
to subset.
In this example, the Cars data table is selected.
Subsetting Data By Using the WHERE Expression 821
2 Right-click any table cell that is not a heading and select Where from the menu.
The WHERE EXPRESSION window appears.
3 In the Available Columns list, select a column, and then select an operator from
the Operators menu.
In this example, Make is selected from the Available Columns list, and EQ
(equal to) is selected from the Operators menu. Note that the WHERE
expression is being built in the Where box at the bottom of the window.
4 In the Available Columns list, select another value to complete the WHERE
expression.
In this example, scroll to the bottom of the Available Columns window and
select <LOOKUP distinct values>.
Note that the complete WHERE expression appears in the Where box at the
bottom of the window.
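The expression built in the WHERE EXPRESSION window can also be written directly in a program. The following sketch applies an equivalent condition with the WHERE= data set option; the value 'Honda' is a hypothetical choice, because the value selected in the example is not shown here.

```sas
/* Subset Sashelp.Cars to one make and print a few columns. */
proc print data=sashelp.cars(where=(make eq 'Honda'));
  var make model type msrp;
run;
```

The EQ operator in the expression is the same operator that is selected from the Operators menu in step 3.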
Export Data
To export data, follow these steps:
2 Select the SAS data set from which you want to export data.
In this example, Sashelp is selected as the library, and Cars is the member
name.
3 Click Next and the Export Wizard - Select export type window appears.
4 Select the type of data source to which you want to export files.
Exporting a Subset of Data 825
6 In the Workbook field, enter the name of the workbook that will contain the
exported file and then click OK.
In this example, Myworkbook is entered as the name of the workbook.
7 When the Export Wizard - Select table window appears, enter a name for the
table that you are exporting.
In this example, Mytable is the table name.
8 Click Next.
9 If you want SAS to create a file of PROC EXPORT statements for later use, then
enter the name of the file that will contain the SAS statements.
In this example, PROC EXPORT statements are saved to the file. The Replace
file if it exists box is checked.
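The statements that the Export Wizard saves resemble the following sketch. The output path is a placeholder, and DBMS=XLSX is an assumption: it requires SAS/ACCESS Interface to PC Files, and the wizard might write a different DBMS= value for your installation.

```sas
/* Export Sashelp.Cars to a sheet named Mytable in an Excel workbook. */
proc export data=sashelp.cars
            outfile="myworkbook.xlsx"   /* placeholder path */
            dbms=xlsx
            replace;                    /* overwrite the file if it exists */
  sheet="Mytable";
run;
```

Saving the generated statements lets you repeat the export later without stepping through the wizard again.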
2 Select the type of file that you are importing by selecting a data source from the
Select a data source menu.
Note that Standard data source is selected by default. In this example,
Microsoft Excel Workbook is selected.
4 In the Connect to MS Excel window, enter the pathname of the file that you
want to import, and then click OK.
5 In the Import Wizard - Select table window, enter the name of the table that you
want to import.
7 In the Import Wizard - Select library and member window, enter a location in
which to store the imported file.
In this example, Work is selected as the library, and Book1 is selected as the
member name.
Importing Data into a Table 829
9 If you want SAS to create a file of PROC IMPORT statements for later use, then
enter the name of a file that will contain the SAS statements.
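A sketch of the kind of PROC IMPORT step that corresponds to these wizard choices follows. The workbook path and sheet name are placeholders, and DBMS=XLSX is an assumption that requires SAS/ACCESS Interface to PC Files.

```sas
/* Import one worksheet into Work.Book1. */
proc import datafile="book1.xlsx"   /* placeholder path */
            out=work.book1
            dbms=xlsx
            replace;
  sheet="Sheet1";                   /* placeholder sheet name */
  getnames=yes;                     /* use the first row as variable names */
run;
```

The OUT= value matches the library (Work) and member name (Book1) that are selected in step 7.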
you a range of format options. To use EFI, select User-defined file format in the
Import Wizard and follow the directions for describing your data file.
Chapter 37
Cloud Analytic Services
• SAS Viya is not a replacement for SAS 9.4. It is a platform designed to work with
SAS 9.4 and other languages such as Java, Python, Lua, and R.
For information about DATA step processing in CAS, see SAS Cloud Analytic
Services: DATA Step Programming.
Not all SAS language elements are supported for DATA step processing in CAS.
Language elements that are not supported in CAS are marked in the documentation
with a “Restriction” as shown in the following image:
Figure 37.1 Example Documentation Syntax Page That Shows a “Restriction” to Indicate That the
Language Element Is Not Supported in CAS
SAS language elements that are supported in CAS display “CAS” in the Categories
field of the syntax page for the language element:
SAS Language Elements for CAS 833
Figure 37.2 Example Documentation Syntax Page That Shows Support for CAS in the Categories Field
Figure 37.3 Example Documentation Category Page Showing CAS-Supported Language Elements
Here is a list of category tables for each of the SAS language element types:
• DATA Step Statements By Category in SAS DATA Step Statements: Reference
• Functions and CALL Routines By Category in SAS Functions and CALL Routines:
Reference
• Formats By Category in SAS Formats and Informats: Reference
that are supported by CAS, see SAS Viya Foundation Procedures in An Introduction
to SAS Viya Programming.
For information about these language elements, see the following CAS
documentation:
• CAS Language Element Syntax: SAS Cloud Analytic Services: User’s Guide
• CAS Actions in SAS Viya Actions and Action Sets by Name and Product, SAS
Viya: System Programming Guide and CAS DATA Step Action in SAS Cloud
Analytic Services: DATA Step Programming
Note: A SAS Viya Visual Analytics license is required for access to SAS Cloud
Analytic Services.
Chapter 38
Industry Protocols Used in SAS
The EMAILSYS system option specifies which e-mail system to use for sending
electronic mail from within SAS. For more information about the EMAILSYS system
option, see the SAS documentation for your operating environment.
The following system options are specified only when the SMTP e-mail interface is
supported at your site:
EMAILACKWAIT=
specifies the number of seconds that SAS will wait to receive an
acknowledgment from an SMTP server. For more information, see
“EMAILACKWAIT= System Option” in SAS System Options: Reference.
EMAILAUTHPROTOCOL=
specifies the authentication protocol for SMTP e-mail. For more information,
see the “EMAILAUTHPROTOCOL= System Option” in SAS System Options:
Reference.
EMAILFROM
specifies whether the FROM e-mail option is required when sending e-mail by
using either the FILE or FILENAME statements. For more information, see the
“EMAILFROM System Option” in SAS System Options: Reference.
EMAILHOST=
specifies the SMTP server that supports e-mail access for your site. For more
information, see the “EMAILHOST= System Option” in SAS System Options:
Reference.
EMAILPORT=
specifies the port to which the SMTP server is attached. For more information,
see the “EMAILPORT= System Option” in SAS System Options: Reference.
SAS Language Elements That Control SMTP E-Mail 837
EMAILUTCOFFSET=
specifies a UTC offset that is used in the Date: header field of the e-mail
message. For more information, see the “EMAILUTCOFFSET= System Option” in
SAS System Options: Reference.
The following system options are specified with other e-mail systems, as well as
SMTP:
EMAILID=
specifies the identity of the individual sending e-mail from within SAS. For more
information, see the “EMAILID= System Option” in SAS System Options:
Reference.
EMAILPW=
specifies your e-mail login password. For more information, see the “EMAILPW=
System Option” in SAS System Options: Reference.
FILENAME Statement
In the FILENAME statement, the EMAIL (SMTP) access method enables you to
send e-mail programmatically from SAS using the SMTP e-mail interface. For more
information, see the “FILENAME Statement” in SAS Global Statements: Reference.
In the DATA step, after using the FILE statement to define your e-mail fileref as the
output destination, use PUT statements to define the body of the message. The
PUT statement directives override any other e-mail options in the FILE and
FILENAME statements.
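The pieces described above can be combined as in the following sketch. The host name and e-mail addresses are placeholders, and the port value is an assumption about your site; the example shows a PUT statement directive replacing the subject that is set in the FILENAME statement.

```sas
/* Assumptions: mailhost.example.com, port 25, and both addresses are placeholders. */
options emailsys=smtp emailhost=mailhost.example.com emailport=25;

filename outmail email
  to="recipient@example.com"
  from="sender@example.com"
  subject="Nightly report";

data _null_;
  file outmail;
  /* This directive overrides the SUBJECT= value given in the FILENAME statement. */
  put '!EM_SUBJECT! Nightly report (revised)';
  /* Ordinary PUT statements form the body of the message. */
  put 'The nightly job finished.';
  put 'No action is required.';
run;
```

The message is sent when the DATA step ends. Whether authentication options are also needed depends on how the SMTP server at your site is configured.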
Some SMTP servers require just the user identification as the login ID while others
require the full e-mail address. The SAS SMTP e-mail interface authenticates the
user identification in the following order.
2 If the user ID is not specified by the USERID= option, the SAS SMTP e-mail
interface attempts to authenticate by using the user ID specified by the FROM=
option of the FILENAME statement.
4 If the user ID is not specified by the EMAILID= system option, the SAS SMTP e-
mail interface looks up the user ID from the operating system and attempts to
authenticate that user ID.
For more information about sending e-mail from SAS, see the SAS documentation
for your operating environment.
The UUIDGEND utility is required for non-Windows hosts that are running versions
of SAS prior to SAS 9.4M2.
• SAS applications that execute in UNIX environments that are running SAS
version 9.4M2 (or later)
sasOperatorPort: 6340
sasUUIDNode: 0123456789ab
sasUUIDPort: 6341
description: SAS Session UUID Generator Daemon on UNIX
object spawner. The OBJSPAWN.COM file also includes the following commands
that your site might need to perform before the object spawner is started:
• command to set the display node
• command to define a process level logical name that points to a template DCL
file (OBJSPAWN_TEMPLATE.COM)
The OBJSPAWN_TEMPLATE.COM file performs setup that is needed in order for the
client process to execute. The object spawner first checks to see whether the
logical name SAS$OBJSPAWN_TEMPLATE is defined. If it is, the commands in the
template file are executed as part of the command sequence used when starting
the client session. You do not have to define the logical name.
UUIDGEN Function
The UUIDGEN function returns a UUID for each cell. For more information, see
“UUIDGEN Function” in SAS Functions and CALL Routines: Reference.
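A minimal sketch of calling the function from a DATA step follows. The variable name is illustrative, and the length of 36 assumes the default character representation of the UUID.

```sas
data _null_;
  length uuid $ 36;     /* room for the hyphenated character form */
  uuid = uuidgen();     /* each call returns a new identifier */
  put uuid=;            /* write the value to the SAS log */
run;
```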
Overview of IPv6
SAS 9.2 introduced support for the next generation of Internet Protocol, IPv6,
which is the successor to the current Internet Protocol, IPv4. Rather than replacing
IPv4 with IPv6, SAS supports both protocols. There is a lengthy transition period
during which the two protocols coexist.
A primary reason for the new protocol is that the limited supply of 32-bit IPv4
addresses was being depleted. IPv6 uses a 128-bit address scheme, which provides
2^128 possible addresses compared with 2^32 for IPv4.
• automatic configuration
Table 38.1 Comparison of Features of the IPv6 and IPv4 Address Formats
The :: (consecutive colons) notation can be used to represent four successive 16-bit
blocks that contain zeros. When SAS software encounters a collapsed IP address, it
reconstitutes the address to the required 128-bit address in eight 16-bit blocks.
The brackets are necessary only if also specifying a port number. Brackets are used
to separate the address from the port number. If no port number is used, the
brackets can be omitted.
As an alternative, the block that contains the zero can be collapsed. Here is an
example:
[2001:db8::1]:80
The http:// prefix specifies a URL. The brackets are necessary only if also
specifying a port number. Brackets are used to separate the address from the port
number. If no port number is used, the brackets can be omitted.
To avoid such problems, use of an FQDN is preferred over an IP address. The name-
resolution system that is part of the TCP/IP protocol is responsible for locating the
IP address that is associated with the FQDN.
PASSWORD="mypassword"
PROTOCOL=BRIDGE
ACTION=RESUME
OPTIONS=""
NOAUTOPAUSE;
If an IP address had been used and if the IP address that was associated with the
computer node name had changed, the code would be inaccurate.
An FQDN can remain intact in the code while the underlying IP address can change
without causing unpredictable results. The TCP/IP name-resolution system
automatically resolves the FQDN to its associated IP address.
The full FQDN, d11076.na.apex.com, is specified in the Remote Host field of the
Connect Server Properties window in SAS Management Console.
Some SAS products impose limits on the length for computer names.
Because the FQDN is longer than eight characters, the FQDN must be assigned to a
SAS macro variable, which is used in the RSUBMIT statement.
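The following sketch shows one way this pattern can look. It assumes that SAS/CONNECT is licensed, that the TCP access method is used, and that a server is listening on port 7551; the port number and the macro variable name MYHOST are hypothetical.

```sas
/* Assumption: 7551 is a placeholder port for the remote SAS/CONNECT server. */
%let myhost=d11076.na.apex.com 7551;

options comamid=tcp;

signon myhost;          /* the macro variable name serves as the short session ID */

rsubmit myhost;
  data _null_;
    put 'This step runs in the remote session.';
  run;
endrsubmit;

signoff myhost;
```

The short name MYHOST stays within the eight-character limit, while the macro variable value carries the full FQDN.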
PART 8
Appendixes
Appendix 1 Data Sets Used in Examples
Appendix 2 Understanding How the DATA Step Works
Appendix 3 Updating Data Using the MODIFY Statement and the KEY= Option
Appendix 1
Data Sets Used in Examples
Animal
data animal;
input common $ animal $;
datalines;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;
AnimalDupes
data animalDupes;
input common $ animal $;
datalines;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
AnimalMissing
data animalMissing;
input common $ animal $;
datalines;
a Ant
c Cat
d Dog
e Eagle
;
CarSales
data carSales;
input transID $1-9 make $11-19 model $21-38;
datalines;
CSH203073 Chevrolet Aveo 4dr
CCA460564 Chevrolet Aveo LS 4dr htc
DEB848135 Honda     Civic DX 2dr
CCA762350 Honda     Insight 2dr
CCA314633 Hyundai   Accent 2dr hatch
CSH118553 Hyundai   Accent GL 4dr
CCA295057 Hyundai   Accent GT 2dr htc
CSH285627 Kia       Rio 4dr auto
DEB898883 Kia       Rio 4dr manual
CCA803483 Kia       Rio Cinco
DEB661568 Mazda     MX-5 Miata LS
CSH36345  Mazda     MX-5 Miata
CCA562255 Scion     xA 4dr hatch
CCA651575 Scion     xB
DEB322009 Toyota    Echo 2dr auto
CSH607254 Toyota    Echo 2dr manual
CSH879119 Toyota    Echo 4dr
CSH230940 Toyota    MR2 Spyder
;
CarsSmall
data carsSmall;
input make $1-9 model $11-39 type $41-46 driveTrain $48-52
      weight $54-57 length 59-61 makeModelDrive $63-120;
datalines;
Chevrolet Aveo 4dr                      Sedan  Front 2370 167 Chevrolet Aveo 4dr - Front wheel drive
Chevrolet Aveo LS 4dr hatch             Sedan  Front 2348 153 Chevrolet Aveo LS 4dr hatch - Front wheel drive
Honda     Civic DX 2dr                  Sedan  Front 2432 175 Honda Civic DX 2dr - Front wheel drive
Honda     Insight 2dr (gas/electric)    Hybrid Front 1850 155 Honda Insight 2dr (gas/electric) - Front wheel drive
Hyundai   Accent 2dr hatch              Sedan  Front 2255 167 Hyundai Accent 2dr hatch - Front wheel drive
Hyundai   Accent GL 4dr                 Sedan  Front 2290 167 Hyundai Accent GL 4dr - Front wheel drive
Hyundai   Accent GT 2dr hatch           Sedan  Front 2339 167 Hyundai Accent GT 2dr hatch - Front wheel drive
Kia       Rio 4dr auto                  Sedan  Front 2458 167 Kia Rio 4dr auto - Front wheel drive
Kia       Rio 4dr manual                Sedan  Front 2403 167 Kia Rio 4dr manual - Front wheel drive
Kia       Rio Cinco                     Wagon  Front 2447 167 Kia Rio Cinco - Front wheel drive
Mazda     MX-5 Miata LS convertible 2dr Sports Rear  2387 156 Mazda MX-5 Miata LS convertible 2dr - Rear wheel drive
Mazda     MX-5 Miata convertible 2dr    Sports Rear  2387 156 Mazda MX-5 Miata convertible 2dr - Rear wheel drive
Scion     xA 4dr hatch                  Sedan  Front 2340 154 Scion xA 4dr hatch - Front wheel drive
Scion     xB                            Wagon  Front 2425 155 Scion xB - Front wheel drive
Toyota    Echo 2dr auto                 Sedan  Front 2085 163 Toyota Echo 2dr auto - Front wheel drive
Toyota    Echo 2dr manual               Sedan  Front 2035 163 Toyota Echo 2dr manual - Front wheel drive
Toyota    Echo 4dr                      Sedan  Front 2055 163 Toyota Echo 4dr - Front wheel drive
Toyota    MR2 Spyder convertible 2dr    Sports Rear  2195 153 Toyota MR2 Spyder convertible 2dr - Rear wheel drive
;
Class
data class;
input name $ age height weight;
format weight comma8.;
format height comma8.;
datalines;
Alice 13 56.5 84.00
Barbara 13 65.3 98.00
Carol 14 62.8 102.50
Jane 12 59.8 84.50
Janet 15 62.5 112.50
Joyce 11 51.3 50.50
Judy 14 64.3 90.00
Louise 12 56.3 77.00
Mary 15 66.5 112.00
;
Classfit
data classfit;
input name $ age height weight predict;
format weight comma8.2;
format predict comma8.1;
label weight="Weight in Pounds";
datalines;
Janet 15 62.5 112.5 100.662
Mary 15 66.5 112.0 116.259
Inventory
data Inventory;
input partNumber $ partName $;
datalines;
K89R seal
M4J7 sander
LK43 filter
MN21 brace
BC85 clamp
NCF3 valve
KJ66 cutter
UYN7 rod
JD03 switch
BV1E timer
;
InventoryAdd
data InventoryAdd;
input partNumber $ partName $ newStock newPrice;
format newPrice dollar12.2;
datalines;
K89R seal 6 247.50
AA11 hammer 55 32.26
BB22 wrench 21 17.35
KJ66 cutter 10 24.50
CC33 socket 7 22.19
BV1E timer 30 36.50
;
One
data one;
input ID state $;
datalines;
1 AZ
2 MA
3 WA
4 WI
;
Many
data many;
input ID city $ state $;
datalines;
1 Phoenix Ariz
2 Boston Mass
2 Foxboro Mass
3 Olympia Mass
3 Seattle Wash
3 Spokane Wash
4 Madison Wis
4 Milwaukee Wis
4 Madison Wis
4 Hurley Wis
;
Master
data master;
input common $ animal $ plant $;
datalines;
a Ant Apple
b Bird Banana
c Cat Coconut
d Dog Dewberry
e Eagle Eggplant
f Frog Fig
;
Minerals
data minerals;
input common $ plant $ mineral $;
datalines;
a Apricot Amethyst
b Barley Beryl
c Cactus .
e . .
f Fennel .
g Grape Garnet
;
Plant
data plant;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
f Fig
;
PlantDupes
data plantDupes;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
PlantG
data plantG;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
g Fig
;
PlantMissing
data plantMissing;
PlantMissing2
data plantMissing2;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
e Eggplant
f Fig
;
PlantNew
data plantNew;
input common $ plant $;
datalines;
a Apricot
b Barley
c Cactus
d Date
e Escarole
f Fennel
;
PlantNewDupes
data plantNewDupes;
input common $ plant $;
datalines;
a Apricot
b Barley
c Cactus
d Date
d Dill
e Escarole
f Fennel
;
Product_List
data product_list;
input Product_Id Product_Name $14-49 Supplier_ID;
datalines;
240200100101 Grandslam Staff Tour Mhl Golf Gloves 3808
210200100017 Sweatshirt Children's O-Neck         3298
240400200022 Aftm 95 Vf Long Bg-65 White          1280
230100100017 Men's Jacket Rem                     50
210200300006 Fleece Cuff Pant Kid'S               1303
210200500002 Children's Mitten                    772
210200700016 Strap Pants BBO                      798
210201000050 Kid Children's T-Shirt               2963
210200100009 Kids Sweat Round Neck,Large Logo     3298
210201000067 Logo Coord.Children's Sweatshirt     2963
220100100019 Fit Racing Cap                       1303
220100100025 Knit Hat                             1303
220100300001 Fleece Jacket Compass                772
220200200036 Soft Astro Men's Running Shoes       1747
230100100015 Men's Jacket Caians                  50
230100500004 Backpack Flag, 6,5x9 Cm.             316
210200500006 Rain Suit, Plain w/backpack Jacket   772
230100500006 Collapsible Water Can                316
224040020000 Bat 5-Ply                            3808
220200200035 Soft Alta Plus Women's Indoor Shoes  1747
240400200066 Memhis 350,Yellow Medium, 6-pack     1280
240200100081 Extreme Distance 90 3-pack           3808
;
datalines;
1 2781
2 1990
;
data quarter4;
length mileage 6;
input account mileage;
datalines;
1 3278
2 2209
;
Sales
data Sales;
input partNumber $ partName $ salesPerson $;
datalines;
NCF3 valve JN
BV1E timer JN
LK43 filter KM
K89R seal SJ
LK43 filter JN
M4J7 sander KM
BV1E timer KM
;
Sales2019
data Sales2019;
input partNumber $ lastSoldDate mmddyy10.;
format lastSoldDate mmddyy10.;
datalines;
BC85 10/15/2019
BV1E 10/23/2019
KJ66 11/11/2019
LK43 09/12/2019
MN21 09/13/2019
NCF3 07/24/2019
UYN7 12/11/2019
;
Supplier
data Supplier;
Table1
data Table1;
set sashelp.class(where=(age=14));
run;
Table2
data Table2;
set sashelp.classfit(where=(age=14));
drop uppermean lower upper;
run;
proc sort data=Table2; by name; run;
Year1
data Year1;
input date;
datalines;
2009
2010
2011
2012
;
Year2
data Year2;
input date;
datalines;
2010
2011
2012
2013
2014
;
In the zone portion of the punched card (the first three rows), the zone component
of the pair can have the values 12, 11, 0 (or 10), or not punched. In the digit portion of
the card (the fourth through the twelfth rows), the digit component of the pair can
have the values 1 through 9, or not punched.
zone punch     digit punches     alphabetic characters
12             1 through 9       A through I
11             1 through 9       J through R
10 (or 0)      2 through 9       S through Z
SAS stores each column of column-binary data (a “virtual” punched card) in two
bytes. Since each column has only 12 positions and since 2 bytes contain 16
positions, the 4 extra positions within the bytes are located at the beginning of
each byte. The following figure shows the correspondence between the rows of
“virtual” punched card data and the positions within 2 bytes that SAS uses to store
them. SAS stores a punched position as a binary 1 bit and an unpunched position as
a binary 0 bit.
[Figure: correspondence between punched-card rows 12, 11, 10 (or 0), and 1 through 9 and byte positions 1 through 8 of byte 1 and byte 2]
Appendix 2
Understanding How the DATA Step Works
Flow of Action
When you submit a DATA step for execution, it is first compiled and then executed.
The following figure shows the flow of action for a typical SAS DATA step.
input buffer
is a logical area in memory into which SAS reads each record of raw data when
SAS executes an INPUT statement. This buffer is created only when the DATA
step reads raw data. When the DATA step reads a SAS data set, SAS reads the
data directly into the program data vector.
program data vector (PDV)
is a logical area in memory where SAS builds a data set, one observation at a
time. When a program executes, SAS reads data values from the input buffer or
creates them by executing SAS language statements. The data values are
assigned to the appropriate variables in the program data vector. From here,
SAS writes the values to a SAS data set as a single observation.
Along with data set variables and computed variables, the PDV contains two
automatic variables: _N_ and _ERROR_. The _N_ variable counts the number of
times the DATA step begins to iterate. The _ERROR_ variable signals the
occurrence of an error caused by the data during execution. The value of
_ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more
errors have occurred). SAS does not write these variables to the output data
set.
descriptor information
is information that SAS creates and maintains about each SAS data set,
including data set attributes and variable attributes. For example, it contains the
name of the data set, its member type, the date and time that the data set was
created, and the number, names, and data types (character or numeric) of the
variables. The descriptor information also contains information about extended
attributes (if defined on a data set). Extended attribute descriptor information
includes the name of the attribute, the name of the variable, and the value of
the attribute.
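The following sketch illustrates two of the definitions above: the automatic variables in the program data vector, and the descriptor information of the resulting data set. The data set name and values are made up for illustration.

```sas
data work.pdv_demo;
  input x;
  /* _N_ and _ERROR_ are in the PDV during execution, so PUT can display them. */
  put _n_= _error_= x=;
  datalines;
1
abc
3
;

/* Display descriptor information. Only X is listed as a variable,  */
/* because _N_ and _ERROR_ are not written to the output data set.  */
proc contents data=work.pdv_demo;
run;
```

The second record contains invalid numeric data, so for that iteration SAS sets X to missing and _ERROR_ to 1, and writes a note to the log.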
1 The DATA step begins with a DATA statement. Each time the DATA statement
executes, a new iteration of the DATA step begins, and the _N_ automatic
variable is incremented by 1.
2 SAS sets the newly created program variables to missing in the program data
vector (PDV).
3 SAS reads a data record from a raw data file into the input buffer, or it reads an
observation from a SAS data set directly into the program data vector. You can
use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
4 SAS executes any subsequent programming statements for the current record.
5 At the end of the statements, an output, return, and reset occur automatically.
SAS writes an observation to the SAS data set, the system automatically
returns to the top of the DATA step, and the values of variables created by
INPUT and assignment statements are reset to missing in the program data
vector. Note that variables that you read with a SET, MERGE, MODIFY, or
UPDATE statement are not reset to missing here.
6 SAS counts another iteration, reads the next record or observation, and
executes the subsequent programming statements for the current observation.
7 The DATA step terminates when SAS encounters the end-of-file in a SAS data
set or a raw data file.
Note: The figure shows the default processing of the DATA step. You can place
data-reading statements (such as INPUT or SET), or data-writing statements (such
as OUTPUT), in any order in your program.
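As a sketch of how an explicit data-writing statement changes the default flow, the following step uses OUTPUT twice in each iteration. When a DATA step contains an OUTPUT statement, SAS no longer writes an observation automatically at the end of the step. The output data set name is illustrative.

```sas
data work.twice;
  set sashelp.class;
  output;          /* write the observation as it was read */
  age = age + 1;
  output;          /* write a second copy with the modified value */
run;
```

Each observation that is read from Sashelp.Class therefore produces two observations in Work.Twice.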
1 The DROP= data set option prevents the variable TeamName from being written
to the output SAS data set called Total_Points.
2 The INPUT statement describes the data by giving a name to each variable,
identifying its data type (character or numeric), and identifying its relative
location in the data record.
Processing a DATA Step: A Walk-Through 867
3 The SUM statement accumulates the scores for three events in the variable
TeamTotal.
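The program that these callouts describe is not reproduced in this excerpt. The following sketch is consistent with the callouts and with the variable names used in the walk-through, but the exact INPUT specifications and the data lines are assumptions; the two records are taken from the input buffer figures.

```sas
data Total_Points (drop=TeamName);                      /* callout 1 */
  /* Assumption: :$9. is used so that a 9-character team name is not truncated. */
  input TeamName :$9. ParticipantName $
        Event1 Event2 Event3;                           /* callout 2 */
  TeamTotal + (Event1 + Event2 + Event3);               /* callout 3 */
  datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
;
```

Because the sum statement retains TeamTotal across iterations, its value is 22 after the first record and 46 after the second.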
The PDV contains all the variables in the input data set, the variables created in
DATA step statements, and the two variables, _N_ and _ERROR_, that are
automatically generated for every DATA step. The _N_ variable represents the
number of times the DATA step has iterated. The _ERROR_ variable acts like a
binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more
errors exist. The following figure shows the Input Buffer and the program data
vector after DATA step compilation.
Input Buffer (columns 1 through 25; empty after compilation)
Variables that are created by the INPUT and the Sum statements (TeamName,
ParticipantName, Event1, Event2, Event3, and TeamTotal) are set to missing
initially. Note that in this representation, numeric variables are initialized with a
period and character variables are initialized with blanks. The automatic variable
_N_ is set to 1; the automatic variable _ERROR_ is set to 0.
The variable TeamName is marked Drop in the PDV because of the DROP= data set
option in the DATA statement. Dropped variables are not written to the SAS data
set. The _N_ and _ERROR_ variables are dropped because automatic variables
created by the DATA step are not written to a SAS data set. See Chapter 5,
“Variables,” on page 75 for details about automatic variables.
Reading a Record
SAS reads the first data line into the input buffer. The input pointer, which SAS uses
to keep its place as it reads data from the input buffer, is positioned at the
beginning of the buffer, ready to read the data record. The following figure shows
the position of the input pointer in the input buffer before SAS reads the data.
Figure A2.3 Position of the Pointer in the Input Buffer Before SAS Reads Data
Input Buffer (columns 1 through 25)
Knights Sue 6 8 8
The INPUT statement then reads data values from the record in the input buffer
and writes them to the PDV where they become variable values. The following
figure shows both the position of the pointer in the input buffer, and the values in
the PDV after SAS reads the first record.
Figure A2.4 Values from the First Record Are Read into the Program Data Vector
Input Buffer (columns 1 through 25)
Knights Sue 6 8 8
After the INPUT statement reads a value for each variable, SAS executes the Sum
statement. SAS computes a value for the variable TeamTotal and writes it to the
PDV. The following figure shows the PDV with all of its values before SAS writes
the observation to the data set.
Figure A2.5 Program Data Vector with Computed Value of the Sum Statement
SAS then returns to the DATA statement to begin the next iteration. SAS resets the
values in the PDV in the following way:
• The values of variables created by the INPUT statement are set to missing.
• The value of the automatic variable _N_ is incremented by 1, and the value of
_ERROR_ is reset to 0.
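[Figure: program data vector at the start of the second iteration, with TeamTotal = 22, _N_ = 2, and _ERROR_ = 0; three variables are marked Drop]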
Figure A2.8 Input Buffer, Program Data Vector, and First Two Observations
Input Buffer (columns 1 through 25)
Cardinals Jane 9 7 8
As SAS continues to read records, the value in TeamTotal grows larger as more
participant scores are added to the variable. _N_ is incremented at the beginning of
each iteration of the DATA step. This process continues until SAS reaches the end
of the input file.
Data-reading statements: 1
Optional SAS programming statements, for example:
further processes the data for the current observation
1 The table shows the default processing of the DATA step. You can alter the sequence of statements
in the DATA step. You can code optional programming statements, such as creating or reinitializing a
constant, before you code a data-reading statement.
Note: You can also use functions to read and process data. For information about
how statements and functions process data differently, see “Using Functions to
Manipulate Files” in SAS Functions and CALL Routines: Reference. For specific
information about SAS functions, see the SAS File I/O and External Files categories
in “SAS Functions and CALL Routines by Category” in SAS Functions and CALL
Routines: Reference.
More About DATA Step Execution 873
For more information, see the individual statements in SAS DATA Step Statements:
Reference.
LINK and RETURN statements
   alter the flow of control, execute statements following the label specified, and
   return control of the program to the next statement following the LINK statement.
HEADER= option in the FILE statement
   alters the flow of control whenever a PUT statement causes a new page of output to
   begin; statements following the label specified in the HEADER= option are
   executed until a RETURN statement is encountered, at which time control returns to
   the point from which the HEADER= option was activated.
EOF= option in an INFILE statement
   alters the flow of execution when the end of the input file is reached; statements
   following the label that is specified in the EOF= option are executed at that time.
_N_ automatic variable in an IF-THEN construct
   causes parts of the DATA step to execute only for particular iterations.
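Two of these techniques can be combined in a few lines. The following is a minimal sketch, assuming that the sample data set sashelp.class is available at your site; the label name header is hypothetical.

```sas
data _null_;
   set sashelp.class;
   /* _N_ in an IF-THEN construct: branch only on the first iteration */
   if _n_ = 1 then link header;
   put name= age=;
   return;            /* ends this iteration so that control does not fall into the labeled block */
header:
   put 'Listing of class members';
   return;            /* returns control to the statement that follows LINK */
run;
```

The first RETURN matters: without it, every iteration would continue into the statements that follow the label.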
Step Boundaries
Step boundaries are an important concept in SAS programming
because they determine when SAS statements take effect. SAS
executes program statements only when it crosses a step boundary, either a default
boundary or an explicit one.
Consider the following DATA steps:
data _null_;                            /* 1 */
   set allscores(drop=score5-score7);
   title 'Student Test Scores';         /* 2 */
data employees;                         /* 3 */
   set employee_list;
run;
3 The DATA statement is the default boundary for the first DATA step.
The TITLE statement in this example is in effect for the first DATA step as well as
for the second because the TITLE statement appears before the boundary of the
first DATA step. This example uses the default step boundary data employees;.
data test;
set alltests;
run;
The OPTIONS statement specifies that the first observation that is read from the
input data set should be the 5th, and the last observation that is read should be the
55th. Inserting a RUN statement immediately before the OPTIONS statement
causes the first DATA step to reach its boundary (run;) before SAS encounters the
OPTIONS statement. The OPTIONS statement settings, therefore, are put into
effect for the second DATA step only.
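The placement described above might look like the following sketch. The FIRSTOBS= and OBS= values follow from the text; the first DATA step (scores, reading allscores) is an assumption borrowed from the earlier example, because the original first step is not shown here.

```sas
data scores;                         /* hypothetical first step */
   set allscores(drop=score5-score7);
run;                                 /* explicit boundary: this step runs before OPTIONS is processed */

options firstobs=5 obs=55;           /* in effect only for steps that start after this point */

data test;
   set alltests;                     /* reads observations 5 through 55 */
run;
```

Without the first RUN statement, SAS would process the OPTIONS statement before reaching a boundary for the first step, so the settings would apply to both steps.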
Following the statements in a DATA step with a RUN statement is the simplest way
to make the step begin to execute, but a RUN statement is not always necessary.
SAS recognizes several step boundaries for a SAS step:
• another DATA statement
• a PROC statement
• a RUN statement
When you submit a DATA step during interactive processing, it does not begin
running until SAS encounters a step boundary. This fact enables you to submit
statements as you write them while preventing a step from executing until you
have entered all the statements.
raw data                    instream data lines   INPUT statement             after the last data line is read
observations sequentially   one SAS data set      SET and MODIFY statements   after the last observation is read
A DATA step that reads observations from a SAS data set with a SET statement
that uses the POINT= option has no way to detect the end of the input SAS data
set. (This method is called direct or random access.) Such a DATA step usually
requires a STOP statement.
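A minimal sketch of direct access with an explicit STOP statement follows. It assumes that the sample data set sashelp.class is available; the output data set name every_fifth and the variable names obsnum and total are hypothetical.

```sas
data every_fifth;
   /* NOBS= is assigned at compile time, so total can be used before SET executes */
   do obsnum = 1 to total by 5;
      set sashelp.class point=obsnum nobs=total;
      output;
   end;
   stop;   /* POINT= never detects end of file, so the step must be ended explicitly */
run;
```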
878 Appendix 2 / Understanding How the DATA Step Works
A DATA step also stops when it executes a STOP or an ABORT statement. Some
system options and data set options, such as OBS=, can cause a DATA step to stop
earlier than it would otherwise.
If the VARINITCHK= system option is set to ERROR, a DATA step stops processing
and writes an error to the SAS log if a variable is not initialized. For more
information, see “VARINITCHK= System Option” in SAS System Options: Reference.
Appendix 3
Updating Data Using the MODIFY
Statement and the KEY= Option
Updating Data Using the MODIFY Statement and the KEY= Option
Definitions
Understanding the MODIFY Statement
Sequential Access
Matching Access Using BY Statement
Comparing Matching Access to the UPDATE Statement
Direct (Random) Access by Observation Number Using POINT=
Direct Access by Index Values Using KEY=
Comparing the Matching Access Method Using the BY Statement to the Direct Access Method Using Index Values
Monitoring Update Processing
Performing Automatic Updates for Like-Named Variables Using KEY=
Using MODIFY in a SAS/SHARE Environment
Understanding the KEY= Option
For more information about combining SAS data sets, see “Modifying” on page
487.
Definitions
Note the following SAS terms and definitions, which are used in this appendix:
data set
a SAS file that consists of descriptor information and data values organized as a
table of rows (SAS observations) and columns (SAS variables) that can be
processed by SAS.
_IORC_
an automatic variable created by SAS when you use the MODIFY statement or
the SET statement with the KEY= option. The value of _IORC_ is a numeric
return code that indicates the status of the most recent I/O operation
performed on an observation in a SAS data set. The return code indicates
whether the retrieval for matching observations was successful:
Table A3.1 _IORC_ Return Code Values
Code Value
0 successful execution
-1 end-of-file error
Typically, you use _IORC_ in conjunction with the autocall macro %SYSRC,
which enables you to specify a mnemonic name that describes a potential
outcome of an I/O operation.
index
a file that is created when you define an index for a SAS data set.
%SYSRC
an autocall macro that provides a convenient way of testing for a specific I/O
error condition created by the most recently executed MODIFY statement or
SET statement with the KEY= option. %SYSRC returns a numeric value
corresponding to the mnemonic string passed to it. The mnemonic is a literal
that corresponds to a numeric value of the _IORC_ automatic variable. SAS
supplies a library of autocall macros to each site. The autocall facility enables
you to invoke a macro without having previously defined that macro in the same
SAS program. To use the autocall facility, specify the MAUTOSOURCE system
option.
Note: The DATA= option is used with the PRINT procedure to specify the
master data set. If DATA= is not specified, PROC PRINT displays the last
created data set that was opened for output. Note that the last created data set
might not be the one that was just modified.
view
contains only the information required to retrieve data values. The data is
obtained from another file. There are three types of views:
SAS/ACCESS view
defines data formatted by other software products. When a SAS/ACCESS
view is processed, the data that it accesses remains in its original format.
If data set options such as KEEP, RENAME, or DROP are used on the modified data
set, the options are in effect only during processing. The descriptor portion of a SAS
data set opened in update mode cannot be changed.
For information about the parts of a SAS data set, including the descriptor portion,
see PDV.
The data set referenced in the MODIFY statement must also be referenced in the
DATA statement. Then, based on the DATA step logic, you can replace, delete, and
append new observations. For example, the following simple DATA step uses the
MODIFY statement to update the data set Invty.Stock by replacing the date values
in all observations for the variable Recdate with the current date:
data invty.stock;
   modify invty.stock;
   recdate=today();
run;
When SAS processes the previous DATA step, the following occurs:
1 The MODIFY statement opens SAS data set Invty.Stock for update processing.
2 The first observation is read from the data set and the values are written into
the PDV. The variable Recdate exists in the data set Invty.Stock.
3 The value of variable Recdate is replaced with the result of the TODAY function.
4 The values in the PDV replace the values in the data set.
5 The process is repeated for each observation, using a sequential access method,
until the end-of-file marker is reached.
To further control processing, you can use the following statements in the DATA
step in conjunction with the MODIFY statement:
• OUTPUT statement: appends the current observation as a new observation to
the end of the data set. The data set specified in the MODIFY statement must
also be specified in the DATA statement as the output data set.
• REMOVE statement: deletes the current observation from the data set. The
deletion is either a physical or logical deletion, depending on the SAS I/O engine
maintaining the data set.
• REPLACE statement: writes the current observation to the same physical
location; that is, replaces it in the data set. An implicit REPLACE statement at
the bottom of the DATA step is the default action.
The REPLACE, REMOVE, and OUTPUT statements are independent of each other.
More than one statement can apply to the same observation. If both an
OUTPUT statement and a REPLACE or REMOVE statement execute on the same
observation, make sure that the OUTPUT statement executes last to keep proper
positioning in the index.
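The statements above can be sketched as follows. The data set invty.stock and the variables instock and recdate come from the examples in this appendix; the rule that removes observations with no stock is a hypothetical illustration.

```sas
data invty.stock;
   modify invty.stock;
   if instock = 0 then remove;      /* delete observations that have no stock */
   else do;
      recdate = today();
      replace;                      /* rewrite the observation in place */
   end;
run;
```

Because the step contains explicit REMOVE and REPLACE statements, the implicit REPLACE at the bottom of the DATA step is not performed.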
Sequential Access
Sequential access is the simplest form of processing using the MODIFY statement.
Sequential access provides less control than the other access methods, but it
provides the quickest method for updating all observations in a data set. The
syntax for sequential access is:
MODIFY master-data-set <(data-set-option(s))> <NOBS=variable> <END=variable>;
In the following code, SAS sequentially accesses each observation in the master
data set looking for an observation that contains the value u for variable mod. When
an observation is located, the value of variable x is replaced with a 1. The execution
of the implicit REPLACE causes the observation to be rewritten with the updated
value. The variable x must exist in the master data set. The MODIFY statement
opens the data set in update mode, and the header information for a data set
opened for update cannot be modified.
data master;
modify master;
if mod='u' then x=1;
run;
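The optional NOBS= and END= variables in the syntax can be used to report on a sequential update. The following is a sketch under assumptions: the variable names total, last, and changed are hypothetical, and changed exists only in the program data vector because a data set opened for update cannot receive new variables.

```sas
data master;
   modify master nobs=total end=last;
   if mod='u' then do;
      x=1;
      changed+1;                    /* Sum statement: counts the flagged observations */
   end;
   /* On the last observation, summarize the run in the SAS log */
   if last then put 'Flagged observations: ' changed ' of ' total;
run;
```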
The BY statement is required. The specified variables must exist in both the master
data set and the transaction data set. When using MODIFY with the BY statement,
the rules that apply to the UPDATE statement also apply to MODIFY, except that
the data sets are not required to be in sorted order.
1 The MODIFY statement opens the specified data set for update processing.
2 The MODIFY statement reads an observation from the transaction data set to a
temporary storage location in memory.
The following example uses the matching access method to update the master data
set invty.stock using information from the transaction data set work.addinv as
well as to update the date on which stock was received:
data invty.stock;
   modify invty.stock addinv;
   by partno;
   recdate=today();
   instock=instock+nwstock;
run;
Even though you do not have to sort or index either the transaction data set or the
master data set, you can improve performance by doing the following:
• creating an index for the master data set on the variable used as the BY variable
The difference between the implicit REPLACE and the implicit OUTPUT statements
is illustrated in the following example, which uses two data sets: master and trans.
Their DATA steps are shown below:
data master;
input ssn : $11. nickname $;
datalines;
134-56-9094 Megan
160-58-1223 Kathryn
161-60-5881 Joshua
;
data trans;
input ssn : $11. nickname $;
datalines;
134-56-9094 Meg
142-67-9888 Bill
160-58-1223 Kate
161-60-5881 .
;
To process the previous sorted data, you can use the UPDATE statement. The
UPDATE statement generates an implicit OUTPUT statement. The OUTPUT
statement appends the values from the current program data vector to the end of
the data set. This happens regardless of whether there is a match on the BY
variables.
data master;
update master trans
updatemode=nomissingcheck;
by ssn;
run;
The SAS log reflects the value 1230013 in the automatic variable _IORC_, because a
match for the second observation in the trans data set does not exist in the master
data set. Because an error occurs, the automatic variable _ERROR_ is set to a value
of 1.
ERROR: The TRANSACTION data set observation does not exist on the MASTER data set.
ERROR: No matching observation was found in MASTER data set.
ssn=142-67-9888 nickname=Bill FIRST.ssn=1 LAST.ssn=1 _ERROR_=1 _IORC_=1230013 _N_=2
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the dataset WORK.MASTER.
NOTE: The data set WORK.MASTER has been updated. There were 1 observations rewritten,
0 observations added and 0 observations deleted.
NOTE: There were 3 observations read from the dataset WORK.TRANS.
To prevent displaying the non-matched record in the SAS log, reset the automatic
variable _ERROR_ to zero. (Error checking is discussed later in this appendix.)
data master;
modify master trans
updatemode=nomissingcheck;
by ssn;
if _iorc_=0 then replace;
else _error_=0;
run;
To illustrate direct access by observation number, assume that you have a data set
named newp. newp has variable tool_obs containing the observation number of
each tool in the tool company’s master data set invty.stock and variable
newprice containing the new price for each tool.
The following example uses the information in data set newp to update data set
invty.stock. newp is specified in the SET statement, which reads values to be
supplied for the tool_obs and newprice variables. Variable tool_obs is specified as
the value of the POINT= option. Variable newprice is specified in the assignment
statement to replace existing values of price in invty.stock. As the SET
statement executes, values of tool_obs are read from newp, placed in the PDV, and
then used by the MODIFY statement to retrieve observations directly from
invty.stock. The observation number in tool_obs is used as a key to allow direct
retrieval of observations.
data invty.stock;
set newp;
modify invty.stock point=tool_obs;
price=newprice;
recdate=today();
run;
Direct access by index values lets you supply a lookup value from a secondary data
source such as another SAS data set. An observation is read using a SET statement
to supply a lookup value, which is then used as a key to search the master data set
to locate the observation. The search is performed through an index created for
that data set. You specify the index name with the KEY= option. Once the
observation is located, you can assign variables new values and perform other
processing on the observation prior to the implicit REPLACE. If the observation is
not located, you can perform an appropriate action, such as OUTPUT. The SAS data set
supplying the lookup value (specified in the SET statement) must contain variables with
the same names as those specified in the KEY= option.
The following example uses the KEY= option to specify the index with which to
identify observations for retrieval by matching the values of variable partno from
data set work.addinv with the indexed values of variable partno in data set
invty.stock:
data invty.stock;
set addinv;
modify invty.stock key=partno;
if _iorc_=0 then do;
instock=instock+nwstock;
recdate=today();
replace;
end;
else _error_=0;
run;
The direct access method is a form of the MODIFY statement that uses the KEY=
option in the MODIFY statement to name an indexed variable from the data set
that is being modified. For an example that shows the direct access method by
indexed values, see “Modifying Observations Located by an Index” in SAS DATA
Step Statements: Reference.
The matching access method (which uses the BY statement) and the direct access method
by index values (which uses the KEY= option) compare as follows:
Table A3.2 Comparing the Matching Access Method Using the BY Statement to the
Direct Access Method Using Index Values

MODIFY with BY statement:
   Matching observations in the master data set are retrieved by a generated WHERE
   statement. The WHERE statement can use an index created on the master data set.
MODIFY with KEY= option:
   Matching observations in the master data set are retrieved through the index
   specified with the KEY= option.

MODIFY with BY statement:
   If duplicate BY values exist in the master data set, only the first occurrence is
   updated.
MODIFY with KEY= option:
   If duplicate KEY= values exist in the master data set, the duplicate values can be
   updated by using a DO UNTIL loop. The DO UNTIL loop can be written to force the
   execution of the KEY= option on the master data set until a non-match condition
   occurs. See “Using MODIFY with Duplicate Key Values in the Master Data Set” on
   page 897.

MODIFY with BY statement:
   If duplicate BY values exist in the transaction data set, they are applied one after
   another to the same observation in the master data set, unless you write an
   accumulation statement. Otherwise, the last duplicate is applied.
MODIFY with KEY= option:
   All consecutive duplicates in the transaction are applied by using the UNIQUE
   option. If the UNIQUE option is not specified, only the first consecutive duplicate
   is applied. All non-consecutive duplicates are applied to the observation in the
   master data set one after the other, so only the last duplicate is applied. See
   “Using MODIFY with Duplicate Key Values in the Transaction Data Set” on page 895.
MODIFY with KEY= option             _DSENOM   Specifies that the master data set does not contain the observation.
MODIFY with KEY= or BY statement    _SOK      Specifies that the observation was located.
The complete list of mnemonics and their current, corresponding numeric values
and descriptions for _IORC_ are contained in the SYSRC member of the autocall
macro library. You can view the contents of the SYSRC member in the SAS log by
submitting the following code:
options source2;
%include sasautos(sysrc);
run;
Note: Beginning with Version 7, the IORCMSG function returns a formatted error
message associated with the current value of _IORC_.
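The mnemonics and the IORCMSG function can be combined so that expected outcomes are handled quietly and anything else is reported. The following is a minimal sketch that reuses the master and trans data sets from this appendix; the variable errormsg and its length are assumptions, and errormsg exists only in the program data vector.

```sas
data master;
   set trans(rename=(nickname=tnicknam));
   modify master key=ssn;
   length errormsg $ 200;                 /* hypothetical work variable for the message text */
   select (_iorc_);
      when (%sysrc(_sok)) do;             /* match found: apply the update */
         nickname=tnicknam;
         replace;
      end;
      when (%sysrc(_dsenom)) do;          /* no match: expected, so clear the error flag */
         _error_=0;
      end;
      otherwise do;                       /* any other return code: report it and end the step */
         errormsg=iorcmsg();
         put 'Unexpected I/O condition: ' errormsg;
         stop;
      end;
   end;
run;
```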
statement and the POINT= option in conjunction with the MODIFY statement.
Observe the order of the following statements:
data master;
   set trans;
   modify master key=ssn;
   i+1;
   if _iorc_=%sysrc(_sok) then do;
      set trans point=i;
      replace;
   end;
   else _error_=0;
run;
The SET statement loads the values of the trans variables into the PDV. Next, the
MODIFY statement is executed. The values of all like-named variables in the PDV
are overlaid with the values from the master data set when a fetch is successful. If
a REPLACE or OUTPUT statement were executed at this point, master would be
updated with the master values in the PDV for the like-named variables.
However, the above code issues a second SET statement, using the POINT= option
(with the counter i+1) to read the same observation in the trans data set. This
effectively updates the PDV with the like-named variables from the trans data set.
As discussed earlier, the POINT= option specifies a variable from another data
source whose value is the number of an observation that you want to modify in the
master data set.
If you have only a few variables to update, you can rely on the normal processing of
the KEY= option, along with explicit assignment statements. When using KEY=, the
trans data set must have variables with the same name as those key variables
used to create the index on the master data set. The trans data set is read with the
SET statement and the values for the key variables are loaded into the PDV. The
value loaded into the PDV is used against the index values of the master data set
specified by the KEY= option to fetch the matching observation.
No automatic update of the values for like-named variables occurs between the
trans and master data set. So in order to change the value of a given variable, an
explicit assignment of the master data set variable is made.
data master;
set trans(rename=(nickname=tnicknam));
modify master key=ssn;
if _iorc_=%sysrc(_sok) then do;
nickname=tnicknam;
replace;
end;
else _error_ = 0;
run;
The differences between the MODIFY and SET statements are listed below:
MODIFY statement:
   Updates the original data set opened in update mode without creating a copy.
SET statement:
   Creates a temporary data set with the updates applied. Upon successful completion
   of the DATA step, the original data set is replaced with the temporary data set if
   the DATA statement and the SET statement refer to the same data set name. If the
   data set name is different, the temporary data set is renamed to that existing in
   the DATA statement.

MODIFY statement:
   Variables cannot be added or dropped from the modified data set. The header
   information for a data set opened for update cannot be modified.
SET statement:
   The output data set can contain new variables and variables existing in the lookup
   and primary data sets.

MODIFY statement:
   The REPLACE statement updates the current observation of the original data set.
SET statement:
   The REPLACE statement is not valid since a copy of the original observation is
   updated and OUTPUT.

MODIFY statement:
   The original data set is updated with missing values from the transaction data set
   using the UPDATEMODE= option.
SET statement:
   The UPDATEMODE= option is not valid since a copy of the original observation is
   updated with missing values and OUTPUT.
As with the MODIFY statement, using the _IORC_ automatic variable in conjunction
with the %SYSRC autocall macro provides more error-handling information. When
you use the SET statement with KEY=, _IORC_ is created and set to a return code
that indicates the status of the most recent I/O operation performed on an
observation in the data set. If the KEY= value is not found in the master data set,
_IORC_ returns a numeric value that corresponds to the %SYSRC autocall macro's
mnemonic _DSENOM and the automatic variable _ERROR_ is set to 1.
Note: When issuing multiple SET statements using the KEY= option, you must
perform separate error checking of the automatic variable _IORC_ on each data set.
That is, in order to produce an accurate output data set, test the _IORC_ variable
following each SET statement using the KEY= option.
Using the SET statement with KEY= supports the concept of lookup values. The
lookup value can be provided through another SAS data set, an external file, a view,
or a FRAME entry. For Version 6, the lookup data set must be a native SAS data set.
Version 7 supports SAS/ACCESS views and the DBMS engine for the LIBNAME
statement.
data combine;
set lookup;
set primary key=partno;
select(_iorc_);
when (%sysrc(_sok)) do;
output;
end;
when (%sysrc(_dsenom)) do;
_error_ = 0;
end;
otherwise;
end;
run;
The lookup value from the lookup data source is used as a key to locate
observations in the primary or master data set. This primary data set must be
indexed (either simple or composite) and specified with the KEY= option. The
lookup values must be provided through variables named the same as the key
variables in the primary data set. For example, if the lookup data source is a SAS
data set, it must contain variables with the same name as those defined to the
index of the primary data set. Once the observation is successfully fetched from
the primary data set, an action such as OUTPUT can be performed on the
observation.
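Before KEY= can be used, the index must exist on the primary data set. One way to create it is sketched below, assuming that primary is stored in the WORK library. The composite index name partsup and the variable supplier in the comment are hypothetical.

```sas
proc datasets library=work nolist;
   modify primary;
   /* Simple index: the index name is the variable name, so use key=partno */
   index create partno;
   /* Composite alternative: index create partsup=(partno supplier);
      the SET statement would then specify key=partsup */
quit;
```

Note that the MODIFY statement in PROC DATASETS names the data set whose attributes are changed; it is not the DATA step MODIFY statement discussed in this appendix.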
Note: If a DROP or KEEP is not specified, using the KEY= option combines all
variables on the lookup data set and the primary data set. That is, the output data
set contains not only the variables from the lookup data set, but also all the
variables in the primary data set. This does not happen with KEY= and the MODIFY
statement.
For example, consider the following two programs, which are identical except for
the order of the duplicate values for the key variable ssn in the trans data set.
Notice that in Program 1, the trans data set has duplicate key values of
160-58-1223, but they are not consecutive; Program 2 also has the duplicate
values, but they are consecutive.
Program 1 Program 2
The following PRINT procedure shows the results of the above programs.
Note: The DATA= option is used with PROC PRINT to specify the master data set.
If DATA= is not specified, PROC PRINT displays the last created data set opened
for output, which might not be the last one you opened for update.
• From Program 2, the consecutive value of nickname is Kate for ssn in the trans data set.
That value does not update the master data set. The consecutive record of Kate
actually causes a non-match to occur, and only the first observation in trans
with the corresponding value is applied.
The order of duplicate key values in trans affects the results of the master data
set. When searching the index, SAS begins at the top of the index only when the
value of the key variable changes in the trans data set.
If the duplicate key values in the trans data set are not consecutive, the search
starts from the top of the index in both searches for Social Security number
160-58-1223. Therefore, the one and only matched value in the master data set is
located and updated both times, so the value in the last duplicate observation is
applied.
For consecutive duplicate key variable values in the trans data set, when the first
value of 160-58-1223 is supplied by the trans data set, the value for the key
variable changes, so the index search of the master data set begins at the top and the
observation is found. The nickname variable in master is updated to Kathy. When
the consecutive value of 160-58-1223 is supplied by the trans data set, the value
does not change. Therefore, the search does not begin at the top of the master
index; rather, it begins from the current position in the index structure. The
observation is not found, and an update is not performed for the subsequent
duplicate observations. Only the value in the first duplicate is applied.
To assure the same result whether duplicates in trans are consecutive or not, you
can force the software to begin the search at the top of the index by specifying the
UNIQUE option in the MODIFY statement. The UNIQUE option directs SAS to search
from the top of the index, regardless of whether the key variable value changes
from one iteration to the next.
The following code uses the UNIQUE option to produce the same results for each
program. For readability purposes, the IF statement is coded as a SELECT
statement instead.
data master;
   set trans;
   modify master key=ssn/unique;
   select (_iorc_);
      when (%sysrc(_sok)) do;
         nickname=tnicknam;
         replace master;
      end;
      when (%sysrc(_dsenom)) do;
         _error_=0;
      end;
      otherwise;
   end;
run;
For example, the trans data set observation with the ssn value of 161-60-5881
updates only the first corresponding observation of the ssn variable in the master
data set to the value of Josh. Notice that in the PROC PRINT results, the
highlighted record is updated.
data master;
   set trans(rename=(nickname=tnicknam));
   modify master key=ssn;
   select (_iorc_);
      when (%sysrc(_sok)) do;
         nickname=tnicknam;
         replace master;
      end;
      when (%sysrc(_dsenom)) do;
         _error_=0;
      end;
      otherwise;
   end;
run;
To update variable Nickname for multiple observations in the master data set with
the unique, corresponding observation in the trans data set, force a continuous
search of the master data set’s index file using a DO UNTIL loop until a non-match
condition is encountered. Notice that the highlighted observations from the PRINT
procedure indicate that all corresponding records of the ssn value 161-60-5881 are
now updated in the master data set.
data master;
   set trans;
   /* keep searching the index until no further match is found */
   do until (_iorc_=%sysrc(_dsenom));
      modify master key=ssn;
      select (_iorc_);
         when (%sysrc(_sok)) do;
            nickname=tnicknam;
            replace master;
         end;
         when (%sysrc(_dsenom)) do;
            _error_=0;
         end;
         otherwise;
      end;
   end;
run;
When consecutive duplicate values of ssn are supplied by the transaction data set,
you have seen how the UNIQUE option forces a search to start from the top of the
index. For this situation, however, if you use the UNIQUE option, the same
observation in the master data set is located and updated. Therefore, the duplicate
observations in master are not updated.
You must force the search to start at the top of the index by changing the key
variable without using the UNIQUE option, and it must happen between each
observation of the same ssn. Because you need to change the key variable between
consecutive values of ssn in the trans data set, use BY processing to determine
whether more observations of the same ssn in the trans data set are supplied.
Therefore, the trans data set must be sorted for BY processing. To continue the
search in the index for all corresponding matches, use a DO UNTIL statement to
process until a non-match situation is encountered in master.
The following example shows how to temporarily change the key variable value to
a value that is not located in the master data set.
data master;
   set trans;
   by ssn;
   dummy=0;
   do until (_iorc_=%sysrc(_dsenom));
      /* once dummy is set, look up a key value that is not in master so that
         the next real lookup starts again from the top of the index */
      if dummy then ssn='999-99-9999';
      modify master key=ssn;
      select (_iorc_);
         when (%sysrc(_sok)) do;
            nickname=tnicknam;
            replace master;
         end;
         when (%sysrc(_dsenom)) do;
            _error_=0;
            if not last.ssn and not dummy then do;
               dummy=1;
               _iorc_=0;
            end;
         end;
         otherwise;
      end;
   end;
run;
data combine;
   set lookup;
   do until(_iorc_=%sysrc(_dsenom));
      set primary key=partno;
      select(_iorc_);
         when (%sysrc(_sok)) do;
            output;
         end;
         when (%sysrc(_dsenom)) do;
            _error_=0;
         end;
         otherwise;
      end;
   end;
run;

proc sql;
   create table shiplst as
   select a.*, b.desc
   from lookup as a, primary as b
   where a.partno=b.partno;
quit;
When combining SAS data sets with multiple SET statements, SAS does not
reinitialize existing variables to missing for each DATA step iteration. The
nonmissing value on the output data set is the retained value in the PDV from the
most recent match that occurred.
In the following example, the observation in the lookup data set with the value
A812 for Partno is not located in the primary data set. The DESC variable loaded in
the PDV currently holds the value switch from the last successful match for Partno
value A220. Because a match does not occur, this DESC variable value is not
overwritten in the PDV with a new value for Partno A812.
data combine;
   set lookup;
   set primary key=partno;
   select(_iorc_);
      when (%sysrc(_sok)) do;
         output;
      end;
      when (%sysrc(_dsenom)) do;
         _error_=0;
         *desc = ' ';
         output;
      end;
      otherwise;
   end;
run;
To write a missing value for DESC that does not exist in the primary data set to the
output data set, you must execute an explicit assignment statement. In the
assignment statement, set the DESC value equal to missing. This overwrites the
retained value in the PDV with the desired value of missing. In the example code
above, uncomment the DESC= code for the desired output.
data combine;
   set lookup;
   set primary key=partno;
   select(_iorc_);
      when (%sysrc(_sok)) do;
         status="available";
         output;
      end;
      when (%sysrc(_dsenom)) do;
         desc=' ';
         _error_=0;
         output;
      end;
      otherwise;
   end;
run;
DBKEY= can be used to improve performance just as indexing a SAS data file can
improve performance. For example, to improve the performance, the DBKEY=
option can be used in a DATA step with the KEY= option in a SET statement. Note
that you must specify the keyword DBKEY as the value of the KEY= option.
The following DATA step creates a new data file by joining data file keyvalues with
the DBMS table mytable. The software uses the variable deptno with the DBKEY=
data set option to cause a WHERE expression to be passed to the DBMS.
Performance benefits might occur if the DBMS optimizer chooses to optimize the
WHERE expression with an existing index on the DBMS table.
data sasuser.new;
set sasuser.keyvalues;
set dblib.mytable (dbkey=deptno) key=dbkey;
run;