Routine SAS® SQL

Rick Aster
Edition: 1
ISBN: 978-1-891957-21-5 (Paperback)
© 2014 Rick Aster
Portions based on Professional SAS Programming Logic © 2000
www.globalstatements.com
Breakfast Communications Corporation, P.O. Box 176, Paoli, PA 19301-0176 U.S.A. www.breakfast.us
SAS, SAS/ACCESS, and SAS/CONNECT are registered trademarks of SAS Institute Inc., www.sas.com
Contents
Foreword

1. Putting SAS and SQL Together


SQL
SAS
The SQL Procedure
Why Mix SAS and SQL?
SAS and SQL Objects and Terminology
SAS Coding Rules and Conventions
SAS and SQL List Syntax
SAS Tables and SAS Program Files

2. Queries
The Simple Query Expression
ODS and SAS Output
The SELECT Clause and Columns
New Columns and Aliases
Unnamed Columns
Column Attributes
The FROM Clause and Tables
The WHERE Clause
The ORDER BY Clause
List of Values
Storing a Result Set as a Table
Storing a Query Expression as a View
Selecting All Columns
Using Data Set Options
Reserved Words

3. SQL Computations and Conditions


Expressions as Result Columns
Expression Data Types
Constant Values
WHERE Expressions
The WHERE Statement and Data Set Option
Numeric Operators
Comparison Operators
Operator Priority
Operators for Character Values
The CASE Operator
Numeric Functions
SAS Functions Not Available in SQL
Selection Functions
Time Functions
Substrings and String Padding
Text Search Functions
String Concatenation
Functions for Character Processing and Encoding
Environment Functions
Functions for Formats and Informats

4. Summary Queries
Summary Statistics and Aggregate Functions
Null Values in Statistics
Aggregate Functions for Character Columns
Special Arguments for Aggregate Functions
Computations Based on Summary Data
Grouping
The HAVING Clause
Sorting Summary Rows
Query Execution Sequence
Using CALCULATED To Mark Column Aliases

5. SAS Output From SQL


SQL Statements and Global Statements
ODS Destinations
Large Output Tables
Title Lines
ODS Text Lines
Centering Output
Labels as Column Headers
Displaying Missing Values
Formats
Value Formats

6. Combining Tables
Table Aliases
The FROM Clause With a List of Tables
Table Join Operators
ID Columns When Joining Tables
Internal and External Identifiers
Joining Three or More Tables
Groups and Summary Functions When Joining Tables
Self Joins
Subqueries
Subqueries in Table Joins
Subquery as Column Expression
The IN Operator With a Subquery
Set Operators
Set Operators When Columns Aren’t Identical
Set Operator Results When One Set Has No Rows
Combining Summary Data

7. Working With SAS Data


Libraries
The WORK Library
Other Predefined Libraries
Tables
Rows
Changing Values in a Table
Columns
Data Types
Column Attributes
Indexes
Integrity Constraints
Views
Views That Contain Library Definitions
Editable Views
The CONTENTS Procedure
Deleting
Data Set Options
DICTIONARY Tables

8. Working With DBMS Data


Two Ways to Connect
SAS/ACCESS for Relational Databases, and Access Options
SQL Pass-Through
Avoiding Confusion When Working in Multiple Environments
Converting Database Columns to the SAS Environment
SAS Name Literals and the RENAME= Data Set Option
Time Data
Converting Numeric ID Codes
Character Column Lengths
Recoding
Character Transcoding
Strategies for Large Objects
The Database Library Engine
The DATASETS and CONTENTS Procedures With the Database Library Engine
Creating and Populating a Database Table
In-Database Processing
Remote SQL Pass-Through
Database Performance Issues

9. SQL Options and Execution


SQL Options
A Macro Variable Option
System Options for SQL
SQL Execution

10. Macro Variables for SQL


Working With Macro Variables
Automatic Macro Variables for SQL
SQL Return Codes
Other Automatic Macro Variables
Writing Queries With Macro Variables
Creating Macro Variables From Data Values
Combining Columns in a Macro Variable
Writing Results in the Log
Creating a List of Values in a Macro Variable
Creating a List of Macro Variables

Appendix 1. The SAS Data Model

Appendix 2. SQL Reserved Words

Index
Foreword
The word “routine” in the title of this book has a double meaning. The book focuses on the common, everyday tasks
in programming in SAS and SQL — work that might be part of your daily routine of working with data, or perhaps
the first things you might want to do as you begin putting SAS data handling and SQL queries together. You can
certainly find more advanced coverage of techniques for either SQL or SAS, but in writing this book, my
assumption was that you would want to put off most of the more difficult techniques until after you had mastered the
essential and straightforward tasks of SAS SQL.
At the same time, the book places a special emphasis on functions and formats. Among the small, reusable
program units, or routines, of the SAS System, these are the two kinds that are important when you are working in
SQL. They can serve as the building blocks of much of the programming you do in the SAS environment. You
could already be familiar with many of these routines if you have done previous work in SAS, but I did not want to
assume that, nor did I want to send you off to another book that shows how the routines may be used in a different
context.
This book, after all, is as much for the SQL programmer starting to work in SAS as it is for the SAS specialist
beginning to explore SQL. SQL has earned its place as one of the cornerstones of the SAS environment, allowing
programmers to use the techniques and the data of the relational database world within SAS. For many people, data
center workers or data analysts who draw data from data centers, SQL is at the center of the programming they do in
SAS. This book takes on that point of view, putting SQL at the heart of SAS work, in order to explain the common
tasks of SAS SQL as simply and directly as possible.
1
Putting SAS and SQL Together
SAS and SQL make a useful combination. SAS is powerful, programmable integrated software for working with
data in a wide range of ways, including collecting, organizing, and analyzing it and presenting it in formatted
reports. After you have collected and organized data, SQL is the closest thing there is to a standard language for
selecting a specific part of the data you have collected. SQL is used most often for retrieving, or querying, data, but
it also includes statements for managing data. In SAS, the SQL procedure makes it possible for SQL statements to
act on SAS data.

SQL
SQL (most often pronounced as the three letters “S-Q-L”) is not a programming language in the most traditional
sense. Programming implies that you are defining a sequence of actions in advance, but SQL statements are careful
not to specify actions in too much detail. Instead, each SQL statement spells out a specific intended result, or
outcome. The letters in SQL stand for Structured Query Language, and in the long form of the name the word query
points to the idea of asking a question of the data and receiving an answer.
The result orientation of SQL is what allows it to be a standard, operating in a wide range of database
management systems (DBMSs), along with the SAS environment. The SQL programmer does not have to know all
the details of the way the data is stored. Often it is sufficient to know the names of the data elements, which in SQL
are called columns, and the names of the tables that contain them.
IBM developed SQL in the 1970s as a database interface based on the relational database model, which IBM
also developed around the same time. The relational database model defines a streamlined formal structure for data
in a database, and SQL is designed to make use of this structure.
With its promises of flexibility and efficiency, the relational database model quickly took charge of the world of
business data starting in 1979, and SQL has been part of nearly every relational database management system since.
SQL has been the subject of a series of ANSI and ISO standards since 1986, though no real-world implementation
of SQL ever followed any standard exactly.

Pre-Release
A pre-release version of SQL was called SEQUEL. This name stood for “Structured
English Query Language” and was also a pun on “seek well.” The English-like qualities of
the language were taken out in a redesign that made the language more reliable, resulting
in the early name change from SEQUEL to SQL, though some SQL fans stuck with the
“sequel” pronunciation. (In the most common usage and in SQL standards, the letters are
pronounced separately.) As the pun on “seek” suggests, SQL was originally designed with
searching in mind — with the thought that you would be looking for a very small part of
the data. In some ways this is what SQL is best at, but it is used just as often for operations
on entire tables of data.
SAS
SAS (pronounced like “sass”) might be a few years older than SQL, but it was developed with some of the same
ideas about data. SAS, though, was designed to work with a procedural programming language, also called SAS,
and it is built on a data model that in some ways is a little bit simpler than the relational database model.
The key to understanding the differences of the SAS approach is the idea that data is a means to an end. Data in
SAS may be a starting point or an intermediate stage toward an objective that typically includes analysis and
reporting. Some of the data along the way may be “throwaway” data, in temporary data sets that are discarded as
soon as the work is done. This differs from the perspective that is built into a database, where data is seen as an end
in itself and as the center of any work that might be done.
In the SAS data model, the data elements are variables, though these are essentially identical to the columns of
SQL. Data is organized into tables that have considerably more metadata, or identifying information, than SQL
requires. In SAS these tables are called SAS data sets. New SAS data sets can be created easily, and this is the usual
result of an action in SAS. The SAS data model is described in more detail in Appendix 1.
Programming in SAS is divided into large-scale sequential actions, which are called steps. Most steps create
SAS data sets as output or use them as input, or both. Data steps allow any sequence of actions, usually to create a
SAS data set. Proc steps run procedures, which are special-purpose programs designed to run in the SAS
environment. Usually proc steps use SAS data sets as input, and they may also generate new SAS data sets as
output.

The SQL Procedure


SAS expects a SAS program to contain SAS statements. The SQL procedure provides a way to tell SAS that a
section of the program consists of SQL statements instead. The SQL procedure doesn’t do most of the things that
more typical procedures do, so its syntax can be very simple. It starts with a PROC SQL statement. This statement
tells SAS to start looking for SQL statements. Any number of SQL statements can follow. These SQL statements are
executed one by one. Finally, the PROC SQL step wraps up with the QUIT statement.
The code model for a simple PROC SQL step, then, is just this:
proc sql;
SQL statement
. . .
quit;

The SQL statements in the PROC SQL step have to be written according to the SAS rules for SQL. SAS SQL is
based on the 1992 SQL standard and follows the standard as closely as you would expect, while still adapting it to
the capabilities and requirements of the SAS environment.
The PROC SQL step ends with a QUIT statement. In SAS, the RUN statement is usually the statement that tells
SAS that a step is complete. In the SQL procedure, the RUN statement has no effect, except to produce this log
message:

NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.

The loss of the RUN statement in the SQL procedure happens for technical reasons, related to adjustments that
allow you to include global statements among the SQL statements of a PROC SQL step. TITLE, ODS, and other
global statements modify the output from SQL statements. To allow these statements, the RUN statement, which can
also be a global statement, can’t be treated as the end of the step. Instead, you have to use the statement that tells
SAS, “The proc step is really, really over now.” That statement is the QUIT statement. When it reaches the QUIT
statement, SAS concludes its SQL work and writes its usual end-of-step notes in the log. After the QUIT statement,
SAS stops looking for SQL statements.
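
As a concrete instance of the code model above, the following sketch runs two queries in one PROC SQL step, with a global TITLE statement between them. It assumes the sample table SASHELP.CLASS, which is included with most SAS installations; the table, the column names, and the title text are illustrative choices, not part of the original examples.

```sas
proc sql;
   /* Each SQL statement runs as soon as SAS reaches its semicolon. */
   select name, age
      from sashelp.class
      where age >= 15;

   /* A global statement between SQL statements affects the output that follows. */
   title 'Students Under 13';
   select name, age
      from sashelp.class
      where age < 13;
quit;   /* QUIT, not RUN, ends the PROC SQL step. */

title;  /* Clear the title so it does not carry over to later steps. */
```

The step produces two output tables, and only the second appears under the title.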
Why Mix SAS and SQL?
If you are working entirely within the SAS environment, SQL might be seen as optional. The SAS language has its
own ways of accomplishing all the same things you would expect to do in SQL. The SQL statements, with their
focus on results, may look nothing like the action-oriented SAS statements that do the same thing. For some tasks,
the SQL statements can provide a much clearer or simpler coding approach. For others, though, it can be tricky to
get them to fit into the SQL paradigm at all.
SQL’s focus on results also means you are turning the mechanics of the work over to SAS for it to figure out.
This can often result in a highly efficient approach, perhaps more efficient than you could devise by combining other
SAS language features.
Another reason to use SQL within SAS is simply that SQL is so widely used in the database world. If you have
existing SQL code, you may be able to move this to the SAS environment with only slight changes. Also, the easy
way to connect SAS to a database so that you can use database data as a starting point for analysis and reporting in
SAS is by combining database SQL with SAS SQL in the SQL procedure.
Even though you can run SQL statements directly in a DBMS, there are reasons to bring SAS into the mix. SAS
has capabilities, especially in analysis and reporting, that aren’t available inside the DBMS itself. SAS also makes it
easy to combine data found in two separate databases into a single result. Whenever you use a database and SAS in
combination, SQL provides the easy way to connect the two environments.

SAS and SQL Objects and Terminology


When you first use SAS and SQL together, some of the terminology can be confusing. SAS and SQL were created
with similar ideas of data. Both use many of the same concepts of data organization. However, some of the most
important data objects are called different names in the SAS and SQL environments. The following list translates
between SAS and SQL terminology.

A file containing organized data
SAS term: SAS data file, SAS data set
SQL term: table

A distinct data element
SAS term: variable
SQL term: column

An instance of data, including one value for each associated data element
SAS term: observation
SQL term: row

A special value of a data element that indicates that a value is not available
SAS term: missing, missing value
SQL term: null, null value

A rule that restricts the values of a data element
SAS term: integrity constraint
SQL term: constraint

The SAS and SQL objects are identical or nearly the same, but the terms differ sometimes because of the
different purposes for which SAS and SQL were originally envisioned. The SAS terms variable, observation, and
missing are the terms used in the field of statistics. The SQL terms table, column, and row derive from the other
branches of mathematics that were used to develop relational database theory.
Fortunately, for many other objects, the SAS and SQL terms are the same. If you are talking about a view, index,
or function, the words are the same, whether the context is SAS or SQL.

SAS Coding Rules and Conventions


When you write SQL for SAS, the SQL statements you write form part of a SAS program. To fit in, they have to
adhere to the SAS ideas of what program statements should look like. If you have worked with SAS before, all of
this is familiar and may seem obvious. But if you are an SQL programmer working in SAS for the first time, take
note of the following syntactical concerns.

Statements
Write a semicolon at the end of a statement.

The SQL Procedure


As mentioned a moment ago, SAS uses the SQL procedure to run SQL statements. Write SQL statements between
the PROC SQL statement and the QUIT statement.

Names
A SAS name is a single word. This is the usual rule in computer programming, but it differs from the multiple words
and arbitrary text permitted as column names in some databases that use SQL.
When multiple words are needed in a SAS name, they are actually multiple names, joined with dots (written as
periods) to form a multilevel name. You usually see this with SAS data sets (tables) and other SAS files (and it is the
same with table names in most DBMSs). For example, the SAS data set WORK.MORE is a file called MORE
contained in the WORK library. WORK and MORE are two separate names, but it takes the combination of them to
fully identify the SAS data set.

Quoted Strings
Use either the single quote or double quote to enclose a character literal or other quoted string in SAS.
Quoted strings are used not just for character values, but also for several other kinds of constant values.
SAS names are single words, so quoted strings ordinarily are not used to set off names, as you might see in some
other SQL implementations.

Time
SAS has three kinds of time measurement values. The most common is a SAS date value, which indicates a calendar
day by counting days from 0=January 1, 1960. A SAS time value indicates time of day in seconds since midnight.
Combine date and time of day into a single value (elsewhere often called a timestamp), and you get a SAS datetime
value, which counts seconds since the beginning of 1960.
To make these values easier to work with, you can write constants using familiar calendar and clock elements.
These are examples of constant values:

SAS date constant: '05mar1960'd
SAS time constant: '17:15:00't
SAS datetime constant: '03jan2020 23:45'dt

Though SAS has extensive support for time values and constants, it does not have separate data types for them.
Time measurements are ordinary numeric values.
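
To see that time constants are ordinary numbers, a short sketch such as the following can help. It selects the same constants twice, once as raw values and once with a display format. The source table SASHELP.CLASS and the (OBS=1) data set option are assumptions used only to give the query a one-row source; the column aliases are invented for illustration.

```sas
proc sql;
   /* A FROM clause is required; (obs=1) limits the source to one row. */
   select '05mar1960'd as date_number,
          '05mar1960'd as date_shown format=date9.,
          '17:15:00't  as time_number,
          '17:15:00't  as time_shown format=time8.
      from sashelp.class(obs=1);
quit;
```

The unformatted columns should show 64 (days after January 1, 1960) and 62100 (seconds after midnight), while the formatted columns show the same values as a calendar date and a clock time.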

Case
SAS is mostly case-insensitive, so that you can write keywords and names in any combination of upper- and
lowercase letters. Most programmers feel that they make a SAS program more readable by coding either mostly or
entirely in lowercase letters.
SAS does keep track of the case of variable names, though, and when it displays them as columns it shows the
same case that was used in the SAS program when the variables were first created. This could be a reason to write
variable names in a combination of upper- and lowercase letters, at least when you create a new variable.

SAS and SQL List Syntax


SAS and SQL follow many of the same rules of syntax, at least at the most basic level. The two languages use
keywords, names, options, symbols, spaces, and lines in much the same way.
There is, however, one difference that is especially important to note. In SQL, a reference to a table, column, or
any other data object is an expression. Therefore, in any list of objects in an SQL statement, you must write commas
as separators between list items. This contrasts with the SAS style, in which a reference to an object is merely a
name, and in lists, the names are separated only by spaces.
To see this difference in concrete terms, compare the two code lines below. (Strictly, there are four lines, but two
of them are comments. The symbols /* and */ mark off comments, which are not executed as part of a SAS or SQL
program and merely serve to describe it.) The first is a SELECT clause from SQL, selecting four columns. The second is a VAR statement from a
conventional SAS procedure, such as the PRINT procedure. In the SELECT clause, commas are required between
list items. In the VAR statement, only variable names are permitted, and commas are not allowed between the
variables in the list.
/* SQL list of column expressions */
select component, quantity, pulldate, stockdate

/* SAS list of variable names */
var component quantity pulldate stockdate;

Be alert to this difference in approach. If you are used to the SAS style of coding, remember to write the commas
between items when you write a list in SQL. Conversely, if you are an SQL coder and you start to use some of the
other features of SAS, remember not to write commas in lists of variables in SAS.
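
The same contrast can be tried as two complete steps. This is a minimal sketch that assumes the sample table SASHELP.CLASS and its columns NAME, AGE, and HEIGHT; these names are not from the original example.

```sas
/* SQL: commas separate the column expressions. */
proc sql;
   select name, age, height
      from sashelp.class;
quit;

/* SAS procedure: only spaces separate the variable names. */
proc print data=sashelp.class;
   var name age height;
run;
```

Both steps display the same three columns; only the list punctuation differs.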

SAS Tables and SAS Program Files


As you work with SQL in SAS you will be working with SAS tables and SAS program files.
The details of viewing a SAS table depend on the user interface of the front end application you are using to
interact with SAS. In general, locate the table icon and double-click, or select the Open menu item to view a table.
These additional points may be helpful:

In the command line of the SAS programming environment, the VIEWTABLE command with a table name shows a table in the Viewtable application.
Viewtable may show a table with column labels in place of names. Select the Column Names item in the View menu to show column names.
In Microsoft Windows, SAS Universal Viewer is a stand-alone application that displays SAS files.
You can view a table indirectly by viewing the output from a query. See “Selecting All Columns” in the next chapter.

SAS SQL code is embedded in a SAS program file, along with SAS statements. A SAS program file is an
ordinary text-only file, conventionally stored with the file extension .sas. Use the SAS programming environment
or any text editor to view and edit SAS program files.
Run SAS programs interactively with the Run menu item or button. Then check the Log window to see what
happened when the program ran. Or, run in batch mode by issuing the operating system command (usually sas) with
the name of the program file, for example, sas myprogram.sas. SAS creates a log file that has the same file name,
but the file extension .log.
2
Queries
SQL syntax is built around specific kinds of expressions, the most important of which is the query expression. The
query expression is the heart of SQL coding. When an SQL statement selects data, it returns it in the form of a table,
with rows and columns. The query expression says which columns and which rows to include in the result.

The Simple Query Expression


A simple query expression may contain three clauses, each beginning with a distinct keyword:

1. a SELECT clause to select columns
2. a FROM clause to identify the source of the data, which could be an existing table
3. a WHERE clause to provide a condition for selecting rows from the source table

The clauses always appear in this same order, and each clause can be used only once. A code model for a simple
query expression is:
select column, column, . . .
from table
where condition

A query expression in this form selects columns and rows from an existing table to produce a result in the form
of a table. To see the result, use the query expression as a statement in the SQL procedure. This statement starts with
the keyword SELECT, so it is known as a SELECT statement. This is an example of a SELECT statement:
select mix_sequence, ingredient, amount, amount_unit
from bakery.ingredients
where recipe = 'Chocolate Chip Cookies';

ODS and SAS Output


When you run a SELECT statement in SAS, SAS prepares and displays a formatted document showing the results of
the query. Consider the SELECT statement above. Supposing that the table BAKERY.INGREDIENTS exists and
contains the columns indicated, and that it has a number of rows including several for Chocolate Chip Cookies, the
resulting output might look generally like this:

mix_sequence  ingredient   amount  amount_unit
1             baking soda  1.5     teaspoon
1             flour        1.25    cup
1             salt         1       teaspoon
2             brown sugar  0.5     cup
2             coconut oil  0.5     cup
2             sugar        0.75    cup
...
Depending on the ODS settings, the output might instead be in the Listing destination, with a text-only look,
something like this:
mix_sequence ingredient amount amount_unit
----------------------------------------------------
1 baking soda 1.5 teaspoon
1 flour 1.25 cup
1 salt 1 teaspoon
2 brown sugar 0.5 cup
2 coconut oil 0.5 cup
2 sugar 0.75 cup
. . .

Despite the different look, the data values are the same. ODS statements determine what happens to SAS output and
let you control many of the details of the appearance of output from SQL; see chapter 5.
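
As a rough illustration of how an ODS statement changes where query output goes, the sketch below opens the Listing destination before a query and closes it afterward. It assumes the sample table SASHELP.CLASS; whether the Listing output is visible, and which other destinations are already open, depends on the SAS interface and its settings.

```sas
ods listing;            /* open the text-only Listing destination */

proc sql;
   select name, age
      from sashelp.class
      where age = 12;
quit;

ods listing close;      /* stop sending output to the Listing destination */
```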

The SELECT Clause and Columns


The beginning of a query expression is the SELECT clause that defines, or selects, the columns that appear in the
result. The clause contains the word SELECT, then a list of columns. In the simple case, you select columns that
already exist in a source table. This is the table indicated in the FROM clause that follows. The SELECT and FROM
clauses together are sufficient to form a query expression.
Prepare to write a query by looking at the source table with the table name, column names, and at least a few
rows of data. Consider this small SAS data set as the starting point for a query:

FIN.EURONOTE
Value  Color   Width  Height  Series
5      Grey    120    62      2013
10     Red     127    67      2002
20     Blue    133    72      2002
50     Orange  140    77      2002
100    Green   147    82      2002
200    Yellow  153    82      2002
500    Purple  160    82      2002
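
To try the queries in this section, the table has to exist somewhere. The sketch below builds a copy with the same rows in the WORK library. The name WORK.EURONOTE is a stand-in, because the FIN libref is not defined in this text; the column types and the character length of 8 are assumptions.

```sas
proc sql;
   /* WORK.EURONOTE is a stand-in for FIN.EURONOTE. */
   create table work.euronote
      (Value  num,
       Color  char(8),
       Width  num,
       Height num,
       Series num);

   /* One VALUES clause per row, in column order. */
   insert into work.euronote
      values (5,   'Grey',   120, 62, 2013)
      values (10,  'Red',    127, 67, 2002)
      values (20,  'Blue',   133, 72, 2002)
      values (50,  'Orange', 140, 77, 2002)
      values (100, 'Green',  147, 82, 2002)
      values (200, 'Yellow', 153, 82, 2002)
      values (500, 'Purple', 160, 82, 2002);

   /* Display the new table to confirm its contents. */
   select value, color, width, height, series
      from work.euronote;
quit;
```

With this table in place, any query written against FIN.EURONOTE can be run by substituting WORK.EURONOTE in the FROM clause.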

Write a simple SELECT clause by forming a list of the column names you decide to include. Suppose you want
a result that shows COLOR and VALUE. Write:
select color, value

Add FROM and the name of the table, and you have formed a valid query. For this example, this is the query,
with a semicolon added to form a statement:
select color, value
from fin.euronote;

This is the resulting output table:

Color Value
Grey 5
Red 10
Blue 20
Orange 50
Green 100
Yellow 200
Purple 500
The SELECT clause does not merely choose the columns of the output. It also determines the order they appear in,
as you see in the output table above. Change the order of the columns in the SELECT clause, and the output changes
accordingly, as the following revision demonstrates.
select value, color
from fin.euronote;

Value  Color
5      Grey
10     Red
20     Blue
50     Orange
100    Green
200    Yellow
500    Purple

SQL does not promise to retrieve rows in any particular order. In simple cases with one table, SAS generates
output rows in the same order as the input rows, as you see above. When necessary, you can control the sequence of
output rows by writing an ORDER BY clause, described shortly.

New Columns and Aliases


The output columns do not all have to be columns that already exist in the input. You can create new columns in
various ways, mainly:

by providing a new name for an existing column
as a constant value
by computing a value from the existing columns

When you create a name within a query in SQL, the new name is called an alias. Write the word AS and the alias
after the item you are renaming. This is an example of using an existing column with a new name:
value as Denomination

In this example, the query is selecting a table column called VALUE, but is renaming it as DENOMINATION.
Using an alias to provide a name is a necessity when you use a constant or expression to create a new column, if
you want the new column to have a name at all. To create a constant column, which has the same value for all rows,
write the constant value as the column expression, along with a column alias. Here are three examples:
'The Beatles' as band,
'01jan1960'd as bandyear,
4 as bandmembers

A column can be computed from existing columns and constant values. This example computes the new column
MID_VALUE as the average of 1 and HIGH_VALUE:
(1 + high_value)/2 as mid_value

A SELECT clause can contain any combination of the different kinds of column expressions. The example
below shows a SELECT clause with a constant column, an original table column, and a computed column.
select
'Euro' as Currency,
value as Denomination,
round((width*height/1000000)*89.1, .001) as Weight
from fin.euronote;
Below is the output from the example.

Currency  Denomination  Weight
Euro      5             0.663
Euro      10            0.758
Euro      20            0.853
Euro      50            0.96
Euro      100           1.074
Euro      200           1.118
Euro      500           1.169

In the example above, the CURRENCY column is a simple character constant (a character literal), and the
WEIGHT column shows a simple physical computation from numeric constants and table columns. SAS SQL
allows many other possibilities for constant values and column expressions. The next chapter describes constants
and expressions in detail.
Aliases are not just for columns that result from computations. Use aliases:

When a column is used more than once in the same query, to ensure that all result columns have distinct names. This occurs more often when a query uses two or more tables, as described in chapter 6. Having distinct names for columns might not be essential in all SQL environments, but it is especially important in SAS.
To provide a more meaningful or useful name for a result column that will be stored in a table.
To name a column for display in an output report. You may want to show a combination of upper- and lowercase letters; SAS will store and display the name using the case you provide when you first create the name.

When a column alias is created in the SELECT clause, you might expect that you could refer to the column alias
in other clauses of the same query, such as the WHERE clause. However, this is recommended only for the ORDER
BY clause that provides the sort order of the result set. In other clauses, the new column might not be available. For
a discussion of the issues involved, see “Using CALCULATED to Mark Column Aliases” at the end of chapter 4.
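
As a brief preview of that discussion, the sketch below uses a column alias in both a WHERE clause and an ORDER BY clause. It assumes the sample table SASHELP.CLASS with numeric columns WEIGHT and HEIGHT; the alias RATIO and the cutoff 1.8 are arbitrary choices for illustration.

```sas
proc sql;
   select name,
          weight / height as ratio
      from sashelp.class
      /* CALCULATED marks RATIO as an alias defined in this SELECT clause. */
      where calculated ratio > 1.8
      /* ORDER BY can refer to the alias directly. */
      order by ratio;
quit;
```

Without the keyword CALCULATED, the WHERE clause would look for a column named RATIO in the source table and would not find one.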

Unnamed Columns
If you do not provide an alias for a computed column (or any other column that is not simply a table column), then
the column is created without a name.

In an output table from a SELECT statement, the unnamed column is displayed with a blank column header.
If you create a table or view, SAS generates a name for the column. SAS creates names for unnamed columns by combining the prefix _TEMA with the numeric suffixes 001, 002, and so on.
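
One way to observe the generated name is to store a query with an unnamed column as a table and then ask SAS to describe the table. This is a sketch under the assumption that the sample table SASHELP.CLASS is available; the table name WORK.HEIGHTS and the unit conversion are invented for the example.

```sas
proc sql;
   /* The second column has no alias, so it has no name in the query. */
   create table work.heights as
      select name, height * 2.54
         from sashelp.class;

   /* DESCRIBE TABLE writes the table definition to the log,
      including the name SAS generated for the unnamed column. */
   describe table work.heights;
quit;
```

Adding an alias, for example height * 2.54 as height_cm, avoids the generated name.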

Column Attributes
Every column you create has several attributes that determine specific details of the way the column is used. You
can set a column’s attributes by writing column modifier terms after the column name or expression. Form a column
modifier by writing the attribute name, an equals sign, and an appropriate value for the attribute.
Those familiar with SAS syntax may notice that these terms are mostly the same as the terms in the ATTRIB
statement used in other steps in SAS. If you write multiple attributes, write them in any order. If there is also a
column alias, the AS clause may appear anywhere among the column modifiers, but it is perhaps easiest to find if
placed last, as shown in the code model below.
column expression attribute=value . . . AS alias,
. . .

The available attributes are length, label, format, informat, and transcode. They are discussed in detail in chapter
7. Format and label provide details of the visual presentation of a column, as discussed in chapter 5. The label may
be used as a column header, and the format controls the appearance of the column’s values.
The most immediately important attribute, though, is the length attribute. For a character column, the length
attribute sets the length of the column, which determines how many characters its values can hold. It may be
important to set the length of a column, especially a computed column, if a column comes with unwanted trailing
spaces or if the length you want is different from the length that SAS offers for the column. This use of the length
attribute is described in more detail in the next chapter.
Below is an example of a SELECT statement that uses the LENGTH, FORMAT, and LABEL column modifiers.
select
startdate format=date9. label='Start Date',
enddate format=date9. label='End Date',
name length=4 label='Short Name' as shortname
from main.events
order by startdate, enddate;

The INFORMAT= and TRANSCODE= column modifiers can also be used to set those two respective attributes,
usually when creating a table. The informat attribute indicates a routine and arguments for converting text to a value.
This may be important when data in a table is edited interactively. The transcode attribute is used only for character
columns. It has the value YES or NO to indicate whether it is permissible to convert the value to another character
encoding. See chapter 7 for more information.

The FROM Clause and Tables


The FROM clause, as mentioned, identifies the data source for the query. In the simplest cases, this is a single
existing table. Write the keyword FROM and the name of the table.
In the SAS environment, a table is usually a SAS data set. The name is ordinarily written as a two-level name,
two names connected by a dot. The first name is the libref, which identifies the library that contains the SAS data
set. The second name is the member name, identifying the specific file.
We use the word table when describing how to form a query expression, but the data source for a query can just
as easily be a view. A view is a kind of program that supplies data in the form of a table. There are several kinds of
views in SAS, including the SQL view, which is described shortly. You can use a view in a query expression
anywhere a table is called for.
In practice, most queries require a combination of at least two tables. The FROM clause can indicate a list of
tables when needed. There are some complexities involved when you combine tables in SQL. The strategies and
syntax that are required are described in chapter 6. There are other data sources that a FROM clause might indicate:

A system table that contains detailed information about the state of the environment. In
SAS, these are the DICTIONARY tables. See “DICTIONARY Tables” in chapter 7.
Another query, enclosed in parentheses. This is called a subquery. The result set of the
subquery is in the form of a table, so it does not require any special adjustment to use this
immediately as a starting point for another query. However, in SAS, there are not many
situations where using a subquery provides an advantage. See “Subqueries” in chapter 6 for
a discussion of subqueries and alternatives.
Two tables joined with a table join operator. Table join operators provide several specific
ways of combining tables. See chapter 6.
The result of a DBMS query. This is indicated by the CONNECTION TO operator along
with a query that SAS hands off to a DBMS to execute. The query is written using the
DBMS’s rules of SQL syntax and is known as a pass-through query. The technique is
known as SQL pass-through and is the primary means of transferring data from databases to
SAS. See chapter 8.
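The sketch below shows two of these data sources in use. It assumes that the GEO libref is assigned and that GEO.LAKES contains the columns NAME, AREA, and COUNTRY; the table alias L is an arbitrary choice.

```sas
proc sql;
   /* A DICTIONARY table as the data source:
      one row for each table or view in the WORK library */
   select memname, memtype, nobs
      from dictionary.tables
      where libname = 'WORK';

   /* A subquery, enclosed in parentheses, as the data source */
   select name, area
      from (select name, area, country from geo.lakes) as l
      where country = 'New Zealand';
quit;
```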

A SELECT clause and a FROM clause together are all you need to form a valid query. A query written this way
returns all the rows found in the source data. Often, though, you want to include only some of the available rows.
That is the purpose of the WHERE clause.

The WHERE Clause


When you look at a table, you don’t necessarily want to look at every row. To choose which rows appear in the
result set, write a WHERE clause after the SELECT and FROM clauses. In the WHERE clause, write a logical
condition for selecting rows. This is an example of a query expression with a WHERE clause:
select
name, area, length, elevation
from geo.lakes
where country = 'New Zealand'

The WHERE condition, country = 'New Zealand', has the effect of selecting only New Zealand lakes from a table of
lakes. Note that a column used in the WHERE condition, COUNTRY in this case, does not have to be listed in the
SELECT clause.
The condition in a WHERE clause is formed using the same operators and functions that you can use in creating
the expression for a computed column. See the next chapter for details of forming an expression.

The ORDER BY Clause


The ORDER BY clause indicates the order of the rows in the result set. When you add an ORDER BY clause to a
query, you are telling SAS to sort the result set in that order. If the sort order is based on multiple columns, write the
list with commas between the columns. Write the modifier DESC after a column to sort in descending order of that
column. This is an example of a SELECT statement with an ORDER BY clause:
select
name, area, length, elevation
from geo.lakes
where country = 'New Zealand'
order by area desc, length desc;

This ORDER BY clause sorts by descending values of AREA and LENGTH in order to show the largest lakes first
in the output.
If you are sorting by a column that is calculated in the SELECT clause, provide a column alias for the column in
the SELECT clause, then write this alias in the ORDER BY clause. Unlike the other clauses in a query, the ORDER
BY clause points more toward the result set than toward the original source data.
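A sketch of sorting by a computed column follows. The alias MEANWIDTH is a hypothetical name for the ratio of AREA to LENGTH, and the units of the GEO.LAKES columns are not assumed.

```sas
proc sql;
   select name, area, length,
          area/length as meanwidth   /* computed column with an alias */
      from geo.lakes
      where country = 'New Zealand'
      order by meanwidth desc;       /* the alias refers to the result set */
quit;
```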

List of Values
In a SELECT clause, write the keyword DISTINCT before the list of columns to eliminate duplicate rows from the
result. The result, then, is a list of the distinct combinations of values of the selected columns. If you list only one
column with the DISTINCT option, it produces a list of all the different values of that column.
For the purposes of the example that follows, suppose that EN.LETTER is this table (continuing to include the
26 letters of the English alphabet):

EN.LETTER
LETTER TYPE
A Vowel
B Consonant
C Consonant
D Consonant
E Vowel
F Consonant
G Consonant
...

With this data, you can easily see the effect of adding DISTINCT to a query. First, consider this query:
select type from en.letter;

The output has one row for every row in the input:

TYPE
Vowel
Consonant
Consonant
Consonant
Vowel
Consonant
Consonant
...

Below the query is revised to add the word DISTINCT after the word SELECT.
select distinct type from en.letter;

The revised output shows one row for each different value of TYPE, and it arranges them in sorted order. TYPE has
two different values, so there are two rows in the output table:

TYPE
Consonant
Vowel

Storing a Result Set as a Table


A SELECT statement takes the result set of a query and uses it to create an output object. Result sets can also be
stored as data. The CREATE TABLE statement creates a table with the result set of a query.
A SELECT statement is converted to a CREATE TABLE statement by adding terms to the beginning of the
statement. The added words are CREATE TABLE, the name of the table, and AS. This is an example of a CREATE
TABLE statement:
create table geo.nzlakes as
select
name, area, length, elevation, maxdepth, volume
from geo.lakes
where country = 'New Zealand'
order by area desc, length desc;
NOTE: Table GEO.NZLAKES created, with 4 rows and 6 columns.

The new table GEO.NZLAKES is a SAS table, stored as a SAS data file. The log note that describes it is similar to
the note that describes a new SAS data file in any other step in a SAS program, but it uses the SQL words table,
rows, and columns in place of the SAS words SAS data set, observations, and variables.

Storing a Query Expression as a View


Change the word TABLE to VIEW and you have a CREATE VIEW statement. You can use a view in much the
same way that you would use a table. The syntax is the same, except that the ORDER BY clause is usually left out of a
CREATE VIEW statement; SAS accepts it, but the sort is then repeated every time the view is read. In spite of the similar statement syntax, creating a view is a very different process from
creating a table.
Creating a view involves much less work than creating a table. The CREATE TABLE statement executes a
query and stores the result set in a table. It has to do all the work of a query before it can create the new table. The
CREATE VIEW statement checks the syntax of the query, then stores the query for later use. The real work of the
query is saved for later. SAS does not execute the view until the view appears in a subsequent query.
The data values are not considered at all in the process of creating a view. The log note says only that a new
view has been stored:
create view work.cables as
select source, target
from work.net
where source is not null and target is not null;

NOTE: SQL view WORK.CABLES has been defined.

The log does not indicate any information about the result set of the query, such as the number of rows and columns,
because the query has not yet been executed. The query is executed whenever a later statement or program reads
from the view.
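As a sketch, a later query can read the view WORK.CABLES exactly as it would read a table. The stored query runs at this point, and any sort order that is wanted can be requested here.

```sas
proc sql;
   /* The query stored in the view is executed now,
      each time the view is read */
   select source, target
      from work.cables
      order by source, target;
quit;
```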
Which is better, you might ask, a table or a view? It depends in part on what you want to do with the data. If you
need a stable, fixed set of data for analysis, only a table will do. If you must have the latest data no matter what, you
are better off with a view. In other situations, the answer may depend on the way the data flows. A view might be
updated automatically every time, but a table too can be updated as often as you like, by running the CREATE
TABLE statement again. A view may be preferable, then, if the source data is updated more often than the view is
queried. But a table can be less work when data is updated on a regular, known schedule.

Selecting All Columns


Sometimes you want to see all the columns of a table. You could list all the column names in the SELECT clause,
but what if you don’t know the exact names of the columns? SQL uses the symbol * to indicate all available columns
in a table. (This symbol is commonly pronounced “star” in SQL, even though you write it using the character
“asterisk.”)
This example uses the option INOBS=5 to limit the number of rows read from the input table, and so the number of rows in the output. This limits the size of the
result, which otherwise might be large because of a potentially large number of columns.
The example also uses the option NOLABEL to show column names in the output. This makes it easy to revise
the query, replacing * with a list of column names.
options nolabel;
proc sql inobs=5;
select *
from main.mytable;

If the table actually has more than 5 rows, SAS writes this warning in the log to remind you that your output was cut short:
WARNING: Only 5 records were read from MAIN.MYTABLE due to INOBS= option.

Using Data Set Options


Data set options are options that change the way a SAS data set is stored or read. The same data set options that are
used with SAS data sets in other steps can be used in much the same way when SAS data sets appear as tables in
SQL statements. Write the data set options in parentheses after the SAS data set name.
An example of a data set option that is just as useful in SQL as anywhere else is the COMPRESS= option, which
can apply data compression to the rows in a table. The following example applies the COMPRESS=NO option, to turn
compression off, to the new table WORK.LOGINS.
create table work.logins (compress=no) as
select distinct username from work.netsession;

Data set options are not used as often in SQL as elsewhere in SAS, as many of their most useful capabilities are
provided by other features in SQL syntax. Specific data set options and cases for using them are discussed in chapter
7.

Reserved Words
The SQL standards restrict the use of all words that SQL uses as keywords. In standard SQL, these reserved words
cannot be used, at least not by themselves, as the names of SQL objects. SAS, though, has no such restrictions on
variable names, and it wants to let you use whatever names you might have as SQL columns, so SAS SQL reserves
only a few words. The names CASE and USER cannot be used as column names. CASE is an operator for selecting
a value based on logical conditions; USER is a special identifier that provides the user ID. If these are names of
columns that you want to use in a query, use the RENAME= data set option to change their names.
As an example, consider a table containing the column USER. You can’t use this column directly in a query, but
by renaming USER with any other name, you can use the column. The statement below uses the RENAME= data set
option to change the name of the column USER to USERNAME so that it can be used in SQL.
select distinct username
from work.users (rename=(user=username));

Although USER and CASE are the only column names that SAS won’t recognize, it can be confusing to use
names that are the same as the keywords commonly found in a query. To reduce the potential for confusion, avoid
using SQL keywords as names, and especially try to avoid these keywords as column names:

NULL
AS
FROM
WHERE
HAVING
GROUP
JOIN
INTO
AND
OR
NOT
BETWEEN
IS
CONTAINS
LIKE

You face tighter restrictions with table aliases. Table aliases are essential when you write queries that read
columns from multiple tables. Do not use any SQL clause keyword, table operator, or set operator as a table alias.
For a more detailed discussion of table aliases, see chapter 6. For a list of SQL reserved words, noting those with
SAS restrictions, see Appendix 2.
SAS has several restricted words of its own. These names are easy to spot. They begin and end with a single
underscore. Elsewhere in SAS, the data step creates automatic variables such as _ERROR_ and _N_, and procedures
create variables such as _NAME_ and _TYPE_. Special names such as _ALL_ stand for lists of objects. Conflicts
could occur if you were to create columns, tables, and views with these words as names. This trouble is easily
avoided: do not create names that begin and end with a single underscore.
3. SQL Computations and Conditions
Column expressions are an important part of queries. They are especially important in the SELECT and WHERE
clauses. When the columns you want in a SELECT clause do not already exist in the input tables, you can write
column expressions to compute them. A WHERE expression is formed in much the same
way, usually from table columns, constant values, and operators, but serves a very different purpose. The expression
in a WHERE clause is expected to provide a logical true or false value that separates some rows from others.
Column expressions also appear in other clauses.
SQL column expressions use SAS and SQL operators and functions to compute new values from existing
columns. Virtually the whole range of SAS functions and operators is available for use in SQL. There are also
many additional SQL operators which may not be so familiar to SAS programmers. These operators, particularly the
CASE operator, are important in SQL.

Expressions as Result Columns


The need for column expressions is most obvious in the SELECT clause. The SELECT clause consists of a list of
column expressions. The expressions create the columns of the result set. A column expression can be as simple as
an input table column or a constant value, or it can combine columns, constants, operators, and functions to compute
a value.
The following data on locks in the Panama Canal is used to demonstrate column expressions in the SELECT
clause.

GEO.PANAMALOCK
location      steps  lanes  elev_lo  elev_hi  width  length  depth
Gatun             3      2     0        26.5   33.5     320   12.5
Atlantic          3      1     0        26.5   55       427   18
Pedro Miguel      1      2    16.5      26.5   33.5     320   12.5
Miraflores        2      2     0        16.5   33.5     320   12.5
Pacific           3      1     0        26.5   55       427   18

The following query shows the use of operators and table columns in column expressions. It selects three table
columns, then creates two new columns, RISE and VOLUME, which are computed from existing table columns.
select location, steps, lanes,
elev_hi - elev_lo as rise,
width*length*depth as volume
from geo.panamalock;

location      steps  lanes  rise  volume
Gatun             3      2  26.5  134000
Atlantic          3      1  26.5  422730
Pedro Miguel      1      2  10    134000
Miraflores        2      2  16.5  134000
Pacific           3      1  26.5  422730

Expression Data Types
When you create a column using a column expression, the expression determines the data type of the resulting
column. This means the expression must result in the data type you have in mind for the column. SQL supports a
long list of data types, but only two are actually implemented in SAS. When you create a column using a column
expression, SAS stores the column as either a character value or a numeric value, according to the data type that
results from the expression. In general:

Computations result in numeric values.
Most functions and nearly all operators result in numeric values.
The concatenation operator || and functions that extract and transform character strings
result in character values.
The MAX and MIN operators and functions and the COALESCE function may result in
numeric or character values, depending on the data type of the arguments.
The PUT function (see “Functions for Formats and Informats” at the end of this chapter)
creates a character value.
The INPUT function (again, see “Functions for Formats and Informats” at the end of this
chapter) creates a value of the data type of the informat used as its second argument.

When an expression creates a character value, its length might be longer than the length you want. Use the
length column attribute to set the length. For example, the STNAMEL function returns U.S. state names from 2-
letter codes. It creates a length of 20, long enough to hold the value District of Columbia. In data that is strictly
limited to states, you could shorten the resulting state name column to a length of 14, still long enough for North
Carolina, as shown in this example:

stnamel(state) length=14 as statename
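
Placed in a complete query, the expression might look like the sketch below. The table WORK.OFFICES and its columns are hypothetical and are created here only so that the example can run on its own.

```sas
/* Hypothetical sample data, for illustration only */
data work.offices;
   input city :$12. state :$2.;
   datalines;
Columbus OH
Raleigh NC
;
run;

proc sql;
   select city, state,
          stnamel(state) length=14 as statename   /* length set to 14 */
      from work.offices;
quit;
```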

Constant Values
When you write constant values in a SELECT clause or anywhere else in an SQL statement in SAS, write them as
SAS constants that are consistent with the data type.

Write numeric constants in the standard computer style, with no commas, such as 1.25,
5900, or -6. You can also write scientific notation using the letter E, such as 4.0E6 to indicate
4 million.
Character values can be enclosed in either single or double quotes. (This depends on the
DQUOTE= option; see “SQL Options” in chapter 9.)
SAS does not recognize the SQL keyword NULL as representing a null value in a SELECT
or WHERE clause. (NULL can be used only in the VALUES clause of an INSERT
statement — see chapter 7.) Instead, write a period to indicate a numeric null value. Write a
null string (that is, write '') to indicate a null character value.
Write a SAS date constant as the day number, 3-letter month abbreviation, and year, quoted
and followed by the letter D (for example, '15MAR2015'D for March 15, 2015).
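
The sketch below brings these kinds of constants together in one query on MAIN.EVENTS, the table used in the earlier column attributes example. The STATUS and BUDGET columns are hypothetical, added only to show character and numeric constants.

```sas
proc sql;
   select name,
          startdate format=date9.,
          'Scheduled' as status,          /* character constant */
          1.5E3 as budget                 /* numeric constant, 1500 */
      from main.events
      where startdate >= '15MAR2015'd    /* date constant */
        and enddate ne .                 /* numeric null */
        and name ne '';                  /* character null */
quit;
```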

WHERE Expressions
Like a computed column in a SELECT clause, a WHERE expression is an example of an SQL expression that
results in a single value. A WHERE expression, though, is a logical expression resulting in an indication of true or
false. Input rows that result in a value of true are used in processing the query. Rows that result in a value of false
are skipped. In this way, the WHERE clause may be seen as defining a subset of the rows in an input table.
A WHERE condition may be used to create several different effects: to find a single observation or a group of
observations, to select a range of values, or to qualify or validate rows.
The simplest use of a WHERE condition is to select a single row by matching one or more key variables.
Usually this kind of condition is formed with the = operator. These are two examples:
where supplier_id = 'AQ1455'

where state = 'OH' and county = 'Franklin'

A WHERE condition written in the same way, but with a column that defines groups or categories, may select a
group of rows.
A WHERE expression is also commonly used to select rows that meet a rule or criterion. This expression is
often formed with the > operator, as in this example:
where purchase_ytd > 0

Here is another example, using the BETWEEN-AND operator to select values in a range:
where year between 1979 and 2016

To limit the results to rows that have a value in a particular column, use the IS NOT NULL operator. This
example might be used to select people whose names are known:
where name is not null

To select a set of acceptable or relevant values, use the IN operator. This example is meant to select data from
the United Kingdom, the Netherlands, Germany, and Denmark, based on the ISO 3166-1 alpha-2 country code
values in each row of data:
where countrycode in ('GB', 'NL', 'DE', 'DK')

The WHERE Statement and Data Set Option


SAS considers WHERE expressions so valuable that it lets you use them throughout a SAS program. Write a
WHERE clause as a statement in almost any data step or proc step, other than SQL, to limit the observations read
from the input SAS data set. Or, write the WHERE= data set option with an input or output SAS data set.
A WHERE statement is essentially a WHERE clause written as a separate statement. In a data step, a WHERE
statement refers to the observations of every input SAS data set in the step. In a proc step, a WHERE statement is
associated with the primary input SAS data set.
The WHERE= data set option can be used almost anywhere you read or write a SAS data set, including as a
table in SQL. Like other data set options, the WHERE= data set option is written in parentheses after the SAS data
set name. Since a WHERE condition can contain an equals sign, the condition itself must also be enclosed in
parentheses. Use the WHERE= data set option:

in the DATA statement of a data step to limit the observations stored in an output SAS data
set.
in the SET statement or other action statement that reads a SAS data set in a data step to
limit the observations used from an input SAS data set.
in the DATA= option, OUT= option, or similar option in a proc step where a SAS data set is
used, to limit the input or output observations.
as part of an argument to a function or method that processes data from a SAS data set.
in the FROM clause of an SQL query to limit the rows used from an input table (this is
permitted, but it is usually clearer to write the condition in an ordinary WHERE clause).
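
As a sketch of the last item, the two queries below select the same rows from GEO.LAKES, first with the WHERE= data set option and then with an ordinary WHERE clause.

```sas
proc sql;
   /* WHERE= data set option on the input table;
      the condition has its own set of parentheses */
   select name, area
      from geo.lakes (where=(country = 'New Zealand'));

   /* The same selection written as an ordinary WHERE clause */
   select name, area
      from geo.lakes
      where country = 'New Zealand';
quit;
```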

The WHERE= data set option can even be used when updating a SAS data set. For this situation, there is another
data set option, WHEREUP=, to clarify the effect of the WHERE= option.

With WHEREUP=YES, the WHERE condition applies to both input and output observations. That
is, you can update only observations that meet the condition, and you are prevented from
changing them so that they no longer meet the condition. (You can, however, delete these
observations.)
With WHEREUP=NO, the WHERE condition applies only on input. You update only observations
that meet the condition, but you are free to change them in any way.

When you write WHERE conditions outside of the SQL context, in the WHERE statement and WHERE= data
set option, use a hybrid of SAS and SQL syntax.

Conditions use SAS operators, but there are several exceptions. The <> symbol has its SQL
meaning (is not equal to) rather than its SAS meaning (maximum). The BETWEEN-AND
and string-matching operators from SQL (described below) are also available.
SAS functions can be used, but only those that are permitted in SQL. SQL functions are not
available.
Conditions may refer to variables in the SAS data set, but not to any other variables.

The two examples that follow demonstrate the use of the WHERE statement and data set option. First, to
demonstrate the WHERE statement, the following data step interleaves the SAS data sets WORK.TOUR and
WORK.PROMO, both already in sorted order by COUNTRY, but includes only observations that have a STATUS
value of CONFIRMED.
data work.combined;
set work.tour work.promo;
by country;
where status = 'CONFIRMED';
run;

The example below uses the WHERE= data set option in the TRANSPOSE procedure. The procedure transposes
groups of observations representing events in the SAS data set WORK.CALENDAR. It uses only observations that
have APPOINTMENT_TYPE values of Event Start Date and Event End Date, converting the variable
APPOINTMENT_DATE to the new variables EVENT_START_DATE and EVENT_END_DATE.
proc transpose data=work.calendar
(where=(appointment_type in
('Event Start Date', 'Event End Date')))
out=work.multiday;
by name;
id appointment_type;
var appointment_date;
run;

Numeric Operators
The operators you can use in SQL column expressions in SAS are the same operators that are used to form SAS
expressions in general.
Numeric operators provide the core actions of arithmetic. The SQL standards indicate four numeric operators for
arithmetic. The arithmetic operators are:
+    Addition, identity
-    Subtraction, negation
*    Multiplication
/    Division

These four operators are used in the same way in SAS.
SAS adds one arithmetic operator that is not considered standard in SQL:
**   Exponentiation

The arithmetic operators alone may be sufficient to compute a new column, as in this example of a SELECT
clause:
select length, width, length*width as area

For other examples, look back at the SELECT clause examples at the beginning of the chapter.
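As a further sketch, the exponentiation operator can be combined with the other arithmetic operators. This query uses the GEO.PANAMALOCK table shown earlier; the column name DIAGONAL is a hypothetical alias for the diagonal of each lock chamber's rectangular floor.

```sas
proc sql;
   select location, width, length,
          width*length as floor_area,
          /* square root written as a power of 0.5 */
          (width**2 + length**2)**0.5 as diagonal
      from geo.panamalock;
quit;
```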
In a WHERE clause, an expression almost always involves other operators. Comparison operators such as = and
> are often needed. There are several comparison operators, and they are described in detail next. For more
complicated conditions, these expressions can be combined using the logical operators AND, OR, and NOT, or
extended with the selection operators MIN and MAX. These operators are listed here:
AND  &    Logical and
OR   |    Logical or
NOT  ^    Logical not
MIN       Minimum
MAX       Maximum

Although using the & symbol for the AND operator is permitted in SAS, writing the operator this way is not
necessarily a good idea because of the role of & in macro variable references (see chapter 10).
This example tests for rows where the combined product weight and tare weight is at least 2:
where product_weight + tare_weight >= 2

Use the AND operator to apply two conditions at the same time, as in the example below. This WHERE clause
limits the rows to those that have specific values for PROTOCOL and YEAR:
where protocol = 'SAFETY' and year between 2014 and 2016

Comparison Operators
A comparison operator compares two values. In SQL, the values must belong to the same data type.
The list below shows the core group of comparison operators in SQL. With slight variations, these same
operators form comparisons in many programming languages, including SAS.
=     Equals
<>    Is not equal to
>     Is greater than
<     Is less than
>=    Is greater than or equal to
<=    Is less than or equal to

SQL calls for null to be a clearly distinct value, not having a data type, not able to be compared, and with only a
few exceptions, not able to be used in any way at all. SAS logic departs from this, allowing null values to be
compared. If you compare an ordinary numeric value to a null value with the = operator, the result is false. If you
compare two null values to each other, the result may be true. Null values are considered less than ordinary numeric
values.
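The practical effect of this rule can be seen in a small sketch. The table WORK.READINGS is hypothetical; its second row has a null value.

```sas
/* Hypothetical sample data; the row with id 2 has a null value */
data work.readings;
   input id value;
   datalines;
1 5
2 .
3 -2
;
run;

proc sql;
   /* Selects id 2 as well as id 3, because SAS treats a null
      value as lower than any ordinary number */
   select id, value
      from work.readings
      where value < 0;

   /* To leave out null values, say so explicitly */
   select id, value
      from work.readings
      where value < 0 and value is not null;
quit;
```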
Among the core comparison operators in SQL, only <> clashes with SAS syntax. Elsewhere in SAS, this symbol
represents the maximum operator. In the SQL procedure and WHERE expressions, it means is not equal to. (The
word MAX may be used for the maximum operator.)
SAS allows the core group of comparison operators to be written in other ways, including these two-letter words:
EQ    Equals
NE    Is not equal to
GT    Is greater than
LT    Is less than
GE    Is greater than or equal to
LE    Is less than or equal to

SQL adds three more operators used for checking for specific values and ranges of values.
IN

In; compares a value (on the left) to a list of constant values, in parentheses (on the right). In SAS,
commas are not strictly required between the items in the list of constants.
The IN operator is also available in the data step and elsewhere in SAS.
IS NULL
IS MISSING

Is null; tests for a null value. IS NULL is the standard SQL usage. The word MISSING is allowed in
this operator in keeping with SAS usage.
SAS accepts the IS NULL or IS MISSING operator in SQL and in WHERE expressions. Elsewhere
in SAS, use the MISSING function to test for missing values.
BETWEEN-AND

Tests whether a value falls within a range. The value is usually a column and the range is usually
defined by two constant endpoints. Write the lower value of the range before the word AND and the
higher value after it. Note that this use of the word AND is different from the logical AND operator.

Comparison operators can be logically negated by writing the NOT or ^ operator before them. This results in
additional possibilities for comparison operators, such as:

^>
^=, NOT EQ
NOT IN, NOTIN
NOT IS NULL, IS NOT NULL, IS^NULL

These are examples of expressions using the comparison operators:


age >= 14
start_date + 7 < end_date
segment_number in (1050, 1055, 1280, 1285)
base ne 0
base ne 0 and base is not null
price is null or quantity is null
price is not null and quantity is not null
percent between 0 and 100

Operator Priority
Most expressions involve more than one operator. In this situation, SAS has to decide which of the operators to
evaluate first. You can specifically indicate that an operator is evaluated before the operators around it by enclosing
the operator and its operands in parentheses. In the example below, the subexpressions x - 1 and y - 1 are evaluated
first, then the results are multiplied together.
(x - 1)*(y - 1)

In this example the parentheses are needed to indicate the order of operations, but often parentheses are not
needed. When it has to choose, SAS follows rules of operator priority that say what order it evaluates operators in.
Higher priority operators are evaluated first. When two operators have the same level of priority, SAS follows a rule
to put the operators in order, going from left to right or right to left depending on the priority level. Most of these
rules are spelled out in the SQL rules of operator precedence, and in all, the SAS approach to SQL expressions is
mostly consistent with the usual SAS rules of operator priority.

Level 1, evaluated first, right to left
-a         Negative
+a         Positive
a**b       Exponentiation
a MAX b    Maximum
a MIN b    Minimum

Level 2, left to right
a*b        Multiplication
a/b        Division

Level 3, left to right
a + b      Addition
a - b      Subtraction

Level 4
a || b     Concatenation

Level 5, left to right
All comparison operators

Level 6, right to left
NOT a      Logical not

Level 7
a AND b    Logical and

Level 8, evaluated last
a OR b     Logical or

The comparison operators are all at the same priority level, and this includes a set of special comparison
operators for character strings, discussed below. The CASE operator, also discussed below, does not have an
operator priority level because it is always delineated by the keywords CASE and END.
The main difference from the usual SAS rules for operators is the treatment of the NOT operator. It is in the
highest level of priority elsewhere in SAS, but has a low priority in SQL expressions, though it still comes before
AND and OR.
In general, the logical operators are the most common area of confusion in operator priority. Not all
programmers realize that AND and OR have different levels of priority, and the priority of NOT varies between
languages. To avoid confusion I recommend using parentheses with logical operators:

Enclose the operand of the NOT operator if it is anything more than a function call.
Enclose the AND operator and its operands whenever it is used in the same expression with
any other logical operator.
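
The two recommendations above can be sketched as follows. This is only an illustration: it assumes the sample table SASHELP.CLASS that ships with SAS, with its columns NAME, SEX, and AGE, and the conditions themselves are invented for the example.

```sas
proc sql;
  /* AND has a higher priority than OR, so without parentheses this
     condition is read as (sex = 'F' and age > 13) or age < 12. */
  select name, sex, age
    from sashelp.class
    where sex = 'F' and age > 13 or age < 12;

  /* The same query with the AND operator and its operands enclosed.
     The result is the same, but the intent is visible. */
  select name, sex, age
    from sashelp.class
    where (sex = 'F' and age > 13) or age < 12;

  /* Enclosing the operand of NOT makes its scope explicit. */
  select name, sex, age
    from sashelp.class
    where not (sex = 'F' and age > 13);
quit;
```

The first two queries should return the same rows. The parentheses in the second query change nothing for SAS, but they spare the reader from having to recall the priority rules.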

Operators for Character Values


There are a few special considerations when operators are used with character values as operands.
The concatenation operator || is the only regular SAS operator that works exclusively with character values. It is
discussed and compared to concatenation functions later in the chapter under “String Concatenation.”
The MAX and MIN operators select the maximum and minimum values, the same as they do for numeric
operands. In SAS, a null value in a character column is considered the equivalent of a blank string, and it can be
used with the MAX and MIN operators. When one operand is blank and the other is a name or an alphanumeric
code, the MIN operator selects the blank string; the MAX operator selects the name or code.
When comparison operators are used with character values, usually it is to test for specific values. Use the =
operator to test for a single specific value. This WHERE clause, for example, selects all rows where the value of
COUNTRY is Egypt:
where country = 'Egypt'

Use the IN operator in the same way when you want to match any of several values in a list. This example looks
for rows where COUNTRY is Paraguay or Peru:
where country in ('Paraguay', 'Peru')

To test for a null or blank value, use the IS NULL operator. To test for a nonblank, non-null value, use IS NOT
NULL.
All of the comparison operators, including BETWEEN-AND, have the same meaning with character operands
that they have with numeric operands. The simple comparison operators have alternate versions that can truncate
operands, and there are several special character comparison operators.
In ordinary character comparisons in SAS, trailing spaces are assumed as needed to compare values that have
unequal lengths. SAS can also do character comparison with truncation. In this form of comparison, when two
character values have unequal lengths, the longer value is shortened to the length of the shorter one before the
comparison is done. In SAS generally, these operators are formed by writing a colon (:) after the comparison
operator (such as CODE1 =: CODE2). The colon, though, has a conflicting use in SQL syntax (marking host variables, as
described in chapter 10), so for SAS SQL, comparison with truncation must be written by adding the letter T to the
word form of the operator. That results in these operators:
EQT

Equals, with truncation


NET

Is not equal to, with truncation


GTT

Is greater than, with truncation


LTT

Is less than, with truncation


GET

Is greater than or equal to, with truncation


LET

Is less than or equal to, with truncation

The truncating version of the IN operator (written IN: elsewhere in SAS) is not available in SQL. Instead, form
an equivalent expression using the EQT and OR operators.
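
As a minimal sketch of that substitution, the query below looks for names that begin with either of two prefixes. It assumes the sample table SASHELP.CLASS and its character column NAME; the prefixes are arbitrary.

```sas
proc sql;
  /* Elsewhere in SAS this condition might be written as
     name in: ('Ja', 'Jo'). In SQL, each value in the list
     becomes its own EQT comparison, combined with OR. */
  select name
    from sashelp.class
    where name eqt 'Ja' or name eqt 'Jo';
quit;
```

Because the comparison truncates NAME to the length of the shorter operand, each EQT test here acts as a "begins with" test.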
There are several special character comparison operators for more demanding text-matching situations. The
LIKE, CONTAINS, and sounds-like operators look for partial matches and near-matches between two character
strings.
LIKE

The LIKE operator checks for a match between two strings with wild-card characters. In the second
string, the character _ matches any single character, and the character % matches any sequence of
characters.
CONTAINS
?

The CONTAINS or ? operator tests whether the second operand is contained in the first operand.
=*
EQ*

This is the sounds-like operator, which tests whether two words are likely to sound generally similar
when pronounced in U.S. English. For a match in this algorithm, initial letters must be the same
letter, but subsequent vowel letters and repeats of consonants are disregarded, and similar-sounding
consonant letters are considered to match.

You would typically use the special character comparison operators with a column as the left operand and a
constant value as the right operand. The following examples are shown with two constant operands in order to
demonstrate the effects of the operators. Each of the following expressions evaluates as true.
'XTM1A' like 'XTM_A'
'business' ? 'us'
'broom' =* 'baron'

The CASE Operator


The CASE operator is so much more complex than other operators that it is often classified separately. Unlike any
other operator in SQL, it has four required keywords, CASE, WHEN, THEN, and END. It acts as an operator,
though, generating one value that depends on the value of its operands.
The CASE operator allows conditional logic to determine the result of an expression. Those familiar with data
step programming in SAS might notice that the form of the CASE expression resembles that of the SELECT block
of the data step. This is an example of a column definition that uses a CASE expression:
case
when amount < 0 then 'Credit'
when amount > 0 then 'Debit'
else 'No Balance'
end
as balance_text

This CASE expression makes a series of comparisons, which come after the word WHEN. The resulting value,
after the word THEN or ELSE, is selected according to which of the WHEN conditions is true. The corresponding
value is selected for the first WHEN condition that evaluates as true. If all WHEN conditions are false, the ELSE
value is selected, if there is one. If all WHEN conditions are false and there is no ELSE value, a null value results.
A CASE expression can also be written with a value after the word CASE. This value is compared to values
written after the word WHEN until a match is found. The example below uses a CASE expression to create the new
column PROJECT_PHASE with the value “Pilot,” “Charter,” or “Public,” depending on the value of
ENROLL_YEAR.
case enroll_year
when 2002 then 'Pilot'
when 2003 then 'Pilot'
when 2004 then 'Charter'
else 'Public'
end
as project_phase

This example looks at the value of the column ENROLL_YEAR. When ENROLL_YEAR is 2002 or 2003 it results in
the value Pilot for the new column PROJECT_PHASE. When ENROLL_YEAR is 2004, the resulting value is Charter.
When ENROLL_YEAR has any other value, the value of PROJECT_PHASE is Public.
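
To see the two forms of the CASE expression side by side, here is a sketch of a complete step. The table WORK.ENROLLMENT, its column MEMBER_ID, and the data values are hypothetical, made up only so that the query can run.

```sas
/* Hypothetical sample data for illustration. */
data work.enrollment;
  input member_id enroll_year;
  datalines;
101 2002
102 2003
103 2004
104 2011
;
run;

proc sql;
  select member_id, enroll_year,
    /* Form with a value after CASE: ENROLL_YEAR is compared
       to each WHEN value in turn. */
    case enroll_year
      when 2002 then 'Pilot'
      when 2003 then 'Pilot'
      when 2004 then 'Charter'
      else 'Public'
    end
    as project_phase,
    /* Form with conditions: each WHEN is a complete comparison. */
    case
      when enroll_year in (2002, 2003) then 'Pilot'
      when enroll_year = 2004 then 'Charter'
      else 'Public'
    end
    as project_phase_alt
  from work.enrollment;
quit;
```

The two new columns should hold identical values in every row. The second form is longer here, but it is the one to use when the conditions involve more than one column or anything other than a test for equality.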

Numeric Functions
Most of the functions in SAS are numeric functions used for mathematical computations or other kinds of
processing that use arguments as numbers.
SQL suggests only a few standard functions, including ABS, MOD, CEIL, EXP, FLOOR, and SQRT. These are
also SAS functions.
ABS(x)

Absolute value. This is the argument converted to a positive value. In mathematical terms, it is the
argument itself for a positive argument, or the negative of a negative argument.
MOD(x, modulus)

Modulo. Returns the remainder that results from dividing x by modulus.


CEIL(x)

Ceiling. This is the argument rounded up to the next integer value.


EXP(x)

Exponential function. The mathematical constant e (approximately 2.718) raised to the power
indicated by the argument.
FLOOR(x)

Floor. This is the argument rounded down to the next integer value.
SQRT(x)

Square root.
There are two standard SQL function names that are not available in SAS. If you are looking for the LN function
of SQL, for natural logarithms, use the LOG function instead. SAS also does not have the POWER function. In its
place, use the exponentiation operator, written as **.
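
A short sketch of the two substitutions follows. It assumes the sample table SASHELP.CLASS and its numeric column HEIGHT; the new column names are arbitrary.

```sas
proc sql;
  select height,
    /* Natural logarithm, where standard SQL would use LN(height). */
    log(height) as ln_height,
    /* Exponentiation, where standard SQL would use POWER(height, 2). */
    height**2 as height_squared
  from sashelp.class;
quit;
```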
There are many more SAS numeric functions. These are other SAS numeric functions of note:
INT(x)

Integer truncation. Removes the fractional part of a value to create an integer value.
FUZZ(x)

Fuzz effect. This function rounds near-integer values to integer values.


The fuzz effect is also included in the CEIL, FLOOR, INT, and MOD functions.
CEILZ(x)
FLOORZ(x)
INTZ(x)
MODZ(x, modulus)

Write the Z at the end of these function names to remove the fuzz effect. These versions of the
functions do not automatically round near-integer values to the nearest integers.
ROUND(x)
ROUND(x, unit)

Rounding. Rounds x to the nearest multiple of unit, or to the nearest integer if unit is omitted.


SIGN(x)

Sign; a value that indicates the sign of a number: 1 for positive values, 0 for 0, -1 for negative
values, null for null values.
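
The differences among these functions are easiest to see with a few values that include a negative number. The following is a sketch only; the table WORK.AMOUNTS and its column X are hypothetical.

```sas
/* Hypothetical sample data for illustration. */
data work.amounts;
  input x;
  datalines;
2.7
-2.7
7.45
;
run;

proc sql;
  select x,
    ceil(x) as x_ceil,          /* rounded up */
    floor(x) as x_floor,        /* rounded down */
    int(x) as x_int,            /* fractional part removed */
    round(x) as x_round,        /* nearest integer */
    round(x, 0.1) as x_tenth,   /* nearest multiple of 0.1 */
    abs(x) as x_abs,
    sign(x) as x_sign
  from work.amounts;
quit;
```

For the value -2.7, FLOOR returns -3 while INT returns -2, because INT truncates toward zero rather than rounding down.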

SAS Functions Not Available in SQL


You can use most SAS functions when you write SQL expressions, but there are some functions that wouldn’t quite
make sense in the SQL context or that don’t easily translate to the SQL environment. These functions are not
available in SQL:

Queue functions
The queue functions LAGn and DIFn depend on rows being processed in a well-defined sequence. The
data step’s observation loop and DO loop provide this, but there is nothing that corresponds to this
in SQL execution.
Array functions
Functions such as LBOUND and HBOUND require arrays as arguments, but SQL does not provide
a way to define arrays.
Variable information functions
Variable information functions such as VLABEL and VLABELX return information about variable
attributes. This would make a certain kind of sense in a SELECT clause, but SQL handles query
columns differently from the way the data step handles data step variables, so it is not able to
support these functions.

Other functions, though nominally supported, may be impractical in SQL because they depend on actions
happening in a particular sequence. It may not be possible, for example, to accomplish a task with I/O functions
(reading and writing files), because this requires one function call to open a file, followed by other function calls to
take actions on the file, and SQL does not incorporate this idea of sequence. Similarly, functions such as
ALLCOMB and ALLPERM that generate combinations and permutations are meant to be called repeatedly in a
loop with an index variable. That action-oriented pattern wouldn’t easily be replicated in the results-oriented
environment of SQL.
Selection Functions
In a results-oriented language such as SQL, conditional logic has to fit into expressions, and this makes selection
functions especially important in SQL. A selection function chooses one of several values depending on specific
qualities of the values.
There are several functions for selecting a value from among a few alternatives.
COALESCE(value, . . . )

The COALESCE function returns the first non-null (nonmissing) value among its arguments. The
arguments can be any number of columns of the same data type. Often the last argument is a
constant value, to provide a result in the event that none of the columns can provide a value.
COALESCE can be used to provide a substitute value. In this example, if NAME is null,
SCREEN_NAME is used instead, and if that too is null, the constant value Anonymous is used.
coalesce(name, screen_name, 'Anonymous') as name

The COALESCE function is a standard SQL function. Elsewhere in SAS, a different function called
COALESCE accepts only numeric arguments, and there is a corresponding function called
COALESCEC for character arguments.
IFC(condition, value, value)
IFN(condition, value, value)

The IFC or IFN function selects a value based on a condition. Write the condition as the first
argument. The function returns the second argument if the condition is true, the third argument if the
condition is false. If you provide a fourth argument, the function returns this value if the condition is
a null value.
Use the IFC function if the values are character values, IFN if the values are numeric.
These functions are often useful in data cleaning situations. The example below consolidates weight
values that were measured in either pounds (abbreviated as LB in the column WEIGHT_UNIT) or
kilograms, to create WEIGHT_KG, the weight in kilograms.
ifn(weight_unit = 'LB', weight*0.45359237, weight) as weight_kg

The IFC and IFN functions are similar in some ways to the CASE operator, but the two functions
are limited to the specific task of selecting a value based on a single condition.
LARGEST(rank, value, . . . )
SMALLEST(rank, value, . . . )

The LARGEST or SMALLEST function selects a number from a list based on the way it compares
to the other numbers in the list. Write the rank you want to select, a counting number between 1 and
the length of the list, as the first argument. Write the list of numbers as the remaining arguments.
The function returns one of these numbers as its result, according to the rank you indicate.
For example, to select the largest value among the four columns A, B, C, and D, write
largest(1, a, b, c, d) or, alternatively, smallest(4, a, b, c, d). For the 2nd smallest value, write
smallest(2, a, b, c, d).
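
The selection functions described above can be tried together in one query. This is a sketch under stated assumptions: the table WORK.MEMBERS, its columns, and all of its data values are hypothetical, chosen so that some rows have null values.

```sas
/* Hypothetical sample data. With the DSD option, an empty field
   between commas is read as a null (missing) value. */
data work.members;
  infile datalines dsd truncover;
  length name $ 20 screen_name $ 20 weight_unit $ 2;
  input name $ screen_name $ weight weight_unit $ a b c d;
  datalines;
Ana Ruiz,aruiz,150,LB,7,3,9,5
,nightowl,68,KG,2,8,4,6
,,.,,1,1,1,1
;
run;

proc sql;
  select
    /* First non-null value among the arguments. */
    coalesce(name, screen_name, 'Anonymous') length=20 as display_name,
    /* Pounds are converted to kilograms; other rows are left as is. */
    ifn(weight_unit = 'LB', weight*0.45359237, weight) as weight_kg,
    /* The rank comes first, then the list of values. */
    largest(1, a, b, c, d) as largest_value,
    smallest(2, a, b, c, d) as second_smallest
  from work.members;
quit;
```

In the first row, where A through D are 7, 3, 9, and 5, LARGEST_VALUE is 9 and SECOND_SMALLEST is 5. In the third row, neither NAME nor SCREEN_NAME has a value, so DISPLAY_NAME falls through to the constant Anonymous.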

Time Functions
SAS has dozens of functions for obtaining and converting time values. A large number of functions are needed
because of the complexity of the clock and calendar and the many different ways of measuring time.
There are three main ways of measuring time in SAS. The SAS date values, SAS datetime values, and SAS time
values correspond to the standard SQL data types DATE, TIMESTAMP, and TIME. There are functions to obtain
the current time and to convert among these three kinds of values.
DATE()

Returns the current date as a SAS date value.


DATETIME()

Returns the current date and time as a SAS datetime value.


Use this function if auditing rules require you to record the time when each individual row was
created.
TIME()

Returns the current time of day as a SAS time value.


DATEPART(SAS datetime value)

Extracts the date from a SAS datetime value, returning a SAS date value.
TIMEPART(SAS datetime value)

Extracts the time of day from a SAS datetime value, returning a SAS time value.
DHMS(SAS date value, 0, 0, SAS time value)

Creates a SAS datetime value that corresponds to the date indicated by the SAS date value and the
time of day indicated by the SAS time value.
You might wonder, what are the two 0 arguments in the DHMS function? The DHMS function was
mainly designed to create a SAS datetime value from a SAS date value and separate hour, minute,
and second values. It is the hour and minute arguments that are 0 when you use a SAS time value
for the time of day. With a SAS time value, the hours and minutes have already been combined with
the seconds and are represented in the seconds argument.

In current business data, the time of an event is usually recorded as either a date or a timestamp. It is not as
common to find data in which the date and time of day are recorded separately. However, if you have this kind of
data, with a separate SAS date value and SAS time value referring to the same event, the DHMS function combines
them into a single column. Write a column expression such as the one shown here.
dhms(transaction_dt, 0, 0, transaction_tm)
as transaction_ts format=datetime18.

Not all time computations require functions. The unit of SAS date values is days, so you can do time
computations in days just by adding and subtracting, as appropriate. Similarly, with SAS datetime values, you can
do computations in seconds.
When writing time computations, you may need to write time constants. These are examples of time constants:

SAS date value (days since 1960)


'14SEP2016'D

SAS datetime value (seconds since 1960)


'01MAR1994:18:30:00'DT

SAS time value or elapsed time in seconds


'13:00:00'T
'13:00'T
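
The following sketch combines time constants with simple arithmetic. The particular dates, offsets, and column names are arbitrary. The query reads one row of the sample table SASHELP.CLASS only because a query needs a FROM clause; no columns of that table are used.

```sas
proc sql;
  select
    /* SAS date values count days, so adding 30 moves 30 days ahead. */
    '14SEP2016'd + 30 as plus_30_days format=date9.,
    /* Subtracting two SAS date values gives the number of days between. */
    '14SEP2016'd - '01JAN2016'd as days_elapsed,
    /* SAS datetime values count seconds; 3600 seconds is one hour. */
    '01MAR1994:18:30:00'dt + 3600 as one_hour_later format=datetime18.,
    /* SAS time values also count seconds. */
    '13:00't + 15*60 as quarter_past format=time8.
  from sashelp.class(obs=1);
quit;
```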

Functions are available to extract the various calendar elements from a SAS date value. The functions are
YEAR, MONTH, and DAY, along with QTR for quarter and WEEKDAY for the day of the week, counting from 1
for Sunday to 7 for Saturday. These functions are demonstrated in the following query.
select
a_date format=date9.,
year(a_date) as a_year,
month(a_date) as a_mo,
day(a_date) as a_day,
qtr(a_date) as a_qtr,
weekday(a_date) as a_wkd
from main.calendar;

If the table MAIN.CALENDAR has one row, in which the value of A_DATE is '28NOV2013'D, the result set is:

a_date     a_year  a_mo  a_day  a_qtr  a_wkd
28NOV2013    2013    11     28      4      5

The calendar element functions expect a SAS date value as an argument. To obtain calendar elements from a SAS
datetime value, use the calendar element function in combination with the DATEPART function. For example, if
A_TS is a SAS datetime value, its month number can be obtained with this expression:
month(datepart(a_ts)) as a_mo

There are similar functions, HOUR, MINUTE, and SECOND, for obtaining clock elements from a SAS datetime
value or SAS time value. The result of the SECOND function includes fractional parts of a second. The HOUR
function returns the hours of the 24-hour clock, ranging from 0 to 23.
To calculate an age in years based on the calendar, use the YRDIF function with two SAS date values as the first
two arguments, followed by the code value 'AGE' as the third argument.
yrdif(start, stop, 'age') as duration_years

The YRDIF function includes fractions of years in its result. To obtain the whole number, apply the INT
function to the result, as shown below.
int(yrdif(start, stop, 'age')) as duration_whole_years

An Older AGE
The 'AGE' code argument is new in SAS 9.3. In previous releases, you can obtain similar
results (the same except when one of the two years is a leap year and the other is not)
using the 'ACTUAL' code argument instead.

The INTCK function, like YRDIF, computes the number of elapsed time intervals, but it is far more complex
and flexible and can handle almost any time interval you may have heard of. This is also true of the INTNX
function, which does related time computations based on intervals. These two functions have a wide range of
capabilities, but here, I want to cover one specific situation because it comes up often: determining the first day of a
time period that contains a date.
You might need the first day of the quarter, for example, because that is the value you would ordinarily use to
represent the quarter. Compute this value with the INTNX function with three arguments: the time interval code
'QUARTER', the date, and the offset value 0.

intnx('quarter', mydate, 0) as myquarter
format=yyqp6.

In this example, the format YYQP displays the resulting value as a year and quarter number, such as 2017.2.
Use other time interval codes to find the first day of other calendar intervals. Available time interval codes
include 'YEAR', 'MONTH', and 'WEEK'. Using 'WEEK' gives you the date of the latest Sunday. Use the code 'WEEK.2' instead
for the date of the latest Monday.
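
Here is a sketch that applies several interval codes to the same date. The one-row table WORK.CALENDAR is hypothetical and is created only for this example; it reuses the date 28NOV2013, which is a Thursday.

```sas
/* Hypothetical one-row table for illustration. */
data work.calendar;
  a_date = '28NOV2013'd;
run;

proc sql;
  select a_date format=date9.,
    intnx('year', a_date, 0) as year_start format=date9.,
    intnx('quarter', a_date, 0) as quarter_start format=date9.,
    intnx('month', a_date, 0) as month_start format=date9.,
    /* Weeks start on Sunday by default. */
    intnx('week', a_date, 0) as week_start_sun format=date9.,
    /* The .2 suffix shifts the start of the week to Monday. */
    intnx('week.2', a_date, 0) as week_start_mon format=date9.
  from work.calendar;
quit;
```

With this date, the five computed columns should show 01JAN2013, 01OCT2013, 01NOV2013, 24NOV2013, and 25NOV2013.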

Displaying Dates
In other SQL environments one of the most pressing questions about handling data is often
the question of how to convert a date to a character value. In SAS this is usually not
necessary, as just associating a format with a column, using the format column attribute as
shown in the preceding examples, is enough to have SAS display the value in the way you
prefer and write it that way in any output files.
In those cases where you do need to create a separate text column that represents a date,
you can accomplish this using the PUT function with your selected format. See “Functions
for Formats and Informats” at the end of this chapter.

Substrings and String Padding


Use functions to extract parts of character strings, called substrings, and also to change the length of string values.
Some of these functions also have the effect of fixing the length of the result, padding with trailing spaces as needed.
SUBSTR(string, start, length)
SUBSTR(string, start)

SUBSTR is the basic substring function in SAS. It extracts a segment of the string argument based
on a starting position and length. It is highly efficient but can create errors if the arguments are not
consistent with each other.
If you omit the third argument the function selects up to the end of the string.
SUBSTRN(string, start, length)
SUBSTRN(string, start)

SUBSTRN is similar to SUBSTR, but is more forgiving when the start and length arguments are
inconsistent. It returns a null string if no part of the string is selected.
SUBSTRING(string FROM start FOR length)
SUBSTRING(string FROM start)

SUBSTRING is the same as SUBSTRN, but provides the conventional SQL function name and
syntax. SUBSTRING is limited to constant integer start and length arguments.
SUBPAD(string, start, length)
SUBPAD(string, start)

SUBPAD is similar to SUBSTRN, but it always returns a result of the exact length indicated,
padding with trailing spaces.
You can use SUBPAD just for its padding action, if there is a need to add trailing spaces to extend
the length of a string. Write 1 as the starting position and indicate the intended length, which does
not have to be related to the length of the string argument.
CHAR(string, n)
FIRST(string)

CHAR and FIRST are special cases of the SUBPAD function, for situations when you want a
substring of a single character. FIRST returns the first character of the string.
STRIP(string)
TRIMN(string)
TRIM(string)
LEFT(string)

These functions work on leading and trailing spaces. The STRIP function removes leading and
trailing spaces from a string. TRIMN and TRIM remove trailing spaces only; the difference is that
TRIM will not remove a blank string’s last remaining space. LEFT removes leading spaces, but
moves those spaces to the end in order not to change the length of the string.
BTRIM('character' FROM string)
BTRIM(LEADING 'character' FROM string)
BTRIM(TRAILING 'character' FROM string)

The BTRIM function is SAS’s implementation of the standard SQL TRIM function, but renamed to
avoid conflicting with the existing SAS TRIM function. It removes a specific character, indicated as
a constant value, from the beginning and/or end of a string.

If you use character functions to create a new column that you store in a table, the length of the new column will
not always match the length of the values you create. Set the length of the column, when necessary, using the
LENGTH= column modifier. In the example below, the SUBSTR function extracts the second and third characters
of the column PROJECT_CODE to create the column PROJECT_DOMAIN. The LENGTH= column modifier sets
the length of the result to 2.
substr(project_code, 2, 2) length=2 as project_domain

The LENGTH= column modifier can also be used to shorten a new column, sometimes taking away the need for
a substring function. For example, to extract the first 3 characters of PROJECT_CODE, it is sufficient to write:
project_code length=3 as project_seq
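
The substring functions can be compared in a single query. This is a sketch only; the table WORK.PROJECTS and the code value A21X7 are hypothetical.

```sas
/* Hypothetical one-row table. PROJECT_CODE has a length of 8,
   so the value is stored with three trailing spaces. */
data work.projects;
  length project_code $ 8;
  project_code = 'A21X7';
run;

proc sql;
  select project_code,
    /* Second and third characters. */
    substr(project_code, 2, 2) length=2 as project_domain,
    /* The start is past the end of the string, so SUBSTRN
       returns a null string instead of creating an error. */
    substrn(project_code, 9) length=1 as beyond_end,
    /* SUBPAD used only for padding, to extend the value to 12 characters. */
    subpad(project_code, 1, 12) length=12 as padded_code,
    /* Leading and trailing spaces removed. */
    strip(project_code) length=8 as stripped_code
  from work.projects;
quit;
```

Each new column has a LENGTH= column modifier, so the length of the column does not depend on the default that SAS would otherwise pick for the function result.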

Text Search Functions


Sometimes you need to look for a segment of a string that is not in a known, fixed location. To extract a field
marked off by delimiters within a string, use the SCAN function. To look for specific characters or a substring
within a character column, as you might do in a WHERE expression, use text search functions such as FIND and
FINDC.
The SCAN function parses a string, the first argument, according to delimiter characters you provide as the third
argument. Between the delimiters are fields, and you pick one of the fields by indicating an index number as the
second argument.
Often in a database table when a sequence of codes must be contained in a single column, they are separated by
vertical bars. Retrieve these bar-delimited codes separately using the SCAN function with the vertical bar character
as the delimiter.
The following example extracts four bar-delimited codes from the column CODE_STRING.
scan(code_string, 1, '|') length=4 as code1,
scan(code_string, 2, '|') length=16 as code2,
scan(code_string, 3, '|') length=16 as code3,
scan(code_string, 4, '|') length=16 as code4

Suppose the value of CODE_STRING is A211|432-445-229-58|Band|Place. The resulting values of CODE1
through CODE4 are A211, 432-445-229-58, Band, and Place.
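
To try this out, the column expressions can be placed in a complete step, as in the sketch below. The table WORK.CODES is hypothetical. The last column is an addition for illustration: SCAN also accepts a negative index, which counts fields from the end of the string.

```sas
/* Hypothetical one-row table holding the example value. */
data work.codes;
  length code_string $ 40;
  code_string = 'A211|432-445-229-58|Band|Place';
run;

proc sql;
  select
    scan(code_string, 1, '|') length=4 as code1,
    scan(code_string, 2, '|') length=16 as code2,
    scan(code_string, 3, '|') length=16 as code3,
    scan(code_string, 4, '|') length=16 as code4,
    /* An index of -1 selects the last field, whatever the count. */
    scan(code_string, -1, '|') length=16 as last_code
  from work.codes;
quit;
```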
The FIND function searches the text string, its first argument, for a substring, its second argument. It returns the
location of the substring, or 0 if it is not found. Often you are just looking for the presence of the substring; when
this is the case, checking the result of the function for a value greater than 0 provides the same result as the
CONTAINS operator. For example, the following two WHERE conditions are equivalent:
where find(code_string, 'Place') > 0

where code_string contains 'Place'

You might, though, be looking for what comes before or after the substring. In this case, the FIND function can
be used together with the SUBSTRN function, perhaps along with the CASE operator, to extract that part of the
string. The example below is a column expression that extracts the secure web address that follows https://, if that
sequence is present.
case when find(url, 'https://') > 0
then substrn(url, find(url, 'https://') + 8)
else '' end
length=112 as secure_address

The FINDC function looks for specific characters rather than a substring. Supply the set of characters as a string
in the second argument. Like the FIND function, the FINDC function returns the location of the first instance it
finds, or 0 if it does not find any of the characters. The example below looks for either of the capital letters A or E in
the column CODE_STRING.
findc(code_string, 'AE')
A series of functions looks for defined classes of characters in a similar way. These functions start with the ANY
prefix, followed by a four-, five-, or six-letter abbreviation for a character class. The form of the function call, then,
is:
ANYclass(string)

The function searches the string for any character of that character class and returns the location of the first such
character it finds, or 0 if it does not find any.
These are the character classes for the ANY functions:
ALNUM

alphanumeric characters (letters and digits)


ALPHA

alphabetic characters (letters)


CNTRL

control characters
DIGIT

digits
FIRST

characters valid as the first character of a name (Roman letters and underscore)
GRAPH

graphical characters (characters with visible shapes)


LOWER

lowercase letters
NAME

characters valid in a name (Roman letters, underscore, and digits)


PRINT

printable characters (graphical and whitespace characters)


PUNCT

punctuation characters
SPACE

whitespace characters (which provide spacing between graphical characters)


UPPER

uppercase letters
XDIGIT

hexadecimal digits

Use these functions to test for the presence of their respective character classes. For example, this WHERE
clause rejects any row in which NAME contains a punctuation character:
where anypunct(name) = 0

Replace the prefix ANY with NOT, and you can search for any character that does not belong to a character class.
Suppose the column HEXCODE is required to consist of hexadecimal characters only. This WHERE clause leaves
out any rows in which HEXCODE contains any other character:
where notxdigit(hexcode) = 0

You might want to test a value after removing trailing spaces. Combine the ANY or NOT function with the
TRIMN function, as shown here:
where notxdigit(trimn(hexcode)) = 0
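
The effect of the TRIMN function in this test can be seen in a sketch like the one below. The table WORK.HEXCODES and its values are hypothetical. HEXCODE has a length of 8, so the shorter values are stored with trailing spaces.

```sas
/* Hypothetical sample data for illustration. */
data work.hexcodes;
  length hexcode $ 8;
  input hexcode $;
  datalines;
1A2F
00ZZ
FFFF0000
;
run;

proc sql;
  /* Location of the first character that is not a hexadecimal
     digit, with and without the trailing spaces. */
  select hexcode,
    notxdigit(hexcode) as first_invalid_untrimmed,
    notxdigit(trimn(hexcode)) as first_invalid_trimmed
  from work.hexcodes;

  /* Keep only rows that consist of hexadecimal digits. */
  select hexcode
    from work.hexcodes
    where notxdigit(trimn(hexcode)) = 0;
quit;
```

For the value 1A2F, the untrimmed test returns 5, the position of the first trailing space, while the trimmed test returns 0. For 00ZZ, both tests return 3, so that row is left out of the second query.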
String Concatenation
String concatenation is a matter of combining two character strings in sequence, so that, for example, X and 47 go
together to form X47. Concatenation can be done with the concatenation operator or any of various concatenation
functions.
SAS might have borrowed a few ideas from SQL, but the concatenation operator || is a case of SQL borrowing
from SAS. This operator is recognized as a standard feature in SQL, though it is not supported with every character
data type or in every SQL environment.
The concatenation operator is generally useful only for code values and constants. It doesn’t provide punctuation
or adjust spacing, but when values have a fixed length and just need to be put together, the concatenation operator
works well. One example is where two code values are combined and displayed together. If CENTER is a character
column of length 3 containing three-digit codes, and PART_IND contains the letters Y and N, then center || part_ind
provides values such as 120N and 019Y. The concatenation operator can be used to add a constant prefix to a value
for display. If CLOUD_COVER has the value Clear, then 'Skies: ' || cloud_cover is Skies: Clear.
With most values, though, adjustments for spacing, punctuation, or both are required. For example, if
EMAIL_ACCT has a length of 14 and the value reporting, then email_acct || '@codecorp.us' is
reporting     @codecorp.us, with five spaces between the words, which is probably not what you want. For this
kind of concatenation, use one of the concatenation functions.
For concatenating text in various ways, there are several concatenation functions: CATT, which removes trailing
spaces from values before concatenating; CATS, which removes leading and trailing spaces; and CATX, which
removes leading and trailing spaces and adds a delimiter (the first argument) between values to form a list or a
sentence. There is also the CAT function which produces results consistent with the || operator, though the function
works more smoothly than the operator when arguments are numeric. All of the concatenation functions work with
both character and numeric values, but the resulting values are strictly character.
Use the CATS function to assemble a word from parts. For example, this expression combines the columns
PREFIX, ROOT, and SUFFIX, representing parts of a word, omitting a part if it is blank:
cats(prefix, root, suffix) length=32 as word

If PREFIX is re, ROOT is flow, and SUFFIX is s, then WORD is reflows.


As the examples here show, it is always necessary to set the length of the resulting column when you create a
column using the concatenation functions. Write the LENGTH= column modifier with a length large enough to hold
the resulting values.
Most text combinations require punctuation or spaces between parts, so CATX is the most generally useful
concatenation function. Write the delimiter as the first argument. The function adds the delimiter between strings.
Use a space as the delimiter when assembling a name or sentence. This expression forms a name from a first name
and last name:
catx(' ', firstname, lastname) length=40 as name

If FIRSTNAME is John Paul and LASTNAME is Jones, then NAME is John Paul Jones.
The expression below revises the previous example to write the last name first, followed by a comma:
catx(', ', lastname, firstname) length=40 as name

With the revised expression, NAME is Jones, John Paul.

Arguments can be numeric values. If IP1-IP4 are numeric (or character) values indicating the four parts of an IP
address, this expression provides a text version of the complete address:
catx('.', ip1, ip2, ip3, ip4) length=23 as iptext

The example data shown here is a one-row table provided for the purposes of the examples that follow. In this
table, AERO is a character column with a length of 4. WAY is a character column with a length of 8. The other
columns are numeric columns.

AERO WAY LONG LAT


BNA I-440 -86.6 36.1

These values are used to contrast the effects of the various concatenation functions:
cat(aero, way, long, lat)
BNA I-440   -86.636.1

catt(aero, way, long, lat)
BNAI-440-86.636.1

cats(aero, way, long, lat)
BNAI-440-86.636.1

catx(', ', aero, way, long, lat)
BNA, I-440, -86.6, 36.1

catx(' ', aero, way, long, lat)
BNA I-440 -86.6 36.1

catx('|', aero, way, long, lat)
BNA|I-440|-86.6|36.1

The CATS and CATT functions provide the same result in this example because none of the values have leading
spaces.
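
If you want to reproduce this comparison, a sketch such as the following creates the one-row table and runs the functions in one query. The table name WORK.AIRPORT and the names of the result columns are assumptions made for the example.

```sas
/* The one-row example table, with AERO as a character column of
   length 4 and WAY as a character column of length 8. */
data work.airport;
  length aero $ 4 way $ 8;
  aero = 'BNA';
  way = 'I-440';
  long = -86.6;
  lat = 36.1;
run;

proc sql;
  select
    cat(aero, way, long, lat) length=40 as cat_result,
    catt(aero, way, long, lat) length=40 as catt_result,
    cats(aero, way, long, lat) length=40 as cats_result,
    catx(', ', aero, way, long, lat) length=40 as catx_comma,
    catx('|', aero, way, long, lat) length=40 as catx_bar
  from work.airport;
quit;
```

The length of 40 is simply a value large enough to hold any of these results.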
The CATX function can be useful for adding dashes, periods, or other punctuation to character code values that
are stored without punctuation. The following example uses the CATX and SUBPAD functions together to add
dashes after the third and fifth characters in the code column SSN to create the column SSNDASH.
select ssn,
catx('-', subpad(ssn, 1, 3), subpad(ssn, 4, 2), subpad(ssn, 6, 4))
as ssndash
from main.accountholder;

If the input table MAIN.ACCOUNTHOLDER contains one row with the value 123456789 for SSN, the output
is:

ssn        ssndash
123456789  123-45-6789

Functions for Character Processing and Encoding


These functions take a single argument, a character string, and encode, decode, or otherwise process it:
UPCASE(string)
LOWCASE(string)
PROPCASE(string)

These functions convert letters to uppercase, lowercase, and proper case (title case, with an initial
uppercase letter followed by lowercase letters in each word), respectively. Other characters are not
affected.
UPPER(string)
LOWER(string)

These are the standard SQL names for the UPCASE and LOWCASE functions.
HTMLENCODE(string)

Encodes text for use in HTML or XML. This mainly involves substituting the character entities
&amp;, &lt;, and &gt; for the characters &, <, and >.
HTMLDECODE(string)

Decodes HTML-encoded text.


QUOTE(string)
Converts text to a quoted string, using double quotes.
DEQUOTE(quoted string)

Interprets a quoted string as text.


REVERSE(string)

Reverses the order of characters.


COMPRESS(string)

Removes all space characters.


COMPBL(string)

Replaces all instances of multiple spaces with single spaces.
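
Several of these functions can be compared side by side in one query. This is only an illustrative sketch; the table WORK.CHARDEMO and its single value are hypothetical.

```sas
proc sql;
  /* WORK.CHARDEMO is a hypothetical table created only for this sketch */
  create table work.chardemo (title char(30));
  insert into work.chardemo values ('routine   sas  sql');

  select upcase(title)   as upper_title,    /* letters converted to uppercase */
         propcase(title) as proper_title,   /* initial capital in each word */
         compbl(title)   as single_spaced,  /* runs of spaces reduced to one space */
         compress(title) as no_spaces,      /* all spaces removed */
         quote(strip(title)) as quoted      /* text enclosed in double quotes */
    from work.chardemo;
quit;
```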

Environment Functions
Environment functions provide information about the SAS environment and operating system environment.
SYSEXIST(environment variable name)

Checks for the existence of an environment variable.


Return values: 1 if the variable exists, 0 if not.
SYSGET(environment variable name)

Obtains the value of an environment variable.


SYMEXIST(macro variable name)

Checks for the existence of a macro variable.


Return values: 1 if the variable exists, 0 if not.
SYMGET(macro variable name)

Obtains the value of a macro variable.


SYSPARM()

Obtains the value of the parameter string. The parameter string can be set at SAS startup using the
SYSPARM system option.
DATE()
DATETIME()
TIME()

The current system clock time, as a SAS date value, SAS datetime value, and SAS time value,
respectively.

Additional information about the SAS environment is available from:

automatic macro variables; see chapter 10


system options; see chapter 9
DICTIONARY tables; see chapter 7
The special SQL name USER, which provides the current user name according to the
operating system, equivalent to the automatic macro variable SYSJOBID
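
A few of these environment functions can be tried together in a single query. The sketch below makes some assumptions: WORK.ONEROW is a hypothetical one-row table that exists only to satisfy the FROM clause, SYSVER is the automatic macro variable that holds the SAS release, and PATH is an environment variable that is usually, but not always, defined by the operating system.

```sas
proc sql;
  /* WORK.ONEROW is a hypothetical one-row table, used so the query returns one row */
  create table work.onerow (dummy num);
  insert into work.onerow values (1);

  select sysexist('PATH')   as has_path,      /* 1 if the environment variable exists */
         symexist('SYSVER') as has_sysver,    /* 1 if the macro variable exists */
         symget('SYSVER')   as sas_release,   /* value of the macro variable */
         date() format=yymmdd10. as today,    /* current date */
         time() format=time8.    as now       /* current time of day */
    from work.onerow;
quit;
```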

Functions for Formats and Informats


If you want to use a format in an expression, the PUT function makes it possible to do so. The similar INPUT
function lets you use an informat in an expression.
Write the value as the first argument to the PUT function and the format, including its arguments, as the second
argument. A common use for the PUT function is to convert an integer code, such as you might obtain from a
database column, to a character code with leading zeros for display. Use the Z format for leading zeros, as shown in
this example:
put(product_num, z12.) as product_code

If the value of PRODUCT_NUM is 8088123456, the value of PRODUCT_CODE is 008088123456.


Use the PUT function with a date format to convert a SAS date value to text. The following two examples cover
the two most commonly needed formats for dates.
put(mydate, yymmdd10.) as yyyymmdd_text

The result of the expression above is a 10-character string representing a date as it is commonly written in
international usage, such as 1999-12-31.
put(mydate, mmddyy10.) as mmddyyyy_text

The result of the expression above is a 10-character string in the traditional U.S. style for a date, such as
12/31/1999.
The INPUT function is similar, but works with an informat and converts a text argument to a data value. The
resulting data type is numeric if the informat argument is a numeric informat.
The INPUT function is used mainly with the standard numeric informat to convert character codes containing
numeric characters to numeric values. This example converts a five-digit ZIP code to a number:
input(zip, f5.) as zip_code_number

If you use the INPUT function this way, make sure the text you provide as the argument is valid input for the
informat you provide. For the standard numeric informat, the text should be either a valid numeral, or a blank value or
a single period to indicate the absence of a value. Other input text can result in null values and these log messages:

NOTE: Invalid string.


NOTE: Invalid argument to function INPUT. Missing values may be generated.
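
The PUT and INPUT expressions shown in this section can be gathered into one query to check the conversions. This is a minimal sketch; the table WORK.FMTDEMO and the values in it are hypothetical, chosen to match the examples in the text.

```sas
proc sql;
  /* WORK.FMTDEMO is a hypothetical table created only for this sketch */
  create table work.fmtdemo (product_num num, mydate num, zip char(5));
  insert into work.fmtdemo values (8088123456, '31DEC1999'd, '19301');

  select put(product_num, z12.)  as product_code,     /* numeric to text with leading zeros */
         put(mydate, yymmdd10.)  as yyyymmdd_text,    /* SAS date to text, international style */
         put(mydate, mmddyy10.)  as mmddyyyy_text,    /* SAS date to text, U.S. style */
         input(zip, f5.)         as zip_code_number   /* text to numeric */
    from work.fmtdemo;
quit;
```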
4
Summary Queries
Usually a query produces results at the same level of detail as the data it starts with. Each row in the result set shows
values taken from a specific row in the input table. Summary data takes a very different approach. In summary data,
each result row represents the combined effect of a set of input rows. SQL creates summary data with aggregate
functions, the GROUP BY clause, and the HAVING clause. Aggregate functions combine detail values to produce
summary values. The GROUP BY clause forms the input rows into sets, called groups, for the aggregate functions
to work with. The HAVING clause allows you to apply conditions to the summary rows.

Summary Statistics and Aggregate Functions


SQL can calculate summary statistics for a column using very ordinary-looking column expressions. Write the
statistic, then the column name in parentheses. For example, SUM(TRAFFIC) calculates the sum of the column TRAFFIC.
The syntax is that of a function call, but it indicates an aggregate function and computes a statistic over a column.
To distinguish the two kinds of function calls that occur in queries, we describe them as scalar and aggregate
functions. An ordinary function call that uses individual values is known as a scalar function. An aggregate function
computes with values of multiple rows to produce a single resulting value. Especially in SAS, an aggregate function
may also be referred to as a summary function.
A function call is written in essentially the same way whether it is a scalar function or an aggregate function. The
same statistics that are implemented as aggregate functions are also available as scalar functions. You have to look
at the details to tell the two kinds of functions apart:

Only a short list of statistic names, such as COUNT, SUM, AVG, and N, are implemented
as aggregate functions. All other functions are scalar functions.
To indicate an aggregate function, a function call has to have exactly one argument. If a
function call has two or more arguments, or if it has zero arguments, then it is a scalar
function.

Aggregate functions work with equal ease regardless of the number of rows they are aggregating. The following
table provides a small data set, GEO.NZLAKES, for demonstrating the use of aggregate functions.

GEO.NZLAKES
Name           Area  Volume  Length  MaxDepth
Lake Taupo      616      59      46       186
Lake Te Anau    344       .      65       417
Lake Wakatipu   291      67      80       230
Lake Wanaka     192      58      42         .

The query below computes summary statistics from the table above, using four of the most familiar statistics and
producing the results shown.
select
sum(area) as SumArea,
max(area) as MaxArea,
sum(volume) as SumVolume,
mean(length) as AvgLength,
max(maxdepth) as MaxDepth,
count(maxdepth) as NDepth
from geo.nzlakes;

SumArea  MaxArea  SumVolume  AvgLength  MaxDepth  NDepth
   1443      616        184      58.25       417       3

Looking at the query and the results, note:

The column aliases provide names for the resulting columns.


It is often useful to compute more than one statistic from the same column.
It is possible for the column alias to be the same as the original column name.
The null values do not affect the way the statistics are computed.
The COUNT function, also called N, tells you how many rows have values, excluding null
values.
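
The difference between scalar and aggregate function calls described earlier can be sketched with the same lake data. The code below is an illustration under stated assumptions: WORK.NZLAKES is a stand-in copy holding some of the columns of GEO.NZLAKES, built here so that the example can run on its own. With two arguments, NMISS is a scalar function that counts the null values within each row; with one argument, it is an aggregate function that counts the null values in the column.

```sas
proc sql;
  /* WORK.NZLAKES is a stand-in for GEO.NZLAKES, with a subset of its columns */
  create table work.nzlakes
    (name char(16), area num, volume num, maxdepth num);
  insert into work.nzlakes
    values ('Lake Taupo',    616, 59, 186)
    values ('Lake Te Anau',  344,  ., 417)
    values ('Lake Wakatipu', 291, 67, 230)
    values ('Lake Wanaka',   192, 58,   .);

  /* Two arguments: scalar function, one result for each row */
  select name, nmiss(volume, maxdepth) as MissingInRow
    from work.nzlakes;

  /* One argument: aggregate function, one result for the column */
  select nmiss(volume) as MissingVolume
    from work.nzlakes;
quit;
```

The first query returns four rows, one for each lake. The second returns a single row.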

You can expect to find five standard SQL aggregate functions in any SQL environment: COUNT, SUM, AVG,
MIN, and MAX. In SAS, COUNT is better known as N, and AVG as MEAN. SAS provides several other statistics
for use in SQL. The statistics that are available as aggregate functions are generally the same statistics you can
compute in other SAS procedures.
The list below shows the 10 SAS statistics most commonly used as aggregate functions in SQL. The first five
are the standard SQL aggregate functions and are also the most common summary statistics in SAS. The next five
are other common summary statistics in SAS.
N
COUNT
FREQ

frequency; number of nonmissing values


SUM

sum; total
MEAN
AVG

mean; average
MIN

minimum
MAX

maximum
NMISS

number of missing (null) values


STD

standard deviation
RANGE

range
USS

uncorrected sum of squares


CSS

corrected sum of squares

Five more aggregate function names may be familiar and potentially useful if you are using SAS for statistical
analysis: VAR, STDERR, CV, T, and PRT.
Finally, SAS officially supports the SUMWGT statistic, the sum of weights, as an aggregate function, but this is
not as useful as it might seem. SQL does not provide a way to indicate a weight variable, so the weight is 1 for each
row and the SUMWGT statistic has the same effect as the N statistic.
Null Values in Statistics
In keeping with the conventions of both SQL and SAS, statistics disregard null values. That is, if a column contains
null values, the statistic is computed the same way as if the null values were not present. Examples of this can be
seen in the preceding section, in the computation of SumVolume and MaxDepth.
The counting statistics (COUNT, FREQ, N, NMISS) always result in a whole number, never a null value. The resulting value is
0 if there are no rows or if all values are null. For the NMISS statistic, the result is 0 if no values are null.

Aggregate Functions for Character Columns


If you want to include a character column in summary data, you will need an aggregate function to convert the set of
values in the input data to a single value for use in the result set. You may need an aggregate function even if the
column has the same value in all of the input rows. The most likely function to use to aggregate a character column
is the MAX function.
With numeric values, the MAX function is one of the usual SAS statistic functions. In SQL, SAS also accepts
character arguments for the MAX function, and then it returns a character result, the maximum or largest value
among its arguments.
Another possibility is the MIN function, which returns the minimum or smallest value among its arguments. The
MIN function disregards null values, just as it does with numeric arguments. In SAS, null values in a character
column are ordinary blank values, so you will not get a blank result from the MIN aggregate function unless a
column is blank in every row.
The COUNT or N function, often used with the DISTINCT modifier as discussed next, is another aggregate
function that accepts a character column as an argument.
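
As a sketch of this use of MAX, suppose a table has one character column that repeats within each group. The table WORK.TRAFFIC and its values are hypothetical. CITY has the same value in every row for a given AERO code, but it still needs an aggregate function to appear in the summary row.

```sas
proc sql;
  /* WORK.TRAFFIC is a hypothetical table created only for this sketch */
  create table work.traffic (aero char(3), city char(12), flights num);
  insert into work.traffic
    values ('BNA', 'Nashville',    10)
    values ('BNA', 'Nashville',    15)
    values ('PHL', 'Philadelphia', 20);

  select aero,
         max(city)    as city,      /* reduces the repeated character value to one */
         sum(flights) as flights
    from work.traffic
    group by aero;
quit;
```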

Special Arguments for Aggregate Functions


There are two special uses of arguments for aggregate functions.

Use the expression COUNT(*) to count the rows in a table.


Write the modifier DISTINCT before the column name to compute the statistic on only the
distinct values in the column. This approach ignores repeats of the same value. Usually, this
is used to count the number of distinct values in a column, with the expression COUNT(DISTINCT
column). UNIQUE is an older, nonstandard synonym for DISTINCT.

Any statistic can use the keyword DISTINCT before the column name. With DISTINCT, the statistic is applied
to the set of distinct values in the column, rather than the values of all the rows. In other words, each separate value
is used only once. For example, COUNT(PLACE) counts rows in which PLACE has a value, but COUNT(DISTINCT PLACE) counts
the different values of PLACE. As another example, MEAN(X) is the mean of X for all rows in the table, but
MEAN(DISTINCT X) is the mean of the set of distinct values of X.
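
These special arguments can be compared in one query. The following is a small sketch with a hypothetical table, WORK.VISITS, in which one value of PLACE is repeated and one row has null values.

```sas
proc sql;
  /* WORK.VISITS is a hypothetical table created only for this sketch */
  create table work.visits (place char(8), x num);
  insert into work.visits
    values ('Paoli', 1)
    values ('Paoli', 1)
    values ('Exton', 3)
    values (' ', .);

  select count(*)              as NRows,        /* 4: every row */
         count(place)          as NPlace,       /* 3: rows in which PLACE has a value */
         count(distinct place) as NDistinct,    /* 2: Paoli and Exton */
         mean(x)               as MeanX,        /* (1 + 1 + 3) / 3 */
         mean(distinct x)      as MeanDistinctX /* (1 + 3) / 2 */
    from work.visits;
quit;
```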

Computations Based on Summary Data


If all columns in a query are summary functions (or constants), the result is one row that contains a summary of the
rows that the query reads. If a query contains a combination of summary statistics and other expressions, the
statistics are repeated in each row of the result. Combining summary statistics and detail data in this way is called
remerging. A log note indicates the process of remerging in case you wrote a remerging query without realizing it.

NOTE: The query requires remerging summary statistics back with the original data.
Remerging is of interest not just because it can occur by accident, producing results that you didn’t intend.
It also has performance implications. A query that remerges may take twice as long to run as a query that
includes only detail data or only summary data. Another reason you might avoid remerging is that it is a violation of
the conventions of SQL. Standard SQL does not permit you to mix detail data and summary data in the same
SELECT clause.
Remerging is permitted in SAS, though, because it is often useful. The most common reason you might use
summary statistics and detail data together in a column expression is to calculate percents or relative frequencies.
The example below shows this.
select planet label='Planet',
mass format=comma9. label='Mass 10^21 kg',
mass/sum(mass) format=percent9.3
label='% of Total' as share
from geo.terres;

Planet   Mass 10^21 kg  % of Total
Mercury            330      2.793%
Venus            4,869     41.210%
Earth            5,974     50.563%
Mars               642      5.434%

The SQL-standard way to construct an equivalent query involves joining the summary data to the detail data
with the CROSS JOIN operator (see “Table Join Operators” and “Self Joins” in chapter 6).
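
One possible form of the join-based query is sketched below. It is an illustration, not the book's own code: WORK.TERRES is a stand-in copy of GEO.TERRES built from the values shown in the output above, and the alias names D, T, and TOTAL_MASS are hypothetical. The inline view produces a one-row summary table, and the cross join attaches that row to every detail row.

```sas
proc sql;
  /* WORK.TERRES is a stand-in for GEO.TERRES, rebuilt from the output shown above */
  create table work.terres (planet char(8), mass num);
  insert into work.terres
    values ('Mercury', 330)
    values ('Venus',   4869)
    values ('Earth',   5974)
    values ('Mars',    642);

  select d.planet,
         d.mass,
         d.mass / t.total_mass as share format=percent9.3
    from work.terres as d
         cross join
         (select sum(mass) as total_mass
            from work.terres) as t;   /* one-row summary table */
quit;
```

This form produces no remerging note, because no single SELECT clause mixes detail columns with summary statistics.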

Grouping
The GROUP BY clause in a query expression divides the rows of the query into groups so that you can compute
summary statistics within those groups. The GROUP BY clause follows the WHERE clause, if there is one. It
usually lists one or several columns from the input table. This is an example of a GROUP BY clause:
group by state, year

GROUP BY columns have the same effect as class variables in other SAS procedures. They organize the rows of
data into groups. Statistics are calculated within the groups instead of being calculated for the entire set of data. In
this example, statistics are calculated separately for each state and year. Write the GROUP BY columns in the
SELECT clause too so that these columns will appear in the output. Groups appear in sorted order in the output,
regardless of the order the input rows are in.
GROUP BY items are usually table columns, but they can also be expressions based on the columns, to form
groups that are not directly indicated by the columns in the query. If the item is an integer constant, it is treated as a
column number indicating one of the columns from the SELECT clause.
If there is a GROUP BY clause and all of the columns are GROUP BY items, summary statistics, or expressions
based on constants, the query generates one row for each group.
If the query contains a combination of summary statistics and other expressions based on columns, the summary
statistics of a group are repeated in each row of the group. If summary statistics are used to calculate percents, they
are percents of the total for the group, rather than percents of the total for the entire set of data. This combination of
summary data and detail data is considered a form of remerging and is nonstandard SQL. To reach the same result in
a standard way, use the INNER JOIN operator (see chapter 6) to join the summary columns to the original detail
columns.
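
A sketch of the INNER JOIN approach for percents within a group follows. The table WORK.SALES, its values, and the alias names are all hypothetical. The inline view computes one summary row for each state, and the join condition matches each detail row to the summary row of its own group.

```sas
proc sql;
  /* WORK.SALES is a hypothetical table created only for this sketch */
  create table work.sales (state char(2), store char(12), revenue num);
  insert into work.sales
    values ('PA', 'Paoli',     600)
    values ('PA', 'Exton',     400)
    values ('TN', 'Nashville', 500);

  select d.state,
         d.store,
         d.revenue,
         d.revenue / g.state_total as share format=percent8.1
    from work.sales as d
         inner join
         (select state, sum(revenue) as state_total
            from work.sales
            group by state) as g      /* one summary row for each state */
         on d.state = g.state;
quit;
```

With these values, the two Pennsylvania stores have shares of 60% and 40% of the state total, and the single Tennessee store has 100%.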
One simple way to use grouping is to create a frequency table. Imagine a table MAIN.INQUIRY showing
customer inquiries, with the column CHANNEL indicating the communications medium the customer used. A query
such as the one below would show the frequency of each communications medium.
proc sql constdatetime;
select channel, count(*) format=comma10. as inquiries
from main.inquiry
where datepart(inquiry_rcvd_ts) = date() - 1
group by channel;

In this example, the WHERE condition selects yesterday’s data only, and the CONSTDATETIME option
permits SAS to evaluate the DATE function (which provides today’s date) only once. The query produces a table
such as this:

channel inquiries
EMAIL 33,010
MOBILE 18,455
POST 5
SOCIAL 68
TEXT 121
WEB 1,776

The HAVING Clause


You cannot use a WHERE clause to select groups or rows based on summary statistics of a group. That is because
the WHERE clause is always evaluated separately for each individual row. Instead, when a condition contains
summary statistics, write it in a HAVING clause. Write the HAVING clause after the GROUP BY clause.
For example, to discard groups in which the total of NETREVENUE is less than 1,000, write this HAVING
clause:
having sum(netrevenue) >= 1000

It is also possible to write a HAVING clause in a summary query that has no GROUP BY clause. In this case,
the HAVING clause is applied to all the rows as one group. The query result set then has either one row or none,
depending on whether the HAVING condition holds.
The following is an example of a HAVING clause you might use without a GROUP BY clause. With this
HAVING clause, the query would generate a summary row only if the data has at least two rows.
having count(*) > 1

The HAVING condition may also be needed in a query with multiple tables, as described in chapter 6, if the
query matches on summary rows of one or more of the tables. The criterion for matching summary rows must be
written in the HAVING clause, rather than the WHERE clause, because the WHERE clause applies only to
individual rows, not to the summary rows of groups.
If the WHERE and HAVING clauses are used together in the same query, the WHERE clause applies to the
individual rows before the groups are formed. The HAVING clause applies later, after the summary rows are
created.

Sorting Summary Rows


In a query with a GROUP BY clause, the result set is automatically sorted in the order of the GROUP BY columns.
Usually that is what you want, but sometimes, it may make more sense to present the results in some other sequence.
You might want to group by PRODUCTNAME, for example, but display the results in order of LOWPRICE.
Add an ORDER BY clause to the end of the query to indicate the sort order. The ORDER BY clause works the
same way as it does with detail data, though with summary data, you are more likely to be sorting by column
expressions. If you are sorting by a summary value, you must refer to it using its column alias (or column number).
The ORDER BY clause is available only when a query is used to form a SELECT statement or CREATE
TABLE statement.
The example below demonstrates the use of a GROUP BY clause with a different ORDER BY clause. This
example also shows that there are scenarios in which you might use all six of the primary query clauses in the same query.
select productname, count(*) as sellers,
min(price) as lowprice
from main.offer
where quantity > 0 and price > 0
group by productname
having min(price) between 0 and 5000
order by lowprice;

Query Execution Sequence


You have now seen all six of the clauses that may define the result set in a query. The rules of SQL syntax require
the SELECT and FROM clauses, to select columns and identify data sources, and you write these clauses at the
beginning of the query. This can be followed by a WHERE clause with a condition for selecting rows, then GROUP
BY and HAVING clauses for forming groups in summary data. Finally, depending on how you are using the query,
there can be an ORDER BY clause to sort the rows of the result set. Not all clauses are required, but whenever
clauses are present, they must be written in this sequence:

1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY

This sequence is not, however, the sequence in which clauses execute. Considering what is involved in executing
a query, it would be difficult to select the columns first, before you know what tables the columns are coming from.
It is simpler if you postpone the actions involved in selecting columns as long as possible.
SQL standards recognize this, and they spell out a logical sequence of clause execution for a query. The
execution sequence puts the SELECT clause next to last. SQL implementations tend to follow this standard
sequence, at least at the conceptual level. The execution sequence for clauses in a query is:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY

This sequence of execution is what allows table columns that are not in the result set to be used in the WHERE
clause to select rows, or in the GROUP BY and HAVING clauses to form and select summary rows.
When you look at the order of execution of clauses, it is easier to appreciate the difference in effect between the
WHERE and HAVING clauses. In a summary query, the WHERE clause is executed before the GROUP BY clause
that defines groups. Only the source rows exist when the WHERE clause executes, so the WHERE condition could
only apply to the source rows. The HAVING clause comes later, after the source rows have been replaced by
summary rows, so the HAVING condition applies to the summary rows.
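
To make the execution sequence concrete, the six-clause query from the previous section can be annotated with the logical order of execution. This is a sketch with a hypothetical stand-in table, WORK.OFFER, and invented values; the comments give the conceptual sequence, not necessarily what SAS does internally.

```sas
proc sql;
  /* WORK.OFFER is a hypothetical stand-in for MAIN.OFFER */
  create table work.offer (productname char(12), price num, quantity num);
  insert into work.offer
    values ('Lamp',   40, 3)
    values ('Lamp',   35, 0)
    values ('Lamp',   45, 2)
    values ('Desk',  900, 1)
    values ('Safe', 7500, 1);

  select productname, count(*) as sellers,   /* 5. compute the result columns */
         min(price) as lowprice
    from work.offer                          /* 1. read the source rows */
    where quantity > 0 and price > 0         /* 2. drop the Lamp row with quantity 0 */
    group by productname                     /* 3. form groups: Desk, Lamp, Safe */
    having min(price) between 0 and 5000     /* 4. drop the Safe group */
    order by lowprice;                       /* 6. sort: Lamp (40), then Desk (900) */
quit;
```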
Using CALCULATED to Mark Column Aliases
In most programming languages, once you have computed a variable, you can use that result in subsequent actions
in the same program. This works only in a very limited way in SQL. After you create a column alias in a SELECT
clause, you can refer to that column alias in an ORDER BY clause. This makes sense; it seems simple enough that
you can sort the result set according to the columns it contains.
However, you cannot necessarily use a column alias in any other clause, or even in computing another column in
the same SELECT clause. You begin to see why this is when you look at the sequence of execution of clauses in a
query.
According to the design of SQL, the SELECT clause is one of the last clauses to execute when a query executes.
Columns that are computed in the SELECT clause are not available to be used in the other clauses, with the
exception of the ORDER BY clause, because they may not have been computed yet when the other clauses execute.
SAS provides a way around this limitation. Write CALCULATED before the column alias when you reuse a
column alias in the SELECT, WHERE, GROUP BY, or HAVING clause.
This approach is most likely to be useful in the WHERE clause. In the example below, a query computes
the column EFFICIENCY, then limits the result set to rows where the computed value is above 6 percent.
select name, length, width, power_max,
power_max/(1000*length*width) as efficiency
from main.solar
where calculated efficiency > .06;

Using the CALCULATED keyword in a query, though, does not change the order of execution, nor does it save
SAS any work. When the WHERE expression is computed in the example above, the SELECT clause has not yet
started. SAS has to calculate the same expression separately for the WHERE clause.
CALCULATED is also nonstandard SQL, found in a few SQL environments but not widely supported. It is
generally better, then, not to use the CALCULATED keyword. You can use column aliases from the SELECT
clause in the ORDER BY clause, and there, the CALCULATED keyword is not required. Otherwise, copy the
expression that you are repeating to each clause where you use it, so that you do not have to rely on the
CALCULATED keyword and column alias.
The above example can be rewritten as shown below. The revision is a simple matter of replacing the alias with
a copy of the expression it refers to.
select name, length, width, power_max,
power_max/(1000*length*width) as efficiency
from main.solar
where power_max/(1000*length*width) > .06;

Revised this way, the query executes in exactly the same way, but the appearance of the code perhaps corresponds
more closely to the way it executes.
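Another way to avoid both the CALCULATED keyword and the repeated expression is to compute the column in an inline view, a query nested in the FROM clause, and apply the condition in the outer query. This is offered as a sketch of an alternative, not as the approach the text recommends, and it assumes that MAIN.SOLAR exists with the columns used above.

```sas
proc sql;
  /* Assumes MAIN.SOLAR exists with the columns NAME, LENGTH, WIDTH, and POWER_MAX */
  select name, length, width, power_max, efficiency
    from (select name, length, width, power_max,
                 power_max/(1000*length*width) as efficiency
            from main.solar) as s
    where efficiency > .06;   /* EFFICIENCY is an ordinary column of the inline view */
quit;
```

Here the outer WHERE clause can refer to EFFICIENCY directly because, from the point of view of the outer query, it is a column of the source table.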
5
SAS Output From SQL
When you execute a SELECT statement, SAS displays the result set of the query expression in the form of a table.
In an interactive session, SAS may show you the table as an HTML document in a browser window, or a listing
using monospace text in the Output window. Regardless of the context the program runs in, it saves each output
table in a file that can be displayed, delivered, and archived.
The output table you see may have the same shape as the data table you create and store in a CREATE TABLE
statement, but it has a different purpose: it is meant to be used as a document. In an output table, all values are
converted to text with the use of formats, they are positioned on a line or a page, and details such as fonts, layout,
and color may have been added. These details let you view results or deliver them outside the SAS environment.
Unlike a data table, an output table cannot be used, at least not so easily and directly, as input to another SQL query.
A SAS component called ODS, or Output Delivery System, converts SQL result sets to output documents. ODS
collects output objects from all steps in a SAS program, so a single output document can combine SQL output with
other SAS output. Or, you can store each output table in a separate document file.
ODS can create many different document formats, including HTML, CSV, PDF, and Listing. You write ODS
statements in the SAS program to control the destination and visual styling of the documents you create. Other SAS
statements, especially TITLE and OPTIONS statements, control other details of the output document. These
statements can shape the output only after an output table is created, though. It is the SELECT statement itself that
determines the sequence, content, and formatting of the columns.

SQL Statements and Global Statements


Statements such as the TITLE and ODS statements are considered global statements. They are not involved with the
processing of data that takes place within a specific SAS step. Instead, they affect objects and settings in the larger
SAS environment.
Elsewhere in SAS, we tend to think of placing global statements between steps. The global statements for a step
execute before the step executes in order to set the stage for the processing that takes place during the step. In SQL,
however, every statement is a separate action, so global statements may be needed between any two SQL
statements.
Most global statements associated with a specific SQL statement must be placed before that SQL statement.
These statements may include TITLE statements that define title lines, OPTIONS statements for setting system
options, and ODS statements to open destinations and set options. There are also ODS statements to close
destinations, and these must be placed after the SQL statements that write output to the destinations.
The following example shows a sequence of global statements in a PROC SQL step. These statements are
explained in more detail over the course of this chapter; for now, concentrate on the sequence of statements. Among
all these statements, only SELECT is an SQL statement. The SELECT statement generates an output table. Before
this statement, the OPTIONS statement sets options that affect the display of the table, and the TITLE statement
provides a title line that appears with the table. These statements need to come before the SELECT statement so that
the settings they create are in effect when the SELECT statement executes. Also before the SELECT statement there
are two ODS statements to close all open ODS destinations and open a new HTML destination specifically for the
output table being generated. With the ODS destination open, the SELECT statement can generate output that is
directed to this specific ODS destination. After the SELECT statement, the two remaining ODS statements close the
HTML destination and open the next ODS destination. These actions come last because they must wait until after
the SELECT statement has produced the output for the ODS destination.
proc sql;
options missing=' ' nocenter;
title1 'Poster Colors';
ods _all_ close;
ods html file="colors.html";
select color, c, m, y, k
from main.colorlist;
ods html close;
ods listing;
quit;

ODS Destinations
ODS routes SAS output to specific destinations in specific document formats. What follows here is only a cursory
working introduction to ODS for use in the SQL procedure.
Each distinct document or document format created by ODS is considered an ODS destination. When a SELECT
statement generates output, ODS delivers the output to whatever destinations are open at that moment. The key to
working with ODS, then, is opening and closing destinations using the ODS destination and ODS CLOSE
statements.
The selection of ODS destination name indicates the document format. ODS has dozens of document formats,
and among the most useful are HTML, PDF, Listing, and CSV. Some of these destinations support visual
formatting, and some do not.
For visual destinations, an ODS style provides many of the details of visual formatting. Depending on the ODS
style, the column labels may be in bold or larger type, and the header cells containing them may be set off with rules
and a contrasting background color.
Book Style
Not all the details of ODS style come across in a book. In print, colors may be reduced to
grayscale. In an electronic medium, the output examples get filtered through a rendering
environment that might override many of the distinctions of typefaces, color, and table
layout. Given these limitations, the examples you see in this book do not try to precisely
represent any particular ODS style.

To open an ODS destination, write an ODS destination statement to indicate the destination, the output file, and
any other options that might be needed. The general form of the statement is:
ods destination file="output document file" options;

To close an ODS destination, write an ODS CLOSE statement, indicating only the destination and the CLOSE
action:
ods destination close;

For most ODS destinations, you must close the file before you can view it, so it makes sense to close the
destination as soon as the output that goes to it has been generated. The Listing destination is the main exception.
Each page in the Listing destination can be viewed as soon as it is generated, even while the destination is still open.
If you do not know what destinations might be open, close all open ODS destinations by using the special
destination name _ALL_. This statement is:
ods _all_ close;

It is good to have at least one ODS destination open at all times in a SAS program. That way, if any output is
generated, there is a place for it to go. Whenever you close a destination, then, open the next destination
immediately, in the next statement. If there is no particular ODS destination to open, then open either the Listing
destination or the HTML destination.
To create two or more document types from the same output, open multiple destinations at the same time. ODS
routes any output generated to all open destinations.
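As a sketch of multiple open destinations, the statements below send the same output table to an HTML file and a PDF file. It assumes the table MAIN.COLORLIST from the earlier example; the file names are placeholders.

```sas
proc sql;
  ods _all_ close;                 /* start with no destinations open */
  ods html file="colors.html";     /* open two destinations together */
  ods pdf file="colors.pdf";
  title1 'Poster Colors';
  select color, c, m, y, k         /* output goes to both open destinations */
    from main.colorlist;
  ods pdf close;                   /* close both so the files can be viewed */
  ods html close;
  ods listing;                     /* leave a destination open for later output */
quit;
```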
The list below shows some useful ODS destinations, the situations in which you might use them, and their
common file name extensions.

HTML
HTML5
HTML4
Electronic document, web page, formatted email message, formatted document for use in
spreadsheet or word processing
File name extension: html, htm

EPUB
A container file format for HTML documents, especially for ebooks and ebook readers (Also:
ePUB, ePub, Epub)
File name extension: epub

PDF
Self-contained electronic document, document for printing, document to be published without
changes
File name extension: pdf

Listing
Plain text, a low-cost default destination
File name extension: txt, lst, log

CSV
Unformatted data for use in spreadsheet, database, or other software
File name extension: csv

XML
Electronic data for exchange between places or applications
File name extension: xml

ExcelXP
Formatted spreadsheet document in a form of XML
File name extension: xml

RTF
Formatted text document for word processing or page layout, for applications that have trouble with
HTML
File name extension: rtf

A destination can hold the output from one SELECT statement, a sequence of SELECT statements, or SELECT
statements combined with output from other steps.
An example earlier in the chapter shows code to generate an HTML file from a SELECT statement.
The example below generates an HTML file with the output from two SELECT statements.
proc sql;
ods _all_ close;
ods html file="colors.html";
title1 'CMYK Colors';
select color, c, m, y, k
from main.colorlist;
title1 'RGB Colors';
select color, r, g, b
from main.colorlist;
ods html close;
ods listing;
quit;

Advanced ODS for SQL


For advanced ODS programmers, the presentation of an output table can be changed
around with the use of a style definition or a custom table template. A table template can
change the sequence of columns and decide how columns are split between pages when
that is necessary. A style definition determines aspects of visual presentation such as fonts,
colors, rules, and layout. The style definition is not limited to the table design, but can also
be used to control other details such as title lines and page layout. SAS makes it possible
to create your own styles and table templates, though the process is more complicated than
I can describe here.
In either case, the column-oriented arrangement of SQL data does not give you many
details to work with when arranging and styling an output table. For ODS purposes, the
output from a SELECT statement is a single table object, with only two kinds of table
cells to style. First, there are the column labels, which are ordinarily displayed as column
headers. All the other cells are ordinary data values.

Large Output Tables


Tables can be larger in length and width than the examples that fit comfortably in a book. The way an output table
expands depends on the way a destination handles pages.
As a table gets longer, a destination that divides its content into pages of a fixed size may continue on additional
pages. Title lines and column headers are repeated on each page.
If a table is wider than the display area, an HTML display scrolls right and left to show the extra columns. For a
document format with a fixed page size, such as PDF, the ODS template may break a table horizontally, going to
another page after columns fill the width of the page.
Although there is no set limit on the number of columns you can show in a table, the most useful tables to
display are those with relatively few columns, so that it is easy for the reader to relate one column to another within
the same row, or to compare one row to another.

Title Lines
Title lines appear at the top of each page of output. In HTML and other document types that allow unlimited page
sizes, the title lines for a table appear only once. In Listing, PDF, and other destinations that divide content into
pages of a fixed size, the title lines are repeated at the top of each page.
Typically, a title line should provide a title for the table that a SELECT statement generates. A title should be
distinct enough that you can tell one table apart from the next. At the same time, it should be comprehensive enough
to provide a meaningful sense of context for a report, if that is needed. If a report may be viewed far and wide, or
well removed from its original context, this might require several title lines, perhaps to identify the company,
project, and author in addition to the specific table. On the other hand, if a table will be incorporated into another
document that provides this kind of context, a single title line might be sufficient.
Write a TITLE statement with these essential components:
titlen "text";

The numeric suffix n is a number from 1 to 10 indicating the sequence of the title line. The character constant
provides the text of the title line. Write TITLE statements in order. Skip over a title line number (other than 1) to
leave a title line blank.
This example provides text for title lines 1 and 3, while defining a blank title line 2.
title1 "My Title Line 1";
title3 "My Title Line 3";

These statements result in these title lines at the top of the page of output:
My Title Line 1

My Title Line 3

Within a SAS program, title lines stay in effect after they are defined, until you define new title lines. Remove
the previously defined title lines by writing a new TITLE1 statement.
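The way title lines persist and get replaced can be sketched as follows. The title text is made up for illustration.

```sas
title1 "Monthly Report";
title2 "Draft";
/* Output produced here shows both title lines. */

title1 "Annual Report";   /* replaces line 1 and removes line 2 */
/* Output produced here shows only "Annual Report". */

title1;                   /* no text: removes all title lines */
```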
Footnote lines are similar to title lines, but they are defined in FOOTNOTE statements and are displayed at the
bottom of the page. Footnote lines are not as common now as they were in previous decades when more documents
were distributed on paper. In a paper document, via an ODS destination such as Listing or PDF, the footnote lines
can be seen at the bottom of each page. In an HTML document, though, footnote lines are seen only after the reader
scrolls to the end of a table, and that could be a long distance, depending on the size of the table. Footnote lines may
be useful for disclosures or similar supporting information, perhaps identifying the source of the data, or the
program that generated the output.
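A minimal sketch of footnote lines used for disclosures might look like this. The footnote text and the program name colors.sas are hypothetical.

```sas
footnote1 "Source: MAIN.COLORLIST";
footnote2 "Generated by colors.sas";   /* hypothetical program name */
/* Output produced here carries both footnote lines. */

footnote1;                             /* no text: removes all footnote lines */
```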
The TITLE (and FOOTNOTE) statement provides limited text formatting options. Text formatting is supported
only in visual ODS destinations, such as HTML and PDF. The text formatting options are ignored in other
destinations.
Most options can be applied separately to parts of a title line. Write several character constants in the TITLE
statement, with separate formatting options before each character constant.
Use these formatting options in TITLE statements:
FONT="font family"
BOLD
ITALIC

These options identify the typeface, or font.


HEIGHT=n

The font size in points, usually a number between 8 and 15.


UNDERLIN=n

Underlining. Write UNDERLIN=1 for underlining, or UNDERLIN=0 to turn underlining off.


COLOR="color"

The text color.


There are several ways to indicate a color, including color names (such as RED, BLUE, BROWN, PINK, GRAY)
and Internet RGB colors (such as #779977 for a medium dull green).
LINK="URL"

A hyperlink. This is mainly useful for documents that will be displayed in a web browser, and it
may be useful more often in a FOOTNOTE statement than in a TITLE statement.
BCOLOR="color"

The background color. This option affects the entire title line. Write this option at the beginning of
the TITLE statement.
Use a background color in a title line mainly for paper documents printed in color, to create a
highlighter effect. This effect may be useful to emphasize that the current report is something
different from what readers would ordinarily expect.
JUSTIFY=

Justification, or horizontal alignment. Use the value LEFT, CENTER, or RIGHT. This option affects the
entire title line. Write this option at the beginning of the TITLE statement.
To center or left-align all output including title lines, use the CENTER or NOCENTER system
option, described below.

The following example defines a title line using a medium blue 13-point Arial bold type.
title1 color="#5555aa" font=arial height=13
bold "My Title Line";

My Title Line
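To apply options to only part of a title line, a sketch along these lines could be used. It writes two character constants in one TITLE statement, each preceded by its own options; the title text is made up for illustration.

```sas
title1 color=black "Inventory Report "
       color="#5555aa" bold "Preliminary";   /* options apply to the text that follows them */
```

As noted above, these options take effect only in visual destinations such as HTML and PDF.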

Title lines can show page numbers and the date and time. The system options NUMBER, PAGENO=, DATE,
and DTRESET control the page numbers and dates that may appear in or near title lines.
NUMBER

The NUMBER option displays page numbers in output. The NONUMBER option removes page
numbers from output. NUMBER is the default.
Page numbers are used only in ODS destinations that divide output tables into pages of a fixed size,
such as the Listing and PDF destinations. HTML, XML, and spreadsheet destinations do not use
page numbers, regardless of the setting of the NUMBER option.
PAGENO=

Use the PAGENO= option to reset page numbers. Usually, write PAGENO=1 to start numbering pages at
1 again.
DATE

The DATE option determines whether the title lines indicate the current date and time. Write DATE
or NODATE.
If you save SAS output, having the date on every page makes it easy to keep track of when the
output was generated.
DTRESET

The DTRESET option determines what date and time appear in the title lines. The default is
NODTRESET, which shows the date and time the SAS session started, or when the SAS program
started running. DTRESET shows the time the specific page was started.
This difference may not be important for batch-mode SAS programs that process a moderate
amount of data, perhaps just a few million rows, and run for a matter of minutes. But for a long-
running program, or in an interactive session that lasts all day, changes might occur while the
program is running, and it can be important to know specifically when a certain part of the program
ran.

Another system option that affects title lines is CENTER, which is described below. These options, like most
system options, are session options, which means you can change them at any time in an OPTIONS statement. Write
one or more options in an OPTIONS statement, for example:
options number pageno=1 date nodtreset nocenter;

ODS Text Lines


The ODS TEXT statement adds a line of text to the open ODS destinations. Write the statement as:
ods text="text";

Use ODS TEXT statements as a simple way to add explanations and similar messages to an output table, within
the body of the page.
If a line of text is not enough and you need to add whole paragraphs of formatted text, in SAS 9.4 or later, look
into the features of the ODSTEXT procedure.
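A sketch of an ODS TEXT statement placed ahead of a query is shown below. It reuses the table from the earlier HTML example; the wording of the note is hypothetical.

```sas
ods _all_ close;
ods html file="colors.html";
proc sql;
  title1 'CMYK Colors';
  /* The text line is written to the open destination before the table. */
  ods text="See the project notes for how these values were measured.";
  select color, c, m, y, k
    from main.colorlist;
quit;
ods html close;
ods listing;
```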

Centering Output
SAS lets you choose whether you want tables, title lines, and similar output centered horizontally or aligned at the
left side of the page. Make this choice using the CENTER system option. Like most system options, CENTER is a
session option, so you can change it at any point in an OPTIONS statement. Write this statement to center tables
and titles:
options center;

To left-align tables and titles, write:
options nocenter;

Labels as Column Headers
Ordinarily, column headers show the names of the columns, but you can gain more control over the column headers
by having them show labels instead, using the LABEL system option. To display specific text as the header for a
column, create a label attribute for the column using the LABEL= column modifier.
You can create a label when you store a column in a data table or when you select it for use in an output table.
Write the LABEL= column modifier with the label text in the SELECT clause. The example below defines and
displays labels for each of its columns. This demonstrates the use of the LABEL system option and the label column
attribute. The data seen in this example is taken from an earlier example, at the beginning of chapter 3, where it is
shown using column names as column headers.
options label;
select
location label='Locks',
elev_lo label='Elevation at Bottom (m)',
elev_hi label='Elevation at Top (m)'
from geo.panamalock;

Locks         Elevation at Bottom (m)  Elevation at Top (m)
Gatun         0                        26.5
Atlantic      0                        26.5
Pedro Miguel  16.5                     26.5
Miraflores    0                        16.5
Pacific       0                        26.5

If you write the LABEL= term for a column in the SELECT clause of a CREATE TABLE statement, the column
label is stored in the table you create and is available for use whenever you use the data from that table. There are
other ways that labels can come about in SAS, including these:

Other SAS steps, when they create SAS data sets, can create labels using the LABEL
statement.
The DATASETS procedure can add or change labels in an existing SAS data set using the
MODIFY and LABEL statements.
When SAS retrieves data from a database, it may use the database column names as labels.
Other procedures that create variables from existing data may create labels that reflect
available information about the source of the variables.

Whenever it cannot use labels, SAS uses column names instead. If the LABEL system option is in effect, but a
column does not have a label attribute, then the column name is displayed. If you decide you want to show column
names instead of the labels associated with the columns in a query, there are two things you can do:

Assign a blank label to remove a previously assigned label from a column so that the name
is shown in the column header.
Use the NOLABEL system option to disregard labels entirely and identify all columns using
the column names.

Note that you cannot create a blank column header by assigning a blank label attribute. To show a blank column
header, assign a null character, written as LABEL='00'X, as the column’s label attribute.
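The two label techniques just described can be sketched together. The table WORK.LOCK_DEMO is a hypothetical stand-in with stored labels, and the comments restate the behavior described in the text rather than verified output.

```sas
proc sql;
  /* Hypothetical table whose columns carry stored labels */
  create table work.lock_demo
    (location char(12) label='Locks',
     elev_lo  num      label='Elevation at Bottom (m)');
  insert into work.lock_demo
    values ('Gatun', 0);

  options label;
  select
    location label=' ',       /* blank label: header falls back to the column name */
    elev_lo  label='00'x      /* null character: header is displayed as blank */
  from work.lock_demo;
quit;
```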

Displaying Missing Values


Null values in numeric columns are treated as standard missing values in SAS. The MISSING= system option
determines what character is displayed for a standard missing value. The default is MISSING='.', so that a standard
missing value displays as a period or dot, like a decimal point. Use MISSING=' ' to leave table cells blank when they
contain standard missing values. This can be a way to deemphasize the missing values. Or, use MISSING='*' to call
extra attention to missing values.
Special missing values are missing values that are displayed as a capital letter or underscore. You assign them in
a program by writing them as a period followed by the letter or underscore. Special missing values are not affected
by the MISSING= option.
The example below uses a CASE expression to generate two kinds of missing values: standard missing values,
which it displays as !, and the special missing value .N, which displays as N.
options missing='!';
select age,
case when age >= 14 then age
when age >= 0 then .
else .n end as age_validated
from main.member;

Formats
Data values can appear in an output document only after being converted to text characters. The SAS routines that
convert data values to text are called formats. Associate a format with a column by writing the FORMAT= column
modifier in the column definition.
When you indicate a format, it follows a specific syntax form that may include one or two numeric arguments.
Consider this example of a format:
format=comma10.2

In this example:

comma is the format name.
10 is the width argument. This argument indicates the number of bytes or
characters of text the format generates. The width argument is optional.
. is a dot, the period character, which is a required component when writing a
format.
2 is the decimal argument. This indicates the number of decimal places in a
numeral. The decimal argument is optional. It is used only with numeric
formats. If you write a decimal argument you must also write a width argument.

The syntax of a format in general may be summarized as:

formatw.d

where

format is the format name. For a character format, the format name begins with a
dollar sign ($).
w is the optional width argument. For a character column, the default width is
the column’s length attribute.
. is the period character.
d is the optional decimal argument. Omitting the decimal argument is the same
as having a decimal argument of 0.

There are no spaces or other gaps between these components.
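As an illustration of the syntax, the sketch below attaches formats with width and decimal arguments to a numeric column. The table WORK.SALES_DEMO and its values are hypothetical.

```sas
proc sql;
  /* Hypothetical table for illustration */
  create table work.sales_demo (item char(12), amount num);
  insert into work.sales_demo
    values ('Widget', 1234567.891)
    values ('Gadget', 42);

  select
    item,
    amount format=comma12.2  label='COMMA12.2',    /* 1,234,567.89 and 42.00 */
    amount format=dollar14.2 label='DOLLAR14.2'    /* $1,234,567.89 and $42.00 */
  from work.sales_demo;
quit;
```

The same column is selected twice only so the two formats can be compared side by side.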


There are two standard formats in SAS, one for numeric values and one for character values. The standard
character format, written as $Fw. or $w., writes character values without any changes beyond shortening them to the
width indicated. The $CHARw. format does the same thing. The standard numeric format, Fw.d or w.d, writes numbers as
ordinary decimal numerals with the indicated number of decimal places. Like most numeric formats, it indicates
negative numbers with a leading minus sign.
SAS has many other formats to choose from. A few of the most useful are listed below.
If a table may be edited interactively, every column that has a format attribute should also have an informat
attribute, and it is important for the informat to be compatible with the format. After a format converts a data value
to text, a compatible informat converts that text to the same or nearly the same data value. The syntax for an
informat is the same as for a format, though an informat usually does not indicate a decimal argument. The list
below shows compatible informats to go with the formats. Often the format and informat use the same name.
$CHARw.
$Fw.
$w.

These formats merely copy character data without changing it or altering it in any way.
Compatible informats: $CHARw., $Fw., $w.
$UPCASEw.

The $UPCASE format converts lowercase letters to uppercase, especially for displaying a case-
insensitive code or name that should be viewed in uppercase letters.
Compatible informats: $UPCASEw., $CHARw.
$HEXw.
$BASE64Xw.

These formats encode data as text for transmission via a text-friendly medium. The $HEX format
encodes in hexadecimal, writing each byte of data as two hexadecimal digits. The $BASE64X
format encodes in base64, writing each three bytes of data as four base64 characters.
Compatible informats: use the same names.
Fw.d
w.d

This is the standard numeric format, which writes standard decimal numerals, without commas. If
there is a decimal argument, it writes a decimal point and a fixed number of decimal places, as
indicated by the decimal argument. This format can be written with the name F, or it can be written
without a name as long as the width argument is present.
Compatible informats: Fw., w.
Ew.

The E format writes numbers in scientific notation. It writes a number as a multiple of a power of
10, indicated by the letter E followed by an exponent of at least two digits. It writes as much
precision as it can in the width available.
Compatible informats: Fw., Ew.
BESTw.

The BEST format, which is the default format for writing numeric values, writes numbers with as
much precision as it can in the width you provide it. For whole numbers that fit in the field, the
output is no different from the output produced by the F format. If a number has a fractional part,
the BEST format writes as many decimal places as will fit, but without writing a 0 as the last
decimal digit. The BEST format determines the number of decimal places to write based on the
value it is writing, so it does not use a decimal argument. If the magnitude of a number is too great
for it to fit in the field, the BEST format writes a condensed form of scientific notation.
Compatible informats: Fw., BESTw.
COMMAw.d
DOLLARw.d

The COMMA format adds the commas that are conventionally written between every three digits
when longer numerals appear in most kinds of documents. The DOLLAR format is similar, but also
adds a dollar sign ($) immediately before the numeral.
Compatible informats: COMMAw., DOLLARw.
Zw.d

The Z format writes numbers with leading zeros. This is often the appropriate presentation for code
values that are stored in numeric columns in databases.
Compatible informats: Fw.
HEXw.
OCTALw.
BINARYw.

These formats write whole number values in computer-oriented numbering systems, base 16, 8, and
2, respectively.
Compatible informats: use the same names.
MMDDYY10.
YYMMDD10.
DATE9.

These formats write SAS date values in three different ways showing the year, month, and day. The
names of the MMDDYY and YYMMDD formats signal that they show year (YY), month (MM),
and day (DD) numbers in a particular sequence. MMDDYY writes dates in a style common in the
United States, while YYMMDD follows recent international conventions. The DATE format writes
dates in SAS style, with the day number, three-letter English month abbreviation, and year with no
punctuation, the same format you would use when writing a SAS date constant.
Compatible informats: use the same names.
YEAR4.
MONTH2.
YYMMD7.

These formats write the year and month of a SAS date value, separately and together.
DATETIME18.

The DATETIME format writes a SAS datetime value, adding a time of day to the date of the DATE
format.
Compatible informat: use the same name.
B8601DA.
E8601DA.
B8601DN.
E8601DN.

These four formats write dates according to the prevailing standard for network documents. The
formats with the B prefix (meaning “basic”) write the year, month, and day numbers with no
punctuation between them, like 20121221; those with the E prefix (meaning “extended”) add dashes
(hyphens) between, like 2012-12-21. The formats that end in DA write SAS date values; those that
end in DN write the date of a SAS datetime value.
Compatible informats: use the same names.
B8601DT.
E8601DT.

These two formats write timestamps according to the prevailing standard for network documents.
The format with the B prefix has no punctuation between the time elements, though there is a T
between the date and time, like 20121221T070000; the one with the E prefix adds dashes and colons,
like 2012-12-21T07:00:00.
Compatible informats: use the same names.

There are many more formats, particularly for SAS date values.
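To compare several of the date formats listed above, a sketch like the following displays one SAS date value four ways. The table WORK.DATE_DEMO is hypothetical; the date matches the one used in the format descriptions.

```sas
proc sql;
  /* Hypothetical one-row table holding a SAS date value */
  create table work.date_demo (event char(10), day num);
  insert into work.date_demo
    values ('Solstice', '21DEC2012'd);

  select
    event,
    day format=mmddyy10. label='MMDDYY10.',   /* 12/21/2012 */
    day format=yymmdd10. label='YYMMDD10.',   /* 2012-12-21 */
    day format=date9.    label='DATE9.',      /* 21DEC2012  */
    day format=e8601da.  label='E8601DA.'     /* 2012-12-21 */
  from work.date_demo;
quit;
```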
Value Formats
SAS lets you create formats of your own. The most useful of these are value formats that tell SAS to display specific
code values as the corresponding text values.
Value formats are defined in the VALUE statement of the FORMAT procedure. In its minimal form, the
statement provides a name for the new format, followed by pairs of values and labels, where the value is often a
code value, and the label is the name or description to display.
This example defines two simple value formats. ANY is a numeric format that displays 0 as No and 1 as Yes.
$ORIG is a character format that displays IE or ie as Domestic and any other nonblank value as Imported.
proc format;
value any
0 = 'No'
1 = 'Yes';
value $orig
'IE', 'ie' = 'Domestic'
' ' = ' '
other = 'Imported';
run;

After these formats are defined, use them in column definitions such as:
select
answer format=any.,
origin_code format=$orig. as origin,
. . .

These are points to consider when you write a VALUE statement to define a value format:

When you pick a name for a format, it should be short. It must be a valid name, and because
of the way formats are written with numeric arguments, the last character of the name
cannot be a digit.
The name of a character format must start with a dollar sign.
Value formats can provide labels for missing values.
Provide the same label for multiple values by writing a comma-separated list of values.
Mention each value only once — but if you do mention a value multiple times, it is the first
mention that matters.
Use the special value OTHER to provide text to display for any unexpected values. Or, in
the absence of a label for OTHER, unexpected values are displayed as themselves.
For value formats to work efficiently, the number of values you are formatting should be
relatively few — no more than a few thousand. (If the number of code values is around
5,000 or more, you are better off using a regular table join to associate code values with text
for display.)

It is just as easy to create a value format from an existing table. Create a view with the specific column names
required for this purpose, then use the CNTLIN= option to create the format from the view.
The view you use in this technique is considered a control data set. When you create a simple control data set,
these column names are expected:
FMTNAME

The format name

TYPE

For a numeric value format, the code 'N'. For a character value format, the code 'C'.

START

The data value

LABEL

The text to display
The example below starts with a table of airport codes and city names, similar to the four rows shown, though it
could contain any number of rows up to several thousand. The objective is to create a format that converts the codes
to the associated city names.

GEO.AERO_TABLE
IATA_code City
BNA Nashville
MCI Kansas City
MEM Memphis
STL St. Louis

The code below creates the format. The TYPE value is C to create a character format, named $AIRPORT.
proc sql;
create view work.airportcntl as
select
'AIRPORT' as fmtname,
'C' as type,
IATA_code as start,
City as label
from geo.aero_table;
quit;
proc format cntlin=work.airportcntl;
run;

When the program runs, these log notes indicate the success of the two steps:
NOTE: SQL view WORK.AIRPORTCNTL has been defined.
NOTE: Format $AIRPORT has been output.
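To see a format built this way in use, the following end-to-end sketch repeats the steps with tables in the WORK library and then applies the format in a query. The names WORK.AERO_DEMO, WORK.AEROCNTL, and WORK.FLIGHT_DEMO and the flight rows are hypothetical.

```sas
proc sql;
  /* Hypothetical stand-in for GEO.AERO_TABLE */
  create table work.aero_demo (IATA_code char(3), City char(20));
  insert into work.aero_demo
    values ('BNA', 'Nashville')
    values ('MCI', 'Kansas City')
    values ('MEM', 'Memphis')
    values ('STL', 'St. Louis');

  /* Control data set with the expected column names */
  create view work.aerocntl as
    select
      'AIRPORT' as fmtname,
      'C' as type,
      IATA_code as start,
      City as label
    from work.aero_demo;
quit;

proc format cntlin=work.aerocntl;
run;

proc sql;
  /* Hypothetical table that stores airport codes */
  create table work.flight_demo (flight num, dest char(3));
  insert into work.flight_demo
    values (101, 'MEM')
    values (102, 'STL');

  /* The codes are displayed as Memphis and St. Louis */
  select flight, dest format=$airport. label='Destination'
    from work.flight_demo;
quit;
```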

If you create a permanent data table in which a column’s format attribute refers to a value format, you must
ensure that the format is available whenever the table is used. There are two main strategies for this. Use either one:

Run the PROC FORMAT step that defines the format in every SAS session where the
format might be used. This could be in an autoexec program, as described in chapter 7.
Store the format in the LIBRARY library by defining a library with that libref, then adding
the LIBRARY=LIBRARY option to the PROC FORMAT statement. Then make sure the LIBRARY
libref is defined in every SAS session where the format might be used.

The system option FMTERR determines what happens if a needed value format cannot be located. With the
option FMTERR, SAS generates an error if a column’s format attribute refers to a format that SAS cannot locate.
With the NOFMTERR option, SAS gets around the problem by using a default format in place of the missing
format.
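The second strategy and the NOFMTERR option might be set up as in this sketch. The directory path is a placeholder, not a real location, and the ANY format is repeated from the earlier example.

```sas
/* Hypothetical directory; define this libref in every session that needs the format */
libname library "/path/to/formats";

proc format library=library;   /* store the format in the LIBRARY library */
  value any
    0 = 'No'
    1 = 'Yes';
run;

/* Optional: let SAS fall back to a default format instead of generating an error */
options nofmterr;
```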
6
Combining Tables
Often in SQL, one table is not enough. In relational database theory, it is expected that related columns are often
stored in separate tables. To put the columns side by side, queries need a way to put the tables together. SQL does
this by expanding the FROM, WHERE, and SELECT clauses to accommodate multiple tables. For the FROM
clause, SQL provides table join operators that indicate specific ways of joining two tables.
There are also situations that call for combining rows, rather than columns, from multiple tables. For these
situations, there are set operators, also known as result operators.
Some queries in SQL are made easier to write by placing one query inside another. The smaller query, called a
subquery, acts as a table or list in the larger query. Subqueries provide one of the few ways to break up a complex
SQL task into pieces.

Table Aliases
Before you can actually start putting tables together, you need to know how to use table aliases. When a query
involves two or more tables, you need to be more specific in identifying columns. The same column name could be
found in any number of tables, so you need a way to say which table a column is found in. That is what a table alias
does.
A table alias is a short name that identifies the table in the query. Programmers most often use the letters A, B,
C, and continuing through the alphabet as far as necessary.
Create table aliases by writing them after the table name in the FROM clause of a query. For example, to give
the table MAIN.PARTICIPANT the alias A, write:
main.participant a

In other clauses, identify columns with two-level names, formed with the table alias and column name. For
example, to refer to the column PARTICIPANT_NAME in the table MAIN.PARTICIPANT, write:
a.participant_name

A table alias is just a name you choose, but there are tighter restrictions on table aliases than on other names in
SAS SQL. You cannot use any SQL keyword as a table alias.
Though there are no specific syntax rules about using SAS keywords as aliases, it is a sensible precaution not to
use a SAS statement keyword, such as DATA, OPTIONS, RESET, TITLE, or ODS, as a table alias. This could
confuse a programmer reading the program, if they think they are seeing the beginning of a new statement.
Most programmers use single letters as table aliases. This approach is safe, as no single letter is a reserved word
in SQL, and it is easy to read, as the one-letter table aliases do not unduly distract from the column names they are
attached to.
Another common approach is to use a letter or word with a range of numeric suffixes, such as T1, T2, and so on,
as table aliases. This is also a safe approach; no SQL reserved word has a numeric suffix.
A table alias can be used together with the special column identifier *, which selects all available columns. To
select all columns from table A, write:
a.*
The FROM Clause With a List of Tables
When you need to combine data from multiple tables, it is almost as easy as writing a list of tables in the FROM
clause of a query. To write a query that draws columns from multiple tables using this approach:

1. List the tables, separated by commas, in the FROM clause.


2. After each table name, write an alias.
3. Use these table aliases to form two-level names for columns wherever they appear in the query expression,
as described above. Use these two-level names to select a list of columns in the SELECT clause.
4. In the WHERE expression, include the condition for combining rows between the two tables. Most often,
the requirement for combining rows is that the key columns match. For example, if the key columns that
connect two tables are DAY and STATE, the WHERE condition could be A.DAY = B.DAY AND A.STATE = B.STATE.

The query below is an example. It draws the columns NAME, AGE, and SUBSPECIES from the table
MAIN.TIGER and adds the column NATIVE_RANGE from the table GEO.TIGERPLACE. It forms rows only
when the values of TRINOMIAL match between the two tables. The selected column names appear as the column
headings in the output.
select
i.name,
i.age,
i.subspecies,
t.native_range
from main.tiger i, geo.tigerplace t
where i.trinomial = t.trinomial;

Suppose that the two tables contain the following data:

MAIN.TIGER
NAME     AGE  SUBSPECIES   TRINOMIAL
Scorpio  4    Indochinese  Panthera tigris corbetti
Felicia  5    Bengal       Panthera tigris tigris
Dea      5    South China  Panthera tigris amoyensis
Rio      9    Bengal       Panthera tigris tigris
Shift    14   Caspian      Panthera tigris virgata

GEO.TIGERPLACE
TRINOMIAL                  SUBSPECIES   NATIVE_RANGE
Panthera tigris altaica    Siberian     SE Siberia
Panthera tigris amoyensis  South China  SE China
Panthera tigris balica     Bali         Bali
Panthera tigris corbetti   Indochinese  SE Asia
Panthera tigris jacksoni   Malayan      Malay Peninsula
Panthera tigris sondaica   Javan        Java
Panthera tigris sumatrae   Sumatran     Sumatra
Panthera tigris tigris     Bengal       Bay of Bengal
Panthera tigris virgata    Caspian      C-SW Asia

From this data, the result set of the query is:

NAME     AGE  SUBSPECIES   NATIVE_RANGE
Dea      5    South China  SE China
Scorpio  4    Indochinese  SE Asia
Felicia  5    Bengal       Bay of Bengal
Rio      9    Bengal       Bay of Bengal
Shift    14   Caspian      C-SW Asia

Only rows that match are included when the two tables are joined. All five rows in MAIN.TIGER match rows in
GEO.TIGERPLACE, so all five rows are represented in the result set, if not in the same order. Only four of the nine
rows of GEO.TIGERPLACE match, so only those four rows are represented. Notice that the Panthera tigris tigris
row, which provides the value “Bay of Bengal,” is used twice because it matches two of the rows in MAIN.TIGER.
In a table join, the ID column you match on is present in both tables. Often, this column has the same name in
the two tables, so if you include both columns in the SELECT clause, you end up with a result set that has two
columns of the same name. This can cause confusion and make the resulting table harder to work with. Even if the
names are different, if the two columns provide the same value there is no advantage in having both. Write one ID
column or the other in the SELECT clause. Note that if you select all columns by writing SELECT *, this includes both
ID columns in the result set. For this reason, you will rarely see SELECT * in a table join. However, you may often see
a SELECT clause that begins with SELECT A.*, then goes on to add specific columns from table B.
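A sketch of the SELECT A.* pattern follows. It uses small copies of the two tiger tables in the WORK library; the table names WORK.TIGER_DEMO and WORK.TIGERPLACE_DEMO are hypothetical, and only a few of the rows shown above are included.

```sas
proc sql;
  create table work.tiger_demo
    (name char(10), age num, subspecies char(12), trinomial char(30));
  insert into work.tiger_demo
    values ('Scorpio', 4, 'Indochinese', 'Panthera tigris corbetti')
    values ('Felicia', 5, 'Bengal',      'Panthera tigris tigris')
    values ('Rio',     9, 'Bengal',      'Panthera tigris tigris');

  create table work.tigerplace_demo
    (trinomial char(30), native_range char(20));
  insert into work.tigerplace_demo
    values ('Panthera tigris altaica',  'SE Siberia')
    values ('Panthera tigris corbetti', 'SE Asia')
    values ('Panthera tigris tigris',   'Bay of Bengal');

  /* All columns from table A, plus one column from table B.
     TRINOMIAL appears once because only A's copy is selected. */
  select a.*, b.native_range
    from work.tiger_demo a, work.tigerplace_demo b
    where a.trinomial = b.trinomial;
quit;
```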
Writing the WHERE condition correctly is critical for a multi-table query, particularly when the query uses
tables that have more than a couple of rows. If the WHERE condition is incorrectly formed, the result of the query
could be an impossibly large number of rows, or it could be no rows at all.
If you are uncertain about the result of a table join, consider using an SQL option to limit the amount of
processing or the number of result rows. The most useful options for this purpose are OUTOBS= and LOOPS=.
These options are described in chapter 9. Write the options in the PROC SQL statement. For example, if queries
should result in about 1,000 rows, you might write the options:
proc sql outobs=10000 loops=200000;

These option settings are likely to let the query complete if it is running correctly, but will limit how much work the
query does if it is formed incorrectly.
When you start joining tables, it is important to avoid coding accidental Cartesian joins, as these can run for a
long time and produce a large number of output rows. In set theory, a Cartesian product is the combination of each
element in one set with each element in another set. If the numbers of elements in the two sets are a and b, then the
number of combinations is a × b, a much larger number. Cartesian products occur in SQL whenever a table join is
not properly controlled. A Cartesian join occurs most obviously when you combine two tables without a WHERE
clause. This results in all combinations of rows from table A with rows from table B. But Cartesian joins also occur
when a WHERE clause is formed incorrectly, in particular:

If a WHERE condition contains only rules for subsetting the data, and there are no controls
for the table join, the result is a Cartesian join of the subsets formed.
If the column used for a table join is not unique in either table, that is, if there are multiple
matching rows in both tables, the result for each value found in the column is a Cartesian
join of the available rows within the two tables for that value.
If several tables are joined and one of the tables is not represented in the table join condition,
the result is a Cartesian join of that table with the other joined tables.
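
The first of these cases can be seen in a short sketch that reuses the tiger tables from the earlier example. The subspecies value and the OUTOBS= setting are chosen only for illustration. The first query has a WHERE condition that subsets MAIN.TIGER but never connects the two tables, so every selected tiger is paired with every row of GEO.TIGERPLACE. The second query adds the condition that matches rows between the tables.

```sas
proc sql outobs=100;   /* cap the output in case a join is malformed */

/* Accidental Cartesian join: the WHERE condition only subsets the data */
select i.name, t.native_range
from main.tiger i, geo.tigerplace t
where i.subspecies = 'Bengal';

/* Controlled join: the WHERE condition also matches rows between tables */
select i.name, t.native_range
from main.tiger i, geo.tigerplace t
where i.subspecies = 'Bengal'
  and i.trinomial = t.trinomial;

quit;
```

When a query like the first one runs, SAS writes a note in the log saying that the query involves a Cartesian product join, which is a useful signal to look for when you test a new join.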

One approach SQL programmers use to avoid Cartesian joins is to copy table joins from existing queries that are
known to work correctly. When databases are designed, the routine table joins are part of the design, so it makes
sense that programmers might use variations on the same table joins over and over.
When you develop a new table join, test it before you rely on it. The test can involve running the table join on a
small volume of data, with no more than 100 rows in any one table. Or, you can test with options such as the
OUTOBS= and LOOPS= options noted above to limit the scale of the query.
Most table joins can be written using table join operators, discussed next, and many programmers find that this
more structured approach to joining tables is less error-prone. Even with table join operators, though, it is possible to
form accidental Cartesian joins, so it is still important to test new joins before you rely on them.
Table Join Operators
Table join operators represent more specific ways to combine tables. By providing specific conditions for joining
each table, they eliminate some of the risk of coding error you face when you write a simple list of tables in the
FROM clause. Table join operators also create the possibility of including nonmatching rows in the result.
All expressions using table join operators are written in essentially the same way. Each operator requires two
tables as operands, one written before the operator, or on the left, and one after the operator, or on the right. Each
table, in turn, should be followed by a table alias. The table alias is not a requirement of SQL syntax, but it is a practical
necessity when using columns from the table. At the end of the table join expression, you will usually need the
keyword ON with a condition that describes the join mechanism, or the way rows of one table are connected to rows
of the other table. This results in a table expression of this form:
table 1 alias 1
table join operator
table 2 alias 2
ON condition

The join condition is most often looking for a match between columns in the two tables. If the table aliases are A
and B and rows are matched by a single column, the expression reduces to:
table 1 A
table join operator
table 2 B
ON A.column = B.column

The INNER JOIN operator is the one that produces the most familiar form of table join, which includes only
matching rows. The earlier tiger example could equivalently be written:
select
i.name,
i.age,
i.subspecies,
t.native_range
from
main.tiger i
inner join
geo.tigerplace t
on i.trinomial = t.trinomial;

Several table join operators provide different ways of joining two tables. These are the table
join operators, most of which can be written in two ways:
INNER JOIN
JOIN

An inner join includes all combinations of matching rows between the two tables. Rows in each
table that do not match the other table are excluded.
Usually tables are joined by unique key values, so that no more than one row in one table matches a
particular value from the other table. In this case, there is a one-to-one or one-to-many match, and
the table join does not form new rows. In a one-to-one match, rows are merely put next to each
other. In a one-to-many match, a row from one table may be repeated as it is matched to multiple
rows in the other table.
The other possibility is a many-to-many match, which results in a larger number of rows as rows
from each table are repeated as many times as it takes to form all matching combinations. Many-to-
many matches should be avoided except in cases where you have reason to expect that the number
of matching combinations will not be excessively large.
LEFT OUTER JOIN
LEFT JOIN

The idea of an outer join is to include rows that don’t match. The most common outer join is a left
outer join. All rows from the left table, the one written before the table join operator, are included in
the result whether they match the other table or not. But rows in the right table are included only
when they match.
If no matching row is available from the right table, its columns get null values in that row.
RIGHT OUTER JOIN
RIGHT JOIN

A right outer join works the same way as a left outer join, except that the unmatched rows added
to the result come from the right table, the one listed after the operator. The right outer join is
not seen often because it is equivalent to a left outer join with the order of the tables reversed,
and in a few special cases, table joins are more efficient when formed using the LEFT OUTER
JOIN operator.
FULL OUTER JOIN
FULL JOIN

In a full outer join, nonmatching rows from both tables are added to the result.
NATURAL INNER JOIN
NATURAL JOIN
NATURAL FULL OUTER JOIN
NATURAL FULL JOIN
NATURAL LEFT OUTER JOIN
NATURAL LEFT JOIN
NATURAL RIGHT OUTER JOIN
NATURAL RIGHT JOIN

Natural joins are modified versions of inner and outer joins. Each of the four natural join operators
can be written in two different ways, as shown above.
Form a natural join operator by writing the word NATURAL before an inner or outer join operator.
The resulting natural join is equivalent to an inner join or outer join matching on all columns that
the two tables have in common.
When you write a natural join, you do not write an ON clause. Instead, SAS generates an ON
condition that matches all columns that have the same name and data type in the two tables. The
result of a natural join is exactly the same as the result of an inner or outer join written with the
equivalent ON condition.
Natural joins might make sense in certain tightly designed databases, but they are not seen often in
SAS because they clash with the usual SAS philosophy about data. Programmers working in SAS
tend to see data files as potentially changeable. To take one example, it is not hard at all to change a
SAS program (or its input data) so that the tables it creates have additional columns. Adding an
extra column, though, may completely change the effect and result of a natural join. Of particular
concern, if rows form differently when the tables are joined, the number of rows in the result set
could increase significantly. Given the way data is used in SAS, it is preferable to write ON clauses
explicitly, and not to write SQL with natural joins.
CROSS JOIN

A cross join forms all combinations of rows between the two tables. That is, if the table aliases are
A and B, each row in table A is combined with each row in table B. When you write a cross join,
there is no ON condition for joining the tables.
A cross join forms a Cartesian product, equivalent to listing the two tables in the FROM clause with
no WHERE condition. The main difference is that the CROSS JOIN operator shows that this result
is specifically intended, and is not an excessive result set caused by a programmer’s error of
omission. A Cartesian product usually implies a large result set. The number of rows in the result set
is the product of the row counts of the two tables.
The one special case in which a cross join does not create any additional rows in the result set is
when one of the tables has only one row. In this situation, the cross join provides a way to add
values that are the same in all rows.
The most common use of cross joins in SAS is to create a table that has all possible combinations of
values for a list of columns used as categorical variables. The result set provides all of the possible
categories, including ones that may not have occurred in the available data.
UNION JOIN

A union join combines two tables without matching rows. With this table join, there is no ON
clause.
A union join is similar to a full outer join in which none of the rows match. It is also somewhat like
concatenating two SAS data sets using the SET statement of the data step. In a union join, no rows
are matched, so the number of rows in the result set is the sum of the row counts of the two tables.
Columns also are not combined, even if they have the same name and data type. However, you can
consolidate columns between the two tables by writing expressions using the COALESCE function,
described in chapter 3.
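
As a minimal sketch of the one-row case of a cross join, suppose a hypothetical table WORK.PARAMS holds a single row with the column TAXRATE, and a hypothetical table WORK.SALES has the columns ID and AMOUNT. Neither table comes from the examples in this book. The cross join attaches the one TAXRATE value to every row of WORK.SALES without adding any rows.

```sas
proc sql;
/* WORK.PARAMS is assumed to contain exactly one row */
select s.id, s.amount, p.taxrate,
       s.amount * p.taxrate as tax
from work.sales s
cross join work.params p;
quit;
```

If WORK.PARAMS ever held two rows, the result would have two rows for every sale, so this pattern depends on the one-row assumption.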

Changing the earlier tiger example to a left outer join would allow a row from MAIN.TIGER to be included in
the result even if its value of TRINOMIAL was not found in the other table, GEO.TIGERPLACE. This is
accomplished with a small change in the FROM clause:
select
i.name,
i.age,
i.subspecies,
t.native_range
from
main.tiger i
left outer join
geo.tigerplace t
on i.trinomial = t.trinomial;

A full outer join makes sense when you are combining two groups that overlap. Imagine that DOT.DRIVERS
contains information about licensed drivers and DOT.OWNERS contains information about vehicle owners, with
each table containing only one row on each person. The two tables have most of the same people, but there are also
people who are in just one table or the other: they drive but don’t own a car, or they own a car but don’t drive it.
With a full outer join you can combine these two tables so that you have the combined information on people who
own or drive a vehicle. Write the FROM clause like this:
from
dot.drivers a
full outer join
dot.owners b
on a.driver = b.owner

ID Columns When Joining Tables


An inner join or outer join between two tables ordinarily implies that the tables have an ID column in common.
Usually you will want to include the ID column in the result set, which means listing the ID column in the SELECT
clause. However, you do not necessarily need to get the same ID column from both tables.
In an inner join, the two ID columns are equivalent, so select either one.
In a left outer join, select the ID column from the left table. Why? In rows that are not present in the right table,
the ID column is also not present in the right table. Selecting the ID column from the right table would result in null
values in rows that are present only in the left table. The ID column from the left table is always present, though, so
you can rely on that column. This pattern is indicated in the following code model:
select
a.id,
other columns
from
table 1 a
left outer join
table 2 b
on a.id = b.id;

By the same logic, select the ID column from the right table in a right outer join.
A full outer join should consider both ID columns, as either ID column may be null in some rows. Use column
aliases to tell the two IDs apart, for example:
select a.id as id_a, b.id as id_b, . . .
from
table 1 a
full outer join
table 2 b
on a.id = b.id;

To form an ID column that is always there, combine the two IDs with the COALESCE function, as shown here:
coalesce(a.id, b.id) as id

In the WHERE clause, the ID columns can be used to distinguish rows that are present in a particular source
table. Taking the case of a left outer join, if the ID column from the right table is not null, that means that the ID
value found a match in that table. If the right table is table B and the tables are joined on the column ID, then this
expression tests for the row’s presence in the right table:
b.id is not null

Use this condition, along with the complementary condition B.ID IS NULL, to test for conditions that depend on the
presence or absence of data from table B. For example, if rows that are present in table B should have the code value
A in the column B.ACTIVE to be included, test for this using the condition:
b.active = 'A' or b.id is null

In a full outer join, test either table’s ID columns for null values in the same manner. Note that you can use the
columns in the WHERE clause even if they are not selected separately in the SELECT clause.
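
A common use of this test is to find the rows of one table that have no match in another. The sketch below assumes two hypothetical tables, WORK.LIST1 and WORK.LIST2, each with an ID column that is never null in the table itself. The left outer join keeps every row of WORK.LIST1, and the WHERE condition then keeps only the rows where no match was found.

```sas
proc sql;
/* IDs that are in LIST1 but not in LIST2 */
select a.id
from work.list1 a
left outer join work.list2 b
on a.id = b.id
where b.id is null;
quit;
```

Changing the condition to B.ID IS NOT NULL would instead return only the matching rows, the same rows an inner join would produce.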
It is easy to create indicator columns, when needed, that tell you whether a row is present in either table in a join.
In the following example, that is all there is to the query. This query consolidates two lists of ID values, while
keeping track of which of the two lists each value was found in.
select
coalesce(a.id, b.id) as id,
case when a.id is not null then 'Y' else 'N' end
as inlist1,
case when b.id is not null then 'Y' else 'N' end
as inlist2
from
work.list1 a
full outer join
work.list2 b
on a.id = b.id;

Internal and External Identifiers


Tables connect to each other by matching on key identifying columns. These are often internal identifiers, arbitrary
codes or numbers created just for the purposes of connecting the data. This is especially likely when people and
personal accounts are involved. The external (or natural) identifiers that identify people in the outside world cannot
always be used as key columns because of security and data integrity issues. As the most obvious example, imagine
using people’s names to form a key column. Problems might immediately arise with:

Uniqueness. Multiple people could have the same name. This would lead to incorrect results
in joining tables.
Variability. A person could use different forms of the same name, or could take on a new
name, causing rows not to match between tables.
Privacy and security. There might be a business reason or a privacy obligation not to
disclose the person’s name, creating restrictions around that column. When there are
security concerns surrounding a key column, it can limit the ability to use any of the data in
the table.

Personal data may be the most sensitive, but there are similar issues whenever external identifiers are used as
table columns. To avoid these problems and others, database designers may create internal identifiers for all key
columns. The names of these columns often end with ID, KEY, or NUM.
A common scenario, then, is that the external identifiers are stored in a separate table, and when you need to
include one of these columns in a query, you link to that table using the internal identifier as the key. Suppose that
CORP.CUSTOMER contains business data on customers and CORP.CUSTOMERCONTACT contains customers’
personal information such as addresses. A table join for a query that uses columns from both tables might be:
corp.customer a
left outer join
corp.customercontact b
on a.customer_id = b.customer_id

The codes or numbers used as internal keys don’t convey any meaning, and this can make the table joins harder
for you to trace visually. On the other hand, the internal key values are usually compact 4- or 8-byte codes, and that
may result in faster table joins.

Joining Three or More Tables


The result of a table join operator can be joined to another table, effectively producing a join of three tables. By
adding more table join operators you can join as many tables as are needed.
The classic scenario is a series of left outer joins. These have the effect of adding columns from additional
tables. As long as the joins are made on unique identifying columns, no additional rows are created by the
successive table joins.
The following example shows a model FROM clause for one possible pattern of a join of four tables.
from table1 a
left outer join
table2 b
on a.id1 = b.id1
left outer join
table3 c
on a.id1 = c.id1 and a.id2 = c.id2
left outer join
table4 d
on c.id3 = d.id3

The succession of table join operations in this FROM clause might look complex at first glance, but it is a fair
representation of the way four or five tables are likely to go together. Note that the matching columns do not have to
be the same in every table — indeed, in most data it wouldn’t make much sense to have a large number of tables that
match on the same key column.
In this example, you might think of TABLE1 as the base table, which supplies the rows of the result set. This
table is joined to TABLE2, then the result of that join is joined to TABLE3, then that result is joined to TABLE4.
When there are multiple table join operators, they proceed in the order written — or you can use parentheses for
grouping to specify any sequence of joins.
Joining more tables can take more time to run. For speed, when you can, write a table that defines a small subset
of the data (based on a WHERE condition that only a small fraction of rows will meet) early in a series of table
joins, especially when the joins are left outer joins. Depending on the assumptions SAS makes when it optimizes the
query, this could make the query run faster.

Groups and Summary Functions When Joining Tables


In a table join, you can freely combine summary functions from one table with detail data from another table. To ensure
that this works correctly:

List all of the detail columns in the GROUP BY clause.


For the columns you are joining on, be careful to use the same table alias for the column in
the SELECT and GROUP BY clauses.

This is a simplified example to illustrate these points:


select b.id, b.name, sum(a.volume) as total
from
work.catalog a
inner join
work.activity b
on a.id = b.id
group by b.id, b.name;

In this example, it matters that the first column in the GROUP BY clause is B.ID, as listed in the SELECT clause,
and not A.ID. The two columns might seem to be interchangeable, especially when you see the ON clause indicating
a.id = b.id, but A.ID cannot be confused with B.ID in this context.

Self Joins
Nothing prevents you from using the same table on both sides of a table join operator. Any such join is considered a
self join, and it is a topic of special interest in relational database theory.
A self join makes sense as soon as you realize you need to look at the same table in two different ways. As an
example, imagine a table LANG.STUDENT with information about language students. Suppose you want to make a
list, linked to each student, of the other students in the same town studying the same language. This requires looking
at the table twice — first with a focus on the student, and then looking at towns and languages.
select a.student, a.town, a.language,
b.student as nearbystudent
from lang.student a
inner join lang.student b
on a.town = b.town and a.language = b.language
and a.student <> b.student;

Notice that the join shown above is a many-to-many join, forming a Cartesian join of the students within each
combination of town and language, and is practical only if the number of students studying the same language in any
one town is not unduly large. Note also the importance of the last part of the ON condition, which is necessary to
avoid the reflexive link of a student to him- or herself.
Self joins highlight the necessity of aliases when doing table joins. Without the table aliases A and B in this
example, there wouldn’t be any way to know which columns you were getting. The column alias in this example is
also essential in keeping the columns straight.
These are other examples of familiar questions that may require self joins in SQL:

In social networking data, finding mutual connections, shared connections, and second-level
connections.
In a family tree, identifying indirect relatives such as grandparents and grandchildren.
Joining columns of a table with totals of the same columns, computed with the SUM
function, in order to compute each row’s share of the total.
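
The family tree case can be sketched with a hypothetical table WORK.FAMILY that has one row for each parent-child link, with the columns PERSON and PARENT. The table and column names are assumptions made for this illustration. The table is read once as the list of children and once as the list of their parents, and the parent's own parent is the grandparent.

```sas
proc sql;
/* c = the person as a child, p = that person's parent as a child */
select c.person, p.parent as grandparent
from work.family c
inner join work.family p
on c.parent = p.person;
quit;
```

A person with two recorded parents, each of whom has two recorded parents, appears in four rows of the result, one for each grandparent.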

Subqueries
Anywhere in a query expression where a table is needed, you can write a subquery instead. A subquery is a query
written in parentheses and used as a component in another query. The result set of the subquery is in the form of a
table, so it makes sense that a subquery can take the place of a table in a query. Perhaps less obviously, a subquery
that results in a single column can serve as a list in a query, and a subquery that returns a single value can serve as a
value in a query.
To consider the simplest example, by replacing a table with a subquery in a FROM clause, you may change a
query like SELECT * FROM table to
SELECT * FROM (
query
)

Suppose you want to count the number of distinct combinations of the columns DEPT and VENDOR in the table
CORP.SOURCE. To count the number of distinct values in one column, you could use the COUNT function, for
example:
select count(distinct vendor) as n
from corp.source

This approach can’t be used with two columns together, though. Instead, you can write:
select count(*) as n
from (
select distinct dept, vendor
from corp.source
)

In this query, the subquery is:


select distinct dept, vendor
from corp.source

The subquery uses the DISTINCT keyword to create a set of the distinct combinations of the two columns. Then, the
COUNT(*) function counts the number of rows in that set.

One way to think about subqueries is that they produce the same result as if you had saved the result of the
subquery as a table in a separate CREATE TABLE statement. The previous example could alternately have been
written as:
create table work.supplylink as
select distinct dept, vendor
from corp.source;
select count(*) as n
from work.supplylink;

Besides having two statements instead of one, this approach obliges SAS to store the first result set as a table. This
could be more work, and it could take longer to do, than the approach that SAS’s SQL optimizer comes up with for
the combined query.
The above example also works if WORK.SUPPLYLINK is a view — changing the CREATE TABLE statement
to a CREATE VIEW statement. In principle, there is little difference between executing a subquery and executing a
view. In recognition of this similarity, subqueries are also referred to as inline views.
Reading a query that has another query in the middle of it takes some getting used to. Subqueries are useful,
though, allowing you to break a complex task into pieces while still coding it as a single query.

Subqueries in Table Joins


Subqueries are often seen in table joins. Using the INNER JOIN operator as an example, the query expression turns
into:
select columns
from
(
query 1
) a
inner join
(
query 2
) b
on a.column = b.column

When you write a table join with a subquery, make sure you use the columns of the subquery result set in the ON
condition or a WHERE condition that comes after it. These conditions take place after the subquery is already
resolved, with the subquery result set serving as the source table at this point, so only the columns defined in the
SELECT clause of the subquery are available for use.
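
As an illustration of this point, the sketch below joins the tiger tables used earlier in the chapter, with a subquery that counts the tigers for each value of TRINOMIAL. The column alias TIGERS is made up for this example. Because the subquery selects only TRINOMIAL and TIGERS, those are the only columns that can be referred to with the alias N. A reference such as N.NAME would be an error, even though NAME is a column of MAIN.TIGER.

```sas
proc sql;
select t.trinomial, t.native_range, n.tigers
from geo.tigerplace t
left outer join
  ( select trinomial, count(*) as tigers
    from main.tiger
    group by trinomial ) n
on t.trinomial = n.trinomial;
quit;
```

With the left outer join, a row of GEO.TIGERPLACE that has no tigers in MAIN.TIGER still appears in the result, with a null value in TIGERS.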
A query can be formed from multiple subqueries drawing from the same table. To form a table of all the possible
combinations of categorical columns in a table, write a query using cross joins of subqueries drawn from the table.
The subqueries in the example below read the table WORK.AUTO, which contains rows describing individual
vehicles, and extract lists of distinct values in three categorical columns. These three lists are then joined with the
CROSS JOIN operator to form the final table, WORK.AUTOCAT. The ORDER BY clause ensures that the output
data is in the most useful order.
create table work.autocat as
select a.bodystyle, b.color, c.sizeclass
from
( select distinct bodystyle from work.auto
where bodystyle is not null ) a
cross join
( select distinct color from work.auto
where color is not null ) b
cross join
( select distinct sizeclass from work.auto
where sizeclass is not null ) c
order by a.bodystyle, b.color, c.sizeclass;

If in WORK.AUTO, BODYSTYLE has 8 different values, COLOR has 13, and SIZECLASS has 5, then the
resulting table WORK.AUTOCAT has 520 rows representing all combinations that can be formed from the original
values. Probably most of these will be combinations that do not exist in the source table.

Subquery as Column Expression


A query that returns a single value can be used as a column expression in a query. The effect is similar to that of a
constant value in a SELECT clause. The resulting column has the same value in every row. To illustrate this,
consider the following:
select placename,
'New Jersey' as statename
from work.njplaces

If you want to obtain the STATENAME value from a table instead of writing it as a constant, this query might be
written instead as:
select placename,
(select statename from main.stateinfo
where statecode = 'NJ') as statename
from work.njplaces
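
A single-value subquery can serve as a value in a WHERE condition in the same way. As a minimal sketch using the tiger table from earlier in the chapter, this query selects the tigers that are older than the average age of all tigers in the table:

```sas
proc sql;
select name, age
from main.tiger
where age > (select avg(age) from main.tiger);
quit;
```

The subquery here returns one row with one column, so it can be compared to AGE just as a constant could.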

The IN Operator With a Subquery


The IN operator compares a column (or column expression) to a list of values. The result is true if the column
matches one of the values in the list. It is false if the column does not match any of the values. In the simplest use of
the operator, the list is a specific list of constant values, for example:
a.type_code in ('081', '083', '104', '106')

For a larger or changeable list, the list can instead be a subquery that returns a list of values. The result of the
subquery should be a single column of the same data type as the value that is being compared. If the column
APPROVED_TYPE contains code values in a table called ACTLIST, then this could be an expression using the IN
operator:
a.type_code in
(select approved_type from actlist
where status = 'Active')

This expression evaluates as true for values of TYPE_CODE that are found in the subquery result set, false for
values that are not found.
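
Placed in a complete query, the expression might look like the following sketch. The table WORK.TRANS and its ID column are hypothetical names used only to give the alias A something to refer to.

```sas
proc sql;
select a.id, a.type_code
from work.trans a
where a.type_code in
  (select approved_type from actlist
   where status = 'Active');
quit;
```

Writing NOT IN in place of IN would select the complementary rows, those whose TYPE_CODE value is not in the list.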

Set Operators
A set operator (or result operator) combines the result sets of two queries, applying a version of a mathematical set
operation. The most basic set operator is UNION ALL, which concatenates the two complete result sets.
At first glance, set operators might seem similar to table join operators, but there are three essential differences:

Set operators work on result sets. Their operands are written as complete query expressions.
Table join operators work on tables or equivalent data sources such as views and subqueries.
Table join operators usually match rows between tables based on key columns. When set
operators match rows between sets, they look at whether the entire row is identical.
You might think of using table join operators to add columns to a result set. Set operators
focus instead on rows, which they may add or remove. A set operator works best if the two
sets have exactly the same columns.

Combine two entire sets with the UNION ALL operator. To visualize the effect of the UNION ALL operator, it
may help to imagine combining two subsets drawn from the same table.
Consider the tiger data shown earlier. The two queries below select two subsets of the table MAIN.TIGER, as
shown.
select name, age, subspecies, trinomial from main.tiger
where subspecies = 'Bengal'
;

NAME     AGE  SUBSPECIES   TRINOMIAL
Felicia  5    Bengal       Panthera tigris tigris
Rio      9    Bengal       Panthera tigris tigris

select name, age, subspecies, trinomial from main.tiger
where subspecies = 'South China' and age >= 5
;

NAME     AGE  SUBSPECIES   TRINOMIAL
Dea      5    South China  Panthera tigris amoyensis

The two result sets are combined with the UNION ALL operator. The combined query is simply the two
complete queries written with UNION ALL between them:
select name, age, subspecies, trinomial from main.tiger
where subspecies = 'Bengal'
union all
select name, age, subspecies, trinomial from main.tiger
where subspecies = 'South China' and age >= 5
;

The resulting output is an equally simple combination of the two result sets:
NAME     AGE  SUBSPECIES   TRINOMIAL
Felicia  5    Bengal       Panthera tigris tigris
Rio      9    Bengal       Panthera tigris tigris
Dea      5    South China  Panthera tigris amoyensis

This example is perhaps not the most profound use of the UNION ALL operator, as you might have obtained the
same rows, if not in the same order, with the OR operator in a single WHERE condition, writing the query as:
select name, age, subspecies, trinomial from main.tiger
where subspecies = 'Bengal'
or (subspecies = 'South China' and age >= 5);

The example nevertheless demonstrates the idea that set operators are ideally used with sets that look like they go
together. They don’t have to actually come from the same table, but they should look like they belong in the same
table, ideally with all of the same columns in the same sequence.
Here are a few examples of situations that might call for the use of the UNION ALL operator:

Adding a summary row to the end of a table. If you do this, create blank or null columns
where necessary so that the summary row has all the same columns as the other rows.
Combining subsets from the same table that are selected with different columns (perhaps
also having WHERE conditions that refer to different columns). As one possibility, you
might combine data showing actual measurements for past time periods with data showing
projections for future time periods.
Combining similar data from two different tables.
Summarizing different subsets in different ways. For example, a U.S. retailer might
summarize U.S. sales by state and all other sales by country.

In addition to UNION ALL, the set operators that are intended for sets where all the columns are the same are
UNION, INTERSECT ALL, INTERSECT, EXCEPT ALL, and EXCEPT.
UNION

UNION is the same as UNION ALL, except that it sorts the resulting set and removes duplicate
rows. This effect is similar to that of writing DISTINCT before the list of columns in a SELECT
clause. Two rows are considered equal only if all values are the same. Because of the sorting
involved, UNION can take several times as long to run as UNION ALL. In eliminating duplicates,
the UNION operator more closely follows the idea of the union operation of set theory. The UNION
operator is probably the operator you will use if you are building a list for future action, such as a
list of customers who qualify for an offer or a list of problem areas for further investigation.
INTERSECT ALL
INTERSECT

INTERSECT ALL includes only rows that the two sets both contain. INTERSECT is the same, but
removes duplicate rows from the result. The INTERSECT operator is based on the intersection
operation of set theory. An example of the use of the INTERSECT operator is in a merger of two
companies, to find the customers that the two companies have in common.
EXCEPT ALL
EXCEPT

EXCEPT ALL includes rows of the first set after removing any rows that are also found in the
second set. Where there are duplicate rows, it takes one occurrence of the row in the second set to
cancel out each occurrence of the row in the first set. EXCEPT is the same, but it removes duplicate
rows from the two sets before comparing. The EXCEPT operator is based on the relative
complement operation in set theory. An example of a likely use for the EXCEPT operator is when
you are building a list of customers who must be notified of a change, and you want to exclude from
the list any customer who has been notified already.
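
The notification scenario can be sketched as follows, assuming two hypothetical tables, WORK.AFFECTED and WORK.NOTIFIED, that each have a CUSTOMER_ID column. The table names and the output table WORK.TONOTIFY are made up for this illustration.

```sas
proc sql;
/* customers affected by the change who have not yet been notified */
create table work.tonotify as
select customer_id from work.affected
except
select customer_id from work.notified;
quit;
```

Because EXCEPT removes duplicates, each customer appears in WORK.TONOTIFY at most once, even if WORK.AFFECTED lists the customer several times.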
An expression involving a set operator is itself a query expression, so it can be used in a SELECT statement, a
CREATE TABLE statement, or a CREATE VIEW statement. It can also be used as a subquery. You might write a
set operation as a subquery in order to sort or summarize the results of the set operation. This example sorts the rows
that result from a UNION ALL operator:
select * from (
select name, age, subspecies, trinomial from main.tiger
where subspecies = 'Bengal'
union all
select name, age, subspecies, trinomial from main.tiger
where subspecies = 'South China' and age >= 5
) order by age;

Assuming that columns match correctly, the UNION, INTERSECT ALL, and INTERSECT operators are
associative and commutative, so you can use the operators multiple times in any sequence to form the union or
intersection of three or more sets. The same property holds for the UNION ALL operator if you are not concerned
about the sequence of rows in the result. That is, the following two queries return the same rows, if not in the same
order:
create table work.result1 as
select col1, col2, col3, col4 from table1
union all
select col1, col2, col3, col4 from table2
union all
select col1, col2, col3, col4 from table3
;

create table work.result2 as
select col1, col2, col3, col4 from table3
union all
select col1, col2, col3, col4 from table2
union all
select col1, col2, col3, col4 from table1
;

The EXCEPT or EXCEPT ALL operator also can be used with three or more sets. It matters what set you write
first — this is the set that provides rows that may appear in the result — but the sequence of the remaining sets does
not affect the result.
For more complex combinations of sets you can write expressions that use two or more different set operators
together with three or more sets. If you do this, use parentheses to show which operations should happen first.
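As a sketch of mixing operators, the query below returns the rows that appear in exactly one of two tables. WORK.A and WORK.B are hypothetical tables assumed to have the same columns.

```sas
proc sql;
   /* The parentheses make each EXCEPT run first;
      the UNION then combines the two results. */
   (select * from work.a
    except
    select * from work.b)
   union
   (select * from work.b
    except
    select * from work.a);
quit;
```

Without the parentheses, the result would depend on the order in which SAS evaluates the operators, which is harder for a reader to verify.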

Set Operators When Columns Aren’t Identical


If two sets don’t have exactly the same columns, you may still be able to conduct set operations on them. There are
four scenarios:

The sets have the same columns in the same order but with different column names.
Differences in column names don’t matter. The set operators use the column names from the first set
and disregard the column names used in the subsequent sets. The data types must still be
compatible.
You want to combine sets that may have no columns in common.
Use the OUTER UNION operator. This operator forms the same rows as the UNION ALL operator,
but it places each set’s columns in separate columns in the result set, even if columns have the same
names. (OUTER UNION is a nonstandard extension to SQL. It has essentially the same effect as the
UNION JOIN table join operator, described earlier in the chapter.)

The sets have the same column names and data types, but the columns are not necessarily in the same
order.
Use the CORRESPONDING modifier, also written as CORR. You can write CORRESPONDING
after any of the set operators. SAS then connects columns between sets according to name rather
than position.
The two sets in a union operation do not have exactly the same columns.
Use the OUTER UNION operator with the CORRESPONDING modifier so that SAS identifies columns by
name. SAS then includes all columns it finds in either table, with null values where a set does not
supply a column. This is a nonstandard extension to SQL and something that the major relational
DBMSs do not try to do, but it is a natural adaptation to the way SAS ordinarily works with data.
With the other set operators, CORRESPONDING keeps only the columns whose names appear in both
sets. Keeping unmatched columns would not really work with the INTERSECT and EXCEPT operators,
as their effects are based on finding identical rows in the two sets. Rows cannot easily be identical
if they do not have exactly the same columns to begin with.

OUTER UNION CORRESPONDING

The CORRESPONDING modifier changes the OUTER UNION operator in a way you
might not expect. With the OUTER UNION CORRESPONDING operator, SAS collapses
columns that have the same name (and data type), so that the columns of the two tables are
not necessarily kept in separate columns in the result, as you would otherwise expect with
OUTER UNION. The CORRESPONDING modifier, then, takes away much of the
distinction between UNION ALL and OUTER UNION. When the two sets have the same
column names, UNION ALL CORRESPONDING and OUTER UNION CORRESPONDING
produce identical results. When a column appears in only one set, OUTER UNION
CORRESPONDING keeps it and UNION ALL CORRESPONDING drops it.
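A minimal sketch of OUTER UNION CORRESPONDING follows. The tables WORK.Q1 and WORK.Q2 and their columns are hypothetical; they share the column NAME but otherwise differ.

```sas
proc sql;
   /* Two small tables with partly overlapping columns */
   create table work.q1 (name char(8), sales num);
   create table work.q2 (region char(4), name char(8));
   insert into work.q1 values ('Ann', 10);
   insert into work.q2 values ('East', 'Bob');

   /* NAME is matched by name and collapsed into one column;
      SALES and REGION are kept, with nulls where a table lacks them */
   select * from work.q1
   outer union corr
   select * from work.q2;
quit;
```

This form is often used to stack tables, much as a SET statement does in a data step.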

Set Operator Results When One Set Has No Rows


Set operators proceed even if one or both sets have no rows. If both sets have no rows, the result has no rows. If the
first set has no rows:

Any union operator results in the rows of the second set. For the UNION ALL and UNION
operators, it is worth noting that even though the values come from the second set, the
column names are taken from the first set.
Other operators result in no rows.

If the second set has no rows:

The INTERSECT operators result in no rows.
Other operators result in the rows of the first set.

It is expected that set operators may sometimes be used on sets that have no rows. Consider the case of a
program that runs on a regular schedule, with a history table that reflects the results of previous runs. The UNION
ALL operator might be used to add in historical results, or the EXCEPT operator might be used to exclude historical
transactions from repeated processing. In either case, the history table is empty the first time the program runs.

Combining Summary Data


The UNION ALL operator can be used to combine summary data that comes from multiple sources. The result,
when resummarized, forms a combined summary.
Note:
Only a few common statistics can correctly be resummarized. Mainly, use the SUM, MAX,
and MIN statistics.
Some statistics can be used because they are strictly cumulative: the frequency statistics N
(or COUNT) and NMISS, and to prepare to compute more advanced statistics, USS. Use the
SUM function to resummarize these statistics.
Compute other statistics at the end: mean as the ratio of sum and n, range as the difference
of maximum and minimum.
It is important to join the summary data with the UNION ALL operator rather than UNION.
The UNION operator deletes duplicate rows, which is not a valid action to take on summary
data.

The following example illustrates these points. The tables MAG.EDITORIAL and MAG.COMMENT represent
two sources of content that appear on a magazine web site. There is a separate query for each table. The WHERE
clauses pick out rows that describe photos, and both queries count photos and compute the sum and maximum of
PIXEL_COUNT, grouping the results by DEPARTMENT. These results are combined with the UNION ALL
operator, then resummarized in the outer query. At that stage, an average of PIXEL_COUNT is computed.
select
department,
sum(photo_count) as photo_count,
max(largest_pixel_count) as largest_pixel_count,
sum(pixel_total)/sum(photo_count) as avg_pixel_count
from (
select
department,
count(*) as photo_count,
max(pixel_count) as largest_pixel_count,
sum(pixel_count) as pixel_total
from mag.editorial
where content_type = 'image'
group by department
union all
select
department,
count(*) as photo_count,
max(pixel_count) as largest_pixel_count,
sum(pixel_count) as pixel_total
from mag.comment
where content_type = 'image'
group by department
)
group by department;
7
Working With SAS Data
SQL is not just for querying data, but also for managing it. The idea of managing data covers a wide range of
possible actions, and most of the statements in SQL syntax are designed to provide these actions. SAS has its own
separate tools for managing data, some of which are essential enough to mention here.
Working with data in SAS also requires an understanding of the SAS approach to data. The key objects in SAS
data are covered here, taking an SQL point of view. More information on the SAS approach in general can be found
in Appendix 1, “The SAS Data Model.”

Libraries
SAS programs refer to SAS files using two-level names formed from a libref and a member name. The first part of
the name, the libref, identifies the library that contains the file. The SAS files in a library are called members of the
library. Each member within a library can be identified by its member name and member type.
An ordinary library consists of the files in a directory. When it is defined in a LIBNAME statement, the
statement connects the libref to the physical file name of the directory. This example defines the libref MAIN:
libname main "/mydata/maindata";

The purpose of most libraries is to store SAS files that will be used in more than one SAS program or session.
Every program that uses the library requires a LIBNAME statement or an equivalent action before it can access the
members of the library.

Autoexec
The LIBNAME statements that define a project’s librefs are often written in the autoexec
program. This is a separate SAS program which may be identified by the AUTOEXEC=
system option.
The autoexec program is an initialization program that runs automatically at SAS startup
and contains SAS statements to set up resources for the main SAS program. In addition to
librefs, an autoexec program may set up objects such as formats and macro variables.
Using an autoexec program makes it possible to set up the same environment for multiple
programs.

Other forms of the LIBNAME statement allow other actions on libraries. This statement removes the definition
of the libref MAIN:
libname main clear;

This statement writes a log message showing the definition of the libref:
libname main list;
This statement writes a log message showing all libref definitions:
libname _all_ list;

In an interactive SAS session you can explore the contents of libraries using the hierarchical display provided for
that purpose. In a SAS program, these are two ways to view the members of a library:
proc datasets library=libref;
quit;

proc sql;
select * from dictionary.members
where libname = 'LIBREF';
quit;

The first step uses the DATASETS procedure to create a list of members. The second obtains the list from a
DICTIONARY table. See “DICTIONARY Tables” later in the chapter for more details on DICTIONARY tables.

The WORK Library


Not all libraries are defined in LIBNAME statements. A few of the more interesting ones are predefined by system
options, which create the librefs automatically at SAS startup. This group includes the one library that SAS
programmers use more than any other, the WORK library.
The WORK library is mainly meant for temporary SAS data sets, which will be used only within the current
SAS session. SAS automatically deletes these files when the session ends.
Typically, a SAS program creates most of its tables in the WORK library. When you are reading the program,
the WORK libref can help you understand the flow of data. If a table is in the WORK library, that tells you that the
table is not the final product of the program, but contains intermediate data that will be erased when the program
ends.
The WORK library is such a big part of SAS work that SAS allows you to refer to its members without a libref.
For example, if you refer to the table NEW, it is actually the table WORK.NEW, in the WORK library. You see this
in the log notes that result. Log messages always refer to tables using two-level names, so they will identify a table
as WORK.NEW even if you write the name simply as NEW. This is reason enough to write the two-level names for
members of the WORK library — then it is easier to cross-reference the program and the log.
There are other reasons to write the WORK libref explicitly. There are several special situations in which a one-
level name may refer to a library other than WORK (this depends especially on the USER= system option). You
take away any possibility of confusion when you write the WORK libref as the first part of the name whenever you
write the names of temporary tables.
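The short sketch below shows the two ways of naming the same temporary table. It assumes that the USER= system option is not set and uses the sample table SASHELP.CLASS as a source of data.

```sas
proc sql;
   /* One-level name: SAS normally stores this table as WORK.NEW */
   create table new as
   select * from sashelp.class;

   /* Two-level name: the same table, written the way the log reports it */
   select count(*) as row_count from work.new;
quit;
```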

Other Predefined Libraries


Besides the WORK library, there may be many more predefined SAS libraries, depending on which SAS products
are installed. These are the most important:

SASUSER
The user profile
SASHELP
Components of the interactive environment, along with sample data, ODS styles, and other system-
wide resources
DICTIONARY
Tables of reference data on the SAS environment. See “DICTIONARY Tables” later in this chapter.

Tables
The various kinds of files that can be found in a library are identified by member types. The most common member
type is DATA, for a SAS data file, the simplest and most standard form of a SAS data set. These are the files that are
used as tables when you are working in SQL.
A SAS data file organizes data in variables and observations, which in SQL serve as columns and rows. It makes
sense to think of a table as containing a fixed set of columns, since in SAS, the columns of a table must be finalized
before any rows can be added. Column attributes, including names, can change, but the data type, length, and
sequence of columns cannot change after the table is created. (SQL syntax does provide ways to add and remove
columns, but in SAS, this action involves rebuilding the entire table.) Rows, by contrast, are changeable; they can be
added and removed all day long, and the data values in a row can be changed even more easily. On the other hand,
in spite of this potential for change, the most common scenario for the life of a SAS data file is that the data does not
change after the file is first created.
SAS data files are used as tables in SQL, and they can also be used essentially anywhere in SAS. Data steps,
most procedures, and many interactive applications use SAS data sets as their primary form of input and output data,
so SAS data sets provide a way to exchange data between programs working anywhere in the SAS environment.
Three versions of the CREATE TABLE statement can create a new table from a list of column definitions, from
the column definitions of an existing table, or from a query result set. In the first two cases, the new table is empty,
with columns but no rows.
To create a table from column definitions, write the list of definitions in parentheses. Each column definition
provides the name and data type for the column, along with column modifiers, if needed. The form of the statement
is:
create table table
(column name data type modifiers,
. . . );

The data type is usually NUM or CHAR(n), where n is the length in bytes of the character column. Column modifiers
can be LABEL=, FORMAT=, INFORMAT=, and TRANSCODE=. See “Columns” below for more details on the
form of a column definition.
The statement below is an example of defining a new empty table.
create table work.empty (
name char(24),
sequence num,
new_date num format=date9. informat=date9.
);

NOTE: The table WORK.EMPTY has been defined with 0 rows and 3 columns.

Use the DESCRIBE TABLE statement with existing tables to see more examples of this kind of CREATE
TABLE statement.
The DESCRIBE TABLE statement generates a log note that tells you about a specific table. It writes the note in
the form of a CREATE TABLE statement that could have originally defined the table. This is an example:
describe table riaa.yearend;

NOTE: SQL table RIAA.YEAREND was created like:

create table RIAA.YEAREND( bufsize=65536 )
(
Year num,
CD num,
Cassette num,
LP_EP num,
Single num
);

If an existing table has the exact columns you have in mind, you can replace the list of column definitions with a
LIKE clause that provides the name of a table to mimic. For example, this statement creates the table
MAIN.DESTINATION that has the same columns as the table MAIN.ORIGIN:
create table main.destination like main.origin;

Finally, as described previously in chapter 2, you can create a new table from a query result set in a CREATE
TABLE statement. Write the query expression in the AS clause:
create table table as
query;

The CREATE TABLE statement with an AS clause takes its column definitions and its rows of data from the result
set of the query expression.
Use this form of the CREATE TABLE statement to create a table that is a copy of the data in an existing table.
The directive AS SELECT * FROM makes the new table a simple copy of the existing table. This example copies the entire
table MAIN.ORIGIN to the new table MAIN.DESTINATION.
create table main.destination as
select * from main.origin;

SAS lets you create a table even if an existing table is already using the name you indicate. To avoid a name
conflict, SAS deletes the old table after the new table is created. This behavior is affected by the REPLACE system
option; see chapter 9.

Rows
If there is a statement to create an empty table, then there must be a statement to add rows. This is the effect of the
various forms of the INSERT statement.
INSERT INTO table
SET column=value, . . .
or
VALUES (value, . . . )
or
query expression;

The INSERT statement adds rows to a table. For the examples below, the table MAIN.STOCK was defined as:
create table main.stock
(symbol char(5), trade_date date, close num);

To add a row with specific values, write an INSERT statement with a VALUES clause. In the VALUES clause
write a list of values in parentheses.
insert into main.stock
values ('CPB', '30DEC1994'D, 21.07);

NOTE: 1 row was inserted into MAIN.STOCK.

A new row is created and the values are assigned to the columns according to the order of the columns. In the
new row, SYMBOL has a value of 'CPB', TRADE_DATE has a value of '30DEC1994'D, and CLOSE has a value of 21.07.
You can list selected columns of the table after the table name in the INSERT statement. Then values are
assigned to the columns in the order you indicate. List the values in the same order in the VALUES clause. This
example adds another row to MAIN.STOCK:
insert into main.stock (symbol, trade_date, close)
values ('CPB', '31DEC1999'D, 38.69);

NOTE: 1 row was inserted into MAIN.STOCK.


Any omitted columns get null values in the new rows.
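For instance, a statement along these lines lists only two of the three columns of MAIN.STOCK as defined above. The date value is made up for the illustration.

```sas
proc sql;
   /* CLOSE is not listed, so it is left null (missing) in the new row */
   insert into main.stock (symbol, trade_date)
   values ('CPB', '29DEC2000'D);
quit;
```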
Write multiple VALUES clauses to add multiple rows. This example has four VALUES clauses to create four
new rows:
insert into main.stock (symbol, trade_date, close)
values ('AOL', '30DEC1994'D, 0.88)
values ('AOL', '31DEC1999'D, 75.88)
values ('TWX', '30DEC1994'D, 17.56)
values ('TWX', '31DEC1999'D, 72.31)
;

NOTE: 4 rows were inserted into MAIN.STOCK.

To add existing data to a table, use a query expression in the INSERT statement. Write the query expression so
that its columns are in the same order as the columns of the table or the columns listed in the INSERT statement.
This is an example:
insert into main.stock
select symbol, date(), close
from main.daily;

Another way to add rows to a table is with the SET clause. In a SET clause, each column is listed with an equals
sign and a value for the column, much like an assignment statement. Any other columns get null values in the new
row. Use multiple SET clauses to add multiple rows. This example adds four rows:
insert into main.stock
set symbol = 'HGSI', trade_date = '30DEC1994'D,
close = 7.38
set symbol = 'HGSI', trade_date = '31DEC1999'D,
close = 76.31
set symbol = 'CRA', trade_date = '28APR1999'D,
close = 12.50
set symbol = 'CRA', trade_date = '31DEC1999'D,
close = 74.50
;

The SET clause might be more wordy, but its advantage is that it is self-contained, with the column names and
the values in one place.
In a sense, deleting rows is not quite as simple as adding rows. To delete an existing row, you first have to
identify the row. The DELETE statement deletes rows, and it uses a WHERE clause to identify the rows it acts on.
The syntax of the DELETE statement is:
delete from table
where condition;

The WHERE clause in the DELETE statement is the same as it is in a query, limiting the scope of action to rows
that meet a condition. In the case of the DELETE statement, though, where the condition is true, the rows are
removed from the table.
This example removes from the table MAIN.ACTIVE any rows for which the value of EXPIR is earlier than the
current date returned by the DATE function:
delete from main.active
where expir < date();

NOTE: 5 rows were deleted from MAIN.ACTIVE.

Write a DELETE statement without a WHERE clause to remove all rows from a table. This example removes all
rows from the table MAIN.PENDING:
delete from main.pending;

NOTE: 518 rows were deleted from MAIN.PENDING.


With the combination of adding and deleting rows, you can move rows from one table to another. For this to
work well, the two tables should have the same columns in the same order. Also, to keep data intact, the table from
which rows are being moved should be protected from being changed by any other program during the time it takes
to move rows from one table to another. This is indicated by the LOCK statements in the examples below.
A move of rows between tables may involve either all of the rows in a table, or a specific subset defined by a
WHERE clause. Either way, add the rows to the destination table, then delete them from the source table, making
sure to act on the same set of rows.
This example moves all rows from MAIN.PENDING to MAIN.HISTORY:
proc sql;
lock main.pending;
insert into main.history
select * from main.pending;
delete from main.pending;
lock main.pending clear;
quit;

This example identifies rows in MAIN.SALE that have a TRANSACTION_TYPE of RETURN and moves those
rows to MAIN.RETURN:
proc sql;
lock main.sale;
insert into main.return
select * from main.sale
where transaction_type = 'RETURN';
delete from main.sale
where transaction_type = 'RETURN';
lock main.sale clear;
quit;

In this example, note that the same WHERE clause is used in the INSERT and DELETE statements.

Changing Values in a Table


The UPDATE statement modifies existing values in a table. It uses a SET clause written the same way as the SET
clause of the INSERT statement. Usually, the UPDATE statement includes a WHERE clause so that changes are
made in one specific row or a selected subset of rows. The new values of the SET clause are applied to all rows that
meet the WHERE condition. Columns that are not listed in the SET clause are not changed. If there is no WHERE
clause, the new values are applied to every row in the table.
This example changes one specific value to another in one column of a table:
update corp.source
set vendor = 'Time Warner Inc.'
where vendor = 'Warner Communications Corp.';

NOTE: 4 rows were updated in CORP.SOURCE.
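
The new value in a SET clause can also be an expression that uses the current values in the row. This is a sketch only; the table WORK.PRICE and its columns AMOUNT and CATEGORY are hypothetical.

```sas
proc sql;
   /* Raise every amount in one category by 5 percent;
      rows in other categories are left unchanged */
   update work.price
      set amount = amount * 1.05
      where category = 'A';
quit;
```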

Columns
Columns are the primary data elements in SQL. The variables of a SAS data set are used as columns when the SAS
data set is used as an SQL table.
Defining a table is mainly a matter of defining its columns. It is also possible to redefine and rebuild a table by
adding and removing columns. Column names and attributes can be changed in place in a table without having to
rebuild the table.
Column definitions are used in the CREATE TABLE and ALTER TABLE statements to add columns to new
and existing tables. A column definition provides the name, data type, and attributes of the column.
The code fragment below is an example of a list of column definitions.
sequence num,
name char(28),
add_date num format=yymmdd10. informat=yymmdd10.,
size num

This list defines the columns SEQUENCE, NAME, ADD_DATE, and SIZE, in that order. NUM and CHAR(28) are
data types. The remaining terms are column modifiers. See “Data Types” and “Column Attributes” below for more
information on the terms of a column definition. A CREATE TABLE statement with this list of columns would
create a table with these four columns. An ALTER TABLE statement with an ADD clause indicating this list of
columns would add the columns to an existing table.
Use the ALTER TABLE statement for actions on columns. The ALTER TABLE statement indicates the table
name followed by the details of one or more actions on the table. Each action is written in a separate clause. An
ADD clause contains column definitions to add columns to the table. For example, this statement adds the numeric
columns FORECAST and ERROR to the table CORP.REVENUE:
alter table corp.revenue
add
forecast num format=comma14.,
error num format=comma14.2;

An ALTER TABLE statement that adds or deletes columns will result in the table being rebuilt, as indicated by a
log note such as this one:

NOTE: Table CORP.REVENUE has been modified, with 11 columns.

A MODIFY clause is written in a similar way, but uses column modifiers to apply new attributes to existing
columns. See “Column Attributes” below for details. Most attributes can be changed in place, but if you change the
length of a character column, that requires rebuilding the table.
A DROP clause contains a list of columns to remove from the table. This ALTER TABLE statement deletes the
columns CENTER and REGION from the table CORP.REVENUE:
alter table corp.revenue
drop center, region;

NOTE: Table CORP.REVENUE has been modified, with 9 columns.
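
Because each action is a separate clause, one ALTER TABLE statement can carry more than one of them. The sketch below assumes a version of CORP.REVENUE that still has the columns CENTER and REGION; the new column MARGIN is a hypothetical name.

```sas
proc sql;
   /* Add one column and remove two others in a single statement */
   alter table corp.revenue
      add margin num format=percent8.1
      drop center, region;
quit;
```

Combining the actions this way means the table is rebuilt once rather than once per statement.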

Data Types
A data type provides a specific way to organize digital space to hold a range of possible values. For a column in
SQL, the data type determines what values it is possible for the column to hold, and it also determines how much
storage space the column takes up in each row of the table. SQL standards specify around 30 data types, but SAS
takes a much narrower approach. SAS implements only two data types, though it maps SQL data types as well as it
can to its two data types.
The SAS data types are character and numeric. SAS SQL syntax recognizes a longer list of data types, but treats
them as aliases for the numeric and character data types. Some data types use arguments to set the size of the value.
These data types are available in SAS SQL:

Character
Data type: CHAR(n) where n is the length in bytes
Aliases: CHARACTER(n), VARCHAR(n)
Numeric
Data type: NUM
Aliases: DEC, DECIMAL, DOUBLE PRECISION, FLOAT, INT, INTEGER, NUMERIC, REAL, SMALLINT

Date
Data type: DATE
(Numeric with DATE informat and format)

The DATE data type is treated as a numeric value, but SAS also associates the DATE informat and DATE
format with the column so that dates are handled in a meaningful way.
SQL syntax allows width and decimal arguments for the DEC, DECIMAL, FLOAT, NUM, and NUMERIC type
names, but SAS includes these arguments in its syntax only for the sake of compatibility. It ignores the arguments
and creates its usual numeric type. For example, in other SQL environments you might declare a column as the data
type DEC(9, 3) to indicate a decimal value with 9 digits and 3 decimal places. In SAS, though, that is the same as
declaring it as NUM.
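One way to see this mapping for yourself is to define a table with several of the alias type names and then ask SAS to describe it. The table WORK.TYPES_DEMO and its column names are hypothetical.

```sas
proc sql;
   create table work.types_demo (
      amount dec(9, 3),    /* width and decimal arguments are accepted but ignored */
      qty    integer,      /* stored as an ordinary SAS numeric */
      code   varchar(8),   /* stored as a fixed-length character column */
      posted date          /* numeric, with a DATE format and informat */
   );

   /* The log note should show the columns in terms of the SAS types */
   describe table work.types_demo;
quit;
```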

Column Attributes
The column attributes set by column modifiers in the SQL procedure are the same as the variable attributes used
throughout SAS. Column modifiers can be written:

in the SELECT clause of a query expression
in the CREATE TABLE statement to define a new empty table
in the ADD clause of the ALTER TABLE statement to add columns to an existing table
in the MODIFY clause of the ALTER TABLE statement to change the attributes of an
existing column

Write each column modifier in the form


attribute=value

where attribute is one of the attributes described below and value is a value suitable for that particular attribute.
These are the column attributes you can use:
LENGTH

The length attribute indicates the length of a character column. Write a counting number between 1
and 32767.
When you are defining a new empty column, the LENGTH column modifier is not needed because
you can indicate the length as part of the data type. That is, instead of writing the column modifier
LENGTH=12, write the data type CHAR(12). The LENGTH column modifier is useful, however, in other
circumstances. Use it in the SELECT clause of a query to set the length of a character column,
usually because a column is computed using functions or was previously stored with unneeded
trailing spaces. Use it in the MODIFY clause of an ALTER TABLE statement to change the length
of a character column in an existing table.
The LENGTH column modifier cannot be used to set or change the length of a numeric column.
LABEL

The label attribute provides a text label that the SELECT statement, all reporting procedures, and
interactive applications can use to label a column, usually as a column header. Write the label as a
quoted string.
FORMAT

The format attribute selects a format with optional arguments for use with the column. This format
converts the data value to text whenever the column is displayed, whether in a SELECT statement,
another procedure, or an interactive display.
If you do not indicate a format when creating a column, the default format is used. This is the BEST
format for a numeric column, the standard character format for a character column.
INFORMAT

The informat attribute selects an informat with optional arguments for use with the column. The
informat converts input text to a data value whenever a table is edited interactively. If you select an
informat, it should be compatible with the format for the column, whenever possible.
See chapter 5 for information on formats and informats.
TRANSCODE

For character columns only, the transcode attribute permits or prevents conversion of the column’s
values between character encodings when a table is moved between environments where different
character encodings are in use. Use the value YES to permit a change in encoding or NO to prevent
it.
For ordinary text data under normal circumstances, writing TRANSCODE=YES keeps text characters intact.
Write TRANSCODE=NO if a character column contains binary data or if transcoding is causing problems.
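
As an illustration of column modifiers in the SELECT clause, the query below applies label, format, and length attributes to columns of the sample table SASHELP.CLASS. The alias SHORT_NAME is a hypothetical name.

```sas
proc sql;
   select
      name label='Student Name',
      height format=5.1 label='Height',
      /* LENGTH= sets the length of the computed character column */
      substr(name, 1, 3) as short_name length=3 label='Abbreviation'
   from sashelp.class;
quit;
```

The modifiers affect only the query result; the attributes stored in SASHELP.CLASS are not changed.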

Change column attributes in an existing table using the MODIFY clause in the ALTER TABLE statement.
Follow this form when you write a statement for this purpose:
ALTER TABLE table
MODIFY
column attribute=value . . .,
. . .
column attribute=value . . .;

SAS rebuilds the table if you change the length of a character column. For any other attribute changes, the table
is modified in place.
The most common reason to change column attributes in an existing table is that a table is created with the
correct data, but without the format and label attributes needed for reporting. Suppose that the table
WORK.PROJTREE was created with the columns HOURS, START, FINISH, and LINK, but without indicating
any attributes. A statement such as this one would add format and label attributes to the table:
alter table work.projtree
modify
hours format=comma6. label='Actual Hours',
start format=yymmddn8. label='Actual Start Date',
finish format=yymmddn8. label='Actual Complete Date',
link label='Follow-Up';

Indexes
An index keeps track of the locations of rows in a table according to the values of key columns. Indexes make
retrieving data faster, particularly when a query retrieves a single row or a small subset of rows based on the value
of one or two columns. Indexes may also speed processing when a table is needed in sorted order, as is common in
table joins, or when a key column is used to form rows into groups for computing statistics.
An index is attached to a specific table and refers to a specific column or a few columns in that table. If you are
designing a database, it makes sense to think of a table’s indexes as being part of that table, and SQL syntax treats
indexes in this way. In another sense, though, an index is outside of the table it refers to. SAS does not physically
store a table’s indexes in the same file as the table itself, but in a separate file.

Plurals
The conventional plural for index is indexes. In older writing and especially in scholarly
mathematical writing the more common usage is the Latin-influenced irregular plural form
indices, pronounced almost like “in da seas.” As relational database theory is based on
mathematical theory, you might see indices in discussions of SQL.
Indexes may be part of the design of a database, if you are putting together a database in SAS. Otherwise, you
may not need to think about indexes except when you are trying to speed up a program. These are some factors that
may point you toward creating an index for a column:

You read from a table many times, but do not update it nearly so often.
You use a WHERE clause to select a small subset of a table, or a specific row.
You use a GROUP BY clause to form rows into groups.
You usually query the table with a specific ORDER BY clause.
You match on a specific column when joining tables.

A table might use one index, or several. Create indexes using the distinctive columns that are at the center of
accessing or processing rows.

Use the primary key, the ID column or columns that distinctly identify a record.
Use the key columns that join the table to another table in an ON clause.
Use the most distinctive column (less often, it is better to use two or three columns together)
in a WHERE clause.
Use the columns of the ORDER BY or GROUP BY clause.

When you know the table and column(s) that will be included in an index, you are ready to define the index.
Write a CREATE INDEX statement. For an index on one column, called a simple index, write the statement as:
CREATE INDEX column ON table (column);

The column name appears twice because it serves both as the name of the index and the name of the column
used in the index.
To define an index on multiple columns, called a composite index, you have to select a separate name for the
index. The name cannot be the same as the name of any of the columns in the table. Write the statement as:
CREATE INDEX index ON table
(column, column, . . .);

The following examples create a simple index on the column ID in the table CORP.PERSONNEL and a
composite index TRANSFER on the columns SRC and DEST in the table MAIN.NETWORK.
create index id on corp.personnel (id);
create index transfer on main.network (src, dest);

Use the DESCRIBE TABLE statement to find out what indexes a table has. When a table has indexes, the log
note from this statement shows a CREATE TABLE statement to show the table’s columns, followed by CREATE
INDEX statements to show how its indexes are defined.
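As a minimal sketch of the statements above, the step below builds a small table, adds a simple index and a composite index, and then asks for the table definition. The table WORK.ROUTES, its columns, and its data are assumptions made up for this illustration.

```sas
proc sql;
   /* Hypothetical table, created only for this illustration */
   create table work.routes
      (id num, src char(3), dest char(3), miles num);
   insert into work.routes
      values (1, 'PHL', 'BOS', 280)
      values (2, 'PHL', 'ORD', 678)
      values (3, 'BOS', 'ORD', 867);

   /* Simple index: the index name is the column name */
   create index id on work.routes (id);

   /* Composite index: the name must differ from every column name */
   create index transfer on work.routes (src, dest);

   /* Log note: a CREATE TABLE statement, then the CREATE INDEX statements */
   describe table work.routes;
quit;
```

An index on a table this small has no practical benefit; the point is only to show the form of the statements.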

Integrity Constraints
An integrity constraint, often simply called a constraint, has some of the qualities of a WHERE clause and some of
the qualities of an index. You can write a condition in a WHERE clause to limit the rows that are returned by a
query. An integrity constraint builds a similar kind of restriction into a table, so that the table cannot store rows that
are not consistent with its rules.
This serves two purposes. It prevents data that breaks the rules of a table’s design from being stored in the table.
Then, it simplifies the process of querying the table, because certain qualities of the data in the table are known at
the outset and do not have to be checked again by each query that reads the table.
This form of the ALTER TABLE statement defines integrity constraints:
ALTER TABLE table
ADD CONSTRAINT constraint rule;

Use any valid name for the integrity constraint, but names should not be the same as the SQL keywords that
appear in integrity constraint rules. These are the integrity constraint rules:
CHECK (condition)

A CHECK rule lets you use a condition to validate the value of an individual column or any
combination of the columns in any individual row. When you write the CHECK rule, condition is a
logical expression based on the table columns, similar to a WHERE condition.
NOT NULL (column)

The NOT NULL rule for a column prevents the column from having a null value.
DISTINCT (column, . . .)

The DISTINCT rule for a column requires unique values for the column in every row. No two rows
can have the same value in that column.
When applied to a list of columns, the DISTINCT rule requires a unique combination of values for
those columns in every row. No other row can have that same combination of values.
As elsewhere, UNIQUE can be used as a synonym for DISTINCT.
PRIMARY KEY (column, . . .)

The PRIMARY KEY rule defines a primary key based on one or several key columns. Each column
in the primary key is prevented from having a null value, the same as in a NOT NULL rule. The
combination of key values is required to be unique in each row, the same as in a DISTINCT rule. A
table can have only one primary key.
Defining a primary key allows it to be referenced in a FOREIGN KEY rule of another table.
Options in the FOREIGN KEY rule can prevent you from deleting certain observations from the
table that contains the primary key.
FOREIGN KEY (column, . . .) REFERENCES table

The FOREIGN KEY rule defines a foreign key based on one or several columns. These columns are
the columns of a primary key of a different table, as indicated in the REFERENCES clause. The
foreign key values are required to match the values in a row of the primary key that the foreign key
refers to.
Primary and foreign keys are used together — often forming the ON clause of a table join operator
— but you must define the primary key before you can define the foreign key that refers to it.
A foreign key may refer to a primary key that uses different column names. If there are multiple
columns, write the column names in the foreign key in the order in which they correspond to the
columns of the other table’s primary key.
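The rules above might be applied to a single table as in the sketch below. The table WORK.STAFF and the constraint names STAFF_PK, NAME_REQUIRED, and SALARY_POSITIVE are hypothetical names chosen for the illustration.

```sas
proc sql;
   /* Hypothetical table for illustration */
   create table work.staff
      (id num, last_name char(20), salary num);

   /* Each rule is added under its own constraint name */
   alter table work.staff
      add constraint staff_pk primary key (id);
   alter table work.staff
      add constraint name_required not null (last_name);
   alter table work.staff
      add constraint salary_positive check (salary > 0);

   /* Accepted: consistent with all three rules */
   insert into work.staff values (1, 'Avery', 52000);

   /* Expected to be rejected: SALARY does not meet the CHECK rule */
   insert into work.staff values (2, 'Brooks', -10);
quit;
```

The second INSERT statement is placed last on purpose, because an error condition may keep later statements in the same step from executing.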

The definition of a foreign key may include action options, which are written at the end of the definition. These
options determine the reactions of the foreign key and primary key when you try to update or delete a primary key
value that the foreign key refers to. The two referential actions are DELETE and UPDATE, in each case referring to
a change in a row in the primary key, potentially invalidating the foreign key’s reference to the primary key in at
least one row. These are the possible action options and their effects:
ON DELETE RESTRICT
ON UPDATE RESTRICT

Prevents the change in the primary key and generates an error condition. This is the default action.
ON DELETE SET NULL
ON UPDATE SET NULL

Sets the corresponding foreign key values to null when there are changes in the primary key.
ON UPDATE CASCADE

Copies any changes from the primary key to the foreign key.
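The sketch below shows one way the action options could be combined in a foreign key definition. The tables WORK.DEPT and WORK.EMP and the constraint names are assumptions for the illustration; the comments describe the behavior expected from the options as explained above.

```sas
proc sql;
   /* Hypothetical tables for illustration */
   create table work.dept (dept_id num, dept_name char(12));
   create table work.emp  (emp_id num, dept_id num);

   /* The primary key must exist before the foreign key that refers to it */
   alter table work.dept
      add constraint dept_pk primary key (dept_id);
   alter table work.emp
      add constraint emp_dept foreign key (dept_id) references work.dept
          on delete set null on update cascade;

   insert into work.dept values (10, 'Sales') values (20, 'Service');
   insert into work.emp  values (1, 10) values (2, 20);

   /* ON UPDATE CASCADE: the new key value is copied to WORK.EMP */
   update work.dept set dept_id = 11 where dept_id = 10;

   /* ON DELETE SET NULL: the matching DEPT_ID in WORK.EMP becomes null */
   delete from work.dept where dept_id = 20;

   select * from work.emp;
quit;
```

With the default RESTRICT action in place of these options, both the UPDATE and the DELETE statement would be expected to fail instead.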

When you define integrity constraints, error conditions occur if the data already in the table does not meet the
rules of the integrity constraints. After integrity constraints are in place, error conditions can occur when a program
changes the data values in the affected tables or an interactive user edits one of the tables. If a change would violate
the integrity constraint, there is a data error message in the SAS log or the interactive window.
SAS often must create indexes (or use existing indexes) to implement integrity constraints. These indexes cannot
be removed from the table unless you remove the integrity constraints first.

Views
Views, as mentioned in chapter 2, allow a stored query expression to be used like a table. A view is defined in the
CREATE VIEW statement:
create view view as
query expression;

Like tables, views are usually given two-level names, consisting of a libref and a member name. The name of a
view must not conflict with the name of an existing table, and vice versa.
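The query expression in a view can use most features of a query, but it is best to leave out the ORDER BY clause.
SAS accepts an ORDER BY clause in a view, but the rows then have to be sorted every time the view is used. Create
a table instead, or write the ORDER BY clause in the query that reads the view, if you want to see data in a specific order.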
Within a view, one-level names can be used to refer to tables and views that are stored in the same library as the
view. When a view is written this way, if the view and the tables and views it refers to are copied together to a
different library, the view will remain valid.
Use the DESCRIBE VIEW statement to see the query expression that is stored in a view:
describe view view;

The DESCRIBE VIEW statement shows the query expression of a view by writing it as a log note, as shown in
this example:
describe view SASHELP.VTABLE;

NOTE: SQL view SASHELP.VTABLE is defined as:

select *
from DICTIONARY.TABLES;
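A short sketch of the whole cycle, defining a view, describing it, and querying it, might look like this. It assumes the sample table SASHELP.CLASS that is installed with SAS; the view name WORK.TEENS is made up for the illustration.

```sas
proc sql;
   /* The view stores the query, not the data */
   create view work.teens as
      select name, age, height
      from sashelp.class
      where age >= 13;

   /* Writes the stored query expression to the log */
   describe view work.teens;

   /* The view is used like a table; the stored query runs at this point */
   select * from work.teens;
quit;
```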

Views That Contain Library Definitions


If a view must refer to tables in multiple libraries, it may be a good idea to store the library definitions within the
view definition, as described here.
Query expressions depend on librefs to identify the tables from which they draw data. The libref is the first part
of the two-level name of a table or view. The librefs of an SQL view can be defined in the view itself, so that the
view is self-contained. At the end of the CREATE VIEW statement, write a USING clause that contains LIBNAME
statements.
A LIBNAME statement defines a libref, usually by associating it with the physical file name of the directory that
contains a library. The essential terms of a LIBNAME statement are a libref and a physical location, and often,
options are also needed.
Write the same terms when you write the LIBNAME statement as a clause in a CREATE VIEW statement. If a
view needs multiple LIBNAME statements, though, write commas, not semicolons, to separate them. Any USING
clause you write is effective only in the context of the view that contains it.
The code model below shows the construction of a CREATE VIEW statement with a USING clause that
contains two LIBNAME statements.
create view name as
query
using libname libref "location" options,
libname libref "location" options;
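The following sketch fills in the code model with one LIBNAME clause. The table, the view, and the libref SRC are hypothetical. So that the sketch can run without a permanent library, SRC is pointed at the directory of the WORK library using the PATHNAME function; this is an assumption made for the illustration, and a real view would name the location of a permanent library.

```sas
proc sql;
   /* Hypothetical source table, stored in WORK for this illustration */
   create table work.orders (order_id num, amount num);
   insert into work.orders values (1, 250) values (2, 1200);

   /* The libref SRC is defined inside the view and is effective only there */
   create view work.large_orders as
      select order_id, amount
      from src.orders
      where amount > 1000
      using libname src "%sysfunc(pathname(work))";

   /* SRC is assigned automatically when the view is used */
   select * from work.large_orders;
quit;
```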

Editable Views
Some kinds of SQL views allow you to edit the data they show. In general, these are views in which the query refers
to only one source table and contains only the SELECT, FROM, and WHERE clauses. You can change the data in
the source table by taking these actions on the view:

The INSERT statement to add rows.
The DELETE statement to delete rows.
The UPDATE statement to change data values.
Interactive editing.
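As a sketch of these actions, the step below edits a table by way of a view that meets the conditions described above: one source table and only the SELECT, FROM, and WHERE clauses. The table WORK.STOCK and the view WORK.IN_STOCK are hypothetical.

```sas
proc sql;
   /* Hypothetical source table */
   create table work.stock (item char(8), qty num);
   insert into work.stock
      values ('BOLT', 40)
      values ('NUT', 0)
      values ('WASHER', 15);

   /* A view of one table with only SELECT, FROM, and WHERE clauses */
   create view work.in_stock as
      select item, qty
      from work.stock
      where qty > 0;

   /* Each statement names the view, but the changes go to WORK.STOCK */
   update work.in_stock set qty = qty + 10 where item = 'BOLT';
   insert into work.in_stock values ('SCREW', 25);
   delete from work.in_stock where item = 'WASHER';

   /* Look at the source table to see the effects */
   select * from work.stock;
quit;
```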

The CONTENTS Procedure


The CONTENTS procedure provides a way to get information about a table or view, especially about its columns.
To see information about a table or view, indicate the table or view in the DATA= option of the PROC CONTENTS
statement, as shown here:
PROC CONTENTS DATA=table or view;
RUN;

The CONTENTS procedure generates output that shows general information about a table, such as the file size
and a count of variables (columns), observations (rows), indexes, and integrity constraints. This is followed by a
table of column names and attributes, which is the same for a table or view. If a table has indexes or integrity
constraints, separate output tables describe those.
For a table that has integrity constraints, the DESCRIBE TABLE and DESCRIBE TABLE CONSTRAINTS
statements generate the same “Alphabetic List of Integrity Constraints” that the CONTENTS procedure generates.
These SQL statements are written as:
DESCRIBE TABLE table, . . . ;
DESCRIBE TABLE CONSTRAINTS table, . . . ;
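To compare the two approaches, you could run both on the same table, as in this sketch. The table WORK.PARTS and the constraint name PARTS_PK are assumptions for the illustration.

```sas
proc sql;
   /* Hypothetical table with one integrity constraint */
   create table work.parts (part_id num, descr char(30));
   alter table work.parts
      add constraint parts_pk primary key (part_id);

   /* SQL statement: reports the integrity constraints of the table */
   describe table constraints work.parts;
quit;

/* General information, column attributes, indexes, and constraints */
proc contents data=work.parts;
run;
```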

Deleting
It isn’t necessary to delete every object you create in SAS and SQL. SAS automatically deletes many of them. The
WORK library is deleted automatically at the end of the SAS session. Tables are replaced when you create new
tables of the same names. Indexes are automatically deleted when you delete the table they belong to. Still,
sometimes you need to explicitly delete something you have created, and SQL has statements for this purpose,
particularly the DROP statement.
The three forms of the DROP statement are:
DROP TABLE table, . . .;
DROP VIEW view, . . .;
DROP INDEX index, . . . FROM table;

These statements delete tables, views, and indexes, respectively.


To delete tables, list them in the DROP TABLE statement, for example:
DROP TABLE WORK.TEMP1, WORK.TEMP2, WORK.TEMP3;

NOTE: Table WORK.TEMP1 has been dropped.
NOTE: Table WORK.TEMP2 has been dropped.
NOTE: Table WORK.TEMP3 has been dropped.

Replace the word TABLE with VIEW to delete views.


To delete indexes, you will need to know what indexes a table has, along with their names. Use the DESCRIBE
TABLE statement, if necessary, to find out about the indexes.
To delete the indexes from a table, list the indexes in the DROP INDEX statement followed by a FROM clause
to identify the table. For example, this statement removes three indexes from the table CORP.CENTURY:
drop index priority, cont, start from corp.century;

NOTE: Index PRIORITY has been dropped.
NOTE: Index CONT has been dropped.
NOTE: Index START has been dropped.

Deleting integrity constraints requires a form of the ALTER TABLE statement:
ALTER TABLE table
DROP CONSTRAINT constraint, . . . ;
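The sketch below creates one object of each kind and then deletes them in turn. All of the names are hypothetical. Deleting the table alone would also remove its index, so the separate DROP INDEX statement appears here only to show its form.

```sas
proc sql;
   /* Hypothetical objects to delete */
   create table work.temp1 (id num, code char(4));
   create index id on work.temp1 (id);
   alter table work.temp1
      add constraint code_required not null (code);
   create view work.temp1_view as
      select id from work.temp1;

   /* Remove the index and the integrity constraint, keeping the table */
   drop index id from work.temp1;
   alter table work.temp1
      drop constraint code_required;

   /* Remove the view, then the table it reads from */
   drop view work.temp1_view;
   drop table work.temp1;
quit;
```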

Data Set Options


Data set options modify the way tables are accessed. Data set options are a feature of SAS, not so commonly used in
SQL as in other steps in a SAS program, but available for use in SQL statements.
Write data set options in parentheses after the table name. If you write more than one option, write them as a list
separated by spaces. All data set options are written as an option name, an equals sign, and a value. The code model
for writing data set options, then, is:
table (option=value . . .)

Among the many data set options, these are the ones you are most likely to use in SQL:
CNTLLEV=

The CNTLLEV= option determines the extent to which a table is locked while the program is using
it. Locking prevents another program from modifying the table while it is in use. Any locking
selected with this option stays in place for the duration of the SQL statement.
Values: REC indicates record-level locking. Individual rows are locked as needed. MEM indicates
member-level locking. The table is locked for the duration of the SQL query or statement. LIB
indicates library-level locking. All files in the library are locked for the duration of the SQL query
or statement.
COMPRESS=

The COMPRESS= option applies data compression to rows of a table. Use this option when you
create a new table.
Values: CHAR or YES applies character compression (RLE, or Run Length Encoding). BINARY applies
binary compression. NO stores rows in their uncompressed form.
Default: the value of the COMPRESS= system option
DROP=columns

The DROP= option excludes columns from being stored in an output table, or from being read in an
input table.
EXTENDOBSCOUNTER=

The EXTENDOBSCOUNTER= option for a new table uses an expanded observation counter that
allows quintillions of observations to be counted in the table, but is incompatible with SAS 9.2 and
older SAS releases. The old observation counter stops counting after 2 billion. Use EXTENDOBSCOUNTER=NO
to create tables with observation counters that are compatible with SAS 9.2 and earlier.
Values: YES, NO
Default: YES (NO in SAS 9.3)
Available: SAS 9.3 and later
FIRSTOBS=n

The FIRSTOBS= option lets you skip over rows at the beginning of a table, starting at the row
number indicated.
Default: 1, or the value of the FIRSTOBS= system option
IDXNAME=index

The IDXNAME= option selects a specific index to use while reading the table.
IDXWHERE=

The IDXWHERE= option lets you control whether an index is used for processing a WHERE
clause. Selecting NO may make a query run faster when a WHERE clause selects a large number of
rows.
Values: YES, NO
KEEP=columns

The KEEP= option selects columns to store in a new table or columns to read from an existing table
or view.
LABEL='label'

The LABEL= option provides a descriptive label.
OBS=n

The OBS= option lets you skip over rows at the end of a table, stopping at the row number
indicated.
Default: MAX, or the value of the OBS= system option
RENAME=(old name=new name . . . )

The RENAME= option changes the name of one or more columns.
REPLACE=

The REPLACE= option, for a new table, determines whether the table can replace an existing table
of the same name.
Values: YES, NO
Default: YES, or determined by the REPLACE system option for permanent libraries

Data set options are used mainly with tables. Most data set options cannot be used with SQL views. Data set
options also cannot be used when reading DICTIONARY tables (described next).
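A sketch of data set options in a query might look like the step below, which applies options to both the table being read and the table being created. It assumes the sample table SASHELP.CLASS; the output table name and the new column name HEIGHT_IN are made up for the illustration.

```sas
proc sql;
   /* LABEL= applies to the new table */
   create table work.class_sample (label='First five students') as
      select *
      /* KEEP= uses the original column names; RENAME= is applied afterward.
         OBS=5 stops reading after the fifth row. */
      from sashelp.class (keep=name age height
                          rename=(height=height_in)
                          obs=5);

   describe table work.class_sample;
quit;
```

The same selection of columns could be written in the SELECT clause; data set options are mainly a convenience when the same table reference is used this way in other steps of the program.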

DICTIONARY Tables
The special libref DICTIONARY contains tables with information about objects in the SAS environment. These
DICTIONARY tables are virtual tables that can be used only in SQL queries.
Each DICTIONARY table has its own specialized area of focus. These are some of the DICTIONARY tables
you might look at, along with the items they describe:

DICTIONARY.OPTIONS
System options
DICTIONARY.TITLES
Title and footnote lines
DICTIONARY.STYLES
ODS styles

DICTIONARY.EXTFILES
Filerefs
DICTIONARY.LIBNAMES
Librefs
DICTIONARY.ENGINES
Library engines
DICTIONARY.MEMBERS
SAS files
DICTIONARY.TABLES
SAS tables

DICTIONARY.COLUMNS
Columns in SAS tables
DICTIONARY.XATTRS
Extended attributes for SAS tables and columns
DICTIONARY.DICTIONARIES
Columns in DICTIONARY tables
DICTIONARY.INDEXES
Indexes
DICTIONARY.TABLE_CONSTRAINTS
Integrity constraints

DICTIONARY.VIEWS
Views
DICTIONARY.VIEW_SOURCES
Tables and views used as input to views
DICTIONARY.CATALOGS
Catalogs
DICTIONARY.MACROS
Macros
DICTIONARY.FORMATS
Informats and formats

DICTIONARY.FUNCTIONS
Functions and CALL routines

You can query these tables in a SAS program, using SQL statements only, to get information about objects and
settings in the SAS session. For example, this query returns information on the CENTER system option, including
its current value:
select * from dictionary.options
where optname = 'CENTER';

optname  opttype  setting  optdesc                      level     group
CENTER   Boolean  CENTER   Center SAS procedure output  Portable  LISTCONTROL

For a complete list of DICTIONARY tables, run this query:
select distinct memname, memlabel
from dictionary.dictionaries;

To get a list of the columns in a DICTIONARY table, use a DESCRIBE TABLE statement, such as:
describe table dictionary.titles;
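As one more sketch of a reference query, the step below lists the columns of a table by reading DICTIONARY.COLUMNS. It assumes the sample table SASHELP.CLASS. Libref and member names are stored in uppercase in the DICTIONARY tables, so the WHERE clause writes them that way.

```sas
proc sql;
   /* One row is returned for each column of SASHELP.CLASS */
   select name, type, length
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
quit;
```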

On occasion it may make sense to use DICTIONARY tables not just as a reference source, but as part of a query.
By writing a subquery to return a specific value from a DICTIONARY table, you can use this value almost like a
constant value in a SELECT or WHERE clause.
The subquery in the example below retrieves the numeric value of the YEARCUTOFF= system option. The
WHERE clause then uses this value in a comparison, looking for rows where the column EVENT_YEAR has a
value less than this.
where event_year < (select input(setting, f24.) from dictionary.options
where optname = 'YEARCUTOFF')
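Placed in a complete step, the WHERE clause above might be used as in this sketch. The table WORK.EVENTS and its data are hypothetical; which rows are returned depends on the YEARCUTOFF= setting of the SAS session.

```sas
proc sql;
   /* Hypothetical table for illustration */
   create table work.events (event char(12), event_year num);
   insert into work.events
      values ('Founding', 1895)
      values ('Merger', 1998);

   /* The subquery returns one value, which is compared to EVENT_YEAR */
   select event, event_year
      from work.events
      where event_year < (select input(setting, f24.)
                             from dictionary.options
                             where optname = 'YEARCUTOFF');
quit;
```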

Another way to use a value from a DICTIONARY table is to create a macro variable from the value. For
example, you might create a macro variable that indicates the number of columns in a table. The macro variable can
then be used in another query or in any other step in the SAS program. See chapter 10 for a discussion of this
approach.
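As a brief preview of that approach, the sketch below counts the columns of a table and stores the count in a macro variable. The macro variable name NCOLS and the use of SASHELP.CLASS are assumptions for the illustration.

```sas
proc sql noprint;
   /* Store the number of columns of SASHELP.CLASS in the macro variable NCOLS */
   select count(*) into :ncols
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
quit;

/* The value is now available to any later step in the program */
%put NOTE: SASHELP.CLASS has &ncols columns.;
```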
8
Working With DBMS Data
Wouldn’t it be nice if you could work with SAS data and DBMS data in the same program? A single program could,
for example, retrieve data from a database, combine it with other data, analyze it, and format a report of the results.
This is, in fact, easily done in SAS with the use of SAS/ACCESS and the SQL procedure.

Two Ways to Connect


SQL statements in SAS act on SAS files. The tables and views in SAS SQL are SAS data sets. However, some of
the most interesting data may be found in other SQL environments, especially relational databases. To let you make
use of this data in SAS, SAS provides the ability to pass queries and statements to databases for them to execute.
The result sets of queries are returned to the SAS environment for subsequent processing. This is a SAS feature
known as SQL pass-through.
SQL pass-through is implemented using statements and a clause in the SQL procedure.

The CONNECT TO statement establishes a connection to a specific database.
The EXECUTE statement passes an SQL statement to the database for execution.
The CONNECTION TO clause in a query passes a query to the database and returns the
result set.
The DISCONNECT FROM statement ends the connection to the database.

SQL pass-through is especially useful when you want to process subsets of data or combine data from multiple
tables in the same database. It lets the database do most of the work of retrieving the data, which is, after all, what
databases are especially good at.
A second way to connect to database data from SAS is with a database library engine. This allows a database to
be treated as a SAS library. A database library engine uses a special form of the LIBNAME statement to declare the
database as a SAS library. You can then access database tables as if they were SAS tables. The database library
engine is especially useful if you want to process an entire database table, if you want to retrieve summary data from
the database, or if you want to create a new database table with the results of SAS processing.

SAS/ACCESS for Relational Databases, and Access Options


SAS/ACCESS is a set of add-on products for SAS that add the ability to connect to external data in various forms,
especially relational databases. It is this connection to the database that makes SQL pass-through and the database
LIBNAME engine possible. There are many different SAS/ACCESS products because there are many different
DBMSs (and other data formats) to connect to.
Usually, the database is on another computer, so setting up a connection to a database may start with adding the
database’s connection information to your computer or a server, in a format defined by the DBMS — a step that
might be done by a database administrator. You may also need to obtain a user ID and password that lets you access
the database. With that done, connecting to the database from SAS is simple. You will need the DBMS name,
database name (or alias), user ID, and password, and you write each of these as a separate option in the statement
that connects to the database, either the CONNECT TO statement or the LIBNAME statement. The CONNECT TO
statement, used in the SQL procedure as described below, follows this form:
CONNECT TO DBMS (DATABASE=database
USER=user ID PASSWORD=password);

The exact options you use depend on the DBMS and database. Some databases may require more than three options,
or may use different names for some of the options.
You use the database connection by referring to the DBMS name in subsequent statements. If there is a reason to
use a different name for the connection — for example, if you connect to two separate databases that use the same
DBMS — write the word AS and an alias after the DBMS name, then use the alias as the connection name in all
subsequent statements.
The LIBNAME statement may use the same connection options, but it uses them to define a libref:
LIBNAME libref DBMS DATABASE=database
USER=user ID PASSWORD=password;

The LIBNAME statement runs once, near the beginning of the program (or when you launch an interactive SAS
session). The libref is a short name that you use in the rest of the SAS program to identify the library. The use of the
LIBNAME statement and engine is described later in the chapter.
For some programming approaches, you benefit from connecting both ways at once. In SAS 9.3 and later,
execute the LIBNAME statement first, then in the PROC SQL step, execute the CONNECT USING statement,
using either of these forms:
CONNECT USING libref;

CONNECT USING libref AS alias;

The CONNECT USING statement tells the SQL procedure to use the same database session that a LIBNAME
statement has already opened. Depending on the DBMS, having a shared session may make additional database
programming techniques possible — techniques that the DBMS permits only within a single session.
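A sketch of the two kinds of connection used together is shown below. Everything specific in it is an assumption: it supposes SAS/ACCESS Interface to ODBC, a data source named salesdsn, placeholder credentials, and a database table ORDERS. The options that another DBMS requires may differ, and the step cannot run without a database to connect to.

```sas
/* Hypothetical connection values; replace them with your own */
libname mydb odbc datasrc="salesdsn" user="my_user_id" password="my_password";

proc sql;
   /* Reuse the database session that the LIBNAME statement opened */
   connect using mydb;

   /* The inner query is database SQL and is executed by the database */
   create table work.order_counts as
      select *
      from connection to mydb (
         SELECT customer_id, COUNT(*) AS n_orders
         FROM orders
         GROUP BY customer_id
      );

   disconnect from mydb;
quit;
```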

SQL Pass-Through
For SQL pass-through, you write a database query or command and embed it in a PROC SQL step. This
pass-through query or command is an SQL statement that is intended to be executed by a database and that refers to data
stored in that database. The database query or command has to be written according to the SQL rules of the DBMS,
which are sure to differ in important ways from the way SQL is written in SAS.
SQL pass-through may require several statements in the PROC SQL step, starting with a CONNECT TO
statement to connect to the database. Among the SQL statements that follow, there can be any number of
CONNECTION TO clauses and EXECUTE statements, indicating database queries and commands, respectively.
After these are complete, a DISCONNECT FROM statement may be added to disconnect from the database.
Looking specifically at a database query, it appears in an SQL query as an expression, essentially a subquery, in
the following form.
CONNECTION TO DBMS (database query expression)

The query is passed to a database, and the results that are returned from the database are used within the SAS query
expression.
In practice, the result set from the database is almost always stored in its entirety as a SAS table. Statements to
connect to and disconnect from the database may also be needed. When you fill in the required clauses of the
database query and add the PROC SQL and QUIT statements, the form of a complete step to create a SAS table of
data from a database is
PROC SQL;
CONNECT TO DBMS (DATABASE=database
USER=user ID PASSWORD=password);
CREATE TABLE table
AS SELECT *
FROM CONNECTION TO DBMS (
SELECT column, . . .
FROM table, . . .
);
DISCONNECT FROM DBMS;
QUIT;
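Filled in with specific names, the form above might look like the sketch below. The DBMS name ODBC, the data source salesdsn, the credentials, the alias MYDB, and the database table ORDERS are all hypothetical, and the connection options that your DBMS requires may differ.

```sas
proc sql;
   /* Hypothetical connection values; MYDB is an alias for the connection */
   connect to odbc as mydb
      (datasrc="salesdsn" user="my_user_id" password="my_password");

   /* The database runs the inner query; SAS stores the result set */
   create table work.big_orders as
      select *
      from connection to mydb (
         SELECT order_id, customer_id, amount
         FROM orders
         WHERE amount > 1000
      );

   disconnect from mydb;
quit;
```

Because the WHERE clause is part of the database query, only the selected rows travel from the database to SAS.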

There is a separate statement for passing along a complete SQL statement (or command) to the database to
execute. This is the EXECUTE statement. The form of the statement is:
EXECUTE (
DBMS statement
) BY DBMS;

Use the EXECUTE statement for DBMS statements that declare new tables, add indexes, define views, delete
objects, or take similar actions. As with the CONNECTION TO clause, a connection to the database is needed (from
the CONNECT TO statement), and SAS does not examine the SQL statements in any detail before passing them
through to the database.
Most SQL environments use semicolons to terminate statements, just as SAS does. You can send a series of
semicolon-terminated statements to the database in a single EXECUTE statement. SAS sends the statements to the
database all at once, but the database executes them one by one. It is considered good form to write a semicolon at
the end of each pass-through SQL statement even when you are executing only one statement.
Do not write database SELECT statements in the EXECUTE statement because this would not provide a way to
obtain the results.
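A sketch of the EXECUTE statement is shown below, using the same hypothetical ODBC connection values as before and an assumed database table ORDERS. The automatic macro variables SQLXRC and SQLXMSG hold the return code and message from the DBMS, so writing them to the log is one way to check what happened. Whether the database accepts a semicolon at the end of the statement inside the parentheses can depend on the DBMS, so the sketch leaves it out.

```sas
proc sql;
   /* Hypothetical connection values */
   connect to odbc as mydb
      (datasrc="salesdsn" user="my_user_id" password="my_password");

   /* The statement in parentheses is database SQL, passed along as written */
   execute (
      CREATE INDEX idx_orders_customer ON orders (customer_id)
   ) by mydb;

   /* Return code and message from the DBMS for the pass-through statement */
   %put NOTE: DBMS return code &sqlxrc: &sqlxmsg;

   disconnect from mydb;
quit;
```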

Avoiding Confusion When Working in Multiple Environments


The SQL pass-through mechanism allows program statements that execute in multiple environments to be combined
in the same program. This can cause confusion if you lose track of the varying requirements that the code must
meet, depending on which environment it executes in. One moment, you’re coding in SAS, then a moment later, in a
relational database. They both use SQL, but they do not use it in quite the same way. A degree of confusion is
almost inevitable, but it is possible to keep the environments separate in the code you write.
In general, follow the DBMS rules of SQL syntax for pass-through code, since those statements will actually
execute in the DBMS environment. Follow the SAS rules of syntax in all other code. These are specific points to be
alert to when you are using two or more kinds of SQL in the same program:

A database doesn’t know anything about SAS routines, particularly formats and functions.
Use database functions in the SQL code that you pass through to the database.
Most SQL environments require you to write quoted strings with single quotes only. This
differs from SAS, where single quotes and double quotes may often be used
interchangeably. If you convert SAS SQL to database SQL, change any double quotes to
single quotes.
Constant values have to be written according to the environment they are in. This is a
particular issue with dates. In the SAS environment, you write a date as a SAS date constant,
such as '31DEC2015'D. When you write a date in pass-through code, you must write it according
to the syntax rules of the DBMS, often as either '2015-12-31' or '12/31/2015'.
If you write macro variable references in SQL statements, they are resolved before the SQL
statement executes, regardless of whether SAS executes the statement itself or passes it to a
database for execution. This means you can freely use SAS macro variable references in
pass-through code. See chapter 10 for a discussion of macro variables.
If you create macro variables in SQL using the INTO clause, also described in chapter 10,
run those statements in the SAS environment only.
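The sketch below puts several of these points in one step, with comments marking where each environment begins and ends. The connection values, the database table ORDERS, its columns, and the macro variable MIN_AMOUNT are hypothetical, and the date syntax in the pass-through query is only one of the forms a DBMS might require.

```sas
%let min_amount = 1000;

proc sql;
   /* Hypothetical connection values */
   connect to odbc as mydb
      (datasrc="salesdsn" user="my_user_id" password="my_password");

   create table work.late_orders as
      /* SAS environment: SAS format, SAS date constant */
      select order_id, order_date format=date9., status
      from connection to mydb (
         /* Database environment: single quotes only, DBMS date syntax.
            &min_amount is resolved by SAS before the query is sent.
            A macro reference inside single quotes would not be resolved. */
         SELECT order_id, order_date, status
         FROM orders
         WHERE amount > &min_amount
           AND order_date > '2015-12-31'
           AND status = 'OPEN'
      )
      /* SAS environment again */
      where order_date <= '31DEC2016'd;

   disconnect from mydb;
quit;
```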

Converting Database Columns to the SAS Environment
When a pass-through database query is executed, the result set from the database has its columns converted so that
they fit into the SAS environment. A column’s name, attributes, data type, and data values may have to be converted
to the nearest available SAS equivalent.

Names
SAS limits names to 32 characters. Many DBMSs permit longer names than this. SAS truncates names to 32
characters to make them fit. If this results in two or more columns having the same name, SAS substitutes numeric
suffixes for the final character(s) of the names, as needed.
Many DBMSs permit column names to be arbitrary text, including spaces and symbols. These names are
permitted in SAS but are hard to use. (See “SAS Name Literals and the RENAME= Data Set Option,” below.) A
name in a SAS program must be a single word, though it can contain underscore characters and digits in addition to
letters.
Instead of letting SAS convert column names for you, it is better to write AS clauses in the database query
wherever they are needed to create column aliases that are legal SAS names.
This example from the SELECT clause of a database query creates a shorter column alias for a database column
name that is longer than SAS permits.
column_name_that_is_just_simply_too_long_for_sas
as column_name_long_long_long

Data Types
A DBMS may have dozens of data types in order to store data efficiently and securely. SAS uses only two data
types. When SAS receives columns from a database, ordinary numeric and time data types are converted to SAS’s
numeric data type. All character and most binary data types are converted to SAS’s character data type.

Column Attributes
SAS may create column attributes based on information from the database. This is especially important for time
data. A date column in the database is converted to a SAS date value, and SAS gives the column a SAS date format
so that it is displayed as a date. Similarly, a timestamp column in the database is converted to a SAS datetime
column with a SAS datetime format, and a time column in the database is converted to a SAS time column with a
SAS time format.

Data Values
In practice, you may never notice data values being changed when they move from a DBMS to SAS, but values are
converted whenever a value is not available in SAS. This can happen when:

An especially long integer is converted to the double-precision floating point of the SAS
numeric data type. The value is rounded to a nearby value. (See “Converting Numeric ID
Codes” below.)
There is a fractional decimal value. SAS doesn’t have a decimal data type, and it converts
numeric values to double-precision floating point. Though you probably won’t notice the
difference, the double-precision equivalent of a fractional number is not exactly the same as
the decimal value (except in a few special cases).
A character value contains a character that is not available in the SAS session. A similar
character or placeholder character is put in its place.
A character column has a length of 32,768 bytes (32 kilobytes) or more. Characters at this
position or beyond are omitted.
A date is outside of the SAS calendar range.

Null Values
SAS treats its missing values as the equivalent of the null values of SQL. For a numeric column from a database,
any null value gets converted to a standard missing value. In SAS reporting, standard missing values are typically
displayed as dots.
For a character column, null values are converted to blank values. This is a change that sometimes makes a
difference. In relational databases there is a clear distinction between null values and blank values in a character
column, but SAS makes no such distinction.
If you specifically need to keep track of which values were null in a database column, you can create a separate
column for that purpose. This can be done with the CASE and IS NULL operators in the database query. The
example below adds the column LOC_NAME_IS_NULL to indicate when the column LOC_NAME has a null
value.
loc_name,
case when loc_name is null then 'Y' else 'N' end
as loc_name_is_null

SAS Name Literals and the RENAME= Data Set Option


Most databases follow the same rules about names that SAS uses, but there can be exceptions. A database may
permit arbitrary column names which may consist of multiple words and may contain symbol characters. To refer to
these columns in standard SQL, their names are enclosed in double quotes.
In SAS, these names have to be quoted, with the quoted string followed by the letter N to indicate a name literal.
Name literals are awkward to work with and not accepted everywhere in SAS, so the best approach is to
immediately rename these columns to give them proper SAS names. Do this using the RENAME= data set option in
the CREATE TABLE statement. Data set options are enclosed in parentheses, and the RENAME= list of name
changes must also be enclosed in parentheses. A possible example of this, connecting to a database with the alias
MYDB, is shown below.
create table current.change
(rename=('Common Name'n = common_name
'% Change'n=percent_change))
as select *
from connection to mydb (
SELECT "Common Name", "% Change"
FROM G1.MEASURES
);

In this example, the database column Common Name is renamed as COMMON_NAME. Similarly, % Change is renamed
as PERCENT_CHANGE. These new names are SAS names and can be used freely in SAS programs that work with
the data. When the data is stored in the table CURRENT.CHANGE, it is with the new names.

Time Data
SQL standards suggest date, timestamp, and time data types for time-based values, but the standards do not provide
many other details. In SQL programming, then, time data is known as the one area where conversions will inevitably
be required when moving data and SQL code between SQL environments.
Fortunately, SAS/ACCESS handles the conversion of data values. You aren’t likely to run into issues with this
as long as the dates are between the years 1582 and 19999 — these years mark the endpoints of the SAS calendar.
SQL date, timestamp, and time columns turn into SAS date, datetime, and time values, respectively. These fall
within the numeric data type in SAS, so it is not the data types, but the format attributes, that ensure that the values
are displayed in a sensible way. (Similarly, the correct informat attributes make it possible for the data to be edited
in SAS.) SAS automatically provides an appropriate format when it creates a SAS date, datetime, or time value from
a database column.
However, the format SAS selects for a database column is not necessarily the one you would choose. When it is
important to have a specific format, change the format and informat attributes of a column in the SELECT clause
that precedes a pass-through query, or using the ALTER TABLE statement afterward, as described in the previous
chapter. When you create tables from database data, this is a need that will arise most often with time data. Formats
are discussed in more detail in chapter 5.
The following example shows how you can change column attributes immediately after creating a table. The
CREATE TABLE statement creates the table CURRENT.PAY by SQL pass-through. The ALTER TABLE
statement then changes the format and informat attributes of one of the new table’s columns.
proc sql;
create table current.pay
as select *
from connection to database (
query
);
alter table current.pay
modify change_ts
format=e8601dt.
informat=e8601dt.;
quit;

The E8601DT format and informat routines in this example are SAS datetime routines based on the ISO 8601
standard for display of time data.
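The other approach mentioned above, setting the attributes in the SELECT clause that precedes the pass-through query, might be sketched as follows. The ODBC engine, the connection alias MYDB, the data source PAYROLL_DSN, the database table HR.PAY, and the column EMP_ID are hypothetical names chosen for illustration; CHANGE_TS and CURRENT.PAY are carried over from the example above, and the libref CURRENT is assumed to be defined already.

```sas
proc sql;
  /* Hypothetical connection; the engine and its options depend on the site. */
  connect to odbc as mydb (datasrc=payroll_dsn);
  create table current.pay as
    select emp_id,
           /* Column modifiers set the SAS attributes as the column is created. */
           change_ts format=e8601dt19. informat=e8601dt19.
    from connection to mydb (
      select emp_id, change_ts
      from hr.pay
    );
  disconnect from mydb;
quit;
```

With this form the table is created with the attributes already in place, so no ALTER TABLE statement is needed afterward.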

Converting Numeric ID Codes


Some databases contain long decimal integer values that are used as ID codes. If these ID codes are 16 digits or
longer (not counting leading zeros), they may not be accurately represented in SAS’s numeric data type, which on most
platforms holds integers exactly only up to 9007199254740992 (2 to the 53rd power). Beyond that point, odd numbers
may be rounded to a neighboring even number, or rounding may go farther than that, depending on the length of the
codes. Showing 9007199254740993 as 9007199254740992 might be close enough if the value represents something
like a count or a distance, but if the value is an account number, this limited accuracy means you may be pointing to
the wrong account. To keep long ID code numbers intact, they must be converted to character values. If you are
lucky, the database will identify the column as an ID code, and SAS will do this conversion automatically;
otherwise, you can write the conversion in the database query. You might prefer to convert all such codes to
character values, depending on how the values will be used in the SAS environment.
The exact SQL column expression to accomplish this depends on the DBMS and the details of the specific
column. Often, though, it can be written using the CAST function of SQL, which converts one data type to another.
This is an example:
cast(account_num as char(16)) as account

This example converts the 16-digit integer code column ACCOUNT_NUM to a 16-character string, then gives the
result the alias ACCOUNT.
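To see why long integer codes are at risk, a short data step can compare two adjacent 16-digit integers. This is a minimal sketch that assumes a platform using IEEE double-precision storage for SAS numeric values (mainframe systems use a different representation); the variable names are made up for the illustration.

```sas
data _null_;
  limit  = 9007199254740992;   /* 2**53, the last point where every integer is exact */
  beyond = 9007199254740993;   /* 2**53 + 1, which cannot be stored exactly */
  same   = (limit = beyond);   /* 1 if SAS stores both constants as the same value */
  put limit= 16. beyond= 16. same=;
run;
```

On such a platform the two constants are expected to compare as equal, which is exactly the situation in which two different account numbers would become indistinguishable in SAS.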

Character Column Lengths


If you are used to working with character data in SAS, it can seem as if database designers are surprisingly casual in
their use of storage space for character columns. A database column that contains simple descriptive phrases such as
“Out of stock,” “Replacement,” and “Return” may be given a length of 100 or even 500 characters.
Database designers can do this because in the column-oriented storage scheme of a relational database, only the
actual text in a column has to be physically stored. The word “Return” takes up essentially the same storage space
whether it is in a column with a length of 20 characters, or a column with a length of 250 characters. Also in the
column-oriented database world, a table column has no particular consequences for the execution of a query unless
the column is actually used in the query.
The implications of column length are quite different in SAS, where all storage is row-oriented. In a SAS table,
all character values have a fixed length equal to the column length. This creates trailing spaces that have to be
physically stored. A character column with a length of 250 takes up real space, 250 bytes in each row. A table may
contain several such columns, so the need for storage space can add up quickly. At the same time, every column in a
table affects the speed of access to the table, whether that column is actually used or not.
For efficiency, then, you may often want to shorten character columns when you transfer them from the
relational database environment to the SAS environment. A 500-character column that contains 20-character phrases
may be shortened to 20 characters in length. It is best if you can do this in the database by shortening the columns in
the pass-through query.

Querying Lengths
You don’t always know what the lengths of values in a database column are, but that is
something the database can tell you.
Assuming that TRIM and LENGTH are database functions that, used together, return the
length of a character string without its trailing spaces, the following pass-through query
produces a frequency table of the lengths of a character column (BIG_TEXT in this
example):
select
length(trim(big_text)) as textlength,
count(*) as freq
from big_db_table
group by length(trim(big_text))

After you see what lengths are actually used in the character values, it may be easier to
decide what length you need for the column you store in SAS.

The exact column expression you write to shorten a character column depends on the DBMS and its approach to
SQL. It may also be affected by the data type of the column. Often the best way to shorten a column is using the
CAST function to explicitly convert the value to a fixed-length character string with a length that you choose. This
example, which works in some DBMSs, shortens the column REASON1 to 20 characters.
cast(a.REASON1 as char(20)) as reason1

If CAST is not available or does not work in the same way in a particular DBMS, the SUBSTR function may be
an alternative. The three arguments are a character column, the constant 1 indicating the beginning of the character
value, and a constant indicating the new length. However, the SUBSTR function is not available in every DBMS
and does not necessarily shorten a column if it is available. (The SUBSTR function is also implemented in SAS, but
it does not have the effect of shortening a column in SAS.) Where this technique works, the expression is formed as
shown in the example below.
substr(a.REASON1, 1, 20) as reason1
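
Placed in a complete pass-through query, the shortening expression might look like the sketch below. The ODBC engine, the alias MYDB, the data source RETURNS_DSN, the database table RETURNS, and the column RETURN_ID are assumptions made for illustration; the CAST expression is the one shown earlier and, as noted, works only in some DBMSs.

```sas
proc sql;
  /* Hypothetical connection details. */
  connect to odbc as mydb (datasrc=returns_dsn);
  create table work.returns as
    select * from connection to mydb (
      select a.return_id,
             cast(a.REASON1 as char(20)) as reason1
      from returns a
    );
  disconnect from mydb;
quit;

/* Check the length SAS actually assigned to REASON1. */
proc contents data=work.returns;
run;
```

The CONTENTS procedure output is a quick way to confirm that the column arrived in SAS with the shorter length.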

Even after shortening, character values may have many trailing spaces. Character compression can reduce the
storage space these character values take up in a SAS table. When you compress a SAS table, each entire row is
compressed separately. Apply character compression with the COMPRESS=YES or COMPRESS=CHAR data set option in the
CREATE TABLE statement, as shown here:
create table main.bigtable (compress=char) as
select * from connection to . . . ;

A log note tells you the percent of storage space saved when you create a SAS table with compression. When the
combined length of the columns in a table is relatively short, less than 50 bytes, then compression is unlikely to
decrease the size of the table, and it could increase the size instead. Compression is also ineffective if a table
contains just a few rows. If compression does not significantly reduce the size of a table, change the option to
COMPRESS=NO to turn compression off.

Recoding
If a database column uses long phrases to indicate a simple distinction, you might choose for the sake of efficiency
to recode the values as one-letter codes. You can do this using a column expression formed with the CASE operator
in the database query. The example below uses one-character codes in place of the descriptive phrases found in the
column REASON2.
case a.REASON2
when 'Return for Refund' then 'R'
when 'Return for Exchange' then 'E'
else ' ' end
as reason2code

If you want to display the original phrase instead of the code in SAS reporting, you can do that using a value
format. For the example above, the format can be defined using the step below:
proc format;
value $rsn2x
'R' = 'Return for Refund'
'E' = 'Return for Exchange'
;
run;

The step above creates the format $RSN2X which can be attached to the column with the FORMAT= column
modifier, as shown here:
reason2code format=$rsn2x.

This kind of recoding is extra work but can reduce the data size of a table, saving time when the program runs.

Character Transcoding
Character data can be in any of various character encodings. A database has a specific native encoding, but
a table or column in it may have a different encoding. Similarly, there is a native encoding for the SAS session, but a
table may have been created in a different encoding. For the most part, databases and SAS can convert among these
various encodings so that text remains intact.
On occasion something goes wrong, usually because data is labeled with a character encoding that differs from
the encoding it is actually in. You may be seeing this problem when some special characters are replaced by other
special characters, seemingly selected at random.
Your first line of defense in the SAS environment is the transcode column attribute, specifically TRANSCODE=NO, to
tell SAS not to automatically convert between encodings for that column.
This attribute can be applied when a column is first created in the SAS environment. In the following example,
ADB is an imaginary database and CODE1 and NAME1 are character columns created from a pass-through query.
create table work.mytable as
select code1 transcode=no,
name1 transcode=no
from connection to adb (
database query
);

Protecting a character column from transcoding may be enough, if it allows you to deliver the character value to
an output file in the form that is expected. If you need to do more, that is, to convert a character value in a specific
way, SAS has functions for that purpose as part of its National Language Support (NLS) features.
With TRANSCODE=NO, SAS could replace an entire character field with asterisks in output, but only if you attempt to
use an ordinary character format to display the column in an ODS destination that requires transcoding. The
asterisks do not mean that the data itself has been replaced. Use a binary-friendly format such as $HEX to display
the data in this kind of column.
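As a sketch of that last suggestion, the query below displays the two columns from the earlier example in hexadecimal. The format widths are assumptions; each byte of data takes two hexadecimal digits, so a width of 16 shows the first 8 bytes of a value.

```sas
proc sql;
  select code1 format=$hex16.,
         name1 format=$hex40.
  from work.mytable;
quit;
```

Reading the hexadecimal output makes it possible to check which bytes are actually stored, independent of any encoding conversion in the output destination.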

Strategies for Large Objects


Databases can contain large objects, also known as LOBs, in table columns. Large objects may belong to the CLOB
or BLOB data type, indicating character and binary large objects, respectively. The XML data type is a special case
of a character large object.
The potential size of a large object is effectively unlimited, and for this reason, it is better not to think of large
objects as columns, and it is better not to attempt to transfer large objects to SAS. In theory, if a large object is
smaller than 32 kilobytes, it can be converted to a character or binary data type, then transferred to a SAS character
column. In practice, this is not likely to be an effective or efficient way to move data around. There is little you
could do with a large object in SAS besides writing it out to a separate file, and that is something more easily done
in the database.
Instead, consider what properties of a large object might be useful next to the other data you are working with in
SAS. What is interesting for analysis purposes is probably not the large object itself, but its properties, such as
whether it exists, its size, and so on. These properties are easily derived using the database’s functions, and then are
simple columns that can be transferred to SAS the same as any ordinary column.
XML objects may have a larger number of interesting properties that you can obtain from the database, such as
the existence of specific named objects within the XML.
One other possibility is to obtain a fragment of the large object, such as its first 2048 bytes (2 kilobytes), and
analyze that in SAS. This can be a useful approach if important identifying information is embedded at a particular
known point in the large object, or conversely, if the initial block is likely to be representative of the entire object.
Another possibility is that the large objects are not actually very large, but are known to be a reliable fixed size
of just a few hundred bytes. In that case, they can be converted to a character data type (usually VARCHAR) in a
database SQL expression and the resulting value transferred to SAS.
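The idea of transferring properties of a large object rather than the object itself could be sketched as follows. Everything specific here is hypothetical: the ODBC engine, the alias MYDB, the data source DOCS_DSN, the table DOCUMENTS, and the columns DOC_ID and DOC_BODY. The LENGTH, SUBSTR, and CAST calls stand in for whatever large-object functions the particular DBMS provides, which may have different names and arguments.

```sas
proc sql;
  connect to odbc as mydb (datasrc=docs_dsn);
  create table work.doc_info as
    select * from connection to mydb (
      select doc_id,
             /* Does the large object exist? */
             case when doc_body is null then 'N' else 'Y' end as has_body,
             /* Size of the large object (function name varies by DBMS). */
             length(doc_body) as body_size,
             /* First 2048 characters as an ordinary character value. */
             cast(substr(doc_body, 1, 2048) as varchar(2048)) as body_start
      from documents
    );
  disconnect from mydb;
quit;
```

The resulting SAS table contains only ordinary columns, so it can be analyzed like any other table.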

The Database Library Engine


The database library engine allows you to access a database as the equivalent of a normal SAS data library.
Depending on the DBMS and the database, a LIBNAME statement might cover either an entire database or just one
operational segment or domain within the database. There could be additional options in the LIBNAME statement to
provide any additional details needed. If a LIBNAME statement covers only one part of a database, you may need a
second LIBNAME statement to cover another part of the database.
Once you have the options you need for a particular database, the LIBNAME statement can be the same every
time, and you will need to run the LIBNAME statement only once near the beginning of a SAS program or session.
The LIBNAME statement defines a libref, which serves as a short name you use to access the database tables in
the SAS program. Write the libref as the second word in the LIBNAME statement. Once defined, the libref stays
active until the end of the SAS program or the end of an interactive session.
Form the SAS data set name for a database table by combining the libref with the table name. Then use the SAS
data set name to act on the database table throughout the SAS environment. These are common actions with a
database table accessed with the database library engine:

Read a database table in a data step using a SET statement or other statement.
Read a database table in a proc step using the DATA= option.
Create a database table in a data step, writing the name in the DATA statement.
Create a database table in the CREATE TABLE statement of SQL.
Add rows to an existing database table in the INSERT INTO statement of SQL.
Remove rows from a database table with the DELETE FROM statement of SQL.
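
The actions in this list might be written as in the sketch below. It assumes that the libref MYDB has already been defined with a database library engine and that you have permission to create and change tables; the table and column names (ORDERS, ORDERS_2014, ORDER_DT, AMOUNT, WORK.LATE_ORDERS) are hypothetical.

```sas
/* Read a database table in a data step. */
data work.orders_copy;
  set mydb.orders;
run;

/* Read a database table in a proc step. */
proc means data=mydb.orders;
  var amount;
run;

proc sql;
  /* Create a database table. */
  create table mydb.orders_2014 as
    select *
    from work.orders_copy
    where year(order_dt) = 2014;
  /* Add rows to the database table. */
  insert into mydb.orders_2014
    select * from work.late_orders;
  /* Remove rows from the database table. */
  delete from mydb.orders_2014
    where amount is missing;
quit;
```

In each statement the two-level name is the only indication that the table resides in a database rather than in a SAS data library.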

Many of the same cautions about data type and column attributes with SQL pass-through also apply when you
are using the library engine. SAS converts all SQL data types to the two SAS data types. This process happens in
reverse when you create a database table from SAS, but not necessarily in quite the same way, as there is little to tell
SAS to form a particular SQL data type for a column.
For time data, make sure you create a column with an appropriate SAS date, SAS datetime, or SAS time format,
usually indicated in a FORMAT statement. This is what tells SAS that a column contains a SAS date, SAS datetime,
or SAS time value, so that it will ask the database to create an SQL date, timestamp, or time column in a new
database table.
The following code model shows the statements involved in defining a libref for a database and creating a table
from SAS data. The resulting table is stored in the database.
libname libref DBMS database=database
user=user ID password=password other options;
proc sql;
create table libref.table as select . . . ;
quit;

Not every database will let you create tables. If you have read-only access to a database, you can still use the
library engine to process database tables in SAS procedures.
If you are accessing a database using the database library engine and want to be assured that you are not
changing the data in the database, add the ACCESS=READONLY option to the LIBNAME statement. This option in the
LIBNAME statement prevents the SAS program from making any changes to the data in the library, but still permits
you to read the data.
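A read-only libref might be set up as in this sketch. The ODBC engine, the data source SALES_DSN, and the table ORDERS are assumptions for illustration; the other connection options depend on the DBMS.

```sas
libname mydb odbc datasrc=sales_dsn access=readonly;

proc sql;
  /* Reading is permitted. */
  select count(*) as order_count
  from mydb.orders;
  /* This statement is expected to fail with an error,
     because the libref does not allow changes. */
  delete from mydb.orders
  where amount is missing;
quit;
```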
When you access a database using a library engine, SAS still has to send requests to the database in the form of
SQL. This process is officially known as “implicit SQL pass-through.” For contrast, the SQL components that
specifically mention a database are “explicit” SQL pass-through. Don’t get the wrong idea from the word “implicit.”
It only means that SAS itself is generating the SQL requests it is sending to the database.
A database table that you access using the database library engine may be mentioned in statements in the SQL
procedure. In this situation, you must write SAS SQL, not database SQL, even though the table is a database table.
SAS will generate database SQL from your SAS SQL to the extent that it can, and this implicit SQL pass-through
code could be somewhat different from the SAS SQL you write.
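If you want to see the database SQL that SAS generates, the SASTRACE= system option can write it to the log. The sketch below assumes the libref MYDB and a hypothetical table ORDERS with columns REGION, AMOUNT, and ORDER_DT; the exact content of the trace messages depends on the SAS/ACCESS interface.

```sas
/* Write the generated database SQL to the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
  select region, sum(amount) as total
  from mydb.orders
  where order_dt >= '01jan2014'd
  group by region;
quit;

/* Turn tracing off again. */
options sastrace=off;
```

Comparing the traced SQL with the SAS SQL you wrote shows how much of the query was passed to the database.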
A view that refers to a table via the database library engine is the same as any other SQL view in SAS. You have
to know how the libref is defined to know that the data comes from a database. The LIBNAME statement for the
database library can be embedded in the view if necessary, the same as any other LIBNAME statement, as described
in the previous chapter.

The DATASETS and CONTENTS Procedures With the Database Library Engine
After you have defined a libref with the database library engine, if you do not know all the details of the database, a
good place to start is the DATASETS procedure. The table produced by the procedure lists the tables and views
available in the database. If MYDB is the libref of the database library, this step lists the available tables in the
library.
proc datasets library=mydb;
quit;

Either a table or a view from a database is treated as a table by the database library engine. After you know the name
of a table, you can get more information about it, especially its columns, using the CONTENTS procedure. The use
of the CONTENTS procedure is the same as it is for a native SAS table, as seen in this example:
proc contents data=mydb.mytable;
run;

Creating and Populating a Database Table


When you need more control over the way you create a database table in SAS, you can use SQL pass-through
statements to define the table, then add data in a separate action using the INSERT INTO statement in SQL. If you
are adding data from the SAS environment, the usual scenario, it is important to remember that the INSERT INTO
statement is a SAS SQL statement, not an SQL pass-through statement. Write the INSERT INTO statement using
SAS SQL syntax.
The following code model shows the two separate actions involved in this approach.
proc sql;
connect to DBMS database=database options;
execute (
statements to define table
) by DBMS ;
insert into libref.table
select . . . ;
quit;

This approach lets you choose the exact column attributes and data types you want the table to have, consistent
with the other tables and columns in the database. Most importantly, if you will be joining the new table to a
database table, create the key column with the same data type as the existing database column you will be matching
it to.
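A concrete version of this code model might look like the sketch below. The ODBC engine, the data source SALES_DSN, the alias DBCON, the table REGION_TARGET, and the SAS table WORK.TARGETS are all hypothetical, and the data type names in the CREATE TABLE statement have to follow the rules of the particular DBMS.

```sas
/* Libref for the INSERT INTO statement, which is SAS SQL. */
libname mydb odbc datasrc=sales_dsn;

proc sql;
  connect to odbc as dbcon (datasrc=sales_dsn);
  /* Database SQL: define the table with exact data types. */
  execute (
    create table region_target (
      region_id  integer not null,
      target_amt decimal(12,2)
    )
  ) by dbcon;
  disconnect from dbcon;
  /* SAS SQL: add rows from a SAS table through the libref. */
  insert into mydb.region_target
    select region_id, target_amt
    from work.targets;
quit;
```

Here REGION_ID is given the integer type on the assumption that it will be matched to an integer key column in an existing database table.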

In-Database Processing
Sometimes SAS does not merely ask the database for the data. Especially when sorting, categorizing, and summary
statistics are involved, SAS may have the database do the first stages of processing — or all of the processing if
possible. The in-database processing saves work for SAS and it may also make work easier for the database,
particularly if the database can deliver summary data instead of detail data to SAS. With less total work, the program
runs faster.
In-database processing works only for specific features of roughly two dozen SAS procedures, though the
number is increasing, and only in a few DBMSs, though that number too is going up. When in-database processing
is available for a part of a SAS program, SAS uses it automatically.
Some aspects of in-database processing are obvious enough. Sorting and creating summary data come as
naturally to a database as they do to SAS. Other details can be a bit tricky, especially when SAS functions and
formats are involved. SAS adds functions to the database to mimic the effects of SAS functions. Among these, the
SAS_PUT function is used to apply SAS formats. This includes user-defined formats, though SAS may substitute a
temporary database table instead if that is more efficient.

Remote SQL Pass-Through


SAS/CONNECT is a SAS component that coordinates SAS sessions on multiple computers. A “local” client
computer, which could be a desktop system, can send SAS statements to a “remote” server system for execution.
Among its many other uses, this can be a convenient way to manage SQL pass-through, particularly if the server
system is better equipped to handle the follow-up SAS processing that comes after an SQL pass-through query. This
combination of features is known as remote SQL pass-through.
Remote SQL pass-through is not fundamentally any different from SQL pass-through in any other context.
When setting it up, though, it is important to remember that it is the “remote” system that has to have the connection
to the database.
To program using remote SQL pass-through, write all the components of SQL pass-through between the
RSUBMIT and ENDRSUBMIT statements that mark off a section of code to execute on the “remote” system.
SAS/CONNECT and remote SQL pass-through may make sense in situations where a high-speed, low-cost
direct network connection between the user and the database is not available, but there is a well-connected server
running SAS that is available to serve as the “remote” system for SAS/CONNECT.
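A remote pass-through block might be sketched as follows. It assumes that a SAS/CONNECT session to the server has already been started with the SIGNON statement, and that the server can reach the database; the ODBC engine, the alias MYDB, the data source SALES_DSN, and the table SALES are hypothetical.

```sas
rsubmit;
  /* Everything up to ENDRSUBMIT executes on the "remote" system. */
  proc sql;
    connect to odbc as mydb (datasrc=sales_dsn);
    create table work.region_totals as
      select * from connection to mydb (
        select region, sum(amount) as total
        from sales
        group by region
      );
    disconnect from mydb;
  quit;
endrsubmit;
```

The table WORK.REGION_TOTALS is created in the WORK library of the server session, where any follow-up SAS processing can also run.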

Database Performance Issues


It is hard to make more than a few general statements about SQL performance in relational databases. Different
DBMSs and even different databases within the same DBMS are optimized in different ways and with different
purposes in mind, so that a coding approach that is efficient in one database may be inefficient in another.
However, there are some recurring themes in database performance when connecting SAS to a database.
Relational databases have many qualities in common that may differ from what you would expect if you are more
used to the SAS way of working with data. Separately, there are some problematic points in the way SAS and a
relational database connect, and these are areas you might look at if you want to improve the performance of a
database running programs that come from SAS.
The first thing to understand about data storage in relational databases is that they tend to be column-oriented —
that is, all the values for one column in a table are stored in one place. This is the opposite of SAS. The native
library engines in SAS are row-oriented. They store all the values for one row in one place.
One of the surprising implications of a column-oriented approach to data, if you are used to thinking in terms of
rows, is that the number of columns that are present in a table may not affect the performance of a query. That is,
retrieving a column from a table involves the same actions by the database whether the table has a total of 5
columns, or 500. If a table has extra columns you are not using, you don’t have to care about them.
There is another side to this principle, however, and that is that it does matter how many columns you use in a
query. SQL makes it easy to ask for all of the columns from a table, by writing SELECT *, but retrieving all of the
columns is more work for the database than if you select just the columns you need.
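As a small illustration of this point, the query below names just the two columns it needs instead of using SELECT *. The libref MYDB, the table ORDERS, and the column names are hypothetical.

```sas
proc sql;
  create table work.order_amounts as
    /* Only these two columns have to be retrieved from the database. */
    select order_id, amount
    from mydb.orders;
quit;
```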
When using the database library engine, some of the trickiest issues involve functions and formats. Issues with
functions arise most often when a function is used in a WHERE clause. In the form of a WHERE statement or data
set option, a WHERE clause can appear in any SAS step that reads data from a database table or SAS data set. The
example below shows one way this might occur. The CORR procedure measures correlations among numeric
variables. Imagining that DBA is the libref of a database that supports in-database processing for the CORR
procedure, SAS will look for a way to translate the entire step, including the SAS function calls and the rest of the
WHERE clause, to SQL that can execute in the database. Or, if SAS does not support in-database processing for the
particular procedure in the particular DBMS, it will still try to execute the WHERE clause and select only the
needed columns in-database.
proc corr data=dba.dxtranshist;
where substr(transtypcd, 1, 3) = '011'
and year(transdt) = 2014;
run;

The SUBSTR function call extracts the first three characters of the TRANSTYPCD column for a comparison. The
YEAR function call determines the year of the date indicated by the column TRANSDT. In order to be able to pass
this condition through to the database, SAS may replace these functions with corresponding database functions, or it
may add versions of the SAS functions to the database.
Depending on the details of the DBMS, database, and table, a WHERE condition that contains a function may
take longer to run, and some functions may run faster than others. Consider the SUBSTR function call in the
example. SAS might use the SUBSTR function or another similar function in the DBMS, it might add a version of
the SAS SUBSTR function to the DBMS, or it might replace the function call with an expression using the LIKE
operator. With the YEAR function call, since the objective is to select a range of dates, SAS has another option, to
replace the function call and comparison with an expression using the BETWEEN-AND operator, the equivalent of
transdt between '01jan2014'd and '31dec2014'd

Only a limited set of SAS functions have been converted to work in a database. If a SAS function is not available
in the SAS/ACCESS interface to the database you are using, SAS will instead retrieve the needed data from the
database and apply the function in the SAS environment. If the amount of data is large, this could take noticeably
longer because of the need to move the data from one place to another. Whenever a WHERE expression with
function calls is being applied to a database table via a database library engine and the step is taking a long time to
run, it is worth considering what is happening to the function calls, and perhaps revising the expression so that it
does not use functions that might be slow to execute in the database, or that are not available in the database. Log
notes provide some information about the steps SAS is taking as it attempts in-database processing. Database SQL
optimization is a relatively opaque process to begin with, but it helps to know that SAS function calls are a possible
sticking point.
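One possible revision of the earlier CORR step avoids function calls in the WHERE clause altogether. This is a sketch of the idea, not a guarantee of better performance in any particular database; it assumes that TRANSTYPCD is a character column at least three characters long, so that the LIKE pattern selects the same rows as the SUBSTR comparison.

```sas
proc corr data=dba.dxtranshist;
  /* LIKE and BETWEEN-AND replace the SUBSTR and YEAR function calls. */
  where transtypcd like '011%'
    and transdt between '01jan2014'd and '31dec2014'd;
run;
```

Conditions written with operators such as these are more likely to pass through to the database unchanged.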
The PUT function presents particular difficulties when it appears in a WHERE expression that SAS wants to
pass to a database. The PUT function applies a SAS format, which could even be a user-defined format, to a value. It
is hard to pass this task to a database because databases don’t know anything about SAS formats.
User-defined value formats might seem to be the most difficult, but they are usually not so problematic. They
represent a limited set of values mapped to formatted values, so SAS can send this mapping to the database as a
temporary table. This technique works with any format when it is being applied to a limited set of values. The
REDUCEPUT= option and several related options give you some control over this process when it occurs in the
SQL procedure, and the corresponding system option applies wherever it occurs. (See chapter 9 for information on
these options.)
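The following sketch shows where the REDUCEPUT= option is written when a PUT function call appears in a WHERE clause. The format $TTYPE and its value mapping are invented for the example, and DBA.DXTRANSHIST and TRANSTYPCD are borrowed from the earlier example; whether the optimization actually takes place depends on the format and the data.

```sas
/* A hypothetical user-defined value format. */
proc format;
  value $ttype
    '011' = 'Refund'
    '012' = 'Exchange';
run;

/* REDUCEPUT=ALL asks the SQL procedure to consider every
   PUT function call as a candidate for optimization. */
proc sql reduceput=all;
  select count(*) as refunds
  from dba.dxtranshist
  where put(transtypcd, $ttype.) = 'Refund';
quit;
```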
Formats are also important when a procedure forms groups based on the formatted value of a continuous
variable. This can happen, for example, when temperature values such as 26.02 and 26.33 are formatted with no
decimal places so that they appear as 26. If this formatted temperature variable is a BY variable in a proc step, the
procedure can treat 26 and each separate formatted value as a group so that it processes that group separately. When
SAS converts this to database SQL, it can generally reproduce the effects of the SAS format using the functions
available in the database. It still can involve a significant amount of extra computation to form groups, depending on
the database, the data values, and other factors.
9. SQL Options and Execution
Usually the words “PROC SQL” are enough to start the SQL procedure, but sometimes you may want to add options to
change the way the procedure operates. For example, if you want to write column names in double quotes,
something the ANSI standard for SQL allows for, you will need the option DQUOTE=ANSI. Add options such as this to
the PROC SQL statement. If the DQUOTE= option is the only one you need, the statement becomes:
proc sql dquote=ansi;

It is also possible to change SQL options in the middle of the step using the RESET statement. To change the
DQUOTE option in the middle of the step, the statement might be:
reset dquote=sas;

Some of the options can also be set as system options, and there are other system options that can affect SQL
processing. Many of the options, both system options and procedure options, are meant to help you with
performance tuning of large-scale SQL queries.
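Put together, a step that changes the DQUOTE= option partway through might look like this sketch. It uses the SASHELP.CLASS sample table that is normally installed with SAS, on the assumption that it is available in your session.

```sas
proc sql dquote=ansi;
  /* With DQUOTE=ANSI, double quotes enclose column names. */
  select "name", "age"
  from sashelp.class
  where "sex" = 'F';

  reset dquote=sas;
  /* With DQUOTE=SAS, double quotes enclose a character constant. */
  select name, age
  from sashelp.class
  where sex = "F";
quit;
```

Both queries select the same rows; only the interpretation of the double quotes changes between them.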

SQL Options
SQL options affect the way SQL statements execute in the PROC SQL step.
Write SQL options in the PROC SQL statement. To change options in the middle of the step, write the options in
a RESET statement.
These are the most useful SQL options:
EXEC

The NOEXEC option tells the SQL procedure to check the syntax of SQL statements, but not to execute them. Use
the EXEC option to start executing statements again.
PRINT

With the NOPRINT option, SELECT statements do not produce output. Use this option if you are using SELECT
statements to create macro variables, as described in the next chapter. Use the PRINT option for later statements in
the same step that produce output.
REMERGE

The REMERGE option permits remerging, which combines summary data with detail data within the same SELECT
clause. This is something SAS allows as an extension to the SQL standards. The NOREMERGE option enforces the
standard SQL rules that prohibit remerging. With NOREMERGE, when a query contains both summary columns and
detail columns, the SQL procedure stops executing the query and issues an error message.
DQUOTE=

The DQUOTE= option determines the meaning of double quotes. With the default DQUOTE=SAS, text in double quotes is
a quoted string, representing a constant value, as usual in SAS. With DQUOTE=ANSI, text in double quotes is a name,
usually a column name, as the ANSI standard for SQL indicates. In either case, text in single quotes represents a
quoted string.
ERRORSTOP

The ERRORSTOP option tells the SQL procedure what to do after it finds an error in a statement. The SQL
procedure continues to look at statements, but with the ERRORSTOP option, it does not execute them. With the
NOERRORSTOP option, the SQL procedure continues to execute the other statements if it can.
NOERRORSTOP is the default for the SAS interactive environment, or whenever a SAS session has an
interactive user available. ERRORSTOP is the default for batch programs or whenever there is no interactive user
available. The defaults are consistent with the way SAS responds to errors in other steps of a SAS program.
The system option ERRORABEND, described later in the chapter, has a related purpose.
STOPONTRUNC

The STOPONTRUNC option prevents a SET clause from truncating data. With this option, if a character value is
too long to fit in a column, the value is discarded.
IPONEATTEMPT

This option determines the reaction when a part of an implicit pass-through query fails with a database error or
communications error. The IPONEATTEMPT option stops processing as soon as a failure occurs. The
NOIPONEATTEMPT option attempts to continue processing. NOIPONEATTEMPT is the default.
INOBS=n

The INOBS= option limits the size of the input data for a query. It limits the number of rows used from any one
source. This is similar to the effect of the OBS= option elsewhere in SAS. It provides one way to limit the scale of a
query when you are testing it.
If you use the INOBS= option at the same time as the FIRSTOBS= and OBS= system options, the INOBS=
option takes effect after the FIRSTOBS= and OBS= system options have been applied. As an example, for an input
table with 25 rows, with the system options FIRSTOBS=11 OBS=20 and the SQL option INOBS=4, a query reads rows 11
through 14 in the table.
OUTOBS=n

The OUTOBS= option limits the number of rows in the result set of a query. Use this option when testing a query if
there is a chance that output files could be excessively large.
LOOPS=n

The LOOPS= option provides the most accurate way to limit the work the SQL procedure does on any single query,
especially when you are testing new queries. It limits the number of loop iterations during query execution to the
number you specify. You can use the SQLOOPS automatic macro variable, described in the next chapter, to find out
how many loop iterations a query actually uses.
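The three limiting options can be combined while testing. The sketch below uses a cross join because it grows quickly; the table SASHELP.CLASS and the specific limits are assumptions chosen only for illustration.

```sas
proc sql inobs=10 outobs=20 loops=5000;
  /* Every row of one source paired with every row of the other */
  select a.name, b.name as other_name
    from sashelp.class as a, sashelp.class as b;

  /* The SQL procedure sets these macro variables after the query executes */
  %PUT Rows: &SQLOBS. Loops: &SQLOOPS.;
quit;
```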
PROMPT

In an interactive session, the PROMPT option tells the SQL procedure to ask the user whether to continue when
the limits of the INOBS=, OUTOBS=, and LOOPS= options are reached. With the default, NOPROMPT, query
execution stops automatically when the limits of these options are reached.
CONSTDATETIME

The CONSTDATETIME option affects the way the SQL procedure interprets functions that get the current time
from the operating system clock. These are the DATETIME, TIME, and DATE functions. With the
CONSTDATETIME option, the SQL procedure checks the time once, when the query is starting. This keeps the
time value the same in all rows of the query. This is the default behavior. With the NOCONSTDATETIME option,
the SQL procedure checks the operating system clock anew for every row in the query. This makes it possible for
different rows to show different times, but at a considerable performance penalty. The whole query runs slower and
uses more energy because of watching the clock on every row. Use the NOCONSTDATETIME option together with
the DATETIME function if audit rules require a record of the exact execution time of each row a query generates.
REDUCEPUT=

The REDUCEPUT= option and two other options control some optimizations the SQL procedure makes for queries
that use the PUT function. Optimizations for the PUT function are important, particularly when it appears in a
WHERE clause, because this function can take more time to run than most other SAS functions.
The REDUCEPUT= option has four possible values that turn PUT function optimization on and off. ALL
optimizes all PUT function calls. DBMS optimizes PUT function calls for which the value argument is a DBMS
column. BASE optimizes PUT function calls for which the value argument is a SAS column. NONE turns off all PUT
function optimization.
PUT function optimization replaces the PUT function call with a table in memory that provides the results of the
function call for each possible value argument. This saves the most time when the number of different values used
as arguments is much less than the number of rows processed in the query. SAS considers PUT function
optimization only when the number of distinct values is less than the limit set by the REDUCEPUTVALUES=
option.
In some query column expressions, the value argument of the PUT function might be the current date
and time provided by the DATETIME function. As an example of this, imagine a query that contains this column
expression:
put(datetime(), datetime22.) as run_timestamp

To speed up execution of the query that contains this column, use both the CONSTDATETIME option and the
REDUCEPUT=ALL or REDUCEPUT=BASE option. These two optimizations together convert this entire column expression to a
constant value. The query runs much more efficiently this way.
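Put together in a complete step, the combination might look like the sketch below. The table SASHELP.CLASS and the column alias ROW_TIME are assumptions for illustration; the second query shows the contrasting NOCONSTDATETIME setting.

```sas
proc sql constdatetime reduceput=all;
  /* The clock is read once, so the PUT result can be treated as a constant */
  select name,
         put(datetime(), datetime22.) as run_timestamp
    from sashelp.class;

  /* With NOCONSTDATETIME the clock is read again for every row */
  reset noconstdatetime;
  select name,
         datetime() format=datetime22.3 as row_time
    from sashelp.class;
quit;
```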
REDUCEPUTVALUES=n

The REDUCEPUTVALUES= option sets an upper limit on the number of distinct values used in PUT function
optimization. The default is 100. You can set higher values up to 3000.
REDUCEPUTOBS=n

The REDUCEPUTOBS= option sets a minimum number of rows in a table for PUT function optimization.

UBUFSIZE=n or BUFFERSIZE=n

The UBUFSIZE= option sets the buffer size for temporary objects the SQL procedure uses, especially when joining
tables, forming groups, and comparing sets for the INTERSECT and EXCEPT set operators. With larger values,
query execution tends to use more memory, but large queries may run faster. Prior to SAS 9.4, this option was
known as BUFFERSIZE.
UNDO_POLICY

An SQL statement can lead to a prohibited action on a SAS table or view. An example of this is attempting to add a
duplicate row to a table that is defined such that it does not accept duplicate rows. SAS attempts to reverse these
actions, and the UNDO_POLICY option determines how far it goes in undoing the actions. The possible values are
NONE, OPTIONAL, and REQUIRED. The default is REQUIRED.

With NONE, SAS does not attempt to undo the actions. This is fastest but can leave
discrepancies in the affected data.
With OPTIONAL, SAS reverses the actions that can reliably be reversed, but it does not
attempt to reverse actions where that cannot be done reliably.
With REQUIRED, SAS reverses the actions to the extent that it can.
FEEDBACK

FEEDBACK shows the SQL query in the log, modified to show the way SAS understands the query. The log shows
the query with:

parentheses added to indicate the order of precedence of operators
the column list symbol * replaced with a list of columns
the source code of an SQL view as a subquery
macro language references resolved

If you want to replace the symbol * in a query with an actual list of columns, the NOEXEC and FEEDBACK
options provide an easy way to generate that list.
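A minimal sketch of that technique, assuming the sample table SASHELP.CLASS:

```sas
proc sql noexec feedback;
  /* Checked but not executed; the expanded query text appears in the log */
  select * from sashelp.class;
quit;
```

The expanded statement can then be copied from the log back into the program.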

STIMER, SORTMSG, NOWARNRECURS

Several more SQL options affect only the log messages that the SQL procedure generates as it executes statements.
STIMER adds performance measurements for each SQL statement. With NOSTIMER, the SQL procedure
reports performance measurements only for the step as a whole, depending on the related STIMER system option.
The level of detail of these notes is controlled by the FULLSTIMER system option, described later in the chapter.
SORTMSG adds messages about sorting that occurs as a query is executed.
NOWARNRECURS takes away the warning message about recursive references in a query.

A Macro Variable Option


SYS_SQLSETLIMIT is an automatic macro variable that acts as an option specifically for the SQL procedure.
When the SQL procedure cannot optimize a table join using indexes, it might create a hash table in memory
containing the data from the smaller table in the join, a technique called a hash join. The value of the
SYS_SQLSETLIMIT macro variable limits the size of the hash table created in this way. The default value is 1024.
Set a larger value if you have a large amount of physical memory available and want to make larger hash joins
possible.
Set this macro variable in a %LET statement, for example:
%LET SYS_SQLSETLIMIT = 2048;

Hash joins are also affected by the UBUFSIZE option already mentioned. Use a larger value for the UBUFSIZE
option to make hash joins occur more often.

System Options for SQL


System options provide a way to control details of the behavior of the SAS environment in general. Some system
options, though, have particular relevance for SQL.
Write system options in an OPTIONS statement before the PROC SQL statement or before any SQL statement.
These system options have the same effect as the corresponding SQL options:

SQLREMERGE: see REMERGE
SQLCONSTDATETIME: see CONSTDATETIME
SQLREDUCEPUT: see REDUCEPUT
SQLREDUCEPUTVALUES: see REDUCEPUTVALUES
SQLREDUCEPUTOBS: see REDUCEPUTOBS
SQLIPONEATTEMPT: see IPONEATTEMPT
SQLUNDOPOLICY: see UNDO_POLICY
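
As a sketch of how these system options fit around a step, the statements below set a few of them in an OPTIONS statement and then change one setting inside the step with the corresponding SQL option. The particular values and the table SASHELP.CLASS are assumptions for illustration.

```sas
/* System options set before the step begins */
options nosqlremerge sqlundopolicy=required sqlreduceput=dbms;

proc sql;
  select name, age from sashelp.class;

  /* The SQL option can still be changed within the step */
  reset remerge;
  select name, age, avg(age) as mean_age from sashelp.class;
quit;

/* Set the system option back for later steps */
options sqlremerge;
```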

There are two more system options of note specifically for SQL. These two options mainly affect in-database
processing.
SQLGENERATION=

The SQLGENERATION option tells SAS when to use in-database processing (that is, when
qualifying procedures should generate SQL). SQLGENERATION=DBMS turns in-database processing on.
SQLGENERATION=NONE turns it off. More complicated forms of this option allow you to turn in-database
processing on and off for specific DBMSs and for specific procedures.
SQLMAPPUTTO=

The SQLMAPPUTTO option makes SAS formats part of in-database processing. SAS duplicates
the effects of formats in a database by adding a new function, SAS_PUT, to the database.
SQLMAPPUTTO=SAS_PUT enables the use of this function. SQLMAPPUTTO=NONE turns off in-database
processing for SAS formats.
This option affects both SQL pass-through and in-database processing, but it is especially important
when in-database processing depends on formats for grouping.

Many other system options that apply to the SAS environment in general are useful when you are working in
SQL.
OBS=n

The OBS= and FIRSTOBS= options have the same effect in SQL as elsewhere in SAS. They limit
the number of observations or records used from any input source. The OBS= option tells a step to
stop looking for more input data after a specific number of input observations or records have been
obtained.
The option OBS=1000, for example, tells a step to stop after reaching 1,000 input records from a text
file or 1,000 input observations from a SAS data set. This can be a way to ensure that a step does not
run for a long time even if the input data is unexpectedly large, or it can be a way to verify the
processing of a program using a small part of the available input data. The OBS= option has no
effect if the number of input records or observations is less than the value of the option. Write
OBS=MAX, the default, to process all available input data.

In SQL, the OBS= option limits the number of rows used from each input table. In queries that have
only one input table, it is easy to imagine the effect of this option. You can be sure a certain number
of rows will be processed (assuming that many rows are present in the table). When a query
contains table joins or subqueries, the effects of the OBS= option can be harder to predict. A setting
such as OBS=1000 can result in getting no rows at all in the result set, depending on the way the tables
are joined, the arrangement of data values within the table, and the details of the way the query is
executed. Or, in an incorrectly written query, the result set may still contain millions of rows in spite
of the OBS= option. The available SQL options provide a more reliable way to limit the size of an
SQL query.
FIRSTOBS=n

If the OBS= option tells SAS not to process input beyond a certain extent, the FIRSTOBS= option
tells it not to start processing input until a certain input observation or record is reached. The
FIRSTOBS= and OBS= options may be used together to tell a program to process a particular slice
of the input data. For example, the combination FIRSTOBS=901 OBS=1000 may pick out a range of 100
input records. The FIRSTOBS= option must be used with caution, however. If the value of the
FIRSTOBS= option is larger than the extent of the input data, then no data at all is processed. Use
the default setting FIRSTOBS=1 OBS=MAX to restore normal processing.
If the effect of the OBS= option is not always easy to predict in SQL, the effect of the FIRSTOBS=
option is less so, and most of the time, it should not be used together with SQL statements. On the
other hand, the identically named data set options apply these limitations on observations to
individual tables, and this can sometimes be useful in SQL; see “Data Set Options” in chapter 7.
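For comparison, this is a minimal sketch of the data set options applied to a single table. SASHELP.CLASS is an assumption, used only because it is a widely available sample table.

```sas
proc sql;
  /* Rows 5 through 10 of this one table; any other table in the query is unaffected */
  select name, age
    from sashelp.class(firstobs=5 obs=10);
quit;
```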
STIMER

With the STIMER system option, SAS reports performance measurements for each step in a SAS
program, including each PROC SQL step.
When performance is a concern, it usually also makes sense to report the performance
measurements of each separate SQL statement, using the STIMER SQL option described earlier in
the chapter.
FULLSTIMER

The FULLSTIMER system option adds many more details to the performance statistics reported in
the log with the STIMER option. It affects the notes generated by the STIMER system option and
those that result from the STIMER SQL option.
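A sketch of the timing options working together, assuming the sample table SASHELP.CLASS:

```sas
/* System options: step-level timing, with full detail */
options stimer fullstimer;

/* SQL option: also report timing for each statement in the step */
proc sql stimer;
  select count(*) as class_size from sashelp.class;
  select sex, avg(height) as mean_height
    from sashelp.class
    group by sex;
quit;
```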
SORTEQUALS

The SORTEQUALS option tells SAS, when it sorts rows that have equal key values, to keep the
rows in the order they were in originally. This option primarily affects the SORT procedure, but it
also affects sorting done in the SQL procedure.
There are other options for tuning sorting, and these options also affect sorting done in executing
SQL statements.
YEARCUTOFF=

If you work with data that has 2-digit year numbers, such as 13 representing 2013, use the
YEARCUTOFF= system option to determine which years the 2-digit years represent. The SAS 9.4
default, YEARCUTOFF=1926, means that two-digit years are considered to belong to the span from 1926 to
2025. Pick a more recent year for this option if you have 2-digit years that indicate 2026 or later
years.
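The sketch below shows the option at work on a two-digit year. The cutoff of 1950 and the sample date are assumptions chosen for illustration, and SASHELP.CLASS serves only to supply a single row.

```sas
/* Two-digit years now fall in the span from 1950 to 2049 */
options yearcutoff=1950;

proc sql;
  /* The year 30 is expected to be read as 2030 under this setting */
  select input('05/11/30', mmddyy8.) format=e8601da. as parsed_date
    from sashelp.class(obs=1);
quit;
```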
ERRORABEND

Similar to the ERRORSTOP option mentioned earlier, the ERRORABEND system option tells SAS
to stop processing as soon as an error occurs. With the ERRORABEND option, SAS responds to an
error by ending immediately and returning an error code to the operating system. This option is
typically used with production jobs where the system error code may signal the job scheduler not to
start the next scheduled jobs, so that operators can intervene to identify and correct the cause of the
failure and restart processing at the appropriate stage.
REPLACE=

The REPLACE= option determines what happens when a program tries to create a new table or
view and a file with the same name (and of the same kind) already exists. Ordinarily, SAS lets you
replace the existing file. It deletes the old file as soon as the new one is successfully created. This is
the behavior indicated by the REPLACE=YES option. This approach makes it possible to run the same
program again. To prevent a program from accidentally replacing an existing table or view, use the
REPLACE=NO option. Then SAS stops with an error message if you try to replace an existing table or
view. Even with REPLACE=NO you can still replace tables and views in the WORK library.

Quite a few system options affect SAS output. These are discussed separately in chapter 5.

SQL Execution
The SAS SQL execution process can be broken out broadly into four main phases, focused on the program, data
files, data, and output documents. SQL options and system options may point toward specific phases of SQL
processing.

Program
SAS starts with the program statements. A PROC SQL step can contain any combination of SAS global statements
and SQL statements, along with comments. SAS processes only one statement at a time. It looks at the first word to
determine whether the statement is a global statement or an SQL statement. SAS executes the global statements
separately, so only the SQL statements execute within the SQL procedure.
Options such as EXEC, REMERGE, and DQUOTE affect this phase of processing — they affect the way SAS
understands and responds to the SQL code.

Files
Assuming that it has a correctly formed SQL statement, SAS has to open the input files it refers to and verify that
objects such as columns and views are present. At this stage, it also creates any new files that are needed, including
new tables and the columns they contain. If a query uses functions and formats, SAS verifies that those routines are
available.
Data set options and related system options may affect the way SAS accesses tables in this phase of processing.

Data
The most time-consuming work in SQL, especially for a query, has to do with the actual data values. It can be hard
to know exactly what actions are involved, as SAS’s query optimizer decides on the sequence of actions, particularly
what indexes to use and what sequence to follow when joining and subsetting tables.
Many of the most important actions in a query are passed off to other routines in SAS. Any sorting that is needed
is done by the sort engine; grouping with aggregate functions is done by the summary engine; access to data in
tables is accomplished via library engines. In almost every query, this will be more than half of the work of the
query.
To the extent that a query relies on functions and formats, those routines are also part of the mix.
Options that affect the size of data, the extent of processing, or processing strategies affect this phase of
processing.
At the end of this phase, SAS closes any files it read or wrote along the way.

Output
In the case of a SELECT statement, the statement execution creates an output document that contains the result set
of the query. Before you see the results, they are shaped, formatted, and formed into a document file by ODS. All
ODS statements and options have their effect at this stage of processing.
10. Macro Variables for SQL
Macro language is a preprocessor language in the SAS environment. It can be used to put together SAS statements
and SQL statements, including SQL pass-through statements that are passed along to a DBMS for execution. It is
also something of a programming language in its own right, able to inspect files and adjust its actions based on the
existence of a file or properties found in a file.
It is the macro processor that makes macro language happen. The macro processor executes macro statements,
macro functions, and macros and converts macro language objects to SAS and SQL statements. This chapter
describes only a very limited part of the functionality of macro language, focusing on features that specifically relate
to SQL.

Working With Macro Variables


When you are working in SQL, the most interesting and useful macro language object is the macro variable. A
macro variable can be created or modified in a %LET statement, as in the example below, which assigns a value to
the macro variable DBSTATUS.
%LET DBSTATUS = YELLOW;

Name Prefixes
When you create macro variables, avoid using names that begin with SYS and SQL. This
is the easy way to avoid conflicts with SAS’s automatic macro variables that have names
that begin with SYS and SQL.

After a macro variable is created, you can use it as part of a statement by writing a macro variable reference.
Write the macro variable name prefixed with an ampersand and, usually, followed by a period. This is a reference to
the macro variable DBSTATUS:
&DBSTATUS.

To see the value of a macro variable, write the macro variable in a %PUT statement. The %PUT statement
writes a line in the log, and the message it writes can contain macro variables. This example writes a log note that
includes the value of the macro variable DBSTATUS:
%PUT Current status: &DBSTATUS.;

If the value of DBSTATUS is YELLOW, the log line that results is:

Current status: YELLOW

SAS creates many macro variables automatically. A few of these, specifically related to SQL or useful in SQL
work, are described next. There are times when it makes sense to use macro variables as terms in a query. For this,
you will usually want to create your own macro variables. Macro variables can be created in the %LET statement, as
mentioned. They can also be created from the results of queries, as described later in the chapter.

Automatic Macro Variables for SQL


These automatic macro variables are specifically related to SQL execution.
SQLOBS

A count of the rows in the result set. Specifically, this can be:

the number of data rows in the output table produced by a SELECT statement
the number of rows in the table created by the CREATE TABLE statement
1, for a SELECT statement that does not create a result set because of the
NOPRINT option (and does not create macro variables either)
1, for a SELECT statement that creates a macro variable containing a single
value
the number of macro variables or values generated by a SELECT statement
that creates a list of macro variables
the number of rows added to or removed from an existing table by an
INSERT or DELETE statement
1, for a statement that does not produce a result set
0, for a CREATE VIEW statement
SQLOOPS

The number of loop iterations that occurred in processing the query. This is a measure of the work
that SAS does on a query. It can be used to determine an appropriate limit on the scale of SQL
queries for use in the LOOPS= option (see chapter 9).
SQLRC

The SQL return code, an error code indicating the nature of the outcome of executing the SQL
statement. A value of 0 indicates a successful execution. Positive integer values indicate varying
degrees of problems. The next section discusses return codes in more detail.
SQLXRC

The return code from a statement or query executed by a DBMS. These values are generated by the
DBMS.
SQLXMSG

The error message from a statement or query executed by a DBMS. These messages are generated
by the DBMS. Ordinarily, this macro variable contains no text if a statement or query is successful.
SQLEXITCODE

The highest return code from certain statements that add rows to existing tables. At the end of the
step this becomes the value of the SYSERR macro variable.

Check the values of these macro variables immediately after the SQL statement that interests you, because they
will change again when the next query executes.
The example below shows writing the values of SQLOBS and SQLOOPS in the log, then assigning the value of
SQLRC to another macro variable for later use.
create table work.grid as
select a.x, b.y
from work.xlist a cross join work.ylist b;
%PUT Rows: &SQLOBS. Loops: &SQLOOPS.;
%LET GRIDRC = &SQLRC.;

The automatic macro variables that begin with SQL are created by the SQL procedure, unlike ordinary automatic
macro variables that are maintained by the macro processor. What this means in practical terms is that the macro
variables do not exist until after you have executed SQL statements in a SAS session. It may result in a macro error
if you try to test these macro variables without first executing the SQL statements that they refer to.
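
One way to guard against that situation is to test for the macro variable before using it. The sketch below relies on the %SYMEXIST macro function; the macro name SHOWSQLRC is hypothetical.

```sas
%MACRO SHOWSQLRC;
  /* %SYMEXIST returns 1 if the named macro variable exists, 0 if it does not */
  %IF %SYMEXIST(SQLRC) %THEN %PUT SQL return code: &SQLRC.;
  %ELSE %PUT No SQL statement has executed yet in this session.;
%MEND SHOWSQLRC;

%SHOWSQLRC
```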

SQL Return Codes


SAS generates a return code with every execution of an SQL statement. The return code value is 0 after a statement
executes successfully, or a positive integer code value after a statement that runs into problems. The return code is
available in the macro variable SQLRC.
These are the possible SQL return code values and what they mean:

0
Success. Successful execution of a statement.
4
Warning. SAS is not sure it executed the statement the way you intended.
8
Error message. SAS could not execute the statement because of problems with syntax or semantics.
12
Internal error. A bug in SQL compilation.
16
Error condition. Execution stopped because the data was not consistent with the way the query was
written.
24
System error. Usually, a problem with a file, or running out of space.
28
Internal error. A bug in SQL execution.

Using return codes typically requires macro programming with macro control flow statements. These are macro
statements that execute inside macros to determine the flow of a program based on conditions of macro variables.
For those who are familiar with macros, here is a simple example of a macro statement that checks the value of the
SQL return code:
%IF &SQLRC. NE 0 %THEN %GOTO E0100;
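
To show that statement in context, here is a sketch of a complete macro built around it. The macro name BUILDGRID and the message text are hypothetical, and the tables WORK.XLIST and WORK.YLIST are assumed to exist. Note that this test treats a warning (return code 4) the same as an error.

```sas
%MACRO BUILDGRID;
  proc sql;
    create table work.grid as
      select a.x, b.y
        from work.xlist a cross join work.ylist b;

    /* Branch to the error label on any nonzero return code */
    %IF &SQLRC. NE 0 %THEN %GOTO E0100;

    %PUT Grid created with &SQLOBS. rows.;
  quit;
  %RETURN;

  /* Error handling: end the step, then report the return code */
%E0100:
  quit;
  %PUT Grid step ended with SQL return code &SQLRC.;
%MEND BUILDGRID;

%BUILDGRID
```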

Other Automatic Macro Variables


These other automatic macro variables may be useful when you work in SQL.
SYSLAST

In general, the SYSLAST macro variable contains the name of the most recently created SAS data
set. After a CREATE TABLE or CREATE VIEW statement executes, it tells you the name of the
table or view that was created.
SYSENCODING
The SYSENCODING macro variable provides the character encoding of the SAS session. This may
be important to know if character data includes characters other than the ones that appear on the
standard computer keyboard.
SYSUSERID
SYSJOBID

The SYSUSERID macro variable indicates the current user or account logged into the operating
system, the same as the special USER column name in SQL. The SYSJOBID macro variable
indicates the job ID or process ID in which the SAS program is running. In some circumstances
SYSUSERID and SYSJOBID could be the same.
SYSDATE9
SYSTIME

The SYSDATE9 and SYSTIME macro variables provide the current date and time, as of the start of
the SAS session, in a form suitable for use in a constant value.
SYSDATE is the same as SYSDATE9, except that it uses a two-digit year.
These are models of column expressions that use the SYSDATE9 and SYSTIME macro variables to create
columns in SQL:
"&SYSDATE9.:&SYSTIME."dt as session_timestamp format=e8601dt.,
"&SYSDATE9."d as session_date format=e8601da.,
"&SYSTIME."t as session_timeofday format=e8601tm.

Writing Queries With Macro Variables


If you find yourself wishing you could revise the terms of a query or other statement for each execution of a
program, a macro variable might be the answer.
SQL statements can be written with macro variables that provide specific terms. For example, if TABLENAME
is a macro variable that contains the name of a table and INCOMELIMIT is a macro variable that contains a
constant value representing an income amount, then this could form a valid SQL statement:
create table &TABLENAME. as
select * from main.pilotgroup
where income <= &INCOMELIMIT.;
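
For a complete run, the two macro variables have to be assigned first. The values in this sketch are hypothetical, and the libref MAIN is assumed to be assigned already.

```sas
/* Hypothetical values; change these for each run */
%LET TABLENAME = work.lowincome;
%LET INCOMELIMIT = 40000;

proc sql;
  create table &TABLENAME. as
    select * from main.pilotgroup
    where income <= &INCOMELIMIT.;
quit;

/* Record the substituted terms in the log */
%PUT Table: &TABLENAME. Limit: &INCOMELIMIT.;
```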

To use a macro variable as a character result column in a query, or as any other kind of quoted constant value,
write the macro variable reference in double quotes. The macro processor resolves macro language references inside
double quotes, but it cannot see inside quoted strings that are quoted with single quotes.
Use the DQUOTE=SAS option, if necessary, to ensure that strings enclosed in double quotes are treated as constant
values. This is the usual behavior in SAS, but it is not necessarily what you expect in SQL.
The following example uses a macro variable to add a constant value to the result set of a query.
proc sql dquote=sas;
create table work.hotlist as
select *, "&SYSDATE9."d format=date9. as reportdate
from main.newreport
where priority = "Hot";

This example uses the automatic macro variable SYSDATE9 to form a constant value like "01JAN2014"d, a SAS
date constant containing the current date. This is treated as a constant value during the execution of the query
because the macro processor turns it into a constant, but it differs from the usual idea of a constant value because the
value may change between one execution and the next.

Creating Macro Variables From Data Values


A SELECT statement can create macro variables using a feature of SQL for working with host variables. In the SAS
environment, the host variables are macro variables.
The INTO clause, which follows the SELECT clause at the beginning of a SELECT statement, provides the
names of the macro variables. Write a colon before each macro variable name. This is SQL’s way of marking host
variables as something different from SQL columns and aliases.
Write the INTO clause only in a SELECT statement. SAS ignores the INTO clause if you write it in a query in
any other statement.
The NOPRINT option is usually used with a SELECT INTO statement so that the result set is not also displayed
as an output table. Write the NOPRINT option in the PROC SQL statement or a RESET statement.
In the simplest case, the query produces a result set of a single value, one row and one column, which is assigned
to one macro variable. The example below uses a WHERE clause to limit the result set to one row. This example
assigns the text of the first title line to the macro variable TITLETEXT1, perhaps because the program will be
restoring that title line in a later TITLE1 statement.
proc sql noprint;
select text
into :TITLETEXT1
from dictionary.titles
where type = 'T' and number = 1;
%PUT &TITLETEXT1.;
quit;

The macro variable created in this example is TITLETEXT1. This macro variable is available for use immediately
after the SELECT statement completes, as seen in the example, where the %PUT statement writes the value of the
new macro variable in the log.
If the macro variable does not already exist, the SELECT statement creates a new macro variable with the name
indicated and the value that results from the query. If a macro variable with that name exists, the SELECT statement
assigns the resulting value to that macro variable.
Macro variables contain only text. If you assign a numeric value to a macro variable, SAS converts it to text
using the column’s format attribute. If there is no format attribute, SAS uses the BEST8. format, which represents
the value as precisely as it can in 8 characters. Ordinarily, you will want to remove the leading and trailing spaces
from the result before assigning it to the macro variable. Write the TRIMMED option after the macro variable name
to accomplish this.
The previous example used a WHERE condition to produce a result set with just one row. A query based on a
summary function, with no GROUP BY clause, also produces a one-row result set.
The example below counts the number of title lines. It uses the summary function COUNT(*) to count the rows in
the table DICTIONARY.TITLES that represent title lines (indicated by the code T). It assigns the resulting number,
converted to text without leading or trailing spaces, to the macro variable TITLECOUNT.
proc sql noprint;
select count(*) into :TITLECOUNT trimmed
from dictionary.titles
where type = 'T';
quit;

Getting Trimmed
The TRIMMED option was added in SAS 9.3. In previous releases, the INTO clause
always includes leading and trailing spaces from the SQL column when creating single
values in macro variables. When it is necessary to remove the leading and trailing spaces
from the macro variables, you can do so in subsequent macro statements.
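
One common way to do that, sketched here with the TITLECOUNT query, is to reassign the macro variable to itself in a %LET statement, which drops leading and trailing spaces from the value.

```sas
proc sql noprint;
  /* No TRIMMED option, so the value may carry leading spaces */
  select count(*) into :TITLECOUNT
    from dictionary.titles
    where type = 'T';
quit;

/* The %LET statement removes leading and trailing spaces */
%LET TITLECOUNT = &TITLECOUNT.;

/* The brackets make any remaining spaces visible in the log */
%PUT [&TITLECOUNT.];
```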

To create multiple macro variables from the same query, write a SELECT clause with a list of columns, then an
INTO clause with a corresponding list of macro variables.
This example retrieves a row from a table and creates the macro variables PROT, MODEL, and TRS from three
of the columns.
proc sql noprint;
select protocol, model, threshold
into :PROT, :MODEL trimmed, :TRS trimmed
from main.research
where study = '101931';
quit;

SAS uses a column’s format attribute when it creates the text that it assigns to a macro variable. You can select a
specific format by writing the FORMAT= column modifier in the SELECT clause. Formats are especially important
for date values, as shown in this example, which assigns the same column twice, without and then with a format
attribute.
select date1, date1 format=e8601da.
into :NDATE, :FDATE
. . .
%PUT &NDATE. is &FDATE.;

19667 is 2013-11-05

Combining Columns in a Macro Variable


Write a column expression using one of the concatenation functions, followed by an INTO clause, to combine the
values of two or more columns in a single macro variable.
Use the CATX function to combine values with a delimiter between them. Often, as in the example below, the
first argument is the constant value ', ' representing a comma followed by a space. (Be careful not to confuse this
comma, a constant value enclosed in quotes, with the commas that separate the arguments to the function.) This
argument is used as the delimiter added between the other arguments. The CATX function converts numeric values
to character values and removes leading and trailing spaces so that it creates a well-formed text list with any
combination of arguments.
In this example, the SELECT and INTO clauses combine the values of the columns PRE_STAGE,
LIVE_STAGE, and POST_STAGE into the macro variable STAGES.
select catx(', ', pre_stage, live_stage, post_stage)
into :stages

If the values are 1, 2, and 3, the value of the resulting macro variable is:

1, 2, 3

If you need to combine values with no other characters to separate them, use the CATS function. This function is
similar to CATX but it does not use a delimiter and does not have a delimiter argument. It removes leading and
trailing spaces and combines values with no characters added between. This revises the example above to use the
CATS function:
select cats(pre_stage, live_stage, post_stage)
into :stages

The revised result is:


123

Writing Results in the Log


SAS writes the results of a SELECT statement to the current ODS destination. This makes sense if you have a table
to show, but if you have just one or two values that measure the results of a program, you might prefer to show them
in the log instead. This is one possible use of the INTO clause. Create macro variables of the results with an INTO
clause, then write %PUT statements to form the values into log messages.
This example computes two statistics and writes them in the log.
proc sql noprint;
select
count(*), sum(area)
into :rowcount trimmed, :total trimmed
from geo.iceberg;
quit;
%PUT Number of icebergs: &ROWCOUNT.;
%PUT Total area: &TOTAL.;

Number of icebergs: 150
Total area: 45

A SELECT clause that contains only summary functions always returns a row of results, but other queries
intended to generate one row might come up empty. This happens when the input table is empty or when no row
matches the condition of a WHERE clause. The %LET statements at the beginning of the next example ensure that the
intended macro variables exist even if there is no matching row. The values provided in the %LET statements are
the values that appear in the message if the SELECT statement does not generate any rows and does not assign
values to the macro variables.
%LET PRECURS = None;
%LET TRANSMOD = Unknown;
proc sql noprint;
select precursor, transfermode
into :PRECURS, :TRANSMOD
from main.bond
where valence = 'X443AA';
quit;
%PUT Precursor: &PRECURS.;
%PUT Transfer Mode: &TRANSMOD.;

If the result set is one row in which the values are 1505 and 1755, the resulting log lines are:
Precursor: 1505
Transfer Mode: 1755

If the result set is empty, the program still runs correctly, and the log lines show the default values that were set
beforehand:
Precursor: None
Transfer Mode: Unknown

Creating a List of Values in a Macro Variable


The simple form of the INTO clause assumes that there is only one row in the result set. What happens if there are
multiple rows?
If the result set of a SELECT INTO statement has multiple rows, ordinarily only the first row is used in
assigning a value to the macro variable. Which row appears first may be determined by the sequence of rows in the
source table. If necessary, an ORDER BY clause may be added to the SELECT statement to determine which row
comes first.
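A short sketch of that approach follows. It assumes a table CHEMICAL.ELEMENT with a character column NAME, as in the SEPARATED BY example in this section; the macro variable name FIRSTNAME is made up for illustration.

```sas
proc sql noprint;
  /* ORDER BY decides which row is first, so the result does not */
  /* depend on how the rows happen to be stored in the table.    */
  select name
    into :FIRSTNAME trimmed
    from chemical.element
    order by name;
quit;
%PUT First name in sorted order: &FIRSTNAME.;
```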
If you want to use more than one row or all available rows when creating macro variables, there are two ways to
do that.

Assign multiple values as a list to a single macro variable. This uses the SEPARATED BY
clause and is described here.
Assign values to a range of macro variables. This requires macro variable names with
numeric suffixes and is described next.

Write the SEPARATED BY clause as an option after a macro variable name in an INTO clause. The
SEPARATED BY clause serves two purposes. It tells SAS to assign the values from multiple rows as a list in a
single macro variable. It also tells SAS what character string to use as the delimiter that separates two values.
Usually the delimiter is a comma, a space, or both, so the clause is one of these:
separated by ','
separated by ' '
separated by ', '

The following example demonstrates the SEPARATED BY clause. It uses the OUTOBS= option to limit the
number of rows used in the result. The STRIP and QUOTE functions quote each value to create a quoted list of
values.
proc sql noprint outobs=6;
select quote(strip(name))
into :namelist separated by ', '
from chemical.element;
quit;
%PUT &NAMELIST.;

"Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon"

By default, SAS removes leading and trailing spaces when creating macro variables with a list of values. Add the
NOTRIM option if there is a reason to keep leading and trailing spaces. The NOTRIM option can be used with
either the SEPARATED BY clause described above or a list of macro variables, described next.
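A quoted, comma-separated list like the NAMELIST value shown above can be placed directly inside the parentheses of an IN operator in a later query. This is only a sketch of one common use; it repeats the earlier query and assumes the same CHEMICAL.ELEMENT table.

```sas
proc sql noprint outobs=6;
  select quote(strip(name))
    into :namelist separated by ', '
    from chemical.element;
quit;

proc sql;
  /* The macro variable supplies the quoted values for the IN operator. */
  select *
    from chemical.element
    where name in (&NAMELIST.);
quit;
```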

Creating a List of Macro Variables


An alternate approach, when you have a list of values, is to generate a corresponding list of macro variables. Instead
of writing a single macro variable name, indicate a list of macro variable names with numeric suffixes.
Write an open-ended list by writing a dash (the hyphen character) after a macro variable name that ends with a
numeral (usually 1). Consider this example:
into :font1-

This generates the macro variables FONT1, FONT2, FONT3, and so on, as many as it takes to hold the values that
result from the query.
If the result set has multiple columns, provide a list of macro variables for each column. The example below
selects three columns, which it places into three corresponding macro variable lists.
select key, rank, id
into :key1-, :rank1-, :id1-

Generating an enormous number of macro variables might not be what you intend, but that is what might happen
if the result set has more rows than you expect. To limit the number of generated macro variables, write a final
macro variable after the dash. Use the same root, but write a numeric suffix you choose as a stopping point. For
example, to limit the number of generated macro variables to 160, write:
into :font1-:font160

The number of generated macro variables is still no greater than the number of rows generated by the query. So in
this example, if the query produces 20 values, it generates only the macro variables FONT1 through FONT20. But if
the query produces 2,000 values, only the first 160 are saved in macro variables.
The dash symbol in the INTO clause may instead be written as the word THROUGH or THRU.
Becoming Open-Ended
The open-ended macro variable list was introduced in SAS 9.3. In previous releases you
must indicate the end of the range for the macro variable list in the INTO clause.
However, you can provide a large number as the final numeric suffix.

The SQLOBS macro variable mentioned earlier in the chapter can be especially useful in this context.
Immediately after the SELECT statement executes, it tells you how many rows were available in the result set for
generating macro variables. This lets you determine how many macro variables were created.
The SQLOBS macro variable changes with every SQL statement that executes, so if you will be using its value,
immediately assign it to another macro variable, in a macro statement such as this one:
%LET FONTCOUNT = &SQLOBS.;

This new macro variable is then available for use as a loop control, perhaps in a macro %DO statement such as:
%DO I = 1 %TO &FONTCOUNT.;
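The pieces above can be put together as in the following sketch. The table WORK.FONTS, the column FONTNAME, and the macro name SHOWFONTS are hypothetical names chosen for illustration, and the open-ended list assumes SAS 9.3 or later. The %DO statement is valid only inside a macro definition, so the loop is wrapped in a macro.

```sas
proc sql noprint;
  /* WORK.FONTS and FONTNAME are hypothetical names. */
  select fontname
    into :font1-
    from work.fonts;
quit;
/* Save the row count before another SQL statement changes SQLOBS. */
%LET FONTCOUNT = &SQLOBS.;

%MACRO SHOWFONTS;
  %LOCAL I;
  %DO I = 1 %TO &FONTCOUNT.;
    /* &&FONT&I. resolves first to &FONT1, &FONT2, ... and then to the value. */
    %PUT Font &I.: &&FONT&I.;
  %END;
%MEND SHOWFONTS;
%SHOWFONTS
```

If the query returns no rows, FONTCOUNT is 0 and the loop does not execute.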
Appendix 1
The SAS Data Model

Files as SAS Objects


A SAS program usually does not refer to files directly by the physical file names that the file system uses. Instead,
the program first creates SAS names as internal identifiers for the files. This indirect approach allows programs to
move from one computer to another (or data, from one storage volume to another) with only minimal changes in the
code.
SAS treats files as belonging to two main categories: text files and SAS files. Binary files are treated as text files,
as are certain kinds of devices that might be available on a computer. A few parameter files that SAS uses are text
files, and SAS program files can be declared as text files. The category of SAS files includes all of SAS’s special
types of files.
A SAS identifier for a text file is a fileref. Define a fileref in a FILENAME statement. This example defines the
fileref ORIGINAL:
filename original 'original.txt';

The new fileref ORIGINAL is associated with a file whose physical name is original.txt. After the
FILENAME statement executes, you can use the fileref ORIGINAL to refer to that file in the SAS program, or
elsewhere in the SAS environment. If necessary, you can use another FILENAME statement later to clear or
redefine the fileref. Filerefs can be defined in the same way for directories and for devices such as printers and email
servers.
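For illustration, this sketch defines the fileref from the example above and later clears it with the CLEAR option of the FILENAME statement.

```sas
/* Associate the fileref ORIGINAL with a physical file. */
filename original 'original.txt';

/* ... steps that read or write the file by way of the fileref ... */

/* Remove the association when the fileref is no longer needed. */
filename original clear;
```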
When you define a fileref, it does not matter whether the physical file exists. In the case of a new output text file,
SAS creates a file to go with the fileref.
The special SAS identifier may be a good idea for a text file, but it is required for a SAS file. SAS accesses SAS
files in groups, collections called libraries (or SAS data libraries). In most operating systems, a library is a directory.
Before you can use a library, you must define a libref. You can do this in a LIBNAME statement, which works
much like the FILENAME statement. This example defines the libref MAIN for the SAS library contained in a
particular directory:
libname main '/projects/main/data';

When you create a new library, create its directory before you assign the libref. Use the appropriate operating
system command or utility program to create the directory. After the libref is assigned, you can use that short name
whenever you need to refer to the library in the SAS program or elsewhere in the SAS environment.
Notice that the libref is not physically or permanently attached to the library. Nothing prevents you from using
multiple librefs to refer to the same library, or from using the same libref later to refer to a different library. You
might assign one libref to a library in one program, then assign another libref to the same library in a different
program. For example, you can create a library with the libref NEW, then access it later using the libref OLD.
It is possible for a libref to span multiple directories. First define a libref for each directory, then concatenate
them using this form of the LIBNAME statement:
libname multi ( dir1 dir2 dir3 dir4 );
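A fuller sketch of the same idea, with two directories instead of four, might look like this. The directory paths are hypothetical; only the form of the statements matters.

```sas
/* Define one libref for each directory (paths are hypothetical). */
libname dir1 '/projects/main/data';
libname dir2 '/projects/archive/data';

/* Concatenate the librefs so that MULTI spans both directories. */
libname multi (dir1 dir2);
```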

SAS files are referred to as members of the library they are stored in, and they are usually referred to with
two-level names. The libref is the first level of the name, and the SAS file’s member name is the second level. For
example, MAIN.ACTION is a SAS file with the member name ACTION, stored in the SAS library associated with
the libref MAIN.

Member Types
There are several types of SAS files, identified by their different member types. The most important member types
are DATA, VIEW, CATALOG, and ITEMSTOR.

SAS Data File


The member type DATA identifies a SAS data file, a file created by SAS software that stores data in
a format that is often referred to as a table. It contains not just data values, but also identifying
information that lets you use the data very easily in a SAS program without having to know all the
details about the file format.
A SAS data file is a kind of SAS data set, which means that it is organized in a way that makes it
readily accessible in the SAS environment. As a SAS data set is conceptually organized, it has
columns of data, called variables, and it can have any number of rows of data, called observations.
In addition to the values of the variables, it includes identifying information about each variable,
such as the name of the variable. Procedures, the large-scale routines that do the most interesting
work in SAS, are designed to work only with data that is organized as a SAS data set.
View
A view also organizes data, but it is usually data that is physically stored somewhere else, which
might be a text file, a database, or one or more SAS data files. A view is also a kind of SAS data set.
It provides a means to organize data from various sources in a way that SAS routines can use,
without having to physically rewrite the data as a SAS data file.
Catalog
A catalog is a SAS file for storing various kinds of objects and resources that are used in the SAS
environment. The objects and resources that are stored in a catalog are called entries. Macros and
formats are two examples of objects used in a SAS program that are stored as entries in a catalog.
Item Store
An item store, with the member type ITEMSTOR, is much like a catalog, but with a hierarchical
structure. Its objects can contain other objects.

It is possible for SAS files of different types in the same library to have the same member name. However,
because both views and SAS data files are used as SAS data sets, you cannot have VIEW and DATA members with
the same member name. Most SAS statements and options that use SAS files use only SAS data sets or only one
particular member type, so it is not necessary to state the member type to identify the correct SAS file. But in utility
procedures that work with all types of SAS files, it may be necessary to use the MEMTYPE= or MTYPE= option to
identify the member type of a SAS file in order to uniquely identify the file.
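As one possible illustration, the DATASETS procedure accepts the MEMTYPE= option in the PROC statement. This sketch, which assumes the libref MAIN is already defined, lists only the catalogs in the library.

```sas
/* Write a directory listing of the catalog members of MAIN to the log. */
proc datasets library=main memtype=catalog;
quit;
```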

Catalogs and Entries


A catalog is essentially only a container for the entries it holds. When you identify a catalog entry, it is usually
necessary to provide the two-level name of the catalog combined with the entry name and the entry type. This
results in a four-level name for the entry, which has the form
libref.catalog.entry.type
where libref.catalog is the two-level name of the catalog, consisting of the libref and member name of the catalog;
entry is the entry name; and type is the entry type, which indicates what kind of object the entry is. For example, the
name
MAIN.DIGITAL.FRONT.REPT

identifies the entry FRONT.REPT, that is, an entry whose name is FRONT and whose entry type is REPT, in the
catalog MAIN.DIGITAL.
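To see which entries a catalog holds, one option is the CONTENTS statement of the CATALOG procedure. The sketch below assumes that the catalog MAIN.DIGITAL from the example exists.

```sas
/* List the name and type of each entry in the catalog. */
proc catalog catalog=main.digital;
  contents;
run;
quit;
```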
Often it works to provide only part of the name of an entry. When a specific catalog is being considered, it is
sufficient to provide only the last two levels of the name. When only a certain entry type will do, it may be enough
to provide the first three levels of the name. Or, in a situation where a specific catalog and a specific entry type are
understood, you can sometimes use only the one-level entry name. There are more than 100 entry types, but most
are objects with highly specialized uses, such as user interface objects and components of graphics. Below is a list of
some commonly used entry types.
CATAMS     Text data created by a program
FORMAT     Numeric format
FORMATC    Character format
INFMT      Numeric informat
INFMTC     Character informat
LOG        Text saved from the SAS log
MACRO      Macro (compiled)
MSYMTAB    Values of local macro variables
OUTPUT     Listing output
PROFILE    User settings and preferences
REPT       Report definition for the interactive windows of the REPORT procedure
SOURCE     Source code, usually a SAS program
TITLE      Title or footnote line
TRANTAB    Translation table

These catalog entry types are user interface components:


CBT
CLASS
FORMULA
FRAME
HELP
KEYS
LIST
MENU
PMENU
PROGRAM
RANGE
RESOURCE
SCL
SCREEN
WSAVE

Steps and SAS Data Sets


In a SAS program, each step executes almost like a separate program. Other programming languages have global
variables shared by all the routines or units that make up the program. One part of the program can assign a value to
a global variable in order to communicate a state or a result to another part of the program. There is nothing like that
to hold a SAS program together. When a step ends, all its variables disappear from memory. Steps have no direct
way to interact or communicate with each other.
The connections between steps are found in the objects, especially SAS data sets, that are created in one step to
be used again later in another step. The SAS data set is designed to convey tables of data from one step to another.
SAS data sets are the glue that holds a SAS program together.
The SAS language makes it easy to work with SAS data sets. You can create or use a SAS data set just by
indicating its name. To use SAS data sets in a SAS program is so easy that you might forget that they are files, but
they are files, with all the costs and considerations that files entail.

Variables
It is not just data values that flow from one step to another in a SAS program. Each data value in a SAS data set is
identified as a variable, almost exactly the same way that data values in the program are identified as variables.
When a SAS data set is created in a data step, the variables of the data step become the variables of the SAS data set
it creates. Usually, when a procedure creates a SAS data set in a proc step, it works the same way; the variables of
the proc step become the variables of the new SAS data set. The reverse of this process is also true. When a SAS
data set is read in a data step or proc step, the variables of the SAS data set are used as variables in the step.
In this way, variables flow from one step to another in a SAS program — not directly, but by way of a SAS data
set. In some programs, variables you create in the initial data step may appear in all the subsequent steps of the
program, arriving there via one or more SAS data sets. You usually have to define a variable only once, regardless
of how many places you use it. This quality of SAS programs makes them easier to maintain. If you subsequently
need to add a variable to a program, it is a minor change; you do not have to rewrite the entire program.
As variables transfer between steps and SAS data sets, so do most of their attributes, including each variable’s
name, its data type and length, and its associated informat, format, and label. These attributes are stored in the SAS
data set so that any steps that read the SAS data set can use them.

Data Set Options


Data set options are options for the way a SAS data set is accessed. To use data set options, list them in parentheses
after the SAS data set name. Write each option as the option name, an equals sign, and the option value. The values
of a few data set options themselves contain equals signs. Values for these options (the INDEX=, WHERE=, and
RENAME= options) have to be enclosed in parentheses. If there are multiple options, separate them with spaces.
SAS data set (option=value option=value . . . )

You can use data set options in most places where you write a SAS data set name in a SAS program, including
data step statements that mention SAS data sets, other than the OUTPUT statement, and in options in the proc step
that identify input and output SAS data sets.
Many of the things that can be done with data set options, such as selecting observations and renaming variables,
could also be done in a separate data step or proc step. However, the data set options use fewer computer resources
than are used in a separate step.
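The following sketch shows two data set options on the input SAS data set of a proc step. It assumes the table MAIN.RESEARCH with the columns STUDY and THRESHOLD used in an earlier example; the new name TRS is chosen for illustration.

```sas
/* WHERE= selects observations and RENAME= renames a variable */
/* as the data is read, so no separate step is needed.        */
proc print data=main.research (where=(study = '101931') rename=(threshold=trs));
run;
```

Both option values are enclosed in parentheses because they contain equals signs of their own.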
Data set options can be used with most kinds of SAS data sets, but not with SQL views. If you indicate data set
options for an SQL view, SAS writes a warning message and ignores the options.

Observations and the Observation Loop


A SAS data set could contain just one value for each variable, but more often, SAS data sets contain many values for
their variables, which are organized into observations and stored as records in the file. Each observation in a SAS
data set has a value for each variable. A SAS data set can have any number of observations. The observations in a
SAS data set have a certain sequence, which is the order in which they appear when you print the SAS data set with
the PRINT procedure or access it in some other way. The order of observations, however, does not necessarily
correspond to the order in which they are stored.
When a step connects with a SAS data set, variables in the step essentially are the variables in the SAS data set.
The connection between the observations in a SAS data set and the execution of a step is not quite so easy to
describe. The process of a step that reads or writes a SAS data set contains a sequence of actions that creates or
processes an observation. Those actions are repeated in what is called an observation loop. By repeating the actions
of the observation loop, once for each observation, the step eventually gets through all of the observations. Each
observation in a SAS data set corresponds to one repetition of the observation loop in the step that created the SAS
data set. There was one specific repetition that created the observation. Later, each observation corresponds to one
repetition of the observation loop in a step that reads the SAS data set.
Often, a SAS data set is created in a data step in which an INPUT statement creates the values for the variables.
Each time the INPUT statement executes, it reads one record from the input text data file and creates the values of
variables for one observation in the SAS data set. The INPUT statement has to execute one time for each record it
reads and each observation it creates. The data step’s observation loop is what makes that happen.
The observation loop continues to repeat as long as there is input data. It stops when it reaches the end of the
input file for the step, whether it is an input text data file or an input SAS data set.
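A minimal data step makes the observation loop visible. This is a sketch that assumes the text file behind the fileref ORIGINAL holds a name and a number on each line, separated by spaces; the data set and variable names are hypothetical.

```sas
filename original 'original.txt';

data work.readings;
  infile original;
  /* Each execution of INPUT reads one record. */
  input name $ value;
  /* _N_ counts the repetitions of the observation loop. */
  putlog 'Repetition ' _n_ 'read ' name=;
run;
```

The step contains no explicit loop. SAS repeats the statements once for each record and stops when the INPUT statement reaches the end of the file.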

Variables and Attributes


SAS variables are used in programs; the actions of a program are largely reflected in the changes of values of
variables. Variables are also stored in SAS data sets. A SAS variable is much more than just a value; there are
various other properties, or attributes, that a variable has, whether in the execution of a data step or proc step or
stored in a SAS data set.
The important attributes of a SAS variable are its name, position, type, length, informat, format, and label.
Variables have these same attributes whether they are being used in a program or stored in a file.

Name
A variable’s name is a word that identifies the variable in program statements, log notes, and output.
The name attribute may be capitalized in a specific way, and this capitalization is used when the
variable is displayed in a data file or in output.

Position
Variables in a SAS data set are considered to be stored in a specific sequence, and this is also true of
variables as they are used in a step. The position may be indicated in various ways. The variable
number in a SAS data set is a simple index number for the variable’s place in the list; the “position”
of a variable may refer instead to the abstract byte offset of the variable in an imaginary block
containing all the variables of the step or the SAS data set. Either way, it is only the relative position
of variables compared to each other that matters.
The position of variables corresponds to the sequence in which procedures come upon the variables
when they ask a SAS data set for its variables. Often variables are processed in this sequence.
Consider the PRINT procedure as an example. The VAR statement determines the sequence of
variables in this procedure’s output. If there is no VAR statement, then the procedure adds variables
to the output table in order of position.
The position attribute has a similar effect on the selection of variables in abbreviated variable lists.
Abbreviated variable lists such as _ALL_, _NUMERIC_, and STARTVAR--STOPVAR represent lists of
variables, and these variables are selected in order of position. It is the same in SQL when the wild
card symbol * causes the procedure to look for all available variables in a SAS data set; this is
essentially the same action that results from the abbreviated variable list _ALL_.
Data Type
SAS’s two data types are character and numeric, providing two ways of organizing data values. In
the LENGTH statement and throughout SAS, the dollar sign ($) identifies the character data type.
The data types are described in the next section.
Length
The length of a variable is the number of bytes used to store it in a SAS data set. The length of a
character variable indicates the number of characters the variable can hold. A numeric variable is
ordinarily 8 bytes in length. When a numeric variable is stored in a SAS data set, using a shorter
length reduces the precision of the values and saves storage space.

Informat
Informats are routines for converting text to data values. A variable’s informat attribute indicates the
informat to use when reading new values for the variable in some situations, such as list input, or
when editing the SAS data set in the interactive environment. The informat attribute may contain
width and decimal arguments for the informat.
The informat must belong to the same data type as the variable. The informat attribute can be left
blank, indicating to use the default informat.
Format
Formats are routines for converting data values to text. A variable’s format attribute is used when
the variable is printed or displayed. The format attribute works much the same way as the informat
attribute. It can include width and decimal arguments; the type of the format must match the type of
the variable; and the format attribute can be left blank to use the default format.
Label
The label attribute contains up to 256 characters of text. It can be used as a label for display, or as a
description. Procedures and applications can use the label of a variable instead of the name to
identify the variable in display or print output. This depends on the LABEL system option; turn this
option off to see variables’ names only.
Most programmers leave the label attribute blank most of the time. When a variable’s label is blank,
SAS provides the name instead for use as a label, and procedures and applications use the variable
name to identify the variable.
Transcode
The transcode attribute applies to character variables only, and it is not often needed. It allows or
disallows a change in character encoding when data is moved between environments that use
different encodings.

Custom attributes
Starting with SAS 9.4, a program can define its own attributes for variables. These are stored with
the variable and processed in the same way as the built-in attributes.

A variable’s attributes are determined in the step where the variable is created. Most attributes can be changed at
any step along the way.
In a data step, a LENGTH statement at the beginning of the step can set the name, data type, length, and position
for all variables. If variables are not declared in this way, SAS determines data type, along with length of character
variables, based on the values used with the variables at their first appearance.
The FORMAT, INFORMAT, and LABEL statements set those respective attributes. All available attributes can
be set for a single variable in an ATTRIB statement. In a proc step, these statements are used only for variables that
come from the input SAS data set.
These are examples of statements to set attributes:
length name city $ 32 affinity mention 8;

defines NAME and CITY as character variables with length 32 and AFFINITY and MENTION as
numeric variables with length 8.
format startdate enddate yymmdd10. price comma11.2;

sets one format attribute for the variables STARTDATE and ENDDATE and another for PRICE.
label mention='Number of Mentions' city='Location' startdate='Start' enddate='End';

provides a label for each variable mentioned.


attrib ranking label='Ranking' format=comma6. informat=comma. length=4;

sets several attributes for the variable RANKING.


attrib city transcode=no;

sets the transcode attribute of the variable CITY.

The name of a variable can be changed in the RENAME= data set option. All attributes except the data type and
the length of a character variable can be changed using the MODIFY statement and related statements in the
DATASETS procedure, or in the ALTER TABLE statement in the SQL procedure.
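Both approaches are sketched below. The table MAIN.RESEARCH and its columns come from an earlier example, while the label text and the new name TRS are invented for illustration.

```sas
/* Change attributes in place with the DATASETS procedure. */
proc datasets library=main nolist;
  modify research;
    label protocol='Protocol Code';
    rename threshold=trs;
quit;

/* Change an attribute with the ALTER TABLE statement in SQL. */
proc sql;
  alter table main.research
    modify model label='Model Name';
quit;
```

Neither approach rewrites the data values; only the descriptive information stored in the SAS data set changes.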
The CONTENTS procedure provides a report that includes complete information on the attributes of variables in
a SAS data set. Indicate the SAS data set in the DATA= option:
proc contents data=SAS data set;
run;
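As a concrete case, this sketch reports on the table MAIN.RESEARCH from an earlier example. The VARNUM option lists the variables in order of position rather than alphabetically by name.

```sas
proc contents data=main.research varnum;
run;
```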

Data Types
SAS’s two data types, character and numeric, are two ways of organizing data values. The numeric data type is
based on a particular way of representing numbers in a digital form. The character data type is based on the idea that
each byte can represent a symbol you can see.
Numeric values in SAS are 64-bit floating-point values. That means a numeric value can represent almost any
number you might use, with a high degree of accuracy. Floating-point values are not meant to represent numbers
exactly, but they are precise enough for most purposes. The precision of numbers can be measured in significant
digits; SAS numeric values have about 15 significant digits.
The character data type is based on the idea that computer data is something you can see and read. Each byte of a
character value represents a character — a letter or other familiar symbol. The length of a character value is the
number of bytes it takes up, and it is also the number of characters it can contain.
The length of a character variable can be set to any value from 1 to 32,767 characters. In practice, a character
variable longer than about 120 characters probably should not be stored in a SAS data set with other variables
because of the space it takes up — the variable’s length multiplied by the number of observations.
If a character variable is not as long as the value assigned to it, SAS truncates, keeping as much of the
value as it can. Depending on how the variable is used, that can interfere with the logic of a program and lead to
incorrect results.
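A small sketch shows the truncation. The variable name CODE is hypothetical.

```sas
data _null_;
  length code $ 4;
  /* The value has 8 characters, but CODE can hold only 4. */
  code = 'ABCDEFGH';
  /* Writes code=ABCD in the log. */
  putlog code=;
run;
```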
If a character variable is used for binary data, the $HEX format may be useful to display the value in a
meaningful way. Similarly, you may write character hexadecimal constants, described below, for constant values
used with the variable.
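The following sketch, with a hypothetical variable FLAGS, assigns a two-byte binary value with a character hexadecimal constant and displays it with the $HEX format.

```sas
data _null_;
  length flags $ 2;
  /* Character hexadecimal constant: four hex digits make two bytes. */
  flags = '0A1F'x;
  /* $HEX4. shows each byte as two hexadecimal digits. */
  putlog flags= $hex4.;
run;
```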
Constants
A data value that is written directly in a program is a constant. Different kinds of constants are used at different
points in a SAS program.
These are the more common types of constants in SAS:

Whole Number
Many places in SAS statements require a whole number, one of the numbers 0, 1, 2, 3, . . . . This
number must be written as a sequence of digits, with no other symbols allowed.
These are examples of places where whole numbers are required:

the width or decimal argument of an informat or format


the location or distance of a pointer control
values of options that indicate a count or magnitude

Digital Size
Digital size constants are a special form of a whole number value where the suffixes K, M, G, and T
can be added to indicate multiples of powers of 1024. These are used especially for system options
and data set options, wherever an option relates to size considerations. For example, OBS=1G is
equivalent to OBS=1073741824.
Numeric
Write numeric constants in computer-style notation, without any commas between digits. Use
decimal points and decimal places to indicate fractional values. Write a minus sign - at the
beginning to indicate a negative value. These are examples of standard numeric constants:
37
-25
0.025
-.64
10287.455834

Scientific Notation
Very large and very small numbers can be easier to write in scientific notation. Scientific notation
begins with a number in standard numeric notation and appends the letter E and an integer. This
integer represents a power of 10. The number is multiplied by 10 raised to the indicated power. So,
for example, 2.5E5 is 2.5 times 100000, or 250000. The process works the same way with negative
exponents. For example, 2.06E-8 is 2.06 times .00000001, or .0000000206.
The scientific notation form allows for some variability of notation. You can write the letter D
instead of E. When the exponent is positive, you can write a positive sign between the letter and the
exponent.
Hexadecimal
Hexadecimal notation is a base 16 system of writing integers. It is often used for numeric codes and
sizes in computer systems because of the way it translates so easily to binary. Each hexadecimal
digit corresponds to four binary digits. Two hexadecimal digits can represent one byte.
Hexadecimal uses the digits 0–9 followed by A–F (or a–f) to represent 10–15. Values of 16 or more
are written with more than one digit. The last digit in a hexadecimal numeral represents the actual
value of the digit. Preceding digits represent successively higher powers of 16. So, for example, the
hexadecimal numeral 2BC can be interpreted as (2 × 16 + 11) × 16 + 12, or 700.
Only whole numbers can be written as hexadecimal constants. Write the hexadecimal numeral
followed by the letter X (or x). So that SAS will recognize it as a number and not as a word, a
hexadecimal constant must begin with one of the digits 0–9. If a hexadecimal numeral begins with
one of the digits A–F, write a leading 0 before it. These are examples of hexadecimal constants, along
with the same value in standard notation:

2BCX is 700
0FX is 15
400X is 1024
0A0X is 160
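To check these values for yourself, you could assign the constants in a data step and write them to the log, as in this sketch. The variable names are arbitrary, and CHECK repeats the positional arithmetic shown earlier for 2BC.

```sas
data _null_;
  a = 2BCx;    /* 700 */
  b = 0Fx;     /* 15; the leading 0 keeps SAS from reading FX as a name */
  c = 400x;    /* 1024 */
  d = 0A0x;    /* 160 */
  check = (2*16 + 11)*16 + 12;   /* positional arithmetic for 2BC */
  putlog a= b= c= d= check=;
run;
```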

SAS Date, SAS Time, SAS Datetime
Date, time, and datetime constants are used to represent time measurements. The specific numeric
meaning of these values is described below in “Values That Measure Time.” These are
examples of valid constant values:
'31dec1999'd
'07:30't
'23:59:59't
'72:00't
'31dec1999 00:00'dt
'31dec1999 12:00 am'dt
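Because these constants are ordinary numbers, one way to see what they mean is to print each value both raw and formatted. This is only a sketch; the variable names and the choice of formats are assumptions made for the illustration.

```sas
data _null_;
  d  = '31dec1999'd;          /* days counted from the beginning of 1960 */
  t  = '07:30't;              /* seconds since midnight: 7.5 x 3600 = 27000 */
  dt = '31dec1999 00:00'dt;   /* seconds counted from the beginning of 1960 */
  putlog d= t= dt=;                             /* raw numeric values */
  putlog d= date9. t= time8. dt= datetime18.;   /* the same values, formatted */
run;
```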

Character
A simple quoted string represents a character constant. The characters of the quoted string are the
characters of the character value. These are examples of the use of character constants in SAS
statements:
IF NAME = '' THEN NAME = 'Anonymous';
TITLE1 "Trope Report";
FILENAME IN "/data/in/latest.txt";
putlog 'Beginning processing.';

In writing a quoted string in SAS:

Use single or double quotes.
Write the quote character twice if it occurs within the data value.
Write two quote characters with nothing between to indicate the null string.
It is possible for a quoted string to extend across multiple lines.
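The quoting rules above can be tried out in a short data step such as this one. The names and text values are made up for the example.

```sas
data _null_;
  length name $ 20 heading $ 30 nothing $ 1;
  name    = 'O''Brien';               /* doubled single quote: O'Brien */
  heading = "The ""Trope"" Report";   /* doubled double quote: The "Trope" Report */
  alt     = "O'Brien";                /* or switch quote style and skip the doubling */
  nothing = '';                       /* null string, stored as a blank */
  same    = (name = alt);             /* 1: both spellings give the same value */
  putlog name= heading= same= nothing=;
run;
```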

Character constants are used in expressions in data step programming and in various other ways
throughout the SAS environment. A character constant in a SAS statement might contain:

the value of a system option or statement options
an initial value of a character variable
a title or footnote line in a TITLE or FOOTNOTE statement
the label of a variable, SAS file, or entry
text to be written in a PUT statement
the physical name of a file
a command to execute in the X statement

Character hexadecimal
For control characters and binary data that can’t be written in a standard character constant, there are
character hexadecimal constants. Write each byte as two hexadecimal digits. Write a quoted string
and follow it with the letter X (or x).
Three character hexadecimal constants you can expect to see sooner or later are '09'X, the ASCII tab
character, used in tab-delimited data files; '00'X, the null character, used as a string terminator in
some data, and used to indicate a blank label; and '1A'X, the “Control-Z” character, sometimes used
as an end-of-line or end-of-file indicator.
To write a valid character hexadecimal constant, write an even number of hexadecimal digits. For
clarity, you can write commas between bytes, such as '4040,60606060'X.
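As a rough sketch of the tab character in use, the following step builds a tab-delimited line and then takes it apart again. It assumes an ASCII-based session, and the field names are hypothetical.

```sas
data _null_;
  length line $ 40;
  tab    = '09'x;                              /* ASCII tab, one byte */
  line   = catx(tab, 'id', 'name', 'amount');  /* tab-delimited header line */
  tabs   = countc(line, '09'x);                /* number of tab characters in the line */
  second = scan(line, 2, '09'x);               /* second tab-delimited field */
  putlog tabs= second=;
run;
```

The same constant is what you would typically supply as a delimiter when reading a tab-delimited file.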

Missing Values
Missing values serve as placeholders in those situations when no value is available, particularly for numeric
variables.
The period symbol represents a missing value in SAS. Write a missing constant as a period. This is an example
of assigning a missing value to a variable:
response = .;

Support for missing values is found throughout SAS.

Numeric informats read a period or a blank field as a missing value.
Numeric formats write a missing value as a period, or you can select a different character
with the MISSING= system option. For example, change the option to MISSING=' ' to leave
fields blank when a value is missing.
Statistical computations disregard or exclude missing values.
Financial functions may compute a value to replace the missing argument.
Comparisons and sorting treat missing values as less than any number.
In logical expressions, a missing value is treated as false.

Missing values can also appear in places where they don’t have any particular meaning. When missing values
are used with arithmetic operators or most functions, the result of the expression is a missing value, and SAS may
consider this a data error.
SAS generates missing values as initial values for variables and in any situation where no value is available. It
uses a blank value as a missing value for character variables.
Use the MISSING function to test for missing values. This function works with either a numeric argument or a
character argument. This example checks the value of EXPENSES and replaces a missing value with zero:
if missing(expenses) then expenses = 0;
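Several of the behaviors described in this section can be seen side by side in a sketch like the following. The variable names other than EXPENSES are invented for the example, and the log is expected to include a note that missing values were generated.

```sas
data _null_;
  expenses = .;
  plus  = expenses + 100;       /* arithmetic operator: the result is missing */
  total = sum(expenses, 100);   /* SUM function disregards the missing argument: 100 */
  lower = (expenses < -1E9);    /* missing compares less than any number: 1 */
  if not expenses then putlog 'A missing value is treated as false.';
  if missing(expenses) then expenses = 0;
  putlog plus= total= lower= expenses=;
run;
```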

Special Missing Values
You might need to distinguish among various reasons why a value is unavailable, and this is the purpose of special
missing values. A special missing value constant is a period followed by a letter or underscore, resulting in 27
different special missing values, in addition to the standard missing value.
Special missing values may be assigned to variables in assignment statements. To read special missing values in
input data, declare the characters (uppercase or lowercase letters or underscore) that indicate missing values in the
MISSING statement, for example:
MISSING N n;
This allows you to read input data such as the following:

4.0 N 11.2 5.8
n 1.0 9.9 N

Numeric formats write special missing values as single characters, either a capital letter or an underscore.
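Putting the MISSING statement and the sample input together, a complete program might look like this sketch. The table name WORK.READINGS and the variable names V1 through V4 are hypothetical.

```sas
/* Declare N and n as characters that indicate a special missing value. */
missing N n;

data work.readings;
  input v1 v2 v3 v4;
  datalines;
4.0 N 11.2 5.8
n 1.0 9.9 N
;
run;

/* The special missing values are written as the single character N. */
proc print data=work.readings;
run;
```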
In comparisons, different missing values are not equal. Missing values compare in the following order, from
lowest to highest:
._
.
.A
.B
.C
. . .
.X
.Y
.Z

All missing values compare less than all numbers.
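If you want to confirm the ordering, a few comparisons in a data step are enough. This is a minimal sketch with made-up variable names; each comparison produces 1 for true or 0 for false.

```sas
data _null_;
  /* ._ is lowest, then ., then .A through .Z, then any number */
  order_ok = (._ < .) and (. < .A) and (.A < .Z) and (.Z < -1E300);
  same     = (.A = .B);   /* different missing values are not equal: 0 */
  putlog order_ok= same=;
run;
```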

Boolean Values
To make it possible to do arithmetic on logical results, mathematicians write true as the number 1 and false as the
number 0, and SAS follows this convention. When 1 and 0 indicate true and false, they are Boolean values, and
subsequent computations may be Boolean algebra.
Comparison operators such as > produce Boolean values as their results. For example, if VOLUME is 6, then
VOLUME > 2 results in the value 1, indicating true.

The result of a comparison can be assigned to a variable to create a Boolean variable. Boolean variables,
sometimes called dummy variables, are important in statistical models, making it possible for the models to
encompass logical conditions as independent variables.
Any numeric value may be used with a logical operator or in a control flow statement where a logical value is
expected. In addition to 0, all missing values are treated as false. All other values are treated as true.
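The following sketch shows Boolean results being stored, added together, and tested. VOLUME comes from the example above; the other names are invented.

```sas
data _null_;
  volume   = 6;
  is_large = (volume > 2);                                  /* 1, indicating true */
  n_true   = (volume > 2) + (volume > 5) + (volume > 10);   /* 1 + 1 + 0 = 2 */
  empty    = .;
  if volume then putlog 'A nonzero, nonmissing value is treated as true.';
  if not empty then putlog 'A missing value is treated as false.';
  putlog is_large= n_true=;
run;
```

Adding comparison results in this way is a simple way to count how many conditions are true.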

Values That Measure Time
SAS measures time using ordinary numeric values. It has particular support for two time units, seconds and days,
and three specific kinds of measurements.

A SAS time value is the time of day measured in seconds since midnight, or any
measurement in seconds of elapsed time or a time offset.
A SAS datetime value is the date and time measured in seconds since the beginning of
1960.
A SAS date value is the date counted in days, also starting with 0 at the beginning of 1960.

SAS informats, functions, and formats process time data using these definitions.
For example, the DATE informat can read a date written as 27AUG2002 and interpret that text as a SAS date
value (resulting in the number 15,579). The DATE format can write the resulting SAS date value as the same text, or
you can use other formats, such as the YYMMDD format, to write it in other conventional ways. Various functions
allow you to extract the year, month, and day from the SAS date value and to work with it in other ways.
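As a sketch of that round trip, this step reads the text with the DATE informat, writes the value back with two formats, and extracts the parts with functions. The variable names are arbitrary.

```sas
data _null_;
  d   = input('27AUG2002', date9.);   /* DATE informat: text to SAS date value */
  y   = year(d);
  m   = month(d);
  dom = day(d);
  putlog d=;             /* the raw day count, 15579 */
  putlog d= date9.;      /* written as 27AUG2002 */
  putlog d= yymmdd10.;   /* written as 2002-08-27 */
  putlog y= m= dom=;
run;
```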
SAS time, datetime, and date values can be written as constants. Write a SAS time constant as hours, hours and
minutes, or hours, minutes, and seconds, separated by colons. Enclose the entire value in quotes, and write the letter
T immediately after the closing quote. This is the syntax of a SAS time constant:
'hh:mm'T
'hh:mm:ss'T
'hh:mm:ss.ssss'T

A SAS time constant can also be written with a 12-hour clock, using the letters AM or PM at the end of the
string to indicate the half of the day.
If you are using a SAS time constant to indicate duration, the hour part might be more than two digits. A time
offset could be positive or negative, so you can write a negative sign at the beginning of the quoted string.
To write a SAS date constant, write the day of the month, the three-letter abbreviation for the month, and the
year, quoted, followed by the letter D.
'ddMONyyyy'D

The SAS datetime constant combines the notation of the SAS date constant and the SAS time constant. Write a
space or colon between the date and the time of day.
'ddMONyyyy hh:mm:ss.ssss'DT

Years are limited to the range from 1582 to 19999. Two-digit years, values from 0 to 99, are mapped to a span
according to the setting of the YEARCUTOFF= system option. The option indicates the first year of the 100-year
span that can be represented with two-digit year numbers. For example, with YEARCUTOFF=1926, two-digit years belong
to the century from 1926 to 2025; when you write the year 26, it means 1926, and when you write the year 25, it
means 2025. Depending on the width argument, a date format may write a year with only two digits, simply writing the
last two digits of the year.
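The YEARCUTOFF=1926 example can be tested with date constants that use two-digit years, as in this sketch. The variable names are made up, and the OPTIONS statement changes the setting for the rest of the session.

```sas
options yearcutoff=1926;

data _null_;
  span_start = '01jan26'd;   /* 26 is the first year of the span: 1926 */
  span_end   = '31dec25'd;   /* 25 is the last year of the span: 2025 */
  y_start = year(span_start);
  y_end   = year(span_end);
  putlog y_start= y_end=;
  /* A width of 7 writes a two-digit year; a width of 9 writes four digits. */
  putlog span_start= date7. span_start= date9.;
run;
```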
SAS time and SAS datetime values are simple linear measurements, so you can use them in ordinary time
computations that use seconds as the unit of measurement. For example, if EVENT_TS is a SAS datetime value,
these statements compute SAS datetime values 15 seconds earlier and 6 hours later:
earlier_15s = event_ts - 15;
later_6h = event_ts + '6:00't;

Similarly, compute with SAS date values by adding and subtracting days. If DATE_TODAY is today’s date,
then DATE_TODAY - 1 is yesterday’s date.
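A slightly fuller sketch of time arithmetic follows. It assumes a specific value for EVENT_TS so that the results can be checked by hand; the remaining variable names are invented. The DATEPART function converts from the seconds of a SAS datetime value to the days of a SAS date value.

```sas
data _null_;
  event_ts    = '31dec1999 18:00:00'dt;
  earlier_15s = event_ts - 15;
  later_6h    = event_ts + '6:00't;       /* '6:00't is 21600 seconds */
  event_date  = datepart(later_6h);       /* seconds to days: the date of the later datetime */
  day_before  = event_date - 1;           /* date arithmetic is in days */
  elapsed     = later_6h - earlier_15s;   /* a difference in seconds: 21615 */
  putlog earlier_15s= datetime19. later_6h= datetime19.;
  putlog event_date= date9. day_before= date9. elapsed= time8.;
run;
```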
Appendix 2
SQL Reserved Words
SQL standards identify specific words used in SQL statements as reserved words. These keywords and names are
excluded from use as identifiers; to comply with the standards, you should not use these words as names for
columns, tables, or other user-created objects. SQL implementations vary in the way they enforce these restrictions,
and SAS has only a minimum of restrictions on names. If you are looking to write portable SQL, though, you will
want to avoid creating objects that use any of these reserved words as names.
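As an illustration of choosing portable names, this sketch creates a table whose name and column names steer clear of the reserved words DATE, USER, and ORDER by using longer, more specific names. The table WORK.ORDER_LOG and its columns are hypothetical.

```sas
proc sql;
  /* ORDER_DATE and ORDER_USER stand in for DATE and USER,
     and ORDER_LOG stands in for ORDER. */
  create table work.order_log
    (order_id   num,
     order_date num format=date9.,
     order_user char(8));
quit;
```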
The following list includes words identified as reserved in one or more SQL standards along with ones marked
as reserved by SAS for any dialect of SQL. Specific SAS restrictions are noted for the words that SAS restricts in
SAS SQL.
ABORT
ABS
ABSOLUTE
ACCESS
ACTION
ADD
AFTER
AGGREGATE
ALL
ALLOCATE
ALTER
ANALYSE
ANALYZE
AND
ANY
ARE
ARRAY
ARRAY_AGG
ARRAY_MAX_CARDINALITY
AS (SAS restriction: table alias)
ASC
ASENSITIVE
ASSERTION
ASSIGNMENT
ASYMMETRIC
ASYNC
AT
ATOMIC
AUTHORIZATION
AVG

BACKWARD
BEFORE
BEGIN
BEGIN_FRAME
BEGIN_PARTITION
BETWEEN
BIGINT
BINARY
BIT
BIT_LENGTH
BLOB
BOOLEAN
BOTH
BY

CACHE
CALCULATED
CALL
CALLED
CARDINALITY
CASCADE
CASCADED
CASE (SAS restriction: column name)
CAST
CATALOG
CEIL
CEILING
CHAIN
CHAR
CHARACTER
CHARACTERISTICS
CHARACTER_LENGTH
CHAR_LENGTH
CHECK
CHECKPOINT
CLASS
CLOB
CLOSE
COALESCE
COLLATE
COLLATION
COLLECT
COLUMN
COMMENT
COMMIT
COMMITTED
CONDITION
CONNECT
CONNECTION
CONSTRAINT
CONSTRAINTS
CONTAINS
CONTENTS
CONTINUE
CONVERSION
CONVERT
COPY
CORR
CORRESPONDING
COUNT
COVAR_POP
COVAR_SAMP
CREATE
CREATEDB
CREATEUSER
CROSS (SAS restriction: table alias)
CUBE
CUME_DIST
CURRENT
CURRENT_CATALOG
CURRENT_DATE
CURRENT_DEFAULT_TRANSFORM_GROUP
CURRENT_PATH
CURRENT_ROLE
CURRENT_ROW
CURRENT_SCHEMA
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TRANSFORM_GROUP_FOR_TYPE
CURRENT_USER
CURSOR
CYCLE

DATABASE
DATALINK
DATE
DAY
DEALLOCATE
DEC
DECIMAL
DECLARE
DEFAULT
DEFAULTS
DEFERRABLE
DEFERRED
DEFINER
DELETE
DELIMITER
DELIMITERS
DENSE_RANK
DEREF
DESC
DESCRIBE
DESCRIPTOR
DETERMINISTIC
DIAGNOSTICS
DICTIONARY
DISCONNECT
DISTINCT
DLNEWCOPY
DLPREVIOUSCOPY
DLURLCOMPLETE
DLURLCOMPLETEONLY
DLURLCOMPLETEWRITE
DLURLPATH
DLURLPATHONLY
DLURLPATHWRITE
DLURLSCHEME
DLURLSERVER
DLVALUE
DO
DOMAIN
DOUBLE
DROP
DYNAMIC

EACH
ELEMENT
ELSE
ENCODING
ENCRYPTED
END
END_FRAME
END_PARTITION
ENGNAME
ENGOPT
EQ
EQUALS
ESCAPE
EVERY
EXCEPT (SAS restriction: table alias)
EXCEPTION
EXCLUDING
EXCLUSIVE
EXEC
EXECUTE
EXISTS
EXP
EXPLAIN
EXTERNAL
EXTRACT

FALSE
FETCH
FILTER
FIRST
FIRST_VALUE
FLOAT
FLOOR
FOR
FORCE
FOREIGN
FORMAT
FORWARD
FOUND
FRAME_ROW
FREE
FREEZE
FROM (SAS restriction: table alias)
FULL (SAS restriction: table alias)
FUNCTION
FUSION

GE
GET
GLOBAL
GO
GOTO
GRANT
GROUP (SAS restriction: table alias)
GROUPING
GROUPS
GT

HANDLER
HAVING (SAS restriction: table alias)
HOLD
HOUR

IDENTITY
ILIKE
IMMEDIATE
IMMUTABLE
IMPLICIT
IMPORT
IN
INCLUDING
INCREMENT
INDEX
INDEXES
INDICATOR
INFORMAT
INHERITS
INITIALLY
INNER (SAS restriction: table alias)
INOUT
INPUT
INSENSITIVE
INSERT
INSTEAD
INT
INTEGER
INTERSECT (SAS restriction: table alias)
INTERSECTION
INTERVAL
INTO
INVOKER
IS
ISNULL
ISOLATION

JOIN (SAS restriction: table alias)

KEY

LABEL
LAG
LANCOMPILER
LANGUAGE
LARGE
LAST
LAST_VALUE
LATERAL
LE
LEAD
LEADING
LEFT (SAS restriction: table alias)
LEVEL
LIBREF
LIKE
LIKE_REGEX
LIMIT
LISTEN
LN
LOAD
LOCAL
LOCALTIME
LOCALTIMESTAMP
LOCATION
LOCK
LOWER
LT

MATCH
MAX
MAXVALUE
MAX_CARDINALITY
MEMBER
MERGE
METHOD
MIN
MINUTE
MINVALUE
MISSING
MOD
MODE
MODIFIES
MODIFY
MODULE
MONTH
MOVE
MULTISET

NAMES
NATIONAL
NATURAL (SAS restriction: table alias)
NCHAR
NCLOB
NE
NEW
NEXT
NO
NOCREATEDB
NOCREATEUSER
NONE
NORMALIZE
NOT
NOTHING
NOTIFY
NOTIN
NOTNULL
NTH_VALUE
NTILE
NULL
NULLIF
NUM
NUMERIC

OCCURRENCES_REGEX
OCTET_LENGTH
OF
OFF
OFFSET
OIDS
OLD
ON (SAS restriction: table alias)
ONLY
OPEN
OPERATION
OPERATOR
OPTION
OR
ORDER (SAS restriction: table alias)
OUT
OUTER (SAS restriction: table alias)
OUTPUT
OVER
OVERLAPS
OVERLAY
OWNER

PAD
PARAMETER
PARTIAL
PARTITION
PASSWORD
PATH
PENDANT
PERCENT
PERCENTILE_CONT
PERCENTILE_DISC
PERCENT_RANK
PERIOD
PLACING
PORTION
POSITION
POSITION_REGEX
POWER
PRECEDES
PRECISION
PREPARE
PRESERVE
PRIMARY
PRIOR
PRIVILEGES
PROCEDURAL
PROCEDURE
PUBLIC

RANGE
RANK
READ
READS
REAL
RECHECK
RECURSIVE
REF
REFERENCES
REFERENCING
REGR_AVGX
REGR_AVGY
REGR_COUNT
REGR_INTERCEPT
REGR_R2
REGR_SLOPE
REGR_SXX
REGR_SXY
REGR_SYY
REINDEX
RELATIVE
RELEASE
RENAME
REPLACE
RESET
RESTART
RESTRICT
RESULT
RETURN
RETURNS
REVOKE
RIGHT (SAS restriction: table alias)
ROLLBACK
ROLLUP
ROW
ROWS
ROW_NUMBER
RULE

SAVEPOINT
SCHEMA
SCOPE
SCROLL
SEARCH
SECOND
SECTION
SECURITY
SELECT
SENSITIVE
SEQUENCE
SERIALIZABLE
SESSION
SESSION_USER
SET
SETOF
SHARE
SHOW
SIMILAR
SIMPLE
SIZE
SMALLINT
SOME
SPACE
SPECIFIC
SPECIFICTYPE
SQL
SQLCODE
SQLERROR
SQLEXCEPTION
SQLSTATE
SQLWARNING
SQRT
STABLE
START
STATEMENT
STATIC
STATISTICS
STDDEV_POP
STDDEV_SAMP
STDIN
STDOUT
STORAGE
STRICT
SUBMULTISET
SUBSTRING
SUBSTRING_REGEX
SUCCEEDS
SUM
SYMMETRIC
SYSTEM
SYSTEM_TIME
SYSTEM_USER

TABLE
TABLESAMPLE
TEMP
TEMPLATE
TEMPORARY
THEN
TIME
TIMESTAMP
TIMEZONE_HOUR
TIMEZONE_MINUTE
TO
TOAST
TRAILING
TRANSACTION
TRANSLATE
TRANSLATE_REGEX
TRANSLATION
TREAT
TRIGGER
TRIM
TRIM_ARRAY
TRUE
TRUNCATE
TRUSTED
TYPE

UESCAPE
UNENCRYPTED
UNION (SAS restriction: table alias)
UNIQUE
UNKNOWN
UNLISTEN
UNNEST
UNTIL
UPDATE
UPPER
USAGE
USER (SAS restriction: column name)
USING

VACUUM
VALID
VALIDATE
VALIDATOR
VALUE
VALUES
VALUE_OF
VARBINARY
VARCHAR
VARYING
VAR_POP
VAR_SAMP
VERBOSE
VERSION
VERSIONING
VIEW
VOLATILE

WHEN (SAS restriction: table alias)
WHENEVER
WHERE (SAS restriction: table alias)
WIDTH_BUCKET
WINDOW
WITH
WITHIN
WITHOUT
WORK
WRITE

XML
XMLAGG
XMLATTRIBUTES
XMLBINARY
XMLCAST
XMLCOMMENT
XMLCONCAT
XMLDOCUMENT
XMLELEMENT
XMLEXISTS
XMLFOREST
XMLITERATE
XMLNAMESPACES
XMLPARSE
XMLPI
XMLQUERY
XMLSERIALIZE
XMLTABLE
XMLTEXT
XMLVALIDATE

YEAR
YES

ZONE
Index
_TEMA 2
%LET statement 10
%PUT statement 10, 10
$ format and informat 5
$BASE64X format and informat 5
$CHAR format and informat 5
$F format and informat 5
$HEX format and informat 5
$UPCASE format and informat 5
ABS function 3
ACCESS=READONLY 8
alias 2, 2, 4
ALTER TABLE 7
ADD CONSTRAINT 7
MODIFY 7, 8
ANY and NOT functions 3
arithmetic 3
AS 2
ATTRIB statement 2, A1
autoexec 5, 7
automatic macro variables 10
as option 9
AVG 4
B8601 formats and informats 5
BEST format and informat 5
BETWEEN-AND operator 3
binary files A1
BINARY format and informat 5
BLOB 8
Boolean values A1
BTRIM function 3
BUFFERSIZE= option 9
CALCULATED 4
CASE 2, 3
CAST 8
CAT function 3
catalogs A1
entries A1
CATS function 3, 10
CATT function 3
CATX function 3, 10
CEIL function 3
CEILZ function 3
CENTER option 5, 5
CHAR function 3
character data type 7, A1
CHECK rule 7
CLOB 8
CNTLIN= option 5
CNTLLEV= option 7
COALESCE function 3, 6
columns 1, 2, 2, 7
adding 7
alias 4
all 2
attributes 2, 7
CONTENTS procedure 7
definitions 7
expression 2, 3
subquery 6
label 7
length 2, 3, 3, 3, 7, 8, 8
modifiers 7
order 2
pass-through 8
removing 7
table alias as prefix 6, 6
unnamed 2
COMMA format and informat 5
comments 1
COMPBL function 3
COMPRESS function 3
COMPRESS= option 2, 7, 8
concatenation 10
CONNECT statement 8, 8
USING 8
connection alias 8
CONNECTION TO 8, 8
constants 1, 2, 3, 3, A1
from system clock 10
SAS vs. DBMS 8
time A1
CONSTDATETIME option 9
constraints see integrity constraints
CONTAINS operator 3
CONTENTS procedure 7, 8, A1
control data set 5
CORRESPONDING 6
COUNT 4, 4, 4, 4, 6
CREATE INDEX 7
CREATE TABLE 2, 5, 6, 7, 7, 10
CONNECTION TO 8, 8
database library engine 8
CREATE VIEW 2, 6, 7, 10
CROSS JOIN 6, 6
CSS 4
CSV 5
CV 4
data 1
data set options 2, 3, 7, A1
data types 3, 7, A1
syntax 7
database library engine 8, 8
functions and formats 8
DATASETS procedure 5, 7, 8, A1
DATE format and informat 5
DATE function 3
DATE option 5
DATEPART function 3
DATETIME format and informat 5
DATETIME function 3
DAY function 3
DELETE statement 7, 10
views 7
DEQUOTE function 3
DESCRIBE TABLE 7, 7
integrity constraints 7
DESCRIBE VIEW 7
devices A1
DHMS function 3
DICTIONARY 7, 7
directories A1
DISCONNECT statement 8
DISTINCT 2, 4, 6
integrity constraint 7
DOLLAR format and informat 5
DQUOTE option 9, 9, 10
DROP statement 7
DTRESET option 5
E format and informat 5
E8601 formats and informats 5, 8
entries
names A1
types A1
environment variables 3
EPUB 5
ERRORABEND option 9
ERRORSTOP option 9
EXCEPT 6
EXEC option 9
EXECUTE statement 8, 8
EXP function 3
EXTENDOBSCOUNTER= option 7
external identifiers 6
F format and informat 5
FEEDBACK 9
FILENAME statement A1
filerefs A1
files A1
FIND function 3
FINDC function 3
FIRST function 3
FIRSTOBS= option 7, 9
FLOOR function 3
FLOORZ function 3
FMTERR option 5
footnote lines 5
FOREIGN KEY 7
FORMAT procedure 5
FORMAT statement A1
formats 9
arguments 5
attribute 7, A1
creating macro variables 10, 10
database library engine 8
names 5
SQL pass-through 8, 8
storing 5
value formats 5
FREQ 4, 4
frequency table 4
FROM 2, 2, 2, 4, 6
FULL OUTER JOIN 6, 6
FULLSTIMER option 9
functions 3
aggregate 4
concatenation 3
database library engine 8
environment 3
numeric 3
SAS vs. DBMS 8
scalar and aggregate 4
selection 3
text search 3
time 3
FUZZ function 3
global statements 5
GROUP BY 4, 4
indexes for 7
table joins 6
groups 4
hash joins 9
HAVING 4, 4
HEX format and informat 5
HOUR function 3
HTML 3, 5, 5
HTMLDECODE function 3
HTMLENCODE function 3
ID codes, converting 8
IDXNAME= option 7
IDXWHERE= option 7
IFC function 3
IFN function 3
implicit SQL pass-through 8
IN operator 3
subquery 6
in-database processing 8
options 9
indexes 7
CONTENTS procedure 7
creating 7
deleting 7
options 7
used in integrity constraints 7
INFORMAT statement A1
informats 5
attribute 7, A1
INNER JOIN 6
INOBS= option 2, 9
INPUT function 3, 3
INSERT statement 7, 10
database library engine 8
views 7
INT function 3
INTCK function 3
integrity constraints 7
CONTENTS procedure 7
deleting 7
internal vs. external identifiers 6
INTERSECT 6
INTNX function 3
INTO 8, 10
SEPARATED BY 10
INTZ function 3
IPONEATTEMPT option 9
IS MISSING operator 3
IS NULL operator 3, 3
item stores A1
JOIN 6
key
see integrity constraints; indexes; table joins; SORTED BY; GROUP BY; PRIMARY KEY; FOREIGN KEY
label attribute 7
LABEL option 2, 5, 5
LABEL statement A1
LABEL= option 7
labels 5
blank 5
large objects 8
LARGEST function 3
LEFT function 3
LEFT OUTER JOIN 6, 6
length attribute 7
LENGTH statement A1
LIBNAME statement 7, 8, 8, A1
in view 7
libraries 7, A1
concatenation A1
predefined 7
LIBRARY library 5
librefs 7, A1
clearing 7
defining 7
listing 7
LIKE operator 3
Listing destination 2, 5, 5, 5
lists 1
LN function 3
LOCK statement 7
log 1, 10
file 1
SQL options 9
LOG function 3
LOOPS= option 6, 9, 10
LOWCASE function 3
LOWER function 3
macro language 10
macro processor 10
macro variables 3, 10
automatic 10
from queries 10
in queries 10
INTO 8
names 10
reference 10
MAX 4, 4
MEAN 4
member types A1
members 7, 7
metadata 1
MIN 4, 4
MINUTE function 3
missing 1, 5, 8
INPUT function 3
MISSING statement A1
missing values A1
special A1
MISSING= option 5
MMDDYY format and informat 5
MOD function 3
MODZ function 3
MONTH format 5
MONTH function 3
N 4, 4
names 1, 2, 2, 7
for format 5
literals 8
NATURAL 6
NMISS 4, 4
NOT functions see ANY and NOT functions
NOT NULL rule 7
NOTRIM 10
NOWARNRECURS option 9
null 1, 4, 5, 8
CASE 3
comparison 3
INPUT function 3
integrity constraint 7
statistics 4
null character 5
NUMBER option 5
numeric data type 7, A1
OBS= option 7, 9
observation loop A1
observations 1, A1
OCTAL format and informat 5
ODS 2, 5
destinations 5
paragraphs 5
statement 5
style 5, 5
table template 5
text lines 5
ON 6
subqueries 6
operators 3
character 3, 3
comparison 3, 3, 3
logical 3
numeric 3
priority 3
OPTIONS statement 5, 5
ORDER BY 2, 4, 4
indexes for 7
not allowed in views 7
OUTER UNION 6
OUTOBS= option 6, 9
Output Delivery System 5
PAGENO= option 5
pages 5
password 8
PDF 5
percents 4
POWER function 3
PRIMARY KEY 7
PRINT option 9, 10
procedures 7
program files 1
PROPCASE function 3
PRT 4
PUT function 3, 3
database library engine 8
optimization 9
QTR function 3
queries 2, 9
as views 7
creating macro variables 10
database 8
execution sequence 4
macro variables in 10
QUIT statement 1
QUOTE function 3, 10
quoted strings 1
RANGE 4
recoding 8
REDUCEPUT option 9
REDUCEPUTOBS= option 9
REDUCEPUTVALUES= option 9
relative complement 6
REMERGE option 9
remerging 4, 4, 9
remote SQL pass-through 8
RENAME= option 2, 7, 8
REPLACE= option 7, 9
reserved words 2, 6, A2
RESET statement 9
result operators 6
result set 2
combining 6
resummarizing 6
REVERSE function 3
RIGHT OUTER JOIN 6
ROUND function 3
routines 0
rows 1, 2
adding 7
deleting 7
moving 7
order 2, 2, 4, 4
RSUBMIT statement 8
RUN statement 1
SAS 1
and SQL 1, 1
names 1
program files 1, 9
syntax 1
terminology 1
SAS data files 1, 7, A1
conflict with views A1
SAS data sets 1, A1
and steps A1, A1
SAS date value 1, 3, 8, A1
SAS datetime value 1, 3, 8, A1
SAS files A1
SAS time value 1, 3, 8
SAS time values A1
SAS_PUT function 9
SAS/ACCESS 8
SAS/CONNECT 8
SASHELP library 7
SASUSER library 7
SCAN function 3
scrolling 5
SECOND function 3
SELECT 2, 2, 2, 4, 5, 5, 5, 6, 6, 10
expressions 3
self joins 6, 6
SEPARATED BY 10
set operators 6
SIGN function 3
SMALLEST function 3
SORTEQUALS option 9
SORTMSG option 9
sounds-like operator 3
special missing values A1
SQL 1
and SAS 1
options 9
pass-through 8, 8
avoiding confusion 8
explicit vs. implicit 8
remote 8
return codes 10
phases of execution 9
procedure 1, 1, 2, 9
return codes 10
terminology 1
SQLCONSTDATETIME option 9
SQLEXITCODE macro variable 10
SQLGENERATION= option 9
SQLIPONEATTEMPT option 9
SQLMAPPUTTO 9
SQLOBS macro variable 10, 10
SQLOOPS macro variable 10
SQLREDUCEPUT option 9
SQLREDUCEPUTOBS option 9
SQLREDUCEPUTVALUES option 9
SQLREMERGE option 9
SQLUNDOPOLICY option 9
SQLXMSG macro variable 10
SQLXRC macro variable 10
SQRT function 3
statements 1
statistics 4
STD 4
STDERR 4
steps 1, 1, A1, A1
STIMER option 9, 9
STOPONTRUNC option 9
STRIP function 3, 10
SUBPAD function 3
subqueries 6
set operators 6
SUBSTR function 3, 8
SUBSTRING function 3
substrings 3
SUBSTRN function 3, 3
SUM 4
summary data 4, 9
combining 6
table joins 6
summary statistics 4
SUMWGT 4
SYMEXIST function 3
SYMGET function 3
SYS_SQLSETLIMIT 9
SYSDATE9 macro variable 10
SYSENCODING macro variable 10
SYSERR macro variable 10
SYSEXIST function 3
SYSGET function 3
SYSJOBID macro variable 10
SYSLAST macro variable 10
SYSPARM 3
system options 9
SYSTIME macro variable 10
SYSUSERID macro variable 10
T statistic 4
table joins 6
Cartesian 6
GROUP BY 6
ID columns 6, 6, 6, 7
indexes for 7
operators 6
self joins 6
subqueries 6
three or more tables 6
vs. set operators 6
tables 1, 1, 2, 2, 7
adding columns 7
alias 2, 6, 6, 6
creating 2, 7
deleting 7
locking 7
names 7, 7, 7
removing columns 7
replacing 7, 7, 9
temporary 7
updating 7
vs. subqueries 6
text files A1
time 1, 8, 9, A1, A1
arithmetic A1
automatic macro variables 10
functions 3
TIME function 3
TIMEPART function 3
title lines 5
formatting options 5
transcode attribute 7, 8
TRIM function 3
TRIMMED 10
TRIMN function 3, 3
UBUFSIZE= option 9
UNDO_POLICY= option 9
UNION 6
UNION ALL 6
summary data 6
UNION JOIN 6
UNIQUE see DISTINCT
UPCASE function 3
UPDATE statement 7
views 7
UPPER function 3
USER 2
USING LIBNAME 7
USS 4
value formats 5, 8
VAR statistic 4
variables 1, A1
attributes A1
views 2, 7, A1
conflicts with tables 7, A1
database library engine 8
deleting 7
inline 6
WEEKDAY function 3
WHERE 2, 2, 3, 4
DELETE statement 7
indexes for 7
table join 6
UPDATE statement 7
vs. HAVING 4
WORK library 7
XML 3, 5, 8
YEAR format 5
YEAR function 3
YEARCUTOFF= option 9, A1
YRDIF function 3
YYMMD format 5
YYMMDD format and informat 5
Z format 5
