SAS Base SAS 9.4 Procedures Guide 7e (2017)
SAS® Documentation
June 13, 2024
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. Base SAS® 9.4 Procedures Guide, Seventh Edition. Cary,
NC: SAS Institute Inc.
Base SAS® 9.4 Procedures Guide, Seventh Edition
Copyright © 2017, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-63526-021-2 (Paperback)
ISBN 978-1-62960-818-1 (PDF)
All Rights Reserved. Produced in the United States of America.
For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire
this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal
and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of
copyrighted materials. Your support of others' rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at
private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software
by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR
227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights
as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other
notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only
those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
June 2024
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and
other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
9.4-P11:proc
Contents
PART 1 Concepts 1
Appendix 3 / Raw Data and DATA Steps for Base SAS Procedures . . . . . . . . . . . . . . . . . . . . . . . . . 2745
Overview of Raw Data and DATA Steps for Base SAS Procedures . . . . . . . . . . . . . . 2746
CARSURVEY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2747
CENSUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2748
CHARITY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2749
CONTROL Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2751
CUSTOMER_RESPONSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2782
DJIA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2784
EDUCATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2785
EMPDATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2786
ENERGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2788
EXP Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2789
EXPREV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2790
GROC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2792
MATCH_11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2793
PROCLIB.DELAY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2795
PROCLIB.EMP95 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2796
PROCLIB.EMP96 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2797
PROCLIB.INTERNAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2798
PROCLIB.LAKES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2798
PROCLIB.MARCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2799
PROCLIB.PAYLIST2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2800
PROCLIB.PAYROLL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2801
PROCLIB.PAYROLL2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2804
PROCLIB.SCHEDULE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2804
PROCLIB.STAFF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2807
PROCLIB.STAFF2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2810
PROCLIB.SUPERV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2811
RADIO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2811
SALES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2824
Syntax Conventions for the SAS Language
- style conventions
- special characters
Syntax Components
The components of the syntax for most language elements include a keyword and
arguments. For some language elements, only a keyword is necessary. For other
language elements, the keyword is followed by an equal sign (=). The syntax for
arguments takes several forms to show how multiple arguments are written, with
and without punctuation.
keyword
specifies the name of the SAS language element that you use when you write
your program. The keyword is a literal that is usually the first word in the syntax. In a
CALL routine, the first two words are keywords.
ALTER (alter-password)
BEST w.
REMOVE <data-set-name>
In this example, the first two words of the CALL routine are the keywords:
CALL RANBIN(seed, n, p, x)
DO;
... SAS code ...
END;
Some system options require that one of two keyword values be specified:
DUPLEX | NODUPLEX
In this example, string and position follow the keyword CHAR. These arguments
are required arguments for the CHAR function:
CHAR (string, position)
Each argument has a value. In this example of SAS code, the argument string has
a value of 'summer', and the argument position has a value of 4:
x=char('summer', 4);
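The same call can be run as a complete DATA step. This is a minimal sketch; the data set name work.char_demo is a hypothetical choice. Because 'summer' is spelled s, u, m, m, e, r, position 4 holds the letter m.

```sas
/* Minimal sketch: CHAR(string, position) returns the character at position. */
data work.char_demo;            /* hypothetical output data set name */
   length x $ 1;                /* CHAR returns a single character   */
   x = char('summer', 4);       /* 4th character of 'summer' is 'm'  */
   put x=;                      /* writes x=m to the SAS log         */
run;
```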
In this example, string and substring are required arguments, whereas modifiers
and startpos are optional:
FIND (string, substring <, modifiers> <, startpos>)
MISSING character(s);
<LITERAL_ARGUMENT> argument-1 <<LITERAL_ARGUMENT> argument-2 ... >
specifies that one argument is required and that a literal argument can be
associated with the argument. You can specify multiple literals and argument
pairs. No punctuation is required between the literal and argument pairs. The
ellipsis (...) indicates that additional literals and arguments are allowed.
Style Conventions
The style conventions that are used in documenting SAS syntax include uppercase
bold, uppercase, and italic:
UPPERCASE BOLD
identifies SAS keywords such as the names of functions or statements. In this
example, the keyword ERROR is written in uppercase bold:
ERROR <message>;
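UPPERCASE
identifies arguments that are literals.
In this example of the CMPMODEL= system option, the literals include BOTH,
CATALOG, and XML:
CMPMODEL=BOTH | CATALOG | XML
italic
identifies user-supplied values, such as nonliteral arguments and
nonliteral values that are assigned to an argument.
In this example of the LINK statement, label is a user-supplied value:
LINK label;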
Special Characters
The syntax of SAS language elements can contain the following special characters:
=
an equal sign identifies a value for a literal in some language elements such as
system options.
In this example of the MAPS system option, the equal sign sets the value of
MAPS:
MAPS=location-of-maps
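<>
angle brackets identify optional arguments. A required argument is not enclosed
in angle brackets.
|
a vertical bar indicates that you can choose one value from a group of values.
In this example of the CMPMODEL= system option, you can choose only one of
the arguments:
CMPMODEL=BOTH | CATALOG | XML
...
an ellipsis indicates that the argument can be repeated.
In this example of the CAT function, multiple item arguments are allowed, and
they must be separated by a comma:
CAT (item-1 <, ..., item-n>)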
If you use a logical name, you typically have a choice of using a SAS statement
(LIBNAME or FILENAME) or the operating environment's control language to make
the reference. Several methods of referring to SAS libraries and external files are
available, and some of these methods depend on your operating environment.
In the examples that use external files, SAS documentation uses the italicized
phrase file-specification. In the examples that use SAS libraries, SAS
documentation uses the italicized phrase SAS-library enclosed in quotation marks:
infile file-specification obs = 100;
libname libref 'SAS-library';
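A concrete version of these statements is sketched below. To keep it runnable anywhere, the library points at the WORK directory and the external file is a temporary file; the librefs mylib and rawdata and the two raw records are hypothetical. In practice you would quote your own physical paths instead.

```sas
/* Assign a libref and a fileref (hypothetical names). */
libname mylib "%sysfunc(pathname(work))";   /* libref for a SAS library     */
filename rawdata temp;                      /* fileref for an external file */

/* Write two raw records so that the example is self-contained. */
data _null_;
   file rawdata;
   put 'Alice 91';
   put 'Bob 78';
run;

/* Read at most 100 records from the external file into the SAS library. */
data mylib.scores;
   infile rawdata obs=100;
   input name $ score;
run;
```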
What's New in Base SAS 9.4 Procedures
Overview
The following procedures are new:
- PROC AUTHLIB
- PROC DELETE
- PROC DS2
- PROC DSTODS2
- PROC FEDSQL
- PROC FMTC2ITM
- PROC HDMD
- PROC JSON
- PROC LUA
- PROC PRESENV
- PROC PRODUCT_STATUS
- PROC S3
- PROC SCOREACCEL
- PROC SQOOP
- PROC STREAM
New Base SAS Procedures
The AUTHLIB Procedure
SAS 9.4M1 has a new option for the AUTHLIB procedure. By using the
REQUIRE_ENCRYPTION=YES option in the CREATE or MODIFY statements, an
administrator can require that all data sets in a metadata-bound library be
automatically encrypted when created. For more information, see “Requiring
Encryption for Metadata-Bound Data Sets” on page 134.
SAS 9.4M3 has the following changes and enhancements for the AUTHLIB
procedure:
- The PURGE statement removes any retained metadata-bound library
credentials older than a given date of replacement. For more information, see
“PURGE Statement” on page 148 and “Retaining and Purging Metadata-Bound
Library Credentials” on page 133.
- The MODIFY statement has a PURGE= option that automatically removes all
retained metadata-bound library credentials if all tables in the library are
successfully modified to the newer credentials. For more information, see
“PURGE=YES | NO” on page 146 and “Retaining and Purging Metadata-Bound
Library Credentials” on page 133.
The DS2 Procedure
SAS 9.4M1 renames the INDB= procedure option to DS2ACCEL=. The default value
of the option also changes from YES to NO, which prevents DS2 code from
executing in the database unless requested. INDB= is still supported as an alias.
SAS 9.4M1 adds support for SAP HANA as a data source, when appropriate
SAS/ACCESS software is installed.
SAS 9.4M2 has a new XCODE= procedure option that controls the behavior of the
SAS session when an NLS transcoding failure occurs. In addition, the SYSCC macro
variable now contains the current SAS condition code that is returned to your
operating environment. SAS 9.4M2 adds support for Hive and PostgreSQL as data
sources, when appropriate SAS/ACCESS software is installed.
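A minimal PROC DS2 program, followed by a check of SYSCC, might look like the sketch below. It reads SASHELP.CLASS (a sample data set shipped with SAS) and adds a computed column; the output table name work.class_bmi and the BMI formula are illustrative choices, not part of the procedure.

```sas
proc ds2;
   data work.class_bmi / overwrite=yes;   /* replace the table if it exists */
      dcl double bmi;                     /* new column declared in DS2     */
      method run();                       /* runs once per input row        */
         set sashelp.class;
         bmi = (weight / (height * height)) * 703;
      end;
   enddata;
run;
quit;

/* SYSCC holds the current condition code; 0 means no warnings or errors. */
%put &=syscc;
```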
SAS 9.4M3 adds a new LIBS= procedure option that enables you to override the
default data source connection string that is sent to the DS2 program, which includes
all active librefs. LIBS= restricts the data source connection to the specified
libref(s) and the Work library or User library (if a User library is defined). LIBS= can
facilitate connections when many librefs are active in the SAS session. SAS 9.4M3
adds support for HAWQ and Impala as data sources, when appropriate
SAS/ACCESS software is installed.
SAS 9.4M4 adds SAS Scalable Performance Data (SPD) Server tables as a data
source. PROC DS2 can access SPD Server 5.3 and later.
SAS 9.4M5 adds support for SAS Cloud Analytic Services (CAS) tables as a data
source. You must have SAS Viya 3.3 software in addition to SAS 9.4M5 software,
and you must start a CAS session. You connect to the CAS session by specifying
the new SESSREF= or SESSUUID= procedure option. The SESSREF= procedure option
identifies the CAS session by its session name. The SESSUUID= procedure option
identifies the session by its universally unique identifier (UUID). You work with
data in memory. PROC DS2 provides no way to promote or persist data in CAS.
SAS 9.4M6 adds support for JDBC-compliant databases as data sources, when
appropriate SAS/ACCESS software is installed.
PROC DS2 is included in SAS Viya beginning with SAS Viya 3.1. In SAS Viya 3.1,
PROC DS2 supports SAS data sets and SAS Cloud Analytic Services (CAS) tables
as data sources. The DS2 functionality of SAS 9.4 is available for SAS data sets.
Most DS2 functionality is available in CAS, but not all of it. For information about
the differences, see SAS DS2 Programmer’s Guide. You must start a CAS session.
You connect to the CAS session by specifying the SESSREF= or SESSUUID=
procedure option. You must create tables or explicitly load tables in your CAS
session before you can manipulate them with PROC DS2. All work is in the default
caslib, unless you assign a caslib. For more information, see SAS Cloud Analytic
Services: User’s Guide. You work with CAS tables in memory. PROC DS2 cannot be
used to persist tables to the CAS server or to promote them to other CAS sessions.
In SAS Viya 3.2, PROC DS2 can automatically load tables into the CAS session
when you reference a caslib that specifies a SAS data connector.
SAS Viya 3.3 adds support for SPD Engine data sets and for all SAS 9.4 external
data sources, except SPD Server, as data sources in SAS Viya. The procedure
operates on the data sources on the SAS Compute Server by default. You can
process external data sources that have a data connector in CAS by assigning a
caslib and specifying the SESSREF= or SESSUUID= procedure option on the PROC
DS2 statement. Not all data sources have data connectors. See information about
available data connectors in SAS Cloud Analytic Services: User’s Guide.
Beginning in April 2019, the MongoDB and Salesforce non-relational databases are
supported as data sources for SAS 9.4M6. Access to both databases is read-only
and through a SAS library. Appropriate SAS/ACCESS software must be installed.
Beginning in August 2019, the Google BigQuery and Snowflake databases are
supported as data sources for SAS 9.4M6 and for SAS Viya 3.4. Read and write
access is supported from a SAS library and on the CAS server. Appropriate
SAS/ACCESS software must be installed.
Beginning in November 2019 on SAS 9.4M6 and with SAS Viya 3.5, write access is
available for MongoDB and Salesforce data sources. The write support is available
through a SAS library and on the CAS server. Appropriate SAS/ACCESS software
must be installed.
In addition, in SAS Viya 3.5, the VARBINARY data type is supported for DS2
programming in CAS.
Beginning with SAS 9.4M7, Spark and Yellowbrick are supported as data sources,
when appropriate SAS/ACCESS software is installed. Access is read and write
through a SAS library only.
SAS 9.4M5 adds support for the DSTODS2 procedure. This procedure enables you
to translate a subset of your SAS DATA step code into DS2 code. Then, if
necessary, you can revise your program to take advantage of DS2 features and
submit your program using PROC DS2. For more information, see Chapter 22,
“DSTODS2 Procedure,” on page 837.
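The translation workflow can be sketched as follows. This sketch assumes that the IN= and OUT= options of PROC DSTODS2 name the file that holds the DATA step code and the file that receives the DS2 code, and that filerefs are accepted for both; the filerefs dsin and ds2out and the small DATA step program are hypothetical.

```sas
/* Hypothetical filerefs; both point to temporary files. */
filename dsin temp;
filename ds2out temp;

/* Write a small DATA step program to the input file. */
data _null_;
   file dsin;
   put 'data work.class_ratio;';
   put '   set sashelp.class;';
   put '   ratio = weight / height;';
   put 'run;';
run;

/* Translate the DATA step code into DS2 code (assumed IN=/OUT= usage). */
proc dstods2 in=dsin out=ds2out;
run;

/* Echo the generated DS2 program to the log for review. */
data _null_;
   infile ds2out;
   input;
   put _infile_;
run;
```

Reviewing the generated program in the log is the point at which you would revise it to use DS2 features before submitting it with PROC DS2.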
The FEDSQL Procedure
SAS 9.4M1 adds support for SAP HANA as a data source, when appropriate
SAS/ACCESS software is installed.
SAS 9.4M2 adds the new XCODE= option that controls the behavior of the SAS
session when an NLS transcoding failure occurs. SAS 9.4M2 adds support for Hive
and PostgreSQL as data sources, when appropriate SAS/ACCESS software is
installed.
SAS 9.4M3 adds the following:
- A new LIBS= procedure option that enables you to override the default data
source connection string, which includes all active librefs. LIBS= restricts the
data source connection to the specified libref(s) and the Work library or User
library (if a User library is assigned). LIBS= can facilitate connections when
many librefs are active in the SAS session.
- Support for HAWQ and Impala as data sources, when appropriate SAS/ACCESS
software is installed.
- The behavior of the QUIT statement is documented.
SAS 9.4M4 adds SAS Scalable Performance Data (SPD) Server tables as a data
source. PROC FEDSQL can access SPD Server 5.3 and later. The documentation has
been enhanced to include an example that shows how to use a DS2 package
method as an expression.
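A minimal PROC FEDSQL session, ended with QUIT, might look like the sketch below. To avoid depending on how the SASHELP library is exposed to FedSQL, the sketch first copies SASHELP.CLASS into WORK with a DATA step; the table names work.class and work.tall and the height threshold are hypothetical.

```sas
/* Copy the sample data into WORK so that FedSQL reads a plain SAS data set. */
data work.class;
   set sashelp.class;
run;

proc fedsql;
   /* Create a subset table. */
   create table work.tall as
      select name, sex, height
      from work.class
      where height > 62;
   /* Query the new table; results go to the open output destination. */
   select count(*) as n_tall from work.tall;
quit;   /* QUIT ends the PROC FEDSQL session */
```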
SAS 9.4M6 adds support for JDBC-compliant databases as data sources, when
appropriate SAS/ACCESS software is installed.
PROC FEDSQL is included in SAS Viya applications beginning with SAS Viya 3.1. In
SAS Viya 3.1, PROC FEDSQL supports SAS data sets and SAS Cloud Analytic
Services (CAS) tables as data sources. The FedSQL functionality of SAS 9.4 is
available for SAS data sets. A subset of the functionality that FedSQL has in SAS
9.4 is available for CAS tables. For information about CAS functionality, see SAS
Viya: FedSQL Programming for SAS Cloud Analytic Services. You connect to a CAS
session by specifying the SESSREF= or SESSUUID= procedure option. The
SESSREF= option identifies the CAS session by its session name. The SESSUUID=
option identifies the session by its universally unique identifier (UUID). You must
explicitly load tables in your CAS session before you can submit FedSQL
statements. You work with the CAS tables in memory. PROC FEDSQL cannot be
used to persist tables to the CAS server or promote them to other CAS sessions.
SAS Viya 3.1 adds two procedure options. The new _METHOD procedure option
prints a text description of the FedSQL query plan for executing the specified
FedSQL statements. The new _POSTOPTPLAN procedure option prints an XML
tree illustrating the FedSQL query plan.
In SAS Viya 3.2, PROC FEDSQL can automatically load tables into CAS when you
reference a caslib that specifies a SAS data connector.
SAS Viya 3.3 adds support for SPD Engine data sets and for all SAS 9.4 external
data sources, except SPD Server, as data sources in SAS Viya. The procedure
operates on the data sources on the SAS Compute Server by default. You can
process external data sources that have a data connector in CAS by assigning a
caslib and specifying the SESSREF= or SESSUUID= procedure option on the PROC
FEDSQL statement. Not all data sources have data connectors. See information about
available data connectors in SAS Cloud Analytic Services: User’s Guide.
In addition, SAS Viya 3.3 adds the following new features in CAS:
- FedSQL implicit pass-through is available for SQL-based caslibs. In the previous
SAS Viya release, with SAS data connector software, FedSQL automatically
loaded data into CAS for processing. In SAS Viya 3.3, the FedSQL language
supports single-source, full query implicit pass-through in CAS. When a request
is accessing a single data source, an attempt is made to implicitly pass the full
query down to the data source for processing. If pass-through is not possible,
the data is loaded into CAS for processing. FedSQL output is always an in-memory
CAS table.
- PROC FEDSQL supports a new procedure option, CNTL=. CNTL= specifies
optional control parameters for the FedSQL query planner in CAS.
Beginning in April 2019, the MongoDB and Salesforce non-relational databases are
supported as data sources for SAS 9.4M6. Access to both databases is read-only
and through a SAS library. Appropriate SAS/ACCESS software must be installed.
Beginning in August 2019, the Google BigQuery and Snowflake databases are
supported as data sources for SAS 9.4M6 and for SAS Viya 3.4. Read and write
access is supported from a SAS library and on the CAS server. Appropriate
SAS/ACCESS software must be installed.
Beginning in November 2019 on SAS 9.4M6 and with SAS Viya 3.5, write access is
available for MongoDB and Salesforce data sources. The write support is available
from a SAS library and on the CAS server. Appropriate SAS/ACCESS software must
be installed.
- There are several CNTL= options for optimizing FedSQL performance on the
CAS server: DYNAMICCARDINALITY, OPTIMIZEVARBINARYPRECISION, and
Beginning with SAS 9.4M7, Spark and Yellowbrick are supported as data sources,
when appropriate SAS/ACCESS software is installed. Access is read and write
through a SAS library only.
The JSON procedure reads data from a SAS data set and writes it to an external file
in JSON representation. For more information, see Chapter 38, “JSON Procedure,”
on page 1357.
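A short PROC JSON step is sketched below. It writes SASHELP.CLASS to a temporary file; the fileref outjson is hypothetical. PRETTY indents the output, and NOSASTAGS on the EXPORT statement omits the SAS-specific wrapper tags so that mainly the data rows are written.

```sas
filename outjson temp;   /* hypothetical destination for the JSON text */

/* Export the data set in JSON representation. */
proc json out=outjson pretty;
   export sashelp.class / nosastags;
run;

/* Echo the JSON text to the log. */
data _null_;
   infile outjson;
   input;
   put _infile_;
run;
```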
In SAS 9.4M5, the following enhancements were made for PROC LUA:
- For customers running SAS Viya, PROC LUA enables you to call CAS actions.
- The LUA_PATH environment variable was added. Use this environment variable
to identify multiple locations for Lua scripts when one or more locations
contains a special character, such as a single quotation mark.
- The SAS.OPEN function accepts data set options, such as KEEP=, DROP=, or
WHERE=.
- The SAS.PUT_VALUE function replaces the SAS.PUT function from previous
releases. The SAS.PUT_VALUE function requires you to identify a variable by its
name only (and does not accept its position in the data set).
- Support for several functions from the TABLE library has been added. These
functions are TABLE.CONCAT, TABLE.INSERT, TABLE.REMOVE, and
TABLE.SORT.
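Several of these additions can be combined in one short program. This is a sketch that assumes the PROC LUA interface functions sas.open, sas.next, sas.get_value, and sas.close behave as described in the PROC LUA documentation, with sas.next returning a false value after the last row; table.insert, table.sort, and table.concat are standard Lua table functions.

```sas
proc lua;
submit;
   -- WHERE= is passed to sas.open as a data set option (SAS 9.4M5 and later)
   local dsid = sas.open("sashelp.class(where=(age > 14))")
   local names = {}
   -- assumed: sas.next advances one row and returns a false value at the end
   while sas.next(dsid) do
      table.insert(names, sas.get_value(dsid, "name"))
   end
   sas.close(dsid)
   table.sort(names)                   -- alphabetical order
   print(table.concat(names, ", "))    -- one comma-separated line in the log
endsubmit;
run;
```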
In SAS Viya 3.3, support was added for the VARCHAR data type.
In SAS Viya 3.4, the following changes and enhancements were made:
- The Lua constant math.huge is now represented in SAS as the value
1.7976931348623E308. In previous releases, SAS represented this value as nil.
- Support has been added for the following string-manipulation functions. These
functions are SAS extensions to the Lua language and can be used only within
PROC LUA:
  - STRING.ENDS_WITH
  - STRING.RESOLVE
  - STRING.SPLIT
  - STRING.STARTS_WITH
  - STRING.TRIM
In SAS 9.4M6, information about the scope of variables and other objects that are
defined for PROC LUA was added to the documentation. For more information, see
“Scope for PROC LUA” on page 1410.
Note: PROC PRODUCT_STATUS is deprecated for SAS Viya 3.5 and will not be
available in future SAS Viya releases. PROC PRODUCT_STATUS is available to SAS
9.4 users.
The S3 Procedure
SAS 9.4M4 adds support for the S3 procedure. The S3 procedure enables you to
perform object management for objects in Amazon S3. These objects include
buckets, files, and directories. For more information, see Chapter 60, “S3
Procedure,” on page 2259.
In SAS 9.4M6, support was added for encryption when working with the Amazon S3
or Amazon Redshift environment. This support includes the new ENCKEY
statement that enables you to register encryption keys. There are also new options
available with the COPY, GET, GETDIR, INFO, PUT, and PUTDIR statements that
enable encryption. For more information, see “Using Server-Side Encryption with
AWS Data” on page 2264.
In SAS 9.4M7 and SAS Viya 3.5, support was added for the OUT= option in the LIST
statement. This option enables you to save LIST output to an external file.
Also in SAS 9.4M7, support was added for the ROLENAME= and ROLEARN=
options in the PROC S3 statement.
In the May 2021 update for SAS 9.4M7 and SAS Viya 3.5, support was added for the
SESSION= option in the PROC S3 statement. This option enables you to specify an
AWS session token.
In the July 2021 update for SAS Viya 3.5, support was added for the REGION
statement for PROC S3. This statement enables you to define a custom region.
In the August 2021 update for SAS Viya 3.5, support was added for the HOST=
option in the REGION statement.
In the May 2022 update for SAS 9.4M7 and SAS Viya 3.5, the
CREDENTIALSPROFILE= option is no longer supported.
In SAS 9.4M8, support has been added for EC2 Instance Metadata Service version 2
(IMDSv2) and the REGION statement.
In the May 2024 update for SAS 9.4M8, the list of supported region values has been
updated.
In SAS Viya 3.4, the following enhancements have been made to the SCOREACCEL
procedure:
- The DELETEMODEL statement enables you to delete models previously
published to CAS, Teradata, and Hadoop.
- The AUTHDOMAIN option is added to the PUBLISHMODEL, RUNMODEL, and
DELETEMODEL statements. This option enables you to specify the name of the
authentication domain that contains the credentials that are used to access
Teradata.
- The PUBLISHMODEL statement now supports the FORMATITEMSTOREFILE
and STORETABLES options. The FORMATITEMSTOREFILE option enables you
to specify the file containing the format item store to be published. The
STORETABLES option enables you to specify one or more CAS blob table
names that contain the analytic stores to be published.
- The KEEPLIST option in the PUBLISHMODEL statement enables you to specify
whether to include a KEEP statement in the DS2 model program that was
automatically generated from an analytic store model.
SAS Viya 3.5 adds support for publishing a model to a global model table. The
PUBLISHGLOBAL= option in the PUBLISHMODEL statement enables you to
publish a model to a global model table, and the DELETEGLOBAL= option in the
DELETEMODEL statement enables you to delete a model from a global model
table.
SAS Viya 3.5 (August 2021 release, or earlier if you apply a hot fix) adds the
following options:
- The WEBHDFSURL= option specifies a URL to delete, publish, or run a model in
a platform that is configured to access the distributed file system through the
REST API.
- The JOBMANAGEMENTURL= option specifies a URL to submit execution
requests over a REST interface to services such as Apache Livy.
- The INDATASET= and OUTDATASET= options provide additional support for
Spark.
For more information, see Chapter 62, “SCOREACCEL Procedure,” on page 2305.
Enhanced Base SAS Procedures
- Data sets with time zone offsets can now be transported using PROC
CPORT (with the DATECOPY option specified) and PROC CIMPORT. For
more information, see “DATECOPY” on page 542.
- SAS 9.4M2 adds the SORT option to PROC CIMPORT. This option causes the
data set that is being imported to be re-sorted according to the destination
operating system’s collating sequence. For more information, see “SORT” on
page 398.
- SAS 9.4M3 adds support to PROC CIMPORT for importing data sets created in
non-UTF-8 SAS sessions into UTF-8 SAS sessions. Prior to this release,
transport files were encoded in a Windows encoding that corresponded to the
SAS session encoding.
- In SAS Viya 3.5, PROC CIMPORT can be used to import a file that was created
in UTF-8 encoding into non-UTF-8 SAS sessions and output corresponding data
sets.
- SAS Viya 3.5 adds the EXTENDVAR= and EXTENDFORMAT= options to PROC
CIMPORT. When a transport file is imported, the original string variable length
might not be large enough for the transcoded strings. To ensure that your
destination buffer size is sufficient for the transcoded data, specify the
EXTENDVAR= and EXTENDFORMAT= options.
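A round trip through a transport file can be sketched as follows. The libref imp, its folder under the WORK directory, and the fileref tranfile are hypothetical; DLCREATEDIR lets the LIBNAME statement create the folder. The data set is sorted first so that the SORT option on PROC CIMPORT has an ordering to re-establish.

```sas
options dlcreatedir;                              /* let LIBNAME create the folder */
libname imp "%sysfunc(pathname(work))/imported"; /* hypothetical destination      */
filename tranfile temp;                          /* temporary transport file      */

/* Create a sorted data set to transport. */
proc sort data=sashelp.class out=work.class_sorted;
   by name;
run;

/* DATECOPY copies the creation and modification dates into the transport file. */
proc cport data=work.class_sorted file=tranfile datecopy;
run;

/* SORT (SAS 9.4M2 and later) re-sorts the imported data set according to the
   destination collating sequence. */
proc cimport library=imp infile=tranfile sort;
run;
```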
SAS 9.4M5 adds support for the VARCHAR data type. PROC CONTENTS output
shows the number of bytes and characters for variables.
- The ENCRYPTKEY= option specifies the key value for an AES-encrypted data set.
For more information, see “ENCRYPTKEY=” on page 616.
- Extended attributes are customized metadata for your SAS files. They are
user-defined characteristics that you associate with a SAS data set or variable. For
more information, see “Extended Attributes” on page 567.
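The sketch below creates an AES-encrypted copy of SASHELP.CLASS with the ENCRYPT= and ENCRYPTKEY= data set options and then supplies the key to open it again. The data set name work.salary and the key value green are placeholders; real keys should come from your own key management.

```sas
/* Create an AES-encrypted data set; "green" is a placeholder key value. */
data work.salary (encrypt=aes encryptkey=green);
   set sashelp.class;
run;

/* The key must be supplied to open the encrypted data set. */
proc contents data=work.salary (encryptkey=green);
run;
```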
In SAS 9.4M5, you can copy a CAS table to another CAS table on the CAS server.
SAS 9.4M5 adds support for the VARCHAR data type for the COPY and
CONTENTS statements.
- SAS 9.4M1 adds support for exporting CSV files with a SAS data set name that
contains a single quotation mark when the VALIDMEMNAME=EXTEND system
option is specified. Using VALIDMEMNAME= expands the rules for SAS data set
names. For more information, see “Using the External File Interface (EFI)” in
SAS/ACCESS Interface to PC Files: Reference.
- The following is true for JMP files:
  - SAS 9.4 imports data from JMP files that are saved in JMP 7 or later formats,
and it exports to files in JMP 7 or later formats. File formats in JMP 3
through 6 are no longer supported. Support for these newer formats enables
you to access JMP files for viewing in a variety of ways, such as with the
JMP Graph Builder iPad app.
  - The META data type for JMP files is no longer supported. Instead, extended
attributes are automatically used. META can remain in programs, but doing
so generates a NOTE in the log, and the statement is ignored.
  - The META statement for PROC EXPORT is no longer supported for JMP files
and is ignored. Instead, extended attributes are automatically used.
  - JMP variable names can be up to 255 characters in length.
  - The ROWSTATE data type is generated by JMP and is used to store several
row-level characteristics. When PROC EXPORT sees a column named
_rowstate_, it converts it back into row state information in the output JMP
file. (If the JMP file contains row state information, then PROC IMPORT
stores this information as a new variable with the name _rowstate_.)
For more information, see Chapter 23, “EXPORT Procedure,” on page 849.
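The CSV case from the first item can be sketched as follows. The data set name O'Brien Sales and the fileref csvout are hypothetical; the name is written as a name literal, which VALIDMEMNAME=EXTEND permits for data sets in a Base engine library such as WORK.

```sas
options validmemname=extend;   /* allow special characters in data set names */

/* Hypothetical data set whose name contains a single quotation mark. */
data work."O'Brien Sales"n;
   set sashelp.class;
run;

filename csvout temp;          /* temporary CSV destination */

proc export data=work."O'Brien Sales"n
            outfile=csvout
            dbms=csv
            replace;
run;
```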
SAS 9.4M3 adds the STATIC statement. For more information, see “STRUCT
Statement” on page 896.
SAS 9.4M5 adds dictionary and ASTORE support to PROC FCMP. For more
information, see “Dictionaries” on page 918 and “PROC FCMP and ASTORE” on page
918.
In SAS Viya 3.5, the quoted string size limit and label size limit for PROC FCMP
were updated to match the DATA step size limits. PROC FCMP now supports
quoted strings up to 32,767 bytes and labels up to 256 bytes.
For more information, see Chapter 29, “FONTREG Procedure,” on page 1057.
- You can create an informat based on a Perl regular expression by using the
INVALUE statement options REGEXP and REGEXPE.
  - The HDFS statement supports the CAT= option to display the contents of
files, the CHMOD= option to change file access permissions, and the LS=
option to list HDFS files.
  - Several HDFS statement options support wildcard characters when you
specify HDFS files and request recursive action to execute the operation on
the specified directory as well as subdirectories.
  - You can connect to the Hadoop cluster by copying the Hadoop cluster
configuration files to a physical location that is accessible to the SAS client
machine, and then set a SAS environment variable to the location of the
configuration files.
  - You can submit a MapReduce program and Pig language code to a Hadoop
cluster through the Apache Oozie RESTful API.
The HADOOP procedure is available in both SAS 9.4 and in SAS Viya. However, the
HADOOP procedure is not supported in a CAS session.
For more information, see Chapter 33, “HADOOP Procedure,” on page 1233.
SAS 9.4M5 supports the NAME= option, with which you can specify either Hadoop or
the new Spark data source.
In SAS 9.4M6, Hive 3.0 supports managed, external, and transactional tables. By
default, a new table is created as managed and transactional.
The HTTP procedure is available in both SAS 9.4 and in SAS Viya. The initial
versions of SAS Viya have the same functionality as SAS 9.4M3.
n SAS Viya 3.3 adds the DEBUG statement, the TIMEOUT= procedure option, and
the PROC HTTP response status macro variables to SAS Viya. It also adds a
new OAUTH_BEARER= procedure option and a new value for
OAUTH_BEARER=, the constant SAS_SERVICES. The SAS_SERVICES constant
is supported in SAS Viya only.
n In May 2019, the documentation was updated to include the following:
In addition, SAS Viya 3.5 and the November 2019 release of SAS 9.4M6 add two
new procedure options, two new parameters for the IN= procedure option,
change quoting requirements for the METHOD= procedure option, and include a
documentation enhancement.
o The MAXREDIRECTS= procedure option enables you to specify the maximum
number of redirects that are allowed.
o The QUERY= procedure option enables you to submit URL-encoded query
parameters for the URL= argument.
o The FORM parameter to the IN= procedure option enables you to send input
data as a standard HTML form.
o The MULTI parameter to the IN= procedure option enables you to upload
generic multipart data or generic form data, when it is used with the FORM
option. When the FORM option is used, the upload is performed using the
specialized multipart type known as multipart/form-data.
o The METHOD= procedure option no longer requires the http-method
argument to be quoted.
o The documentation now describes how to send chunked data in the
HEADERS statement.
n In May 2022, the documentation was updated to include a new example:
“Specify Local Options for Two-Way Encryption in UNIX”.
For more information, see Chapter 35, “HTTP Procedure,” on page 1279.
n SAS 9.4M1 adds support for exporting CSV files with a SAS data set name that
contains a single quotation mark when the VALIDMEMNAME=EXTEND system
option is specified. Specifying VALIDMEMNAME=EXTEND expands the rules for SAS data set
names. For more information, see “Using the External File Interface (EFI)” in
SAS/ACCESS Interface to PC Files: Reference.
n The following is true for JMP files:
o SAS 9.4 imports data from JMP files that are saved in JMP 7 or later formats,
and it exports to files in JMP 7 or later formats. File formats in JMP 3
through 6 are no longer supported. Support for these newer formats enables
you to access JMP files for viewing in a variety of ways, such as with the
JMP Graph Builder iPad app.
o The META data type for JMP files is no longer supported. Instead, extended
attributes are automatically used. META can remain in programs, but it
generates a NOTE in the log and the statement is ignored.
o The META statement for PROC IMPORT is no longer supported for JMP files
and is ignored. Instead, extended attributes are automatically used. When
importing a JMP file with extended attributes, the attributes are
automatically attached to the new SAS data set.
o JMP variable names can be up to 255 characters in length.
o The ROWSTATE data type is generated by JMP and is used to store several
row-level characteristics. When the JMP file contains row state information,
PROC IMPORT stores this information as a new variable with the name
_rowstate_. (If PROC EXPORT sees a column named _rowstate_, then it
converts it back into row state information in the JMP output file.)
o SAS 9.4M3 supports In-Database processing for PROC MEANS with the
Impala, HAWQ, and SAP HANA database management systems.
SAS 9.4M3 has an enhancement to restore the previous location of the SAS log and
LISTING output files. SAS saves the path of the SAS log and LISTING output files in
automatic macro variables. For more information, see Chapter 49, “PRINTTO
Procedure,” on page 1857.
o A new section was added to describe the use of ODS Styles with PROC
REPORT. For more information, see “Using ODS Styles with PROC REPORT”
on page 2142.
o PROC REPORT now supports statistical keywords P20, P30, P40, P60, P70,
and P80. For information, see “Statistics That Are Available in PROC
REPORT” on page 2140.
n In SAS 9.4M6, the PROC PRINT CONTENTS= option on page 1759 accepts #BY
directives.
Beginning with SAS 9.4M5, PROC REPORT summarization can be executed on the
CAS server.
For more information, see Chapter 58, “REPORT Procedure,” on page 2047.
Starting with SAS 9.4M6, you can specify the CAPTION= option in the PROC
REPORT statement and specify the ACCESSIBLETABLE system option to add
visible captions to the tables. Starting with SAS 9.4M6, when BY directives are
specified in the CAPTION= and CONTENTS= options, labels for the BY group
tables are displayed in the table of contents in PDF output and in the contents file
in HTML output. The labels are based on the values of the BY variable.
o SAS 9.4M3 supports In-Database processing for PROC SORT with the
Impala, HAWQ, and SAP HANA database management systems.
o SAS 9.4M6 supports In-Database processing for PROC SORT with the
Google BigQuery database management system.
o SAS 9.4M7 supports In-Database processing for PROC SORT with the
Yellowbrick database management system.
n SAS Viya 3.4 supports ICU version 56. This ICU version uses locale data from
version 28 of the Unicode Common Locale Data Repository (CLDR). For in-
depth information, see Download ICU 56 and CLDR 28 Release Note.
n In SAS Viya 3.5, PROC SORT can run the duplicate detection
and manipulation options (NODUPKEY, NOUNIKEY, DUPOUT=, UNIOUT=) in
CAS. This functionality is provided to facilitate migration of code for users of
SAS Viya. This task is performed by the deduplicate action.
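Outside CAS, the same duplicate-detection options work in an ordinary PROC SORT step. The following is a minimal sketch that uses the SASHELP.CLASS sample data set; the output names WORK.ONE_PER_AGE and WORK.AGE_DUPS are hypothetical and chosen only for illustration.

```sas
/* Keep only the first observation for each distinct AGE value.        */
/* NODUPKEY drops observations whose BY values match an earlier        */
/* observation; DUPOUT= writes the dropped observations to a data set. */
proc sort data=sashelp.class
          out=work.one_per_age
          nodupkey
          dupout=work.age_dups;
   by age;
run;
```

With the standard SASHELP.CLASS data (19 students aged 11 through 16), WORK.ONE_PER_AGE should contain 6 observations, one per age, and WORK.AGE_DUPS the remaining 13.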
For more information, see Chapter 64, “SORT Procedure,” on page 2355.
In SAS Viya 3.5, support was added for the HIVE_SERVER= and HIVE_URI= options.
o SAS 9.4M3 supports In-Database processing for PROC SUMMARY with the
Impala, HAWQ, and SAP HANA database management systems.
o SAS 9.4M7 supports In-Database processing for PROC SUMMARY with the
Yellowbrick database management system.
n In SAS 9.4M6, the CAPTION= option is new for the TABLE statement.
n In SAS 9.4M6, PROC TABULATE provides the ability to create accessible output
tables when used with the ACCESSIBLETABLE system option.
n Beginning with SAS 9.4M5, PROC TABULATE summarization can be executed
on the CAS server.
n SAS supports In-Database processing for PROC TABULATE.
o SAS 9.4M6 supports In-Database processing for PROC TABULATE with the
Google BigQuery database management system.
o SAS 9.4M7 supports In-Database processing for PROC TABULATE with the
Yellowbrick database management system.
o SAS 9.4 supports In-Database processing for PROC TRANSPOSE with the
Hadoop and Teradata database management systems.
Software Enhancements
SAS supports access to files that are created with the Advanced Encryption
Standard (AES). For more information, see “AES Encryption” on page 25.
The International Components for Unicode (ICU) libraries have been upgraded from
version 4.2 to version 4.8. The ICU is used by SAS for linguistic collation of
character data. For more information about ICU version 4.8, see the ICU website at
https://icu.unicode.org/download/48.
Data sets that are sorted linguistically by one release of SAS might not be
recognized as sorted by another release. For more information about the effect of a
change in the ICU version, see:
n “Linguistic Sorting of Data Sets and ICU” on page 2362
Documentation Enhancements
The following changes have been made to the Base SAS Procedures Guide:
n “Threaded Processing for Base SAS Procedures” on page 26 contains
information about SAS procedures that support threaded processing.
n “Using PROC FCMP Component Objects” on page 918 contains more
information about hashing.
n SAS 9.4M1 adds a link and supporting text for Microsoft Excel functions that are
available to PROC FCMP.
n Information about the HDMD procedure (Chapter 34, “HDMD Procedure”) was moved to this
document from SAS/ACCESS for Relational Databases: Reference.
n Chapter 41, “MIGRATE Procedure,” on page 1559 contains more information
about using the BUFSIZE= option to improve the performance of migrated data
sets.
PART 1
Concepts
Chapter 1
Choosing the Right Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter 2
Fundamental Concepts for Using Base SAS Procedures . . . . . . . . . . . . . . . . . . 21
Chapter 3
Statements with the Same Function in Multiple Procedures . . . . . . . . . . . . . . 73
Chapter 4
In-Database Processing of Base Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Chapter 5
CAS Processing of Base Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Chapter 6
Base SAS Procedures Documented in Other Publications . . . . . . . . . . . . . . . 99
1
Choosing the Right Procedure
Report Writing
These procedures display useful information, such as data listings (detail reports),
summary reports, calendars, letters, labels, multipanel reports, and graphical
reports.
Statistics
These procedures compute elementary statistical measures that include
descriptive statistics based on moments, quantiles, confidence intervals, frequency
counts, crosstabulations, correlations, and distribution tests. They also rank and
standardize data.
MEANS STANDARD
Utilities
These procedures perform the following basic utility operations:
n create, edit, sort, and transpose data sets
n provide basic file maintenance such as copy, append, and compare data sets
PMENU1
1 See the SAS documentation for your operating environment for a description of this procedure.
2 For a description of this procedure, see the SAS Output Delivery System: User’s Guide.
3 For a description of this procedure, see the SAS/ACCESS for Relational Databases: Reference.
4 For a description of this procedure, see the SAS/ACCESS Interface to PC Files: Reference.
5 For a description of this procedure, see the Base SAS Guide to Information Maps.
6 For a description of this procedure, see the SAS Language Interfaces to Metadata.
Report-Writing Procedures
The following table lists report-writing procedures according to the type of report.
Detail reports PRINT Produces data listings quickly; can supply titles,
footnotes, and column sums.
6 Chapter 1 / Choosing the Right Procedure
1 These reports quickly produce a simple graphical picture of the data. To produce high-resolution graphical reports, use
SAS/GRAPH software.
Statistical Procedures
Distribution analysis UNIVARIATE Computes tests for location and tests for normality.
Data transformation
Computing ranks RANK Computes ranks for one or more numeric variables
across the observations of a SAS data set and
creates an output data set; can produce normal
scores or other rank scores.
Low-resolution graphics1
Efficiency Issues
Quantiles
For a large sample size n, the calculation of quantiles, including the median,
requires computing time proportional to n log(n). Therefore, a procedure, such as
UNIVARIATE, that automatically calculates quantiles might require more time than
other data summarization procedures. Furthermore, because data is held in
memory, the procedure also requires more storage space to perform the
computations. By default, the report procedures PROC MEANS, PROC SUMMARY,
and PROC TABULATE require less memory because they do not automatically
compute quantiles. These procedures also provide an option to use a new
fixed-memory quantile estimation method that is usually less memory-intensive. For
more information, see “Quantiles” on page 1512 in the PROC MEANS
documentation.
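As a sketch of the fixed-memory option mentioned above, assuming the PROC MEANS QMETHOD= option (OS for exact order statistics, the default; P2 for the fixed-memory estimate), the following step requests a median and a 90th percentile from SASHELP.CLASS. The output data set WORK.CLASS_Q and its variable names are hypothetical.

```sas
/* Request quantiles from PROC MEANS. QMETHOD=P2 selects the fixed-memory */
/* P2 estimation method instead of the default QMETHOD=OS, which computes */
/* exact order statistics from values held in memory.                     */
proc means data=sashelp.class n median p90 qmethod=p2;
   var height weight;
   /* Write the estimated quantiles to a one-row output data set. */
   output out=work.class_q
          median=med_height med_weight
          p90=p90_height p90_weight;
run;
```

Because P2 produces estimates, its results can differ slightly from the exact values that QMETHOD=OS returns; the memory savings matter only when n is large, not for a data set this small.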
procedures discuss the statistical concepts that are useful for interpreting procedure
output.
Utility Procedures
The following table groups utility procedures according to task.
Task                           Procedure   Description
Supply information             COMPARE     Compares the contents of two SAS data sets.
Manage SAS system options      OPTIONS     Lists the current values of all SAS system options.
Create, browse, and edit data  FCMP        Enables creation, testing, and storage of SAS
                                           functions and subroutines before they are used in
                                           other SAS procedures.
Manage SAS files               APPEND      Appends one SAS data set to the end of another.
                               EXPORT5     Reads data from a SAS data set and writes them to
                                           an external data source.
Manage metadata in a SAS       METADATA7   Sends a method call, in the form of an XML string,
Metadata Repository                        to a SAS Metadata Server.
1 See the SAS documentation for your operating environment for a description of these procedures.
2 For a description of this procedure, see the SAS Output Delivery System: User’s Guide.
3 For a description of this procedure, see the SAS/ACCESS for Relational Databases: Reference.
4 For a description of this procedure, see the SAS National Language Support (NLS): Reference Guide.
5 For a description of this procedure, see the SAS/ACCESS Interface to PC Files: Reference.
6 For a description of this procedure, see the Base SAS Guide to Information Maps.
7 For a description of this procedure, see the SAS Language Interfaces to Metadata.
CHART procedure
produces vertical and horizontal bar charts, block charts, pie charts, and star
charts. These charts provide a quick visual representation of the values of a
single variable or several variables. PROC CHART can also display a statistic
associated with the values.
CIMPORT procedure
restores a transport file created by the CPORT procedure to its original form (a
SAS library, catalog, or data set) in the format appropriate to the operating
environment. Coupled with the CPORT procedure, PROC CIMPORT enables you
to move SAS libraries, catalogs, and data sets from one operating environment
to another.
COMPARE procedure
compares the contents of two SAS data sets. You can also use PROC COMPARE
to compare the values of different variables within a single data set. PROC
COMPARE produces a variety of reports on the comparisons that it performs.
CONTENTS procedure
prints descriptions of the contents of one or more files in a SAS library.
CONVERT procedure
converts BMDP system files, OSIRIS system files, and SPSS portable files to
SAS data sets. For more information, see the SAS documentation for your
operating environment.
COPY procedure
copies an entire SAS library or specific members of the library. You can limit
processing to specific types of library members.
CORR procedure
computes Pearson product-moment and weighted product-moment correlation
coefficients between variables and descriptive statistics for these variables. In
addition, PROC CORR can compute three nonparametric measures of
association (Spearman's rank-order correlation, Kendall's tau-b, and Hoeffding's
measure of dependence, D), partial correlations (Pearson's partial correlation,
Spearman's partial rank-order correlation, and Kendall's partial tau-b), and
Cronbach's coefficient alpha. For more information, see Base SAS Procedures
Guide: Statistical Procedures.
CPORT procedure
writes SAS libraries, data sets, and catalogs in a special format called a
transport file. Coupled with the CIMPORT procedure, PROC CPORT enables
you to move SAS libraries, data sets, and catalogs from one operating
environment to another.
CV2VIEW procedure
converts SAS/ACCESS view descriptors to PROC SQL views. Starting in SAS 9,
conversion of SAS/ACCESS view descriptors to PROC SQL views is
recommended because PROC SQL views are platform-independent and enable
you to use the LIBNAME statement. For more information, see the SAS/ACCESS
for Relational Databases: Reference.
DATASETS procedure
lists, copies, renames, and deletes SAS files and SAS generation groups;
manages indexes; and appends SAS data sets in a SAS library. The procedure
provides all the capabilities of the APPEND, CONTENTS, and COPY procedures.
You can also modify variables within data sets; manage data set attributes, such
as labels and passwords; or create and delete integrity constraints.
DELETE procedure
deletes SAS files from the disk or tape on which they are stored.
DISPLAY procedure
executes SAS/AF applications. For information about building SAS/AF
applications, see the Guide to SAS/AF Applications Development.
DOCUMENT procedure
manipulates procedure output that is stored in ODS documents. PROC
DOCUMENT enables a user to browse and edit output objects and hierarchies,
and to replay them to any supported ODS output format. For more information,
see SAS Output Delivery System: User’s Guide.
DS2 procedure
enables you to submit DS2 language statements from a Base SAS session.
EXPORT procedure
reads data from a SAS data set and writes it to an external data source.
FCMP procedure
enables you to create, test, and store SAS functions and subroutines before you
use them in other SAS procedures. PROC FCMP accepts slight variations of
DATA step statements. Most features of the SAS programming language can be
used in functions and subroutines that are processed by PROC FCMP.
FEDSQL procedure
enables you to submit FedSQL language statements from a Base SAS session.
FONTREG procedure
adds system fonts to the SAS registry.
FORMAT procedure
creates user-defined informats and formats for character or numeric variables.
PROC FORMAT also prints the contents of a format library, creates a control
data set to write other informats or formats, and reads a control data set to
create informats or formats.
FREQ procedure
produces one-way to n-way frequency tables and reports frequency counts.
PROC FREQ can compute chi-square tests for one-way to n-way tables; tests
and measures of association and of agreement for two-way to n-way
crosstabulation tables; risks and risk differences for 2×2 tables; trend tests; and
Cochran-Mantel-Haenszel statistics. You can also create output data sets. For
more information, see Base SAS Procedures Guide: Statistical Procedures.
FSLIST procedure
displays the contents of an external file or copies text from an external file to
the SAS Text Editor.
GROOVY procedure
enables SAS code to execute Groovy code on the Java Virtual Machine (JVM).
HADOOP procedure
enables SAS to run Apache Hadoop code against Hadoop data.
Brief Descriptions of Base SAS Procedures 17
HDMD procedure
generates XML-based metadata that describes the contents of files that are
stored in HDFS.
HTTP procedure
issues Hypertext Transfer Protocol (HTTP) requests.
IMPORT procedure
reads data from an external data source and writes them to a SAS data set.
INFOMAPS procedure
creates or updates a SAS Information Map. For more information, see the Base
SAS Guide to Information Maps.
JAVAINFO procedure
conveys diagnostic information to the user about the Java environment that
SAS is using. The diagnostic information can be used to confirm that the SAS
Java environment has been configured correctly and can be helpful when
reporting problems to SAS technical support.
JSON procedure
reads data from a SAS data set and writes it to an external file in JSON
representation.
MEANS procedure
computes descriptive statistics for numeric variables across all observations
and within groups of observations. You can also create an output data set that
contains specific statistics and identifies minimum and maximum values for
groups of observations.
METADATA procedure
sends a method call, in the form of an XML string, to a SAS Metadata Server. For
more information, see SAS Language Interfaces to Metadata.
METALIB procedure
updates metadata in a SAS Metadata Repository to match the tables in a library.
For more information, see SAS Language Interfaces to Metadata.
METAOPERATE procedure
performs administrative tasks on a metadata server. For more information, see
SAS Language Interfaces to Metadata.
MIGRATE procedure
migrates members in a SAS library forward to the most current release of SAS.
The migration must occur within the same engine family; for example, V6, V7, or
V8 can migrate to V9, but V6TAPE must migrate to V9TAPE.
OPTIONS procedure
lists the current values of all SAS system options.
OPTLOAD procedure
reads SAS system option settings from the SAS registry or a SAS data set, and
puts them into effect.
OPTSAVE procedure
saves SAS system option settings to the SAS registry or a SAS data set.
PDS procedure
lists, deletes, and renames the members of a partitioned data set. For more
information, see the SAS Companion for z/OS.
PDSCOPY procedure
copies partitioned data sets from disk to tape, disk to disk, tape to tape, or tape
to disk. For more information, see the SAS Companion for z/OS.
PLOT procedure
produces scatter plots that graph one variable against another. The coordinates
of each point on the plot correspond to the two variables' values in one or more
observations of the input data set.
PMENU procedure
defines menus that you can use in DATA step windows, macro windows, and
SAS/AF windows, or in any SAS application that enables you to specify
customized menus.
PRESENV procedure
preserves all global statements and macro variables in your SAS code from one
SAS session to another. When this procedure is invoked at the end of a SAS
session, all of the global statements and macro variables are written to a file.
PRINT procedure
prints the observations in a SAS data set, using all or some of the variables.
PROC PRINT can also print totals and subtotals for numeric variables.
PRINTTO procedure
defines destinations for SAS procedure output and the SAS log.
PROTO procedure
enables you to register, in batch mode, external functions that are written in the
C or C++ programming languages. You can use these functions in SAS as well as
in C-language structures and types. After these functions are registered in
PROC PROTO, they can be called from any SAS function or subroutine that is
declared in the FCMP procedure. After registration, they can also be called from
any SAS function, subroutine, or method block that is declared in the COMPILE
procedure.
PRTDEF procedure
creates printer definitions for individual SAS users or all SAS users.
PRTEXP procedure
exports printer definition attributes to a SAS data set so that they can be easily
replicated and modified.
PWENCODE procedure
encodes passwords for use in SAS programs.
QDEVICE procedure
produces reports about graphics devices and universal printers.
RANK procedure
computes ranks for one or more numeric variables across the observations of a
SAS data set. The ranks are written to a new SAS data set. PROC RANK can also
produce normal scores or other rank scores.
REGISTRY procedure
imports registry information into the USER portion of the SAS registry.
RELEASE procedure
releases unused space at the end of a disk data set in the z/OS environment. For
more information, see the SAS Companion for z/OS.
REPORT procedure
combines features of the PRINT, MEANS, and TABULATE procedures with
features of the DATA step in a single report-writing tool that can produce both
detail and summary reports.
SCAPROC procedure
implements the SAS Code Analyzer, which captures information about input,
output, and the use of macro symbols from a SAS job while it is running.
SCOREACCEL procedure
provides an interface to the CAS server for DATA step and DS2 model
publishing and scoring.
SOAP procedure
reads XML input from a file that has a fileref and writes XML output to another
file that has a fileref.
SORT procedure
sorts observations in a SAS data set by one or more variables. PROC SORT
stores the resulting sorted observations in a new SAS data set or replaces the
original data set.
SOURCE procedure
provides an easy way to back up and process source library data sets. For more
information, see the SAS documentation for your operating environment.
SQL procedure
implements a subset of the Structured Query Language (SQL) for use in SAS.
SQL is a standardized, widely used language that retrieves and updates data in
SAS data sets, SQL views, and DBMS tables, as well as views based on those
tables. PROC SQL can also create tables and views, summaries, statistics, and
reports and perform utility functions such as sorting and concatenating. For
more information, see SAS SQL Procedure User’s Guide.
SQOOP procedure
provides access to Apache Sqoop, which enables you to transfer data between a
database and HDFS.
STANDARD procedure
standardizes some or all of the variables in a SAS data set to a given mean and
standard deviation and produces a new SAS data set that contains the
standardized values.
STREAM procedure
enables you to process an input stream that consists of arbitrary text that can
contain SAS macro specifications. It can expand macro code and store it in a file.
SUMMARY procedure
computes descriptive statistics for the variables in a SAS data set across all
observations and within groups of observations, and writes the results to a new
SAS data set.
TABULATE procedure
displays descriptive statistics in tabular form. The value in each table cell is
calculated from the variables and statistics that define the pages, rows, and
columns of the table. The statistic associated with each cell is calculated on
values from all observations in that category. You can write the results to a SAS
data set.
TAPECOPY procedure
copies an entire tape volume or files from one or more tape volumes to one
output tape volume. For more information, see the SAS Companion for z/OS.
TAPELABEL procedure
lists the label information of an IBM standard-labeled tape volume under the
z/OS environment. For more information, see the SAS Companion for z/OS.
TEMPLATE procedure
customizes ODS output for an entire SAS job or a single ODS output object. For
more information, see SAS Output Delivery System: User’s Guide.
TIMEPLOT procedure
produces plots of one or more variables over time intervals.
TRANSPOSE procedure
transposes a data set, changing observations into variables and vice versa.
TRANTAB procedure
creates, edits, and displays customized translation tables. For more information,
see SAS National Language Support (NLS): Reference Guide.
UNIVARIATE procedure
computes descriptive statistics (including quantiles), confidence intervals, and
robust estimates for numeric variables. PROC UNIVARIATE also provides details on the
distribution of numeric variables, including tests for normality, plots to illustrate the
distribution, frequency tables, and tests of location. For more information, see
Base SAS Procedures Guide: Statistical Procedures.
XSL procedure
transforms an XML document into another format, such as HTML, text, or
another XML document type.
2
Fundamental Concepts for Using Base SAS Procedures
Language Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Temporary and Permanent SAS Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
SAS System Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Data Set Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Global Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
AES Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Procedure Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Input Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Threaded Processing for Base SAS Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Controlling the Order of Data Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
RUN-Group Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Creating Titles That Contain BY-Group Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Shortcuts for Specifying Lists of Variable Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Formatted Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Processing All the Data Sets in a Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Operating Environment-Specific Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Statistic Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Computational Requirements for Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Output Delivery System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
22 Chapter 2 / Fundamental Concepts for Using Base SAS Procedures
Language Concepts
The SAS system options WORK=, WORKINIT, and WORKTERM affect how you
work with temporary and permanent libraries. For more information, see SAS
System Options: Reference.
Typically, two-level names represent permanent SAS data sets. A two-level name
takes the form libref.SAS-data-set. The libref is a name that is temporarily
associated with a SAS library. A SAS library is an external storage location that
stores SAS data sets in your operating environment. A LIBNAME statement
associates the libref with the SAS library. In the following PROC PRINT step,
PROCLIB is the libref and EMP is the SAS data set within the library:
libname proclib 'SAS-library';
proc print data=proclib.emp;
run;
USER Library
You can use one-level names for permanent SAS data sets by specifying a USER
library. You can assign a USER library with a LIBNAME statement or with the SAS
system option USER=. After you specify a USER library, the procedure assumes
that data sets with one-level names are in the USER library instead of the WORK
library. For example, the following PROC PRINT step assumes that DEBATE is in
the USER library:
options user='SAS-library';
proc print data=debate;
run;
Note: If you have a USER library defined, then you can still use the WORK library
by specifying WORK.SAS-data-set.
n DATE | NODATE
n DETAILS | NODETAILS
n FMTERR | NOFMTERR
n FORMCHAR=
n FORMDLIM=
n LABEL | NOLABEL
n LINESIZE=
n NUMBER | NONUMBER
n PAGENO=
n PAGESIZE=
n REPLACE | NOREPLACE
n SOURCE | NOSOURCE
For a complete description of SAS system options, see the SAS System Options:
Reference.
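For example, a program might set several of the options listed above in one OPTIONS statement and then confirm a value with PROC OPTIONS. This is a minimal sketch; the option values chosen here are arbitrary examples.

```sas
/* Suppress the date and page number, and set the line and page size */
/* used for subsequent procedure output.                               */
options nodate nonumber linesize=80 pagesize=60;

/* OPTION= limits the PROC OPTIONS report to a single system option. */
proc options option=linesize;
run;
```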
The individual procedure chapters contain reminders that you can use data set
options where it is appropriate.
ALTER= OBS=
BUFNO= OBSBUF=
BUFSIZE= OUTREP=
CNTLLEV= POINTOBS=
COMPRESS= PW=
DLDMGACTION= PWREQ=
DROP= READ=
ENCODING= RENAME=
ENCRYPT= REPEMPTY=
FILECLOSE= REPLACE=
FIRSTOBS= REUSE=
GENMAX= SORTEDBY=
GENNUM= SPILL=
IDXNAME= TOBSNO=
IDXWHERE= TYPE=
IN= WHERE=
INDEX= WHEREUP=
KEEP= WRITE=
LABEL=
For a complete description of SAS data set options, see the SAS Data Set Options:
Reference.
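The following minimal sketch combines several of these options. It uses SASHELP.CLASS, the sample data set that ships with SAS, and the output name WORK.TALL is illustrative. On input, KEEP= limits the variables, WHERE= selects observations, and OBS= then stops after three of the selected observations. On output, RENAME= changes a variable name in the new data set only:

```sas
/* Input options: keep three variables, select ages 14 and up, */
/* and stop after the first 3 observations that qualify.       */
proc print data=sashelp.class(keep=name age height
                              where=(age >= 14)
                              obs=3);
run;

/* Output option: RENAME= applies only to the new data set     */
/* WORK.TALL (a hypothetical name); SASHELP.CLASS is unchanged. */
data work.tall(rename=(height=height_in));
   set sashelp.class(where=(height > 65));
run;
```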
Global Statements
You can use these global statements anywhere in SAS programs except after a
DATALINES, CARDS, or PARMCARDS statement:
comment ODS
DM OPTIONS
ENDSAS PAGE
FILENAME RUN
FOOTNOTE %RUN
%INCLUDE SASFILE
LIBNAME SKIP
%LIST TITLE
LOCK X
For information about all the above statements except for the ODS statement, see
the SAS DATA Step Statements: Reference. For information about the ODS
statement, see “Output Delivery System” on page 72 and SAS Output Delivery
System: User’s Guide.
AES Encryption
Prior to the SAS 9.4M5 release, SAS supported only one Advanced Encryption
Standard (AES) encryption algorithm that was specified with the ENCRYPT=AES
data set option. Beginning with the SAS 9.4M5 release, SAS supports a stronger
AES key generation algorithm specified with the ENCRYPT=AES2 data set option.
This stronger algorithm meets newer standards requested by some SAS customers.
The same key value passphrase specified by the ENCRYPTKEY= data set option
can be used with either algorithm. A data set that is encrypted with the AES2
algorithm cannot be accessed by any SAS release prior to SAS 9.4M5.
To access a data set that is created with AES encryption, you must supply the
encryption key value with the ENCRYPTKEY= data set option. If you omit the
ENCRYPTKEY= key value when you access an AES-secured data set, a dialog box
appears and prompts you for the ENCRYPTKEY= key value. For more information, see
the ENCRYPT= and ENCRYPTKEY= data set options in SAS Data Set Options: Reference.
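Here is a minimal sketch of both operations. It assumes SAS 9.4M5 or later, because AES2 is not available in earlier releases, and a hypothetical key value, green. The DATA step writes an AES2-encrypted copy of SASHELP.CLASS, and the PROC PRINT step reads it by supplying the same key:

```sas
/* Create a WORK data set that is encrypted with the AES2 */
/* algorithm. The key value "green" is only illustrative.  */
data work.secure(encrypt=aes2 encryptkey=green);
   set sashelp.class;
run;

/* Reading the data set requires the same ENCRYPTKEY= value. */
proc print data=work.secure(encryptkey=green);
run;
```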
Procedure Concepts
If you omit the DATA= option, the procedure uses the value of the SAS system
option _LAST_=. The default value of _LAST_= is the most recently created SAS data
set in the current SAS job or session. _LAST_= is described in detail in SAS System
Options: Reference.
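The following sketch shows both ways that _LAST_= gets its value. First, the most recent DATA step sets it implicitly. Then an OPTIONS statement sets it explicitly. The name WORK.HEIGHTS is hypothetical:

```sas
/* This DATA step makes WORK.HEIGHTS the most recently created data set. */
data work.heights;
   set sashelp.class(keep=name height);
run;

/* No DATA= option, so PROC PRINT reads the _LAST_= data set, */
/* which is now WORK.HEIGHTS.                                  */
proc print;
run;

/* Setting the _LAST_= system option explicitly changes the default. */
options _last_=sashelp.class;
proc print;
run;
```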
See Also
System Options
n “CPUCOUNT= System Option” in SAS System Options: Reference
n “THREADS System Option” in SAS System Options: Reference
Other Documentation
n “Support for Parallel Processing” in SAS Language Reference: Concepts
n SAS Scalable Performance Data Server: User's Guide
n the BY statement
n formats
This example code was run on Windows and z/OS, changing the title for each
system:
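data order;
input x $1.;
datalines;
1
a
A
z
Z
\
;
/* sort by the host collating sequence (ASCII or EBCDIC) */
proc sort data=order;
by x;
run;
proc print data=order;
title 'ASCII'; /* changed to 'EBCDIC' for the z/OS run */
run;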
The following table shows the difference in the sorted PROC PRINT output between
ASCII and EBCDIC:
ASCII    EBCDIC
1        a
A        z
Z        A
\        \
a        Z
z        1
For more information, see “Collating Sequence” in SAS National Language Support
(NLS): Reference Guide.
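If you need the same order on every host, PROC SORT also accepts a SORTSEQ= option that names a collating sequence. This sketch assumes the ORDER data set from the example above. The output data set names are illustrative, and the comments restate the orders from the table:

```sas
/* ASCII order on any host: 1 A Z \ a z */
proc sort data=order out=order_ascii sortseq=ascii;
   by x;
run;

/* EBCDIC order on any host: a z A \ Z 1 */
proc sort data=order out=order_ebcdic sortseq=ebcdic;
   by x;
run;
```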
Here are the first two BY groups ordered by the Style variable for the house styles:
By default, the SORT procedure orders BY groups in ascending order. To reverse the
order, use the DESCENDING option in the BY statement in PROC SORT and in
subsequent procedures that process the data set.
The following table lists some of the procedures and their statements that define
classification variables:
Procedure    Statement
FREQ         TABLES
MEANS        CLASS
REPORT       DEFINE (with the GROUP, ORDER, or ACROSS option)
SUMMARY      CLASS
TABULATE     CLASS
For most procedures, the default ordering scheme is ascending for single
classification variables with no formats or ordering options:
proc means data=sasuser.houses nway mean;
class style;
var sqfeet;
run;
Figure 2.6 The Default Ascending Order for a Single Classification Variable
For information about the default data ordering behavior for PROC REPORT, see
“Order Data Using the ORDER= Option” on page 44.
Procedure    Statement That Specifies the Higher-Order Variable    Example
The ordering scheme is determined first by developing a master order for each class
variable, for the entire data set or BY group. The master order is then applied to
each subgroup of the hierarchy. The order does not change from one class subgroup
to the next.
The master order for Style and Bedrooms is determined separately. Style is the
higher-order class variable. Bedrooms forms subgroups of each value of Style. The
order for Bedrooms is not determined again for each value of Style. Instead, the
order is taken from the master order that is determined before generating
subgroups.
When no ORDER= option is specified in the CLASS statement, the data is ordered
by using unformatted values, which results in the same order as the SORT
procedure. The master order for Style is ascending alphabetically, and the master
order for Bedrooms is ascending numerically:
Consider this example, in which the order is also based on frequency counts:
proc tabulate data=sasuser.houses format=3. noseps order=freq;
class style bedrooms;
table style*bedrooms, n / rts=23;
run;
For PROC TABULATE, if the frequency count is the same for multiple class levels,
then the master order uses the order in which the data was read by the procedure.
When PROC TABULATE reads the data set, the frequency count for Ranch is 4,
Split is 3, Condo is 4, and TwoStory is 4. Therefore, the master order of the higher-
order variable, Style, is Ranch, Condo, TwoStory, and Split.
The master order for Bedrooms is then determined. The value 2 has a count of 5,
the value 4 has a count of 4, the value 3 has a count of 4, and the value 1 has a
count of 2. Because 4 is read before 3, the master order for Bedrooms is 2, 4, 3, 1.
Here is the output:
The order becomes obvious when you add the PRINTMISS option to the TABLE
statement. PRINTMISS displays every combination of Style and Bedrooms, including
combinations that have no observations.
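The following sketch assumes the earlier PROC TABULATE step with PRINTMISS added to the TABLE statement. The empty cells make the Bedrooms master order (2, 4, 3, 1) visible within every value of Style:

```sas
/* PRINTMISS prints every Style*Bedrooms cell, even empty ones, */
/* so the same Bedrooms order appears under each Style value.   */
proc tabulate data=sasuser.houses format=3. noseps order=freq;
   class style bedrooms;
   table style*bedrooms, n / rts=23 printmiss;
run;
```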
By default, PROC REPORT orders the values of an ORDER, GROUP, or ACROSS variable
in ascending order of their formatted values.
The following table summarizes the default ordering schemes for procedures when
formats are applied to classification variables:
In this example, GROUP B appears before GROUP A because the lowest actual
value encountered for GROUP B is 1. For GROUP A, the lowest actual value is 3.
The lowest possible actual value that could be in GROUP A is 0, but 0 does not
exist in the data.
proc format;
value numf 0,3,4='GROUP A'
1,2='GROUP B';
run;
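The procedure step for this example is not shown. A PROC MEANS step along the following lines would classify the Bedrooms values (1 through 4) in SASUSER.HOUSES with NUMF. Treat it as a sketch: the choice of PROC MEANS, the analysis variable SqFeet, and the output name WORK.GRP_COUNTS are assumptions. The PROC FORMAT step is repeated so that the code stands alone:

```sas
proc format;
   value numf 0,3,4='GROUP A'
              1,2='GROUP B';
run;

/* PROC MEANS orders class levels by internal value by default. */
/* Each formatted group is placed by its lowest value that is   */
/* present in the data: GROUP B (1) comes before GROUP A (3).   */
proc means data=sasuser.houses nway n;
   class bedrooms;
   var sqfeet;
   format bedrooms numf.;
   output out=work.grp_counts n=count;
run;
```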
Another situation that uses the lowest actual value occurs when a format contains
groups or ranges that are independent from one another.
In this example, for DEPT=PET, the value OTHER appears last in the sequence, but it
appears first for DEPT=PLANT. This is because the master ordering sequence for ID
is determined before subgroups are created. ID=199 and ID=299 have the same
formatted value, OTHER. In the master ordering for ID, OTHER is positioned by its
lowest internal value, 199, which places it after FISH (120) and before CACTUS (200).
Because that master order is reused for each subgroup, OTHER is ordered first for
DEPT=PLANT.
data sample;
length dept $ 5;
input dept id;
datalines;
PET 100
PET 110
PET 120
PET 199
PLANT 200
PLANT 210
PLANT 220
PLANT 299
;
proc format;
value idfmt
100='CAT'
110='DOG'
120='FISH'
199='OTHER'
200='CACTUS'
210='IVY'
220='FERN'
299='OTHER';
run;
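The procedure step that displays this result is not shown. As a sketch that assumes PROC TABULATE and uses the SAMPLE data set and IDFMT. format above, a step such as the following crosses DEPT with ID:

```sas
/* ID levels follow the master order built from internal values, */
/* so OTHER (first seen at 199) precedes CACTUS, IVY, and FERN.   */
proc tabulate data=sample noseps;
   class dept id;
   format id idfmt.;
   table dept*id, n;
run;
```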
Most procedures that use class variables provide the MISSING option, which
enables you to specify whether missing values are to be considered valid class
levels. PROC FREQ has the option MISSPRINT, which displays missing class levels
but does not use them in calculating statistics.
PROC FREQ applies formats before classifying missing values as valid or invalid. If
MISSING or MISSPRINT is not used and a format range groups nonmissing class
levels with missing class levels, then both the nonmissing and missing class levels
are considered invalid. The following example demonstrates the difference in effect
between PROC FREQ and PROC REPORT:
proc format;
value bedfmt 1='ONE' 2='TWO' other='OTHER';
run;
data houses;
set sasuser.houses end=last;
output;
if last then do;
bedrooms=.;
output;
end;
format bedrooms bedfmt.;
run;
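The PROC FREQ and PROC REPORT steps behind Output 2.1 are not shown. The following sketch shows what they might look like. It assumes the BEDFMT. format and the WORK.HOUSES data set created above, and the titles are illustrative:

```sas
/* PROC FREQ applies BEDFMT. first, so the whole OTHER group */
/* (3, 4, and the missing value) is treated as missing.      */
proc freq data=houses;
   title1 "PROC FREQ";
   tables bedrooms / nocum nopercent;
run;

/* PROC REPORT sets the missing value aside first, so OTHER */
/* (values 3 and 4) remains a valid group.                  */
proc report data=houses nowd;
   title1 "PROC REPORT";
   column bedrooms n;
   define bedrooms / group;
run;
```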
Output 2.1 Order of Data Compared in PROC FREQ and PROC REPORT
PROC FREQ does not include the formatted class level OTHER, whereas PROC
REPORT does. This is because PROC FREQ applied the BEDFMT. format to the
Bedrooms variable before classifying missing values as invalid. PROC REPORT
classifies missing values as invalid before applying the format. This allows the
nonmissing class levels that would normally be grouped with the missing class
levels to be treated as valid.
You can verify this by observing the frequency counts. A total of 16 observations
reside in the Work.Houses data set. PROC FREQ treats nine observations as missing.
PROC REPORT treats only one observation as invalid. If the lowest internal value in
a format group is a missing value, then the entire group is treated as missing.
Because missing values are considered the lowest possible internal value for either
numeric or character variables, missing values cause an entire format group to have
the lowest internal value. Missing values rank first when ordering by internal
values. The following code demonstrates this by adding the MISSING option to the
previous example:
proc freq data=houses;
title1 "PROC FREQ";
title2 "With MISSING Specified";
tables bedrooms / nocum nopercent missing ;
run;
Output 2.2 PROC FREQ and PROC REPORT with MISSING Specified
In the results for PROC FREQ, the group OTHER is ordered first in the sequence. In
the results for PROC REPORT, OTHER is ordered second. By default, PROC FREQ
orders by internal values and PROC REPORT orders by formatted values. Note that
the frequency count is the same for both procedures. In PROC FREQ, OTHER
corresponds to the missing frequency in the Work.Houses data set. For PROC
REPORT, the count for OTHER increases by one to include the single missing value
that was excluded when MISSING was not specified.
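For comparison with the PROC FREQ step above, the following sketch shows a PROC REPORT step that treats missing values as valid. It assumes the same BEDFMT. format and WORK.HOUSES data set, and it uses the MISSING option in the PROC REPORT statement:

```sas
/* MISSING makes the missing value a valid level; BEDFMT. maps */
/* it to OTHER, so the OTHER row gains one observation.        */
proc report data=houses nowd missing;
   title1 "PROC REPORT";
   title2 "With MISSING Specified";
   column bedrooms n;
   define bedrooms / group;
run;
```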
proc format;
value numf 3='GROUP A' 1,2,4='GROUP B';
run;
Note: The MEANS, REPORT, SUMMARY, and TABULATE procedures accept both
INTERNAL and UNFORMATTED as values for the ORDER= option to order
unformatted data.
The formatted values of Bedrooms are ONE, TWO, THREE, and FOUR, which correspond
to the numbers 1, 2, 3, and 4. If you do not specify ORDER=INTERNAL, the values are
listed in alphabetical order by the formatted value: FOUR, ONE, THREE, TWO.
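The following sketch assumes PROC REPORT and the Bedrooms variable in SASUSER.HOUSES. Specifying ORDER=INTERNAL in the DEFINE statement restores numeric order while the report still displays the formatted values:

```sas
proc format;
   value bedfmt 1='ONE' 2='TWO' 3='THREE' 4='FOUR';
run;

/* ORDER=INTERNAL sorts by the stored numbers 1-4, so the rows  */
/* read ONE, TWO, THREE, FOUR instead of FOUR, ONE, THREE, TWO. */
proc report data=sasuser.houses nowd;
   column bedrooms n;
   define bedrooms / group format=bedfmt. order=internal;
run;
```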
Here, BEST9. is the default format because a format is not specified. Because the
default setting for PROC REPORT is ORDER=FORMATTED, the values are sorted
using the formatted values. Character comparisons of these values can give
misleading results. In simplified form, the values that are being compared are " 1",
" 2", " 3", "1.5", and "2.5". BEST9. right-aligns its results, so the whole-number
values have leading blanks in positions where 1.5 and 2.5 have digits. Because a
blank character sorts before a number or a period (.), " 1", " 2", and " 3" sort
before "1.5" and "2.5".
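You can check the comparison directly with the PUT function. This sketch uses hypothetical data set and variable names. It formats 1 and 1.5 with BEST9. and compares the resulting strings:

```sas
data work.best9_demo;
   length whole half $ 9;
   whole = put(1, best9.);     /* right-aligned: 8 blanks, then "1"   */
   half  = put(1.5, best9.);   /* right-aligned: 6 blanks, then "1.5" */
   /* Strings compare left to right: position 7 is a blank in WHOLE */
   /* and "1" in HALF, so WHOLE sorts first.                        */
   whole_first = (whole < half);
   put whole_first=;
run;
```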
Output 2.6 PROC REPORT Default Formatting When Values Have Leading Blanks
This problem is corrected in the next output by using the 3.1 format, causing no
leading blanks to appear in the comparison:
proc report data=sasuser.houses;
title1 "ORDER=FORMATTED";
title2 "(default)";
title4 "FORMAT=3.1";
title5 "(specified)";
column baths;
define baths / group format=3.1;
run;
Output 2.7 PROC REPORT Formatted When Values Have No Leading Blanks
The problem is also corrected in the following output because the internal values,
not the formatted values, are used to determine the order:
proc report data=sasuser.houses nowd;
title1 "ORDER=INTERNAL";
title2 "(specified)";
title4 "FORMAT=BEST9.";
title5 "(default)";
column baths;
define baths / group order=internal;
run;
The output order for Style is neither ascending nor descending. RANCH appears
first because it is the first value that is encountered in the input data set. SPLIT
appears second because it is encountered second, and so on. If you use a BY
statement, then the order is reset at the beginning of each new BY group, as if a
new data set were being processed.
To compare, the order for Style is RANCH, SPLIT, CONDO, TWOSTORY. The order
for Bedrooms is 2, 1, 4, 3. This is not clearly apparent in the output because no value
of STYLE has all four values of Bedrooms. The order of values is established
independently for each variable and not according to the subgroups. You can verify
this by adding the PRINTMISS option to the TABLE statement or by comparing the
order of Style and Bedrooms in the Sasuser.Houses data set. See “Order Data Using
Multiple Classification Variables” on page 32.
If two class levels have the same frequency, a secondary ordering algorithm is used.
All Base SAS procedures, except for PROC FREQ, use ORDER=DATA as a
secondary ordering method.
If duplicate frequency counts occur with PROC FREQ and ORDER=FREQ has been
specified, then PROC FREQ uses ORDER=FORMATTED as a secondary ordering
method. If in PROC FREQ a format has not been applied, then the tie is broken
The following table summarizes the behavior across the Base SAS procedures that
use the ORDER= option:
Yes ORDER=FORMATTED
1 If you specify the DESCENDING option, then both the primary method, ORDER=FREQ, and the
secondary method, ORDER=DATA, list levels in descending order.
Here are some examples of ORDER=FREQ with the TABULATE, FREQ, and REPORT
procedures. First, here is the output of PROC TABULATE and PROC FREQ without a
format specified:
proc format;
value bedfmt 1='ONE' 2='TWO' 3='THREE' 4='FOUR';
run;
Both outputs list data values in descending order of the frequency counts.
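The steps for this pair of outputs are not shown. The following sketch shows what they might look like, modeled on the later steps that do use a format. The OUT= data set in the PROC FREQ step is an addition that makes the order easy to inspect:

```sas
/* No format: both procedures list the Bedrooms value with the */
/* highest count (2) first and the lowest count (1) last.      */
proc tabulate data=sasuser.houses order=freq noseps format=3.;
   title1 "PROC TABULATE";
   title2 "Without Format";
   class bedrooms;
   table bedrooms, n;
run;

proc freq data=sasuser.houses order=freq;
   title1 "PROC FREQ";
   title2 "Without Format";
   tables bedrooms / nocum nopercent out=work.bedfreq;
run;
```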
PROC REPORT, when no format is specified, lists the values in ascending order of frequency:
proc report data=sasuser.houses;
title1 "PROC REPORT";
title2 "Without Format";
title3 "Without DESCENDING";
column bedrooms n;
define bedrooms / group order=freq;
run;
For PROC REPORT to list output in descending order, you must specify
DESCENDING in the DEFINE statement:
proc report data=sasuser.houses;
title1 "PROC REPORT";
title2 "Without Format";
title3 "With DESCENDING";
column bedrooms n;
define bedrooms / group order=freq descending;
run;
PROC TABULATE and PROC REPORT use the ORDER=DATA method to break ties between
values that have the same frequency.
proc tabulate data=sasuser.houses order=freq noseps format=3.;
title1 "PROC TABULATE";
title2 "With Format";
class bedrooms;
table bedrooms, n;
format bedrooms bedfmt.;
run;
proc report data=sasuser.houses;
title1 "PROC REPORT";
title2 "With Format";
title3 "Without DESCENDING";
column bedrooms n;
define bedrooms / group order=freq format=bedfmt8.;
run;
PROC FREQ uses ORDER=FORMATTED to break ties between values that have the same
frequency when a format is specified.
proc freq data=sasuser.houses order=freq;
title1 "PROC FREQ";
title2 "With Format";
tables bedrooms / nocum nopercent;
format bedrooms bedfmt.;
run;
Output 2.15 Output Using PROC REPORT with a Format and the DESCENDING Option
As with ORDER=DATA, when you add a BY statement, the order for classification
variables is established as if each BY group were a separate data set. When you use
multiple classification variables with ORDER=FREQ, the order is determined
independently for each classification variable. The overall ordering scheme is then
applied first by using the order of the highest order variable and then the next-
highest order variable, and so on.
proc tabulate data=sasuser.houses order=freq format=3. noseps;
class baths;
table baths, n;
run;
If you make Bedrooms the highest order variable and Baths the next-highest order
variable, you expect the rows to be ordered by the descending frequency of values
first in Bedrooms, then in Baths.
proc tabulate data=sasuser.houses order=freq format=3. noseps;
class baths bedrooms;
table bedrooms*baths, n;
run;
The N statistic does not list the frequencies of the combinations in descending
order. Instead, the order is set first by the descending frequencies of BEDROOMS
and then by BATHS. You can verify this by comparing the order of each variable
against the orders used in Output 2.11 on page 52. The master ordering sequence for
BEDROOMS is 2, 4, 3, 1. The master ordering sequence for BATHS is 1, 3, 2.5, 1.5, 2.
Output 2.17 PROC TABULATE with Bedrooms as the High Order Variable
RUN-Group Processing
RUN-group processing enables you to submit a PROC step with a RUN statement
without ending the procedure. You can continue to use the procedure without
issuing another PROC statement. To end the procedure, use a RUN CANCEL or a
QUIT statement. Several Base SAS procedures support RUN-group processing:
Note: PROC SQL executes each query automatically. Neither the RUN nor RUN
CANCEL statement has any effect.
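PROC DATASETS is one procedure that supports RUN-group processing. In this sketch, which uses hypothetical data set names, each RUN statement executes the statements before it while PROC DATASETS stays active. The QUIT statement ends the procedure:

```sas
data work.a;  x = 1;  run;
data work.b;  y = 2;  run;

proc datasets library=work nolist;
   change a=a_renamed;   /* first RUN group                  */
run;
   delete b;             /* second RUN group, same PROC step */
run;
quit;                    /* ends PROC DATASETS               */
```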
BY-Group Processing
BY-group processing uses a BY statement to process observations that are ordered,
grouped, or indexed according to the values of one or more variables. By default,
when you use BY-group processing in a procedure step, a BY line identifies each
group. This section explains how to create titles that serve as customized BY lines.
Note: You must use the NOBYLINE option if you insert BY-group information into
titles for the following procedures:
n MEANS
n PRINT
n STANDARD
n SUMMARY
If you use the BY statement with the NOBYLINE option, then these procedures
always start a new page for each BY group. This behavior prevents multiple BY
groups from appearing on a single page and ensures that the information in the
titles matches the report on the pages.
#BY-specification<.suffix>
suffix supplies text to place immediately after the BY-group information that you
insert in the title. No space appears between the BY-group information and the
suffix.
1 creates a data set, GROC, that contains data for stores from four regions. Each
store has four departments. See “GROC” on page 2792 for the DATA step that
creates the data set.
2 sorts GROC by Region and Department, which the BY statement in the PROC
CHART step requires.
3 uses the SAS system option NOBYLINE to suppress the BY line that normally
appears in output that is produced with BY-group processing.
4 uses PROC CHART to chart sales by Region and Department. In the first TITLE
statement, #BYVAL2 inserts the value of the second BY variable, Department,
into the title. In the second TITLE statement, #BYVAL(Region) inserts the value
of Region into the title. The first period after Region indicates that a suffix
follows. The second period is the suffix.
5 uses the SAS system option BYLINE to return to the creation of the default BY
line with BY-group processing.
data groc; 1
input Region $9. Manager $ Department $ Sales;
datalines;
Southeast Hayes Paper 250
Southeast Hayes Produce 100
Southeast Hayes Canned 120
Southeast Hayes Meat 80
...more lines of data...
Northeast Fuller Paper 200
Northeast Fuller Produce 300
Northeast Fuller Canned 420
Northeast Fuller Meat 125
;
proc sort data=groc; 2
by region department;
run;
options nobyline nodate pageno=1
linesize=64 pagesize=20; 3
proc chart data=groc; 4
by region department;
vbar manager / type=sum sumvar=sales;
title1 'This chart shows #byval2 sales';
title2 'in the #byval(region)..';
run;
options byline; 5
1 uses the SAS system option NOBYLINE to suppress the BY line that normally
appears in output that is produced with BY-group processing.
2 uses PROC CHART to chart sales by Region. In the first TITLE statement,
#BYVAR(Region) inserts the name of the variable Region into the title. (If
Region had a label, #BYVAR would use the label instead of the name.) The
suffix al is appended to the inserted text (Region becomes Regional). In the second TITLE statement, #BYVAL1
inserts the value of the first BY variable, Region, into the title.
3 uses the SAS system option BYLINE to return to the creation of the default BY
line with BY-group processing.
1 uses the SAS system option NOBYLINE to suppress the BY line that normally
appears in output that is produced with BY-group processing.
2 uses PROC CHART to chart sales by Region and Department. In the TITLE
statement, #BYLINE inserts the complete BY line into the title.
3 uses the SAS system option BYLINE to return to the creation of the default BY
line with BY-group processing.
Note: You cannot use shortcuts to list variable names in the INDEX CREATE
statement in PROC DATASETS.
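The following minimal sketch lists each variable of a composite index explicitly. The data set WORK.CLASS and the index name AGESEX are hypothetical:

```sas
data work.class;
   set sashelp.class;
run;

/* Name each variable; a shortcut list such as age--sex */
/* is not accepted in INDEX CREATE.                      */
proc datasets library=work nolist;
   modify class;
   index create agesex=(age sex);
run;
quit;
```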
Formatted Values
Jobcode indicates the job and level of the employee. For example, TA1 indicates
that the employee is at the beginning level for a ticket agent.
options nodate pageno=1
linesize=64 pagesize=40;
proc print data=proclib.payroll(obs=10)
noobs;
title 'PROCLIB.PAYROLL';
title2 'First 10 Observations Only';
run;
The following PROC FORMAT step creates the format $JOBFMT., which assigns
descriptive names for each job:
proc format;
value $jobfmt
'FA1'='Flight Attendant Trainee'
'FA2'='Junior Flight Attendant'
'FA3'='Senior Flight Attendant'
'ME1'='Mechanic Trainee'
'ME2'='Junior Mechanic'
'ME3'='Senior Mechanic'
'PT1'='Pilot Trainee'
'PT2'='Junior Pilot'
'PT3'='Senior Pilot'
'TA1'='Ticket Agent Trainee'
'TA2'='Junior Ticket Agent'
'TA3'='Senior Ticket Agent'
'NA1'='Junior Navigator'
'NA2'='Senior Navigator'
'BCK'='Baggage Checker'
'SCP'='Skycap';
run;
The FORMAT statement in this PROC MEANS step temporarily associates the
$JOBFMT. format with the variable Jobcode:
options nodate pageno=1
linesize=64 pagesize=60;
proc means data=proclib.payroll mean max;
class jobcode;
var salary;
format jobcode $jobfmt.;
title 'Summary Statistics for';
title2 'Each Job Code';
run;
PROC MEANS produces this output, which uses the $JOBFMT. format:
Note: Because formats are character strings, formats for numeric variables are
ignored when the values of the numeric variables are needed for mathematical
calculations.
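A small sketch with a hypothetical data set name shows the distinction. The format changes only how X is displayed, and the calculation uses the stored value:

```sas
data work.fmt_calc;
   x = 1.6;
   format x 1.;   /* X is displayed as 2 ...                        */
   y = x * 2;     /* ... but Y uses the stored 1.6, so Y is 3.2, not 4 */
run;

proc print data=work.fmt_calc;
run;
```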
In this example, the FORMAT statement in the DATA step permanently associates
the $YRFMT. format with the variable Year. Thus, when you use the variable in a
PROC step, the procedure uses the formatted values. The PROC MEANS step,
however, contains a FORMAT statement that dissociates the $YRFMT. format from
Year for this PROC MEANS step only. PROC MEANS uses the stored value for Year
in the output.
proc format;
value $yrfmt '1'='Freshman'
'2'='Sophomore'
'3'='Junior'
'4'='Senior';
run;
data debate;
input Name $ Gender $ Year $ GPA @@;
format year $yrfmt.;
datalines;
Capiccio m 1 3.598 Tucker m 1 3.901
Bagwell f 2 3.722 Berry m 2 3.198
Metcalf m 2 3.342 Gold f 3 3.609
Gray f 3 3.177 Syme f 3 3.883
Baglione f 4 4.000 Carr m 4 3.750
Hall m 4 3.574 Lewis m 4 3.421
;
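The PROC MEANS step described above is not shown. The following sketch of such a step assumes the DEBATE data set above. Its FORMAT statement lists Year without a format, which removes $YRFMT. for this step only:

```sas
/* FORMAT with a variable name and no format removes the        */
/* permanent $YRFMT. association for this PROC step only, so    */
/* the class levels are the stored values 1, 2, 3, and 4.       */
proc means data=debate mean maxdec=2;
   class year;
   var gpa;
   format year;
run;
```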
PROC MEANS produces this output, which does not use the $YRFMT. format:
Note: To ensure that SAS can find user-written formats, use the SAS system
option FMTSEARCH=. How to store formats is described in “Storing Informats and
Formats” on page 1080.
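As a sketch, formats can be stored in a specific catalog, and that catalog can be added to the search path. The catalog name WORK.MYFORMATS is hypothetical:

```sas
proc format library=work.myformats;
   value $yrfmt '1'='Freshman' '2'='Sophomore'
                '3'='Junior'   '4'='Senior';
run;

/* Add the catalog to the format search path so that SAS */
/* can find $YRFMT. when a later step refers to it.       */
options fmtsearch=(work.myformats);

data work.fmt_check;
   length label $ 12;
   label = put('2', $yrfmt.);
run;
```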
“Example 10: Printing All the Data Sets in a SAS Library” on page 1851 shows how to
print all the data sets in a library. You can use the same macro definition to perform
any procedure on all the data sets in a library. Simply replace the PROC PRINT
piece of the program with the appropriate procedure code.
Statistic Descriptions
The following table identifies common descriptive statistics that are available in
several Base SAS procedures. For more detailed information about available
statistics and theoretical information, see “Keywords and Formulas” on page 2700.
Table 2.1 Common Descriptive Statistics That Base SAS Procedures Calculate
n SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing
observation.
n VAR, STD, STDERR, and CV require at least two observations.
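The following minimal sketch illustrates these minimum-count rules (the data set names are hypothetical). With a single nonmissing observation, PROC MEANS computes the mean and range, but the standard deviation is missing:

```sas
data work.one_obs;
   x = 5;
run;

/* One nonmissing value: MEAN and RANGE are computed, */
/* but STD needs at least two observations.           */
proc means data=work.one_obs n mean range std;
   var x;
   output out=work.one_stats mean=x_mean std=x_std;
run;
```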
Prior to Version 7, most SAS procedures generated output that was designed for a
traditional line-printer. This type of output has limitations that prevent you from
getting the most value from your results:
n Traditional SAS output is limited to monospace fonts. With today's desktop
document editors and publishing systems, you need more versatility in printed
output.
n Some commonly used procedures do not produce output data sets. Before ODS,
if you wanted to use output from one of these procedures as input to another
procedure, then you relied on PROC PRINTTO and the DATA step to retrieve
results.
For more information about the Output Delivery System, see the SAS Output
Delivery System: User’s Guide.
3
Statements with the Same Function in Multiple Procedures
Overview
Several Base SAS statements have the same function in a number of Base SAS
procedures. Some of the statements are fully documented in the SAS DATA Step
Statements: Reference, and others are documented in this section.
Note: For procedure steps that create output, these statements apply only to the
INPUT data set.
The following list shows you where to find more information about each statement:
ATTRIB
affects the procedure output and the output data set. The ATTRIB statement
does not permanently alter the variables in the input data set. The LENGTH=
option has no effect. For the complete documentation, see the SAS DATA Step
Statements: Reference.
BY
orders the output according to the BY groups. See “BY” on page 74.
FORMAT
affects the procedure output and the output data set. The FORMAT statement
does not permanently alter the variables in the input data set. The DEFAULT=
option is not valid. For the complete documentation, see the SAS DATA Step
Statements: Reference.
FREQ
treats observations as if they appear multiple times in the input data set. See
“FREQ” on page 79.
INFORMAT
applies a pattern to or executes instructions for a data value to be read as input.
The DEFAULT= option is not valid. For the complete documentation, see the
SAS DATA Step Statements: Reference.
LABEL
affects the procedure output and the output data set. The LABEL statement
does not permanently alter the variables in the input data set except when it is
used with the MODIFY statement in PROC DATASETS. For complete
documentation, see the SAS DATA Step Statements: Reference.
QUIT
executes any statements that have not executed and ends the procedure. See
“QUIT” on page 81.
WEIGHT
specifies weights for analysis variables in the statistical calculations. See
“WEIGHT” on page 82.
WHERE
subsets the input data set by specifying certain conditions that each
observation must meet before it is available for processing. See “WHERE” on
page 88.
Statements
BY
For more information, see “Creating Titles That Contain BY-Group Information ” on
page 57.
BY <DESCENDING> variable-1
<… <DESCENDING> variable-n>
<NOTSORTED>;
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, then either the observations in the data set must be sorted by all
the variables that you specify, or they must be indexed appropriately. Variables
in a BY statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The observations are grouped in another way (for example, chronological
order).
Note: You cannot use the GROUPFORMAT option, which is available in the BY
statement in a DATA step, in a BY statement in any PROC step.
BY-Group Processing
Procedures create output for each BY group. For example, the elementary statistics
procedures and the scoring procedures perform separate analyses for each BY
group. The reporting procedures produce a report for each BY group.
Note: All Base SAS procedures except PROC PRINT process BY groups
independently. PROC PRINT can report the number of observations in each BY
group as well as the number of observations in all BY groups. Similarly, PROC
PRINT can sum numeric variables in each BY group and across all BY groups.
You can use only one BY statement in each PROC step. When you use a BY
statement, the procedure expects an input data set that is sorted by the order of
the BY variables or one that has an appropriate index. If your input data set does
not meet these criteria, then an error occurs. Either sort it with the SORT procedure
or create an appropriate index on the BY variables.
Depending on the order of your data, you might need to use the NOTSORTED or
DESCENDING option in the BY statement in the PROC step.
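For example, monthly data that is grouped chronologically is not in alphabetical order. This sketch uses hypothetical data and NOTSORTED, so that PROC MEANS starts a new BY group each time the value of Month changes:

```sas
data work.sales;
   input Month $ Amount;
   datalines;
Jan 10
Jan 12
Feb 8
Mar 15
Mar 9
;

/* Feb follows Jan, which is not ascending order, so NOTSORTED */
/* is needed to avoid a BY-group sorting error.                */
proc means data=work.sales noprint;
   by month notsorted;
   var amount;
   output out=work.month_totals sum=total;
run;
```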
n For more information about the BY statement, see SAS DATA Step Statements:
Reference.
n For more information about PROC SORT, see Chapter 64, “SORT Procedure,” on
page 2355.
n For more information about creating indexes, see “INDEX CREATE Statement”
on page 638.
3 The procedure continues adding observations to the current BY group until both
the internal and formatted values of the BY variable or variables change.
This process can have unexpected results if, for example, nonconsecutive internal
BY values share the same formatted value. In this case, the formatted value is
represented in different BY groups. Alternatively, if different consecutive internal
BY values share the same formatted value, then these observations are grouped
into the same BY group.
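The following sketch shows consecutive internal values that share formatted values. The AGEGRP. format and the data set names are hypothetical. In SASHELP.CLASS, the ages 11 through 16 collapse into two BY groups:

```sas
proc format;
   value agegrp low-12  = 'Preteen'
                13-high = 'Teen';
run;

proc sort data=sashelp.class out=work.class_by_age;
   by age;
run;

/* Ages 11-12 and 13-16 are consecutive and share formatted */
/* values, so PROC MEANS forms two BY groups, not six.      */
proc means data=work.class_by_age noprint;
   by age;
   format age agegrp.;
   var height;
   output out=work.age_groups mean=mean_height;
run;
```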
set. The primary data set is usually, but not always, the DATA= data set. A BY
statement always applies to the primary data set. The variables in the BY
statement must appear in the primary data set.
If the BY statement is applied to the secondary data set, then each BY variable that
exists in both data sets must have the same type, character or numeric, in both
data sets. The BY variables are required to have either the same formatted value or
the same unformatted value. Formatted values match only if both the formatted
lengths and the formatted values are the same. Unformatted values are not
required to have the same length in order to match. Unformatted character
values match if they are the same after trailing blanks are removed. Unformatted
numeric values match if they have the same value.
A secondary data set does not need to have all of the BY variables that are in the
primary data set. A procedure can define a subset of the BY variables for the
secondary data set. For example, if the primary data set has the BY variables
A,B,C,D, then the procedure can define the following BY variables on the secondary
data set:
n A
n A,B
n A,B,C
n A,B,C,D
If both the primary and secondary data sets have the same number of BY variables,
and all the BY variables have the same byte lengths and format lengths, then either
the unformatted values or the formatted values in the BY buffer (for all of the BY
variables) have to match. If they do not match, then each variable is compared. The
formatted values of each variable are compared first. The formatted lengths have
to match, and the formatted values have to match. If the formatted lengths and
values do not match, then the unformatted values are compared even if the byte
lengths are different.
If corresponding character variable lengths differ, then the longer character variable
can contain only trailing blanks for the extra characters. If the lengths of the
character variables are different, then the values match as long as they are the
same after stripping the trailing blanks. For example, ‘ABCD’ in the primary data set
matches ‘ABCD ’ in the secondary data set. If the secondary data set contained
‘ABCDEF’, then they would not match.
COMPARE STANDARD
CORR SUMMARY
FREQ TABULATE
MEANS TIMEPLOT
PLOT TRANSPOSE
PRINT UNIVARIATE
RANK
Note: In the SORT procedure, the BY statement specifies how to sort the data. In
the other procedures, the BY statement specifies how the data is currently sorted.
Example
This example uses a BY statement in a PROC PRINT step. There is output for each
value of the BY variable Year. The DEBATE data set is created in “Example:
Temporarily Dissociating a Format from a Variable” on page 67.
options nodate pageno=1 linesize=64
pagesize=40;
proc print data=debate noobs;
by year;
title 'Printing of Team Members';
title2 'by Year';
run;
FREQ
You can use a WEIGHT statement and a FREQ statement in the same step of any
procedure that supports both statements.
FREQ variable;
Required Arguments
variable
specifies a numeric variable whose value represents the frequency of the
observation. If you use the FREQ statement, then the procedure assumes that
each observation represents n observations, where n is the value of variable. If
variable is not an integer, then SAS truncates it. If variable is less than 1 or is
missing, then the procedure does not use that observation to calculate
statistics. If a FREQ statement does not appear, then each observation has a
default frequency of 1.
The sum of the values of the frequency variable represents the total number of
observations.
Procedures that support the FREQ statement include the following:
• MEANS or SUMMARY
• REPORT
• STANDARD
• TABULATE
• UNIVARIATE
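The FREQ rules described above can be sketched in Python. The sketch assumes that None or NaN stands in for a SAS missing value. The function names are hypothetical, and the code only mirrors the documented rules: noninteger values are truncated, and values that are less than 1 or missing are not used.

```python
import math
from typing import Iterable, Optional


def effective_frequency(value: Optional[float]) -> int:
    """Frequency that one observation contributes under the FREQ rules.

    Assumption: None or NaN represents a SAS missing value.
    """
    if value is None or math.isnan(value):
        return 0  # missing: the observation is not used
    n = math.trunc(value)  # noninteger values are truncated
    return n if n >= 1 else 0  # values less than 1 are not used


def total_observations(freqs: Iterable[Optional[float]]) -> int:
    """The total number of observations is the sum of the frequencies."""
    return sum(effective_frequency(v) for v in freqs)


print(total_observations([2.9, 1, 0.5, None, -3]))  # 3
```

Truncating before or after the "less than 1" test gives the same result. Any value below 1 truncates to 0 or less, and any value of 1 or more truncates to at least 1.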
Example
The data in this example represents a ship's course and speed (in nautical miles per
hour), recorded every hour. The frequency variable Hours represents the number of
hours that the ship maintained the same course and speed. Each of the following
PROC MEANS steps calculates average course and speed. The different results
demonstrate the effect of using Hours as a frequency variable.
The following PROC MEANS step does not use a frequency variable:
options nodate pageno=1 linesize=64 pagesize=40;
data track;
input Course Speed Hours @@;
datalines;
30 4 8 50 7 20
75 10 30 30 8 10
80 9 22 20 8 25
83 11 6 20 6 20
;
proc means data=track maxdec=2 n mean;
var course speed;
title 'Average Course and Speed';
run;
Without a frequency variable, each observation has a frequency of 1, and the total
number of observations is 8.
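When you add the statement freq hours; to the PROC MEANS step, the frequency of
each observation is the value of Hours. The total number of observations is then 141
(the sum of the values of the frequency variable), and course and speed settings
that the ship held for more hours contribute more to the averages.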
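To make the effect of Hours concrete, the following Python sketch recomputes N and the means from the eight TRACK observations. It uses exact fractions so that the results can be compared exactly. It mirrors the arithmetic of PROC MEANS with and without a FREQ variable, but it is not SAS code. The function name `mean_with_freq` is hypothetical.

```python
from fractions import Fraction

# TRACK data from the DATA step above: (Course, Speed, Hours).
track = [
    (30, 4, 8), (50, 7, 20), (75, 10, 30), (30, 8, 10),
    (80, 9, 22), (20, 8, 25), (83, 11, 6), (20, 6, 20),
]


def mean_with_freq(values, freqs):
    """Return (N, mean), counting each value freqs[i] times."""
    n = sum(freqs)
    return n, Fraction(sum(v * f for v, f in zip(values, freqs)), n)


course = [c for c, _, _ in track]
speed = [s for _, s, _ in track]
hours = [h for _, _, h in track]
ones = [1] * len(track)  # no FREQ statement: every frequency is 1

for label, freqs in (("no FREQ", ones), ("FREQ Hours", hours)):
    n, c_mean = mean_with_freq(course, freqs)
    _, s_mean = mean_with_freq(speed, freqs)
    # MAXDEC=2 in the SAS step displays the means with 2 decimal places.
    print(label, n, round(float(c_mean), 2), round(float(s_mean), 2))
```

Without a frequency variable, N is 8 and the mean course is 388/8 = 48.5. With Hours as the frequency variable, N is 141 and the mean course is 6948/141.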
QUIT
Executes any statements that have not executed and ends the procedure.
QUIT;
Procedures that support the QUIT statement include the following:
• DATASETS
• PLOT
• PMENU
• SQL
WEIGHT
You can use a WEIGHT statement and a FREQ statement in the same step of any
procedure that supports both statements.
WEIGHT variable;
Required Arguments
variable
specifies a numeric variable whose values weight the values of the analysis
variables. The values of the variable do not have to be integers. If the value of
the WEIGHT variable is less than 0, then the procedure converts the weight value
to zero and counts the observation in the total number of observations.
The procedure substitutes the value of the WEIGHT variable for $w_i$, which
appears in “Keywords and Formulas” on page 2700.
Procedures that support the WEIGHT statement include the following:
• FREQ
• MEANS or SUMMARY
• REPORT
• STANDARD
• TABULATE
• UNIVARIATE
Note: In PROC FREQ, the value of the variable in the WEIGHT statement
represents the frequency of occurrence for each observation. For more information,
see the documentation for PROC FREQ in Base SAS Procedures Guide:
Statistical Procedures.
When you use a WEIGHT statement to compute moments, you assume that the ith
observation has a variance equal to $\sigma^2/w_i$. When you specify VARDEF=DF
(the default), the computed variance is a weighted least squares estimate of $\sigma^2$.
Similarly, the computed standard deviation is an estimate of $\sigma$. Note that the
computed variance is not an estimate of the variance of the ith observation,
because that variance involves the observation's weight, which varies from
observation to observation.
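The weighted moments described above can be sketched in Python. This is an illustration under stated assumptions, not the PROC MEANS implementation. The sketch assumes that there are no missing values and that at least one weight is positive. The VARDEF= divisors (n − 1 for DF, n for N, Σw for WEIGHT, Σw − 1 for WDF) are the standard definitions from the PROC MEANS documentation and are not taken from this section. Following the WEIGHT statement description, negative weights become zero but still count toward n.

```python
import math
from typing import Dict, Sequence, Tuple


def weighted_moments(x: Sequence[float], w: Sequence[float],
                     vardef: str = "DF") -> Tuple[float, float, float]:
    """Return the weighted mean, variance, and standard deviation.

    Assumptions: no missing values and at least one positive weight.
    """
    w = [max(wi, 0.0) for wi in w]  # negative weight -> 0, still counted in n
    n = len(x)
    sum_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    # Weighted corrected sum of squares about the weighted mean.
    css = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    divisors: Dict[str, float] = {
        "DF": n - 1,  # default: weighted least squares estimate of sigma^2
        "N": n,
        "WEIGHT": sum_w,
        "WDF": sum_w - 1,
    }
    var = css / divisors[vardef]
    return mean, var, math.sqrt(var)


print(weighted_moments([1, 2, 3], [1, 1, -5]))
```

In the usage line, the third observation has its weight set to zero. It therefore has no influence on the mean, but it still counts in the n − 1 divisor.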
If the values of your variable are counts that represent the number of occurrences
of each observation, then use this variable in the FREQ statement rather than in the
WEIGHT statement. In this case, because the values are counts, they should be
integers. (The FREQ statement truncates any noninteger values.) The variance that
is computed with a FREQ variable is an estimate of the variance of an individual
observation, because all observations are assumed to have the same variance $\sigma^2$.
Note: If your data comes from a stratified sample where the weights $w_i$ represent
the strata weights, then neither the WEIGHT statement nor the FREQ statement
provides appropriate stratified estimates of the mean, variance, or variance of the
mean. To perform the appropriate analysis, consider using PROC SURVEYMEANS,
which is a SAS/STAT procedure that is documented in SAS/STAT User's Guide.
The SAS data set SIZE contains estimates of an object's size (ObjectSize), in
centimeters, made at each distance (Distance), in meters, together with a precision
measure (Precision) for each estimate. Notice that the largest deviation (an
overestimate by 20 cm) came at the greatest distance (7.5 meters from the object).
Using 1/Distance as the measure of precision gives more weight to estimates that
were made closer to the object and less weight to estimates that were made at
greater distances.
data size;
input Distance ObjectSize @@;
Precision=1/distance;
datalines;
1.5 30 1.5 20 1.5 30 1.5 25
3 43 3 33 3 25 3 30
4.5 25 4.5 36 4.5 48 4.5 33
6 43 6 36 6 23 6 48
7.5 30 7.5 25 7.5 50 7.5 38
;
The following PROC MEANS step computes the average estimate of the object size
while ignoring the weights. Without a WEIGHT variable, PROC MEANS uses the
default weight of 1 for every observation. Thus, the estimates of object size at all
distances are given equal weight. The average estimate of the object size exceeds
the actual size by 3.55 cm.
proc means data=size maxdec=3 n mean var stddev;
var objectsize;
title1 'Unweighted Analysis of the SIZE Data Set';
run;
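As a check on the arithmetic, this Python snippet recomputes the unweighted mean of ObjectSize from the 20 data values. This section does not state the actual object size of 30 cm. That value is inferred from the 3.55 cm difference quoted above, so treat it as an assumption.

```python
# SIZE data from the DATA step above: (Distance, ObjectSize) pairs.
size = [
    (1.5, 30), (1.5, 20), (1.5, 30), (1.5, 25),
    (3, 43), (3, 33), (3, 25), (3, 30),
    (4.5, 25), (4.5, 36), (4.5, 48), (4.5, 33),
    (6, 43), (6, 36), (6, 23), (6, 48),
    (7.5, 30), (7.5, 25), (7.5, 50), (7.5, 38),
]

ACTUAL_SIZE_CM = 30  # assumption: inferred from the 3.55 cm overestimate in the text

sizes = [s for _, s in size]
unweighted_mean = sum(sizes) / len(sizes)  # every observation has weight 1
print(round(unweighted_mean, 3), round(unweighted_mean - ACTUAL_SIZE_CM, 3))
```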
The next two PROC MEANS steps use the precision measure (Precision) in the
WEIGHT statement and show the effect of using different values of the VARDEF=
option. The first PROC step creates an output data set that contains the variance
and standard deviation. If you reduce the weighting of the estimates that are made
at greater distances, the weighted average estimate of the object size is closer to
the actual size.
proc means data=size maxdec=3 n mean var stddev;
weight precision;
var objectsize;
output out=wtstats var=Est_SigmaSq std=Est_Sigma;
title1 'Weighted Analysis Using Default VARDEF=DF';
run;
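The following Python sketch repeats the weighted calculation with exact fractions. It uses Precision = 1/Distance as the weight, then computes the weighted mean and the VARDEF=DF variance estimate. The DF divisor (the weighted corrected sum of squares divided by n - 1) is the standard VARDEF=DF definition and is an assumption here, not something stated in this section. The 30 cm actual size is also inferred from the text, not given.

```python
from fractions import Fraction

# SIZE data: (Distance, ObjectSize); Precision = 1/Distance as in the DATA step.
size = [
    (1.5, 30), (1.5, 20), (1.5, 30), (1.5, 25),
    (3, 43), (3, 33), (3, 25), (3, 30),
    (4.5, 25), (4.5, 36), (4.5, 48), (4.5, 33),
    (6, 43), (6, 36), (6, 23), (6, 48),
    (7.5, 30), (7.5, 25), (7.5, 50), (7.5, 38),
]
ACTUAL_SIZE_CM = 30  # assumption inferred from the text

# Fraction(str(d)) keeps 1.5, 4.5, and 7.5 exact.
precision = [1 / Fraction(str(d)) for d, _ in size]
objsize = [s for _, s in size]

sum_w = sum(precision)
weighted_mean = sum(w * x for w, x in zip(precision, objsize)) / sum_w
unweighted_mean = Fraction(sum(objsize), len(objsize))

# VARDEF=DF: weighted corrected sum of squares divided by n - 1.
css = sum(w * (x - weighted_mean) ** 2 for w, x in zip(precision, objsize))
est_sigmasq = css / (len(objsize) - 1)

print(float(weighted_mean), float(unweighted_mean), float(est_sigmasq))
```

The weighted mean is 17036/548 (about 31.09 cm), which is closer to 30 cm than the unweighted mean of 33.55 cm. This is consistent with the statement that down-weighting the distant estimates moves the average toward the actual size.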
The variance of the ith observation is assumed to be $\operatorname{var}(x_i) = \sigma^2/w_i$, where $w_i$ is the
weight for the ith observation. In the first PROC MEANS step, the computed
variance is an estimate of $\sigma^2$. In the second PROC MEANS step, the computed
variance is an estimate of $\frac{n-1}{n}\,\sigma^2/\bar{w}$, where $\bar{w}$ is the average weight. For large n,
this value is an approximate estimate of the variance of an observation with
average weight.
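A short expectation argument shows why the divisor determines which quantity the computed variance estimates. Like the text, it assumes independent observations with a common mean $\mu$ and $\operatorname{var}(x_i) = \sigma^2/w_i$. It writes $W = \sum_i w_i = n\bar{w}$ and $\bar{x}_w$ for the weighted mean.

```latex
\begin{aligned}
\operatorname{var}(\bar{x}_w) &= \frac{1}{W^2}\sum_{i=1}^{n} w_i^2\,\frac{\sigma^2}{w_i} = \frac{\sigma^2}{W},\\
E\Big[\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2\Big]
  &= \sum_{i=1}^{n} w_i \operatorname{var}(x_i) - W\,\operatorname{var}(\bar{x}_w)
   = n\sigma^2 - \sigma^2 = (n-1)\,\sigma^2 .
\end{aligned}
```

Dividing the weighted sum of squares by $n-1$ (VARDEF=DF) therefore estimates $\sigma^2$. Dividing by $W = n\bar{w}$ instead estimates $\frac{n-1}{n}\,\sigma^2/\bar{w}$. Dividing by $W-1$ gives approximately the same quantity when $W$ is large.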
The following statements create and print a data set with the weighted variance
and weighted standard deviation of each observation. The DATA step combines the
output data set that contains the variance and the standard deviation from the
weighted analysis with the original data set. The variance of each observation is
computed by dividing Est_SigmaSq (the estimate of $\sigma^2$ from the weighted analysis
when VARDEF=DF) by each observation's weight (Precision). The standard
deviation of each observation is computed by dividing Est_Sigma (the estimate of $\sigma$
from the weighted analysis when VARDEF=DF) by the square root of each
observation's weight (Precision).
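data wtsize(drop=_freq_ _type_);
set size;
/* Read the one-observation WTSTATS data set once; its values are retained. */
if _n_=1 then set wtstats;
Est_VarObs=est_sigmasq/precision;
Est_StdObs=est_sigma/sqrt(precision);
run;
proc print data=wtsize noobs;
run;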
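The two assignment statements in the DATA step are simple to check by hand. This Python sketch mirrors them. The σ² estimate of 4.0 in the usage lines is a hypothetical value, not output from the SAS steps. It only illustrates that observations with smaller weights (estimates made at greater distances) get larger estimated variances.

```python
import math
from typing import List, Tuple


def per_observation_estimates(est_sigmasq: float,
                              precision: List[float]) -> List[Tuple[float, float]]:
    """Mirror the DATA step: Est_VarObs = Est_SigmaSq / Precision and
    Est_StdObs = Est_Sigma / sqrt(Precision)."""
    est_sigma = math.sqrt(est_sigmasq)  # Est_Sigma is the square root of Est_SigmaSq
    return [(est_sigmasq / w, est_sigma / math.sqrt(w)) for w in precision]


# Hypothetical sigma^2 estimate, for illustration only.
rows = per_observation_estimates(4.0, [1 / 1.5, 1 / 3, 1 / 7.5])
for var_obs, std_obs in rows:
    print(round(var_obs, 3), round(std_obs, 3))
```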
WHERE
WHERE where-expression;
Required Arguments
where-expression
is a valid arithmetic or logical expression that generally consists of a sequence
of operands and operators. For more information about WHERE processing, see
SAS DATA Step Statements: Reference.
Procedures that support the WHERE statement include the following:
CALENDAR, CHART, COMPARE, CORR, FREQ, PLOT, PRINT, RANK, REPORT, SORT,
SQL, TABULATE, TRANSPOSE, UNIVARIATE
Details
• The CALENDAR and COMPARE procedures and the APPEND statement in
PROC DATASETS accept more than one input data set. For more information,
see the documentation for the specific procedure.
• To subset the output data set, use the WHERE= data set option. For more
information about WHERE=, see SAS DATA Step Statements: Reference.
Example
In this example, PROC PRINT prints only those observations that meet the
condition of the WHERE expression. The DEBATE data set is created in “Example:
Temporarily Dissociating a Format from a Variable” on page 67.
options nodate pageno=1 linesize=64
pagesize=40;
proc print data=debate noobs;
where gpa>3.5;
title 'Team Members with a GPA';
title2 'Greater than 3.5';
run;
Chapter 4
In-Database Processing of Base Procedures
• The data source might be capable of optimizing a query for execution in a highly
parallel and scalable fashion.
Beginning with SAS 9.2M3, Base SAS procedures were enhanced to process data
inside the Teradata, DB2, and Oracle data sources. In SAS 9.3, procedures were
enhanced to process data inside the Netezza data source. In SAS 9.4, procedures
have been enhanced to process data inside the Amazon Redshift, Aster, Google
BigQuery, Greenplum, Hadoop, HAWQ, Impala, Microsoft SQL Server, PostgreSQL,
SAP HANA, Snowflake, Vertica, and Yellowbrick data sources. The in-database
procedures are used to generate more sophisticated queries that allow the
aggregations and analytics to be run inside the data source.
All of these in-database procedures generate SQL queries. You use SAS/ACCESS or
SQL as the interface to the data source.
• PROC FREQ (see Base SAS Procedures Guide: Statistical Procedures): Produces
one-way to n-way tables; reports frequency counts; computes tests and measures
of association and agreement for two-way to n-way crosstabulation tables; can
compute exact tests and asymptotic tests; can create output data sets.
• PROC RANK (page 2006): Computes ranks for one or more numeric variables
across the observations of a SAS data set; can produce some rank scores.
• PROC REPORT (page 2154): Combines features of the PRINT, MEANS, and
TABULATE procedures with features of the DATA step in a single report-writing
tool that can produce a variety of reports.
• PROC SORT (page 2387): Orders SAS data set observations by the values of
one or more character or numeric variables.
• PROC TRANSPOSE (page 2655): Creates an output data set by restructuring the
values in a SAS data set, transposing selected variables into observations.
Chapter 5
CAS Processing of Base Procedures
Note: SAS 9.4 CAS-enabled procedures are designed to work with CAS in SAS Viya
3.5. In the SAS Viya 4 platform, some CAS-enabled procedures have been further
optimized to take advantage of multithreading and the cloud-native platform.
When processing data in the SAS Viya 4 CAS server, SAS recommends that you
submit the code from a SAS client on the SAS Viya 4 platform for best results.
The principle is to summarize and analyze large volumes of data in the in-memory
tables on the CAS server. The smaller, summarized results are transferred from the
server to the SAS client. The procedure then post-processes the summarized
results to produce additional statistics, Output Delivery System (ODS) objects, and
so on.
The core product of SAS Viya is SAS Visual Analytics. If you install SAS Visual
Analytics only, then you have access to a subset of the Base SAS procedures. If you
have SAS Viya with any other offering (in addition to SAS Visual Analytics) that is
licensed and installed, you also have access to all SAS 9.4 Base procedures. The
Base SAS Procedures Guide contains complete documentation for all Base
procedures.
Some Base SAS procedures execute code on the CAS server. See Chapter 5, “CAS
Processing of Base Procedures,” on page 93.
• PROC APPEND (page 109): Adds rows from a CAS table to the end of a SAS data
set, and adds rows from a SAS data set to the end of a CAS table.
• PROC CONTENTS (page 489): Shows the contents of a CAS table and prints the
directory of the caslib.
• PROC DELETE (page 785): Deletes SAS data sets and CAS tables.
• PROC FCMP (page 873): Enables you to create, test, and store SAS functions,
CALL routines, and subroutines before you use them in other SAS procedures or
in DATA steps.
• PROC LUA (page 1403): Enables you to run statements from the Lua
programming language within SAS code.
1 The DS2 procedure does not use the CAS LIBNAME engine to access in-memory tables. Instead, the
procedure accesses tables by caslib and name. For information and limitations, see “DS2 in CAS:
Concepts” in SAS DS2 Programmer’s Guide.
2 The FEDSQL procedure does not use the CAS LIBNAME engine to access in-memory tables. Instead,
the procedure accesses tables by caslib and name. For information and limitations, see SAS Viya:
FedSQL Programming for SAS Cloud Analytic Services.
By default, the CAS LIBNAME engine limits data transfer to 100 MB. If you reach
the limit, consider the following options:
1 You might be able to achieve the result that you want with a SAS Visual
Statistics procedure or SAS Visual Data Mining and Machine Learning
procedure.
2 You might be able to program with CAS actions so that the summarization is
performed by the server.
3 You can increase the limit with the CASDATALIMIT= system option or the
DATALIMIT= LIBNAME option or data set option.
BY-Group Processing
For procedures that support a BY statement, the information is passed to the CAS
server. This enables two optimizations:
1 The data does not need to be pre-sorted by the specified variables.
2 When the results are transferred by the server to the SAS client, the groups are
already formed. These results can be summarized results, as with PROC MEANS,
or observation-level results, as with PROC PRINT.
If you know in advance that you will perform BY-group processing, especially if you
have large in-memory tables, you can partition the in-memory table as a further
efficiency. When you partition an in-memory table with the same variables that you
use for BY-group processing, you avoid the performance penalty for forming the
groups each time you access the table.
Filtering Observations
For procedures that support a WHERE statement, the expression is sent to the server.
The server resolves the expression and subsets the data for the analysis. This can
greatly reduce processing time and data transfer from the server to the SAS client.
The same is true of the WHERE= data set option when it is used with the CAS
LIBNAME engine—the expression is sent to the server to subset the data.
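/* Start a CAS session named CASAUTO. */
cas casauto host="cloud.example.com" port=5570;
/* Assign a library that uses the CAS engine and the CASAUTO session. */
libname mycas cas sessref=casauto;
/* Load SASHELP.PRDSALE into memory on the CAS server. */
proc casutil;
load data=sashelp.prdsale;
quit;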
Related Documents
• SAS Cloud Analytic Services: Fundamentals
Chapter 6
Base SAS Procedures Documented in Other Publications
• GREDUCE: Processes map data sets so that they can be used to draw simpler
maps with fewer boundary points. The resulting output map data set, with an
added DENSITY variable, can be used as an input map data set by the GMAP
procedure. Documented in: SAS/GRAPH and Base SAS: Mapping Reference.
• SGDESIGN: Produces a graph from one or more input SAS data sets and a
user-defined ODS Graphics Designer (SGD) file. Documented in: SAS ODS
Graphics: Procedures Guide.
• SGMAP: Identifies the data sets needed for map areas, map response values,
and overlay plots. Documented in: SAS ODS Graphics: Procedures Guide.
• SGPANEL: Creates a panel of graph cells for the values of one or more
classification variables. Documented in: SAS ODS Graphics: Procedures Guide.
• SGPIE: Identifies the data set that contains the plot variables. The statement
also gives you the option to specify a description, control automatic legends,
and specify whether the chart background is opaque or transparent. Documented
in: SAS ODS Graphics: Procedures Guide.
• SGPLOT: Creates one or more plots and overlays them on a single set of axes.
Documented in: SAS ODS Graphics: Procedures Guide.
• SGSCATTER: Creates a paneled graph of scatter plots for multiple combinations
of variables, depending on the plot statement that you use. Documented in: SAS
ODS Graphics: Procedures Guide.
For information about all SAS procedures, see SAS Procedures by Name and
Product. The information in SAS Procedures by Name and Product is arranged
alphabetically by the procedures' names and by their products' names.
PART 2
Procedures
Chapter 7: APPEND Procedure (page 109)
Chapter 8: AUTHLIB Procedure (page 125)
Chapter 9: CALENDAR Procedure (page 211)
Chapter 10: CATALOG Procedure (page 305)
Chapter 11: CHART Procedure (page 335)
Chapter 12: CIMPORT Procedure (page 389)
Chapter 13: COMPARE Procedure (page 419)
Chapter 14: CONTENTS Procedure (page 489)
Chapter 15: COPY Procedure (page 521)
Chapter 16: CPORT Procedure (page 537)
Chapter 17: DATASETS Procedure (page 561)
Chapter 18: DATEKEYS Procedure (page 727)
Chapter 19: DELETE Procedure (page 785)
Chapter 20: DISPLAY Procedure (page 797)
Chapter 21: DS2 Procedure (page 801)
Chapter 22: DSTODS2 Procedure (page 837)
Chapter 23: EXPORT Procedure (page 849)
Chapter 24: FCMP Procedure (page 873)
Chapter 25: FCMP Special Functions and Call Routines (page 939)
Chapter 26: FCmp Function Editor (page 985)
Chapter 27: FEDSQL Procedure (page 1001)
Chapter 28: FMTC2ITM Procedure (page 1049)
Chapter 29: FONTREG Procedure (page 1057)
Chapter 30: FORMAT Procedure (page 1075)
Chapter 31: FSLIST Procedure (page 1205)
Chapter 32: GROOVY Procedure (page 1219)
Chapter 33: HADOOP Procedure (page 1233)
Chapter 34: HDMD Procedure (page 1259)
Chapter 35: HTTP Procedure (page 1279)
Chapter 36: IMPORT Procedure (page 1323)
Chapter 37: JAVAINFO Procedure (page 1355)
Chapter 38: JSON Procedure (page 1357)
Chapter 39: LUA Procedure (page 1403)
Chapter 40: MEANS Procedure (page 1463)
Chapter 41: MIGRATE Procedure (page 1559)
Chapter 42: OPTIONS Procedure (page 1581)
Chapter 43: OPTLOAD Procedure (page 1603)
Chapter 44: OPTSAVE Procedure (page 1609)
Chapter 45: PLOT Procedure (page 1617)
Chapter 46: PMENU Procedure (page 1691)
Chapter 47: PRESENV Procedure (page 1739)
Chapter 48: PRINT Procedure (page 1751)
Chapter 49: PRINTTO Procedure (page 1857)
Chapter 50: PRODUCT_STATUS Procedure (page 1881)
Chapter 51: PROTO Procedure (page 1885)
Chapter 52: PRTDEF Procedure (page 1913)
Chapter 53: PRTEXP Procedure (page 1933)
Chapter 54: PWENCODE Procedure (page 1939)
Chapter 55: QDEVICE Procedure (page 1951)
Chapter 56: RANK Procedure (page 2001)
Chapter 57: REGISTRY Procedure (page 2027)
Chapter 58: REPORT Procedure (page 2047)
Chapter 59: REPORT Procedure Windows (page 2229)
Chapter 60: S3 Procedure (page 2259)
Chapter 61: SCAPROC Procedure (page 2291)
Chapter 62: SCOREACCEL Procedure (page 2305)
Chapter 63: SOAP Procedure (page 2341)
Chapter 64: SORT Procedure (page 2355)
Chapter 65: SQL Procedure (page 2411)
Chapter 66: SQOOP Procedure (page 2413)
Chapter 67: STANDARD Procedure (page 2421)
Chapter 68: STREAM Procedure (page 2441)
Chapter 69: SUMMARY Procedure (page 2455)
Chapter 70: TABULATE Procedure (page 2459)
Chapter 71: TIMEPLOT Procedure (page 2621)
Chapter 72: TRANSPOSE Procedure (page 2651)
Chapter 73: XSL Procedure (page 2685)
Chapter 7
APPEND Procedure
For more information, see Chapter 5, “CAS Processing of Base Procedures,” on page
93.
Generally, the APPEND procedure functions the same as the APPEND statement in
the DATASETS procedure. The only difference between the APPEND procedure
and the APPEND statement in PROC DATASETS is the default for libref in the
BASE= and DATA= options. For PROC APPEND, the default is either Work or User.
For the APPEND statement, the default is the libref of the procedure input library.
APPEND: Add observations from one SAS data set to the end of another SAS data
set (Ex. , Ex. 3)
Example 1: Concatenating Two SAS Data Sets
Details
This example demonstrates the following tasks:
To create the Exp.Results and Exp.Sur data sets and print them out before using
this example to concatenate them, see “EXP Library” on page 2789.
Program
options pagesize=40 linesize=64 nodate pageno=1;
LIBNAME exp 'SAS-library';
proc append base=exp.results data=exp.sur force;
run;
proc print data=exp.results noobs;
title 'The Concatenated RESULTS Data Set';
run;
quit;
Program Description
This example appends one data set to the end of another data set.
The data set Exp.Sur contains the variable Wt6Mos, but the Exp.Results data set
does not.
Set the system options. The NODATE option suppresses the display of the date
and time in the output. The PAGENO= option specifies the starting page number.
The LINESIZE= option specifies the output line length, and the PAGESIZE= option
specifies the number of lines on an output page.
options pagesize=40 linesize=64 nodate pageno=1;
Append the data set Exp.Sur to the Exp.Results data set. PROC APPEND appends
the data set Exp.Sur to the data set Exp.Results. FORCE causes PROC APPEND to
carry out the append operation even though Exp.Sur has a variable that
Exp.Results does not. PROC APPEND does not add the Wt6Mos variable to
Exp.Results.
proc append base=exp.results data=exp.sur force;
run;
Output 7.3 Concatenating the Results and the Sur Data Sets
Example 2: Concatenating a CAS Table to a SAS Data Set
Details
This example demonstrates the following tasks:
• appending a CAS table to a SAS data set
• displaying the contents of the table, the data set, and the new data set after
appending
Program
options pagesize=40 linesize=64 nodate pageno=1;
libname sascas1 cas;
Program Description
This example appends a CAS table to the end of a SAS data set.
Set the system options. The NODATE option suppresses the display of the date
and time in the output. The PAGENO= option specifies the starting page number.
The LINESIZE= option specifies the output line length, and the PAGESIZE= option
specifies the number of lines on an output page.
options pagesize=40 linesize=64 nodate pageno=1;
The LIBNAME statements assign the CAS engine and BASE engine libraries.
libname sascas1 cas;
Check the contents of the table and data set. Use PROC CONTENTS to view the
data set and table.
proc contents data=saleslib.monthly;
run;
Append the SasCas1.LastMonth table to the SalesLib.Monthly data set. The data
for last month's sales, stored in a CAS table, is appended to the accumulated sales
data stored in a SAS data set. The CAS table uses VARCHAR to store the city and
address values, whereas the SAS data set stores these values in character
variables. Because the attributes of these variables differ, the FORCE option is
used in PROC APPEND.
proc append base=saleslib.monthly data=sascas1.lastmonth force;
run;
Retrieve total sales. Use PROC SQL to retrieve the store ID, address, city, state,
ZIP code, and total sales for stores whose total sales are greater than $2,000,000.
proc sql outobs=5;
select store_id, address, city, state, zipcode, totalsales
format dollar12.
from saleslib.monthly(obs=4)
where totalsales gt 2000000;
quit;
Example 3: Getting Sort Indicator Information
Details
This example demonstrates the following tasks:
• creating two data sets: one with no observations and one with observations
Program
data mtea;
length var1 8.;
stop;
run;
data phull;
length var1 8.;
do var1=1 to 100000;
output;
end;
run;
data mysort(sortedby=var1);
length var1 8.;
do var1=1 to 10;
output;
end;
run;
Program Description
The following example shows that a sort indicator can be inherited using the
GETSORT option with the APPEND procedure.
Create another data set with the same structure, but with many observations.
Sort the data set.
data phull;
length var1 8.;
do var1=1 to 100000;
output;
end;
run;
Create a sort indicator by using the SORTEDBY= data set option.
data mysort(sortedby=var1);
length var1 8.;
do var1=1 to 10;
output;
end;
run;
Output Examples
Output 7.8 Descending Sort Information
Output 7.9 Sort Indicator Information Using the SORTEDBY= Data Set Option
Chapter 8
AUTHLIB Procedure
You can use PROC AUTHLIB to do the following:
• purge replaced password and encryption key values, which are also known as
metadata-bound library credentials
• repair metadata-bound libraries by recovering security information, secured
library objects, and secured table objects
• remove the physical security information and metadata objects that protect a
metadata-bound library
• report inconsistencies between physical library contents and corresponding
metadata objects within a specified metadata-bound library
Users cannot access metadata-bound data sets from any release of SAS prior to
9.3M2.
Note: For a z/OS direct-access bound library that has been bound to metadata, the
constraint is slightly broader. Neither the library nor any of its members can be
accessed by earlier releases of SAS.
Concepts: AUTHLIB Procedure
Metadata-Bound Library
A metadata-bound library is a physical library that is tied to a corresponding
metadata secured library object. Each physical table within a metadata-bound
library has information in its header that points to a specific metadata object. The
pointer creates a security binding between the physical table and the metadata
object. The binding ensures that SAS universally enforces metadata-layer access
requirements for the physical table—regardless of how a user requests access from
SAS. For more information, see SAS Guide to Metadata-Bound Libraries.
The metadata-bound library passwords also prevent a user from exporting the
secured library and secured table objects from a SAS Metadata Server and then
importing them to a SAS Metadata Server that an unauthorized user created and
controls. This prevents the unauthorized user from using such objects on a server
where that user has modified the permissions.
captured from a transmission and presented to SAS as a password value in the SAS
language. Administrators might choose to use the PWENCODE procedure to
encode the passwords for use in a PROC AUTHLIB statement. Using an encoded
password prevents a casual observer from seeing the clear-text password in the
PROC AUTHLIB statements that the administrator types.
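As a minimal sketch, assuming a hypothetical clear-text password secret1, PROC PWENCODE writes an encoded form of the value to the SAS log when no OUT= file is given. The administrator would then paste that encoded string into the PW= option in place of the clear text. The encoded value itself is not reproduced here.

```sas
/* Encode a hypothetical clear-text password. With no OUT= option,   */
/* the encoded string (prefixed with a method tag such as {SAS002})  */
/* is written to the SAS log.                                        */
proc pwencode in='secret1';
run;

/* The encoded value from the log would then replace the clear text  */
/* in a PROC AUTHLIB statement, for example:                         */
/*    pw="{SAS002}<encoded-value-from-log>"                          */
```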
There are three passwords in the metadata-bound library set that correspond to
the READ=, WRITE=, and ALTER= passwords of SAS data sets. For greater
simplicity in administration of metadata-bound libraries, it is recommended that
you use the PW= option in PROC AUTHLIB statements to specify a single password
value. In the context of metadata-bound libraries, the READ=, WRITE=, and
ALTER= options do not create access distinctions. If you are concerned that a
single eight-character password does not meet your security requirements, then
you can choose to set three different password values (using READ=, WRITE=, and
ALTER=). Setting different values for these three options effectively creates a
24-character password. However, you must keep track of all password values that you have
assigned to a metadata-bound library. You must specify the passwords to do the
following:
• unbind the library
TIP All password values must be valid SAS names with a maximum length
of 8 characters.
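The following sketch contrasts with the recommended single PW= value by setting three different 8-character values, which together act as the 24-character password described above. The libref MYLIB, its path, the secured library name MyLib, and all password values are hypothetical; the step also assumes an established connection to the target SAS Metadata Server. Each password option sits on its own line, following the tip that this lets SAS blot the values in the log.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   create securedlibrary='MyLib'
      /* Three distinct values (each a valid SAS name of at most 8  */
      /* characters) instead of one PW= value. Record all three:    */
      /* they are needed later, for example to unbind the library.  */
      read=rdpass01
      write=wrpass02
      alter=alpass03;
run;
quit;
```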
CAUTION
If you lose the password (or passwords) for a metadata-bound library, then
you cannot unbind the library or change its passwords. Be sure to keep track of
passwords that you assign in the CREATE and MODIFY statements.
All of the password options in the CREATE, MODIFY, TABLES, and REMOVE
statements accept a syntax where two values can be specified separated by a slash
(/) (for example, PW=password-value/new-password-value). For CREATE and
MODIFY statements, a password value to set in the metadata or data sets is
obtained from the password value before the slash (/) if no new password value is
specified after the slash (/). The same is true for the REMOVE statement with the
additional possibility of specifying the slash (/) and no new password value to
indicate that the password should be removed from the data sets during the unbind
process. However, note that if the CREATE, MODIFY, or REMOVE statement also
specifies TABLESONLY=YES, then any new password values on those statements
are ignored.
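For example, the slash with no new value can be used when unbinding. In this hedged sketch (hypothetical libref, path, and password; the library is assumed to be bound already with PW=secret1), REMOVE unbinds the library and asks that the password be removed from the data sets. According to the note for the REMOVE statement, this does not work for data sets that use SAS Proprietary encryption unless ENCRYPT=NO is also specified.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* Current password before the slash; nothing after the slash   */
   /* means the password is removed from the data sets on unbind.  */
   remove
      pw=secret1/;
run;
quit;
```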
In general, you do not specify a new password value in a TABLES statement
following a CREATE or MODIFY statement. The new value is obtained from the
metadata to which the data set is bound or being bound. You can specify a new
password value in TABLES statements following a REMOVE statement if you want
different data sets to have unique passwords. In that case, you follow these steps:
1 Change the password for the data sets using a REMOVE statement with
TABLESONLY=YES and an individual TABLES statement for each unique
password.
2 Remove the metadata-bound library with a REMOVE statement without
TABLESONLY=YES.
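The two steps above might look like the following sketch. The data set names SALES and PAYROLL and all password values are hypothetical, and the library is assumed to be bound with PW=secret1.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

/* Step 1: REMOVE with TABLESONLY=YES acts only on the data sets;  */
/* each TABLES statement gives its data set a different password.  */
proc authlib lib=mylib;
   remove
      pw=secret1
      tablesonly=yes;
   tables sales /
      pw=secret1/salespw;
   tables payroll /
      pw=secret1/paypw;
run;
quit;

/* Step 2: REMOVE without TABLESONLY=YES unbinds the library.      */
proc authlib lib=mylib;
   remove
      pw=secret1;
run;
quit;
```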
See Also
• “Example 1: Binding a Physical Library That Contains Unprotected Data Sets” on
page 167
• “Example 2: Binding a Physical Library That Contains Password-Protected Data
Sets” on page 169
• “Example 3: Binding a Library When Existing Data Sets Are Protected with the
Same Passwords” on page 171
• “Example 4: Binding a Library When Existing Data Sets Are Protected with
Different Passwords” on page 173
• “Example 5: Changing Passwords on Data Sets” on page 176
• “Example 6: Changing Metadata-Bound Library Passwords” on page 178
considerations apply for these encrypted data sets when processed by the
AUTHLIB procedure. The same encryption passphrase is used for both AES and AES2,
but different keys are generated for the actual encryption. AES2 keys meet stricter
NIST (National Institute of Standards and Technology) guidelines.
CAUTION
AES encryption is supported only in SAS 9.4 and later releases. Do not use AES
encryption if the data sets need to be accessible by SAS 9.3M2. The AES2 key
generation algorithm is supported only in SAS 9.4M5 and later. Do not use it if data sets
need to be accessible by earlier releases.
CAUTION
Even if you record the encryption key in metadata for the library, you
should also record the key elsewhere when using ENCRYPT=AES or
ENCRYPT=AES2. If you lose the metadata and forget the ENCRYPTKEY= key
value, then you lose your data. SAS cannot assist you in recovering the
ENCRYPTKEY= key value. The following note is written to the log:
CAUTION
If data sets using AES encryption have referential integrity constraints, then
the encryption key for all data sets must be available when they are opened
for Update access. Normally, SAS requires that all data sets share the same
encryption key. With a recorded optional or required encryption key in metadata, related
data sets can have different keys. However, issues can arise if you change the
encryption key on one library that has data sets related to data sets in a different library.
See Also
• “Example 10: Binding a Library When Existing Data Sets Are SAS Proprietary
Encrypted” on page 184
• “Example 11: Binding a Library When Existing Data Sets Are AES-Encrypted” on
page 186
• “Example 12: Binding a Library with an Optional Recorded Encryption Key When
Existing AES-Encrypted Data Sets Have Different Encryption Keys” on page 189
• “Example 13: Binding a Library with Required AES Encryption When Existing
Data Sets Are Encrypted with the Same Encryption Key” on page 192
• “Example 14: Changing the Encryption Key on a Metadata-Bound Library That
Requires AES Encryption” on page 196
• ENCRYPT=
• ENCRYPTKEY=
The metadata-bound library encryption options are set in the CREATE statement
and can be changed with the MODIFY statement. The encryption of data sets in the
operating system library can be changed by the CREATE and MODIFY statements
and subordinate TABLES statements. The encryption of data sets can also be
changed if the library is unbound from the metadata by using a REMOVE
statement. However, note that if the CREATE, MODIFY, or REMOVE statement also
specifies TABLESONLY=YES, then any new encryption options on those
statements are ignored. Also note that when encryption options are changed for a
data set, the copy-in-place operation is automatically executed to re-encrypt the
data with the new options. For more information about the copy-in-place operation,
see “Copy-In-Place Operation” on page 166.
The ENCRYPT= option specifies the encryption type to use: AES, AES2, YES, or
NO. ENCRYPT=NO is not valid if encryption is required. To record or change a
metadata-bound library encryption key, ENCRYPT=AES or ENCRYPT=AES2 must
be specified. If you want to switch from a required encryption with a recorded AES
or AES2 encryption key to a required encryption with the SAS Proprietary
algorithm, then specify ENCRYPT=YES in the MODIFY statement. This process
also removes the recorded encryption key. To remove the recorded encryption key
when encryption is not required, specify ENCRYPT=NO in the MODIFY statement.
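As a hedged sketch of the first case, the following MODIFY statement switches a library that requires encryption with a recorded AES key over to the SAS Proprietary algorithm, which also removes the recorded key. Because the key is recorded in metadata, no ENCRYPTKEY= value is supplied here (the MODIFY statement details allow either a supplied or a recorded key). The libref, path, and password are hypothetical.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* ENCRYPT=YES selects SAS Proprietary encryption; the recorded  */
   /* AES key is removed from metadata, and the data sets are       */
   /* re-encrypted through the copy-in-place operation.             */
   modify
      pw=secret1
      encrypt=yes;
run;
quit;
```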
To change the encryption of data sets when unbinding with the REMOVE
statement, perform one of the following tasks:
• specify different encryption options for data sets that are unbound by using
TABLESONLY=YES and the encryption options on different TABLES statements
• change to a common encryption for all data sets that are unbound with the
ENCRYPT= option if TABLESONLY is not YES
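A sketch of the second case, assuming a hypothetical bound library whose password is secret1: because TABLESONLY is not YES, the single ENCRYPT= value applies to all data sets that are unbound. Here ENCRYPT=YES moves them to SAS Proprietary encryption, and the password stays on the data sets because no slash follows it.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* Unbind the library and give every unbound data set the same  */
   /* encryption (SAS Proprietary); the password stays in place.    */
   remove
      pw=secret1
      encrypt=yes;
run;
quit;
```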
For CREATE and MODIFY statements, the encryption key value to record in the
metadata or data sets is obtained from the encryption key value before the slash
(/) if
• ENCRYPT=AES or ENCRYPT=AES2
If encryption is required, then you do not specify a new key value in a TABLES
statement following a CREATE or MODIFY statement. The new value is obtained
from the metadata to which the data set is bound or being bound. If encryption is
not required or if you are following a REMOVE statement with TABLESONLY=YES,
then you can specify ENCRYPT=AES or ENCRYPT=AES2 and a new key value in
TABLES statements to have the data set re-encrypted with the new key value.
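For a library that does not require encryption, a TABLES statement might re-encrypt a single data set with a new AES key, as in this sketch. The data set name SALES, the key values oldkey1 and newkey1, and the password are hypothetical, and the sketch assumes that SALES is currently AES-encrypted with oldkey1.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   modify
      pw=secret1;
   /* Current key before the slash, new key after it; the data set */
   /* is re-encrypted with the new key value.                      */
   tables sales /
      encrypt=aes
      encryptkey=oldkey1/newkey1;
run;
quit;
```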
See Also
• “Example 15: Binding a Library with Existing Data Sets That Are AES-Encrypted
with Different Encryption Keys” on page 199
• “Example 16: Changing a Metadata-Bound Library to Require AES Encryption
When Existing Data Sets Are Encrypted with Different Encryption Keys” on
page 202
Beginning with SAS 9.4M3, the credentials are retained in metadata and can be
used by the system to open data sets that were not modified. This retention
enables the user to continue processing tables and the administrator to complete
the modification of credentials. The retained credentials are purged if a MODIFY
statement that is processing all of the tables in the library determines that all the
tables have been successfully changed to the new credentials.
An administrator might want to retain the credentials even after all the existing
tables have been processed successfully. The following are reasons for retaining
the credentials:
• It enables processing of view files that implemented row-level and column-level
security on underlying tables by using the old passwords in the view definition.
SAS does not know which view files might contain the passwords and does not
have the ability to modify them in the view file. The administrator must redefine
the views with the new passwords.
• It enables processing of data sets restored from backups made before the
modification.
An administrator who wants to retain older credentials and not purge them can
specify the PURGE=NO option in the MODIFY statement.
Note: The administrator must specify the PURGE=NO option in each MODIFY
statement that processes all tables until the administrator is ready for the replaced
credentials to be purged.
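A minimal sketch of changing the library password while keeping the replaced credential in metadata, for example for views whose definitions still contain the old password. The libref, path, and passwords are hypothetical.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* Replace secret1 with secret2 for the library and its tables, */
   /* but retain secret1 in metadata because PURGE=NO is given.    */
   modify
      pw=secret1/secret2
      purge=no;
run;
quit;
```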
If a library contains tables that do not follow the recommended best practices, automatic
deletion of old credentials might not occur when you issue a MODIFY statement for all tables.
For example, a MODIFY statement that changes the stored encryption key for a
library with optional encryption would not modify the keys of data sets whose keys
do not match the stored key. Because some data sets were not modified, the old
encryption key is not removed. In this case, the PURGE statement must be used to
remove the old credentials.
Note: Notes are written to the SAS log whenever a metadata-bound table is
accessed and the replaced credentials are used to successfully open the data set.
The note identifies the date and time that these credentials were replaced.
See Also
• “Example 15: Binding a Library with Existing Data Sets That Are AES-Encrypted
with Different Encryption Keys” on page 199
• “Example 16: Changing a Metadata-Bound Library to Require AES Encryption
When Existing Data Sets Are Encrypted with Different Encryption Keys” on
page 202
This can also occur if data sets are copied into the library by an operating
system copy utility.
If a data set was bound before being copied, then the data set is still protected by
the permissions that the users have in the secured table object to which it is bound
in the original secured library.
If a data set was not bound before being copied, then it is also not bound in the new
library or protected by the metadata permissions. If the data set has passwords,
then you must supply the appropriate passwords to access the data.
You can use the MODIFY statement to modify the passwords if necessary and to
bind the data set to a secured table object in the secured library object to which the
library is bound. For more information, see “Example 5: Changing Passwords on
Data Sets” on page 176.
CREATE | Create the secured library object in the SAS Metadata Server and record the physical | Ex. 1, Ex. 2, Ex. 3, Ex. 4,
Note: Data set names and variable names that end with numeric values larger
than a long integer cannot be used in numbered-range lists. For more
information, see “Restriction for Numbered Range Lists” in SAS Language Reference:
Concepts.
Syntax
PROC AUTHLIB <options>;
Optional Arguments
LIBRARY=libref
is the name of the physical library for which the secured library object is created
and the security information is stored.
Aliases LIB=
DDNAME=
DD=
NOWARN
suppresses the file-not-found error message when a data set in a TABLES
statement does not exist.
PWREQ=YES | NO
controls whether a dialog box prompts for a data set password in interactive mode.
Default NO
CREATE Statement
Binds a physical library and data sets in the library to metadata by generating corresponding
metadata objects in the SAS Metadata Repository and creating a record of the metadata objects in
the physical directory and data sets.
Requirement: The AUTHLIB CREATE statement requires a connection to the target metadata
server. For more requirements, see “Requirements for Using the AUTHLIB
Statements” on page 165.
Tip: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Syntax
CREATE
SECUREDLIBRARY='secured-library-name'
<SECUREDFOLDER='secured-folder-path'>
<LIBRARY=libref>
PW=all-password-value </ new-all-password-value> |
ALTER=alter-password-value </ new-alter-password-value>
READ=read-password-value </ new-read-password-value>
WRITE=write-password-value </ new-write-password-value>
<REQUIRE_ENCRYPTION=YES | NO>
<ENCRYPT=YES | NO | AES | AES2>
<ENCRYPTKEY=key-value </ new-key-value>>;
Required Arguments
SECUREDLIBRARY='secured-library-name'
names the secured library object in the SAS Metadata Server.
Alias SECLIB=
Restriction The total length of the secured library object pathname including
the fully qualified secured folder path cannot exceed 256 characters.
TIP All password values must be valid SAS names with a maximum
length of 8 characters.
Optional Arguments
SECUREDFOLDER='secured-folder-path'
is the name of the metadata folder within the /System/Secured Libraries
folder tree where the secured library object is created.
If the SECUREDFOLDER= option is not specified, then the metadata-bound
library is created directly in the /System/Secured Libraries folder of the
Foundation repository. If the SECUREDFOLDER= option does not begin with a
slash (/), then it is a relative path and the value is appended to
/System/Secured Libraries/ to find the folder. If the SECUREDFOLDER= option begins
with a slash (/), then it is an absolute path and the value must begin with
/System/Secured Libraries or /<repository_name>/System/Secured Libraries.
Alias SECFLDR=
Restriction The total length of the secured library object pathname including
the fully qualified secured folder path cannot exceed 256 characters.
Note The encryption key value for all the data sets in a library can be
stored in a metadata-bound library so that an authorized user does
not have to supply the encryption key value every time a data set
is opened. For more information, see “Considerations for Data File
Encryption” in the SAS Guide to Metadata-Bound Libraries.
LIBRARY=libref
name of the physical library for which the secured library object is created and
the security information is stored.
If the LIBRARY= option is not specified, then the physical library from the PROC
AUTHLIB statement is used.
Aliases LIB=
DDNAME=
DD=
REQUIRE_ENCRYPTION=YES | NO
YES specifies that all data sets in a metadata-bound library are
automatically encrypted.
NO specifies that data sets in a metadata-bound library are not
automatically encrypted.
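To illustrate how SECUREDFOLDER= is resolved, the following sketch uses a relative folder value, which is appended to /System/Secured Libraries/. The folder name Finance, the secured library name SalesLib, the libref, the path, and the password are hypothetical, and the folder is assumed to exist already.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* 'Finance' has no leading slash, so it is treated as relative: */
   /* /System/Secured Libraries/Finance. An absolute value would    */
   /* start with /System/Secured Libraries instead.                 */
   create securedlibrary='SalesLib'
      securedfolder='Finance'
      pw=secret1;
run;
quit;
```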
Details
Specifying Passwords
If your physical library does not contain password-protected data sets, then you
need to specify the new metadata-bound library password(s) with either the PW=
option or READ=, WRITE=, and ALTER= options in the CREATE statement. This is
the most common case. For an example, see “Example 1: Binding a Physical Library
That Contains Unprotected Data Sets” on page 167.
If your physical library contains some password-protected data sets that all share
the same current set of passwords, then you can specify the most restrictive
password on the data sets before a slash (/) in the CREATE statement password
option(s) and the new password(s) after the slash (/). For an example, see
“Example 3: Binding a Library When Existing Data Sets Are Protected with the
Same Passwords” on page 171.
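A hedged sketch of that case: every protected data set in the hypothetical library currently uses the password oldpw, which goes before the slash, and secret1 becomes the new metadata-bound library password. The libref, path, and secured library name are also hypothetical.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* Most restrictive existing data set password before the slash; */
   /* new metadata-bound library password after it.                 */
   create securedlibrary='MyLib'
      pw=oldpw/secret1;
run;
quit;
```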
If your physical library contains password-protected data sets with different sets of
passwords, then you can specify the data sets with each set of passwords on
separate TABLES statements (see “Example 4: Binding a Library When Existing
Data Sets Are Protected with Different Passwords” on page 173) or you can
subsequently use MODIFY and TABLES statements to change the passwords after
the library has been bound with the CREATE statement (see “Example 5: Changing
Passwords on Data Sets” on page 176).
If your physical library contains some AES-encrypted data sets that all share the
same AES or AES2 encryption key, then you can specify the key value following
ENCRYPTKEY= in the CREATE statement. If you want to record the key in
metadata, then specify ENCRYPT=AES or ENCRYPT=AES2. For an example, see
“Example 13: Binding a Library with Required AES Encryption When Existing Data
Sets Are Encrypted with the Same Encryption Key” on page 192.
If your physical library contains AES or AES2-encrypted data sets with different
encryption keys, then you can specify the data sets with each encryption key on
separate TABLES statements. For an example, see “Example 15: Binding a Library
with Existing Data Sets That Are AES-Encrypted with Different Encryption Keys”
on page 199.
TIP For more information, see “Considerations for Data File Encryption” in
the SAS Guide to Metadata-Bound Libraries.
For more information, see “ENCRYPTKEY= Data Set Option” in SAS Data Set
Options: Reference and “ENCRYPT= Data Set Option” in SAS Data Set Options:
Reference.
CAUTION
If data sets using AES encryption have referential integrity constraints, then
the encryption key for all data sets must be available when they are opened
for Update access. Normally, SAS requires that all data sets share the same
encryption key. With a recorded optional or required encryption key in metadata, related
data sets can have different keys. However, issues can arise if you change the
encryption key on one library that has data sets related to data sets in a different library.
CAUTION
For AES-encrypted data sets that are referentially related to one another,
follow these best practices to ensure that the data does not become
inaccessible: Store the encryption key in the library’s metadata. You can modify the
stored key, but do not remove the key from metadata and do not unbind the library.
CAUTION
Even if you record the encryption key in metadata for the library, then you
should also record the key elsewhere when using ENCRYPT=AES or
ENCRYPT=AES2. If you lose the metadata and forget the ENCRYPTKEY= key value,
144 Chapter 8 / AUTHLIB Procedure
then you lose your data. SAS cannot assist you in recovering the ENCRYPTKEY= key
value. The following note is written to the log:
MODIFY Statement
Modifies password and encryption key values for a metadata-bound library.
Requirement: The AUTHLIB MODIFY statement requires a connection to the target metadata
server. For more requirements, see “Requirements for Using the AUTHLIB
Statements” on page 165.
Tip: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Syntax
MODIFY
<LIBRARY=libref>
PW=all-password </ new-all-password> |
ALTER=alter-password </ new-alter-password>
READ=read-password </ new-read-password>
WRITE=write-password </ new-write-password>
<TABLESONLY=YES | NO>
<REQUIRE_ENCRYPTION=YES | NO>
<ENCRYPT=YES | NO | AES | AES2>
<ENCRYPTKEY=key-value </ new-key-value>>
<PURGE=YES | NO>;
Required Arguments
PW=all-password </ new-all-password >
modifies a single password for a metadata-bound library.
TIP All password values must be valid SAS names with a maximum
length of 8 characters.
Optional Arguments
ENCRYPT=YES | NO | AES | AES2
specifies the encryption type.
YES
specifies the SAS Proprietary algorithm.
NO
specifies no encryption.
AES
AES2
specifies Advanced Encryption Standard (AES) encryption and records the
key in metadata.
Note The encryption key value for all the data sets in a library can be
stored in a metadata-bound library so that an authorized user does
not have to supply the encryption key value every time a data set
is opened. For more information, see “Considerations for Data File
Encryption” in the SAS Guide to Metadata-Bound Libraries.
LIBRARY=libref
name of the physical library that is metadata-bound.
If the LIBRARY= option is not specified, then the physical library from the PROC
AUTHLIB statement is used.
PURGE=YES | NO
YES
removes all retained metadata-bound library credentials if all tables in the
library are successfully modified to the newer credentials.
Default YES
NO
does not remove replaced metadata-bound library credentials even if all
tables in the library were successfully modified.
REQUIRE_ENCRYPTION=YES | NO
YES
specifies that all data sets in a metadata-bound library are automatically
encrypted.
NO
specifies that data sets in a metadata-bound library are not automatically
encrypted.
TABLESONLY=YES | NO
specifies whether the MODIFY statement action is applied at the library level or
just to the tables. If TABLESONLY=NO, then the action is applied to the library
and data sets. If TABLESONLY=YES, then the action is applied only to the data
sets.
Default NO
Details
If your physical library is currently bound to a metadata library with one set of
passwords and you want to change the metadata-bound library passwords to
another set, then specify the current and new values for the metadata-bound
library passwords separated by a / in the MODIFY statement. For an example, see
“Example 6: Changing Metadata-Bound Library Passwords” on page 178.
If your physical library contains password-protected data sets with different sets of
passwords from the metadata-bound library passwords, then you can modify the
data set passwords to match the metadata-bound library required passwords using
the MODIFY and TABLES statements. Specify the metadata-bound library
passwords in the MODIFY statement. Specify the data sets with each set of
passwords in separate TABLES statements. For more information, see “Example 5:
Changing Passwords on Data Sets” on page 176.
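A sketch of that approach, assuming a hypothetical library bound with PW=secret1 and a hypothetical data set CUSTOMERS that still carries its own password, tblpw. Because the new value is taken from the metadata, the TABLES statement supplies only the data set's current password.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* Metadata-bound library password. */
   modify
      pw=secret1;
   /* Current data set password; the new one comes from metadata. */
   tables customers /
      pw=tblpw;
run;
quit;
```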
If you want to change encryption options for the library, then specify the new
options in the MODIFY statement. If your physical library contains AES-encrypted
data sets, then you must specify the ENCRYPTKEY= key value in the MODIFY or
TABLES statements or have a recorded encryption key for the library to make any
modifications to the encrypted data sets. For an example, see “Example 16:
Changing a Metadata-Bound Library to Require AES Encryption When Existing
Data Sets Are Encrypted with Different Encryption Keys” on page 202.
CAUTION
For AES-encrypted data sets that are referentially related to one another,
follow these best practices to ensure that the data does not become
inaccessible: Store the encryption key in the library’s metadata. You can modify the
stored key, but do not remove the key from metadata and do not unbind the library.
CAUTION
Even if you record the encryption key in metadata for the library, you should
also record the key elsewhere when using ENCRYPT=AES or
ENCRYPT=AES2. If you lose the metadata and forget the ENCRYPTKEY= key value,
then you lose your data. SAS cannot assist you in recovering the ENCRYPTKEY= key
value.
You might need to import a SecuredLibrary object from a backup package
for one of the following reasons:
• the SecuredLibrary object was inadvertently deleted
Password values and encryption key values are not exported with the
SecuredLibrary object. This prevents them from being imported to a rogue
Metadata Server. As a result, the passwords and any recorded encryption key
values need to be reset in the imported SecuredLibrary object. Until you do this,
LIBNAME assignments that refer to the imported SecuredLibrary object fail
with the following messages:
If you want to modify the passwords or encryption options for a secured library
object that is no longer bound to a physical library, then specify LIBRARY=_NONE_
with the SECUREDLIBRARY= and SECUREDFOLDER= options to locate the
secured library object.
MODIFY <LIBRARY=_NONE_ SECUREDLIBRARY=secured-library-name>
<SECUREDFOLDER=secured-folder-name>
CAUTION
Do not use LIB=_none_ when the secured library object is bound to a physical
library. LIB=_none_ causes the action to operate only on the secured library object and
has no effect on the physical data.
PURGE Statement
Removes any retained metadata-bound library credentials older than a given date of replacement.
Requirement: The AUTHLIB PURGE statement requires a connection to the target metadata
server. For more requirements, see “Requirements for Using the AUTHLIB
Statements” on page 165.
Tip: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Syntax
PURGE CREDENTIALS | CREDS <LIBRARY=libref>
PW=all-password |
ALTER=alter-password
READ=read-password
WRITE=write-password
BEFORE=datetime;
Required Arguments
PW=all-password
specifies a single password for a metadata-bound library.
ALTER=alter-password
specifies one of a maximum of three password values for a metadata-bound
library.
READ=read-password
specifies one of a maximum of three password values for a metadata-bound
library.
WRITE=write-password
specifies one of a maximum of three password values for a metadata-bound
library.
TIP All password values must be valid SAS names with a maximum
length of 8 characters.
BEFORE=datetime
specifies a datetime constant. Any replaced, but retained, credentials that were
replaced before this date and time are removed.
Optional Argument
LIBRARY=libref
name of the physical library for which the metadata-bound library is created and
the security information is stored.
If the LIBRARY= option is not specified, then the physical library from the PROC
AUTHLIB statement is used.
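A minimal sketch of purging retained credentials, assuming a hypothetical library whose current password is secret2 and a hypothetical cutoff of midnight on 1 January 2024, written as a SAS datetime constant.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* Remove retained credentials that were replaced before the    */
   /* given datetime; CREDS is the short form of CREDENTIALS.      */
   purge creds
      pw=secret2
      before='01JAN2024:00:00:00'dt;
run;
quit;
```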
Details
If you want to purge the credentials for a secured library object that is no longer
bound to a physical library, then specify LIBRARY=_NONE_ with the
SECUREDLIBRARY= and SECUREDFOLDER= options to locate the secured library
object.
PURGE CREDENTIALS <LIBRARY=_NONE_ SECUREDLIBRARY=secured-library-name>
<SECUREDFOLDER=secured-folder-name>
REMOVE Statement
Removes the physical security information and metadata objects that protect a metadata-bound
library so that it is no longer a metadata-bound library.
Requirement: The AUTHLIB REMOVE statement requires a connection to the target metadata
server. For more requirements, see “Requirements for Using the AUTHLIB
Statements” on page 165.
Note: If any data set uses SAS Proprietary Encryption, then you cannot remove
passwords unless you also specify ENCRYPT=NO to remove encryption.
Tips: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
If you do not want the non-secured data sets altered, then move all non-secured
data sets from the physical library before performing a REMOVE statement.
Before you use the REMOVE statement, consider running the REPORT statement.
The output from the REPORT statement identifies any physical tables that do not
have corresponding secured table objects in metadata. In the unusual circumstance
that such physical tables exist, their security location information is unaffected by
the REMOVE statement unless you specify AUTHADMIN=YES in the LIBNAME
Syntax
REMOVE <LIBRARY=libref>
PW=all-password </ <new-all-password>> |
ALTER=alter-password </ <new-alter-password>>
READ=read-password </ <new-read-password>>
WRITE=write-password </ <new-write-password>>
<TABLESONLY=YES | NO>
<ENCRYPT=YES | NO | AES | AES2>
<ENCRYPTKEY=key-value </ new-key-value>>;
Required Arguments
PW=all-password </ <new-all-password>>
specifies a single password for a metadata-bound library.
Optional Arguments
ENCRYPT=YES | NO | AES | AES2
specifies the encryption type.
See “ENCRYPTKEY= Data Set Option” in SAS Data Set Options: Reference
LIBRARY=libref
name of the physical library that is metadata-bound.
If the LIBRARY= option is not specified, then the physical library from the PROC
AUTHLIB statement is used.
TABLESONLY=YES | NO
specifies whether the REMOVE statement action is applied at the library level
or just to the tables. If TABLESONLY=NO, then the action is applied to the
library and data sets. If TABLESONLY=YES, then the action is applied only to
the individual data sets listed.
Default NO
Details
The REMOVE statement is used to unbind the metadata-bound library feature from
a SAS library and the data sets within it. This statement also removes the secured
library and secured table objects from the SAS Metadata Server. The data sets
remain in the physical library protected by the metadata-bound library passwords
unless the administrator specifies password modifications in the REMOVE
statement. Since the metadata-bound library feature is being removed and there is
no longer a requirement that the data set passwords match the metadata-bound
library passwords, the data set passwords can be removed by using a slash (/) after
the current password but not specifying a new password. If you choose to do this,
then you are warned in the SAS log that the data sets no longer have any SAS
protection. You can also modify the encryption key of data sets by specifying the
new key following a slash (/) in ENCRYPTKEY= and specifying ENCRYPT=AES or
ENCRYPT=AES2. You can change to SAS Proprietary Encryption by specifying
ENCRYPT=YES. You can remove all encryption by specifying ENCRYPT=NO.
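For example, the following hedged sketch unbinds a hypothetical library and, in the same step, re-encrypts its AES-encrypted data sets with a new key. It assumes that all data sets currently share the hypothetical key oldkey1 and that the password secret1 is kept.

```sas
libname mylib 'C:\mydata';            /* hypothetical physical library */

proc authlib lib=mylib;
   /* Keep the password; change the AES key from oldkey1 to newkey1 */
   /* while the library is unbound.                                 */
   remove
      pw=secret1
      encrypt=aes
      encryptkey=oldkey1/newkey1;
run;
quit;
```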
The REMOVE statement removes the location information from any data set if the
passwords specified match the metadata-bound library passwords stored in the
data set. Note also that if the data set is AES or AES2-encrypted, the encryption
Note: Ensure that all physical tables that are protected by a particular metadata-
bound library remain within that library (directory). This best practice maximizes
clarity and is essential in order for REMOVE statements to be fully effective.
Special circumstances (for example, a table that is host copied to another
directory) can prevent a REMOVE statement from unbinding the relocated data set.
CAUTION
If you have to unbind a library that contains AES-encrypted data sets that are
referentially related to other data sets, then either make sure that all related
data sets are no longer AES-encrypted or make sure that all related data sets
share the same encryption key. If you preserve AES encryption, the data will be
available only to those users who supply the key and have host-layer access.
REPAIR Statement
Recovers security information (in physical data) or secured library and table objects (in metadata).
Requirement: The AUTHLIB REPAIR statement requires a connection to the target metadata
server. For more requirements, see “Requirements for Using the AUTHLIB
Statements” on page 165.
Tip: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Syntax
REPAIR ADD | UPDATE | DELETE
LOCATION | METADATA
<SECUREDLIBRARY='secured-library-name'>
<SECUREDFOLDER='secured-folder-path'>
<LIBRARY=libref>
PW=all-password |
ALTER=alter-password
READ=read-password
WRITE=write-password
<TABLESONLY=YES | NO>
<ENCRYPT=YES | NO | AES | AES2>
154 Chapter 8 / AUTHLIB Procedure
<ENCRYPTKEY=key-value>;
Required Arguments
ADD | UPDATE | DELETE
specifies the action to perform. One of these actions must be specified.
LOCATION | METADATA
specifies whether the action applies to the physical security information in
the file system, to the metadata objects in the SAS Metadata Server, or to both.
PW=all-password
specifies a single password for a metadata-bound library.
ALTER=alter-password
assigns, changes, or removes an Alter password from the secured library object
and from the data sets in the physical library.
READ=read-password
assigns, changes, or removes a Read password from the secured library object
and from the data sets in the physical library.
WRITE=write-password
assigns, changes, or removes a Write password from the secured library object
and from the data sets in the physical library.
Optional Arguments
ENCRYPTKEY=key-value
specifies a key value for AES encryption.
Note The encryption key value for all the data sets in a library can be
stored in a metadata-bound library so that an authorized user does
not have to supply the encryption key value every time a data set
is opened. For more information, see “Considerations for Data File
Encryption” in the SAS Guide to Metadata-Bound Libraries.
LIBRARY=libref
name of the physical library where the security information is stored.
If the LIBRARY= option is not specified, then the physical library from the PROC
AUTHLIB statement is used.
SECUREDLIBRARY='secured-library-name'
names the secured library object in the SAS Metadata Server.
Alias SECLIB=
Restriction The total length of the secured library object pathname including
the fully qualified secured folder path cannot exceed 256 characters.
SECUREDFOLDER='secured-folder-path'
name of the metadata folder within a /System/Secured Libraries folder tree
where the secured library is repaired or re-created.
Alias SECFLDR=
Restriction The total length of the secured library object pathname including
the fully qualified secured folder path cannot exceed 256 characters.
TABLESONLY=YES | NO
specifies whether the REPAIR statement action is applied at the library level or
just to the tables. If TABLESONLY=NO, then the action is applied to the library
and the tables. If TABLESONLY=YES, then the action is applied only to the
tables. This is especially important for REPAIR because it gives the
administrator a way to delete specific secured table objects without deleting
the secured library and all secured tables.
Default NO
Details
The REPAIR statement feature that has been fully tested is REPAIR DELETE
LOCATION. Use this combination of options when you need to delete the security
information in a metadata-bound library, in data sets within the library, or in both,
without deleting the metadata objects.
A system administrator can encounter situations in which a data set still
has location information pointing to a secured table object that no longer exists.
REPAIR DELETE LOCATION is required to remove that location information before
the data set can be accessed in any other way.
When using the REPAIR statement, you must specify one of the ADD, UPDATE, or
DELETE actions. LOCATION, METADATA, or both specify whether the action
applies to the physical security information in the file system, to the metadata
objects in the SAS Metadata Server, or to both. Other than DELETE LOCATION,
these actions have not been fully tested and are considered pre-production
implementations. They are documented here but should be used only under the advice
and direction of Technical Support.
One or more TABLES statements can follow the REPAIR statement to perform the
same action on the specified data sets. An implicit TABLES _ALL_ is used if no
TABLES statement follows the REPAIR statement.
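A minimal sketch of the one fully tested combination, REPAIR DELETE LOCATION, limited to a single data set with TABLESONLY=YES and a TABLES statement. The libref ABCDE, the data set Product, and the password secretpw are placeholders borrowed from this chapter's examples. The session must be connected to the target metadata server, and you should back up both metadata and physical data before running anything like this.

```sas
/* Hypothetical sketch: delete stale secured table object location */
/* information from one data set without deleting any metadata     */
/* objects. abcde, product, and secretpw are placeholders.         */
proc authlib lib=abcde;
   repair delete location
      tablesonly=yes    /* act on tables only, not the library */
      pw=secretpw;
   tables product;      /* restrict the action to this data set */
run;
quit;
```

Omitting the TABLES statement would apply the action to every data set through the implicit TABLES _ALL_.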
CAUTION
Repairing a metadata-bound library is an advanced task. Make sure you have a
current backup (of both metadata and physical data) before you use this statement.
1 Create a new operating system directory and metadata-bound library, and then
use SAS Management Console to set appropriate default library permissions for
the new secured library object.
2 Access the current library with the AUTHADMIN=YES, AUTHPW= or
AUTHALTER=, AUTHWRITE=, and AUTHREAD= options in the LIBNAME
statement.
3 Use the SAS COPY procedure to copy the SAS data sets to the new library. Use
CONSTRAINT=YES if any data sets have referential integrity constraints. Use
SAS Management Console to set any permissions on the secured table objects
that differ from those inherited from the secured library object. The following is
an example of using the COPY procedure.
Metadata-bound library ABCDE also has data sets Employees, EmpInfo, and
Product. The REPORT statement has shown some inconsistencies between the
physical library contents and the corresponding metadata objects. This is an
example of a way to resolve these differences.
libname klmno "SAS-library-2";
NOTE: There were 5 observations read from the data set ABCDE.EMPLOYEES.
NOTE: The data set KLMNO.EMPLOYEES has 5 observations and 6 variables.
NOTE: Copying ABCDE.PRODUCT to KLMNO.PRODUCT (memtype=DATA).
NOTE: Data set ABCDE.PRODUCT.DATA has secured table object location information, but the
secured library object location information that it contains:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ABCDEEmps
SecuredLibraryGUID: 38C24AF4-9CF5-458B-8389-52092307007E
is different from the registered location for the library ABCDE:
SecuredFolder:
SecuredLibrary:
SecuredLibraryGUID:
The data set might have been copied to this directory with a host copy utility.
NOTE: Permissions are obtained from the secured table and the secured library objects that are
referenced in the header of the metadata-bound table.
NOTE: Metadata-bound library permissions are used for KLMNO.PRODUCT.DATA.
NOTE: Successfully added new secured table object "PRODUCT.DATA" to the secured library object
at path "/System/Secured Libraries/Department XYZZY/KLMNOEmps" for data set
KLMNO.PRODUCT.DATA.
NOTE: There were 5 observations read from the data set ABCDE.PRODUCT.
NOTE: The data set KLMNO.PRODUCT has 5 observations and 2 variables.
NOTE: PROCEDURE COPY used (Total process time):
real time 0.14 seconds
cpu time 0.04 seconds
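Steps 2 and 3 can be sketched as follows. The paths "SAS-library" and "SAS-library-2" and the password secretpw are placeholders, and the sketch assumes that step 1 has already bound the new directory to its own secured library object. The LIBNAME options are the ones named in step 2, and MEMTYPE=DATA limits the copy to data files, matching the memtype=DATA shown in the log above.

```sas
/* Step 2 (sketch): assign the current library with administrator */
/* access. The path and password are placeholders.                */
libname abcde "SAS-library"
   authadmin=yes
   authpw=secretpw;

/* Target: new directory already bound in step 1 (placeholder path). */
libname klmno "SAS-library-2";

/* Step 3 (sketch): copy the data sets, keeping integrity constraints. */
proc copy in=abcde out=klmno constraint=yes memtype=data;
run;
```

After the copy, use SAS Management Console to adjust any secured table object permissions that should differ from the library defaults.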
Note: The METADATA option is not supported with a REPAIR UPDATE action.
REPORT Statement
For a specified metadata-bound library, compares physical library contents with corresponding
metadata objects (in order to identify any inconsistencies).
Requirement: The AUTHLIB REPORT statement requires a connection to the target metadata
server. For more requirements, see “Requirements for Using the REPORT
Statement” on page 160.
Tip: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Example: “Example 8: Using the REPORT Statement” on page 181
Syntax
REPORT
<LIBRARY=libref>
<ENCRYPTKEY=key-value>;
Optional Arguments
LIBRARY=libref
name of the physical library on which to report binding information.
If the LIBRARY= option is not specified, then the physical library from the PROC
AUTHLIB statement is used.
ENCRYPTKEY=key-value
specifies a key value for AES encryption.
See “ENCRYPTKEY= Data Set Option” in SAS Data Set Options: Reference.
Details
In order to use the REPORT statement, you must meet the following criteria:
• The SAS session runs under an account that has host-layer Read access to the
target physical library. This is necessary in order to assign the libref.
• The SAS session connects to the metadata server as an identity that has the
ReadMetadata permission for the target secured library object and secured
table objects.
• If the library has secured library object location information and the secured
library object cannot be obtained, then you must use the
AUTHADMIN=YES option in the LIBNAME statement in order to assign the
library.
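A minimal sketch of the last case in the list above: the secured library object cannot be obtained, so the libref is assigned with AUTHADMIN=YES before the report is run. The path "SAS-library" is a placeholder, and the libref ABCDE follows the chapter's examples.

```sas
/* Hypothetical sketch: assign the libref with AUTHADMIN=YES when the */
/* secured library object cannot be obtained. Path is a placeholder.  */
libname abcde "SAS-library" authadmin=yes;

proc authlib lib=abcde;
   report;   /* compare physical contents with metadata objects */
run;
quit;
```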
Reporting Inconsistencies
The REPORT statement is used to report any inconsistencies between the physical
library contents and the corresponding metadata objects.
The REPORT statement reports the secured table and metadata-bound library
security information for each data set in the operating system directory of the
library. This data set information is grouped by the metadata-bound library
attributes that all the data sets share. If any data sets in the physical library are
correctly registered to the secured library object for the library and have the
required passwords, then those data sets and attributes will be listed as the first
grouping in the report. Subsequent groupings list data sets whose
passwords differ from the metadata-bound library passwords or whose
metadata-bound library security information does not match the metadata-bound
library location registered for the operating system directory.
TABLES Statement
Used after a CREATE, MODIFY, REMOVE, REPAIR, or REPORT statement to specify the tables that the
statement action processes. You can also specify the current passwords or encryption key value of
the data sets in the TABLES statement, if they differ from the metadata-bound library passwords or
recorded encryption key.
Default: When no TABLES statement is specified, the TABLES _ALL_ statement is the
default behavior.
Requirement: The TABLES statement must be preceded by a CREATE, MODIFY, REMOVE,
REPAIR, REPORT, or another TABLES statement.
Tip: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Syntax
TABLES SAS-dataset(s) | _ALL_ | _NONE_
</>
<PW=all-password> </ <new-all-password>> |
<ALTER=alter-password> </ <new-alter-password>>
<READ=read-password> </ <new-read-password>>
<WRITE=write-password> </ <new-write-password>>
<MEMTYPE= DATA | VIEW>
<ENCRYPT=YES | NO | AES | AES2>
<ENCRYPTKEY=key-value< / new-key-value>>;
Required Argument
SAS-dataset(s) | _ALL_ | _NONE_
SAS-dataset(s) names one or more SAS data sets.
_ALL_ specifies password options to apply to all data sets.
_NONE_ limits the action of the previous CREATE, MODIFY, or
REPAIR statement to the library level and does not apply
the action to any table.
Optional Arguments
/
is required if any options are included, such as passwords or MEMTYPE=. Here
is an example:
tables table-name / pw=password;
ENCRYPTKEY=key-value </ new-key-value>
Requirement The ENCRYPTKEY= option is required if the data file has AES
encryption and the key is not recorded for the library.
MEMTYPE=DATA | VIEW
Aliases MTYPE=
MT=
Default ALL
Tip All password values must be valid SAS names with a maximum
length of 8 characters.
Details
TABLES _NONE_ can be used to limit the action of the previous CREATE, MODIFY,
or REPAIR statement to the library level and not apply the action to any table.
TABLES _ALL_ is the default behavior if no TABLES statement is specified. You
might want to specify an explicit TABLES _ALL_ statement if you need to supply passwords or
encryption key values to use when opening all data sets.
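A minimal sketch of TABLES _NONE_, assuming library ABCDE is bound with the password secretpw and that you want to change only the passwords recorded for the secured library object. The new password newpw is a hypothetical placeholder.

```sas
/* Hypothetical sketch: change the library-level password only.   */
/* TABLES _NONE_ keeps MODIFY from applying the change to any     */
/* data set. abcde, secretpw, and newpw are placeholders.          */
proc authlib lib=abcde;
   modify
      pw=secretpw/newpw;
   tables _none_;
run;
quit;
```

Afterward the data sets still carry the old password, which is the kind of password mismatch that the REPORT statement lists in a separate grouping.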
n data sets with passwords or encryption key values matching the metadata-
bound library
When you use the TABLES statement after the REMOVE statement, an
ENCRYPT=NO option removes the encryption on the data set as the table is being
removed. For more information, see “Encrypted Data Set Considerations” on page
129. This process is necessary only if the administrator is trying to remove the
passwords or encryption of a data set.
Usage: AUTHLIB Procedure 165
If you are removing the binding of the physical library to metadata, or if the physical
library is not bound to a secured library, then you might want to change the data set
passwords or encryption to some other value. You are not restricted to changing to
a common metadata-bound library password or encryption. You can
specify both a current and a new password, or a current and a new encryption key,
separated by a slash (/) in the REMOVE statement. If you want the different data
sets to have unique passwords or encryption, then use the following two steps:
1 Change the data set passwords by using a REMOVE statement with
TABLESONLY=YES and a separate TABLES statement for each unique
password or encryption setting.
2 Unbind the metadata-bound library by using a REMOVE statement without
TABLESONLY=YES.
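The two steps above can be sketched as follows, assuming library ABCDE is bound with the password secretpw. The new per-data-set passwords emppw and infopw are hypothetical placeholders (valid SAS names of at most 8 characters), and each TABLES statement pairs the current password with the new one, separated by a slash.

```sas
/* Step 1 (sketch): tables only; give each data set its own password. */
/* abcde, secretpw, emppw, and infopw are placeholders.                */
proc authlib lib=abcde;
   remove tablesonly=yes
      pw=secretpw;
   tables employees /
      pw=secretpw/emppw;
   tables empinfo /
      pw=secretpw/infopw;
run;
quit;

/* Step 2 (sketch): remove the binding of the library itself. */
proc authlib lib=abcde;
   remove
      pw=secretpw;
run;
quit;
```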
n You must supply the password(s) in CREATE, MODIFY, REPAIR, and REMOVE
statements.
The REPORT statement requirements are less restrictive and are documented with
that statement.
Copy-In-Place Operation
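In the SAS 9.4 release, the copy-in-place operation is used to re-encrypt data sets.
PROC AUTHLIB renames the data set to a temporary name, copies it back to its original
name with the new passwords or encryption options, and then deletes the temporary data set.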
Details
This example demonstrates binding a physical library that contains data sets that
do not have passwords or AES encryption.
Program
proc authlib lib=zyxwvut;
create securedfolder="Department XYZZY"
securedlibrary="ZYXWVUTEmps"
pw=secretpw;
run;
quit;
Program Description
Library ZYXWVUT contains three data sets that do not have passwords:
Employees, EmpInfo, and Product.
proc authlib lib=zyxwvut;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. Specify metadata-bound
library passwords with the PW= option.
create securedfolder="Department XYZZY"
securedlibrary="ZYXWVUTEmps"
pw=secretpw;
run;
quit;
Results: The library and data sets are bound with the password secretpw. The
binding is straightforward because PROC AUTHLIB can open the unprotected data sets
without supplying any existing passwords or encryption keys.
Example 2: Binding a Physical Library That Contains Password-Protected Data Sets 169
Log Examples
Example Code 8.2 Unprotected Data Sets
NOTE: Successfully created a secured library object for the physical library ZYXWVUT and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ZYXWVUTEmps
SecuredLibraryGUID: 1A323C03-A3D8-4A83-9615-2BC2CB9FAAE2
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at
path "/System/Secured Libraries/Department XYZZY/ZYXWVUTEmps" for data set ZYXWVUT.EMPINFO.DATA.
NOTE: The passwords on ZYXWVUT.EMPINFO.DATA were successfully modified.
NOTE: Successfully added new secured table object "EMPLOYEES.DATA" to the secured library object at
path "/System/Secured Libraries/Department XYZZY/ZYXWVUTEmps" for data set
ZYXWVUT.EMPLOYEES.DATA.
NOTE: The passwords on ZYXWVUT.EMPLOYEES.DATA were successfully modified.
NOTE: Successfully added new secured table object "PRODUCT.DATA" to the secured library object at
path "/System/Secured Libraries/Department XYZZY/ZYXWVUTEmps" for data set ZYXWVUT.PRODUCT.DATA.
NOTE: The passwords on ZYXWVUT.PRODUCT.DATA were successfully modified.
86 quit;
Details
This example demonstrates what happens if you use a CREATE statement similar to the
one in Example 1 when the physical library contains two data sets that have the same
READ=, WRITE=, and ALTER= passwords and one data set that does not have any
passwords. None of the data sets are AES-encrypted.
Program
proc authlib lib=abcde;
create securedfolder="Department XYZZY"
securedlibrary="ABCDEEmps"
pw=secretpw;
run;
quit;
Program Description
Library ABCDE has Employees, EmpInfo, and Product data sets. However, in
library ABCDE, the Employees and EmpInfo data sets are protected with a READ=
password abcd, WRITE= password efgh, and an ALTER= password ijkl before the
library is secured by the statements. The third data set, Product, is not protected
with passwords.
proc authlib lib=abcde;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. Specify metadata-bound
library passwords with the PW= option.
create securedfolder="Department XYZZY"
securedlibrary="ABCDEEmps"
pw=secretpw;
run;
quit;
Results: The ABCDE library is bound, and the unprotected Product data set is
bound and assigned the password secretpw. The protected data sets are not bound, and their
passwords did not change, because their current passwords were not specified.
Example 3: Binding a Library When Existing Data Sets Are Protected with the Same
Passwords 171
Log Examples
Example Code 8.3 Password-Protected Data Sets
NOTE: Successfully created a secured library object for the physical library ABCDE and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ABCDEEmps
SecuredLibraryGUID: 4881263D-C346-41F7-AC49-BF9181AF13D2
ERROR: The ALTER password is the most restrictive on ABCDE.EMPINFO.DATA. You must supply its value in
order to alter or add any passwords.
ERROR: The ALTER password is the most restrictive on ABCDE.EMPLOYEES.DATA. You must supply its value
in order to alter or add any passwords.
NOTE: Successfully added new secured table object "PRODUCT.DATA" to the secured library object at
path "/System/Secured Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.PRODUCT.DATA.
NOTE: The passwords on ABCDE.PRODUCT.DATA were successfully modified.
NOTE: Some statement actions not processed because of errors noted above.
186 quit;
NOTE: The SAS System stopped processing this step because of errors.
Details
This example demonstrates how to specify the passwords for the Employees and
EmpInfo data sets from the preceding example in the PROC AUTHLIB CREATE
statement. None of the data sets are AES-encrypted.
Program
proc authlib lib=abcde;
create securedlibrary="ABCDEEmps"
securedfolder="Department XYZZY"
pw=ijkl/secretpw;
run;
quit;
Program Description
As in Example 2, library ABCDE has Employees, EmpInfo, and Product data sets. The
Employees and EmpInfo data sets are protected with a READ=
password abcd, WRITE= password efgh, and ALTER= password ijkl before the
library is secured by the statements. The third data set, Product, is not protected
with any passwords.
proc authlib lib=abcde;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. Specify the ALTER=
password ijkl for the data sets in the PW= argument before the new password
secretpw, separated by a slash (/).
create securedlibrary="ABCDEEmps"
securedfolder="Department XYZZY"
pw=ijkl/secretpw;
run;
quit;
Results: The library ABCDE is bound. All three data sets are bound with the same
password secretpw.
Example 4: Binding a Library When Existing Data Sets Are Protected with Different
Passwords 173
Log Examples
Example Code 8.4 Securing a Library with Data Sets That Are Protected with the Same
Passwords
NOTE: Successfully created a secured library object for the physical library ABCDE and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ABCDEEmps
SecuredLibraryGUID: 9F746F86-2336-4E2F-A67E-BFB77DEC27F0
NOTE: Successfully added new secured table object "DEPTNAME.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.DEPTNAME.DATA.
NOTE: The passwords on ABCDE.DEPTNAME.DATA were successfully modified.
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPINFO.DATA.
NOTE: The passwords on ABCDE.EMPINFO.DATA were successfully modified.
NOTE: Successfully added new secured table object "EMPLOYEE.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPLOYEE.DATA.
NOTE: The passwords on ABCDE.EMPLOYEE.DATA were successfully modified.
44 quit;
Details
This example demonstrates how to bind the library KLMNO, which contains three
data sets with different passwords. None of the data sets are AES-encrypted. It
also demonstrates creating a longer metadata-bound library password by
specifying the READ=, WRITE=, and ALTER= password options.
Program
proc authlib lib=klmno;
create securedlibrary="KLMNOEmps"
securedfolder="Department XYZZY"
read=abcdefgh
write=ijklmno
alter=pqrstuvw;
tables employees /
pw=lmno;
tables empinfo /
read=abcd
write=efgh
alter=ijkl;
tables product;
run;
quit;
Program Description
Library KLMNO has Employees, EmpInfo, and Product data sets. The Employees
data set is protected with the PW= password lmno. The EmpInfo data set is
protected with a READ= password abcd, a WRITE= password efgh, and an ALTER=
password ijkl. The Product data set is not protected.
proc authlib lib=klmno;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. Specify the values for
READ= password abcdefgh, WRITE= password ijklmno, and ALTER= password
pqrstuvw to create a longer metadata-bound library password.
create securedlibrary="KLMNOEmps"
securedfolder="Department XYZZY"
read=abcdefgh
write=ijklmno
alter=pqrstuvw;
Use the TABLES statement to specify the current password for each data set.
When using TABLES statements, a TABLES statement must be specified for all
data sets.
tables employees /
pw=lmno;
tables empinfo /
read=abcd
write=efgh
alter=ijkl;
tables product;
run;
quit;
Results: The library KLMNO is bound, and all three data sets are bound with the
same passwords. The passwords are READ= password abcdefgh, WRITE=
password ijklmno, and ALTER= password pqrstuvw.
Log Examples
Example Code 8.5 Securing a Library with Existing Data Sets That Are Protected with Different
Passwords
NOTE: Successfully created a secured library object for the physical library KLMNO and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: KLMNOEmps
SecuredLibraryGUID: BC74E81F-E86B-402E-8C16-F9A94A078F81
NOTE: Successfully added new secured table object "EMPLOYEES.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.EMPLOYEES.DATA.
NOTE: The passwords on KLMNO.EMPLOYEES.DATA were successfully modified.
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.EMPINFO.DATA.
NOTE: The passwords on KLMNO.EMPINFO.DATA were successfully modified.
NOTE: Successfully added new secured table object "PRODUCT.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.PRODUCT.DATA.
NOTE: The passwords on KLMNO.PRODUCT.DATA were successfully modified.
193 quit;
Details
This example shows a different approach for modifying the passwords of existing
data sets to match the metadata-bound library passwords. It uses the MODIFY
statement. Here, the MODIFY statement is used to modify the data set passwords
of the Employees and EmpInfo data sets from Example 2 on page 169 to match the
metadata-bound library password. Neither of these data sets is AES-encrypted.
The MODIFY statement can also be used to modify the data set passwords of data
sets that are copied into a metadata-bound library by operating system commands
after the library has been bound.
Program
proc authlib lib=abcde;
modify tablesonly=yes
pw=secretpw;
tables _all_ /
pw=ijkl/secretpw;
run;
quit;
Program Description
Library ABCDE has Employees, EmpInfo, and Product data sets. The library is
bound with metadata-bound library password secretpw. However, in library ABCDE,
the Employees and EmpInfo data sets are not bound to the library and are
protected with an ALTER= password ijkl. The third data set, Product, is already
bound.
proc authlib lib=abcde;
The MODIFY statement is used to modify the data set passwords of the
Employees and EmpInfo data sets to match the metadata-bound library
password. The TABLESONLY=YES option specifies that only table passwords are modified.
modify tablesonly=yes
pw=secretpw;
A TABLES statement must be specified. In the TABLES statement, specify the existing
ALTER password of the data sets in the PW= option, followed by a slash (/) and the
metadata-bound library password.
tables _all_ /
pw=ijkl/secretpw;
run;
quit;
Results: All three data sets are now bound with the secretpw password.
Log Examples
Example Code 8.6 Changing Data Set Passwords
Example 6: Changing Metadata-Bound Library Passwords
Details
This example demonstrates how to use the MODIFY statement to change the
library passwords if you believe that the metadata-bound library passwords have
been compromised. The following code changes the library passwords and the data
set passwords of all data sets in the library that use the specified passwords or do
not have a password. In this example, no data sets are AES-encrypted. See later
examples if your library has AES-encrypted data.
Program
proc authlib lib=abcde;
modify securedlibrary="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secretpw/new-password;
run;
quit;
Program Description
Library ABCDE requires a password change.
proc authlib lib=abcde;
Use the MODIFY statement to change the library passwords and the data set
passwords. The name of the secured library object and the name of the
metadata folder are optional, but you can specify them to ensure that the library is
bound to that secured library object before the change is made. SAS Management
Console uses this form when it submits the code from the Modify action, to ensure
that the correct operating system library path was specified.
modify securedlibrary="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secretpw/new-password;
run;
quit;
Results: The library ABCDE remains bound, and the library password is changed to
new-password. All three data sets remain bound, and their passwords are also
changed to new-password. An error message would be written to the SAS log
for any data set that had a password other than secretpw.
Log Examples
Example Code 8.7 Changing Metadata-bound Library Passwords
NOTE: The passwords for the secured library object with path "/System/Secured Libraries/Department
XYZZY/ABCDEEmps" were successfully modified."
NOTE: The passwords on ABCDE.EMPINFO.DATA were successfully modified.
NOTE: The passwords on ABCDE.EMPLOYEES.DATA were successfully modified.
NOTE: The passwords on ABCDE.PRODUCT.DATA were successfully modified.
223 quit;
Details
This example demonstrates how to unbind a metadata-bound library. The code
does the following:
• deletes metadata that describes the library and its tables from the SAS
Metadata Repository
• removes security bindings from the physical library and data sets
• removes the assigned password from the data sets, leaving them unprotected
The slash (/) after the password is optional. It removes the password from the data
sets or, when a new password follows it, replaces the password. If a library is bound
with READ=, WRITE=, and ALTER= passwords, as in Example 4 on page 173, then you
must specify all of the passwords, and each must be followed by a slash (/). None of
the data sets are AES-encrypted.
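For a library bound with separate READ=, WRITE=, and ALTER= passwords, the REMOVE statement lists all three, each followed by a slash. The following sketch reuses library KLMNO and the passwords from Example 4; it is an illustration under that assumption, not output from this guide. Each option is on its own line so that it is blotted in the log.

```sas
/* Sketch: unbind library KLMNO from Example 4 and remove all three */
/* passwords from its data sets (slash with no new value on each).  */
proc authlib lib=klmno;
   remove
      read=abcdefgh/
      write=ijklmno/
      alter=pqrstuvw/;
run;
quit;
```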
Program
proc authlib lib=abcde;
remove
pw=currntpw/;
run;
quit;
Program Description
Unbinding the metadata-bound library ABCDE.
proc authlib lib=abcde;
Use the REMOVE statement to unbind the metadata-bound library. The slash (/)
after the password is used to remove the password from the data sets.
remove
pw=currntpw/;
run;
quit;
Results: The library ABCDE and all the data sets that are bound to it are no longer
bound. All passwords are removed from the unbound data sets, making them
unprotected.
Log Examples
Example Code 8.8 Unbinding a Metadata-Bound Library
WARNING: Some or all the passwords on ABCDE.DEPTNAME.DATA were removed along with the secured library
object location,
leaving the data set unprotected.
NOTE: The secured table object location for ABCDE.DEPTNAME.DATA was successfully removed.
WARNING: Some or all the passwords on ABCDE.EMPINFO.DATA were removed along with the secured library
object location, leaving
the data set unprotected.
NOTE: The secured table object location for ABCDE.EMPINFO.DATA was successfully removed.
WARNING: Some or all the passwords on ABCDE.EMPLOYEE.DATA were removed along with the secured library
object location,
leaving the data set unprotected.
NOTE: The secured table object location for ABCDE.EMPLOYEE.DATA was successfully removed.
NOTE: Successfully deleted the secured library object that was located at:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ABCDEEmps
SecuredLibraryGUID: 9F746F86-2336-4E2F-A67E-BFB77DEC27F0
NOTE: Successfully deleted the recorded location of the secured library object for the physical
library ABCDE.
199 quit;
Details
This example demonstrates how to check a library's bindings.
Program
proc authlib lib=abcde;
report;
run;
quit;
Program Description
Check the bindings of the metadata-bound library ABCDE.
proc authlib lib=abcde;
Results: For the REPORT statement results, see “Output Example” on page 183.
Log Examples
Example Code 8.9 Creating a Report
52 quit;
Example 9: Using the TABLES Statement 183
Output Example
Output 8.2 REPORT Statement Results for the ABCDE Library
Details
Example 4 on page 173 demonstrates how to use the TABLES statement.
Details
The following example demonstrates how to bind and change passwords on SAS
Proprietary encrypted data sets.
Program
proc authlib lib=klmno;
create securedlibrary="KLMNOEmps"
securedfolder="Department XYZZY"
pw=pqrstuvw;
tables employees /
pw=lmno;
tables empinfo /
read=abcd;
tables product;
run;
quit;
Program Description
Library KLMNO has three data sets: Employees, EmpInfo, and Product. In this
library, the Employees data set is protected with the PW= password lmno. The
EmpInfo data set is protected with a READ= password abcd. Both Employees and
EmpInfo data sets are SAS Proprietary encrypted. The Product data set is not
protected.
Example 10: Binding a Library When Existing Data Sets Are SAS Proprietary Encrypted
185
proc authlib lib=klmno;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. Set the library password to
pqrstuvw.
create securedlibrary="KLMNOEmps"
securedfolder="Department XYZZY"
pw=pqrstuvw;
Because these data sets have different passwords, a TABLES statement must be
specified for all data sets in order to change their passwords.
tables employees /
pw=lmno;
tables empinfo /
read=abcd;
tables product;
run;
quit;
Results: The library KLMNO is bound. All three data sets are bound and use the
same PW= password pqrstuvw. Data sets Employees and EmpInfo are copied in
place to encrypt with the password pqrstuvw. Data set Product is bound, but not
encrypted.
Log Examples
Example Code 8.10 TABLES Statement for the KLMNO Library Containing a SAS Proprietary
Data Set
NOTE: Successfully created a secured library object for the physical library KLMNO and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: KLMNOEmps
SecuredLibraryGUID: E71881CD-8C54-4E21-A8B5-FD7D4FBDAA7D
NOTE: Copying data set KLMNO.EMPLOYEES in place to encrypt with the new secured library passwords or
encryption options.
NOTE: Renaming the data set KLMNO.EMPLOYEES to KLMNO.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set KLMNO.__TEMP_ENCRYPT_FILE_NAME__ to KLMNO.EMPLOYEES.
NOTE: Metadata-bound library permissions are used for KLMNO.EMPLOYEES.DATA.
NOTE: Successfully added new secured table object "EMPLOYEES.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.EMPLOYEES.DATA.
NOTE: There were 5 observations read from the data set KLMNO.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set KLMNO.EMPLOYEES has 5 observations and 6 variables.
NOTE: Deleting the data set KLMNO.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on KLMNO.EMPLOYEES.DATA were successfully modified.
NOTE: Copying data set KLMNO.EMPINFO in place to encrypt with the new secured library passwords or
encryption options.
NOTE: Renaming the data set KLMNO.EMPINFO to KLMNO.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set KLMNO.__TEMP_ENCRYPT_FILE_NAME__ to KLMNO.EMPINFO.
NOTE: Metadata-bound library permissions are used for KLMNO.EMPINFO.DATA.
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.EMPINFO.DATA.
NOTE: There were 5 observations read from the data set KLMNO.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set KLMNO.EMPINFO has 5 observations and 6 variables.
NOTE: Deleting the data set KLMNO.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on KLMNO.EMPINFO.DATA were successfully modified.
NOTE: The passwords on KLMNO.PRODUCT.DATA do not require modification.
NOTE: Successfully added new secured table object "PRODUCT.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.PRODUCT.DATA.
275 quit;
CREATE statement options: SECUREDLIBRARY=, SECUREDFOLDER=
TABLES statement option: ENCRYPTKEY=
Details
This example demonstrates how to bind data sets that are AES-encrypted. None of
the data sets have passwords.
CAUTION
SAS strongly recommends that you not keep AES-encrypted data sets with
different encryption keys in a metadata-bound library, as this example does.
Instead, SAS recommends that you record a default encryption key in metadata
and convert all AES-encrypted data sets to use that key. That way, your users
and programs do not have to specify the key when opening the data sets. The
examples that follow this one show how to do this.
Program
proc authlib lib=klmno;
create securedlibrary="KLMNOEmps"
securedfolder="Department XYZZY"
pw=pqrstuvw;
tables employees /
encryptkey=lmno;
tables empinfo /
encryptkey=abcd;
tables product;
run;
quit;
Program Description
Library KLMNO has three data sets: Employees, EmpInfo, and Product. In this
library, the Employees data set is AES-encrypted and has the ENCRYPTKEY= value
lmno. The EmpInfo data set is AES-encrypted and has the ENCRYPTKEY= value
abcd. The Product data set is not protected.
proc authlib lib=klmno;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. Set the library password to
pqrstuvw.
create securedlibrary="KLMNOEmps"
securedfolder="Department XYZZY"
pw=pqrstuvw;
Use a TABLES statement to specify the current encryption key for each data set. A
TABLES statement must be specified for every data set.
tables employees /
encryptkey=lmno;
tables empinfo /
encryptkey=abcd;
tables product;
run;
quit;
Results: The library KLMNO is bound. All three data sets are bound. The
Employees and EmpInfo data sets remain AES-encrypted, each with its own
encryption key, and the Product data set is not encrypted. As noted in the caution
above, SAS strongly recommends against keeping AES-encrypted data sets with
different encryption keys in a metadata-bound library.
Log Examples
Example Code 8.11 TABLES Statement for the KLMNO Library Containing AES-Encrypted Data Sets
NOTE: Successfully created a secured library object for the physical library KLMNO and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: KLMNOEmps
SecuredLibraryGUID: 48E2C4C7-ADE1-49D2-BBFE-14E5EAAB8961
NOTE: Successfully added new secured table object "EMPLOYEES.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.EMPLOYEES.DATA.
NOTE: The passwords on KLMNO.EMPLOYEES.DATA were successfully modified.
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.EMPINFO.DATA.
NOTE: The passwords on KLMNO.EMPINFO.DATA were successfully modified.
NOTE: Successfully added new secured table object "PRODUCT.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/KLMNOEmps" for data set KLMNO.PRODUCT.DATA.
NOTE: The passwords on KLMNO.PRODUCT.DATA were successfully modified.
361 quit;
Example 12: Binding a Library with an Optional Recorded Encryption Key When Existing AES-Encrypted Data Sets Have Different Encryption Keys
Details
This example demonstrates how to bind a library with an optional recorded
encryption key. None of the data sets have passwords.
Because existing SAS code creates and references the EmpInfo data set with
ENCRYPTKEY=DEF, and because the recorded library key is optional rather than
required, remove the ENCRYPTKEY=DEF specification from that code. Any code
that re-creates the data set must keep the ENCRYPT=AES or ENCRYPT=AES2 option
so that the optional recorded key is used when the data set is re-created.
Program
proc authlib lib=abcde;
create securedlibrary="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
encrypt=aes
encryptkey=optionalkey;
tables employee;
tables empinfo /
encryptkey=def/optionalkey
encrypt=aes;
tables deptname;
run;
quit;
Program Description
Library ABCDE has three data sets: Employee, EmpInfo, and DeptName. In this library,
the EmpInfo data set is AES-encrypted and has the ENCRYPTKEY= value def.
proc authlib lib=abcde;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. The optional encrypt key is
specified for the metadata-bound library.
create securedlibrary="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
encrypt=aes
encryptkey=optionalkey;
A TABLES statement is required for each data set.
tables employee;
tables empinfo /
encryptkey=def/optionalkey
encrypt=aes;
tables deptname;
run;
quit;
Results: The ABCDE library is bound and the optional encryption key is stored. When
the statements are executed, the three data sets are handled as follows. The
Employee data set is updated with the new metadata-bound library password but
is not encrypted. The DeptName data set is updated with the metadata-bound
library password but is not encrypted. The EmpInfo data set is copied in place to
re-encrypt it with the optional recorded key, and it gets the new metadata-bound
library password.
Note that the TABLES statement for EmpInfo in this program must supply both the
current key (def) and the new optional key (optionalkey). Without the new key
specification, the data set would remain encrypted with the def key.
Log Examples
Example Code 8.12 Changing an Encryption Key Value to the Recorded Encryption Key
NOTE: Successfully created a secured library object for the physical library ABCDE and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ABCDEEmps
SecuredLibraryGUID: 8E683650-B306-4871-A92D-16D481EC6456
NOTE: Successfully added new secured table object "EMPLOYEE.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPLOYEE.DATA.
NOTE: The passwords on ABCDE.EMPLOYEE.DATA were successfully modified.
NOTE: Copying data set ABCDE.EMPINFO in place to encrypt with the new secured library passwords or
encryption options.
NOTE: Renaming the data set ABCDE.EMPINFO to ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__ to ABCDE.EMPINFO.
NOTE: Metadata-bound library permissions are used for ABCDE.EMPINFO.DATA.
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPINFO.DATA.
NOTE: There were 5 observations read from the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set ABCDE.EMPINFO has 5 observations and 6 variables.
NOTE: Deleting the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on ABCDE.EMPINFO.DATA were successfully modified.
NOTE: Successfully added new secured table object "DEPTNAME.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.DEPTNAME.DATA.
NOTE: The passwords on ABCDE.DEPTNAME.DATA were successfully modified.
480 quit;
Example 13: Binding a Library with Required AES Encryption When Existing Data Sets Are Encrypted with the Same Encryption Key
Details
This example demonstrates how to bind a library and require that all of the data
sets in the metadata-bound library be AES-encrypted with the same encryption
key.
Program
proc authlib lib=abcde;
create seclib="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
require_encryption=yes
encrypt=aes
encryptkey=abc ;
run;
quit;
Program Description
Library ABCDE has three data sets: Employee, EmpInfo, and DeptName. Data set
EmpInfo has the encryption key value abc. The other two data sets are not
AES-encrypted. None of the data sets have passwords.
proc authlib lib=abcde;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server.
REQUIRE_ENCRYPTION=YES specifies that all data sets in the metadata-bound
library are automatically AES-encrypted, and ENCRYPT=AES specifies that they use
the AES key generation algorithm. Note that with required encryption and an
encryption key, the key generation algorithm specified with ENCRYPT= is always
used. With optional encryption, whichever key generation algorithm is specified
in code with ENCRYPT= is used with the recorded key.
create seclib="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
require_encryption=yes
encrypt=aes
encryptkey=abc ;
run;
quit;
Results: The library ABCDE is bound, and all of the data sets are bound and
AES-encrypted with the same encryption key.
Log Examples
Example Code 8.13 Library ABCDE Requiring AES Encryption When the Data Sets Are Already Encrypted with the Same Encryption Key
NOTE: Successfully created a secured library object for the physical library ABCDE and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ABCDEEmps
SecuredLibraryGUID: 9FD6C5D9-EF00-4CDC-8D0A-348D08BB329E
NOTE: Copying data set ABCDE.DEPTNAME in place to do required encryption with the library's required
encryption key and
passwords.
NOTE: Renaming the data set ABCDE.DEPTNAME to ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__ to ABCDE.DEPTNAME.
NOTE: Metadata-bound library permissions are used for ABCDE.DEPTNAME.DATA.
NOTE: Successfully added new secured table object "DEPTNAME.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.DEPTNAME.DATA.
NOTE: There were 10 observations read from the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set ABCDE.DEPTNAME has 10 observations and 2 variables.
NOTE: Deleting the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on ABCDE.DEPTNAME.DATA were successfully modified.
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPINFO.DATA.
NOTE: The passwords on ABCDE.EMPINFO.DATA were successfully modified.
NOTE: Copying data set ABCDE.EMPLOYEE in place to do required encryption with the library's required
encryption key and
passwords.
NOTE: Renaming the data set ABCDE.EMPLOYEE to ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__ to ABCDE.EMPLOYEE.
NOTE: Metadata-bound library permissions are used for ABCDE.EMPLOYEE.DATA.
NOTE: Successfully added new secured table object "EMPLOYEE.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPLOYEE.DATA.
NOTE: There were 22 observations read from the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set ABCDE.EMPLOYEE has 22 observations and 11 variables.
NOTE: Deleting the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on ABCDE.EMPLOYEE.DATA were successfully modified.
48 quit;
Example 14: Changing the Encryption Key on a Metadata-Bound Library That Requires AES Encryption
Details
This example demonstrates how to use the MODIFY statement to change the
stored library encryption key if you believe that the metadata-bound library
encryption keys might have been compromised.
Program
proc authlib lib=abcde;
modify
pw=secret
encrypt=aes
encryptkey=/new;
run;
quit;
Program Description
Library ABCDE has three data sets: Employee, EmpInfo, and DeptName. In this
library, all data sets are AES-encrypted with the encryption key value abc, because
AES encryption is required for the metadata-bound library.
proc authlib lib=abcde;
Use the MODIFY statement to change the library encryption key and the data set
encryption keys. You must specify ENCRYPT=AES or ENCRYPT=AES2. Note that
the key generation algorithm can also be changed here between AES and AES2
simply by changing the value in the ENCRYPT= option.
modify
pw=secret
encrypt=aes
encryptkey=/new;
run;
quit;
Results: The library ABCDE remains bound with the same password and a new
encryption key. All three data sets remain bound with the same password and a
new encryption key. Note that the data sets were copied-in-place to be encrypted
with the new key value and the specified encryption key algorithm, AES in this
case.
Example 15: Binding a Library with Existing Data Sets That Are AES-Encrypted with Different Encryption Keys
Details
This example demonstrates how to bind a library so that data sets that currently
have different encryption keys all receive the required AES encryption with the
same encryption key. None of the data sets have passwords.
Program
proc authlib lib=abcde;
create seclib="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
require_encryption=yes
encrypt=aes
encryptkey=new ;
tables employee /
encryptkey=abc;
tables empinfo /
encryptkey=def;
tables deptname ;
run;
quit;
Program Description
Library ABCDE has three data sets: Employee, EmpInfo, and DeptName. The
Employee and EmpInfo data sets are already AES-encrypted with different keys.
The DeptName data set is not encrypted.
proc authlib lib=abcde;
Using the CREATE statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server.
REQUIRE_ENCRYPTION=YES specifies that all data sets in the metadata-bound
library are automatically AES-encrypted.
create seclib="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
require_encryption=yes
encrypt=aes
encryptkey=new ;
Use a TABLES statement to specify the current encryption key for each data set. A
TABLES statement is required for every data set.
tables employee /
encryptkey=abc;
tables empinfo /
encryptkey=def;
tables deptname ;
run;
quit;
Results: The library ABCDE is bound. All data sets in the metadata-bound library
ABCDE have been copied-in-place to be encrypted with the required key and the
specified encryption key algorithm, AES in this case.
Log Examples
Example Code 8.15 Library ABCDE Requiring AES Encryption When Each Data Set Has Different Encryption Key Values
NOTE: Successfully created a secured library object for the physical library ABCDE and recorded its
location as:
SecuredFolder: /System/Secured Libraries/Department XYZZY
SecuredLibrary: ABCDEEmps
SecuredLibraryGUID: 097E9A84-D6E8-488E-B779-1E2AB0670036
NOTE: Copying data set ABCDE.EMPLOYEE in place to do required encryption with the library's required
encryption key and
passwords.
NOTE: Renaming the data set ABCDE.EMPLOYEE to ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__ to ABCDE.EMPLOYEE.
NOTE: Metadata-bound library permissions are used for ABCDE.EMPLOYEE.DATA.
NOTE: Successfully added new secured table object "EMPLOYEE.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPLOYEE.DATA.
NOTE: There were 5 observations read from the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set ABCDE.EMPLOYEE has 5 observations and 6 variables.
NOTE: Deleting the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on ABCDE.EMPLOYEE.DATA were successfully modified.
NOTE: Copying data set ABCDE.EMPINFO in place to do required encryption with the library's required
encryption key and
passwords.
NOTE: Renaming the data set ABCDE.EMPINFO to ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__ to ABCDE.EMPINFO.
NOTE: Metadata-bound library permissions are used for ABCDE.EMPINFO.DATA.
NOTE: Successfully added new secured table object "EMPINFO.DATA" to the secured library object at path
"/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.EMPINFO.DATA.
NOTE: There were 5 observations read from the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set ABCDE.EMPINFO has 5 observations and 6 variables.
NOTE: Deleting the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on ABCDE.EMPINFO.DATA were successfully modified.
NOTE: Copying data set ABCDE.DEPTNAME in place to do required encryption with the library's required
encryption key and
passwords.
NOTE: Renaming the data set ABCDE.DEPTNAME to ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: Copying the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__ to ABCDE.DEPTNAME.
NOTE: Metadata-bound library permissions are used for ABCDE.DEPTNAME.DATA.
NOTE: Successfully added new secured table object "DEPTNAME.DATA" to the secured library object at
path "/System/Secured
Libraries/Department XYZZY/ABCDEEmps" for data set ABCDE.DEPTNAME.DATA.
NOTE: There were 4 observations read from the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The data set ABCDE.DEPTNAME has 4 observations and 2 variables.
NOTE: Deleting the data set ABCDE.__TEMP_ENCRYPT_FILE_NAME__.
NOTE: The passwords on ABCDE.DEPTNAME.DATA were successfully modified.
567 quit;
202 Chapter 8 / AUTHLIB Procedure
Example 16: Changing a Metadata-Bound Library to Require AES Encryption When Existing Data Sets Are Encrypted with Different Encryption Keys
Details
This example is similar to the previous example. The difference is that the library is
already bound to metadata, so the MODIFY statement is used to change the
binding to require AES encryption.
Program
proc authlib lib=abcde;
modify seclib="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
require_encryption=yes
encrypt=aes
encryptkey=new;
tables employee /
encryptkey=abc;
tables empinfo /
encryptkey=def;
tables deptname ;
run;
quit;
Program Description
Library ABCDE has three data sets: Employee, EmpInfo, and DeptName. In this
library, the Employee data set has the encryption key value abc. The EmpInfo data
set has the encryption key value def. The DeptName data set is not AES-encrypted.
proc authlib lib=abcde;
Using the MODIFY statement, enter the name of the metadata folder and name
the secured library object in the SAS Metadata Server. Use the
REQUIRE_ENCRYPTION=YES option to require that all data sets in the
metadata-bound library have AES encryption. The name of the secured library
object and the name of the metadata folder are optional, but specifying them
ensures that the library is bound to that secured library object before the
change is made.
modify seclib="ABCDEEmps"
securedfolder="Department XYZZY"
pw=secret
require_encryption=yes
encrypt=aes
encryptkey=new;
Use a TABLES statement to specify the current encryption key for each data set. A
TABLES statement is required for every data set.
tables employee /
encryptkey=abc;
tables empinfo /
encryptkey=def;
tables deptname ;
run;
quit;
Results: The library ABCDE remains bound. The MODIFY statement changed the
binding to require AES encryption. All three data sets are copied-in-place to
encrypt the data sets with the required encrypt key and the specified encryption
key algorithm, AES in this case.
Log Examples
Example Code 8.16 Library ABCDE Requiring AES Encryption and Changing the Encryption Key Values of Each Data Set to a Recorded Encryption Key Value
Example 17: Using the REMOVE Statement on a Metadata-Bound Library with Required AES Encryption
Details
This example demonstrates how to unbind a metadata-bound library. The code
does the following:
• deletes the metadata that describes the library and its tables from the SAS
Metadata Repository
• removes the security bindings from the physical library and data sets
• removes the assigned passwords and encryption from the data sets, leaving them
unprotected
The slash (/) after the password is optional; it indicates that the password is
removed from or replaced on the data sets. If a library is bound with READ=,
WRITE=, and ALTER= passwords, as in Example 4 on page 173, then you must
specify all of the passwords, and each one must be followed by a slash (/).
Program
proc authlib lib=abcde;
remove
pw=currntpw/
encrypt=no;
run;
quit;
Program Description
Use the REMOVE statement to unbind the metadata-bound library. The slash (/)
after the password removes the password from the data sets.
ENCRYPT=NO specifies that encryption is removed from all data sets.
remove
pw=currntpw/
encrypt=no;
run;
quit;
Results: The library ABCDE and all the data sets bound to it are no longer bound.
All passwords and encryption are removed from the unbound data sets, which leaves
them unprotected.
Example Code 8.17 Using the REMOVE Statement on a Metadata-Bound Library with Required AES Encryption
Example 18: Resetting Credentials on Imported SecuredLibrary Objects
Details
This example shows how to reset the passwords and encryption key on
SecuredLibrary objects that are imported from a backup package.
• The LIBNAME statement without the AUTHADMIN=YES option fails because the
import does not restore the associated password values.
• The AUTHADMIN=YES option enables the AUTHLIB procedure to execute with the
binding information in the physical library.
• The MODIFY statement resets the metadata-bound library passwords and
encryption key value on the library from “Example 13: Binding a Library with
Required AES Encryption When Existing Data Sets Are Encrypted with the
Same Encryption Key” on page 192, assuming that the SecuredLibrary object was
imported from a backup package without those values.
Program
libname abcde "sas-library" ;
libname abcde "sas-library" authadmin=yes;
proc authlib lib=abcde;
modify
pw=secret
encrypt=aes
encryptkey=value;
run;
quit;
libname abcde "sas-library";
Program Description
Library ABCDE has three data sets: Employee, EmpInfo, and DeptName. This
LIBNAME statement fails because there are no associated password values.
libname abcde "sas-library" ;
Use the MODIFY statement to reset the metadata-bound library passwords and
encryption key value. The PW= option resets the password. The ENCRYPTKEY=
option resets the encryption key value.
proc authlib lib=abcde;
modify
pw=secret
encrypt=aes
encryptkey=value;
run;
quit;
NOTE: Required encryption will use AES encryption with the recorded key.
9
CALENDAR Procedure
• display holidays
• process data about multiple calendars in a single step and print them in a
separate, mixed, or combined format
• apply different holidays, weekly work schedules, and daily work shifts to
multiple calendars in a single PROC step
• produce a mean and a sum for variables based on either the number of days in a
month or the number of observations
PROC CALENDAR also contains features that are specifically designed to work
with PROC CPM in SAS/OR software, a project management scheduling tool.
Overview: CALENDAR Procedure 213
For the activities data set that is shown in this calendar, see “Example 1: Schedule
Calendar with Holidays: 5-Day Default” on page 261.
The following calendar uses one of the two default calendars, the 24-hour-day, 7-
day-week calendar.
214 Chapter 9 / CALENDAR Procedure
For an explanation of the program that produces this calendar, see “Example 4:
Multiple Schedule Calendars with Atypical Work Shifts (Combined and Mixed
Output)” on page 278.
In a summary calendar, each piece of information for a given day is the value of a
variable for that day. The variables can be either numeric or character, and you can
format them as necessary. You can use the SUM and MEAN options to calculate
sums and means for any numeric variables. These statistics appear in a box below
the calendar, as shown in the following output. The data set that is shown in this
calendar is created in “Example 7: Summary Calendar with MEAN Values by
Observation” on page 295.
Types of Calendars
PROC CALENDAR can produce two types of calendars: schedule and summary.
Summary calendar: can calculate sums and means; activities can last only one day.
Note: PROC CALENDAR produces a summary calendar if you do not use a DUR or
FIN statement in the PROC step.
Concepts: CALENDAR Procedure 219
Schedule Calendar
Definition
A report in calendar format that shows when activities and holidays start and end.
Required Statements
You must supply a START statement and either a DUR or FIN statement. If you do
not use a DUR or FIN statement, then PROC CALENDAR assumes that you want to
create a summary calendar report.
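As a minimal sketch of these required statements, the following program pairs START with DUR to produce a schedule calendar. The data set ACTS, its variables TASK, DATE, and DAYS, and the dates are hypothetical. INTERVAL=WORKDAY is included on the assumption that you want the 5-day, 8-hour-day default calendar rather than the 7-day, 24-hour-day default.

```sas
/* Hypothetical activities data set: one observation per activity. */
/* Rows are already in DATE (START variable) order, as required.    */
data acts;
   length task $ 12;
   input task $ date : date9. days;
   format date date9.;
   datalines;
Review 01JUL2024 3
Build 05JUL2024 4
Test 12JUL2024 2
;
run;

/* START plus DUR produces a schedule calendar.                     */
/* INTERVAL=WORKDAY selects the 5-day, 8-hour-day default calendar. */
proc calendar data=acts interval=workday;
   start date;   /* starting date of each activity                  */
   dur days;     /* duration; a FIN statement could be used instead */
run;
```

Replacing the DUR statement with a FIN statement that names a finishing-date variable would also produce a schedule calendar. Removing both would produce a summary calendar instead.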
Examples
• “Simple Schedule Calendar” on page 213
Summary Calendar
Definition
A report in calendar format that displays activities and holidays that last only one
day and that can provide summary information in the form of sums and means.
Required Statements
You must supply a START statement. This statement identifies the variable in the
activities data set that contains an activity's starting date.
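The following sketch shows a summary calendar that uses only a START statement, plus SUM and MEAN statements whose results appear in a box below the calendar. The data set MEALS, its variables, and the values are hypothetical. MEANTYPE=NOBS is included on the assumption that it is the PROC CALENDAR option that bases means on the number of observations rather than the number of days in the month. Check the PROC CALENDAR statement syntax before relying on it.

```sas
/* Hypothetical daily counts: one observation per day, because the */
/* entries in a summary calendar can last only one day.            */
data meals;
   input date : date9. brkfst lunch dinner;
   format date date9.;
   datalines;
01JUL2024 120 210 180
02JUL2024 115 205 190
03JUL2024 130 220 175
;
run;

/* No DUR or FIN statement, so PROC CALENDAR produces a summary calendar. */
/* MEANTYPE=NOBS (assumed): means are based on the number of observations. */
proc calendar data=meals meantype=nobs;
   start date;
   sum brkfst lunch dinner / format=4.0;
   mean brkfst lunch dinner / format=6.2;
run;
```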
Examples
• “Simple Summary Calendar” on page 216
Description
PROC CALENDAR provides two default calendars for simple applications. You can
produce calendars without having to specify detailed work shifts and weekly work
patterns if your application can use one of two simple work patterns. Consider
using a default calendar if the following conditions are true:
• your application uses a 5-day work week with 8-hour days or a 7-day work week
with 24-hour days, as shown in the following table
• you want to print all activities on the same calendar
Examples
• See the 7-day default calendar in Output 9.30 on page 214
• See the 5-day default calendar in “Example 1: Schedule Calendar with Holidays:
5-Day Default” on page 261
Definitions
calendar
a logical entity that represents a weekly work pattern, which consists of weekly
work schedules and daily shifts. PROC CALENDAR contains two default work
patterns: a 5-day week with 8-hour days and a 7-day week with 24-hour days.
You can also define your own work patterns by using CALENDAR and
WORKDAYS data sets.
calendar report
a report in calendar format that displays activities, holidays, and nonwork
periods. A calendar report can contain multiple calendars in one of three
formats:
separate
each identified calendar is printed on separate output pages.
combined
all identified calendars are printed on the same output pages and each is
identified.
mixed
all identified calendars are printed on the same output pages but are not
identified as belonging to separate calendars.
multiple calendar
a logical entity that represents multiple weekly work patterns.
schedules and weekly work patterns for work crews on different parts of the
project.
Another use for multiple calendars is to identify activities so that you can choose
to print them in the same calendar report. For example, if you identify activities as
belonging to separate departments within a division, then you can choose to print a
calendar report that shows all departmental activities on the same calendar.
Finally, using multiple calendars, you can produce separate calendar reports for
each calendar in a single step. For example, if activities are identified by
department, then you can produce a calendar report that prints the activities of
each department on separate pages.
You can use the special variable name _CAL_ or you can use another variable name.
PROC CALENDAR automatically looks for a variable named _CAL_ in the holiday
and calendar data sets, even when the activities data set uses a variable with
another name as the CALID variable. Therefore, if you use the name _CAL_ in your
holiday and calendar data sets, then you can more easily reuse these data sets in
different calendar applications.
For example, consider a calendar that shows the activities of all departments
within a division. Each department can have its own calendar identification value
and, if necessary, can have individual weekly work patterns, daily work shifts, and
holidays.
If you place activities that are associated with different calendars in the same
activities data set, then you can use PROC CALENDAR to produce calendar reports
that print the following:
• the schedule and events for each department on a separate page (separate
output)
• the schedule and events for the entire division, each identified by department
(combined output)
• the schedule and events for the entire division, but not identified by department
(mixed output)
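A minimal sketch of combined output follows. It assumes a hypothetical activities data set DEPTACTS in which the _CAL_ variable identifies the department. Because this report is combined rather than separate, the observations are only in START (DATE) order. The department codes, task names, and dates are illustrative.

```sas
/* Hypothetical activities for two departments, identified by _CAL_. */
/* Rows are in DATE order only, which this sketch assumes is enough  */
/* for combined or mixed output.                                     */
data deptacts;
   length _cal_ $ 8 task $ 12;
   input _cal_ $ task $ date : date9. days;
   format date date9.;
   datalines;
SALES Kickoff 01JUL2024 1
TECH Design 02JUL2024 3
SALES Review 08JUL2024 2
TECH Testing 10JUL2024 2
;
run;

proc calendar data=deptacts;
   calid _cal_ / output=combine;   /* COMBINE, MIX, or SEPARATE */
   start date;
   dur days;
run;
```

Changing OUTPUT=COMBINE to OUTPUT=MIX would print the same activities without identifying the department.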
Examples
• “Example 2: Schedule Calendar Containing Multiple Calendars” on page 266
Table 9.4 Four Possible Input Data Sets for PROC CALENDAR
Purpose
The activities data set, specified with the DATA= option, contains information
about the activities to be scheduled by PROC CALENDAR. Each observation
describes a single activity.
• The activities data set must always be sorted or indexed by the START variable.
• If you use a CALID (calendar identifier) variable and want to produce output
that shows multiple calendars on separate pages, then the activities data set
must be sorted by or indexed on the CALID variable and then the START
variable.
• If you use a BY statement, then the activities data set must be sorted by or
indexed on the BY variables.
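For separate output, a minimal sketch sorts the activities data set by the CALID variable and then by the START variable before running PROC CALENDAR. The data set names DEPTACTS and DEPTSORTED, the variables, and the data values are hypothetical. The rows are deliberately entered out of order.

```sas
/* Hypothetical activities, entered in no particular order. */
data deptacts;
   length _cal_ $ 8 task $ 12;
   input _cal_ $ task $ date : date9. days;
   format date date9.;
   datalines;
TECH Testing 10JUL2024 2
SALES Review 08JUL2024 2
TECH Design 02JUL2024 3
SALES Kickoff 01JUL2024 1
;
run;

/* Separate pages require CALID order first, then START order. */
proc sort data=deptacts out=deptsorted;
   by _cal_ date;
run;

proc calendar data=deptsorted;
   calid _cal_ / output=separate;
   start date;
   dur days;
run;
```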
Structure
Each observation in the activities data set contains information about one activity.
One variable must contain the starting date. If you are producing a schedule
calendar, then another variable must contain either the activity duration or finishing
date. Other variables can contain additional information about an activity.
Examples
Every example in the Examples section uses an activities data set.
Purpose
You can use a holidays data set, specified with the HOLIDATA= option, to identify
the following:
228 Chapter 9 / CALENDAR Procedure
n days that are not available for scheduling work. (In a schedule calendar, PROC
CALENDAR does not schedule activities on these days.)
Structure
Each observation in the holidays data set must contain at least the holiday starting
date. A holiday lasts only one day unless a duration or finishing date is specified.
Supplying a holiday name is recommended, though not required. If you do not
specify which variable contains the holiday name, then PROC CALENDAR uses the
word DATE to identify each holiday.
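The sketch below builds a small holidays data set and uses it in a schedule calendar. The data set and variable names (HOL, HDATE, HOLIDAY, HLONG, ACT) are hypothetical. HLONG is read as a holiday duration, so the Vacation holiday is assumed to last five days under the default INTERVAL=DAY.

```sas
/* Hypothetical holidays data set: start date, name, and duration */
data hol;
   input hdate :date7. holiday :$14. hlong;
   format hdate date7.;
   datalines;
04JUL02 Independence 1
29JUL02 Vacation 5
;

/* Hypothetical activities data set, already in START order */
data act;
   input date :date7. task :$12. days;
   format date date7.;
   datalines;
01JUL02 Planning 2
08JUL02 Build 5
;

proc calendar data=act holidata=hol;
   start date;
   dur days;
   holistart hdate;   /* required when a holidays data set is used */
   holivar holiday;   /* without HOLIVAR, holidays are labeled DATE */
   holidur hlong;     /* without HOLIDUR, each holiday lasts one day */
run;
```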
No Sorting Needed
You do not need to sort or index the holidays data set.
Examples
Every example in the Examples section uses a holidays data set.
Purpose
You can use a calendar data set, specified with the CALEDATA= option, to specify
work schedules for different calendars.
Structure
Each observation in the calendar data set defines one weekly work schedule. The
data set created in the DATA step shown below defines weekly work schedules for
two calendars, CALONE and CALTWO.
data cale;
input _sun_ $ _mon_ $ _tue_ $ _wed_ $ _thu_ $ /
_fri_ $ _sat_ $ _cal_ $ d_length time6.;
datalines;
n names of variables in the WORKDATA= data set (in this example, SHIFT1
and SHIFT2)
_CAL_
the CALID (calendar identifier) variable. The values of this variable identify
different calendars. If this variable is not present, then the first observation in
this data set defines the work schedule that is applied to all calendars in the
activities data set.
If the CALID variable contains a missing value, then the character or numeric
value for the default calendar (DEFAULT or 0) is used. For more details, see “The
Default Calendars” on page 221.
D_LENGTH
the daylength identifier variable. Values of D_LENGTH indicate the length of the
standard workday to be used in calendar calculations. You can set the workday
length either by placing this variable in your calendar data set or by using the
DAYLENGTH= option.
Missing values for this variable default to the number of hours specified in the
DAYLENGTH= option. If the DAYLENGTH= option is not used, then the day
length defaults to 24 hours if INTERVAL=DAY, or eight hours if
INTERVAL=WORKDAY.
You can reset the length of the standard workday with the DAYLENGTH= option or
a D_LENGTH variable in the calendar data set. You can define other work shifts in a
workdays data set.
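As a sketch of how a calendar data set, a workdays data set, and the D_LENGTH variable fit together, the program below defines two hypothetical calendars, CALONE and CALTWO. All data set names and values are assumptions for illustration. CALTWO refers to the SHIFT1 and SHIFT2 variables in the WORKDATA= data set, and each calendar sets its own standard day length through D_LENGTH.

```sas
/* Hypothetical workdays data set: values alternate work start/end */
data shifts;
   input shift1 :time5. shift2 :time5.;
   datalines;
7:00 7:00
12:00 11:00
13:00 .
17:00 .
;

/* Day values are WORKDAY, HOLIDAY, or a shift variable name;   */
/* D_LENGTH is the standard day length used for DUR arithmetic. */
data cale;
   input _cal_ $ _sun_ $ _mon_ $ _tue_ $ _wed_ $ _thu_ $ _fri_ $ _sat_ $
         d_length :time5.;
   datalines;
calone holiday workday workday workday workday workday holiday 8:00
caltwo holiday shift1 shift1 shift1 shift1 shift2 holiday 9:00
;

/* Hypothetical activities, one per calendar, in START order */
data act;
   input _cal_ $ date :date7. task :$12. days;
   format date date7.;
   datalines;
calone 01JUL02 Design 3
caltwo 01JUL02 Assembly 2
;

proc calendar data=act caledata=cale workdata=shifts interval=workday;
   calid _cal_ / output=combine;
   start date;
   dur days;
   var task;
run;
```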
Examples
The following examples feature a calendar data set:
n “Example 3: Multiple Schedule Calendars with Atypical Work Shifts (Separated
Output)” on page 271
n “Example 4: Multiple Schedule Calendars with Atypical Work Shifts (Combined
and Mixed Output)” on page 278
n “Example 7: Summary Calendar with MEAN Values by Observation” on page 295
Purpose
You can use a workdays data set, specified with the WORKDATA= option, to define
the daily work shifts named in a CALEDATA= data set.
Structure
Each variable in the workdays data set contains one daily schedule of alternating
work and nonwork periods. For example, this DATA step creates a data set that
contains specifications for two work shifts:
data work;
input shift1 time6. shift2 time6.;
datalines;
 7:00  7:00
12:00 11:00
13:00     .
17:00     .
;
The variable SHIFT1 specifies a workday that spans 10 hours (7:00 to 17:00) with one
nonwork period (a lunch hour from 12:00 to 13:00), which leaves nine hours of work;
the variable SHIFT2 specifies a 4-hour workday (7:00 to 11:00) with no nonwork periods.
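Because the values in each workdays variable alternate between the start and end of a work period, the working time in a shift can be totaled by pairing consecutive observations. The following DATA step is a sketch of that arithmetic (the output data set name SHIFT_HOURS is hypothetical). For the data above it should give 9 hours for SHIFT1, whose 7:00 to 17:00 span includes a one-hour lunch, and 4 hours for SHIFT2.

```sas
/* Recreate the workdays data set from the text */
data work;
   input shift1 :time5. shift2 :time5.;
   datalines;
7:00 7:00
12:00 11:00
13:00 .
17:00 .
;

/* Odd observations start a work period; even observations end it */
data shift_hours;
   set work end=last;
   prev1 = lag(shift1);   /* LAG runs on every observation to stay aligned */
   prev2 = lag(shift2);
   if mod(_n_, 2) = 0 then do;
      /* sum statements retain the totals and treat a missing result as 0 */
      hours1 + (shift1 - prev1) / 3600;
      hours2 + (shift2 - prev2) / 3600;
   end;
   if last then do;
      put 'SHIFT1 working hours: ' hours1 / 'SHIFT2 working hours: ' hours2;
      output;
   end;
   keep hours1 hours2;
run;
```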
Examples
See “Example 3: Multiple Schedule Calendars with Atypical Work Shifts (Separated
Output)” on page 271
Data Set              Variable                          Treatment of Missing Values
Activities (DATA=)    SUM and MEAN variables            0
                      (“SUM Statement” on page 257,
                      “MEAN Statement” on page 253)
Holiday (HOLIDATA=)   “_CAL_” on page 230               All holidays apply to all calendars.
PROC CALENDAR: Display data from a SAS data set in a monthly calendar format
(Ex. 1, Ex. 3, Ex. 4, Ex. 5, Ex. 6, Ex. 8)
DUR: Specify the variable that contains the duration of each activity
(Ex. 1, Ex. 2, Ex. 3, Ex. 4, Ex. 5)
HOLIDUR: Specify the variable in the holidays data set that contains the duration
of each holiday for a schedule calendar (Ex. 1, Ex. 5)
HOLISTART: Specify a variable in the holidays data set that contains the starting
date of each holiday (Ex. 1, Ex. 5)
OUTFIN: Specify the last day of the week to display in the calendar
(Ex. 3, Ex. 4, Ex. 8)
OUTSTART: Specify the starting day of the week to display in the calendar
(Ex. 3, Ex. 4, Ex. 8)
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Examples: “Example 1: Schedule Calendar with Holidays: 5-Day Default” on page 261
“Example 3: Multiple Schedule Calendars with Atypical Work Shifts (Separated
Output)” on page 271
“Example 4: Multiple Schedule Calendars with Atypical Work Shifts (Combined and
Mixed Output)” on page 278
“Example 5: Schedule Calendar, Blank or with Holidays” on page 284
“Example 6: Calculating a Schedule Based on Completion of Predecessor Tasks” on
page 287
“Example 8: Multiple Summary Calendars with Atypical Work Shifts (Separated
Output)” on page 300
Syntax
PROC CALENDAR <options>;
Optional Arguments
CALEDATA=SAS-data-set
specifies the calendar data set, a SAS data set that contains weekly work
schedules for multiple calendars.
Default If you omit the CALEDATA= option, then PROC CALENDAR uses a
default work schedule.
Tip A calendar data set is useful if you are using multiple calendars or a
nonstandard work schedule.
DATA=SAS-data-set
specifies the activities data set, a SAS data set that contains starting dates for
all activities and variables to display for each activity. Activities must be sorted
or indexed by starting date.
Default If you omit the DATA= option, then the most recently created SAS
data set is used.
DATETIME
specifies that the START and FIN variables contain SAS datetime values.
Default If you omit the DATETIME option, then PROC CALENDAR assumes
that the START and FIN values are SAS date values.
DAYLENGTH=hours
specifies the number of hours in a standard working day. The hours value
must be a SAS time value.
Interactions If you specify the DAYLENGTH= option and the calendar data set
contains a D_LENGTH variable, then PROC CALENDAR uses the
DAYLENGTH= value only when the D_LENGTH value is missing.
Tips The DAYLENGTH= option is useful when you use the DUR
statement and your work schedule contains days of varying lengths
(for example, a work week of five half-days). In a work week with
varying day lengths, you need to set a standard day length to use in
calculating duration times. For example, an activity with a duration
of 3.0 workdays lasts 24 hours if DAYLENGTH=8:00 or 30 hours if
DAYLENGTH=10:00.
See “Calendar Data Set” on page 229 for more information about
setting the length of the standard workday.
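The following sketch reproduces the arithmetic from the Tips above and then requests a 10-hour standard day. The data set names ACT and DUR_HOURS and the activity are hypothetical. The sketch also assumes that the DAYLENGTH=10:00 form shown in the Tips is accepted as written in the PROC CALENDAR statement.

```sas
/* Hypothetical activities data set with one 3.0-unit activity */
data act;
   input task :$12. date :date7. dur;
   format date date7.;
   datalines;
Survey 08JUL02 3
;

/* Hours spanned = DUR x standard day length (time values are seconds) */
data dur_hours;
   set act;
   do daylength = '8:00't, '10:00't;
      hours = dur * daylength / 3600;
      output;
   end;
run;

proc print data=dur_hours noobs;
   var task dur daylength hours;
   format daylength time5.;
run;

/* Treat one DUR unit as one 10-hour working day */
proc calendar data=act interval=workday daylength=10:00;
   start date;
   dur dur;
   var task;
run;
```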
FILL
displays all months between the first and last activity, start and finish dates
inclusive, including months that contain no activities.
Default If you do not specify FILL, then PROC CALENDAR prints only months
that contain activities. (Months that contain only holidays are not
printed.)
FORMCHAR <(position(s))>='formatting-character(s)'
defines the characters to use for constructing the outlines and dividers for the
cells in the calendar as well as all identifying markers (such as asterisks and
arrows) used to indicate holidays or continuation of activities in PROC
CALENDAR output.
position(s)
identifies the position of one or more characters in the SAS formatting-
character string. A space or a comma separates the positions.
See Table 9.15 on page 240 for the formatting characters that PROC
CALENDAR uses.
formatting-character(s)
lists the characters to use for the specified positions. PROC CALENDAR
assigns characters in formatting-character(s) to position(s), in the order in
which they are listed. For example, the following option assigns an asterisk
(*) to the 12th position, assigns a single hyphen (-) to the 13th, and does not
alter the remaining characters:
formchar(12 13)='*-'
With these settings, an activity line is displayed like this:
*------------------ACTIVITY--------------*
Position  Default  Used for
1         |        Vertical bar
2         -        Horizontal bar
13        =        Activity line
16        /        Activity separator
20        *        Holiday marker
See For information about which hexadecimal codes to use for which
characters, consult the documentation for your hardware.
SMALL
prints the month and year on one line.
MEDIUM
prints the month and year in a box four lines high.
LARGE
prints the month seven lines high using asterisks (*). The year is included if
space is available.
Default MEDIUM
HOLIDATA=SAS-data-set
specifies the holidays data set, a SAS data set that contains the holidays that
you want to display in the output. One variable must contain the starting date
of each holiday; another variable can contain the holiday names. PROC CALENDAR
marks holidays in the calendar output with asterisks (*) when space permits.
INTERVAL=DAY | WORKDAY
sets the units of the DUR and HOLIDUR variables to one of two default day
lengths:
DAY
specifies the values of the DUR and HOLIDUR variables in units of 24-hour
days and specifies the default 7-day calendar. For example, a DUR value of
3.0 is treated as 72 hours. The default calendar work schedule consists of
seven working days, all starting at 00:00 with a length of 24:00.
WORKDAY
specifies the values of the DUR and HOLIDUR variables in units of 8-hour
days. WORKDAY also specifies that the default calendar contains five days a
week, Monday through Friday, all starting at 09:00 with a length of 08:00.
When WORKDAY is specified, PROC CALENDAR treats the values of the
DUR and HOLIDUR variables in units of working days, as defined in the
DAYLENGTH= option, the CALEDATA= data set, or the default calendar. For
example, if the working day is eight hours long, then a DUR value of 3.0 is
treated as 24 hours.
Default DAY
LEGEND
prints the names of the variables whose values appear in the calendar. This
identifying text, or legend box, appears at the bottom of the page for each
month if space permits. Otherwise, it is printed on the following page. PROC
CALENDAR identifies each variable by name or by label if one exists. The order
of variables in the legend matches their order in the calendar.
Interaction If you use the SUM or MEAN statement, then the legend box also
contains the SUM or MEAN values.
LOCALE
prints the names of months and weekdays in the language that is indicated by
the value of the LOCALE= SAS system option. The LOCALE option in PROC
CALENDAR does not change the starting day of the week.
Default If LOCALE is not specified, then names of months and weekdays are
printed in English.
MEANTYPE=NOBS | NDAYS
specifies the type of mean to calculate for each month.
NOBS
calculates the mean over the number of observations displayed in the month.
NDAYS
calculates the mean over the number of days displayed in the month.
Default NOBS
Interaction Normally, PROC CALENDAR displays all days for each month.
However, it might omit some days if you use the OUTSTART
statement with the OUTDUR or OUTFIN statement.
MISSING
determines how missing values are treated, based on the type of calendar.
Summary Calendar
If there is a day without an activity scheduled, then PROC CALENDAR prints
the values of variables for that day by using the SAS or user-defined format
that is specified for missing values.
Default If you omit MISSING, then days without activities contain no values.
Schedule Calendar
variables with missing values appear in the label of an activity, using the
format specified for missing values.
See “Missing Values in Input Data Sets” on page 232 for more information
about missing values.
WEEKDAYS
suppresses the display of Saturdays and Sundays in the output. It also specifies
that the value of the INTERVAL= option is WORKDAY. For example, the following
two PROC CALENDAR steps are equivalent:
proc calendar weekdays;
start date;
run;
proc calendar interval=workday;
start date;
outstart monday;
outfin friday;
run;
Default If you omit WEEKDAYS, then the calendar displays all seven days.
WORKDATA=SAS-data-set
specifies the workdays data set, a SAS data set that defines the work pattern
during a standard working day. Each numeric variable in the workdays data set
denotes a unique work-shift pattern during one working day.
Tip The workdays data set is useful in conjunction with the calendar data
set.
See “Workdays Data Set” on page 231 and “Calendar Data Set” on page
229.
BY Statement
Processes activities separately for each BY group, producing a separate calendar for each value of
the BY variable.
Syntax
BY <DESCENDING> variable-1
<<DESCENDING> variable-2 …>
<NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable, but the observations in the data set must be
sorted by all the variables that you specify or have an appropriate index.
Variables in a BY statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The observations are grouped in another way (for example, chronological
order).
Details
When you use the CALID statement, you can process activities that apply to
different calendars, indicated by the value of the CALID variable. Because you can
specify only one CALID variable, however, you can create only one level of
grouping. For example, if you want a calendar report to show the activities of
several departments within a company, then you can identify each department with
the value of the CALID variable and produce calendar output that shows the
calendars for all departments.
When you use a BY statement, however, you can further divide activities into
related groups. For example, you can print calendar output that groups
departmental calendars by division. The observations for activities must contain a
variable that identifies which department an activity belongs to and a variable that
identifies the division that a department resides in. Specify the variable that
identifies the department with the CALID statement. Specify the variable that
identifies the division with the BY statement.
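A minimal sketch of this two-level grouping, assuming a hypothetical activities data set DEPTACT with a DIVISION variable for the BY statement and a DEPT variable for the CALID statement. Because OUTPUT=SEPARATE is used, the data are sorted by the BY variable first and then by the CALID and START variables; that combined order is an assumption based on the sorting requirements stated separately for each statement.

```sas
/* Hypothetical activities for two divisions and several departments */
data deptact;
   input division $ dept $ date :date7. task :$12. days;
   format date date7.;
   datalines;
East Sales 08JUL02 Training 2
East Sales 01JUL02 Kickoff 1
East Audit 03JUL02 Review 3
West Sales 02JUL02 Kickoff 1
West Ops 05JUL02 Inventory 2
;

/* BY variable first, then CALID variable, then START variable */
proc sort data=deptact;
   by division dept date;
run;

/* One set of output per division; one page per department within it */
proc calendar data=deptact;
   by division;
   calid dept / output=separate;
   start date;
   dur days;
   var task;
run;
```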
CALID Statement
Processes activities in groups defined by the values of a calendar identifier variable.
Syntax
CALID variable
</ OUTPUT=COMBINE | MIX | SEPARATE>;
Required Argument
variable
a character or numeric variable that identifies which calendar an observation
contains data for.
Requirement If you specify the CALID variable, then both the activities and
holidays data sets must contain this variable. If either of these
data sets does not contain the CALID variable, then a default
calendar is used.
Tip You do not need a CALID statement if the input data sets contain
the default calendar identifier variable _CAL_.
Optional Argument
OUTPUT=COMBINE | MIX | SEPARATE
controls the amount of space required to display output for multiple calendars.
COMBINE
produces one page for each month that contains activities and subdivides
each day by the CALID value.
MIX
produces one page for each month that contains activities and does not
identify activities by the CALID value.
SEPARATE
produces a separate page for each value of the CALID variable.
Restriction The input data must be sorted by the CALID variable and then by
the START variable or must contain an appropriate composite
index.
Default COMBINE
DUR Statement
Specifies the variable that contains the duration of each activity.
Alias: DURATION
Interaction: If you use both a DUR statement and a FIN statement, then DUR is ignored.
Supports: Schedule calendars
Tip: To produce a schedule calendar, you must use either a DUR or FIN statement.
Examples: “Example 1: Schedule Calendar with Holidays: 5-Day Default” on page 261
“Example 2: Schedule Calendar Containing Multiple Calendars” on page 266
“Example 3: Multiple Schedule Calendars with Atypical Work Shifts (Separated
Output)” on page 271
“Example 4: Multiple Schedule Calendars with Atypical Work Shifts (Combined and
Mixed Output)” on page 278
“Example 5: Schedule Calendar, Blank or with Holidays” on page 284
Syntax
DUR variable;
Required Argument
variable
contains the duration of each activity in a schedule calendar.
See For more information about activity durations, see “Activities Data
Set” on page 226 and “Calendar Data Set” on page 229.
Details
Duration is measured inclusively from the start of the activity (as given in the
START variable). In the output, any activity that lasts part of a day is displayed as
lasting a full day.
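The default length of a duration unit depends on the INTERVAL= option:
DAY 24 hours
WORKDAY 8 hours
You can override the default length of a duration unit by using one of the following:
n the DAYLENGTH= option
n a D_LENGTH variable in the CALEDATA= data set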
FIN Statement
Specifies the variable in the activities data set that contains the finishing date of each activity.
Alias: FINISH
Interaction: If you use both a FIN statement and a DUR statement, then FIN is used.
Supports: Schedule calendars
Tip: To produce a schedule calendar, you must use either a FIN or DUR statement.
Example: “Example 6: Calculating a Schedule Based on Completion of Predecessor Tasks” on
page 287
Syntax
FIN variable;
Required Argument
variable
contains the finishing date of each activity.
Restrictions The values of variable must be either SAS date or datetime values.
If the FIN variable contains datetime values, then you must specify
the DATETIME option in the PROC CALENDAR statement.
Both the START and FIN variables must have matching formats. For
example, if one contains datetime values, then so must the other.
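The sketch below shows the DATETIME option with matching START and FIN variables. The data set DTACT and the variables BEGIN and FINISH are hypothetical; both variables hold datetime values, as the restrictions above require.

```sas
/* Hypothetical activities with datetime start and finish values */
data dtact;
   input task :$12. begin :datetime16. finish :datetime16.;
   format begin finish datetime16.;
   datalines;
Survey 01JUL02:08:00:00 02JUL02:17:00:00
Excavate 03JUL02:08:00:00 05JUL02:12:00:00
;

proc sort data=dtact;
   by begin;
run;

/* DATETIME is needed because START and FIN hold datetime values */
proc calendar data=dtact datetime;
   start begin;
   fin finish;
   var task;
run;
```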
HOLIDUR Statement
Specifies the variable in the holidays data set that contains the duration of each holiday for a
schedule calendar.
Alias: HOLIDURATION
Default: If you do not use a HOLIDUR or HOLIFIN statement, then all holidays last one day.
Restriction: You cannot use the HOLIDUR statement with a HOLIFIN statement.
Supports: Schedule calendars
Examples: “Example 1: Schedule Calendar with Holidays: 5-Day Default” on page 261
“Example 5: Schedule Calendar, Blank or with Holidays” on page 284
Syntax
HOLIDUR variable;
Required Argument
variable
contains the duration of each holiday.
Details
n If you use both the HOLIFIN and HOLIDUR statements, then PROC CALENDAR
uses the HOLIFIN variable value to define each holiday's duration.
n Set the unit of the holiday duration variable in the same way that you set the
unit of the duration variable; use either the INTERVAL= and DAYLENGTH=
options or the CALEDATA= data set.
n Duration is measured inclusively from the start of the holiday (as given in the
HOLISTART variable). In the output, any holiday lasting at least half a day
appears as lasting a full day.
HOLIFIN Statement
Specifies the variable in the holidays data set that contains the finishing date of each holiday.
Alias: HOLIFINISH
Default: If you do not use a HOLIFIN or HOLIDUR statement, then all holidays last one day.
Supports: Schedule calendars
Syntax
HOLIFIN variable;
Required Argument
variable
contains the finishing date of each holiday.
Details
If you use both the HOLIFIN and HOLIDUR statements, then PROC CALENDAR
uses only the HOLIFIN variable.
HOLISTART Statement
Specifies a variable in the holidays data set that contains the starting date of each holiday.
Aliases: HOLISTA
HOLIDAY
Requirement: When you use a holidays data set, HOLISTART is required.
Supports: Summary and schedule calendars
Examples: “Example 1: Schedule Calendar with Holidays: 5-Day Default” on page 261
“Example 5: Schedule Calendar, Blank or with Holidays” on page 284
Syntax
HOLISTART variable;
Required Argument
variable
contains the starting date of each holiday.
Details
n The holidays data set does not need to be sorted.
n All holidays last only one day, unless you use a HOLIFIN or HOLIDUR
statement.
n If two or more holidays occur on the same day, then PROC CALENDAR uses
only the first observation.
HOLIVAR Statement
Specifies a variable in the holidays data set whose values are used to label the holidays.
Aliases: HOLIVARIABLE
HOLINAME
Default: If you do not use a HOLIVAR statement, then PROC CALENDAR uses the word
DATE to identify holidays.
Supports: Summary and schedule calendars
Examples: “Example 1: Schedule Calendar with Holidays: 5-Day Default” on page 261
“Example 5: Schedule Calendar, Blank or with Holidays” on page 284
Syntax
HOLIVAR variable;
Required Argument
variable
a variable whose values are used to label the holidays. Typically, this variable
contains the names of the holidays.
MEAN Statement
Specifies numeric variables in the activities data set for which mean values are to be calculated for
each month.
Syntax
MEAN variable(s) </ FORMAT=format-name>;
Required Argument
variable(s)
specifies one or more numeric variables for which mean values are calculated for each month.
Optional Argument
FORMAT=format-name
names a SAS or user-defined format to be used in displaying the means
requested.
Alias F=
Details
n The means appear at the bottom of the summary calendar page, if there is room.
Otherwise, they appear on the following page.
n The means appear in the LEGEND box if you specify the LEGEND option.
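A sketch of a summary calendar (no DUR or FIN statement) that uses the SUM and MEAN statements with the LEGEND and MEANTYPE= options. The data set SALES and its values are hypothetical. The three July observations total 35 units, so MEANTYPE=NOBS would divide by 3 (about 11.67), while MEANTYPE=NDAYS divides by the number of days displayed in the month (31 when all of July is shown, about 1.13).

```sas
/* Hypothetical daily values, already in START order */
data sales;
   input date :date7. units revenue;
   format date date7.;
   datalines;
01JUL02 12 480
02JUL02 8 320
05JUL02 15 600
;

/* Monthly totals and means appear in the legend box */
proc calendar data=sales legend meantype=ndays;
   start date;
   sum units revenue / format=comma9.;
   mean units / format=8.2;
run;
```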
OUTDUR Statement
Specifies in days the length of the week to be displayed.
Alias: OUTDURATION
Requirement: The OUTSTART statement is required.
Syntax
OUTDUR number-of-days;
Required Argument
number-of-days
an integer that expresses the length in days of the week to be displayed.
Details
Use either the OUTDUR or OUTFIN statement to supply the procedure with
information about the length of the week to display. If you use both, then PROC
CALENDAR ignores the OUTDUR statement.
OUTFIN Statement
Specifies the last day of the week to display in the calendar.
Alias: OUTFINISH
Requirement: The OUTSTART statement is required.
See: “Example 8: Multiple Summary Calendars with Atypical Work Shifts (Separated
Output)” on page 300
Examples: “Example 3: Multiple Schedule Calendars with Atypical Work Shifts (Separated
Output)” on page 271
“Example 4: Multiple Schedule Calendars with Atypical Work Shifts (Combined and
Mixed Output)” on page 278
“Example 8: Multiple Summary Calendars with Atypical Work Shifts (Separated
Output)” on page 300
Syntax
OUTFIN day-of-week;
Required Argument
day-of-week
the name of the last day of the week to display. For example,
outfin friday;
Details
Use either the OUTFIN or OUTDUR statement to supply the procedure with
information about the length of the week to display. If you use both, then PROC
CALENDAR uses only the OUTFIN statement.
OUTSTART Statement
Specifies the starting day of the week to display in the calendar.
Alias: OUTSTA
Default: If you do not use OUTSTART, then each calendar week begins with Sunday.
See: “Example 8: Multiple Summary Calendars with Atypical Work Shifts (Separated
Output)” on page 300
Examples: “Example 3: Multiple Schedule Calendars with Atypical Work Shifts (Separated
Output)” on page 271
“Example 4: Multiple Schedule Calendars with Atypical Work Shifts (Combined and
Mixed Output)” on page 278
“Example 8: Multiple Summary Calendars with Atypical Work Shifts (Separated
Output)” on page 300
Syntax
OUTSTART day-of-week;
Required Argument
day-of-week
the name of the starting day of the week for each week in the calendar. For
example,
outstart monday;
Details
By default, a calendar displays all seven days in a week. Use OUTDUR or OUTFIN,
in conjunction with OUTSTART, to control how many days are displayed and which
day starts the week.
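The two steps below are a sketch of a Monday-through-Friday display, once with OUTDUR and once with OUTFIN. The activities data set ACT is hypothetical. OUTSTART is required in both cases.

```sas
/* Hypothetical activities data set, already in START order */
data act;
   input date :date7. task :$12. days;
   format date date7.;
   datalines;
01JUL02 Planning 2
03JUL02 Review 1
;

/* Five-day week starting on Monday, using OUTDUR */
proc calendar data=act;
   start date;
   dur days;
   outstart monday;
   outdur 5;
run;

/* Same week layout, naming the last day with OUTFIN instead */
proc calendar data=act;
   start date;
   dur days;
   outstart monday;
   outfin friday;
run;
```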
START Statement
Specifies the variable in the activities data set that contains the starting date of each activity.
Aliases: STA
DATE
ID
Requirement: START is required for both summary and schedule calendars.
Example: “Example 1: Schedule Calendar with Holidays: 5-Day Default” on page 261
Syntax
START variable;
Required Argument
variable
contains the starting date of each activity.
Both the START and FIN variables must have matching formats. For
example, if one contains datetime values, then so must the other.
SUM Statement
Specifies numeric variables in the activities data set to total for each month.
Syntax
SUM variable(s) </ FORMAT=format-name>;
Required Argument
variable(s)
specifies one or more numeric variables to total for each month.
Optional Argument
FORMAT=format-name
names a SAS or user-defined format to use in displaying the sums requested.
Alias F=
Details
n The sum appears at the bottom of the calendar page, if there is room.
Otherwise, it appears on the following page.
n The sum appears in the LEGEND box if you specify the LEGEND option.
VAR Statement
Specifies the variables that you want to display for each activity.
Alias: VARIABLE
Example: “Example 6: Calculating a Schedule Based on Completion of Predecessor Tasks” on
page 287
Syntax
VAR variable(s);
Required Argument
variable(s)
specifies one or more variables that you want to display in the calendar.
Details
Display of Variables
n PROC CALENDAR displays variables in the order in which they appear in the
VAR statement. Not all variables are displayed, however, if the LINESIZE= and
PAGESIZE= settings do not allow enough space in the calendar.
n PROC CALENDAR also displays any variable named in a SUM or MEAN
statement for each activity in the calendar output. It displays the variable even
if you do not name that variable in a VAR statement.
n the BY statement
PROC CALENDAR always prints one calendar for every month that contains any
activities. If you specify the FILL option, then the procedure prints every month
between the first and last activities, including months that contain no activities.
Using the BY statement prints one set of output for each BY value. Using the CALID
statement with OUTPUT=SEPARATE prints one set of output for each value of the
CALID variable.
The length of the activity lines depends on the amount of horizontal space
available. You can increase the length by using the following:
n a larger line size with the LINESIZE= option in the OPTIONS statement
n the WEEKDAYS option to suppress the printing of Saturday and Sunday, which
provides more space for Monday through Friday
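A minimal sketch combining both approaches, assuming a hypothetical activities data set ACT. The LINESIZE= value of 150 is an arbitrary choice for illustration.

```sas
/* A larger line size gives each day cell more horizontal room */
options linesize=150;

/* Hypothetical activities data set */
data act;
   input date :date7. task :$12. days;
   format date date7.;
   datalines;
01JUL02 Planning 3
;

/* WEEKDAYS drops Saturday and Sunday and implies INTERVAL=WORKDAY */
proc calendar data=act weekdays;
   start date;
   dur days;
   var task;
run;
```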
If your printer supports an extended character set (one that includes graphics
characters in addition to the regular alphanumeric characters), then you can greatly
improve the appearance of your output by using the FORMCHAR= option to
redefine formatting characters with hexadecimal characters. For information about
which hexadecimal codes to use for which characters, consult the documentation
for your hardware. For an example of assigning hexadecimal values, see
“formatting-character(s)” on page 239.
Details
This example does the following:
n creates a schedule calendar
n uses one of the two default work patterns: 8-hour day, 5-day week
Program
data allacty;
input date : date7. event $ 9-36 who $ 37-48 long;
datalines;
01JUL02 Dist. Mtg.                  All         1
Program Description
Create the activities data set. Allacty contains both personal and business
activities information for a bank president.
data allacty;
input date : date7. event $ 9-36 who $ 37-48 long;
datalines;
01JUL02 Dist. Mtg.                  All         1
17JUL02 Bank Meeting                1st Natl    1
02JUL02 Mgrs. Meeting               District 6  2
Sort the activities data set by the variable that contains the starting date. You are
not required to sort the holidays data set.
proc sort data=allacty;
by date;
run;
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set. WEEKDAYS specifies that a week consists of five
eight-hour work days.
proc calendar data=allacty holidata=hol weekdays;
Specify an activity start date variable and an activity duration variable. The
START statement specifies the variable in the activities data set that contains the
starting date of the activities; DUR specifies the variable that contains the duration
of each activity. Creating a schedule calendar requires START and either DUR or FIN.
start date;
dur long;
holistart date;
holivar holiday;
holidur holilong;
Output: HTML
Output 9.4 Summer Planning Calendar: Julia Cho
Details
This example builds on Example 1 by identifying activities as belonging to one of
two calendars, business or personal. This example does the following:
n produces a schedule calendar report
n uses one of the two default work patterns: 24-hour day, 7-day week
Program
data allacty2;
input date:date7. happen $ 10-34 who $ 35-47 _CAL_ $ long;
datalines;
01JUL02  Dist. Mtg.               All          CAL1 1
02JUL02  Mgrs. Meeting            District 6   CAL1 2
03JUL02  Interview                JW           CAL1 1
05JUL02  VIP Banquet              JW           CAL1 1
06JUL02  Beach trip               family       CAL2 2
08JUL02  Sales Drive              District 6   CAL1 5
08JUL02  Trade Show               Knox         CAL1 3
09JUL02  Orthodontist             Meagan       CAL2 1
11JUL02  Mgrs. Meeting            District 7   CAL1 2
11JUL02  Planning Council         Group II     CAL1 1
12JUL02  Seminar                  White        CAL1 1
14JUL02  Co. Picnic               All          CAL1 1
14JUL02  Business trip            Fred         CAL2 2
15JUL02  Sales Drive              District 7   CAL1 5
16JUL02  Dentist                  JW           CAL1 1
Program Description
Create the activities data set and identify separate calendars. Allacty2 contains
both personal and business activities for a bank president. The _CAL_ variable
identifies which calendar an event belongs to.
data allacty2;
input date:date7. happen $ 10-34 who $ 35-47 _CAL_ $ long;
datalines;
01JUL02  Dist. Mtg.               All          CAL1 1
02JUL02  Mgrs. Meeting            District 6   CAL1 2
03JUL02  Interview                JW           CAL1 1
05JUL02  VIP Banquet              JW           CAL1 1
06JUL02  Beach trip               family       CAL2 2
08JUL02  Sales Drive              District 6   CAL1 5
08JUL02  Trade Show               Knox         CAL1 3
09JUL02  Orthodontist             Meagan       CAL2 1
11JUL02  Mgrs. Meeting            District 7   CAL1 2
Create the holidays data set and identify which calendar a holiday affects. The
_CAL_ variable identifies which calendar a holiday belongs to.
data vac;
input hdate:date7. holiday $ 11-25 _CAL_ $ ;
datalines;
29JUL02   vacation       CAL2
04JUL02   Independence   CAL1
;
Sort the activities data set by the variable that contains the starting date. When
creating a calendar with combined output, you sort only by the activity starting
date, not by the CALID variable. You are not required to sort the holidays data set.
proc sort data=allacty2;
by date;
run;
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set. By default, the output calendar displays a 7-day
week.
proc calendar data=allacty2 holidata=vac;
Combine all events and holidays on a single calendar. The CALID statement
specifies the variable that identifies which calendar an event belongs to.
OUTPUT=COMBINE places all events and holidays on the same calendar.
calid _CAL_ / output=combine;
Specify an activity start date variable and an activity duration variable. The
START statement specifies the variable in the activities data set that contains the
starting date of the activities; DUR specifies the variable that contains the duration
of each activity. Creating a schedule calendar requires START and either DUR or FIN.
start date ;
dur long;
Output: HTML
Output 9.5 Summer Planning Calendar - Work and Home Schedule
Example 3: Multiple Schedule Calendars with Atypical Work Shifts (Separated Output)
Details
This example does the following:
• produces separate output pages for each calendar in a single PROC step
Program
libname well 'SAS-library';
data well.act;
input task & $16. dur : 5. date : datetime16. _cal_ $ cost;
datalines;
Drill Well 3.50 01JUL02:12:00:00 CAL1 1000
Lay Power Line 3.00 04JUL02:12:00:00 CAL1 2000
Assemble Tank 4.00 05JUL02:08:00:00 CAL1 1000
Build Pump House 3.00 08JUL02:12:00:00 CAL1 2000
Pour Foundation 4.00 11JUL02:08:00:00 CAL1 1500
Install Pump 4.00 15JUL02:14:00:00 CAL1 500
Install Pipe 2.00 19JUL02:08:00:00 CAL1 1000
Erect Tower 6.00 20JUL02:08:00:00 CAL1 2500
Deliver Material 2.00 01JUL02:12:00:00 CAL2 500
Excavate 4.75 03JUL02:08:00:00 CAL2 3500
;
data well.hol;
input date date. holiday $ 11-25 _cal_ $;
datalines;
09JUL02 Vacation CAL2
04JUL02 Independence CAL1
;
data well.cal;
input _sun_ $ _sat_ $ _mon_ $ _tue_ $ _wed_ $ _thu_ $
_fri_ $ _cal_ $;
datalines;
Holiday Holiday Workday Workday Workday Workday Workday CAL1
Holiday Halfday Workday Workday Workday Workday Workday CAL2
;
data well.wor;
input halfday time5.;
datalines;
08:00
12:00
;
proc sort data=well.act;
by _cal_ date;
run;
options formchar="|----|+|---+=|-/\<>*";
proc calendar data=well.act
holidata=well.hol
caledata=well.cal
workdata=well.wor
datetime;
calid _cal_ / output=separate;
start date;
dur dur;
holistart date;
holivar holiday;
outstart Monday;
outfin Saturday;
title1 'Well Drilling Work Schedule: Separate Calendars';
format cost dollar9.2;
run;
Program Description
Specify a library so that you can permanently store the activities data set.
libname well 'SAS-library';
Create the activities data set and identify separate calendars. Well.Act is a
permanent SAS data set that contains activities for a well construction project. The
_CAL_ variable identifies the calendar that an activity belongs to.
data well.act;
input task & $16. dur : 5. date : datetime16. _cal_ $ cost;
datalines;
Drill Well 3.50 01JUL02:12:00:00 CAL1 1000
Lay Power Line 3.00 04JUL02:12:00:00 CAL1 2000
Assemble Tank 4.00 05JUL02:08:00:00 CAL1 1000
Build Pump House 3.00 08JUL02:12:00:00 CAL1 2000
Pour Foundation 4.00 11JUL02:08:00:00 CAL1 1500
Install Pump 4.00 15JUL02:14:00:00 CAL1 500
Install Pipe 2.00 19JUL02:08:00:00 CAL1 1000
Erect Tower 6.00 20JUL02:08:00:00 CAL1 2500
Deliver Material 2.00 01JUL02:12:00:00 CAL2 500
Excavate 4.75 03JUL02:08:00:00 CAL2 3500
;
Create the holidays data set. The _CAL_ variable identifies the calendar that a
holiday belongs to.
data well.hol;
input date date. holiday $ 11-25 _cal_ $;
datalines;
09JUL02 Vacation CAL2
04JUL02 Independence CAL1
;
Create the calendar data set. Each observation defines the work shifts for an entire
week. The _CAL_ variable identifies to which calendar the work shifts apply. CAL1
uses the default 8-hour work shifts for Monday through Friday. CAL2 uses a half
day on Saturday and the default 8-hour work shift for Monday through Friday.
data well.cal;
input _sun_ $ _sat_ $ _mon_ $ _tue_ $ _wed_ $ _thu_ $
_fri_ $ _cal_ $;
datalines;
Holiday Holiday Workday Workday Workday Workday Workday CAL1
Holiday Halfday Workday Workday Workday Workday Workday CAL2
;
Create the workdays data set. This data set defines the daily work shifts that are
named in the calendar data set. Each variable (not observation) contains one daily
schedule of alternating work and nonwork periods. The HALFDAY work shift lasts 4
hours.
data well.wor;
input halfday time5.;
datalines;
08:00
12:00
;
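A workdays variable can hold more than one work period, because its values alternate between the start and the end of each period. The following sketch assumes a hypothetical SPLIT shift (08:00 to 12:00 and 13:00 to 17:00) and a hypothetical calendar data set that assigns it to Monday through Friday; the names WORK.WOR2, WORK.CAL2, and SPLIT are illustrative only. The last DATA step just adds up the periods to confirm that the shift covers 8 hours.

```sas
/* Hypothetical workdays data set: SPLIT alternates period start, period end. */
data work.wor2;
   input split time5.;
   format split time5.;
   datalines;
08:00
12:00
13:00
17:00
;

/* Hypothetical calendar data set that names the SPLIT shift for weekdays. */
data work.cal2;
   input _sun_ $ _mon_ $ _tue_ $ _wed_ $ _thu_ $ _fri_ $ _sat_ $;
   datalines;
Holiday Split Split Split Split Split Holiday
;

/* Odd observations start a work period and even observations end it.  */
/* SAS time values are stored in seconds, so divide by 3600 for hours.  */
data work.shift_hours;
   set work.wor2 end=last;
   retain period_start;
   if mod(_n_, 2) = 1 then period_start = split;
   else work_seconds + (split - period_start);
   if last then do;
      work_hours = work_seconds / 3600;
      output;
   end;
   keep work_hours;
run;
```

To use such a shift, these data sets would be named in CALEDATA= and WORKDATA= in place of Well.Cal and Well.Wor.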
Sort the activities data set by the variables that contain the calendar
identification and the starting date, respectively. You are not required to sort the
holidays data set.
proc sort data=well.act;
by _cal_ date;
run;
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment, where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set; CALEDATA= identifies the calendar data set;
WORKDATA= identifies the workdays data set. DATETIME specifies that the
variable specified with the START statement contains values in SAS datetime
format.
proc calendar data=well.act
holidata=well.hol
caledata=well.cal
workdata=well.wor
datetime;
Print each calendar on a separate page. The CALID statement specifies that the
_CAL_ variable identifies calendars. OUTPUT=SEPARATE prints information for
each calendar on separate pages.
calid _cal_ / output=separate;
Specify an activity start date variable and an activity duration variable. The
START statement specifies the variable in the activities data set that contains the
activity starting date; DUR specifies the variable that contains the activity duration.
START and either DUR or FIN are required for a schedule calendar.
start date;
dur dur;
Customize the calendar appearance. OUTSTART and OUTFIN specify that the
calendar display a 6-day week, Monday through Saturday.
outstart Monday;
outfin Saturday;
Output: HTML
Output 9.6 Part One of Well Drilling Work Schedule
Output 9.7 Part Two of Well Drilling Work Schedule
Example 4: Multiple Schedule Calendars with Atypical Work Shifts (Combined and Mixed Output)
Details
This example does the following:
• produces a schedule calendar
This example creates both combined and mixed output. Producing combined or
mixed calendar output requires only one change to a PROC CALENDAR step: the
setting of the OUTPUT= option in the CALID statement. Combined output is
produced first, then mixed output.
This example and “Example 3: Multiple Schedule Calendars with Atypical Work
Shifts (Separated Output)” on page 271 use the same input data for multiple
calendars to produce different output. The only differences in these programs are
how the activities data set is sorted and how the OUTPUT= option is set.
Program Description
Specify the SAS library where the activities data set is stored.
libname well
'SAS-library';
Sort the activities data set by the variable that contains the starting date. Do not
sort by the CALID variable when producing combined calendar output.
proc sort data=well.act;
by date;
run;
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment, where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set; CALEDATA= identifies the calendar data set;
WORKDATA= identifies the workdays data set. DATETIME specifies that the
variable specified with the START statement contains values in SAS datetime
format.
proc calendar data=well.act
holidata=well.hol
caledata=well.cal
workdata=well.wor
datetime;
Combine all events and holidays on a single calendar. The CALID statement
specifies that the _CAL_ variable identifies the calendars. OUTPUT=COMBINE
prints multiple calendars on the same page and identifies each calendar.
calid _cal_ / output=combine;
Specify an activity start date variable and an activity duration variable. The
START statement specifies the variable in the activities data set that contains the
starting date of the activities; DUR specifies the variable that contains the duration
of each activity. START and either DUR or FIN are required for a schedule calendar.
start date;
dur dur;
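The Details text says that mixed output needs only a different OUTPUT= value in the CALID statement. A sketch of that second step follows, assuming that Well.Act is still sorted by DATE and that the rest of the step matches the full Example 3 program (HOLISTART, HOLIVAR, OUTSTART, OUTFIN, and FORMAT statements included); the title text is illustrative.

```sas
/* Assumes Well.Act, Well.Hol, Well.Cal, and Well.Wor from Example 3, */
/* with Well.Act sorted BY DATE as in the step above.                  */
proc calendar data=well.act
              holidata=well.hol
              caledata=well.cal
              workdata=well.wor
              datetime;
   calid _cal_ / output=mix;   /* the only change from the combined step */
   start date;
   dur dur;
   holistart date;
   holivar holiday;
   outstart Monday;
   outfin Saturday;
   title1 'Well Drilling Work Schedule: Mixed Calendars';
   format cost dollar9.2;
run;
```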
Output: HTML
Output 9.8 Well Drilling Work Schedule: Combined Calendars
Output: HTML
Output 9.9 Well Drilling Work Schedule: Mixed Calendars
Example 5: Schedule Calendar, Blank or with Holidays
Details
This example produces a schedule calendar that displays only holidays. You can
use this same code to produce a set of blank calendars by removing the
HOLIDATA= option and the HOLISTART, HOLIVAR, and HOLIDUR statements from
the PROC CALENDAR step.
Program
data acts;
input sta : date7. act $ 11-30 dur;
datalines;
01JAN03 Start 0
31DEC03 Finish 0
;
data holidays;
input sta : date7. act $ 11-30 dur;
datalines;
01JAN03 New Year's 1
30MAR03 Good Friday 1
28MAY03 Memorial Day 1
04JUL03 Independence Day 1
03SEP03 Labor Day 1
22NOV03 Thanksgiving 2
25DEC03 Christmas Break 5
;
options formchar="|----|+|---+=|-/\<>*";
proc calendar data=acts holidata=holidays fill interval=workday;
start sta;
dur dur;
holistart sta;
holivar act;
holidur dur;
title1 'Calendar of Holidays Only';
run;
Program Description
Create the activities data set. Specify one activity in the first month and one in the
last, and give each activity a duration of 0. PROC CALENDAR does not print
activities with zero durations in the output.
data acts;
input sta : date7. act $ 11-30 dur;
datalines;
01JAN03 Start 0
31DEC03 Finish 0
;
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment, where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the calendar. DATA= identifies the activities data set; HOLIDATA= identifies
the holidays data set. FILL displays all months, even those with no activities. By
default, only months with activities appear in the report. INTERVAL=WORKDAY
specifies that activities and holidays are measured in 8-hour days and that PROC
CALENDAR schedules activities only Monday through Friday.
proc calendar data=acts holidata=holidays fill interval=workday;
Specify an activity start date variable and an activity duration variable. The
START statement specifies the variable in the activities data set that contains the
starting date of the activities; DUR specifies the variable that contains the duration
of each activity. Creating a schedule calendar requires START and either DUR or FIN.
start sta;
dur dur;
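Following the Details text for this example, a set of blank calendars can be sketched by dropping HOLIDATA= and the HOLISTART, HOLIVAR, and HOLIDUR statements. This assumes the ACTS data set created above: both of its activities have a duration of 0, so nothing prints on the calendar, while FILL still displays every month from January through December. The title text is illustrative.

```sas
options formchar="|----|+|---+=|-/\<>*";

/* Same step as above, minus HOLIDATA= and the HOLI* statements. */
/* Assumes the ACTS data set from this example.                  */
proc calendar data=acts fill interval=workday;
   start sta;
   dur dur;
   title1 'Blank Calendars for 2003';
run;
```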
Output: HTML
The following output shows the December portion of the output. Without the
INTERVAL=WORKDAY option, the 5-day Christmas break would be scheduled
through the weekend.
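The weekend effect can be approximated with the INTNX function. This is only an analogy, not PROC CALENDAR's internal logic: the WEEKDAY interval skips Saturdays and Sundays, while the DAY interval counts every calendar day. December 25, 2003, fell on a Thursday, so a 5-day span that starts that day ends on December 31 when weekends are skipped, but on December 29 when they are not. The data set name XMAS is hypothetical.

```sas
data xmas;
   start = '25dec2003'd;                          /* a Thursday */
   /* A 5-day span ends 4 intervals after its first day. */
   end_workdays = intnx('weekday', start, 4);    /* skips Saturday and Sunday */
   end_calendar = intnx('day', start, 4);        /* counts every day */
   format start end_workdays end_calendar date9.;
run;

proc print data=xmas noobs;
run;
```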
Example 6: Calculating a Schedule Based on Completion of Predecessor Tasks
Details
This example does the following:
• calculates a project schedule containing multiple calendars (PROC CPM)
• produces a listing of the PROC CPM output data set (PROC PRINT)
This example features PROC CPM's ability to calculate a schedule that meets the
following criteria:
• is based on an initial starting date
In order to use PROC CPM, you must complete the following steps:
1 Create an activities data set that contains activities with durations. (You can
indicate nonwork days, weekly work schedules, and work shifts with holidays,
calendar, and work-shift data sets.)
PROC CPM can process your data to generate a data set that contains the start and
end dates for each activity. PROC CPM schedules the activities based on the
duration information, weekly work patterns, and work shifts, as well as holidays and
nonwork days that interrupt the schedule. You can generate several views of the
schedule that is computed by PROC CPM, from a simple listing of start and finish
dates to a calendar, a Gantt chart, or a network diagram.
See Also
This example introduces users of PROC CALENDAR to more advanced SAS
scheduling tools. For an introduction to project management tasks and tools and
several examples, see Project Management Using the SAS System. For more
examples, see SAS/OR Software: Project Management Examples. For complete
reference documentation, see SAS/OR® 9.3 User's Guide: Project Management.
Program
options formchar="|----|+|---+=|-/\<>*";
data grant;
input jobnum Task $ 4-22 Days Succ1 $ 27-45 aldate : date7. altype $
_cal_ $;
format aldate date7.;
datalines;
1 Run Exp 1 11 Analyze Exp 1 . . Student
2 Analyze Exp 1 5 Send Report 1 . . Prof.
3 Send Report 1 0 Run Exp 2 . . Prof.
4 Run Exp 2 11 Analyze Exp 2 . . Student
5 Analyze Exp 2 4 Send Report 2 . . Prof.
6 Send Report 2 0 Write Final Report . . Prof.
7 Write Final Report 4 Send Final Report . . Prof.
8 Send Final Report 0 . . Student
9 Site Visit 1 18jul07 ms Prof.
;
data nowork;
format holista date7. holifin date7.;
input holista : date7. holifin : date7. name $ 17-32 _cal_ $;
datalines;
04jul07 04jul07 Independence Day Prof.
03sep07 03sep07 Labor Day Prof.
04jul07 04jul07 Independence Day Student
03sep07 03sep07 Labor Day Student
16jul07 17jul07 PROF Vacation Prof.
16aug07 17aug07 STUDENT Vacation Student
;
proc cpm data=grant
date='01jul07'd
interval=weekday
out=gcpm1
holidata=nowork;
activity task;
successor succ1;
duration days;
calid _cal_;
id task;
aligndate aldate;
aligntype altype;
holiday holista / holifin=holifin;
run;
proc print data=gcpm1;
title 'Data Set GCPM1, Created with PROC CPM';
run;
proc sort data=gcpm1;
by e_start;
run;
proc calendar data=gcpm1
holidata=nowork
interval=workday;
start e_start;
fin e_finish;
calid _cal_ / output=combine;
holistart holista;
holifin holifin;
holivar name;
var task;
title 'Schedule for Experiment X-15';
title2 'Professor and Student Schedule';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment, where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the activities data set and identify separate calendars. This data identifies
two calendars: the professor's (the value of _CAL_ is Prof.) and the student's (the
value of _CAL_ is Student). The Succ1 variable identifies which activity cannot begin
until the current one ends. For example, Analyze Exp 1 cannot begin until Run Exp 1
is completed. The DAYS value of 0 for JOBNUM 3, 6, and 8 indicates that these jobs
are milestones.
data grant;
input jobnum Task $ 4-22 Days Succ1 $ 27-45 aldate : date7. altype $
_cal_ $;
format aldate date7.;
datalines;
1 Run Exp 1 11 Analyze Exp 1 . . Student
2 Analyze Exp 1 5 Send Report 1 . . Prof.
3 Send Report 1 0 Run Exp 2 . . Prof.
4 Run Exp 2 11 Analyze Exp 2 . . Student
5 Analyze Exp 2 4 Send Report 2 . . Prof.
6 Send Report 2 0 Write Final Report . . Prof.
7 Write Final Report 4 Send Final Report . . Prof.
8 Send Final Report 0 . . Student
9 Site Visit 1 18jul07 ms Prof.
;
Create the holidays data set and identify which calendar a nonwork day belongs
to. The two holidays are listed twice, once for the professor's calendar and once for
the student's. Because each person is associated with a separate calendar, PROC
CPM can apply the personal vacation days to the appropriate calendars.
data nowork;
format holista date7. holifin date7.;
input holista : date7. holifin : date7. name $ 17-32 _cal_ $;
datalines;
04jul07 04jul07 Independence Day Prof.
03sep07 03sep07 Labor Day Prof.
04jul07 04jul07 Independence Day Student
03sep07 03sep07 Labor Day Student
16jul07 17jul07 PROF Vacation Prof.
16aug07 17aug07 STUDENT Vacation Student
;
Calculate the schedule with PROC CPM. PROC CPM uses information supplied in
the activities and holidays data sets to calculate start and finish dates for each
activity. The DATE= option supplies the starting date of the project. The CALID
statement is not required, even though this example includes two calendars,
because the calendar identification variable has the special name _CAL_.
proc cpm data=grant
date='01jul07'd
interval=weekday
out=gcpm1
holidata=nowork;
activity task;
successor succ1;
duration days;
calid _cal_;
id task;
aligndate aldate;
aligntype altype;
holiday holista / holifin=holifin;
run;
Print the output data set that was created with PROC CPM. This step is not
required. PROC PRINT is a useful way to view the calculations produced by PROC
CPM.
proc print data=gcpm1;
title 'Data Set GCPM1, Created with PROC CPM';
run;
Sort GCPM1 by the variable that contains the activity start dates before using it
with PROC CALENDAR.
proc sort data=gcpm1;
by e_start;
run;
Create the schedule calendar. GCPM1 is the activity data set. PROC CALENDAR
uses the E_START and E_FINISH dates, calculated by PROC CPM, to print the
schedule. The VAR statement selects only the variable TASK to display on the
calendar output.
proc calendar data=gcpm1
holidata=nowork
interval=workday;
start e_start;
fin e_finish;
calid _cal_ / output=combine;
holistart holista;
holifin holifin;
holivar name;
var task;
title 'Schedule for Experiment X-15';
title2 'Professor and Student Schedule';
run;
Output: HTML
PROC PRINT displays the observations in GCPM1, showing the scheduling
calculations created by PROC CPM.
PROC CALENDAR created the following schedule calendar by using the E_START
and E_FINISH dates that were calculated by PROC CPM. Because the activities on
July 25 and August 15 are milestones, they do not delay the start of a successor
activity. Note that Site Visit occurs on July 18, the same day that Analyze Exp 1
occurs. To prevent this overallocation of resources, you can use resource-constrained
scheduling, available in SAS/OR software.
Details
This example does the following:
• produces a summary calendar
• displays holidays
• produces sum and mean values by business day (observation) for three
variables
• prints a legend and uses variable labels
To produce MEAN values based on the number of days in the calendar month, use
MEANTYPE=NDAYS. By default, MEANTYPE=NOBS, which calculates the MEAN
values according to the number of days for which data exists.
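The two MEANTYPE= settings differ only in the divisor. The following sketch works the arithmetic by hand for the first five Brkfst values from the MEALS data (123, 188, 123, 200, and 176), following the description above: NOBS divides by the number of days that have data, and NDAYS divides by the number of days in the month (31 for December). It illustrates the rule as stated here, not PROC CALENDAR's internal computation; the data set names MEAL_MEANS and MEAN_CHECK are hypothetical.

```sas
/* Hypothetical subset: the first five days of Brkfst values from MEALS. */
data meal_means;
   input date : date7. brkfst;
   datalines;
01Dec08 123
02Dec08 188
03Dec08 123
04Dec08 200
05Dec08 176
;

data mean_check;
   set meal_means end=last;
   total + brkfst;                 /* running sum of breakfasts */
   nobs + 1;                       /* days for which data exists */
   if last then do;
      ndays = day(intnx('month', date, 0, 'end'));   /* days in the month */
      mean_nobs  = total / nobs;    /* MEANTYPE=NOBS divisor */
      mean_ndays = total / ndays;   /* MEANTYPE=NDAYS divisor */
      output;
   end;
   keep total nobs ndays mean_nobs mean_ndays;
run;
```

Here MEAN_NOBS is 810 / 5 = 162, while MEAN_NDAYS is 810 / 31, about 26.13.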
Program
data meals;
input date : date7. Brkfst Lunch Dinner;
datalines;
01Dec08 123 234 238
02Dec08 188 188 198
03Dec08 123 183 176
04Dec08 200 267 243
05Dec08 176 165 177
08Dec08 178 198 187
09Dec08 165 176 187
Program Description
Create the Activities data set. MEALS records how many meals were served for
breakfast, lunch, and dinner on the days that the cafeteria was open for business.
data meals;
input date : date7. Brkfst Lunch Dinner;
datalines;
01Dec08 123 234 238
02Dec08 188 188 198
03Dec08 123 183 176
04Dec08 200 267 243
05Dec08 176 165 177
08Dec08 178 198 187
09Dec08 165 176 187
10Dec08 187 176 231
11Dec08 176 187 222
12Dec08 187 187 123
15Dec08 176 165 177
16Dec08 156 . 167
17Dec08 198 143 167
18Dec08 178 198 187
19Dec08 165 176 187
22Dec08 187 187 123
;
Sort the Activities data set by the activity starting date. You are not required to
sort the Holidays data set.
proc sort data=meals;
by date;
run;
Create picture formats for the variables that indicate how many meals were
served.
proc format;
picture bfmt other = '000 Brkfst';
picture lfmt other = '000 Lunch ';
picture dfmt other = '000 Dinner';
run;
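To see what one of these picture formats does to a single value, the PUT function can apply it directly. The zero digit selectors in '000 Brkfst' hold up to three digits, and the rest of the picture prints as literal text. The sketch repeats the BFMT definition so that it runs on its own; the variable SHOWN is hypothetical.

```sas
proc format;
   picture bfmt other = '000 Brkfst';
run;

data _null_;
   length shown $10;
   shown = put(123, bfmt.);   /* digits followed by the literal label */
   put shown=;
run;
```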
Set the FORMCHAR option. Setting FORMCHAR to this exact
string renders better HTML output when it is viewed outside of the SAS
environment, where SAS Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the summary calendar. DATA= identifies the Activities data set;
HOLIDATA= identifies the Holidays data set. The START statement specifies the
variable in the Activities data set that contains the activity starting date; START is
required.
proc calendar data=meals holidata=closed;
start date;
Calculate, label, and format the sum and mean values. The SUM and MEAN
statements calculate sum and mean values for three variables and print them with
the specified format. The LABEL statement prints a legend and uses labels instead
of variable names. The FORMAT statement associates picture formats with three
variables.
sum brkfst lunch dinner / format=4.0;
mean brkfst lunch dinner / format=6.2;
label brkfst = 'Breakfasts Served'
lunch = ' Lunches Served'
dinner = ' Dinners Served';
format brkfst bfmt.
lunch lfmt.
dinner dfmt.;
Output: HTML
Output 9.14 Meals Served in Company Cafeteria - Mean Number by Business Day
Details
This example does the following:
• produces a summary calendar for multiple calendars in a single PROC step
• displays holidays
• uses separate work patterns, work shifts, and holidays for each calendar
Program
libname well
'SAS-library';
run;
proc sort data=well.act;
by _cal_ date;
run;
options formchar="|----|+|---+=|-/\<>*" linesize=132;
proc calendar data=well.act
holidata=well.hol
datetime legend;
calid _cal_ / output=separate;
start date;
holistart date;
holivar holiday;
sum cost / format=dollar10.2;
outstart Monday;
outfin Saturday;
title 'Well Drilling Cost Summary';
title2 'Separate Calendars';
format cost dollar10.2;
run;
Program Description
Specify the SAS library where the Activities data set is stored.
libname well
'SAS-library';
run;
Sort the Activities data set by the variables containing the calendar identification
and the starting date, respectively.
proc sort data=well.act;
by _cal_ date;
run;
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment, where SAS
Monospace fonts are not available. LINESIZE=132 is set in this example to
prevent data from being truncated in the output.
options formchar="|----|+|---+=|-/\<>*" linesize=132;
Create the summary calendar. DATA= identifies the Activities data set;
HOLIDATA= identifies the Holidays data set. DATETIME specifies that the
variable specified with the START statement contains a SAS datetime value.
LEGEND prints text that identifies the variables.
proc calendar data=well.act
holidata=well.hol
datetime legend;
Print each calendar on a separate page. The CALID statement specifies that the
_CAL_ variable identifies calendars. OUTPUT=SEPARATE prints information for
each calendar on separate pages.
calid _cal_ / output=separate;
Specify an activity start date variable and retrieve holiday information. The
START statement specifies the variable in the Activities data set that contains the
activity starting date. The HOLISTART and HOLIVAR statements specify the
variables in the Holidays data set that contain the start date and name of each
holiday, respectively. These statements are required when you use a Holidays data
set.
start date;
holistart date;
holivar holiday;
Calculate sum values. The SUM statement totals the COST variable for all
observations in each calendar.
sum cost / format=dollar10.2;
Display a 6-day week. OUTSTART and OUTFIN specify that the calendar display a
6-day week, Monday through Saturday.
outstart Monday;
outfin Saturday;
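As a cross-check outside PROC CALENDAR, the per-calendar totals of COST can be computed directly. The sketch below copies only the _CAL_ and COST columns of the Well.Act data into a hypothetical WORK.COSTS data set and sums COST by calendar with PROC SQL; with these values, CAL1 totals $11,500.00 and CAL2 totals $4,000.00.

```sas
/* Hypothetical copy of the _CAL_ and COST columns from Well.Act. */
data work.costs;
   input _cal_ $ cost;
   datalines;
CAL1 1000
CAL1 2000
CAL1 1000
CAL1 2000
CAL1 1500
CAL1 500
CAL1 1000
CAL1 2500
CAL2 500
CAL2 3500
;

/* One row per calendar with its total cost. */
proc sql;
   create table work.cost_totals as
   select _cal_, sum(cost) as total_cost format=dollar10.2
   from work.costs
   group by _cal_;
quit;

proc print data=work.cost_totals noobs;
run;
```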
Output: HTML
Output 9.15 Part One of Well Drilling Cost Summary
10
CATALOG Procedure
For more information about SAS libraries and catalogs, see SAS Language
Reference: Concepts.
You can perform similar functions with the SAS Explorer window and with
DICTIONARY tables in the SQL procedure. For information about the Explorer
window, see the online Help. For information about PROC SQL, see SAS SQL
Procedure User’s Guide.
See: CATALOG Procedure under Windows, UNIX, z/OS
Chapter 10, “CATALOG Procedure”: Copy entries from one SAS catalog to another (Ex. 1, Ex. 2, Ex. 3)
Syntax
PROC CATALOG CATALOG=<libref.>catalog <ENTRYTYPE=entry-type>
<FORCE> <KILL>;
Required Argument
CATALOG=<libref.>catalog
specifies the SAS catalog to process.
Aliases CAT=
C=
Example “Example 3: Using the FORCE Option with the KILL Option” on page
331
Optional Arguments
ENTRYTYPE=entry-type
restricts processing of the current PROC CATALOG step to one entry type.
Alias ET=
Interactions The specified entry type applies to any one-level entry names that
are used in a subordinate statement. You cannot override this
specification in a subordinate statement.
The ENTRYTYPE= option does not restrict the effects of the KILL
option.
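As a sketch of the first interaction, the step below uses a hypothetical catalog WORK.DEMO whose two SOURCE entries are written with the CATALOG access method of the FILENAME statement. Because ENTRYTYPE=SOURCE is set in the PROC CATALOG statement, the one-level name NOTES in the DELETE statement is read as NOTES.SOURCE.

```sas
/* Hypothetical setup: two SOURCE entries in WORK.DEMO. */
filename intro catalog 'work.demo.intro.source';
filename notes catalog 'work.demo.notes.source';

data _null_;
   file intro;
   put 'Introductory text';
run;

data _null_;
   file notes;
   put 'Working notes';
run;

/* ENTRYTYPE= (alias ET=) applies to every one-level name that follows. */
proc catalog catalog=work.demo entrytype=source;
   delete notes;
run;
quit;
```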
FORCE
forces statements to execute on a catalog that is opened by another resource
environment.
Tip Use the FORCE option to execute the statement, even if exclusive
access cannot be obtained.
Example “Example 3: Using the FORCE Option with the KILL Option” on page
331
KILL
deletes all entries in the specified SAS catalog.
CAUTION
Do not attempt to limit the effects of the KILL option. This option deletes all
entries in a SAS catalog before any option or other statement takes effect.
Interactions The KILL option deletes all catalog entries even when the
ENTRYTYPE= option is specified.
The SAVE statement has no effect because the KILL option deletes
all entries in a SAS catalog before any other statements are
processed.
Tip The KILL option deletes all entries but does not remove an empty
catalog from the SAS library. You must use another method, such
as PROC DATASETS or the DIR window to delete an empty SAS
catalog.
Example “Example 3: Using the FORCE Option with the KILL Option” on page
331
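The following sketch illustrates the KILL behavior and the tip above. It creates a hypothetical catalog WORK.SCRATCH that holds one format entry, empties it with KILL, and then removes the empty catalog with PROC DATASETS, one of the methods that the tip mentions.

```sas
/* Hypothetical catalog with a single FORMAT entry. */
proc format library=work.scratch;
   value yn 1 = 'Yes' 0 = 'No';
run;

/* KILL deletes every entry; the empty catalog stays in WORK. */
proc catalog catalog=work.scratch kill;
run;
quit;

/* Delete the now-empty catalog itself. */
proc datasets library=work memtype=catalog nolist;
   delete scratch;
run;
quit;
```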
CHANGE Statement
Renames one or more catalog entry names.
Tip: You can change multiple entry names in a single CHANGE statement or use
multiple CHANGE statements.
Example: “Example 2: Displaying Contents, Changing Names, and Changing a Description” on
page 329
Syntax
CHANGE old-name-1=new-name-1
<old-name-2=new-name-2 …>
</ ENTRYTYPE=entry-type>;
Required Argument
old-name=new-name
specifies the current name of a catalog entry and the new name that you want
to assign to it. Specify any valid SAS name.
Restriction You must designate the type of the entry, either with the name
(entry-name.entry-type) or with the ENTRYTYPE= option.
Optional Argument
ENTRYTYPE=entry-type
restricts processing to one entry type.
Alias ET=
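A minimal CHANGE sketch, assuming a hypothetical SOURCE entry WORK.DRAFTS.DRAFT.SOURCE written with the FILENAME statement's CATALOG access method. The entry type is supplied with ENTRYTYPE= in the CHANGE statement itself, which satisfies the restriction above.

```sas
/* Hypothetical setup: one SOURCE entry named DRAFT. */
filename drft catalog 'work.drafts.draft.source';

data _null_;
   file drft;
   put 'Draft text';
run;

proc catalog catalog=work.drafts;
   change draft=final / entrytype=source;   /* DRAFT.SOURCE becomes FINAL.SOURCE */
run;
quit;
```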
CONTENTS Statement
Lists the contents of a catalog in the procedure output or writes a list of the contents to a SAS data
set, an external file, or both.
Note: The ENTRYTYPE= option is not available for the CONTENTS statement.
Example: “Example 2: Displaying Contents, Changing Names, and Changing a Description” on
page 329
Syntax
CONTENTS <CATALOG=<libref.>catalog > <OUT=SAS-data-set> <FILE=fileref>;
Without Arguments
The output is sent to the procedure output.
Optional Arguments
CATALOG=<libref.>catalog
specifies the SAS catalog to process.
Aliases CAT=
C=
Default None
FILE=fileref
sends the contents to an external file that is identified with a SAS fileref.
Interaction If fileref has not been previously assigned to a file, then the file is
created and named according to operating environment-dependent
rules for external files.
OUT=SAS-data-set
sends the contents to a SAS data set. When the statement executes, a message
in the SAS log reports that a data set has been created. The data set contains
six variables in the following order:
Table 10.1 OUT= Output
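For example, the sketch below writes the contents of a hypothetical catalog WORK.FMTS, which holds two formats created with PROC FORMAT, to a data set and prints it. PROC PRINT is run without a VAR statement so that the sketch does not depend on the exact names of the output variables.

```sas
/* Hypothetical catalog with two format entries. */
proc format library=work.fmts;
   value yn 1 = 'Yes' 0 = 'No';
   value $gender 'F' = 'Female' 'M' = 'Male';
run;

proc catalog catalog=work.fmts;
   contents out=work.fmtlist;   /* one observation per catalog entry */
run;
quit;

proc print data=work.fmtlist;
run;
```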
COPY Statement
Copies some or all of the entries in one catalog to another catalog.
Syntax
COPY OUT=<libref.>catalog <options>;
Required Argument
OUT=<libref.>catalog
names the catalog to which entries are copied.
Optional Arguments
ENTRYTYPE=entry-type
restricts processing to one entry type for the current COPY statement and any
subsequent SELECT or EXCLUDE statements.
Alias ET=
IN=<libref.>catalog
specifies the catalog to copy.
Interaction The IN= option overrides a CATALOG= argument that was specified
in the PROC CATALOG statement.
LOCKCAT=EXCLUSIVE | SHARE
specifies whether to enable more than one user to copy to the same catalog at
the same time. Using LOCKCAT=SHARE locks individual entries rather than the
entire catalog, which enables greater throughput. The default is
LOCKCAT=EXCLUSIVE, which locks the entire catalog to one user. Note that
using the LOCKCAT=SHARE option can lessen performance if used in a single-
user environment because of the overhead associated with locking and
unlocking each entry.
MOVE
deletes the original catalog or entries after the new copy is made.
Interaction When the MOVE option removes all entries from a catalog, the
procedure deletes the catalog from the library.
NEW
overwrites the catalog (specified by the OUT= option) if it already exists. If you
omit the NEW option, then PROC CATALOG updates the catalog.
See For information about using the NEW option with concatenated catalogs,
see “Catalog Concatenation” on page 323.
NOEDIT
prevents the copied version of the following SAS/AF entry types from being
edited by the BUILD procedure:
CBT
FRAME
HELP
MENU
PROGRAM
SCL
SYSTEM
Restriction If you specify the NOEDIT option for an entry that is not one of the
above types, then it is ignored.
Tip When creating SAS/AF applications for other users, use the NOEDIT
option to protect the application by preventing certain catalog
entries from being altered.
NOSOURCE
omits copying the source lines when you copy a SAS/AF PROGRAM, FRAME, or
SCL entry.
Alias NOSRC
Restriction If you specify this option for an entry other than a PROGRAM,
FRAME, or SCL entry, then it is ignored.
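A sketch of a selective copy, assuming a hypothetical source catalog WORK.SRC that holds two formats. ENTRYTYPE=FORMAT on the COPY statement also applies to the SELECT statement that follows, and NEW replaces WORK.BACKUP if it already exists, so only YN.FORMAT ends up in the copy. Because MOVE is not specified, both entries remain in WORK.SRC.

```sas
/* Hypothetical source catalog with two format entries. */
proc format library=work.src;
   value yn    1 = 'Yes' 0 = 'No';
   value grade 1 = 'Low' 2 = 'High';
run;

proc catalog catalog=work.src;
   copy out=work.backup new entrytype=format;
   select yn;          /* read as YN.FORMAT because of ENTRYTYPE= above */
run;
quit;
```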
DELETE Statement
Deletes entries from a SAS catalog.
Syntax
DELETE entry-1 <entry-2 …> </ ENTRYTYPE=entry-type>;
Required Argument
entry-1 <entry-2 …>
specifies the name of one or more SAS catalog entries.
Restriction You must designate the type of the entry, either with the name
(entry-name.entry-type) or with the ENTRYTYPE= option.
Optional Argument
ENTRYTYPE=entry-type
restricts processing to one entry type.
EXCHANGE Statement
Switches the names of two catalog entries.
Restriction: When using the EXCHANGE statement, the catalog entries must be of the same
type.
Syntax
EXCHANGE name-1=other-name-1
<name-2=other-name-2 …>
</ ENTRYTYPE=entry-type>;
Required Argument
name=other-name
specifies two catalog entry names that the procedure switches.
Interaction You can specify only the entry name without the entry type if you
use the ENTRYTYPE= option on either the PROC CATALOG
statement or the EXCHANGE statement.
Optional Argument
ENTRYTYPE=entry-type
restricts processing to one entry type.
Alias ET=
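The following sketch swaps two hypothetical SOURCE entries, FIRST and SECOND, in WORK.SWAP. Both entries have the same type, as the restriction requires. Because the names trade places, the fileref that points at FIRST.SOURCE reads the text that was originally written to SECOND.SOURCE afterward.

```sas
/* Hypothetical setup: two SOURCE entries with different text. */
filename e1 catalog 'work.swap.first.source';
filename e2 catalog 'work.swap.second.source';

data _null_;
   file e1;
   put 'one';
run;

data _null_;
   file e2;
   put 'two';
run;

proc catalog catalog=work.swap entrytype=source;
   exchange first=second;
run;
quit;

/* FIRST.SOURCE now holds the text that was written to SECOND.SOURCE. */
data _null_;
   infile e1 truncover;
   input line $10.;
   put line=;
run;
```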
EXCLUDE Statement
Specifies entries that the COPY statement does not copy.
You can use multiple EXCLUDE statements with a single COPY statement within a
RUN group.
See: COPY Statement on page 312 and SELECT Statement on page 318
Example: “Example 1: Copying, Deleting, and Moving Catalog Entries from Multiple Catalogs”
on page 325
Syntax
EXCLUDE entry-1 <entry-2 …> </ ENTRYTYPE=entry-type>;
Required Argument
entry-1 <entry-2 …>
specifies the name of one or more SAS catalog entries.
Restriction You must designate the type of the entry, either when you specify
the name (entry-name.entry-type) or with the ENTRYTYPE= option.
Optional Argument
ENTRYTYPE=entry-type
restricts processing to one entry type.
Alias ET=
MODIFY Statement
Changes the description of a catalog entry.
Syntax
MODIFY entry (DESCRIPTION=<<'>entry-description<'>>)
</ ENTRYTYPE=entry-type>;
Required Arguments
entry
specifies the name of one SAS catalog entry. You can specify the entry type
with the name (entry-name.entry-type).
Restriction You must designate the type of the entry, either when you specify
the name (entry-name.entry-type) or with the ENTRYTYPE= option.
DESCRIPTION=<<'>entry-description<'>>
changes the description of a catalog entry by replacing it with a new description,
up to 256 characters long, or by removing it altogether. You can enclose the
description in single or double quotation marks.
Alias DESC
Tip When using the MODIFY statement with the CATALOG procedure, use
the DESCRIPTION= option with no text to remove the current
description.
Optional Argument
ENTRYTYPE=entry-type
restricts processing to one entry type.
Alias ET=
SAVE Statement
Specifies entries not to delete from a SAS catalog.
Restriction: The SAVE statement cannot limit the effects of the KILL option.
Tips: Use the SAVE statement to delete all but a few entries in a catalog. Use the
DELETE statement when it is more convenient to specify which entries to delete.
You can specify multiple entries and use multiple SAVE statements.
See: DELETE Statement on page 314
Syntax
SAVE entry-1 <entry-2 …> </ ENTRYTYPE=entry-type >;
318 Chapter 10 / CATALOG Procedure
Required Argument
entry-1 <entry-2…>
specifies the name of one or more SAS catalog entries.
Restriction You must designate the type of the entry, either with the name
(entry-name.entry-type) or with the ENTRYTYPE= option when
using the SAVE statement.
Optional Argument
ENTRYTYPE=entry-type
restricts processing to one entry type.
Alias ET=
SELECT Statement
Specifies entries that the COPY statement copies.
Syntax
SELECT entry-1 <entry-2 …> </ ENTRYTYPE=entry-type >;
Required Argument
entry-1 <entry-2 …>
specifies the name of one or more SAS catalog entries.
Restriction You must designate the type of the entry, either when you specify
the name (entry-name.entry-type) or with the ENTRYTYPE= option.
Usage: CATALOG Procedure 319
Optional Argument
ENTRYTYPE=entry-type
restricts processing to one entry type.
Alias ET=
Definition
The CATALOG procedure is interactive. Once you submit a PROC CATALOG
statement, you can continue to submit and execute statements or groups of
statements without repeating the PROC CATALOG statement.
A set of procedure statements ending with a RUN statement is called a RUN group.
The changes specified in a given group of statements take effect when a RUN
statement is encountered.
Note: When you enter a QUIT, DATA, or PROC statement, any statements
following the last RUN group execute before the CATALOG procedure terminates. If
you enter a RUN statement with the CANCEL option, then the remaining
statements do not execute before the procedure ends.
Note: Be careful when setting up batch jobs in which one RUN group's statements
depend on the effects of a previous RUN group, especially when deleting and
renaming entries.
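The RUN-group rules above can be modeled as a small state machine. This Python sketch is not SAS code, and the function name `run_groups` is hypothetical. It holds statements until RUN, discards them on RUN CANCEL, and runs any pending statements when QUIT, DATA, or PROC ends the step. Treating RUN CANCEL as discarding only the pending group is an assumption of the model.

```python
def run_groups(statements):
    """Return the PROC CATALOG statements that execute, in order.

    Hypothetical model: statements wait in a pending group until RUN,
    RUN CANCEL discards the pending group, and QUIT, DATA, or PROC
    executes the pending group and then ends the procedure.
    """
    executed, pending = [], []
    for stmt in statements:
        # Compare keywords case-insensitively and without the semicolon.
        words = stmt.strip().rstrip(";").lower().split()
        if words == ["run"]:
            executed.extend(pending)  # the RUN group takes effect now
            pending.clear()
        elif words == ["run", "cancel"]:
            pending.clear()  # the pending statements never execute
        elif words and words[0] in ("quit", "data", "proc"):
            executed.extend(pending)  # pending statements run before the step ends
            return executed
        else:
            pending.append(stmt)
    return executed  # statements still pending have not executed yet


session = ["delete credit.program credit.log;", "run;", "copy out=tcatall;", "quit;"]
print(run_groups(session))
# ['delete credit.program credit.log;', 'copy out=tcatall;']
```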
Note: All statements, except the CONTENTS statement, accept the ENTRYTYPE=
option.
To set a default entry type for all statements in the current step, use the
ENTRYTYPE= option in the PROC CATALOG statement. To set the default for only
the current statement, use the ENTRYTYPE= option in a subordinate statement.
If a catalog has many entries of one type and a few of other types, you can use the
ENTRYTYPE= option to specify a default and then override it for individual
entries by placing (ENTRYTYPE=) in parentheses after those entries.
ENTRYTYPE=entry-type
not in parentheses, sets a default entry type for the entire PROC step when used
in the PROC CATALOG statement. In all other statements, this option sets a
default entry type for the current statement. If you omit the ENTRYTYPE=
option, then PROC CATALOG processes all entries in the catalog.
Alias ET=
(ENTRYTYPE=entry-type)
in parentheses, identifies the type of the entry just preceding it.
Alias (ET=)
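The following hypothetical Python function illustrates the precedence just described. It resolves each entry in a statement to a (name, type) pair. A type written on the entry itself (entry-name.entry-type or a trailing (ET=)) takes priority. Next comes the statement's ET= option, and then the ET= option in the PROC CATALOG statement. A type of None stands for "all entry types". The function name and its input format are assumptions and are not part of SAS.

```python
import re

# entry-name, optional .entry-type, optional trailing (ET=type) or (ENTRYTYPE=type)
ENTRY_PATTERN = re.compile(
    r"^\s*([A-Za-z_]\w*)(?:\.(\w+))?\s*(?:\(\s*(?:et|entrytype)\s*=\s*(\w+)\s*\))?\s*$",
    re.IGNORECASE,
)


def resolve_entries(entries, stmt_et=None, proc_et=None):
    """Resolve entries to (name, type) pairs; a type of None means all types."""
    default = stmt_et or proc_et  # a statement's ET= overrides the PROC CATALOG ET=
    resolved = []
    for text in entries:
        match = ENTRY_PATTERN.match(text)
        if match is None:
            raise ValueError(f"cannot parse entry: {text!r}")
        name, dotted_type, paren_type = match.groups()
        entry_type = paren_type or dotted_type or default
        resolved.append((name.lower(), entry_type.lower() if entry_type else None))
    return resolved


print(resolve_entries(["test1", "test2", "test3", "passist (et=slist)"], stmt_et="log"))
# [('test1', 'log'), ('test2', 'log'), ('test3', 'log'), ('passist', 'slist')]
```

This input mirrors the EXCLUDE statement in Example 1. The statement's default type LOG applies to TEST1 through TEST3, and PASSIST keeps its SLIST override.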
Catalog Concatenation
Restrictions
When you use the CATALOG procedure to copy concatenated catalogs and you use
the NEW option, the following rules apply:
• If the input catalog is a concatenation and if the output catalog exists in any
level of the input concatenation, then the copy is not allowed.
• If the output catalog is a concatenation and if the input catalog exists in the first
level of the output concatenation, then the copy is not allowed.
For example, the following code demonstrates these two rules, and both copies fail:
libname first 'SAS-library-1';
libname second 'SAS-library-2';
/* assign Concat; catalog Concat.X concatenates First.X and Second.X */
libname concat (first second);
/* fails rule #1 */
proc catalog c=concat.x;
   copy out=first.x new;
run;
quit;
/* fails rule #2 */
proc catalog c=first.x;
   copy out=concat.x new;
run;
quit;
In summary, the following table shows when copies are allowed. In the table, A and
B are libraries, and each contains catalog X. Catalog C is an automatic
concatenation of A and B, and catalog D is an automatic concatenation of B and A.
Input Catalog   Output Catalog   Copy Allowed?
C.X             B.X              No
C.X             D.X              No
D.X             C.X              No
A.X             A.X              No
C.X             A.X              No
A.X             C.X              No
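The two rules can be checked mechanically. In the following hypothetical Python model, which is not SAS code, each libref maps to the ordered list of physical libraries it searches. C and D are defined as in the table above. The function `copy_allowed` applies rule 1, rule 2, and the same-catalog case from the A.X to A.X row. For rule 1, the model treats a concatenated output catalog as the set of its levels. That is an interpretation; the text does not state it.

```python
# Hypothetical librefs from the table: A and B are physical libraries that both
# contain catalog X; C searches A then B, and D searches B then A.
LEVELS = {"A": ["A"], "B": ["B"], "C": ["A", "B"], "D": ["B", "A"]}


def copy_allowed(in_lib, out_lib, levels=LEVELS):
    """Model of COPY ... NEW from catalog in_lib.X to catalog out_lib.X."""
    src, dst = levels[in_lib], levels[out_lib]
    if len(src) == 1 and src == dst:
        return False  # copying a catalog onto itself (the A.X to A.X row)
    # Rule 1: the input is a concatenation and the output exists in one of its levels.
    if len(src) > 1 and any(level in src for level in dst):
        return False
    # Rule 2: the output is a concatenation and the input exists in its first level.
    if len(dst) > 1 and dst[0] in src:
        return False
    return True


for pair in [("C", "B"), ("C", "D"), ("D", "C"), ("A", "A"), ("C", "A"), ("A", "C")]:
    print(pair, copy_allowed(*pair))  # every row of the table prints False
```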
Details
This example demonstrates the following tasks:
• copies entries by excluding a few entries
• moves entries
• deletes entries
Program
libname perm 'SAS-library';
proc catalog cat=perm.sample;
   delete credit.program credit.log;
run;
   copy out=tcatall;
run;
   copy out=testcat;
      exclude test1 test2 test3 passist (et=slist) / et=log;
run;
   copy out=logcat move;
      select test1 test2 test3 / et=log;
run;
   copy out=perm.finance noedit;
      select loan.frame loan.help loan.keys loan.pmenu;
run;
   copy in=perm.formats out=perm.finance;
      select revenue.format dept.formatc;
run;
quit;
Program Description
Assign a library reference to a SAS library. The LIBNAME statement assigns the
libref Perm to the SAS library that contains a permanent SAS catalog.
libname perm 'SAS-library';
Example 1: Copying, Deleting, and Moving Catalog Entries from Multiple Catalogs 327
Copy everything except three LOG entries and Passist.Slist from Perm.Sample to
Work.TestCat. The EXCLUDE statement specifies which entries not to copy. ET=
specifies a default type. (ET=) specifies an exception to the default type.
copy out=testcat;
exclude test1 test2 test3 passist (et=slist) / et=log;
run;
Copy two formats from Perm.Formats to Perm.Finance. The IN= option enables
you to copy from a different catalog than the one specified in the PROC CATALOG
statement. Note the entry types for numeric and character formats:
REVENUE.FORMAT is a numeric format and DEPT.FORMATC is a character format.
The COPY and SELECT statements execute before the QUIT statement ends the
PROC CATALOG step.
copy in=perm.formats out=perm.finance;
select revenue.format dept.formatc;
run;
quit;
Details
This example demonstrates the following tasks:
• lists the entries in a catalog and routes the output to a file
Program
libname perm 'SAS-library';
proc catalog catalog=perm.finance;
   contents;
   title1 'Contents of PERM.FINANCE before changes are made';
run;
   change dept=deptcode (et=formatc);
run;
   modify loan.frame (description='Loan analysis app. - ver1');
   contents;
   title1 'Contents of PERM.FINANCE after changes are made';
run;
quit;
Program Description
Assign a library reference. The LIBNAME statement assigns a libref to the SAS
library that contains a permanent SAS catalog.
libname perm 'SAS-library';
List the entries in a catalog and route the output to a file. The CONTENTS
statement creates a listing of the contents of the SAS catalog Perm.Finance and
routes the output to a file.
proc catalog catalog=perm.finance;
contents;
title1 'Contents of PERM.FINANCE before changes are made';
run;
Change entry names. The CHANGE statement changes the name of an entry that
contains a user-written character format. (ET=) specifies the entry type.
change dept=deptcode (et=formatc);
run;
Process entries in multiple run groups. The MODIFY statement changes the
description of an entry. The CONTENTS statement creates a listing of the contents
of Perm.Finance after all the changes have been applied. QUIT ends the procedure.
modify loan.frame (description='Loan analysis app. - ver1');
contents;
title1 'Contents of PERM.FINANCE after changes are made';
run;
quit;
Example 3: Using the FORCE Option with the KILL Option 331
Output Examples
Output 10.1 Contents of Perm.Finance Before and After Changes Are Made
Details
This example demonstrates the following tasks:
• creates a resource environment
• tries to delete all catalog entries by using the KILL option but receives an error
• specifies the FORCE option to successfully delete all catalog entries by using
the KILL option.
Program
%macro matt;
%put &syscc;
%mend matt;
proc catalog c=work.sasmacr kill;
run;
quit;
proc catalog c=work.sasmacr kill force;
run;
quit;
Program Description
Start a process (resource environment). Do this by opening the catalog entry
MATT in the Work.Sasmacr catalog.
%macro matt;
%put &syscc;
%mend matt;
Specify the KILL option to delete all catalog entries in Work.Sasmacr. Because there
is a resource environment (a process using the catalog), KILL does not work, and an
error is written to the log.
proc catalog c=work.sasmacr kill;
run;
quit;
Add the FORCE option to the KILL option to delete the catalog entries.
proc catalog c=work.sasmacr kill force;
run;
quit;
Log Examples
Example Code 10.2 KILL Option Causes Error to Be Sent to the SAS Log
1 %macro matt;
2 %put &syscc;
3 %mend matt;
4
5 proc catalog c=work.sasmacr kill;
NOTE: Writing HTML Body file: sashtml.htm
6 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE CATALOG used (Total process time):
real time 6.46 seconds
cpu time 0.62 seconds
Example Code 10.3 Adding the FORCE Option to the KILL Option to Delete the Catalog
Entry
Chapter 11
CHART Procedure
PROC CHART is a useful tool that lets you visualize data quickly, but if you need to
produce presentation-quality graphics that include color and various fonts, then
use SAS/GRAPH software. The GCHART procedure in SAS/GRAPH software
produces the same types of charts as PROC CHART does. In addition, PROC
GCHART can produce donut charts.
Bar Charts
Horizontal and vertical bar charts display the magnitude of data with bars, each of
which represents a category of data. The length or height of the bars represents the
value of the chart statistic for each category.
The following output shows a vertical bar chart that displays the number of
responses for the five categories from the survey data. The following statements
produce the output:
The following output shows the same data presented in a horizontal bar chart. The
two types of bar charts have essentially the same characteristics, except that
horizontal bar charts by default display a table of statistic values to the right of the
bars. The following statements produce the output:
Block Charts
Block charts display the relative magnitude of data by using blocks of varying
height, each set in a square that represents a category of data. The following output
shows the number of each survey response in the form of a block chart.
Pie Charts
Pie charts represent the relative contribution of parts to the whole by displaying
data as wedge-shaped slices of a circle. Each slice represents a category of the
data. The following output shows the survey results divided by response into five
pie slices. The following statements produce the output:
Star Charts
With PROC CHART, you can produce star charts that show group frequencies,
totals, or mean values. A star chart is similar to a vertical bar chart, but the bars on
a star chart radiate from a center point, like spokes in a wheel. Star charts are
commonly used for cyclical data, such as measures taken every month or day or
hour. They are also used for data in which the categories have an inherent order
(“always” meaning more frequent than “usually,” which means more frequent than
Overview: CHART Procedure 341
“sometimes”). The following output shows the survey data displayed in a star chart.
The following statements produce the output:
Syntax
PROC CHART <options>;
Optional Arguments
DATA=SAS-data-set
identifies the input SAS data set.
Restriction You cannot use PROC CHART with an engine that supports
concurrent access if another user is updating the data set at the
same time.
FORMCHAR <(position(s))>='formatting-character(s)'
defines the characters to use for constructing the horizontal and vertical axes,
reference lines, and other structural parts of a chart. It also defines the symbols
to use to create the bars, blocks, or sections in the output.
position(s)
identifies the position of one or more characters in the SAS formatting-
character string. A space or a comma separates the positions.
formatting-character(s)
lists the characters to use for the specified positions. PROC CHART assigns
characters in formatting-character(s) to position(s), in the order in which they
are listed. For example, the following option assigns the asterisk (*) to the
second formatting character, the number sign (#) to the seventh character,
and does not alter the remaining characters:
formchar(2,7)='*#'
Position Default Character Used For
1 | Vertical axes in bar charts, the sides of the blocks in block charts,
and reference lines in horizontal bar charts. In side-by-side bar
charts, the first and second formatting characters appear around
each value of the group variable (below the chart) to indicate the
width of each group.
2 - Horizontal axes in bar charts, the horizontal lines that separate the
blocks in a block chart, and reference lines in vertical bar charts. In
side-by-side bar charts, the first and second formatting characters
appear around each value of the group variable (below the chart) to
indicate the width of each group.
7 + Tick marks in bar charts and the centers in pie and star charts.
16 / Ends of blocks and the diagonal lines that separate blocks in a block
chart.
[Figure: vertical bar chart of Pies_Sold Mean by Flavor (apple, blueberry, cherry, rhubarb) within Bakery (Clyde, Oak, Samford), with callouts marking where formatting-character positions 1, 2, and 7 appear.]
Interaction The SAS system option FORMCHAR= specifies the default
formatting characters. The system option defines the entire string of
formatting characters. The FORMCHAR= option in a procedure can
redefine selected characters.
See For information about which hexadecimal codes to use for which
characters, consult the documentation for your hardware.
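Assigning characters to positions works like an indexed string replacement. This Python sketch assumes the SAS default FORMCHAR= string `|----|+|---+=|-/\<>*`. Positions 1, 2, 7, and 16 of that string match the table above. The function name `apply_formchar` and the one-character-per-position check are assumptions of the model, not documented SAS behavior.

```python
DEFAULT_FORMCHAR = r"|----|+|---+=|-/\<>*"  # the SAS default formatting characters


def apply_formchar(positions, chars, base=DEFAULT_FORMCHAR):
    """Assign chars, in order, to the 1-based positions in base.

    positions is written as in FORMCHAR(2,7) or FORMCHAR(2 7).
    """
    pos_list = [int(p) for p in positions.replace(",", " ").split()]
    if len(pos_list) != len(chars):
        raise ValueError("this model expects one character per position")
    result = list(base)
    for pos, ch in zip(pos_list, chars):
        result[pos - 1] = ch
    return "".join(result)


print(apply_formchar("2,7", "*#"))  # |*---|#|---+=|-/\<>*
```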
LPI=value
specifies the proportions of PIE and STAR charts. The value is determined by
the formula (lines per inch / columns per inch) * 10.
For example, if you have a printer with 8 lines per inch and 12 columns per inch,
then specify LPI=6.6667.
346 Chapter 11 / CHART Procedure
Default 6
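The following Python sketch checks the LPI= arithmetic by computing the value from a device's lines per inch and columns per inch. The function name `lpi_value` is hypothetical. For example, a device with 6 lines and 10 columns per inch gives 6, which equals the default.

```python
def lpi_value(lines_per_inch, columns_per_inch):
    """LPI= value for PIE and STAR charts: (lines per inch / columns per inch) * 10."""
    return lines_per_inch / columns_per_inch * 10


print(round(lpi_value(8, 12), 4))  # 6.6667
print(lpi_value(6, 10))            # 6.0
```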
BLOCK Statement
Produces a block chart.
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Example: “Example 6: Producing Block Charts for BY Groups” on page 384
Syntax
BLOCK variable(s) </ options>;
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a block chart, one
chart for each variable.
Optional Arguments
AXIS=value-expression
specifies the values for the response axis, where value-expression is a list of
individual values, each separated by a space, or a range with a uniform interval
for the values. For example, the following range specifies tick marks on a bar
chart from 0 to 100 at intervals of 10: hbar x / axis=0 to 100 by 10;
Interactions For BLOCK charts, AXIS= sets the scale of the tallest block. To set
the scale, PROC CHART uses the maximum value from the AXIS=
list. If no value is greater than 0, then PROC CHART ignores the
AXIS= option.
FREQ=variable
specifies a data set variable that represents a frequency count for each
observation. Normally, each observation contributes a value of one to the
frequency counts. With FREQ=, each observation contributes the value of its
FREQ= variable.
Restriction If the FREQ= values are not integers, then PROC CHART truncates
them.
Interaction If you use SUMVAR=, then PROC CHART multiplies the sums by the
FREQ= value.
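The following hypothetical Python model, which is not SAS code, shows how FREQ= changes the totals that PROC CHART charts for a discrete variable. Each observation counts as its truncated FREQ= value. With SUMVAR=, each value is multiplied by that weight before summing. The function name and row layout are assumptions. Zero, negative, or missing FREQ= values are not modeled.

```python
import math


def weighted_totals(rows, chart_var, freq_var=None, sumvar=None):
    """Total charted for each value of a discrete chart variable.

    Without FREQ=, each observation counts once. With FREQ=, an observation
    counts as its FREQ= value truncated to an integer, and with SUMVAR= the
    summed value is multiplied by that same weight.
    """
    totals = {}
    for row in rows:
        weight = math.trunc(row[freq_var]) if freq_var else 1
        contribution = row[sumvar] * weight if sumvar else weight
        key = row[chart_var]
        totals[key] = totals.get(key, 0) + contribution
    return totals


rows = [
    {"dept": "A", "n": 2.9, "sales": 10},
    {"dept": "A", "n": 1, "sales": 5},
    {"dept": "B", "n": 3, "sales": 4},
]
print(weighted_totals(rows, "dept", freq_var="n"))                  # {'A': 3, 'B': 3}
print(weighted_totals(rows, "dept", freq_var="n", sumvar="sales"))  # {'A': 25, 'B': 12}
```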
GROUP=variable
produces side-by-side charts, with each chart representing the observations
that have a common value for the GROUP= variable. The GROUP= variable can
be character or numeric and is assumed to be discrete. For example, the
following statement produces a frequency bar chart for men and women in each
department:
vbar gender / group=dept;
G100
specifies that the sum of percentages for each group equals 100. By default,
PROC CHART uses 100% as the total sum. For example, if you produce a bar
chart that separates males and females into three age categories, then the six
bars, by default, add to 100%. However, with G100, the three bars for females
add to 100%, and the three bars for males add to 100%.
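The following Python sketch makes the G100 example concrete. It computes bar percentages using either the grand total as the denominator (the default) or each group's own total (G100). The data layout, the age categories, and the function name `bar_percentages` are assumptions for illustration.

```python
def bar_percentages(counts, g100=False):
    """Percentage for each (group, bar) cell of a grouped bar chart.

    counts maps (group, bar) -> frequency. By default the denominator is the
    grand total, so all bars together sum to 100. With g100=True the
    denominator is the group total, so each group's bars sum to 100.
    """
    grand_total = sum(counts.values())
    group_totals = {}
    for (group, _bar), n in counts.items():
        group_totals[group] = group_totals.get(group, 0) + n
    return {
        (group, bar): 100 * n / (group_totals[group] if g100 else grand_total)
        for (group, bar), n in counts.items()
    }


counts = {
    ("F", "young"): 10, ("F", "middle"): 20, ("F", "older"): 10,
    ("M", "young"): 30, ("M", "middle"): 20, ("M", "older"): 10,
}
default = bar_percentages(counts)
grouped = bar_percentages(counts, g100=True)
print(sum(v for (g, _), v in default.items() if g == "F"))  # 40.0
print(sum(v for (g, _), v in grouped.items() if g == "F"))  # 100.0
```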
LEVELS=number-of-midpoints
specifies the number of bars that represent each chart variable when the
variables are continuous.
MIDPOINTS=midpoint-specification | OLD
defines the range of values that each bar, block, or section represents by
specifying the range midpoints.
midpoint-specification
specifies midpoints, either individually, or across a range at a uniform
interval. For example, the following statement produces a chart with five
bars. The first bar represents the range of values of X with a midpoint of 10.
The second bar represents the range with a midpoint of 20, and so on:
vbar x / midpoints=10 20 30 40 50;
OLD
specifies an algorithm that PROC CHART used in previous versions of SAS
to choose midpoints for continuous variables. The old algorithm was based
on the work of Nelder (1976). The current algorithm that PROC CHART uses
if you omit OLD is based on the work of Terrell and Scott (1985).
Default Without MIDPOINTS=, PROC CHART displays the values in the SAS
System's normal sorted order.
MISSING
specifies that missing values are valid levels for the chart variable.
NOHEADER
suppresses the default header line printed at the top of a chart.
Alias NOHEADING
NOSYMBOL
suppresses printing of the subgroup symbol or legend table.
Alias NOLEGEND
SUBGROUP=variable
subdivides each bar or block into characters that show the contribution of the
values of variable to that bar or block. PROC CHART uses the first character of
each value to fill in the portion of the bar or block that corresponds to that
value, unless more than one value begins with the same first character. In that
case, PROC CHART uses the letters A, B, C, and so on, to fill in the bars or
blocks. If the variable is formatted, then PROC CHART uses the first character
of the formatted value.
The characters used in the chart and the values that they represent are given in
a legend at the bottom of the chart. The subgroup symbols are ordered A
through Z and 0 through 9 with the characters in ascending order.
PROC CHART calculates the height of a bar or block for each subgroup
individually and then rounds the percentage of the total bar up or down. As a
result, the total height of the bar can be higher or lower than the same bar
without the SUBGROUP= option.
Interaction If you use both TYPE=MEAN and SUBGROUP=, then PROC CHART
first calculates the mean of the variable that is specified in the
SUMVAR= option. It then subdivides the bar into the percentages
that each subgroup contributes.
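Two details of SUBGROUP= can be shown with a short model. The following Python sketch assigns fill symbols from the first characters of the values. When first characters collide, it falls back to A, B, C, and so on. It also shows how rounding each segment separately can make a subdivided bar taller or shorter than the same bar rounded as a whole. Three choices in the sketch are assumptions, not documented behavior: fallback letters follow sorted value order, the fallback sequence runs A through Z and then 0 through 9, and halves round upward. Values are assumed to be nonblank.

```python
import math

FALLBACK_SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def subgroup_symbols(values):
    """Map each (nonblank) subgroup value to the character that fills its part of a bar."""
    ordered = sorted({str(v) for v in values})
    firsts = [v[0] for v in ordered]
    if len(set(firsts)) == len(firsts):
        return dict(zip(ordered, firsts))  # first characters are unique
    # Some values share a first character: use A, B, C, ... instead (assumed order).
    return {v: FALLBACK_SYMBOLS[i] for i, v in enumerate(ordered)}


def round_half_up(x):
    return math.floor(x + 0.5)


def subgroup_heights(shares, bar_height):
    """Round each subgroup segment separately and compare with the whole bar."""
    segments = [round_half_up(share * bar_height) for share in shares]
    return segments, sum(segments), round_half_up(bar_height)


print(subgroup_symbols(["Apple", "Apricot", "Cherry"]))
# {'Apple': 'A', 'Apricot': 'B', 'Cherry': 'C'}
print(subgroup_heights([1 / 3, 1 / 3, 1 / 3], 4.2))  # ([1, 1, 1], 3, 4)
print(subgroup_heights([1 / 3, 1 / 3, 1 / 3], 4.8))  # ([2, 2, 2], 6, 5)
```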
BLOCK Statement 349
SUMVAR=variable
specifies the variable whose sums or means (depending on the value of TYPE=)
PROC CHART displays in the chart.
Interaction If you use SUMVAR= and you use TYPE= with a value other than
MEAN or SUM, then TYPE=SUM overrides the specified TYPE=
value.
Tip Both HBAR and VBAR charts can print labels for SUMVAR=
variables if you use a LABEL statement.
SYMBOL=character(s)
specifies the character or characters that PROC CHART uses in the bars or
blocks of the chart when you do not use the SUBGROUP= option.
Interaction If the SAS system option OVP is in effect and if your printing device
supports overprinting, then you can specify up to three characters to
produce overprinted charts.
TYPE=statistic
specifies what the bars or sections in the chart represent. The statistic is one of
the following:
CFREQ
specifies that each bar, block, or section represent the cumulative frequency.
CPERCENT
specifies that each bar, block, or section represent the cumulative
percentage.
Alias CPCT
FREQ
specifies that each bar, block, or section represent the frequency with which
a value or range occurs for the chart variable in the data.
MEAN
specifies that each bar, block, or section represent the mean of the
SUMVAR= variable across all observations that belong to that bar, block, or
section.
Interaction With TYPE=MEAN, you can compute only MEAN and FREQ
statistics.
PERCENT
specifies that each bar, block, or section represent the percentage of
observations that have a given value or that fall into a given range of the
chart variable.
Alias PCT
SUM
specifies that each bar, block, or section represent the sum of the SUMVAR=
variable for the observations that correspond to each bar, block, or section.
Interaction With TYPE=SUM, you can compute only SUM and FREQ
statistics.
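The TYPE= statistics differ only in how each bar's observations are aggregated. The following Python sketch uses hypothetical names and is not SAS code. It computes each statistic for an ordered list of bars. It also models the SUMVAR= override, under which any TYPE= other than MEAN or SUM is treated as SUM.

```python
from itertools import accumulate

ALIASES = {"PCT": "PERCENT", "CPCT": "CPERCENT"}


def effective_type(chart_type, sumvar=None):
    """With SUMVAR=, a TYPE= other than MEAN or SUM is treated as SUM."""
    t = ALIASES.get(chart_type.upper(), chart_type.upper())
    if sumvar and t not in ("MEAN", "SUM"):
        return "SUM"
    return t


def bar_statistic(bars, chart_type="FREQ"):
    """Value of each bar for one TYPE= statistic.

    bars is an ordered list of (label, values); each element of values is one
    observation's SUMVAR= value, so len(values) is the bar's frequency.
    """
    t = ALIASES.get(chart_type.upper(), chart_type.upper())
    freqs = [len(values) for _, values in bars]
    total = sum(freqs)
    if t == "FREQ":
        stats = freqs
    elif t == "CFREQ":
        stats = list(accumulate(freqs))
    elif t == "PERCENT":
        stats = [100 * f / total for f in freqs]
    elif t == "CPERCENT":
        stats = [100 * c / total for c in accumulate(freqs)]
    elif t == "SUM":
        stats = [sum(values) for _, values in bars]
    elif t == "MEAN":
        # A bar with no observations has no mean.
        stats = [sum(v) / len(v) if v else None for _, v in bars]
    else:
        raise ValueError(f"unknown TYPE= statistic: {chart_type}")
    return [(label, s) for (label, _), s in zip(bars, stats)]


bars = [("a", [1, 2, 3]), ("b", [4]), ("c", [5, 5, 5, 5])]
print(bar_statistic(bars, "CPCT"))     # [('a', 37.5), ('b', 50.0), ('c', 100.0)]
print(effective_type("PCT", "sales"))  # SUM
```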
Details
Statement Results
Because each block chart must fit on one output page, you might have to adjust the
SAS system options LINESIZE= and PAGESIZE= if you have a large number of
charted values for the BLOCK variable and for the variable specified in the
GROUP= option.
The following table shows the maximum number of charted values of BLOCK
variables for selected LINESIZE= (LS=) specifications that can fit on a 66-line page.
Number of GROUP= Values   LS=132   LS=120   LS=105   LS=90   LS=76   LS=64
0 or 1                    9        8        7        6       5       4
2                         8        8        7        6       5       4
3                         8        7        6        5       4       3
4                         7        7        6        5       4       3
5 or 6                    7        6        5        4       3       2
BY Statement 351
If the value of any GROUP= level is longer than three characters, then the
maximum number of charted values for the BLOCK variable that can fit might be
reduced by one. BLOCK level values truncate to 12 characters. If you exceed these
limits, then PROC CHART produces a horizontal bar chart instead.
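The page-fit rule can be expressed as a table lookup. In the following Python sketch, the limits are copied from the table above. The sketch assumes that a GROUP= value longer than three characters always reduces the limit by one; the text says only that it might. Only the listed LINESIZE= values and GROUP= counts are supported, and other values raise an error. The function name `chart_kind` is hypothetical.

```python
LS_COLUMNS = (132, 120, 105, 90, 76, 64)
# Rows: number of GROUP= values -> limits for the LINESIZE= values above.
MAX_BLOCK_VALUES = {
    0: (9, 8, 7, 6, 5, 4),
    1: (9, 8, 7, 6, 5, 4),
    2: (8, 8, 7, 6, 5, 4),
    3: (8, 7, 6, 5, 4, 3),
    4: (7, 7, 6, 5, 4, 3),
    5: (7, 6, 5, 4, 3, 2),
    6: (7, 6, 5, 4, 3, 2),
}


def chart_kind(n_block_values, n_group_values, linesize, long_group_value=False):
    """Return "BLOCK" if the block chart fits on a 66-line page, else "HBAR"."""
    limit = MAX_BLOCK_VALUES[n_group_values][LS_COLUMNS.index(linesize)]
    if long_group_value:
        limit -= 1  # worst case for a GROUP= value longer than three characters
    return "BLOCK" if n_block_values <= limit else "HBAR"


print(chart_kind(4, 3, 76))  # BLOCK
print(chart_kind(5, 3, 76))  # HBAR
```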
BY Statement
Produces a separate chart for each BY group.
Syntax
BY <DESCENDING> variable-1
<<DESCENDING> variable-2 …>
<NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, then the observations in the data set must either be sorted by all
the variables that you specify, or they must be indexed appropriately. Variables
in a BY statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The observations are grouped in another way (for example, chronological
order).
HBAR Statement
Produces a horizontal bar chart.
Tip: HBAR charts can print either the name or the label of the chart variable.
See: “Example 5: Producing a Horizontal Bar Chart for a Subset of the Data” on page 382
Syntax
HBAR variable(s) </ options>;
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a horizontal bar chart,
one chart for each variable.
Optional Arguments
ASCENDING
prints the bars and any associated statistics in ascending order of size within
groups.
Alias ASC
AXIS=value-expression
specifies the values for the response axis, where value-expression is a list of
individual values, each separated by a space, or a range with a uniform interval
for the values. For example, the following range specifies tick marks on a bar
chart from 0 to 100 at intervals of 10: hbar x / axis=0 to 100 by 10;
Interactions For HBAR and VBAR charts, AXIS= determines tick marks on the
response axis. If the AXIS= specification contains only one value,
then the value determines the minimum tick mark if the value is
less than 0, or determines the maximum tick mark if the value is
greater than 0.
only the data in the range 3 to 5 appears on the chart. Values out of range
produce a warning message in the SAS log.
CFREQ
prints the cumulative frequency.
CPERCENT
prints the cumulative percentages.
DISCRETE
specifies that a numeric chart variable is discrete rather than continuous.
Without DISCRETE, PROC CHART assumes that all numeric variables are
continuous and automatically chooses intervals for them unless you use
MIDPOINTS= or LEVELS=.
FREQ
prints the frequency of each bar to the side of the chart.
FREQ=variable
specifies a data set variable that represents a frequency count for each
observation. Normally, each observation contributes a value of one to the
frequency counts. With FREQ=, each observation contributes the value of its
FREQ= variable.
Restriction If the FREQ= values are not integers, then PROC CHART truncates
them.
Interaction If you use SUMVAR=, then PROC CHART multiplies the sums by the
FREQ= value.
GROUP=variable
produces side-by-side charts, with each chart representing the observations
that have a common value for the GROUP= variable. The GROUP= variable can
be character or numeric and is assumed to be discrete. For example, the
following statement produces a frequency bar chart for men and women in each
department:
vbar gender / group=dept;
GSPACE=n
specifies the amount of extra space between groups of bars. Use GSPACE=0 to
leave no extra space between adjacent groups of bars.
G100
specifies that the sum of percentages for each group equals 100. By default,
PROC CHART uses 100% as the total sum. For example, if you produce a bar
chart that separates males and females into three age categories, then the six
bars, by default, add to 100%. However, with G100, the three bars for females
add to 100%, and the three bars for males add to 100%.
LEVELS=number-of-midpoints
specifies the number of bars that represent each chart variable when the
variables are continuous.
MEAN
prints the mean of the observations represented by each bar.
MISSING
specifies that missing values are valid levels for the chart variable.
NOSTATS
suppresses the statistics on a horizontal bar chart.
Alias NOSTAT
NOSYMBOL
suppresses printing of the subgroup symbol or legend table.
Alias NOLEGEND
NOZEROS
suppresses any bar with zero frequency.
PERCENT
prints the percentages of observations having a given value for the chart
variable.
REF=value(s)
draws reference lines on the response axis at the specified positions.
Tip The REF= values should correspond to values of the TYPE= statistic.
SPACE=n
specifies the amount of space between individual bars.
Use the GSPACE= option to specify the amount of extra space between groups
of bars.
SUBGROUP=variable
subdivides each bar or block into characters that show the contribution of the
values of variable to that bar or block. PROC CHART uses the first character of
each value to fill in the portion of the bar or block that corresponds to that
value, unless more than one value begins with the same first character. In that
case, PROC CHART uses the letters A, B, C, and so on, to fill in the bars or
blocks. If the variable is formatted, then PROC CHART uses the first character
of the formatted value.
The characters used in the chart and the values that they represent are given in
a legend at the bottom of the chart. The subgroup symbols are ordered A
through Z and 0 through 9 with the characters in ascending order.
PROC CHART calculates the height of a bar or block for each subgroup
individually and then rounds the percentage of the total bar up or down. As a
result, the total height of the bar can be higher or lower than the same bar
without the SUBGROUP= option.
Interaction If you use both TYPE=MEAN and SUBGROUP=, then PROC CHART
first calculates the mean of the variable that is specified in the
SUMVAR= option. It then subdivides the bar into the percentages
that each subgroup contributes.
SUM
prints the total number of observations that each bar represents.
Restriction Available only when you use both SUMVAR= and TYPE=.
SUMVAR=variable
specifies the variable whose sums or means (depending on the value of TYPE=)
PROC CHART displays in the chart.
Interaction If you use SUMVAR= and you use TYPE= with a value other than
MEAN or SUM, then TYPE=SUM overrides the specified TYPE=
value.
Tip HBAR charts can print labels for SUMVAR= variables if you use a
LABEL statement.
SYMBOL=character(s)
specifies the character or characters that PROC CHART uses in the bars or
blocks of the chart when you do not use the SUBGROUP= option.
Interaction If the SAS system option OVP is in effect and if your printing device
supports overprinting, then you can specify up to three characters to
produce overprinted charts.
TYPE=statistic
specifies what the bars or sections in the chart represent. The statistic is one of
the following:
CFREQ
specifies that each bar, block, or section represent the cumulative frequency.
CPERCENT
specifies that each bar, block, or section represent the cumulative
percentage.
Alias CPCT
FREQ
specifies that each bar, block, or section represent the frequency with which
a value or range occurs for the chart variable in the data.
MEAN
specifies that each bar, block, or section represent the mean of the
SUMVAR= variable across all observations that belong to that bar, block, or
section.
Interaction With TYPE=MEAN, you can compute only MEAN and FREQ
statistics.
PERCENT
specifies that each bar, block, or section represent the percentage of
observations that have a given value or that fall into a given range of the
chart variable.
Alias PCT
SUM
specifies that each bar, block, or section represent the sum of the SUMVAR=
variable for the observations that correspond to each bar, block, or section.
PIE Statement 357
Interaction With TYPE=SUM, you can compute only SUM and FREQ
statistics.
WIDTH=n
specifies the width of the bars on bar charts.
Details
Statement Results
Each chart occupies one or more output pages, depending on the number of bars;
each bar occupies one line, by default.
By default, for horizontal bar charts of TYPE=FREQ, CFREQ, PCT, or CPCT, PROC
CHART prints the following statistics: frequency, cumulative frequency,
percentage, and cumulative percentage. If you use one or more of the statistics
options, then PROC CHART prints only the statistics that you request, plus the
frequency.
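The following hypothetical Python model, which is not SAS code, summarizes the default and requested HBAR statistics. Two choices are assumptions: the order of the returned list, and NOSTATS taking priority over other statistics options. The default for TYPE=SUM or TYPE=MEAN is deliberately left unmodeled because the text above does not describe it.

```python
STAT_ORDER = ("FREQ", "CFREQ", "PERCENT", "CPERCENT", "SUM", "MEAN")
RECOGNIZED = set(STAT_ORDER) | {"NOSTATS", "NOSTAT"}
FREQ_LIKE_TYPES = {"FREQ", "CFREQ", "PERCENT", "CPERCENT", "PCT", "CPCT"}


def hbar_printed_stats(options=(), chart_type="FREQ"):
    """Statistics printed beside a horizontal bar chart (hypothetical model)."""
    # Ignore options such as ASCENDING that are not statistics options.
    requested = {opt.upper() for opt in options} & RECOGNIZED
    if requested & {"NOSTATS", "NOSTAT"}:
        return []  # NOSTATS suppresses the statistics
    if not requested:
        if chart_type.upper() not in FREQ_LIKE_TYPES:
            raise ValueError("defaults for this TYPE= are not modeled here")
        return ["FREQ", "CFREQ", "PERCENT", "CPERCENT"]
    requested.add("FREQ")  # the frequency is printed with any requested statistics
    return [stat for stat in STAT_ORDER if stat in requested]


print(hbar_printed_stats())          # ['FREQ', 'CFREQ', 'PERCENT', 'CPERCENT']
print(hbar_printed_stats(["mean"]))  # ['FREQ', 'MEAN']
```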
PIE Statement
Produces a pie chart.
Syntax
PIE variable(s) </ options>;
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a pie chart, one chart
for each variable.
Optional Arguments
FREQ=variable
specifies a data set variable that represents a frequency count for each
observation. Normally, each observation contributes a value of one to the
frequency counts. With FREQ=, each observation contributes the value of its
FREQ= variable.
Restriction If the FREQ= values are not integers, then PROC CHART truncates
them.
Interaction If you use SUMVAR=, then PROC CHART multiplies the sums by the
FREQ= value.
LEVELS=number-of-midpoints
specifies the number of bars that represent each chart variable when the
variables are continuous.
MIDPOINTS=midpoint-specification | OLD
defines the range of values that each bar, block, or section represents by
specifying the range midpoints.
midpoint-specification
specifies midpoints, either individually, or across a range at a uniform
interval. For example, the following statement produces a chart with five
bars. The first bar represents the range of values of X with a midpoint of 10.
The second bar represents the range with a midpoint of 20, and so on:
vbar x / midpoints=10 20 30 40 50;
OLD
specifies an algorithm that PROC CHART used in previous versions of SAS
to choose midpoints for continuous variables. The old algorithm was based
on the work of Nelder (1976). The current algorithm that PROC CHART uses
if you omit OLD is based on the work of Terrell and Scott (1985).
Default Without MIDPOINTS=, PROC CHART displays the values in the SAS
System's normal sorted order.
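A midpoint specification can also be written as a range with a uniform interval. The following sketch assumes a hypothetical data set, Scores, that contains a continuous numeric variable X. The TO/BY form requests the same five midpoints as the list in the VBAR example above.

```sas
/* Hypothetical data: 200 values of X between 5 and 55 */
data scores;
   do i = 1 to 200;
      x = 5 + 50 * ranuni(1);
      output;
   end;
   drop i;
run;

proc chart data=scores;
   /* Equivalent to midpoints=10 20 30 40 50 */
   pie x / midpoints=10 to 50 by 10;
run;
```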
MISSING
specifies that missing values are valid levels for the chart variable.
NOHEADER
suppresses the default header line printed at the top of a chart.
Alias NOHEADING
SUMVAR=variable
specifies the variable whose sums or means (depending on the value of TYPE=)
PROC CHART displays in the chart.
Interaction If you use SUMVAR= and you use TYPE= with a value other than
MEAN or SUM, then TYPE=SUM overrides the specified TYPE=
value.
Tip Both HBAR and VBAR charts can print labels for SUMVAR=
variables if you use a LABEL statement.
TYPE=statistic
specifies what the bars or sections in the chart represent. The statistic is one of
the following:
CFREQ
specifies that each bar, block, or section represent the cumulative frequency.
CPERCENT
specifies that each bar, block, or section represent the cumulative
percentage.
Alias CPCT
FREQ
specifies that each bar, block, or section represent the frequency with which
a value or range occurs for the chart variable in the data.
MEAN
specifies that each bar, block, or section represent the mean of the
SUMVAR= variable across all observations that belong to that bar, block, or
section.
Interaction With TYPE=MEAN, you can compute only MEAN and FREQ
statistics.
PERCENT
specifies that each bar, block, or section represent the percentage of
observations that have a given value or that fall into a given range of the
chart variable.
Alias PCT
SUM
specifies that each bar, block, or section represent the sum of the SUMVAR=
variable for the observations that correspond to each bar, block, or section.
Interaction With TYPE=SUM, you can compute only SUM and FREQ
statistics.
Details
Statement Results
PROC CHART determines the number of slices for the pie in the same way that it
determines the number of bars for vertical bar charts. Any slices of the pie that
account for fewer than three print positions are grouped together into an "OTHER"
category.
The pie's size is determined only by the SAS system options LINESIZE= and
PAGESIZE=. By default, the pie looks elliptical if your printer does not print 6 lines
per inch and 10 columns per inch. To make a circular pie chart on a printer that does
not print 6 lines and 10 columns per inch, use the LPI= option in the PROC CHART
statement. See the description of “LPI=value” on page 345 for the formula that
gives you the proper LPI= value for your printer.
If you try to create a PIE chart for a variable with more than 50 levels, then PROC
CHART produces a horizontal bar chart instead.
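For example, the following step charts the Shirts data set (created later in this chapter) as a pie. The LPI= value of 8 is only a placeholder. Derive the value for your printer from the LPI= formula referenced above.

```sas
/* LPI= belongs on the PROC CHART statement, not on the PIE statement.
   The value 8 is a placeholder for your printer's computed value. */
proc chart data=shirts lpi=8;
   pie size;
   title 'Share of Shirt Sales by Size';
run;
```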
STAR Statement
Produces a star chart.
Syntax
STAR variable(s) </ options>;
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a star chart, one chart
for each variable.
Optional Arguments
AXIS=value-expression
specifies the values for the response axis, where value-expression is a list of
individual values, each separated by a space, or a range with a uniform interval
for the values. For example, the following range specifies tick marks on a bar
chart from 0 to 100 at intervals of 10:
hbar x / axis=0 to 100 by 10;
Interactions For STAR charts, a single AXIS= value sets the minimum (the center
of the chart) if the value is less than zero, or sets the maximum (the
outside circle) if the value is greater than zero. If the AXIS=
specification contains more than one value, then PROC CHART
uses the minimum and maximum values from the list.
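The following sketch illustrates a single AXIS= value. It assumes the Piesales data set from the examples later in this chapter and sets the outside circle of a star chart to 2000. Because every total is positive, the center represents zero. The value 2000 is an arbitrary choice that exceeds the largest flavor total (1,740 apple pies).

```sas
proc chart data=piesales;
   /* One positive AXIS= value sets the maximum (the outside circle) */
   star flavor / sumvar=pies_sold axis=2000;
   title 'Total Pie Sales by Flavor';
run;
```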
FREQ=variable
specifies a data set variable that represents a frequency count for each
observation. Normally, each observation contributes a value of one to the
frequency counts. With FREQ=, each observation contributes the value of its
FREQ= variable instead.
Restriction If the FREQ= values are not integers, then PROC CHART truncates
them.
Interaction If you use SUMVAR=, then PROC CHART multiplies the sums by the
FREQ= value.
LEVELS=number-of-midpoints
specifies the number of bars that represent each chart variable when the
variables are continuous.
MIDPOINTS=midpoint-specification | OLD
defines the range of values that each bar, block, or section represents by
specifying the range midpoints.
midpoint-specification
specifies midpoints, either individually, or across a range at a uniform
interval. For example, the following statement produces a chart with five
bars. The first bar represents the range of values of X with a midpoint of 10.
The second bar represents the range with a midpoint of 20, and so on:
vbar x / midpoints=10 20 30 40 50;
OLD
specifies an algorithm that PROC CHART used in previous versions of SAS
to choose midpoints for continuous variables. The old algorithm was based
on the work of Nelder (1976). The current algorithm that PROC CHART uses
if you omit OLD is based on the work of Terrell and Scott (1985).
Default Without MIDPOINTS=, PROC CHART displays the values in the SAS
System's normal sorted order.
MISSING
specifies that missing values are valid levels for the chart variable.
NOHEADER
suppresses the default header line printed at the top of a chart.
Alias NOHEADING
SUMVAR=variable
specifies the variable whose sums or means (depending on the value of TYPE=)
PROC CHART displays in the chart.
Interaction If you use SUMVAR= and you use TYPE= with a value other than
MEAN or SUM, then TYPE=SUM overrides the specified TYPE=
value.
Tip Both HBAR and VBAR charts can print labels for SUMVAR=
variables if you use a LABEL statement.
TYPE=statistic
specifies what the bars or sections in the chart represent. The statistic is one of
the following:
CFREQ
specifies that each bar, block, or section represent the cumulative frequency.
CPERCENT
specifies that each bar, block, or section represent the cumulative
percentage.
Alias CPCT
FREQ
specifies that each bar, block, or section represent the frequency with which
a value or range occurs for the chart variable in the data.
MEAN
specifies that each bar, block, or section represent the mean of the
SUMVAR= variable across all observations that belong to that bar, block, or
section.
Interaction With TYPE=MEAN, you can compute only MEAN and FREQ
statistics.
PERCENT
specifies that each bar, block, or section represent the percentage of
observations that have a given value or that fall into a given range of the
chart variable.
Alias PCT
SUM
specifies that each bar, block, or section represent the sum of the SUMVAR=
variable for the observations that correspond to each bar, block, or section.
Interaction With TYPE=SUM, you can compute only SUM and FREQ
statistics.
Details
Statement Results
The number of points in the star is determined in the same way as the number of
bars for vertical bar charts.
If all the data values are positive, then the center of the star represents zero and
the outside circle represents the maximum value. If any data values are negative,
then the center represents the minimum. See the description of AXIS=value-expression
for more information about how to specify maximum and minimum values. For
information about how to specify the proportions of the chart, see the description of
“LPI=value” on page 345.
If you try to create a star chart for a variable with more than 24 levels, then PROC
CHART produces a horizontal bar chart instead.
VBAR Statement
Produces a vertical bar chart.
Syntax
VBAR variable(s) </ options>;
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a vertical bar chart,
one chart for each variable.
Optional Arguments
ASCENDING
prints the bars and any associated statistics in ascending order of size within
groups.
Alias ASC
AXIS=value-expression
specifies the values for the response axis, where value-expression is a list of
individual values, each separated by a space, or a range with a uniform interval
for the values. For example, the following range specifies tick marks on a bar
chart from 0 to 100 at intervals of 10:
hbar x / axis=0 to 100 by 10;
Interactions For HBAR and VBAR charts, AXIS= determines tick marks on the
response axis. If the AXIS= specification contains only one value,
then the value determines the minimum tick mark if the value is
less than 0, or determines the maximum tick mark if the value is
greater than 0.
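The following sketch uses the Shirts data set (created later in this chapter), whose largest frequency is 11. The first VBAR statement places tick marks from 0 to 12 in steps of 2. The second uses a single positive value to set only the maximum tick mark. The specific values are illustrative choices.

```sas
proc chart data=shirts;
   /* Range form: tick marks at 0, 2, 4, ..., 12 */
   vbar size / axis=0 to 12 by 2;
   /* Single positive value: sets the maximum tick mark */
   vbar size / axis=20;
run;
```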
DISCRETE
specifies that a numeric chart variable is discrete rather than continuous.
Without DISCRETE, PROC CHART assumes that all numeric variables are
continuous and automatically chooses intervals for them unless you use
MIDPOINTS= or LEVELS=.
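The Year variable in the Piesales data set (created later in this chapter) is numeric, but it has only the values 2005 and 2006. The following sketch uses DISCRETE so that each year gets its own bar instead of an automatically chosen interval.

```sas
proc chart data=piesales;
   /* DISCRETE: one bar per distinct Year value (2005 and 2006) */
   vbar year / discrete sumvar=pies_sold;
   title 'Total Pie Sales by Year';
run;
```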
FREQ=variable
specifies a data set variable that represents a frequency count for each
observation. Normally, each observation contributes a value of one to the
frequency counts. With FREQ=, each observation contributes the value of its
FREQ= variable instead.
Restriction If the FREQ= values are not integers, then PROC CHART truncates
them.
Interaction If you use SUMVAR=, then PROC CHART multiplies the sums by the
FREQ= value.
GROUP=variable
produces side-by-side charts, with each chart representing the observations
that have a common value for the GROUP= variable. The GROUP= variable can
be character or numeric and is assumed to be discrete. For example, the
following statement produces a frequency bar chart for men and women in each
department:
vbar gender / group=dept;
GSPACE=n
specifies the amount of extra space between groups of bars. Use GSPACE=0 to
leave no extra space between adjacent groups of bars.
G100
specifies that the sum of percentages for each group equals 100. By default,
PROC CHART uses 100% as the total sum. For example, if you produce a bar
chart that separates males and females into three age categories, then the six
bars, by default, add to 100%. However, with G100, the three bars for females
add to 100%, and the three bars for males add to 100%.
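The following sketch combines G100 with FREQ= on the Piesales data set (created later in this chapter). This pairing of options is an illustrative choice, not one taken from the examples. Weighting each observation by Pies_Sold makes TYPE=PERCENT the share of pies sold, and G100 makes the four flavor bars within each bakery add to 100%. FREQ= is used instead of SUMVAR= because SUMVAR= with TYPE=PERCENT would be overridden by TYPE=SUM.

```sas
proc chart data=piesales;
   /* FREQ= weights each observation by its Pies_Sold value;
      G100 makes the percentages in each Bakery group sum to 100 */
   vbar flavor / group=bakery freq=pies_sold type=percent g100;
   title 'Flavor Share of Pie Sales within Each Bakery';
run;
```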
LEVELS=number-of-midpoints
specifies the number of bars that represent each chart variable when the
variables are continuous.
MIDPOINTS=midpoint-specification | OLD
defines the range of values that each bar, block, or section represents by
specifying the range midpoints.
midpoint-specification
specifies midpoints, either individually, or across a range at a uniform
interval. For example, the following statement produces a chart with five
bars; the first bar represents the range of values of X with a midpoint of 10,
the second bar represents the range with a midpoint of 20, and so on:
vbar x / midpoints=10 20 30 40 50;
OLD
specifies an algorithm that PROC CHART used in previous versions of SAS
to choose midpoints for continuous variables. The old algorithm was based
on the work of Nelder (1976). The current algorithm that PROC CHART uses
if you omit OLD is based on the work of Terrell and Scott (1985).
Default Without MIDPOINTS=, PROC CHART displays the values in the SAS
System's normal sorted order.
Restriction When the VBAR variables are numeric, the midpoints must be given
in ascending order.
MISSING
specifies that missing values are valid levels for the chart variable.
NOSYMBOL
suppresses printing of the subgroup symbol or legend table.
Alias NOLEGEND
NOZEROS
suppresses any bar with zero frequency.
REF=value(s)
draws reference lines on the response axis at the specified positions.
Tip The REF= values should correspond to values of the TYPE= statistic.
SPACE=n
specifies the amount of space between individual bars.
Tip Use the GSPACE= option to specify the amount of extra space between
groups of bars.
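The spacing options can be combined with WIDTH=. The following sketch assumes the Piesales data set, and the numbers of print positions are arbitrary. If the chart no longer fits within LINESIZE=, PROC CHART produces a horizontal bar chart instead.

```sas
proc chart data=piesales;
   vbar flavor / group=bakery sumvar=pies_sold
                 width=3     /* each bar is 3 positions wide            */
                 space=1     /* 1 position between bars in a group      */
                 gspace=4;   /* 4 extra positions between Bakery groups */
run;
```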
SUBGROUP=variable
subdivides each bar or block into characters that show the contribution of the
values of variable to that bar or block. PROC CHART uses the first character of
each value to fill in the portion of the bar or block that corresponds to that
value, unless more than one value begins with the same first character. In that
case, PROC CHART uses the letters A, B, C, and so on, to fill in the bars or
blocks. If the variable is formatted, then PROC CHART uses the first character
of the formatted value.
The characters used in the chart and the values that they represent are given in
a legend at the bottom of the chart. The subgroup symbols are ordered A
through Z and 0 through 9 with the characters in ascending order.
PROC CHART calculates the height of a bar or block for each subgroup
individually and then rounds each subgroup's percentage of the total bar up or
down. As a result, the total height of the bar can be higher or lower than the
height of the same bar without the SUBGROUP= option.
Interaction If you use both TYPE=MEAN and SUBGROUP=, then PROC CHART
first calculates the mean for each variable that is listed in the
SUMVAR= option. It then subdivides the bar into the percentages
that each subgroup contributes.
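The following sketch assumes the Piesales data set. It charts the mean of Pies_Sold for each flavor and then fills each bar with the bakery symbols in proportion to each bakery's contribution.

```sas
proc chart data=piesales;
   /* Bar height: mean of Pies_Sold for each flavor.
      Bar fill: share that each Bakery value contributes. */
   vbar flavor / subgroup=bakery sumvar=pies_sold type=mean;
   title 'Mean Pie Sales by Flavor, Subdivided by Bakery';
run;
```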
SUMVAR=variable
specifies the variable whose sums or means (depending on the value of TYPE=)
PROC CHART displays in the chart.
Interaction If you use SUMVAR= and you use TYPE= with a value other than
MEAN or SUM, then TYPE=SUM overrides the specified TYPE=
value.
Tip VBAR charts can print labels for SUMVAR= variables if you use a
LABEL statement.
SYMBOL=character(s)
specifies the character or characters that PROC CHART uses in the bars or
blocks of the chart when you do not use the SUBGROUP= option.
Interaction If the SAS system option OVP is in effect and if your printing device
supports overprinting, then you can specify up to three characters to
produce overprinted charts.
TYPE=statistic
specifies what the bars or sections in the chart represent. The statistic is one of
the following:
CFREQ
specifies that each bar, block, or section represent the cumulative frequency.
CPERCENT
specifies that each bar, block, or section represent the cumulative
percentage.
Alias CPCT
FREQ
specifies that each bar, block, or section represent the frequency with which
a value or range occurs for the chart variable in the data.
MEAN
specifies that each bar, block, or section represent the mean of the
SUMVAR= variable across all observations that belong to that bar, block, or
section.
Interaction With TYPE=MEAN, you can compute only MEAN and FREQ
statistics.
PERCENT
specifies that each bar, block, or section represent the percentage of
observations that have a given value or that fall into a given range of the
chart variable.
Alias PCT
SUM
specifies that each bar, block, or section represent the sum of the SUMVAR=
variable for the observations that correspond to each bar, block, or section.
Interaction With TYPE=SUM, you can compute only SUM and FREQ
statistics.
WIDTH=n
specifies the width of the bars on bar charts.
Details
Statement Results
PROC CHART prints one page per chart. Along the vertical axis, PROC CHART
describes the chart frequency, the cumulative frequency, the chart percentage, the
cumulative percentage, the sum, or the mean. At the bottom of each bar, PROC
CHART prints a value according to the value of the TYPE= option, if specified. For
character variables or discrete numeric variables, this value is the actual value
represented by the bar. For continuous numeric variables, the value gives the
midpoint of the interval represented by the bar.
PROC CHART can automatically scale the vertical axis, determine the bar width,
and choose spacing between the bars. However, by using options, you can choose
bar intervals and the number of bars, include missing values in the chart, produce
side-by-side charts, and subdivide the bars. If the number of characters per line
(LINESIZE=) is not sufficient to display all vertical bars, then PROC CHART
produces a horizontal bar chart instead.
Missing Values
PROC CHART follows these rules when handling missing values:
- Missing values are considered valid levels for the chart variable when you use
the MISSING option.
- Missing values for a GROUP= or SUBGROUP= variable are treated as valid
levels.
- PROC CHART ignores missing values for the FREQ= option and the SUMVAR=
option.
- If the value of the FREQ= variable is missing, zero, or negative, then the
observation is excluded from the calculation of the chart statistic.
- If the value of the SUMVAR= variable is missing, then the observation is
excluded from the calculation of the chart statistic.
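The following sketch illustrates these rules with hypothetical data. With MISSING, the observation whose Size value is missing forms its own bar. The observation whose Count is 0 is excluded from the chart statistic, and NOZEROS suppresses any bar that would show zero frequency.

```sas
/* Hypothetical data: a missing Size value and a zero count */
data shirts_miss;
   input Size $ Count;
   datalines;
large 9
medium 11
small 6
. 2
xlarge 0
;

proc chart data=shirts_miss;
   /* MISSING: treat the missing Size value as a valid level.
      Count=0 excludes the xlarge observation from the statistic,
      and NOZEROS suppresses any bar with zero frequency. */
   vbar size / freq=count missing nozeros;
run;
```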
Details
This example produces a vertical bar chart that shows a frequency count for the
values of the chart variable.
Program
data shirts;
input Size $ @@;
datalines;
medium large
large large
large medium
medium small
small medium
medium large
small medium
large large
large small
medium medium
medium medium
medium large
small small
;
proc chart data=shirts;
vbar size;
title 'Number of Each Shirt Size Sold';
run;
Program Description
Create the Shirts data set. Shirts contains the sizes of a particular shirt that is sold
during a week at a clothing store, with one observation for each shirt that is sold.
data shirts;
input Size $ @@;
datalines;
medium large
large large
large medium
medium small
small medium
medium large
small medium
large large
large small
medium medium
medium medium
medium large
small small
;
Create a vertical bar chart with frequency counts. The VBAR statement produces
a vertical bar chart for the frequency counts of the Size values.
proc chart data=shirts;
vbar size;
Output: HTML
The following frequency chart shows the store's sales of each shirt size for the
week: 9 large shirts, 11 medium shirts, and 6 small shirts.
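To cross-check the bar heights, you can tabulate the same variable with PROC FREQ. This step is a suggested check, not part of the example. It should report 9 large, 11 medium, and 6 small shirts (26 in all), or about 34.6, 42.3, and 23.1 percent.

```sas
proc freq data=shirts;
   tables size;   /* one-way frequency table of Size */
run;
```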
Example 2: Producing a Percentage Bar Chart
Details
This example produces a vertical bar chart. The chart statistic is the percentage
of the total number of shirts sold that each size represents.
Program
proc chart data=shirts;
vbar size / type=percent;
title 'Percentage of Total Sales for Each Shirt Size';
run;
Program Description
Create a vertical bar chart with percentages. The VBAR statement produces a
vertical bar chart. TYPE= specifies percentage as the chart statistic for the variable
Size.
proc chart data=shirts;
vbar size / type=percent;
Output: HTML
The following chart shows the percentage of total sales for each shirt size. Of all
the shirts sold, about 42.3 percent were medium, 34.6 percent were large, and
23.1 percent were small.
Example 3: Subdividing the Bars into Categories
Details
This example does the following:
- produces a vertical bar chart for categories of one variable with bar lengths that
represent the values of another variable
- subdivides each bar into categories based on the values of a third variable
Program
data piesales;
input Bakery $ Flavor $ Year Pies_Sold;
datalines;
Samford apple 2005 234
Samford apple 2006 288
Samford blueberry 2005 103
Samford blueberry 2006 143
Samford cherry 2005 173
Samford cherry 2006 195
Samford rhubarb 2005 26
Samford rhubarb 2006 28
Oak apple 2005 219
Oak apple 2006 371
Oak blueberry 2005 174
Oak blueberry 2006 206
Oak cherry 2005 226
Oak cherry 2006 311
Oak rhubarb 2005 51
Oak rhubarb 2006 56
Clyde apple 2005 213
Clyde apple 2006 415
Clyde blueberry 2005 177
Clyde blueberry 2006 201
Clyde cherry 2005 230
Clyde cherry 2006 328
Clyde rhubarb 2005 60
Clyde rhubarb 2006 59
;
Program Description
Create the Piesales data set. Piesales contains the number of each flavor of pie
that is sold for two years at three bakeries that are owned by the same company.
One bakery is on Samford Avenue, one on Oak Street, and one on Clyde Drive.
data piesales;
input Bakery $ Flavor $ Year Pies_Sold;
datalines;
Samford apple 2005 234
Samford apple 2006 288
Samford blueberry 2005 103
Samford blueberry 2006 143
Samford cherry 2005 173
Samford cherry 2006 195
Samford rhubarb 2005 26
Samford rhubarb 2006 28
Oak apple 2005 219
Oak apple 2006 371
Oak blueberry 2005 174
Oak blueberry 2006 206
Oak cherry 2005 226
Oak cherry 2006 311
Oak rhubarb 2005 51
Oak rhubarb 2006 56
Clyde apple 2005 213
Clyde apple 2006 415
Clyde blueberry 2005 177
Clyde blueberry 2006 201
Clyde cherry 2005 230
Clyde cherry 2006 328
Clyde rhubarb 2005 60
Clyde rhubarb 2006 59
;
Create a vertical bar chart with the bars that are subdivided into categories. The
VBAR statement produces a vertical bar chart with one bar for each pie flavor.
SUBGROUP= divides each bar into sales for each bakery.
proc chart data=piesales;
vbar flavor / subgroup=bakery
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable
whose values are represented by the lengths of the bars.
sumvar=pies_sold;
Output: HTML
In the following output, the bar that represents the sales of apple pies, for example,
shows 1,740 total pies across both years and all three bakeries. The symbol for the
Samford Avenue bakery represents the 522 pies at the top. The symbol for the Oak
Street bakery represents the 590 pies in the middle. The symbol for the Clyde Drive
bakery represents the 628 pies at the bottom of the bar for apple pies. By default,
the labels along the horizontal axis are truncated to eight characters.
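You can check the subgroup totals with PROC MEANS. This step is a suggested check, not part of the example. For apple pies it should report 628 for Clyde, 590 for Oak, and 522 for Samford, which add to 1,740.

```sas
proc means data=piesales sum maxdec=0;
   class flavor bakery;   /* one row per Flavor and Bakery combination */
   var pies_sold;
run;
```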
Example 4: Producing Side-by-Side Bar Charts
TYPE=
Data set: PIESALES
Details
This example does the following:
- charts the mean values of a variable for the categories of another variable
Program
proc chart data=piesales;
vbar flavor / group=bakery
ref=100 200 300
sumvar=pies_sold
type=mean;
title 'Mean Yearly Pie Sales Grouped by Flavor';
title2 'within Bakery Location';
run;
Program Description
Create a side-by-side vertical bar chart. The VBAR statement produces a side-by-
side vertical bar chart to compare the sales across values of Bakery, specified by
GROUP=. Each Bakery group contains a bar for each Flavor value.
proc chart data=piesales;
vbar flavor / group=bakery
Create reference lines. REF= draws reference lines to mark pie sales at 100, 200,
and 300.
ref=100 200 300
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable that
is represented by the lengths of the bars.
sumvar=pies_sold
Specify the chart statistic. TYPE=MEAN averages the sales for 2005 and 2006 for
each combination of bakery and flavor.
type=mean;
Output: HTML
The following side-by-side bar charts compare the sales of pies by flavor, across
bakeries. For example, for apple pie sales, the mean for the Clyde Drive bakery is
314, the mean for the Oak Street bakery is 295, and the mean for the Samford
Avenue bakery is 261.
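You can verify the means with PROC MEANS. This step is a suggested check, not part of the example. For apple pies it should report 314 for Clyde, 295 for Oak, and 261 for Samford. Each value is the combined 2005 and 2006 total divided by two.

```sas
proc means data=piesales mean maxdec=1;
   class bakery flavor;   /* mean of the two yearly values in each cell */
   var pies_sold;
run;
```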
Output 11.9 Mean Yearly Pie Sales Grouped by Flavor within Bakery Location
SUMVAR=
WHERE= data set option
Data set: PIESALES
Details
This example does the following:
- produces horizontal bar charts only for observations with a common value
Program
proc chart data=piesales(where=(year=2005));
hbar bakery / group=flavor
sumvar=pies_sold;
title '2005 Pie Sales for Each Bakery According to Flavor';
run;
Program Description
Specify the variable value limitation for the horizontal bar chart. WHERE= limits
the chart to only the 2005 sales totals.
proc chart data=piesales(where=(year=2005));
Create a side-by-side horizontal bar chart. The HBAR statement produces a side-
by-side horizontal bar chart to compare sales across values of Flavor, specified by
GROUP=. Each Flavor group contains a bar for each Bakery value.
hbar bakery / group=flavor
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable
whose values are represented by the lengths of the bars.
sumvar=pies_sold;
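A WHERE statement inside the PROC CHART step subsets the observations in the same way as the WHERE= data set option. The following sketch is an equivalent alternative, not the form that this example uses.

```sas
proc chart data=piesales;
   where year = 2005;   /* WHERE statement instead of the WHERE= option */
   hbar bakery / group=flavor sumvar=pies_sold;
   title '2005 Pie Sales for Each Bakery According to Flavor';
run;
```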
Output: HTML
Output 11.10 2005 Pie Sales for Each Bakery According to Flavor
Example 6: Producing Block Charts for BY Groups
Details
This example does the following:
- sorts the data set
Program
proc sort data=piesales out=sorted_piesales;
by year;
run;
options nobyline;
proc chart data=sorted_piesales;
by year;
block bakery / group=flavor
sumvar=pies_sold
noheader
symbol='OX';
title 'Pie Sales for Each Bakery and Flavor';
title2 '#byval(year)';
run;
options byline;
Program Description
Sort the input data set Piesales. PROC SORT sorts Piesales by year. Sorting is
required to produce a separate chart for each year.
proc sort data=piesales out=sorted_piesales;
by year;
run;
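Suppress the default BY line. The SAS system option NOBYLINE suppresses the
default BY line so that the BY value can appear in a title instead.
options nobyline;
Specify the BY group for multiple block charts. The BY statement produces one
chart for 2005 sales and one for 2006 sales.
proc chart data=sorted_piesales;
by year;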
Create a block chart. The BLOCK statement produces a block chart for each year.
Each chart contains a grid (Bakery values along the bottom, Flavor values along the
side) of cells that contain the blocks.
block bakery / group=flavor
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable
whose values are represented by the lengths of the blocks.
sumvar=pies_sold
Suppress the default header line. NOHEADER suppresses the default header line.
noheader
Specify the block symbols. SYMBOL= specifies the symbols in the blocks.
symbol='OX';
Specify the titles. The #BYVAL specification inserts the year into the second line
of the title.
title 'Pie Sales for Each Bakery and Flavor';
title2 '#byval(year)';
run;
Reset the printing of the default BY line. The SAS system option BYLINE resets
the printing of the default BY line.
options byline;
Output: HTML
Output 11.11 2005 Pie Sales for Each Bakery and Flavor
Output 11.12 2006 Pie Sales for Each Bakery and Flavor
References
Nelder, J.A. 1976. “A Simple Algorithm for Scaling Graphs.” Applied Statistics 25 (1):
94–96. London, England: The Royal Statistical Society.
Terrell, G.R. and D.W. Scott. 1985. “Oversmoothed Nonparametric Density
Estimates.” Journal of the American Statistical Association 80 (389): 209–214.
12
CIMPORT Procedure
PROC CIMPORT also converts SAS files, which means that it changes the format of
a SAS file from the SAS format appropriate for one version of SAS to the SAS
format appropriate for another version. For example, you can use PROC CPORT and
PROC CIMPORT to move files from earlier releases of SAS to more recent releases
(for example, from SAS 8 to SAS 9) or between the same versions (for example,
from one SAS 9 operating environment to another SAS 9 operating environment).
PROC CIMPORT automatically converts the transport file as it imports it.
However, PROC CPORT and PROC CIMPORT do not allow file transport from a
later version to an earlier version (known as regressing). For example, transporting
is not allowed from SAS® 9 to SAS 8.
Note: PROC CIMPORT and PROC CPORT can be used to back up graphic catalogs.
PROC COPY cannot be used to back up graphic catalogs.
PROC CIMPORT produces no output, but it does write notes to the SAS log.
1 A transport file is created at the source computer using PROC CPORT.
2 The transport file is transferred from the source computer to the target
computer.
3 The transport file is read at the target computer using PROC CIMPORT.
Note: Transport files that are created using PROC CPORT are not
interchangeable with transport files that are created using the XPORT engine.
For complete details about the steps to create a transport file (PROC CPORT), to
transfer the transport file, and to restore the transport file (PROC CIMPORT), see
Moving and Accessing SAS Files.
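The three steps can be sketched as follows. The librefs, fileref, and quoted paths are hypothetical placeholders. Move the transport file between computers in binary mode.

```sas
/* Step 1, source computer: write the library to a transport file */
libname mylib 'source-library-path';        /* hypothetical path */
filename tranfile 'transport-file-path';    /* hypothetical path */
proc cport library=mylib file=tranfile;
run;

/* Step 2: transfer the transport file in binary mode (outside SAS) */

/* Step 3, target computer: restore the library from the transport file */
libname newlib 'target-library-path';       /* hypothetical path */
filename tranfile 'transport-file-path';    /* hypothetical path */
proc cimport library=newlib infile=tranfile;
run;
```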
In SAS Viya 3.5, PROC CIMPORT provides the EXTENDVAR= and EXTENDFORMAT= options
to prevent truncation and to ensure that the SAS-supplied format width is
sufficient for the transcoded data.
Note: The formats supported are SAS-supplied formats only. User-defined formats
are not supported.
For more information about migrating data to UTF-8 encoding, see the following
resources.
- “UTF-8 Encoding” in Migrating Data to UTF-8 for SAS Viya
Syntax
PROC CIMPORT destination=libref | <libref.> member-name <options>;
Required Argument
destination=libref | < libref. >member-name
identifies the type of file to import and specifies the catalog, SAS data set, or
SAS library to import.
destination
identifies the file or files in the transport file as a single catalog, as a single
SAS data set, or as the members of a SAS library. The destination argument
can be one of the following:
CATALOG | CAT | C
DATA | DS | D
LIBRARY | LIB | L
libref
<libref. > member-name
specifies the specific catalog, SAS data set, or SAS library as the destination
of the transport file. If the destination argument is CATALOG or DATA, you
can specify both a libref and a member name. If the libref is omitted, PROC
CIMPORT uses the default library as the libref, which is usually the WORK
library. If the destination argument is LIBRARY, specify only a libref.
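The following steps restore a single data set and a single catalog. The librefs, filerefs, paths, and member names are hypothetical, and each transport file is assumed to contain a member of the matching type.

```sas
libname mylib 'target-library-path';        /* hypothetical path   */
filename trands 'dataset-transport-path';   /* holds one data set  */
filename trancat 'catalog-transport-path';  /* holds one catalog   */

/* Libref omitted: the data set is restored to the default library */
proc cimport data=class infile=trands;
run;

/* Libref and member name given: restore the catalog as MYLIB.FORMATS */
proc cimport catalog=mylib.formats infile=trancat;
run;
```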
Optional Arguments
COMPRESS=NO | CHAR | BINARY
specifies whether the resulting CIMPORT data set is compressed. You can
specify the compression type that is being used.
NO
specifies that the data set produced by CIMPORT is not compressed.
Alias OFF | N
CHAR
specifies that the observations in a newly created SAS data set are
compressed (producing variable-length records) by using RLE (Run Length
Encoding). RLE compresses observations by reducing repeated runs of the
same character (including blanks) to two-byte or three-byte representations.
Alias ON | YES | Y
BINARY
specifies that the observations in a newly created SAS data set are
compressed (producing variable-length records) by using RDC (Ross Data
Compression). RDC combines run-length encoding and sliding-window
compression to compress the file by representing repeated byte patterns
more efficiently.
See For more information about creating and restoring transport files, see
Moving and Accessing SAS Files.
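The following minimal sketch requests RDC compression for the data sets that are restored from a library transport file. The libref, fileref, and paths are hypothetical.

```sas
libname newlib 'target-library-path';      /* hypothetical path */
filename tranfile 'transport-file-path';   /* hypothetical path */

/* BINARY: compress the restored data sets with RDC */
proc cimport library=newlib infile=tranfile compress=binary;
run;
```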
EET=(etype(s))
excludes specified entry types from the import process. If the etype is a single
entry type, then you can omit the parentheses. Separate multiple values with
spaces.
Interaction You cannot specify both the EET= option and the ET= option in the
same PROC CIMPORT step.
ENCODINGINFO=ALL | n
specifies the number of data set headers to read in order to output the encoding
value of the data set to the log.
ALL
specifies that all data set headers are read. The encoding value stored for
the data set header is output to the SAS log.
n
specifies an integer greater than zero. This value represents the number of
data set headers to read. The encoding value stored for the data set header
is output to the SAS log.
Range Specify an integer greater than zero. The upper limit depends on
system constraints. All data is still processed when n is larger than the
number of data sets in the transport file.
Note If catalogs are encountered in the library during the process of reading
the data set headers, they are skipped. Catalog headers do not contain
encoding values.
See For more information about creating and restoring transport files, see
Moving and Accessing SAS Files.
ET=(etype(s))
specifies the entry types to import. If the etype is a single entry type, then you
can omit the parentheses. Separate multiple values with spaces.
Interaction You cannot specify both the EET= option and the ET= option in the
same PROC CIMPORT step.
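In the following sketch, the first step imports only the FORMAT and FORMATC entries from a transported catalog. The second step imports everything except those entry types. ET= and EET= cannot appear together, so each option is used in its own step. The libref, fileref, path, and catalog names are hypothetical.

```sas
libname mylib 'target-library-path';        /* hypothetical path */
filename trancat 'catalog-transport-path';  /* hypothetical path */

/* ET=: import only numeric (FORMAT) and character (FORMATC) format entries */
proc cimport catalog=mylib.formats infile=trancat et=(format formatc);
run;

/* EET=: import every entry type except those two */
proc cimport catalog=mylib.other infile=trancat eet=(format formatc);
run;
```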
EXTENDSN=YES | NO
specifies whether to extend by 1 byte the length of short numerics (fewer than 8
bytes) when you import them. You can avoid a loss of precision when you
transport a short numeric in IBM format to IEEE format if you extend its length.
You cannot extend the length of an 8-byte short numeric.
Default YES
EXTENDFORMAT=YES | NO
specifies whether to extend the character format width.
Alias Y, N
Default YES
Restrictions This option is supported in SAS Viya 3.5 but is not supported in
SAS 9.
EXTENDVAR=multiplier | AUTO
specifies a multiplier value that expands character variable lengths when you
are processing a SAS data file that requires transcoding. The expanded length
helps to ensure that character data is not truncated.
multiplier
specifies a multiplier value from 1 to 5. The lengths for character variables are
increased by multiplying the current length by the specified value. The default
value is 1. When 1 is used, the variable length is not extended.
AUTO
specifies that the multiplier is set automatically. The value depends on the
encoding of the session and of the data set in the transport file. When the file that
is specified by INFILE= is in ANSI encoding or has no character encoding ID (CEI)
information, a multiplier value of 1 is used. Otherwise, the extended length is
calculated for you.
Default 1. When the default is used, the variable length is not extended.
Restrictions This option is supported in SAS Viya 3.5 but is not supported in
SAS 9.
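Here is a minimal sketch of letting PROC CIMPORT choose the multiplier, assuming a SAS Viya 3.5 (or later) session and the transport file ‘myfile’ from Example 4. With a fixed multiplier instead, such as EXTENDVAR=1.5, an 8-byte character variable becomes 12 bytes (8 × 1.5).

```sas
/* SAS Viya 3.5 or later only; EXTENDVAR= is not supported in SAS 9. */
/* AUTO derives the multiplier from the session encoding and the    */
/* encoding of the data set in the transport file.                  */
proc cimport data=test infile='myfile' extendvar=auto;
run;

/* Check the resulting variable lengths. */
proc contents data=test;
run;
```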
ISFILEUTF8=YES | NO
explicitly designates the encoding of a data set that is contained in a transport
file as UTF-8. Although data set encodings are recorded (or stamped) in transport
files that are created using SAS 9.2 and later, encodings are not stamped in
transport files that are created using SAS releases before 9.2. Therefore,
designating the UTF-8 encoding is useful under this condition:
n The data set in the transport file was created using a SAS release before 9.2.
YES
specifies that the data set in the transport file is encoded as UTF-8.
NO
specifies that the data set in the transport file is not encoded as UTF-8.
Default NO
Restriction PROC CIMPORT uses this option only if the transport file is not
stamped with the encoding of the data set. Encodings were not
recorded in SAS releases before 9.2. If an encoding is recorded in the
transport file and the ISFILEUTF8= option is specified in PROC
CIMPORT, ISFILEUTF8= is ignored.
Tips The person who restores the transport file in the target environment
should have a description of the transport file in advance of the
restore operation.
To import a transport file successfully in the target SAS session, you
should have this information about the transport file:
n the source operating environment (for example, Windows)
n the national language of the character data (for example, American
English, or en_US)
FORCE
enables access to a locked catalog. By default, PROC CIMPORT locks the
catalog that it is updating to prevent other users from accessing the catalog
while it is being updated. The FORCE option overrides this lock, which allows
other users to access the catalog while it is being imported, or enables you to
import a catalog that is currently being accessed by other users.
CAUTION
The FORCE option can lead to unpredictable results. The FORCE option
allows multiple users to access the same catalog entry simultaneously.
INFILE=fileref | 'filename'
specifies a previously defined fileref or the filename of the transport file to read.
If you omit the INFILE= option, then PROC CIMPORT attempts to read from a
transport file with the fileref SASCAT. If a fileref SASCAT does not exist, then
PROC CIMPORT attempts to read from a file named SASCAT.DAT.
Alias FILE=
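The following sketch relies on the default fileref: because INFILE= is omitted, PROC CIMPORT reads the transport file that is assigned to the fileref SASCAT. The quoted paths are placeholders.

```sas
filename sascat 'transport-file';
libname newlib 'sas-library';

/* No INFILE= option: PROC CIMPORT reads from the fileref SASCAT. */
proc cimport library=newlib;
run;
```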
MEMTYPE=mtype
specifies that only data sets, only catalogs, or both, be imported from the
transport file. Values for mtype can be as follows:
ALL
both data sets and catalogs
CATALOG | CAT
catalogs
DATA
data sets
Default ALL
NEW
creates a new catalog to contain the contents of the imported transport file
when the destination that you specify has the same name as an existing catalog.
NEW deletes any existing catalog with the same name as the one you specify as
a destination for the import. If you do not specify NEW, and the destination that
you specify has the same name as an existing catalog, PROC CIMPORT appends
the imported transport file to the existing catalog.
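A short sketch of the NEW option, reusing the placeholder names from Example 2: any existing NEWLIB.FINANCE catalog is deleted and replaced instead of being appended to.

```sas
libname newlib 'sas-library';
filename trans2 'transport-file';

/* NEW replaces an existing NEWLIB.FINANCE catalog rather than */
/* appending the imported entries to it.                       */
proc cimport catalog=newlib.finance infile=trans2 new;
run;
```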
NOEDIT
imports SAS/AF PROGRAM and SCL entries without Edit capability.
You obtain the same results if you create a new catalog to contain SCL code by
using the MERGE statement with the NOEDIT option in the BUILD procedure of
SAS/AF software.
Note: The NOEDIT option affects only SAS/AF PROGRAM and SCL entries. It
does not affect FSEDIT SCREEN and FSVIEW FORMULA entries.
Alias NEDIT
NOSRC
suppresses the importing of source code for SAS/AF entries that contain
compiled SCL code.
You obtain the same results if you create a new catalog to contain SCL code by
using the MERGE statement with the NOSOURCE option in the BUILD
procedure of SAS/AF software.
Alias NSRC
Interaction PROC CIMPORT ignores the NOSRC option if you use it with an
entry type other than FRAME, PROGRAM, or SCL.
SORT
causes the output data set to be re-sorted if necessary when importing a sorted
data set using PROC CIMPORT. A re-sort is necessary if the data set to be
imported is sorted by one or more character variables and the ordering of
character values in the target SAS session is different from that of the source
session.
Note: The SORT option has no effect on data sets that do not contain sort
information. It applies only to data sets that were previously sorted.
When PROC CIMPORT re-sorts a data set, the following note is written to the SAS
log:
NOTE: PROC CIMPORT re-sorted the data set WORK.TEMP because it contained
character variables in the sort key, and the collating sequence of
the sort differs from the local host.
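For instance, the following sketch imports a library and requests a re-sort where one is needed. The fileref TRANFILE and the library NEWLIB are placeholders taken from Example 1.

```sas
libname newlib 'sas-library';
filename tranfile 'transport-file';

/* Re-sort any imported data set whose sort key includes character */
/* variables when the local collating sequence differs from the    */
/* collating sequence of the source session.                       */
proc cimport library=newlib infile=tranfile sort;
run;
```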
TAPE
reads the input transport file from a tape.
UPCASE
reads from the transport file and writes the alphabetic characters in uppercase
to the output file.
Restriction The UPCASE option is allowed only if SAS is built with a Double-
Byte Character Set (DBCS).
Tip PROC CPORT can be used to create a transport file that includes all
uppercase characters. See option “OUTTYPE=UPCASE” on page 545
for details.
EXCLUDE Statement
Excludes specified files or entries from the import process.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC
CIMPORT step, but not both.
Tip: There is no limit to the number of EXCLUDE statements that you can use in one
invocation of PROC CIMPORT.
Syntax
EXCLUDE SAS file(s) | catalog entry(s) </ MEMTYPE=mtype>
</ ENTRYTYPE=entry-type>;
400 Chapter 12 / CIMPORT Procedure
Required Argument
SAS file(s) | catalog entry(s)
specifies one or more SAS files or one or more catalog entries to be excluded
from the import process. Specify SAS filenames if you import a library; specify
catalog entry names if you import an individual SAS catalog. Separate multiple
filenames or entry names with a space. You can use shortcuts to list many like-
named files in the EXCLUDE statement. For more information, see “SAS Data
Sets” in SAS Language Reference: Concepts.
Optional Arguments
ENTRYTYPE=entry-type
specifies a single entry type for one or more catalog entries that are listed in the
EXCLUDE statement. See SAS Language Reference: Concepts for a complete list
of catalog entry types.
MEMTYPE=mtype
specifies a single member type for one or more SAS files listed in the EXCLUDE
statement. Valid values are CATALOG or CAT, DATA, or ALL.
Default ALL
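As an illustration, this sketch imports an entire library except one data set. The data set name TEMP is hypothetical, and the library, fileref, and paths are placeholders.

```sas
libname newlib 'sas-library';
filename tranfile 'transport-file';

proc cimport library=newlib infile=tranfile;
   /* Skip the (hypothetical) data set TEMP; import everything else. */
   exclude temp / memtype=data;
run;
```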
SELECT Statement
Specifies individual files or entries to import.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC
CIMPORT step, but not both.
Tip: There is no limit to the number of SELECT statements that you can use in one
invocation of PROC CIMPORT.
Example: “Example 2: Importing Individual Catalog Entries” on page 411
Syntax
SELECT SAS file(s) | catalog entry(s) </ MEMTYPE=mtype>
</ ENTRYTYPE=entry-type>;
Required Argument
SAS file(s) | catalog entry(s)
specifies one or more SAS files or one or more catalog entries to import. Specify
SAS filenames if you import a library; specify catalog entry names if you import
an individual SAS catalog. Separate multiple filenames or entry names with a
space. You can use shortcuts to list many like-named files in the SELECT
statement. For more information, see “SAS Data Sets” in SAS Language
Reference: Concepts.
Optional Arguments
ENTRYTYPE=entry-type
specifies a single entry type for one or more catalog entries that are listed in the
SELECT statement. See SAS Language Reference: Concepts for a complete list of
catalog entry types.
MEMTYPE=mtype
specifies a single member type for one or more SAS files listed in the SELECT
statement. Valid values are CATALOG or CAT, DATA, or ALL.
Default ALL
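Similarly, when you import a library, MEMTYPE= can restrict a SELECT list to one member type. In this sketch, the catalog names FINANCE and LOAN are hypothetical, and the library, fileref, and paths are placeholders.

```sas
libname newlib 'sas-library';
filename tranfile 'transport-file';

proc cimport library=newlib infile=tranfile;
   /* Import only the (hypothetical) catalogs FINANCE and LOAN. */
   select finance loan / memtype=catalog;
run;
```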
n the Windows encoding that is associated with the locale of the SAS session in
which the transport file is created. However, starting in SAS 9.4M3, PROC
CIMPORT supports the ability to import data sets created in non-UTF-8 SAS
sessions into UTF-8 SAS sessions.
Using PROC CIMPORT to import a data set in a UTF-8 session preserves the
encoding value of the data set. For example, if a data set with SHIFT-JIS
encoding is imported into a UTF-8 session using PROC CIMPORT, PROC
CONTENTS shows that the SHIFT-JIS encoding is maintained. However, starting
in SAS Viya 3.5, when you use PROC CIMPORT to import a data set into a UTF-8
session, the UTF-8 encoding of the session is used.
n Starting in SAS 9.4, when you use PROC CPORT to create a transport file that is
encoded with US-ASCII on an ASCII platform, the US-ASCII encoding is
preserved for that transport file, regardless of the session encoding. If you then
use PROC CIMPORT to transport that data set to an ASCII platform, the US-ASCII
encoding is preserved and is not transcoded. The data set that is created has the
US-ASCII encoding, not the session encoding. For example, suppose that your
session encoding is WLATIN1 and you use PROC CPORT to create a transport file
from a data set that has an encoding of US-ASCII. The US-ASCII encoding, not
the WLATIN1 encoding, is preserved in the transport file. When you use PROC
CIMPORT to transport the data set to an ASCII platform, the US-ASCII encoding
is again preserved and is not transcoded.
Usage: CIMPORT Procedure 403
n on a z/OS platform, PROC CIMPORT creates data sets using the session
encoding.
For a complete list of encodings that are associated with each locale, see Locale
Tables in SAS National Language Support (NLS): Reference Guide.
In order for a transport file to be imported successfully, the encodings of the source
and target SAS sessions must be compatible. Here is an example of compatible
source and target SAS sessions:
The encodings of the source and target SAS sessions are compatible because the
Windows default encoding for the es_MX locale is WLATIN1 and the encoding of
the target SAS session is WLATIN1. For more detailed information about
compatible languages and encodings, see the SAS Press book SAS Encoding:
Understanding the Details, by Manfred Kiefer.
However, if the encodings of the source and target SAS sessions are incompatible, a
transport file might not be successfully imported. (See the introduction to this
section.) Here is an example of incompatible encodings:
The encodings of the source and target SAS sessions are incompatible because the
Windows default encoding for the cs_CZ locale is WLATIN2 and the encoding of
the target SAS session is OPEN_ED-1141. A transport file cannot be imported
between these locales.
When importing transport files, you can use the ENCODINGINFO= option to see the
encoding value of the transport file. Otherwise, you are alerted to compatibility
problems via warnings and error messages. For more information about the
ENCODINGINFO= option, see “ENCODINGINFO=ALL | n” on page 394.
If problems occur, you should be able to recover from them by using your
knowledge about the transport file. For information that is useful for importing the
transport file in the target SAS session, see the tips for the ISFILEUTF8= option on
page 396. For complete details about creating and restoring transport files, see
Moving and Accessing SAS Files.
Here are the warning and error messages with recovery actions:
n “Error: Transport File Encoding Is Unknown: Use the ISFILEUTF8= Option” on
page 405
n Because the encoding is not stamped in the transport file, the encoding is
unknown.
n The target SAS session uses the UTF-8 encoding.
Note: In order to perform recovery steps, you must know the encoding of the
transport file.
If you know that the transport file is encoded as UTF-8, you can import the file
again, and use the ISFILEUTF8=YES option in PROC CIMPORT.
Here is an example that imports a UTF-8 transport file into a UTF-8 target SAS
session. The transport file was created using a SAS release before 9.2.
filename importin 'transport-file';
libname target 'sas-library';
proc cimport isfileutf8=yes infile=importin library=target
memtype=data;
run;
n Because the encoding is not stamped in the transport file, the encoding is
unknown.
Try to read the character data from the imported data set. If you cannot read the
data, you can infer that the locale of the target SAS session is incompatible with
the encoding of the transport file.
Note: In order to perform recovery steps, you must know the encoding of the
transport file.
For example, the transport file was created in a source SAS session using a SAS
release before 9.2. The transport file was also created using a Polish Poland locale.
The target SAS session uses a German locale.
1 In the target SAS session, start another SAS session and change the locale to
the locale of the source SAS session that created the transport file.
In this example, you start a new SAS session in the Polish Poland locale.
sas9 -locale pl_PL;
PROC CIMPORT should succeed, and the data should be readable in the SAS
session that uses the Polish Poland locale.
Overview
The encoding of the character data is stamped in transport files that are created
using SAS 9.2 and later. Therefore, the CIMPORT procedure can detect error
conditions. For example, a UTF-8 encoded transport file cannot be imported into a
SAS session that does not use the UTF-8 encoding, such as a session that uses the
WLATIN2 encoding.
SAS 9.2 and later can also detect an incompatibility between the encoding of the
transport file and the locale of the target SAS session. Because some customers'
SAS applications ran successfully using a release prior to SAS 9.2, PROC CIMPORT
reports only a warning for this condition and allows the import to continue.
Here are the warning and error messages with recovery actions:
n “Error: Target Session Uses UTF-8: Transport File Is Not UTF-8” on page 406
n “Error: Target Session Does Not Use UTF-8: Transport File Is UTF-8” on page
408
n “Warning: Target Session Does Not Use UTF-8: Transport File Is Not UTF-8” on
page 408
n The transport file has an identified encoding that is not UTF-8. The encodings of
the transport file and the target SAS session are incompatible.
Note: Starting in SAS 9.4M3, PROC CIMPORT supports the ability to import
data sets that are created in non-UTF-8 SAS sessions into UTF-8 SAS sessions.
Prior to SAS 9.4M3, transport files were encoded in a Windows encoding that
corresponds to the SAS session encoding.
If this error is generated, the encoding of the target SAS session cannot be UTF-8.
The locales of the source and target SAS sessions must be identical.
Here is an example of a SAS 9.2 WLATIN2 transport file and UTF-8 target SAS
session.
1 To recover, in the target SAS session, start another SAS session and change the
locale to the locale that was used in the source SAS session that created the
transport file.
The LOCALE= value is preferred over the ENCODING= value because it
automatically sets the default values for the ENCODING=, DFLANG=,
DATESTYLE=, and PAPERSIZE= options.
If you do not know the locale of the source session (or the transport file), you
can infer it from the language that is used by the character data in the transport
file.
For example, if you know that Polish is the language, specify the pl_PL (Polish
Poland) locale in a new target SAS session. Here are the encoding values that
are associated with the pl_PL locale:
Table 12.4 LOCALE= Value for the Polish Language
POSIX Locale | Windows Encoding | UNIX Encoding | z/OS Encoding
Note: Verify that you do not have a SAS invocation command that already
contains the specification of the UTF-8 encoding. For example: sas9 -encoding
utf8. If it exists, the UTF-8 encoding would persist regardless of a new locale
specification.
n The transport file is encoded as UTF-8. The encodings of the transport file and
the target SAS session are incompatible.
Here is an example of a SAS 9.2 UTF-8 transport file and a WLATIN1 target SAS
session:
1 To recover, in the target SAS session, start a new SAS session and change the
session encoding to UTF-8. Here is an example:
sas9 -encoding utf8;
n The encoding of the transport file is identified. The encodings of the transport
file and the target SAS session are incompatible.
This table shows the locale and encoding values of incompatible source and target
SAS sessions. Although the wlatin2 Windows encoding that is assigned to the
transport file in the source SAS session is incompatible with the open_ed-1141
encoding of the target SAS session, a warning is displayed and the import
continues.
Table 12.5 Encoding Values for the Czech and German Locales
The transport file is imported, but the contents of the file are questionable. The
message identifies the incompatible encoding formats. To recover, try to read the
contents of the imported file. If the file is unreadable, perform these steps:
1 In the target SAS session, start a new SAS session and change the locale (rather
than the encoding) to the locale that is used in the source SAS session.
The LOCALE= value is preferred over the ENCODING= value because it
automatically sets the default values for the ENCODING=, DFLANG=,
DATESTYLE=, and PAPERSIZE= options.
If you do not know the locale of the source session (or the transport file), you
can infer it from the national language of the transport file.
For example, if you know that Czech is the national language, specify the cs_CZ
locale in a new target SAS session.
Here is an example of specifying the cs_CZ locale in a new SAS session:
sas9 -locale cs_CZ;
The target SAS session and the transport file use compatible encodings. They
both use wlatin2.
For complete details, see the Locale Table in SAS National Language Support
(NLS): Reference Guide.
Details
This example shows how to use PROC CIMPORT to read from disk a transport file,
named TRANFILE, that PROC CPORT created from a SAS library in another
operating environment. The transport file was moved to the new operating
environment by means of communications software or a magnetic medium. PROC
CIMPORT imports the transport file to a SAS library, called NEWLIB, in the new
operating environment.
Program
libname newlib 'sas-library';
filename tranfile 'transport-file';
host-options-for-file-characteristics;
proc cimport library=newlib infile=tranfile;
run;
Program Description
Specify the library name and filename. The LIBNAME statement assigns a
libref to the new SAS library. The FILENAME statement assigns a fileref to the
transport file that PROC CPORT created and enables you to specify any
operating environment options for file characteristics.
libname newlib 'sas-library';
filename tranfile 'transport-file';
host-options-for-file-characteristics;
Import the SAS library into the NEWLIB library. PROC CIMPORT imports the SAS
library into the library named NEWLIB.
proc cimport library=newlib infile=tranfile;
run;
Log Examples
Example Code 12.1 Importing an Entire Library
Details
This example shows how to use PROC CIMPORT to import the individual catalog
entries LOAN.PMENU and LOAN.SCL from the transport file TRANS2, which was
created from a single SAS catalog.
Program
libname newlib 'sas-library';
filename trans2 'transport-file';
host-options-for-file-characteristics;
proc cimport catalog=newlib.finance infile=trans2;
   select loan.pmenu loan.scl;
run;
Program Description
Specify the library name, filename, and operating environment options. The
LIBNAME statement assigns a libref to the new SAS library. The FILENAME
statement assigns a fileref to the transport file that PROC CPORT created
and enables you to specify any operating environment options for file
characteristics.
libname newlib 'sas-library';
filename trans2 'transport-file';
host-options-for-file-characteristics;
Import the specified catalog entries to the new SAS catalog. PROC CIMPORT
imports the individual catalog entries from the TRANS2 transport file and stores
them in a new SAS catalog called NEWLIB.FINANCE. The SELECT statement
selects only the two specified entries from the transport file to be imported into
the new catalog.
proc cimport catalog=newlib.finance infile=trans2;
select loan.pmenu loan.scl;
run;
Log Examples
Example Code 12.2 Importing Individual Catalog Entries
Details
This example shows how to use PROC CIMPORT to import an indexed SAS data set
from a transport file that was created by PROC CPORT from a single SAS data set.
Program
libname newdata 'sas-library';
filename trans3 'transport-file';
host-options-for-file-characteristics;
proc cimport data=newdata.times infile=trans3;
run;
Program Description
Specify the library name, filename, and operating environment options. The
LIBNAME statement assigns a libref to the new SAS library. The FILENAME
statement assigns a fileref to the transport file that PROC CPORT created
and enables you to specify any operating environment options for file
characteristics.
libname newdata 'sas-library';
filename trans3 'transport-file';
host-options-for-file-characteristics;
Import the SAS data set. PROC CIMPORT imports the single SAS data set that you
identify with the DATA= specification in the PROC CIMPORT statement. PROC
CPORT exported the data set NEWDATA.TIMES in the transport file TRANS3.
proc cimport data=newdata.times infile=trans3;
run;
Log Examples
Example Code 12.3 Importing a Single Indexed SAS Data Set
Details
This example shows how to import a French data set into a UTF-8 SAS session. The
French data set has an 8-byte character variable ‘a’ with a character format ‘$8.’.
Using a SAS Viya deployment, we start a French SAS session where we use PROC
CPORT to generate a transport file named ‘myfile’.
In another SAS Viya window, we start a SAS session that uses UTF-8 encoding. We
then import the transport file (non-UTF-8) into a UTF-8 SAS session using PROC
CIMPORT. We first show that the imported data set is truncated and then how to
extend the variable length using the EXTENDVAR= option to avoid truncation.
Program
Use PROC CPORT to create a transport file using a French data set. From a SAS
Viya deployment, bring up a default SAS session using the French language. In the
following example, the French data set has an 8-byte character variable ‘a’ (whose
value is première, with an accented e) with a character format set to ‘$8.’. We use
PROC CPORT to create a transport file. In this example, the encoding is WLATIN1,
the default.
data test;
a = 'première';
format a $8.;
run;
proc cport data=test file='myfile';
run;
Example 4: Using PROC CIMPORT to Import a French Data Set into a UTF-8 SAS Session
Output 12.1 French Data Set Used in PROC CPORT File
Import the transport file in a UTF-8 SAS session using PROC CIMPORT. From a
separate SAS Viya session that is using UTF-8 encoding, import the transport file
‘myfile’ into the UTF-8 SAS session using PROC CIMPORT. Print the SAS data set
and run PROC CONTENTS. Because it takes 2 bytes to store the accented ‘e’ in
UTF-8 encoding, the value première needs 9 bytes. The variable length is 8 bytes
and the format is $8., so the value is truncated.
proc cimport data=test infile='myfile';
run;
proc print data=test;
run;
proc contents data=test;
run;
Extend the Variable Length to avoid truncation. Because there was truncation, we
use the EXTENDVAR= option to extend the variable length and the format width
(default for EXTENDFORMAT= is YES).
proc cimport data=test infile='myfile' extendvar=1.5;
run;
proc print data=test;
run;
proc contents data=test;
run;
Note that the output data set now contains the entire variable. Also, note in the
output of PROC CONTENTS that the variable length and the format width have
been extended.
13
COMPARE Procedure
PROC COMPARE compares two data sets: the base data set and the comparison
data set. The procedure determines matching variables and matching observations.
Matching variables are variables with the same name or variables that you pair by
using the VAR and WITH statements. Matching variables must be of the same type.
Matching observations are observations that have the same values for all ID
variables that you specify or, if you do not use the ID statement, that occur in the
same position in the data sets. If you match observations by ID variables, then both
data sets must be sorted by all ID variables.
n whether one data set has more observations than the other
n how many variables are in one data set but not in the other
Note: You can create a view that has two columns with the same variable name. If
duplicate variable names exist in the view, PROC COMPARE cannot determine
which column in the base data set should be compared with which column in the
comparison data set.
PROC COMPARE issues an error if it finds duplicate variable names.
Further, PROC COMPARE creates two types of output data sets that give detailed
information about the differences between observations of variables that it is
comparing.
Overview: COMPARE Procedure 421
The following example compares the data sets Proclib.One and Proclib.Two, which
contain similar data about students:
PROC COMPARE does not produce information about values that are the same in
each comparison data set. It produces information about values that are different,
not the same.
PROC COMPARE does not produce a data set that contains observations that are
in one of the comparison data sets but not in the other, or that are in both
comparison data sets. The options for the COMPARE statement can produce much
of this information, but they do not produce a data set. If you want to produce a
data set that contains this information, use a DATA step that contains a MERGE
statement. Here is an example of a DATA step that uses a MERGE statement to
create a data set:
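/* A and B must be sorted (or indexed) by BYVAR. */
data inone intwo inboth;
   merge a(in=ina) b(in=inb);
   by byvar;
   /* Observations only in A */
   if ina and not inb then output inone;
   /* Observations only in B */
   else if inb and not ina then output intwo;
   /* Observations in both A and B */
   else if ina and inb then output inboth;
run;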
n variables. PROC COMPARE checks each variable in one data set to determine
whether it matches a variable in the other data set.
n attributes (type, length, labels, formats, and informats) of matching variables.
After making these comparisons, PROC COMPARE compares the values in the
parts of the data sets that match. PROC COMPARE compares the data either by
the position of observations or by the values of an ID variable.
When you use PROC COMPARE to compare data set TWO with data set ONE, the
procedure compares the first observation in data set ONE with the first observation
in data set TWO, and it compares the second observation in the first data set with
the second observation in the second data set, and so on. In each observation that
it compares, the procedure compares the values of the variables idnum, name, year,
state, grade1, and grade2.
The procedure does not report on the values of the last three observations or the
variable major in data set TWO because there is nothing to compare them with in
data set ONE.
For the two data sets shown in the following figure, assume that IDNUM is an ID
variable and that IDNUM has the same type in both data sets. The procedure
compares the observations that have the same value for IDNUM. The data inside
the shaded boxes shows the part of the data sets that the procedure compares.
The data sets contain five matching variables: name, year, state, grade1, and grade2.
They also contain four matching observations: the observations with values of
1000, 1042, 1095, and 1187 for idnum.
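A minimal sketch of this ID-based comparison, assuming that Proclib.One and Proclib.Two are the student data sets described above. Because both data sets must be sorted by the ID variable, the program first sorts copies of them into the WORK library.

```sas
/* Sort copies of both data sets by the ID variable. */
proc sort data=proclib.one out=work.one;
   by idnum;
run;

proc sort data=proclib.two out=work.two;
   by idnum;
run;

/* Match observations by IDNUM instead of by position. */
proc compare base=work.one compare=work.two;
   id idnum;
run;
```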
For a numeric variable compared, let x be its value in the base data set and let y be
its value in the comparison data set. If both x and y are nonmissing, then the values
are judged unequal according to the value of METHOD= and the value of
CRITERION= (γ) as follows:
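n If METHOD=EXACT, then the values are unequal if y does not equal x.
n If METHOD=ABSOLUTE, then the values are unequal if
$\lvert y - x \rvert > \gamma$
n If METHOD=RELATIVE, then the values are unequal if
$\dfrac{\lvert y - x \rvert}{(\lvert x \rvert + \lvert y \rvert)/2 + \delta} > \gamma$
and the values are equal if x = y = 0.
n If METHOD=PERCENT, then the values are unequal if
$100 \cdot \dfrac{\lvert y - x \rvert}{\lvert x \rvert} > \gamma$ for x ≠ 0
or
y ≠ 0 for x = 0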
If the value that is specified for CRITERION= is negative, then the actual criterion
that is used, γ, is equal to the absolute value of the specified criterion multiplied by
a very small number, ε (epsilon), that depends on the numerical precision of the
computer. This number ε is defined as the smallest positive floating-point value
such that, using machine arithmetic, 1−ε<1<1+ε. Round-off or truncation error in
floating-point computations is typically a few orders of magnitude larger than ε.
CRITERION=−1000 often provides a reasonable test of the equality of computed
results at the machine level of precision.
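For example, the following sketch requests a comparison at roughly machine precision. Because METHOD=EXACT does not use CRITERION=, the sketch pairs the negative criterion with METHOD=RELATIVE. The data set names are the ones from this chapter's overview.

```sas
/* A negative CRITERION= value is multiplied (in absolute value) by */
/* machine epsilon, so this criterion is 1000 times epsilon.        */
proc compare base=proclib.one compare=proclib.two
             method=relative criterion=-1000;
run;
```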
When δ is 0, the RELATIVE method is extremely sensitive to differences between
values that are close to zero. Specifying a positive value for δ avoids this
extreme sensitivity of the RELATIVE method for small values. If you specify
METHOD=RELATIVE(δ) CRITERION=γ when both x and y are much smaller than δ in
absolute value, then the comparison is as if you had specified
METHOD=ABSOLUTE CRITERION=δγ. However, when either x or y is much larger
than δ in absolute value, the comparison is like METHOD=RELATIVE CRITERION=γ.
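Here is a sketch of METHOD=RELATIVE with a δ value. The numbers are illustrative only: with δ = 0.01 and γ = 0.001, two values that are both much smaller than 0.01 in absolute value are judged unequal when they differ by more than about 0.00001 (δγ).

```sas
/* Judge values unequal by relative difference; delta = 0.01 keeps */
/* differences between values near zero from being overstated.     */
proc compare base=proclib.one compare=proclib.two
             method=relative(0.01) criterion=0.001;
run;
```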
For character variables, if one value is longer than the other, then the shorter value
is padded with blanks for the comparison. Nonblank character values are judged
equal only if they agree at each character. If the NOMISSING option is in effect,
then blank character values are judged equal to anything.
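A one-step sketch of the NOMISSING behavior described above, using the overview's data set names:

```sas
/* With NOMISSING, blank character values are judged equal to any value. */
proc compare base=proclib.one compare=proclib.two nomissing;
run;
```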
PROC COMPARE | Compare the contents of SAS data sets, or compare two variables | Ex. 1, Ex. 2, Ex. 4, Ex. 6, Ex. 7
Restriction: If you omit COMPARE=, then you must use the WITH and VAR statements.
Tips: Ensure that the LINESIZE option in the OPTIONS statement specifies an adequate
length to display the label information for the data sets.
If problems occur with the length of CHAR variables when you compare two
unequal data sets in CAS, you might need to change the value of the
NCHARMULTIPLIER option of the LIBNAME statement. For more information, see
“NCHARMULTIPLIER= LIBNAME Statement Option” in SAS Cloud Analytic Services:
User’s Guide and “CAS LIBNAME Statement” in SAS Cloud Analytic Services: User’s
Guide.
You can use data set options with the BASE= and COMPARE= options.
Examples: “Example 1: Producing a Complete Report of the Differences” on page 463
“Example 2: Comparing Variables in Different Data Sets” on page 470
“Example 4: Comparing Variables That Are in the Same Data Set” on page 474
“Example 6: Comparing Values of Observations Using an Output Data Set (OUT=)”
on page 482
“Example 7: Creating an Output Data Set of Statistics (OUTSTATS=)” on page 486
Syntax
PROC COMPARE <options>;
Optional Arguments
ALLOBS
includes in the report of value comparison results the values and, for numeric
variables, the differences for all matching observations, even if they are judged
equal.
Default If you omit ALLOBS, then PROC COMPARE prints values only for
observations that are judged unequal.
Interaction When used with the TRANSPOSE option, ALLOBS invokes the
ALLVARS option and displays the values for all matching
observations and variables.
ALLSTATS
prints a table of summary statistics for all pairs of matching variables.
See “Table of Summary Statistics” on page 456 for information about the
statistics produced.
PROC COMPARE Statement 431
ALLVARS
includes in the report of value comparison results the values and, for numeric
variables, the differences for all pairs of matching variables, even if they are
judged equal.
Default If you omit ALLVARS, then PROC COMPARE prints values only for
variables that are judged unequal.
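A sketch that combines ALLOBS and ALLVARS, using the overview's data set names. Expect a much longer report than the default, because equal values are printed too.

```sas
/* Print values for all matching observations and all pairs of */
/* matching variables, including those that are judged equal.  */
proc compare base=proclib.one compare=proclib.two allobs allvars;
run;
```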
BASE=SAS-data-set
specifies the data set to use as the base data set.
Alias DATA=
Tip You can use the WHERE= data set option with the BASE= option to
limit the observations that are available for comparison.
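A sketch of subsetting with the WHERE= data set option, assuming the overview's Proclib.One and Proclib.Two. The condition on STATE and the value 'NC' are hypothetical; the same condition is applied to both data sets.

```sas
/* Compare only observations that satisfy the (hypothetical) condition. */
proc compare base=proclib.one(where=(state='NC'))
             compare=proclib.two(where=(state='NC'));
run;
```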
BRIEFSUMMARY
produces a short comparison summary and suppresses the four default
summary reports (data set summary report, variables summary report,
observation summary report, and values comparison summary report).
Alias BRIEF
Example “Example 4: Comparing Variables That Are in the Same Data Set” on
page 474
COMPARE=SAS-data-set
specifies the data set to use as the comparison data set.
Alias COMP=, C=
Default If you omit COMPARE=, then the comparison data set is the same as
the base data set, and PROC COMPARE compares variables within
the data set.
Restriction If you omit COMPARE=, then you must use the WITH and VAR
statements.
Tip You can use the WHERE= data set option with COMPARE= to limit
the observations that are available for comparison.
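As a minimal sketch of the WHERE= tips for BASE= and COMPARE=, the following program compares only the 14-year-old students in SASHELP.CLASS with the same subset of a modified copy. The data set WORK.CLASS_NEW and the change to Alfred's height are hypothetical. They exist only to produce one value difference.

```sas
/* Hypothetical copy of SASHELP.CLASS with one changed value */
data work.class_new;
   set sashelp.class;
   if name='Alfred' then height=height+1;
run;

/* WHERE= limits the observations that each data set contributes */
proc compare base=sashelp.class(where=(age=14))
             compare=work.class_new(where=(age=14));
run;
```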
CRITERION=γ
specifies the criterion for judging the equality of numeric values. Normally, the
value of γ (gamma) is positive. In that case, the number itself becomes the
equality criterion. If you use a negative value for γ, then PROC COMPARE uses
an equality criterion proportional to the precision of the computer on which SAS
is running.
Default 0.00001
ERROR
displays an error message in the SAS log when differences are found.
FUZZ=number
alters the values comparison results for numbers less than number. PROC
COMPARE prints the following:
n 0 for any variable value that is less than number
Default 0
Range 0-1
Tip A report that contains many trivial differences is easier to read in this
form.
LISTALL
lists all variables and observations that are found in only one data set.
Alias LIST
LISTBASE
lists all observations and variables that are found in the base data set but not in
the comparison data set.
LISTBASEOBS
lists all observations that are found in the base data set but not in the
comparison data set.
LISTBASEVAR
lists all variables that are found in the base data set but not in the comparison
data set.
LISTCOMP
lists all observations and variables that are found in the comparison data set but
not in the base data set.
LISTCOMPOBS
lists all observations that are found in the comparison data set but not in the
base data set.
LISTCOMPVAR
lists all variables that are found in the comparison data set but not in the base
data set.
LISTEQUALVAR
prints a list of variables whose values are judged equal at all observations in
addition to the default list of variables whose values are judged unequal.
LISTOBS
lists all observations that are found in only one data set.
LISTVAR
lists all variables that are found in only one data set.
MAXPRINT=total | (total, per-variable)
specifies the maximum number of differences to print, where
total
is the maximum total number of differences to be printed. The default value
is 500 unless you use the ALLOBS option (or both the ALLVARS and
TRANSPOSE options). In that case, the default is 32000.
per-variable
is the maximum number of differences to be printed for each variable within
a BY group. The default value is 50 unless you use the ALLOBS option (or
both the ALLVARS and TRANSPOSE options). In that case, the default is
1000.
The MAXPRINT= option prevents the output from becoming extremely large
when data sets differ greatly.
METHOD=ABSOLUTE | EXACT | PERCENT | RELATIVE<(δ)>
specifies the method for judging the equality of numeric values.
Unless you use the CRITERION= option, the default method is EXACT. If you
use the CRITERION= option, then the default method is RELATIVE(φ), where φ
(phi) is a small number that depends on the numerical precision of the computer
on which SAS is running and on the value of CRITERION=.
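The following sketch shows CRITERION= with METHOD=ABSOLUTE. The data sets WORK.MEASURE_A and WORK.MEASURE_B and the variables KEY and X are hypothetical. The difference for KEY=1 (0.004) is smaller than the criterion of 0.01, so PROC COMPARE should judge it equal. The difference for KEY=3 (0.5) should be reported.

```sas
/* Hypothetical measurements */
data work.measure_a;
   input key x;
   datalines;
1 10.000
2 20.000
3 30.000
;
run;

data work.measure_b;
   input key x;
   datalines;
1 10.004
2 20.000
3 30.500
;
run;

/* METHOD=ABSOLUTE compares the absolute difference with CRITERION= */
proc compare base=work.measure_a compare=work.measure_b
             method=absolute criterion=0.01;
   id key;
run;
```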
NODATE
suppresses the display in the data set summary report of the creation dates and
the last modified dates of the base and comparison data sets.
NOMISSBASE
judges a missing value in the base data set equal to any value. (By
default, a missing value is equal only to a missing value of the same kind,
that is .=., .^=.A, .A=.A, .A^=.B, and so on.)
You can use this option to determine the changes that would be made to the
observations in the comparison data set if it were used as the master data set
and the base data set were used as the transaction data set in a DATA step
UPDATE statement. For information about the UPDATE statement, see
“UPDATE” in SAS DATA Step Statements: Reference.
NOMISSCOMP
judges a missing value in the comparison data set equal to any value. (By
default, a missing value is equal only to a missing value of the same kind, that
is .=., .^=.A, .A=.A, .A^=.B, and so on.)
You can use this option to determine the changes that would be made to the
observations in the base data set if it were used as the master data set and the
comparison data set were used as the transaction data set in a DATA step
UPDATE statement. For information about the UPDATE statement, see
“UPDATE” in SAS DATA Step Statements: Reference.
NOMISSING
judges missing values in both the base and comparison data sets equal to any
value. By default, a missing value is equal only to a missing value of the same
kind, that is .=., .^=.A, .A=.A, .A^=.B, and so on.
Alias NOMISS
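To illustrate NOMISSBASE, the following sketch uses two hypothetical data sets, WORK.DS_BASE and WORK.DS_COMP. X is missing in the base data set for KEY=2, so NOMISSBASE judges that value equal to the comparison value 7. Only the difference for KEY=1 should be reported.

```sas
/* Hypothetical base data set: X is missing for KEY=2 */
data work.ds_base;
   input key x;
   datalines;
1 5
2 .
;
run;

data work.ds_comp;
   input key x;
   datalines;
1 6
2 7
;
run;

/* NOMISSBASE: a missing base value is judged equal to any comparison value */
proc compare base=work.ds_base compare=work.ds_comp nomissbase;
   id key;
run;
```

Replacing NOMISSBASE with NOMISSCOMP or NOMISSING changes which side's missing values are treated as matching anything.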
NOPRINT
suppresses all printed output.
Tip You might want to use this option when you are creating one or more
output data sets.
NOSUMMARY
suppresses the data set, variable, observation, and values comparison summary
reports.
NOTE
displays notes in the SAS log that describe the results of the comparison, if
differences were found.
NOVALUES
suppresses the report of the value comparison results.
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC
COMPARE creates it. SAS-data-set contains the differences between matching
variables.
OUTALL
writes an observation to the output data set for each observation in the base
data set and for each observation in the comparison data set. The option also
writes observations to the output data set that contain the differences and
percent differences between the values in matching observations.
OUTBASE
writes an observation to the output data set for each observation in the base
data set, creating observations in which _TYPE_=BASE.
OUTCOMP
writes an observation to the output data set for each observation in the
comparison data set, creating observations in which _TYPE_=COMPARE.
OUTDIF
writes an observation to the output data set for each pair of matching
observations. The values in the observation include values for the differences
between the values in the pair of observations. The value of _TYPE_ in each
observation is DIF.
Default The OUTDIF option is the default unless you specify the OUTBASE,
OUTCOMP, or OUTPERCENT option. If you use any of these options,
then you must specify the OUTDIF option to create _TYPE_=DIF
observations in the output data set.
OUTNOEQUAL
suppresses the writing of an observation to the output data set when all values
in the observation are judged equal. In addition, in observations containing
values for some variables judged equal and others judged unequal, the
OUTNOEQUAL option uses the special missing value ".E" to represent
differences and percent differences for variables judged equal.
OUTPERCENT
writes an observation to the output data set for each pair of matching
observations. The values in the observation include values for the percent
differences between the values in the pair of observations. The value of _TYPE_
in each observation is PERCENT.
OUTSTATS=SAS-data-set
writes summary statistics for all pairs of matching variables to the specified
SAS-data-set.
Tip If you want to print a table of statistics in the procedure output, then
use the STATS, ALLSTATS, or PRINTALL option.
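When you need the differences as data rather than as a report, you can combine OUT=, OUTSTATS=, and NOPRINT. In this sketch, WORK.BASE, WORK.COMP, WORK.DIFFS, and WORK.STATS are hypothetical names. Because of OUTNOEQUAL, the observation for KEY=1, whose values are all equal, should not appear in WORK.DIFFS.

```sas
/* Hypothetical data sets for illustration */
data work.base;
   input key x y;
   datalines;
1 1 10
2 2 20
3 3 30
;
run;

data work.comp;
   input key x y;
   datalines;
1 1 10
2 5 20
3 3 33
;
run;

/* Write differences and summary statistics to data sets; print nothing */
proc compare base=work.base compare=work.comp
             out=work.diffs outnoequal
             outstats=work.stats noprint;
   id key;
run;

proc print data=work.diffs;
run;

proc print data=work.stats;
run;
```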
PRINTALL
invokes the following options: ALLVARS, ALLOBS, ALLSTATS, LISTALL, and
WARNING.
STATS
prints a table of summary statistics for all pairs of matching numeric variables
that are judged unequal.
See “Table of Summary Statistics” on page 456 for information about the
statistics produced.
TRANSPOSE
prints the reports of value differences by observation instead of by variable.
Interaction If you also use the NOVALUES option, then the TRANSPOSE option
lists only the names of the variables whose values are judged
unequal for each observation, not the values and differences.
WARNING
displays a warning message in the SAS log when differences are found.
BY Statement
Produces a separate comparison for each BY group.
Syntax
BY <DESCENDING> variable-1
<<DESCENDING> variable-2 …>
<NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, then the observations in the data set must be sorted by all the
variables that you specify. Variables in a BY statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The observations are grouped in another way (for example, chronological
order).
Details
n If none of the BY variables are in the comparison data set, then PROC COMPARE
compares each BY group in the base data set with the entire comparison data set.
n If some BY variables are not in the comparison data set, then PROC COMPARE
writes an error message to the SAS log and terminates.
n If some BY variables have different types in the two data sets, then PROC
COMPARE writes an error message to the SAS log and terminates.
Note: Identical BY values might not compare as equal if they are formatted
differently.
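A minimal sketch of a BY-group comparison follows. Both data sets are sorted by SEX, so PROC COMPARE should produce one comparison for each value of SEX. WORK.CLASS_BASE, WORK.CLASS_COMP, and the change to Alice's weight are hypothetical.

```sas
/* Sort the base data set by the BY variable */
proc sort data=sashelp.class out=work.class_base;
   by sex;
run;

/* Hypothetical comparison data set with one changed value; order is preserved */
data work.class_comp;
   set work.class_base;
   if name='Alice' then weight=weight+2;
run;

/* One comparison per BY group (Sex='F' and Sex='M') */
proc compare base=work.class_base compare=work.class_comp;
   by sex;
run;
```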
ID Statement
Lists variables to use to match observations.
Syntax
ID <DESCENDING> variable-1
<<DESCENDING> variable-2 …>
<NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to match observations. You can
specify more than one variable, but the data set must be sorted by the variable
or variables that you specify. These variables are ID variables. ID variables also
identify observations on the printed reports and in the output data set.
Optional Arguments
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the ID statement.
If you use the DESCENDING option, then you must sort the data sets. SAS does
not use an index to process an ID statement with the DESCENDING option.
Further, the use of DESCENDING for ID variables must correspond to the use of
the DESCENDING option in the BY statement in the PROC SORT step that was
used to sort the data sets.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The data are grouped in another way (for example, chronological order).
Details
n You should sort both data sets by the common ID variables (within the BY
variables, if any) unless you specify the NOTSORTED option.
If the data sets are not sorted by the common ID variables and if you do not specify
the NOTSORTED option, then PROC COMPARE writes a warning message to the
SAS log and continues to process the data sets as if you had specified
NOTSORTED.
When the data sets are not sorted, PROC COMPARE detects only those duplicate
observations that occur in succession.
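The following sketch matches observations by an ID variable instead of by position. The data sets WORK.ROSTER_OLD and WORK.ROSTER_NEW and the variable STUDENT are hypothetical. After both data sets are sorted by STUDENT, PROC COMPARE should report the changed score for B, and LISTCOMPOBS should list D, which appears only in the comparison data set.

```sas
/* Two hypothetical versions of a roster, stored in different orders */
data work.roster_old;
   input student $ score;
   datalines;
C 70
A 90
B 80
;
run;

data work.roster_new;
   input student $ score;
   datalines;
B 85
A 90
C 70
D 60
;
run;

/* Sort both data sets by the ID variable */
proc sort data=work.roster_old; by student; run;
proc sort data=work.roster_new; by student; run;

/* Match observations by STUDENT; list observations found only in the comparison data set */
proc compare base=work.roster_old compare=work.roster_new listcompobs;
   id student;
run;
```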
VAR Statement
Restricts the comparison of the values of variables to the ones named in the VAR statement.
Syntax
VAR variable(s);
Required Argument
variable(s)
one or more variables that appear in the BASE= and COMPARE= data sets or
only in the BASE= data set.
Details
n If you do not use the VAR statement, then PROC COMPARE compares the
values of all matching variables except the ones that appear in BY and ID
statements.
n If a variable in the VAR statement does not exist in the COMPARE= data set,
then PROC COMPARE writes a warning message to the SAS log and ignores the
variable.
n If a variable in the VAR statement does not exist in the BASE= data set, then
PROC COMPARE stops processing and writes an error message to the SAS log.
n The VAR statement restricts only the comparison of values of matching
variables. PROC COMPARE still reports on the total number of matching
variables and compares their attributes. However, it produces neither error nor
warning messages about these variables.
WITH Statement
Compares variables in the base data set with variables that have different names in the comparison
data set, and compares different variables that are in the same data set.
Restriction: You must use the VAR statement when you use the WITH statement.
Examples: “Example 2: Comparing Variables in Different Data Sets” on page 470
“Example 3: Comparing a Variable Multiple Times” on page 472
“Example 4: Comparing Variables That Are in the Same Data Set” on page 474
Syntax
WITH variable(s);
Required Argument
variable(s)
one or more variables to compare with variables in the VAR statement.
Details
The first variable that you list in the WITH statement corresponds to the first
variable that you list in the VAR statement, the
second with the second, and so on. If the WITH statement list is shorter than the
VAR statement list, then PROC COMPARE assumes that the extra variables in the
VAR statement have the same names in the comparison data set as they do in the
base data set. If the WITH statement list is longer than the VAR statement list,
then PROC COMPARE ignores the extra variables.
A variable name can appear any number of times in the VAR statement or the
WITH statement. By selecting VAR and WITH statement lists, you can compare the
variables in any permutation.
If you omit the COMPARE= option in the PROC COMPARE statement, then you
must use the WITH statement. In this case, PROC COMPARE compares the values
of variables with different names in the BASE= data set.
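When you omit COMPARE=, the VAR and WITH statements compare variables within the base data set. In this hypothetical example, MORNING is compared with EVENING in WORK.READINGS, so only site B should show a difference.

```sas
/* Hypothetical data set with two readings of the same quantity */
data work.readings;
   input site $ morning evening;
   datalines;
A 10 10
B 12 15
C 9 9
;
run;

/* No COMPARE= option: compare MORNING with EVENING within WORK.READINGS */
proc compare base=work.readings;
   var morning;
   with evening;
run;
```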
“Procedure Output” on page 449 shows the default output for the two data sets
shown in “Overview: COMPARE Procedure” on page 420.
“Example 1: Producing a Complete Report of the Differences” on page 463 shows
the complete output for these two data sets.
Results Reporting
PROC COMPARE reports the results of its comparisons in the following ways:
n the SAS log
n procedure output
SAS Log
When you use the WARNING, PRINTALL, or ERROR option, PROC COMPARE
writes a description of the differences to the SAS log.
The following table is a key for interpreting the SYSINFO return code from PROC
COMPARE. For each of the conditions listed, the associated value is added to the
return code if the condition is true. Thus, the SYSINFO return code is the sum of the
codes listed in the following table for the applicable conditions:
These codes are ordered and scaled to enable a simple check of the degree to
which the data sets differ. For example, if you want to check that two data sets
contain the same variables, observations, and values, but you do not care about
differences in labels, formats, and so on, then use the following statements:
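proc compare base=SAS-data-set compare=SAS-data-set;
run;
%let rc=&sysinfo; /* read SYSINFO before another step resets it */
data _null_;
   if &rc >= 64 then put '<<<< Data sets differ in variables, observations, or values';
run;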
SYSINFO reflects only the most recent SAS step, so each return code is the sum
of the codes for all of the conditions that a single comparison detects. For
example, in the following code, the first comparison detects only a difference in
a variable label and produces a return code of 32. The second comparison detects
both a label difference and a value difference, so its return code is
32 + 4096 = 4128.
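/* diff label -- RC is 32 */
data class1;
   set sashelp.class;
   label sex='Gender';
run;
data class2;
   set sashelp.class;
run;
proc compare base=class1 compare=class2;
run;
%let rc=&sysinfo;
%put 'RC' &rc;
/* diff label and value -- RC is 4128 */
data class1;
   set sashelp.class;
   label sex='Gender';
run;
data class2;
   set sashelp.class;
   if name="Jeffrey" then name="Jeff";
run;
proc compare base=class1 compare=class2;
run;
%let rc=&sysinfo;
%put 'RC' &rc;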
You can examine individual bits in the SYSINFO value by using DATA step bit-testing
features to check for specific conditions. For example, the following statements
check for several conditions, including the presence of observations in the base
data set that are not in the comparison data set:
proc compare base=SAS-data-set
             compare=SAS-data-set;
run;
%let rc=&sysinfo;
data _null_;
   /* Test for data set label */
   if &rc = '1'b then
      put '<<<< Data sets have different labels';
   /* Test for data set types */
   if &rc = '1.'b then
      put '<<<< Data set types differ';
   /* Test for variable label */
   if &rc = '1.....'b then
      put '<<<< Variable has different label';
   /* Test for base observation */
   if &rc = '1......'b then
      put '<<<< Base data set has observation not in comparison data set';
   /* Test for length */
   if &rc = '1....'b then
      put '<<<< Variable has different lengths in the base and comparison data sets';
   /* Variable in base data set not in compare data set */
   if &rc = '1..........'b then
      put '<<<< Variable in base data set not found in comparison data set';
   /* Comparison data set has variable not in base data set */
   if &rc = '1...........'b then
      put '<<<< Comparison data set has variable not contained in the base data set';
   /* Test for values */
   if &rc = '1............'b then
      put '<<<< A value comparison was unequal';
   /* Conflicting variable types */
   if &rc = '1.............'b then
      put '<<<< Conflicting variable types between the two data sets being compared';
run;
PROC COMPARE must run before you check SYSINFO, and you must obtain the
SYSINFO value before another SAS step starts, because every SAS step resets
SYSINFO.
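Each condition contributes a distinct power of 2, so you can also decompose a return code with the BAND function instead of writing one bit mask per condition. This sketch uses the hypothetical data sets WORK.CLASS1 and WORK.CLASS2. They differ in one variable label and one value, so the step should report the codes 32 and 4096.

```sas
/* Hypothetical data sets: a label difference and a value difference */
data work.class1;
   set sashelp.class;
   label height='Height (in)';
run;

data work.class2;
   set sashelp.class;
   if name='Alfred' then age=age+1;
run;

proc compare base=work.class1 compare=work.class2 noprint;
run;
%let rc=&sysinfo; /* capture SYSINFO before the next step resets it */

/* List each power-of-2 code that contributes to the return code */
data _null_;
   rc = &rc;
   do bit = 0 to 15;
      code = 2**bit;
      if band(rc, code) then put 'SYSINFO includes code ' code;
   end;
run;
```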
Procedure Output
To view the Data Set Summary, see Output 13.4 on page 451.
Note: The COMPARE procedure omits data set labels if the line size is too small
for them.
Variables Summary
This report compares the variables in the two data sets. The first part of the report
lists the following:
n the number of variables the data sets have in common
n the number of variables in the base data set that are not in the comparison data
set and vice versa
n the number of variables in both data sets that have different types
n the number of variables that differ on other attributes (length, label, format, or
informat)
n the number of BY, ID, VAR, and WITH variables specified for the comparison
The second part of the report lists matching variables with different attributes and
shows how the attributes differ. (The COMPARE procedure omits variable labels if
the line size is too small for them.)
The following output shows the Data Set Summary and the Variables Summary.
Output 13.4 Partial Output Showing the Data Set Summary and Variables Summary
Observation Summary
This report provides information about observations in the base and comparison
data sets. First of all, the report identifies the first and last observation in each data
set, the first and last matching observations, and the first and last different
observations. Then, the report lists the following:
n the number of observations that the data sets have in common
n the number of observations in the base data set that are not in the comparison
data set and vice versa
n the total number of observations in each data set
n the number of matching observations for which PROC COMPARE judged some
variables unequal
n the number of matching observations for which PROC COMPARE judged all
variables equal
n the maximum difference measure between unequal values for all pairs of
matching variables (for differences not involving missing values)
In addition, for the variables for which some matching observations have unequal
values, the report lists the following:
n the name of the variable
n the maximum difference measure found between values (for differences not
involving missing values)
n the number of differences caused by comparison with missing values, if any
Note:
n When it is comparing character values, PROC COMPARE displays only the first
20 characters. When you use the TRANSPOSE option, PROC COMPARE
displays only the first 12 characters.
When it compares character values, PROC COMPARE displays a plus sign (+) in the
output at the end of any character string that is longer than 20 characters. If you
specify the TRANSPOSE option, then PROC COMPARE displays the plus sign at the
end of any character string that is longer than 12 characters. In both instances,
the plus sign appears in the table-cell border above the character string.
Note: The plus sign is displayed at the end of the border above the character
string.
n If you do not specify an adequate value for the LINESIZE option, SAS might
write a warning that indicates the LINESIZE value is too small to print all of the
ID variables in the value comparison reports. PROC COMPARE reserves space
for the value section based on the formatted width of the data values. The rest
of the space is available for ID values. The character ID values use the width
that is specified by their format, and numeric variables use approximately 10
spaces.
n the percent difference between these two values (numeric variables only)
The following output shows the Value Comparison Results for Variables.
Output 13.7 Partial Output Showing the Value Comparison Results for Variables
You can suppress the value comparison results with the NOVALUES option. If you
use both the NOVALUES and TRANSPOSE options, then PROC COMPARE lists for
each observation the names of the variables with values judged unequal but does
not display the values and differences.
The display limits of PROC COMPARE and the TRANSPOSE option apply only to
the printed output generated by PROC COMPARE. To see the entire value, you
have to use the following options to create an output data set with PROC
COMPARE:
n OUT= specifies the name of the output data set.
n OUTBASE and OUTCOMP include the observations from the BASE= and
COMPARE= data sets.
n NOPRINT suppresses the default printed reports.
The following example produces a Differences data set and then runs PROC PRINT.
data x;
a='aaaaaaaaaaaaaaaaaaaaaa';
456 Chapter 13 / COMPARE Procedure
data y;
a='aaaaaaaaaaaaaaaaaaaaab';
run;
proc compare base=x comp=y out=dif outbase outcomp outdif outnoequal;
run;
proc print data=dif;
run;
Note: In all cases PROC COMPARE calculates the summary statistics based on all
matching observations that do not contain missing values, not just on those
containing unequal values.
The table of summary statistics shows the following statistics for base data set
values, comparison data set values, differences, and percent differences:
N
the number of nonmissing values.
MEAN
the mean, or average, of the values.
STD
the standard deviation.
MAX
the maximum value.
MIN
the minimum value.
MISSDIFF
the number of missing values in either a base or compare data set.
STDERR
the standard error of the mean.
T
the T ratio (MEAN/STDERR).
PROB> | T |
the probability of a greater absolute T value if the true population mean is 0.
NDIF
the number of matching observations judged unequal, and the percent of the
matching observations that were judged unequal.
DIFMEANS
the difference between the mean of the base values and the mean of the
comparison values. This line contains three numbers. The first is the
difference in means expressed as a percentage of the mean of the base values.
The second is the difference in means expressed as a percentage of the mean of
the comparison values. The third is the difference in the two means (the
comparison mean minus the base mean).
R
the correlation of the base and comparison values for matching observations
that are nonmissing in both data sets.
RSQ
the square of the correlation of the base and comparison values for matching
observations that are nonmissing in both data sets.
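Written out, several of these statistics relate as follows. Here $\bar{b}$ and $\bar{c}$ denote the means of the base and comparison values. The STDERR formula is the usual definition of the standard error of the mean and is an assumption here, not a statement from this guide. The DIFMEANS percentages assume the same difference (comparison mean minus base mean) that the third number uses.

```latex
\mathrm{STDERR} = \frac{\mathrm{STD}}{\sqrt{N}}, \qquad
T = \frac{\mathrm{MEAN}}{\mathrm{STDERR}}

\mathrm{DIFMEANS} = \left(
  100\,\frac{\bar{c} - \bar{b}}{\bar{b}},\;
  100\,\frac{\bar{c} - \bar{b}}{\bar{c}},\;
  \bar{c} - \bar{b}
\right)
```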
The following output is from the ALLSTATS option, using the two data sets shown
in “Overview: COMPARE Procedure” on page 420:
options nodate pageno=1
linesize=80 pagesize=60;
proc compare base=proclib.one
compare=proclib.two allstats;
title 'Comparing Two Data Sets: Default Report';
run;
Output 13.8 Partial Output Showing Value Comparison Results for Variables
Note: If you use a wide line size with PRINTALL, then PROC COMPARE prints the
value comparison result for character variables next to the result for numeric
variables. In that case, PROC COMPARE calculates only NDIF for the character
variables.
By default, the source of the values for each row of the table is indicated by the
following label:
_OBS_1=number-1 _OBS_2=number-2
where number-1 is the number of the observation in the base data set for which the
value of the variable is shown, and number-2 is the number of the observation in
the comparison data set.
If you use an ID statement, then the identifying label has the following form:
ID-1=ID-value-1 ... ID-n=ID-value-n
where ID is the name of an ID variable and ID-value is the value of the ID variable.
Note: When you use the TRANSPOSE option, PROC COMPARE prints only the
first 12 characters of the value.
You can use the ODS table names to select tables and create output data sets. For
more information, see SAS Output Delivery System: User’s Guide.
CompareDetails (ID variable notes and warnings): a listing of notes and warnings
concerning duplicate ID variable values. PROC COMPARE produces this table if the
ID statement is specified and duplicate ID variable values exist in either data set.
Note: The ODS output tables contain the same information as that written to the
Output window. These tables do not contain any additional information, and they
do not provide a different format for the information. Whether the ODS output
tables are useful depends on the task that you are performing. For an
example of the format produced in the Output window, see the outputs in
“Customizing PROC COMPARE Output” on page 442.
n all matching variables or, if you use the VAR statement, all variables listed in the
VAR statement
In addition, the data set contains two variables created by PROC COMPARE to
identify the source of the values for the matching variables: _TYPE_ and _OBS_.
_TYPE_
is a character variable of length 8. Its value indicates the source of the values for
the matching (or VAR) variables in that observation. (For ID and BY variables,
which are not compared, the values are the values from the original data sets.)
_TYPE_ has the label Type of Observation. The four possible values of this
variable are as follows:
BASE
the values in this observation are from an observation in the base data set.
PROC COMPARE writes this type of observation to the OUT= data set when
you specify the OUTBASE option.
COMPARE
the values in this observation are from an observation in the comparison
data set. PROC COMPARE writes this type of observation to the OUT= data
set when you specify the OUTCOMP option.
DIF
the values in this observation are the differences between the values in the
base and comparison data sets.
For character variables, PROC COMPARE uses a period (.) to represent equal
characters and an X to represent unequal characters.
PROC COMPARE writes this type of observation to the OUT= data set by
default. However, if you request any other type of observation with the
OUTBASE, OUTCOMP, or OUTPERCENT option, then you must specify the
OUTDIF option to generate observations of this type in the OUT= data set.
For an example output that shows the use of the period, X, and E to
represent equal and different values, see “Example 6: Comparing Values of
Observations Using an Output Data Set (OUT=)” on page 482.
PERCENT
the values in this observation are the percent differences between the
values in the base and comparison data sets. For character variables the
values in observations of type PERCENT are the same as the values in
observations of type DIF.
_OBS_
is a numeric variable that contains a number further identifying the source of
the OUT= observations.
For observations with _TYPE_ equal to BASE, _OBS_ is the number of the
observation in the base data set from which the values of the VAR variables
were copied. Similarly, for observations with _TYPE_ equal to COMPARE, _OBS_
is the number of the observation in the comparison data set from which the
values of the VAR variables were copied.
The COMPARE procedure takes variable names and attributes for the OUT= data
set from the base data set, except for the length of each VAR variable: PROC
COMPARE uses the longer of the two lengths, regardless of which data set it
comes from. Taking the variable names from the base data set has two important
repercussions:
n If you use the VAR and WITH statements, then the names of the variables in the
OUT= data set come from the VAR statement. Thus, observations with _TYPE_
equal to BASE contain the values of the VAR variables, whereas observations
with _TYPE_ equal to COMPARE contain the values of the WITH variables.
n If you include a variable more than once in the VAR statement in order to
compare it with more than one variable, then PROC COMPARE can include only
the first comparison in the OUT= data set because each variable must have a
unique name. Other comparisons produce warning messages.
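The following sketch writes observations of each type to one OUT= data set and then uses _TYPE_ to print only the differences. WORK.OLD, WORK.NEW, WORK.RESULT, and the variables KEY and AMOUNT are hypothetical.

```sas
/* Hypothetical data sets with one changed value */
data work.old;
   input key amount;
   datalines;
1 100
2 200
;
run;

data work.new;
   input key amount;
   datalines;
1 100
2 250
;
run;

/* Write base, comparison, and difference observations to the OUT= data set */
proc compare base=work.old compare=work.new
             out=work.result outbase outcomp outdif noprint;
   id key;
run;

/* _TYPE_ and _OBS_ identify the source of each observation */
proc print data=work.result;
   where _type_ = 'DIF';
   var _type_ _obs_ key amount;
run;
```

Without OUTNOEQUAL, PROC COMPARE writes a DIF observation for every pair of matching observations, including KEY=1, whose values are equal.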
_COMP_
is a numeric variable that contains the value of the statistic calculated from the
values of the variable named by the _VAR_ variable (or by the _WITH_ variable if
you use the WITH statement) in the observations in the comparison data set
with matching observations in the base data set.
_DIF_
is a numeric variable that contains the value of the statistic calculated from the
differences of the values of the variable named by the _VAR_ variable in the
base data set and the matching variable (named by the _VAR_ or _WITH_
variable) in the comparison data set.
_PCTDIF_
is a numeric variable that contains the value of the statistic calculated from the
percent differences of the values of the variable named by the _VAR_ variable in
the base data set and the matching variable (named by the _VAR_ or _WITH_
variable) in the comparison data set.
Note: For both types of output data sets, PROC COMPARE assigns one of the
following data set labels:
Comparison of base-SAS-data-set
with comparison-SAS-data-set
Example 1: Producing a Complete Report of the Differences
Details
This example shows the most complete report that PROC COMPARE produces as
procedure output.
Program
libname proclib 'SAS-library';
options nodate pageno=1 linesize=80 pagesize=40;
proc compare base=proclib.one compare=proclib.two printall;
title 'Comparing Two Data Sets: Full Report';
run;
Program Description
Declare the PROCLIB SAS library.
libname proclib 'SAS-library';
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create a complete report of the differences between two data sets. BASE= and
COMPARE= specify the data sets to compare. PRINTALL prints a full report of the
differences.
proc compare base=proclib.one compare=proclib.two printall;
title 'Comparing Two Data Sets: Full Report';
run;
Output: HTML
A > in the output marks information that is in the full report but not in the default
report. The additional information includes a listing of variables found in one data
set but not the other, a listing of observations found in one data set but not the
other, a listing of variables with all equal values, and summary statistics. For an
explanation of the statistics, see “Table of Summary Statistics” on page 456.
Output 13.10 Part One of Comparing Two Data Sets: Full Report
Output 13.11 Part Two of Comparing Two Data Sets: Full Report
Output 13.12 Part Three of Comparing Two Data Sets: Full Report
Output 13.13 Part Four of Comparing Two Data Sets: Full Report
Output 13.14 Part Five of Comparing Two Data Sets: Full Report
Output 13.15 Part Six of Comparing Two Data Sets: Full Report
Example 2: Comparing Variables in Different Data Sets
Details
This example compares a variable from the base data set with a variable in the
comparison data set. All summary reports are suppressed.
Program
libname proclib 'SAS-library';
options nodate pageno=1 linesize=80 pagesize=40;
proc compare base=proclib.one compare=proclib.two nosummary;
var gr1;
with gr2;
title 'Comparison of Variables in Different Data Sets';
run;
Program Description
Declare the PROCLIB SAS library.
libname proclib 'SAS-library';
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Suppress all summary reports of the differences between two data sets. BASE=
specifies the base data set and COMPARE= specifies the comparison data set.
NOSUMMARY suppresses all summary reports.
proc compare base=proclib.one compare=proclib.two nosummary;
Specify one variable from the base data set to compare with one variable from
the comparison data set. The VAR and WITH statements specify the variables to
compare. This example compares GR1 from the base data set with GR2 from the
comparison data set.
var gr1;
with gr2;
title 'Comparison of Variables in Different Data Sets';
run;
Output: HTML
Output 13.16 Comparison of Variables in Different Data Sets
Details
This example compares one variable from the base data set with two variables in
the comparison data set.
Program
libname proclib 'SAS-library';
options nodate pageno=1 linesize=80 pagesize=40;
Example 3: Comparing a Variable Multiple Times 473
Program Description
Declare the PROCLIB SAS library.
libname proclib 'SAS-library';
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Suppress all summary reports of the differences between two data sets. BASE=
specifies the base data set and COMPARE= specifies the comparison data set.
NOSUMMARY suppresses all summary reports.
proc compare base=proclib.one compare=proclib.two nosummary;
Specify one variable from the base data set to compare with two variables from
the comparison data set. The VAR and WITH statements specify the variables to
compare. This example compares GR1 from the base data set with GR1 and GR2
from the comparison data set.
var gr1 gr1;
with gr1 gr2;
title 'Comparison of One Variable with Two Variables';
run;
Output: HTML
The Value Comparison Results section shows the result of the comparison.
Details
This example shows that PROC COMPARE can compare two variables that are in
the same data set.
Program
libname proclib 'SAS-library';
options nodate pageno=1 linesize=80 pagesize=40;
proc compare base=proclib.one allstats briefsummary;
var gr1;
with gr2;
title 'Comparison of Variables in the Same Data Set';
run;
Program Description
Declare the Proclib SAS library.
libname proclib 'SAS-library';
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create a short summary report of the differences within one data set. ALLSTATS
prints summary statistics. BRIEFSUMMARY prints only a short comparison
summary.
proc compare base=proclib.one allstats briefsummary;
Specify two variables from the base data set to compare. The VAR and WITH
statements specify the variables in the base data set to compare. This example
compares GR1 with GR2. Because there is no comparison data set, the variables
GR1 and GR2 must be in the base data set.
var gr1;
with gr2;
title 'Comparison of Variables in the Same Data Set';
run;
Output: HTML
Output 13.18 Comparison of Variables in the Same Data Set
Details
In this example, PROC COMPARE compares only the observations that have
matching values for the ID variable.
Program
libname proclib 'SAS-library';
options nodate pageno=1 linesize=80 pagesize=40;
data proclib.emp95;
input #1 idnum $4. @6 name $15.
#2 address $42.
#3 salary 6.;
datalines;
2388 James Schmidt
100 Apt. C Blount St. SW Raleigh NC 27693
92100
2457 Fred Williams
99 West Lane Garner NC 27509
33190
... more data lines...
3888 Kim Siu
5662 Magnolia Blvd Southeast Cary NC 27513
77558
;
data proclib.emp96;
input #1 idnum $4. @6 name $15.
#2 address $42.
#3 salary 6.;
datalines;
2388 James Schmidt
100 Apt. C Blount St. SW Raleigh NC 27693
92100
2457 Fred Williams
99 West Lane Garner NC 27509
33190
...more data lines...
6544 Roger Monday
3004 Crepe Myrtle Court Raleigh NC 27604
47007
;
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
proc compare base=emp95_byidnum compare=emp96_byidnum;
id idnum;
title 'Comparing Observations that Have Matching IDNUMs';
run;
Program Description
Declare the PROCLIB SAS library.
libname proclib 'SAS-library';
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
data proclib.emp96;
input #1 idnum $4. @6 name $15.
#2 address $42.
#3 salary 6.;
datalines;
2388 James Schmidt
100 Apt. C Blount St. SW Raleigh NC 27693
92100
2457 Fred Williams
99 West Lane Garner NC 27509
33190
...more data lines...
6544 Roger Monday
3004 Crepe Myrtle Court Raleigh NC 27604
47007
;
Example 5: Comparing Observations with an ID Variable 479
Sort the data sets by the ID variable. Both data sets must be sorted by the variable
that will be used as the ID variable in the PROC COMPARE step. OUT= specifies
the location of the sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Create a summary report that compares observations with matching values for
the ID variable. The ID statement specifies IDNUM as the ID variable.
proc compare base=emp95_byidnum compare=emp96_byidnum;
id idnum;
title 'Comparing Observations that Have Matching IDNUMs';
run;
Output: HTML
PROC COMPARE identifies specific observations by the value of IDNUM. In the
Value Comparison Results for Variables section, PROC COMPARE prints the
nonmatching addresses and nonmatching salaries. For salaries, PROC COMPARE
computes the numerical difference and the percent difference. Because ADDRESS
is a character variable, PROC COMPARE displays only the first 20 characters. For
addresses where the observation has an IDNUM of 0987, 2776, or 3888, the
differences occur after the 20th character and the differences do not appear in the
output. The plus sign in the output indicates that the full value is not shown. To see
the entire value, create an output data set. See “Example 6: Comparing Values of
Observations Using an Output Data Set (OUT=)” on page 482.
Output 13.19 Part One of Comparing Observations That Have Matching IDNUMs
Output 13.20 Part Two of Comparing Observations That Have Matching IDNUMs
Output 13.21 Part Three of Comparing Observations That Have Matching IDNUMs
Details
This example creates and prints an output data set that shows the differences
between matching observations.
Program
libname proclib 'SAS-library';
options nodate pageno=1 linesize=120 pagesize=40;
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Program Description
Declare the PROCLIB SAS library.
libname proclib 'SAS-library';
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=120 pagesize=40;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable
that will be used as the ID variable in the PROC COMPARE step. OUT= specifies
the location of the sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Specify the data sets to compare. BASE= and COMPARE= specify the data sets to
compare.
proc compare base=emp95_byidnum compare=emp96_byidnum
Create the Result output data set and include all unequal observations and their
differences. OUT= names and creates the output data set. NOPRINT suppresses
the printing of the procedure output. OUTNOEQUAL includes only observations
that are judged unequal. OUTBASE writes an observation to the output data set for
each observation in the base data set. OUTCOMP writes an observation to the
output data set for each observation in the comparison data set. OUTDIF writes an
observation to the output data set that contains the differences between the two
observations.
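out=result outnoequal outbase outcomp outdif
noprint;
Specify the ID variable. The ID statement specifies IDNUM as the ID variable.
PROC COMPARE uses the values of IDNUM to match observations.
id idnum;
run;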
Print the Result output data set and use the BY and ID statements with the ID
variable. PROC PRINT prints the output data set. Using the BY and ID statements
with the same variable makes the output easy to read. See the PRINT procedure for
more information about this technique.
proc print data=result noobs;
by idnum;
id idnum;
title 'The Output Data Set RESULT';
run;
Output: HTML
The differences for character variables are noted with an X or a period (.). An X
shows that the characters do not match. A period shows that the characters do
match. For numeric variables, an E means that there is no difference. Otherwise, the
numeric difference is shown. By default, the output data set shows that two
observations in the comparison data set have no matching observation in the base
data set. You do not have to use an option to make those observations appear in
the output data set.
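To review only the difference rows in Result, you can subset it on the _TYPE_ variable that PROC COMPARE adds to an OUT= data set. This is a minimal sketch. It assumes the Result data set created above and assumes that difference rows carry the _TYPE_ value DIF. Check the printed Result data set to confirm the values in your release.

```sas
/* Print only the rows that hold differences between matching observations. */
/* Assumes RESULT was created by the PROC COMPARE step above (with OUTDIF). */
proc print data=result noobs;
   where _type_ = 'DIF';   /* assumed _TYPE_ value for difference rows */
   by idnum;
   id idnum;
   title 'Difference Rows in the Output Data Set RESULT';
run;
```

Because PROC COMPARE matched observations by IDNUM, Result is already in IDNUM order, so the BY statement needs no extra sort.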
Example 6: Comparing Values of Observations Using an Output Data Set (OUT=) 485
Details
This example creates an output data set that contains summary statistics for the
numeric variables that are compared.
Program
libname proclib 'SAS-library';
options nodate pageno=1 linesize=80 pagesize=40;
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Program Description
Declare the Proclib SAS library.
libname proclib 'SAS-library';
Example 7: Creating an Output Data Set of Statistics (OUTSTATS=) 487
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable
that will be used as the ID variable in the PROC COMPARE step. OUT= specifies
the location of the sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Create the output data set of statistics and compare observations that have
matching values for the ID variable. BASE= and COMPARE= specify the data sets
to compare. OUTSTATS= creates the output data set Diffstat. NOPRINT
suppresses the procedure output. The ID statement specifies IDNUM as the ID
variable. PROC COMPARE uses the values of IDNUM to match observations.
proc compare base=emp95_byidnum compare=emp96_byidnum
outstats=diffstat noprint;
id idnum;
run;
Print the output data set Diffstat. PROC PRINT prints the output data set Diffstat.
proc print data=diffstat noobs;
title 'The DIFFSTAT Data Set';
run;
Output: HTML
The variables are described in “Output Statistics Data Set (OUTSTATS=)” on page
462.
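A follow-up step like the one below narrows Diffstat to one compared variable. It is a sketch. It assumes that the _VAR_ variable in the OUTSTATS= data set holds the name of the compared base variable (see the variable descriptions cited above). It also assumes that SALARY is the variable of interest.

```sas
/* Keep only the statistics rows for the SALARY comparison.   */
/* UPCASE guards against differences in how names are stored. */
proc print data=diffstat noobs;
   where upcase(_var_) = 'SALARY';
   title 'DIFFSTAT Rows for SALARY';
run;
```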
14
CONTENTS Procedure
• PROC CONTENTS can read sequential files. The CONTENTS statement cannot.
PROC CONTENTS reports metadata about the table and the metadata about the
variables. The CAS engine is the only engine supporting VARCHAR. If there is a
VARCHAR data type in the table, PROC CONTENTS shows the Length in bytes and
characters as well as maximum bytes used.
As with Base engine data sets, the top portion of the PROC CONTENTS output reports
information about the table. The Encoding field shows the encoding of the CAS
table, and the Data Representation field shows its data representation.
The following PROC CONTENTS shows the output from the Hadoop engine.
492 Chapter 14 / CONTENTS Procedure
CAUTION
Do not confuse the GENNUM variable value in CONTENTS’ OUT= data set with
the GEN variable value from DICTIONARY tables. GENNUM from a CONTENTS
procedure or statement refers to a specific generation of a data set. GEN from
DICTIONARY tables refers to the total number of generations for a data set. Each SAS
procedure is designed to deliver specific information. The output from
DICTIONARY.TABLES and the output from PROC CONTENTS are not interchangeable.
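The sketch below shows where each value comes from. It creates a scratch data set named Work.HtWt (a hypothetical name) and uses GENMAX=3 to start a generation group. It then reads GENNUM from the CONTENTS OUT= data set and GEN from DICTIONARY.TABLES, so the two values can be compared side by side.

```sas
/* Start a generation group: GENMAX=3 keeps up to three generations. */
data work.htwt(genmax=3);
   x = 1;
run;

/* Replacing the data set creates a new generation. */
data work.htwt;
   x = 2;
run;

/* GENNUM in the OUT= data set refers to a specific generation. */
proc contents data=work.htwt out=work.cont noprint;
run;

proc print data=work.cont(obs=1) noobs;
   var memname gennum;
run;

/* GEN in DICTIONARY.TABLES refers to the generation group as a whole. */
proc sql;
   select memname, gen
      from dictionary.tables
      where libname = 'WORK' and memname = 'HTWT';
quit;
```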
PROC CONTENTS Statement 493
CONTENTS: List the contents of one or more SAS data sets and print the directory of
the SAS library (Ex. 1, Ex. 2, Ex. 3, Ex. 4)
Restriction: You cannot use the WHERE option to affect the output because PROC CONTENTS
does not process any observations.
Notes: The ATTRIB statement does not affect the CONTENTS statement output.
CONTENTS reports the labels, informats, and formats on the actual member.
Syntax
PROC CONTENTS <options>;
Optional Arguments
CENTILES
prints centiles information for indexed variables.
The following additional fields are printed in the default report of PROC
CONTENTS when the CENTILES option is selected and an index exists on the
data set. Note that the additional fields depend on whether the index is simple
or complex.
DATA=SAS-file-specification
specifies an entire library or a specific SAS data set within a library. SAS-file-
specification can take one of the following forms:
<libref.>SAS-data-set
names one SAS data set to process. The default for libref is the libref of the
procedure input library. For example, to obtain the contents of the SAS data
set HtWt from the procedure input library, use the following CONTENTS
statement:
contents data=HtWt;
To obtain the contents of a specific version from a generation group, use the
GENNUM= data set option as shown in the following CONTENTS statement:
contents data=HtWt(gennum=3);
<libref.>_ALL_
gives you information about all SAS data sets that have the type or types
specified by the MEMTYPE= option. libref refers to the SAS library. The
default for libref is the libref of the procedure input library.
• If you are using the _ALL_ keyword, you need Read access to all read-protected
SAS data sets in the SAS library.
• DATA=_ALL_ automatically prints a listing of the SAS files that are
contained in the SAS library. Note that for SAS views, all librefs that are
associated with the views must be assigned in the current session in
order for them to be processed for the listing.
Default most recently created data set in your job or session, from any SAS
library.
Tip If you specify a read-protected data set in the DATA= option but do
not give the Read password, by default the procedure looks in the
PROC DATASETS statement for the Read password. However, if you
do not specify the DATA= option and the default data set (last one
created in the session) is Read protected, the procedure does not look
in the PROC DATASETS statement for the Read password.
DETAILS | NODETAILS
includes information in the output about the number of observations, number of
variables, number of indexes, and data set labels. DETAILS includes these
additional columns of information in the output, but only if DIRECTORY is also
specified.
DIRECTORY
prints a list of all SAS files in the specified SAS library. If DETAILS is also
specified, using DIRECTORY causes the additional columns described in
DETAILS | NODETAILS on page 495 to be printed.
ENCRYPTKEY=key-value
specifies the key value for AES encryption.
FMTLEN
prints the length of the informat or format. If you do not specify a length for the
informat or format when you associate it with a variable, the length does not
appear in the output of the CONTENTS statement unless you use the FMTLEN
option. The length also appears in the FORMATL or INFORML variable in the
output data set.
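As a small illustration, the step below assigns the DATE. format without a width and then requests FMTLEN, so the format length appears in the report. The data set name Work.Dates is hypothetical.

```sas
/* Assign a format without specifying a width. */
data work.dates;
   d = today();
   format d date.;
run;

/* FMTLEN adds the format length to the variable listing. */
proc contents data=work.dates fmtlen;
run;
```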
MEMTYPE=(member-type(s))
restricts processing to one or more member types. The CONTENTS statement
produces output only for member types DATA, VIEW, and ALL, which includes
DATA and VIEW.
• You cannot enclose the MEMTYPE= option in parentheses to limit its effect
to only the SAS file immediately preceding it.
Aliases MTYPE=
MT=
Default DATA
NODS
suppresses printing the contents of individual files when you specify _ALL_ in
the DATA= option. The CONTENTS statement prints only the SAS library
directory. You cannot use the NODS option when you specify only one SAS data
set in the DATA= option.
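For example, the statement below prints only the directory of the SASHELP library, restricted to data sets, and skips the contents of each member. SASHELP is used here only because it is available in most SAS sessions.

```sas
/* Directory listing only: _ALL_ with NODS suppresses per-member contents. */
proc contents data=sashelp._all_ memtype=data nods;
run;
```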
NODETAILS
See “DETAILS|NODETAILS” on page 495.
NOPRINT
suppresses printing the output of the CONTENTS statement.
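ORDER=COLLATE | CASECOLLATE | IGNORECASE | VARNUM
prints a list of variables in the specified order.
Note The ORDER= option does not affect the order of the OUT= and OUT2=
data sets.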
Example See “Example 4: Using the ORDER= Option” on page 515 to compare
the default and the four options for ORDER=.
OUT=SAS-data-set
names an output SAS data set.
Tip OUT= does not suppress the printed output from the statement. If you
want to suppress the printed output, you must use the NOPRINT option.
See “The OUT= Data Set” on page 683 for a description of the variables in the
OUT= data set.
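A common pattern is to combine OUT= with NOPRINT and then work with the metadata as data. This sketch uses SASHELP.CLASS as a convenient input. The output name Work.Class_vars is hypothetical, and the variables printed are a subset of those described in “The OUT= Data Set.”

```sas
/* Write variable metadata to a data set without printing the report. */
proc contents data=sashelp.class out=work.class_vars noprint;
run;

/* TYPE is 1 for numeric and 2 for character variables. */
proc print data=work.class_vars noobs;
   var name type length varnum;
run;
```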
OUT2=SAS-data-set
names the output data set to contain information about indexes and integrity
constraints.
Tips If UPDATECENTILES was not specified in the index definition, then the
default value of 5 is used in the re-create variable of the OUT2 data set.
OUT2= does not suppress the printed output from the statement. To
suppress the printed output, use the NOPRINT option.
See “The OUT2= Data Set” on page 689 for a description of the variables in
the OUT2= data set.
SHORT
prints only the list of variable names, the index information, and the sort
information for the SAS data set.
Restriction If the list of variables is more than 32,767 characters, the list is
truncated and a WARNING is written to the SAS log. To get a
complete list of the variables, request an alphabetical listing of the
variables.
VARNUM
prints a list of the variable names in the order of their logical position in the data
set. By default, the CONTENTS statement lists the variables alphabetically. The
physical position of the variable in the data set is engine-dependent.
Details
Printing Variables
The CONTENTS statement prints an alphabetical listing of the variables by default,
except for variables in the form of a numbered range list. Numbered range lists,
such as x1–x100, are printed in incrementing order, that is, x1–x100. For more
information, see “Alphabetic List of Variables and Attributes” on page 678.
Note: If a label is changed after a view is created from a data set with variable
labels, the CONTENTS or DATASETS procedure output shows the original labels.
The view must be recompiled in order for the CONTENTS or DATASETS procedure
output to reflect the new variable labels.
If the key value does not match the key value for a particular data file in the library,
then you will be prompted to enter the correct key value.
For more information about AES encryption, see “AES Encryption” in SAS
Programmer’s Guide: Essentials. For more information about the ENCRYPTKEY=
data set option, see “ENCRYPTKEY= Data Set Option” in SAS Data Set Options:
Reference.
Variables DD and FF, the only true numeric doubles, are at offsets 0 and 8,
respectively, so they are automatically aligned. The rest of the observation
contains the remaining numeric variables and then character variables.
The last physical variable in this layout is CC with an offset of 32 and a length of
10. This gives you an internal length of 42, even though PROC CONTENTS
reports the observation length as 48. The difference is the 6 bytes of padding that
align the next observation on an 8-byte (doubleword) boundary within the disk
page buffer.
• No alignment is done when the observation does not contain 8-byte numeric
variables as demonstrated in the next example, which gives you an observation
length of 7 and no padding between observations within disk page buffers:
data b;
length aa 6 cc $1;
aa = 1;
cc = 'x';
output;
run;
• Observations for compressed data sets are not aligned within the disk page
buffer, but the same algorithm is used for positioning the variables within the
observations. Compressed observations must be uncompressed and moved into
a work buffer. The 8-byte numeric values will be aligned and ready for use
immediately after uncompressing. The observation length in the PROC
CONTENTS output might be larger due to operating system-specific overhead.
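You can check the padding arithmetic above with a DATA _NULL_ step, and you can confirm the observation length of data set B with PROC CONTENTS. This is a sketch. The rounding to a multiple of 8 mirrors the description above for observations that contain 8-byte numeric variables. It is not a general model of how every engine lays out observations. The ODS table name Attributes is assumed for the data set attributes section of the CONTENTS output.

```sas
/* Round the internal length up to the next 8-byte boundary. */
data _null_;
   internal_len = 42;                         /* CC offset 32 + length 10 */
   obs_len      = ceil(internal_len / 8) * 8; /* next multiple of 8       */
   padding      = obs_len - internal_len;
   put internal_len= obs_len= padding=;       /* written to the SAS log   */
run;

/* Data set B has no 8-byte numeric variables, so no padding is expected. */
data b;
   length aa 6 cc $1;
   aa = 1;
   cc = 'x';
   output;
run;

/* Show only the attributes table, which includes the observation length. */
ods select attributes;   /* ODS table name assumed to be Attributes */
proc contents data=b;
run;
```

The DATA _NULL_ step writes internal_len=42 obs_len=48 padding=6 to the SAS log.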
For more information, see Chapter 5, “CAS Processing of Base Procedures,” on page
93.
Details
This example shows the output from the CONTENTS procedure for the Group data
set. The output shows the modifications made to the Group data set in “Example 4:
Modifying SAS Data Sets” on page 704 and the contents of the Grpout data set.
Program
options pagesize=40 linesize=80 nodate pageno=1;
LIBNAME health 'SAS-library';
proc datasets library=health nolist;
run;
proc contents data=health.group (read=green) out=health.grpout;
title 'The Contents of the GROUP Data Set';
run;
proc contents data=health.grpout;
title 'The Contents of the GRPOUT Data Set';
run;
Program Description
Set the system options. The PAGESIZE= option specifies the number of lines that
compose a page of the SAS log and SAS output. The LINESIZE= option specifies
the line size for the SAS log and for the SAS procedure output. The NODATE option
specifies that the date and the time are not printed. The PAGENO= option specifies
a beginning page number for the next page of output.
options pagesize=40 linesize=80 nodate pageno=1;
Specify Health as the procedure input library, and suppress the directory listing.
proc datasets library=health nolist;
run;
Create the output data set Grpout from the data set Group. Specify Group as the
data set to describe, give Read access to the Group data set, and create the output
data set Grpout.
proc contents data=health.group (read=green) out=health.grpout;
title 'The Contents of the GROUP Data Set';
run;
Example 1: Describing a SAS Data Set 503
Output Examples
Output 14.3 Contents of the Group Data Set
Details
This example shows the output from the CONTENTS procedure for the Group data
set using the DIRECTORY option. This option prints a list of all SAS files that are in
the specified SAS library.
Program
options pagesize=40 linesize=80 nodate pageno=1;
LIBNAME health 'SAS-library';
proc datasets library=health nolist;
run;
proc contents data=health.group (read=green) directory;
title 'Contents Using the DIRECTORY Option';
run;
Program Description
Set the system options. The PAGESIZE= option specifies the number of lines that
compose a page of the SAS log and the SAS output. The LINESIZE= option
specifies the line size for the SAS log and for the SAS procedure output. The
NODATE option specifies that the date and the time are not printed. The PAGENO=
option specifies a beginning page number for the next page of output.
options pagesize=40 linesize=80 nodate pageno=1;
Specify Health as the procedure input library, and suppress the directory listing.
proc datasets library=health nolist;
run;
Specify Group as the data set to describe, and give Read access to the Group data
set. Use the DIRECTORY option to print a listing of all the data sets that are in the
HEALTH library.
proc contents data=health.group (read=green) directory;
title 'Contents Using the DIRECTORY Option';
run;
Example 2: Using the DIRECTORY Option 509
Output Examples
Output 14.5 Using the DIRECTORY Option - Section 1
Example 3: Using the DIRECTORY and DETAILS Options 511
Details
This example shows the output from the CONTENTS procedure for the Group data
set using the DIRECTORY option. This option prints a list of all SAS files that are in
the specified SAS library. The DETAILS option includes information in the output
about the number of observations, number of variables, number of indexes, and
data set labels.
Program
options pagesize=40 linesize=80 nodate pageno=1;
LIBNAME health 'SAS-library';
proc datasets library=health nolist;
run;
proc contents data=health.group directory details;
title 'Contents Using the DIRECTORY and DETAILS Options';
run;
Program Description
Set the system options. The PAGESIZE= option specifies the number of lines that
compose a page of the SAS log and the SAS output. The LINESIZE= option
specifies the line size for the SAS log and for the SAS procedure output. The
NODATE option specifies that the date and the time are not printed. The PAGENO=
option specifies a beginning page number for the next page of output.
options pagesize=40 linesize=80 nodate pageno=1;
Specify Health as the procedure input library, and suppress the directory listing.
proc datasets library=health nolist;
run;
Specify Group as the data set. Use the DIRECTORY option to print a listing of all
the data sets that are in the HEALTH library. Use the DETAILS option for
additional columns of information in the Group output.
proc contents data=health.group directory details;
title 'Contents Using the DIRECTORY and DETAILS Options';
run;
Output Examples
Output 14.6 Using the DIRECTORY and DETAILS Options
Example 4: Using the ORDER= Option 515
Details
This example shows the output from the CONTENTS procedure for the Grpout data
set using the ORDER= option, which prints a list of variables in different orders.
Program
options pagesize=40 linesize=80 nodate pageno=1;
LIBNAME health 'SAS-library';
proc contents data=health.grpout order=collate;
title 'Contents Using the ORDER= Option';
run;
proc contents data=health.grpout order=varnum;
title 'Contents Using the ORDER= Option';
run;
Program Description
Set the system options. The PAGESIZE= option specifies the number of lines that
compose a page of the SAS log and the SAS output. The LINESIZE= option
specifies the line size for the SAS log and for the SAS procedure output. The
NODATE option specifies that the date and the time are not printed. The PAGENO=
option specifies a beginning page number for the next page of output.
options pagesize=40 linesize=80 nodate pageno=1;
Specify the Grpout data set. Use the ORDER=COLLATE option to print a listing of
all variables in alphabetical order.
proc contents data=health.grpout order=collate;
title 'Contents Using the ORDER= Option';
run;
Specify the Grpout data set. Use the ORDER=VARNUM option to print a listing of
all variables in the order of their logical position in the data set.
proc contents data=health.grpout order=varnum;
title 'Contents Using the ORDER= Option';
run;
Output Examples
Output 14.7 Using the ORDER=COLLATE Option
15
COPY Procedure
Generally, the COPY procedure functions the same as the COPY statement in the
DATASETS procedure. The differences are as follows:
• The IN= argument is required with PROC COPY. In the COPY statement, IN= is
optional. If IN= is omitted, the default value is the libref of the procedure input
library.
522 Chapter 15 / COPY Procedure
• PROC DATASETS cannot work with libraries that allow only sequential data
access.
• The COPY statement honors the NOWARN option but PROC COPY does not.
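The sketch below shows the two forms side by side, so you can see how each one handles IN=. The librefs Source and Target, their paths, and the member names A and B are hypothetical.

```sas
libname source 'SAS-library-1';   /* hypothetical input library  */
libname target 'SAS-library-2';   /* hypothetical output library */

/* PROC COPY: IN= is required. */
proc copy in=source out=target;
   select a b;
run;

/* COPY statement: IN= is omitted, so it defaults to the */
/* procedure input library named in LIBRARY=.            */
proc datasets library=source nolist;
   copy out=target;
      select a b;
   run;
quit;
```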
Note: The MIGRATE procedure is available specifically for migrating a SAS library
from a previous release to the most recent release. For migration, PROC MIGRATE
offers benefits that PROC COPY does not. For more information, see MIGRATE
Procedure on page 1559.
With the ACCEL option, PROC COPY executes a CAS action to copy a CAS table
from one caslib to another caslib in the same CAS session.
<FORCE>
IN=libref-2
<INDEX=YES | NO>
<MEMTYPE=(member-type(s))>
<MOVE <ALTER=alter-password>>
<OVERRIDE=(ds-option-1=value-1 <ds-option-2=value-2 …> ) >;
<SELECT SAS-file(s)>
</ <ENCRYPTKEY=key-value> <ALTER=alter-password>
<MEMTYPE=member-type>>;
PROC COPY options are valid when using a CAS LIBNAME engine except for the
following:
• ENCRYPTKEY=
• OVERRIDE=
• PW=
When a copy occurs on the CAS server, the MVA session system options (like
VALIDMEMNAME and VALIDVARNAME) will not be used.
SAS Cloud Analytic Services (CAS) is the analytic server and associated cloud
services in SAS Viya. The CAS LIBNAME engine can connect a SAS 9.4 session to
an existing SAS Cloud Analytic Services session through the CAS session name or
the CAS session UUID. The libref then becomes your handle to communicate from
SAS with the specific session. The following example shows how to use PROC
COPY with CAS processing.
For information about how to use the CAS LIBNAME statement, see “Getting
Started” in SAS Cloud Analytic Services: User’s Guide.
Attributes are specified with data set options, system options, or LIBNAME
statement options. The CAS engine supports only the COMPRESS=YES | NO
option. No other attributes are supported by the CAS engine.
Table (columns: Attribute, To, CLONE or NOCLONE, Description; copy directions covered: CAS table to CAS table, CAS table to SAS data set)
TIP If the MSGLEVEL=I option is set, and the SELECT performance code
can be used, the following message is sent to the SAS log:
INFO: COPY with SELECT performance is in use.
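To check whether the SELECT performance code is used, set MSGLEVEL=I before a PROC COPY step that has a SELECT statement, and then read the log. The libref Target and the data set name X are hypothetical. Whether the INFO message appears depends on the conditions described above.

```sas
options msglevel=i;   /* request INFO messages in the SAS log */

proc copy in=work out=target;   /* TARGET: hypothetical libref */
   select x;                    /* X: hypothetical data set    */
run;
```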
1 Use PROC COPY to copy one or more SAS data sets to a file that is created with
either the transport (XPORT) engine or the XML engine. This file is referred to
as a transport file and is always a sequential file.
2 After the file is created, you can move it to another operating environment via
communications software, such as FTP, or tape. If you use communications
software, be sure to move the file in binary format to avoid any type of
conversion. If you are moving the file to a mainframe, the file must have certain
attributes. Consult the SAS documentation for your operating environment and
the SAS Technical Support web page for more information.
3 After you have successfully moved the file to the receiving host, use PROC
COPY to copy the data sets from the transport file to a SAS library.
For an example, see “Example 1: Copying SAS Data Sets between Hosts” on page
530.
For details about transporting files, see Moving and Accessing SAS Files.
The CPORT and CIMPORT procedures also provide a way to transport SAS files.
For more information, see Chapter 16, “CPORT Procedure,” on page 537 and
Chapter 12, “CIMPORT Procedure,” on page 389.
If you need to migrate a SAS library from a previous release of SAS, see the
Migration focus area at http://support.sas.com/migration.
For more information, see the Details section of the CONTENTS statement in PROC
DATASETS on page 618.
options compress=yes;
proc copy in=work out=new noclone;
select x;
run;
Details
This example demonstrates how to create a transport file on a host and read it on
another host.
In order for this example to work correctly, the transport file must have certain
characteristics, as described in the SAS documentation for your operating
environment. In addition, the transport file must be moved to the receiving
operating system in binary format.
Program
Program Description
Assign library references. Assign a libref, such as Source, to the SAS library that
contains the SAS data set that you want to transport. Also, assign a libref to the
transport file and use the XPORT keyword to specify the XPORT engine.
Enable the procedure to read data from the transport file. The XPORT engine in
the LIBNAME statement enables the procedure to read the data from the transport
file.
Copy the SAS data sets on the receiving host. After you move the transport file to
the receiving host (for example, by using FTP in binary mode to transfer it to a
Windows host), use PROC COPY to copy the SAS data sets from the transport file to
the Work library.
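The program for this example is not reproduced above. Here is a minimal sketch of its two halves. It assumes the librefs Source, Xportout, and Xportin and uses placeholder paths. The transport file must still be moved between the two steps in binary mode, and host-specific file attributes are omitted. MEMTYPE=DATA appears on the export side because the XPORT engine handles SAS data sets, not catalogs.

```sas
/* On the source host: write the data sets in library SOURCE  */
/* to a transport file through the XPORT engine.              */
libname source 'sas-library';
libname xportout xport 'transport-file';

proc copy in=source out=xportout memtype=data;
run;

/* Move 'transport-file' to the receiving host in binary mode. */

/* On the receiving host: read the transport file through the  */
/* XPORT engine and copy its data sets to the Work library.    */
libname xportin xport 'transport-file';

proc copy in=xportin out=work;
run;
```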
Log Examples
Example Code 15.1 Source Library Log
CVP engine
Details
This example demonstrates how to convert data from one encoding to another. In
order for this example to work correctly, the two encodings must be compatible.
For documentation, see “Compatible and Incompatible Encodings” in SAS National
Language Support (NLS): Reference Guide.
Program
Assign library references. The two encodings must be compatible.
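The program itself is not shown here, but the log below refers to librefs named Inlib and Outlib and a data set named CAR. The following is a hedged sketch consistent with that log. It assumes that the input library is read through the CVP engine and that the output library requests UTF-8 through the OUTENCODING= LIBNAME option. The paths and the UTF-8 target are assumptions, and the target encoding must be compatible with the source encoding.

```sas
/* Read the source library through the CVP engine so that       */
/* character data can be transcoded as it is read.              */
libname inlib cvp 'sas-library-1';

/* Ask the Base engine to write output data sets in UTF-8       */
/* (assumed target encoding).                                   */
libname outlib 'sas-library-2' outencoding='UTF-8';

/* NOCLONE: do not copy attributes such as the encoding from    */
/* the input data set.                                          */
proc copy in=inlib out=outlib noclone;
   select car;
run;
```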
Log Examples
Example Code 15.3 InLib Library Log
NOTE: Libname and/or system options for compress, pointobs, data representation and
encoding attributes were used at user's request.
NOTE: Data file OUTLIB.CAR.DATA is in a format that is native to another host, or
the file encoding does not match the session encoding. Cross Environment Data Access
will be used, which might require additional CPU resources and might reduce
performance.
NOTE: There were 25 observations read from the data set INLIB.CAR.
Details
This example demonstrates how to use PROC COPY to migrate from a 32-bit to a
64-bit environment. PROC MIGRATE does not support item stores when you
migrate from a 32-bit to a 64-bit environment.
Program
Program Description
Assign library references. Use the OUTREP= option in the LIBNAME statement when
you change from a 32-bit to a 64-bit machine.
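The program for this example is not reproduced above. Here is a sketch of the idea. It assumes a Windows 64-bit target, so the OUTREP=WINDOWS_64 value is an assumption; choose the data representation value for your own target platform. The paths are placeholders. MEMTYPE=ALL, the PROC COPY default, is written out so that item stores are included along with the other member types.

```sas
/* Library written by the 32-bit SAS session. */
libname source 'sas-library-32';

/* OUTREP= requests the 64-bit data representation for new     */
/* members; WINDOWS_64 is an assumed target platform.          */
libname target 'sas-library-64' outrep=windows_64;

/* Copy all member types, including item stores. */
proc copy in=source out=target memtype=all;
run;
```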
Details
This example demonstrates how to copy a SAS data set into a CAS table.
Program
Program Description
Assign library references. Select the data set that you want to copy into a CAS
table.
Use PROC COPY and the SELECT statement. Copy a SAS data set into a CAS
table.
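A minimal sketch of such a step follows. It assumes a SAS session that is already configured to connect to a CAS server. The session name Mysess, the Casuser caslib, the path, and the data set name Cars are illustrative assumptions.

```sas
/* Start a CAS session (hypothetical session name). */
cas mysess;

/* Libref for the CAS library (here, the Casuser caslib). */
libname mycas cas caslib=casuser;

/* Libref for the Base SAS library that holds the data set. */
libname mylib 'sas-library';

/* Copy only the Cars data set into a CAS table. */
proc copy in=mylib out=mycas;
   select cars;
run;
```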
Log Examples
Example Code 15.4 MyLib Library Log
16
CPORT Procedure
environments and for many releases of SAS. In PROC CPORT, export means to put
a SAS library, a SAS catalog, or a SAS data set into transport format. PROC CPORT
exports catalogs and data sets, either singly or as a SAS library. PROC CIMPORT
restores (imports) the transport file to its original form as a SAS catalog, SAS data
set, or SAS library.
PROC CPORT also converts SAS files, which means that it changes the format of a
SAS file from the format appropriate for one version of SAS to the format
appropriate for another version. For example, you can use PROC CPORT and PROC
CIMPORT to move files from earlier releases of SAS to more recent releases. PROC
CIMPORT automatically converts the transport file as it imports it.
Note: PROC CPORT and PROC CIMPORT can be used to back up graphic catalogs.
PROC COPY cannot be used to back up graphic catalogs.
PROC CPORT produces no output (other than the transport files), but it does write
notes to the SAS log.
2 The transport file is transferred from the source computer to the target
computer via communications software or a magnetic medium.
3 The transport file is read at the target computer using PROC CIMPORT.
Note: Transport files that are created using PROC CPORT are not
interchangeable with transport files that are created using the XPORT engine.
For complete details about the steps to create a transport file (PROC CPORT), to
transfer the transport file, and to restore the transport file (PROC CIMPORT), see
Moving and Accessing SAS Files.
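Here is a sketch of the export and import steps described above. It uses placeholder paths and assumes the librefs Source and Target and the fileref Tranfile. The transfer in the middle, which must be done in binary mode, happens outside SAS and is not shown.

```sas
/* Step 1, on the source computer: export library SOURCE. */
libname source 'sas-library';
filename tranfile 'transport-file';

proc cport library=source file=tranfile;
run;

/* Step 2: move 'transport-file' to the target computer   */
/* in binary mode (not shown).                             */

/* Step 3, on the target computer: restore the library.   */
libname target 'sas-library-on-target';
filename tranfile 'transport-file';

proc cimport library=target infile=tranfile;
run;
```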
Syntax
PROC CPORT source-type=libref | <libref.>member-name <options>;
NOEDIT
exports SAS/AF PROGRAM and SCL entries without Edit capability when
you import them.
NOSRC
specifies that exported catalog entries contain compiled SCL code, but
not the source code.
OUTLIB=libref
specifies a libref associated with a SAS library.
Required Argument
source-type=libref | <libref.>member-name
identifies the type of file to export and specifies the catalog, SAS data set, or
SAS library to export.
source-type
identifies one or more files to export as a single catalog, as a single SAS data
set, or as the members of a SAS library. The source-type argument can be
one of the following:
CATALOG | CAT | C
DATA | DS | D
LIBRARY | LIB | L
Note If you specify a password-protected data set as the source type, you
must also include the password when creating its transport file. For
details, see “READ= Data Set Option in the PROC CPORT Statement”
on page 549.
libref
<libref.>member-name
specifies the catalog, SAS data set, or SAS library to export. If
source-type is CATALOG or DATA, you can specify both a libref and a
member name. If the libref is omitted, PROC CPORT uses the default library
as the libref, which is usually the WORK library. If the source-type argument
is LIBRARY, specify only a libref. If you specify a library, PROC CPORT
exports only data sets and catalogs from that library. You cannot export
other types of files.
Optional Arguments
AFTER=date
exports copies of all data sets or catalog entries that have a modification date
later than or equal to the date that you specify. The modification date is the
most recent date when the contents of the data set or catalog entry changed.
Specify date as a SAS date literal or as a numeric SAS date value.
Tip You can determine the modification date of a catalog entry by using
the CATALOG procedure.
ASIS
suppresses the conversion of displayed character data to transport format. Use
this option when you move files that contain DBCS (double-byte character set)
data from one operating environment to another if both operating environments
use the same type of DBCS data.
Interaction You cannot use both the ASIS option and the OUTTYPE= option in
the same PROC CPORT step.
CONSTRAINT=YES | NO
controls the exportation of integrity constraints that have been defined on a
data set. When you specify CONSTRAINT=YES, all types of integrity
constraints are exported for a library; only general integrity constraints are
exported for a single data set. When you specify CONSTRAINT=NO, indexes
created without integrity constraints are ported, but neither integrity
constraints nor any indexes created with integrity constraints are ported. For
more information about integrity constraints, see the section on SAS files in
SAS Language Reference: Concepts.
Alias CON=
Default YES
Interactions You cannot specify both CONSTRAINT= and INDEX= in the same
PROC CPORT step.
DATECOPY
copies the SAS internal date and time at which the SAS file was created and the
date and time at which it was last modified to the resulting transport file. Note
that the operating environment date and time are not preserved.
Restriction DATECOPY can be used only when the destination file uses the V8
or V9 engine.
Note If the file that you are transporting has attributes that require
additional processing, then the last modified date might be changed
to the current date and time.
You can alter the file creation date and time with the DTC= option in
the MODIFY statement in a PROC DATASETS step. For details, see
“MODIFY Statement” on page 644.
EET=(etype(s))
excludes specified entry types from the transport file. If etype is a single entry
type, then you can omit the parentheses. Separate multiple values with a space.
Interaction You cannot use both the EET= option and the ET= option in the
same PROC CPORT step.
ET=(etype(s))
includes specified entry types in the transport file. If etype is a single entry type,
then you can omit the parentheses. Separate multiple values with a space.
Interaction You cannot use both the EET= option and the ET= option in the
same PROC CPORT step.
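Here is a sketch of the two options, each in its own step because they cannot be combined. It assumes a catalog named FINANCE in library Source that contains FRAME, HELP, and SCL entries. The librefs, filerefs, and paths are placeholders.

```sas
libname source 'sas-library';
filename tran1 'transport-file-1';
filename tran2 'transport-file-2';

/* ET=: include only FRAME and HELP entries. */
proc cport catalog=source.finance file=tran1 et=(frame help);
run;

/* EET=: include every entry except SCL entries            */
/* (parentheses are optional for a single entry type).     */
proc cport catalog=source.finance file=tran2 eet=scl;
run;
```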
FILE=fileref | 'filename'
specifies a previously defined fileref or the filename of the transport file to
write to. If you omit the FILE= option, then PROC CPORT writes to the fileref
SASCAT, if defined. If the fileref SASCAT is not defined, PROC CPORT writes to
SASCAT.DAT in the current directory.
Note The behavior of PROC CPORT when SASCAT is undefined varies from
one operating environment to another. For details, see the SAS
documentation for your operating environment.
GENERATION=YES | NO
specifies whether to export all generations of a SAS data set. To export only the
base generation of a data set, specify GENERATION=NO in the PROC CPORT
statement. To export a specific generation number, use the GENNUM= data set
option when you specify a data set in the PROC CPORT statement. For more
information about generation data sets, see SAS Language Reference: Concepts.
Alias GEN=
Note PROC CIMPORT imports all generations of a data set that are present
in the transport file. It deletes any previous generation set with the
same name and replaces it with the imported generation set, even if the
number of generations does not match.
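The two ways of limiting which generations are exported might look like this. The sketch assumes a data set SOURCE.GRADES that already has a generation group, and the generation number 2 is illustrative. Names and paths are placeholders.

```sas
libname source 'sas-library';
filename tran1 'transport-file-1';
filename tran2 'transport-file-2';

/* Export only the base generation of GRADES. */
proc cport data=source.grades file=tran1 generation=no;
run;

/* Export one specific generation by using the GENNUM=     */
/* data set option (generation 2 is illustrative).         */
proc cport data=source.grades(gennum=2) file=tran2;
run;
```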
INDEX=YES | NO
specifies whether to export indexes with indexed SAS data sets.
Default YES
Interactions You cannot specify both INDEX= and CONSTRAINT= in the same
PROC CPORT step.
INTYPE=DBCS-type
specifies the type of DBCS data stored in the SAS files to be exported. Double-
byte character set (DBCS) data uses up to two bytes for each character in the
set. DBCS-type must be one of the following values:
Default If the INTYPE= option is not used, the DBCS type defaults to the
value of the SAS system option DBCSTYPE=.
Restriction The INTYPE= option is allowed only if SAS is built with Double-
Byte Character Set (DBCS) extensions. Because these extensions
require significant computing resources, there is a special
distribution for those sites that require it. An error is reported if this
option is used at a site for which DBCS extensions are not enabled.
Interactions Use the INTYPE= option in conjunction with the OUTTYPE= option
to change from one type of DBCS data to another.
You cannot use the INTYPE= option and the ASIS option in the
same PROC CPORT step.
Tip You can set the value of the SAS system option DBCSTYPE= in
your configuration file.
MEMTYPE=mtype
restricts the type of SAS file that PROC CPORT writes to the transport file.
MEMTYPE= restricts processing to one member type. Values for mtype can be
one of the following:
ALL
both catalogs and data sets
CATALOG | CAT
catalogs
DATA | DS
SAS data sets
Alias MT=
Default ALL
NOCOMPRESS
suppresses the compression of binary zeros and blanks in the transport file.
Alias NOCOMP
Note Compression of the transport file does not alter the flag in each
catalog and data set that indicates whether the original file was
compressed.
NOEDIT
exports SAS/AF PROGRAM and SCL entries without Edit capability when you
import them.
The NOEDIT option produces the same results as when you create a new
catalog to contain SCL code by using the MERGE statement with the NOEDIT
option in the BUILD procedure of SAS/AF software.
Alias NEDIT
Note The NOEDIT option affects only SAS/AF PROGRAM and SCL entries. It
does not affect FSEDIT SCREEN or FSVIEW FORMULA entries.
NOSRC
specifies that exported catalog entries contain compiled SCL code but not the
source code.
The NOSRC option produces the same results as when you create a new catalog
to contain SCL code by using the MERGE statement with the NOSOURCE
option in the BUILD procedure of SAS/AF software.
Alias NSRC
OUTLIB=libref
specifies a libref associated with a SAS library. If you specify the OUTLIB=
option, PROC CIMPORT is invoked automatically to re-create the input library,
data set, or catalog in the specified library.
Alias OUT=
Tip Use the OUTLIB= option when you change SAS files from one DBCS type
to another within the same operating environment if you want to keep the
original data intact.
OUTTYPE=UPCASE
writes all displayed characters to the transport file and to the OUTLIB= file in
uppercase.
TAPE
directs the output from PROC CPORT to a tape.
TRANSLATE=(translation-list)
translates specified characters from one ASCII or EBCDIC value to another.
Each element of translation-list has the following form:
n ASCII-value-1 TO ASCII-value-2
n EBCDIC-value-1 TO EBCDIC-value-2
You can use hexadecimal or decimal representation for ASCII values. If you use
the hexadecimal representation, values must begin with a digit and end with an
x. Use a leading zero if the hexadecimal value begins with an alphabetic
character.
For example, to translate all left brackets to left braces, specify the
TRANSLATE= option as follows (for ASCII characters):
translate=(5bx to 7bx)
The following example translates all left brackets to left braces and all right
brackets to right braces:
translate=(5bx to 7bx 5dx to 7dx)
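In a complete step, the option above sits in the PROC CPORT statement. This sketch assumes an ASCII host and a catalog named FINANCE in library Source. The names and paths are placeholders.

```sas
libname source 'sas-library';
filename tranfile 'transport-file';

/* Translate [ (5Bx) to { (7Bx) and ] (5Dx) to } (7Dx). */
proc cport catalog=source.finance file=tranfile
           translate=(5bx to 7bx 5dx to 7dx);
run;
```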
EXCLUDE Statement
Excludes specified files or entries from the transport file.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC CPORT
step, but not both.
Tip: There is no limit to the number of EXCLUDE statements that you can use in one
invocation of PROC CPORT.
Syntax
EXCLUDE SAS file(s) | catalog entry(s) </ MEMTYPE=mtype>
</ ENTRYTYPE=entry-type>;
Required Argument
SAS file(s) | catalog entry(s)
specifies one or more SAS files or one or more catalog entries to be excluded
from the transport file. Specify SAS filenames when you export a SAS library;
specify catalog entry names when you export an individual SAS catalog.
Separate multiple filenames or entry names with a space. You can use shortcuts
to list many like-named files in the EXCLUDE statement. For more information,
see “SAS Data Sets” in SAS Language Reference: Concepts.
Optional Arguments
ENTRYTYPE=entry-type
specifies a single entry type for the catalog entries listed in the EXCLUDE
statement. See SAS Language Reference: Concepts for a complete list of catalog
entry types.
MEMTYPE=mtype
specifies a single member type for one or more SAS files listed in the EXCLUDE
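statement. Valid values are CATALOG, DATA, or ALL. If you do not specify the
MEMTYPE= option in the EXCLUDE statement, then processing is restricted to
those member types specified in the MEMTYPE= option in the PROC CPORT
statement.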
SELECT Statement
Includes specified files or entries in the transport file.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC CPORT
step, but not both.
Tip: There is no limit to the number of SELECT statements that you can use in one
invocation of PROC CPORT.
Examples: “Example 2: Exporting Individual Catalog Entries” on page 555
“Example 4: Using PROC CIMPORT to Import a French Data Set into a UTF-8 SAS
Session” on page 414
Syntax
SELECT SAS file(s) | catalog entry(s) </ MEMTYPE=mtype>
</ ENTRYTYPE=entry-type> ;
Required Argument
SAS file(s) | catalog entry(s)
specifies one or more SAS files or one or more catalog entries to be included in
the transport file. Specify SAS filenames when you export a SAS library; specify
catalog entry names when you export an individual SAS catalog. Separate
multiple filenames or entry names with a space. You can use shortcuts to list
many like-named files in the SELECT statement. For more information, see “SAS
Data Sets” in SAS Language Reference: Concepts.
Optional Arguments
ENTRYTYPE=entry-type
specifies a single entry type for the catalog entries listed in the SELECT
statement. See SAS Language Reference: Concepts for a complete list of catalog
entry types.
MEMTYPE=mtype
specifies a single member type for one or more SAS files listed in the SELECT
statement. Valid values are CATALOG, DATA, or ALL. If you do not specify the
MEMTYPE= option in the SELECT statement, then processing is restricted to
those member types specified in the MEMTYPE= option in the PROC CPORT
statement.
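Here is a sketch of SELECT and EXCLUDE in separate steps, because one step cannot use both. The member names GRADES, FORMATS, and SCRATCH, the librefs, and the paths are hypothetical.

```sas
libname source 'sas-library';
filename tran1 'transport-file-1';
filename tran2 'transport-file-2';

/* SELECT: export only data set GRADES and catalog FORMATS. */
proc cport library=source file=tran1;
   select grades / memtype=data;
   select formats / memtype=catalog;
run;

/* EXCLUDE: export everything except data set SCRATCH.     */
proc cport library=source file=tran2;
   exclude scratch / memtype=data;
run;
```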
TRANTAB Statement
Specifies translation tables for characters in catalog entries that you export.
Restriction: The TRANTAB statement does not support DBCS or UTF-8 SAS sessions.
Tip: You can specify only one translation table for each TRANTAB statement. However,
you can use more than one translation table in a single invocation of PROC CPORT.
See: The TRANTAB Statement for the CPORT Procedure and the UPLOAD and
DOWNLOAD Procedures in SAS National Language Support (NLS): Reference Guide.
Example: “Example 4: Applying a Translation Table” on page 557
Syntax
TRANTAB NAME=translation-table-name <options>;
If you are working with a password-protected data set, you can supply that
password by using the READ= option. If you do not supply the password with the
READ= option for a read-protected data set, you are prompted for the password.
Use the READ= data set option to include the appropriate password for the read-
protected data set when creating a transport file. In Example 1, PROC CPORT
copies the input file that is named SOURCE.GRADES, includes the password
ADMIN with the data set, and creates the transport file named GRADESOUT.
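The Example 1 program is not reproduced here. Based on the description, it might look like the following sketch, which assumes that GRADESOUT is a fileref and that the paths are placeholders.

```sas
libname source 'sas-library';
filename gradesout 'transport-file';

/* Supply the read password ADMIN for the read-protected   */
/* data set GRADES while creating the transport file.      */
proc cport data=source.grades(read=admin) file=gradesout;
run;
```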
If the password is omitted when referring to a password-protected data set, SAS
prompts for the password. If an invalid password is specified, an error message is
sent to the log. Here is an example error:
ERROR: Invalid or missing READ password on member WORK.XYZ.DATA
For details about the READ= data set option, see SAS Data Set Options: Reference,
and for details about password-protected data sets, see SAS Language Reference:
Concepts.
Note: PROC CIMPORT does not require a password in order to restore the
transport file in the target environment. However, other SAS procedures that use
the password-protected data set must include the password.
n the Windows encoding that is associated with the locale of the SAS session in
which the transport file is created. However, starting in SAS 9.4M3, PROC
CIMPORT supports the ability to import data sets that are created in non-UTF-8
SAS sessions into UTF-8 SAS sessions.
Using PROC CIMPORT to import a data set in a UTF-8 session preserves the
encoding value of the data set. For example, if a data set with SHIFT-JIS
encoding is imported into a UTF-8 session using PROC CIMPORT, PROC
CONTENTS shows that the SHIFT-JIS encoding is maintained.
n In SAS 9.4, when you use PROC CPORT to create a transport file that is
encoded with US-ASCII on an ASCII platform, regardless of the session
encoding, the US-ASCII encoding is preserved for that transport file. If you then
use PROC CIMPORT to import that transport file on an ASCII platform, the US-ASCII
encoding is preserved and is not transcoded: the data set that is created has the
US-ASCII encoding, not the session encoding.
For example, suppose your session encoding is WLATIN1 and you use PROC CPORT to
create a transport file from a data set that has an encoding of US-ASCII. The
transport file keeps the US-ASCII encoding instead of the WLATIN1 encoding. When
you use PROC CIMPORT to import the transport file on an ASCII platform, the
US-ASCII encoding is again preserved and is not transcoded.
n on a z/OS platform, PROC CIMPORT creates data sets using the session
encoding.
Encoding Value of the Transport File | Example of Applying an Encoding in a SAS Invocation | Explanation
For a complete list of encodings that are associated with each locale, see Locale
Tables in SAS National Language Support (NLS): Reference Guide.
In order for a transport file to be imported successfully, the encodings of the source
and target SAS sessions must be compatible. Here is an example of compatible
source and target SAS sessions:
The encodings of the source and target SAS sessions are compatible because the
Windows default encoding for the es_MX locale is WLATIN1 and the encoding of
the target SAS session is WLATIN1. For more detailed information about
compatible languages and encodings, see the SAS Press book SAS Encoding:
Understanding the Details, by Manfred Kiefer.
However, if the encodings of the source and target SAS sessions are incompatible, a
transport file might not be successfully imported. (See the introduction to this
section.) Here is an example of incompatible encodings:
UNIX SAS
Session Transport File
Locale Encoding Encoding Locale z/OS Encoding
The encodings of the source and target SAS sessions are incompatible because the
Windows default encoding for the cs_CZ locale is WLATIN2 and the encoding of
the target SAS session is OPEN_ED-1141. A transport file cannot be imported
between these locales.
When importing transport files, you can use the ENCODINGINFO= option to see
the encoding value of the transport file. Otherwise, you are alerted to compatibility
problems via warnings and error messages. For more information about the
ENCODINGINFO= option, see “ENCODINGINFO=ALL | n” on page 394.
characteristics. When you reference a transport file, you must specify the following
DCB characteristics:
n LRECL=80
n BLKSIZE=8000
n RECFM=FB
n DSORG=PS
Another common problem can occur if you use communications software to move
files from another environment to z/OS. In some cases, the transport file does not
have the proper DCB characteristics when it arrives on z/OS. If the communications
software does not allow you to specify file characteristics, try the following
approach for z/OS:
1 Create a file under z/OS with the correct DCB characteristics and initialize the
file.
2 Move the transport file from the other environment to the newly created file
under z/OS using binary transfer.
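Step 1 might be written in SAS under z/OS as in the following sketch. It allocates a new sequential file with the DCB characteristics listed above. The data set name and the SPACE= amounts are hypothetical. The DATA _NULL_ step opens and closes the file once so that it exists before the binary transfer overwrites it. Your site might prefer another way to initialize the file.

```sas
/* z/OS: allocate a new transport file with the required   */
/* DCB characteristics (data set name and SPACE= amounts   */
/* are hypothetical).                                      */
filename tranfile 'userid.cport.transprt'
         disp=(new,catlg,delete) space=(cyl,(5,5))
         lrecl=80 blksize=8000 recfm=fb dsorg=ps;

/* Open and close the file once to initialize it. */
data _null_;
   file tranfile;
run;
```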
Details
This example shows how to use PROC CPORT to export entries from all of the SAS
catalogs in the SAS library that you specify.
Program
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
proc cport library=source file=tranfile memtype=catalog;
run;
Program Description
Specify the library reference for the SAS library that contains the source files to
be exported and the file reference to which the output transport file is written.
The LIBNAME statement assigns a libref for the SAS library. The FILENAME
statement assigns a fileref and any operating environment options for file
characteristics for the transport file that PROC CPORT creates.
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
Create the transport file. The PROC CPORT step executes on the operating
environment where the source library is located. MEMTYPE=CATALOG writes all
SAS catalogs in the source library to the transport file.
proc cport library=source file=tranfile memtype=catalog;
run;
Log Examples
Example Code 16.1 Exporting Multiple Catalogs
Details
This example shows how to use PROC CPORT to export individual catalog entries,
rather than all of the entries in a catalog.
Program
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
proc cport catalog=source.finance file=tranfile;
select loan.scl;
run;
Program Description
Assign library references. The LIBNAME and FILENAME statements assign a libref
for the source library and a fileref for the transport file, respectively.
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
Write an entry to the transport file. SELECT writes only the LOAN.SCL entry to
the transport file for export.
proc cport catalog=source.finance file=tranfile;
select loan.scl;
run;
Log Examples
Example Code 16.2 Exporting Individual Catalog Entries
Details
This example shows how to use PROC CPORT to export a single SAS data set.
Program
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
proc cport data=source.times file=tranfile;
run;
Program Description
Assign library references. The LIBNAME and FILENAME statements assign a libref
for the source library and a fileref for the transport file, respectively.
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
Specify the type of file that you are exporting. The DATA= specification in the
PROC CPORT statement tells the procedure that you are exporting a SAS data set
rather than a library or a catalog.
proc cport data=source.times file=tranfile;
run;
Log Examples
Example Code 16.3 Exporting a Single SAS Data Set
Details
This example shows how to apply a customized translation table to the transport
file before PROC CPORT exports it. For this example, assume that you have already
created a customized translation table called TTABLE1.
Program
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
proc cport catalog=source.formats file=tranfile;
trantab name=ttable1 type=(format);
run;
Program Description
Assign library references. The LIBNAME and FILENAME statements assign a libref
for the source library and a fileref for the transport file, respectively.
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
Apply the translation specifics. The TRANTAB statement applies the translation
that you specify with the customized translation table TTABLE1. TYPE= limits the
translation to FORMAT entries.
proc cport catalog=source.formats file=tranfile;
trantab name=ttable1 type=(format);
run;
Log Examples
Example Code 16.4 Applying a Translation Table
Details
This example shows how to use PROC CPORT to transport only the catalog entries
with modification dates equal to or later than the date that you specify in the
AFTER= option.
Program
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
proc cport catalog=source.finance file=tranfile
after='09sep1996'd;
run;
Program Description
Assign library references. The LIBNAME and FILENAME statements assign a libref
for the source library and a fileref for the transport file, respectively.
libname source 'sas-library';
filename tranfile 'transport-file'
         host-options-for-file-characteristics;
Specify the catalog entries to be written to the transport file. AFTER= specifies
that only catalog entries with modification dates on or after September 9, 1996,
should be written to the transport file.
proc cport catalog=source.finance file=tranfile
after='09sep1996'd;
run;
Log Examples
PROC CPORT writes messages to the SAS log to inform you that it began the
export process for all the entries in the specified catalog. However, PROC CPORT
wrote only the entries LOAN.FRAME and LOAN.HELP in the FINANCE catalog to
the transport file because only those two entries had a modification date equal to
or later than September 9, 1996. That is, of all the entries in the specified catalog,
only two met the requirement of the AFTER= option.
17
DATASETS Procedure
n modify attributes of SAS data sets and variables within the data sets
Notes
n Although the DATASETS procedure can perform some operations on catalogs,
generally the CATALOG procedure is the best utility to use for managing
catalogs.
n The term member often appears as a synonym for SAS file. If you are unfamiliar
with SAS files and SAS libraries, see “SAS Files Concepts” in SAS Language
Reference: Concepts.
n If the NOLIST option is specified, then the PROC DATASETS statement does
not list the SAS files. However, if you are using the ODS LISTING statement, the
result is displayed in the Results window. To view the output only in the SAS
log, use the ODS EXCLUDE statement with the LISTING destination.
n PROC DATASETS cannot work with sequential data libraries.
n You cannot change the length of a variable using the LENGTH statement or the
LENGTH= option in an ATTRIB statement.
n There can be a discrepancy between the modified date in PROC DATASETS,
PROC CONTENTS, and other components of SAS, such as SAS Explorer. The
two modified dates and times are distinctly different:
o Operating-environment modified date and time is reported by the SAS
Explorer and the PROC DATASETS LIST option.
o The modified date and time reported by the CONTENTS statement is the
date and time that the data within the data set was actually modified.
n If you want to use the KILL option of the DATASETS procedure with
DBMS=Redshift, then you must specify the SCHEMA= option in the LIBNAME
statement.
Procedure Execution
Execution of Statements
When you start the DATASETS procedure, you specify the procedure input library
in the PROC DATASETS statement. If you omit a procedure input library, the
procedure processes the current default SAS library (usually the Work library). To
specify a new procedure input library, issue the DATASETS procedure again.
Statements execute in the order in which they are written. For example, to see the
contents of a data set, copy the data set, and then visually compare the contents
of the copy with the original, submit CONTENTS, COPY, and CONTENTS statements in
that order.
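That CONTENTS, COPY, CONTENTS sequence might look like the following sketch. The librefs Mylib and Dest, the data set A, and the paths are hypothetical. Each CONTENTS or COPY statement causes the statements before it to execute, so the output appears in the order written.

```sas
libname mylib 'sas-library';
libname dest 'another-sas-library';

proc datasets library=mylib;
   contents data=a;          /* contents of the original     */
   copy out=dest;            /* copy A to library DEST       */
      select a;
   contents data=dest.a;     /* contents of the copy         */
run;
quit;
```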
RUN-Group Processing
PROC DATASETS supports RUN-group processing. RUN-group processing enables
you to submit RUN groups without ending the procedure.
The DATASETS procedure supports four types of RUN groups. Each RUN group is
defined by the statements that compose it and by what causes it to execute.
The following list discusses what statements compose a RUN group and what
causes each RUN group to execute:
o AGE
o CHANGE
o DELETE
o EXCHANGE
o REPAIR
o SAVE
If any of these statements appear in sequence in the PROC step, the sequence
forms a RUN group. For example, if a REPAIR statement appears immediately
after a SAVE statement, the REPAIR statement does not force the SAVE
statement to execute; it becomes part of the same RUN group. To execute the
RUN group, submit one of the following statements:
o PROC DATASETS
o APPEND
o CONTENTS
o COPY
o MODIFY
o QUIT
o RUN
o another DATA or PROC step
SAS reads the program statements that are associated with one task until it
reaches a RUN statement or an implied RUN statement. SAS executes all of the
preceding statements immediately and continues reading until it reaches another
RUN statement or implied RUN statement. To execute the last task, you must use a
RUN statement or a statement that stops the procedure.
proc datasets;
/* RUN group */
change nutr=fatg;
delete bldtest;
exchange xray=chest;
/* RUN group */
copy out=dest;
select report;
/* RUN group */
modify bp;
label dias='Taken at Noon';
rename weight=bodyfat;
/* RUN group */
append base=tissue data=newtiss;
quit;
Note: If you are running in interactive line mode, you can receive messages that
statements have already executed before you submit a RUN statement. Plan your
tasks carefully if you are using this environment for running PROC DATASETS.
Error Handling
Generally, if an error occurs in a statement, the RUN group containing the error
does not execute. RUN groups preceding or following the one containing the error
execute normally. The MODIFY RUN group is an exception. If a syntax error occurs
in a statement subordinate to the MODIFY statement, only the statement
containing the error fails. The other statements in the RUN group execute.
Note: If the first word of the statement (the statement name) is in error and the
procedure cannot recognize it, the procedure treats the statement as part of the
preceding RUN group.
Password Errors
If there is an error involving an incorrect or omitted password in a statement, the
error affects only the statement containing the error. The other statements in the
RUN group execute.
Extended Attributes
Extended attributes are customized metadata for your SAS files. They are
user-defined characteristics that you associate with a SAS data set or variable.
Whereas common SAS attributes, such as the length of a variable, are predefined
SAS system attributes, extended attributes are attributes that you define yourself.
Extended attributes are organized into (name, value) pairs.
Use the MODIFY statement to add, delete, remove, set, and update extended
attributes. When using the COPY statement, if the OUT= library engine supports
extended attributes, they are copied. Extended attributes are not appended when
using the APPEND statement, unless the BASE= data set does not exist.
Extended attributes can be used to automate tasks that require a custom attribute
to be associated with a variable or a data set.
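A minimal sketch of adding extended attributes in a MODIFY RUN group follows. It assumes the XATTR ADD DS and XATTR ADD VAR forms described on the XATTR statement pages cited below; the attribute names Role, Source, and Units are hypothetical, because extended attributes are user-defined.

```sas
/* Copy a sample table so that the original is not modified */
data work.class;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class;
      /* data set-level extended attributes: (name, value) pairs */
      xattr add ds Role='Training' Source='SASHELP.CLASS';
      /* variable-level extended attributes (hypothetical names) */
      xattr add var Name (Role='ID') Height (Units='inches');
run;
   /* the CONTENTS output should now list the extended attributes */
   contents data=class;
run;
quit;
```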
The following output shows a data set and variables with extended attributes.
568 Chapter 17 / DATASETS Procedure
For information about using extended attributes, see “XATTR ADD Statement” on
page 660, “XATTR DELETE Statement” on page 661, “XATTR REMOVE
Statement” on page 662, “XATTR SET Statement” on page 663, and “XATTR
UPDATE Statement” on page 664.
The following example shows that there are 6 Levels in the libref named Samp.
Concepts: DATASETS Procedure 569
The bottom portion of the PROC CONTENTS output (the variable metadata) shows the
metadata as it is represented in the SAS session. That representation depends on
whether transcoding is needed to go from the CAS UTF-8 encoding to the SAS session
encoding. The variable byte length used in SAS depends on the encoding of the SAS
session, so the byte length shown in the SAS session might differ from the byte
length in CAS.
Use the following code to show the contents of the Mycas library:
proc datasets lib=mycas;
contents data=cars;
run;
Use the following code to show the contents of the Mycas library, including the
directory listing and details:
proc datasets lib=mycas;
contents data=mycas.cars directory details;
run;
To obtain the number of observations in a DBMS table, run an SQL query that counts
the rows, either through the LIBNAME engine or through explicit pass-through.
To obtain the information about indexes and integrity constraints, you can use
explicit pass-through of database-specific syntax to query the system tables. For
documentation on explicit pass-through, see “Connecting to a DBMS By Using the
SQL Procedure Pass-Through Facility” in SAS SQL Procedure User’s Guide.
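As a minimal sketch of the observation count described above, the following query goes through the LIBNAME engine. The libref MYDB and the table ORDERS are hypothetical placeholders for a SAS/ACCESS library and a DBMS table.

```sas
/* Count the rows of a DBMS table through a hypothetical SAS/ACCESS libref */
proc sql;
   select count(*) as NumObs
      from mydb.orders;
quit;
```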
By default, the PROC DATASETS statement itself produces two output objects:
Members and Directory. These objects are routed to the log. The CONTENTS
statement produces three output objects by default: Attributes, EngineHost, and
Variables. (The use of various options adds other output objects.) These objects
are routed to the procedure output table. If you open an ODS destination (such as
HTML, RTF, or PRINTER), all of these objects are, by default, routed to that
destination.
You can use the ODS SELECT and ODS EXCLUDE statements to control which
objects go to which destination, just as you can for any other procedure.
PROC DATASETS and PROC CONTENTS assign a name to each table that they
create. You can use these names to reference the table when using the Output
Delivery System (ODS) to select tables and create output tables.
PROC CONTENTS generates the same ODS tables as PROC DATASETS with the
CONTENTS statement.
Table 17.1 ODS Tables Produced by the DATASETS Procedure without the
CONTENTS Statement
Table 17.2 ODS Table Names Produced by PROC CONTENTS and PROC DATASETS with the
CONTENTS Statement
ODS Table Name | Description | When Generated
Attributes | Data set attributes | Unless you specify the SHORT option
Position | A detailed listing of variables by logical position in the table | If you specify the VARNUM option and you do not specify the SHORT option
PositionShort | A concise listing of variables by logical position in the table | If you specify the VARNUM option and the SHORT option
Variables | A detailed listing of variables in alphabetical order | Unless you specify the SHORT option
Statement | Task | Example
APPEND | Add observations from one SAS data set to the end of another SAS data set | Ex. 6
INITIATE | Create an audit file that has the same name as the SAS data file and a data set type of AUDIT | Ex. 8
SAVE | Delete all the SAS files except the ones listed in the SAVE statement | Ex. 3
Syntax
PROC DATASETS <options>;
FORCE
forces a RUN group to execute even when there are errors, or forces an
Append operation.
GENNUM=ALL | HIST | REVERT | integer
restricts processing for generation data sets.
KILL
deletes SAS files.
LIBRARY=libref
specifies the procedure input/output library.
MEMTYPE=(member-type(s))
restricts processing to a certain type of SAS file.
NODETAILS
see the description of DETAILS | NODETAILS.
NOLIST
suppresses the printing of the directory.
NOPRINT
suppresses the printing of the output to the log and listing.
NOWARN
suppresses error processing.
PW= password
provides Read, Write, or Alter access.
READ=read-password
provides Read access.
Optional Arguments
ALTER=alter-password
provides the Alter password for any alter-protected SAS files in the SAS library.
DETAILS | NODETAILS
determines whether the following columns are written to the log:
Obs, Entries, or Indexes
gives the number of observations for SAS files of type AUDIT, DATA, and VIEW;
the number of entries for type CATALOG; and the number of files of type INDEX
that are associated with a data file, if any. If SAS cannot determine the number
of observations in a SAS data set, the value in this column is set to missing.
For example, in a very large data set, if the number of observations or deleted
observations exceeds the number that can be stored in a double-precision
integer, the count shows as missing. The value for type CATALOG is the total
number of entries. For other types, this column is blank.
Vars
gives the number of variables for types AUDIT, DATA, and VIEW. If SAS cannot
determine the number of variables in the SAS data set, the value in this column
is set to missing. For other types, this column is blank.
Label
contains the label associated with the SAS data set. This column prints a label
only for the type DATA.
PROC DATASETS Statement 581
TIP The value for files of type INDEX includes both user-defined
indexes and indexes created by integrity constraints. To view index
ownership and attribute information, use PROC DATASETS with the
CONTENTS statement and the OUT2 option.
Note: The DETAILS option affects output only when a directory is specified and
requires Read access to all read-protected SAS files in the SAS library. If you do
not supply the Read password, the directory listing contains missing values for
the columns produced by the DETAILS option.
Tip If you are using the SAS windowing environment and specify the
DETAILS option for a library that contains read-protected SAS files, a
dialog box prompts you for each Read password that you do not
specify in the PROC DATASETS statement. Therefore, you might want
to assign the same Read password to all SAS files in the same SAS
library.
ENCRYPTKEY=key-value
specifies the key value for AES encryption.
FORCE
performs two separate actions:
- forces a RUN group to execute even if errors are present in one or more
statements in the RUN group. See “RUN-Group Processing” on page 564 for
a discussion of RUN-group processing and error handling.
- forces all APPEND statements to concatenate two data sets even when the
variables in the data sets are not exactly the same. The APPEND statement
drops the extra variables and issues a warning message to the SAS log
unless the NOWARN option is specified (either with the APPEND statement
or PROC DATASETS). For more information about the FORCE option, see
“APPEND Statement” on page 586.
KILL
deletes all SAS files in the SAS library that are available for processing. The
MEMTYPE= option subsets the member types that the statement deletes. The
following example deletes all the data files in the Work library:
proc datasets lib=work kill memtype=data; run; quit;
CAUTION
The KILL option deletes the SAS files immediately after you submit the
statement. If the SAS file has an ALTER= password assigned, it must be specified
in order to delete the SAS file.
LIBRARY=libref
names the library that the procedure processes. This library is the procedure
input/output library.
Aliases DDNAME=
DD=
LIB=
Note A SAS library that is accessed via a sequential engine (such as a tape
format engine) cannot be specified as the value of the LIBRARY=
option.
See For information about the Work and User libraries, see “One-level SAS
Data Set Names” in SAS Language Reference: Concepts.
MEMTYPE=(member-type(s))
restricts processing to one or more member types and restricts the listing of the
data library directory to SAS files of the specified member types. For example,
the following PROC DATASETS statement limits processing to SAS data sets in
the default data library and limits the directory listing in the SAS log to SAS files
of member type DATA:
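A minimal sketch of such a statement, assuming that the default library (Work, or User if it is assigned) is the one you want to process:

```sas
/* Limit processing and the directory listing to members of type DATA */
proc datasets memtype=data;
run;
quit;
```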
Aliases MTYPE=
MT=
Default ALL
NODETAILS
See “DETAILS|NODETAILS” on page 580.
NOLIST
suppresses the printing of the directory of the SAS files in the SAS log and any
open non-LISTING destination.
Note If you have ODS LISTING turned on and open a non-LISTING ODS
destination, PROC DATASETS output goes to both the SAS log and
the ODS destination. The NOLIST option suppresses output to both.
To see the output in the SAS log only, use the ODS EXCLUDE
statement by specifying the Members and Directory output objects.
For example, if the RTF and LISTING destinations are both open and
Directory and Members information is desired in the LOG window only,
use the following:
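A sketch of this approach, assuming that the RTF destination is opened in the same session; the file name dirlist.rtf is a hypothetical placeholder. The destination-specific ODS EXCLUDE statement removes the Members and Directory output objects from the RTF file only, so the directory listing still reaches the SAS log.

```sas
/* Hypothetical output file; RTF and LISTING are both open */
ods rtf file='dirlist.rtf';

/* Keep the Members and Directory objects out of the RTF destination only */
ods rtf exclude members directory;

proc datasets library=work;
run;
quit;

ods rtf close;
```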
NOPRINT
suppresses the printing of the output and the printing of the directory of the
SAS files in the log and any open non-LISTING destination. The NOPRINT
option is a combination of the NOLIST option and the NOPRINT option in the
CONTENTS statement.
NOWARN
suppresses the error processing that occurs when a SAS file that is specified in a
SAVE, CHANGE, EXCHANGE, REPAIR, DELETE, or COPY statement or listed as
the first SAS file in an AGE statement, is not in the procedure input library.
When an error occurs and the NOWARN option is in effect, PROC DATASETS
continues processing that RUN group. If NOWARN is not in effect, PROC
DATASETS stops processing that RUN group and issues a warning for all
operations except DELETE, for which it does not stop processing.
PW= password
provides the password for any protected SAS files in the SAS library. PW= can
act as an alias for READ=, WRITE=, or ALTER=.
READ=read-password
provides the Read password for any read-protected SAS files in the SAS library.
AGE Statement
Renames a group of related SAS files in a library.
Syntax
AGE current-name related-SAS-file(s)
</ <ALTER=alter-password> <MEMTYPE=member-type>>;
Required Arguments
current-name
is a SAS file that the procedure renames. current-name receives the name of the
first name in related-SAS-file(s).
related-SAS-file(s)
is one or more SAS files in the SAS library.
Optional Arguments
ALTER=alter-password
provides the Alter password for any alter-protected SAS files named in the AGE
statement. Because an AGE statement renames and deletes SAS files, you need
Alter access to use the AGE statement. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
MEMTYPE=member-type
restricts processing to one member type. All of the SAS files that you name in
the AGE statement must be the same member type. You can use the option
after a forward slash after the name of each SAS file.
Aliases MTYPE=
MT=
Details
The AGE statement renames the current-name to the name of the first name in the
related-SAS-files, renames the first name in the related-SAS-files to the second
name in the related-SAS-files, and so on, until it changes the name of the next-to-
last SAS file in the related-SAS-files to the last name in the related-SAS-files. The
AGE statement then deletes the last file in the related-SAS-files.
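The renaming chain can be sketched with four small data sets. The names REPORT through REPORT3 and the variable Version are hypothetical; REPORT plays the role of current-name.

```sas
/* Hypothetical backups: REPORT is the newest copy, REPORT3 the oldest */
data work.report;  Version = 'newest'; run;
data work.report1; Version = 'second'; run;
data work.report2; Version = 'third';  run;
data work.report3; Version = 'oldest'; run;

proc datasets library=work nolist;
   /* REPORT -> REPORT1, REPORT1 -> REPORT2, REPORT2 -> REPORT3;
      the old REPORT3 is deleted */
   age report report1 report2 report3;
run;
quit;
```

Following the description above, WORK.REPORT no longer exists after this step, and WORK.REPORT1 holds the observation whose Version value is 'newest'.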
If the first SAS file named in the AGE statement does not exist in the SAS library,
PROC DATASETS stops processing the RUN group containing the AGE statement
and issues an error message. The AGE statement does not age any of the related-
SAS-files. To override this behavior, use the NOWARN option in the PROC
DATASETS statement.
If one of the related-SAS-files does not exist, the procedure prints a warning
message to the SAS log but continues to age the SAS files that it can.
If you age a data set that has an index, the index continues to correspond to the
data set.
You can age only entire generation groups. For example, if data sets A and B have
generation groups, then the following statement deletes generation group B and
ages (renames) generation group A to the name B:
age a b;
For example, suppose the generation group for data set A has three historical
versions and the generation group for data set B has two historical versions. Then
aging A to B has this effect:
A base becomes B base
A 1 becomes B 1
A 2 becomes B 2
A 3 becomes B 3
B base is deleted
B 1 is deleted
B 2 is deleted
APPEND Statement
Adds the observations from one SAS data set to the end of another SAS data set.
Default: If the BASE= data set is accessed through a SAS server and if no other user has the
data set open at the time the APPEND statement begins processing, the BASE=
data set defaults to CNTLLEV=MEMBER (member-level locking). When this
behavior happens, no other user can update the file while the data set is processed.
Restriction: The BASE= option cannot be an existing CAS table.
Requirement: The BASE= data set must be a member of a SAS library that supports update
processing.
Note: If you use the DROP=, KEEP=, or RENAME= options on the BASE= data set, the
options affect only the APPEND processing and do not change the variables in
the appended BASE= data set. Variables that are dropped or not kept using the
DROP= and KEEP= options still exist in the appended BASE= data set. Variables
that are renamed using the RENAME= option keep their original names in the
appended BASE= data set. The CAS engine does not support update processing,
and therefore an existing CAS table cannot be specified with BASE=.
Tips: You can specify most data set options for the BASE= argument and DATA= option.
However, if you specify the DROP=, KEEP=, or RENAME= data set option for the
BASE= data set, the option is ignored. You can use any global statements as well.
If a failure occurs during processing, the data set is marked as damaged and is reset
to the pre-append condition at the next REPAIR statement. If the data set has an
index, the index is not updated with each observation but is updated once at the
end. (This behavior applies in Version 7 and later, as long as APPENDVER=V6 is not set.)
Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Example: “Example 6: Concatenating Two SAS Data Sets” on page 710
Syntax
APPEND BASE=<libref.>SAS-data-set
<APPENDVER=V6>
<DATA=<libref.>SAS-data-set>
<ENCRYPTKEY=key-value>
<FORCE>
<GETSORT>
<NOWARN>;
Required Argument
BASE=<libref.> SAS-data-set
names the data set to which you want to add observations.
libref
specifies the library that contains the SAS data set. If you omit the libref, the
default is the libref for the procedure input library. If you are using PROC
APPEND, the default for libref is either Work or User.
SAS-data-set
names a SAS data set. If the APPEND statement cannot find an existing data
set with this name, it creates a new data set in the library. That is, you can
use the APPEND statement to create a data set by specifying a new data set
name in the BASE= argument.
Whether you are creating a new data set or appending to an existing data set,
the BASE= data set is the current SAS data set after all Append operations.
Alias OUT=
Optional Arguments
APPENDVER=V6
uses the Version 6 behavior for appending observations to the BASE= data set,
which is to append one observation at a time. Beginning in Version 7, to improve
performance, the default behavior changed so that all observations are
appended after the data set is processed.
DATA=<libref.> SAS-data-set
names the SAS data set containing observations that you want to append to the
end of the SAS data set specified in the BASE= argument.
libref
specifies the library that contains the SAS data set. If you omit libref, the
default is the libref for the procedure input library. The DATA= data set can
be from any SAS library. You must use the two-level name if the data set
resides in a library other than the procedure input library.
SAS-data-set
names a SAS data set. If the APPEND statement cannot find an existing data
set with this name, it stops processing.
Alias NEW=
Default the most recently created SAS data set, from any SAS library
ENCRYPTKEY=key-value
specifies the key value for AES encryption.
FORCE
forces the APPEND statement to concatenate data sets when the DATA= data
set contains variables that meet one of the following criteria:
- are not in the BASE= data set
- do not have the same type as the variables in the BASE= data set
Tip You can use the GENNUM= data set option to append to or from a
specific version in a generation group. Here is an example:
/* appends generation 2 of A (A#002) to the base version of A */
proc datasets;
   append base=a
          data=a (gennum=2);
run;
quit;
See “Appending to Data Sets with Different Variables” on page 594 and
“Appending to Data Sets That Contain Variables with Different
Attributes” on page 595.
GETSORT
copies the sort indicator from the DATA= data set to the BASE= data set. The
sort indicator is established by either PROC SORT or an ORDER BY clause in
PROC SQL if the following criteria are met:
- The BASE= data set must meet the following criteria:
CAUTION
Any pre-existing sort indicator on the BASE= data set is overwritten with
no warning, even if the DATA= data set is not sorted at all.
Restrictions The GETSORT option has no effect on the data sets if the BASE=
data set has an audit trail associated with it. This restriction causes
a WARNING in the output while the APPEND process continues.
The GETSORT option has no effect on the data sets if there are
dropped, kept, or renamed variables in the DATA= data file.
NOWARN
suppresses the warning when used with the FORCE option to concatenate two
data sets with different variables.
Details
You can use the WHERE= data set option with the DATA= data set to restrict the
observations that are appended. Similarly, you can use the WHERE statement to
restrict the observations from the DATA= data set. The WHERE statement has no
effect on the BASE= data set.
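A minimal sketch of restricting the appended observations with the WHERE= data set option on the DATA= data set. The data sets MASTER and NEWSALES and the variable Region are hypothetical.

```sas
/* Hypothetical BASE= data set */
data work.master;
   input Region $ Amount;
   datalines;
East 100
West 200
;

/* Hypothetical DATA= data set */
data work.newsales;
   input Region $ Amount;
   datalines;
East 150
North 75
West 300
;

proc datasets library=work nolist;
   /* only the East and West observations from NEWSALES are appended */
   append base=work.master
          data=work.newsales(where=(Region in ('East', 'West')));
run;
quit;
```

The WHERE= condition filters only the incoming observations; the two observations already in MASTER are not affected.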
CAUTION
If there is a WHERE option used on an existing BASE= data set, it is used only
if the WHEREUP= option is set to YES. If the BASE= data set does not exist, then
no WHEREUP= option is needed for the WHERE option to take effect.
Note: The block I/O method cannot be used when appending a CAS table to a SAS
data set.
The block I/O method is used to append blocks of data instead of one observation
at a time. This method increases performance when you are appending large data
sets. SAS determines whether to use the block I/O method. Not all data sets can
use the block I/O method. There are restrictions set by the APPEND statement and
the Base SAS engine.
To display information in the SAS log about the append method that is being used,
you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
If the block I/O method is not used, the following message is written to the SAS
log:
INFO: Data set block I/O cannot be used because:
If the APPEND statement determines that the block I/O will not be used, one of the
following explanations is written to the SAS log:
INFO: - The data sets use different engines, have different variables
or have attributes that might differ.
If the Base SAS engine determines that the block I/O method will not be used, one
of the following explanations is written to the SAS log:
INFO: - Referential Integrity Constraints exist.
CAUTION
For an existing BASE= data set: If there is a WHERE statement in the BASE= data
set, it takes effect only if the WHEREUP= option is set to YES.
CAUTION
For a BASE= data set that does not yet exist: if there is a WHERE statement for
the BASE= data set, it is used regardless of the WHEREUP= option setting.
Note: You cannot append a data set to itself by using the WHERE= data set option.
- All variables in the BASE= data set have the same length and type as the
variables in the DATA= data set, and all variables exist in both data sets.
Note: You can use the CONTENTS statement to see the variable lengths and
types.
proc datasets;
append base=a (encryptkey=secret)
data=a (encryptkey=jlgh56);
run;
When appending to a non-encrypted data set, you must specify the ENCRYPTKEY=
option on the DATA= data set. You can then append to the data set even if the
BASE= data set engine does not support AES encryption. The appended data is
not encrypted.
proc datasets;
append base=a
data=a (encryptkey=key-value);
run;
For more information about AES encryption, see “AES Encryption” in SAS
Programmer’s Guide: Essentials. For more information about the ENCRYPTKEY=
data set option, see “ENCRYPTKEY= Data Set Option” in SAS Data Set Options:
Reference.
Either or both of the BASE= SAS data set and DATA= data set or CAS table can be
compressed. If the BASE= data set allows the reuse of space from deleted rows, the
APPEND statement might insert the rows into the middle of the BASE= data set.
For information about the COMPRESS= and REUSE= data set and system options,
see SAS Data Set Options: Reference and SAS System Options: Reference.
By default, PROC APPEND places all of the new observations at the end of the
BASE= data set and does not try to reuse space. This fast-append behavior was
added in Version 7 for better performance when writing to the BASE= data set and
updating indexes. If you want to reuse space, add the APPENDVER=V6 option to the
APPEND statement. With APPENDVER=V6, space is reused only when the new
observations fit into the space previously occupied by deleted observations.
proc append base=libref.data-set data=libref.data-set appendver=v6;
run;
If the formats in the DATA= SAS data set or CAS table are different from those in
the BASE= SAS data set, then the formats in the BASE= data set are used.
However, the data from the DATA= data set or table is not converted in order to be
consistent with the formats in the BASE= data set. The result could be data that
seems to be incorrect. A warning message is displayed in the log.
The fast-append method is used by default when the following requirements are
met. Otherwise, the Version 6 method is used:
- The BASE= data set is open for member-level locking. If CNTLLEV=REC
(record-level locking) is set, then the fast-append method is not used.
- The BASE= data set does not contain referential integrity constraints.
- The BASE= data set is not accessed using the Cross Environment Data Access
(CEDA) facility.
- The BASE= data set is not using a WHERE= data set option.
To display information in the SAS log about the append method that is being used,
you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
The current append method initially adds observations to the BASE= data set
regardless of the restrictions that are determined by the index. For example, a
variable that has an index that was created with the UNIQUE option does not have
its values validated for uniqueness until the index is updated. Then, if a nonunique
value is detected, the offending observation is deleted from the data set. After
observations are appended, some of them might subsequently be deleted.
For a simple example, consider that the BASE= data set has ten observations
numbered from 1 to 10 with a UNIQUE index for the variable ID. You append a data
set that contains five observations numbered from 1 to 5, and observations 3 and 4
both contain the same value for ID. The following actions occur:
1 After the observations are appended, the BASE= data set contains 15
observations numbered from 1 to 15.
2 SAS updates the index for ID, validates the values, and determines that
observations 13 and 14 contain the same value for ID.
3 SAS deletes one of the observations from the BASE= data set, resulting in 14
observations that are numbered from 1 to 15. For example, observation 13 is
deleted. Note that you cannot predict which observation is deleted, because the
internal sort might place either observation first. (In Version 6, you could predict
that observation 13 would be added and observation 14 would be rejected.)
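The scenario above can be sketched as follows, assuming the default fast-append method is in effect. The data set names BASE and NEW are hypothetical; the third and fourth observations of NEW share the ID value 13.

```sas
/* BASE: ten observations with unique ID values 1 to 10 */
data work.base;
   do ID = 1 to 10;
      output;
   end;
run;

/* NEW: five observations; the 3rd and 4th have the same ID */
data work.new;
   input ID @@;
   datalines;
11 12 13 13 14
;

proc datasets library=work nolist;
   modify base;
      index create ID / unique;
run;
   /* all 15 observations are appended first; the index update then
      detects the duplicate ID value and deletes one observation */
   append base=base data=new;
run;
quit;
```

According to the description above, WORK.BASE should end with 14 observations, and you cannot predict which of the two observations with ID=13 is removed. Adding APPENDVER=V6 to the APPEND statement requests the Version 6 behavior instead.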
If you do not want the current behavior (which could result in deleted observations)
or if you want to be able to predict which observations are appended, request the
Version 6 append method by specifying the APPENDVER=V6 option:
proc datasets;
append base=a data=b appendver=v6;
run;
Note: In Version 6, deleting the index and then re-creating it after the append
could improve performance. The current method might eliminate the need to do
that. However, the performance depends on the nature of your data.
If the BASE= data set contains a variable that is not in the DATA= data set, the
APPEND statement concatenates the data sets, but the observations from the
DATA= data set have a missing value for the variable that was not present in the
DATA= data set. The FORCE option is not necessary in this case.
If you use the DROP=, KEEP=, or RENAME= options on the BASE= data set, the
options affect only the APPEND processing and do not change the variables in
the appended BASE= data set. Variables that are dropped or not kept using the
DROP= and KEEP= options still exist in the appended BASE= data set. Variables
that are renamed using the RENAME= option keep their original names in the
appended BASE= data set.
If the SAS formats in the DATA= data set are different from those in the BASE=
data set, then the SAS formats in the BASE= data set are used. However, SAS does
not convert the data from the DATA= data set in order to be consistent with the
SAS formats in the BASE= data set. The result could be data that seems to be
incorrect. A warning message is displayed in the SAS log. The following example
illustrates appending data by using different SAS formats:
data format1;
input Date date9.;
format Date date9.;
datalines;
24sep1975
22may1952
;
data format2;
input Date datetime20.;
format Date datetime20.;
datalines;
25aug1952:11:23:07.4
;
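A sketch of the append step for the FORMAT1 and FORMAT2 data sets created above:

```sas
/* Append FORMAT2 (DATETIME20. values) to FORMAT1 (DATE9. values) */
proc datasets library=work nolist;
   append base=format1 data=format2;
run;
quit;

/* The appended observation keeps the DATE9. format from FORMAT1 */
proc print data=format1;
run;
```

Because the value from FORMAT2 is a SAS datetime value (seconds) rather than a SAS date value (days), it does not display as the intended date under the DATE9. format. This is the kind of seemingly incorrect data that the warning in the log refers to.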
If the length of a variable is longer in the DATA= data set than in the BASE= data
set, or if the same variable is a character variable in one data set and a numeric
variable in the other, use the FORCE option. Using FORCE has the following
consequences:
- The length of the variables in the BASE= data set takes precedence. SAS
truncates values from the DATA= data set to fit them into the length that is
specified in the BASE= data set.
- The type of the variables in the BASE= data set takes precedence. The APPEND
statement replaces values of the wrong type (all values for the variable in the
DATA= data set) with missing values.
data work.test_zipcode;
set sashelp.zipcode;
run;
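The FORCE consequences described above can be sketched with two small, hypothetical data sets, BASE and NEW. In NEW, Name is longer, ID is character instead of numeric, and Extra does not exist in BASE.

```sas
/* BASE: Name is 5 bytes and ID is numeric */
data work.base;
   length ID 8 Name $ 5;
   ID = 1; Name = 'Ann';
run;

/* DATA=: longer Name, character ID, and an extra variable */
data work.new;
   length ID $ 3 Name $ 12 Extra 8;
   ID = 'A2'; Name = 'Christopher'; Extra = 99;
run;

proc datasets library=work nolist;
   /* without FORCE, this APPEND would stop because of the mismatches */
   append base=work.base data=work.new force;
run;
quit;
```

Following the rules above, the appended observation should have Name truncated to 'Chris', a missing value for ID, and no Extra variable, with warnings in the SAS log.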
System Failures
If a system failure or some other type of interruption occurs while the procedure is
executing, the Append operation might not be successful; it is possible that not all,
perhaps none, of the observations are added to the BASE= data set. In addition, the
BASE= data set might suffer damage. The Append operation performs an update in
place, which means that it does not make a copy of the original data set before it
begins to append observations. If you want to be able to restore the original
observations, you can initiate an audit trail for the base data file and choose to
store a before-update image of the observations. Then you can write a DATA step
to extract and reapply the original observations to the data file. For information
about initiating an audit trail, see “AUDIT Statement” on page 598.
ATTRIB Statement
Associates a format, informat, or label with variables in the SAS data set specified in the MODIFY
statement.
Syntax
ATTRIB variable-list(s) attribute-list(s);
Required Arguments
variable-list(s)
names the variables that you want to associate with the attributes. You can list
the variables in any form that SAS allows.
attribute-list(s)
specifies one or more attributes to assign to variable-list. Specify one or more of
the following attributes in the ATTRIB statement:
FORMAT=format
associates a format with variables in variable-list.
Tip The format can be either a standard SAS format or a format that is
defined with the FORMAT procedure.
INFORMAT=informat
associates an informat with variables in variable-list.
Tip The informat can be either a standard SAS informat or an informat that
is defined with the FORMAT procedure.
LABEL='label'
associates a label with variables in the variable-list.
Details
Within the DATASETS procedure, the ATTRIB statement must be used in a
MODIFY RUN group and can use only the FORMAT, INFORMAT, and LABEL
options. The ATTRIB statement is the simplest way to remove or change all
variable labels, formats, or informats in a data set using the keyword _ALL_. For an
example, see “Example 1: Removing All Labels and Formats in a Data Set” on page
691.
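A minimal sketch of the _ALL_ technique, using a hypothetical copy of SASHELP.CLASS that has one label and one format to remove:

```sas
/* Hypothetical copy with a variable label and a format */
data work.class;
   set sashelp.class;
   label Height = 'Height (inches)';
   format Weight 6.1;
run;

proc datasets library=work memtype=data nolist;
   modify class;
      /* _ALL_ applies the attribute to every variable in WORK.CLASS */
      attrib _all_ label=' ';   /* remove all variable labels */
      attrib _all_ format=;     /* remove all variable formats */
run;
quit;
```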
If you are not deleting or changing all attributes, it is easier to use the LABEL,
FORMAT, or INFORMAT statement. See “LABEL Statement” on page 642, “FORMAT
Statement” on page 632, and “INFORMAT Statement” on page 640.
AUDIT Statement
Initiates and controls event logging to an audit file as well as suspends, resumes, or terminates event
logging in an audit file.
Tips: The AUDIT statement takes one of two forms, depending on whether you are
initiating the audit trail or suspending, resuming, or terminating event logging in an
audit file.
You can define attributes such as format and informat for the user variables in the
data file by using the PROC DATASETS MODIFY statement.
See: “Understanding an Audit Trail” in SAS Language Reference: Concepts
AUDIT Statement 599
Syntax
AUDIT SAS-file <(SAS-password <ENCRYPTKEY=key-value>
<GENNUM=integer>)>;
INITIATE <AUDIT_ALL=NO | YES>;
LOG <ADMIN_IMAGE=YES | NO>
<BEFORE_IMAGE=YES | NO>
<DATA_IMAGE=YES | NO>
<ERROR_IMAGE=YES | NO>;
<SUSPEND | RESUME | TERMINATE; >
<USER_VAR variable(s) >;
Required Argument
SAS-file
specifies the SAS data file in the procedure input library that you want to audit.
Optional Arguments
SAS-password
specifies the password for the SAS data file, if one exists. The parentheses are
required.
ENCRYPTKEY=key-value
specifies the key value for AES encryption.
GENNUM=integer
specifies that the SUSPEND, RESUME, or TERMINATE action be performed on
the audit trail of a generation file. You cannot initiate an audit trail on a
generation file. Valid values for GENNUM= are integers that reference a specific
version from a generation group. Specifying a positive
number is an absolute reference to a specific generation number that is
appended to a data set's name (that is, gennum=2 specifies MYDATA#002).
Specifying a negative number is a relative reference to a historical version in
relation to the base version, from the youngest to the oldest (that is, gennum=-1
refers to the youngest historical version). Specifying 0, which is the default,
refers to the base version. The parentheses are required.
Details
Note: The initiation of an audit trail is possible only with the Base SAS engine.
The following example creates the audit file MyLib.MyFile.audit to log updates to
the data file MyLib.MyFile.data, storing all available record images:
proc datasets library=mylib;
audit myfile (alter=password);
initiate;
run;
The following example creates the same audit file but stores only error record
images:
proc datasets library=mylib;
audit myfile (alter=password);
initiate;
log admin_image=no
before_image=no
data_image=no;
run;
The AUDIT statement starts an audit run group for a file. Multiple audit run groups
for that file can be submitted in the following ways:
• in the same PROC DATASETS step
• in different PROC DATASETS steps
All audit file related statements (INITIATE, USER_VAR, LOG, SUSPEND, RESUME,
TERMINATE) must be preceded by an AUDIT statement, which identifies the file
that they apply to.
The INITIATE statement creates an audit file and must be submitted in the first
AUDIT run group for that file. No other audit-related statement, such as
USER_VAR, LOG, SUSPEND, RESUME, or TERMINATE, is valid for that audit
file until the INITIATE statement has been submitted. You can initiate an audit file
on a generation data set, but it must be the latest generation of the generation
group. You cannot use GENNUM= to identify the latest generation; the latest
generation is used by default.
Here is an example of the AUDIT statement in the first AUDIT RUN group for a
given file:
AUDIT file <(SAS-password)>;
Once an audit file has been initiated, the AUDIT statements that follow the
INITIATE statement for that file can specify a GENNUM:
AUDIT file <(<SAS-password><GENNUM=integer>)>;
The USER_VAR statement must directly follow the INITIATE statement in the same
AUDIT RUN group.
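As a sketch of how these run-group rules fit together, the following step initiates an audit trail with one user variable and then suspends, resumes, and terminates it in later run groups of the same PROC DATASETS step. The library MyLib, the data file MyFile, the Alter password, and the user variable Reason are hypothetical; MyLib is assumed to be a Base SAS engine library in which MyFile already exists.

```sas
/* Hypothetical library, data file, and password; adjust for your site. */
proc datasets library=mylib nolist;
   /* First run group for MyFile: INITIATE comes first, and USER_VAR */
   /* directly follows it in the same run group.                     */
   audit myfile (alter=password);
   initiate;
   user_var reason $ 20;
run;

   /* A later run group: stop logging temporarily. */
   audit myfile (alter=password);
   suspend;
run;

   /* Resume logging. */
   audit myfile (alter=password);
   resume;
run;

   /* Stop auditing; TERMINATE also deletes the audit file. */
   audit myfile (alter=password);
   terminate;
run;
quit;
```

Each run group begins with its own AUDIT statement, which satisfies the requirement that every audit-related statement be preceded by an AUDIT statement that identifies the file.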
CHANGE Statement
Renames one or more SAS files in the same SAS library.
Syntax
CHANGE old-name-1=new-name-1
<old-name-2=new-name-2 …>
</ <ENCRYPTKEY=key-value>
<ALTER=alter-password> <GENNUM=ALL | integer> <MEMTYPE=member-type>>;
Required Argument
old-name=new-name
changes the name of a SAS file in the input data library. old-name must be the
name of an existing SAS file in the input data library.
Optional Arguments
ENCRYPTKEY=key-value
specifies the key value for AES encryption. This option is needed only if a
relative GENNUM= value (a negative integer) is specified. For more information, see
“Library Contents and AES Encryption” on page 607.
ALTER=alter-password
provides the Alter password for any alter-protected SAS files named in the
CHANGE statement. Because the CHANGE statement renames SAS files, you need
Alter access to use it on an alter-protected file. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
GENNUM=ALL | integer
restricts processing for generation data sets. You can use the option either in
parentheses after the name of each SAS file or after a forward slash. The
following list shows valid values:
ALL | 0
refers to the base version and all historical versions of a generation group.
integer
refers to a specific version from a generation group. Specifying a positive
number is an absolute reference to a specific generation number that is
appended to a data set's name (for example, gennum=2 specifies MYDATA#002).
Specifying a negative number is a relative reference to a historical version in
relation to the base version, from the youngest to the oldest (for example,
gennum=-1 refers to the youngest historical version).
For example, either of the following steps changes the name of version A#003 to
base B:
proc datasets;
change A=B / gennum=3;
run;
proc datasets;
change A(gennum=3)=B;
run;
See “Restricting Processing for Generation Data Sets” on page 667 and
“Understanding Generation Data Sets” in SAS Language Reference:
Concepts
MEMTYPE=member-type
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases MTYPE=
MT=
Details
The CHANGE statement changes names by the order in which the old-names occur
in the directory listing, not in the order in which you list the changes in the
CHANGE statement.
If the old-name SAS file does not exist in the SAS library, PROC DATASETS stops
processing the RUN group containing the CHANGE statement and issues an error
message. To override this behavior, use the NOWARN option in the PROC
DATASETS statement.
If you change the name of a data set that has an index, the index continues to
correspond to the data set.
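A minimal, self-contained sketch: two hypothetical WORK data sets are created and then renamed in a single CHANGE statement. As described above, the renames are carried out in directory-listing order, not in the order in which they are written.

```sas
/* Create two small data sets in WORK (hypothetical names). */
data work.oldsales work.oldcosts;
   x = 1;
run;

proc datasets library=work nolist;
   /* Rename both data sets; MEMTYPE=DATA restricts processing to data files. */
   change oldsales=sales2023 oldcosts=costs2023 / memtype=data;
run;
quit;
```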
CONTENTS Statement
Describes the contents of one or more SAS data sets and prints the directory of the SAS library.
Restriction: You cannot use the WHERE option to affect the output because PROC CONTENTS
does not process any observations.
Notes: The ATTRIB statement does not affect the CONTENTS statement output.
CONTENTS reports the labels, informats, and formats on the actual member.
When using the CONTENTS statement with SAS/ACCESS LIBNAME engines, see
“Differences in the DATASETS Procedure Output When Using SAS/ACCESS
LIBNAME Engines” on page 572.
Tip: You can use data set options with the DATA=, OUT=, and OUT2= options. You can
use any global statements as well.
Example: “Example 5: Describing a SAS Data Set” on page 707
Syntax
CONTENTS <options>;
Optional Arguments
CENTILES
prints centiles information for indexed variables.
The following additional fields are printed in the default report of PROC
CONTENTS when the CENTILES option is selected and an index exists on the
data set. Note that the additional fields depend on whether the index is simple
or complex.
DATA=SAS-file-specification
specifies an entire library or a specific SAS data set within a library. SAS-file-
specification can take one of the following forms:
<libref.>SAS-data-set
names one SAS data set to process. The default for libref is the libref of the
procedure input library. For example, to obtain the contents of the SAS data
set HtWt from the procedure input library, use the following CONTENTS
statement:
contents data=HtWt;
To obtain the contents of a specific version from a generation group, use the
GENNUM= data set option as shown in the following CONTENTS statement:
contents data=HtWt(gennum=3);
<libref.>_ALL_
gives you information about all SAS data sets that have the type or types
specified by the MEMTYPE= option. libref refers to the SAS library. The
default for libref is the libref of the procedure input library.
• If you are using the _ALL_ keyword, you need Read access to all
read-protected SAS data sets in the SAS library.
• DATA=_ALL_ automatically prints a listing of the SAS files that are
contained in the SAS library. Note that for SAS views, all librefs that are
associated with the views must be assigned in the current session in
order for them to be processed for the listing.
Default most recently created data set in your job or session, from any SAS
library.
Tip If you specify a read-protected data set in the DATA= option but do
not give the Read password, by default the procedure looks in the
PROC DATASETS statement for the Read password. However, if you
do not specify the DATA= option and the default data set (last one
created in the session) is Read protected, the procedure does not look
in the PROC DATASETS statement for the Read password.
DETAILS | NODETAILS
includes information in the output about the number of observations, number of
variables, number of indexes, and data set labels. DETAILS includes these
additional columns of information in the output, but only if DIRECTORY is also
specified.
DIRECTORY
prints a list of all SAS files in the specified SAS library. If DETAILS is also
specified, using DIRECTORY causes the additional columns described in
DETAILS | NODETAILS on page 604 to be printed.
ENCRYPTKEY=key-value
specifies the key value for AES encryption. For more information, see “Library
Contents and AES Encryption” on page 607.
FMTLEN
prints the length of the informat or format. If you do not specify a length for the
informat or format when you associate it with a variable, the length does not
appear in the output of the CONTENTS statement unless you use the FMTLEN
option. The length also appears in the FORMATL or INFORML variable in the
output data set.
MEMTYPE=(member-type(s))
restricts processing to one or more member types. The CONTENTS statement
produces output only for member types DATA, VIEW, and ALL, which includes
DATA and VIEW.
• You cannot enclose the MEMTYPE= option in parentheses to limit its effect
to only the SAS file immediately preceding it.
Aliases MTYPE=
MT=
Default DATA
NODS
suppresses printing the contents of individual files when you specify _ALL_ in
the DATA= option. The CONTENTS statement prints only the SAS library
directory. You cannot use the NODS option when you specify only one SAS data
set in the DATA= option.
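For example, the following sketch prints only the directory of the SASHELP library, without the contents of each data set. SASHELP is used only because it is available in a standard SAS installation.

```sas
proc datasets library=sashelp nolist;
   /* _ALL_ with NODS: print the library directory only. */
   contents data=_all_ nods;
run;
quit;
```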
NODETAILS
See “DETAILS|NODETAILS” on page 604.
NOPRINT
suppresses printing the output of the CONTENTS statement.
Note The ORDER= option does not affect the order of the OUT= and OUT2=
data sets.
Example See “Example 4: Using the ORDER= Option” on page 515 to compare
the default and the four options for ORDER=.
OUT=SAS-data-set
names an output SAS data set.
Tip OUT= does not suppress the printed output from the statement. If you
want to suppress the printed output, you must use the NOPRINT option.
See “The OUT= Data Set” on page 683 for a description of the variables in the
OUT= data set.
OUT2=SAS-data-set
names the output data set to contain information about indexes and integrity
constraints.
Tips If UPDATECENTILES was not specified in the index definition, then the
default value of 5 is used in the re-create variable of the OUT2 data set.
OUT2= does not suppress the printed output from the statement. To
suppress the printed output, use the NOPRINT option.
See “The OUT2= Data Set” on page 689 for a description of the variables in
the OUT2= data set.
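As a sketch of capturing the description in a data set instead of the printed report, the following step combines OUT= with NOPRINT. The output name WORK.Class_Vars is hypothetical; SASHELP.Class is a sample data set shipped with SAS.

```sas
proc datasets library=sashelp nolist;
   /* Write variable attributes to WORK.CLASS_VARS and print nothing. */
   contents data=class out=work.class_vars noprint;
run;
quit;

/* Show a few of the columns in the OUT= data set. */
proc print data=work.class_vars noobs;
   var name type length varnum;
run;
```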
SHORT
prints only the list of variable names, the index information, and the sort
information for the SAS data set.
Restriction If the list of variables is more than 32,767 characters, the list is
truncated and a WARNING is written to the SAS log. To get a
complete list of the variables, request an alphabetical listing of the
variables.
VARNUM
prints a list of the variable names in the order of their logical position in the data
set. By default, the CONTENTS statement lists the variables alphabetically. The
physical position of the variable in the data set is engine-dependent.
Details
Printing Variables
The CONTENTS statement prints an alphabetical listing of the variables by default,
except for variables in the form of a numbered range list. Numbered range lists,
such as x1–x100, are printed in incrementing numeric order (x1, x2, x3, and so on)
rather than in alphabetical order (x1, x10, x100, x11, and so on). For more
information, see “Alphabetic List of Variables and Attributes” on page 678.
Note: If a label is changed after a view is created from a data set with variable
labels, the CONTENTS or DATASETS procedure output shows the original labels.
The view must be recompiled in order for the CONTENTS or DATASETS procedure
output to reflect the new variable labels.
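To compare the default ordering with the VARNUM ordering, the following sketch creates a hypothetical data set that contains a numbered range list and describes it both ways.

```sas
/* Hypothetical data set with a numbered range list x1-x12. */
data work.ranges;
   length id 8 x1-x12 8 name $ 10;
   call missing(of _all_);   /* avoid uninitialized-variable notes */
run;

proc datasets library=work nolist;
   /* Default: alphabetical, with x1-x12 kept in numeric order. */
   contents data=ranges;
   /* VARNUM: variables in order of logical position. */
   contents data=ranges varnum;
run;
quit;
```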
If the key value does not match the key value for a particular data file in the library,
then you will be prompted to enter the correct key value.
For more information about AES encryption, see “AES Encryption” in SAS
Programmer’s Guide: Essentials. For more information about the ENCRYPTKEY=
data set option, see “ENCRYPTKEY= Data Set Option” in SAS Data Set Options:
Reference.
Variables DD and FF, the only true numeric doubles, are at offsets 0 and 8,
respectively, so they are automatically aligned. The rest of the observation
contains the remaining numeric variables and then character variables.
The last physical variable in this layout is CC with an offset of 32 and a length of
10. This gives you an internal length of 42, even though PROC CONTENTS
reports the observation length as 48. The difference is the 6 bytes of padding so
that the next observation is aligned on an 8-byte boundary within the disk
page buffer.
• No alignment is done when the observation does not contain 8-byte numeric
variables as demonstrated in the next example, which gives you an observation
length of 7 and no padding between observations within disk page buffers:
data b;
length aa 6 cc $1;
aa = 1;
cc = 'x';
output;
run;
• Observations for compressed data sets are not aligned within the disk page
buffer, but the same algorithm is used for positioning the variables within the
observations. Compressed observations must be uncompressed and moved into
a work buffer. The 8-byte numeric values will be aligned and ready for use
immediately after uncompressing. The observation length in the PROC
CONTENTS output might be larger due to operating system-specific overhead.
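To see the alignment rule in practice, the following sketch creates two small hypothetical data sets and describes both. Under the rule above, one would expect WORK.Unaligned to report an observation length of 7 (as for data set B) and WORK.Aligned to report a length padded up to a multiple of 8; treat this as an expectation to check against the output on your platform rather than a guaranteed result.

```sas
/* One 8-byte numeric plus one character byte: subject to alignment. */
data work.aligned;
   length dd 8 cc $ 1;
   dd = 1;
   cc = 'x';
run;

/* No 8-byte numeric (like data set B): no padding expected. */
data work.unaligned;
   length aa 6 cc $ 1;
   aa = 1;
   cc = 'x';
run;

proc datasets library=work nolist;
   /* Compare the Observation Length field in the two reports. */
   contents data=aligned;
   contents data=unaligned;
run;
quit;
```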
COPY Statement
Copies all or some of the SAS files in a SAS library.
Restriction: The COPY statement does not support data set options.
Notes: When using the COPY statement with SAS/ACCESS LIBNAME engines, see
“Differences in the DATASETS Procedure Output When Using SAS/ACCESS
LIBNAME Engines” on page 572.
For CAS engine specifics, see “CAS Processing for PROC COPY” on page 523.
Tips: See the example in PROC COPY on page 534 to migrate from a 32-bit machine to a
64-bit machine.
The COPY statement defaults to the encoding and data representation of the
output library when you use Remote Library Services (RLS) such as SAS/SHARE or
SAS/CONNECT. If you are not using RLS, you must use the NOCLONE option
for the output files to take on the encoding and data representation of
the output library. Using the NOCLONE option results in a copy with the data
representation of the output library (if specified in the OUTREP= LIBNAME option) or
the native data representation of the operating environment.
Example: “Example 2: Manipulating SAS Files” on page 696
Syntax
COPY <ACCEL | NOACCEL> OUT=libref-1
<CLONE | NOCLONE>
<CONSTRAINT=YES | NO>
<DATECOPY>
<ENCRYPTKEY=key-value>
<FORCE>
IN=libref-2
<INDEX=YES | NO>
<MEMTYPE=(member-type(s))>
<MOVE <ALTER=alter-password>>
<OVERRIDE=(ds-option-1=value-1 <ds-option-2=value-2 …> ) >;
Required Argument
OUT=libref-1
names the SAS library to copy SAS files to.
Optional Arguments
ACCEL | NOACCEL
specifies whether to perform the copy operation in CAS. Both the IN= and OUT=
libraries must be CAS engine libraries and they must use the same CAS session.
Default ACCEL
CLONE | NOCLONE
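specifies whether to copy the following data set attributes:
• size of input/output buffers
• whether the data set is compressed
• whether free space is reused
• data representation
• encoding value
• the POINTOBS= setting for a compressed data set
These attributes are specified with data set options, SAS system options, and
LIBNAME statement options:
• BUFSIZE= value for the size of the input/output buffers
• COMPRESS= value for compression
• REUSE= value for reuse of free space
• OUTREP= value for data representation
• ENCODING= value for encoding
• POINTOBS= value for compressed data sets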
For the BUFSIZE= attribute, the following table summarizes how the COPY
statement works:
Table 17.5 CLONE and the Buffer Page Size Attribute
CLONE Uses the BUFSIZE= value from the input data set for the
output data set. However, specifying a BUFSIZE= value in
the OVERRIDE= option list results in a copy that uses the
specified value.
For the COMPRESS= attribute, the following table summarizes how the COPY
statement works:
Table 17.6 CLONE and the Compression Attribute
CLONE Uses the values from the input data set for the output
data set. However, specifying a COMPRESS= value in the
OVERRIDE= option list results in a copy that uses the
specified value.
For the REUSE= attribute, the following table summarizes how the COPY
statement works:
Table 17.7 CLONE and the Reuse Space Attribute
CLONE Uses the values from the input data set for the output data
set. If the engine for the input data set does not support
the reuse space attribute, then the COPY statement uses
the current setting of the corresponding SAS system
option. However, specifying a REUSE= value in the
OVERRIDE= option list results in a copy that uses the
specified value.
For the OUTREP= attribute, the following table summarizes how the COPY
statement works:
Table 17.8 CLONE and the Data Representation Attribute
A file has native data representation when its data representation is the same
as that of the CPU operating environment. For example, a file in Windows data
representation is native to the Windows operating environment.
For the ENCODING= attribute, the following table summarizes how the COPY
statement works.
Table 17.9 CLONE and the Encoding Attribute
CLONE Results in a copy that uses the encoding of the input data
set or, if specified, the value of the INENCODING= option
in the LIBNAME statement for the input library. However,
specifying an ENCODING= value in the OVERRIDE= option
list results in a copy that uses the specified encoding.
For the POINTOBS= attribute, the following table summarizes how the COPY
statement works. To use POINTOBS=, the output data set must be compressed.
CLONE Uses the POINTOBS= value from the input data set for the
output data set. However, specifying a POINTOBS= value in
the OVERRIDE= option list results in a copy that uses the
specified value.
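The tables above share one pattern: CLONE carries the input attributes forward unless the OVERRIDE= list names a value. A minimal sketch follows; the output library TARGET is hypothetical and is placed in a subdirectory of WORK (created through the DLCREATEDIR system option) only so that the example is self-contained.

```sas
/* Hypothetical output library: a TARGET subdirectory of WORK.  */
/* DLCREATEDIR lets the LIBNAME statement create the directory. */
options dlcreatedir;
libname target "%sysfunc(pathname(work))/target";

data work.bigtable;
   do i = 1 to 1000;
      output;
   end;
run;

proc datasets library=work nolist;
   /* CLONE copies the input attributes, but the OVERRIDE= value */
   /* for COMPRESS= takes precedence, so the copy is compressed. */
   copy in=work out=target clone override=(compress=yes);
      select bigtable;
run;
quit;
```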
CONSTRAINT=YES | NO
specifies whether to copy all integrity constraints when copying a data set.
Default NO
Tip For data sets with integrity constraints that have a foreign key, the
COPY statement copies the general and referential constraints if
CONSTRAINT=YES is specified and the entire library is copied. If you
use the SELECT or EXCLUDE statement to copy the data sets, then the
referential integrity constraints are not copied. For more information,
see “Understanding Integrity Constraints” in SAS Language Reference:
Concepts.
DATECOPY
copies the SAS internal date and time at which the SAS file was created and
when it was last modified to the resulting copy of the file. Note that the
operating environment date and time are not preserved.
DATECOPY can be used only when the resulting SAS file uses the
V8 or V9 engine.
Tips You can alter the file creation date and time with the DTC= option
in the MODIFY statement. See MODIFY statement on page 644.
If the file that you are copying has attributes that require additional
processing, the last modified date is changed to the current date.
For example, when you copy a data set that has an index, the index
must be rebuilt, and the last modified date changes to the current
date. Other attributes that require additional processing and that
could affect the last modified date include integrity constraints and
a sort indicator.
ENCRYPTKEY=key-value
specifies the key value needed to copy data sets in the IN= library that have
AES encryption.
Note If the output library does not support AES encryption and the input data
set is AES-encrypted, the COPY statement produces an error.
FORCE
enables you to use the MOVE option for a SAS data set on which an audit trail
exists.
Note The audit file is not moved with the audited data set.
IN=libref-2
names the SAS library containing SAS files to copy.
INDEX=YES | NO
specifies whether to copy all indexes for a data set when copying the data set to
another SAS library.
Default YES
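As a sketch of CONSTRAINT=, INDEX=, and DATECOPY used together, the following step gives a hypothetical data set a primary key (which also creates an index) and copies it to a hypothetical TARGET library under WORK. Per the DATECOPY tip above, rebuilding the index can still change the last modified date of the copy.

```sas
options dlcreatedir;
libname target "%sysfunc(pathname(work))/target";

/* Hypothetical data set that receives a primary key constraint. */
data work.orders;
   do id = 1 to 5;
      amount = id * 10;
      output;
   end;
run;

proc datasets library=work nolist;
   modify orders;
      ic create pk_id = primary key(id);
run;
   /* Copy the data set with its index and integrity constraint, */
   /* preserving the SAS creation and modification dates.        */
   copy in=work out=target constraint=yes index=yes datecopy;
      select orders;
run;
quit;
```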
MEMTYPE=(member-type(s))
restricts processing to one or more member types.
Aliases MTYPE=
MT=
Note When PROC COPY processes a SAS library on tape and the
MEMTYPE= option is not specified, it scans the entire sequential
library for entries until it reaches the end-of-file. If the sequential
library is a multivolume tape, all tape volumes are mounted. This
behavior is also true for single-volume tape libraries.
MOVE
moves SAS files from the input data library (named with the IN= option) to the
output data library (named with the OUT= option) and deletes the original files
from the input data library.
Restriction The MOVE option can be used to delete a member of a SAS library
only if the IN= engine supports the deletion of tables. A tape format
engine does not support table deletion. If you use a tape format
engine, SAS suppresses the MOVE operation and prints a warning.
ALTER=alter-password
provides the Alter password for any alter-protected SAS files that you are
moving from one data library to another. Because the MOVE option deletes the
SAS file from the original data library, you need Alter access to move the SAS
file.
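A minimal sketch of MOVE: the hypothetical data set WORK.Scratch is copied to a hypothetical TARGET library under WORK and is then removed from WORK. WORK uses the Base SAS engine, which supports deleting members.

```sas
options dlcreatedir;
libname target "%sysfunc(pathname(work))/target";

data work.scratch;
   x = 1;
run;

proc datasets library=work nolist;
   /* MOVE copies SCRATCH to TARGET and then deletes it from WORK. */
   copy in=work out=target move;
      select scratch;
run;
quit;
```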
Interaction When you specify an OUTREP= value in the OVERRIDE= option, the
default encoding is based on the operating environment that is
represented by the OUTREP= value and the locale of the current
SAS session. To assign a nondefault encoding such as UTF-8, you
must also specify an ENCODING= value in the OVERRIDE= option.
For more information about locale and encoding, see SAS National
Language Support (NLS): Reference Guide.
NOCLONE
See the description of “CLONE|NOCLONE” on page 611.
Details
To display information in the SAS log about the copy method that is being used,
you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
The following message is written to the SAS log if the block I/O method is not
used:
INFO: Data set block I/O cannot be used because:
If the COPY statement determines that the block I/O method will not be used, one of the
following explanations is written to the SAS log:
INFO: - The data sets use different engines, have different variables
or have attributes that might differ.
If the Base SAS engine determines that the block I/O method will not be used, one
of the following explanations is written to the SAS log:
INFO: - Referential Integrity Constraints exist.
If you are having performance issues and want to create a subset of a large data set
for testing, you can use the OBS=0 option. In this case, you want to reduce the use
of system resources by disabling the block I/O method.
The following example uses the OBS=0 option to reduce the use of system
resources:
options obs=0 msglevel=i;
proc copy in=old out=lib;
select a;
run;
You get the same results when you use the SET statement:
data lib.new;
if 0 then set old.a;
stop;
run;
You can also select or exclude an abbreviated list of members. For example, the
following statement selects members Tabs, Test1, Test2, and Test3:
select tabs test1-test3;
Also, you can select a group of members whose names begin with the same letter
or letters by entering the common letters followed by a colon (:). For example, you
can select the four members in the previous example and all other members having
names that begin with the letter T by specifying the following statement:
select t:;
You specify members to exclude in the same way that you specify those to select.
That is, you can list individual member names, use an abbreviated list, or specify a
common letter or letters followed by a colon (:). For example, the following
statement excludes the members Stats, Teams1, Teams2, Teams3, Teams4 and all
the members that begin with the letters RBI from the copy operation:
exclude stats teams1-teams4 rbi:;
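For context, the following sketch places that EXCLUDE statement in a complete step. The members and the TARGET library (a subdirectory of WORK) are hypothetical; MEMTYPE=DATA keeps other WORK members, such as catalogs, out of the copy.

```sas
options dlcreatedir;
libname target "%sysfunc(pathname(work))/target";

/* Hypothetical members that match the names used above. */
data work.stats work.teams1 work.teams2 work.teams3 work.teams4
     work.rbi_home work.tabs;
   x = 1;
run;

proc datasets library=work nolist;
   /* Copy WORK data sets except STATS, TEAMS1-TEAMS4, and RBI: members. */
   copy in=work out=target memtype=data;
      exclude stats teams1-teams4 rbi:;
run;
quit;
```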
Note that the MEMTYPE= option affects which types of members are available to
be selected or excluded.
The PROC COPY option ACCEL | NOACCEL determines where the COPY procedure
executes. ACCEL is the default, which means the COPY operation runs on the CAS
server. If the COPY procedure fails on the CAS server, it does not attempt to copy
the file by pulling the observations into V9 SAS.
If the OVERRIDE option is specified on the PROC COPY invocation, the option is
ignored, and a note is sent to the log.
You cannot copy between sessions with PROC COPY. If the CAS libraries are
defined with two different sessions, the following error is displayed: ERROR: The
CAS sessions SESS1 and SESS2 used for this action do not match.
• You cannot limit the effect of the MEMTYPE= option to the member immediately
preceding it by enclosing the option in parentheses.
• The SELECT and EXCLUDE statements and the IN= option (in the COPY
statement) affect the behavior of the MEMTYPE= option in the COPY
statement according to the following rules:
2 If you do not use the IN= option, or you use it to specify the library that
happens to be the procedure input library, the value of the MEMTYPE=
option in the PROC DATASETS statement limits the types of SAS files that
are available for processing. The procedure uses the order of precedence
described in rule 1 to further subset the types available for copying. The
following statements do not copy any members from the default data library
to the Dest data library. Instead, the procedure issues an error message
because the MEMTYPE= value specified in the SELECT statement is not one
of the values of the MEMTYPE= option in the PROC DATASETS statement.
/* This step fails! */
proc datasets memtype=(data program);
copy out=dest;
select apples / memtype=catalog;
run;
3 If you specify an input data library in the IN= option other than the procedure
input library, the MEMTYPE= option in the PROC DATASETS statement has
no effect on the copy operation. Because no subsetting has yet occurred, the
procedure uses the order of precedence described in rule 1 to subset the
types available for copying. The following statements successfully copy
Bodyfat.data to the Dest data library because the Source library specified in
the IN= option in the COPY statement is not affected by the MEMTYPE=
option in the PROC DATASETS statement.
proc datasets library=work memtype=catalog;
copy in=source out=dest;
select bodyfat / memtype=data;
run;
When using the COPY statement, an in-memory directory of the library is obtained.
This can be a performance issue if the library has thousands of members and only a
few members are being copied. To resolve this performance issue, use a
combination of the MEMTYPE= option in the COPY statement with a SELECT
statement. Here is an example of this process:
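A minimal sketch of that combination follows. WORK stands in for a large library, the member names are hypothetical, and TARGET is a hypothetical subdirectory of WORK.

```sas
options dlcreatedir;
libname target "%sysfunc(pathname(work))/target";

/* Hypothetical members; only two of them are needed in TARGET. */
data work.orders work.customers work.other;
   x = 1;
run;

proc datasets library=work nolist;
   /* Restrict to data files and name the members explicitly. */
   copy in=work out=target memtype=data;
      select orders customers;
run;
quit;
```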
Copying Views
The COPY statement with NOCLONE specified supports the OUTREP= and
ENCODING= LIBNAME options for SQL views, DATA step views, and some SAS/ACCESS views
(Oracle and Sybase). When you use the COPY statement with Remote Library
Services (RLS) such as SAS/SHARE or SAS/CONNECT, the COPY statement
defaults to the encoding and data representation of the output library.
CAUTION
If you use the DATA statement's SOURCE=NOSAVE option when creating a
DATA step view, the view cannot be copied from one version of SAS to
another version.
When a variable name is truncated, the variable name is shortened to eight bytes. If
this name has already been defined in the data set, the name is shortened and a
digit is added, starting with the number 2. The process of truncation and adding a
digit continues until the variable name is unique. For example, a variable named
LONGVARNAME becomes LONGVARN, provided that a variable with that name
does not already exist in the data set. If it does exist, the variable name becomes
LONGVAR2.
CAUTION
Truncated variable names can collide with names already defined in the input
data set. This behavior is possible when the variable name that is already defined is
exactly eight bytes long and ends in a digit. In the following example, the truncated name
is defined in the output data set and the name from the input data set is changed:
options validvarname=any;
data test;
longvar10='aLongVariableName';
retain longvar1-longvar5 0;
run;
options validvarname=v6;
proc copy in=work out=sasuser;
select test;
run;
In this example, LONGVAR10 is truncated to LONGVAR1 and placed in the output data
set. Next, the original LONGVAR1 is copied. Its name is no longer unique. Therefore, it
is renamed LONGVAR2. The other variables in the input data set are also renamed
according to the renaming algorithm. The following example is from the SAS log:
1 options validvarname=any;
2 data test;
3 longvar10='aLongVariableName';
4 retain longvar1-longvar5 0;
5 run;
6
7 options validvarname=v6;
8 proc copy in=work out=sasuser;
9 select test;
10 run;
11
12 proc print data=test;
13 run;
• If the ENCRYPTKEY= key-value is different for each data set, use the SELECT
statement as shown in the following example:
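A sketch of that approach, assuming a SAS release that supports ENCRYPT=AES: each COPY statement supplies one key value and is limited by SELECT to the data set that uses it. The data set names, key values, and TARGET library are hypothetical, and this is an illustration rather than the guide's own example.

```sas
options dlcreatedir;
libname target "%sysfunc(pathname(work))/target";

/* Two hypothetical AES-encrypted data sets with different keys. */
data work.secret1(encrypt=aes encryptkey=key1);
   x = 1;
run;

data work.secret2(encrypt=aes encryptkey=key2);
   x = 2;
run;

proc datasets library=work nolist;
   /* One COPY statement per key value, each limited by SELECT. */
   copy in=work out=target encryptkey=key1;
      select secret1;
run;
   copy in=work out=target encryptkey=key2;
      select secret2;
run;
quit;
```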
For more information about AES encryption, see “AES Encryption” in SAS
Programmer’s Guide: Essentials. For more information about the ENCRYPTKEY=
data set option, see “ENCRYPTKEY= Data Set Option” in SAS Data Set Options:
Reference.
However, the above code does not work if there are multiple primary key and
referential (foreign key) data set pairs in the IN= library and each pair has a different
ENCRYPTKEY= value. For example, if there are two pairs:
primarydset1/foreigndset1 having ENCRYPTKEY=secret1
primarydset2/foreigndset2 having ENCRYPTKEY=secret2
then the above scheme would work for only one pair. All pairs must have the same
key value.
For more information, see “ENCRYPTKEY= Data Set Option” in SAS Data Set
Options: Reference.
• PROC DATASETS cannot work with libraries that allow only sequential data
access.
• The COPY statement honors the NOWARN option, but PROC COPY does not.
DELETE Statement
Deletes SAS files from a SAS library.
Syntax
DELETE SAS-file(s)
</ <ALTER=alter-password>
<ENCRYPTKEY=key-value>
<GENNUM=ALL | HIST | REVERT | integer>
<MEMTYPE=member-type>>;
Required Argument
SAS-file(s)
specifies one or more SAS files that you want to delete. If the SAS file has an
ALTER= password assigned, it must be specified in order to delete the SAS file.
You can also use a numbered range list or colon list. For more information, see
“Data Set Name Lists” in SAS Programmer’s Guide: Essentials.
Optional Arguments
ALTER=alter-password
provides the Alter password for any alter-protected SAS files that you want to
delete. You can use the option either in parentheses after the name of each SAS
file or after a forward slash.
ENCRYPTKEY=key-value
specifies the key value for AES encryption. This option is required when
specifying GENNUM=REVERT (which is the same as GENNUM=0) or
GENNUM=relative-generation-number. The key value for the ENCRYPTKEY=
option must be the key value for the base version. For more information, see
“Library Contents and AES Encryption” on page 607.
MEMTYPE=member-type
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases MTYPE=
MT=
Default DATA
Details
The Basics
SAS immediately deletes SAS files when the RUN group executes. You do not have
an opportunity to verify the Delete operation before it begins.
If the SAS file has an ALTER= password assigned, it must be specified in order to
delete the SAS file.
If you attempt to delete a SAS file that does not exist in the procedure input library,
PROC DATASETS issues a message and continues processing. If NOWARN is used,
no message is issued.
When you use the DELETE statement to delete a data set that has indexes
associated with it, the statement also deletes the indexes.
You cannot use the DELETE statement to delete a data file that has a foreign key
integrity constraint or a primary key with foreign key references. For data files that
have foreign keys, you must remove the foreign keys before you delete the data file.
For data files that have primary keys with foreign key references, you must remove
the foreign keys that reference the primary key before you delete the data file.
If you attempt to delete a CAS table that does not exist in the input CAS engine
libref, a message is written to the log and processing continues. If NOWARN is
used, no message is issued.
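As a sketch of these rules, the following step deletes three hypothetical data sets with a numbered range list and also names a data set, NotHere, that does not exist. Because NOWARN is specified in the PROC DATASETS statement, no message is expected for the missing file and processing continues.

```sas
data work.temp1 work.temp2 work.temp3 work.keep_me;
   x = 1;
run;

proc datasets library=work nolist nowarn;
   /* Numbered range list deletes TEMP1, TEMP2, and TEMP3.     */
   /* NOTHERE does not exist; NOWARN suppresses the message.   */
   delete temp1-temp3 nothere;
run;
quit;
```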
• delete the base version and rename the youngest historical version to the base
version
• delete an absolute version
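The following statements delete the base version and all historical versions of the
generation group named A:
proc datasets;
delete A(gennum=all);
quit;
proc datasets;
delete A / gennum=all;
quit;
The following statements delete the base version and all historical versions where
the data set name begins with the letter A:
proc datasets;
delete A:(gennum=all);
quit;
proc datasets;
delete A: / gennum=all;
quit;
The following statements delete the base version of the generation group named A
and rename the youngest historical version to the base version:
proc datasets;
delete A(gennum=revert);
quit;
proc datasets;
delete A / gennum=revert;
quit;
The following statements delete the base version and rename the youngest
historical version to the base version, where the data set name begins with the
letter A:
proc datasets;
delete A:(gennum=revert);
quit;
proc datasets;
delete A: / gennum=revert;
quit;
The following statements delete a specific historical version (generation 1, that is,
A#001) of the generation group named A:
proc datasets;
delete A(gennum=1);
quit;
proc datasets;
delete A / gennum=1;
quit;
The following statements delete a specific historical version, where the data set
name begins with the letter A:
proc datasets;
delete A:(gennum=1);
quit;
proc datasets;
delete A: / gennum=1;
quit;
The following statements use a relative number to delete the youngest historical
version of the generation group named A:
proc datasets;
delete A(gennum=-1);
quit;
proc datasets;
delete A / gennum=-1;
quit;
The following statements use a relative number to delete the youngest historical
version, where the data set name begins with the letter A:
proc datasets;
delete A:(gennum=-1);
quit;
proc datasets;
delete A: / gennum=-1;
quit;
The following statements delete all historical versions of the generation group
named A and leave the base version:
proc datasets;
delete A(gennum=hist);
quit;
proc datasets;
delete A / gennum=hist;
quit;
The following statements delete all historical versions and leave the base version,
where the data set name begins with the letter A:
proc datasets;
delete A:(gennum=hist);
quit;
proc datasets;
delete A: / gennum=hist;
quit;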
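As a minimal sketch of GENNUM=HIST in a complete program, the following steps build a small generation group and then delete its historical version while keeping the base version. The data set name GenDemo is hypothetical, and the sketch assumes that the WORK library supports generation data sets (they are enabled by a GENMAX= value greater than 0).

```sas
/* Hypothetical data set WORK.GENDEMO; GENMAX=3 keeps up to 3 versions */
data work.gendemo(genmax=3);
  x = 1;
run;

/* Replacing the data set turns the first version into a historical version */
data work.gendemo;
  x = 2;
run;

proc datasets library=work nolist;
  delete gendemo / gennum=hist;   /* delete historical versions, keep the base */
run;
quit;
```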
EXCHANGE Statement
Exchanges the names of two SAS files in a SAS library.
Syntax
EXCHANGE name-1=other-name-1 <name-2=other-name-2 …>
</ <ALTER=alter-password> <MEMTYPE=member-type>>;
Required Argument
name=other-name
exchanges the names of SAS files in the procedure input library. Both name and
other-name must already exist in the procedure input library.
Optional Arguments
ALTER=alter-password
provides the Alter password for any alter-protected SAS files whose names you
want to exchange. You can use the option either in parentheses after the name
of each SAS file or after a forward slash.
MEMTYPE=member-type
restricts processing to one member type. You can exchange only the names of
SAS files of the same type. You can use the option either in parentheses after
the name of each SAS file or after a forward slash.
Details
When you exchange more than one pair of names in one EXCHANGE statement,
PROC DATASETS performs the exchanges in the order in which the names of the
SAS files occur in the directory listing, not in the order in which you list the
exchanges in the EXCHANGE statement.
If the SAS file specified by name does not exist in the SAS library, PROC DATASETS
stops processing the RUN group that contains the EXCHANGE statement and issues
an error message. To override this behavior, specify the NOWARN option in the
PROC DATASETS statement.
The EXCHANGE statement also exchanges the associated indexes so that they
correspond with the new name.
The EXCHANGE statement allows only two existing generation groups to exchange
names. You cannot exchange a specific generation number with either an existing
base version or another generation number.
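The following sketch swaps the names of two data sets in one EXCHANGE statement. The data set names Current and Staging and the variable Src are hypothetical; after the step, the data that was stored as Staging is available under the name Current, and vice versa.

```sas
/* Hypothetical data sets WORK.CURRENT and WORK.STAGING */
data work.current; src = 'current'; run;
data work.staging; src = 'staging'; run;

proc datasets library=work nolist;
  exchange current=staging;   /* swap the two member names */
run;
quit;
```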
EXCLUDE Statement
Excludes SAS files from copying.
Syntax
EXCLUDE SAS-file(s) </ MEMTYPE=member-type>;
Required Argument
SAS-file(s)
specifies one or more SAS files to exclude from the copy operation. All SAS files
you name in the EXCLUDE statement must be in the library that is specified in
the IN= option in the COPY statement. If the SAS files are generation groups,
the EXCLUDE statement allows only selection of the base versions.
You can use the following shortcuts to list several SAS files in the EXCLUDE
statement:
Table 17.11 Using the EXCLUDE Statement
Notation Meaning
Optional Argument
MEMTYPE=member-type
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases MTYPE=
MT=
Details
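The EXCLUDE statement is used together with the COPY statement. As a minimal sketch, the following program copies every data set in WORK to a second library except one. The libref DEST, its folder under the WORK directory, and the data set names are hypothetical; the DLCREATEDIR system option is used only so that the LIBNAME statement can create the folder.

```sas
/* Hypothetical members; SKIPME is the one left behind */
data work.keep1;  a = 1; run;
data work.keep2;  a = 2; run;
data work.skipme; a = 3; run;

options dlcreatedir;                            /* allow LIBNAME to create the folder */
libname dest "%sysfunc(pathname(work))/dest";   /* hypothetical target library */

proc datasets library=work nolist;
  copy in=work out=dest memtype=data;   /* copy only DATA members */
    exclude skipme;                     /* ...except this one */
run;
quit;
```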
FORMAT Statement
Permanently assigns, changes, and removes variable formats in the SAS data set specified
in the MODIFY statement.
Syntax
FORMAT variable-1 <format-1>
<variable-2 <format-2> …>;
Required Argument
variable
specifies one or more variables whose format you want to assign, change, or
remove. If you want to disassociate a format from a variable, list the variable
last in the list with no format following:
format x1-x3 4.1 time hhmm2.2 age;
Optional Argument
format
specifies a format to apply to the variable or variables listed before it. If you do
not specify a format, the FORMAT statement removes any format associated
with the variables in variable-list.
Tip To remove all formats from a data set, use the ATTRIB Statement on page
597 and the _ALL_ keyword.
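As a minimal sketch, the following program assigns formats to two variables in one RUN group and then removes the format from one of them by listing it with no format. The data set Sales and its variables are hypothetical.

```sas
/* Hypothetical data set WORK.SALES */
data work.sales;
  amount = 1234.5;
  sold = '15JAN2024'd;
run;

proc datasets library=work nolist;
  modify sales;
    format amount dollar10.2 sold date9.;   /* assign formats */
run;
  modify sales;
    format amount;                          /* no format follows: remove it */
run;
quit;
```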
IC CREATE Statement
Creates an integrity constraint.
Syntax
IC CREATE <constraint-name=> constraint <MESSAGE='message-string'
<MSGTYPE=USER>>;
Required Argument
constraint
is the type of constraint. Here is a list of valid values:
NOT NULL(variable)
specifies that variable does not contain a SAS missing value, including special
missing values.
UNIQUE(variables)
specifies that the values of variables must be unique. This constraint is identical
to DISTINCT.
DISTINCT(variables)
specifies that the values of variables must be unique. This constraint is identical
to UNIQUE.
CHECK(WHERE-expression)
limits the data values of variables to a specific set, range, or list of values. This
behavior is accomplished with a WHERE expression.
PRIMARY KEY(variables)
specifies a primary key, that is, a set of variables that do not contain missing
values and whose values are unique.
When you define overlapping primary key and foreign key constraints (that is, when
variables in a data file are part of both a primary key and a foreign key definition)
and you use exactly the same variables in both, the variables must be defined in a
different order.
A primary key affects the values of an individual data file until it has a foreign
key referencing it.
Note: If a not null constraint exists for a variable that is being used to define a
new primary key constraint, then the primary key constraint replaces the
existing not null constraint.
FOREIGN KEY(variables) REFERENCES table-name <ON DELETE referential-action>
<ON UPDATE referential-action>
specifies a foreign key, that is, a set of variables whose values are linked to the
values of the primary key variables in another data file. The referential action
determines what happens when a referenced primary key value is deleted or updated.
The following operations can be done with the RESTRICT referential action:
a delete operation
deletes the primary key row, but only if no foreign key values match the deleted
value.
an update operation
updates the primary key value, but only if no foreign key values match the current
value to be updated.
The following operations can be done with the SET NULL referential action:
a delete operation
deletes the primary key row and sets the corresponding foreign key values to NULL.
an update operation
modifies the primary key value and sets all matching foreign key values to NULL.
The following operations can be done with the CASCADE referential action:
an update operation
modifies the primary key value, and also modifies any matching foreign key values
to the same value. CASCADE is not supported for Delete operations.
When a foreign key overlaps a primary key, the following requirements apply: if you
use exactly the same variables, then the variables must be defined in a different
order, and the foreign key's update and delete referential actions must both be
RESTRICT.
Optional Arguments
<constraint-name=>
is an optional name for the constraint. The name must be a valid SAS name.
When you do not supply a constraint name, a default name is generated. This
default constraint name has the following form:
Table 17.12 Using RESTRICT=
_UNxxxx_ Unique
_CKxxxx_ Check
<MESSAGE='message-string' <MSGTYPE=USER>>
message-string is the text of an error message to be written to the log when the
data fails the constraint:
ic create not null(socsec)
message='Invalid Social Security number';
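As a minimal sketch, the following program creates a named primary key and a named CHECK constraint, each with its own log message. The data set Employees, its variables, and the constraint names Pk_Id and Chk_Sal are hypothetical.

```sas
/* Hypothetical data set WORK.EMPLOYEES */
data work.employees;
  input id dept $ salary;
  datalines;
1 01 50000
2 02 60000
;
run;

proc datasets library=work nolist;
  modify employees;
    ic create pk_id = primary key(id)
       message='ID must be unique and nonmissing';
    ic create chk_sal = check(where=(salary > 0))
       message='Salary must be positive';
run;
quit;
```

An observation that duplicates an existing ID value, or that has a nonpositive salary, is then rejected and the corresponding message is written to the log.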
IC DELETE Statement
Deletes an integrity constraint.
Syntax
IC DELETE constraint-name(s) | _ALL_;
Required Arguments
constraint-name(s)
names one or more constraints to delete. For example, to delete the constraints
Unique_D and Unique_E, use the following statement:
ic delete Unique_D Unique_E;
_ALL_
deletes all constraints for the SAS data file specified in the preceding MODIFY
statement.
IC REACTIVATE Statement
Reactivates a foreign key integrity constraint that is inactive.
Syntax
IC REACTIVATE foreign-key-name REFERENCES libref;
Required Arguments
foreign-key-name
is the name of the foreign key to reactivate.
libref
refers to the SAS library containing the data set that contains the primary key
that is referenced by the foreign key.
Example
Suppose that you have the foreign key FKEY defined in data set MyLib.MyOwn and
that FKEY is linked to a primary key in data set MainLib.Main. If the integrity
constraint is inactivated by a copy or move operation, you can reactivate the
integrity constraint by using the following code:
proc datasets library=mylib;
modify myown;
ic reactivate fkey references mainlib;
run;
INDEX CENTILES Statement
Updates centiles statistics for indexed variables.
Syntax
INDEX CENTILES index(s)
</ <REFRESH> <UPDATECENTILES=ALWAYS | NEVER | integer>>;
Required Argument
index(s)
names one or more indexes.
Optional Arguments
REFRESH
updates centiles immediately, regardless of the value of UPDATECENTILES.
UPDATECENTILES=ALWAYS | NEVER | integer
specifies when centiles are to be updated. The alias is UPDCEN. Valid values are
as follows:
ALWAYS | 0
updates centiles when the data set is closed if any changes have been made to
the data set index. You can specify ALWAYS or 0 and produce the same results.
NEVER | 101
does not update centiles. You can specify NEVER or 101 and produce the same
results.
integer
is the percentage of values for the indexed variable that can be updated before
centiles are refreshed. The default is 5 (percent).
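As a minimal sketch, the following program creates a simple index whose centiles are refreshed after 10 percent of the indexed values change, and then forces an immediate refresh with REFRESH. The data set Scores and the variable Score are hypothetical.

```sas
/* Hypothetical data set WORK.SCORES with 1,000 observations */
data work.scores;
  do id = 1 to 1000;
    score = mod(id * 37, 101);
    output;
  end;
run;

proc datasets library=work nolist;
  modify scores;
    index create score / updatecentiles=10;   /* refresh after 10% of values change */
    index centiles score / refresh;           /* recompute the centiles now */
run;
quit;
```

You can then view the stored centiles with PROC CONTENTS and its CENTILES option.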
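INDEX CREATE Statement
Creates simple or composite indexes for the SAS data set specified in the MODIFY
statement.
Syntax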
INDEX CREATE index-specification(s)
</ <NOMISS> <UNIQUE> <UPDATECENTILES=ALWAYS | NEVER | integer>>;
Required Argument
index-specification(s)
can be one or both of the following forms:
variable
creates a simple index on the specified variable.
index=(variables)
creates a composite index. The name that you specify for index is the name
of the composite index. It must be a valid SAS name and cannot be the same
as any variable name or any other composite index name. You must specify
at least two variables.
Note The index name must follow the same rules as a SAS variable name,
including avoiding the use of reserved names for automatic variables,
such as _N_, and special variable list names, such as _ALL_. For more
information, see “Words in the SAS Language” in SAS Language Reference:
Concepts.
Optional Arguments
NOMISS
excludes from the index all observations with missing values for all index
variables.
When you create an index with the NOMISS option, SAS uses the index only for
WHERE processing and only when missing values fail to satisfy the WHERE
expression. For example, if you use the following WHERE statement, SAS does
not use the index, because missing values satisfy the WHERE expression:
where dept ne '01';
BY-group processing ignores indexes that are created with the NOMISS option.
UNIQUE
specifies that the combination of values of the index variables must be unique.
If you specify UNIQUE and multiple observations have the same values for the
index variables, the index is not created.
UPDATECENTILES=ALWAYS | NEVER | integer
specifies when centiles are to be updated. The alias is UPDCEN. Valid values are
as follows:
ALWAYS | 0
updates centiles when the data set is closed if any changes have been made to
the data set index. You can specify ALWAYS or 0 and produce the same results.
NEVER | 101
does not update centiles. You can specify NEVER or 101 and produce the same
results.
integer
specifies the percentage of values for the indexed variable that can be updated
before centiles are refreshed. The default is 5 (percent).
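As a minimal sketch, the following program creates one composite index with the UNIQUE and NOMISS options and one simple index. The data set Orders, its variables, and the composite index name RegOrd are hypothetical; RegOrd is deliberately different from every variable name, as required.

```sas
/* Hypothetical data set WORK.ORDERS; (REGION, ORDERID) pairs are unique */
data work.orders;
  input region $ orderid amount;
  datalines;
E 1 10
E 2 20
W 1 15
;
run;

proc datasets library=work nolist;
  modify orders;
    index create regord = (region orderid) / unique nomiss;  /* composite index */
    index create amount;                                     /* simple index */
run;
quit;
```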
INDEX DELETE Statement
Deletes one or more indexes associated with the SAS data set specified in the MODIFY
statement.
Restriction: The INDEX DELETE statement must appear in a MODIFY RUN group.
Note: You can use the CONTENTS statement to produce a list of all indexes for a data
set.
Syntax
INDEX DELETE index(s) | _ALL_;
Required Arguments
index(s)
names one or more indexes to delete. The index(es) must be for variables in the
SAS data set that is named in the preceding MODIFY statement. You can delete
both simple and composite indexes.
_ALL_
deletes all indexes, except for indexes that are owned by an integrity constraint.
When an index is created, it is marked as owned by the user, by an integrity
constraint, or by both. If an index is owned by both a user and an integrity
constraint, the index is not deleted until both an IC DELETE statement and an
INDEX DELETE statement are processed.
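As a minimal sketch, the following program creates two simple indexes, deletes one of them by name, and then deletes the remaining user-owned index with _ALL_. The data set Items and its variables are hypothetical; because no integrity constraints are defined, no index is owned by a constraint.

```sas
/* Hypothetical data set WORK.ITEMS */
data work.items;
  do sku = 1 to 50;
    qty = sku * 2;
    output;
  end;
run;

proc datasets library=work nolist;
  modify items;
    index create sku qty;   /* two simple indexes */
run;
  modify items;
    index delete qty;       /* delete one index by name */
run;
  modify items;
    index delete _all_;     /* delete the remaining user-owned indexes */
run;
quit;
```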
INFORMAT Statement
Permanently assigns, changes, and removes variable informats in the data set specified in the
MODIFY statement.
Syntax
INFORMAT variable-1 <informat-1>
<variable-2 <informat-2> …>;
Required Argument
variable
specifies one or more variables whose informats you want to assign, change, or
remove. If you want to disassociate an informat from a variable, list the variable
last in the list with no informat following:
informat a b 2. x1-x3 4.1 c;
Optional Argument
informat
specifies an informat for the variables immediately preceding it in the
statement. If you do not specify an informat, the INFORMAT statement removes
any existing informats for the variables in variable-list.
Tip To remove all informats from a data set, use the ATTRIB statement on
page 597 and the _ALL_ keyword.
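As a minimal sketch, the following program assigns informats to two variables and then removes the informat from one of them. The data set Raw and its variables are hypothetical.

```sas
/* Hypothetical data set WORK.RAW */
data work.raw;
  length code $4;
  code = 'A1';
  n = 5;
run;

proc datasets library=work nolist;
  modify raw;
    informat code $4. n comma8.;   /* assign informats */
run;
  modify raw;
    informat n;                    /* no informat follows: remove it */
run;
quit;
```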
INITIATE Statement
Creates an audit file that has the same name as the SAS data file and a data set type of AUDIT.
Syntax
INITIATE <AUDIT_ALL=NO | YES>;
Optional Argument
AUDIT_ALL=NO | YES
specifies whether logging can be suspended and audit settings can be changed.
AUDIT_ALL=YES specifies that all images are logged and cannot be suspended.
That is, you cannot use the LOG statement to turn off logging of particular
images, and you cannot suspend event logging by using the SUSPEND
statement. To turn off logging, you must use the TERMINATE statement, which
terminates event logging and deletes the audit file.
Default NO
Details
The audit file logs additions, deletions, and updates to the SAS data file. You must
initiate an audit trail before you can suspend, resume, or terminate it. The AUDIT
statement that immediately precedes the INITIATE statement cannot specify a
GENNUM= option. However, if the specified file identifies a generation group, the
audit file that the INITIATE statement creates is attached to the most recently
created generation in the group.
The following example creates the audit file MyLib.MyFile.audit to log updates to
the data file MyLib.MyFile.data, storing all available record images:
proc datasets library=mylib;
audit myfile (alter=password);
initiate;
run;
LABEL Statement
Assigns, changes, and removes variable labels for the SAS data set specified in the MODIFY
statement.
Syntax
LABEL variable-1=<'label-1' | ' '>
<variable-2=<'label-2' | ' '> …>;
Required Argument
variable=<'label'>
specifies a text string of up to 256 characters. If the label text contains single
quotation marks, use double quotation marks around the label, or use two single
quotation marks in the label text and enclose the string in single quotation
marks. To remove a variable's label, assign it a blank that is enclosed in quotation
marks (' ').
Tip To remove all variable labels in a data set, use the ATTRIB statement on
page 597 and the _ALL_ keyword.
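As a minimal sketch of both quoting styles and of label removal, the following program labels two variables and then removes one label by assigning a quoted blank. The data set Survey and its variables are hypothetical.

```sas
/* Hypothetical data set WORK.SURVEY */
data work.survey;
  q1 = 3;
  q2 = 4;
run;

proc datasets library=work nolist;
  modify survey;
    label q1 = "Respondent's rating"    /* double quotation marks around an apostrophe */
          q2 = 'Respondent''s score';   /* or two single quotation marks */
run;
  modify survey;
    label q2 = ' ';                     /* quoted blank removes the label */
run;
quit;
```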
LOG Statement
Specifies the audit file settings.
Restriction: The LOG statement must appear after the INITIATE statement in an AUDIT RUN
group.
Example: “Example 8: Initiating an Audit File” on page 715
Syntax
LOG <ADMIN_IMAGE=YES | NO>
<BEFORE_IMAGE=YES | NO>
<DATA_IMAGE=YES | NO>
<ERROR_IMAGE=YES | NO>;
Optional Arguments
ADMIN_IMAGE=YES | NO
specifies whether the administrative events are logged to the audit file (that is,
the SUSPEND and RESUME actions).
Default YES
Tip If you do not want to log a particular image, specify NO for the image
type. For example, the following statement turns off logging of the error
images, while the administrative, before, and data images continue to be
logged:
log error_image=no;
BEFORE_IMAGE=YES | NO
specifies whether the before-update record images are logged to the audit file.
Default YES
DATA_IMAGE=YES | NO
specifies whether the added, deleted, and after-update record images are
logged to the audit file.
Default YES
ERROR_IMAGE=YES | NO
specifies whether the unsuccessful after-update record images (error images) are
logged to the audit file.
Default YES
Details
The following example creates the same audit file but stores only error record
images:
proc datasets library=mylib;
audit myfile (alter=password);
initiate;
log admin_image=no
before_image=no
data_image=no;
run;
MODIFY Statement
Changes the attributes of a SAS file and, through the use of subordinate statements, the attributes
of variables in the SAS file.
Restriction: You cannot change the length of a variable using the LENGTH= option in an ATTRIB
statement.
Example: “Example 4: Modifying SAS Data Sets” on page 704
Syntax
MODIFY SAS-file <(options)>
</ <CORRECTENCODING=encoding-value> <DTC=SAS-date-time>
<GENNUM=integer> <MEMTYPE=member-type>>;
Required Argument
SAS-file
specifies a SAS file that exists in the procedure input library.
Optional Arguments
ALTER=password-modification
assigns, changes, or removes an Alter password for the SAS file named in the
MODIFY statement. password-modification is one of the following:
n new-password
n old-password / new-password
n / new-password
n old-password /
n /
CORRECTENCODING=encoding-value
enables you to change the encoding indicator, which is recorded in the file's
descriptor information, in order to match the actual encoding of the file's data.
ENCRYPTKEY=key-value
specifies a key value for AES encryption.
Requirement The ENCRYPTKEY= data set option is required if the data file has AES
encryption.
DTC=SAS-date-time
specifies a date and time to substitute for the date and time stamp placed on a
SAS file at the time of creation. You cannot use this option in parentheses after
the name of each SAS file; you must specify DTC= after a forward slash:
modify mydata / dtc='03MAR00:12:01:00'dt;
Restrictions A SAS file's creation date and time cannot be set later than the date
and time the file was actually created.
DTC= can be used only when the resulting SAS file uses the V8 or
V9 engine.
Tip Use DTC= to alter a SAS file's creation date and time before using
the DATECOPY option in the COPY procedure, CPORT procedure,
SORT procedure, and the COPY statement in the DATASETS
procedure.
GENMAX=number-of-generations
specifies the maximum number of versions. Use this option in parentheses after
the name of the SAS file.
Default 0
Range 0 to 1,000
GENNUM=integer
restricts processing for generation data sets. You can specify GENNUM= either
in parentheses after the name of each SAS file or after a forward slash. Valid
value is integer, which is a number that references a specific version from a
generation group. Specifying a positive number is an absolute reference to a
specific generation number that is appended to a data set's name (that is,
gennum=2 specifies MYDATA#002). Specifying a negative number is a relative
reference to a historical version in relation to the base version, from the
youngest to the oldest (that is, gennum=-1 refers to the youngest historical
version). Specifying 0, which is the default, refers to the base version.
Tip To remove all variable labels in a data set, use the ATTRIB
statement on page 597.
MEMTYPE=member-type
restricts processing to one member type. You cannot specify MEMTYPE= in
parentheses after the name of each SAS file; you must specify MEMTYPE= after
a forward slash.
Aliases MTYPE=
MT=
Default If you do not specify the MEMTYPE= option in the PROC DATASETS
statement or in the MODIFY statement, the default is
MEMTYPE=DATA.
PW=password-modification
assigns, changes, or removes a Read, Write, or Alter password for the SAS file
named in the MODIFY statement. password-modification is one of the following:
n new-password
n old-password / new-password
n / new-password
n old-password /
n /
READ=password-modification
assigns, changes, or removes a Read password for the SAS file named in the
MODIFY statement. password-modification is one of the following:
n new-password
n old-password / new-password
n / new-password
n old-password /
n /
SORTEDBY=sort-information
specifies how the data are currently sorted. SAS stores the sort information
with the file but does not verify that the data are sorted the way you indicate.
sort-information can be one of the following:
by-clause </ collate-name>
indicates how the data are currently sorted. Values for by-clause are the
variables and options that you can use in a BY statement in a PROC SORT step.
collate-name names the collating sequence used for the sort. By default, the
collating sequence is that of your operating environment.
Restriction The data must already be sorted in the order that you specify. If the
data are not in the specified order, SAS does not sort them for you.
Tip When using the MODIFY SORTEDBY option, you can also use a
numbered range list or colon list. For more information, see “Data
Set Name Lists” in SAS Programmer’s Guide: Essentials.
TYPE=special-type
assigns or changes the special data set type of a SAS data set. SAS does not
verify the following:
n the SAS data set type that you specify in the TYPE= option (except to check
if it has a length of eight or fewer characters)
n that the SAS data set's structure is appropriate for the type that you have
designated
Note Do not confuse the TYPE= option with the MEMTYPE= option. The
TYPE= option specifies a type of special SAS data set. The MEMTYPE=
option specifies one or more types of SAS files in a SAS library.
Tip Most SAS data sets have no special type. However, certain SAS
procedures, like the CORR procedure, can create a number of special SAS
data sets. In addition, SAS/STAT software and SAS/EIS software support
special data set types.
WRITE=password-modification
assigns, changes, or removes a Write password for the SAS file named in the
MODIFY statement. password-modification is one of the following:
n new-password
n old-password / new-password
n / new-password
n old-password /
n /
Details
You can change one or more variable labels within a data set. To change a variable
label within the data set, use the following syntax:
modify datasetname;
label variablename='Label for Variable';
run;
For an example of changing both a data set label and a variable label in the same
PROC DATASETS, see “Example 4: Modifying SAS Data Sets” on page 704.
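As a further minimal sketch, the following program sets a data set label and a SORTEDBY= indicator in parentheses after the data set name, and changes a variable label in the same RUN group. The data set Cities and its variables are hypothetical; the data are created already in City order because SAS does not verify or perform the sort.

```sas
/* Hypothetical data set WORK.CITIES, already in CITY order */
data work.cities;
  input city $ pop;
  datalines;
Austin 961
Boston 675
Denver 716
;
run;

proc datasets library=work nolist;
  modify cities (label='City populations, in thousands' sortedby=city);
    label pop = 'Population (thousands)';
run;
quit;
```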
Manipulating Passwords
In order to assign, change, or remove a password, you must specify the password
for the highest level of protection that currently exists on that file.
Assigning Passwords
/* assigns a password to an unprotected file */
modify colors (pw=green);
Changing Passwords
/* changes the Write password from YELLOW to BROWN */
modify cars (write=yellow/brown);
Removing Passwords
/* removes the Alter password RED from STATES */
modify states (alter=red/);
Removing Passwords from a Generation Data Set
/* removes the Alter password RED from STATES#002 */
modify states (alter=red/) / gennum=2;
REBUILD Statement
Specifies whether to restore or delete the disabled indexes and integrity constraints.
Syntax
REBUILD SAS-file </ <ENCRYPTKEY=key-value>
<ALTER=password> <GENNUM=n> <MEMTYPE=member-type> <NOINDEX>>;
Required Argument
SAS-file
specifies a SAS data file that contains the disabled indexes and integrity
constraints. You can also use a numbered range list or colon list.
Optional Arguments
ENCRYPTKEY=key-value
specifies a key value for AES encryption.
Requirement ENCRYPTKEY= data set option is required if the data file has AES
encryption.
ALTER=alter-password
provides the Alter password for any alter-protected SAS files that are named in
the REBUILD statement. You can use the option either in parentheses after the
name of each SAS file or after a forward slash.
GENNUM=integer
restricts processing for generation data sets. You can use the option either in
parentheses after the name of each SAS file or after a forward slash. Valid value
is integer, which is a number that references a specific version from a generation
group. Specifying a positive number is an absolute reference to a specific
generation number that is appended to a data set's name (that is, gennum=2
specifies MYDATA#002). Specifying a negative number is a relative reference
to a historical version in relation to the base version, from the youngest to the
oldest (that is, gennum=-1 refers to the youngest historical version). Specifying
0, which is the default, refers to the base version.
MEMTYPE=member-type
restricts processing to one member type.
Aliases MTYPE=
MT=
Default If you do not specify the MEMTYPE= option in the PROC DATASETS
statement or in the REBUILD statement, the default is MEMTYPE=ALL.
NOINDEX
specifies to delete the disabled indexes and integrity constraints.
Restriction The NOINDEX option cannot be used for data files that contain one
or more referential integrity constraints.
Details
When the DLDMGACTION=NOINDEX data set or system option is specified and
SAS encounters a damaged data file, SAS does the following:
n repairs the data file without indexes and integrity constraints
n updates the data file to reflect the disabled indexes and integrity constraints
The REBUILD statement completes the repair of a damaged SAS data file by
rebuilding or deleting all of the data file's disabled indexes and integrity constraints.
The REBUILD statement establishes and uses member-level locking in order to
process the new index file and to restore the indexes and integrity constraints.
To rebuild the index file and restore the indexes and integrity constraints, use the
following code:
proc datasets library=mylib;
rebuild myfile / alter=password   /* supply the actual Alter password */
gennum=0                          /* 0 refers to the base version */
memtype=data;
run;
quit;
To delete the disabled indexes and integrity constraints, use the following code:
proc datasets library=mylib;
rebuild myfile / noindex;
run;
quit;
After you execute the REBUILD statement, the data file is no longer restricted to
INPUT mode.
The REBUILD statement default is to rebuild the indexes and integrity constraints
and the index file.
If a data file contains one or more referential integrity constraints and you use the
NOINDEX option with the REBUILD statement, an error message is written to the
SAS log.
RENAME Statement
Renames variables in the SAS data set specified in the MODIFY statement.
Syntax
RENAME old-name-1=new-name-1
<old-name-2=new-name-2 …>;
Required Argument
old-name=new-name
changes the name of a variable in the data set specified in the MODIFY
statement. old-name must be a variable that already exists in the data set.
new-name cannot be the name of a variable that already exists in the data set or
the name of an index, and the new name must be a valid SAS name.
For example, assume the Oxygen data set includes a variable named OXYGEN.
The following code renames the OXYGEN variable to intake:
modify oxygen;
rename oxygen=intake;
See “Rules for Words and Names in the SAS Language” in SAS Language
Reference: Concepts
Details
If old-name does not exist in the SAS data set or new-name already exists, PROC
DATASETS stops processing the RUN group containing the RENAME statement
and issues an error message.
When you use the RENAME statement to change the name of a variable for which
there is a simple index, the statement also renames the index.
If the variable that you are renaming is used in a composite index, the composite
index automatically references the new variable name. However, if you attempt to
rename a variable to a name that has already been used for a composite index, you
receive an error message.
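As a minimal sketch of the index behavior described above, the following program creates a simple index on the variable Oxygen and then renames the variable to Intake; the simple index is renamed along with it. The data set Oxygen and the variable Run_Id are hypothetical.

```sas
/* Hypothetical data set WORK.OXYGEN with a variable of the same name */
data work.oxygen;
  do run_id = 1 to 10;
    oxygen = 40 + run_id;
    output;
  end;
run;

proc datasets library=work nolist;
  modify oxygen;
    index create oxygen;      /* simple index named OXYGEN */
run;
  modify oxygen;
    rename oxygen = intake;   /* the simple index is renamed too */
run;
quit;
```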
REPAIR Statement
Attempts to restore damaged SAS data sets or catalogs, in permanent libraries, to a usable
condition.
Note: The REPAIR statement is not a replacement for having a current backup.
Syntax
REPAIR SAS-file(s)
</ <ENCRYPTKEY=key-value>
<ALTER=alter-password> <GENNUM=integer> <MEMTYPE=member-type>>;
Required Argument
SAS-file(s)
specifies one or more SAS data sets or catalogs in the procedure input library.
You can also use a numbered range list or colon list.
Optional Arguments
ALTER=alter-password
provides the Alter password for any alter-protected SAS files that are named in
the REPAIR statement. You can use the option either in parentheses after the
name of each SAS file or after a forward slash.
ENCRYPTKEY=key-value
specifies a key value for AES encryption.
Requirement ENCRYPTKEY= data set option is required if the data file has AES
encryption.
GENNUM=integer
restricts processing for generation data sets. You can use the option either in
parentheses after the name of each SAS file or after a forward slash. Valid value
is integer, which is a number that references a specific version from a generation
group. Specifying a positive number is an absolute reference to a specific
generation number that is appended to a data set's name (that is, gennum=2
specifies MYDATA#002). Specifying a negative number is a relative reference
to a historical version in relation to the base version, from the youngest to the
oldest (that is, gennum=-1 refers to the youngest historical version). Specifying
0, which is the default, refers to the base version.
See “Restricting Processing for Generation Data Sets” on page 667 and
“Understanding Generation Data Sets” in SAS Language Reference:
Concepts
MEMTYPE=member-type
restricts processing to one member type.
Aliases MTYPE=
MT=
Default If you do not specify the MEMTYPE= option in the PROC DATASETS
statement or in the REPAIR statement, the default is MEMTYPE=ALL.
Details
CAUTION
If you have extensive damage to your data set, the REPAIR statement will not
correct it. If the device on which a SAS data set or an auxiliary file (index, audit, or
extended attribute file) resides is damaged, then you must restore the damaged data set
and auxiliary files from a backup device.
The most common situations where the REPAIR statement might be helpful are as
follows:
n A system failure occurs while you are updating a SAS data set or catalog.
When you use the REPAIR statement for SAS data sets, it re-creates all indexes
for the data set. It also attempts to restore the data set to a usable condition,
but the restored data set might not include the last several updates that
occurred before the system failed. You cannot use the REPAIR statement to re-
create indexes that were destroyed by using the FORCE option in a PROC SORT
step.
n An I/O error occurs while you are writing a SAS data set or catalog entry.
If the disk that stores the SAS data set becomes full before the file is
completely written to disk, the step that writes the data set will fail, and an
error is written to the SAS log. The REPAIR statement can repair the data set
header and perhaps offer a workable data set. The data set will likely be
incomplete and its integrity questionable. The best recourse is to re-create the
data set from a backup.
When you use the REPAIR statement for a catalog, you receive a message stating
whether the REPAIR statement restored the entry. If the entire catalog is
potentially damaged, the REPAIR statement attempts to restore all the entries in
the catalog. Often, however, only a single entry is potentially damaged: for
example, if a disk-full condition occurs while a single entry is being updated, then
on most systems only the entry that was open when the problem occurred is
potentially damaged. In this case, the REPAIR statement attempts to repair only
that entry. Some entries within the
restored catalog might not include the last updates that occurred before a system
crash or an I/O error. The REPAIR statement issues warning messages for entries
that might have truncated data.
To repair a damaged catalog, you must use a version of SAS that can update the
catalog. A damaged SAS 9 catalog can be repaired with SAS 9 only.
If you issue a REPAIR statement for a SAS file that does not exist in the specified
library, PROC DATASETS stops processing the RUN group that contains the REPAIR
statement and issues an error message. To override this behavior, use the
NOWARN option in the PROC DATASETS statement.
If you are using Cross-Environment Data Access (CEDA) to process a foreign SAS
data set that has become damaged, you must move the data set back to its native
environment before you attempt to repair it using the PROC DATASETS REPAIR
statement. CEDA does not support update processing, which is required in order to
repair a damaged data set.
For more information about CEDA, see “Definitions for Cross-Environment Data
Access (CEDA)” in SAS Programmer’s Guide: Essentials.
RESUME Statement
Resumes event logging to the audit file, if it was suspended.
Syntax
RESUME;
Details
Audit-related statements such as RESUME, SUSPEND, TERMINATE, USER_VAR, and
LOG are not valid for an audit file until the INITIATE statement has been submitted.
For more information, see “INITIATE Statement” on page 641.
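As a minimal sketch of the required order, the following program initiates an audit trail, suspends event logging, and then resumes it, each in its own RUN group. The data set Ledger and its variables are hypothetical, and the sketch assumes that the WORK library's engine supports audit trails.

```sas
/* Hypothetical data set WORK.LEDGER */
data work.ledger;
  acct = 1;
  bal = 100;
run;

proc datasets library=work nolist;
  audit ledger;
    initiate;    /* must come first */
run;
  audit ledger;
    suspend;     /* pause event logging */
run;
  audit ledger;
    resume;      /* restart event logging */
run;
quit;
```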
SAVE Statement
Deletes all the SAS files in a library except the ones listed in the SAVE statement.
Syntax
SAVE SAS-file(s) </ MEMTYPE=member-type>;
Required Argument
SAS-file(s)
specifies one or more SAS files that you do not want to delete from the SAS
library.
Optional Argument
MEMTYPE=member-type
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases MTYPE=
MT=
Default If you do not specify the MEMTYPE= option in the PROC DATASETS
statement or in the SAVE statement, the default is MEMTYPE=ALL.
Details
If one of the SAS files in SAS-file(s) does not exist in the procedure input library,
PROC DATASETS stops processing the RUN group containing the SAVE statement
and issues an error message. To override this behavior, specify the NOWARN
option in the PROC DATASETS statement.
When the SAVE statement deletes SAS data sets, it also deletes any indexes
associated with those data sets. (If the SAS data set that is to be deleted has an
ALTER= password assigned to it, the ALTER= password must be specified in order
to delete the SAS data set.)
CAUTION
SAS immediately deletes libraries and library members when you submit a
RUN group. You are not asked to verify the Delete operation before it begins. The
SAVE statement deletes many SAS files in one operation. Make sure that you
understand how the MEMTYPE= option affects which types of SAS files are saved and
which types are deleted.
When you use the SAVE statement with generation groups, the SAVE statement
treats the base version and all historical versions as a unit. You cannot save a
specific version.
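The sketch below, which uses hypothetical data sets in WORK, keeps one data set and deletes the others. Because MEMTYPE=DATA restricts processing to data sets, catalogs and other member types in WORK are not considered. Any other data sets already in WORK are deleted too, which is the risk that the CAUTION describes.

```sas
/* Hypothetical members */
data work.keep_me work.temp1 work.temp2;
   x = 1;
run;

proc datasets library=work nolist;
   /* Delete every data set in WORK except KEEP_ME; there is no confirmation */
   save keep_me / memtype=data;
run;
quit;
```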
SELECT Statement
Selects SAS files for copying.
Syntax
SELECT SAS-file(s)
</ <ENCRYPTKEY=key-value> <ALTER=alter-password> <MEMTYPE=member-type>>;
Required Argument
SAS-file(s)
specifies one or more SAS files that you want to copy. All of the SAS files that
you name must be in the data library that is referenced by the libref named in
the IN= option in the COPY statement. If the SAS files have generation groups,
all the generations are copied because the SELECT statement does not allow
you to select specific versions.
Optional Arguments
ALTER=alter-password
provides the Alter password for any alter-protected SAS files that you are
moving from one data library to another. Because you are moving and thus
deleting a SAS file from a SAS library, you need Alter access. You can use the
option either in parentheses after the name of each SAS file or after a forward
slash.
ENCRYPTKEY=key-value
specifies the key value for AES encryption.
MEMTYPE=member-type
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases MTYPE=
MT=
Default If you do not specify the MEMTYPE= option in the PROC DATASETS
statement, in the COPY statement, or in the SELECT statement, the
default is MEMTYPE=ALL.
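A minimal sketch, assuming that the SASHELP sample library is available: the COPY statement names the source and destination libraries, and the SELECT statement limits the copy to the named data sets.

```sas
proc datasets library=work nolist;
   /* Source and destination libraries for the copy */
   copy in=sashelp out=work;
   /* Copy only these members; MEMTYPE=DATA restricts the match to data sets */
   select class shoes / memtype=data;
run;
quit;
```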
SUSPEND Statement
suspends event logging to the audit file, but does not delete the audit file.
Syntax
SUSPEND;
Details
No other audit-related statement, such as SUSPEND, RESUME, TERMINATE,
USER_VAR, or LOG, will be valid for an audit file until the INITIATE statement has
been submitted. For more information, see “INITIATE Statement” on page 641.
TERMINATE Statement
terminates event logging and deletes the audit file.
Syntax
TERMINATE;
Details
No other audit-related statement, such as TERMINATE, SUSPEND, RESUME,
USER_VAR, or LOG, will be valid for an audit file until the INITIATE statement has
been submitted. For more information, see “INITIATE Statement” on page 641.
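A minimal sketch, assuming a hypothetical data set, Work.Orders, created in WORK so that the example is self-contained. The audit trail is started in one RUN group and terminated in the next; TERMINATE deletes the audit file but leaves the data set itself in place.

```sas
/* Hypothetical data set */
data work.orders;
   input id qty;
   datalines;
1 5
2 3
;
run;

proc datasets library=work nolist;
   audit orders;
   initiate;
run;
   /* End event logging and delete the audit file */
   audit orders;
   terminate;
run;
quit;
```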
USER_VAR Statement
defines optional variables to be logged in the audit file with each update to an observation. When
you use USER_VAR, it must follow an INITIATE statement.
Syntax
USER_VAR variable-name-1 <$> <length> <LABEL='variable-label' >
<variable-name-2 <$> <length> <LABEL='variable-label' > …>;
Required Argument
variable-name
is a name for the variable.
Optional Arguments
$
indicates that the variable is a character variable.
length
specifies the length of the variable.
Default 8
LABEL='variable-label'
specifies a label for the variable.
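The sketch below assumes a hypothetical data set, Work.Accounts, and a hypothetical user variable, Reason. USER_VAR follows INITIATE in the same RUN group. Our understanding, which you should confirm in the AUDIT statement documentation, is that a later DATA step can assign the user variable as though it were a data set variable and that its value is written to the audit file rather than to the data set.

```sas
/* Hypothetical data set */
data work.accounts;
   input id balance;
   datalines;
1 100
2 250
;
run;

proc datasets library=work nolist;
   audit accounts;
   initiate;
   /* REASON: character, length 30, logged with each update */
   user_var reason $ 30 label='Reason for change';
run;
quit;

/* Update in place; REASON is assumed to be captured in the audit record */
data work.accounts;
   modify work.accounts;
   if id = 2 then do;
      balance = 300;
      reason = 'Corrected entry';
   end;
run;
```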
XATTR ADD Statement
Restrictions: The XATTR ADD statement must appear in a MODIFY RUN group.
Generation data sets do not support extended attributes.
Supports: V9 engine only
Notes: An extended attribute can have numeric or character values.
A blank space in a character value indicates a missing value. Missing numeric
values are also allowed.
An extended attribute name must conform to SAS naming rules. A SAS name can
be up to 32 characters long. For more information, see “Rules for User-Supplied
SAS Names” in SAS Language Reference: Concepts and “SAS Variable Attributes” in
SAS Language Reference: Concepts.
Example: “Example 9: Extended Attributes” on page 721
Syntax
XATTR ADD DS attribute-name-1=attribute-value-1
<attribute-name-2=attribute-value-2 …>;
Required Arguments
Note that for character values, attribute-value must be in quotation marks, such as "attribute-value".
XATTR ADD DS attribute-name-1=attribute-value-1 <attribute-name-2=attribute-value-2 ...>
adds an extended attribute to a data set. If the extended attribute already
exists, an error will be returned.
Details
Extended attributes are organized into (name, value) pairs. If you try to add a new
attribute and the attribute already exists, an error is written to the SAS log.
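A minimal sketch, assuming a hypothetical copy of SASHELP.CLASS (SASHELP is read-only) and hypothetical attribute names Role and Attempt. The character value is quoted and the numeric value is not. PROC CONTENTS should then list the extended attributes.

```sas
/* Work on a copy because SASHELP is read-only */
data work.class_x;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_x;
   /* ROLE is character (quoted); ATTEMPT is numeric */
   xattr add ds role="train" attempt=1;
run;
quit;

proc contents data=work.class_x;
run;
```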
XATTR DELETE Statement
Restriction: The XATTR DELETE statement must appear in a MODIFY RUN group.
Supports: V9 engine only
Example: “Example 9: Extended Attributes” on page 721
Syntax
XATTR DELETE;
Required Argument
XATTR DELETE
deletes all extended attributes from a data set.
Details
Use the XATTR DELETE statement to delete all of the extended attributes from a
data set. After the statement runs, none of the data set's extended attributes
exist.
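A minimal sketch, assuming a hypothetical copy of SASHELP.CLASS and hypothetical attribute names: two extended attributes are added in one MODIFY RUN group, and XATTR DELETE removes both of them in the next.

```sas
data work.class_d;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_d;
   xattr add ds role="train" attempt=1;
run;
   modify class_d;
   /* Delete every extended attribute at once */
   xattr delete;
run;
quit;
```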
XATTR OPTIONS Statement
Restriction: The XATTR OPTIONS statement must appear in a MODIFY RUN group.
Supports: V9 engine only
Example: “Example 9: Extended Attributes” on page 721
Syntax
XATTR OPTIONS <SEGLEN=number-of-bytes>;
Required Argument
XATTR OPTIONS SEGLEN=number-of-bytes
specifies the length of the storage element that holds the character
extended attribute value. The value can be from 1 to 32,760 bytes.
Default 256
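A hedged sketch, assuming that SEGLEN= should be set before the character attribute values that it governs are added. The data set copy and the attribute name Source are hypothetical.

```sas
data work.class_o;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_o;
   /* Use 1,024-byte storage elements instead of the default 256 */
   xattr options seglen=1024;
   xattr add ds source="Copied from SASHELP.CLASS for testing";
run;
quit;
```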
XATTR REMOVE Statement
Restriction: The XATTR REMOVE statement must appear in a MODIFY RUN group.
Supports: V9 engine only
Example: “Example 9: Extended Attributes” on page 721
Syntax
XATTR REMOVE DS attribute-name(s);
Required Arguments
XATTR REMOVE DS attribute-name(s)
removes an extended attribute from a data set.
Details
If you no longer need an extended attribute that you created, use the XATTR
REMOVE statement to remove it from a variable or a data set. The XATTR
REMOVE statement deletes only the extended attribute that you specify.
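A minimal sketch, assuming a hypothetical copy of SASHELP.CLASS and hypothetical attribute names. Unlike XATTR DELETE, XATTR REMOVE drops only the named attribute, so Role survives.

```sas
data work.class_r;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_r;
   xattr add ds role="train" attempt=1;
run;
   modify class_r;
   /* Remove only ATTEMPT; ROLE is kept */
   xattr remove ds attempt;
run;
quit;
```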
XATTR SET Statement
Restrictions: The XATTR SET statement must appear in a MODIFY RUN group.
Generation data sets do not support extended attributes.
Supports: V9 engine only
Notes: An extended attribute can have numeric or character values.
A blank space in a character value indicates a missing value. Missing numeric
values are also allowed.
An extended attribute name must conform to SAS naming rules. A SAS name can
be up to 32 characters long. For more information, see “Rules for User-Supplied
SAS Names” in SAS Language Reference: Concepts and “SAS Variable Attributes” in
SAS Language Reference: Concepts.
Example: “Example 9: Extended Attributes” on page 721
Syntax
XATTR SET DS attribute-name-1=attribute-value-1
<attribute-name-2=attribute-value-2 …>;
Required Arguments
Note that for character values, attribute-value must be in quotation marks, such as "attribute-value".
XATTR SET DS attribute-name-1=attribute-value-1 <attribute-name-2=attribute-value-2 ...>
updates or adds an extended attribute to a data set. If the data set extended
attribute does not exist, it will be added. If it does exist, it will be updated with
the value specified.
Details
Use the XATTR SET statement if you are not sure if an extended attribute exists. If
an extended attribute does exist, it will be updated. If the extended attribute does
not exist, it added. The XATTR SET statement defines the variable or data set
extended attribute even if it does not exist yet. When using XATTR ADD, an error
occurs if there is an existing extended attribute using that value. You also get an
error if you try to use XATTR UPDATE on an extended attribute that does not exist
yet. Using XATTR SET defines the variable or data set extended attributes. If the
extended attribute did not exist, it does now. If the extended attribute did exist,
then it has a new value.
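A minimal sketch, assuming a hypothetical copy of SASHELP.CLASS and a hypothetical attribute name, Role. The same XATTR SET statement first adds the attribute and then, in a later RUN group, replaces its value without an error.

```sas
data work.class_s;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_s;
   /* ROLE does not exist yet, so XATTR SET adds it */
   xattr set ds role="train";
run;
   modify class_s;
   /* ROLE now exists, so XATTR SET replaces its value */
   xattr set ds role="validate";
run;
quit;
```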
XATTR UPDATE Statement
Restriction: The XATTR UPDATE statement must appear in a MODIFY RUN group.
Supports: V9 engine only
Notes: A blank space in a character value indicates a missing value. Missing numeric
values are also allowed.
An extended attribute name must conform to SAS naming rules. A SAS name can
be up to 32 characters long. For more information, see “Rules for User-Supplied
SAS Names” in SAS Language Reference: Concepts and “SAS Variable Attributes” in
SAS Language Reference: Concepts.
Example: “Example 9: Extended Attributes” on page 721
Syntax
XATTR UPDATE DS attribute-name-1=attribute-value-1
<attribute-name-2=attribute-value-2 …>;
Required Arguments
Note that for character values, attribute-value must be in quotation marks, such as "attribute-value".
XATTR UPDATE DS attribute-name-1=attribute-value-1 <attribute-name-2=attribute-value-2 ...>
updates an extended attribute in a data set. If the extended attribute does not
exist, an error is written to the SAS log.
Details
To make changes to an existing extended attribute, use the XATTR UPDATE
statement. If you try to update an extended attribute that does not exist, an error is
written to the SAS log.
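A minimal sketch, assuming a hypothetical copy of SASHELP.CLASS and a hypothetical numeric attribute, Attempt. The attribute is added first so that the later XATTR UPDATE has an existing attribute to change.

```sas
data work.class_u;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_u;
   xattr add ds attempt=1;
run;
   modify class_u;
   /* ATTEMPT exists, so the update succeeds; updating a name that */
   /* does not exist would write an error to the SAS log            */
   xattr update ds attempt=2;
run;
quit;
```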
When you are working with password-protected SAS files in the AGE, CHANGE,
DELETE, EXCHANGE, REPAIR, or SELECT statement, you can specify ALTER= and
PW= password options in the PROC DATASETS statement or in the subordinate
statement.
Note: The ALTER= option works slightly differently for the COPY (when moving a
file) and MODIFY statements. For more information, see COPY statement on page
610 and the MODIFY statement on page 644.
1 in parentheses after the name of the SAS file in a subordinate statement. When
used in parentheses, the option refers only to the name immediately preceding
the option. If you are working with more than one SAS file in a data library and
each SAS file has a different password, you must specify password options in
parentheses after individual names.
In the following statement, the ALTER= option provides the password red for
the SAS file Bones only:
delete xplant bones(alter=red);
2 after a forward slash (/) in a subordinate statement. When you use a password
option following a slash, the option refers to all SAS files named in the
statement unless the same option appears in parentheses after the name of a
SAS file. This method is convenient when you are working with more than one
SAS file and they all have the same password.
In the following statement, the ALTER= option in parentheses provides the
password red for the SAS file Chest, and the ALTER= option after the slash
provides the password blue for the SAS file Virus:
delete chest(alter=red) virus / alter=blue;
Note: When it looks for the password for a SAS file in a SELECT statement, SAS
checks the COPY statement before it checks the PROC DATASETS statement.
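A runnable sketch of the two subordinate-statement forms described above, assuming hypothetical data sets that are created in WORK with the ALTER= data set option. The passwords red and blue match the examples in the text.

```sas
/* Hypothetical alter-protected data sets */
data work.bones(alter=red) work.chest(alter=blue) work.virus(alter=blue);
   x = 1;
run;

proc datasets library=work nolist;
   /* BONES: password in parentheses; CHEST and VIRUS: password after the slash */
   delete bones(alter=red) chest virus / alter=blue;
run;
quit;
```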
1. In the APPEND and CONTENTS statements, you use these options just as you use any SAS data set option, in
parentheses after the SAS data set name.
When you are working with a generation group for the AUDIT, CHANGE, DELETE,
MODIFY, and REPAIR statements, you can restrict processing in the PROC
DATASETS statement or in the subordinate statement to a specific version.
Note: The GENNUM= option works slightly differently for the MODIFY statement.
See MODIFY statement on page 644.
Note: You cannot restrict processing to a specific version for the AGE, COPY,
EXCHANGE, and SAVE statements. These statements apply to the entire
generation group.
1 in parentheses after the name of the SAS data set in a subordinate statement.
When used in parentheses, the option refers only to the name immediately
preceding the option. If you are working with more than one SAS data set in a
data library and you want a different generation version for each SAS data set,
then you must specify GENNUM= in parentheses after individual names.
In the following statement, the GENNUM= option specifies the version of a
generation group for the SAS data set Bones only:
delete xplant bones (gennum=2);
2 after a forward slash (/) in a subordinate statement. When you use the
GENNUM= option following a slash, the option refers to all SAS data sets
named in the statement unless the same option appears in parentheses after
the name of a SAS data set. This method is convenient when you are working
with more than one file and you want the same version for all files.
In the following statement, the GENNUM= option in parentheses specifies the
generation version for SAS data set Chest, and the GENNUM= option after the
slash specifies the generation version for SAS data set Virus:
delete chest (gennum=2) virus / gennum=1;
1. For the APPEND and CONTENTS statements, use GENNUM= just as you use any SAS data set option, in parentheses
after the SAS data set name.
of the SAS data sets you are working with in the library. Do not specify the
option in parentheses.
In the following PROC DATASETS step, the GENNUM= option specifies the
generation version for the SAS files Insulin and Abneg:
proc datasets gennum=2;
delete insulin;
contents data=abneg;
run;
Note: When it looks for the generation version for a SAS file in a SELECT
statement, SAS checks the COPY statement before it checks the PROC DATASETS
statement.
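The sketch below builds a hypothetical generation group in WORK with the GENMAX= data set option and then deletes one historical version. With GENMAX=3, the first DATA step creates generation 1, and each replacement moves the previous base version into the history, so after three steps the group holds generations 1, 2, and 3 (the base version).

```sas
/* Keep up to three versions of the hypothetical data set INSULIN */
data work.insulin(genmax=3);
   dose = 1;
run;

/* Each replacement turns the previous base version into a historical version */
data work.insulin;
   dose = 2;
run;

data work.insulin;
   dose = 3;
run;

proc datasets library=work nolist;
   /* Delete only historical generation 1; the base version remains */
   delete insulin(gennum=1);
run;
quit;
```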
In Subordinate Statements
Use the MEMTYPE= option in the following subordinate statements to limit the
member types that are available for processing:
Note: The MEMTYPE= option works slightly differently for the CONTENTS, COPY,
and MODIFY statements. For more information, see CONTENTS statement on page
603, COPY Statement on page 610, and MODIFY statement on page 644.
For example, the following statement deletes House.data, Lot.catalog, and
Sales.data because the default member type for the DELETE statement is DATA.
(See Table 17.52 on page 670 for the default member type of each statement.)
delete house lot(memtype=catalog) sales;
2 after a slash (/) at the end of the statement. When used following a slash, the
MEMTYPE= option refers to all SAS files named in the statement unless the
option appears in parentheses after the name of a SAS file. For example, the
following statement deletes Lotpix.catalog, Regions.data, and Appl.catalog:
delete lotpix regions(memtype=data) appl / memtype=catalog;
Note: When you use the EXCLUDE and SELECT statements, the procedure
looks in the COPY statement for the MEMTYPE= option before it looks in the
PROC DATASETS statement. For more information, see “Specifying Member
Types When Copying or Moving SAS Files” on page 620.
4 for the default value. If you do not specify a MEMTYPE= option in the
subordinate statement or in the PROC DATASETS statement, the default value
for the subordinate statement determines the member type available for
processing.
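The DELETE example with MEMTYPE= in parentheses can be run as written against hypothetical members in WORK: two data sets and a format catalog named Lot that PROC FORMAT creates. Because DELETE defaults to DATA, only Lot needs MEMTYPE=CATALOG.

```sas
/* Hypothetical data sets */
data work.house work.sales;
   x = 1;
run;

/* Hypothetical format catalog WORK.LOT */
proc format library=work.lot;
   value yesno 1 = 'Yes' 0 = 'No';
run;

proc datasets library=work nolist;
   delete house lot(memtype=catalog) sales;
run;
quit;
```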
Member Types
The following list gives the possible values for the MEMTYPE= option:
ACCESS
access descriptor files (created by SAS/ACCESS software)
ALL
all member types
CATALOG
SAS catalogs
DATA
SAS data files
FDB
financial database
MDDB
multidimensional database
PROGRAM
stored compiled SAS programs
VIEW
SAS views
The following table shows the member types that you can use in each statement:
Statement | Appropriate Member Types | Default Member Type
1 When DATA=_ALL_ in the CONTENTS statement, the default is ALL. ALL includes only DATA and
VIEW.
2 ALL includes only DATA and CATALOG.
n copies all data sets from the Control library to the Health library
Libref HEALTH
Engine V9
Physical Name c:\Documents and Settings\myfile\My Documents\procdatasets\health
Filename c:\Documents and Settings\myfile\My Documents\procdatasets\health
If you want only a directory, use the NODS option and the _ALL_ keyword in the
DATA= option. The NODS option suppresses the description of the SAS data sets;
only the directory appears in the output.
Note: The CONTENTS statement does not put a directory in an output data set. If
you try to create an output data set using the NODS option, you receive an empty
output data set. Use the SQL procedure to create a SAS data set that contains
information about a SAS library.
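A minimal sketch of the PROC SQL approach, using the DICTIONARY.TABLES view and the SASHELP library as the example. Library and member names are stored in uppercase in the dictionary tables.

```sas
proc sql;
   /* One row per data set in SASHELP, with its observation count */
   create table work.libinfo as
      select memname, memtype, nobs
         from dictionary.tables
         where libname = 'SASHELP' and memtype = 'DATA';
quit;
```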
Note: If you specify the ODS RTF destination, the PROC DATASETS output goes
to both the SAS log and the ODS output area. The NOLIST option suppresses
output to both. To see the output only in the SAS log, use the ODS EXCLUDE
statement by specifying the member directory as the exclusion.
Procedure Output
Only the items in the output that require explanation are discussed.
Last Modified
indicates the date and time that the data set was last modified. The date and
time reflect the setting of the TIMEZONE= system option. If the TIMEZONE=
system option is not set, then the local time zone in which the SAS session is
running is used.
Protection
indicates whether the SAS data set is Read, Write, or Alter password protected.
Data Set Type
names the special data set type (such as CORR, COV, SSPC, EST, or FACTOR), if
any.
Observations
is the total number of observations currently in the file. Note that for a very
large data set, if the number of observations exceeds the largest integer value
that can be represented in a double precision floating point number, the count is
shown as missing.
Deleted Observations
is the number of observations marked for deletion. These observations are not
included in the total number of observations, shown in the Observations field.
Note that for a very large data set, if the number of deleted observations
exceeds the number that can be stored in a double-precision integer, the count
is shown as missing. Also, the count for Deleted Observations shows a
missing value if you use the COMPRESS=YES option with one or both of the
REUSE=YES and POINTOBS=NO options.
Compressed
indicates whether the data set is compressed. If the data set is compressed, the
output includes an additional item, Reuse Space (with a value of YES or NO).
This item indicates whether to reuse space that is made available when
observations are deleted.
Sorted
indicates whether the data set is sorted. If you sort the data set with PROC
SORT or PROC SQL, or if you specify sort information with the SORTEDBY= data
set option, a value of YES appears here and an additional section appears in the
output. See “Sort Information” on page 680 for details.
Data Representation
is the format in which data is represented on a computer architecture or in an
operating environment. For example, on an IBM PC, character data is
represented in ASCII encoding and integers are byte-swapped. Native data
representation refers to an environment whose data representation matches
that of the CPU that is accessing the file. For example, a file that is in
Windows data representation is native to the Windows operating environment.
Encoding
is the encoding value. Encoding is a set of characters (letters, logograms, digits,
punctuation, symbols, control characters, and so on) that have been mapped to
numeric values (called code points) that can be used by computers. The code
points are assigned to the characters in the character set when you apply an
encoding method.
Output 17.5 Engine/Host Dependent Information for the Group Data Set
Note: Variable names are sorted such that X1, X2, and X10 appear in that order
and not in the true collating sequence of X1, X10, and X2. Variable names that
contain an underscore and digits might appear in a nonstandard sort order. For
example, P25 and P75 appear before P2_5.
Type
specifies the type of variable: character or numeric.
Len
specifies the variable's length, which is the number of bytes used to store each
of a variable's values in a SAS data set.
Transcode
specifies whether a character variable is transcoded. If the attribute is NO, then
transcoding is suppressed. By default, character variables are transcoded when
required. For more information about transcoding, see SAS National Language
Support (NLS): Reference Guide.
Note: If none of the variables in the SAS data set has a format, informat, or label
associated with it, or if all of the variables are set to TRANSCODE=YES, then the
corresponding column is not displayed.
Output 17.6 Listing of Variables and Attributes of the Group Data Set
Nomiss Option
indicates whether the index excludes missing values for all index variables. If
the column contains YES, the index does not contain observations with missing
values for all index variables.
# of Unique Values
gives the number of unique values in the index.
Variables
names the variables in a composite index.
Output 17.7 Listing of Indexes and Attributes of the Group Data Set
Sort Information
The section shown in the following output appears only if the Sorted field has a
value of YES.
Sortedby
indicates how the data are currently sorted. This field contains either the
variables and options that you use in the BY statement in PROC SORT, the
column name in PROC SQL, or the values that you specify in the SORTEDBY=
option.
Validated
indicates whether the sort order has been validated, that is, whether PROC SORT
or PROC SQL actually sorted the data set. If PROC SORT or PROC SQL sorted the
data set, the value is YES. If you assigned the sort indicator with the SORTEDBY=
data set option, the value is NO.
Character Set
is the character set used to sort the data. The value for this field can be ASCII,
EBCDIC, or PASCII.
Collating Sequence
is the collating sequence used to sort the data set, which can be a translation
table name, an encoding value, or LINGUISTIC if the data set is sorted
linguistically. This field appears only if you specify a collating sequence that is
different from the character set.
Sort Option
indicates whether PROC SORT used the NODUPKEY option when sorting the
data set. This field appears only if you used this option in a PROC SORT
statement (not shown).
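The sketch below contrasts a validated sort with an asserted one, using hypothetical data sets in WORK. Work.Sorted is sorted by PROC SORT (Validated is YES), and Work.Flagged only carries a sort indicator from the SORTEDBY= data set option (Validated is NO). The test reads the SORTED variable of a CONTENTS OUT= data set, where 0 means sorted but not validated.

```sas
data work.a;
   input id score;
   datalines;
3 70
1 90
2 80
;
run;

/* Validated sort */
proc sort data=work.a out=work.sorted;
   by id;
run;

/* Sort indicator asserted, not verified by SAS */
data work.flagged(sortedby=id);
   set work.sorted;
run;

proc datasets library=work nolist;
   contents data=sorted;
   contents data=flagged;
run;
quit;
```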
By default, the PROC DATASETS statement itself produces two output objects:
Members and Directory. These objects are routed to the SAS log. The CONTENTS
statement produces three output objects by default: Attributes, EngineHost, and
Variables. (The use of various options adds other output objects.) These objects
are routed to the procedure output file. If you open an ODS destination (such as
HTML, RTF, or PRINTER), all of these objects are, by default, routed to that
destination.
You can use ODS SELECT and ODS EXCLUDE statements to control which objects
go to which destination, just as you can for any other procedure. However, because
of the unique interface between PROC DATASETS and ODS, when you use the
keyword LISTING in an ODS SELECT or ODS EXCLUDE statement, you affect both
the log and the listing.
PROC CONTENTS generates the same ODS tables as PROC DATASETS with the
CONTENTS statement.
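A short sketch using the SASHELP.CLASS sample data set: ODS SELECT keeps only the Variables table in the open destinations, and ODS OUTPUT (not described above, used here as an illustration) also saves that table as a data set with one observation per variable.

```sas
/* Route only the Variables table to the open destinations */
ods select Variables;
/* Also save the Variables table as a data set */
ods output Variables=work.vars;
proc contents data=sashelp.class;
run;
```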
Table 17.15 ODS Tables Produced by the DATASETS Procedure without the
CONTENTS Statement
Table 17.16 ODS Table Names Produced by PROC CONTENTS and PROC DATASETS with the
CONTENTS Statement
ODS Table Name | Description | When Generated
Attributes | Data set attributes | Unless you specify the SHORT option
EngineHost | Engine and operating environment information | Unless you specify the SHORT option
IntegrityConstraints | A detailed listing of integrity constraints | If the data set has integrity constraints and you do not specify the SHORT option
IntegrityConstraintsShort | A concise listing of integrity constraints | If the data set has integrity constraints and you specify the SHORT option
Indexes | A detailed listing of indexes | If the data set is indexed and you do not specify the SHORT option
IndexesShort | A concise listing of indexes | If the data set is indexed and you specify the SHORT option
Position | A detailed listing of variables by logical position in the data set | If you specify the VARNUM option and you do not specify the SHORT option
PositionShort | A concise listing of variables by logical position in the data set | If you specify the VARNUM option and the SHORT option
PositionVarchar | Position including varchar type | If a varchar is in the data set, you specify VARNUM, and you do not specify the SHORT option
Sortedby | Detailed sort information | If the data set is sorted and you do not specify the SHORT option
SortedbyShort | Concise sort information | If the data set is sorted and you specify the SHORT option
Variables | A detailed listing of variables in alphabetical order | Unless you specify the SHORT option
VariablesVarchar | Variables including varchar type | If a varchar is in the data set and you do not specify the SHORT option
1 For PROC DATASETS, if both the NOLIST option and either the DIRECTORY option or DATA=<libref.>_ALL_ are specified,
then the NOLIST option is ignored.
COLLATE
the collating sequence used to sort the data set. A blank appears if the sort
indicator for the input data set does not include a collating sequence.
COMPRESS
indicates whether the data set is compressed.
CRDATE
date the data set was created.
DELOBS
number of observations marked for deletion in the data set. (Observations can
be marked for deletion but not actually deleted when you use the FSEDIT
procedure of SAS/FSP software.)
ENCRYPT
indicates whether the data set is encrypted.
ENGINE
name of the method used to read from and write to the data set.
FLAGS
indicates whether the variables in an SQL view are protected (P) or contribute
(C) to a derived variable.
P
indicates the variable is protected. The value of the variable can be
displayed but not updated.
C
indicates that the variable contributes to a derived variable.
IDXCOUNT
number of indexes for the data set.
IDXUSAGE
use of the variable in indexes. Possible values are
NONE
the variable is not part of an index.
SIMPLE
the variable has a simple index. No other variables are included in the index.
COMPOSITE
the variable is part of a composite index.
BOTH
the variable has a simple index and is part of a composite index.
INFORMAT
variable informat. The value is a blank if you do not associate an informat with
the variable.
INFORMD
number of decimals that you specify when you associate the informat with the
variable. The value is 0 if you do not specify decimals when you associate the
informat with the variable.
INFORML
informat length. If you specify a length for the informat when you associate the
informat with a variable, the length that you specify is the value of INFORML. If
you do not specify a length for the informat when you associate the informat
with a variable, the value of INFORML is the default length of the informat if
you use the FMTLEN option and 0 if you do not use the FMTLEN option.
JUST
justification (0=left, 1=right).
LABEL
variable label (blank if none given).
LENGTH
variable length.
LIBNAME
libref used for the data library.
MEMLABEL
label for this SAS data set (blank if no label).
MEMNAME
SAS data set that contains the variable.
MEMTYPE
library member type (DATA or VIEW).
MODATE
date the data set was last modified.
NAME
variable name.
NOBS
number of observations in the data set.
NODUPKEY
indicates whether the NODUPKEY option was used in a PROC SORT statement
to sort the input data set.
NODUPREC
indicates whether the NODUPREC option was used in a PROC SORT statement
to sort the input data set.
NPOS
physical position of the first character of the variable in the data set.
POINTOBS
indicates whether the data set can be addressed by observation.
PROTECT
the first letter of the level of protection. The value for PROTECT is one or more
of the following:
A
indicates the data set is alter-protected.
R
indicates the data set is read-protected.
W
indicates the data set is write-protected.
REUSE
indicates whether the space made available when observations are deleted
from a compressed data set should be reused. If the data set is not compressed,
the REUSE variable has a value of NO.
SORTED
the value depends on the sorting characteristics of the input data set. Here are
the possible values:
. (period)
for not sorted.
0
for sorted but not validated.
1
for sorted and validated.
SORTEDBY
the value depends on that variable's role in the sort. Here are the possible
values:
. (period)
if the variable was not used to sort the input data set.
n
where n is an integer that denotes the position of that variable in the sort. A
negative value of n indicates that the data set is sorted by the descending
order of that variable.
TRANSCOD
indicates whether the variable is transcoded.
TYPE
type of the variable (1=numeric, 2=character).
TYPEMEM
special data set type (blank if no TYPE= value is specified).
VARNUM
variable number in the data set. Variables are numbered in the order in which
they appear.
The output data set is sorted by the variables LIBNAME and MEMNAME.
Note: The variable names are sorted so that the values X1, X2, and X10 are listed in
that order, not in the true collating sequence of X1, X10, X2. Therefore, if you want
to use a BY statement on MEMNAME in subsequent steps, run a PROC SORT step
on the output data set first. You can also use the NOTSORTED option in the BY
statement.
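A minimal sketch of the recommended pattern, assuming hypothetical copies of two SASHELP data sets in WORK: CONTENTS writes one observation per variable to an OUT= data set, PROC SORT orders it explicitly, and a BY-group DATA step counts the variables in each member.

```sas
/* Hypothetical members to describe */
data work.class_c;
   set sashelp.class;
run;

data work.shoes_c;
   set sashelp.shoes;
run;

proc datasets library=work nolist;
   /* One observation per variable per data set; no printed output */
   contents data=_all_ out=work.allvars noprint;
run;
quit;

/* Sort explicitly before BY-group processing */
proc sort data=work.allvars;
   by memname varnum;
run;

/* Count the variables in each data set */
data work.varcounts;
   set work.allvars;
   by memname;
   if first.memname then nvars = 0;
   nvars + 1;
   if last.memname then output;
   keep memname nvars;
run;
```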
Here is an example of an output data set created from the Group data set, which is
shown in “Example 5: Describing a SAS Data Set” on page 707 and in “Procedure
Output” on page 674.
Because Health.Grpout is large, the following output is shown in five sections.
LIBNAME
libref used for the data library.
MEMNAME
SAS data set that contains the variable.
MSG
the value of MESSAGE=, if it is used, in the IC CREATE statement.
MSGTYPE
the value is blank unless an integrity constraint is violated and you specified a
message.
NAME
the name of the index or integrity constraint.
NOMISS
contains YES if the NOMISS option is defined for the index.
NUMVALS
the number of distinct values in the index (displayed for centiles).
NUMVARS
the number of variables involved in the index or integrity constraint.
ONDELETE
for a foreign key integrity constraint, contains RESTRICT or SET NULL if
applicable (the ON DELETE option in the IC CREATE statement).
ONUPDATE
for a foreign key integrity constraint, contains RESTRICT or SET NULL if
applicable (the ON UPDATE option in the IC CREATE statement).
RECREATE
the SAS statement necessary to re-create the index or integrity constraint.
REFERENCE
for a foreign key integrity constraint, contains the name of the referenced data
set.
TYPE
the type. For an index, the value is “Index”; for an integrity constraint, the
value is the type of integrity constraint (Not Null, Check, Primary Key, and so
on).
UNIQUE
contains YES if the UNIQUE option is defined for the index.
UPERC
the percentage of the index that has been updated since the last refresh
(displayed for centiles).
UPERCMX
the percentage of the index update that triggers a refresh (displayed for
centiles).
WHERE
for a check integrity constraint, contains the WHERE statement.
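A minimal sketch, assuming a hypothetical copy of SASHELP.CLASS in WORK: a unique simple index is created on Name, and the CONTENTS statement in the same RUN group writes index information to an OUT2= data set with the variables listed above.

```sas
/* Hypothetical copy; the Name values in SASHELP.CLASS are unique */
data work.class_i;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_i;
   index create name / unique;
   /* OUT2= receives one observation per index or integrity constraint */
   contents data=class_i out2=work.idxinfo noprint;
run;
quit;
```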
Example 1: Removing All Labels and Formats in a Data Set 691
Details
This example demonstrates the following tasks:
n sets system options
n uses PROC CONTENTS to show the data set with and without labels and a format
Program
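options ls=79 nodate center;
libname mylib 'SAS-library';
/* CLSFMT is not defined in this example; this definition is a */
/* hypothetical stand-in so that the DATA step runs.             */
proc format;
   value clsfmt 1='Freshman' 2='Sophomore' 3='Junior' 4='Senior';
run;
title;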
data mylib.class;
format z clsfmt.;
label x='ID NUMBER'
y='AGE'
z='CLASS STATUS';
input x y z;
datalines;
1 20 4
2 18 1
;
proc contents data=mylib.class;
run;
proc datasets lib=mylib memtype=data;
modify class;
attrib _all_ label=' ';
attrib _all_ format=;
contents data=mylib.class;
run;
quit;
Program Description
Set the system options and the LIBNAME statement. In this example, the libref is
MyLib. The CENTER option centers SAS procedure output. The NODATE option
specifies that the date and the time are not printed. The LS= option specifies the
line size for the SAS log and for the output. A TITLE statement with no text removes
any existing titles in your SAS session.
options ls=79 nodate center;
title;
libname mylib 'SAS-library';
Create a data set named Class. Assign the CLSFMT. format to variable Z. CLSFMT. is
a user-defined format, so it must already exist in your format search path. Create
labels for variables X, Y, and Z.
data mylib.class;
format z clsfmt.;
label x='ID NUMBER'
y='AGE'
z='CLASS STATUS';
input x y z;
datalines;
1 20 4
2 18 1
;
Use PROC CONTENTS to view the contents of the data set before removing the
labels and format.
proc contents data=mylib.class;
run;
Within PROC DATASETS, remove all the labels and formats by using the MODIFY
statement and the ATTRIB statement. Use the CONTENTS statement within PROC
DATASETS to view the contents of the data set without the labels and formats.
proc datasets lib=mylib memtype=data;
modify class;
attrib _all_ label=' ';
attrib _all_ format=;
contents data=mylib.class;
run;
quit;
Output Examples
Output 17.14 CONTENTS Procedure for Class Data Set with Labels and Format
Output 17.15 CONTENTS Statement for Class Data Set without Labels and Format
Example 2: Manipulating SAS Files
Details
This example demonstrates the following tasks:
• changes the names of SAS files
Program
options pagesize=60 linesize=80 nodate pageno=1 source;
LIBNAME dest1 'SAS-library-1';
LIBNAME dest2 'SAS-library-2';
LIBNAME health 'SAS-library-3';
proc datasets library=health details;
Program Description
Set the system options and the LIBNAME statements. The SOURCE system option
writes the programming statements to the SAS log. The PAGESIZE= option specifies
the number of lines that compose a page of the SAS log and SAS output. The
LINESIZE= option specifies the line size for the SAS log and for SAS procedure
output. The NODATE option specifies that the date and the time are not printed.
The PAGENO= option specifies a beginning page number for the next page of
output. The LIBNAME statements assign the librefs Dest1, Dest2, and Health.
options pagesize=60 linesize=80 nodate pageno=1 source;
LIBNAME dest1 'SAS-library-1';
LIBNAME dest2 'SAS-library-2';
LIBNAME health 'SAS-library-3';
Specify the procedure input library, and add more details to the directory.
DETAILS prints these additional columns in the directory: Obs, Entries or Indexes,
Vars, and Label. All member types are available for processing because the
MEMTYPE= option does not appear in the PROC DATASETS statement.
proc datasets library=health details;
Delete two files in the library, and modify the names of a SAS data set and a
catalog. The DELETE statement deletes the Tension data set and the A2 catalog.
MT=CATALOG applies only to A2 and is necessary because the default member
type for the DELETE statement is DATA. The CHANGE statement changes the
name of the A1 catalog to Postdrug. The EXCHANGE statement exchanges the
names of the Weight and Bodyfat data sets. MEMTYPE= is not necessary in the
CHANGE or EXCHANGE statement because the default is MEMTYPE=ALL for each
statement.
delete tension a2(mt=catalog);
change a1=postdrug;
exchange weight=bodyfat;
Restrict processing to one member type, and move SAS views.
MEMTYPE=VIEW restricts processing to SAS views. MOVE specifies that all SAS
views named in the SELECT statements in this step be copied to the Dest1 data
library and then deleted from the Health data library.
copy out=dest1 move memtype=view;
Move the SAS view Spdata from the Health data library to the Dest1 data library.
select spdata;
Move the catalogs to another data library. The SELECT statement specifies that
the catalogs Etest1 through Etest5 be moved from the Health data library to the
Dest1 data library. MEMTYPE=CATALOG overrides the MEMTYPE=VIEW option in
the COPY statement.
select etest1-etest5 / memtype=catalog;
Exclude all files with specified criteria from processing. The EXCLUDE statement
excludes from the COPY operation all SAS files that begin with the letter D and the
other SAS files listed. All remaining SAS files in the Health data library are copied
to the Dest2 data library.
copy out=dest2;
exclude d: mlscl oxygen test2 vision weight;
quit;
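The SELECT and EXCLUDE statements above use three kinds of member-name items: an exact name (SPDATA), a numbered range (ETEST1-ETEST5), and a name prefix (D:). The following Python sketch models how such items match member names. It illustrates the matching rules under simplifying assumptions (case-insensitive names, ranges only of the form PREFIXn-PREFIXm); it is not SAS code, and the function names matches and copy_list are hypothetical.

```python
import re


def matches(name: str, item: str) -> bool:
    """Return True if a member name matches one SELECT or EXCLUDE item."""
    name, item = name.upper(), item.upper()
    if item.endswith(":"):
        # Name prefix, such as D:
        return name.startswith(item[:-1])
    rng = re.fullmatch(r"([A-Z_]+)(\d+)-\1(\d+)", item)
    if rng:
        # Numbered range, such as ETEST1-ETEST5
        prefix, lo, hi = rng.group(1), int(rng.group(2)), int(rng.group(3))
        return any(name == f"{prefix}{n}" for n in range(lo, hi + 1))
    # Exact member name, such as MLSCL
    return name == item


def copy_list(members: list[str], exclude: list[str]) -> list[str]:
    """Members that a COPY step with this EXCLUDE list would still copy."""
    return [m for m in members if not any(matches(m, item) for item in exclude)]
```

With the EXCLUDE list from this example, copy_list(["ALL", "DRUG1", "MLSCL", "PHARM", "WEIGHT"], ["d:", "mlscl", "oxygen", "test2", "vision", "weight"]) keeps only ALL and PHARM. Note that ETESTS does not fall in the range ETEST1-ETEST5.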
Log Examples
Example Code 17.2 SAS Log for Dest1
Libref HEALTH
Engine V9
Physical Name \myfiles\health
Filename \myfiles\health
#  Name  Member Type  Obs, Entries or Indexes  Vars  Label
1 A1 CATALOG 23
2 ALL DATA 23 17
3 BODYFAT DATA 1 2
4 CONFOUND DATA 8 4
5 CORONARY DATA 39 4
6 DRUG1 DATA 6 2 JAN2005 DATA
7 DRUG2 DATA 13 2 MAY2005 DATA
8 DRUG3 DATA 11 2 JUL2005 DATA
9 DRUG4 DATA 7 2 JAN2002 DATA
10 DRUG5 DATA 1 2 JUL2002 DATA
11 ETEST1 CATALOG 1
12 ETEST2 CATALOG 1
13 ETEST3 CATALOG 1
14 ETEST4 CATALOG 1
15 ETEST5 CATALOG 1
16 ETESTS CATALOG 1
17 FORMATS CATALOG 6
18 GROUP DATA 148 11
19 GRPOUT DATA 11 40
20 INFANT DATA 149 6
21 MLSCL DATA 32 4 Multiple Sclerosis Data
22 NAMES DATA 7 4
23 OXYGEN DATA 31 7
24 PERSONL DATA 148 11
25 PHARM DATA 6 3 Sugar Study
26 POINTS DATA 6 6
27 RESULTS DATA 10 5
28 SLEEP DATA 108 6
29 SPDATA VIEW . 2
30 TEST2 DATA 15 5
31 TRAIN DATA 7 2
32 VISION DATA 16 3
33 WEIGHT DATA 83 13 California Results
34 WGHT DATA 83 13
#  File Size  Last Modified
1 62464 07Mar05:14:36:20
2 13312 12Sep07:13:57:48
3 5120 12Sep07:13:57:48
4 5120 12Sep07:13:57:48
5 5120 12Sep07:13:57:48
6 5120 12Sep07:13:57:49
7 5120 12Sep07:13:57:49
8 5120 12Sep07:13:57:49
9 5120 12Sep07:13:57:49
10 5120 12Sep07:13:57:49
11 17408 04Jan02:14:20:16
12 17408 04Jan02:14:20:16
13 17408 04Jan02:14:20:16
14 17408 04Jan02:14:20:16
15 17408 04Jan02:14:20:16
16 17408 24Mar05:16:12:20
17 17408 24Mar05:16:12:20
18 25600 12Sep07:13:57:50
19 17408 24Mar05:15:33:31
20 17408 12Sep07:13:57:51
21 5120 12Sep07:13:57:50
22 5120 12Sep07:13:57:50
23 9216 12Sep07:13:57:50
24 25600 12Sep07:13:57:51
25 5120 12Sep07:13:57:51
26 5120 12Sep07:13:57:51
27 5120 12Sep07:13:57:52
28 9216 12Sep07:13:57:52
29 5120 24Mar05:16:12:21
30 5120 12Sep07:13:57:52
31 5120 12Sep07:13:57:53
32 5120 12Sep07:13:57:53
33 13312 12Sep07:13:57:53
34 13312 12Sep07:13:57:53
122 delete tension a2(mt=catalog);
123 change a1=postdrug;
124 exchange weight=bodyfat;
NOTE: Changing the name HEALTH.A1 to HEALTH.POSTDRUG (memtype=CATALOG).
NOTE: Exchanging the names HEALTH.WEIGHT and HEALTH.BODYFAT (memtype=DATA).
125 copy out=dest1 move memtype=view;
126 select spdata;
127
128 select etest1-etest5 / memtype=catalog;
NOTE: Moving HEALTH.SPDATA to DEST1.SPDATA (memtype=VIEW).
NOTE: Moving HEALTH.ETEST1 to DEST1.ETEST1 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST2 to DEST1.ETEST2 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST3 to DEST1.ETEST3 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST4 to DEST1.ETEST4 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST5 to DEST1.ETEST5 (memtype=CATALOG).
Example 3: Saving SAS Files from Deletion
SAVE statement
OPTIONS statement
Details
This example demonstrates how to use the SAVE statement to save some SAS files
from deletion and to delete other SAS files.
Program
options pagesize=40 linesize=80 nodate pageno=1 source;
LIBNAME elder 'SAS-library';
proc datasets lib=elder;
save chronic aging clinics / memtype=data;
run;
Program Description
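Set the system options and the LIBNAME statement. The SOURCE system option
writes the programming statements to the SAS log. The PAGESIZE= option specifies
the number of lines that compose a page of the SAS log and SAS output. The
LINESIZE= option specifies the line size for the SAS log and for SAS procedure
output. The NODATE option specifies that the date and the time are not printed.
The PAGENO= option specifies a beginning page number for the next page of
output.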
options pagesize=40 linesize=80 nodate pageno=1 source;
LIBNAME elder 'SAS-library';
Save the data sets Chronic, Aging, and Clinics, and delete all other SAS files (of all
types) in the Elder library. MEMTYPE=DATA is necessary because the Elder library
has both a catalog named Clinics and a data set named Clinics. Without it, the
Clinics catalog would also be saved.
save chronic aging clinics / memtype=data;
run;
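To make the effect of MEMTYPE=DATA concrete, the following Python sketch models SAVE as a filter over (name, member type) pairs: everything not kept is deleted. It is a hypothetical model, not SAS code, and the member ADDRESS is invented for illustration. Passing no memtype corresponds to the default of all member types, in which case the Clinics catalog survives as well.

```python
def apply_save(library, keep, memtype=None):
    """Members left after SAVE keep... </ MEMTYPE=memtype>; all others are deleted.

    library is a list of (name, member_type) pairs; memtype=None means all types.
    """
    keep_names = {name.upper() for name in keep}
    return [
        (name, mtype)
        for name, mtype in library
        if name.upper() in keep_names
        and (memtype is None or mtype.upper() == memtype.upper())
    ]


elder = [("CHRONIC", "DATA"), ("AGING", "DATA"), ("CLINICS", "DATA"),
         ("CLINICS", "CATALOG"), ("ADDRESS", "DATA")]
```

Calling apply_save(elder, ["chronic", "aging", "clinics"], memtype="data") keeps the three data sets and drops both the Clinics catalog and ADDRESS.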
Log Examples
Example Code 17.3 SAS Log for Elder Library
Output 17.16 Elder Library Before and After Using the SAVE Statement
Example 4: Modifying SAS Data Sets
Details
This example modifies two SAS data sets using the MODIFY statement and
statements subordinate to it. “Example 5: Describing a SAS Data Set” on page 707
shows the modifications to the Group data set.
Program
options pagesize=40 linesize=80 nodate pageno=1 source;
LIBNAME health 'SAS-library';
proc datasets library=health nolist;
modify group (label='Test Subjects' read=green sortedby=lname);
index create vital=(birth salary) / nomiss unique;
informat birth date7.;
format birth date7.;
label salary='current salary excluding bonus';
modify oxygen;
rename oxygen=intake;
label intake='Intake Measurement';
quit;
Program Description
Set the system options and the LIBNAME statement. The SOURCE system option
writes the programming statements to the SAS log. The PAGESIZE= option specifies
the number of lines that compose a page of the SAS log and SAS output. The
LINESIZE= option specifies the line size for the SAS log and for SAS procedure
output. The NODATE option specifies that the date and the time are not printed.
The PAGENO= option specifies a beginning page number for the next page of
output.
options pagesize=40 linesize=80 nodate pageno=1 source;
LIBNAME health 'SAS-library';
Specify Health as the procedure input library to process. NOLIST suppresses the
directory listing for the Health data library.
proc datasets library=health nolist;
Add a label to a data set, assign a Read password, and record how the data are
sorted. LABEL= adds a data set label to the data set Group. READ= assigns green as
the Read password. The password appears as Xs in the SAS log. SAS issues a
warning message if you specify a level of password protection on a SAS file that
does not include alter protection. SORTEDBY= records that the data are sorted by
LNAME; it does not sort the data.
modify group (label='Test Subjects' read=green sortedby=lname);
Create the composite index VITAL on the variables BIRTH and SALARY for the
Group data set. NOMISS excludes all observations that have missing values for
BIRTH and SALARY from the index. UNIQUE specifies that the index is created
only if each observation has a unique combination of values for BIRTH and
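SALARY.
index create vital=(birth salary) / nomiss unique;
Assign the DATE7. informat and format to BIRTH, and assign a label to SALARY.
informat birth date7.;
format birth date7.;
label salary='current salary excluding bonus';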
Rename a variable, and assign a label. Modify the data set Oxygen by renaming the
variable OXYGEN to INTAKE and assigning a label to the variable INTAKE.
modify oxygen;
rename oxygen=intake;
label intake='Intake Measurement';
quit;
Log Examples
Example Code 17.4 SAS Log for Health Library
Example 5: Describing a SAS Data Set
Details
This example demonstrates the output from the CONTENTS statement for the
Group data set. The output shows the modifications made to the Group data set in
“Example 4: Modifying SAS Data Sets” on page 704.
Program
options pagesize=40 linesize=80 nodate pageno=1;
Program Description
Set the system options. The PAGESIZE= option specifies the number of lines that
compose a page of the SAS log and SAS output. The LINESIZE= option specifies the
line size for the SAS log and for SAS procedure output. The NODATE option specifies
that the date and the time are not printed. The PAGENO= option specifies a
beginning page number for the next page of output.
options pagesize=40 linesize=80 nodate pageno=1;
Specify Health as the procedure input library, and suppress the directory listing
with the NOLIST option.
proc datasets library=health nolist;
Create the output data set Grpout from the data set Group. Specify Group as the
data set to describe, supply its Read password with READ=, and name Grpout as the
OUT= data set, which receives a description of each variable in Group.
contents data=group (read=green) out=grpout;
title 'The Contents of the GROUP Data Set';
run;
quit;
Output Examples
Output 17.17 Contents of Group Data Set
Output 17.20 Alphabetic List of Extended Attributes for the Data Set and Variables
Example 6: Concatenating Two SAS Data Sets
Details
This example demonstrates the following tasks:
• suppresses the directory listing of a library
• prints the data sets before appending and prints the new data set after appending
To create the Exp.Results and Exp.Sur data sets and print them out before using
this example to concatenate them, see “EXP Library” on page 2789.
Program
options pagesize=40 linesize=64 nodate pageno=1;
Program Description
This example appends one data set to the end of another data set.
The data set Exp.Sur contains the variable Wt6Mos, but the Exp.Results data set
does not.
Set the system options. The NODATE option suppresses the display of the date
and time in the output. The PAGENO= option specifies the starting page number.
The LINESIZE= option specifies the output line length, and the PAGESIZE= option
specifies the number of lines on an output page.
options pagesize=40 linesize=64 nodate pageno=1;
Suppress the directory listing for the Exp library. LIBRARY= specifies Exp as the procedure
input library. NOLIST suppresses the directory listing for the Exp library.
proc datasets library=exp nolist;
Append the data set Exp.Sur to the Exp.Results data set. The APPEND statement
appends the data set Exp.Sur to the data set Exp.Results. FORCE causes the
APPEND statement to carry out the Append operation even though Exp.Sur has a
variable that Exp.Results does not. APPEND does not add the Wt6Mos variable to
Exp.Results.
append base=exp.results data=exp.sur force;
run;
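The following Python sketch models the APPEND rules described above: variables that exist only in the DATA= data set stop the append unless FORCE is specified, and with FORCE they are dropped; BASE= variables that are absent from DATA= receive missing values (None here). It is a simplified model, not SAS code. The variable names id, wt, and wt6mos are hypothetical stand-ins, and other situations that require FORCE (such as type or length mismatches) are not modeled.

```python
def proc_append(base_vars, base_rows, data_rows, force=False):
    """Append DATA= rows to BASE= rows while keeping the BASE= variables."""
    extra = {v for row in data_rows for v in row} - set(base_vars)
    if extra and not force:
        # Without FORCE, nothing is appended.
        raise ValueError(f"DATA= has variables not in BASE=: {sorted(extra)}")
    # Variables only in DATA= are dropped; BASE= variables absent from DATA= get None.
    appended = [{v: row.get(v) for v in base_vars} for row in data_rows]
    return base_rows + appended
```

For example, appending {"id": 2, "wt": 160, "wt6mos": 155} to a base with variables id and wt succeeds only with force=True, and the appended row is {"id": 2, "wt": 160}, just as Wt6Mos is not added to Exp.Results.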
Output 17.23 Concatenating the Results and the Sur Data Sets
Details
This example demonstrates how the AGE statement ages SAS files.
Program
options pagesize=40 linesize=80 nodate pageno=1 source;
Program Description
Set the system options and the LIBNAME statement. The SOURCE system option
writes the programming statements to the SAS log. The PAGESIZE= option specifies
the number of lines that compose a page of the SAS log and SAS output. The
LINESIZE= option specifies the line size for the SAS log and for SAS procedure
output. The NODATE option specifies that the date and the time are not printed.
The PAGENO= option specifies a beginning page number for the next page of
output.
options pagesize=40 linesize=80 nodate pageno=1 source;
LIBNAME daily 'SAS-library';
Specify Daily as the procedure input library and suppress the directory listing.
proc datasets library=daily nolist;
Delete and age. Delete the last SAS file in the list, Day7, and then age (rename)
Day6 to Day7, Day5 to Day6, and so on, until Today is aged to Day1.
age today day1-day7;
run;
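The order of operations in the AGE statement matters: the last file is deleted first, and the renaming then works backward through the list so that no name is overwritten before its contents have moved. The following Python sketch models that sequence on a dictionary that maps member names to contents; age_members is a hypothetical helper, not a SAS interface.

```python
def age_members(library, names):
    """Model of AGE names[0] names[1] ... names[-1].

    The last name is deleted, then each member is renamed to the next
    name in the list, starting from the end of the list.
    """
    lib = dict(library)
    lib.pop(names[-1], None)  # Day7 is deleted first
    for newer, older in reversed(list(zip(names[:-1], names[1:]))):
        if newer in lib:
            lib[older] = lib.pop(newer)  # Day6 -> Day7, ..., Today -> Day1
    return lib


names = ["today"] + [f"day{i}" for i in range(1, 8)]
library = {n: f"contents of {n}" for n in names}
aged = age_members(library, names)
```

After the call, day1 holds the former contents of today, day7 holds the former contents of day6, and the name today no longer exists.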
Log Examples
Example Code 17.5 SAS Log
Details
This example demonstrates the following tasks:
• initiates an audit file
Program
libname mylib "SAS-library";
data mylib.inventory;
input vendor $10. +1 item $4. +1 description $11. +1 units 4.;
datalines;
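SmithFarms F001 Apples        10
Tropicana  B002 OrangeJuice   45
UpperCrust C215 WheatBread    25
;
run;
proc datasets lib=mylib;
audit inventory;
initiate;
user_var reason $ 30;
quit;
proc sql;
Insert into mylib.inventory values ('Bordens','B132', 'Milk', 100,
'increase on hand');
Update mylib.inventory set units=10, reason='recounted inventory'
where item='B002';
quit;
proc sql;
select * from mylib.inventory(type=audit);
quit;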
Program Description
data mylib.inventory;
input vendor $10. +1 item $4. +1 description $11. +1 units 4.;
datalines;
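SmithFarms F001 Apples        10
Tropicana  B002 OrangeJuice   45
UpperCrust C215 WheatBread    25
;
run;
Initiate an audit file for the Inventory data set, and define REASON as a user
variable.
proc datasets lib=mylib;
audit inventory;
initiate;
user_var reason $ 30;
quit;
proc sql;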
Insert into mylib.inventory values ('Bordens','B132', 'Milk', 100,
'increase on hand');
Update mylib.inventory set units=10, reason='recounted inventory'
where item='B002';
quit;
proc sql;
select * from mylib.inventory(type=audit);
quit;
Log Examples
Example Code 17.6 Initiating an Audit File
1 options nocenter;
2
3 libname mylib "SAS-library";
NOTE: Libref MYLIB was successfully assigned as follows:
Engine: V9
Physical Name: c:\mylib
4
5 data mylib.inventory;
6 input vendor $10. +1 item $4. +1 description $11. +1 units 4.;
7 datalines;
11 ;
12 run;
13
14 proc datasets lib=mylib;
NOTE: Writing HTML Body file: sashtml.htm
15 audit inventory;
16 initiate;
WARNING: The audited data file MYLIB.INVENTORY.DATA is not password protected.
Apply an Alter password to prevent accidental
deletion or replacement of it and any associated audit files.
17 user_var reason $ 30;
18 quit;
19
20 proc sql;
21 Insert into mylib.inventory values ('Bordens','B132', 'Milk', 100,
22 'increase on hand');
NOTE: 1 row was inserted into MYLIB.INVENTORY.
25 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2.57 seconds
cpu time 0.03 seconds
26
27 proc datasets lib=mylib;
28 audit inventory;
29 log admin_image=no;
30 suspend;
31 quit;
32
33 proc sql;
34 select * from mylib.inventory(type=audit);
35 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.54 seconds
cpu time 0.01 seconds
36
37 proc datasets lib=mylib;
37 ! /* resume audit file */
38 audit inventory;
39 resume;
40 quit;
41
42 /* additional step(s) which update the inventory dataset could go here*/
43
44 proc datasets lib=mylib;
44 ! /* terminate audit file */
45 audit inventory;
46 terminate;
NOTE: Deleting MYLIB.INVENTORY (memtype=AUDIT).
47 quit;
Output Examples
Output 17.24 Inventory Contents for MyLib Library
Example 9: Extended Attributes
Program
libname mylib 'C:\mylib';
data mylib.sales;
purchase="car";
age=10;
income=200000;
kids=3;
cars=4;
run;
proc datasets lib=mylib nolist;
modify sales;
xattr add ds role="train" attrib="table";
xattr add var purchase (role="target" level="nominal")
age (role="reject")
income (role="input" level="interval");
contents data=sales;
title 'The Contents of the Sales Data Set That Contains Extended Attributes';
run;
quit;
Program Description
data mylib.sales;
purchase="car";
age=10;
income=200000;
kids=3;
cars=4;
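run;
proc datasets lib=mylib nolist;
modify sales;
xattr add ds role="train" attrib="table";
xattr add var purchase (role="target" level="nominal")
age (role="reject")
income (role="input" level="interval");
contents data=sales;
title 'The Contents of the Sales Data Set That Contains Extended Attributes';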
run;
quit;
Log Examples
Example Code 17.7 Extended Attributes
364
365 proc datasets lib=mylib nolist ;
366 modify sales;
367 xattr add ds role= "train" attrib="table";
368 xattr add var purchase ( role="target" level="nominal" )
369 age ( role="reject" )
370 income ( role="input" level="interval" );
NOTE: MODIFY was successful for MYLIB.SALES.DATA.
371
372 contents data=sales;
373 title 'The Contents of the Sales Data Set That Contains Extended
Attributes';
374
375 run;
376 quit;
Output Examples
Output 17.27 Contents of the Sales Data Set with Extended Attributes
Chapter 18
DATEKEYS Procedure
The DATEKEYS procedure can be used to identify time periods that are associated
with a datekey.
Some common uses of the DATEKEYS procedure are to define datekeys that
identify holiday periods, sales events, or changes to operating hours.
A datekey has a name, a date or set of dates that are associated with the datekey,
and a set of qualifiers.
The DATEKEYS procedure writes its results to output data sets that other SAS
procedures can interpret. When such a data set is specified with the SAS system
option EVENTDS=, the datekeys that you define can be used as date keywords, just
as SAS predefined datekeys are used.
The following example creates datekeys, and then writes the datekey definitions to
an output data set named MyHolidays. The datekey definitions are automatically
available to SAS High-Performance Forecasting procedures by setting the SAS
system option EVENTDS=.
proc datekeys;
datekeydef SuperBowl=
'15JAN1967'd '14JAN1968'd '12JAN1969'd '11JAN1970'd
'17JAN1971'd '16JAN1972'd '14JAN1973'd '13JAN1974'd '12JAN1975'd
'18JAN1976'd '09JAN1977'd '15JAN1978'd '21JAN1979'd '20JAN1980'd
'25JAN1981'd '24JAN1982'd '30JAN1983'd '22JAN1984'd '20JAN1985'd
'26JAN1986'd '25JAN1987'd '31JAN1988'd '22JAN1989'd '28JAN1990'd
'27JAN1991'd '26JAN1992'd '31JAN1993'd '30JAN1994'd '29JAN1995'd
'28JAN1996'd '26JAN1997'd '25JAN1998'd '31JAN1999'd '30JAN2000'd
'28JAN2001'd '03FEB2002'd '26JAN2003'd '01FEB2004'd '06FEB2005'd
'05FEB2006'd '04FEB2007'd '03FEB2008'd '01FEB2009'd '07FEB2010'd
'06FEB2011'd '05FEB2012'd '03FEB2013'd '02FEB2014'd '01FEB2015'd
'07FEB2016'd '05FEB2017'd '04FEB2018'd '03FEB2019'd '02FEB2020'd
/ PULSE=DAY ;
datekeydata out=MyHolidays;
run;
options eventds=(MyHolidays);
The following statements display an output data set that shows variables for the
Super Bowl, Good Friday, and Easter Monday events. The first output shows the
results when Month=1. The second output shows the results when Month GE 3.
run;
PROC DATEKEYS <options>;
BY variable(s);
DATEKEYCALENDAR OUT=SAS-data-set <SUMMARY=SAS-variable-name>;
DATEKEYDATA IN=SAS-data-set | OUT=SAS-data-set <options>;
DATEKEYDEF SAS-variable-name=timing-value </qualifier-options>;
DATEKEYDSOPT LOCALE=<(ONLY)>'POSIX locale';
DATEKEYKEY <SAS-variable-name=> datekey-keyword </qualifier-options>;
DATEKEYPERIODS OUT=SAS-data-set;
ID SAS-variable-name INTERVAL=interval <options>;
VAR variable(s);
PROC DATEKEYS: Creates and manages datekeys that are associated with time
computations (Ex. 1, Ex. 2, Ex. 3, Ex. 4)
DATEKEYCALENDAR: Writes variables that indicate the active time periods for
datekeys (Ex. 3, Ex. 4)
Syntax
PROC DATEKEYS <options>;
Optional Argument
option
The following options are available:
DATA=SAS-data-set
names the SAS data set that contains the variables that are used in the VAR,
ID, and BY statements.
Tip If the DATA= option is not specified, the most recently created SAS
data set is used.
LEAD=number-of-periods
specifies the number of periods to extend the calendar variables beyond the
input time ID. The LEAD= value is relative to the last observation in the input
data set. If BY variables are specified, the LEAD= value is relative to the last
observation in each BY group.
Default 0
MAXERROR=number
specifies the maximum number of warning and error messages that are
produced during the execution of the procedure.
Default 25
SORTNAMES
specifies that the datekeys and variables in the output data sets be written
in alphabetical order. Variables are sorted within their groups. Variables that
are listed in the VAR statement are sorted with respect to other variables
that are listed in the VAR statement. Calendar variables are sorted with
respect to other calendar variables.
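As a rough illustration of LEAD=, the following Python sketch appends future time IDs after the last observation of each BY group. It assumes a daily interval with sorted, evenly spaced IDs; the procedure itself uses the interval from the ID statement, which this model does not reproduce. The helper extend_time_ids and the group names are hypothetical.

```python
from datetime import date, timedelta


def extend_time_ids(ids_by_group, lead, step=timedelta(days=1)):
    """Add LEAD= future time IDs after the last observation of each BY group.

    Assumes each group's IDs are sorted and spaced exactly one step apart.
    """
    return {
        group: ids + [ids[-1] + step * k for k in range(1, lead + 1)]
        for group, ids in ids_by_group.items()
    }


groups = {"A": [date(2024, 1, 1), date(2024, 1, 2)], "B": [date(2024, 1, 1)]}
extended = extend_time_ids(groups, lead=2)
```

Because LEAD= is relative to the last observation of each BY group, group A is extended through January 4 and group B through January 3. With the default LEAD=0, nothing is added.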
BY Statement
Obtains separate calendar variables for groups of observations that are defined by the BY variables.
Syntax
BY variable(s);
Required Argument
variable
specifies a variable that is used to obtain separate calendar variables for groups
of observations that are defined by the BY variables.
Tip When a BY statement appears, the procedure expects the input data set to
be sorted in order of the BY variables. If your input data set is not sorted in
ascending order, use one of the following alternatives: sort the data by
using the SORT procedure with a similar BY statement, or create an index
for the BY variables by using the DATASETS procedure. For more
information, see the DATASETS procedure in Base SAS Procedures Guide.
DATEKEYCALENDAR Statement
Writes variables that indicate the active time periods for datekeys. The active periods are indicated
with a value of 1, and the inactive periods are indicated with a value of 0.
Syntax
DATEKEYCALENDAR OUT=SAS-data-set <SUMMARY=SAS-variable-name>;
Required Argument
OUT=SAS-data-set
names the output data set to contain the calendar variables for the specified
datekeys based on the ID information as specified in the ID statement.
Tip SAS-data-set also includes variables that are specified in the VAR, BY, and
ID statements.
Optional Argument
SUMMARY | SUM=SAS-variable-name
specifies that the datekey calendar variables be summed and that the result be
placed in the specified variable in the DATEKEYCALENDAR OUT= data set.
The SUM= variable can be interpreted as the number of datekeys that are active
for each time interval. If the SUM= option is not specified, no such variable is
included in the OUT= data set.
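The following Python sketch models the layout of the DATEKEYCALENDAR OUT= data set: one row per time ID, a 0/1 indicator for each datekey, and, when a summary name is given, the count of active datekeys. The datekey names GoodFriday and EasterWeekend and the summary name NActive are hypothetical, and the active days are supplied directly rather than computed by the procedure.

```python
from datetime import date, timedelta


def datekey_calendar(time_ids, datekeys, summary=None):
    """One row per time ID with 0/1 datekey indicators and an optional sum."""
    rows = []
    for t in time_ids:
        row = {"date": t}
        for name, active_days in datekeys.items():
            row[name] = 1 if t in active_days else 0
        if summary is not None:
            row[summary] = sum(row[name] for name in datekeys)
        rows.append(row)
    return rows


ids = [date(2013, 3, 28) + timedelta(days=n) for n in range(6)]
keys = {
    "GoodFriday": {date(2013, 3, 29)},
    "EasterWeekend": {date(2013, 3, 29) + timedelta(days=n) for n in range(4)},
}
calendar = datekey_calendar(ids, keys, summary="NActive")
```

On March 29, 2013 both datekeys are active, so NActive is 2; on March 28 and April 2 it is 0.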
DATEKEYDATA Statement
Inputs datekeys from a datekeys data set or writes datekeys to a datekeys data set. You can specify
multiple DATEKEYDATA statements.
Syntax
DATEKEYDATA IN=SAS-data-set | OUT=SAS-data-set <options>;
Required Arguments
IN=SAS-data-set
names an input data set that contains datekey definitions.
OUT=SAS-data-set
names the output data set to contain the datekey definitions that are specified
in the DATEKEYDATA IN= data sets and in the DATEKEYDEF and DATEKEYKEY
statements.
Tip If the LIST option is not specified, the OUT= data set can then be used in
other SAS procedures and system options to define datekeys.
Optional Argument
option
The following options are available:
CONDENSE
specifies that the DATEKEYDATA OUT= data set be condensed. Any
variables that contain only default values are omitted from the data set.
The DATEKEYDATA IN= option reads both condensed data sets and data
sets that have not been condensed. For more information, see “Identifying
Variables in the DATEKEYDATA OUT= Data Set” on page 757.
LIST
specifies that the DATEKEYDATA OUT= data set contain only a list of the
available datekeys. When you specify the LIST option, the output data set
does not contain the parameters that are required for datekey definition.
NODEFAULTS
specifies that the DATEKEYDATA OUT= data set not contain any SAS
predefined datekeys.
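The CONDENSE behavior can be pictured as dropping every column whose values all equal that column's default, with the reverse step filling omitted columns back in when a condensed data set is read. The Python sketch below uses hypothetical column names (name, shift, label) and defaults, and the filling-in step is an assumption about how omitted variables are interpreted; the actual variables are described in “Identifying Variables in the DATEKEYDATA OUT= Data Set” on page 757.

```python
def condense(rows, defaults):
    """Drop variables (columns) whose values all equal their default."""
    if not rows:
        return []
    keep = [
        col for col in rows[0]
        if col not in defaults or any(row[col] != defaults[col] for row in rows)
    ]
    return [{col: row[col] for col in keep} for row in rows]


def expand(rows, defaults):
    """Restore omitted variables with their default values."""
    return [{**defaults, **row} for row in rows]


defaults = {"shift": 0, "label": ""}
rows = [
    {"name": "GoodFriday", "shift": -2, "label": ""},
    {"name": "SuperBowl", "shift": 0, "label": ""},
]
small = condense(rows, defaults)
```

Here label is dropped because every value is the default, while shift is kept because one datekey uses a nondefault value.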
DATEKEYDEF Statement
Defines a datekey that can be interpreted in other SAS procedures. You can specify multiple
DATEKEYDEF statements.
Note: These datekeys can be used to create events that can be included in forecasting
models.
Syntax
DATEKEYDEF SAS-variable-name=timing-value </qualifier-options>;
Required Arguments
SAS-variable-name
specifies the name of the datekey that the DATEKEYDEF statement defines.
timing-value list
specifies one or more datekeys, dates, datetime values, or observation numbers.
You can also specify a value-list.
integer value-list
For integer variables, integer value-list is either an explicit list of one or more
integers or a starting value and an ending value with an interval increment, or
a combination of both forms:
• n <...n>
Here is an example:
10,11,12
• n TO n <BY increment>
Here is an example:
10 to 12 by 1
Here is an example:
11 to 5, 5 to 10 by 1;
"value-1" <"value-2"..."value-n">
Here is an example:
"INDEPENDENCE"
or
"INDEPENDENCE" "EASTER" "NEWYEARS"
'01Jan2000'd,'01Feb2000'd, '01Mar2000'd
or
'01Mar1990:15:03:00'dt
or
'01Jan:15:03:00'dt,'1Feb:15:03:00'dt, '01Mar:15:03:00'dt
Optional Argument
qualifier-options
The following qualifier options are available:
AFTER=(<DURATION=value>)
specifies options that control the datekey definition after the timing value.
The DURATION= suboption is used within the parentheses in the AFTER= ( )
option. DURATION specifies the datekey duration after the timing value.
BEFORE=(<DURATION=value>)
specifies options that control the datekey definition before the timing value.
The DURATION= suboption is used within the parentheses in the BEFORE=
( ) option. DURATION specifies the datekey duration before the timing value.
LABEL='SAS-label'
specifies a label that is associated with the datekey. 'SAS-label' is a text
string that is enclosed in quotation marks and can be up to 256 characters.
LOCALE='POSIX locale'
specifies a locale that is associated with the datekey. The locale should be a
POSIX locale value. There is no default for the locale value.
PERIOD=interval
specifies the interval for the frequency of the datekey. For example,
PERIOD=YEAR produces a datekey that is periodic in a yearly pattern.
If the PERIOD= option is omitted, the datekey is not periodic. The PERIOD=
option does not apply to observation numbers, which are not periodic, or to
date keywords, which have their own periodicity. For intervals that you can
specify, see Chapter 4, “Date Intervals, Formats, and Functions” in SAS/ETS
User’s Guide.
PULSE=interval
specifies the interval to be used with the DURATION= option to determine
the width of the datekey.
RULE=value
specifies the action to take when the defined datekey has multiple timing
values that include at least one datekey.
When the datekey timing values consist only of SAS date, SAS datetime
values, and observation numbers, the RULE= option does not apply. The
RULE= option also does not apply when the timing value list consists of a
single datekey. The RULE= option accepts the values AND and OR. The
default is RULE=OR.
The RULE= option does not apply to the first statement because the timing
value list consists of a single datekey. The RULE= option does not apply to
the second and third statements because the timing value list consists only
of SAS date values. The operation between two SAS date, datetime, or
observation values is always OR. In the fourth statement, the RULE=AND
option identifies only dates where the month is January and the day of the
week is Friday. In the fifth statement, the RULE=OR option identifies all
dates in January and all dates where the day of the week is Friday. In the
sixth statement, the RULE=AND option identifies days that are both rainy
and hot. The RULE=AND applies to the two datekeys, RainyDays and
HotDays.
Because an AND operation between two discrete time periods usually produces an
empty result, the OR operation is always used between discrete time periods. For
example, the timing values '13JUL2013'd, '01Mar1990:15:03:00'dt, and 3 are
combined with OR.
The following table explains how the RULE= option is interpreted for each
observation:
Table 18.1 Definition of RULE= Option Values
SHIFT=number
specifies the number of pulses to shift the timing value δ. The default is not
to shift the timing value (δ= 0). When the SHIFT= option is used, all timing
values in the list (including those that are generated by date keywords) are
shifted. Therefore, SHIFT= can be used with EASTER to specify
ecclesiastical holidays that are based on Easter. For example, the following
statement specifies Good Friday, which is defined as two days before Easter
(Montes 2001).
datekeydef GoodFriday=EASTER / shift=-2 pulse=day;
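The following Python sketch models three of the qualifier options for daily datekeys (PULSE=DAY): SHIFT= moves each timing value, BEFORE=(DURATION=) and AFTER=(DURATION=) widen it, and RULE= combines the active days of several datekeys by union (OR) or intersection (AND). It is a simplified model, not the procedure's implementation: the easter function (the anonymous Gregorian algorithm) stands in for the EASTER keyword, the treatment of durations as whole-day widening is an assumption, and PERIOD= and LOCALE= are ignored. The function names are hypothetical.

```python
from datetime import date, timedelta


def easter(year: int) -> date:
    """Gregorian Easter Sunday (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def active_days(timing, shift=0, before=0, after=0):
    """Days covered by one datekey with PULSE=DAY.

    Each timing value is moved by SHIFT= pulses (days), then widened by
    BEFORE=(DURATION=before) and AFTER=(DURATION=after) pulses.
    """
    days = set()
    for t in timing:
        center = t + timedelta(days=shift)
        for offset in range(-before, after + 1):
            days.add(center + timedelta(days=offset))
    return days


def combine(day_sets, rule="OR"):
    """RULE=OR: union of the datekeys' days; RULE=AND: intersection."""
    day_sets = list(day_sets)
    if rule.upper() == "AND":
        return set.intersection(*day_sets)
    return set.union(*day_sets)
```

For example, active_days([easter(2024)], shift=-2) gives Good Friday, March 29, 2024, which corresponds to the GoodFriday statement above. Combining "all January days" with "all Fridays" by RULE=AND keeps only the Fridays in January, whereas RULE=OR keeps every day that belongs to either set.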
DATEKEYDSOPT Statement
Limits the input and output processing of data sets to a specified locale.
Syntax
DATEKEYDSOPT LOCALE= 'POSIX locale'<(ONLY)>;
Required Arguments
LOCALE=
specifies a locale that is used to filter input and output data sets.
'POSIX locale'
specifies a POSIX locale value. There is no default for the locale value.
Optional Argument
(ONLY)
specifies to process only the specified locale, both for input and output data
sets. If (ONLY) is not specified, the specified locale and defaults (no specified
locale) are processed, both for input and output data sets.
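The (ONLY) rule can be expressed as a simple filter. In the Python sketch below, an empty locale marks a default datekey (no specified locale). The function filter_by_locale and the record layout are hypothetical, and exact string comparison of locale names is an assumption.

```python
def filter_by_locale(records, locale, only=False):
    """Return the datekey records that DATEKEYDSOPT LOCALE= would process.

    Without (ONLY), records for the locale and default records are kept;
    with (ONLY), just the records for the locale are kept.
    """
    selected = []
    for rec in records:
        rec_locale = rec.get("locale") or ""
        if rec_locale == locale or (not only and rec_locale == ""):
            selected.append(rec)
    return selected


records = [
    {"name": "A", "locale": "en_US"},
    {"name": "B", "locale": "fr_FR"},
    {"name": "C", "locale": ""},
]
```

With LOCALE='en_US', records A and C are processed; adding (ONLY) leaves just A.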
DATEKEYKEY Statement
Alters a user-defined or predefined SAS datekey, or creates a new datekey from another datekey.
You can specify multiple DATEKEYKEY statements.
Syntax
DATEKEYKEY <SAS-variable-name=> datekey-keyword </qualifier-options>;
Required Argument
datekey-keyword
specifies the default SAS variable name for a user-defined or predefined
datekey.
Optional Arguments
SAS-variable-name
specifies the name of the new datekey keyword.
qualifier-option
The following options are available:
AFTER=(<DURATION=value>)
specifies options that control the datekey definition after the timing value.
The DURATION= suboption is used within the parentheses in the AFTER ( )
option.
DURATION specifies the datekey duration after the timing value when used
in the AFTER= option.
BEFORE=(<DURATION=value>)
specifies options that control the datekey definition before the timing value.
The DURATION= suboption is used within the parentheses in the BEFORE ( )
option.
DURATION specifies the datekey duration before the timing value when
used in the BEFORE= option.
LABEL='SAS-label'
specifies a label that is associated with the datekey. 'SAS-label' is a text
string that is enclosed in quotation marks and can be up to 256 characters.
The default label is 'SAS-variable-name', where SAS-variable-name is the
name that is specified in the DATEKEYKEY statement. If SAS-variable-name
is not specified in the DATEKEYKEY statement, then the label is the default
label for the SAS predefined datekey. The label is stored in the
DATEKEYDATA OUT= data set.
LOCALE='POSIX locale'
specifies a locale that is associated with the datekey. The locale should be a
POSIX locale value. There is no default for the locale value.
PERIOD=interval
specifies the interval for the frequency of the datekey. For example,
PERIOD=YEAR produces a datekey that is periodic in a yearly pattern. If the
PERIOD= option is omitted, the datekey is not periodic. The PERIOD= option
does not apply to observation numbers, which are not periodic, or to date
keywords, which have their own periodicity. For intervals that can be
specified, see Chapter 4, “Date Intervals, Formats, and Functions” in
SAS/ETS User’s Guide.
PULSE=interval
specifies the interval to be used with the DURATION= option to determine
the width of the datekey. If the datekey is evaluated with respect to a time
ID variable, then the default pulse is one observation. When no DURATION=
values are specified and the PULSE= option is specified, the DURATION=
values are set to zero. For intervals that can be specified, see Chapter 4,
“Date Intervals, Formats, and Functions” in SAS/ETS User’s Guide.
RULE=value
specifies the action to take when the defined datekey has multiple timing
values that include at least one datekey. When the datekey timing values
consist only of SAS date, SAS datetime, and observation numbers, the
RULE= option does not apply. The RULE= option also does not apply when
the timing value list consists of a single datekey. The RULE= option accepts
the values AND and OR. The default is RULE=OR. The following examples
demonstrate the RULE= option:
datekeykey JANUARY / pulse=month;
datekeydef RainyDays='11JUL2013'd '13JUL2013'd '21JUL2013'd;
datekeydef HotDays= '11JUL2013'd '16JUL2013'd '17JUL2013'd
'18JUL2013'd '19JUL2013'd;
datekeydef FridaysInJanuary=JANUARY FRIDAY / rule=and;
datekeydef JanuaryPlusFridays=JANUARY FRIDAY / rule=or;
datekeydef HotandRainyDays=RainyDays HotDays / rule=and;
The RULE= option does not apply to the first statement because the timing
value list consists of a single datekey. The RULE= option does not apply to
the second and third statements because the timing value list consists only
of SAS date values. The operation between two SAS date, datetime, or
observation values is always OR. In the fourth statement, the RULE=AND
option identifies dates that are in January and fall on a Friday. In the fifth
statement, the RULE=OR option identifies all dates in January and all
Fridays. In the sixth statement, the RULE=AND option identifies days that are
both rainy and hot. RULE=AND applies to the two datekeys, RainyDays and HotDays.
Usually, the result of an AND operation between two discrete time periods is
empty. Therefore, the OR operation is always used between discrete time
periods, such as '13JUL2013'D, '01Mar1990:15:03:00'DT, and observation 3.
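One way to read RULE= is as set algebra over active dates. RULE=AND behaves as an intersection and RULE=OR as a union. The following Python sketch models the fourth, fifth, and sixth statements above. It is not SAS code. For illustration, it assumes that the JANUARY and FRIDAY datekeys are evaluated only over calendar year 2013.

```python
from datetime import date, timedelta


def days_in_year(year: int) -> list[date]:
    """Every calendar date in the given year (the assumed evaluation range)."""
    start = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - start).days
    return [start + timedelta(days=i) for i in range(n_days)]


year_2013 = days_in_year(2013)
january = {d for d in year_2013 if d.month == 1}     # JANUARY with PULSE=MONTH
friday = {d for d in year_2013 if d.weekday() == 4}  # FRIDAY (Python: Monday=0, Friday=4)

rainy_days = {date(2013, 7, 11), date(2013, 7, 13), date(2013, 7, 21)}
hot_days = {date(2013, 7, day) for day in (11, 16, 17, 18, 19)}

fridays_in_january = january & friday      # RULE=AND -> intersection
january_plus_fridays = january | friday    # RULE=OR  -> union
hot_and_rainy_days = rainy_days & hot_days  # RULE=AND between two datekeys

print(sorted(fridays_in_january))
print(len(january_plus_fridays), sorted(hot_and_rainy_days))
```

Only 11 July 2013 is both rainy and hot. The union contains 31 January dates plus the 52 Fridays of 2013, with the 4 January Fridays counted once.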
Table 18.55 on page 739 explains how the RULE= option is interpreted for
each observation.
SHIFT=number
specifies the number of pulses to shift the timing value δ. The default is not
to shift the timing value (δ=0). When the SHIFT= option is used, all timing
values in the list (including those generated by date keywords) are shifted.
DATEKEYPERIODS Statement
Writes variables that list the active time periods in the input time ID for datekeys. The active period
dates, datetime values, or observation numbers are listed with the associated datekey.
Syntax
DATEKEYPERIODS OUT=SAS-data-set;
Required Argument
OUT=SAS-data-set
names the output data set to contain the active dates, datetime values, or
observations for the specified datekeys based on the ID information that is
specified in the ID statement. The OUT= data set also includes variables that
are specified in the BY and ID statements.
ID Statement
Specifies a numeric variable that identifies observations in the input and output data sets.
Syntax
ID SAS-variable-name INTERVAL=interval <options>;
Required Arguments
SAS-variable-name
specifies a numeric variable that identifies observations in the input and output
data sets. SAS-variable-name can be a SAS date, time, datetime value, or an
observation number.
INTERVAL=interval
specifies the frequency of the input time ID. For example, if the time ID in the
input data set consists of quarterly observations, then use INTERVAL=QTR. For
intervals that can be specified, see Chapter 4, “Date Intervals, Formats, and
Functions” in SAS/ETS User’s Guide.
Optional Argument
option
The following options are available:
ALIGN=option
controls the alignment of SAS dates that are used to identify output
observations. The ALIGN= option accepts the following values: BEGINNING
| BEG | B, MIDDLE | MID | M, and ENDING | END | E.
Default BEGINNING
END=option
specifies a SAS date, datetime, or time value that represents the end of the
data. If the last time ID variable value is less than the END= value, the
variables in the VAR statement are extended with missing values. If the last
time ID variable value is greater than the END= value, the variables are
truncated. For example, END="&sysdate"d uses the automatic macro
variable SYSDATE to extend or truncate the variables to the current date.
This option and the START= option can be used to ensure that data
associated with each BY group contains the same number of observations.
FORMAT=format
specifies the SAS format for the time ID values. If the FORMAT= option is
not specified, the default format is implied from the INTERVAL= option.
START=option
specifies a SAS date, datetime, or time value that represents the beginning
of the data. If the first time ID variable value is greater than the START=
value, the variables in the VAR statement are prefixed with missing values. If
the first time ID variable value is less than the START= value, the variables
are truncated. This option and the END= option can be used to ensure that
data associated with each BY group contains the same number of
observations.
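The effect of START= and END= on a daily series can be sketched as reindexing onto a fixed span. The following Python model is not SAS code. Days in the span that have no observation become missing (None), and observations outside the span are dropped. The function name is hypothetical. The sketch assumes a DAY interval and an input series with no interior gaps.

```python
from datetime import date, timedelta
from typing import Optional


def align_daily(series: dict[date, float], start: date,
                end: date) -> list[tuple[date, Optional[float]]]:
    """Place a daily series on the span START= .. END=.

    Days in the span with no observation get None (missing), which models the
    extension; observations outside the span are dropped, which models truncation.
    Assumes the input series itself has no interior gaps.
    """
    span = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(span)]
    return [(d, series.get(d)) for d in days]


# Hypothetical series covering 02JAN2010 through 04JAN2010.
observed = {date(2010, 1, 2): 1.0, date(2010, 1, 3): 2.0, date(2010, 1, 4): 3.0}
aligned = align_daily(observed, start=date(2010, 1, 1), end=date(2010, 1, 3))
print(aligned)
```

Here START= is earlier than the first observation, so the series is prefixed with one missing value. END= is earlier than the last observation, so 04JAN2010 is truncated.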
VAR Statement
Copies input variables to the output calendar variables data set. If the VAR statement is omitted, all
numeric variables are selected except those that appear in a BY or ID statement.
Syntax
VAR variable(s);
Required Argument
variable
specifies numeric input variables to be copied to the output calendar variables
data set.
Statements
Miscellaneous Options
Extends the calendar variables past the end of the input time ID: PROC DATEKEYS LEAD=
Datekey Definitions
The purpose of a datekey definition is to define time periods that are associated
with the reference datekey. These time periods are then interpreted in other SAS
procedures that define time-dependent features such as events. The time period of
the datekey definition is compared to the time period of interest. If the time period
of interest falls within the range of the datekey definition, then appropriate action
is taken. The datekey definitions can be written to an output file by using the OUT=
option of the DATEKEYDATA statement.
Once a datekey has been defined, it is referenced using its SAS variable name.
When the datekey definition is written to an output file by using the
DATEKEYDATA statement, the datekey is identified by its SAS variable name. As
with SAS predefined datekeys, when an event is specified using a user-defined
datekey name, a dummy variable is created using the datekey definition. The
dummy variable name is the same as the datekey SAS reference name.
Each datekey must have a unique SAS variable name. If two datekey definitions
have the same name, the following rules apply:
- If two DATEKEYDEF statements use the same name, the second
statement is used.
- If a datekey is defined both in a DATEKEYDEF statement and in a data set
that is specified in the DATEKEYDATA statement, the definition in the
DATEKEYDEF statement is used.
- Any datekey that is defined in a DATEKEYDEF, DATEKEYKEY, or
DATEKEYDATA statement is used instead of a predefined SAS datekey.
The timing values are interpreted as follows: July 4 of any relevant year; the 10th
observation in a time series or data set; December 25, 2000; March 1, 1990 at
3:03PM; January 1, 2000; February 1, 2000; and March 1, 2000.
The timing-value list can be enclosed in parentheses, and commas can separate the
items in the list. Numbers must be integers and are always interpreted as
observation numbers. The value-list can be based on observation numbers, SAS
dates, or SAS datetime values. However, the first and second values in the list must
be of the same type. SAS always expects the type of the second value to be the
same as the type of the first value, and tries to interpret the statement in that way.
The following statement yields erratic results:
datekeydef baddatekey='01Jan2000'd to '01Mar2000:00:00:00'dt by month;
Either the DATEKEYS procedure produces a list much longer than expected or the
procedure does not have enough memory to execute.
Note: Do not mix date, datetime, integer, and value types in a value-list.
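The size problem in the bad example comes from the unit mismatch. A SAS date counts days since 01JAN1960, and a SAS datetime counts seconds since 01JAN1960:00:00:00. The following Python sketch is not SAS code. It converts both endpoints with hypothetical helper functions and shows how far away the END value lies when it is read as a date.

```python
from datetime import date, datetime

SAS_DATE_EPOCH = date(1960, 1, 1)
SAS_DATETIME_EPOCH = datetime(1960, 1, 1)


def sas_date(d: date) -> int:
    """SAS date value: number of days since 01JAN1960."""
    return (d - SAS_DATE_EPOCH).days


def sas_datetime(dt: datetime) -> int:
    """SAS datetime value: number of seconds since 01JAN1960:00:00:00."""
    return int((dt - SAS_DATETIME_EPOCH).total_seconds())


start = sas_date(date(2000, 1, 1))        # '01Jan2000'd
end = sas_datetime(datetime(2000, 3, 1))  # '01Mar2000:00:00:00'dt

# Read as a date (the type of the first value), END is more than a billion days
# after START, so stepping BY MONTH would require tens of millions of steps.
approx_monthly_steps = (end - start) / 30.4375
print(start, end, round(approx_monthly_steps))
```

START evaluates to 14610 and END to 1267488000. That difference explains the runaway list or the out-of-memory condition.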
The following table shows the holiday date keywords that can be used in a timing-
value list, and their definitions:
BOXING December 26
CANADA July 1
CHRISTMAS December 25
HALLOWEEN October 31
NEWYEAR January 1
USINDEPENDENCE July 4
VALENTINES February 14
VETERANS November 11
1. The date for Easter is calculated using a method described by Montes (2001).
The following table shows the seasonal date keywords that can be used in a timing
value list, and their definitions:
WEEK_1, ..., WEEK_53: the first day of the nth week of the year.
PULSE=WEEK.n shifts this date for n NE 1.
QTR_1, QTR_2, QTR_3, QTR_4: the first date of the quarter. PULSE=QTR.n
shifts this date for n NE 1.
Timing values are evaluated with respect to the application and relevant time
specified by the user. Select the timing values that are consistent with the usage. In
particular, date and datetime timing values are ignored when no date or time
information is specified, and only observation numbers are available for analysis.
3. If PERIOD= values are specified, then periodic values are generated based on
the timing values that result from steps 1 and 2.
The observation that is specified by the shifted timing value, ti, is the observation
that contains the date that is generated by
INTNX(interval,timing-value,s,'same'), where SHIFT=s and PULSE=interval. If
no PULSE= value is specified, the default is PULSE=OBS, which is equivalent to
PULSE=interval, where interval is the interval of the time ID.
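The shift rule relies on INTNX with 'same' alignment. The following Python sketch is a rough model of that function for three intervals only: DAY, WEEK, and MONTH. It is not the SAS implementation. For MONTH it keeps the same day of the month and clamps it to the month's last day, so shifting 31JAN2000 by one month gives 29FEB2000. The function name `intnx_same` is hypothetical.

```python
import calendar
from datetime import date, timedelta


def intnx_same(interval: str, timing: date, s: int) -> date:
    """Rough model of INTNX(interval, timing-value, s, 'same') for DAY, WEEK, MONTH."""
    if interval == "day":
        return timing + timedelta(days=s)
    if interval == "week":
        return timing + timedelta(weeks=s)  # same day of the week, s weeks away
    if interval == "month":
        total = timing.year * 12 + (timing.month - 1) + s
        year, month0 = divmod(total, 12)
        last_day = calendar.monthrange(year, month0 + 1)[1]
        return date(year, month0 + 1, min(timing.day, last_day))
    raise ValueError(f"interval not modeled: {interval}")


print(intnx_same("day", date(2010, 4, 4), -2))   # SHIFT=-2, PULSE=DAY
print(intnx_same("month", date(2000, 1, 31), 1))  # SHIFT=1, PULSE=MONTH
```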
Table 18.4 Calculating the Beginning and Ending Observations for Datekeys When
Applied to Time Series
BEFORE=(DURATION=value) | PULSE=value | Definition of tb
The following table shows active time periods for datekey definitions:
If an observation is not within an active period for a datekey, then it is not altered
by the datekey action.
The active period always occurs at the shifted timing value. In the following
example, you specify three observations before the timing value, one observation
at the timing value, and four observations after the timing value, for a total of
3+1+4=8 observations:
datekeydef E1='01JAN1950'd / before=(duration=3)
after=(duration=4);
In the next example, you specify three weeks before the timing value, the week of the
timing value, and four weeks after the timing value by using a combination of the
BEFORE=, AFTER=, and PULSE= options as follows:
datekeydef E1='01JAN1950'd / before=(duration=3)
after=(duration=4)
pulse=week;
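The two E1 definitions can be checked with a small Python model, which is not SAS code. The first function counts observations when the pulse is one observation (the timing observation number 10 is hypothetical). The second function assumes that weeks begin on Sunday and that the active span runs from the start of the week BEFORE= weeks earlier to the end of the week AFTER= weeks later. That is an interpretation for illustration. Table 18.4 gives the exact rule.

```python
from datetime import date, timedelta


def active_observations(t: int, before: int, after: int) -> list[int]:
    """Observations t-before .. t+after when the pulse is one observation."""
    return list(range(t - before, t + after + 1))


def active_weekly_span(timing: date, before: int, after: int) -> tuple[date, date]:
    """Assumed span for PULSE=WEEK with Sunday-based weeks."""
    week_start = timing - timedelta(days=(timing.weekday() + 1) % 7)  # back to Sunday
    first = week_start - timedelta(weeks=before)
    last = week_start + timedelta(weeks=after + 1) - timedelta(days=1)  # Saturday
    return first, last


print(active_observations(10, before=3, after=4))
print(active_weekly_span(date(1950, 1, 1), before=3, after=4))
```

Because 01JAN1950 fell on a Sunday, the assumed weekly span is 11DEC1949 through 04FEB1950, which is 8 weeks or 56 days.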
DURATION=ALL implies that the active period should be extended to the beginning
(BEFORE=) or end (AFTER=) of time. If only one DURATION= value is specified,
the other value is assumed to be zero. When neither DURATION= value is specified,
both DURATION= values are set to zero. DURATION=ALL is represented in the
datekey definition data set as a special missing value displayed as "A". For more
information, see “Missing Values” in SAS Language Reference: Concepts.
A user-defined datekey variable has timing values and qualifiers that are defined by
the user. A DATEKEYKEY variable that is defined using a predefined SAS datekey
has a predefined set of timing values and qualifiers that are associated with the
predefined datekey keyword. You can redefine the qualifiers by using the statement
options. The options are the same as in the DATEKEYDEF statement. As described in “Concepts:
The DATEKEYS Procedure” on page 728, the default SAS variable name for an
event based on a datekey is the datekey keyword. However, you can specify a
different SAS name for the datekey. For example, you can rename the CHRISTMAS
predefined datekey to XMAS by using the following statement:
datekeykey xmas=christmas;
If you redefine the qualifiers that are associated with a predefined SAS datekey and
do not rename the datekey, you effectively redefine the predefined SAS datekey,
because any user definition takes precedence over a SAS predefined definition.
The following example produces an event named
FALLHOLIDAYS with a pulse of 1 day at Halloween and a pulse of 1 month at
Thanksgiving:
datekeykey thanksgiving / pulse=month;
eventcomb fallholidays=halloween thanksgiving;
The following table describes how to construct a predefined SAS datekey keyword.
It also gives the default qualifier options for those predefined datekeys.
If the preceding datekeys are stored in a data set named SPRINGHOLIDAYS, the
first DATEKEYKEY statement in the following example clones SPRING as a datekey
named FirstDayOfSpring. The second DATEKEYKEY statement changes the case of
the SPRINGBREAK datekey name.
datekeydata in=springholidays;
datekeykey FirstDayOfSpring=spring;
datekeykey Springbreak=springbreak;
Datekey names that refer to a previously defined datekey are not case sensitive.
However, datekey names that are used to create a new datekey preserve the casing
in the _NAME_ variable of the DATEKEYDATA OUT= data set.
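The naming rule can be modeled as a lookup keyed on the uppercase name that stores the spelling used at creation. The following Python class is hypothetical and is not how SAS stores datekeys. It mirrors the two DATEKEYKEY statements above, with placeholder strings standing in for the real definitions.

```python
class DatekeyNames:
    """Hypothetical registry: lookups ignore case, stored names keep their casing."""

    def __init__(self) -> None:
        # upper-case key -> (name as written, definition)
        self._entries: dict[str, tuple[str, str]] = {}

    def define(self, name: str, definition: str) -> None:
        self._entries[name.upper()] = (name, definition)

    def clone(self, new_name: str, source: str) -> None:
        """Like DATEKEYKEY new_name=source: copy the definition, keep new_name's casing."""
        _, definition = self._entries[source.upper()]
        self.define(new_name, definition)

    def name_of(self, reference: str) -> str:
        return self._entries[reference.upper()][0]


names = DatekeyNames()
names.define("SPRING", "definition of SPRING")  # placeholder definitions
names.define("SPRINGBREAK", "definition of SPRINGBREAK")
names.clone("FirstDayOfSpring", "spring")       # reference is case-insensitive
names.clone("Springbreak", "springbreak")       # same datekey, new casing
print(names.name_of("firstdayofspring"), names.name_of("SPRINGBREAK"))
```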
Table 18.7 describes how date and observation numbers are encoded into AO, LS, TLS, NLS,
CBLS, and TC type predefined datekeys.
Table 18.7 Encoding Date Information into AO, LS, TLS, NLS, CBLS, and TC Type DATEKEYKEY
Variable Names
_DTINTRVL_
specifies the interval for the datetime value-list. The default for _DTINTRVL_ is
no interval, designated by ".".
_DUR_AFTER_
specifies the number of durations after the timing value. The default for
_DUR_AFTER_ is 0.
_DUR_BEFORE_
specifies the number of durations before the timing value. The default for
_DUR_BEFORE_ is 0.
_ENDDATE_
specifies the last date timing value to use in a value-list. The default for
_ENDDATE_ is no date, designated by a missing value.
_ENDDT_
specifies the last datetime timing value to use in a value-list. The default for
_ENDDT_ is no datetime, designated by a missing value.
_ENDOBS_
specifies the last observation number timing value to use in a value-list. The
default for _ENDOBS_ is no observation number, designated by a missing value.
_KEYNAME_
specifies either a predefined datekey keyword or a user-defined datekey
keyword. All _KEYNAME_ values are displayed in uppercase. However, if the
_KEYNAME_ value refers to a user-defined keyword, then the actual name can
be mixed case. The default for _KEYNAME_ is no keyname, designated by ".".
_LABEL_
specifies a label or description for the datekey. If you do not specify a label,
then the default label value is displayed as ".". For more information, see the
LABEL system option in SAS System Options: Reference.
_LOCALE_
specifies the locale for the datekey. _LOCALE_ values are the valid POSIX locale
values. For more information, see the LOCALE system option in SAS National
Language Support (NLS): Reference Guide. There is no default.
_NAME_
specifies a datekey reference name. _NAME_ is displayed with the case
preserved. Because _NAME_ is a SAS variable name, the datekey can be
referenced using any case. The _NAME_ variable is required. There is no default.
_OBSINTRVL_
specifies the interval length of the observation number value-list. The default
for _OBSINTRVL_ is no interval, designated by ".".
_PERIOD_
specifies the frequency interval at which the datekey should be repeated. If this
value is missing, then the datekey is not periodic. The default for _PERIOD_ is no
interval, designated by ".".
_PULSE_
specifies an interval that defines the units for the DURATION= values. The
default for _PULSE_ is no interval (one observation), designated by ".".
_RULE_
specifies the rule to use when you combine the timing values of a datekey. The
default for _RULE_ for datekeys is OR.
_SHIFT_
specifies the number of PULSE= intervals to shift the timing value. The shift can
be positive (forward in time) or negative (backward in time). If PULSE= is not
specified, then the shift occurs in observations. The default for _SHIFT_ is 0.
_STARTDATE_
specifies either the date timing value or the first date timing value to use in a
value-list. The default for _STARTDATE_ is no date, designated by a missing
value.
_STARTDT_
specifies either the datetime timing value or the first datetime timing value to
use in a value-list. The default for _STARTDT_ is no datetime, designated by a
missing value.
_STARTOBS_
specifies either the observation number timing value or the first observation
number timing value to use in a value-list. The default for _STARTOBS_ is no
observation number, designated by a missing value.
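These defaults explain what the CONDENSE option of the DATEKEYDATA statement does. Variables that contain only default values are omitted from the data set. The following Python sketch models that behavior over a subset of the defaults listed above. It is not SAS code. None stands in for a missing value ("."), the uncondensed rows are hypothetical, and _NAME_ is always kept because it is required.

```python
from datetime import date

# Subset of the documented defaults; None stands in for a missing value (".").
DEFAULTS = {
    "_DUR_AFTER_": 0,
    "_DUR_BEFORE_": 0,
    "_SHIFT_": 0,
    "_RULE_": "OR",
    "_PULSE_": None,
    "_PERIOD_": None,
    "_STARTDATE_": None,
    "_ENDDATE_": None,
}


def condense(rows: list[dict]) -> list[dict]:
    """Drop every variable whose values are all defaults; always keep _NAME_."""
    columns: list[str] = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)

    def value(row: dict, col: str):
        return row.get(col, DEFAULTS.get(col))

    keep = [c for c in columns
            if c == "_NAME_" or any(value(r, c) != DEFAULTS.get(c) for r in rows)]
    return [{c: value(r, c) for c in keep} for r in rows]


# Hypothetical uncondensed rows for two WhiteSale Fridays.
rows = [
    {"_NAME_": "WhiteSale", "_STARTDATE_": date(1991, 1, 4), "_SHIFT_": 0, "_RULE_": "OR", "_PULSE_": None},
    {"_NAME_": "WhiteSale", "_STARTDATE_": date(1991, 1, 11), "_SHIFT_": 0, "_RULE_": "OR", "_PULSE_": None},
]
condensed = condense(rows)
print(condensed)
```

Only _NAME_ and _STARTDATE_ survive, which matches the two variables that the WhiteSaleDates DATA step keeps.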
- a list of active datekeys that are associated with the input time ID
- the ID variable dates that are associated with the active datekeys
- the starting date, datetime, or observation value that is associated with the
active time ID period
Written Output
The DATEKEYS procedure has no written output other than warning and error
messages as recorded in the log.
Details
This example uses two methods to construct a datekeys definition data set. The
first method uses the DATA step to construct a datekey data set for Fridays when
an item was on sale. The second method uses the DATEKEYS procedure to
construct the datekey definition data set. The TimeSeriesDates data set contains
the time ID variable for the time series.
The same dates that are defined in WhiteSaleDates in the DATA step can be
defined using the DATEKEYS procedure. The active dates are the same as in the
data set that is shown in Output 18.121 on page 763. However, the definitions in the
DATEKEYS procedure are continuous.
data TimeSeriesDates(keep=date);
set sashelp.citiday;
format date date.;
run;
data WhiteSaleDates(keep=_name_ _startdate_);
set TimeSeriesDates;
_name_='WhiteSale';
if (month(date)=1) then do;
if (year(date)=1991 or year(date)=1992) then do;
if (weekday(date)=6) then do;
_startdate_=date;
end;
else delete;
end;
else delete;
end;
else delete;
format _startdate_ date.;
run;
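The nested IF/DELETE logic keeps only January Fridays in 1991 and 1992. SAS WEEKDAY returns 1 for Sunday, so Friday is 6. The following Python sketch models that filter and is not SAS code. It runs over a complete daily calendar as an assumed stand-in for SASHELP.CITIDAY, so its counts describe the calendar, not necessarily that data set.

```python
from datetime import date, timedelta


def sas_weekday(d: date) -> int:
    """Same numbering as the SAS WEEKDAY function: 1=Sunday, ..., 6=Friday, 7=Saturday."""
    return (d.weekday() + 1) % 7 + 1


def white_sale_dates(dates: list[date]) -> list[date]:
    """Keep January Fridays in 1991 and 1992, mirroring the nested IF/DELETE logic."""
    return [d for d in dates
            if d.month == 1 and d.year in (1991, 1992) and sas_weekday(d) == 6]


# Assumption: a complete daily calendar stands in for SASHELP.CITIDAY.
first, last = date(1990, 1, 1), date(1992, 12, 31)
all_days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
sale_days = white_sale_dates(all_days)
print(sale_days)
```

Over a full calendar this yields four Fridays in January 1991 and five in January 1992.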
Program Description
Set the EVENTDS= system option. The EVENTDS= system option specifies the
data set that defines the event. The NODEFAULTS option specifies not to use
default event definitions. The only events that are used are specified by the event-
data-set list.
options eventds=(nodefaults);
Create the TimeSeriesDates data set. The DATA step uses sashelp.citiday to
create the data set for this example:
data TimeSeriesDates(keep=date);
set sashelp.citiday;
format date date.;
run;
Construct a datekeys data set, WhiteSaleDates, for use with the EVENTDS=
system option. When you construct a datekeys data set using a DATA step, all
observations that define a specific datekey must be consecutive. If necessary, use
the SORT procedure to sort the data set by the _NAME_ variable. The DATA step
describes an item that was on sale during January 1991 and January 1992. The user-
defined datekeys identify the Fridays that the item was on sale.
data WhiteSaleDates(keep=_name_ _startdate_);
set TimeSeriesDates;
_name_='WhiteSale';
if (month(date)=1) then do;
if (year(date)=1991 or year(date)=1992) then do;
if (weekday(date)=6) then do;
_startdate_=date;
end;
else delete;
end;
else delete;
end;
else delete;
format _startdate_ date.;
run;
Output: HTML
Output 18.1 User-Defined Datekeys Data Set
Program Description
Begin the DATEKEYS procedure. The DATA= option names the data set that contains
the variables that are used in the ID statement.
proc datekeys data=sashelp.citiday;
Specify the time ID variable. The ID statement names a numeric variable that
identifies observations in the data set. If you use the ID statement, the INTERVAL=
option must be used. In this case, INTERVAL=day.
id date interval=day;
Alter or create a new datekey. Because January is used in the timing value list, it
represents the first day of January. Using DATEKEYKEY with PULSE=MONTH
creates a datekey for the entire month of January.
datekeykey January / pulse=month;
Write the datekey to an output data set. The DATEKEYDATA OUT= data set
contains the definitions for the WhiteSale datekey. The CONDENSE option
specifies that the output data set be condensed. Any variables that contain only
default values are omitted from the data set.
datekeydata out=WhiteSaleDefinitions condense;
Write variables that list the active time periods in the input time ID for datekeys.
The DATEKEYPERIODS OUT= data set contains the active time periods for the
WhiteSale datekey, as shown in Output 18.122 on page 765.
datekeyperiods out=WhiteSaleActiveDates;
Execute the example. The RUN statement causes the program to execute.
run;
Write the active sales dates data set. The results are shown in Output 18.122 on
page 765.
proc print data=WhiteSaleActiveDates(where=(_name_='WhiteSale'));
run;
Write the definitions for the WhiteSale datekey. The results are shown in Output
18.123 on page 765.
proc print data=WhiteSaleDefinitions;
run;
HTML Output
Output 18.2 Datekeys Active Periods Data Set Output
The following output shows the DATEKEYDATA OUT= data set definitions for the
WhiteSale datekey:
Details
If a procedure supports an EVENT statement, then you can use either data set that
is shown in “Example 1: Methods for Constructing a Datekeys Definition Data Set”
on page 761 without using PROC HPFEVENTS. Although PROC HPFEVENTS enables you to
use a user-defined datekey, it is not required: when the user-defined datekey is
specified as an event, the event is created automatically, provided that the
user-defined datekey definition data set has been specified using the EVENTDS= system option.
This example uses the data set from “Example 1: Methods for Constructing a
Datekeys Definition Data Set” on page 761 and the EVENT statement in PROC
HPFDIAGNOSE.
The output from PROC HPFDIAGNOSE shows that a model that included the
WhiteSale event was selected. The event was a poor fit, but REQUIRED=YES was
specified. REQUIRED=YES specifies that the events be included in the model as
long as the model does not fail to be diagnosed.
Program
proc datekeys;
datekeydef Years1991_1992='01JAN1991'd / pulse=year
after=(duration=1);
datekeykey JANUARY / pulse=month;
datekeydef WhiteSale=JANUARY FRIDAY Years1991_1992 / rule=and;
datekeydata out=WhiteSaleDefinitions condense;
run;
options eventds=(WhiteSaleDefinitions);
proc hpfdiagnose data=sashelp.citiday
print=all;
id date interval=day;
forecast snysecm;
event WhiteSale / required=yes;
arimax;
run;
Program Description
Write the datekey to an output data set. The CONDENSE option specifies that the
output data set be condensed. Any variables that contain only default values are
omitted from the data set.
datekeydata out=WhiteSaleDefinitions condense;
Set the EVENTDS= system option. The EVENTDS= system option specifies the
data set that defines the event.
options eventds=(WhiteSaleDefinitions);
List the variables in the data set that you want to diagnose. The FORECAST
statement lists the variables to be diagnosed in the data set that is specified by the
DATA= option. The variables are dependent variables or response variables that
you want to forecast in the HPFENGINE procedure.
forecast snysecm;
Name the event. The EVENT statement name identifies the events. The REQUIRED
option specifies that the events be included in the model as long as the model does
not fail to be diagnosed.
event WhiteSale / required=yes;
HTML Output
The following output shows the results:
Output 18.4 Using the EVENT Statement in PROC HPFDIAGNOSE
Details
Consider the datekey definitions that are shown in “Concepts: The DATEKEYS
Procedure” on page 728. The DATEKEYS procedure can be used to create a
calendar variable that indicates the active periods of the datekey definitions that
are defined in the data set MyHolidays.
Program
options eventds=(nodefaults);
data Year2010;
do date='01JAN2010'd to '31DEC2010'd;
output;
end;
format date date.;
run;
proc datekeys;
datekeydef SuperBowl=
'15JAN1967'd '14JAN1968'd '12JAN1969'd '11JAN1970'd
'17JAN1971'd '16JAN1972'd '14JAN1973'd '13JAN1974'd '12JAN1975'd
'18JAN1976'd '09JAN1977'd '15JAN1978'd '21JAN1979'd '20JAN1980'd
'25JAN1981'd '24JAN1982'd '30JAN1983'd '22JAN1984'd '20JAN1985'd
'26JAN1986'd '25JAN1987'd '31JAN1988'd '22JAN1989'd '28JAN1990'd
'27JAN1991'd '26JAN1992'd '31JAN1993'd '30JAN1994'd '29JAN1995'd
'28JAN1996'd '26JAN1997'd '25JAN1998'd '31JAN1999'd '30JAN2000'd
'28JAN2001'd '03FEB2002'd '26JAN2003'd '01FEB2004'd '06FEB2005'd
'05FEB2006'd '04FEB2007'd '03FEB2008'd '01FEB2009'd '07FEB2010'd
'06FEB2011'd '05FEB2012'd '03FEB2013'd '02FEB2014'd
/ pulse=day;
Program Description
Set the EVENTDS= system option. The EVENTDS= system option specifies the
data set that defines the event. The NODEFAULTS option specifies not to use
default event definitions. The only events that are used are specified by the event-
data-set list.
options eventds=(nodefaults);
Create a SAS data set. Create the Year2010 data set, which contains a series of
dates.
data Year2010;
do date='01JAN2010'd to '31DEC2010'd;
output;
end;
format date date.;
run;
Alter or create a new datekey. The DATEKEYKEY statement creates a datekey named
EasterMonday from the datekey named Easter. When Easter is
treated as a datekey timing value, only the timing value definition is used, and
qualifiers (such as SHIFT or PULSE) are not part of the definition. When Easter is
treated as a datekey, the qualifiers that you select (such as SHIFT and PULSE) are
part of the new definition. When you specify the PULSE= option without DURATION=
values, the DURATION= values are set to zero. SHIFT= specifies the number of pulses
to shift the timing value. When the SHIFT= option is used, all timing values are shifted.
datekeykey EasterMonday=Easter / shift=1 pulse=day;
Write the datekey to an output data set. The DATEKEYDATA statement writes
datekeys to an output data set named MyHolidays. The CONDENSE option
specifies that the output data set be condensed. Any variables that contain only
default values are omitted from the data set.
datekeydata out=MyHolidays condense;
Set the EVENTDS= system option. The EVENTDS= system option specifies the
data set that defines the event.
options eventds=(MyHolidays);
Begin the DATEKEYS procedure. The DATA= option names the data set that
contains the variables that are used in the ID statement.
proc datekeys data=Year2010;
Specify the time ID variable. The ID statement names a numeric variable that
identifies observations in the data set. If you use the ID statement, then the
INTERVAL= option must be used. In this case, INTERVAL=DAY.
id date interval=day;
Create a data set that contains variables that indicate the active time periods for
datekeys. The DATEKEYCALENDAR OUT= statement specifies the data set that is
created, and contains calendar variables that are based on the information that is
specified in the ID statement. The DATEKEYCALENDAR OUT= data set contains
calendar variables that identify the holidays that are defined in the MyHolidays
data set. Values for February 2010 are shown in Output 18.125
on page 774. Values for April 2010 are shown in Output 18.126
on page 775. Only the active and inactive periods that correspond to the input time
ID are contained in the DATEKEYCALENDAR OUT= data set MyHolidaysIn2010.
Because the input time ID has a daily frequency and spans the year 2010, the
results are calendar variables for the year 2010.
datekeycalendar out=MyHolidaysIn2010;
run;
Write the output for the MyHolidaysIn2010 data set for February. The results are
shown in Output 18.125 on page 774.
proc print data=MyHolidaysIn2010(where=(month(date)=2));
run;
Write the output for the MyHolidaysIn2010 data set for April. The results are
shown in Output 18.126 on page 775.
proc print data=MyHolidaysIn2010(where=(month(date)=4));
run;
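Conceptually, each calendar variable marks whether a date falls in a datekey's active period. The following Python sketch models the 2010 calendar for two of the datekeys and is not SAS code. It assumes 0/1 indicator values, which may differ from what DATEKEYCALENDAR actually writes. The SuperBowl date comes from the DATEKEYDEF list. EasterMonday is Easter (04APR2010) plus one day. The function name is hypothetical.

```python
from datetime import date, timedelta

# Active dates in 2010 for two of the MyHolidays datekeys.
ACTIVE = {
    "SuperBowl": {date(2010, 2, 7)},
    "EasterMonday": {date(2010, 4, 5)},
}


def calendar_variables(start: date, end: date,
                       active: dict[str, set[date]]) -> list[dict]:
    """One row per day with an assumed 0/1 indicator per datekey."""
    rows = []
    d = start
    while d <= end:
        row = {"date": d}
        for name, days in active.items():
            row[name] = 1 if d in days else 0
        rows.append(row)
        d += timedelta(days=1)
    return rows


rows = calendar_variables(date(2010, 1, 1), date(2010, 12, 31), ACTIVE)
february = [r for r in rows if r["date"].month == 2]
print([r for r in february if r["SuperBowl"]])
```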
HTML Output
Output 18.5 Holiday Calendar from the Daily 2010 Time ID for February
Output 18.6 Holiday Calendar from the Daily 2010 Time ID for April
Details
The DATEKEYS procedure can be used to filter input data sets and create data sets
that contain only datekey data that is associated with a specified locale. The
DATEKEYDSOPT statement applies to the data sets that are specified in the
DATEKEYDATA, DATEKEYPERIODS, and DATEKEYCALENDAR statements.
This example creates the datekey definitions for the following English Canadian
holidays: Canada, CanadaObserved, and Boxing. It also creates an English U.S.
datekey definition for USINDEPENDENCE. The definition for NewYearsEve does
not have a specified locale.
Program
options eventds=(nodefaults);
data December2010;
do date='01DEC2010'd to '31DEC2010'd;
output;
end;
format date date.;
run;
proc datekeys;
datekeykey Canada / locale=en_CA;
datekeykey CanadaObserved / locale=en_CA;
datekeykey Boxing / locale='en_CA';
datekeykey USINDEPENDENCE / locale=en_US;
datekeydef NewYearsEve=NEWYEAR / shift=-1 pulse=day;
Program Description
Set the EVENTDS= system option. The EVENTDS= system option specifies the
data set that defines the event. The NODEFAULTS option specifies not to use
default event definitions. The only events that are used are specified by the event-
data-set list.
options eventds=(nodefaults);
Create a SAS data set. Create the December2010 data set, which contains a series
of dates:
data December2010;
do date='01DEC2010'd to '31DEC2010'd;
output;
end;
format date date.;
run;
Create a new datekey. The DATEKEYKEY statement creates a new datekey. The
LOCALE= option specifies a locale that is associated with the datekey. The locale
should be a POSIX locale value.
datekeykey Canada / locale=en_CA;
Write the datekey to an output data set. The DATEKEYDATA statement creates an
output data set. The CONDENSE option specifies that the output data set be
condensed. Any variables that contain only default values are omitted from the
data set.
datekeydata out=MyHolidays condense;
Set the EVENTDS= system option. The EVENTDS= system option specifies the
data set that defines the event.
options eventds=(MyHolidays);
Begin the DATEKEYS procedure. The DATEKEYS procedure creates datekeys that
are associated with time computations. The DATA= option specifies the input data
set, which contains the variables that are used in the ID statement.
proc datekeys data=December2010;
Create a data set for a specific locale. The DATEKEYDSOPT statement limits the
processing of data sets to a specified locale. When you use this statement, only the
datekey definitions with a locale value of en_CA, and datekey definitions with no
specified locale, are used to create the AllCAHolidaysInDecember2010 data set.
datekeydsopt locale=en_CA;
Create a data set that contains variables that indicate the active time periods for
datekeys. The DATEKEYCALENDAR statement writes variables that indicate the
active time periods for datekeys. The OUT= option names the output data set, which
contains calendar variables based on the information that is specified in the ID
statement.
datekeycalendar out=AllCAHolidaysInDecember2010;
Begin the DATEKEYS procedure. The DATA= option specifies the input data set, which
contains the variables that are used in the ID statement.
proc datekeys data=December2010;
Create a data set for a specific locale. The DATEKEYDSOPT statement creates a
data set that contains only datekey data that is associated with a specified locale.
If you specify LOCALE=(ONLY)en_CA, then only the specified locale is processed
for both input and output data sets. The locale should be a POSIX locale value.
datekeydsopt locale=(ONLY)en_CA;
Create a data set that contains variables that indicate the active time periods for
datekeys. The DATEKEYCALENDAR statement writes variables that indicate the
active time periods for datekeys. The OUT= option names the output data set, which
contains calendar variables based on the information that is specified in the ID
statement.
datekeycalendar out=OnlyCAHolidaysInDecember2010;
Print the MyHolidays data set. For results, see Output 18.127 on page 780.
proc print data=MyHolidays;
run;
Print the AllCAHolidaysInDecember2010 data set. For results, see Output 18.128
on page 781.
proc print data=AllCAHolidaysInDecember2010;
run;
HTML Output
Output 18.7 All Holiday Definitions That Are Specified in MyHolidays
Output 18.8 Holiday Calendar for All Canadian Holidays in December 2010
Output 18.9 Holiday Calendar for Only Canadian Holidays in December 2010
19
DELETE Procedure
• delete loaded CAS tables from a caslib. It does not delete the original files from
the data source specified by the caslib.
• delete a list of data sets with the same name and a numeric suffix, such as
PROC DELETE: Delete SAS files from SAS libraries (Ex. 1, Ex. 2, Ex. 3, Ex. 4, Ex. 5)
PROC DELETE Statement 787
Syntax
PROC DELETE <LIBRARY=libref> DATA=SAS-file(s)
(<GENNUM=ALL | HIST | REVERT | integer>
<MEMTYPE=member-type>
<ENCRYPTKEY=key-value>
<ALTER=alter-password>);
Required Argument
DATA= SAS-file(s)
specifies one or more SAS files that you want to delete.
Note: You can also use a numbered range list. For more information, see “Data
Set Name Lists” in SAS Programmer’s Guide: Essentials. You cannot use a colon
list.
TIP To delete all files in a library, use the KILL option of PROC DATASETS.
Use PROC DATASETS LIB=libref KILL to delete all files, including catalogs.
For more information, see “KILL” on page 582.
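The following is a minimal sketch of the PROC DATASETS alternative that the tip mentions. It assumes a hypothetical libref, MyLib, that is already assigned. KILL deletes every member of the library without prompting, and NOLIST suppresses the directory listing. Confirm the libref before you run it.

```sas
/* MyLib is a hypothetical libref: EVERY member in it is deleted. */
proc datasets lib=MyLib kill nolist;
quit;
```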
788 Chapter 19 / DELETE Procedure
Optional Arguments
ALTER=alter-password
provides the Alter password for any alter-protected SAS files.
ENCRYPTKEY=key-value
The ENCRYPTKEY= option supplies the key value that unlocks data sets that are
protected with AES encryption. The ENCRYPTKEY= option is needed only when a data
set must be opened.
LIBRARY=libref
specifies the SAS library that contains members to be deleted.
Alias LIB=
MEMTYPE=(member-type(s))
restricts deletion to one or more member types. For example, if the MyLib library
contains both a data set and a catalog named MyFile and you want to delete only the
catalog, use the MEMTYPE= option:
proc delete lib=MyLib data=MyFile (memtype=catalog);
run;
The member-type value can be one of the following:
ACCESS
access descriptor files (created by SAS/ACCESS software)
CATALOG
SAS catalogs
DATA
SAS data files
FDB
financial database
MDDB
multidimensional database
PROGRAM
stored compiled SAS programs
VIEW
SAS views
Aliases MTYPE=
MT=
Default DATA
Details
This example demonstrates the following tasks:
• deletes data sets from a library
Program
Delete SAS data sets named A, B, and C from a SAS library named MyLib. The
GENNUM= option deletes all the historical versions for each of the data sets.
proc delete data=MyLib.A MyLib.B MyLib.C (gennum=all);
run;
Example 3: Deleting the Base Version and Renaming the Youngest Historical Version to the
Base Version 791
Details
This example demonstrates the following tasks:
• deletes a data set from a library
Program
Deletes the data set named MyLib.A and all the historical versions.
proc delete data=MyLib.A (gennum=all);
run;
Details
This example demonstrates the following tasks:
• deletes a data set from a library
Program
Deletes the base version of the data set named MyLib.A and renames the youngest
historical version so that it becomes the new base version.
proc delete data=MyLib.A(gennum=revert);
run;
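The following self-contained sketch shows what GENNUM=REVERT does. It is not one of the original examples, and the data set WORK.A and its values are hypothetical. It builds three generations in the Work library and then reverts. GENMAX=3 keeps the base version plus two historical versions.

```sas
/* Create WORK.A and allow up to 3 generations (base + 2 historical). */
data work.a (genmax=3);
   x = 1;
run;

/* Each replacement moves the previous base version into the history. */
data work.a;
   x = 2;
run;

data work.a;
   x = 3;
run;

/* Delete the base version (x=3) and make the youngest historical  */
/* version (x=2) the new base version.                              */
proc delete data=work.a (gennum=revert);
run;

/* The base version should now be the one created with x = 2. */
proc print data=work.a;
run;
```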
Details
This example deletes the first historical version of the data set.
Program
The GENNUM= option deletes the first historical version of the data set named
MyLib.A. Use GENNUM=integer to select the historical version that you want
to delete.
proc delete data=MyLib.A(gennum=1);
run;
Details
This example deletes all historical versions except the base version of a data set.
Program
Use the GENNUM=HIST option to delete all historical versions and retain the
base version of the data set MyLib.A.
proc delete data=MyLib.A(gennum=hist);
run;
Details
This example deletes the catalog named MyFile from a specific SAS library.
Program
The MEMTYPE= option names the type of file to delete in a SAS library. If you
have other member types named MyFile in the MyLib library, they will not be
deleted.
proc delete lib=MyLib data=MyFile (memtype=catalog);
run;
ENCRYPTKEY=
GENNUM=
Details
This example demonstrates the following tasks:
• unlocks an AES-encrypted data set
• deletes all historical versions and the base version of the AES-encrypted data set
named MyLib.A
Program
Deletes the base version of the AES-encrypted data set named MyLib.A and all
historical versions. The ENCRYPTKEY= option must be used if the data set is
AES-encrypted.
proc delete data=MyLib.A (gennum=ALL encryptkey=key-value);
run;
Details
This example deletes a password-protected data set.
Program
Deletes a password-protected data set named MyLib.A. If the data set is
alter-protected, you must supply its Alter password.
proc delete data=MyLib.A (alter=alter-password);
run;
Example 10: Using the LIBRARY= Option 795
Program
Deletes multiple data sets named X1, X2, X3, X4, and X5. To use the list feature,
the data sets must have the same name and end with a numeric suffix. If a
LIBRARY= option is not specified, the data sets are deleted from the Work library.
proc delete data=X1-X5;
run;
Details
The following statement deletes a data set from a specific SAS library.
Program
Deletes the A data set that is in the specified SAS library named MyLib. The alias
for the LIBRARY= option is LIB=.
proc delete lib=MyLib data=A;
run;
Details
This example deletes data sets X1, X2, X3, X4, and X5 in the specified SAS library
named MyLib. The alias for the LIBRARY= option is LIB=.
Program
Deletes data sets X1, X2, X3, X4, and X5 from the specified SAS library named
MyLib. When using the list feature, all the data sets must have the same name and
end with a numeric suffix. The alias for the LIBRARY= option is LIB=.
proc delete lib=MyLib data=X1-X5;
run;
20
DISPLAY Procedure
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Example: “Example: Executing a SAS/AF Application” on page 799
Syntax
PROC DISPLAY CATALOG=libref.catalog.entry.type <BATCH>;
Required Argument
CATALOG=libref.catalog.entry.type
specifies a four-level name for the catalog entry.
libref
specifies the SAS library where the catalog is stored.
catalog
specifies the name of the catalog.
entry
specifies the name of the entry.
type
specifies the entry's type, which is one of the following. For details, see the
description of catalog entry types in the BUILD procedure in online Help.
• CBT
• FRAME
• HELP
• MENU
• PROGRAM
• SCL
Optional Argument
BATCH
runs PROGRAM and SCL entries in batch mode. If a PROGRAM entry contains a
display, then it will not run, and you will receive the following error message:
ERROR: Cannot allocate window.
If you use the SAS windowing environment, you can use the AF command to
execute an application. SUBMIT blocks execute immediately when you use the AF
command. You can use the AFA command to execute multiple applications
concurrently.
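The following is a sketch of the BATCH option. It assumes a hypothetical SCL entry named NIGHTLY in the SASUSER.INVOICES catalog. Because the entry runs in batch mode, it must not open a display. If it does, the step fails with the error shown above.

```sas
/* Hypothetical entry: library SASUSER, catalog INVOICES,  */
/* entry NIGHTLY, type SCL. BATCH runs it without a window. */
proc display catalog=sasuser.invoices.nightly.scl batch;
run;
```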
Details
Suppose that your company has developed a SAS/AF application that compiles
statistics from an invoice database. Further, suppose that this application is stored
in the SASUSER library as a FRAME entry named WIDGETS in a catalog named INVOICES.
You can execute this application by using the following SAS code.
proc display catalog=sasuser.invoices.widgets.frame;
run;
800 Chapter 20 / DISPLAY Procedure
21
DS2 Procedure
Using the DS2 procedure, you can submit DS2 language statements to SAS and
third-party data sources that are accessed with SAS and SAS/ACCESS library
engines. If you have SAS Cloud Analytic Services (CAS) configured, you can also
submit DS2 language statements to the CAS server.
To execute DS2 jobs in a SAS library, specify the PROC DS2 statement followed by
DS2 language statements. Qualify table names in your DS2 language statements
with a libref. If you do not specify a libref, the request is executed in the SAS Work
library.
To execute DS2 jobs on the CAS server, specify the SESSREF= (or SESSUUID=)
option and a CAS session name in the procedure statement. Then, either create
tables in the CAS session, load tables from external data sources into the CAS
session, or qualify table names in your DS2 language statements with caslibs. A
caslib enables you to dynamically load external data into your CAS session for
processing.
Note: Do not use a CAS engine libref with PROC DS2. When SESSREF= (or
SESSUUID=) is specified, PROC DS2 makes a direct connection to the CAS server.
The procedure does not need the CAS engine. The procedure silently passes
requests to the DS2.runDS2 action and the action executes your DS2 program on
the CAS server.
Concepts: DS2 Procedure 803
• execute outside of a SAS session (for example, on SAS Federation Server or the
CAS server)
• take advantage of threaded processing in products such as SAS Enterprise
Miner and the CAS server.
Table 21.1 Data Sources for Which DS2 Supports SAS Library Access
DB2 * * DB2
Greenplum * *
MySQL * *
Netezza * *
ODBC-compliant databases * *
Oracle * * Read-and-write support available in SAS Viya 3.5.
SAP * * Read-only
SAP IQ * *
SAS Scalable Performance Data (SPD) Engine data sets * *
Teradata * * Teradata
Yellowbrick * SAS/ACCESS to Yellowbrick is limited
When submitting DS2 statements to the CAS server, the procedure reads data from
files and in-memory tables and creates CAS session tables. A caslib uses a SAS
Data Connector (or SAS Data Connector Accelerator) to access data from a
corresponding data source for processing in CAS. For information about available
SAS Data Connectors, see SAS Cloud Analytic Services: User’s Guide. DS2 output
tables in CAS are in-memory tables. You must use other actions to persist data to
caslib data sources.
For information about how to connect to a data source with PROC DS2, see “Data
Source Connection” on page 814.
PROC DS2: Specify that the subsequent input is DS2 language statements (Ex. 1, Ex. 2,
Ex. 3, Ex. 4, Ex. 5)
Requirement: Follow the PROC DS2 statement with DS2 language statements. See SAS DS2
Language Reference for information about DS2 language statements.
Interactions: The DS2 procedure requires the RUN statement to submit DS2 statements. That is,
SAS reads the program statements that are associated with one task until it
reaches a RUN statement.
PROC DS2 Statement 807
Syntax
PROC DS2 <connection-option> <processing-options>;
General Processing
ANSIMODE
specifies that nonexistent values in CHAR and DOUBLE columns are
processed as ANSI SQL null values.
BYPARTITION=YES | NO
determines whether the input data for the DS2 program is automatically
re-partitioned when executed inside the database with in-database
processing.
DS2ACCEL=NO | YES
determines whether DS2 code is enabled for parallel processing in
supported environments using the SAS In-Database Code Accelerator.
ERRORSTOP | NOERRORSTOP
specifies whether the procedure stops executing if it encounters an error.
LABEL | NOLABEL
specifies whether to use the column label or the column name as the
column heading.
MEMSIZE=n | nM | nG
specifies a limit for the amount of memory that is used for an underlying
query (such as a SELECT statement), so that allocated memory is
available to support other PROC DS2 operations.
NUMBER
specifies to include a column named Row, which is the row (observation)
number of the data as the rows are retrieved.
SCOND=WARNING | NONE | NOTE | ERROR
specifies the level of messages that PROC DS2 displays in the SAS log
for the DS2 variable declaration strict mode, which requires that every
variable must be declared in the DS2 program.
STIMER
specifies to write a subset of system performance statistics, such as
time-elapsed statistics, to the SAS log.
XCODE=ERROR | WARNING | IGNORE
controls the behavior of the SAS session when an NLS transcoding failure
occurs.
Optional Arguments
ANSIMODE
specifies that nonexistent values in CHAR and DOUBLE columns are processed
as ANSI SQL null values. By default, PROC DS2 processes nonexistent values in
CHAR and DOUBLE columns as missing values. This is how SAS processes
nonexistent values. The ANSIMODE option specifies to process nonexistent
values in CHAR and DOUBLE columns as ANSI SQL null values. It is important
to understand the differences, or data can be lost. For information about
processing differences, see “How DS2 Processing Nulls and SAS Missing Values”
in SAS DS2 Programmer’s Guide. All other data types always use ANSI SQL null
semantics.
BYPARTITION=YES | NO
determines whether the input data for the DS2 program is automatically re-
partitioned when executed inside the database with in-database processing.
YES
specifies that the input data is automatically re-partitioned by the first BY
variable. All rows of a BY group are in the same data partition and are processed
by the same thread. Each thread does the BY processing for the entire group
of data.
NO
specifies that the input data is not re-partitioned even if there is a BY
statement in the DS2 program. The rows of a group can reside on different data
partitions and be processed by different DS2 threads. Each thread gets partial
data from a group, and each group is processed by multiple threads. The DS2
program must request the final aggregation of data.
Default YES
DS2ACCEL=NO | YES
determines whether DS2 code is enabled for parallel processing in supported
environments using the SAS In-Database Code Accelerator. The SAS In-
Database Code Accelerator enables you to publish a DS2 thread program to the
database and execute the thread program in parallel inside the database. If you
are using Hadoop or Teradata, then the DS2 data program is also published and
executed inside the database.
NO
prevents DS2 code from executing in supported parallel environments. The
DS2 code is executed in the Base SAS session.
YES
enables DS2 code to execute in supported parallel environments.
Alias INDB=
Notes The INDB= option was added in SAS 9.4M1 and was renamed to
DS2ACCEL= in SAS 9.4M2.
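The following sketch shows the general shape of a request for the SAS In-Database Code Accelerator. A thread program reads an in-database table, and a data program collects the rows that the threads produce. The libref DBLIB, the tables SALES and SALES_OUT, and the column AMOUNT are all hypothetical. DBLIB would need to point to a supported database through a SAS/ACCESS engine. Treat the example as an outline to adapt, not a tested recipe.

```sas
/* Assumption: DBLIB is a hypothetical libref for a supported database, */
/* and DBLIB.SALES has a numeric column AMOUNT.                         */
proc ds2 ds2accel=yes;
   /* Thread program: eligible to run in parallel inside the database. */
   thread work.double_t / overwrite=yes;
      dcl double amount2;
      method run();
         set dblib.sales;
         amount2 = 2 * amount;   /* implicit OUTPUT at end of RUN */
      end;
   endthread;
   run;

   /* Data program: collects the rows written by the thread program. */
   data dblib.sales_out (overwrite=yes);
      dcl thread work.double_t t;
      method run();
         set from t;
      end;
   enddata;
   run;
quit;
```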
ERRORSTOP | NOERRORSTOP
specifies whether the procedure stops executing if it encounters an error. In a
batch or noninteractive session, ERRORSTOP instructs the procedure to stop
executing the statements but to continue checking the syntax after it has
encountered an error. NOERRORSTOP instructs the procedure to execute the
statements and to continue checking the syntax after an error occurs.
LABEL | NOLABEL
specifies whether to use the column label or the column name as the column
heading.
Default LABEL
Interactions If a column does not have a label, the procedure uses the column's
name as the column heading.
Alias LIBNAMES=
Interactions If both LIBS= and SESSREF= (or SESSUUID=) are specified in the
procedure statement, SESSREF= is applied and LIBS= is
ignored.
To see how LIBS= affects library assignments, set
the MSGLEVEL=i system option before running a PROC DS2
request with LIBS=. The option produces Include and Ignore
MEMSIZE=n | nM | nG
specifies a limit for the amount of memory that is used for an underlying query
(such as a SELECT statement), so that allocated memory is available to support
other PROC DS2 operations. Specify the memory limit in multiples of 1 (bytes);
1,048,576 (megabytes); or 1,073,741,824 (gigabytes). For example, the value
23M specifies 24,117,248 bytes of memory. The value 16G specifies
17,179,869,184 bytes of memory.
Default The procedure optimizes the setting based on the amount of memory
on the host.
Note On the CAS server, MEMSIZE= specifies the memory for a single
worker.
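The following example first checks the byte arithmetic above with a plain Base SAS DATA step. It then runs a minimal PROC DS2 request that caps memory for underlying queries at 23 MB. The 23M limit is only an illustration.

```sas
/* 23M = 23 * 1,048,576 bytes; 16G = 16 * 1,073,741,824 bytes. */
data _null_;
   bytes_23m = 23 * 1048576;
   bytes_16g = 16 * 1073741824;
   put bytes_23m= comma20. bytes_16g= comma20.;
run;

/* Limit the memory used by underlying queries for this request. */
proc ds2 memsize=23M;
   data _null_;
      method init();
         dcl varchar(16) str;
         str = 'Memory capped';
         put str;
      end;
   enddata;
run;
quit;
```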
NUMBER
specifies to include a column named Row, which is the row (observation)
number of the data as the rows are retrieved.
WARNING
writes warning messages to the SAS log.
NONE
no messages are written to the SAS log.
NOTE
writes notes to the SAS log.
ERROR
writes error messages to the SAS log.
Interaction Specifying the SCOND= option in the PROC DS2 statement takes
precedence over the DS2SCOND= system option.
SESSREF=session-name
specifies to run the DS2 statements in a CAS session. The CAS session is
identified by its session name.
Note This option is supported in SAS Viya 3.1 and later and in SAS
9.4M5 and later.
SESSUUID="session-uuid"
specifies to run the DS2 statements in a CAS session. The CAS session is
identified by its universally unique identifier (UUID).
Note This option is supported in SAS Viya 3.1 and later and in SAS
9.4M5 and later.
STIMER
specifies to write a subset of system performance statistics, such as time-
elapsed statistics, to the SAS log. When STIMER is in effect, the procedure
writes to the SAS log a list of computer resources used for each step and the
entire SAS session.
Interaction If the SAS system option FULLSTIMER is in effect, the complete list
of computer resources is written to the SAS log.
ERROR
specifies that a run-time error occurs, which causes row processing to halt.
An error message is written to the SAS log. This is the default behavior.
WARNING
specifies that the incompatible character is set to a substitution character. A
warning message is written to the SAS log.
IGNORE
specifies that the incompatible character is set to a substitution character.
No messages are written to the SAS log.
Default ERROR
Tip: The RUN CANCEL statement is useful if you enter a typographical error.
Example: “Example 3: Terminating the Current Step in Line Prompt Mode” on page 830
Syntax
RUN CANCEL;
1 You first submit the LIBNAME statement for a SAS engine. For information about
defining a LIBNAME statement, see:
• SAS data sets (SAS Global Statements: Reference)
• Relational DBMS data sources (SAS/ACCESS for Relational Databases: Reference)
• MongoDB and Salesforce (SAS/ACCESS for Nonrelational Databases: Reference)
• SPD Engine data sets (SAS Scalable Performance Data Engine: Reference)
• SPD Server tables (SAS Scalable Performance Data Server: User’s Guide)
2 In your DS2 program, use a two-part name in the form libref.table-name to refer
to tables. The libref tells the program where to create or locate a table.
This example illustrates how PROC DS2 accesses a data source by using the
attributes of a previously assigned libref. The LIBNAME statement assigns the
libref MyFiles, specifies the BASE engine, and then specifies the physical location
for the SAS data set. The DS2 program then creates the SAS data set
MyFiles.Table1 at the location specified in the LIBNAME statement.
proc ds2;
   data myfiles.table1;
      dcl double j j2;
      method run();
         do j = 1 to 1000;
            j2 = 2*j;
            output;
         end;
      end;
   enddata;
run;
quit;
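The example assumes that the MyFiles libref is already assigned. The following minimal sketch shows that assignment, using a hypothetical directory path. It then runs a quick check that the DS2 program wrote 1,000 rows. On z/OS, the path would have to be an HFS path.

```sas
/* Hypothetical directory: replace with a path that exists on your host. */
libname myfiles base '/home/myuser/myfiles';

/* Run the PROC DS2 step shown above, then count the rows it created. */
proc sql;
   select count(*) as nrows from myfiles.table1;
quit;
```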
The DS2 procedure builds a data source connection string that includes all active
librefs and sends it to the DS2 program. You reference a particular library by
specifying its libref in a two-part table name in the form libref.table-name. If you do
not specify a libref, the table is created in the SAS Work library.
PROC DS2 uses libref attributes for connection information only (such as physical
location). PROC DS2 generally does not use libref attributes that define behavior.
For example, if a previously submitted LIBNAME statement for the BASE engine
specifies that SAS data sets are to be compressed, the compression attribute is not
used by the procedure. There are exceptions. For example, the MAX_BINARY_LEN=
and MAX_CHAR_LEN= LIBNAME options for Google BigQuery are included in the
internal connection string.
You can determine which LIBNAME options are used by the procedure by setting
the MSGLEVEL=i system option before submitting a LIBNAME statement. For
many data sources, you can specify a DS2 table option to override a LIBNAME
option. For example, the COMPRESS= table option can be used to request
compression of SAS data sets. The SCANSTRINGCOLUMNS= table option can be
used to override the MAX_CHAR_LEN= LIBNAME option. Not all LIBNAME
statement options have a corresponding table option.
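The following sketch shows the override described above. It assumes a hypothetical directory path for MyFiles. With MSGLEVEL=I in effect, the log notes show which LIBNAME options PROC DS2 uses. The procedure does not use the LIBNAME COMPRESS= setting, so the COMPRESS= table option requests compression explicitly.

```sas
/* Write informational notes about the LIBNAME options that are used. */
options msglevel=i;
libname myfiles base '/home/myuser/myfiles' compress=yes;  /* hypothetical path */

proc ds2;
   /* The LIBNAME COMPRESS= attribute is ignored by PROC DS2, */
   /* so request compression with the table option instead.   */
   data myfiles.table2 (compress=yes);
      dcl double j j2;
      method run();
         do j = 1 to 1000;
            j2 = 2*j;
            output;
         end;
      end;
   enddata;
run;
quit;
```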
z/OS Specifics: The physical location for the libref must be an HFS path
specification.
You must continue to qualify table names with a libref, even when LIBS= specifies
only one library; otherwise, the default library is used.
The following example illustrates the use of the LIBS= option. In the example, two
librefs are assigned in the SAS session: AllFiles and MyFiles. The LIBS= option
specifies to use libref MyFiles only.
For more information, see “LIBS=libref | (libref1 libref2 ...librefn)” on page 810.
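The code for the LIBS= example is not reproduced here. The following is a minimal sketch of the scenario that it describes, and the directory paths are hypothetical. The table name is still qualified with MyFiles even though LIBS= names only that library.

```sas
/* Two hypothetical librefs; the paths are illustrative only. */
libname allfiles base '/home/myuser/allfiles';
libname myfiles  base '/home/myuser/myfiles';

/* Only MyFiles is included in the DS2 connection. */
proc ds2 libs=myfiles;
   /* Still qualify the table name; otherwise the default library is used. */
   data myfiles.table3;
      dcl double j;
      method run();
         do j = 1 to 10;
            output;
         end;
      end;
   enddata;
run;
quit;
```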
Note: You must have a CAS server configured. You must first submit the CAS
statement to establish the CAS session. To interact with data in a CAS session, you
need a caslib. You must first define a caslib or use a pre-defined caslib. You define a
caslib and list the caslibs that are available to your CAS session by using the
CASLIB statement. For syntax information, see SAS Cloud Analytic Services: User’s
Guide. A caslib uses a SAS Data Connector (or SAS Data Connector Accelerator) to
access data. For information about SAS Data Connectors, see SAS Cloud Analytic
Services: User’s Guide.
Use a two-part name in the form caslib.table-name to identify tables in your DS2
statements. The following example illustrates a PROC DS2 request to the CAS
server.
options cashost="cloud.example.com" casport=5570;
cas mysess;
password='mypw'
server='testserver',
db='testdb');
This example establishes a CAS session named MySess on a CAS server on CAS
host cloud.example.com. It then uses the CASLIB statement to assign caslib
CASTERA. The PROC DS2 statement specifies the SESSREF= procedure option
and the CAS session name MySess. The FedSQL SELECT statement in the DS2 SET
statement identifies table Employees using the CASTERA caslib.
SAS Viya data connectors support explicit and automatic (shown here) loading of
data into CAS. For an example that explicitly loads data, see “Example 5: Run a DS2
Program in CAS” on page 833.
The tables that you create with PROC DS2 are in-memory CAS tables. That is, the
tables are available for the duration of the CAS session and are accessible only to
the current session. PROC DS2 does not provide a way to persist a table to a data
source or to share the table with other CAS sessions. To persist or share a CAS
table, use the CASUTIL procedure.
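The following sketch persists and shares a DS2 output table with PROC CASUTIL. It assumes an active CAS session and an in-memory table named SQUARES in the CASUSER caslib; both names are illustrative. SAVE writes a copy of the table to the caslib's data source. PROMOTE makes the table available to other CAS sessions. Check the CASUTIL documentation for the options that your release supports.

```sas
/* Assumes a CAS session is active and the in-memory table SQUARES */
/* exists in the CASUSER caslib (names are illustrative).          */
proc casutil;
   /* Persist a copy of the table to the caslib's data source. */
   save casdata="squares" incaslib="casuser" outcaslib="casuser" replace;
   /* Make the table visible to other CAS sessions. */
   promote casdata="squares" incaslib="casuser" outcaslib="casuser";
quit;
```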
Note: Although CAS tables are in-memory tables, you must specify the
OVERWRITE= table option to overwrite an initial output table with a replacement
output table.
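The following minimal sketch uses the OVERWRITE= table option in a CAS request. It assumes the CAS session MySess from the preceding example and uses the CASUSER caslib, and the table name SQUARES is hypothetical. Without OVERWRITE=YES, a second run cannot replace the table that the first run created.

```sas
/* Assumes the CAS session MySess already exists. */
proc ds2 sessref=mysess;
   /* OVERWRITE=YES replaces the in-memory table if it already exists. */
   data casuser.squares (overwrite=yes);
      dcl double j j2;
      method run();
         do j = 1 to 10;
            j2 = j*j;
            output;
         end;
      end;
   enddata;
run;
quit;
```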
Most of the functionality of the DS2 language is supported for use on the CAS
server. However, there are some exceptions. For information about the DS2
functionality that is supported in CAS, see SAS DS2 Programmer’s Guide.
For an example of how a DS2 parallel program is submitted to the CAS server, see
“Example 5: Run a DS2 Program in CAS” on page 833.
DS2 output tables in CAS are in-memory tables. The tables are created in the user’s
CAS session. You must use other CAS actions to promote the output tables for
global use in CAS or to store data to caslib data sources.
RUN-Group Processing
PROC DS2 supports RUN-group processing. RUN-group processing enables you to
submit RUN groups without ending the procedure.
To use RUN-group processing, you start the procedure and then submit multiple
RUN-groups. A RUN-group is a group of statements that contains at least one
action statement and ends with a RUN statement. As long as you do not terminate
the procedure, it remains active and you do not need to resubmit the PROC
statement.
Note: When using PROC DS2, DS2 programs are delimited by RUN statements. Any
DS2 code that follows a RUN statement forms a new, distinct DS2 program that is
separate from the program that precedes that RUN statement.
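The following sketch runs two RUN groups in a single PROC DS2 step; the table name WORK.DS2_A is illustrative. Each RUN statement ends one DS2 program. The second DATA block is therefore a separate program that reads the table the first program created.

```sas
proc ds2;
   /* First RUN group: a complete DS2 program that creates WORK.DS2_A. */
   data work.ds2_a (overwrite=yes);
      dcl double j;
      method run();
         do j = 1 to 3;
            output;
         end;
      end;
   enddata;
   run;

   /* Second RUN group: a new DS2 program in the same PROC DS2 step. */
   data _null_;
      method run();
         set work.ds2_a;
         put j=;
      end;
   enddata;
   run;
quit;
```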
DS2 table options are used to apply options when you access a data source within
PROC DS2. For example, the following code applies a table option to the SAS data
set to specify the size of a permanent buffer page for the new table:
proc ds2;
   data myfiles.table1 (bufsize=16k);
      dcl double j j2;
      method run();
         do j = 1 to 1000;
            j2 = 2*j;
            output;
         end;
      end;
   enddata;
run;
quit;
For a list of available table options, see SAS DS2 Language Reference.
With PROC DS2, you can use a macro variable on a subsequent DS2 statement.
However, if a macro variable occurs within a literal string, you cannot enclose the
string in double quotation marks. The macro processor requires double quotation
marks to resolve the macro variable reference. DS2 statements consider a string
enclosed in double quotation marks to be a delimited (case sensitive) identifier
such as a table or column name.
To reference a macro variable in a literal string, use the SAS macro function
%TSLIT. %TSLIT overrides the need for double quotation marks around the literal
string and puts single quotation marks around the input value. For example, the
following statement includes the %TSLIT function to specify the &SYSHOSTNAME
macro variable, which returns the host name of the computer on which it is
executed:
if hostname = %tslit(&syshostname) then ...
The %TSLIT macro function is stored in the default autocall macro library. For more
information, see “Referencing a Macro Variable in a Delimited Identifier” in the SAS
DS2 Programmer’s Guide.
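The following minimal sketch uses a hypothetical macro variable named REGION. If you wrote "&region" in double quotation marks, DS2 would treat East as a delimited identifier. %TSLIT(&region) instead resolves to the character literal 'East'.

```sas
%let region = East;   /* hypothetical macro variable */

proc ds2;
   data work.regions (overwrite=yes);
      dcl varchar(16) region_name;
      method run();
         /* %TSLIT(&region) resolves to the single-quoted literal 'East'. */
         region_name = %tslit(&region);
         output;
      end;
   enddata;
run;
quit;
```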
Variable Description
_HOSTNAME_ Returns the name of the worker node or host on which the DS2 program is
running.
_NTHREADS_ Returns the total number of DS2 threads running in the program. In a
parallel environment, _NTHREADS_ is the total number of DS2 threads across all
nodes on which the DS2 program is running.
_THREADID_ Returns the ID of the executing thread. A serial program (which does not
contain a thread component) is assigned _THREADID_ = 0. In a parallel program,
the executing data program is assigned _THREADID_ = 0 and each executing thread
program is assigned a unique _THREADID_ from 1 to the number of threads.
data _null_;
dcl thread thd t;
method init();
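The following complete sketch is separate from the truncated example above. The thread program WHO_T and the table WORK.THREADS are hypothetical. The sketch starts four copies of a thread program, and each copy writes one row that records its _THREADID_ and _NTHREADS_ values. Based on the table above, the data program should collect four rows with thread IDs from 1 to 4.

```sas
proc ds2;
   /* Hypothetical thread program: each thread writes one row. */
   thread work.who_t / overwrite=yes;
      dcl double tid nthr;
      method run();
         tid  = _threadid_;    /* 1 to the number of threads */
         nthr = _nthreads_;
         output;
      end;
   endthread;
   run;

   /* Data program: start four threads and collect their rows. */
   data work.threads (overwrite=yes);
      dcl thread work.who_t t;
      method run();
         set from t threads=4;
      end;
   enddata;
   run;
quit;
```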
Passwords
SAS software enables you to restrict access to SAS data sets and SPD Engine data
sets by assigning SAS passwords to the files. You can specify three levels of
protection: read, write, and alter.
With PROC DS2, you assign or specify a password for a data source using the DS2
table options ALTER=, PW=, READ=, and WRITE=. For example, the following code
applies the DS2 table option PW= in order to assign READ, WRITE, and ALTER
passwords to a SAS data set:
proc ds2;
   data myfiles.table1 (pw=luke);
      dcl double j j2;
      method run();
         do j = 1 to 1000;
            j2 = 2*j;
            output;
         end;
      end;
   enddata;
run;
quit;
A SAS password does not control access to a SAS file beyond SAS. You should use
the operating system-supplied utilities and file system security controls to control
access to SAS files outside SAS. For more information about SAS passwords, see
“Assigning Passwords” in SAS Programmer’s Guide: Essentials.
CAS tables do not support SAS passwords. Therefore, you cannot assign a
password for a CAS table. When accessing password-protected data from CAS,
passwords are specified in the CAS language element used to access the data. For
example, passwords are supported in the CASUTIL procedure, which loads data
into CAS, as well as in the CASLIB statement and in the Table.addCaslib action. For
more information, see the SAS Viya SAS Data Connector documentation in SAS
Cloud Analytic Services: User’s Guide.
Encryption
SAS software enables you to encrypt the contents of a SAS data set, SPD Engine
data set, and SPD Server table. SAS supports SAS proprietary encryption and AES
encryption.
AES encryption is performed by specifying the ENCRYPT= table option with the
ENCRYPTKEY= table option. A data set or table encrypted with AES encryption is
later decrypted by specifying the ENCRYPTKEY= table option with the appropriate
key value.
Beginning with SAS 9.4M5, SAS supports two levels of AES encryption: AES and
AES2. The new AES2 option provides AES encryption to meet newer and more
secure encryption standards. You must specify the ENCRYPTKEY= table option
when using AES or AES2 encryption. AES2 encryption is initially supported for SAS
data sets only. For more information, see “ENCRYPT=” in SAS DS2 Language
Reference.
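The following sketch applies AES encryption with DS2 table options. It assumes a hypothetical libref MyFiles and a hypothetical key value, MyKey2024. The first program creates an encrypted SAS data set, and the second program reads it back, which again requires ENCRYPTKEY=. Check the accepted key-value syntax under ENCRYPTKEY= in SAS DS2 Language Reference, and avoid hard-coding real keys in programs.

```sas
proc ds2;
   /* ENCRYPT=AES must be paired with ENCRYPTKEY= (hypothetical key). */
   data myfiles.secure (encrypt=aes encryptkey=MyKey2024);
      dcl double j;
      method run();
         do j = 1 to 5;
            output;
         end;
      end;
   enddata;
   run;

   /* Reading the encrypted table requires the same key value. */
   data _null_;
      method run();
         set myfiles.secure (encryptkey=MyKey2024);
         put j=;
      end;
   enddata;
   run;
quit;
```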
DS2 currently does not support the encryption attribute for CAS tables. When
accessing SAS and AES encrypted data sets from CAS, passwords and encryption
keys are specified in the CAS language element that is used to access the data. For
example, passwords and encryption keys are supported in the CASUTIL procedure,
which loads data into CAS, as well as in the CASLIB statement and in the
Table.addCaslib action. For more information, see the SAS Viya SAS Data
Connector documentation in SAS Cloud Analytic Services: User’s Guide.
Table 21.3 DS2 Data Type Translation for SAS Data Sets
DS2 Data Type    Legacy SAS Data Type    Description
FLOAT DOUBLE
NVARCHAR(n) VARCHAR
REAL DOUBLE
1 Support for the integer data types starts with SAS Viya 3.3.
2 Beginning with SAS Viya 3.5, a DS2 program that runs in CAS can read and create a column with a
VARBINARY data type in a CAS table. It can also write to an existing VARBINARY column in a CAS
table. In earlier SAS Viya releases, DS2 does not return an error when reading VARBINARY columns.
However, the data is incorrectly treated as character data.
3 Beginning with SAS Viya 3.5, a DS2 program that runs in CAS can create a column with a BINARY
data type. However, it cannot read a BINARY data column from a CAS table or write to an existing
BINARY column in a CAS table. DS2 returns an error if an attempt is made to do so. Use the DROP
data set option to exclude BINARY columns when reading or writing to a CAS table that contains a
BINARY column. In earlier SAS Viya releases, no error was reported when reading BINARY columns.
However, the data was not read correctly.
Date, time, and timestamp values in CAS tables are supported as DOUBLEs, with a
SAS format applied. When SAS Viya Data Connectors read DATE, TIME, and
TIMESTAMP columns from an ANSI-compliant data source, they convert the
columns to data type DOUBLE. DS2 applies a DATE. SAS format to date values, a
TIME. SAS format to time values, and a DATETIME. SAS format to datetime values.
Example 1: Introducing DS2 Code
Details
This example uses a simple DS2 program that displays Hello World! in the SAS
log. The example shows basic differences between DS2 and the SAS DATA step.
The code looks similar to the SAS DATA step, but some syntax elements are
different, such as the default system methods (INIT, RUN, and TERM). DS2 also
supports the most common SQL data types, such as DECIMAL, INTEGER, and
VARCHAR, so that operations on DBMS data can use native DBMS types.
Program
proc ds2 libs=work;
data _null_;
method init();
dcl varchar(16) str;
str = 'Hello World!';
put str;
end;
enddata;
run;
quit;
Program Description
Execute the PROC DS2 statement. The LIBS= connection option specifies to
execute the request in the SAS Work library. The PROC DS2 statement sets up the
environment to submit DS2 language statements.
proc ds2 libs=work;
Enter the DS2 language statements. _NULL_ on the DS2 DATA statement indicates
that there is no automatic output generated. The DS2 PUT statement writes to the
SAS log.
data _null_;
method init();
dcl varchar(16) str;
str = 'Hello World!';
put str;
end;
enddata;
Submit the DS2 statements. The RUN statement submits the DS2 statements. The
RUN statement is required. SAS reads the program statements that are associated
with one task until it reaches a RUN statement.
run;
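The INIT, RUN, and TERM system methods can be seen side by side in a sketch like the following. It assumes that the SASHELP.CLASS sample data set is available in your session. INIT runs once before any rows are read, RUN runs once for each row that the SET statement reads, and TERM runs once after the last row has been processed.

```sas
proc ds2;
data _null_;
   method init();
      put 'INIT: runs once, before any rows are read';
   end;
   method run();
      set sashelp.class;   /* RUN executes once per input row */
      put name= age=;
   end;
   method term();
      put 'TERM: runs once, after the last row is read';
   end;
enddata;
run;
quit;
```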
Example 2: Creating a SAS Data Set
Details
This example creates a SAS data set in a Base SAS session by submitting the DS2
procedure, and then submitting DS2 language statements. The output shows the
first ten rows of the data set.
Program
libname myfiles base 'C:\myfiles';
proc ds2 libs=myfiles;
data myfiles.basetable;
declare double j j2;
method run();
do j = 1 to 1000;
j2 = 2*j;
output;
end;
end;
enddata;
run;
quit;
proc print data=myfiles.basetable (obs=10);
run;
Program Description
Assign a library reference to the SAS data set to be created. The LIBNAME
statement assigns the libref MyFiles, specifies the BASE engine, and specifies the
physical location for the SAS data set.
libname myfiles base 'C:\myfiles';
Execute the PROC DS2 statement. The PROC DS2 statement connects to the data
source by using the libref MyFiles and sets up the environment to submit DS2
language statements.
proc ds2 libs=myfiles;
Enter the DS2 language statements. The DS2 DATA statement creates an output
table named MyFiles.BaseTable. The two-level name in the DATA statement
specifies the catalog identifier MyFiles. The DECLARE statement assigns the data
type DOUBLE to the variables J and J2. The METHOD statement identifies the RUN
system method that is used to create output. The OUTPUT statement writes a row
to table MyFiles.BaseTable on each iteration of the DO loop, for a total of 1000 rows.
data myfiles.basetable;
declare double j j2;
method run();
do j = 1 to 1000;
j2 = 2*j;
output;
end;
end;
enddata;
Submit the DS2 language statements. The RUN statement submits the DS2
statements. The RUN statement is required. SAS reads the program statements
that are associated with one task until it reaches a RUN statement.
run;
Print the SAS data set. The PRINT procedure prints the observations in the SAS
data set. The OBS= data set option limits the output to 10 observations.
proc print data=myfiles.basetable (obs=10);
run;
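Because this chapter compares DS2 with the DATA step, it can help to build the same table with a legacy DATA step. The following sketch assumes that the MyFiles libref from this example is still assigned. It creates Work.BaseTable_DS (a hypothetical name) and then uses PROC COMPARE to check it against the table that PROC DS2 created.

```sas
/* Legacy DATA step equivalent of the DS2 program above */
data work.basetable_ds;
   do j = 1 to 1000;
      j2 = 2*j;
      output;   /* one row per iteration, as in the DS2 RUN method */
   end;
run;

/* Compare the two tables variable by variable */
proc compare base=myfiles.basetable compare=work.basetable_ds;
run;
```

The DATA step version has no explicit DECLARE statement or RUN method. Numeric variables default to 8-byte doubles, and because no input data set is read, the DATA step iterates only once.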
Details
The following example shows the usefulness of the RUN CANCEL statement in a
line prompt mode session. The sixth line of the code refers to the wrong
column (Z instead of Y). RUN CANCEL ends the PROC DS2 step and
prevents it from executing.
Program
proc ds2;
data xy_data;
declare double x y;
method init();
do x = 1 to 5;
z = 2*x;
end;
end;
enddata;
run cancel;
quit;
Example Code 21.2 SAS Log Showing the Canceled PROC DS2 Step
Example 4: Routing Data to Tables Based on Values
Details
This example illustrates how to create tables based on a condition. Programs 1 and
2 create two tables, Dept1_Items and Dept2_Items, that hold costs for items used
by two departments. The third program creates two tables, Highcosts and
Lowcosts, based on the costs of the items in the two items tables. Programs 4 and
5 output the contents of the costs tables.
Program
proc ds2;
/* Program 1 */
data dept1_items (overwrite=yes);
dcl varchar(20) item;
dcl double cost;
method init();
item = 'staples'; cost = 1.59; output;
item = 'pens'; cost = 3.26; output;
item = 'envelopes'; cost = 11.42; output;
end;
enddata;
run;
/* Program 2 */
data dept2_items (overwrite=yes);
dcl varchar(20) item;
dcl double cost;
method init();
item = 'erasers'; cost = 5.43; output;
item = 'paper'; cost = 26.92; output;
item = 'toner'; cost = 62.29; output;
end;
enddata;
run;
/* Program 3 */
data lowCosts (overwrite=yes) highCosts (overwrite=yes);
method run();
set dept1_items dept2_items;
if cost <= 10.00 then
output lowCosts;
else
output highCosts;
end;
enddata;
run;
/* Program 4 */
data;
method run();
set lowCosts;
end;
enddata;
run;
/* Program 5 */
data;
method run();
set highCosts;
end;
enddata;
run;
quit;
Example 5: Run a DS2 Program in CAS
Details
Here is an example of a DS2 parallel program that is run in CAS. A DS2 parallel
program is a DS2 program that contains a thread program and a data program, in
which the data program does not contain any data manipulation statements. That is,
the data program contains no statements besides SET FROM and OUTPUT. Operations in the
thread program are applied to multiple data observations in parallel. Each CAS
worker processes a subset of the data set and generates a subset of the result set.
Program
options cashost="cloud.example.com" casport=5570;
cas mysess;
caslib casdata datasource=(srctype=path)
path="testdata/cas";
proc casutil;
load casdata="cars_single.sashdat" incaslib="casdata"
casout="cars_single";
run;
proc ds2 sessref=mysess;
thread cars_thd / overwrite=yes;
method run();
set cars_single;
if (msrp > 100000) then do;
put make= model= msrp=;
output;
end;
end;
endthread;
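run;
data cars_luxury / overwrite=yes;
dcl thread cars_thd t;
method run();
set from t threads=4;
output;
end;
enddata;
run;
quit;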
Program Description
Connect to the CAS server. The CASHOST= and CASPORT= options specify to
connect to the CAS server at cloud.example.com using port 5570. This step is not
required if your network has a pre-configured CAS server connection.
options cashost="cloud.example.com" casport=5570;
Establish a CAS session. The CAS statement specifies to start a CAS session
named MySess.
cas mysess;
Load the table into CAS for processing. The CASUTIL procedure is used to load
table Cars_Single.sashdat into the CAS session. The CASOUT= parameter assigns
the loaded table the name Cars_Single.
proc casutil;
load casdata="cars_single.sashdat" incaslib="casdata"
casout="cars_single";
run;
Issue the PROC DS2 statement and specify the SESSREF= procedure option.
SESSREF= instructs the procedure to process the request using CAS session
MySess.
proc ds2 sessref=mysess;
Enter the DS2 language statements. The DS2 THREAD statement creates a thread
program named Cars_Thd that specifies criteria for selecting data from loaded
table Cars_Single. The DS2 DATA statement creates an output table, Cars_Luxury,
and specifies to set the results of the thread program as the content of the new
table. The data program specifies to use four threads to execute the thread
program.
thread cars_thd / overwrite=yes;
method run();
set cars_single;
if (msrp > 100000) then do;
put make= model= msrp=;
output;
end;
end;
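endthread;
run;
data cars_luxury / overwrite=yes;
dcl thread cars_thd t;
method run();
set from t threads=4;
output;
end;
enddata;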
Submit the DS2 language statements. The RUN statement submits the DS2
statements. The RUN statement is required. SAS reads the program statements
that are associated with one task until it reaches a RUN statement.
run;
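If no CAS server is available, you can try the same thread program and data program pattern in a Base SAS session, where the threads run on the local machine. This is only a sketch. It assumes that the SASHELP.CARS sample data set is available, and the names Work.Lux_Thd and Work.Cars_Luxury are hypothetical. As in the CAS example, all row selection happens in the thread program. The data program only declares the thread and reads its rows with SET FROM.

```sas
proc ds2;
/* Thread program: the row selection that can run in parallel */
thread work.lux_thd / overwrite=yes;
   method run();
      set sashelp.cars;
      if msrp > 100000 then output;
   end;
endthread;
run;

/* Data program: declares the thread and writes the rows it returns */
data work.cars_luxury / overwrite=yes;
   dcl thread work.lux_thd t;
   method run();
      set from t threads=2;   /* two threads is an arbitrary choice */
      output;
   end;
enddata;
run;
quit;
```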
22
DSTODS2 Procedure
For more information, see Chapter 21, “DS2 Procedure,” on page 801.
PROC DSTODS2 requires an input file that contains the source to be translated. The
input file is specified with the IN= argument. PROC DSTODS2 also requires an
output filename to which the translated source is written. This file is specified
with the OUT= argument.
Note: The resulting output file might not be syntactically complete. You might
have to clean up the output file to create a DS2 program that can be compiled and
executed.
Note: PROC DSTODS2 adds the following line of code to your output file. Do not
remove it from your .ds2 file during cleanup:
_return: ;
Overview
PROC DSTODS2 cannot translate all possible DATA step syntax. Its main purpose
is to support typical SAS Enterprise Miner scoring syntax, which is a subset of the
full DATA step syntax.
Supported Syntax
The DATA step syntax that PROC DSTODS2 supports includes the following items.
n All functions and formats are translated as-is. However, only functions and
formats that are supported by DS2 are valid. For more information, see “DS2
Functions” in SAS DS2 Language Reference.
n Constant lists (as used in IN clauses)
n Variable lists except for type modifiers in lists, for example, x-numeric-a
Note: Because of the depth of the DATA step language, this list is not exhaustive.
Unsupported Syntax
The DATA step syntax that PROC DSTODS2 does not support includes the
following items:
n The following statements are not supported:
o ABORT
o ATTRIB
o CALL
o CARDS
o CARDS4
o DATALINES
o DATALINES4
o DELETE
o DESCRIBE
o DISPLAY
o DO OVER
o ERROR
o EXECUTE
o FILE
o FORMAT (undeclared variables)
o INFILE
o INFORMAT
o INPUT (all forms)
o LINK
o LIST
o LOSTCARD
o MODIFY
o PUT statement with formatting of any kind
o PUTLOG
o REMOVE
o REPLACE
o UPDATE
o WHERE
o WINDOW
n Data set options other than DROP, IN, KEEP, and RENAME
CONTROL NOBS
CUROBS OPEN
END POINT
INDSNAME KEYRESET
KEY
n Implicit arrays
n Bitstring expressions
n Other procedures
Note: Because of the depth of the DATA step language, this list is not exhaustive.
n The resulting DS2 program contains two lines of code that should be removed
before saving your final DS2 program:
ds2_options sas tkgmac;
_return: ;
These comments are placed in-line in the original position as much as possible
to enable you to easily compare the original code with the translated code. But
there are situations where comments and other code might be moved to other
positions. For example, the code might be moved to the top of a block (the
outermost scope).
n The resulting DS2 program might not be syntactically complete. The resulting
program can result in one of the following outcomes:
o compile and execute without error. This is unlikely if any of the new DATA
step syntax is used.
o compile but fail to execute. This is particularly true of any programs that
contain commented-out sections.
o not compile. This could happen because there was a warning, or something
that translates syntactically but has no corresponding support in DS2. This
could be the case for certain functions and formats. It could happen for some
of the variable list features also.
o produce a fatal syntax error. This could happen for some previously
undiscovered feature or for some aspect of the previously mentioned
variable list syntax.
n The resulting DS2 program does not invoke the SAS In-Database Code
Accelerator. You must take the part of the code that can be run in parallel and
create a thread program to accompany the data program.
n If any code line is longer than 32767 characters, it is truncated.
n If your SAS session uses a multibyte encoding, you might have to adjust your
LENGTH statement before running PROC DSTODS2 so that it accounts for the length
of the variable in characters and avoids truncation. There is no issue with
single-byte encodings.
n If you run PROC DSTODS2 on a DATA step program with the SESSREF= option
to run on the CAS server, you must move the SESSREF= option from the DATA
statement to the PROC DS2 statement before running on the CAS server.
PROC DSTODS2 Translates DATA step code into DS2 code. Ex. 1, Ex. 2,
Ex. 3
Syntax
PROC DSTODS2 IN=datastep-program-filename OUT=ds2-program-filename
<OUTDIR="output-directory-name">;
Required Arguments
IN=datastep-program-filename
specifies the name of the DATA step file or a SAS fileref.
OUT=ds2-program-filename
specifies the name of the DS2 file that is created.
Restriction You must specify only a single output filename. It cannot include a
pathname.
Note You should remove the following two lines from your output .ds2
program:
ds2_options sas tkgmac;
_return: ;
Tip If you cannot or do not specify the OUTDIR= directory, the output
file is automatically written to the current working directory. To find
your current working directory, submit this code and check the SAS log,
which reports the full path of the file that the code writes:
data _null_;
file 'name.txt';
put 'test';
run;
OUTDIR="output-directory-name"
specifies the output directory name for the file.
Requirement For UNIX directories, you must include the final directory
separator(/), for example, outdir="/mydir/files/"
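With all three arguments together, a typical invocation might look like the following sketch. The file names score.sas and score.ds2 are hypothetical, and the input file is assumed to be in the current working directory. The output directory reuses the UNIX path from the OUTDIR= requirement above, including its trailing separator.

```sas
/* Translate a DATA step program into DS2 source code */
proc dstods2 in="score.sas"
             out="score.ds2"
             outdir="/mydir/files/";   /* trailing / is required on UNIX */
run;
```

Before you can submit the translated program in a PROC DS2 step, it still needs the cleanup that is described earlier in this chapter.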
Details
This example uses PROC DSTODS2 to translate the following DATA step program,
dsEx1.sas, that has a SET statement and a BY statement.
data _null_;
length x y z w 8;
set x;
by x--z w;
put x=;
run;
Execute the PROC DSTODS2 statement. The IN= argument names the DATA step
program file to translate, and the OUT= argument names the DS2 file to
create.
proc dstods2 in="dsEx1.sas" out="ds2Ex1.ds2";
Submit the DSTODS2 statements. The RUN statement submits the DSTODS2
statements. The RUN statement is required. SAS reads the program statements
that are associated with one task until it reaches a RUN statement.
run;
The translated file, ds2Ex1.ds2, contains the following DS2 code:
data _NULL_;
dcl double X;
dcl double Y;
dcl double Z;
dcl double W;
method run();
set X;
by /* X--Z */ W;
put X=;
;
_return: ;
end;
enddata;
Example 2: PUT Statement with Line Specifiers
Details
This example uses PROC DSTODS2 to translate the following DATA step program,
dsEx2.sas, that has a DO statement and several PUT statements.
data _null_;
file temp linesize=32600;
do i = 1 to 5330;
put i 6. @;
end;
put @31990 'N1=72' @;
put @32580 'N2=32413' @;
put @32001 'N4=32 N3=783424123' @;
put @32477 'N5=1977' @;
put @32222 'N6=1981' ;
run;
Execute the PROC DSTODS2 statement. The IN= argument names the DATA step
program file to translate, and the OUT= argument names the DS2 file to
create.
proc dstods2 in="dsEx2.sas" out="ds2Ex2.ds2";
Submit the DSTODS2 statements. The RUN statement submits the DSTODS2
statements. The RUN statement is required. SAS reads the program statements
that are associated with one task until it reaches a RUN statement.
run;
The translated file, ds2Ex2.ds2, contains the following DS2 code:
data _NULL_;
method run();
/* FILE TEMP LINESIZE = 32600 */;
do I = 1.0 to 5330.0;
put I 6. /* Put statement contains unsupported feature(s) @ */ ;
end;
put /* Put statement contains unsupported feature(s) @ 31990 'N1=72'
@ */ ;
put /* Put statement contains unsupported feature(s) @ 32580 'N2=32413'
@ */ ;
put /* Put statement contains unsupported feature(s) @ 32001 'N4=32
N3=783424123'
@ */ ;
put /* Put statement contains unsupported feature(s) @ 32477 'N5=1977'
@ */ ;
put /* Put statement contains unsupported feature(s) @ 32222 'N6=1981'
*/ ;
;
_return: ;
end;
enddata;
Example 3: Arrays
Features: PROC DSTODS2 statement
Details
This example uses PROC DSTODS2 to translate the following DATA step program,
dsEx3.sas, that has an ARRAY statement.
data _null_;
array a(*) a1-a5;
retain a1-a5 (1*(10 10 10 10 10));
put _all_;
run;
Execute the PROC DSTODS2 statement. The IN= argument names the DATA step
program file to translate, and the OUT= argument names the DS2 file to
create.
proc dstods2 in="dsEx3.sas" out="ds2Ex3.ds2";
Submit the DSTODS2 statements. The RUN statement submits the DSTODS2
statements. The RUN statement is required. SAS reads the program statements
that are associated with one task until it reaches a RUN statement.
run;
The translated file, ds2Ex3.ds2, contains the following DS2 code:
data _NULL_;
retain A1-A5 (1 * (10, 10, 10, 10, 10)) ;
vararray double A[*] A1-A5;
method run();
put _ALL_;
;
_return: ;
end;
enddata;
23
EXPORT Procedure
In delimited files, a delimiter can be a blank, comma, or tab that separates columns
of data values. If you have a license for SAS/ACCESS Interface to PC Files, you can
also export to additional file formats, such as a Microsoft Access database, a
Microsoft Excel workbook, a DBF file, or a Lotus spreadsheet. For more information,
see SAS/ACCESS Interface to PC Files: Reference.
Starting in SAS 9.4, you can export a SAS data set to a JMP 7 or later file, and JMP
variables can be up to 255 characters long. Extended attributes are now used
automatically, and the META= statement is no longer supported for JMP files. For
more information, see “JMP Files” in SAS/ACCESS Interface to PC Files: Reference.
You control the results with options and statements that are specific to the output
data source. The EXPORT procedure generates the specified output file and writes
information about the export to the SAS log. The log displays the DATA step or the
SAS/ACCESS code that the EXPORT procedure generates. If a translation engine is
used, then no code is submitted.
The Export Wizard or the External File Interface (EFI) can be used to guide you
through the steps to export a SAS data set. The Export Wizard can generate
EXPORT procedure statements, which you can save to a file for subsequent use.
For more information, see “External File Interface (EFI)” in SAS/ACCESS Interface to
PC Files: Reference.
The Export Wizard uses EFI methods to read and write data in delimited files, and
this can affect the behavior when you use the EXPORT procedure or Export Wizard.
For example, when exporting SAS data to a delimited file, the EXPORT procedure
discards items that exceed the output line-length. For more information, see the
DROPOVER option in the FILE Statement in SAS DATA Step Statements: Reference.
To open the Export Wizard, from the SAS windowing environment, select File ►
Export Data. For more information about the Export Wizard, see the Base SAS
online Help and documentation. For more detail and an example, see “Using SAS
Import and Export Wizards” in SAS/ACCESS Interface to PC Files: Reference.
For more information about the encodings of format catalogs, see Migrating Data to
UTF-8 for SAS Viya and SAS/ACCESS Interface to PC Files: Reference.
The VARCHAR data type is similar to the CHAR data type. CHAR variables have a
length that is measured in terms of bytes. VARCHAR variables have a length that is
measured in terms of characters rather than bytes. For information about using
VARCHAR, see SAS Cloud Analytic Services: DATA Step Programming.
In the following example, the CAS engine is used with the LENGTH statement to
create a VARCHAR variable and a CHAR variable. The VARCHAR variable, X, has a
length of 30 characters, and the CHAR variable, Y, has a length of 30 bytes.
libname mycas cas;
data mycas.string;
length x varchar(30);
length y $30;
x = 'abc'; y = 'def';
run;
proc contents data=mycas.string; run;
PROC EXPORT Export SAS data sets to an external data file Ex. 1, Ex. 2
Tips: Beginning with SAS Viya 3.5, PROC EXPORT supports all access types that are
available in the FILENAME statement.
Beginning with SAS 9.4M5, PROC EXPORT supports the VARCHAR data type for
CAS tables. For more information, see “Support for the VARCHAR Data Type” on
page 850.
See: “FILENAME Statement” in SAS Global Statements: Reference
Syntax
PROC EXPORT DATA=<libref.>SAS data set <(SAS data set options)>
OUTFILE="filename" | OUTTABLE="tablename"
<DBMS=identifier> <REPLACE> <LABEL>;
Required Arguments
DATA=<libref.>SAS data set
identifies the input SAS data set with either a one- or two-level SAS name
(library and member name). If you specify a one-level name, by default, the
EXPORT procedure uses either the USER library (if assigned) or the WORK
library.
The EXPORT procedure can export a SAS data set only if the data target
supports the format of a SAS data set. The amount of data must also be within
the limitations of the data target. For example, some data files have a maximum
number of rows or columns. Some data files cannot support SAS user-defined
formats and informats. If the SAS data set that you want to export exceeds the
limits of the target file, the EXPORT procedure might not be able to export it
correctly. In many cases, the procedure attempts to convert the data to the best
of its ability. However, conversion is not possible for some types.
Beginning with SAS 9.4M1, a SAS data set name can contain a single quotation
mark when the VALIDMEMNAME=EXTEND system option is also specified.
Using VALIDMEMNAME= expands the rules for the names of certain SAS
members, such as a SAS data set name. For more information, see “Rules for
SAS Data Set Names, View Names, and Item Store Names” in SAS Language
Reference: Concepts.
Default If you do not specify a SAS data set to export, the EXPORT
procedure uses the most recently created SAS data set. SAS keeps
track of that data set with the _LAST_= system option. To be certain
that the EXPORT procedure uses the correct data set, you should
identify the SAS data set.
OUTFILE="filename" | "fileref"
specifies the complete path and filename or a fileref for the output PC file,
spreadsheet, or delimited external file. A fileref is a SAS name that is associated
with the physical location of a file. To assign a fileref, use the FILENAME
statement.
If you specify a fileref, or if the complete path and filename do not include
special characters (such as the backslash in a path), lowercase characters, or
spaces, you can omit the quotation marks.
Alias FILE
Restrictions The EXPORT procedure does not support device types or access
methods for the FILENAME statement except for DISK. For
example, the EXPORT procedure does not support the TEMP
device type, which creates a temporary external file.
PROC EXPORT does not support the DROP|KEEP data set options
with Name Range Lists for DBMS=CSV | TAB | DLM.
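The following sketch shows the fileref form of OUTFILE=. It assigns a fileref with the FILENAME statement and passes it to PROC EXPORT without quotation marks. The fileref OutCSV and the path are hypothetical. The path points to an ordinary disk file, which the DISK restriction above permits.

```sas
/* OutCSV is a hypothetical fileref for a disk file */
filename outcsv 'c:\myfiles\class.csv';

proc export data=sashelp.class
   outfile=outcsv      /* a fileref needs no quotation marks */
   dbms=csv
   replace;
run;
```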
OUTTABLE="tablename"
specifies the table name of the output DBMS table. If the name does not include
special characters (such as question marks), lowercase characters, or spaces,
you can omit the quotation marks. Note that the DBMS table name might be
case sensitive.
When you export a DBMS table, you must specify the DBMS
option.
Optional Arguments
DBMS=identifier
specifies the type of data to export. To export to a DBMS table, you must
specify the DBMS option by using a valid database identifier. For DBMS=DLM,
the default delimiter character is a space. However, you can use
DELIMITER='char'.
LABEL
writes SAS variable labels, rather than variable names, to the exported table as
column names. If a variable does not have a label, SAS writes the variable name
instead.
REPLACE
overwrites an existing file. If you do not specify REPLACE, the EXPORT
procedure does not overwrite an existing file.
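The following sketch shows the LABEL option. It copies SASHELP.CLASS to a hypothetical data set, Work.Class_Lbl, and assigns labels to two variables. With LABEL specified, those labels, rather than the names Height and Weight, are intended to become the column headings in the CSV file. The output path is hypothetical.

```sas
/* Work.Class_Lbl is a hypothetical copy with labels added */
data work.class_lbl;
   set sashelp.class;
   label height = 'Height (in)'
         weight = 'Weight (lb)';
run;

proc export data=work.class_lbl
   outfile='c:\myfiles\class_labels.csv'
   dbms=csv
   label        /* write variable labels as column headings */
   replace;
run;
```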
DBENCODING Statement
Indicates the encoding used to save data in JMP files.
Syntax
DBENCODING=12-char SAS encoding-value;
Required Argument
12-char SAS encoding-value
indicates the encoding used to save data in JMP files. Encoding maps each
character in a character set to a unique numeric representation, which results in
a table of code points. A single character can have different numeric
representations in different encodings. This value can be up to 12 characters
long.
DELIMITER Statement
Specifies the delimiter to separate columns of data in the output file.
Syntax
DELIMITER=char | 'nn'x;
Required Argument
char | 'nn'x
specifies the delimiter to use to separate values in the output file. You can
specify the delimiter as a single-byte character or as a hexadecimal value. For
example, if you want columns of data to be separated by an ampersand, specify
DELIMITER='&'. A single character which requires two bytes to be represented
in a DBCS environment is not valid.
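The following sketch shows the hexadecimal form. It writes SASHELP.CLASS with a vertical bar as the delimiter, specified as '7C'x. That value is the vertical bar on ASCII-based systems, so on such systems DELIMITER='|' would be equivalent. The output path is hypothetical.

```sas
proc export data=sashelp.class
   outfile='c:\myfiles\class_pipe.txt'
   dbms=dlm
   replace;
   delimiter='7C'x;   /* '7C'x is the vertical bar | in ASCII */
run;
```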
FMTLIB Statement
Writes SAS format values that are defined in the format catalog to the JMP file as value labels.
Syntax
FMTLIB=<libref.>format-catalog;
Required Argument
<libref.>format-catalog
specifies the format catalog to be written to the JMP file.
META Statement
Writes SAS metadata information to the JMP file. (Deprecated)
Syntax
META=libref.member-data-set;
Required Argument
libref.member-data-set
specifies the SAS data set that contains the metadata information to be written
to the JMP file.
The META statement can remain in your programs, yet it generates a NOTE in
the log saying that META has been replaced by extended attributes and is
ignored.
PUTNAMES Statement
Writes SAS variable names as column headings to the first row of the exported data file.
Default: YES
Restriction: Valid only for the EXPORT procedure.
Note: If you specify the LABEL option, the SAS variable labels (not the variable names)
are written as column headings.
Example: “Example 3: Exporting to a Tab Delimited File with the PUTNAMES= Statement” on
page 865
Syntax
PUTNAMES=YES | NO;
Required Arguments
YES
specifies that the EXPORT procedure is to do the following tasks:
n Write the SAS variable names as column names (or headings) to the first row
of the exported data file.
n Write the first row of the SAS data set to the second row of the exported
data file.
NO
specifies that the EXPORT procedure is to write the first row of SAS data set
values to the first row of the exported data file.
Example 1: Exporting to a Delimited External Data Source
Features: PROC EXPORT statement options: DBMS=, OUTFILE=, REPLACE
DELIMITER= statement
Details
This example exports the SASHelp.Class data set to a delimited external file. The
following display shows the SASHelp.Class data set before it is exported:
Program
proc export data=sashelp.class
outfile='c:\myfiles\class'
dbms=dlm
replace;
delimiter='&';
run;
Program Description
Specify the input data set. Note that the filename does not contain an extension.
DBMS=DLM specifies that the output file is a delimited file.
proc export data=sashelp.class
outfile='c:\myfiles\class'
dbms=dlm
replace;
The DELIMITER statement specifies that an & (ampersand) delimits data fields in
the output file.
delimiter='&';
run;
Log
This partial SAS log displays this information about the successful export, including
the generated SAS DATA step.
2 /**********************************************************************
3 * PRODUCT: SAS
4 * VERSION: 9.3
5 * CREATOR: External File Interface
6 * DATE: 31JAN11
7 * DESC: Generated SAS Datastep Code
8 * TEMPLATE SOURCE: (None Specified.)
9 ***********************************************************************/
10 data _null_;
11 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
12 %let _EFIREC_ = 0; /* clear export record count macro variable */
13 file 'c:\myfiles\class' delimiter='&' DSD DROPOVER lrecl=32767;
14 if _n_ = 1 then /* write column names or labels */
15 do;
16 put
17 "Name"
18 '&'
19 "Sex"
20 '&'
21 "Age"
22 '&'
23 "Height"
24 '&'
25 "Weight"
26 ;
27 end;
28 set SASHELP.CLASS end=EFIEOD;
29 format Name $8. ;
30 format Sex $1. ;
31 format Age best12. ;
32 format Height best12. ;
33 format Weight best12. ;
34 do;
35 EFIOUT + 1;
36 put Name $ @;
37 put Sex $ @;
38 put Age @;
39 put Height @;
40 put Weight ;
41 ;
42 end;
43 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
44 if EFIEOD then call symputx('_EFIREC_',EFIOUT);
45 run;
Output
The EXPORT procedure produces this external file:
Name&Sex&Age&Height&Weight
Alfred&M&14&69&112.5
Alice&F&13&56.5&84
Barbara&F&13&65.3&98
Carol&F&14&62.8&102.5
Henry&M&14&63.5&102.5
James&M&12&57.3&83
Jane&F&12&59.8&84.5
Janet&F&15&62.5&112.5
Jeffrey&M&13&62.5&84
John&M&12&59&99.5
Joyce&F&11&51.3&50.5
Judy&F&14&64.3&90
Louise&F&12&56.3&77
Mary&F&15&66.5&112
Philip&M&16&72&150
Robert&M&12&64.8&128
Ronald&M&15&67&133
Thomas&M&11&57.5&85
William&M&15&66.5&112
Details
This example exports a subset of the SAS data set SASHelp.Class to a CSV file.
Program
Specify the data set to be exported. The WHERE= data set option requests a subset of the
observations. The OUTFILE= option specifies the output file. The DBMS= option
specifies that the output file is a CSV file, and the REPLACE option overwrites the
target CSV file if it exists.
proc export data=sashelp.class (where=(sex='F'))
outfile='c:\myfiles\Femalelist.csv'
dbms=csv
replace;
run;
Log
This partial SAS log displays this information about the successful export, including
the generated SAS DATA step.
584 /**********************************************************************
585 * PRODUCT: SAS
586 * VERSION: 9.4
587 * CREATOR: External File Interface
588 * DATE: 18APR14
589 * DESC: Generated SAS Datastep Code
590 * TEMPLATE SOURCE: (None Specified.)
591 ***********************************************************************/
592 data _null_;
593 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
594 %let _EFIREC_ = 0; /* clear export record count macro variable */
595 file 'c:\myfiles\Femalelist.csv' delimiter=',' DSD DROPOVER lrecl=32767
595! ;
596 if _n_ = 1 then /* write column names or labels */
597 do;
598 put
599 "Name"
600 ','
601 "Sex"
602 ','
603 "Age"
604 ','
605 "Height"
606 ','
607 "Weight"
608 ;
609 end;
610 set SASHELP.CLASS(where=(sex='F')) end=EFIEOD;
611 format Name $8. ;
612 format Sex $1. ;
613 format Age best12. ;
614 format Height best12. ;
615 format Weight best12. ;
616 do;
617 EFIOUT + 1;
618 put Name $ @;
619 put Sex $ @;
620 put Age @;
621 put Height @;
622 put Weight ;
623 ;
624 end;
625 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection
625! macro variable */
626 if EFIEOD then call symputx('_EFIREC_',EFIOUT);
627 run;
Output
The EXPORT procedure produces this external CSV file:
Example 3: Exporting to a Tab Delimited File with the PUTNAMES= Statement
Details
This example exports a SAS data set, WORK.INVOICE, to a tab-delimited
file. The first PROC EXPORT step uses the PUTNAMES=YES statement, and the
second uses PUTNAMES=NO. Together they show how this statement affects
column headings in a tab-delimited file.
The following display shows the SAS data set, WORK.INVOICE, before it is
exported to a tab-delimited file:
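Program
PROC PRINT DATA=WORK.INVOICE;
RUN;
PROC EXPORT DATA=WORK.INVOICE OUTFILE="c:\temp\invoice_names.txt"
DBMS=TAB REPLACE;
PUTNAMES=YES;
RUN;
PROC EXPORT DATA=WORK.INVOICE OUTFILE="c:\temp\invoice_data_1st.txt"
DBMS=TAB REPLACE;
PUTNAMES=NO;
RUN;
PROC PRINT;
RUN;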
Program Description
Use the PUTNAMES=YES statement in the EXPORT procedure. After
WORK.INVOICE is printed, the PUTNAMES=YES statement writes the SAS
variable names as column names to the first row of the exported delimited file,
Invoice_names.txt. The first row of data is then written to the second row of the
delimited file.
PROC PRINT DATA=WORK.INVOICE;
RUN;
PROC EXPORT DATA=WORK.INVOICE
OUTFILE="c:\temp\invoice_names.txt"
DBMS=TAB REPLACE;
PUTNAMES=YES;
RUN;
Impact of the PUTNAMES=NO statement. When you set this statement to NO,
PROC EXPORT writes the first row of data to the first row of the exported
delimited file. Therefore, the SAS variable names are skipped, and the columns are
left unlabeled.
PROC EXPORT DATA=WORK.INVOICE
OUTFILE="c:\temp\invoice_data_1st.txt"
DBMS=TAB REPLACE;
PUTNAMES=NO;
RUN;
PROC PRINT;
RUN;
SAS Log
This SAS log displays information about the successful export, including the
generated SAS DATA step. The log is divided into sections only to fit the
documentation layout.
495 /**********************************************************************
496 * PRODUCT: SAS
497 * VERSION: 9.4
498 * CREATOR: External File Interface
499 * DATE: 24MAY14
500 * DESC: Generated SAS Datastep Code
501 * TEMPLATE SOURCE: (None Specified.)
502 ***********************************************************************/
503 data _null_;
504 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
505 %let _EFIREC_ = 0; /* clear export record count macro variable */
506 file 'c:\temp\invoice_names.txt' delimiter='09'x DSD DROPOVER lrecl=32767;
507 if _n_ = 1 then /* write column names or labels */
508 do;
509 put
510 "INVNUM"
511 '09'x
512 "BILLEDTO"
513 '09'x
514 "AMTBILL"
515 '09'x
516 "COUNTRY"
517 '09'x
518 "AMTINUS"
519 '09'x
520 "BILLEDBY"
521 '09'x
522 "BILLEDON"
523 '09'x
524 "PAIDON"
525 ;
526 end;
527 set WORK.INVOICE end=EFIEOD;
528 format INVNUM best12. ;
529 format BILLEDTO $8. ;
530 format AMTBILL dollar18.2 ;
531 format COUNTRY $20. ;
532 format AMTINUS dollar18.2 ;
533 format BILLEDBY best12. ;
534 format BILLEDON date9. ;
535 format PAIDON date9. ;
536 do;
537 EFIOUT + 1;
538 put INVNUM @;
539 put BILLEDTO $ @;
540 put AMTBILL @;
541 put COUNTRY $ @;
542 put AMTINUS @;
543 put BILLEDBY @;
544 put BILLEDON @;
545 put PAIDON ;
546 ;
547 end;
548 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
549 if EFIEOD then call symputx('_EFIREC_',EFIOUT);
550 run;
552
553
554 PROC EXPORT DATA= WORK.INVOICE
555 OUTFILE= "c:\temp\invoice_data_1st.txt"
556 DBMS=TAB REPLACE;
557 PUTNAMES=NO;
558 RUN;
559 /**********************************************************************
560 * PRODUCT: SAS
561 * VERSION: 9.4
562 * CREATOR: External File Interface
563 * DATE: 24MAY14
564 * DESC: Generated SAS Datastep Code
565 * TEMPLATE SOURCE: (None Specified.)
566 ***********************************************************************/
567 data _null_;
568 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
569 %let _EFIREC_ = 0; /* clear export record count macro variable */
570 file 'c:\temp\invoice_data_1st.txt' delimiter='09'x DSD DROPOVER lrecl=32767;
571 set WORK.INVOICE end=EFIEOD;
572 format INVNUM best12. ;
573 format BILLEDTO $8. ;
574 format AMTBILL dollar18.2 ;
575 format COUNTRY $20. ;
576 format AMTINUS dollar18.2 ;
577 format BILLEDBY best12. ;
578 format BILLEDON date9. ;
579 format PAIDON date9. ;
580 do;
581 EFIOUT + 1;
582 put INVNUM @;
583 put BILLEDTO $ @;
584 put AMTBILL @;
585 put COUNTRY $ @;
586 put AMTINUS @;
587 put BILLEDBY @;
588 put BILLEDON @;
589 put PAIDON ;
590 ;
591 end;
592 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
593 if EFIEOD then call symputx('_EFIREC_',EFIOUT);
594 run;
595
596 PROC PRINT; RUN;
Output Example
Using the PROC EXPORT PUTNAMES=YES statement, the SAS variable names are
mapped to the column headings in the tab-delimited file. This is the default
behavior.
Output 23.5 SAS Data Exported to a Tab-Delimited File Using the PUTNAMES=YES Statement
Output Example
Using the PROC EXPORT PUTNAMES=NO statement results in unnamed columns
in the tab-delimited file.
Output 23.6 SAS Data Exported to a Tab-Delimited File Using the PUTNAMES=NO Statement
24
FCMP Procedure
The FCMP procedure uses the SAS language compiler to compile and execute SAS
programs. The compiler subsystem generates machine language code for the
computer on which SAS is running. By specifying values with the CMPOPT= system
option, you can optimize the machine language code for efficient execution. For
information about the types of code generation optimizations to use in the SAS
language compiler, see “CMPOPT= System Option” in SAS System Options:
Reference.
You can use the functions and subroutines that you create in PROC FCMP with the
DATA step, the WHERE statement, the Output Delivery System (ODS), and with
the following procedures:
n PROC NLIN
n PROC SQL (functions with array arguments are not supported)
Functions are equivalent to routines that are used in other programming languages.
They are independent computational blocks that require zero or more arguments. A
subroutine is a special type of function where return values are optional. All
variables that are created within a function or subroutine block are local to that
routine.
and the INVERSE subroutine calculates a simple inverse. Each ends with an
ENDSUB statement.
option cmplib=sasuser.MySubs;
data _null_;
daysDate=day_date(today());
put daysDate=;
daysDate=2
inverseValue=0.0833333333
The function and subroutine follow DATA step syntax. Functions and subroutines
that are already defined in the current FCMP procedure step, as well as most DATA
step functions, can be called from within these routines as well. In the example
above, the DATA step function WEEKDAY is called by DAY_DATE.
The routines in the example are saved to the data set Sasuser.MySubs, inside a
package called MathFncs. A package is any collection of related routines that are
specified by the user. It is a way of grouping related subroutines and functions
within the data set. The OUTLIB= option in the PROC FCMP statement tells PROC
FCMP where to store the subroutines that it compiles, and the LIBRARY= option
tells it where to read in libraries (C or SAS).
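The definitions of DAY_DATE and INVERSE are not reproduced in this section. The following hypothetical sketch is consistent with the description: DAY_DATE calls WEEKDAY, INVERSE returns its result through an OUTARGS argument, and both routines are stored in the MathFncs package of Sasuser.MySubs. The routine bodies are assumptions. The argument 12 is also an assumption, chosen so that the result matches inverseValue=0.0833333333.

```sas
/* Hypothetical bodies: only the names, the WEEKDAY call, and the */
/* Sasuser.MySubs / MathFncs location come from the text.         */
proc fcmp outlib=sasuser.MySubs.MathFncs;
   /* Returns the day of the week (1=Sunday ... 7=Saturday) */
   function day_date(indate);
      return(weekday(indate));
   endsub;

   /* Returns 1/in through the OUTARGS argument INV */
   subroutine inverse(in, inv);
      outargs inv;
      if in = 0 then inv = .;
      else inv = 1 / in;
   endsub;
run;

options cmplib=sasuser.MySubs;
data _null_;
   daysDate = day_date(today());
   call inverse(12, inverseValue);   /* inverseValue = 1/12 */
   put daysDate= inverseValue=;
run;
```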
Note: Function and subroutine names must be unique within a package. However,
different packages can have subroutines and functions with the same names. To
select a specific subroutine when there is ambiguity, use the package name and a
period as the prefix to the subroutine name. For example, to access the MathFncs
version of INVERSE, use MathFncs.inverse.
Note: PROC FCMP routines that you create cannot have the same name as built-in
SAS functions. If the names are the same, then SAS generates an error message
stating that a built-in SAS function or subroutine already exists with the same
name.
STUDY_DAY is called from DATA step code as if it were any other function. When
the DATA step encounters a call to STUDY_DAY, it does not find this function in its
traditional library of functions. Instead, SAS searches each of the libraries or data
sets that are specified in the CMPLIB= system option for a package that contains
STUDY_DAY. In this example, STUDY_DAY is located in Sasuser.Funcs.Trial. The
program calls the function, passing the variable values for start_date and end_date,
and returns the result in the variable sd.
options cmplib=sasuser.funcs;
data _null_;
start_date='15Feb2006'd;
end_date='27Mar2006'd;
sd=study_day(start_date, end_date);
put sd=;
run;
Example Code 24.1 Results from the STUDY_DAY User-Defined Function
sd=40
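The definition of STUDY_DAY does not appear in this section. The following hypothetical minimal version is stored in the Trial package of Sasuser.Funcs and returns the number of days between its two date arguments. It reproduces the result above: 27MAR2006 is 40 days after 15FEB2006, which is 13 remaining days in February 2006 plus 27 days in March.

```sas
proc fcmp outlib=sasuser.funcs.trial;
   /* Hypothetical body: days elapsed from start_date to end_date */
   function study_day(start_date, end_date);
      return(end_date - start_date);
   endsub;
run;
```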
The OUTLIB= option is required and specifies the package where routines declared
in the routine-declarations section are stored.
Routines that are declared in the routine-declarations section can call FCMP
routines that exist in other packages. To find these routines and to check the
validity of the call, SAS searches the data sets that are specified in the INLIB=
option. The format for the INLIB= option is as follows:
inlib=library.dataset
inlib=(library1.dataset1 library2.dataset2 ... libraryN.datasetN)
inlib=library.datasetM - library.datasetN
If the routines that are being declared do not call FCMP routines in other packages,
then you do not need to specify the INLIB= option.
Declaring Functions
You declare one or more functions or CALL routines in the routine-declarations
section of the program. A routine consists of four parts:
n a name
n an argument list
n a body of code
n a RETURN statement
You specify these four parts between the FUNCTION or SUBROUTINE keyword
and an ENDSUB keyword. For functions, the syntax has the following form:
function name(argument-1 <, argument-2, ...>);
program-statements;
return(expression);
endsub;
After the FUNCTION keyword, you specify the name of the function and its
arguments. Arguments in the function declaration are called formal arguments and
can be used within the body of the function. To specify a string argument, place a
dollar sign ($) after the argument name. For functions, all arguments are passed by
value. This means that the value of the actual argument (a variable or a constant)
that is passed to the function from the calling environment is copied before the
function uses it. This copying ensures that any modification of the formal argument
by the function does not change the original value.
The formal arguments that are listed in the OUTARGS statement are passed by
reference instead of by value. This means that any modification of the formal
argument by the CALL routine modifies the original variable that was passed. It
also means that the value is not copied when the CALL routine is invoked. Reducing
the number of copies can improve performance when you pass large amounts of
data between a CALL routine and the calling environment.
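The following minimal sketch shows pass-by-reference. The routine name SWAP_VALS and the location work.funcs.demo are hypothetical. Both arguments are listed in OUTARGS, so the changes the routine makes are visible in the calling DATA step, which writes x=2 y=1:

```sas
proc fcmp outlib=work.funcs.demo;
   /* A and B are listed in OUTARGS, so they are passed by reference */
   subroutine swap_vals(a, b);
      outargs a, b;
      tmp = a;      /* TMP is local to the routine */
      a = b;
      b = tmp;
   endsub;
run;

options cmplib=work.funcs;
data _null_;
   x = 1;
   y = 2;
   call swap_vals(x, y);
   put x= y=;
run;
```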
A RETURN statement is optional within the definition of the CALL routine. When a
RETURN statement executes, execution is immediately returned to the caller. A
RETURN statement within a CALL routine does not return a value.
The behaviors of the DROP, KEEP, FORMAT, and LENGTH statements are the same
in PROC FCMP and in the DATA step.
The following DATA step statements are not supported in PROC FCMP:
n DATA
n SET
n MERGE
n UPDATE
n MODIFY
n INPUT
n INFILE
The support for the FILE statement is limited to LOG and PRINT destinations in
PROC FCMP. The OUTPUT statement is supported in PROC FCMP, but it is not
supported within a function or subroutine.
The following statements are supported in PROC FCMP but not in the DATA step:
n FUNCTION
n STRUCT
n SUBROUTINE
n OUTARGS
PROC FCMP: Create, test, and store SAS functions for use by other SAS procedures
(Ex. 1, Ex. 2, Ex. 7, Ex. 8)
Examples: “Example 1: Creating a Function and Calling the Function from a DATA Step” on
page 925
“Example 2: Creating and Saving Functions with PROC FCMP” on page 927
“Example 7: Using Graph Template Language (GTL) with User-Defined Functions”
on page 932
“Example 8: Standardizing Each Row of a Data Set” on page 934
Syntax
PROC FCMP options;
Optional Arguments
DATA=filename
reads an input data set into the PROC FCMP step.
Note: The DATA= option can send inputs to a function or subroutine that is
defined in the PROC FCMP step. The PROC FCMP step iterates through each
observation and calls the function or subroutine during each iteration.
Note: For more information about the DATA= and OUT= options, see “Data Set
Input and Output” on page 899.
Note: This example demonstrates using the DATA= and OUT= options in the
PROC FCMP statement.
option CMPLIB=sasuser.funcs;
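The rest of that example is not reproduced here. The following is only a sketch. It assumes that a function defined in the same step can be called from the main program, as the first note describes. The step reads the hypothetical data set WORK.NUMS, calls a hypothetical function HALF once per observation, and writes the observations to WORK.HALVES:

```sas
/* Hypothetical input data set */
data work.nums;
   input x @@;
   datalines;
2 4 10
;
run;

/* The main program statement runs once per observation of WORK.NUMS */
proc fcmp data=work.nums out=work.halves;
   function half(v);
      return(v / 2);
   endsub;
   y = half(x);
run;
```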
ENCRYPT
specifies to encode the source code in a data set. The alias HIDE is also valid.
FLOW
specifies printing a message for each statement in a program as it is executed.
This option produces extensive output.
LIBRARY | INLIB=library.dataset
LIBRARY | INLIB=(library-1.dataset library-2.dataset ... library-n.dataset)
LIBRARY | INLIB=library.datasetM - library.datasetN
specifies that previously compiled libraries are to be linked into the program.
These libraries are created by a previous PROC FCMP step or by using PROC
PROTO (for external C routines).
Tips Libraries are created by the OUTLIB= option and are stored as members of
a SAS library that have the type CMPSUB. Only subroutines and functions
are read into the program when you use the LIBRARY= option.
If the routines that are being declared do not call PROC FCMP routines in
other packages, then you do not need to specify the INLIB= option. Use
the libref.dataset format to specify the two-level name of a library. The
libref and dataset names must be valid SAS names that are not longer
than eight characters.
You can specify a list of files with the LIBRARY= option, and you can
specify a range of names by using numeric suffixes. When you specify
more than one file, you must enclose the list in parentheses, except in the
case of a single range of names. The following are syntax examples:
proc fcmp library=sasuser.exsubs;
proc fcmp library=(sasuser.exsubs work.examples);
proc fcmp library=lib1-lib10;
LIST
specifies that both the LISTSOURCE and LISTPROG options are in effect.
Tip Printing both the source code and the compiled code and then comparing
the two listings of assignment statements is one way of verifying that the
assignments were compiled correctly.
LISTALL
specifies that the LISTCODE, LISTPROG, and LISTSOURCE options are in
effect.
LISTCODE
specifies that the compiled program code be printed. LISTCODE lists the chain
of operations that are generated by the compiler.
LISTFUNCS
specifies that prototypes for all visible FCMP functions or subroutines be
written to the SAS listing.
LISTPROG
specifies that the compiled program be printed. The listing for assignment
statements is generated from the chain of operations that are generated by the
compiler. The source statement text is printed for other statements.
Tip The expressions that are printed by the LISTPROG option do not
necessarily represent how the expression is actually calculated, because
intermediate results for common subexpressions can be reused. However,
the expressions are printed in expanded form by the LISTPROG option. To
see how the expression is actually evaluated, see the listing from the
LISTCODE option.
LISTSOURCE
specifies that source code statements for the program be printed.
OUT=filename
creates an output data set.
OUTFILE=filename
writes referenced functions and the main program to a text file. Programs that
have been parsed by PROC FCMP, including macro variables, can be exported.
OUTITEMSTORE=path name
exports symbols, referenced functions, and the main program to the specified
item store. OUTITEMSTORE does not support a fileref. You must use a quoted
path.
OUTLIB=libname.dataset.package
specifies the three-level name of an output data set to which the compiled
subroutines and functions are written when the PROC FCMP step ends. This
argument is required. Here is a syntax example:
proc fcmp outlib=sasuser.fcmpsubs.pkt1;
Tips Use this option when you want to save subroutines and functions in an
output library.
Only those subroutines that are declared inside the current PROC FCMP
step are saved to the output file. Those subroutines that are loaded by
using the LIBRARY= option are not saved to the output file. If you do not
specify the OUTLIB= option, then no subroutines that are declared in the
current PROC FCMP step are saved.
PRINT
specifies printing the result of each statement in a program as it is executed.
This option produces extensive output.
TRACE
specifies printing the results of each operation in each statement in a program
as it is executed. These results are produced in addition to the information that
is printed by the FLOW option. The TRACE option produces extensive output.
The TRACE option works when the data is contained within PROC FCMP,
not when calling FCMP functions from the DATA step. This example and
output demonstrate the functionality of the TRACE option.
ABORT Statement
Terminates the current DATA step, job, or SAS session.
Syntax
ABORT;
Without Arguments
The ABORT statement in PROC FCMP has no arguments.
ARRAY Statement
Associates a name with a list of variables and constants.
Syntax
ARRAY array-name[dimensions] </NOSYMBOLS> | <variable(s)> | <constant(s)> |
<initial-values>;
Required Arguments
array-name
specifies the name of the array.
dimensions
is a numeric representation of the number of elements in a one-dimensional
array or the number of elements in each dimension of a multidimensional array.
Optional Arguments
/NOSYMBOLS
specifies that an array of numeric or character values be created without the
associated element variables. In this case, the only way that you can access
elements in the array is by array subscripting.
You can save memory if you do not need to access the individual array
element variables by name.
variable
specifies the variables of the array.
constant
specifies a number or a character string that indicates a fixed value. Enclose
character constants in quotation marks.
initial-values
gives initial values for the corresponding elements in the array. You can specify
initial values inside parentheses.
Details
The ARRAY statement that is used in PROC FCMP does not support all the
features of the ARRAY statement in the DATA step. Here is a list of differences that
apply only to PROC FCMP:
n All array references must have explicit subscript expressions.
n PROC FCMP uses parentheses after a name to represent a function call. When
you reference an array, use square brackets [ ] or braces { }.
n The ARRAY statement in PROC FCMP does not support lower-bound
specifications.
n You can use a maximum of six dimensions for an array.
You can use both variables and constants as array elements in the ARRAY
statement that is used in PROC FCMP. You cannot assign elements to a constant
array. Although dimension specification and the list of elements are optional, you
must provide one of these values. If you do not specify a list of elements for the
array, or if you list fewer elements than the size of the array, PROC FCMP creates
array variables by adding a numeric suffix to the elements of the array to complete
the element list.
Example
Here are some examples of the ARRAY statement:
array spot_rate[3] 1 2 3;
array spot_rate[3] (1 2 3);
array y[4] y1-y4;
array xx[2,3] x11 x12 x13 x21 x22 x23;
array pp p1-p12;
array q[1000] /nosymbols;
ATTRIB Statement
Specifies format, label, and length information for variables.
Syntax
ATTRIB variable(s) <FORMAT=format-name LABEL='label' LENGTH=length>;
Required Argument
variable
specifies the variables that you want to associate with attributes.
Optional Arguments
FORMAT=format-name
associates a format with variables in the variable argument.
LABEL='label'
associates a label with variables in the variable argument.
LENGTH=length
specifies the length of the variable in the variable argument.
Example
Here are some examples of the ATTRIB statement:
attrib x1 format=date7. label='variable x1' length=5;
attrib x1 format=date7. label='variable x1' length=5
x2 length=5
x3 label='var x3' format=4.
x4 length=$2 format=$4.;
DELETEFUNC Statement
Causes a function to be deleted from the function library that is specified in the OUTLIB option.
Syntax
DELETEFUNC function-name;
Required Argument
function-name
specifies the name of a function to be deleted from the function library that is
specified in the OUTLIB option.
DELETESUBR Statement
Causes a subroutine to be deleted from the function library that is specified in the OUTLIB option.
Syntax
DELETESUBR subroutine-name;
Required Argument
subroutine-name
specifies the name of a subroutine to be deleted from the function library that is
specified in the OUTLIB option.
FUNCTION Statement
Specifies a subroutine declaration for a routine that returns a value.
Examples: “Example 1: Creating a Function and Calling the Function from a DATA Step” on
page 925
“Example 2: Creating and Saving Functions with PROC FCMP” on page 927
“Example 3: Using Numeric Data in the FUNCTION Statement” on page 929
“Example 4: Using Character Data with the FUNCTION Statement” on page 929
“Example 5: Using Variable Arguments with an Array” on page 930
“Example 7: Using Graph Template Language (GTL) with User-Defined Functions”
on page 932
Syntax
FUNCTION function-name(argument-1 <, argument-2, ...>) <VARARGS> <$> <length>
<KIND='string' | GROUP='string'> <LABEL='string-2'>;
... more-program-statements ...
RETURN (expression);
ENDSUB;
Required Arguments
function-name
specifies the name of the function.
argument
specifies one or more arguments for the function. You specify character
arguments by placing a dollar sign ($) after the argument name. For example, in the
declaration function myfunct(arg1, arg2 $, arg3, arg4 $); arg1 and arg3 are
numeric arguments, and arg2 and arg4 are character arguments.
expression
specifies the value that is returned from the function.
Optional Arguments
VARARGS
specifies that the function supports a variable number of arguments. If you
specify VARARGS, then the last argument in the function must be an array.
Restriction You must specify a numeric variable with the VARARGS argument.
$
specifies that the function returns a character value. If $ is not specified, the
function returns a numeric value.
length
specifies the length of a character value.
Default 8
KIND='string'
GROUP='string'
specifies a collection of items that have specific attributes and is limited to 32
characters.
LABEL='string-2'
specifies a label of up to 256 characters, including blanks.
Details
The FUNCTION statement is a special case of the subroutine declaration that
returns a value. You do not use a CALL statement to call a function. The definition
of a function begins with the FUNCTION statement and ends with an ENDSUB
statement.
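Here is a short sketch of a VARARGS function. The name SUMMATION and the output location work.funcs.demo are illustrative. The numeric array B receives all of the arguments, so the same function accepts any number of values. The call below writes s=15:

```sas
proc fcmp outlib=work.funcs.demo;
   /* With VARARGS, the arguments arrive in the numeric array B */
   function summation(b[*]) varargs;
      total = 0;
      do i = 1 to dim(b);
         total = total + b[i];
      end;
      return(total);
   endsub;

   s = summation(1, 2, 3, 4, 5);
   put s=;
run;
```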
LABEL Statement
Specifies a label of up to 256 characters.
Syntax
LABEL variable='label';
Required Arguments
variable
names the variable that you want to label.
'label'
specifies a label of up to 256 characters, including blanks.
Example
Here are some examples of the LABEL statement:
label date='Maturity Date';
label bignum='Very very large numeric value';
LISTFUNC Statement
Causes the source code for a function to be written to the SAS listing.
Syntax
LISTFUNC function-name </NODEPENDENTS>;
Required Argument
function-name
specifies the name of the function for which source code is written to the SAS
listing.
Note: All dependent functions are returned in addition to the specified function
by default.
Optional Argument
/NODEPENDENTS
specifies that only the functions specified by function-name be returned and not
any dependent functions.
Alias /NODEPS
LISTSUBR Statement
Causes the source code for a subroutine to be written to the SAS listing.
Syntax
LISTSUBR subroutine-name;
Required Argument
subroutine-name
specifies the name of the subroutine for which source code is written to the SAS
listing.
OUTARGS Statement
Specifies arguments in an argument list that you want a subroutine to update.
Syntax
OUTARGS out-argument-1 <, out-argument-2, ...>;
Required Argument
out-argument
specifies arguments from the argument list that you want the subroutine to
update.
STATIC Statement
Retains a variable’s value from a previous call until the variable is reassigned.
Syntax
STATIC variables, <initial-value(s)>;
Required Argument
variables
specifies variable names, variable lists, or array names whose values you want
to retain.
Optional Argument
initial-values
specifies an initial value, numeric or character, for one or more of the preceding
elements.
Details
The STATIC statement can be used to initialize variables.
Local variables in a function or subroutine are usually not retained between calls to
the function or subroutine. If an expensive initialization is required, a STATIC
variable can be used so that the initialization is not repeated on every call. STATIC variables are not
allocated on the stack, so they can be used for large local arrays to avoid
reallocations and overflows on the stack.
Examples
Example 1
Here is a numeric static example:
proc fcmp;
function fdef1(in);
static x1 1;
if x1 = 1 then do;
x1 = 2;
return(in);
end;
return (in*2);
endsub;
run;
Example 2
Here is a character static example:
proc fcmp;
function char_func( in $) $;
length c1 $ 32;
static c1 "Elephant";
if c1 = "Elephant" then
do;
c1 = in || c1;
return (c1);
end;
return( in);
endsub;
length ans $ 32;
ans = char_func( "Big ");
put "Answer should be >>Big Elephant<<" ans=;
ans = char_func( "Big ");
put "Answer should be >>Big<<" ans=;
run;
quit;
Example 3
Here is an array static example:
proc fcmp ;
function array_func( in ) ;
array a[5] ;
array foo[5];
static a first 1;
put a[1]= foo[1]=;
If first then do;
do i=1 to dim(a);
a[i]=i;
foo[i]=i;
end;
first =0;
end;
else do;
do i=1 to dim(a);
a[i]=a[i]+1;
end;
end;
put a[5];
return( in);
endsub;
/* should increase by 1 */
ans = array_func( 4);
run;
STRUCT Statement
Declares (creates) structure types that are defined in C-Language packages.
Syntax
STRUCT structure-name variable;
Required Arguments
structure-name
specifies the name of a structure that is defined in a C-language package and
declared in PROC FCMP.
variable
specifies the variable that you want to declare as this structure type.
Example
Here is an example of the STRUCT statement.
struct DATESTR matdate;
matdate.month=3;
matdate.day=22;
matdate.year=2009;
SUBROUTINE Statement
Declares (creates) an independent computational block of code that you can call using a CALL
statement.
Syntax
SUBROUTINE subroutine-name (argument-1 <, argument-2, ...>) <VARARGS>
<KIND | GROUP='string'>;
OUTARGS out-argument-1 <, out-argument-2, ...>;
... more-program-statements ...
ENDSUB;
Required Arguments
subroutine-name
specifies the name of a subroutine.
argument
specifies one or more arguments for the subroutine. Character arguments are
specified by placing a dollar sign ($) after the argument name. For example, in the
declaration subroutine mysub(arg1, arg2 $, arg3, arg4 $); arg1 and arg3 are
numeric arguments, and arg2 and arg4 are character arguments.
OUTARGS
specifies arguments from the argument list that the subroutine should update.
out-argument
specifies arguments from the argument list that you want the subroutine to
update.
Optional Arguments
VARARGS
specifies that the subroutine supports a variable number of arguments. If you
specify VARARGS, then the last argument in the subroutine must be an array.
GROUP='string'
KIND='string'
specifies a collection of items that have specific attributes and is limited to 32
characters.
Details
The SUBROUTINE statement enables you to declare (create) an independent
computational block of code that you can call with a CALL statement. The
definition of a subroutine begins with the SUBROUTINE statement and ends with
an ENDSUB statement. You can use the OUTARGS statement within a subroutine
definition to specify arguments from the argument list that the subroutine should
update.
ABORT Statement
The ABORT statement in PROC FCMP does not accept arguments.
The ABORT statement is not valid within functions or subroutines in PROC FCMP.
It is valid only in the main body of the procedure.
Arrays
PROC FCMP uses parentheses after a name to represent a function call. When
referencing an array, the recommended practice is to use square brackets [ ] or
braces { }. For an array named ARR, the code would be ARR[i] or ARR{i}. PROC
FCMP limits the number of dimensions for an array to six.
For more information about the differences in the ARRAY statement for PROC
FCMP, see “Details” on page 887.
DO Statement
The following type of DO statement is supported by PROC FCMP:
do i=1, 2, 3;
The DO statement in PROC FCMP does not support character loop control
variables or character index variables. The following statements are valid in the
DATA step but not in PROC FCMP:
do i='a', 'b', 'c';
do i='one', 'two', 'three';
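For contrast, a numeric value list in the supported form runs as expected in PROC FCMP. This minimal sketch adds the three values and writes total=6:

```sas
proc fcmp;
   total = 0;
   /* A numeric value list is accepted as the DO loop specification */
   do i = 1, 2, 3;
      total = total + i;
   end;
   put total=;
run;
```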
IF Expressions
An IF expression enables IF-THEN/ELSE conditions to be evaluated within an
expression. IF expressions are supported by PROC FCMP but not by the DATA step.
You can simplify some expressions with IF expressions by not having to split the
expression among IF-THEN/ELSE statements. For example, the following two
pieces of code are equivalent, but the IF expression (the first example) is
simpler:
x=if y < 100 then 1 else 0;
if y < 100 then
x=1;
else
x=0;
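This minimal runnable version of the IF expression above uses arbitrary values of y. It shows that the whole expression yields a value that can be assigned directly. The first PUT writes x=1 and the second writes x=0:

```sas
proc fcmp;
   y = 42;
   /* The IF expression evaluates to 1 or 0 and is assigned to X */
   x = if y < 100 then 1 else 0;
   put x=;

   y = 150;
   x = if y < 100 then 1 else 0;
   put x=;
run;
```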
PUT Statement
The syntax of the PUT statement is similar in PROC FCMP and in the DATA step,
but their operations can be different. In PROC FCMP, the PUT statement is typically
used for program debugging. In the DATA step, the PUT statement is used as a
report or file creation tool, as well as a debugging tool. The following list describes
other differences:
n The PUT statement in PROC FCMP writes output to the SAS Output Window by
default. The PUT statement in the DATA step writes output to the SAS log by
default.
n The PUT statement in PROC FCMP does not support line pointers, format
modifiers, column output, factored lists, iteration factors, overprinting, the
_INFILE_ option, or the special character $. It does not support features that are
provided by the FILE statement options, such as DLM= and DSD.
n The PUT statement in PROC FCMP supports evaluating an expression and
writing the result by placing the expression in parentheses. The DATA step,
however, does not support the evaluation of expressions in a PUT statement. In
the following example for PROC FCMP, the expressions x/100 and sqrt(y)/2
are evaluated and the results are written to the SAS log:
put (x/100) (sqrt(y)/2);
Because parentheses are used for expression evaluation in PROC FCMP, they
cannot be used for variable or format lists as in the DATA step.
n The PUT statement in PROC FCMP does not support subscripted array names
unless they are enclosed in parentheses. For example, the statement put
(A[i]); writes the i-th element of the array A, but the statement put A[i];
results in an error message.
n An array name can be used in a PUT statement without subscripts. Therefore,
the following statements are valid:
o put A=; (when A is an array), writes all of the elements of array A with each
value labeled with the name of the element variable.
o put (A)*=; writes the same output as put A=;.
o put A; writes all of the elements of array A.
n The PUT statement in PROC FCMP follows the output of each item with a
space, which is similar to list mode output in the DATA step. Detailed control
over column and line position is supported to a lesser extent than in the DATA
step.
n The PUT statement in PROC FCMP supports the print item _PDV_, and prints a
formatted listing of all of the variables in the routine's program data vector. The
statement put _PDV_; prints a much more readable listing of the variables than
is printed by the statement put _ALL_;.
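This short sketch exercises several of these PUT features in one PROC FCMP step. The variable names and values are arbitrary. The first PUT evaluates two expressions and the second writes one array element. The last two write the whole array and the program data vector:

```sas
proc fcmp;
   array a[3] (10 20 30);     /* element variables a1-a3 with initial values */
   x = 250;
   y = 16;
   i = 2;
   put (x/100) (sqrt(y)/2);   /* each parenthesized expression is evaluated */
   put (a[i]);                /* a subscripted element must be in parentheses */
   put a=;                    /* every element, labeled with its variable name */
   put _pdv_;                 /* formatted listing of the program data vector */
run;
```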
You can also view these functions by using the following SAS code:
proc fcmp inlib=sashelp.slkwxl listall;
run;
Passing Arrays
By default, PROC FCMP passes arrays "by value" between routines. However, if an
array is listed in the OUTARGS statement within the routine, the array is passed "by
reference."
This means that a modification to the formal parameter by the function modifies
the array that is passed. Passing arrays by reference helps to efficiently pass large
amounts of data between the function and the calling environment because the
data does not need to be copied. The syntax for specifying a formal array has the
following form:
function name(numeric-array-parameter[*], character-array-parameter[*] $);
You can pass DATA step temporary arrays to PROC FCMP routines.
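The following minimal sketch passes an array by reference. The names SCALE_ARRAY, V, and work.funcs.demo are hypothetical. Because A is listed in OUTARGS, the routine updates the DATA step array in place, and the DATA step writes v1=10 v2=20 v3=30. The sketch uses an ordinary array so that the elements can be written by name.

```sas
proc fcmp outlib=work.funcs.demo;
   /* A[*] is listed in OUTARGS, so the caller's array is modified */
   subroutine scale_array(a[*], factor);
      outargs a;
      do i = 1 to dim(a);
         a[i] = a[i] * factor;
      end;
   endsub;
run;

options cmplib=work.funcs;
data _null_;
   array v[3] v1-v3 (1 2 3);
   call scale_array(v, 10);
   put v1= v2= v3=;
run;
```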
Resizing Arrays
You can resize arrays in PROC FCMP routines by calling the built-in CALL routine
DYNAMIC_ARRAY. The syntax for this CALL routine has the following form:
call dynamic_array(array, new-dim1-size <, new-dim2-size, ...>);
SAS passes to the DYNAMIC_ARRAY CALL routine both the array that is to be
resized and a new size for each dimension of the array. A dynamic array enables the
routine to allocate the amount of memory that is needed, instead of having to
create an array that is large enough to handle all possible cases.
Support for dynamic arrays is limited to PROC FCMP routines. When an array is
resized, the array is available only in the routine that resized it. It is not possible to
resize a DATA step array or to return a PROC FCMP dynamic array to a DATA step.
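In the following minimal sketch, a function declares a one-element array and resizes it to the size it needs at run time. The function name SUM_OF_SQUARES and the package work.funcs.demo are hypothetical. Consistent with the restriction above, the resized array exists only inside the function, and only the computed sum (30 for n=4) is returned to the DATA step:

```sas
proc fcmp outlib=work.funcs.demo;
   /* Sum of 1**2 + ... + n**2, using a work array sized at run time */
   function sum_of_squares(n);
      array buf[1] / nosymbols;      /* declared with one element */
      call dynamic_array(buf, n);    /* resized to n elements */
      total = 0;
      do i = 1 to n;
         buf[i] = i * i;
         total = total + buf[i];
      end;
      return(total);
   endsub;
run;

options cmplib=work.funcs;
data _null_;
   s = sum_of_squares(4);   /* 1 + 4 + 9 + 16 */
   put s=;
run;
```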
Functions use local variables as scratch variables during computations, and the
variables are not available when the function returns. When a function is called,
space for local variables is pushed on the call stack. When the function returns, the
space used by local variables is removed from the call stack.
subroutine subB();
x='subB';
put 'In subB: ' x=;
endsub;
run;
options cmplib=sasuser.funcs;
data _null_;
x=99;
call subA();
put 'In DATA step: ' x=;
run;
Example Code 24.2 Local Variables in Different Routines That Have the Same Name
In subB: x=subB
In subA: x=5
In DATA step: x=99
Recursion
PROC FCMP routines can be recursive. Recursion is a problem-solving technique
that reduces a problem to a smaller one that is simpler to solve and then combines
the results of the simpler solution to form a complete solution. A recursive function
is a function that calls itself, either directly or indirectly.
Each time a routine is called, space for the local variables is pushed on the call
stack. The space on the call stack ensures independence of local variables for each
call. When the routine returns, the space allocated on the call stack is removed,
freeing the space used by local variables. Recursion relies on the call stack to store
progress toward a complete solution.
When a routine calls itself, both the calling routine and the routine that is being
called must have their own set of local variables for intermediate results. If the
calling routine were able to modify the local variables of the routine that is being
called, it would be difficult to program a recursive solution. Because each call gets
its own space on the call stack, one call cannot overwrite the local variables of
another.
In the following example, the ALLPERMK routine in PROC FCMP has two
arguments, n and k, and writes all $C(n, k) = \frac{n!}{k!\,(n-k)!}$ combinations
that contain exactly k out of the n elements. The elements are represented as
binary values (0, 1). The ALLPERMK routine calls the recursive routine PERMK to
traverse the entire solution space and output only the items that match a particular
filter:
proc fcmp outlib=sasuser.funcs.math;
subroutine allpermk(n, k);
array scratch[1] / nosymbols;
call dynamic_array(scratch, n);
call permk(n, k, scratch, 1,0);
endsub;
options cmplib=sasuser.funcs;
data _null_;
call allpermk(5,3);
run;
Example Code 24.3 Recursion Example Results
1 1 1 0 0
1 1 0 1 0
1 1 0 0 1
1 0 1 1 0
1 0 1 0 1
1 0 0 1 1
0 1 1 1 0
0 1 1 0 1
0 1 0 1 1
0 0 1 1 1
This program uses the /NOSYMBOLS option in the ARRAY statement to create an
array without a variable for each array element. A /NOSYMBOLS array can be
accessed only with an array reference, scratch[m], and is equivalent to a DATA
step _temporary_ array. A /NOSYMBOLS array uses less memory than a regular
array because no space is allocated for variables. ALLPERMK also uses PROC
FCMP dynamic arrays.
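The PERMK routine itself is not shown in this excerpt, so the following Python sketch illustrates the same kind of recursion rather than reproducing the SAS code. It assumes that PERMK fills the scratch vector one position at a time, tries 1 before 0, and writes a vector only when it holds exactly k ones; under that assumption it produces the ten results above in the same order. Each recursive call receives its own `m` and `ones`, which is the independence of local variables that the call stack provides. The names `allpermk` and `permk` only mirror the SAS routines.

```python
def permk(n, k, scratch, m, ones, out):
    """Fill scratch from 1-based position m onward; keep vectors with k ones.

    m and ones are local to each call, so the recursion can back up to a
    shorter prefix without any call disturbing another call's state.
    """
    if m > n:                              # every position has a value
        if ones == k:                      # the filter: exactly k ones
            out.append(" ".join(str(v) for v in scratch))
        return
    scratch[m - 1] = 1                     # try 1 first (matches the SAS output order)
    permk(n, k, scratch, m + 1, ones + 1, out)
    scratch[m - 1] = 0                     # then try 0
    permk(n, k, scratch, m + 1, ones, out)


def allpermk(n, k):
    scratch = [0] * n                      # sized at run time, like DYNAMIC_ARRAY
    out = []
    permk(n, k, scratch, 1, 0, out)
    return out


for line in allpermk(5, 3):
    print(line)
```

The sketch traverses all $2^n$ vectors and filters them, as the text describes; pruning branches that already contain more than k ones would be a natural optimization but would change that description.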
Directory Traversal
This example shows the similarity between PROC FCMP and DATA step syntax and
underscores how PROC FCMP routines simplify a program and produce independent,
reusable code. DIR_ENTRIES uses the following parameters:
- a starting directory
- an output parameter that is the number of pathnames placed in the result array
- an output parameter that indicates whether the complete result set was
truncated because the result array was not large enough
2 If the FILENAME function fails, write an error message to the log and then
return.
3 Otherwise, use the DOPEN function to open the directory and retrieve a
directory ID.
The DIRCLOSE CALL routine is passed a directory ID, which is passed to DCLOSE.
DIRCLOSE sets the passed directory ID to missing so that an error occurs if a
program tries to use the directory ID after the directory has been closed. The
following code implements the DIROPEN and DIRCLOSE CALL routines:
proc fcmp outlib=sasuser.funcs.dir;
function diropen(dir $);
length dir $ 256 fref $ 8;
rc=filename(fref, dir);
if rc=0 then do;
did=dopen(fref);
rc=filename(fref);
end;
else do;
msg=sysmsg();
put msg '(DIROPEN(' dir= ')';
did=.;
end;
return(did);
endsub;
subroutine dirclose(did);
outargs did;
rc=dclose(did);
did=.;
endsub;
Gathering Filenames
File paths are collected by the DIR_ENTRIES CALL routine. DIR_ENTRIES uses the
following arguments:
- a starting directory
- an output parameter to fill with the number of entries in the result array
The body of DIR_ENTRIES is almost identical to the code that is used to implement
this functionality in a DATA step. Also, DIR_ENTRIES is a CALL routine that is
easily reused in several programs.
DIR_ENTRIES calls DIROPEN to open a directory and retrieve a directory ID. The
routine then calls DNUM to retrieve the number of entries in the directory. For each
entry in the directory, DREAD is called to retrieve the name of the entry. Now that
the entry name is available, the routine calls MOPEN to determine whether the
entry is a file or a directory.
If the entry is a file, then MOPEN returns a positive value. In this case, the full path
to the file is added to the result array. If the result array is full, the truncation
output argument is set to 1.
If the entry is a directory, then MOPEN returns a value that is less than or equal to
0. In this case, DIR_ENTRIES gathers the pathnames for the entries in this
subdirectory. It gathers the pathnames by recursively calling DIR_ENTRIES and
passing the subdirectory's path as the starting path. When DIR_ENTRIES returns,
the result array contains the paths of the subdirectory's entries.
subroutine dir_entries(dir $, files[*] $, n, trunc);
outargs files, n, trunc;
length dir entry $ 256;
did=diropen(dir);
if did <= 0 then return;
dnum=dnum(did);
do i=1 to dnum;
entry=dread(did, i);
/* If this entry is a file, then add to array, */
/* else entry is a directory, recurse. */
fid=mopen(did, entry);
entry=trim(dir) || '\' || entry;
if fid > 0 then do;
rc=fclose(fid);
if n < dim(files) then do;
trunc=0;
n=n + 1;
files[n]=entry;
end;
else do;
trunc=1;
call dirclose(did);
return;
end;
end;
else
call dir_entries(entry, files, n, trunc);
end;
call dirclose(did);
return;
endsub;
Example Code 24.4 Results from Calling DIR_ENTRIES from a DATA Step
c:\logs\2004\qtr1.log
c:\logs\2004\qtr2.log
c:\logs\2004\qtr3.log
c:\logs\2004\qtr4.log
c:\logs\2005\qtr1.log
c:\logs\2005\qtr2.log
c:\logs\2005\qtr3.log
c:\logs\2005\qtr4.log
c:\logs\2006\qtr1.log
c:\logs\2006\qtr2.log
This example shows the similarity between PROC FCMP syntax and the DATA step.
For example, numeric expressions and flow of control statements are identical. The
abstraction of DIROPEN into a PROC FCMP function simplifies DIR_ENTRIES. All
of the PROC FCMP routines that are created can be reused by other DATA steps
without any need to modify the routines to work in a new context.
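For comparison, here is a Python sketch of the same depth-first traversal. The result list and the truncation flag are shared by every recursive call, much as the OUTARGS parameters FILES, N, and TRUNC are shared in DIR_ENTRIES. The function name, the `capacity` parameter, and the use of `os.listdir` and `os.path.isdir` in place of DOPEN, DREAD, and MOPEN are assumptions of this sketch; entries are sorted only to make the order reproducible.

```python
import os
import tempfile


def dir_entries(start, capacity):
    """Collect file paths below `start`, depth first, into at most `capacity` slots.

    Returns (files, truncated). As with the OUTARGS parameters of the SAS
    routine, one result list and one truncation flag are shared by every
    recursive call.
    """
    files = []
    truncated = False

    def walk(directory):
        nonlocal truncated
        try:
            names = sorted(os.listdir(directory))  # sorted for a reproducible order
        except OSError:
            return                                 # cf. DIROPEN failing
        for name in names:
            path = os.path.join(directory, name)
            if os.path.isdir(path):
                walk(path)                         # a directory: recurse
            elif len(files) < capacity:
                files.append(path)                 # a file: add it
            else:
                truncated = True                   # the result array is full
                return

    walk(start)
    return files, truncated


# Usage: a small tree shaped like c:\logs\2004\qtr1.log ... in a temporary directory
with tempfile.TemporaryDirectory() as root:
    for year in ("2004", "2005"):
        os.mkdir(os.path.join(root, year))
        for q in range(1, 5):
            open(os.path.join(root, year, f"qtr{q}.log"), "w").close()
    all_paths, all_trunc = dir_entries(root, capacity=10)
    few_paths, few_trunc = dir_entries(root, capacity=5)

print(len(all_paths), all_trunc, len(few_paths), few_trunc)  # 8 False 5 True
```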
The _DISPLAYLOC_ option writes to the log the name of the data set from which
SAS loaded a function. The _NO_DISPLAYLOC_ option prevents the data set name
from being written to the log.
OPTIONS CMPLIB=library
OPTIONS CMPLIB=(library-1 <, library-2, ...>)
OPTIONS CMPLIB=list-1 <, list-2, ...>
OPTIONS CMPLIB=_DISPLAYLOC_
OPTIONS CMPLIB=_NO_DISPLAYLOC_
OPTIONS
identifies the statement as an OPTIONS statement.
library
specifies that the previously compiled libraries be linked into the program.
list
specifies a list of libraries.
_DISPLAYLOC_
when using PROC FCMP, specifies to display in the SAS log the data set from
which SAS loaded the function.
Default _NO_DISPLAYLOC_
_NO_DISPLAYLOC_
when using PROC FCMP, specifies not to display in the SAS log the data set
from which SAS loaded the function, and removes any library specifications as
CMPLIB= option values.
Default _NO_DISPLAYLOC_
The output from this example spans several pages. The output is divided into five
parts.
proc fcmp outlib=sasuser.models.yval;
function simple(a, b, x);
y=a+b*x;
return(y);
endsub;
run;
data a;
input y @@;
x=_n_;
datalines;
08 06 08 10 08 10
;
For information about PROC MODEL, see the SAS/ETS User's Guide.
return(3);
endsub;
run;
proc fcmp;
a = myfunc();
put a=;
run;
proc fcmp;
a = myfunc();
put a=;
run;
proc fcmp;
a = myfunc();
put a=;
run;
option CMPLIB=_DISPLAYLOC_;
option CMPLIB=_NO_DISPLAYLOC_;
The following results show a partial SAS log. The _DISPLAYLOC_ and
_NO_DISPLAYLOC_ options produce different results:
123
124 /*- turning _DISPLAYLOC_ off -*/
125 option CMPLIB=(myfuncs1-myfuncs3);
126
127
128 proc fcmp;
129
130 a = myfunc();
131 put a=;
132 run;
133
134 option CMPLIB=(myfuncs1 myfuncs2 _DISPLAYLOC_);
135
136 proc fcmp;
137
138 a = myfunc();
139 put a=;
140 run;
141
142 option CMPLIB=_DISPLAYLOC_;
143
144 proc fcmp inlib=work.myfuncs1;
145 a = myfunc();
146 put a=;
147 run;
For more information about hash and hash iterator component objects, see “Using
the PROC FCMP Hash Object and PROC FCMP Hash Iterator Object” in SAS
Component Objects: Reference.
Dictionaries
Dictionaries create references to numeric and character data, and they also give
you fast in-memory hashing to arrays, other dictionaries, and PROC FCMP hash
objects.
For information about how to use dictionaries in PROC FCMP, and to review
examples, see “Using FCMP Dictionary Objects” in SAS Component Objects:
Reference and Dictionaries: Referencing a New PROC FCMP Data Type.
What is a State?
State is a copy of all the memory items that are relevant to the next task, for
example, scoring. SAVESTATE takes a snapshot of this information:
- Public information (that is, information common to all analytic engines).
Examples of public information include the list of input variables, the list of
output variables, the formats, and other elements.
- Private information (that is, information specific to that particular analysis, for
example, random forests). Examples of private information include the number
of trees, the trees themselves, the scores, and other types.
What is Scoring?
Historical data is data for which the outcomes are known. You train your model
with this historical data, and then you score new data by using the input variables.
Scoring means predicting the outcome for new data by using the model that was
built from the historical data.
PROC ASTORE can also move analytic stores between the client and the server
and it can provide descriptive information about the analytic store. The syntax is
shown below:
SCORE
Scores the model.
DESCRIBE
Specifies the name of the analytic store and produces DS2 basic scoring code.
DOWNLOAD
Retrieves from the CAS session the specified analytic store and stores it in the
local file system.
UPLOAD
Moves the specified analytic store from the local file system into a data table in
CAS.
Note: See The ASTORE Procedure for more information about PROC ASTORE.
Note: The code for declaring a CMP object is the same for all platforms:
declare object myscore(astore);
Note: In a SAS client, the score() method takes a single parameter, a file path:
- Windows:
call myscore.score("C:\models\_va_model208");
- UNIX:
call myscore.score("/userid/models/_va_model208");
Note: In CAS, the score() method requires two arguments, the caslib and the CAS
table name:
call myscore.score('CASUSER','_va_model208');
Note: The score object also supports the describe() method, which prints the
input/output variables from the ASTORE to the log. The method takes no
arguments:
call myscore.describe();
PROC ASTORE transports data from CAS into a local analytic store (local file).
Once in that form, the data is used by CMP and the new ASTORE object.
quit;
options pagesize=max;
proc cas;
loadactionset "table";
table.fetch
format=false
maxRows=1
sasTypes=TRUE
table = {
compOnDemand=TRUE
caslib="CASUSER"
name="hmeq"
compPgm=
"declare object myscore(astore);
call myscore.score('CASUSER','_va_model208');"
singlePass=TRUE
compVars={"_P_", "P__EVENT_0" , "P__EVENT_1" , "I__EVENT_" ,"_WARN_"}
};
run;
quit;
routineCode = "
declare object model_glmstore(astore);
call model_glmstore.setoption('alpha', 0.05);
call model_glmstore.setoption('COMPUTE_CONFIDENCE_LIMIT', 1);
call model_glmstore.score('CASUSER','glmstore');
"
;
run;
quit;
Details
This example shows how to compute a study day during a drug trial by creating a
function in PROC FCMP and using that function in a DATA step.
Program
proc fcmp outlib=sasuser.funcs.trial;
function study_day(intervention_date, event_date);
n=event_date - intervention_date;
if n >= 0 then
n=n + 1;
return(n);
endsub;
options cmplib=sasuser.funcs;
data _null_;
start='15Feb2010'd;
today='27Mar2010'd;
sd=study_day(start, today);
put sd=;
run;
Program Description
Specify the name of an output package to which the compiled function and CALL
routine are written. The package is stored in the data set Sasuser.Funcs.
proc fcmp outlib=sasuser.funcs.trial;
Use DATA step syntax to compute the difference between EVENT_DATE and
INTERVENTION_DATE, and use a DATA step IF statement to add 1 when the difference
is zero or positive. The days before INTERVENTION_DATE begin at -1 and become
smaller. The days after and including INTERVENTION_DATE begin at 1 and become
larger. (This function never returns 0 for a study day.)
n=event_date - intervention_date;
if n >= 0 then
n=n + 1;
return(n);
endsub;
Use the CMPLIB= system option to specify a SAS data set that contains the
compiler subroutine to include during program compilation.
options cmplib=sasuser.funcs;
Create a DATA step to produce a value for the function STUDY_DAY. The function
uses a start date and today's date to compute the value. STUDY_DAY is called from
the DATA step. When the DATA step encounters a call to STUDY_DAY, it does not
find this function in its traditional library of functions. It searches each of the data
sets that are specified in the CMPLIB system option for a package that contains
STUDY_DAY. In this case, it finds STUDY_DAY in Sasuser.Funcs.Trial.
data _null_;
start='15Feb2010'd;
today='27Mar2010'd;
sd=study_day(start, today);
Log
Example Code 24.5 Results from Creating and Calling a Function from a DATA Step
sd=41
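The arithmetic behind sd=41 can be checked with a short Python sketch of the same logic. Python `date` subtraction stands in for SAS date values, which are also counts of days; the function name simply mirrors STUDY_DAY.

```python
from datetime import date


def study_day(intervention_date, event_date):
    """Study day as computed by STUDY_DAY: ..., -2, -1, 1, 2, ... (never 0)."""
    n = (event_date - intervention_date).days   # difference in days
    if n >= 0:                                  # the intervention day itself is day 1
        n = n + 1
    return n


sd = study_day(date(2010, 2, 15), date(2010, 3, 27))
print(f"sd={sd}")  # sd=41, matching the SAS log
```

From 15 February to 27 March 2010 is 40 days (February 2010 has 28 days), and the IF statement shifts that nonnegative difference to 41.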
Example 2: Creating and Saving Functions with PROC FCMP
Details
This example shows how to use PROC FCMP to create and save the functions that
are used in the example.
Program
proc fcmp outlib=sasuser.exsubs.pkt1;
subroutine calc_years(maturity, current_date, years);
outargs years;
years=(maturity - current_date) / 365.25;
endsub;
function garkhprc(type $, buysell $, amount, E, t, S, rd, rf, sig);
if buysell="Buy" then sign=1.;
else do;
if buysell="Sell" then sign=-1.;
else sign=.;
end;
if type="Call" then
garkhprc=sign * amount * garkhclprc(E, t, S, rd, rf, sig);
else do;
if type="Put" then
garkhprc=sign * amount * garkhptprc(E, t, S, rd, rf, sig);
else garkhprc=.;
end;
return(garkhprc);
endfunc;
run;
Program Description
Specify the entry where the function package information is saved. The package
is a three-level name.
proc fcmp outlib=sasuser.exsubs.pkt1;
if type="Call" then
garkhprc=sign * amount * garkhclprc(E, t, S, rd, rf, sig);
else do;
if type="Put" then
garkhprc=sign * amount * garkhptprc(E, t, S, rd, rf, sig);
else garkhprc=.;
end;
Execute the FCMP procedure. The RUN statement executes the FCMP procedure.
run;
Log
Example Code 24.6 Location of Functions That Are Saved
Details
The following example uses numeric data as input to the FUNCTION statement of
PROC FCMP.
Program
proc fcmp;
function inverse(in);
if in=0 then inv=.;
else inv=1/in;
return(inv);
endfunc;
run;
Details
The following example uses character data as input to the FUNCTION statement of
PROC FCMP. The output from FUNCTION TEST is assigned a length of 12 bytes.
Program
options cmplib=work.funcs;
if x='yes' then
return('si si si');
else
return('no');
endfunc;
run;
data _null_;
spanish=test('yes');
put spanish=;
run;
Log
Example Code 24.7 Results from Using Character Data with the FUNCTION Statement in
PROC FCMP
spanish=si si si
Details
The following example shows an array that accepts variable arguments. The
example shows that the SUMMATION function can be called as follows:
sum=summation(1,2,3,4,5);
Note: When calling this function from a DATA step, you must provide the
VARARGS as an array.
Program
options cmplib=sasuser.funcs;
endfunc;
sum=summation(1, 2, 3, 4, 5);
put sum=;
run;
Details
Here is an example of the SUBROUTINE statement. The SUBROUTINE statement
creates an independent computational block of code that can be used with a CALL
statement.
Program
proc fcmp outlib=sasuser.funcs.temp;
subroutine inverse(in,inv) group="generic";
outargs inv;
if in=0 then inv=.;
else inv=1/in;
endsub;
options cmplib=sasuser.funcs;
data _null_;
x=5;
call inverse(x, y);
put x= y=;
run;
Log
Example Code 24.8 Results from Using the SUBROUTINE Statement in PROC FCMP
x=5 y=0.2
Details
The following example shows how to use functions in a GTL EVAL function. It
shows how to define functions that define new curve types (oscillate and
oscillateBound). These functions can be used in a GTL EVAL function to compute
new columns that are presented with a seriesplot and bandplot.
Program
proc fcmp outlib=sasuser.funcs.curves;
function oscillate(x,amplitude,frequency);
if amplitude le 0 then amp=1; else amp=amplitude;
if frequency le 0 then freq=1; else freq=frequency;
y=sin(freq*x)*constant("e")**(-amp*x);
return (y);
endfunc;
function oscillateBound(x,amplitude);
if amplitude le 0 then amp=1; else amp=amplitude;
y=constant("e")**(-amp*x);
return(y);
endfunc;
run;
options cmplib=sasuser.funcs;
data range;
do time=0 to 2 by .01;
output;
end;
run;
proc template ;
Program Description
options cmplib=sasuser.funcs;
data range;
do time=0 to 2 by .01;
output;
end;
run;
Use the TEMPLATE procedure to customize the appearance of your SAS output.
proc template ;
define statgraph damping;
dynamic X AMP FREQ;
begingraph;
Use the SGRENDER procedure to identify the data set that contains the input
variables and to assign a statgraph template for the output.
proc sgrender data=range template=damping;
dynamic x="Time" amp=10 freq=50 ;
run;
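The part of the template that calls these functions through EVAL is not shown here, so the following Python sketch evaluates the two curves directly on a grid like the RANGE data set (0 to 2 by 0.01), with amp=10 and freq=50 as passed on the PROC SGRENDER DYNAMIC statement. The Python names are illustrative translations of OSCILLATE and OSCILLATEBOUND. The check shows why OSCILLATEBOUND works as a band: |sin| never exceeds 1, so the damped curve stays within plus or minus e**(-amp*x).

```python
import math


def oscillate(x, amplitude, frequency):
    """Damped sine curve sin(freq*x) * e**(-amp*x), as in OSCILLATE."""
    amp = amplitude if amplitude > 0 else 1    # nonpositive inputs default to 1
    freq = frequency if frequency > 0 else 1
    return math.sin(freq * x) * math.e ** (-amp * x)


def oscillate_bound(x, amplitude):
    """Envelope e**(-amp*x), as in OSCILLATEBOUND."""
    amp = amplitude if amplitude > 0 else 1
    return math.e ** (-amp * x)


times = [i / 100 for i in range(201)]          # 0, 0.01, ..., 2
curve = [oscillate(t, 10, 50) for t in times]
upper = [oscillate_bound(t, 10) for t in times]
print(max(abs(c) for c in curve))
```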
Details
This example shows how to standardize each row of a data set.
Program
data numbers;
drop i j;
array a[5];
do j=1 to 5;
do i=1 to 5;
a[i] = ranuni(12345) * (i+123.234);
end;
output;
end;
run;
%macro standardize;
%let dsname=%sysfunc(dequote(&dsname));
%let colname=%sysfunc(dequote(&colname));
proc standard data=&dsname mean=&MEAN std=&STD out=_out;
var &colname;
run;
data &dsname;
set _out;
run;
%mend standardize;
proc fcmp outlib=sasuser.ds.functions;
subroutine standardize(x[*], mean, std);
outargs x;
rc=write_array('work._TMP_', x, 'x1');
dsname='work._TMP_';
colname='x1';
rc=run_macro('standardize', dsname, colname, mean, std);
array x2[1]_temporary_;
rc=read_array('work._TMP_', x2);
if dim(x2)=dim(x) then do;
do i=1 to dim(x);
x[i]=x2[i];
end;
end;
endsub;
run;
options cmplib=(sasuser.ds);
data numbers2;
set numbers;
array a[5];
array t[5]_temporary_;
do i=1 to 5;
t[i]=a[i];
end;
call standardize(t, 0, 1);
do i=1 to 5;
a[i]=t[i];
end;
output;
run;
proc print data=work.numbers2;
run;
Program Description
Create a macro to standardize a data set with a given value for mean and std.
%macro standardize;
%let dsname=%sysfunc(dequote(&dsname));
%let colname=%sysfunc(dequote(&colname));
proc standard data=&dsname mean=&MEAN std=&STD out=_out;
var &colname;
run;
data &dsname;
set _out;
run;
%mend standardize;
In the PROC FCMP subroutine, call WRITE_ARRAY to write the array to a data set,
call RUN_MACRO to standardize the data in that data set, and then call
READ_ARRAY to read the standardized data back into the array.
proc fcmp outlib=sasuser.ds.functions;
subroutine standardize(x[*], mean, std);
outargs x;
rc=write_array('work._TMP_', x, 'x1');
dsname='work._TMP_';
colname='x1';
rc=run_macro('standardize', dsname, colname, mean, std);
array x2[1]_temporary_;
rc=read_array('work._TMP_', x2);
if dim(x2)=dim(x) then do;
do i=1 to dim(x);
x[i]=x2[i];
end;
end;
endsub;
run;
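The effect of the macro call on one row can be sketched in Python without the round trip through a data set. This sketch assumes that PROC STANDARD rescales each value as (x - mean_x) / s_x * STD + MEAN, where s_x is the sample standard deviation (divisor n - 1, assumed here to match the procedure's default). Modifying the list in place plays the role of OUTARGS X. The row values are made up; they are not the RANUNI data from the example.

```python
import statistics


def standardize(row, mean=0.0, std=1.0):
    """Rescale `row` in place to the requested mean and standard deviation.

    Assumes the sample standard deviation (divisor n - 1).
    """
    m = statistics.mean(row)
    s = statistics.stdev(row)          # needs at least two values
    if s == 0:
        raise ValueError("a constant row cannot be standardized")
    for i, v in enumerate(row):
        row[i] = (v - m) / s * std + mean


row = [12.5, 80.1, 33.3, 47.0, 101.9]  # made-up values
standardize(row, 0, 1)                 # like CALL STANDARDIZE(t, 0, 1)
print([round(v, 4) for v in row])
```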
References
SAS Institute Inc. 2013. SAS Component Objects: Reference. Cary, NC: SAS Institute
Inc.
Henrick, A., D. Erdman, and S. Christian. 2013. “Hashing in PROC FCMP to Enhance
Your Productivity.” In Proceedings of the SAS Global Forum 2013 Conference, 1–15.
Cary, NC: SAS Institute Inc. Available at http://support.sas.com/resources/
papers/proceedings13/129-2013.pdf.
Chapter 25
FCMP Special Functions and Call Routines
Note: You can call special functions directly in a procedure, but not in the DATA
step.
Calling SAS Code from within Functions
Two functions are available that enable you to call SAS code from within functions.
The RUN_MACRO function executes a predefined SAS macro. The RUN_SASFILE
function executes SAS code from a fileref that you specify.
C Helper
Several helper functions are provided with the package to handle C-language
constructs in PROC FCMP. Most C-language constructs must be defined in a
package that is created by PROC PROTO before the constructs can be referenced
or used by PROC FCMP. The ISNULL function and the SETNULL and
STRUCTINDEX CALL routines have been added to extend the SAS language to
handle C-language constructs that do not naturally fit into the SAS language.
Arrays
PROC FCMP provides the READ_ARRAY function to read arrays, and the
WRITE_ARRAY function to write arrays to a data set. This functionality enables
PROC FCMP array data to be processed by SAS programs, macros, and
procedures.
Matrix Operations
The FCMP procedure provides you with a number of CALL routines for performing
simple matrix operations on declared arrays. These CALL routines are
automatically provided by the FCMP procedure. With the exception of
ZEROMATRIX, FILLMATRIX, and IDENTITY, the CALL routines listed below do not
support matrices or arrays that contain missing values.
Array
READ_ARRAY Function (p. 969): Reads data from a SAS data set into a PROC FCMP
array variable.
WRITE_ARRAY Function (p. 982): Writes data from a PROC FCMP array variable to a
data set that can then be used by SAS programs, macros, and procedures.
C Helper
CALL SETNULL Routine (p. 955): Sets a pointer element of a structure to null.
ISNULL Function (p. 964): Determines whether a pointer element of a structure is null.
Calling SAS Code from within Functions
RUN_MACRO Function (p. 971): Executes a predefined SAS macro.
RUN_SASFILE Function (p. 976): Executes SAS code in a fileref that you specify.
Compute Implicit Values
SOLVE Function (p. 977): Computes implicit values of a function using the
Gauss-Newton method.
Matrix Operations
CALL CHOL Routine (p. 943): Calculates the Cholesky decomposition for a given
symmetric matrix.
CALL DET Routine (p. 945): Calculates the determinant of a specified matrix that
should be square.
CALL EXPMATRIX Routine (p. 949): Returns a matrix $e^{tA}$ given the input matrix
A and a multiplier t.
CALL FILLMATRIX Routine (p. 950): Replaces all of the element values of the input
matrix with the specified value.
CALL IDENTITY Routine (p. 951): Converts the input matrix to an identity matrix.
CALL INV Routine (p. 952): Calculates the inverse of the input matrix, which should
be a square, non-singular matrix.
CALL MULT Routine (p. 953): Calculates the multiplicative product of two input
matrices.
CALL POWER Routine (p. 954): Raises a square matrix to a given scalar power.
CALL ZEROMATRIX Routine (p. 959): Replaces all of the element values of the
numeric input matrix with 0.
Special Purpose Functions
INVCDF Function (p. 960): Computes the quantile from any distribution for which you
have defined a cumulative distribution function (CDF).
LIMMOMENT Function (p. 965): Computes the limited moment of any distribution for
which you have defined a cumulative distribution function (CDF).
Dictionary
CALL ADDMATRIX Routine
Syntax
CALL ADDMATRIX(X, Y, Z);
Required Arguments
X
specifies an input matrix with dimensions m x n (that is, X[m, n]) or a scalar.
Y
specifies an input matrix with dimensions m x n (that is, Y[m, n]) or a scalar.
Z
specifies an output matrix with dimensions m x n (that is, Z[m, n]), such that
$Z = X + Y$
Example
The following example uses the ADDMATRIX CALL routine:
proc fcmp;
array mat1[3,2] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
array mat2[3,2] (0.2, 0.38, -0.12, 0.98, 2, 5.2);
array result[3,2];
call addmatrix(mat1, mat2, result);
call addmatrix(2, mat1, result);
put result=;
quit;
CALL CHOL Routine
Requirement: Both input and output matrices must be square and have the same dimensions. X
must be symmetric positive-definite, and Y is a lower triangular matrix.
Syntax
CALL CHOL(X, Y <, validate>);
Required Arguments
X
specifies a symmetric positive-definite input matrix with dimensions m x m
(that is, X[m, m]).
Y
specifies an output matrix with dimensions m x m (that is, Y[m, m]). This
variable contains the Cholesky decomposition, such that
$X = Y Y^{*}$
where Y is a lower triangular matrix with strictly positive diagonal entries and Y*
denotes the conjugate transpose of Y.
Optional Argument
validate
specifies an optional argument that can increase the processing speed by
avoiding error checking. The argument can take the following values:
0 the matrix X is checked for symmetry. This is the default if the validate
argument is omitted.
1 the matrix X is assumed to be symmetric.
Example
The following example uses the CHOL CALL routine:
proc fcmp;
array xx[3,3] 2 2 3 2 4 2 3 2 6;
array yy[3,3];
call chol(xx, yy, 0);
do i=1 to 3;
put yy[i, 1] yy[i, 2] yy[i, 3];
end;
run;
Output 25.2 Results from PROC FCMP and the CHOL CALL Routine
1.4142135624 0 0
1.4142135624 1.4142135624 0
2.1213203436 -0.707106781 1
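The output above can be checked with a plain Cholesky-Banachiewicz factorization. This Python sketch illustrates the decomposition; it is not necessarily the algorithm that CALL CHOL uses internally. For real matrices the conjugate transpose $Y^{*}$ is just the transpose, so the sketch returns the lower triangular Y with $X = Y Y^{T}$.

```python
import math


def chol(x):
    """Lower triangular Y with X = Y * Y^T, computed row by row."""
    m = len(x)
    y = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(y[i][k] * y[j][k] for k in range(j))
            if i == j:
                d = x[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive-definite")
                y[i][j] = math.sqrt(d)            # diagonal entries are positive
            else:
                y[i][j] = (x[i][j] - s) / y[j][j]
    return y


# Same input as the example: array xx[3,3] 2 2 3 2 4 2 3 2 6 (filled row by row)
xx = [[2, 2, 3], [2, 4, 2], [3, 2, 6]]
yy = chol(xx)
for row in yy:
    print(" ".join(f"{v:.10g}" for v in row))
```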
CALL DET Routine
Syntax
CALL DET(X, a);
Required Arguments
X
specifies a square input matrix with dimensions m x m (that is, X[m, m]).
a
specifies the returned determinant value, such that
$a = \lvert X \rvert$
Details
The determinant, the product of the eigenvalues, is a single numeric value. If the
determinant of a matrix is zero, then that matrix is singular (that is, it does not have
an inverse). The method performs an LU decomposition and collects the product of
the diagonals (Forsythe, Malcolm, and Moler 1967). For more information, see the
SAS/IML User's Guide.
Example
The following example uses the DET CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,3] (.03, -0.78, -0.82, 0.54, 1.74,
1.2, -1.3, 0.25, 1.49);
call det(mat1, result);
put result=;
quit;
result=-0.052374
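The Details above describe computing the determinant from an LU decomposition by multiplying the diagonal entries. The following Python sketch does the same with partial pivoting (a choice of this sketch, not necessarily of CALL DET; each row swap flips the sign) and reproduces the logged value for MAT1.

```python
def det(x):
    """Determinant via LU elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in x]    # work on a copy
    n = len(a)
    sign = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0                            # singular: no inverse exists
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign                          # a row swap flips the sign
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    result = sign
    for i in range(n):
        result *= a[i][i]                         # product of U's diagonal
    return result


mat1 = [[0.03, -0.78, -0.82],
        [0.54, 1.74, 1.2],
        [-1.3, 0.25, 1.49]]
result = det(mat1)
print(f"result={result:.6f}")  # result=-0.052374
```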
CALL DYNAMIC_ARRAY Routine
Category: Array
Syntax
CALL DYNAMIC_ARRAY(array-name, new-dimension1-size <, new-dimension2-size, ...>);
Required Arguments
array-name
specifies the name of a temporary array.
new-dimension-size
specifies a new size for the temporary array.
Details
Arrays that are declared in functions and CALL routines can be resized, as well as
arrays that are declared with the /NOSYMBOLS option. No other array can be
resized.
The DYNAMIC_ARRAY CALL routine is passed the array to be resized and a new
size for each dimension of the array. In the ALLPERMK routine, a scratch array that
is the size of the number of elements being permuted is needed. When the function
is created, this value is not known because it is passed in as parameter n. A
dynamic array enables the routine to allocate the amount of memory that is
needed, instead of having to create an array that is large enough to handle all
possible cases.
When using dynamic arrays, support is limited to PROC FCMP routines. When an
array is resized, the resized array is available only within the routine that resized it.
It is not possible to resize a DATA step array or to return a PROC FCMP dynamic
array to the DATA step.
Example
The following example creates a temporary array named TEMP. The size of the
array area depends on parameters that are passed to the function.
proc fcmp;
function avedev_wacky(data[*]);
length=dim(data);
array temp[1] /nosymbols;
call dynamic_array(temp, length);
mean=0;
do i=1 to length;
mean += data[i];
if i>1 then temp[i]=data[i-1];
else temp[i]=0;
end;
mean=mean/length;
avedev=0;
do i=1 to length;
avedev += abs((data[i])-temp[i] /2-mean);
end;
avedev=avedev/length;
return(avedev);
endsub;
array data[10];
do i = 1 to 10;
data[i] = i;
end;
avedev = avedev_wacky(data);
run;
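A Python sketch of AVEDEV_WACKY shows the same idea: the scratch list TEMP is sized from the input at run time instead of being declared large enough for every case. The SAS example does not print its result, so the value below comes from the sketch, not from a SAS log. For the values 1 through 10, the mean is 5.5 and each term reduces to |i - 10| / 2, so the average is 22.5 / 10 = 2.25.

```python
def avedev_wacky(data):
    """Python version of AVEDEV_WACKY with a run-time-sized scratch list."""
    length = len(data)
    temp = [0.0] * length                       # like CALL DYNAMIC_ARRAY(temp, length)
    mean = 0.0
    for i in range(length):
        mean += data[i]
        temp[i] = data[i - 1] if i > 0 else 0   # previous element; 0 for the first
    mean = mean / length
    avedev = 0.0
    for i in range(length):
        avedev += abs(data[i] - temp[i] / 2 - mean)
    return avedev / length


data = list(range(1, 11))                       # array data[10] holding 1..10
avedev = avedev_wacky(data)
print(f"avedev={avedev}")                       # avedev=2.25
```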
CALL ELEMMULT Routine
Syntax
CALL ELEMMULT(X, Y, Z);
Required Arguments
X
specifies an input matrix with dimensions m x n (that is, X[m, n]).
Y
specifies an input matrix with dimensions m x n (that is, Y[m, n]).
Z
specifies an output matrix with dimensions m x n (that is, Z[m, n]).
Example
The following example uses the ELEMMULT CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,2] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
array mat2[3,2] (0.2, 0.38, -0.12, 0.98, 2, 5.2);
array result[3,2];
call elemmult(mat1, mat2, result);
call elemmult(2.5, mat1, result);
put result=;
quit;
CALL EXPMATRIX Routine
Syntax
CALL EXPMATRIX(X, t, Y);
Required Arguments
X
specifies an input matrix with dimensions m x m (that is, X[m, m]).
t
specifies a double-precision scalar value.
Y
specifies an output matrix with dimensions m x m (that is, Y[m, m]), such that
$Y = e^{tX}$
Details
The EXPMATRIX CALL routine uses a Padé approximation algorithm as presented
in Golub and van Loan (1989), p. 558. Note that this module does not exponentiate
each entry of a matrix. For more information, see the EXPMATRIX documentation
in the SAS/IML User's Guide.
Example
The following example uses the EXPMATRIX CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,3] (0.3, -0.78, -0.82, 0.54, 1.74,
1.2, -1.3, 0.25, 1.49);
array result[3,3];
call expmatrix(mat1, 3, result);
put result=;
quit;
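To make the difference between a matrix exponential and an entry-by-entry exponential concrete, here is a Python sketch. It swaps in a simpler technique than the routine's Padé approximation: scaling and squaring with a truncated Taylor series, using $e^{A} = (e^{A/2^{s}})^{2^{s}}$. The helper `_matmul`, the `terms` parameter, and the scaling rule are choices of this sketch only.

```python
import math


def _matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def expmatrix(x, t, terms=20):
    """Approximate the matrix exponential e^(tX).

    Technique: scaling and squaring with a truncated Taylor series. This is
    a substitute for illustration; CALL EXPMATRIX uses a Pade approximation.
    """
    m = len(x)
    a = [[t * v for v in row] for row in x]
    norm = max(sum(abs(v) for v in row) for row in a)   # infinity norm of tX
    s = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    a = [[v / 2 ** s for v in row] for row in a]        # now the norm is <= 1/2
    result = [[float(i == j) for j in range(m)] for i in range(m)]
    term = [list(row) for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in _matmul(term, a)]  # A^k / k!
        result = [[r + v for r, v in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(s):                                   # square s times
        result = _matmul(result, result)
    return result


mat1 = [[0.3, -0.78, -0.82],
        [0.54, 1.74, 1.2],
        [-1.3, 0.25, 1.49]]
result = expmatrix(mat1, 3)                              # like CALL EXPMATRIX(mat1, 3, result)
nil = expmatrix([[0, 1], [0, 0]], 3)                     # exactly [[1, 3], [0, 1]]
print(nil)
```

For the nilpotent matrix, the result has 3 in the upper-right corner, not $e^{3}$, which is why the routine does not simply exponentiate each entry.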
CALL FILLMATRIX Routine
Syntax
CALL FILLMATRIX(X, Y);
Required Arguments
X
specifies an input numeric matrix.
Y
specifies the numeric value that fills the matrix.
Example
The following example uses the FILLMATRIX CALL routine.
options pageno=1 nodate ls=80 ps=64;
proc fcmp;
array mat1[3, 2] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
call fillmatrix(mat1, 99);
put mat1=;
quit;
mat1[1, 1]=99 mat1[1, 2]=99 mat1[2, 1]=99 mat1[2, 2]=99 mat1[3, 1]=99
mat1[3, 2]=99
CALL IDENTITY Routine
Syntax
CALL IDENTITY(X);
Required Argument
X
specifies an input matrix with dimensions m x m (that is, X[m, m]).
Example
The following example uses the IDENTITY CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,3] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2,
-1.3, 0.25, 1.49);
call identity(mat1);
put mat1=;
quit;
mat1[1, 1]=1 mat1[1, 2]=0 mat1[1, 3]=0 mat1[2, 1]=0 mat1[2, 2]=1 mat1[2, 3]=0
mat1[3, 1]=0 mat1[3, 2]=0 mat1[3, 3]=1
CALL INV Routine
Syntax
CALL INV(X, Y);
Required Arguments
X
specifies an input matrix with dimensions m x m (that is, X[m, m]).
Y
specifies an output matrix with dimensions m x m (that is, Y[m, m]), such that
$Y_{m,m} = X_{m,m}^{-1}$
Example
The following example uses the INV CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,3] (0.3, -0.78, -0.82, 0.54, 1.74,
1.2, -1.3, 0.25, 1.49);
array result[3,3];
call inv(mat1, result);
put result=;
quit;
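Gauss-Jordan elimination is one way to compute what CALL INV returns. In this Python sketch the elimination method and the 1e-12 singularity threshold are choices of the sketch, not documented behavior of the routine; the check multiplies MAT1 by the computed inverse and compares the product with the identity matrix.

```python
def inv(x):
    """Inverse of a square, non-singular matrix by Gauss-Jordan elimination."""
    n = len(x)
    # Augment X with the identity matrix: [X | I]
    a = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(x)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]               # make the pivot 1
        for r in range(n):
            if r != col:                               # clear the rest of the column
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]                      # right half is now X^-1


mat1 = [[0.3, -0.78, -0.82],
        [0.54, 1.74, 1.2],
        [-1.3, 0.25, 1.49]]
result = inv(mat1)
print(result)
```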
CALL MULT Routine
Syntax
CALL MULT(X, Y, Z);
Required Arguments
X
specifies an input matrix with dimensions m x n (that is, X[m, n]).
Y
specifies an input matrix with dimensions n x p (that is, Y[n, p]).
Z
specifies an output matrix with dimensions m x p (that is, Z[m, p]), such that
$Z_{m,p} = X_{m,n} \times Y_{n,p}$
Example
The following example uses the MULT CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[2,3] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
array mat2[3,2] (1, 0, 0, 1, 1, 0);
array result[2,2];
call mult(mat1, mat2, result);
put result=;
quit;
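The dimension rule above (an m x n matrix times an n x p matrix gives an m x p matrix) can be sketched in Python with the same inputs as the example. SAS fills MAT1[2,3] and MAT2[3,2] row by row, which is how the nested lists are laid out here; the shape check is a choice of this sketch.

```python
def mult(x, y):
    """Z[m, p] = X[m, n] x Y[n, p]; the inner dimensions must agree."""
    n = len(y)
    if any(len(row) != n for row in x):
        raise ValueError("columns of X must equal rows of Y")
    p = len(y[0])
    return [[sum(row[k] * y[k][j] for k in range(n)) for j in range(p)]
            for row in x]


mat1 = [[0.3, -0.78, -0.82],      # array mat1[2,3]
        [0.54, 1.74, 1.2]]
mat2 = [[1, 0], [0, 1], [1, 0]]   # array mat2[3,2]
result = mult(mat1, mat2)         # 2 x 2
print(result)
```

Because MAT2 picks out columns, each entry of RESULT is a simple sum: for example, result[1][1] is 0.3 + (-0.82) = -0.52.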
CALL POWER Routine
Syntax
CALL POWER(X, a, Y);
Required Arguments
X
specifies an input matrix with dimensions m x m (that is, X[m, m]).
a
specifies an integer scalar value (power).
Y
specifies an output matrix with dimensions m x m (that is, Y[m, m]), such that
$Y = X^{a}$
Details
If the scalar is not an integer, it is truncated to an integer. If the scalar is less than 0,
then it is changed to 0. For more information, see the SAS/IML User's Guide.
Example
The following example uses the POWER CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,3] (0.3, -0.78, -0.82, 0.54, 1.74,
1.2, -1.3, 0.25, 1.49);
array result[3,3];
call power(mat1, 3, result);
put result=;
quit;
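For an integer power, the result of CALL POWER should agree with repeated matrix multiplication. The following sketch compares CALL POWER(mat1, 3, ...) with two CALL MULT calls. The array names POW3, M2, and M3 are illustrative.

```sas
/* Sketch: compare CALL POWER with repeated CALL MULT. */
options pageno=1 nodate;
proc fcmp;
   array mat1[3,3] (0.3, -0.78, -0.82, 0.54, 1.74,
                    1.2, -1.3, 0.25, 1.49);
   array pow3[3,3];
   array m2[3,3];
   array m3[3,3];
   call power(mat1, 3, pow3);   /* mat1 raised to the third power */
   call mult(mat1, mat1, m2);   /* m2 = mat1 x mat1 */
   call mult(m2, mat1, m3);     /* m3 = (mat1 x mat1) x mat1 */
   put pow3=;
   put m3=;                     /* should match pow3 up to rounding */
quit;
```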
Category: C Helper
Syntax
CALL SETNULL(pointer-element);
Required Argument
pointer-element
is a pointer to a structure.
Example
The following example assumes that the same LINKLIST structure that is described
in “Example 1: Generating a Linked List” is defined by using PROC
PROTO. The CALL SETNULL routine can be used to set the NEXT element to null:
struct linklist list;
call setnull(list.next);
Category: C Helper
Syntax
CALL STRUCTINDEX(structure-array, index, structure-element);
Required Arguments
structure-array
specifies an array.
index
is a 1–based index as used in most SAS arrays.
structure-element
points to an element in the array.
Example
In the first part of this example, the following structures and function are defined
by using PROC PROTO.
proc proto package=sasuser.mylib.str2;
struct point{
short s;
int i;
long l;
double d;
};
struct point_array {
int length;
struct point p[2];
char name[32];
};
run;
In the second part of this example, the PROC FCMP code segment shows how to
use the STRUCTINDEX CALL routine to retrieve and set each point structure
element of an array called P in the POINT_ARRAY structure:
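options pageno=1 nodate ls=80 ps=64;
/* Assumed setup: load the structures that PROC PROTO stored in
   Sasuser.Mylib and declare the structure variables. */
proc fcmp libname=sasuser.mylib;
struct point_array pntarray;
struct point pnt;
pntarray.length=2;
pntarray.name="My funny structure";
/* Get each element by using the STRUCTINDEX CALL routine and set its values. */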
do i=1 to 2;
call structindex(pntarray.p, i, pnt);
put "Before setting the" i "element: " pnt=;
pnt.s=1;
pnt.i=2;
pnt.l=3;
pnt.d=4.5;
put "After setting the" i "element: " pnt=;
end;
run;
Syntax
CALL SUBTRACTMATRIX(X, Y, Z);
Required Arguments
X
specifies an input matrix with dimensions m x n (that is, X[m, n]) or a scalar.
Y
specifies an input matrix with dimensions m x n (that is, Y[m, n]) or a scalar.
Z
specifies an output matrix with dimensions m x n (that is, Z[m, n]), such that
$Z = X - Y$
Example
The following example uses the SUBTRACTMATRIX CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,2] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
array mat2[3,2] (0.2, 0.38, -0.12, 0.98, 2, 5.2);
array result[3,2];
call subtractmatrix(mat1, mat2, result);   /* result = mat1 - mat2 */
put result=;
call subtractmatrix(2, mat1, result);      /* result = 2 - mat1, element by element */
put result=;
quit;
Syntax
CALL TRANSPOSE(X, Y);
Required Arguments
X
specifies an input matrix with dimensions m x n (that is, X[m, n]).
Y
specifies an output matrix with dimensions n x m (that is, Y[n, m]).
Details
The routine computes the transpose of X:
$Y = X'$
Note that the number of rows for the input matrix should be equal to the number of
columns of the output matrix, and the number of rows for the output matrix should
be equal to the number of columns of the input matrix.
Example
The following example uses the TRANSPOSE CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,2] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
array result[2,3];
call transpose(mat1, result);
put result=;
quit;
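The dimension rule matters when you combine CALL TRANSPOSE with CALL MULT. The following sketch forms the 2 x 2 matrix X′X from the 3 x 2 matrix in the example. The array names T and XTX are illustrative.

```sas
/* Sketch: form X'X from a 3 x 2 matrix X. */
options pageno=1 nodate;
proc fcmp;
   array mat1[3,2] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
   array t[2,3];                /* transpose: rows and columns swapped */
   array xtx[2,2];
   call transpose(mat1, t);     /* t = mat1' */
   call mult(t, mat1, xtx);     /* [2,3] x [3,2] gives [2,2] */
   put xtx=;
quit;
```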
Syntax
CALL ZEROMATRIX(X);
Required Argument
X
specifies a numeric input matrix.
Example
The following example uses the ZEROMATRIX CALL routine:
options pageno=1 nodate;
proc fcmp;
array mat1[3,2] (0.3, -0.78, -0.82, 0.54, 1.74, 1.2);
call zeromatrix(mat1);
put mat1=;
quit;
mat1[1, 1]=0 mat1[1, 2]=0 mat1[2, 1]=0 mat1[2, 2]=0 mat1[3, 1]=0
mat1[3, 2]=0
INVCDF Function
Computes the quantile from any distribution for which you have defined a cumulative distribution
function (CDF).
Syntax
quantile=INVCDF('CDF-function-name', options-array, cumulative-probability,
parameter-1 <, parameter-2, ...>);
Required Arguments
quantile
specifies the quantile that is returned from the INVCDF function.
'CDF-function-name'
specifies the name of the CDF function. Enclose CDF-function-name in
quotation marks.
options-array
specifies an array of options to use with the INVCDF function. Options-array is
used to control and monitor the process of inverting the CDF. Options-array can
be a missing value (.), or it can have up to four of the following elements in the
following order:
initial-value
specifies the initial guess for the quantile at which the inversion process
starts. This is useful when you have an idea of the approximate value for
quantile (for example, from the empirical estimate of the CDF).
Default 0.1
desired-accuracy
specifies the desired relative accuracy of the quantile. You can specify any
value in the range (0,0.1). If you specify a smaller value, the result is a more
accurate estimate of the quantile, but it might take longer to invert the CDF.
Default 1.0e-8
domain-type
specifies the domain for the CDF function. A missing value or a value of 0
indicates a nonnegative support, that is [0,∞). Any other value indicates a
support over the entire real line, that is (-∞,∞).
Default 0
return-code
specifies the return status. If options-array is of dimension 4 or more, then
the fourth element contains the return status. Return-code can have one of
the following values:
<=0
indicates success. If negative, then the absolute value is the number of
times the CDF function was evaluated in order to compute the quantile.
A larger absolute value indicates longer convergence time.
1
indicates that the quantile could not be computed.
cumulative-probability
specifies the cumulative probability value for which the quantile is desired.
Range [0,1)
parameter
specifies the parameters of the distribution at which the quantile is desired. You
must specify exactly the same number of parameters as required by the
specified CDF function, and they should appear exactly in the same order as
required by the specified CDF function.
Details
The INVCDF function finds the quantile for the specified cumulative probability
from a distribution whose cumulative distribution function is specified by the CDF-
function-name argument. In other words, it inverts the CDF function such that the
following expression is true:
cumulative-probability = CDF-function-name(quantile,<parameters>)
If ε denotes the desired accuracy of the quantile for cumulative probability p, then
INVCDF attempts to compute the quantile q such that |p − F(q) | < εp, where F(x)
denotes the CDF evaluated at x.
You can control the inversion process with various options. Here is an example of
an options array:
array opts[4] initial epsilon support (1.5 1.0e-6 0);
In this array, initial (initial-value)=1.5, epsilon (desired-accuracy)=1.0e-6, and
support (domain-type)=0.
You can examine the return status of the function by checking opts[4].
Comparisons
You can regard this function as a generic extension of the QUANTILE function,
which computes quantiles only from specific distributions. The INVCDF function
enables you to compute quantiles from any continuous distribution as long as you
can programmatically define that distribution’s CDF function. Unlike the QUANTILE
function, this function cannot be used directly in a DATA step. It can be used only
inside the definition of an FCMP function or subroutine. However, this is not a
limitation because you can invoke the FCMP function that uses it from a DATA
step. See the following example.
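The DATA step later in this section calls a function named EXP_QUANTILE(p, theta, rc) that is stored in Work.Myquantile. The following is a minimal sketch of such a function, assuming that it wraps INVCDF around the EXP_CDF function and passes the INVCDF return code back through its third argument. The values placed in the options array are illustrative choices, not required settings.

```sas
/* Sketch of EXP_QUANTILE (assumed definition; OPTS values are illustrative). */
options cmplib=(work.mycdf);    /* makes EXP_CDF available */
proc fcmp outlib=work.myquantile.functions;
   function exp_quantile(p, theta, rc);
      outargs rc;                  /* pass the INVCDF status back to the caller */
      array opts[4] / nosymbols;
      opts[1] = theta;             /* initial-value: a rough starting guess */
      opts[2] = 1.0e-6;            /* desired-accuracy */
      opts[3] = 0;                 /* domain-type: support is [0, infinity) */
      opts[4] = .;                 /* return-code: set by INVCDF */
      q = invcdf('exp_cdf', opts, p, theta);
      rc = opts[4];
      return(q);
   endsub;
quit;
```

Because RC is listed in the OUTARGS statement, the DATA step variable that is passed as the third argument receives the INVCDF status, which the loop then tests with rcode <= 0.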
The preceding code assumes that you have stored the definition of the EXP_CDF
function in an FCMP library called Work.Mycdf using a PROC FCMP step as follows:
proc fcmp outlib=work.mycdf.functions;
function exp_cdf(x, theta);
return(1.0 - exp(-x/Theta));
endsub;
quit;
Now you can invoke the EXP_QUANTILE function from a DATA step to generate a
random sample from the exponential distribution with a scale parameter (theta)
that has a value of 50. Note that the locations of the EXP_CDF and
EXP_QUANTILE functions need to be specified with the appropriate value for the
CMPLIB= option before you execute the DATA step:
options cmplib=(work.mycdf work.myquantile);
data exp_sample(keep=q);
n=0;k=0;
do k=1 to 500;
if (n=100) then leave;
rcode=.;
q=exp_quantile(rand('UNIFORM'), 50, rcode);
if (rcode <= 0) then do;
n=n+1;
output;
end;
end;
run;
ISNULL Function
Determines whether a pointer element of a structure is null.
Category: C Helper
Syntax
numeric-variable = ISNULL (pointer-element);
Required Arguments
numeric-variable
specifies a numeric value.
pointer-element
specifies a variable that contains the address of another variable.
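Examples
/* Assumed setup: the package name Sasuser.Mylib.Str2 is an assumption. */
proc proto package=sasuser.mylib.str2;
struct linklist{
double value;
struct linklist * next;
};
struct linklist * get_list(int);
externc get_list;
struct linklist * get_list(int len){
int i;
struct linklist * list=0;
list=(struct linklist*)
malloc(len*sizeof(struct linklist));
for (i=0;i<len-1;i++){
list[i].value=i;
list[i].next=&list[i+1];
}
list[i].value=i;
list[i].next=0;
return list;
}
externcend;
run;
/* Assumed setup: load the package and build a three-element list. */
proc fcmp libname=sasuser.mylib;
struct linklist list;
list=get_list(3);
put list.value=;
do while (^isnull(list.next));
list=list.next;
put list.value=;
end;
run;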
list.value=0
list.value=1
list.value=2
LIMMOMENT Function
Computes the limited moment of any distribution for which you have defined a cumulative
distribution function (CDF).
Syntax
imom=LIMMOMENT('CDF-function-name', options-array, order, limit,
parameter-1 <, parameter-2, ...>);
Required Arguments
imom
specifies the limited moment that is returned from the LIMMOMENT function.
'CDF-function-name'
specifies the name of the CDF function. Enclose CDF-function-name in
quotation marks.
options-array
specifies an array of options to use with the LIMMOMENT function. Options-
array is used to control and monitor the process of numerical integration used to
compute the limited moment. Options-array can be a missing value (.), or it can
have up to four of the following elements in the following order:
desired-accuracy
specifies the desired accuracy of the numerical integration. You can specify
any value in the range (0,0.1). If you specify a smaller value, the result is a
more accurate estimate of the moment, but the computation takes longer.
Default 1.0e-8
initial-step-size
specifies the step size that is used initially by the numerical integration
process. An increase in the value results in a linear decrease in the number of
times the integrand is evaluated. Typically, using the default value of 1
produces good results.
Default 1
maximum-iterations
specifies the maximum number of iterations that are used to refine the
integration result in order to achieve the desired accuracy. An increase in this
value results in an exponential increase in the number of times the integrand
is evaluated.
Default 8
return-code
specifies the return status. If options-array is of dimension 4 or more, then
the fourth element contains the return status. Return-code can have one of
the following values:
<=0
indicates success. If negative, then the absolute value is the number of
times the integrand function was evaluated in order to compute the
limited moment. A larger absolute value indicates longer convergence
time.
1
indicates that the limited moment could not be computed.
order
specifies the order of the desired limited moment.
Range [1,10]
limit
specifies the upper limit that is used to compute the desired limited moment.
parameter
specifies the parameters of the distribution at which the limited moment is
desired. You must specify exactly the same number of parameters as required
by the specified CDF function, and they should appear exactly in the same order
as required by the specified CDF function.
Details
Let a random variable X have a probability distribution with probability density
function f(x;θ) and cumulative distribution function F(x;θ), where θ denotes the
parameters of the distribution. For a specified upper limit u, the kth–order limited
moment of this distribution is defined as follows:
$$E\left[(X \wedge u)^k\right] = \int_0^u x^k f(x)\,dx + u^k \int_u^\infty f(x)\,dx = \int_0^u x^k f(x)\,dx + u^k\,\bigl(1 - F(u)\bigr)$$
Because the expression needs only F(x), you need to specify only the CDF function
for the distribution. Limited moments are often used in insurance applications to
compute the expected amount to be paid when the policy limit is set at a
certain value.
You can control the numerical integration process with various options. Here is an
example of an options array:
array opts[4] epsilon initial maxiter (1.0e-5 1 6);
In this array, epsilon (desired-accuracy)=1.0e-5, initial (initial-step-size)=1, and
maxiter (maximum-iterations)=6.
You can examine the return status of the function by checking opts[4].
return(m);
endsub;
quit;
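The DATA step later in this section calls LOGN_LIMMOMENT(order, limit, mu, sigma, rc) from the library Work.Mylimmom. A minimal sketch of such a wrapper follows, assuming that it passes the default LIMMOMENT options and returns the LIMMOMENT return code through its last argument. The argument names are illustrative.

```sas
/* Sketch of LOGN_LIMMOMENT (assumed definition; OPTS uses the defaults). */
options cmplib=(work.mycdf);    /* makes LOGN_CDF available */
proc fcmp outlib=work.mylimmom.functions;
   function logn_limmoment(r, limit, mu, sigma, rc);
      outargs rc;                  /* pass the LIMMOMENT status back */
      array opts[4] / nosymbols;
      opts[1] = 1.0e-8;            /* desired-accuracy */
      opts[2] = 1;                 /* initial-step-size */
      opts[3] = 8;                 /* maximum-iterations */
      opts[4] = .;                 /* return-code: set by LIMMOMENT */
      m = limmoment('logn_cdf', opts, r, limit, mu, sigma);
      rc = opts[4];
      return(m);
   endsub;
quit;
```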
The preceding code assumes that you have stored the definition of the LOGN_CDF
function in an FCMP library called Work.Mycdf using a PROC FCMP step as follows:
proc fcmp outlib=work.mycdf.functions;
function logn_cdf(x, Mu, Sigma);
if (x >= constant('MACEPS')) then do;
z=(log(x) - Mu)/Sigma;
return(CDF('NORMAL',z));
end;
return (0);
endsub;
quit;
You can now invoke the LOGN_LIMMOMENT function from a DATA step as shown
below. Note that the location of the LOGN_CDF and LOGN_LIMMOMENT functions
must be specified with an appropriate value for the CMPLIB= option before you
execute the DATA step:
options cmplib=(work.mycdf work.mylimmom);
data _null_;
do order=1 to 3;
rcode=.;
m=logn_limmoment(order, 100, 5, 0.5, rcode);
if (rcode > 0) then
put "ERROR: Limited moment could not be computed.";
else
put 'Moment of order ' order ' with limit 100 = ' m;
end;
run;
READ_ARRAY Function
Reads data from a SAS data set into a PROC FCMP array variable.
Category: Array
Syntax
rc = READ_ARRAY(data_set_name, array_variable <, 'column_name_1',
'column_name_2' , ... >);
Required Arguments
rc
is 0 if the function is able to successfully read the data set.
data_set_name
specifies the name of the data set from which the array data is read.
Data_set_name must be a character literal or variable that contains the member
name (libname.memname) of the data set to be read from.
array_variable
specifies the PROC FCMP array variable into which the data is read.
Array_variable must be a local temporary array variable because the function
might need to grow or shrink its size to accommodate the size of the data set.
Optional Argument
column_name
specifies optional names for the specific columns of the data set that are read.
Details
When SAS translates between an array and a data set, the array is indexed as
[row,column].
Only arrays that are declared in functions and CALL routines, or arrays that are
declared with the /NOSYMBOLS option, can be resized.
The READ_ARRAY function attempts to dynamically resize the array to match the
dimensions of the input data set. This means that the array must be dynamic. That
is, the array must be declared either in a function or CALL routine or declared with
the /NOSYMBOLS option.
Example
This example creates and reads a SAS data set into an FCMP array variable.
options nodate pageno=1;
data account;
input acct price cost;
datalines;
1 2 3
4 5 6
;
run;
proc fcmp;
array x[2,3] / nosymbols;
rc=read_array('account',x);
put x=;
run;
proc fcmp;
array x[2,2] / nosymbols;
rc=read_array('account', x, 'price', 'acct');
put x=;
run;
x[1, 1]=1 x[1, 2]=2 x[1, 3]=3 x[2, 1]=4 x[2, 2]=5 x[2, 3]=6
RUN_MACRO Function
Executes a predefined SAS macro.
Syntax
rc = RUN_MACRO ('macro_name' <, variable_1, variable_2, ...>);
Required Arguments
rc
is 0 if the function is able to submit the macro. The return code indicates only
that the macro call was attempted. To determine whether the macro executed as
expected, have the macro set a SAS macro variable that corresponds to a PROC
FCMP variable, and check that variable after the call.
macro_name
specifies the name of the macro to be run.
Optional Argument
variable
specifies optional PROC FCMP variables, which are set by macro variables of
the same name. These arguments must be PROC FCMP double or character
variables.
Before SAS executes the macro, SAS macro variables are defined with the same
name and value as the PROC FCMP variables. After SAS executes the macro,
the macro variable values are copied back to the corresponding PROC FCMP
variables.
Examples
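The DATA steps in these examples call a SUBTRACT_MACRO function that uses RUN_MACRO to run a macro named TESTMACRO. A minimal sketch of both definitions follows, assuming that TESTMACRO computes P as A minus B (consistent with the p=4.6 result for a=5.3 and b=0.7) and that the function is stored in Sasuser.Ds, as in the RUN_SASFILE example.

```sas
/* Hypothetical macro: RUN_MACRO defines A and B from the PROC FCMP
   variables of the same name; the macro sets P. */
%macro testmacro;
   %let p = %sysevalf(&a - &b);
%mend testmacro;

/* Wrap RUN_MACRO in a PROC FCMP function so that a DATA step can call it. */
proc fcmp outlib=sasuser.ds.functions;
   function subtract_macro(a, b);
      rc = run_macro('testmacro', a, b, p);  /* P is copied back after the macro runs */
      if rc = 0 then return(p);
      else return(.);
   endsub;
run;

options cmplib=(sasuser.ds);
```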
data _null_;
a=5.3;
b=0.7;
p=.;
p=subtract_macro(a, b);
put p=;
run;
Example Code 25.1 Results from Executing a Predefined Macro with PROC FCMP
p=4.6
For more information about the DOSUBL function, see “DOSUBL Function” in SAS Functions and CALL
Routines: Reference.
data _null_;
a=5.3;
b=0.7;
p=.;
p=subtract_macro(a, b);
put p= '(RUN_MACRO function and a DATA step)';
run;
%global a b p;
%put p=&p;
/* The value should not yet be known. */
%let a=5.3;
%let b=0.7;
data _null_;
rc=dosubl('%testmacro');
run;
%put p=&p (DOSUBL function);
Example Code 25.2 Results from the RUN_MACRO and the DOSUBL Functions
• The third section of the program creates the SALARIES data set and divides the
data set into four separate data sets depending on the value of the variable
Department.
• The fourth section of the program writes the results to the output window.
/* Use the DATA step to separate the SALARIES data set into four separate */
/* departmental data sets (BAD, DDG, PPD, and STD). */
data salaries;
input Department $ Name $ WageCategory $ WageRate;
datalines;
BAD Carol Salaried 20000
BAD Beth Salaried 5000
BAD Linda Salaried 7000
BAD Thomas Salaried 9000
BAD Lynne Hourly 230
DDG Jason Hourly 200
DDG Paul Salaried 4000
PPD Kevin Salaried 5500
PPD Amber Hourly 150
PPD Tina Salaried 13000
STD Helen Hourly 200
STD Jim Salaried 8000
;
run;
Output 25.17 Results for Calling a DATA Step within a DATA Step
RUN_SASFILE Function
Executes SAS code in a fileref that you specify.
Syntax
rc = RUN_SASFILE ('fileref_name' <, variable-1, variable-2, ...>);
Required Arguments
rc
is 0 if the function is able to submit a request to execute the code that
processes the SAS file. The return code indicates only that the call was
attempted.
fileref_name
specifies the name of the SAS fileref that points to the SAS code.
Optional Argument
variable
specifies optional PROC FCMP variables that are set by macro variables of the
same name. These arguments must be PROC FCMP double or character
variables.
Before SAS executes the code that references the SAS file, the SAS macro
variables are defined with the same name and value as the PROC FCMP
variables. After execution, these macro variable values are copied back to the
corresponding PROC FCMP variables.
Example
The following example is similar to the first example for RUN_MACRO except that
RUN_SASFILE uses a SAS file instead of a predefined macro. This example assumes
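that the file test.sas, which uses the macro variables A and B and sets the macro
variable P, is located in the current directory.
/* test.sas */
data _null_;
call symput('p', &a - &b);
run;
/* Assign the fileref that RUN_SASFILE uses (path assumed). */
filename myfileref 'test.sas';
/* Set up a function in PROC FCMP and call it from the DATA step. */
proc fcmp outlib=sasuser.ds.functions;
function subtract_sasfile(a, b);
rc=run_sasfile('myfileref', a, b, p);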
if rc=0 then return(p);
else return(.);
endsub;
run;
options cmplib=(sasuser.ds);
data _null_;
a=5.3;
b=0.7;
p=.;
p=subtract_sasfile(a, b);
put p=;
run;
SOLVE Function
Computes implicit values of a function using the Gauss-Newton method.
Syntax
answer = SOLVE('function-name', options-array, expected-value,
argument-1 <, argument-2, ...>);
Required Arguments
answer
specifies the value that is returned from the SOLVE function.
'function-name'
specifies the name of the function. Enclose function-name in quotation marks.
options-array
specifies an array of options to use with the SOLVE function. Options-array is
used to control and monitor the root-finding process. Options-array can be a
missing value (.), or it can have up to five of the following elements in the
following order:
initial-value
specifies the starting value for the implied value. The default for the first call
is 0.001. If the same line of code is executed again, then options-array uses
the previously found implied value.
absolute-criterion
specifies a value for convergence. The absolute value of the difference
between the expected value and the predicted value must be less than the
value of absolute-criterion for convergence.
Default 1.0e-12
relative-criterion
specifies a value for convergence. If the change in the computed implied
value is less than the value of relative-criterion, then convergence is
assumed.
Default 1.0e-6
maximum-iterations
specifies the maximum number of iterations to use to find the solution.
Default 100
solve-status
can be one of the following values:
0 successful.
1 could not decrease the error.
2 could not compute a change vector.
3 maximum number of iterations exceeded.
4 initial objective function is missing.
expected-value
specifies the expected value of the function of interest.
argument
specifies the arguments to pass to the function that is being minimized.
Details
The SOLVE function finds the value of the specified argument that makes the
expression of the following form equal to zero.
expected-value - function-name(argument-1, argument-2, ...)
You specify the argument of interest with a missing value (.), which appears in place
of the argument in the parameter list that is shown above. If the SOLVE function
finds the value, then the value that is returned for this function is the implied value.
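You can control the root-finding process with an options array such as the following:
array opts[5] initial abconv relconv maxiter status (.5 .001 1.0e-6 100 -1);
In this array, initial (initial-value)=.5, abconv (absolute-criterion)=.001,
relconv (relative-criterion)=1.0e-6, and maxiter (maximum-iterations)=100.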
The solve status is the fifth element in the array. You can display this value by
specifying opts[5] in the output list.
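As a further sketch, the following step solves sq(x) = 9 for x and prints the solve status from opts[5]. The function name SQ and the option values are illustrative assumptions. Starting from an initial value of 1, you would expect x to be close to 3 and the status to be 0 (successful).

```sas
proc fcmp;
   /* Hypothetical function whose implied argument is wanted: sq(x) = x*x */
   function sq(x);
      return(x*x);
   endsub;
   array opts[5] initial abconv relconv maxiter status
         (1 1.0e-10 1.0e-8 100 -1);
   x = solve("sq", opts, 9, .);    /* find x such that sq(x) = 9 */
   put x= 'Solve status: ' opts[5];
run;
```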
Examples
proc fcmp;
/* define the function */
function inversesqrt(x);
return(1/sqrt(x));
endsub;
y=20;
x=solve("inversesqrt", {.}, y, .);
put x;
run;
0.0025
proc fcmp;
function garkhprc(type$, buysell$, amount, E, t, S, rd, rf, sig)
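kind=pricing label='FX option pricing';
/* Assumed: BUYSELL sets the sign of the position. */
if buysell='Buy' then sign=1;
else do;
if buysell='Sell' then sign=-1;
else sign=.;
end;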
if type='Call' then
garkhprc=sign*amount*(E+t+S+rd+rf+sig);
else do;
if type='Put' then
garkhprc=sign*amount*(E+t+S+rd+rf+sig);
else garkhprc=.;
end;
return(garkhprc);
endsub;
• PUT statements are used to write the implied volatility (BSVOLTY), the initial
value, and the solve status.
options pageno=1 nodate ls=80 ps=64;
proc fcmp;
opt_price=5;
strike=50;
today='20jul2010'd;
exp='21oct2010'd;
eq_price=50;
intrate=.05;
time=exp - today;
array opts[5] initial abconv relconv maxiter status
(.5 .001 1.0e-6 100 -1);
function blksch(strike, time, eq_price, intrate, volty);
return(blkshclprc(strike, time/365.25,
eq_price, intrate, volty));
endsub;
bsvolty=solve("blksch", opts, opt_price, strike,
time, eq_price, intrate, .);
put 'Option Implied Vol:' bsvolty 'Initial value: ' opts[1]
'Solve status: ' opts[5];
quit;
Note: SAS functions and external C functions cannot be used directly in the
SOLVE function. They must be enclosed in a PROC FCMP function. In this example,
the built-in SAS function BLKSHCLPRC is enclosed in the PROC FCMP function
BLKSCH, and then BLKSCH is called in the SOLVE function.
WRITE_ARRAY Function
Writes data from a PROC FCMP array variable to a data set that can then be used by SAS programs,
macros, and procedures.
Category: Array
Note: When SAS translates between an array and a data set, the array is indexed as [row,
column].
Syntax
rc = WRITE_ARRAY(data_set_name, array_variable <, 'column_name_1',
'column_name_2', ...>);
Required Arguments
rc
is 0 if the function is able to successfully write the data set.
data_set_name
specifies the name of the data set to which the array data is written.
Data_set_name must be a character literal or variable that contains the member
name (libname.memname) of the data set to be created.
array_variable
specifies the PROC FCMP array or matrix variable whose contents are written
to data_set_name.
Optional Argument
column_name
specifies optional names for the columns of the data set that are created.
Examples
proc fcmp;
array x[4,5] (11 12 13 14 15 21 22 23 24 25 31 32 33 34 35 41 42 43
44 45);
rc=write_array('work.numbers', x);
run;
1 11 12 13 14 15
2 21 22 23 24 25
3 31 32 33 34 35
4 41 42 43 44 45
proc fcmp;
array x[2,3] (1 2 3 4 5 6);
rc=write_array('numbers2', x, 'col1', 'col2', 'col3');
run;
Output 25.21 Results from Using the WRITE_ARRAY Function to Specify Column
Names
Obs col1 col2 col3
1 1 2 3
2 4 5 6
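The following sketch writes an array with WRITE_ARRAY and reads it back with READ_ARRAY into a /NOSYMBOLS array, which READ_ARRAY can resize to match the data set. The data set name Work.Roundtrip and the array name Y are hypothetical choices.

```sas
/* Sketch: round trip between a PROC FCMP array and a data set. */
proc fcmp;
   array x[2,3] (1 2 3 4 5 6);
   array y[1,1] / nosymbols;    /* dynamic array: READ_ARRAY can resize it */
   rc = write_array('work.roundtrip', x, 'col1', 'col2', 'col3');
   put rc=;                     /* 0 means that the data set was written */
   rc = read_array('work.roundtrip', y);
   put rc= y=;                  /* y now has the dimensions of the data set */
run;
```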
Chapter 26
FCmp Function Editor
With the FCmp Function Editor, you can view functions in a package declaration as
well as create new functions. You can add these new functions to an existing
package, or create a new package declaration.
If you are not working in the Windows operating environment, or if SAS is not
installed locally, then you are prompted for your authorization credentials, which
are your user ID and password.
To open the FCmp Function Editor, select Solutions → Analysis → FCmp Function
Editor from the menu in your SAS session. The following dialog box appears:
Figure 26.1 Initial Dialog Box for the FCmp Function Editor
After you enter your user ID and password and click Log On, SAS establishes a
connection to a port. A window that displays your libraries appears:
In the window above, you can see that the left pane lists the functions that are in
the SASHELP and SASUSER libraries. The WORK library is empty. You cannot
access the WORK library directly from a spawning SAS session. The FCmp Function
Editor remaps the WORK library from the spawning SAS session to the location of
OLD_WORK so that you can access the contents of WORK from OLD_WORK.
Open a Function
To open a function, select a library from the left pane, expand the library, and drill
down until a list of functions appears. Double-click the name of the function that
you want to open.
If you open a function from a read-only library, a window similar to the following
appears:
In the window above, the AMORLINC_SLK function is selected from the read-only
SASHELP library. Use the scroll bar to scroll to the top of the function.
If you open a function from a library to which you have Write access, a window
similar to the following appears:
You can see that there is a difference in the windows that appear depending on
whether the library has Read-Only access or Write access. If the library has Write
access, you can enter information in the top section of the window that you are
viewing. These fields are the same fields that you use when you create a new
function. For a description of the fields, see “Creating a New Function.”
The upper right corner of the FCmp Function Editor contains a field called Open
Views. Click the arrow to list the functions that are open. When you select a
function, the window for that function is brought to the foreground.
Two icons that you can use to alter the display of your functions are located to the
left of the Open Views field:
Move a Function
You can move a function to a different library, data set, or package. To move a
function, select a function in the left pane. Right-click the function, and select
Move from the menu. The following dialog box appears:
In the Move Function dialog box, you can perform the following tasks:
• enter a new name for the function
The descriptions of the fields in the Move Function dialog box are listed below:
Name
specifies the new name for the function.
Library
specifies the library to which you move the function. Use the menu in
the Library field to select a library.
Data Set
specifies the data set to which you move the function. Enter the name
of the data set, or click the down arrow in the Data Set field to select a data set.
If you do not choose a data set, then the value in this field defaults to
FUNCTIONS.
Package
specifies the name of the package to which you move the function. Enter the name
of the package, or click the down arrow in the Package
field to select a package. If you do not choose a package, then the value in this
field defaults to PACKAGE.
When you click OK, the following dialog box appears, cautioning you about the
move:
CAUTION
Other functions and macros that reference the function that you want to move
are not updated with the new function location. This situation can cause
referencing objects such as macros to be out of synchronization.
Close a Function
When you right-click the function name in the left pane and select Close, the
window that displays that function closes. You can also close the function by
clicking OK in the bottom right corner of the window that displays the function.
Duplicate a Function
You can duplicate (copy) a function that you are viewing to an existing or new
package or library to which you have Write access. To duplicate a function, select
the function in the left pane. Right-click the function and select Duplicate from the
menu. The following dialog box appears:
The fields in this dialog box automatically display the function name, library, data
set, and package of the function that you want to duplicate. You can change these
fields when you duplicate the function.
Rename a Function
Use the Rename dialog box to rename a function within a given package. You must
have Write access to the library that contains the function. When you rename a
function, the new function resides in the same library as the original function.
To rename a function, select the function in the left pane. Right-click the function
and select Rename from the menu. Enter the new name of the function and click
OK.
CAUTION
Rename enables you to rename a function within a given package. Just as
with moving a function, the renaming of a function does not modify
dependent macros and other entities.
Delete a Function
You can delete a function from a library to which you have Write access. To delete a
function, select the function that you want to delete. Right-click the function and
select Delete from the menu. The following dialog box appears, cautioning you
about the impact that Delete has on other items:
Creating a New Function
The upper right corner of the window contains two buttons: Function and
Subroutine. Click one of the buttons depending on whether you want to create a
new function or a new subroutine.
Package
specifies the name of the package that contains the new function or subroutine.
Enter the name of the package, or click the down arrow in the Package field to
select a package. The Package field is a required field. If you do not specify a
value, the value in this field defaults to PACKAGE.
Kind
enables you to group functions or subroutines within a given package. Four
predefined kind groupings are available and are typically used with SAS Risk
Management:
• Project
• Instrument Pricing
• Instrument Input
You can use one of these four groupings, or enter your own kind value in the
Kind field. The function tree in the left pane groups the functions in a package
into their kind grouping, if you specified a value for Kind.
Include Libraries
specifies libraries that contain SAS code that you want to include in your
function or subroutine.
Input Parameters
specifies the arguments that you use as input to the function or subroutine.
Variable Parameter List
specifies whether the function or subroutine supports a variable number of
arguments.
Return Type
specifies whether the function or subroutine returns a character or numeric
value.
Function Body
is the area in the window in which you code your function or subroutine.
Three buttons are located at the bottom left of the newElement window:
Details
provides you with an area in which to write descriptive information (name of the
new function, list of include libraries, input parameters, and so on) about your
function or subroutine. You code your new function or subroutine in the
Function Body section. The Details tab is selected by default.
SAS Code
enables you to view the function or subroutine that you have written. The SAS
Code selection provides read-only capabilities.
Check Syntax
enables you to check the syntax for the code that you have written. If the
syntax is correct, a dialog box appears, stating that the syntax is correct. If the
syntax contains an error, a dialog box appears that describes the error. An error
message also appears in the lower left bar of the window. Syntax errors are
written to the log, which you can access from the View ► Show Log menu.
996 Chapter 26 / FCmp Function Editor
When you enter information in the descriptive portion of the Details tab, as well as
in the Function Body section, the information is converted to SAS code that you
can see when you select the SAS Code button.
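For orientation, the sketch below shows the general kind of PROC FCMP program that a function definition corresponds to. It is a minimal sketch, not the editor's exact generated code. The library WORK, the data set FUNCS, the package MATH, and the function STUDY_DAY are hypothetical names chosen for illustration.

```sas
/* Hypothetical names: library WORK, data set FUNCS, package MATH,  */
/* function STUDY_DAY.                                              */
proc fcmp outlib=work.funcs.math;      /* library.data-set.package */
   /* Two numeric input parameters; the function returns a number  */
   function study_day(intervention_date, event_date);
      n = event_date - intervention_date;
      if n >= 0 then n = n + 1;        /* count the intervention day as day 1 */
      return (n);
   endsub;
run;

/* Point CMPLIB= at the data set so that later steps can find the function */
options cmplib=work.funcs;

data _null_;
   d = study_day('01JAN2024'd, '10JAN2024'd);
   put d=;                             /* writes d=10 to the log */
run;
```

The Package and Input Parameters fields correspond to the third level of OUTLIB= and to the argument list of the FUNCTION statement in this sketch.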
Log Window
To display the Log window, select View ► Show Log from the menu. When you
display the Log window, you can view system, application, and program results by
selecting the tabs that are located in the upper left corner of the window.
Click the SAS tab to view the contents of the SAS log. The log contains output
from the SAS server, along with the commands that are sent to SAS, which give
context to that output.
The following display shows the Log window with a SAS log displayed:
Viewing the Log Window, Function Browser, and Data Explorer 997
The System window contains two vertical tabs that are located in the upper right
section of the window. These tabs provide information about messages that might
be of interest:
System.out
displays system output if messages are routed to this location.
System.err
displays error messages if the messages are routed to this location.
The Log window contains three buttons that are located at the bottom right of the
window:
Save Log
saves the log output to a file that you choose.
Clear One
clears the results in the active window.
Clear Every
clears the results in all three of the windows.
The Find button is located at the bottom left of the window. This button opens a
dialog box that enables you to search your output. For example, searching for
ERROR when the SAS tab is selected enables you to quickly find errors in the SAS
log.
Function Browser
The Function Browser displays all of the functions that are listed in the left pane of
the window. You can filter this list of functions to display a subset of the functions.
To display the Function Browser, select View ► Show Function Browser from the
menu. A window similar to the following appears:
The partial output that is displayed above shows the functions in the application
tree. You can filter the output and create a subset of the functions by entering your
criteria in the Function Browser fields that are located above the list of functions.
These fields are Library Name, Data Set Name, Package Name, and Function
Name.
In the following display, the Package Name field is used as the filter. When you
click the OK button in the bottom right corner of the window, or the Find button
in the upper right corner, the following window appears:
The math functions that are listed are a subset of all of the functions.
You can enter information in the fields that you choose, depending on your filter
criteria. For example, if you enter a value, such as SASHELP, in the Library Name
field, then all of the functions that are in the SASHELP library appear.
Data Explorer
The Data Explorer enables you to view the data in a data set that you select.
To display the Data Explorer, select View ► Show Data Explorer from the menu. A
window similar to the following appears:
The Data Explorer window displays data set information based on the data set you
select from the left pane.
By clicking the column headings, you can move the columns to reposition them in
the display. When you click OK in the lower right section of the window, the
changes that you made are saved.
27
FEDSQL Procedure
The FedSQL language is the SAS implementation of the ANSI SQL:1999 core
standard. It provides support for extended data types, such as DECIMAL, INTEGER,
and VARCHAR, and other ANSI 1999 core compliance features and proprietary
extensions. FedSQL provides data access technology that brings a scalable,
threaded, high-performance way to access, manage, and share relational data in
multiple data sources. Beginning in April 2019, it also supports some non-relational
data sources. When possible, FedSQL queries are optimized with multithreaded
algorithms to resolve large-scale operations.
For applications, FedSQL provides a common SQL syntax across all of the data
sources that it supports. FedSQL is a vendor-neutral SQL dialect. Applications can
submit the same FedSQL queries to all FedSQL data sources instead of having to
submit queries in the SQL dialect that is specific to the data source. In addition, a
single FedSQL query can target data in several data sources and return a single
result set.
Using the FEDSQL procedure, you can submit FedSQL language statements to SAS
and third-party data sources that are accessed with SAS and SAS/ACCESS library
engines. Or, if you have SAS Cloud Analytic Services (CAS) configured, you can
submit FedSQL language statements to the CAS server.
To execute FedSQL statements on the CAS server, specify the SESSREF= (or
SESSUUID=) connection option with a CAS session name in the procedure
statement. Either load tables into your CAS session using other tools and query the
tables with PROC FEDSQL, or set an active caslib for your CAS session and query
tables from the caslib by name. When you query unloaded tables from the active
caslib, FedSQL passes requests that are eligible for implicit pass-through to the
data source for processing or dynamically loads the external data into your CAS
session for processing.
Note: Do not use a CAS engine libref with PROC FEDSQL. When SESSREF= (or
SESSUUID=) is specified, PROC FEDSQL makes a direct connection to the CAS
server. The procedure does not need the CAS engine. PROC FEDSQL silently
passes requests to the fedSQL.execDirect action and the action executes your
program on the CAS server.
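A minimal sketch of the SESSREF= workflow, assuming a reachable CAS server. The session name MySess matches the name used later in this chapter, but the host, port, caslib name MyData, its path, and the table Sales (assumed to be a file that FedSQL can load from that caslib) are hypothetical. Note that no CAS engine libref is assigned.

```sas
/* Hypothetical host and port */
cas mysess host="cloud.example.com" port=5570;

/* Hypothetical path-based caslib in the MySess session */
caslib mydata datasource=(srctype="path") path="/data/mydata" sessref=mysess;

/* SESSREF= sends the statements directly to the CAS server */
proc fedsql sessref=mysess;
   create table mydata.sales_summary as
      select region, sum(amount) as total_amount
         from mydata.sales
         group by region;
quit;

/* End the CAS session; its in-memory tables are released */
cas mysess terminate;
```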
The FedSQL language supports limited functionality in CAS. For an overview of the
FedSQL functionality that is available on the CAS server, see “Using FedSQL in SAS
Cloud Analytic Services” on page 1021.
For information about the FedSQL language statements that are supported for data
accessed with SAS library engines, see SAS FedSQL Language Reference.
Benefits of FedSQL
FedSQL provides the following features that are not provided in the SAS SQL
procedure.
n FedSQL conforms to the SQL 1999 ANSI standard. This conformance allows it to
process queries in its own language as well as the native languages of other
DBMSs that conform to the ANSI 1999 standard.
n FedSQL supports many more data types than previous SAS SQL
implementations. Traditional DBMS access through SAS/ACCESS LIBNAME
engines translates target DBMS data types to and from two legacy SAS data
types, which are SAS numeric and SAS character. When FedSQL connects to a
DBMS with a libref, the language matches or translates the target data source’s
definition to the FedSQL data types, as appropriate, which allows greater
precision. When FedSQL connects to a DBMS with a caslib, the language
translates the data source’s definition to native CAS data types.
n FedSQL handles federated queries, which access data from multiple data
sources and return a single result set. A federated query communicates with
more than one data source, accesses that data, and performs operations
against it. FedSQL can also break apart a single SQL query that is connected
to multiple databases and send the parts down to the individual databases.
n FedSQL supports implicit SQL pass-through. That is, FedSQL statements are
translated into data source-specific code internally so that they can be passed
directly to the data source for processing. There is no need to submit requests in
the native SQL of each data source to get the benefits of native processing.
FedSQL implicit pass-through improves query response time and enhances
security.
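As an illustration of a federated query, the following sketch joins a SAS data set with an Oracle table in a single statement. The librefs, the path, the Oracle connection values, and the table and column names are all hypothetical, and the sketch assumes that SAS/ACCESS Interface to Oracle is licensed.

```sas
/* Hypothetical Base SAS library */
libname mybase base "/data/sasdata";
/* Hypothetical Oracle library; replace the credentials and path */
libname myora oracle user=myuser password=mypwd path=orapath;

proc fedsql;
   /* One query, two data sources, one result set */
   select c.customer_id, c.cust_name, o.order_total
      from mybase.customers c
      inner join myora.orders o
         on c.customer_id = o.customer_id;
quit;
```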
1004 Chapter 27 / FEDSQL Procedure
For more information about the FedSQL functionality for SAS libraries, see SAS
FedSQL Language Reference. For information about FedSQL functionality when you
target the CAS server, see SAS Viya: FedSQL Programming for SAS Cloud Analytic
Services.
Table 27.1 Data Sources for Which FedSQL Supports SAS Library Access
DB2 * * DB2
Greenplum * *
MySQL * *
Netezza * *
ODBC-compliant databases * *
Oracle * *
SAP * * Read-only
SAP IQ * *
SAS Scalable Performance Data (SPD) Engine data sets * *
Spark * SAS/ACCESS to Spark is available on Linux only in SAS 9.4M7.
Teradata * * Teradata
Yellowbrick * SAS/ACCESS to Yellowbrick has limited availability for SAS 9.4 only, starting in SAS 9.4M7.
For information about the statements and data types supported for each data
source through a SAS library, see SAS FedSQL Language Reference.
When you submit FedSQL language statements to CAS, the procedure reads data
from files and in-memory tables and creates CAS session tables. You can preload
data into your CAS session with PROC CASUTIL. Or you can use a caslib to
dynamically load data into your CAS session. A caslib uses a SAS Viya Data
Connector (or SAS Viya Data Connector Accelerator) to access data from a
corresponding data source for processing in CAS. For information about available
SAS Viya Data Connectors, see SAS Cloud Analytic Services: User’s Guide.
FedSQL has different functionality on the CAS server than it has in a SAS library.
For more information, see “Using FedSQL in SAS Cloud Analytic Services” on page
1021.
For information about how to connect to supported data sources with PROC
FEDSQL, see “Data Source Connection” on page 1017.
PROC FEDSQL  Specify that the subsequent input is FedSQL language statements.  Ex. 1, Ex. 2, Ex. 3, Ex. 4, Ex. 5, Ex. 6, Ex. 7, Ex. 8
Requirement: Follow the PROC FEDSQL statement with one or more FedSQL language
statements. See SAS FedSQL Language Reference for information about FedSQL
statements that are supported in a SAS library. See SAS Viya: FedSQL Programming
for SAS Cloud Analytic Services for information about FedSQL statements that are
supported in a CAS session.
Interactions: When creating tables with PROC FEDSQL, note that by default, you cannot
overwrite an existing table. You must first drop the table with the DROP TABLE
statement. If you specify the DROP TABLE statement in the same procedure
execution as the CREATE TABLE statement, you must specify the FORCE
statement option. For more information, see “DROP TABLE Statement” in SAS
FedSQL Language Reference. In CAS, you can overwrite an existing table by
specifying the REPLACE= table option in the CREATE TABLE statement. For more
information, see “REPLACE= Table Option” in SAS Viya: FedSQL Programming for
SAS Cloud Analytic Services.
The procedure processes nonexistent values as SAS missing values by default. In a
SAS library, you can specify ANSIMODE to request that nonexistent values are
processed as ANSI SQL null values.
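The sketch below shows both ways to replace a table. The libref MyFiles, the session MySess, the caslib MyData, and the table names are hypothetical; the first step assumes that MyFiles.Table1 already exists, and the second assumes an existing CAS session and caslib.

```sas
/* SAS library: DROP TABLE and CREATE TABLE in one execution need FORCE */
proc fedsql;
   drop table myfiles.table1 force;
   create table myfiles.table1 (x double);
quit;

/* CAS: the REPLACE= table option overwrites an existing table */
proc fedsql sessref=mysess;
   create table mydata.table1 {options replace=true} as
      select * from mydata.table2;
quit;
```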
Note: As of September 2023, Microsoft Azure Active Directory (Azure AD) was renamed
to Microsoft Entra ID.
Examples: “Example 1: Creating a SAS Data Set” on page 1027
“Example 2: Joining Tables from Multiple SAS Libraries” on page 1031
“Example 3: Querying Data Using a Correlated Subquery” on page 1033
“Example 4: Creating and Using a DBMS Index to Perform a Join” on page 1034
“Example 5: Using a DS2 Package in an Expression” on page 1036
“Example 6: Querying Data in CAS” on page 1038
“Example 7: Explicitly Loading and Joining Tables in CAS” on page 1041
“Example 8: Joining Tables from Multiple CAS Libraries” on page 1046
Syntax
PROC FEDSQL <connection-option><processing-options>;
LIBS=libref | (libref1 libref2 ...librefn)
restricts the default data source connection to the specified libref(s). All
other librefs are ignored.
SESSREF=session-name
specifies to run the FedSQL statements in a CAS session. The CAS
session is identified by its session name.
SESSUUID="session-uuid"
specifies to run the FedSQL statements in a CAS session. The CAS
session is identified by its universally unique identifier (UUID).
General Processing
_METHOD
prints a brief text description of the FedSQL query plan.
_POSTOPTPLAN
prints an XML tree illustrating the FedSQL query plan.
ANSIMODE
specifies that nonexistent values in CHAR and DOUBLE columns are
processed as ANSI SQL null values.
AUTOCOMMIT | NOAUTOCOMMIT
specifies whether updates are automatically committed (that is, saved to
a table) after a default number of rows are updated, and whether rollback
is available.
CNTL=(parameter)
specifies optional control parameters for FedSQL query planning and
query execution in CAS.
ERRORSTOP | NOERRORSTOP
specifies whether the procedure stops executing if it encounters an error.
EXEC | NOEXEC
specifies whether a statement should be executed after its syntax is
checked for accuracy.
LABEL | NOLABEL
specifies whether to use the column label or the column name as the
column heading.
MEMSIZE=n | nM | nG
specifies a limit for the amount of memory that is used for an underlying
query (such as a SELECT statement), so that allocated memory is
available to support other PROC FEDSQL operations.
NOPRINT
suppresses the normal display of results.
NUMBER
specifies to include a column named Row, which is the row (observation)
number of the data as the rows are retrieved.
STIMER
specifies to write a subset of system performance statistics, such as
time-elapsed statistics, to the SAS log.
XCODE=ERROR | WARNING | IGNORE
controls the behavior of the SAS session when an NLS transcoding failure
occurs.
Optional Arguments
_METHOD
prints a brief text description of the FedSQL query plan. A FedSQL query is
broken into stages. Each stage of execution requires a stand-alone SQL query.
This option generates a brief text description of the nodes and stages in the
query plan. The information is written to the SAS log.
The behavior of this option changed in SAS Viya 3.5. In previous versions
of SAS Viya, _METHOD returned the stage query and the number of
threads used by a request. Beginning in SAS Viya 3.5, that information
and more can be obtained with the SHOWSTAGES parameter of the CNTL= option. For
more information, see “SHOWSTAGES” on page 1012.
_POSTOPTPLAN
prints an XML tree illustrating the FedSQL query plan. A FedSQL query is
broken into stages. Each stage of execution requires a stand-alone SQL query.
This option generates an XML tree that illustrates each stage of the FedSQL
query plan and the results from each execution stage. The information is written
to the SAS log.
The XML tree can be very long. You might want to use the _METHOD
option instead.
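A minimal sketch of requesting the brief query plan. It assumes that the MyFiles libref is assigned and that MyFiles.Table1 exists; the plan description is written to the SAS log, not to the output.

```sas
/* Write a brief text description of the query plan to the SAS log */
proc fedsql _method;
   select x from myfiles.table1 where x > 1.0;
quit;
```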
ANSIMODE
specifies that nonexistent values in CHAR and DOUBLE columns are processed
as ANSI SQL null values. By default, PROC FEDSQL processes nonexistent
values in CHAR and DOUBLE columns as missing values. This is how SAS
processes nonexistent values. The ANSIMODE option specifies to process
nonexistent values in CHAR and DOUBLE columns as ANSI SQL null values. It is
important to understand the differences, or data can be lost. For information
about processing differences, see “How FedSQL Processes Nulls and SAS
Missing Values” in SAS FedSQL Language Reference. All other data types use
ANSI NULL semantics all of the time.
AUTOCOMMIT | NOAUTOCOMMIT
specifies whether updates are automatically committed (that is, saved to a
table) after a default number of rows are updated, and whether rollback is
available.
CNTL=(parameter)
specifies optional control parameters for FedSQL query planning and query
execution in CAS. Multiple parameters are allowed inside of the parentheses.
Separate each parameter with a space. The following parameters are supported:
DISABLEPASSTHROUGH
disables implicit FedSQL pass-through in CAS. FedSQL attempts to use
implicit pass-through for all SQL data sources by default. In order for a
FedSQL request to be eligible for implicit pass-through in CAS, all tables
must exist in the same caslib and the tables cannot have already been
loaded into the CAS session. For other requirements, see “FedSQL Implicit
Pass-Through Facility in CAS” in SAS Viya: FedSQL Programming for SAS
Cloud Analytic Services. This option can save processing time for FedSQL
requests that contain functions that are specific to SAS, or whose tables
have already been loaded into the CAS session.
DYNAMICCARDINALITY
instructs the FedSQL query planner to perform cardinality estimations
before selecting a query plan.
The FedSQL query planner does not perform cardinality estimations before
selecting a query plan by default. Cardinality estimation can improve the
accuracy of selectivity estimates for join conditions and WHERE clause
predicates, which in turn can lead to better join-order decisions and faster
query execution times for some queries. However, dynamic cardinality does
not help all queries and adds overhead to the query planning process.
OPTIMIZEVARBINARYPRECISION
optimizes VARBINARY precision by using a precision that is appropriate to
the actual data, instead of the precision declared for the VARBINARY
columns.
The precision is optimized both when reading from a table and when creating
a new table from a result set. The greatest benefits from this option are
achieved when the declared precision is far larger than the precision of the
actual data. Then, this option can improve performance, reduce memory
footprint, and create VARBINARY columns in new tables with the needed
size rather than propagating the precision from the source table.
OPTIMIZEVARCHARPRECISION
optimizes VARCHAR precision by using a precision that is appropriate to the
actual data, instead of the precision declared for the VARCHAR columns.
The precision is optimized both when reading from a table and when creating
a new table from a result set. The greatest benefits from this option are
achieved when the declared precision is far larger than the precision of the
actual data. Then, this option can improve performance, reduce memory
footprint, and create VARCHAR columns in new tables with the needed size
rather than propagating the precision from the source table.
PRESERVEJOINORDER
joins tables in the specified order instead of an order chosen by the FedSQL
query optimizer.
REQUIREFULLPASSTHROUGH
stops processing the FedSQL request when implicit pass-through of the full
query cannot be achieved.
SHOWSTAGES
writes query execution details to the SAS log, such as the following:
n Elapsed time for each stage and for the entire action
The information is gathered when the query is executed and is written to the
client log. Do not specify NOEXEC with SHOWSTAGES; execution details
cannot be printed if the query is not executed.
For more information, see “Viewing the FedSQL Query Plan” in SAS Viya:
FedSQL Programming for SAS Cloud Analytic Services.
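To combine CNTL= parameters, list them inside the parentheses and separate them with spaces. In this sketch, the session MySess, the caslib MyData, and the tables Sales and Regions are hypothetical.

```sas
/* Estimate cardinality before planning, and log per-stage execution details */
proc fedsql sessref=mysess cntl=(dynamiccardinality showstages);
   select r.region_name, count(*) as n_sales
      from mydata.sales s
      inner join mydata.regions r
         on s.region_id = r.region_id
      group by r.region_name;
quit;
```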
ERRORSTOP | NOERRORSTOP
specifies whether the procedure stops executing if it encounters an error. In a
batch or noninteractive session, ERRORSTOP instructs the procedure to stop
executing the statements but to continue checking the syntax after it has
encountered an error. NOERRORSTOP instructs the procedure to execute the
statements and to continue checking the syntax after an error occurs.
Tip ERRORSTOP has an effect only when SAS is running in the batch or
noninteractive execution mode.
EXEC | NOEXEC
specifies whether a statement should be executed after its syntax is checked
for accuracy. EXEC specifies to execute the statement. NOEXEC specifies not
to execute the statement.
Default EXEC
Tip NOEXEC is useful if you want to check the syntax of your FedSQL
statements without executing the statements.
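A minimal syntax-check sketch, assuming that the MyFiles libref is assigned. Because NOEXEC is in effect, MyFiles.Table2 is not created; any syntax errors are reported in the SAS log.

```sas
/* Check syntax only; the statements are not executed */
proc fedsql noexec;
   create table myfiles.table2 as
      select x from myfiles.table1 where x > 1.0;
quit;
```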
LABEL | NOLABEL
specifies whether to use the column label or the column name as the column
heading.
Default LABEL
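Interaction If a column does not have a label, the procedure uses the column's
name as the column heading.
LIBS=libref | (libref1 libref2 ...librefn)
restricts the default data source connection to the specified libref(s). All
other librefs are ignored.
Alias LIBNAMES=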
Interactions If both LIBS= and SESSREF= (or SESSUUID=) are specified in the
procedure statement, SESSREF= is applied and the other option is
ignored.
To see how LIBS= affects library assignments, set the MSGLEVEL=i system
option before running a PROC FEDSQL request with LIBS=. The option
produces Include and Ignore messages for each LIBNAME statement that is
processed in the procedure request.
MEMSIZE=n | nM | nG
specifies a limit for the amount of memory that is used for an underlying query
(such as a SELECT statement), so that allocated memory is available to support
other PROC FEDSQL operations. Specify the memory limit in multiples of 1
(bytes); 1,048,576 (megabytes); or 1,073,741,824 (gigabytes). For example, the
value 23M specifies 24,117,248 bytes of memory. The value 16G specifies
17,179,869,184 bytes of memory.
Default The procedure optimizes the setting based on the amount of memory
on the host.
Note On the CAS server, MEMSIZE= specifies the memory for a single CAS
worker.
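Applying the byte multiples above, a limit of 512M works out to 512 x 1,048,576 = 536,870,912 bytes. A minimal sketch, assuming that the MyFiles libref is assigned and that MyFiles.Table1 exists:

```sas
/* Limit memory for the underlying query to 512M (536,870,912 bytes) */
proc fedsql memsize=512M;
   select * from myfiles.table1;
quit;
```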
NOPRINT
suppresses the normal display of results.
NUMBER
specifies to include a column named Row, which is the row (observation)
number of the data as the rows are retrieved.
SESSREF=session-name
specifies to run the FedSQL statements in a CAS session. The CAS session is
identified by its session name.
Note This option is supported in SAS Viya 3.1 and later and in SAS
9.4M5 and later.
SESSUUID="session-uuid"
specifies to run the FedSQL statements in a CAS session. The CAS session is
identified by its universally unique identifier (UUID).
Note This option is supported in SAS Viya 3.1 and later and in SAS
9.4M5 and later.
STIMER
specifies to write a subset of system performance statistics, such as time-
elapsed statistics, to the SAS log. When STIMER is in effect, the procedure
writes to the SAS log a list of computer resources used for each step and the
entire SAS session.
Interaction If the SAS system option FULLSTIMER is in effect, the complete list
of computer resources is written to the SAS log.
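XCODE=ERROR | WARNING | IGNORE
controls the behavior of the SAS session when an NLS transcoding failure
occurs. The following values are supported:
ERROR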
specifies that a run-time error occurs, which causes row processing to halt.
An error message is written to the SAS log. This is the default behavior.
WARNING
specifies that the incompatible character is set to a substitution character. A
warning message is written to the SAS log.
IGNORE
specifies that the incompatible character is set to a substitution character.
No messages are written to the SAS log.
Default ERROR
QUIT Statement
Stops the execution of the FEDSQL procedure.
Interaction: Unlike other SAS procedures, in SAS 9.4, PROC FEDSQL does not recognize step
boundaries. The QUIT statement is required to stop the FEDSQL procedure before
you can submit a DATA step or another procedure step.
Syntax
QUIT;
Usage: FEDSQL Procedure 1017
Details
When the FEDSQL procedure reaches the QUIT statement, all resources allocated
by the procedure are released. You can no longer execute FedSQL language
statements without invoking the procedure again. However, the connection to the
data source server is not lost, because that connection was made through the
LIBNAME statement. As a result, any subsequent invocation of the procedure that
uses the same libref executes almost instantaneously because the LIBNAME
engine is already connected to the server.
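Because PROC FEDSQL does not recognize step boundaries in SAS 9.4, end it with QUIT before the next step. A minimal sketch, assuming that the MyFiles libref is assigned and MyFiles.Table1 exists; Work.Copy is a hypothetical output data set.

```sas
proc fedsql;
   select * from myfiles.table1;
quit;                 /* required before the DATA step below */

/* The libref connection remains available after QUIT */
data work.copy;
   set myfiles.table1;
run;
```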
1 Submit the LIBNAME statement for a SAS engine first, and then submit
PROC FEDSQL. For information about how to define a LIBNAME statement, see:
SAS data sets
SAS Global Statements: Reference
Relational DBMS data sources
SAS/ACCESS for Relational Databases: Reference
MongoDB and Salesforce
SAS/ACCESS for Nonrelational Databases: Reference
SPD Engine data sets
SAS Scalable Performance Data Engine: Reference
SPD Server tables
SAS Scalable Performance Data Server: User’s Guide
This example illustrates how PROC FEDSQL accesses a data source by using the
attributes of a previously assigned libref. The LIBNAME statement assigns the
libref MyFiles, specifies the BASE engine, and then specifies the physical location
for the SAS data set. The FedSQL program then creates a SAS data set named
MyFiles.Table1 at the location specified in the LIBNAME statement.
libname myfiles base "/path/to/data";   /* hypothetical location */
proc fedsql;
create table myfiles.table1 (x double);
insert into myfiles.table1 values (1.0);
insert into myfiles.table1 values (2.0);
insert into myfiles.table1 values (3.0);
quit;
The procedure builds a data source connection string that includes all the active
librefs in the SAS session and sends it to the FedSQL program. You reference a
particular library by specifying its libref in a two-part table name in the form
libref.table-name. If you do not specify a libref, the table is created in the SAS Work
library.
PROC FEDSQL uses libref attributes for connection information only (such as
physical location). The procedure generally does not use libref attributes that
define behavior. For example, if a previously submitted LIBNAME statement for the
BASE engine specifies that SAS data sets are to be compressed, the compression
attribute is not used by the procedure. There are exceptions. For example, the
MAX_BINARY_LEN= and MAX_CHAR_LEN= LIBNAME options for Google BigQuery are
included in the internal connection string.
You can determine which LIBNAME options are used by the procedure by setting
the MSGLEVEL=i system option before submitting a LIBNAME statement. For
many data sources, you can specify a DS2 table option to override a LIBNAME
option. For example, the COMPRESS= table option can be used to request
compression of SAS data sets. The SCANSTRINGCOLUMNS= table option can be
used to override the MAX_BINARY_LEN= and MAX_CHAR_LEN= LIBNAME
options. Not all LIBNAME statement options have a corresponding table option.
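A minimal sketch of the override pattern, assuming a BASE engine library at a hypothetical path. PROC FEDSQL does not apply the COMPRESS=YES LIBNAME option, so compression is requested with the COMPRESS= table option instead. MSGLEVEL=i makes the log report which LIBNAME options the procedure uses.

```sas
/* Report LIBNAME option usage in the SAS log */
options msglevel=i;

/* Hypothetical location; COMPRESS=YES here is a behavior attribute */
libname myfiles base "/data/myfiles" compress=yes;

proc fedsql;
   /* Request compression explicitly with a table option */
   create table myfiles.table3 {options compress=yes} (x double);
   insert into myfiles.table3 values (1.0);
quit;
```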
z/OS Specifics: The physical location for the libref must be an HFS path
specification.
You must continue to qualify table names with a libref, even when LIBS= specifies
only one library; otherwise, the default library is used.
The following example illustrates the use of the LIBS= option. In the example, the
LIBS= option specifies to use only libref MyFiles.
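The following is a minimal sketch of that idea rather than the original example. The second libref, Other, and both paths are hypothetical.

```sas
libname myfiles base "/data/myfiles";   /* hypothetical path */
libname other base "/data/other";       /* hypothetical path; ignored below */

/* Only MyFiles is included in the data source connection */
proc fedsql libs=myfiles;
   /* Table names are still qualified with the libref */
   create table myfiles.table2 (y double);
   insert into myfiles.table2 values (1.0);
quit;
```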
For more information, see “LIBS=libref | (libref1 libref2 ...librefn)” on page 1013.
For information about the FedSQL statements supported in the connections made
with LIBS=, see SAS FedSQL Language Reference.
Note: You must have a CAS server configured. You must first submit the CAS
statement to establish the CAS session. To interact with data in a CAS session, you
need a caslib. You must first define a caslib (or use a pre-defined caslib). You define
a caslib and list the caslibs that are available to your CAS session by using the
CASLIB statement. For syntax information, see SAS Cloud Analytic Services: User’s
Guide. A caslib uses a SAS Data Connector (or SAS Data Connector Accelerator) to
access data. For information about SAS Data Connectors, see SAS Cloud Analytic
Services: User’s Guide.
This example establishes a CAS session named MySess on a CAS server on CAS
host cloud.example.com. It then uses the CASLIB statement to assign caslib
CASTERA. The PROC FEDSQL statement specifies the SESSREF= procedure
option with the CAS session name MySess. The FedSQL CREATE TABLE statement
identifies table Employees using the CASTERA caslib.
SAS Viya data connectors support automatic (shown here) and explicit loading of
data into CAS. For more examples, see “Example 6: Querying Data in CAS” on page
1038, “Example 7: Explicitly Loading and Joining Tables in CAS” on page 1041, and
“Example 8: Joining Tables from Multiple CAS Libraries” on page 1046.
The CAS tables that you create with PROC FEDSQL are in-memory tables. That is,
the tables are available for the duration of the CAS session and are accessible only
to the current session. PROC FEDSQL does not provide a way to persist a table to a
data source or to share the table with other CAS sessions. To persist or share a CAS
output table, use the CASUTIL procedure.
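A minimal sketch of persisting and sharing a FedSQL output table with PROC CASUTIL. The session MySess, the caslib MyData, and the table Sales_Summary are hypothetical, and MyData is assumed to be a caslib whose data source you can write to.

```sas
proc casutil sessref=mysess;
   /* Persist the in-memory table to the caslib's data source */
   save casdata="sales_summary" incaslib="mydata" outcaslib="mydata" replace;
   /* Promote the table to global scope so that other sessions can use it */
   promote casdata="sales_summary" incaslib="mydata" outcaslib="mydata";
quit;
```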
Note: Although CAS tables are in-memory tables, PROC FEDSQL will not
overwrite an existing table of the same name. Specify the REPLACE= table option
to overwrite an existing table with a replacement table. Or, use the DROP TABLE
statement to remove the initial table before creating the replacement table.
For more information about SAS Cloud Analytic Services, see SAS Cloud Analytic
Services: Fundamentals. Also see CAS statement, CASLIB statement, and CASUTIL
procedure in SAS Cloud Analytic Services: User’s Guide.
Data type support and table option support are also data source-specific. For more
information, see “Data Type Reference” in SAS FedSQL Language Reference and
“FedSQL Statement Table Options by Data Source” in SAS FedSQL Language
Reference.
The CAS server is an alternative environment for processing FedSQL queries. The
CAS server is a multiprocessing server. A FedSQL request executing on the CAS
server can perform manipulations on multiple table rows concurrently using
multiple threads and worker nodes. These multiple resources can reduce the time
required to process large tables.
The FedSQL statements that are available when you target the CAS server are a subset of
those that are available for a SAS library. The following FedSQL statements are
supported in CAS:
n CREATE TABLE, with the AS query expression
n SELECT
n DROP TABLE
n dictionary queries
n views
FedSQL output tables in CAS are in-memory tables. The tables are created in the
user session. You must use other CAS actions to promote the output tables for
global use in CAS or to store data to caslib data sources. For more information
about FedSQL functionality on the CAS server, see SAS Viya: FedSQL Programming
for SAS Cloud Analytic Services.
FedSQL table options are used to apply options when you access a data source
within PROC FEDSQL. For example, the following code applies a table option to the
SAS data set in order to specify the size of a permanent buffer page for the new
table:
proc fedsql;
create table myfiles.table1 {options bufsize=16k}(x double) ;
insert into myfiles.table1 values (1.0);
insert into myfiles.table1 values (2.0);
insert into myfiles.table1 values (3.0);
quit;
A CAS library supports different table options than a SAS library. For a list of table
options that are supported in SAS libraries, see SAS FedSQL Language Reference.
For a list of table options that are supported in CAS libraries, see SAS Viya: FedSQL
Programming for SAS Cloud Analytic Services.
Macro Variables
Note: The information in this section applies to PROC FEDSQL use in SAS
libraries.
With PROC FEDSQL, you can use a macro variable on a subsequent FedSQL
statement. However, if a macro variable occurs within a literal string, you cannot
enclose the string in double quotation marks. The macro processor uses double
quotation marks to resolve macro variable references. FedSQL considers a string
enclosed in double quotation marks to be a delimited (case sensitive) identifier
such as a table or column name.
To reference a macro variable in a literal string, use the SAS macro function
%TSLIT. %TSLIT overrides the need for double quotation marks around the literal
string and puts single quotation marks around the input value. For example, the
following statement includes the %TSLIT function to specify the &SYSHOSTNAME
macro variable, which returns the host name of the computer on which it is
executed:
select %tslit(&syshostname);
The %TSLIT macro function is stored in the default autocall macro library. For more
information, see “Referencing a Macro Variable in a Delimited Identifier” in SAS
FedSQL Language Reference.
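A fuller sketch of the same technique in a WHERE clause. The macro variable REGION and the table MyFiles.Sales are hypothetical.

```sas
%let region = East;

proc fedsql;
   /* %TSLIT(&region) resolves to 'East' in single quotation marks, */
   /* so FedSQL sees a string literal, not a delimited identifier   */
   select * from myfiles.sales
      where region = %tslit(&region);
quit;
```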
Passwords
SAS software enables you to restrict access to SAS data sets and SPD Engine data
sets by assigning SAS passwords to the files. When using PROC FEDSQL with SAS
libraries, you can assign or specify a password for a data source using the FedSQL
table options ALTER=, PW=, READ=, and WRITE=. For example, the following code
applies the FedSQL table option PW= in order to assign READ, WRITE, and ALTER
passwords to a SAS data set:
proc fedsql;
create table myfiles.table1 {options pw=luke}(x double) ;
insert into myfiles.table1 values (1.0);
insert into myfiles.table1 values (2.0);
insert into myfiles.table1 values (3.0);
quit;
1024 Chapter 27 / FEDSQL Procedure
This statement, submitted within PROC FEDSQL, specifies the PW= table option to read the password-protected data set:
select * from myfiles.table1 {options pw=luke};
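As a sketch of the other password options, the following steps assign separate READ= and WRITE= passwords when a table is created and then supply the READ= password to query it. The table name Table2 and the password values are hypothetical, and the libref MyFiles is assumed to be assigned to a SAS library.

```sas
proc fedsql;
   /* Assign a READ= password and a WRITE= password (hypothetical values).
      Multiple table options are listed inside one pair of braces. */
   create table myfiles.table2 {options read=leia write=han} (x double);
quit;

proc fedsql;
   /* Reading the table requires the READ= password. */
   select * from myfiles.table2 {options read=leia};
quit;
```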
A SAS password does not control access to a SAS file beyond SAS. You should use
the operating system-supplied utilities and file system security controls to control
access to SAS files outside SAS. For more information about SAS passwords, see
SAS FedSQL Language Reference.
CAS tables do not support SAS passwords. Therefore, you cannot assign a
password for a CAS table. When accessing password-protected data from CAS,
passwords are specified in the CAS language element used to access the data. For
example, passwords are supported in the CASUTIL procedure, which loads data
into CAS, as well as in the CASLIB statement and in the Table.addCaslib action. For
more information, see the SAS Data Connector documentation in SAS Cloud
Analytic Services: User’s Guide.
Encryption
SAS software enables you to encrypt the contents of a SAS data set, SPD Engine
data set, and SPD Server table. SAS supports SAS proprietary encryption and AES
encryption.
When using PROC FEDSQL with SAS libraries, you can encrypt output SAS data
sets, SPD Engine data sets, and SPD Server tables with SAS proprietary encryption.
This is done by specifying the FedSQL ENCRYPT= table option with the PW= or
READ= table option. A data set or table encrypted with SAS proprietary encryption
must be decrypted by specifying the PW= or READ= table option with the
appropriate password.
AES encryption is performed by specifying the ENCRYPT= table option with the
ENCRYPTKEY= table option. A data set or table encrypted with AES encryption is
later decrypted by specifying the ENCRYPTKEY= table option with the appropriate
key value.
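A minimal sketch of both encryption methods follows. It assumes that ENCRYPT=YES requests SAS proprietary encryption and ENCRYPT=AES requests AES encryption, as with the corresponding SAS data set option; the table names, password, and key value are hypothetical.

```sas
proc fedsql;
   /* SAS proprietary encryption: ENCRYPT= together with PW= */
   create table myfiles.secret1 {options encrypt=yes pw=luke} (x double);

   /* AES encryption: ENCRYPT= together with ENCRYPTKEY= */
   create table myfiles.secret2 {options encrypt=aes encryptkey=mykey123} (x double);
quit;

proc fedsql;
   /* To decrypt, supply the same password or key when reading. */
   select * from myfiles.secret1 {options pw=luke};
   select * from myfiles.secret2 {options encryptkey=mykey123};
quit;
```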
SAS supports two levels of AES encryption: AES and AES2. The new AES2 option
provides AES encryption to meet newer and more secure encryption standards.
AES2 encryption is initially supported for SAS data sets only. For more information,
see “ENCRYPT= Table Option” in SAS FedSQL Language Reference.
FedSQL currently does not support the encryption attribute for CAS tables. When
accessing SAS and AES encrypted data sets from CAS, passwords and encryption
keys are specified in the CAS language element that is used to access the data. For
example, passwords and encryption keys are supported in the CASUTIL procedure,
which loads data into CAS, as well as in the CASLIB statement and in the
Table.addCaslib action. For more information, see the SAS Data Connector
documentation in SAS Cloud Analytic Services: User’s Guide.
Table 27.2 FedSQL Data Type Translation for SAS Data Sets
For information about data type support for DBMS data sources through a SAS
library, see “Data Type Reference” in SAS FedSQL Language Reference.
1 Support for the integer data types starts in SAS Viya 3.3.
2 Support for the VARBINARY data type starts in SAS Viya 3.5. FedSQL can read and write
VARBINARY columns in CAS tables (with CREATE TABLE AS). However, the SELECT statement
does not display VARBINARY columns in some clients unless you apply the $HEX. format to the
VARBINARY column with the PUT function. In earlier SAS Viya releases, FedSQL does not return an
error when reading BINARY and VARBINARY columns. However, the data is incorrectly treated as
character data. Attempts to read BINARY data now yield an error.
Date, time, and timestamp values in CAS tables are supported as DOUBLEs, with a
SAS format applied. FedSQL applies the DATE9. SAS format to date values, the
TIME8. SAS format to time values, and the DATETIME25.6 SAS format to datetime
values.
Details
This example creates a SAS data set in a Base SAS session by submitting the
FEDSQL procedure. The example submits the FedSQL CREATE TABLE and INSERT
statements. The CONTENTS procedure then is used to describe the contents of the
SAS data set.
Program
libname mybase base 'C:\My Documents';
proc fedsql;
create table mybase.sales (prodid double not null,
custid double not null,
totals double having format comma8.,
country char(30));
insert into mybase.sales values (3234, 1, 189400, 'United States');
insert into mybase.sales values (1424, 3, 555789, 'Japan');
insert into mybase.sales values (3421, 4, 781183, 'Japan');
insert into mybase.sales values (3422, 2, 2789654, 'United States');
insert into mybase.sales values (3975, 5, 899453, 'Argentina');
quit;
proc contents data=mybase.sales;
run;
Program Description
Assign a library reference to the SAS data set to be created. The LIBNAME
statement assigns the libref MyBase, specifies the BASE engine, and specifies the
physical location for the SAS data set.
libname mybase base 'C:\My Documents';
Execute the PROC FEDSQL statement. The PROC FEDSQL statement sets up an
environment to submit FedSQL statements. By default, the PROC FEDSQL
statement generates a connection string to the data source from the librefs that are
active in the SAS session.
proc fedsql;
Enter the FedSQL CREATE TABLE statement. The statement creates the SAS data
set named MyBase.Sales. Note that the two-level name in the FedSQL CREATE
TABLE statement specifies the catalog identifier MyBase, which is the assigned
libref. The column definitions declare a NOT NULL integrity constraint on columns
ProdId and CustId and apply the SAS format COMMA8. to column Totals.
create table mybase.sales (prodid double not null,
custid double not null,
totals double having format comma8.,
country char(30));
Example 1: Creating a SAS Data Set 1029
Enter INSERT statements to populate the table with data. Note that the table
name is qualified with the catalog identifier in each statement. Each row of data
values is specified as a parenthesized, comma-delimited list that follows the
keyword VALUES.
insert into mybase.sales values (3234, 1, 189400, 'United States');
insert into mybase.sales values (1424, 3, 555789, 'Japan');
insert into mybase.sales values (3421, 4, 781183, 'Japan');
insert into mybase.sales values (3422, 2, 2789654, 'United States');
insert into mybase.sales values (3975, 5, 899453, 'Argentina');
List the contents of the SAS data set. The CONTENTS procedure describes the
contents of the SAS data set.
proc contents data=mybase.sales;
run;
Details
This example creates a new SAS data set from existing tables by using PROC
FEDSQL and the CREATE TABLE statement with the AS query expression syntax.
The query expression selects rows from three existing tables (SAS data set Sales,
SPD Engine data set Products, and Oracle table Customers) to create the new
table, Results.
Program
libname mybase v9 'C:\base';
libname myspde spde 'C:\spde';
libname myoracle oracle path=ora11g user=xxxxxx password=xxxxxx
schema=xxxxxx;
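proc fedsql;
create table mybase.results as
select products.prodid, products.product, customers.name,
sales.totals, sales.country
from myspde.products, mybase.sales, myoracle.customers
where products.prodid = sales.prodid and
customers.custid = sales.custid;
select * from mybase.results;
quit;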
Program Description
Assign three librefs. The first LIBNAME statement assigns the libref MyBase,
specifies the V9 engine, and specifies the physical location for the SAS data set to
be created. V9 is an alias for the BASE engine. The second LIBNAME statement
assigns the libref MySpde, specifies the SPDE engine, and specifies the physical
location for the existing SPD Engine data set. The third LIBNAME statement
assigns the libref MyOracle, specifies the ORACLE engine, and specifies the
connection information to the Oracle database that contains the existing Oracle
tables.
libname mybase v9 'C:\base';
libname myspde spde 'C:\spde';
libname myoracle oracle path=ora11g user=xxxxxx password=xxxxxx
schema=xxxxxx;
Execute the PROC FEDSQL statement. The PROC FEDSQL statement sets up an
environment for submitting FedSQL statements.
proc fedsql;
Create the new table. The CREATE TABLE statement creates a new SAS data set
from three existing tables (a SAS data set, an SPD Engine data set, and an Oracle
table) by using a query expression to select rows from them. The SELECT
statement retrieves the qualified columns and rows from the existing tables to
create the new SAS data set.
create table mybase.results as
select products.prodid, products.product, customers.name,
sales.totals, sales.country
from myspde.products, mybase.sales, myoracle.customers
where products.prodid = sales.prodid and
customers.custid = sales.custid;
Retrieve data in the SAS data set. This second SELECT statement displays the
contents of the output data set.
select * from mybase.results;
Details
This example illustrates querying data by using a correlated subquery. In a
correlated subquery, the WHERE clause in the subquery refers to values in a table
in the outer query. The correlated subquery is evaluated for each row in the outer
query. With correlated subqueries, FedSQL executes the subquery and the outer
query together. FedSQL can perform heterogeneous correlated subqueries, in
which the outer query and the subquery reference tables in different data
sources. FedSQL directs the subquery to be performed by the data source, which
limits the result set that is transferred from the data source.
Note: Correlated subqueries are not yet supported on the CAS server.
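As a hypothetical illustration of the pattern, the following query returns the products in SPD Engine data set Products that have at least one matching row in Oracle table Sales. The inner query refers to P.ProdId from the outer query, so it is evaluated for each outer row. The librefs, table names, and column names are assumptions borrowed from Examples 2 and 4.

```sas
proc fedsql;
   /* Correlated subquery: the inner WHERE clause refers to P.PRODID,
      a column of the outer query's table. */
   select p.prodid, p.product
      from myspde.products p
      where exists
         (select * from myoracle.sales s
             where s.prodid = p.prodid);
quit;
```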
Program
libname myspde spde 'C:\spde';
Program Description
Assign two library references. The first LIBNAME statement assigns the libref
MySpde, specifies the SPDE engine, and specifies the physical location for the SPD
Engine data set. The second LIBNAME statement assigns the libref MyOracle,
specifies the Oracle engine, and specifies the connection information to the Oracle
database.
libname myspde spde 'C:\spde';
libname myoracle oracle path=ora11g user=xxxxxx password=xxxxxx
schema=xxxxxx;
Execute the PROC FEDSQL statement. The PROC FEDSQL statement creates an
environment for submitting FedSQL statements.
proc fedsql;
Stop the procedure. Specify the QUIT statement to complete the procedure
request.
quit;
Details
This example illustrates how to create an index for an Oracle table. It then
illustrates how to use the index to perform a join of the Oracle table and an SPD
Engine data set.
Note: The CREATE INDEX statement is not supported on the CAS server.
Program
libname myspde spde 'C:\spde';
libname myoracle oracle path=ora11g user=xxxxxx password=xxxxxx
schema=xxxxxx;
proc fedsql;
create index prodid on myoracle.sales (prodid);
select * from myspde.product, myoracle.sales
where product.prodid=sales.prodid;
quit;
Program Description
Assign two library references. The first LIBNAME statement assigns the libref
MySpde, specifies the SPDE engine, and specifies the physical location for the SPD
Engine data set. The second LIBNAME statement assigns the libref MyOracle,
specifies the ORACLE engine, and specifies the connection information to the
Oracle database.
libname myspde spde 'C:\spde';
libname myoracle oracle path=ora11g user=xxxxxx password=xxxxxx
schema=xxxxxx;
Execute the PROC FEDSQL statement. The PROC FEDSQL statement creates an
environment for submitting FedSQL statements.
proc fedsql;
Create an index for the Oracle table. The CREATE INDEX statement creates an
index named ProdId in the Oracle table named Sales for the column ProdId.
create index prodid on myoracle.sales (prodid);
Retrieve columns and rows. The SELECT statement retrieves data from the SPD
Engine data set named Product and the Oracle table named Sales. Even though the
index is in the Oracle database, FedSQL can take advantage of the index to perform
the join.
select * from myspde.product, myoracle.sales
where product.prodid=sales.prodid;
Stop the procedure. Specify the QUIT statement to complete the procedure
request.
quit;
Example 5: Using a DS2 Package in an Expression
Details
The FedSQL language supports invoking user-defined DS2 package methods as
functions in the SELECT statement. This example uses PROC DS2 to create a
package named Adder, which contains a method named Add, and a table named
Numbers. It then invokes the Add method on the rows of table Numbers from PROC
FEDSQL. The package and table are created in the Work library.
Note: Package methods that are run from PROC FEDSQL can have only input
arguments in the method. For more information, see “Using DS2 Packages in
Expressions” in SAS FedSQL Language Reference.
Note: User-defined package methods are not supported on the CAS server.
Program
proc ds2;
package adder / overwrite =yes;
method add( double x, double y ) returns double;
return x + y;
end;
endpackage;
data numbers / overwrite = yes;
dcl double x y;
method init();
dcl int i;
do i = 1 to 10;
x = i; y = i * i;
output;
end;
end;
enddata;
run;
quit;
proc fedsql;
select x, y, work.adder.add( x, y ) as z from work.numbers;
quit;
Program Description
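Invoke the DS2 procedure with the PROC DS2 statement and define a package.
The PACKAGE statement creates a package named Adder. The METHOD statement
defines method Add, which returns the sum of its two DOUBLE arguments, and the
ENDPACKAGE statement ends the package definition.
proc ds2;
package adder / overwrite =yes;
method add( double x, double y ) returns double;
return x + y;
end;
endpackage;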
Create a table. This DATA statement specifies to create a table named Numbers. A
library is not specified, so the Work library is used. The table has two columns,
X and Y, of type DOUBLE. The system INIT method is called to populate the table
with values. A variable, I, is declared to serve as the loop index. A DO loop runs
for I = 1 through 10; in each iteration, it assigns I to column X and I * I to column
Y, and the OUTPUT statement writes the row. The END and ENDDATA statements
signal the completion of the DO statement, INIT method, and DATA statement
declarations.
data numbers / overwrite = yes;
dcl double x y;
method init();
dcl int i;
do i = 1 to 10;
x = i; y = i * i;
output;
end;
end;
enddata;
Submit the DS2 statements. As in the DATA step, the RUN statement executes the
DS2 language statements.
run;
Stop the DS2 procedure. The QUIT statement completes the procedure request.
quit;
Invoke the FEDSQL procedure. The PROC FEDSQL statement invokes the FEDSQL
procedure.
proc fedsql;
Submit a SELECT statement that invokes the package method on the table. The
statement selects columns X and Y from table Numbers and returns the result of
the package method expression as column Z. In this example, the method is
referenced with the three-part name Work.Adder.Add; the general form of the
name is [catalog.][schema.]package.method.
select x, y, work.adder.add( x, y ) as z from work.numbers;
Stop the FEDSQL procedure. The QUIT statement completes the PROC FEDSQL
request.
quit;
Details
This example illustrates the steps necessary to query a database table named
Employees from CAS.
Example 6: Querying Data in CAS 1039
Program
options cashost="cloud.example.com" casport=5570;
cas mysess;
caslib castera desc='Teradata Caslib'
datasource=(srctype='teradata',
dataTransferMode='serial',
username='myname',
password='mypw',
server='testserver',
db='test');
proc fedsql sessref=mysess;
select Pos, count(Pos) as Count_Pos
from castera.employees
group by Pos
having count(Pos) >= 2;
quit;
Program Description
Invoke the CAS server. The CASHOST= and CASPORT= system options specify
the name and port number of the CAS server.
options cashost="cloud.example.com" casport=5570;
Establish a session on the CAS server. The CAS statement specifies to create a
session named MySess on the CAS server.
cas mysess;
Add a caslib for the Teradata database. When submitting a request to the CAS
server, you must identify your data source with a caslib instead of a libref. The
CASLIB statement creates caslib CasTera. The caslib specifies
SrcType='teradata', DataTransferMode='serial', and the connection details for the
Teradata database. A data connect accelerator that loads data in parallel is
available for Teradata, but it is not used here. CasTera becomes the active caslib in
your CAS session.
caslib castera desc='Teradata Caslib'
datasource=(srctype='teradata',
dataTransferMode='serial',
username='myname',
password='mypw',
server='testserver',
db='test');
Specify the PROC FEDSQL statement with the SESSREF= procedure option. In
the SESSREF= procedure option, specify the CAS session name MySess. The
SESSREF= option establishes the connection to the CAS session, and it instructs
the procedure to pass the FedSQL language statements that follow to the
fedSQL.execDirect action.
proc fedsql sessref=mysess;
Specify FedSQL statements and identify the data source with caslib CasTera.
The SELECT statement lists all of the job titles in database table Employees that
have at least two employees. The table is identified by the two-part name
CasTera.Employees. When you identify tables with a two-part name, the
execDirect action handles them as follows. The FedSQL language supports
single-source, full-query implicit SQL pass-through in CAS. If the target tables
have not yet been loaded into the CAS session, they are evaluated for implicit
pass-through. Implicit pass-through passes eligible requests to the data source
for processing and loads the result set into CAS. SQL implicit pass-through is
possible only for tables that have not yet been loaded into the CAS session, and
there are other important requirements. For information about these
requirements, see “FedSQL Implicit Pass-Through Facility in CAS” in SAS Viya:
FedSQL Programming for SAS Cloud Analytic Services. Unloaded tables that are
not eligible for pass-through are automatically loaded into the CAS session for
processing by the CAS server. Tables that already exist in the CAS session are
processed by the CAS server.
select Pos, count(Pos) as Count_Pos
from castera.employees
group by Pos
having count(Pos) >= 2;
Details
In some cases, some formatting is required before data can be processed
successfully. This example explicitly loads three files that contain pipe-delimited
data into CAS. The CAS LIBNAME engine and the DATA step are used to format
and load the tables. Then, after the tables are in CAS, PROC FEDSQL is used to
join them and to create a new CAS table that contains the result set. The input
files are named Supplier, Nation, and Customer. The output CAS table is named
CASDATA.NewTable. All of the CAS tables are in-memory tables. They disappear
at the end of the CAS session unless you save or promote them by using
PROC CASUTIL.
The example assigns a caslib and a libref. The libref is mapped to the caslib in the
CAS LIBNAME statement. The DATA step references the CAS tables through the
libref. When the SESSREF= option is specified in the PROC FEDSQL statement,
FedSQL statements reference the tables through the caslib.
Note: When formatting your FEDSQL requests, be aware that leading spaces
before statements and clauses are important. Do not begin statements and clauses
flush with the left margin. If you put a line break in a quoted string, always follow
the line break with at least one blank.
Program
options cashost="cloud.example.com" casport=5570;
cas mysess;
Program Description
Invoke the CAS server. The CASHOST= and CASPORT= system options specify
the name and port number of the CAS server.
options cashost="cloud.example.com" casport=5570;
Establish a session on the CAS server. The CAS statement specifies to create a
CAS session named MySess.
cas mysess;
Assign a caslib that points to your input files. The CASLIB statement assigns the
caslib CASDATA to the location specified in the PATH= parameter. The path
specification must use an absolute pathname.
caslib casdata path='/r/ge.unx.company.com/vol/vol210/u21/myID/hold';
Assign a CAS engine libref. The LIBNAME statement specifies the libref MyCas,
the CAS engine, connection parameters for the CAS server, and the CASDATA
caslib. The LIBNAME statement invokes the CAS engine and maps the libref to the
caslib.
libname mycas cas host="cloud.example.com" port=5570 sessref=mysess
caslib=casdata;
Use the DATA step to format and load the first file into CAS. The DATA statement
specifies to create a table named MyCas.Supplier. The CAS engine creates the
output table as a CAS table. In the SAS session, the table is known as
MyCas.Supplier. In CAS session MySess, the table is known as CASDATA.Supplier.
The INFILE statement specifies to read the contents of the file by using a | (pipe
symbol) as a column delimiter. The LENGTH statement specifies column names
and lengths for the output table. (Note that the INFILE specification can be relative
to the path that is specified in the CASLIB statement.) The CAS engine creates
table Supplier in caslib CASDATA.
data mycas.supplier;
infile "/r/ge.unx.company.com/vol/vol210/u21/myID/hold/supplier.tbl"
delimiter='|';
length S_SUPPKEY 8. S_NAME VARCHAR(25) S_ADDRESS VARCHAR(40)
S_NATIONKEY 8.
S_PHONE VARCHAR(15) S_ACCTBAL 8. S_COMMENT VARCHAR(101);
input S_SUPPKEY S_NAME S_ADDRESS S_NATIONKEY S_PHONE S_ACCTBAL
S_COMMENT;
run;
Format and load the second file into CAS. This DATA statement specifies to create
a CAS table named MyCas.Nation. The INFILE statement reads the contents of
the file using a pipe symbol as a delimiter. The LENGTH statement specifies
column names and lengths for the output table. The CAS engine creates table
Nation in caslib CASDATA.
data mycas.nation;
infile "/r/ge.unx.company.com/vol/vol210/u21/myID/hold/nation.tbl"
delimiter='|';
length N_NATIONKEY 8. N_NAME VARCHAR(25) N_REGIONKEY 8. N_COMMENT
VARCHAR(152);
input N_NATIONKEY N_NAME N_REGIONKEY N_COMMENT;
run;
Format and load the third file into CAS. The DATA step specifies to create a CAS
table named MyCas.Customer. The INFILE statement reads the contents of the
file. The LENGTH statement specifies column names and lengths for the output
table. The CAS engine creates table Customer in caslib CASDATA.
data mycas.customer;
infile "/r/ge.unx.company.com/vol/vol210/u21/myID/hold/customer.tbl"
delimiter='|';
length C_CUSTKEY 8. C_NAME VARCHAR(25) C_ADDRESS VARCHAR(40)
C_NATIONKEY 8.
C_PHONE VARCHAR(15) C_ACCTBAL 8. C_MKTSEGMENT VARCHAR(10) C_COMMENT
VARCHAR(117);
input C_CUSTKEY C_NAME C_ADDRESS C_NATIONKEY C_PHONE C_ACCTBAL
C_MKTSEGMENT
C_COMMENT;
run;
Verify that the tables were created in CAS. Submit the CASUTIL procedure to list
the tables that are available in caslib CASDATA.
proc casutil;
list tables incaslib="casdata";
run;
Specify the PROC FEDSQL statement. In the PROC FEDSQL statement, specify
the SESSREF= procedure option with the name MySess to connect to the CAS
session and direct the FedSQL statements that follow to the fedSQL.execDirect
action. Because of earlier activity in the CAS session, CASDATA is the active caslib.
proc fedsql sessref=mysess;
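A join of the three CAS tables could then be submitted as in the following sketch. The join condition on the NATIONKEY columns (declared in the LENGTH statements above), the selected columns, and the row-count query are assumptions. Pairing every customer with every supplier in the same nation can produce a large result. The statements are indented, in line with the note about leading spaces.

```sas
proc fedsql sessref=mysess;
   /* Hypothetical join: each customer is paired with the suppliers in
      the same nation through the NATIONKEY columns. */
   create table casdata.newtable as
      select c.c_name, n.n_name, s.s_name
         from casdata.customer c, casdata.nation n, casdata.supplier s
         where c.c_nationkey = n.n_nationkey
           and s.s_nationkey = n.n_nationkey;
   /* Check how many rows the join produced. */
   select count(*) as row_count from casdata.newtable;
quit;
```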
Example 8: Joining Tables from Multiple CAS Libraries
Details
This example shows how to perform the join in “Example 2: Joining Tables from
Multiple SAS Libraries” on page 1031 on the CAS server. One difference is that this
example joins a SAS data set and an SPD Engine data set with a Teradata table
instead of an Oracle table.
Invoke the CAS server and start a CAS session. These steps are necessary only if a
CAS server connection and a CAS session do not already exist.
options cashost="cloud.example.com" casport=5570;
cas mysess;
Load the SAS data set Customers into your CAS session. PROC CASUTIL loads
the data set into caslib CASUSERHDFS.
proc casutil;
load data="path-to-customers-data-set" outcaslib="casuserhdfs";
quit;
Add an SPD Engine caslib. The CASLIB statement specifies to create caslib
spdeCasLib. The caslib specifies SrcType=SPDE and it specifies the path to the
directory that contains the metadata file for SPD Engine data set Products.
caslib spdecaslib Desc="SPD Engine caslib"
datasource=(srctype="spde", username="",
mdfpath="path-to-metafile",
dataTransferMode="serial");
run;
Add a Teradata caslib. This step is necessary only if the caslib has not already been
assigned in the CAS session.
caslib TDcaslib desc='Teradata Caslib'
datasource=(srctype='teradata',
username='myname',
password='mypw',
server='testserver',
db='test')
notactive;
Submit the join request. The PROC FEDSQL statement specifies the SESSREF=
procedure option, which identifies CAS session MySess. Each source table in the
FROM clause is identified with a two-part name in the form caslib.table. A SELECT
statement then displays the contents of table Results.
proc fedsql sessref=mysess;
create table results as
select products.prodid, products.product, customers.name,
sales.totals, sales.country
from spdecaslib.products, TDcaslib.sales, casuserhdfs.customers
where products.prodid = sales.prodid and
customers.custid = sales.custid;
select * from results;
quit;
28
FMTC2ITM Procedure
After you create an item store with PROC FMTC2ITM, you can use the CAS
statement with the addFmtLib action to make the item store available to CAS. For
more information, see “CAS Statement” in SAS Cloud Analytic Services: User’s Guide
and addFmtLib Action.
1050 Chapter 28 / FMTC2ITM Procedure
Statement: PROC FMTC2ITM
Task: Converts one or more format catalogs into a single item store.
Note: The item store is written as a new file. If you specify the name of an existing item
store with the ITEMSTORE option, PROC FMTC2ITM overwrites the contents of the
existing item store.
Syntax
PROC FMTC2ITM <options>;
Required Arguments
CATALOG= memname | libname.memname | (list)
specifies a catalog that is to be converted to an item store. If you do not specify
the CATALOG option, the default catalog is WORK.FORMATS.
You can specify the following values for the CATALOG option:
• A single-level name that SAS interprets as a catalog name in the WORK
library.
• A two-level name that SAS interprets as a libname.memname for a catalog.
SAS opens each catalog in the list in the order listed and writes the members of
the catalogs to the item store. Only the first occurrence of each member name is
written. For example, suppose that catalog A has members X and Y, catalog B has
members X and Z, and you specify CATALOG=(A B).
PROC FMTC2ITM Statement 1051
The resulting item store contains members X and Y from CATALOG A, and
member Z from CATALOG B.
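Under the assumptions of this example (two catalogs named A and B in the WORK library, containing the members described above), a minimal sketch of the step follows. The item store path is a placeholder, and the PRINT option is included only to display information about each member as it is written.

```sas
/* Hypothetical catalogs WORK.A (members X, Y) and WORK.B (members X, Z). */
proc fmtc2itm catalog=(a b)
              itemstore="path-to-item-store-file"
              print;
run;
/* Expected contents per the rule above: X and Y from A, and Z from B;
   B's copy of X is not written because A is listed first. */
```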
Note: You can specify only one item store with each invocation of PROC
FMTC2ITM.
ENCODING=encoding-name
specifies an encoding for a catalog or for all of the catalogs in a list.
To specify an encoding for one catalog, specify the ENCODING= option after
the catalog name, separated by a slash, as shown in this example code.
proc fmtc2itm cat=(abc.fmtlib1/encoding=utf8 abc.fmtlib2);
SAS applies the UTF8 encoding to only the Abc.Fmtlib1 catalog. The
Abc.Fmtlib2 catalog uses the session encoding.
When the ENCODING= option applies to the entire list rather than to one catalog,
SAS applies the specified encoding to all of the catalogs in the list.
If you do not specify the ENCODING= option for a catalog, then SAS uses the
session encoding for that catalog.
PROC FMTC2ITM validates all of the character data (all labels and character
range values) in a catalog to ensure that they are valid for the specified
encoding. SAS issues an error if any of the characters do not transcode
successfully.
Optional Arguments
PRINT
displays information about each catalog member that is written.
LOCALE
adds locale-sensitive prefixes to the names of members of an item store.
If you specify the LOCALE option, then the processing of the parenthetical list
of member names is different from the usual processing of member names in a
list. If any catalog has a locale suffix (of the form _xx or _xx_yy), then the
members from the catalog are written to the item store with that suffix as a
prefix. For example, suppose that catalog X_EN_US has members ABC, DEF, and
GHI, catalog X_FR_FR has members ABC, DEF, and JKL, and you submit this code:
proc fmtc2itm catalog=(x_en_us x_fr_fr) itemstore="path-to-item-store-file" locale; run;
The item store in CAS then allows ABC or DEF to be loaded correctly for either an
EN_US or an FR_FR locale. If the LOCALE option is not specified, the item store
contains members ABC, DEF, and GHI from X_EN_US, and member JKL from
X_FR_FR.
Note: When you specify PROC FMTC2ITM, your SAS session must use the
same encoding that was used when the format catalogs were created. For
example, if the EN_US locale was used when the catalogs were created, then
the session where you specify PROC FMTC2ITM must also use the EN_US
locale.
SELECT Statement
Lists the formats to place in the item store.
Tip: If you do not specify the members of the format catalog with a SELECT statement,
then all formats in the catalog are written to the item store.
Syntax
SELECT <member-list>;
Optional Argument
member-list
Contains the names of the formats to place in the item store.
Details
The SELECT statement enables you to select only the formats from a format
library that you want to add to an item store. For example, if you had a format
library that contained the formats CHOICE, SINGLE2, TESTFMTA, WHEN, and
WHERE, you could specify the WHEN and WHERE formats with the SELECT
statement. The WHEN and WHERE formats are then added to the item store, but
the CHOICE, SINGLE2, and TESTFMTA formats are not added.
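A minimal sketch of that selection follows. The catalog Work.Formats and the item store path are placeholders, and the space-separated member list is an assumption based on the SELECT syntax shown above.

```sas
/* Write only the WHEN and WHERE formats from the catalog to the item store. */
proc fmtc2itm catalog=work.formats
              itemstore="path-to-item-store-file";
   select when where;
run;
```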
Example: Migrate Formats to a CAS Session 1053
Details
This example uses the FMTC2ITM procedure to migrate user-defined formats that
are stored in one or more SAS format catalogs to a format library in a CAS session.
Program
libname orion "path-to-library";
proc format;
value $codes
"A" = "Alpha"
"B" = "Beta"
"C" = "Charlie"
"D" = "Delta";
value response
1 = "Yes"
2 = "No"
3 = "Undecided"
4 = "No response";
value MPGrating
34 - HIGH = "Excellent"
24 -< 34 = "Good"
19 -< 24 = "Fair"
LOW -< 19 = "Poor";
run;
Program Description
Create the Orion library and add the formats.
libname orion "path-to-library";
proc format;
value $codes
"A" = "Alpha"
"B" = "Beta"
"C" = "Charlie"
"D" = "Delta";
value response
1 = "Yes"
2 = "No"
3 = "Undecided"
4 = "No response";
value MPGrating
34 - HIGH = "Excellent"
24 -< 34 = "Good"
19 -< 24 = "Fair"
LOW -< 19 = "Poor";
run;
proc format library=orion.mailfmts;
value $regionCodes
"E" = "East"
"W" = "West"
"N" = "North"
"S" = "South";
run;
Create the item store with the formats. The FMTC2ITM procedure writes the
formats in format catalogs Work.Formats and Orion.Mailfmts to an item store file.
The CATALOG option specifies to search the format catalogs Work.Formats and
Orion.Mailfmts. The ITEMSTORE option specifies the path where the item store is
to be written. To select a subset of the formats in the specified format catalogs,
specify a SELECT statement in the FMTC2ITM procedure step.
proc fmtc2itm
catalog=(work.formats orion.mailfmts)
itemstore="path-to-item-store-file";
run;
Load the formats. The CAS statement ADDFMTLIB option uses the item store file
that you created with the FMTC2ITM procedure to add the format library Myfmtlib.
cas casauto addfmtlib fmtlibname="myfmtlib"
path="path-to-item-store-file"
replacefmtlib;
List the formats. The CAS statement LISTFORMAT option lists the formats in
format library Myfmtlib to the SAS log for verification.
cas casauto listformat fmtlibname="myfmtlib"
members;
Save the format library. The CAS statement SAVEFMTLIB option saves the format
library to a SASHDAT file. This step is optional.
For format libraries that you use repeatedly, saving to a caslib is a best practice.
Use the CAS statement ADDFMTLIB option with parameters CASLIB= and TABLE=
when adding a format library from a caslib.
cas casauto savefmtlib fmtlibname=myfmtlib
caslib=casuser table=myfmtlib replace;
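For example, a later session could restore the saved library from the SASHDAT table with a statement like the following sketch, which assumes that the CASUSER caslib and the Myfmtlib table written by the SAVEFMTLIB step are available.

```sas
/* Re-add the format library from the saved table instead of from
   the item store file. */
cas casauto addfmtlib fmtlibname=myfmtlib
    caslib=casuser table=myfmtlib replacefmtlib;
```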
29
FONTREG Procedure
Note: Including a system font in the SAS registry means that SAS knows where to
find the font file. The font file is not actually used until the font is called for in a
SAS program. Therefore, do not move or delete font files after you have included
the fonts in the SAS registry.
When you specify a font in a SAS program, use the three-character tag to
distinguish between fonts that have the same name:
proc report data=sashelp.class nowd
style(header)=[font_face='<ttf> Palatino Linotype'];
run;
For example, you can specify a font in a SAS program with the TEMPLATE
procedure or with the STYLE= option in PROC REPORT.
If you do not include a tag in your font specification, SAS searches the registry for
fonts with that name. If more than one font with that name is found, SAS uses the
font that has the highest rank in the following table.
SAS does not support any type of nonscalable fonts that require FreeType font
rendering. Even if they are recognized as valid fonts, they will not be added to the
SAS registry.
Font files that are not produced by major vendors can be unreliable, and in some
cases SAS might not be able to use them.
The following SAS output methods and device drivers can use FreeType font
rendering:
This code will register all of the other font files in the Windows font directory.
To remove a font by using the SAS Registry Editor, select Solutions ► Accessories
► Registry Editor. Alternatively, you can enter regedit in the command window or
at the Command ===> prompt.
In the left pane of the Registry Editor window, navigate to the
[CORE\PRINTING\FREETYPE\FONTS] key. Select the font that you want to delete, and use one of
these methods to delete it:
- Right-click the font name and select Delete from the menu.
For more information about PROC REGISTRY, see Chapter 57, “REGISTRY
Procedure,” on page 2027.
The ability to use a fileref enables you to directly use the FILENAME statement and
its features. For example, you can register available fonts by using a URL. With
fileref support, you would use a FILENAME statement and a PROC FONTREG step.
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Syntax
PROC FONTREG <options>;
Optional Arguments
MODE=ADD | REPLACE | ALL
specifies how to handle new and existing fonts in the SAS registry:
ADD
specifies to add fonts that do not already exist in the SAS registry. Do not
modify existing fonts.
REPLACE
specifies to replace fonts that already exist in the SAS registry. Do not add
new fonts.
ALL
specifies to add new fonts that do not already exist in the SAS registry and
replace fonts that already exist in the SAS registry.
Default ADD
MSGLEVEL=VERBOSE | NORMAL | TERSE | NONE
specifies the level of detail to write to the SAS log:
VERBOSE
SAS log messages include which fonts were added, which fonts were not
added, and which fonts were not understood. The log also contains a
summary that indicates the number of fonts that were added, not added, and
not understood.
NORMAL
SAS log messages include which fonts were added, and a summary that
indicates the number of fonts that were added, not added, and not
understood.
TERSE
SAS log messages include only the summary that indicates the number of
fonts that were added, not added, and not understood.
NONE
No messages are written to the SAS log, except for errors (if encountered).
Default TERSE
Example “Example 2: Adding All Font Files from Multiple Directories” on page 1071
NOUPDATE
specifies that the procedure should run without actually updating the SAS
registry. This option enables you to test the procedure on the specified fonts
before modifying the SAS registry.
USESASHELP
specifies that the SAS registry in the Sashelp library should be updated. You
must have Write access to the Sashelp library in order to use this option. If the
USESASHELP option is not specified, then the SAS registry in the Sasuser
library is updated.
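You can combine these options to preview a registration before it changes anything. The sketch below uses a placeholder directory, 'your-font-directory', in the style of the examples in this chapter. With NOUPDATE, the log reports what would happen, but the SAS registry is not changed.

```sas
/* Preview: add new fonts and replace existing ones, log the details, */
/* but do not write to the SAS registry (NOUPDATE).                   */
proc fontreg mode=all msglevel=normal noupdate;
   fontpath 'your-font-directory';   /* placeholder directory */
run;
```

After you review the log, you would rerun the same step without NOUPDATE to apply the changes.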
FONTFILE Statement
Specifies one or more font files to be processed.
Syntax
FONTFILE 'file' <…'file'> | 'file-1, pfm-file-1, afm-file-1' <...'file-n'>;
Required Arguments
file
is the complete pathname to a font file. If the file is recognized as a valid font
file, then the file is processed. Each pathname must be enclosed in quotation
marks. If you specify more than one pathname, then you must separate the
pathnames with a space.
pfm-file
specifies a file specific to Windows that contains font metrics as well as the
value of the Windows font name.
afm-file
specifies a file that contains font metrics.
Details
If you specify a Type1 font in the FONTFILE statement, and you do not specify a
PFM or AFM file, then SAS does not search for the PFM or AFM files.
If you specify an AFM file but do not specify a PFM file, then you must use a
comma as a placeholder for the missing PFM file, as in this example:
fontfile 'c:\winnt\fonts\alpinerg.pfb, ,
c:\winnt\fonts\alpinerg.afm';
If you specify a PFM file but do not specify an AFM file, then you do not need a
comma as a placeholder for the missing AFM file, as in this example:
fontfile 'c:\winnt\fonts\alpinerg.pfb,
c:\winnt\fonts\alpinerg.pfm';
When you specify a PFM or AFM file, SAS attempts to open the file and determine
whether the file is of the specified type. If it is not, then SAS writes a message to
the log and the file is not used.
The PFM file is a file that is specific to Windows, and contains font metrics as well
as a value for the Windows Font Name field. If you specify a valid PFM file, then
SAS opens the file, retrieves the value in Windows Font Name, and saves it with
the font in the SAS registry. SAS uses this field when it creates a file (such as an
EMF formatted file) to export into a Windows application.
Note: If you replace a font in a family and the font contains values for the PFM
Name or AFM Name, specifying a missing or invalid value for the metric in the
FONTFILE statement causes the corresponding metric value to be deleted from the
font in the registry.
Note: You cannot use a PFM or AFM file specification if you specify a TrueType
font.
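For a Type1 font that has both metric files, all three paths go in one quoted string, in the order font file, PFM file, AFM file, as the FONTFILE syntax shows. This is a minimal sketch that reuses the illustrative alpinerg paths from the examples above. It assumes that all three files exist.

```sas
proc fontreg msglevel=verbose;
   /* Font file, then PFM file, then AFM file, separated by commas */
   fontfile 'c:\winnt\fonts\alpinerg.pfb, c:\winnt\fonts\alpinerg.pfm, c:\winnt\fonts\alpinerg.afm';
run;
```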
FONTPATH Statement
Specifies one or more directories to be searched for valid font files to process.
See: “Example 2: Adding All Font Files from Multiple Directories” on page 1071
Syntax
FONTPATH <fileref> 'directory' <…'directory'>;
Required Argument
directory
specifies a directory to search. All files that are recognized as valid font files are
processed. Each directory must be enclosed in quotation marks. If you specify
more than one directory, then you must separate the directories with a space.
Operating Environment Information: In the Windows operating environment
only, you can locate the fonts folder if you do not know where the folder resides.
In addition, you can register system fonts without having to know where the
fonts are located. To do so, submit the following program:
proc fontreg;
fontpath "%sysget(systemroot)\fonts";
run;
Optional Argument
fileref
specifies a fileref to use with the FONTPATH statement.
REMOVE Statement
Removes a font family, all fonts of a particular type (such as TrueType or Type1), or all fonts from the
Core\Printing\Freetype\Fonts location of the SAS registry.
Syntax
REMOVE 'family-name' | 'alias' | family-type | _ALL_;
Required Arguments
family-name
specifies the family name of the font that you want to remove from the
Core\Printing\Freetype\Fonts key in the SAS registry. Enclose family-name in
quotation marks if the value contains one or more spaces.
alias
specifies an alternative name, usually in a shortened form, for family-name.
Enclose the alias name in quotation marks if the value contains one or more
spaces.
Note The valid values that can be specified as an alias are listed in the
Core\Printing\Alias\Fonts\Freetype key in the SAS registry.
family-type
specifies the name of a font type (such as TrueType or Type1) that SAS
supports and that you want removed from the SAS registry.
Note: The fonts are not removed from the operating system location in which
they reside. Only their registration in the SAS registry is removed, so that SAS
no longer recognizes the fonts.
_ALL_
specifies that all font families in the Core\Printing\Freetype\Fonts key in the
SAS registry will be deleted.
Details
that Test is an alias for Arial. SAS removes the Arial font family from the
Core\Printing\Freetype\Fonts key and the Test alias from the
Core\Printing\Alias\Fonts\Freetype key in the SAS registry.
If SAS is unable to remove a font family at this point, then SAS writes a message to
the log indicating that the specified value in the REMOVE statement is invalid.
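This is a minimal sketch of removing a single font family. 'your-font-family' is a placeholder. The value must match a family name under Core\Printing\Freetype\Fonts or an alias under Core\Printing\Alias\Fonts\Freetype, which you can check in the Registry Editor first. The font files stay on disk; only their registration is removed.

```sas
proc fontreg msglevel=normal;
   /* Placeholder: replace with a registered family name or alias */
   remove 'your-font-family';
run;
```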
TRUETYPE Statement
Specifies one or more directories to be searched for TrueType font files.
See: “Example 3: Replacing Existing TrueType Font Files from a Directory” on page 1073
Syntax
TRUETYPE <fileref> 'directory' <…'directory'>;
Required Argument
directory
specifies a directory to search. Only files that are recognized as valid TrueType
font files are processed. Each directory must be enclosed in quotation marks. If
you specify more than one directory, then you must separate the directories
with a space.
Optional Argument
fileref
specifies a fileref to use with the TRUETYPE statement.
TYPE1 Statement
Specifies one or more directories to be searched for valid Type1 font files.
Syntax
TYPE1 <fileref> 'directory' <…'directory'>;
Required Argument
directory
specifies a directory to search. Only files that are recognized as valid Type1 font
files are processed. Each directory must be enclosed in quotation marks. If you
specify more than one directory, then you must separate the directories with a
space.
Optional Argument
fileref
specifies a fileref to use with the TYPE1 statement.
OPENTYPE Statement
Specifies one or more directories to be searched for valid OpenType font files.
Syntax
OPENTYPE <fileref> 'directory' <…'directory'>;
Required Argument
directory
specifies a directory to search. Only files that are recognized as valid OpenType
font files are processed. Each directory must be enclosed in quotation marks. If
you specify more than one directory, then you must separate the directories
with a space.
Optional Argument
fileref
specifies a fileref to use with the OPENTYPE statement.
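The TRUETYPE, TYPE1, and OPENTYPE statements differ only in which font files they accept. This sketch assumes that the three statements can be combined in one PROC FONTREG step. The three directory names are placeholders.

```sas
proc fontreg mode=add msglevel=normal;
   truetype 'your-truetype-directory';   /* reads only TrueType files */
   type1    'your-type1-directory';      /* reads only Type1 files    */
   opentype 'your-opentype-directory';   /* reads only OpenType files */
run;
```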
Details
This example shows how to add a single font file to the SAS registry. The
FONTFILE statement specifies the complete path to a single font file.
Program
proc fontreg;
fontfile 'your-font-file';   /* complete path to a single font file */
run;
Log
Example Code 29.1 Adding a Single Font File to the SAS Registry
SUMMARY:
Files processed: 1
Unusable files: 0
Files identified as fonts: 1
Fonts that were processed: 1
Fonts replaced in the SAS registry: 0
Fonts added to the SAS registry: 1
Fonts that could not be used: 0
Font Families removed from SAS registry: 0
Details
This example shows how to add all valid font files from two different directories
and how to write detailed information to the SAS log.
Program
proc fontreg msglevel=verbose;
fontpath 'your-font-directory-1' 'your-font-directory-2';
run;
Program Description
Write complete details to the SAS log. The MSGLEVEL=VERBOSE option writes
complete details about what fonts were added, what fonts were not added, and
what font files were not understood.
proc fontreg msglevel=verbose;
Specify the directories to search for valid fonts. You can specify more than one
directory in the FONTPATH statement. Each directory must be enclosed in
quotation marks. If you specify more than one directory, then you must separate
the directories with a space.
fontpath 'your-font-directory-1' 'your-font-directory-2';
run;
Log
Example Code 29.2 Messages from Adding All Font Files from Multiple Directories
NOTE: The font "Albertus Medium" (Style: Regular, Weight: Normal) has been
added to the SAS Registry at
[CORE\PRINTING\FREETYPE\FONTS\<ttf>Albertus Medium]. Because it is a
TRUETYPE font, it can be referenced as "Albertus Medium" or
"<ttf>Albertus Medium" in SAS. The font resides in file
"your-font-directory-1\albr55w.ttf".
WARNING: The font "Georgia" (Style: Regular, Weight: Normal) will not be added
because it already exists in the "<ttf>Georgia" font family of the SAS Registry.
SUMMARY:
Files processed: 138
Unusable files: 3
Files identified as fonts: 135
Fonts that were processed: 135
Fonts replaced in the SAS registry: 0
Fonts added to the SAS registry: 91
Fonts that could not be used: 44
Font Families removed from SAS registry: 0
Details
This example reads all the TrueType fonts in the specified directory and replaces
the ones that already exist in the SAS registry.
Program
proc fontreg mode=replace;
truetype 'your-font-directory';
run;
Program Description
Replace existing fonts only. The MODE=REPLACE option limits the action of the
procedure to replacing fonts that are already defined in the SAS registry. New fonts
will not be added.
proc fontreg mode=replace;
Specify a directory that contains TrueType font files. Files in the directory that are
not recognized as being TrueType font files are ignored.
truetype 'your-font-directory';
run;
Log
Example Code 29.3 Replacing Existing TrueType Font Files from a Directory
SUMMARY:
Files processed: 49
Unusable files: 3
Files identified as fonts: 46
Fonts that were processed: 40
Fonts replaced in the SAS registry: 40
Fonts added to the SAS registry: 0
Fonts that could not be used: 0
Font Families removed from SAS registry: 0
30
FORMAT Procedure
When you store formats in a library, SAS uses the session encoding in which the
formats were created. If the original encoding is not UTF-8, then truncation of
characters might occur if you convert the format library to an encoding that
requires more bytes to represent the characters.
Note: Moving format libraries between previous versions of SAS and CAS can be
risky. SAS recommends that you use CNTLOUT= data sets to reduce this
risk.
For more information about using format libraries in SAS Viya, see Migrating Data
to UTF-8 for SAS Viya and SAS Viya FAQ for Processing UTF-8 Data.
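The CNTLOUT= approach can be sketched as a round trip: export the catalog to a data set, and then rebuild the formats from that data set with CNTLIN=. In practice, the data set would be moved or converted between the two steps. This sketch keeps everything in the Work library, and the catalog names Oldfmts and Newfmts and the format YN are hypothetical.

```sas
/* 1. Create a format in a source catalog (hypothetical names) */
proc format library=work.oldfmts;
   value yn 1='Yes' 0='No';
run;

/* 2. Export the catalog to a CNTLOUT= data set */
proc format library=work.oldfmts cntlout=work.fmtdata;
run;

/* 3. Rebuild the formats in a target catalog from that data set */
proc format library=work.newfmts cntlin=work.fmtdata;
run;
```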
Informats and formats tell SAS the data's type (character or numeric) and form
(such as how many bytes it occupies; decimal placement for numbers; how to
handle leading, trailing, or embedded blanks and zeros; and so on). SAS provides
informats and formats for reading and writing variables. For a thorough description
of informats and formats that SAS provides, see SAS Formats and Informats:
Reference.
Figure: a raw data value is read with the COMMA9.2 informat and printed using the DOLLAR9.2 format.
In the figure, SAS reads the raw data value that contains the dollar sign and comma.
The COMMA9.2 informat ignores the dollar sign and comma and converts the value
to 1544.32. The DOLLAR9.2 format prints the value, adding the dollar sign and
comma. For more information about associating informats and formats with
variables, see “Associating Informats and Formats with Variables” on page 1078.
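The same read-then-print cycle can be reproduced in a DATA step. This is a minimal sketch that assumes the raw value in the figure is $1,544.32. The data set name Prices is hypothetical.

```sas
data work.prices;
   input amount comma9.2;     /* COMMA9.2 ignores the $ and , while reading */
   format amount dollar9.2;   /* DOLLAR9.2 adds them back when printing     */
   datalines;
$1,544.32
;
run;

proc print data=work.prices;
run;
```

The stored value of AMOUNT is the plain number 1544.32. Only the printed form carries the dollar sign and comma.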
In a DATA step
   Informats: Use the ATTRIB or INFORMAT statement to permanently associate an
   informat with a variable. Use the INPUT function or INPUT statement to
   associate the informat with the variable only for the duration of the DATA step.
   Formats: Use the ATTRIB or FORMAT statement to permanently associate a format
   with a variable. Use the PUT function or PUT statement to associate the format
   with the variable only for the duration of the DATA step.
In a PROC step
   Informats: The ATTRIB and INFORMAT statements are valid in Base SAS
   procedures. However, in Base SAS software, typically you do not assign
   informats in PROC steps because the data has already been read into SAS
   variables.
   Formats: Use the ATTRIB statement or the FORMAT statement to associate
   formats with variables. If you use either statement in a procedure that
   produces an output data set, then the format is permanently associated with
   the variable in the output data set. If you use either statement in a procedure
   that does not produce an output data set or modify an existing data set, the
   statement associates the format with the variable only for the duration of the
   PROC step.
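The difference between a permanent and a step-only association can be seen with Sashelp.Class. This is a minimal sketch. The output data set name and the chosen formats are illustrative.

```sas
data work.class;
   set sashelp.class;
   format height 5.1;   /* DATA step FORMAT: stored with HEIGHT in Work.Class */
run;

proc print data=work.class;
   format weight 6.2;   /* PROC PRINT creates no data set: applies to this step only */
run;
```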
For more information about the INPUT and PUT functions, see SAS Functions and CALL
Routines: Reference. For more information and examples of using formats in Base
SAS procedures, see “Formatted Values” on page 63.
Format Catalogs
PROC FORMAT stores user-defined informats and formats as entries in SAS
catalogs.1 You use the LIBRARY= option in the PROC FORMAT statement to
specify the catalog. If you omit the LIBRARY= option, then formats and informats
are stored in the Work.Formats catalog. If you specify LIBRARY=libref but do not
specify a catalog name, then formats and informats are stored in the
libref.FORMATS catalog. Note that this use of a one-level name differs from the
use of a one-level name elsewhere in SAS. With the LIBRARY= option, a one-level
name indicates a library; elsewhere in SAS, a one-level name indicates a file in the
WORK library.
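The two forms of LIBRARY= can be compared directly. This sketch uses the Work library only so that it runs without a LIBNAME statement. For permanent storage, you would substitute a libref that is assigned to a permanent SAS library. The format YN and the catalog Myfmts are hypothetical.

```sas
/* LIBRARY=libref: the entry is stored in libref.FORMATS (here Work.Formats) */
proc format library=work;
   value yn 1='Yes' 0='No';
run;

/* LIBRARY=libref.catalog: the entry is stored in the named catalog (Work.Myfmts) */
proc format library=work.myfmts;
   value yn 1='Yes' 0='No';
run;
```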
The name of the catalog entry is the name of the format or informat. The entry
types are as follows:
- FORMAT for numeric formats
1. Catalogs are a type of SAS file and reside in a SAS library. If you are unfamiliar with the types of SAS files or the SAS
library structure, then see the section on SAS files in SAS Language Reference: Concepts.
You permanently store informats and formats by using the LIBRARY= option in the
PROC FORMAT statement. See the discussion of the LIBRARY= option in the PROC
FORMAT Statement on page 1085.
SAS uses one of two methods when searching for user-defined formats and
informats:
- By default, SAS always searches a library that is referenced by the Library libref
for a FORMATS catalog. If you have only one format catalog, then do the
following:
1 Assign the Library libref to a SAS library in the SAS session in which you are
running the PROC FORMAT step.
3 In the SAS program that uses your user-defined formats and informats,
include a LIBNAME statement to assign the Library libref to the library that
contains the permanent format catalog.
- If you have more than one format catalog, or if the format catalog is named
something other than Formats, then do the following:
1 Assign a libref to a SAS library in the SAS session in which you are running
the PROC FORMAT step.
3 In the SAS program that uses your user-defined formats and informats, use
the FMTSEARCH= option in an OPTIONS statement, and include libref or
libref.catalog in the list of format catalogs.
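In the second case, the FMTSEARCH= option adds libref.catalog to the search list. This is a sketch with hypothetical names. It uses Work in place of a permanent libref so that it is self-contained.

```sas
/* Store a format in a catalog that is not named Formats */
proc format library=work.myfmts;
   value yn 1='Yes' 0='No';
run;

/* Add libref.catalog to the list of catalogs that SAS searches */
options fmtsearch=(work.myfmts);

data _null_;
   answer = put(1, yn.);   /* YN is found in Work.Myfmts */
   put answer=;
run;
```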
For more information, see “FMTSEARCH= System Option” in SAS System Options:
Reference. For an example that uses the LIBRARY= and FMTSEARCH= options
together, see “Example 17: Writing Ranges for Character Strings” on page 1187.
For more information, see “FMTERR System Option” in SAS System Options:
Reference.
If SAS encounters a missing value to format using a user-defined format and the
MISSING= system option defines a character to be printed for missing values, the
printed value is determined as follows:
- If the user-defined format or informat has a value-range-set for missing values,
the missing value is defined by the user-defined format.
- If the user-defined format does not have a value-range-set defined for missing
values, the missing value is defined by the MISSING= system option. The
default value for the MISSING= system option is . (period).
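Both rules can be observed side by side. This is a minimal sketch with two hypothetical formats: WITHMISS has a range for missing values, and NOMISS does not. For numeric formats, the LOW keyword does not include missing values, so NOMISS has no range that matches a missing value.

```sas
options missing='-';   /* character printed for missing numeric values */

proc format;
   value withmiss .        = 'Not recorded'   /* range for missing values */
                  low-high = 'Recorded';
   value nomiss   low-high = 'Recorded';      /* no range for missing values */
run;

data _null_;
   x = .;
   a = put(x, withmiss.);   /* uses the format's own label */
   b = put(x, nomiss.);     /* falls back to the MISSING= character */
   put a= b=;
run;
```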
Instead of using the FMTLIB option, you can use the CNTLOUT= option to create an
output data set that stores information about informats and formats. You can then
use PROC PRINT or PROC REPORT to print the data set. In this case, labels are not
truncated.
Note: You can use data set options to keep or drop references to additional
variables that were added by using the CNTLOUT= option.
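This is a minimal sketch of the CNTLOUT= alternative. It uses a hypothetical format YN, which is created in the same step and is therefore included in the output data set. FMTNAME, START, END, and LABEL are variables in the CNTLOUT= data set.

```sas
proc format cntlout=work.fmtinfo;
   value yn 1='Yes' 0='No';   /* created in this step, so included in Work.Fmtinfo */
run;

/* Labels are printed from the data set, so they are not truncated */
proc print data=work.fmtinfo;
   var fmtname start end label;
run;
```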
The SAS client session and the CAS session can interact through the session
reference that you establish with the SESSREF system option or the SESSREF
argument in the CASLIB statement. When you use a SAS language element that
can take advantage of processing in CAS, the session reference identifies where
that processing should occur. If you do not specify a session reference, then
processing occurs in the client session. If the language element is not supported in
CAS, then processing occurs in the client session.
Here are some user-defined format values that could be written using PROC
FORMAT:
1='Yes'
2='No'
3='Possibly'
If you use these IF-THEN/ELSE statements, SAS begins searching with the first
value in the range, x=1, and steps through the values until it finds a matching value.
This type of search can be more efficient if the value that you want is near the
beginning of the range of values.
if x=1 then label='Yes';
else if x=2 then label='No';
else if x=3 then label='Possibly';
If you are searching for the value 3, there will be 3 comparisons before a match is
found. If you are searching for the value 1, there will be only one comparison made.
If you use the VALUE statement of PROC FORMAT, SAS begins searching at the
middle value in the range. In this example, SAS begins by comparing the value to
the middle range value, 2="No". If the value does not match, SAS next compares it
to the higher range value, 3="Possibly", or to the lower range value, 1="Yes",
depending on whether the value is greater than or less than 2.
proc format;
   value answer   /* "answer" is an illustrative format name */
      1="Yes"
      2="No"
      3="Possibly";
run;
A binary search like PROC FORMAT uses is more efficient when the range has a
large number of values to search.
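In practice, the format replaces the IF-THEN/ELSE chain, and the PUT function performs the lookup. This is a minimal sketch that applies a format named ANSWER (an illustrative name) to the three values above.

```sas
proc format;
   value answer 1='Yes' 2='No' 3='Possibly';
run;

data work.answers;
   input x @@;
   label = put(x, answer.);   /* format lookup instead of IF-THEN/ELSE */
   datalines;
1 2 3
;
run;
```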
PROC FORMAT   Define formats and informats for variables (Ex. 3, Ex. 13, Ex. 15, Ex. 16)
VALUE         Create a format that specifies character strings to use to print variable values (Ex. 8, Ex. 11, Ex. 17)
Tips: User-defined format names cannot end in a number. For more information, see
“User-Defined Formats” in SAS Formats and Informats: Reference and “Names in the
SAS Language” in SAS Language Reference: Concepts.
You can use data set options with the CNTLIN= and CNTLOUT= options.
See “Data Set Options” on page 23 for a list.
Moving catalogs between previous versions of SAS and CAS might have some risk.
SAS recommends that you use CNTLOUT data sets to reduce this risk.
Examples: “Example 3: Creating a Picture Format” on page 1150
“Example 13: Creating a Format from a CNTLIN= Data Set” on page 1173
“Example 15: Printing the Description of Informats and Formats” on page 1183
“Example 16: Retrieving a Permanent Format” on page 1185
Syntax
PROC FORMAT <options>;
Optional Arguments
CASFMTLIB='name'
adds a format library to a CAS session.
You can specify the CASFMTLIB option only in an active SAS Cloud Analytic
Services (CAS) session. PROC FORMAT connects to the CAS session and loads
a format library. If the format library already exists in the CAS session, then SAS
updates it. SAS also appends the format library to the search list for any
subsequent referencing by procedures that are operating in CAS in that session.
That is, if a format library already exists and you create a new library with
CASFMTLIB, then the new library is appended to the search order. The library
name should be a one-level name that does not contain any slashes.
Note: A CAS session can have more than one format library. The libraries are
available only to that CAS session. For information about using a CAS action to
promote a format library to a global scope, see “Promote format library” in SAS
Viya: System Programming Guide.
You can specify additional CAS sessions with the SESSREF= option. For
information about the SESSREF= option and other CAS language elements, see
SAS Cloud Analytic Services: User’s Guide.
SAS formats are available in the local SAS client regardless of whether you add
them to a format library in CAS. When the CASFMTLIB option is specified, the
EXCLUDE and SELECT statements are applied to the local SAS session format
catalogs, not to the CAS session format library. Informats cannot be loaded into
a CAS session. If you specify an INVALUE statement with CASFMTLIB, then a
note is written to the log and nothing is written to the CAS format library.
Restriction The format library name that you specify with the CASFMTLIB
option cannot have a length of more than 63 characters.
Tip You can use the fmtSearch= parameter of the addFmtLib CAS action to
control the order in which SAS searches for format libraries. For more
information, see “Manage User-Defined Formats with CAS Actions”
in SAS Cloud Analytic Services: User’s Guide.
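This is a minimal sketch of CASFMTLIB=. It assumes that the SAS client can connect to a CAS server. CASAUTO is the session name used elsewhere in this guide, and Myfmtlib is a hypothetical library name.

```sas
cas casauto;              /* start a CAS session named CASAUTO */
options sessref=casauto;  /* make it the session that PROC FORMAT uses */

proc format casfmtlib='myfmtlib';
   value yn 1='Yes' 0='No';   /* VALUE formats are loaded; INVALUE informats are not */
run;
```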
CNTLIN=input-control-SAS-data-set
specifies a SAS data set from which PROC FORMAT builds informats or
formats.
CNTLIN= builds formats and informats without using a VALUE, PICTURE, or
INVALUE statement. If you specify a one-level name, then the procedure
searches only the default library (either the WORK library or USER library) for
the data set, regardless of whether you specify the LIBRARY= option.
Note When using PROC FORMAT with a CNTLIN= data set, the START and
END columns must have the same length. If the lengths are different,
an error might occur.
Tip A common source for an input control data set is the output from the
CNTLOUT= option of another PROC FORMAT step.
Example “Example 13: Creating a Format from a CNTLIN= Data Set” on page 1173
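This is a minimal sketch of a CNTLIN= data set that is built in a DATA step. FMTNAME, TYPE, START, END, and LABEL are the control variables. The format name ANSWER and its values are illustrative. START and END are both numeric here, so they have the same length.

```sas
data work.ctrl;
   length label $8;
   retain fmtname 'answer' type 'N';   /* numeric format named ANSWER */
   input start label $;
   end = start;                        /* single-value ranges */
   datalines;
1 Yes
2 No
3 Possibly
;
run;

proc format cntlin=work.ctrl;   /* builds ANSWER. in Work.Formats */
run;
```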
CNTLOUT=output-control-SAS-data-set
creates a SAS data set that stores information about informats or formats that
are contained in the catalog specified in the LIBRARY= option. If you are
creating an informat or format in the same step that the CNTLOUT= option
appears, then the informat or format that you are creating is included in the
CNTLOUT= data set.
If you specify a one-level name, then the procedure stores the data set in the
default library (either the WORK library or the USER library), regardless of
whether you specify the LIBRARY= option.
If you use CNTLOUT= with the ENCODING= data set option to create an output data
set that has a different encoding, Cross Environment Data Access (CEDA) might
issue a truncation error. In that case, SAS stops processing, creates the CNTLOUT=
data set with 0 observations, and writes an error such as the following to the log
about the truncation of data:
ERROR: Some character data was lost during transcoding in the dataset
WORK.TEST. Either the data contains characters that are not
representable in the new encoding or truncation occurred during
transcoding.
Suppose you are using a data set that contains monetary values in Euros. You
are using the Wlatin1 session encoding, and you specify UTF-8 encoding for the
CNTLOUT= data set. In Wlatin1, the LABEL variable is 5
bytes long and has a value of €1234 (in hexadecimal representation,
'8031323334'x). When you attempt to store the variable in the CNTLOUT= data
set with UTF-8 encoding, the length of that string must be 7 bytes (in
hexadecimal representation, 'E282AC31323334'x). The two additional bytes are
needed in UTF-8 encoding for the Euro sign. Without the additional two bytes,
the string is truncated. You can use the %COPY_TO_NEW_ENCODING macro to
prevent this error. For information about the %COPY_TO_NEW_ENCODING
macro, see “Avoiding Character Truncation Using the
%COPY_TO_NEW_ENCODING Macro” in SAS National Language Support (NLS):
Reference Guide.
The macro examines the CNTLOUT data set and re-creates it in the new
encoding with the necessary lengths. If a width is stored as part of the
associated format, the value is not expanded by the CVP engine. The format can
cause truncation when the formatted value is displayed.
The DEFAULT value for the format width in the CNTLOUT data set is the
default width (in bytes) for the format. You can specify DEFAULT=, MIN=,
and MAX= when you create the format; otherwise, the default is computed based on
the largest label. If the START, MIN, MAX, or LABEL variable, in characters, is
larger in UTF-8 encoding, then these widths are not expanded by the CVP
engine. Use the %COPY_TO_NEW_ENCODING macro instead. For more
information about using CNTLOUT= with PROC FORMAT to convert catalogs to
UTF-8, see “Converting Format Catalogs to UTF-8 Encoding” in Moving and
Accessing SAS Files.
Tip You can use an output control data set as an input control data set in
subsequent PROC FORMAT steps.
SAS Viya supports only UTF-8 encoding. For information about the
encoding of your format catalogs in SAS Viya, see Migrating Data to
UTF-8 for SAS Viya and SAS Viya FAQ for Processing UTF-8 Data.
FMTLIB
prints information about informats or formats in the catalog that is specified in
the LIBRARY= option. To get information about specific informats or formats,
subset the catalog using the SELECT or EXCLUDE statement.
Tips If your output from FMTLIB is not formatted correctly in the ODS
LISTING destination, then try increasing the value of the LINESIZE=
system option.
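This is a minimal sketch of FMTLIB output for one format, using the hypothetical format YN. The LINESIZE= value is only an example of widening the LISTING output, as the tip suggests.

```sas
proc format;
   value yn 1='Yes' 0='No';   /* hypothetical format stored in Work.Formats */
run;

options linesize=132;   /* wider lines for the ODS LISTING destination */

proc format library=work fmtlib;
   select yn;   /* print information only about YN */
run;
```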
LIBRARY=libref<.catalog>
specifies a SAS library or catalog that contains the informats or formats that
you are creating in the PROC FORMAT step. The procedure stores these
informats and formats in the catalog that you specify so that you can use them
in subsequent SAS sessions or jobs.
Alias LIB=
Default If you omit the LIBRARY= option, then formats and informats are
stored in the Work.Formats catalog. If you specify the LIBRARY=
option but do not specify a name for catalog, then formats and
informats are stored in the libref.FORMATS catalog.
You can control the order in which SAS searches for format catalogs
with the FMTSEARCH= system option. For more information, see
“FMTSEARCH= System Option” in SAS System Options: Reference.
LOCALE
specifies to create a format catalog that corresponds to the current SAS locale.
The name of the catalog that SAS creates is the SAS library or catalog that is
specified in the LIBRARY= option appended with the five-character POSIX
locale value for the current SAS locale.
See “LOCALE= Values for PAPERSIZE and DFLANG Options” in SAS National
Language Support (NLS): Reference Guide for a list of POSIX locale
values.
MAXLABLEN=number-of-characters
specifies the number of characters in the informatted or formatted value that
you want to appear in the CNTLOUT= data set or in the output of the FMTLIB
option. The FMTLIB option prints a maximum of 40 characters for the
informatted or formatted value.
MAXSELEN=number-of-characters
specifies the number of characters in the start and end values that you want to
appear in the CNTLOUT= data set or in the output of the FMTLIB option. The
FMTLIB option prints a maximum of 16 characters for start and end values.
NOREPLACE
prevents a new informat or format from replacing an existing one of the same
name. If you omit NOREPLACE, then the procedure warns you that the informat
or format already exists and replaces it.
Note You can have a format and an informat of the same name.
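This is a sketch of NOREPLACE with the hypothetical format YN. The second step tries to redefine YN, and NOREPLACE keeps the first definition. The exact log message that SAS writes is not reproduced here.

```sas
proc format;
   value yn 1='Yes' 0='No';
run;

/* NOREPLACE: this definition does not replace the existing YN format */
proc format noreplace;
   value yn 1='Y' 0='N';
run;

data _null_;
   label = put(1, yn.);
   put label=;   /* the first definition is still in effect */
run;
```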
PAGE
prints information about each format and informat in the catalog.
Tip In the ODS LISTING destination, the information about each format
and informat appears on separate pages in the Output window.
EXCLUDE Statement
Excludes entries from processing by the FMTLIB and CNTLOUT= options.
Restrictions: Only one EXCLUDE statement can appear in a PROC FORMAT step.
You cannot use a SELECT statement and an EXCLUDE statement within the same
PROC FORMAT step.
When the CASFMTLIB option is specified, the EXCLUDE statement ignores format
libraries in CAS sessions and refers only to catalogs in the SAS session.
Syntax
EXCLUDE entry(s);
Required Argument
entry(s)
specifies one or more catalog entries to exclude from processing. Catalog entry
names are the same as the name of the informat or format that they store.
Because informats and formats can have the same name, and because character
and numeric informats or formats can have the same name, you must use
certain prefixes when specifying informats and formats in the EXCLUDE
statement. Follow these rules when specifying entries in the EXCLUDE
statement:
- Precede names of entries that contain character formats with a dollar sign ($).
- Precede names of entries that contain character informats with an at sign
and a dollar sign (for example, @$entry-name).
- Precede names of entries that contain numeric informats with an at sign (@).
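To illustrate the prefix rules, this sketch creates one entry of each kind with hypothetical names and then excludes three of them from FMTLIB output. The numeric format YN takes no prefix and is still printed.

```sas
proc format;
   value   $region 'N'='North' 'S'='South';   /* character format   */
   invalue $codes  'north'='N' 'south'='S';   /* character informat */
   invalue levels  'low'=1 'high'=2;          /* numeric informat   */
   value   yn      1='Yes' 0='No';            /* numeric format     */
run;

proc format library=work fmtlib;
   exclude $region @$codes @levels;   /* prefixes identify the entry type */
run;
```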
Details
exclude apple-pear;
FMTLIB Output
If you use the EXCLUDE statement without either FMTLIB or CNTLOUT= in the
PROC FORMAT statement, then the procedure invokes the FMTLIB option and you
receive FMTLIB option output.
INVALUE Statement
Creates an informat for reading and converting raw data values.
See: SAS Formats and Informats: Reference for documentation on informats supplied by
SAS.
Example: “Example 12: Converting Raw Character Data to Numeric Values” on page 1170
Syntax
INVALUE <$>name <(informat-options)> <value-range-set(s)>;
Required Argument
name
names the informat that you are creating.
Requirement The name must be a valid SAS name. A numeric informat name can
be up to 31 characters in length; a character informat name can be
up to 30 characters in length and cannot end in a number. If you are
creating a character informat, then use a dollar sign ($) as the first
character. Adding the dollar sign to the name is why a character
informat is limited to 30 characters.
Tip Refer to the informat later by using the name followed by a period.
However, do not use a period after the informat name in the
INVALUE statement.
Optional Arguments
DEFAULT=length
specifies the default length of the informat. The value for DEFAULT= becomes
the length of the informat if you do not give a specific length when you
associate the informat with a variable.
Default For numeric informats, 12 if you have numeric data to the left of the
equal sign.
Note If you specify an invalid value for DEFAULT=, SAS ignores the value
and writes an error to the log.
FUZZ=fuzz-factor
specifies a fuzz factor for matching values to a range. If a number does not
match or fall in a range exactly but comes within fuzz-factor, then the informat
considers it a match. For example, the following INVALUE statement creates the
LEVELS. informat, which uses a fuzz factor of .2:
invalue levels (fuzz=.2) 1='A'
2='B'
3='C';
FUZZ=.2 means that if a variable value falls within .2 of a value on either end of
the range, then the informat uses the corresponding informatted value to store
the variable value. So the LEVELS. informat reads the value 2.1 as B.
Tips Specify FUZZ=0 to save storage space when you use the INVALUE
statement to create numeric informats.
Use a nonzero fuzz factor only with numbers that are very close but not
an exact match. Ranges are stored internally in sorted order (unless the
NOTSORTED option is used), in order to perform a binary search. When a
fuzz-factor is added to the end of one range and subtracted from the
beginning of the next range, and the ranges overlap, the results can be
unpredictable. A value is placed in the first range that is a match in the
binary search. The exclusion operator is insufficient to override this binary
search algorithm. As a best practice, when you use the exclusion operator,
set FUZZ=0 or specify the NOTSORTED option.
A best practice is to use FUZZ=0 when you use the < exclusion operator
with numeric informats.
JUST
left-justifies all input strings before they are compared to ranges.
MAX=length
specifies a maximum length for the informat. When you associate the informat
with a variable, you cannot specify a width greater than the MAX= value.
Default 40
Range 1–32767
Note If you specify an invalid value for MAX=, SAS ignores the value and
writes an error to the log.
MIN=length
specifies a minimum length for the informat. If a CNTLIN= data set contains a
value for MIN that rounds to 0 or less, SAS ignores the invalid value and
substitutes 1 for it. When you specify the CNTLIN= data set in SAS code, SAS
continues processing the code step and does not write an error to the log.
The following example specifies a data set that has a MIN value of -1 and uses
CNTLIN= to call the data set on the subsequent PROC FORMAT statement.
data temp;
   fmtname='abc';
   start=1;
   label='xyz';
   min=-1;
run;

proc format cntlin=temp;
run;

If you specify an invalid value for MIN= directly in the PROC FORMAT code, SAS does
not ignore the invalid value or substitute another value for it. SAS stops
processing the code step and writes an error to the log.
The following example specifies min=-1 for the NEWABC format, which causes SAS
to stop processing and issue an error.
proc format;
   value newabc (min=-1) 1='yes';
run;
Default 1
Range 1–32767
NOTSORTED
stores values or ranges in the order in which you define them.
If you do not specify NOTSORTED, then values or ranges are stored in sorted
order by default, and SAS uses a binary searching algorithm to locate the range
that a particular value falls into. If you specify NOTSORTED, then SAS searches
each range in the order in which you define them until a match is found.
SAS automatically sets the NOTSORTED option when you use the CPORT and
CIMPORT procedures to transport informats or formats between operating
environments with different standard collating sequences. This automatic
setting of NOTSORTED can occur when you transport informats or formats
between ASCII and EBCDIC operating environments. If this situation is
undesirable, then do the following:
- Use the CNTLOUT= option in the PROC FORMAT statement to create an output control data set.
- Use the CPORT procedure to create a transport file for the control data set.
- In the target operating environment, use PROC FORMAT with the CNTLIN= option to build the formats and informats from the imported control data set.
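The three steps above might look like the following program. This is a minimal sketch that assumes a SAS 9.4 session and, for demonstration only, runs the source and target halves in the same session; in practice you move the transport file between the two operating environments. The catalog WORK.SRCFMT, the format SIZES., the data set WORK.FMTCTL, and the fileref TRANFILE are hypothetical.

```sas
/* Source environment: a hypothetical format in catalog WORK.SRCFMT */
proc format library=work.srcfmt;
   value sizes 1='Small' 2='Medium' 3='Large';
run;

/* Step 1: write the catalog contents to an output control data set */
proc format library=work.srcfmt cntlout=work.fmtctl;
run;

/* Step 2: transport the control data set rather than the catalog */
filename tranfile temp;
proc cport data=work.fmtctl file=tranfile;
run;

/* Step 3, target environment: restore the data set, rebuild the format */
proc cimport library=work infile=tranfile;
run;
proc format library=work cntlin=work.fmtctl;
run;
```

Because the format is rebuilt from the control data set in the target session, its ranges are sorted by that session's own collating sequence.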
REGEXP
REGEXPE
specifies that the preceding range is to be treated as a Perl regular expression. If
you specify REGEXPE, the regular expression is expected to produce a modified
result, as in using the substitute action.
During execution, all regular expressions are compiled and the input data is
passed to the first expression to confirm a match. If there is a match, the
corresponding label is used. If there is no match, the next range is compared.
Ranges are not sorted and are processed in the order in which they were defined
in the INVALUE statement or in the order in which they appear in the CNTLIN=
data set.
The rules for regular expressions using the REGEXP option are the same as they
are for the PRXPARSE function in the DATA step. The rules for the REGEXPE
option are the same as they are for the PRXCHANGE function.
Interaction If you are using a CNTLIN= data set, the HLO variable contains P for
REGEXP and E for REGEXPE.
UPCASE
converts all raw data values to uppercase before they are compared to the
possible ranges. If you use UPCASE, then make sure the values or ranges that
you specify are in uppercase.
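The UPCASE and JUST options can be combined so that mixed-case or indented raw data still matches uppercase ranges. In this sketch, which assumes a SAS 9.4 session, YESNO. is a hypothetical numeric informat; the values that the INPUT function should return are noted in the comments.

```sas
proc format;
   /* Hypothetical informat: raw data is left-justified (JUST) and */
   /* uppercased (UPCASE) before it is compared to the ranges      */
   invalue yesno (upcase just)
      'Y', 'YES' = 1
      'N', 'NO'  = 0
      other      = .;
run;

data _null_;
   a = input('yes', yesno.);     /* uppercased to YES: 1       */
   b = input('  no', yesno.);    /* left-justified, then NO: 0 */
   c = input('maybe', yesno.);   /* no range matches: missing  */
   put a= b= c=;
run;
```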
value-range-set(s)
specifies raw data and the values that the raw data becomes. Each value-range-set
pairs a value-or-range on the left side of the equal sign with an informatted-value
on the right side. The informat converts raw data that matches value-or-range to
the corresponding informatted-value.
value-or-range
See “Specifying Values or Ranges” on page 1129.
informatted-value
is the value that you want the raw data in value-or-range to become. Use one
of the following forms for informatted-value:
'character-string'
is a character string up to 32,767 characters long. Typically, character-
string becomes the value of a character variable when you use the
informat to convert raw data. Use character-string for informatted-value
only when you are creating a character informat. If you omit the single or
double quotation marks around character-string, then the INVALUE
statement assumes that the quotation marks are there.
number
is a number that becomes the informatted value. Typically, number
becomes the value of a numeric variable when you use the informat to
convert raw data. Use number for informatted-value when you are
creating a numeric informat. The maximum for number depends on the
host operating environment.
_ERROR_
treats data values in the designated range as invalid data. SAS assigns a
missing value to the variable, prints the data line in the SAS log, and
issues a warning message.
_SAME_
prevents the informat from converting the raw data to any other value.
For example, the following GROUP informat converts values 01 through
20 and assigns the numbers 1 through 20 as the result. All other values
are assigned a missing value.
invalue group 01-20= _same_
other= .;
existing-informat
is an informat that is supplied by SAS or an existing user-defined informat.
The informat that you are creating uses the existing informat to convert the
raw data that match value-or-range on the left side of the equal sign. If you
use an existing informat, then enclose the informat name in square brackets
(for example, [date9.]) or with parentheses and vertical bars (for example,
(|date9.|)). Do not enclose the name of the existing informat in single
quotation marks.
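The following sketch exercises _SAME_ and an existing informat. GROUP. is the informat from the text; READDT. is a hypothetical informat that treats the string NA as missing and passes all other raw data to the SAS-supplied DATE9. informat. The width 9 in READDT9. is given explicitly so that the nested DATE9. informat sees exactly one date; that choice is an assumption of the sketch, not a requirement stated here.

```sas
proc format;
   /* GROUP. from the text: 01 through 20 are kept, all else is missing */
   invalue group  01-20 = _same_
                  other = .;
   /* Hypothetical READDT.: 'NA' becomes missing; all other raw data */
   /* is converted with the SAS-supplied DATE9. informat             */
   invalue readdt 'NA'  = .
                  other = [date9.];
run;

data _null_;
   g1 = input('15', group.);           /* 15                */
   g2 = input('25', group.);           /* missing           */
   d1 = input('01JAN1960', readdt9.);  /* read with DATE9.  */
   d2 = input('NA', readdt9.);         /* missing           */
   put g1= g2= d1= d2=;
run;
```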
Examples
The dollar sign prefix indicates that the informat converts character data.
1098 Chapter 30 / FORMAT Procedure
If you use a numeric informat to convert character strings that do not correspond to
any values or ranges, then you receive an error message.
PICTURE Statement
Creates a template for printing numbers.
The DATATYPE, DECSEP, DIG3SEP, FILL, LANGUAGE, MULT, NOEDIT, and PREFIX
options are valid in parentheses after the user-supplied value label.
The DATATYPE, DECSEP, DIG3SEP, FILL, LANGUAGE, MULT, NOEDIT, PREFIX, and
ROUND options are valid only with the PICTURE statement.
See: SAS Formats and Informats: Reference and SAS National Language Support (NLS):
Reference Guide for documentation about formats that are supplied by SAS.
Examples: “Example 3: Creating a Picture Format” on page 1150
“Example 5: Filling a Picture Format” on page 1155
“Example 18: Creating a Format in a non-English Language” on page 1190
Syntax
PICTURE name <(format-options)>
<value-range-set-1 <(picture-1-options)>
<value-range-set-2 <(picture-2-options)>> …>;
LANGUAGE=
specifies the language that is used for weekdays and months that you
can substitute in a date, time, or datetime picture.
MAX=length
specifies a maximum length for the format.
MIN=length
specifies a minimum length for the format.
MULTILABEL
enables the assignment of labels to multiple values-or-range values that
might have the same or overlapping values.
NOTSORTED
stores values or ranges in the order in which you define them.
ROUND
rounds the value to the nearest integer before formatting.
Required Argument
name
names the format that you are creating.
Requirement The name must be a valid SAS name. A numeric format name can
be up to 32 characters in length; a character format name can be
up to 31 characters in length, not ending in a number. If you are
creating a character format, you use a dollar sign ($) as the first
character, which is why a character format is limited to 31
characters. For information about SAS names, see “Rules for
Words and Names in the SAS Language” in SAS Language
Reference: Concepts.
Tip Refer to the format later by using the name followed by a period.
However, do not put a period after the format name in the PICTURE
statement.
Optional Arguments
DATATYPE=DATE | TIME | DATETIME | DATETIME_UTIL
enables the use of directives in the picture as a template to format date, time, or
datetime values. Specify either DATE, TIME, DATETIME, or DATETIME_UTIL
based on the directive that you use in the picture format. See the definition and
list of directives on page 1107 in the description of picture.
Tip If you format a numeric missing value, then the resulting label is
ERROR. Adding a clause to your program that checks for missing
values can eliminate the ERROR label.
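A minimal DATATYPE= sketch, assuming a SAS 9.4 session. ISODATE. is a hypothetical format name. The 0 between % and the directive letter requests zero padding; that modifier is not described in the directive list later in this section, so treat it as an assumption to confirm for your release. DEFAULT=10 follows the advice in the DEFAULT= tip.

```sas
proc format;
   /* Hypothetical format; DATATYPE=DATE enables the date directives */
   picture isodate (default=10)
      low-high = '%Y-%0m-%0d' (datatype=date);
run;

data _null_;
   s = put('14JUL2017'd, isodate.);
   put s=;   /* 2017-07-14 */
run;
```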
DEFAULT=length
specifies the default length of the picture. The value for DEFAULT= becomes
the length of the picture if you do not give a specific length when you associate
the format with a variable.
Range 1–32767
Tip If you are using the DATATYPE= option, use the DEFAULT= option to
set the default format width large enough to hold the formatted date,
time, or datetime value.
DECSEP='character'
specifies the separator character for the fractional part of a number.
DIG3SEP='character'
specifies the three-digit separator character for a number.
Default , (a comma)
FILL='character'
specifies a character that completes the formatted value.
If the number of significant digits is less than the length of the format, then the
format must complete, or fill, the formatted value:
- The format uses character to fill the formatted value if you specify zeros as digit selectors.
- The format uses zeros to fill the formatted value if you specify nonzero-digit selectors. The FILL= option has no effect.
If the picture includes other characters, such as a comma, which appear to the
left of the digit selector that maps to the last significant digit placed, then the
characters are replaced by the fill character or leading zeros.
Restriction The FILL= option is not valid when you use a function to format a
value.
Interaction If you use the FILL= and PREFIX= options in the same picture, then
the format places the prefix and then the fill characters.
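The following sketch shows FILL= with zero-digit selectors, assuming a SAS 9.4 session. FILLIT. is a hypothetical format name. Because 1234 needs only four digit selectors, the unused positions to the left of the first significant digit, including the comma, should be replaced by the fill character.

```sas
proc format;
   /* Hypothetical format: zero-digit selectors, so FILL= applies */
   picture fillit low-high = '00,000,009' (fill='*');
run;

data _null_;
   s = put(1234, fillit.);
   put s=;   /* *****1,234 */
run;
```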
FUZZ=fuzz-factor
specifies a fuzz factor for matching values to a range. If a number does not
match or fall in a range exactly but comes within fuzz-factor, on either end of
the range, then the format considers it a match. For example, the following
VALUE statement creates the LEVELS. format, which uses a fuzz factor of .2:
value levels (fuzz=.2) 1='A'
2='B'
3='C';
FUZZ=.2 means that if a variable value falls within .2 of a value on either end of
the range, then the format uses the corresponding formatted value to print the
variable value. The LEVELS format formats the value 2.1 as B.
Tips Specify FUZZ=0 to save storage space when you use the VALUE
statement to create numeric formats.
Use a nonzero fuzz factor only with numbers that are very close but not
an exact match. If fuzz-factor is added to the end of one range and
subtracted from the beginning of the next range, and the ranges
overlap, the results can be unpredictable. A value is placed in the first
range that is a match in a binary search.
A best practice is to use FUZZ=0 when you use the < exclusion
operator with numeric formats.
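To see the fuzz factor at work, the following sketch defines the LEVELS. format from the text next to a hypothetical STRICT. format that differs only in FUZZ=0. With FUZZ=.2, the value 2.1 should format as B; with FUZZ=0 it matches no value, so no label is applied.

```sas
proc format;
   /* LEVELS. from the text, plus a hypothetical FUZZ=0 twin */
   value levels (fuzz=.2) 1='A' 2='B' 3='C';
   value strict (fuzz=0)  1='A' 2='B' 3='C';
run;

data _null_;
   a = put(2.1, levels.);   /* within .2 of 2, so it matches 2='B' */
   b = put(2.1, strict.);   /* no fuzz: 2.1 matches no value       */
   put a= b=;
run;
```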
LANGUAGE=
specifies the language that is used for weekdays and months that you can
substitute in a date, time, or datetime picture. If you specify a language that is
not supported or is invalid, English is used.
Tip To use a user-defined format in languages other than those that are
supported by the LANGUAGE= option, set the LOCALE= system
option to the locale for the language. In PROC FORMAT, do not
specify the LANGUAGE= option. The language of a picture format is
determined by the locale setting. For a list of locales, see the LOCALE=
system option in SAS National Language Support (NLS): Reference Guide.
See “DFLANG= System Option: UNIX, Windows, and z/OS” in SAS National
Language Support (NLS): Reference Guide
MAX=length
specifies a maximum length for the format. When you associate the format with
a variable, you cannot specify a width greater than the MAX= value.
Default 40
Range 1–32767
MIN=length
specifies a minimum length for the format.
Default 1
Range 1–32767
MULTILABEL
enables the assignment of labels to multiple values-or-range values that might
have the same or overlapping values. The label is the formatted value that is
determined by the picture definition on the right of the equal sign in a
value-range-set.
The following PICTURE statements show the two uses of the MULTILABEL
option. In each case, number formats are assigned as labels. The first PICTURE
statement assigns multiple labels to a single value. Multiple labels can also be
assigned to a single range of values. The second PICTURE statement assigns
labels to overlapping ranges of values. The MULTILABEL option enables the
assignment of multiple labels to the overlapped values.
picture abc (multilabel)
   1000='9,999'
   1000='9999';

picture overlap (multilabel)   /* hypothetical name */
   /* without decimals */
   0-999='999'
   /* with decimals */
   0-9='9.999'
   10-99='99.99'
   100-999='999.9';
The primary label for a given entry is the formatted value (based on the picture)
that is assigned to the first value or range-of-values (left side of the equal sign)
that matches or contains the entry when all values (on the left side of the equal
sign) are ordered sequentially. Here is an example:
- In the first PICTURE statement, the primary label for 1000 is 1,000 because the picture 9,999 is the first picture that is assigned to 1000. The secondary label for 1000 is 1000, based on the 9999 picture.
- In the second PICTURE statement, the primary label for 5 is 5.000 based on the 9.999 picture that is assigned to the range 0–9, because 0–9 is sequentially the first range of values that contains 5. The secondary label for 5 is 005 because the range 0–999 occurs in sequence after the range 0–9.
Unless you use the NOTSORTED option when you assign value-range-sets, SAS
stores the value-range-sets in sorted order. This order can produce unexpected
results when the value-range-sets of a MULTILABEL format are processed.
Here is an example:
In the second PICTURE statement, the primary label for 15 is 015, and the
secondary label for 15 is 15.00 because the range 0–999 occurs in sequence
before the range 10–99. If you want the primary label for 15 to use the 99.99
format, then you might want to change the range 10–99 to 0–99 in the PICTURE
statement. The range 0–99 occurs in sequence before the range 0–999 and
produces the desired result.
Restriction The maximum number of labels that can be created for a single
format or informat is 255.
MULTIPLIER=n
specifies a number to multiply the variable's value by before it is formatted. The
value of the MULTIPLIER= option depends both on the result of the
multiplication and on the digit selectors in the picture portion of the value-
range-set. For example, the following PICTURE statement creates the MILLION.
format, which formats the variable value 1600000 as $1.6M:
picture million low-high='09.9M'
(prefix='$' mult=.00001);
1600000 is first multiplied by .00001, which equals 16. Note that there is a digit
selector after the decimal. The value 16 is placed into the picture beginning on
the right. The value 16 overlays 09.9, and results in 01.6. Leading zeros are
dropped, and the final result is 1.6M.
If the value of low-high is equal to '000M', then the result would be 16M.
Alias MULT=
Default 10^n, where n is the number of digits after the first decimal point in
the picture. For example, suppose your data contains a value
123.456 and you want to print it using a picture of '999.999'. The
format multiplies 123.456 by 10^3 to obtain a value of 123456, which
results in a formatted value of 123.456.
Restriction The MULT= option is not valid when you use a function to format a
value.
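The arithmetic above can be checked with a short sketch, assuming a SAS 9.4 session. MILLION. is the format from the text; MILLINT. (the '000M' variant) and THREEDEC. (the '999.999' default-multiplier case) are hypothetical names for the other two pictures that the text describes.

```sas
proc format;
   /* MILLION. from the text: 1600000 * .00001 = 16, placed into 09.9 */
   picture million  low-high = '09.9M' (prefix='$' mult=.00001);
   /* Hypothetical: no digit selector after a decimal point */
   picture millint  low-high = '000M'  (mult=.00001);
   /* Hypothetical: default multiplier 10**3 for three decimal places */
   picture threedec low-high = '999.999';
run;

data _null_;
   a = put(1600000, million.);    /* $1.6M   */
   b = put(1600000, millint.);    /* 16M     */
   c = put(123.456, threedec.);   /* 123.456 */
   put a= b= c=;
run;
```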
NOEDIT
specifies that numbers are message characters rather than digit selectors. That
is, the format prints the numbers as they appear in the picture. For example, the
following PICTURE statement creates the MILES. format, which formats any
variable value greater than 1000 as >1000 miles:
picture miles 1-1000='0000'
1000<-high='>1000 miles'(noedit);
Restriction The NOEDIT option is not valid when you use a function to format
a value.
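A quick check of the MILES. format from the text, assuming a SAS 9.4 session. Values up to 1000 go through the digit selectors in '0000'; larger values print the NOEDIT picture literally.

```sas
proc format;
   /* MILES. from the text */
   picture miles 1-1000     = '0000'
                 1000<-high = '>1000 miles' (noedit);
run;

data _null_;
   small = put(500, miles.);    /* digit selectors: 500         */
   large = put(1500, miles.);   /* NOEDIT picture: >1000 miles  */
   put small= large=;
run;
```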
NOTSORTED
stores values or ranges in the order in which you define them. If you do not
specify NOTSORTED, then values or ranges are stored in sorted order by
default, and SAS uses a binary searching algorithm to locate the range that a
particular value falls into. If you specify NOTSORTED, then SAS searches each
range in the order in which you define them until a match is found.
SAS automatically sets the NOTSORTED option when you use the CPORT and
CIMPORT procedures to transport informats or formats between operating
environments with different standard collating sequences. This automatic
setting of NOTSORTED can occur when you transport informats or formats
between ASCII and EBCDIC operating environments. If this situation is
undesirable, then do the following:
- Use the CNTLOUT= option in the PROC FORMAT statement to create an output control data set.
- Use the CPORT procedure to create a transport file for the control data set.
- In the target operating environment, use PROC FORMAT with the CNTLIN= option to build the formats and informats from the imported control data set.
PREFIX='prefix'
specifies a character prefix to place in front of the formatted value. The prefix is
placed in front of the value's first significant digit. You must use zero-digit
selectors or the prefix is not used.
Typical uses for PREFIX= are printing leading currency symbols and minus signs.
For example, the PAY format prints the variable value 25500 as $25,500.00:
picture pay
low-high='000,009.99' (prefix='$');
Default no prefix
Restriction The PREFIX= option is not valid when you use a function to format a
value.
Interaction If you use the FILL= and PREFIX= options in the same picture, then
the format places the prefix and then the fill characters.
CAUTION If the picture is not wide enough to contain both the value and the
prefix, then the format truncates or omits the prefix, which results
in inaccurate data.
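The PAY. format from the text can be checked as follows, assuming a SAS 9.4 session. The second call is an added illustration: with a small value, the unused zero-digit selectors and the comma to their left are suppressed, and the prefix still lands directly in front of the first significant digit.

```sas
proc format;
   /* PAY. from the text */
   picture pay low-high = '000,009.99' (prefix='$');
run;

data _null_;
   a = put(25500, pay.);   /* $25,500.00                  */
   b = put(5, pay.);       /* $5.00, with leading blanks  */
   put a= b=;
run;
```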
ROUND
rounds the value to the nearest integer before formatting. Without the ROUND
option, the format multiplies the variable value by the multiplier, truncates the
decimal portion (if any), and prints the result according to the template that you
define. With the ROUND option, the format multiplies the variable value by the
multiplier, rounds that result to the nearest integer, and then formats the value
according to the template. Note that if the FUZZ= option is also specified, the
rounding takes place after SAS has used the fuzz factor to determine which
range the value belongs to.
Tip The ROUND option rounds a value of .5 to the next highest integer.
CAUTION The picture must be wide enough for an additional digit if rounding
a number adds a digit to the number. For example, the picture for the
number .996 could be '99' (prefix='.' mult=100). After rounding the number
and multiplying it by 100, the resulting number is 100. When the picture is
applied, the result is .00, an inaccurate number. In order to ensure accuracy
of numbers when you round numbers, make the picture wide enough to
accommodate larger numbers.
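The following sketch contrasts truncation with ROUND, assuming a SAS 9.4 session. The option summary at the start of this statement lists ROUND with the options that follow the format name, so the sketch places it there. TRUNCPIC. and ROUNDPIC. are hypothetical names.

```sas
proc format;
   /* Hypothetical pair that differs only in the ROUND option */
   picture truncpic         low-high = '009';
   picture roundpic (round) low-high = '009';
run;

data _null_;
   t = put(2.5, truncpic.);   /* decimal portion truncated: 2 */
   r = put(2.5, roundpic.);   /* .5 rounds up: 3              */
   put t= r=;
run;
```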
value-range-set
specifies one or more variable values and a template for printing those values.
value-range-set has the form value-or-range='picture', where:
value-or-range
See “Specifying Values or Ranges” on page 1129.
picture
specifies a template for formatting values of numeric variables. The picture
is a sequence of characters in single quotation marks. The maximum length
for a picture is 40 characters. Pictures are specified with three types of
characters: digit selectors, message characters, and directives. You can have
a maximum of 16 digit selectors in a picture.
digit selectors
are numeric characters (0 through 9) that define positions for numeric
values. A picture format with nonzero-digit selectors prints any leading
zeros in variable values; picture digit selectors of 0 do not print leading
zeros in variable values. If the picture format contains digit selectors,
then a digit selector must be the first character in the picture.
message characters
are nonnumeric characters that are printed as specified in the picture.
The following PICTURE statement contains both digit selectors (99) and
message characters (illegal day value). Because the DAYS. format
has nonzero-digit selectors, values are printed with leading zeros. The
special range OTHER prints the message characters for any values that
do not fall into the specified range (1 through 31).
picture days
01-31='99'
other='99-illegal day value';
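The DAYS. format from the text can be tried as follows, assuming a SAS 9.4 session. The value 5 shows the leading zero that nonzero-digit selectors produce, and 40 falls into OTHER, so its digits are followed by the message characters.

```sas
proc format;
   /* DAYS. from the text */
   picture days 01-31 = '99'
                other = '99-illegal day value';
run;

data _null_;
   ok  = put(5, days.);    /* nonzero-digit selectors keep the leading zero */
   bad = put(40, days.);   /* OTHER range adds the message characters       */
   put ok= bad=;
run;
```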
directives
are special characters that you can use in the picture to format date,
time, or datetime values.
Note: You can use directives only when you specify the DATATYPE=
option on page 1100 in the PICTURE statement. Ensure that the value of
the DATATYPE= option is appropriate for the type of directive that you
want to use. If you use an inappropriate value, the data does not format.
For example, for the %a directive, use DATATYPE=DATE.
The DFLANG datetime handler and the National Language (NL) datetime
handler are the two methods to control character casing in user format
handling. The DFLANG datetime handler is the default method under
non–UTF8 and non–DBCS sessions. The NL datetime handler is the
default method under UTF8 and DBCS sessions.
Note: The DFLANG datetime handler is used if you specify it for UTF8
and DBCS sessions. Otherwise, the NL datetime handler is used.
If you specify the DFLANG datetime handler, then month names are in all
uppercase if you specify English (for example, JAN), they are in all
lowercase if you specify French (for example, jan), and they are in mixed
case if you specify German (for example, Jan).
If you specify the NL datetime handler, then month names are in mixed
case if you specify English or German (Jan), and they are in lowercase if
you specify French (for example, jan).
If you mainly use the EUR* datetime formats and you need support for
European languages only for compatibility, then you can use the
DFLANG= option to specify the language.
Use the LOCALE= system option to set the locale for double-byte and
UTF-8 character sets. The %b directive formats month names with only
the first letter in uppercase when the locale is set with LOCALE=. For
more information, see “LOCALE System Option” in SAS National
Language Support (NLS): Reference Guide.
%a
abbreviated weekday name (for example, Wed).
%A
full weekday name (for example, Wednesday).
%b
abbreviated month name (for example, JAN or Jan).
Tip For the English language, use the directive %3B to create an
abbreviated month with only an uppercase initial letter (for
example, Jan).
%<n>B
the full month name (for example, January) if n is not included in the
directive. n specifies the number of characters that appear for the
month name. In comparison, the %b directive writes a three-character
month abbreviation in uppercase letters for some locales.
%C
long month name with blank padding (January through December)
(for example, December).
%d
day of the month.
%e
day of the month as a two-character decimal number with leading
spaces (" 1"–"31") (for example, " 2").
%F
full weekday name with blank padding.
%G
year as a four-digit decimal number (for example, 2008). If the week
that contains January 1 has four or more days in the new year, then it
is considered week 1 in the new year. Otherwise, it is the last week of
the previous year and the year is considered the previous year.
%H
hour (24-hour clock).
%I
hour (12-hour clock).
Alias %i
%j
day of the year as a decimal number (1–366), with leading zero.
%m
month (1–12).
%M
minute (0–59).
%n
number of days in a duration as a decimal number (maximum of 10
digits) (for example, 25).
Restriction This directive is not valid for DBCS and Unicode SAS
sessions.
%o
month (1-12) with blank padding (for example, " 2").
%p
equivalent to either a.m. or p.m.
%q
abbreviated quarter of the year string such as 1, 2, 3, or 4.
%Q
quarter of the year string, such as Quarter1, Quarter2, Quarter3, or
Quarter4.
%s
fractional seconds as decimal digits (for example, .39555). The
number of digits formatted is the number of digits to the right of the
decimal point that is specified when you use the format. SAS rounds
fractional seconds to accommodate the number of digits specified for
fractional seconds.
Restriction This directive is not valid for DBCS and Unicode SAS
sessions.
%S
seconds (0–59), allowing for possible leap seconds.
%u
weekday as a one-digit decimal number (1–7 (Monday - Sunday)) (for
example, Sunday=7).
%U
week number of the year as a decimal number (0–53). Sunday is
considered the first day of the week.
%V
week number (01–53) with the first Monday as the start day of the
first week. Minimum days of the first week is 4.
%w
weekday as a one-digit decimal number (0–6 (Sunday through
Saturday)) (for example, Sunday=0).
%W
week number (0–53) with the first Monday as the start day of the first
week.
%y
year without century (0–99) (for example, 93).
%Y
year with century as a four-digit decimal number (1970–2069) (for
example, 1994).
%z
UTC time-zone offset.
%Z
time-zone name.
%%
the % character.
Tip Add code to your program to direct how you want missing values to be
displayed.
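Several directives can be combined in one picture. This sketch assumes a SAS 9.4 session with the default English settings. STAMP. is a hypothetical format name, and the 0 after each % requests zero padding, a modifier that the directive list above does not describe, so confirm it for your release.

```sas
proc format;
   /* Hypothetical format built from %Y, %m, %d, %H, %M, and %S; */
   /* DATATYPE=DATETIME matches the mix of date and time parts   */
   picture stamp
      low-high = '%Y-%0m-%0d %0H:%0M:%0S' (datatype=datetime);
run;

data _null_;
   s = put('14JUL2017:09:05:03'dt, stamp.);
   put s=;   /* 2017-07-14 09:05:03 */
run;
```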
Details
This program creates, sorts, and prints the sample data set:
data sample;
input Amount;
datalines;
-2.051
-.05
-.017
0
.093
.54
.556
6.6
14.63
0.996
-0.999
-45.00
;
run;
Here is the PROC FORMAT step that creates the NOZEROSR. and NOZEROS.
formats. Both formats eliminate leading zeros in the formatted values. The
NOZEROSR. format specifies the ROUND option to round numbers. The NOZEROS.
format does not perform rounding.
libname library 'SAS-library';
proc format;
run;
The following table explains how one value from each range is formatted. For an
illustration of each step, see Table 30.74 on page 1117.
Step 1: Determine into which range the value falls and use that picture.
In the second range, the exclusion operator < appears on both sides of the hyphen and excludes −1 and -.99 from the range. The third range excludes 0 and .99. The fourth range excludes 1. Because exclusion operators are used, the FUZZ=0 option is specified.

Step 2: Take the absolute value of the numeric value.
Because the absolute value is used, you need a separate range and picture for the negative numbers in order to prefix the minus sign.

Step 4: If the number is within 10^-8 of a higher integer, round the number up. This operation is performed before the ROUND option is performed. The ROUND option is in effect. The format rounds the number after the decimal to the next highest integer if the number after the decimal is greater than or equal to .5.
Because the example uses MULT= values that ensured that all of the significant digits were moved to the left of the decimal, no significant digits are lost. The zeros are truncated. 205.1 is rounded to 205. 55.6 is rounded up to 56. 99.6 is rounded up to 100. Rounding is not performed on 5 and 660.

Step 5: Turn the number into a character string. If the number is shorter than the picture, then the length of the character string is equal to the number of digit selectors in the picture. Pad the character string with leading zeros. (The results are equivalent to using the Zw. format. Zw. is explained in the section on SAS formats in SAS Formats and Informats: Reference.)
205 becomes the character string 00205. 5 becomes the character string 05. 56 becomes the character string 56. 100 becomes the character string 100. 660 becomes the character string 0660. When the picture is longer than the numbers, the format adds a leading zero to the value. The format does not add leading zeros to the character strings 56 and 100 because the corresponding picture has the same number of selectors.

Step 7: Prefix any characters that are specified in the PREFIX= option. You need the PREFIX= option because when a picture contains any digit selectors, the picture must begin with a digit selector. Thus, you cannot begin your picture with a decimal point, minus sign, or any other character that is not a digit selector.
The PREFIX= option reclaims the decimal point and the negative sign, as shown with the formatted values -2.05, -.05, and .56.
Step 1, range:        low – –1 | –0.99 < – < 0 | 0 < – < .99 | 0.99 – < 1 | 1 – high
Step 4a, no rounding: 205      | 5             | 55          | 99         | 660
The following PROC PRINT steps associate the NOZEROSR. format and the
NOZEROS. format with the AMOUNT variable in SAMPLE. The first output shows
the result of rounding.
proc print data=sample;
format amount nozerosr.;
title 'Formatting the Variable Amount';
title2 'with the NOZEROSR. Format Using Rounding';
run;
CAUTION
The picture must be wide enough for the prefix and the numbers. In this
example, if the value –45.00 were formatted with NOZEROS., then the result would be
45.00 because it falls into the first range, low – –1, and the picture for that range is not
wide enough to accommodate the prefixed minus sign and the number.
CAUTION
The picture must be wide enough for an additional digit if rounding a number
adds a digit to the number. For example, the picture for the number .996 could be
'99' (prefix='.' mult=100). After rounding the number and multiplying it by 100, the
resulting number is 100. When the picture is applied, the result is .00, an inaccurate
number. In order to ensure accuracy of numbers when you round numbers, make the
picture wide enough to accommodate larger numbers.
Specifying No Picture
This PICTURE statement creates a picture-name format that has no picture:
picture picture-name;
Using this format has the effect of applying the default SAS format to the values.
SELECT Statement
Selects entries for processing by the FMTLIB and CNTLOUT= options.
Restrictions: Only one SELECT statement can appear in a PROC FORMAT step.
You cannot use a SELECT statement and an EXCLUDE statement within the same
PROC FORMAT step.
Example: “Example 15: Printing the Description of Informats and Formats” on page 1183
Syntax
SELECT entry(s);
Required Argument
entry(s)
specifies one or more catalog entries for processing. Catalog entry names are
the same as the name of the informat or format that they store. Because
informats and formats can have the same name, and because character and
numeric informats or formats can have the same name, you must use certain
prefixes when specifying informats and formats in the SELECT statement.
Follow these rules when specifying entries in the SELECT statement:
n Precede names of entries that contain character formats with a dollar sign
($).
Details
In addition, the following SELECT statement selects all formats or informats that
occur alphabetically between apple and pear, inclusive:
select apple-pear;
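As a sketch of the prefix rules, assuming the WORK library and three hypothetical entries ($REGION., AGEGRP., and GRADE.) that exist only for this illustration, the following steps print FMTLIB descriptions for one character format and one numeric informat. Informat entries take an at sign (@), the same prefix that FMTLIB output shows for informat names.

```sas
/* Hypothetical formats and an informat, created only for this sketch */
proc format;
   value $region 'N'='North' 'S'='South';
   value agegrp low-<18='Minor' 18-high='Adult';
   invalue grade 'A'=4 'B'=3 'C'=2;
run;

/* Describe only $REGION. (character format, $ prefix) and
   GRADE. (numeric informat, @ prefix); AGEGRP. is not printed */
proc format library=work fmtlib;
   select $region @grade;
run;
```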
VALUE Statement
Creates a format that specifies character strings to use to print variable values.
See: SAS Formats and Informats: Reference for documentation about SAS formats.
Examples: “Example 8: Creating a Format for Character Values” on page 1160
“Example 11: Writing a Format for Dates Using a Standard SAS Format and a Color
Background” on page 1167
“Example 17: Writing Ranges for Character Strings” on page 1187
Syntax
VALUE <$>name <(format-options)> <value-range-set(s)>;
Required Argument
name
names the format that you are creating. If you created a function using the
FCMP procedure to use as a format, name is the function name without
parentheses.
Requirement The name must be a valid SAS name. A numeric format name can
be up to 32 characters in length. A character format name can be
up to 31 characters in length. If you are creating a character format,
then use a dollar sign ($) as the first character.
Tip Refer to the format later by using the name followed by a period.
However, do not use a period after the format name in the VALUE
statement.
Optional Arguments
DEFAULT=length
specifies the default length of the format. The value for DEFAULT= becomes the
length of the format if you do not give a specific length when you associate the
format with a variable.
Default The length of the longest label that is assigned to the right of the equal
sign
Range 1–32767
Tip As a best practice, always specify the DEFAULT= option if you specify
a format as a label.
FUZZ=fuzz-factor
specifies a fuzz factor for matching values to a range. If a number does not
match or fall in a range exactly but comes within fuzz-factor, then the format
considers it a match. For example, the following VALUE statement creates the
LEVELS. format, which uses a fuzz factor of .2:
value levels (fuzz=.2) 1='A'
2='B'
3='C';
FUZZ=.2 means that if a variable value falls within .2 of a value on either end of
the range, then the format uses the corresponding formatted value to print the
variable value. So the LEVELS. format formats the value 2.1 as B.
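A minimal sketch that applies the LEVELS. format shown above to a few values; the test values are illustrative only. Each value lies within .2 of 1, 2, 2, and 3 respectively, so the step should write A, B, B, and C.

```sas
proc format;
   value levels (fuzz=.2) 1='A'
                          2='B'
                          3='C';
run;

data _null_;
   /* Each value is within the fuzz factor of one of the values 1, 2, 3 */
   do x=0.9, 1.9, 2.1, 3.1;
      put x= x levels.;
   end;
run;
```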
Tips Specify FUZZ=0 to save storage space when you use the VALUE
statement to create numeric formats.
Use a nonzero fuzz factor only with numbers that are very close but not
an exact match. Ranges are stored internally in sorted order (unless the
NOTSORTED option is used) so that SAS can perform a binary search. When
a fuzz factor is added to the end of one range and subtracted from the
beginning of the next range, and the ranges overlap, the results can be
unpredictable: a value is placed in the first range that matches in the
binary search, and the exclusion operator cannot override this binary
search algorithm. As a best practice, when you use the < exclusion
operator with numeric formats, specify FUZZ=0 or the NOTSORTED option.
MAX=length
specifies a maximum length for the format. When you associate the format with
a variable, you cannot specify a width greater than the MAX= value.
Default 40
Range 1–32767
MIN=length
specifies a minimum length for the format.
Default 1
Range 1–32767
MULTILABEL
enables the assignment of multiple labels or external values to internal values.
The following VALUE statements show the two uses of the MULTILABEL
option. The first VALUE statement assigns multiple labels to a single internal
value. Multiple labels can also be assigned to a single range of internal values.
The second VALUE statement assigns labels to overlapping ranges of internal
values. The MULTILABEL option allows the assignment of multiple labels to the
overlapped internal values.
value one (multilabel)
1='ONE'
1='UNO'
1='UN';
The primary label for a given entry is the external value that is assigned to the
first internal value or range of internal values that matches or contains the entry
when all internal values are ordered sequentially. Here is an example:
n In the first VALUE statement, the primary label for 1 is ONE because ONE is
the first external value that is assigned to 1. The secondary labels for 1 are
UNO and UN.
n In the second VALUE statement, the primary label for 33 is 25 to 39
because the range 25–39 is sequentially the first range of internal values
that contains 33. The secondary label for 33 is between 30 and 50 because
the range 30–50 occurs in sequence after the range 25–39.
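A minimal sketch of the second use, with overlapping ranges. The names AGEBANDS. and PEOPLE and the data values are hypothetical. An age of 33 has the primary label 25 to 39 and the secondary label between 30 and 50. Secondary labels are used only by procedures that support multilabel processing, such as PROC MEANS with the MLF option in the CLASS statement.

```sas
/* Hypothetical format with overlapping ranges */
proc format;
   value agebands (multilabel)
      25-39='25 to 39'
      30-50='between 30 and 50';
run;

data people;
   input Age Income @@;
   datalines;
27 41000 33 52000 45 61000
;
run;

/* MLF lets the CLASS variable use every label, so Age=33
   contributes to both groups */
proc means data=people n mean;
   class Age / mlf;
   var Income;
   format Age agebands.;
run;
```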
Restriction The maximum number of labels that can be created for a single
format is 255.
NOTSORTED
stores values or ranges in the order in which you define them. If you do not
specify NOTSORTED, then values or ranges are stored in sorted order by
default, and SAS uses a binary searching algorithm to locate the range that a
particular value falls into. If you specify NOTSORTED, then SAS searches each
range in the order in which you define them until a match is found.
SAS automatically sets the NOTSORTED option when you use the CPORT and
CIMPORT procedures to transport formats between operating environments
with different standard collating sequences. This automatic setting of
NOTSORTED can occur when you transport formats between ASCII and
EBCDIC operating environments. If this situation is undesirable, then do the
following:
n Use the CNTLOUT= option in the PROC FORMAT statement to create an
output control data set.
n Use the CPORT procedure to create a transport file for the control data set.
value-range-set(s)
specifies the assignment of a value or a range of values to a formatted value.
The value-range-set(s) have the following form:
value-or-range-1 <..., value-or-range-n>='formatted-value' | [existing-format] | [functionname()]
The variable values on the left side of the equal sign print as the character
string on the right side of the equal sign. The maximum length of each value-or-
range to the left of the equal sign is 32,767 characters.
value-or-range
For details about how to specify value-or-range, see “Specifying Values or
Ranges” on page 1129.
formatted-value
specifies a character string that becomes the printed value of the variable
value that appears on the left side of the equal sign. Formatted values are
always character strings, regardless of whether you are creating a character
or numeric format.
existing-format
specifies a format that is supplied by SAS or an existing user-defined format.
The format that you are creating uses the existing format to convert the raw
data that is a match for value-or-range on the left side of the equal sign.
Requirement If you use an existing format, then enclose the format name in
square brackets (for example, [date9.]) or with parentheses and
vertical bars (for example, (|date9.|)). Do not enclose the name
of the existing format in single quotation marks.
Tips Avoid nesting formats more than one level. The resource
requirements can increase dramatically with each additional
level.
functionname
specifies a function that is supplied by SAS or an existing user-defined
function.
Examples
The variable value Delaware prints as DE, the variable value Florida prints as FL,
and the variable value Ohio prints as OH. Note that the $STATE. format begins with
a dollar sign.
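A VALUE statement consistent with this description might look like the following. It is reconstructed from the prose, so treat its exact layout as an assumption rather than the original listing.

```sas
proc format;
   /* Character format: the name begins with a dollar sign */
   value $state 'Delaware'='DE'
                'Florida'='FL'
                'Ohio'='OH';
run;
```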
Note: Range specifications are case sensitive. In the $STATE. format above, the
value OHIO would not match any of the specified ranges. If you are not certain what
case the data values are in, then one solution is to use the UPCASE function on the
data values and specify all uppercase characters for the ranges.
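A sketch of the UPCASE approach, using a hypothetical format named $STATEU. whose ranges are all uppercase. Because each data value is converted with the UPCASE function before the format is applied, Ohio, OHIO, and ohio should all print as OH.

```sas
/* Hypothetical format whose ranges are all uppercase */
proc format;
   value $stateu 'DELAWARE'='DE'
                 'FLORIDA'='FL'
                 'OHIO'='OH';
run;

data _null_;
   length State $ 20;
   input State $;
   /* Uppercase the data value before applying the format */
   Abbrev=put(upcase(State), $stateu.);
   put State= Abbrev=;
   datalines;
Ohio
OHIO
ohio
;
run;
```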
data test;
do Date='01jan2006'd to '31dec2013'd;
do j=1 to rannor(0)*100;
output;
end;
end;
run;
proc format;
value MYfmt
/* Format dates on or before 31DEC2011 using only a year. */
low-'31DEC2011'd=[year4.]
/* Format dates 01JAN2013 and beyond using the day, month, and
year. */
'01JAN2013'd-high=[date9.];
run;
The INVALUE, PICTURE, and VALUE statements accept numeric values on the left
side of the equal sign. In character informats, numeric ranges are treated as
character strings. INVALUE and VALUE also accept character strings on the left
side of the equal sign.
As the syntax shows, you can have multiple occurrences of value-or-range in each
value-range-set, using a comma to separate the occurrences. Each occurrence of
value-or-range is one of the following:
value
a single value, such as 12 or 'CA'. For character formats and informats, enclose
the character values in single quotation marks.
You can use the keyword OTHER= as a single value. OTHER= matches all values
that do not match any other value or range. OTHER= includes missing values for
both numeric and character user-defined formats. You cannot nest a user-
defined format by using the format as the value of OTHER=, unless the format is
a function that formats values.
If you specify a format that is too narrow to represent a value, then SAS tries to
fit the longer label into the available space. SAS truncates character values on
the right, and it sometimes reverts numeric values to the BESTw.d format. If you
do not specify an adequate width for a format, then SAS prints asterisks in the
output. In this example, the only label is the two-character string DK, so the
default width of the FMTZIP. format is 2. That width is not adequate for the
unmatched five-digit value 27609, and SAS prints two asterisks for it in the output.
proc format;
value fmtzip 99999="DK";
run;
data testzip;
zip=27609;
output;
zip=99999;
output;
run;
/* FMTZIP. has a default width of 2, the length of "DK" */
proc print data=testzip;
format zip fmtzip.;
run;
One way to prevent this problem from occurring is to assign a width when you
apply the format, as in this example.
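proc format;
value fmtzip 99999="DK";
run;
data testzip;
zip=27609;
output;
zip=99999;
output;
run;
/* Request a width of 5 when the format is applied */
proc print data=testzip;
format zip fmtzip5.;
run;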
Specifying the width in the statement format Zip fmtzip5.; provides enough space for
27609 to appear correctly in the first observation of the column labeled “Zip” in
the output table.
You can also prevent the problem by specifying the DEFAULT= or MIN= option
when you create the format, as in this example.
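proc format;
value fmtzip (default=5) 99999="DK";
run;
data testzip;
zip=27609;
output;
zip=99999;
output;
run;
/* DEFAULT=5 makes the default width of FMTZIP. equal to 5 */
proc print data=testzip;
format zip fmtzip.;
run;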
Note: If you add blank spaces to a specified value, be sure to add enough blank
spaces to accommodate the longest value that the format might handle.
For more information about how SAS formats widths, see “Syntax” in SAS
Formats and Informats: Reference.
For more information about format values, see “Using a Function to Format
Values” on page 1132. For examples, see “Example 8: Creating a Format for
Character Values” on page 1160 and “Example 20: Creating a Function to Use as
a Format” on page 1198.
range
a list of values (for example, 12-68 or 'A'-'Z'). For ranges with character
strings, be sure to enclose each string in single quotation marks. For example, if
you want a range that includes character strings from A to Z, then specify the
range as 'A'-'Z', with single quotation marks around the A and around the Z.
You can use LOW or HIGH as one value in a range, and you can use the range
LOW-HIGH to encompass all values. For example, the following are valid
ranges:
low-'ZZ'
35-high
low-high
other
In numeric ranges, LOW includes the lowest numeric value, excluding missing
values. HIGH includes the largest value in the range. In character ranges, LOW
includes missing values. OTHER includes missing values for both numeric and
character formats.
You can use the less than (<) symbol to exclude values from ranges. If you are
excluding the first value in a range, then put the < exclusion operator after the
value. If you are excluding the last value in a range, then put the < exclusion
operator before the value. For example, the following range does not include 0:
0<-100
TIP When you use the < exclusion operator to place values in ranges,
use the option FUZZ=0 in the VALUE statement for numeric formats.
This is not necessary for character formats because FUZZ=0 is the
default.
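A minimal sketch of the < exclusion operator with FUZZ=0, using a hypothetical format named SIGNFMT. The value 0 is excluded from both the low range and the high range, so it matches only the single value 0, and the step should write Negative, Zero, and Positive.

```sas
proc format;
   value signfmt (fuzz=0)
      low-<0  ='Negative'   /* 0 is excluded (< before the last value) */
      0       ='Zero'
      0<-high ='Positive';  /* 0 is excluded (< after the first value) */
run;

data _null_;
   do x=-5, 0, 0.5;
      put x= x signfmt.;
   end;
run;
```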
If a value at the high end of one range also appears at the low end of another
range, and you do not use the < exclusion operator, then PROC FORMAT assigns
the value to the first range. For example, in the following ranges, the value AJ is
part of the first range:
'AA'-'AJ'=1 'AJ'-'AZ'=2
In this example, to include the value AJ in the second range, use the < exclusion
operator on the first range:
'AA'-<'AJ'=1 'AJ'-'AZ'=2
If you overlap values in ranges, then PROC FORMAT returns an error message
unless, for the VALUE statement, the MULTILABEL option is specified. For
example, the following ranges will cause an error: 'AA'-'AK'=1 'AJ'-'AZ'=2.
Note: You do not have to account for every value on the left side of the equal sign.
Those values are converted using the default informat or format. For example, the
following VALUE statement creates the TEMP. format, which prints all occurrences
of 98.6 as NORMAL:
value temp 98.6='NORMAL';
If the value were 96.9, then the printed result would be 96.9.
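The TEMP. format can be checked with a short DATA step. The variable name t and the test values are illustrative only. The step should print NORMAL for 98.6 and 96.9 for the unmatched value.

```sas
proc format;
   value temp 98.6='NORMAL';
run;

data _null_;
   do t=98.6, 96.9;
      /* 98.6 matches the only value; 96.9 is not accounted for,
         so it is written with the default numeric format */
      put t= t temp.;
   end;
run;
```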
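To create and use a function to format values, define the function with the FCMP
procedure, identify the function library with the CMPLIB= system option, and then
specify the function name in square brackets in a VALUE statement. Here is an example: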
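/* Create a function that returns the value Qx from a date. */
/* The package name DATES is arbitrary. */
proc fcmp outlib=work.functions.dates;
function qfmt(date) $;
length qnum $4;
/* YYQ4. writes a date such as 15MAR2013 as 13Q1 */
qnum=put(date,yyq4.);
if substr(qnum,3,1)='Q'
then return(substr(qnum,3,2));
else return(qnum);
endsub;
run;
/* Tell SAS where to find the compiled function */
options cmplib=(work.functions);
proc format;
value qfmt
other=[qfmt()];
run;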
data djia2013;
input closeDate date7. close;
datalines;
01jan13 800.86
02feb13 7062.93
02mar13 7608.92
01apr13 8168.12
01may13 8500.33
01jun13 8447.00
01jul13 9171.61
03aug13 9496.28
01sep13 9712.28
01oct13 9712.73
02nov13 10344.84
02dec13 10428.05
;
run;
proc print data=djia2013;
format closeDate qfmt.;
run;
2 Open the format folder. To view the default format folder, expand Libraries ►
Work and select Formats.
3 To view the format description, do one of the following actions on the format
name in the contents pane:
n double-click the format name
create a transfer file of the data set. Then, use the CIMPORT and FORMAT
procedures in the target operating environment to create the formats and informats
there.
You create an output control data set with the CNTLOUT= option in the PROC
FORMAT statement. You use output control data sets, or a set of observations
from an output control data set, as an input control data set in a subsequent PROC
FORMAT step using the CNTLIN= option.
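A minimal round-trip sketch, assuming a hypothetical format named YN., a control data set named WORK.YNCTL, and a target catalog named WORK.NEWCAT. The first step writes one observation per range to the output control data set, and the last step uses that data set as an input control data set.

```sas
/* Hypothetical format, created only to produce a control data set */
proc format cntlout=work.ynctl;
   value yn 1='Yes' 0='No';
run;

/* Each range of YN. becomes one observation */
proc print data=work.ynctl;
   var fmtname start end label type;
run;

/* Re-create YN. from the control data set in a different catalog */
proc format library=work.newcat cntlin=work.ynctl;
run;
```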
Output control data sets contain an observation for every value or range in each of
the informats or formats in the LIBRARY= catalog. The data set consists of
variables that give either global information about each format and informat
created in the PROC FORMAT step or specific information about each range and
value.
H
specifies that a range's ending value is HIGH.
I
specifies a numeric informat range.
J
specifies justification for an informat.
L
specifies that a range's starting value is LOW.
M
specifies that the MULTILABEL option is in effect.
N
specifies that the format or informat has no ranges, including no OTHER=
range.
O
specifies that the range is OTHER.
R
specifies that the ROUND option is in effect.
S
specifies that the NOTSORTED option is in effect.
U
specifies that the UPCASE option for an informat be used.
LABEL
specifies a character variable whose value is associated with a format or an
informat.
LANGUAGE
specifies the language that is used for weekdays and months that you can
substitute in a date, time, or datetime picture. If you specify a language that is
not supported or is invalid, English is used.
LENGTH
specifies a numeric variable whose value is the value of the LENGTH= option.
MAX
specifies a numeric variable whose value is the value of the MAX= option.
MIN
specifies a numeric variable whose value is the value of the MIN= option.
MULT
specifies a numeric variable whose value is the value of the MULT= option.
NOEDIT
for picture formats, specifies a numeric variable whose value indicates whether
the NOEDIT option is in effect. Valid values are as follows:
1
specifies that the NOEDIT option is in effect.
0
specifies that the NOEDIT option is not in effect.
PREFIX
for picture formats, specifies a character variable whose value is the value of
the PREFIX= option.
SEXCL
specifies a character variable that indicates whether the range's starting value is
excluded. Valid values are as follows:
Y
specifies that the range's starting value is excluded.
N
specifies that the range's starting value is not excluded.
START
specifies a character variable that gives the range's starting value.
TYPE
specifies a character variable that indicates the type of format. Possible values
are as follows:
C
specifies a character format.
I
specifies a numeric informat.
J
specifies a character informat.
N
specifies a numeric format (excluding pictures).
P
specifies a picture format.
This table specifies TYPE values for creating formats and informats using PROC
FORMAT and the CNTLIN option. For example, in Scenario 1 if START=Numeric and
LABEL=Numeric, then TYPE=I and you can use the INPUT function to create a
numeric column. See “Example 13: Creating a Format from a CNTLIN= Data Set” for
an example of creating a format using PROC FORMAT and the CNTLIN option.
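As a sketch of the TYPE=I case, the following steps build a hypothetical numeric informat named GRADE. from an input control data set. END is omitted, on the assumption that each range then ends at its START value. Reading the raw value B with the INPUT function should return 3.

```sas
/* Hypothetical input control data set for a numeric informat */
data work.gradectl;
   length FmtName $ 8 Start $ 1 Label $ 1 Type $ 1;
   retain FmtName 'grade' Type 'I';
   input Start $ Label $;
   datalines;
A 4
B 3
C 2
;
run;

proc format cntlin=work.gradectl;
run;

data _null_;
   /* Use the new informat with the INPUT function */
   g=input('B', grade.);
   put g=;
run;
```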
The following output shows an output control data set that contains information
about all the informats and formats created in the FORMAT procedure examples.
Output 30.2 Output Control Data Set for PROC FORMAT Examples
You can use the SELECT or EXCLUDE statement to control which formats and
informats are represented in the output control data set. For details, see “SELECT
Statement” on page 1120 and “EXCLUDE Statement” on page 1091.
If you specify START='LOW', and the HLO variable does not contain 'L', then the
literal value of LOW is used. If you specify START='OTHER', and the HLO variable
does not contain 'O', then the literal value of OTHER is used. If you specify
END='HIGH', and the HLO variable does not contain 'H', then the literal value of
HIGH is used.
You can create more than one format from an input control data set if the
observations for each format are grouped together.
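A sketch of one input control data set that defines two hypothetical character formats, $SIZEF. and $COLORF. The observations for each format are grouped together. There is no TYPE variable, on the assumption that the leading dollar sign in FmtName is enough to mark each format as character.

```sas
data work.twofmts;
   length FmtName $ 8 Start $ 1 Label $ 8;
   input FmtName $ Start $ Label $;
   datalines;
$sizef S Small
$sizef L Large
$colorf R Red
$colorf G Green
;
run;

/* Both formats are created in a single PROC FORMAT step */
proc format cntlin=work.twofmts;
run;
```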
You can use a VALUE, INVALUE, or PICTURE statement in the same PROC
FORMAT step with the CNTLIN= option. If the VALUE, INVALUE, or PICTURE
statement is creating the same informat or format that the CNTLIN= option is
creating, then the VALUE, INVALUE, or PICTURE statement creates the informat or
format and the CNTLIN= data set is not used. However, you can create an informat
or format with VALUE, INVALUE, or PICTURE and create a different informat or
format with CNTLIN= in the same PROC FORMAT step.
For an example featuring an input control data set, see “Example 13: Creating a
Format from a CNTLIN= Data Set” on page 1173.
Procedure Output
The FORMAT procedure prints output only when you specify the FMTLIB option or
the PAGE option in the PROC FORMAT statement. The printed output is a table for
each format or informat entry in the catalog that is specified in the LIBRARY=
option. The output also contains global information and the specifics of each value
or range that is defined for the format or informat. You can use the SELECT or
EXCLUDE statement to control which formats and informats are represented in the
FMTLIB output. For details, see “SELECT Statement” on page 1120 and “EXCLUDE
Statement” on page 1091. For an example, see “Example 15: Printing the Description
of Informats and Formats” on page 1183.
The FMTLIB output shown in the following output contains a description of the
$CITY. format, which is created in “Example 8: Creating a Format for Character
Values” on page 1160, and the EVALUATION. informat, which is created in
“Example 12: Converting Raw Character Data to Numeric Values” on page 1170.
Output 30.3 Output from PROC FORMAT with the FMTLIB Option
The fields are described below in the order in which they appear in the output, from
left to right:
INFORMAT NAME or FORMAT NAME
the name of the informat or format. Informat names begin with an at-sign (@).
LENGTH
the length of the informat or format. PROC FORMAT determines the length in
the following ways:
n For character informats, the value for LENGTH is the length of the longest
raw data value on the left side of the equal sign.
n For numeric informats, the following is true:
o LENGTH is 12 if all values on the left side of the equal sign are numeric.
o Otherwise, LENGTH is the same as the longest raw data value on the left side of the
equal sign.
n For formats, the value for LENGTH is the length of the longest value on the
right side of the equal sign.
In the output for $CITY., the LENGTH is 14 because the longest label is 14
characters.
In the output for @EVALUATION., the length is 1 because 1 is the longest raw
data value on the left side of the equal sign.
NUMBER OF VALUES
the number of values or ranges associated with the informat or format.
NOZEROS. has 4 ranges, and EVAL. has 5.
MIN LENGTH
the minimum length of the informat or format. The value for MIN LENGTH is 1
unless you specify a different minimum length with the MIN= option.
MAX LENGTH
the maximum length of the informat or format. The value for MAX LENGTH is
40 unless you specify a different maximum length with the MAX= option.
DEFAULT LENGTH
the length of the longest value in the INVALUE or LABEL field, or the value of
the DEFAULT= option.
FUZZ
the fuzz factor. For informats, FUZZ is always 0. For formats, the value for this
field is STD if you do not use the FUZZ= option. STD signifies the default fuzz
value.
START
the beginning value of a range. FMTLIB prints only the first 16 characters of a
value in the START and END columns.
END
the ending value of a range. The exclusion sign (<) appears after the values in
START and END, if the value is excluded from the range.
INVALUE
appears only for informats and contains the values that have informats. The
SAS version specifies the version with which the informat is compatible. The date
indicates the date on which the informat was created.
Note: If SAS displays version numbers V7 | V8, then the informat is compatible
with those versions. If it is not compatible with earlier releases, the release that
created the informat is shown. Version V9 supports long informat names (more
than eight characters), and V7 | V8 do not.
LABEL
LABEL appears only for formats and contains either the formatted value or
picture. The SAS version specifies the version with which the format is compatible.
The date indicates the date on which the format was created.
Note: If SAS displays version numbers V7 | V8, then the format is compatible
with those versions. If it is not compatible with earlier releases, the release that
created the format is shown. Version V9 supports long format names (more
than eight characters), and V7 | V8 do not.
For picture formats, such as NOZEROS., the LABEL section contains the
PREFIX=, FILL=, and MULT= values. To note these values, FMTLIB prints the
letters P, F, and M to represent each option, followed by the value. For example,
in the LABEL section, P-. indicates that the prefix value is a hyphen followed by
a period.
FMTLIB prints only 40 characters in the LABEL column.
Details
This example uses the CASFMTLIB option to create a format library in a CAS
session. It associates the format library with a table in the WORK directory and
assigns a CAS engine libref.
Program
cas casauto sessopts=(caslib="casuser");
caslib _all_ assign;
proc format casfmtlib='myformats';
value hospx
1='New_York'
2='Massachusetts_General'
3='Los_Angeles'
4='Mary_Fletcher';
run;
data clinicalTrial;
input hospital treatment $ @@;
severity=rannor(1323)*5 + 10;
format hospital hospx.;
cards;
3 B 3 B 3 C 3 C
1 A 1 A 1 A 1 B
1 B 1 B 1 C 1 C
1 C 1 D 1 D 1 D
2 A 2 A 2 A 2 B
2 B 2 B 2 C 2 C
2 C 2 D 2 D 2 D
3 A 3 A 3 A 3 B
3 C 3 D 3 D 3 D
4 A 4 A 4 A 4 B
4 B 4 B 4 C 4 C
4 C 4 D 4 D 4 D
;
data proclib.clinicalTrial;
set work.clinicalTrial;
run;
Program Description
Create a format library in a CAS session. Assign a library with the LIBNAME
statement. PROC FORMAT creates a format named hospx. The CASFMTLIB option
specifies the name of the format library myformats in the CAS session.
cas casauto sessopts=(caslib="casuser");
caslib _all_ assign;
Send actions to the CAS session. The LIBNAME statement assigns a CAS engine
libref that is used to identify the table in the REGSELECT procedure step.
data proclib.clinicalTrial;
set work.clinicalTrial;
run;
LOG
Example Code 30.1 Create a Format Library in a CAS Session, Part 1
68
69 data clinicalTrial;
70 input hospital treatment $ @@;
71 severity=rannor(1323)*5 + 10;
72 format hospital hospx.;
73 datalines;
NOTE: SAS went to a new line when INPUT statement reached past the end
of a line.
NOTE: The data set WORK.CLINICALTRIAL has 48 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
88 data proclib.clinicalTrial;
89 set work.clinicalTrial;
90 run;
NOTE: There were 48 observations read from the data set WORK.CLINICALTRIAL.
NOTE: The data set PROCLIB.CLINICALTRIAL has 48 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.08 seconds
cpu time 0.01 seconds
91
92 proc regselect data=proclib.clinicalTrial;
93 class treatment hospital;
94 model severity=treatment hospital;
95 run;
96
97
98 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
Details
Several examples in this section use the PROCLIB.STAFF data set. In addition,
many of the informats and formats that are created in these examples are stored in
Library.Formats. The output data set shown in “Output Control Data Set” on page
1135 contains a description of these informats and the formats.
The data describe a small subset of employees who work for a corporation
that has sites in the U.S. and Britain. The data contain the name, identification
number, salary (in British pounds), location, and date of hire for each employee.
Program
libname proclib 'SAS-library';
data proclib.staff;
infile datalines dlm='#';
input Name & $16. IdNumber $ Salary
Site $ HireDate date8.;
format hiredate date8.;
datalines;
Capalleti, Jimmy# 2355# 21163# BR1# 30JAN13
Chen, Len# 5889# 20976# BR1# 18JUN06
Davis, Brad# 3878# 19571# BR2# 20MAR04
Leung, Brenda# 4409# 34321# BR2# 18SEP94
Martinez, Maria# 3985# 49056# US2# 10JAN93
Orfali, Philip# 0740# 50092# US2# 16FEB03
Patel, Mary# 2398# 35182# BR3# 02FEB90
Smith, Robert# 5162# 40100# BR5# 15APR06
Sorrell, Joseph# 4421# 38760# US1# 19JUN11
Zook, Carla# 7385# 22988# BR3# 18DEC10
;
Program Description
libname proclib 'SAS-library';
Create the data set PROCLIB.STAFF. The INPUT statement assigns the names
Name, IdNumber, Salary, Site, and HireDate to the variables that appear after the
DATALINES statement. The FORMAT statement assigns the standard SAS format
DATE8. to the variable HireDate.
data proclib.staff;
infile datalines dlm='#';
input Name & $16. IdNumber $ Salary
Site $ HireDate date8.;
format hiredate date8.;
datalines;
Capalleti, Jimmy# 2355# 21163# BR1# 30JAN13
Chen, Len# 5889# 20976# BR1# 18JUN06
Davis, Brad# 3878# 19571# BR2# 20MAR04
Leung, Brenda# 4409# 34321# BR2# 18SEP94
Martinez, Maria# 3985# 49056# US2# 10JAN93
Orfali, Philip# 0740# 50092# US2# 16FEB03
Patel, Mary# 2398# 35182# BR3# 02FEB90
Smith, Robert# 5162# 40100# BR5# 15APR06
Sorrell, Joseph# 4421# 38760# US1# 19JUN11
Zook, Carla# 7385# 22988# BR3# 18DEC10
;
Details
This example uses a PICTURE statement to create a format that prints the values
for the variable Salary in the data set PROCLIB.STAFF in U.S. dollars.
Program
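libname proclib 'SAS-library-1';
libname library 'SAS-library-2';
options nodate pageno=1 linesize=80 pagesize=40;
proc format library=library;
picture uscurrency low-high='000,000' (mult=1.61 prefix='$');
run;
proc print data=proclib.staff noobs label;
label salary='Salary in U.S. Dollars';
format salary uscurrency.;
run;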
Program Description
Assign two SAS library references (PROCLIB and LIBRARY). Assigning the libref
LIBRARY is useful in this case because SAS automatically searches the library that
is referenced by the LIBRARY libref for the informats and formats that you create
with PROC FORMAT.
libname proclib 'SAS-library-1';
libname library 'SAS-library-2';
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Print the PROCLIB.STAFF data set. The NOOBS option suppresses the printing of
observation numbers. The LABEL option uses variable labels instead of variable
names for column headings.
proc print data=proclib.staff noobs label;
Specify a label and format for the Salary variable. The LABEL statement
substitutes the specific label for the variable in the report. In this case, “Salary in
U.S. Dollars” is substituted for the variable Salary for this print job only. The
FORMAT statement associates the USCurrency. format with the variable name
Salary for the duration of this procedure step.
label salary='Salary in U.S. Dollars';
format salary uscurrency.;
Output
Output 30.4 PROCLIB.STAFF with a Format for the Variable Salary
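The arithmetic behind one formatted salary can be checked with a short DATA step. This sketch redefines USCURRENCY. in the WORK library so that it is self-contained. For Capalleti's salary of 21163 pounds, 21163 × 1.61 = 34072.43; because the picture has no ROUND option and no decimal digits, only 34072 is placed in '000,000', and the prefix gives $34,072.

```sas
proc format;
   picture uscurrency low-high='000,000' (mult=1.61 prefix='$');
run;

data _null_;
   Salary=21163;
   /* 21163 * 1.61 = 34072.43; the picture keeps only the integer part */
   put Salary= Salary uscurrency.;
run;
```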
Details
This example uses the MULT= option of the PICTURE statement to create a format
that displays M, B, or T to indicate millions, billions, and trillions of dollars.
TIP This example uses dollar values without cents and rounding is not
necessary. If your dollar values include cents, you can use the ROUND option
in the PICTURE statement to round values to the nearest dollar value. For
more information, see “ROUND” on page 1106.
Program
proc format;
picture bigmoney (fuzz=0)
1E06-<1000000000='0000 M' (prefix='$' mult=.000001)
1E09-<1000000000000='0000 B' (prefix='$' mult=1E-09)
1E12-<1000000000000000='0000 T' (prefix='$' mult=1E-012);
run;
data mult;
do i=5 to 12;
x=16**i;
put x=comma20. x= bigmoney.;
end;
run;
Program Description
Create the BIGMONEY. format. The BIGMONEY. format defines three value-range
sets to format millions, billions, and trillions of dollars. 1E06 is one million, 1E09 is
one billion, and 1E12 is one trillion. The < exclusion operator excludes the number
that follows it from the range. A best practice is to use the FUZZ=0 option whenever
you use the exclusion operator, so that values close to a range boundary are not
matched to the wrong range. For millions of dollars, the range is 1,000,000 to
999,999,999. The label on the right side of the equal sign uses four zeros as digit
selectors. The zero digit selector specifies that leading zeros are not printed. Four
digit selectors are needed, rather than three, so that the $ prefix can be printed
when the value has three digits. The value .000001 for the MULT= option is another
way to write 1E-06, which is one millionth. Multiplying a value by the millionth,
billionth, and trillionth multipliers returns the number of millions, billions, and
trillions of dollars.
proc format;
picture bigmoney (fuzz=0)
1E06-<1000000000='0000 M' (prefix='$' mult=.000001)
1E09-<1000000000000='0000 B' (prefix='$' mult=1E-09)
1E12-<1000000000000000='0000 T' (prefix='$' mult=1E-012);
run;
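Create the MULT data set and write the values to the log. The DO loop computes the
powers of 16 from 16**5 through 16**12, and the PUT statement writes each value
once with the COMMA20. format and once with the BIGMONEY. format.
data mult;
do i=5 to 12;
x=16**i;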
put x=comma20. x= bigmoney.;
end;
run;
LOG
Example Code 30.3 Formatted Millions, Billions, and Trillions Dollar Amounts
x=1,048,576 x=$1 M
x=16,777,216 x=$16 M
x=268,435,456 x=$268 M
x=4,294,967,296 x=$4 B
x=68,719,476,736 x=$68 B
x=1,099,511,627,776 x=$1 T
x=17,592,186,044,416 x=$17 T
x=281,474,976,710,656 x=$281 T
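The log values follow from the MULT= arithmetic described above. As a sketch that does not call the format itself, the DATA step below multiplies one value from each range by its MULT= value and uses the INT function to drop the fractional part, which mirrors the truncation that the picture performs. The variable names are illustrative only.

```sas
data _null_;
   /* One value from each range of the BIGMONEY. format */
   m = 16**5;    /* 1,048,576          -> millions range  */
   b = 16**9;    /* 68,719,476,736     -> billions range  */
   t = 16**11;   /* 17,592,186,044,416 -> trillions range */

   /* Multiply by the MULT= value, then drop the fraction as the picture does */
   m_digits = int(m * .000001);
   b_digits = int(b * 1E-09);
   t_digits = int(t * 1E-012);

   put m_digits= b_digits= t_digits=;
run;
```

The results, 1, 68, and 17, match the digits shown in the log for $1 M, $68 B, and $17 T.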
Program
proc format;
picture bigmoney (fuzz=0)
1E06-<1000000000='0000.99 M' (prefix='$' mult=.0001)
1E09-<1000000000000='0000.99 B' (prefix='$' mult=1E-07)
1E12-<1000000000000000='0000.99 T' (prefix='$' mult=1E-010);
run;
data mult;
do i=5 to 12;
x=16**i;
put x=comma20. x= bigmoney.;
end;
run;
Program Description
In this program, the BIGMONEY. format is modified to display a more precise
number by adding decimal places.
Modify the BIGMONEY. format. To display a more precise number, the picture
values and the MULT= values are modified. To display two decimal places, .99 is
added to each picture. To supply the two extra digits, each MULT= value is increased
by a factor of 100: from one millionth to one ten-thousandth (.0001) for millions,
from 1E-09 to 1E-07 for billions, and from 1E-012 to 1E-010 for trillions. When
16**5 (1,048,576) is multiplied by .0001, the result is 104.8576. The decimal portion
is truncated, and the digits 104 are placed in the picture beginning on the right. The
resulting formatted value is $1.04 M.
proc format;
picture bigmoney (fuzz=0)
1E06-<1000000000='0000.99 M' (prefix='$' mult=.0001)
1E09-<1000000000000='0000.99 B' (prefix='$' mult=1E-07)
1E12-<1000000000000000='0000.99 T' (prefix='$' mult=1E-010);
run;
LOG
Example Code 30.4 More Precisely Formatted Large Dollar Amounts
x=1,048,576 x=$1.04 M
x=16,777,216 x=$16.77 M
x=268,435,456 x=$268.43 M
x=4,294,967,296 x=$4.29 B
x=68,719,476,736 x=$68.71 B
x=1,099,511,627,776 x=$1.09 T
x=17,592,186,044,416 x=$17.59 T
x=281,474,976,710,656 x=$281.47 T
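The same arithmetic explains the two-decimal version. In this sketch (illustrative variable names, no format call), 16**5 is multiplied by .0001, the fraction is dropped, and the last two of the remaining digits are treated as decimals, which is what the .99 digit selectors in '0000.99 M' do.

```sas
data _null_;
   x = 16**5;                  /* 1,048,576 */
   scaled = x * .0001;         /* about 104.8576 */
   digits = int(scaled);       /* 104: the fractional part is truncated */
   /* '0000.99 M' treats the last two digit selectors as decimals,
      so the digits 104 are displayed as 1.04 */
   shown = digits / 100;
   put scaled= digits= shown= 4.2;
run;
```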
Example 5: Filling a Picture Format
Details
This example does the following tasks:
• prefixes the formatted value with a specified character
Program
data pay;
input Name $ MonthlySalary;
datalines;
Liu 1259.45
Lars 1289.33
Kim 1439.02
Wendy 1675.21
Alex 1623.73
;
proc format;
picture salary low-high='00,000,000.00' (fill='*' prefix='$');
run;
proc print data=pay noobs;
format monthlysalary salary.;
title 'Printing Salaries for a Check';
run;
Program Description
Create the PAY data set. The PAY data set contains the monthly salary for each
employee.
data pay;
input Name $ MonthlySalary;
datalines;
Liu 1259.45
Lars 1289.33
Kim 1439.02
Wendy 1675.21
Alex 1623.73
;
Define the SALARY. picture format and specify how the picture will be filled.
When FILL= and PREFIX= PICTURE statement options appear in the same picture,
the format places the prefix and then the fill characters. The SALARY. format fills
the picture with the fill character because the picture has zeros as digit selectors.
The left-most comma in the picture is replaced by the fill character.
proc format;
picture salary low-high='00,000,000.00' (fill='*' prefix='$');
run;
Print the PAY data set. The NOOBS option suppresses the printing of observation
numbers. The FORMAT statement temporarily associates the SALARY. format with
the variable MonthlySalary.
proc print data=pay noobs;
format monthlysalary salary.;
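To isolate the effect of FILL=, the following sketch re-creates SALARY. and defines a hypothetical companion format, SALNOFIL., that uses the same picture and prefix but omits FILL=. Both formats are stored in Work.Formats. Without FILL=, the unused positions to the left of the prefix are left blank, because a blank is the default fill character.

```sas
proc format;
   /* Same definition as in this example */
   picture salary   low-high='00,000,000.00' (fill='*' prefix='$');
   /* Hypothetical format: identical picture and prefix, but no FILL= */
   picture salnofil low-high='00,000,000.00' (prefix='$');
run;

data _null_;
   MonthlySalary = 1259.45;
   filled   = put(MonthlySalary, salary.);
   unfilled = put(MonthlySalary, salnofil.);
   /* $QUOTE. makes leading blanks and fill characters visible in the log */
   put filled= $quote20. / unfilled= $quote20.;
run;
```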
Output
Output 30.5 Printing Salaries for a Check
Details
This example uses directives to format date, time, and datetime values. It also uses
directives on more than one value-range pair.
Program
proc format;
picture mytime (round)
low-<86400='%H hours, %M minutes' (datatype=time)
86400-high='%n days, %H hours, %M minutes' (datatype=time) ;
/* 86400=number of seconds in one day */
run;
data test;
input xtime;
newtime=put(xtime,mytime.);
datalines;
12345
46987
86400
99999
172800
1012345
3333333
;
run;
proc print data=test;
run;
Program Description
Use directives in value-range pairs that use the keywords LOW and HIGH. The
directives format the time values.
proc format;
picture mytime (round)
low-<86400='%H hours, %M minutes' (datatype=time) /* 1 */
86400-high='%n days, %H hours, %M minutes' (datatype=time) ; /* 2 */
/* 86400=number of seconds in one day */
run;
1 The %H and %M directives specify that the hours and minutes are identified in
the output. The DATATYPE= option enables the use of directives in the picture.
2 The %n, %H, and %M directives specify that the days, hours, and minutes are
identified in the output. The %n directive, which writes the number of days, is valid
only with DATATYPE=TIME.
run;
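To relate the directives to a raw value, this sketch splits one of the TEST values, 99999 seconds, into whole days, hours, minutes, and seconds with ordinary arithmetic, and then writes the same value with a copy of the MYTIME. format. The variable names are illustrative. Because MYTIME. uses the ROUND option, the minutes that the format displays might differ from the truncated minutes computed here, so no formatted output is claimed.

```sas
proc format;
   picture mytime (round)
      low-<86400='%H hours, %M minutes' (datatype=time)
      86400-high='%n days, %H hours, %M minutes' (datatype=time);
run;

data _null_;
   xtime = 99999;                            /* one of the TEST values    */
   days    = int(xtime / 86400);             /* whole days        -> 1    */
   hours   = int(mod(xtime, 86400) / 3600);  /* leftover hours    -> 3    */
   minutes = int(mod(xtime, 3600) / 60);     /* leftover minutes  -> 46   */
   seconds = mod(xtime, 60);                 /* leftover seconds  -> 39   */
   put days= hours= minutes= seconds=;
   put xtime= mytime.;
run;
```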
Details
At times, you might need to express midnight as 24:00, or to use a datetime hour
range of 00:00:01–24:00:00. The hour value range for DATATYPE=DATETIME is
00:00:00–23:59:59. This example uses the option DATATYPE=DATETIME_UTIL to
express hours in the range 00:00:01–24:00:00, and shows the date change that
occurs when the time is 00:00:00.
Program
proc format;
picture hours (default=19)
other='%Y-%0m-%0d %0H:%0M:%0S' (datatype=datetime_util);
run;
data _null_;
x = '01jul2015:00:00:01'dt; put x=hours.;
x = '01jul2015:00:00:00'dt; put x=hours.;
run;
Program Description
Use the DATATYPE=DATETIME_UTIL option to use the hour range 00:00:01–
24:00:00.
proc format;
picture hours (default=19)
other='%Y-%0m-%0d %0H:%0M:%0S' (datatype=datetime_util);
run;
Compare date values. The first datetime value is within the range 00:00:01–24:00:00
and shows the day as July 1. The second datetime value, 00:00:00, is outside that
range, so it is displayed as 24:00:00 on the previous day, June 30.
data _null_;
x = '01jul2015:00:00:01'dt; put x=hours.;
x = '01jul2015:00:00:00'dt; put x=hours.;
run;
Log
Example Code 30.5 Using Hour Range 00:00:01–24:00:00
x=2015-07-01 00:00:01
x=2015-06-30 24:00:00
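For comparison, this sketch defines two hypothetical pictures, HRSSTD. and HRSUTIL., that differ only in their DATATYPE= value, and applies both to midnight at the start of July 1, 2015. Given the hour ranges described above, HRSSTD. should write 2015-07-01 00:00:00, and HRSUTIL. should write 2015-06-30 24:00:00, as in the log.

```sas
proc format;
   /* Hypothetical pictures: identical except for DATATYPE= */
   picture hrsstd (default=19)
      other='%Y-%0m-%0d %0H:%0M:%0S' (datatype=datetime);
   picture hrsutil (default=19)
      other='%Y-%0m-%0d %0H:%0M:%0S' (datatype=datetime_util);
run;

data _null_;
   x = '01jul2015:00:00:00'dt;   /* midnight at the start of July 1 */
   put x=hrsstd. / x=hrsutil.;
run;
```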
Example 8: Creating a Format for Character Values
Details
This example uses a VALUE statement to create a character format that prints each
value of a character variable as a different character string.
Program
libname proclib 'SAS-library-1';
Program Description
Assign two SAS library references (PROCLIB and LIBRARY). Assigning the library
reference LIBRARY is useful in this case because SAS automatically searches for
user-defined formats and informats in the library that is referenced by the LIBRARY
libref.
libname proclib 'SAS-library-1';
libname library 'SAS-library-2';
Create the catalog named Library.Formats, where the user-defined formats will
be stored. The LIBRARY= option specifies a permanent storage location for the
formats that you create. It also creates a catalog named FORMATS in the specified
library. If you do not use LIBRARY=, then SAS temporarily stores formats and
informats that you create in a catalog named Work.Formats.
proc format library=library;
Define the $CITY. format. The special codes BR1, BR2, and so on, are converted to
the names of the corresponding cities. The keyword OTHER specifies that values in
the data set that do not match any of the listed city code values are converted to
the value INCORRECT CODE.
value $city 'BR1'='Birmingham UK'
'BR2'='Plymouth UK'
'BR3'='York UK'
'US1'='Denver USA'
'US2'='Miami USA'
other='INCORRECT CODE';
run;
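The PUT function offers a quick way to check the $CITY. format before you use it in a report. This sketch re-creates the format in Work.Formats so that it runs on its own; the code XX9 is a hypothetical invalid value chosen to exercise the OTHER range.

```sas
/* Re-create $CITY. in Work.Formats so that this step runs on its own */
proc format;
   value $city 'BR1'='Birmingham UK'
               'BR2'='Plymouth UK'
               'BR3'='York UK'
               'US1'='Denver USA'
               'US2'='Miami USA'
               other='INCORRECT CODE';
run;

data _null_;
   length code $ 3 city $ 14;
   do code='BR1', 'US2', 'XX9';
      city = put(code, $city.);   /* OTHER catches XX9 */
      put code= city=;
   end;
run;
```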
Print the PROCLIB.STAFF data set. The NOOBS option suppresses the printing of
observation numbers. The LABEL option uses variable labels instead of variable
names for column headings.
Specify a label for the Salary variable. The LABEL statement substitutes the label
“Salary in U.S. Dollars” for the name SALARY.
label salary='Salary in U.S. Dollars';
Specify formats for Salary and Site. The FORMAT statement temporarily
associates the USCURRENCY. format with the variable SALARY and also
temporarily associates the format $CITY. with the variable SITE.
format salary uscurrency. site $city.;
Output
Output 30.7 PROCLIB.STAFF with Formatted Variables for Salary and Site
Details
The EDUCATION data set reports dropout rates and math scores for several states,
and indicates a region for each state.
In this example, you use the VALUE statement to create the text value n/a for all
math score missing values. All nonmissing math score values are formatted using
the 5.1 format.
The example then prints the dropout rate and math scores for each state, by region.
Program
options obs=20;
proc format;
value myfmt .='n/a' other=[5.1];
run;
proc sort data=education;
by region;
run;
Program Description
Limit processing to the first 20 observations. The OBS= system option specifies the
last observation that SAS processes in a data set.
options obs=20;
Create a format for the Mathscore variable values. Use the VALUE statement to
create the format MYFMT. for the Mathscore variable. When the program
encounters a missing Mathscore value, the value is formatted as n/a. All other
values for Mathscore are formatted using the 5.1 format.
proc format;
value myfmt .='n/a' other=[5.1];
run;
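As a quick check of MYFMT., the PUT function can apply the format to a nonmissing value and to a missing value. The sample score 523.456 is hypothetical; with the nested 5.1 format it should be written as 523.5, and the missing value should be written as n/a.

```sas
proc format;
   value myfmt .='n/a' other=[5.1];
run;

data _null_;
   score1 = 523.456;   /* nonmissing: written with the nested 5.1 format */
   score2 = .;         /* ordinary missing value: written as n/a */
   f1 = put(score1, myfmt.);
   f2 = put(score2, myfmt.);
   put f1= f2=;
run;
```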
Sort and print the data. Use PROC SORT to sort the data set by region. To print the
data by region, specify the region variable in the PROC PRINT BY statement. To
report the state, dropout rate, and math scores, use the VAR statement and specify
the state, dropOutRate, and mathScore variables. Finally, use the FORMAT
statement to tell SAS to format the mathScore variable using the MYFMT. format.
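proc sort data=education;
by region;
run;
proc print data=education;
by region;
var state dropOutRate mathScore;
format mathScore myfmt.;
run;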
Output
Output 30.8 Dropout Rates and Math Scores for Each State in a Region
Example 10: Creating an Informat Using Perl Regular Expressions
Details
This example uses two Perl regular expressions to create two informats. The first
informat uses a regular expression to check that the input value contains a digit, and
reads the value as a number. The second informat uses a regular expression that
performs a substitution, so the number that it reads is different from the input value.
Program
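proc format;
invalue isnum (default=5) '/[0-9]/' (regexp) = _same_ other=_error_;
invalue x1to2x(default=5) 's/1/2/' (regexpe) = _same_ other=_same_;
run;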
data _null_;
input x:isnum. y:x1to2x.;
put x= y=;
datalines;
1 121
2 145
a 232
run;
Program Description
Create new informats. If the input value contains a decimal digit, the ISNUM.
informat reads the value as a number. Otherwise, SAS writes an error to the log. The
X1TO2X. informat replaces each 1 in the input value with a 2.
proc format;
invalue isnum (default=5) '/[0-9]/' (regexp) = _same_ other=_error_;
invalue x1to2x(default=5) 's/1/2/' (regexpe) = _same_ other=_same_;
run;
Read the data. The first two lines of data are valid. The first input value for y, 121,
is read as 222 because each 1 is replaced with a 2. The input value 145 is read as
245 by the same substitution rule. The third line produces an error because the value
for x, a, contains no digit and therefore falls into the OTHER=_ERROR_ range.
data _null_;
input x:isnum. y:x1to2x.;
put x= y=;
datalines;
1 121
2 145
a 232
run;
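The INPUT function applies an informat to a character value, which is a convenient way to test informats without reading raw data. This sketch re-creates both informats and applies them to values from the data lines; the results for y should match the log values 222 and 245.

```sas
proc format;
   invalue isnum (default=5) '/[0-9]/' (regexp) = _same_ other=_error_;
   invalue x1to2x (default=5) 's/1/2/' (regexpe) = _same_ other=_same_;
run;

data _null_;
   /* The INPUT function applies each informat to a character value */
   y1 = input('121', x1to2x.);
   y2 = input('145', x1to2x.);
   x1 = input('1', isnum.);
   put y1= y2= x1=;
run;
```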
LOG
1 proc format;
2 invalue isnum (default=5) '/[0-9]/' (regexp) = _same_ other=_error_;
NOTE: Informat ISNUM has been output.
3 invalue x1to2x(default=5) 's/1/2/' (regexpe) = _same_ other=_same_;
NOTE: Informat X1TO2X has been output.
4 run;
5
6 data _null_;
7 input x:isnum. y:x1to2x.;
8 put x= y=;
9 datalines;
x=1 y=222
x=2 y=245
NOTE: Invalid data for x in line 12 1-1.
x=. y=232
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
12 a 232
x=. y=232 _ERROR_=1 _N_=3
Example 11: Writing a Format for Dates Using a Standard SAS Format and a Color Background
Details
This example uses a format that is supplied by SAS as a formatted value and
color-codes values based on dates. It also illustrates nesting formats.
Program
libname proclib 'SAS-library-1';
libname library 'SAS-library-2';
proc format library=library;
value benefit
low-'31DEC2008'd=[worddate20.]
'01JAN2009'd-high=' ** Not Eligible **';
value color
low-'31DEC2008'd='light green'
'01JAN2009'd-high='light red';
run;
proc print data=proclib.staff noobs label;
var name idnumber salary site;
var hiredate /style=[background=color.];
label salary='Salary in U.S. Dollars';
format salary uscurrency. site $city. hiredate benefit.;
title 'PROCLIB.STAFF with a Format for the Variables';
title2 'Salary, Site, and HireDate';
run;
Program Description
This program defines a format called BENEFIT., which differentiates between
employees hired on or before 31DEC2008 and employees hired after that date. The
purpose of this program is to
indicate any employees who are eligible to receive a benefit, based on a hire date
on or before December 31, 2008. All other employees with a later hire date are
listed as ineligible for the benefit.
Assign two SAS library references (PROCLIB and LIBRARY). Assigning the library
reference LIBRARY is useful in this case because SAS automatically searches for
user-defined formats and informats in the library that is referenced by the LIBRARY
libref.
libname proclib 'SAS-library-1';
libname library 'SAS-library-2';
Store the BENEFIT. format in the catalog Library.Formats. The LIBRARY= option
specifies the permanent storage location LIBRARY for the formats that you create.
If you do not use LIBRARY=, then SAS temporarily stores formats and informats
that you create in the catalog Work.Formats.
proc format library=library;
Define the first range in the BENEFIT. format. This first range differentiates
between the employees who were hired on or before 31DEC2008 and those who
were hired after that date. The keyword LOW and the SAS date constant
'31DEC2008'd create the first range, which includes all date values that occur on or
before December 31, 2008. For values that fall into this range, SAS applies the
WORDDATEw. format. For more information about SAS date constants, see “Dates,
Times, and Intervals” in SAS Language Reference: Concepts. For more information
about the WORDDATE formats, see “WORDDATEw. Format” in SAS Formats and
Informats: Reference.
value benefit
low-'31DEC2008'd=[worddate20.]
'01JAN2009'd-high=' ** Not Eligible **';
Define the colors for the ranges. Using the same date ranges, employees who are
eligible for a benefit are color-coded in light green. Employees who are not eligible
for a benefit are color-coded in light red.
value color
low-'31DEC2008'd='light green'
'01JAN2009'd-high='light red';
run;
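To see what the nested WORDDATE20. format and the COLOR. format produce for dates near the boundary, this sketch re-creates both formats in Work.Formats and applies them with the PUT function. The three hire dates are hypothetical sample values: one before the cutoff, the cutoff itself, and the first ineligible date.

```sas
proc format;
   value benefit
      low-'31DEC2008'd=[worddate20.]
      '01JAN2009'd-high=' ** Not Eligible **';
   value color
      low-'31DEC2008'd='light green'
      '01JAN2009'd-high='light red';
run;

data _null_;
   format hiredate date9.;
   do hiredate='15MAR2007'd, '31DEC2008'd, '01JAN2009'd;
      shown = put(hiredate, benefit.);   /* nested WORDDATE20. or the text label */
      bg    = put(hiredate, color.);     /* color name used for STYLE= */
      put hiredate= shown= bg=;
   end;
run;
```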
Print the data set PROCLIB.STAFF. The NOOBS option suppresses the printing of
observation numbers. The LABEL option uses variable labels instead of variable
names for column headings. The VAR statement names the variables to be printed.
The second VAR statement uses the STYLE= option to apply the COLOR. format as
the background color for the HireDate variable.
proc print data=proclib.staff noobs label;
var name idnumber salary site;
var hiredate /style=[background=color.];
Specify a label for the Salary variable. The LABEL statement substitutes the label
“Salary in U.S. Dollars” for the name SALARY.
label salary='Salary in U.S. Dollars';
Specify formats for Salary, Site, and Hiredate. The FORMAT statement associates
the USCURRENCY. format (created in “Example 3: Creating a Picture Format” on
page 1150) with SALARY, the $CITY. format (created in “Example 8: Creating a
Format for Character Values” on page 1160) with SITE, and the BENEFIT. format
with HIREDATE.
format salary uscurrency. site $city. hiredate benefit.;
Output
Output 30.9 PROCLIB.STAFF with a Format for the Variables Salary, Site, and
HireDate
Example 12: Converting Raw Character Data to Numeric Values
Details
This example uses an INVALUE statement to create a numeric informat that
converts numeric and character raw data to numeric data.
Program
libname proclib 'SAS-library-1';
libname library 'SAS-library-2';
proc format library=library;
invalue evaluation 'O'=4
'S'=3
'E'=2
'C'=1
'N'=0;
run;
data proclib.points;
input EmployeeId $ (Q1-Q4) (evaluation.,+1);
TotalPoints=sum(of q1-q4);
datalines;
2355 S O O S
5889 2 2 2 2
3878 C E E E
4409 0 1 1 1
3985 3 3 3 2
0740 S E E S
2398 E E C C
5162 C C C E
4421 3 2 2 2
7385 C C C N
;
proc print data=proclib.points noobs;
title 'The PROCLIB.POINTS Data Set';
run;
Program Description
This program converts quarterly employee evaluation grades, which are alphabetic,
into numeric values so that reports can sum the grades as points.
Set up two SAS library references, one named PROCLIB and the other named
LIBRARY.
libname proclib 'SAS-library-1';
libname library 'SAS-library-2';
Create the numeric informat EVALUATION.. The INVALUE statement converts the
specified values. The letters O (Outstanding), S (Superior), E (Excellent), C
(Commendable), and N (None) correspond to the numbers 4, 3, 2, 1, and 0,
respectively.
proc format library=library;
invalue evaluation 'O'=4
'S'=3
'E'=2
'C'=1
'N'=0;
run;
Create the PROCLIB.POINTS data set. The instream data, which immediately
follows the DATALINES statement, contains a unique identification number
(EmployeeId) and bonus evaluations for each employee for each quarter of the year
(Q1–Q4). Some of the bonus evaluation values that are listed in the data lines are
numbers; others are character values. Where character values are listed in the data
lines, the EVALUATION. informat converts the value O to 4, the value S to 3, and so
on. The raw data values 0 through 4 are read as themselves because they are not
referenced in the definition of the informat. Converting the letter values to numbers
makes it possible to calculate the total number of bonus points for each employee
for the year. TotalPoints is the total number of bonus points.
data proclib.points;
input EmployeeId $ (Q1-Q4) (evaluation.,+1);
TotalPoints=sum(of q1-q4);
datalines;
2355 S O O S
5889 2 2 2 2
3878 C E E E
4409 0 1 1 1
3985 3 3 3 2
0740 S E E S
2398 E E C C
5162 C C C E
4421 3 2 2 2
7385 C C C N
;
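The INPUT function makes the two behaviors described above easy to see: a letter that is listed in the informat is converted, and a digit that is not listed is read as an ordinary number. This sketch re-creates EVALUATION. in Work.Formats so that it runs on its own.

```sas
proc format;
   invalue evaluation 'O'=4
                      'S'=3
                      'E'=2
                      'C'=1
                      'N'=0;
run;

data _null_;
   /* A listed letter is converted by the informat */
   fromLetter = input('S', evaluation.);
   /* An unlisted digit is read as a number */
   fromDigit  = input('3', evaluation.);
   put fromLetter= fromDigit=;
run;
```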
Print the PROCLIB.POINTS data set. The NOOBS option suppresses the printing of
observation numbers.
proc print data=proclib.points noobs;
Output
Output 30.10 The PROCLIB.POINTS Data Set
Example 13: Creating a Format from a CNTLIN= Data Set
Details
This example shows how to create a format from a SAS data set.
• create an input control data set from an existing SAS data set
Program Description
Create a temporary data set named SCALE. The first two variables in the data lines,
called BEGIN and END, will be used to specify a range in the format. The third
variable in the data lines, called AMOUNT, contains a percentage that will be used
as the formatted value in the format. Note that all three variables are character
variables as required for PROC FORMAT input control data sets. AMOUNT is read
with the $CHAR3. informat so that the three-character value 10% is not truncated.
data scale;
input begin: $char2. end: $char2. amount: $char3.;
datalines;
0 3 0%
4 6 3%
7 8 6%
9 10 8%
11 16 10%
;
Create the input control data set CTRL and set the length of the LABEL variable.
The LENGTH statement ensures that the LABEL variable is long enough to
accommodate the label ***ERROR***.
data ctrl;
length label $ 11;
Rename variables and create an end-of-file flag. The data set CTRL is derived
from WORK.SCALE. RENAME= renames BEGIN and AMOUNT as START and
LABEL, respectively. The END= option creates the variable LAST, whose value is
set to 1 when the last observation is processed.
set scale(rename=(begin=start amount=label)) end=last;
Create the variables Fmtname and Type with fixed values. The RETAIN statement
is more efficient than an assignment statement in this case. RETAIN retains the
value of Fmtname and Type in the program data vector and eliminates the need for
the value to be written on every iteration of the DATA step. Fmtname specifies the
name PercentageFormat, which is the format that the input control data set
creates. The Type variable specifies that the input control data set will create a
numeric format.
retain fmtname 'PercentageFormat' type 'n';
Create an “other” category. Because the only valid values for this application are
0–16, any other value (such as missing) should be indicated as an error to the user.
The IF statement executes only after the DATA step has processed the last
observation from the input data set. When IF executes, HLO receives a value of O
to indicate that the range is OTHER, and LABEL receives a value of ***ERROR***.
The OUTPUT statement writes these values as the last observation in the data set.
HLO has missing values for all other observations.
if last then do;
hlo='O';
label='***ERROR***';
output;
end;
run;
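Each observation in CTRL supplies one value-range pair, and the final observation supplies the OTHER range. As a rough sketch, assuming the SCALE values shown above, the format that CNTLIN=CTRL creates should behave like the hand-written VALUE statement below. The name PCTEQUIV. is hypothetical and is used only so that the result can be compared with PercentageFormat.

```sas
proc format;
   /* Hypothetical hand-written equivalent of the CTRL input control data set */
   value pctequiv
      0-3   = '0%'
      4-6   = '3%'
      7-8   = '6%'
      9-10  = '8%'
      11-16 = '10%'
      other = '***ERROR***';
run;

data _null_;
   do points = 2, 5, 12, 20;
      pct = put(points, pctequiv.);   /* 20 falls into OTHER */
      put points= pct=;
   end;
run;
```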
Print the control data set, CTRL. The NOOBS option suppresses the printing of
observation numbers.
proc print data=ctrl noobs;
Output
Output 30.11 The CTRL Data Set
Program Description
Store the created format in the catalog Work.Formats and specify the source for
the format. The CNTLIN= option specifies that the data set CTRL is the source for
the format PercentageFormat.
proc format library=work cntlin=ctrl;
run;
Create the numeric informat EVALUATION.. The INVALUE statement converts the
specified values. The letters O (Outstanding), S (Superior), E (Excellent), C
(Commendable), and N (None) correspond to the numbers 4, 3, 2, 1, and 0,
respectively.
proc format library=library;
invalue evaluation 'O'=4
'S'=3
'E'=2
'C'=1
'N'=0;
run;
Create the WORK.POINTS data set. The instream data, which immediately follows
the DATALINES statement, contains a unique identification number (EmployeeId)
and bonus evaluations for each employee for each quarter of the year (Q1–Q4).
Some of the bonus evaluation values that are listed in the data lines are numbers;
others are character values. Where character values are listed in the data lines, the
EVALUATION. informat converts the value O to 4, the value S to 3, and so on. The raw
data values 0 through 4 are read as themselves because they are not referenced in
the definition of the informat. Converting the letter values to numbers makes it
possible to calculate the total number of bonus points for each employee for the
year. TotalPoints is the total number of bonus points. The addition operator is used
instead of the SUM function so that any missing value will result in a missing value
for TotalPoints.
data points;
input EmployeeId $ (Q1-Q4) (evaluation.,+1);
TotalPoints=q1+q2+q3+q4;
datalines;
2355 S O O S
5889 2 . 2 2
3878 C E E E
4409 0 1 1 1
3985 3 3 3 2
0740 S E E S
2398 E E C
5162 C C C E
4421 3 2 2 2
7385 C C C N
;
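The difference between the addition operator and the SUM function can be checked directly. In this sketch, the hypothetical values mirror employee 5889, whose second evaluation is missing: SUM ignores the missing value, whereas the addition operator propagates it.

```sas
data _null_;
   q1 = 2; q2 = .; q3 = 2; q4 = 2;   /* like employee 5889 */
   withSum  = sum(of q1-q4);         /* SUM ignores missing values: 6 */
   withPlus = q1 + q2 + q3 + q4;     /* any missing operand gives missing */
   put withSum= withPlus=;
run;
```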
Output
Output 30.12 The Percentage of Salary for Calculating Bonus
Example 14: Creating an Informat from a CNTLIN= Data Set
Details
This example shows how to create an informat from a CNTLIN= data set.
• create an input control data set from an existing SAS data set
Program
proc format;
invalue mytest
'abc'=1
'xyz'=2
other=3;
invalue $chrtest
'abc'='xyz'
other='else';
run;
data _null_;
input value:mytest. @@;
put value=;
datalines;
abc xyz ghi 4
run;
data _null_;
input value:$chrtest. @@;
put value=;
datalines;
abc xyz ghi 4
run;
data temp;
length start $8 type $1 hlo $1;
fmtname='newtest'; type='i';
start='abc'; label=1; hlo=' '; output;
start='xyz'; label=2; hlo=' '; output;
start=' '; label=3; hlo='O'; output;
run;
proc format cntlin=temp; run;
data temp;
length start label $8 type $1 hlo $1;
fmtname='$newchr'; type='j';
start='abc'; label='xyz'; hlo=' '; output;
start=' '; label='else'; hlo='O'; output;
run;
data _null_;
input value:$newchr. @@;
put value=;
datalines;
abc xyz ghi 4
run;
data temp;
length start label $8 hlo $1;
fmtname='@new2test';
start='abc'; label='1'; hlo=' '; output;
start='xyz'; label='2'; hlo=' '; output;
start=' '; label='3'; hlo='O'; output;
fmtname='@$new2chr';
start='abc'; label='xyz'; hlo=' '; output;
start=' '; label='else'; hlo='O'; output;
run;
proc format cntlin=temp; run;
data _null_;
input value:new2test. @@;
put value=;
datalines;
abc xyz ghi 4
run;
data _null_;
input value:$new2chr. @@;
put value=;
datalines;
abc xyz ghi 4
run;
Program Description
Create informats with an INVALUE statement. Create a numeric informat and a
character informat.
proc format;
invalue mytest
'abc'=1
'xyz'=2
other=3;
invalue $chrtest
'abc'='xyz'
other='else';
run;
Use the numeric informat with instream data. The code should produce 1, 2, 3, and
3 again as the results of VALUE.
data _null_;
input value:mytest. @@;
put value=;
datalines;
abc xyz ghi 4
run;
Use the character informat with instream data. The code should produce xyz, else,
else, and else as the results of VALUE.
data _null_;
input value:$chrtest. @@;
put value=;
datalines;
abc xyz ghi 4
run;
Create the equivalent of the MYTEST numeric informat using a CNTLIN= data set,
and use the informat name of NEWTEST. Specify that the FMTNAME variable has
the value NEWTEST. Specify the value 'i' or 'I' for the TYPE variable to indicate that
it is a numeric informat. Specify a value of O for other, or a blank value, for the HLO
variable.
data temp;
length start $8 type $1 hlo $1;
fmtname='newtest'; type='i';
start='abc'; label=1; hlo=' '; output;
start='xyz'; label=2; hlo=' '; output;
start=' '; label=3; hlo='O'; output;
run;
Read in the CNTLIN= data set to create the numeric informat NEWTEST.
proc format cntlin=temp; run;
Create a CNTLIN= data set for the character informat. The informat is the
equivalent of the $CHRTEST informat that was created above. Name it $NEWCHR.
The value of TYPE should be 'j' or 'J' to indicate that it is a character informat.
data temp;
length start label $8 type $1 hlo $1;
fmtname='$newchr'; type='j';
start='abc'; label='xyz'; hlo=' '; output;
start=' '; label='else'; hlo='O'; output;
run;
1182 Chapter 30 / FORMAT Procedure
Read in the CNTLIN= data set to create the character informat. This example uses
temp as the value for CNTLIN. If you are saving the code example, specify a more
descriptive name.
proc format cntlin=temp; run;
Create two DATA steps that are the same as the previously created versions. Use
the informat names NEWTEST and $NEWCHR for these versions.
data _null_;
input value:newtest. @@;
put value=;
datalines;
abc xyz ghi 4
run;
data _null_;
input value:$newchr. @@;
put value=;
datalines;
abc xyz ghi 4
run;
Show that the FMTNAME value can start with an @ to indicate that it is an
informat, and that the TYPE variable is then not necessary. Numeric and character
informats can be created in the same CNTLIN= data set. The LABEL variable must be
character because a character informat is defined in the same data set, so the
numeric values are saved as character strings that contain numbers.
data temp;
length start label $8 hlo $1;
fmtname='@new2test';
start='abc'; label='1'; hlo=' '; output;
start='xyz'; label='2'; hlo=' '; output;
start=' '; label='3'; hlo='O'; output;
fmtname='@$new2chr';
start='abc'; label='xyz'; hlo=' '; output;
start=' '; label='else'; hlo='O'; output;
run;
proc format cntlin=temp; run;
data _null_;
input value:new2test. @@;
put value=;
datalines;
abc xyz ghi 4
run;
data _null_;
input value:$new2chr. @@;
put value=;
datalines;
abc xyz ghi 4
run;
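To confirm what a CNTLIN= data set produced, the CNTLOUT= option can write the stored definitions back out to a data set. This sketch assumes that the NEW2TEST and $NEW2CHR informats from the previous steps exist in Work.Formats; the output data set name FMTINFO is hypothetical. As in the FMTLIB example in this chapter, the @ prefix in the SELECT statement names informats.

```sas
/* Write the stored definitions of the two informats to a data set */
proc format library=work cntlout=fmtinfo;
   select @new2test @$new2chr;
run;

/* FMTNAME, START, END, LABEL, TYPE, and HLO are CNTLOUT= variables */
proc print data=fmtinfo noobs;
   var fmtname start end label type hlo;
run;
```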
Print the contents of the CNTLIN= data set. Label the table “The CTRL Data Set.”
Example 15: Printing the Description of Informats and Formats
Details
This example illustrates how to print a description of an informat and a format. The
description shows the values that are read in and written.
Program
libname library 'SAS-library';
proc format library=library fmtlib;
select @evaluation benefit;
title 'FMTLIB Output for the BENEFIT. Format and the';
title2 'EVALUATION. Informat';
run;
Program Description
Set up a SAS library reference named LIBRARY.
libname library 'SAS-library';
Output
Output 30.14 FMTLIB Output for the BENEFIT Format and the EVALUATION
Informat
Example 16: Retrieving a Permanent Format
This example uses the LIBRARY= option and the FMTSEARCH= system option to
store and retrieve a format stored in a catalog other than Work.Formats or
Library.Formats.
Program
libname proclib 'SAS-library';
proc format library=proclib;
picture nozeros (fuzz=0)
low - -1 = '000.00'(prefix='-')
-1 < - < -.99 = '0.99' (prefix='-.' mult=100)
-0.99 < - < 0 = '99' (prefix='-.' mult=100)
0 = '0.99'
0 < - < .99 = '99' (prefix='.' mult=100)
0.99 - <1 = '0.99' (prefix='.' mult=100)
1 - high = '00.99';
run;
options fmtsearch=(proclib);
data sample;
input Amount;
datalines;
-2.051
-.05
-.017
0
.093
.54
.556
6.6
14.63
0.996
-0.999
-45.00
;
run;
proc print data=sample;
format amount nozeros.;
title1 'Retrieving the NOZEROS. Format from PROCLIB.FORMATS';
run;
Program Description
Set up a SAS library reference named PROCLIB.
libname proclib 'SAS-library';
Create the NOZEROS. format. The PICTURE statement defines the picture format
NOZEROS. See “Details” on page 1112.
picture nozeros (fuzz=0)
low - -1 = '000.00'(prefix='-')
-1 < - < -.99 = '0.99' (prefix='-.' mult=100)
-0.99 < - < 0 = '99' (prefix='-.' mult=100)
0 = '0.99'
0 < - < .99 = '99' (prefix='.' mult=100)
0.99 - <1 = '0.99' (prefix='.' mult=100)
1 - high = '00.99';
run;
Add the PROCLIB.FORMATS catalog to the search path that SAS uses to find
user-defined formats. The FMTSEARCH= system option defines the search path.
The FMTSEARCH= system option requires only a libref. FMTSEARCH= assumes
that the catalog name is FORMATS if no catalog name appears. Without the
FMTSEARCH= option, SAS would not find the NOZEROS. format. For more
information, see “FMTSEARCH= System Option” in SAS System Options: Reference.
options fmtsearch=(proclib);
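To check the search path that is in effect, the GETOPTION function can be called through %SYSFUNC. This sketch also shows the APPEND= system option, available in SAS 9.4, as an alternative to the OPTIONS statement above: it adds a libref to the end of the existing FMTSEARCH= value instead of replacing the whole list. It assumes that the PROCLIB libref is already assigned; if PROCLIB is already in the path, you would normally skip the APPEND= step.

```sas
/* Show the format search path that is currently in effect */
%put NOTE: FMTSEARCH=%sysfunc(getoption(fmtsearch));

/* Add PROCLIB to the end of the existing search path
   instead of replacing the whole list */
options append=(fmtsearch=(proclib));
%put NOTE: FMTSEARCH=%sysfunc(getoption(fmtsearch));
```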
Print the SAMPLE data set. The FORMAT statement associates the NOZEROS.
format with the Amount variable.
proc print data=sample;
format amount nozeros.;
title1 'Retrieving the NOZEROS. Format from PROCLIB.FORMATS';
run;
Output
Output 30.15 Retrieving the NOZEROS. Format from PROCLIB.FORMATS
Example 17: Writing Ranges for Character Strings
This example creates a format and shows how to use ranges with character strings.
Program
libname proclib 'SAS-library';
data train;
set proclib.staff(keep=name idnumber);
run;
proc print data=train noobs;
title 'The TRAIN Data Set without a Format';
run;
Program Description
Set up a SAS library reference named PROCLIB.
libname proclib 'SAS-library';
Create the TRAIN data set from the PROCLIB.STAFF data set. PROCLIB.STAFF
was created in “Example 2: Create the Example Data Set” on page 1149.
data train;
set proclib.staff(keep=name idnumber);
run;
Print the data set TRAIN without a format. The NOOBS option suppresses the
printing of observation numbers.
proc print data=train noobs;
Output
Output 30.16 The TRAIN Data Set without a Format
Store the format in Work.Formats. Because the LIBRARY= option does not appear,
the format is stored in Work.Formats and is available only for the current SAS
session.
proc format;
Create the $SKILLTEST. format. The $SKILLTEST. format assigns the skills test that
each employee must take. Employees must take Test A, Test B, or Test C, depending
on their last name. The exclusion operator (<) excludes the last value in the range.
Thus, the first range includes employees whose last name begins with any letter from
A through D, and the second range includes employees whose last name begins with
any letter from E through L. The tilde (~) in the last range is necessary to include an
entire string that begins with the letter Z.
value $skilltest 'a'-<'e','A'-<'E'='Test A'
'e'-<'m','E'-<'M'='Test B'
'm'-'z~','M'-'Z~'='Test C';
run;
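Boundary cases make the ranges easier to verify. This sketch re-creates $SKILLTEST. and applies it to a few hypothetical last names that sit near the range limits. It assumes an ASCII session, where the tilde sorts after the letters; the collating order differs on EBCDIC systems.

```sas
proc format;
   value $skilltest 'a'-<'e','A'-<'E'='Test A'
                    'e'-<'m','E'-<'M'='Test B'
                    'm'-'z~','M'-'Z~'='Test C';
run;

data _null_;
   length name $ 10 test $ 6;
   /* Boundary cases: D and E, L and M, and Z */
   do name='Dixon', 'Evans', 'Lopez', 'Moore', 'Zwick';
      test = put(name, $skilltest.);
      put name= test=;
   end;
run;
```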
Generate a report of the TRAIN data set. The FORMAT= option in the DEFINE
statement associates the $SKILLTEST. format with the Name variable. The column
that contains the formatted values of Name uses the alias Test. Using an alias
enables you to print a variable twice, once with a format and once with the default
format. For more information, see Chapter 58, “REPORT Procedure,” on page 2047.
proc report data=train nowd headskip;
column name name=test idnumber;
define test / display format=$skilltest. 'Test';
run;
Output
Output 30.17 Test Assignment for Each Employee
Example 18: Creating a Format in a non-English Language
Details
This example does the following tasks:
• Creates picture formats that use directives to format date and datetime values,
by using the DATATYPE= option.
• Uses the LOCALE= system option to specify the locale for German.
• Prints date and datetime values to the SAS log in German using the picture
formats.
• Prints a datetime value in French to the log by using the picture format that
specifies LANGUAGE=French.
Program
proc format;
picture mdy(default=8) other='%0d%0m%Y' (datatype=date);
picture langtsda (default=50) other='%A, %d %B, %Y' (datatype=date);
picture langtsdt (default=50) other='%A, %d,%B, %Y %H %M %S'
(datatype=datetime);
picture langtsfr (default=50) other='%A, %d %B, %Y %H %M %S'
(datatype=datetime language=french);
picture alltest (default=100)
other='%a %A %b %B %d %H %I %j %m %M %p %S %w %U %y %%'
(datatype=datetime);
run;
option locale = de_DE;
data _null_ ;
a= 18903;
b = 1633239000;
put a= mdy.;
put a= langtsda.;
put b= langtsdt.;
put b= langtsfr.;
put b= alltest.;
run ;
Program Description
Create formats using the PICTURE statement. Each PICTURE statement uses directives
to specify how date or datetime values are formatted. %A prints the full weekday
name. %B prints the full month name. %d prints the day of the month. %Y prints the
four-digit year. %H prints the hour (24-hour clock). %M prints the minute. %S prints
the seconds. The 0 in %0d and %0m pads the day and month with a leading zero, so
the MDY. format prints 03102011 for 3 October 2011. The LANGTSDA., LANGTSDT., and
ALLTEST. formats print the date or datetime in the language specified by the
current value of the LOCALE= system option. The LANGTSFR. format prints the
datetime in French. For the remaining directives, see the PICTURE statement on
page 1098.
proc format;
picture mdy(default=8) other='%0d%0m%Y' (datatype=date);
picture langtsda (default=50) other='%A, %d %B, %Y' (datatype=date);
picture langtsdt (default=50) other='%A, %d,%B, %Y %H %M %S'
(datatype=datetime);
picture langtsfr (default=50) other='%A, %d %B, %Y %H %M %S'
(datatype=datetime language=french);
picture alltest (default=100)
other='%a %A %b %B %d %H %I %j %m %M %p %S %w %U %y %%'
(datatype=datetime);
run;
Set the LOCALE= system option. de_DE is the locale value for Germany.
option locale = de_DE;
Print date and datetime values in German and French. The DATA step prints to the
SAS log the date and datetime information for 3 October 2011, 05:30:00 AM. All
values are written in German except the value of b when it is formatted with the
LANGTSFR. format, which prints the datetime value in French.
data _null_ ;
a= 18903;
b = 1633239000;
put a= mdy.;
put a= langtsda.;
put b= langtsdt.;
put b= langtsfr.;
put b= alltest.;
run ;
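SAS stores a date as the number of days since January 1, 1960, and a datetime as the number of seconds since midnight on that day. The following Python sketch, which is not SAS code, cross-checks the values in the DATA step with the standard datetime module. It checks only the arithmetic and does not reproduce how SAS renders the directives. For example, Python's %d always pads the day to two digits. Python's %w also numbers Sunday as 0, whereas the log in this example prints 2 for a Monday, which implies that SAS numbers Sunday as 1.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = datetime(1960, 1, 1)  # day 0 and second 0 for SAS date and datetime values


def sas_date(days: int) -> date:
    """Convert a SAS date value (days since 1960-01-01) to a Python date."""
    return (SAS_EPOCH + timedelta(days=days)).date()


def sas_datetime(seconds: int) -> datetime:
    """Convert a SAS datetime value (seconds since 1960-01-01 00:00:00)."""
    return SAS_EPOCH + timedelta(seconds=seconds)


a = sas_date(18903)
b = sas_datetime(1633239000)
print(a.isoformat())           # 2011-10-03
print(b.isoformat())           # 2011-10-03T05:30:00
print(b.strftime("%A %j %U"))  # Monday 276 40 (day of year and week number match the log)
```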
Output: Log
1 proc format;
2 picture mdy(default=8) other='%0d%0m%Y' (datatype=date);
NOTE: Format MDY has been output.
3 picture langtsda (default=50) other='%A, %d %B, %Y' (datatype=date);
NOTE: Format LANGTSDA has been output.
4 picture langtsdt (default=50) other='%A, %d,%B, %Y %H %M %S'
5 (datatype=datetime);
NOTE: Format LANGTSDT has been output.
6 picture langtsfr (default=50) other='%A, %d %B, %Y %H %M %S'
7 (datatype=datetime language=french);
NOTE: Format LANGTSFR has been output.
8 picture alltest (default=100)
9 other='%a %A %b %B %d %H %I %j %m %M %p %S %w %U %y %%'
10 (datatype=datetime);
NOTE: Format ALLTEST has been output.
11 run;
12
13 option locale = de_DE;
14
15 data _null_ ;
16 a= 18903;
17 b = 1633239000;
18 put a= mdy.;
19 put a= langtsda.;
20 put b= langtsdt.;
21 put b= langtsfr.;
22 put b= alltest.;
23 run ;
a=03102011
a=Montag, 3 Oktober, 2011
b=Montag, 3,Oktober, 2011 5 30 0
b=Lundi, 3 octobre, 2011 5 30 0
b=Mo Montag Okt Oktober 3 5 5 276 10 30 AM 0 2 40 11 %
Example 19: Creating a Locale-Specific Format Catalog
Details
This example demonstrates how to create a format in two languages, English and
Romanian, and how to access the English and Romanian format catalogs to print a
data set in the two languages. The example works best if the SAS session encoding
is a Latin 2 encoding that supports the Romanian locale.
Program
/*no locale information*/
proc format lib=work.formats;
value age low - 5 = 'baby'
6 - 12 = 'child'
13 - 15 = 'teen'
16 - 30 = 'youth'
31 - 50 = 'midlife'
51 - high = 'older';
run;
options locale=ro_RO;
options fmtsearch=(work/locale);
/* Set the locale back to English(US) */
options locale=en_US;
data datatst;
input age sex $;
attrib age format= age.;
cards;
5 M
6 F
12 M
13 F
15 M
16 F
30 M
35 F
51 M
100 F
;
run;
/* Use the English format catalog*/
title "Locale is English, Use the Original Format Catalog";
proc print data=datatst; run;
/* Use the Romanian format catalog*/
options locale=ro_RO;
title 'Locale is ro_RO, Use the Romanian Format Catalog';
proc print data=datatst;run;
Program Description
Create the AGE. format in English.
/*no locale information*/
proc format lib=work.formats;
value age low - 5 = 'baby'
6 - 12 = 'child'
13 - 15 = 'teen'
16 - 30 = 'youth'
31 - 50 = 'midlife'
51 - high = 'older';
run;
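The following Python sketch, which is not SAS code, models how the AGE. ranges map the ages in the DATATST data set. As in the VALUE statement above, it treats both ends of each range as inclusive. A value that falls outside every range, such as 5.5 (between 5 and 6), is displayed without formatting. The str() fallback only approximates how SAS would display such a value. AGE_RANGES and age_format are hypothetical names.

```python
import math

# (low, high, label); both ends are inclusive, as in the VALUE statement.
AGE_RANGES = [
    (-math.inf, 5, "baby"),     # low - 5
    (6, 12, "child"),
    (13, 15, "teen"),
    (16, 30, "youth"),
    (31, 50, "midlife"),
    (51, math.inf, "older"),    # 51 - high
]


def age_format(value: float) -> str:
    """Return the AGE. label for value, or the value itself if no range matches."""
    for low, high, label in AGE_RANGES:
        if low <= value <= high:
            return label
    return str(value)  # approximation of an unformatted value


ages = [5, 6, 12, 13, 15, 16, 30, 35, 51, 100]  # the ages in DATATST
print([age_format(a) for a in ages])
```

The gaps between ranges (for example, between 5 and 6) are harmless for the integer ages in this example, but they matter for fractional values.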
Change the locale and create the AGE. format in a locale-specific format catalog.
The LOCALE= system option changes the locale to the Romanian locale. In the
PROC FORMAT statement, the LOCALE option specifies that the format catalog is
created for the current locale, which is the Romanian locale.
options locale=ro_RO;
Add the locale-specific format catalogs to the format search path. The
FMTSEARCH= system option specifies the format catalog to search. Because you
can create more than one locale-specific catalog, when /LOCALE is added to a
libref in the search list, SAS searches for a catalog that is associated with the
current locale.
options fmtsearch=(work/locale);
Create a data set and print it using the English format catalog. The LOCALE=
system option sets the locale to English.
/* Set the locale back to English(US) */
options locale=en_US;
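data datatst;
input age sex $;
attrib age format= age.;
cards;
5 M
6 F
12 M
13 F
15 M
16 F
30 M
35 F
51 M
100 F
;
run;
/* Use the English format catalog*/
title "Locale is English, Use the Original Format Catalog";
proc print data=datatst; run;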
Print the data set using the Romanian format catalog. The LOCALE= system option
sets the locale to the Romanian locale.
/* Use the Romanian format catalog*/
options locale=ro_RO;
title 'Locale is ro_RO, Use the Romanian Format Catalog';
proc print data=datatst;run;
Here is the data set printed using the English and Romanian format catalogs:
Output 30.18 A Data Set Printed Using an English and Romanian Format Catalog
Example 20: Creating a Function to Use as a Format
Details
This example creates functions that convert temperatures from Celsius to
Fahrenheit and from Fahrenheit to Celsius. The program calls the CTOF function in
one DATA step and then uses both functions as formats in another DATA step.
Program
proc fcmp outlib=library.functions.smd;
function ctof(c) $;
return(cats(((9*c)/5)+32,'F'));
endsub;
function ftoc(f) $;
return(cats((f-32)*5/9,'C'));
endsub;
run;
options cmplib=(library.functions);
data _null_;
f=ctof(100);
put f=;
run;
proc format;
value ctof (default=10) other=[ctof()];
value ftoc (default=10) other=[ftoc()];
run;
data _null_;
c=100;
put c=ctof.;
f=212;
put f=ftoc.;
run;
Program Description
Create the functions that change temperature from Celsius to Fahrenheit and
Fahrenheit to Celsius. The FCMP procedure creates the CTOF function to convert
Celsius temperatures to Fahrenheit and the FTOC function to convert Fahrenheit
temperatures to Celsius.
proc fcmp outlib=library.functions.smd;
function ctof(c) $;
return(cats(((9*c)/5)+32,'F'));
endsub;
function ftoc(f) $;
return(cats((f-32)*5/9,'C'));
endsub;
run;
Access the function library. The CMPLIB system option enables the functions to be
included during program compilation.
options cmplib=(library.functions);
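Create user-defined formats using the functions. In this example, each format has
the same name as the function that it calls. To use a function as a format, enclose
the function name, followed by parentheses, in square brackets as the label for the
OTHER keyword. This is the same syntax that is used for a nested format.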
proc format;
value ctof (default=10) other=[ctof()];
value ftoc (default=10) other=[ftoc()];
run;
Use the function as a format. This DATA step formats temperatures using a named
PUT statement, where you assign a format to a variable in the PUT statement.
data _null_;
c=100;
put c=ctof.;
f=212;
put f=ftoc.;
run;
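The conversion arithmetic in CTOF and FTOC can be reproduced in Python as a quick check of the values in the log. This sketch is not SAS code. The helper _cats_number is a hypothetical stand-in for how CATS converts a number to text, and it matches SAS only for simple values such as the whole numbers used here.

```python
def _cats_number(value: float) -> str:
    """Hypothetical stand-in for CATS number-to-text conversion (approximate)."""
    return f"{value:g}"  # 212.0 -> '212'; fractional or very large values may differ from SAS


def ctof(c: float) -> str:
    """Celsius to Fahrenheit with an F suffix, mirroring the FCMP CTOF function."""
    return _cats_number((9 * c) / 5 + 32) + "F"


def ftoc(f: float) -> str:
    """Fahrenheit to Celsius with a C suffix, mirroring the FCMP FTOC function."""
    return _cats_number((f - 32) * 5 / 9) + "C"


print(ctof(100))  # 212F, matching c=212F in the log
print(ftoc(212))  # 100C, matching f=100C in the log
```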
Output: Log
Example Code 30.6 The SAS Log After Creating a Function to Use as a Format
333
334 options cmplib=(library.functions);
335
336 data _null_;
337 f=ctof(100);
338 put f=;
339 run;
f=212F
NOTE: DATA statement used (Total process time):
real time 0.50 seconds
cpu time 0.01 seconds
340
341 proc format;
342 value ctof (default=10) other=[ctof()];
NOTE: Format CTOF has been output.
343 value ftoc (default=10) other=[ftoc()];
NOTE: Format FTOC has been output.
344 run;
345
346 data _null_;
347 c=100;
348 put c=ctof.;
349 f=212;
350 put f=ftoc.;
351 run;
c=212F
f=100C
Example 21: Using a Format to Create a Drill-down Table
Details
This example creates an HTML table that has population information about five
U.S. states. The name of the state is a link to the state’s website. The link is created
using a user-defined format to format the state name. This example does the
following:
- creates the data set that contains the state population information
- creates a user-defined format using the VALUE statement, where each formatted
value is an HTML link (<a>) element
- defines the name of the HTML file and the titles for the HTML file
Program
data mydata;
format population comma12.0;
label st='State';
label population='Population';
input st $ 1-2 population;
year=2000;
datalines;
VA 7078515
NC 8049313
SC 4012012
GA 8186453
FL 15982378
;
run;
proc format;
value $COMPND
'VA'='<a href=http://www.va.gov>VA</a>'
'NC'='<a href=http://www.nc.gov>NC</a>'
'SC'='<a href=http://www.sc.gov>SC</a>'
'GA'='<a href=http://www.ga.gov>GA</a>'
'FL'='<a href=http://www.fl.gov>FL</a>';
run;
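ods html file="c:\mySAS\html\Drilldown.htm"
(title="An ODS HTML Drill-down Table Using a User-defined Format in the PRINT Procedure");
options nodate;
proc print data=mydata label noobs;
var st population;
format st $compnd. ;
run;
ods html close;
ods html;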
Program Description
Create the data set. The mydata DATA step creates a data set that contains
information about five U.S. state populations based on the census taken in the year
2000. The variables that are created assign data for the year of the census, the
state abbreviations, and the state population.
data mydata;
format population comma12.0;
label st='State';
label population='Population';
input st $ 1-2 population;
year=2000;
datalines;
VA 7078515
NC 8049313
SC 4012012
GA 8186453
FL 15982378
;
run;
Create the $COMPND. format. The $COMPND. format formats each state as a link
to the state’s respective website.
proc format;
value $COMPND
'VA'='<a href=http://www.va.gov>VA</a>'
'NC'='<a href=http://www.nc.gov>NC</a>'
'SC'='<a href=http://www.sc.gov>SC</a>'
'GA'='<a href=http://www.ga.gov>GA</a>'
'FL'='<a href=http://www.fl.gov>FL</a>';
run;
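The following Python sketch shows what the $COMPND. lookup and the COMMA12.0 population format produce for each table row. It is not SAS, and it is much simpler than the markup that ODS HTML actually writes. The names COMPND, POPULATION_2000, and html_rows are hypothetical, and the bare <tr>/<td> markup is an assumption for illustration only.

```python
# Python model (not SAS) of the $COMPND. lookup; state codes map to HTML links.
COMPND = {
    "VA": "<a href=http://www.va.gov>VA</a>",
    "NC": "<a href=http://www.nc.gov>NC</a>",
    "SC": "<a href=http://www.sc.gov>SC</a>",
    "GA": "<a href=http://www.ga.gov>GA</a>",
    "FL": "<a href=http://www.fl.gov>FL</a>",
}

POPULATION_2000 = [("VA", 7078515), ("NC", 8049313), ("SC", 4012012),
                   ("GA", 8186453), ("FL", 15982378)]


def html_rows(rows):
    """Build simplified table rows: formatted state link plus comma-grouped population."""
    lines = ["<tr><th>State</th><th>Population</th></tr>"]
    for state, population in rows:
        label = COMPND.get(state, state)  # unmatched codes are shown unformatted
        lines.append(f"<tr><td>{label}</td><td>{population:,}</td></tr>")
    return lines


for line in html_rows(POPULATION_2000):
    print(line)
```

Because the formatted value itself is an <a> element, the browser displays each state code as a link. No separate link column is needed.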
Set up the table filename and table titles. The ODS HTML FILE= option names the
directory and filename where SAS saves the HTML output.
ods html file="c:\mySAS\html\Drilldown.htm"
(title="An ODS HTML Drill-down Table Using a User-defined Format in the PRINT Procedure");
Print the table and close and reopen the HTML destination. The PRINT procedure
uses the format $COMPND. to format the state name. The formatted name is a link
to the state’s respective website. The ODS HTML statements close and reopen the
HTML destination so that future output does not overwrite the HTML file that you
just created.
options nodate;
proc print data=mydata label noobs;
var st population;
format st $compnd. ;
run;
ods html close;
ods html;
Output
Output 30.19 Using a Format to Create Drill-down Text in an HTML Table
Chapter 31
FSLIST Procedure
Statement      Task
PROC FSLIST    Initiate the FSLIST procedure and specify the external file to browse
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Syntax
PROC FSLIST FILEREF=file-specification | UNIT=nn <options>;
PROC FSLIST DDNAME=file-specification | UNIT=nn <options>;
PROC FSLIST DD=file-specification | UNIT=nn <options>;
Required Arguments
FILEREF | DDNAME | DD=file-specification
specifies the external file to browse. File-specification can be one of the
following:
'external-file'
is the complete operating environment file specification (called the fully
qualified pathname under some operating environments) for the external
file. You must enclose the name in quotation marks.
fileref
is a fileref that has been previously assigned to the external file. You can use
the FILENAME statement to associate a fileref with an actual filename. For
more information, see “FILENAME Statement” in SAS Global Statements:
Reference.
UNIT=nn
defines the FORTRAN-style logical unit number of the external file to browse.
This option is useful when the file to browse has a fileref of the form FTnnF001,
where nn is the logical unit number that is specified in the UNIT= argument. For
example, you can specify the following: proc fslist unit=20; instead of
proc fslist fileref=ft20f001;
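The relationship between UNIT=nn and the FTnnF001 fileref is a simple mapping. This Python sketch is illustrative only. The function name is hypothetical, and the zero-padding of single-digit unit numbers to two digits (FT05F001) is an assumption based on the FTnnF001 pattern.

```python
def unit_to_fileref(unit: int) -> str:
    """Hypothetical helper: the FTnnF001 fileref that UNIT=nn refers to."""
    if not 1 <= unit <= 99:
        raise ValueError("expected a one- or two-digit logical unit number")
    return f"FT{unit:02d}F001"  # assumption: nn is zero-padded to two digits


print(unit_to_fileref(20))  # FT20F001, so UNIT=20 browses the same file as FILEREF=FT20F001
```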
Optional Arguments
CAPS | NOCAPS
controls how search strings for the FIND command are treated:
CAPS
converts search strings into uppercase unless they are enclosed in quotation
marks. For example, with this option in effect, the command find nc locates
occurrences of NC, but not nc. To locate lowercase characters, enclose the
search string in quotation marks: find 'nc'.
NOCAPS
does not perform a translation. The FIND command locates only those text
strings that exactly match the search string.
The default is NOCAPS. You can use the CAPS command in the FSLIST window
to change the behavior of the procedure while you are browsing a file.
CC | FORTCC | NOCC
indicates whether carriage-control characters are used to format the display.
You can specify one of the following values for this option:
CC
uses the native carriage-control characters of the operating environment.
FORTCC
uses FORTRAN-style carriage control. The first column of each line in the
external file is not displayed. The character in this column is interpreted as a
carriage-control code:
+
skip zero lines and print (overprint).
blank
skip one line and print (single space).
0
skip two lines and print (double space).
-
skip three lines and print (triple space).
1
go to new page and print.
NOCC
treats carriage-control characters as regular text.
If the FSLIST procedure can determine from the file's attributes that the file
contains carriage-control information, then that carriage-control information is
used to format the displayed text. In this case, the CC option is the default.
Otherwise, the entire contents of the file are treated as text. In this case, the
NOCC option is the default.
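HSCROLL=n | HALF | PAGE
indicates the default horizontal scroll amount for the LEFT and RIGHT commands:
n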
sets the default scroll amount to n columns.
HALF
sets the default scroll amount to half the window width.
PAGE
sets the default scroll amount to the full window width.
The default is HSCROLL=HALF. You can use the HSCROLL command in the
FSLIST window to change the default scroll amount.
NOBORDER
suppresses the sides and bottom of the FSLIST window's border. When this
option is used, text can appear in the columns and row that are normally
occupied by the border.
NUM | NONUM
controls the display of line sequence numbers in files that have a record length
of 80 and contain sequence numbers in columns 73 through 80. NUM displays
the line sequence numbers. NONUM suppresses them.
Default NONUM
OVP | NOOVP
indicates whether the carriage-control code for overprinting is in effect:
OVP
causes the procedure to honor the overprint code and print the current line
over the previous line when the code is encountered.
NOOVP
causes the procedure to ignore the overprint code and print each line from
the file on a separate line of the display.
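To make the FORTCC and OVP | NOOVP descriptions concrete, the following Python sketch turns FORTRAN-style records into pages of display lines. It is not SAS and does not show how the FSLIST procedure is implemented. The sketch makes these assumptions:
- a blank code prints on the next line
- 0 and - insert one and two blank lines
- 1 starts a new page
- + overlays the previous line when OVP is in effect (nonblank characters win) and prints on a separate line when NOOVP is in effect
- any other code is treated like a blank

render_fortcc is a hypothetical name.

```python
# Blank lines inserted before a record for each FORTRAN carriage-control code.
BLANKS_BEFORE = {" ": 0, "0": 1, "-": 2}


def render_fortcc(records, ovp=False):
    """Hypothetical sketch: convert FORTCC records to pages of display lines.

    The first character of each record is the control code; it is not displayed.
    ovp=False corresponds to NOOVP.
    """
    pages = [[]]
    for record in records:
        code, text = (record[:1] or " "), record[1:]
        page = pages[-1]
        if code == "1":                      # go to new page and print
            if page:
                pages.append([])
            pages[-1].append(text)
        elif code == "+" and ovp and page:   # skip zero lines: overprint previous line
            width = max(len(page[-1]), len(text))
            old, new = page[-1].ljust(width), text.ljust(width)
            page[-1] = "".join(n if n != " " else o for o, n in zip(old, new))
        else:                                # blank, 0, -, '+' with NOOVP, or other
            page.extend([""] * BLANKS_BEFORE.get(code, 0))
            page.append(text)
    return pages


print(render_fortcc(["1TITLE", " line one", "0line three", "-after gap", "1page two"]))
print(render_fortcc([" ab", "+  cd"], ovp=True))   # overprinted onto one line
print(render_fortcc([" ab", "+  cd"]))             # NOOVP: two separate lines
```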
FSLIST Command
Initiates an FSLIST session from any SAS window. The command enables you to
use either a fileref or a filename to specify the file to browse. It also enables you to
specify how carriage-control information is interpreted.
Syntax
FSLIST <* | ? | file-specification <carriage-control-option <overprinting-option>>>
Without Arguments
If you do not specify any of these three arguments, then a selection window
appears that enables you to select an external filename.
Optional Arguments
*
opens a dialog box in which you can specify the name of the file to browse,
along with various FSLIST procedure options. In the dialog box, you can specify
either a physical filename, a fileref, or a directory name. If you specify a
directory name, then a selection list of the files in the directory appears, from
which you can choose the desired file.
?
opens a selection window from which you can choose the external file to
browse. The selection list in the window includes all external files that are
identified in the current SAS session (all files with defined filerefs).
To select a file, position the cursor on the corresponding fileref and press Enter.
Notes Only filerefs that are defined within the current SAS session appear in
the selection list. Under some operating environments, it is possible to
allocate filerefs outside of SAS. Such filerefs do not appear in the
selection list that is displayed by the FSLIST command.
file-specification
identifies the external file to browse. File-specification can be one of the
following:
'external-file'
the complete operating environment file specification (called the fully
qualified pathname under some operating environments) for the external
file. You must enclose the name in quotation marks.
If the specified file is not found, then a selection window appears that shows
all available filerefs.
fileref
a fileref that is currently assigned to an external file. If you specify a fileref
that is not currently defined, then a selection window appears that shows all
available filerefs. An error message in the selection window indicates that
the specified fileref is not defined.
If you specify file-specification with the FSLIST command, then you can also
use the following carriage-control or overprinting options. These options are
not valid with the ? argument, or when no argument is used:
CC | FORTCC | NOCC
indicates whether carriage-control characters are used to format the
display.
If the FSLIST procedure can determine from the file's attributes that the
file contains carriage-control information, then that carriage-control
information is used to format the displayed text. In this case, the CC
option is the default. Otherwise, the entire contents of the file are treated
as text. In this case, the NOCC option is the default.
You can specify one of the following values for this option:
CC
uses the native carriage-control characters of the operating
environment.
FORTCC
uses FORTRAN-style carriage control. See the discussion of the
PROC FSLIST statement's FORTCC option for details.
NOCC
treats carriage-control characters as regular text.
OVP | NOOVP
indicates whether the carriage-control code for overprinting is honored. OVP
causes the overprint code to be honored. NOOVP causes it to be ignored. The
OVP option is ignored if NOCC is in effect.
Default NOOVP
Depending on your operating environment, the text that you copy can then be
pasted into any SAS window that uses the SAS text editor, including the FSLETTER
window in SAS/FSP software, or into any other application that allows pasting of
text.
You can use commands in the command window or command line to control the
FSLIST window.
Global Commands
In the FSLIST window, you can use any of the global commands that are described
in the SAS/FSP Procedures Guide.
Scrolling Commands
n
scrolls the window so that line n of text is at the top of the window. Type the
desired line number in the command window or on the command line and press
Enter. If n is greater than the number of lines in the file, then the last few lines of
the file are displayed at the top of the window.
BACKWARD <n | HALF | PAGE | MAX>
scrolls vertically toward the first line of the file. The following scroll amounts
can be specified:
n
scrolls upward by the specified number of lines.
HALF
scrolls upward by half the number of lines in the window.
PAGE
scrolls upward by the number of lines in the window.
MAX
scrolls upward until the first line of the file is displayed.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent VSCROLL command. The default
VSCROLL amount is PAGE.
BOTTOM
scrolls downward until the last line of the file is displayed.
FORWARD <n | HALF | PAGE | MAX>
scrolls vertically toward the end of the file. The following scroll amounts can be
specified:
n
scrolls downward by the specified number of lines.
HALF
scrolls downward by half the number of lines in the window.
PAGE
scrolls downward by the number of lines in the window.
MAX
scrolls downward until the last line of the file is displayed.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent VSCROLL command. The default
VSCROLL amount is PAGE. Regardless of the scroll amount, this command does
not scroll beyond the last line of the file.
HSCROLL <n | HALF | PAGE>
sets the default horizontal scrolling amount for the LEFT and RIGHT commands.
The following scroll amounts can be specified:
n
sets the default scroll amount to the specified number of columns.
HALF
sets the default scroll amount to half the number of columns in the window.
PAGE
sets the default scroll amount to the number of columns in the window.
LEFT <n | HALF | PAGE | MAX>
scrolls horizontally toward the left margin of the text.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent HSCROLL command. The default
HSCROLL amount is HALF. Regardless of the scroll amount, this command does
not scroll beyond the left margin of the text.
RIGHT <n | HALF | PAGE | MAX>
scrolls horizontally toward the right margin of the text. This command is ignored
unless the file width is greater than the window width. The following scroll
amounts can be specified:
n
scrolls right by the specified number of columns.
HALF
scrolls right by half the number of columns in the window.
PAGE
scrolls right by the number of columns in the window.
MAX
scrolls right until the right margin of the text is displayed at the left edge of
the window.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent HSCROLL command. The default
HSCROLL amount is HALF. Regardless of the scroll amount, this command does
not scroll beyond the right margin of the text.
TOP
scrolls upward until the first line of text from the file is displayed.
VSCROLL <n | HALF | PAGE>
sets the default vertical scrolling amount for the FORWARD and BACKWARD
commands. The following scroll amounts can be specified:
n
sets the default scroll amount to the specified number of lines.
HALF
sets the default scroll amount to half the number of lines in the window.
PAGE
sets the default scroll amount to the number of lines in the window.
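The rules for FORWARD, BACKWARD, and VSCROLL scroll amounts can be summarized in a small Python model. This is a sketch, not the FSLIST implementation. The class name VerticalScroll is hypothetical, HALF is assumed to round down for windows with an odd number of lines, and FORWARD is assumed to stop when the last line of the file reaches the bottom of the window.

```python
class VerticalScroll:
    """Hypothetical model of FSLIST vertical scrolling (not SAS code)."""

    def __init__(self, total_lines: int, window_lines: int):
        self.total = total_lines
        self.window = window_lines
        self.top = 1            # line number shown at the top of the window
        self.default = "PAGE"   # the default VSCROLL amount is PAGE

    def vscroll(self, amount):
        """VSCROLL n | HALF | PAGE: change the default for FORWARD and BACKWARD."""
        self.default = amount

    def _lines(self, amount):
        amount = self.default if amount is None else amount
        if amount == "HALF":
            return self.window // 2
        if amount == "PAGE":
            return self.window
        if amount == "MAX":
            return self.total
        return int(amount)

    def forward(self, amount=None):
        # Assumption: stop when the last line reaches the bottom of the window.
        last_top = max(1, self.total - self.window + 1)
        self.top = min(self.top + self._lines(amount), last_top)

    def backward(self, amount=None):
        self.top = max(1, self.top - self._lines(amount))


view = VerticalScroll(total_lines=100, window_lines=20)
view.forward()        # default PAGE: top becomes line 21
view.forward("HALF")  # explicit HALF: top becomes line 31
view.vscroll("HALF")
view.forward()        # new default HALF: top becomes line 41
view.forward("MAX")   # stops at line 81, so lines 81-100 are visible
print(view.top)       # 81
```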
Searching Commands
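BFIND <search-string <PREFIX | SUFFIX | WORD>>
locates the previous occurrence of the specified search-string, searching backward
toward the beginning of the file.
If a FIND command has previously been issued, then you can use the BFIND
command without arguments to repeat the search in the opposite direction.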
The CAPS option in the PROC FSLIST statement and the CAPS ON command
cause search strings to be converted to uppercase for the purposes of the
search, unless the strings are enclosed in quotation marks. See the discussion of
the FIND command for details.
By default, the BFIND command locates any occurrence of the specified string,
even where the string is embedded in other strings. You can use any one of the
following options to alter the command's behavior:
PREFIX
causes the search string to match the text string only when the text string
occurs at the beginning of a word.
SUFFIX
causes the search string to match the text string only when the text string
occurs at the end of a word.
WORD
causes the search string to match the text string only when the text string is
a distinct word.
You can use the RFIND command to repeat the most recent BFIND command.
CAPS <ON | OFF>
controls how the FIND, BFIND, and RFIND commands locate matches for a
search string. By default, the FIND, BFIND, and RFIND commands locate only
those text strings that exactly match the search string as it was entered. When
you issue the CAPS command, the FIND, BFIND, and RFIND commands convert
search strings into uppercase for the purposes of searching (displayed text is
not affected), unless the strings are enclosed in quotation marks. Strings in
quotation marks are not affected.
For example, after you issue a CAPS ON command, both of the following
commands locate occurrences of NC but not occurrences of nc: find NC and find
nc. If you omit the ON or OFF argument, then the CAPS command acts as a
toggle, turning the attribute on if it was off or off if it was on.
FIND search-string <NEXT | FIRST | LAST | PREV | ALL> <PREFIX | SUFFIX |
WORD>
locates an occurrence of the specified search-string in the file. The search-string
must be enclosed in quotation marks if it contains embedded blanks.
The text in the search-string must match the text in the file in terms of both
characters and case. For example, the following command locates occurrences
of raleigh: find raleigh. The following command locates occurrences of
Raleigh: find Raleigh.
When the CAPS option is used with the PROC FSLIST statement or when a
CAPS ON command is issued in the window, the search string is converted to
uppercase for the purposes of the search, unless the string is enclosed in
quotation marks. With CAPS in effect, the command find raleigh locates only the
text RALEIGH in the file. To locate the text Raleigh, you must instead use the
command find 'Raleigh'.
You can modify the behavior of the FIND command by adding any one of the
following options:
ALL
reports the total number of occurrences of the string in the file in the
window's message line and moves the cursor to the first occurrence.
FIRST
moves the cursor to the first occurrence of the string in the file.
LAST
moves the cursor to the last occurrence of the string in the file.
NEXT
moves the cursor to the next occurrence of the string in the file.
PREV
moves the cursor to the previous occurrence of the string in the file.
By default, the FIND command locates any occurrence of the specified string,
even where the string is embedded in other strings. You can use any one of the
following options to alter the command's behavior:
PREFIX
causes the search string to match the text string only when the text string
occurs at the beginning of a word.
SUFFIX
causes the search string to match the text string only when the text string
occurs at the end of a word.
WORD
causes the search string to match the text string only when the text string is
a distinct word.
After you issue a FIND command, you can use the RFIND command to repeat
the search for the next occurrence of the string, or you can use the BFIND
command to repeat the search for the previous occurrence.
RFIND
repeats the most recent FIND command, starting at the current cursor position
and proceeding forward toward the end of the file.
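The matching rules for FIND, the CAPS setting, and the PREFIX, SUFFIX, and WORD options can be approximated in Python with the re module. This sketch does not show how FSLIST is implemented. find_all is a hypothetical name. A word character is assumed to be a letter, digit, or underscore (Python's \w), and a string is treated as quoted only when it is wrapped in matching single or double quotation marks.

```python
import re


def find_all(text: str, search: str, caps: bool = False, mode=None):
    """Hypothetical model of FSLIST FIND matching; returns match offsets in text."""
    quoted = len(search) >= 2 and search[0] == search[-1] and search[0] in "'\""
    if quoted:
        search = search[1:-1]          # quoted strings are never uppercased
    elif caps:
        search = search.upper()        # CAPS ON: unquoted strings are uppercased
    pattern = re.escape(search)
    if mode in ("PREFIX", "WORD"):
        pattern = r"(?<!\w)" + pattern  # must start at the beginning of a word
    if mode in ("SUFFIX", "WORD"):
        pattern = pattern + r"(?!\w)"   # must end at the end of a word
    return [m.start() for m in re.finditer(pattern, text)]


line = "Raleigh, RALEIGH and raleighs"
print(find_all(line, "raleigh"))                   # [21]: exact case only
print(find_all(line, "raleigh", caps=True))        # [9]: searches for RALEIGH
print(find_all(line, "'Raleigh'", caps=True))      # [0]: quoted, so case is kept
print(find_all(line, "raleigh", mode="WORD"))      # []: 'raleighs' is not a distinct word
```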
Display Commands
COLUMN <ON | OFF>
displays a column ruler below the message line in the FSLIST window. The ruler
is helpful when you need to determine the column in which a particular
character is located. If you omit the ON or OFF specification, then the COLUMN
command acts as a toggle, turning the ruler on if it was off and off if it was on.
HEX <ON | OFF>
controls the special hexadecimal display format of the FSLIST window. When
the hexadecimal format is turned on, each line of characters from the file
occupies three lines of the display. The first is the line displayed as characters.
The next two lines of the display show the hexadecimal value of the operating
environment's character codes for the characters in the line of text. The
hexadecimal values are displayed vertically, with the most significant digit on
top. If you omit the ON or OFF specification, then the HEX command acts as a
toggle, turning the hexadecimal format on if it was off and off if it was on.
NUMS <ON | OFF>
controls whether line numbers are shown at the left side of the window. By
default, line numbers are not displayed. If line numbers are turned on, then they
remain at the left side of the display when text in the window is scrolled right
and left. If you omit the ON or OFF argument, then the NUMS command acts as
a toggle, turning line numbering on if it was off or off if it was on.
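The three-line HEX layout can be sketched in Python for a single-byte encoding. This is an illustration, not SAS code. hex_rows is a hypothetical name, and the default Latin-1 encoding is an assumption. On an EBCDIC system the character codes differ, and multibyte encodings would not line up column by column.

```python
def hex_rows(line: str, encoding: str = "latin-1"):
    """Return the text line and two rows of hex digits, high-order digit on top."""
    codes = line.encode(encoding)
    high = "".join(f"{b >> 4:X}" for b in codes)    # most significant hex digit
    low = "".join(f"{b & 0xF:X}" for b in codes)    # least significant hex digit
    return [line, high, low]


for row in hex_rows("SAS"):
    print(row)
# SAS
# 545   ('S' is 0x53, 'A' is 0x41)
# 313
```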
Other Commands
BROWSE fileref | 'actual-filename' <CC | FORTCC | NOCC <OVP | NOOVP>>
closes the current file and displays the specified file in the FSLIST window. You
can specify either a fileref previously associated with a file or an actual filename
enclosed in quotation marks. The BROWSE command also accepts the same
carriage-control options as the FSLIST command. See “Optional Arguments” on
page 1209 for details.
END
closes the FSLIST window and ends the FSLIST session.
HELP <command>
opens a Help window that provides information about the FSLIST procedure
and about the commands available in the FSLIST window. To get information
about a specific FSLIST window command, follow the HELP command with the
name of the desired command.
KEYS
opens the KEYS window for browsing and editing function key definitions for
the FSLIST window. The default key definitions for the FSLIST window are
stored in the FSLIST.KEYS entry in the Sashelp.Fsp catalog.
If you change any key definitions in the KEYS window, then a new FSLIST.KEYS
entry is created in your personal PROFILE catalog (Sasuser.Profile, or
Work.Profile if the Sasuser library is not allocated).
When the FSLIST procedure is initiated, it looks for function key definitions first
in the FSLIST.KEYS entry in your personal PROFILE catalog. If that entry does
not exist, then the default entry in the Sashelp.Fsp catalog is used.
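The lookup order for function key definitions amounts to a short decision. In this Python sketch, which is not SAS code, keys_entry and its arguments are hypothetical. profile_entries stands for the entry names in your personal PROFILE catalog.

```python
def keys_entry(sasuser_allocated: bool, profile_entries: set) -> str:
    """Return which FSLIST.KEYS entry supplies the function key definitions."""
    # The personal PROFILE catalog is Sasuser.Profile, or Work.Profile without Sasuser.
    profile = "Sasuser.Profile" if sasuser_allocated else "Work.Profile"
    if "FSLIST.KEYS" in profile_entries:
        return f"{profile}.FSLIST.KEYS"   # your customized definitions win
    return "Sashelp.Fsp.FSLIST.KEYS"      # otherwise the default entry is used


print(keys_entry(True, set()))             # Sashelp.Fsp.FSLIST.KEYS
print(keys_entry(False, {"FSLIST.KEYS"}))  # Work.Profile.FSLIST.KEYS
```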
Chapter 32
GROOVY Procedure
PROC GROOVY can run Groovy statements that are written as part of your SAS
code, and it can run statements that are in files that you specify with PROC
GROOVY commands. It can parse Groovy statements into Groovy Class objects,
and it can run these objects or make them available to other PROC GROOVY
statements or Java DATA Step Objects. You can also use PROC GROOVY to update
your CLASSPATH environment variable with additional CLASSPATH strings or
filerefs to JAR files.
Special Considerations
Groovy code that is submitted with PROC GROOVY runs as the process owner, and
has the same access to resources (file system, network, and so on) as any process
owner. Groovy code access to resources can cause problems when SAS code is
running inside multiuser servers like the Stored Process Server. To give
administrators some control over this functionality, PROC GROOVY runs only if the
NOXCMD option is turned off. All SAS servers are shipped with the NOXCMD
option turned on.
The use of a percent character (%) in the first byte of the text that is written by
Java to the SAS log is reserved by SAS. If you need to write a percent character in
the first byte of a Java text line, then you must immediately follow it with another
percent character (%%).
PROC GROOVY does not support the THREADS | NOTHREADS SAS system
option. However, Groovy code that you submit with PROC GROOVY can use
threaded processing in the JVM.
QUIT;
PROC GROOVY    Enable SAS code to run Groovy code on the JVM    Ex. 1
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Syntax
PROC GROOVY <classpath options>;
Optional Argument
classpath options
can be one of the following:
CLASSPATH=
specifies a quoted CLASSPATH string or a fileref to a specific JAR file that is
to be added to the current classpath. This path is searched after the paths
that are in the user’s CLASSPATH environment variable.
Alias PATH=
SASJAR=<version=> | <range=>
specifies a quoted string that identifies a JAR in the Versioned JAR
Repository (VJR) that should be added to the current classpath. The
VERSION and RANGE values are optional. If both are specified, RANGE takes
precedence over VERSION. The following statements show each form:
ADD SASJAR="sas.core";
ADD SASJAR="sas.core" version="903000.9.0.20100810190000_v930";
ADD SASJAR="sas.core" range="[0,909000]";
Note: SAS JAR files do not have a source compatibility guarantee across
versions of SAS. Future versions of this JAR can change without notice. To
ensure continued functionality, contact SAS Technical Support.
Details
PROC GROOVY uses the current user’s CLASSPATH environment variable as the
base for building its classpath. You can use the CLASSPATH and SASJAR options to
add paths to the current classpath.
When a class is loaded, the paths are searched in the following order:
1 the CLASSPATH environment variable as it was when the process started
2 paths added with the ADD CLASSPATH and ADD SASJAR statements in the
order in which they were executed
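The search order can be modeled with a few lines of Java. This is an illustrative sketch under stated assumptions, not how SAS builds its class loader: the class GroovyClasspathModel is hypothetical, entries are plain strings, and the map passed to resolve stands in for the classes that each entry would contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical model of the documented search order (not SAS internals).
final class GroovyClasspathModel {
    private final List<String> entries = new ArrayList<>();

    /** 1: seed with the CLASSPATH environment variable as it was at process start. */
    GroovyClasspathModel(String envClasspath, String separator) {
        for (String entry : envClasspath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
    }

    /** 2: each ADD CLASSPATH= or ADD SASJAR= entry is appended in execution order. */
    void add(String entry) {
        entries.add(entry);
    }

    List<String> searchOrder() {
        return List.copyOf(entries);
    }

    /** Returns the first entry that contains the class, or null if none does. */
    String resolve(String className, Map<String, Set<String>> classesByEntry) {
        for (String entry : entries) {
            if (classesByEntry.getOrDefault(entry, Set.of()).contains(className)) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        GroovyClasspathModel cp = new GroovyClasspathModel("/opt/lib/a.jar:/opt/lib/b.jar", ":");
        cp.add("/home/me/extra.jar");   // like ADD CLASSPATH="/home/me/extra.jar";
        System.out.println(cp.searchOrder());
    }
}
```

Because the environment entries are searched first in this model, a class that exists both in an environment JAR and in an added JAR is resolved from the environment JAR.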
ADD Statement
Appends the given classpath to the current CLASSPATH environment variable.
Syntax
ADD classpath options;
Required Argument
classpath options
can be one of the following:
CLASSPATH=
specifies a quoted CLASSPATH string or a fileref to a specific JAR file that is
to be added to the current classpath. This path is searched after the paths
that are in the user’s CLASSPATH environment variable.
Alias PATH=
SASJAR=<version=> | <range=>
specifies a quoted string that identifies a JAR file in the Versioned JAR
Repository (VJR) that should be added to the current classpath. The
VERSION and RANGE values are optional. If both are specified, RANGE takes
precedence over VERSION. The following statements show each form:
ADD SASJAR="sas.core";
ADD SASJAR="sas.core" version="903000.9.0.20100810190000_v930";
ADD SASJAR="sas.core" range="[0,909000]";
Note: SAS JAR files do not have a source compatibility guarantee across
versions of SAS. Future versions of this JAR file can change without notice.
To ensure continued functionality, contact SAS Technical Support.
Details
The ADD statement appends the given classpath to the current CLASSPATH
environment variable.
You must specify at least one CLASSPATH= or SASJAR= option. You can specify
multiple CLASSPATH= or SASJAR= options.
EVALUATE Statement
Parses the Groovy statement that is provided in the quoted string into a groovy.lang.Script object
and calls the Run method on the Script.
Syntax
EVALUATE <(LOAD | PARSEONLY | NORUN)>
"Groovy statement string" <argument(s)>;
Required Argument
Groovy statement string
specifies a Groovy statement string that is to be parsed by the EVALUATE
statement.
Optional Arguments
LOAD | PARSEONLY | NORUN
parses the Groovy statement into a groovy.lang.Script object, but does not run
it. The arguments are aliases for each other.
argument(s)
specifies arguments that are passed to the code that is being evaluated.
Details
The EVALUATE statement parses the Groovy statement that is provided in the
quoted string into a groovy.lang.Script object and calls the Run method on the
Script. If one of the LOAD, PARSEONLY, or NORUN options is present, then this
statement parses the Groovy statement into a Class object but does not run it. Any
classes that are defined by the Groovy code are then available for use by PROC
GROOVY statements or by Java DATA Step Objects.
EXECUTE Statement
Reads the contents of the file that is specified as either a quoted string path or as a fileref.
Syntax
EXECUTE <(LOAD | PARSEONLY | NORUN)>
Groovy filename | fileref <argument(s)>;
Required Arguments
Groovy filename
specifies the name of the Groovy file that is to be parsed by the EXECUTE
statement.
fileref
specifies a fileref that is assigned to the Groovy file that is to be parsed by the EXECUTE statement.
Optional Arguments
LOAD | PARSEONLY | NORUN
parses the Groovy statements in the specified Groovy file or fileref into a
groovy.lang.Script object, but does not run it. The arguments are aliases for each
other.
argument(s)
specifies arguments that are passed to the code that is being executed.
Details
The EXECUTE statement reads the contents of the file that is specified as either a
quoted string path or as a fileref. The contents are then parsed into a
groovy.lang.Script object, and the Run method is called on the Script. If one of the
LOAD, PARSEONLY, or NORUN options is present, then this statement parses the
file contents into a Class object but does not run it. Any classes that are defined by
the Groovy code are then available for use by PROC GROOVY statements or by
Java DATA Step Objects.
Note: If you used an EXEC PARSEONLY statement to compile a file into a Class,
then you must submit a CLEAR statement so that changes to that file are honored
by future EXEC PARSEONLY statements. If you do not submit the CLEAR
statement, then any changes that you made to the file after you issued the EXEC
PARSEONLY statement are not included by subsequent submissions of the EXEC
PARSEONLY statement. Use the GroovyScriptEngine class if you need
reloadable scripts.
SUBMIT Statement
Parses the Groovy statements that are between the SUBMIT and ENDSUBMIT commands into a
groovy.lang.Script object and calls the Run method on the Script.
Syntax
SUBMIT <(LOAD | PARSEONLY | NORUN)> <argument(s)>;
Groovy statement(s)
ENDSUBMIT;
Required Argument
Groovy statement(s)
specifies Groovy statements that are to be parsed by the SUBMIT statement
into a groovy.lang.Script object.
Optional Arguments
LOAD | PARSEONLY | NORUN
parses the Groovy statements into a groovy.lang.Script object, but does not run
it. The arguments are aliases for each other.
argument(s)
specifies arguments that are passed to the code that is being submitted.
Details
The SUBMIT statement parses the Groovy statements that are between the
SUBMIT and ENDSUBMIT commands into a groovy.lang.Script object and calls the
Run method on the Script. If one of the LOAD, PARSEONLY, or NORUN options is
present, then this statement parses the Groovy statements into a Class object but
does not run it. Any classes that are defined by the Groovy code are then available
for use by PROC GROOVY statements or by Java DATA Step Objects.
Note:
• The ENDSUBMIT statement must be on a line by itself and preceded by only
blank space.
• Macro substitution is disabled between the SUBMIT and ENDSUBMIT
commands.
• PROC GROOVY with multi-line submit commands cannot be used inside a
macro.
ENDSUBMIT Statement
Ends the block of Groovy statements that begins with the SUBMIT statement.
Syntax
ENDSUBMIT;
Details
The ENDSUBMIT statement ends the block of Groovy statements that begins with the SUBMIT statement.
Note: The ENDSUBMIT statement must be on a line by itself and preceded by only
blank space.
CLEAR Statement
Empties the binding and unloads the Groovy classloader.
Syntax
CLEAR;
Details
The CLEAR statement empties the binding and unloads the Groovy classloader.
When this statement is executed, any variables that are saved in the binding are
rendered unavailable. Any classes that are loaded into the Groovy classloader are
also rendered unavailable.
Note: Neither the CLEAR statement nor the RESET statement resets the
System.Properties collection or the CLASSPATH.
Usage: GROOVY Procedure
Special Variables
PROC GROOVY has four special variables: BINDING, ARGS, EXPORTS, and SHELL.
It makes these variables available to any Groovy code that it is running.
BINDING
The BINDING special variable is used to share the state of objects between
executions of PROC GROOVY. It is populated by any variables that are assigned
without a declaration (for example, a = 42 rather than def a = 42) or that are
explicitly stored in the binding. BINDING also holds all of the other special
variables that are discussed in this section. The binding can be cleared with
the CLEAR statement.
Note: The BINDING special variable is available to any Groovy code that PROC
GROOVY is running.
proc groovy;
eval "a = 42";
eval "binding.b = 84";
quit;
ARGS
Arguments are passed to Groovy code in the ARGS special variable in the binding.
Note: The ARGS special variable is available to any Groovy code that PROC
GROOVY is running.
proc groovy;
eval "args.each{ println ""----> ev ${it}"" }" "arg1" "arg2" "arg3";
quit;
EXPORTS
The EXPORTS special variable contains a map in the binding. Adding a key-value
pair to this map creates a SAS macro variable when PROC GROOVY ends.
Groovy is case sensitive, but SAS macro variable names are not. If the map
contains two keys that differ only by their case, then which of the two values
is exported to the SAS macro variable is not determined. You can also replace
the EXPORTS variable in the binding with any object that implements
java.util.Map. If you replace the variable, all of the key-value pairs in that
object are exported.
Note: The EXPORTS special variable is available to any Groovy code that PROC
GROOVY is running.
proc groovy;
eval "exports.fname = ""first name""";
eval "binding.exports.lname = ""last name""";
eval "exports.put('state', 'NC')";
quit;
data _NULL_;
put "----> &fname &lname: &state";
run;
proc groovy;
submit;
exports = [fname:"first name", lname: "last name", state: "NC"]
endsubmit;
quit;
data _NULL_;
put "----> &fname &lname: &state";
run;
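The following Java sketch illustrates why the exported value is not determined when two keys differ only by case. It assumes, for illustration only, that each key is folded to uppercase to form the macro variable name and that later entries overwrite earlier ones; which entry comes later depends on the map's iteration order. The class ExportsCollision is hypothetical and is not how PROC GROOVY is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration (not PROC GROOVY internals): folding map keys to
// one case, as a case-insensitive macro namespace must, makes keys that
// differ only by case collide, so one value silently replaces the other.
final class ExportsCollision {
    static Map<String, String> toMacroVariables(Map<String, String> exports) {
        Map<String, String> macros = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : exports.entrySet()) {
            // Assumption: names are compared in uppercase; the last entry seen wins.
            macros.put(e.getKey().toUpperCase(Locale.ROOT), e.getValue());
        }
        return macros;
    }

    public static void main(String[] args) {
        Map<String, String> exports = new LinkedHashMap<>();
        exports.put("state", "NC");
        exports.put("STATE", "SC");
        System.out.println(toMacroVariables(exports));   // {STATE=SC}
    }
}
```

Reversing the insertion order changes the surviving value, and a java.util.HashMap makes no promise about iteration order at all, so in this model the result cannot be predicted from the keys alone.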
SHELL
The SHELL special variable in the binding is set to the groovy.lang.GroovyShell that
was used to compile the current script. You must submit a CLEAR statement before
changes that were made to the execution.groovy file in this example are reflected
in subsequent runs of the code.
Note: The SHELL special variable is available to any Groovy code that PROC
GROOVY is running.
proc groovy;
eval "shell.run(
new File(""execution.groovy""),
[] as String[] )";
quit;
Note: If you need Groovy scripts that will be reloaded automatically when they are
modified, then create a new instance of the GroovyScriptEngine class.
Example 1: Define Classes
The following three examples show how to use PROC GROOVY to define a class.
Program
Groovy code is run by default. If your script does not have any executable code,
then an error is returned. The following example defines a class, but it does not
have any executable code, and an error is returned.
proc groovy classpath=cp;
submit;
class Speaker {
def say( word ) {
println "----> \"${word}\""
}
}
endsubmit;
quit;
Program
The following example shows how to define a class that can be run by including a
main method.
proc groovy classpath=cp;
submit;
class Speaker {
def Speaker() {
println "----> ctor"
}
def main( args ) {
println "----> main"
}
}
endsubmit;
quit;
Program
The following example shows how to use the PARSEONLY option to avoid a run
call. You can then use the new class in another execution of PROC GROOVY.
proc groovy classpath=cp;
submit parseonly;
class Speaker {
def say( word ) {
println "----> \"${word}\""
}
}
endsubmit;
quit;
Example 2: Pass a Macro Variable to PROC GROOVY
The following example shows how to define a macro variable and pass it to PROC
GROOVY.
%let _inzip = C:/path/example.zip;
proc groovy;
submit "&_inzip.";
def zipFile = new java.util.zip.ZipFile(new File(args[0]))
zipFile.entries().each {
println zipFile.getInputStream(it).text
}
endsubmit;
quit;
33
HADOOP Procedure
Hadoop is an open-source software framework, written in Java, that provides
distributed data storage and processing of large amounts of data.
PROC HADOOP interfaces with the Hadoop JobTracker, which is the service within
Hadoop that assigns tasks to specific nodes in the cluster. PROC HADOOP
enables you to submit the following:
• Hadoop Distributed File System (HDFS) commands
• MapReduce programs
• Pig language code
Beginning with SAS 9.4M3, to submit MapReduce programs or Pig language code
through the Apache Oozie RESTful API, the SAS environment variable
SAS_HADOOP_RESTFUL must be set to 1. You must also set the SAS environment
variable SAS_HADOOP_CONFIG_PATH to the location where the hdfs-site.xml and
core-site.xml configuration files exist. The hdfs-site.xml file must include the property
for the WebHDFS location. You also need to specify Oozie-specific properties in a
configuration file and identify the configuration file with the PROC HADOOP
statement CFG= argument. The Oozie-specific properties include oozie_http_port,
fs.default.name, and mapred.job.tracker. For more information, see the SAS Hadoop
Configuration Guide for Base SAS and SAS/ACCESS. PROC HADOOP does not
support running an Oozie job (MAPREDUCE or PIG) on a server that has Kerberos
enabled.
Note: For a list of Hadoop distributions that are supported in SAS 9.4, see SAS 9.4
Supported Hadoop Distributions. For a list of Hadoop distributions that are
supported in SAS Viya, see SAS Viya: Deployment Guide. For information about
configuration, see SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS.
Syntax
PROC HADOOP <hadoop-server-options>;
AUTHDOMAIN='authentication-domain'
specifies the name of an authentication domain metadata object in order to
connect to the Hadoop server. The authentication domain references
credentials (user ID and password) without your explicitly specifying the
credentials.
CFG=fileref | 'external-file'
identifies the Hadoop configuration file to use in order to connect to the Hadoop
server. The configuration file contains entries for Hadoop system information,
including file system properties such as fs.defaultFS. The configuration file can
be a copy of the Hadoop core-site.xml file. However, if your Hadoop cluster is
running with HDFS failover enabled, you need to create a file that combines the
properties of the Hadoop core-site.xml and hdfs-site.xml. The configuration file
must specify the NameNode and JobTracker addresses for the specific server.
fileref
specifies the SAS fileref that is assigned to the Hadoop configuration file. To
assign a fileref, use the FILENAME statement.
'external-file'
is the physical location of the XML document. Include the complete
pathname and the filename. The maximum length is 200 characters.
Alias OPTIONS=
Interaction The CFG= option is required for a configuration file that is specific
to Apache Oozie. You must also set the SAS environment variable
SAS_HADOOP_CONFIG_PATH. For more information, see SAS
Hadoop Configuration Guide for Base SAS and SAS/ACCESS. For
other uses, specify the location of configuration files by setting the
SAS_HADOOP_CONFIG_PATH environment variable only. The
environment variable is used by several SAS components.
MAXWAIT=wait-interval
specifies the maximum time to wait for an HTTP status response when using WebHDFS.
PASSWORD='password'
is the password for the user ID on the Hadoop server. The user ID and password
are added to the set of options that are identified by CFG=.
Alias PASS=
USERNAME='ID'
is an authorized user ID on the Hadoop server. The user ID and password are
added to the set of options that are identified by CFG=.
Alias USER=
VERBOSE
enables additional messages that are written to the SAS log. VERBOSE is a
good error diagnostic tool. If you receive an error message when you invoke
SAS, you can use this option to see whether you have an error in your system
option specifications.
HDFS Statement
Submits Hadoop Distributed File System (HDFS) commands.
Restrictions: The HDFS statement supports only one operation per invocation.
The CAT, CHMOD, and LS commands are available only when submitting HDFS
commands through WebHDFS.
The RECURSE option and use of wildcards are permitted only when submitting
HDFS commands through WebHDFS.
Requirements: To submit HDFS commands through WebHDFS, the SAS environment variable
SAS_HADOOP_RESTFUL must be set to 1. In addition, the Hadoop configuration
file must include the properties for the WebHDFS location. For more information,
see the SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS.
To submit HDFS commands using the Java API, the Hadoop distribution JAR files
must be copied to a physical location that is accessible to the SAS client machine.
The SAS environment variable SAS_HADOOP_JAR_PATH must be set to the
location of the Hadoop JAR files. For more information, see the SAS Hadoop
Configuration Guide for Base SAS and SAS/ACCESS.
Note: For more information about Hadoop configuration, see SAS Hadoop Configuration
Guide for Base SAS and SAS/ACCESS.
Examples: “Example 1: Submitting HDFS Commands” on page 1250
“Example 2: Submitting HDFS Commands with Wildcard Characters” on page 1251
Syntax
HDFS <hadoop-server-options> <hdfs-command-options>;
CAT='HDFS-file'
displays the contents of the specified HDFS file or files.
'HDFS-file'
specifies a pathname or a pathname and a filename. You can use wildcard
characters to substitute for any other character or characters in the
pathname or the filename. Use * to match one or more characters, or ? to
match a single character.
ONLY=n
displays only the specified number of lines from the beginning of the file. For
example, only=10 displays the first ten lines of a file. This option is helpful
to determine the contents of a file.
OUT='output-location'
specifies the output location for the contents, which can be an external file
for your machine or a fileref that is assigned with the FILENAME statement.
By default, the output location is the SAS log.
RECURSE
specifies to display the contents for all files in the specified pathname and
all files that are in subdirectories. RECURSE has no effect if the specified
HDFS file is not a directory.
SHOW_FILENAME
includes the name of the file in the output. For example, hdfs cat='/tmp/*.txt'
show_filename only=10 recurse; displays in the SAS log the
name of the file and the first ten lines of all .txt files that are found in
the /tmp directory and all of its subdirectories.
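To make the combined effect of ONLY=n and SHOW_FILENAME concrete, the following Java sketch previews in-memory file contents instead of HDFS files. The class PreviewSketch and its parameters are hypothetical; the sketch reproduces only the documented behavior (the file name first, then at most n lines) and is not the PROC HADOOP implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of ONLY=n plus SHOW_FILENAME, using a string instead of an HDFS file.
final class PreviewSketch {
    static List<String> preview(String fileName, String contents, int only, boolean showFilename) {
        if (only < 0) {
            throw new IllegalArgumentException("only must be >= 0");
        }
        List<String> out = new ArrayList<>();
        if (showFilename) {
            out.add(fileName);                             // SHOW_FILENAME: name precedes the lines
        }
        contents.lines().limit(only).forEach(out::add);    // ONLY=n: first n lines only
        return out;
    }

    public static void main(String[] args) {
        String text = "line 1\nline 2\nline 3\n";
        preview("/tmp/a.txt", text, 2, true).forEach(System.out::println);
    }
}
```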
CHMOD='HDFS-file'
changes the access permissions of the specified HDFS file or directory.
PERMISSION=value
specifies a value that represents three levels of permissions, which are
owner, group, and other users. All three permission levels are required. You can
specify the permissions in read, write, and execute (rwx) symbolic notation
or octal notation.
• For the rwx symbolic notation, use nine characters. The first set of three
characters represents what the owner can do, the second set represents
what the group can do, and the third set represents what other users can do. For
each set of three characters, the first position must be r or - (for read),
the second position must be w or - (for write), and the third position must
be x or - (for execute). For example, permission=rwxr-xr-x specifies
that the owner has Read, Write, and Execute permission, group members
have Read and Execute permission, and other users have Read and Execute
permission.
• For octal notation, use three digits. The digits represent, in order, the
permissions for owner, group, and other users. Each digit must be from 0 to 7
and is the sum of the permissions that it grants: 4 is r, 2 is w, 1 is x, and 0 is -.
For example, permission=755 specifies that the owner has Read, Write, and
Execute permission (4+2+1), group members have Read and Execute permission
(4+1), and other users have Read and Execute permission (4+1).
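The arithmetic that links the two notations can be checked with a short Java sketch. The class HdfsPermission and its methods are hypothetical helpers for illustration only; PERMISSION= accepts either form directly.

```java
// Hypothetical helper: converts between the octal and rwx forms of PERMISSION=.
final class HdfsPermission {
    private static final char[] FLAGS = {'r', 'w', 'x'};

    /** "755" -> "rwxr-xr-x": each digit is a sum of 4 (r), 2 (w), and 1 (x). */
    static String octalToSymbolic(String octal) {
        if (!octal.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("expected three octal digits: " + octal);
        }
        StringBuilder sb = new StringBuilder(9);
        for (char digit : octal.toCharArray()) {
            int bits = digit - '0';
            for (int i = 0; i < 3; i++) {
                int mask = 4 >> i;                   // 4, then 2, then 1
                sb.append((bits & mask) != 0 ? FLAGS[i] : '-');
            }
        }
        return sb.toString();
    }

    /** "rwxr-xr-x" -> "755": owner, group, and other users, in that order. */
    static String symbolicToOctal(String symbolic) {
        if (!symbolic.matches("([r-][w-][x-]){3}")) {
            throw new IllegalArgumentException("expected nine rwx characters: " + symbolic);
        }
        StringBuilder sb = new StringBuilder(3);
        for (int set = 0; set < 3; set++) {
            int value = 0;
            for (int i = 0; i < 3; i++) {
                if (symbolic.charAt(set * 3 + i) != '-') {
                    value += 4 >> i;
                }
            }
            sb.append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(octalToSymbolic("755"));        // rwxr-xr-x
        System.out.println(symbolicToOctal("rw-r-----"));  // 640
    }
}
```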
RECURSE
specifies to change the access permissions of all files and directories in the
specified pathname and all files and directories that are in subdirectories.
RECURSE has no effect if the specified HDFS file is not a directory. For
example, hdfs chmod='/tmp' permission=755 recurse; changes the
permissions of the specified directory and all files and subdirectories within
the directory.
COPYFROMLOCAL='local-file'
copies the specified local file to HDFS.
'local-file'
specifies the complete pathname and the filename. Beginning with SAS
9.4M3, you can use a wildcard character to substitute for any other character
or characters in the pathname or the filename. Use * to match any number of
characters, or ? to match a single character.
OUT='output-location'
specifies the output location for the copied file, which is a complete HDFS
pathname and the filename.
DELETESOURCE
deletes the input source file after a copy command.
OVERWRITE
specifies to overwrite an existing output location.
RECURSE
specifies to copy all the files in the specified pathname and all files that are
in subdirectories. RECURSE has no effect if the specified file is not a
directory.
COPYTOLOCAL='HDFS-file'
copies the specified HDFS file to a local output location.
'HDFS-file'
specifies the complete pathname and the filename. Beginning with SAS
9.4M3, you can use a wildcard character to substitute for any other character
or characters in the pathname or the filename. Use * to match any number of
characters, or ? to match a single character.
OUT='output-location'
specifies the output location for the copied file, which is an external file for
your machine.
DELETESOURCE
deletes the input source file after a copy command.
KEEPCRC
saves the Cyclic Redundancy Check (CRC) file after the copy command to a
local output location. The CRC file is saved to the same location that is
specified in the OUT= option. The CRC file is used to ensure the correctness
of the file being copied. By default, the CRC file is deleted.
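The idea behind the CRC check can be sketched with java.util.zip.CRC32 from the JDK: compute a checksum of the original bytes, recompute it for the copy, and compare. This is an illustration only; the class CrcCheck is hypothetical, and the sketch does not read or reproduce the format of the CRC file that Hadoop writes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration of CRC-based verification of a copied file's bytes.
final class CrcCheck {
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();                   // unsigned 32-bit value held in a long
    }

    /** True if the copied bytes produce the checksum recorded for the source. */
    static boolean verify(byte[] copy, long expectedCrc) {
        return crc32(copy) == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] source = "123456789".getBytes(StandardCharsets.US_ASCII);
        long expected = crc32(source);
        System.out.printf("crc=%08x%n", expected);               // crc=cbf43926
        byte[] corrupted = source.clone();
        corrupted[0] ^= 1;                                       // flip one bit
        System.out.println(verify(corrupted, expected));         // false
    }
}
```

A CRC-32 detects every single-bit change, which is why the flipped bit in the usage example is always reported as a mismatch.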
OVERWRITE
specifies to overwrite an existing output location.
RECURSE
specifies to copy all the files in the specified pathname and all files that are
in subdirectories. RECURSE has no effect if the specified HDFS file is not a
directory.
DELETE='HDFS-file' <NOWARN>
deletes the specified HDFS file.
HDFS-file
specifies a pathname or a pathname and a filename. If you include the
filename, then only that file is deleted. If you do not include a filename, then
all the files in the specified pathname and all the files that are in
subdirectories are deleted. Beginning with SAS 9.4M3, you can use a
wildcard character to substitute for any other character or characters in the
pathname or the filename. Use * to match any number of characters, or ? to
match a single character.
NOWARN
suppresses the warning message when there is an attempt to delete a file
that does not exist.
LS='HDFS-pathname' <OUT=output-location><RECURSE>
lists the files in the specified HDFS pathname. The output for each file consists
of its permissions, User ID, User ID group, file size, creation date, creation time,
and the filename.
HDFS-pathname
specifies a pathname. You can use a wildcard character to substitute for any
character or characters in the pathname. Use * to match any number of
characters, or ? to match a single character.
OUT=output-location
specifies the output location for the list of files, which can be an external file
for your machine or a fileref that is assigned with the FILENAME statement.
By default, the output location is the SAS log.
RECURSE
specifies to list the files in the specified pathname and all files that are in
subdirectories. RECURSE has no effect if the specified file is not a directory.
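The following Java sketch shows one way to interpret the * and ? wildcards by translating a pattern into a regular expression. It rests on two assumptions that the text above does not state: * matches any number of characters, including none, and neither wildcard matches the / path separator (as in shell-style globbing). The class HdfsGlob is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: converts an HDFS-style wildcard pattern to a regex.
// Assumptions: '*' matches zero or more characters, '?' matches exactly one,
// and neither wildcard matches '/'.
final class HdfsGlob {
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                flush(literal, regex);
                regex.append(c == '*' ? "[^/]*" : "[^/]");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        return Pattern.compile(regex.toString());
    }

    // Quotes pending literal text so characters such as '.' are matched literally.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    static boolean matches(String glob, String path) {
        return toRegex(glob).matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/tmp/*.txt", "/tmp/report.txt"));      // true
        System.out.println(matches("/tmp/*.txt", "/tmp/sub/report.txt"));  // false
    }
}
```

The CAT= description says that * matches one or more characters; under that stricter reading, replace [^/]* with [^/]+ in the sketch.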
MKDIR='HDFS-pathname'
creates the specified HDFS pathname. Specify the complete HDFS pathname.
RENAME='HDFS-file' OUT='output-location'
renames the specified HDFS file.
'HDFS-file'
specifies the pathname and the filename to rename.
OUT='output-location'
specifies the new HDFS pathname and filename.
MAPREDUCE Statement
Submits MapReduce programs into a Hadoop cluster.
Requirement: To submit MapReduce programs to a Hadoop server, the Hadoop configuration file
must include the properties to run MapReduce (MR1) or MapReduce 2 (MR2) and
YARN.
Interactions: To submit MapReduce programs using the Java API, the Hadoop distribution JAR
files must be copied to a physical location that is accessible to the SAS client
machine. The SAS environment variable SAS_HADOOP_JAR_PATH must be set to
the location of the Hadoop JAR files. For more information, see the SAS Hadoop
Configuration Guide for Base SAS and SAS/ACCESS.
Beginning with SAS 9.4M3, to submit MapReduce programs through the Apache
Oozie RESTful API, the SAS environment variable SAS_HADOOP_RESTFUL must
be set to 1. You must also set the SAS environment variable
SAS_HADOOP_CONFIG_PATH to the location where the hdfs-site.xml and
core-site.xml configuration files exist. The hdfs-site.xml file must include the property
for the WebHDFS location. You also need to specify Oozie-specific properties in a
configuration file and identify the configuration file with the PROC HADOOP
statement CFG= argument. The Oozie-specific properties include oozie_http_port,
fs.default.name, and mapred.job.tracker. For more information, see the SAS Hadoop
Configuration Guide for Base SAS and SAS/ACCESS.
Note: For more information about Hadoop configuration, see SAS Hadoop Configuration
Guide for Base SAS and SAS/ACCESS.
Example: “Example 3: Submitting a MapReduce Program” on page 1253
Syntax
MAPREDUCE <hadoop-server-options> <mapreduce-options>;
MapReduce Options
COMBINE='class-name'
specifies the name of the combiner class in dot notation.
DELETERESULTS
specifies to delete the output directory, if it exists, before starting the
MapReduce job.
GROUPCOMPARE='class-name'
specifies the name of the grouping comparator (GroupComparator) class in dot
notation.
INPUT='HDFS-pathname'
specifies the HDFS pathname to the MapReduce input file.
INPUTFORMAT='class-name'
specifies the name of the input format class in dot notation.
JAR='external-file(s)'
specifies the locations of the JAR files that contain the MapReduce program
and named classes. Include the complete pathname and the filename.
MAP='class-name'
specifies the name of the map class in dot notation. A map class contains
elements that are formed by the combination of a key value and a mapped
value.
OUTPUT='HDFS-pathname'
when connecting to the Hadoop server, specifies a new HDFS pathname for the
MapReduce output.
OUTPUTFORMAT='class-name'
specifies the name of the output format class in dot notation.
OUTPUTKEY='class-name'
specifies the name of the output key class in dot notation.
OUTPUTVALUE='class-name'
is the name of the output value class in dot notation.
PARTITIONER='class-name'
specifies the name of the partitioner class in dot notation. A partitioner class
controls the partitioning of the keys of the intermediate map outputs.
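To show what a partitioner decides, the following plain-Java sketch applies the formula that Hadoop's default HashPartitioner uses: the key's hash code, with the sign bit cleared, modulo the number of reduce tasks (the value that REDUCETASKS= sets). The class HashPartitionSketch is hypothetical and has no Hadoop dependency; a real partitioner class for the org.apache.hadoop.mapreduce API extends Partitioner and overrides getPartition.

```java
// Hypothetical, dependency-free sketch of hash partitioning: every record with
// the same key goes to the same reduce task.
final class HashPartitionSketch {
    static int partitionFor(Object key, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        // Clearing the sign bit keeps the result in 0..numReduceTasks-1
        // even when hashCode() is negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "apple"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, 4));
        }
    }
}
```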
REDUCE='class-name'
specifies the name of the reducer class in dot notation. The reduce class
reduces a set of intermediate values that share a key to a smaller set of values.
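The reduce step can be illustrated without Hadoop: the following Java sketch groups intermediate (key, value) pairs by key and collapses each group into a single sum, as a word-count reducer would. The class ReduceSketch is hypothetical; in Hadoop, the class that you name with REDUCE= receives the grouped values for each key from the framework rather than building the groups itself.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of reduce semantics: many values per key in,
// one value per key out. A TreeMap keeps keys sorted, loosely mirroring the
// sorted key order in which a reducer receives its input.
final class ReduceSketch {
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            reduced.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("sas", 1), Map.entry("hadoop", 1), Map.entry("sas", 1));
        System.out.println(sumByKey(mapOutput));   // {hadoop=1, sas=2}
    }
}
```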
REDUCETASKS=integer
specifies the number of reduce tasks.
REPLACE
when connecting to Hadoop through the Oozie RESTful API, specifies to delete
any existing workflow and JAR file(s) in the Oozie application before copying
new files to the working directory.
SORTCOMPARE='class-name'
specifies the name of the sort comparator class in dot notation.
WORKINGDIR='HDFS-pathname'
specifies the name of the HDFS working directory pathname.
PIG Statement
Submits Pig language code into a Hadoop cluster.
Interactions: To submit Pig language code using the Java API, the Hadoop distribution JAR files
must be copied to a physical location that is accessible to the SAS client machine.
The SAS environment variable SAS_HADOOP_JAR_PATH must be set to the
location of the Hadoop JAR files. For more information, see the SAS Hadoop
Configuration Guide for Base SAS and SAS/ACCESS.
Beginning with SAS 9.4M3, to submit Pig language code through the Apache Oozie
RESTful API, the SAS environment variable SAS_HADOOP_RESTFUL must be set
to 1. You must also set the SAS environment variable
SAS_HADOOP_CONFIG_PATH to the location where the hdfs-site.xml and
core-site.xml configuration files exist. The hdfs-site.xml file must include the property
for the WebHDFS location. You also need to specify Oozie-specific properties in a
configuration file and identify the configuration file with the PROC HADOOP
statement CFG= argument. The Oozie-specific properties include oozie_http_port,
fs.default.name, and mapred.job.tracker. For more information, see the SAS Hadoop
Configuration Guide for Base SAS and SAS/ACCESS.
Note: For more information about Hadoop configuration, see SAS Hadoop Configuration
Guide for Base SAS and SAS/ACCESS.
Example: “Example 4: Submitting Pig Language Code” on page 1255
Syntax
PIG <hadoop-server-options> <pig-code-options>;
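CODE=fileref | 'external-file'
specifies the source that contains the Pig language code to execute.
fileref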
is a SAS fileref that is assigned to the source file. To assign a fileref, use the
FILENAME statement.
'external-file'
is the physical location of the source file. Specify the complete pathname
and the filename.
DELETERESULTS
when connecting to the Hadoop server through the Oozie RESTful API, specifies
to delete the existing output location before starting the Oozie job.
OUTPUT='HDFS-pathname'
when connecting to the Hadoop server through the Oozie RESTful API, specifies
the existing output location to delete before starting the Oozie job.
PARAMETERS=fileref | 'external-file'
specifies the source that contains parameters to be passed as arguments when
the Pig code executes.
fileref
is a SAS fileref that is assigned to the source file. To assign a fileref, use the
FILENAME statement.
'external-file'
is the physical location of the source file. Specify the complete pathname
and the filename.
REGISTERJAR='external-file(s)'
specifies the locations of the JAR files that contain the Pig scripts to execute.
Specify the complete pathname and the filename.
REPLACE
when connecting to Hadoop through the Oozie RESTful API, specifies to delete
any existing workflow and JAR file(s) in the Oozie application before copying
new files to the working directory.
WORKINGDIR='HDFS-pathname'
when connecting to Hadoop through the Oozie RESTful API, specifies the HDFS
pathname for the Oozie workflow application directory.
PROPERTIES Statement
Submits configuration properties to the Hadoop server.
Alias: PROP
Example: “Example 5: Submitting Configuration Properties” on page 1256
Syntax
PROPERTIES 'configuration-property-1' <'configuration-property-2'> …;
Required Argument
configuration-property
specifies any property that can be specified in a Hadoop configuration file.
Usage: HADOOP Procedure
The PROC HADOOP HDFS statement submits HDFS commands to the Hadoop
server. HDFS commands are like the Hadoop shell commands that interact with
HDFS and manipulate files. For the list of HDFS commands, see “HDFS Statement”
on page 1238.
The PROC HADOOP PIG statement submits Pig language code into a Hadoop
cluster. For more information, see “PIG Statement” on page 1246.
Example 1: Submitting HDFS Commands
Details
This PROC HADOOP example submits HDFS commands to a Hadoop server. The
statements create a directory, delete a directory, and copy a file from HDFS to a
local output location.
Program
options set=SAS_HADOOP_CONFIG_PATH="\\sashq\root\u\abcdef\cdh45p1";
options set=SAS_HADOOP_JAR_PATH="\\sashq\root\u\abcdef\cdh45";
proc hadoop username='sasabc' password='sasabc' verbose;
hdfs mkdir='/user/sasabc/new_directory';
hdfs delete='/user/sasabc/temp2_directory';
hdfs copytolocal='/user/sasabc/testdata.txt'
out='C:\Users\sasabc\Hadoop\testdata.txt' overwrite;
run;
Program Description
Define the SAS_HADOOP_CONFIG_PATH environment variable and the
SAS_HADOOP_JAR_PATH environment variable. The OPTIONS statements
include the SET system option to define the environment variables. The
environment variables set the location of the Hadoop cluster configuration files
and the Hadoop JAR files so that the required files are available to the SAS session.
options set=SAS_HADOOP_CONFIG_PATH="\\sashq\root\u\abcdef\cdh45p1";
options set=SAS_HADOOP_JAR_PATH="\\sashq\root\u\abcdef\cdh45";
Execute the PROC HADOOP statement. The PROC HADOOP statement controls
access to the Hadoop server by identifying the user ID and password on the
Hadoop server. The statement specifies the VERBOSE option, which enables
additional messages to be written to the SAS log.
proc hadoop username='sasabc' password='sasabc' verbose;
Create an HDFS pathname. The first HDFS statement specifies the MKDIR= option
to create an HDFS pathname.
hdfs mkdir='/user/sasabc/new_directory';
Delete an HDFS directory. The second HDFS statement specifies the DELETE= option
to delete the HDFS directory temp2_directory.
hdfs delete='/user/sasabc/temp2_directory';
Copy an HDFS file. The third HDFS statement specifies the COPYTOLOCAL=
option to identify the HDFS file to copy, the OUT= option to specify the output
location on the local machine, and the OVERWRITE option to replace the output
file if it already exists.
hdfs copytolocal='/user/sasabc/testdata.txt'
out='C:\Users\sasabc\Hadoop\testdata.txt' overwrite;
run;
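After the COPYTOLOCAL= step finishes, you might want to confirm that the file arrived on the local machine. The following sketch is not part of the original example; it assumes that testdata.txt is a plain text file and simply echoes each of its records to the SAS log with a DATA _NULL_ step.

```sas
/* Sketch: echo the local copy of testdata.txt to the SAS log. */
/* Assumes the file written by COPYTOLOCAL= is plain text.     */
data _null_;
   infile 'C:\Users\sasabc\Hadoop\testdata.txt';
   input;          /* read one record into the _INFILE_ buffer */
   put _infile_;   /* write the raw record to the SAS log      */
run;
```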
Example 2: Submitting HDFS Commands with Wildcard Characters
OPTIONS statement
SET system option
Details
This PROC HADOOP example submits HDFS commands to a Hadoop server. The
statements display the contents of the specified files, change the permissions for
one HDFS directory, and list the files in a specified HDFS pathname.
Program
options set=SAS_HADOOP_CONFIG_PATH="\\sashq\root\u\abcdef\cdh45p1";
options set=SAS_HADOOP_JAR_PATH="\\sashq\root\u\abcdef\cdh45";
options set=SAS_HADOOP_RESTFUL 1;
proc hadoop username='sasabc' password='sasabc' verbose;
hdfs cat='/user/sasabc/*';
hdfs chmod='/user/sasabc/' permission=rwxr-xr-x;
hdfs ls='/user/sasabc/*';
run;
Program Description
Define the SAS_HADOOP_CONFIG_PATH environment variable, the
SAS_HADOOP_JAR_PATH environment variable, and the
SAS_HADOOP_RESTFUL environment variable. The OPTIONS statements include
the SET system option to define the environment variables. The first two
environment variables set the location of the Hadoop cluster configuration files
and the Hadoop JAR files so that the required files are available to the SAS session.
The SAS_HADOOP_RESTFUL environment variable specifies to connect to the
Hadoop server by using the WebHDFS REST API.
options set=SAS_HADOOP_CONFIG_PATH="\\sashq\root\u\abcdef\cdh45p1";
options set=SAS_HADOOP_JAR_PATH="\\sashq\root\u\abcdef\cdh45";
options set=SAS_HADOOP_RESTFUL 1;
Execute the PROC HADOOP statement. The PROC HADOOP statement controls
access to the Hadoop server by identifying the user ID and password on the
Hadoop server. The statement specifies the VERBOSE option, which enables
additional messages to be written to the SAS log.
proc hadoop username='sasabc' password='sasabc' verbose;
Display the contents of HDFS files. The first HDFS statement specifies the CAT=
option to display the contents of HDFS files. The wildcard character * matches zero
or more characters, so the contents of all files in the directory /user/sasabc/ are
displayed in the SAS log.
hdfs cat='/user/sasabc/*';
Change the file access permissions. The second HDFS statement specifies the
CHMOD= option to change the file access permissions for the specified HDFS
pathname. The file access permissions provide the owner with Read, Write, and
Execute permission, group members with Read and Execute permission, and all
other users with Read and Execute permission.
hdfs chmod='/user/sasabc/' permission=rwxr-xr-x;
List the files in an HDFS pathname. The third HDFS statement specifies the LS=
option to list the files in the specified HDFS pathname to the SAS log. The wildcard
character * matches zero or more characters, so all files in the directory
/user/sasabc/ are listed in the SAS log. The output for each file consists of its
permissions, user ID, user ID group, file size, creation date, creation time, and
filename.
hdfs ls='/user/sasabc/*';
run;
Example 3: Submitting a MapReduce Program
Details
This PROC HADOOP example submits a MapReduce program to a Hadoop server.
This code runs the MapReduce job by using the Java API. Running the job by using
the Apache Oozie RESTful API requires additional setup.
The example uses the Hadoop MapReduce application WordCount that reads a text
input file, breaks each line into words, counts the words, and then writes the word
counts to the output text file.
Program
options set=SAS_HADOOP_CONFIG_PATH="pathname";
options set=SAS_HADOOP_JAR_PATH="pathname";
Program Description
Define the SAS_HADOOP_CONFIG_PATH environment variable and the
SAS_HADOOP_JAR_PATH environment variable. The OPTIONS statements
include the SET system option to define the environment variables. The
environment variables set the location of the Hadoop cluster configuration files
and the Hadoop JAR files so that the required files are available to the SAS session.
options set=SAS_HADOOP_CONFIG_PATH="pathname";
options set=SAS_HADOOP_JAR_PATH="pathname";
Execute the PROC HADOOP statement. The PROC HADOOP statement controls
access to the Hadoop server by identifying the user ID and password on the
Hadoop server. The statement specifies the VERBOSE option, which enables
additional messages to be written to the SAS log.
proc hadoop username='sasabc' password='sasabc' verbose;
Create an HDFS pathname. OUTPUT= specifies the HDFS pathname for the program
output location, /user/sasabc/outputtest.
output='/user/sasabc/outputtest'
Specify the JAR file. JAR= specifies the location of the JAR file that contains the
MapReduce program named WordCount.jar.
jar='pathname/WordCount.jar'
Specify an output key class. OUTPUTKEY= specifies the name of the output key
class org.apache.hadoop.io.Text.
outputkey='org.apache.hadoop.io.Text'
Specify an output value class. OUTPUTVALUE= specifies the name of the output
value class org.apache.hadoop.io.IntWritable.
outputvalue='org.apache.hadoop.io.IntWritable'
Specify a reducer class. REDUCE= specifies the name of the reducer class
org.apache.hadoop.examples.WordCount$IntSumReducer.
reduce='org.apache.hadoop.examples.WordCount$IntSumReducer'
Specify a combiner class. COMBINE= specifies the name of the combiner class
org.apache.hadoop.examples.WordCount$IntSumReducer.
combine='org.apache.hadoop.examples.WordCount$IntSumReducer'
Specify a map class. MAP= specifies the name of the map class
org.apache.hadoop.examples.WordCount$TokenizerMapper.
map='org.apache.hadoop.examples.WordCount$TokenizerMapper';
run;
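In the Program Description, the MAPREDUCE statement options are shown one at a time. The following sketch assembles them into a complete step and then copies the job output back to the local machine. It is not the original program: the INPUT= file, the local OUT= path, and the output file name part-r-00000 (the name that Hadoop typically gives the first reducer's output) are assumptions, and pathname is a placeholder, as in the original example.

```sas
options set=SAS_HADOOP_CONFIG_PATH="pathname";
options set=SAS_HADOOP_JAR_PATH="pathname";

/* Run the WordCount job. The INPUT= file is hypothetical. */
proc hadoop username='sasabc' password='sasabc' verbose;
   mapreduce input='/user/sasabc/wordcount_input.txt'
      output='/user/sasabc/outputtest'
      jar='pathname/WordCount.jar'
      outputkey='org.apache.hadoop.io.Text'
      outputvalue='org.apache.hadoop.io.IntWritable'
      reduce='org.apache.hadoop.examples.WordCount$IntSumReducer'
      combine='org.apache.hadoop.examples.WordCount$IntSumReducer'
      map='org.apache.hadoop.examples.WordCount$TokenizerMapper';
run;

/* Copy the reducer output to the local machine.              */
/* part-r-00000 is an assumed file name; verify it with LS=.  */
proc hadoop username='sasabc' password='sasabc' verbose;
   hdfs copytolocal='/user/sasabc/outputtest/part-r-00000'
      out='pathname/wordcount_results.txt' overwrite;
run;
```

Running the copy in a separate PROC HADOOP step keeps it clearly after the MapReduce job has finished.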
Example 4: Submitting Pig Language Code
Details
This PROC HADOOP example submits Pig language code to a Hadoop cluster.
This code runs the Pig job by using the Java API. Running the job by using the
Apache Oozie RESTful API requires additional setup.
The Pig language code is stored in a text file named sample_pig.txt. The Pig code
references a data file named class.txt. The Pig code also calls a user-defined Java
function named some.custom.class.promotionDecision() to manipulate the data.
The user has created that function and made it available in a JAR file.
Program
options set=SAS_HADOOP_CONFIG_PATH="pathname";
options set=SAS_HADOOP_JAR_PATH="pathname";
Program Description
Define the SAS_HADOOP_CONFIG_PATH environment variable and the
SAS_HADOOP_JAR_PATH environment variable. The OPTIONS statements
include the SET system option to define the environment variables. The
environment variables set the location of the Hadoop cluster configuration files
and the Hadoop JAR files so that the required files are available to the SAS session.
options set=SAS_HADOOP_CONFIG_PATH="pathname";
options set=SAS_HADOOP_JAR_PATH="pathname";
Assign a file reference to the Pig language code. The FILENAME statement assigns
the file reference MYCODE to the physical location of the file sample_pig.txt, which
contains the Pig language code.
filename mycode 'pathname/sample_pig.txt';
Execute the PROC HADOOP statement. The PROC HADOOP statement controls
access to the Hadoop server by identifying the user ID and password on the
Hadoop server with the USERNAME= and PASSWORD= options. The statement
specifies the VERBOSE option, which enables additional messages to be written to
the SAS log.
proc hadoop username='sasabc' password='sasabc' verbose;
Execute the PIG statement. The CODE= option specifies the SAS fileref MYCODE,
which is assigned to the physical location of the file that contains the Pig language
code. The REGISTERJAR= option specifies the JAR file gradeAnalysis.jar, which the
user has created, to provide user-defined functions.
pig code=mycode registerjar='pathname/gradeAnalysis.jar';
run;
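The PIG statement also accepts the PARAMETERS= option, described earlier, for passing arguments to the Pig code. The following is a minimal sketch that is not part of the original example. It assumes a hypothetical parameter file named pig_params.txt whose contents follow whatever parameter format your Pig code expects.

```sas
/* Hypothetical file that holds the parameters for the Pig code. */
filename myparms 'pathname/pig_params.txt';
filename mycode  'pathname/sample_pig.txt';

proc hadoop username='sasabc' password='sasabc' verbose;
   pig code=mycode
       parameters=myparms
       registerjar='pathname/gradeAnalysis.jar';
run;
```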
Example 5: Submitting Configuration Properties
Details
This PROC HADOOP example submits a MapReduce program to a Hadoop server.
Rather than specifying a Hadoop configuration file in the PROC HADOOP
statement, the configuration properties are submitted in the PROPERTIES
statement.
Program
proc hadoop username='sasabc' password='sasabc' verbose;
prop 'mapred.job.tracker'='xxx.us.company.com:8021'
'fs.default.name'='hdfs://xxx.us.company.com:8020';
mapreduce jar="&mapreducejar."
input="&inputfile."
output="&outdatadir."
deleteresults;
run;
Program Description
Execute the PROC HADOOP statement. The PROC HADOOP statement controls
access to the Hadoop server by identifying the user ID and password on the
Hadoop server. The statement specifies the VERBOSE option, which enables
additional messages to be written to the SAS log.
proc hadoop username='sasabc' password='sasabc' verbose;
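The program in this example references three macro variables, MAPREDUCEJAR, INPUTFILE, and OUTDATADIR, which must be defined before the PROC HADOOP step runs. The following sketch uses hypothetical values in the style of the earlier MapReduce example. Substitute the locations on your own cluster.

```sas
/* Hypothetical values; substitute the locations on your cluster. */
%let mapreducejar = pathname/WordCount.jar;
%let inputfile    = /user/sasabc/wordcount_input.txt;
%let outdatadir   = /user/sasabc/outputtest;

/* Confirm the resolved values in the SAS log before running PROC HADOOP. */
%put NOTE: JAR=&mapreducejar INPUT=&inputfile OUTPUT=&outdatadir;
```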
34
HDMD Procedure
File Readers
You can use PROC HDMD to describe tabular HDFS files for these formats:
• fixed-record length (binary) data
• delimited text
• XML-encoded text
PROC HDMD can also associate a custom MapReduce reader (a Java-based reader)
with a file. The custom reader can produce delimited text or fixed-length binary
records that can be described by using the PROC HDMD syntax. You can use
custom MapReduce readers only with SAS high-performance procedures for
Hadoop. SAS/ACCESS Interface to Hadoop does not currently use custom readers.
/corp/tables/purchases/franchise_201/income_2012-01-03.dat
/corp/tables/purchases/franchise_201/income_2012-01-04.dat
In this example, PROC HDMD creates a SASHDMD descriptor for this income table
data. The income table has four comma-separated columns: a product code, the
quantity purchased, the price, and the total purchase amount including tax.
libname hdplib hadoop server=mysrv1
user=myusr1 pass=mypwd1
hdfs_metadir="/corp/metadata"
hdfs_datadir="/corp/tables/purchases/franchise_201";
Overview
Apache Hive 3.0 provides better support for transactional data. Changes to
improve transaction support for Hive managed tables also mean that PROC HDMD
users must perform extra steps to create and use metadata descriptions (.HDMD
files) for existing Hive tables.
The difference between the two table types, managed and external, is what Hive
deletes when you drop the table:
• managed: the table metadata from the Hive metastore and the physical data
from the Hadoop file system
• external: only the table metadata; the table data in the Hadoop file system is
retained
Therefore, in Apache Hive 3.0, by default a new table is both managed and
transactional.
This example creates CLASS as a managed, transactional table.
0: jdbc:hive2:> create table class (name varchar(8), sex varchar(1),
age double, weight double);
INFO : OK
No rows affected (0.045 seconds)
0: jdbc:hive2:> describe formatted class;
INFO : OK
+----------------------+------------------------------------------------------------+--------------------+
| col_name | data_type | comment |
+----------------------+------------------------------------------------------------+--------------------+
| # col_name | data_type | |
| name | varchar(8) | |
| sex | varchar(1) | |
| age | double | |
| height | double | |
| weight | double | |
| | NULL | NULL |
| # Detailed Table | | |
| # Information | NULL | NULL |
| Database: | default | NULL |
| OwnerType: | USER | NULL |
| Owner: | hadoop | NULL |
| CreateTime: | Fri Sep 28 09:30:10 EDT 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://server16.corp.com:8020/warehouse | |
| | /tablespace/managed/hive/class | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | SAS OS Name | Linux |
| | SAS Version | 9.04.01M6D09272018 |
| | bucketing_version | 2 |
| | numFiles | 1 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | totalSize | 496 |
| | transactional | true |
| | transactional_properties | insert_only |
| | transient_lastDdlTime | 1538141416 |
| | NULL | NULL |
| # Storage | | |
| # Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | \u0001 |
| | line.delim | \n |
| | serialization.format | \u0001 |
+----------------------+------------------------------------------------------------+--------------------+
40 rows selected (0.163 seconds)
To resolve this issue, you must create the table as non-transactional. Because
Hive 3.0 requires managed tables to be transactional, the table must also be
created as external.
Here is an example using the Beeline client. Note that DESCRIBE FORMATTED
does not show the transactional property in the table parameters.
0: jdbc:hive2:> create external table class
(name varchar(8), sex varchar(1), age double, weight double)
location '/tmp' tblproperties ("transactional"="false");
INFO : OK
No rows affected (0.267 seconds)
0: jdbc:hive2://server18.corp.com:2181,srv> describe formatted class;
INFO : OK
+----------------------+------------------------------------------------------------+--------------------+
| col_name | data_type | comment |
+----------------------+------------------------------------------------------------+--------------------+
| # col_name | data_type | |
| name | varchar(8) | |
| sex | varchar(1) | |
| age | double | |
| weight | double | |
| | NULL | NULL |
| # Detailed Table | | |
| # Information | NULL | NULL |
| Database: | default | NULL |
| OwnerType: | USER | NULL |
| Owner: | hadoop | NULL |
| CreateTime: | Tue Oct 02 13:10:11 EDT 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://server16.corp.com:8020/tmp | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | bucketing_version | 2 |
| | numFiles | 0 |
| | totalSize | 0 |
| | transient_lastDdlTime | 1538500211 |
| | NULL | NULL |
| # Storage | | |
| # Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+----------------------+------------------------------------------------------------+--------------------+
32 rows selected (0.055 seconds)
proc sql;
connect using x;
execute (create external table if not exists default.class
(name varchar(8), sex varchar(1), age double, weight double)
location '/tmp/class' tblproperties ("transactional"="false")
) by x;
quit;
Summary
After you use PROC HDMD to create a metadata description of a Hive table, you
can use PROC PRINT and other SAS tools to reference the table. The examples
show how to specify the EXTERNAL and LOCATION keywords, along with the table
property "transactional"="false", to create the source Hive table.
COLUMN: specifies the columns of HDFS files and tables with which to work (Ex. 1,
Ex. 2, Ex. 3, Ex. 4, Ex. 5, Ex. 6, Ex. 7, Ex. 8, Ex. 9)
Syntax
PROC HDMD
<BYTE_ORDER=LITTLEENDIAN | BIGENDIAN>
<DATA_FILE='input-filename'>
<ENCODING=encoding>
<FILE_FORMAT=file-format>
<FILE_TYPE='custom-input-file-type'>
<FROM=Hive-table>
<HEADER_LINES=n>
<INPUT_CLASS='java.class'>
<MANAGED>
<NAME=libref.filename <hadoop | spark>>
<RECORD_LENGTH=record-length>
<ROW_TAG='row-tag'>
<SEP='character-separator'>
<TEXT_QUALIFIER='character-qualifier'>;
COLUMN column-specifications;
run;
BYTE_ORDER=LITTLEENDIAN | BIGENDIAN
Type optional
Default client
Applies to BINARY
DATA_FILE='input-filename'
specifies the path to the input data file relative to the HDFS_DATADIR= option
in the Hadoop engine LIBNAME statement.
Type required
Default none
ENCODING=encoding
specifies the encoding for text for the input data file or folder.
Type optional
Applies to BINARY
FILE_FORMAT=file-format
specifies the format of the input data that is passed to the SAS Embedded
Process for Hadoop.
BINARY
specifies a file with fixed length records where numeric data is stored in
machine-specific binary form.
DELIMITED
specifies a file that contains only text-based data where fields are separated
by a specific delimiter. Delimited records can vary in length.
XML
specifies a text-based data file in XML format.
Type required
Alias FILE_FMT=
Default none
FILE_TYPE='custom-input-file-type'
specifies the file type that is used in the MapReduce framework to load the data
into the SAS Embedded Process. The file type maps to the name of a SAS-provided
MapReduce input format class. The input format class that SAS provides for a
particular file type creates specific input readers. For example, the DELIMITED
file type is mapped to the MapReduce input format class
com.sas.access.hadoop.ep.delimited.DelimitedInputFormat.
Type optional
Default none
FROM=Hive-table
specifies the name of a Hive table that you want to use for in-database scoring.
SAS creates the metadata file Hive-table.sashdmd in the target’s metadata
directory.
HEADER_LINES=n
specifies the number of lines that are skipped when parsing delimited files.
Type optional
Default none
Applies to DELIMITED
INPUT_CLASS='java.class'
specifies the fully qualified class name that implements the Java custom
MapReduce reader to use.
Type optional
Default none
Requirement The class must be in the class path of the Hadoop server.
MANAGED
specifies that the file is deleted when its metadata is deleted (for example, by
using PROC DELETE).
Type optional
Default By default, data files are not managed; that is, they are not deleted
when the metadata is deleted.
NAME=libref.filename <hadoop | spark>
Type required
Default none
Requirement The libref must be a valid Hadoop or Spark engine libref for which
HDFS_METADIR= and HDFS_DATADIR= options have been
specified.
RECORD_LENGTH=record-length
specifies the record length of the BINARY file.
Type required
Default none
Applies to BINARY
ROW_TAG='row-tag'
specifies the XML tag that identifies records in the input XML.
Type required
Default none
Applies to XML
SEP='character-separator'
specifies the character to separate the columns for the records in the delimited
input file. Here is how you can specify values.
• SEP=^A
• SEP=','
• SEP=TAB
• SEP=^Z
• SEP='09'x
• SEP=32
Type required
Range You can specify only a single character in the Unicode range
U+0001 to U+007F.
Applies to DELIMITED
Restriction The value of this option cannot be the same character as for
TEXT_QUALIFIER= and cannot be a newline ('0a'x).
TEXT_QUALIFIER='character-qualifier'
specifies the text qualifier for the input data file or folder.
Type optional
Default none
Range You can specify only a single character in the Unicode range
U+0001 to U+007F.
Applies to DELIMITED
Restriction The value of this option cannot be the same character as for SEP=
and cannot be a newline ('0a'x).
Requirement You must specify either a double quotation mark enclosed in single
quotation marks ('"') or a single quotation mark enclosed in double
quotation marks ("'").
COLUMN Statement
Provides specifications for one or more columns.
Syntax
COLUMN name data-type <column-options>;
Column Specifications
column-options
specifies one or more column options.
BYTES=byte-length
for BINARY files, specifies the number of bytes that the data occupies in the
record.
Type required
Default none
Applies to BINARY
CTYPE=ctype
for BINARY files, specifies the actual binary type of the data that is stored in
the record. Here are the valid binary data types:
• CHAR
• DOUBLE
• FLOAT
• INT8
• INT16
• INT32
• INT64
• UINT8
• UINT16
• UINT32
• UINT64
Type optional
Default none
Applies to BINARY
ENCODING=encoding
for BINARY files, specifies the encoding for the character data if it differs
from the overall file encoding.
Type optional
Default none
Applies to BINARY
FORMAT=format-specification
specifies the format that is associated with the column.
Type optional
Default none
INFORMAT=informat-specification
specifies the informat to use to read the input data.
Type optional
Default none
OFFSET=bytes
specifies the offset of the column data in the record.
Type required
Default none
Applies to BINARY
TAG='tag'
specifies the XML element that encloses the column data.
Type required
Default none
Applies to XML
data-type
specifies a valid data type:
• BIGINT
• CHAR(n)
• DATE
• DOUBLE
• INT
• REAL
• SMALLINT
• TIME[(prec)]
• TIMESTAMP[(prec)]
• TINYINT
• VARCHAR(n)
Type required
Default none
name
specifies a name for the column.
Default none
Example 1: Create Hadoop Metadata from a Delimited File
You can use the HDMD procedure to create metadata for a Hadoop file or directory
of files. This example starts with a comma-delimited file with three columns.
Name,Age,Weight
John,32,180
Jane,27,112
Tim,54,210
Here is how you can create the metadata by assigning a data type to each column
that is retrieved.
libname hdplib hadoop server=mysrv1_cluster1
user=myusr1 pass=mypwd1
/* connection options */
config='/user/configs/hadoop_cluster1.xml'
hdfs_tempdir='/corp/tempdir'
hdfs_metadir='/corp/metadata'
hdfs_datadir='/corp/tables/purchases';
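The PROC HDMD step that follows this LIBNAME statement is not shown here. The following sketch shows what it could look like, using only options that are documented in this chapter. It assumes the file is named people.csv (a hypothetical name) and is stored in the HDFS_DATADIR= location. It also assumes, as the COLUMN examples in this chapter suggest, that each COLUMN statement gives the column name followed by its data type. HEADER_LINES=1 skips the Name,Age,Weight header row.

```sas
/* Sketch: describe the comma-delimited file to SAS.    */
/* people.csv and hdplib.people are hypothetical names. */
proc hdmd name=hdplib.people
   file_format=delimited
   sep=','
   header_lines=1
   data_file='people.csv';
   column name   varchar(20);
   column age    int;
   column weight int;
run;

/* Read the file through its new metadata. */
proc print data=hdplib.people;
run;
```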
This example uses the Hadoop LIBNAME statement to define metadata for a
delimited file. The file contains this data:
12.34 23f45 "This shows quotes" 4.5 2013-05-05 unquoted 11:12:13 09:05:12.2345 "2013-03-04 12:13:12.12345678" 3 10240 1298378438743
A blank character (the column separator) and a double quotation mark (the text
qualifier) are used to parse the file.
libname hdplib hadoop server=mysrv1_cluster1
user=myusr1 pass=mypwd1
/* connection options */
config='/user/configs/hadoop_cluster1.xml'
hdfs_tempdir='/corp/tempdir'
hdfs_metadir='/corp/metadata'
hdfs_datadir='/corp/tables/purchases';
column id double;
column fullname char(32);
run;
Obs id fullname
1 1 Doe, John
2 2 Smith, Sally
In this example, two columns are extracted from an XML file that contains these
XML tags:
<row><Size>42.5</Size><Name>Julius Caesar</Name></row>
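The following is a sketch of a PROC HDMD step for records like the one above. It uses the documented FILE_FORMAT=XML, ROW_TAG=, and TAG= options. It assumes that each COLUMN statement lists the column name, then its data type, then column options such as TAG=. The file name items.xml and the table name hdplib.items are hypothetical.

```sas
/* Sketch: describe <row> records that contain <Size> and <Name>. */
/* items.xml and hdplib.items are hypothetical names.             */
proc hdmd name=hdplib.items
   file_format=xml
   row_tag='row'
   data_file='items.xml';
   column size double      tag='Size';
   column name varchar(32) tag='Name';
run;
```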
column I double;
column J double;
column W double;
run;
'com.abc.hadoop.ep.inputformat.sequence.PeopleCustomSequenceInputFormat'
data_file='people.seq';
35
HTTP Procedure
For web servers that support it, PROC HTTP uses connection caching and cookie
caching by default. You can toggle the behavior of both types of caching and clear
the caches within the procedure by specifying procedure arguments. Or you can
turn cookie caching off by using a macro variable.
Beginning with the November 2019 release of SAS 9.4M6 and SAS Viya 3.5, you can
easily post form data as well as perform HTTP multipart requests. A new QUERY=
procedure option simplifies the process of submitting query parameters for the
URL= argument. For more information, see “Using the FORM and QUERY Options
with PROC HTTP” on page 1298 and “Using the MULTI Option” on page 1299.
The procedure includes a DEBUG statement, response status macro variables, and
the ability to specify a time-out period for requests. Beginning with SAS 9.4M6 and
SAS Viya 3.5, a new statement, SSLPARMS, enables you to override global SAS
system options for encryption with local options for a specific PROC HTTP request.
re-enabled. For more information, see “SAS Processing Restrictions for Servers in a
Locked-Down State” in SAS Programmer’s Guide: Essentials.
Tip: PROC HTTP sets up macro variables with certain values after it executes each
statement. These macro variables can be used inside a macro to test for HTTP
errors. For more information, see “PROC HTTP Response Status Macro Variables”
on page 1301.
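As the Tip describes, the response status macro variables can drive error handling inside a macro. The following sketch assumes the variables SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE, which are described in the referenced section, and uses httpbin.org as a stand-in endpoint. The macro name HTTP_GET is hypothetical.

```sas
/* Sketch: report an error when a GET request does not return 200. */
%macro http_get(url, outref);
   proc http url="&url" method="GET" out=&outref;
   run;

   %if not %symexist(SYS_PROCHTTP_STATUS_CODE) %then
      %put ERROR: No HTTP status was returned for &url..;
   %else %if &SYS_PROCHTTP_STATUS_CODE ne 200 %then
      %put ERROR: &url returned &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;
%mend http_get;

filename resp temp;
%http_get(https://httpbin.org/get, resp)
```

The %SYMEXIST check guards against a request that fails before any status is set. It cannot detect a stale value that an earlier PROC HTTP step left behind.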
“Example 13: Use the DEBUG Statement with the LEVEL= Option” on page 1316
“Example 14: Specify Local Options for Two-Way Encryption in Windows” on page 1319
Syntax
PROC HTTP URL="URL-to-target</redirect/n>"
<METHOD=<">http-method<">>
<authentication-type-options>
<caching-options>
<header-options>
<proxy-server-connection-options>
<web-server-authentication-options>
<EXPECT_100_CONTINUE>
<FOLLOWLOC | NOFOLLOWLOC>
<HTTP_TOKENAUTH>
<IN=<fileref | FORM (arguments) | MULTI <options> (parts) | "string">>
<MAXREDIRECTS=n>
<OUT=fileref>
<QUERY=("parm1"="value1" "parm2"="value2" …)>
;
QUERY=(<NOENCODE>"parm1"="value1" "parm2"="value2")
provides an alternate method for submitting query parameters for the
URL= argument.
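The following is a minimal sketch of QUERY=. It assumes SAS 9.4M6 (November 2019 release) or later and uses the public httpbin.org echo service as a stand-in endpoint. Because NOENCODE is not specified, PROC HTTP URL-encodes the values before it appends the pairs to the URL as a query string.

```sas
filename resp temp;

/* Sketch: send two query parameters without building the URL by hand. */
proc http url="https://httpbin.org/get"
   query=("name"="SAS User" "topic"="PROC HTTP")
   out=resp;
run;
```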
TIMEOUT=integer
specifies the number of seconds of inactivity to wait before canceling an
HTTP request.
AUTH_NONE
specifies not to use basic, NTLM, or Negotiate authentication, even when
authentication with one of these methods is possible.
AUTH_NTLM
specifies to use NTLM authentication to authenticate to the connected
server.
OAUTH_BEARER=token
sends an OAuth access token along with the HTTP call.
PROXY_AUTH_BASIC
specifies to perform user identity authentication through a proxy server.
PROXY_AUTH_NEGOTIATE
specifies to perform NTLM, Kerberos, or some other type of HTTP
authentication through a proxy server.
PROXY_AUTH_NTLM
specifies to perform NTLM authentication through a proxy server.
Required Argument
URL="URL-to-target"
specifies a fully qualified URL path that identifies the endpoint for the HTTP
request.
Note The URL that is passed to PROC HTTP is assumed to be URL encoded. To
ensure correct encoding, use an appropriate connection class for the
target web server. For example, use the AWSV4Signer class for Amazon
Web Services. Or, encode reserved characters as described in RFC3986.
Tip Beginning with SAS 9.4M3, you do not have to specify the protocol. If you
specify only the host (for example, "httpbin.org"), the URL that is used is
http://httpbin.org.
Optional Arguments
AUTH_ANY
When a user name and password are supplied, they are used to authenticate the
connected server. Otherwise, any other form of authentication that is available
Tip Since there is a chance of more than one trip to the HTTP server,
specify EXPECT_100_CONTINUE to prevent data from being uploaded
multiple times.
AUTH_BASIC
specifies to use user identity authentication to authenticate the connected
server. The user name and password are supplied with the WEBUSERNAME and
WEBPASSWORD arguments.
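The following sketch shows basic authentication against httpbin.org. That service's /basic-auth/user/passwd path accepts the user name "user" and the password "passwd", so the credentials here are that service's test values, not real ones. In production code, avoid hardcoding passwords in the program.

```sas
filename resp temp;

/* Sketch: basic authentication with WEBUSERNAME= and WEBPASSWORD=. */
proc http url="https://httpbin.org/basic-auth/user/passwd"
   auth_basic
   webusername="user"
   webpassword="passwd"
   out=resp;
run;
```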
AUTH_NTLM
specifies to use NTLM authentication to authenticate to the connected server.
As long as your current user identity has permissions, authentication is
established.
AUTH_NEGOTIATE
specifies to use NTLM, Kerberos, or some other type of HTTP authentication to
authenticate to the connected server. As long as your current user identity has
permissions, authentication is established.
AUTH_NONE
specifies not to use basic, NTLM, or Negotiate authentication, even when
authentication with one of these methods is possible. The OAUTH_BEARER=
procedure option can be used with AUTH_NONE.
CLEAR_CACHE
specifies to clear both the shared connection and cookie caches before the
HTTP request is executed.
CLEAR_CONN_CACHE
specifies to clear the shared connection cache before the HTTP request is
executed.
CLEAR_COOKIES
specifies to clear the shared cookie cache before the HTTP request is executed.
CT="content-type"
used in conjunction with the HEADERIN= argument, specifies the HTTP
content-type to be set in the request headers. The content-type describes the
data contained in the body fully enough that the receiving user agent can
present the data to the user.
CT="Text/plain; charset=us-ascii"
CT="Application/x-www-form-urlencoded"
Note Beginning with SAS 9.4M3, this option is supported for compatibility with
previous versions of SAS software. Use the “HEADERS Statement” on
page 1294 instead of CT= .
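As the Note recommends, the content type can be set with the HEADERS statement instead of CT=. The following is a minimal sketch that posts a small JSON string to httpbin.org, which serves here as a stand-in endpoint. The JSON body is illustrative only.

```sas
filename resp temp;

/* Sketch: set Content-Type with the HEADERS statement instead of CT=. */
proc http url="https://httpbin.org/post"
   method="POST"
   in='{"greeting": "hello"}'
   out=resp;
   headers "Content-Type"="application/json";
run;
```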
EXPECT_100_CONTINUE
enables a client that is sending a request message with a request body to
determine whether the target server is willing to accept the request, based on
the request headers. Use EXPECT_100_CONTINUE when you are sending large
amounts of data and want to make sure that no unnecessary transfers of the
data occur. For more information, see http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.3.
Valid in HTTP requests that specify the IN= argument, most commonly with
PUT.
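The following sketch shows EXPECT_100_CONTINUE with a PUT request that uploads a local file. Both the local file and the upload URL are hypothetical placeholders, and the target server must support the 100-continue handshake.

```sas
/* Hypothetical local file and upload endpoint. */
filename upload 'pathname/large_file.csv';
filename resp temp;

proc http url="https://example.com/upload/large_file.csv"
   method="PUT"
   in=upload
   expect_100_continue
   out=resp;
run;
```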
FOLLOWLOC
enables write methods to automatically follow URL redirections. By default,
PROC HTTP methods that write data, like POST and PUT, terminate processing
when they are redirected to an alternate location. When FOLLOWLOC is
specified, PROC HTTP initially returns a 300-level response, then submits the
POST or PUT again to the redirected location.
HEADERIN=fileref-to-request-header-file
specifies a fileref to a text file that contains one line per request header in the
format key:value.
z/OS Specifics: In the z/OS operating environment, HEADERIN= files must be
created with a variable record length.
Note Beginning with SAS 9.4M3, this option is supported for compatibility
with previous versions of SAS software. Use the “HEADERS
Statement” on page 1294 instead of HEADERIN=.
CAUTION Do not specify both the HEADERS statement and the HEADERIN=
argument. The behavior that results from specifying both options together
is not defined.
HEADEROUT=fileref-to-response-header-file
specifies a fileref to a text file to which the response headers are written in the
format key:value.
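The following sketch captures the response headers in a temporary file and echoes them to the SAS log. It uses httpbin.org as a stand-in endpoint, and the file may also contain the HTTP status line as well as key:value header lines.

```sas
filename resp temp;
filename hdrs temp;

proc http url="https://httpbin.org/get" out=resp headerout=hdrs;
run;

/* Echo each line of the response header file to the SAS log. */
data _null_;
   infile hdrs;
   input;
   put _infile_;
run;
```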
HEADEROUT_OVERWRITE
used in conjunction with the HEADEROUT= argument, causes the response
header to record only the last header block sent by the web server when a
redirect occurs.
HTTP_TOKENAUTH
generates a one-time password from the metadata server that can be used to
access the SAS Content Server.
The data is input as name=value pairs. The FORM option URL-encodes and
delimits the pairs with an & (ampersand) as is standard for form
submissions. If a name=value pair should not be URL-encoded, you can
specify a NOENCODE flag before the name=value pair.
Note The FORM option is available beginning with the November 2019
release of SAS 9.4M6 and SAS Viya 3.5.
See “Using the FORM and QUERY Options with PROC HTTP” on page 1298
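The following is a minimal sketch of IN=FORM. It assumes SAS 9.4M6 (November 2019 release) or later and uses httpbin.org as a stand-in endpoint. Because IN= is specified and METHOD= is omitted, the request defaults to POST, and each name=value pair is URL-encoded and joined with an ampersand.

```sas
filename resp temp;

/* Sketch: POST two URL-encoded form fields. */
proc http url="https://httpbin.org/post"
   in=form ("name"="SAS User" "topic"="PROC HTTP")
   out=resp;
run;
```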
You can specify form data or generic data as input. These options do not
include support for nested multipart uploads. See
https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html for more information about the
multipart content-type.
When the FORM option is used, the upload is performed using the
specialized multipart type known as multipart/form-data. More information
can be found at https://tools.ietf.org/html/rfc7578#section-4.
Note The MULTI option is available beginning with the November 2019
release of SAS 9.4M6 and SAS Viya 3.5.
"string"
specifies input data in a quoted string.
Requirement The IN= option is required when the POST and PUT methods are
used.
MAXREDIRECTS=n
specifies the maximum number of redirects that are allowed. PROC HTTP
automatically follows redirects coming from a 300-level response code, as long
as the NOFOLLOWLOC option is not specified. This option enables you to limit
the number of redirects that are performed.
Default 5
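The following is a minimal sketch that assumes httpbin.org is reachable. Its /redirect/2 endpoint, also used in Example 7, redirects twice before it returns a response, so a limit of 3 should let the request complete. To see how your release reports an exceeded limit, set MAXREDIRECTS= below the number of hops.

```sas
filename resp TEMP;

/* /redirect/2 answers with two redirects before the final response; */
/* MAXREDIRECTS=3 leaves room for both hops.                         */
proc http
   url="http://httpbin.org/redirect/2"
   maxredirects=3
   out=resp;
run;
```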
METHOD=<">http-method<">
specifies an HTTP method. Standard methods include HEAD, TRACE, GET,
POST, PUT, and DELETE. Beginning with SAS 9.4M3, the method is open-ended.
Any method that conforms to the HTTP/1.1 standard and is recognizable by the
target web server is acceptable. For information, see the HTTP/1.1 specification
at www.w3.org.
Default Beginning with SAS 9.4M3, if you omit the METHOD argument and
do not specify the IN argument, the default method is GET. If you
omit METHOD but specify the IN argument, the default method is POST.
In releases prior to SAS 9.4M3, the default method is always POST.
Restriction Software releases prior to SAS 9.4M3 support only the standard
methods.
NO_CONN_CACHE
disables connection caching for this HTTP request. The connection will be made
with the specified connection parameters.
NO_COOKIES
specifies that cached cookies will not be used for this HTTP request. This
option does not prevent cookies from being sent manually with the "Cookie"
header.
NOFOLLOWLOC
prevents the GET method from following URL redirections.
NOFOLLOWLOC is the default behavior for HTTP methods that write data.
OAUTH_BEARER=token
sends an OAuth access token along with the HTTP call. Valid token values are a
string, a fileref, or the constant SAS_SERVICES. String values must be quoted.
For all token types, the argument sends an authorization header in the form:
Authorization: Bearer Value.
Note This option is supported beginning with SAS 9.4M5 and SAS Viya 3.3.
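The following sketch sends a placeholder string token to httpbin.org/bearer. This sketch assumes that this third-party endpoint returns 200 only when an "Authorization: Bearer" header is present. The token value is hypothetical. Replace it with a real token, or with a fileref or SAS_SERVICES as described above.

```sas
filename resp TEMP;

/* Sends the header  Authorization: Bearer my-access-token        */
/* "my-access-token" is a placeholder, not a real credential.      */
proc http
   url="http://httpbin.org/bearer"
   oauth_bearer="my-access-token"
   out=resp;
run;
```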
OUT=fileref-to-response-data
specifies a fileref that indicates where output is written.
PROXY_AUTH_BASIC
specifies to perform user identity authentication through a proxy server. The
user name and password are supplied with the PROXYUSERNAME and
PROXYPASSWORD arguments.
PROXY_AUTH_NTLM
specifies to perform NTLM authentication through a proxy server. As long as
your current user identity has permissions, authentication is established.
PROXY_AUTH_NEGOTIATE
specifies to perform NTLM, Kerberos, or some other type of HTTP
authentication through a proxy server. As long as your current user identity has
permissions, authentication is established.
PROXYHOST="proxy-host-name"
specifies the Internet host name of an HTTP proxy server. Beginning with SAS
9.4M3, you can specify both the host name and the port number in the
PROXYHOST argument in the form:
host-name:port-number
Earlier SAS versions require you to specify both the PROXYHOST and
PROXYPORT arguments. For the earlier releases, specify PROXYHOST= as:
host-name
PROXYPASSWORD="proxy-passwd"
specifies an HTTP proxy server password.
Tip The password is required only if your proxy server requires credentials.
PROXYPORT=proxy-port-number
specifies an HTTP proxy server port. Beginning with SAS 9.4M3, PROXYPORT is
an optional argument. You are not required to specify PROXYPORT if you
specified both the HTTP proxy server host name and port number in the
PROXYHOST argument.
Note Earlier SAS releases require that the HTTP proxy server host name and
port number are specified separately in the PROXYHOST and
PROXYPORT arguments. See “Example 3: Specify a Proxy In the HTTP
Request” on page 1303.
PROXYUSERNAME="proxy-user-name"
specifies an HTTP proxy server user name.
Tip The user name is required only if your proxy server requires credentials.
QUERY=(<NOENCODE>"parm1"="value1" "parm2"="value2")
provides an alternate method for submitting query parameters for the URL=
argument.
Typically, query strings are placed on the URL in the URL= option. This can be
cumbersome when a value must be URL-encoded and arguments need to be
separated with an & (ampersand). You must encode the values in advance, and SAS
interprets text that begins with an & as a macro variable reference. The QUERY= option enables you
to specify query arguments as name=value pairs in a list, which is automatically
URL-encoded and added to the query string on the URL.
When QUERY= is specified and the input URL already contains a query string, the
generated query parameters are appended with an & instead of a ?.
Note The QUERY= option is available beginning with the November 2019
release of SAS 9.4M6 and SAS Viya 3.5.
See “Using the FORM and QUERY Options with PROC HTTP” on page 1298
TIMEOUT=integer
specifies the number of seconds of inactivity to wait before canceling an HTTP
request. Use this option to prevent hangs if there is a chance that the server will
not respond. The default value, 0 (zero), means no time-out period.
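The following is a minimal sketch that assumes httpbin.org is reachable. Its /delay/2 endpoint waits about two seconds before it responds, so a TIMEOUT= value of 10 should not cancel the request. A value of 1 would be expected to cancel it.

```sas
filename resp TEMP;

/* Give up if the server is silent for more than 10 seconds */
proc http
   url="http://httpbin.org/delay/2"
   timeout=10
   out=resp;
run;
```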
WEBAUTHDOMAIN="web-credentials-from-metadata"
specifies the web authentication domain. If specified, a user name and password
are retrieved from metadata for the specified authentication domain.
WEBPASSWORD="basic-authentication-password"
specifies a password for basic authentication.
Alias PASSWORD
WEBUSERNAME="basic-authentication-name"
specifies a user name for basic authentication.
Alias USERNAME
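The following sketch combines WEBUSERNAME= and WEBPASSWORD= with the AUTH_BASIC option. It assumes that httpbin.org is reachable. That service's /basic-auth/user/password endpoint accepts only the credentials that appear in its URL, so the user name sasuser and the password secret are placeholders chosen for this illustration.

```sas
filename resp TEMP;

/* Demo credentials only; httpbin accepts the pair named in the URL */
proc http
   url="http://httpbin.org/basic-auth/sasuser/secret"
   webusername="sasuser"
   webpassword="secret"
   auth_basic
   out=resp;
run;
```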
DEBUG Statement
Writes debugging information to the SAS log.
Syntax
DEBUG options;
Optional Arguments
LEVEL= 0 | 1 | 2 | 3
0 no debugging. This is the same as specifying PROC HTTP without the
DEBUG statement.
1 displays request and response headers in the log. Setting a debug level
of 1 equates to setting the REQUEST_HEADERS and
RESPONSE_HEADERS options in the DEBUG statement.
2 displays request data as well as level 1 messages in the log. Setting a
debug level of 2 equates to setting the REQUEST_HEADERS,
RESPONSE_HEADERS, and REQUEST_BODY options in the DEBUG
statement.
3 displays response data as well as level 2 messages in the log. Setting a
debug level of 3 equates to setting the REQUEST_HEADERS,
RESPONSE_HEADERS, REQUEST_BODY, and RESPONSE_BODY
options in the DEBUG statement.
CAUTION
Use level 3 with care in SAS 9.4M5 and SAS Viya. The system may become
unstable when the response is binary data.
NO_REQUEST_BODY
suppresses the request body from the information displayed when a debug level
of 2 or greater is specified.
NO_REQUEST_HEADERS
suppresses the request header from the information displayed when a debug
level of 1 or greater is specified.
NO_RESPONSE_BODY
suppresses the response body from the information displayed when a debug
level of 3 is specified.
NO_RESPONSE_HEADERS
suppresses the response header from the information displayed when a debug
level of 1 or greater is specified.
OUTPUT_TEXT
displays the request body and response body as if they are text.
REQUEST_BODY
displays the request body in the log.
REQUEST_HEADERS
displays the request header in the log.
RESPONSE_BODY
displays the response body in the log.
RESPONSE_HEADERS
displays the response header in the log.
Details
You must specify at least one option in the DEBUG statement for debugging
information to be written to the log.
In SAS 9.4M5 and in SAS Viya, you control the amount of information that is printed
with the LEVEL= option. A value of 1 or greater in the LEVEL= option is required to
display debugging information. All DEBUG statement output in these software
versions is written as text.
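LEVEL=1 is described above as equivalent to the REQUEST_HEADERS and RESPONSE_HEADERS options, so the individual options should produce the same output. The following is a minimal sketch that assumes SAS 9.4M5 or later and a reachable httpbin.org.

```sas
filename resp TEMP;

/* Write only the request and response headers to the log */
proc http
   url="http://httpbin.org/get"
   out=resp;
   debug request_headers response_headers;
run;
```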
HEADERS Statement
Specifies request headers for the HTTP request.
Syntax
HEADERS "HeaderName"="HeaderValue" <"HeaderName-n"="HeaderValue-n">;
Required Argument
"HeaderName"="HeaderValue"
is a name and value pair that represents a header name and its value. The
HeaderName can be a standard header name or a custom header name. For
information about header field definitions, see the HTTP/1.1 specification at
www.w3.org.
Note: Do not specify a colon (:) in the header name. The name=value pairs are
automatically translated into the following form:
HeaderName : HeaderValue
Details
The HEADERS statement enables you to specify header values easily within the
procedure request, instead of having to provide a fully formatted input file via a
fileref. Use the HEADERS statement to specify the content-type and character set
of the document that you are uploading when the values are different from the
default values for the method.
Table 35.1 Default Content-Type for the POST and PUT Methods
Method   Default Content-Type
POST     application/x-www-form-urlencoded
PUT      application/octet-stream
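To send a JSON body with POST, you must override the default content-type for that method. The following is a minimal sketch that assumes httpbin.org is reachable. The JSON payload is a made-up example.

```sas
filename resp TEMP;

/* Override the POST default (application/x-www-form-urlencoded) */
/* so that the server treats the body as JSON.                   */
proc http
   url="http://httpbin.org/post"
   method="POST"
   in='{"name": "SAS User"}'
   out=resp;
   headers "Content-Type"="application/json";
run;
```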
SSLPARMS Statement
Sets encryption options for the PROC HTTP request.
Syntax
SSLPARMS encryption-options;
Required Argument
encryption-options
specifies SAS system options for encryption that enable a secure client-server
connection with the target server. For a listing of available system options, see
“SAS System Options for Encryption” in Encryption in SAS.
Note: Only system options for encryption that begin with “SSL” are supported.
The SAS system options for encryption can be specified in either of the
following ways:
"SystemOption"="OptionValue"
SystemOption="OptionValue"
Details
Use the SSLPARMS statement to apply a SAS system option for encryption locally
instead of globally.
Note: All discussion of TLS is also applicable to the predecessor protocol, SSL.
Some UNIX installations require the Server Name Indication (SNI) to be set so that
they can serve up the proper certificate. Beginning with SAS 9.4M5, the SNI is
set by default. Earlier SAS 9.4 releases and SAS Viya require you to set the SNI with
environment variables. For information about setting environment variables for a
SAS 9.4 installation, see Encryption in SAS. For information about setting
environment variables for SAS Viya, see Encryption in SAS Viya: Data in Motion.
Beginning with SAS 9.4M6 and SAS Viya 3.5, PROC HTTP enables you to override
the global communication security settings with local communication security
settings with the SSLPARMS statement. In the SSLPARMS statement, specify SAS
system options for encryption. The options that are set in the SSLPARMS
statement apply only to the PROC HTTP request in which they are specified. For
more information, see “SSLPARMS Statement” on page 1295 and “Example 14:
Specify Local Options for Two-Way Encryption in Windows” on page 1319.
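The following is a minimal SSLPARMS sketch for SAS 9.4M6, SAS Viya 3.5, or later. It assumes a UNIX host, where the SSLCALISTLOC= system option names a file of trusted CA certificates. The path is a placeholder, and the options that apply on your host might differ (see Encryption in SAS).

```sas
filename resp TEMP;

/* The CA bundle path is hypothetical; point it at your own file. */
/* The setting applies only to this PROC HTTP request.            */
proc http
   url="https://httpbin.org/get"
   out=resp;
   sslparms "SSLCALISTLOC"="/path/to/trusted-ca-bundle.pem";
run;
```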
If you do not specify an authentication type, the default type is AUTH_ANY, which
is also the authentication behavior of SAS releases prior to SAS 9.4M3. AUTH_ANY
is equivalent to specifying AUTH_NTLM, AUTH_NEGOTIATE, and AUTH_BASIC together
in the request. AUTH_NTLM authentication is attempted first (for Windows only),
then AUTH_NEGOTIATE, then AUTH_BASIC, although the server ultimately determines
which authentication type is used.
If the server that you are connecting to supports the NTLM authentication protocol
or the Kerberos authentication protocol, it usually is not necessary to specify a user
name and password. As long as your current user identity has permissions,
authentication is established.
HTTP_TOKENAUTH enables you to access SAS Content Servers from PROC HTTP
without having to supply a user name and password.
WEBAUTHDOMAIN can also be used instead of a user name and password. However, you
must set up a metadata entry that stores the user name and password for the
specified web authentication domain.
Wire Logging
Wire logging logs packets of information as they appear on the network. This
information is normally referred to as a dump. Wire dumps enable you to see what
information is being sent to the server and what information the server is sending
back. Because you can see the raw data, wire dumps can be useful in debugging
your programs.
For more information, see “SAS Logging” in SAS Logging: Configuration and
Programming Reference.
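Beginning with the November 2019 release of SAS 9.4M6 and SAS Viya 3.5, you can
send URL-encoded POST data with the FORM option in the IN= argument. The
QUERY= procedure option enables you to submit URL-encoded query parameters.
Before these options were available, form values had to be URL-encoded in a DATA
step before the PROC HTTP request was submitted: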
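%let firstname=Mickey;
%let lastname=Mouse;
data _null_;
/* URL-encode each macro variable value */
firstname = urlencode("&firstname");
lastname = urlencode("&lastname");
/* Write the encoded values back to global macro variables */
call symputx("firstname",firstname,'G');
call symputx("lastname",lastname,'G');
run;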
The URLENCODE function and the SYMPUTX call routine were necessary to
encode the data before submitting the PROC HTTP request.
With the new FORM option, the same request can now be submitted as follows:
%let firstname=Mickey;
%let lastname=Mouse;
proc http url="http://httpbin.org/post"
in = form ("firstname"="&firstname"
"lastname"="&lastname");
run;
The QUERY= option makes adding query parameters to a URL easier. You can still
add query parameters to a URL manually as shown below. However, when you
modify the URL manually you are responsible for URL encoding any special
characters and making sure that the & (ampersand) appears correctly and is not
mistaken for a macro variable.
url="http://httpbin.org/get?firstname=&firstname%nrstr(&lastname)=&lastname";
With the new QUERY= procedure option, the above request can now be submitted
as follows:
%let firstname=Mickey;
%let lastname=Mouse;
proc http url="http://httpbin.org/get"
query = ("firstname"="&firstname"
"lastname"="&lastname");
run;
Both options URL-encode the input data by default unless you specify the
NOENCODE option before a name=value pair.
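The following sketch mixes an encoded pair and an unencoded pair in QUERY=. It assumes the November 2019 release of SAS 9.4M6 or later and a reachable httpbin.org. The parameter names and values are illustrative. The placement of NOENCODE directly before the pair that it applies to follows the description above; check the syntax for your release.

```sas
filename resp TEMP;

/* "city" is URL-encoded automatically (space and comma escaped).  */
/* "state" is already encoded, so NOENCODE passes it through as is */
/* instead of encoding the % sign a second time.                   */
proc http
   url="http://httpbin.org/get"
   query=("city"="Cary, NC" NOENCODE "state"="North%20Carolina")
   out=resp;
run;
```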
Note: The new functionality does not include support for nested multipart
uploads.
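filename input1 TEMP;
filename input2 TEMP;
filename resp TEMP;
data _null_;
file input1;
put "Contents of the first part";
file input2;
put "Contents of the second part";
run;
proc http
url="http://httpbin.org/post"
in = MULTI FORM ( "Name1" = "Raw Input 1" ,
"Name2" = input1 header="Foo: Bar" header="Foo2: Bar2" ,
"Name3" = input2 FILENAME="Different Filename"
header="Content-Type: text/plain")
out=resp
;
run;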
o a unique identifier that indicates the beginning of the part. In this example,
the identifiers are Name1, Name2, and Name3.
o one or more optional HTTP headers. The HTTP headers must be specified
before the input data. For a form, it is a good idea to specify a content-type
header to indicate the type of content that will follow. The Content-
Disposition header is derived from Name= and FILENAME= values. If the
input is a fileref, and a FILENAME value is not specified, the FILENAME=
value is derived from the fileref.
o input data, supplied as a string or a fileref.
For more information, see “Example 12: Create A Program That Captures Status
Response Values” on page 1312.
The macro variables are set with the %LET statement. In the event that you disable
cookie caching, you can delete the macro variable from the symbol table with the
%SYMDEL statement.
Details
This example makes a GET request to a server on the local network. GET is the
simplest and most common request that you can make with PROC HTTP. Beginning
with SAS 9.4M3, GET is the default METHOD value when the IN argument is
omitted for a PROC HTTP request, making the METHOD argument optional. A GET request
must specify METHOD=GET in earlier releases of SAS software.
Program
filename resp TEMP;
proc http
url="http://httpbin.org/get"
out=resp;
run;
Example 2: A Simple POST Request
Details
This example makes a simple POST request to a server on the local network. The
file to upload is identified by a fileref in the IN argument. When the IN argument is
specified, the default METHOD= value in all SAS releases is POST. The response
and the output headers are written to filerefs.
Program
filename resp TEMP;
filename headout TEMP;
filename input TEMP;
data _null_;
file input;
put "this is some sample text";
run;
proc http
url="http://httpbin.org/post"
in=input
out=resp
headerout=headout;
run;
Example 3: Specify a Proxy In the HTTP Request
Details
This example makes the same request as in “Example 2: A Simple POST Request”
on page 1303, except the call is sent to an external server and therefore requires
the use of a proxy server. This example uses the PROXYHOST argument to specify
the host name of the proxy server and the PROXYPORT argument to specify the
proxy server's port number.
Program
filename out "u:\prochttp\Testware\ProxyTest_out.txt";
filename input TEMP;
data _null_;
file input;
put "this is some sample text";
run;
proc http
url="http://httpbin.org/post"
method=post
in=input
out=out
proxyhost="proxyhost.company.com"
proxyport=889;
run;
Details
The PROC HTTP IN= argument accepts a quoted input string or a fileref to submit
input data. Specifying input in a string makes it easier to send text posts and
form-based posts. This example submits the form that can be found at
http://httpbin.org/forms/post. The response is written to a response file.
Program
filename resp TEMP;
proc http
method=post
url="http://httpbin.org/post"
in='custname=Sas+User&custtel=919-555-5555&custemail=sas.user%40sas.com&size=medium&topping=cheese&delivery=12%3A00&comments=Dont+Drop+It'
out=resp;
run;
data _null_;
infile resp;
input;
put _infile_;
run;
{
"args": {},
"data": "",
"files": {},
"form": {
"comments": "Dont Drop It",
"custemail": "sas.user@sas.com",
"custname": "Sas User",
"custtel": "919-555-5555",
"delivery": "12:00",
"size": "medium",
"topping": "cheese"
},
"headers": {
"Accept": "*/*",
"Content-Length": "133",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "SAS/9",
},
"json": null,
"origin": "149.173.1.80, 104.129.194.85",
"url": "http://httpbin.org/post"
}
Example 5: Specify A Proxy In a Macro Variable
Details
This example makes the same request as in “Example 3: Specify a Proxy In the
HTTP Request” on page 1303, except the proxy server is specified in a macro
variable and the input text is specified as a string in the IN argument. The
PROCHTTP_PROXY macro variable specifies the proxy server’s Internet host name
and port number as one value. Because the proxy is set in the macro variable, it is
available to all subsequent HTTP requests that are made in the SAS session. In this
request, parameters to the POST are read from a string that is specified in the IN
argument.
Program
%let PROCHTTP_PROXY=proxyhost.company.com:889;
filename out TEMP;
proc http
url="http://httpbin.org/post"
method=post
in="text to write out"
out=out;
run;
Details
This example makes the same POST request as in “Example 5: Specify A Proxy In a
Macro Variable” on page 1305 but captures the response headers in a file called
headerOut.txt.
Program
%let PROCHTTP_PROXY=proxyhost.company.com:889;
filename out TEMP;
filename hdrout "headerOut.txt";
proc http
url="http://httpbin.org/post"
method=post
in="text to write out"
out=out
headerout=hdrout;
run;
Example 7: A GET That Specifies HEADEROUT_OVERWRITE
Details
This example shows the effects of the HEADEROUT_OVERWRITE argument. The
GET requests redirect twice before reaching their destination.
HEADEROUT_OVERWRITE causes only the last output header to be recorded.
proc http
url="http://httpbin.org/redirect/2"
method=GET
headerout=hdrs
out=out;
run;
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 20 Apr 2015 14:19:53 GMT
Content-Type: application/json
Content-Length: 195
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
proc http
url="http://httpbin.org/redirect/2"
method=GET
headerout=hdrs
out=out
HEADEROUT_OVERWRITE;
run;
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 20 Apr 2015 14:22:48 GMT
Content-Type: application/json
Content-Length: 195
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Example 8: A GET That Uses the HEADERS Statement
Details
The following is an example of a GET method request that specifies the HEADERS
statement. As of SAS 9.4M3, GET is also the default method when the IN argument
is not specified.
Program
filename resp TEMP;
proc http
url="http://httpbin.org/headers"
out=resp;
headers
"Accept"="application/json";
run;
data _null_;
infile resp;
input;
put _infile_;
run;
"headers": {
"Accept": "*/*,application/json",
"Host": "httpbin.org",
"User-Agent": "SAS/9",
}
}
Details
This example submits the MKCOL WebDAV HTTP method. There are no input and
output requirements for nonstandard methods. As long as the target server returns
data and you have specified a valid OUT= fileref, the data is written to that
fileref. Here, output is written to the temporary fileref RESP.
Program
filename resp TEMP;
proc http
url="http://hostname/directory/"
method=MKCOL
out=resp;
run;
Details
This example specifies the authentication types for a PROC HTTP request. Two
authentication types are specified, indicating that only Negotiate or NTLM
authentication is allowed.
Program
proc http
url="http://securesite.com"
AUTH_NEGOTIATE
AUTH_NTLM;
run;
Details
This example specifies the EXPECT_100_CONTINUE option, which sends an Expect: 100-continue request header.
Program
filename resp TEMP;
filename hdrs TEMP;
proc http
url="http://httpbin.org/put"
method=PUT
in='Some Put Data'
out=resp
headerout=hdrs
EXPECT_100_CONTINUE;
run;
data _null_;
infile hdrs;
input;
put _infile_;
run;
data _null_;
infile resp;
input;
put _infile_;
run;
HTTP/1.1 200 OK
Server: myserver/18.0
Date: Mon, 24 Nov 2014 20:18:29 GMT
Content-Type: application/json
Content-Length: 652
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
X-Cache: MISS from transproxy
Via: 1.1 vegur, 1.1 transproxy (squid)
Connection: keep-alive
{
"args": {},
"data": "Some Put Data",
"files": {},
"form": {},
"headers": {
"Accept": "*/*",
"Content-Length": "13",
"Content-Type": "application/octet-stream",
"Host": "httpbin.org",
"User-Agent": "SAS/9",
"Xxpect": "100-continue",
},
"json": null,
"origin": "149.173.1.80, 104.129.194.85",
"url": "http://httpbin.org/put"
}
Example 12: Create A Program That Captures Status Response Values
Details
This example creates a simple macro program that tests the value set in the
SYS_PROCHTTP_STATUS_CODE macro variable. The program specifies to print an
error message when the result code does not match a specified value for the
SYS_PROCHTTP_STATUS_CODE macro variable. The program also specifies to
print the actual values returned by the SYS_PROCHTTP_STATUS_CODE and
SYS_PROCHTTP_STATUS_PHRASE macro variables in any error messages.
The macro program is then invoked after various PROC HTTP requests to illustrate
the results that you can expect under various circumstances. The macro program is
invoked with a value of 200 in each execution. A value of 200 indicates successful
completion of the HTTP request.
Program
The following code creates the macro program.
%macro prochttp_check_return(code);
/* No status macro variable: PROC HTTP did not receive a response */
%if %symexist(SYS_PROCHTTP_STATUS_CODE) ne 1 %then %do;
%put ERROR: Expected &code., but a response was not received from the HTTP Procedure;
%abort;
%end;
%else %do;
/* A response was received: compare its status code with the expected code */
%if &SYS_PROCHTTP_STATUS_CODE. ne &code. %then %do;
%put ERROR: Expected &code., but received &SYS_PROCHTTP_STATUS_CODE. &SYS_PROCHTTP_STATUS_PHRASE.;
%abort;
%end;
%end;
%mend;
Here is the log for the request. The request returned 200. Because the return code
matches the code value that is specified in the macro program, no error message is
written, so there is no need to print the values of the status reporting macro
variables.
NOTE: 200 OK
175
176 %prochttp_check_return(200);
177
178
When a server connection cannot be made, the procedure returns an error message.
However, the error message does not include a return code or a status phrase.
NOTE: The SAS System stopped processing this step because of errors.
181
182 %prochttp_check_return(200);
ERROR: Expected 200, but a response was not received from the HTTP Procedure
ERROR: Execution terminated by an %ABORT statement.
183
When a server connection is made, but the HTTP request fails, the error message
prints the actual return code and its corresponding status phrase.
186
187 %prochttp_check_return(200);
ERROR: Expected 200, but received 404 Not Found
ERROR: Execution terminated by an %ABORT statement.
188
When a server connection is made, but the PROC HTTP request cannot be parsed,
the macro returns an error message. The message does not include a return code or
status phrase.
Example 13: Use the DEBUG Statement with the LEVEL= Option
Details
This example shows the information returned by the DEBUG statement LEVEL=
option.
Debug Level 1
A debug level of 1 displays the HTTP request and response headers.
proc http
in = "*************testing prochttp****************"
method=POST
url="http://httpbin.org/post";
debug level = 1;
run;
Debug Level 2
A debug level of 2 displays the HTTP request and response headers and the HTTP
request body.
proc http
in = "*************testing prochttp****************"
method=POST
url="http://httpbin.org/post";
debug level = 2;
run;
Beginning with SAS 9.4M6, this information is displayed in the log. The
request body is written as binary by default. Use the OUTPUT_TEXT DEBUG
statement option if you want to write the information as text.
Debug Level 3
A debug level of 3 displays the request header, response header, request body, and
response body.
proc http
in = "*************testing prochttp****************"
method=POST
url="http://httpbin.org/post";
debug level = 3;
run;
Beginning with SAS 9.4M6, this information is displayed in the log. The
request and response bodies are written as binary by default.
Example 14: Specify Local Options for Two-Way Encryption in Windows
Details
The following is an example of a PROC HTTP request that specifies local
encryption options to send a client certificate for two-way TLS authentication on
Windows. The Certificate Authority (CA) that issued the client certificate is
named Glenn’s CA.
The system options used are host dependent. For more information about the
system options in a SAS 9.4 installation, see Encryption in SAS. For more
information about system options in SAS Viya, see Encryption in SAS Viya: Data in
Motion.
Example 15: Specify Local Options for Two-Way Encryption in UNIX
Details
The following is an example of a PROC HTTP request that specifies local
encryption options to send a client certificate for two-way TLS authentication on
UNIX. On UNIX, the encryption options that you use depend on whether the
certificate is in PEM or DER format. This example shows how to specify the
location of PEM and DER certificates for user John Doe.
Note: The paths to the certificates are relative to the location where the web
server is running. Therefore, the certificates must be accessible on disk.
For more information about the system options, see Encryption in SAS.
36
IMPORT Procedure
Starting in SAS 9.4, you can import data from JMP 7 or later files, and JMP variables
can be up to 255 characters long. You can also import value labels to a SAS format
catalog. Extended attributes are now used automatically, and the META=
statement is no longer supported. For more information, see “JMP Files” in
SAS/ACCESS Interface to PC Files: Reference.
When you run the IMPORT procedure, it reads the input file and writes the data to
the specified SAS data set. By default, the IMPORT procedure expects the variable
names to appear in the first row. The procedure scans the first 20 rows to count the
variables, and it attempts to determine the correct informat and format for each
variable. You can use the IMPORT procedure’s statements to do the following:
n indicate how many rows SAS scans for variables to determine the type and
length (GUESSINGROWS=)
n indicate at which row SAS begins to read the data (DATAROW=)
You can also use these same statements to change the default values.
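The following self-contained sketch uses a DATA step to write a small CSV file to a temporary fileref. The file contents and the data set name are hypothetical. PROC IMPORT then reads the file with explicit GETNAMES=, DATAROW=, and GUESSINGROWS= values.

```sas
/* Create a sample CSV file: a header row followed by three data rows */
filename csvin TEMP;

data _null_;
   file csvin;
   put "id,name,score";
   put "1,Alice,90.5";
   put "2,Bob,85";
   put "3,Carol,77.25";
run;

proc import datafile=csvin
     out=work.scores
     dbms=csv
     replace;
   getnames=yes;       /* variable names come from row 1 */
   datarow=2;          /* data begins on row 2 */
   guessingrows=100;   /* scan up to 100 rows to set type and length */
run;
```

Raising GUESSINGROWS= above the default of 20 helps when a longer or differently typed value first appears deep in the file. The trade-off is that PROC IMPORT must scan more rows.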
When the IMPORT procedure reads a delimited file, it generates a DATA step to
import the data. You control the results with options and statements that are
specific to the input data source. The IMPORT procedure generates the specified
output SAS data set and writes information about the import to the SAS log. The
log displays the DATA step code that is generated by the IMPORT procedure.
If you need to revise your code after the procedure runs, issue the RECALL
command (or press F4) to recall the generated DATA step. At this point, you can
add or remove options from the INFILE statement and customize the INFORMAT,
FORMAT, and INPUT statements to your data.
If you use this method and modify an informat, also modify the format for that
same variable. The informat and format for a given variable also must be of the
same type (either character or numeric). In addition, if the type is character, the
assigned format should be as long as the variable to avoid truncation when the data
is displayed. For example, if a character variable is 400 characters long but has a
format of $CHAR50., then only the first 50 characters are shown when the data is
displayed.
To recall your PROC IMPORT code, issue a second RECALL command (or press F4
again).
Note: By default, the IMPORT procedure reads delimited files as varying record-
length files. If your external file has a fixed-length format, use a SAS DATA step
with an INFILE statement that includes the RECFM=F and LRECL= options. For
more information, see the INFILE statement, RECFM= option in SAS DATA Step
Statements: Reference.
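The following sketch shows the fixed-length approach, assuming 20-byte records. The first DATA step only creates a sample fixed-length file so that the example is self-contained. The column layout is hypothetical.

```sas
/* Create a sample file of fixed-length 20-byte records */
filename fixedf TEMP;

data _null_;
   file fixedf recfm=f lrecl=20;
   put @1 "001" @4 "Alice" @14 "90.50";
   put @1 "002" @4 "Bob"   @14 "85.00";
run;

/* Read the records back with matching RECFM= and LRECL= values */
data work.fixed;
   infile fixedf recfm=f lrecl=20;
   input id 1-3 name $ 4-13 score 14-20;
run;
```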
The Import Wizard or the External File Interface (EFI) can also be used to import
data. They can guide you through the steps to import an external data source. You
can use the Import Wizard to generate IMPORT procedure statements, which you
can save to a file for subsequent use.
To open the Import Wizard or EFI from the SAS windowing environment, select File
► Import Data. For more information about the Import Wizard or EFI, see the Base
SAS online Help and documentation. For more detail and an example, see “Using
SAS Import and Export Wizards” in SAS/ACCESS Interface to PC Files: Reference.
Note: EFI does not recognize non-ASCII characters when SAS Registry contains
"UseDateStyle"="Yes".
CAUTION
Sequential access is not allowed when you use EFI.
TIP Sharing Delimited Files Across Hosts: When a delimited file is read
into SAS using the IMPORT procedure, each row must end with a host-
specific, end-of-line delimiter. If you share delimited files that were created
on one host with another host, the default end-of-line delimiters will not
match. When this occurs, you must specify the new host’s end-of-line
delimiter for your files.
n Microsoft Windows: The default newline delimiter is Carriage Return/
Linefeed (CRLF). To read a file that is native to UNIX or Linux, use a
FILENAME statement with the TERMSTR=LF option. For more
information, see the FILENAME statement in SAS DATA Step Statements:
Reference.
n UNIX or Linux: The default end-of-row delimiter is Linefeed (LF). To read
a file that is native to Windows, use a FILENAME statement with the
TERMSTR=CRLF option. For more information, see the FILENAME
statement in SAS DATA Step Statements: Reference.
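The following sketch simulates reading a UNIX-style file on Windows. The fileref is declared with TERMSTR=LF, a DATA step writes LF-terminated records through it, and PROC IMPORT reads them back through the same fileref. The file contents and the data set name are hypothetical.

```sas
/* Fileref whose records end with a linefeed only */
filename unixcsv TEMP termstr=lf;

data _null_;
   file unixcsv;
   put "id,city";
   put "1,Cary";
   put "2,Raleigh";
run;

/* TERMSTR=LF on the fileref also governs how the file is read */
proc import datafile=unixcsv
     out=work.cities
     dbms=csv
     replace;
   getnames=yes;
run;
```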
Beginning with SAS 9.4M3, PROC IMPORT uses the NLNUM informat instead of the
COMMA informat. When you import a file that contains values such as 14,000.01
that have commas, the COMMA informat removes the commas and other
non-numeric characters from the numerical values. Removing these characters can
cause interpretation errors in the values. NLNUM prevents these errors by using the
specified value of the LOCALE system option to interpret numerical values that
have commas.
For example, to enter the numerical equivalent of fourteen thousand and one
hundredth, a person specifying LOCALE=English_UnitedStates would enter
14,000.01. A person specifying LOCALE=French_France would enter 14.000,01.
NLNUM interprets either input value correctly and writes the correct value based
on the specified locale. If you read in 14.000,01 with NLNUM and
LOCALE=French_France, store it in a data set, and then write it with NLNUM and
LOCALE=English_UnitedStates, it is displayed as 14,000.01.
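The locale-dependent reading and writing described above can be sketched with the NLNUM informat and format in a DATA step. This illustrates the informat's behavior under two locales; it is not the code that PROC IMPORT generates, and the data set name WORK.AMOUNT is hypothetical.

```sas
/* Read the value under a French locale: '.' groups digits, ',' marks decimals */
options locale=French_France;
data work.amount;
   x = input('14.000,01', nlnum12.);
run;

/* Switch to a US English locale and write the stored value with NLNUM */
options locale=English_UnitedStates;
data _null_;
   set work.amount;
   put x= nlnum12.2;
run;
```

According to the description above, the PUT statement displays the stored value as 14,000.01.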
1326 Chapter 36 / IMPORT Procedure
1 Type REGEDIT in the SAS command line to open the Registry Editor.
For more information about the encodings of format catalogs, see Migrating Data to
UTF-8 for SAS Viya and SAS Viya FAQ for Processing UTF-8 Data.
TIP
• Use the XLSX engine to read UTF-8 data. For more information, see
“LIBNAME Statement: XLSX Engine” in SAS/ACCESS Interface to PC Files:
Reference.
• Do not use the DBMS=XLS option to import spreadsheets that contain
UTF-8 data.
• Use Microsoft Excel to convert your .xls spreadsheets to .xlsx
spreadsheets before you import them with the DBMS=XLSX option.
The VARCHAR data type is similar to the CHAR data type, except in how length is
measured: CHAR variables have a length that is measured in bytes, whereas
VARCHAR variables have a length that is measured in characters. For information
about using VARCHAR, see “Data Types Supported in the CAS DATA Step” in SAS Cloud
Analytic Services: User’s Guide.
In the following example, the CAS engine is used with the LENGTH statement to
create a VARCHAR variable and a CHAR variable. The VARCHAR variable, X, has a
length of 30 characters, and the CHAR variable, Y, has a length of 30 bytes.
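/* Assumes an active CAS session, which the CAS LIBNAME engine uses */
libname mycas cas;
data mycas.string;
   length x varchar(30);   /* length measured in characters */
   length y $30;           /* length measured in bytes */
   x = 'abc';
   y = 'def';
run;
proc contents data=mycas.string;
run;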
EFI_ALLCHARS=YES | NO
Default NO
Notes The value of EFI_ALLCHARS applies to all PROC IMPORT steps that
follow.
The AllChars SAS Registry option has the same effect as the
EFI_ALLCHARS macro variable.
Reset the value of EFI_ALLCHARS after the PROC IMPORT step. Otherwise, all
PROC IMPORT steps that follow EFI_ALLCHARS=YES define all variables as
character.
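A minimal sketch of setting and then resetting EFI_ALLCHARS, assuming a hypothetical file ids.csv in the WORK directory and a hypothetical fileref IDS. Reading a ZIP-code-like column as character is one situation where this is useful, because it keeps leading zeros that a numeric guess would drop.

```sas
/* Write a small CSV whose columns look numeric (file location is illustrative) */
filename ids "%sysfunc(pathname(work))/ids.csv";
data _null_;
   file ids;
   put 'ZipCode,Amount';
   put '02134,10';
   put '27513,20';
run;

/* Define every imported variable as character */
%let EFI_ALLCHARS=YES;
proc import datafile=ids out=work.ids dbms=csv replace;
   getnames=yes;
run;

/* Reset so that later PROC IMPORT steps determine types normally */
%let EFI_ALLCHARS=NO;
```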
EFI_MISSING_NUMERICS=YES | NO
Default NO
Note The MissingNumerics SAS Registry option has the same effect as the
EFI_MISSING_NUMERICS macro variable.
EFI_NOQUOTED_DELIMITER=YES | NO
Default NO
Note The QuotedDelimiter SAS Registry option has the same effect as the
EFI_NOQUOTED_DELIMITER macro variable.
specifies whether single quotation marks within a variable value are read as
delimiters inside open quotation marks or as part of the variable value. If your
variable values contain delimiters, or if you want a single quotation mark to be read
as part of a variable value, set EFI_NOQUOTED_DELIMITER to YES.
EFI_QUOTED_NUMERICS=YES | NO
Default NO
Note The QuotedNumerics SAS Registry option has the same effect as the
EFI_QUOTED_NUMERICS macro variable.
PROC IMPORT
DATAFILE="filename" | TABLE="tablename"
OUT=<libref.>SAS data set <(SAS data set options)>
<DBMS=identifier> <REPLACE>;
statements for importing from delimited files
DATAROW=n;
DELIMITER='char' | 'nn'x;
GETNAMES=YES | NO;
GUESSINGROWS=n | MAX;
statements for importing from JMP files
DBENCODING=12-char SAS encoding-value;
FMTLIB=<libref.>format-catalog;
META=libref.member-data-set;
Statement     Task                                                Example
PROC IMPORT   Import an external data file to a SAS data set      Ex. 1, Ex. 2, Ex. 3, Ex. 4
GETNAMES      Generate SAS variable names from the data values    Ex. 1, Ex. 2
              in the first row in the input file
Restriction: A pathname for a file can have a maximum length of 201 characters.
Tips: Beginning with SAS Viya 3.5, PROC IMPORT supports all access types that are
available in the FILENAME statement.
Beginning with SAS 9.4M5, PROC IMPORT supports the VARCHAR data type for
CAS tables. For more information, see “Support for the VARCHAR Data Type” on
page 1326.
Use the XLSX engine to read UTF-8 data. For more information, see “LIBNAME
Statement: XLSX Engine” in SAS/ACCESS Interface to PC Files: Reference.
Do not use the DBMS=XLS option to import spreadsheets that contain UTF-8 data.
Use Microsoft Excel to convert your .xls spreadsheets to .xlsx spreadsheets before
you import them with DBMS=XLSX.
Syntax
PROC IMPORT
DATAFILE="filename" | TABLE="tablename"
PROC IMPORT Statement 1331
Required Arguments
DATAFILE="filename" | fileref
specifies the complete path and filename or fileref for the input PC file,
spreadsheet, or delimited external file. A fileref is a SAS name that is associated
with the physical location of the input file. To assign a fileref, use the
FILENAME statement. For more information about the FILENAME statement,
see SAS Global Statements: Reference. For more information about PC file
formats, see SAS/ACCESS Interface to PC Files: Reference.
If you specify a fileref, or if the complete path and filename do not include
special characters (such as the backslash in a path), lowercase characters, or
spaces, then you can omit the quotation marks.
Alias FILE=
Restrictions The IMPORT procedure does not support device types or access
methods for the FILENAME statement except for DISK. For
example, the IMPORT procedure does not support the TEMP device
type, which creates a temporary external file.
The IMPORT procedure can import data only if SAS supports the
data type. SAS supports numeric and character types of data but
not (for example) binary objects. If the data that you want to
import is a type that SAS does not support, the IMPORT procedure
might not be able to import it correctly. In many cases, the
procedure attempts to convert the data to the best of its ability.
However, conversion is not possible for some types.
For delimited files, the first 20 rows are scanned to determine the
variable attributes. You can increase the number of rows that are
scanned by using the GUESSINGROWS= statement. All values are
read in as character strings. If a date, time, or numeric informat can be applied
to the data value, the type is declared as numeric. Otherwise, the type remains
character.
In the first maintenance release of SAS 9.4, a SAS data set name can contain a
single quotation mark when the VALIDMEMNAME=EXTEND system option is
also specified. Using VALIDMEMNAME= expands the rules for the names of
certain SAS members, such as a SAS data set name. For more information, see
“Rules for SAS Data Set Names, View Names, and Item Store Names” in SAS
Language Reference: Concepts.
TABLE="tablename"
specifies the name of the input DBMS table. If the name does not include
special characters (such as question marks), lowercase characters, or spaces,
you can omit the quotation marks. Note that the DBMS table name might be
case sensitive.
When you import a DBMS table, you must specify the DBMS=
option.
Optional Arguments
DBMS=identifier
specifies the type of data to import. You can import delimited files or JMP files
(DBMS=JMP) in Base SAS. The JMP file format must be Version 7 or later, and
JMP variable names can be up to 255 characters long. SAS supports importing
JMP files that have more than 32,767 variables.
To import a tab-delimited file, specify TAB as the identifier. To import any other
delimited file that does not end in .CSV, specify DLM as the identifier. For a
comma-separated file with a .CSV extension, DBMS= is optional. The IMPORT
procedure recognizes .CSV as an extension for a comma-separated file.
The DBMS argument is required if you are importing a file that does not have an
extension, and the data is delimited by tabs. It is also required if you are
importing a TXT file that has data that is delimited with a comma.
See Table 23.66 on page 855 for more information about identifiers for
this option.
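A minimal sketch of DBMS=TAB, assuming a hypothetical file class_tab.txt in the WORK directory and a hypothetical fileref TABFILE. The first DATA step creates the tab-delimited file: the FILE statement's DLM='09'x option makes list output separate values with a tab instead of a blank.

```sas
filename tabfile "%sysfunc(pathname(work))/class_tab.txt";

/* Write a small tab-delimited file; '09'x is the tab character */
data _null_;
   file tabfile dlm='09'x;
   input Name :$8. Sex :$3. Age :$3.;
   put Name Sex Age;
   datalines;
Name Sex Age
Barbara F 13
Alfred M 14
;
run;

/* DBMS=TAB identifies the file as tab-delimited, so no DELIMITER */
/* statement is needed even though the file has a .txt extension. */
proc import datafile=tabfile
   out=work.class_tab
   dbms=tab
   replace;
   getnames=yes;
run;
```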
REPLACE
overwrites an existing SAS data set. If you omit REPLACE, the IMPORT
procedure does not overwrite an existing data set.
CAUTION
Using the IMPORT procedure with the REPLACE option to write to an
existing SAS generation data set causes the most recent (base) generation
data set or group of generation data sets to be deleted.
If you use the IMPORT procedure with the REPLACE option to write to an existing
generation data set, the result depends on the GENMAX= data set option:
• If you specify GENMAX= to increase or decrease the number of generations,
all existing generations are deleted and replaced with a single new base
generation data set.
• If you omit GENMAX=, all existing generations are deleted and replaced with
a single new data set that has the same name but is not a generation data set.
Instead, use a SAS DATA step with the REPLACE= data set option to replace a
permanent SAS data set and to maintain the generation group for that SAS data
set. For more information, see “Understanding Generation Data Sets” in SAS
Language Reference: Concepts.
Restriction You cannot specify data set options when importing delimited,
comma-separated, or tab-delimited external files.
DATAROW Statement
Starts reading data from the specified row number in the delimited text file.
Syntax
DATAROW=n;
Required Argument
n
specifies the row number in the input file for the IMPORT procedure to start
reading data.
DBENCODING Statement
Indicates the encoding character set to use for the JMP file.
Syntax
DBENCODING=12-char SAS encoding-value;
Required Argument
12-char SAS encoding-value
indicates the encoding to use with JMP files. Encoding maps each character in a
character set to a unique numeric representation, which results in a table of
code points. A single character can have different numeric representations in
different encodings. This value can be up to 12 characters long.
DELIMITER Statement
Specifies the delimiter that separates columns of data in the input file.
Syntax
DELIMITER='char' | 'nn'x;
Required Argument
char | 'nn'x
specifies the delimiter that separates columns of data in the input file. You can
specify the delimiter as a single character or as a hexadecimal value. For
example, if columns of data are separated by an ampersand, specify
DELIMITER='&'.
Note: If you omit DELIMITER=, the IMPORT procedure assumes that the
delimiter is a space.
The DELIMITER statement is required when you import a file that meets any of
these criteria:
• a file that does not have a file extension
• a file that has a .TXT extension and contains data that is delimited by
anything other than tabs
• a file that has a .TXT, .CSV, or .JMP extension and contains data that is
delimited by blank spaces
This example shows how to use the DBMS argument and the DELIMITER
statement to specify a comma delimiter for a file that has a .TXT extension.
proc import datafile="C:\temp\test.txt"
out=test
dbms=dlm
replace;
delimiter=',';
run;
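The DELIMITER statement also accepts a hexadecimal value. The following is a minimal sketch, assuming a hypothetical pipe-delimited file orders.txt in the WORK directory and a hypothetical fileref PIPES. In ASCII-based encodings, '7C'x is the pipe character (|).

```sas
filename pipes "%sysfunc(pathname(work))/orders.txt";

/* Write a small pipe-delimited file */
data _null_;
   file pipes;
   put 'OrderID|Item';
   put '1001|Widget';
   put '1002|Gadget';
run;

proc import datafile=pipes
   out=work.orders
   dbms=dlm
   replace;
   delimiter='7C'x;   /* hexadecimal code for | in ASCII-based encodings */
   getnames=yes;
run;
```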
FMTLIB Statement
Saves value labels to the specified SAS format catalog.
Syntax
FMTLIB=<libref.>format-catalog;
Required Argument
<libref.>format-catalog
specifies the format catalog where the value labels are saved.
GETNAMES Statement
Specifies whether the IMPORT procedure generates SAS variable names from the data values in the
first row in the input file.
Default: YES
Restrictions: Valid only with the IMPORT procedure.
If VALIDVARNAME=ANY is used, GETNAMES= might not prefix an underscore to
the data value.
Interaction: The GETNAMES statement is valid only for delimited files.
Examples: “Example 1: Importing a Delimited File” on page 1339
“Example 2: Importing a Specific Delimited File Using a Fileref” on page 1342
“Example 4: Importing a Comma-Delimited File with a CSV Extension” on page 1349
Syntax
GETNAMES=YES | NO;
Required Argument
YES | NO
YES specifies that the IMPORT procedure generate SAS variable names
from the data values in the first row of the imported delimited file.
NO specifies that the IMPORT procedure generate SAS variable names
as VAR1, VAR2, and so on.
Note: If a data value in the first row of the input file contains characters that
are not valid in a SAS name, such as a blank, SAS converts each such character to
an underscore. For example, the data value Occupancy Code becomes the SAS
variable name Occupancy_Code.
Because SAS variable names cannot begin with a number, GETNAMES= prefixes
an underscore to a variable name rather than replace the value’s first character.
For example, 2014.CHANGES becomes _2014_CHANGES.
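A minimal sketch that reproduces both conversions described above, assuming VALIDVARNAME=V7 and a hypothetical file occupancy.csv in the WORK directory. The header row uses the two example values from the note, and PROC CONTENTS lists the variable names that PROC IMPORT generated.

```sas
options validvarname=v7;   /* naming rules under which the conversions apply */
filename hdr "%sysfunc(pathname(work))/occupancy.csv";

/* Header row contains a blank and a leading digit */
data _null_;
   file hdr;
   put 'Occupancy Code,2014.CHANGES';
   put 'A1,5';
run;

proc import datafile=hdr out=work.occupancy dbms=csv replace;
   getnames=yes;
run;

/* List the generated variable names in column order */
proc contents data=work.occupancy varnum;
run;
```

Per the note above, the expected names are Occupancy_Code and _2014_CHANGES.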
GUESSINGROWS Statement
Specifies the number of rows of the file to scan to determine the appropriate data type and length
for the variables.
Default: 20
Restriction: This value should be greater than the value specified for DATAROW.
Interaction: The GUESSINGROWS statement is valid only for delimited files.
Syntax
GUESSINGROWS=n | MAX;
Required Arguments
n
indicates the number of rows the IMPORT procedure scans in the input file to
determine the appropriate data type and length of variables. The range is 1 to
2147483647 (or MAX). The scan starts at row 1 and continues through the row
number that is specified by GUESSINGROWS=.
Note: You can change the default row value in the SAS Registry. From the SAS
command line, enter regedit. When the Registry Editor opens, select Products
> BASE > EFI > GuessingRows.
MAX
is equivalent to 2147483647. Because the maximum value scans every row of the
file, it could adversely affect performance.
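A minimal sketch of why GUESSINGROWS= matters, assuming a hypothetical file codes.csv in the WORK directory where only row 30 has a long Code value. With the default scan of 20 rows, the length of Code is likely to be set from the short values, so the long value might be truncated; GUESSINGROWS=MAX scans every row so that the long value sets the length.

```sas
filename longval "%sysfunc(pathname(work))/codes.csv";

/* Rows 1-29 hold 'AB'; row 30 holds a 16-character value */
data _null_;
   file longval;
   put 'ID,Code';
   do id = 1 to 29;
      line = cats(id, ',AB');
      put line;
   end;
   put '30,ABCDEFGHIJKLMNOP';
run;

/* Default scan: Code length is guessed from the first rows only */
proc import datafile=longval out=work.codes_default dbms=csv replace;
   getnames=yes;
run;

/* Scan all rows so that row 30 is considered */
proc import datafile=longval out=work.codes_max dbms=csv replace;
   getnames=yes;
   guessingrows=max;
run;
```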
META Statement
Saves JMP metadata information to the specified SAS data set. (Deprecated)
Syntax
META=libref.member-data-set;
Required Argument
libref.member-data-set
specifies the SAS data set to which the metadata information is written.
The META statement is no longer supported for importing a JMP file and is
ignored. Instead, extended attributes are used: when you import a JMP file that
has extended attributes, the attributes are automatically attached to the new
SAS data set.
The META statement can remain in programs, but it generates a NOTE in the log
stating that META has been replaced by extended attributes and is ignored.
Example 1: Importing a Delimited File
Details
This example imports the following delimited external file and creates a temporary
SAS data set named WORK.MYDATA:
Region&State&Month&Expenses&Revenue
Southern&GA&JAN2001&2000&8000
Southern&GA&FEB2001&1200&6000
Southern&FL&FEB2001&8500&11000
Northern&NY&FEB2001&3000&4000
Northern&NY&MAR2001&6000&5000
Southern&FL&MAR2001&9800&13500
Northern&MA&MAR2001&1500&1000
Program
options nodate ps=60 ls=80;
proc import datafile="C:\My Documents\myfiles\delimiter.txt"
dbms=dlm
out=mydata
replace;
delimiter='&';
getnames=yes;
run;
Program Description
Set your system options. The NODATE option suppresses the display of the date
and time in the output. The LINESIZE= option specifies the output line length, and
the PAGESIZE= option specifies the number of lines on an output page.
options nodate ps=60 ls=80;
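Specify the input file. Specify that the input file is a delimited file. Identify the
output SAS data set. Replace the data set if it exists.
proc import datafile="C:\My Documents\myfiles\delimiter.txt"
dbms=dlm
out=mydata
replace;
Specify the ampersand (&) as the delimiter. Generate variable names from the
first row of data with the GETNAMES statement.
delimiter='&';
getnames=yes;
run;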
Log
The SAS log displays information about the successful import. For this example,
the IMPORT procedure generates a SAS DATA step, as shown in the partial log that
follows.
Example Code 36.1 External File Imported to Create a SAS Data Set
2 /**********************************************************************
3 * PRODUCT: SAS
4 * VERSION: 9.3
5 * CREATOR: External File Interface
6 * DATE: 31JAN11
7 * DESC: Generated SAS Datastep Code
8 * TEMPLATE SOURCE: (None Specified.)
9 ***********************************************************************/
10 data WORK.MYDATA ;
11 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
12 infile 'C:\My Documents\myfiles\delimiter.txt' delimiter = '&' MISSOVER
12 ! DSD lrecl=32767 firstobs=2 ;
13 informat Region $8. ;
14 informat State $2. ;
15 informat Month MONYY7. ;
16 informat Expenses best32. ;
17 informat Revenue best32. ;
18 format Region $8. ;
19 format State $2. ;
20 format Month MONYY7. ;
21 format Expenses best12. ;
22 format Revenue best12. ;
23 input
24 Region $
25 State $
26 Month
27 Expenses
28 Revenue
29 ;
30 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection
30 ! macro variable */
31 run;
Output
Output 36.1 Data Set Work.MyData
Example 2: Importing a Specific Delimited File Using a Fileref
Details
This example imports the following space-delimited file and creates a temporary
SAS data set named Work.States.
Region State Capital Bird
South Georgia Atlanta 'Brown Thrasher'
South 'North Carolina' Raleigh Cardinal
North Connecticut Hartford Robin
West Washington Olympia 'American Goldfinch'
Midwest Illinois Springfield Cardinal
Program
filename stdata 'c:\temp\state_data.txt' lrecl=100;
proc import datafile=stdata
dbms=dlm
out=states
replace;
delimiter=' ';
getnames=yes;
run;
Program Description
Assign a fileref to the input file.
filename stdata 'c:\temp\state_data.txt' lrecl=100;
Specify the input file by its fileref. Specify that the input file is a delimited file.
Identify the output SAS data set. Replace the data set if it exists.
proc import datafile=stdata
dbms=dlm
out=states
replace;
Specify a blank as the delimiter with the DELIMITER statement. Generate variable
names from the first row of data with the GETNAMES statement.
delimiter=' ';
getnames=yes;
run;
Log
The SAS log displays information about the successful import. For this example,
the IMPORT procedure generates a SAS DATA step, as shown in the partial log that
follows.
334 /**********************************************************************
335 * PRODUCT: SAS
336 * VERSION: 9.4
337 * CREATOR: External File Interface
338 * DATE: 18APR14
339 * DESC: Generated SAS Datastep Code
340 * TEMPLATE SOURCE: (None Specified.)
341 ***********************************************************************/
342 data WORK.STATES ;
343 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
344 infile STDATA delimiter = ' ' MISSOVER DSD lrecl=32767 firstobs=2 ;
345 informat Region $7. ;
346 informat State $16. ;
347 informat Capital $11. ;
348 informat Bird $20. ;
349 format Region $7. ;
350 format State $16. ;
351 format Capital $11. ;
352 format Bird $20. ;
353 input
354 Region $
355 State $
356 Capital $
357 Bird $
358 ;
359 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection
359! macro variable */
360 run;
Output
Output 36.2 Work.States Data Set
Details
This example imports the following tab-delimited file and creates a temporary SAS
data set named Work.Class.
Barbara F 13
Jeffery M 13
Carol F 14
Judy F 14
Alfred M 14
Henry M 14
Jenet F 15
Mary F 15
Ronald M 15
William M 15
Philip M 16
Program
proc import datafile='C:\userid\pathname\Class.txt'
out=class
dbms=dlm
replace;
datarow=5;
delimiter='09'x;
run;
proc print data=class;
run;
Program Description
Specify the input file. GETNAMES= defaults to YES. Identify the output data set.
Specify that the input file is a delimited file. Replace the data set if it exists.
proc import datafile='C:\userid\pathname\Class.txt'
out=class
dbms=dlm
replace;
Start reading data at row 5 of the input file, as specified by the DATAROW= option.
datarow=5;
Specify the tab character ('09'x) as the delimiter.
delimiter='09'x;
run;
Print the data set.
proc print data=class;
run;
Log
The SAS log displays information about the successful import. For this example,
the IMPORT procedure generates a SAS DATA step, as shown in the partial log that
follows.
Output
Output 36.3 Work.Class Data Set
Example 4: Importing a Comma-Delimited File with a CSV Extension
Details
This example imports the following comma-delimited file and creates a temporary
SAS data set named Work.Shoes.
"Africa","Boot","Addis Ababa","12","$29,761","$191,821","$769"
"Asia","Boot","Bangkok","1","$1,996","$9,576","$80"
"Canada","Boot","Calgary","8","$17,720","$63,280","$472"
"Central America/Caribbean","Boot","Kingston","33","$102,372","$393,376","$4,454"
"Eastern Europe","Boot","Budapest","22","$74,102","$317,515","$3,341"
"Middle East","Boot","Al-Khobar","10","$15,062","$44,658","$765"
"Pacific","Boot","Auckland","12","$20,141","$97,919","$962"
"South America","Boot","Bogota","19","$15,312","$35,805","$1,229"
"United States","Boot","Chicago","16","$82,483","$305,061","$3,735"
"Western Europe","Boot","Copenhagen","2","$1,663","$4,657","$129"
Program
proc import datafile="C:\temp\test.csv"
out=shoes
dbms=csv
replace;
getnames=no;
run;
proc print data=work.shoes;
run;
Program Description
Specify the input data file. Replace the data set if it exists. Specify the output data
set.
proc import datafile="C:\temp\test.csv"
out=shoes
dbms=csv
replace;
Setting GETNAMES=NO tells the IMPORT procedure not to take variable names from
record 1. Record 1 is read as data, and the variables are named VAR1, VAR2, and
so on.
getnames=no;
run;
Log
The SAS log displays information about the successful import. For this example,
the IMPORT procedure generates a SAS DATA step, as shown in the partial log that
follows.
463 /**********************************************************************
464 * PRODUCT: SAS
465 * VERSION: 9.4
466 * CREATOR: External File Interface
467 * DATE: 18APR14
468 * DESC: Generated SAS Datastep Code
469 * TEMPLATE SOURCE: (None Specified.)
470 ***********************************************************************/
471 data WORK.SHOES ;
472 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
473 infile 'C:\myfiles\test.csv' delimiter = ',' MISSOVER DSD lrecl=32767 ;
474 informat VAR1 $27. ;
475 informat VAR2 $6. ;
476 informat VAR3 $13. ;
477 informat VAR4 $4. ;
478 informat VAR5 $10. ;
479 informat VAR6 $10. ;
480 informat VAR7 $8. ;
481 format VAR1 $27. ;
482 format VAR2 $6. ;
483 format VAR3 $13. ;
484 format VAR4 $4. ;
485 format VAR5 $10. ;
486 format VAR6 $10. ;
487 format VAR7 $8. ;
488 input
489 VAR1 $
490 VAR2 $
491 VAR3 $
492 VAR4 $
493 VAR5 $
494 VAR6 $
495 VAR7 $
496 ;
497 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection
497! macro variable */
498 run;
Output
Output 36.4 Work.Shoes Data Set
37
JAVAINFO Procedure
Interaction: When a SAS server is in a locked-down state, the JAVAINFO procedure does not
execute. For more information, see “SAS Processing Restrictions for Servers in a
Locked-Down State” in SAS Language Reference: Concepts.
Syntax
PROC JAVAINFO <options>;
Optional Arguments
ALL
lists current information about the SAS Java environment.
CLASSPATHS
lists information about the classpaths that Java is using.
HELP
provides usage assistance in using the JAVAINFO procedure.
JREOPTIONS
lists the Java properties that are set when the JREOPTIONS configuration
option is specified.
• When used in PROC JAVAINFO, JREOPTIONS specifies the JREOPTIONS
Java properties that are set when Java is started.
• When used in PROC OPTIONS, JREOPTIONS specifies the Java options that
are in the configuration file when SAS is started.
Note: SAS.cfg is the configuration file specified during installation, but other
configuration files can be specified.
OS
lists information about the operating system that SAS is running under.
VERSION
lists the version of the Java Runtime Environment (JRE) that SAS is using.
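A minimal usage sketch, assuming a SAS session that is not in a locked-down state. Each step requests one of the options described above and reports the requested information.

```sas
/* Report the JRE version that SAS is using */
proc javainfo version;
run;

/* Report the operating system that SAS is running under */
proc javainfo os;
run;
```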
38
JSON Procedure
SAS provides the JSON engine to read a JSON file. For more information, see
“LIBNAME Statement: JSON Engine” in SAS Global Statements: Reference.
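A minimal round-trip sketch, assuming a hypothetical file class11.json in the WORK directory and a hypothetical libref JIN. PROC JSON writes the file, and the JSON engine then reads it back. The ALLDATA data set that the engine creates holds every value in the file.

```sas
filename js "%sysfunc(pathname(work))/class11.json";

/* Write a small JSON file from SASHELP.CLASS (observations where Age=11) */
proc json out=js nosastags;
   export sashelp.class(where=(age=11));
run;

/* Read the file back with the JSON engine */
libname jin json "%sysfunc(pathname(work))/class11.json";

proc print data=jin.alldata;
run;
```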
1. JavaScript Object Notation (JSON) is a text-based, open standard data format that is designed for human-readable data
interchange. JSON is based on a subset of the JavaScript programming language and uses JavaScript syntax for
describing data objects.
JSON Containers
JSON output consists of two types of data structure containers:
JSON object container ({ })
begins with a left brace ({) and ends with a right brace (}). An object container
collects key-value pairs, which are written as pairs of names and values. A value
can be any of the supported JSON data types, an object, or an array. Each name
is followed by a colon and then the value. The key-value pairs are separated by
commas.
JSON array container ([ ])
begins with a left bracket ([) and ends with a right bracket (]). An array
container collects a list of values without names. A value can be any of the
supported JSON data types, an object, or an array. Values are separated by
commas.
• The WRITE OPEN ARRAY statement explicitly opens an array container, which
you must explicitly close with the WRITE CLOSE statement.
Nesting Containers
The top-level container can include any number of containers. Containers, likewise,
can nest containers to an arbitrary depth. When nesting containers, be careful to
observe the data structure requirements of the current container.
• Objects require a list of key-value pairs, where the value can itself be an object
or array.
• Arrays have no such structural requirement of key-value pairs and are merely a
list of values, objects, or arrays.
• The WRITE VALUES statement or statements for an object container must
result in an even number of values, and the name portion of the key-value pair
must be a string.
run;
{"level":1,"container":"object","created":"implicitly","nest":["level",2,
"container","array","created","explicitly"]}
In this example, the exported data is nested within three array containers.
proc json out="example.json";
write open array; /* explicit open of array */
write values "level" 1; /* write data to array */
write values "container" "array"; /* write data to array */
write values "created" "explicitly"; /* write data to array */
write open array; /* explicit open of array */
write values "level" 2; /* write data to array */
write values "container" "array"; /* write data to array */
write values "created" "explicitly"; /* write data to array */
export sashelp.class (where=(age=11))/nokeys; /* export SAS data */
write close; /* close explicit open */
write close; /* close explicit open */
run;
["level",1,"container","array","created","explicitly",["level",2,"container",
"array","created","explicitly","SASJSONExport","1.0 NOKEYS",
"SASTableData+CLASS",[["Joyce","F",11,51.3,50.5],["Thomas","M",11,57.5,85]]]]
Missing Values
A missing value in SAS is a type of value for a variable that contains no data for a
particular observation or variable. By default, SAS represents a missing numeric
value as a single period and a missing character value as a blank space.
JSON represents the absence of information with a special value, null.
PROC JSON writes a JSON null value to the JSON output file when the following is
true:
• a SAS data set numeric variable contains a missing value
By default, SAS writes an empty string ("") to the JSON output file when a SAS data
set character variable contains a missing value. However, if you specify the
NOTRIMBLANKS option, the entire string of blanks is written to the JSON output
file.
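A minimal sketch of both rules, assuming a hypothetical data set WORK.SCORES_MV and a hypothetical fileref MVJSON. Bo's Score is a missing numeric value, so it should be written as null. The third observation's Name is a missing character value, so it should be written as an empty string.

```sas
/* A period in list input is read as a missing value for either type */
data work.scores_mv;
   length Name $8;
   input Name $ Score;
   datalines;
Ann 90
Bo .
. 75
;
run;

filename mvjson "%sysfunc(pathname(work))/scores_mv.json";

proc json out=mvjson pretty;
   export work.scores_mv / nosastags;
run;
```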
1362 Chapter 38 / JSON Procedure
By default, PROC JSON scans input strings to ensure that they contain only
characters that are acceptable as JSON strings. Unacceptable characters in an
input string are replaced with the proper escape sequence.
You can specify the NOSCAN option to indicate that the input string is known to
contain acceptable characters or has already been scanned. NOSCAN is supported
in the PROC JSON statement, the EXPORT statement, and the WRITE VALUES
statement.
• If the current SAS session encoding is not UTF-8, you can override the encoding
of a JSON output file only if the current SAS session encoding is compatible
with US-ASCII and all strings written to the JSON output file contain only low-
range Latin1 characters (that is, code points 0-127).
PROC JSON Statement 1363
For example, the following code exports a JSON output file that uses the Unicode
UTF-16BE character set encoding. The current SAS session encoding is Wlatin1. The
ENCODING= option in the FILENAME statement tells PROC JSON to transcode the
data from Wlatin1 to the specified Unicode UTF-16BE form when writing to the
external file.
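A minimal sketch of such a program, assuming a hypothetical output file in the WORK directory and a session encoding that meets the conditions above. SASHELP.CLASS contains only low-range characters. The ENCODING= option on the FILENAME statement sets the encoding of the output file.

```sas
/* ENCODING= on the FILENAME statement sets the output file's encoding */
filename out16 "%sysfunc(pathname(work))/class_utf16.json" encoding="utf-16be";

proc json out=out16;
   export sashelp.class;
run;
```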
Statement      Task                                                 Example
PROC JSON      Export a SAS data set to a JSON file                 Ex. 1, Ex. 2, Ex. 3, Ex. 4,
                                                                    Ex. 5, Ex. 6, Ex. 7, Ex. 8
EXPORT         Identify the SAS data set to export and control      Ex. 1, Ex. 2, Ex. 3, Ex. 5,
               the resulting output                                 Ex. 6, Ex. 7, Ex. 8
WRITE VALUES   Write one or more values to the output file          Ex. 4, Ex. 5, Ex. 6, Ex. 8
WRITE OPEN     Open and nest a JSON container in the output file    Ex. 4, Ex. 5, Ex. 6, Ex. 8
WRITE CLOSE    Close a JSON container that is open in the output    Ex. 4, Ex. 5, Ex. 6, Ex. 8
               file
Examples: “Example 1: Exporting a JSON File Using Default Options” on page 1380
“Example 2: Exporting Data to a Top-Level JSON Array Container” on page 1381
“Example 3: Suppressing SAS Variable Names in Observation Data When Using the
EXPORT Statement” on page 1382
“Example 4: Writing JSON Output without Exporting a SAS Data Set” on page 1384
“Example 5: Writing Values and Exporting Data in the Same Program” on page 1387
“Example 6: Writing Values and Controlling Containers in Exported Data” on page
1391
“Example 7: Applying SAS Formats to the Resulting Output” on page 1395
“Example 8: Exporting Multiple SAS Data Sets to a JSON File” on page 1398
Syntax
PROC JSON OUT=fileref | "external-file" <options>;
Required Argument
OUT=fileref | "external-file"
identifies the JSON output file.
fileref
specifies the SAS fileref that is assigned to the JSON output file. To assign a
fileref, use the FILENAME statement.
"external-file"
is the physical location of the JSON output file. Include the complete
pathname and the filename. Enclose the physical name in single or double
quotation marks. The maximum length is 200 characters.
Optional Arguments
FMTCHARACTER | NOFMTCHARACTER
determines whether to apply a character SAS format to the resulting output if a
character SAS format is associated with a SAS data set variable.
Default NOFMTCHARACTER
FMTDATETIME | NOFMTDATETIME
determines whether to apply a date, time, or datetime SAS format to the
resulting output if a date, time, or datetime SAS format is associated with a SAS
data set variable. Applying a SAS format makes the date and time values in the
resulting JSON output more readable.
Default FMTDATETIME
FMTNUMERIC | NOFMTNUMERIC
determines whether to apply a numeric SAS format to the resulting output if a
numeric SAS format is associated with a SAS data set variable.
Default NOFMTNUMERIC
Restriction Only the SAS formats BESTw., Ew., and w.d write a JSON number
to the output file. All other numeric SAS formats result in a JSON
string.
Requirement FMTNUMERIC applies numeric SAS formats. For date, time, and
datetime SAS formats, use the FMTDATETIME option.
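A minimal sketch of the restriction above, assuming a hypothetical data set WORK.PRICES and a hypothetical fileref PJ. The same value is stored in two variables with different formats. With FMTNUMERIC, the DOLLAR10.2 format should produce a JSON string. The 8.2 format is a w.d format, so it should produce a JSON number.

```sas
data work.prices;
   length Item $8;
   Item  = 'Pen';
   Price = 1234.5;
   Cost  = 1234.5;
   format Price dollar10.2 Cost 8.2;
run;

filename pj "%sysfunc(pathname(work))/prices.json";

/* Apply the associated numeric formats to the exported values */
proc json out=pj fmtnumeric nosastags;
   export work.prices;
run;
```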
KEYS | NOKEYS
determines whether exported observations are written as JSON objects or as
JSON arrays.
A JSON object stores SAS variable values from observations as key-value pairs.
The SAS variable name is the key. A JSON array stores variable values only.
Default KEYS
Interaction You can specify NOKEYS in the PROC JSON statement, the
EXPORT statement, or both. If the option is specified in both
statements, the EXPORT statement specification takes precedence.
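A minimal side-by-side sketch, assuming hypothetical filerefs KV and ARR in the WORK directory. With the default KEYS, each exported observation should be a JSON object such as {"Name":"Joyce",...}. With NOKEYS, each observation should be an array of values only, as in the nested-array output shown earlier in this chapter.

```sas
filename kv  "%sysfunc(pathname(work))/keys.json";
filename arr "%sysfunc(pathname(work))/nokeys.json";

/* Default KEYS: observations are written as objects of key-value pairs */
proc json out=kv;
   export sashelp.class(where=(age=11)) / nosastags;
run;

/* NOKEYS: observations are written as arrays of values */
proc json out=arr nokeys;
   export sashelp.class(where=(age=11)) / nosastags;
run;
```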
PRETTY | NOPRETTY
determines how to format the JSON output. PRETTY creates a more human-readable
format that uses indentation to illustrate the JSON container structure.
NOPRETTY writes the output on a single line.
Default NOPRETTY
Restriction You can specify PRETTY | NOPRETTY in the PROC JSON statement
only.
SASTAGS | NOSASTAGS
determines whether to include or suppress SAS metadata when using the
EXPORT statement. The metadata consists of the SAS export version, exported
SAS data set name, and any non-default option specification, such as PRETTY.
When EXPORT is the first statement in the procedure request, the top-level
container for the exported data is a JSON object container. The SAS metadata
precedes the exported data. When NOSASTAGS is specified, the top-level
container is a JSON array container and the SAS metadata is suppressed.
See the following topics for information about what happens when the EXPORT
statement is preceded by a WRITE OPEN statement in the PROC JSON request:
n WRITE OPEN ARRAY — “Example 5: Writing Values and Exporting Data in
the Same Program” on page 1387.
n WRITE OPEN OBJECT — “Example 8: Exporting Multiple SAS Data Sets to a
JSON File” on page 1398.
Default SASTAGS
Note To modify the format of the exported observations, use the “KEYS |
NOKEYS” option.
SCAN | NOSCAN
determines whether PROC JSON scans and encodes input strings to ensure that
only characters that are acceptable are exported to the JSON output file.
Default SCAN
TRIMBLANKS | NOTRIMBLANKS
determines whether to remove or retain trailing blanks from the end of
character data in the JSON output. Only space characters are removed.
Default TRIMBLANKS
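A minimal Python sketch of the documented TRIMBLANKS behavior, assuming the value is an ordinary string: only trailing space characters are removed, so a trailing tab survives and leading blanks are untouched (compare the leading blank in the Model values of Example 6). The function name trim_trailing_blanks is hypothetical.

```python
def trim_trailing_blanks(value: str) -> str:
    # Remove trailing space characters only; tabs and leading blanks are kept.
    return value.rstrip(" ")


print(repr(trim_trailing_blanks("Joyce   ")))
print(repr(trim_trailing_blanks(" Titan King Cab XE ")))
```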
EXPORT Statement
Identifies the SAS data set to be exported and controls the resulting output.
Alias: EX
Interaction: If the EXPORT statement is the first statement after the PROC JSON statement,
the top-level container is a JSON object. However, if the NOSASTAGS option is
specified in either the PROC JSON or the EXPORT statement, the top-level
container is a JSON array. PROC JSON automatically closes the implicitly opened
top-level container.
Notes: You can export multiple SAS data sets to the JSON output file by submitting
multiple EXPORT statements.
The resulting JSON output uses the Unicode encoding form UTF-8. You cannot
override the encoding in the output file with the ENCODING= data set option in the
EXPORT statement. For information about overriding the encoding, see “JSON
Output File Encoding” on page 1362.
Examples: “Example 1: Exporting a JSON File Using Default Options” on page 1380
“Example 2: Exporting Data to a Top-Level JSON Array Container” on page 1381
“Example 3: Suppressing SAS Variable Names in Observation Data When Using the
EXPORT Statement” on page 1382
“Example 5: Writing Values and Exporting Data in the Same Program” on page 1387
“Example 6: Writing Values and Controlling Containers in Exported Data” on page
1391
“Example 7: Applying SAS Formats to the Resulting Output” on page 1395
“Example 8: Exporting Multiple SAS Data Sets to a JSON File” on page 1398
Syntax
EXPORT <libref.>SAS-data-set <(SAS-data-set-options)> </options>;
Required Argument
<libref.>SAS-data-set
identifies the SAS data set to be exported with either a one- or two-level SAS
name (library and member name). If you specify a one-level name, by default,
the JSON procedure uses either the User library (if assigned) or the Work
library.
Optional Arguments
(SAS-data-set-options)
specifies SAS data set options that apply to the input SAS data set. For
example, if the data set that you are exporting has an assigned password, you
can use the ALTER=, PW=, READ=, or WRITE= data set options. To export a
subset of data that meets a specified condition, you can use the WHERE=
option. For information about SAS data set options, see SAS Data Set Options:
Reference.
FMTCHARACTER | NOFMTCHARACTER
determines whether to apply a character SAS format to the resulting output if a
character SAS format is associated with a SAS data set variable.
Default NOFMTCHARACTER
FMTDATETIME | NOFMTDATETIME
determines whether to apply a date, time, or datetime SAS format to the
resulting output if a date, time, or datetime SAS format is associated with a SAS
data set variable. Applying a SAS format makes the date and time values in the
resulting JSON output more readable.
Default FMTDATETIME
FMTNUMERIC | NOFMTNUMERIC
determines whether to apply a numeric SAS format to the resulting output if a
numeric SAS format is associated with a SAS data set variable.
Default NOFMTNUMERIC
Restriction Only the SAS formats BESTw., Ew., and w.d write a JSON number
to the output file. All other numeric SAS formats result in a JSON
string.
Requirement FMTNUMERIC applies numeric SAS formats. For date, time, and
datetime SAS formats, use the FMTDATETIME option.
KEYS | NOKEYS
determines whether exported observations are written as JSON objects or as
JSON arrays.
A JSON object stores SAS variable values from observations as key-value pairs.
The variable names are the keys. A JSON array stores variable values only.
Default KEYS
Interaction You can specify NOKEYS in the PROC JSON statement, the
EXPORT statement, or both. If the option is specified in both
statements, the EXPORT statement specification takes precedence.
SASTAGS | NOSASTAGS
determines whether to include or suppress SAS metadata when using the
EXPORT statement. The metadata consists of the SAS export version, exported
SAS data set name, and any non-default option specification, such as PRETTY.
When EXPORT is the first statement in the procedure request, the top-level
container for the exported data is a JSON object container. The SAS metadata
precedes the exported data. When NOSASTAGS is specified, the top-level
container is a JSON array container and the SAS metadata is suppressed.
See the following topics for information about what happens when the EXPORT
statement is preceded by a WRITE OPEN statement in the PROC JSON request:
n WRITE OPEN ARRAY — “Example 5: Writing Values and Exporting Data in
the Same Program” on page 1387.
n WRITE OPEN OBJECT — “Example 8: Exporting Multiple SAS Data Sets to a
JSON File” on page 1398.
Default SASTAGS
Note To modify the format of the exported observations, use the “KEYS |
NOKEYS” option.
SCAN | NOSCAN
determines whether PROC JSON scans and encodes input strings to ensure that
only characters that are acceptable are exported to the JSON output.
Default SCAN
TABLENAME="name"
specifies a name for the exported SAS data set. The name is exported as SAS
metadata in the JSON output file. Enclose the name in single or double
quotation marks.
TRIMBLANKS | NOTRIMBLANKS
determines whether to remove or retain trailing blanks from the end of
character data in the JSON output. Only space characters are removed.
Default TRIMBLANKS
WRITE VALUES Statement
Alias: WV
Interaction: If the WRITE VALUES statement is the first statement after the PROC JSON
statement, PROC JSON opens a JSON object as the top-level container. PROC
JSON automatically closes the implicitly opened top-level container.
Note: Specifying multiple values in one WRITE VALUES statement is equivalent to
submitting multiple WRITE VALUES statements with only one value each. Only the
order of the values is significant.
Examples: “Example 4: Writing JSON Output without Exporting a SAS Data Set” on page 1384
“Example 5: Writing Values and Exporting Data in the Same Program” on page 1387
“Example 6: Writing Values and Controlling Containers in Exported Data” on page
1391
“Example 8: Exporting Multiple SAS Data Sets to a JSON File” on page 1398
Syntax
WRITE VALUES value(s) </options>;
Required Argument
value(s)
specifies one or more values to write to the JSON output file. Separate values
with a blank space. A value can be one of the following:
n a string, which can be enclosed in single or double quotation marks. If the
string is enclosed in quotation marks, there are no restrictions regarding
content or length. However, if the string is not enclosed in quotation marks,
the following rules apply:
o The length of the string cannot exceed 256 bytes.
o The string must begin with a letter of the Latin alphabet (A–Z, a–z) or
an underscore. Subsequent characters can be letters of the Latin
alphabet, numerals, or underscores.
o The string cannot contain blanks or special characters except for the
underscore. A string can contain mixed-case letters.
n a number represented in integer, floating point, or exponential format.
n NULL | N.
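The rules for unquoted strings in the list above can be checked with a short Python sketch. It assumes the string is ASCII, so that its length in bytes equals its length in characters, and it deliberately ignores keywords such as NULL or true, which PROC JSON treats specially. The name is_valid_unquoted_string is hypothetical.

```python
import re

# Hypothetical check of the documented rules for an unquoted WRITE VALUES
# string: at most 256 bytes, first character a Latin letter or underscore,
# remaining characters Latin letters, digits, or underscores.
_UNQUOTED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_unquoted_string(s: str) -> bool:
    return len(s.encode("utf-8")) <= 256 and _UNQUOTED.fullmatch(s) is not None


for candidate in ["Finished", "_tmp1", "1abc", "End of samples"]:
    print(candidate, is_valid_unquoted_string(candidate))
```

Strings that fail the check, such as "End of samples", must be enclosed in quotation marks.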
For example, the statement write values "success" true; causes the
following results to be written to the JSON output file:
n "success": true if the current container is a JSON object
Requirement When writing values to a JSON object container, the name portion
of the key-value pair must be a string.
Optional Arguments
SCAN | NOSCAN
determines whether PROC JSON scans and encodes input strings to ensure that
only characters that are acceptable are exported to the JSON output.
Default SCAN
TRIMBLANKS | NOTRIMBLANKS
determines whether to remove or retain trailing blanks from the end of
character data in the JSON output. Only space characters are removed.
Default TRIMBLANKS
WRITE OPEN Statement
Alias: WO
Interactions: If the WRITE OPEN statement is the first statement after the PROC JSON
statement, the WRITE OPEN statement establishes the top-level container.
Submit the WRITE CLOSE statement for containers that you explicitly open with
the WRITE OPEN statement.
Examples: “Example 6: Writing Values and Controlling Containers in Exported Data” on page
1391
“Example 8: Exporting Multiple SAS Data Sets to a JSON File” on page 1398
Syntax
WRITE OPEN type;
Required Argument
type
specifies the type of JSON container:
ARRAY
specifies that the JSON container is an array, which collects a list of values.
An example statement is write open array;.
Alias A
OBJECT
specifies that the JSON container is an object, which collects key-value
pairs. An example statement is write open object;.
Alias O
WRITE CLOSE Statement
Alias: WC
Restriction: The WRITE CLOSE statement cannot be the first statement after the PROC JSON
statement.
Interaction: The WRITE CLOSE statement closes the most recently opened container of either
type that was explicitly opened with the WRITE OPEN statement. You should
submit the WRITE CLOSE statement for containers only if you explicitly opened
the container with the WRITE OPEN statement.
Examples: “Example 6: Writing Values and Controlling Containers in Exported Data” on page
1391
“Example 8: Exporting Multiple SAS Data Sets to a JSON File” on page 1398
Syntax
WRITE CLOSE;
Exporting Data
The JSON procedure enables you to export one or more SAS data sets to a JSON
output file. In the procedure statements, you specify the name of the JSON output
file and the name of the SAS data set to be exported. For example, the following
PROC JSON code writes the data in a SAS data set named Sashelp.Class to a JSON
output file named Output.json.
proc json out="C:\Users\sasabc\JSON\Output.json";
export sashelp.class;
run;
When the WRITE VALUES statement is the first statement specified under the
PROC JSON statement, the statement implicitly creates a JSON object container to
collect the specified values. In this way, you can use the WRITE VALUES statement
in PROC JSON to create your own JSON output file without exporting a SAS data
set. The specified values are collected in this object container as key-value
pairs. For example:
proc json out="example.json";
write values "container" "object";
write values "created" "implicitly";
run;
{"container":"object","created":"implicitly"}
The WRITE VALUES statement cannot directly add values to a JSON container that
contains data from an exported SAS data set. The EXPORT statement implicitly
opens and automatically closes a JSON object. You cannot write values to a closed
JSON container with the WRITE VALUES statement. To add values to a JSON
object that includes exported data, you must precede the WRITE VALUES
statement with a WRITE OPEN statement.
The WRITE OPEN statement explicitly opens a JSON container. It also enables you
to specify the JSON container type: object or array.
When the WRITE VALUES statement follows a WRITE OPEN OBJECT statement,
then the first value in the pair is considered to be a key name. The second value is
the key’s value. For example:
proc json out="example.json";
write open object;
write values "container" "object";
write values "created" "explicitly";
write close;
run;
{"container":"object","created":"explicitly"}
When the WRITE VALUES statement follows a WRITE OPEN ARRAY statement,
then the WRITE VALUES statement creates a comma-separated list of values in
the JSON array container.
proc json out="example.out";
write open array;
write values "container" "array";
write values "created" "explicitly";
write close;
run;
["container","array","created","explicitly"]
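The pairing rules described above can be modeled with a small stack of open containers. The following Python sketch is a hypothetical model of the WRITE OPEN, WRITE VALUES, and WRITE CLOSE statements, not the PROC JSON implementation: in an object container, values alternate between a key (which must be a string) and its value, a nested container becomes the value of the pending key, and in an array container every value is appended. The class name JsonWriterModel and its methods are assumptions made for illustration.

```python
import json


class JsonWriterModel:
    """Hypothetical model of WRITE OPEN, WRITE VALUES, and WRITE CLOSE."""

    def __init__(self):
        self.root = None
        self.stack = []  # each frame is [container, pending_key]

    def _add(self, value):
        if not self.stack:
            raise ValueError("no open container")
        frame = self.stack[-1]
        container, pending_key = frame
        if isinstance(container, list):
            container.append(value)  # arrays collect a list of values
        elif pending_key is None:
            if not isinstance(value, str):
                raise ValueError("the name in a key-value pair must be a string")
            frame[1] = value  # remember the key until its value arrives
        else:
            container[pending_key] = value
            frame[1] = None

    def write_open(self, kind):
        if kind not in ("object", "array"):
            raise ValueError("kind must be 'object' or 'array'")
        new = {} if kind == "object" else []
        if self.stack:
            self._add(new)  # a nested container is a value in its parent
        else:
            self.root = new  # the first container is the top-level container
        self.stack.append([new, None])

    def write_values(self, *values):
        for value in values:
            self._add(value)

    def write_close(self):
        if not self.stack:
            raise ValueError("no open container to close")
        self.stack.pop()  # closes the most recently opened container

    def dumps(self):
        return json.dumps(self.root, separators=(",", ":"))


# Replays the WRITE OPEN OBJECT and WRITE OPEN ARRAY examples above.
obj = JsonWriterModel()
obj.write_open("object")
obj.write_values("container", "object")
obj.write_values("created", "explicitly")
obj.write_close()

arr = JsonWriterModel()
arr.write_open("array")
arr.write_values("container", "array")
arr.write_values("created", "explicitly")
arr.write_close()

# Replays the nested containers of Example 4.
nested = JsonWriterModel()
nested.write_open("object")
nested.write_values("Nested object sample")
nested.write_open("object")
nested.write_values("Comment", "In a nested object")
nested.write_close()
nested.write_values("Nested array sample")
nested.write_open("array")
nested.write_open("array")
nested.write_values("In a nested array")
nested.write_values(1, True, None)
nested.write_close()
nested.write_close()
nested.write_values("Finished", "End of samples")
nested.write_close()
```

Replaying the statements through this model yields the same JSON as the documented output, which is a convenient way to reason about where a value lands before running the SAS code.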
Use the WRITE OPEN statement when you want to add information to the JSON
file that contains exported SAS data or when you want to nest JSON containers.
For more information, see “Example 4: Writing JSON Output without Exporting a
SAS Data Set” on page 1384, “Example 5: Writing Values and Exporting Data in the
Same Program” on page 1387, “Example 6: Writing Values and Controlling
Containers in Exported Data” on page 1391, and “Example 8: Exporting Multiple
SAS Data Sets to a JSON File” on page 1398.
The following table lists the options and the statements in which each option is
supported.
Table 38.2 Statement Option Availability in the PROC JSON, EXPORT, and WRITE VALUES
Statements
Option                          PROC JSON   EXPORT   WRITE VALUES
FMTCHARACTER | NOFMTCHARACTER   X           X
FMTDATETIME | NOFMTDATETIME     X           X
FMTNUMERIC | NOFMTNUMERIC       X           X
KEYS | NOKEYS                   X           X
PRETTY | NOPRETTY               X
SASTAGS | NOSASTAGS             X           X
SCAN | NOSCAN                   X           X        X
TABLENAME="name"                            X
TRIMBLANKS | NOTRIMBLANKS       X           X        X
Note: Options specified in the PROC JSON statement apply for the duration of the
procedure. Options specified in the EXPORT and WRITE VALUES statements apply
only to that statement. If an option is specified in more than one statement, the
specification in the EXPORT or WRITE VALUES statement takes precedence.
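The precedence rule in the note above can be sketched as a layered merge in Python: documented defaults first, then PROC JSON statement options, then the options of the individual statement. The dictionary of defaults is taken from the option descriptions in this chapter; the function name effective_options is hypothetical, and the sketch does not enforce which statements accept which options.

```python
# Documented defaults for options accepted by both PROC JSON and EXPORT.
DEFAULTS = {
    "FMTCHARACTER": False,
    "FMTDATETIME": True,
    "FMTNUMERIC": False,
    "KEYS": True,
    "SASTAGS": True,
    "SCAN": True,
    "TRIMBLANKS": True,
}


def effective_options(proc_options: dict, statement_options: dict) -> dict:
    """Resolve options: defaults, then PROC JSON, then the individual statement."""
    merged = dict(DEFAULTS)
    merged.update(proc_options)       # applies for the duration of the procedure
    merged.update(statement_options)  # overrides for this statement only
    return merged


# As in Example 8: NOKEYS on PROC JSON, KEYS on the second EXPORT statement.
print(effective_options({"KEYS": False}, {})["KEYS"])
print(effective_options({"KEYS": False}, {"KEYS": True})["KEYS"])
```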
Example 1: Exporting a JSON File Using Default Options
Details
This PROC JSON example exports a subset of the Sashelp.Class data set to a JSON
output file. The PRETTY option in the PROC JSON statement returns the output in
a format that uses indentation to illustrate the default JSON container structure.
Otherwise, the default export options are in effect.
Program
Specify the JSON output file and the EXPORT statement. The PROC JSON
statement specifies the physical location of the JSON output file with the complete
pathname and JSON filename. When the .json extension is omitted, the output is
written to a text file.
proc json out="C:\Users\sasabc\JSON\DefaultOutput.json" pretty;
export sashelp.class (where=(age=11));
run;
The values in each object container are key-value pairs. The set of observations
is in a JSON array container ([ ]). SAS metadata is included at the top of the
top-level container.
{
"SASJSONExport": "1.0 PRETTY",
"SASTableData+CLASS": [
{
"Name": "Joyce",
"Sex": "F",
"Age": 11,
"Height": 51.3,
"Weight": 50.5
},
{
"Name": "Thomas",
"Sex": "M",
"Age": 11,
"Height": 57.5,
"Weight": 85
}
]
}
Example 2: Exporting Data to a Top-Level JSON Array Container
Details
This example exports a subset of the Sashelp.Class data set to a JSON output file
and specifies to write the exported data to a JSON array container.
Program
proc json out="C:\Users\sasabc\JSON\NosastagsOutput.json" pretty;
export sashelp.class (where=(age=11)) / nosastags;
run;
Program Description
Specify the JSON output file. The PROC JSON statement specifies the physical
location of the JSON output file with the complete pathname and a JSON filename.
The PRETTY option creates a more readable format.
proc json out="C:\Users\sasabc\JSON\NosastagsOutput.json" pretty;
Identify the SAS data set to be exported and control the top-level JSON
container. NOSASTAGS in the EXPORT statement suppresses SAS metadata and
specifies that the top-level container is an array container. The NOSASTAGS option
can be specified on the PROC JSON statement or on the EXPORT statement.
export sashelp.class (where=(age=11)) / nosastags;
run;
[
{
"Name": "Joyce",
"Sex": "F",
"Age": 11,
"Height": 51.3,
"Weight": 50.5
},
{
"Name": "Thomas",
"Sex": "M",
"Age": 11,
"Height": 57.5,
"Weight": 85
}
]
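Because SASTAGS and NOSASTAGS change the top-level container, code that reads the output must handle both shapes. The following Python sketch, which assumes a single exported data set, returns the observations from either form. The function name exported_rows is hypothetical; the JSON strings are compact copies of the output of Example 1 and Example 2.

```python
import json


def exported_rows(document):
    """Return the exported observations from parsed PROC JSON output.

    Assumes exactly one exported data set. With the default SASTAGS setting,
    the top level is an object and the rows sit under a key that starts with
    "SASTableData+"; with NOSASTAGS, the top level is the array of rows.
    """
    if isinstance(document, list):
        return document
    for key, value in document.items():
        if key.startswith("SASTableData+"):
            return value
    raise KeyError("no SASTableData+ key in document")


# Compact copies of the Example 1 (default) and Example 2 (NOSASTAGS) output.
default_output = json.loads(
    '{"SASJSONExport": "1.0 PRETTY", "SASTableData+CLASS": ['
    '{"Name": "Joyce", "Sex": "F", "Age": 11, "Height": 51.3, "Weight": 50.5},'
    '{"Name": "Thomas", "Sex": "M", "Age": 11, "Height": 57.5, "Weight": 85}]}'
)
nosastags_output = json.loads(
    '[{"Name": "Joyce", "Sex": "F", "Age": 11, "Height": 51.3, "Weight": 50.5},'
    '{"Name": "Thomas", "Sex": "M", "Age": 11, "Height": 57.5, "Weight": 85}]'
)

print([row["Name"] for row in exported_rows(default_output)])
print([row["Name"] for row in exported_rows(nosastags_output)])
```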
Example 3: Suppressing SAS Variable Names in Observation Data When Using the EXPORT Statement
Details
This example exports a subset of the Sashelp.Class data set to a JSON output file
and specifies to write the exported data set observations as an array. Array
containers store variable values only.
Program
proc json out="C:\Users\sasabc\JSON\NokeysOutput.json" pretty;
export sashelp.class (where=(age=11)) / nokeys;
run;
Program Description
Specify the JSON output file. The PROC JSON statement specifies the physical
location of the JSON output file with the complete pathname and a JSON filename.
The PRETTY option creates a more readable format.
proc json out="C:\Users\sasabc\JSON\NokeysOutput.json" pretty;
Identify the SAS data set to be exported and control the observation container.
NOKEYS in the EXPORT statement writes each observation as an array container of
values instead of as an object container of key-value pairs. The NOKEYS option can
be specified on the PROC JSON statement or on the EXPORT statement.
export sashelp.class (where=(age=11)) / nokeys;
run;
{
"SASJSONExport": "1.0 NOKEYS PRETTY",
"SASTableData+CLASS": [
[
"Joyce",
"F",
11,
51.3,
50.5
],
[
"Thomas",
"M",
11,
57.5,
85
]
]
}
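NOKEYS output carries no variable names, so a consumer has to supply the column order from elsewhere. The following Python sketch rebuilds key-value rows from the NOKEYS output above. It assumes the variable order matches the KEYS output of Example 1; the names nokeys_output, columns, and rows are illustrative only.

```python
import json

# Compact copy of the NOKEYS output shown above.
nokeys_output = json.loads(
    '{"SASJSONExport": "1.0 NOKEYS PRETTY", "SASTableData+CLASS": ['
    '["Joyce", "F", 11, 51.3, 50.5], ["Thomas", "M", 11, 57.5, 85]]}'
)

# Assumption: the variable order is the same as in the KEYS output of Example 1.
columns = ["Name", "Sex", "Age", "Height", "Weight"]

# Pair each value with its variable name to recover key-value objects.
rows = [dict(zip(columns, values)) for values in nokeys_output["SASTableData+CLASS"]]
for row in rows:
    print(row)
```

Keeping the variable names (the default KEYS) avoids this dependency on column order, at the cost of a larger output file.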
Example 4: Writing JSON Output without Exporting a SAS Data Set
Details
This PROC JSON example illustrates how to write JSON output from SAS without
exporting data from a SAS data set. This gives you complete control over the
content of the JSON output, which enables you to produce arbitrary JSON output.
Program
proc json out="C:\Users\sasabc\JSON\WriteOpenOutput.json" pretty;
write open object;
write values "Nested object sample";
write open object;
write values "Comment" "In a nested object";
write close;
write values "Nested array sample";
write open array;
write open array;
write values "In a nested array";
write values 1 true null;
write close;
write close;
write values "Finished" "End of samples";
write close;
run;
Program Description
Specify the JSON output file. The PRETTY option creates the output in a more
readable format.
proc json out="C:\Users\sasabc\JSON\WriteOpenOutput.json" pretty;
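Open a JSON object container and write a value. The WRITE OPEN OBJECT
statement explicitly opens a JSON object container ({ }) as the top-level container.
The WRITE VALUES statement writes a string to the object container. Because the
container is an object, this string becomes the key for the value that follows,
which here is the nested object container.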
write open object;
write values "Nested object sample";
Nest an object container with values, and then close the nested object container.
The WRITE OPEN OBJECT statement explicitly opens a second object container in
the top-level container. The WRITE VALUES statement writes two strings in the
second-level object container. The WRITE CLOSE statement closes the most
recently opened container.
write open object;
write values "Comment" "In a nested object";
write close;
Write a value to the JSON output file, open an array container, nest an array
container with values, and then close the two array containers. The WRITE
VALUES statement writes a string to the top-level container, where it becomes the
key for the array container that follows. The WRITE OPEN ARRAY statements create
an array container in the top-level container and then nest a second array
container in the first one. The WRITE VALUES statements write a string, a number,
the Boolean value TRUE, and the NULL keyword to the second-level array container.
Two WRITE CLOSE statements close the array containers.
write values "Nested array sample";
write open array;
write open array;
write values "In a nested array";
write values 1 true null;
write close;
write close;
Write a final value. The WRITE VALUES statement writes two strings to the top-
level container.
write values "Finished" "End of samples";
Close the top-level container. The WRITE CLOSE statement closes the remaining
open container, which is the top-level container.
write close;
run;
{
"Nested object sample": {
"Comment": "In a nested object"
},
"Nested array sample": [
[
"In a nested array",
1,
true,
null
]
],
"Finished": "End of samples"
}
Example 5: Writing Values and Exporting Data in the Same Program
Details
This example writes values and performs three exports of the same data to show
the behavior of the default export settings, the NOSASTAGS option, and the
NOKEYS option. The exported data is inserted into a top-level array container, to
show the effect of NOSASTAGS on the exported data.
Program
proc json out="C:\Users\sasabc\JSON\WriteAndExport.json" pretty;
write open array;
write values "level" 1;
write values "container" "array";
write values "created" "explicitly";
Program Description
Specify the JSON output file in the PROC JSON statement. The PRETTY option
creates the output in a more readable format.
proc json out="C:\Users\sasabc\JSON\WriteAndExport.json" pretty;
Open a top-level JSON array container and specify values. The WRITE OPEN
ARRAY statement explicitly opens a JSON array container ([ ]) as the top-level
container. WRITE VALUES statements specify a string, a numeric value, and
additional strings as values for the array.
write open array;
write values "level" 1;
write values "container" "array";
write values "created" "explicitly";
[
"level",
1,
"container",
"array",
"created",
"explicitly",
"Default Export",
"SASJSONExport",
"1.0 PRETTY",
"SASTableData+CLASS",
[
{
"Name": "Joyce",
"Sex": "F",
"Age": 11,
"Height": 51.3,
"Weight": 50.5
},
{
"Name": "Thomas",
"Sex": "M",
"Age": 11,
"Height": 57.5,
"Weight": 85
}
],
"Export with NOSASTAGS Option",
{
"Name": "Joyce",
"Sex": "F",
"Age": 11,
"Height": 51.3,
"Weight": 50.5
},
{
"Name": "Thomas",
"Sex": "M",
"Age": 11,
"Height": 57.5,
"Weight": 85
},
"Export with NOSASTAGS and NOKEYS Option",
[
"Joyce",
"F",
11,
51.3,
50.5
],
[
"Thomas",
"M",
11,
57.5,
85
]
]
Example 6: Writing Values and Controlling Containers in Exported Data
Details
This PROC JSON example illustrates how to write additional values to a JSON
output file and control and nest JSON containers. The example exports a subset of
the Sashelp.Cars data set to the JSON output file.
Program
%let vehicleType=Truck;
%let minCost=26000;
proc json out="C:\Users\sasabc\JSON\WriteOpenArrayOutput.json"
nosastags pretty;
write open array;
write values "Vehicles";
write open array;
write values "&vehicleType";
write open array;
write values "Greater than $&minCost";
/*********** Asian ***********************/
%let originator=Asia;
write open object;
write values "&originator";
write open array;
export sashelp.cars(where=((origin = "&originator") and
(type = "&vehicleType") and
(MSRP > &minCost) )
keep=make model type origin MSRP);
write close; /* data values */
write close; /* Asia */
/*********** European ***********************/
%let originator=Europe;
write open object;
write values "&originator";
write open array;
export sashelp.cars(where=((origin = "&originator") and
(type = "&vehicleType") and
(MSRP > &minCost) )
keep=make model type origin MSRP);
write close; /* data values */
write close; /* Europe */
/*********** American ***********************/
%let originator=USA;
write open object;
write values "&originator";
write open array;
export sashelp.cars(where=((origin = "&originator") and
(type = "&vehicleType") and
(MSRP > &minCost) )
keep=make model type origin MSRP);
write close; /* data values */
write close; /* USA */
write close; /* expensive */
write close; /* vehicleType */
write close; /* cars */
run;
Program Description
Assign macro variables. The %LET statements create macro variables and assign
values to be used throughout the code.
%let vehicleType=Truck;
%let minCost=26000;
Specify the JSON output file and control the resulting output. The PROC JSON
statement specifies the physical location of the JSON output file with the complete
pathname and filename. The NOSASTAGS option suppresses the SAS metadata
and the PRETTY option creates a more readable format.
proc json out="C:\Users\sasabc\JSON\WriteOpenArrayOutput.json"
nosastags pretty;
[
"Vehicles",
[
"Truck",
[
"Greater than $26000",
{
"Asia": [
{
"Make": "Nissan",
"Model": " Titan King Cab XE",
"Type": "Truck",
"Origin": "Asia",
"MSRP": 26650
}
]
},
{
"Europe": [
]
},
{
"USA": [
{
"Make": "Cadillac",
"Model": " Escalade EXT",
"Type": "Truck",
"Origin": "USA",
"MSRP": 52975
},
{
"Make": "Chevrolet",
"Model": " Avalanche 1500",
"Type": "Truck",
"Origin": "USA",
"MSRP": 36100
},
{
"Make": "Chevrolet",
"Model": " Silverado SS",
"Type": "Truck",
"Origin": "USA",
"MSRP": 40340
},
{
"Make": "Chevrolet",
"Model": " SSR",
"Type": "Truck",
"Origin": "USA",
"MSRP": 41995
},
{
"Make": "Ford",
"Model": " F-150 Supercab Lariat",
"Type": "Truck",
"Origin": "USA",
"MSRP": 33540
},
{
"Make": "GMC",
"Model": " Sierra HD 2500",
"Type": "Truck",
"Origin": "USA",
"MSRP": 29322
}
]
}
]
]
]
Example 7: Applying SAS Formats to the Resulting Output
Details
This PROC JSON example exports a SAS data set named Work.Formats that
contains variables with associated SAS formats. The resulting JSON output file
applies the SAS formats, which makes the output values more readable.
Program
data formats;
input name $ idnumber $ salary hiredate mmddyy10.;
format salary dollar7. hiredate date9.;
datalines;
Brad 0755 21163 9/24/2012
Lindzey 0767 34321 9/04/2012
;
Program Description
Create a SAS data set with variables that have associated SAS formats. The DATA
step creates a SAS data set named Work.Formats with four variables. The FORMAT
statement associates the DOLLAR7. SAS numeric format with the variable Salary
and the SAS date format DATE9. with the variable HireDate.
data formats;
input name $ idnumber $ salary hiredate mmddyy10.;
format salary dollar7. hiredate date9.;
datalines;
Brad 0755 21163 9/24/2012
Lindzey 0767 34321 9/04/2012
;
Specify the JSON output file and control the resulting output file. The PROC
JSON statement specifies the physical location of the JSON output file with the
complete pathname and filename and includes the PRETTY option to create a more
readable format.
proc json out="C:\Users\sasabc\JSON\FormatsOutput.json" pretty;
Identify the SAS data set to be exported. The EXPORT statement specifies the
SAS data set name. The FMTDATETIME option is in effect by default and applies the
date SAS format that is associated with the HireDate variable. The FMTNUMERIC
option is specified to apply the numeric SAS format DOLLAR7. that is associated
with the Salary variable.
export work.formats / fmtnumeric;
run;
{
"SASJSONExport": "1.0 PRETTY FMTNUMERIC",
"SASTableData+FORMATS": [
{
"name": "Brad",
"idnumber": "0755",
"salary": "$21,163",
"hiredate": "24SEP2012"
},
{
"name": "Lindzey",
"idnumber": "0767",
"salary": "$34,321",
"hiredate": "04SEP2012"
}
]
}
If the EXPORT statement specified the option NOFMTDATETIME and did not
specify the option FMTNUMERIC, the resulting JSON output file would include
Salary and HireDate values that are less readable.
{
"SASJSONExport": "1.0 PRETTY NOFMTDATETIME",
"SASTableData+FORMATS": [
{
"name": "Brad",
"idnumber": "0755",
"salary": 21163,
"hiredate": 19260
},
{
"name": "Lindzey",
"idnumber": "0767",
"salary": 34321,
"hiredate": 19240
}
]
}
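The unformatted HireDate values above are raw SAS date values, which count days from January 1, 1960. The following Python sketch converts them to calendar dates and renders them in the ddMONyyyy style of the DATE9. format. The helper names are hypothetical, and like_dollar only approximates DOLLAR7. for whole-dollar amounts (it does not model the format width).

```python
from datetime import date, timedelta

# SAS stores a date as the number of days since January 1, 1960.
SAS_EPOCH = date(1960, 1, 1)
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def sas_date_to_date(days: int) -> date:
    """Convert a raw SAS date value to a Python date."""
    return SAS_EPOCH + timedelta(days=days)


def like_date9(d: date) -> str:
    """Render a date in the ddMONyyyy style of the DATE9. format."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


def like_dollar(amount: float) -> str:
    """Rough approximation of DOLLAR7. for whole-dollar amounts (width not modeled)."""
    return f"${amount:,.0f}"


for raw_salary, raw_hiredate in [(21163, 19260), (34321, 19240)]:
    print(like_dollar(raw_salary), like_date9(sas_date_to_date(raw_hiredate)))
```

The results reproduce the formatted values in the first output of this example: 19260 corresponds to 24SEP2012 and 19240 to 04SEP2012.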
Example 8: Exporting Multiple SAS Data Sets to a JSON File
Details
This PROC JSON example exports two SAS data sets to a JSON output file. The
Sashelp.Class data set contains student information such as their names, ages, and
gender. The MyFiles.Fitness data set contains the students’ fitness achievements,
such as the number of crunches and push-ups that they can do. The example also
illustrates how to control and nest JSON containers and write additional values to
the JSON output file.
n A subset of two SAS data sets is exported. The values for the first SAS data set
are exported as nested array containers and consist of a list of values. The
values for the second SAS data set are exported as nested object containers
and consist of key-value pairs.
n The explicitly opened JSON containers are closed.
Program
Program Description
Specify the JSON output file and control the resulting output. The PROC JSON
statement specifies the physical location of the JSON output file with the complete
pathname and filename. The PRETTY option creates a more readable format. The
NOKEYS option specifies to write the exported observations in array containers.
Open and nest labeled containers. The statements open a series of nested
containers and write values as labels for the containers.
Identify the first SAS data set to be exported. The EXPORT statement specifies
the two-level SAS name. The WHERE= data set option specifies conditions for
selecting observations. The DROP= data set option excludes the specified
variables from being written to the output file. The selected observations are
exported as nested array containers and consist of a list of values.
Close the two array containers. The two WRITE CLOSE statements close the two
array containers. Note that when you explicitly open a container, you must
explicitly close it.
Label and open a nested array container. The WRITE VALUES statement writes the
user-defined string Results to the top-level container, where it becomes the key
for the container that follows. The WRITE OPEN ARRAY statement opens a nested
array container.
Identify the second SAS data set to be exported. The EXPORT statement specifies
the two-level SAS name. The WHERE= data set option specifies conditions for
selecting observations. The DROP= data set option excludes the specified variable
from being written to the output file. The KEYS option causes the selected
observations to be exported as nested object containers and consist of key-value
pairs. Note that for the EXPORT statement, the KEYS option overrides the
NOKEYS option that is specified in the PROC JSON statement.
Close the two open containers. Two WRITE CLOSE statements explicitly close the
nested array container and then the top-level object container.
{
"Fitness": [
"Class List",
[
[
"Joyce",
"F",
11
],
[
"Thomas",
"M",
11
]
]
],
"Results": [
{
"Age": 11,
"Push-ups": 15,
"Crunches": 20
},
{
"Age": 11,
"Push-ups": 22,
"Crunches": 33
}
]
}
39
LUA Procedure
The LUA procedure enables you to run statements from the Lua programming
language within SAS code. You can submit Lua statements from an external Lua
script, or enter Lua statements directly in SAS code.
Note: Support for the LUA procedure was added in SAS 9.4M3.
Note: Support for calling CAS actions was added in SAS 9.4M5.
Note: Support for VARCHAR data was added in SAS Viya 3.3.
You run the code from the scripts by specifying the following information:
n the location of the Lua script
You specify the location of your Lua scripts by using the FILENAME statement to
define the LUAPATH fileref. To define a single location, enter a SAS statement that
is similar to this one:
filename LUAPATH "/usr/local/scripts/lua";
You can specify more than one location for your Lua scripts. The system searches
for the input Lua script from the locations that are listed, in the order in which you
specify them. One of these locations must contain the Lua system scripts. To
specify more than one location, enclose the list in parentheses and separate values
with a comma:
filename LUAPATH ("/usr/local/scripts/lua","/user/my_name/my_lua_scripts");
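The first-match search order described above can be illustrated with a short Python sketch. It is a model of the documented behavior, not how SAS resolves LUAPATH internally; the function name find_script is hypothetical, and two temporary directories stand in for the LUAPATH locations.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_script(name: str, search_path: Iterable[str]) -> Optional[Path]:
    """Return the first file called `name`, checking directories in list order."""
    for directory in search_path:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Two temporary directories stand in for the two LUAPATH locations.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    (Path(second) / "abc.lua").write_text("print('from second')\n")
    found_before = find_script("abc.lua", [first, second])  # only second has it
    (Path(first) / "abc.lua").write_text("print('from first')\n")
    found_after = find_script("abc.lua", [first, second])   # first now wins
    first_dir, second_dir = first, second
```

Because the earlier location wins, placing a personal script directory before a shared one lets a local copy of a script take precedence.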
Using a single or double quotation mark in your path might cause unexpected
results. If your path contains quotation marks or apostrophes, as in Mark’s dir, use
the LUA_PATH environment variable. For more information, see “Using the LUA_PATH
Environment Variable with Special Characters”.
Note: If you change the value of LUAPATH during a SAS session, call the LUA
procedure and specify the RESTART option. The RESTART option resets the status
of Lua in SAS and picks up the last value for LUAPATH.
You specify the name of the Lua script to execute by using the INFILE= option in
the PROC LUA statement. For more information, see “Example 2: Specifying Input
from an External Lua Script” on page 1427.
Note: To create a precompiled *.luc file, you must have Lua 5.2 or higher. See your
Lua documentation about how to use the Lua compiler.
To run Lua files at invocation, use the -SYSIN option. For example, to run the file
abc.lua, submit this command:
sas -sysin abc.lua
You can also run external Lua scripts (*.lua or *.luc files) by using the %INCLUDE
statement in a SAS session. For example, to run the Lua script abc.luc, enter the
following line in your SAS program:
%include "./tmp/abc.luc";
• your user ID, which must have permission to access the CAS server
The following resources must also be in place. These files are typically installed
automatically as part of your SAS Viya installation:
• middleclass.lua
• swat.lua
• tkluaswat.so
For more information, see “Requirements” in Getting Started with SAS Viya for Lua.
To connect to the CAS server and run Lua code, you must load the appropriate
settings and utilities in your SAS program. You must also load the SAS Scripting
Wrapper for Analytics Transfer (SWAT) library. For more information, see “Example
13: Connecting to the CAS Server” on page 1453.
Lua-var:HELP{}
requests the list of available CAS actions for the CAS session that is associated
with the Lua variable. For example, you might specify s:help{} if you have
assigned your CAS session to the Lua variable s.
Assign the result of CAS.OPEN to a Lua variable that represents the CAS
session:
s = cas.open("host.mycompany.com", 5570, "myuserid")
After you establish the connection, you can refer to the attributes of s, such as
s.port or s.hostname.
Lua-var:SHUTDOWN{}
ends the CAS session that is associated with the Lua variable and closes the
connection to the CAS server. For example, to close the CAS session
represented by the Lua variable s, use this command:
s:shutdown{}
For more information, see “Determining Where the DATA Step Is Running” in SAS
Cloud Analytic Services: DATA Step Programming. To confirm that VARCHAR data is
supported, see the documentation for the specific action, procedure, or function.
Note: Functions that run only in the DATA step, such as ADDRLONG, do not run in
Lua code.
If you do not supply a value for a required argument in a SAS function, then the SAS
function converts that argument to a missing value.
When substitutions are provided, they are assigned first. Any remaining
substitutions are supplied by local variables.
If no substitution values are submitted, then local variables from the calling
function environment are used. Enclose variables for substitution within @
symbols. For example, to assign the value of a variable called userID to a
variable user, issue the following assignment:
user = @userID@
Here is an example that shows the use of a substitution with the call to
SAS.SUBMIT.
local x = 42 -- not in the substitution table, so @x@ is taken from this local
sas.submit([[
data @name@;
x = @x@;
run;
]],{name="foo"})
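To see what the substitution step does before the code reaches SAS, it can be approximated in plain Lua. The following is a minimal sketch, not the PROC LUA implementation: the helper name resolve_tokens is hypothetical, and unlike SAS.SUBMIT it does not fall back to local variables, so the local x is passed explicitly and any token without a supplied value is left unchanged.

```lua
-- Hypothetical helper (plain Lua): replace each @token@ in code with the
-- matching entry in vals. Tokens with no entry are left as-is.
local function resolve_tokens(code, vals)
  local out = code:gsub("@([%w_]+)@", function(key)
    local v = vals[key]
    if v == nil then
      return nil -- returning nil keeps the original @key@ text
    end
    return tostring(v)
  end)
  return out
end

local x = 42
local code = [[
data @name@;
x = @x@;
run;
]]

-- PROC LUA would take x from the local variable automatically;
-- this sketch needs it passed in the table.
print(resolve_tokens(code, { name = "foo", x = x }))
-- data foo;
-- x = 42;
-- run;
```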
When you submit SAS code, do not include these keywords within SAS
comments:
run;
quit;
%macro
%mend
The presence of these keywords within comments in a SAS code block might
trigger warnings or errors in the SAS log and might cause unexpected results. To
avoid this situation, alter the keywords by inserting blank spaces or escape
characters. For example, run ; or \%\macro within a comment does not trigger
unexpected behavior.
SAS.SUBMIT returns the value of the SYSERR automatic macro variable.
If no substitution values are submitted, then local variables from the calling
function environment are used. Enclose variables for substitution within @
symbols.
For more information, see “Opening a SAS Data Set within Lua Code” on page 1425.
In SAS, there are many numeric functions that return a value of 1 if a condition is
true and 0 if that condition is false. Examples of these functions include SYMEXIST,
SYMLOCAL, SYMGLOBAL, and MISSING. Other SAS functions, such as the ANY*
character functions, search a string for given characters and return the position of
the first occurrence in the string. These functions return 0 if the characters are not
found.
If you call any of these SAS functions in your Lua code, make sure that your Lua
code interprets the results as intended.
For example, suppose that the macro variable FOOBAR is not defined. Therefore, it
does not exist in a local symbol table. The following code incorrectly states that
the variable does exist and is local.
/* INCORRECT use of SAS Boolean functions */
proc lua;
submit;
if sas.symexist("foobar") then
if sas.symlocal("foobar") then
print("In Proc LUA, foobar exists and is LOCAL.")
else
print("In Proc LUA, foobar exists but is not LOCAL.")
end
else
print("In Proc LUA, foobar does not exist.")
end
endsubmit;
run;
The preceding code incorrectly states that FOOBAR exists and is local, because Lua
interprets the 0 values that are returned by the SAS functions SYMEXIST and
SYMLOCAL as true.
Instead, when you call SAS Boolean functions, explicitly test for the desired return
value in your Lua code. The following code correctly tests to see whether the
macro variable FOOBAR exists and is local.
/* CORRECT use of SAS Boolean functions */
proc lua;
submit;
if sas.symexist("foobar") == 1 then
if sas.symlocal("foobar") == 1 then
print("In Proc LUA, foobar exists and is LOCAL.")
else
print("In Proc LUA, foobar exists but is not LOCAL.")
end
else
print("In Proc LUA, foobar does not exist.")
end
endsubmit;
run;
The preceding code correctly states that the macro variable FOOBAR does not
exist and is not local, because it was never defined. The return value of 0 from the
SAS function SYMEXIST is now explicitly compared to the value 1.
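If you call SAS Boolean functions often, a small conversion helper keeps the explicit comparison in one place. This is a minimal plain-Lua sketch; the name sas_true is hypothetical and is not part of PROC LUA. It treats any nonzero numeric result as true, which also suits functions such as the ANY* family that return a position greater than 0 when a match is found.

```lua
-- Hypothetical helper: convert a SAS-style numeric result
-- (0 = false, nonzero = true) into a Lua boolean.
local function sas_true(v)
  return v ~= nil and v ~= 0
end

-- In Lua, only nil and false are false, so a bare 0 takes the THEN branch.
if 0 then
  print("0 is true in Lua")
end

-- Inside PROC LUA you might write, for example:
--   if sas_true(sas.symexist("foobar")) then ... end
print(sas_true(0), sas_true(1), sas_true(7)) -- false, true, true
```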
For small data sets, you can read an entire SAS data set into a Lua table. You can
then modify or query the Lua table, and you can write changes to a new SAS data
set. However, for large data sets, it is more efficient to process the data one
observation at a time, as you would with a DATA step. For this reason, you should
submit a DATA step within your Lua code for large data sets.
Here are the data set functions that run within the LUA procedure:
SAS.ADD_VARS(data-set-ID, variable-definitions)
adds the specified Lua variables to a new data set. Use SAS.ADD_VARS only
when you open a data set by calling SAS.OPEN with mode ‘o’.
{ {name="varname", type="N|C",
format="format.", length="value",
label="varlabel", informat="informat."},
... }
Only the variable name is required. The default variable type is numeric (N) if
the Lua variable with the same name is also numeric. The default length of
character variables is 200.
You can specify full SAS formats, such as format="best12.3". If you supply a
format that includes a period (.), then the system treats the format as complete
and ignores any additional format attributes. If you do not supply a full SAS
format, you can use a combination of these variable attributes to specify the
format of a variable:
• FORMAT specifies the format name without a width or decimal specification
(for example, format="best")
• FORMAT_WIDTH specifies the number of characters or digits (for example,
format_width=12)
• FORMAT_DEC specifies the number of decimal places (for example,
format_dec=3)
For more information, see “Example 12: Defining and Adding Variables to a Data
Set” on page 1452.
SAS.ATTR(data-set-ID, attribute-name)
returns the value of attribute-name for a data set. For example,
sas.attr(data-set-ID, 'label') returns the label for the specified data set.
SAS.CLOSE(data-set-ID)
closes an open data set. This function exits if the data set ID is not valid.
SAS.EXISTS(SAS-data-set-name)
returns the Boolean value true if the data set exists or false if the data set
does not exist.
Note: This function is different from the standard SAS function EXIST (with no
‘s’ at the end), which returns 0 or 1. In Lua code, both 0 and 1 are interpreted
as true. If you call EXIST, test whether the condition
(sas.exist("work.test") > 0) is true. The SAS.EXISTS function is available
only within Lua code blocks.
SAS.NOBS(data-set-ID)
returns the number of observations in a data set.
SAS.NVARS(data-set-ID)
returns the number of variables in a data set.
SAS.OPEN(SAS-data-set-name<, mode>)
opens a SAS data set and returns a data set ID if the data set opens
successfully. If the data set does not open, then the function returns nil.
Therefore, use the SAS.EXISTS function before calling the SAS.OPEN function.
In SAS 9.4M5, support was added for specifying data set options, such as
KEEP=, DROP=, or WHERE=. For example, you can open the data set
Sashelp.Class and keep only the variable Age with the following code:
local dsid = sas.open('sashelp.class(keep=age)')
Valid data set modes are I (for reading), O (for creating), and U (for updating). If
you do not supply a value for data set mode, then the default value of I is used.
SAS.READ_DS(SAS-data-set-name)
returns a Lua table that contains the data from the specified SAS data set. If the
data set does not exist, the function returns nil. Therefore, use the SAS.EXISTS
function before calling the SAS.READ_DS function.
As a best practice, use the SAS.READ_DS function for small data sets only. For
large data sets, iterate over observations with the functions that process
individual observations. For more information, see “Functions That Process
Observations” on page 1414.
Alias: SAS.LOAD_DS(SAS-data-set-name)
SAS.SET_ATTR(data-set-ID, attribute-name, value)
assigns a value to an attribute of a data set.
SAS.WHERE(data-set-ID, where-clause<, replace-where-clause>)
applies a WHERE clause to a data set. If a WHERE clause already exists for the
data set, then the specified WHERE clause is added to the existing one.
However, if the optional replace-where-clause argument is the Boolean value
true, then the specified WHERE clause replaces the existing WHERE clause.
The return code from this function is an integer value. Because Lua interprets all
integer values as true, explicitly test the return value to see whether it is equal
to 0 (false in SAS).
SAS.WRITE_DS(Lua-table, SAS-data-set-name)
creates a SAS data set from a Lua table. You can specify a two-level name for
the SAS data set, such as Work.Random. The Lua table must conform to the
same structure that is returned by the SAS.READ_DS function.
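SAS.READ_DS returns an array that contains one Lua table per observation, keyed by variable name, so t[2].loan is the LOAN value in the second observation. If your data starts out column by column, a short conversion produces that shape before you call SAS.WRITE_DS. This plain-Lua sketch assumes that layout; columns_to_rows is a hypothetical helper, and the values are copied from the Homes sample data.

```lua
-- Hypothetical helper: turn { var = {v1, v2, ...}, ... } into
-- { {var = v1, ...}, {var = v2, ...}, ... }, one table per observation.
-- Columns are assumed to have the same length; ipairs stops at the first nil.
local function columns_to_rows(cols)
  local rows = {}
  for name, values in pairs(cols) do
    for i, v in ipairs(values) do
      rows[i] = rows[i] or {}
      rows[i][name] = v
    end
  end
  return rows
end

local rows = columns_to_rows({
  loan   = { 1100, 4700, 5500 },
  reason = { "HomeImp", "HomeImp", "HomeImp" },
})
print(#rows, rows[2].loan) -- 3, 4700

-- Inside PROC LUA (output data set name is illustrative):
--   sas.write_ds(rows, "work.subset")
```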
SAS.APPEND(data-set-ID)
appends a newly created observation to a data set.
SAS.DELOBS(data-set-ID)
deletes the current observation in a data set.
SAS.GET_VALUE(data-set-ID, variable-number | variable-name)
returns the value of the specified variable in the current observation. Identify
the variable by its position (number) in the data set or by its name.
SAS.NEXT(data-set-ID)
moves to the next observation in a data set for processing. If you have not yet
begun processing a data set, the SAS.NEXT function moves to the first
observation in that data set. The SAS.NEXT function enables you to work
directly with a SAS data set without needing to first read the data into a Lua
table. This function is useful for large data sets.
SAS.ROWS(data-set-ID)
iterates over the observations in a data set, loads each row into a Lua table for
processing, and adds a Lua nil at the end of processing. This function is useful
for data sets with relatively few variables.
SAS.UPDATE(data-set-ID)
updates an observation with values that were added by calling the
SAS.PUT_VALUE function.
SAS.VARS(data-set-ID)
iterates over the variables in a data set.
To add an observation to a data set, call these functions in this order:
1 SAS.APPEND.
2 SAS.PUT_VALUE. Repeat this function call until values are set for all desired
variables in an observation.
3 SAS.UPDATE.
A data set ID goes out of scope when the program can no longer access it. For
example, when a data set ID is defined as a local variable in a user-defined function,
the data set ID goes out of scope when that function returns. As a best practice,
close a data set before its data set ID goes out of scope.
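One way to follow this practice is to keep the data set ID inside a helper that always closes the data set, even when the processing code raises an error. This plain-Lua sketch takes the open and close functions as arguments; with_dataset is a hypothetical name, and inside PROC LUA you would pass sas.open and sas.close.

```lua
-- Hypothetical helper: open a data set, run fn(dsid), and close the data
-- set before dsid goes out of scope, even if fn raises an error.
local function with_dataset(open, close, name, fn)
  local dsid = open(name)
  if dsid == nil then
    return nil, "could not open " .. name
  end
  local ok, result = pcall(fn, dsid)
  close(dsid)
  if not ok then
    error(result, 0) -- re-raise the original error after closing
  end
  return result
end

-- Inside PROC LUA, for example:
--   local n = with_dataset(sas.open, sas.close, "sashelp.class",
--                          function(dsid) return sas.nobs(dsid) end)
```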
SAS.GET_MAX_SYSERR()
returns the current maximum SYSERR value that can be returned from calls to
SAS.SUBMIT without triggering an error in the SAS log.
SAS.IS_QUIET()
returns true or false, depending on the value that was set using the
SAS.SET_QUIET function. This Boolean value indicates whether SAS language
statements that are submitted to the SAS.SUBMIT function are written to the
SAS log.
SAS.SET_MAX_SYSERR(value)
specifies the maximum allowable SYSERR value that can be returned by
SAS.SUBMIT calls. If SAS returns a SYSERR value greater than the specified
value, an error is printed in the SAS log. The default is zero, and possible values
include positive integers.
SAS.SET_QUIET(value)
specifies whether SAS language statements that are submitted to the
SAS.SUBMIT function are written to the SAS log.
Possible argument values are true or false. The default value is false.
not part of the Lua programming language and cannot be used outside of PROC
LUA.
Here are some of the commonly used table functions from the Lua table library that
are available to PROC LUA.
For example, suppose you have created table T with a list of names. You can
check to see whether a text string is contained in that table and print an
appropriate message to the log.
local t = {"John", "Paul", "George", "Ringo"}
if (table.contains(t,"Ringo")) then
print ("The table contains 'Ringo'.")
else
print ("The table does not contain 'Ringo'.")
end
TABLE.SIZE(Lua-table-name)
returns the number of elements in the specified table.
TABLE.TOSTRING(Lua-table-name)
returns a formatted string representation of the specified table.
For more information, see “Example 3: Loading a SAS Data Set and Viewing the
Resulting Lua Table” on page 1428 and “Example 5: Using Table Functions” on page
1434.
These string functions have been created specifically for PROC LUA. These
functions can be called only within PROC LUA.
STRING.ENDS_WITH(string1, string2)
returns a Boolean that indicates whether the end of String1 matches the value
of String2. The comparison is case sensitive. Enclose a literal text value in
quotation marks.
You can provide substitute values for the tokens results, in, and where. Because
in is a reserved word in Lua, write that key in bracket form:
string.resolve(code, {results="work.foo", ["in"]="bar", where="x > 2"})
STRING.STARTS_WITH(string1, string2)
returns a Boolean that indicates whether the beginning of String1 matches the
value of String2. The comparison is case sensitive. Enclose a literal text value in
single or double quotation marks or inside double square brackets ([[ ]]).
STRING.TRIM(string)
returns a string with whitespace characters removed from the beginning and end
of the original string value. Enclose literal text in single or double quotation
marks or inside double square brackets ([[ ]]).
For more information, see “Example 6: Using String Functions” on page 1436.
Make sure that you use a colon (:) when placing the object name before the
function that it acts upon.
Similarly, the following code block from “Example 11: Using Iterator Functions for a
Large Table” on page 1449 could be written in two ways.
-- Iterate over the rows of the data set
local i=0
while sas.next(dsid) do
i=i+1
print("OBS=" .. i)
for vname,var in pairs(vars) do
print(vname, '=', sas.get_value(dsid, vname) )
end
end
The SAS.NEXT and SAS.GET_VALUE functions can both be represented with object
syntax.
-- Iterate over the rows of the data set
local i=0
while dsid:next() do
i=i+1
print("OBS=" .. i)
for vname,var in pairs(vars) do
print(vname, '=', dsid:get_value(vname) )
end
end
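The colon is Lua shorthand: obj:method(args) calls obj.method(obj, args), passing the object as the first argument. This plain-Lua sketch uses a hypothetical counter table to show that the two forms are equivalent and what happens when a dot is used where a colon is needed.

```lua
-- Hypothetical object whose method takes the object as its first argument.
local counter = { n = 0 }
function counter.add(self, by)
  self.n = self.n + (by or 1)
  return self.n
end

counter.add(counter, 2) -- function syntax: object passed explicitly
counter:add(3)          -- object syntax: the colon passes counter as self
print(counter.n)        -- 5

-- Using a dot by mistake passes 3 as self, which raises an error:
-- counter.add(3)
```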
If a PROC FCMP function modifies one of its arguments, that argument is specified
in the OUTARGS statement. To retrieve changes to an argument in the OUTARGS
statement within Lua code, that argument must be defined as an array.
For example, consider PROC LUA code that calls the SORT procedure three times.
You might see the following output in the SAS log.
Syntax: LUA Procedure 1421
In the output, the system CPU time, 0.78 seconds, reflects the aggregate of the
calls to PROC SORT and additional CPU time that is used by PROC LUA.
Syntax
PROC LUA <INFILE='filename'> <RESTART> <TERMINATE>;
Optional Arguments
INFILE= 'filename'
identifies a source file that contains Lua statements to run within a SAS session.
SAS expects this file to end with a .lua file extension, but do not include the
extension in the filename that you specify.
If you use the INFILE= option, then you must specify the path to
the Lua script. Define the path to your Lua scripts by assigning
the LUAPATH fileref with a FILENAME statement before the PROC LUA
statement. For more information, see “Example 2: Specifying
Input from an External Lua Script” on page 1427.
Example: Specify INFILE='open_data' to use the code in the open_data.lua Lua
script file.
RESTART
resets the state of Lua code submissions for a SAS session. The LUA procedure
is a reentrant procedure that maintains the state for Lua code across calls to the
LUA procedure. This means that global Lua variable assignments or function
definitions remain in memory until you issue the RESTART option, issue the
TERMINATE option, or end the SAS session.
You can specify RESTART at the beginning of a new block of Lua code.
ENDSUBMIT Statement 1423
TERMINATE
stops maintaining the Lua code state in memory and terminates the Lua state
when the LUA procedure completes. Subsequent calls to the LUA procedure
begin a new instance of the Lua code state.
SUBMIT Statement
Identifies the beginning of a block of Lua code. Enter Lua statements between the SUBMIT and
ENDSUBMIT statements.
Syntax
SUBMIT <'assignment(s);'>;
Optional Argument
assignment(s)
identifies one or more macro variable assignments that are passed to the block
of Lua statements. If only one assignment is listed, then the semicolon (;) within
the quotation marks is not required. SAS does not expand macro variables
within a block of Lua statements. Therefore, macro values must be passed
within the list of assignments for the SUBMIT statement.
Example: To assign the value of macro variable N to the Lua variable Name,
enter the following SUBMIT statement:
SUBMIT "name=&n";
ENDSUBMIT Statement
Identifies the end of a block of Lua statements. Do not enter any other statement on the same line as
the ENDSUBMIT statement.
Syntax
ENDSUBMIT;
Any filerefs, librefs, macro variables, and so on, that you define within a SUBMIT
and ENDSUBMIT block are available only within that block of code.
For example, the macro variable Mymacrovar is defined within the SUBMIT and
ENDSUBMIT block in the following call to PROC LUA. However, the variable is not
available to the %PUT statement at the end of the example.
proc lua restart;
submit;
sas.submit([[%let mymacrovar=Hi there;]])
txt = sas.symget("mymacrovar")
print(txt)
endsubmit;
run;
%put &mymacrovar;
Usage: LUA Procedure 1425
If the data set exists, then you can open the data set and read it into a new data set,
WORK.AIR. You can make any changes to the data via the submitted DATA step
that you would in a typical DATA step.
Note: Longer blocks of SAS code are typically assigned to a Lua variable.
proc lua;
submit;
sas.submit( [[ data work.air; set sashelp.air; run; ]] )
endsubmit;
run;
You can also substitute Lua variable values within the SAS code. The following
code shows simple substitutions. For more information, see “Example 9: Submitting
SAS Code with Lua Variable Substitutions” on page 1444.
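proc lua;
submit;
local dest = 'work.class'
local source = 'sashelp.class'
-- @dest@ and @source@ are resolved from the local variables above
sas.submit([[ data @dest@; set @source@; run; ]])
endsubmit;
run;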
Here is the code to create a sample data set that can be accessed using PROC LUA.
data homes;
input bad loan mortdue value reason $ job $ yoj derog delinq
clage ninq clno debtinc;
datalines;
1 1100 25860 39025 HomeImp Other 10.5 0 0 94.37 1 9 .
0 4700 71855 88566 HomeImp Other 2.0 2 0 283.96 0 5 36.475
0 5500 72147 69918 HomeImp Sales 4.0 2 0 158.53 0 23 43.404
1 6400 25144 45200 HomeImp Other 25.0 0 2 128.00 4 17 .
0 7000 58114 93391 HomeImp ProfEx 6.0 0 0 200.08 1 24 31.737
1 7900 67222 75189 HomeImp Other 4.0 0 0 95.68 0 23 36.980
0 8300 54039 89301 HomeImp Other 18.0 0 0 173.19 0 28 26.267
0 8800 . 32221 DebtCon Other 0.0 0 0 276.84 0 14 24.199
0 9400 71600 99682 DebtCon Other 16.0 . . 159.04 4 16 25.607
0 10000 68807 76581 HomeImp Office 14.0 0 0 237.65 0 32 42.336
0 10200 16322 86505 DebtCon Other 8.0 0 0 259.16 0 14 21.253
0 10700 115118 124198 DebtCon Self 6.0 1 0 174.34 0 19 31.758
0 11100 148235 182053 HomeImp Office 4.0 0 0 198.81 0 55 33.539
0 11700 56441 86987 HomeImp Other 17.0 0 0 198.03 0 16 33.931
0 12100 58556 76724 HomeImp Other 3.0 0 1 234.66 1 16 37.639
0 12500 81865 101048 DebtCon Other 1.0 0 0 147.40 0 23 38.855
0 12900 18106 37881 DebtCon Mgr 8.0 0 0 134.38 0 10 20.516
0 13400 98701 129679 HomeImp ProfEx 10.0 0 0 179.13 2 32 29.549
1 13900 . . HomeImp Other 4.0 0 2 209.48 0 15 35.775
0 14400 45516 63924 DebtCon Other . 0 0 109.33 1 19 40.872
;
Example 2: Specifying Input from an External Lua Script 1427
Details
This example specifies an external Lua script and one or more possible paths to
that script by using the FILENAME statement and the INFILE= option in the PROC
LUA statement.
Program
filename LUAPATH ('/usr/local/scripts/lua',
'/home/user/myname/my_scripts');
proc lua infile='my_script';
run;
Program Description
Specify the directories to search for the input Lua script. The FILENAME
statement defines the directories in which Lua scripts are stored and assigns them
to the LUAPATH fileref. Enclose a directory path within single or double quotation
marks.
filename LUAPATH ('/usr/local/scripts/lua',
'/home/user/myname/my_scripts');
Execute the PROC LUA statement and specify the Lua script name. The INFILE=
option provides the name of the Lua script that contains Lua statements. In this
example, SAS executes the Lua statements in the my_script.lua file. Do not specify
the ‘.lua’ or ‘.luc’ file extension. Check the package.path variable for your system to
see whether LUA or LUC files are opened first.
proc lua infile='my_script';
run;
Example 3: Loading a SAS Data Set and Viewing the Resulting Lua Table
Details
This example uses the Homes data set that you created in Example 1.
In this example, you read the SAS data set Homes and print the data to the SAS log.
Treat individual observations as an entry in an array. Each array entry contains
associated attributes that are derived from the variables in the original SAS data
set. For example, the value of t[2].loan is 4700.
Program
proc lua;
submit;
if (sas.exists("work.homes")) then
local t = sas.read_ds("work.homes")
i=1
while(t[i] ~= nil) do
print("Obs #" .. i)
for k,v in pairs(t[i]) do
print(k,v)
end
print("\n")
i = i+1
end
end
endsubmit;
run;
Program Description
Verify that the SAS data set Homes exists and read it into the Lua table T. Each
observation in the data set can be accessed as if it were in an array, such as t[i].
proc lua;
submit;
if (sas.exists("work.homes")) then
local t = sas.read_ds("work.homes")
i=1
Process each observation in the data set. Initialize an iterator, i, and process each
observation by using a WHILE loop. Check to see whether the next observation
exists and then print each variable-value pair. Use a FOR loop to process each
variable name and value, printing both to the SAS log. At the end of each
observation, print a newline character and increment i.
while(t[i] ~= nil) do
print("Obs #" .. i)
for k,v in pairs(t[i]) do
print(k,v)
end
print("\n")
i = i+1
end
Obs #1
reason HomeImp
derog 0
job Other
yoj 10.5
clno 9
loan 1100
bad 1
debtinc .
mortdue 25860
value 39025
clage 94.37
ninq 1
delinq 0
...
Obs #20
reason DebtCon
derog 0
job Other
yoj .
clno 19
loan 14400
bad 0
debtinc 40.872
mortdue 45516
value 63924
clage 109.33
ninq 1
delinq 0
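The variables in this output are not listed in INPUT statement order because pairs() visits table keys in an unspecified order. If you need a stable order, one option is to loop over the observations with ipairs() and over an explicit list of variable names. In this plain-Lua sketch, the two-observation table stands in for the result of sas.read_ds("work.homes") (values copied from the Homes data), and the names list is an assumption that you would supply yourself.

```lua
-- Stand-in for sas.read_ds("work.homes"): one table per observation.
local t = {
  { bad = 1, loan = 1100, reason = "HomeImp", job = "Other" },
  { bad = 0, loan = 4700, reason = "HomeImp", job = "Other" },
}

-- Variable order to print (assumed; list the names you need).
local names = { "bad", "loan", "reason", "job" }

local lines = {}
for i, obs in ipairs(t) do -- ipairs stops at the first nil entry
  lines[#lines + 1] = "Obs #" .. i
  for _, name in ipairs(names) do
    lines[#lines + 1] = name .. " " .. tostring(obs[name])
  end
end
print(table.concat(lines, "\n"))
```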
Example 4: Writing a SAS Data Set from a Lua Table
Details
This example creates a Lua table and then writes that table to a SAS data set. The
values in the Lua table are generated by calling the SAS RANNOR and RANUNI
random number generator functions. This code also uses Lua conventions for
processing arrays.
Program
proc lua;
submit;
local tbl = {}
for i=1,10 do
vars = {}
vars.seed = 1234 * i;
vars.randnor = sas.rannor( vars.seed )
vars.randuni = sas.ranuni( vars.seed )
vars.color = "purple"
tbl[#tbl+1] = vars
end
print("Lua table:", table.tostring(tbl))
sas.write_ds(tbl, "work.random")
endsubmit;
run;
Program Description
Execute the PROC LUA statement and begin a block of Lua code. Declare a local
Lua array called TBL.
proc lua;
submit;
local tbl = {}
Generate the contents of the Lua table. This example generates the content for
the array TBL. The variables Seed, Randnor, Randuni, and Color are created and
assigned a value over ten iterations of a FOR loop.
for i=1,10 do
vars = {}
vars.seed = 1234 * i;
vars.randnor = sas.rannor( vars.seed )
vars.randuni = sas.ranuni( vars.seed )
vars.color = "purple"
tbl[#tbl+1] = vars
end
Print the generated Lua table. The resulting Lua table is printed to the SAS log.
print("Lua table:", table.tostring(tbl))
Write the Lua table to a SAS data set. This example writes the Lua table to a SAS
data set called Random in the Work library. Because the data set is saved in the
Work library, it is available only within the current SAS session. To save the data
set permanently, write it to a permanent library, such as Sasuser.
sas.write_ds(tbl, "work.random")
endsubmit;
run;
Print the SAS data set. After you have written the SAS data set, you can access it
outside of the LUA procedure. If you save the data set to a permanent library, such
as Sasuser, then the data set is accessible in later SAS sessions.
proc print data=random;run;
Example 5: Using Table Functions
Details
This example creates a simple table, prints the number of elements, and prints the
table to the SAS log.
Program
proc lua;
submit;
print(table.size(t))
print("Output with function TABLE.TOSTRING")
print(table.tostring(t))
print("Output with function TABLE.CONCAT")
print(table.concat(t, ", "))
print("Sorting a Table")
table.sort(t)
print("Table with sorted values: ",table.concat(t,", "))
table.sort(t, function(a,b) return a>b end)
print("Table sorted in descending order: ",table.concat(t,", "))
endsubmit;
run;
Program Description
Execute the PROC LUA statement. The PROC LUA statement enables you to call
Lua code within your SAS session.
proc lua;
Print information about the table. Print the number of elements in the table and
then print the table using the TABLE.TOSTRING function and the TABLE.CONCAT
function.
print(table.size(t))
print("Output with function TABLE.TOSTRING")
print(table.tostring(t))
print("Output with function TABLE.CONCAT")
print(table.concat(t, ", "))
Insert and remove items in the table. Print a label, and then insert the value "new"
into the table in the fifth position using the TABLE.INSERT function. Print the table
to see the added value. Remove the value in the fourth position in the table, "foo",
using the TABLE.REMOVE function. Print the resulting table.
Sort values in the table. Print a label, and then sort the table in ascending order
using the TABLE.SORT function. Print the resulting table to see the sorted values.
Next, sort the table again and specify a function that sorts the table in descending
order. Print the resulting table.
print("Sorting a Table")
table.sort(t)
print("Table with sorted values: ",table.concat(t,", "))
table.sort(t, function(a,b) return a>b end)
print("Table sorted in descending order: ",table.concat(t,", "))
Identify the end of a block of Lua statements with the ENDSUBMIT statement.
endsubmit;
run;
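A comparison function can also order a table of observations by one variable. In this plain-Lua sketch, rows stands in for a table returned by SAS.READ_DS, using LOAN and MORTDUE values from the Homes data. The comparator must return true only when a belongs before b; a non-strict comparison such as >= can cause an "invalid order function" error in table.sort.

```lua
-- Stand-in for rows returned by sas.read_ds("work.homes").
local rows = {
  { loan = 1100, mortdue = 25860 },
  { loan = 5500, mortdue = 72147 },
  { loan = 4700, mortdue = 71855 },
}

-- Sort observations by LOAN in descending order with a strict comparison.
table.sort(rows, function(a, b) return a.loan > b.loan end)

for _, r in ipairs(rows) do
  print(r.loan, r.mortdue)
end
```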
Example 6: Using String Functions
Details
This example uses the STRING functions to manipulate text values.
Program
proc lua;
submit;
mystring = "@noun@ @verb@ the @object@"
newstring1 = string.resolve(mystring,
{noun="Pigs",verb="eat",object="pie"})
newstring2 = string.resolve(mystring, {noun="Cars",verb="guzzle",
object="gasoline"})
t = string.split(mystring," ")
endsubmit;
run;
Program Description
Execute the PROC LUA statement and begin processing Lua statements. The
PROC LUA statement enables you to call Lua code within your SAS session. The
SUBMIT statement identifies the beginning of a block of Lua statements.
proc lua;
submit;
Create a string variable and substitute different values in it. Create a string,
Mystring, that contains substitution tokens Noun, Verb, and Object. Use the
STRING.RESOLVE function to generate new strings, Newstring1 and Newstring2.
Print the original string and the new strings to the SAS log.
mystring = "@noun@ @verb@ the @object@"
newstring1 = string.resolve(mystring,
{noun="Pigs",verb="eat",object="pie"})
newstring2 = string.resolve(mystring, {noun="Cars",verb="guzzle",
object="gasoline"})
Look for matching at the beginning and end of string variables. Use the
STRING.STARTS_WITH function to check whether Newstring2 begins with “Cars”
and then print the appropriate message. Next, use the STRING.ENDS_WITH
function to check whether Newstring2 ends with “pie” and then print the
appropriate message.
if string.starts_with(newstring2, "Cars") then
print("Could be about cars: " .. newstring2)
else
print("Not about cars: " .. newstring2)
end
Trim leading and trailing whitespace characters from a string value. Assign a new
value to Mystring. Use the STRING.TRIM function to remove leading and trailing
whitespace characters. Print the original and trimmed strings for comparison.
mystring = " The fox ate the chicken. "
trimmedstring = string.trim(mystring)
end
End the block of Lua statements and run the LUA procedure. End the block of Lua
statements with the ENDSUBMIT statement.
endsubmit;
run;
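STRING.STARTS_WITH, STRING.ENDS_WITH, and STRING.TRIM can be called only within PROC LUA. If a script must also run in a standalone Lua interpreter, equivalents built on the standard string library are short. The helper names below are hypothetical and are not the PROC LUA implementations; like STRING.STARTS_WITH and STRING.ENDS_WITH, they compare case-sensitively.

```lua
-- Hypothetical plain-Lua equivalents (case-sensitive comparisons).
local function starts_with(s, prefix)
  return s:sub(1, #prefix) == prefix
end

local function ends_with(s, suffix)
  -- s:sub(-0) would return the whole string, so handle "" explicitly
  return suffix == "" or s:sub(-#suffix) == suffix
end

local function trim(s)
  -- capture everything between the leading and trailing whitespace
  return (s:match("^%s*(.-)%s*$"))
end

local newstring2 = "Cars guzzle the gasoline"
print(starts_with(newstring2, "Cars")) -- true
print(ends_with(newstring2, "pie"))    -- false
print("[" .. trim("  The fox ate the chicken.  ") .. "]")
```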
Details
This example demonstrates how to add an observation using the SAS functions in
PROC LUA. First, create a simple data set, and add a record to it using the
SAS.APPEND, SAS.PUT_VALUE, and SAS.UPDATE functions. Next, retrieve and
print the values from the updated data set.
Program
data foo;
input x y;
id+1;
datalines;
1 10
2 20
3 30
;
proc lua;
submit;
dsid = sas.open("work.foo",'u')
while(sas.next(dsid) ~= nil) do
myid = sas.get_value(dsid,"id")
myx = sas.get_value(dsid,"x")
myy = sas.get_value(dsid,"y")
print("ID: " .. myid .. " x: " .. myx .. " y: " .. myy)
end
sas.append(dsid)
sas.put_value(dsid,"id",4)
sas.put_value(dsid,"x",4)
sas.put_value(dsid,"y",40)
sas.update(dsid)
rc=sas.close(dsid)
endsubmit;
run;
proc lua restart;
submit;
dsid=sas.open("work.foo")
while(sas.next(dsid) ~= nil) do
myid = sas.get_value(dsid,"id")
myx = sas.get_value(dsid,"x")
myy = sas.get_value(dsid,"y")
print("ID: " .. myid .. " x: " .. myx .. " y: " .. myy)
end
rc=sas.close(dsid)
endsubmit;
run;
Program Description
Create a data set. Use the DATA step to create a small data set called Foo.
data foo;
input x y;
id+1;
datalines;
1 10
2 20
3 30
;
Invoke PROC LUA and open the Foo data set for updating. Use the SAS.OPEN
function to open the Foo data set and assign the returned data set ID to the Dsid
Lua variable. Opening the data set in update mode (the 'u' argument) enables you
to write to it. If you do not specify a mode, the data set opens in input
(read-only) mode by default.
proc lua;
submit;
dsid = sas.open("work.foo",'u')
Read the values in Foo and print them to the SAS log. Use the WHILE loop with
the SAS.NEXT function to iterate through observations in the data set. For each
observation, retrieve the values for Myid, Myx, and Myy from the ID, X, and Y
variables, respectively. Print these values to the SAS log.
while(sas.next(dsid) ~= nil) do
myid = sas.get_value(dsid,"id")
myx = sas.get_value(dsid,"x")
myy = sas.get_value(dsid,"y")
print("ID: " .. myid .. " x: " .. myx .. " y: " .. myy)
end
Append a new observation and assign a value to each variable. Call the
SAS.APPEND function to prepare the data set for a new observation. Assign values
using the SAS.PUT_VALUE function and finally update the data set with the
SAS.UPDATE function. Close the data set.
sas.append(dsid)
sas.put_value(dsid,"id",4)
sas.put_value(dsid,"x",4)
sas.put_value(dsid,"y",40)
sas.update(dsid)
rc=sas.close(dsid)
endsubmit;
run;
Restart PROC LUA and open the Work.Foo data set. Assign the data set identifier
that SAS.OPEN returns to the Lua variable Dsid.
proc lua restart;
submit;
dsid=sas.open("work.foo")
Retrieve the data from the Work.Foo data set. Use a WHILE loop to process
observations in the Work.Foo data set. Retrieve the values using the
SAS.GET_VALUE function, and print the values to the SAS log.
while(sas.next(dsid) ~= nil) do
myid = sas.get_value(dsid,"id")
myx = sas.get_value(dsid,"x")
myy = sas.get_value(dsid,"y")
print("ID: " .. myid .. " x: " .. myx .. " y: " .. myy)
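end
Close the data set, end the block of Lua code, and complete the call to the LUA
procedure.
rc=sas.close(dsid)
endsubmit;
run;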
ID: 1 x: 1 y: 10
ID: 2 x: 2 y: 20
ID: 3 x: 3 y: 30
NOTE: PROCEDURE LUA used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Output 39.9 Listing of Work.Foo Data Set Appended with New Observation
ID: 1 x: 1 y: 10
ID: 2 x: 2 y: 20
ID: 3 x: 3 y: 30
ID: 4 x: 4 y: 40
NOTE: PROCEDURE LUA used (Total process time):
real time 0.16 seconds
cpu time 0.06 seconds
Example 8: Using SAS Macro Variable Values within Lua Statements
Details
No macro substitution occurs between the semicolon (;) at the end of the SUBMIT
statement and the beginning of the ENDSUBMIT statement, except within SAS code
that is passed to the SAS.SUBMIT function. To use macro variable values in Lua
code, specify the assignments in the SUBMIT statement itself. This example shows
how to assign macro variable values for use within Lua code.
Separate multiple assignments with a semicolon (;) and place all assignments
within a single pair of quotation marks. This example prints the following text to
the SAS log: Hello, George. Have a nice day.
Program
%let g='George';
%let h='Have a nice day.';
proc lua;
submit "name=&g; msg=&h";
print('Hello, ' .. name .. '. ' .. msg)
endsubmit;
run;
Program Description
Define macro variable values. The macro variables G and H are assigned
character values. The single quotation marks are stored as part of each macro
variable value, so that after substitution Lua receives quoted string literals, which
Lua requires. Single quotation marks are not normally required for macro variable
assignments in SAS.
%let g='George';
%let h='Have a nice day.';
Execute the PROC LUA statement. The PROC LUA statement enables you to call
Lua code within your SAS session.
proc lua;
Identify the beginning of a block of Lua statements and assign any macro
variables. The SUBMIT statement identifies the beginning of a block of Lua
statements. This example assigns the values of the macro variables g and h to the
Lua variables name and msg, respectively. No macro expansion occurs between
the end of the SUBMIT statement and the beginning of the ENDSUBMIT statement.
The Lua code refers to these Lua variables.
submit "name=&g; msg=&h";
Execute Lua statements. Enter Lua statements between the SUBMIT and
ENDSUBMIT statements. Lua statements are not required to end with a semicolon
(;). This example prints text values to the SAS log. Concatenate strings using the ‘..’
operator. Each call to the print function writes its output on a new line in the log.
print('Hello, ' .. name .. '. ' .. msg)
Identify the end of a block of Lua statements with the ENDSUBMIT statement.
endsubmit;
run;
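The role of the quotation marks can be illustrated by mimicking the macro processor's text substitution in Python. This is a simplified sketch, not a model of the full SAS macro language: the hypothetical `resolve_macro_refs` helper resolves each `&name` reference with a plain dictionary lookup and ignores all other macro rules. It shows the Lua assignments that result from the string in the SUBMIT statement.

```python
import re


def resolve_macro_refs(text, macro_vars):
    """Replace each &name reference with its stored value (simplified, hypothetical)."""
    # Macro names in this sketch: a letter or underscore followed by word characters.
    return re.sub(r"&([A-Za-z_]\w*)",
                  lambda m: macro_vars[m.group(1).lower()],
                  text)


# %let stores the single quotation marks as part of each value.
quoted = {"g": "'George'", "h": "'Have a nice day.'"}
# The same values without quotation marks, for comparison.
unquoted = {"g": "George", "h": "Have a nice day."}

submit_string = "name=&g; msg=&h"
print(resolve_macro_refs(submit_string, quoted))    # name='George'; msg='Have a nice day.'
print(resolve_macro_refs(submit_string, unquoted))  # name=George; msg=Have a nice day.
```

With the quoted values, Lua receives two string assignments. Without the quotation marks, `name=George` would assign the value of an undefined Lua variable (nil), and `msg=Have a nice day.` is not valid Lua syntax.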
Details
This example submits a block of Lua statements. Within the Lua statements, assign
a block of SAS code to a local variable. The example then invokes the SAS code
with the SAS.SUBMIT function and substitutes variable values.
The SAS code that is stored in the local Lua variable Code reads the Work.Answer
data set as input.
Program
proc lua;
submit;
local rc
local code = [[
data sample; set answer;
where CCUID = @ccuid@;
y = @subValue@;
run;
]]
rc = sas.submit(code, {ccuid="67", subValue=72})
endsubmit;
run;
Program Description
Execute the LUA procedure.
proc lua;
Identify the beginning of the block of Lua statements to execute. Use the SUBMIT
statement to identify the Lua statements. Declare a local variable Rc. Semicolons
(;) are not required at the end of Lua statements.
submit;
local rc
Assign a block of SAS code to a local Lua variable. A block of SAS code, enclosed
in the Lua long-string delimiters [[ and ]], is assigned to the local Lua variable Code.
The SAS code reads the WORK.ANSWER data set and keeps only records where the
value of CCUID matches the value that replaces @ccuid@ when the code is
submitted. A new variable Y is assigned the value that replaces @subValue@. The
DATA step writes the result to the WORK.SAMPLE data set.
local code = [[
data sample; set answer;
where CCUID = @ccuid@;
y = @subValue@;
run;
]]
Execute the SAS code and assign the result to the Lua variable Rc. The
SAS.SUBMIT function tells the system to run SAS code. The code block that was
assigned to the Code variable is now executed. The call to the SAS.SUBMIT
function includes two variable substitutions. When the SAS code executes, the
character value "67" is substituted for @ccuid@, and the value 72 is substituted for
@subValue@.
rc = sas.submit(code, {ccuid="67", subValue=72})
The ENDSUBMIT statement identifies the end of the block of Lua statements.
The RUN statement completes the call to PROC LUA.
endsubmit;
run;
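The substitution step can be pictured with a small Python sketch. It assumes, for illustration only, that each @key@ token is replaced textually by the string form of the matching value in the table that is passed to SAS.SUBMIT; the `substitute` function is hypothetical and is not the PROC LUA implementation.

```python
import re


def substitute(code, values):
    """Replace each @key@ token with str(values[key]); unknown tokens are left unchanged."""
    def repl(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return re.sub(r"@(\w+)@", repl, code)


code = """data sample; set answer;
where CCUID = @ccuid@;
y = @subValue@;
run;"""

# Mirrors the table {ccuid="67", subValue=72} from the Lua call.
print(substitute(code, {"ccuid": "67", "subValue": 72}))
```

In this sketch the substitution is purely textual, so the generated WHERE statement reads `where CCUID = 67;` and the assignment reads `y = 72;`.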
Details
This example traverses a data set and prints the variable names and corresponding
values for each observation. Use the SAS.ROWS function when there are relatively
few variables.
Program
proc lua;
submit;
local dsid = sas.open("sashelp.class")
for row in sas.rows(dsid) do
for n,v in pairs(row) do
if type(n)=="string" then
print(n,'=', v)
end
end
end
sas.close(dsid)
endsubmit;
run;
Program Description
Execute the PROC LUA and SUBMIT statements to begin a block of Lua code.
proc lua;
submit;
Open the Sashelp.Class SAS data set and assign the data set identifier to the Lua
variable Dsid. By default, the data set is opened for reading.
local dsid = sas.open("sashelp.class")
Process the rows of the data set. The outer FOR loop uses the SAS.ROWS iterator to
process each row in the data set. The inner FOR loop uses the PAIRS function to
retrieve the key and value pairs from each row, and the IF condition keeps only the
pairs whose key is a string, that is, a variable name. These variable names and
values are printed to the SAS log. Note that in Lua, a double equal sign (==) tests
whether two values are equal.
for row in sas.rows(dsid) do
for n,v in pairs(row) do
if type(n)=="string" then
print(n,'=', v)
end
end
end
Close the data set, end the block of Lua code, and complete the call to the LUA
procedure.
sas.close(dsid)
endsubmit;
run;
sex = M
height = 69
name = Alfred
weight = 112.5
age = 14
sex = F
height = 56.5
name = Alice
weight = 84
age = 13
sex = F
height = 65.3
name = Barbara
weight = 98
age = 13
sex = F
height = 62.8
name = Carol
weight = 102.5
age = 14
...
sex = M
height = 66.5
name = William
weight = 112
age = 15
NOTE: PROCEDURE LUA used (Total process time):
real time 0.08 seconds
cpu time 0.06 seconds
Example 11: Using Iterator Functions for a Large Table
Details
This example uses the SAS.VARS and SAS.NEXT functions to traverse a SAS data
set and print its contents.
Program
proc lua;
submit;
local dsid = sas.open("sashelp.company") -- open for input
local vars = {}
-- Iterate over the variables in the data set
for var in sas.vars(dsid) do
vars[var.name:lower()] = var
end
-- Iterate over the rows of the data set
local i=0
while (sas.next(dsid) ~= nil) do
i=i+1
print("OBS=" .. i)
for vname,var in pairs(vars) do
print(vname, '=', sas.get_value(dsid, vname) )
end
end
sas.close(dsid)
endsubmit;
run;
Program Description
Execute the PROC LUA and SUBMIT statements to begin a block of Lua code.
proc lua;
submit;
Open the data set and declare a local Lua table, Vars, to hold information about
the variables. The SAS.OPEN function opens Sashelp.Company for input. The braces
({ }) create an empty Lua table. Use the SAS.VARS function to iterate over all the
variables in the data set. The example stores the information for each variable in
the Vars table, where the key is the variable name in lowercase.
local dsid = sas.open("sashelp.company") -- open for input
local vars = {}
-- Iterate over the variables in the data set
for var in sas.vars(dsid) do
vars[var.name:lower()] = var
end
Iterate over each data set observation and print the values for each variable. The
example initializes a counter variable, I, to 0. A WHILE loop then calls the SAS.NEXT
function to move through the observations in the data set. For each observation,
the example prints the observation number and then the name and corresponding
value of each variable.
-- Iterate over the rows of the data set
local i=0
while (sas.next(dsid) ~= nil) do
i=i+1
print("OBS=" .. i)
for vname,var in pairs(vars) do
print(vname, '=', sas.get_value(dsid, vname) )
end
end
Close the data set and end the call to the LUA procedure.
sas.close(dsid)
endsubmit;
run;
OBS=1
level4 = CONTRACTS
job1 = MANAGER
level3 = ADMIN
level2 = TOKYO
n = 1
depthead = 1
level1 = International Ai
level5 = So Suumi
OBS=2
level4 = CONTRACTS
job1 = ASSISTANT
level3 = ADMIN
level2 = TOKYO
n = 1
depthead = 2
level1 = International Ai
level5 = Steffen Graff
...
OBS=48
level4 = MIS
job1 = TECH. CONS.
level3 = TECHN. SERVICES
level2 = NEW YORK
n = 1
depthead = 2
level1 = International Ai
level5 = Roy Hobbs
NOTE: PROCEDURE LUA used (Total process time):
real time 0.23 seconds
cpu time 0.21 seconds
Details
This example defines new variables within Lua code and adds them to a data set.
The example uses the SAS.ADD_VARS function to define the variables and add
them to the output data set.
Program
proc lua;
submit;
local dsid = sas.open("work.sample","o")
sas.add_vars(dsid, { {name="var1", type="N", format="BEST12.2"},
{name="var2", type="C", length=500,
format="$char80."},
{name="var3", type="C", label="Character Data"}
})
sas.close(dsid)
endsubmit;
run;
Program Description
Execute the PROC LUA statement and begin the block of Lua statements. The
PROC LUA statement enables you to call Lua code within your SAS session. The
SUBMIT statement identifies the beginning of the block of Lua statements.
proc lua;
submit;
Open the data set and assign its data set identifier to a local Lua variable.
Invoke the SAS.OPEN function to open the WORK.SAMPLE data set. Use the ‘o’ mode
to create the data set if it does not already exist.
local dsid = sas.open("work.sample","o")
Define variables to add to the data set. This example invokes the SAS.ADD_VARS
function to add three variables (Var1, Var2, and Var3) to the data set. The type,
numeric (N) or character (C), is specified for each variable. Various additional
attributes are assigned to the variables. Only the name attribute is required. For
more information, see “Data Set Functions for the LUA Procedure” on page 1412.
sas.add_vars(dsid, { {name="var1", type="N", format="BEST12.2"},
{name="var2", type="C", length=500,
format="$char80."},
{name="var3", type="C", label="Character Data"}
})
Close the data set and end the call to the LUA procedure. This example executes
the SAS.CLOSE function to close the data set that is associated with the Lua
variable Dsid.
sas.close(dsid)
endsubmit;
run;
Example 13: Connecting to the CAS Server
Details
This example shows how to connect to a CAS server. The host name is
server.mycompany.com, and the port is 5570.
Note: This example assumes that you have SAS Viya 3.5 installed.
Program
proc lua;
submit;
swat_enabled = true
endsubmit;
run;
Program Description
Start PROC LUA and load utility resources. Use the concatenation operator to
append the middleclass.lua location to the package.path value. Similarly, add the
location of the tkluaswat.so file to the package.cpath value. A variable called
Swat_enabled is created to prevent updating the package.path and package.cpath
values if you run this program more than once. For more information about SWAT,
see “Requirements” in Getting Started with SAS Viya for Lua.
proc lua;
submit;
if not swat_enabled then
package.path = package.path ..
';/opt/sas/viya/home/SASFoundation/misc/casluaclnt/lua/deps/?.lua'
package.cpath = package.cpath ..
';/opt/sas/viya/home/SASFoundation/misc/casluaclnt/lua/lib/?.so'
swat_enabled = true
end
Connect to the CAS server. Use the CAS.OPEN function to start a CAS session on
the CAS server. Assign the CAS session to the Lua variable S. Verify that the
connection was successful and print session information. This version of the
CAS.OPEN function assumes that you have created an authinfo file. For more
information, see “Connect and Start a Session” in Getting Started with SAS Viya for
Lua.
-- Connect to the CAS server on your host
s=cas.open("server.mycompany.com",5570)
Run a CAS action. Run the addFmtLib action to add a new format library. Add a new
format definition with the addFormat action. Then apply the format to some values
and check the output to see the formatted values.
-- Run the addFmtLib action to add a format library
s:addFmtLib{fmtlibname="myformats"}
print(result)
Shut down the CAS session and disconnect from the CAS server with the CLOSE
function. End the Lua code block and submit the call to PROC LUA.
-- Shutdown the CAS session
s:close{}
endsubmit;
run;
NOTE: before
NOTE: package.path=
/usr/local/.../init.lua;./?.lua
NOTE: package.cpath=
/usr/local/.../loadall.so;./?.so
NOTE: after
NOTE: package.path=
/usr/local/.../init.lua;./?.lua;SASHOME/SASFoundation/misc/casluaclnt/lua/?.lua;
SASHOME/SASFoundation/misc/casluaclnt/lua/deps/?.lua
NOTE: package.cpath=
/usr/local/.../loadall.so;./?.so;SASHOME/SASFoundation/misc/casluaclnt/lua/
lib/?.so
NOTE: server=
Server Status
Node Count Total Actions
1 1
NOTE: nodestatus=
Node Status
Node Name Role Uptime (Sec) Running Stalled
server.mycompany.com controller 0.647 0 0
[listFmtValues]
ListFmtValues
Format Name Number Value Dest Character value
demo 15 demo1
23 demo2
39 demo3
61 demo4
Example 14: Running PROC FCMP Functions
Details
This example calls functions that have been defined using the FCMP procedure in
the same way that it calls standard SAS functions. The function definitions do not
need to be included in the same SAS program. PROC FCMP functions can be stored
in a function library for use by any program that specifies the library in the
CMPLIB= system option. This example defines the SUMX and ADD_SCALAR
functions and saves them to the Sasuser.myFuncs data set in the ArrayFuncs
package. A package is a collection of related routines that are specified by a user.
The package name groups related functions in the data set that contains the PROC
FCMP functions.
Program
proc fcmp outlib=sasuser.myFuncs.ArrayFuncs;
function sumx(x[*]);
sum = 0;
do i = 1 to dim(x);
sum = sum + x[i];
end;
return(sum );
endsub;
function add_scalar(scalar, x[*]);
outargs x;
do i = 1 to dim(x);
x[i] = x[i] + scalar;
end;
return(dim(x));
endsub;
run;
options cmplib=sasuser.myFuncs;
proc lua;
submit;
array = { 1, 2, 3, 4, 5 }
sum = sas.sumx(array)
print(sum)
endsubmit;
run;
proc lua;
submit;
array = { 1, 2, 3, 4, 5 }
dim = sas.add_scalar(5, array)
a1 = array[1]
a2 = array[2]
print(a1)
print(a2)
endsubmit;
run;
Program Description
Define functions using the FCMP procedure. The SUMX function sums the values
of an array and returns that value. The ADD_SCALAR function adds a scalar value
to each member of an array. Because the OUTARGS argument for ADD_SCALAR
specifies an array, the call to this function using PROC LUA can change the value of
the submitted array. The ADD_SCALAR function returns the number of elements in
the array.
proc fcmp outlib=sasuser.myFuncs.ArrayFuncs;
function sumx(x[*]);
sum = 0;
do i = 1 to dim(x);
sum = sum + x[i];
end;
return(sum );
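endsub;
function add_scalar(scalar, x[*]);
outargs x;
do i = 1 to dim(x);
x[i] = x[i] + scalar;
end;
return(dim(x));
endsub;
run;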
Specify the location of the compiled functions. This example finds the previously
defined functions in the Sasuser.myFuncs data set. The call to the FCMP procedure
does not need to be included in the same program with the LUA procedure if the
functions are stored. Use the OPTIONS statement to specify the function location
before calling the function within the LUA procedure.
options cmplib=sasuser.myFuncs;
Call the SUMX function within the LUA procedure. You call the SUMX function by
preceding the call with “SAS.”, similar to calling any standard SAS function.
proc lua;
submit;
array = { 1, 2, 3, 4, 5 }
sum = sas.sumx(array)
print(sum)
endsubmit;
run;
Call the ADD_SCALAR function within the LUA procedure. You call the
ADD_SCALAR function by preceding the call with “SAS.”, similar to calling any
standard SAS function. The calls to SUMX and ADD_SCALAR can come within the
same LUA procedure.
proc lua;
submit;
array = { 1, 2, 3, 4, 5 }
dim = sas.add_scalar(5, array)
a1 = array[1]
a2 = array[2]
print(a1)
print(a2)
endsubmit;
run;
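As an analogy only (Python, not FCMP), the following sketch mirrors the two functions. Listing X in OUTARGS makes ADD_SCALAR behave like a function that changes the caller's array in place and returns its length. The Python functions are illustrative re-implementations, and Python lists are indexed from 0, whereas the Lua code reads array[1] and array[2].

```python
def sumx(x):
    """Return the sum of the elements of x (mirrors the FCMP SUMX function)."""
    total = 0
    for value in x:
        total += value
    return total


def add_scalar(scalar, x):
    """Add scalar to every element of x in place, then return the number of elements.

    Modifying the caller's list mimics an OUTARGS array: the caller sees the change.
    """
    for i in range(len(x)):
        x[i] += scalar
    return len(x)


array = [1, 2, 3, 4, 5]
print(sumx(array))               # 15
dim = add_scalar(5, array)
print(dim, array[0], array[1])   # 5 6 7
```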
405
406 options cmplib=sasuser.myFuncs;
407
408 proc lua;
409 submit;
410 array = { 1, 2, 3, 4, 5 }
411 sum = sas.sumx(array)
412 print(sum)
413 endsubmit;
414 run;
415
416 proc lua;
417 submit;
418 array = { 1, 2, 3, 4, 5 }
419 dim = sas.add_scalar(5, array)
420 a1 = array[1]
421 a2 = array[2]
422 print(a1)
423 print(a2)
424 endsubmit;
425 run;
40
MEANS Procedure
• performs a t test
By default, PROC MEANS displays output. You can also use the OUTPUT
statement to store the statistics in a SAS data set.
PROC MEANS and PROC SUMMARY are very similar; see Chapter 69, “SUMMARY
Procedure,” on page 2455 for an explanation of the differences.
Overview: MEANS Procedure 1465
Output 40.2 Specified Statistics for Class Levels and Identification of Maximum Values
                     N
School     Year    Obs    Variable            N          Mean         Range
-----------------------------------------------------------------------------
Kennedy    1992     15    MoneyRaised        15    29.0800000    39.7500000
                          HoursVolunteered   15    22.1333333    30.0000000
-----------------------------------------------------------------------------
In addition to the report, the program also creates an output data set (located on
page 2 of the output) that identifies the students who raised the most money and
who volunteered the most time over all the observations and within the
combinations of School and Year:
• The first observation in the data set shows the students with the maximum
values overall for MoneyRaised and HoursVolunteered.
• Observations 2 through 4 show the students with the maximum values for each
year, regardless of school.
• Observations 5 and 6 show the students with the maximum values for each
school, regardless of year.
• Observations 7 through 12 show the students with the maximum values for each
school-year combination.
Concepts: MEANS Procedure 1467
For example, numeric format outputs can be wider and have more characters than
expected when the default format width is greater than the width that is specified
in the data set or in the statement.
You can prevent default formats from being applied to the output data set by
creating your output data set with the OUTPUT statement instead of the ODS
OUTPUT statement.
If you prefer that format defaults not be applied to ODS output, you can modify the
Base.Summary ODS table template with the following code:
proc template;
edit base.summary;
use_format_defaults=off;
end;
run;
variable values that occur together in any single observation of the input data set
determine the data subgroups. Each subgroup that PROC MEANS generates for a
given type is called a level of that type. Note that for all types, the inactive class
variables can still affect the total observation count, because observations with
missing values for any class variable are rejected.
When you use a WAYS statement, PROC MEANS generates types that correspond
to every possible unique combination of n class variables chosen from the complete
set of class variables. For example
proc means;
class a b c d e;
ways 2 3;
run;
is equivalent to
proc means;
class a b c d e;
types a*b a*c a*d a*e b*c b*d b*e c*d c*e d*e
a*b*c a*b*d a*b*e a*c*d a*c*e a*d*e
b*c*d b*c*e b*d*e c*d*e;
run;
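The equivalence can be checked by enumerating the combinations. This Python sketch (the `ways_types` helper is hypothetical) lists every n-way crossing for each value of n in the WAYS statement; for five class variables, WAYS 2 3 yields C(5,2) + C(5,3) = 10 + 10 = 20 types.

```python
from itertools import combinations


def ways_types(class_vars, ways):
    """List the crossings that a WAYS statement requests, in CLASS-statement order."""
    types = []
    for n in ways:
        # combinations() preserves the order of class_vars within each crossing
        types.extend("*".join(combo) for combo in combinations(class_vars, n))
    return types


types = ways_types(["a", "b", "c", "d", "e"], [2, 3])
print(len(types))       # 20
print(" ".join(types))
```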
If you omit the TYPES statement and the WAYS statement, then PROC MEANS
uses all class variables to subgroup the data (the NWAY type) for displayed output
and computes all $2^k$ types, where $k$ is the number of class variables, for the
output data set.
                   N
Pet    Gender    Obs
---------------------------
dog    f           3
       m           1
cat    f           1
       m           2
---------------------------
In the example, PROC MEANS does not list male cats before female cats. Instead, it
determines the order of gender for all types over the entire data set. PROC MEANS
found more observations for female pets (f=4, m=3). The default for ORDER is
ORDER=INTERNAL.
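The idea of a single, data-set-wide order can be sketched in Python. The sketch assumes a frequency-based order (as with ORDER=FREQ), which is consistent with the counts quoted above; it totals each gender over the whole data set and then applies that one order within every pet, so female cats are listed first even though there are more male cats.

```python
from collections import Counter

# (pet, gender, frequency) cells from the listing above
cells = [("dog", "f", 3), ("dog", "m", 1), ("cat", "f", 1), ("cat", "m", 2)]

# Total frequency of each gender over the entire data set
totals = Counter()
for _, gender, n in cells:
    totals[gender] += n

# One global order: descending total frequency, ties broken alphabetically
gender_order = sorted(totals, key=lambda g: (-totals[g], g))

# Pets in the order shown in the listing; genders follow the global order
for pet in ("dog", "cat"):
    for gender in gender_order:
        n = next(c for p, g, c in cells if p == pet and g == gender)
        print(pet, gender, n)
```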
Computational Resources
PROC MEANS uses the same memory allocation scheme across all operating
environments. When class variables are involved, PROC MEANS must keep a copy
of each unique value of each class variable in memory. You can estimate the
memory requirements to group the class variable by calculating
$$N_{c_1}(L_{c_1} + K) + N_{c_2}(L_{c_2} + K) + \cdots + N_{c_n}(L_{c_n} + K)$$
where
$N_{c_i}$
is the number of unique values for the class variable $c_i$.
$L_{c_i}$
is the combined unformatted and formatted length of $c_i$.
$K$
is some constant on the order of 32 bytes (64 for 64-bit architectures).
When you use the GROUPINTERNAL option in the CLASS statement, $L_{c_i}$ is simply
the unformatted length of $c_i$.
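The estimate is a simple sum, sketched here in Python. The class-variable figures are hypothetical, and K is set to 64 to match the 64-bit value quoted above; the result is only a rough sizing aid, not a figure that PROC MEANS reports.

```python
def class_value_memory(class_vars, k_overhead):
    """Estimate the sum of N_ci * (L_ci + K) over all class variables, in bytes.

    class_vars: list of (number_of_unique_values, combined_length_in_bytes) pairs.
    k_overhead: the per-value constant K (about 32, or 64 on 64-bit systems).
    """
    return sum(n_unique * (length + k_overhead) for n_unique, length in class_vars)


# Hypothetical class variables: (unique values, combined formatted + unformatted length)
example = [(50, 16), (12, 8), (1000, 40)]
print(class_value_memory(example, k_overhead=64))  # 108864
```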
1470 Chapter 40 / MEANS Procedure
Each unique combination of class variable values, $c_{1i} c_{2j} \ldots$, for a given type forms a level in
that type. See “TYPES Statement” on page 1506. You can estimate the maximum
potential space requirements for all levels of a given type, when all combinations
actually exist in the data (a complete type), by calculating
$$W \times N_{c_1} \times N_{c_2} \times \cdots \times N_{c_n}$$
where
W
is a constant based on the number of variables analyzed and the number of
statistics calculated (unless you request QMETHOD=OS to compute the
quantiles).
$N_{c_1}, \ldots, N_{c_n}$
are the numbers of unique levels for the active class variables of the given type.
Clearly, the memory requirements of the levels can far exceed the memory
requirements of the class variable values. For this reason, PROC MEANS can open
one or more utility files and
write the levels of one or more types to disk. These types are either the primary
types that PROC MEANS built during the input data scan or the derived types.
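Because the level estimate is a product, it grows much faster than the per-variable sum. This Python sketch evaluates $W \times N_{c_1} \times \cdots \times N_{c_n}$; the documentation gives no numeric value for W, so W = 200 bytes and the level counts are hypothetical values chosen only to show the scale.

```python
from math import prod


def complete_type_memory(w, level_counts):
    """Upper-bound estimate W * N_c1 * ... * N_cn, in bytes, for a complete type."""
    return w * prod(level_counts)


# Hypothetical: W = 200 bytes per level; active class variables with 50, 12, and 1000 levels
print(complete_type_memory(200, [50, 12, 1000]))  # 120000000
```

With only three class variables, the hypothetical complete type needs about 120 MB, which is why PROC MEANS may move levels to utility files.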
If PROC MEANS must write partially complete primary types to disk while it
processes input data, then one or more merge passes can be required to combine
type levels in memory with the levels on disk. In addition, if you use an order other
than DATA for any class variable, then PROC MEANS groups the completed types
on disk. For this reason, the peak disk space requirements can be more than twice
the memory requirements for a given type.
When PROC MEANS uses a temporary work file, you receive the following note in
the SAS log:
Processing on disk occurred during summarization.
Peak disk usage was approximately nnn Mbytes.
Adjusting MEMSIZE or REALMEMSIZE may improve performance.
When you specify class variables in a CLASS statement, the amount of data-
dependent memory that PROC MEANS uses before it writes to a utility file is
controlled by the SAS system option REALMEMSIZE=. The value of
REALMEMSIZE= indicates the amount of real as opposed to virtual memory that
SAS can expect to allocate. PROC MEANS determines how much data-dependent
memory to use before writing to utility files by calculating the lesser of these two
values:
• the value of REALMEMSIZE=
REALMEMSIZE= also affects the behavior of other memory-intensive procedures, such
as PROC SORT.
As an alternative, you can use the PROC option SUMSIZE=. Like the PROC option
SORTSIZE=, SUMSIZE= sets the memory threshold where disk-based operations
begin. For best results, set SUMSIZE= to less than the amount of real memory that
is likely to be available for the task. For efficiency reasons, PROC MEANS can
internally round up the value of SUMSIZE=. SUMSIZE= has no effect unless you
specify class variables.
Operating Environment Information: The REALMEMSIZE= SAS system option is
not available in all operating environments. For details, see the SAS Companion for
your operating environment.
If PROC MEANS reports that there is insufficient memory, then increase SUMSIZE=
(or REALMEMSIZE=). A SUMSIZE= (or REALMEMSIZE=) value that is greater than
MEMSIZE= has no effect. Therefore, you might also need to increase MEMSIZE=. If
PROC MEANS reports insufficient disk space, then increase the WORK space
allocation. See the SAS documentation for your operating environment for more
information about how to adjust your computation resource parameters.
• Aster
• DB2
• Google BigQuery
• Greenplum
• Hadoop
• HAWQ
• Impala
• Netezza
• Oracle
• PostgreSQL
• SAP HANA
• Snowflake
• Teradata
• Vertica
• Yellowbrick
Under the correct conditions, PROC MEANS generates an SQL query based on the
statements that are used and the output statistics that are specified in the PROC
step. If class variables are specified, the procedure creates an SQL GROUP BY
clause that represents the n-way type. The result set that is created when the
aggregation query executes in the database is read by SAS into the internal PROC
MEANS data structure, and all subsequent types are derived from the original n-
way type to form the final analysis results. When SAS format definitions have been
deployed in the database, formatting of class variables occurs in the database. If
the SAS format definitions have not been deployed in the database, the in-
database aggregation occurs on the raw values, and the relevant formats are
applied by SAS as the result set is merged into the PROC MEANS internal
structures. Multilabel formatting is always done by SAS using the initially
aggregated result set that is returned by the database. The CLASS, TYPES, WAYS,
VAR, BY, FORMAT, and WHERE statements are supported when PROC MEANS is
processed inside the database. FREQ, ID, IDMIN, IDMAX, and IDGROUPS are not
supported. The following statistics are supported for in-database processing: N,
NMISS, MIN, MAX, RANGE, SUM, SUMWGT, MEAN, CSS, USS, VAR, STD, STDERR,
PRT, UCLM, LCLM, CLM, and CV.
Weighting for in-database processing is supported only for N, NMISS, MIN, MAX,
RANGE, SUM, SUMWGT, and MEAN.
The following statistics are currently not supported for in-database processing:
SKEW, KURT, P1, P5, P10, P20, P25/Q1, P30, P40, P50/MEDIAN, P60, P70, P75/Q3,
P80, P90, P95, P99, and MODE.
In-database processing can greatly reduce the volume of data transferred to the
procedure if there are no class variables (one row is returned) or if the selected
class variables have a small number of unique values. However, because PROC
MEANS loads the result set into its internal structures, the memory requirements
for the SAS process are equivalent to what would have been required without in-
database processing. The CPU requirements for the SAS process should be
significantly reduced if the bulk of the data summarization occurs inside the
database. The real time required for summarization should be significantly reduced
because many databases process queries in parallel.
For more information about in-database processing, see SAS/ACCESS for Relational
Databases: Reference.
The value of the SAS system option CPUCOUNT= affects the performance of the
threaded sort. CPUCOUNT= suggests how many system CPUs are available for use
by the threaded procedures.
For more information, see “THREADS System Option” and “CPUCOUNT=
System Option” in SAS System Options: Reference.
Under the correct conditions, PROC MEANS generates and executes a CAS action
based on the statements that are used and the output statistics that are specified
in the PROC step. If class variables are specified, the procedure creates a
GROUPBY input table parameter that represents the n-way type. The result set
that is created when the action executes on the CAS server is read by SAS into the
internal PROC MEANS data structure. All subsequent types are derived from the
original n-way type to form the final analysis results.
For intrinsic formats, formatting of class variables occurs in CAS. For user-defined
formats, formatting of class variables occurs in CAS if the formats have been
defined in or copied to the server. If formats are not available on the CAS server,
initial aggregation occurs on the server using raw values, and the relevant formats
are applied by SAS as the result set is merged into PROC MEANS internal structure.
Multilabel formatting is always done by SAS using the initially aggregated result set
that is returned by CAS.
The CLASS, TYPES, WAYS, VAR, BY, FORMAT, and WHERE statements are
supported when PROC MEANS is processed inside the CAS server. FREQ, ID,
IDMIN, IDMAX, and IDGROUPS are not supported.
The following statistics are supported for CAS processing: N, NMISS, MIN, MAX,
RANGE, SUM, SUMWGT, MEAN, CSS, USS, VAR, STD, STDERR, PRT, UCLM,
LCLM, CLM, and CV.
Weighting for CAS processing is supported only for N, NMISS, MIN, MAX, RANGE,
SUM, SUMWGT, and MEAN.
The following statistics are currently not supported in CAS: SKEW, KURT, P1, P5,
P10, P20, P25/Q1, P30, P40, P50/MEDIAN, P60, P70, P75/Q3, P80, P90, P95, P99,
and MODE.
By default, when the DATA= input data set references an in-memory table or view
in CAS, the MEANS procedure runs on the CAS server when possible. There are
many data set options that prevent processing in CAS, such as OBS=, FIRSTOBS=,
and RENAME=.
Processing in CAS can reduce the volume of data that is transferred to the
procedure if there are no class variables (one row is returned) or if the selected
class variables have a small number of unique values. However, because PROC
MEANS loads the result set into its internal structure, the memory requirements for
the SAS process are equivalent to what would have been required when not
processing in CAS. The CPU requirements for the SAS process should be
significantly reduced if the bulk of the data summarization occurs inside the CAS
server. The real time required for summarization should be significantly
reduced because CAS can process the data in parallel. If the results of PROC
MEANS are directed back to the CAS server using an OUTPUT statement,
processing of intermediate aggregates must still be performed by the MEANS
procedure in SAS. If you want all processing to be performed in CAS, you can
invoke an appropriate action directly, using PROC CAS or one of the many other
CAS clients and languages.
Note: Intermediate results are always processed by the client regardless of the
final destination (CAS or the client).
Here is an example of how to run PROC MEANS with CAS. The CAS LIBNAME
engine is used to connect a SAS 9.4 session to an existing CAS session through the
CAS session name or the CAS session UUID. The resulting libref is then used by
SAS to communicate with the specific CAS session.
/* Connect to a CAS server */
cas casauto host="cloud.example.com" port=5570;
/* Specify the CAS engine LIBNAME statement and use the CAS engine libref */
Syntax: MEANS Procedure 1475
Syntax
PROC MEANS <options> <statistic-keyword(s)>;
MAXDEC=number
specifies the number of decimal places for the statistics.
NONOBS
suppresses reporting the total number of observations for each unique
combination of the class variables.
NOPRINT
suppresses all displayed output.
ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
orders the values of the class variables according to the specified order.
PRINTALLTYPES
displays the analysis for all requested combinations of class variables.
PRINTIDVARS
displays the values of the ID variables.
PRINT | NOPRINT
displays or suppresses the output.
STACKODSOUTPUT
produces an ODS output object.
Optional Arguments
ALPHA=value
specifies the confidence level to compute the confidence limits for the mean.
The percentage for the confidence limits is (1−value)×100. For example,
ALPHA=.05 results in 95% confidence limits.
Default .05
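The relationship between ALPHA= and the confidence percentage is simple arithmetic, sketched here in Python. The `confidence_percent` helper is hypothetical; the range check reflects the requirement that the value lie strictly between 0 and 1, and the rounding only hides floating-point noise.

```python
def confidence_percent(alpha):
    """Confidence percentage implied by ALPHA=alpha: (1 - alpha) * 100."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Round to suppress binary floating-point artifacts such as 94.99999999999999
    return round((1 - alpha) * 100, 10)


for alpha in (0.10, 0.05, 0.01):
    print(f"ALPHA={alpha} -> {confidence_percent(alpha)}% confidence limits")
```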
CHARTYPE
specifies that the _TYPE_ variable in the output data set is a character
representation of the binary value of _TYPE_. The length of the variable equals
the number of class variables.
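A short Python sketch of the _TYPE_ encoding, assuming the usual convention that each class variable contributes one bit and that the first variable in the CLASS statement is the most significant bit. With k class variables there are $2^k$ possible types, numbered 0 through $2^k - 1$; with CHARTYPE, the value is written as a k-character string of 0s and 1s. The helper names are hypothetical.

```python
def type_value(class_vars, active):
    """Numeric _TYPE_ for the set of active class variables."""
    k = len(class_vars)
    value = 0
    for position, name in enumerate(class_vars):
        if name in active:
            # The first CLASS variable maps to the most significant bit.
            value |= 1 << (k - 1 - position)
    return value


def chartype(value, k):
    """CHARTYPE-style representation: k binary digits, zero padded."""
    return format(value, f"0{k}b")


class_vars = ["a", "b", "c"]
t = type_value(class_vars, {"a", "c"})
print(t, chartype(t, len(class_vars)))   # 5 101
print(2 ** len(class_vars))              # 8 possible types
```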
CLASSDATA=SAS-data-set
specifies a data set that contains the combinations of values of the class
variables that must be present in the output. Any combinations of values of the
class variables that occur in the CLASSDATA= data set but not in the input data
set appear in the output and have a frequency of zero.
Restriction The CLASSDATA= data set must contain all class variables. Their
data type and format must match the corresponding class variables
in the input data set.
Interaction If you use the EXCLUSIVE option, then PROC MEANS excludes any
observation in the input data set whose combination of class
variables is not in the CLASSDATA= data set.
Tip Use the CLASSDATA= data set to filter or to supplement the input
data set.
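The sketch below combines CLASSDATA= with EXCLUSIVE. The data set WORK.LEVELS and its rows are hypothetical; its variables are defined to match the type and length of Sex and Age in SASHELP.CLASS, as the restriction above requires. Because no 17-year-old appears in SASHELP.CLASS, the F/17 combination is expected to appear with a frequency of zero.

```sas
/* Hypothetical CLASSDATA= data set with matching class variables */
data work.levels;
   length Sex $ 1 Age 8;
   input Sex $ Age;
   datalines;
F 11
M 11
F 17
;
run;

/* EXCLUSIVE restricts the analysis to the combinations in WORK.LEVELS */
proc means data=sashelp.class classdata=work.levels exclusive n mean;
   class Sex Age;
   var Height;
run;
```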
COMPLETETYPES
creates all possible combinations of class variables even if the combination
does not occur in the input data set.
DATA=SAS-data-set
identifies the input SAS data set.
DESCENDTYPES
orders observations in the output data set by descending _TYPE_ value.
Aliases DESCENDING
DESCEND
Tip Use DESCENDTYPES to make the overall total (_TYPE_=0) the last
observation in each BY group.
EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative) from
the analysis. By default, PROC MEANS treats observations with negative
weights like observations with zero weights and counts them in the total
number of observations.
Alias EXCLNPWGTS
EXCLUSIVE
excludes from the analysis all combinations of the class variables that are not
found in the CLASSDATA= data set.
FW=field-width
specifies the field width to display the statistics in printed or displayed output.
FW= has no effect on statistics that are saved in an output data set.
Default 12
Tip If PROC MEANS truncates column labels in the output, then increase
the field width.
INCAS=(YES | NO)
specifies whether to allow in-CAS processing.
YES
Use in-CAS processing. YES is the default.
NO
Do not use in-CAS processing.
IDMIN
specifies that the output data set contain the minimum value of the ID
variables.
MAXDEC=number
specifies the maximum number of decimal places to display the statistics in the
printed or displayed output. MAXDEC= has no effect on statistics that are
saved in an output data set.
Range 0-8
MISSING
considers missing values as valid values to create the combinations of class
variables. Special missing values that represent numeric values (the letters A
through Z and the underscore (_) character) are each considered as a separate
value.
Default If you omit MISSING, then PROC MEANS excludes the observations
with a missing class variable value from the analysis.
NONOBS
suppresses the column that displays the total number of observations for each
unique combination of the values of the class variables. This column
corresponds to the _FREQ_ variable in the output data set.
NOPRINT
See the “PRINT | NOPRINT” option.
NOTHREADS
See the “THREADS | NOTHREADS” option.
NOTRAP
disables floating point exception (FPE) recovery during data processing. By
default, PROC MEANS traps these errors and sets the statistic to missing.
NWAY
specifies that the output data set contain only statistics for the observations
with the highest _TYPE_ and _WAY_ values. When you specify class variables,
NWAY corresponds to the combination of all class variables.
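ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
orders the values of the class variables according to the specified order, where
DATA
orders values according to their order in the input data set.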
Interaction If you use PRELOADFMT in the CLASS statement, then the order
for the values of each class variable matches the order that
PROC FORMAT uses to store the values of the associated
user-defined format. If you use the CLASSDATA= option, then PROC
MEANS uses the order of the unique values of each class
variable in the CLASSDATA= data set to order the output levels.
If you use both options, then PROC MEANS first uses the
user-defined formats to order the output. If you omit EXCLUSIVE,
then PROC MEANS appends the unique values of the class variables
in the input data set, in the order in which they are encountered,
after the user-defined format values and the CLASSDATA= values.
FORMATTED
orders values by their ascending formatted values. This order depends on
your operating environment.
Aliases FMT
EXTERNAL
FREQ
orders values by descending frequency count so that levels with the most
observations are listed first.
UNFORMATTED
orders values by their unformatted values. This order depends on your
operating environment. This sort sequence is particularly useful for
displaying dates chronologically.
Aliases UNFMT
INTERNAL
Default UNFORMATTED
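A minimal sketch, assuming the SASHELP.CARS sample data set: ORDER=FREQ lists the level of Origin that has the most cars first, instead of the default unformatted order.

```sas
/* Levels of Origin are listed by descending frequency count */
proc means data=sashelp.cars order=freq n mean maxdec=0;
   class Origin;
   var MSRP;
run;
```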
PRINT | NOPRINT
specifies whether PROC MEANS displays the statistical analysis. NOPRINT
suppresses all the output.
Default PRINT
Tip Use NOPRINT when you want to create only an OUT= output data set.
Example “Example 12: Identifying the Top Three Extreme Values with the Output
Statistics” on page 1547
PRINTALLTYPES
displays all requested combinations of class variables (all _TYPE_ values) in the
printed or displayed output. Normally, PROC MEANS shows only the NWAY
type.
Alias PRINTALL
Interaction If you use the NWAY option, the TYPES statement, or the WAYS
statement, then PROC MEANS ignores this option.
PRINTIDVARS
displays the values of the ID variables in printed or displayed output.
Alias PRINTIDS
QMARKERS=number
specifies the default number of markers to use for the P² quantile estimation
method. The number of markers controls the size of fixed memory space.
Default The default value depends on which quantiles you request. For the
median (P50), number is 7. For the quartiles (P25 and P75), number is
25. For the quantiles P1, P5, P10, P90, P95, or P99, number is 105. If
you request several quantiles, then PROC MEANS uses the largest
value of number.
Tip Increase the number of markers above the default settings to improve
the accuracy of the estimate; reduce the number of markers to
conserve memory and computing time.
QMETHOD=OS | P2
specifies the method that PROC MEANS uses to process the input data when it
computes quantiles. If the number of observations is less than or equal to the
QMARKERS= value and QNTLDEF=5, then both methods produce the same
results.
OS
uses order statistics. This method is the same method that PROC
UNIVARIATE uses.
P2
uses the P² method to approximate the quantile.
Default OS
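The following sketch, assuming SASHELP.CARS, requests the P² approximation. QNTLDEF=5 is stated explicitly because QMETHOD=P2 requires it; the QMARKERS= value of 211 is an arbitrary illustration of raising the marker count above the default of 105 for the P10 and P90 quantiles.

```sas
/* Approximate quantiles with the P-squared method */
proc means data=sashelp.cars qmethod=p2 qntldef=5 qmarkers=211
           p10 median p90;
   var MSRP Horsepower;
run;
```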
QNTLDEF=1 | 2 | 3 | 4 | 5
specifies the mathematical definition that PROC MEANS uses to calculate
quantiles when QMETHOD=OS. To use QMETHOD=P2, you must use
QNTLDEF=5.
Alias PCTLDEF=
Default 5
statistic-keyword(s)
specifies which statistics to compute and the order to display them in the
output. The available keywords in the PROC statement are
CLM              NMISS
CSS              RANGE
CV               SKEWNESS | SKEW
KURTOSIS | KURT  STDDEV | STD
LCLM             STDERR
MAX              SUM
MEAN             SUMWGT
MIN              UCLM
MODE             USS
N                VAR
MEDIAN | P50     Q3 | P75
P1               P90
P5               P95
P10              P99
P20              P30
P40              P60
P70              P80
Q1 | P25         QRANGE
PROBT | PRT      T
Requirement To compute standard error, confidence limits for the mean, and the
Student's t-test, you must use the default value of the VARDEF=
option, which is DF. To compute skewness or kurtosis, you must
use VARDEF=N or VARDEF=DF.
See The definitions of the keywords and the formulas for the
associated statistics are listed in “Keywords and Formulas” on
page 2700.
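A minimal sketch, assuming SASHELP.CLASS: the statistic keywords select the statistics and fix their column order, and MAXDEC=2 affects only the displayed values.

```sas
/* Columns appear in the order N, MEAN, STDDEV, MEDIAN, Q1, Q3 */
proc means data=sashelp.class n mean stddev median q1 q3 maxdec=2;
   var Height Weight;
run;
```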
STACKODSOUTPUT
produces an ODS output object whose data set resembles the printed output.
Alias STACKODS
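The sketch below, assuming SASHELP.CLASS and a hypothetical output name WORK.STACKED, captures the PROC MEANS Summary table with the ODS OUTPUT statement. With STACKODSOUTPUT, each analysis variable is expected to occupy its own row, as in the printed output.

```sas
/* Capture the Summary ODS table in stacked form */
ods output Summary=work.stacked;
proc means data=sashelp.class stackodsoutput n mean std;
   class Sex;
   var Height Weight;
run;

proc print data=work.stacked;
run;
```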
SUMSIZE=value
specifies the amount of memory that is available for data summarization when
you use class variables. value might be one of the following:
n
nK
nM
nG
specifies the amount of memory available in bytes, kilobytes, megabytes, or
gigabytes, respectively. If n is 0, then PROC MEANS uses the value of the SAS
system option SUMSIZE=.
MAXIMUM
MAX
specifies the maximum amount of memory that is available.
Tip For best results, do not make SUMSIZE= larger than the amount of
physical memory that is available for the PROC step. If additional space
is needed, then PROC MEANS uses utility files.
See The SAS system option SUMSIZE= in SAS System Options: Reference.
THREADS | NOTHREADS
enables or disables parallel processing of the input data set. This option
overrides the SAS system option THREADS | NOTHREADS unless the system
option is restricted. (See Restriction.) For more information, see “Support for
Parallel Processing” in the SAS Language Reference: Concepts.
Interaction PROC MEANS honors the SAS system option THREADS except
when a BY statement is specified or the value of the SAS system
option CPUCOUNT is less than 2. You can use THREADS in the
PROC MEANS statement to force parallel processing in these situations.
VARDEF=divisor
specifies the divisor to use in the calculation of the variance and standard
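deviation. The following table shows the possible values for divisor and the
associated divisors.
Table 40.1 Possible Values for VARDEF=
Value          Divisor                     Formula for Divisor
DF             Degrees of freedom          $n - 1$
N              Number of observations      $n$
WDF            Sum of weights minus one    $\left(\sum_i w_i\right) - 1$
WEIGHT | WGT   Sum of weights              $\sum_i w_i$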
Default DF
Requirement To compute the standard error of the mean, confidence limits for
the mean, or the Student's t-test, use the default value of
VARDEF=.
Tips When you use the WEIGHT statement and VARDEF=DF, the
variance is an estimate of $\sigma^2$, where the variance of the ith
observation is $\operatorname{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith
observation. This method yields an estimate of the variance of an
observation with unit weight.
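A minimal sketch, assuming SASHELP.CLASS: VAR is the corrected sum of squares (CSS) divided by the VARDEF= divisor, so the same CSS produces two different variances.

```sas
/* Divisor n-1 (VARDEF=DF, the default) */
proc means data=sashelp.class vardef=df n css var std;
   var Height;
run;

/* Divisor n (VARDEF=N) */
proc means data=sashelp.class vardef=n n css var std;
   var Height;
run;
```

Because SASHELP.CLASS has 19 observations, the VAR value from the second step should equal 18/19 of the VAR value from the first step.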
BY Statement
Produces separate statistics for each BY group.
Syntax
BY <DESCENDING> variable-1 <<DESCENDING> variable-2 …> <NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you omit the NOTSORTED option in the BY
statement, then the observations in the data set either must be sorted by all the
variables that you specify or must be indexed appropriately. Variables in a BY
statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The observations are sorted in another way (for example, chronological
order).
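A minimal sketch, assuming SASHELP.CLASS and a hypothetical sorted copy named WORK.CLASS_SORTED. Because NOTSORTED is omitted, the input is sorted by the BY variable first.

```sas
/* BY processing requires input that is sorted (or indexed) by Sex */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

proc means data=work.class_sorted n mean;
   by Sex;
   var Height Weight;
run;
```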
Details
CLASS Statement
Specifies the variables whose values define the subgroup combinations for the analysis.
Note: CLASS statements without options use ORDER=INTERNAL, which is the default,
or the value specified by the ORDER= option in the PROC MEANS statement. For
example, in the following code, variables c and d would use ORDER=INTERNAL. If
an ORDER= option had been specified in the PROC MEANS statement, then
variables c and d would use the value specified by the ORDER= option in the PROC
MEANS statement.
class a b / order=data;
class c d;
Example: “Example 12: Identifying the Top Three Extreme Values with the Output Statistics”
on page 1547
Syntax
CLASS variable(s) </ options>;
Required Argument
variable(s)
specifies one or more variables that the procedure uses to group the data.
Variables in a CLASS statement are referred to as class variables. Class
variables are numeric or character. Class variables can have continuous values,
but they typically have a few discrete values that define levels of the variable.
You do not have to sort the data by class variables.
Interaction Use the TYPES statement or the WAYS statement to control which
class variables PROC MEANS uses to group the data.
Optional Arguments
ASCENDING
specifies to sort the class variable levels in ascending order.
Alias ASCEND
DESCENDING
specifies to sort the class variable levels in descending order.
Alias DESCEND
EXCLUSIVE
excludes from the analysis all combinations of the class variables that are not
found in the preloaded range of user-defined formats.
GROUPINTERNAL
specifies not to apply formats to the class variables when PROC MEANS groups
the values to create combinations of class variables.
Tip This option saves computer resources when the numeric class
variables contain discrete values.
MISSING
considers missing values as valid values for the class variable levels. Special
missing values that represent numeric values (the letters A through Z and the
underscore (_) character) are each considered as a separate value.
Default If you omit MISSING, then PROC MEANS excludes the observations
with a missing class variable value from the analysis.
MLF
enables PROC MEANS to use the primary and secondary format labels for a
given range or overlapping ranges to create subgroup combinations when a
multilabel format is assigned to a class variable.
Requirement You must use PROC FORMAT and the MULTILABEL option in the
VALUE statement to create a multilabel format.
Interactions If you use the OUTPUT statement with MLF, then the class
variable contains a character string that corresponds to the
formatted value. Because the formatted value becomes the
internal value, the length of this variable is the number of
characters in the longest format label.
Using MLF with ORDER=FREQ might not produce the order that
you expect for the formatted values. You might not get the
expected results when you use MLF with CLASSDATA and
EXCLUSIVE because MLF processing requires that each TYPE be
Note When the formatted values overlap, one internal class variable
value maps to more than one class variable subgroup combination.
Therefore, the sum of the N statistics for all subgroups is greater
than the number of observations in the data set (the overall N
statistic).
Tip If you omit MLF, then PROC MEANS uses the primary format
labels. This action corresponds to using the first external format
value to determine the subgroup combinations.
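The sketch below, assuming SASHELP.CLASS, defines a hypothetical multilabel format AGEGRP. in which age 13 falls into two overlapping ranges. With MLF, each matching label forms its own subgroup, so the N values summed across the subgroups should exceed the 19 observations in the data set, as described in the Note above.

```sas
/* Hypothetical multilabel format with overlapping ranges */
proc format;
   value agegrp (multilabel)
      11 - 12 = 'Ages 11-12'
      11 - 13 = 'Ages 11-13'
      13 - 16 = 'Ages 13-16';
run;

/* MLF uses every label that applies to a value */
proc means data=sashelp.class n mean;
   class Age / mlf;
   format Age agegrp.;
   var Height;
run;
```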
ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
specifies the order to group the levels of the class variables in the output, where
DATA
orders values according to their order in the input data set.
Interaction If you use PRELOADFMT, then the order of the values of each
class variable matches the order that PROC FORMAT uses to
store the values of the associated user-defined format. If you use
the CLASSDATA= option in the PROC statement, then PROC
MEANS uses the order of the unique values of each class
variable in the CLASSDATA= data set to order the output levels.
If you use both options, then PROC MEANS first uses the
user-defined formats to order the output. If you omit EXCLUSIVE in
the PROC statement, then PROC MEANS appends the unique values
of the class variables in the input data set, in the order in which
they are encountered, after the user-defined format values and
the CLASSDATA= values.
FORMATTED
orders values by their ascending formatted values. This order depends on
your operating environment. If no format has been assigned to a class
variable, then the default format, BEST12., is used.
Aliases FMT
EXTERNAL
FREQ
orders values by descending frequency count so that levels with the most
observations are listed first.
UNFORMATTED
orders values by their unformatted values. This order depends on your
operating environment. This sort sequence is particularly useful for
displaying dates chronologically.
Aliases UNFMT
INTERNAL
Default UNFORMATTED
PRELOADFMT
specifies that all formats are preloaded for the class variables.
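A minimal sketch, assuming SASHELP.CLASS and a hypothetical format $SEXFMT. that includes a level ('U') absent from the data. As we understand it, PRELOADFMT takes effect only together with COMPLETETYPES, EXCLUSIVE, or ORDER=DATA; here COMPLETETYPES is used, so the Unknown level should be reported with a frequency of zero.

```sas
/* Hypothetical format with a level that does not occur in the data */
proc format;
   value $sexfmt 'F' = 'Female'
                 'M' = 'Male'
                 'U' = 'Unknown';
run;

/* PRELOADFMT + COMPLETETYPES reports every formatted level */
proc means data=sashelp.class completetypes n mean;
   class Sex / preloadfmt;
   format Sex $sexfmt.;
   var Height;
run;
```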
Details
When you use the NWAY option, PROC MEANS might encounter insufficient
memory for the summarization of all the class variables. You can move some class
variables to the BY statement. For maximum benefit, move the class variables
that are already sorted or that have the greatest number of unique values to
the BY statement.
You can use the CLASS and BY statements together to analyze the data by the
levels of class variables within BY groups. See “Example 3: Using the BY Statement
with Class Variables” on page 1521.
Specifying the MISSING option in the CLASS statement enables you to control the
acceptance of missing values for individual class variables.
Computer Resources
The total of unique class variable values that PROC MEANS allows depends on the
amount of computer memory that is available. See “Computational Resources” on
page 1469 for more information.
FREQ Statement
Specifies a numeric variable that contains the frequency of each observation.
Syntax
FREQ variable;
Required Argument
variable
specifies a numeric variable whose value represents the frequency of the
observation. If you use the FREQ statement, then the procedure assumes that
each observation represents n observations, where n is the value of variable. If n
is not an integer, then SAS truncates it. If n is less than 1 or is missing, then the
procedure does not use that observation to calculate statistics.
The sum of the frequency variable represents the total number of observations.
Note: The FREQ variable does not affect how PROC MEANS identifies multiple
extremes when you use the IDGROUP syntax in the OUTPUT statement.
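A minimal sketch with a small hypothetical data set, WORK.COUNTS, in which each row stands for Count identical observations.

```sas
/* Each row represents Count observations with the same Score */
data work.counts;
   input Score Count;
   datalines;
70 3
85 5
92 2
;
run;

proc means data=work.counts n mean;
   freq Count;
   var Score;
run;
```

N should be 10 (3 + 5 + 2), and the mean should be (70×3 + 85×5 + 92×2) / 10 = 819 / 10 = 81.9.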
ID Statement
Includes additional variables in the output data set.
Syntax
ID variable(s);
Required Argument
variable(s)
identifies one or more variables from the input data set whose maximum values
for groups of observations PROC MEANS includes in the output data set.
Interaction Use IDMIN in the PROC statement to include the minimum value of
the ID variables in the output data set.
Tip Use the PRINTIDVARS option in the PROC statement to include the
value of the ID variable in the displayed output.
Details
See “Sorting Orders for Character Variables” on page 2359 for information about
how PROC MEANS compares character values to determine the maximum value.
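A minimal sketch, assuming SASHELP.CLASS and a hypothetical output name WORK.HT_BY_SEX. Because IDMIN is specified, the output keeps the minimum (alphabetically first) value of Name in each Sex group; without IDMIN, it keeps the maximum.

```sas
/* ID Name with IDMIN: keep the smallest Name value per group */
proc means data=sashelp.class noprint nway idmin;
   class Sex;
   id Name;
   var Height;
   output out=work.ht_by_sex mean=MeanHeight;
run;
```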
OUTPUT Statement
Writes statistics to a new SAS data set.
Tip: You can use multiple OUTPUT statements to create several OUT= data sets.
Examples: “Example 8: Computing Output Statistics” on page 1537
“Example 9: Computing Different Output Statistics for Several Variables” on page
1539
“Example 10: Computing Output Statistics with Missing Class Variable Values” on
page 1542
“Example 11: Identifying an Extreme Value with the Output Statistics” on page 1544
“Example 12: Identifying the Top Three Extreme Values with the Output Statistics”
on page 1547
Syntax
OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)>
<id-group-specification(s)> <maximum-id-specification(s)>
<minimum-id-specification(s)> </ options>;
Optional Arguments
OUT=SAS-data-set
names the new output data set. If SAS-data-set does not exist, then PROC
MEANS creates it. If you omit OUT=, then the data set is named DATAn, where
n is the smallest integer that makes the name unique.
Default DATAn
Tip You can use data set options with the OUT= option.
output-statistic-specification(s)
specifies the statistics to store in the OUT= data set and names one or more
variables that contain the statistics. The form of the output-statistic-
specification is
statistic-keyword<(variable-list)>=<name(s)>
where
statistic-keyword
specifies which statistic to store in the output data set. The available
statistic keywords are
CLM              NMISS
CSS              RANGE
CV               SKEWNESS | SKEW
KURTOSIS | KURT  STDDEV | STD
LCLM             STDERR
MAX              SUM
MEAN             SUMWGT
MIN              UCLM
MODE             USS
N                VAR
MEDIAN | P50     Q3 | P75
P1               P90
P5               P95
P10              P99
P20              P30
P40              P60
P70              P80
Q1 | P25         QRANGE
PROBT | PRT      T
By default, the statistics in the output data set automatically inherit the
analysis variable's format, informat, and label. However, statistics computed
for N, NMISS, SUMWGT, USS, CSS, VAR, CV, T, PROBT, PRT, SKEWNESS,
and KURTOSIS do not inherit the analysis variable's format because this
format might be invalid for these statistics (for example, dollar or datetime
formats).
Restriction If you omit variable and name(s), then PROC MEANS allows the
statistic-keyword only once in a single OUTPUT statement,
unless you also use the AUTONAME option.
Example “Example 12: Identifying the Top Three Extreme Values with the
Output Statistics” on page 1547
variable-list
specifies the names of one or more numeric analysis variables whose
statistics you want to store in the output data set.
name(s)
specifies one or more names for the variables in the output data set that
contain the analysis variable statistics. The first name contains the statistic
for the first analysis variable; the second name contains the statistic for the
second analysis variable; and so on.
Default the analysis variable name. If you specify AUTONAME, then the
default is the combination of the analysis variable name and the
statistic-keyword. If you use the CLASS statement and an
OUTPUT statement without an output-statistic-specification,
then the output data set contains five observations for each
combination of class variables: the value of N, MIN, MAX, MEAN,
and STD. If you use the WEIGHT statement or the WEIGHT
option in the VAR statement, then the output data set also
contains an observation with the sum of weights (SUMWGT) for
each combination of class variables.
Interaction If you specify variable-list, then PROC MEANS uses the order in
which you specify the analysis variables to store the statistics in
the output data set variables.
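A minimal sketch, assuming SASHELP.CLASS; the output data set and variable names are hypothetical. Because variable-list is given for MEAN, AvgHeight and AvgWeight receive the statistics for Height and Weight, in that order.

```sas
/* One observation per Sex (NWAY) with explicitly named statistics */
proc means data=sashelp.class noprint nway;
   class Sex;
   var Height Weight;
   output out=work.summary
          mean(Height Weight)=AvgHeight AvgWeight
          max(Height)=MaxHeight;
run;

proc print data=work.summary;
run;
```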
id-group-specification
combines and extends the features of the ID statement, the IDMIN option in the
PROC statement, and the MAXID and MINID options in the OUTPUT statement
to create an OUT= data set that identifies multiple extreme values. The form of
the id-group-specification is
IDGROUP (<MIN | MAX (variable-list-1) <…MIN | MAX (variable-list-n)>>
<<MISSING> <OBS> <LAST>> OUT <[n]> (id-variable-list)=<name(s)>)
MIN | MAX (variable-list)
specifies the selection criteria to determine the extreme values of one or
more input data set variables specified in variable-list. Use MIN to determine
the minimum extreme value and MAX to determine the maximum extreme
value.
Default If you do not specify MIN or MAX, then PROC MEANS uses the
observation number as the selection criterion to output
observations.
Restriction If you specify criteria that are contradictory, then PROC MEANS
uses only the first selection criterion.
LAST
specifies that the OUT= data set contains values from the last observation
(or the last n observations, if n is specified). If you do not specify LAST, then
the OUT= data set contains values from the first observation (or the first n
observations, if n is specified). The OUT= data set might contain several
observations because in addition to the value of the last (first) observation,
the OUT= data set contains values from the last (first) observation of each
subgroup level that is defined by combinations of class variable values.
Interaction When you specify MIN or MAX and when multiple observations
contain the same extreme values, PROC MEANS uses the
observation number to resolve which observation to save to the
OUT= data set. If you specify LAST, then PROC MEANS uses the
later observations to resolve any ties. If you do not specify LAST,
then PROC MEANS uses the earlier observations to resolve any
ties.
MISSING
specifies that missing values be used in selection criteria.
Alias MISS
OBS
includes an _OBS_ variable in the OUT= data set that contains the number of
the observation in the input data set where the extreme value was found.
Interactions If you use WHERE processing, then the value of _OBS_ might
not correspond to the location of the observation in the input
data set.
[n]
specifies the number of extreme values for each variable in id-variable-list to
include in the OUT= data set. PROC MEANS creates n new variables and uses
the suffix _i, where i runs from 1 to n, to create their names.
By default, PROC MEANS determines one extreme value for each level of
each requested type. If n is greater than one, then n extremes are generated
for each level of each type. When n is greater than one and you request
extreme value selection, the time complexity is $O(T \cdot N \log_2 n)$, where T is
the number of types requested and N is the number of observations in the
input data set. By comparison, to group the entire data set, the time
complexity is $O(N \log_2 N)$.
Default 1
Example For example, to generate two minimum extreme values for each
variable, use
idgroup(min(x) out[2](x y z)=MinX MinY MinZ);
The OUT= data set contains the variables MinX_1, MinX_2, MinY_1,
MinY_2, MinZ_1, and MinZ_2.
(id-variable-list)
identifies one or more input data set variables whose values PROC MEANS
includes in the OUT= data set. PROC MEANS determines which
observations to generate by the selection criteria that you specify (MIN,
MAX, and LAST).
Alias IDGRP
Requirement You must specify the MIN | MAX selection criteria first and
OUT(id-variable-list)= after the suboptions MISSING, OBS, and
LAST.
Tip When you want the output data set to contain extreme values
along with other ID variables, it is more efficient to include
them in the id-variable-list than to request separate statistics.
For example, the statement output idgrp(max(x) out(x a b)= );
is more efficient than the statement
output idgrp(max(x) out(a b)= ) max(x)=;
Example “Example 12: Identifying the Top Three Extreme Values with the
Output Statistics” on page 1547
name(s)
specifies one or more names for variables in the OUT= data set.
Default If you omit name, then PROC MEANS uses the names of variables in
the id-variable-list.
CAUTION
The IDGROUP syntax enables you to create output variables with the same
name. When this action happens, only the first variable appears in the output data
set. Use the AUTONAME option to automatically resolve these naming conflicts.
Note: If you specify fewer new variable names than the combination of analysis
variables and identification variables, then the remaining output variables use
the corresponding names of the ID variables as soon as PROC MEANS exhausts
the list of new variable names.
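A minimal sketch, assuming SASHELP.CLASS and a hypothetical output name WORK.TALLEST. Because no names follow the equal sign, the output variables are expected to take the ID variable names plus a numeric suffix (Height_1 through Height_3 and Name_1 through Name_3).

```sas
/* Keep the three tallest students (height and name) for each Sex */
proc means data=sashelp.class noprint nway;
   class Sex;
   var Height;
   output out=work.tallest
          idgroup(max(Height) out[3](Height Name)=);
run;
```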
maximum-id-specification(s)
specifies that one or more identification variables be associated with the
maximum values of the analysis variables. The form of the
maximum-id-specification is
MAXID <(variable-1 <(id-variable-list-1)> <…variable-n <(id-variable-list-n)>>)> = name(s)
variable
identifies the numeric analysis variable whose maximum values PROC
MEANS determines. PROC MEANS can determine several maximum values
for a variable because, in addition to the overall maximum value, subgroup
levels, which are defined by combinations of class variable values, also
have maximum values.
Tip If you use an ID statement and omit variable, then PROC MEANS uses
all analysis variables.
id-variable-list
identifies one or more variables whose values identify the observations with
the maximum values of the analysis variable.
name(s)
specifies the names for new variables that contain the values of the
identification variable associated with the maximum value of each analysis
variable.
Tip If you use an ID statement and omit variable and id-variable-list, then
PROC MEANS associates all ID statement variables with each
analysis variable. Thus, for each analysis variable, the number of
variables that are created in the output data set equals the number of
variables that you specify in the ID statement.
See “Example 11: Identifying an Extreme Value with the Output Statistics”
on page 1544
CAUTION
The MAXID syntax enables you to create output variables with the same
name. When this action happens, only the first variable appears in the output data
set. Use the AUTONAME option to automatically resolve these naming conflicts.
Note: If you specify fewer new variable names than the combination of analysis
variables and identification variables, then the remaining output variables use
the corresponding names of the ID variables as soon as PROC MEANS exhausts
the list of new variable names.
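A minimal sketch, assuming SASHELP.CLASS; the output names are hypothetical. For each Sex group, TallestName and HeaviestName hold the Name value from the observation with the maximum Height and the maximum Weight, respectively.

```sas
/* Associate Name with the maximum of each analysis variable */
proc means data=sashelp.class noprint nway;
   class Sex;
   var Height Weight;
   output out=work.maxes
          max=MaxHeight MaxWeight
          maxid(Height(Name) Weight(Name))=TallestName HeaviestName;
run;
```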
minimum-id-specification
See the description of maximum-id-specification. This option behaves in exactly
the same way, except that PROC MEANS determines the minimum values
instead of the maximum values. The form of the minimum-id-specification is
MINID <(variable-1 <(id-variable-list-1)> <…variable-n <(id-variable-list-n)>>)> = name(s)
When MINID is used without an explicit variable list, it is similar to the following
more advanced IDGROUP syntax example:
If one or more of the analysis variables has a missing value, the id_variable value
corresponds to the observation with the missing value, not the observation with
the value for the MIN statistic.
option
can be one of the following items:
AUTOLABEL
specifies that PROC MEANS appends the statistic name to the end of the
variable label. If an analysis variable has no label, then PROC MEANS
creates a label by appending the statistic name to the analysis variable
name.
See “Example 12: Identifying the Top Three Extreme Values with the Output
Statistics” on page 1547
AUTONAME
specifies that PROC MEANS creates a unique variable name for an output
statistic when you do not assign the variable name in the OUTPUT
statement. This action is accomplished by appending the statistic-keyword
to the end of the input variable name from which the statistic was
derived.
See “Example 12: Identifying the Top Three Extreme Values with the Output
Statistics” on page 1547
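A minimal sketch, assuming SASHELP.CLASS and a hypothetical output name WORK.AUTO. MEAN= and STD= are requested without names; AUTONAME builds names such as Height_Mean, and AUTOLABEL appends the statistic name to each label. PROC CONTENTS lists the resulting names and labels.

```sas
/* Let PROC MEANS generate unique names and labels */
proc means data=sashelp.class noprint nway;
   class Sex;
   var Height Weight;
   output out=work.auto mean= std= / autoname autolabel;
run;

proc contents data=work.auto;
run;
```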
KEEPLEN
specifies that statistics in the output data set inherit the length of the
analysis variable that PROC MEANS uses to derive them.
CAUTION
You permanently lose numeric precision when the length of the
analysis variable causes PROC MEANS to truncate or round the value
of the statistic. However, the precision of the statistic matches that of
the input.
LEVELS
includes a variable named _LEVEL_ in the output data set. This variable
contains a value from 1 to n that indicates a unique combination of the
values of class variables (the values of the _TYPE_ variable).
NOINHERIT
specifies that the variables in the output data set that contain statistics do
not inherit the attributes (label and format) of the analysis variables that are
used to derive them.
Tip By default, the output data set includes one output variable for
each analysis variable and five observations that contain N,
MIN, MAX, MEAN, and STDDEV. Unless you specify NOINHERIT,
this variable inherits the format of the analysis variable, which
can be invalid for the N statistic (for example, datetime formats).
WAYS
includes a variable named _WAY_ in the output data set. This variable
contains a value from 1 to the maximum number of class variables that
indicates how many class variables PROC MEANS combines to create the
TYPE value.
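A minimal sketch, assuming SASHELP.CLASS and a hypothetical output name WORK.TYPED. Without NWAY, all types are written; LEVELS adds the _LEVEL_ variable and WAYS adds the _WAY_ variable.

```sas
/* Add _LEVEL_ and _WAY_ to the output data set */
proc means data=sashelp.class noprint;
   class Sex Age;
   var Height;
   output out=work.typed mean=MeanHeight / levels ways;
run;

proc print data=work.typed;
   var _TYPE_ _WAY_ _LEVEL_ Sex Age _FREQ_ MeanHeight;
run;
```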
TYPES Statement
Identifies which of the possible combinations of class variables to generate.
Syntax
TYPES request(s);
Required Argument
request(s)
specifies which of the $2^k$ combinations of class variables PROC MEANS uses to
create the types, where k is the number of class variables. A request includes
one class variable name, several class variable names separated by asterisks, or
().
To request class variable combinations quickly, use a grouping syntax by placing
parentheses around several variables and joining other variables or variable
combinations. For example, the following statements illustrate grouping syntax:
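Request                Equivalent to
types A*(B C);         types A*B A*C;
types (A B)*(C D);     types A*C A*D B*C B*D;
types (A B C)*D;       types A*D B*D C*D;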
Tip If you do not need all types in the output data set, then use the
TYPES statement to specify particular subtypes rather than
applying a WHERE clause to the data set. Doing so saves time and
computer memory.
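Details
For example, if you specify
class A B C;
types (A B)*C;
then the B*C analysis (_TYPE_=3) is written first, followed by the A*C analysis
(_TYPE_=5). However, if you specify
class B A C;
types (A B)*C;
then the A*C analysis (_TYPE_=3) is written first, followed by the B*C analysis
(_TYPE_=5).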
The _TYPE_ variable is calculated even if no output data set is requested. For more
information about the _TYPE_ variable, see “Output Data Set” on page 1514.
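A minimal sketch, assuming SASHELP.CLASS and a hypothetical output name WORK.SELECTED. With CLASS Sex Age, Sex contributes 2 and Age contributes 1 to _TYPE_, so this TYPES statement should produce only _TYPE_ values 0, 2, and 3.

```sas
/* Request only the overall, Sex, and Sex*Age types */
proc means data=sashelp.class noprint;
   class Sex Age;
   var Height;
   types () Sex Sex*Age;
   output out=work.selected mean=MeanHeight;
run;
```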
VAR Statement
Identifies the analysis variables and their order in the output.
Default: If you omit the VAR statement, then PROC MEANS analyzes all numeric variables
that are not listed in the other statements. When all variables are character
variables, PROC MEANS produces a simple count of observations.
Tip: You can use multiple VAR statements.
See: Chapter 69, “SUMMARY Procedure,” on page 2455
Example: “Example 1: Computing Specific Descriptive Statistics” on page 1516
Syntax
VAR variable(s) </ WEIGHT=weight-variable>;
Required Argument
variable(s)
identifies the analysis variables and specifies their order in the results.
Optional Argument
WEIGHT=weight-variable
specifies a numeric variable whose values weight the values of the variables
that are specified in the VAR statement. The variable does not have to be an
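integer. The following table describes how PROC MEANS treats various values
of the WEIGHT variable.
Weight Value    PROC MEANS Action
0               Counts the observation in the total number of observations
Less than 0     Converts the value to zero and counts the observation in
                the total number of observations
Missing         Excludes the observation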
To exclude observations that contain negative and zero weights from the
analysis, use EXCLNPWGT. Note that most SAS/STAT procedures, such as
PROC GLM, exclude negative and zero weights by default.
The weight variable does not change how the procedure determines the range,
extreme values, or number of missing values.
Skewness and kurtosis are not available with the WEIGHT option.
Note Prior to Version 7 of SAS, the procedure did not exclude the
observations with missing weights from the count of observations.
Tips When you use the WEIGHT option, consider which value of the
VARDEF= option is appropriate. See the discussion of VARDEF=.
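WEIGHT= belongs to an individual VAR statement, so one PROC MEANS step can weight some analysis variables and leave others unweighted. In this sketch, the data set WORK.PRICES and its variables Price, Qty, and Rating are hypothetical.

```sas
/* Hypothetical data: unit price, quantity sold, and a customer rating */
data work.prices;
   input Price Qty Rating;
   datalines;
10 5 4
12 1 5
9 10 3
11 0 4
;
run;

proc means data=work.prices n mean sumwgt;
   var Price / weight=Qty;   /* Price is weighted by Qty */
   var Rating;               /* Rating is not weighted */
run;
```

Because Price is weighted, its mean is (10·5 + 12·1 + 9·10 + 11·0)/16 = 9.5 rather than the unweighted 10.5. The zero weight on the fourth observation still counts toward N for Price unless EXCLNPWGT is specified.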
WAYS Statement
Specifies the number of ways to make unique combinations of class variables.
Tip: Use the TYPES statement to specify additional combinations of class variables.
Example: “Example 6: Using Preloaded Formats with Class Variables” on page 1531
Syntax
WAYS list;
Required Argument
list
specifies one or more integers that define the number of class variables to
combine to form all the unique combinations of class variables. For example,
you can specify 2 for all possible pairs and 3 for all possible triples. The list can
be specified in the following ways:
■ m
■ m1 m2 … mn
■ m1,m2,…,mn
■ m TO n <BY increment>
Example The following code is an example of creating two-way types for the
classification variables A, B, and C. This WAYS statement is
equivalent to specifying a*b, a*c, and b*c in the TYPES statement.
class A B C;
ways 2;
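A runnable form of a WAYS request needs data. The following sketch assumes the SASHELP.CARS sample data set, with Origin, Type, and DriveTrain as the three class variables. It uses the m TO n form of the list. The WAYS option in the OUTPUT statement, the same option used in Example 8, adds the _WAY_ variable, so each output observation records how many class variables it combines.

```sas
/* WAYS 1 TO 2 requests every one-way and two-way combination of the three class variables */
proc means data=sashelp.cars noprint;
   class Origin Type DriveTrain;
   var MSRP;
   ways 1 to 2;
   output out=work.ways_demo mean=MeanMSRP / ways;
run;

/* _WAY_=1 pairs with _TYPE_ 1, 2, 4; _WAY_=2 pairs with _TYPE_ 3, 5, 6 */
proc freq data=work.ways_demo;
   tables _WAY_*_TYPE_ / list nocum nopercent;
run;
```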
WEIGHT Statement
Specifies weights for observations in the statistical calculations.
See: For information about how to calculate weighted statistics and for an example that
uses the WEIGHT statement, see “WEIGHT” on page 82.
Syntax
WEIGHT variable;
Required Argument
variable
specifies a numeric variable whose values weight the values of the analysis
variables. The values of the variable do not have to be integers. The following
table describes how PROC MEANS treats various values of the WEIGHT
variable.
Weight value     Action by PROC MEANS
0                Counts the observation in the total number of observations
Less than 0      Converts the value to zero and counts the observation
                 in the total number of observations
Missing          Excludes the observation
To exclude observations that contain negative and zero weights from the
analysis, use EXCLNPWGT. Note that most SAS/STAT procedures, such as
PROC GLM, exclude negative and zero weights by default.
CAUTION
Single extreme weight values can cause inaccurate results. When one (and
only one) weight value is many orders of magnitude larger than the other weight
values (for example, 49 weight values of 1 and one weight value of $1 \times 10^{14}$), certain
statistics might not be within acceptable accuracy limits. The affected statistics are
based on the second moment (such as standard deviation, corrected sum of
squares, variance, and standard error of the mean). Under certain circumstances, no
warning is written to the SAS log.
Note Prior to Version 7 of SAS, the procedure did not exclude the
observations with missing weights from the count of observations.
Tip When you use the WEIGHT statement, consider which value of the
VARDEF= option is appropriate. See the discussion of VARDEF=
and the calculation of weighted statistics in “Keywords and
Formulas” on page 2700 for more information.
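The following sketch contrasts the default treatment of weights with EXCLNPWGT. The data set WORK.W is hypothetical. Its weights include a positive, a zero, a negative, and a missing value, so each case described for the WEIGHT variable occurs.

```sas
/* Hypothetical data: weights cover positive, zero, negative, and missing values */
data work.w;
   input X Wt;
   datalines;
5 2
7 1
9 0
4 -3
6 .
;
run;

/* Default: zero and negative weights are counted (negative treated as 0); missing weight excluded */
proc means data=work.w n sumwgt mean;
   weight Wt;
   var X;
run;

/* EXCLNPWGT: observations with zero or negative weights are also excluded */
proc means data=work.w n sumwgt mean exclnpwgt;
   weight Wt;
   var X;
run;
```

Zero and negative weights add nothing to the weighted sum. Both runs should therefore report the same weighted mean, (5·2 + 7·1)/3 = 17/3. What changes is N. By default, N includes the zero-weight and negative-weight observations; with EXCLNPWGT, it excludes them. The observation with a missing weight is excluded in both runs.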
The computational details for confidence limits, hypothesis test statistics, and
quantile statistics follow.
Confidence Limits
With the keywords CLM, LCLM, and UCLM, you can compute confidence limits for
the mean. A confidence limit is a range, constructed around the value of a sample
statistic, that contains the corresponding true population value with a given
probability in repeated sampling. That probability is $1-\alpha$, where $\alpha$ is the
value of the ALPHA= option.
A two-sided $100(1-\alpha)\%$ confidence interval for the mean has upper and lower
limits

$$\bar{x} \pm t_{1-\alpha/2;\,n-1} \frac{s}{\sqrt{n}}$$

where $s$ is $\sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$ and $t_{1-\alpha/2;\,n-1}$ is the $(1-\alpha/2)$ critical value of the
Student's t distribution with $n-1$ degrees of freedom.
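You can check the formula against the LCLM and UCLM keywords. This sketch assumes the SASHELP.CLASS sample data set, with Height as the analysis variable, and uses ALPHA=0.1 for 90% limits. The DATA step recomputes the limits with the QUANTILE function so that you can compare the two sets of values side by side.

```sas
/* PROC MEANS limits for the mean of Height at alpha = 0.1 */
proc means data=sashelp.class noprint alpha=0.1;
   var Height;
   output out=work.ci n=N mean=Mean std=SD lclm=LCLM uclm=UCLM;
run;

/* Recompute x-bar +/- t(1 - alpha/2; n - 1) * s / sqrt(n) */
data work.ci_check;
   set work.ci;
   alpha = 0.1;
   tcrit = quantile('T', 1 - alpha/2, N - 1);   /* critical value of Student's t, N - 1 df */
   Lower = Mean - tcrit * SD / sqrt(N);
   Upper = Mean + tcrit * SD / sqrt(N);
run;

proc print data=work.ci_check noobs;
   var N Mean SD LCLM Lower UCLM Upper;
run;
```

Lower and Upper should agree with LCLM and UCLM up to rounding.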
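If you use the WEIGHT statement or WEIGHT= in a VAR statement and the default
value of VARDEF=, which is DF, the $100(1-\alpha)\%$ confidence interval for the
weighted mean has upper and lower limits

$$\bar{y}_w \pm t_{1-\alpha/2} \frac{s_w}{\sqrt{\sum_{i=1}^{n} w_i}}$$

where $\bar{y}_w$ is the weighted mean and $s_w$ is the weighted standard deviation.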
Student's t Test
PROC MEANS calculates the t statistic as

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $n$ is the number of nonmissing values for a variable,
and $s$ is the sample standard deviation. Under the null hypothesis, the population
mean equals $\mu_0$. When the data values are approximately normally distributed, the
probability under the null hypothesis of a t statistic as extreme as, or more extreme
than, the observed value (the p-value) is obtained from the t distribution with $n-1$
degrees of freedom. For large $n$, the t statistic is asymptotically equivalent to a z
test.
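A similar check works for the t statistic. This sketch assumes that the T and PRT keywords test the hypothesis that the population mean is 0. To test a different hypothesized mean, here the illustrative value 60 for Height in SASHELP.CLASS, it first subtracts that value in a DATA step. A second DATA step then recomputes t and its two-sided p-value with the PROBT function.

```sas
/* Shift the data so that H0: mean(Height) = 60 becomes H0: mean(HeightDiff) = 0 */
data work.class_shift;
   set sashelp.class;
   HeightDiff = Height - 60;
run;

proc means data=work.class_shift noprint;
   var HeightDiff;
   output out=work.ttest n=N mean=Mean std=SD t=T prt=PrT;
run;

/* Recompute the statistic and its two-sided p-value with N - 1 degrees of freedom */
data work.ttest_check;
   set work.ttest;
   t_formula = Mean / (SD / sqrt(N));
   p_formula = 2 * (1 - probt(abs(t_formula), N - 1));
run;

proc print data=work.ttest_check noobs;
   var N Mean SD T t_formula PrT p_formula;
run;
```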
When you use the WEIGHT statement or WEIGHT= in a VAR statement and the
default value of VARDEF=, which is DF, the Student's t statistic is calculated as

$$t_w = \frac{\bar{y}_w - \mu_0}{s_w \big/ \sqrt{\sum_{i=1}^{n} w_i}}$$

where $\bar{y}_w$ is the weighted mean, $s_w$ is the weighted standard deviation, and $w_i$ is the
weight for the ith observation. The $t_w$ statistic is treated as having a Student's t
distribution with $n-1$ degrees of freedom. If you specify the EXCLNPWGT option
in the PROC statement, then $n$ is the number of nonmissing observations when the
value of the WEIGHT variable is positive. By default, $n$ is the number of nonmissing
observations for the WEIGHT variable.
Quantiles
The options QMETHOD=, QNTLDEF=, and QMARKERS= determine how PROC
MEANS calculates quantiles. QNTLDEF= deals with the mathematical definition of
a quantile. See “Quantile and Related Statistics” on page 2706. QMETHOD= deals
with the mechanics of how PROC MEANS handles the input data. The two methods
are
Results: MEANS Procedure 1513
OS
reads all data into memory and sorts it by unique value.
P2
accumulates all data into a fixed sample size that is used to approximate the
quantile.
If data set A has 100 unique values for a numeric variable X and data set B has
1,000 unique values for numeric variable X, then QMETHOD=OS for data set B
takes 10 times as much memory as it does for data set A. If QMETHOD=P2, then
data sets A and B require the same memory space to generate quantiles.
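You can compare the two methods on the same data. This sketch assumes the SASHELP.CLASS sample data set. The first step uses QMETHOD=OS and the second uses QMETHOD=P2. P2 approximates the quantiles, so its results can differ from the exact OS values.

```sas
/* OS: exact quantiles from the sorted unique values held in memory */
proc means data=sashelp.class n median p90 qmethod=os;
   var Height Weight;
   title 'Quantiles with QMETHOD=OS';
run;

/* P2: approximate quantiles from a fixed-size set of markers */
proc means data=sashelp.class n median p90 qmethod=p2;
   var Height Weight;
   title 'Quantiles with QMETHOD=P2';
run;
title;
```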
Missing Values
PROC MEANS excludes missing values for the analysis variables before calculating
statistics. Each analysis variable is treated individually; a missing value for an
observation in one variable does not affect the calculations for other variables. The
statements handle missing values as follows:
■ If a class variable has a missing value for an observation, then PROC MEANS
excludes that observation from the analysis unless you use the MISSING option
in the PROC statement or CLASS statement.
■ If a BY or ID variable value is missing, then PROC MEANS treats it like any other
BY or ID variable value. The missing values form a separate BY group.
■ If a FREQ variable value is missing or nonpositive, then PROC MEANS excludes
the observation from the analysis.
■ If a WEIGHT variable value is missing, then PROC MEANS excludes the
observation from the analysis.
PROC MEANS tabulates the number of missing values. Before the missing values
are tabulated, PROC MEANS excludes observations with nonpositive frequencies
when you use the FREQ statement. When you use the WEIGHT statement, it also
excludes observations with missing weights, and observations with nonpositive
weights if you specify the EXCLNPWGT option. To report this information in the
procedure output, use the NMISS statistic keyword in the PROC statement.
In the output data set, the value of N Obs is stored in the _FREQ_ variable. Use the
NONOBS option in the PROC statement to suppress this information in the
displayed output.
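A small hypothetical data set, WORK.MISS_DEMO, shows the rules for class variables. One observation has a missing analysis value and another has a missing class value. The first step uses the default handling. The second adds the MISSING option so that the missing Group value forms its own class level.

```sas
/* Hypothetical data: a missing Score in group A and a missing Group value */
data work.miss_demo;
   input Group $ Score;
   datalines;
A 10
A .
B 20
. 30
;
run;

/* Default: the observation with a missing Group is excluded; NMISS counts the missing Score */
proc means data=work.miss_demo n nmiss mean;
   class Group;
   var Score;
run;

/* MISSING: the missing Group value is treated as a valid class level */
proc means data=work.miss_demo n nmiss mean missing;
   class Group;
   var Score;
run;
```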
Note: By default, the statistics in the output data set automatically inherit the
analysis variable's format and label. However, statistics computed for N, NMISS,
SUMWGT, USS, CSS, VAR, CV, T, PROBT, PRT, SKEWNESS, and KURTOSIS do not
inherit the analysis variable's format because this format can be invalid for these
statistics. Use the NOINHERIT option in the OUTPUT statement to prevent the
other statistics from inheriting the format and label attributes.
■ the variable _TYPE_ that contains information about the class variables. By
default, _TYPE_ is a numeric variable. If you specify CHARTYPE in the PROC
statement, then _TYPE_ is a character variable. When you use more than 32
class variables, _TYPE_ is automatically a character variable.
■ the variable _FREQ_ that contains the number of observations that a given
output level represents.
■ the variables requested in the OUTPUT statement that contain the output
statistics and extreme values.
■ the variable _STAT_ that contains the names of the default statistics if you omit
statistic keywords.
■ the variable _LEVEL_ if you specify the LEVELS option.
The value of _TYPE_ indicates which combination of the class variables PROC
MEANS uses to compute the statistics. The character value of _TYPE_ is a series of
zeros and ones, where each value of one indicates an active class variable in the
type. For example, with three class variables, PROC MEANS represents type 1 as
001, type 5 as 101, and so on.
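The BINARYw. format reproduces the mapping between numeric and character _TYPE_ values. The following DATA step sketch lists all eight types for three class variables. It uses the names A, B, and C as in the text, which are hypothetical here. It writes each bit pattern and the corresponding combination to the SAS log.

```sas
/* Decode numeric _TYPE_ values for CLASS A B C; the leftmost bit is A */
data _null_;
   length Bits $ 3 Active $ 20;
   do Type = 0 to 7;
      Bits = put(Type, binary3.);          /* for example, 5 -> 101 */
      Active = '';
      if substr(Bits, 1, 1) = '1' then Active = catx('*', Active, 'A');
      if substr(Bits, 2, 1) = '1' then Active = catx('*', Active, 'B');
      if substr(Bits, 3, 1) = '1' then Active = catx('*', Active, 'C');
      put Type= Bits= Active=;             /* Type=0 has no active class variable */
   end;
run;
```

For example, type 5 is written as 101, which marks A and C as the active class variables. If you specify CHARTYPE in the PROC statement, PROC MEANS stores these bit strings directly in the output data set.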
Usually, the output data set contains one observation per level per type. However,
if you omit statistical keywords in the OUTPUT statement, then the output data set
contains five observations per level (six if you specify a WEIGHT variable).
Therefore, the total number of observations in the output data set is equal to the
sum of the levels for all the types that you request multiplied by 1, 5, or 6,
whichever is applicable.
If you omit the CLASS statement (_TYPE_=0), then there is always exactly one
level of output per BY group. If you use a CLASS statement, then the number of
levels for each requested type has an upper bound equal to the number of
observations in the input data set. By default, PROC MEANS generates all possible
types. In this case, the total number of observations in the output data set for each
BY group has an upper bound equal to

$$m \cdot \bigl( (2^k - 1) \cdot n + 1 \bigr)$$

where $k$ is the number of class variables, $n$ is the number of observations for the
given BY group in the input data set, and $m$ is 1, 5, or 6.
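As a worked instance of this bound, take the GRADE data set that Example 2 creates. It has n = 10 observations and is used without a BY statement. Assume the two class variables Status and Year (k = 2) and statistic keywords in the OUTPUT statement (m = 1):

```latex
m \cdot \bigl( (2^k - 1) \cdot n + 1 \bigr)
  = 1 \cdot \bigl( (2^2 - 1) \cdot 10 + 1 \bigr)
  = 3 \cdot 10 + 1
  = 31
```

The actual output is much smaller. One overall level (_TYPE_=0), two levels each for Status and Year, and four Status*Year combinations give 1 + 2 + 2 + 4 = 9 observations. This matches the nine observations that Example 8 describes for the same class variables.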
PROC MEANS determines the actual number of levels for a given type from the
number of unique combinations of each active class variable. A single level consists
of all input observations whose formatted class values match.
The following figure shows the values of _TYPE_ and the number of observations in
the data set when you specify one, two, and three class variables.
Figure 40.1 The Effect of Class Variables on the OUTPUT Data Set
[Figure: _TYPE_ values and output observations for one, two, and three CLASS variables]
Example 1: Computing Specific Descriptive Statistics
Details
This example does the following:
■ computes the statistics for the specified keywords and displays them in order
Program
options nodate pageno=1 linesize=80 pagesize=60;
data cake;
input LastName $ 1-12 Age 13-14 PresentScore 16-17
TasteScore 19-20 Flavor $ 23-32 Layers 34 ;
datalines;
Orlando     27 93 80  Vanilla    1
Ramey       32 84 72  Rum        2
Goldston    46 68 75  Vanilla    1
Roe         38 79 73  Vanilla    2
Larsen      23 77 84  Chocolate  .
Davis       51 86 91  Spice      3
Strickland  19 82 79  Chocolate  1
Nguyen      57 77 84  Vanilla    .
Hildenbrand 33 81 83  Chocolate  1
Byron       62 72 87  Vanilla    2
Sanders     26 56 79  Chocolate  1
Jaeger      43 66 74             1
Davis       28 69 75  Chocolate  2
Conrad      69 85 94  Vanilla    1
Walters     55 67 72  Chocolate  2
Rossburger  28 78 81  Spice      2
Matthew     42 81 92  Chocolate  2
Becker      36 62 83  Spice      2
Anderson    27 87 85  Chocolate  1
Merritt     62 73 84  Chocolate  1
;
proc means data=cake n mean max min range std fw=8;
var PresentScore TasteScore;
title 'Summary of Presentation and Taste Scores';
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the CAKE data set. CAKE contains data from a cake-baking contest: each
participant's last name, age, score for presentation, score for taste, cake flavor, and
number of cake layers. The number of cake layers is missing for two observations.
The cake flavor is missing for another observation.
data cake;
input LastName $ 1-12 Age 13-14 PresentScore 16-17
TasteScore 19-20 Flavor $ 23-32 Layers 34 ;
datalines;
Orlando     27 93 80  Vanilla    1
Ramey       32 84 72  Rum        2
Goldston    46 68 75  Vanilla    1
Roe         38 79 73  Vanilla    2
Larsen      23 77 84  Chocolate  .
Davis       51 86 91  Spice      3
Strickland  19 82 79  Chocolate  1
Nguyen      57 77 84  Vanilla    .
Hildenbrand 33 81 83  Chocolate  1
Byron       62 72 87  Vanilla    2
Sanders     26 56 79  Chocolate  1
Jaeger      43 66 74             1
Davis       28 69 75  Chocolate  2
Conrad      69 85 94  Vanilla    1
Walters     55 67 72  Chocolate  2
Rossburger  28 78 81  Spice      2
Matthew     42 81 92  Chocolate  2
Becker      36 62 83  Spice      2
Anderson    27 87 85  Chocolate  1
Merritt     62 73 84  Chocolate  1
;
Specify the statistics and the statistic options. The statistic keywords specify the
statistics and their order in the output. FW= uses a field width of eight to display
the statistics.
proc means data=cake n mean max min range std fw=8;
Specify the analysis variables. The VAR statement specifies that PROC MEANS
calculate statistics on the PresentScore and TasteScore variables.
var PresentScore TasteScore;
Details
This example does the following:
■ analyzes the data for the two-way combination of class variables and across all
observations
■ limits the number of decimal places for the displayed statistics
Program
options nodate pageno=1 linesize=80 pagesize=60;
data grade;
input Name $ 1-8 Gender $ 11 Status $13 Year $ 15-16
Section $ 18 Score 20-21 FinalGrade 23-24;
datalines;
Abbott    F 2 97 A 90 87
Branford  M 1 98 A 92 97
Crandell  M 2 98 B 81 71
Dennison  M 1 97 A 85 72
Edgar     F 1 98 B 89 80
Faust     M 1 97 B 78 73
Greeley   F 2 97 A 82 91
Hart      F 1 98 B 84 80
Isley     M 2 97 A 88 86
Jasper    M 1 97 B 91 93
;
proc means data=grade maxdec=3;
var Score;
class Status Year;
types () status*year;
title 'Final Exam Grades for Student Status and Year of Graduation';
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the GRADE data set. GRADE contains each student's last name, gender,
status of either undergraduate (1) or graduate (2), expected year of graduation,
class section (A or B), final exam score, and final grade for the course.
data grade;
input Name $ 1-8 Gender $ 11 Status $13 Year $ 15-16
Section $ 18 Score 20-21 FinalGrade 23-24;
datalines;
Abbott    F 2 97 A 90 87
Branford  M 1 98 A 92 97
Crandell  M 2 98 B 81 71
Dennison  M 1 97 A 85 72
Edgar     F 1 98 B 89 80
Faust     M 1 97 B 78 73
Greeley   F 2 97 A 82 91
Hart      F 1 98 B 84 80
Isley     M 2 97 A 88 86
Jasper    M 1 97 B 91 93
;
Generate the default statistics and specify the analysis options. Because no
statistics are specified in the PROC MEANS statement, all default statistics (N,
MEAN, STD, MIN, MAX) are generated. MAXDEC= limits the displayed statistics to
three decimal places.
proc means data=grade maxdec=3;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the Score variable.
var Score;
Specify subgroups for the analysis. The CLASS statement separates the analysis
into subgroups. Each combination of unique values for Status and Year represents a
subgroup.
class Status Year;
Specify which subgroups to analyze. The TYPES statement requests that the
analysis be performed on all the observations in the GRADE data set as well as the
two-way combination of Status and Year, which results in four subgroups (because
Status and Year each have two unique values).
types () status*year;
Output
PROC MEANS displays the default statistics for all the observations (_TYPE_=0)
and the four class levels of the Status and Year combination (Status=1, Year=97;
Status=1, Year=98; Status=2, Year=97; Status=2, Year=98).
Example 3: Using the BY Statement with Class Variables
Details
This example does the following:
■ separates the analysis for the combination of class variables within BY values
Program
options nodate pageno=1 linesize=80 pagesize=60;
proc sort data=Grade out=GradeBySection;
by section;
run;
proc means data=GradeBySection min max median;
by Section;
var Score;
class Status Year;
title1 'Final Exam Scores for Student Status and Year of Graduation';
title2 ' Within Each Section';
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Sort the GRADE data set. PROC SORT sorts the observations by the variable
Section. Sorting is required in order to use Section as a BY variable in the PROC
MEANS step.
proc sort data=Grade out=GradeBySection;
by section;
run;
Specify the analyses. The statistic keywords specify the statistics and their order
in the output.
proc means data=GradeBySection min max median;
Divide the data set into BY groups. The BY statement produces a separate analysis
for each value of Section.
by Section;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the Score variable.
var Score;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by the values of Status and Year. Because there is no TYPES statement in this
program, analyses are performed for each subgroup, within each BY group.
class Status Year;
Output
Output 40.5 Final Exam Scores
Example 4: Using a CLASSDATA= Data Set with Class Variables
CLASS statement
Data sets: CAKE, CAKETYPE
Details
This example does the following:
■ specifies the field width and decimal places of the displayed statistics
■ uses only the values in the CLASSDATA= data set as the levels of the combinations
of class variables
■ calculates the range, median, minimum, and maximum
Program
options nodate pageno=1 linesize=80 pagesize=60;
data caketype;
input Flavor $ 1-10 Layers 12;
datalines;
Vanilla    1
Vanilla    2
Vanilla    3
Chocolate  1
Chocolate  2
Chocolate  3
;
proc means data=cake range median min max fw=7 maxdec=0
classdata=caketype exclusive printalltypes;
var TasteScore;
class flavor layers;
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
Create the CAKETYPE data set. CAKETYPE contains the cake flavors and number
of layers that must occur in the PROC MEANS output.
data caketype;
input Flavor $ 1-10 Layers 12;
datalines;
Vanilla    1
Vanilla    2
Vanilla    3
Chocolate  1
Chocolate  2
Chocolate  3
;
Specify the analyses and the analysis options. The FW= option uses a field width
of seven and the MAXDEC= option uses zero decimal places to display the
statistics. CLASSDATA= and EXCLUSIVE restrict the class levels to the values that
are in the CAKETYPE data set. PRINTALLTYPES displays all combinations of class
variables in the output.
proc means data=cake range median min max fw=7 maxdec=0
classdata=caketype exclusive printalltypes;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the TasteScore variable.
var TasteScore;
Specify subgroups for analysis. The CLASS statement separates the analysis by
the values of Flavor and Layers. Note that these variables, and only these variables,
must appear in the CAKETYPE data set.
class flavor layers;
Output
PROC MEANS calculates statistics for the 13 chocolate and vanilla cakes. Because
the CLASSDATA= data set contains 3 as the value of Layers, PROC MEANS uses 3
as a class value even though the frequency is zero.
Example 5: Using Multilabel Value Formats with Class Variables
Details
This example does the following:
■ computes the statistics for the specified keywords and displays them in order
■ analyzes the data for the one-way combination of cake flavor and the two-way
combination of cake flavor and participant's age
■ assigns user-defined formats to the class variables
■ orders the levels of the cake flavors by the descending frequency count and
orders the levels of age by the ascending formatted values
Program
options nodate pageno=1 linesize=80 pagesize=64;
proc format;
value $flvrfmt
'Chocolate'='Chocolate'
'Vanilla'='Vanilla'
'Rum','Spice'='Other Flavor';
value agefmt (multilabel)
15 - 29='below 30 years'
30 - 50='between 30 and 50'
51 - high='over 50 years'
15 - 19='15 to 19'
20 - 25='20 to 25'
25 - 39='25 to 39'
40 - 55='40 to 55'
56 - high='56 and above';
run;
proc means data=cake fw=6 n min max median nonobs;
class flavor/order=data;
class age /mlf order=fmt;
types flavor flavor*age;
var TasteScore;
format age agefmt. flavor $flvrfmt.;
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=64;
Create the $FLVRFMT. and AGEFMT. formats. PROC FORMAT creates user-
defined formats to categorize the cake flavors and ages of the participants.
MULTILABEL creates a multilabel format for Age. A multilabel format is one in
which multiple labels can be assigned to the same value, in this case because of
overlapping ranges. Each value is represented in the output for each range in which
it occurs.
proc format;
value $flvrfmt
'Chocolate'='Chocolate'
'Vanilla'='Vanilla'
'Rum','Spice'='Other Flavor';
value agefmt (multilabel)
15 - 29='below 30 years'
30 - 50='between 30 and 50'
51 - high='over 50 years'
15 - 19='15 to 19'
20 - 25='20 to 25'
25 - 39='25 to 39'
40 - 55='40 to 55'
56 - high='56 and above';
run;
Specify the analyses and the analysis options. FW= uses a field width of six to
display the statistics. The statistic keywords specify the statistics and their order in
the output. NONOBS suppresses the N Obs column.
proc means data=cake fw=6 n min max median nonobs;
Specify subgroups for the analysis. The CLASS statements separate the analysis
by values of Flavor and Age. ORDER=DATA orders values according to their order in
the input data set. ORDER=FMT orders the levels of Age by ascending formatted
values. MLF specifies that multilabel value formats be used for Age.
class flavor/order=data;
class age /mlf order=fmt;
Specify which subgroups to analyze. The TYPES statement requests the analysis
for the one-way combination of Flavor and the two-way combination of Flavor and
Age.
types flavor flavor*age;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the TasteScore variable.
var TasteScore;
Format the output. The FORMAT statement assigns user-defined formats to the
Age and Flavor variables for this analysis.
format age agefmt. flavor $flvrfmt.;
Output
The one-way combination of class variables appears before the two-way
combination. A field width of six truncates the statistics to four decimal places. For
the two-way combination of Age and Flavor, the total number of observations is
greater than the one-way combination of Flavor. This situation arises because of
the multilabel format for age, which maps one internal value to more than one
formatted value. The order of the levels of Flavor is based on the frequency count
for each level. The order of the levels of Age is based on the order of the user-
defined formats.
Example 6: Using Preloaded Formats with Class Variables
Details
This example does the following:
■ specifies the field width of the statistics
■ includes all possible combinations of class variable values in the analysis even
if the frequency is zero
■ considers missing values as valid class levels
■ uses only the preloaded range of user-defined formats as the levels of class
variables
■ orders the results by the value of the formatted data
Program
options nodate pageno=1 linesize=80 pagesize=64;
proc format;
value layerfmt 1='single layer'
2-3='multi-layer'
.='unknown';
value $flvrfmt (notsorted)
'Vanilla'='Vanilla'
'Orange','Lemon'='Citrus'
'Spice'='Spice'
'Rum','Mint','Almond'='Other Flavor';
run;
proc means data=cake fw=7 completetypes missing nonobs;
class flavor layers/preloadfmt exclusive order=data;
ways 1 2;
var TasteScore;
format layers layerfmt. flavor $flvrfmt.;
title 'Taste Score For Number of Layers and Cake Flavors';
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=64;
Create the LAYERFMT. and $FLVRFMT. formats. PROC FORMAT creates user-
defined formats to categorize the number of cake layers and the cake flavors.
NOTSORTED keeps $FLVRFMT unsorted to preserve the original order of the
format values.
proc format;
value layerfmt 1='single layer'
2-3='multi-layer'
.='unknown';
value $flvrfmt (notsorted)
'Vanilla'='Vanilla'
'Orange','Lemon'='Citrus'
'Spice'='Spice'
'Rum','Mint','Almond'='Other Flavor';
run;
Generate the default statistics and specify the analysis options. FW= uses a field
width of seven to display the statistics. COMPLETETYPES includes class levels
with a frequency of zero. MISSING considers missing values valid values for all
class variables. NONOBS suppresses the N Obs column. Because no specific
analyses are requested, all default analyses are performed.
proc means data=cake fw=7 completetypes missing nonobs;
Specify subgroups for the analysis. The CLASS statement separates the analysis
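by values of Flavor and Layers. PRELOADFMT and EXCLUSIVE restrict the levels
to the preloaded values of the user-defined formats. ORDER=DATA orders the levels
by the order of the values in the user-defined formats.
class flavor layers/preloadfmt exclusive order=data;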
Specify which subgroups to analyze. The WAYS statement requests one-way and
two-way combinations of class variables.
ways 1 2;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the TasteScore variable.
var TasteScore;
Format the output. The FORMAT statement assigns user-defined formats to the
Flavor and Layers variables for this analysis.
format layers layerfmt. flavor $flvrfmt.;
Output
The one-way combination of class variables appears before the two-way
combination. PROC MEANS reports only the level values that are listed in the
preloaded range of user-defined formats even when the frequency of observations
is zero (in this case, citrus). PROC MEANS rejects entire observations based on the
exclusion of any single class value in a given observation. Therefore, when the
number of layers is unknown, statistics are calculated for only one observation. The
other observation is excluded because the flavor chocolate was not included in the
preloaded user-defined format for Flavor. The order of the levels is based on the
order of the user-defined formats. PROC FORMAT automatically sorted the Layers
format and did not sort the Flavor format.
Details
This example does the following:
■ specifies the field width and number of decimal places of the statistics
Program
data charity;
input School $ 1-7 Year 9-12 Name $ 14-20 MoneyRaised 22-26
HoursVolunteered 28-29;
datalines;
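Monroe  2016 Allison 31.65 19
Monroe  2016 Barry   23.76 16
Monroe  2016 Candace 21.11 5
;
run;
proc means data=charity fw=8 maxdec=2 alpha=0.1 clm mean std;
class Year;
var MoneyRaised HoursVolunteered;
run;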
Program Description
Create the CHARITY data set. CHARITY contains information about high-school
students' volunteer work for a charity. The variables give the name of the high
school, the year of the fund-raiser, the first name of each student, the amount of
money each student raised, and the number of hours each student volunteered. A
DATA step creates this data set.
data charity;
input School $ 1-7 Year 9-12 Name $ 14-20 MoneyRaised 22-26
HoursVolunteered 28-29;
datalines;
Monroe  2016 Allison 31.65 19
Monroe  2016 Barry   23.76 16
Monroe  2016 Candace 21.11 5
;
Specify the analyses and the analysis options. FW= uses a field width of eight and
MAXDEC= uses two decimal places to display the statistics. ALPHA=0.1 specifies a
90% confidence limit, and the CLM keyword requests two-sided confidence limits.
MEAN and STD request the mean and the standard deviation, respectively.
proc means data=charity fw=8 maxdec=2 alpha=0.1 clm mean std;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of Year.
class Year;
Specify the analysis variables. The VAR statement specifies that PROC MEANS
calculate statistics on the MoneyRaised and HoursVolunteered variables.
var MoneyRaised HoursVolunteered;
Output
PROC MEANS displays the lower and upper confidence limits for both variables for
each year.
Example 8: Computing Output Statistics
Details
This example does the following:
■ suppresses the display of PROC MEANS output
■ stores the name of the student with the best final exam score in a new variable
■ stores the number of class variables that are combined in the _WAY_ variable
Program
options nodate pageno=1 linesize=80 pagesize=60;
proc means data=Grade noprint;
class Status Year;
var FinalGrade;
output out=sumstat mean=AverageGrade
idgroup (max(score) obs out (name)=BestScore)
/ ways levels;
run;
proc print data=sumstat noobs;
title1 'Average Undergraduate and Graduate Course Grades';
title2 'For Two Years';
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the analysis options. NOPRINT suppresses the display of all PROC
MEANS output.
proc means data=Grade noprint;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of Status and Year.
class Status Year;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the FinalGrade variable.
var FinalGrade;
Specify the output data set options. The OUTPUT statement creates the
SUMSTAT data set and writes the mean value for the final grade to the new
variable AverageGrade. IDGROUP writes the name of the student with the top
exam score to the variable BestScore, and OBS writes the number of the
observation that contained the top score to the variable _OBS_. WAYS and
LEVELS write information about how the class variables are combined.
output out=sumstat mean=AverageGrade
idgroup (max(score) obs out (name)=BestScore)
/ ways levels;
run;
Print the output data set WORK.SUMSTAT. The NOOBS option suppresses the
observation numbers.
proc print data=sumstat noobs;
title1 'Average Undergraduate and Graduate Course Grades';
title2 'For Two Years';
run;
Output
The first observation contains the average course grade and the name of the
student with the highest exam score over the two-year period. The next four
observations contain values for each class variable value. The remaining four
observations contain values for the Year and Status combination. The variables
_WAY_, _TYPE_, and _LEVEL_ show how PROC MEANS created the class variable
combinations. The variable _OBS_ contains the observation number in the GRADE
data set that contained the highest exam score.
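IDGROUP can also keep more than one extreme value per level. The following variation of the Example 8 program is a sketch that assumes the GRADE data set from Example 2. It writes OUT[2] to ask for the names that belong to the two highest exam scores. When more than one value is requested, PROC MEANS creates one output variable per value. The expected names, BestScore_1 and BestScore_2, are an assumption and should be confirmed in the IDGROUP documentation. The data set name TOP2 is arbitrary.

```sas
proc means data=Grade noprint;
   class Status Year;
   var FinalGrade;
   /* OUT[2] keeps the Name values for the two largest Score values in each level */
   output out=top2 mean=AverageGrade
          idgroup (max(score) out[2] (name)=BestScore)
          / ways levels;
run;

proc print data=top2 noobs;
   title 'Two Best Exam Scores for Each Class Level';
run;
```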
Example 9: Computing Different Output Statistics for Several Variables
statistic keywords
PRINT procedure
WHERE= data set option
Data set: GRADE
Details
This example does the following:
n suppresses the display of PROC MEANS output
n stores the statistics for the class level and combinations of class variables that
are specified by WHERE= in the output data set
n orders observations in the output data set by descending _TYPE_ value
n stores the mean exam scores and mean final grades without assigning new
variable names
n stores the median final grade in a new variable
Program
options nodate pageno=1 linesize=80 pagesize=60;
proc means data=Grade noprint descend;
class Status Year;
var Score FinalGrade;
output out=Sumdata (where=(status='1' or _type_=0))
mean= median(finalgrade)=MedianGrade;
run;
proc print data=Sumdata;
title 'Exam and Course Grades for Undergraduates Only';
title2 'and for All Students';
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the analysis options. NOPRINT suppresses the display of all PROC
MEANS output. DESCEND orders the observations in the OUT= data set by
descending _TYPE_ value.
proc means data=Grade noprint descend;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of Status and Year.
class Status Year;
Specify the analysis variables. The VAR statement specifies that PROC MEANS
calculate statistics on the Score and FinalGrade variables.
var Score FinalGrade;
Specify the output data set options. The OUTPUT statement writes the mean for
Score and FinalGrade to variables of the same name. The median final grade is
written to the variable MedianGrade. The WHERE= data set option restricts the
observations in SUMDATA. One observation contains overall statistics (_TYPE_=0).
The remainder must have a status of 1.
output out=Sumdata (where=(status='1' or _type_=0))
mean= median(finalgrade)=MedianGrade;
run;
Output
The first three observations contain statistics for the class variable levels with a
status of 1. The last observation contains the statistics for all the observations (no
subgroup). Score contains the mean test score and FinalGrade contains the mean
final grade.
Details
This example does the following:
n suppresses the display of PROC MEANS output
n considers missing values as valid level values for only one class variable
n orders observations in the output data set by the ascending frequency for a
single class variable
n stores observations for only the highest _TYPE_ value
Program
options nodate pageno=1 linesize=80 pagesize=60;
proc means data=cake chartype nway noprint;
class flavor /order=freq ascending;
class layers /missing;
var TasteScore;
output out=cakestat max=HighScore;
run;
Example 10: Computing Output Statistics with Missing Class Variable Values 1543
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the analysis options. CHARTYPE writes the _TYPE_ values as binary
characters. NWAY restricts the output data set to observations with the highest
_TYPE_ value. NOPRINT suppresses the display of all PROC MEANS output.
proc means data=cake chartype nway noprint;
Specify subgroups for the analysis. The CLASS statements separate the analysis
by Flavor and Layers. ORDER=FREQ and ASCENDING order the levels of Flavor by
ascending frequency. MISSING uses missing values of Layers as a valid class level
value.
class flavor /order=freq ascending;
class layers /missing;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the TasteScore variable.
var TasteScore;
Specify the output data set options. The OUTPUT statement creates the
CAKESTAT data set and writes the maximum value of TasteScore to the
new variable HighScore.
output out=cakestat max=HighScore;
run;
Output
The CAKESTAT output data set contains only observations for the combination of
the two class variables, Flavor and Layers. Therefore, the value of _TYPE_ is 11 for
all observations. The observations are ordered by ascending frequency of Flavor.
Because MISSING is specified for Layers, the missing value in Layers is a valid level
of that class variable. PROC MEANS excludes the observation with the missing
Flavor value because MISSING is not specified for Flavor, so a missing value is not a
valid level of Flavor.
1544 Chapter 40 / MEANS Procedure
Details
This example does the following:
n identifies the observations with maximum values for two variables
Program
proc means data=Charity n mean range chartype;
class School Year;
var MoneyRaised HoursVolunteered;
output out=Prize maxid(MoneyRaised(name)
HoursVolunteered(name))= MostCash MostTime
max= ;
run;
proc print data=Prize;
run;
Program Description
Specify the analyses. The statistic keywords specify the statistics and their order
in the output. CHARTYPE writes the _TYPE_ values as binary characters in the
output data set.
proc means data=Charity n mean range chartype;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by School and Year.
class School Year;
Specify the analysis variables. The VAR statement specifies that PROC MEANS
calculate statistics on the MoneyRaised and HoursVolunteered variables.
var MoneyRaised HoursVolunteered;
Specify the output data set options. The OUTPUT statement writes to the PRIZE
data set the new variables MostCash and MostTime, which contain the names of the
students who collected the most money and volunteered the most time,
respectively. MAX= writes the maximum values of MoneyRaised and
HoursVolunteered to variables of the same names.
output out=Prize maxid(MoneyRaised(name)
HoursVolunteered(name))= MostCash MostTime
max= ;
run;
Output
The first page of output shows the output from PROC MEANS with the statistics
for six class levels: one for Monroe High for the years 1992, 1993, and 1994; and one
for Kennedy High for the same three years.
The output from PROC PRINT shows the maximum MoneyRaised and
HoursVolunteered values and the names of the students who are responsible for
them. The first observation contains the overall results, the next three contain the
results by year, the next two contain the results by school, and the final six contain
the results by School and Year.
Example 12: Identifying the Top Three Extreme Values with the Output Statistics 1547
Details
This example does the following:
n suppresses the display of PROC MEANS output
n analyzes the data for the one-way combination of the class variables and across
all observations
n stores the total and average amount of money raised in new variables
n stores in new variables the top three amounts of money raised, the names of the
three students who raised the money, the years when it occurred, and the
schools the students attended
n automatically resolves conflicts in the variable names when names are assigned
to the new variables in the output data set
n appends the statistic name to the label of the variables in the output data set
that contain statistics that were computed for the analysis variable
n assigns a format to the analysis variable so that the statistics that are computed
from this variable inherit the attribute in the output data set
n renames the _FREQ_ variable in the output data set
Program
proc format;
value yrFmt . = " All";
value $schFmt ' ' = "All ";
run;
proc means data=Charity noprint;
class School Year;
types () school year;
var MoneyRaised;
output out=top3list(rename=(_freq_=NumberStudents))sum= mean=
idgroup( max(moneyraised) out[3] (moneyraised name
school year)=)/autolabel autoname;
label MoneyRaised='Amount Raised';
format year yrfmt. school $schfmt.
moneyraised dollar8.2;
run;
proc print data=top3list;
title1 'School Fund Raising Report';
title2 'Top Three Students';
run;
Program Description
Create the YRFMT. and $SCHFMT. formats. PROC FORMAT creates user-defined
formats that assign the value of All to the missing levels of the class variables.
proc format;
value yrFmt . = " All";
value $schFmt ' ' = "All ";
run;
Generate the default statistics and specify the analysis options. NOPRINT
suppresses the display of all PROC MEANS output.
proc means data=Charity noprint;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of School and Year.
class School Year;
Specify which subgroups to analyze. The TYPES statement requests the analysis
across all the observations and for each one-way combination of School and Year.
types () school year;
Specify the analysis variable. The VAR statement specifies that PROC MEANS
calculate statistics on the MoneyRaised variable.
var MoneyRaised;
Specify the output data set options. The OUTPUT statement creates the
TOP3LIST data set. RENAME= renames the _FREQ_ variable that contains the
frequency count for each class level. SUM= and MEAN= specify that the sum and
mean of the analysis variable (MoneyRaised) are written to the output data set.
IDGROUP writes 12 variables that contain the top three amounts of money raised
and the three corresponding students, schools, and years. AUTOLABEL appends
the statistic name to the label of the output variables that contain the
sum and mean. AUTONAME resolves naming conflicts for these variables.
output out=top3list(rename=(_freq_=NumberStudents))sum= mean=
idgroup( max(moneyraised) out[3] (moneyraised name
school year)=)/autolabel autoname;
Format the output. The LABEL statement assigns a label to the analysis variable
MoneyRaised. The FORMAT statement assigns user-defined formats to the Year
and School variables and a SAS dollar format to the MoneyRaised variable.
label MoneyRaised='Amount Raised';
format year yrfmt. school $schfmt.
moneyraised dollar8.2;
run;
Display information about the TOP3LIST data set. PROC DATASETS displays the
contents of the TOP3LIST data set. NOLIST suppresses the directory listing for the
WORK data library.
proc datasets library=work nolist;
contents data=top3list;
title1 'Contents of the PROC MEANS Output Data Set';
run;
Output
The output from PROC PRINT shows the top three values of MoneyRaised, the
names of the students who raised these amounts, the schools the students
attended, and the years when the money was raised. The first observation contains
the overall results, the next three contain the results by year, and the final two
contain the results by school. The user-defined formats display the missing class
levels for School and Year as All. The labels for the variables that contain statistics that
were computed from MoneyRaised include the statistic name at the end of the
label.
See the TEMPLATE procedure in SAS Output Delivery System: User’s Guide for an
example of how to create a custom table template for this output data set.
The first example does not use the STACKODSOUTPUT option. The second
example uses the STACKODSOUTPUT option.
Example 13: Using the STACKODSOUTPUT Option to Control Data 1553
Program
proc means data=sashelp.class;
class sex;
var weight height;
ods output summary=default;
run;
proc print data=default; run;
proc contents data=default; run;
Program Description
This code processes the data without using the STACKODSOUTPUT option.
proc means data=sashelp.class;
class sex;
var weight height;
ods output summary=default;
run;
Print the data set with PROC PRINT, and display the contents of the data set with
PROC CONTENTS.
proc print data=default; run;
proc contents data=default; run;
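Output
The following outputs show the difference between processing the data without
the STACKODSOUTPUT option and then with the option.
This code processes the data using the STACKODSOUTPUT option.
proc means data=sashelp.class stackodsoutput;
class sex;
var weight height;
ods output summary=stacked;
run;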
Print the data set with PROC PRINT, and display the contents of the data set with
PROC CONTENTS.
proc print data=stacked; run;
proc contents data=stacked; run;
Chapter 41
MIGRATE Procedure
The MIGRATE procedure migrates a library from most SAS 6, SAS 7, SAS 8, and SAS®9
operating environments to the current release of SAS. The migration must occur
within the same engine family. For example, V6, V7, or V8 can migrate to V9, but
V6TAPE must migrate to V9TAPE.
The procedure does not support stored compiled DATA step programs or stored
compiled macros. (Instead, move the source code to the target, where you can
compile and store it.) The procedure does not support SAS program files. The
procedure does not support Scalable Performance Data (SPD) engine data sets.
(See SAS Scalable Performance Data Engine: Reference.) The procedure does not
support the extended observation count attribute.
Data Sets
PROC MIGRATE retains alternate collating sequence, compression, created and
modified datetimes, deleted observations, encryption, extended attributes, indexes,
Concepts: MIGRATE Procedure 1561
The audit trail and generations are also migrated. Indexes and integrity constraints
are rebuilt on the member in the target library. See “Migrating a Data Set with Audit
Trails, Generations, Indexes, or Integrity Constraints” on page 1567.
Migrated data sets take on the data representation and encoding attributes of the
target library. When you migrate a data set to an encoding where the characters are
represented by more bytes, truncation might occur if the column length does not
accommodate the larger character size. For example, a character might be
represented in Wlatin1 encoding as one byte but in UTF-8 as two bytes. The best
solution is to expand the column length with the CVP engine and PROC COPY
before you migrate. (PROC MIGRATE does not currently support the CVP engine.)
The CVPMULTIPLIER=2.5 value is usually sufficient to avoid truncation. If your
data contains Asian characters, CVPMULTIPLIER=4 is recommended. For more
information, read about the CVPMULTIPLIER= option and avoiding character data
truncation in SAS National Language Support (NLS): Reference Guide. See also the
paper “Multilingual Computing with SAS® 9.4” on support.sas.com.
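The CVP approach described above can be sketched as follows. This is a sketch under stated assumptions: the librefs SrcCvp, Expanded, and Target and all paths are placeholders, and whether you also need PROC COPY options such as NOCLONE depends on your data, so check the NLS reference before relying on it.

```sas
/* Placeholder librefs and paths; adjust to your environment.       */
/* Read the source data through the CVP engine, which expands the   */
/* lengths of character variables by the CVPMULTIPLIER= factor.     */
libname SrcCvp cvp 'source-library-pathname' cvpmultiplier=2.5;
libname Expanded 'expanded-copy-pathname';

/* Write physical copies of the data sets with the wider columns. */
proc copy in=SrcCvp out=Expanded memtype=data;
run;

/* Migrate from the expanded copy to the target library. */
libname Target 'target-library-pathname';
proc migrate in=Expanded out=Target;
run;
```

For data that contains Asian characters, the paragraph above suggests CVPMULTIPLIER=4 instead of 2.5.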
For SAS data sets that use the ASCII-OEM character set, PROC MIGRATE does not
translate non-English characters. This issue is very uncommon. To migrate a SAS
data set with ASCII-OEM characters, use the CPORT and CIMPORT procedures
with the TRANTAB option. Specify the appropriate TRANTAB values for the source
and target data sets.
Views
As with data sets, migrated data views take on the data representation and
encoding attributes of the target library. When you migrate a library that contains
DATA step views to a different operating environment, and the views were created
prior to SAS 9.2, you might need to set the proper encoding. In releases prior to SAS
9.2, DATA step views did not save encoding information. Therefore, if the view has a
different encoding than the target session, you must specify the INENCODING=
option for the source library's LIBNAME statement. Here is an example:
libname Srclib 'source-library-pathname' inencoding="OPEN_ED-1047";
libname Lib1 'target-library-pathname';
proc migrate in=Srclib out=Lib1;
run;
In addition, embedded librefs associated with a view are not updated during
migration. The following example illustrates the issue. In this example, Lib1.MyView
contains a view of the data set Lib1.MyData:
data Lib1.MyData;x=1;
run;
proc sql;
create view Lib1.MyView as select * from Lib1.MyData;
quit;
After you migrate Lib1 to Lib2, you have Lib2.MyView and Lib2.MyData. However,
because Lib2.MyView was originally created with an embedded libref of Lib1, it still
1562 Chapter 41 / MIGRATE Procedure
references the data set Lib1.MyData, not Lib2.MyData. The following example fails
with an error message that Lib1 cannot be found:
proc print data=Lib2.MyView;
run;
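One possible workaround, which this section does not describe, is to re-create the view in the target library so that it refers to the migrated data set. A minimal PROC SQL sketch, assuming Lib2 is already assigned to the target library:

```sas
proc sql;
   /* Remove the migrated view, which still refers to Lib1.MyData. */
   drop view Lib2.MyView;
   /* Re-create the view against the migrated data set. */
   create view Lib2.MyView as select * from Lib2.MyData;
quit;
```

The re-created view still contains an embedded libref (now Lib2), so the same issue recurs if the library is migrated again.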
PROC MIGRATE supports three types of views: DATA step views, SQL views, and
SAS/ACCESS views:
DATA Step Views
When you create a DATA step view, you can specify the SOURCE= option to
store the DATA step code along with the view. PROC MIGRATE supports DATA
step views with stored code. The stored code is recompiled the first time the
DATA step view is accessed by SAS in the target environment. PROC MIGRATE
does not support DATA step views that were created prior to SAS 8 or DATA
step views without stored code. For DATA step views without stored code, use
the DESCRIBE statement in the source session to recover the DATA step code.
Then submit the DATA step code in the target session and recompile it.
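For example, the following sketch writes the DATA step code of a view to the SAS log in the source session. Srclib and MyDSView are placeholder names.

```sas
/* Source session: placeholder libref and view name. */
libname Srclib 'source-library-pathname';

/* DESCRIBE writes the DATA step code of the view to the SAS log. */
data view=Srclib.MyDSView;
   describe;
run;

/* Target session: resubmit the recovered code from the log, */
/* naming the view in the target library, so that it is recompiled. */
```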
PROC SQL Views
PROC MIGRATE supports PROC SQL views with no known issues.
SAS/ACCESS Views
PROC MIGRATE supports SAS/ACCESS views that were written with the
Oracle, SAP, or DB2 engine. PROC MIGRATE automatically uses the CV2VIEW
procedure, which converts SAS/ACCESS views into SQL views. Migrating
SAS/ACCESS views to a different operating environment is not supported. For
more information about the conversion, see the overview of the CV2VIEW
procedure in SAS/ACCESS for Relational Databases: Reference.
Catalogs
To migrate catalogs, PROC MIGRATE calls PROC CPORT and PROC CIMPORT. You
might notice that CPORT and CIMPORT notes are written to the SAS log during
migration. PROC CPORT and CIMPORT restrictions apply. For example, catalogs in
sequential libraries are not migrated. Stored compiled macros that are stored in
catalogs are not supported. (Instead, move the source code to the target, where
you can compile and store it.) Catalog entries might need to be updated after
migrating (for example, code that contains hardcoded pathnames).
See “Using a SAS/CONNECT or SAS/SHARE Server” on page 1571. If the catalogs were created in SAS 6 or SAS 8, SLIBREF= must
be assigned through a SAS 8 server.
MDDBs
PROC MIGRATE supports MDDBs with no known issues.
Item Stores
PROC MIGRATE supports item stores unless you migrate from a 32-bit to a 64-bit
environment. Migrations from 32-bit to 64-bit environments use Remote Library
Services (RLS), which does not support item stores. In that case, no error message
is written to the SAS log, but the item stores might not work correctly in the target
library.
Not Supported
PROC MIGRATE does not support stored compiled DATA step programs or stored
compiled macros. (Instead, move the source code to the target, where you can
compile and store it.) PROC MIGRATE does not support SAS program files. PROC
MIGRATE does not support SPD Engine data sets. (See SAS Scalable Performance
Data Engine: Reference.) See also restrictions above for each member type.
SAS 8.2 source libraries from the following operating environments are not
supported: CMS, OS/2, OpenVMS VAX, or 64-bit AIX. See “Additional Steps for
Unsupported Catalogs or Libraries” on page 1573.
Catalogs that were created under a Tru64 UNIX source environment are not
supported for a Linux for x64 or a Solaris for x64 target environment. To migrate
catalogs under those conditions, see “Additional Steps for Unsupported Catalogs or
Libraries” on page 1573.
SAS files that were created prior to SAS 6.12 (SAS 6.09E for z/OS) must be
converted to SAS 6.12 before they can be migrated to the current release of SAS.
Some SAS 6 source environments are not supported; see “Migrating a SAS 6
Library” on page 1568.
This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Interaction: The International Components for Unicode (ICU) version is used to sort data sets
with a linguistic collating sequence. If a linguistically sorted data set has a different
ICU version number than that of the current SAS session, the following occurs:
PROC MIGRATE retains the data set's sort order in the OUT= destination library.
However, the data set is no longer marked as sorted, and a message is written to
the SAS log. For more information about linguistic sorting, see Chapter 64, “SORT
Procedure,” on page 2355.
Tips: Assign the OUT= target library to an empty location. If a member already exists in
the target library that has the same name and member type as a member in the
source library, the member is not migrated. An error message is written to the SAS
log, and PROC MIGRATE continues with the next member. Note that members in a
sequential library are an exception, because PROC MIGRATE does not read the
entire tape to determine existence.
For encoding and transcoding issues, see the SAS National Language Support (NLS):
Reference Guide.
Syntax
PROC MIGRATE Statement
PROC MIGRATE IN=libref-1 OUT=libref-2
<BUFSIZE=KEEPSIZE | n | nK | nM | nG>
<MOVE>
<SLIBREF=libref>
<KEEPNODUPKEY>;
Required Arguments
IN=libref-1
names the source SAS library from which to migrate members.
Restriction Concatenated libraries are not supported. See SAS Note 24539 at
https://support.sas.com/kb/24/539.html.
If CATALOG is the only member type in the library and you are
using the SLIBREF= option, then omit the IN= argument.
OUT=libref-2
names the target SAS library to contain the migrated members.
Requirement Assign the OUT= target library to a different physical location than
the IN= source library.
Interaction PROC MIGRATE can use the LIBNAME option OUTREP= for DATA,
VIEW, ACCESS, MDDB, and DMDB member types. If you specify
the OUTREP= option, you might also want to specify the
EXTENDOBSCOUNTER= option. These options are appropriate in
the LIBNAME statement for the OUT= library. See SAS DATA Step
Statements: Reference.
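A sketch of an OUT= library assignment that uses these options. The LINUX_X86_64 data representation, the librefs, and the paths are illustrative assumptions; choose the OUTREP= value for your own target environment.

```sas
libname Source 'source-library-pathname';

/* Illustrative values: write the migrated members in the 64-bit Linux */
/* data representation, with the extended observation counter enabled. */
libname Target 'target-library-pathname'
   outrep=linux_x86_64 extendobscounter=yes;

proc migrate in=Source out=Target;
run;
```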
Optional Arguments
BUFSIZE=KEEPSIZE | n | nK | nM | nG
specifies the buffer page size of the members that are written to the target
library. For example, a value of 10000 specifies a page size of 10,000 bytes, and
a value of 4k specifies a page size of 4096 bytes. A value of 0 results in the
default. Setting the page size can help optimize SAS performance. Starting in SAS
9.4M3, the default is the buffer page size of the current session. To keep the
previous behavior, which clones the page size of the members from the source
library, specify BUFSIZE=KEEPSIZE.
For more details about the BUFSIZE= data set option or system option, see the
documentation for your operating environment.
KEEPSIZE
retains (clones) the page size of members from the source library.
n
specifies the number of bytes.
nK
specifies the number of kilobytes.
nM
specifies the number of megabytes.
nG
specifies the number of gigabytes.
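For example, to keep the page sizes of the source members (the default behavior before SAS 9.4M3), you might submit something like the following. The librefs and paths are placeholders.

```sas
libname Source 'source-library-pathname';
libname Target 'target-library-pathname';

/* Clone the page size of each member from the source library.   */
/* A value such as bufsize=64k would instead request 64 KB pages. */
proc migrate in=Source out=Target bufsize=keepsize;
run;
```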
MOVE
deletes the original members from the source library. If a member already exists
in the target library, the member is not deleted from the source library and a
message is sent to the SAS log. If a catalog already exists in the target library,
then no catalogs are deleted from the source library and a message is sent to
the log. If a data set has referential integrity constraints, the data set is not
deleted from the source library and an error is sent to the log.
Restriction The engine that is associated with the IN= source library must
support the deletion of tables. Sequential engines do not support
the deletion of tables.
SLIBREF=libref
specifies a libref that is assigned through a SAS/CONNECT or SAS/SHARE
server. If cross-environment data access (CEDA) processing is invoked, and if
the IN= source library contains catalogs, then you must specify a
SAS/CONNECT or SAS/SHARE libref in the IN= option or in the SLIBREF= option.
Usage: MIGRATE Procedure 1567
Interactions If CATALOG is the only member type in the library and you are
using the SLIBREF= option, then omit the IN= argument.
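A hedged sketch of the SLIBREF= form follows. The server ID srvid and the librefs are hypothetical, the SIGNON (SAS/CONNECT) or server startup (SAS/SHARE) is not shown, and for catalogs created in SAS 6 or SAS 8 the server must be a SAS 8 server, as described in “Using a SAS/CONNECT or SAS/SHARE Server” on page 1571.

```sas
/* Local access to the source library for the non-catalog members. */
libname Source 'source-library-pathname';

/* Hypothetical: the same source library assigned through a server */
/* (srvid), which provides access to the catalogs.                  */
libname SrcCat 'source-library-pathname' server=srvid;
libname Target 'target-library-pathname';

proc migrate in=Source out=Target slibref=SrcCat;
run;
```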
KEEPNODUPKEY
specifies to retain the NODUPKEY sort order. See “Migrating a SAS Data Set
with NODUPKEY Sort Indicator” on page 1568.
moved a data set using the operating system and failed to include an index in
the move.
n If an error occurs while integrity constraints are applied to a migrated data set,
or while an audit trail or generations are migrated, the data set is removed from
the target library. A note is written to the SAS log. If the MOVE option is
specified, it does not delete the data set from the source library.
n For a data set with referential integrity constraints, the MOVE option does not
delete any members in the source library, even when the migration is successful.
You must remove referential integrity constraints before the member can be
deleted. An error message is written to the SAS log.
Under the default behavior (without the KEEPNODUPKEY option), the SAS data
set retains its sort indicator in the target library. However, the NODUPKEY
attribute is removed, and a warning message is written to the SAS log. This is the
default behavior because SAS data sets that were sorted with the NODUPKEY
option in previous releases might retain observations with duplicate keys. You can
re-sort the migrated SAS data set by the key variables with PROC SORT so that
observations with duplicate keys are eliminated and the correct attributes are
recorded.
If you specify the KEEPNODUPKEY option, you must examine your migrated data
to determine whether observations with duplicate keys exist. If so, you must re-sort
the SAS data set to have the data and NODUPKEY sort attribute match.
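A minimal re-sort sketch, assuming a hypothetical data set Target.MyData that was originally sorted BY ID with NODUPKEY:

```sas
/* FORCE makes PROC SORT sort the data even though the data set is   */
/* already marked as sorted. NODUPKEY removes observations that have */
/* duplicate values of the BY variable ID (a hypothetical key).      */
proc sort data=Target.MyData nodupkey force;
   by ID;
run;
```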
If your catalogs or libraries are not supported, see “Additional Steps for
Unsupported Catalogs or Libraries” on page 1573. For important information about
the SLIBREF= option, see “Using a SAS/CONNECT or SAS/SHARE Server” on page
1571.
Source release: SAS 6.12
Source operating environment: AIX, HP-UX, or Solaris SPARC
Target operating environment: AIX, HP-UX, or Solaris SPARC
Notes: PROC MIGRATE does not support catalogs from SAS 6 AIX. For HP-UX or
Solaris libraries that contain catalogs, specify the SLIBREF= option.
SAS files that were created prior to SAS 6.12 (SAS 6.09E for z/OS) must be
converted to SAS 6.12 before they can be migrated to the current release of SAS.
See “Additional Steps for Unsupported Catalogs or Libraries” on page 1573.
Overview
In SAS 7 and 8, the SHORTFILEEXT option creates a file with a shortened, three-
character extension on PC operating environments only. This feature is necessary
for operating systems that use a file allocation table (FAT) file system. The FAT file
system is also referred to as 8.3 because a file name can include up to eight
characters and a file extension can include up to three characters. Files with short
extensions are not usable by SAS in other operating environments.
Note: SAS 6 files all have three-character extensions but are not affected by this
issue. You can distinguish SAS 6 files because their extensions do not contain the
number 7.
Below is a table of the short and standard extensions for SAS 7 and 8 files. To
determine whether a library contains files with short extensions, look at the file
names in the SAS Explorer or use the file management tools of your operating
environment.
Table 41.2 Short and Standard File Extensions for SAS 7 and 8 Files
For example, a library named MyLib contains two files with short extensions: a SAS
data set named MyData.sd7 and a catalog named MyCat.sc7. Use the following
code to migrate the library to SAS®9:
libname MyLib v8 'source-library-pathname' shortfileext;
libname NewLib v9 'target-library-pathname';
proc migrate in=MyLib out=NewLib;
run;
After migration, the target library NewLib contains two files with standard
extensions: a SAS data set named MyData.sas7bdat and a catalog named
MyCat.sas7bcat.
Here are two ways to determine whether a SAS library contains catalogs:
n Use operating system tools to examine the file system. The file extension for
SAS catalogs is .sas7bcat.
n Submit a DATASETS procedure with MEMTYPE=CAT. Examine the SAS log for a
list of catalogs.
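For the second approach, a minimal sketch (Source and its path are placeholders):

```sas
libname Source 'source-library-pathname';

/* MEMTYPE=CATALOG limits the directory listing to catalog members. */
proc datasets library=Source memtype=catalog;
quit;
```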
For SAS 6 files, use the SLIBREF= option for SAS 6 HP-UX or Solaris libraries that
contain catalogs. See “Migrating a SAS 6 Library” on page 1568.
If the catalogs were created in SAS 6 or SAS 8, SLIBREF= must be assigned through
a SAS 8 server. (Note that this is not the same server that you assign through the
IN= argument. If you assign a server through the IN= argument, the IN= server must
be SAS 9.1.3 or later.)
If you cannot meet these requirements, use the alternate method described in
“Additional Steps for Unsupported Catalogs or Libraries” on page 1573.
When you use the SLIBREF= option for a SAS 8.2 library, multilabel formats are not
supported. If a catalog contains a multilabel format, the format is not created on
the target and an error is printed to the log. See SAS Note 20052, which is available
from SAS customer support.
PROC MIGRATE is not supported for migrating catalogs under the following
circumstances:
n SAS 6 AIX catalogs to any target library.
n Tru64 UNIX catalogs to either Linux for x64 or Solaris for x64 target library.
If unsupported catalogs are the only issue, you can use the additional steps for the
catalogs alone. By using PROC MIGRATE for the other members of the library, you can
retain those members' attributes. The CPORT and CIMPORT method below has
some limitations. For example, when transcoding to a new encoding, truncation can
occur. If truncation occurs, you must expand variable lengths. You can either use
the CVP engine with PROC CPORT or use the EXTENDVAR= option with PROC
CIMPORT.
Process
1 In the source session, create a transport file with PROC CPORT. (See Chapter
16, “CPORT Procedure,” on page 537.)
2 Move the transport file to the target environment. Do not use RLS (a feature of
SAS/CONNECT and SAS/SHARE software) to move catalogs, or you will
encounter errors. You must use binary FTP, the DOWNLOAD procedure,
Network File System (NFS), or another method of directly accessing files.
3 In the target session, use CIMPORT to import the transport file. (See Chapter 12,
“CIMPORT Procedure,” on page 389.)
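The three steps above can be sketched as follows. The librefs, the fileref tranfile, and all paths are placeholders; step 2 happens outside SAS.

```sas
/* Step 1 (source session): write the catalogs to a transport file. */
libname Source 'source-library-pathname';
filename tranfile 'transport-file-pathname';
proc cport library=Source file=tranfile memtype=catalog;
run;

/* Step 2: move the transport file in binary mode (for example, with */
/* binary FTP), not through RLS.                                     */

/* Step 3 (target session): import the catalogs from the transport file. */
libname Target 'target-library-pathname';
filename tranfile 'transport-file-pathname';
proc cimport library=Target infile=tranfile;
run;
```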
If a catalog entry contains 7-bit ASCII characters only, then the catalog entry can be
used in any other ASCII session, including UTF-8. (7-bit ASCII characters include
the letters of the English alphabet, digits, and symbols that are frequently used in
punctuation or SAS syntax.)
The CVP engine can help avoid truncation when you re-create a formats catalog in
an encoding where the characters are represented by more bytes. Truncation might
occur if a format length does not accommodate the larger character size. In the
target session, specify the CVPMULT= option in the source LIBNAME statement.
Note that the CVP engine cannot increase the values of the START, MIN, MAX, or
LABEL variables in the CNTLOUT= data set. In addition, when you use CNTLIN= to
import catalogs, you might experience transcoding errors that are not written to the
log. For example, a character might not be available in the target encoding, or the
code point could be used for a different character in the target encoding. If you
continue to experience transcoding problems, re-create the format in the target
session. If you do not have access to the SAS program statements that were used
to create the original format, you can print the CNTLOUT= data set in the source
session and use that information as a guide.
Features: IN= argument, OUT= argument
Details
In this example, the following is demonstrated:
n A spawner starts a session on the SAS/CONNECT server to access remote SAS
data. Here are two reasons to use a SAS/CONNECT or SAS/SHARE server:
o You can use a server to migrate across machines when you do not have
direct access.
o You are required to use a server if both of the following conditions are met:
n The source library contains catalogs.
n Processing would invoke CEDA in the target session.
In general, CEDA is invoked when you migrate to an incompatible
operating environment. For more information about CEDA, see “Cross-
Environment Data Access” in SAS Programmer’s Guide: Essentials.
n The SLIBREF= argument is not used. (To learn whether SLIBREF= is required,
see “Using a SAS/CONNECT or SAS/SHARE Server” on page 1571.)
n The IN= argument accesses all of the supported file types in the source library,
including incompatible catalogs, by using SAS/CONNECT or SAS/SHARE
software. This example uses SAS/CONNECT software. The SAS/CONNECT or
SAS/SHARE server that you assign to the IN= argument must be SAS 9.1.3 or
later.
Program
options comamid=tcp;
%let myserver=host.name.com;
signon myserver.__1234 user=userid password='mypw';
libname source 'source-library-pathname' server=myserver.__1234;
libname target 'target-library-pathname';
proc migrate in=source out=target <options>;
run;
signoff myserver.__1234;
Program Description
In the target session, submit the following code to invoke a SAS/CONNECT
spawner. The COMAMID= system option specifies TCP/IP as the communications
access method. The myserver macro variable is assigned to the host name of the
1576 Chapter 41 / MIGRATE Procedure
Assign the source and target libraries. Substitute your library paths, and specify
the myserver macro variable for SERVER= in the source library. Include the port
number if it is specified in the SIGNON statement.
libname source 'source-library-pathname' server=myserver.__1234;
libname target 'target-library-pathname';
Log Messages
The following SAS log messages indicate that a data set and a formats catalog are
migrated. PROC MIGRATE calls the CPORT and CIMPORT procedures to migrate
catalogs. Normally CEDA does not support catalogs, but using a SAS/CONNECT
server avoids CEDA restrictions. The server must have the same data
representation and encoding as the data.
Details
In this example, the following is demonstrated:
n The source and target libraries are directly accessible by the target computer.
For example, you might use NFS, which is a standard protocol of UNIX operating
environments. See the documentation for NFS and for your operating
environment.
n A SAS/CONNECT or SAS/SHARE server is not used. To learn whether a server
is required, see “Using a SAS/CONNECT or SAS/SHARE Server” on page 1571.
n If CEDA processing is not invoked, then any catalogs in the library are included
in the migration. If CEDA is invoked, then catalogs are not migrated.
Program
From a session in the current release of SAS, submit the following.
libname Source 'source-library-pathname';
libname Target 'target-library-pathname';
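The program above stops after the LIBNAME statements. The SAS/CONNECT example earlier in this chapter suggests that the migration itself would be a PROC MIGRATE step like the following sketch. Add any optional arguments to the PROC MIGRATE statement as needed.

```sas
/* Migrate the supported members of Source into Target. */
/* Catalogs are included only if CEDA is not invoked.   */
proc migrate in=Source out=Target;
run;
```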
Details
In this example, the following is demonstrated:
n The source and target libraries are on different computers.
n The SLIBREF= argument accesses the catalogs in the source library. (To learn
whether SLIBREF= is required, see “Using a SAS/CONNECT or SAS/SHARE
Server” on page 1571.) The SLIBREF= argument must be assigned to a
Program
signon v8srv sascmd='my-v8-sas-invocation-command';
rsubmit;
libname Srclib <engine> 'source-library-pathname';
endrsubmit;
libname Source <engine> '/nfs/v8machine-name/source-library-pathname';
libname Srclib <engine> server=v8srv;
libname Target <engine> 'target-library-pathname';
proc migrate in=Source out=Target slibref=Srclib <options>;
run;
proc migrate out=Target slibref=Srclib <options>;
run;
Program Description
From a session in the current release of SAS, submit the SIGNON command to
invoke a SAS/CONNECT server session. Note that because you are working across
computers, you might specify a machine name in the server ID.
signon v8srv sascmd='my-v8-sas-invocation-command';
Within this remote SAS 8.2 session, assign a libref to the source library that
contains the library members to be migrated. Use the RSUBMIT and
ENDRSUBMIT commands for SAS/CONNECT.
rsubmit;
libname Srclib <engine> 'source-library-pathname';
endrsubmit;
Example 3: Migrating with Incompatible Catalogs When the SLIBREF= Option Is Required
1579
In the local (client) session in the current release, assign a libref to the same source
library through NFS.
libname Source <engine> '/nfs/v8machine-name/source-library-pathname';
Assign the same libref that you used in step 2 (in this example, Srclib), but do not
assign it to a physical location. Instead, specify the
SERVER= option with the server ID (in this example, V8SRV) that you assigned in
the SIGNON command in step 1.
libname Srclib <engine> server=v8srv;
Use PROC MIGRATE with the SLIBREF= option. For the IN= and OUT= options,
specify the usual source and target librefs (in this example, Source and Target,
respectively). Set SLIBREF= to the libref that uses the SERVER= option (in this
example, Srclib).
proc migrate in=Source out=Target slibref=Srclib <options>;
run;
Alternatively, if CATALOG is the only member type in the library and you are
using the SLIBREF= option, then omit the IN= argument.
proc migrate out=Target slibref=Srclib <options>;
run;
42
OPTIONS Procedure
SAS system options control how SAS formats output, handles files, processes data
sets, interacts with the operating environment, and does other tasks that are not
specific to a single SAS program or data set. You use the OPTIONS procedure to
obtain information about an option or a group of options. Here is some of the
information that the OPTIONS procedure provides:
n the current value of an option and how it was set
n a description of an option
n valid syntax for the option, valid option values, and the range of values
n whether an option value has been modified by the INSERT or APPEND system options
For additional information about SAS system options, see SAS System Options:
Reference.
Statement       Task                                                      Example
PROC OPTIONS    List the current system option settings to the SAS log    Ex. 1, Ex. 2, Ex. 3, Ex. 4
Examples: “Example 1: Producing the Short Form of the Options Listing” on page 1598
“Example 2: Displaying the Setting of a Single Option” on page 1599
“Example 3: Displaying Expanded Path Environment Variables” on page 1600
“Example 4: List the Options That Can Be Specified by the INSERT and APPEND
Options” on page 1601
Syntax
PROC OPTIONS <options>;
HOST
displays only host options.
LISTINSERTAPPEND
lists the system options whose value can be modified by the INSERT and
APPEND system options.
LISTOPTSAVE
lists the system options that can be saved with PROC OPTSAVE or the
DMOPTSAVE command.
LISTRESTRICT
lists the system options that can be restricted by your site administrator.
NOHOST
displays only portable options.
OPTION=option-name
OPTION=(option-name-1 … option-name-n)
displays information about one or more system options.
RESTRICT
displays system options that the site administrator has restricted from
being updated.
Optional Arguments
DEFINE
displays the short description of the option, the option group, and the option
type. SAS displays information about when the option can be set, whether an
option can be restricted, the valid values for the option, and whether the
OPTSAVE procedure will save the option.
Restriction Saving and loading system options is not valid in SAS Viya.
Information about whether the option can be saved or loaded is
displayed only for SAS 9.4.
EXPAND
when displaying a character option, replaces an environment variable in the
option value with the value of the environment variable. EXPAND is ignored if
the option is a Boolean option, such as CENTER or NOCENTER, or if the value
of the option is numeric.
Restriction Variable expansion is valid only in the Windows and UNIX operating
environments.
See the “NOEXPAND” option on page 1586 to view paths that display the
environment variable.
GROUP=group-name
GROUP=(group-name-1 ... group-name-n)
displays the options in one or more groups specified by group-name.
Requirement When you specify more than one group, enclose the group names
in parentheses and separate the group names with a space.
HEXVALUE
displays system option character values as hexadecimal values.
HOST
displays only host options.
LISTINSERTAPPEND
lists the system options whose value can be modified by the INSERT and
APPEND system options. The INSERT option specifies a value that is inserted
as the first value of a system option value list. The APPEND option specifies a
value that is appended as the last value of a system option value list. Use the
LISTINSERTAPPEND option to display which system options can have values
inserted at the beginning or appended at the end of their value lists.
Example “Example 4: List the Options That Can Be Specified by the INSERT
and APPEND Options” on page 1601
LISTGROUPS
lists the system option groups as well as a description of each group.
LISTOPTSAVE
lists the system options that can be saved with PROC OPTSAVE or the
DMOPTSAVE command.
Restriction This option is not valid in SAS Viya. PROC OPTSAVE and the
DMOPTSAVE command are not valid in SAS Viya.
LISTRESTRICT
lists the system options that can be restricted by your site administrator.
See the “RESTRICT” option on page 1587 to list options that have been restricted
by the site administrator.
LONG
lists each system option on a separate line with a description. This is the
default. To create a compressed listing without descriptions, use the SHORT option instead.
Example “Example 1: Producing the Short Form of the Options Listing” on page
1598
LOGNUMBERFORMAT
displays numeric system option values using locale-specific punctuation.
NOEXPAND
when displaying a path, displays the path using environment variable(s) and not
the value of the environment variable(s). This is the default.
See the “EXPAND” option on page 1584 to display a path by expanding the value of
environment variables.
NOHOST
displays only portable options.
NOLOGNUMBERFORMAT
displays numeric system option values without using punctuation, such as a
comma or a period. This is the default.
OPTION=option-name
OPTION=(option-name-1 … option-name-n)
displays a short description and the value (if any) of the option specified by
option-name. DEFINE and VALUE options provide additional information about
the option.
option-name
specifies the option to use as input to the procedure.
RESTRICT
displays the system options that have been set by your site administrator in a
restricted options configuration file. These options cannot be changed by the
user. For each option that is restricted, the RESTRICT option displays the
option's value, scope, and how it was set.
If your site administrator has not restricted any options, then the following
message appears in the SAS log:
Your Site Administrator has not restricted any SAS options.
See the “LISTRESTRICT” option on page 1585 to list options that can be restricted
by the site administrator.
SHORT
specifies to display a compressed listing of options without descriptions.
See the “LONG” option on page 1586 to create a listing with descriptions of the
options.
VALUE
displays the option's value and scope, as well as how the value was set. If the
value was set using a configuration file, the SAS log displays the name of the
configuration file. If the option was set using the INSERT or APPEND system
options, the SAS log displays the value that was inserted or appended.
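For instance, you could insert a directory into the SASAUTOS= option and then use VALUE to see the result. The path c:\mymacros is a hypothetical placeholder. This sketch also assumes that SASAUTOS= is one of the options that accept INSERT=. The LISTINSERTAPPEND option reports which options do.

```sas
/* Insert a directory at the front of the SASAUTOS= search list. */
/* The path is a hypothetical placeholder.                        */
options insert=(sasautos="c:\mymacros");

/* VALUE reports the current value, its scope, and how it was set. */
proc options option=sasautos value;
run;
```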
The following example shows a partial log that displays the settings of portable
options.
proc options;
run;
Example Code 42.1 The SAS Log Showing a Partial Listing of SAS System Options
Portable Options:
NOACCESSIBLECHECK Do not detect and log ODS output that is not accessible.
NOACCESSIBLEGRAPH Do not create accessible ODS graphics by default.
NOACCESSIBLEPDF Do not create accessible PDF files by default.
NOACCESSIBLETABLE Do not create accessible tables for enabled procedures, by default.
When you submit PROC OPTIONS without any options, the log displays both portable and host options.
To view only host options, use this version of the OPTIONS procedure:
proc options host;
run;
Usage: OPTIONS Procedure 1589
Example Code 42.2 The SAS Log Showing a Partial List of Host Options
Host Options:
ACCESSIBILITY=STANDARD
Specifies whether accessibility features are enabled in the Customize Tool
dialog box and in some Properties dialog boxes.
ALIGNSASIOFILES Aligns SAS files on a page boundary for improved performance.
ALTLOG= Specifies the location for a copy of the SAS log when SAS is running in batch
mode.
ALTPRINT= Specifies the location for a copy of the SAS procedure output when SAS is
running in batch mode.
AUTHPROVIDERDOMAIN=
Specifies the authentication provider that is associated with a domain.
AUTHSERVER= Specifies the domain server that finds and authenticates secure server logins.
AWSCONTROL=(SYSTEMMENU MINMAX TITLE)
Specifies whether the main SAS window includes a title bar, a system control
menu, and minimize and maximize buttons.
AWSDEF=(0 0 79 80)
Specifies the location and dimensions of the main SAS window when SAS
initializes.
AWSMENU Displays the menu bar in the main SAS window.
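The ERRORCHECK listing that follows is the kind of definition output that DEFINE produces, but the statement that generated it is not shown. A step like this one, which is assumed here rather than taken from the original example, requests definition information for that option:

```sas
/* Display the value and the definition information for one option. */
proc options option=errorcheck define;
run;
```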
ERRORCHECK=NORMAL
Option Definition Information for SAS Option ERRORCHECK
Group= ERRORHANDLING
Group Description: Error messages and error conditions settings
Description: Specifies whether SAS enters syntax-check mode when errors are found in the
LIBNAME, FILENAME, %INCLUDE, and LOCK statements.
Type: The option value is of type CHARACTER
Maximum Number of Characters: 10
Casing: The option value is retained uppercased
Quotes: If present during "set", start and end quotes are removed
Parentheses: The option value does not require enclosure within parentheses. If
present, the parentheses are retained.
Expansion: Environment variables, within the option value, are not expanded
Number of valid values: 2
Valid value: NORMAL
Valid value: STRICT
When Can Set: Startup or anytime during the SAS Session
Restricted: Your Site Administrator can restrict modification of this option
Optsave: PROC Optsave or command Dmoptsave will save this option
To view the settings for more than one option, enclose the options in parentheses
and separate the options with a space:
proc options option=(pdfsecurity pdfpassword) define;
run;
Example Code 42.4 The Settings of Two SAS System Options
PDFSECURITY=NONE
Option Definition Information for SAS Option PDFSECURITY
Group= PDF
Group Description: PDF settings
Group= SECURITY
Group Description: Security settings
Description: Specifies the level of encryption to use for PDF documents.
Type: The option value is of type CHARACTER
Maximum Number of Characters: 4
Casing: The option value is retained uppercased
Quotes: If present during "set", start and end quotes are removed
Parentheses: The option value does not require enclosure within parentheses. If
present, the parentheses are retained.
Expansion: Environment variables, within the option value, are not expanded
Number of valid values: 3
Valid value: HIGH
Valid value: LOW
Valid value: NONE
When Can Set: Startup or anytime during the SAS Session
Restricted: Your Site Administrator can restrict modification of this option
Optsave: PROC Optsave or command Dmoptsave will save this option
PDFPASSWORD=xxxxxxxx
Option Definition Information for SAS Option PDFPASSWORD
Group= PDF
Group Description: PDF settings
Group= SECURITY
Group Description: Security settings
Description: Specifies the password to use to open a PDF document and the password used by a
PDF document owner.
Type: The option value is of type CHARACTER
Maximum Number of Characters: 2048
Casing: The option value is retained with original casing
Quotes: If present during "set", start and end quotes are removed
Parentheses: The option value must be enclosed within parentheses. The parentheses are
retained.
Expansion: Environment variables, within the option value, are not expanded
Number of valid values: 2
Valid value: OPEN
Valid value: OWNER
When Can Set: Startup or anytime during the SAS Session
Restricted: Your Site Administrator cannot restrict modification of this option
Optsave: PROC Optsave or command Dmoptsave will not save this option
Option Groups
GROUP=ADABAS ADABAS
GROUP=ANIMATION Animation
GROUP=DATACOM Datacom
GROUP=DB2 DB2
GROUP=EMAIL E-mail
GROUP=ENVDISPLAY Display
GROUP=ENVFILES Files
GROUP=HELP Help
GROUP=IDMS IDMS
GROUP=IMS IMS
GROUP=INSTALL Installation
GROUP=ISPF ISPF
GROUP=MEMORY Memory
GROUP=META Metadata
GROUP=PDF PDF
GROUP=PERFORMANCE Performance
GROUP=REXX REXX
GROUP=SECURITY Security
GROUP=SMF SMF
GROUP=SQL SQL
GROUP=SVG SVG
GROUP=TK TK
Use the GROUP= option to display system options that belong to a particular
group. You can specify one or more groups.
proc options group=(svg graphics);
run;
Group=SVG
ANIMATION=STOP Specifies whether to start or stop animation.
ANIMDURATION=MIN Specifies the number of seconds that each animation frame displays.
ANIMLOOP=YES Specifies the number of iterations that animated images repeat.
ANIMOVERLAY Specifies that animation frames are overlaid in order to view all frames.
SVGAUTOPLAY Starts animation when the page is loaded in the browser.
NOSVGCONTROLBUTTONS
Does not display the paging control buttons and an index in a multipage SVG
document.
SVGFADEIN=0 Specifies the number of seconds for the fade-in effect for a graph.
SVGFADEMODE=OVERLAP
Specifies whether to use sequential frames or to overlap frames for the
fade-in effect of a graph.
SVGFADEOUT=0 Specifies the number of seconds for a graph to fade out of view.
SVGHEIGHT= Specifies the height of the viewport. Specifies the value of the height
attribute of the outermost SVG element.
NOSVGMAGNIFYBUTTON
Disables the SVG magnifier tool.
SVGPRESERVEASPECTRATIO=
Specifies whether to force uniform scaling of SVG output. Specifies the
preserveAspectRatio attribute on the outermost SVG element.
SVGTITLE= Specifies the text in the title bar of the SVG output. Specifies the value of
the TITLE element in the SVG file.
SVGVIEWBOX= Specifies the coordinates, width, and height that are used to set the viewBox
attribute on the outermost SVG element.
SVGWIDTH= Specifies the width of the viewport. Specifies the value of the width
attribute of the outermost SVG element.
SVGX= Specifies the x-axis coordinate of one corner of the rectangular region for
an embedded SVG element. Specifies the x attribute in the outermost SVG
element.
SVGY= Specifies the y-axis coordinate of one corner of the rectangular region for
an embedded SVG element. Specifies the y attribute in the outermost SVG
element.
Group=GRAPHICS
DEVICE= Specifies the device driver to which SAS/GRAPH sends procedure output.
GSTYLE Uses ODS styles to generate graphs that are stored as GRSEG catalog entries.
GWINDOW Displays SAS/GRAPH output in the GRAPH window.
MAPS=("!sasroot\path-to-maps")
Specifies the location of SAS/GRAPH map data sets.
MAPSGFK=( "!sasroot\path-to-maps" )
Specifies the location of GfK maps.
MAPSSAS=( "!sasroot\path-to-maps" )
Specifies the location of SAS map data sets.
FONTALIAS= Assigns a Windows font to one of the SAS fonts.
You can use the following group names as values for the GROUP= option to list the
system options in a group:
You can use the following groups to list operating environment–specific values that
might be available when you use the GROUP= option with PROC OPTIONS.
The following SAS logs show the output when the RESTRICT option is specified
and partial output when the LISTRESTRICT option is specified.
Example Code 42.7 A List of Options That Have Been Restricted by the Site Administrator
1    proc options restrict;
2    run;
SAS (r) Proprietary Software Release 9.4 TS1M6
Example Code 42.8 A Partial Log That Lists Options That Can Be Restricted
Your Site Administrator can restrict the ability to modify the following Portable Options:
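The step behind the LISTRESTRICT listing in Example Code 42.8 is not reproduced. By analogy with the RESTRICT step in Example Code 42.7, a step like this one would request it:

```sas
/* List the options that a site administrator is able to restrict. */
proc options listrestrict;
run;
```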
Note: PROC OPTSAVE does not save these types of system options:
n SAS invocation options
Note: PROC OPTSAVE and PROC OPTLOAD, as well as the DMOPTSAVE and
DMOPTLOAD commands, are valid only in SAS 9.4. They are not valid in SAS Viya.
The following SAS log shows a partial list of the options that can be saved by using
PROC OPTSAVE or the DMOPTSAVE command:
Results: OPTIONS Procedure 1597
Example Code 42.9 A Partial List of System Options That Can Be Saved
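The code that produces this listing is not shown. In a SAS 9.4 session, the LISTOPTSAVE option described earlier is one way to request such a list:

```sas
/* List the options that PROC OPTSAVE or DMOPTSAVE can save (SAS 9.4 only). */
proc options listoptsave;
run;
```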
The OPTIONS procedure displays passwords in the SAS log as eight Xs, regardless
of the actual password length.
Operating Environment Information: PROC OPTIONS produces additional
information that is specific to the environment under which you are running SAS.
For more information about this and for descriptions of host-specific options, refer
to the SAS documentation for your operating environment.
See Also
n SAS Companion for UNIX Environments
n SAS Companion for Windows
n SAS Companion for z/OS
Details
This example shows how to generate the short form of the listing of SAS system
option settings. Compare this short form with the long form that is shown in
“Display a List of System Options” on page 1588.
Program
List all options and their settings. SHORT lists the SAS system options and their
settings without any descriptions.
proc options short;
run;
Example 2: Displaying the Setting of a Single Option 1599
Log
Example Code 42.10 Partial Listing of the SHORT Option
Portable Options:
Details
This example shows how to display the setting of a single SAS system option. The
log shows the current setting of the SAS system option MEMBLKSZ. The DEFINE
and VALUE options display additional information. The LOGNUMBERFORMAT option
displays the value using commas.
Program
Specify the MEMBLKSZ SAS system option. OPTION=MEMBLKSZ displays option
value information. DEFINE and VALUE display additional information.
LOGNUMBERFORMAT specifies to format the value using commas.
proc options option=memblksz define value lognumberformat;
run;
Log
Example Code 42.11 Log Output from Specifying the MEMBLKSZ Option
Details
This example shows the value of an environment variable when the path is
displayed.
Example 4: List the Options That Can Be Specified by the INSERT and APPEND Options
1601
Program
Show the value of the environment variables. The EXPAND option displays the
value of each environment variable in place of the variable name. The
NOEXPAND option displays the environment variable name itself. In this example, the
environment variable is !sasroot.
proc options option=msg expand;
run;
proc options option=msg noexpand;
run;
Log
Example Code 42.12 Displaying an Expanded and Nonexpanded Pathname Using the
OPTIONS Procedure
MSG=( '!sasroot\sasmsg')
The path to the sasmsg directory
Details
This example shows how to display the options that can be specified by the
INSERT and APPEND system options.
Program
List all options that can be specified by the INSERT and APPEND options in SAS
9.4. The LISTINSERTAPPEND option provides a list and a description of these
options.
proc options listinsertappend;
run;
Log
Example Code 42.13 Displaying the Options That Can Be Specified by the INSERT and
APPEND Options
HELPLOC Specifies the location of the text and index files for the facility that
is used to view the online SAS Help and Documentation.
MSG Specifies the path to the library that contains SAS error messages.
SET Defines a SAS environment variable.
43
OPTLOAD Procedure
You can load SAS system option settings from a SAS data set or registry key by
using one of these methods:
n the DMOPTLOAD command from a command line in the SAS windowing
environment. For example, this command loads system options from the
CORE\OPTIONS registry key: DMOPTLOAD key="core\options".
n the PROC OPTLOAD statement.
When an option is restricted by the site administrator, and the option value that is
being set by PROC OPTLOAD differs from the option value that was established by
the site administrator, SAS issues a warning message to the log.
Statement       Task                                                                                    Example
PROC OPTLOAD    Use SAS system option settings that are stored in the SAS registry or in a SAS data set    Ex. 1
Syntax
PROC OPTLOAD <options>;
Optional Arguments
DATA=libref.dataset
specifies the library and data set name from which SAS system option settings
are loaded. The SAS variable OPTNAME contains the character value of the
SAS system option name, and the SAS variable OPTVALUE contains the
character value of the SAS system option setting.
Default If you omit the DATA= option and the KEY= option, the procedure
will use the default SAS library and data set. The default library is
where the current user profile resides. Unless you specify a library,
Example: Load a Data Set of Saved System Options 1605
You must use quotation marks around the "SAS registry key"
name. Separate the names in a sequence of key names with a
backslash (\). For example, KEY="CORE\OPTIONS" loads system
options from the CORE\OPTIONS registry key.
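As a minimal sketch, the PROC OPTLOAD counterpart of the DMOPTLOAD command shown earlier passes the same registry key to KEY=:

```sas
/* Load system option settings from a SAS registry key. */
/* The key name must be enclosed in quotation marks.     */
proc optload key="core\options";
run;
```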
Details
This example saves the current system option settings using the OPTSAVE
procedure, modifies the YEARCUTOFF system option, and then loads the original
set of system options.
Program
libname mysas "c:\mysas";
proc options option=yearcutoff;
run;
proc optsave out=mysas.options;
run;
options yearcutoff=2000;
proc options option=yearcutoff;
run;
proc optload data=mysas.options;
run;
Program Description
These statements and procedures were submitted one at a time, rather than run as a
single SAS program, so that the value of the YEARCUTOFF= option could be displayed after each step.
Use the OPTIONS statement to set the YEARCUTOFF= system option to the
value 2000.
options yearcutoff=2000;
Display the value of the YEARCUTOFF= system option. After loading the saved
system option settings, the value of the YEARCUTOFF= option has been restored
to the original value.
proc options option=yearcutoff;
run;
Log
Example Code 43.1 The SAS Log Shows the YEARCUTOFF= Value After Loading Options
Using PROC OPTLOAD
YEARCUTOFF=1926 Specifies the first year of a 100-year span that is used by date informats
and functions to read a two-digit year.
NOTE: The data set MYSAS.OPTIONS has 259 observations and 2 variables.
NOTE: PROCEDURE OPTSAVE used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
6    options yearcutoff=2000;
YEARCUTOFF=2000 Specifies the first year of a 100-year span that is used by date informats
and functions to read a two-digit year.
YEARCUTOFF=1926 Specifies the first year of a 100-year span that is used by date informats
and functions to read a two-digit year.
44
OPTSAVE Procedure
SAS system options can be saved across SAS sessions. You can save the settings of
the SAS system options in a SAS data set or registry key by using one of these
methods:
n the DMOPTSAVE command from a command line in the SAS windowing
environment. Use the command like this: DMOPTSAVE <save-location>.
n the PROC OPTSAVE statement.
Statement       Task                                                                                  Example
PROC OPTSAVE    Save the current SAS system option settings to the SAS registry or to a SAS data set    Ex. 1
Syntax
PROC OPTSAVE <options >;
Optional Arguments
KEY="SAS registry key"
specifies the location in the SAS registry of stored SAS system option settings.
The registry is retained in SASUSER. If SASUSER is not available, then the
temporary WORK library is used. For example, KEY="OPTIONS" saves the
system options in the OPTIONS registry key.
You must use quotation marks around the "SAS registry key"
name.
Tip To specify a subkey, enter multiple key names starting with the
root key.
CAUTION If the key already exists, it will be overwritten. If the specified key
does not already exist in the current SAS registry, then the key is
automatically created when option settings are saved in the SAS registry.
OUT=libref.dataset
specifies the names of the library and data set where SAS system option
settings are saved. The SAS variable OPTNAME contains the character value of
the SAS system option name. The SAS variable OPTVALUE contains the
character value of the SAS system option setting.
Default If you omit the OUT= and the KEY= options, the procedure will use
the default SAS library and data set. The default SAS library is where
the current user profile resides. Unless you specify a SAS library, the
default library is SASUSER. If SASUSER is in use by another active
SAS session, then the temporary WORK library is the default
location where the data set is saved. The default data set name is
MYOPTS.
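Here is a minimal sketch of a save-and-restore round trip through the registry. KEY="OPTIONS" comes from the description above, and OPTLOAD accepts the same KEY= form. Because an existing key is overwritten, run the first step only if replacing that saved key is acceptable.

```sas
/* Save the current settings under the OPTIONS registry key. */
/* Caution: an existing OPTIONS key is overwritten.          */
proc optsave key="options";
run;

/* Later, restore the settings from the same key. */
proc optload key="options";
run;
```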
Example Code 44.1 The SAS Log Displaying Output for the OPTIONS Procedure DEFINE
Option
PAGENO=1
Option Definition Information for SAS Option PAGENO
Group= LISTCONTROL
Group Description: Procedure output and display settings
Description: Resets the SAS output page number.
Type: The option value is of type LONG
Range of Values: The minimum is 1 and the maximum is 2147483647
Valid Syntax(any casing): MIN|MAX|n|nK|nM|nG|nT|hexadecimal
Numeric Format: Usage of LOGNUMBERFORMAT impacts the value format
When Can Set: Startup or anytime during the SAS Session
Restricted: Your Site Administrator can restrict modification of this option
Optsave: PROC Optsave or command Dmoptsave will save this option
Note: PROC OPTSAVE does not save these types of system options:
n SAS invocation options
Details
This example saves the current system option settings using the OPTSAVE
procedure.
Program
libname mysas "c:\mysas";
Program Description
Create a libref.
libname mysas "c:\mysas";
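The program as printed shows only the LIBNAME statement. The log reports the data set MYSAS.OPTIONS, so the saving step presumably resembles the one in the OPTLOAD example. This sketch assumes that it does:

```sas
libname mysas "c:\mysas";

/* Save the current system option settings to MYSAS.OPTIONS. */
/* Each observation holds OPTNAME and OPTVALUE.              */
proc optsave out=mysas.options;
run;
```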
Log
Example Code 44.3 The SAS Log Shows Processing of PROC OPTSAVE
NOTE: The data set MYSAS.OPTIONS has 289 observations and 2 variables.
NOTE: PROCEDURE OPTSAVE used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
45
PLOT Procedure
The following output is a simple plot of the high values of the Dow Jones Industrial
Average between 1968 and 2008. PROC PLOT determines the plotting symbol and
the scales for the axes. Here are the statements that produce the output:
options nodate pageno=1 linesize=64
pagesize=25;
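Only the OPTIONS statement survives here. Here is a minimal sketch of the plotting step. It assumes that the DJIA data set stores the yearly high in a variable named HIGH and the year in a variable named YEAR. Both variable names are assumptions.

```sas
/* Plot HIGH (vertical axis) against YEAR (horizontal axis). */
/* PROC PLOT chooses the plotting symbol and the axis scales. */
proc plot data=djia;
   plot high*year;
run;
quit;
```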
You can also overlay two plots, as shown in the following output. One plot shows
the high values of the DJIA data set; the other plot shows the low values. The plot
also shows that you can specify plotting symbols and put a box around a plot. The
statements that produce the following output are shown in “Example 3: Overlaying
Two Plots” on page 1655.
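The complete program is in Example 3. The following rough sketch assumes that the DJIA data set has variables named HIGH, LOW, and YEAR (assumed names). It uses explicit plotting symbols plus the OVERLAY and BOX options on the PLOT statement to produce this kind of display:

```sas
proc plot data=djia;
   /* Two plot requests with explicit plotting symbols, drawn on */
   /* one set of axes (OVERLAY) and enclosed in a border (BOX).  */
   plot high*year='+' low*year='o' / overlay box;
run;
quit;
```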
PROC PLOT can also label points on a plot with the values of a variable, as shown
in the following output. The plotted data represents population density and crime
rates for selected U.S. states. The SAS code that produces the following output is
shown in “Example 11: Adjusting Labels on a Plot with the PLACEMENT= Option” on
page 1679.
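To label points, a plot request names a label variable after a dollar sign. Here is a minimal sketch that assumes a hypothetical data set STATES with numeric variables DENSITY and CRIME and a character variable STATE. Example 11 adds the PLACEMENT= option to control where the labels go.

```sas
proc plot data=states;
   /* Label each plotted point with the value of STATE. */
   plot density*crime $ state;
run;
quit;
```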
Concepts: PLOT Procedure 1621
RUN Groups
PROC PLOT is an interactive procedure. It remains active after a RUN statement is
executed. Usually, SAS terminates a procedure after executing a RUN statement.
When you start the PLOT procedure, you can continue to submit any valid
statements without resubmitting the PROC PLOT statement. Thus, you can easily
experiment with changing labels, values of tick marks, and so on. Any options
submitted in the PROC PLOT statement remain in effect until you submit another
PROC PLOT statement.
When you submit a RUN statement, PROC PLOT executes all the statements
submitted since the last PROC PLOT or RUN statement. Each group of statements
is called a RUN group. With each RUN group, PROC PLOT begins a new page and
begins with the first item in the VPERCENT= and HPERCENT= lists, if any.
You can use the BY statement interactively. The BY statement remains in effect
until you submit another BY statement or terminate the procedure.
See “Example 11: Adjusting Labels on a Plot with the PLACEMENT= Option” on
page 1679 for an example of using RUN-group processing with PROC PLOT.
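Here is a minimal sketch of RUN-group processing. It assumes a data set DJIA with variables HIGH, LOW, and YEAR (hypothetical names). The second PLOT statement runs without a new PROC PLOT statement, and QUIT ends the procedure.

```sas
proc plot data=djia;
   plot high*year;          /* first RUN group */
run;

   plot low*year='o';       /* second RUN group: PROC PLOT is still active */
run;
quit;                       /* end the interactive PLOT procedure */
```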
Pointer Symbols
Pointer symbols associate a point with its label by pointing in the general direction
of the label placement. When you use a label variable and do not specify a plotting
symbol or if the variable value is null ('00'x), PROC PLOT uses pointer symbols as
plotting symbols. PROC PLOT uses four different pointer symbols based on the
value of the S= and V= suboptions in the PLACEMENT= option. The table below
shows the pointer symbols:
S=        V=      Symbol
LEFT      any     <
RIGHT     any     >
CENTER    >0      ^
CENTER    <=0     v
If you are using pointer symbols and multiple points coincide, then PROC PLOT
uses the number of points, 2-9, as the plotting symbol. If the number of points is
more than 9, then the procedure uses an asterisk (*).
Understanding Penalties
PROC PLOT assesses the quality of placements with penalties. If all labels are
plotted with zero penalty, then no labels collide and all labels are near their
symbols. When it is not possible to place all labels with zero penalty, PROC PLOT
tries to minimize the total penalty.
The following table lists the penalty, its default value, the index used to reference
the penalty, and the range of values that can be assigned to the penalty. Each
penalty is described in more detail in Table 45.85 on page 1624.
Penalty    Default Penalty    Index    Range
The following table contains the index values from the previous table with a
description of the corresponding penalty.
1 A nonblank character in the plot collides with an embedded blank in a label, or there is not a
blank or a plot boundary before or after each label fragment.
2 A split occurs on a nonblank or nonpunctuation character when you do not specify a split
character.
3 A label is placed with a different number of lines than the L= suboption specifies, when you
specify a split character.
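4-7 A label is placed far away from the corresponding point. PROC PLOT calculates the penalty
according to this (integer arithmetic) formula:
$$\frac{\max\left(|H| - \mathit{fhs},\ 0\right) + \mathit{vsw} \times \max\left(|V| - \dfrac{L + \mathit{fvs} + (V > 0)}{2},\ 0\right)}{\mathit{vhsd}}$$
where H, V, and L are the H=, V=, and L= values of the placement state, (V > 0) is 1 when V is
positive and 0 otherwise, and fhs, fvs, vsw, and vhsd are the values of penalties 4 through 7.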
Notice that penalties 4 through 7 are actually just components of the formula used to
determine the penalty. Changing the penalty for a free horizontal or free vertical shift to a
large value such as 500 removes any penalty for a large horizontal or vertical shift. “Example
6: Plotting Date Values on an Axis” on page 1662 illustrates a case in which removing the
horizontal shift penalty is useful.
8 A label might collide with its own plotting symbol. If the plotting symbol is blank, then a
collision state cannot occur. See “Collision States” on page 1625 for more information.
15-214 A label character does not appear in the plot. By default, the penalty for not printing the first
character is greater than the penalty for not printing the second character, and so on. By
default, the penalty for not printing the fifth and subsequent characters is the same.
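To see how the formula for penalties 4 through 7 behaves, the following DATA step evaluates it for one placement state. The values given to fhs, fvs, vsw, and vhsd are hypothetical inputs, not necessarily the procedure's defaults, and the INT function is applied on the assumption that each division truncates as in integer arithmetic.
data _null_;
   /* Hypothetical values for penalties 4-7 */
   fhs  = 2;   /* free horizontal shift                 */
   fvs  = 1;   /* free vertical shift                   */
   vsw  = 2;   /* vertical shift weight                 */
   vhsd = 5;   /* vertical/horizontal shift denominator */
   /* One placement state: 8 columns right, 3 lines up, 1-line label */
   h = 8; v = 3; l = 1;
   /* (v > 0) evaluates to 1 or 0; INT truncates each division */
   vertical_free = int((l + fvs + (v > 0)) / 2);
   penalty = int((max(abs(h) - fhs, 0)
                  + vsw * max(abs(v) - vertical_free, 0)) / vhsd);
   put penalty=;   /* writes penalty=2 to the log */
run;
Here the horizontal term contributes 6 and the weighted vertical term contributes 4, and dividing the sum of 10 by 5 gives a penalty of 2.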
Changing Penalties
You can change the default penalties with the PENALTIES= option in the PLOT
statement. Because PROC PLOT considers penalties when it places labels,
changing the default penalties can change the placement of the labels.
For example, if you have labels that all begin with the same two letters, then you
can increase the default penalty for not printing the third, fourth, and fifth
characters and decrease the penalty for not printing the first and second
characters. See “Using the PENALTIES= Option” on page 1648 for an example of
how to use the PENALTIES= option.
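As a sketch of the two-letter case described above, the following PLOT statement lowers the penalties for dropping the first and second characters (indexes 15 and 16) and raises those for the third through fifth characters (indexes 17 through 19); index 20 sets the penalty for the sixth and later characters. The LABELS data set, its variables, and the specific penalty values are hypothetical.
proc plot data=labels;
   /* Index 15 = 1st character, 16 = 2nd, ..., 20 = 6th and beyond */
   plot y*x $ name / penalties(15 to 20)=2 2 11 10 8 2;
run;
quit;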
Collision States
Collision states are placement states that can cause a label to collide with its own
plotting symbol. PROC PLOT usually avoids using collision states because of the
large default penalty of 500 that is associated with them. PROC PLOT does not
consider the actual length or splitting of any particular label when determining if a
placement state is a collision state.
Here are the rules that PROC PLOT uses to determine collision states:
- When S=CENTER, placement states that do not shift the label up or down
sufficiently so that all of the label is shifted onto completely different lines from
the symbol are collision states.
- When S=RIGHT, placement states that shift the label zero or more positions to
the left without first shifting the label up or down onto completely different
lines from the symbol are collision states.
- When S=LEFT, placement states that shift the label zero or more positions to
the right without first shifting the label up or down onto completely different
lines from the symbol are collision states.
Note: A collision state cannot occur if you do not use a plotting symbol.
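To see how these rules affect a particular plot, you can ask PROC PLOT to print its placement states and the penalties that it assessed. In the following sketch, which reuses a hypothetical POINTS data set with variables Name, X, and Y, the only placement state centers the label on the same line as a nonblank plotting symbol, so by the S=CENTER rule above it is a collision state. STATES prints the placement state in effect, and LIST=1 lists every point placed with a penalty of 1 or more, which here should include every labeled point.
proc plot data=points;
   /* S=CENTER with V=0 keeps the label on the symbol's line: */
   /* a collision state because '*' is a nonblank symbol.     */
   plot y*x='*' $ name / placement=(s=center : v=0) states list=1;
run;
quit;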
Reference Lines
PROC PLOT places labels and computes penalties before placing reference lines on
a plot. The procedure does not attempt to avoid rows and columns that contain
reference lines.
When you overlay two or more label plots, all label plots are treated as a single plot
in avoiding collisions and computing hidden character counts. Labels of different
plots never overprint, even with the OVP system option in effect.
Time
For a given plot size, the time that is required to construct the plot is approximately
proportional to $n \times \mathit{len}$. The amount of time required to split the labels is
approximately proportional to $n s^2$. Generally, the more placement states that you
specify, the more time that PROC PLOT needs to place the labels. However,
increasing the number of horizontal and vertical shifts gives PROC PLOT more
flexibility to avoid collisions, often resulting in less time used to place labels.
Memory
PROC PLOT uses $24p$ bytes of memory for the internal placement state list. PROC
PLOT uses $n\,(84 + 5\,\mathit{len} + 4s\,(1 + 1.5\,(s + 1)))$ bytes for the internal list of labels. PROC
PLOT builds all plots in memory; each printing position uses one byte of memory.
If you run out of memory, then request fewer plots in each PLOT statement and put
a RUN statement after each PLOT statement.
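As a worked illustration, the following DATA step plugs hypothetical values of n, len, s, and p into the two memory formulas. It only evaluates the expressions; the inputs are not taken from any particular plot.
data _null_;
   /* Hypothetical inputs for the memory formulas */
   n = 100; len = 10; s = 1; p = 20;
   state_bytes = 24*p;                                       /* 24p */
   label_bytes = n*(84 + 5*len + 4*s*(1 + 1.5*(s + 1)));     /* n(84+5len+4s(1+1.5(s+1))) */
   put state_bytes= label_bytes=;   /* state_bytes=480 label_bytes=15000 */
run;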
PROC PLOT <options>;
BY <DESCENDING> variable-1 <<DESCENDING> variable-2 …> <NOTSORTED>;
PLOT plot-request(s) </ options>;
Tip: You can use data set options with the DATA= option. See SAS Data Set Options:
Reference.
Example: “Example 10: Excluding Observations That Have Missing Values” on page 1676
Syntax
PROC PLOT <options>;
Optional Arguments
DATA=<SAS-data-set>
specifies the input SAS data set.
ENCRYPTKEY=<key-value>
specifies the key value needed for plotting an AES-encrypted data set. If the input data
set was created with ENCRYPT=AES, then you must specify the
ENCRYPTKEY= value to plot its data. For example, if a data set named
secretPlot is created using the DATA statement
data secretPlot(encrypt=AES encryptkey=Ib007);
then you must specify the following PROC statement to plot the data in
secretPlot:
proc plot data=secretPlot(encryptkey=Ib007);
See “ENCRYPTKEY= Data Set Option” in SAS Data Set Options: Reference for
more information about the ENCRYPTKEY= data set option.
FORMCHAR <(position(s))>'formatting-character(s)'
defines the characters to use for constructing the borders of the plot.
position(s)
identifies the position of one or more characters in the SAS formatting-
character string. A space or a comma separates the positions.
Position    Character    Description
1           |            Vertical separators
2           -            Horizontal separators
3 5 9 11    -            Corners
7           +            Intersection of vertical and horizontal separators
formatting-character(s)
lists the characters to use for the specified positions. PROC PLOT assigns
characters in formatting-character(s) to position(s), in the order in which
they are listed. For example, the following option assigns the asterisk (*) to
the third formatting character, the number sign (#) to the seventh character,
and does not alter the remaining characters: formchar(3,7)='*#'
HPERCENT=percent(s)
specifies one or more percentages of the available horizontal space to use for
each plot. HPERCENT= enables you to put multiple plots on one page. PROC
PLOT tries to fit as many plots as possible on a page. After using each of the
percent(s), PROC PLOT cycles back to the beginning of the list. A zero in the list
forces PROC PLOT to go to a new page even if it could fit the next plot on the
same page.
HPERCENT=33
prints three plots per page horizontally; each plot is one-third of a page wide.
HPERCENT=50 25 25
prints three plots per page; the first is twice as wide as the other two.
HPERCENT=33 0
produces plots that are one-third of a page wide; each plot is on a separate
page.
HPERCENT=300
produces plots three pages wide.
At the beginning of every BY group and after each RUN statement, PROC PLOT
returns to the beginning of the percent(s) and starts printing a new page.
Alias HPCT=
Default 100
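For instance, the following sketch, which assumes the DJIA data set from the examples later in this chapter, requests three plots in one PLOT statement with HPERCENT=50 25 25.
proc plot data=djia hpercent=50 25 25;
   /* Three plot requests; PROC PLOT tries to fit them on one page, */
   /* the first one twice as wide as the other two.                 */
   plot high*year='*' low*year='o' high*low;
run;
quit;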
MISSING
includes missing character variable values in the construction of the axes. It has
no effect on numeric variables.
NOLEGEND
suppresses the legend at the top of each plot. The legend lists the names of the
variables being plotted and the plotting symbols used in the plot.
NOMISS
excludes observations for which either variable is missing from the calculation
of the axes. Normally, PROC PLOT draws an axis based on all the values of the
variable being plotted, including points for which the other variable is missing.
UNIFORM
uniformly scales axes across BY groups. Uniform scaling enables you to directly
compare the plots for different values of the BY variables.
Restriction You cannot use PROC PLOT with the UNIFORM option with an
engine that supports concurrent access if another user is updating
the data set at the same time.
VPERCENT=percent(s)
specifies one or more percentages of the available vertical space to use for each
plot. If you use a percentage greater than 100, then PROC PLOT prints sections
of the plot on successive pages.
Alias VPCT=
Default 100
VTOH=aspect-ratio
specifies the aspect ratio (vertical to horizontal) of the characters on the output
device. aspect-ratio is a positive real number. If you use the VTOH= option, then
PROC PLOT spaces tick marks so that the distance between horizontal tick
marks is nearly equal to the distance between vertical tick marks. For example,
if characters are twice as high as they are wide, then specify VTOH=2.
Interaction VTOH= has no effect if you use the HSPACE= and VSPACE= options
in the PLOT statement.
BY Statement
Produces a separate plot and starts a new page for each BY group.
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
See: “BY” on page 74
Example: “Example 8: Plotting BY Groups” on page 1668
Syntax
BY <DESCENDING> variable-1 <<DESCENDING> variable-2 …> <NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, then you must sort or index the data set by the values of the
variables specified in the BY statement. Variables in a BY statement are called
BY variables.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The data is grouped in another way, such as chronological order.
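A minimal sketch of BY-group plotting, assuming a hypothetical SALES data set with variables Region, Month, and Revenue. Because NOTSORTED is not used, the data is sorted by the BY variable first.
proc sort data=sales out=sales_sorted;
   by region;
run;

proc plot data=sales_sorted uniform;   /* UNIFORM: same axis scaling in every BY group */
   by region;                          /* one plot, on a new page, per Region value    */
   plot revenue*month;
run;
quit;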
PLOT Statement
Requests the plots to be produced by PROC PLOT.
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Tip: You can use multiple PLOT statements.
Examples: “Example 1: Specifying a Plotting Symbol” on page 1650
“Example 2: Controlling the Horizontal Axis and Adding a Reference Line” on page
1653
“Example 3: Overlaying Two Plots” on page 1655
“Example 4: Producing Multiple Plots per Page” on page 1658
“Example 5: Plotting Data on a Logarithmic Scale” on page 1660
“Example 6: Plotting Date Values on an Axis” on page 1662
“Example 7: Producing a Contour Plot” on page 1665
“Example 8: Plotting BY Groups” on page 1668
“Example 9: Adding Labels to a Plot” on page 1672
“Example 11: Adjusting Labels on a Plot with the PLACEMENT= Option” on page
1679
“Example 12: Adjusting Labeling on a Plot with a Macro” on page 1684
“Example 13: Changing a Default Penalty” on page 1687
PLOT Statement 1633
Syntax
PLOT plot-request(s) </ options>;
SPLIT='split-character'
specifies a split character for the label.
STATES
lists all placement states in effect.
Required Argument
plot-request(s)
specifies the variables (vertical and horizontal) to plot and the plotting symbol
to use to mark the points on the plot.
For more information, see “Labels and Plot Points” on page 1622. In addition, see
“Example 9: Adding Labels to a Plot” on page 1672 and all the examples that
follow it.
vertical*horizontal
This form of the plot request uses the default method of choosing a plotting
symbol to mark plot points. When a point on the plot represents the values
of one observation in the data set, PROC PLOT puts the character A at that
point. When a point represents the values of two observations, the character
B appears. When a point represents values of three observations, the
character C appears, and so on, through the alphabet. The character Z is
used for the occurrence of 26 or more observations at the same printing
position.
See “Specifying Variable Lists in Plot Requests” on page 1647 and “Specifying
Combinations of Variables” on page 1647.
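The default and explicit plotting-symbol forms of the plot request can be compared directly. This sketch assumes the DJIA data set from the examples later in this chapter.
proc plot data=djia;
   /* Default symbols: A marks 1 observation, B marks 2, and so on. */
   plot high*year;
   /* Explicit plotting symbol: every point is an asterisk, so      */
   /* coinciding observations are reported as hidden observations. */
   plot high*year='*';
run;
quit;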
Optional Arguments
BOX
draws a border around the entire plot, rather than just on the left side and
bottom.
CONTOUR<=number-of-levels>
draws a contour plot using plotting symbols with varying degrees of shading
where number-of-levels is the number of levels for dividing the range of variable.
The plot request must be of the form vertical*horizontal=variable where variable
is a numeric variable in the data set. The intensity of shading is determined by
the values of this variable.
When you use CONTOUR, PROC PLOT does not plot observations with missing
values for variable.
Default 10
Range 1-10
HAXIS=axis-specification
specifies the tick-mark values for the horizontal axis.
- For numeric values, axis-specification is either an explicit list of values, a BY
increment, or a combination of both:
n <… n>
BY increment
n TO n BY increment
The values must be in either ascending or descending order. Use a negative
value for increment to specify descending order. The specified values are
spaced evenly along the horizontal axis even if the values are not uniformly
distributed. Numeric values can be specified in the following ways:
Table 45.5 Specifying Numeric HAXIS= Values
- For character values, axis-specification is a list of values enclosed in quotation
marks:
'value-1' <…'value-n'>
For example, the following statement assigns three cities to represent the
tick-mark values for the horizontal axis:
haxis='Paris' 'London' 'Tokyo'
- For date and time values, axis-specification is a list of SAS date, time, or
datetime constants, where i is D, T, or DT; a TO range with a BY increment can
also be used:
'date-time-value'i <…'date-time-value'i>
For example:
haxis='01JAN95'd to '01JAN96'd
by qtr
Note: You must use a FORMAT statement to print the tick-mark values in
an understandable form.
Interaction You can use the HAXIS= and VAXIS= options with the VTOH=
option to equate axes. If your data is suitable, then use HAXIS=BY n
and VAXIS=BY n with the same value for n and specify a value for
the VTOH= option. The number of columns that separate the
horizontal tick marks is nearly equal to the number of lines that
separate the vertical tick marks times the value of the VTOH=
option. In some cases, PROC PLOT cannot simultaneously use all
three values and changes one or more of the values.
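A sketch of equating the axes as described in this interaction, assuming a hypothetical data set XY whose numeric variables X and Y have similar ranges, and an output device whose characters are about twice as high as they are wide:
proc plot data=xy vtoh=2;
   /* Same BY increment on both axes; with VTOH=2, PROC PLOT tries to */
   /* make one unit span about the same distance on each axis.        */
   plot y*x / haxis=by 10 vaxis=by 10;
run;
quit;
As noted above, PROC PLOT might adjust one or more of these values if it cannot honor all three at once.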
HEXPAND
expands the horizontal axis to minimize the margins at the sides of the plot and
to maximize the distance between tick marks, if possible.
HEXPAND causes PROC PLOT to ignore information about the spacing of the
data. Plots produced with this option waste less space but can obscure the
nature of the relationship between the variables.
HPOS=axis-length
specifies the number of print positions on the horizontal axis. The maximum
value of axis-length that allows a plot to fit on one page is three positions less
than the value of the LINESIZE= system option. This maximum ensures that
there is enough space for the procedure to print information next to the vertical
axis. The exact maximum depends on the number of characters that are in the
vertical variable's values. If axis-length is too large to fit on a line, then PROC
PLOT ignores the option.
HREF=value-specification
draws lines on the plot perpendicular to the specified values on the horizontal
axis. PROC PLOT includes the values that you specify with the HREF= option on
the horizontal axis unless you specify otherwise with the HAXIS= option.
HREFCHAR='character'
specifies the character to use to draw the horizontal reference line.
HREVERSE
reverses the order of the values on the horizontal axis.
HSPACE=n
specifies that a tick mark will occur on the horizontal axis at every nth print
position, where n is the value of HSPACE=.
HZERO
assigns a value of zero to the first tick mark on the horizontal axis.
Interaction PROC PLOT ignores HZERO if the horizontal variable has negative
values or if the HAXIS= option specifies a range that does not begin
with zero.
LIST<=penalty-value>
lists the horizontal and vertical axis values, the penalty, and the placement
state of all points plotted with a penalty greater than or equal to penalty-value.
If no plotted points have a penalty greater than or equal to penalty-value, then
no list is printed.
OUTWARD='character'
tries to force the point labels outward, away from the origin of the plot, by
protecting positions next to symbols that match character that are in the
direction of the origin (0,0). The algorithm tries to avoid putting the labels in the
protected positions, so they usually move outward.
Tip This option is useful only when you are labeling points with the values of a
variable.
OVERLAY
overlays all plots that are specified in the PLOT statement on one set of axes.
The variable names, or variable labels if they exist, from the first plot are used
to label the axes. Unless you use the HAXIS= option or the VAXIS= option,
PROC PLOT automatically scales the axes in the way that best fits all the
variables.
When the SAS system option OVP is in effect and overprinting is allowed, the
plots are superimposed. Otherwise, when NOOVP is in effect, PROC PLOT uses
the plotting symbol from the first plot to represent points that appear in more
than one plot. In such a case, the output includes a message telling you how
many observations are hidden.
PENALTIES<(index-list)>=penalty-list
changes the default penalties. The index-list provides the positions of the
penalties in the list of penalties. The penalty-list contains the values that you
are specifying for the penalties that are indicated in the index-list. The index-list
and the penalty-list can contain one or more integers. In addition, both index-list
and penalty-list accept the form: value TO value
PLACEMENT=(expression(s))
controls the placement of labels by specifying possible locations of the labels
relative to their coordinates. Each expression consists of a list of one or more
suboptions (H=, L=, S=, or V=) that are joined by an asterisk (*) or a colon (:).
PROC PLOT uses the asterisk and colon to expand each expression into
combinations of values for the four possible suboptions. The asterisk creates
every possible combination of values in the expression list. A colon creates only
pairwise combinations. The colon takes precedence over the asterisk. With the
colon, if one list is shorter than the other, then the values in the shorter list are
reused as necessary.
H=integer(s)
specifies the number of horizontal spaces (columns) to shift the label
relative to the starting position. Both positive and negative integers are
valid. Positive integers shift the label to the right; negative integers shift it to
the left. For example, you can use the H= suboption in the following way:
place=(h=0 1 -1 2 -2)
You can use the keywords BY ALT in this list. BY ALT produces a series of
numbers whose signs alternate between positive and negative and whose
absolute values change by one after each pair. For example, the following
PLACE= specifications are equivalent:
place=(h=0 -1 to -3 by alt)
place=(h=0 -1 1 -2 2 -3 3)
If the series includes zero, then the zero appears twice. For example, the
following PLACE= options are equivalent:
place=(h= 0 to 2 by alt)
place=(h=0 0 1 -1 2 -2)
Default H=0
L=integer(s)
specifies the number of lines onto which the label can be split.
Default L=1
Range 1-200
S=start-position(s)
specifies where to start printing the label. The value for start-position can be
one or more of the following:
CENTER
the procedure centers the label around the plotting symbol.
RIGHT
the label starts at the plotting symbol location and continues to the right.
LEFT
the label starts to the left of the plotting symbol and ends at the plotting
symbol location.
Default CENTER
V=integer(s)
specifies the number of vertical spaces (lines) to shift the label relative to
the starting position. V= behaves the same as the H= suboption, described
earlier.
place=((v=1)
(s=right left : h=2 -2)
(v=-1)
(h=0 1 to 2 by alt * v=1 -1)
(l=1 to 3 * v=1 to 2 by alt *
h=0 1 to 2 by alt))
Placement states generated by three of these expressions:
Expression                                          Placement State          Meaning
(S=RIGHT LEFT : H=2 -2)                             S=RIGHT L=1 H=2 V=0      Begin the label in the second column
                                                                             to the right of the point. Use one
                                                                             line for the label.
(H=0 1 to 2 BY ALT * V=1 -1)                        S=CENTER L=1 H=0 V=1     Center the label, relative to the
                                                                             point, on the line above the point.
(L=1 to 3 * V=1 to 2 BY ALT * H=0 1 to 2 BY ALT)    S=CENTER L=1 H=0 V=1     Center the label, relative to the
                                                                             point, on the line above the point.
Alias PLACE=
Default There are two defaults for the PLACE= option. If you are using a blank
as the plotting symbol, then the default placement state is
PLACE=(S=CENTER : V=0 : H=0 : L=1), which centers the label. If you
are using anything other than a blank, then the default is
PLACE=((S=RIGHT LEFT : H=2 -2) (V=1 -1 * H=0 1 -1 2 -2)). The
default for labels placed with symbols includes multiple positions
around the plotting symbol so the procedure has flexibility when
placing labels on a crowded plot.
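To check how PROC PLOT expands a PLACEMENT= list, you can add the STATES option, which prints the placement states in the order in which you specify them. The sketch below spells out the default placement for labels that have a nonblank plotting symbol; the POINTS data set and its variables Name, X, and Y are hypothetical.
proc plot data=points;
   /* Default placement for a nonblank symbol, written out:        */
   /* the colon pairs S=RIGHT with H=2 and S=LEFT with H=-2;       */
   /* the asterisk crosses V=1 -1 with the five H= values.         */
   plot y*x='*' $ name /
        placement=((s=right left : h=2 -2) (v=1 -1 * h=0 1 -1 2 -2))
        states;
run;
quit;
Under the colon and asterisk rules described above, the first expression yields 2 states and the second yields 2 x 5 = 10 states.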
Scontour-level='character-list'
specifies the plotting symbol to use for a single contour level. When PROC
PLOT produces contour plots, it automatically chooses the symbols to use for
each level of intensity. You can use the S= option to override these symbols and
specify your own. You can include up to three characters in character-list. If
overprinting is not allowed, then PROC PLOT uses only the first character.
For example, to specify three levels of shading for the Z variable, use the
following statement:
plot y*x=z /
contour=3 s1='A' s2='+' s3='X0A';
This feature was designed especially for printers where the hexadecimal
constants can represent gray scale fill characters.
If you omit a plotting symbol for each contour level, then PROC PLOT uses the
default symbols:
slist='.' ',' '-' '=' '+' 'O' 'X' 'W' '*' '#'
Restriction If you use the SLIST= option, then it must be listed last in the PLOT
statement.
SPLIT='split-character'
when labeling plot points, specifies where to split the label when the label
spans two or more lines. The label is split onto the number of lines that is
specified in the L= suboption to the PLACEMENT= option. If you specify a split
character, then the procedure always splits the label on each occurrence of that
character, even if it cannot find a suitable placement. If you specify L=2 or more
but do not specify a split character, then the procedure tries to split the label on
blanks or punctuation but will split words if necessary.
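A sketch of a split character at work. The CITIES data set and its X and Y values are hypothetical; each label uses a slash where a line break is wanted, and SPLIT='/' with L=2 asks PROC PLOT for two-line labels.
data cities;
   input Name :$20. X Y;
   datalines;
New/York 3 8
Los/Angeles 1 2
Salt/Lake/City 2 5
;
run;

proc plot data=cities;
   /* Each '/' forces a line break, so Salt/Lake/City is expected   */
   /* to need three lines even though L=2 requests two (penalty 3). */
   plot y*x $ name / split='/' placement=(l=2 * v=1 -1);
run;
quit;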
STATES
lists all the placement states in effect. STATES prints the placement states in
the order in which you specify them in the PLACE= option.
VAXIS=axis-specification
specifies tick mark values for the vertical axis. VAXIS= follows the same rules as
the HAXIS= option.
VEXPAND
expands the vertical axis to minimize the margins above and below the plot and
to maximize the space between vertical tick marks, if possible.
VPOS=axis-length
specifies the number of print positions on the vertical axis. The maximum value
for axis-length that allows a plot to fit on one page is eight lines less than the
value of the SAS system option PAGESIZE= because you must allow room for
the procedure to print information under the horizontal axis. The exact
maximum depends on the titles that are used, whether plots are overlaid, and
whether CONTOUR is specified. If the value of axis-length specifies a plot that
cannot fit on one page, then the plot spans multiple pages.
VREF=value-specification
draws lines on the plot perpendicular to the specified values on the vertical
axis. PROC PLOT includes the values that you specify with the VREF= option on
the vertical axis unless you specify otherwise with the VAXIS= option. For the
syntax for value-specification, see “HAXIS=axis-specification” on page 1636.
VREFCHAR='character'
specifies the character to use to draw the vertical reference lines.
VREVERSE
reverses the order of the values on the vertical axis.
VSPACE=n
specifies that a tick mark will occur on the vertical axis at every nth print
position, where n is the value of VSPACE=.
VZERO
assigns a value of zero to the first tick mark on the vertical axis.
Interaction PROC PLOT ignores the VZERO option if the vertical variable has
negative values or if the VAXIS= option specifies a range that does
not begin with zero.
Because PROC PLOT prints one character for each observation, using SAS program
statements to generate the data set for PROC PLOT can enhance the effectiveness
of continuous plots. For example, suppose that you want to generate data in order
to plot the following equation, for x ranging from 0 to 100:
y = 2.54 + 3.83x
If the plot is printed with a LINESIZE= value of 80, then about 75 positions are
available on the horizontal axis for the X values. Thus, 2 is a good increment: 51
observations are generated, which is fewer than the 75 available positions on the
horizontal axis.
However, if the plot is printed with a LINESIZE= value of 132, then an increment of 2
produces a plot in which the plotting symbols have space between them. For a
smoother line, a better increment is 1, because 101 observations are generated.
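A minimal DATA step for the equation above, using the increment of 2 that suits a LINESIZE= value of 80; the data set name GENERATE is hypothetical.
data generate;
   /* X = 0, 2, ..., 100 gives 51 observations; use BY 1 (101 obs) */
   /* for a smoother line when LINESIZE=132.                        */
   do x = 0 to 100 by 2;
      y = 2.54 + 3.83*x;
      output;
   end;
run;

proc plot data=generate;
   plot y*x;
run;
quit;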
If both the vertical and horizontal specifications request more than one variable
and if a variable appears in both lists, then it will not be plotted against itself. For
example, the following statement does not plot B*B and C*C:
plot (a b c)*(b c d);
A colon combines the variables pairwise. Thus, the first variables of each list
combine to request a plot, as do the second, third, and so on. For example, the
following plot requests are equivalent:
plot (y1-y2) : (x1-x2);
plot y1*x1 y2*x2;
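As a sketch of how variable lists expand, assume a hypothetical data set MYDATA with numeric variables A, B, C, D, X, Y1, Y2, and Y3.
proc plot data=mydata;
   /* Seven plots: A*B A*C A*D B*C B*D C*B C*D (no B*B or C*C) */
   plot (a b c)*(b c d);
   /* A numbered range list: Y1*X Y2*X Y3*X                    */
   plot (y1-y3)*x;
run;
quit;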
For example, the following PENALTIES= option extends the penalty list:
penalties(15 to 20)=2 2 11 10 8 2
The 20th penalty of 2 is the penalty for not printing the sixth through 200th
character. When the last index i is greater than 18, the last penalty is used for the
(i - 14)th character and beyond.
You can also extend the penalty list by just specifying the starting index. For
example, the following PENALTIES= option is equivalent to the one above:
penalties(15)=2 2 11 10 8 2
Printed Output
Each plot uses one full page unless the plot's size is changed by the VPOS= and
HPOS= options in the PLOT statement, the VPERCENT= or HPERCENT= options in
the PROC PLOT statement, or the PAGESIZE= and LINESIZE= system options.
Titles, legends, and variable labels are printed at the top of each page. Each axis is
labeled with the variable's name or, if it exists, the variable's label.
Normally, PROC PLOT begins a new plot on a new page. However, the VPERCENT=
and HPERCENT= options enable you to print more than one plot on a page.
VPERCENT= and HPERCENT= are described earlier in “PROC PLOT Statement” on
page 1627.
PROC PLOT always begins a new page after a RUN statement and at the beginning
of a BY group.
Missing Values
If values of either of the plotting variables are missing, then PROC PLOT does not
include the observation in the plot. However, in a plot of Y*X, values of X with
corresponding missing values of Y are included in scaling the X axis, unless the
NOMISS option is specified in the PROC PLOT statement.
Hidden Observations
By default, PROC PLOT uses different plotting symbols (A, B, C, and so on) to
represent observations whose values coincide on a plot. However, if you specify
your own plotting symbol or if you use the OVERLAY option, then you might not be
able to recognize coinciding values.
If you specify a plotting symbol, then PROC PLOT uses the same symbol regardless
of the number of observations whose values coincide. If you use the OVERLAY
option and overprinting is not in effect, then PROC PLOT uses the symbol from the
first plot request. In both cases, the output includes a message telling you how
many observations are hidden.
Example 1: Specifying a Plotting Symbol
Details
This example expands on Output 45.237 on page 1619 by specifying a different
plotting symbol.
Program
options formchar="|----|+|---+=|-/\<>*";
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
proc plot data=djia;
plot high*year='*'
/ vspace=5 vaxis=by 1000;
title 'High Values of the Dow Jones Industrial Average';
title2 'from 1968 to 2008';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the DJIA data set. DJIA contains the high and low closing marks for the Dow
Jones Industrial Average from 1968 to 2008. The DATA step creates this data set.
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
Create the plot. The plot request plots the values of High on the vertical axis and
the values of Year on the horizontal axis. It also specifies an asterisk as the plotting
symbol. The VAXIS= and VSPACE= options customize the vertical axis. VAXIS=BY 1000
specifies tick-mark values for the vertical axis in increments of 1,000. VSPACE=5
places a tick mark at every fifth print position on the vertical axis.
proc plot data=djia;
plot high*year='*'
/ vspace=5 vaxis=by 1000;
Output
PROC PLOT determines the tick marks and the scale of both axes.
Example 2: Controlling the Horizontal Axis and Adding a Reference Line
Details
This example specifies values for the horizontal axis and draws a reference line
from the vertical axis.
Program
options formchar="|----|+|---+=|-/\<>*";
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
proc plot data=djia;
plot high*year='*'
/ haxis=1968 to 2008 by 10
vref=3000;
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the DJIA data set. DJIA contains the high and low closing marks for the Dow
Jones Industrial Average from 1968 to 2008. The DATA step creates this data set.
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
Create the plot. The plot request plots the values of High on the vertical axis and
the values of Year on the horizontal axis. It also specifies an asterisk as the plotting
symbol.
proc plot data=djia;
plot high*year='*'
Customize the horizontal axis and draw a reference line. HAXIS= specifies that the
horizontal axis will show the values 1968 to 2008 in ten-year increments. VREF=
draws a reference line that extends from the value 3000 on the vertical axis.
/ haxis=1968 to 2008 by 10
vref=3000;
Output
Output 45.5 Plot with Reference Line
Details
This example overlays two plots and puts a box around the plot.
Program
options formchar="|";
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
proc plot data=djia formchar="|----|+|---+=|-/\<>*";
plot high*year='*'
low*year='o' / overlay box
haxis=by 10
vaxis=by 5000;
title 'Plot of Highs and Lows';
title2 'for the Dow Jones Industrial Average';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|";
Create the DJIA data set. DJIA contains the high and low closing marks for the Dow
Jones Industrial Average from 1968 to 2008. The DATA step creates this data set.
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
Create the plot. The FORMCHAR= option in the PROC PLOT statement sets the
formatting characters for this procedure step. Setting FORMCHAR= to this exact
string renders better HTML output when it is viewed outside of the SAS environment
where SAS Monospace fonts are not available. The first plot request plots High on
the vertical axis, plots Year on the horizontal axis, and specifies an asterisk as a
plotting symbol. The second plot request plots Low on the vertical axis, plots Year
on the horizontal axis, and specifies an 'o' as a plotting symbol. OVERLAY
superimposes the second plot onto the first. BOX draws a box around the plot.
OVERLAY and BOX apply to both plot requests. HAXIS= specifies that the horizontal
axis shows the values 1968 to 2008 in ten-year increments. VAXIS= specifies that
the vertical axis shows the values in increments of 5,000.
proc plot data=djia formchar="|----|+|---+=|-/\<>*";
plot high*year='*'
low*year='o' / overlay box
haxis=by 10
vaxis=by 5000;
Output
Output 45.6 Two Plots Overlaid Using Different Plotting Symbols
Details
This example places three plots on one page of output.
Program
options formchar="|----|+|---+=|-/\<>*" pagesize=40 linesize=120;
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
proc plot data=djia vpercent=50 hpercent=50;
plot high*year='*';
plot low*year='o';
plot high*year='*' low*year='o' / overlay box;
title 'Plots of the Dow Jones Industrial Average';
title2 'from 1968 to 2008';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available. The PAGESIZE= option sets the page size to 40
lines of output, and the LINESIZE= option sets the line size in the output window
to 120 characters.
options formchar="|----|+|---+=|-/\<>*" pagesize=40 linesize=120;
Create the DJIA data set. DJIA contains the high and low closing marks for the Dow
Jones Industrial Average from 1968 to 2008. The DATA step creates this data set.
data djia;
input Year HighDate date7. High LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
...more data lines...
2006 27DEC06 12510.57 20JAN06 10667.39
2007 09OCT07 14164.53 05MAR07 12050.41
2008 02MAY08 13058.20 10OCT08 8451.19
;
Specify the plot sizes. VPERCENT= specifies that 50% of the vertical space on the
page of output is used for each plot. HPERCENT= specifies that 50% of the
horizontal space is used for each plot.
proc plot data=djia vpercent=50 hpercent=50;
Create the first plot. This plot request plots the values of High on the vertical axis
and the values of Year on the horizontal axis. It also specifies an asterisk as the
plotting symbol.
plot high*year='*';
Create the second plot. This plot request plots the values of Low on the vertical
axis and the values of Year on the horizontal axis. It also specifies the letter o as
the plotting symbol.
plot low*year='o';
Create the third plot. This PLOT statement contains two plot requests. The first
plot request plots High on the vertical axis, plots Year on the horizontal axis, and
specifies an asterisk as a plotting symbol. The second plot request plots Low on the
vertical axis, plots Year on the horizontal axis, and specifies an 'o' as a plotting
symbol. OVERLAY superimposes the second plot onto the first. BOX draws a box
around the plot. OVERLAY and BOX apply to both plot requests.
plot high*year='*' low*year='o' / overlay box;
Output
Output 45.7 Three Plots on One Page
Details
This example uses a DATA step to generate the data set EQUA. The DATA step
creates the data by using an iterative DO statement. The PROC PLOT step shows
two plots of the same data: one plot without a horizontal axis specification and one
plot with a logarithmic scale specified for the horizontal axis.
Program
data equa;
do Y=1 to 3 by .1;
X=10**y;
output;
end;
run;
proc plot data=equa hpercent=50;
plot y*x / vspace=1;
plot y*x / haxis=10 100 1000 vspace=1;
title 'Two Plots with Different';
title2 'Horizontal Axis Specifications';
run;
Program Description
Create the EQUA data set. The DATA step creates values of X and Y by incrementing
the variable Y from 1 to 3 in increments of .1. Each value of X is calculated as
10**Y (10 raised to the power Y).
data equa;
do Y=1 to 3 by .1;
X=10**y;
output;
end;
run;
Specify the plot sizes. HPERCENT= makes room for two plots side-by-side by
specifying that 50% of the horizontal space is used for each plot.
proc plot data=equa hpercent=50;
Create the plots. Each PLOT statement plots Y on the vertical axis and X on the
horizontal axis. In the second plot, HAXIS=10 100 1000 places these three values at
equally spaced tick marks, which produces a logarithmic scale for the horizontal
axis. The VSPACE=1 option specifies one line of print space between tick marks on
the vertical axis.
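The DO loop in EQUA steps Y by the non-integer increment .1, so each value of Y comes from repeated floating-point addition and can drift slightly from the exact tenths. A sketch of one common alternative, assuming you want exactly the 21 values 1.0, 1.1, ..., 3.0, drives the loop with an integer counter and derives Y from it. The data set name EQUA2, the counter I, and the title are hypothetical.

```sas
/* Hypothetical variant of EQUA: an integer loop counter yields exactly 21 rows. */
data equa2;
   drop i;
   do i = 10 to 30;       /* 21 iterations: i = 10, 11, ..., 30 */
      Y = i / 10;         /* Y = 1.0, 1.1, ..., 3.0             */
      X = 10**Y;          /* X = 10 raised to the power Y       */
      output;
   end;
run;

proc plot data=equa2 hpercent=50;
   plot y*x / vspace=1;                      /* default (linear) horizontal axis */
   plot y*x / haxis=10 100 1000 vspace=1;    /* tick values 10, 100, 1000        */
   title 'EQUA2: Two Horizontal Axis Specifications';
run;
quit;
```

Because X = 10**Y, equally spaced tick marks at 10, 100, and 1000 correspond to Y = 1, 2, and 3, which is why the second plot should show the points close to a straight line.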
plot y*x / vspace=1;
plot y*x / haxis=10 100 1000 vspace=1;
Output
Output 45.8 Two Plots with Different Horizontal Axis Specifications
DATA step
Data set: EMERGENCY_CALLS
Details
This example uses a DATA step to create the data set EMERGENCY_CALLS and
shows how you can specify date values on an axis.
Program
options formchar="|----|+|---+=|-/\<>*";
data emergency_calls;
input Date : date7. Calls @@;
label calls='Number of Calls';
datalines;
1APR94 134 11APR94 384 13FEB94 488
2MAR94 289 21MAR94 201 14MAR94 460
3JUN94 184 13JUN94 152 30APR94 356
4JAN94 179 14JAN94 128 16JUN94 480
5APR94 360 15APR94 350 24JUL94 388
6MAY94 245 15DEC94 150 17NOV94 328
7JUL94 280 16MAY94 240 25AUG94 280
8AUG94 494 17JUL94 499 26SEP94 394
9SEP94 309 18AUG94 248 23NOV94 590
19SEP94 356 24FEB94 201 29JUL94 330
10OCT94 222 25MAR94 183 30AUG94 321
11NOV94 294 26APR94 412 2DEC94 511
27MAY94 294 22DEC94 413 28JUN94 309
;
proc plot data=emergency_calls;
plot calls*date / haxis='1JAN94'd to '1JAN95'd by month vaxis=by
100 vspace=5;
format date mmyyd5.;
title 'Calls to City Emergency Services Number';
title2 'Sample of Days for 1994';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the plot. The plot request plots Calls on the vertical axis and Date on the
horizontal axis. HAXIS= places tick marks at monthly intervals on the horizontal
axis. The notation '1JAN94'd is a date constant. The value '1JAN95'd ensures that
the axis has enough room for observations from December.
proc plot data=emergency_calls;
plot calls*date / haxis='1JAN94'd to '1JAN95'd by month vaxis=by
100 vspace=5;
Format the DATE values. The FORMAT statement assigns the MMYYD5. format to
Date, which uses 2-digit month and 2-digit year values separated by a hyphen.
format date mmyyd5.;
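A date constant such as '1JAN94'd is stored as a SAS date value, which is the number of days since 1 January 1960, and the MMYYD5. format only changes how that number is displayed. The following sketch makes this visible. It also counts the month boundaries between the two HAXIS= endpoints with the INTCK function; the count of 12 suggests 13 monthly tick marks, January 1994 through January 1995. The data set and variable names are hypothetical.

```sas
/* Sketch: inspect the date values behind the HAXIS= endpoints. */
data axis_info;
   first_day   = '1JAN94'd;                           /* days since 1JAN1960      */
   last_day    = '1JAN95'd;
   n_months    = intck('month', first_day, last_day); /* month boundaries crossed */
   label_first = put(first_day, mmyyd5.);             /* displayed as mm-yy       */
   put first_day= last_day= n_months= label_first=;
run;
```

In the log, FIRST_DAY should be 12419 and LABEL_FIRST should be 01-94, the same form that the tick labels on the horizontal axis use.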
Output
Output 45.9 Plot with Date Values along the Horizontal Axis
Details
This example uses a DATA step to create the data set CONTOURS. It shows how to
represent the values of three variables with a two-dimensional plot by setting one
of the variables as the CONTOUR variable. The variables X and Y appear on the
axes, and Z is the contour variable. Program statements are used to generate the
observations for the plot, and the following equation describes the contour surface:
$$z = 46.2 + 0.09x - 0.0005x^{2} + 0.1y - 0.0005y^{2} + 0.0004xy$$
Program
options formchar="|----|+|---+=|-/\<>*";
data contours;
format Z 5.1;
do X=0 to 400 by 5;
do Y=0 to 350 by 10;
z=46.2+.09*x-.0005*x**2+.1*y-.0005*y**2+.0004*x*y;
output;
end;
end;
run;
proc print data=contours(obs=5) noobs;
title 'CONTOURS Data Set';
title2 'First 5 Observations Only';
run;
proc plot data=contours;
plot y*x=z / contour=10;
title 'A Contour Plot';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the CONTOURS data set. The CONTOURS data set contains observations
with values of X that range from 0 to 400 by 5 and with values of Y that range from
0 to 350 by 10.
data contours;
format Z 5.1;
do X=0 to 400 by 5;
do Y=0 to 350 by 10;
z=46.2+.09*x-.0005*x**2+.1*y-.0005*y**2+.0004*x*y;
output;
end;
end;
run;
Print the CONTOURS data set. The OBS= data set option limits the printing to only
the first 5 observations. NOOBS suppresses printing of the observation numbers.
proc print data=contours(obs=5) noobs;
title 'CONTOURS Data Set';
title2 'First 5 Observations Only';
run;
Create the plot. The PLOT statement plots Y on the vertical axis, plots X on the
horizontal axis, and specifies Z as the contour variable. CONTOUR=10 divides the
values of Z into ten levels, and each level is represented by a different plotting
symbol.
proc plot data=contours;
plot y*x=z / contour=10;
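To get a feel for what CONTOUR=10 means numerically, the following sketch computes the range of Z over the grid and the width of ten equal intervals. It assumes that the range is split into equal-width levels; the exact level boundaries that PROC PLOT uses might differ. The grid has 81 values of X and 36 values of Y, so CONTOURS has 2,916 observations. Because the surface is concave, its minimum on this rectangle occurs at a corner: Z = 2.2 at X = 400, Y = 0. The table name Z_LEVELS is hypothetical.

```sas
/* Rebuild CONTOURS exactly as in this example. */
data contours;
   format Z 5.1;
   do X=0 to 400 by 5;
      do Y=0 to 350 by 10;
         z=46.2+.09*x-.0005*x**2+.1*y-.0005*y**2+.0004*x*y;
         output;
      end;
   end;
run;

/* Sketch: range of Z and the width of ten equal-width levels (assumption). */
proc sql;
   create table z_levels as
   select min(z) as zmin,
          max(z) as zmax,
          (calculated zmax - calculated zmin) / 10 as level_width
   from contours;
quit;

proc print data=z_levels noobs;
run;
```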
Output
The shadings associated with the values of Z appear at the bottom of the plot. The
plotting symbol # shows where high values of Z occur.
Details
This example uses the data set “EDUCATION” on page 2785 to show BY-group
processing in PROC PLOT.
Program
options formchar="|----|+|---+=|-/\<>*";
data education;
input State $14. +1 Code $ DropoutRate Expenditures MathScore
Region $;
label dropoutrate='Dropout Percentage - 1989'
expenditures='Expenditure Per Pupil - 1989'
mathscore='8th Grade Math Exam - 1990';
datalines;
Alabama        AL 22.3 3197 252 SE
Alaska         AK 35.8 7716 . W
...more data lines...
New York       NY 35.0 . 261 NE
North Carolina NC 31.2 3874 250 SE
North Dakota   ND 12.1 3952 281 MW
Ohio           OH 24.4 4649 264 MW
;
proc sort data=education;
by region;
run;
proc plot data=education;
by region;
plot expenditures*dropoutrate='*' / href=28.6
vaxis=by 500 vspace=5
haxis=by 5 hspace=12;
title 'Plot of Dropout Rate and Expenditure Per Pupil';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the EDUCATION data set. “EDUCATION” on page 2785 contains educational
data (Source: U.S. Department of Education) about some U.S. states. DropoutRate
is the percentage of high school dropouts. Expenditures is the dollar amount the
state spends on each pupil. MathScore is the score of eighth-grade students on a
standardized math test. Not all states participated in the math test.
data education;
input State $14. +1 Code $ DropoutRate Expenditures MathScore
Region $;
label dropoutrate='Dropout Percentage - 1989'
expenditures='Expenditure Per Pupil - 1989'
mathscore='8th Grade Math Exam - 1990';
datalines;
Alabama        AL 22.3 3197 252 SE
Alaska         AK 35.8 7716 . W
...more data lines...
New York       NY 35.0 . 261 NE
North Carolina NC 31.2 3874 250 SE
North Dakota   ND 12.1 3952 281 MW
Ohio           OH 24.4 4649 264 MW
;
Sort the EDUCATION data set. PROC SORT sorts EDUCATION by Region so that
Region can be used as the BY variable in PROC PLOT.
proc sort data=education;
by region;
run;
Create a separate plot for each BY group. The BY statement creates a separate
plot for each value of Region.
proc plot data=education;
by region;
Create the plot with a reference line. The PLOT statement plots Expenditures on
the vertical axis, plots DropoutRate on the horizontal axis, and specifies an asterisk
as the plotting symbol. HREF= draws a reference line that extends from 28.6 on the
horizontal axis. The reference line represents the national average. VAXIS and
HAXIS are used to set the tick marks along the vertical and horizontal axes. The
VSPACE= option specifies the amount of print space between the vertical tick
marks.
plot expenditures*dropoutrate='*' / href=28.6
vaxis=by 500 vspace=5
haxis=by 5 hspace=12;
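When you need the plot for only one region, a WHERE statement is an alternative to BY-group processing that does not require a PROC SORT step. This sketch assumes a hypothetical EDUCATION_SUBSET data set that contains only the six data lines shown above, and it selects the Midwest region (MW). Note that State is read with the $14. informat, so the state names must be padded to 14 columns.

```sas
/* Hypothetical subset: only the six EDUCATION data lines shown above. */
data education_subset;
   input State $14. +1 Code $ DropoutRate Expenditures MathScore Region $;
   datalines;
Alabama        AL 22.3 3197 252 SE
Alaska         AK 35.8 7716 . W
New York       NY 35.0 . 261 NE
North Carolina NC 31.2 3874 250 SE
North Dakota   ND 12.1 3952 281 MW
Ohio           OH 24.4 4649 264 MW
;

/* WHERE selects one region, so the data need not be sorted by Region. */
proc plot data=education_subset;
   where region = 'MW';
   plot expenditures*dropoutrate='*' / href=28.6
        vaxis=by 500 vspace=5
        haxis=by 5 hspace=12;
   title 'Dropout Rate and Expenditure Per Pupil, Midwest (subset)';
run;
quit;
```

BY-group processing remains the better choice when you want one plot per region in a single step; WHERE is simpler when you need only one.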
Output
PROC PLOT produces a plot for each BY group. Only the plots for Midwest and
Northeast are shown.
PROC SORT
Data set: EDUCATION
Details
This example shows how to use variables in a data set to label the points on a plot.
The example adds labels to the output from the example “Example 8: Plotting BY
Groups” on page 1668 . PROC SORT is used first to sort the data set by Region so
that Region can be used as the BY variable in the first PLOT statement.
Program
options formchar="|----|+|---+=|-/\<>*";
proc sort data=education;
by region;
run;
proc plot data=education;
by region;
plot expenditures*dropoutrate='*' $ state / href=28.6
vaxis=by 500 vspace=5
haxis=by 5 hspace=12;
title 'Plot of Dropout Rate and Expenditure Per Pupil';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Sort the EDUCATION data set. PROC SORT sorts EDUCATION by Region so that
Region can be used as the BY variable in PROC PLOT.
proc sort data=education;
by region;
run;
Create a separate plot for each BY group. The BY statement creates a separate
plot for each value of Region.
proc plot data=education;
by region;
Create the plot with a reference line and a label for each data point. The plot
request plots Expenditures on the vertical axis, plots DropoutRate on the
horizontal axis, and specifies an asterisk as the plotting symbol. The label variable
specification ($ state) in the PLOT statement labels each point on the plot with
the name of the corresponding state. HREF= draws a reference line that extends
from 28.6 on the horizontal axis. The reference line represents the national average.
VAXIS and HAXIS are used to set the tick marks along the vertical and horizontal
axes. The HSPACE=12 option specifies that there are 12 print spaces between the
horizontal tick marks.
plot expenditures*dropoutrate='*' $ state / href=28.6
vaxis=by 500 vspace=5
haxis=by 5 hspace=12;
Output
PROC PLOT produces a plot for each BY group. Only the plots for Midwest and
Northeast are shown.
BY statement
PLOT statement
PLOT statement options
HAXIS=
HREF=
HSPACE=
VAXIS=
VSPACE=
PROC SORT
WHERE statement
Data set: EDUCATION
Details
This example shows how missing values affect the calculation of the axes. The
example uses the “EDUCATION” on page 2785 data set.
Program
options formchar="|----|+|---+=|-/\<>*";
proc sort data=education;
by region;
run;
proc plot data=education nomiss;
by region;
plot expenditures*dropoutrate='*' $ state / href=28.6
vaxis=by 500 vspace=5
haxis=by 5 hspace=12;
title 'Plot of Dropout Rate and Expenditure Per Pupil';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Sort the EDUCATION data set. PROC SORT sorts EDUCATION by Region so that
Region can be used as the BY variable in PROC PLOT.
proc sort data=education;
by region;
run;
Exclude data points with missing values. NOMISS excludes observations that have
a missing value for either of the axis variables.
proc plot data=education nomiss;
Create a separate plot for each BY group. The BY statement creates a separate
plot for each value of Region.
by region;
Create the plot with a reference line and a label for each data point. The plot
request plots Expenditures on the vertical axis, plots DropoutRate on the
horizontal axis, and specifies an asterisk as the plotting symbol. The label variable
specification ($ state) in the PLOT statement labels each point on the plot with
the name of the corresponding state. HREF= draws a reference line extending from
28.6 on the horizontal axis. The reference line represents the national average.
VAXIS and HAXIS are used to set the tick marks along the vertical and horizontal
axes. The VSPACE=5 option specifies that there are 5 spaces between tick marks
on the vertical axis and HSPACE=12 specifies that there are 12 spaces between the
horizontal tick marks.
plot expenditures*dropoutrate='*' $ state / href=28.6
vaxis=by 500 vspace=5
haxis=by 5 hspace=12;
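To see which observations NOMISS removes, you can select the rows that have a missing value for either axis variable. This sketch uses a hypothetical EDUCATION_SUBSET data set built from only the six data lines shown above. Alaska is kept because its missing value is in MathScore, which is not an axis variable. Only New York is selected, which matches the Output note.

```sas
/* Hypothetical subset: only the six EDUCATION data lines shown above. */
data education_subset;
   input State $14. +1 Code $ DropoutRate Expenditures MathScore Region $;
   datalines;
Alabama        AL 22.3 3197 252 SE
Alaska         AK 35.8 7716 . W
New York       NY 35.0 . 261 NE
North Carolina NC 31.2 3874 250 SE
North Dakota   ND 12.1 3952 281 MW
Ohio           OH 24.4 4649 264 MW
;

/* Keep only the rows that NOMISS would exclude: either axis variable missing. */
data excluded;
   set education_subset;
   if missing(expenditures) or missing(dropoutrate);
run;

proc print data=excluded noobs;
   var state region dropoutrate expenditures;
run;
```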
Output
PROC PLOT produces a plot for each BY group. Only the plot for the Northeast is
shown. Because New York has a missing value for Expenditures, the observation is
excluded and PROC PLOT does not use the value 35 for DropoutRate to calculate
the horizontal axis. Compare the horizontal axis in this output with the horizontal
axis in the plot for Northeast in “Example 9: Adding Labels to a Plot” on page 1672.
Example 11: Adjusting Labels on a Plot with the PLACEMENT= Option
Details
This example illustrates the default placement of labels and how to adjust the
placement of labels on a crowded plot. The labels are values of variables in the
data set “CENSUS” on page 2748. (Source: U.S. Bureau of the Census and the 1987
Uniform Crime Reports, FBI.)
Program
options formchar="|----|+|---+=|-/\<>*";
data census;
input Density CrimeRate State $ 14-27 PostalCode $ 29-30;
datalines;
263.3 4575.3 Ohio           OH
62.1 7017.1  Washington     WA
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the CENSUS data set. CENSUS contains the variables CrimeRate and
Density for selected states. CrimeRate is the number of crimes per 100,000 people.
Density is the population density per square mile in the 1980 census. The DATA
step in “CENSUS” on page 2748 creates this data set.
data census;
input Density CrimeRate State $ 14-27 PostalCode $ 29-30;
datalines;
263.3 4575.3 Ohio           OH
62.1 7017.1  Washington     WA
Create the plot with a label for each data point. The plot request plots Density on
the vertical axis, CrimeRate on the horizontal axis, and uses the first letter of the
value of State as the plotting symbol. This makes it easier to match the symbol
with its label. The label variable specification ($ state) in the PLOT statement
labels each point with the corresponding state name.
proc plot data=census;
plot density*crimerate=state $ state /
Specify plot options. BOX draws a box around the plot. LIST= lists the labels that
have penalties greater than or equal to 1. HAXIS= and VAXIS= specify increments
only. PROC PLOT uses the data to determine the range for the axes. The
VSPACE=10 option specifies that there are 10 spaces between tick marks on the
vertical axis and HSPACE=10 specifies that there are 10 spaces between the
horizontal tick marks.
box
list=1
haxis=by 1000
vaxis=by 250
vspace=10
hspace=10;
Request a second plot. Because PROC PLOT is interactive, the procedure is still
running at this point in the program. It is not necessary to restart the procedure to
submit another plot request. LIST=1 produces no output because there are no
penalties of 1 or greater.
plot density*crimerate=state $ state /
box
list=1
haxis=by 1000
vaxis=by 250
vspace=10
Output
The labels Tennessee, South Carolina, Arkansas, Minnesota, and South Dakota
have penalties. The default placement states do not provide enough possibilities
for PROC PLOT to avoid penalties given the proximity of the points. Four label
characters are hidden.
Details
This example illustrates the default placement of labels and uses a macro to adjust
the placement of labels. The labels are values of a variable in the data set
“CENSUS” on page 2748.
Program
options formchar="|----|+|---+=|-/\<>*";
%macro place(n);
%if &n > 13 %then %let n = 13;
placement=(
%if &n <= 0 %then (s=center); %else (h=2 -2 : s=right left);
%if &n = 1 %then (v=1 * h=0 -1 to -2 by alt);
%else %if &n = 2 %then (v=1 -1 * h=0 -1 to -5 by alt);
%else %if &n > 2 %then (v=1 to 2 by alt * h=0 -1 to -10 by alt);
%if &n > 3 %then
(s=center right left * v=0 1 to %eval(&n - 2) by alt *
h=0 -1 to %eval(-3 * (&n - 2)) by alt *
l=1 to %eval(2 + (10 * &n - 35) / 30)); )
%if &n > 4 %then penalty(7)=%eval((3 * &n) / 2);
%mend;
proc plot data=census;
plot density*crimerate=state $ state /
box
list=1
haxis=by 1000
vaxis=by 250
vspace=12
%place(4);
title 'A Plot of Population Density and Crime Rates';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
Create the plot. The plot request plots Density on the vertical axis, CrimeRate on
the horizontal axis, and uses the first letter of the value of State as the plotting
symbol. The label variable specification ($ state) in the PLOT statement labels
each point with the corresponding state name.
proc plot data=census;
plot density*crimerate=state $ state /
Specify plot options. BOX draws a box around the plot. LIST= lists the labels that
have penalties greater than or equal to 1. HAXIS= and VAXIS= specify increments
only. PROC PLOT uses the data to determine the range for the axes. The
VSPACE=12 option specifies that there are 12 spaces between tick marks on the
vertical axis. The PLACE macro determines the placement of the labels.
box
list=1
haxis=by 1000
vaxis=by 250
vspace=12
%place(4);
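It can help to see what %PLACE(4) actually generates. Working through the macro for n = 4: the first group is (h=2 -2 : s=right left); because n > 2, the second group is (v=1 to 2 by alt * h=0 -1 to -10 by alt); because n > 3, the third group uses v=0 1 to 2 by alt, h=0 -1 to -6 by alt, and l=1 to 2. The upper limit for L is 2 because %EVAL performs integer arithmetic, so (10*4 - 35)/30 = 5/30 truncates to 0. No PENALTY(7)= option is generated because n is not greater than 4. The following sketch captures the generated text in a macro variable and writes it to the log; the macro variable name EXPANSION is hypothetical.

```sas
/* The PLACE macro from this example (indentation added for readability). */
%macro place(n);
   %if &n > 13 %then %let n = 13;
   placement=(
   %if &n <= 0 %then (s=center); %else (h=2 -2 : s=right left);
   %if &n = 1 %then (v=1 * h=0 -1 to -2 by alt);
   %else %if &n = 2 %then (v=1 -1 * h=0 -1 to -5 by alt);
   %else %if &n > 2 %then (v=1 to 2 by alt * h=0 -1 to -10 by alt);
   %if &n > 3 %then
      (s=center right left * v=0 1 to %eval(&n - 2) by alt *
       h=0 -1 to %eval(-3 * (&n - 2)) by alt *
       l=1 to %eval(2 + (10 * &n - 35) / 30)); )
   %if &n > 4 %then penalty(7)=%eval((3 * &n) / 2);
%mend;

/* Capture the generated PLACEMENT= text and write it to the log. */
%let expansion = %place(4);
%put NOTE: &=expansion;
```

Inspecting the log this way is a quick check before you pass a larger argument, such as %PLACE(6), which would also add a PENALTY(7)= option.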
Output
Output 45.19 Plot with Labels Placed Using a Macro
Details
This example demonstrates how changing a default penalty affects the placement
of labels. The goal is to produce a plot that has labels that do not detract from how
the points are scattered.
Program
options formchar="|----|+|---+=|-/\<>*";
proc plot data=census;
plot density*crimerate=state $ state /
placement=(h=100 to 10 by alt * s=left right)
penalties(4)=500 list=0
haxis=0 to 13000 by 1000
vaxis=by 100
vspace=5;
title 'A Plot of Population Density and Crime Rates';
run;
Program Description
Set the FORMCHAR option. Setting FORMCHAR to this exact string renders better
HTML output when it is viewed outside of the SAS environment where SAS
Monospace fonts are not available.
options formchar="|----|+|---+=|-/\<>*";
proc plot data=census;
plot density*crimerate=state $ state /
Change the default penalty. PENALTIES(4)= changes the default penalty for a free
horizontal shift to 500, which removes all penalties for a horizontal shift. LIST=
shows how far PROC PLOT shifted the labels away from their respective points.
penalties(4)=500 list=0
Customize the axes. HAXIS= creates a horizontal axis long enough to leave space
for the labels on the sides of the plot. VAXIS= specifies that the values on the
vertical axis be in increments of 100. The VSPACE=5 option specifies that there are
5 spaces between tick marks on the vertical axis.
haxis=0 to 13000 by 1000
vaxis=by 100
vspace=5;
Output
Output 45.20 Plot with Default Penalties Adjusted
Chapter 46 / PMENU Procedure
Menus can replace the command line as a way to execute commands. To activate
menus, issue the PMENU command from any command line. Menus must be
activated in order for them to appear.
SAS Menus
When menus are activated, each active window has a menu bar, which lists items
that you can select. Depending on which item you select, SAS either processes a
command, displays a menu or a submenu, or requests that you complete
information in a dialog box. The dialog box is simply a box of questions or choices
that require answers before an action can be performed. The following figure
illustrates features that you can create with PROC PMENU.
Note: A menu bar in some operating environments might appear as a pop-up menu
or might appear at the bottom of the window.
Procedure Execution
You must include an initial MENU statement that defines the menu bar, and you
must include all ITEM statements and any SELECTION, MENU, SUBMENU, and
DIALOG statements as well as statements that are associated with the DIALOG
statement within the same RUN group. For example, the following statements
define two separate PMENU catalog entries. Both are stored in the same catalog,
but each PMENU catalog entry is independent of the other. In the example, both
PMENU catalog entries create menu bars that simply list windowing environment
commands the user can select and execute:
libname proclib 'SAS-data-library';
menu menu2;
item end;
item pgm;
item log;
item output;
run;
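The statements above are a fragment: they omit the PROC PMENU statement and show only the second of the two catalog entries. A minimal complete sketch, assuming a catalog WORK.MENUS and a first menu bar MENU1 whose items are chosen only for illustration (both names hypothetical), might look like the following. Each RUN statement ends a RUN group, and each RUN group defines one PMENU entry.

```sas
/* Sketch: two independent PMENU entries stored in one (hypothetical) catalog. */
proc pmenu catalog=work.menus;
   /* First RUN group: defines the MENU1 entry. */
   menu menu1;
      item end;
      item log;
   run;

   /* Second RUN group: defines the MENU2 entry. */
   menu menu2;
      item end;
      item pgm;
      item log;
      item output;
   run;
quit;
```

You can confirm that both entries exist with the CONTENTS statement of PROC CATALOG before you associate either entry with a window.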
When you submit these statements, you receive a message that says that the
PMENU entries have been created. To display one of these menu bars, you must
associate the PMENU catalog entry with a window and then activate the window
with the menus turned on, as described in “Steps for Building and Using PMENU
Catalog Entries” on page 1694.
1 Use PROC PMENU to define the menu bars, menus, and other features that you
want. Store the output of PROC PMENU in a SAS catalog. For more information,
see “Associating a Menu with a Window” on page 1732.
3 Associate the PMENU catalog entry created in step 1 with a window by using
one of the following:
• the MENU= option in the WINDOW statement in Base SAS software. For
more information, see “Associating a Menu with a Window” on page 1732.
• the MENU= option in the %WINDOW statement in the macro facility.
4 Activate the window that you created. Make sure that the menus are turned on.
proc pmenu;
menu menu-bar;
item 'menu-item' menu=pull-down-menu;
...more-ITEM-statements...
menu pull-down-menu;
...ITEM-statements-for-pull-down-menu...
run;
• Create a menu bar with an item that submits a command other than the one that
appears on the menu bar:
proc pmenu;
menu menu-bar;
item 'menu-item' selection=selection;
...more-ITEM-statements...
selection selection 'command-string';
run;
• Create a menu bar with an item that opens a dialog box, which displays
information and requests text input:
proc pmenu;
menu menu-bar;
item 'menu-item' menu=pull-down-menu;
...more-ITEM-statements...
menu pull-down-menu;
item 'menu-item' dialog=dialog-box;
dialog dialog-box 'command @1';
text #line @column 'text';
text #line @column LEN=field-length;
run;
• Create a menu bar with an item that opens a dialog box, which permits one
choice from a list of possible values:
proc pmenu;
menu menu-bar;
item 'menu-item' menu=pull-down-menu;
...more-ITEM-statements...
menu pull-down-menu;
• Create a menu bar with an item that opens a dialog box, which permits several
independent choices:
proc pmenu;
menu menu-bar;
item 'menu-item' menu=pull-down-menu;
...more-ITEM-statements...
menu pull-down-menu;
item 'menu-item' dialog=dialog-box;
dialog dialog-box 'command &1';
text #line @column 'text';
checkbox #line @column 'text';
...more-CHECKBOX-statements...
run;
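To make the text-field template concrete, here is a sketch of a menu bar whose pull-down menu opens a dialog box with a single input field. The names FINDBAR, SEARCHMENU, and FINDDLG, the catalog WORK.MENUS, and the use of the FIND command are illustrative assumptions. As described for the DIALOG statement, @1 refers to the first TEXT statement that defines an input field with LEN=, so whatever the user types there replaces @1 before the command is submitted.

```sas
/* Sketch: a dialog box with one text input field (all names hypothetical). */
proc pmenu catalog=work.menus;
   menu findbar;
      item 'Search' menu=searchmenu;
      item end;

   menu searchmenu;
      item 'Find...' dialog=finddlg;

   /* @1 is replaced by the contents of the first LEN= input field. */
   dialog finddlg 'find @1';
      text #1 @1 'Text to find:';
      text #1 @16 len=20;
run;
quit;
```

The entry is stored as FINDBAR because that is the name in the first MENU statement of the RUN group; the pull-down menu and dialog box are stored as part of that entry.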
TEXT | Specify text and the input fields for a dialog box | Ex. 2
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Example: “Example 1: Building a Menu Bar for an FSEDIT Application” on page 1713
Syntax
PROC PMENU <CATALOG=<libref.>catalog>
<DESC 'entry-description'>;
Optional Arguments
CATALOG=<libref.>catalog
specifies the catalog in which you want to store PMENU entries.
Default If you omit libref, then the PMENU entries are stored in a catalog in
the SASUSER library. If you omit CATALOG=, then the entries are
stored in the SASUSER.PROFILE catalog.
DESC 'entry-description'
provides a description for the PMENU catalog entries created in the step.
Note These descriptions are displayed when you use the CATALOG window
in the windowing environment or the CONTENTS statement in the
CATALOG procedure.
CHECKBOX Statement
Defines choices that a user can make within a dialog box.
Syntax
CHECKBOX <ON> #line @column 'text-for-selection'
<COLOR=color> <SUBSTITUTE='text-for-substitution'>;
Required Arguments
column
specifies the column in the dialog box where the check box and text are placed.
line
specifies the line in the dialog box where the check box and text are placed.
text-for-selection
defines the text that describes this check box. This text appears in the window
and, if the SUBSTITUTE= option is not used, is also inserted into the command
in the preceding DIALOG statement when the user selects the check box.
Optional Arguments
COLOR=color
defines the color of the check box and the text that describes it.
ON
indicates that by default this check box is active. If you use this option, then you
must specify it immediately after the CHECKBOX keyword.
SUBSTITUTE='text-for-substitution'
specifies the text that is to be inserted into the command in the DIALOG
statement.
Details
DIALOG Statement
Describes a dialog box that is associated with an item on a menu.
Syntax
DIALOG dialog-box 'command-string field-number-specification';
Required Arguments
command-string
is the command or partial command that is executed when the item is selected.
After the substitutions are made, the resulting command string cannot exceed the
command-line limit for your operating environment, which is typically about 80
characters.
Note: If you are using PROC PMENU to submit any command that is valid only
in the PROGRAM EDITOR window (such as the INCLUDE command), then you
must have the windowing environment running, and you must return control to
the PROGRAM EDITOR window.
dialog-box
is the same name specified for the DIALOG= option in a previous ITEM
statement.
field-number-specification
can be one or more of the following:
You can embed the field numbers, for example, @1, %1, or &1, in the command
string and mix different types of field numbers within a command string. The
numeric portion of the field number corresponds to the relative position of
TEXT, RADIOBOX, and CHECKBOX statements, not to any actual number in
these statements.
@1…@n
are optional TEXT statement numbers that can add information to the
command before it is submitted. Numbers preceded by an at sign (@)
correspond to TEXT statements that use the LEN= option to define input
fields.
%1…%n
are optional RADIOBOX statement numbers that can add information to the
command before it is submitted. Numbers preceded by a percent sign (%)
correspond to RADIOBOX statements following the DIALOG statement.
&1…&n
are optional CHECKBOX statement numbers that can add information to the
command before it is submitted. Numbers preceded by an ampersand (&)
correspond to CHECKBOX statements following the DIALOG statement.
Note: To specify a literal @ (at sign), % (percent sign), or & (ampersand) in the
command-string, use a double character: @@ (at signs), %% (percent
signs), or && (ampersands).
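The following sketch shows how the three kinds of field numbers can be mixed in one command string. It assumes a hypothetical user-written macro %FINDVAL that takes a value, a comparison code, and a flag. Each number refers to the relative position of its statement type: @1 is the first TEXT statement with LEN=, %1 is the first RADIOBOX statement, and &1 is the first CHECKBOX statement.

```sas
/* Assumption: %FINDVAL is a hypothetical user-written macro.   */
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu mixdemo;
      item 'Find' dialog=fnd;
   /* @1 = input field, %1 = radio box, &1 = check box;          */
   /* %% produces the literal percent sign of the macro call.    */
   dialog fnd '%%findval(@1,%1,&1)';
      text #1 @1 'Value:';
      text #1 @10 len=8;
      text #3 @1 'Comparison:';
      radiobox default=1;
         rbutton #4 @5 'Equal To' substitute='EQ';
         rbutton #5 @5 'Not Equal To' substitute='NE';
      checkbox #7 @1 'Ignore case' substitute='Y';
quit;
```

For example, suppose the user enters 42, keeps Equal To, and selects Ignore case. The resulting command is %findval(42,EQ,Y).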
Details
• You cannot control the placement of the dialog box. The dialog box is not
scrollable. The size and placement of the dialog box are determined by your
windowing environment.
• To use the DIALOG statement, specify the DIALOG= option in an ITEM
statement.
• The ITEM statement creates an entry in a menu bar or in a menu, and the
DIALOG= option specifies which DIALOG statement describes the dialog box.
• You can use TEXT, CHECKBOX, RADIOBOX, and RBUTTON statements to define
the contents of the dialog box.
• The following figure shows a typical dialog box. A dialog box can request
information in three ways:
◦ Fill in a field. Fields that accept text from a user are called text fields.
◦ Choose from a list of mutually exclusive choices. A group of selections of
this type is called a radio box, and each individual selection is called a
radio button.
◦ Indicate whether you want to select other independent choices. For example,
you could choose to use various options by selecting any or all of the listed
selections. A selection of this type is called a check box.
Dialog boxes have two or more buttons, such as OK and Cancel, automatically
built into the box. A button causes an action to occur.
ITEM Statement
Identifies an item to be listed in a menu bar or in a menu.
Examples: “Example 1: Building a Menu Bar for an FSEDIT Application” on page 1713
“Example 3: Creating a Dialog Box to Search Multiple Variables” on page 1719
“Example 5: Associating Menus with a FRAME Application” on page 1735
Syntax
ITEM command <options> <action-options>;
ITEM 'menu-item' <options> <action-options>;
Required Arguments
command
a single word that is a valid SAS command for the window in which the menu
appears. Commands that are more than one word, such as WHERE CLEAR, must
be enclosed in single quotation marks. The command appears in uppercase
letters on the menu bar.
If you want to control the case of a SAS command on the menu, then enclose
the command in single quotation marks. The case that you use then appears on
the menu.
'menu-item'
a word or text string, enclosed in quotation marks, that describes the action that
occurs when the user selects this item. A menu item should not begin with a
percent sign (%).
Optional Arguments
ACCELERATE=name-of-key
defines a key sequence that can be used instead of selecting an item. When the
user presses the key sequence, it has the same effect as selecting the item from
the menu bar or menu.
action-option
is one of the following:
DIALOG=dialog-box
specifies the name of an associated DIALOG statement, which displays a
dialog box when the user selects this item.
MENU=pull-down-menu
specifies the name of an associated MENU statement, which displays a
menu when the user selects this item.
SELECTION=selection
specifies the name of an associated SELECTION statement, which submits a
command when the user selects this item.
SUBMENU=submenu
associates the item with a common submenu.
GRAY
indicates that the item is not an active choice in this window. This option is
useful when you want to define standard lists of items for many windows, but
not all items are valid in all windows. When this option is set and the user
selects the item, no action occurs.
HELP='help-text'
specifies text that is displayed when the user points to the menu item. For
example, if you use a mouse to pull down a menu and then position the mouse
pointer over the item, the text is displayed.
ID=integer
specifies a value that is used as an identifier for an item in a menu. This
identifier is used within a SAS/AF application to selectively activate or
deactivate items in a menu or to set the state of an item as a check box or a
radio button.
You can use the same ID for more than one item.
MNEMONIC=character
underlines the first occurrence of character in the text string that appears on the
menu. The character must be in the text string.
The character is typically used in combination with another key, such as Alt.
Using the key sequence has the same effect as putting your cursor on the item,
but it does not invoke the action that the item controls.
STATE=CHECK | RADIO
provides the ability to place a check box or a radio button next to an item that
has been selected.
Tip STATE= is used with the ID= option and the WINFO function in SAS
Component Language.
Details
All ITEM statements for a menu must be placed immediately after the MENU
statement and before any DIALOG, SELECTION, SUBMENU, or other MENU
statements. In some operating environments, you can insert SEPARATOR
statements between ITEM statements to produce lines separating groups of items
in a menu. For more information, see “SEPARATOR Statement” on page 1710.
Note: If you specify a menu bar that is too long for the window, then it might be
truncated or wrapped to multiple lines.
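The following short sketch shows several ITEM options in one menu. It assumes that the commands shown are valid in the window where the menu is used. The entry name ITEMDEMO and the help text are hypothetical, and whether HELP= text is shown depends on the windowing environment.

```sas
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu itemdemo;
      item 'File' menu=f;
   menu f;
      /* Quoted: the case used here is the case on the menu.      */
      item 'Save' help='Save the current observation';
      /* GRAY: the item is listed, but selecting it does nothing. */
      item 'Cancel' gray;
      /* Unquoted command: it appears in uppercase on the menu.   */
      item end;
quit;
```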
MENU Statement
Names the catalog entry that stores the menus or defines a menu.
Examples: “Example 1: Building a Menu Bar for an FSEDIT Application” on page 1713
“Example 5: Associating Menus with a FRAME Application” on page 1735
Syntax
MENU menu-bar;
MENU pull-down-menu;
Required Arguments
One of the following arguments is required:
menu-bar
names the catalog entry that stores the menus.
pull-down-menu
names the menu that appears when the user selects an item in the menu bar.
The value of pull-down-menu must match the pull-down-menu name that is
specified in the MENU= option in a previous ITEM statement.
Details
Defining Menus
When used to define a menu, the MENU statement must follow an ITEM statement
that specifies the MENU= option. Both the ITEM statement and the MENU
statement for the menu must be in the same RUN group as the MENU statement
that defines the menu bar for the PMENU catalog entry.
For both menu bars and menus, follow the MENU statement with ITEM statements
that define each of the items that appear on the menu. Group all ITEM statements
for a menu together. For example, the following PROC PMENU step creates one
catalog entry, WINDOWS, which produces a menu bar with two items, Primary
windows and Other windows. When you select one of these items, a menu is
displayed.
libname proclib 'SAS-data-library';
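In this copy, the listing for this step breaks off after the LIBNAME statement. A minimal sketch of such a step follows. The entry name WINDOWS and the two menu bar items come from the description above. The commands listed under each menu (OUTPUT, LOG, PGM, KEYS, HELP) are assumptions chosen as ordinary SAS windowing commands, not necessarily the items in the original example.

```sas
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   /* Menu bar: stored as PROCLIB.MENUCAT.WINDOWS.PMENU.          */
   menu windows;
      item 'Primary windows' menu=prime;
      item 'Other windows' menu=other;
   /* Each name below matches a MENU= value in the menu bar.      */
   menu prime;
      item output;
      item log;
      item pgm;
   menu other;
      item keys;
      item help;
quit;
```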
RADIOBOX Statement
Defines a box that contains mutually exclusive choices within a dialog box.
Syntax
RADIOBOX DEFAULT=button-number;
Required Argument
DEFAULT=button-number
indicates which radio button is the default.
Default 1
Details
The RADIOBOX statement indicates the beginning of a list of selections.
Immediately after the RADIOBOX statement, you must list an RBUTTON
statement for each of the selections the user can make. When the user makes a
choice, the text value that is associated with the selection is inserted into the
command string of the previous DIALOG statement at field locations prefixed by a
percent sign (%).
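The sketch below shows a RADIOBOX statement with DEFAULT=2, so the second RBUTTON starts out selected. The SUBSTITUTE= value of the chosen button replaces %1 in the command. The macro %SORTDATA and the entry names are hypothetical.

```sas
/* Assumption: %SORTDATA is a hypothetical user-written macro.  */
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu sortdemo;
      item 'Sort' dialog=srt;
   /* %1 receives the SUBSTITUTE= value of the chosen button.  */
   dialog srt '%%sortdata(%1)';
      text #1 @1 'Sort order:';
      /* DEFAULT=2 preselects the second RBUTTON (Descending). */
      radiobox default=2;
         rbutton #3 @5 'Ascending' substitute='A';
         rbutton #4 @5 'Descending' substitute='D';
quit;
```

Under these assumptions, if the user accepts the dialog box without changing the selection, the submitted command is %sortdata(D).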
RBUTTON Statement
Lists mutually exclusive choices within a dialog box.
Syntax
RBUTTON <NONE> #line @column 'text-for-selection'
<COLOR=color> <SUBSTITUTE='text-for-substitution'>;
Required Arguments
column
specifies the column in the dialog box where the radio button and text are
placed.
line
specifies the line in the dialog box where the radio button and text are placed.
text-for-selection
defines the text that appears in the dialog box and, if the SUBSTITUTE= option
is not used, defines the text that is inserted into the command in the preceding
DIALOG statement.
Note: Be careful not to overlap columns and lines when placing text and radio
buttons. If you overlap text and buttons, then you will get an error message.
Also, leave space between other text and a radio button.
Optional Arguments
COLOR=color
defines the color of the radio button and the text that describes the button.
NONE
defines a button that indicates none of the other choices. Defining this button
enables the user to decline all of the other choices. When this button is selected,
no characters, including blanks, are inserted into the command string of the
DIALOG statement.
Restriction If you use this option, then it must appear immediately after the
RBUTTON keyword.
SUBSTITUTE='text-for-substitution'
specifies the text that is to be inserted into the command in the DIALOG
statement.
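The following sketch shows the NONE option. It assumes a hypothetical macro %SUBSET that treats an empty argument as "all regions". Because NONE inserts no characters, selecting All regions would produce %subset(), while selecting Northeast would produce %subset(NE).

```sas
/* Assumption: %SUBSET is a hypothetical user-written macro.    */
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu rgndemo;
      item 'Region' dialog=rgn;
   dialog rgn '%%subset(%1)';
      text #1 @1 'Region:';
      radiobox default=1;
         /* NONE must follow the RBUTTON keyword immediately;  */
         /* choosing this button inserts no characters for %1. */
         rbutton none #3 @5 'All regions';
         rbutton #4 @5 'Northeast' substitute='NE';
         rbutton #5 @5 'Southwest' substitute='SW';
quit;
```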
SELECTION Statement
Defines a command that is submitted when an item is selected.
Syntax
SELECTION selection 'command-string';
Required Arguments
selection
is the same name specified for the SELECTION= option in a previous ITEM
statement.
command-string
is a text string, enclosed in quotation marks, that is submitted as a command-
line command when the user selects this item. There is a limit of 200 characters
for command-string. However, the command-line limit of approximately 80
characters cannot be exceeded. The command-line limit differs slightly for
various operating environments.
Note: SAS uses only the first eight characters of an item that is specified with a
SELECTION statement. When a user selects an item from a menu list, the first
eight characters of each item name in the list must be unique so that SAS can
select the correct item in the list. If the first eight characters are not unique, SAS
selects the last item in the list.
Details
You define the name of the item in the ITEM statement and specify the
SELECTION= option to associate the item with a subsequent SELECTION
statement. The SELECTION statement then defines the actual command that is
submitted when the user chooses the item in the menu bar or menu.
A common use of the SELECTION statement is to put a long command string behind a
short alias. The ITEM statement provides the alias, and the SELECTION statement
defines the longer command string that the alias invokes. For example, you could
include an item in the menu bar that runs a program containing a WINDOW statement
for data entry. The commands that are actually processed when the user selects this
item are the commands that include and submit that program.
Note: If you are using PROC PMENU to issue any command that is valid only in the
PROGRAM EDITOR window (such as the INCLUDE command), then you must have
the windowing environment running. Also, you must return control to the
PROGRAM EDITOR window.
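Here is a minimal sketch of the alias technique described above. The menu shows a short item, and the SELECTION statement holds the longer command string. The fileref ENTRY and its program file are hypothetical. The command sequence mirrors the data entry commands in Example 4, and, as the note says, INCLUDE requires the windowing environment.

```sas
libname proclib 'SAS-data-library';
/* Hypothetical program file that contains the WINDOW statement. */
filename entry 'external-file';

proc pmenu catalog=proclib.menucat;
   menu appdemo;
      item 'Enter data' selection=ent;
   /* END closes the current window, PGM opens the PROGRAM      */
   /* EDITOR, INCLUDE copies in the program, SUBMIT runs it.    */
   selection ent 'end;pgm;include entry;submit';
quit;
```

The whole string must stay within the command-line limit for your operating environment.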
SEPARATOR Statement
Draws a line between items on a menu.
Syntax
SEPARATOR;
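The following brief sketch places a SEPARATOR statement between ITEM statements. As noted for the ITEM statement, whether the line is drawn depends on the operating environment. The entry name SEPDEMO is hypothetical.

```sas
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu sepdemo;
      item 'File' menu=f;
   menu f;
      item 'Save';
      item 'Cancel';
      /* Draws a line between Cancel and End, where supported. */
      separator;
      item 'End';
quit;
```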
SUBMENU Statement
Specifies the SAS file that contains a common submenu associated with an item.
Example: “Example 1: Building a Menu Bar for an FSEDIT Application” on page 1713
Syntax
SUBMENU submenu-name SAS-file;
Required Arguments
submenu-name
specifies a name for the submenu statement. To associate a submenu with a
menu item, submenu-name must match the submenu name specified in the
SUBMENU= action-option in the ITEM statement.
SAS-file
specifies the name of the SAS file that contains the common submenu.
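The following minimal sketch pairs the SUBMENU= action option with a SUBMENU statement. It reuses the common Edit submenu SASHELP.CORE.EDIT from Example 1. The entry name SUBDEMO is hypothetical.

```sas
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu subdemo;
      /* SUBMENU= names the SUBMENU statement that follows.     */
      item 'Edit' submenu=editmnu;
   /* The common submenu is read from SASHELP.CORE.EDIT.       */
   submenu editmnu sashelp.core.edit;
quit;
```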
TEXT Statement
Specifies text and the input fields for a dialog box.
Syntax
TEXT #line @column field-description
<ATTR=attribute> <COLOR=color>;
Required Arguments
column
specifies the starting column for the text or input field.
field-description
defines how the TEXT statement is used. The field-description can be one of the
following:
LEN=field-length
is the length of an input field in which the user can enter information. If the
LEN= argument is used, then the information entered in the field is inserted
into the command string of the previous DIALOG statement at field
locations that are prefixed by an at sign (@).
'text'
is the text string that appears inside the dialog box at the location defined by
line and column.
line
specifies the line number for the text or input field.
Optional Arguments
ATTR=attribute
defines the attribute for the text or input field. These are valid attribute values:
• BLINK
• HIGHLIGH
• REV_VIDE
• UNDERLIN
COLOR=color
defines the color for the text or input field characters. Here are the color values
that you can use:
BLACK, BLUE, BROWN, CYAN, GRAY, GREEN, MAGENTA, ORANGE, PINK, RED, WHITE,
YELLOW
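The following sketch shows TEXT statements that combine a label and an input field with ATTR= and COLOR=. The macro %SEARCH and the entry names are hypothetical. Whether an attribute such as HIGHLIGH is visible depends on the display.

```sas
/* Assumption: %SEARCH is a hypothetical user-written macro.    */
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu textdemo;
      item 'Search' dialog=srch;
   dialog srch '%%search(@1)';
      /* Label on line 1, column 1, highlighted and blue.        */
      text #1 @1 'Search string:' attr=highligh color=blue;
      /* 20-position input field, underlined; its contents are   */
      /* substituted for @1 in the DIALOG command string.        */
      text #1 @17 len=20 attr=underlin;
quit;
```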
Example 1: Building a Menu Bar for an FSEDIT Application
Details
This example creates a menu bar that can be used in an FSEDIT application to
replace the default menu bar. The selections available on these menus do not
enable end users to delete or duplicate observations.
Note:
• The windows in the PROC PMENU examples were produced in the UNIX
environment and might appear slightly different from the same windows in
other operating environments.
• You should know the operating environment-specific system options that can
affect how menus are displayed and merged with existing SAS menus. For
details, see the SAS documentation for your operating environment.
Program
libname proclib
'SAS-data-library';
proc pmenu catalog=proclib.menucat;
menu project;
item 'File' menu=f;
item 'Edit' submenu=editmnu;
item 'Scroll' menu=s;
item 'Help' menu=h;
menu f;
item 'Goback' selection=g;
item 'Save';
selection g 'end';
submenu editmnu sashelp.core.edit;
menu s;
item 'Next Obs' selection=n;
item 'Prev Obs' selection=p;
item 'Top';
item 'Bottom';
selection n 'forward';
selection p 'backward';
menu h;
item 'Keys';
item 'About this application' selection=hlp;
selection hlp 'sethelp user.menucat.staffhlp.help;help';
quit;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store menu
definitions.
libname proclib
'SAS-data-library';
Specify the catalog for storing menu definitions. Menu definitions will be stored in
the PROCLIB.MENUCAT catalog.
proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement specifies PROJECT
as the name of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.PROJECT.PMENU.
menu project;
Design the menu bar. The ITEM statements specify the items for the menu bar. The
value of the MENU= option is used in a subsequent MENU statement. The Edit item
uses a common predefined submenu; the menus for the other items are defined in
this PROC step.
item 'File' menu=f;
item 'Edit' submenu=editmnu;
item 'Scroll' menu=s;
item 'Help' menu=h;
Design the File menu. This group of statements defines the selections available
under File on the menu bar. The first ITEM statement specifies Goback as the first
selection under File. The value of the SELECTION= option corresponds to the
subsequent SELECTION statement, which specifies END as the command that is
issued for that selection. The second ITEM statement specifies that the SAVE
command is issued for that selection.
menu f;
item 'Goback' selection=g;
item 'Save';
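selection g 'end';
Specify the common Edit submenu. The SUBMENU statement associates the Edit item
with the predefined submenu that is stored in SASHELP.CORE.EDIT.
submenu editmnu sashelp.core.edit;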
Design the Scroll menu. This group of statements defines the selections available
under Scroll on the menu bar.
menu s;
item 'Next Obs' selection=n;
item 'Prev Obs' selection=p;
item 'Top';
item 'Bottom';
selection n 'forward';
selection p 'backward';
Design the Help menu. This group of statements defines the selections available
under Help on the menu bar. The SETHELP command specifies a HELP entry that
contains user-written information for this FSEDIT application. The semicolon that
appears after the HELP entry name enables the HELP command to be included in
the string. The HELP command invokes the HELP entry.
menu h;
item 'Keys';
item 'About this application' selection=hlp;
selection hlp 'sethelp user.menucat.staffhlp.help;help';
quit;
You can also specify the menu bar on the command line in the FSEDIT session or by
issuing a CALL EXECCMD command in SAS Component Language (SCL).
For other methods of associating the customized menu bar with the FSEDIT
window, see “Associating a Menu Bar with an FSEDIT Session” on page 1724.
Example 2: Collecting User Input in a Dialog Box
Details
This example adds a dialog box to the menus created in “Example 1: Building a
Menu Bar for an FSEDIT Application” on page 1713. The dialog box enables the user
to use a WHERE clause to subset the SAS data set.
Program
libname proclib
'SAS-data-library';
proc pmenu catalog=proclib.menucat;
menu project;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store menu
definitions.
libname proclib
'SAS-data-library';
Specify the catalog for storing menu definitions. Menu definitions will be stored in
the PROCLIB.MENUCAT catalog.
proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement specifies PROJECT
as the name of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.PROJECT.PMENU.
menu project;
Design the menu bar. The ITEM statements specify the items for the menu bar. The
value of the MENU= option is used in a subsequent MENU statement.
item 'File' menu=f;
item 'Edit' menu=e;
item 'Scroll' menu=s;
item 'Subset' menu=sub;
item 'Help' menu=h;
Design the File menu. This group of statements defines the selections under File on
the menu bar. The first ITEM statement specifies Goback as the first selection under
File. The value of the SELECTION= option corresponds to the subsequent
SELECTION statement, which specifies END as the command that is issued for that
selection. The second ITEM statement specifies that the SAVE command is issued
for that selection.
menu f;
item 'Goback' selection=g;
item 'Save';
selection g 'end';
Design the Edit menu. This group of statements defines the selections available
under Edit on the menu bar.
menu e;
item 'Cancel';
item 'Add';
Design the Scroll menu. This group of statements defines the selections available
under Scroll on the menu bar.
menu s;
item 'Next Obs' selection=n;
item 'Prev Obs' selection=p;
item 'Top';
item 'Bottom';
selection n 'forward';
selection p 'backward';
Design the Subset menu. This group of statements defines the selections available
under Subset on the menu bar. The value d1 in the DIALOG= option is used in the
subsequent DIALOG statement.
menu sub;
item 'Where' dialog=d1;
item 'Where Clear';
Design the Help menu. This group of statements defines the selections available
under Help on the menu bar. The SETHELP command specifies a HELP entry that
contains user-written information for this FSEDIT application. The semicolon
enables the HELP command to be included in the string. The HELP command
invokes the HELP entry.
menu h;
item 'Keys';
item 'About this application' selection=hlp;
selection hlp 'sethelp proclib.menucat.staffhlp.help;help';
Design the dialog box. The DIALOG statement builds a WHERE command. The
arguments for the WHERE command are provided by user input into the text entry
fields described by the three TEXT statements. The @1 notation is a placeholder
for user input in the text field. The TEXT statements specify the text in the dialog
box and the length of the input field.
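The statements for this step are not reproduced in this copy. The following is a hedged sketch only, reduced to the Subset menu. It shows a dialog box that matches the description: a DIALOG statement that builds a WHERE command from @1, plus three TEXT statements. The prompt wording, the line and column positions, and the field length of 40 are assumptions.

```sas
libname proclib 'SAS-data-library';

proc pmenu catalog=proclib.menucat;
   menu project;
      item 'Subset' menu=sub;
   menu sub;
      item 'Where' dialog=d1;
      item 'Where Clear';
   /* @1 is replaced by whatever the user types in the field.  */
   dialog d1 'where @1';
      text #1 @1 'Enter a valid WHERE clause or UNDO';
      text #2 @3 'WHERE ';
      /* Assumed field length; the full command must still fit */
      /* within the command-line limit.                         */
      text #2 @10 len=40;
quit;
```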
You can also specify the menu bar on the command line in the FSEDIT session or by
issuing a CALL EXECCMD command in SAS Component Language (SCL). Refer to
SAS(R) Component Language 9.3: Reference for complete documentation on SCL.
For other methods of associating the customized menu bar with the FSEDIT
window, see “Associating a Menu Bar with an FSEDIT Session” on page 1724.
The following dialog box appears when the user chooses Subset and then Where.
Example 3: Creating a Dialog Box to Search Multiple Variables
Features:
RADIOBOX statement option: DEFAULT=
RBUTTON statement option: SUBSTITUTE=
SAS macro invocation
Details
This example shows how to modify the menu bar in an FSEDIT session to enable a
search for one value across multiple variables. The example creates customized
menus to use in an FSEDIT session. The menu structure is the same as in the
preceding example, except for the WHERE dialog box.
When selected, the menu item invokes a macro. The user input becomes values for
macro parameters. The macro generates a WHERE command that expands to
include all the variables needed for the search.
Program
libname proclib
'SAS-data-library';
proc pmenu catalog=proclib.menucat;
menu project;
item 'File' menu=f;
item 'Edit' menu=e;
item 'Scroll' menu=s;
item 'Subset' menu=sub;
item 'Help' menu=h;
menu f;
item 'Goback' selection=g;
item 'Save';
selection g 'end';
menu e;
item 'Cancel';
item 'Add';
menu s;
item 'Next Obs' selection=n;
item 'Prev Obs' selection=p;
item 'Top';
item 'Bottom';
selection n 'forward';
selection p 'backward';
menu sub;
item 'Where' dialog=d1;
item 'Where Clear';
menu h;
item 'Keys';
item 'About this application' selection=hlp;
selection hlp 'sethelp proclib.menucat.staffhlp.help;help';
dialog d1 '%%wbuild(%1,%2,@1,%3)';
text #1 @1 'Choose a region:';
radiobox default=1;
rbutton #3 @5 'Northeast' substitute='NE';
rbutton #4 @5 'Northwest' substitute='NW';
rbutton #5 @5 'Southeast' substitute='SE';
rbutton #6 @5 'Southwest' substitute='SW';
text #8 @1 'Choose a contaminant:';
radiobox default=1;
rbutton #10 @5 'Pollutant A' substitute='pol_a,2';
rbutton #11 @5 'Pollutant B' substitute='pol_b,4';
text #13 @1 'Enter Value for Search:';
text #13 @25 len=6;
text #15 @1 'Choose a comparison criterion:';
radiobox default=1;
rbutton #16 @5 'Greater Than or Equal To'
substitute='GE';
rbutton #17 @5 'Less Than or Equal To'
substitute='LE';
rbutton #18 @5 'Equal To' substitute='EQ';
quit;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store menu
definitions.
libname proclib
'SAS-data-library';
Specify the catalog for storing menu definitions. Menu definitions will be stored in
the PROCLIB.MENUCAT catalog.
proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement specifies PROJECT as
the name of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.PROJECT.PMENU.
menu project;
Design the menu bar. The ITEM statements specify the items for the menu bar. The
value of the MENU= option is used in a subsequent MENU statement.
item 'File' menu=f;
item 'Edit' menu=e;
item 'Scroll' menu=s;
item 'Subset' menu=sub;
item 'Help' menu=h;
Design the File menu. This group of statements defines the selections under File on
the menu bar. The first ITEM statement specifies Goback as the first selection under
File. The value of the SELECTION= option corresponds to the subsequent
SELECTION statement, which specifies END as the command that is issued for that
selection. The second ITEM statement specifies that the SAVE command is issued
for that selection.
menu f;
item 'Goback' selection=g;
item 'Save';
selection g 'end';
Design the Edit menu. The ITEM statements define the selections under Edit on the
menu bar.
menu e;
item 'Cancel';
item 'Add';
Design the Scroll menu. This group of statements defines the selections under
Scroll on the menu bar. If the quoted string in the ITEM statement is not a valid
command, then the SELECTION= option corresponds to a subsequent SELECTION
statement, which specifies a valid command.
menu s;
item 'Next Obs' selection=n;
item 'Prev Obs' selection=p;
item 'Top';
item 'Bottom';
selection n 'forward';
selection p 'backward';
Design the Subset menu. This group of statements defines the selections under
Subset on the menu bar. The DIALOG= option names a dialog box that is defined in
a subsequent DIALOG statement.
menu sub;
item 'Where' dialog=d1;
item 'Where Clear';
Design the Help menu. This group of statements defines the selections under Help
on the menu bar. The SETHELP command specifies a HELP entry that contains
user-written information for this FSEDIT application. The semicolon that appears
after the HELP entry name enables the HELP command to be included in the string.
The HELP command invokes the HELP entry.
menu h;
item 'Keys';
item 'About this application' selection=hlp;
selection hlp 'sethelp proclib.menucat.staffhlp.help;help';
Design the dialog box. WBUILD is a SAS macro. The double percent sign that
precedes WBUILD is necessary to prevent PROC PMENU from expecting a field
number to follow. The field numbers %1, %2, and %3 correspond to the values that
the user specifies with the radio buttons. The field number @1 corresponds to the
search value that the user enters.
dialog d1 '%%wbuild(%1,%2,@1,%3)';
Add radio buttons for region selection. The TEXT statement specifies text for the
dialog box that appears on line 1 and begins in column 1. The RADIOBOX statement
begins a group of mutually exclusive radio buttons. DEFAULT= specifies that
the first radio button (Northeast) will be selected by default. The RBUTTON
statements specify the mutually exclusive choices for the radio buttons: Northeast,
Northwest, Southeast, or Southwest. SUBSTITUTE= gives the value that is
substituted for the %1 in the DIALOG statement above if that radio button is
selected.
text #1 @1 'Choose a region:';
radiobox default=1;
rbutton #3 @5 'Northeast' substitute='NE';
rbutton #4 @5 'Northwest' substitute='NW';
rbutton #5 @5 'Southeast' substitute='SE';
rbutton #6 @5 'Southwest' substitute='SW';
Add radio buttons for pollutant selection. The TEXT statement specifies text for
the dialog box that appears on line 8 (#8) and begins in column 1 (@1). The
RADIOBOX statement begins a group of mutually exclusive radio buttons.
DEFAULT= specifies that the first radio button (Pollutant A) will be selected by
default. The RBUTTON statements specify the mutually exclusive choices for the
radio buttons: Pollutant A or Pollutant B. SUBSTITUTE= gives the value that is
substituted for the %2 in the preceding DIALOG statement if that radio button is
selected.
text #8 @1 'Choose a contaminant:';
radiobox default=1;
rbutton #10 @5 'Pollutant A' substitute='pol_a,2';
rbutton #11 @5 'Pollutant B' substitute='pol_b,4';
Add an input field. The first TEXT statement specifies text for the dialog box that
appears on line 13 and begins in column 1. The second TEXT statement specifies an
input field that is 6 bytes long that appears on line 13 and begins in column 25. The
value that the user enters in the field is substituted for the @1 in the preceding
DIALOG statement.
text #13 @1 'Enter Value for Search:';
text #13 @25 len=6;
Add radio buttons for comparison operator selection. The TEXT statement
specifies text for the dialog box that appears on line 15 and begins in column 1. The
RADIOBOX statement begins a group of mutually exclusive radio buttons.
DEFAULT= specifies that the first radio button (Greater Than or Equal To) will be
selected by default. The RBUTTON statements specify the mutually exclusive
choices for the radio buttons. SUBSTITUTE= gives the value that is substituted for
the %3 in the preceding DIALOG statement if that radio button is selected.
text #15 @1 'Choose a comparison criterion:';
radiobox default=1;
rbutton #16 @5 'Greater Than or Equal To'
substitute='GE';
rbutton #17 @5 'Less Than or Equal To'
substitute='LE';
rbutton #18 @5 'Equal To' substitute='EQ';
quit;
The following dialog box appears when the user selects Subset and then Where.
Associating a Menu Bar with an FSEDIT Session
To associate the customized menu bar with the FSEDIT session, do any one
of the following:
• enter a SETPMENU command on the command line. The command for this
example is
setpmenu proclib.menucat.project.pmenu
• include an SCL program with the FSEDIT session that uses the customized
menus and turns on the menus, for example:
fseinit:
call execcmd('setpmenu proclib.menucat.project.pmenu;
pmenu on;');
return;
init:
return;
main:
return;
term:
return;
Using the custom menu item, you would select Southwest, Pollutant A, enter .50
as the value, and choose Greater Than or Equal To as the comparison criterion.
Two lakes, New Dam and Border, meet the criteria.
The WBUILD macro uses the four pieces of information from the dialog box to
generate a WHERE command:
• One of the values for region, either NE, NW, SE, or SW, becomes the value of the
macro parameter REGION.
• Either pol_a,2 or pol_b,4 supplies the values of the PREFIX and NUMVAR
macro parameters. The comma is part of the value that is passed to the
WBUILD macro and serves to delimit the two parameters, PREFIX and
NUMVAR.
• The value that the user enters for the search becomes the value of the macro
parameter VALUE.
• The operator that the user chooses becomes the value of the macro parameter
OPERATOR.
To see how the macro works, again consider the following example, in which you
want to know whether any of the lakes in the southwest tested for a value of .50 or
greater for pollutant A. The following table contains the values of the macro
parameters:
Parameter Value
REGION SW
PREFIX pol_a
NUMVAR 2
VALUE .50
OPERATOR GE
The first %IF statement checks to make sure that the user entered a value. If a
value has been entered, then the macro begins to generate the WHERE command.
First, the macro creates the beginning of the WHERE command:
where region="SW" and (
Next, the %DO loop executes. For pollutant A, it executes twice because
NUMVAR=2. In the macro definition, the period in &prefix.&i concatenates pol_a
with 1 and with 2. At each iteration of the loop, the macro resolves PREFIX,
OPERATOR, and VALUE, and it generates a part of the WHERE command. On the
first iteration, it generates pol_a1 GE .50
The %IF statement in the loop checks whether the loop is on its last iteration. If
it is not, then the macro builds a compound WHERE command by putting an OR
between the individual clauses. The next part of the WHERE command becomes
OR pol_a2 GE .50
The loop ends after two executions for pollutant A, and the macro generates the
end of the WHERE command:
)
Results from the macro are placed on the command line. The following code is the
definition of the WBUILD macro. The parts of the WHERE command that are text
strings, which the macro does not resolve, are where region=" and " AND (, the
OR keyword, and the closing parenthesis:
%macro wbuild(region,prefix,numvar,value,operator);
   /* check to see if value is present */
   %if &value ne %then %do;
      where region="&region" AND (
      /* If the values are character, */
      /* enclose &value in double quotation marks. */
      %do i=1 %to &numvar;
         &prefix.&i &operator &value
         /* if not on last variable, */
         /* generate 'OR' */
         %if &i ne &numvar %then %do;
            OR
         %end;
      %end;
      )
   %end;
%mend wbuild;
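To inspect the text that WBUILD generates without opening the dialog box, you could call the macro directly and write the result to the log with %PUT. This is a testing sketch, not part of the menu application. It repeats the macro definition without the comments, with &region spelled correctly, so that it is self-contained. It passes the same arguments that the dialog box supplies for Southwest, Pollutant A, .50, and Greater Than or Equal To. Apart from spacing, the log line should read where region="SW" AND ( pol_a1 GE .50 OR pol_a2 GE .50 ).

```sas
/* WBUILD as defined above, comments removed.                 */
%macro wbuild(region,prefix,numvar,value,operator);
   %if &value ne %then %do;
      where region="&region" AND (
      %do i=1 %to &numvar;
         &prefix.&i &operator &value
         %if &i ne &numvar %then %do;
            OR
         %end;
      %end;
      )
   %end;
%mend wbuild;

/* Same arguments as the dialog box: region SW, the two values */
/* pol_a and 2 from one button, search value .50, operator GE. */
%put %wbuild(SW,pol_a,2,.50,GE);
```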
Example 4: Creating Menus for a DATA Step Window Application
Details
This example defines an application that enables the user to enter human resources
data for various departments, and to request reports from the data sets that are
created by the data entry.
The first part of the example describes the PROC PMENU step that creates the
menus. The subsequent sections describe how to use the menus in a DATA step
window application.
Program
libname proclib
'SAS-data-library';
filename de 'external-file';
filename prt 'external-file';
proc pmenu catalog=proclib.menus;
menu select;
item 'File' menu=f;
item 'Data_Entry' menu=deptsde;
item 'Print_Report' menu=deptsprt;
menu f;
item 'End this window' selection=endwdw;
item 'End this SAS session' selection=endsas;
selection endwdw 'end';
selection endsas 'bye';
menu deptsde;
item 'For Dept01' selection=de1;
item 'For Dept02' selection=de2;
item 'Other Departments' dialog=deother;
selection de1 'end;pgm;include de;change xx 01;submit';
selection de2 'end;pgm;include de;change xx 02;submit';
dialog deother 'end;pgm;include de;c deptxx @1;submit';
text #1 @1 'Enter department name';
text #2 @3 'in the form DEPT99:';
text #2 @25 len=7;
menu deptsprt;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store menu
definitions.
libname proclib
'SAS-data-library';
Declare the DE and PRT filenames. The FILENAME statements define the external
files in which the programs to create the windows are stored.
filename de 'external-file';
filename prt 'external-file';
Specify the catalog for storing menu definitions. Menu definitions will be stored in
the PROCLIB.MENUS catalog.
proc pmenu catalog=proclib.menus;
Specify the name of the catalog entry. The MENU statement specifies SELECT as
the name of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUS.SELECT.PMENU.
menu select;
Design the menu bar. The ITEM statements specify the three items on the menu
bar. The value of the MENU= option is used in a subsequent MENU statement.
item 'File' menu=f;
item 'Data_Entry' menu=deptsde;
item 'Print_Report' menu=deptsprt;
1730 Chapter 46 / PMENU Procedure
Design the File menu. This group of statements defines the selections under File.
The value of the SELECTION= option is used in a subsequent SELECTION
statement.
menu f;
item 'End this window' selection=endwdw;
item 'End this SAS session' selection=endsas;
selection endwdw 'end';
selection endsas 'bye';
Design the Data_Entry menu. This group of statements defines the selections
under Data_Entry on the menu bar. The ITEM statements specify that For Dept01
and For Dept02 appear under Data_Entry. The value of the SELECTION= option
equates to a subsequent SELECTION statement, which contains the string of
commands that are actually submitted. The value of the DIALOG= option equates
to a subsequent DIALOG statement, which describes the dialog box that appears
when this item is selected.
menu deptsde;
item 'For Dept01' selection=de1;
item 'For Dept02' selection=de2;
item 'Other Departments' dialog=deother;
Specify commands under the Data_Entry menu. The commands in single quotation
marks are submitted when the user selects For Dept01 or For Dept02. The END
command ends the current window, and the PGM command opens the PROGRAM EDITOR
window so that further commands can be submitted. The INCLUDE command includes the
SAS statements that create the data entry window. The CHANGE command
modifies the DATA statement in the included program so that it creates the correct
data set. The SUBMIT command submits the DATA step program.
selection de1 'end;pgm;include de;change xx 01;submit';
selection de2 'end;pgm;include de;change xx 02;submit';
Design the DEOTHER dialog box. The DIALOG statement defines the dialog box
that appears when the user selects Other Departments. The DIALOG statement
modifies the command string so that the name of the department that is entered by
the user is used to change deptxx in the SAS program that is included. The first two
TEXT statements specify text that appears in the dialog box. The third TEXT
statement specifies an input field. The name that is entered in this field is
substituted for the @1 in the DIALOG statement.
dialog deother 'end;pgm;include de;c deptxx @1;submit';
text #1 @1 'Enter department name';
text #2 @3 'in the form DEPT99:';
text #2 @25 len=7;
Design the Print_Report menu. This group of statements defines the choices under
the Print_Report item. These ITEM statements specify that For Dept01 and For
Dept02 appear in the menu. The value of the SELECTION= option equates to a
subsequent SELECTION statement, which contains the string of commands that
are actually submitted.
menu deptsprt;
item 'For Dept01' selection=prt1;
item 'For Dept02' selection=prt2;
item 'Other Departments' dialog=prother;
Specify commands for the Print_Report menu. The commands in single quotation
marks are submitted when the user selects For Dept01 or For Dept02. The END
command ends the current window, and the PGM command opens the PROGRAM EDITOR
window so that further commands can be submitted. The INCLUDE command includes the
SAS statements that print the report. (For more information, see “Printing a
Program” on page 1734.) The CHANGE command modifies the PROC PRINT step in
the included program so that it prints the correct data set. The SUBMIT command
submits the PROC PRINT program.
selection prt1
'end;pgm;include prt;change xx 01 all;submit';
selection prt2
'end;pgm;include prt;change xx 02 all;submit';
Design the PROTHER dialog box. The DIALOG statement defines the dialog box
that appears when the user selects Other Departments. The DIALOG statement
modifies the command string so that the name of the department that is entered by
the user is used to change deptxx in the SAS program that is included. The first two
TEXT statements specify text that appears in the dialog box. The third TEXT
statement specifies an input field. The name entered in this field is substituted for
the @1 in the DIALOG statement.
dialog prother 'end;pgm;include prt;c deptxx @1 all;submit';
text #1 @1 'Enter department name';
text #2 @3 'in the form DEPT99:';
text #2 @25 len=7;
Specify a second catalog entry and menu bar. The MENU statement specifies
ENTRDATA as the name of the catalog entry that this RUN group is creating. File is
the only item on the menu bar. The selections available are End this window and
End this SAS session.
menu entrdata;
item 'File' menu=f;
menu f;
item 'End this window' selection=endwdw;
item 'End this SAS session' selection=endsas;
selection endwdw 'end';
selection endsas 'bye';
run;
quit;
Other Examples
The WINDOW statement creates the HRSELECT window. MENU= associates the
PROCLIB.MENUS.SELECT.PMENU entry with this window.
data _null_;
window hrselect menu=proclib.menus.select
#4 @10 'This application allows you to'
#6 @13 '- Enter human resources data for'
#7 @15 'one department at a time.'
#9 @13 '- Print reports on human resources data for'
#10 @15 'one department at a time.'
#12 @13 '- End the application and return to the PGM window.'
#14 @13 '- Exit from the SAS System.'
#19 @10 'You must have the menus turned on.';
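A WINDOW statement only defines a window. A DISPLAY statement is what opens it. The following minimal sketch abbreviates the HRSELECT text and assumes that PROCLIB is assigned and that the PROCLIB.MENUS.SELECT.PMENU entry exists:

```sas
/* Assumes PROCLIB is assigned and PROCLIB.MENUS.SELECT.PMENU exists. */
/* The window text is abbreviated from the HRSELECT example.           */
data _null_;
   window hrselect menu=proclib.menus.select
      #4 @10 'This application allows you to'
      #19 @10 'You must have the menus turned on.';
   display hrselect;   /* WINDOW only defines the window; DISPLAY opens it */
run;
```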
The WINDOW statement creates the HRDATA window. MENU= associates the
PROCLIB.MENUS.ENTRDATA.PMENU entry with the window.
data proclib.deptxx;
window hrdata menu=proclib.menus.entrdata
#5 @10 'Employee Number'
#8 @10 'Salary'
#11 @10 'Employee Name'
#5 @31 empno $4.
#8 @31 salary 10.
#11 @31 name $30.
#19 @10 'Press ENTER to add the observation to the data set.';
The %INCLUDE statement recalls the statements in the file HRWDW. The
statements in HRWDW redisplay the primary window. See the HRSELECT window
on page 1732.
filename hrwdw 'external-file';
%include hrwdw;
run;
The SELECTION and DIALOG statements in the PROC PMENU step modify the
DATA statement in this program so that the correct department name is used when
the data set is created. That is, if the user selects Other Departments and enters
DEPT05, then the DATA statement is changed by the command string in the DIALOG
statement to
data proclib.dept05;
Printing a Program
When the user selects Print_Report from the menu bar, a menu is displayed. When
the user selects one of the listed departments or chooses to enter a different
department, the following statements are invoked. These statements are stored in
the external file referenced by the PRT fileref.
PROC PRINTTO routes the output to an external file.
proc printto
file='external-file' new;
run;
libname proclib
'SAS-data-library';
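The CHANGE commands on the Print_Report menu edit a PROC PRINT step in this file. A minimal sketch of such a step follows, and the TITLE statement in it is a hypothetical addition. The sketch shows why the print selections use CHANGE with ALL: without ALL, CHANGE edits only the next occurrence of xx, and here xx appears twice.

```sas
/* Hypothetical sketch: after the selection runs 'change xx 01 all', */
/* every "xx" in these lines becomes "01".                          */
proc print data=proclib.deptxx;
   title 'Human resources data for deptxx';   /* hypothetical title */
run;
```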
This PROC PRINTTO step restores the default output destination. See Chapter 49,
“PRINTTO Procedure,” on page 1857.
proc printto;
run;
The %INCLUDE statement recalls the statements in the file HRWDW. The
statements in HRWDW redisplay the primary window.
filename hrwdw 'external-file';
%include hrwdw;
run;
Example 5: Associating Menus with a FRAME Application 1735
Details
This example creates menus for a FRAME entry and gives the steps necessary to
associate the menus with a FRAME entry from SAS/AF software.
Program
libname proclib
'SAS-data-library';
proc pmenu catalog=proclib.menucat;
menu frame;
item 'File' menu=f;
item 'Help' menu=h;
menu f;
item 'Cancel';
item 'End';
menu h;
item 'About the application' selection=a;
item 'About the keys' selection=k;
selection a 'sethelp proclib.menucat.app.help;help';
selection k 'sethelp proclib.menucat.keys.help;help';
run;
quit;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store menu
definitions.
libname proclib
'SAS-data-library';
Specify the catalog for storing menu definitions. Menu definitions will be stored in
the PROCLIB.MENUCAT catalog.
proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement specifies FRAME as
the name of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.FRAME.PMENU.
menu frame;
Design the menu bar. The ITEM statements specify the items in the menu bar. The
value of MENU= corresponds to a subsequent MENU statement.
item 'File' menu=f;
item 'Help' menu=h;
Design the File menu. The MENU statement equates to the MENU= option in a
preceding ITEM statement. The ITEM statements specify the selections that are
available under File on the menu bar.
menu f;
item 'Cancel';
item 'End';
Design the Help menu. The MENU statement equates to the MENU= option in a
preceding ITEM statement. The ITEM statements specify the selections that are
available under Help on the menu bar. The value of the SELECTION= option
equates to a subsequent SELECTION statement.
menu h;
item 'About the application' selection=a;
item 'About the keys' selection=k;
Specify commands for the Help menu. The SETHELP command specifies a HELP
entry that contains user-written information for this application. The semicolon
that appears after the HELP entry name enables the HELP command to be included
in the string. The HELP command invokes the HELP entry.
selection a 'sethelp proclib.menucat.app.help;help';
selection k 'sethelp proclib.menucat.keys.help;help';
run;
quit;
2 In the Properties window, select the Value field for the pmenuEntry Attribute
Name. The Select An Entry window appears.
3 In the Select An Entry window, enter the name of the catalog entry that is
specified in the PROC PMENU step that creates the menus.
4 Test the FRAME by selecting Build ► Test from the menu bar of the FRAME.
Notice in the following display that the menus are now associated with the
FRAME.
For more information about programming with FRAME entries, see Getting
Started with SAS/AF(R) 9.3 and Frames.
Chapter 47
PRESENV Procedure
Note: This procedure is not available in SAS Viya orders that include only SAS
Visual Analytics.
The PRESENV procedure works with the PRESENV system option to preserve your
SAS program and data sets. You can turn the option on or off at any time. When the
PRESENV system option is turned off, the global statements collection is
suspended. When turned back on, the collection resumes. At no point is the
collection discarded. However, the collection does not begin until the first time the
option is turned on. The following global statements are collected:
n CATNAME
n FILENAME
n FOOTNOTE
n GOPTIONS
n LEGEND
n LIBNAME
n LOCK
n MISSING
n OPTIONS
n PATTERN
n SASFILE
n SYMBOL
n TITLE
PROC PRESENV Statement 1741
Macro variables that are used in your program are also collected in memory, but
other global statements, such as the X command, are not collected. Macros that are
compiled during the program execution are stored in a Work directory, and that
directory is copied as part of the PROC PRESENV execution.
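The following sketch turns collection on, suspends it, and resumes it. It assumes that the PRESENV option can be set with the OPTIONS statement in a running session, as the statement that you can turn it on or off at any time suggests.

```sas
/* Sketch: controlling collection with the OPTIONS statement.        */
options presenv;     /* collection begins (or resumes)                */
title 'Collected while PRESENV is on';
options nopresenv;   /* collection is suspended; nothing is discarded */
footnote 'Issued while PRESENV is off';
options presenv;     /* collection resumes                            */
```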
Syntax
PROC PRESENV PERMDIR=libref SASCODE=fileref <SHOW_COMMENTS>;
Required Arguments
PERMDIR=libref
specifies a libref where all of the Work data sets, catalogs, and macros are
written.
SASCODE=fileref
specifies a fileref where a SAS program is written. The SAS program contains all
of the code that is necessary to restore the environment.
1742 Chapter 47 / PRESENV Procedure
Optional Argument
SHOW_COMMENTS
displays all global statements. Redundant global statements are commented
out.
If this option is not used, then the redundant global statements are suppressed.
Tip Use this option only for debugging your program, because this option can
greatly increase the amount of text that is being generated.
The value of PERMDIR is a libref where all of the Work data sets and catalogs
(including work.sasmacr) are written. The value of SASCODE is a fileref where a
SAS program is written. The SAS program contains all of the code that is necessary
to restore the environment.
To restore the environment in a later SAS session, submit a %INCLUDE statement for
restore-file, which is the filename that is associated with the fileref in the SASCODE=
argument of the original job. When you execute the program, all macros, macro
variables, options, and global statements are restored to their original values.
Example 1: Preserve a SAS Environment 1743
Details
This example shows a program with features that you want to save to use again in a
later SAS session. Before you run this program, you invoke SAS and specify the
PRESENV system option. Your SAS invocation command might look like this:
sas -presenv
This program sets the PS= and LS= system options and creates a data set called
Mydata1 in the Work directory. This program also defines a macro variable
MYMACVAR, defines a macro SOMEMAC, and creates a protected data set called
Protected in the Work directory. At the end of the program, you define the location
to save any data sets and variable definitions, and you specify the name of the file
that will contain the SAS code that restores these items in a subsequent SAS session.
Program
options ps=100 ls=100;
data mydata1;
a=1; b=2; c=3;
run;
%let mymacvar=123;
%macro somemac;
data mydata2;
y=3;
run;
%put Data set Mydata2 is from the saved macro somemac;
%mend;
Program Description
Define system options. In the OPTIONS statement, you define the PS= and LS=
system options.
options ps=100 ls=100;
Create a data set in the Work directory. You use a DATA step to create the
Mydata1 data set. This data set contains one observation and three variables: A, B,
and C.
data mydata1;
a=1; b=2; c=3;
run;
Define a macro variable and a macro. Use the %LET statement to define the macro
variable MYMACVAR. Use the %MACRO and %MEND statements to define a new
macro called SOMEMAC. The SOMEMAC macro creates a data set called Mydata2
with one observation and one variable. Then the %PUT statement writes a message to
the log.
%let mymacvar=123;
%macro somemac;
data mydata2;
y=3;
run;
%put Data set Mydata2 is from the saved macro somemac;
%mend;
Create a protected data set. Use a DATA step to create a data set called Protected
that requires a password for Read and Write access. The data set contains one
observation with two variables.
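A minimal sketch of such a step follows. It assumes the hypothetical passwords READPW and WRITEPW, and it places each password option on its own line so that SAS blots it in the log.

```sas
/* READPW and WRITEPW are hypothetical passwords. Each password option */
/* is on its own line so that it is blotted in the log.                */
data protected(read=readpw
               write=writepw);
   x=1; y=2;   /* one observation with two variables */
run;
```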
Example 2: Restore a SAS Environment 1745
Save the data sets, macro variable definition, and macro definition for use in a later SAS
session. Use the LIBNAME statement to define a location in which to save the data
sets and macro definitions. Use the FILENAME statement to assign the fileref Prescode
to the file my_sas_env, which receives the SAS code that restores the current SAS
environment. Call the PRESENV procedure and specify Preslib and Prescode as the
values of the PERMDIR= and SASCODE= arguments. Remember the name my_sas_env so
that you can restore the SAS environment in a later session.
libname preslib 'C:\Users\<userid>\sasuser\projectA\';
filename prescode 'my_sas_env';
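With the libref and fileref assigned, the PROC PRESENV step itself can be as short as the following sketch, which uses only the two required arguments from the syntax:

```sas
/* Writes the Work data sets, catalogs (including WORK.SASMACR), and */
/* macros to PRESLIB, and writes the restore program to PRESCODE.    */
proc presenv permdir=preslib sascode=prescode;
run;
```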
Details
This example shows how to restore the SAS environment to the state that it was in
after running the previous example program.
Program
%include 'my_sas_env';
data _null_;
optval = getoption('ps');
put " ps = " optval;
optval = getoption('ls');
put " ls = " optval;
run;
%somemac;
data newdata;
x=&mymacvar;
y=2;
run;
data mydata3;
set mydata1;
run;
Program Description
Restore the previous SAS environment. Use the %INCLUDE statement to restore
the SAS environment that is associated with the my_sas_env file. This restores the
SAS environment to match the state when PROC PRESENV was run in a previous
SAS session.
%include 'my_sas_env';
Verify the system options that were set for the my_sas_env environment. Use the
GETOPTION function to request the values for the PS= and LS= system options.
The PUT statement prints the values to the SAS log. These values match the values
that were set in the previous example.
data _null_;
optval = getoption('ps');
put " ps = " optval;
optval = getoption('ls');
put " ls = " optval;
run;
Use a macro and a macro variable that were defined in the my_sas_env
environment. Run the %SOMEMAC macro to generate the Mydata2 data set and
print a message to the SAS log. Use the DATA step to create a data set called
Newdata that contains the value of the MYMACVAR macro variable. Call PROC
PRINT to print the contents of Newdata.
%somemac;
data newdata;
x=&mymacvar;
y=2;
run;
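A minimal form of the PROC PRINT step that the description mentions:

```sas
proc print data=newdata;   /* lists X (from &mymacvar) and Y */
run;
```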
Work with data sets that were created in the my_sas_env environment. Use the
DATA step to create a data set Mydata3 from the existing data set Mydata1.
Mydata1 was created in the previous example. Call PROC PRINT to print the
contents of the new data set Mydata3. Call PROC PRINT again to print the
contents of the existing data set Protected. The Protected data set was created in
the previous example and the protected status of the data set, which requires a
password to Read or Alter it, remains in place.
data mydata3;
set mydata1;
run;
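A minimal sketch of the two PROC PRINT steps follows. READPW is a hypothetical password, so substitute the READ= password that was assigned when Protected was created.

```sas
proc print data=mydata3;
run;

/* READPW is hypothetical; use the READ= password that was assigned */
/* when Protected was created.                                       */
proc print data=protected(read=readpw);
run;
```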
1
2 %include 'my_sas_env';
NOTE: Libref PRESLIB was successfully assigned as follows:
Engine: V9
Physical Name: C:\Users\<userid>\sasuser\projectA
40
41 data _null_;
42 optval = getoption('ps');
43 put " ps = " optval;
44 optval = getoption('ls');
45 put " ls = " optval;
46 run;
ps = 100
ls = 100
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
47
48 %somemac;
NOTE: There were 1 observations read from the data set WORK.NEWDATA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
56
57 data mydata3;
58 set mydata1;
59 run;
NOTE: There were 1 observations read from the data set WORK.MYDATA1.
NOTE: The data set WORK.MYDATA3 has 1 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
60
61 proc print data=mydata3; run;
NOTE: There were 1 observations read from the data set WORK.MYDATA3.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
62
63 proc print data=protected (read=XXXXXX); run;
NOTE: There were 1 observations read from the data set WORK.PROTECTED.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
Chapter 48
PRINT Procedure
A Simple Report
The following output illustrates the simplest type of report that you can produce.
The statements that produce the output follow. “Example 2: Selecting Variables to
Print” on page 1797 creates the data set EXPREV.
options obs=10;
proc print data=exprev;
run;
TIP The OBS= system option remains in effect for all subsequent steps in your
current SAS session until you change the setting. To set the number of observations
for a single PROC step, use the OBS= data set option:
proc print data=exprev(obs=10);
run;
For more information, see “OBS= Data Set Option” in SAS Data Set Options:
Reference and “OBS= System Option” in SAS System Options: Reference.
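Because the OBS= system option persists, the following sketch limits one report and then restores the default with OBS=MAX. It uses the SASHELP.CLASS sample data set.

```sas
options obs=10;       /* applies to every later step in the session */
proc print data=sashelp.class;
run;
options obs=max;      /* restores the default: read all observations */
```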
Overview: PRINT Procedure 1753
This next example creates the CAS table Mycas.Cars as a subset of the
Sashelp.cars data set:
options cashost="cloud.example.com" casport=5555;
cas casauto;
libname mycas cas;
data mycas.cars;
set sashelp.cars(where=(weight>6000));
keep make model type;
run;
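The following sketch prints the new CAS table and then ends the session. It assumes that the CASAUTO session started above is still active.

```sas
proc print data=mycas.cars;   /* rows with WEIGHT > 6000 */
run;

cas casauto terminate;        /* ends the CAS session started above */
```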
Customized Report
The following HTML report is a customized report that is produced by PROC PRINT
using ODS. The statements that create this report do the following:
n customize the title and the column headings
n sum the values for Salary for each job code and for all job codes, and add a label
for the summary line and the grand total line
For an explanation of the program that produces this report, see “Program: Creating
an HTML Report with the STYLE Option” on page 1849.
Syntax: PRINT Procedure 1755
Tips: Each password and encryption key option must be coded on a separate line to
ensure that they are properly blotted in the log.
Supports the Output Delivery System. For details, see “Output Delivery System:
Basic Concepts” in SAS Output Delivery System: User’s Guide.
You can use the ATTRIB, FORMAT, LABEL, TITLE, and WHERE statements. See
SAS DATA Step Statements: Reference. For more information, see “Statements with
the Same Function in Multiple Procedures” on page 73.
SAS includes checks to verify that the PROC PRINT output is accessible for the
visually impaired. You can set the ACCESSIBLECHECK system option to have SAS
verify whether the output is accessible. For best practices about creating accessible
output, see Creating Accessible Output in SAS Using ODS and ODS Graphics.
SUMBY   Limit the number of sums that appear in the report (Ex. 5, Ex. 6, Ex. 7, Ex. 9)
VAR     Select variables that appear in the report and determine their order (Ex. 2, Ex. 3, Ex. 9)
PROC PRINT Statement 1757
Syntax
PROC PRINT <options>;
Optional Arguments
BLANKLINE=n
BLANKLINE=(COUNT=n <STYLE=[style-attribute-specification(s)]>)
specifies to insert a blank line after every n observations. The observation count
is reset to 0 at the beginning of each BY group for all ODS destinations.
n
COUNT=n
specifies the number of observations after which SAS inserts a blank line.
STYLE=[style-attribute-specification(s)]
specifies the style attribute to use for the blank line.
Default DATA
See the STYLE= option on page 1764 for valid style attributes.
Tip SAS includes checks to verify that the PROC PRINT output is accessible
for the visually impaired. When you specify the BLANKLINE= option, the
output that PROC PRINT creates includes one or more lines that are not
data. Screen readers and users might interpret these lines incorrectly.
When you set the ACCESSIBLECHECK system option, SAS checks to see if
the BLANKLINE option has been used to add blank lines to the output. If
blank lines are in the output, SAS writes a warning message to the SAS log.
For best practices about creating accessible output, see Creating
Accessible Output in SAS Using ODS and ODS Graphics.
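The following two sketches of BLANKLINE= use the SASHELP.CLASS sample data set. The color value cxEEEEEE (light gray in RGB notation) is an arbitrary choice.

```sas
/* Blank line after every 5 observations */
proc print data=sashelp.class blankline=5;
run;

/* Blank line after every 3 observations, with a light gray */
/* background (cxEEEEEE is an RGB color value)              */
proc print data=sashelp.class
           blankline=(count=3 style=[backgroundcolor=cxEEEEEE]);
run;
```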
link-text
specifies text to use in the table of contents.
#BYLINE
substitutes the entire BY line without leading or trailing blanks for #BYLINE
in the text string. The BY line uses the format variable-name=value.
#BYVALn
#BYVAL(BY-variable-name)
substitutes the current value of the specified BY variable for #BYVAL in the
text string. Specify the variable with one of these values:
n
specifies a variable by its position in the BY statement. For example,
#BYVAL2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAL(YEAR) specifies the BY variable, YEAR. BY-variable-name is not
case sensitive.
#BYVARn
#BYVAR(BY-variable-name)
substitutes the name of the BY variable or the label associated with the
variable (whatever the BY line would normally display) for #BYVAR in the
text string. Specify the variable with one of these values:
n
specifies a variable by its position in the BY statement. For example,
#BYVAR2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAR(SITES) specifies the BY variable, SITES. BY-variable-name is not
case sensitive.
Restrictions CONTENTS= does not affect the HTML body file. It affects only the
HTML contents file.
See For information about HTML output, see Files Produced by the
HTML Destination and “ODS HTML Statement” in SAS Output
Delivery System: User’s Guide.
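The following sketch uses CONTENTS= with a BY-value substitution. The HTML file names are hypothetical, and the data must be sorted by the BY variable first.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* Hypothetical file names for the HTML body and contents files */
ods html body='class_body.html' contents='class_contents.html';

/* One table-of-contents link per BY group; #BYVAL(SEX) is replaced */
/* by the current value of SEX in the link text.                    */
proc print data=work.class_sorted contents='Students: #byval(sex)';
   by sex;
run;

ods html close;
```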
DATA=SAS-data-set
specifies the SAS data set to print.
DOUBLE
writes a blank line between observations.
1760 Chapter 48 / PRINT Procedure
Alias D
GRANDTOTAL_LABEL='label'
displays a label on the grand total line. You can include the #BYVAR and
#BYVAL variables in 'label'.
Aliases GRAND_LABEL
GRANDTOT_LABEL
GTOT_LABEL
GTOTAL_LABEL
Restriction The #BYVAR and #BYVAL variables are not supported for the
LISTING destination.
Tip SAS includes checks to verify that the PROC PRINT output is
accessible for the visually impaired. When you set the
ACCESSIBLECHECK system option, SAS verifies whether a label is
available for both the SUMLABEL and the GRANDTOTAL_LABEL
options. If SAS detects that the output does not have a label for the
summary and grand total values, SAS writes a message to the log.
For best practices about creating accessible output, see Creating
Accessible Output in SAS Using ODS and ODS Graphics.
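The following sketch labels each BY-group sum and the grand total, using SASHELP.CLASS. It assumes that SUMLABEL= accepts #BYVAL in the same way that GRANDTOTAL_LABEL= does. Because of the restriction above, it also assumes that the output goes to an ODS destination other than LISTING.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* #BYVAL in a label is not supported for the LISTING destination, */
/* so view this output in HTML or another ODS destination.         */
proc print data=work.class_sorted
           sumlabel='Total for #byval(sex)'
           grandtotal_label='Total for all students';
   by sex;
   sum weight;
run;
```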
HEADING=HORIZONTAL | VERTICAL
controls the orientation of the column headings.
HORIZONTAL
prints all column headings horizontally.
Alias H
VERTICAL
prints all column headings vertically.
Alias V
Restriction For LISTING output, if the column heading is too long for the
page, the variable name is used in place of a label.
Default Headings are either all horizontal or all vertical. If you omit HEADING=,
PROC PRINT determines the direction of the column headings as
follows:
If you use LABEL and at least one variable has a label, all headings
are horizontal.
LABEL
specifies to use the variables' labels as column headings.
Alias L
Default PROC PRINT uses the name of the variable as the column heading
in the following two circumstances:
1. if you omit the LABEL option in the PROC PRINT statement,
even if the PROC PRINT step contains a LABEL statement
2. if a variable does not have a label
Interactions By default, if you specify LABEL and at least one variable has a
label, PROC PRINT prints all column headings horizontally.
Therefore, using LABEL might increase the number of pages of
output. (Use HEADING=VERTICAL in the PROC PRINT statement
to print vertical column headings.)
Note The SAS system option LABEL must be in effect in order for any
procedure to use labels. For more information, see “LABEL System
Option” in SAS System Options: Reference.
Tip To create a blank column heading for a variable, use this LABEL
statement in your PROC PRINT step:
label variable-name='00'x;
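The following sketch uses a label as one column heading and uses the '00'x label to blank out another:

```sas
proc print data=sashelp.class label;
   label height='Student height'   /* used as the column heading */
         name='00'x;               /* blank column heading       */
run;
```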
N<=“string-1” <“string-2”>>
prints the number of observations in the data set, in BY groups, or both and
specifies explanatory text to print with the number.
N Option
Use PROC PRINT                           Action
With neither a BY nor a SUM statement    Prints the number of observations in the data set at the end of the report and labels the number with the value of string-1.
Tip SAS includes checks to verify that the PROC PRINT output is
accessible for the visually impaired. When you specify the N= option,
the output that PROC PRINT creates includes a text line that is not
data. Screen readers might interpret this line of text as data. When
you set the ACCESSIBLECHECK system option, SAS verifies whether
the output is accessible. If the output contains the text that is not
data, SAS writes a warning message to the SAS log. For best
practices about creating accessible output, see Creating Accessible
Output in SAS Using ODS and ODS Graphics.
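The following sketch shows the case in the table, with neither a BY nor a SUM statement:

```sas
/* No BY or SUM statement, so the count of all observations */
/* prints at the end of the report, labeled with string-1.   */
proc print data=sashelp.class n='Number of students:';
run;
```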
NOOBS
suppresses the column in the output that identifies each observation by
number.
OBS=“column-header”
specifies a column heading for the column that identifies each observation by
number.
Tip OBS= honors the split character. (See the discussion of the SPLIT=
option on page 1764.)
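The following sketch combines OBS= with the split character and then uses NOOBS for comparison:

```sas
/* Custom heading for the observation-number column; the asterisk */
/* is the split character, so the heading prints on two lines.    */
proc print data=sashelp.class obs='Row*Number' split='*';
run;

/* NOOBS drops the observation-number column entirely */
proc print data=sashelp.class noobs;
run;
```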
ROUND
rounds unformatted numeric values to two decimal places. (Formatted values
are already rounded by the format to the specified number of decimal places.)
For both formatted and unformatted variables, PROC PRINT uses these
rounded values to calculate any sums in the report.
If you omit ROUND, PROC PRINT adds the actual values of the rows to obtain
the sum even though it displays the formatted (rounded) values. Any sums are
also rounded by the format, but they include only one rounding error, that of
rounding the sum of the actual values. The ROUND option, on the other hand,
rounds values before summing them, so there might be multiple rounding errors.
The results without ROUND are more accurate, but ROUND is useful for
published reports where it is important for the total to be the sum of the printed
(rounded) values.
Be aware that the results from PROC PRINT with the ROUND option might
differ from the results of summing the same data with other methods such as
PROC MEANS or the DATA step. Consider a simple case in which the following
is true:
n The data set contains three values for X: .003, .004, and .009.
Depending on how you calculate the sum, you can get three different answers:
0.02, 0.01, and 0.016. The following figure shows the results of calculating the
sum with PROC PRINT (without and with the ROUND option) and PROC
MEANS.
Notice that the sum produced without the ROUND option (.02) is closer to the
actual result (0.016) than the sum produced with ROUND (0.01). However, the
sum produced with ROUND reflects the numbers that are displayed in the
report.
Alias R
CAUTION Do not use ROUND with PICTURE formats. ROUND is for use with
numeric values. SAS procedures treat variables that have picture formats as
character variables. Using ROUND with such variables might lead to
unexpected results.
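The following sketch reproduces the three-value example for ROUND. The 5.2 format for X is an assumption. It was chosen because it displays two decimal places, like the sums quoted above.

```sas
/* The three values from the ROUND example. The 5.2 format is an */
/* assumption: a two-decimal format matching the sums described. */
data work.small;
   input x;
   format x 5.2;
   datalines;
.003
.004
.009
;
run;

/* Sums the actual values, then formats the sum */
proc print data=work.small;
   sum x;
run;

/* Rounds each value to two decimal places before summing */
proc print data=work.small round;
   sum x;
run;

/* Sum of the unrounded values */
proc means data=work.small sum;
   var x;
run;
```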
ROWS=page-format
formats the rows on a page. Currently, PAGE is the only value that you can use
for page-format:
PAGE
prints only one row of variables for each observation per page. When you
use ROWS=PAGE, PROC PRINT does not divide the page into sections; it
prints as many observations as possible on each page. If the observations do
not fill the last page of the output, PROC PRINT divides the last page into
sections and prints all the variables for the last few observations.
Restriction ROWS= is valid only for the ODS LISTING destination. Therefore,
HTML output from PROC PRINT appears the same whether or not
you use ROWS=.
Tip The PAGE value can reduce the number of pages in the output if the
data set contains large numbers of variables and observations.
However, if the data set contains a large number of variables but few
observations, the PAGE value can increase the number of pages in
the output.
See “Page Layout for Limited Page Sizes” on page 1792 for discussion of
the default layout.
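The following sketch shows ROWS=PAGE. Because the option applies only to LISTING output, the sketch opens the LISTING destination first. It uses SASHELP.CARS because that data set has many variables.

```sas
/* ROWS= affects only the LISTING destination */
ods listing;
proc print data=sashelp.cars(obs=30) rows=page;
run;
```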
SPLIT='split-character'
specifies the split character, which controls line breaks in column headings.
SPLIT= also causes PROC PRINT to use labels as column headings. PROC PRINT breaks a
column heading when it reaches the split character and continues the heading on the
next line.
The split character is not part of the column heading although each occurrence
of the split character counts toward the 256-character maximum for a label.
Alias S=
Interactions You do not need to use both LABEL and SPLIT= because SPLIT=
implies the use of labels.
The OBS= option honors the split character. (See the discussion of
“OBS=“column-header”” on page 1762.)
Note PROC PRINT does not split labels of BY variables in the heading
preceding each BY group, a summary label, or a grand total label,
even if you specify SPLIT=. Instead, PROC PRINT replaces the split
character with a blank.
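The following sketch uses SPLIT= without the LABEL option. The asterisk in each label starts a new heading line.

```sas
/* SPLIT= implies LABEL, so the labels become headings; each */
/* asterisk starts a new line in the heading.                 */
proc print data=sashelp.class split='*';
   label height='Student*Height'
         weight='Student*Weight';
run;
```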
STYLE <(location(s))>=<style-override(s)>
specifies one or more ODS style overrides to modify the default style element
and attributes in a specific area of a report.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
location
identifies the part of the report that the STYLE option affects. If location(s)
is not specified, PROC PRINT determines where the style override is applied
based on the statement, the specified style element, and
the style attribute.
The following table shows the available locations and the other statements
in which you can specify them.
Table 48.1 Specifying Locations in the STYLE Option
Or data in the ID columns when the DATA location is specified in the STYLE= option of the ID statement
BYSUM
1 Prior to SAS 9.4, if you specified the HEADER location in the STYLE= option of the PROC PRINT
statement, all column headings were rendered with the HEADER style attributes. In SAS 9.4, you use
the OBSHEADER location in the STYLE= option of the PROC PRINT statement to format the
OBS column and the ID columns. The PROC PRINT statement STYLE= option in your existing
programs might need to include the OBSHEADER location as well as the HEADER location.
If the same style attributes appear for the OBSHEADER location in the
PROC PRINT statement and the HEADER location in the ID statement, the
HEADER location attributes override the OBSHEADER attributes. All other
style attributes for the ID columns in both the PROC PRINT statement and
the ID statement are merged to create the style for the ID columns. For
example, in the PROC PRINT statement, the attributes for the OBSHEADER
location are {fontsize=5 fontweight=bold}. In the ID statement, the
attributes for the HEADER location are [fontsize=6 fontstyle=italic].
The resulting style for the ID column is [fontsize=6 fontweight=bold
fontstyle=italic].
proc print data=exprev style(obsheader)={fontsize=5 fontweight=bold};
id country / style(header)=[fontsize=6 fontstyle=italic];
run;
If the same style attributes appear for the OBS location in the PROC PRINT
statement and the DATA location in the ID statement, the DATA location
attributes override the OBS attributes. All other style attributes for the ID
columns in both the PROC PRINT statement and the ID statement are
merged to create the style for the ID columns. For example, in the PROC
PRINT statement, the attributes for the OBS location are
{backgroundcolor=light gray color=blue}. In the ID statement, the
attributes for the DATA location are [color=white fontstyle=italic]. The
resulting style for the ID column is [backgroundcolor=light gray
color=white fontstyle=italic].
proc print data=exprev style(obs)={backgroundcolor=light gray color=blue};
id country / style(data)=[color=white fontstyle=italic];
run;
style-element-name
is the name of a style element in a style template that is registered with the
Output Delivery System. SAS provides some style templates. Users can
create their own style templates with the TEMPLATE procedure. See SAS
Output Delivery System: Procedures Guide.
When style elements are processed, more specific style elements override
less specific style elements. For a table of default style elements and style
attributes for each PROC PRINT location, see Table 48.95 on page 1789.
Tip You can use compound names and formats for style element names. An
example of a compound style element name is
style(obsheader)=data.italic.red. An example of using a format as a
style element name is style=$cities. For more information about using
formats, see the SAS Output Delivery System: Procedures Guide.
style-attribute-specification
describes the style attribute to change. Each style-attribute-specification has
this general form:
style-attribute-name=style-attribute-value
You can set these style attributes in the TABLE location:
BACKGROUNDCOLOR=, BACKGROUNDIMAGE=, BORDERCOLOR=, BORDERCOLORDARK=,
BORDERCOLORLIGHT=, BORDERWIDTH=, CELLPADDING=, CELLSPACING=, COLOR=¹,
FONT=¹, FONTFAMILY=¹, FONTSIZE=¹, FONTSTYLE=¹, FONTWEIGHT=¹, FONTWIDTH=¹,
FRAME=, HTMLCLASS=, OUTPUTWIDTH=, POSTHTML=, POSTIMAGE=, POSTTEXT=,
PREHTML=, PREIMAGE=, PRETEXT=, RULES=, TEXTALIGN=
¹ When you use these attributes in the TABLE location, they affect only the text that is specified
with the PRETEXT=, POSTTEXT=, PREHTML=, and POSTHTML= attributes. To alter the foreground color
or the font for the text that appears in the table, you must set the corresponding attribute in a
location that affects the cells rather than the table.
You can set these style attributes in all locations other than TABLE:
You can set these style attributes in all locations other than TABLE:
1770 Chapter 48 / PRINT Procedure
ASIS=, BACKGROUNDCOLOR=, BACKGROUNDIMAGE=, BORDERCOLOR=, BORDERCOLORDARK=,
BORDERCOLORLIGHT=, BORDERWIDTH=, CELLWIDTH=, CLASS=, FLYOVER=, FONT=,
FONTFAMILY=, FONTSIZE=, FONTSTYLE=, FONTWEIGHT=, FONTWIDTH=, HEIGHT=,
HREFTARGET=, NOBREAKSPACE=, POSTHTML=, POSTIMAGE=, POSTTEXT=, PREHTML=,
PREIMAGE=, PRETEXT=, PROTECTSPECIALCHARACTERS=, TAGATTR=, TEXTALIGN=, URL=,
VERTICALALIGN=
Restriction STYLE= is not valid for the ODS LISTING or ODS OUTPUT
destinations.
See For a table of style attributes that can be used with PROC
TABULATE, PROC REPORT, and PROC PRINT, see Table 48.94 on
page 1786.
For a table of default style elements and style attributes for each
PROC PRINT location, see Table 48.95 on page 1789.
For more information about using styles with PROC PRINT, see “Use
ODS Styles with PROC PRINT” on page 1780.
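The following sketch combines several locations described above: TABLE-level attributes (FRAME= and RULES=) and the HEADER and OBSHEADER locations in the PROC PRINT statement, plus a DATA-location override in a VAR statement. It assumes the SASHELP.CLASS sample data set that ships with SAS; the HTML file name is a placeholder.

```sas
/* The file name is a placeholder; use any writable path */
ods html file='style_demo.html';

proc print data=sashelp.class(obs=5)
      style(table)=[frame=box rules=groups]
      style(header)=[backgroundcolor=yellow fontweight=bold]
      style(obsheader)=[backgroundcolor=yellow fontweight=bold];
   var name age;
   /* This override affects only the data cells of Height */
   var height / style(data)=[color=blue];
run;

ods html close;
```

Because STYLE= is not valid for LISTING, the sketch writes to HTML. OBSHEADER is listed separately so that the Obs heading matches the other headings, as described in the SAS 9.4 note above.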
SUMLABEL
NOSUMLABEL
SUMLABEL='label'
specifies whether to display a label on the summary line for a BY group.
SUMLABEL
specifies to use the variable label, if it exists, as the label on the summary
line in place of the variable name.
NOSUMLABEL
specifies to leave the label on the summary line blank. Alternatively, you can
use SUMLABEL="" (two single or double quotation marks with no space
between them) to indicate a blank on the summary line.
SUMLABEL='label'
specifies the text to use as a label on the summary line of a BY group. You
can include the #BYVAR and #BYVAL variables in 'label'.
Restriction The #BYVAR and #BYVAL variables are not supported for the
LISTING destination.
Default If you omit SUMLABEL, PROC PRINT uses the BY variable names in
the summary line.
Tip SAS includes checks to verify that the PROC PRINT output is
accessible for the visually impaired. When you set the
ACCESSIBLECHECK system option, SAS verifies whether a label is
available for both the SUMLABEL and the GRANDTOTAL_LABEL
options. If SAS detects that the output does not have a label for the
summary and grand total values, SAS writes a message to the log. For
best practices about creating accessible output, see Creating
Accessible Output in SAS Using ODS and ODS Graphics.
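A minimal sketch of SUMLABEL= and GRANDTOTAL_LABEL= with one BY variable, assuming the SASHELP.CLASS sample data set. The #BYVAL(variable) form is an assumption here, based on how #BYVAL works in TITLE statements. Because #BYVAL is not supported for the LISTING destination, run the sketch with a destination such as HTML.

```sas
/* BY-group processing requires sorted (or indexed) input */
proc sort data=sashelp.class out=work.class;
   by sex;
run;

/* Assumption: #BYVAL(variable) is resolved here as in TITLE statements */
proc print data=work.class
      sumlabel='#byval(sex) subtotal'
      grandtotal_label='All students';
   by sex;
   var name age weight;
   sum weight;
run;
```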
UNIFORM
See WIDTH=UNIFORM on page 1771.
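WIDTH=FULL | MINIMUM | UNIFORM | UNIFORMBY
determines the column width for each variable. The value of WIDTH= must be
one of the following:
FULL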
uses a variable's formatted width as the column width. If the variable does
not have a format that explicitly specifies a field width, PROC PRINT uses
the default width. For a character variable, the default width is the length of
the variable. For a numeric variable, the default width is 12. When you use
WIDTH=FULL, the column widths do not vary from page to page.
MINIMUM
uses for each variable the minimum column width that accommodates all
values of the variable.
Alias MIN
UNIFORM
uses each variable's formatted width as its column width on all pages. If the
variable does not have a format that explicitly specifies a field width, PROC
PRINT uses the widest data value as the column width. When you specify
WIDTH=UNIFORM, PROC PRINT normally needs to read the data set twice.
However, if all the variables in the data set have formats that explicitly
specify a field width (for example, BEST12. but not BEST.), PROC PRINT
reads the data set only once.
Alias U
Restriction When not all variables have formats that explicitly specify a
width, you cannot use WIDTH=UNIFORM with an engine that
supports concurrent access if another user is updating the data
set at the same time.
Tips If the data set is large and you want a uniform report, you can
save computer resources by using formats that explicitly specify
a field width so that PROC PRINT reads the data only once.
UNIFORMBY
formats all columns uniformly within a BY group, using each variable's
formatted width as its column width. If the variable does not have a format
that explicitly specifies a field width, PROC PRINT uses the widest data
value as the column width.
Alias UBY
Default If you omit WIDTH= and do not specify the UNIFORM option, PROC
PRINT individually constructs each page of output. The procedure
analyzes the data for a page and decides how best to display them.
Therefore, column widths might differ from one page to another.
Tip Column width is affected not only by variable width but also by the
length of column headings. Long column headings might lessen the
usefulness of WIDTH=.
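A sketch of WIDTH=UNIFORM for the LISTING destination, assuming the SASHELP.CLASS sample data set. The KEEP= data set option limits the input to variables that receive explicit-width formats, so that, per the description above, PROC PRINT should need to read the data only once.

```sas
/* WIDTH= affects only the LISTING destination */
ods html close;
ods listing;

/* Every input variable gets a format with an explicit width
   ($8., 3., 5.1), so a single pass through the data should suffice */
proc print data=sashelp.class(keep=name age height) width=uniform;
   format name $8. age 3. height 5.1;
run;

/* Restore the default HTML destination */
ods listing close;
ods html;
```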
BY Statement
Produces a separate section of the report for each BY group.
See: Chapter 3, “Statements with the Same Function in Multiple Procedures,” on page 73
Examples: “Example 4: Creating Separate Sections of a Report for Groups of Observations” on
page 1809
“Example 5: Summing Numeric Variables with One BY Group” on page 1819
“Example 6: Summing Numeric Variables with Multiple BY Variables” on page 1824
“Example 7: Limiting the Number of Sums in a Report” on page 1835
“Example 9: Creating a Customized Layout with BY Groups and ID Variables” on
page 1845
Syntax
BY <DESCENDING> variable-1 <<DESCENDING> variable-2 …> <NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, the observations in the data set must either be sorted using
PROC SORT by all the variables that you specify, or they must be indexed
appropriately. Variables in a BY statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The data is grouped in another way, such as chronological order.
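A minimal BY-statement sketch, assuming the SASHELP.CLASS sample data set. The PROC SORT step uses the same variables, in the same order and direction, as the BY statement in PROC PRINT, including DESCENDING.

```sas
/* Sort by the same BY variables and directions used below */
proc sort data=sashelp.class out=work.class_sorted;
   by sex descending age;
run;

/* One report section per combination of Sex and Age */
proc print data=work.class_sorted;
   by sex descending age;
   var name height weight;
run;
```

If the data were grouped but not sorted, NOTSORTED in the BY statement would replace the PROC SORT step.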
ID Statement
Identifies observations by using the formatted values of the variables that you list instead of by
using observation numbers.
Examples: “Example 8: Controlling the Layout of a Report with Many Variables” on page 1839
“Example 9: Creating a Customized Layout with BY Groups and ID Variables” on
page 1845
Syntax
ID variable(s)
</ STYLE <(location(s))>=<style-override(s)> >;
Required Argument
variable(s)
specifies one or more variables to print instead of the observation number at
the beginning of each row of the report.
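Interaction If ID variables occupy so much space that no room remains on the
line for at least one other variable, PROC PRINT writes a warning to the
SAS log and does not treat all ID variables as ID variables.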
Optional Argument
STYLE <(location(s))>=<style-override(s)>
specifies one or more style overrides to use for ID columns created with the ID
statement.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
Restriction Style specifications for the OBSHEADER location are not valid in the
ID statement.
See For information about the arguments of this option and how it is
used, see the STYLE= option on page 1764 in the PROC PRINT
statement.
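A minimal sketch of the ID statement with a style override, assuming the SASHELP.CLASS sample data set. Because OBSHEADER is not valid in the ID statement, the override targets the DATA location.

```sas
proc print data=sashelp.class(obs=5);
   /* Name replaces the Obs column at the start of each row */
   id name / style(data)=[fontweight=bold];
   var age height weight;
run;
```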
PAGEBY Statement
Controls page ejects that occur before a page is full.
Requirement: BY statement
Example: “Example 4: Creating Separate Sections of a Report for Groups of Observations” on
page 1809
Syntax
PAGEBY BY-variable;
Required Argument
BY-variable
identifies a variable appearing in the BY statement in the PROC PRINT step. If
the value of the BY variable changes, or if the value of any BY variable that
precedes it in the BY statement changes, PROC PRINT begins printing a new
page.
Interaction If you use the BY statement with the SAS system option NOBYLINE,
which suppresses the BY line that normally appears in output
produced with BY-group processing, PROC PRINT always starts a
new page for each BY group. This behavior ensures that if you create
customized BY lines by putting BY-group information in the title and
suppressing the default BY lines with NOBYLINE, the information in
the titles matches the report on the pages. (See “Creating Titles
That Contain BY-Group Information” on page 57.)
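A PAGEBY sketch with two BY variables, assuming the SASHELP.CLASS sample data set. Because Sex precedes Age in the BY statement, PAGEBY SEX starts a new page only when Sex changes.

```sas
proc sort data=sashelp.class out=work.class;
   by sex age;
run;

proc print data=work.class;
   by sex age;
   /* A new page starts whenever Sex changes; a change in Age
      alone does not force a page eject */
   pageby sex;
run;
```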
SUM Statement
Totals values of numeric variables.
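Tip: SAS includes checks to verify that the PROC PRINT output is accessible for the
visually impaired. When you use the SUM statement and set the ACCESSIBLECHECK
system option, SAS verifies whether a label is available for both the SUMLABEL
and the GRANDTOTAL_LABEL options.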
Syntax
SUM variable(s)
</ STYLE <(location(s))>=<style-override(s)> >;
Required Argument
variable(s)
identifies the numeric variables to total in the report.
Optional Argument
STYLE <(location(s))>=<style-override(s)>
specifies one or more style overrides to use for cells containing sums that are
created with the SUM statement.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
Tips To specify different style overrides for different cells reporting sums, use
a separate SUM statement for each variable and add a different STYLE
option to each SUM statement.
If the STYLE option is used in multiple SUM statements that affect the
same location, the STYLE option in the last SUM statement is used.
See For information about the arguments of this option and how it is used, see
the STYLE= option on page 1758 in the PROC PRINT statement.
Details
When you use a SUM statement and a BY statement with multiple BY variables,
PROC PRINT sums the SUM variables for each BY group that contains more than
one observation, just as it does if you use only one BY variable. However, it
provides sums only for those BY variables whose values change when the BY group
changes. (See “Example 6: Summing Numeric Variables with Multiple BY Variables”
on page 1824.)
Note: When the value of a BY variable changes, the SAS System considers that the
values of all variables listed after it in the BY statement also change.
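A sketch of SUM with two BY variables, assuming the SASHELP.CLASS sample data set. It illustrates the behavior described above: subtotals for BY groups that contain more than one observation, an additional sum when the outer BY variable changes, and a grand total.

```sas
proc sort data=sashelp.class out=work.class;
   by sex age;
run;

proc print data=work.class;
   by sex age;
   var name weight;
   /* Weight is summed for each Age group (within Sex) that has more
      than one observation, again whenever Sex changes, and once more
      as a grand total at the end of the report */
   sum weight;
run;
```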
SUMBY Statement
Limits the number of sums that appear in the report.
Requirement: BY statement
Example: “Example 7: Limiting the Number of Sums in a Report” on page 1835
Syntax
SUMBY BY-variable;
Required Argument
BY-variable
identifies a variable that appears in the BY statement in the PROC PRINT step.
If the value of the BY variable changes, or if the value of any BY variable that
precedes it in the BY statement changes, PROC PRINT prints the sums of all
variables listed in the SUM statement.
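A SUMBY sketch, assuming the SASHELP.CLASS sample data set. Compared with the plain SUM statement, SUMBY SEX suppresses the sums that would otherwise appear at each change of Age.

```sas
proc sort data=sashelp.class out=work.class;
   by sex age;
run;

proc print data=work.class;
   by sex age;
   var name weight;
   sum weight;
   /* Print sums only when Sex changes, not at every change of Age */
   sumby sex;
run;
```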
VAR Statement
Selects variables that appear in the report and determines their order.
Tip: If you omit the VAR statement, PROC PRINT prints all variables in the data set.
Examples: “Example 2: Selecting Variables to Print” on page 1797
“Example 9: Creating a Customized Layout with BY Groups and ID Variables” on
page 1845
Syntax
VAR variable(s)
</ STYLE <(location(s))>=<style-override(s)> >;
Required Argument
variable(s)
identifies the variables to print. PROC PRINT prints the variables in the order in
which you list them.
Interaction In the PROC PRINT output, variables that are listed in the ID
statement precede variables that are listed in the VAR statement. If
a variable in the ID statement also appears in the VAR statement,
the output contains two columns for that variable.
Optional Argument
STYLE <(location(s))>=<style-override(s)>
specifies one or more style overrides to use for all columns that are created by a
VAR statement.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
Tip To specify different style overrides for different columns, use a separate
VAR statement to create a column for each variable and add a different
STYLE option to each VAR statement.
See For information about the arguments of this option and how it is used, see
the STYLE= option on page 1764 in the PROC PRINT statement.
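A sketch of the tip above, assuming the SASHELP.CLASS sample data set: each VAR statement creates its own column and carries its own STYLE override, and the columns appear in the order of the VAR statements.

```sas
proc print data=sashelp.class(obs=5);
   /* One VAR statement per column, each with its own override */
   var name   / style(data)=[fontweight=bold];
   var height / style(data)=[backgroundcolor=yellow];
   var weight;
run;
```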
The Base SAS reporting procedures, PROC PRINT, PROC REPORT, and PROC
TABULATE, enable you to quickly analyze your data and organize it into easy-to-
read tables. You can use the STYLE= option with these procedure statements to
modify the appearance of your report. The STYLE= option enables you to make
changes in sections of output without changing the default style for all of the
output. You can customize specific sections of procedure output by specifying the
STYLE= option in specific statements within the procedure.
The following program uses the STYLE= option to create the colors in the PROC
PRINT output below. For the complete input data set, see “EXPREV” on page 2790.
proc print data=exprev noobs sumlabel='Total' GRANDTOTAL_LABEL="Grand Total"
style(table)=[frame=box rules=groups]
style(bysumline)=[background=red foreground=linen]
style(grandtotal)=[foreground=green]
style(header)=[font_style=italic background=orange];
by sale_type order_date;
sum price quantity;
sumby sale_type;
label sale_type='Sale Type' order_date='Sale Date';
format price dollar10.2 cost dollar10.2;
var Country / style(data)=[font_face=arial font_weight=bold
background=linen];
var Price / style(data)=[font_style=italic background=yellow];
var Cost / style(data)=[foreground=hgt. background=lightgreen];
title 'Retail and Quantity Totals for Each Sale Type';
run;
Note: Because styles control the presentation of the data, they have no effect on
output objects that go to the LISTING, DOCUMENT, or OUTPUT destination.
Available styles are in the SASHELP.TMPLMST item store. In SAS Enterprise Guide,
the list of style sheets is shown by the Style Wizard. In batch mode or SAS Studio,
you can display the list of available style templates by using the LIST statement in
PROC TEMPLATE:
proc template;
list styles / store=sashelp.tmplmst;
run;
For complete information about viewing ODS styles, see “Viewing ODS Styles
Supplied by SAS” in SAS Output Delivery System: Advanced Topics.
By default, HTML 4 output uses the HTMLBlue style template and HTML 5 output
uses the HTMLEncore style template. To help you become familiar with styles,
style elements, and style attributes, look at the relationship between them.
You can use the SOURCE statement in PROC TEMPLATE to display the structure
of a style template. The following code prints the structure of the HTMLBlue style
template to the SAS log:
proc template;
source styles.HTMLBlue;
run;
The following figure illustrates the structure of a style. The figure shows the
relationship between the style, the style elements, and the style attributes.
The following list corresponds to the numbered items in the preceding figure:
1 A style describes the overall appearance of the ODS documents that use it. The default style for
HTML output is HTMLBlue. Each style consists of style elements.
You can create new styles with the “DEFINE STYLE Statement” in SAS Output
Delivery System: Procedures Guide. New styles can be created independently or
from an existing style. You can use “PARENT= Statement” in SAS Output
Delivery System: Procedures Guide to create a new style from an existing style.
For complete documentation about ODS styles, see “Style Templates” in SAS
Output Delivery System: Advanced Topics.
2 Header and Footer are examples of style elements. A style element is a
collection of style attributes that apply to a particular part of the output for a
SAS program. For example, a style element might contain instructions for the
presentation of column headings or for the presentation of the data inside table
cells. Style elements might also specify default colors and fonts for output that
uses the style. Style elements exist inside styles and consist of one or more
style attributes. Style elements can be user-defined or supplied by SAS. User-
defined style elements can be created by the “STYLE Statement” in SAS Output
Delivery System: Procedures Guide.
Note: For a list of the default style elements used for HTML and markup
languages and their inheritance, see “Style Elements” in SAS Output Delivery
System: Advanced Topics.
The following table shows commonly used style attributes that you can set with
the STYLE= option in PROC PRINT, PROC TABULATE, and PROC REPORT. Most of
these attributes apply to parts of the table other than cells (for example, table
borders and the lines between columns and rows). Note that not all attributes are
valid in all destinations. For more information about these style attributes, their
valid values, and their applicable destinations, see “Style Attributes Tables” in SAS
Output Delivery System: Advanced Topics.
Table 48.2 Style Attributes for PROC REPORT, PROC TABULATE, and PROC PRINT
Columns (left to right): PROC REPORT statement, REPORT area | PROC REPORT areas
CALLDEF, COLUMN, HEADER, LINES, and SUMMARY | PROC TABULATE statement, TABLE |
PROC TABULATE statements VAR, CLASS, BOX, CLASSLEV, and KEYWORD | PROC PRINT,
TABLE location | PROC PRINT, all locations other than TABLE
Attribute Areas
ASIS= X X X X
BACKGROUNDCOLOR= X X X X X X
BACKGROUNDIMAGE= X X X X X X
BORDERBOTTOMCOLOR= X X X
BORDERBOTTOMSTYLE= X X X X
BORDERBOTTOMWIDTH= X X X X
BORDERLEFTCOLOR= X X X
BORDERLEFTSTYLE= X X X X
BORDERLEFTWIDTH= X X X X
BORDERCOLOR= X X X X X
BORDERCOLORDARK= X X X X X X
BORDERCOLORLIGHT= X X X X X X
BORDERRIGHTCOLOR= X X X
BORDERRIGHTSTYLE= X X X X
BORDERRIGHTWIDTH= X X X X
BORDERTOPCOLOR= X X X
BORDERTOPSTYLE= X X X X
BORDERTOPWIDTH= X X X X
BORDERWIDTH= X X X X X X
CELLPADDING= X X X
CELLSPACING= X X X
CELLWIDTH= X X X X X
CLASS= X X X X X X
COLOR= X X X
FLYOVER= X X X X
FONT= X X X X X X
FONTFAMILY= X X X X X X
FONTSIZE= X X X X X X
FONTSTYLE= X X X X X X
FONTWEIGHT= X X X X X X
FONTWIDTH= X X X X X
FRAME= X X X
HEIGHT= X X X X X
HREFTARGET= X X X
HTMLSTYLE= X X X X X
NOBREAKSPACE=² X X X X
OUTPUTWIDTH= X X X X X
POSTHTML=¹ X X X X X X
POSTIMAGE= X X X X X X
POSTTEXT=¹ X X X X X X
PREHTML=¹ X X X X X X
PREIMAGE= X X X X X X
PRETEXT=¹ X X X X X X
PROTECTSPECIALCHARS= X X X X
RULES= X X X
TAGATTR= X X X X X X
TEXTALIGN= X X X X X X
URL= X X X
VERTICALALIGN= X X X
WIDTH= X X X X X
1 When you use these attributes in this location, they affect only the text that is specified with the PRETEXT=,
POSTTEXT=, PREHTML=, and POSTHTML= attributes. To alter the foreground color or the font for the text that appears
in the table, you must set the corresponding attribute in a location that affects the cells rather than the table. For
complete documentation about style attributes and their values, see “Style Attributes” in SAS Output Delivery System:
Advanced Topics.
2 To help prevent unexpected wrapping of long text strings when using PROC REPORT with the ODS RTF destination, set
NOBREAKSPACE=OFF in a location that affects the LINE statement. The NOBREAKSPACE=OFF attribute must be set in
the PROC REPORT code either on the LINE statement or on the PROC REPORT statement where style(lines) is specified.
commonly used ODS destinations: HTML, PDF, and RTF. Each destination has a
default style template that is applied to all output that is written to the destination.
• The default style for HTML output is HTMLBlue.
For complete documentation about the ODS destinations and their default styles,
see “Style Templates” in SAS Output Delivery System: Advanced Topics.
Table 48.3 Default Style Elements and Style Attributes for Report Regions
For LISTING output, if the page size is set too small, SAS cannot print both the data
and any titles or footnotes on the same page. If this happens, only the data is
printed to the LISTING destination and SAS writes a warning message to the log. To
write both the data and titles or footnotes on the same page, make sure that the
page size is adequate.
To change the ODS destination for the report, use ODS statements before the
PROC PRINT statement. If you do not want HTML output, be sure to close the ODS
HTML destination before you run the procedure. For more information about using
ODS, see the SAS Output Delivery System: User’s Guide.
See the PRINT procedure examples on page 1797 for a sampling of the types of
reports that the procedure produces.
Each time that PROC PRINT runs, by default, SAS adds a page break after the
output. In HTML output, a page break is rendered as a horizontal rule that
separates the output. For more information, see “ODS HTML Statement” in SAS
Output Delivery System: User’s Guide.
Observations
For ODS destinations that produce output whose page size is limited in width and
length, such as RTF, PDF, and LISTING, PROC PRINT uses an identical layout for all
observations on a page. First, it attempts to print observations on a single line,
as shown in the following figure.
Page 1
Obs  Var_1  Var_2  Var_3
  1  ~~~~   ~~~~   ~~~~
  2  ~~~~   ~~~~   ~~~~
  3  ~~~~   ~~~~   ~~~~
  4  ~~~~   ~~~~   ~~~~
  5  ~~~~   ~~~~   ~~~~
  6  ~~~~   ~~~~   ~~~~
If PROC PRINT cannot fit all the variables on a single line, it splits the observations
into two or more sections and prints the observation number or the ID variables at
the beginning of each line. For example, in the following figure, PROC PRINT prints
the values for the first three variables in the first section of each page and the
values for the second three variables in the second section of each page.
Page 1
Obs  Var_1  Var_2  Var_3
  1  ~~~~   ~~~~   ~~~~
  2  ~~~~   ~~~~   ~~~~
  3  ~~~~   ~~~~   ~~~~

Obs  Var_4  Var_5  Var_6
  1  ~~~~   ~~~~   ~~~~
  2  ~~~~   ~~~~   ~~~~
  3  ~~~~   ~~~~   ~~~~

Page 2
Obs  Var_1  Var_2  Var_3
  4  ~~~~   ~~~~   ~~~~
  5  ~~~~   ~~~~   ~~~~
  6  ~~~~   ~~~~   ~~~~

Obs  Var_4  Var_5  Var_6
  4  ~~~~   ~~~~   ~~~~
  5  ~~~~   ~~~~   ~~~~
  6  ~~~~   ~~~~   ~~~~
If PROC PRINT cannot fit all the variables on one page, the procedure prints
subsequent pages with the same observations until it has printed all the variables.
For example, in the following figure, PROC PRINT uses the first two pages to print
values for the first three observations and the second two pages to print values for
the rest of the observations.
Page 1
Obs  Var_1  Var_2  Var_3
  1  ~~~~   ~~~~   ~~~~
  2  ~~~~   ~~~~   ~~~~
  3  ~~~~   ~~~~   ~~~~

Obs  Var_4  Var_5  Var_6
  1  ~~~~   ~~~~   ~~~~
  2  ~~~~   ~~~~   ~~~~
  3  ~~~~   ~~~~   ~~~~

Page 2
Obs  Var_7  Var_8  Var_9
  1  ~~~~   ~~~~   ~~~~
  2  ~~~~   ~~~~   ~~~~
  3  ~~~~   ~~~~   ~~~~

Obs  Var_10  Var_11  Var_12
  1  ~~~~    ~~~~    ~~~~
  2  ~~~~    ~~~~    ~~~~
  3  ~~~~    ~~~~    ~~~~

Page 3
Obs  Var_1  Var_2  Var_3
  4  ~~~~   ~~~~   ~~~~
  5  ~~~~   ~~~~   ~~~~
  6  ~~~~   ~~~~   ~~~~

Obs  Var_4  Var_5  Var_6
  4  ~~~~   ~~~~   ~~~~
  5  ~~~~   ~~~~   ~~~~
  6  ~~~~   ~~~~   ~~~~

Page 4
Obs  Var_7  Var_8  Var_9
  4  ~~~~   ~~~~   ~~~~
  5  ~~~~   ~~~~   ~~~~
  6  ~~~~   ~~~~   ~~~~

Obs  Var_10  Var_11  Var_12
  4  ~~~~    ~~~~    ~~~~
  5  ~~~~    ~~~~    ~~~~
  6  ~~~~    ~~~~    ~~~~
Note: For the LISTING destination, you can alter the page layout with the ROWS=
option in the PROC PRINT statement. (See the discussion of the ROWS= option on
page 1764.)
Column Headings
The amount of available space determines whether PROC PRINT prints column headings
horizontally or vertically. Figure 48.55 on page 1792, Figure 48.56 on page 1793, and
Figure 48.57 on page 1793 all illustrate horizontal headings. The following figure
illustrates vertical headings.
Page 1
     V  V  V
     a  a  a
O    r  r  r
b    _  _  _
s    1  2  3
Note: If you use LABEL and at least one variable has a label, PROC PRINT prints
all column headings horizontally unless you specify HEADING=VERTICAL.
Column Width
By default, PROC PRINT uses a variable's formatted width as the column width.
(The WIDTH= option overrides this default behavior for the LISTING destination.) If
the variable does not have a format that explicitly specifies a field width, PROC
PRINT uses the widest data value for that variable on that page as the column
width.
When PROC PRINT prints these three variables on a line, it uses 14 print positions
for the two ID variables and the space after each one. This arrangement leaves 80 − 14,
or 66, print positions for COMMENT. Longer values of COMMENT are truncated.
Note: Column width is affected not only by variable width but also by the length of
column headings. Long column headings might lessen the usefulness of WIDTH=.
Example 1: Print a CAS Table
Details
This example demonstrates the following tasks:
• establishes a CAS session
• associates the Mycas libref with the CAS engine and the CAS session
Program Description
Connect to the CAS server, set up the CAS session, create a libref for the CAS engine,
and connect the engine to the CAS session. The OPTIONS statement specifies the host
and port of the CAS server. The CAS statement creates the Mysess session using the
CASUSER caslib. The LIBNAME statement creates the Mycas libref for the CAS
engine, which uses the Mysess CAS session.
options cashost="cloud.example.com" casport=5555;
cas mysess sessopts=(caslib='casuser');
libname mycas cas sessref=mysess;
Load the table Sashelp.cars into the caslib Casuser. The OUTCASLIB= option
names the caslib into which the table is loaded. The LOAD statement loads the
table from Sashelp.cars, and the REPLACE option replaces any existing table with
the same name.
proc casutil outcaslib="casuser";
load data=sashelp.cars replace;
run;
Summarize the data using PROC MDSUMMARY. The VAR statement specifies the
analysis variable, MPG_Highway. The GROUPBY statement groups the results by
Origin and Type, and its OUT= option saves the output to the table Mycas.mpghw_sum.
proc mdsummary data=mycas.cars;
var mpg_highway;
groupby origin type / out=mycas.mpghw_sum;
run;
Print the first 15 rows of the summary results. With OBS=15, PROC PRINT prints
only 15 rows of the CAS table. The VAR statement limits the output table to three
columns, Origin, Type, and _Mean_.
options obs=15;
proc print data=mycas.mpghw_sum;
var origin type _mean_;
title "Average Highway Mileage";
run;
Example 2: Selecting Variables to Print
Details
This example demonstrates the following tasks:
• selects three variables for the reports
• creates a report for the default HTML destination and the LISTING destination
at the same time
• creates a stylized HTML report
Program Description
HTML is the default destination when SAS opens in the windowing environment.
Print the output. The VAR statement specifies the variables to print.
proc print data=exprev;
var country price sale_type;
title 'Monthly Price Per Unit and Sale Type for Each Country';
footnote '*prices in USD';
run;
(Figure: HTML output with the columns Obs, Country, Price, and Sale_Type and the footnote “*prices in USD”.)
Program Description
You can go a step further and add more formatting to your HTML output. The
following example uses the STYLE option to add shading and spacing to your
HTML report.
options obs=5;
ods html file='your_file_styles.html';
Create stylized HTML output. The first STYLE option specifies that the column
headings are written in a green italic font. The second STYLE option specifies that
the observation number column has a background color of the RGBA color a8a44ff8a
and a text color of blue. The BLANKLINE option specifies to add a blank line
after each observation, with a background color of the RGB color
cx456789. Because a style has not been defined for the OBSHEADER location, the
Obs column heading in the output uses the default style color and not green.
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. The PAGENO= option specifies the starting page
number. The LINESIZE= option specifies the output line length, and the PAGESIZE=
option specifies the number of lines on an output page. The OBS= option specifies
the number of observations to display.
options nodate pageno=1 linesize=80 pagesize=30 obs=10;
Close the HTML destination and open the LISTING destination. HTML is the
default destination when you start SAS. To create only a LISTING report, you can
close the HTML destination and open the LISTING destination.
ods html close;
ods listing;
Print the data set EXPREV. EXPREV contains information about a company's
product order type and price per unit for two months. DOUBLE inserts a blank line
between observations. The DOUBLE option has no effect on the HTML output.
proc print data=exprev double;
Select the variables to include in the report. The VAR statement creates columns
for Country, Price, and Sale_Type, in that order.
var country price sale_type;
Specify a title and a footnote. The TITLE statement specifies the title for the
report. The FOOTNOTE statement specifies a footnote for the report.
title 'Monthly Price Per Unit and Sale Type for Each Country';
footnote '*prices in USD';
run;
Close the LISTING destination and reopen the HTML destination. When you close
and reopen the HTML destination, SAS saves HTML output to the current directory
and not the Work library.
ods listing close;
ods html;
Output: LISTING
By default, PROC PRINT identifies each observation by number under the column
heading Obs.
Example 3: Customizing Text in Column Headings 1803
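[LISTING output, page 1, titled Monthly Price Per Unit and Sale Type for Each Country: columns Obs, Country, Price, and Sale_Type (heading split as Sale_/Type), followed by the footnote *prices in USD.]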
Details
This example demonstrates the following tasks:
• underlines the text in column headings for variables in LISTING output
• adds background color to the column headings for variables in PDF output
• customizes the column heading for the column that identifies observations by
number
• shows the number of observations in the report
• writes the values of the variable Price with dollar signs and decimal points
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. The PAGENO= option specifies the starting page
number. The LINESIZE= option specifies the output line length, and the PAGESIZE=
option specifies the number of lines on an output page. The OBS= option specifies
the number of observations to be displayed.
options nodate pageno=1 linesize=80 pagesize=30 obs=10;
Close the HTML destination and open the LISTING destination. By default, the
HTML destination is open.
ods html close;
ods listing;
Print the report and define the column headings. SPLIT= identifies the asterisk as
the character that starts a new line in column headings. The N option prints the
number of observations at the end of the report. OBS= specifies the column
heading for the column that identifies each observation by number. The split
character (*) starts a new line in the column heading. The equal signs (=) in the
value of OBS= underline the column heading.
proc print data=exprev split='*' n
obs='Observation*Number*===========';
Select the variables to include in the report. The VAR statement creates columns
for Country, Sale_Type, and Price, in that order.
var country sale_type price;
Assign the variables' labels as column headings. The LABEL statement associates
a label with each variable for the duration of the PROC PRINT step. When you use
the SPLIT= option in the PROC PRINT statement, the procedure uses labels for
column headings. The split character (*) starts a new line in the column heading.
The equal signs (=) in the labels underline the column headings.
label country='Country Name**============'
sale_type='Order Type**=========='
price='Price Per Unit*in USD*==============';
Specify a title for the report, and format the numeric variable Price. The
FORMAT statement assigns the DOLLAR10.2 format to the variable Price in the
report. The TITLE statement specifies a title.
format price dollar10.2;
title 'Order Type and Price Per Unit in Each Country';
run;
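The split-character rule can be modeled in a few lines. This Python sketch breaks each label at every asterisk, so a doubled asterisk produces an empty heading line between the text and its row of equal signs. It is not part of SAS, and heading_lines is a hypothetical helper.

```python
def heading_lines(label: str, split_char: str = "*") -> list[str]:
    """Break a label into column-heading lines at each split character.

    Two split characters in a row yield an empty string, that is, a blank
    heading line, as in 'Country Name**============'.
    """
    return label.split(split_char)


labels = {
    "country": "Country Name**============",
    "sale_type": "Order Type**==========",
    "price": "Price Per Unit*in USD*==============",
}
for name, label in labels.items():
    print(name, heading_lines(label))
```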
Output: LISTING
Output 48.6 Customizing Column Headings: LISTING Output
N = 10
Program Description
You can easily create PDF output by adding a few ODS statements. In the following
example, ODS statements were added to produce PDF output.
Create PDF output and specify the file to store the output in. The ODS PDF
statement opens the PDF destination and creates PDF output. The FILE= argument
specifies the external file that contains the PDF output.
ods pdf file='your_file.pdf';
Set the procedure options. The N option prints the number of observations at the
end of the report. OBS= specifies the column heading for the column that identifies
each observation by number.
proc print data=exprev n obs='Observation Number';
Process the variables in the data set. The VAR statement specifies the variables to
print. The LABEL statement creates text to print in place of the variable names. The
FORMAT statement formats the variable Price with the DOLLARw.d format. The TITLE
statement creates a title for the report.
var country sale_type price;
label country='Country Name'
sale_type='Order Type'
price='Price Per Unit in USD';
format price dollar10.2;
title 'Order Type and Price Per Unit in Each Country';
run;
Close the PDF destination. The ODS PDF CLOSE statement closes the PDF
destination.
ods pdf close;
Output: PDF
Output 48.7 Customizing Column Heading: Default PDF Output
Program Description
The OBS= system option specifies to process 10 observations.
options obs=10;
ods pdf file='your_file.pdf';
Create stylized PDF output. The first STYLE option specifies that the background
color of the cell containing the value for N be changed to light blue and that the
font style be changed to italic. The second STYLE option specifies that the
background color of the observation column, the observation header, and the other
variables' headers be changed to light yellow, the text color be changed to blue,
and the font style be changed to italic.
proc print data=exprev n obs='Observation Number'
style(n)={backgroundcolor=light blue fontstyle=italic}
style(header obs obsheader)={backgroundcolor=light yellow
color=blue
fontstyle=italic};
Create stylized PDF output. The STYLE option changes the color of the cells
containing data to a very light blue.
var country sale_type price / style(data)=[backgroundcolor=very
light blue];
label country='Country Name'
sale_type='Order Type'
price='Price Per Unit in USD';
format price dollar10.2;
title 'Order Type and Price Per Unit in Each Country';
run;
Example 4: Creating Separate Sections of a Report for Groups of Observations 1809
Close the PDF destination. The ODS PDF CLOSE statement closes the PDF
destination.
ods pdf close;
LABEL statement
ODS RTF statement
TITLE statement
Data set: EXPREV
ODS destinations: HTML, RTF
Details
This example demonstrates the following:
• suppresses the printing of observation numbers at the beginning of each row
• presents the data for each sale type in a separate section of the report
Program Description
The HTML destination is open by default. No ODS HTML statement is needed.
options obs=10;
Sort the EXPREV data set. PROC SORT sorts the observations by Sale_Type,
Order_Date, and Quantity.
proc sort data=exprev;
by sale_type order_date quantity;
run;
Print the report, specify the total number of observations in each BY group, and
suppress the printing of observation numbers. N= prints the number of
observations in a BY group at the end of that BY group. The explanatory text that
the N= option provides precedes the number. NOOBS suppresses the printing of
observation numbers at the beginning of the rows. LABEL uses the variables' labels
as column headings.
proc print data=exprev n='Number of observations for the month: '
noobs label;
Specify the variables to include in the report. The VAR statement creates columns
for Quantity, Cost, and Price, in that order.
var quantity cost price;
Create a separate section for each order type and specify page breaks for each BY
group of Order_Date. The BY statement produces a separate section of the report
for each BY group and prints a heading above each one. The PAGEBY statement
starts a new page each time the value of Order_Date changes.
by sale_type order_date;
pageby order_date;
Establish the column headings. The LABEL statement associates labels with the
variables Sale_Type and Order_Date for the duration of the PROC PRINT step.
When you use the LABEL option in the PROC PRINT statement, the procedure uses
labels for column headings.
label sale_type='Order Type' order_date='Order Date';
Format the columns that contain numbers and specify a title and footnote. The
FORMAT statement assigns a format to Price and Cost for this report. The TITLE
statement specifies a title. The TITLE2 statement specifies a second title.
format price dollar7.2 cost dollar7.2;
title 'Prices and Cost Grouped by Date and Order Type';
title2 'in USD';
run;
proc options option=bufno define;
run;
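The following simplified Python model makes the BY, PAGEBY, and N= behavior concrete. It is not SAS code. The rows are invented stand-ins for EXPREV, and the function by_group_report is hypothetical. The page-break rule assumes that PAGEBY starts a new page when the PAGEBY variable or any BY variable listed before it changes.

```python
from itertools import groupby

# Hypothetical rows standing in for EXPREV (values invented for illustration),
# already sorted by sale_type and order_date as PROC SORT would leave them.
rows = [
    {"sale_type": "Catalog", "order_date": "2012-01-01", "quantity": 1},
    {"sale_type": "Catalog", "order_date": "2012-01-01", "quantity": 4},
    {"sale_type": "Catalog", "order_date": "2012-02-01", "quantity": 2},
    {"sale_type": "In Store", "order_date": "2012-01-01", "quantity": 3},
]


def by_group_report(rows, by_vars, pageby_var, var_names, n_text):
    """Return report lines for sorted rows: BY line, data rows, and N= text per group."""
    # Assumed rule: a new page starts when pageby_var or any earlier BY variable changes.
    page_depth = by_vars.index(pageby_var) + 1
    lines = []
    previous_page_key = None
    for group_key, group in groupby(rows, key=lambda r: tuple(r[v] for v in by_vars)):
        group = list(group)
        page_key = group_key[:page_depth]
        if previous_page_key is not None and page_key != previous_page_key:
            lines.append("<new page>")
        previous_page_key = page_key
        lines.append(" ".join(f"{v}={val}" for v, val in zip(by_vars, group_key)))
        lines.extend(" ".join(str(r[v]) for v in var_names) for r in group)
        # The N= text precedes the count of observations in the BY group.
        lines.append(f"{n_text}{len(group)}")
    return lines


report = by_group_report(rows, ["sale_type", "order_date"], "order_date",
                         ["quantity"], "Number of observations for the month: ")
print("\n".join(report))
```

Order_Date is the last BY variable here, so every new BY group starts a new page. Passing sale_type as the PAGEBY variable instead would break pages only between order types.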
Output: HTML
Output 48.9 Creating Separate Sections of a Report for Groups of Observations:
HTML Output
Program Description
The OBS= system option specifies to process 10 observations.
options obs=10;
Create output for Microsoft Word and specify the file to store the output in. The
ODS RTF statement opens the RTF destination and creates output formatted for
Microsoft Word. The FILE= option specifies the external file that contains the RTF
output. The STARTPAGE=NO option specifies that no new pages be inserted
explicitly at the start of each BY group.
ods rtf file='your_file.rtf' startpage=no;
proc sort data=exprev;
by sale_type order_date quantity;
run;
proc print data=exprev n='Number of observations for each order type:'
noobs label;
var quantity cost price;
by sale_type order_date;
pageby order_date;
label sale_type='Order Type' order_date='Order Date';
format price dollar7.2 cost dollar7.2;
title 'Price and Cost Grouped by Date and Order Type';
title2 'in USD';
run;
Close the RTF destination. The ODS RTF CLOSE statement closes the RTF
destination.
ods rtf close;
Output: RTF
Output 48.10 Creating Separate Sections of a Report for Groups of Observations:
Default RTF Output
Program Description
The OBS= system option specifies to process 10 observations.
options obs=10;
ods rtf file='your_file.rtf' startpage=no;
proc sort data=exprev;
by sale_type order_date quantity;
run;
Create a stylized RTF report. The first STYLE option specifies that the background
color of the cell containing the number of observations be changed to very light gray.
The second STYLE option specifies that the background color of the column
heading for the variable Quantity be changed to light yellow. The third STYLE
option specifies that the background color of the column heading for the variable
Cost be changed to light blue and the font color be changed to white. The fourth
STYLE option specifies that the background color of the column heading for the
variable Price be changed to light green.
proc print data=exprev n='Number of observations for the month: '
noobs label style(N)={backgroundcolor=very light gray};
var quantity / style(header)=[backgroundcolor=light yellow];
var cost / style(header)=[backgroundcolor=light blue foreground =
white];
var price / style(header)=[backgroundcolor=light green];
by sale_type order_date;
pageby order_date;
label sale_type='Order Type' order_date='Order Date';
format price dollar7.2 cost dollar7.2;
run;
ods rtf close;
Details
This example demonstrates the following tasks:
• sums retail prices and quantities for each sale type and for all sale types.
• shows the number of observations in each BY group and in the whole report.
• creates a customized title, containing the sale type. This title replaces
the default BY line for each BY group.
• creates a default HTML file.
Program Description
The HTML destination is open by default. This program uses the default filename
for the HTML output. No ODS HTML statement is needed.
Start each BY group on a new page and suppress the printing of the default BY
line. The SAS system option NOBYLINE suppresses the printing of the default BY
line. When you use PROC PRINT with the NOBYLINE option, each BY group starts
on a new page. The OBS= option specifies the number of observations to process.
options obs=10 nobyline;
Sort the data set. PROC SORT sorts the observations by Sale_Type.
proc sort data=exprev;
by sale_type;
run;
Print the report, suppress the printing of observation numbers, and print the total
number of observations for the selected variables. NOOBS suppresses the
printing of observation numbers at the beginning of the rows. SUMLABEL prints the
BY variable label on the summary line of each BY group. N= prints the number of
observations in a BY group at the end of that BY group and (because of the SUM
statement) prints the number of observations in the data set at the end of the
report. The first piece of explanatory text that N= provides precedes the number
for each BY group. The second piece of explanatory text that N= provides precedes
the number for the entire data set.
proc print data=exprev noobs label sumlabel
n='Number of observations for the order type: '
'Number of observations for the data set: ';
Select the variables to include in the report. The VAR statement creates columns
for Country, Order_Date, Quantity, and Price, in that order.
var country order_date quantity price;
Assign the variables' labels as column headings. The LABEL statement associates
a label with each variable for the duration of the PROC PRINT step.
label sale_type='Sale Type'
price='Total Retail Price* in USD'
country='Country' order_date='Date' quantity='Quantity';
Example 5: Summing Numeric Variables with One BY Group 1821
Sum the values for the selected variables. The SUM statement alone sums the
values of Price and Quantity for the entire data set. Because the PROC PRINT step
contains a BY statement, the SUM statement also sums the values of Price and
Quantity for each sale type that contains more than one observation.
sum price quantity;
by sale_type;
Format the numeric values for a specified column. The FORMAT statement
assigns the DOLLAR7.2 format to Price for this report.
format price dollar7.2;
Specify and format a dynamic (or current) title. The TITLE statement specifies a
title. The #BYVAL specification places the current value of the BY variable
Sale_Type in the title. Because NOBYLINE is in effect, each BY group starts on a
new page, and the title serves as a BY line.
title 'Retail and Quantity Totals for #byval(sale_type) Sales';
run;
Generate the default BY line. The SAS system option BYLINE resets the printing of
the default BY line.
options byline;
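The interaction of SUM, BY, and the two-part N= option can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the PROC PRINT implementation. The rows and the function summed_report are hypothetical. As described above, a BY-group sum line is emitted only for groups with more than one observation.

```python
from itertools import groupby

# Hypothetical rows (not EXPREV values), sorted by sale_type as PROC SORT would leave them.
rows = [
    {"sale_type": "Catalog", "quantity": 2, "price": 30},
    {"sale_type": "Catalog", "quantity": 5, "price": 12},
    {"sale_type": "Internet", "quantity": 1, "price": 99},
]


def summed_report(rows, by_var, sum_vars, group_text, total_text):
    """Model SUM + BY + two-part N= for a single BY variable."""
    lines = []
    grand = dict.fromkeys(sum_vars, 0)
    for value, group in groupby(rows, key=lambda r: r[by_var]):
        group = list(group)
        sums = {v: sum(r[v] for r in group) for v in sum_vars}
        for v in sum_vars:
            grand[v] += sums[v]
        lines.append(f"{by_var}={value}")
        # A BY-group sum line is written only when the group has more than one observation.
        if len(group) > 1:
            lines.append("sum: " + ", ".join(f"{v}={sums[v]}" for v in sum_vars))
        lines.append(f"{group_text}{len(group)}")
    # The SUM statement alone also produces totals for the whole data set.
    lines.append("total: " + ", ".join(f"{v}={grand[v]}" for v in sum_vars))
    lines.append(f"{total_text}{len(rows)}")
    return lines


for line in summed_report(rows, "sale_type", ["price", "quantity"],
                          "Number of observations for the order type: ",
                          "Number of observations for the data set: "):
    print(line)
```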
Output: HTML
Output 48.12 Summing Numeric Variables with One BY Group HTML Output
Program Description
options obs=10 nobyline;
Produce CSV formatted output and specify the file to store it in. The ODS
CSVALL statement opens the CSVALL destination and creates a file containing
tabular output with titles, notes, and BY lines. The FILE= argument specifies the
external file that contains the CSV output.
ods csvall file='your_file.csv';
proc sort data=exprev;
by sale_type;
run;
proc print data=exprev noobs label sumlabel
n='Number of observations for the order type: '
'Number of observations for the data set: ';
var country order_date quantity price;
label price='Total Retail Price* in USD'
country='Country' order_date='Date' quantity='Quantity';
sum price quantity;
by sale_type;
format price dollar7.2;
title 'Retail and Quantity Totals for #byval(sale_type) Sales';
run;
options byline;
Close the CSVALL destination. The ODS CSVALL CLOSE statement closes the
CSVALL destination.
ods csvall close;
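The #BYVAL substitution in the title works like a template lookup. The minimal Python sketch below handles only the named #BYVAL(variable) form used here and ignores other forms such as the numbered #BYVALn. The helper resolve_byval is hypothetical.

```python
import re


def resolve_byval(title: str, by_values: dict) -> str:
    """Replace #BYVAL(variable) references in a title with the current BY values."""
    def substitute(match):
        # BY variable names are matched without regard to case.
        return str(by_values[match.group(1).lower()])

    return re.sub(r"#byval\((\w+)\)", substitute, title, flags=re.IGNORECASE)


title = "Retail and Quantity Totals for #byval(sale_type) Sales"
print(resolve_byval(title, {"sale_type": "Catalog Sale"}))
```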
LABEL statement
FORMAT statement
SORT procedure
TITLE statement
Data set: EXPREV
ODS destinations: HTML, LISTING
Details
This example demonstrates the following tasks:
• sums quantities and retail prices for the following items:
Program Description
options obs=10;
Produce HTML output and specify the file to store the output in. The HTML
destination is open by default. The ODS HTML FILE= statement creates a file that
contains HTML output. The FILE= argument specifies the external file that contains
the HTML output.
ods html file='your_file.html';
proc sort data=exprev;
by sale_type order_date;
run;
proc print data=exprev n noobs sumlabel='Totals'
grandtotal_label='Grand Total';
by sale_type order_date;
sum price quantity cost;
label sale_type='Sale Type' order_date='Sale Date';
format price dollar10.2 cost dollar10.2;
title 'Retail and Quantity Totals for Each Sale Date and Sale Type';
run;
Example 6: Summing Numeric Variables with Multiple BY Variables 1827
Output: HTML
Output 48.14 Summing Numeric Variables with Multiple BY Variables: In Store
Sales: Default HTML Output
Program Description
options obs=10;
proc sort data=exprev;
by sale_type order_date;
run;
proc print data=exprev n noobs sumlabel='Totals'
grandtotal_label='Grand Total';
Create stylized HTML output. The STYLE option in the first SUM statement
specifies that the background color of the cell containing the grand total for the
variable Price be changed to white and the font color be changed to blue. The
STYLE option in the second SUM statement specifies that the background color of
cells containing totals for the variable Quantity be changed to dark blue and the
font color be changed to white.
by sale_type order_date;
sum price / style(GRANDTOTAL)=[backgroundcolor=white color=blue];
sum quantity / style(TOTAL)=[backgroundcolor=dark blue color=white];
label sale_type='Sale Type' order_date='Sale Date';
format price dollar10.2 cost dollar10.2;
title 'Retail and Quantity Totals for Each Sale Date and Sale Type';
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. The PAGENO= option specifies the starting page
number. The LINESIZE= option specifies the output line length, and the PAGESIZE=
option specifies the number of lines on an output page. OBS= specifies that
processing stops after observation 15.
options nodate pageno=1 linesize=80 pagesize=40 obs=15;
Close the HTML destination and open the LISTING destination. The HTML
destination is open by default.
ods html close;
ods listing;
Sort the data set. PROC SORT sorts the observations by Sale_Type and
Order_Date.
proc sort data=exprev;
by sale_type order_date;
run;
Print the report, suppress the printing of observation numbers, print the total
number of observations for the selected variables, and label the summary and
grand total lines. The N option prints the number of observations in a BY group at
the end of that BY group and prints the total number of observations used in the
report at the bottom of the report. NOOBS suppresses the printing of observation
numbers at the beginning of the rows. The SUMLABEL= option prints ‘Totals’ in the
summary line in place of the BY variable labels, and GRANDTOTAL_LABEL= prints
‘Grand Total’ on the grand total line.
proc print data=exprev n noobs sumlabel='Totals'
grandtotal_label='Grand Total';
Create a separate section of the report for each BY group, and sum the values for
the selected variables. The BY statement produces a separate section of the report
for each BY group. The SUM statement alone sums the values of Price and Quantity
for the entire data set. Because the program contains a BY statement, the SUM
statement also sums the values of Price and Quantity for each BY group that
contains more than one observation.
by sale_type order_date;
sum price quantity;
Establish a label for selected variables, format the values of specified variables,
and create a title. The LABEL statement associates a label with the variables
Sale_Type and Order_Date for the duration of the PROC PRINT step. The labels are
used in the BY line at the beginning of each BY group. The FORMAT statement
assigns a format to the variables
Price and Cost for this report. The TITLE statement specifies a title.
label sale_type='Sale Type'
order_date='Sale Date';
format price dollar10.2 cost dollar10.2;
title 'Retail and Quantity Totals for Each Sale Date and Sale Type';
run;
Output: LISTING
The report uses default column headings (variable names) because neither the
SPLIT= nor the LABEL option is used. Nevertheless, the BY line at the top of each
section of the report shows the BY variables' labels and their values. The text
that SUMLABEL= specifies identifies the subtotals in the report summary lines.
PROC PRINT sums Price and Quantity for each BY group that contains more than
one observation. However, sums are shown only for the BY variables whose values
change from one BY group to the next. For example, in the first BY group, where the
sale type is Catalog Sale and the sale date is 1/1/12, Quantity and Price are
summed only for the sale date because the next BY group is for the same sale type.
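The rule that subtotals appear only for the BY variables whose values change can be modeled by checking, after each observation, which BY variable changes next. In this Python sketch, a subtotal is recorded for each level that closes and that contains more than one observation. The rows and the function name are hypothetical, and this is not SAS code. Grand totals are left out to keep the example short.

```python
# Hypothetical rows sorted by sale_type and order_date (values invented).
rows = [
    {"sale_type": "Catalog", "order_date": "2012-01-01", "quantity": 2},
    {"sale_type": "Catalog", "order_date": "2012-01-01", "quantity": 3},
    {"sale_type": "Catalog", "order_date": "2012-02-01", "quantity": 5},
    {"sale_type": "In Store", "order_date": "2012-01-01", "quantity": 4},
]


def nested_subtotals(rows, by_vars, sum_var):
    """Return (BY variable, BY values, subtotal) for each subtotal line written."""
    depth = len(by_vars)
    totals = [0] * depth
    counts = [0] * depth
    out = []
    for i, row in enumerate(rows):
        for level in range(depth):
            totals[level] += row[sum_var]
            counts[level] += 1
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        if nxt is None:
            first_changed = 0  # the end of the data closes every level
        else:
            changed = [lvl for lvl, v in enumerate(by_vars) if row[v] != nxt[v]]
            first_changed = changed[0] if changed else depth
        # Close groups from the innermost BY variable out to the outermost one that changed.
        for level in range(depth - 1, first_changed - 1, -1):
            if counts[level] > 1:
                key = tuple(row[v] for v in by_vars[: level + 1])
                out.append((by_vars[level], key, totals[level]))
            totals[level] = 0
            counts[level] = 0
    return out


for record in nested_subtotals(rows, ["sale_type", "order_date"], "quantity"):
    print(record)
```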
Output 48.16 PROC PRINT LISTING Output Showing Total Values for Price and
Quantity
[LISTING output, pages 1-3, titled Retail and Quantity Totals for Each Sale Date and Sale Type. Each BY group has the columns Country, Emp_ID, Ship_Date (heading split as Ship_/Date), Quantity, Price, and Cost and ends with its observation count: N = 4, 3, 1, 3, 1, and 3. The report ends with Total N = 15.]
Example 7: Limiting the Number of Sums in a Report 1835
Details
This example demonstrates the following tasks:
• creates a separate section of the report for each combination of sale type and
sale date
• sums quantities and retail prices only for each sale type and for all sale types,
not for individual dates
• creates a PDF file
Program Description
The OBS= system option specifies to process 10 observations.
options obs=10;
Produce PDF output and specify the file to store the output in. The ODS HTML
CLOSE statement closes the default destination. The ODS PDF statement opens
the PDF destination and creates a file that contains PDF output. The FILE=
argument specifies the external file that contains the PDF output.
ods html close;
ods pdf file='your_file.pdf';
Sort the data set. PROC SORT sorts the observations by Sale_Type and
Order_Date.
proc sort data=exprev;
by sale_type order_date;
run;
Print the report and remove the observation numbers. NOOBS suppresses the
printing of observation numbers at the beginning of the rows. SUMLABEL= prints
'Total' on the summary line of each BY group, and GRANDTOTAL_LABEL= prints
'Grand Total' on the grand total line.
proc print data=exprev noobs sumlabel='Total'
grandtotal_label='Grand Total';
Sum the values for each sale type. The SUM and BY statements work together to sum
the values of Price and Quantity for each BY group as well as for the whole report.
The SUMBY statement limits the subtotals to one for each type of sale.
by sale_type order_date;
sum price quantity;
sumby sale_type;
Assign labels to specific variables. The LABEL statement associates a label with
the variables Sale_Type and Order_Date for the duration of the PROC PRINT step.
These labels are used in the BY lines.
label sale_type='Sale Type' order_date='Sale Date';
Assign a format to the necessary variables and specify a title. The FORMAT
statement assigns the DOLLAR10.2 format to Cost and Price for this report. The
TITLE statement specifies a title.
format price dollar10.2 cost dollar10.2;
title 'Retail and Quantity Totals for Each Sale Type';
run;
Close the PDF destination. The ODS PDF CLOSE statement closes the PDF
destination.
ods pdf close;
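With SUMBY, only one level of subtotals remains. The Python sketch below models that restriction. The rows and the sumby_subtotals function are hypothetical, this is not SAS code, and the sketch does not model how a SUMBY group with a single observation is treated.

```python
from itertools import groupby

# Hypothetical rows sorted by sale_type and order_date (values invented).
rows = [
    {"sale_type": "Catalog", "order_date": "2012-01-01", "quantity": 2},
    {"sale_type": "Catalog", "order_date": "2012-01-01", "quantity": 3},
    {"sale_type": "Catalog", "order_date": "2012-02-01", "quantity": 5},
    {"sale_type": "In Store", "order_date": "2012-01-01", "quantity": 4},
    {"sale_type": "In Store", "order_date": "2012-03-01", "quantity": 1},
]


def sumby_subtotals(rows, by_vars, sumby_var, sum_var):
    """Return ([(BY values, subtotal)], grand_total) with subtotals only at the SUMBY level."""
    # Subtotals break on sumby_var and on any BY variable listed before it.
    depth = by_vars.index(sumby_var) + 1
    subtotals = [
        (key, sum(r[sum_var] for r in group))
        for key, group in groupby(rows, key=lambda r: tuple(r[v] for v in by_vars[:depth]))
    ]
    grand_total = sum(r[sum_var] for r in rows)
    return subtotals, grand_total


subtotals, grand_total = sumby_subtotals(rows, ["sale_type", "order_date"], "sale_type", "quantity")
print(subtotals, grand_total)
```

Without SUMBY, the same rows would also produce a subtotal for the two Catalog observations dated 2012-01-01.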
Output: PDF
Output 48.17 Limiting the Number of Sums in a Report: PDF Output
Program Description
options obs=10;
ods pdf file='your_file.pdf';
proc sort data=exprev;
by sale_type order_date;
run;
proc print data=exprev noobs sumlabel='Total'
grandtotal_label='Grand Total';
by sale_type order_date;
Create stylized PDF output. The STYLE option in the first SUM statement specifies
that the background color of cells containing totals for the variables Quantity and
Price be changed to light blue and the font color be changed to white. The STYLE
option in the second SUM statement specifies that the background color of the cells
containing the grand totals for Quantity and Price be changed to green and the font
color be changed to white.
sum quantity price / style(TOTAL)=[backgroundcolor=light blue
color=white];
sum quantity price / style(GRANDTOTAL)=[backgroundcolor=green
color=white];
sumby sale_type;
label sale_type='Sale Type' order_date='Sale Date';
format price dollar10.2 cost dollar10.2;
title 'Retail and Quantity Totals for Each Sale Type';
run;
ods pdf close;
Example 8: Controlling the Layout of a Report with Many Variables 1839
STYLE
ODS RTF statement
SAS data set options: OBS=
Data set: EMPDATA
ODS destinations: LISTING, RTF
Details
This example shows two ways of printing a data set with a large number of
variables: the default layout, which prints multiple rows for each observation when
the variables do not fit on one line, and a layout that uses the ROWS= option to
print one row of variables on each page. The ROWS= option is valid only for the
LISTING destination. For detailed explanations of the layouts of
these two reports, see the option ROWS= on page 1764 and “Page Layout for
Limited Page Sizes” on page 1792.
These reports use a page size of 24 and a line size of 64 to help illustrate the
different layouts.
Program Description
Create the EMPDATA data set. The data set EMPDATA contains personal and job-
related information about a company's employees. The DATA step creates this data
set.
data empdata;
input IdNumber $ 1-4 LastName $ 8-18 FirstName $ 19-28
City $ 29-41 State $ 42-43
Gender $ 45 JobCode $ 49-51 Salary 55-60 @63 Birth date7.
@73 Hired date7. HomePhone $ 83-95;
format birth hired date9.;
datalines;
1919   Adams      Gerald    Stamford     CT M   TA2   34376   15SEP70   07JUN05   203/781-1255
1653   Alexander  Susan     Bridgeport   CT F   ME2   35108   18OCT72   12AUG98   203/675-7715
Print only the first 12 observations in a data set. The OBS= data set option uses
only the first 12 observations to create the report. (This is just to conserve space
here.) The ID statement identifies observations with the formatted value of
IdNumber rather than with the observation number. This report is shown in
“Output: LISTING” on page 1842.
proc print data=empdata(obs=12);
id idnumber;
title 'Personnel Data';
run;
Print a report that contains only one row of variables on each page. ROWS=PAGE
prints only one row of variables for each observation on a page. This report is
shown in Output 48.20 on page 1843.
proc print data=empdata(obs=12) rows=page;
id idnumber;
title 'Personnel Data';
run;
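The default layout can be approximated as a greedy packing problem. Columns are placed left to right until the next one would exceed the line size, and then a new row starts with the ID column repeated. This Python sketch is a simplified model, not PROC PRINT's actual layout algorithm (see "Page Layout for Limited Page Sizes"). The column widths are hypothetical values chosen for a 64-character line.

```python
def pack_rows(id_width, columns, line_size, gap=2):
    """Greedily split (name, width) columns into rows no wider than line_size.

    Each row starts with the ID column, so its width is charged to every row.
    """
    rows, current, used = [], [], id_width
    for name, width in columns:
        if current and used + gap + width > line_size:
            rows.append(current)
            current, used = [], id_width
        current.append(name)
        used += gap + width
    if current:
        rows.append(current)
    return rows


# Hypothetical column widths; the real widths come from the data and formats.
columns = [("LastName", 9), ("FirstName", 10), ("City", 11), ("State", 5),
           ("Gender", 6), ("JobCode", 7), ("Salary", 6), ("Birth", 9),
           ("Hired", 9), ("HomePhone", 12)]
for row in pack_rows(6, columns, line_size=64):
    print(["IdNumber"] + row)
```

With these assumed widths, the packing reproduces the two-row split of the default LISTING output: Gender ends the first row and JobCode begins the second.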
Output: LISTING
In the traditional procedure output, each page of this report contains values for all
variables in each observation. In the HTML output, this report is identical to the
report that uses ROWS=PAGE.
Note that PROC PRINT automatically splits the variable names that are used as
column headings at a change in capitalization if the entire name does not fit in the
column. Compare the column headings for LastName (which fits in the column) and
FirstName (which does not fit in the column).
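The capitalization rule can be expressed with a regular expression that breaks a name wherever a lowercase letter is followed by an uppercase letter. The minimal Python sketch below uses hypothetical column widths and a hypothetical helper name. It does not handle a piece that is still too wide after splitting.

```python
import re


def split_heading(name: str, width: int) -> list[str]:
    """Split a variable name at changes in capitalization when it is wider than its column."""
    if len(name) <= width:
        return [name]
    # Break between a lowercase letter and the uppercase letter that follows it.
    return re.split(r"(?<=[a-z])(?=[A-Z])", name)


# Hypothetical column widths chosen for illustration.
for name, width in [("LastName", 11), ("FirstName", 6), ("IdNumber", 6), ("JobCode", 4)]:
    print(name, split_heading(name, width))
```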
Output 48.19 Default Layout for a Report with Many Variables: LISTING Output
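[Output pages 1 and 2, titled Personnel Data: each observation is printed in two rows, the first with IdNumber, LastName, FirstName, City, State, and Gender and the second with IdNumber, JobCode, Salary, Birth, Hired, and HomePhone. The IdNumber, FirstName, and JobCode headings are split across two lines (Id/Number, First/Name, Job/Code).]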
Each page of this report contains values for only some of the variables in each
observation. However, each page contains values for more observations than the
default report does.
[Output pages 3 and 4, titled Personnel Data: page 3 shows IdNumber, LastName, FirstName, City, State, and Gender; page 4 shows IdNumber, JobCode, Salary, Birth, Hired, and HomePhone.]
Program Description
The RTF output shows all data in one row. The ROWS= option is valid only for the
LISTING destination.
options pageno=1;
Create output for Microsoft Word and specify the file to store the output in. The
ODS RTF statement opens the RTF destination and creates output formatted for
Microsoft Word. The FILE= argument specifies the external file that contains the
RTF output.
ods rtf file='your_file.rtf';
proc print data=empdata(obs=12);
id idnumber;
title 'Personnel Data';
run;
Close the RTF destination. The ODS RTF CLOSE statement closes the RTF
destination.
ods rtf close;
Output: RTF
Output 48.21 Layout for a Report with Many Variables: RTF Output
Example 9: Creating a Customized Layout with BY Groups and ID Variables 1845
Details
This customized report demonstrates the following tasks:
• selects variables to include in the report and the order in which they appear
• sums the salaries for each job code and for all job codes
Program Description
Create and sort a temporary data set. PROC SORT creates a temporary data set in
which the observations are sorted by JobCode and Gender.
options nodate pageno=1 linesize=64 pagesize=60;
proc sort data=empdata out=tempemp;
by jobcode gender;
run;
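Identify the character that starts a new line in column headings, and label the
summary lines. SPLIT= identifies the asterisk as the character that starts a new line
in column headings. SUMLABEL= and GRANDTOTAL_LABEL= supply the text 'Total' and
'Grand Total' for the BY-group summary lines and the grand total line.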
proc print data=tempemp split='*' sumlabel='Total'
grandtotal_label='Grand Total';
Specify the variables to include in the report. The VAR statement and the ID
statement together select the variables to include in the report. The ID statement
and the BY statement produce the special format.
id jobcode;
by jobcode;
var gender salary;
Calculate the total value for each BY group. The SUM statement totals the values
of Salary for each BY group and for the whole report.
sum salary;
Create formatted columns. The FORMAT statement assigns a format to Salary for
this report. The WHERE statement selects for the report only the observations for
job codes that contain the letters 'FA' or 'ME'. The TITLE statement specifies the
report title.
format salary dollar11.2;
where jobcode contains 'FA' or jobcode contains 'ME';
title 'Salary Expenses';
run;
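The DOLLAR11.2 format right-aligns each salary in an 11-character field with a dollar sign, comma separators, and two decimal places. The Python approximation below is not SAS's implementation, and dollar_format is a hypothetical helper. It covers only non-negative values that fit in the field. SAS applies its own rules when a value does not fit, and this sketch does not attempt them.

```python
def dollar_format(value: float, width: int, decimals: int) -> str:
    """Right-align a non-negative value with a dollar sign, commas, and fixed decimals."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    text = f"${value:,.{decimals}f}"
    if len(text) > width:
        # SAS has its own fallback rules here; this sketch simply refuses.
        raise ValueError(f"{text!r} does not fit in width {width}")
    return text.rjust(width)


for salary in (23177, 22454, 593735):
    print(repr(dollar_format(salary, 11, 2)))
```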
Output: LISTING
The ID and BY statements work together to produce this layout. The ID variable is
listed only once for each BY group. The BY lines are suppressed. Instead, the value
of the ID variable, JobCode, identifies each BY group.
Salary Expenses                               1

FA1          F          $23,177.00
             F          $22,454.00
             M          $22,268.00
-----------          -------------
Total                   $67,899.00

FA2          F          $28,888.00
             F          $27,787.00
             M          $28,572.00
-----------          -------------
Total                   $85,247.00

FA3          F          $32,886.00
             F          $33,419.00
             M          $32,217.00
-----------          -------------
Total                   $98,522.00

ME1          M          $29,769.00
             M          $28,072.00
             M          $28,619.00
-----------          -------------
Total                   $86,460.00

ME2          F          $35,108.00
             F          $34,929.00
             M          $35,345.00
             M          $36,925.00
             M          $35,090.00
             M          $35,185.00
-----------          -------------
Total                  $212,582.00

ME3          M          $43,025.00
===========          =============
Grand Total            $593,735.00
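A short Python sketch can check the Total and Grand Total lines in this listing against the salaries it shows. The sketch also shows why ME3 has no Total line: its BY group contains a single observation. The helper job_code_totals is hypothetical and is not SAS code.

```python
from itertools import groupby

# (JobCode, Gender, Salary) values copied from the Salary Expenses listing.
rows = [
    ("FA1", "F", 23177), ("FA1", "F", 22454), ("FA1", "M", 22268),
    ("FA2", "F", 28888), ("FA2", "F", 27787), ("FA2", "M", 28572),
    ("FA3", "F", 32886), ("FA3", "F", 33419), ("FA3", "M", 32217),
    ("ME1", "M", 29769), ("ME1", "M", 28072), ("ME1", "M", 28619),
    ("ME2", "F", 35108), ("ME2", "F", 34929), ("ME2", "M", 35345),
    ("ME2", "M", 36925), ("ME2", "M", 35090), ("ME2", "M", 35185),
    ("ME3", "M", 43025),
]


def job_code_totals(rows):
    """Return ({job_code: subtotal}, grand_total) for rows sorted by job code."""
    subtotals = {}
    for job_code, group in groupby(rows, key=lambda row: row[0]):
        salaries = [salary for _, _, salary in group]
        # Only BY groups with more than one observation get a Total line.
        if len(salaries) > 1:
            subtotals[job_code] = sum(salaries)
    grand_total = sum(salary for _, _, salary in rows)
    return subtotals, grand_total


subtotals, grand_total = job_code_totals(rows)
for job_code, total in subtotals.items():
    print(f"{job_code} Total {total:,}")
print(f"Grand Total {grand_total:,}")
```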
Program Description
proc sort data=empdata out=tempemp;
by jobcode gender;
run;
Produce HTML output and specify the file to store the output in. The HTML
destination is the default ODS destination. The ODS HTML statement FILE= option
specifies the external file that contains the HTML output.
ods html file='your_file.html';
Define the procedure options. The (obs=10) data set option sets the number of
observations to process. The SUMLABEL option indicates to use the label 'Total' on
the summary line for each BY group. The GRANDTOTAL_LABEL option indicates to
use the label 'Grand Total' on the grand total line after all BY groups in the report.
proc print data=tempemp (obs=10) sumlabel='Total'
grandtotal_label='Grand Total';
id jobcode;
by jobcode;
var gender salary;
sum salary;
label jobcode='Job Code'
gender='Gender'
salary='Annual Salary';
format salary dollar11.2;
where jobcode contains 'FA' or jobcode contains 'ME';
title 'Salary Expenses';
run;
Output: HTML
Output 48.23 Creating a Customized Layout with BY Groups and ID Variables:
Default HTML Output
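Program
proc sort data=empdata out=tempemp;
by jobcode gender;
run;
proc print data=tempemp (obs=10) sumlabel='Total'
grandtotal_label='Grand Total'
style(HEADER)={fontstyle=italic}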
style(DATA)={backgroundcolor=blue foreground=white};
id jobcode;
by jobcode;
var gender salary;
sum salary / style(total)={color=red};
label jobcode='Job Code'
gender='Gender'
salary='Annual Salary';
format salary dollar11.2;
where jobcode contains 'FA' or jobcode contains 'ME';
title 'Expenses Incurred for';
title2 'Salaries for Flight Attendants and Mechanics';
run;
Program Description
proc sort data=empdata out=tempemp;
by jobcode gender;
run;
Create stylized HTML output. The first STYLE option specifies that the font of the
headers be changed to italic. The second STYLE option specifies that the
background of cells that contain data be changed to blue and the foreground of
these cells be changed to white. The SUMLABEL and GRANDTOTAL_LABEL
options use a label in the summary and grand total lines, respectively, in place of
variable names.
proc print data=tempemp (obs=10) sumlabel='Total'
grandtotal_label='Grand Total'
style(HEADER)={fontstyle=italic}
style(DATA)={backgroundcolor=blue foreground=white};
id jobcode;
by jobcode;
var gender salary;
Create total values that are written in red. The STYLE option specifies that the
foreground color of the cells that contain the totals be changed to red.
sum salary / style(total)={color=red};
label jobcode='Job Code'
gender='Gender'
salary='Annual Salary';
format salary dollar11.2;
where jobcode contains 'FA' or jobcode contains 'ME';
title 'Expenses Incurred for';
title2 'Salaries for Flight Attendants and Mechanics';
run;
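If you want the grand total line to stand out as well, the SUM statement's STYLE option might be given more than one location. The following sketch assumes that GRANDTOTAL can be listed together with TOTAL in the STYLE location list and that TEMPEMP was created by the PROC SORT step in this example; the bold font weight is an illustrative choice, not part of the original program.

```sas
/* Sketch: apply one style override to the BY-group total lines and  */
/* to the grand total line. Assumes TEMPEMP exists (see PROC SORT).  */
proc print data=tempemp (obs=10) sumlabel='Total'
           grandtotal_label='Grand Total';
   id jobcode;
   by jobcode;
   var gender salary;
   /* TOTAL = summary line of each BY group; GRANDTOTAL = final line */
   sum salary / style(total grandtotal)={color=red fontweight=bold};
   label jobcode='Job Code'
         gender='Gender'
         salary='Annual Salary';
   format salary dollar11.2;
   where jobcode contains 'FA' or jobcode contains 'ME';
   title 'Salary Expenses';
run;
```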
Example 10: Printing All the Data Sets in a SAS Library
PRINT procedure
Data sets: PROCLIB.DELAY and PROCLIB.INTERNAT from the Raw Data and DATA Steps appendix
ODS destination: HTML
Details
This example prints all the data sets in a SAS library. You can use the same
programming logic with any procedure. Just replace the PROC PRINT step near the
end of the example with whatever procedure step you want to execute. The
example uses the macro language. For details about the macro language, see SAS
Macro Language: Reference.
Program Description
libname printlib 'SAS-data-library';
libname proclib 'SAS-data-library';
options nodate pageno=1;
Copy the desired data sets from the PROCLIB library to the PRINTLIB library. PROC
DATASETS copies two data sets from the PROCLIB library to the PRINTLIB library in
order to limit the number of data sets available to the example.
proc datasets library=proclib memtype=data nolist;
copy out=printlib;
select delay internat;
run;
Create a macro and specify the parameters. The %MACRO statement creates the
macro PRINTALL. When you call the macro, you can pass one or two parameters to
it. The first parameter is the name of the library whose data sets you want to print.
The second parameter, WORKLIB=, names the library in which the macro stores its
temporary data set. If you do not specify this parameter, the WORK library is the
default.
%macro printall(libname,worklib=work);
Create the local macro variables. The %LOCAL statement creates two local macro
variables, NUM and I, to use in a loop.
%local num i;
Produce an output data set. This PROC DATASETS step reads the library that you
specify as a parameter when you invoke the macro. The CONTENTS statement
produces an output data set called TEMP1 in WORKLIB. This data set contains an
observation for each variable in each data set in the library LIBNAME. By default,
each observation includes the name of the data set that the variable is included in
as well as other information about the variable. However, the KEEP= data set
option writes only the name of the data set to TEMP1.
proc datasets library=&libname memtype=data nodetails;
contents out=&worklib..temp1(keep=memname) data=_all_ noprint;
run;
Identify the unique data set names, assign a macro variable to each one, and
assign DATA step information to a macro variable. This DATA step increments the
value of N each time it reads the last occurrence of a data set name (when
LAST.MEMNAME is true). The CALL SYMPUT statement uses the current value of
N to create a macro variable (DS1, DS2, and so on) for each unique value of
MEMNAME in the data set TEMP1. The TRIM function removes trailing blanks from
each data set name so that the TITLE statement in the PROC PRINT step that follows
does not contain extra blanks.
data _null_;
set &worklib..temp1 end=final;
by memname notsorted;
if last.memname;
n+1;
call symput('ds'||left(put(n,8.)),trim(memname));
Determine the number of data sets. When it reads the last observation in TEMP1
(when FINAL is true), the DATA step assigns the value of N to the macro variable
NUM. At this point in the program, the value of N is the number of unique data set
names, that is, the number of data sets in the library.
if final then call symput('num',put(n,8.));
Run the DATA step. The RUN statement is crucial. It forces the DATA step to execute,
so the CALL SYMPUT statements create the macro variables before the %DO loop
that uses them executes.
run;
Print the data sets and end the macro. The %DO loop issues a PROC PRINT step
for each data set. The %MEND statement ends the macro.
%do i=1 %to &num;
proc print data=&libname..&&ds&i noobs;
title "Data Set &libname..&&ds&i";
run;
%end;
%mend printall;
Print all the data sets in the PRINTLIB library. This invocation of the PRINTALL
macro prints all the data sets in the library PRINTLIB.
%printall(printlib)
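The same result can be sketched with a different technique: PROC SQL reading DICTIONARY.TABLES instead of PROC DATASETS and a DATA _NULL_ step. This is a minimal sketch, not part of the original example; it assumes SAS 9.3 or later (for the open-ended INTO :DS1- range), and the macro name PRINTALL_SQL is hypothetical.

```sas
/* Sketch (alternative technique): build DS1, DS2, ... and NUM from    */
/* DICTIONARY.TABLES. The macro name PRINTALL_SQL is hypothetical.     */
%macro printall_sql(libname);
   %local num i;
   proc sql noprint;
      /* One macro variable per data set; INTO with a range trims blanks */
      select memname into :ds1-
         from dictionary.tables
         where libname = "%upcase(&libname)" and memtype = 'DATA';
   quit;
   %let num = &sqlobs;   /* rows selected = number of data sets */
   %do i = 1 %to &num;
      proc print data=&libname..&&ds&i noobs;
         title "Data Set &libname..&&ds&i";
      run;
   %end;
%mend printall_sql;

%printall_sql(printlib)
```

If the library contains no data sets, &SQLOBS is 0 and the %DO loop does not execute.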
Output: HTML
Output 48.25 Data Set PRINTLIB.DELAY
1856 Chapter 48 / PRINT Procedure
49
PRINTTO Procedure
You can store the SAS log or procedure output in an external file or in a SAS catalog
entry. To write SAS output to a file or a catalog entry, the ODS LISTING destination
must be open. With additional programming, you can use SAS output as input data
within the same job.
Table 49.1 Default Destinations for SAS Log and Procedure Output
Restrictions: To route SAS log and procedure output directly to a printer, you must use a
FILENAME statement with the PROC PRINTTO statement. See “Route SAS Log or
Procedure Output Directly to a Printer” on page 1864 and “Example 4: Routing to a
Printer” on page 1878.
The PRINTTO procedure does not define ODS destinations.
When SAS is started in objectserver mode, the PRINTTO procedure does not route
log messages to the log specified by the ALTLOG= system option.
Note: LOG=LOG and PRINT=PRINT route the log and procedure output to the default
destinations. However, specifying LOG=PRINT or PRINT=LOG to route log or
procedure output to the same default destination is not valid.
Tips: To reset the destination for the SAS log and procedure output to the default, use
the PROC PRINTTO statement without options.
To route the SAS log and procedure output to the same file, specify the same file
with both the LOG= and PRINT= options.
Examples: “Example 1: Routing to External Files” on page 1865
“Example 2: Routing to SAS Catalog Entries” on page 1869
“Example 3: Using Procedure Output as an Input File” on page 1873
“Example 4: Routing to a Printer” on page 1878
Syntax
PROC PRINTTO <options>;
Without Arguments
When no options are specified, the PROC PRINTTO statement does the following:
- closes any files opened by a PROC PRINTTO statement
- points both the SAS log and SAS procedure output to their default destinations
Interaction: To close the appropriate file and to return only the SAS log or procedure output
to its default destination, use LOG=LOG or PRINT=PRINT.
Examples:
“Example 1: Routing to External Files” on page 1865
“Example 2: Routing to SAS Catalog Entries” on page 1869
Optional Arguments
LABEL='description'
provides a description for a catalog entry that contains a SAS log or procedure
output.
Interaction Use the LABEL= option only when you specify a catalog entry as the
value for the LOG= option or the PRINT= option.
LOG=LOG | file-specification | SAS-catalog-entry
LOG
routes the SAS log to its default destination.
file-specification
routes the SAS log to an external file. file-specification can be one of the
following:
'external-file'
the name of an external file specified in quotation marks.
log-filename
is an unquoted alphanumeric text string. SAS creates a log that uses log-
filename.log as the log filename.
fileref
a fileref previously assigned to an external file.
SAS-catalog-entry
routes the SAS log to a SAS catalog entry. By default, libref is SASUSER,
catalog is PROFILE, and type is LOG. Express SAS-catalog-entry in one of the
following ways:
libref.catalog.entry<.LOG>
a SAS catalog entry stored in the SAS library and SAS catalog specified.
catalog.entry<.LOG>
a SAS catalog entry stored in the specified SAS catalog in the default
SAS library SASUSER.
entry.LOG
a SAS catalog entry stored in the default SAS library and catalog:
SASUSER.PROFILE.
fileref
a fileref previously assigned to a SAS catalog entry. Search for
"FILENAME, CATALOG Access Method" in the SAS online
documentation.
Default LOG
Interactions The SAS log and procedure output cannot be routed to the same
catalog entry at the same time.
The NEW option replaces the existing contents of a file with the
new log. Otherwise, the new log is appended to the file.
To route the SAS log and procedure output to the same file, specify
the same file with both the LOG= and PRINT= options.
When routing the log to a SAS catalog entry, you can use the
LABEL option to provide a description for the entry in the catalog
directory.
When the log is routed to a file other than the default log file and
programs are submitted from multiple sources, the final SAS
system messages that contain the real and CPU times are written
to the default SAS log.
Tips After routing the log to an external file or a catalog entry, you can
specify LOG to route the SAS log back to its default destination.
When routing the SAS log, follow the PROC PRINTTO statement with a RUN
statement. If you omit the RUN statement, the first line of the following DATA or
PROC step is not routed to the new file. (This occurs because a statement does
not execute until a step boundary is crossed.)
If you create a macro that contains a password and you do not want
the password to appear in the SAS log, use the LOG=file-
specification option to redirect the log to an external file.
When you specify LOG=, SAS stores the path of the SAS log file in
the &SYSPRINTTOLOG automatic macro variable. You can use this
macro variable to restore the previous SAS log file location. For
more information, see “Restore the Previous SAS Log or LISTING
Output File Location” on page 1865.
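The RUN-statement tip and the return to the default log can be combined as follows. This minimal sketch uses a fileref PRIVLOG and a path 'private-log-file', both placeholders; the DATA _NULL_ step stands in for whatever steps (for example, a macro that handles a password) you want kept out of the default log.

```sas
/* Sketch: keep selected steps out of the default SAS log.           */
/* PRIVLOG and 'private-log-file' are placeholders for your fileref  */
/* and path.                                                          */
filename privlog 'private-log-file';

proc printto log=privlog new;
run;   /* step boundary: the routing takes effect before the next step */

data _null_;
   put 'NOTE: This line is written to the routed log file.';
run;

/* Route the SAS log back to its default destination. */
proc printto log=log;
run;
```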
NEW
clears any information that exists in a file and prepares the file to receive the
SAS log or procedure output.
Default If you omit NEW, the new information is appended to the existing
file.
Interaction If you specify both LOG= and PRINT=, NEW applies to both.
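As a sketch of this interaction, the following program sends both the SAS log and the procedure output to one file and replaces its previous contents. The path 'combined-file' is a placeholder, and SASHELP.CLASS is used only as sample data.

```sas
/* Sketch: one file receives both the SAS log and the procedure output. */
/* 'combined-file' is a placeholder path; NEW replaces its contents.    */
proc printto log='combined-file' print='combined-file' new;
run;

proc means data=sashelp.class;
   var height weight;
run;

/* Without options, PROC PRINTTO closes the file and restores both defaults. */
proc printto;
run;
```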
PRINT=PRINT | file-specification | SAS-catalog-entry
PRINT
routes procedure output to its default destination.
Tip After routing it to an external file or a catalog entry, you can specify
PRINT to route subsequent procedure output to its default destination.
file-specification
routes procedure output to an external file. file-specification can be one of
the following:
'external-file'
the name of an external file specified in quotation marks.
print-filename
is an unquoted alphanumeric text string. SAS creates a print file that uses
print-filename as the print filename.
Operating Environment Information: For more information about using
print-filename, see the documentation for your operating environment.
fileref
a fileref previously assigned to an external file.
Operating Environment Information: For additional information about file-
specification for the PRINT option, see the documentation for your operating
environment.
SAS-catalog-entry
routes procedure output to a SAS catalog entry. By default, libref is
SASUSER, catalog is PROFILE, and type is OUTPUT. Express SAS-catalog-
entry in one of the following ways:
libref.catalog.entry<.OUTPUT>
a SAS catalog entry stored in the SAS library and SAS catalog specified.
catalog.entry<.OUTPUT>
a SAS catalog entry stored in the specified SAS catalog in the default
SAS library SASUSER.
entry.OUTPUT
a SAS catalog entry stored in the default SAS library and catalog:
SASUSER.PROFILE.
fileref
a fileref previously assigned to a SAS catalog entry. Search for
"FILENAME, CATALOG Access Method" in the SAS online
documentation.
Default PRINT
Interactions When you specify PRINT, FILE=, or NAME=, and the LISTING
destination is not open, the PRINTTO procedure opens the LISTING
destination for the duration of routing the procedure output. If the
LISTING destination was open before PRINT, FILE=, or NAME= was
specified, it remains open after the output has been routed to its
destination.
The procedure output and the SAS log cannot be routed to the
same catalog entry at the same time.
The NEW option replaces the existing contents of a file with the
new procedure output. If you omit NEW, the new output is
appended to the file.
To route the SAS log and procedure output to the same file, specify
the same file with both the LOG= and PRINT= options.
When routing procedure output to a SAS catalog entry, you can use
the LABEL option to provide a description for the entry in the
catalog directory.
Tip When you specify PRINT=, SAS stores the path of the LISTING
output file in the &SYSPRINTTOLIST automatic macro variable.
You can use this macro variable to restore the previous LISTING
output file location. For more information, see “Restore the
Previous SAS Log or LISTING Output File Location” on page 1865.
UNIT=nn
routes the output to the file identified by the fileref FTnnF001, where nn is an
integer between 1 and 99.
Tips You can define this fileref yourself. However, some operating systems
predefine certain filerefs in this form.
When you specify UNIT=, SAS stores the path of the LISTING output file
in the &SYSPRINTTOLIST automatic macro variable. You can use this
macro variable to restore the previous LISTING output file location. For
more information, see “Restore the Previous SAS Log or LISTING Output
File Location” on page 1865.
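For example, the following sketch uses UNIT=15, so the output goes to the file that has the fileref FT15F001. The physical path is a placeholder; check whether your operating environment already predefines that fileref.

```sas
/* Sketch: UNIT=15 writes procedure output to the file that has the  */
/* fileref FT15F001. The path 'unit15-file' is a placeholder.        */
filename ft15f001 'unit15-file';

proc printto unit=15 new;
run;

proc print data=sashelp.class (obs=5);
run;

proc printto;   /* restore the default destinations */
run;
```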
You can specify the beginning page number for the output that you are currently
producing by using the PAGENO= system option in an OPTIONS statement.
The PRINTTO procedure does not support the COLORPRINTING system option. If
you route the SAS log or procedure output to a color printer, the output does not
print in color.
opened when you use proc printto; to reset the output destinations. SAS does
not open the LISTING destination when you specify the LOG= option.
If the LISTING destination is open before the PROC PRINTTO PRINT= option
executes, it remains open after the output is routed to the external file.
SYSPRINTTOLOG contains the path of the SAS log file location before redirection by
the PRINTTO procedure.
SYSPRINTTOLIST contains the path of the LISTING output file location before
redirection by the PRINTTO procedure.
To restore the previous file locations, you specify the appropriate automatic macro
variable as the value of the LOG=, PRINT=, or UNIT= options. Here are some
examples:
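A minimal sketch of such a restore follows. The file names are placeholders, and it assumes that the log and LISTING output were previously written to files, so that &SYSPRINTTOLOG and &SYSPRINTTOLIST are not blank when the last step resolves them.

```sas
/* Sketch: temporarily redirect the log and LISTING output, then     */
/* restore the previous locations. The file names are placeholders.  */
proc printto log='temp-log-file' print='temp-output-file' new;
run;

proc print data=sashelp.class (obs=5);
run;

/* Double quotation marks let the macro variable references resolve. */
proc printto log="&sysprinttolog" print="&sysprinttolist";
run;
```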
Example 1: Routing to External Files
Details
This example uses PROC PRINTTO to route the log and procedure output to an
external file and then reset both destinations to the default.
Program
options nodate pageno=1 linesize=80 pagesize=60 source;
proc printto log='log-file';
run;
data numbers;
input x y z;
datalines;
14.2 25.2 96.8
10.8 51.6 96.8
9.5 34.2 138.2
8.8 27.6 83.2
11.5 49.4 287.0
6.3 42.0 170.7
;
proc printto print='output-file'
new;
run;
proc print data=numbers;
title 'Listing of NUMBERS Data Set';
run;
proc printto;
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page. The SOURCE option writes lines of source code to the
default destination for the SAS log.
options nodate pageno=1 linesize=80 pagesize=60 source;
Route the SAS log to an external file. PROC PRINTTO uses the LOG= option to
route the SAS log to an external file. By default, this log is appended to the current
contents of log-file.
proc printto log='log-file';
run;
Create the NUMBERS data set. The DATA step uses list input to create the
NUMBERS data set.
data numbers;
input x y z;
datalines;
14.2 25.2 96.8
10.8 51.6 96.8
9.5 34.2 138.2
8.8 27.6 83.2
11.5 49.4 287.0
6.3 42.0 170.7
;
Route the procedure output to an external file. PROC PRINTTO routes output to
an external file. Because the LISTING destination must be open in order to route
SAS output to an external file, SAS opens the LISTING destination if it is not
already open. You do not need to include the ODS LISTING statement. Because
NEW is specified, any output written to output-file will overwrite the file's current
contents. If SAS opened the LISTING destination to process the PROC PRINTTO
output, SAS closes the LISTING destination after the output is written to output-
file.
proc printto print='output-file'
new;
run;
Print the NUMBERS data set. The PROC PRINT output is written to the specified
external file.
proc print data=numbers;
title 'Listing of NUMBERS Data Set';
run;
Reset the SAS log and procedure output destinations to default. PROC PRINTTO
routes subsequent logs and procedure output to their default destinations and
closes both of the current files.
proc printto;
run;
Log
Example Code 49.1 Portion of Log Routed to the Default Destination
5
6 data numbers;
7 input x y z;
8 datalines;
15 ;
16
17 proc printto print='print1.out' new;
18 run;
19
20 proc print data=numbers;
21 title 'Listing of NUMBERS Data Set';
22 run;
NOTE: There were 6 observations read from the data set WORK.NUMBERS.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.56 seconds
cpu time 0.09 seconds
23
24 proc printto;
25 run;
Output
Output 49.1 Procedure Output Routed to an External File
Example 2: Routing to SAS Catalog Entries
Details
This example uses PROC PRINTTO to route the SAS log and procedure output to a
SAS catalog entry and then to reset both destinations to the default.
Program
options source;
libname lib1 'SAS-library';
proc printto log=test.log label='Inventory program' new;
run;
data lib1.inventry;
length Dept $ 4 Item $ 6 Season $ 6 Year 4;
input dept item season year @@;
datalines;
3070 20410 spring 2011 3070 20411 spring 2012
3070 20412 spring 2012 3070 20413 spring 2012
3070 20414 spring 2011 3070 20416 spring 2009
3071 20500 spring 2011 3071 20501 spring 2009
3071 20502 spring 2011 3071 20503 spring 2011
3071 20505 spring 2010 3071 20506 spring 2009
3071 20507 spring 2009 3071 20424 spring 2011
;
proc printto print=lib1.cat1.inventry.output
label='Inventory program' new;
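run;
proc report data=lib1.inventry nowindows headskip;
column dept item season year;
title 'Current Inventory Listing';
run;
proc printto;
run;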
Program Description
Set the SAS system options. The SOURCE option specifies to write source
statements to the SAS log.
options source;
Assign a libref.
libname lib1 'SAS-library';
Route the SAS log to a SAS catalog entry. PROC PRINTTO routes the SAS log to a
SAS catalog entry named SASUSER.PROFILE.TEST.LOG. The PRINTTO procedure
uses the default libref and catalog SASUSER.PROFILE because only the entry name
and type are specified. LABEL= assigns a description for the catalog entry.
proc printto log=test.log label='Inventory program' new;
run;
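Create the LIB1.INVENTRY data set. The DATA step creates a permanent SAS
data set.
data lib1.inventry;
length Dept $ 4 Item $ 6 Season $ 6 Year 4;
input dept item season year @@;
datalines;
3070 20410 spring 2011 3070 20411 spring 2012
3070 20412 spring 2012 3070 20413 spring 2012
3070 20414 spring 2011 3070 20416 spring 2009
3071 20500 spring 2011 3071 20501 spring 2009
3071 20502 spring 2011 3071 20503 spring 2011
3071 20505 spring 2010 3071 20506 spring 2009
3071 20507 spring 2009 3071 20424 spring 2011
;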
Route the procedure output to a SAS catalog entry. PROC PRINTTO opens the
LISTING destination in order to route the procedure output from the subsequent
PROC REPORT step to the SAS catalog entry LIB1.CAT1.INVENTRY.OUTPUT.
LABEL= assigns a description for the catalog entry. After the procedure output is
routed to the SAS catalog, PROC PRINTTO closes the LISTING destination.
proc printto print=lib1.cat1.inventry.output
label='Inventory program' new;
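run;
Produce the inventory report. PROC REPORT lists the variables in LIB1.INVENTRY.
Because of the preceding PROC PRINTTO step, this output is routed to the catalog
entry LIB1.CAT1.INVENTRY.OUTPUT.
proc report data=lib1.inventry nowindows headskip;
column dept item season year;
title 'Current Inventory Listing';
run;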
Reset the SAS log and procedure output back to the default and close the file.
PROC PRINTTO closes the current files that were opened by the previous PROC
PRINTTO step and reroutes subsequent SAS logs and procedure output to their
default destinations.
proc printto;
run;
Log
To view this log using SAS Explorer, select Sasuser > Profile. Double-click Test.
The log opens in NOTEPAD.
49
50 data lib1.inventry;
51 length Dept $ 4 Item $ 6 Season $ 6 Year 4;
52 input dept item season year @@;
53 datalines;
NOTE: SAS went to a new line when INPUT statement reached past the end of a
line.
NOTE: The data set LIB1.INVENTRY has 14 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
61 ;
62 ods listing;
63 proc printto print=lib1.cat1.inventry.output
64 label='Inventory program' new;
65 run;
66
67 proc report data=lib1.inventry nowindows headskip;
68 column dept item season year;
69 title 'Current Inventory Listing';
70 run;
NOTE: There were 14 observations read from the data set LIB1.INVENTRY.
NOTE: PROCEDURE REPORT used (Total process time):
real time 0.09 seconds
cpu time 0.04 seconds
71
72 proc printto;
73 run;
NOTE: PROCEDURE PRINTTO used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Output
To view this output using SAS Explorer, select Lib1 > Cat1. Double-click Inventry.
The output opens in NOTEPAD.
Example 3: Using Procedure Output as an Input File
Details
This example uses PROC PRINTTO to route procedure output to an external file
and then uses that file as input to a DATA step.
Program
data test;
do n=1 to 1000;
x=int(ranuni(77777)*7);
y=int(ranuni(77777)*5);
output;
end;
run;
filename routed 'output-filename';
proc printto print=routed new;
run;
proc freq data=test;
tables x*y / chisq;
run;
proc printto print=print;
run;
data probtest;
infile routed;
input word1 $ @;
if word1='Chi-Squa' then
do;
input df chisq prob;
keep chisq prob;
output;
end;
run;
proc print data=probtest;
title 'Chi-Square Analysis for Table of X by Y';
run;
Program Description
Generate random values for the variables. The DATA step uses the RANUNI
function to randomly generate values for the variables X and Y in the data set TEST.
data test;
do n=1 to 1000;
x=int(ranuni(77777)*7);
y=int(ranuni(77777)*5);
output;
end;
run;
Assign a fileref and route procedure output to the file that is referenced. The
FILENAME statement assigns a fileref to an external file. PROC PRINTTO routes
subsequent procedure output to the file that is referenced by the fileref ROUTED.
PROC PRINTTO opens the LISTING destination for the duration of routing the
procedure output. See PROC FREQ Output Routed to the External File Referenced
as ROUTED below.
filename routed 'output-filename';
proc printto print=routed new;
run;
Produce the frequency counts. PROC FREQ computes frequency counts and a chi-
square analysis of the variables X and Y in the data set TEST. This output is routed
to the file that is referenced as ROUTED.
proc freq data=test;
tables x*y / chisq;
run;
Close the file. You must use another PROC PRINTTO to close the file that is
referenced by fileref ROUTED so that the following DATA step can read it. The step
also routes subsequent procedure output to the default destination. PRINT= causes
the step to affect only procedure output, not the SAS log.
proc printto print=print;
run;
Create the data set PROBTEST. The DATA step uses ROUTED, the file containing
PROC FREQ output, as an input file and creates the data set PROBTEST. This DATA
step reads all records in ROUTED but creates an observation only from a record
that begins with Chi-Squa.
data probtest;
infile routed;
input word1 $ @;
if word1='Chi-Squa' then
do;
input df chisq prob;
keep chisq prob;
output;
end;
run;
Print the PROBTEST data set. PROC PRINT produces a simple listing of data set
PROBTEST. This output is routed to the default destination. See PROC PRINT
Output of Data Set PROBTEST, Routed to Default Destination in the Output
section.
proc print data=probtest;
title 'Chi-Square Analysis for Table of X by Y';
run;
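As an alternative to parsing routed text (a different technique from the one in this example), the chi-square statistic can be captured directly with ODS OUTPUT. This sketch assumes the TEST data set from the first DATA step; CHISQTAB is a hypothetical data set name, and ChiSq is the ODS table name that PROC FREQ uses for the CHISQ statistics, with the columns Statistic, DF, Value, and Prob.

```sas
/* Sketch (alternative technique): capture the chi-square statistic */
/* with ODS OUTPUT instead of routing and parsing a text file.      */
/* CHISQTAB is a hypothetical data set name.                        */
ods output ChiSq=chisqtab;

proc freq data=test;
   tables x*y / chisq;
run;

proc print data=chisqtab noobs;
   where Statistic = 'Chi-Square';
   var DF Value Prob;
   title 'Chi-Square Analysis for Table of X by Y';
run;
```

Because the values come from a data set rather than from a text layout, this approach does not depend on the column positions or the spelling of words in the printed output.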
Output
Output 49.3 PROC FREQ Output Routed to the External File Referenced as ROUTED
Output 49.4 PROC PRINT Output of Data Set PROBTEST, Routed to the Default Destination
Example 4: Routing to a Printer
Details
This example uses PROC PRINTTO to route procedure output directly to a printer.
Program
options nodate pageno=1 linesize=80 pagesize=60;
filename your_fileref printer
'printer-name';
proc printto print=your_fileref;
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Associate a fileref with the printer name. The FILENAME statement associates a
fileref with the printer name that you specify. If you want to associate a fileref with
the default printer, omit 'printer-name'.
filename your_fileref printer
'printer-name';
Route procedure output to the printer. The PRINT= option specifies the fileref that
is associated with the printer, so PROC PRINTTO routes subsequent procedure
output to that printer.
proc printto print=your_fileref;
run;
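The example ends after the routing step. A sketch of how a session might continue follows: procedure output produced next is sent to the printer, and a PROC PRINTTO step without options then closes the fileref and restores the default destinations. SASHELP.CLASS is used only as sample data.

```sas
/* Sketch: after the routing step, procedure output goes to the   */
/* printer fileref. SASHELP.CLASS is used only as sample data.    */
proc print data=sashelp.class;
   title 'Class Listing Routed to the Printer';
run;

/* Close the printer fileref and restore the default destinations. */
proc printto;
run;
```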
50
PRODUCT_STATUS Procedure
Overview: PRODUCT_STATUS Procedure
Special Considerations
PROC PRODUCT_STATUS does not return information about web applications or
other Java-based products.
If your site has installed a SAS Metadata Server, then you should use the SAS
ViewRegistry utility instead of PROC PRODUCT_STATUS.
The SYSVLONG and SYSVLONG4 automatic macro variables return only the
version information for the SAS host image that is installed at your site. They do
not return information for all of the SAS Foundation products that are installed at
your site. For more information, see “SYSVLONG Automatic Macro Variable” in SAS
Macro Language: Reference and “SYSVLONG4 Automatic Macro Variable” in SAS
Macro Language: Reference.
Statement | Task | Example
PROC PRODUCT_STATUS | Specify the names and versions of the SAS Foundation products that are installed on your operating system. | “Example: Results from PROC PRODUCT_STATUS”
Restriction: PROC PRODUCT_STATUS is deprecated for SAS Viya 3.5 and will not be available
in future SAS Viya releases. PROC PRODUCT_STATUS is available to SAS 9.4
users.
Syntax
PROC PRODUCT_STATUS;
Details
The PROC PRODUCT_STATUS statement does not have any arguments.
Example: Results from PROC PRODUCT_STATUS
Here is a partial output that contains an example of the results that are produced
by PROC PRODUCT_STATUS.
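Because the statement has no arguments, PROC PRODUCT_STATUS is run as a step on its own. A minimal sketch, assuming a SAS 9.4 session in which the results are written to the SAS log:

```sas
/* Sketch: list the installed SAS Foundation products and versions. */
/* In typical SAS 9.4 sessions, the results appear in the SAS log.  */
proc product_status;
run;
```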
51
PROTO Procedure
C Structures in SAS
Basic Concepts
Many C language libraries contain functions that have structure pointers as
arguments. In SAS, structures can be defined only in PROC PROTO. After being
defined, they can be declared and instantiated within many PROC PROTO
compatible procedures, such as PROC COMPILE.
Each structure is set to zero values at declaration time. The structure retains the
value from the previous pass through the data to start the next pass.
Structure elements are referenced by using the static period (.) notation of C. There
is no pointer syntax for SAS. If a structure points to another structure, the only way
to reference the structure that is pointed to is by assigning the pointer to a declared
structure of the same type. You use that declared structure to access the elements.
An array entry in a structure can be used in the same way as a SAS array, as long
as its dimension is declared in the structure, because SAS must know the length of
the array. This requirement includes arrays of short, int, and long types. If the
entry is actually a pointer to an array of a double type, then the array elements can
be accessed by assigning that pointer to a SAS array. Pointers to arrays of other
types cannot be accessed by using the array syntax.
Structure Example
proc proto package =
sasuser.mylib.struct
label = "package of structures";
struct foo2 {
str tom;
};
Enumerations in SAS
Enumerations are mnemonics for integer numbers. Enumerations enable you to set
a literal name as a specific number and aid in the readability and supportability of C
programs. Enumerations are used in C language libraries to simplify the return
codes. After a C program is compiled, you can no longer access enumeration names.
typedef enum
{
True, False, Maybe
} YesNoMaybeType;
typedef enum {
Ten=10, Twenty=20, Thirty=30, Forty=40, Fifty=50
} Tens;
typedef struct {
short rows;
short cols;
YesNoMaybeType type;
Tens dollar;
ExerciseArray dates;
} EStructure;
run;
The following PROC FCMP example shows how to access these enumerated types.
In this example, the enumerated values that are set up in PROC PROTO are
implemented in SAS as macro variables. Therefore, they must be accessed using
the & symbol:
proc fcmp library=sasuser.mylib;
mystruct.type=&True;
mystruct.dollar=&Twenty;
run;
The function name tells PROC PROTO which function's source code is specified
between the EXTERNC and EXTERNCEND statements. When PROC PROTO
compiles source code, it includes any structure definitions and C function
prototypes that are currently declared. However, typedef and #define are not
included.
Function Description
The following example shows a simple C function written directly in PROC PROTO:
proc proto
package=sasuser.mylib.foo;
struct mystruct {
short a;
long b;
};
int fillMyStruct(short a, short b,
struct mystruct * s);
externc fillMyStruct;
int fillMyStruct(short a, short b,
struct mystruct * s) {
s->a = a;
s->b = b;
return(0);
}
externcend;
run;
- The union type is not supported. However, if you plan to use only one element
of the union, you can declare the variable for the union as the type for that
element.
- All non-pointer references to other structures must be defined before they are
used.
- You cannot use the ENUM keyword in a structure. To specify an enumeration in a
structure, use the TYPEDEF keyword.
- Structure elements whose names differ only in case (for example, ALPHA, Alpha,
and alpha) are not supported. Because SAS is not case-sensitive, all structure
element names must be unique when compared without regard to case.
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Example: “Example: Splitter Function Example” on page 1910
Syntax
PROC PROTO PACKAGE=entry <options>;
PROC PROTO Statement
Required Arguments
PACKAGE=entry
specifies the SAS entry where the prototype information is saved. Entry is a
three-level name having the following form: library.dataset.package.
Package enables you to specify grouping in the GUI. Package must be unique for
the first 8 characters.
function-prototype-1
function-prototype-n
contains the C code of the function prototypes.
Optional Arguments
ENCRYPT | HIDE
specifies that encoding within a database is allowed.
LABEL=package-label
specifies a text string that is used to describe or label the package. The
maximum length of the label is 256 characters.
NOSIGNALS
specifies that none of the functions in a package will produce exceptions or
signals.
STDCALL
for Windows PC platforms only, indicates that all functions in the package are
called using the "_stdcall" convention.
STRUCTPACKn | PACKn
for 32-bit Windows PC platforms or the Power Architecture platform running on
Linux only, specifies that all structures in this package were compiled with the
given n-byte packing pragma. That is, STRUCTPACK4 specifies that all
structures in the package were compiled with the “#pragma pack(4)” option.
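To illustrate what STRUCTPACK4 assumes about layout, the following C sketch compiles the same two members with and without 4-byte packing. The structure and function names are hypothetical. The push and pop forms of the pragma, which GCC, Clang, and Microsoft Visual C++ all accept, restore the previous packing after the structure. With 4-byte packing, the double member starts at offset 4. With natural alignment on a typical 64-bit build, it usually starts at offset 8. A package whose packing setting does not match the compiled library could therefore read the wrong bytes.

```c
#include <stddef.h>

/* Hypothetical structure with the compiler's natural alignment. */
struct natural {
    char   c;
    double d;
};

/* The same members compiled with 4-byte packing, which is what
   STRUCTPACK4 tells PROC PROTO to expect. */
#pragma pack(push, 4)
struct packed4 {
    char   c;
    double d;
};
#pragma pack(pop)

/* Byte offset of d under each layout. */
size_t natural_offset(void) { return offsetof(struct natural, d); }
size_t packed4_offset(void) { return offsetof(struct packed4, d); }
```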
LINK Statement
Specifies the name of the load module that contains your functions. Specifying the path is optional.
Restriction: The LINK statement is not supported for z/OS or MVS environments.
Windows specifics: The image that is specified in the LINK statement must not
contain an extension for the image to load.
Syntax
LINK load-module <NOUNLOAD>;
Required Argument
load-module
specifies the load module that contains your functions. You can add more LINK
statements to include as many libraries as you need for your prototypes.
Load-module can have the following forms, depending on your operating
environment:
'c:\mylibs\xxx.dll';
'c:\mylibs\xxx';
'/users/me/mylibs/xxx';
Tip If the full pathname for a module is larger than 32 bytes, use the
PROTOLIBS system option to specify the path. Then use the LINK
statement to specify the module name.
Optional Argument
NOUNLOAD
specifies that selected libraries remain loaded when the SAS session ends.
Details
All functions must be declared externally in your load module so that SAS can find
them. For most platforms, external declaration is the default behavior for the
compiler. However, many C compilers do not export function names by default. The
following examples show how to declare your functions for external loading for
most PC compilers:
_declspec(dllexport) int myfunc(int, double);
_declspec(dllexport) int price2(int a, double foo);
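If the same source must build on both Windows and UNIX-like systems, one common approach is to hide the export declaration behind a macro. This is an assumption for illustration, not a SAS requirement. The sketch uses the double-underscore spelling __declspec that Microsoft documents. The function bodies are hypothetical; only the prototypes for myfunc and price2 come from the examples above. GCC and Clang export functions with default visibility unless the library is built with -fvisibility=hidden, so on those compilers the attribute mainly documents intent.

```c
#include <assert.h>

/* Export marker. Microsoft compilers (and MinGW) use
   __declspec(dllexport); GCC and Clang elsewhere use default
   visibility. This macro is an assumption, not a SAS requirement. */
#if defined(_WIN32)
#  define EXPORT __declspec(dllexport)
#else
#  define EXPORT __attribute__((visibility("default")))
#endif

/* Hypothetical bodies for the two prototypes shown above. */
EXPORT int myfunc(int n, double x)
{
    return n + (int)x;          /* truncates x toward zero */
}

EXPORT int price2(int a, double foo)
{
    assert(a != 0);             /* avoid division by zero */
    return (int)(foo / a);
}
```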
CAUTION
Security issue: Calling a third-party library from a valid load-library path that is
set by the LINK statement can create a security risk.
Beginning with SAS 9.4M6, the SAS administrator at your site can use the
PROTOLIBS system option to control the function of the LINK statement. The
administrator can use the PROTOLIBS option to specify a list of valid paths where
modules that are specified by the LINK statement can be loaded. The administrator
can also use the PROTOLIBS option to disable the LINK statement.
The PROTOLIBS option is a restricted option. Only the SAS administrator at your
site can specify a value for it. The default value of the PROTOLIBS option is NONE,
which means that the LINK statement is disabled.
You can use the OPTION=PROTOLIBS and VALUE arguments of PROC OPTIONS to find
where you can register load modules.
MAPMISS Statement
Specifies alternative values, by type, to pass to functions if values are missing.
Syntax
MAPMISS <POINTER=pointer-value> <INT=integer-value> <DOUBLE=double-value>
<LONG=long-value> <SHORT=short-value>;
Optional Arguments
POINTER=pointer-value
specifies the pointer value to pass to functions for pointer values that are
missing.
Default null
INT=integer-value
specifies an integer value to pass to functions for integer values that are
missing.
DOUBLE=double-value
specifies a double value to pass to functions for double values that are missing.
LONG=long-value
specifies a long value to pass to functions for long values that are missing.
SHORT=short-value
specifies a short value to pass to functions for short values that are missing.
Details
The MAPMISS statement is used to specify alternative values, by data type or
pointer value. These values are passed to functions if values are missing. The
values are specified as arguments in the MAPMISS statement.
If you set POINTER=NULL, a null value pointer is passed to the functions for
pointer variables that are missing. If you do not specify a mapping for a type that is
used as an argument to a function, the function is not called when an argument of
that type is missing.
MAPMISS values have no effect on arrays because array elements are not checked
for missing values when they are passed as parameters to C functions.
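The rule for a missing argument amounts to a small decision. If a MAPMISS value is defined for the argument's type, that value is passed instead. If no value is defined, the call is skipped. The following C sketch models that decision for a single double argument. It is a conceptual model only, not how SAS implements MAPMISS. A NaN stands in for a SAS missing value, and call_with_mapmiss and twice are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Conceptual model only. A NaN stands in for a SAS missing value, and
   `mapped` is NULL when no MAPMISS DOUBLE= value has been specified. */
typedef double (*double_fn)(double);

/* Calls fn(arg) and stores the result in *out, unless arg is missing and
   no mapping exists. Returns true if fn was called. */
bool call_with_mapmiss(double_fn fn, double arg, const double *mapped,
                       double *out)
{
    assert(fn != NULL && out != NULL);
    if (isnan(arg)) {
        if (mapped == NULL)
            return false;       /* no mapping: the function is not called */
        arg = *mapped;          /* mapping: pass the MAPMISS value instead */
    }
    *out = fn(arg);
    return true;
}

/* Hypothetical external function used to exercise the model. */
double twice(double x)
{
    return 2.0 * x;
}
```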
FUNCTION-PROTOTYPE-N Statement
Registers function prototypes in the PROTO procedure.
Syntax
function-prototype-1 <function-prototype-n ...> return-type function-name
(argument-type <argument-name> / <iotype>
<argument-label>, ...) <options>;
Required Arguments
return-type
specifies a C language type for the returned value.
function-name
specifies the name of the function to be registered.
Tip Function names within a given package must be unique in the first 32
characters.
argument-type
specifies the C language type for the function argument.
You must specify argument-type for each argument in the function’s argument
list. The argument list must be enclosed in parentheses. If the argument is an
array, then you must specify the argument name followed by square brackets
that contain the array size (for example, double A[10]). If the size is not known
or if you want to disable verification of the length, then use type*name instead
(for example, double*A).
argument-name
specifies the name of the argument.
iotype
specifies the I/O type of the argument. Use I for input, O for output, and U for
update.
Alias IO
By default, all parameters that are pointers are assumed to have I/O type U
(update). All non-pointer values are assumed to have I/O type I (input). This
behavior parallels the C language parameter passing scheme.
argument-label
specifies a description or label for the argument.
LABEL="text-string"
specifies a description or a label for the function. Enclose the text string in
quotation marks.
Note The LABEL option can be used with the PROTO procedure.
KIND | GROUP=group-type
specifies the group that the function belongs to. The KIND= or GROUP= option
allows for convenient grouping of functions in a package.
You can use any string (up to 40 characters) in quotation marks to group similar
functions.
Note The KIND or GROUP option can be used with the PROTO procedure.
Tip The following special cases provided for Risk Dimensions do not require
quotation marks: INPUT (Instrument Input), TRANS (Risk Factor
Transformation), PRICING (Instrument Pricing), and PROJECT. The
default is PRICING.
In the following example, the allocated length of str is 10, but the current length
is 5. Because the string is null-terminated at its allocated length, "hello" followed
by five blanks ("hello     ") is passed to the function xxx:
length str $ 10;
str = "hello";
call xxx(str);
To avoid the blank padding, use the SAS TRIM function on the parameter within the
function call:
length str $ 10;
str = "hello";
call xxx(trim(str));
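The following C sketch shows why the padding matters to the called function. The routine is_hello is a hypothetical stand-in for xxx that compares its argument with "hello". The routine rtrim_blanks only approximates what TRIM does before the call. For example, it returns an empty string for an all-blank value, whereas TRIM returns a single blank.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for xxx: returns 1 only if it receives
   exactly "hello". */
int is_hello(const char *s)
{
    assert(s != NULL);
    return strcmp(s, "hello") == 0;
}

/* Removes trailing blanks in place; a rough approximation of TRIM. */
void rtrim_blanks(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == ' ')
        s[--n] = '\0';
}
```

The padded ten-character buffer fails the comparison. After the trailing blanks are removed, the same buffer matches.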
Function Names
External functions and FCMP functions can have the same name as long as they are
saved to different packages. When these packages are loaded, a warning message
in the log identifies which package contains the default definition for a given
function. To use a function definition from a package other than the default, call the
function using package-name.function-name.
When you load multiple packages of external functions, all function names must be
unique. If two or more external functions of the same name are loaded, the first
function that is loaded will be used. Duplicate external functions are ignored. A
warning message in the log indicates which package contains the function that will
be used, and which package contained the discarded definition.
In working with arrays, it is important to note that SAS arrays are passed to
EXTERNC routines as single dimensional arrays. They will be read as single
dimensional arrays when returning to an FCMP function.
The following example shows log output that is generated when you use arrays and
the double ** variable type.
options cmplib=work.proto_ds;
data cmat;
array ctmp[3,3] _temporary_;
array c[3,3];
output;
call sas_idmat(ctmp);
do i=1 to dim(c,1);
do j=1 to dim(c,2);
c[i,j]=ctmp[i,j];
end;
end;
output;
drop i j;
run;
b[1, 1]=1 b[1, 2]=0 b[1, 3]=0 b[2, 1]=0 b[2, 2]=1 b[2, 3]=0 b[3, 1]=0 b[3, 2]=0 b[3, 3]=1
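The 3 x 3 array arrives in the C routine as one flat block, so the routine must compute the position of each element itself. The body of sas_idmat is not shown in this example, so the following C sketch is hypothetical. The name idmat and the explicit size argument n are assumptions. The sketch also assumes that SAS stores the array elements in row-major order, so element [i,j] (1-based in SAS) is at offset (i-1)*n + (j-1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical body for a routine like sas_idmat. The n x n SAS array
   arrives as n*n consecutive doubles; assuming row-major order, SAS
   element [i,j] (1-based) is m[(i-1)*n + (j-1)]. */
void idmat(double *m, size_t n)
{
    assert(m != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            m[i * n + j] = (i == j) ? 1.0 : 0.0;
}
```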
There is no way to return and save a pointer to any type in a SAS variable. Pointers
are always dereferenced, and their contents are converted and copied to SAS
variables.
EXTERNC DOUBLE | INT | LONG | SHORT | CHAR <[*][*]> var-1 <var-2 ... var-n>;
The following table (Table 51.4 on page 1903) shows how these variables are
treated when they are positioned on the left side of an expression. The table shows
the automatic casting that is performed for a short type on the right side of an
assignment. (Explicit type conversions can be forced in any expression, with a
unary operator called a cast.) The table lists all the allowed combinations of short
types that are associated with SAS variables.
Note: A table for int, long, and double types can be created by substituting any of
these types for "short" in this table.
If any of the pointers are null and require dereferencing, then the result is set to
missing if there is a missing value set for the result variable. For more information,
see “MAPMISS Statement” on page 1897.
Table 51.4 Automatic Type Casting for the short Data Type in an Assignment
Statement
short short ** y = ** x
The following table shows how these variables are treated when they are passed as
arguments to an external C function.
Definitions are loaded regardless of whether they have unique names or duplicate
names. Multiple definitions of certain PROC PROTO elements (for example,
enumeration names and function prototypes) can cause name conflicts and
generate errors. To prevent name conflicts between PROC PROTO packages,
ensure that elements such as enumerated types and function definitions have
unique names.
The following example loads three PROC PROTO packages and shows the order in
which typedef and #define statements override one another.
This part of the example loads the first two PROC PROTO packages:
proc fcmp;
x = p1();
put "Should be 2: " x=;
run;
The result from executing the programs above is 2, because the packages are
loaded in order.
In the following example, PROC PROTO adds a third package and includes it in
PROC FCMP locally, keeping the CMPLIB= system option set as above:
proc proto package = work.p3.test3;
typedef struct { int a; int b; } AB_t;
#define NUM 3;
int p3(void);
externc p3;
int p3(void)
{
return NUM;
}
externcend;
run;
In this example, the local definition of NUM in work.p3 is used instead of the global
definitions that are loaded through work.p1 and work.p2.
In the following example, the LINKLIST structure and the GET_LIST function are
defined by using PROC PROTO. The GET_LIST function is an external C routine that
generates a linked list with as many elements as requested:
struct linklist{
double value;
struct linklist * next;
};
The following example shows how to use the ISNULL helper function to loop over
the linked list that is created by the GET_LIST function:
struct linklist list;
list = get_list(3);
put list.value=;
do while (^isnull(list.next));
list = list.next;
put list.value=;
end;
LIST.value=0
LIST.value=1
LIST.value=2
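The C body of GET_LIST is not shown above. The following sketch is a hypothetical implementation that is consistent with the output (values 0, 1, and 2 for a request of three elements). It also includes a C loop that mirrors the ISNULL loop and a function that releases the list. The names list_length and free_list are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

struct linklist {
    double value;
    struct linklist *next;
};

/* Releases every node; safe to call with NULL. */
void free_list(struct linklist *list)
{
    while (list != NULL) {
        struct linklist *next = list->next;
        free(list);
        list = next;
    }
}

/* Hypothetical GET_LIST body: builds n nodes holding 0, 1, ..., n-1.
   Returns NULL for n == 0 or if an allocation fails. */
struct linklist *get_list(int n)
{
    assert(n >= 0);
    struct linklist *head = NULL;
    struct linklist **link = &head;   /* where the next node is attached */
    for (int i = 0; i < n; i++) {
        struct linklist *node = malloc(sizeof *node);
        if (node == NULL) {
            free_list(head);
            return NULL;
        }
        node->value = i;
        node->next = NULL;
        *link = node;
        link = &node->next;
    }
    return head;
}

/* C counterpart of the ISNULL loop: walks the list until next is NULL. */
int list_length(const struct linklist *list)
{
    int count = 0;
    for (; list != NULL; list = list->next)
        count++;
    return count;
}
```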
CALL SETNULL(pointer-element);
If you specify a variable that has a pointer value (a structure entry), then SETNULL
sets the pointer to null:
call setnull(12.next);
The following example assumes that the same LINKLIST structure that is described
in “ISNULL C Helper Function” on page 1907 is defined using PROC PROTO. The
SETNULL CALL routine can be used to set the next element to null:
proc fcmp;
struct linklist list;
call setnull(list.next);
run;
Struct_array specifies an array; index is a 1–based index as used in SAS arrays; and
struct_element points to an element in the array.
The following example consists of two parts. Copy and paste the two parts of the
example into your SAS editor, and run them as one SAS program.
In the first part of this example, the following structures and function are defined
using PROC PROTO:
options cmplib=(work.proto_ds work.fcmp_ds);
proc proto package=work.proto_ds.cfcns;
struct POINT {
short s;
int i;
long l;
double d;
};
struct POINT_ARRAY {
int length;
struct POINT * p;
char name[32];
};
struct POINT * struct_array( int );
externc struct_array;
struct POINT * struct_array( int num ) {
return(malloc(sizeof(struct POINT) * num));
}
externcend;
run;
In the second part of this example, the PROC FCMP code segment shows how to
use the STRUCTINDEX CALL routine to get and set each POINT structure element
of an array called P in the POINT_ARRAY structure:
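proc fcmp;
struct point_array pntarray;
struct point pnt;
/* Allocate two POINT structures for the P element. */
pntarray.p = struct_array(2);
pntarray.length = 2;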
/* Get each element using the STRUCTINDEX CALL routine and set
values. */
do i = 1 to 2;
call structindex(pntarray.p, i, pnt);
put "Before setting the" i "element: " pnt=;
pnt.s = 1;
pnt.i = 2;
pnt.l = 3;
pnt.d = 4.5;
put "After setting the" i "element: " pnt=;
end;
run;
Example Code 51.2 Results from the STRUCTINDEX CALL Routine
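In C terms, struct_array returns a pointer to a contiguous block of POINT structures, and STRUCTINDEX selects one of them with a 1-based index. The following sketch mirrors that relationship under stated assumptions. The point_at function is a hypothetical helper, not the STRUCTINDEX implementation. This version of struct_array also adds a size check and an explicit cast that the example above does not have.

```c
#include <assert.h>
#include <stdlib.h>

struct POINT {
    short  s;
    int    i;
    long   l;
    double d;
};

/* Same allocation as struct_array in the example, with a size check
   added. The caller owns the memory and must free() it. */
struct POINT *struct_array(int num)
{
    assert(num > 0);
    return malloc(sizeof(struct POINT) * (size_t)num);
}

/* Hypothetical helper mimicking STRUCTINDEX: index is 1-based, as in
   SAS arrays, so index 1 refers to p[0]. */
struct POINT *point_at(struct POINT *p, int length, int index)
{
    assert(p != NULL && index >= 1 && index <= length);
    return &p[index - 1];
}
```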
Example: Splitter Function Example
Details
This example shows how to use PROC PROTO to prototype two external C
language functions called SPLIT and CASHFLOW. These functions are contained in
the two shared libraries that are specified by the LINK statements.
Program
options nodate pageno=1 linesize=80 pagesize=40;
proc proto package =
sasuser.myfuncs.mathfun
label = "package of math functions";
link "link-library";
link "link-library";
int split(int x "number to split")
label = "splitter function" kind=PRICING;
int cashflow(double amt, double rate, int periods,
double * flows / iotype=O)
label = "cash flow function" kind=PRICING;
run;
proc fcmp libname=sasuser.myfuncs;
array flows[20];
a = split(32);
put a;
b = cashflow(1000, .07, 20, flows);
put b;
put flows;
run;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
LINESIZE= specifies the output line length. PAGESIZE= specifies the number of
lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Specify the catalog entry where the function package information is saved. The
catalog entry is a three-level name.
proc proto package =
sasuser.myfuncs.mathfun
label = "package of math functions";
Specify the libraries that contain the SPLIT and CASHFLOW functions. You can
add more LINK statements to include as many libraries as you need for your
prototypes.
link "link-library";
link "link-library";
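Prototype the SPLIT and CASHFLOW functions. The INT statements prototype the
SPLIT and CASHFLOW functions and assign a label to each function. IOTYPE=O
specifies that FLOWS is an output argument.
int split(int x "number to split")
label = "splitter function" kind=PRICING;
int cashflow(double amt, double rate, int periods,
double * flows / iotype=O)
label = "cash flow function" kind=PRICING;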
Execute the PROTO procedure. The RUN statement executes the PROTO
procedure.
run;
Call the SPLIT and CASHFLOW functions. PROC FCMP calls the SPLIT and
CASHFLOW functions. Output from PROC FCMP is created.
proc fcmp libname=sasuser.myfuncs;
array flows[20];
a = split(32);
put a;
b = cashflow(1000, .07, 20, flows);
put b;
put flows;
Execute the FCMP procedure. The RUN statement executes the FCMP procedure.
run;
16
12
70 105 128.33333333 145.83333333 159.83333333 171.5 181.5 190.25 198.02777778
205.02777778
211.39141414 217.22474747 222.60936286 227.60936286 232.27602953 236.65102953
240.76867658
244.65756547 248.341776 251.841776
52
PRTDEF Procedure
See Also
Chapter 53, “PRTEXP Procedure,” on page 1933
PROC PRTDEF: Create printer definitions in batch mode either for an individual user
or for all SAS users at your site (Ex. 1, Ex. 2, Ex. 3, Ex. 4, Ex. 5)
Syntax
PROC PRTDEF <options>;
Optional Arguments
DATA=SAS-data-set
specifies the SAS data set that contains the printer attributes.
See “Input Data Set Variables That Are Used To Create Printer
Definitions” on page 1916
DELETE
specifies that the default operation is to delete the printer definitions from the
registry.
Interaction If both DELETE and REPLACE are specified, then DELETE is the
default operation.
FOREIGN
specifies that the registry entries are being created for export to a different
host. As a consequence, tests of any host-dependent items, such as the
TRANTAB, are skipped.
LIST
specifies that a list of printers that are created or replaced is written to the log.
REPLACE
specifies that the default operation is to modify existing printer definitions. Any
printer name that already exists is modified by using the information in the
printer attributes data set. Any printer name that does not exist is added.
USESASHELP
specifies that the printer definitions are to be placed in the Sashelp library,
where they are available to all users. If the USESASHELP option is not specified,
then the printer definitions are placed in the current Sasuser library, where they
are available to the local user only.
Windows Specifics: You can create printer definitions with PROC PRTDEF in
the Windows operating environment. However, because Universal Printing is
turned off by default in Windows, these printer definitions do not appear in the
Print window. If you want to use your printer definitions when Universal Printing
is turned off, then either specify the printer definition as part of the
PRINTERPATH system option or, from the Output Delivery System (ODS), issue
the following code:
ODS PRINTER SAS PRINTER=myprinter;
Restriction To use the USESASHELP option, you must have permission to write
to the Sashelp catalog.
Table 52.1 Required and Optional Variables for Creating Printer Definition Records
Required:
DEST       Destination
DEVICE     Device
MODEL      Prototype
Optional:
DESC       Description
PREVIEW    Preview
PROTOCOL   Protocol
UNITS      CM or IN units
VIEWER     Viewer
Required Variables
To create or modify a printer, you must supply the NAME, MODEL, DEVICE, and
DEST variables. All the other variables use default values from the printer
prototype that is specified by the MODEL variable.
DEST
specifies the output destination for the printer.
Operating Environment Information: DEST is case sensitive for some devices.
DEVICE
specifies the type of I/O device to use when sending output to the printer. Valid
devices are listed in the Printer Definition wizard and in the SAS Registry Editor.
MODEL
specifies the printer prototype to use when defining the printer.
For a valid list of prototypes or model descriptions, you can look in the SAS
Registry Editor under CORE\PRINTING\PROTOTYPES.
Tip While in interactive mode, you can invoke the registry with the
REGEDIT command.
NAME
specifies the printer definition name that is associated with the rest of the
attributes in the printer definition.
The name must be unique within a given registry. If a new printer definition contains a
name that already exists, then the record is not processed unless the REPLACE
option has been specified or unless the value of the OPCODE variable is Modify.
Restriction NAME is limited to 127 characters, must have at least one nonblank
character, and cannot contain a backslash. Leading and trailing
blanks are stripped from the name.
Optional Variables
The following variables are optional in the input data set:
BOTTOM
specifies the default bottom margin in the units that are specified by the UNITS
variable.
CHARSET
specifies the default font character set.
Restriction The value must be one of the character set names in the typeface
that is specified by the TYPEFACE variable.
DESC
specifies the description of the printer.
Default DESC defaults to the prototype that is used to create the printer.
FONTSIZE
specifies the point size of the default font.
HOSTOPT
specifies any host options for the output destination. The host options are not
case sensitive.
LEFT
specifies the default left margin in the units that are specified by the UNITS
variable.
LRECL
specifies the buffer size or record length to use when sending output to the
printer.
Default If LRECL is less than zero when modifying an existing printer, the
printer's buffer size is reset to the size that is specified by the printer
prototype.
OPCODE
is a character variable that specifies what action (Add, Delete, or Modify) to
perform on the printer definition.
Add
creates a new printer definition in the registry. If the REPLACE option has
been specified, then this operation will also modify an existing printer
definition.
Delete
removes an existing printer definition from the registry.
Modify
changes an existing printer definition in the registry or adds a new one.
PAPERIN
specifies the default paper source or input tray.
Restriction The value of PAPERIN must be one of the paper source names in the
printer prototype that is specified by the MODEL variable.
PAPEROUT
specifies the default paper destination or output tray.
PAPERSIZ
specifies the default paper size.
Restriction The value of PAPERSIZ must be one of the paper size names listed
in the printer prototype that is specified by the MODEL variable.
PAPERTYP
specifies the default paper type.
Restriction The value of PAPERTYP must be one of the paper type names
listed in the printer prototype that is specified by the MODEL
variable.
PREVIEW
specifies the printer application to use for print preview.
PROTOCOL
specifies the I/O protocol to use when sending output to the printer.
Operating Environment Information: On mainframe systems, the protocol
describes how to convert the output to a format that can be processed by a
protocol converter that connects the mainframe to an ASCII device.
RES
specifies the default printer resolution.
Restriction The value of RES must be one of the resolution values available to
the printer prototype that is specified by the MODEL variable.
RIGHT
specifies the default right margin in the units that are specified by the UNITS
variable.
STYLE
specifies the default font style.
Restriction The value of STYLE must be one of the styles available to the
typeface that is specified by the TYPEFACE variable.
TOP
specifies the default top margin in the units that are specified by the UNITS
variable.
TRANTAB
specifies which translation table to use when sending output to the printer.
Operating Environment Information: The translation table is needed when an
EBCDIC host sends data to an ASCII device.
TYPEFACE
specifies the typeface of the default font.
Restriction The typeface must be one of the typeface names available to the
printer prototype that is specified by the MODEL variable.
UNITS
specifies the units (CM or IN) that are used by the margin variables.
VIEWER
specifies the host system command that is to be used during print previews. When
VIEWER is specified, PROC PRTDEF creates a preview printer.
Preview printers are specialized printers that are used to display printer output
on the screen before printing.
WEIGHT
specifies the default font weight.
Restriction The value must be one of the valid weights for the typeface that is
specified by the TYPEFACE variable.
Details
This example shows you how to set up various printers.
Program
data printers;
input name $ 1-14 model $ 16-42 device $ 46-53 dest $ 57-70;
datalines;
Myprinter      PostScript Level 1 (Color)    PRINTER    printer1
Laserjet       PCL 5 (DeltaRow)              PIPE       lp -dprinter5
Color LaserJet PostScript Level 2 (Color)    PIPE       lp -dprinter2
;
proc prtdef data=printers;
run;
Program Description
Create the PRINTERS data set. The INPUT statement contains the names of the
four required variables. Each data line contains the information that is needed to
produce a single printer definition.
data printers;
input name $ 1-14 model $ 16-42 device $ 46-53 dest $ 57-70;
datalines;
Myprinter      PostScript Level 1 (Color)    PRINTER    printer1
Laserjet       PCL 5 (DeltaRow)              PIPE       lp -dprinter5
Color LaserJet PostScript Level 2 (Color)    PIPE       lp -dprinter2
;
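Column input reads each value from fixed positions, so the data lines must line up with the column ranges in the INPUT statement (1-14, 16-42, 46-53, and 57-70). The following C function is a rough, hypothetical model of reading one character variable with column input. It is not how SAS implements the INPUT statement, and it simply strips the blanks that surround the value.

```c
#include <assert.h>
#include <string.h>

/* Rough model of column input for a character variable such as
   name $ 1-14: copies columns start..end (1-based, inclusive) of line
   into out and drops the blanks around the value. Short lines are
   allowed; out always receives a terminated string. */
void read_columns(const char *line, int start, int end,
                  char *out, size_t outsize)
{
    assert(line != NULL && out != NULL && outsize > 0);
    assert(start >= 1 && end >= start);
    size_t len = strlen(line);
    size_t from = (size_t)start - 1;
    size_t to = (size_t)end;            /* one past the last column */
    if (to > len)
        to = len;
    if (from > to)
        from = to;
    while (from < to && line[from] == ' ')
        from++;
    while (to > from && line[to - 1] == ' ')
        to--;
    size_t n = to - from;
    if (n >= outsize)
        n = outsize - 1;                /* truncate to fit */
    memcpy(out, line + from, n);
    out[n] = '\0';
}
```

If the values are not aligned with the column ranges, the ranges cut through the middle of words.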
Specify the input data set that contains the printer attributes and create the
printer definitions. PROC PRTDEF creates the printer definitions for the SAS
registry, and the DATA= option specifies PRINTERS as the input data set that
contains the printer attributes.
proc prtdef data=printers;
run;
Log
Example Code 52.1 The SAS Log After Defining Printers
1 data printers;
2 input name $ 1-14 model $ 16-42 device $ 46-53 dest $ 57-70;
3 datalines;
7 ;
8 proc prtdef data=printers;
9 run;
Example 2: Creating a Ghostview Printer in SASUSER to Preview PostScript Printer Output
in SASUSER
Details
This example creates a Ghostview printer definition in the Sasuser library for
previewing PostScript output.
Program
data gsview;
name = "Ghostview";
desc = "Print Preview with Ghostview";
model= "PostScript Level 2 (Color)";
viewer = 'ghostview %s';
device = "Dummy";
dest = " ";
run;
proc prtdef data=gsview list replace;
run;
Program Description
Create the GSVIEW data set, and specify the printer name, printer description,
printer prototype, and commands to be used for print preview. The GSVIEW data
set contains the variables whose values contain the information that is needed to
produce the printer definitions. The NAME variable specifies the printer name that
will be associated with the rest of the attributes in the printer definition data
record. The DESC variable specifies the description of the printer. The MODEL
variable specifies the printer prototype to use when defining this printer. The
VIEWER variable specifies the host system commands to be used for print preview.
Ghostview must be installed on your system, and the value of VIEWER must include
the path to find it. You must enclose the value in single quotation marks because of
the %s. If you use double quotation marks, the macro processor treats %s as a
macro call. DEVICE and DEST are required variables, but no value is needed in this
example. Therefore, a “dummy” or blank value is assigned.
data gsview;
name = "Ghostview";
desc = "Print Preview with Ghostview";
model= "PostScript Level 2 (Color)";
viewer = 'ghostview %s';
device = "Dummy";
dest = " ";
run;
Specify the input data set that contains the printer attributes, create the printer
definitions, write the printer definitions to the SAS log, and replace a printer
definition in the SAS registry. The DATA= option specifies GSVIEW as the input
data set that contains the printer attributes. PROC PRTDEF creates the printer
definitions. The LIST option specifies that a list of printers that are created or
replaced will be written to the SAS log. The REPLACE option specifies that a
printer definition will replace a printer definition in the registry if the name of the
printer definition matches a name already in the registry. If the printer definition
names do not match, then the new printer definition is added to the registry.
proc prtdef data=gsview list replace;
run;
Log
Example Code 52.2 The SAS Log After Defining a GhostView Printer
10 data gsview;
11 name = "Ghostview";
12 desc = "Print Preview with Ghostview";
13 model= "PostScript Level 2 (Color)";
14 viewer = 'ghostview %s';
15 device = "Dummy";
16 dest = " ";
Example 3: Creating a Single Printer Definition That Is Available to All Users
Details
This example creates a definition for a Tektronix Phaser 780 printer with a
Ghostview print previewer with the following specifications:
• bottom margin of about 1 inch (2.5 cm)
• default font size of 14 points
• default paper size of ISO A4
Program
data tek780;
name = "Tek780";
desc = "Test Lab Phaser 780P";
model = "Tek Phaser 780 Plus";
device = "PRINTER";
dest = "testlab3";
preview = "Ghostview";
units = "cm";
bottom = 2.5;
fontsize = 14;
papersiz = "ISO A4";
run;
proc prtdef data=tek780 usesashelp;
run;
Program Description
Create the TEK780 data set and supply appropriate information for the printer
destination. The TEK780 data set contains the variables whose values contain the
information that is needed to produce the printer definitions. In the example,
assignment statements are used to assign these variables. The NAME variable
specifies the printer name that is associated with the rest of the attributes in the
printer definition data record. The DESC variable specifies the description of the
printer. The MODEL variable specifies the printer prototype to use when defining
this printer. The DEVICE variable specifies the type of I/O device to use when
sending output to the printer. The DEST variable specifies the output destination
for the printer. The PREVIEW variable specifies which printer is used for print
preview. The UNITS variable specifies whether the margin variables are measured
in centimeters or inches. The BOTTOM variable specifies the default bottom margin
in the units that are specified by the UNITS variable. The FONTSIZE variable
specifies the point size of the default font. The PAPERSIZ variable specifies the
default paper size.
data tek780;
name = "Tek780";
desc = "Test Lab Phaser 780P";
model = "Tek Phaser 780 Plus";
device = "PRINTER";
dest = "testlab3";
preview = "Ghostview";
units = "cm";
bottom = 2.5;
fontsize = 14;
papersiz = "ISO A4";
run;
Create the TEK780 printer definition and make the definition available to all
users. The DATA= option specifies TEK780 as the input data set. The
USESASHELP option specifies that the printer definition will be available to all
users.
proc prtdef data=tek780 usesashelp;
run;
Example 4: Adding, Modifying, and Deleting Printer Definitions
Details
This example does the following:
• adds two printer definitions
• modifies one printer definition
• deletes two printer definitions
Program
data printers;
length name $ 80
model $ 80
device $ 8
dest $ 80
opcode $ 3
;
input opcode $& name $& model $& device $& dest $&;
datalines;
add  Color PostScript  PostScript Level 2 (Color)  DISK  sasprt.ps
mod  LaserJet 5  PCL 5 (DeltaRow)  DISK  sasprt.pcl
del  Gray Postscript  PostScript Level 1 (Gray Scale)  DISK  sasprt.ps
del  test  PostScript Level 2 (Color)  DISK  sasprt.ps
add  ColorPS  PostScript Level 2 (Color)  DISK  sasprt.ps
;
proc prtdef data=printers replace list;
run;
Program Description
Create the PRINTERS data set and specify which actions to perform on the
printer definitions. The PRINTERS data set contains the variables whose values
contain the information that is needed to produce the printer definitions. The
MODEL variable specifies the printer prototype to use when defining this printer.
The DEVICE variable specifies the type of I/O device to use when sending output
to the printer. The DEST variable specifies the output destination for the printer.
The OPCODE variable specifies which action (add, delete, or modify) to perform on
the printer definition. The first Add operation creates a new printer definition for
Color PostScript in the SAS registry. The second Add operation creates a new
printer definition for ColorPS in the SAS registry. The Mod operation modifies the
existing printer definition for LaserJet 5 in the registry. The two Del operations
delete the printer definitions for Gray Postscript and test from the registry. The &
modifier specifies that two or more blanks separate character values, which allows
the NAME and MODEL values to contain single embedded blanks.
data printers;
length name $ 80
model $ 80
device $ 8
dest $ 80
opcode $ 3
;
input opcode $& name $& model $& device $& dest $&;
datalines;
add  Color PostScript  PostScript Level 2 (Color)  DISK  sasprt.ps
mod  LaserJet 5  PCL 5 (DeltaRow)  DISK  sasprt.pcl
del  Gray Postscript  PostScript Level 1 (Gray Scale)  DISK  sasprt.ps
del  test  PostScript Level 2 (Color)  DISK  sasprt.ps
add  ColorPS  PostScript Level 2 (Color)  DISK  sasprt.ps
;
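The & modifier changes where a value ends. A single blank is part of the value, and two consecutive blanks end it. The following C function is a rough, hypothetical model of that rule for one data line. It is not the SAS implementation.

```c
#include <assert.h>
#include <string.h>

/* Rough model of list input with the & modifier: a value may contain
   single blanks and ends at two consecutive blanks or at the end of
   the line. Splits line in place, stores up to max_fields pointers in
   fields, and returns the number of values found. */
int split_on_double_blanks(char *line, char *fields[], int max_fields)
{
    assert(line != NULL && fields != NULL && max_fields >= 0);
    int count = 0;
    char *p = line;
    while (count < max_fields) {
        while (*p == ' ')               /* skip blanks before a value */
            p++;
        if (*p == '\0')
            break;
        fields[count++] = p;
        char *end = strstr(p, "  ");    /* two blanks end the value */
        if (end == NULL) {              /* last value on the line */
            size_t n = strlen(p);
            while (n > 0 && p[n - 1] == ' ')
                p[--n] = '\0';
            break;
        }
        *end = '\0';
        p = end + 1;
    }
    return count;
}
```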
Create multiple printer definitions and write them to the SAS log. The DATA=
option specifies the input data set PRINTERS that contains the printer attributes.
PROC PRTDEF processes five printer definition records: it adds two, modifies one, and deletes two. The
LIST option specifies that a list of printers that are created or replaced will be
written to the log.
proc prtdef data=printers replace list;
run;
Log
Example Code 52.3 The SAS Log After Modifying and Deleting Printers
15 data printers;
16 length name $ 80
17 model $ 80
18 device $ 8
19 dest $ 80
20 opcode $ 3
21 ;
22 input opcode $& name $& model $& device $& dest $&;
23 datalines;
29 ;
30 proc prtdef data=printers list replace;
31 run;
Example 5: Deleting a Single Printer Definition
Details
This example shows you how to delete a printer from the registry.
Program
data deleteprt;
name='printer1';
run;
proc prtdef data=deleteprt delete list;
run;
Program Description
Create the DELETEPRT data set. The NAME variable contains the name of the
printer to delete.
data deleteprt;
name='printer1';
run;
Delete the printer definition from the registry and write the deleted printer to the
log. The DATA= option specifies DELETEPRT as the input data set. PROC PRTDEF
creates or deletes printer definitions in the SAS registry. The DELETE option
specifies that the printers named in the input data set are deleted. The LIST option
writes the deleted printer to the log.
proc prtdef data=deleteprt delete list;
run;
Log
Example Code 52.4 The SAS Log After Deleting a Single Printer
45 data deleteprt;
46 name='printer1';
47 run;
Chapter 53 / PRTEXP Procedure
If you write printer definitions to a SAS data set, you can later replicate and modify
them. You can then use PROC PRTDEF to create the printer definitions in the SAS
registry from your input data set. For a complete discussion of PROC PRTDEF and
the variables and attributes that are used to create the printer definitions, see
“Input Data Set Variables That Are Used To Create Printer Definitions” on page
1916.
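As a minimal sketch of that round trip, the following program exports the SASHELP definition of the PostScript printer, changes a few attributes in a DATA step, and registers the result as a new printer. The printer name PSCopy, its description, and the file name pscopy.ps are hypothetical. The sketch also assumes that the OUT= data set uses the NAME, DESC, and DEST variables that appear in the PROC PRTEXP log output. Because USESASHELP is not specified for PROC PRTDEF, the new definition is expected to be written to the SASUSER portion of the registry.

```sas
/* Export the PostScript printer definition from the SASHELP registry */
proc prtexp usesashelp out=work.psdef;
   select postscript;
run;

/* Hypothetical changes: new name, description, and output file      */
/* (assumes NAME, DESC, and DEST are variables in the OUT= data set) */
data work.mydef;
   set work.psdef;
   name = 'PSCopy';
   desc = 'PostScript copy';
   dest = 'pscopy.ps';
run;

/* Create the modified definition and list it in the SAS log */
proc prtdef data=work.mydef list;
run;
```

Renaming the copy keeps the original PostScript definition untouched. If NAME were left unchanged, the new definition would collide with the existing one.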
See Also
Chapter 52, “PRTDEF Procedure,” on page 1913
PROC PRTEXP: Obtain printer attributes from the SAS registry (Ex. 1, Ex. 2)
Syntax
PROC PRTEXP <options>;
Optional Arguments
USESASHELP
specifies that SAS search only the SASHELP portion of the registry for printer
definitions.
Default The default is to search both the SASUSER and SASHELP portions of
the registry for printer definitions.
OUT=SAS-data-set
specifies the SAS data set to which the printer definitions are written. If you
omit the OUT= option, the printer attributes are written to the SAS log.
The data set that is specified by the OUT= option is the same type of data set that
is specified by the DATA= option in PROC PRTDEF to define each printer.
EXCLUDE Statement
Names the printers whose information is excluded from the output.
Syntax
EXCLUDE printer(s);
Required Argument
printer(s)
specifies one or more printers whose information you want to exclude from the
output.
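A sketch of the EXCLUDE statement follows. It writes the attributes of every printer except PostScript to a data set and then lists the printers that were exported. The data set name WORK.OTHERPRT is hypothetical, and the PROC PRINT step assumes that the output data set contains the NAME and MODEL variables shown in the PROC PRTEXP log output.

```sas
/* Export all printer definitions except PostScript */
proc prtexp out=work.otherprt;
   exclude postscript;
run;

/* Show which printers were written (assumes NAME and MODEL exist) */
proc print data=work.otherprt;
   var name model;
run;
```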
SELECT Statement
Names the printers whose information is contained in the output.
Syntax
SELECT printer(s);
Required Argument
printer(s)
specifies one or more printers whose information you want to include in the
output.
Details
This example shows you how to write the attributes that are used to define a
printer to the SAS log.
Program
Specify the printer that you want information about, specify that only the
SASHELP portion of the registry be searched, and write the information to the
SAS log. The SELECT statement specifies that you want the attribute information
that is used to define the printer PostScript to be included in the output. The
USESASHELP option specifies that only the SASHELP registry is to be searched for
PostScript's printer definitions. The data that is needed to define each printer is
written to the SAS log because the OUT= option was not used to specify a SAS
data set.
proc prtexp usesashelp;
select postscript;
run;
Log
Example Code 53.1 The SAS Log After Extracting Printer Information from the SASHELP Portion of the Registry
NAME: PostScript
MODEL: PostScript Level 1 (Color)
DEVICE: DISK
DEST: sasprt.ps
HOSTOPT:
PROTOCOL:
TRANTAB:
DESC: Generic PostScript Level 1 Printer
PREVIEW: Adobe Reader
VIEWER:
PAPERSIZ:
PAPERTYP:
PAPERIN:
PAPEROUT:
RES: 300 DPI
TOP: 0.50
LEFT: 0.50
RIGHT: 0.50
BOTTOM: 0.50
UNITS: IN
TYPEFACE: <MTmonospace>
WEIGHT: Normal
STYLE: Regular
CHARSET: Western
FONTSIZE: 8.00
LRECL: .
Example 2: Writing Attributes to a SAS Data Set
Details
This example shows you how to create a SAS data set that contains the data that
PROC PRTDEF would use to define the printers PCL4, PCL5, PCL5E, and PCL5C.
Program
Specify the printers that you want information about and create the PRDVTER
data set. The SELECT statement specifies the printers PCL4, PCL5, PCL5E, and
PCL5C. The OUT= option creates the SAS data set PRDVTER, which contains the
same attributes that are used by PROC PRTDEF to define the printers PCL4, PCL5,
PCL5E, and PCL5C. SAS searches both the SASUSER and SASHELP portions of the
registry because USESASHELP was not specified.
proc prtexp out=PRDVTER;
select pcl4 pcl5 pcl5e pcl5c;
run;
Output
The PRDVTER data set contains 26 variables and four observations, one for each
selected printer.
Chapter 54 / PWENCODE Procedure
The password that you encode is never written to the SAS log in plain text.
Instead, each character of the password is replaced by an X in the SAS log.
The encoding methods for PROC PWENCODE are SAS001 through SAS005. Starting in
SAS 9.4M5, PROC PWENCODE provides stronger password protection by using the
SAS005 method of encoding.
Password protection is an important part of your security strategy, but you should
not rely only on password protection for all your data security needs; a determined
and knowledgeable attacker can break passwords. Data should also be protected
by other security controls such as file system permissions, other access control
mechanisms, and encryption of data at rest and in transit.
Encoding Methods
Starting in SAS 9.4M5, the SAS005 method for encoding passwords is available.
SAS005, like SAS004, uses a 256-bit fixed key plus a 64-bit random salt. However,
SAS005 is hashed for additional iterations, which makes it more secure.
sas002 (which can also be specified as sasenc): supported by SASProprietary, which
is included in SAS software. Uses a 32-bit fixed key.
IMPORTANT Starting with SAS 9.4M8, SAS Foundation servers use the
cryptographic libraries provided and installed on the operating system to
provide encryption for data at rest and data in motion. With this change, SAS
no longer provides the cryptographic libraries for SAS Foundation servers as
part of the SAS Installation. AES encryption is supported using the operating
system cryptographic libraries. Prior to SAS 9.4M8, AES encryption was
supported as part of SAS/SECURE.
For more information, see “Cryptographic Library Support Starting with SAS
9.4M8” in Encryption in SAS.
Note: The METHOD= option supports the SAS003, SAS004, and SAS005 values.
Prior to SAS 9.4M8, you needed SAS/SECURE to support SAS003–SAS005. With
SAS 9.4M8, SAS003–SAS005 are supported using the operating system’s
cryptographic libraries. SAS Proprietary encoding, which supports SAS002, is
available with all SAS software. For more information, see SAS/SECURE.
Syntax
PROC PWENCODE IN='password' <OUT=fileref> <METHOD=encoding-method>;
Required Argument
IN='password'
specifies the password to encode. The password can contain up to a maximum
of 512 characters, which include alphanumeric characters, spaces, and special
characters.
Note: Data set passwords must follow SAS naming rules. If the IN= password
follows SAS naming rules, it can also be used for SAS data sets. For information
about SAS naming rules, see “Rules for Most SAS Names” in SAS Language
Reference: Concepts.
If the password contains embedded single or double quotation marks, use the
standard SAS rules for quoting character constants. These rules can be found in
the SAS Constants in Expressions chapter of SAS Language Reference: Concepts.
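As an illustration of those quoting rules, the hypothetical password below contains one single quotation mark and a pair of double quotation marks. Inside a single-quoted constant, the embedded single quotation mark is written twice. The double quotation marks need no special treatment.

```sas
/* Hypothetical password containing a single quotation mark and */
/* a pair of double quotation marks. Inside the single-quoted    */
/* constant, the embedded single quotation mark is doubled.      */
proc pwencode in='it''s "secret"';
run;
```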
Optional Arguments
OUT=fileref
specifies a fileref to which the output string is to be written. If the OUT= option
is not specified, the output string is written to the SAS log.
is set to the value that is written to the OUT= fileref or to the value that is
displayed in the SAS log.
METHOD=encoding-method
specifies the encoding method. Here are the supported values for encoding-
method.
• SAS001
• SAS002
• SAS003
• SAS004
• SAS005
The SAS003, SAS004, and SAS005 encoded passwords use a 256-bit fixed key
plus a random salt value that is applied to the encoding method. Therefore, each
time you use PROC PWENCODE to encode the same password, you get a
different encoded password, because the salt values are random.
For more information about each of these encoding methods, see “Encoding
Methods” on page 1941.
Note: The METHOD= option supports the SAS003, SAS004, and SAS005
values. Prior to SAS 9.4M8, you needed SAS/SECURE to provide support for the
SAS003–SAS005 encoding methods. With SAS 9.4M8, SAS003–SAS005 are
supported using the operating system’s cryptographic libraries. SAS
Proprietary encoding, which supports SAS002, is available with all SAS software.
For more information, see SAS/SECURE.
If the METHOD= option is omitted, the default encoding method is used, which is
SAS002 in most cases. SAS002 is also the method that is used if you specify an
invalid method.
When the FIPS 140-2 compliance option, -encryptfips, is specified, the encoding
method defaults to sas003. For more information about FIPS, see “FIPS 140-2
Standards Compliance” in Encryption in SAS.
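Because the SAS003, SAS004, and SAS005 methods apply a random salt, encoding the same password twice is expected to produce two different encoded strings. The following minimal sketch demonstrates this with a hypothetical password and METHOD=SAS004. Prior to SAS 9.4M8, that method requires SAS/SECURE.

```sas
/* Encode the same hypothetical password twice with SAS004.  */
/* With random salting, the two encoded strings in the log   */
/* are expected to differ even though the input is the same. */
proc pwencode in='mypassword' method=sas004;
run;

proc pwencode in='mypassword' method=sas004;
run;
```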
Details
This example shows a simple case of encoding a password and writing the encoded
password to the SAS log.
Program
Encode the password.
proc pwencode in='my password';
run;
Log
Note that each character of the password is replaced by an X in the SAS log.
{SAS002}DBCC571245AD0B31433834F80BD2B99E16B3C969
Example 2: Using an Encoded Password in a SAS Program
Details
This example illustrates the following:
• encoding a password and saving it to an external file
• reading the encoded password with a DATA step, storing it in a macro variable,
and using it in a SAS/ACCESS LIBNAME statement
Program Description
Declare a fileref.
filename pwfile
'external-filename';
Encode the password and write it to the external file. The OUT= option specifies
which external fileref the encoded password is written to.
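Here is a sketch of this step. It assumes the PWFILE fileref declared above and uses a hypothetical password.

```sas
/* Encode a hypothetical password. OUT= writes the encoded string */
/* to the external file that the PWFILE fileref refers to.        */
proc pwencode in='mypass1' out=pwfile;
run;
```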
Program Description
Declare a fileref for the encoded-password file.
filename pwfile
'external-filename';
Set the SYMBOLGEN SAS system option. This step shows that the actual
password cannot be revealed, even when the macro variable that contains the
encoded password is resolved in the SAS log. This step is not required in order for
the program to work properly.
options symbolgen;
Read the file and store the encoded password in a macro variable. The DATA step
stores the encoded password in the macro variable DBPASS.
data _null_;
infile pwfile truncover;
input line :$50.;
call symputx('dbpass',line);
run;
Use the encoded password to access a DBMS. You must use double quotation
marks (“ ”) so that the macro variable resolves properly.
libname x odbc dsn=SQLServer user=testuser password="&dbpass";
Log
Example 3: Saving an Encoded Password to the Paste Buffer
Details
This example saves an encoded password to the paste buffer. You can then paste
the encoded password into another SAS program or into the password field of an
authentication dialog box.
Program
filename clip clipbrd;
proc pwencode in='my password' out=clip;
run;
Program Description
Declare a fileref with the CLIPBRD access method.
filename clip clipbrd;
Encode the password and save it to the paste buffer. The OUT= option saves the
encoded password to the fileref that was declared in the previous statement.
proc pwencode in='my password' out=clip;
run;
Log
Note that each character of the password is replaced by an X in the SAS log.
24
25 filename clip clipbrd;
26 proc pwencode in=XXXXXXXXXXXXX out=clip;
27 run;
Details
This example shows a simple case of encoding a password using the sas003
encoding method and writing the encoded password to the SAS log. SAS003 uses a
16-bit salt to encode a password.
Program
Encode the password using SAS003. SAS003 uses a 256-bit fixed key with a 16-bit
random salt.
proc pwencode in='mypassword' method=sas003;
run;
Log
Note that each character of the password is replaced by an X in the SAS log.
SAS003 encoding uses AES encryption plus a 16-bit salt. Because SAS003 uses
random salting, each time you run the preceding code, a different encoded
password is generated.
{SAS003}4837B146585CED2C9FED14A3C946D68E4389
Example 5: Specifying Method= SAS005 to Encode a Password
Details
This example shows a simple case of encoding a password using the sas005
encoding method and writing the encoded password to the SAS log. SAS005 uses a
256-bit fixed key and a 64-bit random salt to encode the password.
Program
Encode the password using SAS005.
proc pwencode in='mypassword' method=sas005;
run;
Log
Note that each character of the password is replaced by an X in the SAS log.
SAS005 encoding uses AES encryption with a 256-bit fixed key and a 64-bit
random salt value. SAS005 increases security for stored passwords by using the
SHA-256 hashing algorithm and by applying additional hashing iterations. Because
SAS005 uses random salting, each time you run the preceding code, a different
encoded password is generated.
{SAS005}ADD8AB7108595A7D1A69190D78CDFE6145C1EB849CC7A43D
Chapter 55 / QDEVICE Procedure
Six different reports are available. These reports summarize information such as
color support, default output sizes, margin sizes, resolution, supported fonts,
hardware symbols, hardware fill types, hardware line styles, and device options.
You can send the output of this procedure to the SAS log or to an output SAS data
set.
• WINPRTM (monochrome)
If SAS starts with Universal Printing active on Windows, the default printer report
is for the default SAS universal printer and not a Windows printer.
See “Example 4: Generate a Report for the Default Printer” on page 1989 for
example reports.
PROC QDEVICE
<REPORT=GENERAL | FONT | DEVOPTION | LINESTYLE | RECTANGLE |
SYMBOL>
<OUT=SAS-data-set>
<CATALOG=catalog-name>
<DEVLOC=GDEVICEn | SASHELP | _ALL_ | libref>
<REGISTRY=SASHELP | SASUSER>
<SUPPORT=YES | NO | ALL>
<UNITS=IN | CM>;
DEVICE <device-name(s)> <_ALL_> <_HTML_> <_LISTING_> <_RTF_>;
PRINTER <printer-name(s)> <_ALL_> <_PCL_> <_PDF_> <_PRINTER_> <_PS_>;
VAR variable-1 <variable-2 …>;
PROC QDEVICE: Specify an (optional) output data set, which report to generate, which
locations to search, whether to list supported or unsupported features, and sizing
information (Ex. 1 through Ex. 7)
Syntax
PROC QDEVICE
<REPORT=GENERAL | FONT | DEVOPTION | LINESTYLE | RECTANGLE |
SYMBOL>
<OUT=SAS-data-set>
<CATALOG=catalog-name>
<DEVLOC=GDEVICEn | SASHELP | _ALL_ | libref>
<REGISTRY=SASHELP | SASUSER>
<SUPPORT=YES | NO | ALL>
<UNITS=IN | CM>;
Optional Arguments
CATALOG=catalog-name
specifies the name of a SAS device catalog to search for a device.
Aliases C=
CAT=
Default DEVICES
Interaction The CATALOG= option works with the DEVLOC= option. When you
specify the CATALOG= option, SAS looks in the library that is
specified by the DEVLOC= option (for example, sashelp.mycatalog).
Example “Example 7: Specify a User Library and Catalog for a Report” on page
1997
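DEVLOC=GDEVICEn | SASHELP | _ALL_ | libref
specifies the library or libraries to search for a device.
GDEVICEn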
specifies to search one of the SAS/GRAPH device libraries for a device. n can
be 0–9.
SASHELP
specifies to search the Sashelp library for a device.
_ALL_
specifies to search the libraries Gdevice0 – Gdevice9 and the Sashelp library,
in this order, for a device. All occurrences of a device from any of these
libraries are reported.
libref
specifies a valid SAS library to search.
Defaults If you do not specify the DEVLOC= option, libraries are searched in
the following order:
1. Gdevice0–Gdevice9
2. Sashelp
Interaction The DEVLOC= option works with the CATALOG= option. When you
specify the CATALOG= option, SAS looks in the library that is
specified by the DEVLOC= option (for example, sashelp.mycatalog).
Example “Example 7: Specify a User Library and Catalog for a Report” on page
1997
OUT=SAS-data-set
specifies an output SAS data set for the report.
REGISTRY=SASHELP | SASUSER
specifies which section of the SAS registry to search when querying a universal
printer.
SASHELP
search the SASHELP section of the registry.
SASUSER
search the SASUSER section of the registry.
Alias REG
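REPORT=GENERAL | FONT | DEVOPTION | LINESTYLE | RECTANGLE | SYMBOL
specifies the type of report to generate.
DEVOPTION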
produces a report of the hardware device options supported by the specified
device.
FONT
produces a report of all system and device-resident fonts supported by the
specified device or printer.
GENERAL
produces a report of general information about the specified device or
printer. This report includes information such as destination, margin sizes,
default font information, resolution, color information, and size by pixels.
This is the default report.
LINESTYLE
produces a report of the hardware line styles supported by the specified
device.
RECTANGLE
produces a report of the hardware fill types supported by the specified
device.
SYMBOL
produces a report of the hardware symbols supported by the specified
device.
Default GENERAL
See “Valid Variables for All Reports” on page 1960 for descriptions of the
variables that are included in each report.
SUPPORT=YES | NO | ALL
specifies whether to report only supported features, only unsupported features,
or all features.
YES
reports only hardware features and options that are supported.
NO
reports only the hardware features and options that are not supported.
ALL
reports both supported and unsupported hardware features and options.
Default YES
UNITS=IN | CM
specifies whether the values of certain variables in the GENERAL report are
reported in inches (IN) or centimeters (CM).
The HEIGHT, WIDTH, LEFT, LMIN, RIGHT, RMIN, BOTTOM, BMIN, TOP, TMIN,
HRES, and VRES variables are reported in inches or centimeters. HRES and
VRES values are reported as pixels-per-inch or pixels-per-centimeter.
Default IN
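The options above can be combined in a single PROC QDEVICE step. The following sketch is illustrative only. The PNG device, the PDF universal printer, and the data set name WORK.PNGINFO are assumptions chosen for the example, not values that the procedure requires.

```sas
/* 1. GENERAL report (the default) for the PNG device, searching only */
/*    Sashelp.Devices, with sizes in centimeters, saved to a data set  */
proc qdevice devloc=sashelp catalog=devices units=cm out=work.pnginfo;
   device png;
run;

proc print data=work.pnginfo;
run;

/* 2. Supported and unsupported hardware line styles for PNG, */
/*    written to the SAS log because OUT= is omitted           */
proc qdevice report=linestyle support=all;
   device png;
run;

/* 3. Fonts supported by the PDF universal printer */
proc qdevice report=font;
   printer pdf;
run;
```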
DEVICE Statement
Specifies which SAS/GRAPH devices to generate a report for.
Requirement: You must specify at least one device name, _ALL_, _HTML_, _LISTING_, or _RTF_.
Syntax
DEVICE <device-name(s)> <_ALL_> <_HTML_> <_LISTING_> <_RTF_>;
Optional Arguments
device-name(s)
specifies the device for which you want to generate a report. Separate device
names with a blank space. Enclose device names that contain spaces in
quotation marks. You can use the wildcard characters * and ? to report all
devices with similar names.
*
* represents any number of characters in that position of the device name.
Note You can specify the * and ? wildcard characters in the same
device name.
Example 'svg*'
?
? represents one character in the device name. You can specify multiple
consecutive ? characters in the device name.
Note You can specify the * and ? wildcard characters in the same
device name.
Example 'tiff?300'
_ALL_
generates reports for all devices.
_HTML_
determines the default device that is used by the ODS HTML destination and
generates a report for that device. The device is based on the default HTML
version that is assigned in the ODS key of the SAS registry.
_LISTING_
determines the default device that is used by the ODS Listing destination and
generates a report for that device. The default value is a host-specific display
device.
_RTF_
determines the default device that is used by the ODS RTF destination and
generates a report for that device.
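A minimal sketch that combines a wildcard device name with one of the destination keywords. The svg* pattern follows the wildcard example above. The choice of _HTML_ is only illustrative.

```sas
/* Report on every device whose name begins with svg, and on */
/* the default device of the ODS HTML destination             */
proc qdevice;
   device 'svg*' _html_;
run;
```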
PRINTER Statement
Specifies which universal printers to generate a report for.
Requirement: You must specify at least one printer name, _ALL_, _PCL_, _PDF_, _PRINTER_, or
_PS_.
Syntax
PRINTER <printer-name(s)> <_ALL_> <_PCL_> <_PDF_> <_PRINTER_> <_PS_>;
Optional Arguments
printer-name(s)
specifies the universal printers for which you want to generate a report. If the
printer name contains spaces, enclose the printer name in quotation marks.
Separate printer names with a blank space. You can use the wildcard characters
* and ? to report all printers with similar names.
*
indicates to report all printers that match any number of characters in the
position of the * in the printer name.
Note You can specify the * and ? wildcard characters in the same
printer name.
Example 'pcl*' reports on the printers pcl4, pcl5, pcl5c, and pcl5e
?
indicates to report all printers that match the printer name, where the
character in the ? position can be any character. You can use multiple ?
characters in printer-name to represent the same number of characters in the
same position in the printer name.
Note You can specify the * and ? wildcard characters in the same
printer name.
Example 'tiff?' reports on the printers tiffa and tiffk. It does not report on
the printer tiff because the tiff printer is only four characters.
_ALL_
generates reports for all universal printers.
_PCL_
determines the default printer that is used by the ODS PCL destination and
generates a report for that printer.
_PDF_
determines the default printer that is used by the ODS PDF destination and
generates a report for that printer.
_PRINTER_
determines the default printer that is used by the ODS PRINTER destination and
generates a report for that printer.
Windows specifics: By default, SAS uses Windows printing and not Universal Printing.
When SAS uses Windows printing, the report that is generated when you specify the
_PRINTER_ argument has information for the SAS printer interface device that is
associated with the default Windows printer. The default Windows printer is
specified by the SYSPRINT= system option. The SAS printer interface devices are
WINPRTC (color), WINPRTG (gray scale), and WINPRTM (monochrome). The report
displays the printer interface device in the Name field and the printer name in
the Description field.
_PS_
determines the default printer that is used by the ODS PS destination and
generates a report for that printer.
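A sketch of the PRINTER statement follows. It uses the 'pcl*' pattern from the wildcard example above, which reports on pcl4, pcl5, pcl5c, and pcl5e. The _PDF_ keyword is added only as an illustration.

```sas
/* Report on all printers whose names begin with pcl, and on */
/* the default printer of the ODS PDF destination             */
proc qdevice;
   printer 'pcl*' _pdf_;
run;
```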
VAR Statement
Specifies which variables to include in a report. The order of the variables in the report is determined
by the order in which they are specified in the VAR statement.
Default: If you do not specify a VAR statement, all of the variables for the report are
included in a default order.
Tip: If you specify the VAR statement, you must specify at least one variable.
Otherwise, the statement is ignored.
Syntax
VAR variable-1 <variable-2 …>;
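For example, the following sketch uses a VAR statement to limit a GENERAL report to four of the variables described below, which then appear in the order listed. The PNG device is only illustrative.

```sas
/* Only NAME, DESC, XPIXELS, and YPIXELS appear, in this order */
proc qdevice report=general;
   device png;
   var name desc xpixels ypixels;
run;
```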
LOCATION
for the device entry that was found, displays the physical location of the
Gdevice0-Gdevice9 library, the Sashelp library, or the library that is specified by
the DEVLOC= option. For universal printers, this variable displays the SAS
registry (SASHELP or SASUSER) where the printer was found.
NAME
displays the name of the device or printer.
TYPE
displays the type of device or printer. Here are the types of devices and printers:
• Graph Device
• Shortcut Device
• System Display
• System Metafile
• Universal Previewer
• Universal Printer
BIT
displays the bit position in the DEVOPTS string for the corresponding device
option.
BITSTRING
displays the bit pattern of the corresponding device option.
DESC
displays the default description of the device or printer.
LOCATION
for the device entry that was found, displays the physical location of the
Gdevice0-Gdevice9 library, the Sashelp library, or the library that is specified by
the DEVLOC= option.
NAME
displays the name of the device or printer.
ODESC
displays the descriptions of the hardware options in effect for the device.
OPTION
displays the names of the hardware options in effect for the device.
SUPPORT
displays the device options.
TYPE
displays the type of device or printer.
• Shortcut Device
• System Display
• System Metafile
ALIAS
reports an alternate name for a font that is registered by the FONTREG
procedure.
DESC
displays the default description of the device or printer.
FONT
displays the name of the default font.
FSTYLE
displays the font style, such as Roman or Italic, for each font and font weight, in
an output data set.
Restriction When the report output is directed to the SAS log, the FONT report
displays only font family names, such as Courier, Helvetica, Times,
and so on. The specific font style is not reported.
Note If the font name is acquired from a CHARREC list in a device entry,
the style is not available. See “CHARREC Device Parameter” in
SAS/GRAPH: Reference for more information.
FTYPE
displays the type of font, such as Printer Resident, System, or Software.
Note The values for the FTYPE variable in the output data set are Printer
Resident, System, or Software. The value Software appears only in a
FONT report for a SAS/GRAPH device that has hardware font support
disabled.
FWEIGHT
displays the font weight, such as Normal or Bold, for each font and font style, in
an output data set.
Restriction When the report output is directed to the SAS log, the FONT report
displays only font family names, such as Courier, Helvetica, Times,
and so on. The specific font weight is not reported.
Note If the font name is acquired from a CHARREC list in a device entry,
the weight is not available. See “CHARREC Device Parameter” in
SAS/GRAPH: Reference for more information.
FVERSION
specifies the font version.
Restriction When the report output is directed to the SAS log, the FONT report
displays only font family names, such as Courier, Helvetica, Times,
and so on. The specific font version is not reported.
LOCATION
for the device entry that was found, displays the physical location of the
Gdevice0-Gdevice9 library, the Sashelp library, or the library that is specified by
the DEVLOC= option. For universal printers, this variable displays the SAS
registry (SASHELP or SASUSER) where the printer was found.
NAME
displays the name of the device or printer.
TYPE
displays the type of device or printer. Here are the types of devices and printers:
• Graph Device
• Shortcut Device
• System Display
• System Metafile
• Universal Previewer
• Universal Printer
ALIAS
reports an alternate name for a font that is registered by the FONTREG
procedure.
ANIMATION
specifies whether animation is active, enabled, disabled, or unsupported for a
Universal Printer:
Active indicates that the graphs in the ODS HTML output are
grouped together in the animation. Animate=Start and
Animate=Stop are ignored.
BMIN
displays the minimum size of the bottom margin.
BOTTOM
displays the current size of the bottom margin.
CLRSPACE
displays the type of color support (color space) such as RGB, RGBA, CMYK,
HLS, and so on.
COLS
displays the number of horizontal columns in the output.
COMPRESSION
indicates the condition under which compression is used. Compression can
always be in effect, in effect only when specified by a compression option, or
never used by the device or printer.
COMPMETHOD
indicates the compression method that is used if compression is supported by the
device or printer.
DESC
displays the default description of the device or printer.
DEST
displays the default destination of the device or universal printer if the device or
printer does not send output directly to a printer or a display device. If the
device sends output directly to a printer or a display device, the value of DEST
is blank.
EMBEDDING
indicates whether font embedding is supported.
FHEIGHT
displays the height, in the respective units, of the default font.
Note If the font name is acquired from a CHARREC list in a device entry, the
height is not available. See “CHARREC Device Parameter” in SAS/GRAPH:
Reference for more information.
FONT
displays the name of the default font.
FORMAT
displays the output format type (for example, EMF, EMF Plus, EMF Dual,
PostScript, GIF, Host Display, and so on).
FSTYLE
displays the style of the default font (for example, Roman, Regular, and so on).
Note If the font name is acquired from a CHARREC list in a device entry,
the style is not available. See “CHARREC Device Parameter” in
SAS/GRAPH: Reference for more information.
FWEIGHT
displays the weight of the default font (for example, Normal, Medium, and so
on).
Note If the font name is acquired from a CHARREC list in a device entry,
the weight is not available. See “CHARREC Device Parameter” in
SAS/GRAPH: Reference for more information.
FVERSION
specifies the version of the font.
HEIGHT
displays the default vertical height of output (in UNITS) sent to the device or
printer.
HRES
displays the horizontal resolution (pixels per UNIT) of output sent to the device
or printer. Horizontal resolution is calculated by the formula
HRES = XPIXELS / WIDTH.
Interaction If either the HRES or VRES variables are specified in the VAR
statement, the horizontal and vertical resolutions are displayed
together in the SAS log using the label XxY Resolution. In an output
data set, HRES and VRES are reported separately.
IOTYPE
displays the type of input/output used by the device or printer (for example,
DISK, PRINTER, PIPE, GTERM, and so on).
LEFT
displays the size of the left margin of output.
LMIN
displays the minimum left margin.
LOCATION
for the device entry that was found, displays the physical location of the
Gdevice0-Gdevice9 library, the Sashelp library, or the library that is specified by
the DEVLOC= option. For universal printers, this variable displays the SAS
registry (SASHELP or SASUSER) where the printer was found.
MAXCOLORS
displays the maximum number of colors that are supported by the device or
printer.
MODULE
specifies the name of the device driver module.
NAME
displays the name of the device or printer.
PROTOTYPE
displays the prototype (model) that was used to define the universal printer.
RIGHT
displays the size of the right margin.
RMIN
displays the minimum size of the right margin.
ROWS
displays the number of vertical rows in the output.
TMIN
displays the minimum top margin of output.
TOP
displays the size of the top margin.
TYPE
displays the type of device or printer. Here are the types of devices and printers:
• Graph Device
• Shortcut Device
• System Display
• System Metafile
• Universal Previewer
• Universal Printer
UNITS
displays the units (IN for inches or CM for centimeters) in which sizes are
displayed. In the SAS log, the value of UNITS appears as inches or centimeters,
respectively. In an output data set, the value of UNITS appears as IN or CM.
Interaction If the VAR statement does not specify any variables for size,
margins, or resolution, the SAS log shows the units that are used to
measure size, margins or resolution. Here is an example:
Name: EMF
Units: inches
VISUAL
displays the visual color type (for example, Indexed Color, Direct Color, True
Color, Monochrome, or Gray Scale).
VRES
displays the vertical resolution (pixels per UNIT) of output sent to the device or
printer.
Interaction If either the HRES or VRES variables are specified in the VAR
statement, the horizontal and vertical resolutions are displayed
together in the SAS log using the label XxY Resolution. In an output
data set, HRES and VRES are reported separately.
WIDTH
displays the width of output (in UNITS) sent to the device or printer.
XPIXELS
displays the width of the output in pixels.
YPIXELS
displays the height of the output in pixels.
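The following sketch shows the HRES formula from the list above at work. It stores part of a GENERAL report in a data set and recomputes the horizontal resolution. It assumes that the OUT= data set uses the same variable names as the VAR statement (XPIXELS, WIDTH, HRES). HRES_CALC is a hypothetical variable. A device whose WIDTH is zero or missing would produce a missing value.

```sas
/* Store the values that the HRES formula uses (PNG is illustrative) */
proc qdevice out=work.gen;
   device png;
   var name xpixels width hres;
run;

/* Recompute HRES = XPIXELS / WIDTH for comparison with HRES */
data work.check;
   set work.gen;
   hres_calc = xpixels / width;
run;

proc print data=work.check;
   var name xpixels width hres hres_calc;
run;
```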
DESC
displays the default description of the device or printer.
LINE
displays the line styles supported by the device or printer.
Interaction In a SAS log LINESTYLE report, the LINE and SUPPORT variables
are reported together. If either the LINE variable or the SUPPORT
variable is specified in the VAR statement, the line styles are
reported using the Supported Line Styles or Unsupported Line
Styles variable labels.
LOCATION
displays the physical location of the Gdevice0-Gdevice9 or Sashelp library that
contains the Devices catalog where the device entry was found or the library
that is specified by the DEVLOC= option.
NAME
displays the name of the device.
SUPPORT
displays the device line styles.
TYPE
displays the type of device or printer:
- Graph Device
- Shortcut Device
- System Display
- System Metafile
DESC
displays the default description of the device or printer.
FILL
displays the hardware fill types that are supported by the device.
Interaction In a SAS log RECTANGLE report, the FILL and SUPPORT variables
are reported together. If either the FILL variable or the SUPPORT
variable is specified in the VAR statement, the fill names are
reported using either the label Supported Hardware Fills or the label
Unsupported Hardware Fills.
LOCATION
displays the physical location of the Gdevice0-Gdevice9 or Sashelp library that
contains the Devices catalog where the device entry was found or the library
that is specified by the DEVLOC= option.
NAME
displays the name of the device.
1970 Chapter 55 / QDEVICE Procedure
SUPPORT
displays the hardware fills.
TYPE
displays the type of device or printer.
- Shortcut Device
- System Display
- System Metafile
DESC
displays the default description of the device or printer.
LOCATION
displays the physical location of the Gdevice0-Gdevice9 or Sashelp library that
contains the Devices catalog where the device entry was found or the library
that is specified by the DEVLOC= option.
NAME
displays the name of the device.
SUPPORT
displays the device or printer symbols.
SYMBOL
displays the names of the hardware symbols.
TYPE
displays the type of device or printer.
- Shortcut Device
- System Display
- System Metafile
- DESC
- TYPE
- LOCATION
For a description of the variables, see “Valid Variables for All Reports” on page
1960.
The following table lists the variables that you can use in a GENERAL report as
well as the labels for the variables that are used either in the SAS log or the output
data set. If you do not specify the VAR statement, the variables appear in the order
in which they appear in the table.
1 If the type of units is displayed with the value of a variable, such as 0 inches for the left margin, the
Units label is not displayed in the output to the SAS log.
16 proc qdevice;
17 printer svg;
18 run;
Name: SVG
Description: Scalable Vector Graphics 1.1
Module: SASPDSVG
Type: Universal Printer
Registry: SASHELP
Prototype: SVG 1.1
Default Typeface: Cumberland AMT
Typeface Alias: Courier
Font Style: Regular
Font Weight: Normal
Font Height: 8 points
Font Version: Version 1.03
Maximum Colors: 16777216
Visual Color: Direct Color
Color Support: RGBA
Destination: sasprt.svg
I/O Type: DISK
Data Format: SVG
Height: 6.25 inches
Width: 8.33 inches
Ypixels: 600
Xpixels: 800
Rows(vpos): 50
Columns(hpos): 114
Left Margin: 0 inches
Minimum Left Margin: 0 inches
Right Margin: 0 inches
Minimum Right Margin: 0 inches
Bottom Margin: 0 inches
Minimum Bottom Margin: 0 inches
Top Margin: 0 inches
Minimum Top Margin: 0 inches
XxY Resolution: 96x96 pixels per inch
Compression Enabled: Never
Compression Method: Deflate
Font Embedding: Option
Animation: Enabled
The following table lists the variables that you can use in a FONT report as well as
the labels for the variables that are used either in the SAS log or the output data
set. If you do not specify the VAR statement, the variables appear in the order in
which they appear in the table.
If the VAR statement specifies one or more of the variables FONT, FTYPE, FSTYLE,
FWEIGHT, or FVERSION, the SAS log reports only the font type labels and the font
family names. The font styles, weights, and versions are not reported to the SAS
log. The font type label that appears depends on the font type. Some example
labels are Supported Font Typefaces, Supported Resident Typefaces, Supported
TrueType Typefaces, and Supported Type1 Typefaces.
Usage: QDEVICE Procedure 1977
Name: ACTIVEX
Description: ActiveX enabled GIF Driver
Type: Graph Device
Device Catalog: your-font-catalog-path
Supported Font Typefaces: System (7x16) 8pt
System (9x20) 10pt
Terminal (8x12) 7pt
Terminal (4x6) 4pt
Terminal (5x12) 7pt
Terminal (6x8) 5pt
Terminal (7x12) 7pt
Terminal (10x18) 11pt
Terminal (12x16) 10pt
Fixedsys (8x15) 7pt
Fixedsys (10x20) 11pt
Modern
Roman
Script
Courier (8x13) 8pt
The following table lists the variables that you can use in a DEVOPTION report as
well as the labels for the variables that are used either in the SAS log or the output
data set. If you do not specify the VAR statement, the variables appear in the order
in which they appear in the table.
Bit Position: 1
Bit Pattern: 4000000000000000
Device Option: GDPIEFILL
Option Description: Device has hardware pie-fill capability
Support: Yes
Bit Position: 3
Bit Pattern: 1000000000000000
Device Option: GDCRT
Option Description: Hardware is a CRT or the device acts like a CRT
Support: Yes
Bit Position: 5
Bit Pattern: 0400000000000000
Device Option: GDPOLYGONFILL
Option Description: Device has polygonfill capability
Support: Yes
Bit Position: 7
Bit Pattern: 0100000000000000
Device Option: GDRGB
Option Description: Hardware is capable of defining colors in one or more color
spaces
Support: Yes
Bit Position: 8
Bit Pattern: 0080000000000000
Device Option: GDMBPOLY
Option Description: Hardware can draw polygons with multiple boundaries
Support: Yes
Bit Position: 9
Bit Pattern: 0040000000000000
Device Option: GDOPACITY
Option Description: Hardware is capable of supporting opacity
Support: Yes
Bit Position: 11
Bit Pattern: 0010000000000000
Device Option: GDLWIDTH
Option Description: Hardware can draw lines of varying widths
Support: Yes
Bit Position: 14
Bit Pattern: 0002000000000000
Device Option: GDHRDCHR
Option Description: Hardware characters are supported by the device
Support: Yes
Bit Position: 15
Bit Pattern: 0001000000000000
Device Option: GDXLIMIT
Option Description: There is no limit on max value allowed for x coordinate
Support: Yes
Bit Position: 16
Bit Pattern: 0000800000000000
Device Option: GDYLIMIT
Option Description: There is no limit on max value allowed for y coordinate
Support: Yes
Bit Position: 18
Bit Pattern: 0000200000000000
Device Option: GDTXJUSTIFY
Option Description: Hardware is capable of justifying proportional text
Support: Yes
Bit Position: 24
Bit Pattern: 0000008000000000
Device Option: GDUNICODE
Option Description: Device supports the use of the Unicode font attribute
Support: Yes
Bit Position: 25
Bit Pattern: 0000004000000000
Device Option: GDPOLYLINE
Option Description: Hardware is capable of supporting polylines
Support: Yes
Bit Position: 28
Bit Pattern: 0000000800000000
Device Option: GDTRUETYPE
Option Description: Device supports the use of TrueType fonts
Support: Yes
Bit Position: 36
Bit Pattern: 0000000008000000
Device Option: GDIMAGE
Option Description: Device is capable of drawing images
Support: Yes
Bit Position: 39
Bit Pattern: 0000000001000000
Device Option: GDIMGROTATE
Option Description: Device is incapable of doing image rotation
Support: Yes
Bit Position: 40
Bit Pattern: 0000000000800000
Device Option: GDTRUECOLOR
Option Description: Hardware is a 24-bit true color device
Support: Yes
Bit Position: 44
Bit Pattern: 0000000000080000
Device Option: GDTEXTCLIP
Option Description: Hardware will clip text at the device limits
Support: Yes
Bit Position: 50
Bit Pattern: 0000000000002000
Device Option: GDAUTOSIZE
Option Description: Autosize text to fit rows and columns
Support: Yes
Bit Position: 54
Bit Pattern: 0000000000000200
Device Option: GDPOLYOUTLINE
Option Description: Device draws polygon outlines
Support: Yes
Bit Position: 55
Bit Pattern: 0000000000000100
Device Option: GDPRINTERPATH
Option Description: Device temporarily sets printerpath to that of the device name
Support: Yes
Bit Position: 56
Bit Pattern: 0000000000000080
Device Option: GDOPTPASSTHRU
Option Description: PAPERSIZE option sets default value of PAPERSIZE goption
Support: Yes
Bit Position: 57
Bit Pattern: 0000000000000040
Device Option: GDPIEOUTLINE
Option Description: Driver draws pie slice outlines (empty pies)
Support: Yes
The following table lists the variables that you can use in a LINESTYLE report as
well as the labels for the variables that are used either in the SAS log or the output
data set. If you do not specify the VAR statement, the variables appear in the order
in which they appear in the table.
Name: LJ5PS
Description: LaserJet 5P -- 600 dpi -- PostScript
Type: Graph Device
Device Catalog: your-sas-path\sashelp
Supported Line Styles: 1-44
The following table lists the variables that you can use in a RECTANGLE report as
well as the labels for the variables that are used either in the SAS log or the output
data set. If you do not specify the VAR statement, the variables appear in the order
in which they appear in the table.
Name: SASPRTG
Description: POSTSCRIPT LEVEL 1
Type: Printer Interface Device
Device Catalog: your-sas-path\sashelp
Supported Hardware Fills: Empty,Solid
The following table lists the variables that you can use in a SYMBOL report as well
as the labels for the variables that are used either in the SAS log or the output data
set. If you do not specify the VAR statement, the variables appear in the order in
which they appear in the table.
Name: CGM
Description: CGM generator--binary output
Type: Graph Device
Device Catalog: your-sas-path\sashelp
Supported Hardware Symbols: Plus,X,Star
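The three log excerpts above, for the LJ5PS, SASPRTG, and CGM devices, appear without the programs that produced them. The following is a plausible sketch of those programs. It assumes that REPORT= accepts the keywords LINESTYLE, RECTANGLE, and SYMBOL, matching the report names used in this section. It also assumes that each device can be named in a DEVICE statement.

```sas
/* Line styles supported by the LJ5PS device (assumed keyword REPORT=LINESTYLE) */
proc qdevice report=linestyle;
   device lj5ps;
run;

/* Hardware fills supported by the SASPRTG device (assumed keyword REPORT=RECTANGLE) */
proc qdevice report=rectangle;
   device sasprtg;
run;

/* Hardware symbols supported by the CGM device (assumed keyword REPORT=SYMBOL) */
proc qdevice report=symbol;
   device cgm;
run;
```

Because no OUT= option is given, each report goes to the SAS log.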
Details
The following example creates a General report for the default display device. This
example assumes that you are running SAS in interactive mode on Windows.
For the WIN device, the number of colors is controlled by your Windows display
settings. The size is controlled by your monitor and resolution settings.
Program
proc qdevice;
run;
Log
If you do not specify the OUT= option, the QDEVICE procedure sends its output to
the SAS log. The output for the Windows operating environment is shown below as
it appears in the SAS log.
Example Code 55.1 The SAS Log After Running PROC QDEVICE
Name: WIN
Description: Microsoft Windows Display
Type: System Display
Device Catalog: your-device-catalog
Default Typeface: Sasfont
Font Style: Roman
Font Weight: Normal
Font Height: 7 points
Maximum Colors: 2147483647
Visual Color: True Color
Color Support: RGB
I/O Type: GTERM
Data Format: Host Display
Height: 5.75 inches
Width: 9.25 inches
Ypixels: 690
Xpixels: 1110
Rows(vpos): 46
Columns(hpos): 111
Left Margin: 0 inches
Minimum Left Margin: 0 inches
Right Margin: 0 inches
Minimum Right Margin: 0 inches
Bottom Margin: 0 inches
Minimum Bottom Margin: 0 inches
Top Margin: 0 inches
Minimum Top Margin: 0 inches
XxY Resolution: 120x120 pixels per inch
Compression Enabled: Never
Font Embedding: Never
Animation: Unsupported
Details
The following example creates a General report for all devices and writes the
results to WORK.ALLDEVICES.
You can use the _ALL_ keyword to generate a report for all devices.
Program
proc qdevice out=allDevices;
device _all_;
run;
Output
The following image shows a portion of the report as it appears in the Viewtable
window.
Output 55.1 The Output Data Set Report for All Devices
Details
The following example creates a General report for all devices whose names end in
EMF, the PDF universal printer, and the universal printers whose names match the
pattern 'svg?'. The results are written to the WORK.MYREPORT data set. If you do
not specify the REPORT= option, the QDEVICE procedure generates a General report.
Program
proc qdevice out=myreport;
device '*emf';
printer pdf 'svg?';
run;
Output
The following image shows a portion of the report as it appears in the Viewtable
window.
Example 4: Generate a Report for the Default Printer 1989
Output 55.2 The Output Data Set Report for the EMF Device, and PDF and SVG Printers
Details
By default, printing in SAS under Windows is done by the default Windows printer
and not by Universal Printing. Therefore, the results that you see for the QDEVICE
procedure when you use the PRINTER _PRINTER_ statement differ between Windows
and UNIX. Under Windows, where the NOUPRINT system option is the default, the
report is based on the printer interface device that interfaces with the default
Windows printer. Under UNIX, where the UPRINT system option is set, the report is
based on the default SAS universal printer.
Because the REPORT= option is not specified, the QDEVICE procedure generates a
General report. Because the OUT= option is not specified, the results are written to
the SAS log. The _PRINTER_ keyword identifies the default printer, and the
procedure generates a report for that printer.
Program: Windows
proc qdevice;
printer _PRINTER_;
run;
1 proc qdevice;
2 printer _PRINTER_;
3 run;
NOTE: The "\\wprt02nc0\clxmfpj21" printer will be used by default with the ODS
PRINTER destination.
Name: WINPRTC
Description: \\WPRT02NC0\CLXMFPJ21
Module: SASGDDMX
Type: Printer Interface Device
Device Catalog: your-sas-path\sashelp
Default Typeface: SAS Monospace
Font Style: Roman
Font Weight: Normal
Font Height: 10 points
Font Version: mfgpctt-v4.4 Thu Sep 16 14:30:47 EDT 1999
Maximum Colors: 2097152
Visual Color: True Color
Color Support: RGB
I/O Type: PRINTER
Data Format: Host Printer
Height: 10.67 inches
Width: 8.15 inches
Ypixels: 6392
Xpixels: 4892
Rows(vpos): 55
Columns(hpos): 97
Left Margin: 0.18 inches
Minimum Left Margin: 0.18 inches
Right Margin: 0.17 inches
Minimum Right Margin: 0.17 inches
Bottom Margin: 0.18 inches
Minimum Bottom Margin: 0.18 inches
Top Margin: 0.17 inches
Minimum Top Margin: 0.17 inches
XxY Resolution: 600x600 pixels per inch
Compression Enabled: Never
Font Embedding: Never
Animation: Unsupported
Program: UNIX
proc qdevice;
printer _PRINTER_;
run;
Example 5: Generate a Font Report 1991
1 proc qdevice;
2 printer _PRINTER_;
3 run;
NOTE: The "PostScript Level 1" printer will be used by default with the ODS PRINTER
destination.
Details
The first example generates a report of all the printer-resident and system fonts
available for the printer. The results are written to the Work.Myfonts data set.
The second example is a SAS program that uses a macro, the DATA step, and the
PRINT procedure to create a list of fonts for devices.
Program
proc qdevice report=font out=myfonts;
printer 'postscript level 2';
run;
Output
The following output shows the report as it appears in the Viewtable window.
Program
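/* Macro FONTLIST - Report fonts supported by a device */
%macro fontlist(type, name);
   /* ... PROC QDEVICE, DATA, and PROC PRINT steps described below ... */
%mend fontlist;
%fontlist(device, pcl5c)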
Program Description
Create the macro fontlist. The %macro statement begins the macro. The input to
the macro is the type, whether it is a device or printer, and the name of the device
or printer.
/* Macro FONTLIST - Report fonts supported by a device */
%macro fontlist(type, name);
Create a data set, fonts, for the device. The macro input variables, type and name,
are used to create a Font report using the QDEVICE procedure. The output is
written to the data set fonts.
proc qdevice report=font out=fonts;
&type &name;
var font ftype fstyle fweight;
run;
Categorize the font type. Fonts can be of type System, TrueType, Adobe Type1,
Adobe CFF/Type2, Bitstream PFR, or Printer Resident.
data;
set fonts;
drop ftype;
length type $16;
if ftype = "System"
then do;
if substr(font,2,3) = "ttf" then type = "TrueType";
%mend fontlist;
Use the %fontlist macro to create an output data set for the PCL5c device.
%fontlist(device, pcl5c)
Example 6: Generate a Device Option Report 1995
Output
Output 55.4 A Partial View of the Fonts Supported by the PCL5c Device
Details
The following example creates a Device Options (DEVOPTION) report for the
PNG device. The report is written to an output data set.
Program
proc qdevice report=devoption support=all out=devop;
device png;
run;
proc print data=devop;
var bit bitstring option odesc;
where Support="Yes";
title "Supported PNG Device Options";
run;
Program Description
Report the options for the PNG device. The REPORT=DEVOPTION option
specifies to create a device options report. SUPPORT=ALL specifies to report all
device features. The option OUT=DEVOP creates the data set WORK.DEVOP. The
DEVICE PNG statement specifies to report on the PNG device.
proc qdevice report=devoption support=all out=devop;
device png;
run;
Print the device options report. The printed report of the WORK.DEVOP data set
shows the BIT, BITSTRING, OPTION, and ODESC variables for the device options
where Support="Yes".
proc print data=devop;
var bit bitstring option odesc;
where Support="Yes";
title "Supported PNG Device Options";
run;
Example 7: Specify a User Library and Catalog for a Report 1997
Output
Output 55.5 The Output Data Set for the PNG Device
DEVLOC=
PRINTER statement
Details
This example creates a General report for the GIF device in a user-specified library
and catalog by using the DEVLOC= and CATALOG= options in the PROC QDEVICE
statement. The results are written to the SAS log.
Program
Assign the device library and catalog
libname devlib 'c:\em';
proc qdevice report=general devloc=devlib cat=mydevices;
device gif;
run;
Log
Example Code 55.4 The SAS Log Report for a Specific Device Library and Catalog
Name: GIF
Description: Graphics Interchange Format RGB Color/Alpha Blending
Module: SASGDDMX
Type: Shortcut Device
Device Catalog: C:\em
Prototype: GIF
Default Typeface: Cumberland AMT
Typeface Alias: Courier
Font Style: Regular
Font Weight: Normal
Font Height: 8 points
Font Version: Version 1.03
Maximum Colors: 16777216
Visual Color: True Color
Color Support: RGBA
Destination: sasprt.gif
I/O Type: DISK
Data Format: GIF
Height: 6.25 inches
Width: 8.33 inches
Ypixels: 600
Xpixels: 800
Rows(vpos): 50
Columns(hpos): 114
Left Margin: 0 inches
Minimum Left Margin: 0 inches
Right Margin: 0 inches
Minimum Right Margin: 0 inches
Bottom Margin: 0 inches
Minimum Bottom Margin: 0 inches
Top Margin: 0 inches
Minimum Top Margin: 0 inches
XxY Resolution: 96x96 pixels per inch
Compression Enabled: Always
Compression Method: LZW
Font Embedding: Never
Animation: Enabled
56
RANK Procedure
Ranking Data
The following output shows the results of ranking the values of one variable with a
simple PROC RANK step. In this example, the new ranking variable shows the order
of finish of five golfers over a four-day competition. The player with the lowest
number of strokes finishes in first place. The following statements produce the
output:
proc rank data=golf out=rankings;
var strokes;
ranks Finish;
run;
Output 56.1 Assignment of the Lowest Rank Value to the Lowest Variable Value
1 Jack 279 2
2 Jerry 283 3
3 Mike 274 1
4 Randy 296 4
5 Tito 302 5
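The GOLF input data set is not shown. The following sketch re-creates it from the values in Output 56.1 so that the step above can run as is. The character variable name Player is an assumption; Strokes and Finish come from the PROC RANK step.

```sas
/* Re-create GOLF from Output 56.1 (the name Player is an assumption) */
data golf;
   input Player $ Strokes;
   datalines;
Jack 279
Jerry 283
Mike 274
Randy 296
Tito 302
;

/* Lowest number of strokes receives Finish = 1 */
proc rank data=golf out=rankings;
   var strokes;
   ranks Finish;
run;

proc print data=rankings;
run;
```

Mike, with 274 strokes, receives Finish = 1, and the remaining finishes match the output above.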
In the following output, the candidates for city council are ranked by district
according to the number of votes that they received in the election. They are also
ranked according to the number of years that they have served in office.
This example shows how PROC RANK can do the following tasks:
Concepts: RANK Procedure 2003
- reverse the order of the rankings so that the highest value receives the rank of 1,
  the next highest value receives the rank of 2, and so on
- rank the observations separately by values of multiple variables
For an explanation of the program that produces this report, see “Example 2:
Ranking Values within BY Groups” on page 2019.
Output 56.2 Assignment of the Lowest Rank Value to the Highest Variable Value within Each BY Group

Obs  Candidate    Vote  Years  Vote Rank  Years Rank
  1  Cardella     1689      8          1           1
  2  Latham       1005      2          3           2
  3  Smith        1406      0          2           3
  4  Walker        846      0          4           3
N = 4

Obs  Candidate    Vote  Years  Vote Rank  Years Rank
  5  Hinkley       912      0          3           3
  6  Kreitemeyer  1198      0          2           3
  7  Lundell      2447      6          1           1
  8  Thrash        912      2          3           2
N = 4
Computer Resources
For any variable that is being ranked, PROC RANK stores in memory the value of
that variable for every observation.
2004 Chapter 56 / RANK Procedure
Statistical Applications
Ranks are useful for investigating the distribution of values for a variable. The ranks
divided by n or n+1 form values in the range 0 to 1, and these values estimate the
cumulative distribution function. You can apply inverse cumulative distribution
functions to these fractional ranks to obtain probability quantile scores. You can
compare these scores to the original values to judge the fit to the distribution. For
example, if a set of data has a normal distribution, the normal scores should be a
linear function of the original values, and a plot of scores versus original values
should be a straight line.
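The following minimal sketch uses the SASHELP.CLASS sample data as stand-in data. The NPLUS1 option produces fractional ranks r/(n+1). The PROBIT function, the inverse standard normal CDF, turns those fractional ranks into normal quantile scores, which are then plotted against the original values. Without ties, this matches the NORMAL=VW scores. The data set names FRAC and SCORES and the variables FRAC_HEIGHT and NORMAL_SCORE are hypothetical.

```sas
/* Fractional ranks r/(n+1) for HEIGHT; tied heights receive the mean rank */
proc rank data=sashelp.class out=frac nplus1;
   var height;
   ranks frac_height;
run;

/* Apply the inverse normal CDF to obtain normal quantile scores */
data scores;
   set frac;
   normal_score = probit(frac_height);
run;

/* Approximately normal data should give an approximately straight line */
proc sgplot data=scores;
   scatter x=height y=normal_score;
run;
```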
These statistical tests commonly assume that the data is from a continuous
distribution, in which the probability of a tie is theoretically zero. In practice,
whether because of inaccuracies in measurement, the finite accuracy of
representation within a digital computer, or other reasons, tied values often occur.
It is also conventional in these statistical tests to assign the average rank to a
group of tied values. Assignment of the average rank is preferred because it
preserves the sum of the ranks and therefore does not distort the estimate of the
cumulative distribution function.
For applications within and outside of statistics, the RANK procedure provides the
TIES= option to control the treatment of tied values. The default value for this
option depends on the specified ranking or scoring method, which you can specify
with the options of the PROC RANK statement. For ranking and scoring methods,
when TIES=LOW, TIES=HIGH, or TIES=MEAN, tied values are initially treated as if
they are distinguishable. These methods all begin by sorting the values of the
analysis variable within a BY group, and then assigning to each nonmissing value an
ordinal number that indicates its position in the sequence.
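To see the four tie-handling methods side by side, the following sketch ranks one small, hypothetical data set with a single tied pair. Each PROC RANK step accepts one TIES= value, so the sketch runs four steps. Because PROC RANK keeps the input order of the observations, a one-to-one merge lines the results up row by row. The data set and variable names are illustrative only.

```sas
/* Hypothetical data set with one tied pair (20, 20) */
data ties;
   input x @@;
   datalines;
10 20 20 30
;

/* One PROC RANK step per TIES= value; RANKS names each result */
proc rank data=ties out=r_low   ties=low;   var x; ranks rank_low;   run;
proc rank data=ties out=r_high  ties=high;  var x; ranks rank_high;  run;
proc rank data=ties out=r_mean  ties=mean;  var x; ranks rank_mean;  run;
proc rank data=ties out=r_dense ties=dense; var x; ranks rank_dense; run;

/* Output order matches input order, so a one-to-one merge aligns the rows */
data compare;
   merge r_low r_high r_mean r_dense;
run;

proc print data=compare noobs;
run;
```

The two values of 20 occupy ordinal positions 2 and 3. TIES=LOW assigns both of them rank 2, TIES=HIGH assigns 3, and TIES=MEAN assigns 2.5. TIES=DENSE also assigns 2, but it then ranks 30 as 3 rather than 4.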
Scoring methods include normal and Savage scoring, which are requested by the
NORMAL= and SAVAGE options. Non-scoring methods include ordinal ranking, the
default, and those methods that are requested by the FRACTION, NPLUS1,
GROUPS=, and PERCENT options. For the scoring methods NORMAL= and
SAVAGE, PROC RANK obtains the probability quantile scores with the appropriate
formulas as if no tied values were present within the data. PROC RANK then
resolves tied values by selecting the minimum, selecting the maximum, or
calculating the average of all scores within a tied group.
For all ranking and scoring methods, when TIES=DENSE, tied values are treated as
indistinguishable, and each value within a tied group is assigned the same ordinal.
As with the other TIES= resolution methods, all ranking and scoring methods begin
by sorting the values of the analysis variable and then assigning ordinals. However,
a group of tied values is treated as a single value. The ordinal assigned to the group
differs by only +1 from the ordinal that is assigned to the value just prior to the
group, if there is one. The ordinal differs by only -1 from the ordinal assigned to the
value just after the group, if there is one. Therefore, the smallest ordinal within a BY
group is 1, and the largest ordinal is the number of unique, nonmissing values in the
BY group.
After the ordinals are assigned, PROC RANK calculates ranks and scores using the
number of unique, nonmissing values instead of the number of nonmissing values
for scaling. Because of its tendency to distort the cumulative distribution function
estimate, dense ranking is not generally acceptable for use in nonparametric
statistical tests.
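As a small worked illustration, take the hypothetical values 10, 20, 20, and 30 and compute fractional ranks with the FRACTION option. Under TIES=MEAN, the ranks 1, 2.5, 2.5, and 4 are divided by the four nonmissing values. Under TIES=DENSE, the ranks 1, 2, 2, and 3 are divided by the three unique values:

$$\text{TIES=MEAN:}\quad \tfrac{1}{4},\ \tfrac{2.5}{4},\ \tfrac{2.5}{4},\ \tfrac{4}{4} \;=\; 0.25,\ 0.625,\ 0.625,\ 1$$

$$\text{TIES=DENSE:}\quad \tfrac{1}{3},\ \tfrac{2}{3},\ \tfrac{2}{3},\ \tfrac{3}{3} \;\approx\; 0.333,\ 0.667,\ 0.667,\ 1$$

The mean-based fractions sum to 2.5, the same as for four untied values. The dense fractions sum to about 2.667, so every value below the maximum is placed higher in the estimated cumulative distribution.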
Note that PROC RANK bases its computations on the internal numeric values of
the analysis variables. The procedure does not format or round these values before
analysis. When values differ in their internal representation, even slightly, PROC
RANK does not treat them as tied values. If this is a concern for your data, then
round the analysis variables by an appropriate amount before invoking PROC
RANK. For information about the ROUND function, see “ROUND Function” in SAS
Functions and CALL Routines: Reference.
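The following sketch shows the effect of rounding before ranking. It assumes a platform that uses IEEE double-precision arithmetic, where 0.1 + 0.2 is typically not stored as exactly the same value as 0.3. The rounding unit 1E-9 is an arbitrary choice for this illustration; choose a unit that suits the precision of your data. The data set and variable names are hypothetical.

```sas
/* Two values that print alike but may differ in their internal representation */
data raw;
   id = 1; x = 0.1 + 0.2; output;
   id = 2; x = 0.3;       output;
run;

/* Round to a unit appropriate for the data (1E-9 is an arbitrary choice here) */
data rounded;
   set raw;
   x_round = round(x, 1e-9);
run;

/* Rank the raw and the rounded values; the rounded values are tied */
proc rank data=rounded out=ranked ties=mean;
   var x x_round;
   ranks rank_raw rank_round;
run;

/* HEX16. displays the stored floating-point bits of each value */
proc print data=ranked;
   format x x_round hex16.;
run;
```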
- Aster
- DB2
- Google BigQuery
- Greenplum
- Hadoop
- HAWQ
- Impala
- Netezza
- Oracle
- PostgreSQL
- SAP HANA
- Snowflake
- Teradata
- Vertica
- Yellowbrick
Note: When using the Google BigQuery data source, columns in the BY statement
in PROC RANK cannot be of data type FLOAT64 for in-database processing.
The presence of table statistics might affect the performance of the RANK
procedure's in-database processing. If your DBMS is not configured to
automatically generate table statistics, then manual generation of table statistics
might be necessary to achieve acceptable in-database performance.
Note: For DB2, generation of table statistics (either automatic or manual) is highly
recommended for all but the smallest input tables.
If the RANK procedure's input data set is a table or view that resides within a
database from which rows would normally be retrieved with the SAS/ACCESS
interface to a supported DBMS, then PROC RANK can perform much or all of its
work within the DBMS. There are several other factors that determine whether
such in-database processing can occur. In-database processing will not occur in the
following circumstances:
- if the RENAME= data set option is specified on the input data set.
When PROC RANK can process data within the DBMS, it generates an SQL query.
The structure of the SQL query that is generated during an in-database invocation
of PROC RANK depends on several factors, including these:
- the target DBMS
- the PROC RANK options that are used, such as TIES= and DESCENDING
The SQL query expresses the required calculations and is submitted to the DBMS.
The results of this query either remain as a new table within the DBMS, if the
output of the RANK procedure is directed there, or are returned to SAS. The
settings for the MSGLEVEL option and the SQLGENERATION option determine
whether the SAS log shows messages that indicate whether in-database processing
was performed. Generated SQL can be examined by setting
the SQL_IP_TRACE option or the SASTRACE= option. SQL_IP_TRACE shows the
SQL that is generated by PROC RANK. For more information, see the SASTRACE=
For more information about the settings for system options, library options, data
set options, and statement options that affect in-database performance for SAS
procedures, see the SQLGENERATION= LIBNAME Option and the
SQLGENERATION= option in SAS/ACCESS for Relational Databases: Reference.
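The following options fragment sketches how you might request the log messages and SQL tracing described above before running PROC RANK against a table accessed through a SAS/ACCESS libref. The values shown are MSGLEVEL=I, SQLGENERATION=DBMS, and the commonly used SASTRACE=',,,d' setting. They are one reasonable choice, not the only one; check the SAS/ACCESS documentation for your DBMS.

```sas
/* Write notes about in-database processing to the SAS log and allow */
/* procedures such as PROC RANK to generate SQL for the DBMS.         */
options msglevel=i sqlgeneration=dbms;

/* Echo the SQL that SAS/ACCESS sends to the DBMS in the SAS log.     */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
```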
PROC RANK: Computes the ranks for one or more numeric variables in a SAS data set
and writes the ranks to a new SAS data set (Ex. 1, Ex. 2, Ex. 3)
Restrictions: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Only one ranking method can be specified in a single PROC RANK step.
Examples: “Example 1: Ranking Values of Multiple Variables” on page 2017
“Example 2: Ranking Values within BY Groups” on page 2019
“Example 3: Partitioning Observations into Groups Based on Ranks” on page 2022
Syntax
PROC RANK <options>;
Preserve values
PRESERVERAWBYVALUES
preserves raw values of all BY variables.
Optional Arguments
DATA=SAS-data-set
specifies the input SAS data set.
Restrictions You cannot use PROC RANK with an engine that supports
concurrent access if another user is updating the data set at the
same time.
DESCENDING
reverses the direction of the ranks. With DESCENDING, the largest value
receives a rank of 1, the next largest value receives a rank of 2, and so on.
Otherwise, values are ranked from smallest to largest.
FRACTION
computes fractional ranks by dividing each rank by the number of observations
having nonmissing values of the ranking variable.
Alias F
GROUPS=number-of-groups
assigns group values ranging from 0 to number-of-groups minus 1. Common
specifications are GROUPS=100 for percentiles, GROUPS=10 for deciles, and
GROUPS=4 for quartiles. For example, GROUPS=4 partitions the original values
into four groups. The smallest values receive, by default, a quartile value of 0
and the largest values receive a quartile value of 3.
PROC RANK Statement 2011
The group value is computed as FLOOR(rank*k/(n+1)), where FLOOR is the FLOOR
function, rank is the value's order rank, k is the value of GROUPS=, and n is the
number of observations having nonmissing values of the ranking variable for
TIES=LOW, TIES=MEAN, and TIES=HIGH. For TIES=DENSE, n is the number of
observations that have unique nonmissing values.
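The group formula FLOOR(rank*k/(n+1)) can be checked on a hypothetical data set of eight distinct values. Because the values are generated in ascending order, the rank of row i is i.

```sas
/* Eight distinct values in ascending order, so the rank of row i is i */
data eight;
   do x = 1 to 8;
      output;
   end;
run;

/* GROUPS=4 assigns quartile values 0 through 3 */
proc rank data=eight out=quartiles groups=4;
   var x;
   ranks quartile;
run;
```

With n = 8 and k = 4, FLOOR(rank*4/9) gives 0, 0, 1, 1, 2, 2, 3, 3 for ranks 1 through 8, so each quartile receives two observations.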
NORMAL=BLOM | TUKEY | VW
computes normal scores from the ranks. The resulting variables appear normally
distributed. n is the number of observations that have nonmissing values of the
ranking variable for TIES=LOW, TIES=MEAN, and TIES=HIGH. For TIES=DENSE,
n is the number of observations that have unique nonmissing values. The
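formulas are as follows, where $\Phi^{-1}$ is the inverse of the standard normal
cumulative distribution function and $r_i$ is the rank of the ith observation:
BLOM
$$y_i = \Phi^{-1}\left(\frac{r_i - 3/8}{n + 1/4}\right)$$
TUKEY
$$y_i = \Phi^{-1}\left(\frac{r_i - 1/3}{n + 1/3}\right)$$
VW
$$y_i = \Phi^{-1}\left(\frac{r_i}{n + 1}\right)$$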
VW stands for van der Waerden. With NORMAL=VW, you can use the scores for
a nonparametric location test. All three normal scores are approximations to the
exact expected order statistics for the normal distribution (also called normal
scores). The BLOM version appears to fit slightly better than the others (Blom
1958; Tukey 1962).
Interaction If you specify the TIES= option, then PROC RANK computes the
normal score from the ranks based on non-tied values and applies
the TIES= specification to the resulting score.
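As a sketch of the BLOM formula in practice, the following program compares the NORMAL=BLOM scores with the formula evaluated directly through the PROBIT function (the inverse standard normal CDF). The data are hypothetical distinct values in ascending order, so the rank of row i is i.

```sas
/* Five distinct values in ascending order, so the rank of row i is i */
data five;
   do x = 1 to 5;
      output;
   end;
run;

proc rank data=five out=blom normal=blom;
   var x;
   ranks blom_score;
run;

/* Recompute the BLOM formula directly: PROBIT((r - 3/8) / (n + 1/4)) */
data check;
   set blom;
   n = 5;
   by_formula = probit((_n_ - 3/8) / (n + 1/4));
   diff = blom_score - by_formula;
run;

proc print data=check;
run;
```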
NPLUS1
computes fractional ranks by dividing each rank by the denominator n+1, where
n is the number of observations that have nonmissing values of the ranking
Aliases FN1
N1
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, PROC RANK creates
it. If you omit OUT=, the data set is named using the DATAn naming convention.
PRESERVERAWBYVALUES
preserves raw values of all BY variables when those variables are propagated to
the output data set. If the PRESERVERAWBYVALUES option is not specified,
and one BY variable is specified, then a representative value for each BY group
is written to the output data set. If multiple BY variables are specified, then a
representative set of values for each BY group is written to the output data set.
PERCENT
divides each rank by the number of observations that have nonmissing values of
the variable and multiplies the result by 100 to get a percentage. n is the number
of observations that have nonmissing values of the ranking variable for
TIES=LOW, TIES=MEAN, and TIES=HIGH. For TIES=DENSE, n is the number of
observations that have unique nonmissing values.
Alias P
Tip You can use PERCENT to calculate cumulative percentages, but use
GROUPS=100 to compute percentiles.
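The three fractional methods differ only in the denominator and scale. The following sketch applies each method to four hypothetical, untied values. Only one ranking method can be specified in a single PROC RANK step, so it runs three steps and merges the results one-to-one; this relies on PROC RANK keeping the input order.

```sas
data four;
   input x @@;
   datalines;
10 20 30 40
;

/* One ranking method per PROC RANK step */
proc rank data=four out=r_frac fraction; var x; ranks frac; run;
proc rank data=four out=r_np1  nplus1;   var x; ranks np1;  run;
proc rank data=four out=r_pct  percent;  var x; ranks pct;  run;

/* Output order matches input order, so a one-to-one merge aligns the rows */
data methods;
   merge r_frac r_np1 r_pct;
run;

proc print data=methods noobs;
run;
```

With n = 4, FRACTION gives 0.25, 0.5, 0.75, and 1. NPLUS1 gives 0.2, 0.4, 0.6, and 0.8. PERCENT gives 25, 50, 75, and 100.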
SAVAGE
computes Savage (or exponential) scores from the ranks by the following
formula (Lehman 1998):
$$y_i = \left( \sum_{j = n - r_i + 1}^{n} \frac{1}{j} \right) - 1$$
Interaction If you specify the TIES= option, then PROC RANK computes the
Savage score from the ranks based on non-tied values and applies
the TIES= specification to the resulting score.
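The following sketch checks the SAVAGE option against the Savage formula, the sum of 1/j for j from n - r + 1 through n, minus 1. The data are hypothetical untied values in ascending order, so the rank of row i is i.

```sas
/* Four distinct values in ascending order, so the rank of row i is i */
data four_s;
   do x = 1 to 4;
      output;
   end;
run;

proc rank data=four_s out=sav savage;
   var x;
   ranks savage_score;
run;

/* Recompute y(i) = (sum of 1/j for j = n-r+1 to n) - 1 */
data check_sav;
   set sav;
   n = 4;
   r = _n_;
   by_formula = -1;
   do j = n - r + 1 to n;
      by_formula = by_formula + 1 / j;
   end;
   drop j;
run;
```

For n = 4, the scores are -3/4, -5/12, 1/12, and 13/12. Like Savage scores in general, they sum to zero.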
HIGH
assigns the largest of the corresponding ranks (or largest of the normal
scores when NORMAL= is specified).
LOW
assigns the smallest of the corresponding ranks (or smallest of the normal
scores when NORMAL= is specified).
MEAN
assigns the mean of the corresponding ranks (or mean of the normal scores
when NORMAL= is specified).
DENSE
computes scores and ranks by treating tied values as a single-order statistic.
For the default method, ranks are consecutive integers that begin with the
number one and end with the number of unique, nonmissing values of the
variable that is being ranked. Tied values are assigned the same rank.
Interaction If you specify the NORMAL= option, then the TIES= specification
applies to the normal score, not to the rank that is used to compute
the normal score.
BY Statement
Produces a separate set of ranks for each BY group.
Syntax
BY <DESCENDING> variable-1
<<DESCENDING> variable-2 …>
<NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, then the observations in the data set must either be sorted by all
the variables that you specify or be indexed appropriately. Variables in a BY
statement are called BY variables.
Note: When using the Google BigQuery data source, columns in the BY
statement in PROC RANK cannot be of data type FLOAT64 for in-database
processing.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The observations are grouped in another way, such as chronological order.
If you are using a SAS/ACCESS engine, and you specify a BY statement, then
the data is always returned in sorted order. If you specify the NOTSORTED
option, then it is ignored and in-database processing is performed.
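How BY groups are formed can be sketched in Python. This is a simplified model, not the SAS implementation: rows are dictionaries, the function name is hypothetical, DESCENDING and indexed data sets are not modeled, and without NOTSORTED the sketch simply requires ascending order. With NOTSORTED, each run of consecutive observations that share a value is its own BY group, so a value that reappears later starts a new group.

```python
from itertools import groupby
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def by_groups(rows: List[Row], by: str,
              notsorted: bool = False) -> Iterator[Tuple[Any, List[Row]]]:
    """Yield (BY value, rows) pairs; each group is a run of consecutive rows."""
    values = [row[by] for row in rows]
    if not notsorted and values != sorted(values):
        # Without NOTSORTED, this sketch requires ascending order (indexes are not modeled).
        raise ValueError(f"observations are not sorted by {by}")
    for value, run in groupby(rows, key=lambda row: row[by]):
        yield value, list(run)


rows = [{"District": 1}, {"District": 2}, {"District": 1}]
print([value for value, _ in by_groups(rows, "District", notsorted=True)])  # [1, 2, 1]
```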
RANKS Statement
Creates new variables for the rank values.
Default: If you omit the RANKS statement, the rank values replace the original variable
values in the output data set.
Requirement: If you use the RANKS statement, you must also use the VAR statement.
Syntax
RANKS new-variable(s);
Required Argument
new-variable(s)
specifies one or more new variables that contain the ranks for the variable(s)
listed in the VAR statement. The first variable listed in the RANKS statement
contains the ranks for the first variable listed in the VAR statement. The second
variable listed in the RANKS statement contains the ranks for the second
variable listed in the VAR statement, and so on.
VAR Statement
Specifies the input variables.
Default: If you omit the VAR statement, PROC RANK computes ranks for all numeric
variables in the input data set.
Examples: “Example 1: Ranking Values of Multiple Variables” on page 2017
“Example 2: Ranking Values within BY Groups” on page 2019
“Example 3: Partitioning Observations into Groups Based on Ranks” on page 2022
Syntax
VAR data-set-variable(s);
Required Argument
data-set-variable(s)
specifies one or more variables for which ranks are computed.
Details
If you omit the RANKS statement, the rank values replace the original values in the
output data set.
Missing Values
Missing values are not ranked and are left missing when ranks or rank scores
replace the original values in the output data set.
Output Data Set
The output data set contains all the variables from the input data set plus the
variables named in the RANKS statement. If you omit the RANKS statement, the
rank values replace the original variable values in the output data set.
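The two layouts of the output data set, and the handling of missing values, can be sketched in Python. The function names are hypothetical, rows are dictionaries, ties are averaged as under TIES=MEAN, and requiring exactly one RANKS name per VAR variable is a simplifying assumption of this sketch. RANKS names are paired with VAR names by position, as the RANKS statement describes.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def mean_ranks(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Ascending ranks with tied values averaged; None (missing) is not ranked."""
    ordered = sorted(v for v in values if v is not None)
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [None if v is None else (first[v] + last[v]) / 2 for v in values]


def rank_rows(rows: List[Row], var: List[str],
              ranks: Optional[List[str]] = None) -> List[Row]:
    """Return copies of rows with ranks in new columns (ranks given) or over var (ranks=None)."""
    if ranks is not None and len(ranks) != len(var):
        raise ValueError("this sketch expects one RANKS name per VAR variable")
    targets = ranks if ranks is not None else var
    out = [dict(row) for row in rows]
    for source, target in zip(var, targets):  # pair RANKS names with VAR names by position
        column = [row[source] for row in rows]
        for new_row, r in zip(out, mean_ranks(column)):
            new_row[target] = r  # a missing value stays missing
    return out


sample = [{"Name": "A", "Taste": 84}, {"Name": "B", "Taste": None}, {"Name": "C", "Taste": 72}]
print(rank_rows(sample, ["Taste"], ["TasteRank"]))  # adds TasteRank, keeps Taste
print(rank_rows(sample, ["Taste"]))                 # Taste now holds the ranks
```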
Numeric Precision
For in-database processing, the mathematical operations expressed by the RANK
procedure in SQL, and the order in which they are performed, are essentially the
same as those performed within SAS. However, in-database processing might result
in small numerical differences when compared to results produced directly by SAS.
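The text does not say which operations differ, so the following is only a generic illustration of the kind of discrepancy meant, not a SAS or DBMS example: binary floating-point addition is not associative, so evaluating the same sum in a different order can change the last digits. Comparing results with a tolerance, rather than exact equality, is one way to check in-database results against results produced by SAS.

```python
import math

values = [0.1, 0.2, 0.3]
left_to_right = (values[0] + values[1]) + values[2]  # one evaluation order
right_to_left = values[0] + (values[1] + values[2])  # another evaluation order
print(left_to_right == right_to_left)              # False
print(math.isclose(left_to_right, right_to_left))  # True
```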
Example 1: Ranking Values of Multiple Variables
Details
This example performs the following actions:
• reverses the order of the ranks so that the highest value receives the rank of 1
• creates ranking variables and prints them with the original variables
Program
options nodate pageno=1 linesize=80 pagesize=60;
data cake;
input Name $ 1-10 Present 12-13 Taste 15-16;
datalines;
Davis      77 84
Orlando    93 80
Ramey      68 72
Roe        68 75
Sanders    56 79
Simms      68 77
Strickland 82 79
;
proc rank data=cake out=order descending ties=low;
var present taste;
ranks PresentRank TasteRank;
run;
proc print data=order;
title "Rankings of Participants' Scores";
run;
Program Description
Set the SAS system options. The NODATE option specifies to omit the date and
time at which the SAS job begins. The PAGENO= option specifies the page number
for the next page of output that SAS produces. The LINESIZE= option specifies the
line size. The PAGESIZE= option specifies the number of lines for a page of SAS
output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the CAKE data set. This data set contains each participant's last name,
score for presentation, and score for taste in a cake-baking contest.
data cake;
input Name $ 1-10 Present 12-13 Taste 15-16;
datalines;
Davis      77 84
Orlando    93 80
Ramey      68 72
Roe        68 75
Sanders    56 79
Simms      68 77
Strickland 82 79
;
Generate the ranks for the numeric variables in descending order and create the
Order output data set. DESCENDING reverses the order of the ranks so that the
high score receives the rank of 1. TIES=LOW gives tied values the best possible
rank. OUT= creates the Order output data set.
proc rank data=cake out=order descending ties=low;
Create two new variables that contain ranks. The VAR statement specifies the
variables to rank. The RANKS statement creates two new variables, PresentRank
and TasteRank, that contain the ranks for the variables Present and Taste,
respectively.
var present taste;
ranks PresentRank TasteRank;
run;
Print the data set. PROC PRINT prints the Order data set. The TITLE statement
specifies a title.
proc print data=order;
title "Rankings of Participants' Scores";
run;
Output: Listing
Output 56.3 Rankings of Participants' Scores
Obs  Name        Present  Taste  PresentRank  TasteRank
  1  Davis            77     84            3          1
  2  Orlando          93     80            1          2
  3  Ramey            68     72            4          7
  4  Roe              68     75            4          6
  5  Sanders          56     79            7          3
  6  Simms            68     77            4          5
  7  Strickland       82     79            2          3
Example 2: Ranking Values within BY Groups
Details
This example performs the following actions:
• ranks observations separately within BY groups
• reverses the order of the ranks so that the highest value receives the rank of 1
• creates ranking variables and prints them with the original variables
Program
options nodate pageno=1 linesize=80 pagesize=60;
data elect;
input Candidate $ 1-11 District 13 Vote 15-18 Years 20;
datalines;
Cardella    1 1689 8
Latham      1 1005 2
Smith       1 1406 0
Walker      1  846 0
Hinkley     2  912 0
Kreitemeyer 2 1198 0
Lundell     2 2447 6
Thrash      2  912 2
;
proc rank data=elect out=results ties=low descending;
by district;
var vote years;
ranks VoteRank YearsRank;
run;
proc print data=results n;
by district;
title 'Results of City Council Election';
run;
Program Description
Set the SAS system options. The NODATE option specifies to omit the date and
time at which the SAS job begins. The PAGENO= option specifies the page number
for the next page of output that SAS produces. The LINESIZE= option specifies the
line size. The PAGESIZE= option specifies the number of lines for a page of SAS
output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the Elect data set. This data set contains each candidate's last name,
district number, vote total, and number of years' experience on the city council.
data elect;
input Candidate $ 1-11 District 13 Vote 15-18 Years 20;
datalines;
Cardella    1 1689 8
Latham      1 1005 2
Smith       1 1406 0
Walker      1  846 0
Hinkley     2  912 0
Kreitemeyer 2 1198 0
Lundell     2 2447 6
Thrash      2  912 2
;
Generate the ranks for the numeric variables in descending order and create the
Results output data set. DESCENDING reverses the order of the ranks so that the
highest vote total receives the rank of 1. TIES=LOW gives tied values the best
possible rank. OUT= creates the Results output data set.
proc rank data=elect out=results ties=low descending;
Create a separate set of ranks for each BY group. The BY statement separates the
rankings by values of District.
by district;
Create two new variables that contain ranks. The VAR statement specifies the
variables to rank. The RANKS statement creates the new variables, VoteRank and
YearsRank, that contain the ranks for the variables Vote and Years, respectively.
var vote years;
ranks VoteRank YearsRank;
run;
Print the data set. PROC PRINT prints the Results data set. The N option prints the
number of observations in each BY group. The TITLE statement specifies a title.
proc print data=results n;
by district;
title 'Results of City Council Election';
run;
Output: Listing
In the following output, Hinkley and Thrash tied with 912 votes in the second
district. They both receive a rank of 3 because TIES=LOW.
District=1
Obs  Candidate    Vote  Years  VoteRank  YearsRank
  1  Cardella     1689      8         1          1
  2  Latham       1005      2         3          2
  3  Smith        1406      0         2          3
  4  Walker        846      0         4          3
N = 4
District=2
Obs  Candidate    Vote  Years  VoteRank  YearsRank
  5  Hinkley       912      0         3          3
  6  Kreitemeyer  1198      0         2          3
  7  Lundell      2447      6         1          1
  8  Thrash        912      2         3          2
N = 4
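As a cross-check, the ranks in this output can be reproduced with a short Python sketch (not SAS). The function names are hypothetical; the sketch assumes the rows are already sorted by District, ranks within each district so that the largest value gets rank 1, and gives tied values the smallest rank, as TIES=LOW does.

```python
from itertools import groupby
from typing import List, Tuple

Candidate = Tuple[str, int, int, int]  # (Candidate, District, Vote, Years)

ELECT: List[Candidate] = [
    ("Cardella", 1, 1689, 8), ("Latham", 1, 1005, 2),
    ("Smith", 1, 1406, 0), ("Walker", 1, 846, 0),
    ("Hinkley", 2, 912, 0), ("Kreitemeyer", 2, 1198, 0),
    ("Lundell", 2, 2447, 6), ("Thrash", 2, 912, 2),
]


def rank_low_descending(values: List[int]) -> List[int]:
    """Rank 1 = largest value; tied values get the best (smallest) rank, like TIES=LOW."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]  # index() finds the first tied position


def rank_by_district(rows: List[Candidate]) -> List[Tuple[str, int, int, int]]:
    """Return (Candidate, District, VoteRank, YearsRank), ranking within each district."""
    results: List[Tuple[str, int, int, int]] = []
    for district, group in groupby(rows, key=lambda row: row[1]):
        members = list(group)
        vote_rank = rank_low_descending([m[2] for m in members])
        years_rank = rank_low_descending([m[3] for m in members])
        for m, vr, yr in zip(members, vote_rank, years_rank):
            results.append((m[0], district, vr, yr))
    return results


for candidate, district, vote_rank, years_rank in rank_by_district(ELECT):
    print(district, candidate, vote_rank, years_rank)
```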
Example 3: Partitioning Observations into Groups Based on Ranks
Details
This example performs the following actions:
• partitions observations into groups on the basis of values of two input variables
Program
options nodate pageno=1 linesize=80 pagesize=60;
data swim;
input Name $ 1-7 Gender $ 9 Back 11-14 Free 16-19;
datalines;
Andrea  F 28.6 30.3
Carole  F 32.9 24.0
Clayton M 27.0 21.9
Curtis  M 29.0 22.6
Doug    M 27.3 22.4
Ellen   F 27.8 27.0
Jan     F 31.3 31.2
Jimmy   M 26.3 22.5
Karin   F 34.6 26.2
Mick    M 29.0 25.4
Richard M 29.7 30.2
Sam     M 27.2 24.1
Susan   F 35.1 36.1
;
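proc sort data=swim out=pairs;
by gender;
run;
proc rank data=pairs out=rankpair groups=3;
by gender;
var back free;
run;
proc print data=rankpair n;
by gender;
title 'Pairings of Swimmers for Backstroke and Freestyle';
run;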
Program Description
Set the SAS system options. The NODATE option specifies to omit the date and
time at which the SAS job began. The PAGENO= option specifies the page number
for the next page of output that SAS produces. The LINESIZE= option specifies the
line size. The PAGESIZE= option specifies the number of lines for a page of SAS
output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the Swim data set. This data set contains swimmers' first names and their
times, in seconds, for the backstroke and the freestyle. This example groups the
swimmers into pairs, within male and female classes, based on times for both
strokes so that every swimmer is paired with someone who has a similar time for
each stroke.
data swim;
input Name $ 1-7 Gender $ 9 Back 11-14 Free 16-19;
datalines;
Andrea  F 28.6 30.3
Carole  F 32.9 24.0
Clayton M 27.0 21.9
Curtis  M 29.0 22.6
Doug    M 27.3 22.4
Ellen   F 27.8 27.0
Jan     F 31.3 31.2
Jimmy   M 26.3 22.5
Karin   F 34.6 26.2
Mick    M 29.0 25.4
Richard M 29.7 30.2
Sam     M 27.2 24.1
Susan   F 35.1 36.1
;
Sort the Swim data set and create the Pairs output data set. PROC SORT sorts the
data set by Gender. This action is required to obtain a separate set of ranks for each
group. OUT= creates the Pairs output data set.
proc sort data=swim out=pairs;
by gender;
run;
Generate the ranks that are partitioned into three groups and create an output
data set. GROUPS=3 assigns one of three possible group values (0, 1, 2) to each
swimmer for each stroke. OUT= creates the Rankpair output data set.
proc rank data=pairs out=rankpair groups=3;
Create a separate set of ranks for each BY group. The BY statement separates the
rankings by Gender.
by gender;
Replace the original values of the variables with the rank values. The VAR
statement specifies that Back and Free are the variables to rank. With no RANKS
statement, PROC RANK replaces the original variable values with the group values
in the output data set.
var back free;
run;
Print the data set. PROC PRINT prints the Rankpair data set. The N option prints
the number of observations in each BY group. The TITLE statement specifies a title.
proc print data=rankpair n;
by gender;
title 'Pairings of Swimmers for Backstroke and Freestyle';
run;
Output: Listing
In the following output, the group values pair swimmers with similar times to work
on each stroke. For example, Andrea and Ellen work together on the backstroke
because they have the fastest times in the female class. The groups of male
swimmers are unbalanced because there are seven male swimmers; for each stroke,
one group has three swimmers.
Gender=F
Obs  Name     Back  Free
  1  Andrea      0     1
  2  Carole      1     0
  3  Ellen       0     1
  4  Jan         1     2
  5  Karin       2     0
  6  Susan       2     2
N = 6
Gender=M
Obs  Name     Back  Free
  7  Clayton     0     0
  8  Curtis      2     1
  9  Doug        1     0
 10  Jimmy       0     1
 11  Mick        2     2
 12  Richard     2     2
 13  Sam         1     1
N = 7
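The group values in this output can be reproduced with the following Python sketch (not SAS). It assumes that each group value is floor(rank × k / (n + 1)), where k is the GROUPS= value and n is the number of swimmers in the BY group, and that ties use the default TIES=MEAN (Curtis and Mick both swam the backstroke in 29.0 seconds and share rank 5.5). The function names are hypothetical.

```python
import math
from itertools import groupby
from typing import Dict, List, Tuple

Swimmer = Tuple[str, str, float, float]  # (Name, Gender, Back, Free)

SWIM: List[Swimmer] = [
    ("Andrea", "F", 28.6, 30.3), ("Carole", "F", 32.9, 24.0),
    ("Clayton", "M", 27.0, 21.9), ("Curtis", "M", 29.0, 22.6),
    ("Doug", "M", 27.3, 22.4), ("Ellen", "F", 27.8, 27.0),
    ("Jan", "F", 31.3, 31.2), ("Jimmy", "M", 26.3, 22.5),
    ("Karin", "F", 34.6, 26.2), ("Mick", "M", 29.0, 25.4),
    ("Richard", "M", 29.7, 30.2), ("Sam", "M", 27.2, 24.1),
    ("Susan", "F", 35.1, 36.1),
]


def mean_ranks(values: List[float]) -> List[float]:
    """Ascending ranks; tied values share the mean rank (TIES=MEAN style)."""
    ordered = sorted(values)
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def group_values(values: List[float], k: int) -> List[int]:
    """Assumed GROUPS=k rule: floor(rank * k / (n + 1))."""
    n = len(values)
    return [math.floor(r * k / (n + 1)) for r in mean_ranks(values)]


def pair_swimmers(rows: List[Swimmer], k: int = 3) -> List[Tuple[str, str, int, int]]:
    """Sort by Gender, then compute GROUPS=k values of Back and Free within each Gender."""
    rows = sorted(rows, key=lambda row: row[1])  # stable sort keeps input order within a gender
    result: List[Tuple[str, str, int, int]] = []
    for gender, group in groupby(rows, key=lambda row: row[1]):
        members = list(group)
        back = group_values([m[2] for m in members], k)
        free = group_values([m[3] for m in members], k)
        result.extend((gender, m[0], b, f) for m, b, f in zip(members, back, free))
    return result


for gender, name, back, free in pair_swimmers(SWIM):
    print(gender, name, back, free)
```

With seven male swimmers, the denominator n + 1 is 8, so the seven ranks cannot split evenly into three groups; this is why one male group has three swimmers for each stroke.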
References
Blom, G. 1958. Statistical Estimates and Transformed Beta Variables. New York, New
York: John Wiley & Sons, Inc.
Conover, W.J. 1998. Practical Nonparametric Statistics, Third Edition. New York, New
York: John Wiley & Sons, Inc.
Conover, W.J. and R.L. Iman. 1976. “On Some Alternative Procedures Using Ranks
for the Analysis of Experimental Designs.” Communications in Statistics A5 (14):
1348–1368.
Conover, W.J. and R.L. Iman. 1981. “Rank Transformations as a Bridge between
Parametric and Nonparametric Statistics.” The American Statistician 35: 124–129.
Iman, R.L. and W.J. Conover. 1979. “The Use of the Rank Transform in Regression.”
Technometrics 21: 499–509.
Lehman, E.L. 1998. Nonparametrics: Statistical Methods Based on Ranks. Upper
Saddle River, New Jersey: Prentice Hall.
Quade, D. 1966. “On Analysis of Variance for the K-Sample Problem.” Annals of
Mathematical Statistics 37: 1747–1758.
57
REGISTRY Procedure
For more information, see “The SAS Registry” in SAS Language Reference: Concepts.
Syntax
PROC REGISTRY <options>;
PROC REGISTRY Statement
Optional Arguments
CLEARSASUSER
erases from the Sasuser portion of the SAS registry the keys that were added by
a user.
COMPAREREG1='libname.registry-name-1'
specifies one of two registries to compare. The results appear in the SAS log.
libname
is the name of the library in which the registry file resides.
registry-name-1
is the name of the first registry.
Interaction To specify a single key and all of its subkeys, specify the
STARTAT= option.
COMPAREREG2='libname.registry-name-2'
specifies the second of two registries to compare. The results appear in the SAS
log.
libname
is the name of the library in which the registry file resides.
registry-name-2
is the name of the second registry.
COMPARETO=file-specification
compares the contents of a file that contains registry information to a registry. It
returns information about keys and values that it finds in the file that are not in
the registry. It reports the following items as differences:
• keys that are defined in the external file but not in the registry
• value names for a given key that are in the external file but not in the registry
COMPARETO= does not report as differences any keys and values that are in
the registry but not in the file because the registry could easily be composed of
pieces from many different files.
'external-file'
is the path and name of an external file that contains the registry
information.
fileref
is a fileref that has been assigned to an external file.
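The one-directional nature of COMPARETO= can be sketched in Python with a hypothetical in-memory model in which each registry (or file) is a dictionary that maps a key path to a dictionary of value names and contents. Only the two kinds of differences listed above are reported; the message wording is invented for illustration and is not the text that PROC REGISTRY writes to the log.

```python
from typing import Dict, List

# Hypothetical model: key path -> {value name: value content}
Registry = Dict[str, Dict[str, str]]


def compare_to(file_entries: Registry, registry: Registry) -> List[str]:
    """List keys and value names that are in the file but not in the registry."""
    differences: List[str] = []
    for key, values in file_entries.items():
        if key not in registry:
            differences.append(f"key not in registry: [{key}]")
            continue
        for name in values:
            if name not in registry[key]:
                differences.append(f"value name not in registry: [{key}] {name}")
    # Keys or values that exist only in the registry are deliberately not reported.
    return differences


registry = {"CORE\\A": {"x": "1"}, "CORE\\ONLY_IN_REGISTRY": {"y": "2"}}
file_entries = {"CORE\\A": {"x": "1", "z": "3"}, "CORE\\B": {"w": "4"}}
for line in compare_to(file_entries, registry):
    print(line)
```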
DEBUGON
enables registry debugging by providing more descriptive log entries.
DEBUGOFF
disables registry debugging.
EXPORT=file-specification
writes the contents of a registry to the specified file, where file-specification is
one of the following values:
'external-file'
is the name of an external file that contains the registry information.
fileref
is a fileref that has been assigned to an external file.
To export a single key and all of its subkeys, specify the STARTAT=
option.
Example “Example 2: Listing and Exporting the Registry File” on page 2039
FOLLOWLINKS
follows links that are found when processing the LIST option.
Normally the LIST option displays the values of the link items. If you use the
FOLLOWLINKS option, the links are treated as keys, and items contained in the
links are displayed.
FULLSTATUS
lists the keys, subkeys, and values that were added or deleted as a result of
running the IMPORT= and UNINSTALL= options.
IMPORT=file-specification
specifies the file to import into the SAS registry. PROC REGISTRY does not
overwrite the existing registry. Instead, it updates the existing registry with the
contents of the specified file.
'external-file'
is the path and name of an external file that contains the registry
information.
fileref
is a fileref that has been assigned to an external file.
Interactions By default, IMPORT= imports the file to the Sasuser portion of the
SAS registry. To import the file to the Sashelp portion of the
registry, specify the USESASHELP option. You must have Write
permission to Sashelp to use USESASHELP.
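The update-rather-than-overwrite behavior of IMPORT= can be sketched in Python with the same kind of hypothetical dictionary model (key path mapped to value names and contents). Keys and values that are not in the file survive the import. The sketch assumes that when the file defines a value name that already exists under a key, the file's content replaces the old content; the documentation above does not state this case explicitly.

```python
import copy
from typing import Dict

Registry = Dict[str, Dict[str, str]]  # hypothetical model: key path -> {value name: content}


def import_entries(registry: Registry, file_entries: Registry) -> Registry:
    """Return an updated copy of registry with the file's keys and values merged in."""
    updated = copy.deepcopy(registry)
    for key, values in file_entries.items():
        # Assumption: a value name that already exists takes the file's content.
        updated.setdefault(key, {}).update(values)
    return updated


before = {"A": {"x": "1", "y": "2"}, "B": {"k": "v"}}
after = import_entries(before, {"A": {"y": "20", "z": "3"}, "C": {"@": "default"}})
print(after)
```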
KEYSONLY
limits the output of the LIST, LISTUSER, LISTHELP, and LISTREG options to
keys only.
LEVELS=n
limits the number of levels to display for the LIST, LISTUSER, LISTHELP, and
LISTREG options.
LIST
writes the contents of the entire SAS registry to the SAS log.
Interaction To write a single key and all of its subkeys, use the STARTAT=
option.
LISTHELP
writes the contents of the Sashelp portion of the registry to the SAS log.
Interaction To write a single key and all of its subkeys, use the STARTAT=
option.
LISTREG='libname.registry-name'
lists the contents of the specified registry in the log.
libname
is the name of the library in which the registry file resides.
registry-name
is the name of the registry.
Here is an example:
proc registry listreg='sashelp.regstry';
run;
Interaction To list a single key and all of its subkeys, use the STARTAT= option.
LISTUSER
writes the contents of the Sasuser portion of the registry to the SAS log.
Interaction To write a single key and all of its subkeys, use the STARTAT=
option.
Example “Example 2: Listing and Exporting the Registry File” on page 2039
STARTAT='key-name'
exports or writes the contents of a single key and all of its subkeys.
You must specify an entire key sequence if you want to start listing at any
subkey under the root key.
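The "entire key sequence" rule can be sketched in Python by matching whole backslash-separated components from the top of the path, so that a bare subkey name such as ICONS selects nothing. The function names and key list are hypothetical. Matching is case-insensitive here only because the STARTAT= examples later in this chapter use lowercase paths such as 'core\explorer\icons'; treat that as an assumption.

```python
from typing import List


def split_key(key: str) -> List[str]:
    """Split a backslash-separated key path into upper-cased components."""
    return [part.upper() for part in key.strip("\\").split("\\")]


def start_at(keys: List[str], start: str) -> List[str]:
    """Return the keys that equal start or lie beneath it, matching whole components."""
    target = split_key(start)
    return [key for key in keys if split_key(key)[:len(target)] == target]


keys = [
    "CORE\\EXPLORER",
    "CORE\\EXPLORER\\ICONS",
    "CORE\\EXPLORER\\ICONS\\ENTRIES",
    "CORE\\EXPLORERX",
    "ODS\\FONTS",
]
print(start_at(keys, "core\\explorer\\icons"))
print(start_at(keys, "icons"))  # [] because a partial key sequence does not match
```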
UNINSTALL=file-specification
deletes from the specified registry all the keys and values that are in the
specified file.
'external-file'
is the name of an external file that contains the keys and values to delete.
fileref
is a fileref that has been assigned to an external file. To assign a fileref, you
can do the following:
• use the Explorer Window
Interactions By default, UNINSTALL= deletes the keys and values from the
Sasuser portion of the SAS registry. To delete the keys and values
from the Sashelp portion of the registry, specify the USESASHELP
option. You must have Write permission to Sashelp to use this
option.
UPCASE
uses uppercase for all incoming key names.
UPCASEALL
uses uppercase for all keys, names, and item values when you import a file.
USESASHELP
performs the specified operation on the Sashelp portion of the SAS registry.
Usage: REGISTRY Procedure
A registry file must have a particular structure. Each entry in the registry file
consists of a key name, followed on the next line by one or more values. The key
name identifies the key or subkey that you are defining. Any values that follow
specify the names or data to associate with the key.
Examples of valid key name sequences follow. These sequences are typical of the
SAS registry:
• [CORE\EXPLORER\MENUS\ENTRIES\CLASS]
• [CORE\EXPLORER\NEWMEMBER\CATALOG]
• [CORE\EXPLORER\NEWENTRY\CLASS]
• [CORE\EXPLORER\ICONS\ENTRIES\LOG]
value-name=value-content
A value-name can be an at sign (@), which indicates the default value name, or it
can be any text string in double quotation marks. If the text string contains an
ampersand (&), then the character (either uppercase or lowercase) that follows the
ampersand is a shortcut for the value name. For more information, see “Sample
Registry Entries” on page 2036.
The entire text string cannot contain more than 255 characters (including quotation
marks and ampersands). It can contain any character except a backslash (\).
• a string. You can put any character inside the quotation marks, including nothing
("").
The following display shows how the different types of values that are described
above appear in the Registry Editor:
• a string="my data"
• dword=dword:00010203
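The entry structure described above can be sketched as a small Python parser. This is a hypothetical helper, not part of SAS: it accepts key lines in square brackets, and value lines whose name is @ or a double-quoted string without a backslash, up to 255 characters including the quotation marks. The written file form of a dword value ("Name"=dword:hex digits) is an assumption based on the Registry Editor display above; ampersand shortcuts and escape sequences such as \\ are not interpreted.

```python
import re
from typing import Dict, Optional, Union

Value = Union[str, int]
# A value name is @ or a double-quoted string that contains no quote or backslash.
VALUE_LINE = re.compile(r'^(@|"[^"\\]*")=(.*)$')


def parse_registry_text(text: str) -> Dict[str, Dict[str, Value]]:
    """Parse [key] lines and name=value lines into {key: {value name: value}}."""
    entries: Dict[str, Dict[str, Value]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            entries.setdefault(current, {})
            continue
        match = VALUE_LINE.match(line)
        if current is None or match is None:
            raise ValueError(f"unexpected line: {raw!r}")
        name, content = match.groups()
        if len(name) > 255:
            raise ValueError("value name longer than 255 characters")
        name = name if name == "@" else name[1:-1]
        if content.startswith("dword:"):
            entries[current][name] = int(content[len("dword:"):], 16)
        elif len(content) >= 2 and content[0] == content[-1] == '"':
            entries[current][name] = content[1:-1]  # escape sequences are left as-is
        else:
            raise ValueError(f"unsupported value: {content!r}")
    return entries


sample = r'''[HKEY_USER_ROOT\AllGoodPeopleComeToTheAidOfTheirCountry]
@="This is a string value"
"Value2"=""
'''
print(parse_registry_text(sample))
```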
The following display shows a registry entry that contains default PostScript
printer settings:
Figure 57.2 Portion of a Registry Editor Showing Settings for a PostScript Printer
To see what the actual registry text file looks like, you can use PROC REGISTRY to
write the contents of the registry key to the SAS log, using the LISTUSER and
STARTAT= options.
The following example shows the syntax for sending a Sasuser registry entry to the
log:
proc registry
listuser
startat='sasuser-registry-key-name';
run;
Details
This example imports a file into the Sasuser portion of the SAS registry. The
following source file contains examples of valid key name sequences in a registry
file:
[HKEY_USER_ROOT\AllGoodPeopleComeToTheAidOfTheirCountry]
@="This is a string value"
"Value2"=""
"Value3"="C:\\This\\Is\\Another\\String\\Value"
Program
filename source 'external-file';
proc registry import=source;
run;
Program Description
Assign a fileref to a file that contains valid text for the registry. The FILENAME
statement assigns the fileref SOURCE to the external file that contains the text to
read into the registry.
filename source 'external-file';
Invoke PROC REGISTRY to import the file that contains input for the registry.
PROC REGISTRY reads the input file that is identified by the fileref SOURCE.
IMPORT= writes to the Sasuser portion of the SAS registry by default.
proc registry import=source;
run;
Log
Example Code 57.2 Results from Importing a File to the SAS Registry
Example 2: Listing and Exporting the Registry File
Details
The registry file is usually very large. To export a portion of the registry, use the
STARTAT= option.
This example lists the Sasuser portion of the SAS registry and exports it to an
external file.
Program
proc registry
listuser
export='external-file';
run;
Program Description
Write the contents of the Sasuser portion of the registry to the SAS log. The
LISTUSER option causes PROC REGISTRY to write the entire Sasuser portion of
the registry to the log.
proc registry
listuser
Export the registry to the specified file. The EXPORT= option writes a copy of the
Sasuser portion of the SAS registry to the external file.
export='external-file';
run;
Log
Example Code 57.3 Results from Listing and Exporting a SAS Registry File
Details
This example compares the Sasuser portion of the SAS registry to an external file.
Comparisons such as this one are useful if you want to know the difference
between a backup file that was saved with a .txt file extension and the current
registry file.
To compare the Sashelp portion of the registry with an external file, specify the
USESASHELP option.
This SAS log shows two differences between the Sasuser portion of the registry
and the specified external file. In the registry, the value of "Initialized" is "True"; in
the external file, it is "False". In the registry, the value of "Icon" is "658"; in the
external file, it is "343".
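Program
filename testreg 'external-file';
proc registry
compareto=testreg;
run;
Program Description
Assign a fileref to the external file that contains the text to compare to the
registry. The FILENAME statement assigns the fileref TESTREG to the external file.
filename testreg 'external-file';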
Compare the specified file to the Sasuser portion of the SAS registry. The
COMPARETO option compares the contents of a file to a registry. It returns
information about keys and values that it finds in the file that are not in the registry.
proc registry
compareto=testreg;
run;
Log
Example Code 57.4 Results from Comparing the Registry to an External File
Details
This example uses the REGISTRY procedure options COMPAREREG1= and
COMPAREREG2= to specify two registry files for comparison.
Program
libname proclib 'SAS-library';
proc registry comparereg1='sasuser.regstry'
startat='CORE\EXPLORER'
comparereg2='proclib.regstry';
run;
Program Description
Declare the PROCLIB library. The PROCLIB library contains a registry file.
libname proclib 'SAS-library';
Start PROC REGISTRY and specify the first registry file to be used in the
comparison.
proc registry comparereg1='sasuser.regstry'
Limit the comparison to the registry keys including and following the specified
registry key. The STARTAT= option limits the scope of the comparison to the
EXPLORER subkey under the CORE key. By default, the comparison includes the
entire contents of both registries.
startat='CORE\EXPLORER'
Specify the second registry file to be used in the comparison.
comparereg2='proclib.regstry';
run;
Log
Example Code 57.5 Results from Comparing Two Registry Files
Example 5: Specifying an Entire Key Sequence with the STARTAT= Option
Details
The following example shows how to use the STARTAT= option. You must specify
an entire key sequence if you want to start listing any subkey under the root key.
The root key is optional.
Program
proc registry export = my-fileref
startat='core\explorer\icons';
run;
Example 6: Displaying a List of Fonts
Details
The following example writes a list of ODS fonts to the SAS log.
Program
proc registry clearsasuser; run;
proc registry listhelp startat='ods\fonts'; run;
proc registry clearsasuser; run;
Log
Example Code 57.6 Results from Displaying a List of Fonts from the SAS Registry
58
REPORT Procedure
For more information about creating accessible tables with PROC REPORT, see
“Overview of Table Accessibility” in Creating Accessible SAS Output Using ODS
and ODS Graphics. This feature applies to SAS 9.4M6 and later releases.
Both detail and summary reports can contain summary report lines (break lines) as
well as report rows. A summary line summarizes numerical data for a set of detail
rows or for all detail rows. PROC REPORT provides both default and customized
summaries. (See “Using Break Lines” on page 2066.)
This overview illustrates the types of reports that PROC REPORT can produce. The
statements that create the data sets and formats used in these reports are in
“Example 1: Selecting Variables and Creating a Summary Line for a Report” on page
2171. The formats are stored in a permanent SAS library. See the REPORT
procedure examples for more reports and for the statements that create them.
Figure 58.1 Simple Detail Report with a Detail Row for Each Observation
The report in the following figure uses the same observations as the above figure.
However, the statements that produce this report
• order the rows by the values of Manager and Department.
• create a customized summary line for the whole report. A customized summary
lets you control the content and appearance of the summary information, but
you must write additional PROC REPORT statements to create one.
For an explanation of the program that produces this report, see “Example 2:
Ordering the Rows in a Report” on page 2174.
Figure 58.2 Ordered Detail Report with Default and Customized Summaries
The summary report in the following figure contains one row for each store in the
northern sector. Each detail row represents four observations in the input data set,
one observation for each department. Information about individual departments
does not appear in this report. Instead, the value of Sales in each detail row is the
sum of the values of Sales in all four departments. In addition to consolidating
multiple observations into one row of the report, the statements that create this
report
• create default summary lines that total the sales for each sector of the city
• create a customized summary line that totals the sales for both sectors
For an explanation of the program that produces this report, see “Example 5:
Consolidating Multiple Observations into One Row of a Report” on page 2183.
[Figure: summary report annotated to show a detail row, the default summary line for Sector, and the customized summary line for the whole report]
The summary report in the following figure is similar to the above figure. The major
difference is that it also includes information for individual departments. Each
selected value of Department forms a column in the report. In addition, the
statements that create this report compute and display a variable that is not in the
input data set.
For an explanation of the program that produces this report, see “Example 6:
Creating a Column for Each Value of a Variable” on page 2187.
Figure 58.4 Summary Report with a Column for Each Value of a Variable
---------------------------------------------------
| Combined sales for meat and dairy : $1,545.00 |
| Combined sales for produce : $390.00 |
| |
| Combined sales for all perishables: $1,935.00 |
---------------------------------------------------
The customized report in the following figure shows each manager's store on a
separate page. Only the first two pages appear here. The statements that create
this report create
• a customized heading for each page of the report
• a customized summary with text that is dependent on the total sales for that
manager's store
For an explanation of the program that produces this report, see “Example 7:
Writing a Customized Summary on Each Page” on page 2190.
[Figure: the first two pages of the report. Each page has a customized heading (Northeast Sector, Store managed by Alomar; Northeast Sector, Store managed by Andrews) and columns for Department, Sales, and Profit.]
The report in the following figure uses customized style elements to control things
like font faces, font sizes, and justification, as well as the width of the border of the
table and the width of the spacing between cells. This report was created by using
the HTML destination of the Output Delivery System (ODS) and the STYLE=
option in several statements in the procedure.
For an explanation of the program that produces this report, see “Example 13:
Specifying Style Elements for ODS Output in Multiple Statements” on page 2212.
For information about ODS, see “Output Delivery System” on page 72.
layout of the report. To design the layout, ask yourself the following types of
questions:
• What do I want to display in each column of the report?
When you understand the layout of the report, use the COLUMN and DEFINE
statements in PROC REPORT to construct the layout.
The COLUMN statement lists the items that appear in the columns of the report,
describes the arrangement of the columns, and defines headings that span multiple
columns. A report item can be
• a data set variable
Omit the COLUMN statement if you want to include all variables in the input data
set in the same order as they occur in the data set.
The DEFINE statement defines the characteristics of an item in the report. These
characteristics include how PROC REPORT uses the item in the report, the text of
the column heading, and the format to use to display values.
Note: The DEFINE statement equates to the DEFINITION window if you are using
the WINDOWS environment.
A report can contain variables that are not in the input data set. These variables
must have a usage of COMPUTED.
Display Variables
A report that contains one or more display variables has a row for every
observation in the input data set. Display variables do not affect the order of the
rows in the report. If no order variables appear to the left of a display variable, then
the order of the rows in the report reflects the order of the observations in the data
set. By default, PROC REPORT treats all character variables as display variables.
For an example, see “Example 1: Selecting Variables and Creating a Summary Line
for a Report” on page 2171.
Order Variables
A report that contains one or more order variables has a row for every observation
in the input data set. If no display variable appears to the left of an order variable,
then PROC REPORT orders the detail rows according to the ascending, formatted
values of the order variable. You can change the default order with ORDER= and
DESCENDING in the DEFINE statement.
If the report contains multiple order variables, then PROC REPORT establishes the
order of the detail rows by sorting these variables from left to right in the report.
PROC REPORT does not repeat the value of an order variable from one row to the
next if the value does not change, unless an order variable to its left changes
values. For an example, see “Example 2: Ordering the Rows in a Report” on page
2174.
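The following sketch, again assuming SASHELP.CLASS, uses two order variables. SEX is the leftmost order variable, so it is the primary sort key, and DESCENDING reverses its default ascending order; AGE then orders the rows within each value of SEX.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column sex age name;
   /* Primary order: M before F because of DESCENDING */
   define sex  / order descending 'Sex';
   /* Secondary order: ascending formatted AGE within each SEX value */
   define age  / order 'Age';
   define name / display 'Name';
run;
```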
The order of observations is not inherently defined for DBMS tables. If you specify
the ORDER=DATA option for input data in a DBMS table, the order of rows written
to a database table from PROC REPORT is not likely to be preserved.
Group Variables
If a report contains one or more group variables, then PROC REPORT tries to
consolidate into one row all observations from the data set that have the same
combination of formatted values for all group variables.
When PROC REPORT creates groups, it orders the detail rows by the ascending,
formatted values of the group variable. You can change the default order with
ORDER= and DESCENDING in the DEFINE statement or with the DEFINITION
window.
If the report contains multiple group variables, then the REPORT procedure
establishes the order of the detail rows by sorting these variables from left to right
in the report. PROC REPORT does not repeat the values of a group variable from
one row to the next if the value does not change, unless a group variable to its left
changes values.
If you are familiar with procedures that use class variables, then you might recognize
that group variables are like class variables that are used in the row dimension in
PROC TABULATE.
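As a sketch of consolidation, assuming SASHELP.CLASS: the report below has no order variables and no display variables, so PROC REPORT can collapse the observations into one row for each combination of SEX and AGE, with the mean weight for each row.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column sex age weight;
   define sex    / group 'Sex';
   define age    / group 'Age';
   /* Observations that share SEX and AGE are consolidated into one row */
   define weight / analysis mean format=6.1 'Mean Weight';
run;
```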
Note: You cannot always create groups. PROC REPORT cannot consolidate
observations into groups if the report contains any order variables or any display
variables that do not have one or more statistics associated with them. (See the
COLUMN statement on page 2109.) In the interactive report window environment,
if PROC REPORT cannot immediately create groups, then the procedure changes
all display and order variables to group variables so that it can create the groups.
Analysis Variables
An analysis variable is a numeric variable that is used to calculate a statistic for all
the observations represented by a cell of the report. (Across variables, in
combination with group variables or order variables, determine which observations
a cell represents.) You associate a statistic with an analysis variable in the
variable's definition or in the COLUMN statement. By default, PROC REPORT uses
numeric variables as analysis variables that are used to calculate the Sum statistic.
For more information, see “BREAK Statement” on page 2097 and “RBREAK
Statement” on page 2134.
For examples, refer to “Example 2: Ordering the Rows in a Report” on page 2174,
“Example 3: Using Aliases to Obtain Multiple Statistics for the Same Variable” on
page 2177, “Example 5: Consolidating Multiple Observations into One Row of a
Report” on page 2183, and “Example 6: Creating a Column for Each Value of a
Variable” on page 2187.
Note: Be careful when you use SAS dates in reports that contain summary lines.
SAS dates are numeric variables. Unless you explicitly define dates as some other
type of variable (ORDER, GROUP, or DISPLAY), PROC REPORT summarizes them.
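The following sketch illustrates the note about SAS dates. WORK.SALES, SALEDATE, and AMOUNT are hypothetical names for a small data set that is created inline. Giving SALEDATE a GROUP usage keeps PROC REPORT from treating the date values as an analysis variable and summing them.

```sas
/* Hypothetical sample data: two sales on each of two dates */
data work.sales;
   input saledate :date9. amount;
   format saledate date9.;
   datalines;
01JAN2024 100
01JAN2024 150
02JAN2024 200
02JAN2024 50
;
run;

proc report data=work.sales;
   column saledate amount;
   /* GROUP usage: one row per date instead of a summed date value */
   define saledate / group format=date9. 'Sale Date';
   define amount   / analysis sum format=8.2 'Total';
   rbreak after / summarize;
run;
```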
Across Variables
PROC REPORT creates a column for each value of an across variable. PROC
REPORT orders the columns by the ascending, formatted values of the across
variable. You can change the default order with ORDER= and DESCENDING in the
DEFINE statement. If no other variable helps define the column, then PROC
REPORT displays the N statistic (the number of observations in the input data set
that belong to that cell of the report). See the COLUMN statement on page 2109.
Note: When a display variable and an across variable share a column, the report
must also contain another variable that is not in the same column. When referring
to columns created by an across variable, you must use the _Cn_ syntax.
If you are familiar with procedures that use class variables, then you might recognize that
across variables are like class variables that are used in the column dimension with
PROC TABULATE. Generally, you use across variables in conjunction with order or
group variables. For an example, see “Example 6: Creating a Column for Each Value
of a Variable” on page 2187.
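A sketch of an across variable used with a group variable, assuming SASHELP.CLASS. Each value of SEX becomes a column; because no other item is stacked with SEX, each cell should contain the N statistic for that age group.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column age sex;
   define age / group 'Age';
   /* Nothing is stacked with SEX, so each F and M cell shows N */
   define sex / across 'Sex';
run;
```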
Computed Variables
Computed variables are variables that you define for the report. They are not in the
input data set, and PROC REPORT does not add them to the input data set.
However, computed variables are included in an output data set if you create one.
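To use a computed variable in a report, you
• define the variable with a usage of COMPUTED in the DEFINE statement
• compute the value of the variable in a compute block associated with the
variable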
For examples, refer to “Example 6: Creating a Column for Each Value of a Variable”
on page 2187, “Example 8: Displaying a Calculated Percentage Column in a Report”
on page 2195, and “Example 10: Creating an Output Data Set and Storing Computed
Variables” on page 2202.
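A minimal sketch of a computed variable, assuming SASHELP.CLASS. RATIO is a hypothetical name: it is not in the input data set, so it has COMPUTED usage and receives its value in its own compute block. HEIGHT and WEIGHT are display variables here, so the compute block references them by name.

```sas
/* Assumes the SASHELP.CLASS sample data set; RATIO is hypothetical. */
proc report data=sashelp.class;
   column name height weight ratio;
   define name   / display 'Name';
   define height / display 'Height';
   define weight / display 'Weight';
   /* RATIO is not in the input data set, so it must be COMPUTED */
   define ratio  / computed format=6.2 'Weight/Height';
   compute ratio;
      /* Display variables are referenced by their plain names */
      ratio = weight / height;
   endcomp;
run;
```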
Several items can collectively define the contents of a column in a report. For
example, in the report that the following program produces, the values that appear in the third and fourth
columns are collectively determined by Sales, an analysis variable, and by
Department, an across variable. You create this type of report with the COLUMN
statement or, in the interactive report window environment, by placing report items
above or below each other. This arrangement is called stacking items in the report
because each item generates a heading, and the headings are stacked one above
the other.
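/* Assumes the GROCERY data set and the $SCTRFMT., $MGRFMT., and $DEPTFMT.
   formats from the REPORT procedure examples; the formats are stored in
   the PROCLIB library, which FMTSEARCH= adds to the format search path. */
libname proclib 'SAS-library';
options nodate pageno=1 fmtsearch=(proclib);

proc report data=grocery split='*';
   column sector manager department,sales perish;
   define sector     / group format=$sctrfmt. 'Sector' '';
   define manager    / group format=$mgrfmt. 'Manager* ';
   define department / across format=$deptfmt. '_Department_';
   define sales      / analysis sum format=dollar11.2 ' ';
   define perish     / computed format=dollar11.2 'Perishable Total';
   break after manager / skip;
   /* DEPARTMENT,SALES fills columns 3 and 4 (one per department), so the
      computed PERISH column must reference them by column number */
   compute perish;
      perish = _c3_ + _c4_;
   endcomp;
   title "Sales Figures for Perishables in Northern Sectors";
   where sector contains 'n' and (department='p1' or department='p2');
run;
title;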
When you use multiple items to define the contents of a column, at most one of the
following can be in a column:
• a display variable with or without a statistic above or below it
• an order variable
• a group variable
• a computed variable
More than one of these items in a column creates a conflict for PROC REPORT
about which values to display.
The following table shows which report items can share a column.
                   Display  Analysis  Order  Group  Computed  Across  Statistic
                                                    variable
Display                                                       X*      X
Analysis                                                      X       X
Order
Group                                                         X
Computed variable                                             X
Across             X*       X                X      X                 X
Statistic          X        X                               X
* When a display variable and an across variable share a column, the report must also contain another
variable that is not in the same column.
Note: You cannot stack order variables with other report items.
When a column is defined by stacked report items, PROC REPORT formats the
values in the column by using the format that is specified for the lowest report item
in the stack that does not have an ACROSS usage.
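A sketch of stacking, assuming SASHELP.CLASS. The comma in the COLUMN statement stacks WEIGHT under the across variable SEX, so each value of SEX gets its own column of mean weights. Under the formatting rule above, those values use the format of WEIGHT, because SEX has ACROSS usage.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column age sex,weight;
   define age    / group 'Age';
   define sex    / across 'Sex';
   /* WEIGHT is the lowest non-ACROSS item in the stack, so 6.1 formats
      the values in the F and M columns */
   define weight / analysis mean format=6.1 'Mean Weight';
run;
```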
The following items can stand alone in a column:
• display variable
• analysis variable
• order variable
• group variable
• computed variable
• across variable
• N statistic
Note: The values in a column that is occupied only by an across variable are
frequency counts.
Note: When you use the COMPUTE statement, you do not have to use a
corresponding BREAK or RBREAK statement. See “Using Break Lines” on page 2066.
Also see “Example 2: Ordering the Rows in a Report” on page 2174, which uses
COMPUTE AFTER but does not use the RBREAK statement. Use the BREAK and RBREAK
statements only when you want to implement one or more of their options. See
“Example 7: Writing a Customized Summary on Each Page” on page 2190, which uses
both COMPUTE AFTER MANAGER and BREAK AFTER MANAGER.
For an in-depth look at using compute blocks, refer to The REPORT Procedure: A
Primer for the Compute Block.
You can use compute blocks to
• define a variable that appears in a column of the report but is not in the input
data set.
• define display attributes for a report item. (See “CALL DEFINE Statement” on
page 2105.)
• define or change the value for a report item, such as showing the word “Total”
on a summary line.
In addition, all compute blocks can use most SAS language elements to perform
calculations. (See “The Contents of Compute Blocks” on page 2062.) A PROC
REPORT step can contain multiple compute blocks, but they cannot be nested.
You can use these DATA step statements in a compute block:
ARRAY              END
array-reference    IF-THEN/ELSE
assignment         LENGTH
CALL               RETURN
CONTINUE           sum
DO (all forms)
You can also use
• comments
• null statements
Within a compute block, you can also use these PROC REPORT features:
• Compute blocks for a customized summary can contain one or more LINE
statements, which place customized text and formatted values in the summary.
(See the “LINE Statement” on page 2132.)
• Compute blocks for a report item can contain one or more CALL DEFINE
statements, which set attributes like color and format each time a value for the
item is placed in the report. (See the “CALL DEFINE Statement” on page 2105.)
• Any compute block can reference the automatic variable _BREAK_. (See “The
Automatic Variable _BREAK_” on page 2068.)
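The sketch below, assuming SASHELP.CLASS, places a CALL DEFINE statement in the compute block for a report item. The threshold of 100 and the yellow background are arbitrary choices for illustration. Because the attribute is a style, it affects ODS destinations such as HTML, PDF, and RTF, not LISTING output.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column name age weight;
   define name   / display;
   define age    / display;
   define weight / display format=6.1;
   compute weight;
      /* Runs each time a WEIGHT value is placed in the report;
         _COL_ refers to the current column */
      if weight > 100 then
         call define(_col_, 'style', 'style={background=yellow}');
   endcomp;
run;
```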
You can reference a report item in a compute block in one of four ways:
• by name
• by a compound name that identifies both the variable and the name of the
statistic that you calculate with it. A compound name has this form
variable-name.statistic
• by an alias that you create in the COLUMN statement or in the DEFINITION
window.
• by column number, in the form
'_Cn_'
where n is the number of the column (from left to right) in the report.
Note: Even though the columns that you define with NOPRINT and NOZERO
do not appear in the report, you must count them when you are referencing
columns by number. See the discussion of “NOPRINT” on page 2125 and
“NOZERO” on page 2126.
Note: Referencing variables that have missing values leads to missing values. If a
compute block references a variable that has a missing value, then PROC REPORT
displays that variable as a blank (for character variables) or as a period (for
numeric variables).
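The following table shows how to use each type of reference in a compute block.
If the report item is...                               Then reference it by...
a group variable                                       name
an order variable                                      name
a computed variable                                    name
a display variable                                     name
a display variable sharing a column with a statistic   a compound name*
an analysis variable                                   a compound name*
any variable sharing a column with an across variable  column number**
* If the variable has an alias, then you must reference it with the alias.
** Even if the variable has an alias, you must reference it by column number.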
Refer to “Example 3: Using Aliases to Obtain Multiple Statistics for the Same
Variable” on page 2177, which references analysis variables by their aliases;
“Example 6: Creating a Column for Each Value of a Variable” on page 2187, which
references variables by column number; and “Example 8: Displaying a Calculated
Percentage Column in a Report” on page 2195, which references group variables
and computed variables by name.
1 COMPUTE report-item;
2 COMPUTE BEFORE;
3 COMPUTE BEFORE target;
4 COMPUTE BEFORE _PAGE_;
5 COMPUTE AFTER;
6 COMPUTE AFTER target;
7 COMPUTE AFTER _PAGE_;
Note: PROC REPORT assigns values to the columns in a row of a report from left
to right. Consequently, you cannot base the calculation of a computed variable on
any variable that appears to its right in the report. For information about how PROC
REPORT builds a report in general, see “Results: REPORT Procedure” on page 2158.
COMPUTE report-item;
When you include the report-item argument, the compute block executes on every
observation when that particular column is processed. In general, this statement is
used for a specific report-item column so that you can calculate a value, change a
value, change a format, apply style attributes, or create a temporary variable.
COMPUTE BEFORE;
With this syntax, the compute block is executed before the first detail row. The
block is executed only once. Overall summary values for the analysis variables are
available in the compute block. In addition, values of temporary variables that are
created in this block are available to other compute blocks. Values for group or
order variables are not available in this block. Text that is generated by LINE
statements appears below the headers and above the first detail row in the report.
CALL DEFINE statements set attributes on rows that are created by an RBREAK
BEFORE statement.
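A sketch of COMPUTE BEFORE, assuming SASHELP.CLASS. The LINE statement should write one line below the headings and above the first detail row; because the block runs before any detail rows, WEIGHT.SUM holds the overall sum for the report.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column name weight;
   define name   / display;
   define weight / analysis sum format=8.1;
   compute before;
      /* Overall summary value for the analysis variable */
      line 'Total weight of all students: ' weight.sum 8.1;
   endcomp;
run;
```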
COMPUTE BEFORE target;
When you use this syntax, the compute block is executed when the value of the
target variable changes. In this syntax:
• target specifies either a group variable or an order variable.
• BEFORE is a value for location. This value specifies that the compute block is
executed at the top (or beginning) of the section for a specific value of target.
• The value of the target variable for this specific section of the report is available
to the compute block.
• The values of temporary variables created in this block are available to other
compute blocks.
• Summary values for this specific target value are available to the compute
block.
• CALL DEFINE statements set attributes on rows that are created by a BREAK
BEFORE target statement.
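A sketch of a compute block that runs before each value of a target variable, assuming SASHELP.CLASS. SEX is an order variable, so the block executes at the start of each SEX section, where both the value of SEX and the section's WEIGHT.SUM are available. No BREAK statement is needed for the LINE output.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column sex name weight;
   define sex    / order 'Sex';
   define name   / display 'Name';
   define weight / analysis sum format=7.1 'Weight';
   compute before sex;
      /* SEX and the section summary WEIGHT.SUM are available here */
      line 'Sex: ' sex $1. '   Section total weight: ' weight.sum 7.1;
   endcomp;
run;
```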
COMPUTE BEFORE _PAGE_;
The compute block is executed once for each page. The page break can be
generated either by the destination to which you send the output or by using the
following BREAK statement syntax:
BREAK <location> <break-var> /PAGE;
Typically, this block outputs text by using a LINE statement. The text appears at
the top of the page above the headers. However, it is still part of the table. Any
variable that is listed in the LINE statement takes its value from the first detail row
in the report.
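A minimal sketch of a page-level heading, assuming SASHELP.CLASS. The text "Student roster" is an arbitrary example; it should appear at the top of each page, above the column headings, while still being part of the table.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column name age;
   define name / display;
   define age  / display;
   compute before _page_;
      /* Executes once for each page of the report */
      line 'Student roster';
   endcomp;
run;
```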
COMPUTE AFTER;
The compute block is executed after the last detail row, and the block is executed
only once (for each report). Overall summary values for the analysis variables are
also available in the compute block. Values for group or order variables are not
available in this block. Text that is generated by LINE statements appears after the
last detail row in the report. This block is executed after each BY value if the PROC
REPORT code contains a BY statement. CALL DEFINE statements set attributes on
rows created by an RBREAK AFTER statement.
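A sketch of COMPUTE AFTER, assuming SASHELP.CLASS. The block executes once, after the last detail row, so WEIGHT.MEAN here is the overall mean for the whole report. No RBREAK statement is required for the LINE output.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column name weight;
   define name   / display;
   define weight / analysis mean format=6.1;
   compute after;
      /* Written after the last detail row */
      line 'Overall mean weight: ' weight.mean 6.1;
   endcomp;
run;
```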
COMPUTE AFTER target;
With this syntax, the compute block is executed when the value of the target
(break) variable changes. In this syntax:
• target specifies a variable that is defined as either a group variable or an order
variable.
• AFTER is a value for location. This value specifies that the compute block is
executed at the bottom (or end) of the section for a specific value of target.
• The value of the target variable for this specific section of the report is available
to the compute block.
• Summary values for this specific target value are available to the compute
block.
• CALL DEFINE statements set attributes on rows created by a BREAK AFTER
target statement.
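A sketch that pairs a default summary with a customized one for the same target, assuming SASHELP.CLASS. BREAK AFTER SEX / SUMMARIZE adds a summary row at the end of each SEX section, and the COMPUTE AFTER SEX block adds a line of text with that section's WEIGHT.SUM.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column sex name weight;
   define sex    / order 'Sex';
   define name   / display 'Name';
   define weight / analysis sum format=7.1 'Weight';
   /* Default summary row at the end of each SEX section */
   break after sex / summarize;
   compute after sex;
      /* Customized text for the same location */
      line 'Total weight for sex ' sex $1. ': ' weight.sum 7.1;
   endcomp;
run;
```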
COMPUTE AFTER _PAGE_;
The compute block is executed once for each page. The page break can be
generated either by the destination to which you send the output or by using the
following BREAK statement syntax:
BREAK <location> <break-var> /PAGE;
Any variable that is listed in the LINE statement takes its value from the last detail
row in the report. The text is placed at the end of the table, but it is still part of the
table.
Note: The exact location of the text might change for each page, based on the
amount of information that is output to each page. The text is placed on the last
page of a table if it spans multiple pages.
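A minimal sketch of page-level closing text, assuming SASHELP.CLASS; the source line is an arbitrary example. As the preceding note explains, exactly where this text lands can vary with the amount of output on each page.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column name age;
   define name / display;
   define age  / display;
   compute after _page_;
      /* Placed at the end of the table, but still part of the table */
      line 'Source: SASHELP.CLASS';
   endcomp;
run;
```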
• values calculated for either a set of rows or for the whole report
Default summaries are produced with the BREAK statement or the RBREAK
statement. You can use default summaries to visually separate parts of the report,
to summarize information for numeric variables, or both. Options provide some
control over the appearance of the break lines, but if you choose to summarize
numeric variables, then you have no control over the content and the placement of
the summary information. (A break line that summarizes information is a summary
line.)
Customized summaries are produced in a compute block. You can control both the
appearance and content of a customized summary, but you must write the code to
do so.
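A sketch of default summaries, assuming SASHELP.CLASS. BREAK AFTER SEX / SUMMARIZE should add a summary line after each value of the group variable SEX, and RBREAK AFTER / SUMMARIZE should add one summary line for the whole report. PROC REPORT decides what these lines contain (here, sums of WEIGHT).

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class;
   column sex age weight;
   define sex    / group 'Sex';
   define age    / group 'Age';
   define weight / analysis sum format=7.1 'Total Weight';
   /* Default summary for each value of SEX */
   break after sex / summarize;
   /* Default summary for the whole report */
   rbreak after / summarize;
run;
```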
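For destinations other than LISTING, the order in which the break lines appear is as follows:
1 summary line
2 page break
For LISTING output, the order in which the break lines appear is as follows:
1 overlining or double overlining
2 summary line
3 underlining or double underlining
4 skipped line
5 page break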
If you define a customized summary for the same location, then customized break
lines appear after underlining or double underlining. This occurs only in LISTING
output.
The automatic variable _BREAK_ contains
• a blank if the current line is not part of a break
• the value of the break variable if the current line is part of a break between sets
of observations
• the value _RBREAK_ if the current line is part of a break at the beginning or end
of the report
• the value _PAGE_ if the current line is part of a break at the beginning or end of a
page
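A sketch of testing _BREAK_ to label a summary line, assuming SASHELP.CLASS. ROWLABEL is a hypothetical computed character column; its compute block writes “Total” only on the line where _BREAK_ is _RBREAK_, which is the report-level summary created by RBREAK AFTER.

```sas
/* Assumes the SASHELP.CLASS sample data set; ROWLABEL is hypothetical. */
proc report data=sashelp.class;
   column rowlabel sex weight;
   define rowlabel / computed ' ';
   define sex      / group 'Sex';
   define weight   / analysis sum format=7.1 'Total Weight';
   rbreak after / summarize;
   compute rowlabel / character length=8;
      /* _BREAK_ is blank on detail rows and _RBREAK_ on the RBREAK line */
      if _break_ = '_RBREAK_' then rowlabel = 'Total';
      else rowlabel = ' ';
   endcomp;
run;
```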
Note: When you refer in a compute block to a statistic that has an alias, do not use
a compound name. Generally, you must use the alias. However, if the statistic
shares a column with an across variable, then you must reference it by column
number. (See “Four Ways to Reference Report Items in a Compute Block” on page
2063.)
The PROC REPORT STYLE= option supports all ODS destinations except LISTING
and OUTPUT. For more information, see “Using ODS Styles with PROC REPORT” on
page 2142.
PROC REPORT supports the OUTPUT destination for SAS output data sets.
Because ODS already knows the logical structure of the data and its native form,
ODS can output a SAS data set that represents exactly the same resulting data set
that the procedure worked with internally. For more information, see “ODS
OUTPUT Statement” in SAS Output Delivery System: User’s Guide.
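A sketch of capturing the report through the OUTPUT destination, assuming SASHELP.CLASS. It assumes that the PROC REPORT output object is named Report; running ODS TRACE ON before the step writes the actual object name to the log, so check that first. WORK.REPORT_DATA is a hypothetical data set name.

```sas
/* Assumes the output object is named Report (verify with ODS TRACE ON). */
ods output Report=work.report_data;
proc report data=sashelp.class;
   column sex weight;
   define sex    / group;
   define weight / analysis mean format=6.1;
run;
```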
The value of the SAS system option CPUCOUNT= affects the performance of the
threaded sort. CPUCOUNT= suggests how many system CPUs are available for use
by the threaded procedures.
For more information, see the “THREADS System Option” in SAS System Options:
Reference and the “CPUCOUNT= System Option” in SAS System Options: Reference.
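A short sketch of the options involved, assuming SASHELP.CLASS. The value CPUCOUNT=4 is an arbitrary example. The THREADS option in the PROC REPORT statement overrides the THREADS | NOTHREADS system option for this step only.

```sas
/* CPUCOUNT=4 is an arbitrary example value */
options threads cpucount=4;

proc report data=sashelp.class threads;
   column sex weight;
   define sex    / group;
   define weight / analysis mean;
run;
```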
PROC REPORT <options>;
   BREAK location break-variable </ options>;
   BY <DESCENDING> variable-1
      <<DESCENDING> variable-2 …> <NOTSORTED>;
   COLUMN column-specification(s);
   COMPUTE location <target>
      </ STYLE=<style-override(s)>>;
      LINE specification(s);
      . . . select SAS language elements . . .
   ENDCOMP;
   COMPUTE report-item </ type-specification>;
      CALL DEFINE (column-id | _ROW_, 'attribute-name', value);
      . . . select SAS language elements . . .
   ENDCOMP;
   DEFINE report-item / <options>;
   FREQ variable;
   RBREAK location </ options>;
   WEIGHT variable;
CALL DEFINE   Set the value of an attribute for a particular column    Ex. 5, Ex. 13
              in the current row
DEFINE        Describe how to use and display a report item            Ex. 2, Ex. 3, Ex. 4, Ex. 5,
                                                                       Ex. 6, Ex. 7, Ex. 8, Ex. 10,
                                                                       Ex. 11, Ex. 13
Examples: “Example 1: Selecting Variables and Creating a Summary Line for a Report” on page
2171
“Example 2: Ordering the Rows in a Report” on page 2174
“Example 6: Creating a Column for Each Value of a Variable” on page 2187
“Example 4: Displaying Multiple Statistics for One Variable” on page 2181
“Example 9: How PROC REPORT Handles Missing Values” on page 2198
“Example 10: Creating an Output Data Set and Storing Computed Variables” on
page 2202
“Example 13: Specifying Style Elements for ODS Output in Multiple Statements” on
page 2212
“Example 14: Using the CELLWIDTH= Style Attribute with PROC REPORT” on page
2220
Syntax
PROC REPORT <options>;
PCTLDEF=
See QNTLDEF=.
THREADS
NOTHREADS
overrides the SAS system option THREADS | NOTHREADS.
WINDOWS
NOWINDOWS
selects the interactive report window or the nonwindowing environment.
ODS Listing
HEADLINE
underlines all column headings and the spaces between them.
HEADSKIP
writes a blank line beneath all column headings.
Store and retrieve report definitions, PROC REPORT statements, and your report
profile
LIST
writes to the SAS log the PROC REPORT code that creates the current
report.
NOEXEC
suppresses the building of the report.
OUTREPT=libref.catalog.entry
stores in the specified catalog the report definition that is defined by the
PROC REPORT step that you submit.
PROFILE=libref.catalog
identifies the report profile to use.
REPORT=libref.catalog.entry
specifies the report definition to use.
Optional Arguments
BOX
uses formatting characters to add line-drawing characters to the report. These
characters
• surround each page of the report
• separate values in a summary line from other values in the same columns
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction You cannot use BOX if you use WRAP in the PROC REPORT
statement or in the ROPTIONS window or if you use FLOW in any
item definition.
BYPAGENO=number
If a BY statement is present, specifies the page number at the start of each BY
group.
This option makes table captions both visual and accessible if the
ACCESSIBLETABLE system option is specified.
#BYLINE
substitutes the entire BY line without leading or trailing blanks for #BYLINE
in the text string. The BY line uses the format variable-name=value.
#BYVALn
#BYVAL(BY-variable-name)
substitutes the current value of the specified BY variable for #BYVAL in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAL2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAL(YEAR) specifies the BY variable, YEAR. Variable-name is not
case sensitive.
#BYVARn
#BYVAR(BY-variable-name)
substitutes the name of the BY-variable or the label associated with the
variable (whatever the BY line would normally display) for #BYVAR in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAR2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAR(SITES) specifies the BY variable, SITES. Variable-name is not
case sensitive.
Tip You can use the PROC DOCUMENT OBBNOTE statement to display or
edit the caption.
CENTER | NOCENTER
specifies whether to center or left-justify the report and summary text
(customized break lines).
PROC REPORT honors the first of these centering specifications that it finds:
• the CENTER or NOCENTER option in the PROC REPORT statement or the
CENTER toggle in the ROPTIONS window
• the CENTER or NOCENTER option stored in the report definition that is
loaded with REPORT= in the PROC REPORT statement
• the SAS system option CENTER or NOCENTER
COLWIDTH=column-width
specifies the default number of characters for columns containing computed
variables or numeric data set variables.
When setting the width for a column, PROC REPORT first looks at WIDTH= in
the definition for that column. If WIDTH= is not present, then PROC REPORT
uses a column width large enough to accommodate the format for the item. If no
format is associated with the item, then the column width depends on variable
types as shown in the following table.
Table 58.2 Using References in a Compute Block
Default 9
Restriction This option affects only the LISTING output. It has no effect on
other ODS output. For the formatted ODS destinations, use the
STYLE= option with the WIDTH=, CELLWIDTH=, or
OUTPUTWIDTH= style attributes. Refer to “Style Attributes Tables”
in SAS Output Delivery System: Advanced Topics for details. See how
style attributes WIDTH= and CELLWIDTH= can be used with PROC
REPORT in “Example 14: Using the CELLWIDTH= Style Attribute with
PROC REPORT” on page 2220.
COMMAND
displays command lines rather than menu bars in all REPORT windows.
After you have started PROC REPORT in the interactive report window
environment, you can display the menu bars in the current window by issuing
the COMMAND command. You can display the menu bars in all PROC REPORT
windows by issuing the PMENU command. The PMENU command affects all the
windows in your SAS session. Both of these commands are toggles.
You can store a setting of COMMAND in your report profile. PROC REPORT
honors the first of these settings that it finds:
• the COMMAND option in the PROC REPORT statement
• the COMMAND setting in your report profile
COMPLETECOLS | NOCOMPLETECOLS
creates all possible combinations for the values of the across variables even if
one or more of the combinations do not occur within the input data set.
Consequently, the column headings are the same for all logical pages of the
report within a single BY group.
Default COMPLETECOLS
COMPLETEROWS | NOCOMPLETEROWS
displays all possible combinations of the values of the group variables, even if
one or more of the combinations do not occur in the input data set.
Consequently, the row headings are the same for all logical pages of the report
within a single BY group.
Default NOCOMPLETEROWS
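A sketch of COMPLETEROWS, assuming SASHELP.CLASS. In that data set no female student is 16, so without COMPLETEROWS the combination SEX=F, AGE=16 has no row. With COMPLETEROWS, PROC REPORT should add that row, with a missing sum, so that every combination of SEX and AGE appears.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc report data=sashelp.class completerows;
   column sex age weight;
   define sex    / group 'Sex';
   define age    / group 'Age';
   /* The F/16 row has no observations, so its sum is missing */
   define weight / analysis sum format=7.1 'Total Weight';
run;
```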
#BYLINE
substitutes the entire BY line without leading or trailing blanks for #BYLINE
in the text string. The BY line uses the format variable-name=value.
#BYVALn
#BYVAL(BY-variable-name)
substitutes the current value of the specified BY variable for #BYVAL in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAL2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAL(YEAR) specifies the BY variable, YEAR. Variable-name is not
case sensitive.
#BYVARn
#BYVAR(BY-variable-name)
substitutes the name of the BY-variable or the label associated with the
variable (whatever the BY line would normally display) for #BYVAR in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAR2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAR(SITES) specifies the BY variable, SITES. Variable-name is not
case sensitive.
Note The use of the BY directives with the CONTENTS= option applies to
SAS 9.4M6 and to later releases.
Tip All ODS destinations except OUTPUT and LISTING support the
STYLE= option.
DATA=SAS-data-set
specifies the input data set.
EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative) from
the analysis. By default, PROC REPORT treats observations with negative
weights like observations with zero weights and counts them in the total
number of observations.
Alias EXCLNPWGTS
FORMCHAR <(position(s))>='formatting-character(s)'
defines the characters to use as line-drawing characters in the report.
position(s)
identifies the position of one or more characters in the SAS formatting-
character string. A space or a comma separates the positions.
formatting-character(s)
lists the characters to use for the specified positions. PROC REPORT assigns
characters in formatting-character(s) to position(s), in the order in which
they are listed. For example, the following option assigns the asterisk (*) to
the third formatting character, the number sign (#) to the seventh character,
and does not alter the remaining characters: formchar(3,7)='*#'
Table 58.3 Formatting Characters Used by PROC REPORT
Position  Default  Used to draw
6         |        Leftmost character in a row of horizontal separators
7         +        Intersection of a column of vertical characters and a row of
                   horizontal characters
8         |        Rightmost character in a row of horizontal separators
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
HEADLINE
underlines all column headings and the spaces between them at the top of each
page of the report.
The HEADLINE option underlines with the second formatting character. (See
the discussion of “FORMCHAR <(position(s))>='formatting-character(s)' ” on
page 2081.)
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
HEADSKIP
writes a blank line beneath all column headings (or beneath the underlining that
the HEADLINE option writes) at the top of each page of the report.
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
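A sketch that combines FORMCHAR=, HEADLINE, and HEADSKIP in LISTING output, assuming SASHELP.CLASS. Because HEADLINE underlines with the second formatting character, FORMCHAR(2)='=' should make the heading underline a row of equal signs, and HEADSKIP adds a blank line after it. Other ODS destinations ignore all three options.

```sas
/* Assumes the SASHELP.CLASS sample data set; LISTING output only. */
ods listing;
proc report data=sashelp.class(obs=5) headline headskip formchar(2)='=';
   column name age;
   define name / display;
   define age  / display;
run;
```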
HELP=libref.catalog
identifies the library and catalog containing user-defined help for the report.
This help can be in CBT or HELP catalog entries. You can write a CBT or HELP
entry for each item in the report with the BUILD procedure in SAS/AF software.
Store all such entries for a report in the same catalog.
Specify the entry name for help for a particular report item in the DEFINITION
window for that report item or in a DEFINE statement.
LIST
writes to the SAS log the PROC REPORT code that creates the current report.
This listing might differ in these ways from the statements that you submit:
• It shows some defaults that you might not have specified.
• It omits some statements that are not specific to the REPORT procedure,
whether you submit them with the PROC REPORT step or had previously
submitted them. These statements include BY, FOOTNOTE, FREQ, TITLE,
WEIGHT, and WHERE.
• It omits these PROC REPORT statement options: LIST, OUT=, OUTREPT=,
PROFILE=, REPORT=, and WINDOWS | NOWINDOWS.
• It includes these style(<location>)= options: CENTER, HEADER, LEFT, RIGHT,
SPACING, USAGE, and WIDTH.
Restriction The LIST option does not support styles by columns if you specify
style(column)= in the DEFINE statement.
LS=line-size
specifies the length of a line of the report.
PROC REPORT honors the line size specifications that it finds in the following
order of precedence:
• the LS= option in the PROC REPORT statement or LINESIZE= in the
ROPTIONS window
• the LS= setting stored in the report definition loaded with REPORT= in the
PROC REPORT statement
• the SAS system option LINESIZE=
Note: The PROC REPORT LS= option takes precedence over all other line size
options.
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
MISSING
considers missing values as valid values for group, order, or across variables.
Special missing values used to represent numeric values (the letters A through
Z and the underscore (_) character) are each considered as a different value. A
group for each missing value appears in the report. If you omit the MISSING
option, then PROC REPORT does not include observations with a missing value
for any group, order, or across variables in the report.
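A sketch of the MISSING option. WORK.SURVEY, REGION, and SCORE are hypothetical names for a small data set created inline; the period in the second data line is read as a missing character value. With MISSING, the blank REGION value should form its own group, giving three rows; without it, that observation would be left out.

```sas
/* Hypothetical sample data with one missing REGION value */
data work.survey;
   input region $ score;
   datalines;
East 10
.    20
West 30
East 40
;
run;

proc report data=work.survey missing;
   column region score;
   /* The missing REGION value is treated as a valid group */
   define region / group 'Region';
   define score  / analysis sum 'Total Score';
run;
```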
NAMED
writes name= in front of each value in the report, where name is the column
heading for the value.
Interaction When you use the NAMED option, PROC REPORT automatically
uses the NOHEADER option.
NOALIAS
lets you use a report that was created before compute blocks required aliases. If
you use NOALIAS, then you cannot use aliases in compute blocks.
NOCENTER
See “CENTER|NOCENTER” on page 2078.
NOCOMPLETECOLS
See “COMPLETECOLS|NOCOMPLETECOLS” on page 2079.
NOCOMPLETEROWS
See “COMPLETEROWS|NOCOMPLETEROWS” on page 2079.
NOEXEC
suppresses the building of the report. Use NOEXEC with OUTREPT= to store a
report definition in a catalog entry. Use NOEXEC with LIST and REPORT= to
display a listing of the specified report definition.
Alias NOEXECUTE
NOHEADER
suppresses column headings, including headings that span multiple columns.
When you suppress the display of column headings in the interactive report
window environment, you cannot select any report items.
NOTHREADS
See “THREADS|NOTHREADS” on page 2095.
NOWINDOWS
See “WINDOWS|NOWINDOWS” on page 2096.
Alias NOWD
Default The default mode is now the nonwindowing environment. You no longer
have to specify NOWINDOWS or NOWD.
OUT=SAS-data-set
names the output data set. If this data set does not exist, then PROC REPORT
creates it. The data set contains one observation for each report row and one
observation for each unique summary line. If you use both customized and
default summaries at the same place in the report, then the output data set
contains only one observation because the two summaries differ only in how
they present the data. Information about customization (underlining, color, text,
and so on) is not data and is not saved in the output data set.
The output data set contains one variable for each column of the report. PROC
REPORT tries to use the name of the report item as the name of the
corresponding variable in the output data set. However, it cannot perform this
substitution if a data set variable is under or over an across variable or if a data
set variable appears multiple times in the COLUMN statement without aliases.
In these cases, the name of the variable is based on the column number (_C1_,
_C2_, and so on).
Output data set variables that are derived from input data set variables retain
the formats of their counterparts in the input data set. PROC REPORT derives
labels for these variables from the corresponding column headings in the report
unless the only item defining the column is an across variable. In that case, the
variables have no label. If multiple items are stacked in a column, then the
labels of the corresponding output data set variables come from the analysis
variable in the column.
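The output data set also contains a character variable named _BREAK_. If an
observation in the output data set derives from a detail row in the report, then
the value of _BREAK_ is missing (blank). If it derives from a summary line, then
the value of _BREAK_ is the name of the break variable that is associated with
the summary line, or _RBREAK_ for a summary line that the RBREAK statement creates.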
Examples “Example 10: Creating an Output Data Set and Storing Computed
Variables” on page 2202
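A minimal sketch of OUT=, assuming SASHELP.CLASS and a hypothetical output data set name WORK.CLASS_MEANS. The report has one row per value of Sex plus a report-level summary line, so the output data set should have one observation for each of those rows.

```sas
proc report data=sashelp.class nowd out=work.class_means;
   column sex age height;
   define sex    / group;
   define age    / analysis mean;
   define height / analysis mean;
   rbreak after / summarize;    /* adds one summary observation to OUT= */
run;

/* Inspect the report-item variables and the _BREAK_ variable. */
proc print data=work.class_means;
run;
```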
OUTREPT=libref.catalog.entry
stores in the specified catalog entry the REPORT definition that is defined by
the PROC REPORT step that you submit. PROC REPORT assigns the entry a
type of REPT.
The stored report definition might differ in these ways from the statements that
you submit:
n It omits some statements that are not specific to the REPORT procedure,
whether you submit them with the PROC REPORT step or whether they are
already in effect when you submit the step. These statements include
BY, FOOTNOTE, FREQ, TITLE, WEIGHT, WHERE
n It includes these PROC REPORT statement options. Others are omitted.
BOX, CENTER, COLWIDTH, COMPLETECOLS, COMPLETEROWS, FORMCHAR, HEADLINE,
HEADSKIP, HELP, LINESIZE, MISSING, NAMED, NOHEADER, PAGESIZE, PANELS,
PCTLDEF, PSPACE, QMARKERS, QMETHOD, SHOWALL, SPACING, SPLIT, VARDEF, WRAP
n It omits SAS system options.
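Because TITLE and WHERE statements are not stored in the definition, a sketch of reuse has to supply them again. This assumes SASHELP.CLASS and a hypothetical entry WORK.MYREPTS.HTRPT. Note that the second step has no COLUMN statement, because COLUMN cannot be combined with REPORT=.

```sas
title 'Mean height by sex';          /* not stored in the definition */
proc report data=sashelp.class nowd outrept=work.myrepts.htrpt;
   where age >= 13;                  /* not stored in the definition */
   column sex height;
   define sex    / group;
   define height / analysis mean format=5.1;
run;

/* Reuse the stored definition; resubmit WHERE and TITLE if they are needed. */
title 'Mean height by sex, ages 13 and up';
proc report data=sashelp.class nowd report=work.myrepts.htrpt;
   where age >= 13;
run;
title;
```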
PANELS=number-of-panels
specifies the number of panels on each page of the report. If the width of a
report is less than half of the line size, then you can display the data in multiple
sets of columns so that rows that would otherwise appear on multiple pages
appear on the same page. Each set of columns is a panel. A familiar example of
this type of report is a telephone book, which contains multiple panels of names
and telephone numbers on a single page.
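When PROC REPORT writes a multipanel report, it fills one panel before
beginning the next. The number of panels that fit on a page depends on the
width of each panel, the space between panels, and the line size.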
Default 1
Note This option affects only the LISTING output. It has no effect on other
ODS output. However, the COLUMNS= option in the ODS PRINTER,
ODS PDF, and ODS RTF statements produces similar results. For
details, see the statements in SAS Output Delivery System: User’s
Guide.
Tip If number-of-panels is larger than the number of panels that can fit on
the page, then PROC REPORT creates as many panels as it can. Let
PROC REPORT put your data in the maximum number of panels that
can fit on the page by specifying a large number of panels (for example,
99).
See For information about the space between panels and the line size, see
the discussions of PSPACE= on page 2090 and the discussion of LS= on
page 2085.
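A minimal sketch of a multipanel LISTING report, assuming the SASHELP.SHOES sample table (long enough to need several panels). PANELS=99 asks for as many panels as fit on the page, and PSPACE= sets the blanks between them.

```sas
ods listing;                  /* PANELS= and PSPACE= affect LISTING output only */
options linesize=120;

proc report data=sashelp.shoes nowd panels=99 pspace=6;
   column subsidiary stores;
   define subsidiary / display;
   define stores     / display;
run;
```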
PCTLDEF=
See QNTLDEF= on page 2090.
PROFILE=libref.catalog
identifies the report profile to use. A report profile does the following:
n specifies the location of menus that define alternative menu bars and menus
for the REPORT and COMPUTE windows
n sets defaults for WINDOWS, PROMPT, and COMMAND
PROC REPORT uses the entry REPORT.PROFILE in the catalog that you specify
as your profile. If no such entry exists, or if you do not specify a profile, then
PROC REPORT uses the entry REPORT.PROFILE in SASUSER.PROFILE. If you
have no profile, then PROC REPORT uses default menus and the default
settings of the options.
You create a profile from the PROFILE window while using PROC REPORT in an
interactive report window environment. To create a profile:
n Select OK to exit the PROFILE window. When you exit the window, PROC
REPORT stores the profile in SASUSER.PROFILE.REPORT.PROFILE. Use the
CATALOG procedure or the Explorer window to copy the profile to another
location.
Note: If, after opening the PROFILE window, you decide not to create a
profile, then select CANCEL to close the window.
PROMPT
opens the REPORT window and starts the PROMPT facility. This facility guides
you through creating a new report or adding more data set variables or statistics
to an existing report.
If you start PROC REPORT with prompting, then the first window gives you a
chance to limit the number of observations that are used during prompting.
When you exit the prompter, PROC REPORT removes these limits.
If you omit PROMPT from the PROC REPORT statement, then the procedure
uses the setting in your report profile, if you have one. If you do not have a
report profile, then PROC REPORT does not use the prompt facility. For
information about report profiles, see “PROFILE Window” on page 2246.
Restriction When you use the PROMPT option, you open the REPORT window.
When the REPORT window is open, you cannot send procedure
output to any ODS destination.
Tip You can store a setting of PROMPT in your report profile. PROC
REPORT honors the first of these settings that it finds:
n the PROMPT option in the PROC REPORT statement
n the PROMPT setting in your report profile
PS=page-size
specifies the number of lines in a page of the report.
PROC REPORT honors the first of these page size specifications that it finds:
n the PS= option in the PROC REPORT statement
n the PS= setting in the report definition specified with REPORT= in the PROC
REPORT statement
n the SAS system option PAGESIZE=
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
PSPACE=space-between-panels
specifies the number of blank characters between panels. PROC REPORT
separates all panels in the report by the same number of blank characters. For
each panel, the sum of its width and the number of blank characters separating
it from the panel to its left cannot exceed the line size.
Default 4
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
QMARKERS=number
specifies the default number of markers to use for the P2 estimation method.
The number of markers controls the size of the fixed memory space.
Default The default value depends on which quantiles you request. For the
median (P50), number is 7. For the quartiles (P25 and P75), number is
25. For the quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you
request several quantiles, then PROC REPORT uses the largest default
value of number.
Tip Increase the number of markers above the default settings to improve
the accuracy of the estimates; you can reduce the number of markers to
conserve computing resources.
QMETHOD=OS | P2
specifies the method that PROC REPORT uses to process the input data when it
computes quantiles. If the number of observations is less than or equal to the
value of the QMARKERS= option, and the value of the QNTLDEF= option is 5,
then both methods produce the same results.
OS
uses order statistics. PROC UNIVARIATE uses this technique.
P2
uses the P2 method to approximate the quantile.
Default OS
Restriction When QMETHOD=P2, PROC REPORT does not compute MODE or
weighted quantiles.
QNTLDEF=1 | 2 | 3 | 4 | 5
specifies the mathematical definition that the procedure uses to calculate
quantiles when the value of the QMETHOD= option is OS. When
QMETHOD=P2, you must use QNTLDEF=5.
Alias PCTLDEF=
Default 5
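A minimal sketch that compares the two quantile methods, assuming SASHELP.CLASS. SASHELP.CLASS has only 19 observations, fewer than QMARKERS=51. By the rule stated under QMETHOD=, the P2 step should therefore match OS with QNTLDEF=5, and a larger table would be needed to see the approximation differ.

```sas
/* Order statistics (the default method) with quantile definition 3 */
proc report data=sashelp.class nowd qmethod=os qntldef=3;
   column sex height,(p25 median p75);
   define sex    / group;
   define height / analysis format=6.2;
run;

/* P2 approximation: QNTLDEF= must be 5; more markers than the default
   (25 for quartiles) can improve accuracy at some memory cost. */
proc report data=sashelp.class nowd qmethod=p2 qntldef=5 qmarkers=51;
   column sex height,(p25 median p75);
   define sex    / group;
   define height / analysis format=6.2;
run;
```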
REPORT=libref.catalog.entry
specifies the report definition to use. PROC REPORT stores all report
definitions as entries of type REPT in a SAS catalog.
Interaction If you use REPORT=, then you cannot use the COLUMN statement.
SHOWALL
overrides options in the DEFINE statement that suppress the display of a
column.
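A minimal sketch, assuming SASHELP.CLASS: Weight is marked NOPRINT in its DEFINE statement, but SHOWALL overrides that and displays it.

```sas
proc report data=sashelp.class nowd showall;
   column name age weight;
   define name   / display;
   define age    / display;
   define weight / display noprint;   /* displayed anyway because of SHOWALL */
run;
```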
SPACING=space-between-columns
specifies the number of blank characters between columns. For each column,
the sum of its width and the blank characters between it and the column to its
left cannot exceed the line size.
Default 2
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interactions PROC REPORT separates all columns in the report by the number
of blank characters specified by SPACING= in the PROC REPORT
statement unless you use SPACING= in the DEFINE statement to
change the spacing to the left of a specific item.
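A minimal sketch of report-wide SPACING= combined with a per-column SPACING= in the DEFINE statement, assuming SASHELP.CLASS and LISTING output.

```sas
ods listing;                  /* SPACING= affects LISTING output only */
proc report data=sashelp.class nowd spacing=4;
   column name age height;
   define name   / display;
   define age    / display spacing=10;   /* 10 blanks to the left of Age only */
   define height / display;              /* uses the report-wide value of 4 */
run;
```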
SPANROWS
specifies that when the value of a GROUP or ORDER column is the same in
multiple rows, the value is displayed in a single cell that occupies that column in
all the rows for which the value is the same. A box is essentially created for that
part of the column, and no rows appear in that box.
The SPANROWS option also allows the values of GROUP and ORDER variables to
repeat when the values break across pages. Only the PDF, PS, and
TAGSETS.RTF destinations support this part of the feature.
Notes The SPANROWS option has no effect on the REPORT window, data sets,
LISTING, or OUTPUT destinations.
When the LINE statement occurs at the bottom of a page, the values of
GROUP and ORDER variables do not repeat when the values break across
RTF and TAGSETS.RTF pages.
Tip If a summary row appears in the middle of a set of rows that would
otherwise be spanned by a single cell, the summary row introduces its
own cell in that column. This action breaks the spanning cell into two
cells even when the value of the GROUP or ORDER variable that comes
after the summary row is unchanged.
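A minimal sketch of SPANROWS in PDF output, assuming SASHELP.SHOES and an arbitrary output file name. Each Region value should occupy one cell that spans its product rows.

```sas
ods pdf file='spanrows.pdf';        /* file name is arbitrary */
proc report data=sashelp.shoes nowd spanrows;
   where region in ('Africa' 'Asia');
   column region product sales;
   define region  / group;
   define product / group;
   define sales   / analysis sum format=dollar14.;
run;
ods pdf close;
```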
SPLIT='character'
specifies the split character. PROC REPORT breaks column text when it reaches
that character and continues the text on the next line. The split character itself
is not part of the column heading or text value, although each occurrence of the
split character counts toward the 256-character maximum for a label.
Restriction In ODS destinations other than LISTING, this option works only in
column headings.
Interaction The FLOW option in the DEFINE statement honors the split
character.
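A minimal sketch, assuming SASHELP.CLASS: with SPLIT='*', each asterisk in a column heading starts a new heading line and is not itself displayed.

```sas
proc report data=sashelp.class nowd split='*';
   column name height weight;
   define name   / display 'Student*Name';
   define height / display 'Height*(inches)';
   define weight / display 'Weight*(pounds)';
run;
```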
STYLE<(location(s))>=<style-override(s)>
specifies one or more style overrides to use for different parts of the report.
location(s)
identifies the part of the report that the STYLE= option affects. The valid
and default values for location vary by the statement in which the STYLE=
option appears. Table 58.4 shows the valid and default location values for
each statement and the part of the report that each value affects. To specify
more than one value of location in the same STYLE= option, separate the
values with spaces.
Table 58.4 Locations and Default Style Elements for Each Statement in PROC
REPORT
Table columns: Statement, Valid Location Values, Default Location Value, Part of
Report Affected, Default Style Element
All location values shown in the table can be used in place of location(s) in
the PROC REPORT statement. The DEFINE statement accepts COLUMN and HEADER.
The BREAK and RBREAK statements accept SUMMARY and LINES.
style-override
specifies one or more style attributes or style elements to override the
default style element and attributes in a specific area of a report. You can
specify a style override in two ways:
n Specify a style element. A style element is a collection of style attributes
that apply to a particular part of the output for a SAS program.
n Specify a style attribute. A style attribute is a name-value pair that
describes a single behavioral or visual aspect of a piece of output. This is
the most specific method of changing the appearance of your output.
Note: These style overrides take precedence over those specified in the
PROC statement.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
Note: You can use braces ({ and }) instead of square brackets ([ and ]).
style-element-name
is the name of a style element that is part of an ODS style template. SAS
provides some style templates. Users can create their own style
templates with the TEMPLATE procedure. See SAS Output Delivery
System: Procedures Guide.
See For information about using styles with PROC REPORT, see “Using
ODS Styles with PROC REPORT” on page 2142.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2149.
style-attribute-name
specifies the attribute to change. For a list of the commonly used style
attributes that you can set with the STYLE= option in PROC PRINT,
PROC TABULATE, and PROC REPORT, see Table 58.114 on page 2146.
See For information about using styles with PROC REPORT, see “Using
ODS Styles with PROC REPORT” on page 2142.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2149.
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of
valid values. A SAS format can also be used as an attribute value for
conditional formatting.
See For information about using styles with PROC REPORT, see “Using
ODS Styles with PROC REPORT” on page 2142.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2149.
Restriction All ODS destinations except OUTPUT and LISTING support the
STYLE= option.
See “Style Elements and Style Attributes for Table Regions” on page
2149 for details.
Example “Example 13: Specifying Style Elements for ODS Output in Multiple
Statements” on page 2212
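A minimal sketch of STYLE= overrides at the PROC REPORT and DEFINE levels, assuming SASHELP.CLASS and a destination other than LISTING or OUTPUT (for example, HTML or PDF). The color is given as a SAS RGB value (cxRRGGBB), and the second override uses braces, which work the same as square brackets.

```sas
proc report data=sashelp.class nowd
      style(header)=[backgroundcolor=cxE0E0E0 fontweight=bold]
      style(column)={fontsize=9pt};
   column name age height;
   define name   / display;
   define age    / display style(column)=[just=center];  /* DEFINE-level override for Age */
   define height / display;
run;
```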
THREADS | NOTHREADS
enables or disables parallel processing of the input data set. This option
overrides the SAS system option THREADS | NOTHREADS unless the system
option is restricted. (See Restriction.) See “Support for Parallel Processing” in
SAS Language Reference: Concepts for more information.
Interaction PROC REPORT uses the value of the SAS system option THREADS
except when a BY statement is specified or the value of the SAS
system option CPUCOUNT is less than 2. You can specify the
THREADS option in the PROC REPORT statement to force PROC
REPORT to use parallel processing in these situations.
VARDEF=divisor
specifies the divisor to use in the calculation of the variance and standard
deviation. The following table shows the possible values for divisor and
associated divisors.
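Table 58.5 Possible Values for VARDEF=
Value           Divisor
DF              Degrees of freedom, $n - 1$
N               Number of observations, $n$
WDF             Sum of weights minus one, $\left(\sum_i w_i\right) - 1$
WEIGHT | WGT    Sum of weights, $\sum_i w_i$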
Default DF
Requirement To compute the standard error of the mean and Student's t-test,
use the default value of VARDEF=.
Tip When you use the WEIGHT statement and VARDEF=DF, the
variance is an estimate of $\sigma^2$, where the variance of the $i$th
observation is $\operatorname{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the $i$th
observation. This yields an estimate of the variance of an
observation with unit weight.
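A minimal sketch that compares the default divisor (DF, that is, $n - 1$) with VARDEF=N (divisor $n$), assuming SASHELP.CLASS. The STD and VAR columns in the second report should be slightly smaller.

```sas
proc report data=sashelp.class nowd;              /* default VARDEF=DF */
   column sex height,(n std var);
   define sex    / group;
   define height / analysis format=8.3;
run;

proc report data=sashelp.class nowd vardef=n;     /* divisor is n */
   column sex height,(n std var);
   define sex    / group;
   define height / analysis format=8.3;
run;
```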
WINDOWS | NOWINDOWS
selects an interactive report window or nonwindowing environment.
When you use WINDOWS, SAS opens the REPORT window for the interactive
report interface, which enables you to modify a report repeatedly and to see the
modifications immediately. When you use NOWINDOWS, PROC REPORT runs
without the REPORT window and sends its output to the open output
destinations.
Alias WD | NOWD
Restriction When you use the WINDOWS option, you can send the output only
to a SAS data set or to a Printer destination.
See If you are using the WINDOWS environment, see information about
the report profile in PROFILE= on page 2088.
WRAP
displays one value from each column of the report, on consecutive lines if
necessary, before displaying another value from the first column. By default,
PROC REPORT displays only values for as many columns as it can fit on one
page. It fills a page with values for these columns before starting to display
values for the remaining columns on the next page.
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction When WRAP is in effect, PROC REPORT ignores PAGE in any item
definitions.
Tip Typically, you use WRAP together with the NAMED option to avoid
wrapping column headings.
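A minimal sketch of WRAP with NAMED in LISTING output, assuming SASHELP.CLASS and a narrow line size (LS=64) so that one observation's values might not fit on one line. NAMED writes each value as name=value, so no column headings are needed.

```sas
ods listing;                          /* WRAP affects LISTING output only */
proc report data=sashelp.class nowd wrap named ls=64;
   column name sex age height weight;
   define name   / display;
   define sex    / display;
   define age    / display;
   define height / display;
   define weight / display;
run;
```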
BREAK Statement
Produces a default summary at a break (a change in the value of a group or order variable). The
information in a summary applies to a set of observations. The observations share a unique
combination of values for the break variable and all other group or order variables to the left of the
break variable in the report.
Syntax
BREAK location break-variable </ options>;
Required Arguments
location
controls the placement of the break lines and is either
AFTER
places the break lines immediately after the last row of each set of rows that
have the same value for the break variable.
BEFORE
places the break lines immediately before the first row of each set of rows
that have the same value for the break variable.
break-variable
is a group or order variable. The REPORT procedure writes break lines each time
the value of this variable changes.
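A minimal sketch of BREAK AFTER with SUMMARIZE, assuming SASHELP.SHOES. Each Region gets a subtotal line, and RBREAK adds a grand total at the end of the report.

```sas
proc report data=sashelp.shoes nowd;
   where region in ('Africa' 'Asia');
   column region product sales;
   define region  / group;
   define product / group;
   define sales   / analysis sum format=dollar14.;
   break after region / summarize;    /* subtotal after each Region */
   rbreak after / summarize;          /* grand total at the end */
run;
```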
Optional Arguments
COLOR=color
specifies the color of the break lines in the REPORT window. The default color is
the color of Foreground in the SASCOLOR window. You can use the following
colors:
Table 58.6 Colors Allowed for Break Lines
BLACK, BLUE, BROWN, CYAN, GRAY, GREEN, MAGENTA, ORANGE, PINK, RED, WHITE, YELLOW
Restriction This option affects only output in the interactive report window
environment.
Note Not all operating environments and devices support all colors, and
on some operating systems and devices, one color might map to
another color. For example, if the DEFINITION window displays the
word BROWN in yellow characters, then selecting BROWN results in
a yellow item.
CONTENTS='link-text'
specifies the text for the entries in the table of contents that is created by
default or by option settings in ODS destinations that support the STYLE= option.
If you specify both the PAGE option and the CONTENTS= option with link-text,
then PROC REPORT uses the value of link-text as a link for tables created in
the table of contents.
Interactions If the DEFINE statement has a page option and there is a BREAK
BEFORE statement with a PAGE option and the CONTENTS=
option has a value other than empty quotation marks specified,
then PROC REPORT adds a directory to the table of contents and
For RTF output, the CONTENTS= option has no effect on the RTF
body file unless you turn on the CONTENTS=YES option in the
ODS RTF statement. In that case, a Table of Contents page is
inserted at the front of your RTF output file. Your CONTENTS=
option text from PROC REPORT then shows up in this separate
Table of Contents page.
If there are multiple BREAK BEFORE statements, then the link text
is the concatenation of all of the CONTENTS= values or of all the
default values.
DOL
(for double overlining) uses the 13th formatting character to overline each value
n that appears in the summary line
n that would appear in the summary line if you specified the SUMMARIZE
option
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the OL and DOL options, then PROC REPORT
honors only OL.
DUL
(for double underlining) uses the 13th formatting character to underline each
value
n that appears in the summary line
n that would appear in the summary line if you specified the SUMMARIZE
option
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the UL and DUL options, then PROC REPORT
honors only UL.
OL
(for overlining) uses the second formatting character to overline each value
n that appears in the summary line
n that would appear in the summary line if you specified the SUMMARIZE
option
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the OL and DOL options, then PROC REPORT
honors only OL.
PAGE
in LISTING output, starts a new page. In the ODS destinations that support the
STYLE= option, the PAGE option starts a new table. All ODS destinations
except OUTPUT and LISTING support the STYLE= option.
Interaction If you use PAGE in the BREAK statement and you create a break at
the end of the report, then the summary for the whole report
appears on a separate page.
SKIP
writes a blank line for the last break line.
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
STYLE<(location(s))>=<style-override(s)>
specifies the style override to use for default summary lines that are created
with the BREAK statement.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
Restriction All ODS destinations except OUTPUT and LISTING support the
STYLE= option.
See “Style Elements and Style Attributes for Table Regions” on page
2149.
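A minimal sketch of a STYLE= override on the summary lines of a BREAK statement, assuming SASHELP.SHOES and a destination other than LISTING or OUTPUT. SUMMARY is one of the location values that the BREAK statement accepts.

```sas
proc report data=sashelp.shoes nowd;
   where region in ('Africa' 'Asia');
   column region product sales;
   define region  / group;
   define product / group;
   define sales   / analysis sum format=dollar14.;
   break after region / summarize
         style(summary)=[fontweight=bold backgroundcolor=cxE0E0E0];
run;
```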
SUMMARIZE
writes a summary line in each group of break lines. A summary line for a set of
observations contains values for the following:
n the break variable (which you can suppress with the SUPPRESS option)
n statistics
n analysis variables
n computed variables
The following table shows how PROC REPORT calculates the value for each
type of report item in a summary line that is created by the BREAK statement:
If you reference a variable with a missing value in a customized summary line,
then PROC REPORT displays that variable as a blank (for character variables) or a
period (for numeric variables).
Note: PROC REPORT cannot create groups in a report that contains order or
display variables.
SUPPRESS
suppresses printing of
n the value of the break variable in the summary line
n any underlining and overlining in the break lines in the column that contains
the break variable
Interaction If you use SUPPRESS, then the value of the break variable is
unavailable for use in customized break lines unless you assign a
value to it in the compute block that is associated with the
break. (See “COMPUTE Statement” on page 2112.)
UL
(for underlining) uses the second formatting character to underline each value
n that appears in the summary line
n that would appear in the summary line if you specified the SUMMARIZE
option
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the UL and DUL options, then PROC REPORT
honors only UL.
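A minimal sketch of the LISTING-only break options, assuming SASHELP.SHOES. OL overlines each subtotal and SKIP adds a blank line after each break, and DOL double-overlines the grand total.

```sas
ods listing;             /* OL, UL, DOL, DUL, and SKIP affect LISTING output only */
proc report data=sashelp.shoes nowd;
   where region in ('Africa' 'Asia');
   column region product sales;
   define region  / group;
   define product / group;
   define sales   / analysis sum format=dollar14.;
   break after region / summarize ol skip;
   rbreak after / summarize dol;
run;
```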
Details
Note: If you define a customized summary for the break, then customized break
lines appear after underlining or double underlining. For more information about
customized break lines, see “COMPUTE Statement” on page 2112 and “LINE
Statement” on page 2132.
BY Statement
Creates a separate report on a separate page for each BY group.
Restriction: If you use the BY statement, then you must use the PROC REPORT statement in
the nonwindowing environment (NOWINDOWS or NOWD option).
Interaction: If you use the RBREAK statement in a report that uses BY processing, then PROC
REPORT creates a default summary for each BY group. In this case, you cannot
summarize information for the whole report.
Tip: Using the BY statement does not make the FIRST. and LAST. variables available in
compute blocks.
See: “BY” on page 74
Syntax
BY <DESCENDING> variable-1
<<DESCENDING> variable-2 …> <NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, then the observations in the data set either must be sorted by all
the variables that you specify or must be indexed appropriately. Variables in a
BY statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. For example, the data might be grouped in chronological order.
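A minimal sketch of BY processing, assuming SASHELP.CLASS and a hypothetical sorted copy WORK.CLASS_SORTED. The data are sorted first because NOTSORTED is not used, and the result is one report per value of Sex.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

proc report data=work.class_sorted nowd;
   by sex;
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;
```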
nonwindowing environment. In fact, URL, URLBP, and URLP are effective only in the nonwindowing
environment. The STYLE= and URL attributes are effective only when you are using ODS to create
output.
compute weight;
   if weight > 100.0 then
      call define(_row_, "style/merge", "style={font_weight=bold}");
endcomp;
Syntax
CALL DEFINE (column-id | _ROW_, 'attribute-name', value);
Required Arguments
column-id
specifies a column name or a column number (that is, the position of the column
from the left edge of the report). A column ID can be one of the following:
n a character literal (in quotation marks) that is the column name
n the automatic variable _COL_, which identifies the column that contains the
report item that the compute block is attached to
_ROW_
is an automatic variable that indicates the entire current row.
attribute-name
is the attribute to define. For attribute names, refer to Table 58.7 on page 2107.
Note: The attributes BLINK, HIGHLIGHT, and RVSVIDEO do not work on all
devices.
value
sets the value for the attribute. For values for each attribute, refer to Table
58.7 on page 2107.
Table 58.7 Attribute Descriptions
BLINK
   Description: Controls blinking of the current value.
   Values: 1 turns blinking on; 0 turns it off.
   Where effective: interactive report window environment
COLOR
   Description: Controls the color of the current value in the REPORT window.
   Values: 'blue', 'red', 'pink', 'green', 'cyan', 'yellow', 'white', 'orange',
   'black', 'magenta', 'gray', 'brown'
   Where effective: interactive report window environment
COMMAND
   Description: Specifies that a series of commands follows.
   Values: a quoted string of SAS commands to submit to the command line
   Where effective: interactive report window environment
FORMAT
   Description: Specifies a format for the column.
   Values: a SAS format or a user-defined format
   Where effective: interactive report window and nonwindowing environments
HIGHLIGHT
   Description: Controls highlighting of the current value.
   Values: 1 turns highlighting on; 0 turns it off.
   Where effective: interactive report window environment
RVSVIDEO
   Description: Controls display of the current value in reverse video.
   Values: 1 turns reverse video on; 0 turns it off.
   Where effective: interactive report window environment
STYLE
   Description: Specifies the style override of the column or row.
   Values: an ODS style attribute
   Where effective: all ODS destinations
STYLE/MERGE
   Description: Merges the specified style with the existing styles in the same
   row or column.
   Values: an ODS style attribute
   Where effective: all ODS destinations
STYLE/REPLACE
   Description: Replaces the existing style in the row or column.
   Values: an ODS style attribute
   Where effective: all ODS destinations
URL
   Description: Makes the contents of each cell of the column a link to the
   specified Uniform Resource Locator (URL).
   Values: a quoted URL (either single or double quotation marks can be used)
   Where effective: ODS HTML, HTML5, RTF, PDF, PowerPoint, and EPUB destinations
URLBP
   Description: Makes the contents of each cell of the column a link. The link
   points to a Uniform Resource Locator that is a concatenation of
      1 the string that is specified by the BASE= option in the ODS HTML statement
      2 the string that is specified by the PATH= option in the ODS HTML statement
      3 the value of the URLBP attribute (see note 1)
   Values: a quoted URL (either single or double quotation marks can be used)
   Where effective: ODS HTML and HTML5 destinations
URLP
   Description: Makes the contents of each cell of the column a link. The link
   points to a Uniform Resource Locator that is a concatenation of
      1 the string that is specified by the PATH= option in the ODS HTML statement
      2 the value of the URLP attribute
   Values: a quoted URL (either single or double quotation marks can be used)
   Where effective: ODS HTML and HTML5 destinations
Note 1: For information about the BASE= and PATH= options, see the documentation
for the ODS HTML Statement in SAS Output Delivery System: User’s Guide.
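A minimal sketch of the FORMAT and URL attributes, assuming SASHELP.CLASS, ODS HTML output, and hypothetical per-student pages named like Alfred.html that are not part of SAS. Name and Height are DISPLAY variables, so each compute block refers to them by name.

```sas
ods html file='class.html';
proc report data=sashelp.class nowd;
   column name height;
   define name   / display;
   define height / display;
   compute name;
      /* Link each name to a hypothetical page such as Alfred.html */
      call define(_col_, 'url', cats(name, '.html'));
   endcomp;
   compute height;
      /* Show two decimal places only for heights of 65 inches or more */
      if height >= 65 then call define(_col_, 'format', '8.2');
   endcomp;
run;
ods html close;
```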
Details
The STYLE= value functions like the STYLE= option in other statements in PROC
REPORT. However, instead of acting as an option in a statement, it becomes the
value for the STYLE attribute. For example, the following CALL DEFINE statement
sets the background color to yellow and the font size to 7 for the specified column:
call define(_col_, "style",
"style=[backgroundcolor=yellow fontsize=7]");
For information about style precedence, see “Order of Precedence When Applying
Style Attributes to Data Cells” on page 2151.
Restriction: All ODS destinations except OUTPUT and LISTING support the
STYLE= option.
Interaction: If you set a style override for the CALLDEF location in the PROC
REPORT statement and you want to use that exact style override in a CALL
DEFINE statement, use an empty string as the value for the STYLE attribute, as
shown here:
call define (_col_, "STYLE", "" );
Tip: FONT names that contain characters other than letters or underscores must be
enclosed in quotation marks.
Featured in: “Example 13: Specifying Style Elements for ODS Output in Multiple
Statements” on page 2212
The STYLE and STYLE/REPLACE attributes specify the style override to use for
ODS output. If a style already exists for a cell or row, these attributes tell CALL
DEFINE to replace it with the style specified in the STYLE= value. For an example
program, see “Example 16: Using STYLE/REPLACE in PROC REPORT CALL DEFINE
Statement” on page 2225.
The STYLE/MERGE attribute tells CALL DEFINE to merge the style specified by
the STYLE= value with the existing style attributes that are in the same cell or row.
If there is no previously existing STYLE= value to merge, STYLE/MERGE acts the
same as the STYLE or STYLE/REPLACE attributes. For an example program, see
“Example 15: Using STYLE/MERGE in PROC REPORT CALL DEFINE Statement” on
page 2223.
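A minimal sketch that contrasts STYLE with STYLE/MERGE, assuming SASHELP.CLASS and a destination other than LISTING or OUTPUT. The first CALL DEFINE shades the whole row. The second is intended to add bold to the Weight cell while keeping that shading, as described for STYLE/MERGE. Example 15 documents the exact behavior.

```sas
proc report data=sashelp.class nowd;
   column name age height weight;
   define name   / display;
   define age    / display;
   define height / display;
   define weight / display;
   compute weight;
      /* Shade the row for students aged 14 or older */
      if age >= 14 then
         call define(_row_, 'style', 'style=[backgroundcolor=cxE0E0E0]');
      /* Bold heavy weights, merging with any existing style in the cell */
      if weight > 100 then
         call define(_col_, 'style/merge', 'style=[fontweight=bold]');
   endcomp;
run;
```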
COLUMN Statement
Describes the arrangement of all columns and of headings that span more than one column.
Restriction: You cannot use the COLUMN statement if you use REPORT= in the PROC REPORT
statement.
Examples: “Example 1: Selecting Variables and Creating a Summary Line for a Report” on page
2171
“Example 3: Using Aliases to Obtain Multiple Statistics for the Same Variable” on
page 2177
“Example 6: Creating a Column for Each Value of a Variable” on page 2187
Syntax
COLUMN column-specification(s);
Required Argument
column-specification(s)
is one or more of the following:
n report-item(s)
n report-item-1, report-item-2 <, … report-item-n>
n ('header-1' <… 'header-n'> report-item(s))
n report-item=name
report-item(s)
identifies items that each form a column in the report.
If you stack a statistic with an analysis variable, then the statistic that you
name in the column statement overrides the statistic in the definition of the
analysis variable. For example, the following PROC REPORT step produces a
report that contains the minimum value of Sales for each sector:
proc report data=grocery;
column sector sales,min;
define sector/group;
define sales/analysis sum;
run;
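The GROCERY data set above is not shipped with SAS. A similar sketch that can be run against the SASHELP.SHOES sample table stacks two statistics under Sales. The MIN and MAX in the COLUMN statement override the SUM in the DEFINE statement.

```sas
proc report data=sashelp.shoes nowd;
   column region sales,(min max);
   define region / group;
   define sales  / analysis sum format=dollar12.;   /* SUM is overridden */
run;
```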
If you stack a display variable under an across variable, then all the values of
that display variable appear in the report.
Interaction A series of stacked report items can include only one analysis
variable or statistic. If you include more than one analysis
variable or statistic, then PROC REPORT returns an error
because it cannot determine which values to put in the cells of
the report.
Tip You can use parentheses to group report items whose headings
should appear at the same level rather than stacked one above
the other.
header
is a string of characters that spans one or more columns in the report.
PROC REPORT prints each heading on a separate line. You can use split
characters in a heading to split one heading over multiple lines. See the
discussion of SPLIT= on page 2092.
In LISTING output, if the first and last characters of a heading are one of
the following characters, then PROC REPORT uses that character to
expand the heading to fill the space over the column or columns:
- = . _ * + <> ><
Note that < and > must be used as a pair, as <> or ><.
Similarly, if the first character of a heading is < and the last character is >,
or vice versa, then PROC REPORT expands the heading to fill the space
over the column by repeating the first character before the text of the
heading and the last character after it.
report-item(s)
specifies the columns to span.
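A minimal sketch of a spanning heading, assuming SASHELP.CLASS. In LISTING output, the leading and trailing hyphens expand the heading to fill the space over Height and Weight.

```sas
proc report data=sashelp.class nowd;
   column name ('-Body Measurements-' height weight);
   define name   / display;
   define height / display;
   define weight / display;
run;
```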
report-item=name
specifies an alias for a report item. You can use the same report item more
than once in a COLUMN statement. However, you can use only one DEFINE
statement for any given name. (The DEFINE statement designates
characteristics such as formats and customized column headings. If you omit
a DEFINE statement for an item, then the REPORT procedure uses defaults.)
Assigning an alias in the COLUMN statement does not by itself alter the
report. However, it does enable you to use separate DEFINE statements for
each occurrence of a variable or statistic.
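As a sketch of how aliases let one variable appear twice with different statistics, the step below uses WEIGHT twice from SASHELP.CLASS. The alias names WMEAN and WMAX are illustrative choices, not required names.

```sas
/* Two aliases of WEIGHT, each with its own DEFINE statement.  */
/* WMEAN and WMAX are hypothetical alias names.                */
proc report data=sashelp.class nowd;
   column sex weight=wmean weight=wmax;
   define sex   / group;
   define wmean / analysis mean format=6.1 'Mean Weight';
   define wmax  / analysis max  format=6.1 'Max Weight';
run;
```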
Note You cannot always use an alias. When you refer in a compute block
to a report item that has an alias, you must use the alias. However,
if the report item shares a column with an across variable, then you
must reference the column by column number. (See “Four Ways to
Reference Report Items in a Compute Block” on page 2063.)
COMPUTE Statement
Starts a compute block containing one or more programming statements that PROC REPORT
executes as it builds the report.
Restriction: If you are sending a report to multiple ODS destinations or to an ODS document
that is replayed later, avoid the use of non-deterministic functions in a COMPUTE
block (for example, LAG, DIF, RANUNI, DATETIME, and so on). If you need to use
data created by such functions in your report, call the functions in a DATA step and
store the results in the data set before running PROC REPORT.
Interaction: An ENDCOMP statement must mark the end of the group of statements in the
compute block.
Note: A compute block can be associated with a report item or with a location (at the top
or bottom of a report; at the top or bottom of a page; before or after a set of
observations). You create a compute block with the COMPUTE window or with the
COMPUTE statement. One form of the COMPUTE statement associates the
compute block with a report item. Another form associates the compute block with
a location. For a list of the SAS language elements that you can use in compute
blocks, see “The Contents of Compute Blocks” on page 2062.
Tip: For information about how to use compute blocks, see The REPORT Procedure: A
Primer for the Compute Block
Examples: “Example 2: Ordering the Rows in a Report” on page 2174
“Example 3: Using Aliases to Obtain Multiple Statistics for the Same Variable” on
page 2177
“Example 5: Consolidating Multiple Observations into One Row of a Report” on
page 2183
“Example 6: Creating a Column for Each Value of a Variable” on page 2187
“Example 7: Writing a Customized Summary on Each Page” on page 2190
Syntax
COMPUTE location <target>
</ STYLE=<style-override(s) > >;
LINE specification(s);
. . . select SAS language elements . . .
ENDCOMP;
COMPUTE report-item </ type-specification>;
CALL DEFINE (column-id, 'attribute-name', value);
. . . select SAS language elements . . .
ENDCOMP;
Required Arguments
You must specify either a location or a report item in the COMPUTE statement.
location
determines where the compute block executes in relation to target.
AFTER
executes the compute block at a break in one of the following places:
• immediately after the last row of a set of rows that have the same value
for the variable that you specify as target or, if there is a default summary
on that variable, immediately after the creation of the preliminary
summary line. (See “Results: REPORT Procedure” on page 2158.)
• in LISTING output, near the bottom of each page, immediately before any
footnotes, if you specify _PAGE_ as target.
• at the end of the report if you omit a target.
BEFORE
executes the compute block at a break in one of the following places:
• immediately before the first row of a set of rows that have the same
value for the variable that you specify as target or, if there is a default
summary on that variable, immediately after the creation of the
preliminary summary line. (See “Results: REPORT Procedure” on page
2158.)
• in LISTING output, near the top of each page, between any titles and the
column headings, if you specify _PAGE_ as target.
• immediately before the first detail row if you omit a target.
2114 Chapter 58 / REPORT Procedure
Note If a report contains more columns than fit on a printed page, PROC
REPORT generates an additional page or pages to contain the
remaining columns. In this case, when you specify _PAGE_ as target,
the COMPUTE block does NOT re-execute for each of these
additional pages; the COMPUTE block re-executes only after all
columns have been printed.
Examples “Example 3: Using Aliases to Obtain Multiple Statistics for the Same
Variable” on page 2177
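The sketch below ties a compute block to a location: it runs after each set of rows that share a value of SEX and writes a customized line. It assumes SASHELP.CLASS; the wording of the LINE text is illustrative.

```sas
/* Compute block associated with a location (AFTER a break). */
proc report data=sashelp.class nowd;
   column sex name height;
   define sex    / order;
   define name   / display;
   define height / analysis mean format=5.1;
   compute after sex;
      /* At this break, HEIGHT.MEAN holds the group's mean height */
      line 'Mean height for this group: ' height.mean 5.1;
   endcomp;
run;
```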
report-item
specifies a data set variable, a computed variable, or a statistic to associate the
compute block with. If you are working in the nonwindowing environment, then
you must include the report item in the COLUMN statement. If the item is a
computed variable, then you must include a DEFINE statement for it.
Optional Arguments
STYLE<(location(s))>=<style-override(s)>
specifies the style to use for the text that is created by any LINE statements in
this compute block.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
Restriction All ODS destinations except OUTPUT and LISTING support the
STYLE= option.
See “Style Elements and Style Attributes for Table Regions” on page
2149
Example “Example 13: Specifying Style Elements for ODS Output in Multiple
Statements” on page 2212
target
controls when the compute block executes. If you specify a location (BEFORE
or AFTER) for the COMPUTE statement, then you can also specify target, which
can be one of the following:
break-variable
is a group or order variable.
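When you specify a break variable, PROC REPORT executes the statements
in the compute block each time the value of the break variable changes.
_PAGE_ <justification>
executes the compute block near the top (BEFORE) or bottom (AFTER) of each
page. The optional justification controls the placement of each line that the
compute block writes and can be one of the following: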
CENTER
centers each line that the compute block writes.
LEFT
left-justifies each line that the compute block writes.
RIGHT
right-justifies each line that the compute block writes.
Default CENTER
type-specification
specifies the type and, optionally, the length of report-item. If the
report item that is associated with a compute block is a computed variable, then
PROC REPORT assumes that it is a numeric variable unless you use a type
specification to specify that it is a character variable. A type specification has
the form
CHARACTER <LENGTH=length>
where
CHARACTER
specifies that the computed variable is a character variable. If you do not
specify a length, then the variable's length is 8.
Alias CHAR
LENGTH=length
specifies the length of a computed character variable.
Default 8
Range 1 to 200
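A minimal sketch of a computed character variable, assuming SASHELP.CLASS. AGEGROUP and the age cutoff of 13 are hypothetical choices for illustration. Without the CHARACTER type specification, PROC REPORT would treat AGEGROUP as numeric.

```sas
/* AGEGROUP is a hypothetical computed character variable.      */
/* LENGTH=5 is long enough for both 'Child' and 'Teen'.         */
proc report data=sashelp.class nowd;
   column name age agegroup;
   define name     / display;
   define age      / display;
   define agegroup / computed 'Age Group';
   compute agegroup / character length=5;
      if age < 13 then agegroup = 'Child';
      else agegroup = 'Teen';
   endcomp;
run;
```

AGE appears to the left of AGEGROUP in the COLUMN statement, so its value is available when the compute block for AGEGROUP executes.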
DEFINE Statement
Describes how to use and display a report item.
Restriction: A weight cannot be applied to a report-item alias without also applying it to the
report-item. The WEIGHT= option must appear in the DEFINE statement for the
report-item.
Accessibility note: When the DEFINE statement includes an ORDER or GROUP option, the
SPANROWS option must also be included in the PROC REPORT statement to
generate an accessible table.
Tip: If you do not use a DEFINE statement, then PROC REPORT uses default
characteristics.
Examples: “Example 2: Ordering the Rows in a Report” on page 2174
“Example 3: Using Aliases to Obtain Multiple Statistics for the Same Variable” on
page 2177
“Example 4: Displaying Multiple Statistics for One Variable” on page 2181
“Example 5: Consolidating Multiple Observations into One Row of a Report” on
page 2183
“Example 6: Creating a Column for Each Value of a Variable” on page 2187
“Example 7: Writing a Customized Summary on Each Page” on page 2190
“Example 8: Displaying a Calculated Percentage Column in a Report” on page 2195
“Example 10: Creating an Output Data Set and Storing Computed Variables” on
page 2202
“Example 11: Using a Format to Create Groups” on page 2206
“Example 12: Using Multilabel Formats” on page 2209
“Example 13: Specifying Style Elements for ODS Output in Multiple Statements” on
page 2212
“Example 14: Using the CELLWIDTH= Style Attribute with PROC REPORT” on page
2220
Syntax
DEFINE report-item / <options>;
STYLE<(location(s))>=<style-override(s)>
specifies a style element (for the Output Delivery System) for the report
item.
WEIGHT=weight-variable
specifies a numeric variable whose values weight the value of the
analysis variable.
WIDTH=column-width
defines the width of the column in which PROC REPORT displays the
report item.
Required Argument
report-item
specifies the name or alias (established in the COLUMN statement) of the data
set variable, computed variable, or statistic to define. The following are types of
names that can be used for report-item:
• a name literal
• a statistic
Notes The names in variable range lists refer to variables in the input data set,
not to statistic names or computed variable names. Use only one name for
each DEFINE statement. That one name, however, can be a range list.
Example syntax using a variable range list is: DEFINE Var1-Var3 /
width=10 center "#Visit#Date";
Optional Arguments
ACROSS
defines report-item, which must be a data set variable, as an across variable.
(See “Across Variables” on page 2058.)
ANALYSIS
defines report-item, which must be a data set variable, as an analysis variable.
(See “Analysis Variables” on page 2057.)
By default, PROC REPORT calculates the Sum statistic for an analysis variable.
Specify an alternate statistic with the statistic option in the DEFINE statement.
Note: Special missing values show up as missing values when they are defined
as ANALYSIS variables.
CENTER
centers the formatted values of the report item within the column width and
centers the column heading over the values. This option has no effect on the
CENTER option in the PROC REPORT statement, which centers the report on
the page.
Restriction This option affects the header and the data of LISTING output. In
ODS output, only the data is affected by this option.
COLOR=color
specifies the color in the REPORT window of the column heading and of the
values of the item that you are defining. You can use the following colors:
BLACK, BLUE, BROWN, CYAN, GRAY, GREEN, MAGENTA, ORANGE, PINK, RED, WHITE,
YELLOW
Note: Not all operating environments and devices support all colors, and in
some operating environments and devices, one color might map to another
color. For example, if the DEFINITION window displays the word BROWN in
yellow characters, then selecting BROWN results in a yellow item.
column-header
defines the column heading for the report item. Enclose each heading in single
or double quotation marks. When you specify multiple column headings, PROC
REPORT uses a separate line for each one. The split character also splits a
column heading over multiple lines.
In LISTING output, if the first and last characters of a heading are one of the
following characters, then PROC REPORT uses that character to expand the
heading to fill the space over the column: - = _ . * +
Similarly, if the first character of a heading is < and the last character is >, or vice
versa, then PROC REPORT expands the heading to fill the space over the
column by repeating the first character before the text of the heading and the
last character after it.
Tips If you want to use names when labels exist, then submit the following
SAS statement before invoking PROC REPORT: options nolabel;
Examples “Example 3: Using Aliases to Obtain Multiple Statistics for the Same
Variable” on page 2177
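A short sketch of column headings that use a split character, assuming SASHELP.CLASS (where HEIGHT is recorded in inches and WEIGHT in pounds). SPLIT='*' is chosen here so that the asterisk breaks each heading onto a second line.

```sas
/* Explicit column headings, split over two lines by SPLIT='*'. */
proc report data=sashelp.class nowd split='*';
   column name height weight;
   define name   / display 'Student';
   define height / display 'Height*(inches)';
   define weight / display 'Weight*(pounds)';
run;
```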
COMPUTED
defines the specified item as a computed variable. Computed variables are
variables that you define for the report. They are not in the input data set, and
PROC REPORT does not add them to the input data set.
To create a computed variable, include it in the COLUMN statement, define it
with the COMPUTED option, and compute its value in a compute block
associated with the variable.
CONTENTS='link-text'
specifies the text for the entries in the table of contents created by default or by
options settings in ODS destinations that support the STYLE= option. If the
DEFINE statement has the PAGE= option and the CONTENTS= option specified
with a link-text value assigned, then PROC REPORT adds a directory to the
table of contents and uses the value of link-text as a link for tables created in
the table of contents.
Default If the DEFINE statement has a PAGE option, but does not have a
CONTENTS= option specified, then a directory is created with the
directory text as COLA-COLB. COLA is the name or alias of the
leftmost column and COLB is the name or alias of the rightmost
column. If the table has only one column, then the directory text is
the column name or alias.
Interactions If the DEFINE statement has a PAGE option and there is a BREAK
BEFORE statement with a PAGE option and the CONTENTS=
option specified has a value other than empty quotation marks,
then PROC REPORT adds a directory to the table of contents and
puts links to the tables in that directory.
If there are multiple BREAK BEFORE statements, then the link text
is the concatenation of all of the CONTENTS= values or of all the
default values.
DESCENDING
reverses the order in which PROC REPORT displays rows or values of a group,
order, or across variable.
Tip By default, PROC REPORT orders group, order, and across variables by
their formatted values. Use the ORDER= option in the DEFINE statement
to specify an alternate sort order.
DISPLAY
defines report-item, which must be a data set variable, as a display variable.
(See “Display Variables” on page 2055.)
EXCLUSIVE
excludes from the report and the output data set all combinations of the group
variables and the across variables that are not found in the preloaded range of
user-defined formats.
FLOW
wraps the value of a character variable in its column. The FLOW option honors
the split character. If the text contains no split character, then PROC REPORT
tries to split text at a blank.
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
FORMAT=format
assigns a SAS or user-defined format to the item. This format applies to report-
item as PROC REPORT displays it; the format does not alter the format
associated with a variable in the data set. For data set variables, PROC REPORT
honors the first of these formats that it finds:
• the format that is assigned with FORMAT= in the DEFINE statement
• the format that is assigned in a FORMAT statement when you invoke PROC
REPORT
• the format that is associated with the variable in the data set
If none of these formats is present, then PROC REPORT uses BESTw. for
numeric variables and $w. for character variables. The value of w is the default
column width. For character variables in the input data set, the default column
width is the variable's length. For numeric variables in the input data set and for
computed variables (both numeric and character), the default column width is
the value specified by COLWIDTH= in the PROC REPORT statement or in the
ROPTIONS window.
In the interactive report window environment, if you are unsure what format to
use, then type a question mark (?) in the format field in the DEFINITION window
to access the FORMATS window.
Alias F=
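The following sketch illustrates the format precedence described above, assuming SASHELP.CLASS. The formats 8.3 and 5.1 are arbitrary choices for illustration.

```sas
/* Precedence for data set variables:                           */
/*   1. FORMAT= in DEFINE   2. FORMAT statement   3. data set   */
proc report data=sashelp.class nowd;
   column name height weight;
   format height weight 8.3;            /* in effect only for WEIGHT */
   define name   / display;
   define height / display format=5.1;  /* overrides the 8.3 above   */
   define weight / display;
run;
```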
GROUP
defines report-item, which must be a data set variable, as a group variable. (See
“Group Variables” on page 2056.)
ID
specifies that the item that you are defining is an ID variable. An ID variable and
all columns to its left appear at the left of every page of a report. ID ensures
that you can identify each row of the report when the report contains more
columns than fit on one page.
LEFT
left-justifies the formatted values of the report item within the column width
and left-justifies the column headings over the values. If the format width is the
same as the width of the column, then the LEFT option has no effect on the
placement of values.
Restriction This option affects the header and the data of LISTING output. In
ODS output, only the data is affected by this option.
MISSING
considers missing values as valid values for the report item. Special missing
values that represent numeric values (the letters A through Z and the
underscore (_) character) are each considered as a separate value.
Default If you omit the MISSING option, then PROC REPORT excludes from the
report and the output data sets all observations that have a missing
value for any group, order, or across variable.
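A minimal sketch of the MISSING option, using a small hypothetical data set (REGSALES) in which one observation has a missing REGION. With MISSING, the blank region forms its own group; without it, that observation is left out of the report.

```sas
/* Hypothetical data set with one missing REGION value */
data regsales;
   input region $ amount;
   datalines;
East 10
. 5
West 7
;
run;

/* MISSING keeps the observation with AMOUNT=5 as its own group */
proc report data=regsales nowd;
   column region amount;
   define region / group missing;
   define amount / analysis sum;
run;
```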
MLF
enables PROC REPORT to use the format label or labels for a given range or for
overlapping ranges to create subgroup combinations that use multilabel
formatting. These multilabel formats are used only with group and across
variables.
MLF is supported on all ODS destinations, the LISTING destination, data sets,
and the REPORT WINDOW.
Requirement Use PROC FORMAT and the MULTILABEL option in the VALUE
statement to create a multilabel format.
Tips The MLF option has no effect unless the variable is associated
with a multilabel format. If there is no MULTILABEL format
associated with the column, then an additional FORMAT
statement or FORMAT= option in the DEFINE statement is needed
to associate an existing format or informat with one or more
variables. If MULTILABEL format is already associated with the
column (using any regular method to associate a format with the
variable), then no additional FORMAT statement or FORMAT=
option is needed.
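A sketch of multilabel formatting with overlapping ranges, assuming SASHELP.CLASS (ages 11 through 16). The format name AGERANGE and its labels are hypothetical. Because the format is created with MULTILABEL and associated with AGE through FORMAT=, MLF lets an observation contribute to more than one group.

```sas
/* Hypothetical multilabel format with an overlapping range */
proc format;
   value agerange (multilabel)
      11-12 = '11 to 12'
      13-14 = '13 to 14'
      15-16 = '15 to 16'
      11-16 = 'All ages';
run;

proc report data=sashelp.class nowd;
   column age n height;
   define age    / group mlf format=agerange. 'Age Range';
   define n      / 'Count';
   define height / analysis mean format=6.1 'Mean Height';
run;
```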
NOPRINT
suppresses the display of the report item. Use this option
• if you do not want to show the item in the report but you need to use its
values to calculate other values that you use in the report.
• to establish the order of rows in the report.
• if you do not want to use the item as a column but want to have access to its
values in summaries. (See “Example 7: Writing a Customized Summary on
Each Page” on page 2190.)
Interactions Even though the columns that you define with NOPRINT do not
appear in the report, you must count them when you are referencing
columns by number. (See “Four Ways to Reference Report Items in
a Compute Block” on page 2063.)
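The sketch below uses a NOPRINT order variable to set the row order and shows that the hidden column still counts when you reference columns by number. It assumes SASHELP.CLASS; the 65-inch threshold and bold styling are illustrative.

```sas
proc report data=sashelp.class nowd;
   column age name height;
   define age    / order noprint;  /* column 1: hidden, but orders rows */
   define name   / display;        /* column 2                          */
   define height / display;        /* column 3                          */
   compute height;
      /* The hidden AGE column still counts, so HEIGHT is _C3_ */
      if height > 65 then
         call define('_c3_', 'style', 'style=[font_weight=bold]');
   endcomp;
run;
```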
NOZERO
suppresses the display of the report item if its values are all zero or missing.
Interactions Even though the columns that you define with NOZERO do not
appear in the report, you must count them when you are referencing
columns by number. (See “Four Ways to Reference Report Items in
a Compute Block” on page 2063.)
ORDER
defines report-item, which must be a data set variable, as an order variable. (See
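“Order Variables” on page 2056.)
ORDER=DATA | FORMATTED | FREQ | INTERNAL
orders the values of a group, order, or across variable according to the
specified order, where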
DATA
orders values according to their order in the input data set.
Note If you specify the ORDER=DATA option for input data in a DBMS
table, the order of rows written to a database table from PROC
REPORT is not likely to be preserved.
FORMATTED
orders values by their formatted (external) values. If no format has been
assigned to a class variable, then the default format, BEST12., is used.
FREQ
orders values by ascending frequency count.
INTERNAL
orders values by their unformatted values. This order depends on your
operating environment. This sort sequence is particularly useful for
displaying dates chronologically.
Default FORMATTED
Interaction DESCENDING in the item's definition reverses the sort sequence for
an item. By default, the order is ascending.
Note The default value for the ORDER= option in PROC REPORT is not
the same as the default value in other SAS procedures. In other SAS
procedures, the default is ORDER=INTERNAL. The default for the
option in PROC REPORT might change in a future release to be
consistent with other procedures. Therefore, in production jobs
where it is important to order report items by their formatted values,
specify ORDER=FORMATTED even though it is currently the
default. Doing so ensures that PROC REPORT continues to produce
the reports that you expect even if the default changes.
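Following the advice in the Note above, this sketch spells out ORDER=FORMATTED explicitly and combines ORDER=INTERNAL with DESCENDING. It assumes SASHELP.CLASS.

```sas
proc report data=sashelp.class nowd;
   column sex age name;
   /* Explicit, even though FORMATTED is currently the default */
   define sex  / order order=formatted;
   /* Unformatted values, with DESCENDING reversing the order  */
   define age  / order order=internal descending;
   define name / display;
run;
```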
PAGE
inserts a page break just before printing the first column containing values of
the report item.
Interaction PAGE is ignored if you use WRAP in the PROC REPORT statement
or in the ROPTIONS window.
PRELOADFMT
specifies that the format is preloaded for the variable.
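A sketch of PRELOADFMT, assuming SASHELP.CLASS and a hypothetical format $SEXLBL that contains a value (U) that never occurs in the data. PRELOADFMT is typically paired with EXCLUSIVE (described earlier) or with the COMPLETEROWS option in the PROC REPORT statement; COMPLETEROWS is used here so that every preloaded format value is requested as a row.

```sas
/* Hypothetical format with a value that is absent from the data */
proc format;
   value $sexlbl 'F' = 'Female'
                 'M' = 'Male'
                 'U' = 'Unknown';
run;

/* COMPLETEROWS requests a row for each preloaded format value */
proc report data=sashelp.class nowd completerows;
   column sex n;
   define sex / group format=$sexlbl. preloadfmt 'Sex';
   define n   / 'Count';
run;
```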
RIGHT
right-justifies the formatted values of the specified item within the column
width and right-justifies the column headings over the values. If the format
width is the same as the width of the column, then RIGHT has no effect on the
placement of values.
Restriction This option affects the header and the data of LISTING output. In
ODS output, only the data is affected by this option.
SPACING=horizontal-positions
defines the number of blank characters to leave between the column being
defined and the column immediately to its left. For each column, the sum of its
width and the blank characters between it and the column to its left cannot
exceed the line size.
Default 2
Restriction This option has no effect on ODS destinations other than the
LISTING destination.
statistic
associates a statistic with an analysis variable. You must associate a statistic
with every analysis variable in its definition. PROC REPORT uses the statistic
that you specify to calculate values for the analysis variable for the
observations that are represented by each cell of the report. You cannot use
statistic in the definition of any other type of variable.
See “Statistics That Are Available in PROC REPORT” on page 2140 for a list of
available statistics.
Default SUM
Note PROC REPORT uses the name of the analysis variable as the default
heading for the column. You can customize the column heading with
the column-header option in the DEFINE statement.
STYLE<(location(s))>=<style-override(s)>
specifies the style element to use for column headings and for text inside cells
for this report item.
style-element-name | [style-attribute-name-1=style-attribute-value-1
<style-attribute-name-2=style-attribute-value-2 …>]
Restriction All ODS destinations except OUTPUT and LISTING support the
STYLE= option.
See “Style Elements and Style Attributes for Table Regions” on page
2149
Example “Example 13: Specifying Style Elements for ODS Output in Multiple
Statements” on page 2212
WEIGHT=weight-variable
specifies a numeric variable whose values weight the values of the analysis
variable that is specified in the DEFINE statement. The variable value does not
have to be an integer. The following table describes how PROC REPORT treats
various values of the WEIGHT variable.
Weight value | PROC REPORT response
Less than 0 | Converts the value to zero and counts the observation in the total
number of observations
To exclude observations that contain negative and zero weights from the
analysis, use the EXCLNPWGT option in the PROC REPORT statement. Note
that most SAS/STAT procedures, such as PROC GLM, exclude negative and zero
weights by default.
Alias WGT=
Tips When you use the WEIGHT= option, consider which value of the
VARDEF= option in the PROC REPORT statement is appropriate.
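A minimal sketch of WEIGHT= in a DEFINE statement, using a hypothetical data set SCORES with non-integer weights. The weighted mean is the sum of weight times value divided by the sum of the weights, so here it is (80*1 + 90*2 + 70*0.5) / (1 + 2 + 0.5) = 295 / 3.5, about 84.29.

```sas
/* Hypothetical scores with non-integer weights in W */
data scores;
   input student $ score w;
   datalines;
A 80 1
B 90 2
C 70 0.5
;
run;

/* Weighted mean = 295 / 3.5, about 84.29 */
proc report data=scores nowd;
   column score;
   define score / analysis mean weight=w format=6.2 'Weighted Mean';
run;
```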
WIDTH=column-width
defines the width of the column in which PROC REPORT displays report-item.
This option affects only LISTING output.
Default A column width that is just large enough to handle the format. If
there is no format, then PROC REPORT uses the value of the
COLWIDTH= option in the PROC REPORT statement.
Restriction This option has no effect on ODS destinations other than LISTING
output. For ODS destinations, use the STYLE= option with the
WIDTH= style attribute or the CELLWIDTH= style attribute. Refer to
“Style Attributes Tables” in SAS Output Delivery System: Advanced
Topics for details. See how style attributes WIDTH= and
CELLWIDTH= can be used with PROC REPORT in “Example 14:
Using the CELLWIDTH= Style Attribute with PROC REPORT” on
page 2220.
Tip When you stack items in the same column in a report, the width of
the item that is at the bottom of the stack determines the width of
the column.
ENDCOMP Statement
Marks the end of one or more programming statements that PROC REPORT executes as it builds the
report.
Syntax
ENDCOMP;
FREQ Statement
Treats observations as if they appear multiple times in the input data set.
Tip: The effects of the FREQ and WEIGHT statements are similar except when
calculating degrees of freedom.
See: For an example that uses the FREQ statement, see “Example” on page 80
Syntax
FREQ variable;
Required Argument
variable
specifies a numeric variable whose value represents the frequency of the
observation. If you use the FREQ statement, then the procedure assumes that
each observation represents n observations, where n is the value of variable. If n
is not an integer, then SAS truncates it. If n is less than 1 or is missing, then the
procedure does not use that observation to calculate statistics.
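The sketch below illustrates the FREQ rules above with a hypothetical data set VISITS, where COUNT is the frequency variable. Per the description, East counts as two observations, West's 1.7 is truncated to 1, and North (COUNT less than 1) is not used to calculate statistics.

```sas
/* Hypothetical data; COUNT is the frequency variable */
data visits;
   input region $ sales count;
   datalines;
East 100 2
West 200 1.7
North 50 0
;
run;

/* East: 2 obs; West: 1.7 truncated to 1; North: not used */
proc report data=visits nowd;
   column region sales,(n sum);
   define region / group;
   define sales  / analysis;
   freq count;
run;
```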
Details
LINE Statement
Provides a subset of the features of the PUT statement for writing customized summaries.
Restrictions: This statement is valid only in a compute block that is associated with a location in
the report.
You cannot use the LINE statement in conditional statements (IF-THEN, IF-THEN/
ELSE, and SELECT) because it is not executed until PROC REPORT has executed
all other statements in the compute block.
Accessibility note: Using the LINE statement causes an inaccessible table to be generated.
Syntax
LINE specification(s);
Required Argument
specification(s)
can have one of the following forms. You can mix different forms of
specifications in one LINE statement.
item item-format
specifies the item to display and the format to use to display it, where
item
is the name of a data set variable, a computed variable, or a statistic in
the report. For information about referencing report items, see “Four
Ways to Reference Report Items in a Compute Block” on page 2063.
item-format
is a SAS format or user-defined format. You must specify a format for
each item.
'character-string '
specifies a string of text to display. When the string is a blank and nothing
else is in specification(s), PROC REPORT prints a blank line.
number-of-repetitions*'character-string '
specifies a character string and the number of times to repeat it.
pointer-control
specifies the column in which PROC REPORT displays the next specification.
You can use either of the following forms for pointer controls:
@column-number
specifies the number of the column in which to begin displaying the next
item in the specification list.
+column-increment
specifies the number of columns to skip before beginning to display the
next item in the specification list.
Restriction The pointer controls are designed for LISTING output. They have
no effect on other ODS destinations.
Details
The LINE statement does not support the following features of the PUT statement:
• the _ALL_, _INFILE_, and _PAGE_ arguments and the OVERPRINT option
• format modifiers
• array elements
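A sketch that mixes several LINE specification forms (a repeated string, a pointer control, a quoted string, and an item with its format), assuming SASHELP.CLASS. The pointer control @5 affects LISTING output only.

```sas
proc report data=sashelp.class nowd;
   column sex height;
   define sex    / group;
   define height / analysis mean format=6.2;
   compute after;
      line 40*'-';                                      /* repeated string   */
      line @5 'Overall mean height: ' height.mean 6.2;  /* pointer + format  */
   endcomp;
run;
```

Because LINE cannot appear inside IF-THEN or SELECT, any conditional logic has to set values that the LINE statement then writes unconditionally.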
RBREAK Statement
Produces a default summary at the beginning or end of a report or at the beginning or end of each BY
group.
Examples: “Example 1: Selecting Variables and Creating a Summary Line for a Report” on page
2171
“Example 8: Displaying a Calculated Percentage Column in a Report” on page 2195
Syntax
RBREAK location </ options>;
SUMMARIZE
includes a summary line as one of the break lines.
UL
underlines each value.
Required Argument
location
controls the placement of the break lines and is either of the following:
AFTER
places the break lines at the end of the report.
BEFORE
places the break lines at the beginning of the report.
Optional Arguments
COLOR=color
specifies the color of the break lines in the REPORT window. You can use the
following colors:
BLACK, BLUE, BROWN, CYAN, GRAY, GREEN, MAGENTA, ORANGE, PINK, RED, WHITE,
YELLOW
Note Not all operating environments and devices support all colors, and in
some operating environments and devices, one color might map to
another color. For example, if the DEFINITION window displays the
word BROWN in yellow characters, then selecting BROWN results in
a yellow item.
CONTENTS='link-text'
specifies the text for the entries in the table of contents created by default or by
options settings in ODS destinations that support the STYLE= option. Only the
RBREAK BEFORE statement with the PAGE and SUMMARIZE options specified
creates a table within the table of contents. If the CONTENTS= option plus the
PAGE and SUMMARIZE options are specified, then PROC REPORT uses the
value of link-text and places that text in the table of contents for the tables that
are created. If the value of CONTENTS= is empty quotation marks, then no link
is created in the table of contents.
DOL
(for double overlining) uses the 13th formatting character to overline each value
• that appears in the summary line
• that would appear in the summary line if you specified the SUMMARIZE
option
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the OL and DOL options, then PROC REPORT
honors only OL.
DUL
(for double underlining) uses the 13th formatting character to underline each
value
• that appears in the summary line
• that would appear in the summary line if you specified the SUMMARIZE
option
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the UL and DUL options, then PROC REPORT
honors only UL.
OL
(for overlining) uses the second formatting character to overline each value
• that appears in the summary line
• that would appear in the summary line if you specified the SUMMARIZE
option
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the OL and DOL options, then PROC REPORT
honors only OL.
PAGE
starts a new page after the last break line of a break located at the beginning of
the report. On RBREAK BEFORE, the PAGE option starts a new table.
SKIP
writes a blank line after the last break line of a break located at the beginning of
the report.
Restriction This option has no effect on ODS destinations other than the
LISTING destination.
STYLE<(location(s))>=<style-override(s)>
specifies the style element to use for default summary lines that are created
with the RBREAK statement.
Restriction All ODS destinations except OUTPUT and LISTING support the
STYLE= option.
See “Style Elements and Style Attributes for Table Regions” on page
2149
SUMMARIZE
includes a summary line as one of the break lines. A summary line at the
beginning or end of a report contains values for the following:
• statistics
• analysis variables
• computed variables
The following table shows how PROC REPORT calculates the value for each
type of report item in a summary line created by the RBREAK statement:
UL
(for underlining) uses the second formatting character to underline each value
• that appears in the summary line
• that would appear in the summary line if you specified the SUMMARIZE
option.
Restriction This option affects only the LISTING output. It has no effect on
other ODS output.
Interaction If you specify both the UL and DUL options, then PROC REPORT
honors only UL.
Details
Note: If you define a customized summary for the break, then customized break
lines appear after underlining or double underlining. For more information about
customized break lines, see “COMPUTE Statement” on page 2112 and
“LINE Statement” on page 2132.
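A minimal RBREAK sketch, assuming SASHELP.CLASS: SUMMARIZE adds an overall summary line at the end of the report, and OL and UL draw an overline and an underline around its values in LISTING output only.

```sas
ods listing;
proc report data=sashelp.class nowd;
   column sex height weight;
   define sex    / group;
   define height / analysis mean format=6.1;
   define weight / analysis sum;
   /* Overall summary line at the end of the report;           */
   /* OL and UL take effect in LISTING output only.             */
   rbreak after / summarize ol ul;
run;
```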
WEIGHT Statement
Specifies weights for analysis variables in the statistical calculations.
See: For information about calculating weighted statistics, see “Calculating Weighted
Statistics” on page 83. For an example that uses the WEIGHT statement, see
“Weighted Statistics Example” on page 84.
Syntax
WEIGHT variable;
Required Argument
variable
specifies a numeric variable whose values weight the values of the analysis
variables. The value of the variable does not have to be an integer.
Table 58.8 Variable Values and How PROC REPORT Responds
Restriction PROC REPORT will not compute MODE when a weight variable is
active. Instead, try using PROC UNIVARIATE when MODE needs to
be computed and a weight variable is active.
Tip When you use the WEIGHT statement, consider which value of the
VARDEF= option is appropriate. See VARDEF= on page 2096 and
the calculation of weighted statistics in “Keywords and Formulas” on
page 2700 for more information.
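A minimal sketch of the WEIGHT statement, using SASHELP.CLASS and, purely for illustration, its numeric variable Weight as the weight variable for the analysis variable Height. VARDEF=DF is already the default and is written out only to make the divisor choice visible; SUMWGT reports the sum of the weights in each cell.

```sas
/* Sketch: weighted statistics for Height within each value of Sex. */
proc report data=sashelp.class nowd vardef=df;
   column sex height,(mean std n sumwgt);   /* statistics stacked under Height */
   define sex    / group;
   define height / analysis format=8.2;
   /* The SASHELP.CLASS variable named Weight supplies the weights */
   weight weight;
run;
```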
Details
CSS PCTSUM
CV RANGE
MAX STD
MEAN STDERR
MIN SUM
MODE SUMWGT
Usage: REPORT Procedure 2141
N USS
NMISS VAR
PCTN
P1 P60
P5 P70
P10 P80
P20 P90
P30 P95
P40 P99
Q1 | P25 QRANGE
PRT | PROBT T
These statistics, the formulas that are used to calculate them, and their data
requirements are discussed in “Keywords and Formulas” on page 2700.
To compute standard error and the Student's t-test, you must use the default value
of VARDEF=, which is DF.
You can place N anywhere because it is the number of observations in the input
data set that contribute to the value in a cell of the report. The value of N does not
depend on a particular variable.
Note: If you use the MISSING option in the PROC REPORT statement, then N
includes observations with missing group, order, or across variables.
The Base SAS reporting procedures, PROC PRINT, PROC REPORT, and PROC
TABULATE, enable you to quickly analyze your data and organize it into easy-to-
read tables. You can use the STYLE= option with these procedure statements to
modify the appearance of your report. The STYLE= option enables you to make
changes in sections of output without changing the default style for all of the
output. You can customize specific sections of procedure output by specifying the
STYLE= option in specific statements within the procedure.
The following program uses the STYLE= option to create the background colors in
the PROC REPORT output below:
title "Height and Weight by Gender and Age";
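/* The PROC REPORT statement sets the header background for the whole report */
proc report nowd data=sashelp.class
style(header)=[background=white];
/* Age down the side; Weight and Height under each value of Sex, */
/* with the spanning header 'Gender' above Sex */
column age (('Gender' sex),(weight height));
/* GROUP gives one row per age; each STYLE(HEADER)= overrides */
/* the header background for that column only */
define age / group style(header)=[background=lightgreen];
define sex / across style(header)=[background=yellow] ' ';
define weight / style(header)=[background=orange];
define height / style(header)=[background=tan];
run;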
Note: Because styles control the presentation of the data, they have no effect on
output objects that go to the LISTING, DOCUMENT, or OUTPUT destination.
Available styles are in the SASHELP.TMPLMST item store. In SAS Enterprise Guide,
the list of style sheets is shown by the Style Wizard. In batch mode or SAS Studio,
you can display the list of available style templates by using the LIST statement in
PROC TEMPLATE:
proc template;
list styles / store=sashelp.tmplmst;
run;
For complete information about viewing ODS styles, see “Viewing ODS Styles
Supplied by SAS” in SAS Output Delivery System: Advanced Topics.
By default, HTML 4 output uses the HTMLBlue style template and HTML 5 output
uses the HTMLEncore style template. To help you become familiar with styles,
style elements, and style attributes, look at the relationship between them.
You can use the SOURCE statement in PROC TEMPLATE to display the structure
of a style template. The following code prints the structure of the HTMLBlue style
template to the SAS log:
proc template;
source styles.HTMLBlue;
run;
The following figure illustrates the structure of a style. The figure shows the
relationship between the style, the style elements, and the style attributes.
The following list corresponds to the numbered items in the preceding figure:
You can create new styles with the “DEFINE STYLE Statement” in SAS Output
Delivery System: Procedures Guide. New styles can be created independently or
from an existing style. You can use “PARENT= Statement” in SAS Output
Delivery System: Procedures Guide to create a new style from an existing style.
For complete documentation about ODS styles, see “Style Templates” in SAS
Output Delivery System: Advanced Topics.
2 Header and Footer are examples of style elements. A style element is a
collection of style attributes that apply to a particular part of the output for a
SAS program. For example, a style element might contain instructions for the
presentation of column headings or for the presentation of the data inside table
cells. Style elements might also specify default colors and fonts for output that
uses the style. Style elements exist inside styles and consist of one or more
style attributes. Style elements can be user-defined or supplied by SAS. User-
defined style elements can be created by the “STYLE Statement” in SAS Output
Delivery System: Procedures Guide.
Note: For a list of the default style elements used for HTML and markup
languages and their inheritance, see “Style Elements” in SAS Output Delivery
System: Advanced Topics.
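The following sketch creates a new style from an existing style with the PARENT= statement and changes a single style element with the CLASS statement. The style name STYLES.MYSTYLE is hypothetical; everything that the new style does not change is inherited from HTMLBlue.

```sas
/* Sketch: derive a style from HTMLBlue and modify the Header element. */
proc template;
   define style styles.mystyle;          /* hypothetical style name       */
      parent = styles.htmlblue;          /* inherit all other elements    */
      class header /                     /* modify the Header element     */
         backgroundcolor = lightgreen
         fontweight      = bold;
   end;
run;

/* Use the new style for HTML output from PROC REPORT */
ods html style=styles.mystyle;
proc report data=sashelp.class nowd;
   column name age height;
run;
ods html close;
```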
The following table shows commonly used style attributes that you can set with
the STYLE= option in PROC PRINT, PROC TABULATE, and PROC REPORT. Most of
these attributes apply to parts of the table other than cells (for example, table
borders and the lines between columns and rows). Note that not all attributes are
valid in all destinations. For more information about these style attributes, their
valid values, and their applicable destinations, see “Style Attributes Tables” in SAS
Output Delivery System: Advanced Topics.
Table 58.10 Style Attributes for PROC REPORT, PROC TABULATE, and PROC PRINT
Columns, left to right: PROC REPORT statement, REPORT area; PROC REPORT
areas CALLDEF, COLUMN, HEADER, LINES, SUMMARY; PROC TABULATE statement,
TABLE; PROC TABULATE statements VAR, CLASS, BOX, CLASSLEV, KEYWORD;
PROC PRINT, TABLE location; PROC PRINT, all locations other than TABLE.
Attribute
ASIS= X X X X
BACKGROUNDCOLOR= X X X X X X
BACKGROUNDIMAGE= X X X X X X
BORDERBOTTOMCOLOR= X X X
BORDERBOTTOMSTYLE= X X X X
BORDERBOTTOMWIDTH= X X X X
BORDERLEFTCOLOR= X X X
BORDERLEFTSTYLE= X X X X
BORDERLEFTWIDTH= X X X X
BORDERCOLOR= X X X X X
BORDERCOLORDARK= X X X X X X
BORDERCOLORLIGHT= X X X X X X
BORDERRIGHTCOLOR= X X X
BORDERRIGHTSTYLE= X X X X
BORDERRIGHTWIDTH= X X X X
BORDERTOPCOLOR= X X X
BORDERTOPSTYLE= X X X X
BORDERTOPWIDTH= X X X X
BORDERWIDTH= X X X X X X
CELLPADDING= X X X
CELLSPACING= X X X
CELLWIDTH= X X X X X
CLASS= X X X X X X
COLOR= X X X
FLYOVER= X X X X
FONT= X X X X X X
FONTFAMILY= X X X X X X
FONTSIZE= X X X X X X
FONTSTYLE= X X X X X X
FONTWEIGHT= X X X X X X
FONTWIDTH= X X X X X
FRAME= X X X
HEIGHT= X X X X X
HREFTARGET= X X X
HTMLSTYLE= X X X X X
NOBREAKSPACE= (note 2) X X X X
OUTPUTWIDTH= X X X X X
POSTHTML= (note 1) X X X X X X
POSTIMAGE= X X X X X X
POSTTEXT= (note 1) X X X X X X
PREHTML= (note 1) X X X X X X
PREIMAGE= X X X X X X
PRETEXT= (note 1) X X X X X X
PROTECTSPECIALCHARS= X X X X
RULES= X X X
TAGATTR= X X X X X X
TEXTALIGN= X X X X X X
URL= X X X
VERTICALALIGN= X X X
WIDTH= X X X X X
1 When you use these attributes in this location, they affect only the text that is specified with the PRETEXT=,
POSTTEXT=, PREHTML=, and POSTHTML= attributes. To alter the foreground color or the font for the text that appears
in the table, you must set the corresponding attribute in a location that affects the cells rather than the table. For
complete documentation about style attributes and their values, see “Style Attributes” in SAS Output Delivery System:
Advanced Topics.
2 To help prevent unexpected wrapping of long text strings when using PROC REPORT with the ODS RTF destination, set
NOBREAKSPACE=OFF in a location that affects the LINE statement. The NOBREAKSPACE=OFF attribute must be set in
the PROC REPORT code either on the LINE statement or on the PROC REPORT statement where style(lines) is specified.
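Following note 2, here is a sketch that sets NOBREAKSPACE=OFF for the text written by LINE statements, by specifying STYLE(LINES)= in the PROC REPORT statement. The RTF file name is a placeholder, and the LINE text is arbitrary.

```sas
/* Sketch: help prevent unexpected wrapping of LINE text in RTF output. */
ods rtf file='nobreak_demo.rtf';          /* placeholder file name */

proc report data=sashelp.class nowd
            style(lines)=[nobreakspace=off];  /* applies to LINE statement text */
   column name age height;
   compute after;                         /* break at the end of the report */
      line 'This text is written by a LINE statement in a COMPUTE AFTER block.';
   endcomp;
run;

ods rtf close;
```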
For complete documentation about the ODS destinations and their default styles,
see “Style Templates” in SAS Output Delivery System: Advanced Topics.
Table 58.11 Default Style Elements and Style Attributes for Table Locations
For example, consider the style precedence for non-summary rows shown below. First,
for a particular cell, PROC REPORT uses the default style attributes. Next, for each
cell in a column, PROC REPORT overrides the default style attributes. By step five,
the previously applied styles are overwritten, but only for the cells
specified by the column-id.
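To see precedence at work, the following sketch (an illustration, not the full precedence list) sets a background color for all columns in the PROC REPORT statement, overrides it for the Age column in a DEFINE statement, and then overrides that for individual Age cells with CALL DEFINE. The colors and the AGE > 14 condition are arbitrary; the more specific setting is expected to win in each case.

```sas
/* Sketch: general, column-level, and cell-level style settings. */
proc report data=sashelp.class nowd
            style(column)=[backgroundcolor=white];      /* all columns */
   column name age height;
   define name   / display;
   define age    / display
                   style(column)=[backgroundcolor=lightgreen];  /* Age column */
   define height / display;
   compute age;
      /* Cell level: only Age cells whose value exceeds 14 */
      if age > 14 then
         call define(_col_, 'style', 'style=[backgroundcolor=yellow]');
   endcomp;
run;
```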
The following lists the style precedence for the summary rows and the non-
summary rows.
compute diff;
diff = predict.sum - actual.sum;
endcomp;
run;
Printing a Report
You can print the output file directly or use PROC PRINTTO to redirect the output
to another file. In either case, no form is used, but carriage-control characters are
written if the destination is a print file.
Note: You need two PROC PRINTTO steps. The first PROC PRINTTO step
precedes the PROC REPORT step. It redirects the output to a file. The second
PROC PRINTTO step follows the PROC REPORT step. It reestablishes the default
destination and frees the output file. You cannot print the file until PROC PRINTTO
frees it.
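The two PROC PRINTTO steps described in the note might look like the following sketch. The file name report.lst is a placeholder, and the LISTING destination is opened because PROC PRINTTO redirects the procedure output that goes to LISTING.

```sas
ods listing;

/* First PROC PRINTTO step: redirect procedure output to a file. */
/* NEW replaces any existing contents of the file.               */
proc printto print='report.lst' new;      /* placeholder file name */
run;

proc report data=sashelp.class nowd;
   column name age height;
run;

/* Second PROC PRINTTO step: restore the default destination */
/* and free the file so that it can be printed.              */
proc printto;
run;
```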
Note: A report definition might differ from the SAS program that creates the
report. See the discussion of OUTREPT= on page 2087.
You can use a report definition to create an identically structured report for any
SAS data set that contains variables with the same names as the ones that are
used in the report definition. Use the REPORT= option in the PROC REPORT
statement to load a report definition when you start PROC REPORT. For
information, see “REPORT=libref.catalog.entry” on page 2091.
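A sketch of storing a report definition with OUTREPT= and reusing it with REPORT=. The catalog WORK.MYRPTS, the entry CLASSRPT, and the TEENS data set are hypothetical names; TEENS can use the stored definition because it has the same variable names as SASHELP.CLASS.

```sas
/* Store the report definition in a catalog entry (hypothetical names). */
proc report data=sashelp.class nowd outrept=work.myrpts.classrpt;
   column sex height weight;
   define sex    / group;
   define height / analysis mean format=6.1;
   define weight / analysis mean format=6.1;
run;

/* A different data set that contains the same variable names */
data teens;
   set sashelp.class;
   where age >= 13;
run;

/* Load the stored definition to build an identically structured report */
proc report data=teens nowd report=work.myrpts.classrpt;
run;
```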
When the DATA= input data set is stored as a table or view in a DBMS, PROC
REPORT can use in-database processing to perform most of its work within the
database. In-database processing can provide the advantages of faster
processing and reduced data transfer between the database and SAS software.
In-database processing is available for the following databases:
n Aster
n DB2
n Google BigQuery
n Greenplum
n Hadoop
n HAWQ
n Impala
n Netezza
n Oracle
n PostgreSQL
n SAP HANA
n Snowflake
n Teradata
n Vertica
n Yellowbrick
If the SAS format definitions have not been deployed in the database, the
in-database aggregation occurs on the raw values, and SAS applies the relevant
formats as the result set is merged into the PROC REPORT internal
structures. For more information, see the section “Deploying and Using SAS
Formats” in SAS/ACCESS for Relational Databases: Reference.
In-database processing will not occur if the PROC REPORT step contains variables
with usage types DISPLAY or ORDER.
The following statistics are supported for in-database processing: N, NMISS, MIN,
MAX, MEAN, RANGE, SUM, SUMWGT, CSS, USS, VAR, STD, STDERR, and CV.
Weighting for in-database processing is supported only for N, NMISS, MIN, MAX,
RANGE, SUM, SUMWGT, and MEAN.
For more information about in-database processing, see SAS/ACCESS for Relational
Databases: Reference.
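The sketch below shows a step that is eligible for in-database processing under the rules above: it uses only GROUP and ANALYSIS usages and only supported statistics. The libref MYDB, the table ORDERS, and its columns are hypothetical and assumed to be already assigned through SAS/ACCESS; the SQLGENERATION= and MSGLEVEL= settings are one way to request in-database SQL and to see related notes in the log.

```sas
/* Assumes MYDB is an already assigned SAS/ACCESS libref for a supported */
/* DBMS and that table ORDERS has columns REGION, PRODUCT, and AMOUNT    */
/* (all hypothetical names).                                             */
options sqlgeneration=dbms msglevel=i;

proc report data=mydb.orders nowd;
   column region product amount,(n sum mean);  /* supported statistics only */
   define region  / group;     /* DISPLAY or ORDER usage would prevent  */
   define product / group;     /* in-database processing                */
   define amount  / analysis;
run;
```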
the appropriate statistical actions. For an overview about how procedures run in
CAS, see Chapter 5, “CAS Processing of Base Procedures,” on page 93.
The CAS LIBNAME statement option controls whether and how CAS procedures
are run inside CAS. By default, the CAS procedures are run inside CAS when
possible. However, there are many data set options that can prevent CAS
processing.
When the DATA= input data set references an in-memory table or view in CAS, the
REPORT procedure can use CAS actions to perform some of its work within the
server. To reference an in-memory table or view, you must specify the CAS engine
LIBNAME statement and specify the CAS engine libref option IN= or DATA=. By
default, PROC REPORT uses CAS processing whenever a CAS engine libref is
specified on the input.
The following example shows how to run SAS 9.4 PROC REPORT that uses CAS
processing. The LIBNAME statement assigns a CAS engine libref named mycas that
you use to connect to the CAS session casauto.
option casport=10935 cashost="cloud.example.com";
cas casauto ;
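The example above stops after the CAS session is started. A sketch of the remaining steps follows: the LIBNAME statement assigns the mycas libref described in the text to the session casauto, and copying SASHELP.CLASS into CAS is only a way to have an in-memory table to report on.

```sas
/* CAS engine libref connected to the session casauto */
libname mycas cas sessref=casauto;

/* Load a sample table into CAS for illustration */
data mycas.class;
   set sashelp.class;
run;

/* GROUP and ANALYSIS usages only, so CAS processing is possible */
proc report data=mycas.class;
   column sex height,(n mean) weight,(mean);
   define sex    / group;
   define height / analysis;
   define weight / analysis;
run;
```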
CAS processing does not occur if the PROC REPORT step contains variables with
usage types DISPLAY or ORDER. If a DISPLAY or ORDER variable is found, all of
the data is brought back to the client and PROC REPORT runs on the SAS client
server.
CSS RANGE
CV STDERR
MAX SUM
MEAN SUMWGT
MIN STD
N USS
NMISS VAR
Note: Weighting for CAS processing is supported only for N, NMISS, MIN, MAX,
RANGE, SUM, SUMWGT, and MEAN.
The computations and processing are done in CAS. However, the rendering of the
final table is done by the SAS client.
For information about how to use the CAS LIBNAME statement, see “CAS
LIBNAME Statement” in SAS Cloud Analytic Services: User’s Guide.
To use the #BYVAR and #BYVAL substitutions, insert the item in the text string at
the position where you want the substitution text to appear. Both #BYVAR and
#BYVAL specifications must be followed by a delimiting character. The character
can be either a space or other non-alphanumeric character, such as a quotation
mark. If no delimiting character is provided, then the specification is ignored and its
text remains intact and is displayed with the rest of the string. To allow a #BYVAR
or #BYVAL substitution to be followed immediately by other text, with no
delimiter, use a trailing dot (as with macro variables). The trailing dot is not
displayed in the resolved text. If you want a period to be displayed as the last
character in the resolved text, use two dots after the #BYVAR or #BYVAL
substitution.
The substitution for #BYVAR or #BYVAL does not occur in the following cases:
n if you use a #BYVAR or #BYVAL specification for a variable that is not named in
the BY statement. For example, you might use #BYVAL2 when there is only one
BY-variable or #BYVAL(ABC) when ABC is non-existent or is not a BY-variable.
n if there is no BY statement
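A sketch of these rules with SASHELP.CLASS sorted by Sex. NOBYLINE suppresses the default BY line so that the titles carry the BY information. In TITLE, spaces delimit the substitutions; TITLE2 uses the two-dot form, which, per the rules above, ends the substitution and displays one period.

```sas
proc sort data=sashelp.class out=class_by;
   by sex;
run;

options nobyline;                                  /* suppress the default BY line */
title  'Heights for #byvar1 #byval1 students';     /* spaces act as delimiters     */
title2 'Group #byval1..';                          /* first dot ends the item;     */
                                                   /* second dot is displayed      */
proc report data=class_by nowd;
   by sex;
   column name age height;
run;

options byline;                                    /* restore the default */
title;
```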
Sequence of Events
This section explains the general process of building a report. For examples that
illustrate this process, see “Report-Building Examples” on page 2160. The sequence
of events is the same whether you use programming statements or the interactive
report window environment.
To understand the process of building a report, you must understand the difference
between report variables and temporary variables. Report variables are variables
that are specified in the COLUMN statement. A report variable can come from the
input data set or can be computed (that is, the DEFINE statement for that variable
specifies the COMPUTED option). A report variable might or might not appear in a
compute block. Variables that appear only in one or more compute blocks are
temporary variables. Temporary variables do not appear in the report and are not
written to the output data set (if one is requested).
PROC REPORT initializes report variables to missing at the beginning of each row
of the report. The value for a temporary variable is initialized to missing before
PROC REPORT begins to construct the rows of the report, and it remains missing
until you specifically assign a value to it. PROC REPORT retains the value of a
temporary variable from the execution of one compute block to another.
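A running total shows the difference between the two kinds of variables. In this sketch, CUMHT is a report variable (it is in the COLUMN statement) and is reset to missing for every row, while RUNTOT appears only in compute blocks, so it is a temporary variable whose value carries over from row to row. Both names are hypothetical.

```sas
proc report data=sashelp.class nowd;
   column name height cumht;
   define name   / display;                 /* one row per observation */
   define height / analysis sum format=6.1;
   define cumht  / computed format=8.1 'Running Total';
   compute before;             /* break at the beginning of the report   */
      runtot = 0;              /* RUNTOT: temporary variable             */
   endcomp;
   compute cumht;
      runtot = sum(runtot, height.sum);  /* SUM() tolerates a missing RUNTOT */
      cumht  = runtot;                   /* report variable for this row     */
   endcomp;
run;
```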
1 It consolidates the data by group, order, and across variables. It calculates all
statistics for the report: the statistics for detail rows as well as the statistics for
summary lines in breaks. Statistics include those statistics that are computed
for analysis variables. PROC REPORT calculates statistics for summary lines
whether or not they appear in the report.
Note: You can also use statistics with PROC REPORT as follows:
n Use group statistics in compute blocks for a break before the group variable.
n Use statistics for the whole report in a compute block at the beginning of the
report.
This document references these statistics with the appropriate compound
name. For information about referencing report items in a compute block, see
“Four Ways to Reference Report Items in a Compute Block” on page 2063.
Note: You cannot use the LINE statement in conditional statements (IF-THEN,
IF-THEN/ELSE, and SELECT) because it is not executed until PROC REPORT
has executed all other statements in the compute block.
4 After each report row is completed, PROC REPORT sends the row to all of the
ODS destinations that are currently open.
n You use a compute block at the break. (You can attach a compute block to a
break without using a BREAK or RBREAK statement or without selecting any
options in the BREAK window.)
For more information about using compute blocks, see “Using Compute Blocks”
on page 2061 and “COMPUTE Statement” on page 2112.
The summary line that PROC REPORT constructs at this point is preliminary. If no
compute block is attached to the break, then the preliminary summary line
becomes the final summary line. However, if a compute block is attached to the
break, then the statements in the compute block can alter the values in the
preliminary summary line.
PROC REPORT prints the summary line only if you summarize numeric variables in
the break.
Report-Building Examples
At the end of the report a break summarizes the statistics and computed variables
in the report and assigns to Sector the value of TOTALS:.
The following statements produce “Report with Groups and a Report Summary” on
page 2161. The user-defined formats that are used are created by a PROC
FORMAT step on page 2171.
libname proclib
'SAS-library';
options nodate pageno=1 linesize=64
pagesize=60 fmtsearch=(proclib);
compute before;
totprof = 0;
endcomp;
compute profit;
if sector ne ' ' or department ne ' ' then do;
if department='np1' or department='np2'
then profit=0.4*sales.sum;
else profit=0.25*sales.sum;
totprof = totprof + profit;
end;
else
profit = totprof;
endcomp;
Results: REPORT Procedure 2161
1 PROC REPORT starts building the report by consolidating the data (Sector and
Department are group variables) and by calculating the statistics (Sales.sum
and N) for each detail row and for the break at the end of the report.
2 Now, PROC REPORT is ready to start building the first row of the report. This
report does not contain a break at the beginning of the report or a break before
any groups, so the first row of the report is a detail row. The procedure initializes
all report variables to missing, as the following figure illustrates. Missing values
for a character variable are represented by a blank, and missing values for a
numeric variable are represented by a period.
3 The following figure illustrates the construction of the first three columns of the
row. PROC REPORT fills in values for the row from left to right. Values come
from the statistics that were computed at the beginning of the report-building
process.
Figure 58.12 First Detail Row with Values Filled in from Left to Right
4 The next column in the report contains the computed variable Profit. When it
gets to this column, PROC REPORT executes the statements in the compute
block that is attached to Profit. Nonperishable items (which have a value of np1
or np2) return a profit of 40%; perishable items (which have a value of p1 or p2)
return a profit of 25%.
if department='np1' or department='np2'
then profit=0.4*sales.sum;
else profit=0.25*sales.sum;
8 PROC REPORT repeats steps 2, 3, 4, 5, and 6 for each detail row in the report.
9 At the break at the end of the report, PROC REPORT constructs the break lines
described by the RBREAK statement. These lines include double underlining,
double overlining, and a preliminary version of the summary line. The statistics
for the summary line were calculated earlier. (See step 1.) The value for the
computed variable is calculated when PROC REPORT reaches the appropriate
column, just as it is in detail rows. PROC REPORT uses these values to create
the preliminary version of the summary line. (See the following figure.)
$4,285.00 $1,423.75 20
10 If no compute block is attached to the break, then the preliminary version of the
summary line is the same as the final version. However, in this example, a
compute block is attached to the break. Therefore, PROC REPORT now
executes the statements in that compute block. In this case, the compute block
contains one statement:
sector='TOTALS:';
This statement replaces the value of Sector, which in the summary line is
missing by default, with the word TOTALS:. After PROC REPORT executes the
statement, it modifies the summary line to reflect this change to the value of
Sector. The final version of the summary line appears in the following figure.
11 Finally, PROC REPORT writes all the break lines, with underlining, overlining,
and the final summary line, to the report. See “Report with Groups and a
Report Summary” on page 2161.
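The same pattern, a summary line from RBREAK AFTER / SUMMARIZE whose group-variable value is replaced in a COMPUTE AFTER block, can be tried on SASHELP.CARS. This is a sketch: the statistics are arbitrary, and it assumes that the character variable Type is long enough to hold the seven characters of 'TOTALS:'.

```sas
proc report data=sashelp.cars nowd;
   column type msrp n;
   define type / group;
   define msrp / analysis sum format=dollar16.;
   define n    / 'Count';
   rbreak after / summarize;     /* preliminary summary line at the end */
   compute after;
      type = 'TOTALS:';          /* alters the summary line: final version */
   endcomp;
run;
```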
Because all compute blocks share the current values of all variables, you can
initialize temporary variables at a break at the beginning of the report or at a break
before a break variable. This report initializes the temporary variable Sctrtot at a
break before Sector.
Note: PROC REPORT creates a preliminary summary line for a break before it
executes the corresponding compute block. If the summary line contains computed
variables, then the computations are based on the values of the contributing
variables in the preliminary summary line. If you want to recalculate computed
variables based on values that you set in the compute block, then you must do so
explicitly in the compute block. This report illustrates this technique. If no compute
block is attached to a break, then the preliminary summary line becomes the final
summary line.
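The following sketch applies the technique from the note to SASHELP.CARS, with the hypothetical temporary variable ORIGTOT playing the role of Sctrtot: the COMPUTE BEFORE Origin block stores the group total and then recalculates the computed variable so that the final summary line is consistent. In the preliminary summary line for the first group, ORIGTOT is still missing, so expect a missing-value note in the log.

```sas
proc report data=sashelp.cars nowd;
   column origin type msrp pcttot;
   define origin / group;
   define type   / group;
   define msrp   / analysis sum format=dollar16.;
   define pcttot / computed format=percent9.2 'Percent of Origin';
   break before origin / summarize;
   compute before origin;
      origtot = msrp.sum;            /* temporary variable: total for this origin */
      pcttot  = msrp.sum / origtot;  /* recalculate for the final summary line    */
   endcomp;
   compute pcttot;
      pcttot = msrp.sum / origtot;   /* detail rows use the stored group total   */
   endcomp;
run;
```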
The report in “Report with Temporary Variables” on page 2166 contains five
columns:
n Sector and Department are group variables.
n Sales is an analysis variable that is used twice in this report: once to calculate
the Sum statistic, and once to calculate the Pctsum statistic.
n Sctrpct is a computed variable whose values are based on the values of Sales
and a temporary variable, Sctrtot, which is the total sales for a sector.
At the beginning of the report, a customized report summary tells what the sales
for all stores are. At a break before each group of observations for a sector, a
default summary summarizes the data for that sector. At the end of each group, a
break inserts a blank line.
Note: Calculations of the percentages do not multiply their results by 100 because
PROC REPORT prints them with the PERCENT. format.
libname proclib
'SAS-library';
options nodate pageno=1 linesize=64
pagesize=60 fmtsearch=(proclib);
ods html close;
ods listing;
proc report data=grocery noheader;
column sector department sales
Sctrpct sales=Salespct;
define sales / analysis sum format=dollar9.2 ;
define sctrpct / computed
format=percent9.2 ;
define salespct / pctsum format=percent9.2;
compute before;
line ' ';
line @16 'Total for all stores is '
sales.sum dollar9.2;
line ' ';
line @29 'Sum of' @40 'Percent'
@51 'Percent of';
line @6 'Sector' @17 'Department'
@29 'Sales'
@40 'of Sector' @51 'All Stores';
line @6 55*'=';
line ' ';
endcomp;
compute sctrpct;
sctrpct=sales.sum/sctrtot;
endcomp;
1 PROC REPORT starts building the report by consolidating the data (Sector and
Department are group variables) and by calculating the statistics (Sales.sum
and Sales.pctsum) for each detail row, for the break at the beginning of the
report, for the breaks before each group, and for the breaks after each group.
2 PROC REPORT initializes the temporary variable, Sctrtot, to missing. (See the
following figure.)
3 Because this PROC REPORT step contains a COMPUTE BEFORE statement, the
procedure constructs a preliminary summary line for the break at the beginning
of the report. This preliminary summary line contains values for the statistics
(Sales.sum and Sales.pctsum) and the computed variable (Sctrpct).
At this break, Sales.sum is the sales for all stores, and Sales.pctsum is the
percentage those sales represent for all stores (100%). PROC REPORT takes the
values for these statistics from the statistics that were computed at the
beginning of the report-building process.
The value for Sctrpct comes from executing the statements in the
corresponding compute block. Because the value of Sctrtot is missing, PROC
REPORT cannot calculate a value for Sctrpct. Therefore, in the preliminary
summary line (which is not printed in this case), this variable also has a missing
value. (See the following figure.)
The statements in the COMPUTE BEFORE block do not alter any variables.
Therefore, the final summary line is the same as the preliminary summary line.
Figure 58.18 Preliminary and Final Summary Line for the Break at the Beginning
of the Report
4 Because the program does not include an RBREAK statement with the
SUMMARIZE option, PROC REPORT does not write the final summary line to
the report. Instead, it uses LINE statements to write a customized summary that
embeds the value of Sales.sum into a sentence and to write customized column
headings. (The NOHEADER option in the PROC REPORT statement suppresses
the default column headings, which would have appeared before the customized
summary.)
5 Next, PROC REPORT constructs a preliminary summary line for the break
before the first group of observations. (This break uses the SUMMARIZE
option in the BREAK statement and has a compute block attached to it. Either of
these conditions generates a summary line.) The preliminary summary line
contains values for the break variable (Sector), the statistics (Sales.sum and
Sales.pctsum), and the computed variable (Sctrpct). At this break, Sales.sum is
the sales for one sector (the Northeast sector). PROC REPORT takes the values
for Sector, Sales.sum, and Sales.pctsum from the statistics that were computed
at the beginning of the report-building process.
The value for Sctrpct comes from executing the statements in the
corresponding compute blocks. Because the value of Sctrtot is still missing,
PROC REPORT cannot calculate a value for Sctrpct. Therefore, in the
preliminary summary line, Sctrpct has a missing value. (See the following
figure.)
Figure 58.19 Preliminary Summary Line for the Break before the First Group of
Observations
6 PROC REPORT creates the final version of the summary line by executing the
statements in the COMPUTE BEFORE SECTOR compute block. These
statements execute once each time the value of Sector changes.
n The first statement assigns the value of Sales.sum, which in that part of the
report represents total sales for one Sector, to the variable Sctrtot.
Note: In this example, you must recalculate the value for Sctrpct in the final
summary line. If you do not recalculate the value for Sctrpct, then it is missing
because the value of Sctrtot is missing when the COMPUTE Sctrpct block
executes.
Figure 58.20 Final Summary Line for the Break before the First Group of
Observations
8 Now, PROC REPORT is ready to start building the first report row. It initializes
all report variables to missing. Values for temporary variables do not change.
The following figure illustrates the first detail row at this point.
9 The following figure illustrates the construction of the first three columns of the
row. PROC REPORT fills in values for the row from left to right. The values
come from the statistics that were computed at the beginning of the report-
building process.
10 The next column in the report contains the computed variable Sctrpct. When it
gets to this column, PROC REPORT executes the statement in the compute
block attached to Sctrpct. This statement calculates the percentage of the
sector's total sales that this department accounts for.
sctrpct=sales.sum/sctrtot;
Figure 58.23 First Detail Row with the First Computed Variable Added
11 The next column in the report contains the statistic Sales.pctsum. PROC
REPORT gets this value from the statistics that are created at the beginning of
the report-building process. The first detail row is now complete. (See the
following figure.)
12 PROC REPORT writes the detail row to the report. It repeats steps 8, 9, 10, 11,
and 12 for each detail row in the group.
13 After writing the last detail row in the group to the report, PROC REPORT
constructs the default group summary. Because no compute block is attached
to this break and because the BREAK AFTER statement does not include the
SUMMARIZE option, PROC REPORT does not construct a summary line. The
only action at this break is that the SKIP option in the BREAK AFTER statement
writes a blank line after the last detail row of the group.
14 Now the value of the break variable changes from Northeast to Northwest.
PROC REPORT constructs a preliminary summary line for the break before this
group of observations. As at the beginning of any row, PROC REPORT initializes
all report variables to missing but retains the value of the temporary variable.
Next, it completes the preliminary summary line with the appropriate values for
the break variable (Sector), the statistics (Sales.sum and Sales.pctsum), and the
computed variable (Sctrpct). At this break, Sales.sum is the sales for the
Northwest sector. Because the COMPUTE BEFORE Sector block has not yet
executed, the value of Sctrtot is still $1,831.00, the value for the Northeast
sector. Thus, the value that PROC REPORT calculates for Sctrpct in this
preliminary summary line is incorrect. (See the following figure.) The statements
in the compute block for this break calculate the correct value. (See the
following step.)
Figure 58.25 Preliminary Summary Line for the Break before the Second Group
of Observations
CAUTION
Synchronize values for computed variables in break lines to prevent
incorrect results. If the PROC REPORT step does not recalculate Sctrpct in the
compute block that is attached to the break, then the value in the final summary line
is not synchronized with the other values in the summary line, and the report is
incorrect.
15 PROC REPORT creates the final version of the summary line by executing the
statements in the COMPUTE BEFORE Sector compute block. These statements
execute once each time the value of Sector changes.
n The first statement assigns the value of Sales.sum, which in that part of the
report represents sales for the Northwest sector, to the variable Sctrtot.
n The second statement completes the summary line by recalculating Sctrpct
from the new, appropriate value of Sctrtot. The following figure shows the
final summary line.
Figure 58.26 Final Summary Line for the Break before the Second Group of
Observations
16 Now, PROC REPORT is ready to start building the first row for this group of
observations. It repeats steps 8 through 15 until it has processed all
observations in the input data set (stopping with step 13 for the last group of
observations).
Example 1: Selecting Variables and Creating a Summary Line for a Report 2171
Details
This example uses a permanent data set and permanent formats to create a report
that contains the following:
n one row for every observation
Program
libname proclib 'SAS-library';
data grocery;
input Sector $ Manager $ Department $ Sales @@;
datalines;
se 1 np1 50 se 1 p1 100 se 1 np2 120 se 1 p2 80
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Create the GROCERY data set. GROCERY contains one day's sales figures for eight
stores in the Grocery Mart chain. Each observation contains one day's sales data for
one department in one store.
data grocery;
input Sector $ Manager $ Department $ Sales @@;
datalines;
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
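The examples assume that $MGRFMT, $DEPTFMT, and $SCTRFMT already exist in PROCLIB. As a sketch of how such a format could be stored there, the step below writes a $SCTRFMT format to the PROCLIB.FORMATS catalog with the LIBRARY= option of PROC FORMAT. The labels are assumptions based on the sector names used in the text, not the actual definitions, and 'SAS-library' is the same placeholder path that the examples use.

```sas
libname proclib 'SAS-library';   /* placeholder: replace with a real path */

/* LIBRARY=PROCLIB stores the format in PROCLIB.FORMATS instead of WORK. */
proc format library=proclib;
   value $sctrfmt 'ne' = 'Northeast'   /* labels are assumed, not the */
                  'nw' = 'Northwest'   /* documented definitions      */
                  'se' = 'Southeast'
                  'sw' = 'Southwest';
run;

/* SAS searches WORK.FORMATS and LIBRARY.FORMATS first, then PROCLIB. */
options fmtsearch=(proclib);

data _null_;
   length sector $ 2;
   sector = 'nw';
   put sector= $sctrfmt.;   /* writes sector=Northwest to the log */
run;
```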
Specify the report options. By default, the REPORT procedure runs without the
REPORT window and sends its output to the open output destinations.
proc report data=grocery;
Specify the report columns. The report contains columns for Manager,
Department, and Sales. Because there is no DEFINE statement for any of these
variables, PROC REPORT uses the character variables (Manager and Department)
as display variables and the numeric variable (Sales) as an analysis variable that is
used to calculate the sum statistic.
column manager department sales;
Select the observations to process. The WHERE statement selects for the report
only the observations for stores in the southeast sector.
where sector='se';
Format the report columns. The FORMAT statement assigns formats to use in the
report. You can use the FORMAT statement only with data set variables.
format manager $mgrfmt. department $deptfmt.
sales dollar11.2;
Specify the titles. SYSDATE is an automatic macro variable that returns the date
on which the SAS job or SAS session began. The TITLE2 statement uses double
rather than single quotation marks so that the macro variable resolves.
title 'Sales for the Southeast Sector';
title2 "for &sysdate";
run;
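A minimal illustration of the quoting rule: the macro processor resolves &SYSDATE inside double quotation marks but leaves it as literal text inside single quotation marks. The title text here is illustrative only.

```sas
/* Single quotation marks: the title shows the literal text &sysdate. */
title1 'Report date: &sysdate';
/* Double quotation marks: the title shows the session start date     */
/* in DATE7. form, such as 13JUN24.                                    */
title2 "Report date: &sysdate";
/* The same reference written to the log. The first period ends the   */
/* macro variable name; the second period is written as text.          */
%put NOTE: This SAS session began on &sysdate..;
```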
Output: HTML
Output 58.2 Selecting Variables and Creating a Summary Line for a Report
Example 2: Ordering the Rows in a Report
Features:
DEFINE statement options: ANALYSIS, FORMAT=, ORDER, ORDER=, SUM
BREAK statement options: AFTER, SUMMARIZE, STYLE=
Other statements: LIBNAME, OPTIONS, WHERE, TITLE
Data set: GROCERY
Formats: $MGRFMT, $DEPTFMT
Details
This example does the following:
• arranges the rows alphabetically by the formatted values of Manager and the
internal values of Department (so that sales for the two departments that sell
nonperishable goods precede sales for the two departments that sell perishable
goods)
• controls the default column width and the spacing between columns
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc report data=grocery;
column manager department sales;
define manager / order order=formatted format=$mgrfmt.;
define department / order order=internal format=$deptfmt.;
define sales / analysis sum format=dollar7.2;
break after manager / summarize
style=[font_style=italic];
where sector='se';
title 'Sales for the Southeast Sector';
run;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options. By default, PROC REPORT runs without the REPORT
window and sends its output to the open output destinations.
proc report data=grocery;
Specify the report columns. The report contains columns for Manager,
Department, and Sales.
column manager department sales;
Define the sort order variables. The values of all variables with the ORDER option
in the DEFINE statement determine the order of the rows in the report. In this
report, PROC REPORT arranges the rows first by the value of Manager (because it
is the first variable in the COLUMN statement) and then by the values of
Department. ORDER= specifies the sort order for a variable. This report arranges
the rows according to the formatted values of Manager and the internal values of
Department (np1, np2, p1, and p2). FORMAT= specifies the formats to use in the
report.
define manager / order order=formatted format=$mgrfmt.;
define department / order order=internal format=$deptfmt.;
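To see why ORDER=INTERNAL matters, the sketch below defines a hypothetical $DEPTDEMO. format whose labels do not sort in the same order as the internal codes. The Meat/Dairy and Produce labels follow the department names used in Example 6; Paper and Canned are assumptions, as are the data set names and sales figures. With ORDER=INTERNAL the rows follow the codes np1, np2, p1, p2, so the nonperishable departments stay together; ORDER=FORMATTED would sort by the labels instead.

```sas
proc format;
   /* Hypothetical labels: sorted alphabetically, they interleave */
   /* the perishable and nonperishable departments.               */
   value $deptdemo 'np1' = 'Paper'
                   'np2' = 'Canned'
                   'p1'  = 'Meat/Dairy'
                   'p2'  = 'Produce';
run;

data sample;   /* hypothetical one-store data, deliberately unsorted */
   input Department $ Sales;
   datalines;
p2 80
np1 50
p1 100
np2 120
;

proc report data=sample out=ordered;
   column department sales;
   /* ORDER=INTERNAL: Paper, Canned, Meat/Dairy, Produce (np1 np2 p1 p2). */
   /* ORDER=FORMATTED would give Canned, Meat/Dairy, Paper, Produce.      */
   define department / order order=internal format=$deptdemo.;
   define sales      / analysis sum format=dollar7.2;
run;
```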
Define the analysis variable. The SUM option calculates the Sum statistic for all
observations that are represented by the current row. In this report, each row
represents only one observation. Therefore, the Sum statistic is the same as the
value of Sales for that observation in the input data set. Using Sales as an analysis
variable in this report enables you to summarize the values for each group and at
the end of the report.
define sales / analysis sum format=dollar7.2;
Select the observations to process. The WHERE statement selects for the report
only the observations for stores in the southeast sector.
where sector='se';
Output: HTML
Output 58.3 Ordering the Rows in a Report
Example 3: Using Aliases to Obtain Multiple Statistics for the Same Variable
Details
The customized summary at the end of this report displays the minimum and
maximum values of Sales over all departments for stores in the southeast sector.
To determine these values, PROC REPORT needs the MIN and MAX statistics for
Sales in every row of the report. However, to keep the report simple, the display of
these statistics is suppressed.
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc report data=grocery;
column manager department sales
sales=salesmin
sales=salesmax;
define manager / order
order=formatted
format=$mgrfmt.
'Manager';
define department / order
order=internal
format=$deptfmt.
'Department';
define sales / analysis sum format=dollar7.2 'Sales';
define salesmin / analysis min noprint;
define salesmax / analysis max noprint;
compute after;
line 'Departmental sales ranged from'
salesmin dollar7.2 " " 'to' " " salesmax dollar7.2;
endcomp;
where sector='se';
title 'Sales for the Southeast Sector';
title2 "for &sysdate";
run;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options. By default, PROC REPORT runs without the REPORT
window and sends its output to the open output destinations.
proc report data=grocery;
Specify the report columns. The report contains columns for Manager and
Department. It also contains three columns for Sales. The column specifications
SALES=SALESMIN and SALES=SALESMAX create aliases for Sales. These aliases
enable you to use a separate definition of Sales for each of the three columns.
column manager department sales
sales=salesmin
sales=salesmax;
Define the sort order variables. The values of all variables with the ORDER option
in the DEFINE statement determine the order of the rows in the report. In this
report, PROC REPORT arranges the rows first by the value of Manager (because it
is the first variable in the COLUMN statement) and then by the values of
Department. The ORDER= option specifies the sort order for a variable. This report
arranges the values of Manager by their formatted values and arranges the values
of Department by their internal values (np1, np2, p1, and p2). FORMAT= specifies
the formats to use in the report. Text in quotation marks specifies column headings.
define manager / order
order=formatted
format=$mgrfmt.
'Manager';
define department / order
order=internal
format=$deptfmt.
'Department';
Define the analysis variable. The value of an analysis variable in any row of a
report is the value of the statistic that is associated with it (in this case Sum),
calculated for all observations that are represented by that row. In a detail report
each row represents only one observation. Therefore, the Sum statistic is the same
as the value of Sales for that observation in the input data set.
define sales / analysis sum format=dollar7.2 'Sales';
Define additional analysis variables for use in the summary. These DEFINE
statements use aliases from the COLUMN statement to create separate columns
for the MIN and MAX statistics for the analysis variable Sales. NOPRINT
suppresses the printing of these statistics. Although PROC REPORT does not print
these values in columns, it has access to them so that it can print them in the
summary.
define salesmin / analysis min noprint;
define salesmax / analysis max noprint;
Select the observations to process. The WHERE statement selects for the report
only the observations for stores in the southeast sector.
where sector='se';
Specify the titles. SYSDATE is an automatic macro variable that returns the date
on which the SAS job or SAS session began. The TITLE2 statement uses double
rather than single quotation marks so that the macro variable resolves.
title 'Sales for the Southeast Sector';
title2 "for &sysdate";
run;
Output: HTML
Output 58.4 Using Aliases to Obtain Multiple Statistics for the Same Variable
Example 4: Displaying Multiple Statistics for One Variable
Details
The report in this example displays six statistics for the sales for each manager's
store.
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc report data=grocery;
column sector manager (Sum Min Max Range Mean Std),sales;
define manager / group format=$mgrfmt.;
define sector / group format=$sctrfmt.;
define sales / format=dollar11.2 ;
title 'Sales Statistics for All Sectors';
run;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report columns. This COLUMN statement creates a column for Sector,
Manager, and each of the six statistics that are associated with Sales.
column sector manager (Sum Min Max Range Mean Std),sales;
Define the group variables and the analysis variable. In this report, Sector and
Manager are group variables. Each detail row of the report consolidates the
information for all observations with the same values of the group variables.
FORMAT= specifies the formats to use in the report.
define manager / group format=$mgrfmt.;
define sector / group format=$sctrfmt.;
define sales / format=dollar11.2 ;
Output: HTML
Output 58.5 Displaying Multiple Statistics for One Variable
Example 5: Consolidating Multiple Observations into One Row of a Report
Details
This example creates a summary report that does the following:
• consolidates information for each combination of Sector and Manager into one
row of the report
• contains default summaries of sales for each sector
• uses one format for sales in detail rows and a different format in summary rows
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc report data=grocery;
column sector manager sales;
define sector / group format=$sctrfmt.'Sector';
define manager / group format=$mgrfmt.'Manager';
define sales / analysis sum format=comma10.2 'Sales';
break after sector / summarize
style=[font_style=italic]
suppress;
compute after;
line 'Combined sales for the northern sectors were '
sales.sum dollar9.2 '.';
endcomp;
compute sales;
if _break_ ne ' ' then
call define(_col_,"format","dollar11.2");
endcomp;
where sector contains 'n';
run;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options. By default, PROC REPORT runs without the REPORT
window and sends its output to the open output destinations.
proc report data=grocery;
Specify the report columns. The report contains columns for Sector, Manager, and
Sales.
column sector manager sales;
Define the group and analysis variables. In this report, Sector and Manager are
group variables. Sales is an analysis variable that is used to calculate the Sum
statistic. Each detail row represents a set of observations that have a unique
combination of formatted values for all group variables. The value of Sales in each
detail row is the sum of Sales for all observations in the group. FORMAT= specifies
the format to use in the report. Text in quotation marks in a DEFINE statement
specifies the column heading.
define sector / group format=$sctrfmt.'Sector';
define manager / group format=$mgrfmt.'Manager';
define sales / analysis sum format=comma10.2 'Sales';
Specify a format for the summary rows. In detail rows, PROC REPORT displays the
value of Sales with the format that is specified in its definition (COMMA10.2). The
compute block specifies an alternate format (DOLLAR11.2) to use in the current
column on summary rows. A row is a summary row when the value of _BREAK_ is
not blank.
compute sales;
if _break_ ne ' ' then
call define(_col_,"format","dollar11.2");
endcomp;
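As a hedged variation on this compute block, the sketch below changes both the format of the Sales cell and the style of the entire summary row. CALL DEFINE with _ROW_ applies a STYLE attribute to the whole row. The small data set and the bold style are assumptions for illustration, and formats are omitted so that the step runs on its own.

```sas
data sample;   /* hypothetical: two stores in each northern sector */
   input Sector $ Manager $ Sales;
   datalines;
ne 7 500
ne 8 750
nw 3 600
nw 4 450
;

proc report data=sample;
   column sector manager sales;
   define sector  / group;
   define manager / group;
   define sales   / analysis sum format=comma10.2;
   break after sector / summarize;
   compute sales;
      /* _BREAK_ is blank on detail rows and nonblank on the */
      /* summary rows that the BREAK statement creates.       */
      if _break_ ne ' ' then do;
         call define(_col_, "format", "dollar11.2");
         call define(_row_, "style", "style=[font_weight=bold]");
      end;
   endcomp;
run;
```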
Select the observations to process. The WHERE statement selects for the report
only the observations for stores in the northeast and northwest sectors. The TITLE
statement specifies the title.
where sector contains 'n';
Output: HTML
Output 58.6 Consolidating Multiple Observations into One Row of a Report
Example 6: Creating a Column for Each Value of a Variable
Details
The report in this example does the following:
• consolidates multiple observations into one row
• contains a column for each value of Department that is selected for the report
(the departments that sell perishable items)
• contains a variable that is not in the input data set
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc report data=grocery split='*';
column sector manager department,sales perish;
define sector / group format=$sctrfmt. 'Sector' '';
define manager / group format=$mgrfmt. 'Manager* ';
define department / across format=$deptfmt. 'Department';
define sales / analysis sum format=dollar11.2 ' ';
define perish / computed format=dollar11.2
'Perishable*Total';
compute perish;
perish=sum(_c3_, _c4_);
endcomp;
compute after;
line 'Combined sales for meat and dairy : '
_c3_ dollar11.2 '';
line 'Combined sales for produce : '
_c4_ dollar11.2 '';
line 'Combined sales for all perishables: '
_c5_ dollar11.2 '';
endcomp;
where sector contains 'n'
and (department='p1' or department='p2');
title 'Sales Figures for Perishables in Northern Sectors';
run;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options. By default, PROC REPORT runs without the REPORT
window and sends its output to the open output destinations. SPLIT= defines the
split character as an asterisk (*) because the default split character (/) is part of the
name of a department.
proc report data=grocery split='*';
Specify the report columns. Department and Sales are separated by a comma in
the COLUMN statement, so they collectively determine the contents of the column
that they define. Each item generates a heading, but the heading for Sales is set to
blank in its definition. Because Sales is an analysis variable, its values fill the cells
that are created by these two variables.
column sector manager department,sales perish;
define sector / group format=$sctrfmt. 'Sector' '';
define manager / group format=$mgrfmt. 'Manager* ';
Define the across variable. PROC REPORT creates a column and a column heading
for each formatted value of the across variable Department. PROC REPORT orders
the columns by these values. PROC REPORT also generates a column heading that
spans all these columns. Quoted text in the DEFINE statement for Department
customizes this heading.
define department / across format=$deptfmt. 'Department';
Define the analysis variable. Sales is an analysis variable that is used to calculate
the sum statistic. In each case, the value of Sales is the sum of Sales for all
observations in one department in one group. (In this case, the value represents a
single observation.)
define sales / analysis sum format=dollar11.2 ' ';
Define the computed variable. The COMPUTED option indicates that PROC
REPORT must compute values for Perish. You compute the variable's values in a
compute block that is associated with Perish.
define perish / computed format=dollar11.2
'Perishable*Total';
Calculate values for the computed variable. This compute block computes the
value of Perish from the values for the Meat/Dairy department and the Produce
department. Because the variables Sales and Department collectively define these
columns, there is no way to identify the values to PROC REPORT by name.
Therefore, the assignment statement uses column numbers to unambiguously
specify the values to use. Each time PROC REPORT needs a value for Perish, it
sums the values in the third and fourth columns of that row of the report.
compute perish;
perish=sum(_c3_, _c4_);
endcomp;
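One way to confirm which column numbers an across variable produces is to send the report to an output data set. In an OUT= data set, PROC REPORT names the across columns by their column numbers (_C3_, _C4_, and so on). The data set below is a hypothetical four-row subset, and the names SAMPLE and PERISHOUT are assumptions.

```sas
data sample;   /* hypothetical subset: one store in each northern sector */
   input Sector $ Manager $ Department $ Sales;
   datalines;
ne 7 p1 190
ne 7 p2 86
nw 3 p1 600
nw 3 p2 30
;

proc report data=sample out=perishout;
   column sector manager department,sales perish;
   define sector     / group;
   define manager    / group;
   define department / across;
   define sales      / analysis sum format=dollar11.2 ' ';
   define perish     / computed format=dollar11.2 'Perishable Total';
   compute perish;
      /* Column 1 = Sector, 2 = Manager, 3 = p1 sales, */
      /* 4 = p2 sales, 5 = Perish.                      */
      perish = sum(_c3_, _c4_);
   endcomp;
run;

/* The across columns appear in PERISHOUT as _C3_ and _C4_. */
proc print data=perishout;
run;
```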
Output: HTML
Output 58.7 Creating a Column for Each Value of a Variable
Example 7: Writing a Customized Summary on Each Page
Details
The report in this example displays a record of one day's sales for each store. The
rows are arranged so that all the information about one store is together, and the
information for each store begins on a new page. Some variables appear in columns.
Others appear only in the page heading that identifies the sector and the store's
manager.
The heading that appears at the top of each page is created with the _PAGE_
argument in the COMPUTE statement.
The text that appears at the bottom of the page depends on the total of Sales for
the store. Only the first two pages of the report appear here.
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc report data=grocery;
title 'Sales for Individual Stores';
column sector manager department sales Profit;
define sector / group noprint;
define manager / group noprint;
define profit / computed format=dollar11.2;
define sales / analysis sum format=dollar11.2;
define department / group format=$deptfmt.;
compute profit;
if department='np1' or department='np2'
then profit=0.4*sales.sum;
else profit=0.25*sales.sum;
endcomp;
compute before _page_ / style={just=left};
line sector $sctrfmt. ' Sector';
line 'Store managed by ' manager $mgrfmt.;
endcomp;
break after manager / summarize style=[font_style=italic] page;
compute after manager;
length text $ 35;
if sales.sum lt 500 then
text='Sales are below the target region.';
else if sales.sum ge 500 and sales.sum lt 1000 then
text='Sales are in the target region.';
else if sales.sum ge 1000 then
text='Sales exceeded goal!';
line text $35.;
endcomp;
run;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options. By default, PROC REPORT runs without the REPORT
window and sends its output to the open output destinations.
proc report data=grocery;
Specify the report columns. The report contains columns for Sector, Manager,
Department, Sales, and Profit, but the NOPRINT option suppresses the printing of
the columns for Sector and Manager. The page heading (created later in the
program) includes their values. To get these variable values into the page heading,
Sector and Manager must be in the COLUMN statement.
column sector manager department sales Profit;
Define the group, computed, and analysis variables. In this report, Sector,
Manager, and Department are group variables. Each detail row of the report
consolidates the information for all observations with the same values of the group
variables. Profit is a computed variable whose values are calculated in the next
section of the program. FORMAT= specifies the formats to use in the report.
Create a customized page heading. This compute block executes at the top of each
page, after PROC REPORT writes the title. It writes the page heading for the
current manager's store. The STYLE= option left-justifies the text in the LINE
statements. The LINE statements write a variable value with the format specified
immediately after the variable's name.
compute before _page_ / style={just=left};
line sector $sctrfmt. ' Sector';
line 'Store managed by ' manager $mgrfmt.;
endcomp;
Specify the length of the customized summary text. The LENGTH statement
assigns a length of 35 to the temporary variable TEXT. In this particular case, the
LENGTH statement is unnecessary because the longest version appears in the first
IF/THEN statement. However, using the LENGTH statement ensures that even if
the order of the conditional statements changes, TEXT is long enough to hold the
longest version.
length text $ 35;
Specify the conditional logic for the customized summary text. You cannot use
the LINE statement in conditional statements (IF-THEN, IF-THEN/ELSE, and
SELECT) because it does not take effect until PROC REPORT has executed all
other statements in the compute block. These IF-THEN/ELSE statements assign a
value to TEXT based on the value of Sales.sum in the summary row. A LINE
statement writes that variable, whatever its value happens to be.
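The text lists SELECT among the conditional statements that cannot contain a LINE statement. The same logic can be written as a SELECT group, with LINE kept outside it. The following self-contained variation uses hypothetical data and omits the page-heading and NOPRINT details; it is a sketch, not the example's own program.

```sas
data sample;   /* hypothetical sales for two stores */
   input Manager $ Department $ Sales;
   datalines;
3 p1 600
3 p2 30
4 p1 250
4 p2 73
;

proc report data=sample;
   column manager department sales;
   define manager    / group;
   define department / group;
   define sales      / analysis sum format=dollar11.2;
   break after manager / summarize;
   compute after manager;
      length text $ 35;
      /* WHEN conditions are tested in order, so the second test */
      /* is reached only when Sales.sum is at least 500.          */
      select;
         when (sales.sum lt 500)  text = 'Sales are below the target region.';
         when (sales.sum lt 1000) text = 'Sales are in the target region.';
         otherwise                text = 'Sales exceeded goal!';
      end;
      /* LINE stays outside the SELECT group. */
      line text $35.;
   endcomp;
run;
```

For these figures, store 3 totals $630 and is reported as in the target region; store 4 totals $323 and is reported as below it.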
Output: HTML
Output 58.8 Writing a Customized Summary on Each Page
Example 8: Displaying a Calculated Percentage Column in a Report
Details
The summary report in this example shows the total sales for each store and the
percentage that these sales represent of sales for all stores. Each of these columns
has its own heading. A single heading also spans all the columns.
The report includes a computed character variable, COMMENT, that flags stores
with an unusually high percentage of sales.
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc report data=grocery;
title;
column ('Individual Store Sales as a Percent of All Sales'
sector manager sales,(sum pctsum) comment);
define manager / group
format=$mgrfmt.;
define sector / group
format=$sctrfmt.;
define sales / format=dollar11.2
'';
define sum / format=dollar9.2
'Total Sales';
define pctsum / 'Percent of Sales' format=percent6.;
define comment / computed style(column)=[cellwidth=2.5in];
compute comment / char length=40;
if sales.pctsum gt .15 and _break_ = ' '
then comment='Sales substantially above expectations.';
else comment=' ';
endcomp;
rbreak after / summarize style=[font_style=italic];
run;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options. DATA= specifies the input data set. The TITLE
statement without any text cancels all existing titles.
proc report data=grocery;
title;
Specify the report columns. The COLUMN statement uses the text in quotation
marks as a spanning heading. The heading spans all the columns in the report
because they are all included in the pair of parentheses that contains the heading.
The COLUMN statement associates two statistics with Sales: Sum and Pctsum.
The Sum statistic sums the values of Sales for all observations that are included in
a row of the report. The Pctsum statistic expresses a row's Sales sum as a fraction
of the sum of Sales for all observations in the report.
column ('Individual Store Sales as a Percent of All Sales'
sector manager sales,(sum pctsum) comment);
Define the group and analysis columns. In this report, Sector and Manager are
group variables. Each detail row represents a set of observations that have a unique
combination of formatted values for all group variables. Sales is, by default, an
analysis variable that is used to calculate the Sum statistic. However, because
statistics are associated with Sales in the COLUMN statement, those statistics
override the default. FORMAT= specifies the formats to use in the report. Text
between quotation marks specifies the column heading.
define manager / group
format=$mgrfmt.;
define sector / group
format=$sctrfmt.;
define sales / format=dollar11.2
'';
define sum / format=dollar9.2
'Total Sales';
Define the percentage and computed columns. The DEFINE statement for Pctsum
specifies a column heading and a format. The PERCENT. format presents the value
of Pctsum as a percentage rather than a decimal. The DEFINE statement for
COMMENT defines a computed variable and assigns it a column.
define pctsum / 'Percent of Sales' format=percent6.;
define comment / computed style(column)=[cellwidth=2.5in];
Specify the conditional logic for the computed variable. For every store where
sales exceed 15% of the sales for all stores, the compute block creates a comment
that says "Sales substantially above expectations." Of course, on the summary
row for the report, the value of Pctsum is 1 (displayed as 100%). However, it is
inappropriate to flag this row as having exceptional sales. The automatic variable
_BREAK_ distinguishes
detail rows from summary rows. In a detail row, the value of _BREAK_ is blank. The
THEN statement executes only on detail rows where the value of Pctsum exceeds
0.15.
compute comment / char length=40;
if sales.pctsum gt .15 and _break_ = ' '
then comment='Sales substantially above expectations.';
else comment=' ';
endcomp;
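The arithmetic behind Pctsum and the 0.15 test can be written as follows, where r is a row of the report. The dollar figures in the worked line are hypothetical, and the second line shows why the report's summary row would always pass the threshold if _BREAK_ were not checked.

```latex
\text{Pctsum}_r = \frac{\text{Sales.sum}_r}{\text{Sales.sum}_{\text{all rows}}},
\qquad
\frac{\$1{,}200}{\$6{,}000} = 0.20 > 0.15 \;\Rightarrow\; \text{flagged}

\text{Summary row: } \frac{\$6{,}000}{\$6{,}000} = 1 \;(\text{PERCENT6. displays } 100\%)
```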
Produce the report summary. This RBREAK statement creates a default summary
at the end of the report. SUMMARIZE writes the values of Sales.sum and
Sales.pctsum in the summary line. STYLE= italicizes the summary line.
rbreak after / summarize style=[font_style=italic];
run;
Output: HTML
Output 58.9 Calculating Percentages
Details
This example illustrates how PROC REPORT handles missing values for group (or
order or across) variables with and without the MISSING option. The differences in
the reports are apparent if you compare the values of N for each row and compare
the totals in the default summary at the end of the report.
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Create the GROCMISS data set. GROCMISS is identical to GROCERY except that it
contains some observations with missing values for Sector, Manager, or both.
data grocmiss;
input Sector $ Manager $ Department $ Sales @@;
datalines;
se 1 np1 50 . 1 p1 100 se . np2 120 se 1 p2 80
se 2 np1 40 se 2 p1 300 se 2 np2 220 se 2 p2 70
nw 3 np1 60 nw 3 p1 600 . 3 np2 420 nw 3 p2 30
nw 4 np1 45 nw 4 p1 250 nw 4 np2 230 nw 4 p2 73
nw 9 np1 45 nw 9 p1 205 nw 9 np2 420 nw 9 p2 76
sw 5 np1 53 sw 5 p1 130 sw 5 np2 120 sw 5 p2 50
. . np1 40 sw 6 p1 350 sw 6 np2 225 sw 6 p2 80
ne 7 np1 90 ne . p1 190 ne 7 np2 420 ne 7 p2 86
ne 8 np1 200 ne 8 p1 300 ne 8 np2 420 ne 8 p2 125
;
Specify the report options. By default, PROC REPORT runs without the REPORT
window and sends its output to the open output destinations.
proc report data=grocmiss;
Specify the report columns. The report contains columns for Sector, Manager, the
N statistic, and Sales.
column sector manager N sales;
Define the group and analysis variables. In this report, Sector and Manager are
group variables. Sales is, by default, an analysis variable that is used to calculate
the Sum statistic. Each detail row represents a set of observations that have a
unique combination of formatted values for all group variables. The value of Sales
in each detail row is the sum of Sales for all observations in the group. In this PROC
REPORT step, the MISSING option is not specified, so the procedure does not include
observations that have a missing value for either group variable. FORMAT= specifies
formats to use in the report.
define sector / group format=$sctrfmt.;
define manager / group format=$mgrfmt.;
define sales / format=dollar9.2;
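The second PROC REPORT step, which adds the MISSING option, is not reproduced in this excerpt. The sketch below shows the option on a four-observation hypothetical data set (MISSDEMO, NOMISS, and WITHMISS are assumed names) so that the effect on N and on the report totals is easy to check by hand. Formats are omitted so that the step runs on its own.

```sas
data missdemo;   /* hypothetical: two rows have a missing group value */
   input Sector $ Manager $ Sales;
   datalines;
se 1 50
.  1 100
se . 120
nw 3 600
;

/* Without MISSING: rows with a blank Sector or Manager are dropped. */
/* The summary line shows N = 2 and Sales = 650.                     */
proc report data=missdemo out=nomiss;
   column sector manager n sales;
   define sector  / group;
   define manager / group;
   define sales   / analysis sum format=dollar9.2;
   rbreak after / summarize;
run;

/* With MISSING: missing values form their own groups. */
/* The summary line shows N = 4 and Sales = 870.       */
proc report data=missdemo missing out=withmiss;
   column sector manager n sales;
   define sector  / group;
   define manager / group;
   define sales   / analysis sum format=dollar9.2;
   rbreak after / summarize;
run;
```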
Output: HTML
Output 58.10 Output with No Missing Values
Output: HTML
Output 58.11 Output with Missing Values
Example 10: Creating an Output Data Set and Storing Computed Variables
Features:
Data set option: WHERE=
Other statements: LIBNAME, OPTIONS, TITLE
Data set: GROCERY
Format: $MGRFMT
Details
This example uses WHERE processing as it builds an output data set. This
technique enables you to do WHERE processing after you have consolidated
multiple observations into a single row.
Note: This technique is needed because the WHERE statement subsets the input
observations before PROC REPORT processes them. You cannot use it to subset on
the results of analysis variables or on any other value that PROC REPORT calculates.
The first PROC REPORT step creates a report in which each row represents all the
observations from the input data set for a single manager. The second PROC
REPORT step builds a report from the output data set.
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options and columns. By default, PROC REPORT runs without
the REPORT window and sends its output to the open output destinations. OUT=
creates the output data set PROFIT. The output data set contains a variable for
each column in the report (Manager, Sales, and the computed column manager_pct)
as well as for the variable _BREAK_, which identifies summary rows; the WHERE=
condition _break_='' keeps only detail rows. Each
observation in the data set represents a row of the report. Because Manager is a
group variable and Sales is an analysis variable that is used to calculate the Sum
statistic, each row in the report (and therefore each observation in the output data
set) represents multiple observations from the input data set. In particular, each
value of Sales in the output data set is the total of all values of Sales for that
manager. The WHERE= data set option in the OUT= option filters those rows as
PROC REPORT creates the output data set. Only those observations with sales
that exceed $1,000 become observations in the output data set.
proc report data=grocery
out=profit( where=(sales gt 1000 and _break_='') );
column manager sales manager_pct;
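Define the group and computed variables and compute the values for the percent of
sales. The COMPUTE BEFORE block executes at the beginning of the report, when
Sales.sum holds the sum of Sales for the whole report, and stores that value in the
temporary variable total_sales. The compute block for manager_pct then divides
each manager's Sales.sum by total_sales.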
define manager / group;
define manager_pct / computed;
compute before;
total_sales = sales.sum;
endcomp;
compute manager_pct;
manager_pct = sales.sum /total_sales;
endcomp;
run;
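The second PROC REPORT step is not shown in this excerpt. A minimal sketch of the two-step technique on hypothetical data follows; the data set SAMPLE, the sales figures, the FORMAT= values, and the heading text are assumptions. Step 1 repeats the pattern above so that the sketch runs on its own. Because each observation in PROFIT already represents one manager, the analysis sum for each row in step 2 equals the stored value.

```sas
data sample;   /* hypothetical: three managers, five observations */
   input Manager $ Sales;
   datalines;
1 700
1 500
2 300
3 900
3 400
;

/* Step 1: keep only managers whose total exceeds 1000. */
proc report data=sample
            out=profit(where=(sales gt 1000 and _break_=''));
   column manager sales manager_pct;
   define manager     / group;
   define manager_pct / computed;
   compute before;
      total_sales = sales.sum;   /* report total: 2800 */
   endcomp;
   compute manager_pct;
      manager_pct = sales.sum / total_sales;
   endcomp;
run;

/* Step 2 (sketch): report on the filtered, precomputed rows. */
proc report data=profit;
   column manager sales manager_pct;
   define manager     / group;
   define sales       / analysis sum format=dollar11.2;
   define manager_pct / analysis sum format=percent8.1 'Percent of Sales';
run;
```

Managers 1 ($1,200) and 3 ($1,300) pass the filter; manager 2 ($300) does not, although its sales still count toward the $2,800 total used for the percentages.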
Output: HTML
Here is the data set created by PROC REPORT. It is used as the input data set in the
next PROC REPORT step.
Output: HTML
Output 58.13 Report Based on the Output Data Set
Example 11: Using a Format to Create Groups
Details
This example shows how to use formats to control the number of groups that
PROC REPORT creates. The program creates a format for Department that
classifies the four departments as one of two types: perishable or nonperishable.
Consequently, when Department is an across variable, PROC REPORT creates only
two columns instead of four. The column heading is the formatted value of the
variable.
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
proc format;
value $perish 'p1','p2'='Perishable'
'np1','np2'='Nonperishable';
run;
proc report data=grocery;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Create the $PERISH. format. PROC FORMAT creates a format for Department.
This variable has four different values in the data set, but the format has only two
values.
proc format;
value $perish 'p1','p2'='Perishable'
'np1','np2'='Nonperishable';
run;
Specify the report options. By default, the REPORT procedure runs without the
REPORT window and sends its output to the open output destinations.
proc report data=grocery;
Specify the report columns. Department and Sales are separated by a comma in
the COLUMN statement, so they collectively determine the contents of the column
that they define. Because Sales is an analysis variable, its values fill the cells that
are created by these two variables. The report also contains a column for Manager
and a column for Sales by itself (which is the sales for all departments).
column manager department,sales sales;
Define the group and across variables. Manager is a group variable. Each detail row
of the report consolidates the information for all observations with the same value
of Manager. Department is an across variable. PROC REPORT creates a column and
a column heading for each formatted value of Department. ORDER=FORMATTED
Define the analysis variable. Sales is an analysis variable that is used to calculate
the Sum statistic. Sales appears twice in the COLUMN statement, and the same
definition applies to both occurrences. FORMAT= specifies the format to use in the
report. STYLE= specifies the width of the column. Notice that the column headings
for the columns that both Department and Sales create are a combination of the
heading for Department and the (default) heading for Sales.
define sales / analysis sum
format=dollar9.2 style=[cellwidth=13];
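Because the GROCERY data set and the full DEFINE statements for Manager and Department are not reproduced here, the following is a separate minimal sketch of the same technique on SASHELP.CLASS. The AGEGRP. format is hypothetical: it maps the six distinct ages (11 through 16) onto two formatted values, so the across variable Age should produce two columns instead of six.

```sas
/* Hypothetical format: six age values collapse into two formatted values */
proc format;
   value agegrp 11-13 = 'Age 11-13'
                14-16 = 'Age 14-16';
run;

proc report data=sashelp.class;
   /* Age and Weight share a column; Weight fills the cells under each age group */
   column sex age,weight weight;
   define sex    / group;
   define age    / across format=agegrp. order=formatted 'Age group';
   /* Same definition applies to both occurrences of Weight */
   define weight / analysis sum format=8.1;
run;
```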
Output: HTML
Output 58.14 Using a Format to Create Groups
Example 12: Using Multilabel Formats
Details
This example uses a multilabel format to create a report. It shows how to do the
following:
n specify a multilabel format in the VALUE statement of PROC FORMAT
n activate multilabel format processing by using the MLF option in the DEFINE
statement
n use NOTSORTED and PRELOADFMT to display the values in the order in which they
are listed in PROC FORMAT.
Program
proc format;
value agelfmt (multilabel notsorted)
11='11'
12='12'
13='13'
14='14'
15='15'
16='16'
11-12='11 or 12'
13-14='13 or 14'
15-16='15 or 16'
low-13='13 and below'
14-high='14 and above' ;
run;
ods html file="example.html";
title "GROUP Variable with MLF Option";
proc report data=sashelp.class;
col age ('Mean' height weight);
define age / group mlf format=agelfmt. 'Age Group' order=data
preloadfmt;
define height / mean format=6.2 'Height (in.)';
define weight / mean format=6.2 'Weight (lbs.)';
run;
Program Description
Create the AGELFMT. format. The FORMAT procedure creates a multilabel format
for ages by using the MULTILABEL option. A multilabel format is one in which
multiple labels can be assigned to the same value. Each value is represented in the
table for each range in which it occurs. The NOTSORTED option stores the ranges in
the order in which they are listed, so that this order can be kept in the report.
proc format;
value agelfmt (multilabel notsorted)
11='11'
12='12'
13='13'
14='14'
15='15'
16='16'
11-12='11 or 12'
13-14='13 or 14'
15-16='15 or 16'
low-13='13 and below'
14-high='14 and above' ;
run;
Specify a title.
title "GROUP Variable with MLF Option";
Specify the report options. By default, the REPORT procedure runs without the
REPORT window and sends its output to the open output destination.
proc report data=sashelp.class;
Specify the report columns. The report contains columns for Age, Height, and
Weight. The text 'Mean' in parentheses is a spanning heading over the Height and
Weight columns, whose means are calculated.
col age ('Mean' height weight);
Define the variables. AGE is a group variable. The AGE variable uses the MLF
option to activate multilabel format processing. Use MLF only with group and
across variables. FORMAT= specifies that the multilabel format AGELFMT. is used.
The ORDER=DATA option, together with the NOTSORTED option in PROC FORMAT,
keeps the desired sort order. The PRELOADFMT option specifies that the format is
preloaded for the variable. Each detail row of the report consolidates the
information for all observations with the same values of the group variables. The
mean is calculated for the HEIGHT and WEIGHT column values.
define age / group mlf format=agelfmt. 'Age Group' order=data
preloadfmt;
define height / mean format=6.2 'Height (in.)';
define weight / mean format=6.2 'Weight (lbs.)';
run;
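MLF also works with across variables. As a hedged sketch (the AGERANGE. format name is hypothetical), the following report counts each student under every overlapping age range that matches, so the 'All ages' column repeats students who already appear in the other two columns.

```sas
/* Hypothetical multilabel format; NOTSORTED keeps the ranges in listed order */
proc format;
   value agerange (multilabel notsorted)
      low-13   = '13 and below'
      14-high  = '14 and above'
      low-high = 'All ages';
run;

proc report data=sashelp.class;
   column sex age,height;
   define sex    / group;
   /* MLF: each Age value is counted under every label that matches it */
   define age    / across mlf format=agerange. order=data preloadfmt 'Age range';
   define height / analysis n 'Students';
run;
```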
Output: HTML
Output 58.15 Using Multilabel Formats
Example 13: Specifying Style Elements for ODS Output in Multiple Statements
Other features:
LIBNAME statement
OPTIONS statement
TITLE statement
WHERE statement
ODS PDF statement
ODS RTF statement
Data set: GROCERY
Format: $MGRFMT
Format: $DEPTFMT
Details
This example creates HTML, PDF, and RTF files and sets the style elements for
each location in the report in the PROC REPORT statement. It then overrides some
of these settings by specifying style elements in other statements. For more
information, see “Style Elements and Style Attributes for Table Regions” on page
2149.
Program
libname proclib 'SAS-library';
options fmtsearch=(proclib);
ods pdf file='external-PDF-file';
ods rtf file='external-RTF-file';
proc report data=grocery
style(report)=[cellspacing=5 borderwidth=10 bordercolor=blue]
style(header)=[color=yellow
fontstyle=italic fontsize=6]
style(column)=[color=moderate brown
fontfamily=helvetica fontsize=4]
style(lines)=[color=white backgroundcolor=black
fontstyle=italic fontweight=bold fontsize=5]
style(summary)=[color=cx3e3d73 backgroundcolor=cxaeadd9
fontfamily=helvetica fontsize=3 textalign=r];
column manager department sales;
define manager / order
order=formatted
format=$mgrfmt.
'Manager'
style(header)=[color=white
backgroundcolor=black];
define department / order
order=internal
format=$deptfmt.
'Department'
style(column)=[fontstyle=italic];
break after manager / summarize;
compute after manager
/ style=[fontstyle=roman fontsize=3 fontweight=bold
backgroundcolor=white color=black];
line 'Subtotal for ' manager $mgrfmt. 'is '
sales.sum dollar7.2 '.';
endcomp;
compute sales;
if sales.sum>100 and _break_=' ' then
call define(_col_, "style",
"style=[backgroundcolor=yellow
fontfamily=helvetica
fontweight=bold]");
endcomp;
compute after;
line 'Total for all departments is: '
sales.sum dollar7.2 '.';
endcomp;
where sector='se';
title 'Sales for the Southeast Sector';
run;
ods pdf close;
ods rtf close;
Program Description
Declare the PROCLIB library. The PROCLIB library is used to store user-created
formats.
libname proclib 'SAS-library';
Specify the format search library. The SAS system option FMTSEARCH= adds the
SAS library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the ODS output filenames. By opening multiple ODS destinations, you can
produce multiple output files in a single execution. HTML output is produced by
default. The ODS PDF statement produces output in Portable Document Format
(PDF). The ODS RTF statement produces output in Rich Text Format (RTF). The
output from PROC REPORT goes to each of these files.
ods pdf file='external-PDF-file';
ods rtf file='external-RTF-file';
Specify the report options. By default, PROC REPORT runs without the REPORT
window. In this case, SAS writes the output to the traditional procedure output, the
HTML body file, and the RTF and PDF files.
proc report data=grocery
Specify the style attributes for the report. This STYLE= option sets the style
element for the structural part of the report. Because no style element is specified,
PROC REPORT uses all the style attributes of the default style element for this
location except for the ones that are specified here.
style(report)=[cellspacing=5 borderwidth=10 bordercolor=blue]
Specify the style attributes for the column headings. This STYLE= option sets the
style element for all column headings. Because no style element is specified, PROC
REPORT uses all the style attributes of the default style element for this location
except for the ones that are specified here.
style(header)=[color=yellow
fontstyle=italic fontsize=6]
Specify the style attributes for the report columns. This STYLE= option sets the
style element for all the cells in all the columns. Because no style element is
specified, PROC REPORT uses all the style attributes of the default style element
for this location except for the ones that are specified here.
style(column)=[color=moderate brown
fontfamily=helvetica fontsize=4]
Specify the style attributes for the compute block lines. This STYLE= option sets
the style element for all the LINE statements in all compute blocks. Because no
style element is specified, PROC REPORT uses all the style attributes of the
default style element for this location except for the ones that are specified here.
style(lines)=[color=white backgroundcolor=black
fontstyle=italic fontweight=bold fontsize=5]
Specify the style attributes for the report summaries. This STYLE= option sets the
style element for all the default summary lines. Because no style element is
specified, PROC REPORT uses all the style attributes of the default style element
for this location except for the ones that are specified here.
style(summary)=[color=cx3e3d73 backgroundcolor=cxaeadd9
fontfamily=helvetica fontsize=3 textalign=r];
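The layering of report-level and item-level STYLE= options can be tried on a small scale. The following is a minimal sketch on SASHELP.CLASS (not the GROCERY report): the PROC REPORT statement sets header and column attributes for every column, and the DEFINE statement for Weight overrides only the attribute that it names.

```sas
proc report data=sashelp.class(obs=5)
            style(header)=[backgroundcolor=navy color=white fontweight=bold]
            style(column)=[fontfamily=arial fontsize=9pt];
   column name age weight;
   define name   / display;
   define age    / display;
   /* Only COLOR is overridden here; the font family and size still
      come from STYLE(COLUMN)= in the PROC REPORT statement */
   define weight / display style(column)=[color=red];
run;
```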
Specify the report columns. The report contains columns for Manager,
Department, and Sales.
column manager department sales;
Define the first sort order variable. In this report Manager is an order variable.
PROC REPORT arranges the rows first by the value of Manager (because it is the
first variable in the COLUMN statement). ORDER= specifies that values of Manager
are arranged according to their formatted values. FORMAT= specifies the format to
use for this variable. Text in quotation marks specifies the column heading.
define manager / order
order=formatted
format=$mgrfmt.
'Manager'
Specify the style attributes for the first sort order variable column heading. The
STYLE= option sets the foreground and background colors of the Manager column
heading.
style(header)=[color=white
backgroundcolor=black];
Define the second sort order variable. In this report Department is an order
variable. PROC REPORT arranges the rows first by the value of Manager (because
it is the first variable in the COLUMN statement), then by the value of Department.
ORDER= specifies that values of Department are arranged according to their
internal values. FORMAT= specifies the format to use for this variable. Text in
quotation marks specifies the column heading.
define department / order
order=internal
format=$deptfmt.
'Department'
Specify the style attributes for the second sort order variable column. The
STYLE= option sets the font style of the cells in the Department column to italic. The
other style attributes for these cells are the ones that were established for the
COLUMN location in the PROC REPORT statement.
style(column)=[fontstyle=italic];
Specify the text for the customized summary. The LINE statement places the
quoted text and the values of Manager and Sales.sum (with the formats $MGRFMT.
and DOLLAR7.2) in the summary. An ENDCOMP statement must end the compute
block.
line 'Subtotal for ' manager $mgrfmt. 'is '
sales.sum dollar7.2 '.';
endcomp;
Produce a customized background for the analysis column. This compute block
specifies a yellow background color and a bold Helvetica font for all cells in the
Sales column that contain values greater than 100 and that are not summary lines.
compute sales;
if sales.sum>100 and _break_=' ' then
Select the observations to process. The WHERE statement selects for the report
only the observations for stores in the southeast sector.
where sector='se';
Output: HTML
Output 58.16 Style Elements for ODS HTML Output in Multiple Statements
Output: PDF
Output 58.17 Style Elements for ODS PDF Output in Multiple Statements
Output: RTF
Output 58.18 Style Elements for ODS RTF Output in Multiple Statements
Details
This example uses PROC REPORT and ODS style attributes to create a table. The
example does the following:
n sets the cell width of the total report
n defines the cell width for the columns in the ODS output.
Refer to “Style Attributes Tables” in SAS Output Delivery System: Advanced Topics
for details.
Note: The WIDTH= option in the DEFINE statement affects the column width only in
LISTING output.
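To see the difference between WIDTH= and CELLWIDTH=, the following sketch sends the same report to LISTING and HTML. The file name widths.html is a hypothetical placeholder. WIDTH= should change only the LISTING layout, while the CELLWIDTH= style attribute should change only the HTML table.

```sas
ods listing;
ods html file="widths.html";   /* hypothetical output file */

proc report data=sashelp.class(obs=5);
   column name age;
   /* WIDTH= is honored only by the LISTING destination */
   define name / display width=12;
   /* CELLWIDTH= is a style attribute honored by ODS destinations such as HTML */
   define age  / display width=5 style(column)=[cellwidth=.75in];
run;

ods html close;
```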
Program
proc report data=sashelp.class;
col name age sex;
define name / style(column)=[cellwidth=1in];
define age / style(column)=[cellwidth=.5in];
define sex / style(column)=[cellwidth=.5in];
title "Using the CELLWIDTH= Style with PROC REPORT";
run;
Program Description
Specify the cell width for all the columns in the ODS output. By default, the
REPORT procedure runs without the REPORT window and sends its output to the
open output destination. ODS HTML is the output destination used as the default
in this example.
proc report data=sashelp.class;
Define the column widths by using the CELLWIDTH= style attribute. The STYLE=
option in each DEFINE statement sets the width of the NAME, AGE, and SEX columns.
define name / style(column)=[cellwidth=1in];
define age / style(column)=[cellwidth=.5in];
define sex / style(column)=[cellwidth=.5in];
Specify a table title. Provide a title for the table and run the SAS program.
title "Using the CELLWIDTH= Style with PROC REPORT";
run;
Output: HTML
Output 58.19 Using the CELLWIDTH= Style Attributes with PROC REPORT
Example 15: Using STYLE/MERGE in PROC REPORT CALL DEFINE Statement
Details
This example uses PROC REPORT to create a table that uses the STYLE/MERGE
option in the CALL DEFINE statement.
Program
proc report data=sashelp.class;
col name sex age height weight;
define name--weight / display;
compute sex;
if sex = 'M' then
call define('name', "style", "style=[background=cyan]");
endcomp;
compute age;
if age > 13 then
call define('name', "style/merge", "style=[color=red]");
endcomp;
title "Using STYLE/MERGE Style with PROC REPORT";
run;
Program Description
Specify the input data set. By default, the REPORT procedure runs without the
REPORT window and sends its output to the open output destination. ODS HTML is
the output destination used as the default in this example.
proc report data=sashelp.class;
Apply style attributes. Apply a cyan background color to the Name cells of male
students.
compute sex;
if sex = 'M' then
call define('name', "style", "style=[background=cyan]");
endcomp;
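Merge style attributes. Apply the color red to the names of students who are older
than 13. Because STYLE/MERGE is used, the red text color is merged with the cyan
background that was already applied, so the names of males who are older than 13
appear in red on a cyan background.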
compute age;
if age > 13 then
call define('name', "style/merge", "style=[color=red]");
endcomp;
Specify a table title. Provide a title for the table and run the SAS program.
title "Using STYLE/MERGE Style with PROC REPORT";
run;
Output: HTML
Output 58.20 Using STYLE/MERGE
Example 16: Using STYLE/REPLACE in PROC REPORT CALL DEFINE Statement
Features: STYLE/REPLACE
Other features: TITLE statement
Details
This example uses PROC REPORT to create a table that uses the STYLE/REPLACE
option in the CALL DEFINE statement.
Program
proc report data=sashelp.class;
col name sex age height weight;
define name--weight / display;
compute sex;
if sex = 'M' then
call define('name', "style", "style=[background=cyan]");
endcomp;
compute age;
if age > 13 then
call define('name', "style/replace", "style=[color=red]");
endcomp;
title "Using STYLE/REPLACE";
run;
Program Description
Specify the input data set. By default, the REPORT procedure runs without the
REPORT window and sends its output to the open output destination. ODS HTML is
the output destination used as the default in this example.
proc report data=sashelp.class;
Apply style attributes. Apply a cyan background color to the Name cells of male
students.
compute sex;
if sex = 'M' then
call define('name', "style", "style=[background=cyan]");
endcomp;
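Replace style attributes. Apply the color red to the names of students who are older
than 13. Because STYLE/REPLACE is used, the new style attributes replace the ones
that were set earlier, so for males who are older than 13 the cyan background is
removed and only the red text color remains.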
compute age;
if age > 13 then
call define('name', "style/replace", "style=[color=red]");
endcomp;
Specify a table title. Provide a title for the table and run the SAS program.
title "Using STYLE/REPLACE";
run;
Output: HTML
Output 58.21 Using STYLE/REPLACE
Chapter 59: REPORT Procedure Windows
Dictionary
BREAK Window
Controls PROC REPORT's actions at a change in the value of a group or order variable or at the top
or bottom of a report.
Details
Path
Edit ► Summarize information
After you select Summarize information, PROC REPORT offers you four choices
for the location of the break:
n Before Item
n After Item
n At the top
n At the bottom
Note: To create a break before or after detail lines (when the value of a group or
order variable changes), you must select a variable before you open the BREAK
window.
Description
Note: For information about changing the formatting characters that are used by
the line-drawing options in this window, see the discussion of “FORMCHAR
<(position(s))>='formatting-character(s)'” on page 2081.
Options
Overline summary
uses the second formatting character to overline each value
n that appears in the summary line
n that would appear in the summary line if you specified the SUMMARIZE
option
Interaction If you specify options to overline and to double overline, then PROC
REPORT overlines.
Underline summary
uses the second formatting character to underline each value
n that appears in the summary line
n that would appear in the summary line if you specified the SUMMARIZE
option
This option has no effect if you use it in a break at the end of a report.
Interaction If you use this option in a break on a variable and you create a break
at the end of the report, then the summary for the whole report is on
a separate page.
n analysis variables
n computed variables
The following table shows how PROC REPORT calculates the value for each
type of report item in a summary line created by the BREAK window:
The break variable: the current value of the variable (or a missing value if you
select Suppress break value)*
If you reference a variable with a missing value in a customized summary line,
then PROC REPORT displays that variable as a blank (for character variables) or a
period (for numeric variables).
n any underlining and overlining in the break lines in the column containing the
break variable
If you select Suppress break value, then the value of the break variable is
unavailable for use in customized break lines unless you assign it a value in the
compute block that is associated with the break.
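The BREAK window settings have counterparts in the BREAK and RBREAK statements. The following is a hedged sketch on SASHELP.CLASS (not a transcript of any particular window session): OL corresponds to Overline summary, DOL to Double overline summary, SUMMARIZE to Summarize, and SUPPRESS to Suppress break value. The line-drawing and SKIP options affect only LISTING output.

```sas
ods listing;   /* OL, DOL, and SKIP affect only LISTING output */
proc report data=sashelp.class;
   column sex name weight;
   define sex    / order;
   define name   / display;
   define weight / analysis sum format=8.1;
   /* After each value of Sex: overline, summary line, blank break value, blank line */
   break after sex / ol summarize suppress skip;
   /* At the bottom of the report: double overline and a grand-total summary line */
   rbreak after / dol summarize;
run;
```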
Color
From the list of colors, select the one to use in the REPORT window for the column
heading and the values of the item that you are defining. The default is the color of
Foreground in the SASCOLOR window. (For more information, see the online Help
for the SASCOLOR window.)
Note: Not all operating environments and devices support all colors, and in some
operating environments and devices, one color might map to another color. For
example, if the DEFINITION window displays the word BROWN in yellow
characters, then selecting BROWN results in a yellow item.
Buttons
Edit Program
opens the COMPUTE window and enables you to associate a compute block
with a location in the report.
OK
applies the information in the BREAK window to the report and closes the
window.
Cancel
closes the BREAK window without applying information to the report.
COMPUTE Window
Attaches a compute block to a report item or to a location in the report. Use the SAS Text Editor
commands to manipulate text in this window.
Details
Path
From Edit Program in the COMPUTED VAR, DEFINITION, or BREAK window.
Description
For information about the SAS language features that you can use in the COMPUTE
window, see “The Contents of Compute Blocks” on page 2062.
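In program form, a compute block attached to a location is a COMPUTE statement that names the location. As a minimal sketch on SASHELP.CLASS, the following block runs after each value of the order variable Sex and writes a customized summary line with the LINE statement.

```sas
proc report data=sashelp.class;
   column sex name weight;
   define sex    / order;
   define name   / display;
   define weight / analysis sum format=8.1;
   /* This compute block is attached to a location: after each value of Sex */
   compute after sex;
      line 'Total weight for Sex=' sex $1. ': ' weight.sum 8.1;
   endcomp;
run;
```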
COMPUTED VAR Window
Details
Path
Select a column. Then select Edit ► Add Item ► Computed Column.
After you select Computed Column, PROC REPORT prompts you for the location
of the computed column relative to the column that you have selected. After you
select a location, the COMPUTED VAR window appears.
Description
Enter the name of the variable at the prompt. If it is a character variable, then
select the Character data check box and, if you want, enter a value in the Length
field. The length can be any integer between 1 and 200. If you leave the field blank,
then PROC REPORT assigns a length of 8 to the variable.
After you enter the name of the variable, select Edit Program to open the
COMPUTE window. Use programming statements in the COMPUTE window to
define the computed variable. After closing the COMPUTE and COMPUTED VAR
windows, open the DEFINITION window to describe how to display the computed
variable.
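The Character data check box and the Length field correspond to the CHARACTER and LENGTH= options of the COMPUTE statement. The following is a minimal sketch; the computed variable AGE_BAND is a hypothetical name, and its length of 12 matches the longest value assigned to it.

```sas
proc report data=sashelp.class;
   column name age age_band;
   define name     / display;
   define age      / display;
   define age_band / computed 'Age band';
   /* CHARACTER and LENGTH= play the role of the Character data box and Length field */
   compute age_band / character length=12;
      if age <= 13 then age_band = '13 and below';
      else age_band = '14 and above';
   endcomp;
run;
```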
DATA COLUMNS Window
Details
Path
Select a report item. Then select Edit ► Add Item ► Data Column.
After you select Data Column, PROC REPORT prompts you for the location of the
data column relative to the column that you have selected. After you select a
location, the DATA COLUMNS window appears.
Description
Select one or more variables to add to the report. When you select the first
variable, it moves to the top of the list in the window. If you select multiple
variables, then subsequent selections move to the bottom of the list of selected
variables. An asterisk (*) identifies each selected variable. The order of selected
variables from top to bottom determines their order in the report from left to right.
DATA SELECTION Window
Details
Path
File ► Open Data Set
Description
The first list box in the DATA SELECTION window lists all the librefs defined for
your SAS session. The second one lists all the SAS data sets in the selected library.
Note: You must use data that is compatible with the current report definition. The
data set that you load must contain variables whose names are the same as the
variable names in the current report definition.
Buttons
OK
loads the selected data set into the current report definition.
Cancel
closes the DATA SELECTION window without loading new data.
DEFINITION Window
Displays the characteristics associated with an item in the report and lets you change them.
Details
Path
Select a report item. Then select Edit ► Define.
Description
Usage
For an explanation of each type of usage, see “Laying Out a Report” on page 2054.
DISPLAY
defines the selected item as a display variable. DISPLAY is the default for
character variables.
ORDER
defines the selected item as an order variable.
GROUP
defines the selected item as a group variable.
ACROSS
defines the selected item as an across variable.
ANALYSIS
defines the selected item as an analysis variable. You must specify a statistic
(see the discussion of the Statistic= attribute on page 2238) for an analysis
variable. ANALYSIS is the default for numeric variables.
COMPUTED
defines the selected item as a computed variable. Computed variables are
variables that you define for the report. They are not in the input data set, and
PROC REPORT does not add them to the input data set. However, computed
variables are included in an output data set if you create one.
Attributes
Format=
assigns a SAS or user-defined format to the item. This format applies to the
selected item as PROC REPORT displays it; the format does not alter the
format that is associated with a variable in the data set. For data set variables,
PROC REPORT honors the first of these formats that it finds:
n the format that is assigned with FORMAT= in the DEFINITION window
n the format that is assigned in a FORMAT statement when you start PROC
REPORT
n the format that is associated with the variable in the data set
If none of these formats is present, then PROC REPORT uses BESTw. for
numeric variables and $w. for character variables. The value of w is the default
column width. For character variables in the input data set, the default column
width is the variable's length. For numeric variables in the input data set and for
computed variables (both numeric and character), the default column width is
the value of the COLWIDTH= attribute in the ROPTIONS window.
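The order of precedence can be checked in code, where FORMAT= in a DEFINE statement plays the role of the Format= field. The following minimal sketch assigns a format to Height and Weight in a FORMAT statement and then overrides it for Height only.

```sas
proc report data=sashelp.class(obs=5);
   column name height weight;
   /* FORMAT statement in the PROC REPORT step */
   format height weight 8.3;
   define name   / display;
   /* FORMAT= in DEFINE takes precedence, so Height is shown with 6.1 */
   define height / display format=6.1;
   /* No FORMAT= here, so Weight uses 8.3 from the FORMAT statement */
   define weight / display;
run;
```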
If you are unsure what format to use, then type a question mark (?) in the format
field in the DEFINITION window to access the FORMATS window.
Spacing=
defines the number of blank characters to leave between the column being
defined and the column immediately to its left. For each column, the sum of its
width and the blank characters between it and the column to its left cannot
exceed the line size.
Default 2
Width=
defines the width of the column in which PROC REPORT displays the selected
item.
Default A column width that is just large enough to handle the format. If there
is no format, then PROC REPORT uses the value of COLWIDTH=.
Note When you stack items in the same column in a report, the width of the
item that is at the bottom of the stack determines the width of the
column.
Statistic=
associates a statistic with an analysis variable. You must associate a statistic
with every analysis variable in its definition. PROC REPORT uses the statistic
that you specify to calculate values for the analysis variable for the
observations represented by each cell of the report. You cannot use a statistic in
the definition of any other type of variable.
Note: PROC REPORT uses the name of the analysis variable as the default
heading for the column. You can customize the column heading with the Header
field of the DEFINITION window.
Statistic keywords:
CSS            PCTSUM
CV             RANGE
MAX            STD
MEAN           STDERR
MIN            SUM
N              SUMWGT
NMISS          USS
PCTN           VAR
P1             P90
P5             P95
P10            P99
Q1 | P25       QRANGE
PRT | PROBT    T
Explanations of the keywords, the formulas that are used to calculate them, and
the data requirements are discussed in Appendix 1, “SAS Elementary Statistics
Procedures,” on page 2699.
Default SUM
Requirement To compute standard error and the Student's t-test you must use
the default value of VARDEF=, which is DF.
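In a DEFINE statement, the statistic keyword takes the place of the Statistic= field. The following sketch uses aliases (hypothetical names W_MEAN, W_STDERR, W_T, and W_PRT) so that one analysis variable carries several statistics. No VARDEF= option is specified, so the default of DF applies, as STDERR, T, and PRT require.

```sas
proc report data=sashelp.class;
   /* Aliases let one analysis variable carry several statistics */
   column sex weight=w_mean weight=w_stderr weight=w_t weight=w_prt;
   define sex      / group;
   define w_mean   / analysis mean   format=8.2       'Mean';
   define w_stderr / analysis stderr format=8.3       'Std error';
   define w_t      / analysis t      format=8.2       't';
   define w_prt    / analysis prt    format=pvalue6.4 'Pr > |t|';
run;
```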
Order=
orders the values of a GROUP, ORDER, or ACROSS variable according to the
specified order, where
DATA
orders values according to their order in the input data set.
FORMATTED
orders values by their formatted (external) values. By default, the order is
ascending.
FREQ
orders values by ascending frequency count.
INTERNAL
orders values by their unformatted values, which yields the same order that
PROC SORT would yield. This order is operating environment-dependent.
This sort sequence is particularly useful for displaying dates chronologically.
Default FORMATTED
Interaction DESCENDING in the item's definition reverses the sort sequence for
an item.
Note The default value for the ORDER= option in PROC REPORT is not
the same as the default value in other SAS procedures. In other SAS
procedures, the default is ORDER=INTERNAL. The default for the
option in PROC REPORT might change in a future release to be
consistent with other procedures. Therefore, in production jobs
where it is important to order report items by their formatted values,
specify ORDER=FORMATTED even though it is currently the
default. Doing so ensures that PROC REPORT will continue to
produce the reports that you expect even if the default changes.
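Following that advice in program form means writing ORDER=FORMATTED explicitly in the DEFINE statement. The following minimal sketch also shows ORDER=INTERNAL combined with DESCENDING, which reverses the sort sequence for Age.

```sas
proc report data=sashelp.class;
   column sex age name;
   /* Request ORDER=FORMATTED explicitly rather than relying on the default */
   define sex  / order order=formatted;
   /* ORDER=INTERNAL sorts by unformatted value; DESCENDING reverses it */
   define age  / order order=internal descending;
   define name / display;
run;
```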
Justify=
You can justify the placement of the column heading and of the values of the
item that you are defining within a column in one of three ways:
LEFT
left-justifies the formatted values of the item that you are defining within
the column width and left-justifies the column heading over the values. If the
format width is the same as the width of the column, then LEFT has no
effect on the placement of values.
RIGHT
right-justifies the formatted values of the item that you are defining within
the column width and right-justifies the column heading over the values. If
the format width is the same as the width of the column, then RIGHT has no
effect on the placement of values.
CENTER
centers the formatted values of the item that you are defining within the
column width and centers the column heading over the values. This option
has no effect on the setting of the SAS system option CENTER.
When justifying values, PROC REPORT justifies the field width defined by
the format of the item within the column. Thus, numbers are always aligned.
Data type=
shows you if the report item is numeric or character. You cannot change this
field.
Item Help=
references a HELP or CBT entry that contains Help information for the selected
item. Use PROC BUILD in SAS/AF software to create a HELP or CBT entry for a
report item. All HELP and CBT entries for a report must be in the same catalog,
and you must specify that catalog with the HELP= option in the PROC REPORT
statement or from the User Help fields in the ROPTIONS window.
To access a Help entry from the report, select the item and issue the HELP
command. PROC REPORT first searches for and displays an entry named entry-
name.CBT. If no such entry exists, then PROC REPORT searches for entry-
name.HELP. If neither a CBT nor a HELP entry for the selected item exists, then
the opening frame of the Help for PROC REPORT is displayed.
Alias=
By entering a name in the Alias field, you create an alias for the report item that
you are defining. Aliases let you distinguish between different uses of the same
report item. When you refer in a compute block to a report item that has an
alias, you must use the alias. (See “Example 3: Using Aliases to Obtain Multiple
Statistics for the Same Variable” on page 2177.)
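In the COLUMN statement, an alias is written as variable=alias. The following sketch (alias names WT_MIN and WT_MAX and the computed variable WT_RANGE are hypothetical) obtains two statistics for Weight and refers to them by alias in a compute block.

```sas
proc report data=sashelp.class;
   column sex weight=wt_min weight=wt_max wt_range;
   define sex      / group;
   define wt_min   / analysis min 'Min weight';
   define wt_max   / analysis max 'Max weight';
   define wt_range / computed format=6.1 'Range';
   compute wt_range;
      /* Aliased items are referenced by alias, not as weight.min or weight.max */
      wt_range = wt_max - wt_min;
   endcomp;
run;
```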
Options
NOPRINT
suppresses the display of the item that you are defining. Use this option
n if you do not want to show the item in the report but you need to use the
values in it to calculate other values that you use in the report.
n to establish the order of rows in the report.
n if you do not want to use the item as a column but want to have access to its
values in summaries. (See “Example 7: Writing a Customized Summary on
Each Page” on page 2190.)
Interactions Even though the columns that you define with NOPRINT do not
appear in the report, you must count them when you are referencing
columns by number. (See “Four Ways to Reference Report Items in
a Compute Block” on page 2063.)
NOZERO
suppresses the display of the item that you are defining if its values are all zero
or missing.
Interactions Even though the columns that you define with NOZERO do not
appear in the report, you must count them when you are referencing
columns by number. (See “Four Ways to Reference Report Items in
a Compute Block” on page 2063.)
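The column-numbering rule can be seen in a short sketch. Height and Weight are hidden with NOPRINT but are still counted, so they remain _C2_ and _C3_. The computed variable BMI is a hypothetical example, using the common formula 703 × weight (lb) / height (in)².

```sas
proc report data=sashelp.class;
   column name height weight bmi;
   define name   / display;
   define height / display noprint;   /* hidden, but still column 2 (_c2_) */
   define weight / display noprint;   /* hidden, but still column 3 (_c3_) */
   define bmi    / computed format=5.1 'BMI';
   compute bmi;
      bmi = 703 * _c3_ / (_c2_**2);
   endcomp;
run;
```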
DESCENDING
reverses the order in which PROC REPORT displays rows or values of a group,
order, or across variable.
PAGE
inserts a page break just before printing the first column containing values of
the selected item.
Interaction PAGE is ignored if you use WRAP in the PROC REPORT statement
or in the ROPTIONS window.
FLOW
wraps the value of a character variable in its column. The FLOW option honors
the split character. If the text contains no split character, then PROC REPORT
tries to split text at a blank.
ID column
specifies that the item that you are defining is an ID variable. An ID variable and
all columns to its left appear at the left of every page of a report. ID ensures
that you can identify each row of the report when the report contains more
columns than will fit on one page.
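The DESCENDING, PAGE, FLOW, and ID settings in the DEFINITION window correspond to DEFINE statement options of the same names. Here is a minimal sketch that assumes SASHELP.CLASS. PAGE starts a new page before HEIGHT, and the ID column (with the column to its left) repeats on that page. These layout options affect LISTING output.

```sas
ods listing;

proc report data=sashelp.class nowd ls=64 ps=40;
   column sex name age height weight;
   define sex    / order descending 'Sex';          /* M rows before F rows */
   define name   / display id width=6 flow 'Name';  /* ID: repeats at left of each page */
   define age    / display 'Age';
   define height / display page 'Height';           /* page break before this column */
   define weight / display 'Weight';
run;
```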
Color
From the list of colors, select the one to use in the REPORT window for the column
heading and the values of the item that you are defining. The default is the color of
Foreground in the SASCOLOR window. (For more information, see the online Help
for the SASCOLOR window.)
Note: Not all operating environments and devices support all colors, and in some
operating environments and devices, one color might map to another color. For
example, if the DEFINITION window displays the word BROWN in yellow
characters, then selecting BROWN results in a yellow item.
Buttons
Apply
applies the information in the open window to the report and keeps the window
open.
Edit Program
opens the COMPUTE window and enables you to associate a compute block
with the variable that you are defining.
OK
applies the information in the DEFINITION window to the report and closes the
window.
Cancel
closes the DEFINITION window without applying changes made with APPLY.
DISPLAY PAGE Window
Details
Path
View → Display Page
Description
You can access the last page of the report by entering a large number for the page
number. When you are on the last page of the report, PROC REPORT sends a note
to the message line of the REPORT window.
EXPLORE Window
Lets you experiment with your data.
Restriction: You cannot open the EXPLORE window unless your report contains at least one
group or order variable.
Details
Path
Edit → Explore Data
Description
In the EXPLORE window, you can
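• subset the data with the list boxes
• suppress the display of a column with the Remove Column check box
• change the order of the variables in the list boxes with Rotate columns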
Note: The results of your manipulations in the EXPLORE window appear in the
REPORT window but are not saved in report definitions.
Window Features
list boxes
The EXPLORE window contains three list boxes. These boxes contain the value
All levels as well as actual values for the first three group or order variables in
your report. The values reflect any WHERE clause processing that is in effect.
For example, if you use a WHERE clause to subset the data so that it includes
only the northeast and northwest sectors, then the only values that appear in
the list box for Sector are All levels, Northeast, and Northwest. Selecting All
levels in this case displays rows of the report for only the northeast and
northwest sectors. To see data for all the sectors, you must clear the WHERE
clause before you open the EXPLORE window.
Selecting values in the list boxes restricts the display in the REPORT window to
the values that you select. If you select incompatible values, then PROC
REPORT returns an error.
Remove Column
Above each list box in the EXPLORE window is a check box labeled Remove
Column. Selecting this check box and applying the change removes the column
from the REPORT window. You can easily restore the column by clearing the
check box and applying that change.
Buttons
OK
applies the information in the EXPLORE window to the report and closes the
window.
Apply
applies the information in the EXPLORE window to the report and keeps the
window open.
Rotate columns
changes the order of the variables displayed in the list boxes. Each variable except
the leftmost one moves one list box to the left, and the leftmost variable moves to
the third list box.
Cancel
closes the EXPLORE window without applying changes made with APPLY.
FORMATS Window
Displays a list of formats and provides a sample of each one.
Details
Path
From the DEFINITION window, type a question mark (?) in the Format field and select
any of the buttons except Cancel, or press Enter.
Description
When you select a format in the FORMATS window, a sample of that format
appears in the Sample: field. Select the format that you want to use for the variable
that you are defining.
Buttons
OK
writes the format that you have selected into the Format field in the
DEFINITION window and closes the FORMATS window. To see the format in the
report, select Apply in the DEFINITION window.
Cancel
closes the FORMATS window without writing a format into the Format field.
LOAD REPORT Window
Details
Path
File → Open Report
Description
The first list box in the LOAD REPORT window lists all the librefs that are defined
for your SAS session. The second list box lists all the catalogs that are in the
selected library. The third list box lists descriptions of all the stored report
definitions (entry types of REPT) that are in the selected catalog. If there is no
description for an entry, then the list box contains the entry's name.
Buttons
OK
loads the current data into the selected report definition.
Cancel
closes the LOAD REPORT window without loading a new report definition.
Note: Issuing the END command in the REPORT window returns you to the
previous report definition (with the current data).
MESSAGES Window
Automatically opens to display notes, warnings, and errors returned by PROC REPORT.
Details
You must close the MESSAGES window by selecting OK before you can continue to
use PROC REPORT.
PROFILE Window
Customizes some features of the PROC REPORT environment by creating a report profile.
Details
Path
Tools → Report Profile
Description
The PROFILE window creates a report profile that
• specifies the SAS library, catalog, and entry that define alternative menus to use
in the REPORT and COMPUTE windows. Use PROC PMENU to create catalog
entries of type PMENU that define these menus. PMENU entries for both
windows must be in the same catalog.
• sets defaults for WINDOWS, PROMPT, and COMMAND. PROC REPORT uses
the default option whenever you start the procedure unless you specifically
override the option in the PROC REPORT statement.
Specify the catalog that contains the profile to use with the PROFILE= option in the
PROC REPORT statement. (See the discussion of “PROFILE=libref.catalog” on page
2088.)
Buttons
OK
stores your profile in the catalog entry SASUSER.PROFILE.REPORT.PROFILE.
Note: Use PROC CATALOG or the EXPLORER window to copy the profile to
another location.
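Here is a minimal sketch that copies the stored profile with PROC CATALOG and then names the copy with PROFILE=. It assumes a hypothetical library MYLIB whose directory already exists. The second step opens the REPORT window, so it assumes an interactive SAS session.

```sas
libname mylib "/tmp/rptprof";            /* hypothetical location */

/* Copy entry REPORT.PROFILE from SASUSER.PROFILE to MYLIB.RPTPROF */
proc catalog catalog=sasuser.profile;
   copy out=mylib.rptprof;
      select report / et=profile;
run;
quit;

/* Use the copied profile (PROFILE=libref.catalog) */
proc report data=sashelp.class windows profile=mylib.rptprof;
run;
```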
Cancel
closes the window without storing the profile.
PROMPTER Window
Prompts you for information as you add items to a report.
Details
Path
Specify the PROMPT option when you start PROC REPORT or select PROMPT
from the ROPTIONS window. The PROMPTER window appears the next time you
add an item to the report.
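Here is a minimal sketch of the two ways to start the interactive environment. It assumes an interactive SAS session and the SASHELP.CLASS sample data.

```sas
/* Open the REPORT window with the prompter */
proc report data=sashelp.class prompt;
run;

/* Open the REPORT window without the prompter */
proc report data=sashelp.class windows;
run;
```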
Description
The prompter guides you through parts of the windows that are most commonly
used to build a report. As the content of the PROMPTER window changes, the title
of the window changes to the name of the window that you would use to perform a
task if you were not using the prompter. The title changes so that you can begin to
associate the windows with their functions and learn which window to use if you
later decide to change something.
If you start PROC REPORT with prompting, then the first window gives you a
chance to limit the number of observations that are used during prompting. When
you exit the prompter, PROC REPORT removes the limit.
Buttons
OK
applies the information in the open window to the report and continues the
prompting process.
Note: When you select OK from the last prompt window, PROC REPORT
removes any limit on the number of observations that it is working with.
Apply
applies the information in the open window to the report and keeps the window
open.
Backup
returns you to the previous PROMPTER window.
Exit Prompter
closes the PROMPTER window without applying any more changes to the
report. If you have limited the number of observations to use during prompting,
then PROC REPORT removes the limit.
REPORT Window
Is the surface on which the report appears.
Details
Path
Use WINDOWS or PROMPT in the PROC REPORT statement.
Description
You cannot write directly in any part of the REPORT window except column
headings. To change other aspects of the report, you select a report item (for
example, a column heading) as the target of the next command and issue the
command. To select an item, use a mouse or cursor keys to position the cursor over
it. Then click the mouse button or press Enter. To execute a command, make a
selection from the menu bar at the top of the REPORT window. PROC REPORT
displays the effect of a command immediately unless the DEFER option is on.
Note: Issuing the END command in the REPORT window returns you to the
previous report definition with the current data. If there is no previous report
definition, then END closes the REPORT window.
Note: In the REPORT window, there is no Save As option from the File menu to
save your report to a file. Instead:
1 From the REPORT window, select File → Save Data Set. In the dialog box, enter the
SAS library and member name of the data set in which to save the report data.
2 From the Program Editor window, submit a PROC PRINT step for that data set.
3 In the File menu, select Save As to save the generated output to a file.
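Step 2 might look like the following sketch. WORK.RPTDATA is a hypothetical library and member name that stands in for whatever you entered in the Save Data Set window.

```sas
/* Print the data set that was saved from the REPORT window */
proc print data=work.rptdata noobs;
run;
```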
ROPTIONS Window
Displays choices that control the layout and display of the entire report and identifies the SAS
library and catalog containing CBT or HELP entries for items in the report.
Details
Path
Tools → Options → Report
Description
Modes
DEFER
stores the information for changes and makes the changes all at once when you
turn DEFER mode off or select View → Refresh.
DEFER is particularly useful when you know that you need to make several
changes to the report but do not want to see the intermediate reports.
By default, PROC REPORT redisplays the report in the REPORT window each
time you redefine the report by adding or deleting an item, by changing
information in the DEFINITION window, or by changing information in the
BREAK window.
PROMPT
opens the PROMPTER window the next time you add an item to the report.
Options
CENTER
centers the report and summary text (customized break lines). If CENTER is not
selected, then the report is left-justified.
PROC REPORT honors the first of these centering specifications that it finds:
• the CENTER or NOCENTER option in the PROC REPORT statement or the
CENTER toggle in the ROPTIONS window
• the CENTER or NOCENTER option stored in the report definition loaded
with REPORT= in the PROC REPORT statement
• the SAS system option CENTER or NOCENTER
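Here is a minimal sketch of this precedence order, assuming SASHELP.CLASS. The statement option is found first, so it overrides the system option.

```sas
options center;                          /* system option: consulted last */

proc report data=sashelp.class(obs=5) nowd nocenter;   /* found first: wins */
   column name age;
   define name / display;
   define age  / display;
run;
```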
HEADLINE
underlines all column headings and the spaces between them at the top of each
page of the report.
HEADLINE underlines with the second formatting character. (See the discussion
of “FORMCHAR <(position(s))>='formatting-character(s)' ” on page 2081.)
HEADSKIP
writes a blank line beneath all column headings (or beneath the underlining that
the HEADLINE option writes) at the top of each page of the report.
NAMED
writes name= in front of each value in the report, where name is the column
heading for the value.
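HEADLINE, HEADSKIP, and NAMED are also available as PROC REPORT statement options. Here is a minimal sketch, assuming SASHELP.CLASS. These options affect LISTING output, so the sketch opens that destination first.

```sas
ods listing;

/* Underline the headings and add a blank line below them */
proc report data=sashelp.class(obs=5) nowd headline headskip;
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;

/* Write name=value pairs; WRAP keeps the values of each row together */
proc report data=sashelp.class(obs=3) nowd named wrap;
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;
```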
NOHEADER
suppresses column headings, including headings that span multiple columns.
Once you suppress the display of column headings in the interactive report
window environment, you cannot select any report items.
SHOWALL
overrides the parts of a definition that suppress the display of a column
(NOPRINT and NOZERO). You define a report item with a DEFINE statement or
in the DEFINITION window.
WRAP
displays one value from each column of the report, on consecutive lines if
necessary, before displaying another value from the first column. By default,
PROC REPORT displays values for only as many columns as it can fit on one
page. It fills a page with values for these columns before starting to display
values for the remaining columns on the next page.
Interaction When WRAP is in effect, PROC REPORT ignores PAGE in any item
definitions.
BOX
uses formatting characters to add line-drawing characters to the report. These
characters
• surround each page of the report
• separate column headings from the body of the report
• separate rows and columns from each other
Interaction You cannot use BOX if you use WRAP in the PROC REPORT
statement or ROPTIONS window or if you use FLOW in any item's
definition.
MISSING
considers missing values as valid values for group, order, or across variables.
Special missing values that are used to represent numeric values (the letters A
through Z and the underscore (_) character) are each considered as a different
value. A group for each missing value appears in the report. If you omit the
MISSING option, then PROC REPORT does not include observations with a
missing value for one or more group, order, or across variables in the report.
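The MISSING option is also a PROC REPORT statement option. The sketch below compares the report with and without it. The WORK.SALES data set is hypothetical and has one observation with a missing SECTOR value.

```sas
data work.sales;
   input sector $ amount;
   datalines;
NE 100
NW 200
.  50
NE 25
;
run;

/* Without MISSING: the observation with a missing SECTOR is excluded */
proc report data=work.sales nowd;
   column sector amount;
   define sector / group;
   define amount / analysis sum;
run;

/* With MISSING: the missing SECTOR value forms its own group */
proc report data=work.sales nowd missing;
   column sector amount;
   define sector / group;
   define amount / analysis sum;
run;
```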
Attributes
LINESIZE=
specifies the line size for a report. PROC REPORT honors the first of these line-size
specifications that it finds:
• LS= in the PROC REPORT statement or LINESIZE= in the ROPTIONS
window
• the LS= setting stored in the report definition loaded with REPORT= in the
PROC REPORT statement
• the SAS system option LINESIZE=
Tip If the line size is greater than the width of the REPORT window, then use
SAS interactive report window environment commands RIGHT and LEFT
to display portions of the report that are not currently in the display.
PAGESIZE=
specifies the page size for a report. PROC REPORT honors the first of these
page size specifications that it finds:
• PS= in the PROC REPORT statement or PAGESIZE= in the ROPTIONS
window
• the PS= setting stored in the report definition loaded with REPORT= in the
PROC REPORT statement
• the SAS system option PAGESIZE=
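Here is a minimal sketch of the first and last entries in these lists, assuming SASHELP.CLASS. LS= and PS= in the PROC REPORT statement take precedence over the system options.

```sas
options linesize=132 pagesize=60;        /* system options: consulted last */

proc report data=sashelp.class nowd ls=80 ps=40;   /* statement options win */
   column name sex age height weight;
run;
```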
COLWIDTH=
specifies the default number of characters for columns containing computed
variables or numeric data set variables.
When setting the width for a column, PROC REPORT first looks at WIDTH= in
the definition for that column. If WIDTH= is not present, then PROC REPORT
uses a column width large enough to accommodate the format for the item. (For
information about formats, see the discussion of “Format=” on page 2238.) If no
format is associated with the item, then the column width depends on variable
type:
Numeric variable in the input data set: the value of the COLWIDTH= option
Default 9
SPACING=space-between-columns
specifies the number of blank characters between columns. For each column,
the sum of its width and the blank characters between it and the column to its
left cannot exceed the line size.
Default 2
Interactions PROC REPORT separates all columns in the report by the number
of blank characters specified by SPACING= in the PROC REPORT
statement or the ROPTIONS window unless you use SPACING= in
the definition of a particular item to change the spacing to the left
of that item.
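Here is a minimal sketch of these defaults and a per-item override, assuming SASHELP.CLASS (whose numeric variables are assumed to carry no formats, so COLWIDTH= applies). WIDTH= and SPACING= in a DEFINE statement override the report-wide values for that one column. These settings affect LISTING output.

```sas
ods listing;

proc report data=sashelp.class(obs=5) nowd colwidth=6 spacing=3;
   column name age height weight;
   define name   / display;
   define age    / display;                     /* width from COLWIDTH= */
   define height / display width=8 spacing=1;   /* per-item overrides */
   define weight / display;
run;
```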
SPLIT='character'
specifies the split character. PROC REPORT breaks a column heading when it
reaches that character and continues the heading on the next line. The split
character itself is not part of the column heading, although each occurrence of
the split character counts toward the 40-character maximum for a label.
Interaction The FLOW option in the DEFINE statement honors the split
character.
Tip If you are typing over a heading (rather than entering one from the
PROMPTER or DEFINITION window), then you do not see the effect
of the split character until you refresh the screen by adding or
deleting an item, by changing the contents of a DEFINITION or
BREAK window, or by selecting View → Refresh.
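In a program, the same split character is set with SPLIT= in the PROC REPORT statement. Here is a minimal sketch, assuming SASHELP.CLASS, that breaks both column headings at the asterisk.

```sas
proc report data=sashelp.class(obs=5) nowd split='*';
   column name height;
   define name   / display 'Student*Name';
   define height / display 'Height*(inches)';
run;
```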
PANELS=number-of-panels
specifies the number of panels on each page of the report. If the width of a
report is less than half of the line size, then you can display the data in multiple
sets of columns so that rows that would otherwise appear on multiple pages
appear on the same page. Each set of columns is a panel. A familiar example of
this type of report is a telephone book, which contains multiple panels of names
and telephone numbers on a single page.
When PROC REPORT writes a multipanel report, it fills one panel before
beginning the next.
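The number of panels that fit on a page depends on
• the width of the panel
• the space between panels
• the line size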
Default 1
Tip If number-of-panels is larger than the number of panels that can fit on
the page, then PROC REPORT creates as many panels as it can. Let
PROC REPORT put your data in the maximum number of panels that
can fit on the page by specifying a large number of panels (for example,
99).
See For information about specifying the space between panels, see the
discussion of PSPACE=. For information about setting the line size, see
the discussion of “LINESIZE=” on page 2252.
PSPACE=space-between-panels
specifies the number of blank characters between panels. PROC REPORT
separates all panels in the report by the same number of blank characters. For
each panel, the sum of its width and the number of blank characters separating
it from the panel to its left cannot exceed the line size.
Default 4
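Here is a minimal sketch of a telephone-book style report, assuming SASHELP.CLASS. PANELS=99 asks for as many panels as fit within LS=100, and PSPACE=4 separates them. Panels apply to LISTING output.

```sas
ods listing;

proc report data=sashelp.class nowd ls=100 panels=99 pspace=4;
   column name age;
   define name / display;
   define age  / display;
run;
```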
User Help
identifies the library and catalog containing user-defined Help for the report.
This Help can be in CBT or HELP catalog entries. You can write a CBT or HELP
entry for each item in the report with the BUILD procedure in SAS/AF software.
You must store all such entries for a report in the same catalog.
Specify the entry name for Help for a particular report item in the DEFINITION
window for that report item or in a DEFINE statement.
SAVE DATA SET Window
Details
Path
File → Save Data Set
Description
To specify an output data set, enter the name of the SAS library and the name of
the data set (called member in the window) that you want to create in the Save Data
Set window.
Buttons
OK
creates the output data set and closes the Save Data Set window.
Cancel
closes the Save Data Set window without creating an output data set.
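In a batch program, the OUT= option in the PROC REPORT statement creates an output data set in a similar way. Here is a minimal sketch, assuming SASHELP.CLASS. WORK.RPTOUT is an illustrative name.

```sas
proc report data=sashelp.class nowd out=work.rptout;
   column sex height;
   define sex    / group;
   define height / analysis mean format=6.1;
run;

proc print data=work.rptout;
run;
```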
SAVE DEFINITION Window
Details
Path
File → Save Report
Description
The SAVE DEFINITION window prompts you for the complete name of the catalog
entry in which to store the definition of the current report and for an optional
description of the report. This description shows up in the LOAD REPORT window
and helps you select the appropriate report.
SAS stores the report definition as a catalog entry of type REPT. You can use a
report definition to create an identically structured report for any SAS data set that
contains variables with the same names as those variables that are used in the
report definition.
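In a program, OUTREPT= stores a report definition and REPORT= reuses it. Here is a minimal sketch. MYLIB, REPORTS, and CLASSRPT are hypothetical names, and the directory is assumed to exist. The second step applies the stored definition to a subset of the same data.

```sas
libname mylib "/tmp/rptdefs";            /* hypothetical location */

/* Store the definition as entry MYLIB.REPORTS.CLASSRPT (type REPT) */
proc report data=sashelp.class nowd outrept=mylib.reports.classrpt;
   column sex height;
   define sex    / group;
   define height / analysis mean format=6.1;
run;

/* Reuse it with any data that contains SEX and HEIGHT */
proc report data=sashelp.class(where=(age >= 13)) nowd
            report=mylib.reports.classrpt;
run;
```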
Buttons
OK
creates the report definition and closes the SAVE DEFINITION window.
Cancel
closes the SAVE DEFINITION window without creating a report definition.
SOURCE Window
Lists the PROC REPORT statements that build the current report.
Details
Path
Tools → Report Statements
STATISTICS Window
Displays statistics that are available in PROC REPORT.
Details
Path
Edit → Add item → Statistic
After you select Statistic, PROC REPORT prompts you for the location of the
statistic relative to the column that you have selected. After you select a location,
the STATISTICS window appears.
Description
Select the statistics that you want to include in your report and close the window.
When you select the first statistic, it moves to the top of the list in the window. If
you select multiple statistics, then subsequent selections move to the bottom of
the list of selected statistics. An asterisk (*) indicates each selected statistic. The
order of selected statistics from top to bottom determines their order in the report
from left to right.
To compute the standard error and the Student's t statistic, you must use the default
value of VARDEF=, which is DF.
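In a program, statistics are listed in the COLUMN statement, and their order there sets their left-to-right order in the report. Here is a minimal sketch, assuming SASHELP.CLASS. VARDEF=DF is shown explicitly even though it is the default.

```sas
proc report data=sashelp.class nowd vardef=df;
   column sex height,(n mean stderr t);
   define sex / group;
run;
```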
To add all selected statistics to the report, select File → Accept Selection.
Selecting File → Close closes the STATISTICS window without adding the selected
statistics to the report.
WHERE Window
Selects observations from the data set that meet the conditions that you specify.
Details
Path
Subset → Where
Description
Enter a where-expression in the Enter WHERE clause field. A where-expression is
an arithmetic or logical expression that generally consists of a sequence of
operands and operators. For information about constructing a where-expression,
see “WHERE Statement” in SAS DATA Step Statements: Reference.
Note: You can clear all where-expressions by leaving the Enter WHERE clause field
empty and by selecting OK.
Buttons
OK
applies the where-expression to the report and closes the WHERE window.
Cancel
closes the WHERE window without altering the report.
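In a program, the same where-expression goes in a WHERE statement within the PROC REPORT step. Here is a minimal sketch, assuming SASHELP.CLASS.

```sas
proc report data=sashelp.class nowd;
   where sex = 'F' and age >= 12;
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;
```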
WHERE ALSO Window
Details
Path
Subset → Where Also
Description
Enter a where-expression in the Enter where also clause field. A where-expression
is an arithmetic or logical expression that generally consists of a sequence of
operands and operators. For information about constructing a where-expression,
see “WHERE Statement” in SAS DATA Step Statements: Reference.
Buttons
OK
adds the where-expression to any other where-expressions that are already in
effect and applies them all to the report. It also closes the WHERE ALSO
window.
Cancel
closes the WHERE ALSO window without altering the report.
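The WHERE ALSO statement has the same effect in a program: it adds a condition to the WHERE conditions already in effect. Here is a minimal sketch, assuming SASHELP.CLASS. Together, the two statements select rows where SEX='F' and AGE>=12.

```sas
proc report data=sashelp.class nowd;
   where sex = 'F';
   where also age >= 12;                 /* added to the condition above */
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;
```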
Chapter 60
S3 Procedure
Overview: S3 Procedure
Note: The S3 procedure is not supported on z/OS platforms.
Before you can use the S3 procedure, you need an Amazon Web Services (AWS) access
key ID and secret access key. When using temporary credentials, you also need a
session token. For more information, see the Amazon S3 documentation.
When the data that you add to or retrieve from S3 is larger than 5 MB, the
procedure creates additional threads. These threads enable parallel processing for
faster transfer speeds.
Concepts: S3 Procedure
When SAS obtains credentials from the Amazon EC2 instance metadata service (IMDS),
it first attempts to use IMDSv2 to obtain a session token. If IMDSv2 fails, then SAS
uses IMDSv1.
PROC S3 Configuration
Options that are specified in AWS CLI configuration files override options that are
specified in the PROC S3 configuration file. Options that you specify in the S3
procedure override options that are set in configuration files.
The AWS CLI config file can contain multiple sets of options called profiles. Specify
which configuration profile to use with the PROFILE= option in the PROC S3 statement.
Note: If an option is specified in the AWS CLI configuration file as well as in the
local PROC S3 configuration file, then the value in the AWS CLI configuration file
takes precedence.
The PROC S3 configuration file contains case-sensitive name-value pairs that you
specify as name=value. Do not enclose values in quotation marks. The file can
contain one or more sets of the following name-value pairs:
region
specifies the AWS region to connect to. See the Region argument for the list of
valid region values.
keyId
specifies the AWS access key ID. This value is a 20-character, alphanumeric
string.
secret
specifies the AWS secret access key. This value is a 40-character string.
Note: See “Securing the PROC S3 Configuration File” for information about
restricting access to this information.
sessionToken
specifies the session token. Specify this value if you are using temporary AWS
security credentials.
Note: See “Securing the PROC S3 Configuration File” for information about
restricting access to this information.
ssl
specifies whether SSL or TLS should be used for connections. Specify true or
yes to use SSL or TLS. Any other value deactivates SSL or TLS.
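Here is a minimal sketch of a PROC S3 configuration file that uses these name-value pairs. The key ID and secret are the sample values given for the KEYID= and SECRET= options. The region value useast is an assumption; use a value from the Region argument list. Names are case-sensitive, and values are not quoted.

```text
ssl=yes
keyId=AKIAIOSFODNN7EXAMPLE
secret=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region=useast
```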
Securing the PROC S3 Configuration File
In Microsoft Windows, use the file properties to restrict access to the PROC S3
configuration file. You can access the file properties by right-clicking the file,
selecting Properties, and then selecting the Security tab.
Note: Support for custom regions was added to PROC S3 for SAS Viya in a July
2021 update to SAS Viya 3.5. Support for custom regions was added to PROC S3 for
SAS 9 in SAS 9.4M8.
To define a custom region, set the TKS3_CUSTOM_REGION environment variable by using
this syntax:
TKS3_CUSTOM_REGION=<'>region-name<'>,HTTP-host,HTTP-port,SSL-HTTP-port,SSLRequired,SSLAllowed
region-name
specifies the custom region name. Enclose the name in quotation marks if it
contains spaces or special characters.
HTTP-host
specifies the host name of the server for the region.
HTTP-port
specifies the port number of the server when communicating without SSL. Use
0 to use the default port number for the HTTP protocol.
SSL-HTTP-port
specifies the port number of the server when communicating with SSL. Use 0 to
use the default port number for the HTTPS protocol.
SSLRequired
specifies whether all communications with the S3 environment must use SSL.
Values are TRUE or FALSE.
SSLAllowed
specifies whether communications with the S3 environment can use SSL.
Values are TRUE or FALSE.
Here is a sample setting for this environment variable that uses a Backblaze B2
server.
TKS3_CUSTOM_REGION=us-west-002,s3.us-west-002.backblazeb2.com,0,0,TRUE,TRUE
In Linux, define multiple regions by separating sets of values with a colon (:). In
Windows, separate sets of values with a semicolon (;).
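The PROC S3 syntax summary also lists a REGION statement with ADD, LIST, and REMOVE forms. The sketch below uses that statement to register the Backblaze B2 region from the sample above. It assumes that credentials come from the default PROC S3 configuration file. Whether a region added this way persists beyond the step is not stated here, so treat the sketch as illustrative.

```sas
proc s3;
   /* Register a custom region: host from the sample setting above */
   region add name="us-west-002" host="s3.us-west-002.backblazeb2.com"
          sslallowed sslrequired;
   region list;                          /* list the known regions */
run;
```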
You can specify an encryption key in three forms: an AWS Key Management Service
(KMS) key, a customer-provided server-side encryption (SSE-C) key represented as a
character string, or an SSE-C key represented by a hexadecimal value.
You can specify a KMS key from the Amazon S3 environment. This type of key
works with AWS server-side encryption and is managed in the S3 environment as
part of the IAM service.
For SSE-C encryption keys, you can specify the key as a 32-byte character string or
as a 64-digit hexadecimal value. User-specified keys are then used by the S3
environment to encrypt data using the AES256 encryption algorithm.
As a best practice, manage encryption keys in a separate SAS program. You can
then use the %INCLUDE statement to read in your named encryption keys and refer
to them by name. For an example, see “Example 3: Manage Encryptions with S3 Data”
on page 2284.
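Here is a minimal sketch of that practice. The file /secure/keys.sas, the key name salesKey, and the bucket and paths are all hypothetical, and the key value is an example only. The sketch assumes that S3 locations are written as /bucket/object. The included file holds only ENCKEY statements, and %INCLUDE brings them into the PROC S3 step. An SSE-C key must be supplied again to read the object back.

```sas
/* Hypothetical contents of /secure/keys.sas (64 hexadecimal digits = 32 bytes):

   enckey add name="salesKey"
          hexkey="00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff";
*/

proc s3;
   %include "/secure/keys.sas";          /* brings in the ENCKEY ADD statement */
   put enckey="salesKey" "/tmp/sales.csv" "/mybucket/sales.csv";
   get enckey="salesKey" "/mybucket/sales.csv" "/tmp/sales_copy.csv";
run;
```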
Note: Only IAM:KMS keys are supported for SAS/ACCESS Interface to Amazon
Redshift.
For more information, see “Encryption with Amazon Redshift” in SAS/ACCESS for
Relational Databases: Reference.
Transfer Acceleration
Transfer acceleration is a special mode that is used to load or write data. In this
mode, an alternative host enables faster data transfer. For more information, see
your AWS documentation.
The GET, GETDIR, PUT, and PUTDIR statements take advantage of this faster data
transfer method when transfer acceleration is enabled.
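Here is a minimal sketch that enables acceleration for a bucket and then uploads a file. The bucket and file names are hypothetical. The sketch assumes that bucket names and S3 locations are written with a leading slash, and that GETACCEL reports the bucket's current acceleration setting (only its syntax is shown in this chapter).

```sas
proc s3;
   bucket "/mybucket" accelerate;        /* turn on transfer acceleration */
   getaccel "/mybucket";                 /* check the bucket's setting */
   put "/tmp/big.csv" "/mybucket/big.csv";   /* PUT can use the faster host */
run;
```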
Syntax: S3 Procedure
PROC S3 <options>;
BUCKET "bucket-name";
COPY <SRCKEY="key-name"> "source-s3-location" <ENCKEY="key-name">
"destination-s3-location";
CREATE "bucket-name";
DELETE "s3-location";
DESTROY "bucket-name";
ENCKEY <ADD> | <REPLACE> NAME="key-name" ID="key-ID"
<CONTEXT="key-context">;
GET <ENCKEY="key-name"> "s3-location" "local-file-path";
GETACCEL "bucket-name";
GETDIR <ENCKEY="key-name"> "s3-location" "local-path";
INFO <ENCKEY="key-name"> "s3-location";
LIST <OUT=libref.file><_SHORT_> "s3-location";
MKDIR "s3-location";
PUT <ENCKEY="key-name"> "local-path" "s3-location";
PUTDIR <ENCKEY="key-name"> "local-path" "s3-location";
REGION <ADD HOST="AWS-server" NAME="region-name" <PORT=port-value>
<REPLACE> <SSLALLOWED> <SSLPORT=SSL-port> <SSLREQUIRED>> |
<LIST> | <REMOVE NAME="region-name">;
RMDIR "s3-location";
RUN;
PROC S3 Statement
Specifies the connection parameters for connecting to AWS S3.
Syntax
PROC S3 <AWSCONFIG="AWS-CLI-file-path"> <CONFIG="local-configuration-file-
path"> <PROFILE="configuration-profile-name"> <AWSCREDENTIALS="AWS-
credentials-file-path"> <ROLENAME="IAM-role-name" | ROLEARN="IAM-role-
ARN> <KEYID="AWS-key-ID"> <SECRET="AWS-secret"> <SESSION="session-
token"> <SSL | NOSSL> <REGION=AWS-region>;
Optional Arguments
AWSCONFIG="AWS-CLI-file-path"
specifies the path of the AWS CLI configuration file, called config. This file
contains connection parameters to access AWS S3. If a value is specified in the
AWS CLI configuration file and in the PROC S3 configuration file, the value in
the AWS CLI file takes precedence.
AWSCREDENTIALS="AWS-credentials-file-path"
specifies the path of the AWS credentials file. This file contains the credentials
that are used to access AWS S3.
Note: Support for this option was added in SAS Viya 3.4.
CONFIG="local-configuration-file-path"
specifies the path and filename of the PROC S3 configuration file that contains
connection parameters to access AWS S3. If you specify this option, then the
specified configuration file is read instead of the default PROC S3 configuration
file in the default location.
The default PROC S3 configuration file is .tks3.conf in your home directory on
UNIX or tks3.conf in your home directory on Windows. You do not need to
specify the CONFIG= option if your configuration file is in the default location.
KEYID="AWS-key-ID"
specifies the AWS access key ID. This value is a 20-character, alphanumeric
string. A sample key ID value is AKIAIOSFODNN7EXAMPLE.
PROFILE="profile-name"
specifies the profile to use in the AWS CLI configuration file (config). Sets of
options are grouped into profiles. If you do not specify the PROFILE= option, PROC
S3 uses the default profile.
REGION=AWS-region
specifies the AWS region to connect to. Here are the valid region values.
You might need to activate a region for your account if it does not appear in your
list of regions. See your AWS documentation for more information.
1 Contact your AWS sales representative to get access to the Asia Pacific (Osaka) region.
ROLENAME="IAM-role-name"
specifies an IAM role name that enables you to get access credentials to S3 via
the AssumeRole IAM operation.
ROLEARN="IAM-role-ARN"
specifies an Amazon Resource Name (ARN) that identifies an IAM role. The
associated IAM role enables you to get access credentials to S3 via the
AssumeRole IAM operation.
SECRET="AWS-secret"
specifies the AWS secret access key. This value is a 40-character string. A
sample secret access value is wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY.
SESSION="session-token"
specifies the AWS session token. Use this value if you are using temporary AWS
security credentials.
Note: Support for this option was added in the May 2021 update for SAS 9.4M7
and SAS Viya 3.5.
SSL | NOSSL
SSL specifies that SSL or TLS should be enabled during data transfer. NOSSL
specifies that SSL or TLS should be disabled during data transfer.
You can use the SSL value to override an ssl=no setting in the PROC S3
configuration file. If there is no value specified for SSL in the PROC S3
statement or in a PROC S3 configuration file, then SSL or TLS is used during
data transfer by default.
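Here is a minimal sketch of two common ways to supply connection parameters without writing keys in the program. All paths, the profile name analytics, and the bucket name are hypothetical. The sketch assumes that S3 locations are written with a leading slash.

```sas
/* PROC S3 configuration file in a non-default location */
proc s3 config="/home/user/s3/tks3.conf";
   list "/mybucket";
run;

/* A named profile from the AWS CLI config file */
proc s3 awsconfig="/home/user/.aws/config" profile="analytics";
   list "/mybucket";
run;
```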
BUCKET Statement
Sets the acceleration transfer mode for the specified bucket.
Syntax
BUCKET "bucket-name" ACCELERATE | NOACCELERATE;
Required Arguments
bucket-name
specifies the name of the S3 bucket for which you are setting transfer
acceleration mode. For more information, see “Transfer Acceleration” on page
2265.
ACCELERATE
enables transfer acceleration for the specified bucket.
NOACCELERATE
disables transfer acceleration for the specified bucket.
COPY Statement
Copies an object from an S3 source location to an S3 destination.
Syntax
COPY <SRCKEY="key-name"> "source-s3-location" <ENCKEY="key-name">
"destination-s3-location";
Required Arguments
source-s3-location
specifies the S3 location of the object to be copied. Fully qualify the S3 location
from the bucket name to the object name.
destination-s3-location
specifies the S3 location to which an object should be copied. Fully qualify the
S3 location from the bucket name to the object name.
Optional Arguments
ENCKEY="key-name"
specifies an encryption key name that identifies the encryption for the created
object. The name must match an encryption key name that you specified in an
ENCKEY statement.
If you omit the ENCKEY= option, the created object is not encrypted. Specify
the value _DEFAULT_ to use your default encryption key, which is created
automatically.
SRCKEY="key-name"
specifies the encryption key name for the source object. The name must match an
encryption key name that you specified in an ENCKEY statement.
Note: This value is not applied to the created object if the ENCKEY= option is
not specified.
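The following sketch copies an unencrypted object and encrypts only the new copy with the default encryption key. The object names are hypothetical. Because the source object is assumed to be unencrypted, SRCKEY= is omitted.

```sas
proc s3;
   /* The source is unencrypted; the new object uses the default key. */
   copy "/myBucket/licj.csv" enckey="_DEFAULT_" "/myBucket/licj-encrypted.csv";
run;
```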
2272 Chapter 60 / S3 Procedure
CREATE Statement
Creates a bucket.
Syntax
CREATE "bucket-name";
Required Argument
bucket-name
specifies the name of the S3 bucket that you are creating. The name of the S3
bucket that you create must be unique across S3.
DELETE Statement
Deletes an S3 location or object.
Syntax
DELETE "s3-location";
Required Argument
s3-location
specifies the S3 location that you are deleting. Fully qualify the S3 location from
the bucket name to the object name.
DESTROY Statement
Deletes an S3 bucket.
Syntax
DESTROY "bucket-name";
Required Argument
bucket-name
specifies the name of an S3 bucket that you are deleting. The bucket that you
specify must exist in S3 and must be empty.
ENCKEY Statement
Specifies the encryption key for an S3 location.
Syntax
Form 1: ENCKEY ADD | REPLACE NAME="key-name" ID="key-ID" <CONTEXT="key-context">;
Form 2: ENCKEY ADD | REPLACE NAME="key-name" KEY="key-string"
<ALGORITHM="S3-encryption-algorithm"> <CONTEXT="key-context">;
Form 3: ENCKEY ADD | REPLACE NAME="key-name" HEXKEY="hexadecimal-value"
<ALGORITHM="S3-encryption-algorithm"> <CONTEXT="key-context">;
Form 4: ENCKEY LIST <NAME="key-name">;
Form 5: ENCKEY REMOVE NAME="key-name";
Arguments
ADD
indicates that you are adding an encryption key.
Requirement: Specify the HEXKEY=, ID=, or KEY= options with ADD, depending
on the type of key that you are using.
ALGORITHM="S3-encryption-algorithm"
specifies the S3 encryption algorithm that is used in the S3 location.
The only value that is currently supported is AES256. This value is the default
when you do not specify the ALGORITHM= option.
CONTEXT="key-context"
specifies the key context for the encryption key.
HEXKEY="hexadecimal-value"
specifies an encryption key in the form of a 64-digit hexadecimal value.
ID="key-ID"
specifies the key ID for the encryption key.
KEY="key-string"
specifies an encryption key as a 32-byte character string.
LIST
requests a list of defined encryption keys.
NAME="key-name"
specifies the encryption key name. This is a user-friendly name that you can use
to refer to an encryption key. The value is case insensitive.
NAME= is required with the ADD, REPLACE, or REMOVE options. When used
with the LIST option, NAME= is optional.
REMOVE
removes an encryption key from the list of defined keys.
REPLACE
specifies that the existing encryption key should be replaced. If you do not
specify REPLACE, any attempt to redefine an existing encryption key fails.
Details
The ENCKEY statement is used to support server-side encryption in an AWS S3
environment.
• Use Form 1 of the ENCKEY statement to add or replace an IAM:KMS encryption
key.
• Use Form 2 of the ENCKEY statement to add or replace an encryption key in the
form of a string. The string must be 32 bytes long.
• Use Form 3 of the ENCKEY statement to add or replace an encryption key
expressed as hexadecimal characters. Do not use the “0x” prefix when you
specify the hexadecimal key string. The hexadecimal string should be 64 digits
long.
• Use Form 4 of the ENCKEY statement to view the list of defined encryption
keys.
• Use Form 5 of the ENCKEY statement to remove an encryption key.
Note: If you specify the NOSSL option in the PROC S3 statement, then using
encryption with the ENCKEY statement fails.
You must specify one of these options in the ENCKEY statement: ADD, LIST,
REMOVE, or REPLACE.
Encryption key definitions last for the duration of your SAS session.
The encryption key _DEFAULT_ cannot be modified. The _DEFAULT_ encryption key
is defined by the AWS environment.
For more information, see Protecting Data Using Server-Side Encryption in your
AWS documentation.
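The following minimal sketch shows Forms 2, 4, and 5. The key name mykey is hypothetical. The key string is the 32-byte sample string from Example 3. Because key definitions last only for the SAS session, a program would typically define its keys in an included file, as Example 3 does.

```sas
proc s3;
   /* Form 2: define an SSE-C key from a 32-byte character string. */
   enckey add name="mykey" key="ke-sj-eb-ok-qk-zi-nb-lu-fh-wi-zi";
   /* Form 4: list only the key named mykey. */
   enckey list name="mykey";
   /* Form 5: remove the key when it is no longer needed. */
   enckey remove name="mykey";
run;
```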
GET Statement
Retrieves an S3 object.
Note: This statement uses transfer acceleration when it is available. For more
information, see “Transfer Acceleration” on page 2265.
Syntax
GET <ENCKEY="key-name"> "s3-location" "local-file-path";
Required Arguments
s3-location
specifies the name of an S3 object to retrieve. Fully qualify the location from
the bucket name to the object name.
local-file-path
specifies a local file in which to store the object. Examples are
C:\public\filename for Windows or /u/$USER/filename for UNIX.
Optional Argument
ENCKEY="key-name"
specifies an encryption key name that identifies the encryption to use during
data retrieval. The name must match an encryption key name that you specified
in an ENCKEY statement. The name must also correspond to the encryption key
that was used when the object was created in the S3 environment. Otherwise,
S3 returns an error.
GETACCEL Statement
Retrieves the transfer acceleration status for a bucket.
Syntax
GETACCEL "bucket-name";
Required Argument
bucket-name
specifies the name of an S3 bucket for which you want the transfer acceleration
status. For more information, see “Transfer Acceleration” on page 2265.
GETDIR Statement
Retrieves the contents of an S3 directory.
Note: This statement uses transfer acceleration when it is available. For more
information, see “Transfer Acceleration” on page 2265.
Syntax
GETDIR <ENCKEY="key-name"> "s3-location" "local-path";
Required Arguments
s3-location
specifies the name of an S3 directory to retrieve. Fully qualify the location from
the bucket name to the directory.
local-path
specifies a directory that is local to the SAS client. Examples are C:\public for
Windows or /u/$USER/ for UNIX.
Optional Argument
ENCKEY="key-name"
specifies an encryption key name that identifies the encryption to use during
data retrieval. The name must match an encryption key name that you specified
in an ENCKEY statement.
Note: Encryption fails if any files in the source location do not use the specified
encryption key.
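The following minimal sketch retrieves one object and then a whole directory without encryption keys. It assumes that none of the objects are encrypted. The bucket, object, and local paths are hypothetical, and the local locations are assumed to be writable.

```sas
proc s3;
   /* Retrieve a single object into a local file. */
   get "/myBucket/csv/licj.csv" "/u/marti/project/licj-from-s3.csv";
   /* Retrieve the contents of an S3 directory into a local directory. */
   getdir "/myBucket/csv" "/u/marti/project/csv";
run;
```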
INFO Statement
Requests information in the SAS log about an S3 location.
Note: Information about a file includes the region, path, and the date and time at which
the file was last modified.
Syntax
INFO <ENCKEY="key-name"> "s3-location";
Required Argument
s3-location
specifies a fully qualified path from the bucket name to the object name.
Optional Argument
ENCKEY="key-name"
specifies an encryption key name. The name must match an encryption key
name that you specified in an ENCKEY statement.
Note: The ENCKEY= option is required if any objects in the S3 location are
encrypted.
LIST Statement
Requests information in the SAS log about the contents of an S3 location.
Syntax
LIST <_SHORT_> "s3-location" <OUT=filename>;
Required Argument
s3-location
specifies a fully qualified path from the bucket name to the object name. The
object that you specify is typically a container, such as a bucket or a directory.
Optional Arguments
_SHORT_
specifies that you want a list of object names only.
OUT=filename
specifies an output file for the output of the LIST statement.
Note Support for this option was added in SAS 9.4M7 and SAS Viya 3.5.
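The following sketch contrasts the INFO and LIST statements. The object and bucket names are hypothetical, and the objects are assumed to be unencrypted, so ENCKEY= is omitted from INFO.

```sas
proc s3;
   /* Region, path, and last-modified date and time of one object. */
   info "/myBucket/licj.csv";
   /* Names only, without the other details that LIST writes. */
   list _short_ "/myBucket";
run;
```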
MKDIR Statement
Specifies a directory to create in an S3 location.
Syntax
MKDIR "s3-location";
Required Argument
s3-location
specifies a fully qualified path from the bucket name to the directory name that
you want to create.
PUT Statement
Specifies a local object to write to an S3 location.
Note: This statement uses transfer acceleration when it is available. For more
information, see “Transfer Acceleration” on page 2265.
Syntax
PUT <ENCKEY="key-name"> "local-path" "s3-location";
Required Arguments
local-path
specifies a file or directory that is local to the SAS client.
s3-location
specifies a location in S3. Fully qualify the path from the bucket name to the
object name.
Optional Argument
ENCKEY="key-name"
specifies an encryption key name that identifies the encryption to use when
storing data in S3. The name must match an encryption key name that you
specified in an ENCKEY statement.
PUTDIR Statement
Specifies a local directory to write to an S3 location.
Note: This statement uses transfer acceleration when it is available. For more
information, see “Transfer Acceleration” on page 2265.
Syntax
PUTDIR <ENCKEY="key-name"> "local-path" "s3-location";
Required Arguments
local-path
specifies a directory that is local to the SAS client.
s3-location
specifies a location in S3. Fully qualify the path from the bucket name to the
object name.
Optional Argument
ENCKEY="key-name"
specifies an encryption key name that identifies the encryption to use when
writing to S3. The name must match an encryption key name that you specified
in an ENCKEY statement.
Note: All files in the S3 location are encrypted with the specified encryption
key.
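The following sketch defines an SSE-C key and uploads a local directory with it, so that every file in the destination is encrypted with that key. The key name projkey and the paths are hypothetical. The hexadecimal value is the 64-digit sample value from Example 3. Encryption fails if NOSSL is specified in the PROC S3 statement.

```sas
proc s3;
   /* Define a key from a 64-digit hexadecimal value (no 0x prefix). */
   enckey add name="projkey"
      hexkey="e1d2e3bb4a5f6079889706b3c4d1e2f999f2c3bb4c5f6079888706f6a4b3e2da";
   /* Every file copied from the local directory is encrypted with projkey. */
   putdir enckey="projkey" "/u/marti/project" "/myBucket/project";
run;
```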
REGION Statement
Enables you to add, list, or remove custom regions.
Notes: Support for this statement in SAS Viya 3.5 starts in July 2021.
Support for this statement in SAS 9 starts in SAS 9.4M8.
See: “Example 4: Define a Custom Region with PROC S3” on page 2288
Syntax
Form 1: REGION ADD HOST="AWS-server" NAME="region-name" <PORT=port-value>
<REPLACE> <SSLALLOWED> <SSLPORT=SSL-port> <SSLREQUIRED>;
Form 2: REGION LIST;
Form 3: REGION REMOVE NAME="region-name";
Arguments
ADD
specifies that you are adding a custom region for the S3 environment.
HOST="AWS-server"
specifies the server that PROC S3 connects to on AWS.
Note Support for the HOST= option was added in the August 2021 update
for SAS Viya 3.5.
Example host="myhost.amazonaws.com"
LIST
requests a list of all defined S3 regions.
NAME="region-name"
specifies the name of the region that you are adding or removing.
PORT=port-value
specifies the HTTP port value to use to connect to S3 without SSL.
REMOVE
specifies to remove the region that you identify with the NAME= option.
Restriction You cannot remove any of the predefined S3 regions. See the PROC
S3 statement REGION= argument for the list of predefined region
values.
REPLACE
indicates that a custom region should be added and should overwrite a region of
the same name, if it exists.
Restriction You cannot replace any of the predefined regions for S3. See the
REGION= argument for the list of predefined region values.
SSLALLOWED
indicates that SSL is allowed when communicating with the S3 environment.
SSLPORT=SSL-port
specifies the HTTP port value to use to connect to S3 with SSL.
SSLREQUIRED
indicates that all communications with the S3 environment must be made using
SSL.
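The following sketch exercises all three forms of the REGION statement. The host objects.example.com, the region name my-region, and the port values are hypothetical. This custom region allows SSL but does not require it, so both PORT= and SSLPORT= are given.

```sas
proc s3;
   /* Form 1: add (or overwrite) a hypothetical custom region. */
   region add host="objects.example.com" name="my-region"
          port=80 sslport=443 sslallowed replace;
   /* Form 2: write all defined regions to the SAS log. */
   region list;
   /* Form 3: remove the custom region. Predefined regions cannot be removed. */
   region remove name="my-region";
run;
```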
RMDIR Statement
Specifies a directory to delete from an S3 location.
Syntax
RMDIR "s3-location";
Required Argument
s3-location
specifies a directory in S3. Fully qualify the path from the bucket name to the
directory name. The directory that you specify must be empty.
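Because RMDIR and DESTROY both require empty targets, removing a bucket is a bottom-up sequence. The following sketch assumes that the bucket is in the state left by Example 2, where /myBucket/csv/licj.csv is the only object.

```sas
proc s3;
   delete "/myBucket/csv/licj.csv";   /* empty the directory first        */
   rmdir "/myBucket/csv";             /* RMDIR requires an empty directory */
   destroy "/myBucket";               /* DESTROY requires an empty bucket  */
run;
```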
Examples: S3 Procedure
Details
This example uses the CREATE statement in PROC S3 to create a bucket in S3.
Next, the example shows how to add a file to the bucket and then list the contents
of the bucket.
Program
proc s3 config="/u/marti/.tks3.conf";
create "/myBucket";
put "/u/marti/project/licj.csv" "/myBucket/licj.csv";
list "/myBucket";
run;
Program Description
Execute the PROC S3 statement and specify the location of the PROC S3
configuration file. Connect to S3 with connection options that are contained in
the .tks3.conf file in the specified directory.
proc s3 config="/u/marti/.tks3.conf";
Create a bucket in S3. Use the CREATE statement to create the bucket myBucket.
create "/myBucket";
Store a copy of a local file in the new S3 bucket. Execute the PUT statement to
copy the local file licj.csv into a file of the same name in the S3 bucket myBucket.
put "/u/marti/project/licj.csv" "/myBucket/licj.csv";
List the contents of the S3 bucket. Use the LIST statement to see a list of the
contents of myBucket.
list "/myBucket";
run;
If the file licj.csv is the only file in the bucket, then the following message is printed
to the SAS log.
Example 2: Manage the Contents of a Bucket
Details
This example shows how to create a directory in a bucket and how to copy and
delete files.
Program
proc s3;
mkdir "/myBucket/csv";
copy "/myBucket/licj.csv" "/myBucket/csv/licj.csv";
delete "/myBucket/licj.csv";
run;
Program Description
Connect to S3 using the default configuration file. Use the PROC S3 statement to
connect to S3 with connection options that are specified in the .tks3.conf file in
your home directory. This is the default configuration file location.
proc s3;
Create a new directory. Use the MKDIR statement to create a directory, csv, in
myBucket. A note is printed to the SAS log.
mkdir "/myBucket/csv";
Copy a file into the new directory. Use the COPY statement to copy a file, licj.csv,
from myBucket into the csv directory.
copy "/myBucket/licj.csv" "/myBucket/csv/licj.csv";
Delete the original copy of the file. Use the DELETE statement to delete the
original copy of the licj.csv file from myBucket. Only the copy of the file in
/myBucket/csv remains.
delete "/myBucket/licj.csv";
run;
Example 3: Manage Encryptions with S3 Data
Details
This example shows how to work with encryption keys. As a best practice, manage
encryption keys in a separate SAS program that you include into another SAS
program that interacts with the AWS environment. In this example, the program
that manages encryption keys is called keys.sas. All encryptions use the default
AES256 encryption algorithm that is supported by the S3 environment.
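Program: keys.sas
/* keys.sas */
proc s3;
/* IAM:KMS keys */
enckey add name="foo" id="98v27390-1si1-8sc9-38k0-k893j354nw5g";
enckey add name="bar" id="22f87852-0ak1-5xy2-01j1-l852g809gx4q";
/* SSE-C keys (user-specified) as character string */
enckey add name="foo2" key="ke-sj-eb-ok-qk-zi-nb-lu-fh-wi-zi";
enckey add name="bar2" key="wl-tk-64-rk-i0-zn-wk-7c-s8-a8-jk";
/* SSE-C (user-specified) keys as hex data */
enckey replace name="foo2"
hexkey="e1d2e3bb4a5f6079889706b3c4d1e2f999f2c3bb4c5f6079888706f6a4b3e2da";
enckey replace name="bar2"
hexkey="9892940203456892093758029375728394010928374674839201837583920192";
run;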
Program Description
Add IAM:KMS encryption keys. Use the ENCKEY statement with the ADD option
to add two encryption keys named foo and bar. You supply the ID= value that is
defined by the IAM service in the AWS environment. Use the NAME= option to
provide a user-friendly identifier for each encryption key.
/* keys.sas */
proc s3;
/* IAM:KMS keys */
enckey add name="foo" id="98v27390-1si1-8sc9-38k0-k893j354nw5g";
enckey add name="bar" id="22f87852-0ak1-5xy2-01j1-l852g809gx4q";
Add an SSE-C key with a character string. You can use the ENCKEY statement and
the ADD option to add a user-specified encryption key. Specify a 32-byte character
string as the KEY= value and provide a user-friendly value for NAME=.
/* SSE-C keys (user-specified) as character string */
enckey add name="foo2" key="ke-sj-eb-ok-qk-zi-nb-lu-fh-wi-zi";
enckey add name="bar2" key="wl-tk-64-rk-i0-zn-wk-7c-s8-a8-jk";
Replace existing encryption keys with a user-defined hexadecimal key. Use the
REPLACE option with the NAME= option to identify an encryption key that you
want to replace. You can specify an encryption key with a 64-digit hexadecimal
value with the HEXKEY= option.
/* SSE-C (user-specified) keys as hex data */
enckey replace name="foo2"
hexkey="e1d2e3bb4a5f6079889706b3c4d1e2f999f2c3bb4c5f6079888706f6a4b3e2da";
enckey replace name="bar2"
hexkey="9892940203456892093758029375728394010928374674839201837583920192";
run;
1 /* keys.sas */
2 proc s3;
3 /* IAM:KMS keys */
4 enckey add name="foo" id="98v27390-1si1-8sc9-38k0-k893j354nw5g";
5 enckey add name="bar" id="22f87852-0ak1-5xy2-01j1-l852g809gx4q";
6 /* user-defined keys as character string */
7 enckey add name="foo2" key="ke-sj-eb-ok-qk-zi-nb-lu-fh-wi-zi";
8 enckey add name="bar2" key="wl-tk-64-rk-i0-zn-wk-7c-s8-a8-jk";
9 /* user-defined keys as hex data */
10 enckey replace name="foo2"
11 hexkey="e1d2e3bb4a5f6079889706b3c4d1e2f999f2c3bb4c5f6079888706f6a4b3e2da";
12 enckey replace name="bar2"
13 hexkey="9892940203456892093758029375728394010928374674839201837583920192";
14 run;
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
real time 3.34 seconds
cpu time 0.11 seconds
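Program: s3data.sas
/* s3data.sas */
%include "keys.sas";
proc s3;
enckey list;
put enckey="foo" "/home/demouser/s3/foo.txt" "/mybucket/foo.txt";
copy srckey="foo" "/mybucket/foo.txt" enckey="bar" "/mybucket/bar.txt";
get enckey="foo" "/mybucket/foo.txt" "/home/demouser/foo-get.txt";
putdir enckey="foo2" "/home/demouser/s3" "/mybucket/enctest";
getdir enckey="foo2" "/mybucket/enctest" "/home/demouser/enctest";
run;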
Program Description
In a separate program, s3data.sas, you can specify the encryptions to use while you
work with data from an S3 environment.
Include encryption key definitions so that they are accessible to the s3data.sas
program. Use the %INCLUDE statement to retrieve encryption key definitions and
their associated names.
/* s3data.sas */
%include "keys.sas";
List available encryption keys. Use the LIST option with the ENCKEY statement to
see the currently defined encryption keys.
proc s3;
enckey list;
Manipulate the encryptions for files. The PUT statement uses the foo encryption
key from keys.sas to encrypt the local foo.txt file as it is copied to the S3 bucket
Mybucket. The COPY statement copies foo.txt in Mybucket to a new object,
bar.txt, which is encrypted with the bar encryption key. SRCKEY= identifies the foo
key that protects the source object. The GET statement copies foo.txt from
Mybucket to your local directory as foo-get.txt. Because the object in S3 is
encrypted with the foo encryption key, the GET statement must specify that key.
put enckey="foo" "/home/demouser/s3/foo.txt" "/mybucket/foo.txt";
copy srckey="foo" "/mybucket/foo.txt" enckey="bar" "/mybucket/bar.txt";
get enckey="foo" "/mybucket/foo.txt" "/home/demouser/foo-get.txt";
Create a subdirectory in an S3 bucket and encrypt it. Use the PUTDIR statement
to create a subdirectory, Enctest, in Mybucket. This location is a copy of your local
directory /home/demouser/s3. The Enctest location is encrypted using the foo2
encryption key. In keys.sas, the foo2 encryption key is represented by a
hexadecimal value, because that encryption key replaced the original encryption
key for foo2.
putdir enckey="foo2" "/home/demouser/s3" "/mybucket/enctest";
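Retrieve the encrypted directory. Use the GETDIR statement with the foo2
encryption key to copy the Enctest directory from Mybucket to a local directory.
Data is encrypted by default during data transfer. The new local directory is
/home/demouser/enctest.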
getdir enckey="foo2" "/mybucket/enctest" "/home/demouser/enctest";
run;
1 /* s3data.sas */
2 %include "keys.sas";
17 proc s3;
18 enckey list;
19 put enckey="foo" "/home/demouser/foo.txt" "/mybucket/foo.txt";
20 copy srckey="foo" "/mybucket/foo.txt" enckey="bar" "/mybucket/bar.txt";
21 get enckey="foo" "/mybucket/foo.txt" "/home/demouser/foo-get.txt";
22 putdir enckey="foo2" "/home/demouser/s3" "/mybucket/enctest";
23 getdir enckey="foo2" "/mybucket/enctest" "/home/demouser";
24 run;
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
real time 8.82 seconds
cpu time 2.47 seconds
Example 4: Define a Custom Region with PROC S3
Details
This example shows how to define a custom region using PROC S3.
Program
proc s3;
region add host="s3.us-west-004.backblazeb2.com"
name="us-west-002"
sslrequired replace;
run;
Program Description
Connect to S3 using the default configuration file. Use the PROC S3 statement to
connect to S3 with connection options that are specified in the .tks3.conf file in
your home directory. This is the default configuration file location.
proc s3;
Define a new custom region. Use the REGION statement to create a custom region,
us-west-002. The SSLREQUIRED argument indicates that SSL is required when
communicating with the S3 environment. The REPLACE argument specifies that if
the custom region is already defined, then the definition should be overwritten.
region add host="s3.us-west-004.backblazeb2.com"
name="us-west-002"
sslrequired replace;
run;
61
SCAPROC Procedure
The following command runs your SAS job with the SAS Code Analyzer from your
operating system's command line:
sas yourjob.sas -initstmt "proc scaproc; record 'yourjob.txt' ; run;"
sas
is the command used at your site to start SAS.
yourjob.sas
is the name of the SAS job that you want to analyze.
yourjob.txt
is the name of the file that will contain a copy of your SAS code. The file will
also contain the comments that are inserted to show input and output
information, macro symbol usage, and other aspects of your job. For information
about issuing PROC SCAPROC in SAS code, see the examples.
Special Considerations
Some tasks of grid-enabled jobs can depend on previous tasks. PROC SCAPROC
combines and reorders these tasks based on their dependencies on the preceding
tasks. Combining the tasks and submitting them in the same work unit enables
faster processing of the tasks. The NOOPTIMIZE argument of the GRID option
disables the combining and reordering of tasks of grid-enabled jobs.
For the GRID statement to work, your site has to license SAS Grid Manager or
SAS/CONNECT. SAS Grid Manager enables your generated grid job to run on a grid
of distributed machines. SAS/CONNECT enables your generated grid job to run on
parallel SAS sessions on one symmetric multiprocessing (SMP) machine.
In the following example, PROC SCAPROC submits the statements between the
/* SCAPROC GLOBAL BEGIN */ and /* SCAPROC GLOBAL END */ comments with each
of the PROC SUMMARY statements.
/* SCAPROC GLOBAL BEGIN */;
libname one "SAS-library-1";
libname two "SAS-library-2";
%let year=2018;
/* SCAPROC GLOBAL END */;
You can include any SAS statement that is valid with the generated grid job. You
can use text that is all uppercase, all lowercase, or mixed case.
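The following sketch shows the global block in context. The data set one.sales and its variables (year, region, product, amount) are hypothetical. Each PROC SUMMARY step is a candidate grid task, and the statements between the GLOBAL BEGIN and GLOBAL END comments are submitted with each task so that the libref and macro variable resolve remotely. Running the generated grid job requires SAS Grid Manager or SAS/CONNECT.

```sas
proc scaproc;
   record 'yourjob.txt' grid 'yourjob.grid';
run;

/* SCAPROC GLOBAL BEGIN */;
libname one "SAS-library-1";
%let year=2018;
/* SCAPROC GLOBAL END */;

/* Two independent summaries of the same hypothetical input table. */
proc summary data=one.sales nway;
   where year = &year;
   class region;
   var amount;
   output out=one.sales_by_region sum=;
run;

proc summary data=one.sales nway;
   where year = &year;
   class product;
   var amount;
   output out=one.sales_by_product sum=;
run;
```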
PROC SCAPROC;
RECORD filespec <ATTR> <OPENTIMES> <INTCON> <EXPANDMACROS>
<GRID filespec <RESOURCE "resource name"> <INHERITLIB> <NOOPTIMIZE> >;
WRITE;
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Examples: “Example 1: Specifying a Record File” on page 2300
“Example 2: Specifying the Grid Job Generator” on page 2301
Syntax
PROC SCAPROC;
RECORD Statement
Specifies a filename or a fileref to contain the output of the SAS Code Analyzer.
Syntax
RECORD filespec <ATTR> <OPENTIMES> <INTCON> <EXPANDMACROS>
<GRID filespec <RESOURCE "resource name"> <INHERITLIB> <NOOPTIMIZE> >;
Required Argument
filespec
specifies a physical filename in quotation marks, or a fileref, that indicates a file
to contain the output of the SAS Code Analyzer. The output is the original SAS
source and comments that contain information about the job. For more
information about the output comments, see “Results” on page 2296.
Optional Arguments
ATTR
writes additional information about the variables in the input data sets and
views.
OPENTIMES
writes the open time, size, and physical filename of the input data sets.
INTCON
outputs information about table integrity constraints.
EXPANDMACROS
expands macro invocations into separate tasks.
GRID
filespec
specifies a physical filename in quotation marks, or a fileref, that points to a
file that will contain the output of the Grid Job Generator.
Note: INHERITLIB should be used only when the USER library is not
globally accessible to all remote sessions.
NOOPTIMIZE
disables the combining and reordering of tasks for grid-enabled jobs.
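The following sketch turns on every optional RECORD argument in one statement and disables task reordering for the generated grid job. The file names are hypothetical. The ending WRITE step is optional, because SAS termination also writes the record file.

```sas
proc scaproc;
   /* Variable attributes, open times, integrity constraints, and   */
   /* expanded macros go to the record file; NOOPTIMIZE keeps the   */
   /* grid tasks in their original order.                           */
   record 'yourjob.txt' attr opentimes intcon expandmacros
      grid 'yourjob.grid' nooptimize;
run;

/* ... the SAS job being analyzed runs here ... */

proc scaproc;
   write;
run;
```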
WRITE Statement
Specifies output information to the record file.
Syntax
WRITE;
Without Arguments
The WRITE statement causes the SAS Code Analyzer to write its information to the
record file, if a record file has been specified with the RECORD statement. If the
Grid Job Generator has been specified, it also runs at this time. When SAS
terminates, the SAS Code Analyzer also writes its information to the specified
record file.
Results
The following list explains the comments that the SAS Code Analyzer writes to the
record file that you specify with PROC SCAPROC. In the record file, each output
comment is enclosed in /* and */ comment delimiters. The list shows the comments
in that same form so that you can recognize them when you read a record file.
/* JOBSPLIT: DATASET INPUT|OUTPUT|UPDATE SEQ|MULTI name */
specifies that a data set was opened for reading, writing, or updating.
INPUT
specifies that SAS read the data set.
OUTPUT
specifies that SAS wrote the data set.
UPDATE
specifies that SAS updated the data set.
SEQ
specifies that SAS opened the data set for sequential access.
MULTI
specifies that SAS opened the data set for multipass access.
name
specifies the name of the data set.
/* JOBSPLIT: CATALOG INPUT|OUTPUT|UPDATE name */
specifies that a catalog was opened for reading, writing, or updating.
INPUT
specifies that SAS read the catalog.
OUTPUT
specifies that SAS wrote the catalog.
UPDATE
specifies that SAS updated the catalog.
name
specifies the name of the catalog.
/* JOBSPLIT: FILE INPUT|OUTPUT|UPDATE name */
specifies that an external file was opened for reading, writing, or updating.
INPUT
specifies that SAS read the file.
OUTPUT
specifies that SAS wrote the file.
UPDATE
specifies that SAS updated the file.
name
specifies the name of the file.
/* JOBSPLIT: ITEMSTOR INPUT|OUTPUT|UPDATE name */
specifies that an ITEMSTOR was opened for reading, writing, or updating.
INPUT
specifies that SAS read the ITEMSTOR.
OUTPUT
specifies that SAS wrote the ITEMSTOR.
UPDATE
specifies that SAS updated the ITEMSTOR.
name
specifies the name of the ITEMSTOR.
/* JOBSPLIT: OPENTIME name DATE:date PHYS:phys SIZE:size */
specifies that a data set was opened for input. SAS writes the OPENTIME and
the SIZE of the file.
name
specifies the name of the data set.
DATE
specifies the date and time that the data set was opened. The value that is
returned for DATE is not the creation time of the file.
PHYS
specifies the complete physical name of the data set that was opened.
SIZE
specifies the size of the data set in bytes.
/* JOBSPLIT: ATTR name INPUT|OUTPUT VARIABLE:variable name
TYPE:CHARACTER|NUMERIC LENGTH:length LABEL:label FORMAT:format
INFORMAT:informat */
specifies that when a data set is closed, SAS reopens it and writes the
attributes of each variable. One ATTR line is produced for each variable.
name
specifies the name of the data set.
INPUT
specifies that SAS read the data set.
OUTPUT
specifies that SAS wrote the data set.
VARIABLE
specifies the name of the current variable.
TYPE
specifies whether the variable is character or numeric.
LENGTH
specifies the length of the variable in bytes.
LABEL
specifies the variable label if it has one.
FORMAT
specifies the variable format if it has one.
INFORMAT
specifies the variable informat if it has one.
/* JOBSPLIT: SYMBOL SET|GET name */
specifies that a macro symbol was accessed.
SET
specifies that SAS set the symbol. For example, SAS set the symbol sym1 in
the following code: %let sym1=sym2;
GET
specifies that SAS retrieved the symbol. For example, SAS retrieved the
symbol sym in the following code: a="&sym";
name
specifies the name of the symbol.
/* JOBSPLIT: ELAPSED number */
specifies a number for you to use to determine the relative run times of tasks.
number
specifies a number for you to use to determine the relative run times of
tasks.
/* JOBSPLIT: USER useroption */
specifies that SAS uses the USER option with the grid job code to enable single-
level data set names to reside in the Work library.
useroption
specifies the value that is to be used while the code is running.
/* JOBSPLIT: _DATA_ */
specifies that SAS is to use the reserved data set name _DATA_.
/* JOBSPLIT: _LAST_ */
specifies that SAS is to use the reserved data set name _LAST_.
/* JOBSPLIT: PROCNAME procname|DATASTEP */
specifies the name of the SAS procedure or DATA step for this step.
/* JOBSPLIT: LIBNAME <libname options> */
specifies the LIBNAME options that were provided in a LIBNAME statement or
were set internally.
/* JOBSPLIT: SYSSCP <sysscp> */
specifies the value of the SYSSCP automatic macro variable when the SAS job
was run.
/* JOBSPLIT: JOBSTARTTIME <datetime> */
records the date and time that a job started.
/* JOBSPLIT: JOBENDTIME <datetime> */
records the date and time that a job ended.
UNIQUE unique
CHECK check
VARIABLES
specifies the variables involved with this integrity constraint. Otherwise, it is
empty.
WHERE
specifies the WHERE expression for CHECK integrity constraints. Otherwise,
it is empty.
REFERENCE
specifies the following, depending on the type of integrity constraint:
• For FOREIGN integrity constraints, it specifies the table that the foreign key
references.
• For REFERENTIAL integrity constraints, it specifies the table that contains the
foreign key.
• Otherwise, it is empty.
ONDELETE
specifies the ON DELETE referential action for FOREIGN and REFERENTIAL
integrity constraints. Otherwise, it is empty.
ONUPDATE
specifies the ON UPDATE referential action for FOREIGN and REFERENTIAL
integrity constraints. Otherwise, it is empty.
MESSAGE
specifies the text of the error message, if there is one.
MESSAGETYPE
specifies USER if MESSAGETYPE=USER was specified. Otherwise, it is
empty.
Example 1: Specifying a Record File
This example specifies the record file 'record.txt' and writes information from
the SAS Code Analyzer to the file.
Program
proc scaproc;
record 'record.txt';
run;
data a;
do i = 1 to 100000;
j = cos(i);
output;
end;
run;
proc scaproc;
write;
run;
Output
Output 61.1 Contents of the record.txt File
data a;
do i = 1 to 1000000;
j = cos(i);
output;
end;
run;
/* JOBSPLIT: END */
Example 2: Specifying the Grid Job Generator
Details
This example writes information from the SAS Code Analyzer to the file that is
named 1.txt. The example code also runs the Grid Job Generator and writes that
information to the file that is named 1.grid. Notice that this example does not
end with a step that contains this code:
proc scaproc;
write;
run;
When SAS terminates, PROC SCAPROC automatically runs any pending RECORD
or GRID statements.
For the GRID statement to work, your site has to license SAS Grid Manager or
SAS/CONNECT. SAS Grid Manager enables your generated grid job to run on a grid
of distributed machines. SAS/CONNECT enables your generated grid job to run on
parallel SAS sessions on one symmetric multiprocessing (SMP) machine.
Program
proc scaproc;
record '1.txt' grid '1.grid';
run;
data a;
do i = 1 to 100000;
j = cos(i);
output;
end;
run;
Output
Output 61.2 Contents of the 1.txt File
data a;
do i = 1 to 1000000;
j = cos(i);
output;
end;
run;
/* JOBSPLIT: END */
62
SCOREACCEL Procedure
• Models are supported in the form of DATA step code, DS2 code, or analytic
stores.
• Model code can be published to a session-local or global CAS table and
executed in CAS. Session-local means it is accessible only to the current CAS
session that is executing PROC SCOREACCEL. Publishing to a global CAS table
is supported in SAS Viya 3.5.
• Alternatively, models can be published to Teradata or Hadoop. After a model is
published to an external data source, it can be executed there. Running a model
invokes the SAS Embedded Process (EP). To operate in these environments, you
need to license SAS In-Database Technologies for Hadoop (on SAS Viya) or SAS
In-Database Technologies for Teradata (on SAS Viya).
• As of SAS Viya 3.4 and SAS 9.4M6, models can be removed from CAS and from
external databases using the PROC SCOREACCEL DELETEMODEL statement.
• PROC SCOREACCEL calls CAS actions to complete the tasks. For more
information about the actions, see “Using DS2 Actions to Publish, Run, and
Delete Models in CAS” in SAS DS2 Programmer’s Guide.
• When publishing DATA step model code, PROC SCOREACCEL translates DATA
step code into DS2 code on the SAS client.
Prior to SAS Viya 3.5, a model could be published only to a model table local to the
current CAS session. The local session table could then be promoted to global
scope using the PUBLISHMODEL PROMOTETABLE= parameter.
Beginning in SAS Viya 3.5, a model can be published directly to a global model
table in CAS. A model is published to a global model table in CAS when the
PUBLISHMODEL PUBLISHGLOBAL= parameter is set to YES.
PROC SCOREACCEL: Publish, execute, and delete models in CAS or an external database
(Ex. 1, Ex. 2, Ex. 3, Ex. 4, Ex. 5, Ex. 6, Ex. 7)
Examples: “Example 1: Publishing, Running, and Deleting a DATA Step Model in CAS” on page 2329
“Example 2: Publishing, Running, and Deleting a DS2 Model in Teradata” on page 2331
“Example 3: Publishing and Running a DATA Step Model in Teradata Using a CASLIB” on page 2334
“Example 4: Publishing and Running a Model in Hadoop” on page 2336
“Example 5: Publishing and Running a Model in Hadoop (Hive)” on page 2337
Syntax
PROC SCOREACCEL SESSREF=session-name | SESSUUID="session-uuid";
Arguments
SESSREF=session-name | SESSUUID="session-uuid"
specifies the name of a CAS session or the universally unique identifier (UUID)
of an existing CAS session to which you want to connect. If no option is
specified, the automatic CAS session, CASAUTO, is used.
SESSREF=session-name
specifies the name of a CAS session to which you want to connect.
SESSUUID="session-uuid"
specifies the universally unique identifier (UUID) of an existing CAS session.
You must obtain the SESSUUID from the existing session before you can
specify it in this option. PROC SCOREACCEL connects to the session that is
identified by the UUID.
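The following sketch connects PROC SCOREACCEL to a session by UUID instead of by name. It assumes that the CAS statement's UUIDMAC= option is available in your release to store the session UUID in a macro variable (sess_uuid is a hypothetical name), and that model01 has been published to modeltable1 and the input table model01_score_data has been loaded in this session, as in Example 1.

```sas
/* Start a CAS session and capture its UUID in a macro variable
   (assumes the UUIDMAC= option of the CAS statement). */
cas mysess1 uuidmac=sess_uuid;

/* Connect to the same session by UUID rather than by session name.
   Double quotes let the macro variable resolve. */
proc scoreaccel sessuuid="&sess_uuid";
  runmodel
    modelname="model01"
    modeltable="modeltable1"
    intable="model01_score_data"
    outtable="model01_out_data"
  ;
quit;
```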
DELETEMODEL Statement
Deletes a model from CAS or an external database.
Restriction: This statement is available as of SAS Viya 3.4 and SAS 9.4M6.
Examples: “Example 1: Publishing, Running, and Deleting a DATA Step Model in CAS” on page 2329
“Example 2: Publishing, Running, and Deleting a DS2 Model in Teradata” on page 2331
Syntax
DELETEMODEL
MODELNAME="model-name"
MODELTABLE="model-table" | "caslib.model-table" | "schema.model-table"
<delete-model-options>;
Required Arguments
MODELNAME="model-name"
specifies the name of the model to be deleted.
Alias MODEL=
Applies to CAS
Hadoop
Teradata
MODELTABLE="model-table" | "caslib.model-table" | "schema.model-table"
Applies to CAS
Teradata
Requirement Deleting a model from CAS involves removing a row from a model
table. In order to remove the model, the model table must first be
loaded to the CAS server.
Applies to Teradata
CASLIB="caslib"
specifies the name of the caslib associated with the model table when deleting
a model from CAS. When deleting models in Teradata or Hadoop, this is the
name of the caslib whose associated data source is used to obtain options that
are specific to the external database. This includes any user credentials that are
specified in the data source definition.
Applies to CAS
Hadoop
Teradata
Requirement The caslib that is specified as the first part of a two-part model
table name takes precedence over the caslib that is specified with
the CASLIB option.
CLASSPATH="classpath"
specifies the class path used in the Hadoop call context. The class path can be a
folder or individual JAR files. The Hadoop configuration folder must be included
in the class path.
Applies to Hadoop
DATABASE="database-name"
specifies the name of the Teradata database.
Applies to Teradata
Note This option provides the connection option that names the Teradata
database to be accessed.
DELETEGLOBAL=YES | NO
specifies whether the model is deleted from a global model table in CAS. If
DELETEGLOBAL=YES, then the model is deleted from the global model table. If
DELETEGLOBAL=NO, then the model is deleted from the model table in the
local CAS session, leaving the global table unaltered.
Default NO
Applies to CAS
JOBMANAGEMENTURL="rest-url"
specifies the URL used to submit execution requests over a REST interface to
services such as Apache Livy.
Alias RESTURL=
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
MODELDIR="model-directory"
specifies the root HDFS folder containing the model directory from which the
model is to be deleted.
Applies to Hadoop
PASSWORD="password"
is the password for the user ID on the Hadoop or Teradata server.
Aliases PASS=
PASSWD=
PWD=
Applies to Hadoop
Teradata
PERSISTTABLE=YES | NO
specifies whether the updated model table that results from deleting a model
is saved to the caslib data source that is associated with the table.
Default NO
Applies to CAS
Requirement If the model table already exists in the caslib data source, you
must also specify REPLACETABLE=YES to save the table.
PROMOTETABLE=YES | NO
specifies whether the updated model table that results from deleting a model
is promoted to global scope on the CAS server.
Default NO
Applies to CAS
REPLACETABLE=YES | NO
specifies whether an existing table in the caslib data source can be replaced
when the updated model table that results from deleting a model is saved to
that data source.
Default YES
Applies to CAS
SCHEMA="schema-name"
specifies the name of the Teradata schema.
Applies to Teradata
Note This option provides the connection option that the Teradata
database uses to qualify the Teradata tables.
SERVER="server"
specifies the name of the Teradata server.
Applies to Teradata
TARGET=CAS | HADOOP | TERADATA
CAS
specifies to delete the model from a model table in CAS.
HADOOP
specifies to delete the model from the Hadoop server.
TERADATA
specifies to delete the model from a model table in the Teradata database.
Default CAS
Applies to CAS
Hadoop
Teradata
USERNAME='id'
is an authorized user ID on the Hadoop or Teradata server.
Aliases USER=
USERID=
UID=
Applies to Hadoop
Teradata
WEBHDFSURL="webhdfs-url"
specifies the URL used to access the Hadoop distributed file system through the
REST API.
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
Note Use this argument when you delete, publish, or run a model in a
platform that is configured to access the distributed file system
through the REST API.
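The following sketch shows how the CAS-specific DELETEMODEL options described above combine. The model and table names are hypothetical, CASUSER is assumed as the caslib in the two-part table name, and the model tables are assumed to be loaded to the CAS server already, as required for deleting a model from CAS.

```sas
/* Hypothetical model and table names; model tables already loaded in CAS. */
proc scoreaccel sessref=mysess1;

  /* Delete model01 from the global model table. With the default,
     DELETEGLOBAL=NO, the session-local model table is used instead. */
  deletemodel
    modelname="model01"
    modeltable="modeltable1"
    deleteglobal=yes
  ;

  /* Delete model02 from a model table in the CASUSER caslib and save the
     updated table to that caslib's data source. REPLACETABLE=YES is the
     default; it is stated explicitly because an existing copy of the
     table in the data source must be replaceable for the save to work. */
  deletemodel
    modelname="model02"
    modeltable="casuser.modeltable2"
    persisttable=yes
    replacetable=yes
  ;

quit;
```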
PUBLISHMODEL Statement
Publishes a model in CAS or an external database.
Examples: “Example 1: Publishing, Running, and Deleting a DATA Step Model in CAS” on page 2329
“Example 2: Publishing, Running, and Deleting a DS2 Model in Teradata” on page 2331
“Example 3: Publishing and Running a DATA Step Model in Teradata Using a CASLIB” on page 2334
“Example 4: Publishing and Running a Model in Hadoop” on page 2336
“Example 5: Publishing and Running a Model in Hadoop (Hive)” on page 2337
Syntax
PUBLISHMODEL
MODELNAME="model-name"
MODELTABLE="model-table" | "caslib.model-table" | "schema.model-table"
PROGRAMFILE="file-path" | fileref
<publish-model-options>;
Required Arguments
MODELNAME="model-name"
specifies the name of the model to be published.
Alias MODEL=
Applies to CAS
Hadoop
Teradata
MODELTABLE="model-table" | "caslib.model-table" | "schema.model-table"
Applies to CAS
Teradata
Interaction The caslib that is specified as the first part of a two-part model
table name takes precedence over the caslib that is specified with
the CASLIB option.
PROGRAMFILE="file-path" | fileref
specifies the file that contains the model program to be published.
Applies to CAS
Hadoop
Teradata
Applies to Teradata
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
CASLIB="caslib"
specifies the name of the caslib in which the model table is created or to which the
updated model table is written, when publishing a model to CAS.
Applies to CAS
Hadoop
Teradata
Interaction The caslib that is specified as the first part of a two-part model
table name takes precedence over the caslib that is specified with
the CASLIB option.
CLASSPATH="class_path"
specifies the class path used in the Hadoop call. The class path must include
the Hadoop configuration folder and it must also include either a folder
containing the JAR files or the individual JAR files.
Applies to Hadoop
DATABASE="database-name"
specifies the name of the Teradata database.
Applies to Teradata
Note This option provides the connection option that names the Teradata
database to be accessed.
FORMATFILE=file-path | fileref
specifies the file that contains the user-defined format XML definition to be
published.
Applies to CAS
Hadoop
Teradata
FORMATITEMSTOREFILE=file-path | fileref
specifies the file containing the format item store to be published.
Applies to CAS
Hadoop
Teradata
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
JOBMANAGEMENTURL="rest-url"
specifies the URL used to submit execution requests over a REST interface to
services such as Apache Livy.
Alias RESTURL=
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
KEEPLIST=YES | NO
specifies whether to include a KEEP statement in the DS2 model program that
was automatically generated from an analytic store model.
Default NO
Applies to CAS
Hadoop
Teradata
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
MODELDIR="model-directory"
specifies the root HDFS folder where the model directory is created.
Applies to Hadoop
MODELNOTES="model-notes"
specifies the model notes to be written to the model table.
Applies to CAS
Teradata
MODELTYPE=DATASTEP | DS2
specifies the type of the input model program.
DATASTEP
specifies that the input model program is DATA step code.
Alias DS
DS2
specifies that the input model program is DS2 code.
Default DS2
Applies to CAS
Hadoop
Teradata
Note The DATA step code is converted to DS2 code before being bundled
into an item store.
MODELUUID="model-uuid"
specifies the model UUID that is written to the model table.
Applies to CAS
Teradata
OUTDIR="work-directory"
specifies the local output directory that contains the program file that was
converted from DATA step to DS2.
Applies to CAS
Hadoop
Teradata
PASSWORD="password"
is the password for the user ID on the Hadoop or Teradata server.
Aliases PASS=
PASSWD=
PWD=
Applies to Hadoop
Teradata
PERSISTTABLE=YES | NO
specifies whether the updated model table should be saved to the caslib data
source associated with the table.
Default NO
Applies to CAS
Requirement If the model table already exists in the caslib data source, you
must also specify REPLACETABLE=YES to save the table.
PROMOTETABLE=YES | NO
specifies whether the updated model table should be promoted to global scope
on the CAS server.
Default NO
Applies to CAS
PUBLISHGLOBAL=YES | NO
specifies whether the model is published to a global model table in CAS. If
PUBLISHGLOBAL=YES, then the model is published to a global model table. If
PUBLISHGLOBAL=NO, then the model is published to the model table in the
local CAS session, leaving the global model table unaltered.
Default NO
Applies to CAS
REPLACEMODEL=YES | NO
specifies whether to allow an existing model in the model table to be replaced
by the model being published.
Default YES
Applies to CAS
Hadoop
Teradata
REPLACETABLE=YES | NO
specifies whether to allow an existing model table to be replaced when the
updated model table is saved to the caslib data source.
Default YES
Applies to CAS
SCHEMA="schema-name"
specifies the name of the Teradata schema.
Applies to Teradata
Note This option provides the connection option that the Teradata
database uses to qualify the Teradata tables.
SERVER="server-name"
specifies the name of the Teradata or Hive server.
Applies to Hadoop
Teradata
Alias STOREFILE=
Applies to CAS
Hadoop
Teradata
Interaction As of SAS Viya 3.4 and SAS 9.4M6, if a single analytic store model is
specified without an accompanying DS2 program, the KEEPLIST
option can also be specified.
Tip The list of files can be a mixture of file pathnames and file
references.
Applies to CAS
Hadoop
Teradata
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
TARGET=CAS | HADOOP | TERADATA
CAS
specifies to publish to a model table in CAS.
HADOOP
specifies to publish to the Hadoop server.
TERADATA
specifies to publish to a model table in the Teradata database.
Default CAS
Applies to CAS
Hadoop
Teradata
USERNAME='ID'
is an authorized user ID on the Hadoop or Teradata server.
Aliases USER=
USERID=
UID=
Applies to Hadoop
Teradata
VARXMLFILE=file-path | fileref
specifies the file that contains the variable metadata XML to be used during
translation of an input DATA step model program to DS2.
Alias XMLFILE=
Applies to CAS
Hadoop
Teradata
WEBHDFSURL="webhdfs-url"
specifies the URL used to access the Hadoop distributed file system through the
REST API.
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
Note Use this argument when you delete, publish, or run a model in a
platform that is configured to access the distributed file system
through the REST API.
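The following sketch publishes a DS2 model to Hadoop through the REST interfaces that JOBMANAGEMENTURL= and WEBHDFSURL= enable. The host names, ports, and URL paths are hypothetical: 8998 is the usual Apache Livy default port and /webhdfs/v1 is the standard WebHDFS path prefix, but the exact URL form that your site expects is an assumption to check. Whether CLASSPATH= is also needed in this configuration is not stated here, so add it if your site requires it.

```sas
/* Requires SAS Viya 3.5 (August 2021 release, or earlier with a hot fix).
   All host names, ports, paths, and credentials are placeholders. */
proc scoreaccel sessref=mysess1;
  publishmodel
    target=hadoop
    modelname="simple01"
    modeltype=DS2
    programfile="/score/simple/simple.ds2"
    modeldir="/data/model/dlm/ds2"
    username="test"
    password="XXXXX"
    /* Apache Livy endpoint that receives execution requests */
    jobmanagementurl="http://livy.example.com:8998"
    /* WebHDFS endpoint for access to the distributed file system */
    webhdfsurl="http://namenode.example.com:9870/webhdfs/v1"
  ;
quit;
```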
RUNMODEL Statement
Runs a model in CAS or an external database.
Examples: “Example 1: Publishing, Running, and Deleting a DATA Step Model in CAS” on page 2329
“Example 2: Publishing, Running, and Deleting a DS2 Model in Teradata” on page 2331
“Example 3: Publishing and Running a DATA Step Model in Teradata Using a CASLIB” on page 2334
“Example 4: Publishing and Running a Model in Hadoop” on page 2336
“Example 5: Publishing and Running a Model in Hadoop (Hive)” on page 2337
Syntax
RUNMODEL
MODELNAME="model-name"
MODELTABLE="model-table" | "caslib.model-table" | "schema.model-table"
<run-model-options>;
Required Arguments
MODELNAME="model-name"
specifies the name of the model to run.
Alias MODEL=
Applies to CAS
Hadoop
Teradata
MODELTABLE="model-table" | "caslib.model-table" | "schema.model-table"
Applies to CAS
Teradata
Requirement The caslib that is specified as the first part of a two-part model
table name takes precedence over the caslib that is specified with
the CASLIB option.
Applies to Teradata
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
CASLIB="caslib"
specifies the name of the caslib associated with the model table, input table,
and output table, when running a model in CAS. When running models in
Teradata or Hadoop, this is the name of the caslib whose associated data source is
used to obtain options that are specific to the external database. This includes
any user credentials that are specified in the data source definition.
Applies to CAS
Hadoop
Teradata
Requirement The caslib that is specified as the first part of a two-part model
table name takes precedence over the caslib that is specified with
the CASLIB option.
CLASSPATH="class_path"
specifies the class path used in the Hadoop call context. The class path
contains a folder or individual JAR files. The Hadoop configuration folder must
be included in the class path if you are not specifying the CONFIGPATH option.
If you are using Spark, Hadoop JAR files and Spark JAR files must be located in
separate folders, and the Spark JAR folder must be specified as the last entry in
the class path.
Applies to Hadoop
CONFIGPATH="configuration-path"
specifies a single folder where all the Hadoop and Spark configuration files
reside.
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
CUSTOMJAR="file-path"
specifies the local JAR file that contains the user-provided custom reader. The
custom JAR file is automatically copied to the Hadoop cluster during job
submission.
Applies to Hadoop
DATABASE="database-name"
specifies the name of the Teradata database.
Applies to Teradata
Note This option provides the connection option that names the Teradata
database to be accessed.
DBMAXTEXT=number-of-bytes
specifies the maximum number of bytes to allocate for STRING data type
columns. This option does not apply to CHAR or VARCHAR columns.
Default 32767
Range 1–32767
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
EPOPTIONS="options-string"
specifies the options that are passed to the SAS_SCORE_EP stored procedure
that invokes the SAS Embedded Process for Teradata.
Applies to Teradata
FORCEOVERWRITE=YES | NO
specifies whether to force deletion of the output data directory before running
the Hadoop MapReduce job.
Applies to Hadoop
HIVEPORT=integer
specifies the Hive server port number.
Applies to Hadoop
INDATASET="input-dataset"
specifies the name of the Spark dataset passed as input to the SAS Embedded
Process.
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
See INTABLE= option
INHDMD="file-path"
specifies the name of the input Hadoop metadata file on HDFS.
Applies to Hadoop
See INTABLE= option
INQUERY="sql-query"
specifies an SQL SELECT statement that defines the inputs to the SAS
Embedded Process for Teradata.
Applies to Teradata
Alias INPUTTABLE=
Applies to CAS
Hadoop
Teradata
See INHDMD= option
INQUERY= option
JOBMANAGEMENTURL="rest-url"
specifies the URL used to submit execution requests over a REST interface to
services such as Apache Livy.
Alias RESTURL=
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
Alias KEEPLISTCOLS=
Applies to Hadoop
KEEPLISTFILE="file-path"
specifies the name of the file that contains a list of columns to be kept by the
DS2 program.
Applies to Hadoop
MODELDIR="model-directory"
specifies the root HDFS folder where the model directory is created.
Applies to Hadoop
OUTDATASET="output-dataset"
specifies the name of the output Spark dataset to be created by the SAS
Embedded Process.
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
See OUTTABLE= option
OUTHDMD="file-path"
specifies the name of the output Hadoop metadata file that is created by the
SAS Embedded Process.
Applies to Hadoop
See OUTTABLE= option
OUTKEY=(column, …, column)
specifies the name of one or more columns used for the primary index of the
output table that is created by the SAS Embedded Process for Teradata.
Applies to Teradata
OUTPUTFOLDER="directory-path"
specifies the name of the directory where the output files are stored.
Applies to Hadoop
OUTPUTFORMATCLASS="class-name"
specifies the name of the output format class in dot notation that is used to
write the output records.
Applies to Hadoop
OUTRECORDFORMAT=BINARY | DELIMITED
specifies the format of the output record that is produced by the SAS
Embedded Process for Hadoop.
BINARY
specifies that the output record is binary.
DELIMITED
specifies that the output record is delimited.
Default DELIMITED
Applies to Hadoop
OUTSCHEMA="output-schema"
specifies the Hive output database schema name when running a model in
Hadoop.
Applies to Hadoop
Alias OUTPUTTABLE=
Applies to CAS
Hadoop
Teradata
See OUTHDMD= option
OUTTABLEOPTIONS="options-string"
provides user-specified options that are appended to the Hive CREATE TABLE
statement.
Applies to Hadoop
PASSWORD="password"
is the password for the user ID on the Hadoop or Teradata server.
Aliases PASS=
PASSWD=
PWD=
Applies to Hadoop
Teradata
PLATFORM=MAPRED | SPARK
specifies the platform where the Hadoop Embedded Process is to be executed.
MAPRED
specifies to run the model in MapReduce.
SPARK
specifies to run the model in Spark.
Default MAPRED
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.4 and SAS 9.4M6.
PROPERTIES="name1=value" <PROPERTIES="name2=value"> |
PROPERTIES=("name=value"<, "name=value", …>)
specifies a Hadoop configuration property as a name-value pair. Any Hadoop
configuration property can be assigned by using this option. For multiple
properties, either specify the PROPERTIES argument multiple times, once for each
property, or specify a single argument that contains a comma-separated list of
name-value pairs enclosed in parentheses.
Applies to Hadoop
Requirement If your output is to a Hive table and the Hive database folder is
under an HDFS encrypted zone, you must set the SAS Embedded
Process temporary folder to a location that is under the same
encrypted zone. To do this, set the sas.ep.tempdir configuration
property. Here is an example:
properties="sas.ep.tempdir=yourSASEPTemporaryFolder"
SCHEMA="schema-name"
specifies the name of the Teradata schema.
Applies to Teradata
Note This option provides the connection option that the Teradata
database uses to qualify the Teradata tables.
SENDPROGRAM=YES | NO
specifies whether the model program source code should be sent back to the
client and displayed for the user.
Applies to CAS
SERVER="server"
specifies the name of the Teradata or Hive server.
Applies to Hadoop
Teradata
TARGET=CAS | HADOOP | TERADATA
CAS
specifies to run the model in CAS.
HADOOP
specifies to run the model on the Hadoop server.
TERADATA
specifies to run the model in the Teradata database.
Default CAS
Applies to CAS
Hadoop
Teradata
TRACE
runs the SAS Embedded Process for Hadoop with traces on.
Applies to Hadoop
USERNAME='id'
is an authorized user ID on the Hadoop or Teradata server.
Aliases USER=
USERID=
UID=
Applies to Hadoop
Teradata
VERBOSE
specifies that the SAS Embedded Process for Hadoop provide additional logging
information.
Applies to Hadoop
WEBHDFSURL="webhdfs-url"
specifies the URL used to access the Hadoop distributed file system through the
REST API.
Applies to Hadoop
Restriction This option is available as of SAS Viya 3.5 (August 2021 release, or
earlier if you apply a hot fix).
Note Use this argument when you delete, publish, or run a model in a
platform that is configured to access the distributed file system
through the REST API.
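The following sketch passes two Hadoop configuration properties by repeating the PROPERTIES argument, with the equivalent single-argument form shown in a comment. sas.ep.tempdir is the property named above for HDFS encrypted zones; mapreduce.job.queuename is a standard Hadoop MapReduce property chosen only for illustration. The temporary folder, queue name, server, and paths are hypothetical.

```sas
/* Hypothetical paths, server, and queue name. */
proc scoreaccel sessref=mysess1;
  runmodel
    target=hadoop
    modelname="simple01"
    username="test"
    modeldir="/user/user1/cas/models"
    server="server2.com"
    intable="carsorc"
    outtable="carsout"
    forceoverwrite=yes
    classpath="/server/sdm/hadoopjars/cdh58/prod:/server/sdm/hadoopcfg/cdh58/prod"
    /* One PROPERTIES argument per property. The temporary folder must be
       in the same encrypted zone as the Hive database folder. */
    properties="sas.ep.tempdir=/securezone/tmp/sasep"
    properties="mapreduce.job.queuename=scoring"
    /* Equivalent single-argument form:
       properties=("sas.ep.tempdir=/securezone/tmp/sasep",
                   "mapreduce.job.queuename=scoring") */
  ;
quit;
```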
Example 1: Publishing, Running, and Deleting a DATA Step Model in CAS
Details
In this PROC SCOREACCEL example, a DATA step model is published and run in
CAS. The model is then deleted.
PROC SCOREACCEL translates DATA step code into DS2 code on the SAS client.
Program
cas mysess1;
libname mycaslib cas casref=mysess1;
proc delete data=mycaslib.model01_score_data; run;
libname eminput "/mydir/scoring/model01";
data mycaslib.model01_score_data;
set eminput.traindata;
id=_n_; run;
quit;
proc scoreaccel sessref=mysess1;
publishmodel
modelname="model01"
modeltype=datastep
modeltable="modeltable1"
programfile="/mydir/scoring/model01/score.sas"
xmlfile="/mydir/scoring/model01/score.xml"
outdir="/user1/score/work/"
modelnotes="Simple model01 test model"
replacemodel=yes
promotetable=no
persisttable=no
;
quit;
proc delete data=mycaslib.model01_out_data; run;
proc scoreaccel sessref=mysess1;
runModel
modelname="model01"
modeltable="modeltable1"
intable="model01_score_data"
outtable="model01_out_data"
;
quit;
proc scoreaccel sessref=mysess1;
deletemodel
modelname="model01"
modeltable="modeltable1"
;
quit;
Program Description
Start a CAS session and assign a CAS libref. The mycaslib libref is used by PROC
DELETE and by the DATA step.
cas mysess1;
libname mycaslib cas casref=mysess1;
Create an input scoring table in CAS. The DELETE procedure removes an existing
input scoring table.
proc delete data=mycaslib.model01_score_data; run;
libname eminput "/mydir/scoring/model01";
data mycaslib.model01_score_data;
set eminput.traindata;
id=_n_; run;
quit;
Publish a model in CAS. The DATA step model is converted to DS2 in this step and
is published in CAS.
proc scoreaccel sessref=mysess1;
publishmodel
modelname="model01"
modeltype=datastep
modeltable="modeltable1"
programfile="/mydir/scoring/model01/score.sas"
xmlfile="/mydir/scoring/model01/score.xml"
outdir="/user1/score/work/"
modelnotes="Simple model01 test model"
replacemodel=yes
promotetable=no
persisttable=no
;
quit;
Run the model in CAS. PROC DELETE first removes any existing output table that
has the same name.
proc delete data=mycaslib.model01_out_data; run;
proc scoreaccel sessref=mysess1;
runModel
modelname="model01"
modeltable="modeltable1"
intable="model01_score_data"
outtable="model01_out_data"
;
quit;
Example 2: Publishing, Running, and Deleting a DS2 Model in Teradata
Details
This PROC SCOREACCEL example publishes and runs a DS2 model in Teradata.
PROC SCOREACCEL invokes the model publishing or DS2 actions in CAS to delete
a model, publish a model, or run a model.
Program
libname mytdlib teradata server=mytdserver user=model password=XXXXX
database=model;
proc delete data=mytdlib.model01_score_data; run;
Program Description
Create the Teradata LIBNAME reference. The Teradata libref is needed for PROC
DELETE and the DATA step program.
libname mytdlib teradata server=mytdserver user=model password=XXXXX
database=model;
Create an input scoring table in Teradata. The DELETE procedure removes an existing
input scoring table.
proc delete data=mytdlib.model01_score_data; run;
libname eminput "/mydir/scoring/model01";
data mytdlib.model01_score_data;
set eminput.traindata;
id=_n_; run;
quit;
Publish a DS2 model to Teradata. The PROGRAMFILE= option specifies the file that
contains the DS2 model.
proc scoreaccel sessref=mysess1;
publishmodel
target=teradata
modelname="model01"
modeltype=DS2
modeltable="modeltable1"
programfile="/mydir/scoring/model01/score.ds2"
modelnotes="Simple model01 test model"
username="model"
password="XXXXX"
server="mytdserver"
database="model"
replacemodel=yes
;
quit;
Example 3: Publishing and Running a DATA Step Model in Teradata Using a CASLIB
Details
In this PROC SCOREACCEL example, a DATA step model is published to and run in
Teradata. PROC SCOREACCEL translates DATA step code into DS2 code on the SAS
client. Here, the user credentials and other database connection information are
obtained from the specified caslib.
Program
libname mytdlib teradata server=mytdserver user=model password=XXXXX
database=model;
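proc delete data=mytdlib.model01_score_data; run;
libname eminput "/mydir/scoring/model01";
data mytdlib.model01_score_data;
set eminput.traindata;
id=_n_; run;
cas mysess1;
proc cas;
session mysess1;
action addCaslib /
caslib="tdlib1"
datasource={
srcType="teradata",
dataTransferMode="parallel",
server="mytdserver",
username="model",
password="XXXXX",
database="model"
};
run;
quit;
proc scoreaccel sessref=mysess1;
publishmodel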
target=teradata
modelname="model01"
modeltype=datastep
modeltable="modeltable1"
programfile="/mydir/scoring/model01/score.sas"
xmlfile="/mydir/scoring/model01/score.xml"
outdir="/score/work/"
modelnotes="Simple model01 test model"
replacemodel=yes
;
runmodel
target=teradata
caslib="tdlib1"
modelname="model01"
modeltable="modeltable1"
intable="model01_score_data"
outtable="model01_out_data"
outkey="id"
;
quit;
Program Description
Create the Teradata connection options and LIBNAME reference.
libname mytdlib teradata server=mytdserver user=model password=XXXXX
database=model;
Create an input scoring table in Teradata. PROC DELETE removes an existing input
scoring table.
proc delete data=mytdlib.model01_score_data; run;
Define a Teradata caslib. The addCaslib action adds the tdlib1 caslib to the
mysess1 session.
proc cas;
session mysess1;
action addCaslib /
caslib="tdlib1"
datasource={
srcType="teradata",
dataTransferMode="parallel",
server="mytdserver",
username="model",
password="XXXXX",
database="model"
};
run;
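Example 4: Publishing and Running a Model in Hadoop
Details
In this PROC SCOREACCEL example, a simple DS2 model is published and run in
Hadoop.
Program
The CLASSPATH= option specifies the folders that contain the Hadoop JAR files and
the Hadoop configuration files that are used to connect to the Hadoop cluster.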
proc scoreaccel sessref=mysess1;
publishmodel
target=hadoop
modelname="simple01"
modeltype=DS2
programfile="/score/simple/simple.ds2"
username="test"
password="XXXXX"
modeldir="/data/model/dlm/ds2"
classpath="/server/sdm/hadoopjars/cdh58/prod:/server/sdm/hadoopcfg/cdh58/prod"
;
quit;
proc scoreaccel sessref=mysess1;
runmodel
target=hadoop
modelname="simple01"
username="test"
password="XXXXX"
modeldir="/data/model/dlm/ds2"
server="server1.com"
inhdmd="/data/model/dlm/meta/simple01.sashdmd"
outhdmd="/data/model/dlm/meta/simple01_out.sashdmd"
outputfolder="/data/model/dlm/temp/simple01"
forceoverwrite=yes
classpath="/server/sdm/hadoopjars/cdh58/prod:/user1/server/sdm/hadoopcfg/cdh58/prod"
;
quit;
Example 5: Publishing and Running a Model in Hadoop (Hive)
Details
In this PROC SCOREACCEL example, a simple DS2 model is published to Hadoop
and executed there with Hive.
Program
The CLASSPATH= option specifies the Hadoop JAR and configuration folders that are
used to connect to the Hadoop cluster. The input and output tables, carsorc and
carsout, already exist on the Hadoop cluster.
proc scoreaccel sessref=mysess1;
publishmodel
target=hadoop
modelname="simple01"
modeltype=DS2
filelocation=local
programfile="/user1/score/simple/simple.ds2"
username="test"
modeldir="/user/user1/cas/models"
classpath="/server/sdm/hadoopjars/cdh58/prod:/user1/server/sdm/hadoopcfg/cdh58/prod"
;
runmodel
target=hadoop
modelname="simple01"
username="test"
modeldir="/user/user1/cas/models"
server="server2.com"
intable="carsorc"
outtable="carsout"
outtableoptions="stored as ORC"
forceoverwrite=yes
classpath="/server/sdm/hadoopjars/cdh58/prod:/user1/server/sdm/hadoopcfg/cdh58/prod"
;
quit;
Details
In this PROC SCOREACCEL example, a DS2 model is run in Spark with the SAS
Embedded Process.
Program
To run in Spark, the cluster must have Spark installed and configured. When the
Hadoop tracer script is run to collect JAR files, the spark folder is created along
with the required Spark JAR files. During this process, the Spark configuration file is
placed into the user’s configuration folder. To run the model in Spark, specify the
PLATFORM=SPARK option in the RUNMODEL statement. To run the model, the
folder containing the client-side Spark JAR files must be specified last in the
CLASSPATH option. In this example, the input and output tables, carsorc and
carsout, already exist on the Hadoop cluster.
proc scoreaccel sessref=mysess1;
runmodel
target=hadoop
modelname="cars"
username="test"
modeldir="/user1/cas/models"
server="server2.com"
intable="carsorc"
outtable="carsout"
forceoverwrite=yes
classpath="/nfs/hadoop/jars"
configpath="/nfs/hadoop/cfg"
platform=SPARK
;
quit;
Example 7: Running a Model in Spark by Using the Apache Livy REST Interface
Details
In this example, PROC SCOREACCEL publishes and runs a model in Spark. The
request to run the model is submitted by using the Apache Livy REST service.
Create a CAS session, and assign a caslib to Spark. A best practice is to specify all
the connection properties in the caslib, so that you do not have to remember
which connection properties are required by the different actions and statements.
cas mysess;
hadoopJarPath="path-to-hadoop-jar-files",
hadoopConfigDir="path-to-hadoop-configuration",
dbmaxtext=512);
Start a continuous session of the SAS Embedded Process for Spark. Most customers
choose to use a continuous session for better performance.
proc cas;
sparkEmbeddedProcess.startSparkEP
caslib="myspark";
run;
quit;
Run the model in Spark. The INTABLE= option specifies the input Spark table that
contains the data to run the model against.
63
SOAP Procedure
For information about using PROC SOAP to invoke web services, see SAS BI Web
Services: Developer’s Guide.
Examples of additional elements that might be included are those for WS-Security
or WS-Addressing. Web Services Security (WS-Security) is a specification that
defines how security measures are implemented in web services to protect them
from external attacks. It is a set of protocols that ensure security for SOAP-based
messages by implementing the principles of confidentiality, integrity, and
authentication. WS-Addressing is a standard for adding addressing information to
SOAP messages. For more information, see Overview of Security for Web Services.
You can set the amount of time to wait for a response from the web service by
using the CONFIGFILE option. The default time to wait is 60 seconds.
Requests must not include an encoding declaration even if the envelope is included.
If the request is being read from a file, the file must be encoded in the same
encoding as the session encoding. Requests are encoded as UTF-8 before being
sent to the web service.
z/OS Specifics: Calling SAS registered services is not available in the z/OS
operating environment. SAS registered web services require WS-Security with
Password Digest, and Password Digest is not supported on z/OS.
PROC SOAP Statement
Syntax
PROC SOAP options <properties>;
Required Arguments
IN=fileref 'your-input-file'
specifies the fileref that is used to input XML data that contains a PROC SOAP
request.
The fileref might have SOAPEnvelope and SOAPHeader elements as part of its
content, but they are not required unless you have specific header information
to provide.
SERVICE
specifies the SAS registered web service that you want to call.
Tip If you use the SERVICE option, then do not use the URL option.
URL
specifies the URL of the web service endpoint.
Tip If you use the URL option, then do not use the SERVICE option.
Optional Arguments
CONFIGFILE
enables you to set the time-out limit for web service calls. The default time-out
is 60 seconds.
DEBUG
enables you to specify an output log file. The DEBUG option turns on wire logging
for httpclient and writes the output to the specified file. The value of this option
is the path or filename to the desired output.
ENVFILE
specifies the location of the SAS environments file.
ENVIRONMENT
specifies to use the environment that is defined in the SAS environments file.
MUSTUNDERSTAND
specifies the setting for the mustUnderstand attribute in the PROC SOAP
header.
OUT=fileref 'your-output-file'
specifies the fileref where the PROC SOAP XML response output is written.
PROXYDOMAIN
specifies an HTTP proxy server domain.
Tip This option is required only if your proxy server requires domain- or realm-
qualified credentials.
PROXYHOST
specifies an HTTP proxy server host name.
PROXYPASSWORD
specifies an HTTP proxy server password. Encodings that are produced by
PROC PWENCODE are supported.
Tip This option is required only if your proxy server requires credentials.
PROXYPORT
specifies an HTTP proxy server port.
PROXYUSERNAME
specifies an HTTP proxy server user name.
Tip This option is required only if your proxy server requires credentials.
SOAPACTION
specifies a SOAPAction element to invoke on the web service.
SRSURL
specifies the URL of the System Registry Service.
WEBAUTHDOMAIN
specifies that a user name and password be retrieved from metadata for the
specified authentication domain.
WEBDOMAIN
specifies the domain or realm for the user name and password.
WEBPASSWORD
specifies a password for basic web service authentication. Encodings that are
produced by PROC PWENCODE are supported.
WEBUSERNAME
specifies a user name for basic web service authentication.
WSSAUTHDOMAIN
specifies that the active connection to the SAS Metadata Server is used to
retrieve credentials in the specified authentication domain.
If credentials are found, they are used as the credentials for a WS-Security
UsernameToken.
WSSUSERNAME
specifies a WS-Security user name. If a value is set, then WS-Security is used
and a UsernameToken is sent with the web service request for user
authentication, security, and encryption.
WSSPASSWORD
specifies a WS-Security password that is the password for WSSUSERNAME.
Encodings that are produced by PROC PWENCODE are supported.
Properties
ENVELOPE
specifies that a SOAP envelope is to be included in the response.
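For orientation, here is a minimal sketch of how several of the options above might combine in one PROC SOAP step that uses basic web service authentication. The file paths, endpoint URL, SOAPAction value, user name, and password are hypothetical placeholders, not values from this guide; the encoded password would be the value that PROC PWENCODE writes to the log for the real password.

```sas
/* Write an encoded form of a placeholder password to the SAS log */
proc pwencode in='my-password';
run;

/* Hypothetical request and response files */
filename request "C:\temp\request.xml";
filename response "C:\temp\response.xml";

/* URL= is used, so SERVICE= is not specified */
proc soap in=request
          out=response
          url="http://www.example.com/services/OrderService"
          soapaction="urn:example:GetOrder"
          webusername="myuser"
          webpassword="{SAS002}PLACEHOLDER";  /* paste the PROC PWENCODE output here */
run;
```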
Usage: SOAP Procedure
Note: All discussion of TLS is also applicable to the predecessor protocol, Secure
Sockets Layer (SSL).
When you require client authentication, TLS has a renegotiation feature that
prevents unauthorized text from being added to the beginning or end of an
encrypted data stream. This feature is used specifically by certificate-based client
authentication. However, the Java Secure Socket Extension (JSSE) disables TLS
renegotiation by default. As a result, when you attempt to access a web
resource that requires certificate-based client authentication through the
interception proxy, the following Java TLS error message is generated:
(javax.net.ssl.SSLException): HelloRequest followed by an unexpected handshake message
However, it is still possible to enable the TLS renegotiation in Java by setting the
following system property to true before the JSSE library is initialized.
sun.security.ssl.allowUnsafeRenegotiation
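Following the same JREOPTIONS pattern that this section uses for the truststore, the property could be set when SAS starts, on the command line or in a SAS configuration file. This is a sketch only; enabling unsafe renegotiation weakens TLS protection, so consider it only when the proxy configuration requires it.

```sas
-jreoptions (-Dsun.security.ssl.allowUnsafeRenegotiation=true)
```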
Clients must ensure that the CA that signed the certificate has been added to their
truststore. You can provide the path to the truststore on the SAS command line or
in a SAS configuration file using JREOPTIONS.
-jreoptions (-Djavax.net.ssl.trustStore=full-path-to-the-trust-store)
Note: As a best practice, anytime that you are using passwords in configuration
files, use file system permissions that allow write access only by the owner of the
SAS or SAS Viya deployment. By default, the owner is "sas".
Here is an example using the SAS command line. The example uses the Windows
operating environment.
-JREOPTIONS (-Djavax.net.ssl.trustStore=!SASHOME/../config/etc/SASSecurityCertificateFramework/cacerts/trustedcerts.jks)
The second method that is used to call SAS registered web services uses the SAS
environments file to specify the endpoint of the service that you are calling. Using
this method, you can indicate the location of the SAS environments file in one of
two ways:
- use the ENVFILE option in PROC SOAP
You must also specify the desired environment within that file using the
ENVIRONMENT option, and specify the name of the service that you are calling
using the SERVICE option.
In both cases, the WSSUSERNAME and WSSPASSWORD options are set to the user
name and password that are required to contact the Security Token Service.
PROC SOAP uses log4j for logging requests and responses so that you can trace
them. To create a log file that contains the request issued and the response
received, create a file that has the following contents:
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=wire.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern =%d %5p [%c] %m%n
log4j.logger.httpclient.wire=DEBUG, FILE
To turn on logging to see the SOAP request and response for an entire SAS session,
you must restart SAS with the -jreoptions command line option. Enable logging by
setting a Java system property using -jreoptions on the SAS command line or in a
SAS configuration file. The following syntax shows how to set the system property:
-jreoptions (-Dlog4j.configuration=path-to-log4j-config-file)
The following example shows how to use the entry on the SAS command line. The
example uses the Windows operating environment.
Using the configuration file and JREOPTIONS method described above turns on
logging for the entire SAS session. To turn on httpclient.wire for an individual PROC
SOAP call, use the DEBUG option.
You can use the DEBUG option to turn on wire logging for the duration of a PROC
SOAP call. The value of the DEBUG option is the path or filename to the output file.
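A minimal sketch of wire logging for a single call, assuming filerefs named REQUEST and RESPONSE have already been assigned and the request XML has been written to REQUEST. The endpoint URL, SOAPAction value, and log path are hypothetical placeholders.

```sas
/* Wire logging is written to the DEBUG= file for this PROC SOAP step only */
proc soap in=request
          out=response
          url="http://www.example.com/services/OrderService"
          soapaction="urn:example:GetOrder"
          debug="C:\temp\soap-wire.log";
run;
```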
Details
This example uses a proxy and a SOAPEnvelope element.
Program
filename request temp;
data _null_;
file request;
input;
put _infile_;
datalines4;
<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:soap="http://www.SoapClient.com/xml/SoapResponder.xsd">
   <soapenv:Header/>
   <soapenv:Body>
      <soap:Method1
         soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
         <bstrParam1 xsi:type="xsd:string">apple</bstrParam1>
         <bstrParam2 xsi:type="xsd:string">zebra3</bstrParam2>
      </soap:Method1>
   </soapenv:Body>
</soapenv:Envelope>
;;;;
run;
Details
This example uses a proxy and does not use a SOAPEnvelope element.
Program
filename request temp;
data _null_;
file request;
input;
put _infile_;
datalines4;
<soap:Method1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:soap="http://www.SoapClient.com/xml/SoapResponder.xsd"
   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
   <bstrParam1 xsi:type="xsd:string">apple</bstrParam1>
   <bstrParam2 xsi:type="xsd:string">zebra3</bstrParam2>
</soap:Method1>
;;;;
run;
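The two proxy example programs stop once the request XML has been written. A PROC SOAP step along the following lines could then send that request through the proxy. This is a sketch that assumes the request is stored in a fileref named REQUEST; the endpoint URL, SOAPAction value, and proxy host and port are hypothetical placeholders, not values from this guide.

```sas
filename response temp;

/* Hypothetical endpoint and proxy settings */
proc soap in=request
          out=response
          url="http://www.example.com/soap/endpoint"
          soapaction="http://www.example.com/Method1"
          proxyhost="proxy.example.com"
          proxyport=8080;
run;
```

Add PROXYUSERNAME= and PROXYPASSWORD= only if the proxy server requires credentials.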
Details
This example uses the DNEONLINE Calculator to add 7 and 4.
Program
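filename request temp;
filename response temp;
data _null_;
file request;
input;
put _infile_;
datalines4;
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:tem="http://tempuri.org/">
   <soapenv:Header/>
   <soapenv:Body>
      <tem:Add>
         <tem:intA>7</tem:intA>
         <tem:intB>4</tem:intB>
      </tem:Add>
   </soapenv:Body>
</soapenv:Envelope>
;;;;
run;
/* Endpoint and SOAPAction of the public DNEONLINE Calculator service (assumed) */
proc soap in=request out=response
   url="http://www.dneonline.com/calculator.asmx"
   soapaction="http://tempuri.org/Add";
run;
data _null_;
infile response;
input;
put _infile_;
run;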
After running the program in SAS 9.4M6, the RESPONSE in the SAS log file shows
the following result:
<AddResponse xmlns="http://tempuri.org"><AddResult>11</AddResult></AddResponse>
Chapter 64 / SORT Procedure
Note: If extended attributes are defined on the input data set, PROC SORT
propagates the extended attributes to the output data set. For information about
extended attributes, see “Extended Attributes” on page 567.
NOTE: There were six observations read from the data set WORK.EMPLOYEE.
NOTE: The data set WORK.EMPLOYEE has six observations and three variables.
NOTE: PROCEDURE SORT used:
real time 0.01 seconds
cpu time 0.01 seconds
1 Belloit 1988
2 Wesley 2092
3 Lemeux 4210
4 Arnsbarger 5466
5 Pierce 5779
6 Capshaw 7338
The following output shows the results of a more complicated sort by three
variables. The businesses in this example are sorted by town, then by debt from
highest amount to lowest amount, then by account number. For an explanation of
the program that produces this output, see “Example 2: Sorting in Descending
Order” on page 2394.
Obs Company Town Debt Account Number
Threaded Sorting
The THREADS system option enables threaded sorting. Threaded sorting achieves
a degree of parallelism in the sorting operations. This parallelism is intended to
reduce the real time to completion for a given operation and therefore limit the cost
of additional CPU resources. For more information, see “Support for Parallel
Processing” in SAS Language Reference: Concepts.
The multi-threaded SAS sort can also be invoked when you specify the THREADS
option in the PROC SORT statement. The multi-threaded sort stores all temporary
data in a single utility file within one of the locations that are specified by the
UTILLOC= system option. The size of this utility file is proportional to the amount
of data that is read from the input data set. A second utility file of the same size
can be created in another of these locations when the amount of data that is read
from the input data set is large or the amount of memory that is available to the
SORT procedure is small. For more information, refer to “UTILLOC= System Option”
in SAS System Options: Reference.
Note: The TAGSORT option on page 2383 does not support threaded sorting.
The multi-threaded SAS sort can be invoked when the THREADS system option is
specified and the value of the CPUCOUNT= system option is greater than 1. The
value of the SAS system option CPUCOUNT= affects the performance of the
threaded sort. CPUCOUNT= suggests how many system CPUs are available for use
by the threaded procedures.
For more information, see the “THREADS System Option” in SAS System Options:
Reference and the “CPUCOUNT= System Option” in SAS System Options: Reference.
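A minimal sketch of requesting a threaded sort. WORK.BIG and its variable ID are hypothetical, and CPUCOUNT=4 is only an illustrative value; in practice the setting should reflect the CPUs that are actually available.

```sas
/* Enable threading for the session and suggest four CPUs */
options threads cpucount=4;

/* THREADS in the PROC SORT statement also requests the threaded sort;
   it cannot be combined with TAGSORT */
proc sort data=work.big out=work.big_sorted threads;
   by id;
run;
```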
By default, PROC SORT uses either the EBCDIC or the ASCII collating sequence
when it compares character values, depending on the environment under which the
procedure is running.
For more information about the various collating sequences and when they are
used, see “Collating Sequence” in SAS National Language Support (NLS): Reference
Guide.
Note: ASCII and EBCDIC represent the family names of the session encodings. The
sort order can be determined by referring to the encoding.
EBCDIC Order
The z/OS operating environment uses the EBCDIC collating sequence.
The sorting order of the English-language EBCDIC sequence is consistent with the
following sort order example.
a b c d e f g h i j k l m n o p q r ~ s t u v w x y z
{ A B C D E F G H I } J K L M N O P Q R \ S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
The main features of the EBCDIC sequence are that lowercase letters are sorted
before uppercase letters, and uppercase letters are sorted before digits. Note also
that some special characters interrupt the alphabetic sequences. The blank is the
smallest character that you can display.
ASCII Order
The operating environments that use the ASCII collating sequence include the
following:
- UNIX and its derivatives
- Windows
- OpenVMS
From the smallest to the largest character that you can display, the English-
language ASCII sequence is consistent with the order shown in the following table.
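blank ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~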
The main features of the ASCII sequence are that digits are sorted before
uppercase letters, and uppercase letters are sorted before lowercase letters. The
blank is the smallest character that you can display.
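The contrast between the two sequences can be seen with three one-character values. This sketch uses a hypothetical data set WORK.CHARS and the ASCII and EBCDIC collating-sequence options of PROC SORT, so it runs the same way on either kind of system.

```sas
data work.chars;
   input c $1.;
   datalines;
a
A
1
;
run;

/* ASCII: digits, then uppercase, then lowercase -> 1, A, a */
proc sort data=work.chars out=work.ascii_order ascii;
   by c;
run;

/* EBCDIC: lowercase, then uppercase, then digits -> a, A, 1 */
proc sort data=work.chars out=work.ebcdic_order ebcdic;
   by c;
run;

proc print data=work.ascii_order;
run;
proc print data=work.ebcdic_order;
run;
```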
If you want to provide your own collating sequences or change a collating sequence
provided for you, then use the TRANTAB procedure to create or modify translation
tables. When you create your own translation tables, they are stored in your
PROFILE catalog, and they override any translation tables that have the same
name in the HOST catalog. For complete details, see “TRANTAB Procedure” in SAS
National Language Support (NLS): Reference Guide.
Note: System managers can modify the HOST catalog by copying newly created
tables from the PROFILE catalog to the HOST catalog. Then all users can access
the new or modified translation table.
Before PROC SORT sorts a data set, it checks the stored sort information. If you try
to sort a data set the way it is currently sorted, then PROC SORT does not perform the
sort and writes a message to the log to that effect. To override this behavior, use
the FORCE option. If you try to sort a data set the same way it is currently sorted
and you specify an OUT= data set, then PROC SORT simply makes a copy of the
DATA= data set.
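A minimal sketch of this behavior, assuming a hypothetical data set WORK.EMPLOYEE with a variable IDNUMBER:

```sas
proc sort data=work.employee;
   by idnumber;
run;

/* Same BY list again: the stored sort information matches, so the
   sort is skipped and a note is written to the log */
proc sort data=work.employee;
   by idnumber;
run;

/* FORCE makes PROC SORT sort the data set anyway */
proc sort data=work.employee force;
   by idnumber;
run;
```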
To override the sort information that PROC SORT stores, use the _NULL_ value
with the SORTEDBY= data set option. Refer to the “SORTEDBY= Data Set Option”
in SAS Data Set Options: Reference.
If you want to change the sort information for an existing data set, then use the
SORTEDBY= data set option in the MODIFY statement in the DATASETS
procedure. For more information, see “MODIFY Statement” on page 644.
To access the sort information that is stored with a data set, use the CONTENTS
statement in PROC DATASETS. For more information, see “CONTENTS Statement”
on page 603.
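The PROC DATASETS statements mentioned above might be used as in the following sketch, which assumes a hypothetical data set WORK.EMPLOYEE with a variable IDNUMBER. Sort information set with SORTEDBY= is asserted by the user, not verified by SAS.

```sas
proc datasets library=work nolist;
   /* Record that EMPLOYEE is sorted by IDNUMBER */
   modify employee(sortedby=idnumber);
   run;

   /* Display the sort information that is stored with the data set */
   contents data=employee;
   run;

   /* Remove the stored sort information */
   modify employee(sortedby=_null_);
   run;
quit;
```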
The number of variables by which you can sort a data set with PROC SORT is
limited only by available memory. The number of columns by which you can order
the rows of a result set using PROC SQL is also limited only by available memory.
The sort indicator, whether stored in the metadata of a Base data set or
represented in memory, is limited to 127 variables. For this reason, up to 127
variables can be stored in the sort indicator or listed on the SORTEDBY= data set
option. If you are sorting by more than 127 variables, then only the first 127 are
recorded in the sort indicator. If you sort the data set again by the entire list of BY
variables, the data set is not recognized as being sorted, because the additional
variables (beyond 127) are not found within the sort indicator. For a detailed
If the entire data set has been read and no out-of-sequence observations have been
found, then one of two actions is taken. If no output data set has been specified,
the sort order metadata of the input data set is updated to indicate that the
sequence has been verified. This verification notes that the data set is validly
sorted according to the specified BY variables. Otherwise, the data set is
considered sorted and either the input data set metadata is updated or, if OUT= has
been specified, the data is copied to an output data set.
If observations within the data set are not in sequence, then the data set is sorted.
If the “NODUPKEY” on page 2378 option has been specified, then the sequence
checking determines whether observations with duplicate keys are present in the
data set. If observations with duplicate keys are found, then the data set is
considered unsorted and a sort is performed. Otherwise, the data set is considered
sorted and actions are taken where either the metadata of the input data set is
updated or, if OUT= has been specified, data is copied to the output data set. The
actions taken are described in more detail in the previous paragraphs.
If the metadata of the input data set indicates that the data is already sorted
according to the key variables listed in the BY statement and the input data set has
been validated, then neither sequence checking nor sorting is performed.
See “Sorted Data Sets” in SAS Language Reference: Concepts, and for interactions, see
the “SORTVALIDATE System Option” in SAS System Options: Reference.
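A minimal sketch of sequence checking with the PRESORTED option, assuming a hypothetical data set WORK.EMPLOYEE with a variable IDNUMBER. If the observations are already in order, the data is copied to the OUT= data set without sorting; otherwise it is sorted.

```sas
proc sort data=work.employee out=work.employee_sorted presorted;
   by idnumber;
run;
```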
Note: Only PROC SORT and PROC SQL are affected when the
SORTSEQ=LINGUISTIC system option is specified.
In SAS 9.4, the ICU library incorporated by SAS and used by PROC SORT is ICU
version 4.8.1. This ICU version uses locale data from version 2.0 of the Unicode
Common Locale Data Repository (CLDR). For in-depth information about the UCA
algorithm or the International Components for Unicode (ICU) library
implementation, see Download the ICU 4.8 Release and find the CLDR 2.0 Release
Note at CLDR Releases/Downloads.
In SAS Viya, the ICU library version incorporated by SAS and used by PROC SORT
is ICU 56. This ICU version uses locale data from version 28 of the Unicode
Common Locale Data Repository (CLDR). For in-depth information, see Download
ICU 56 and CLDR 28 Release Note.
A change in the version of the ICU that is used by PROC SORT for linguistic
collation can affect the interpretation of data sets sorted by another version of
SAS. If a data set is linguistically sorted by one or more character variables in one
version of SAS, the data set is not recognized as being sorted when accessed in another
version of SAS if the two SAS versions use different versions of the ICU. Because
collation rules can change between ICU versions, variations in the rules can cause
the order of observations produced by PROC SORT to be different. If the ordering
differences are ignored, unexpected results can be seen during processing.
When sorting linguistically, the ICU version used by SAS is recorded in the sort
indicator that is stored in the data set header. The ICU version is examined when
determining if a data set is considered sorted. A difference between the ICU version
in use and the ICU version recorded in the sort indicator of a data set causes the
SAS system to ignore the indicated sort order and assume that the data set is
unsorted.
Note: The PROC CONTENTS output shows the ICU version in use. See “Example 5:
Linguistic Sorting Using ALTERNATE_HANDLING=” on page 2401.
For both the COPY and MIGRATE procedures, if the ICU version recorded on an
input data set is different from the version in use by the SAS system, then the sort
indicator on the input data set is ignored, the output data set is not marked as
sorted, and a message is written to the SAS log. However, both procedures write
observations to an output data set in the same order as they are read from the
input. This order is preserved if a physical order is supported by the engine used for
the OUT= destination library. For these reasons, when migrating to a new release of
SAS, consider re-establishing the sort order of permanent data sets using PROC
SORT with the PRESORTED option.
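A sketch of re-establishing linguistic sort order after a migration. MYLIB is assumed to be an already-assigned permanent library, and CUSTOMERS and LASTNAME are hypothetical names. Because no OUT= data set is given, the input data set is either validated in place or sorted under the ICU version now in use.

```sas
proc sort data=mylib.customers sortseq=linguistic presorted;
   by lastname;
run;
```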
Additional information about how linguistic collation is used by SAS can be found
in the following documents, as well as in the description of the PROC SORT
SORTSEQ=LINGUISTIC option.
- See “SORTSEQ=sort-table | LINGUISTIC” in SAS SQL Procedure User’s Guide.
The following are SAS papers that provide detailed information about linguistic
collation.
- Creating Order out of Character Chaos: Collation Capabilities of the SAS System
- Linguistic Collation: Everyone Can Get What They Expect
By default, the SORTPGM= system option is set to BEST, which lets SAS choose what it
determines is probably the best-performing sort program. When SORTPGM=BEST,
sorting might be performed by one of the following:
- a system or third-party host sorting utility program, if one is installed and
available
- a DBMS, if the input data resides in a database and a SAS/ACCESS engine is
used to read it
To work properly, the host sort or DBMS and the SAS system must be configured
for compatible operation. The host sort or the DBMS can be configured to order
data differently from SAS. When the observations returned to SAS are not ordered
as SAS expects, the SORTPGM= system option can be set to SAS to instruct SAS
to sort the data in the order needed by SAS.
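A minimal sketch of forcing SAS to perform the sort itself, assuming a hypothetical data set WORK.EMPLOYEE with a variable IDNUMBER:

```sas
/* Use the SAS sort instead of a host sort utility or the DBMS */
options sortpgm=sas;

proc sort data=work.employee out=work.employee_sorted;
   by idnumber;
run;
```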
PROC SORT: Order SAS data set observations by the values of one or more character
or numeric variables (Ex. 1, Ex. 3, Ex. 4)
Syntax
PROC SORT <collating-sequence-option> <other options>;
DUPOUT= SAS-data-set
specifies the output data set to which duplicate observations are written.
OUT= SAS-data-set
specifies the output data set.
UNIQUEOUT= SAS-data-set
specifies the output data set for eliminated observations.
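A sketch of how OUT= and UNIQUEOUT= can split one input data set, assuming a hypothetical data set WORK.EMPLOYEE with a variable IDNUMBER. NOUNIQUEKEY eliminates observations whose BY value occurs only once, and UNIQUEOUT= receives those eliminated observations.

```sas
/* OUT=: observations whose IDNUMBER occurs more than once
   UNIQUEOUT=: observations whose IDNUMBER is unique */
proc sort data=work.employee out=work.shared_ids
          nouniquekey uniqueout=work.unique_ids;
   by idnumber;
run;
```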
Collating-Sequence-Options
Operating Environment Information: For information about behavior specific to
your operating environment for the DANISH, FINNISH, NORWEGIAN, or SWEDISH
collating-sequence-option, see the SAS documentation for your operating
environment.
You can specify only one collating-sequence-option and multiple other options in a PROC
SORT step. The order of the two types of options does not matter, and neither type is
required in a PROC SORT step.
ASCII
sorts character variables using the ASCII collating sequence. You need this
option only when you want to achieve an ASCII ordering on a system where
EBCDIC is the native collating sequence.
DANISH
sorts characters according to the Danish and Norwegian convention.
The Danish and Norwegian collating sequence is shown in Figure 64.87 on page
2369.
EBCDIC
sorts character variables using the EBCDIC collating sequence. You need this
option only when you want to achieve an EBCDIC ordering on a system where
ASCII is the native collating sequence.
FINNISH
sorts characters according to the Finnish and Swedish convention.
The Finnish and Swedish collating sequence is shown in Figure 64.87 on page
2369.
NATIONAL
sorts character variables using an alternate collating sequence, as defined by
your installation, to reflect a country's National Use Differences. To use this
option, your site must define a customized national sort sequence. Check with
the SAS Installation Representative at your site to determine whether a
customized national sort sequence is available.
NORWEGIAN
sorts characters according to the Danish and Norwegian convention.
REVERSE
sorts character variables using a collating sequence that is reversed from the
normal collating sequence.
Operating Environment Information: For information about the normal
collating sequence for your operating environment, see “EBCDIC Order” on page
2359, “ASCII Order” on page 2360, and the SAS documentation for your
operating environment.
SWEDISH
sorts characters according to the Finnish and Swedish convention.
The Finnish and Swedish collating sequence is shown in Figure 64.87 on page
2369.
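A minimal sketch of the named collating-sequence options, assuming a hypothetical data set WORK.NAMES with a character variable NAME. Only one collating-sequence-option is used per PROC SORT step.

```sas
/* Finnish and Swedish convention */
proc sort data=work.names out=work.names_swedish swedish;
   by name;
run;

/* The normal collating sequence, reversed */
proc sort data=work.names out=work.names_reversed reverse;
   by name;
run;
```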
SORTSEQ= collating-sequence
The collating-sequence can be one of the following:
collating-sequence-option | translation_table
specifies one of the PROC SORT statement collating-sequence-options
(ASCII, DANISH, EBCDIC, FINNISH, NORWEGIAN, REVERSE, SWEDISH) or
a translation table, which can be one that SAS provides or any user-defined
translation table. Translation tables provided by SAS are: ASCII, DANISH,
EBCDIC, FINNISH, ITALIAN, NORWEGIAN, POLISH, REVERSE, SPANISH,
and SWEDISH.
Example For an example of using PROC TRANTAB and PROC SORT with
SORTSEQ=, see “Using Different Translation Tables for Sorting”
in SAS National Language Support (NLS): Reference Guide.
encoding-value
specifies an encoding value. The result is the same as a binary collation of
the character data represented in the specified encoding. See the supported
encoding values in the SAS National Language Support (NLS): Reference
Guide.
Restriction PROC SORT is the only procedure or part of the SAS system that
recognizes an encoding specified for the SORTSEQ= option.
See The list of the encodings that can be specified in the SAS
National Language Support (NLS): Reference Guide.
LINGUISTIC<(collating-options )>
specifies linguistic collation, which sorts characters in a culturally sensitive
manner according to rules that are associated with a language and locale.
The rules and default collating-sequence options are based on the language
that is specified in the current locale setting. The implementation is provided
by the International Components for Unicode (ICU) library. It produces
results that are largely compatible with the Unicode Collation Algorithms
(UCA). For more information, see “ Linguistic Sorting of Data Sets and ICU”
on page 2362.
Note: Only PROC SORT and PROC SQL are affected when the linguistic
collation system option is specified.
ALTERNATE_HANDLING=SHIFTED
controls the handling of variable characters like spaces, punctuation, and
symbols. When this option is not specified (using the default value Non-
Ignorable), differences among these variable characters are of the same
importance as differences among letters. If the ALTERNATE_HANDLING
option is specified, these variable characters are of minor importance.
Default NON_IGNORABLE
CASE_FIRST=
specifies the order of uppercase and lowercase letters. This argument is
valid for only TERTIARY, QUATERNARY, or IDENTICAL levels. The
following table provides the values and information for the CASE_FIRST
argument:
Table 64.3 Arguments for CASE_FIRST=
Value Description
COLLATION=
specifies character ordering. The following table lists the available
COLLATION= values.
Note: If you do not select a collation value, then the user's locale-default
collation is selected.
Value Description
LOCALE= locale_name
specifies the locale name in the form of a POSIX name (for example,
ja_JP). For a list of locale and POSIX values supported by PROC SORT,
see “LOCALE= Values for PAPERSIZE and DFLANG, Options” in SAS
National Language Support (NLS): Reference Guide.
NUMERIC_COLLATION=
orders integer values within the text by their numeric value instead of by the
characters used to represent the numbers.
Table 64.5 Values for NUMERIC_COLLATION
Value Description
Default OFF
STRENGTH=
The value of strength is related to the collation level. There are five
collation-level values. The following table provides information about the
five levels. The default value for strength is related to the locale.
Table 64.6 Values for STRENGTH=
anywhere in the strings. Another example is the difference between large and small Kana.
IDENTICAL or 5
When all other levels are equal, the identical level is used as a tiebreaker. The Unicode code point values of the Normalization Form D (NFD) form of each string are compared at this level, just in case there is no difference at levels 1-4.
This level should be used sparingly, because code-point value differences between two strings rarely occur. For example, only Hebrew cantillation marks are distinguished at this level.
Alias LEVEL=
Alias UCA
Interaction The ICU version can change in a new SAS release. The order of
observations produced when sorting a data set linguistically,
using one release of SAS, can be different from the order
produced by another release if the two releases use different
versions of the ICU. When migrating to a new release of SAS,
consider re-establishing the sort order of permanent data sets
using PROC SORT with the PRESORTED option. For more
details, see “ Linguistic Sorting of Data Sets and ICU” on page
2362.
CAUTION
If you use a host sort utility to sort your data, then specifying a translation-
table-based collating sequence with the SORTSEQ= option might corrupt
the character BY variables. For more information, see the PROC SORT
documentation for your operating environment.
Interaction In-database processing does not occur when the SORTSEQ= option
is specified.
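A sketch of several SORTSEQ= forms, assuming hypothetical data sets WORK.NAMES (character variable NAME) and WORK.ITEMS (character variable DESCRIPTION). The option values used here are ones described above.

```sas
/* A translation table provided by SAS */
proc sort data=work.names out=work.names_spanish sortseq=spanish;
   by name;
run;

/* Linguistic collation at primary strength: case and accents ignored */
proc sort data=work.names out=work.names_primary
          sortseq=linguistic(strength=primary);
   by name;
run;

/* Digits compared by numeric value, so "Item 9" sorts before "Item 10" */
proc sort data=work.items out=work.items_numeric
          sortseq=linguistic(numeric_collation=on);
   by description;
run;

/* Swedish rules regardless of the session locale */
proc sort data=work.names out=work.names_sv
          sortseq=linguistic(locale=sv_SE);
   by name;
run;
```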
Other Options
Options can include one collating-sequence-option and multiple other options. The order of
the two types of options does not matter, and neither type is required in a PROC
SORT step.
DATA= SAS-data-set
identifies the input SAS data set.
Restrictions For in-database processing to occur, the data set must refer to a
table residing on the DBMS.
SAS data set options DROP=, KEEP=, and RENAME= are not supported
in CAS. When these options are specified on the data set, processing
in CAS does not occur.
DATECOPY
copies the SAS internal date and time at which the input data set was created,
and the date and time at which it was last modified before the sort, to the
resulting sorted data set. Note that the operating environment date and time
are not preserved.
Restriction DATECOPY can be used only when the resulting data set uses the
V8 or V9 engine.
Tip You can alter the file creation date and time with the DTC= option in
the MODIFY statement in PROC DATASETS. For more information,
see “MODIFY Statement” on page 644.
DUPOUT= SAS-data-set
specifies the output data set to which duplicate observations are written.
Interactions In-database processing does not occur when the DUPOUT= option
is specified.
Tips The DUPOUT= option can be used only with the NODUPKEY
option. It cannot be combined with the NOUNIQUEKEY option.
If the DUPOUT= data set name is the same as the input data set
name, SAS does not sort or overwrite the input data set. Instead,
SAS generates an error message. The FORCE option must be
specified in order to overwrite the input data set with the
DUPOUT= data set of the same name.
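A minimal sketch of DUPOUT= together with NODUPKEY, assuming a hypothetical data set WORK.ORDERS with a variable CUSTOMER_ID:

```sas
/* Keep one observation per CUSTOMER_ID; the discarded duplicates
   are written to WORK.ORDERS_DUPS */
proc sort data=work.orders out=work.orders_nodup
          nodupkey dupout=work.orders_dups;
   by customer_id;
run;
```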
EQUALS | NOEQUALS
specifies the order of the observations in the output data set. For observations
with identical BY-variable values, EQUALS maintains the relative order of the
observations within the input data set in the output data set. NOEQUALS does
not necessarily preserve this order in the output data set.
Default EQUALS
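The effect of EQUALS on ties can be seen with a small hypothetical data set in which two observations share each BY value:

```sas
data work.scores;
   input name $ team $;
   datalines;
Ann B
Bob A
Cal B
Dee A
;
run;

/* EQUALS (the default): observations with the same TEAM keep their input order */
proc sort data=work.scores out=work.by_team equals;
   by team;
run;
/* Resulting order: Bob A, Dee A, Ann B, Cal B */
```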
FORCE
sorts and replaces an indexed data set when the OUT= option is not specified.
Without the FORCE option, PROC SORT does not sort and replace an indexed
data set because sorting destroys user-created indexes for the data set. When
you specify FORCE, PROC SORT sorts and replaces the data set and destroys
all user-created indexes for the data set. Indexes that were created or required
by integrity constraints are preserved.
Restriction If you use PROC SORT with the FORCE option on data sets that
were created with the Version 5 compatibility engine or with a
sequential engine such as a tape format engine, you must also
specify the OUT= option.
Tip PROC SORT checks for the sort indicator before it sorts a data set
so that data is not sorted again unnecessarily. By default, PROC
SORT does not sort a data set if the sort information matches the
requested sort. You can use FORCE to override this behavior. You
might need to use FORCE if SAS cannot verify the sort specification
in the data set option SORTEDBY=. For more information about
SORTEDBY= , see the chapter on SAS data set options in SAS Data
Set Options: Reference.
NODUPKEY
checks for and eliminates observations with duplicate BY values. If you specify
this option, PROC SORT compares all BY values for each observation to the
ones for the previous observation that is written to the output data set. If an
exact match is found, the observation is not written to the output data set.
When the SORT procedure’s input is a Base SAS engine data set and the sorting
is done by SAS, then the order of observations within an output BY group is
predictable. The order of the observations within the group is the same as the
order in which they were written to the data set when it was created. Because
the Base SAS engine maintains observations in the order that they were written
to the data set, they are read by PROC SORT in the same order. While
processing, PROC SORT maintains the order of the observations because it uses
a stable sorting algorithm. The stable sorting algorithm is used because the
EQUALS option is set by default. Therefore, the observation that is selected by
PROC SORT to be written to the output data set for a given BY group is the first
observation in the data set having the BY variable values that define the group.
If the SORT procedure reads its input from an engine that does not provide a
predictable observation order, or if an alternative sorting program (a host
sort) performs the sort, then the observations that are eliminated and the one
that is written to the output data set might not be well defined. For example,
which observations are kept and which are discarded might be unpredictable if
data is read into SAS from a DBMS that presents query results in a
nondeterministic order because of parallel processing.
Operating Environment Information: If you use the VMS operating
environment and are using the VMS host sort, the observation that is written to
the output data set is not always the first observation of the BY group.
To ensure that each observation in the output data set is unique, you can use
the NODUPKEY option with PROC SORT and specify _ALL_ in the BY statement so
that the sort uses all variables in the input data set. An observation in a data
set is unique when no other observation in the data set has the same combination
of variable values. See “Example 7: Eliminate All Duplicate Observations Using
NODUPKEY.”
Note: If you drop one or more BY variables from the output data set when using
NODUPKEY, you lose the guarantee that each observation has a unique set of BY
variable values. Similarly, when you use NODUPKEY and sort by _ALL_ variables to
produce unique observations, observations in the output data set are no longer
guaranteed to be unique if you drop one or more variables from the output data set.
Another way to ensure that observations in an output data set are unique is to
use PROC SQL with the DISTINCT keyword. Here is a simple example using
PROC SQL:
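PROC SQL;
/* Select only DIVISION and LEAGUE so that DISTINCT removes duplicates */
/* of exactly the columns that are kept in the output table.           */
CREATE TABLE DL AS SELECT DISTINCT DIVISION, LEAGUE
FROM SASHELP.BASEBALL;
QUIT;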
Interaction The Base SAS engine provides a consistent ordering where the first
observation (the first observation that was written and stored) is
generally the first one that is read by PROC SORT. The sorted data
set contains only the first observation of each BY group that PROC
SORT reads.
Tip The DUPOUT= option can be used with the NODUPKEY option.
However, it cannot be used with the NOUNIQUEKEY option.
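The following is a minimal sketch of NODUPKEY combined with DUPOUT=. It uses a small hypothetical data set, WORK.VISITS. When the Base SAS engine is used, SAS performs the sort, and the default EQUALS option is in effect, the observation kept for each PATIENT value is the first one that was written to WORK.VISITS.

```sas
/* Hypothetical visit data with repeated PATIENT values. */
data work.visits;
   input patient $ visit;
   datalines;
A 1
B 1
A 2
C 1
B 2
;
run;

/* Keep only the first observation of each PATIENT BY group in     */
/* WORK.FIRST_VISIT; the eliminated observations go to WORK.EXTRA. */
proc sort data=work.visits out=work.first_visit dupout=work.extra nodupkey;
   by patient;
run;
```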
NOEQUALS
See “EQUALS|NOEQUALS” on page 2377.
NOTHREADS
See “THREADS|NOTHREADS” on page 2383.
NOUNIQUEKEY
checks for and eliminates observations from the output data set that have a
unique sort key. A sort key is unique when the observation containing the key is
the only observation within a BY group.
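The following is a minimal sketch of NOUNIQUEKEY. It uses a hypothetical data set, WORK.ORDERS, together with the UNIQUEOUT= option (described later in this section), which captures the observations that NOUNIQUEKEY eliminates.

```sas
/* Hypothetical order data: X occurs twice, Y and Z occur once. */
data work.orders;
   input cust $ amount;
   datalines;
X 10
Y 20
X 30
Z 40
;
run;

/* Observations whose CUST value occurs only once (Y and Z) are */
/* removed from WORK.REPEAT and written to WORK.SINGLE instead. */
proc sort data=work.orders out=work.repeat uniqueout=work.single nouniquekey;
   by cust;
run;
```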
OUT= SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC SORT
creates it.
CAUTION
Use care when you use PROC SORT without OUT=. Without the OUT= option,
PROC SORT replaces the original data set with the sorted observations when the
procedure executes without errors.
Default Without OUT=, PROC SORT overwrites the original data set.
Note When the NODUPKEY or NOUNIKEY option is specified, the DATA=
option refers to a CAS table, and the OUT=, DUPOUT=, or UNIOUT=
options refer to CAS tables, PROC SORT invokes the CAS deduplicate
action on a CAS server. For more information, see “deduplicate
Action.”
Tip With in-database sorts, the output data set cannot refer to the input
table on the DBMS.
OVERWRITE
enables the input data set to be deleted before the replacement output data set
of the same name is populated with observations.
CAUTION
Use the OVERWRITE option only with a data set that is backed up or with a
data set that you can reconstruct. Because the input data set is deleted, data is
lost if a failure occurs while the output data set is being written.
Restriction If the OVERWRITE and OUT= options are specified and the OUT=
data set name is not the same as the input (DATA=) data set name,
SAS does not overwrite the input data set.
Tip Using the OVERWRITE option can reduce disk space requirements.
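The following is a minimal sketch of an in-place sort with OVERWRITE. WORK.SCRATCH is a hypothetical data set that its DATA step can rebuild at any time. That is the kind of data the caution about OVERWRITE has in mind.

```sas
/* Hypothetical, easily rebuilt data. */
data work.scratch;
   input k;
   datalines;
2
3
1
;
run;

/* Sort WORK.SCRATCH in place. OVERWRITE deletes the input data set */
/* before the sorted replacement is written, so use it only for     */
/* data that is backed up or that can be reconstructed.             */
proc sort data=work.scratch overwrite;
   by k;
run;
```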
PRESORTED
before sorting, checks within the input data set to determine whether the
sequence of observations is in order. Use the PRESORTED option when you
know or strongly suspect that a data set is already in order according to the key
variables that are specified in the BY statement. By specifying this option, you
avoid the cost of sorting the data set.
Tips You can use the DATA step to import data from external text files
in a sequence that is compatible with SAS processing and that
follows the sort order specified by the combination of SORT options
and key variables listed in the BY statement. You can then specify the
PRESORTED option if you know or strongly suspect that the data is
sorted accordingly.
Using the PRESORTED option with ACCESS engines and DBMS data
is not recommended. These external databases are not guaranteed
to return observations in sorted order unless an ORDER BY clause is
specified.
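The following is a minimal sketch of PRESORTED. It assumes a hypothetical data set, WORK.ALREADY, that is already in order by ID, which is the situation in which the option can avoid the cost of a full sort.

```sas
/* Hypothetical input that is already in ID order. */
data work.already;
   input id amount;
   datalines;
1 5
2 7
3 4
;
run;

/* PRESORTED first checks whether the observations are already in */
/* ID order before any sorting is done.                           */
proc sort data=work.already out=work.checked presorted;
   by id;
run;
```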
SORTSIZE=memory-specification
specifies the maximum amount of memory that is available to PROC SORT.
Valid values for memory-specification are as follows:
MAX
specifies that all available memory can be used.
n
specifies the amount of memory in bytes, where n is a real number.
nK
specifies the amount of memory in kilobytes, where n is a real number.
nM
specifies the amount of memory in megabytes, where n is a real number.
nG
specifies the amount of memory in gigabytes, where n is a real number.
Alias SIZE=
Tip Setting the SORTSIZE= option in the PROC SORT statement to MAX or
0, or not setting the SORTSIZE= option, limits PROC SORT to the
available physical memory, based on the settings of the SAS system
options REALMEMSIZE and MEMSIZE.
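The following is a minimal sketch of SORTSIZE= that limits a single PROC SORT step to 64 MB of memory. The 64M value is an arbitrary illustration, not a recommendation. SASHELP.CLASS is the sample data set that ships with SAS.

```sas
/* Cap the memory available to this PROC SORT step at 64 MB. */
/* SIZE= is an alias, so SIZE=64M would be equivalent.       */
proc sort data=sashelp.class out=work.class_by_age sortsize=64M;
   by age name;
run;
```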
TAGSORT
stores only the BY variables and the observation numbers in temporary files.
The BY variables and the observation numbers are called tags. At the
completion of the sorting process, PROC SORT uses the tags to retrieve records
from the input data set in sorted order.
Note: The utility file created is much smaller than it would be if the TAGSORT
option were not specified.
Tip When the total length of BY variables is small compared with the
record length, TAGSORT reduces temporary disk usage
considerably. However, processing time might be much higher.
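The following is a minimal sketch of TAGSORT on the SASHELP.CARS sample data set, where the BY variables are short compared with the full record. Whether TAGSORT actually saves disk space, or costs extra time, depends on the data and the installation. The example shows only the syntax.

```sas
/* MAKE and MSRP are short compared with the full SASHELP.CARS record,  */
/* so only these tags and observation numbers go to the utility file.   */
proc sort data=sashelp.cars out=work.cars_by_make tagsort;
   by make descending msrp;
run;
```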
THREADS | NOTHREADS
enables or prevents the activation of threaded sorting.
When the PROC SORT THREADS | NOTHREADS option is not specified, the setting
of the THREADS SAS system option is used by default.
The page size of the utility file used by PROC SORT is influenced by
the STRIPESIZE= system option. For more information, see
“STRIPESIZE= System Option” in SAS System Options: Reference.
UNIQUEOUT= SAS-data-set
specifies the output data set for observations eliminated by the
NOUNIQUEKEY option.
Alias UNIOUT=
Interaction The DUPOUT= and UNIOUT= options are not compatible and cannot
be specified simultaneously.
BY Statement
Specifies the sorting variables.
Syntax
BY <DESCENDING> variable-1 <<DESCENDING> variable-2 …>;
Required Argument
variable
specifies the variable by which PROC SORT sorts the observations. PROC SORT
first arranges the data set by the values of the first BY variable, in ascending
order by default. PROC SORT then arranges any observations that have the
same value of the first BY variable by the values of the second BY variable,
again in ascending order by default. This sorting continues for every specified
BY variable.
Tip When using the Google BigQuery data source, columns in the BY
statement in PROC SORT cannot be of data type FLOAT64 for in-database
processing.
Optional Argument
DESCENDING
reverses the sort order for the variable that immediately follows in the
statement so that observations are sorted from the largest value to the smallest
value. The DESCENDING keyword modifies the variable that follows it.
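The following is a minimal sketch that combines an ascending and a descending BY variable, using the SASHELP.CLASS sample data set. DESCENDING affects only HEIGHT, the variable that immediately follows it.

```sas
/* Order SASHELP.CLASS by SEX (ascending, the default), and within */
/* each SEX value by HEIGHT from tallest to shortest.              */
proc sort data=sashelp.class out=work.class_sorted;
   by sex descending height;
run;
```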
KEY Statement
Specifies sorting keys and variables. The KEY statement is an alternative to the BY statement. The
KEY statement syntax allows for the future possibility of specifying different collation options for
each KEY variable. Currently, the only options allowed are ASCENDING and DESCENDING.
Syntax
KEY variable(s) </ option> ;
2386 Chapter 64 / SORT Procedure
Required Argument
variable(s)
specifies the variable by which PROC SORT orders the observations. Multiple
variables can be specified. Each of these variables must be separated by a
space. A range of variables can also be specified. For example, the following
code shows how to specify multiple variables and a range of variables:
data sortKeys;
input x1 x2 x3 x4;
cards;
7 8 9 8
0 0 0 0
1 2 3 4
;
run;
/* x1 is the primary sort key; x2-x4 is a numbered range of variables */
proc sort data=sortKeys out=sortedOutput;
key x1 x2-x4;
run;
Multiple KEY statements can also be specified. The first sort key encountered
from among all sort keys is considered the primary sort key. Sorting continues
for every specified KEY statement and its variables. For example, the following
code shows how to specify multiple KEY statements:
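The following is a minimal sketch of multiple KEY statements. It assumes a small hypothetical data set, WORK.KEYDEMO (separate from SORTKEYS above), in which X1 has a tie, so that the second KEY statement has a visible effect.

```sas
/* Hypothetical data: two observations share X1=1. */
data work.keydemo;
   input x1 x2 x3 x4;
   datalines;
1 5 0 0
0 9 9 9
1 2 0 0
;
run;

/* X1, from the first KEY statement, is the primary sort key; */
/* X2-X4, from the second KEY statement, break ties in X1.    */
proc sort data=work.keydemo out=work.keyout;
   key x1;
   key x2-x4;
run;
```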
The following code example uses the BY statement to accomplish the same
type of sort as the previous example:
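The following is a minimal sketch of the equivalent BY statement, again using the hypothetical WORK.KEYDEMO data. BY X1 X2 X3 X4 requests the same ascending order as KEY X1; followed by KEY X2-X4;. Here the variables are listed individually rather than as a range.

```sas
/* Same hypothetical data as in the KEY example. */
data work.keydemo;
   input x1 x2 x3 x4;
   datalines;
1 5 0 0
0 9 9 9
1 2 0 0
;
run;

/* Equivalent ordering expressed with a single BY statement. */
proc sort data=work.keydemo out=work.byout;
   by x1 x2 x3 x4;
run;
```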
Optional Arguments
ASCENDING
sorts in ascending order the variable or variables that it follows. Observations
are sorted from the smallest value to the largest value. The ASCENDING
keyword modifies all the variables that precede it in the KEY statement.
Alias ASC
Tip In a PROC SORT KEY statement, the ASCENDING option modifies all
the variables that it follows. The option must follow the /. In the
following example, the x1 variable in the input data set is sorted in
ascending order.
proc sort data=sortKeys out=sortedOutput;
key x1 / ascending;
run;
DESCENDING
reverses the sort order for the variables that precede it in the statement so that
observations are sorted from the largest value to the smallest value. The
DESCENDING keyword modifies all the variables that precede it in the KEY
statement.
Alias DESC
Tip In a PROC SORT KEY statement, the DESCENDING option modifies the
variables that precede it. The option must follow the /. In the following
example, the x1 and x2 variables in the input data set are sorted in
descending order:
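The following is a minimal sketch of the DESCENDING option placed after the slash. It uses the same hypothetical WORK.KEYDEMO data, and both X1 and X2 are sorted from the largest value to the smallest.

```sas
/* Same hypothetical data as in the earlier KEY examples. */
data work.keydemo;
   input x1 x2 x3 x4;
   datalines;
1 5 0 0
0 9 9 9
1 2 0 0
;
run;

/* DESCENDING follows the slash and applies to both X1 and X2. */
proc sort data=work.keydemo out=work.descout;
   key x1 x2 / descending;
run;
```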
When the DATA= input data set is stored as a table or view in a database
management system (DBMS), PROC SORT can use in-database processing to sort
the data. In-database processing can provide the advantages of faster
processing and reduced data transfer between the database and SAS
software. In-database processing for PROC SORT is available for the following
DBMSs:
- Aster
- DB2
- Google BigQuery
- Greenplum
- Hadoop
- HAWQ
- Impala
- Netezza
- Oracle
- PostgreSQL
- SAP HANA
- Snowflake
- Teradata
- Vertica
- Yellowbrick
Note: When using the Google BigQuery data source, columns in the BY statement
in PROC SORT cannot be of data type FLOAT64 for in-database processing.
PROC SORT performs in-database processing using SQL explicit pass-through. The
pass-through facility uses SAS/ACCESS to connect to a DBMS and to send
statements directly to the DBMS for execution. This facility lets you use the SQL
syntax of your DBMS. For details, see "Pass-Through Facility for Relational
Databases" in SAS/ACCESS for Relational Databases: Reference.
You can also use the SAS system option SORTPGM=, without setting the
SQLGENERATION= option, to instruct PROC SORT to use the DBMS, SAS, or
the host sort utility to perform the sort. If SORTPGM=BEST is specified, then SAS
chooses whether the DBMS, SAS, or the host sort utility performs the sort. The
observation ordering that PROC SORT produces depends on whether the DBMS or
SAS performs the sorting.
If the DBMS performs the sort, then the configuration and characteristics of the
DBMS sorting program affect the resulting data order. The DBMS configuration
settings and characteristics that can affect data order include character collation,
ordering of NULL values, and sort stability. Most database management systems do
not guarantee sort stability, and the sort might be performed by the DBMS
regardless of the setting of the SORTEQUALS/NOSORTEQUALS system option and
the EQUALS/NOEQUALS procedure option.
If you set the SAS system option SORTPGM= to SAS, then unordered data is
delivered from the DBMS to SAS and SAS performs the sorting. However,
consistency in the delivery order of observations from a DBMS is not guaranteed.
Therefore, even though SAS can perform a stable sort on the DBMS data, SAS
cannot guarantee that the ordering of observations within output BY groups is the
same from one PROC SORT execution to the next. To achieve consistency in the
ordering of observations within BY groups, first populate a SAS data set with the
DBMS data, and then use the EQUALS or SORTEQUALS option to perform a stable
sort.
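The following is a minimal sketch of this approach: copy the data into a SAS data set, then let SAS perform a stable sort. SASHELP.CLASS stands in for the DBMS table. A real program would read from a SAS/ACCESS libref, such as a hypothetical MYDB. SORTPGM=SAS makes the choice of sort program explicit, and the last statement restores the default BEST setting.

```sas
/* Step 1: materialize the data in a SAS data set. SASHELP.CLASS stands */
/* in for a DBMS table such as a hypothetical MYDB.CLASS.               */
data work.local_copy;
   set sashelp.class;
run;

/* Step 2: have SAS perform the sort, and request a stable sort so that */
/* the relative order of observations within each SEX value is kept.    */
options sortpgm=sas;
proc sort data=work.local_copy out=work.stable equals;
   by sex;
run;

/* Restore the default choice of sort program. */
options sortpgm=best;
```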
When the input data set references a CAS view or an in-memory CAS table, the SORT
procedure uses the CAS deduplicate action to provide the equivalent functionality
of the NODUPKEY or NOUNIKEY options of PROC SORT. PROC SORT invokes the CAS
deduplicate action on the CAS server when all of the following are true: the
NODUPKEY or NOUNIKEY option is specified in the PROC SORT statement, the input
to PROC SORT is read by CAS, and all output from PROC SORT (including the
optional DUPOUT= and UNIOUT= data sets) is written to a CAS table. Under these
conditions, the work is performed in CAS, and no data is moved into or out
of the CAS server.
See “deduplicate Action” for information about the CAS action.
For conceptual information about procedures that run in CAS, see Chapter 5, “CAS
Processing of Base Procedures,” on page 93.
Procedure Output
PROC SORT produces only an output data set. To see the output data set, you can
use PROC PRINT, PROC REPORT, or another of the many available methods of
printing in SAS.
Task Options
Task Options
With all three replacement options (implicit replacement, explicit replacement, and
no replacement) there must be at least enough space in the output library for a
copy of the original data set.
You can also sort compressed data sets. If you specify a compressed data set as
the input data set and omit the OUT= option, then the input data set is sorted and
remains compressed. If you specify an OUT= data set, then the resulting data set is
compressed only if you choose a compression method with the COMPRESS= data
set option. For more information, see “COMPRESS= Data Set Option” in SAS Data
Set Options: Reference.
Also note that PROC SORT manipulates the uncompressed observation in memory
and, if there is insufficient memory to complete the sort, stores the uncompressed
data in a utility file. For these reasons, sorting compressed data sets might be
resource-intensive and might require more storage than anticipated. Consider using
the TAGSORT option when sorting compressed data sets.
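The following is a minimal sketch of sorting a compressed data set into a new, compressed output data set. WORK.CARS_C is a hypothetical compressed copy of the SASHELP.CARS sample data. COMPRESS=CHAR is specified on the OUT= data set, and TAGSORT is added as the paragraph suggests.

```sas
/* Build a hypothetical compressed copy of SASHELP.CARS. */
data work.cars_c(compress=char);
   set sashelp.cars;
run;

/* With OUT=, the sorted data set is compressed because COMPRESS=  */
/* is specified on it; TAGSORT keeps only the BY variables and     */
/* observation numbers in the utility file.                        */
proc sort data=work.cars_c out=work.cars_c_sorted(compress=char) tagsort;
   by origin make;
run;
```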
Note: If the SAS system option NOREPLACE is in effect, then you cannot replace
an original permanent data set with a sorted version. You must either use the OUT=
option or specify the SAS system option REPLACE in an OPTIONS statement. The
SAS system option NOREPLACE does not affect temporary SAS data sets.
Example 1: Sorting by the Values of Multiple Variables
Details
This example does the following:
- sorts the observations by the values of two variables
Program
data account;
input Company $ 1-22 Debt 25-30 AccountNumber 33-36
Town $ 39-51;
datalines;
Paul's Pizza             83.00  1019  Apex
World Wide Electronics  119.95  1122  Garner
Strickland Industries   657.22  1675  Morrisville
Ice Cream Delight       299.98  2310  Holly Springs
Watson Tabor Travel      37.95  3131  Apex
Boyd & Sons Accounting  312.49  4762  Garner
Bob's Beds              119.95  4998  Morrisville
Tina's Pet Shop          37.95  5108  Apex
Elway Piano and Organ    65.79  5217  Garner
Tim's Burger Stand      119.95  6335  Holly Springs
Peter's Auto Parts       65.79  7288  Apex
Deluxe Hardware         467.12  8941  Garner
Pauline's Antiques      302.05  9112  Morrisville
Apex Catering            37.95  9923  Apex
;
proc sort data=account out=bytown;
by town company;
run;
proc print data=bytown;
var company town debt accountnumber;
title 'Customers with Past-Due Accounts';
title2 'Listed Alphabetically within Town';
run;
Program Description
Create the input data set ACCOUNT. ACCOUNT contains the name of each
business that owes money, the amount of money that it owes on its account, the
account number, and the town where the business is located.
data account;
input Company $ 1-22 Debt 25-30 AccountNumber 33-36
Town $ 39-51;
datalines;
Paul's Pizza             83.00  1019  Apex
World Wide Electronics  119.95  1122  Garner
Strickland Industries   657.22  1675  Morrisville
Ice Cream Delight       299.98  2310  Holly Springs
Watson Tabor Travel      37.95  3131  Apex
Boyd & Sons Accounting  312.49  4762  Garner
Bob's Beds              119.95  4998  Morrisville
Tina's Pet Shop          37.95  5108  Apex
Elway Piano and Organ    65.79  5217  Garner
Tim's Burger Stand      119.95  6335  Holly Springs
Peter's Auto Parts       65.79  7288  Apex
Deluxe Hardware         467.12  8941  Garner
Pauline's Antiques      302.05  9112  Morrisville
Apex Catering            37.95  9923  Apex
;
Create the output data set BYTOWN. OUT= creates a new data set for the sorted
observations.
proc sort data=account out=bytown;
Sort by two variables. The BY statement specifies that the observations should be
first ordered alphabetically by town and then by company.
by town company;
run;
Print the output data set BYTOWN. PROC PRINT prints the data set BYTOWN.
proc print data=bytown;
Specify the variables to be printed. The VAR statement specifies the variables to
be printed and their column order in the output.
var company town debt accountnumber;
Output: HTML
Output 64.3 Sorting by the Values of Multiple Variables
Details
This example does the following:
- sorts the observations by the values of three variables
Program
proc sort data=account out=sorted;
by town descending debt accountnumber;
run;
proc print data=sorted;
var company town debt accountnumber;
title 'Customers with Past-Due Accounts';
title2 'Listed by Town, Amount, Account Number';
run;
Program Description
Create the output data set SORTED. OUT= creates a new data set for the sorted
observations.
proc sort data=account out=sorted;
Sort by three variables with one in descending order. The BY statement specifies
that observations should be first ordered alphabetically by town, then by
descending value of amount owed, then by ascending value of the account number.
by town descending debt accountnumber;
run;
Print the output data set SORTED. PROC PRINT prints the data set SORTED.
proc print data=sorted;
Specify the variables to be printed. The VAR statement specifies the variables to
be printed and their column order in the output.
var company town debt accountnumber;
Output: HTML
Note that sorting last by AccountNumber puts the businesses in Apex with a debt
of $37.95 in order of account number.
2396 Chapter 64 / SORT Procedure
Example 3: Maintaining the Relative Order of Observations in Each BY Group
Details
This example does the following:
- sorts the observations by the value of the first variable
- maintains the relative order with the EQUALS option
- does not maintain the relative order with the NOEQUALS option
Program
data insurance;
input YearsWorked 1 InsuranceID 3-5;
datalines;
5 421
5 336
1 209
1 564
3 711
3 343
4 212
4 616
;
proc sort data=insurance out=byyears1 equals;
by yearsworked;
run;
proc print data=byyears1;
var yearsworked insuranceid;
title 'Sort with EQUALS';
run;
proc sort data=insurance out=byyears2 noequals;
by yearsworked;
run;
proc print data=byyears2;
var yearsworked insuranceid;
title 'Sort with NOEQUALS';
run;
Program Description
Create the input data set INSURANCE. INSURANCE contains the number of years
worked by all insured employees and their insurance IDs.
data insurance;
input YearsWorked 1 InsuranceID 3-5;
datalines;
5 421
5 336
1 209
1 564
3 711
3 343
4 212
4 616
;
2398 Chapter 64 / SORT Procedure
Create the output data set BYYEARS1 with the EQUALS option. OUT= creates a
new data set for the sorted observations. The EQUALS option maintains the order
of the observations relative to each other.
proc sort data=insurance out=byyears1 equals;
Sort by the first variable. The BY statement specifies that the observations should
be ordered numerically by the number of years worked.
by yearsworked;
run;
Print the output data set BYYEARS1. PROC PRINT prints the data set BYYEARS1.
proc print data=byyears1;
Specify the variables to be printed. The VAR statement specifies the variables to
be printed and their column order in the output.
var yearsworked insuranceid;
Create the output data set BYYEARS2. OUT= creates a new data set for the sorted
observations. The NOEQUALS option does not maintain the order of the
observations relative to each other.
proc sort data=insurance out=byyears2 noequals;
Sort by the first variable. The BY statement specifies that the observations should
be ordered numerically by the number of years worked.
by yearsworked;
run;
Print the output data set BYYEARS2. PROC PRINT prints the data set BYYEARS2.
proc print data=byyears2;
Specify the variables to be printed. The VAR statement specifies the variables to
be printed and their column order in the output.
var yearsworked insuranceid;
Output: HTML
Note that sorting with the EQUALS option versus sorting with the NOEQUALS
option causes a different sort order for the observations where YearsWorked=3.
Example 4: Retaining the First Observation of Each BY Group
Details
This example assumes that the Base SAS engine is being used. The
Base SAS engine provides a consistent ordering where the first observation (the
first observation that was written and stored) is generally the first one that is read
by PROC SORT. The sorted data set contains only the first observation of each BY
group.
Program
proc sort data=account out=towns nodupkey;
by town;
run;
proc print data=towns;
var town company debt accountnumber;
title 'Towns of Customers with Past-Due Accounts';
run;
Program Description
Create the output data set TOWNS but include only the first observation of each
BY group. NODUPKEY writes only the first observation of each BY group to the
new data set TOWNS.
proc sort data=account out=towns nodupkey;
by town;
run;
Print the output data set TOWNS. PROC PRINT prints the data set TOWNS.
proc print data=towns;
Specify the variables to be printed. The VAR statement specifies the variables to
be printed and their column order in the output.
var town company debt accountnumber;
Output: HTML
The output data set contains only four observations, one for each town in the input
data set.
Example 5: Linguistic Sorting Using ALTERNATE_HANDLING=
Note: For more information about strengthening the linguistic sort of strings, see
“Example 6: Linguistic Sorting Using ALTERNATE_HANDLING= and STRENGTH=”
on page 2405.
Details
In this example, PROC SORT sorts data set A in place by using linguistic
collation. You have specified ALTERNATE_HANDLING=SHIFTED because you want
"a-b" to sort close to "ab" and "aB". That is, you do not want "a-b" to appear
somewhere far away from "ab" and "aB" because of its hyphen.
Notice how "a-b" and "ab" are treated equivalently in the following example. To
order them beyond the first three levels of comparison (alphabetic, diacritic, and
case), you can use the fourth level of comparison and specify STRENGTH=4.
“Example 6: Linguistic Sorting Using ALTERNATE_HANDLING= and STRENGTH=”
on page 2405 shows how to distinguish the strings further.
PROC CONTENTS shows a Sort Information section in the output. The ICU version
is also shown in the sort information. In SAS 9.4, the ICU library incorporated by
SAS and used by PROC SORT is ICU version 4.8.1. In SAS Viya, the ICU library
version incorporated by SAS and used by PROC SORT is ICU 56.
Program
data a;
length x $ 10;
x='a-b'; output;
x='ab'; output;
x='a-b'; output;
x='aB'; output;
run;
proc sort data=a sortseq=linguistic( ALTERNATE_HANDLING=SHIFTED );
by x;
run;
title1 "Linguistic Collation with ALTERNATE_HANDLING=SHIFTED";
proc print data=a;
run;
title1 "Linguistic Collation with ALTERNATE_HANDLING=SHIFTED and BY Processing";
proc print data=a;
var x;
by x;
run;
proc contents data=a;
run;
Program Description
Sort the data set using linguistic sorting. Use linguistic sorting and the
ALTERNATE_HANDLING=SHIFTED option to sort the data set. Note that the
default STRENGTH for this locale is 3. Also use the BY statement to order
observations by x.
proc sort data=a sortseq=linguistic( ALTERNATE_HANDLING=SHIFTED );
by x;
run;
Print data set A. The TITLE1 statement tells the PRINT procedure the title to use
for the output. PROC PRINT then prints data set A.
title1 "Linguistic Collation with ALTERNATE_HANDLING=SHIFTED";
proc print data=a;
run;
Print data set A using BY processing. The TITLE1 statement tells the PRINT
procedure the title to use for the output. PROC PRINT then prints data set A using
BY processing.
title1 "Linguistic Collation with ALTERNATE_HANDLING=SHIFTED and BY Processing";
proc print data=a;
var x;
by x;
run;
Print the Sort Information when linguistic sorting is being used. The PROC
CONTENTS output contains a Sort Information section when PROC SORT is used
with linguistic collation. This sort information also includes the ICU version being
used.
proc contents data=a;
run;
Output: HTML
The first PROC PRINT shows that the order of "a-b" and "ab" is not well defined. The
second PROC PRINT uses BY processing to show that these values are considered
equal.
PROC CONTENTS prints out sort information when linguistic sorting is used.
Information about the ICU version that is being used is also provided.
Example 6: Linguistic Sorting Using ALTERNATE_HANDLING= and STRENGTH=
Details
In this example, PROC SORT sorts data set A in place by using linguistic
collation. ALTERNATE_HANDLING=SHIFTED is specified because you want "a-b" to
sort close to "ab" and "aB" regardless of the hyphen.
Notice how "a-b" and "ab" are treated equivalently in the following example.
However, if you want to further distinguish between them and have them appear in
two separate BY groups, you must order the strings further. To order them beyond
the first three levels of comparison (alphabetic, diacritic, and case), use the fourth
level of comparison, STRENGTH=4.
Program
data a;
length x $ 10;
x='a-b'; output;
x='ab'; output;
x='a-b'; output;
x='aB'; output;
run;
proc sort data=a sortseq=linguistic( ALTERNATE_HANDLING=SHIFTED
STRENGTH=4);
by x;
run;
title1 "Linguistic Collation with STRENGTH=4";
proc print data=a;
run;
Title1 "Linguistic Collation with STRENGTH=4 and BY Processing";
proc print data=a;
var x;
by x;
run;
Program Description
Sort the data set using linguistic sorting. Use linguistic sorting and the
ALTERNATE_HANDLING=SHIFTED option to sort the data set. STRENGTH=4 overrides
the default strength of 3 for this locale so that the fourth level of comparison is
used. The BY statement specifies that observations should be ordered by x.
proc sort data=a sortseq=linguistic( ALTERNATE_HANDLING=SHIFTED
STRENGTH=4);
by x;
run;
Example 6: Linguistic Sorting Using ALTERNATE_HANDLING= and STRENGTH= 2407
Print the output data set A. The TITLE1 statement tells the PRINT procedure the
title to use for the output. PROC PRINT then prints data set A.
title1 "Linguistic Collation with STRENGTH=4";
proc print data=a;
run;
Print the output data set A using BY processing. The TITLE1 statement tells the
PRINT procedure what title to use for this output. PROC PRINT then prints data set
A using BY processing.
Title1 "Linguistic Collation with STRENGTH=4 and BY Processing";
proc print data=a;
var x;
by x;
run;
Output: HTML
The first PROC PRINT shows the order that results when "a-b" and "ab" are
differentiated by setting STRENGTH=4. The second PROC PRINT uses BY processing
to show the order of precedence and how the two strings are differentiated.
Output 64.9 Linguistic Sorting Using the ALTERNATE_HANDLING and STRENGTH Options
2408 Chapter 64 / SORT Procedure
Example 7: Eliminate All Duplicate Observations Using NODUPKEY
Details
In this example, PROC SORT with NODUPKEY creates an output data set that has
no duplicate observations. Each of these observations is unique: there is only one
observation in the output data set for a given set of variable values. Because of
the KEEP= data set option, BY _ALL_ sorts by the two kept variables, DIVISION and
LEAGUE.
Program
proc sort data=sashelp.baseball(keep=division league) out=DL NODUPKEY;
by _ALL_;
run;
Program Description
Processing only the DIVISION and LEAGUE variables from the input data set
(KEEP=), create the DL output data set and remove all duplicate observations.
proc sort data=sashelp.baseball(keep=division league) out=DL NODUPKEY;
by _ALL_;
run;
Output: HTML
The output data set contains only four observations, one for each unique
combination of division and league in baseball. No duplicate combinations of
Division and League are kept.
65
SQL Procedure
A Brief Overview
The SQL procedure is the Base SAS implementation of Structured Query Language.
PROC SQL is part of Base SAS software, and you can use it with any SAS data set
(table). Often, PROC SQL can be an alternative to other SAS procedures or the
DATA step. You can use SAS language elements such as global statements, data set
options, functions, informats, and formats with PROC SQL just as you can with
other SAS procedures. PROC SQL enables you to perform the following tasks:
- generate reports
- update and retrieve data from database management system (DBMS) tables
66
SQOOP Procedure
Sqoop commands are passed to the cluster using the Apache Oozie Workflow
Scheduler for Hadoop. PROC SQOOP defines an Oozie workflow for your Sqoop
task, which is then submitted to an Oozie server using a RESTful API.
PROC SQOOP works similarly to the Apache Sqoop command-line interface (CLI).
Using the same syntax, a user who has licensed SAS/ACCESS Interface to Hadoop
can transfer data between a database and HDFS. The user can submit Sqoop CLI
2414 Chapter 66 / SQOOP Procedure
commands in the COMMAND= argument of the PROC SQOOP statement. The procedure
provides feedback as to whether the job completed successfully and where to get
more details from your Hadoop cluster if the Sqoop task failed.
For more information about Apache Sqoop, see the online documentation at
http://sqoop.apache.org and http://sqoop.apache.org/docs/1.4.5/index.html.
For Sqoop considerations and usage, refer to the Apache Sqoop Cookbook.
Syntax
PROC SQOOP
COMMAND='command-to-sqoop'
DBPWD='database-password'
DBUSER='database-user-name'
<DELETEWF>
HADOOPUSER='hadoop-user-name'
HADOOPPWD='hadoop-password'
<HIVE_SERVER='server-name-and-port'>
<HIVE_URI='JDBC-URI'>
<JOBTRACKER='job-tracker-URL'>
<NAMENODE='name-node-URL'>
OOZIEURL='oozie-URL'
<PASSWORDFILE='password-file'>
<WFHDFSPATH='Oozie-workflow-path'>;
run;
Required Arguments
COMMAND='command-to-sqoop'
specifies the Apache Sqoop command. Here is how you must specify the
command:
SQOOP-command --option … --option
Tip For more information about Apache Sqoop commands, see the
online documentation at http://sqoop.apache.org.
DBPWD='database-password'
specifies the database password that is associated with the DBUSER= option.
Because this option is mutually exclusive with the PASSWORDFILE= option, you
must specify only one or the other.
DBUSER='database-user-name'
specifies the database user name to use for import or export.
HADOOPPWD='hadoop-password'
specifies the Hadoop password that is associated with the HADOOPUSER=
option.
HADOOPUSER='hadoop-user-name'
specifies the Hadoop user name to use for import or export.
2416 Chapter 66 / SQOOP Procedure
OOZIEURL='oozie-URL'
specifies the URL to the Oozie server.
Optional Arguments
DELETEWF
specifies that, if an Oozie workflow file exists as specified by the location in
WFHDFSPATH, it should be deleted. The SQOOP procedure then creates a new
workflow file at that location.
HIVE_SERVER='server-name-and-port'
specifies the Hive server name that runs the Hive service and the port number
to use to connect to the specified Hive service.
HIVE_URI=jdbc:hive2://HiveServerHost:HiveServerPort</Schema:Property1=Value;...><PropertyN=Value;>
specifies the JDBC URI.
Example jdbc:hive2://MyHiveServer:10000
JOBTRACKER='job-tracker-URL'
specifies the URL to the JobTracker or ResourceManager services.
NAMENODE='name-node-URL'
specifies the URL to the NameNode services.
PASSWORDFILE='password-file'
specifies the name of a file in HDFS that contains the database password for
import or export. This password file must exist before you run this procedure.
Take care to keep this file secure.
WFHDFSPATH='Oozie-workflow-path-and-filename'
specifies the HDFS path and filename to which the Oozie workflow is uploaded,
as in this example:
/user/myID/mydir/myfile.xml
General Usage
Because the Oracle JDBC Connector requires it, you must specify the value of the
Sqoop --table option in uppercase letters. For details about case sensitivity for
table names, see the documentation for your specific DBMS.
Connection strings should include the character set option that is appropriate for
the data to be imported. For details, refer to your connector documentation.
Using Workflows
Workflows are created only when they are required. Some SQOOP jobs can use
Oozie proxy submission, which generates no workflow file. Proxy submission is
selected if you are running Oozie 4.1 or later and are not using a Hive table as the
destination. The SAS log contains a note if PROC SQOOP uses proxy submission.
Note: For proper default NAMENODE and JOBTRACKER port values for your
environment, check the configuration for your particular distribution or refer to your
Hadoop documentation.
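The following is a minimal sketch of how the required and optional arguments fit together in one PROC SQOOP step that imports a database table into HDFS. Every value in it is a hypothetical placeholder: the user names, passwords, host names, ports, HDFS paths, the Oracle service name ORCL, and the table MYTABLE. The Sqoop options inside COMMAND= (--connect, --table, --target-dir, -m) are standard Sqoop import options. Check the port values against your distribution's configuration, as the note above advises.

```sas
/* Hypothetical values throughout: replace users, passwords, hosts,  */
/* ports, and paths with the values for your own Hadoop cluster.     */
proc sqoop
   hadoopuser='myID'
   hadooppwd='myHadoopPwd'
   dbuser='dbuser1'
   dbpwd='dbPwd1'
   oozieurl='http://myoozie.example.com:11000/oozie'
   namenode='hdfs://mynamenode.example.com:8020'
   jobtracker='myresourcemgr.example.com:8032'
   wfhdfspath='/user/myID/mydir/myfile.xml'
   deletewf
   /* Sqoop import; the --table value is uppercase for the Oracle connector */
   command='import --connect jdbc:oracle:thin:@//mydbhost.example.com:1521/ORCL --table MYTABLE --target-dir /user/myID/mytable -m 1';
run;
```

DELETEWF is included so that rerunning the step replaces any workflow file that an earlier run left at the WFHDFSPATH= location. If the job uses proxy submission, no workflow file is created.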
67
STANDARD Procedure
Standardizing Data
The following output shows a simple standardization where the output data set
contains standardized student exam scores. The statements that produce the
output follow:
proc standard data=score mean=75 std=5
out=stndtest;
run;
Obs  Student      Test1
  1  Capalleti  80.5388
  2  Dubose     64.3918
  3  Engles     80.9143
  4  Grant      68.8980
  5  Krupski    75.2816
  6  Lundsford  79.7877
  7  McBane     73.4041
  8  Mullen     78.6612
  9  Nguyen     74.9061
 10  Patel      71.9020
 11  Si         73.4041
 12  Tanaka     77.9102
The following output shows a more complex example that uses BY-group
processing. PROC STANDARD computes Z scores separately for two BY groups by
standardizing life-expectancy data to a mean of 0 and a standard deviation of 1.
The data are 1950 and 1993 life expectancies at birth for 16 countries. The birth
rates for each country, classified as stable or rapid, form the two BY groups. The
statements that produce the analysis also do the following:
• print statistics for each variable to standardize
For an explanation of the program that produces this output, see “Example 2:
Standardizing BY Groups and Replacing Missing Values” on page 2435.
[Output: for each BY group, PROC STANDARD prints a statistics table (Name, Mean, Standard Deviation, N, Label), followed by a listing of the standardized data (PopulationRate, Country, Life50, Life93).]
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Examples: “Example 2: Standardizing BY Groups and Replacing Missing Values” on page 2435
“Example 1: Standardizing to a Given Mean and Standard Deviation” on page 2432
Syntax
PROC STANDARD <options>;
Preserve values
PRESERVERAWBYVALUES
preserves raw BY values.
Without Arguments
If you do not specify MEAN=, REPLACE, or STD=, the output data set is an identical
copy of the input data set.
Optional Arguments
DATA=SAS-data-set
specifies the input SAS data set.
Restriction You cannot use PROC STANDARD with an engine that supports
concurrent access if another user is updating the data set at the
same time.
EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative). The
procedure does not use the observation to calculate the mean and standard
deviation, but the observation is still standardized. By default, the procedure
treats observations with negative weights like those with zero weights and
counts them in the total number of observations.
Alias EXCLNPWGTS
MEAN=mean-value
standardizes variables to a mean of mean-value.
NOPRINT
suppresses the printing of the procedure output. NOPRINT is the default value.
OUT=SAS-data-set
specifies the output data set. If SAS-data-set does not exist, PROC STANDARD
creates it. If you omit OUT=, the data set is named Datan, where n is the
smallest integer that makes the name unique.
Default Datan
PRESERVERAWBYVALUES
preserves the raw values of all BY variables when those variables are
propagated to the output data set.
PRINT
prints the original frequency, mean, and standard deviation for each variable to
standardize.
REPLACE
replaces missing values with the variable mean.
Interaction If you use MEAN=, PROC STANDARD replaces missing values with
the given mean.
STD=std-value
standardizes variables to a standard deviation of std-value.
VARDEF=divisor
specifies the divisor to use in the calculation of variances and standard
deviations. The following table shows the possible values for divisor and the
associated divisors.
Table 67.1 Possible Values for VARDEF=
Value          Divisor                    Formula for Divisor
DF             Degrees of freedom         $n - 1$
N              Number of observations     $n$
WDF            Sum of weights minus one   $(\sum_i w_i) - 1$
WEIGHT | WGT   Sum of weights             $\sum_i w_i$
Default DF
Tip When you use the WEIGHT statement and VARDEF=DF, the variance is
an estimate of $\sigma^2$, where the variance of the ith observation is
$\operatorname{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. This yields an
estimate of the variance of an observation with unit weight.
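The divisor matters when you check the result: a variable standardized with a given VARDEF= value has exactly the requested standard deviation only when it is measured with the same divisor. The sketch below assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set names Work.Std_n and Work.Check_n are illustrative.

```sas
/* Standardize Height with divisor n instead of the default n - 1 */
proc standard data=sashelp.class mean=75 std=5 vardef=n out=work.std_n;
   var height;
run;

/* Recompute the statistics with the same divisor: the mean is 75   */
/* and the standard deviation is 5, up to floating-point rounding.  */
proc means data=work.std_n vardef=n mean std noprint;
   var height;
   output out=work.check_n mean=MeanHt std=StdHt;
run;

proc print data=work.check_n noobs;
   var MeanHt StdHt;
run;
```

If you instead run the PROC MEANS step with its default VARDEF=DF, the reported standard deviation is slightly larger than 5, because the divisor is smaller.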
BY Statement
Calculates standardized values separately for each BY group.
Syntax
BY <DESCENDING> variable-1 <<DESCENDING> variable-2 …> <NOTSORTED>;
Required Argument
variable
specifies the variable that the procedure uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the
BY statement, then the observations in the data set must either be sorted by all
the variables that you specify or be indexed appropriately.
These variables are called BY variables.
Optional Arguments
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. The data are grouped in another way, such as chronological order.
FREQ Statement
Specifies a numeric variable whose values represent the frequency of the observation.
Tip: The effects of the FREQ and WEIGHT statements are similar except when
calculating degrees of freedom.
See: For an example that uses the FREQ statement, see “FREQ” on page 79.
Syntax
FREQ variable;
Required Argument
variable
specifies a numeric variable whose value represents the frequency of the
observation. If you use the FREQ statement, the procedure assumes that each
observation represents n observations, where n is the value of variable. If n is
not an integer, SAS truncates it. If n is less than 1 or is missing, the procedure
does not use that observation to calculate statistics, but the observation is still
standardized.
The sum of the frequency variable represents the total number of observations.
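Because each observation counts as n observations, standardizing with a FREQ variable should give the same results as standardizing a data set in which each row is physically repeated n times. The following sketch checks that equivalence on small hypothetical data; the data set and variable names (Counted, Expanded, Score, Count) are illustrative only.

```sas
/* Hypothetical data: Count says how many times each row occurs */
data work.counted;
   input Score Count;
   datalines;
10 1
20 3
40 2
;

/* Repeat each row Count times: the explicit equivalent of FREQ */
data work.expanded;
   set work.counted;
   do i = 1 to Count;
      output;
   end;
   drop i;
run;

proc standard data=work.counted mean=0 std=1 out=work.z_freq;
   var Score;
   freq Count;
run;

proc standard data=work.expanded mean=0 std=1 out=work.z_expanded;
   var Score;
run;

/* Count is not standardized (it is not in the VAR statement), so  */
/* it can be used to match rows; count any differing z values.      */
proc sql noprint;
   select count(*) into :nbad trimmed
      from work.z_freq as f inner join work.z_expanded as e
      on f.Count = e.Count
      where abs(f.Score - e.Score) > 1e-10;
quit;
%put NOTE: Rows whose z values differ: &nbad;
```

Both runs use the default VARDEF=DF, and the degrees of freedom agree (the sum of Count minus 1, which is 5), so the standardized values match.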
VAR Statement
Specifies the variables to standardize and their order in the printed output.
Default: If you omit the VAR statement, PROC STANDARD standardizes all numeric
variables not listed in the other statements.
Example: “Example 1: Standardizing to a Given Mean and Standard Deviation” on page 2432
Syntax
VAR variable(s);
Required Argument
variable(s)
identifies one or more variables to standardize.
WEIGHT Statement
Specifies weights for analysis variables in the statistical calculations.
See: For information about calculating weighted statistics and for an example that uses
the WEIGHT statement, see “WEIGHT” on page 82.
Syntax
WEIGHT variable;
Required Argument
variable
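specifies a numeric variable whose values weight the values of the analysis
variables. The values of the variable do not have to be integers. The following
table shows the action that PROC STANDARD takes for each kind of weight value.
Table 67.2 WEIGHT Statement Value and PROC STANDARD Action
Weight Value   PROC STANDARD Action
0              Counts the observation in the total number of observations
Less than 0    Converts the weight value to zero and counts the
               observation in the total number of observations
Missing        Excludes the observation from the calculation of the mean
               and standard deviation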
To exclude observations that contain negative and zero weights from the
calculation of mean and standard deviation, use EXCLNPWGT. Note that most
SAS/STAT procedures, such as PROC GLM, exclude negative and zero weights
by default.
Tip When you use the WEIGHT statement, consider which value of the
VARDEF= option is appropriate. For more information, see
“VARDEF=divisor” on page 2427 and the calculation of weighted statistics
in “Keywords and Formulas” on page 2700.
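The following sketch shows the effect of EXCLNPWGT on hypothetical data that contains one zero weight and one negative weight. The rows with W <= 0 do not contribute to the mean and standard deviation, but their X values are still standardized, so no value is missing in the output. The data set and variable names are illustrative only.

```sas
/* Hypothetical data; W is the weight variable */
data work.weighted;
   input X W;
   datalines;
2  1
4  2
6  1
8  0
9 -1
;

/* EXCLNPWGT: rows with W <= 0 are left out of the mean and the   */
/* standard deviation, but their X values are still standardized. */
proc standard data=work.weighted mean=0 std=1 exclnpwgt out=work.zw;
   var X;
   weight W;
run;

proc print data=work.zw noobs;
run;
```

Here the weighted mean comes from the three positive-weight rows only, (2*1 + 4*2 + 6*1)/4 = 4, so the weighted mean of the standardized X values over those rows is 0.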
Details
Note: Prior to Version 7 of SAS, the procedure did not exclude the observations
with missing weights from the count of observations.
PROC STANDARD standardizes each value by using the following formula:
$$x_i' = \frac{S\,(x_i - \bar{x})}{s_x} + M$$
where
$x_i'$
is a new standardized value.
$S$
is the value of STD=.
$M$
is the value of MEAN=.
$x_i$
is an observation's value.
$\bar{x}$
is a variable's mean.
$s_x$
is a variable's standard deviation.
PROC STANDARD calculates the mean ($\bar{x}$) and standard deviation ($s_x$) from the
input data set. The resulting standardized variable has a mean of M and a standard
deviation of S.
If the data are normally distributed, standardizing is also studentizing since the
resulting data have a Student's t distribution.
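As a worked check of the formula, take Krupski's Test1 score of 80 from the Score data set that is used in Example 1, standardized with MEAN=75, STD=5, and the default VARDEF=DF. The mean and standard deviation below are computed from the twelve Test1 scores in that data set and are rounded:

$$\bar{x} = \frac{951}{12} = 79.25, \qquad s_x = \sqrt{\frac{1950.25}{11}} \approx 13.3152$$

$$x' = \frac{5\,(80 - 79.25)}{13.3152} + 75 \approx 0.2816 + 75 = 75.2816$$

This agrees with the value 75.2816 shown for Krupski in the overview output.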
2432 Chapter 67 / STANDARD Procedure
Missing Values
By default, PROC STANDARD excludes missing values for the analysis variables
from the standardization process, and the values remain missing in the output data
set. When you specify the REPLACE option, the procedure replaces missing values
with the variable's mean or the MEAN= value.
If the value of the WEIGHT variable or the FREQ variable is missing, then the
procedure does not use the observation to calculate the mean and the standard
deviation. However, the observation is standardized.
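The following sketch shows the REPLACE behavior on hypothetical data in which one value of X is missing. Without REPLACE, the value stays missing in the output data set. With REPLACE and MEAN=0, it is replaced with 0, the requested mean, which is the same result that Example 2 shows for Bangladesh, Mozambique, and Russia.

```sas
/* Hypothetical data with one missing value of X */
data work.withmiss;
   input Id X;
   datalines;
1 10
2 .
3 20
4 30
;

/* REPLACE with MEAN=0: the missing X for Id 2 becomes 0 */
proc standard data=work.withmiss mean=0 std=1 replace out=work.zrep;
   var X;
run;

proc print data=work.zrep noobs;
run;
```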
Example 1: Standardizing to a Given Mean and Standard Deviation
Features: VAR statement
Other features: PRINT procedure
Details
This example does the following:
• standardizes two variables to a mean of 75 and a standard deviation of 5
Program
options nodate pageno=1 linesize=80 pagesize=60;
data score;
length Student $ 9;
input Student $ StudentNumber Section $
Test1 Test2 Final @@;
format studentnumber z4.;
datalines;
Capalleti 0545 1 94 91 87 Dubose 1252 2 51 65 91
Engles 1167 1 95 97 97 Grant 1230 2 63 75 80
Krupski 2527 2 80 69 71 Lundsford 4860 1 92 40 86
McBane 0674 1 75 78 72 Mullen 6445 2 89 82 93
Nguyen 0886 1 79 76 80 Patel 9164 2 71 77 83
Si 4915 1 75 71 73 Tanaka 8534 2 87 73 76
;
proc standard data=score mean=75 std=5 out=stndtest;
var test1 test2;
run;
proc sql;
create table combined as
select old.student, old.studentnumber,
old.section,
old.test1, new.test1 as StdTest1,
old.test2, new.test2 as StdTest2,
old.final
from score as old, stndtest as new
where old.student=new.student;
quit;
proc print data=combined noobs round;
title 'Standardized Test Scores for a College Course';
run;
2434 Chapter 67 / STANDARD Procedure
Program Description
Set the SAS system options. The NODATE option specifies to omit the date and
time at which the SAS job began. The PAGENO= option specifies the page number
for the next page of output that SAS produces. The LINESIZE= option specifies the
line size. The PAGESIZE= option specifies the number of lines for a page of SAS
output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the Score data set. This data set contains test scores for students who took
two tests and a final exam. The FORMAT statement assigns the Zw.d format to
StudentNumber. This format pads right-justified output with 0s instead of blanks.
The LENGTH statement specifies the number of bytes to use to store values of
Student.
data score;
length Student $ 9;
input Student $ StudentNumber Section $
Test1 Test2 Final @@;
format studentnumber z4.;
datalines;
Capalleti 0545 1 94 91 87 Dubose 1252 2 51 65 91
Engles 1167 1 95 97 97 Grant 1230 2 63 75 80
Krupski 2527 2 80 69 71 Lundsford 4860 1 92 40 86
McBane 0674 1 75 78 72 Mullen 6445 2 89 82 93
Nguyen 0886 1 79 76 80 Patel 9164 2 71 77 83
Si 4915 1 75 71 73 Tanaka 8534 2 87 73 76
;
Generate the standardized data and create the Stndtest output data set. PROC
STANDARD uses a mean of 75 and a standard deviation of 5 to standardize the
values. OUT= identifies Stndtest as the data set to contain the standardized values.
proc standard data=score mean=75 std=5 out=stndtest;
Specify the variables to standardize. The VAR statement specifies the variables to
standardize and their order in the output.
var test1 test2;
run;
Create a data set that combines the original values with the standardized values.
PROC SQL joins Score and Stndtest to create the Combined data set (table) that
contains standardized and original test scores for each student. Using AS to rename
the standardized variables NEW.TEST1 to StdTest1 and NEW.TEST2 to StdTest2
makes the variable names unique.
proc sql;
create table combined as
select old.student, old.studentnumber,
old.section,
old.test1, new.test1 as StdTest1,
old.test2, new.test2 as StdTest2,
old.final
from score as old, stndtest as new
where old.student=new.student;
quit;
Print the data set. PROC PRINT prints the Combined data set. ROUND rounds the
standardized values to two decimal places. The TITLE statement specifies a title.
proc print data=combined noobs round;
title 'Standardized Test Scores for a College Course';
run;
Output: Listing
The following data set contains variables with both standardized and original
values. StdTest1 and StdTest2 store the standardized test scores that PROC
STANDARD computes.
Example 2: Standardizing BY Groups and Replacing Missing Values
Details
This example does the following:
• calculates Z scores separately for each BY group by using a mean of 0 and a
standard deviation of 1
• replaces missing values with the given mean
• prints the mean and standard deviation for the variables to standardize
Program
options nodate pageno=1 linesize=80 pagesize=60;
proc format;
value popfmt 1='Stable'
2='Rapid';
run;
data lifexp;
input PopulationRate Country $char14. Life50 Life93 @@;
label life50='1950 life expectancy'
life93='1993 life expectancy';
datalines;
2 Bangladesh      . 53   2 Brazil         51 67
2 China          41 70   2 Egypt          42 60
2 Ethiopia       33 46   1 France         67 77
1 Germany        68 75   2 India          39 59
2 Indonesia      38 59   1 Japan          64 79
2 Mozambique      . 47   2 Philippines    48 64
1 Russia          . 65   2 Turkey         44 66
1 United Kingdom 69 76   1 United States  69 75
;
proc sort data=lifexp;
by populationrate;
run;
proc standard data=lifexp mean=0 std=1 replace
print out=zscore;
by populationrate;
format populationrate popfmt.;
title1 'Life Expectancies by Birth Rate';
run;
proc print data=zscore noobs;
title 'Standardized Life Expectancies at Birth';
title2 'by a Country''s Birth Rate';
run;
Program Description
Set the SAS system options. The NODATE option specifies to omit the date and
time at which the SAS job began. The PAGENO= option specifies the page number
for the next page of output that SAS produces. The LINESIZE= option specifies the
line size. The PAGESIZE= option specifies the number of lines for a page of SAS
output.
options nodate pageno=1 linesize=80 pagesize=60;
Assign a character string format to a numeric value. PROC FORMAT creates the
format POPFMT to identify birth rates with a character value.
proc format;
value popfmt 1='Stable'
2='Rapid';
run;
Create the Lifexp data set. Each observation in this data set contains information
about 1950 and 1993 life expectancies at birth for 16 nations. The birth rate for each
nation is classified as stable (1) or rapid (2). The nations with missing data obtained
independent status after 1950. Data are from Vital Signs 1994: The Trends That Are
Shaping Our Future, Lester R. Brown, Hal Kane, and David Malin Roodman, eds.
Copyright © 1994 by Worldwatch Institute. Reprinted by permission of W.W.
Norton & Company, Inc.
data lifexp;
input PopulationRate Country $char14. Life50 Life93 @@;
label life50='1950 life expectancy'
life93='1993 life expectancy';
datalines;
2 Bangladesh      . 53   2 Brazil         51 67
2 China          41 70   2 Egypt          42 60
2 Ethiopia       33 46   1 France         67 77
1 Germany        68 75   2 India          39 59
2 Indonesia      38 59   1 Japan          64 79
2 Mozambique      . 47   2 Philippines    48 64
1 Russia          . 65   2 Turkey         44 66
1 United Kingdom 69 76   1 United States  69 75
;
Sort the Lifexp data set. PROC SORT sorts the observations by the birth rate.
proc sort data=lifexp;
by populationrate;
run;
Generate the standardized data for all numeric variables and create the Z-score
output data set. PROC STANDARD standardizes all numeric variables that are not
listed in other statements (here, Life50 and Life93; PopulationRate is a BY
variable) to a mean of 0 and a standard deviation of 1. REPLACE replaces missing
values. PRINT prints statistics.
proc standard data=lifexp mean=0 std=1 replace
print out=zscore;
Create the standardized values for each BY group. The BY statement standardizes
the values separately by birth rate.
by populationrate;
Assign a format to a variable and specify a title for the report. The FORMAT
statement assigns a format to PopulationRate. The output data set contains
formatted values. The TITLE statement specifies a title.
format populationrate popfmt.;
title1 'Life Expectancies by Birth Rate';
run;
Print the data set. PROC PRINT prints the ZSCORE data set with the standardized
values. The TITLE statements specify two titles to be printed.
proc print data=zscore noobs;
title 'Standardized Life Expectancies at Birth';
title2 'by a Country''s Birth Rate';
run;
Output: Listing
PROC STANDARD prints the variable name, mean, standard deviation, input
frequency, and label of each variable to standardize for each BY group. Life
expectancies for Bangladesh, Mozambique, and Russia are no longer missing. The
missing values are replaced with the given mean (0).
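[Output: for each BY group (Stable and Rapid), a statistics table (Name, Mean, Standard Deviation, N, Label), followed by the PROC PRINT listing of the Zscore data set (PopulationRate, Country, Life50, Life93).]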
68
STREAM Procedure
The STREAM procedure is valid in SAS Viya. Support for the STREAM procedure
with SAS Viya was added in SAS 9.4M5.
%macro doit(nrows,ncols);
<table>
%do i=1 %to &nrows;
<tr>
%do j=1 %to &ncols;
<td>&j</td>
%end;
</tr>
%end;
</table>
%mend doit;
If this macro is invoked in an ordinary SAS program, the HTML that it generates is
submitted as SAS code, and the result is syntactically invalid because HTML is not
valid SAS syntax. However, if this macro is invoked through PROC STREAM, the
generated HTML is instead written to a file and is not validated as SAS syntax.
The following is an example of the output for a table with two rows and three
columns:
<table> <tr> <td>1</td> <td>2</td> <td>3</td> </tr> <tr> <td>1</td> <td>2</td> <td>3</td> </tr>
</table>
You can use the %INCLUDE statement in the input stream. The input files can be
HTML files, RTF files, or any other type of text-based file. You would rarely use
%INCLUDE to include SAS code other than macro definitions and invocations,
because SAS code other than macro code is not executed; it is treated like any
other text.
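The following sketch writes a small text template that contains a macro variable reference and then passes it through PROC STREAM with %INCLUDE, so that the reference is resolved in the output file. The filerefs TMPL and OUT and the macro variable NAME are illustrative. The &streamdelim; statement after BEGIN supplies the statement boundary that %INCLUDE requires, as in the RTF example later in this chapter.

```sas
%let name=John;

/* Temporary files: the template and the PROC STREAM output */
filename tmpl temp;
filename out temp;

/* Write the template. The single quotation marks keep &name */
/* unresolved in this DATA step.                              */
data _null_;
   file tmpl;
   put '<p>My name is &name</p>';
run;

/* Resolve the macro reference while streaming the template */
proc stream outfile=out; begin
&streamdelim;
%include tmpl;
;;;;

/* Copy the generated text to the SAS log */
data _null_;
   infile out;
   input;
   put _infile_;
run;
```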
All macro variable references are resolved, but warnings are issued if the macro
name is not recognized. Macro statements such as %PUT and %LIST send output to
the log and do not produce tokens. You should avoid using these statements in an
input stream, although they are allowed.
Tokenizer Limitations
The SAS tokenizer (word scanner) has several limitations that can prevent input
streams other than SAS syntax from being processed correctly. These limitations
are described in the following list.
• Input records that come from the %INCLUDE statement cannot exceed a record
size of 32,767 bytes, which is the default value.
• The tokenizer does not provide accurate information about record size to PROC
STREAM if the input record from %INCLUDE exceeds 32,767 bytes.
• %INCLUDE and %LET statements must begin on a statement boundary. That is,
a semicolon must precede the statements, and the statements must end with a
semicolon. Macro invocations do not have this requirement.
• Macro statements are completely consumed and executed by the tokenizer, and
PROC STREAM is not aware of them. For example, %let xyz=2; is completely
consumed. This behavior also holds true for macro statements that are not
global in scope, such as %IF and %GOTO. These macro statements are flagged
as errors, but PROC STREAM is not aware of the error.
• Any token that consists of an ampersand (&) followed by a letter (A-Z or a-z)
or an underscore is assessed by the macro processor to determine whether it is
a macro variable to be resolved. If it is not a macro variable, a warning is
issued. Currently, this warning cannot be suppressed.
• PL/I-style comments (/* comment */) are typically absorbed completely by the
tokenizer and are not seen by PROC STREAM. You can use the NOABSSCMT option
in the PROC STREAM statement so that all text between and including the /* and
*/ is seen. This option is also needed if the input stream can contain
occurrences of /* that are not enclosed in quotation marks, such as /tmp/xyz/*,
which is a UNIX wildcard specification.
• No single token can exceed 32,767 characters.
stream. If, instead, ‘abc, preceded by a blank, appears in columns 77-80, and def’,
followed by a blank, appears in columns 1-4, then PROC STREAM receives
‘abcdef’.
• Nonprintable characters (except for carriage return (CR) and line feed (LF))
cannot be properly tokenized.
• The input stream cannot be a binary stream such as a PDF or a JPG file.
• Normally, the tokenizer treats a quoted string, in either single or double
quotation marks, as a single token. This is a problem if text that contains a
quotation mark, such as don't do this, is not enclosed in quotation marks, or if
text contains an escape sequence, as in don''t do this. Use the QUOTING= option
to treat quotation marks like any other special character, such as a hyphen or a
slash.
• A single macro variable can expand into no more than 64,000 characters.
However, there is no limit to what a macro can return as model text, apart from
resource limitations such as memory and disk space.
Syntax
PROC STREAM OUTFILE=fileref <options>; BEGIN
text-1
<text-n>
;;;;
Required Arguments
OUTFILE=fileref
specifies the file where all tokens are written.
The LRECL specification for the fileref is used. If no LRECL is given, then the
default value for the global LRECL= option, which is 32,767 bytes, is used.
Unless you use the PRESCOL option, all tokens are streamed out with the
proper number of intervening blanks between tokens. No token is broken
between records. Also, no stream of tokens is broken between records unless
there is at least one blank within them. For example, <table>X</table> will not
be broken between records, but <table> X </table> can be broken where you
see the blanks (before and after the X).
text
specifies the SAS statements or macros to use with PROC STREAM.
Optional Arguments
MOD
specifies that the output file is appended to instead of being overwritten.
NOABSSCMT
specifies whether comments are written to the output stream.
If PL/I programming language style comments appear (/* comments */), all
text between the comment characters (/* and */) appears in the output stream.
If this option is omitted, the PL/I-style comments do not appear in the output
stream. If you specify NOABSSCMT, it is strongly recommended that you also
specify QUOTING=, because single quotation marks (such as in the word don't)
commonly appear in comments.
PRESCOL
indicates that an attempt is made to preserve the columns of the original input
file.
The PRESCOL option improves the validity of RTF files that are included with
the %INCLUDE statement.
QUOTING=SINGLE | DOUBLE | BOTH
specifies how quotation marks in the input stream are treated.
SINGLE
specifies that single quotation marks are treated like any other character. If
you use QUOTING=SINGLE and macro references occur within single
quotation marks, such as '&hello', the macro references are expanded.
DOUBLE
specifies that double quotation marks are treated like any other character.
BOTH
specifies that both the SINGLE and DOUBLE behaviors apply.
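A minimal sketch of the QUOTING= option: with QUOTING=SINGLE, the apostrophe in don't is treated as an ordinary character instead of opening a quoted string, so the text reaches the output file unchanged. The fileref OUT2 is illustrative.

```sas
filename out2 temp;

/* QUOTING=SINGLE: the apostrophe in don't does not start a string */
proc stream outfile=out2 quoting=single; begin
<p>Please don't do this.</p>
;;;;

/* Copy the generated text to the SAS log */
data _null_;
   infile out2;
   input;
   put _infile_;
run;
```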
RESETDELIM='label'
indicates a special marker token.
This option is used when there is a need for statements to be expanded, such as
the %INCLUDE and %LET statements. These statements must begin on a
statement boundary. If your syntax does not allow for a statement boundary,
then the given label, followed by a semicolon, can be introduced in the input
stream to satisfy the tokenizer requirements. The label and semicolon are not
sent to the output file.
The label must be a valid SAS name; that is, it must begin with a letter or
underscore, and all subsequent characters must be letters, underscores, or
digits. The length of the label must not exceed 32 characters.
The specification and usage is not case-sensitive. There is no default value for
RESETDELIM=.
You can also add the &STREAMDELIM macro variable to the input for PROC
STREAM. When this macro variable is seen, PROC STREAM assumes that optional
keywords follow, and a closing semicolon is then expected. All tokens from the
&STREAMDELIM value up to and including the semicolon are not emitted to the
output stream; instead, they are special control information items for PROC
STREAM.
NEWLINE
specifies that a new line is emitted to the output file.
READFILE filename
specifies that the given filename is opened, and its contents are read as is
and written to the output file. There is no macro expansion of the contents
of this file, and new lines are preserved. This differs from %INCLUDE, where
macro expansion occurs and new lines are ignored. For an example of how to
use the READFILE keyword, see “Using the READFILE Keyword” on page
2450.
Note that the result contains no line breaks: blanks are preserved, and a blank is
inserted in place of each line break.
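Because line breaks in the input stream become blanks, the NEWLINE keyword is the way to force a line break in the output file. The following sketch is a minimal example; the fileref OUT3 and the text are illustrative.

```sas
filename out3 temp;

/* &streamdelim newline; emits a line break between the two phrases */
proc stream outfile=out3; begin
first line &streamdelim newline; second line
;;;;

/* Each output line is written to the SAS log separately */
data _null_;
   infile out3;
   input;
   put _infile_;
run;
```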
Because occurrences of % and & in the input stream are not enclosed in quotation
marks and are not escape sequences, they can cause problems. For example, &amp;
appears in HTML streams as an escape sequence for an ampersand. If you have a
macro variable named AMP, then its value is substituted for &amp. If no such
macro variable exists, then SAS issues a warning. You can avoid these problems
by masking the ampersand with %STR(&), as the following example shows:
<title>A %str(&)amp; B</title>
If the escape sequence for %STR occurs within single quotation marks, then either
use the QUOTING= option, or use the escape sequence with a single quotation
mark, %STR(%').
To prevent this, use the &STREAMDELIM; statement. You can always use the
&STREAMDELIM; statement after BEGIN to indicate the beginning of any input
stream.
When you include this statement, the starting point of the input stream is
interpreted correctly:
filename dheader "c:\temp\dheader.txt";
proc stream outfile=dheader; begin &streamdelim;
[SAS version 9.4]
;;;;
This statement causes the GETOPTION function to be called, which obtains the
value of the OBS option. GETOPTION returns the value as a character string to be
placed into the ABC macro variable. That value is then written to the output
stream.
In this example, the fileref MYCODE points to the following SAS code:
filename myhtml "c:\temp\temp.txt";
data _null_;
file myhtml;
put '<table>';
put '<tr><td>1</td></tr>';
put '<tr><td>2</td></tr>';
put '</table>';
run;
If you then execute the following STREAM procedure, you can see that PROC
STREAM uses mycode as an argument to DOSUB in the %SYSFUNC function:
filename myhtml "c:\temp\temp.txt";
filename new "c:\temp\new.html";
<table><tr><td>1</td></tr><tr><td>2</td></tr></table>
Note: The DOSUB function is similar to the DOSUBL function, but DOSUB is
passed a fileref for a file that contains SAS code. DOSUBL is passed a text string
and executes the value as SAS code. For more information, see “DOSUBL Function”
in SAS Functions and CALL Routines: Reference.
When you execute the following SAS program, a new RTF file named Newrtf.rtf is
created. In this file, the macro substitutions have taken place, but the markup
language remains intact:
%let name=John;
filename oldrtf 'mytest.rtf' recfm=v lrecl=32767;
filename newrtf 'newrtf.rtf' recfm=v lrecl=32767;
proc stream outfile=newrtf quoting=both asis; begin
&streamdelim;
%include oldrtf;
;;;;
When Microsoft Word opens the Newrtf.rtf file, it displays the following text:
Today is 30NOV12.
My name is John.
;;;;
%macro doit(nrows,ncols);
<table>
%do i=1 %to &nrows;
<tr>
%do j=1 %to &ncols;
<td>&j</td>
%end;
</tr>
%end;
%mend;
<PRE>
This is the first line of fixed text.
This is another line to be fixed.
This is the last line of fixed text.
</PRE> <table> <tr> <td>1</td> <td>2</td> <td>3</td> </tr>
<tr> <td>1</td> <td>2</td> <td>3</td> </tr> </table>
data _null_;
infile datalines;
file temp1;
input;
l=length(_infile_);
put @1 _infile_ $varying200. l;
datalines4;
<table>
<tr>
<td>abc</td><td>def</td>
</tr>
<tr>
<td>ghi</td><td>jkl</td>
</tr>
</table>
;;;;
my item here;
my next item here
The following example is not coded correctly because the semicolon after the
word here is not recognized:
proc stream outfile=abc; begin
my item here;
;;;;
69
SUMMARY Procedure
PROC SUMMARY can be used in SAS Cloud Analytic Services (CAS). Running
PROC SUMMARY with CAS actions has several advantages over processing within
SAS. See Chapter 40, “MEANS Procedure,” on page 1463 for information about
running PROC MEANS and PROC SUMMARY in CAS.
observations that are produced by a data source, such as a DBMS that delivers
query results through an ACCESS engine. For more information, see “Numerical
Accuracy in SAS Software” in SAS Language Reference: Concepts and “Threading in
Base SAS” in SAS Language Reference: Concepts.
See: For full syntax details, see “PROC MEANS Statement” on page 1476.
Details
PRINT | NOPRINT
specifies whether PROC SUMMARY displays the descriptive statistics. By
default, PROC SUMMARY does not display output, but PROC MEANS does
display output.
Default NOPRINT
VAR Statement
Identifies the analysis variables and their order in the results.
Default: If you omit the VAR statement, then PROC SUMMARY produces a simple count of
observations, whereas PROC MEANS tries to analyze all the numeric variables that
are not listed in the other statements.
Interaction: If you specify statistics in the PROC SUMMARY statement and the VAR statement
is omitted, or if a numeric variable is not associated with a statistic in the OUTPUT
statement, then PROC SUMMARY stops processing and an error message is
written to the SAS log.
Note: See the VAR Statement in PROC MEANS for a full description of the VAR
statement.
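The following sketch contrasts the two cases, assuming the SASHELP.CLASS sample data set that ships with SAS (19 students: 9 female, 10 male). The first step has no VAR statement and requests no statistics, so its output data set contains only a count (_FREQ_ = 19). The second step names an analysis variable and a statistic in the OUTPUT statement. The output data set names are illustrative.

```sas
/* No VAR statement and no statistics: a simple count of observations */
proc summary data=sashelp.class;
   output out=work.classcount;
run;

/* VAR statement plus a statistic in OUTPUT; NWAY keeps only the */
/* rows for the full CLASS combination (one row per value of Sex) */
proc summary data=sashelp.class nway;
   class sex;
   var height;
   output out=work.htstats mean=MeanHeight;
run;

/* PROC SUMMARY displays nothing by default (NOPRINT), so print */
proc print data=work.htstats noobs;
run;
```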
70
TABULATE Procedure
PROC TABULATE computes many of the same statistics that are computed by
other descriptive statistical procedures such as MEANS, FREQ, and REPORT. PROC
TABULATE provides the following features:
• simple but powerful methods to create tabular reports
Simple Tables
The following output shows a simple table that was produced by PROC TABULATE.
The data set “ENERGY” on page 2788 contains data on expenditures of energy by
two types of customers, residential and business, in individual states in the
Northeast (1) and West (4) regions of the United States. The table sums
expenditures for states within a geographic division. The RTS= option provides
enough space to display the column headings without hyphenating them.
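The following sketch has the same shape as the simple table: a class variable in the row dimension, an analysis variable crossed with a statistic in the column dimension, and RTS= in the TABLE statement. Because the ENERGY data set is defined elsewhere in this guide, the sketch substitutes the SASHELP.CLASS sample data set; the RTS= value of 10 and the OUT= data set name are illustrative.

```sas
/* Rows: values of Sex; column: the mean of Height.   */
/* OUT= also saves one observation per table cell.    */
proc tabulate data=sashelp.class out=work.tabout;
   class sex;
   var height;
   table sex, height*mean / rts=10;
run;
```

RTS= sets the width of the row-title area in listing output; other ODS destinations size that area automatically.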
Complex Tables
The following output is a more complicated table using the same data set that was
used to create Output 70.346 on page 2461. The statements that create this report
do the following:
• customize column and row headings
For an explanation of the program that produces this report, see “Example 6:
Summarizing Information with the Universal Class Variable ALL” on page 2566.
------------------------------------------------
| | Type |
| |-----------------------|
| |Residential| Business |
| | Customers | Customers |
|----------------------+-----------+-----------|
|Region |Division | | |
|----------+-----------| | |
|Northeast |New England| $7,477| $5,129 |
| |-----------+-----------+-----------|
| |Middle | | |
| |Atlantic | $19,379| $15,078 |
|----------+-----------+-----------+-----------|
|West |Mountain | $5,476| $4,729 |
| |-----------+-----------+-----------|
| |Pacific | $13,959| $12,619 |
------------------------------------------------
[Figure: the page, row, and column dimensions of a table; the page dimension produces one page ("The SAS System" 1, 2, 3) for each value of Year (2000, 2001, 2002).]
The table in Figure 70.88 on page 2464 contains three class variables: Region,
Division, and Type. These class variables form the eight categories listed in the
following table. (For convenience, the categories are described in terms of their
formatted values.)
continuation message
the text that appears below the table if it spans multiple physical pages.
nested variable
a variable whose values appear in the table with each value of another variable.
Page dimension text has a style. The default style is Beforecaption. For more
information about using styles, see “Using ODS Styles with PROC TABULATE”
on page 2522.
subtable
the group of cells that is produced by crossing a single element from each
dimension of the TABLE statement when one or more dimensions contain
concatenated elements.
The value of the SAS system option CPUCOUNT= affects the performance of the
threaded sort. CPUCOUNT= suggests how many system CPUs are available for use
by the threaded procedures.
For more information, see the “THREADS System Option” in SAS System Options:
Reference and the “CPUCOUNT= System Option” in SAS System Options: Reference.
CLASS      Identify variables in the input data set as class variables   Ex. 3, Ex. 4
CLASSLEV   Specify a style for class variable level value headings       Ex. 14, Ex. 15
Syntax
PROC TABULATE <options>;
FORMCHAR <(position(s))>='formatting-character(s)'
defines the characters to use to construct the table outlines and dividers.
NOSEPS
eliminates horizontal separator lines from the row titles and the body of
the table.
ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
orders the values of a class variable according to the specified order.
STYLE=style-override(s)
specifies one or more style overrides to use for specific areas of the
table.
Optional Arguments
ALPHA=value
specifies the confidence level to compute the confidence limits for the mean.
The percentage for the confidence limits is (1–value)×100. For example,
ALPHA=.05 results in a 95% confidence limit.
Default .05
Range 0-1
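A minimal sketch of ALPHA= with the LCLM and UCLM statistics, assuming the SASHELP.CLASS sample data set. ALPHA=0.10 requests 90% confidence limits, that is, (1 − 0.10)×100. WORK.CI is a hypothetical output data set name; OUT= names each statistic variable analysis-variable_statistic.

```sas
/* 90% confidence limits for the mean of Height. */
proc tabulate data=sashelp.class alpha=0.10 out=work.ci;
   var height;
   table height*(mean lclm uclm);
run;
```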
CLASSDATA=SAS-data-set
specifies a data set that contains the combinations of values of the class
variables that must be present in the output. Any combination of class variable
values that occurs in the CLASSDATA= data set but not in the input data set
appears in each table or output data set with a frequency of zero.
Restriction The CLASSDATA= data set must contain all class variables. Their
data type and format must match the corresponding class variables
in the input data set.
Interaction If you use the EXCLUSIVE option, then PROC TABULATE excludes
any observations in the input data set whose combinations of values
of class variables are not in the CLASSDATA= data set.
Tip Use the CLASSDATA= data set to filter or supplement the input data
set.
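A minimal sketch of CLASSDATA= combined with EXCLUSIVE, assuming the SASHELP.CLASS sample data set, whose character variable Sex has length 1 and no format. WORK.SEX_LEVELS and WORK.SEX_N are hypothetical names. Level F exists in the input data, level U does not, and level M is left out of the CLASSDATA= data set.

```sas
/* Class levels to report. Type and length must match Sex in SASHELP.CLASS. */
data work.sex_levels;
   length sex $ 1;
   input sex $;
   datalines;
F
U
;

/* Without EXCLUSIVE: F and M from the data, plus U with frequency zero. */
/* With EXCLUSIVE: observations with Sex=M are excluded from the output. */
proc tabulate data=sashelp.class classdata=work.sex_levels exclusive
              out=work.sex_n;
   class sex;
   table sex, n;
run;
```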
#BYLINE
substitutes the entire BY line without leading or trailing blanks for #BYLINE
in the text string. The BY line uses the format variable-name=value.
#BYVALn
#BYVAL(BY-variable-name)
substitutes the current value of the specified BY variable for #BYVAL in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAL2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAL(YEAR) specifies the BY variable, YEAR. Variable-name is not
case sensitive.
#BYVARn
#BYVAR(BY-variable-name)
substitutes the name of the BY-variable or the label associated with the
variable (whatever the BY line would normally display) for #BYVAR in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAR2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAR(SITES) specifies the BY variable, SITES. Variable-name is not
case sensitive.
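The #BYVAR and #BYVAL substitutions are typically placed in a TITLE statement while the default BY line is suppressed. A minimal sketch, assuming the SASHELP.CLASS sample data set; WORK.CLASS_SORTED is a hypothetical name. Because Sex has no label, #BYVAR(sex) resolves to the variable name.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

options nobyline;   /* suppress the default BY line */
title 'Height statistics for #byvar(sex) = #byval(sex)';

proc tabulate data=work.class_sorted;
   by sex;
   var height;
   table height*(n mean);
run;

options byline;     /* restore the default BY line */
title;
```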
DATA=SAS-data-set
specifies the input data set.
EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative) from
the analysis. By default, PROC TABULATE treats observations with negative
weights like observations with zero weights and counts them in the total
number of observations.
Alias EXCLNPWGTS
EXCLUSIVE
excludes from the tables and the output data sets all combinations of the class
variables that are not found in the CLASSDATA= data set.
FORMAT=format-name
specifies a default format for the value in each table cell. You can use any SAS
or user-defined format.
Alias F=
Default If you omit FORMAT=, then PROC TABULATE uses BEST12.2 as the
default format.
Interaction Formats that are specified in a TABLE statement override the format
that is specified with FORMAT=.
Tip The FORMAT= option is especially useful for controlling the number
of print positions that are used to print a table.
FORMCHAR <(position(s))>='formatting-character(s)'
defines the characters to use for constructing the table outlines and dividers.
position(s)
identifies the position of one or more characters in the SAS formatting-
character string. A space or a comma separates the positions.
formatting-character(s)
lists the characters to use for the specified positions.
For example, the following option assigns the asterisk (*) to the third
formatting character and the number sign (#) to the seventh character, and
does not alter the remaining characters:
formchar(3,7)='*#'
[Figure: a sample table with callouts 1 through 11 marking the positions of the formatting characters in its outlines and dividers.]
------------------------------------
| | Expend |
| |----------|
| | Sum |
|-----------------------+----------|
|Region |Division | |
|-----------+-----------| |
|Northeast |New England| $12,606|
| |-----------+----------|
| |Middle | |
| |Atlantic | $34,457|
|-----------+-----------+----------|
|West |Mountain | $10,205|
| |-----------+----------|
| |Pacific | $26,578|
------------------------------------
See For more examples of formatting output, see PROC TABULATE
by Example, Second Edition.
MISSING
considers missing values as valid values to create the combinations of class
variables. Special missing values that are used to represent numeric values (the
letters A through Z and the underscore (_) character) are each considered as a
separate value. A heading for each missing value appears in the table.
Default If you omit MISSING, then PROC TABULATE does not include
observations with a missing value for any class variable in the report.
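A minimal sketch of the MISSING option, using a small hypothetical data set WORK.VISITS in which one observation has a missing value for the character class variable Clinic (a single period is read as a blank character value). With MISSING, the missing level gets its own row; without it, that observation would be dropped from the report.

```sas
data work.visits;
   input clinic $ patients;
   datalines;
A 10
B 4
. 7
A 3
;

/* MISSING keeps the observation whose Clinic value is missing. */
proc tabulate data=work.visits missing out=work.visits_sum;
   class clinic;
   var patients;
   table clinic, patients*sum;
run;
```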
NOSEPS
eliminates horizontal separator lines from the row titles and the body of the
table. Horizontal separator lines remain between nested column headings.
Restriction The NOSEPS option affects only the traditional SAS monospace
output destination.
Tip If you want to replace the separator lines with blanks rather than
remove them, then use the “FORMCHAR
<(position(s))>='formatting-character(s)'” option on page 2472.
NOTHREADS
disables parallel processing of the input data set. See “THREADS |
NOTHREADS” on page 2480.
DATA
orders values according to their order in the input data set.
Interaction If you use PRELOADFMT in the CLASS statement, then the order
for the values of each class variable matches the order that
PROC FORMAT uses to store the values of the associated user-
defined format. If you use the CLASSDATA= option, then PROC
TABULATE uses the order of the unique values of each class
variable in the CLASSDATA= data set to order the output levels.
If you use both options, then PROC TABULATE first uses the
user-defined formats to order the output. If you omit
EXCLUSIVE, then PROC TABULATE appends the unique values of
the class variables in the input data set, in the order in which
they are encountered, after the user-defined format values and
the CLASSDATA= values.
FORMATTED
orders values by their ascending formatted values. If no format has been
assigned to a numeric class variable, then the default format, BEST12., is
used. This order depends on your operating environment.
Aliases FMT
EXTERNAL
FREQ
orders values by descending frequency count.
UNFORMATTED
orders values by their unformatted values. This order depends on your
operating environment. This sort sequence is particularly useful for
displaying dates chronologically.
Aliases UNFMT
INTERNAL
Default UNFORMATTED
Interaction If you use the PRELOADFMT option in the CLASS statement, then
PROC TABULATE orders the levels by the order of the values in the
user-defined format.
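A minimal sketch of ORDER=FREQ in the PROC TABULATE statement, assuming the SASHELP.CLASS sample data set. Age 12 occurs 5 times, more than any other age, so it is listed first.

```sas
/* Levels of Age in descending order of frequency. */
proc tabulate data=sashelp.class order=freq;
   class age;
   table age, n;
run;
```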
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC
TABULATE creates it.
The number of observations in the output data set depends on the number of
categories of data that are used in the tables and the number of subtables that
are generated. The output data set contains these variables (in the following
order):
QMARKERS=number
specifies the default number of markers to use for the P² quantile estimation
method. The number of markers controls the size of fixed memory space.
Default The default value depends on which quantiles you request. For the
median (P50), number is 7. For the quartiles (P25 and P75), number is
25. For the quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you
request several quantiles, then PROC TABULATE uses the largest
default value of number.
Tip Increase the number of markers above the default settings to improve
the accuracy of the estimates; reduce the number of markers to
conserve memory and computing time.
QMETHOD=OS | P2
specifies the method PROC TABULATE uses to process the input data when it
computes quantiles. If the number of observations is less than or equal to the
QMARKERS= value and QNTLDEF=5, then both methods produce the same
results.
OS
uses order statistics. PROC UNIVARIATE uses this technique.
P2
uses the P2 method to approximate the quantile.
Alias HIST
Default OS
QNTLDEF=1 | 2 | 3 | 4 | 5
specifies the mathematical definition that the procedure uses to calculate
quantiles when QMETHOD=OS is specified. When QMETHOD=P2, you must
use QNTLDEF=5.
Alias PCTLDEF=
Default 5
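A minimal sketch of quantiles computed with QMETHOD=OS and QNTLDEF=5, assuming the SASHELP.CLASS sample data set (19 ages from 11 through 16). Under definition 5, with n·p not an integer, the quantile is the next order statistic above n·p: P25 is x(5)=12, P50 is x(10)=13, and P75 is x(15)=15. WORK.AGE_Q is a hypothetical output data set name.

```sas
/* Order-statistics quantiles of Age using definition 5 (the default). */
proc tabulate data=sashelp.class qmethod=os qntldef=5 out=work.age_q;
   var age;
   table age*(p25 p50 p75);
run;
```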
STYLE=style-override(s)
specifies one or more style overrides to use for the data cells of a table. For
example, the following statement specifies that the background color for data
cells is red:
proc tabulate data=one style=[backgroundcolor=red];
style-override
specifies one or more style attributes or style elements to override the
default style element and attributes in a specific area of a report.
<PARENT>
specifies that the data cell use the style element of its parent heading.
The parent style element depends on where the table specifies the style element:
• the style element of the leaf heading above the row that contains the
cell, if the table specifies the style element in the row dimension
expression
• the Beforecaption style element, if the table specifies the style element
in the page dimension expression
• undefined, otherwise
Note: In this usage, the angle brackets around the word PARENT are
required. Braces or square brackets cannot be substituted in the syntax.
style-attribute-name
specifies the attribute to change.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of
valid values. A SAS format can also be used as an attribute value for
conditional formatting.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-element-name
is the name of a style element that is part of an ODS style template.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
Alias S=
To specify a style element for data cells with missing values, use
STYLE= in the TABLE statement MISSTEXT= option.
You can use braces ({ and }) instead of square brackets ([ and ]).
See For information about using styles with PROC TABULATE, see “Using
ODS Styles with PROC TABULATE” on page 2522.
Example “Example 14: Specifying Style Overrides for ODS Output” on page
2607
THREADS | NOTHREADS
enables or disables parallel processing of the input data set. This option
overrides the SAS system option THREADS | NOTHREADS unless the system
option is restricted. (For more information, see “Support for Parallel Processing”
in SAS Language Reference: Concepts.)
Interaction PROC TABULATE uses the value of the SAS system option
THREADS except when a BY statement is specified or the value of
the SAS system option CPUCOUNT is less than 2. In those cases,
you can specify the THREADS option in the PROC TABULATE
statement to force PROC TABULATE to use parallel processing.
When multi-threaded processing, also known as parallel processing,
is in effect, observations might be returned in an unpredictable
order. However, the observations are sorted correctly if a BY
statement is specified.
TRAP
enables floating point exception (FPE) recovery during data processing beyond
the recovery that is provided by normal SAS FPE handling. Note that without
the TRAP option, normal SAS FPE handling is still in effect so that PROC
TABULATE terminates in the case of math exceptions.
VARDEF=divisor
specifies the divisor to use in the calculation of the variance and standard
deviation. The following table shows the possible values for divisor and the
associated divisors.
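Value          Divisor                      Formula for Divisor
DF             Degrees of freedom           $n - 1$
N              Number of observations       $n$
WDF            Sum of weights minus one     $\sum_i w_i - 1$
WEIGHT | WGT   Sum of weights               $\sum_i w_i$
The procedure computes the variance as CSS/divisor, where CSS is the corrected
sum of squares and equals $\sum_i (x_i - \bar{x})^2$. When you weight the analysis variables,
CSS equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.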
Default DF
Requirement To compute standard error of the mean, use the default value of
VARDEF=.
Tips When you use the WEIGHT statement and VARDEF=DF, the
variance is an estimate of $\sigma^2$, where the variance of the ith
observation is $\operatorname{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith
observation. This yields an estimate of the variance of an
observation with unit weight.
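A minimal sketch of how VARDEF= changes the divisor, using a hypothetical four-observation data set WORK.FOUR. For x = 1, 2, 3, 4 the mean is 2.5 and CSS = 2.25 + 0.25 + 0.25 + 2.25 = 5, so VARDEF=DF gives 5/3 and VARDEF=N gives 5/4 = 1.25.

```sas
data work.four;
   input x @@;
   datalines;
1 2 3 4
;

/* VARDEF=N divides CSS by n (4) instead of n - 1 (3). */
proc tabulate data=work.four vardef=n out=work.var_n;
   var x;
   table x*var;
run;
```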
BY Statement
Creates a separate table for each BY group.
Accessibility note: When BY statements are specified, default labels for the BY group tables are
displayed in the table of contents in PDF and RTF output, the contents file in HTML
output, the trace record created by the ODS TRACE statement, and the entry list
created by the LIST statement in PROC DOCUMENT. The labels are based on the
values of the BY variable. For an example of creating a table with default BY group
labels, see “Overview of Table Accessibility” in Creating Accessible SAS Output
Using ODS and ODS Graphics.
Syntax
BY <DESCENDING> variable-1
<<DESCENDING> variable-2 ...> <NOTSORTED>;
Required Argument
variable
You can specify more than one variable. If you do not use the NOTSORTED
option in the BY statement, then the observations in the data set must either be
sorted by all the variables that you specify, or they must be indexed
appropriately. Variables in a BY statement are called BY variables.
Optional Arguments
DESCENDING
specifies that the observations are sorted in descending order by the variable
that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric
order. For example, the observations are grouped in chronological order.
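A minimal sketch of a BY statement with DESCENDING, assuming the SASHELP.CLASS sample data set; WORK.CLASS_DESC is a hypothetical name. The data must be sorted in the same order that the BY statement specifies, so the M group is processed before the F group.

```sas
proc sort data=sashelp.class out=work.class_desc;
   by descending sex;
run;

/* One table per BY group, in descending order of Sex. */
proc tabulate data=work.class_desc;
   by descending sex;
   var weight;
   table weight*(n mean);
run;
```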
CLASS Statement
Identifies class variables for the table. Class variables determine the categories that PROC
TABULATE uses to calculate statistics.
Note: CLASS statements without options use the internal default or the value specified
by an option in the PROC TABULATE statement. For example, in the following
code, variables c and d would use the internal default. If an ORDER= option had
been specified in the PROC TABULATE statement, then variables c and d would
use the value specified by the ORDER= option in the PROC TABULATE statement.
class a b / order=data;
class c d;
Some CLASS statement options are also available in the PROC TABULATE
statement. They affect all CLASS variables rather than just the ones that you
specify in a CLASS statement.
Examples: “Example 3: Using Preloaded Formats with Class Variables” on page 2556
“Example 4: Using Multilabel Formats” on page 2560
Syntax
CLASS variable(s) </ options>;
Required Argument
variable(s)
specifies one or more variables that the procedure uses to group the data.
Variables in a CLASS statement are referred to as class variables. Class
variables can be numeric or character. Class variables can have continuous
values, but they typically have a few discrete values that define the
classifications of the variable. You do not have to sort the data by class
variables.
Interaction If a variable name and a statistic name are the same, enclose the
statistic name in single or double quotation marks.
Optional Arguments
ASCENDING
specifies to sort the class variable values in ascending order.
Alias ASCEND
DESCENDING
specifies to sort the class variable values in descending order.
Alias DESCEND
Default ASCENDING
EXCLUSIVE
excludes from tables and output data sets all combinations of class variables
that are not found in the preloaded range of user-defined formats.
GROUPINTERNAL
specifies not to apply formats to the class variables when PROC TABULATE
groups the values to create combinations of class variables.
MISSING
considers missing values as valid class variable levels. Special missing values
that represent numeric values (the letters A through Z and the underscore (_)
character) are each considered as a separate value.
Default If you omit the MISSING option, then PROC TABULATE excludes the
observations with any missing CLASS variable values from tables and
output data sets.
MLF
enables PROC TABULATE to use the format label or labels for a given range or
overlapping ranges to create subgroup combinations when a multilabel format is
assigned to a class variable.
Note: When the formatted values overlap, one internal class variable value
maps to more than one class variable subgroup combination. Therefore, the sum
of the N statistics for all subgroups is greater than the number of observations
in the data set (the overall N statistic).
Requirement You must use PROC FORMAT and the MULTILABEL option in the
VALUE statement to create a multilabel format.
Interactions Using MLF with ORDER=FREQ might not produce the order that
you expect for the formatted values.
When you specify MLF, the formatted values of the class variable
become internal values. Therefore, specifying
ORDER=FORMATTED produces the same results as specifying
ORDER=UNFORMATTED.
Tip If you omit MLF, then PROC TABULATE uses the primary format
labels, which correspond to the first external format value, to
determine the subgroup combinations.
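A minimal sketch of MLF with a multilabel format, assuming the SASHELP.CLASS sample data set (ages 11 through 16). The format name AGEGRP and the ranges are hypothetical. Because the ranges overlap, the groups contain 7, 14, and 12 students, so the N values sum to 33, which is more than the 19 observations in the data set.

```sas
/* MULTILABEL allows one value to fall in several overlapping ranges. */
proc format;
   value agegrp (multilabel)
      11-12 = '11 to 12'
      11-14 = '11 to 14'
      13-16 = '13 to 16';
run;

proc tabulate data=sashelp.class out=work.age_mlf;
   class age / mlf;
   format age agegrp.;
   table age, n;
run;
```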
DATA
orders values according to their order in the input data set.
Interaction If you use PRELOADFMT, then the order for the values of each
class variable matches the order that PROC FORMAT uses to
store the values of the associated user-defined format. If you use
the CLASSDATA= option in the PROC statement, then PROC
TABULATE uses the order of the unique values of each class
variable in the CLASSDATA= data set to order the output levels.
If you use both options, then PROC TABULATE first uses the
user-defined formats to order the output. If you omit EXCLUSIVE
in the PROC statement, then PROC TABULATE places, in the
order in which they are encountered, the unique values of the
class variables that are in the input data set after the user-
defined format and the CLASSDATA= values.
FORMATTED
orders values by their ascending formatted values. This order depends on
your operating environment.
Aliases FMT
EXTERNAL
FREQ
orders values by descending frequency count.
UNFORMATTED
orders values by their unformatted values. This order depends on your
operating environment. This sort sequence is particularly useful for
displaying dates chronologically.
Aliases UNFMT
INTERNAL
Default UNFORMATTED
Interaction If you use the PRELOADFMT option in the CLASS statement, then
PROC TABULATE orders the levels by the order of the values in the
user-defined format.
Tip By default, all orders except FREQ are ascending. For descending
orders, use the DESCENDING option.
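A minimal sketch of setting the order separately for each class variable, assuming the SASHELP.CLASS sample data set. Each CLASS statement applies its options only to the variables that it lists.

```sas
proc tabulate data=sashelp.class;
   class sex / descending;    /* M is listed before F             */
   class age / order=freq;    /* most frequent age (12) is first  */
   table age, sex*n;
run;
```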
PRELOADFMT
specifies that all formats are preloaded for the class variables.
STYLE=style-override
specifies one or more style overrides to use for page dimension text and class
variable name headings. For example, the following statement specifies that the
background color for page dimension text and class variable name headings is
light green:
class region division prodtype / style=[background=lightgreen];
style-override
specifies one or more style attributes or style elements to override the
default style element and attributes in a specific area of a report.
<PARENT>
specifies that the data cell use the style element of its parent heading.
Note: In this usage, the angle brackets around the word PARENT are
required. Braces or square brackets cannot be substituted in the syntax.
style-attribute-name
specifies the attribute to change.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of
valid values. A SAS format can also be used as an attribute value for
conditional formatting.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-element-name
is the name of a style element that is part of an ODS style template.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
Alias S=
Tips To override a style element that is specified for page dimension text in
the CLASS statement, you can specify a style element in the TABLE
statement page dimension expression.
The use of STYLE= in the CLASS statement differs slightly from its
use in the PROC TABULATE statement. In the CLASS statement,
inheritance is different for rows and columns. For rows, the parent
heading is located to the left of the current heading. For columns, the
parent heading is located above the current heading.
See For information about using styles with PROC TABULATE, see “Using
ODS Styles with PROC TABULATE” on page 2522.
Example “Example 14: Specifying Style Overrides for ODS Output” on page
2607
Details
If you specify the MISSING option in the PROC TABULATE statement, then the
procedure considers missing values as valid levels for all class variables. If you
specify the MISSING option in a CLASS statement, then PROC TABULATE
considers missing values as valid levels for the class variables that are specified in
that CLASS statement.
CLASSLEV Statement
Specifies a style element for class variable level value headings.
Examples: “Example 14: Specifying Style Overrides for ODS Output” on page 2607
“Example 15: Style Precedence” on page 2612
Syntax
CLASSLEV variable(s) </ STYLE=style-override(s)>;
Required Argument
variable(s)
specifies one or more class variables from the CLASS statement for which you
want to specify a style element.
Optional Argument
STYLE=style-override
specifies one or more style overrides for class variable level value headings. For
example, the following statement specifies that the background color for class
variable level value headings is yellow:
classlev region division prodtype / style=[background=yellow];
style-override
specifies one or more style attributes or style elements to override the
default style element and attributes in a specific area of a report.
<PARENT>
specifies that the data cell use the style element of its parent heading.
Note: In this usage, the angle brackets around the word PARENT are
required. Braces or square brackets cannot be substituted in the syntax.
style-attribute-name
specifies the attribute to change.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of
valid values. A SAS format can also be used as an attribute value for
conditional formatting.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-element-name
is the name of a style element that is part of an ODS style template.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
Alias S=
Tips The use of STYLE= in the CLASSLEV statement differs slightly from
its use in the PROC TABULATE statement. In the CLASSLEV
statement, inheritance is different for rows and columns. For rows, the
parent heading is located to the left of the current heading. For
columns, the parent heading is located above the current heading.
You can use braces ({ and }) instead of square brackets ([ and ]).
See For information about using styles with PROC TABULATE, see “Using
ODS Styles with PROC TABULATE” on page 2522.
Example “Example 14: Specifying Style Overrides for ODS Output” on page
2607
FREQ Statement
Specifies a numeric variable that contains the frequency of each observation.
Tip: The effects of the FREQ and WEIGHT statements are similar except when
calculating degrees of freedom.
Example: “FREQ” on page 79
Syntax
FREQ variable;
Required Argument
variable
specifies a numeric variable whose value represents the frequency of the
observation.
If you use the FREQ statement, then the procedure assumes that each
observation represents n observations, where n is the value of variable. If n is
not an integer, then SAS truncates it. If n is less than 1 or is missing, then the
procedure does not use that observation to calculate statistics.
The sum of the frequency variable represents the total number of observations.
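A minimal sketch of the FREQ statement, using a hypothetical data set WORK.ORDERS. The value 2.7 is truncated to 2, and the observation with a frequency of 0 is not used, so both East (2 + 1) and West (3) have N = 3.

```sas
data work.orders;
   input region $ count;
   datalines;
East 2.7
East 1
West 0
West 3
;

/* Each observation counts as COUNT observations (truncated to an integer). */
proc tabulate data=work.orders out=work.order_n;
   class region;
   freq count;
   table region, n;
run;
```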
KEYLABEL Statement
Labels a keyword for the duration of the PROC TABULATE step. PROC TABULATE uses the label
anywhere that the specified keyword would otherwise appear.
Syntax
KEYLABEL keyword-1='description-1' <keyword-2='description-2' ...>;
Required Arguments
keyword
specifies a statistic keyword.
keyword can be any statistic keyword that is discussed in “Statistics
That Are Available in PROC TABULATE” on page 2515, or the universal class
variable ALL. (See “Elements That You Can Use in a Dimension Expression” on
page 2507.)
description
specifies the keyword label.
Restriction Each keyword can have only one label in a particular PROC
TABULATE step. If you request multiple labels for the same
keyword, then PROC TABULATE uses the last one that is specified
in the step.
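A minimal sketch of KEYLABEL, assuming the SASHELP.CLASS sample data set. The labels replace the keywords ALL, N, and MEAN wherever they would otherwise appear in the headings.

```sas
proc tabulate data=sashelp.class;
   class sex;
   var height;
   table sex all, height*(n mean);
   keylabel all='All Students' n='Count' mean='Average';
run;
```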
KEYWORD Statement
Specifies a style element for keyword headings.
Example: “Example 14: Specifying Style Overrides for ODS Output” on page 2607
Syntax
KEYWORD keyword(s) </ STYLE=style-override(s)>;
Required Argument
keyword
specifies the keyword statistic.
keyword can be any statistic keyword that is discussed in “Statistics
That Are Available in PROC TABULATE” on page 2515, or the universal class
variable ALL. (See “Elements That You Can Use in a Dimension Expression” on
page 2507.)
Optional Argument
STYLE=style-override
specifies one or more style overrides for the keyword headings.
For example, the following statement specifies that the background color for
keyword headings is linen:
keyword all sum / style=[background=linen];
style-override
specifies one or more style attributes or style elements to override the
default style element and attributes in a specific area of a report.
<PARENT>
specifies that the data cell use the style element of its parent heading.
Note: In this usage, the angle brackets around the word PARENT are
required. Braces or square brackets cannot be substituted in the syntax.
style-attribute-name
specifies the attribute to change.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of
valid values. A SAS format can also be used as an attribute value for
conditional formatting.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-element-name
is the name of a style element that is part of an ODS style template.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
Alias S=
Tips The use of STYLE= in the KEYWORD statement differs slightly from
its use in the PROC TABULATE statement. In the KEYWORD
statement, inheritance is different for rows and columns. For rows, the
parent heading is located to the left of the current heading. For
columns, the parent heading is located above the current heading.
See For information about using styles with PROC TABULATE, see “Using
ODS Styles with PROC TABULATE” on page 2522.
Example “Example 14: Specifying Style Overrides for ODS Output” on page
2607
TABLE Statement
Describes a table to be printed.
Requirement: All variables in the TABLE statement must appear in either the VAR statement or
the CLASS statement.
Tips: To create several tables, use multiple TABLE statements.
Variable name list shortcuts are supported within the TABLE statement.
For more information, see “Shortcuts for Specifying Lists of Variable Names” on
page 62.
Syntax
TABLE <<page-expression,> row-expression,>
column-expression </ table-options>;
Required Argument
column-expression
defines the columns in the table. For information about constructing dimension
expressions, see “Details” on page 2507.
Optional Arguments
page-expression
defines the pages in a table. For information about constructing dimension
expressions, see “Details” on page 2507.
Accessibility note Starting with SAS 9.4M6, you can use the ACCESSIBLECHECK
and ACCESSIBLETABLE system options to check for and create
accessible tables. For information about creating accessible
PROC TABULATE tables, see “Overview of Table Accessibility” in
Creating Accessible SAS Output Using ODS and ODS Graphics. If
the ACCESSIBLETABLE=ON system option is specified and the
TABULATE table has page dimension text, then that text is the
accessible caption.
row-expression
defines the rows in the table. For information about constructing dimension
expressions, see “Details” on page 2507.
Table Options
BOX=value
BOX={<label=value> <STYLE=style-override(s)>}
specifies text, and optionally a style override, for the empty box above the row
titles. value can be one of the following:
_PAGE_
writes the page-dimension text in the box. If the page-dimension text does
not fit, then it is placed in its default position above the box, and the box
remains empty.
'string'
writes the quoted string in the box. Any string that does not fit in the box is
truncated.
variable
writes the name (or label, if the variable has one) of a variable in the box.
Any name or label that does not fit in the box is truncated.
For details about the arguments of the STYLE= option and how it is used,
see STYLE= on page 2504 in the TABLE statement.
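As a minimal sketch, assuming the SASHELP.PRDSALE sample data set, BOX= can combine a label and a style override as follows. The label text and the color are arbitrary.

```sas
proc tabulate data=sashelp.prdsale;
   class region division prodtype;
   var actual;
   /* BOX= puts a label, with its own style, in the empty box above the row titles. */
   /* Alternatives: box=_page_ (page-dimension text) or box=region (its label).    */
   table region*division,
         prodtype*actual*sum*f=dollar12.
         / box={label='Region by Division' style=[backgroundcolor=cxFFE4B5]};
run;
```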
This option makes table captions both visual and accessible if the
ACCESSIBLETABLE system option is specified.
#BYLINE
substitutes the entire BY line without leading or trailing blanks for #BYLINE
in the text string. The BY line uses the format variable-name=value.
#BYVALn
#BYVAL(BY-variable-name)
substitutes the current value of the specified BY variable for #BYVAL in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAL2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAL(YEAR) specifies the BY variable, YEAR. BY-variable-name is not
case sensitive.
#BYVARn
#BYVAR(BY-variable-name)
substitutes the name of the BY-variable or the label associated with the
variable (whatever the BY line would normally display) for #BYVAR in the
text string.
n
specifies a variable by its position in the BY statement. For example,
#BYVAR2 specifies the second variable in the BY statement.
BY-variable-name
specifies a variable from the BY statement by its name. For example,
#BYVAR(SITES) specifies the BY variable, SITES. BY-variable-name is not
case sensitive.
Tip You can use the OBBNOTE statement in PROC DOCUMENT to display or
edit the caption.
CONDENSE
prints as many complete logical pages as possible on a single printed page. If
possible, it also prints the pages of a table that is too wide to fit on one page
one below the other on a single page, instead of on separate pages. A logical
page is all the rows and columns that fall within one of the following:
• a page-dimension category (with no BY-group processing)
Restriction The CONDENSE option has no effect on the pages that are
generated by the BY statement. The first table for a BY group always
begins on a new page.
PAGE
specifies that the format that is specified for the page dimension is applied
to the contents of the table cells.
ROW
specifies that the format that is specified for the row dimension is applied to
the contents of the table cells.
COLUMN
specifies that the format that is specified for the column dimension is
applied to the contents of the table cells.
Alias COL
Default COLUMN
FUZZ=number
supplies a numeric value against which analysis variable values and table cell
values other than frequency counts are compared to eliminate trivial values
(absolute values less than the FUZZ= value) from computation and printing. A
number whose absolute value is less than the FUZZ= value is treated as zero in
computations and printing. The default value is the smallest representable
floating-point number on the computer that you are using.
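The following sketch illustrates FUZZ= with a small hypothetical data set, WORK.NOISE, in which the value -1E-12 stands in for floating-point rounding noise. The threshold 1E-8 is arbitrary.

```sas
data work.noise;                 /* hypothetical data set */
   input Group $ Value;
   datalines;
A -1E-12
A 0
B 5
B 7
;
run;

proc tabulate data=work.noise;
   class group;
   var value;
   /* Values whose absolute value is below 1E-8 (such as -1E-12) */
   /* are treated as zero in computations and printing           */
   table group, value*(sum mean) / fuzz=1E-8;
run;
```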
INDENT=number-of-spaces
specifies the number of spaces to indent nested row headings, and suppresses
the row headings for class variables.
Restriction In the HTML, RTF, and Printer destinations, the INDENT= option
suppresses the row headings for class variables but does not indent
nested row headings.
Tip When there are no crossings in the row dimension, there is nothing
to indent, so the value of number-of-spaces has no effect. However,
in such cases INDENT= still suppresses the row headings for class
variables.
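A minimal sketch of INDENT=, assuming SASHELP.PRDSALE. Because the HTML, RTF, and Printer destinations do not indent nested row headings, the example opens the LISTING destination, where the indentation is visible.

```sas
ods listing;                     /* nested headings are indented only in LISTING output */
proc tabulate data=sashelp.prdsale;
   class region division prodtype;
   var actual;
   /* Division headings are indented 3 spaces under each Region value, */
   /* and the class-variable headings for Region and Division are      */
   /* suppressed                                                        */
   table region*division, prodtype*actual*sum / indent=3;
run;
```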
MISSTEXT='text'
supplies up to 256 characters of text to be printed for table cells that contain
missing values.
MISSTEXT={<label='text'> <STYLE=style-override(s)>}
supplies up to 256 characters of text to be printed and specifies a style override
for table cells that contain missing values. For details about the arguments of
the STYLE= option and how it is used, see STYLE= on page 2504 in the TABLE
statement.
Examples “Providing Text for Cells That Contain Missing Values” on page 2546
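A minimal sketch of MISSTEXT= with a style override, assuming SASHELP.PRDSALE. Any Product and ProdType crossing that has no observations produces a cell with a missing value, and that is where the text appears.

```sas
proc tabulate data=sashelp.prdsale;
   class region prodtype product;
   var actual;
   /* cells with a missing value show the text 'none' in italics */
   table region*product,
         prodtype*actual*sum*f=dollar12.
         / misstext={label='none' style=[fontstyle=italic]};
run;
```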
NOCELLMERGE
specifies that data cells are not merged with other data cells in the table.
Note: The NOCELLMERGE option works with the ODS formatted destinations.
These include the ODS MARKUP family, ODS RTF, and the ODS PRINTER family
destinations.
Restriction The NOCELLMERGE option does not work with the traditional
monospace output.
NOCONTINUED
suppresses the continuation message, continued, that is displayed at the
bottom of tables that span multiple pages. The text is rendered with the
AFTERCAPTION style element.
PRINTMISS
prints all values that occur for a class variable each time headings for that
variable are printed, even if there are no data for some of the cells that these
headings create. Consequently, PRINTMISS creates row and column headings
that are the same for all logical pages of the table, within a single BY group.
Restriction If an entire logical page contains only missing values, then that page
is not printed regardless of the PRINTMISS option.
ROW=CONSTANT | FLOAT
specifies whether all title elements in a row crossing are allotted space even
when they are blank.
CONSTANT
allots space to all row titles even if the title has been blanked out. (For
example, N=' '.)
Alias CONST
FLOAT
divides the row title space equally among the nonblank row titles in the
crossing.
Default CONSTANT
RTSPACE=number
specifies the number of print positions to allot to all of the headings in the row
dimension, including spaces that are used to print outlining characters for the
row headings. PROC TABULATE divides this space equally among all levels of
row headings.
Alias RTS=
Restriction The RTSPACE= option affects only the traditional SAS monospace
output destination.
Interaction By default, PROC TABULATE allots space to row titles that are
blank. Use ROW=FLOAT in the TABLE statement to divide the space
among only nonblank titles.
See For more examples of controlling the space for row titles, see PROC
TABULATE by Example, Second Edition.
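The following sketch combines ROW=FLOAT with RTSPACE= (alias RTS=), assuming SASHELP.PRDSALE. The label text and the value 30 are arbitrary, and RTSPACE= has an effect only in the monospace LISTING output.

```sas
ods listing;                     /* RTSPACE= affects only the LISTING destination */
proc tabulate data=sashelp.prdsale;
   class region division;
   var actual;
   /* actual=' ' blanks the analysis-variable title in the row crossing; */
   /* ROW=FLOAT divides the 30 print positions among the nonblank titles */
   table region*division*actual=' ',
         sum='Total Actual Sales'*f=dollar14.2
         / row=float rts=30;
run;
```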
STYLE=style-override
specifies one or more style overrides to use for parts of the table other than
table cells. For example, the following statement specifies that the background
color of cells that contain missing values is red and the background color of the
box is orange:
table (region all)*(division all),
(prodtype all)*(actual*f=dollar10.) /
misstext=[label='Missing' style=[background=red]]
box=[label='Region by Division and Type' style=[backgroundcolor=orange]];
style-override
specifies one or more style attributes or style elements to override the
default style element and attributes in a specific area of a report.
<PARENT>
specifies that the data cell use the style element of its parent heading.
Note: In this usage, the angle brackets around the word PARENT are
required. Braces or square brackets cannot be substituted in the syntax.
style-attribute-name
specifies the attribute to change.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of
valid values. A SAS format can also be used as an attribute value for
conditional formatting.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-element-name
is the name of a style element that is part of an ODS style template.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
Alias S=
You can use braces ({ and }) instead of square brackets ([ and ]).
See For information about using styles with PROC TABULATE, see “Using
ODS Styles with PROC TABULATE” on page 2522.
Example “Example 14: Specifying Style Overrides for ODS Output” on page
2607
PAGE
specifies that the style that is specified for the page dimension is applied to
the contents of the table cells.
ROW
specifies that the style that is specified for the row dimension is applied to
the contents of the table cells.
COLUMN
specifies that the style that is specified for the column dimension is applied
to the contents of the table cells.
Alias COL
Default COLUMN
Details
If all three dimensions are specified, then the leftmost dimension expression
defines pages, the middle dimension expression defines rows, and the rightmost
dimension expression defines columns. If two dimensions are specified, then the
left dimension expression defines rows, and the right dimension expression defines
columns. If a single dimension is specified, then the dimension expression defines
columns.
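For example, the three cases can be compared in a single step. This sketch assumes SASHELP.PRDSALE, with Year, Region, and ProdType as class variables.

```sas
proc tabulate data=sashelp.prdsale;
   class year region prodtype;
   var actual;
   /* one dimension: the expression defines columns */
   table prodtype*actual*sum;
   /* two dimensions: rows, columns */
   table region, prodtype*actual*sum;
   /* three dimensions: pages, rows, columns (one logical page per Year) */
   table year, region, prodtype*actual*sum;
run;
```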
class variables
(See the CLASS statement on page 2482.)
Note: If the input data set contains a variable named ALL, then enclose the
name of the universal class variable in quotation marks.
Default For analysis variables, the default statistic is SUM. Otherwise, the
default statistic is N.
Example n
Region*n
Sales*max
format modifiers
define how to format values in cells. Use the asterisk (*) operator to associate a
format modifier with the element (an analysis variable or a statistic) that
produces the cells that you want to format. Format modifiers have the form
f=format
Example Sales*f=dollar8.2
labels
temporarily replace the names of variables and statistics. Labels affect only the
variable or statistic that immediately precedes the label. Labels have the form
statistic-keyword-or-variable-name='label-text'
Tip PROC TABULATE eliminates the space for blank column headings
from a table but by default does not eliminate the space for blank row
headings unless all row headings are blank. Use ROW=FLOAT in the
TABLE statement to remove the space for blank row headings.
style specifications
specify style elements and style attributes for page dimension text, headings, or
data cells. For details, see “Specifying Style Attributes and Style Elements in
Dimension Expressions” on page 2510.
Example Region*Division
Quarter*Sales*f=dollar8.2
(blank)
places the output for each element immediately after the output for the
preceding element. This process is called concatenation.
parentheses ()
group elements and associate an operator with each concatenated element in
the group.
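The following sketch, assuming SASHELP.PRDSALE, combines several of the elements and operators described above: labels, format modifiers, crossing, concatenation, grouping with parentheses, and the universal class variable ALL.

```sas
proc tabulate data=sashelp.prdsale;
   class region division;
   var actual;
   /* (region all='All Regions')  parentheses group two concatenated    */
   /*                             elements, then cross both with Division */
   /* ='text'                     labels the element just before it      */
   /* *f=format                   format modifier for that statistic     */
   table (region all='All Regions')*division,
         actual='Actual Sales'*(sum='Total'*f=dollar14.2
                                mean='Average'*f=dollar10.2);
run;
```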
• data cells
• keyword headings
The syntax for specifying style elements and style attributes in a dimension
expression is:
[STYLE<(CLASSLEV)>=<style-element-name | PARENT>
[style-attribute-name-1=style-attribute-value-1 <style-attribute-name-2=style-attribute-value-2 ...>]]
For example:
• dept={label='Department' style=[color=red]}, N
• dept*[style=MyDataStyle], N
• dept*[format=12.2 style=MyDataStyle], N
Note: When used in a dimension expression, the STYLE= option must be enclosed
within square brackets ([ and ]) or braces ({ and }).
(CLASSLEV)
assigns a style element to a class variable level value heading. For example, the
following TABLE statement specifies that the level value heading for the class
variable, DEPT, has a foreground color of yellow:
table dept=[style(classlev)=
[color=yellow]]*sales;
For an example that shows how to specify style elements within dimension
expressions, see “Example 14: Specifying Style Overrides for ODS Output” on page
2607. For information about using styles with PROC TABULATE, see “Using ODS
Styles with PROC TABULATE” on page 2522.
VAR Statement
Identifies numeric variables to use as analysis variables.
Alias: VARIABLES
Tip: You can use multiple VAR statements.
Example: “Example 14: Specifying Style Overrides for ODS Output” on page 2607
Syntax
VAR analysis-variable(s) </ options>;
Required Argument
analysis-variable(s)
identifies the analysis variables in the table. Analysis variables are numeric
variables for which PROC TABULATE calculates statistics. The values of an
analysis variable can be continuous or discrete.
Interaction If a variable name and a statistic name are the same, enclose the
statistic name in single or double quotation marks.
Optional Arguments
STYLE=style-override(s)
specifies one or more style overrides for analysis variable name headings. For
example, the following statement specifies that the background color for
analysis variable name headings is tan:
var actual / style=[background=tan];
style-override
specifies one or more style attributes or style elements to override the
default style element and attributes in a specific area of a report.
<PARENT>
specifies that the data cell use the style element of its parent heading.
Note: In this usage, the angle brackets around the word PARENT are
required. Braces or square brackets cannot be substituted in the syntax.
style-attribute-name
specifies the attribute to change.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of
valid values. A SAS format can also be used as an attribute value for
conditional formatting.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
style-element-name
is the name of a style element that is part of an ODS style template.
See For information about using styles with PROC TABULATE, see
“Using ODS Styles with PROC TABULATE” on page 2522.
For a table of default style attributes and style elements for each
ODS destination, see “Style Elements and Style Attributes for Table
Regions” on page 2530.
Alias S=
Tips To override a style element that is specified in the VAR statement, you
can specify a style element in the related TABLE statement dimension
expression.
The use of STYLE= in the VAR statement differs slightly from its use
in the PROC TABULATE statement. In the VAR statement, inheritance
is different for rows and columns. For rows, the parent heading is
located to the left of the current heading. For columns, the parent
heading is located above the current heading.
See For information about using styles with PROC TABULATE, see “Using
ODS Styles with PROC TABULATE” on page 2522.
Example “Example 14: Specifying Style Overrides for ODS Output” on page
2607
WEIGHT=weight-variable
specifies a numeric variable whose values weight the values of the variables
that are specified in the VAR statement. The values of the variable do not have
to be integers. If the value of the weight variable is less than 0, then PROC
TABULATE converts the value to zero and counts the observation in the total
number of observations.
To exclude observations that contain negative and zero weights from the
analysis, use the EXCLNPWGT option. Note that most SAS/STAT procedures, such as
PROC GLM, exclude negative and zero weights by default.
Note Prior to Version 7 of SAS, the procedure did not exclude the
observations with missing weights from the count of observations.
Tip When you use the WEIGHT= option, consider which value of the
VARDEF= option is appropriate. See the discussion of
“VARDEF=divisor” on page 2480.
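The following sketch shows the two VAR statement options together. WORK.SCORES and its variables are hypothetical. Score is weighted and Raw is not, which is the kind of per-variable weighting that WEIGHT= allows and a WEIGHT statement (which weights every analysis variable) does not.

```sas
data work.scores;                /* hypothetical data set */
   input Region $ Score Raw Wt;
   datalines;
East 10 4 1.5
East 12 6 0.5
West 8 3 2
West 9 5 -1
;
run;

proc tabulate data=work.scores;
   class region;
   /* Score is weighted by Wt (the negative weight is treated as zero); */
   /* Raw is unweighted. Both analysis-variable headings are tan.       */
   var score / weight=wt style=[backgroundcolor=tan];
   var raw   / style=[backgroundcolor=tan];
   /* a heading style in the dimension expression overrides the VAR style */
   table region,
         score*(n sumwgt mean) raw={style=[backgroundcolor=cxFFFFCC]}*(n mean);
run;
```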
WEIGHT Statement
Specifies weights for analysis variables in the statistical calculations.
See: For information about calculating weighted statistics and for an example that uses
the WEIGHT statement, see “Calculating Weighted Statistics” on page 83.
Syntax
WEIGHT variable;
Required Argument
variable
specifies a numeric variable whose values weight the values of the analysis
variables. The values of the variable do not have to be integers. PROC
TABULATE responds to weight values in accordance with the following table.
To exclude observations that contain negative and zero weights from the
analysis, use the EXCLNPWGT option. Note that most SAS/STAT procedures, such as
PROC GLM, exclude negative and zero weights by default.
Note: Prior to Version 7 of SAS, the procedure did not exclude the observations
with missing weights from the count of observations.
Tip When you use the WEIGHT statement, consider which value of the
VARDEF= option is appropriate. See the discussion of
“VARDEF=divisor” on page 2480 and the calculation of weighted
statistics in the “Keywords and Formulas” on page 2700 section of
this document.
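A minimal sketch of the WEIGHT statement together with the EXCLNPWGT option, using a hypothetical data set, WORK.TRIPS, that contains one zero weight and one negative weight.

```sas
data work.trips;                 /* hypothetical data set */
   input Mode $ Minutes Wt;
   datalines;
Bus 30 2
Bus 45 0
Train 25 1.5
Train 40 -1
;
run;

/* EXCLNPWGT excludes observations whose weight is zero or negative */
proc tabulate data=work.trips exclnpwgt;
   class mode;
   var minutes;
   weight wt;                    /* weights every analysis variable */
   table mode, minutes*(n sumwgt mean);
run;
```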
Note: If a variable name (class or analysis) and a statistic name are the same, then
enclose the statistic name in single quotation marks (for example, 'MAX').
COLPCTN, COLPCTSUM, CSS, CV, LCLM, MIN, MODE, N, NMISS, PAGEPCTN, PAGEPCTSUM, PCTN,
PCTSUM, RANGE, REPPCTN, REPPCTSUM, ROWPCTSUM, STDERR, SUM, SUMWGT, UCLM, USS, VAR
P1, P5, P10, P20, P30, P40, P60, P70, P80, P90, P95, P99, Q1 | P25, QRANGE
PROBT | PRT, T
These statistics, the formulas that are used to calculate them, and their data
requirements are discussed in the “Keywords and Formulas” on page 2700 section
of this document.
To compute the standard error of the mean (STDERR) or Student's t-test, you must
use the default value of the VARDEF= option, which is DF. The VARDEF= option is
specified in the PROC TABULATE statement.
Use both LCLM and UCLM to compute a two-sided confidence limit for the mean.
Use only LCLM or UCLM to compute a one-sided confidence limit. Use the ALPHA=
option in the PROC TABULATE statement to specify the confidence level. For
example, ALPHA=0.05 requests 95% confidence limits.
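As an illustration, assuming SASHELP.CLASS, the following step requests 90% two-sided confidence limits for the mean height within each value of Sex.

```sas
/* ALPHA=0.10 requests 90% confidence limits */
proc tabulate data=sashelp.class alpha=0.10;
   class sex;
   var height;
   /* LCLM and UCLM together give a two-sided interval for the mean; */
   /* STDERR uses the default VARDEF=DF                              */
   table sex, height*(n mean stderr lclm uclm);
run;
```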
User-defined formats are particularly useful for grouping values into fewer
categories. For example, if you have a class variable, Age, with values that range
from 1 to 99, then you could create a user-defined format that groups the ages so
that your tables contain a manageable number of categories. The following PROC
FORMAT step creates a format that condenses all possible values of age into six
groups of values.
proc format;
value agefmt 0-29='Under 30'
30-39='30-39'
40-49='40-49'
50-59='50-59'
60-69='60-69'
other='70 or over';
run;
For information about creating user-defined formats, see Chapter 30, “FORMAT
Procedure,” on page 1075.
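The format is then associated with the class variable in the PROC TABULATE step. This sketch assumes that the PROC FORMAT step above has already created AGEFMT.; WORK.PATIENTS and its ages are hypothetical.

```sas
data work.patients;              /* hypothetical data set */
   input Age @@;
   datalines;
24 31 45 52 38 67 71 29 84 55 63 41
;
run;

proc tabulate data=work.patients;
   class age;
   format age agefmt.;           /* group the ages with the user-defined format */
   table age all, n pctn;
run;
```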
By default, PROC TABULATE includes in a table only those formats for which the
frequency count is not zero and for which values are not missing. To include missing
values for all class variables in the output, use the MISSING option in the PROC
TABULATE statement, and to include missing values for selected class variables,
use the MISSING option in a CLASS statement. To include formats for which the
frequency count is zero, use the PRELOADFMT option in a CLASS statement and
the PRINTMISS option in the TABLE statement, or use the CLASSDATA= option in
the PROC TABULATE statement.
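The following self-contained sketch uses a hypothetical format, RATEFMT., and a hypothetical data set in which the value 3 never occurs. PRELOADFMT with PRINTMISS keeps the High category in the table, MISSING keeps the observation with a missing rating, and MISSTEXT= prints 0 in any cell whose value is missing.

```sas
proc format;
   value ratefmt 1='Low' 2='Medium' 3='High';   /* hypothetical format */
run;

data work.ratings;               /* hypothetical data: no rating of 3 */
   input Rating @@;
   datalines;
1 1 2 1 2 .
;
run;

proc tabulate data=work.ratings;
   /* PRELOADFMT: use every value in the format, even ones absent from the data */
   /* MISSING:    keep the observation whose Rating is missing                  */
   class rating / preloadfmt missing;
   format rating ratefmt.;
   table rating all, n / printmiss misstext='0';
run;
```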
Note: You cannot modify the format for printing values in table cells by using the
FORMAT or the ATTRIB statement. If you use these statements, the analysis
variable formats that they contain will be ignored.
PROC TABULATE determines the format to use for a particular cell from the
following default order of precedence for formats:
1 If no other formats are specified, then PROC TABULATE uses the default format
(12.2).
2 The FORMAT= option in the PROC TABULATE statement changes the default
format. If no format modifiers affect a cell, then PROC TABULATE uses this
format for the value in that cell.
3 A format modifier in the page dimension applies to the values in all the table
cells on the logical page unless you specify another format modifier for a cell in
the row or column dimension.
4 A format modifier in the row dimension applies to the values in all the table
cells in the row unless you specify another format modifier for a cell in the
column dimension.
5 A format modifier in the column dimension applies to the values in all the table
cells in the column.
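The following sketch, assuming SASHELP.PRDSALE, exercises rules 2, 4, and 5 in one table. The formats are arbitrary; the comments state which rule governs which cells.

```sas
proc tabulate data=sashelp.prdsale format=comma12.;   /* rule 2: new default format */
   class region prodtype;
   var actual;
   table region*actual*f=dollar14.2     /* rule 4: row-dimension modifier          */
         all*actual,                     /* no modifier: uses COMMA12.              */
         prodtype*sum
         all*sum*f=comma14.;             /* rule 5: column modifier wins in ALL col */
run;
```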
Calculating Percentages
These statistics calculate the most commonly used percentages. See “Example 12:
Calculating Various Percentage Statistics” on page 2589 for an example.
You place a denominator definition in angle brackets (< and >) next to the PCTN or
PCTSUM statistic. The denominator definition specifies which categories to sum
for the denominator.
The TABLE statement creates a row for each value of Division and a column for
each value of Type. Within each row, the TABLE statement nests four statistics: N
and three different calculations of PCTN. (See the following figure.) Each
occurrence of PCTN uses a different denominator definition.
Figure 70.4 Three Different Uses of the PCTN Statistic with Frequency Counts
Highlighted
1 <type> sums the frequency counts for all occurrences of Type within the same
value of Division. Thus, for Division=1, the denominator is 6 + 6, or 12.
2 <division> sums the frequency counts for all occurrences of Division within the
same value of Type. Thus, for Type=1, the denominator is 6 + 3 + 8 + 5, or 22.
3 The third use of PCTN has no denominator definition. Omitting a denominator
definition is the same as including all class variables in the denominator
definition.
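The TABLE statement for this example is not reproduced here. A similar statement can be sketched with SASHELP.PRDSALE, with Region in the role of Division and ProdType in the role of Type; the resulting counts differ from those in the figure.

```sas
proc tabulate data=sashelp.prdsale;
   class region prodtype;
   table region,
         prodtype*(n
                   pctn<prodtype>   /* denominator: all ProdType values in this Region */
                   pctn<region>     /* denominator: all Region values for this ProdType */
                   pctn);           /* denominator: every observation                  */
run;
```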
The TABLE statement creates a row for each value of Division and a column for
each value of Type. Because Type is crossed with Expenditures, the value in each
cell is the sum of the values of Expenditures for all observations that contribute to
the cell. Within each row, the TABLE statement nests four statistics: SUM and
three different calculations of PCTSUM. (See the following figure.) Each occurrence
of PCTSUM uses a different denominator definition.
Figure 70.5 Three Different Uses of the PCTSUM Statistic with Sums Highlighted
1 <type> sums the values of Expenditures for all occurrences of Type within the
same value of Division. Thus, for Division=1, the denominator is $7,477 + $5,129,
or $12,606.
2 <division> sums the values of Expenditures for all occurrences of Division
within the same value of Type. Thus, for Type=1, the denominator is $7,477 +
$19,379 + $5,476 + $13,959, or $46,291.
3 The third use of PCTSUM has no denominator definition. Omitting a denominator
definition is the same as including all class variables in the denominator
definition. Thus, for all cells, the denominator is $7,477 + $19,379 + $5,476 +
$13,959 + $5,129 + $15,078 + $4,729 + $12,619, or $83,846.
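A corresponding sketch for PCTSUM, again with SASHELP.PRDSALE standing in for the example data (Region for Division, ProdType for Type, and Actual for Expenditures):

```sas
proc tabulate data=sashelp.prdsale;
   class region prodtype;
   var actual;
   table region,
         prodtype*actual*(sum
                          pctsum<prodtype>   /* sum over ProdType within this Region */
                          pctsum<region>     /* sum over Region within this ProdType */
                          pctsum);           /* sum over all observations            */
run;
```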
elements: columns, headers, and footers. Each table element can specify the use of
one or more style elements for various parts of the output. These style elements
cannot be specified within the syntax of the procedure, but you can use customized
styles for the ODS destinations that you use. For more information about
customizing tables and styles, see “TEMPLATE Procedure: Creating a Style
Template” in SAS Output Delivery System: Procedures Guide.
The Base SAS reporting procedures, PROC PRINT, PROC REPORT, and PROC
TABULATE, enable you to quickly analyze your data and organize it into easy-to-
read tables. You can use the STYLE= option with these procedure statements to
modify the appearance of your report. The STYLE= option enables you to make
changes in sections of output without changing the default style for all of the
output. You can customize specific sections of procedure output by specifying the
STYLE= option in specific statements within the procedure.
The following program uses the STYLE= option to create the colors in the PROC
TABULATE output below:
proc sort data=sashelp.prdsale out=prdsale;
by Country;
run;
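The remainder of that program is not reproduced in this section. As a hedged sketch only, and not the original program, a PROC TABULATE step that continues from the sorted data set could apply STYLE= in several statements as follows. The colors are arbitrary.

```sas
/* Uses WORK.PRDSALE, which the PROC SORT step above sorts by Country */
proc tabulate data=prdsale style=[backgroundcolor=cxF0F8FF];  /* data cells                  */
   by country;
   class region prodtype / style=[backgroundcolor=cxD3D3D3];  /* class-variable name headings */
   classlev region prodtype / style=[fontstyle=italic];       /* class-level value headings  */
   var actual / style=[fontweight=bold];                       /* analysis-variable heading   */
   keyword sum / style=[color=cx333399];                       /* keyword headings            */
   table region, prodtype*actual*sum*f=dollar12.
         / box={label='Actual Sales' style=[backgroundcolor=cxFFE4B5]};
run;
```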
Each style attribute specifies a value for one aspect of the presentation. For
example, the BACKGROUNDCOLOR= attribute specifies the color for the
background of an HTML table or for a colored table in printed output. The
FONTSTYLE= attribute specifies whether to use a Roman font or an italic font.
Note: Because styles control the presentation of the data, they have no effect on
output objects that go to the LISTING, DOCUMENT, or OUTPUT destination.
Available styles are in the SASHELP.TMPLMST item store. In SAS Enterprise Guide,
the list of style sheets is shown by the Style Wizard. In batch mode or SAS Studio,
you can display the list of available style templates by using the LIST statement in
PROC TEMPLATE:
proc template;
list styles / store=sashelp.tmplmst;
run;
For complete information about viewing ODS styles, see “Viewing ODS Styles
Supplied by SAS” in SAS Output Delivery System: Advanced Topics.
By default, HTML 4 output uses the HTMLBlue style template and HTML 5 output
uses the HTMLEncore style template. To help you become familiar with styles,
style elements, and style attributes, look at the relationship between them.
You can use the SOURCE statement in PROC TEMPLATE to display the structure
of a style template. The following code prints the structure of the HTMLBlue style
template to the SAS log:
proc template;
source styles.HTMLBlue;
run;
The following figure illustrates the structure of a style. The figure shows the
relationship between the style, the style elements, and the style attributes.
The following list corresponds to the numbered items in the preceding figure:
You can create new styles with the “DEFINE STYLE Statement” in SAS Output
Delivery System: Procedures Guide. New styles can be created independently or
from an existing style. You can use “PARENT= Statement” in SAS Output
Delivery System: Procedures Guide to create a new style from an existing style.
For complete documentation about ODS styles, see “Style Templates” in SAS
Output Delivery System: Advanced Topics.
2 Header and Footer are examples of style elements. A style element is a
collection of style attributes that apply to a particular part of the output for a
SAS program. For example, a style element might contain instructions for the
presentation of column headings or for the presentation of the data inside table
cells. Style elements might also specify default colors and fonts for output that
uses the style. Style elements exist inside styles and consist of one or more
style attributes. Style elements can be user-defined or supplied by SAS. User-
defined style elements can be created by the “STYLE Statement” in SAS Output
Delivery System: Procedures Guide.
Note: For a list of the default style elements used for HTML and markup
languages and their inheritance, see “Style Elements” in SAS Output Delivery
System: Advanced Topics.
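As a minimal sketch of the DEFINE STYLE and PARENT= statements, the following step creates a hypothetical style, MyHTMLBlue, that inherits from HTMLBlue and changes only the Header style element. The CLASS statement used here is shorthand for a STYLE statement that inherits from the parent's element of the same name. The color value is arbitrary.

```sas
proc template;
   define style styles.MyHTMLBlue;     /* hypothetical style name */
      parent = styles.HTMLBlue;        /* inherit every style element from HTMLBlue */
      class Header /                   /* override one inherited style element */
         backgroundcolor = cxDDEEFF
         fontweight = bold;
   end;
run;

ods html style=styles.MyHTMLBlue;
proc tabulate data=sashelp.class;
   class sex;
   var height;
   table sex, height*mean;
run;
ods html close;
```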
The following table shows commonly used style attributes that you can set with
the STYLE= option in PROC PRINT, PROC TABULATE, and PROC REPORT. Most of
these attributes apply to parts of the table other than cells (for example, table
borders and the lines between columns and rows). Note that not all attributes are
valid in all destinations. For more information about these style attributes, their
valid values, and their applicable destinations, see “Style Attributes Tables” in SAS
Output Delivery System: Advanced Topics.
Table 70.4 Style Attributes for PROC REPORT, PROC TABULATE, and PROC PRINT
PROC
REPORT PROC PROC
Areas: TABULATE PRINT:
PROC CALLDEF, STATEMENTS all
REPORT COLUMN, PROC VAR, CLASS, PROC locations
STATEMENT HEADER, TABULATE BOX, PRINT other
REPORT LINES, STATEMENT CLASSLEV, TABLE than
Attribute Area SUMMARY TABLE KEYWORD location TABLE
ASIS= X X X X
2528 Chapter 70 / TABULATE Procedure
PROC
REPORT PROC PROC
Areas: TABULATE PRINT:
PROC CALLDEF, STATEMENTS all
REPORT COLUMN, PROC VAR, CLASS, PROC locations
STATEMENT HEADER, TABULATE BOX, PRINT other
REPORT LINES, STATEMENT CLASSLEV, TABLE than
Attribute Area SUMMARY TABLE KEYWORD location TABLE
BACKGROUNDCOLO X X X X X X
R=
BACKGROUNDIMAG X X X X X X
E=
BORDERBOTTOMCO X X X
LOR=
BORDERBOTTOMST X X X X
YLE=
BORDERBOTTOMWI X X X X
DTH=
BORDERLEFTCOLOR X X X
=
BORDERLEFTSTYLE X X X X
=
BORDERLEFTWIDTH X X X X
=
BORDERCOLOR= X X X X X
BORDERCOLORDAR X X X X X X
K=
BORDERCOLORLIGH X X X X X X
T=
BORDERRIGHTCOLO X X X
R=
BORDERRIGHTSTYL X X X X
E=
BORDERRIGHTWIDT X X X X
H=
BORDERTOPCOLOR X X X
=
BORDERTOPSTYLE= X X X X
BORDERTOPWIDTH= X X X X
BORDERWIDTH= X X X X X X
CELLPADDING= X X X
CELLSPACING= X X X
CELLWIDTH= X X X X X
CLASS= X X X X X X
COLOR= X X X
FLYOVER= X X X X
FONT= X X X X X X
FONTFAMILY= X X X X X X
FONTSIZE= X X X X X X
FONTSTYLE= X X X X X X
FONTWEIGHT= X X X X X X
FONTWIDTH= X X X X X
FRAME= X X X
HEIGHT= X X X X X
HREFTARGET= X X X
HTMLSTYLE= X X X X X
NOBREAKSPACE=² X X X X
OUTPUTWIDTH= X X X X X
POSTHTML=¹ X X X X X X
POSTIMAGE= X X X X X X
POSTTEXT=¹ X X X X X X
PREHTML=¹ X X X X X X
PREIMAGE= X X X X X X
PRETEXT=¹ X X X X X X
PROTECTSPECIALCHARS= X X X X
RULES= X X X
TAGATTR= X X X X X X
TEXTALIGN= X X X X X X
URL= X X X
VERTICALALIGN= X X X
WIDTH= X X X X X
1 When you use these attributes in this location, they affect only the text that is specified with the PRETEXT=,
POSTTEXT=, PREHTML=, and POSTHTML= attributes. To alter the foreground color or the font for the text that appears
in the table, you must set the corresponding attribute in a location that affects the cells rather than the table. For
complete documentation about style attributes and their values, see “Style Attributes” in SAS Output Delivery System:
Advanced Topics.
2 To help prevent unexpected wrapping of long text strings when using PROC REPORT with the ODS RTF destination, set
NOBREAKSPACE=OFF in a location that affects the LINE statement. The NOBREAKSPACE=OFF attribute must be set in
the PROC REPORT code either on the LINE statement or on the PROC REPORT statement where style(lines) is specified.
For complete documentation about the ODS destinations and their default styles,
see “Style Templates” in SAS Output Delivery System: Advanced Topics.
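Footnote 2 can be illustrated with a short PROC REPORT step. The following is a minimal sketch. It assumes the SASHELP.CLASS sample data set and a hypothetical output file named nobreak.rtf. The step sets NOBREAKSPACE=OFF through STYLE(LINES)= on the PROC REPORT statement, which is one of the two placements that the footnote names, so the setting reaches the text that the LINE statement writes.

```sas
/* Hypothetical output file; ODS RTF writes it to the current directory. */
ods rtf file='nobreak.rtf';

/* STYLE(LINES)= on the PROC REPORT statement applies NOBREAKSPACE=OFF */
/* to the text that LINE statements write in COMPUTE blocks.           */
proc report data=sashelp.class style(lines)=[nobreakspace=off];
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
   compute after;
      line 'This LINE statement text is long enough to show how wrapping behaves in RTF.';
   endcomp;
run;

ods rtf close;
```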
Table 70.5 Default Style Elements and Style Attributes for Table Regions
Specifications in the TABLE statement override the same specification in the PROC
TABULATE, CLASS, CLASSLEV, VAR, and KEYWORD statements. This enables you
to have different style behavior with multiple TABLE statements. However, any
style attributes that you specify in the PROC TABULATE statement and that you
do not override in the TABLE statement are inherited. For example, if you specify a
blue background and a white foreground for all data cells in the PROC TABULATE
statement, and you specify a gray background for the data cells of a particular
crossing in the TABLE statement, then the background for those data cells is gray,
and the foreground is white (as specified in the PROC TABULATE statement).
Detailed information about the STYLE= option is provided in the documentation for
individual statements.
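The inheritance rule described above can be sketched with the SASHELP.CLASS sample data set, which stands in for any input data. In this minimal sketch, STYLE= in the PROC TABULATE statement sets a blue background and a white foreground for all data cells. A STYLE= override in the TABLE statement then changes only the background of the Height cells, so those cells should keep the inherited white foreground.

```sas
/* STYLE= in the PROC TABULATE statement sets the default data-cell style. */
proc tabulate data=sashelp.class
              style=[backgroundcolor=blue color=white];
   class sex;
   var height weight;
   /* Only the Height*Mean cells override BACKGROUNDCOLOR=. They keep the */
   /* white foreground that is inherited from the PROC TABULATE statement. */
   table sex,
         height*mean*[style=[backgroundcolor=gray]]
         weight*mean;
run;
```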
Page dimension text and class variable name headings | “CLASS Statement” (p. 2482)
Table borders, rules, and other parts that are not specified elsewhere | “TABLE Statement” (p. 2496)
2 The STYLE= option in the PROC TABULATE statement changes the default
style attributes. If no other STYLE= option specifications affect a cell, then
PROC TABULATE uses these style attributes for that cell.
3 A STYLE= option that is specified in the page dimension applies to all the table
cells on the logical page unless you specify another STYLE= option for a cell in
the row or column dimension.
4 A STYLE= option that is specified in the row dimension applies to all the table
cells in the row unless you specify another STYLE= option for a cell in the
column dimension.
5 A STYLE= option that is specified in the column dimension applies to all the
table cells in the column.
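The precedence described in items 3 through 5 can be seen in a three-dimensional table. This minimal sketch again assumes SASHELP.CLASS. The page dimension sets a gray background. The row dimension overrides it with yellow. The column dimension then overrides the row setting with green, but only for the Height cells.

```sas
proc tabulate data=sashelp.class;
   class sex age;
   var height weight;
   /* Page dimension: applies to every cell on the logical page unless */
   /* the row or column dimension specifies another STYLE= option.     */
   table sex*[style=[backgroundcolor=gray]],
         /* Row dimension: overrides the page-dimension background. */
         age*[style=[backgroundcolor=yellow]],
         /* Column dimension: overrides the row dimension for Height only. */
         height*mean*[style=[backgroundcolor=green]]
         weight*mean;
run;
```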
n data cells
n keyword headings
The syntax for specifying style elements and style attributes in a dimension
expression is
[STYLE<(CLASSLEV)>=<style-element-name | PARENT>
[style-attribute-name-1=style-attribute-value-1 <style-attribute-name-2=style-attribute-value-2 ...>]]
Here are some examples of dimension expressions that use STYLE=:
n dept={label='Department' style=[color=red]}, N
n dept*[style=MyDataStyle], N
n dept*[format=12.2 style=MyDataStyle], N
Note: When used in a dimension expression, the STYLE= option must be enclosed
within square brackets ([ and ]) or braces ({ and }).
(CLASSLEV)
assigns a style element to a class variable level value heading. For example, the
following TABLE statement specifies that the level value heading for the class
variable, DEPT, has a foreground color of yellow:
table dept=[style(classlev)=
[color=yellow]]*sales;
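The TABLE statement above uses the hypothetical variables DEPT and SALES. The following runnable equivalent assumes the SASHELP.CLASS sample data set. It keeps the same structure, with Sex as the class variable and Height as the analysis variable.

```sas
proc tabulate data=sashelp.class;
   class sex;
   var height;
   /* STYLE(CLASSLEV)= affects only the level value headings (F and M), */
   /* not the heading for the variable name or the data cells.          */
   table sex=[style(classlev)=[color=yellow]]*height*mean;
run;
```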
For an example that shows how to specify style elements within dimension
expressions, see “Example 14: Specifying Style Overrides for ODS Output” on page
2607. For information about using styles with PROC TABULATE, see “Using ODS
Styles with PROC TABULATE” on page 2522.
When the DATA= input data set references an in-memory table or view in CAS, the
TABULATE procedure can use CAS actions to perform a significant portion of its
work within the server. To reference an in-memory table or view, you must specify
the CAS engine LIBNAME statement and use the CAS engine libref with the input
table name.
By default, PROC TABULATE uses CAS processing whenever a CAS engine libref is
specified on the input table name.
In the following example, the LIBNAME statement assigns a CAS engine libref
named mycas that you use to connect to the CAS session casauto.
option casport=5570 cashost="cloud.example.com";
cas casauto ;
libname mycas cas;
data mycas.class;
set sashelp.class;
run;
CSS RANGE
CV STDERR
LCLM SUM
MAX SUMWGT
MEAN STD
MIN UCLM
N USS
NMISS VAR
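The example above loads SASHELP.CLASS into the CAS table MYCAS.CLASS. The following is a minimal sketch of a PROC TABULATE step that could then run against that table. It assumes that the CAS session CASAUTO and the libref MYCAS from the preceding statements are still active (the host name and port shown there are placeholders). It also assumes that the list above names the statistics that CAS processing supports, so it requests only N, MEAN, and MAX.

```sas
/* Assumes the CAS session CASAUTO and the CAS engine libref MYCAS */
/* from the preceding example are still active.                    */
proc tabulate data=mycas.class;
   class sex age;
   var height weight;
   /* N, MEAN, and MAX all appear in the statistics list above. */
   table age all, sex*(height weight)*(n mean max);
run;

cas casauto terminate;   /* end the CAS session when finished */
```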
When SAS format definitions reside in CAS, formatting of class variables occurs in
CAS. If the SAS format definitions do not reside on the CAS server, the CAS
aggregation occurs on the raw values, and SAS applies the relevant formats
as the result set is merged into the PROC TABULATE internal structure. User-defined
formats that are created in SAS must be copied into CAS for them to work
as expected. It is a best practice to keep formats consistent between SAS and CAS.
For complete documentation about using user-defined formats with CAS, see SAS
Cloud Analytic Services: User-Defined Formats.
For information about how to use the CAS LIBNAME statement, see “CAS
LIBNAME Statement” in SAS Cloud Analytic Services: User’s Guide. For more
information about how procedures work with CAS processing, see Chapter 5, “CAS
Processing of Base Procedures,” on page 93.
When the DATA= input data set is stored as a table or view in a DBMS, PROC
TABULATE can use in-database processing to perform most of its work
within the database. In-database processing can provide the advantages of faster
processing and reduced data transfer between the database and SAS software.
If class variables are specified, the procedure creates an SQL GROUP BY clause
that represents the n-way type. Only the n-way class tree is generated on the
DBMS. The result set that is created when the aggregation query executes in the
database is read by SAS into the internal PROC TABULATE data structure.
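The following PROC SQL query helps show what an n-way GROUP BY aggregation computes. It runs an analogous summary on the SASHELP.CLASS sample data set, with Sex and Age in the role of class variables. This is only an illustration under that assumption. It is not the SQL that PROC TABULATE actually generates for a DBMS, which depends on the database and on the statistics that are requested.

```sas
/* One row per combination of Sex and Age, analogous to the n-way */
/* class tree that is computed in the database.                   */
proc sql;
   create table work.nway as
   select sex, age,
          count(height) as height_n,
          sum(height)   as height_sum,
          mean(height)  as height_mean
   from sashelp.class
   group by sex, age;
quit;
```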
When SAS format definitions have been deployed in the database, formatting of
class variables occurs in the database. If the SAS format definitions have not been
deployed in the database, the in-database aggregation occurs on the raw values,
and SAS applies the relevant formats as the result set is merged into the
PROC TABULATE internal structures. Multilabel formatting is always done by SAS
using the initially aggregated result set that is returned by the database.
CSS RANGE
CV STDERR
LCLM SUM
MAX SUMWGT
MEAN STD
MIN UCLM
N USS
NMISS VAR
n Aster
n DB2
n Google BigQuery
n Greenplum
n Hadoop
n HAWQ
n IMPALA
n Netezza
n Oracle
n PostgreSQL
n SAP HANA
n Snowflake
n Teradata
n Vertica
n Yellowbrick
To use the #BYVAR and #BYVAL substitutions, insert the item in the text string at
the position where you want the substitution text to appear. Both #BYVAR and
#BYVAL specifications must be followed by a delimiting character. The character
can be either a space or other non-alphanumeric character, such as a quotation
mark. If no delimiting character is provided, then the specification is ignored and its
text remains intact and is displayed with the rest of the string. To allow a #BYVAR
or #BYVAL substitution to be followed immediately by other text, with no
delimiter, use a trailing dot (as with macro variables). The trailing dot is not
displayed in the resolved text. If you want a period to be displayed as the last
character in the resolved text, use two dots after the #BYVAR or #BYVAL
substitution.
The substitution for #BYVAR or #BYVAL does not occur in the following cases:
n if you use a #BYVAR or #BYVAL specification for a variable that is not named in
the BY statement. For example, you might use #BYVAL2 when there is only one
BY-variable or #BYVAL(ABC) when ABC is non-existent or is not a BY-variable.
n if there is no BY statement
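The following is a minimal sketch of these rules. It sorts SASHELP.CLASS into a hypothetical data set named WORK.CLASS_BY and places the substitutions in TITLE statements. The trailing dot in the second title lets text follow #BYVAL1 with no delimiter. The double dot in the third title keeps a period at the end of the resolved text. NOBYLINE suppresses the default BY line because the titles already identify each BY group.

```sas
/* Hypothetical WORK.CLASS_BY: SASHELP.CLASS sorted for BY processing. */
proc sort data=sashelp.class out=work.class_by;
   by sex;
run;

options nobyline;   /* the titles identify each BY group instead */

proc tabulate data=work.class_by;
   by sex;
   class age;
   var height;
   table age, height*mean;
   /* #BYVAR1 is replaced by the BY variable, #BYVAL1 by its value. */
   title 'Mean height for #byvar1 = #byval1';
   /* The trailing dot ends #BYVAL1 so that _HEIGHT follows directly. */
   title2 'Report ID: CLASS_#byval1._HEIGHT';
   /* Two dots: the first ends the substitution, the second is displayed. */
   title3 'Group #byval1..';
run;

options byline;   /* restore the default BY line */
title;            /* clear the titles */
```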
Missing Values
Condition | Default behavior | To override the default
An observation contains a missing value for a class variable | Excludes that observation from the table¹ | Use MISSING in the PROC TABULATE statement, or MISSING in the CLASS statement
There are no data for a category | Does not show the category in the table | Use PRINTMISS in the TABLE statement, or use CLASSDATA= in the PROC TABULATE statement
Every observation that contributes to a table cell contains a missing value for an analysis variable | Displays a missing value for any statistic (except N and NMISS) in that cell | Use MISSTEXT= in the TABLE statement
There are no data for a formatted value | Does not display that formatted value in the table | Use PRELOADFMT in the CLASS statement with PRINTMISS in the TABLE statement, or use CLASSDATA= in the PROC TABULATE statement, or add dummy observations to the input data set so that it contains data for each formatted value
A FREQ variable value is missing or is less than 1 | Does not use that observation to calculate statistics | No alternative
¹ The CLASS statement applies to all TABLE statements in a PROC TABULATE step. Therefore, if you define a variable as a class variable, PROC TABULATE omits observations that have missing values for that variable even if you do not use the variable in a TABLE statement.
This section presents a series of PROC TABULATE steps that illustrate how PROC
TABULATE treats missing values. The following program creates the data set and
formats that are used in this section and prints the data set. The data set
COMPREV contains no missing values. (See the output below.)
proc format;
value cntryfmt 1='United States'
2='Japan';
value compfmt 1='Supercomputer'
2='Mainframe'
3='Midrange'
4='Workstation'
5='Personal Computer'
6='Laptop';
run;
data comprev;
input Country Computer Rev90 Rev91 Rev92;
datalines;
1 1 788.8 877.6 944.9
1 2 12538.1 9855.6 8527.9
1 3 9815.8 6340.3 8680.3
1 4 3147.2 3474.1 3722.4
1 5 18660.9 18428.0 23531.1
2 1 469.9 495.6 448.4
2 2 5697.6 6242.4 5382.3
2 3 5392.1 5668.3 4845.9
2 4 1511.6 1875.5 1924.5
2 5 4746.0 4600.8 4363.7
;
proc print data=comprev noobs;
format country cntryfmt. computer compfmt.;
title 'The Data Set COMPREV';
run;
No Missing Values
The following PROC TABULATE step produces the following output:
proc tabulate data=comprev;
class country computer;
var rev90 rev91 rev92;
table computer*country,rev90 rev91 rev92 /
rts=32;
format country cntryfmt. computer compfmt.;
title 'Revenues from Computer Sales';
title2 'for 1990 to 1992';
run;
Because the data set contains no missing values, the table includes all
observations. All headings and cells contain nonmissing values.
The observation with a missing value for Computer was the category Midrange,
Japan. This category no longer exists. By default, PROC TABULATE ignores
observations with missing values for a class variable, so this table contains one
fewer row than the output “Computer Sales Data: No Missing Values”.
This table includes a category with missing values of Computer. This category
makes up the first row of data in the table.
run;
In this table, the missing value appears as the text that the MISSCOMP. format
specifies.
Output 70.9 Computer Sales Data: Text Supplied for Missing Computer Value
This table contains a row for the category No type given, United States and the
category Midrange, Japan. Because there are no data in these categories, the
values for the statistics are all missing.
This table replaces the period normally used to display missing values with the text
of the MISSTEXT= option.
Output 70.11 Computer Sales Data: Text Supplied for Missing Statistics Values
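The steps described above can be sketched in a single program. This sketch assumes that the COMPREV data set and the CNTRYFMT. format from the start of this section already exist in the session. It creates a copy named COMPMISS (a hypothetical name) in which Computer is missing for the Midrange, Japan observation. It defines a MISSCOMP. format that labels the missing value No type given. It then applies MISSING, PRINTMISS, and MISSTEXT=. The MISSTEXT= text 'NO DATA' is arbitrary.

```sas
/* Hypothetical COMPMISS: a copy of COMPREV in which Computer is missing */
/* for the Midrange, Japan observation (Country=2, Computer=3).          */
data compmiss;
   set comprev;
   if country=2 and computer=3 then computer=.;
run;

/* MISSCOMP. repeats the COMPFMT. labels and also labels the missing value. */
proc format;
   value misscomp 1='Supercomputer'
                  2='Mainframe'
                  3='Midrange'
                  4='Workstation'
                  5='Personal Computer'
                  6='Laptop'
                  .='No type given';
run;

proc tabulate data=compmiss missing;   /* MISSING keeps the observation */
   class country computer;
   var rev90 rev91 rev92;
   table computer*country, rev90 rev91 rev92
         / rts=32 printmiss             /* show empty class combinations */
           misstext='NO DATA';          /* text for missing statistics   */
   format country cntryfmt. computer misscomp.;
   title 'Revenues from Computer Sales';
   title2 'for 1990 to 1992';
run;
```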
If you want to include headings for all possible values of Computer (perhaps to
make it easier to compare the output with tables that are created later when you do
have data for laptops), then you have three different ways to create such a table:
n Use the PRELOADFMT option in the CLASS statement with the PRINTMISS
option in the TABLE statement. See “Example 3: Using Preloaded Formats with
Class Variables” on page 2556 for another example that uses PRELOADFMT.
n Use the CLASSDATA= option in the PROC TABULATE statement. See “Example
2: Specifying Class Variable Combinations to Appear in a Table” on page 2553
for an example that uses the CLASSDATA= option.
n Add dummy values to the input data set so that each value that the format
handles appears at least once in the data set.
The following program adds the PRELOADFMT option to a CLASS statement that
contains the relevant variable.
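The following is a minimal sketch of such a step. It assumes the COMPREV data set and the CNTRYFMT. and COMPFMT. formats from the start of this section, and it is not necessarily the exact program behind the output below. COMPREV contains no observations for Laptop, so the Laptop heading can appear only through the preloaded format.

```sas
/* Assumes COMPREV and the CNTRYFMT. and COMPFMT. formats from the */
/* start of this section. COMPREV has no observations for Laptop.  */
proc tabulate data=comprev;
   class country;
   class computer / preloadfmt;   /* preload every COMPFMT. value */
   var rev90 rev91 rev92;
   /* PRINTMISS, used with PRELOADFMT, displays the preloaded Laptop */
   /* heading even though no data exist for it.                      */
   table computer*country, rev90 rev91 rev92 / rts=32 printmiss;
   format country cntryfmt. computer compfmt.;
   title 'Revenues from Computer Sales';
   title2 'for 1990 to 1992';
run;
```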
Output 70.12 Computer Sales Data: All Possible Computer Values Included
For this technique to work, the first value of the first class variable must occur in
the data with all possible values of all the other class variables. If this criterion is
not met, then the order of the headings might surprise you.
The following program creates a simple data set in which the observations are
ordered first by the values of Animal, then by the values of Food. The ORDER=
option in the PROC TABULATE statement orders the heading for the class variables
by the order of their appearance in the data set. (See the following output.)
Although bones is the first value for Food in the group of observations where
Animal=dog, all other values for Food appear before bones in the data set because
bones never appears when Animal=cat. Therefore, the heading for bones in the
table in the following output is not in alphabetical order.
In other words, PROC TABULATE maintains for subsequent categories the order
that was established by earlier categories. If you want to re-establish the order of
Food for each value of Animal, then use BY-group processing. PROC TABULATE
creates a separate table for each BY group, so that the ordering can differ from one
BY group to the next.
data foodpref;
input Animal $ Food $;
datalines;
cat fish
cat meat
cat milk
dog bones
dog fish
dog meat
;
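The following is a minimal sketch of the two approaches, using the FOODPREF data set created above. The TABLE statements are illustrative. In the first step, ORDER=DATA orders the Food headings by their first appearance in the whole data set, so bones follows fish, meat, and milk. In the second step, BY-group processing re-establishes the order of Food within each value of Animal. FOODPREF is already in Animal order, as the BY statement requires.

```sas
/* ORDER=DATA: Food headings follow their first appearance in the */
/* whole data set, so bones comes after fish, meat, and milk.     */
proc tabulate data=foodpref order=data;
   class animal food;
   table animal, food*n;
run;

/* BY-group processing: one table per Animal, so the order of the */
/* Food headings is set separately within each BY group.          */
proc tabulate data=foodpref order=data;
   by animal;
   class food;
   table food*n;
run;
```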
Example 1: Creating a Basic Two-Dimensional Table
Details
This example does the following:
n creates a category for each type of user (residential or business) in each division
of each region
Program
data energy;
length State $2;
input Region Division state $ Type Expenditures;
datalines;
1 1 ME 1 708
1 1 ME 2 379
4 4 HI 1 273
4 4 HI 2 298
;
proc format;
value regfmt 1='Northeast'
2='South'
3='Midwest'
4='West';
value divfmt 1='New England'
2='Middle Atlantic'
3='Mountain'
4='Pacific';
value usetype 1='Residential Customers'
2='Business Customers';
run;
proc tabulate data=energy format=dollar12.;
class region division type;
var expenditures;
table region*division,
type*expenditures
/ rts=25;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region';
title2 '(millions of dollars)';
run;
Program Description
Create the ENERGY data set. ENERGY contains data on expenditures of energy for
business and residential customers in individual states in the Northeast and West
regions of the United States. A DATA step on page 2788 creates the data set.
data energy;
length State $2;
input Region Division state $ Type Expenditures;
datalines;
1 1 ME 1 708
1 1 ME 2 379
4 4 HI 1 273
4 4 HI 2 298
;
Create the REGFMT., DIVFMT., and USETYPE. formats. PROC FORMAT creates
formats for Region, Division, and Type.
proc format;
value regfmt 1='Northeast'
2='South'
3='Midwest'
4='West';
value divfmt 1='New England'
2='Middle Atlantic'
3='Mountain'
4='Pacific';
value usetype 1='Residential Customers'
2='Business Customers';
run;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the
default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of Region, Division, and Type.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each
formatted value of Region. Nested within each row are rows for each formatted
value of Division. The TABLE statement also creates a column for each formatted
value of Type. Each cell that is created by these rows and columns contains the
sum of the analysis variable Expenditures for all observations that contribute to
that cell.
table region*division,
type*expenditures
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
Output 70.14 Basic Two-Dimensional Table
Example 2: Specifying Class Variable Combinations to Appear in a Table
DATA step
FORMAT statement
TITLE statement
Data set: ENERGY
Details
This example does the following:
n uses the CLASSDATA= option to specify combinations of class variables to
appear in a table.
n uses the EXCLUSIVE option to restrict the output to only the combinations
specified in the CLASSDATA= data set. Without the EXCLUSIVE option, the
output would be the same as in “Example 1: Creating a Basic Two-Dimensional
Table” on page 2550.
Program
data classes;
input region division type;
datalines;
1 1 1
1 1 2
4 4 1
4 4 2
;
proc tabulate data=energy format=dollar12.
classdata=classes exclusive;
Program Description
Create the CLASSES data set. CLASSES contains the combinations of class
variable values that PROC TABULATE uses to create the table.
data classes;
input region division type;
datalines;
1 1 1
1 1 2
4 4 1
4 4 2
;
Specify the table options. CLASSDATA= and EXCLUSIVE restrict the class level
combinations to those that are specified in the CLASSES data set.
proc tabulate data=energy format=dollar12.
classdata=classes exclusive;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of Region, Division, and Type.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each
formatted value of Region. Nested within each row are rows for each formatted
value of Division. The TABLE statement also creates a column for each formatted
value of Type. Each cell that is created by these rows and columns contains the
sum of the analysis variable Expenditures for all observations that contribute to
that cell.
table region*division,
type*expenditures
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
Output 70.15 Energy Expenditures for Each Region
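Example 3: Using Preloaded Formats with Class Variables
Details
This example does the following:
n uses the PRELOADFMT option with the PRINTMISS option to display all possible
combinations of the preloaded user-defined format values for the class variables
n uses the PRELOADFMT option with the EXCLUSIVE option to display only those
combinations of preloaded format values that appear in the input data set
n uses the OUT= option to write the table data to an output data set
Program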
proc tabulate data=energy format=dollar12.;
class region division type / preloadfmt;
var expenditures;
table region*division,
type*expenditures / rts=25 printmiss;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region';
title2 '(millions of dollars)';
run;
proc tabulate data=energy format=dollar12. out=tabdata;
class region division type / preloadfmt exclusive;
var expenditures;
table region*division,
type*expenditures / rts=25;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region';
title2 '(millions of dollars)';
run;
proc print data=tabdata;
run;
Program Description
Specify the table options. The FORMAT= option specifies DOLLAR12. as the
default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of Region, Division, and Type. PRELOADFMT specifies that PROC
TABULATE use the preloaded values of the user-defined formats for the class
variables.
class region division type / preloadfmt;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns, and specify row and column options.
PRINTMISS specifies that all possible combinations of user-defined formats be
used as the levels of the class variables.
table region*division,
type*expenditures / rts=25 printmiss;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the table options and the output data set. The OUT= option specifies the
name of the output data set to which PROC TABULATE writes the data.
proc tabulate data=energy format=dollar12. out=tabdata;
Specify subgroups for the analysis. The EXCLUSIVE option, when used with
PRELOADFMT, uses only the preloaded range of user-defined formats as the levels
of class variables.
class region division type / preloadfmt exclusive;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns, and specify row and column options. The
PRINTMISS option is not specified in this case. If it were, then it would override the
EXCLUSIVE option in the CLASS statement.
table region*division,
type*expenditures / rts=25;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
This output, created with the PRELOADFMT and PRINTMISS options, contains all
possible combinations of preloaded user-defined formats for the class variable
values. It includes combinations with zero frequencies, and combinations that make
no sense, such as Northeast and Pacific.
This output, created with the PRELOADFMT and EXCLUSIVE options, contains
only those combinations of preloaded user-defined formats for the class variable
values that appear in the input data set. This output is identical to the output from
“Example 1: Creating a Basic Two-Dimensional Table” on page 2550.
This output shows the output data set TABDATA, which was created by the OUT=
option in the PROC TABULATE statement. TABDATA contains the data that is
created by having the PRELOADFMT and EXCLUSIVE options specified.
Example 4: Using Multilabel Formats
Details
This example does the following:
n shows how to specify a multilabel format in the VALUE statement of PROC
FORMAT
n shows how to activate multilabel format processing using the MLF option with
the CLASS statement
n demonstrates the behavior of the N statistic when multilabel format processing
is activated
Program
data carsurvey;
input Rater Age Progressa Remark Jupiter Dynamo;
datalines;
1 38 94 98 84 80
2 49 96 84 80 77
3 16 64 78 76 73
4 27 89 73 90 92
77 61 92 88 77 85
78 24 87 88 88 91
79 18 54 50 62 74
80 62 90 91 90 86
;
proc format;
value agefmt (multilabel notsorted)
15 - 29 = 'Below 30 years'
30 - 50 = 'Between 30 and 50'
Program Description
Create the CARSURVEY data set. CARSURVEY contains data from a survey that
was distributed by a car manufacturer to a focus group of potential customers who
were brought together to evaluate new car names. Each observation in the data set
contains an identification number, the participant's age, and the participant's ratings
of four car names. A DATA step creates the data set.
data carsurvey;
input Rater Age Progressa Remark Jupiter Dynamo;
datalines;
1 38 94 98 84 80
2 49 96 84 80 77
3 16 64 78 76 73
4 27 89 73 90 92
77 61 92 88 77 85
78 24 87 88 88 91
79 18 54 50 62 74
80 62 90 91 90 86
;
Create the AGEFMT. format. The FORMAT procedure creates a multilabel format
for ages by using the MULTILABEL option (see “MULTILABEL” on page 1124). A
multilabel format is one in which multiple labels can be assigned to the same value,
in this case because of overlapping ranges. Each value is represented in the table for
each range in which it occurs. The NOTSORTED option stores the ranges in the order
in which they are defined.
proc format;
value agefmt (multilabel notsorted)
15 - 29 = 'Below 30 years'
30 - 50 = 'Between 30 and 50'
51 - high = 'Over 50 years'
15 - 19 = '15 to 19'
20 - 25 = '20 to 25'
25 - 39 = '25 to 39'
40 - 55 = '40 to 55'
56 - high = '56 and above';
run;
Specify the table options. FORMAT=10. specifies a default format with a width of
10 and no decimal places for the value in each table cell.
proc tabulate data=carsurvey format=10.;
Specify subgroups for the analysis. The CLASS statement identifies Age as the
class variable and uses the MLF option to activate multilabel format processing.
class age / mlf;
Specify the analysis variables. The VAR statement specifies that PROC
TABULATE calculate statistics on the Progressa, Remark, Jupiter, and Dynamo
variables.
var progressa remark jupiter dynamo;
Define the table rows and columns. The row dimension of the TABLE statement
creates a row for each formatted value of Age. Multilabel formatting allows an
observation to be included in multiple rows or age categories. The row dimension
uses the ALL class variable to summarize information for all rows. The column
dimension uses the N statistic to calculate the number of observations for each age
group. Notice that the result of the N statistic crossed with the ALL class variable
in the row dimension is the total number of observations instead of the sum of the
N statistics for the rows. The column dimension uses the ALL class variable at the
beginning of a crossing to assign a label, Potential Car Names. The four nested
columns calculate the mean ratings of the car names for each age group.
table age all, n all='Potential Car Names'*(progressa remark
jupiter dynamo)*mean;
Format the output. The FORMAT statement assigns the user-defined format
AGEFMT. to Age for this analysis.
format age agefmt.;
run;
Output
Output 70.19 Rating Four Potential Car Names
Example 5: Customizing Row and Column Headings
Details
This example shows how to customize row and column headings. A label specifies
text for a heading. A blank label creates a blank heading. PROC TABULATE
removes the space for blank column headings from the table.
Program
proc tabulate data=energy format=dollar12.;
class region division type;
var expenditures;
table region*division,
type='Customer Base'*expenditures=' '*sum=' '
/ rts=25;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region';
title2 '(millions of dollars)';
run;
Program Description
Specify the table options. The FORMAT= option specifies DOLLAR12. as the
default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region,
Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each
formatted value of Region. Nested within each row are rows for each formatted
value of Division. The TABLE statement also creates a column for each formatted
value of Type. Each cell that is created by these rows and columns contains the
sum of the analysis variable Expenditures for all observations that contribute to
that cell. Text in quotation marks specifies headings for the corresponding variable
or statistic. Although Sum is the default statistic, it is specified here so that you can
specify a blank for its heading.
table region*division,
type='Customer Base'*expenditures=' '*sum=' '
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to Region, Division,
and Type.
format region regfmt. division divfmt. type usetype.;
Output
The heading for Type contains text that is specified in the TABLE statement. The
TABLE statement eliminated the headings for Expenditures and Sum.
Details
This example shows how to use the universal class variable ALL to summarize
information from multiple categories.
Program
proc tabulate data=energy format=comma12.;
class region division type;
var expenditures;
table region*(division all='Subtotal')
all='Total for All Regions'*f=dollar12.,
type='Customer Base'*expenditures=' '*sum=' '
all='All Customers'*expenditures=' '*sum=' '
/ rts=25;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region';
title2 '(millions of dollars)';
run;
Program Description
Specify the table options. The FORMAT= option specifies COMMA12. as the
default format for the value in each table cell.
proc tabulate data=energy format=comma12.;
Specify subgroups for the analysis. The CLASS statement identifies Region,
Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
table region*(division all='Subtotal')
all='Total for All Regions'*f=dollar12.,
type='Customer Base'*expenditures=' '*sum=' '
all='All Customers'*expenditures=' '*sum=' '
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
The universal class variable ALL provides subtotals and totals in this table.
Details
This example shows how to eliminate blank row headings from a table. To do so,
you must both provide blank labels for the row headings and specify ROW=FLOAT
in the TABLE statement.
Program
proc tabulate data=energy format=dollar12.;
class region division type;
var expenditures;
table region*division*expenditures=' '*sum=' ',
type='Customer Base'
/ rts=25 row=float;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region';
title2 '(millions of dollars)';
run;
Program Description
Specify the table options. The FORMAT= option specifies DOLLAR12. as the
default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region,
Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows. The row dimension of the TABLE statement creates a row
for each formatted value of Region. Nested within these rows is a row for each
formatted value of Division. The analysis variable Expenditures and the Sum
statistic are also included in the row dimension, so PROC TABULATE creates row
headings for them as well. The text in quotation marks specifies the headings for
the corresponding variable or statistic. Although Sum is the default statistic, it is
specified here so that you can specify a blank for its heading.
table region*division*expenditures=' '*sum=' ',
Define the table columns. The column dimension of the TABLE statement creates
a column for each formatted value of Type.
type='Customer Base'
Specify the row title space and eliminate blank row headings. RTS= provides 25
characters per line for row headings. ROW=FLOAT eliminates blank row headings.
/ rts=25 row=float;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
Compare this table with the output in “Example 5: Customizing Row and Column
Headings” on page 2564. The two tables are identical, but the program that creates
this table uses Expenditures and Sum in the row dimension. PROC TABULATE
automatically eliminates blank headings from the column dimension, whereas you
must specify ROW=FLOAT to eliminate blank headings from the row dimension.
Example 8: Indenting Row Headings and Eliminating Horizontal Separators 2571
Details
This example shows how to condense the structure of a table by doing the
following:
- removing row headings for class variables
- indenting nested rows underneath parent rows instead of placing them next to
each other
- eliminating horizontal separator lines from the row titles and the body of the
table
Program
options nodate nonumber;
ods listing;
proc tabulate data=energy format=dollar12. noseps;
class region division type;
var expenditures;
table region*division,
type='Customer Base'*expenditures=' '*sum=' '
/ rts=25 indent=4;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region';
title2 '(millions of dollars)';
run;
ods listing close;
Program Description
Open the LISTING destination. The INDENT= option does not indent nested row
headings in HTML output, so this example writes the output to the LISTING
destination. The OPTIONS statement turns off the date and page numbers.
options nodate nonumber;
ods listing;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the
default format for the value in each table cell. NOSEPS eliminates horizontal
separator lines from row titles and from the body of the table.
proc tabulate data=energy format=dollar12. noseps;
Specify subgroups for the analysis. The CLASS statement identifies Region,
Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each
formatted value of Region. Nested within each row are rows for each formatted
value of Division. The TABLE statement also creates a column for each formatted
value of Type. Each cell that is created by these rows and columns contains the
sum of the analysis variable Expenditures for all observations that contribute to
that cell. Text in quotation marks in all dimensions specifies headings for the
corresponding variable or statistic. Although Sum is the default statistic, it is
specified here so that you can specify a blank for its heading.
table region*division,
type='Customer Base'*expenditures=' '*sum=' '
Specify the row title space and indention value. RTS= provides 25 characters per
line for row headings. INDENT= removes row headings for class variables, places
values for Division beneath values for Region rather than beside them, and indents
values for Division by four spaces.
/ rts=25 indent=4;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
NOSEPS removes the separator lines from the row titles and the body of the table.
INDENT= eliminates the row headings for Region and Division and indents values
for Division underneath values for Region.
Example 9: Creating Multipage Tables
Details
This example creates a separate table for each region and one table for all regions.
By default, PROC TABULATE creates each table on a separate page, but the
CONDENSE option places them all on the same page.
Program
proc tabulate data=energy format=dollar12.;
class region division type;
var expenditures;
table region='Region: ' all='All Regions',
division all='All Divisions',
type='Customer Base'*expenditures=' '*sum=' '
/ rts=25 box=_page_ condense indent=1;
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures for Each Region and All Regions';
title2 '(millions of dollars)';
run;
Program Description
Specify the table options. The FORMAT= option specifies DOLLAR12. as the
default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region,
Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Expenditures variable.
var expenditures;
Define the table pages. The page dimension of the TABLE statement creates one
table for each formatted value of Region and one table for all regions. Text in
quotation marks provides the heading for each page.
table region='Region: ' all='All Regions',
Define the table rows. The row dimension creates a row for each formatted value
of Division and a row for all divisions. Text in quotation marks provides the row
headings.
division all='All Divisions',
Define the table columns. The column dimension of the TABLE statement creates
a column for each formatted value of Type. Each cell that is created by these pages,
rows, and columns contains the sum of the analysis variable Expenditures for all
observations that contribute to that cell. Text in quotation marks specifies
headings for the corresponding variable or statistic. Although Sum is the default
statistic, it is specified here so that you can specify a blank for its heading.
type='Customer Base'*expenditures=' '*sum=' '
Specify additional table options. RTS= provides 25 characters per line for row
headings. BOX= places the page heading inside the box above the row headings.
CONDENSE places as many tables as possible on one physical page. INDENT=
eliminates the row heading for Division. (Because there is no nesting in the row
dimension, there is nothing to indent.)
/ rts=25 box=_page_ condense indent=1;
Format the output. The FORMAT statement assigns formats to the variables
Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
Output 70.24 Energy Expenditures for Each Region and All Regions
Example 10: Reporting on Multiple-Response Survey Data
Features:
DATA step
FORMAT procedure
FOOTNOTE statement
OPTIONS statement options: FORMDLIM=, NONUMBER
SYMPUT routine
TITLE statement
Data set: CUSTOMER_RESPONSE
Details
The two tables in this example show the following:
- which factors most influenced customers' decisions to buy products
- where customers heard of the company
The reports appear on one physical page with only one page number. By default,
they would appear on separate pages.
In addition to showing how to create these tables, this example shows how to do
the following:
- use a DATA step to count the number of observations in a data set
The following figure shows the survey form that is used to collect data.
Program
data customer_response;
input Customer Factor1-Factor4 Source1-Source3
Quality1-Quality3;
datalines;
1 . . 1 1 1 1 . 1 . .
2 1 1 . 1 1 1 . 1 1 .
3 . . 1 1 1 1 . . . .
119 . . . 1 . . . 1 . .
120 1 1 . 1 . . . . 1 .
;
data _null_;
if 0 then set customer_response nobs=count;
call symput('num',left(put(count,4.)));
stop;
run;
proc format;
picture pctfmt low-high='009.9 %';
run;
proc tabulate data=customer_response;
var factor1-factor4 customer;
table factor1='Cost'
factor2='Performance'
factor3='Reliability'
factor4='Sales Staff',
(n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
title 'Customer Survey Results: Spring 1996';
title3 'Factors Influencing the Decision to Buy';
run;
proc tabulate
data=customer_response;
var source1-source3 customer;
table source1='TV/Radio'
source2='Newspaper'
source3='Word of Mouth',
(n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
title 'Source of Company Name';
footnote "Number of Respondents: &num";
run;
options formdlim='' number;
Program Description
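Create the CUSTOMER_RESPONSE data set. Each observation contains one customer's
survey responses: the customer number followed by Factor1-Factor4,
Source1-Source3, and Quality1-Quality3. A value of 1 means that the customer
checked that item; a missing value means that the customer did not.
data customer_response;
input Customer Factor1-Factor4 Source1-Source3
Quality1-Quality3;
datalines;
1 . . 1 1 1 1 . 1 . .
2 1 1 . 1 1 1 . 1 1 .
3 . . 1 1 1 1 . . . .
119 . . . 1 . . . 1 . .
120 1 1 . 1 . . . . 1 .
;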
Store the number of observations in a macro variable. The SET statement reads
the descriptor portion of CUSTOMER_RESPONSE at compile time and stores the
number of observations (the number of respondents) in COUNT. The SYMPUT
routine stores the value of COUNT in the macro variable NUM. This variable is
available for use by other procedures and DATA steps for the remainder of the SAS
session. The IF 0 condition, which is always false, ensures that the SET statement,
which reads the observations, never executes. (Reading observations is
unnecessary.) The STOP statement ensures that the DATA step executes only
once.
data _null_;
if 0 then set customer_response nobs=count;
call symput('num',left(put(count,4.)));
stop;
run;
Create the PCTFMT. format. The FORMAT procedure creates a format for
percentages. The PCTFMT. format writes all values with at least one digit to the
left of the decimal point and with one digit to the right of the decimal point. A blank
and a percent sign follow the digits.
proc format;
picture pctfmt low-high='009.9 %';
run;
Specify the analysis variables. The VAR statement specifies that PROC
TABULATE calculate statistics on the Factor1, Factor2, Factor3, Factor4, and
Customer variables. The variable Customer must be listed because it is used to
calculate the Percent column that is defined in the TABLE statement.
var factor1-factor4 customer;
Define the table rows and columns. The TABLE statement creates a row for each
factor, a column for frequency counts, and a column for the percentages. Text in
quotation marks supplies headings for the corresponding row or column. The
format modifiers F=7. and F=PCTFMT9. provide formats for values in the
associated cells and extend the column widths to accommodate the column
headings.
table factor1='Cost'
factor2='Performance'
factor3='Reliability'
factor4='Sales Staff',
(n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
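In each row, the N statistic counts the nonmissing values of that factor variable, which is the number of customers who checked that factor. PCTN<customer> divides that count by the number of nonmissing Customer values, which is the number of respondents. A sketch of the calculation for factor i (the subscript notation is introduced here for illustration only):

```latex
\text{Percent}_i = 100 \times \frac{N(\text{Factor}_i)}{N(\text{Customer})},
\qquad i = 1, \dots, 4
```

Because a customer can check more than one factor (customer 1, for example, checked both Reliability and Sales Staff), the values in the Percent column need not sum to 100.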
Specify the analysis variables. The VAR statement specifies that PROC
TABULATE calculate statistics on the Source1, Source2, Source3, and Customer
variables. The variable Customer must be in the variable list because it appears in
the denominator definition.
var source1-source3 customer;
Define the table rows and columns. The TABLE statement creates a row for each
source of the company name, a column for frequency counts, and a column for the
percentages. Text in quotation marks supplies a heading for the corresponding row
or column.
table source1='TV/Radio'
source2='Newspaper'
source3='Word of Mouth',
(n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
Specify the title and footnote. The macro variable NUM resolves to the number of
respondents. The FOOTNOTE statement uses double rather than single quotation
marks so that the macro variable will resolve.
title 'Source of Company Name';
footnote "Number of Respondents: &num";
run;
Reset the SAS system options. The FORMDLIM= option resets the page delimiter
to a page eject. The NUMBER option resumes the display of page numbers on
subsequent pages.
options formdlim='' number;
Output
Output 70.25 Customer Survey Results: Spring 1996
Example 11: Reporting on Multiple-Choice Survey Data
Details
This report of listener preferences shows how many listeners select each type of
programming during each of seven time periods on a typical weekday. The data was
collected by a survey, and the results were stored in a SAS data set. Although this
data set contains all the information needed for this report, the information is not
arranged in a way that PROC TABULATE can use.
To make this crosstabulation of time of day and choice of radio programming, you
must have a data set that contains a variable for time of day and a variable for
programming preference. PROC TRANSPOSE reshapes the data into a new data set
that contains these variables. Once the data are in the appropriate form, PROC
TABULATE creates the report.
The following figure shows the survey form that is used to collect data.
An external file on page 2811 contains the raw data for the survey. Several lines
from that file appear here.
967 32 f 5 3 5
7 5 5 5 7 0 0 0 8 7 0 0 8 0
781 30 f 2 3 5
5 0 0 0 5 0 0 0 4 7 5 0 0 0
859 39 f 1 0 5
1 0 0 0 1 0 0 0 0 0 0 0 0 0
Program
data radio;
infile 'input-file' missover;
input /(Time1-Time7) ($1. +1);
listener=_n_;
run;
proc format;
value $timefmt 'Time1'='6-9 a.m.'
'Time2'='9 a.m. to noon'
'Time3'='noon to 1 p.m.'
'Time4'='1-4 p.m.'
'Time5'='4-6 p.m.'
'Time6'='6-10 p.m.'
'Time7'='10 p.m. to 2 a.m.'
other='*** Data Entry Error ***';
value $pgmfmt '0'="Don't Listen"
'1','2'='Rock and Top 40'
'3'='Country'
'4','5','6'='Jazz, Classical, and Easy Listening'
'7'='News/ Information /Talk'
'8'='Other'
other='*** Data Entry Error ***';
run;
proc transpose data=radio
out=radio_transposed(rename=(col1=Choice))
name=Timespan;
by listener;
var time1-time7;
run;
proc tabulate data=radio_transposed format=12.;
format timespan $timefmt. choice $pgmfmt.;
class timespan choice;
table timespan='Time of Day',
choice='Choice of Radio Program'*n='Number of Listeners';
run;
Program Description
Create the RADIO data set and specify the input file. RADIO contains data from a
survey of 336 listeners. The data set contains information about listeners and their
preferences in radio programming. The INFILE statement specifies the external file
that contains the data. MISSOVER prevents the input pointer from going to the
next record if it fails to find values in the current line for all variables that are listed
in the INPUT statement.
data radio;
infile 'input-file' missover;
input /(Time1-Time7) ($1. +1);
listener=_n_;
run;
Create the $TIMEFMT. and $PGMFMT. formats. PROC FORMAT creates formats
for the time of day and the choice of programming.
proc format;
value $timefmt 'Time1'='6-9 a.m.'
'Time2'='9 a.m. to noon'
'Time3'='noon to 1 p.m.'
'Time4'='1-4 p.m.'
'Time5'='4-6 p.m.'
'Time6'='6-10 p.m.'
'Time7'='10 p.m. to 2 a.m.'
other='*** Data Entry Error ***';
value $pgmfmt '0'="Don't Listen"
'1','2'='Rock and Top 40'
'3'='Country'
'4','5','6'='Jazz, Classical, and Easy Listening'
'7'='News/ Information /Talk'
'8'='Other'
other='*** Data Entry Error ***';
run;
Reshape the data by transposing the RADIO data set. PROC TRANSPOSE creates
RADIO_TRANSPOSED. This data set contains the variable Listener from the
original data set. It also contains two transposed variables: Timespan and Choice.
Timespan contains the names of the variables (Time1-Time7) from the input data
set that are transposed to form observations in the output data set. Choice
contains the values of these variables. (See “Details” on page 2587 for a complete
explanation of the PROC TRANSPOSE step.)
proc transpose data=radio
out=radio_transposed(rename=(col1=Choice))
name=Timespan;
by listener;
var time1-time7;
run;
Create the report and specify the table options. The FORMAT= option specifies
the default format for the values in each table cell.
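proc tabulate data=radio_transposed format=12.;
Format the output. The FORMAT statement assigns the $TIMEFMT. and $PGMFMT.
formats to the variables Timespan and Choice.
format timespan $timefmt. choice $pgmfmt.;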
Specify subgroups for the analysis. The CLASS statement identifies Timespan and
Choice as class variables.
class timespan choice;
Define the table rows and columns. The TABLE statement creates a row for each
formatted value of Timespan and a column for each formatted value of Choice. In
each column are values for the N statistic. Text in quotation marks supplies
headings for the corresponding rows or columns.
table timespan='Time of Day',
choice='Choice of Radio Program'*n='Number of Listeners';
Output
Output 70.26 Listening Preferences on Weekdays
Details
PROC TRANSPOSE restructures data so that values that were stored in one
observation are written to one variable. You can specify which variables you want
to transpose.
When you transpose with BY processing, as this example does, you create from
each BY group one observation for each variable that you transpose. In this
example, Listener is the BY variable. Each observation in the input data set is a BY
group because the value of Listener is unique for each observation.
This example transposes seven variables, Time1 through Time7. Therefore, the
output data set has seven observations from each BY group (each observation) in
the input data set.
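Applied to the whole RADIO data set, this rule fixes the size of RADIO_TRANSPOSED. A worked check, assuming (as the program description states) that RADIO has one observation for each of the 336 surveyed listeners:

```latex
\underbrace{336}_{\text{BY groups (listeners)}} \times \underbrace{7}_{\text{Time1--Time7}}
= 2352 \ \text{observations in RADIO\_TRANSPOSED}
```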
[Figure: PROC TRANSPOSE turns each observation of RADIO (the values of Time1-Time7
for one listener) into seven observations of RADIO_TRANSPOSED (Listener, Timespan,
Choice). For example, listener 2's values 5 0 0 0 5 0 0 become the observations
Time1 5, Time2 0, Time3 0, Time4 0, Time5 5, Time6 0, and Time7 0.]
Example 12: Calculating Various Percentage Statistics 2589
Details
This example shows how to use three percentage sum statistics: COLPCTSUM,
REPPCTSUM, and ROWPCTSUM.
Program
data fundrais;
length name $ 8 classrm $ 1;
input @1 team $ @8 classrm $ @10 name $
@19 pencils @23 tablets;
sales=pencils + tablets;
datalines;
BLUE   A ANN      4   8
RED    A MARY     5   10
GREEN  A JOHN     6   4
RED    A BOB      2   3
BLUE   B FRED     6   8
GREEN  B LOUISE   12  2
BLUE   B ANNETTE  .   9
RED    B HENRY    8   10
GREEN  A ANDREW   3   5
RED    A SAMUEL   12  10
BLUE   A LINDA    7   12
GREEN  A SARA     4   .
BLUE   B MARTIN   9   13
RED    B MATTHEW  7   6
GREEN  B BETH     15  10
RED    B LAURA    4   3
;
proc format;
picture pctfmt low-high='009 %';
run;
title "Fundraiser Sales";
proc tabulate format=7.;
class team classrm;
var sales;
table (team all),
classrm='Classroom'*sales=' '*(sum
colpctsum*f=pctfmt9.
rowpctsum*f=pctfmt9.
reppctsum*f=pctfmt9.)
all*sales*sum=' '
/rts=20;
run;
Program Description
Create the FUNDRAIS data set. FUNDRAIS contains data on student sales during a
school fund-raiser. A DATA step creates the data set.
data fundrais;
length name $ 8 classrm $ 1;
input @1 team $ @8 classrm $ @10 name $
@19 pencils @23 tablets;
sales=pencils + tablets;
datalines;
BLUE   A ANN      4   8
RED    A MARY     5   10
GREEN  A JOHN     6   4
RED    A BOB      2   3
BLUE   B FRED     6   8
GREEN  B LOUISE   12  2
BLUE   B ANNETTE  .   9
RED    B HENRY    8   10
GREEN  A ANDREW   3   5
RED    A SAMUEL   12  10
BLUE   A LINDA    7   12
GREEN  A SARA     4   .
BLUE   B MARTIN   9   13
RED    B MATTHEW  7   6
GREEN  B BETH     15  10
RED    B LAURA    4   3
;
Create the PCTFMT. format. The FORMAT procedure creates a format for
percentages. The PCTFMT. format writes all values with at least one digit, a blank,
and a percent sign.
proc format;
picture pctfmt low-high='009 %';
run;
Create the report and specify the table options. The FORMAT= option specifies the
7. format (a field seven characters wide, with no decimal places) as the default
format for the value in each table cell.
proc tabulate format=7.;
Specify subgroups for the analysis. The CLASS statement identifies Team and
Classrm as class variables.
class team classrm;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Sales variable.
var sales;
Define the table rows. The row dimension of the TABLE statement creates a row
for each formatted value of Team. The last row of the report summarizes sales for
all teams.
table (team all),
Define the table columns. The column dimension of the TABLE statement creates
a column for each formatted value of Classrm. Crossed with each value of
Classrm is the analysis variable Sales, which has a blank label. Nested within each
column are columns that summarize sales for the classroom. The first nested column,
labeled Sum, is the sum of sales for the row for the classroom. The second nested
column, labeled ColPctSum, is the percentage of the sum of sales for the row for the
classroom in relation to the sum of sales for all teams in the classroom. The third
nested column, labeled RowPctSum, is the percentage of the sum of sales for the row
for the classroom in relation to the sum of sales for the row for all classrooms. The
fourth nested column, labeled RepPctSum, is the percentage of the sum of sales for
the row for the classroom in relation to the sum of sales for all teams for all
classrooms. The last column of the report summarizes sales for the row for all
classrooms.
classrm='Classroom'*sales=' '*(sum
colpctsum*f=pctfmt9.
rowpctsum*f=pctfmt9.
reppctsum*f=pctfmt9.)
all*sales*sum=' '
Specify the row title space. RTS= provides 20
characters per line for row headings.
/rts=20;
run;
Output
Output 70.27 Fundraiser Sales
Details
Here are the percentage sum statistic calculations used to produce the output for
the Blue Team in Classroom A:
- COLPCTSUM=31/91*100=34%
- ROWPCTSUM=31/67*100=46%
- REPPCTSUM=31/204*100=15%
Similar calculations were used to produce the output for the remaining teams and
classrooms.
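The denominators 91, 67, and 204 can be traced back to the FUNDRAIS data lines. Because Sales is computed as Pencils + Tablets, it is missing for ANNETTE and SARA, and those two students do not contribute to any sum. A worked sketch of the arithmetic, where S denotes a sum of Sales and a dot stands for all teams or all classrooms (this notation is introduced here for illustration only):

```latex
\begin{aligned}
S_{\text{Blue},A} &= 12 + 19 = 31\\
S_{\cdot,A} &= 31 + 42 + 18 = 91\\
S_{\cdot,B} &= 36 + 38 + 39 = 113\\
S_{\text{Blue},\cdot} &= 31 + 36 = 67\\
S_{\cdot,\cdot} &= 91 + 113 = 204\\[4pt]
\text{COLPCTSUM} &= 100 \times \tfrac{31}{91} \approx 34.07\\
\text{ROWPCTSUM} &= 100 \times \tfrac{31}{67} \approx 46.27\\
\text{REPPCTSUM} &= 100 \times \tfrac{31}{204} \approx 15.20
\end{aligned}
```

The PCTFMT9. format then reports these three values as whole-number percentages: 34, 46, and 15.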
Example 13: Using Denominator Definitions to Display Basic Frequency Counts and
Percentages
Details
Crosstabulation tables (also called contingency tables or stub-and-banner reports)
show combined frequency distributions for two or more variables. This table shows
frequency counts for females and males within each of four job classes. The table
also shows each frequency count as a percentage of the following totals:
- the total number of women and men in that job class (row percentage)
- the total number of employees of that gender in all job classes (column percentage)
- the total number of employees in all job classes (percentage of total)
Program
data jobclass;
input Gender Occupation @@;
datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 3 1 3 1 3 1 3 1 3 1 3 1 3
1 1 1 1 1 1 1 2 1 2 1 2 1 2
1 2 1 2 1 3 1 3 1 4 1 4 1 4
1 4 1 4 1 4 1 1 1 1 1 1 1 1
1 1 1 2 1 2 1 2 1 2 1 2 1 2
1 2 1 3 1 3 1 3 1 3 1 4 1 4
1 4 1 4 1 4 1 1 1 3 2 1 2 1
2 1 2 1 2 1 2 1 2 1 2 2 2 2
2 2 2 2 2 2 2 3 2 3 2 3 2 4
2 4 2 4 2 4 2 4 2 4 2 1 2 3
2 3 2 3 2 3 2 3 2 4 2 4 2 4
2 4 2 4 2 1 2 1 2 1 2 1 2 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 2 3 2 4 2 4 2 4 2 1 2 1
2 1 2 1 2 1 2 2 2 2 2 2 2 3
2 3 2 3 2 3 2 4
;
proc format;
value gendfmt 1='Female'
2='Male'
other='*** Data Entry Error ***';
value occupfmt 1='Technical'
2='Manager/Supervisor'
3='Clerical'
4='Administrative'
other='*** Data Entry Error ***';
run;
proc tabulate data=jobclass format=8.2;
class gender occupation;
table (occupation='Job Class' all='All Jobs')
*(n='Number of employees'*f=9.
pctn<gender all>='Percent of row total'
pctn<occupation all>='Percent of column total'
pctn='Percent of total'),
gender='Gender' all='All Employees'/ rts=50;
format gender gendfmt. occupation occupfmt.;
title 'Gender Distribution';
title2 'within Job Classes';
run;
Program Description
Create the JOBCLASS data set. JOBCLASS contains encoded information about
the gender and job class of employees at a fictitious company.
data jobclass;
input Gender Occupation @@;
datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 3 1 3 1 3 1 3 1 3 1 3 1 3
1 1 1 1 1 1 1 2 1 2 1 2 1 2
1 2 1 2 1 3 1 3 1 4 1 4 1 4
1 4 1 4 1 4 1 1 1 1 1 1 1 1
1 1 1 2 1 2 1 2 1 2 1 2 1 2
1 2 1 3 1 3 1 3 1 3 1 4 1 4
1 4 1 4 1 4 1 1 1 3 2 1 2 1
2 1 2 1 2 1 2 1 2 1 2 2 2 2
2 2 2 2 2 2 2 3 2 3 2 3 2 4
2 4 2 4 2 4 2 4 2 4 2 1 2 3
2 3 2 3 2 3 2 3 2 4 2 4 2 4
2 4 2 4 2 1 2 1 2 1 2 1 2 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 2 3 2 4 2 4 2 4 2 1 2 1
2 1 2 1 2 1 2 2 2 2 2 2 2 3
2 3 2 3 2 3 2 4
;
Create the GENDFMT. and OCCUPFMT. formats. PROC FORMAT creates formats
for the variables Gender and Occupation.
proc format;
value gendfmt 1='Female'
2='Male'
other='*** Data Entry Error ***';
value occupfmt 1='Technical'
2='Manager/Supervisor'
3='Clerical'
4='Administrative'
other='*** Data Entry Error ***';
run;
Create the report and specify the table options. The FORMAT= option specifies
the 8.2 format as the default format for the value in each table cell.
proc tabulate data=jobclass format=8.2;
Specify subgroups for the analysis. The CLASS statement identifies Gender and
Occupation as class variables.
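class gender occupation;
Define the table rows. The row dimension of the TABLE statement creates a group of
rows for each formatted value of Occupation and a group for all jobs. Within each
group, the N statistic and three PCTN statistics each occupy one row. The
denominator definition in angle brackets after the first two PCTN statistics
determines which frequency counts PROC TABULATE sums for the denominator; the
third PCTN, which has no denominator definition, uses the total frequency count for
the table. Text in quotation marks supplies the row headings, and F=9. formats the
frequency counts.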
table (occupation='Job Class' all='All Jobs')
*(n='Number of employees'*f=9.
pctn<gender all>='Percent of row total'
pctn<occupation all>='Percent of column total'
pctn='Percent of total'),
Define the table columns and specify the amount of space for row headings. The
column dimension creates a column for each formatted value of Gender and for all
employees. Text in quotation marks supplies the heading for the corresponding
column. The RTS= option provides 50 characters per line for row headings.
gender='Gender' all='All Employees'/ rts=50;
Format the output. The FORMAT statement assigns formats to the variables
Gender and Occupation.
format gender gendfmt. occupation occupfmt.;
Output
Output 70.28 Gender Distribution within Job Classes
Details
Overview
The part of the TABLE statement that defines the rows of the table uses the PCTN
statistic to calculate three different percentages.
In all calculations of PCTN, the numerator is N, the frequency count for one cell of
the table. The denominator for each occurrence of PCTN is determined by the
denominator definition. The denominator definition appears in angle brackets after
the keyword PCTN. It is a list of one or more expressions. The list tells PROC
TABULATE which frequency counts to sum for the denominator.
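Written as a formula, each PCTN value is 100 times the cell's frequency count divided by the sum of the frequency counts in the set D of cells that the denominator definition selects; for PCTN with no denominator definition, D covers the whole table. (N and D are notation introduced here for illustration.) A worked sketch for the female, technical cell, using the counts from the JOBCLASS data (16 female and 18 male technical employees; 61 women and 62 men in all):

```latex
\begin{aligned}
\text{PCTN} &= 100 \times \frac{N_{\text{cell}}}{\sum_{c \in D} N_c}\\[6pt]
\text{row (female, technical)} &= 100 \times \tfrac{16}{16+18} = 100 \times \tfrac{16}{34} \approx 47.06\\
\text{column (female, technical)} &= 100 \times \tfrac{16}{16+20+14+11} = 100 \times \tfrac{16}{61} \approx 26.23\\
\text{total (female, technical)} &= 100 \times \tfrac{16}{61+62} = 100 \times \tfrac{16}{123} \approx 13.01
\end{aligned}
```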
The table contains four subtables, one for each combination of a row element
(Occupation or All) and a column element (Gender or All): Occupation and Gender,
Occupation and All, All and Gender, and All and All. The following figure
highlights these subtables and the frequency counts for each category.

                                                  Gender
                                              Female     Male  All Employees
Job Class
Technical           Number of employees           16       18             34
                    Percent of row total       47.06    52.94         100.00
                    Percent of column total    26.23    29.03          27.64
                    Percent of total           13.01    14.63          27.64
Manager/Supervisor  Number of employees           20       15             35
                    Percent of row total       57.14    42.86         100.00
                    Percent of column total    32.79    24.19          28.46
                    Percent of total           16.26    12.20          28.46
Clerical            Number of employees           14       14             28
                    Percent of row total       50.00    50.00         100.00
                    Percent of column total    22.95    22.58          22.76
                    Percent of total           11.38    11.38          22.76
Administrative      Number of employees           11       15             26
                    Percent of row total       42.31    57.69         100.00
                    Percent of column total    18.03    24.19          21.14
                    Percent of total            8.94    12.20          21.14
All Jobs            Number of employees           61       62            123
                    Percent of row total       49.59    50.41         100.00
                    Percent of column total   100.00   100.00         100.00
                    Percent of total           49.59    50.41         100.00

In the figure, the Job Class rows and the All Jobs rows cross the Gender columns and
the All Employees column to form the four subtables.
Each use of PCTN nests a row of statistics within each value of Occupation and All.
Each denominator definition tells PROC TABULATE which frequency counts to sum
for the denominators in that row. This section explains how PROC TABULATE
interprets these denominator definitions.
Row Percentages
The part of the TABLE statement that calculates the row percentages and that
labels the row is
pctn<gender all>='Percent of row total'
Consider how PROC TABULATE interprets this denominator definition for each
subtable.
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks whether Gender contributes to the subtable. Because Gender does
contribute to the subtable, PROC TABULATE uses it as the denominator definition.
This denominator definition tells PROC TABULATE to sum the frequency counts for
all occurrences of Gender within the same value of Occupation.
For example, the denominator for the category female, technical is the sum of all
frequency counts for all categories in this subtable for which the value of
Occupation is technical. There are two such categories: female, technical and
male, technical. The corresponding frequency counts are 16 and 18. Therefore,
the denominator for this category is 16+18, or 34.
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks whether Gender contributes to the subtable. Because Gender does
contribute to the subtable, PROC TABULATE uses it as the denominator definition.
This denominator definition tells PROC TABULATE to sum the frequency counts for
all occurrences of Gender in the subtable.
For example, the denominator for the category all, female is the sum of the
frequency counts for all, female and all, male. The corresponding frequency
counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62,
or 123.
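Both cells in this subtable share that denominator, so a worked sketch of their row percentages is:

```latex
\begin{aligned}
\text{all, female:}\quad & 100 \times \tfrac{61}{123} \approx 49.59\\
\text{all, male:}\quad & 100 \times \tfrac{62}{123} \approx 50.41
\end{aligned}
```

These are the values in the Percent of row total row for All Jobs.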
Output 70.31 Subtable 3: Occupation and All
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks whether Gender contributes to the subtable. Because Gender does not
contribute to the subtable, PROC TABULATE looks at the next element in the
denominator definition, which is All. The variable All does contribute to this
subtable, so PROC TABULATE uses it as the denominator definition. All is a
reserved class variable with only one category. Therefore, this denominator
definition tells PROC TABULATE to use the frequency count of All as the
denominator.
For example, the denominator for the category clerical, all is the frequency
count for that category, 28.
Note: In these table cells, because the numerator and the denominator are the
same, the row percentages in this subtable are all 100.
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks whether Gender contributes to the subtable. Because Gender does not
contribute to the subtable, PROC TABULATE looks at the next element in the
denominator definition, which is All. The variable All does contribute to this
subtable, so PROC TABULATE uses it as the denominator definition. All is a
reserved class variable with only one category. Therefore, this denominator
definition tells PROC TABULATE to use the frequency count of All as the
denominator.
There is only one category in this subtable: all, all. The denominator for this
category is 123.
Note: In this table cell, because the numerator and denominator are the same, the
row percentage in this subtable is 100.
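The left-to-right search through the denominator definition can be sketched as follows. This Python sketch models only the selection rule described above. The subtable names and the set-of-variable-names representation are assumptions made for illustration. The text does not describe the case where no element contributes, so the sketch simply returns None there.

```python
# Denominator definition from pctn<gender all>, in the order written.
ROW_DEFINITION = ["Gender", "All"]

# Class variables (including the reserved variable All) that contribute
# to each subtable in this example.
SUBTABLES = {
    "Occupation and Gender": {"Occupation", "Gender"},
    "All and Gender": {"All", "Gender"},
    "Occupation and All": {"Occupation", "All"},
    "All and All": {"All"},
}


def resolve_denominator(definition, contributing):
    """Return the first element of the definition that contributes to the subtable."""
    for element in definition:
        if element in contributing:
            return element
    return None  # Not described in the text; placeholder only.


for name, variables in SUBTABLES.items():
    print(f"{name}: {resolve_denominator(ROW_DEFINITION, variables)}")
```

When the result is All, the denominator is the cell's own frequency count (for example, 28 for clerical, all). That is why every row percentage in the last two subtables is 100.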
Column Percentages
The part of the TABLE statement that calculates the column percentages and
labels the row is:
pctn<occupation all>='Column percent'
Consider how PROC TABULATE interprets this denominator definition for each
subtable.
For example, the denominator for the category manager/supervisor, male is the
sum of all frequency counts for all categories in this subtable for which the value of
Gender is male. There are four such categories: technical, male; manager/
supervisor, male; clerical, male; and administrative, male. The
corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator
for this category is 18+15+14+15, or 62.
For example, the denominator for the category all, female is the frequency count
for that category, 61.
Note: In these table cells, because the numerator and denominator are the same,
the column percentages in this subtable are all 100.
For example, the denominator for the category technical, all is the sum of the
frequency counts for technical, all; manager/supervisor, all; clerical, all;
and administrative, all. The corresponding frequency counts are 34, 35, 28, and
26. Therefore, the denominator for this category is 34+35+28+26, or 123.
There is only one category in this subtable: all, all. The frequency count for this
category is 123.
Note: In this calculation, because the numerator and denominator are the same,
the column percentage in this subtable is 100.
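The column-percentage denominators above can be reproduced with a similar Python sketch (not SAS code). For subtable 1, it sums the counts over every Occupation level in a gender's column. For subtable 3, it sums the occupation, all counts. The data layout and function names are hypothetical.

```python
# Frequency counts from this example, keyed by (occupation, gender).
COUNTS = {
    ("technical", "female"): 16,
    ("technical", "male"): 18,
    ("manager/supervisor", "female"): 20,
    ("manager/supervisor", "male"): 15,
    ("clerical", "female"): 14,
    ("clerical", "male"): 14,
    ("administrative", "female"): 11,
    ("administrative", "male"): 15,
}
OCCUPATIONS = ["technical", "manager/supervisor", "clerical", "administrative"]


def column_denominator(gender):
    """Subtable 1: sum the counts for every Occupation level in this gender's column."""
    return sum(n for (_occ, g), n in COUNTS.items() if g == gender)


def occupation_total(occupation):
    """Count for the category <occupation>, all."""
    return COUNTS[(occupation, "female")] + COUNTS[(occupation, "male")]


def all_column_denominator():
    """Subtable 3: sum the <occupation>, all counts over every Occupation level."""
    return sum(occupation_total(occ) for occ in OCCUPATIONS)


def column_percent(occupation, gender):
    """Column percentage for one cell of subtable 1."""
    return 100 * COUNTS[(occupation, gender)] / column_denominator(gender)


print(column_denominator("male"))  # 18 + 15 + 14 + 15 = 62
print(all_column_denominator())    # 34 + 35 + 28 + 26 = 123
```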
Example 14: Specifying Style Overrides for ODS Output 2607
Total Percentages
The part of the TABLE statement that calculates the total percentages and labels
the row is:
pctn='Total percent'
If you do not specify a denominator definition, then PROC TABULATE obtains the
denominator for a cell by totaling all the frequency counts in the subtable. The
following table summarizes the process for all subtables in this example.
Class Variables Contributing to the Subtable   Frequency Counts                 Total
Occupation and Gender                          16, 18, 20, 15, 14, 14, 11, 15   123
All and Gender                                 61, 62                           123
Occupation and All                             34, 35, 28, 26                   123
All and All                                    123                              123
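Because every subtable in this example totals 123, each total percentage is the cell's frequency count divided by 123. A Python sketch of that arithmetic for the Occupation and Gender subtable follows. The data layout and names are hypothetical, and the sketch does not model how PROC TABULATE builds subtables.

```python
# Frequency counts from this example, keyed by (occupation, gender).
COUNTS = {
    ("technical", "female"): 16,
    ("technical", "male"): 18,
    ("manager/supervisor", "female"): 20,
    ("manager/supervisor", "male"): 15,
    ("clerical", "female"): 14,
    ("clerical", "male"): 14,
    ("administrative", "female"): 11,
    ("administrative", "male"): 15,
}

# No denominator definition: total every frequency count in the subtable.
TOTAL = sum(COUNTS.values())


def total_percent(occupation, gender):
    """Total percentage for one cell of the Occupation and Gender subtable."""
    return 100 * COUNTS[(occupation, gender)] / TOTAL


print(TOTAL)  # 123
print(round(total_percent("technical", "female"), 2))
```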
Details
This example creates HTML, RTF, and PDF files and specifies style overrides for
various table regions.
Program
options nodate pageno=1;
proc sort data=energy;
by region;
run;
ods html5 path='path' body='filename.htm';
ods pdf file='filename.pdf' contents=yes;
ods rtf file='filename.rtf' contents=yes;
proc tabulate data=energy style=[fontweight=bold];
by region;
class region division type / style=[textalign=center];
classlev region division type / style=[textalign=left];
var expenditures / style=[fontsize=3];
keyword all sum / style=[fontwidth=wide];
keylabel all="Total";
table (region all)*(division all*[style=[backgroundcolor=yellow]]),
(type all)*(expenditures*f=dollar10.) /
style=[bordercolor=blue]
misstext=[label="Missing" style=[fontweight=light]]
box=[label="Region by Division by Type"
style=[fontstyle=italic]];
format region regfmt. division divfmt. type usetype.;
title 'Energy Expenditures';
title2 '(millions of dollars)';
run;
ods _all_ close;
Program Description
Set the SAS system options. The NODATE option suppresses the display of the
date and time in the output. PAGENO= specifies the starting page number.
options nodate pageno=1;
Specify the ODS output filenames. By opening multiple ODS destinations, you can
produce multiple output files in a single execution. The ODS HTML5 statement
produces output that is written in HTML 5.0. The ODS PDF statement produces
output in Portable Document Format (PDF). The ODS RTF statement produces
output in Rich Text Format (RTF). The output from PROC TABULATE goes to each
of these files.
In the ODS PDF and ODS RTF statements, the CONTENTS= option creates a table
of contents.
ods html5 path='path' body='filename.htm';
ods pdf file='filename.pdf' contents=yes;
ods rtf file='filename.rtf' contents=yes;
Customize the data cells. The STYLE= option in the PROC TABULATE statement
specifies the style override for the data cells of the table.
proc tabulate data=energy style=[fontweight=bold];
Specify the BY-group. When BY statements are specified, labels for the BY group
tables are displayed in the table of contents. The labels are based on the values of
the BY variable.
by region;
Customize the class variable name headings. The STYLE= option in the CLASS
statement specifies the style override for the class variable name headings.
class region division type / style=[textalign=center];
Customize the class variable value headings. The STYLE= option in the CLASSLEV
statement specifies the style override for the class variable level value headings.
classlev region division type / style=[textalign=left];
Customize the analysis variable name headings. The STYLE= option in the VAR
statement specifies a style element for the variable name headings.
var expenditures / style=[fontsize=3];
Specify the style attributes for keywords, and label the “all” keyword. The
STYLE= option in the KEYWORD statement specifies a style element for keywords.
The KEYLABEL statement assigns a label to the keyword.
keyword all sum / style=[fontwidth=wide];
keylabel all="Total";
Define and customize the table rows and columns. The STYLE= option in the
dimension expression overrides any other STYLE= specifications in PROC
TABULATE that specify overrides for table cells. The STYLE= option after the slash
(/) specifies style overrides for parts of the table other than table cells.
table (region all)*(division all*[style=[backgroundcolor=yellow]]),
(type all)*(expenditures*f=dollar10.) /
style=[bordercolor=blue]
Customize missing values. The STYLE= option in the MISSTEXT option of the
TABLE statement specifies a style element to use for the text in table cells that
contain missing values.
misstext=[label="Missing" style=[fontweight=light]]
Customize the box above the row titles. The STYLE= option in the BOX option of
the TABLE statement specifies a style override for text in the box above the row
titles.
box=[label="Region by Division by Type"
style=[fontstyle=italic]];
Output
Output 70.37 HTML Output
Example 15: Style Precedence
Details
This example does the following:
• creates a category for each sales type, retail or wholesale, in each region
• applies an italic font style for each region and sales type
Program
proc format;
value $saletypefmt 'R'='Retail'
'W'='WholeSale';
run;
ods html file="stylePrecedence.html";
title "Style Precedence";
title2 "First Table: no precedence, Orange";
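title3 "Second Table: style_precedence=page, Yellow";
proc tabulate data=sales format=dollar10.;
class product region saletype;
classlev region saletype / style={font_style=italic};
var netsales;
label netsales="Net Sales";
keylabel all="Total";
table product *{style={background=#edf8b1}},
region*{style={background=yellow}},
saletype*{style={background=orange}};
table product *{style={background=#edf8b1}},
region*{style={background=yellow}},
saletype*{style={background=orange}} / style_precedence=page;
format saletype $saletypefmt.;
run;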
Program Description
Create the $SALETYPEFMT. format. PROC FORMAT creates a format for the values of
SaleType.
proc format;
value $saletypefmt 'R'='Retail'
'W'='WholeSale';
run;
Specify the ODS output filename. The ODS HTML statement produces output that
is written in HTML.
ods html file="stylePrecedence.html";
Specify the titles of the tables to be produced. Two tables are generated. The
first table uses no style precedence, whereas the second table shows that the
color that takes precedence is determined by the STYLE_PRECEDENCE= option.
title "Style Precedence";
title2 "First Table: no precedence, Orange";
title3 "Second Table: style_precedence=page, Yellow";
Specify the table options. The FORMAT= option specifies DOLLAR10. as the
default format for the value in each table cell.
proc tabulate data=sales format=dollar10.;
Specify subgroups for the analysis. The CLASS statement separates the analysis
by values of Product, Region, and SaleType.
class product region saletype;
Specify styles for the subgroups. The CLASSLEV statement specifies an italic font
style for the level value headings of Region and SaleType.
classlev region saletype / style={font_style=italic};
Specify the analysis variable. The VAR statement specifies that PROC TABULATE
calculate statistics on the Netsales variable.
var netsales;
Specify labels. The LABEL statement assigns the label Net Sales to the NetSales variable.
label netsales="Net Sales";
Specify a keyword label. The KEYLABEL statement assigns the label Total to the
universal class variable ALL.
keylabel all="Total";
Define the table rows and columns. The TABLE statement creates one table per
product per page. In this example, there is one product, A100. The TABLE
statement also creates a row for each formatted value of Region and creates a
column for each formatted value of SaleType. Each cell that is created by these
rows and columns contains the sum of the analysis variable Net Sales for all
observations that contribute to that cell. The STYLE= option in the dimension
expression overrides any other STYLE= specifications in PROC TABULATE that
specify attributes for the table cells. In this first table, the STYLE_PRECEDENCE=
option is not specified, so by default the style in the column expression takes
precedence. Therefore, the background color of the data cells is orange.
table product *{style={background=#edf8b1}},
region*{style={background=yellow}},
saletype*{style={background=orange}};
Define the table rows and columns using the STYLE_PRECEDENCE= option. As in the
first table, the TABLE statement creates one table per product per page (here,
only A100), a row for each formatted value of Region, and a column for each
formatted value of SaleType. Each cell contains the sum of the analysis variable
Net Sales for all observations that contribute to that cell. The STYLE= option in
the dimension expression overrides any other STYLE= specifications in PROC
TABULATE that specify attributes for the table cells. In this second table, the
STYLE_PRECEDENCE=PAGE option gives precedence to the style in the page
expression. Therefore, the background color of the data cells is #edf8b1, a pale
yellow.
table product *{style={background=#edf8b1}},
region*{style={background=yellow}},
saletype*{style={background=orange}} / style_precedence=page;
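The effect of the two TABLE statements on the data-cell background can be summarized with a small Python model. It encodes only what this example shows: without STYLE_PRECEDENCE=, the column expression's style wins, and with STYLE_PRECEDENCE=PAGE, the page expression's style wins. Treating the option value as a direct lookup key (and assuming that ROW behaves the same way) is a simplification for illustration. It is not a description of how ODS actually resolves styles.

```python
from typing import Optional

# Background colors from the page, row, and column expressions in this example.
STYLES = {"page": "#edf8b1", "row": "yellow", "column": "orange"}

# Behavior of the first table: the column expression wins by default.
DEFAULT_PRECEDENCE = "column"


def effective_background(styles, style_precedence: Optional[str] = None):
    """Return a data cell's background under this simplified precedence model."""
    dimension = (style_precedence or DEFAULT_PRECEDENCE).lower()
    return styles[dimension]


print(effective_background(STYLES))          # first table
print(effective_background(STYLES, "PAGE"))  # second table
```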
Format the output. The FORMAT statement assigns the $SALETYPEFMT. format to the
SaleType variable.
format saletype $saletypefmt.;
Output
Output 70.40 Table with No Style Precedence
Example 16: Using the NOCELLMERGE Option
Details
This example does the following:
• creates a table that shows how styles behave in merged cells
• shows how cell styles are affected when empty data cells and formatted
data cells use different styles
Program
ods html file="tabstyle.html";
proc tabulate data=sashelp.class style={background=#edf8b1};
class sex age;
Program Description
Specify the ODS output filename. The ODS HTML statement produces output that
is written in HTML.
ods html file="tabstyle.html";
Specify the PROC TABULATE options. The STYLE= option sets the background
color for the cells in the table to #edf8b1, a pale yellow.
proc tabulate data=sashelp.class style={background=#edf8b1};
Specify subgroups. The CLASS statement separates the data by sex and age.
class sex age;
Define the table rows and columns. The TABLE statement creates a table. The
STYLE= option in the dimension expression overrides the STYLE= setting from the
PROC TABULATE statement for table cell attributes.
table sex*{style={background=#7fcdbb}} all, age;
Specify the title of the table to be produced. This table shows how changing the
style color affects the merged cells.
title 'Data Cell Styles in Merged Cells';
Specify the PROC TABULATE options. The STYLE= option sets the background
color for the cells in the table to #edf8b1, a pale yellow.
proc tabulate data=sashelp.class style={background=#edf8b1};
Specify subgroups. The CLASS statement separates the data by sex and age.
class sex age;
Define the table rows and columns. The TABLE statement creates a table. The
STYLE= option in the dimension expression overrides the STYLE= setting from the
PROC TABULATE statement, but only for the formatted data cells.
table sex*{style={background=#7fcdbb}} all, age/nocellmerge;
Specify the title of the table to be produced. This table shows how changing the
style color affects the formatted cells that are not merged.
title1 'Data Cell Styles with NOCELLMERGE Option';
Output
Output 70.42 Data Cell Styles in Merged Cells
Chapter 71
TIMEPLOT Procedure
Although PROC TIMEPLOT output is similar to the output produced by the PLOT and
PRINT procedures, it has these distinctive features:
• The vertical axis always represents the sequence of observations in the data
set. Thus, if the observations are in order of date or time, then the vertical axis
represents the passage of time.
• The horizontal axis represents the values of the variable that you are examining.
Like PROC PLOT, PROC TIMEPLOT can overlay multiple plots on one set of
axes so that each line of the plot can contain values for more than one variable.
• A plot produced by PROC TIMEPLOT can occupy more than one page.
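To make the axis layout concrete, here is a rough Python sketch of a character plot in the same spirit. It writes one output line per observation, in data set order, and places each value on a shared horizontal scale. The linear scaling, the plot width, and the rule that a later symbol overwrites an earlier one at the same position are assumptions of this sketch, not PROC TIMEPLOT's actual algorithm. The sales values are hypothetical.

```python
def text_timeplot(observations, width=40):
    """Return one plot line per observation.

    observations: list of dicts that map a one-character plotting symbol to a
    numeric value. All symbols share one horizontal axis (an overlay).
    """
    values = [v for obs in observations for v in obs.values()]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    lines = []
    for obs in observations:  # vertical axis: observation order
        cells = [" "] * width
        for symbol, value in obs.items():
            position = round((value - lo) / span * (width - 1))
            cells[position] = symbol  # a later symbol overwrites an earlier one
        lines.append("|" + "".join(cells) + "|")
    return lines


# Hypothetical weekly sales for two sales representatives, R and S.
for line in text_timeplot([{"R": 2, "S": 5}, {"R": 4, "S": 1}], width=5):
    print(line)
```

With these inputs the sketch prints `| R  S|` and then `|S  R |`: each line is one observation, and both variables share the same horizontal axis.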
The following output illustrates a simple report that you can produce with PROC
TIMEPLOT. This report shows sales of refrigerators for two sales representatives
during the first six weeks of the year. The statements that produce the output
follow. A DATA step in “Example 1: Plotting a Single Variable” on page 2637 creates
the data set Sales.
title 'The SAS System';
options source;
The following output is a more complicated report of the same data set that is used
to create Output 71.389 on page 2623. The statements that create this report do
the following:
• create one plot for the sale of refrigerators and one for the sale of stoves
• identify points on the plots by the first letter of the sales representative's last
name
• control the size of the horizontal axis
For an explanation of the program that produces this report, see “Example 5:
Showing Multiple Observations on One Line of a Plot” on page 2648.
2624 Chapter 71 / TIMEPLOT Procedure
Restriction: This procedure is not available in SAS Viya orders that include only SAS Visual
Analytics.
Example: “Example 4: Superimposing Two Plots” on page 2646
Syntax
PROC TIMEPLOT <options>;
Optional Arguments
DATA=SAS-data-set
identifies the input data set.
ENCRYPTKEY=key-value
specifies the key value needed for plotting an AES-encrypted data set. If the
input data set was created with ENCRYPT=AES, then you must specify the
ENCRYPTKEY= value to plot its data. For example, suppose that a data set named
secretPlot is created with the following DATA statement:
data secretPlot(encrypt=AES encryptkey=Ib007);
Then you must specify the following PROC statement to plot the data in
secretPlot:
proc timeplot data=secretPlot(encryptkey=Ib007);
See “ENCRYPTKEY= Data Set Option” in SAS Data Set Options: Reference for
more information about the ENCRYPTKEY= data set option.
MAXDEC=number
specifies the maximum number of decimal places to be printed in the listing.
Default 2
Range 0-12