SAS Base Programming Reference Sheet

SAS Programming Fundamentals

• A program can create a log, results, and output data
• Programs are composed of DATA and PROC (procedure) steps
• Steps end with a run; statement (sometimes quit;)
• Each step is a series of statements
• A statement begins with a keyword (e.g. data) and ends with a semicolon ";"
• Assignment statements do not begin with a keyword

Example Program
    Data myclass;
        set sashelp.class;
        heightcm = height*2.54;
    Run;

    Proc print data = myclass;
        var age heightcm;
    Run;

Program Explanation
The DATA step creates a new dataset myclass in the work library and calculates a new variable heightcm. The PROC PRINT step prints a view of the myclass dataset in the results window displaying only 2 variables: age, heightcm. There are 7 statements.

Global Statements

• OPTIONS ...; sets SAS system options
    EX: Options Validvarname = V7;
• TITLE <options> "..."; sets up to 10 titles (TITLE, TITLE2, ... TITLE10)
    EX: TITLE "Student Ages and Heights";
        TITLE2 "Classroom 105";
• FOOTNOTE <options> "..."; sets up to 10 footnotes
    EX: FOOTNOTE "Data refreshed annually";
• LIBNAME libref <engine> "..."; sets a shortcut reference to data of a specific type in a specific location
    o libref: name to call a library. 8-character max length
    o engine: contains a predefined set of rules for reading data. Base is the default engine, which reads SAS datasets
    o "...": physical name of the library recognized by the system
    EX: LIBNAME mylib "/C:/documents/project/data";
        LIBNAME myXL xlsx "/C:/documents/class.xlsx";
• LIBNAME libref CLEAR; ends the connection to the data source

Commenting Code

• Comments can be added to prevent text in the program from executing
• There are 2 comment styles
• Comments are not executable statements

    /* insert commented text here */
    * Insert commented text here ;

Accessing Data

• SAS can read and understand structured (e.g. xlsx) and unstructured (e.g. csv) data types
• Structured data can be read via a LIBNAME statement or PROC IMPORT step
• Unstructured data requires PROC IMPORT to define rules

Importing data using Proc Import

Example Program
    Proc import datafile = "myfile.csv" dbms = csv
            out = mylib.example replace;
        guessingrows = 100;
    Run;

Program Explanation
Imports a CSV file (unstructured data) and outputs a SAS dataset example in a user-defined permanent library mylib. Optionally add a guessingrows statement to read n rows to determine variable attributes.

Procedures for Exploring and Analyzing Data

• PROC CONTENTS: prints the descriptor portion of a data set

    PROC CONTENTS DATA = data-set-name;
    RUN;

• PROC PRINT: lists all columns and rows in the input table by default
    • OBS = option limits the number of rows listed
    • VAR statement limits and orders columns listed
    • Use _NUMERIC_, _CHARACTER_, and _ALL_ keywords to specify variables of a certain type (numeric or character) or all types
    • WHERE statement filters the data
    • BY statement groups output data
    • FORMAT statement applies a temporary* format in the output
    • LABEL statement applies a temporary* label to the variable names in the output
    * Permanent characteristics are defined in the data step

    PROC PRINT DATA = data-set-name <(OBS = n)> <LABEL>;
        <VAR col-name(s);>
        <WHERE expression;>
        <BY col-name(s);>
        <FORMAT col-name(s) <$>format-name.;>
        <LABEL col-name = "Label";>
    RUN;

• PROC MEANS: generates simple summary statistics for each numeric column in the input data by default unless the VAR statement is used
    • CLASS specifies variables to group data before calculating statistics
    • WAYS specifies number of ways to make unique combinations of class variables
    • OUTPUT provides the option to create an output table and specific output statistics
    • OUT = names the output table to be created

    PROC MEANS DATA = data-set-name;
        <WHERE expression;>
        <VAR col-name(s);>
        <CLASS col-name(s);>
        <WAYS n;>
        <OUTPUT OUT = output-table <statistic = col-name>;>
    RUN;

• PROC UNIVARIATE: generates summary statistics and more detailed statistics about distribution and extreme values for each numeric variable by default

    PROC UNIVARIATE DATA = data-set-name;
        <VAR col-name(s);>
        <WHERE expression;>
    RUN;

• PROC FREQ: creates a frequency table for each variable in the input table by default
    • TABLES limits the variables analyzed
    • <options> customize the output by limiting columns (e.g. nocum), modifying output style (e.g. crosslist, list) or generating graphs
    • Creates a crosstabulation report by adding an asterisk (*) between two variable names on the TABLES statement

    PROC FREQ DATA = data-set-name;
        <TABLES col-name(s) </ options>;>
        <WHERE expression;>
    RUN;
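As an illustration of the exploration procedures on this page, here is a minimal sketch that runs PROC MEANS and PROC FREQ against sashelp.class, the sample table used in the example program above. The output table name work.class_stats and the statistic column names avg_height and avg_weight are illustrative choices, not part of the reference sheet.

```sas
/* Summary statistics for height and weight, grouped by sex.      */
/* work.class_stats, avg_height and avg_weight are made-up names. */
proc means data=sashelp.class mean min max;
    var height weight;
    class sex;
    output out=work.class_stats mean=avg_height avg_weight;
run;

/* One-way frequency table without cumulative columns */
proc freq data=sashelp.class;
    tables sex / nocum;
run;

/* Crosstabulation of sex by age, shown in list style */
proc freq data=sashelp.class;
    tables sex*age / crosslist;
run;
```

The names listed after mean= in the OUTPUT statement are matched to the VAR columns in order, so avg_height holds the mean of height and avg_weight the mean of weight.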

Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved
Procedures for Data Manipulation

• PROC SORT: sorts the rows in a table on one or more character or numeric columns. Data generally must be sorted with PROC SORT before any step that uses a BY statement
    • BY specifies the columns used in the sort
    • OUT = specifies an output table
    • NODUPKEY keeps only the first row for each unique value of the column(s) listed in the BY statement
    • DUPOUT creates an output table containing duplicates
    • DESCENDING sorts column from 9 to 0 or Z to A

    PROC SORT DATA = input-table <OUT = output-table>
            <NODUPKEY> <DUPOUT = output-table>;
        BY <DESCENDING> col-name(s);
    RUN;

• PROC TRANSPOSE: is used to restructure a table
    • VAR lists column(s) to be transposed
    • ID creates a separate column for each value of the ID variable and can only be one column
    • BY transposes data within groups. Each unique combination of BY values creates one row in the output table
    • PREFIX provides a prefix for each value of the ID column
    • NAME names the column that identifies the source column containing the transposed values

    PROC TRANSPOSE DATA = input-table OUT = output-table
            <PREFIX = prefix> <NAME = column>;
        <BY col-name(s);>
        <ID column;>
        <VAR column(s);>
    RUN;

Preparing Data: The DATA Step

    DATA output-dataset;
        set input-dataset;
    Run;

• The data step is processed in two phases:
    • Compilation: creates the program data vector (PDV), establishes data attributes and rules for execution
    • Execution: SAS reads, manipulates and writes data
• DATA steps create two default variables:
    • _N_ counts the number of iterations through the data step when processing
    • _ERROR_ is initialized at 0. If an error is encountered, the value is set to 1
• Explicit OUTPUT statements can be used to control when and where each row is written
• Multiple datasets can be created in one data step

    DATA work.cheap work.expensive;
        set work.shopping;
        if price > 100 then output work.expensive;
        else output work.cheap;
    Run;

Program Explanation
The data step creates two output tables CHEAP and EXPENSIVE based on the input dataset WORK.SHOPPING. If an observation has PRICE greater than 100, then the observation is assigned to the EXPENSIVE dataset; otherwise it is assigned to the CHEAP dataset.

The DATA Step: Controlling Variable Output

• DROP = / KEEP = options can be added to a table on the DATA statement or SET statement
• DROP / KEEP statements can be added within the data step
• Columns to be kept or dropped are flagged in the PDV
• Dropping a column on the SET statement makes the column unavailable for processing in the data step

    DATA work.expensive (KEEP = price item_name);
        SET work.shopping (DROP = city state);
        KEEP store_name;
    RUN;

The DATA Step: Processing Data in Groups

• Process data in groups after sorting data first
• FIRST.bycol is 1 for the first row within a group and 0 otherwise
• LAST.bycol is 1 for the last row within a group and 0 otherwise

    BY col-name(s);
    IF FIRST.bycol THEN statement;
    IF LAST.bycol THEN statement;

• By default SAS resets newly created columns to missing at the start of each iteration. An accumulating column must instead retain its value in the PDV from one iteration to the next
• Accumulating columns are often used in conjunction with FIRST./LAST. logic, allowing BY-group calculations and totals
• In the sum statement below, Column is the new variable holding the accumulating total

    Column + expression;

The DATA Step: Conditional Processing and Loops

• Conditionally process data using IF / ELSE IF / ELSE statements
• SAS will check the expressions sequentially until one is true
• IF statements can create new variables or new data sets

    if price > 100 then newVar = "Expensive";
    else if price <= 100 and price > 0 then newVar = "Cheap";
    else if price = 0 then newVar = "FREE";
    else newVar = "Priceless";

• Execute multiple statements by using a DO statement

    if price > 100 then do;
        newVar = "Expensive";
        output work.expensive;
    end;

• Process repetitive code using DO loops
• The optional OUTPUT statement will output a row for each iteration of the loop

    DATA output-table;
        SET input-table;
        DO indexcolumn = start TO stop <BY increment>;
            . . . repetitive code . . .
            <OUTPUT;>
        END;
    RUN;

• A DO UNTIL executes until a condition is true, and the condition is checked at the bottom of the DO loop. A DO UNTIL loop always executes at least one time.
• A DO WHILE executes while a condition is true, and the condition is checked at the top of the DO loop. A DO WHILE loop does not iterate even once if the condition is initially false.

    DO WHILE | UNTIL (expression);
        . . . repetitive code . . .
    END;
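The notes on processing data in groups can be hard to picture from the syntax alone, so here is a minimal sketch that combines PROC SORT, FIRST./LAST. logic and the sum statement. It uses sashelp.class as input; the table names work.class_sorted and work.weight_by_sex and the columns total_weight and n_students are hypothetical names chosen for this illustration.

```sas
/* BY-group processing requires sorted input */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

data work.weight_by_sex (keep=sex total_weight n_students);
    set work.class_sorted;
    by sex;

    /* Reset the accumulators at the start of each BY group */
    if first.sex then do;
        total_weight = 0;
        n_students = 0;
    end;

    /* Sum statements retain their values across iterations */
    total_weight + weight;
    n_students + 1;

    /* Write one row per group, after its last row is processed */
    if last.sex then output;
run;
```

Because the OUTPUT statement runs only when last.sex is 1, the result holds one row per value of sex rather than one row per student.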

The DATA Step: Combining Data

• Combine tables by concatenating them (stacking), or matching them based on a variable
• Concatenating:
    • SAS reads all the rows from the first table listed on the SET statement and writes them to the output table, then all the rows from the second table, and so on
    • Columns with the same name are aligned
    • Columns not in all tables are included
    • The RENAME = option can rename columns in input tables, so they align in the output table
    • Additional DATA step statements can be used after the SET statement to manipulate data

    DATA output-dataset;
        set input-dataset1 input-dataset2
            (rename=(currentName = newName));
    Run;

• Merging tables:
    • All tables in the MERGE statement must be sorted by the column(s) listed in the BY statement
    • The MERGE statement combines rows where the BY-column values match
    • Identify matching and nonmatching rows by using the IN = dataset option. IN variable values are 0 or 1.
        • 0 → table did NOT include the BY-column value
        • 1 → table did include the BY-column value
    • Use a subsetting IF or IF-THEN logic to handle matching and nonmatching rows

    DATA output-dataset;
        MERGE input-dataset1 <(in = VAR1)>
              input-dataset2 <(in = VAR2)>;
        BY by-column(s);
        <IF var1 = 1 and var2 = 1;>   /* all matching rows */
    RUN;

The DATA Step: Functions

• SAS has functions to handle character, numeric and date columns.

    new-var = function(argument1, argument2, ...);

• Convert numeric values to character using the PUT function

    char-var = put(numeric-var, format);

• Convert character values to numeric using the INPUT function

    numeric-var = input(char-var, informat);

• Common Numeric Functions:

    Function                              What it does
    RAND(distribution, parameter1, ...)   Generates random numbers from a selected distribution
    ROUND(number <, rounding-unit>)       Rounds number to the nearest rounding unit (.01, .001, etc.)
    LARGEST(k, value1, value2, ...)       Returns the kth largest nonmissing value
    SUM(argument1, argument2, ...)        Sums all nonmissing arguments

• Common Date Functions:
    NOTE: SAS dates are numeric values calculated as the number of days since JAN 1, 1960.

    Function                              What it does
    MDY(month, day, year)                 Creates a SAS date value
    TODAY()                               Returns the current date as a numeric SAS date value
    YEAR(date-var)                        Return the year, month, day or quarter
    MONTH(date-var)                       of the SAS date value input
    DAY(date-var)
    QTR(date-var)
    INTNX(interval, start-from,           Increments a date/time/datetime value by
          increment <, 'alignment'>)      a given time interval

• Common Character Functions:

    Function                              What it does
    TRIM(string)                          Removes trailing blanks
    STRIP(string)                         Removes all leading and trailing blanks
    SCAN(string, count                    Returns the nth word from a string
         <, char-list <, modifier>>)
    PROPCASE(string)                      Change the casing of the string.
    UPCASE(string)                        Commonly used in statements of equality
    LOWCASE(string)
    SUBSTR(string, start-from, length)    Extracts a substring from the argument

Customizing SAS Output: Labels and Formats

• Labels and formats can be applied in the DATA step and assigned as permanent attributes. These statements can also be used in reporting procedures as temporary attributes (i.e. they need to be specified in each procedure)
• Labels can be used to provide more descriptive column headers. A label can include any text up to 256 characters.
• Add labels to more than one column in a single LABEL statement

    LABEL col-name1 = "Label Text 1"
          col-name2 = "Label Text 2";

• Formats are used to change the way values are displayed in data and reports.
• Formats do not change the underlying data values.
• Add formats to more than one column in a single statement

    FORMAT date-var mmddyy10. num-var dollar13.2;

• Create your own custom formats using PROC FORMAT
    • A VALUE statement specifies the criteria for creating one custom format.
    • Multiple VALUE statements can be used within the PROC FORMAT step.

    PROC FORMAT;
        VALUE <$>format-name
            value-or-range-1 = 'formatted-value'
            value-or-range-2 = 'formatted-value'
            ...;
    RUN;
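To show how the functions, labels and formats on this page fit together, here is a minimal sketch built on sashelp.class. The custom format agegrp, the output table work.class_fmt, the new column names and the date January 15, 2022 are all assumptions made for the illustration.

```sas
/* A hypothetical custom numeric format that groups ages */
proc format;
    value agegrp
        low - 12  = 'Child'
        13 - high = 'Teen';
run;

data work.class_fmt;
    set sashelp.class;

    /* Numeric to character with PUT, and back with INPUT */
    age_char = put(age, 2.);
    age_num  = input(age_char, 2.);

    /* Character functions: remove blanks, then uppercase */
    name_upper = upcase(strip(name));

    /* Date functions: build a date, then move to the start */
    /* of the following month                                */
    report_date = mdy(1, 15, 2022);
    next_month  = intnx('month', report_date, 1, 'beginning');

    /* Permanent attributes, because they are set in the DATA step */
    format age agegrp. report_date next_month date9.;
    label age_char = "Age (character)"
          name_upper = "Name (uppercase)";
run;
```

The stored values of age are unchanged by the agegrp format; only their display in data views and reports is affected.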

Filtering Data

• WHERE statements filter rows and can be used in both the DATA step and PROC steps.

    WHERE expression;

• Rows are read when the expression is true and skipped when it is false.
• WHERE statements can only work with columns that exist on an input dataset, not ones that are calculated during manipulation.
• Character values are case sensitive and must be in quotes " "
    EX: WHERE Car_Make = "Honda" will select different rows than WHERE Car_Make = "HONDA"
• Numeric values are not in quotes and can only include digits, decimal points and/or negative signs
• Compound conditions can be created using AND/OR
• Logic can be reversed with the NOT keyword
• Use the SAS date constant when filtering with dates: "ddMONyyyy"d

WHERE Operators:
    = or EQ
    ^=, ~= or NE
    > or GT
    >= or GE
    < or LT
    <= or LE

IN Operator
    WHERE col-name IN (value1, value2, ...);
    WHERE col-name NOT IN (value1, value2, ...);

Special Operators
    WHERE col-name IS MISSING
    WHERE col-name IS NOT MISSING
    WHERE col-name IS NULL
    WHERE col-name BETWEEN value1 AND value2
    WHERE col-name LIKE "value%"    (% matches any number of characters)
    WHERE col-name LIKE "value_"    (_ matches exactly one character)

• A subsetting IF statement can be used on any variable that exists in the PDV (e.g. variables on the input data set and new variables created)
• The expression used in the IF statement is written with most of the same operators as a WHERE expression.

    /* Subsetting IF: implicit output */
    IF expression;

    /* Explicit output */
    IF expression THEN output;

    /* Explicit output to a specific table */
    IF expression THEN output libref.output-dataset-name;

MACRO variables

• A macro variable stores a value that can be substituted into a SAS program
• If a macro variable is referenced inside quotation marks, then double quotation marks must be used
• Assign a value to a macro variable using a %LET statement
• The ampersand "&" must be used when calling a macro variable. The & triggers the macro facility

    %LET macro-variable = value;

    WHERE numvar = &macrovar;
    WHERE charvar = "&macrovar";

Exporting Data

• Export a SAS dataset to various file types (XLSX, TXT, CSV, etc.) using a PROC EXPORT step
    • PROC EXPORT must be used to export unstructured data types (e.g. CSV files)
    • DBMS = names the database management system, which specifies the type of data to export (e.g. CSV, DLM, JMP, TAB)

    PROC EXPORT DATA = input-table
            OUTFILE = "output-file"
            <DBMS = identifier> <REPLACE>;
    RUN;

• Alternatively use a LIBNAME statement to export data.
• A LIBNAME statement can only be used if the output data type has an accessible SAS engine (e.g. XLSX, JSON, XML).
• Ensure a LIBNAME libref CLEAR statement is used at the end to close the connection to the Excel workbook.

    LIBNAME myXL XLSX "C:/documents/Shopping.XLSX";
    DATA myXL.shopping;
        SET work.shopping;
    RUN;
    LIBNAME myXL CLEAR;

Exporting Reports

• The SAS Output Delivery System (ODS) can send reports to various file types, including CSV, PowerPoint, RTF, and PDF.
• Each output type uses the same basic structure to open and close a file. Additional statements are available based on the file type.

    ODS destination <destination-specifications>;
    /* SAS code that produces output */
    ODS destination CLOSE;

• Additional options for Excel files include:
    • Adding a style
    • Adding a worksheet label

    ODS EXCEL FILE = "filename.xlsx"
            STYLE = style
            OPTIONS(SHEET_NAME = 'label');
    /* SAS code that produces output on first worksheet */
    ODS EXCEL OPTIONS(SHEET_NAME = 'label');
    /* SAS code that produces output on second worksheet */
    ODS EXCEL CLOSE;

• PDF outputs can include a Table of Contents (PDFTOC) and procedure labels in the bookmarks.

    ODS PDF FILE = "filename.pdf"
            STYLE = style
            STARTPAGE = NO PDFTOC = 1;
    ODS PROCLABEL "label";
    /* SAS code that produces output */
    ODS PDF CLOSE;

Additional Information

• For more information on SAS programming techniques, visit go.documentation.sas.com
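As a closing illustration of the Filtering Data and MACRO variables notes, here is a minimal sketch that drives a WHERE statement and a title from %LET values. The macro variable names min_age and sex and their values are assumptions chosen to fit sashelp.class.

```sas
/* Hypothetical macro variables used to parameterize the report */
%let min_age = 13;
%let sex = F;

/* Double quotes are required for &min_age and &sex to resolve */
title "Students aged &min_age or older (sex = &sex)";

proc print data=sashelp.class;
    /* Numeric reference is unquoted; character reference is quoted */
    where age >= &min_age and sex = "&sex";
    var name age height;
run;

/* Clear the title so it does not carry over to later steps */
title;
```

Changing the two %LET lines reruns the same report for a different group without touching the PROC PRINT step.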
