Using Proc IML: Statistical Computing Spring 2014
What is IML?
SAS vs R
SAS: procedures (PROCs) and datasets
R: functions/operations and matrices/vectors
Proc IML
IML = Interactive Matrix Language
R-like programming inside of SAS
Pros: more flexible
Cons: user-written programs are not validated the way SAS procedures are
Applications
Simulate data
Matrix algebra (e.g. contrasts, algorithms)
Many things you could normally only do in R
Graphics
The Matrix
A matrix is a collection of numbers ordered by rows and columns.
Matrices are characterized by their number of rows and columns.
An element of a matrix is referred to first by its row, then by its column:
$$X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}$$
Special Matrices
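A 1 x 1 matrix is also known as a scalar
$$X = \begin{pmatrix} x_{11} \end{pmatrix}$$
r x 1 or 1 x c matrices are known as vectors
$$X = \begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix}, \qquad X = \begin{pmatrix} x_{11} & x_{12} \end{pmatrix}$$
A diagonal matrix is a square matrix whose off-diagonal elements are zero
An identity matrix is a diagonal matrix whose diagonal elements are all 1. It is denoted $I_c$, where c is the dimension of the matrix
$$X = \begin{pmatrix} x_{11} & 0 \\ 0 & x_{22} \end{pmatrix}, \qquad I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$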
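You do not have to type the special matrices above element by element. IML's I, DIAG, and J functions can build them directly. This is a minimal sketch, and the matrix names (S, ID, D, Z) are illustrative only:

```sas
PROC IML;
S  = 5;               /* A 1 X 1 MATRIX, I.E. A SCALAR */
ID = I(2);            /* 2 X 2 IDENTITY MATRIX I_2 (ID AVOIDS SHADOWING THE I FUNCTION) */
D  = DIAG({4 9});     /* DIAGONAL MATRIX WITH 4 AND 9 ON THE DIAGONAL, ZEROS ELSEWHERE */
Z  = J(2, 3, 0);      /* 2 X 3 MATRIX FILLED WITH THE VALUE 0 */
PRINT S ID D Z;
```

J(nrow, ncol, value) is also a convenient way to preallocate a matrix before filling it in a loop.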
Creating Matrices in IML
PROC IML;
A = 1; /* CREATE A SCALAR*/
B = {1 2 3}; /* CREATE A ROW VECTOR OF LENGTH 3*/
C = { 4,
5,
6}; /* CREATE A COLUMN VECTOR OF LENGTH 3*/
D ={1 2,
3 4,
5 .}; /* CREATE A 3 BY 2 MATRIX WHERE THE 3,2 ELEMENT IS MISSING*/
PRINT A B C D; /* DISPLAY THE MATRICES IN THE OUTPUT*/
QUIT;
*Matrices can hold character values instead of numbers, but matrix algebra does not work on character matrices
Manipulating Matrices
Using brackets inside the specification lets you request repeated values
A={[2] 'Yes', [2] 'No'} is equivalent to A={'Yes' 'Yes', 'No' 'No'}
SAS: {[# Repeats] Value}, R: rep(value, number of times)
Select a single element
A={1 2, 3 4}
To select the number 3: A2=A[2,1]
Select a row or column
To select the first row: A3=A[1, ]
To select the first column: A4=A[ ,1]
Select a submatrix
B={1 2 0 0, 3 4 0 0}
To select the A matrix from within B:
A_new=B[1:2,1:2] or B[,{1 2} ]
Manipulating Matrices (cont.)
To define row and column labels, first create a vector containing the labels
PRINT B[ROWNAME=label-vector];
COLNAME=, FORMAT=, and LABEL= can be used in the same way
To assign these attributes permanently, use MATTRIB matrix ROWNAME= COLNAME=
This then allows you to index using the matrix attributes (e.g. A["TRUE",])
Selecting elements with logical arguments
Instead of listing specific elements, use a logical condition with LOC, which returns the indices of the elements where the condition is true
A={1 2 3 4}; B=A[LOC(A>2)] selects the elements 3 and 4
Replace elements
Option 1: reassign specific elements
A[2]=7 yields A={1 7 3 4}
Option 2: reassign by a rule
A[LOC(A>2)]=0 yields A={1 2 0 0}
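This is a minimal sketch of the LOC-based selection and replacement described above, using the same example vector A. The names IDX, A2, and A3 are illustrative. The copies let each replacement option start from the original A:

```sas
PROC IML;
A = {1 2 3 4};
IDX = LOC(A > 2);        /* INDICES OF THE ELEMENTS GREATER THAN 2 */
B = A[IDX];              /* SELECT THOSE ELEMENTS (THE VALUES 3 AND 4) */
A2 = A;
A2[2] = 7;               /* OPTION 1: REASSIGN A SPECIFIC ELEMENT */
A3 = A;
A3[LOC(A3 > 2)] = 0;     /* OPTION 2: REASSIGN EVERY ELEMENT THAT MEETS A RULE */
PRINT IDX B A2 A3;
```

If no element satisfies the condition, LOC returns an empty matrix and A[IDX] fails. In practice, guard the subscript with something like IF NCOL(IDX) > 0 THEN ... before using it.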
Manipulating Matrices in IML
PROC IML;
REPEAT_O1={[2]"YES" [2] "NO"}; /*USING REPETITION FACTORS TO FILL THE MATRIX*/
REPEAT_O2={"YES" "YES" "NO" "NO"}; /* REPEATING ELEMENTS MANUALLY*/
PRINT REPEAT_O1 REPEAT_O2;
A={1 2,
3 4}; /* DEFINE MATRIX*/
A1=A[2,1]; /* SELECT THE ELEMENT IN THE 2ND ROW, FIRST COLUMN: A1 SHOULD EQUAL 3 */
A2=A[1,]; /* SELECT THE FIRST ROW, A2 SHOULD EQUAL THE 1 X 2 ROW VECTOR {1 2} */
A3=A[,1]; /* SELECT THE FIRST COLUMN, A3 SHOULD EQUAL THE 2 X 1 COLUMN VECTOR {1,3} */
B={1 2 0 0, 3 4 0 0}; /* DEFINE A MATRIX B, WITH TWO SUBMATRICES: A AND A 2 X 2 ZERO MATRIX*/
A_NEW=B[1:2,1:2]; /* RECOVER THE A MATRIX FROM B */
A_NEW2=B[,{1 2}]; /*RECOVER THE A MATRIX FROM B, ANOTHER WAY TO WRITE IT*/
C_ROWNM={M F}; /* SET ROW NAMES FOR MATRIX C*/
C_COLNM={TRUE FALSE}; /* SET COL NAMES FOR MATRIX C*/
C={10 25,9 18};
PRINT A A1 A2 A3 B A_NEW A_NEW2
C[ROWNAME=C_ROWNM COLNAME=C_COLNM FORMAT=6.1 LABEL="MY MATRIX"]; /*MODIFYING PRINTED OUTPUT FOR MATRIX C*/
Manipulating Matrices in IML (cont.)
C_NEW=C; /* CREATING A DUPLICATE MATRIX*/
MATTRIB C_NEW ROWNAME=C_ROWNM COLNAME=C_COLNM FORMAT=6.1
LABEL="MY MATRIX"; /* PERMANENTLY ASSIGNING PRINT ATTRIBUTES*/
PRINT C C_NEW; /* COMPARING DIFFERENT APPROACHES*/
Matrix Multiplication and Division
Multiplying a matrix by a scalar is an element-wise operation and is commutative; dividing a matrix by a scalar is also element-wise.
$$R = aB = Ba, \qquad r_{ij} = a\,b_{ij}$$
Multiplication of vectors and matrices
In general not commutative (AB ≠ BA)
Requires that the number of columns in A equals the number of rows in B
The resulting matrix R has as many rows as A and as many columns as B
$$A_{r \times x} \, B_{x \times c} = R_{r \times c}$$
Multiplication and Division (cont.)
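$$R = AB, \qquad r_{ij} = \sum_{h=1}^{x} a_{ih} b_{hj}$$
$$A = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 6 \\ 2 & 0 \end{pmatrix}$$
$$AB = \begin{pmatrix} 2\cdot1 + 3\cdot2 & 2\cdot6 + 3\cdot0 \\ 4\cdot1 + 5\cdot2 & 4\cdot6 + 5\cdot0 \end{pmatrix} = \begin{pmatrix} 8 & 12 \\ 14 & 24 \end{pmatrix}$$
$$BA = \begin{pmatrix} 26 & 33 \\ 4 & 6 \end{pmatrix}$$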
Special Properties
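Transpose: $A' = (a_{ji})$
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}, \qquad A' = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}$$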
*MATRIX MULTIPLICATION;
A={2 3,4 5}; /*DEFINE MATRIX*/
B={1 6,2 0}; /*DEFINE MATRIX*/
AB=A*B; /*MULTIPLY A BY B*/
BA=B*A; /* MULTIPLY B BY A*/
PRINT A B AB BA; /* NOTE THAT MULTIPLICATION IS NOT COMMUTATIVE, AB DOESN'T EQUAL BA*/
QUIT;
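IML distinguishes matrix multiplication (*) from element-wise multiplication (#). The transpose can be written with the T function or with a backquote. This short sketch reuses the A and B matrices from the example above, and the result names are illustrative:

```sas
PROC IML;
A = {2 3, 4 5};
B = {1 6, 2 0};
S  = 2 * A;       /* SCALAR TIMES MATRIX: EVERY ELEMENT IS DOUBLED */
S2 = A / 2;       /* DIVISION BY A SCALAR IS ALSO ELEMENT-WISE */
E  = A # B;       /* ELEMENT-WISE (HADAMARD) PRODUCT, NOT MATRIX MULTIPLICATION */
AB = A * B;       /* MATRIX PRODUCT, AS IN THE WORKED EXAMPLE */
AT = T(A);        /* TRANSPOSE; A` (BACKQUOTE) IS EQUIVALENT */
PRINT S S2 E AB AT;
```

Confusing # with * is a common source of silent errors. Both are legal for two square matrices of the same size, but they give different answers.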
Matrix Operators: Comparison
Element-wise comparison of matrices; the result is a matrix of 0 (false) and 1 (true)
Comparisons
Less than (<), less than or equal to (<=)
Greater than (>), greater than or equal to (>=)
Equal to (=), Not equal to (^=)
Compound conditions can be built with the logical operators
And (&)
Or ( |)
Not ( ^)
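This sketch applies the comparison and logical operators to a small hypothetical matrix X. Each result is a 0/1 matrix of the same shape. ANY and ALL reduce such a matrix to a single true/false value:

```sas
PROC IML;
X = {1 5, 8 3};
GT     = (X > 2);                /* 1 WHERE THE ELEMENT EXCEEDS 2 */
BOTH   = (X > 2) & (X < 6);      /* AND: BETWEEN 2 AND 6 (EXCLUSIVE) */
EITHER = (X = 1) | (X = 8);      /* OR: EQUAL TO 1 OR TO 8 */
NOT_GT = ^(X > 2);               /* NOT: COMPLEMENT OF GT */
IF ALL(X > 0) THEN POS = 1;      /* TRUE ONLY IF EVERY ELEMENT IS POSITIVE */
ELSE POS = 0;
PRINT GT BOTH EITHER NOT_GT POS;
```

When an IF condition is a matrix, IML treats it as true only if every element is nonzero. Wrapping the comparison in ALL or ANY makes the intent explicit.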
Solving Systems of Equations
Solve the following system of equations
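$$\begin{aligned} 3x + 2y - 4z &= 11 \\ 5x - 4y &= 9 \\ 3y + 10z &= 42 \end{aligned}$$
$$\begin{pmatrix} 3 & 2 & -4 \\ 5 & -4 & 0 \\ 0 & 3 & 10 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 11 \\ 9 \\ 42 \end{pmatrix}$$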
Solving Systems of Equations (cont)
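To solve, rearrange $AX = B$ as $X = A^{-1}B$:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 & 2 & -4 \\ 5 & -4 & 0 \\ 0 & 3 & 10 \end{pmatrix}^{-1} \begin{pmatrix} 11 \\ 9 \\ 42 \end{pmatrix}$$
PROC IML;
A={3 2 -4,
5 -4 0,
0 3 10};
B={11,9,42};
OPT1=SOLVE(A,B); /* SOLVES AX=B DIRECTLY; NUMERICALLY PREFERABLE TO INVERTING A */
OPT2=INV(A)*B; /* SAME ANSWER VIA THE EXPLICIT INVERSE */
PRINT OPT1 OPT2; /* BOTH GIVE X=5, Y=4, Z=3 */
QUIT;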
Working with SAS Datasets
Opening a SAS Dataset
Before you can access a SAS dataset, you must first
submit a command to open it.
To only read from an existing data set, submit a USE statement.
USE SAS-data-set <VAR operand> <WHERE(expression)>;
To read from and write to an existing data set, use the EDIT statement.
In addition to READ, you can also modify (REPLACE), DELETE, and PURGE observations in a data set that has been opened with EDIT
Each data set should be opened only once
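This is a minimal sketch of the EDIT workflow. SCORES is a hypothetical WORK data set with made-up values, built here with CREATE/APPEND so that the example does not modify any real data:

```sas
PROC IML;
SCORE = {72, 85, 91, 64};          /* HYPOTHETICAL EXAMPLE VALUES */
CREATE SCORES VAR {SCORE};         /* BUILD A THROWAWAY WORK DATA SET */
APPEND;
CLOSE SCORES;

EDIT SCORES;                       /* OPEN FOR READING AND WRITING */
DELETE ALL WHERE(SCORE < 70);      /* MARK LOW SCORES FOR DELETION */
PURGE;                             /* PHYSICALLY REMOVE THE MARKED OBSERVATIONS */
CLOSE SCORES;

USE SCORES;                        /* REOPEN READ-ONLY TO CHECK THE RESULT */
READ ALL VAR {SCORE} INTO KEPT;
CLOSE SCORES;
PRINT KEPT;
```

DELETE only marks observations. PURGE is what removes them, which is why the two statements usually appear together.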
Reading in Datasets
Create matrices from a SAS dataset
Create a vector for each variable
Create a matrix containing multiple variables
Select all observations or a subset
To transfer data from a SAS dataset to a matrix
SETIN
Specifies an open dataset as the current input dataset
READ
Reads observations from the current input data set into matrices
READ <range> <VAR operand> <WHERE(expression)> <INTO name>;
READ ALL VAR {VAR1} WHERE(VAR1>80) INTO MYMAT;
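This sketch shows USE/READ with a WHERE clause against SASHELP.CLASS, the sample data set shipped with SAS. The matrix names HW and NAMES are illustrative. Numeric variables go into one matrix and the character variable into its own vector:

```sas
PROC IML;
USE SASHELP.CLASS;                                     /* OPEN READ-ONLY */
READ ALL VAR {HEIGHT WEIGHT} WHERE(AGE > 14) INTO HW;  /* ONE COLUMN PER VARIABLE */
READ ALL VAR {NAME} WHERE(AGE > 14) INTO NAMES;        /* CHARACTER COLUMN VECTOR */
CLOSE SASHELP.CLASS;
PRINT HW[ROWNAME=NAMES COLNAME={"HEIGHT" "WEIGHT"}];
```

A single INTO matrix must be all numeric or all character. That is why NAME is read separately and then reused as row labels.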
Comparison Operators (WHERE clauses)
Operation IML Code
Less than <
Less than or equal to <=
Equal to =
Greater than >
Greater than or equal to >=
Not equal to ^=
Contains a given string ?
Does not contain a given string ^?
Begins with a given string =:
Sounds like or is spelled like a given string =*
Sorting SAS Datasets
First close the dataset
SORT dataset OUT=new_dataset BY var_name;
Use the keyword DESCENDING before a variable to reverse its sort order
Creating Datasets from Matrices
When you create a dataset
Columns become variables
Rows become observations
CREATE
Opens a new SAS dataset for I/O
APPEND
Writes to the dataset
CREATE SAS-data-set FROM matrix <[COLNAME=column-names ROWNAME=row-names]>; APPEND FROM matrix;
CREATE SAS-data-set VAR {variable-names}; APPEND;
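This sketch shows both CREATE forms side by side, writing the same values as hypothetical WORK data sets OUT1 and OUT2. In the first form, matrix columns become variables. In the second form, each vector becomes a variable. Both sets are read back to confirm the round trip:

```sas
PROC IML;
X = {1 10, 2 20, 3 30};            /* 3 ROWS -> 3 OBSERVATIONS */
CNAMES = {"ID" "VALUE"};           /* VARIABLE NAMES FOR THE COLUMNS */
CREATE OUT1 FROM X[COLNAME=CNAMES];
APPEND FROM X;
CLOSE OUT1;

ID = {1, 2, 3};                    /* ALTERNATIVE: ONE VECTOR PER VARIABLE */
VALUE = {10, 20, 30};
CREATE OUT2 VAR {ID VALUE};
APPEND;                            /* WRITES THE CURRENT VALUES OF ID AND VALUE */
CLOSE OUT2;

USE OUT1;  READ ALL VAR {ID VALUE} INTO CHECK1;  CLOSE OUT1;
USE OUT2;  READ ALL VAR {ID VALUE} INTO CHECK2;  CLOSE OUT2;
PRINT CHECK1 CHECK2;
```

The FROM form fits a matrix that is already assembled. The VAR form is handy when the variables are separate vectors, possibly of mixed numeric and character type.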
Data Management Commands
Command         Description
APPEND          Adds observations to the end of a SAS dataset
CLOSE           Closes a SAS dataset
CREATE          Creates and opens a new SAS dataset for input and output
DELETE          Marks observations for deletion in a SAS dataset
EDIT            Opens an existing SAS dataset for input and output
FIND            Finds observations
RESET DEFLIB    Names the default libname
SETIN           Selects an open SAS dataset for input
SETOUT          Selects an open SAS dataset for output
SHOW CONTENTS   Shows contents of the current input SAS dataset
SHOW DATASETS   Shows SAS datasets currently open
SORT            Sorts a SAS dataset
PROC IML;
USE MYDATA VAR {MSRP MPG_CITY MPG_HIGHWAY}; /* OPEN DATASET*/
READ ALL VAR _ALL_ WHERE (MSRP<12000) INTO
CAR_MAT; /* READ DATASET*/
CLOSE MYDATA; /* CLOSE DATASET*/
Z=NROW(CAR_MAT); /* FIGURE OUT HOW MANY ROWS*/
PRINT Z CAR_MAT[COLNAME={MSRP CITY HWY}]; /* LOOK AT DATA*/
QUIT;
Analyzing Data & Writing Programs
Subscript Operations
Subscripts select a single element, row, column, or submatrix
Subscript reduction operators compute summary statistics on the rows or columns of a matrix (similar to the APPLY function in R)
Reduction operators
Addition +
Multiplication #
Mean :
Sum of squares ##
Maximum <>
Minimum ><
Index of maximum <:>
Index of minimum >:<
Additional Operators
Concatenation: horizontal ||, vertical //
Number of rows: NROW(matrix), number of columns: NCOL(matrix)
SUMMARY produces summary statistics on the numeric variables of a SAS data set. For statistics by subgroup, use the CLASS option.
SUMMARY VAR {variable-list} <CLASS {by-variables}> <STAT {desired-stats}> <OPT {SAVE}>;
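This small sketch applies the reduction operators and concatenation to a hypothetical 3 x 2 matrix X. The position of the operator in the subscript sets the direction. In the row position it collapses the rows, giving one result per column. In the column position it collapses the columns, giving one result per row:

```sas
PROC IML;
X = {1 4,
     2 5,
     3 9};
COLSUM  = X[+, ];     /* SUM DOWN EACH COLUMN */
ROWMEAN = X[ , :];    /* MEAN ACROSS EACH ROW */
TOTSSQ  = X[##];      /* SUM OF SQUARES OF ALL ELEMENTS */
COLMAX  = X[<>, ];    /* MAXIMUM OF EACH COLUMN */
IDXMIN  = X[>:<, ];   /* ROW INDEX OF THE MINIMUM IN EACH COLUMN */
H = X || X;           /* HORIZONTAL CONCATENATION: 3 X 4 */
V = X // X;           /* VERTICAL CONCATENATION: 6 X 2 */
HDIM = NROW(H) || NCOL(H);
VDIM = NROW(V) || NCOL(V);
PRINT COLSUM ROWMEAN TOTSSQ COLMAX IDXMIN HDIM VDIM;
```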
Types of Statements
Control Statements
Direct the flow of execution
E.g. IF-THEN/ELSE statement
Functions and CALL statements
Perform special tasks or user-defined operations
Command statements
Perform special processing such as setting options,
displaying windows, and handling input and output
Control Statements
Statement Description
PROC IML; QUIT; Initiates and ends an IML session
DO; END; Specifies a group of statements
Iterative DO; END; Defines an iteration loop
IF-THEN;ELSE; Conditionally routes execution
START; FINISH; Defines a module
RUN; Executes a Module
IF-THEN/ELSE statements
IF expression THEN statement-one; ELSE statement-two;
IML processes the expression and uses the result to decide whether statement-one or statement-two is executed.
You may also nest IF-THEN/ELSE statements.
PROC IML;
A={12 22 33};
IF MAX(A)<20 THEN P=1; ELSE P=0;
PRINT P;
QUIT;
DO groups
Several statements can be grouped together into a compound statement to be executed as a unit.
DO; Statements; END;
You can combine DO groups with IF/ELSE:
IF (X<Y) THEN DO; Z=X+Y; END;
ELSE DO; Z=X-Y; END;
PROC IML;
Y=0;
DO I=1 TO 3;
Y=Y+1;
PRINT Y;
END;
QUIT;
The iterative DO statement repeats a set of statements a number of times determined by an index variable; DO WHILE(expression) and DO UNTIL(expression) repeat them conditionally.
If DO WHILE is used, the expression is evaluated at the beginning of each iteration, and iterations continue while the expression is true. If the expression starts out false, the loop does not run.
If DO UNTIL is used, the expression is evaluated at the end of each iteration, and iterations stop once it is true. This means the loop always executes at least once.
PROC IML;
COUNT=1;
DO WHILE(COUNT<3);
COUNT=COUNT+1;
PRINT "WHILE";
END;
COUNT=1;
DO UNTIL(COUNT>3);
COUNT=COUNT+1;
PRINT "UNTIL";
END;
QUIT;
Interacting with Procs
Option One
Write the data to a SAS data set by using the CREATE and APPEND statements
Use the SUBMIT statement to call a SAS procedure that analyzes the data
Read the results of the analysis into IML matrices using USE and READ statements
Option Two
Do what can only be done in IML
Write the data back out to a SAS dataset
Call PROCs normally
ODS TRACE ON; / ODS TRACE OFF;
Placed before and after a PROC, these statements print the names of the output tables to the log.
Useful for requesting or saving specific parts of the analysis.
To use PROCs: SUBMIT; Statements; ENDSUBMIT;
As with macros, you can list variables that already exist in IML on the SUBMIT statement, then refer to them inside the submit block as &Varname
Substitutions take place before the block is processed, so no macro variable is created
If you use SUBMIT *, the wildcard means that any of the existing IML variables can be referenced
Any variable inside the submit block that is referenced (&var) but not created in the IML procedure is not substituted. This is used for creating true macros.
Interacting with Procs (cont.)
PROC IML;
Q={2 5 7 9};
CREATE MYDATA VAR{Q};
APPEND;
CLOSE MYDATA;
*Table="Moments";
SUBMIT;
*SUBMIT table;
PROC UNIVARIATE DATA=MYDATA;
VAR Q;
ODS OUTPUT MOMENTS=MOMENTS;
* ODS OUTPUT MOMENTS=&Table;
RUN;
ENDSUBMIT;
USE MOMENTS;
READ ALL VAR{NVALUE1 LABEL1};
CLOSE MOMENTS;
LABL ="MY OUTPUT";
PRINT NVALUE1[ROWNAME=LABEL1 LABEL=LABL];
QUIT;
Modules
Modules are used for two purposes
To create user-defined subroutines or functions.
To define variables that are local to the module.
START MODULE-NAME OPTIONS; STATEMENTS; FINISH MODULE-NAME;
To execute the module use
RUN MODULE-NAME; searches user-defined modules first, then built-in subroutines
CALL MODULE-NAME; searches built-in subroutines first, then user-defined modules
A function is a special type of module that returns a single value (matrix).
START MODULE; STATEMENTS; RETURN(VARIABLE); FINISH MODULE;
Any variables created inside the module but not mentioned in the RETURN statement are not retained for future use.
Modules can be stored and loaded (like a macro library, or source() in R)
STORE MODULE=MODULE-NAME;
LOAD MODULE=MODULE-NAME;
Stored modules remain available after IML has exited
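This sketch shows one function module and one subroutine module. ZSCORE and ADDONE are hypothetical names used only for illustration. Because both modules take arguments, M and S inside ZSCORE are local and do not leak into the main session. The subroutine returns its result by assigning to an argument:

```sas
PROC IML;
/* FUNCTION MODULE: STANDARDIZES A COLUMN VECTOR */
START ZSCORE(X);
   M = X[:];                                /* MEAN (LOCAL) */
   S = SQRT(SSQ(X - M) / (NROW(X) - 1));    /* SAMPLE STANDARD DEVIATION (LOCAL) */
   RETURN((X - M) / S);
FINISH ZSCORE;

/* SUBROUTINE MODULE: THE RESULT COMES BACK THROUGH THE FIRST ARGUMENT */
START ADDONE(Y, X);
   Y = X + 1;
FINISH ADDONE;

X = {2, 4, 6};
Z = ZSCORE(X);         /* FUNCTIONS ARE CALLED IN AN EXPRESSION */
RUN ADDONE(W, X);      /* SUBROUTINES ARE EXECUTED WITH RUN (OR CALL) */
PRINT Z W;
```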
Creating a Permanent Module Library
Permanent libraries make functions available to multiple users and sessions. This is equivalent to storing datasets in a permanent library rather than the WORK library
LIBNAME LIBRARY 'PATH';
PROC IML;
START FUNC1(X); RETURN(X+1); FINISH;
START FUNC2(X); RETURN(X**2); FINISH;
RESET STORAGE=LIBRARY.SOURCEFILE; /* LIBREF.CATALOG, USING THE LIBRARY LIBREF FROM THE LIBNAME STATEMENT */
STORE MODULE=_ALL_;
QUIT;
Command Statements
Statement   Description
FREE        Frees memory associated with a matrix
LOAD        Loads a matrix or module from a storage library
MATTRIB     Associates printing attributes with matrices
PRINT       Prints a matrix or message
RESET       Sets various system options
REMOVE      Removes a matrix or module from library storage
SHOW        Displays system information
STORE       Stores a matrix or module in the storage library
Using R
Calling R from within IML
Check whether calling R is enabled for your SAS installation:
PROC OPTIONS OPTION=RLANG;
If not, you will have to add the -RLANG option at SAS startup
Similar to calling procs
SUBMIT / R; ENDSUBMIT;
Export
ExportDataSetToR: SAS dataset -> R data frame
ExportMatrixToR: IML matrix -> R matrix
Import
ImportDataSetFromR: R expression -> SAS dataset
ImportMatrixFromR: R expression -> IML matrix
R objects tend to be complex, so you can only transfer something that can be coerced to a data frame
SAS to R and back again
proc iml;
/* Comparison of matrix operations in IML and R */
print "---------- SAS/IML Results -----------------";
x = 1:3;                     /* vector of sequence 1,2,3 */
m = {1 2 3, 4 5 6, 7 8 9};   /* 3 x 3 matrix */
q = m * t(x);                /* matrix multiplication */
print q;
print "------------- R Results --------------------";
submit / R;
rx <- matrix( 1:3, nrow=1)               # vector of sequence 1,2,3
rm <- matrix( 1:9, nrow=3, byrow=TRUE)   # 3 x 3 matrix
rq <- rm %*% t(rx)                       # matrix multiplication
print(rq)
endsubmit;
submit / R;
hist(p, freq=FALSE)   # histogram (assumes an R vector p already exists)
lines(est)            # kde overlay (assumes a density estimate est already exists)
endsubmit;

proc iml;
use Sashelp.Class;
read all var {Weight Height};
close Sashelp.Class;
/* send matrices to R */
call ExportMatrixToR(Weight, "w");
call ExportMatrixToR(Height, "h");
submit / R;
Model <- lm(w ~ h, na.action="na.exclude")   # a
ParamEst <- coef(Model)                      # b
Pred <- fitted(Model)
Resid <- residuals(Model)
endsubmit;
call ImportMatrixFromR(pe, "ParamEst");
print pe[r={"Intercept" "Height"}];
ht = T( do(55, 70, 5) );
A = j(nrow(ht),1,1) || ht;
pred_wt = A * pe;
print ht pred_wt;
/* send the whole data set so that R can refer to the Class data frame */
call ExportDataSetToR("Sashelp.Class", "Class");
YVar = "Weight";
XVar = "Height";
submit XVar YVar / R;
Model <- lm(&YVar ~ &XVar, data=Class, na.action="na.exclude")
print (Model$call)
endsubmit;
MISC