SET Where Label Rename Format
By: Raj
Programmatic Approach using DATA Step and PROC SQL
Creating a SAS data set Using DATA Step:
DATA raj;
SET sashelp.class;
WHERE age<13;
LABEL name = 'First Name';
RENAME name = FName;
FORMAT height weight 5.1;
RUN;
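For comparison, the same subset, rename, label, and formats can be sketched in PROC SQL. This is a minimal sketch, not part of the original step; the table name raj_sql is hypothetical and is chosen so that the raj data set is not overwritten.

```sas
PROC SQL;
  /* raj_sql is a hypothetical name used only for this sketch */
  CREATE TABLE raj_sql AS
    SELECT name AS FName LABEL='First Name',  /* rename and label in one place */
           sex,
           age,
           height FORMAT=5.1,
           weight FORMAT=5.1
      FROM sashelp.class
      WHERE age < 13;
QUIT;
```

In SQL the rename is expressed with AS, and LABEL= and FORMAT= are written as column modifiers in the SELECT clause rather than as separate statements.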
Storing Results:
Very often you don’t want to display results. Instead you want to store them for
use in subsequent computations. That’s what this DATA step will do:
DATA new;
SET raj;
RUN;
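A PROC SQL counterpart of this copy step can be sketched as follows. The table name new_sql is hypothetical. The CREATE TABLE clause is what stores the result; a bare SELECT would display the rows instead.

```sas
PROC SQL;
  /* new_sql is a hypothetical name; CREATE TABLE stores rather than displays */
  CREATE TABLE new_sql AS
    SELECT *
      FROM raj;
QUIT;
```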
Column Subsets:
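What if you don’t need all of the variables available in the existing data set? In
the DATA step, a KEEP statement identifies the variables to be stored in the new
data set, and a DROP statement identifies those to be left out. In PROC SQL, a
DROP= data set option on the new table has the same effect. For example: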
DATA subset;
SET raj;
KEEP fname sex age;
RUN;
DATA subset;
SET raj;
DROP height weight;
RUN;
PROC SQL;
CREATE TABLE subset(DROP=height weight) AS
SELECT *
FROM raj
;
QUIT;
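Another way to get the same column subset in PROC SQL, sketched here as an alternative to the DROP= data set option, is to name the wanted columns in the SELECT clause. The table name subset_cols is hypothetical.

```sas
PROC SQL;
  /* subset_cols is a hypothetical name; only the listed columns are stored */
  CREATE TABLE subset_cols AS
    SELECT fname, sex, age
      FROM raj;
QUIT;
```

Listing columns also fixes their order in the new table, which KEEP and DROP do not.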
;
QUIT;
Creating Subtotals:
Suppose that instead of an overall summary, we want the computations stratified
by SEX. The PROC SUMMARY code shown previously can be adapted by
inserting a CLASS statement and coding the NWAY option (to suppress
production of the grand overall statistics, which we no longer want). Here is the
code:
PROC SUMMARY DATA=raj NWAY;
CLASS sex;
VAR age height weight;
OUTPUT OUT=group_averages(DROP = _type_ _freq_)
MIN (age )=Youngest
MAX (age )=Oldest
MEAN(height)=Avg_Height
MEAN(weight)=Avg_Weight;
RUN;
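The stratified summary above could also be written with a GROUP BY clause in PROC SQL. This is a minimal sketch under the assumption that the same four statistics are wanted; the table name group_averages_sql is hypothetical.

```sas
PROC SQL;
  /* group_averages_sql is a hypothetical name used only for this sketch */
  CREATE TABLE group_averages_sql AS
    SELECT sex,
           MIN(age)     AS Youngest,
           MAX(age)     AS Oldest,
           MEAN(height) AS Avg_Height,
           MEAN(weight) AS Avg_Weight
      FROM raj
      GROUP BY sex;   /* one output row per value of SEX */
QUIT;
```

No counterpart of NWAY or of dropping _type_ and _freq_ is needed, because SQL produces only the grouped rows and only the columns that are selected.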
Conditionality:
It is not uncommon to have values that depend on other values—in other words,
conditionality. Probably the most common way of implementing conditionality in
the DATA step is the IF/THEN/ELSE structure.
For example, suppose that students of different ages and sexes are to go on
different field trips. The 11-year-olds (boys and girls) are going to the zoo; girls
who are not going to the zoo (that is, 12-year-old girls) are going to the museum;
and boys who aren’t going to the zoo have to stay behind. Here’s one way of
generating a list of individual student destinations:
DATA trip_list;
SET raj;
LENGTH trip $ 6; /* long enough for 'Museum' and '[None]'; avoids truncation */
IF age=11 THEN trip = 'Zoo';
ELSE IF sex='F' THEN trip = 'Museum';
ELSE trip = '[None]';
KEEP fname age sex trip;
RUN;
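In PROC SQL, the usual counterpart of IF/THEN/ELSE is a CASE expression. The following is a minimal sketch of the same field trip logic; the table name trip_list_sql is hypothetical.

```sas
PROC SQL;
  /* trip_list_sql is a hypothetical name used only for this sketch */
  CREATE TABLE trip_list_sql AS
    SELECT fname,
           age,
           sex,
           CASE
             WHEN age = 11  THEN 'Zoo'      /* all 11-year-olds */
             WHEN sex = 'F' THEN 'Museum'   /* remaining girls  */
             ELSE '[None]'                  /* remaining boys   */
           END AS trip
      FROM raj;
QUIT;
```

The WHEN conditions are tested in order, so the second one is reached only by students who are not 11, just as with ELSE IF in the DATA step.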
DATA girls;
SET raj;
WHERE sex='F';
RUN;
PROC SQL;
CREATE TABLE girls AS
SELECT *
FROM raj
WHERE sex='F'
;
QUIT;
PROC SQL;
SELECT *
FROM raj
WHERE age=10
;
QUIT;
PROC SQL;
CREATE TABLE tens AS
SELECT *
FROM raj
WHERE age=10
;
QUIT;
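PROC SUMMARY DATA=raj NWAY; /* opening statements assumed, inferred from the SQL version */
CLASS sex age;
VAR height;
OUTPUT MAX(height)=Tallest MIN(height)=Shortest
OUT=hilo(DROP = _type_ _freq_);
RUN;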
PROC SQL;
CREATE TABLE hilo AS
SELECT sex,
age,
MAX(height) AS Tallest,
MIN(height) AS Shortest
FROM raj
GROUP BY sex, age
;
QUIT;
Reordering Rows:
The purpose of PROC SORT is to reorder observations. In PROC SQL, an ORDER BY
clause does the same job. For example:
PROC SQL;
CREATE TABLE age_sort AS
SELECT *
FROM raj
ORDER BY age DESCENDING, fname
;
QUIT;
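For comparison, a PROC SORT step that should produce the same ordering is sketched here. The output name age_sort_ds is hypothetical, chosen so that the SQL result is not overwritten.

```sas
/* age_sort_ds is a hypothetical name used only for this sketch */
PROC SORT DATA=raj OUT=age_sort_ds;
  BY DESCENDING age fname;   /* DESCENDING applies only to AGE */
RUN;
```

Note the difference in keyword placement: in PROC SORT, DESCENDING comes before the variable it modifies, while in ORDER BY it comes after.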
First, create the data set:
PROC SQL;
CREATE TABLE sex_age AS
SELECT sex, age
FROM raj
;
QUIT;
SQL has a special keyword, DISTINCT, to specify that duplicate rows are to be
eliminated. The keyword appears in the SELECT statement or clause, immediately
following SELECT and preceding the list of columns. So the SQL code to
eliminate duplicates from our table is:
PROC SQL;
CREATE TABLE sex_age_distinct AS
SELECT DISTINCT *
FROM sex_age
;
QUIT;
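Outside of SQL, one common way to get the same result is the NODUPKEY option of PROC SORT. This is a minimal sketch; the output name sex_age_nodup is hypothetical.

```sas
/* sex_age_nodup is a hypothetical name used only for this sketch */
PROC SORT DATA=sex_age OUT=sex_age_nodup NODUPKEY;
  BY sex age;   /* keep one row per SEX and AGE combination */
RUN;
```

Because sex_age holds only the two BY variables, removing duplicate keys here is the same as removing duplicate rows.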
PROC SQL;
CREATE TABLE teens AS
SELECT name AS FName,
age
FROM sashelp.class
WHERE age>12
;
QUIT;
Summary Statistics Using PROC FREQ:
PROC FREQ DATA=teens NOPRINT;
TABLES age / OUT=cohorts(DROP=percent RENAME=(count=Many) );
RUN;
To combine these counts with the original data, we first sort that original data:
PROC SORT DATA=teens OUT=sorted;
BY age;
RUN;
Then we combine the original data with the counts, via a MERGE statement:
DATA detail_and_counts;
MERGE sorted cohorts;
BY age;
RUN;
We now have all of the data together, but the names are grouped by AGE and thus
not in alphabetical order. So we sort again to restore the original alphabetical
order:
PROC SORT DATA=detail_and_counts;
BY fname;
run;
It has taken four steps to get the output. In contrast, using SQL, we can simply
write:
PROC SQL;
CREATE TABLE detail_and_counts AS
SELECT fname,
age,
COUNT(*) AS Many
FROM teens
GROUP BY age
ORDER BY fname
;
QUIT;
Deriving our unweighted mean via PROC MEANS is more complicated, and is a
two-step proposition. First we have to eliminate repetitions of AGE values; one
way to do this is with PROC FREQ. Then we can find the average of these distinct
(unduplicated) AGE values, using PROC MEANS.
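A minimal sketch of those two steps follows. It assumes the teens table created earlier; the data set name distinct_ages and the MAXDEC=3 setting are illustrative choices, not taken from the original.

```sas
/* Step 1: one row per distinct AGE value.
   distinct_ages is a hypothetical name used only for this sketch. */
PROC FREQ DATA=teens NOPRINT;
  TABLES age / OUT=distinct_ages(KEEP=age);
RUN;

/* Step 2: simple mean of those distinct values */
PROC MEANS DATA=distinct_ages MEAN MAXDEC=3;
  VAR age;
RUN;
```

Each distinct age contributes exactly once to the second step, which is what makes the resulting mean unweighted.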
This derivation can be done in just one PROC SQL statement. We can even display
the simple weighted mean alongside. The code is:
PROC SQL;
SELECT MEAN( age)
LABEL = 'Weighted' FORMAT=8.3,
MEAN(DISTINCT age)
LABEL = 'Unweighted' FORMAT=8.3
FROM teens; QUIT;