Programmatic Approach Using DATA Step and PROC SQL Creating A SAS Data Set Using DATA Step

The document provides examples of creating SAS data sets, selecting variables, filtering data, performing calculations and summary statistics, and combining/reordering data using both the DATA step and PROC SQL approaches. Key differences between the two approaches are discussed such as using SELECT vs MERGE statements to combine data and summaries, and using DISTINCT to eliminate duplicate values in PROC SQL.

Uploaded by

Tejavath Prashanth Kumar

Programmatic Approach using DATA Step and PROC SQL

Creating a SAS data set Using DATA Step:

DATA raj;
SET [Link];
WHERE age<13;
LABEL name = 'First Name';
RENAME name = FName;
FORMAT height weight 5.1;
RUN;

Using the Simplest SELECT Statement:

To display all variables of a data set (a SELECT on its own prints the results; it does not create a data set):
PROC SQL;
SELECT *
FROM raj
;
QUIT;

A More selective Statement:

To print only selected variables with PROC PRINT, list them in a VAR statement:

PROC PRINT NOOBS LABEL DATA=raj;
VAR fname age;
RUN;

Using PROC SQL:

PROC SQL;
SELECT fname, age
FROM raj
;
QUIT;

Storing Results:
Very often you don’t want to display results. Instead you want to store them for
use in subsequent computations. That’s what this DATA step will do:

Page 1 of 11
E-Mail: contact@[Link] Phone: +91-9848733309/+91-9676828080
[Link] & [Link]
DATA new;
SET raj;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE new AS
SELECT *
FROM raj
;
QUIT;

Column Subsets:
What if you don’t need all of the variables available in the existing data set? In the
DATA step, a KEEP statement can be used to identify those to be stored in the new
data set. For example:
DATA subset;
SET raj;
KEEP fname sex age;
RUN;

Using Proc SQL we can create subsets.

PROC SQL;
CREATE TABLE subset AS
SELECT fname, sex, age
FROM raj
;
QUIT;

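Sometimes it is easier to list the variables to exclude. In the DATA step, a DROP statement does this:
DATA subset;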
SET raj;
DROP height weight;
RUN;

Standard SQL has no counterpart to the DROP statement. However, this approach to
subsetting can be implemented if we turn to SAS features available within PROC SQL.
Specifically, we can code the DROP= data set option for the table being created, as in:

PROC SQL;
CREATE TABLE subset(DROP=height weight) AS
SELECT *
FROM raj
;
QUIT;
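
Data set options can also be attached to the input table rather than the output table. The following is a minimal sketch, assuming the same raj table; the output name subset2 is hypothetical and is used only to avoid overwriting subset:

```sas
/* Sketch: drop the columns as the input table is read.      */
/* subset2 is a hypothetical name chosen for illustration.   */
PROC SQL;
CREATE TABLE subset2 AS
SELECT *
FROM raj(DROP=height weight)
;
QUIT;
```

Either placement should give the same columns here; dropping on input means the excluded variables are never brought into the query.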

Creating New Columns:

Using Data step:
DATA ratios;
SET raj;
ATTRIB Ratio FORMAT=5.2 LABEL='Weight:Height Ratio';
ratio = weight / height;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE ratios AS
SELECT *,
weight / height AS Ratio
FORMAT=5.2 LABEL='Weight:Height Ratio'
FROM raj
;
QUIT;

Calculate Descriptive Statistics:

Using PROC SUMMARY:
PROC SUMMARY DATA=raj;
VAR age height weight;
OUTPUT OUT=overall_averages(DROP = _type_ _freq_)
MIN (age )=Youngest
MAX (age )=Oldest
MEAN(height)=Avg_Height
MEAN(weight)=Avg_Weight;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE overall_averages AS
SELECT MIN (age) AS Youngest,
MAX (age) AS Oldest,
MEAN(height) AS Avg_Height FORMAT=5.1,
MEAN(weight) AS Avg_Weight FORMAT=5.1
FROM raj
;
QUIT;

Creating Subtotals:
Suppose that instead of an overall summary, we want the computations stratified
by SEX. The PROC SUMMARY code shown previously can be adapted by
inserting a CLASS statement and coding the NWAY option (to suppress
production of the grand overall statistics, which we no longer want). Here is the
code:
PROC SUMMARY DATA=raj NWAY;
CLASS sex;
VAR age height weight;
OUTPUT OUT=group_averages(DROP = _type_ _freq_)
MIN (age )=Youngest
MAX (age )=Oldest
MEAN(height)=Avg_Height
MEAN(weight)=Avg_Weight;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE group_averages AS
SELECT sex,
MIN (age) AS Youngest,
MAX (age) AS Oldest,
MEAN(height) AS Avg_Height FORMAT=5.1,
MEAN(weight) AS Avg_Weight FORMAT=5.1
FROM raj
GROUP BY sex
;
QUIT;

Conditionality
It is not uncommon to have values that depend on other values—in other words,
conditionality. Probably the most common way of implementing conditionality in
the DATA step is the IF/THEN/ELSE structure.
For example, suppose that students of different ages and sexes are to go on
different field trips. The 11-year-olds (boys and girls) are going to the zoo; girls
who are not going to the zoo (that is, 12-year-old girls) are going to the museum;
and boys who aren’t going to the zoo have to stay behind. Here’s one way of
generating a list of individual student destinations:

DATA trip_list;
SET raj;
LENGTH Trip $ 6; /* long enough for 'Museum' and '[None]', so no value is truncated */
IF age=11 THEN trip = 'Zoo';
ELSE IF sex='F' THEN trip = 'Museum';
ELSE trip = '[None]';
KEEP fname age sex trip;
RUN;

USING PROC SQL:

PROC SQL;
CREATE TABLE trip_list AS
SELECT fname,
age,
sex,
CASE WHEN age=11 THEN 'Zoo'
WHEN sex='F' THEN 'Museum'
ELSE '[None]'
END
AS Trip
FROM raj
;
QUIT;

USING THE SELECT STATEMENT IN THE DATA STEP:

DATA trip_list;
SET raj;
LENGTH Trip $ 6; /* long enough for 'Museum' and '[None]', so no value is truncated */
SELECT;
WHEN (age=11) trip = 'Zoo';
WHEN (sex='F') trip = 'Museum';
OTHERWISE trip = '[None]';
END;
KEEP fname age sex trip;
RUN;

FILTERING USING WHERE STATEMENT:
Using DATA STEP:

DATA girls;
SET raj;
WHERE sex='F';
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE girls AS
SELECT *
FROM raj
WHERE sex='F'
;
QUIT;

The same WHERE clause can be used whether the result is displayed or stored in a table:

PROC SQL;
SELECT *
FROM raj
WHERE age=10
;
QUIT;

PROC SQL;
CREATE TABLE tens AS
SELECT *
FROM raj
WHERE age=10
;
QUIT;

Filtering Aggregated Data:

SQL has a second filtering device, the HAVING clause. The distinction between
this and the WHERE clause is that HAVING conditions can reference summary
statistics and are evaluated after aggregations are performed. Thus they take effect
“downstream,” on the output side of the process.

To illustrate, consider this PROC SUMMARY step, which calculates the extreme
values of the HEIGHT variable and does so separately for each SEX/AGE
combination:

Using PROC SUMMARY:

PROC SUMMARY DATA=raj NWAY;

CLASS sex age;
OUTPUT MAX(height)=Tallest MIN(height)=Shortest
OUT= hilo(DROP = _type_ _freq_);
RUN;

Using Proc SQL:

PROC SQL;
CREATE TABLE hilo AS
SELECT sex,
age,
MAX(height) AS Tallest,
MIN(height) AS Shortest
FROM raj
GROUP BY sex, age
;
QUIT;
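
The query above summarizes by group but does not yet filter on the summaries. A minimal sketch of a HAVING clause follows, assuming we want only the SEX/AGE groups whose tallest member exceeds 60; the threshold 60 and the table name tall_groups are hypothetical and chosen only for illustration:

```sas
/* Sketch: HAVING filters groups after aggregation.            */
/* The threshold 60 and the name tall_groups are hypothetical. */
PROC SQL;
CREATE TABLE tall_groups AS
SELECT sex,
       age,
       MAX(height) AS Tallest,
       MIN(height) AS Shortest
FROM raj
GROUP BY sex, age
HAVING MAX(height) > 60
;
QUIT;
```

A WHERE clause could not express this condition, because WHERE is applied to individual rows before the MAX is computed.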

Reordering Rows:
The purpose of PROC SORT is the reordering of observations. For example, this step
sorts by descending AGE and, within each age, by FNAME:

PROC SORT DATA=raj OUT=age_sort;

BY DESCENDING age fname;
RUN;

USING PROC SQL:

PROC SQL;
CREATE TABLE age_sort AS
SELECT *
FROM raj
ORDER BY age DESCENDING, fname
;
QUIT;

Elimination of Duplicate Records:

Eliminating duplicate rows from a table is a common task. To illustrate, we first
need to have a data set containing duplicates. We’ll get one by eliminating some of
the columns in Raj:

First create Dataset:

PROC SQL;
CREATE TABLE sex_age AS
SELECT sex, age
FROM RAJ
;
QUIT;

A commonly used technique for eliminating duplicates is the NODUPRECS option of
PROC SORT:

PROC SORT DATA=sex_age OUT=sex_age_distinct NODUPRECS;

BY _ALL_;
RUN;

SQL has a special keyword, DISTINCT, to specify that duplicate rows are to be
eliminated. The keyword appears in the SELECT statement or clause, immediately
following SELECT and preceding the list of columns. So the SQL code to
eliminate duplicates from our table is:

PROC SQL;
CREATE TABLE sex_age_distinct AS
SELECT DISTINCT *
FROM sex_age
;
QUIT;
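
DISTINCT can also be applied while the columns are being selected, which makes the intermediate sex_age table unnecessary. A minimal sketch, assuming the raj table created earlier:

```sas
/* Sketch: select the two columns and remove duplicate rows in one query. */
PROC SQL;
CREATE TABLE sex_age_distinct AS
SELECT DISTINCT sex, age
FROM raj
;
QUIT;
```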

Combining Summary Statistics with Original detail:

Let’s begin by creating a table we will use in the examples:
PROC SQL;
CREATE TABLE teens AS
SELECT name AS FName,
age
FROM [Link]
WHERE age>12
;
QUIT;

Summary Stats using Proc Freq:

PROC FREQ DATA=teens NOPRINT;
TABLES age / OUT=cohorts(DROP=percent RENAME=(count=Many) );
RUN;

To combine these counts with the original data, we first sort that original data:
PROC SORT DATA=teens OUT=sorted;
BY age;
RUN;

Then we combine the original data with the counts, via a MERGE statement:
DATA detail_and_counts;
MERGE sorted cohorts;
BY age;
RUN;
We now have all of the data together, but the names are grouped by AGE and thus
not in alphabetical order. So we sort again to restore the original alphabetical
order:
PROC SORT DATA=detail_and_counts;
BY fname;
run;

It has taken four steps to get the output. In contrast, using SQL, we can simply
write:
PROC SQL;
CREATE TABLE detail_and_counts AS
SELECT fname,
age,
COUNT(*) AS Many
FROM teens
GROUP BY age
ORDER BY fname
;
QUIT;
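
The query above selects FNAME, which is neither grouped nor summarized, so PROC SQL has to merge the group counts back onto the detail rows; SAS normally writes a note about remerging summary statistics to the log when this happens. The sketch below writes that join out explicitly, which may be easier to port to other SQL dialects. The aliases t and c are hypothetical names introduced for illustration:

```sas
/* Sketch: the same result with the count-per-age join made explicit. */
/* Aliases t (detail rows) and c (counts) are hypothetical.           */
PROC SQL;
CREATE TABLE detail_and_counts AS
SELECT t.fname,
       t.age,
       c.Many
FROM teens AS t
     INNER JOIN
     (SELECT age, COUNT(*) AS Many
      FROM teens
      GROUP BY age) AS c
     ON t.age = c.age
ORDER BY t.fname
;
QUIT;
```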

Summary Statistics based on Distinct Values:

Sometimes, when we need summary statistics derived from our data, we want the
computer to ignore repetition of values. For example, suppose we want to know the
average of the AGE values that occur in our TEENS table, ignoring repetitions of
those values. In other words, we need an unweighted mean of AGE, in the sense
that we want to include each particular value (13 and so on) only once, no matter
how many times it may appear. The weighted mean is pretty simple to calculate,
with or without SQL. The non-SQL code is:

PROC MEANS DATA=teens MEAN MAXDEC=3;
VAR age;
RUN;

Deriving our unweighted mean via PROC MEANS is more complicated, and is a
two-step proposition. First we have to eliminate repetitions of AGE values; one way
to do this is with PROC FREQ:

PROC FREQ DATA=teens NOPRINT;

TABLES age / out=freq2means(KEEP = age);
RUN;

Now we can proceed to find the average of these distinct (unduplicated) AGE
values, using PROC MEANS:

PROC MEANS DATA=freq2means MEAN MAXDEC=3;
VAR age;
RUN;

This derivation can be done in just one PROC SQL statement. We can even display
the simple weighted mean alongside. The code is:

PROC SQL;
SELECT MEAN( age)
LABEL = 'Weighted' FORMAT=8.3,
MEAN(DISTINCT age)
LABEL = 'Unweighted' FORMAT=8.3
FROM teens
;
QUIT;
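
To see why the two means can differ, it may help to compare the number of AGE values with the number of distinct AGE values. A minimal sketch, assuming the teens table created earlier; the column labels are hypothetical:

```sas
/* Sketch: count all non-missing AGE values and the distinct ones. */
/* The labels 'Rows' and 'Distinct Ages' are hypothetical.         */
PROC SQL;
SELECT COUNT(age)
       LABEL = 'Rows',
       COUNT(DISTINCT age)
       LABEL = 'Distinct Ages'
FROM teens
;
QUIT;
```

If the two counts are equal, no AGE value repeats and the weighted and unweighted means coincide.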
