SAS & SQL Techniques for Data Analysis

PROC SQL is a powerful tool for summarizing and manipulating data in SAS. It can be used to select, filter, sort, join, and aggregate data. Some key capabilities demonstrated in the document include:
1. Creating new tables from existing data sets using SELECT statements.
2. Filtering rows using WHERE clauses and conditional logic like CASE statements.
3. Grouping and aggregating data using functions like COUNT, MIN, MAX, and MEAN with GROUP BY.
4. Combining original detail data with summary statistics in a single table.
5. Performing tasks like sorting, subsetting columns, and eliminating duplicates with concise SQL syntax.

Uploaded by

S Sreenivasulu

SQL

By: Raj
Programmatic Approach using DATA Step and PROC SQL
Creating a SAS data set Using DATA Step:
DATA raj;
SET [Link];
WHERE age<13;
LABEL name = 'First Name';
RENAME name = FName;
FORMAT height weight 5.1;
RUN;

Using the Simplest SELECT Statement:

To display all variables of a data set (nothing is stored):
PROC SQL;
SELECT *
FROM raj
;
QUIT;

A More Selective Statement:

To print selected variables:
With PROC PRINT, we use a VAR statement to name the variables to print:

PROC PRINT NOOBS LABEL DATA=raj;

VAR fname age;
RUN;

Using PROC SQL:

PROC SQL;
SELECT fname, age
FROM raj
;
QUIT;

Storing Results:
Very often you don’t want to display results. Instead you want to store them for
use in subsequent computations. That’s what this DATA step will do:

DATA new;
SET raj;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE new AS
SELECT *
FROM raj
;
QUIT;

Column Subsets:
What if you don’t need all of the variables available in the existing data set? In
the DATA step, a KEEP statement can be used to identify those to be stored in the
new data set. For example:
DATA subset;
SET raj;
KEEP fname sex age;
RUN;

Using PROC SQL, we list only the wanted columns in the SELECT clause:

PROC SQL;
CREATE TABLE subset AS
SELECT fname, sex, age
FROM raj
;
QUIT;

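Sometimes it is easier to name the variables to leave out. In the DATA step, a
DROP statement does this:
DATA subset;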
SET raj;
DROP height weight;
RUN;

The SELECT clause has no direct counterpart to the DROP statement. However, the
same approach to subsetting can be implemented if we turn to SAS features
available within PROC SQL. Specifically, we can code the DROP= data set option
for the table being created, as in:

PROC SQL;
CREATE TABLE subset(DROP=height weight) AS
SELECT *

FROM raj
;
QUIT;

Creating New Columns:

Using Data step:
DATA ratios;
SET raj;
ATTRIB Ratio FORMAT=5.2 LABEL='Weight:Height Ratio';
ratio = weight / height;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE ratios AS
SELECT *,
weight / height AS Ratio
FORMAT=5.2 LABEL='Weight:Height Ratio'
FROM raj
;
QUIT;

Calculate Descriptive Statistics:

Using PROC SUMMARY:
PROC SUMMARY DATA=raj;
VAR age height weight;
OUTPUT OUT=overall_averages(DROP = _type_ _freq_)
MIN (age )=Youngest
MAX (age )=Oldest
MEAN(height)=Avg_Height
MEAN(weight)=Avg_Weight;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE overall_averages AS
SELECT MIN (age) AS Youngest,
MAX (age) AS Oldest,
MEAN(height) AS Avg_Height FORMAT=5.1,
MEAN(weight) AS Avg_Weight FORMAT=5.1
FROM raj

;
QUIT;

Creating Subtotals:
Suppose that instead of an overall summary, we want the computations stratified
by SEX. The PROC SUMMARY code shown previously can be adapted by
inserting a CLASS statement and coding the NWAY option (to suppress
production of the grand overall statistics, which we no longer want). Here is the
code:
PROC SUMMARY DATA=raj NWAY;
CLASS sex;
VAR age height weight;
OUTPUT OUT=group_averages(DROP = _type_ _freq_)
MIN (age )=Youngest
MAX (age )=Oldest
MEAN(height)=Avg_Height
MEAN(weight)=Avg_Weight;
RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE group_averages AS
SELECT sex,
MIN (age) AS Youngest,
MAX (age) AS Oldest,
MEAN(height) AS Avg_Height FORMAT=5.1,
MEAN(weight) AS Avg_Weight FORMAT=5.1
FROM raj
GROUP BY sex
;
QUIT;

Conditionality
It is not uncommon to have values that depend on other values—in other words,
conditionality. Probably the most common way of implementing conditionality in
the DATA step is the IF/THEN/ELSE structure.
For example, suppose that students of different ages and sexes are to go on
different field trips. The 11-year-olds (boys and girls) are going to the zoo; girls
who are not going to the zoo (that is, 12-year-old girls) are going to the museum;
and boys who aren’t going to the zoo have to stay behind. Here’s one way of
generating a list of individual student destinations:


DATA trip_list;
SET raj;
LENGTH Trip $ 6; /* long enough for 'Museum' and '[None]'; avoids truncation */
IF age=11 THEN Trip = 'Zoo';
ELSE IF sex='F' THEN trip = 'Museum';
ELSE trip = '[None]';
KEEP fname age sex trip;
RUN;

USING PROC SQL:

PROC SQL;
CREATE TABLE trip_list AS
SELECT fname,
age,
sex,
CASE WHEN age=11 THEN 'Zoo'
WHEN sex='F' THEN 'Museum'
ELSE '[None]'
END
AS Trip
FROM raj
;
QUIT;

USING SELECT STATEMENT IN DATA STEP:

DATA trip_list;
SET RAJ;
LENGTH Trip $ 6; /* long enough for 'Museum' and '[None]'; avoids truncation */
SELECT;
WHEN (age=11) Trip = 'Zoo';
WHEN (sex='F') trip = 'Museum';
OTHERWISE trip = '[None]';
END;
KEEP fname age sex trip;
RUN;

FILTERING USING WHERE STATEMENT:

Using DATA STEP:

DATA girls;
SET raj;
WHERE sex='F';

RUN;

Using PROC SQL:

PROC SQL;
CREATE TABLE girls AS
SELECT *
FROM raj
WHERE sex='F'
;
QUIT;

A WHERE clause can also filter rows that are simply displayed, here the
10-year-olds:
PROC SQL;
SELECT *
FROM raj
WHERE age=10
;
QUIT;

To store the 10-year-olds in a new table instead:
PROC SQL;
CREATE TABLE tens AS
SELECT *
FROM raj
WHERE age=10
;
QUIT;

Filtering Aggregated Data:

SQL has a second filtering device, the HAVING clause. The distinction between
this and the WHERE clause is that HAVING conditions can reference summary
statistics and are evaluated after aggregations are performed. Thus they take effect
“downstream,” on the output side of the process.
To illustrate, consider this PROC SUMMARY step, which calculates the extreme
values of the HEIGHT variable and does so separately for each SEX/AGE
combination:

Using PROC SUMMARY:

PROC SUMMARY DATA=raj NWAY;

CLASS sex age;
VAR height;
OUTPUT MAX(height)=Tallest MIN(height)=Shortest
OUT= hilo(DROP = _type_ _freq_);
RUN;

Using Proc SQL:

PROC SQL;
CREATE TABLE hilo AS
SELECT sex,
age,
MAX(height) AS Tallest,
MIN(height) AS Shortest
FROM raj
GROUP BY sex, age
;
QUIT;
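
Neither step above actually uses the HAVING clause that the text describes. A
minimal sketch of how it could be added follows. The demo data set, its rows,
the table name hilo_tall and the threshold of 58 are all hypothetical, chosen
only so the example runs on its own:

```sas
/* Hypothetical sample rows, standing in for the raj data set */
DATA demo;
INPUT fname $ sex $ age height;
DATALINES;
Alice F 11 56.5
Jane F 12 59.8
Joyce F 11 51.3
James M 12 57.3
John M 12 59.0
Thomas M 11 57.5
;
RUN;

PROC SQL;
CREATE TABLE hilo_tall AS
SELECT sex,
age,
MAX(height) AS Tallest,
MIN(height) AS Shortest
FROM demo
GROUP BY sex, age
/* HAVING is evaluated after aggregation, so it can test MAX(height) */
HAVING MAX(height) > 58
;
QUIT;
```

With these sample rows, only the two age-12 groups have a tallest member above
58, so hilo_tall keeps two of the four SEX/AGE groups. A WHERE clause could not
express this condition, because it is applied to individual rows before any
maximum has been computed.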

Reordering Rows:
The purpose of PROC SORT is the reordering of observations. For example, to list
the students from oldest to youngest, and alphabetically within each age, we run:

PROC SORT DATA=raj OUT=age_sort;

BY DESCENDING age fname;
RUN;

USING PROC SQL:

PROC SQL;
CREATE TABLE age_sort AS
SELECT *
FROM raj
ORDER BY age DESCENDING, fname
;
QUIT;

Elimination of Duplicate Records:

Eliminating duplicate rows from a table is a common task. To illustrate, we first
need to have a data set containing duplicates. We’ll get one by eliminating some of
the columns in Raj:

First, create the data set:

PROC SQL;
CREATE TABLE sex_age AS
SELECT sex, age
FROM RAJ
; QUIT;

A commonly used technique for elimination of the duplicates is to use the
NODUPRECS option of PROC SORT:

PROC SORT DATA=sex_age OUT=sex_age_distinct NODUPRECS;

BY _ALL_;
RUN;

SQL has a special keyword, DISTINCT, to specify that duplicate rows are to be
eliminated. The keyword appears in the SELECT statement or clause, immediately
following SELECT and preceding the list of columns. So the SQL code to
eliminate duplicates from our table is:

PROC SQL;
CREATE TABLE sex_age_distinct AS
SELECT DISTINCT *
FROM sex_age
;
QUIT;

Combining Summary Statistics with Original detail:

Let’s begin by creating a table we will use in the examples:

PROC SQL;
CREATE TABLE teens AS
SELECT name AS FName,
age
FROM [Link]
WHERE age>12
;
QUIT;

Summary Stats using Proc Freq:
PROC FREQ DATA=teens NOPRINT;
TABLES age / OUT=cohorts(DROP=percent RENAME=(count=Many) );
RUN;

To combine these counts with the original data, we first sort that original data:
PROC SORT DATA=teens OUT=sorted;
BY age;
RUN;

Then we combine the original data with the counts, via a MERGE statement:
DATA detail_and_counts;
MERGE sorted cohorts;
BY age;
RUN;
We now have all of the data together, but the names are grouped by AGE and thus
not in alphabetical order. So we sort again to restore the original alphabetical
order:
PROC SORT DATA=detail_and_counts;
BY fname;
run;

It has taken four steps to get the output. In contrast, using SQL, we can simply
write:
PROC SQL;
CREATE TABLE detail_and_counts AS
SELECT fname,
age,
COUNT(*) AS Many

FROM teens
GROUP BY age
ORDER BY fname
;
QUIT;
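
This query selects fname while grouping only by age. PROC SQL handles that by
remerging the group count onto every detail row, and it writes a note to the
log saying that summary statistics were remerged with the original data. A
sketch of what the remerge amounts to, written as an explicit join, is shown
here. The demo_teens rows, the output table name and the aliases d and c are
hypothetical:

```sas
/* Hypothetical sample rows, standing in for the teens table */
DATA demo_teens;
INPUT fname $ age;
DATALINES;
Alfred 14
Barbara 13
Carol 14
Henry 14
Jeffrey 13
;
RUN;

PROC SQL;
CREATE TABLE detail_and_counts_join AS
SELECT d.fname,
d.age,
c.Many
FROM demo_teens AS d
INNER JOIN
/* inline view: one row per age, with the number of students of that age */
(SELECT age, COUNT(*) AS Many
FROM demo_teens
GROUP BY age) AS c
ON d.age = c.age
ORDER BY d.fname
;
QUIT;
```

With these sample rows, each 14-year-old gets Many=3 and each 13-year-old gets
Many=2. The explicit join is longer, but it makes the two levels of detail
visible and is the form to reach for in SQL dialects that do not remerge.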

Summary Statistics based on Distinct Values:

Sometimes, when we need summary statistics derived from our data, we want the
computer to ignore repetition of values. For example, suppose we want to know the
average of the AGE values that occur in our TEENS table, ignoring repetitions of
those values. In other words, we need an unweighted mean of AGE, in the sense
that we want to include each particular value (13 and so on) only once, no matter
how many times it may appear. The weighted mean is pretty simple to calculate,
with or without SQL. The non-SQL code is:

PROC MEANS DATA=teens MEAN MAXDEC=3;
VAR age;
RUN;

Deriving our unweighted mean via PROC MEANS is more complicated, and is a
two-step proposition. First we have to eliminate repetitions of AGE values; one
way to do this is with PROC FREQ:

PROC FREQ DATA=teens NOPRINT;

TABLES age / out=freq2means(KEEP = age);
RUN;

Now we can proceed to find the average of these distinct (unduplicated) AGE
values, using PROC MEANS:

PROC MEANS DATA=freq2means MEAN MAXDEC=3;

VAR age;
RUN;

This derivation can be done in just one PROC SQL statement. We can even display
the simple weighted mean alongside. The code is:

PROC SQL;
SELECT MEAN( age)
LABEL = 'Weighted' FORMAT=8.3,
MEAN(DISTINCT age)
LABEL = 'Unweighted' FORMAT=8.3
FROM teens
;
QUIT;
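
To make the difference between the two means concrete, here is a small worked
sketch. The demo_teens rows and the table names mean_compare and mean_distinct
are hypothetical. The second query spells out what MEAN(DISTINCT age) does:
remove repeated ages first, then average what is left.

```sas
/* Hypothetical sample rows: ages 14, 13, 14, 14, 13 */
DATA demo_teens;
INPUT fname $ age;
DATALINES;
Alfred 14
Barbara 13
Carol 14
Henry 14
Jeffrey 13
;
RUN;

PROC SQL;
/* both means in one query, stored rather than displayed */
CREATE TABLE mean_compare AS
SELECT MEAN(age) AS Weighted,
MEAN(DISTINCT age) AS Unweighted
FROM demo_teens
;
/* the unweighted mean spelled out: de-duplicate, then average */
CREATE TABLE mean_distinct AS
SELECT MEAN(age) AS Unweighted
FROM (SELECT DISTINCT age FROM demo_teens)
;
QUIT;
```

For these rows the weighted mean is (14+13+14+14+13)/5 = 13.6, while the
unweighted mean is (13+14)/2 = 13.5, and both queries agree on the latter.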
