Proc SQL
WHY LEARN PROC SQL?
• PROC SQL can not only retrieve information
without having to learn SAS syntax, but it can
often do this with fewer and shorter
statements than traditional SAS code.
• Additionally, SQL often uses fewer resources
than conventional DATA and PROC steps.
Further, the knowledge learned is transferable
to other SQL packages.
PROC SQL Syntax
PROC SQL is SAS’ implementation of the Structured Query Language (SQL). It is a powerful Base SAS Procedure that combines the functionality of DATA and PROC steps into a
single step.
PROC SQL <options>;
  CREATE TABLE Dataset-Name AS
  SELECT <DISTINCT> <* | column(s)>
  FROM <table-name | view-name>
  WHERE <expression>
  GROUP BY <column(s)>
  HAVING <expression>
  ORDER BY <column(s)>;
QUIT;
Note: The clauses must appear in the order shown above.
• The CREATE TABLE statement provides the ability to create a new dataset (same as the DATA statement in SAS).
• The SELECT statement names the columns that will appear in the dataset and the order in which they will appear.
• The FROM clause specifies the input dataset name (same as the SET statement in SAS).
• The WHERE clause selects the rows meeting certain condition(s) (same as the WHERE statement in SAS).
• The GROUP BY clause is used for aggregating/summarizing the data.
• The ORDER BY clause returns the data in sorted order of the specified column(s) (same as PROC SORT in SAS).
The DATA step processes a dataset one record at a time, using an implied loop that moves from the first record to
the last record. SQL treats all the records as a single object, so SQL built-in functions work over the
entire dataset.
Among the significant differences: PROC SQL can use SAS DATA step functions, SAS macros, and
macro variables. Unlike a DATA step MERGE, PROC SQL can do a many-to-many merge and does not require the
datasets to be sorted before merging. It can also easily join on variables with different names and perform a
Cartesian product. SQL also allows merging where the match condition is not equality.
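A minimal sketch of a join on a non-equality condition, using unsorted inputs whose columns have different names. The datasets emp and band, and all of their column names and values, are hypothetical and exist only for illustration.

```sas
/* Hypothetical sample data (names and values are assumptions) */
DATA emp;
  INPUT EmpID Salary;
  DATALINES;
3 40000
1 15000
2 25000
;
RUN;

DATA band;
  INPUT Grade $ LowSal HighSal;
  DATALINES;
A 0 20000
B 20001 30000
C 30001 99999
;
RUN;

PROC SQL;
  /* Match each salary to the band whose range contains it.
     Neither input is sorted, and the columns compared have different names. */
  CREATE TABLE emp_band AS
  SELECT e.EmpID, e.Salary, b.Grade
  FROM emp AS e, band AS b
  WHERE e.Salary BETWEEN b.LowSal AND b.HighSal
  ORDER BY e.EmpID;
QUIT;
```

With this sample data each employee falls into exactly one band, so emp_band has one row per employee.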
Don’t use SQL for large datasets
ABSTRACT
• PROC SQL is a powerful Base SAS Procedure that
combines the functionality of DATA and PROC steps into a
single step.
• PROC SQL can sort, summarize, subset, join (merge), and
concatenate datasets, create new variables, and print the
results or create a new table or view all in one step!
• PROC SQL can be used to retrieve, update, and report on
information from SAS data sets or other database products.
This paper will concentrate on SQL's syntax and how to
access information from existing SAS data sets.
PROC SQL Statements
List all information in the table:
An asterisk on the SELECT statement will select all columns from the dataset.
PROC SQL;
  SELECT *
  FROM trg.table;
QUIT;
Limiting information on the SELECT:
To specify certain variables in the output dataset, the variables are listed on the SELECT statement and separated by a comma (,). The SELECT statement does NOT limit the number of variables read.
PROC SQL;
  CREATE TABLE new AS
  SELECT LastName, FirstName, Salary
  FROM trg.table;
QUIT;
SAS features work with PROC SQL:
Dataset options such as DROP= can be applied to the input table.
PROC SQL;
  CREATE TABLE new AS
  SELECT *
  FROM trg.table (DROP = JobCode);
QUIT;
The SELECT statement is mandatory on every PROC SQL query.
Creating new variables:
Variables can be dynamically created in PROC SQL.
PROC SQL;
  CREATE TABLE new AS
  SELECT Salary, Salary*0.05 AS Tax
  FROM table1;
QUIT;
Sorting the data using PROC SQL:
If the data is already in sorted order, PROC SQL will print a message in the LOG stating the sorting utility was not used.
PROC SQL;
  CREATE TABLE new AS
  SELECT *
  FROM table1
  ORDER BY JobCode, Salary DESCENDING;
QUIT;
Subsetting using the WHERE:
The WHERE clause selects a subset of data rows before they are processed further.
The WHERE clause cannot reference a computed variable from the same SELECT by its alias alone; the CALCULATED keyword is required.
PROC SQL;
  CREATE TABLE new AS
  SELECT Salary, HireDate
  FROM table1
  WHERE Salary > 20000;
QUIT;
Removing duplicates
NoDup:
PROC SQL;
  CREATE TABLE new AS
  SELECT DISTINCT JobCode, Location
  FROM table1;
QUIT;
NoDupkey: Some function has to be used to get the other variables.
PROC SQL;
  CREATE TABLE new AS
  SELECT DISTINCT JobCode, MAX(HireDate) AS HireDate
  FROM table1
  GROUP BY JobCode;
QUIT;
Using FORMATS:
SAS-defined or user-defined formats can be used to improve the appearance of variables in a dataset.
PROC SQL;
  CREATE TABLE new AS
  SELECT HireDate, Salary, Salary*0.05 AS Tax FORMAT = DOLLAR10.2
  FROM table1
  WHERE Salary > 10000
  ORDER BY Salary, HireDate DESCENDING;
QUIT;
The CALCULATED option:
The CALCULATED component must refer to a variable created within the same SELECT statement.
PROC SQL;
  SELECT Country, Salary*0.05 AS Tax, (Salary*0.05)*.01 AS Rebate
  FROM table1;
QUIT;
-OR-
PROC SQL;
  SELECT Country, Salary*0.05 AS Tax, CALCULATED Tax * .01 AS Rebate
  FROM table1;
QUIT;
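CALCULATED can also be used in a WHERE clause to filter on a column computed in the same SELECT. The sketch below assumes a small stand-in for table1 with made-up values; only the Salary column and the 5% tax rate come from the examples above.

```sas
/* Hypothetical stand-in for table1 (values are assumptions) */
DATA table1;
  INPUT Salary;
  DATALINES;
15000
25000
40000
;
RUN;

PROC SQL;
  CREATE TABLE new AS
  SELECT Salary, Salary*0.05 AS Tax
  FROM table1
  /* "WHERE Tax > 1000" alone would fail, because Tax is not a column
     of table1; CALCULATED tells PROC SQL to use the computed value. */
  WHERE CALCULATED Tax > 1000;
QUIT;
```

With these values only the rows with Salary 25000 and 40000 are kept, since their computed Tax exceeds 1000.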
Any of the DATA step functions can be used in an expression to create a new variable except LAG, DIF, and SOUND.
PROC SQL Examples
PROC SQL;
  CREATE TABLE new AS
  SELECT COUNT(*) AS n,
         ROUND(MEAN(Salary),0.01) AS SalaryMean FORMAT = 8.2,
         PUT(Salary,12.) AS Salary FORMAT = $12.
  FROM table1;
QUIT;
PROC SQL;
  CREATE TABLE new AS
  SELECT JobCode, COUNT(JobCode) AS n, Salary, SUM(Salary) AS SalaryTotal,
         ROUND(Salary/(CALCULATED SalaryTotal)*100,0.01) AS Salpct FORMAT = 6.2
  FROM table1
  GROUP BY JobCode;
QUIT;
Note:
• Summary functions are restricted to the SELECT and HAVING clauses only.
• All the above functions will be calculated for each JobCode group, not for each
observation.
• The GROUP BY clause requires at least one summary function in the SELECT statement; otherwise it
is transformed into an ORDER BY clause.
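A minimal sketch of a GROUP BY query that returns one row per group. When a non-grouped column such as Salary is also selected, SAS remerges the summary values back onto every input row; selecting only the grouping column and summary functions avoids that. The sample data and the output names by_job and SalaryTotal are hypothetical.

```sas
/* Hypothetical stand-in for table1 (values are assumptions) */
DATA table1;
  INPUT JobCode $ Salary;
  DATALINES;
A1 10000
A1 30000
B2 20000
;
RUN;

PROC SQL;
  /* Every selected column is either the GROUP BY column or a summary
     function, so the result has one row per JobCode. */
  CREATE TABLE by_job AS
  SELECT JobCode,
         COUNT(*)    AS n,
         SUM(Salary) AS SalaryTotal
  FROM table1
  GROUP BY JobCode;
QUIT;
```

With this data by_job has two rows: A1 with n = 2 and SalaryTotal = 40000, and B2 with n = 1 and SalaryTotal = 20000.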
PROC SQL;
  CREATE TABLE new AS
  SELECT JobCode, COUNT(JobCode) AS n, HireDate FORMAT = date7., Salary, MAX(Salary) AS SalaryMax
  FROM table1
  GROUP BY JobCode
  HAVING Salary = CALCULATED SalaryMax
  ORDER BY CALCULATED SalaryMax DESCENDING;
QUIT;
Note: Predicates in the HAVING clause are applied after the groups are formed, whereas predicates in the WHERE clause are
applied before the groups are formed.
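The difference between the two clauses can be sketched with one query that uses both. The sample data, the thresholds, and the names high_avg and AvgSalary are hypothetical.

```sas
/* Hypothetical stand-in for table1 (values are assumptions) */
DATA table1;
  INPUT JobCode $ Salary;
  DATALINES;
A1 10000
A1 30000
B2 20000
B2 22000
;
RUN;

PROC SQL;
  /* WHERE runs first: the 10000 row is dropped before grouping.
     HAVING runs afterwards: only groups whose remaining rows
     average above 25000 are kept. */
  CREATE TABLE high_avg AS
  SELECT JobCode, MEAN(Salary) AS AvgSalary
  FROM table1
  WHERE Salary > 10000
  GROUP BY JobCode
  HAVING MEAN(Salary) > 25000;
QUIT;
```

Here A1 averages 30000 after the WHERE filter and is kept, while B2 averages 21000 and is removed by HAVING. Without the WHERE clause, A1 would average 20000 and also be removed.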
The CASE Expression:
The CASE expression can be used to create a new variable that is a "re-categorization" of the values of another variable. Coding the WHEN conditions in descending order of probability will improve efficiency because SAS will stop checking the CASE conditions as soon as it finds the first true value.
PROC SQL;
  CREATE TABLE new AS
  SELECT LastName, Salary,
    CASE
      WHEN Salary <= 20000 THEN 'Low Sal'
      WHEN Salary <= 30000 THEN 'High Salary'
      ELSE 'Very High'
    END AS SalType LENGTH = 12
  FROM table1
  ORDER BY Salary;
QUIT;
String Operations
• SQL includes a string-matching operator (LIKE) for comparisons on character strings. Patterns are described using two special characters:
    percent (%). The % character matches any substring.
    underscore (_). The _ character matches any single character.
• Find the names of all customers whose street includes the substring "Main":
    SELECT name
    FROM customer
    WHERE street LIKE '%Main%';
• SQL supports a variety of string operations such as
    concatenation (using "||")
    converting from upper to lower case (and vice versa)
    finding string length, extracting substrings, etc.
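A minimal sketch of these string operations in PROC SQL, using the SAS functions UPCASE, LOWCASE, LENGTH, SUBSTR, and TRIM. The customer rows and the output column names are hypothetical.

```sas
/* Hypothetical customer data (values are assumptions) */
DATA customer;
  LENGTH name $ 10 street $ 20;
  INFILE DATALINES DLM=',';
  INPUT name $ street $;
  DATALINES;
Smith,12 Main Street
Jones,4 Park Avenue
;
RUN;

PROC SQL;
  SELECT name,
         UPCASE(name)                  AS name_upper,    /* upper case */
         LOWCASE(name)                 AS name_lower,    /* lower case */
         LENGTH(street)                AS street_len,    /* length without trailing blanks */
         SUBSTR(street, 1, 2)          AS street_start,  /* first two characters */
         TRIM(name) || ', ' || street  AS name_street    /* concatenation */
  FROM customer
  WHERE street LIKE '%Main%';   /* keeps only streets containing "Main" */
QUIT;
```

TRIM is applied before concatenation because SAS character values are padded with trailing blanks to the column length.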
Merging Using SQL
Cartesian Join
• A Cartesian join is when you join every row of one table to every row of another
table. You can also get one by joining every row of a table to every row of itself.
• Example. Run the following code on the dataset forsql and note the results:
PROC SQL;
  CREATE TABLE matrix AS
  SELECT * FROM
    (SELECT ans AS ans0001 FROM trg.forsql WHERE var='0001'),
    (SELECT ans AS ans0006 FROM trg.forsql WHERE var='0006'),
    (SELECT ans AS ans0003 FROM trg.forsql WHERE var='0003')
  ORDER BY ans0001, ans0006, ans0003;
QUIT;
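For readers without access to forsql, the same idea can be sketched on two tiny tables. The datasets sizes and colors and their contents are hypothetical.

```sas
/* Hypothetical sample data (names and values are assumptions) */
DATA sizes;
  INPUT size $;
  DATALINES;
S
M
;
RUN;

DATA colors;
  INPUT color $;
  DATALINES;
red
blue
green
;
RUN;

PROC SQL;
  /* No WHERE clause: every row of sizes is paired with every row
     of colors, giving 2 x 3 = 6 rows. */
  CREATE TABLE combos AS
  SELECT s.size, c.color
  FROM sizes AS s, colors AS c;
QUIT;
```

The row count of a Cartesian join is the product of the input row counts, which is why it grows quickly on larger tables.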
A Cartesian join also allows a table to be joined to itself.
Ex. Find the names of all branches that have greater assets than some branch located in Brooklyn.
PROC SQL;
  SELECT DISTINCT T.branch_name
  FROM branch AS T, branch AS S
  WHERE (T.assets > S.assets) AND S.branch_city = 'Brooklyn';
QUIT;
Different Types of Joins