16. Introduction to PROC SQL — Intro to SAS Notes

PROC SQL is a SAS procedure that allows for efficient data manipulation and querying using Structured Query Language (SQL). It enables users to select, subset, sort, summarize, and group data with fewer statements compared to traditional SAS procedures. The document provides an overview of PROC SQL syntax, examples of its usage, and details on various clauses such as SELECT, WHERE, and CASE for data retrieval and manipulation.

Uploaded by

sridhar
Copyright
© © All Rights Reserved

3/4/25, 10:27 AM 16.

Introduction to PROC SQL — Intro to SAS Notes

16. Introduction to PROC SQL


PROC SQL is a powerful tool for data manipulation and querying. It can perform many of the same functions as conventional DATA and
PROC steps, often with fewer statements and fewer computer resources. In this lesson, we will investigate how to select, subset, sort,
summarize and group data with the SQL procedure.

16.1. PROC SQL Basics


PROC SQL is a procedure that SAS developed for the implementation of Structured Query Language. You can use this procedure to
modify, retrieve and report data in tables and views (created on tables). Just as with other SAS procedures, PROC SQL also has basic
syntax structures. It takes the following general form:

PROC SQL;
SELECT column-1<,…column-n>
FROM table-1|view-1<,…table-n|view-n>
<WHERE expression>
<GROUP BY column-1<,…column-n>>
<HAVING expression >
<ORDER BY column-1<,…column-n >>;
QUIT;

First of all, you may notice differences in terminology between SQL and other SAS steps. A data file is called a data set in
other SAS steps but a table in SQL. Correspondingly, records are called observations in the previous lessons but rows in SQL tables,
and a field is called a variable in a data set but a column in this lesson.

Other SAS steps     SQL Procedure
data set            table
observation         row
variable            column

Another thing that needs your attention is that, unlike other SAS procedures, PROC SQL may contain one or more SELECT statements.
Each SELECT statement is called a query and is composed of clauses such as SELECT, FROM, WHERE, GROUP
BY, HAVING and ORDER BY. The order of these clauses matters: they must appear in the order shown above.
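
As a minimal sketch of the clause order described above, the following query uses all six clauses in the required sequence. The table work.scores and its values are hypothetical and are not part of the course data.

```sas
/* Hypothetical toy table, not one of the course data sets */
DATA work.scores;
    INPUT id gender $ gpa;
    DATALINES;
1 F 3.6
2 M 3.1
3 F 3.9
4 M 2.8
5 F 3.2
;
RUN;

PROC SQL;
    select gender, count(*) as n, avg(gpa) as mean_gpa  /* 1. SELECT   */
        from work.scores                                 /* 2. FROM     */
        where gpa > 3                                    /* 3. WHERE    */
        group by gender                                  /* 4. GROUP BY */
        having count(*) >= 1                             /* 5. HAVING   */
        order by gender;                                 /* 6. ORDER BY */
QUIT;
```

Swapping any two clauses (for example, placing ORDER BY before WHERE) should produce a syntax error in the log, which is a quick way to confirm the ordering rule.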

We will use the whole lesson to work our way through all these keywords in PROC SQL. Let’s start with the most basic one.

Example
The following SAS SQL code is a simple query that retrieves data from a single table:

LIBNAME phc6089 "/folders/myfolders/SAS_Notes/data";


PROC SQL;
select ID, SATM, SATV
from phc6089.survey;
QUIT;

To run the program above, you will need to save the SAS data file (survey.sas7bdat) to your computer first (see the data folder on
the course website). Edit the LIBNAME statement to reflect the directory in which you saved the survey data set. Then run the program
and check the output.

The SQL procedure in this code represents the most basic form of the procedure. As with other SAS procedures, you begin with the
PROC SQL statement to invoke it. Inside the procedure, there is only one statement, which starts with SELECT and chooses
the columns you want. You can list as many columns as needed, separated by commas. The other clause is FROM, which
specifies the table(s). PROC SQL follows the same naming conventions as other SAS files. Here we used a two-level name to reference the
permanent file. As the code suggests, this program selects three columns (student id, SAT Math score and SAT
Verbal score) from the table.

Example
The following SAS program uses the CREATE TABLE statement to create a new table named SAT_scores, which contains student id,
SAT math scores and SAT verbal scores.

PROC SQL;
CREATE TABLE SAT_Scores as
select ID, SATM, SATV
from phc6089.survey;
QUIT;

Launch and run the SAS program. You may notice that there is no output displayed in the SAS output window or any open ODS
destination. That’s because the CREATE TABLE statement suppresses the printed output of the query. However, if you check the
SAS log window, you will see a message indicating that the table has been created, along with the number of rows and columns in the
table. In this example, table SAT_scores has 226 rows and 3 columns. The new table’s columns have the
same attributes (type, length, format, label) as the selected source columns.
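
The sketch below shows two ways to inspect a table created by CREATE TABLE without printing it. It assumes a small stand-in table, work.survey_demo, with made-up values; &SQLOBS is the automatic macro variable that PROC SQL sets to the number of rows processed by the last statement, and DESCRIBE TABLE writes the column definitions to the log.

```sas
/* Hypothetical stand-in for phc6089.survey */
DATA work.survey_demo;
    INPUT ID SATM SATV;
    DATALINES;
1001 700 700
1002 640 600
1003 590 .
;
RUN;

PROC SQL;
    CREATE TABLE work.sat_demo as
        select ID, SATM, SATV
            from work.survey_demo;

    /* Row count of the table just created, written to the log */
    %put NOTE: rows written by the last statement = &SQLOBS;

    /* Column names, types, lengths and formats, written to the log */
    describe table work.sat_demo;
QUIT;
```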

https://users.phhp.ufl.edu/rlp176/Courses/PHC6089/SAS_notes/16_PROC_SQL.html 1/45
From these two examples, you now have some idea of what PROC SQL is like to work with. Let’s summarize what sets it
apart from other SAS procedures.

1. Unlike other SAS procedures, which contain many kinds of statements, the SQL procedure may consist of one or more
SELECT statements. Each SELECT statement contains several clauses, such as SELECT, FROM, WHERE and ORDER BY. The
SELECT and FROM clauses are required; the other clauses are optional. All clauses have to be written in
the order listed in the syntax. Each SELECT statement needs only one semicolon, at the end of the
statement.
2. No RUN statement is required for PROC SQL to execute. Each statement runs as soon as it is submitted, and the procedure
stays active afterwards. To end it, you have to submit another PROC step, a DATA step, or a QUIT statement.
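
The two points above can be sketched with a single PROC SQL step that holds two queries. The table work.demo is hypothetical; the point is only that each query ends with one semicolon, no RUN statement appears, and QUIT ends the procedure.

```sas
/* Hypothetical toy table */
DATA work.demo;
    INPUT id score;
    DATALINES;
1 10
2 20
3 30
;
RUN;

PROC SQL;
    /* Query 1: executes as soon as its closing semicolon is read */
    select id, score
        from work.demo;

    /* Query 2: same procedure step, no RUN statement in between */
    select id, score
        from work.demo
        where score > 15
        order by score;
QUIT;   /* ends the procedure */
```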

16.2. Using the SELECT Clause


In the previous section, we learned the basics of PROC SQL. Next, we will look more closely at the SELECT statement:
how to use it to retrieve data and create new columns, and what options are available for data manipulation.

Example
The following SAS program creates a new temporary table with all columns retrieved from permanent file traffic.sas7bdat (see the
data folder on the course website):

PROC SQL;
CREATE TABLE traffic as
select *
from phc6089.traffic;
QUIT;

PROC CONTENTS data=traffic VARNUM;


RUN;

PROC CONTENTS data=phc6089.traffic VARNUM;


RUN;

First, you need to download the permanent SAS data file traffic to your own computer. Revise the libname statement as needed.
Then run the program.

One thing you need to know about this program is the shortcut, an asterisk (*) after SELECT. The asterisk refers to all
columns in the original table. So, this code selects all columns from the permanent table into the temporary table, traffic.

To check the data, you may use the other procedures we learned in previous lessons, such as the PRINT procedure. In the above
program, PROC CONTENTS has been used to check the variable attributes in the original and the new table. As we mentioned in
the previous section, the variables chosen from other table(s) keep the same attributes.

Besides selecting original columns, the SELECT clause can also be used to create new columns, just as we used assignment
statements in the DATA step to create new variables.

Example
The following program creates new columns with the SELECT statement:

PROC SQL;
select id, count_location,
scan(count_location,-1,' ') as orientation,
street,
passing_vehicle_volume * 0.5 as weekends_traffic_volume
from traffic;
QUIT;

As you can see, this code uses the traffic table we created previously. Using the SELECT statement, you can create new columns
that contain either characters or numbers. Any valid SAS expression can be used in the SELECT clause to define a new
column, and the new column can be named by using the keyword AS followed by the name you would like to use. (Column
names also follow the rules for SAS names.) In the above code, the first new column is created by the character function scan(),
which extracts the last word of the existing column count_location as the orientation information. The name given to this new
column after AS is orientation. (The column may not be meaningful; it serves only as an example.) The second new column is a
math expression that estimates the traffic volume during weekends by multiplying the daily vehicle volume by 0.5. Its alias is
weekends_traffic_volume.

Launch and run the SAS program, and review the output to convince yourself that SAS does indeed create two new columns as
you expect. Note, however, that the new columns exist only for the duration of the query unless you create a table from it.
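
To keep computed columns beyond the query, the same SELECT can be wrapped in CREATE TABLE. The following is a minimal sketch under the assumption of a two-row stand-in table, work.traffic_demo; the rows are invented and only the column names follow the traffic example.

```sas
/* Hypothetical rows; column names mirror the traffic example */
DATA work.traffic_demo;
    INFILE DATALINES DSD;
    INPUT id count_location :$30. passing_vehicle_volume;
    DATALINES;
1,1708 West,5100
2,2700 South,9200
;
RUN;

PROC SQL;
    /* Store the computed columns in a new table */
    CREATE TABLE work.traffic_new as
        select id, count_location,
               scan(count_location, -1, ' ') as orientation,
               passing_vehicle_volume * 0.5 as weekends_traffic_volume
            from work.traffic_demo;

    /* The stored table now carries both new columns */
    select *
        from work.traffic_new;
QUIT;
```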

While observing the data in traffic, you may notice that some data are not formatted as you want. Fortunately, SAS provides many
options in the SELECT statement so you can enhance the appearance of the query output.

Example
The following program adds a format to the dates, labels the columns and adds titles to the output:


PROC SQL;
TITLE "Traffic volume in Area SS";
TITLE2 "During weekdays and weekends";
select id,
Date_of_count label='Date of Count' format=mmddyy10.,
count_location label='Location',
street,
passing_vehicle_volume label='Daily Volume' format=comma6.,
passing_vehicle_volume * 0.5 as weekends_traffic_volume
label='Weekends Volume' format=comma6.
from traffic;
QUIT;

Launch and run the SAS program and then review the resulting output to convince yourself that the data has been formatted and
labeled as you expect. Besides titles, you can also add a footnote to the output using the FOOTNOTE statement. But unlike TITLE
and FOOTNOTE statements in other SAS steps, both statements have to be placed either before the PROC SQL statement or
between the PROC SQL statement and the SELECT statement.
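
A small sketch of the placement rule, using a hypothetical table work.demo. It also clears the title and footnote afterwards: TITLE and FOOTNOTE are global statements that stay in effect until replaced or cleared, which is presumably why later query outputs in this lesson still carry the traffic titles.

```sas
/* Hypothetical toy table */
DATA work.demo;
    INPUT id score;
    DATALINES;
1 1200
2 3400
;
RUN;

PROC SQL;
    /* Placed between the PROC SQL statement and the SELECT statement */
    TITLE "Demo scores";
    FOOTNOTE "Source: hypothetical data";
    select id,
           score label='Score' format=comma6.
        from work.demo;
QUIT;

/* Clear both so they do not appear on output from later steps */
TITLE;
FOOTNOTE;
```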

One more thing we will talk about in this section is the CASE operator, which is used within the SELECT clause to create new columns
conditionally. You may remember that IF-THEN-ELSE statements are available only in the DATA step. In PROC SQL,
the CASE operator performs the equivalent function. First, let’s look at the syntax for the CASE construct.

CASE
WHEN when-condition THEN result-expression
<… WHEN when-condition THEN result-expression>
<ELSE result-expression>
END <AS column-name>

As with IF-THEN statements, you can add as many WHEN conditions as you want. The conditions can be any valid SAS expression,
including calculations, functions, and logical operators. The construct also works like IF-THEN statements: the WHEN conditions are
checked in order, and for the first one that is true SAS returns the result expression following the keyword THEN. If no WHEN
condition is true, PROC SQL returns the ELSE expression. You can name the new column with the AS keyword after END. The ELSE
and AS keywords are optional; if ELSE is omitted, rows that match no WHEN condition receive a missing value. It’s also
good practice to keep the original columns while creating new ones.
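
To make the comparison with IF-THEN/ELSE concrete, here is a minimal sketch that builds the same derived column twice, once in a DATA step and once with CASE. The table work.pay, the column band and the cut-offs are all hypothetical.

```sas
/* Hypothetical toy table with one missing salary */
DATA work.pay;
    INPUT name $ salary;
    DATALINES;
ANN 60000
BOB 90000
CAT 130000
DAN .
;
RUN;

/* DATA step version: IF-THEN/ELSE chain */
DATA work.pay_ds;
    SET work.pay;
    LENGTH band $6;
    IF salary = . THEN band = ' ';
    ELSE IF salary < 85000 THEN band = 'low';
    ELSE IF salary < 125000 THEN band = 'middle';
    ELSE band = 'high';
RUN;

/* PROC SQL version: WHEN conditions are tested in the same order */
PROC SQL;
    select name, salary,
           case
               when salary is missing then ' '
               when salary < 85000 then 'low'
               when salary < 125000 then 'middle'
               else 'high'
           end as band
        from work.pay;
QUIT;
```

Testing for the missing value first matters in both versions, because a SAS missing numeric value compares as smaller than any number and would otherwise fall into the first range.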

Example
The following SAS program uses the CASE operator to assign a different salary raise plan to each salary range:

PROC SQL ;
select Name,
Department,
employee_annual_salary label='salary' format=dollar12.2,
'next year raise:',
case
when employee_annual_salary=. then .
when employee_annual_salary < 85000 then 0.05
when 85000 <= employee_annual_salary < 125000 then 0.03
when employee_annual_salary >=125000 then 0.01
else 0
end as raise format=percent8.
from phc6089.salary;
QUIT;

You already know the format and label options from the previous explanations. There are a couple of new things in this example,
however. First, you can insert a character (or numeric) constant as a new column in the table. Here the character string “next year
raise:” has been added between salary and raise. Second, raise is also a new column, created by the CASE operator
based on the current annual salary of each person.

Download the SAS data set salary.sas7bdat (see the data folder on the course website) to your computer and revise the LIBNAME
statement to reflect the directory where you saved the file. Then launch and run the program. Review the query result to convince
yourself that the raise values have been assigned correctly.

The CASE operator has two forms of syntax. In fact, if you use only one column for WHEN condition(s), this column’s name can be
put after CASE and before WHEN. So you don’t have to repeat the column’s name in each WHEN condition. Below is the syntax for
this form:

CASE <column-name>
WHEN when-condition THEN result-expression
<… WHEN when-condition THEN result-expression>
<ELSE result-expression>
END <AS column-name>

Example
The following program uses the simpler form of CASE construct to decide compensation (Yes or N/A) based on departments:


PROC SQL outobs=20;


select Name,
Department,
employee_annual_salary label='salary' format=dollar12.2,
case department
when 'POLICE' then 'Yes'
when 'FIRE' then 'Yes'
else 'N/A'
end as Compensation
from phc6089.salary;
QUIT;

Traffic volume in Area SS


During weekdays and weekends

Name Department salary Compensation

WARNER PROCUREMENT $76,980.00 N/A

EDWARDS POLICE $83,616.00 Yes

PENDARVIS POLICE $103,590.00 Yes

CLARK FIRE $85,680.00 Yes

FOLINO POLICE $83,616.00 Yes

SAWYER CITY COUNCIL $117,333.00 N/A

WALTON POLICE $89,718.00 Yes

QUINLAN TRANSPORTN $95,888.04 N/A

SOTO WATER MGMNT $79,040.00 N/A

RUSS WATER MGMNT $79,040.00 N/A

MC GUIRE POLICE $86,520.00 Yes

GAWRISCH POLICE $86,520.00 Yes

CANO FIRE $86,520.00 Yes

CLARK STREETS & SAN $72,862.40 N/A

GIBOWICZ POLICE $83,616.00 Yes

MANCILLA POLICE $90,456.00 Yes

WYATT WATER MGMNT $65,686.40 N/A

VALLE FIRE $54,114.00 Yes

SISSAC PUBLIC LIBRARY $12,407.20 N/A

ARMSTEAD POLICE $80,778.00 Yes

The above code uses the same data set as the previous example, salary. It assigns different compensation plans based on
which department people work for and creates a new column, Compensation, for the result. This time, the column name
Department has been moved out of the WHEN conditions and placed after the CASE keyword, so we no longer need to write
“WHEN department=’POLICE’”.
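
The two forms of CASE can be placed side by side to check that they agree. This is a sketch on a hypothetical three-row table, work.staff; the column names comp_simple and comp_full are made up for the comparison.

```sas
/* Hypothetical toy table */
DATA work.staff;
    INPUT name $ department $;
    DATALINES;
LEE POLICE
KIM FIRE
RAY FINANCE
;
RUN;

PROC SQL;
    select name, department,
           /* Simple form: column named once after CASE */
           case department
               when 'POLICE' then 'Yes'
               when 'FIRE' then 'Yes'
               else 'N/A'
           end as comp_simple,
           /* Full form: column repeated in every WHEN condition */
           case
               when department = 'POLICE' then 'Yes'
               when department = 'FIRE' then 'Yes'
               else 'N/A'
           end as comp_full
        from work.staff;
QUIT;
```

The two derived columns should hold identical values in every row.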

Another feature is the OUTOBS=n option, which you can use in the PROC SQL statement. It limits the number of rows
displayed in the output. So in this case, we would expect the table in the output window to show only the first 20 rows of the data, and
the following warning message is written to the log:

WARNING: Statement terminated early due to OUTOBS=20 option.

Note that OUTOBS= will also affect tables that are created by the CREATE TABLE statement.
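
A quick way to see the effect on CREATE TABLE is to build a table under OUTOBS= and then count its rows in a separate PROC SQL step. The tables work.numbers and work.first_five are hypothetical names used only for this sketch.

```sas
/* Hypothetical table with 100 rows */
DATA work.numbers;
    DO n = 1 TO 100;
        OUTPUT;
    END;
RUN;

/* OUTOBS= applies to the table written by CREATE TABLE as well */
PROC SQL outobs=5;
    CREATE TABLE work.first_five as
        select n
            from work.numbers;
QUIT;

/* New PROC SQL step, so the OUTOBS= option above no longer applies */
PROC SQL;
    select count(*) as rows_kept
        from work.first_five;
QUIT;
```

If OUTOBS= behaves as described above, rows_kept should equal the OUTOBS= value rather than 100.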

Launch and run the program. Then check the query result to make sure the records have been processed as expected. Note that
you have to be cautious with this simpler form. For instance, if you move Employee_annual_salary out of the WHEN conditions in
the program of the previous example, SAS will report an error and not execute!

16.3. Using the WHERE Clause


As you know, the WHERE statement or option in the DATA step or other procedures is very useful for selecting observations from a data
set based on some criteria. In PROC SQL, the WHERE clause in the SELECT statement can likewise be used to subset data based on
specified conditions. Any valid SAS expression can be put inside the WHERE clause, including functions, comparison or logical operators
and even some special operators. Making good use of it can greatly increase programming efficiency and save computing resources.
As always, we will work through this subject with examples.

Example
The following example uses the WHERE clause to select employees who work in the police department and have the job title of
sergeant:


PROC SQL;
select Name, Department, Employee_annual_salary
from phc6089.salary
where Department='POLICE' AND Position_title='SERGEANT';
QUIT;

Traffic volume in Area SS


During weekdays and weekends

Name Department Employee Annual Salary

PENDARVIS POLICE 103590

ODUM POLICE 106920

DANIELS POLICE 106920

TATE JR POLICE 103590

GOLDSMITH POLICE 110370

FRANKO POLICE 106920

RADDATZ POLICE 110370

FLEMING POLICE 100440

CASEY POLICE 110370

MC COY POLICE 110370

JACOBS POLICE 106920

Reading through the program, you can see that it selects the name, department and annual salary information from the
salary data for police sergeants. Note that the columns in the WHERE clause do not have to be specified in the SELECT clause:
Position_title, for instance, is used in the WHERE clause but not in the SELECT clause. However, to make the results easier to
check, I would suggest keeping such columns in the query until the results are verified.

Launch and run the SAS program, and review the output to convince yourself that the records have been selected as described.

We saw two types of operators used in the above program, the comparison (=) and the logical (and). Besides these common ones,
another type that could be very useful in your programming is called a conditional operator. You may know some of them already, like
IN, CONTAINS and MISSING. You can find the complete list of operators in the SAS documentation. Next, let’s look at a couple of
examples using the BETWEEN AND and LIKE operators.

BETWEEN value-1 AND value-2

Both value-1 and value-2 are end values, and both are included in the range. You can use the BETWEEN AND operator to specify a
range of values, such as from one date to another, or from a lower limit to an upper limit. The smaller value does not have to come first.
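
The following sketch compares three ways of writing the same range condition on a hypothetical table, work.pay, whose values sit on and around the two end points. It is meant as a way to check the inclusive and order-independent behavior described above on your own system.

```sas
/* Hypothetical toy table with values on and around the end points */
DATA work.pay;
    INPUT name $ salary;
    DATALINES;
A 64000
B 65000
C 67500
D 70000
E 71000
;
RUN;

PROC SQL;
    /* Smaller value first */
    select name, salary
        from work.pay
        where salary between 65000 and 70000;

    /* Larger value first */
    select name, salary
        from work.pay
        where salary between 70000 and 65000;

    /* Explicit comparison operators, both end points included */
    select name, salary
        from work.pay
        where salary >= 65000 and salary <= 70000;
QUIT;
```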

Example
The following program uses the BETWEEN AND operator to select observations from the salary data whose annual salary is
between $65,000 and $70,000 and who work in the Fire department:

PROC SQL;
select Name, Department,
Employee_annual_salary label='Salary' format=DOLLAR12.2
from phc6089.salary
where Employee_annual_salary between 65000 and 70000
and Department='FIRE';
QUIT;

Traffic volume in Area SS


During weekdays and weekends

Name Department Salary

KELLY FIRE $65,946.00

KOCHANEY FIRE $65,946.00

Launch and run the SAS program, and review the query output to convince yourself that SAS yields the result as expected.

Another useful operator is the LIKE operator:

column LIKE 'pattern'

With the LIKE operator, you have to specify a column name and the pattern to be matched. The pattern is case-sensitive
and has to be enclosed in quotation marks. It may also contain the special characters underscore (_) and
percent sign (%). The underscore stands for any single character, and the percent sign for any sequence of zero or more
characters. For example, assume that you are working with a table containing these values for a column.

Cathy
Kathy
Kathie
Katherine

Now using different patterns, the selection results are different:

Pattern     Result
Kath_       Kathy
Kath__      Kathie
Kath%       Kathy, Kathie, Katherine
_ath%       All of the names above
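
To try the patterns in the table above yourself, the four names can be loaded into a small table and queried one pattern at a time. The table work.names is hypothetical; each query gets its own title so the outputs are easy to tell apart.

```sas
/* The four example names in a hypothetical one-column table */
DATA work.names;
    INPUT name :$12.;
    DATALINES;
Cathy
Kathy
Kathie
Katherine
;
RUN;

PROC SQL;
    TITLE "Pattern: Kath_";
    select name from work.names where name like 'Kath_';

    TITLE "Pattern: Kath__";
    select name from work.names where name like 'Kath__';

    TITLE "Pattern: Kath%";
    select name from work.names where name like 'Kath%';

    TITLE "Pattern: _ath%";
    select name from work.names where name like '_ath%';
QUIT;

TITLE;   /* clear the title for later steps */
```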

Example
The following program shows the use of the LIKE operator in a WHERE clause to select name, department, position title and
annual salary information for people whose name starts with R and the third letter is B:

PROC SQL;
select Name, Department, Position_title,
Employee_annual_salary label='Salary' format=DOLLAR12.2
from phc6089.salary
where Name like 'R_B%';
QUIT;

Traffic volume in Area SS


During weekdays and weekends

Name Department Position Title Salary

ROBINSON WATER MGMNT OPERATING ENGINEER-GROUP C $93,745.60

RABANALES FINANCE AUDITOR II $87,912.00

ROBERTS FIRE PARAMEDIC I/C $79,404.00

ROBINSON WATER MGMNT CONSTRUCTION LABORER $79,040.00

Launch and run the SAS program, and review the query output to convince yourself that SAS behaves as described.

Another point worth making here is the CALCULATED keyword. In the last section you learned that we can perform
calculations in the SELECT statement and assign an alias to the new column. However, because SAS processes the WHERE clause
before the SELECT clause, you will run into a problem if the alias of a calculated column is used in a WHERE clause condition. Therefore,
the keyword CALCULATED has to be inserted into the WHERE clause before the alias to inform SAS that the value is calculated
within the query. This point is illustrated by the following programs.

Example
The following program attempts to calculate the bonus for every employee, then select those whose bonus is more than $2,000:

PROC SQL;
select Name, Department,
Employee_annual_salary label='Salary' format=DOLLAR12.2,
Employee_annual_salary * 0.02 as Bonus
from phc6089.salary
where Bonus > 2000 ;
QUIT;


178 PROC SQL;
179 select Name, Department,
180 Employee_annual_salary label='Salary' format=DOLLAR12.2,
181 Employee_annual_salary * 0.02 as Bonus
182 from phc6089.salary
183 where Bonus >2000 ;
NOTE: Data file PHC6089.SALARY.DATA is in a format that is native to
another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which
might require additional CPU resources and might reduce
performance.
ERROR: The following columns were not found in the contributing
tables: Bonus.
NOTE: PROC SQL set option NOEXEC and will continue to check the
syntax of statements.
184 QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

Launch and run the SAS program to see for yourself what goes wrong. In the log window, SAS delivers an error
message that the column Bonus cannot be found (see output above). That’s because SAS processes the WHERE clause before
the SELECT clause. To make it right, add CALCULATED in the WHERE clause as shown below.

PROC SQL;
select Name, Department,
Employee_annual_salary label='Salary' format=DOLLAR12.2,
Employee_annual_salary * 0.02 as Bonus
from phc6089.salary
where CALCULATED Bonus > 2000 ;
QUIT;

Traffic volume in Area SS


During weekdays and weekends


Name Department Salary Bonus

PENDARVIS POLICE $103,590.00 2071.8

SAWYER CITY COUNCIL $117,333.00 2346.66

ODUM POLICE $106,920.00 2138.4

PLANTZ GENERAL SERVICES $102,060.00 2041.2

FORD FIRE $100,440.00 2008.8

DANIELS POLICE $106,920.00 2138.4

MALONEY FIRE $127,566.00 2551.32

MOLLOY COMMUNITY DEVELOPMENT $102,060.00 2041.2

NIEGO FIRE $143,682.00 2873.64

PRICE FIRE $103,590.00 2071.8

TATE JR POLICE $103,590.00 2071.8

JIMENEZ PROCUREMENT $113,448.00 2268.96

CEBALLOS WATER MGMNT $113,448.00 2268.96

FERMAN FIRE $131,466.00 2629.32

SHUM TRANSPORTN $104,736.00 2094.72

FUNK FIRE $101,688.00 2033.76

WRZESINSKI FIRE $105,918.00 2118.36

GOLDSMITH POLICE $110,370.00 2207.4

FRANKO POLICE $106,920.00 2138.4

RADDATZ POLICE $110,370.00 2207.4

MC NABB FIRE $105,918.00 2118.36

FLEMING POLICE $100,440.00 2008.8

CASEY POLICE $110,370.00 2207.4

MACELLAIO JR WATER MGMNT $101,920.00 2038.4

DARLING LAW $149,160.00 2983.2

NASH FINANCE $103,740.00 2074.8

MC COY POLICE $110,370.00 2207.4

HOLDER WATER MGMNT $108,534.40 2170.688

TAYLOR FIRE $108,462.00 2169.24

PERFETTI POLICE $112,068.00 2241.36

JACOBS POLICE $106,920.00 2138.4

HENRY AVIATION $108,534.40 2170.688

IRELAND FIRE $113,400.00 2268

Now it’s working! Make the same change to your program. Check the output to make sure that SAS processes the data properly.

An alternative to using the keyword CALCULATED is to repeat the calculation expression in the WHERE clause. In the preceding
program, the WHERE clause can be rewritten as:

where Employee_annual_salary *0.02 >2000;

But note that this is not an efficient way to do this because SAS has to do the calculation twice.
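
CALCULATED is not limited to the WHERE clause. As a sketch on a hypothetical table, work.pay, the alias bonus is reused both inside the SELECT clause (to derive a second column, here called monthly_bonus) and in the WHERE clause; ORDER BY can refer to the alias directly.

```sas
/* Hypothetical toy table */
DATA work.pay;
    INPUT name $ salary;
    DATALINES;
ANN 98000
BOB 104000
CAT 131000
;
RUN;

PROC SQL;
    select name, salary,
           salary * 0.02 as bonus,
           /* reuse the alias in another SELECT expression */
           calculated bonus / 12 as monthly_bonus
        from work.pay
        /* reuse the alias in the WHERE condition */
        where calculated bonus > 2000
        /* ORDER BY accepts the alias without CALCULATED */
        order by bonus desc;
QUIT;
```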

16.4. Sorting Data


The SELECT and FROM clauses are indispensable in an SQL query. Other clauses are optional but can be very useful when querying a
table. The last section introduced the WHERE clause and how to use it to select rows conditionally. From previous examples, you may
have noticed that the rows in the query result may appear in the same order as in the original data set. If, however, you want to specify
the order of the data, you will need the ORDER BY clause to sort the data as you want.

Example
The following SAS program uses ORDER BY inside PROC SQL to sort the data in the file survey.sas7bdat by the values of gender
and GPA:


PROC SQL;
select ID, Gender, GPA, SATM, SATV
from phc6089.survey
where SATV is not null and GPA > 3
order by Gender, GPA ;
QUIT;

Traffic volume in Area SS


During weekdays and weekends


id Gender GPA SATM SATV

1219 Female 3.01 630 590

1039 Female 3.02 560 560

1125 Female 3.08 470 510

1203 Female 3.1 500 610

1139 Female 3.1 530 600

1068 Female 3.1 550 610

1116 Female 3.1 560 550

1072 Female 3.1 570 570

1138 Female 3.12 560 580

1120 Female 3.16 680 670

1102 Female 3.2 620 630

1142 Female 3.2 720 500

1020 Female 3.2 600 630

1201 Female 3.21 760 660

1089 Female 3.25 500 600

1133 Female 3.27 450 550

1144 Female 3.27 640 600

1163 Female 3.3 600 590

1038 Female 3.3 680 650

1069 Female 3.3 600 490

1033 Female 3.3 650 600

1115 Female 3.3 580 620

1057 Female 3.3 600 600

1037 Female 3.3 500 400

1109 Female 3.31 500 490

1215 Female 3.33 650 630

1078 Female 3.33 570 490

1060 Female 3.36 540 550

1030 Female 3.36 450 450

1129 Female 3.4 500 540

1100 Female 3.4 490 520

1196 Female 3.41 480 560

1046 Female 3.42 660 580

1165 Female 3.45 560 500

1148 Female 3.46 620 640

1082 Female 3.48 550 690

1096 Female 3.48 750 550

1168 Female 3.5 650 560

1091 Female 3.5 800 650

1018 Female 3.5 600 750

1181 Female 3.5 600 550

1123 Female 3.51 560 530

1177 Female 3.53 520 500

1063 Female 3.53 560 590

1015 Female 3.55 400 600

1058 Female 3.55 585 590

1160 Female 3.57 700 700

1159 Female 3.59 640 440

1202 Female 3.6 460 540

1004 Female 3.6 710 560

1086 Female 3.6 540 570

1035 Female 3.61 500 550

1187 Female 3.62 780 660

1149 Female 3.63 640 700

1110 Female 3.63 700 600

1011 Female 3.67 690 690


1044 Female 3.7 570 560

1145 Female 3.7 500 450

1207 Female 3.7 640 670

1052 Female 3.7 600 620

1218 Female 3.71 540 560

1174 Female 3.74 680 600

1118 Female 3.74 650 700

1031 Female 3.75 640 620

1073 Female 3.76 600 550

1005 Female 3.76 600 520

1055 Female 3.76 760 600

1204 Female 3.77 650 630

1199 Female 3.77 620 640

1134 Female 3.78 650 600

1208 Female 3.78 550 400

1010 Female 3.8 580 540

1197 Female 3.8 600 700

1205 Female 3.8 550 550

1077 Female 3.81 560 610

1105 Female 3.81 510 680

1179 Female 3.83 660 660

1023 Female 3.88 670 680

1094 Female 3.89 640 710

1067 Female 3.9 640 560

1065 Female 3.9 575 600

1081 Female 3.94 620 600

1014 Female 4 700 700

1075 Female 4 650 550

1214 Female 4 590 500

1157 Male 3.02 400 400

1047 Male 3.02 700 570

1084 Male 3.03 690 690

1140 Male 3.04 540 600

1152 Male 3.05 600 500

1130 Male 3.06 660 620

1151 Male 3.08 590 590

1053 Male 3.08 420 490

1097 Male 3.1 640 430

1137 Male 3.1 580 640

1090 Male 3.12 560 330

1185 Male 3.13 600 600

1101 Male 3.13 570 490

1209 Male 3.14 580 580

1041 Male 3.16 620 760

1220 Male 3.19 480 480

1153 Male 3.2 550 630

1071 Male 3.2 650 660

1119 Male 3.2 640 630

1045 Male 3.2 680 710

1217 Male 3.21 620 400

1200 Male 3.21 650 590

1222 Male 3.24 510 680

1175 Male 3.3 640 600

1170 Male 3.3 610 590

1016 Male 3.3 640 600

1095 Male 3.33 690 650


1212 Male 3.35 570 480

1098 Male 3.36 540 520

1027 Male 3.36 550 600

1017 Male 3.38 720 580

1087 Male 3.4 580 570

1161 Male 3.4 600 630

1036 Male 3.42 530 760

1221 Male 3.42 590 610

1194 Male 3.46 640 600

1009 Male 3.48 690 620

1029 Male 3.5 680 680

1048 Male 3.51 650 590

1024 Male 3.51 600 730

1034 Male 3.53 670 630

1206 Male 3.53 620 630

1111 Male 3.54 540 570

1150 Male 3.55 630 650

1106 Male 3.57 620 500

1042 Male 3.66 640 640

1169 Male 3.67 650 670

1019 Male 3.7 600 720

1080 Male 3.72 660 660

1079 Male 3.72 700 640

1059 Male 3.72 800 750

1173 Male 3.73 580 590

1166 Male 3.74 700 520

1070 Male 3.76 600 610

1062 Male 3.76 670 670

1226 Male 3.78 630 520

1162 Male 3.83 710 710

1006 Male 3.86 610 720

1180 Male 3.86 720 500

1172 Male 3.87 780 580

1122 Male 3.88 670 510

1131 Male 3.92 730 800

1007 Male 3.94 710 670

1028 Male 4 610 600

Launch and run the SAS program, and review the output to convince yourself that the query result is in order first by gender and
then by GPA.

Several things need to be pointed out regarding the above program:

1. You can use one or more columns in ORDER BY to sort the data. Commas separate multiple column names. In this
example, two columns have been used, Gender and GPA, so the data are sorted by Gender first, then by GPA within each gender.
2. By default, the values of the column(s) are sorted in ascending order. For example, there are two values in Gender, Female and
Male. In the query result, Female records are listed first, then Male ones, because SAS sorts character values in
alphabetical order. As for GPA, since it’s numeric, SAS sorts the observations by the numeric value of GPA inside each
gender group.
3. The WHERE clause selects observations whose SAT verbal score is not missing and whose GPA is greater than 3. “is
not null” and “is not missing” are interchangeable ways to exclude missing values.
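
The points in the list above can be tried on a small scale. This sketch uses a hypothetical five-row table, work.survey_demo, and runs the same query twice, once with “is not null” and once with “is not missing”, so the two results can be compared.

```sas
/* Hypothetical toy table with one missing SATV value */
DATA work.survey_demo;
    INPUT id gender $ gpa satv;
    DATALINES;
1 Male 3.4 600
2 Female 3.8 .
3 Female 3.2 650
4 Male 2.9 700
5 Female 3.6 580
;
RUN;

PROC SQL;
    /* Sorted by gender first, then by GPA within each gender */
    select id, gender, gpa, satv
        from work.survey_demo
        where satv is not null and gpa > 3
        order by gender, gpa;

    /* Same query written with IS NOT MISSING */
    select id, gender, gpa, satv
        from work.survey_demo
        where satv is not missing and gpa > 3
        order by gender, gpa;
QUIT;
```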

As in PROC SORT, you can change the default ascending order to descending order. In PROC SQL, you just need to specify DESC
following the column name (whereas in PROC SORT the DESCENDING keyword precedes the variable name).
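
For comparison, the sketch below sorts a hypothetical table, work.grades, in the same way with PROC SORT and with ORDER BY. The only thing to notice is the position of the keyword: DESCENDING comes before the variable in PROC SORT, while DESC comes after the column in PROC SQL.

```sas
/* Hypothetical toy table */
DATA work.grades;
    INPUT id gender $ gpa;
    DATALINES;
1 Male 3.4
2 Female 3.8
3 Female 3.2
4 Male 3.1
;
RUN;

/* PROC SORT: DESCENDING precedes the variable it applies to */
PROC SORT data=work.grades out=work.grades_sorted;
    BY DESCENDING gender gpa;
RUN;

/* PROC SQL: DESC follows the column it applies to */
PROC SQL;
    select id, gender, gpa
        from work.grades
        order by gender desc, gpa;
QUIT;
```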

Example
The following SAS program sorts the data survey.sas7bdat by the values of gender in descending order, then by GPA in ascending order:


PROC SQL;
select ID, Gender, GPA, SATM, SATV
from phc6089.survey
where SATV is not null and GPA > 3
order by Gender desc, 3 ;
QUIT;

id Gender GPA SATM SATV

1047 Male 3.02 700 570

1157 Male 3.02 400 400

1084 Male 3.03 690 690

1140 Male 3.04 540 600

1152 Male 3.05 600 500

1130 Male 3.06 660 620

1053 Male 3.08 420 490

1151 Male 3.08 590 590

1137 Male 3.1 580 640

1097 Male 3.1 640 430

1090 Male 3.12 560 330

1185 Male 3.13 600 600

1101 Male 3.13 570 490

1209 Male 3.14 580 580

1041 Male 3.16 620 760

1220 Male 3.19 480 480

1045 Male 3.2 680 710

1071 Male 3.2 650 660

1119 Male 3.2 640 630

1153 Male 3.2 550 630

1200 Male 3.21 650 590

1217 Male 3.21 620 400

1222 Male 3.24 510 680

1175 Male 3.3 640 600

1016 Male 3.3 640 600

1170 Male 3.3 610 590

1095 Male 3.33 690 650

1212 Male 3.35 570 480

1098 Male 3.36 540 520

1027 Male 3.36 550 600

1017 Male 3.38 720 580

1161 Male 3.4 600 630

1087 Male 3.4 580 570

1221 Male 3.42 590 610

1036 Male 3.42 530 760

1194 Male 3.46 640 600

1009 Male 3.48 690 620

1029 Male 3.5 680 680

1048 Male 3.51 650 590

1024 Male 3.51 600 730

1034 Male 3.53 670 630

1206 Male 3.53 620 630

1111 Male 3.54 540 570

1150 Male 3.55 630 650

1106 Male 3.57 620 500

1042 Male 3.66 640 640

1169 Male 3.67 650 670

1019 Male 3.7 600 720

1080 Male 3.72 660 660

1079 Male 3.72 700 640

1059 Male 3.72 800 750

1173 Male 3.73 580 590

1166 Male 3.74 700 520

1070 Male 3.76 600 610

1062 Male 3.76 670 670

1226 Male 3.78 630 520

1162 Male 3.83 710 710

1180 Male 3.86 720 500

1006 Male 3.86 610 720

1172 Male 3.87 780 580

1122 Male 3.88 670 510

1131 Male 3.92 730 800

1007 Male 3.94 710 670

1028 Male 4 610 600

1219 Female 3.01 630 590

1039 Female 3.02 560 560

1125 Female 3.08 470 510

1072 Female 3.1 570 570

1139 Female 3.1 530 600

1068 Female 3.1 550 610

1116 Female 3.1 560 550

1203 Female 3.1 500 610

1138 Female 3.12 560 580

1120 Female 3.16 680 670

1020 Female 3.2 600 630

1142 Female 3.2 720 500

1102 Female 3.2 620 630

1201 Female 3.21 760 660

1089 Female 3.25 500 600

1144 Female 3.27 640 600

1133 Female 3.27 450 550

1163 Female 3.3 600 590

1038 Female 3.3 680 650

1037 Female 3.3 500 400

1069 Female 3.3 600 490

1033 Female 3.3 650 600

1057 Female 3.3 600 600

1115 Female 3.3 580 620

1109 Female 3.31 500 490

1078 Female 3.33 570 490

1215 Female 3.33 650 630

1060 Female 3.36 540 550

1030 Female 3.36 450 450

1129 Female 3.4 500 540

1100 Female 3.4 490 520

1196 Female 3.41 480 560

1046 Female 3.42 660 580

1165 Female 3.45 560 500

1148 Female 3.46 620 640

1082 Female 3.48 550 690

1096 Female 3.48 750 550

1181 Female 3.5 600 550

1168 Female 3.5 650 560

1018 Female 3.5 600 750

1091 Female 3.5 800 650

1123 Female 3.51 560 530

1177 Female 3.53 520 500

1063 Female 3.53 560 590

1058 Female 3.55 585 590

1015 Female 3.55 400 600

1160 Female 3.57 700 700

1159 Female 3.59 640 440

1086 Female 3.6 540 570

1004 Female 3.6 710 560

1202 Female 3.6 460 540

1035 Female 3.61 500 550

1187 Female 3.62 780 660

1149 Female 3.63 640 700

1110 Female 3.63 700 600

1011 Female 3.67 690 690

1044 Female 3.7 570 560

1145 Female 3.7 500 450

1052 Female 3.7 600 620

1207 Female 3.7 640 670

1218 Female 3.71 540 560

1174 Female 3.74 680 600

1118 Female 3.74 650 700

1031 Female 3.75 640 620

1073 Female 3.76 600 550

1055 Female 3.76 760 600

1005 Female 3.76 600 520

1204 Female 3.77 650 630

1199 Female 3.77 620 640

1134 Female 3.78 650 600

1208 Female 3.78 550 400

1010 Female 3.8 580 540

1205 Female 3.8 550 550

1197 Female 3.8 600 700

1077 Female 3.81 560 610

1105 Female 3.81 510 680

1179 Female 3.83 660 660

1023 Female 3.88 670 680

1094 Female 3.89 640 710

1067 Female 3.9 640 560

1065 Female 3.9 575 600

1081 Female 3.94 620 600

1075 Female 4 650 550

1014 Female 4 700 700

1214 Female 4 590 500

Only two things differ from the program in the previous example. First, DESC has been added after Gender to tell
SAS to sort that column in descending order. Second, a column can be referred to by its position in the SELECT clause instead of by
its name; GPA is the third column listed, so 3 stands for GPA.

Launch and run the SAS program, and then review the output to convince yourself that the output from this query is in descending
order of Gender and in ascending order of GPA.

Up until now, you might think that ORDER BY can perform the same as PROC SORT. Actually, it can do more than that. Let’s find
out with the next example.

Example
The following program sorts the survey data first by gender in descending order as before, then by mean values of SAT math and
verbal scores in ascending order:

PROC SQL;
select *
from phc6089.survey
where SATV is not null and GPA>3
order by Gender desc, MEAN(SATM,SATV) ;
QUIT;

id Gender GPA SmokeCigarettes SATM SATV

1157 Male 3.02 No 400 400

1090 Male 3.12 No 560 330

1053 Male 3.08 No 420 490

1220 Male 3.19 No 480 480

1217 Male 3.21 No 620 400

1212 Male 3.35 No 570 480

1101 Male 3.13 No 570 490

1098 Male 3.36 No 540 520

1097 Male 3.1 No 640 430

1152 Male 3.05 No 600 500

1111 Male 3.54 No 540 570

1106 Male 3.57 No 620 500

1140 Male 3.04 Yes 540 600

1226 Male 3.78 No 630 520

1087 Male 3.4 No 580 570

1027 Male 3.36 No 550 600

1209 Male 3.14 No 580 580

1173 Male 3.73 No 580 590

1122 Male 3.88 No 670 510

1153 Male 3.2 No 550 630

1151 Male 3.08 No 590 590

1222 Male 3.24 No 510 680

1185 Male 3.13 No 600 600

1221 Male 3.42 Yes 590 610

1170 Male 3.3 No 610 590

1070 Male 3.76 No 600 610

1028 Male 4 No 610 600

1137 Male 3.1 No 580 640

1166 Male 3.74 No 700 520

1180 Male 3.86 No 720 500

1161 Male 3.4 No 600 630

1175 Male 3.3 Yes 640 600

1016 Male 3.3 No 640 600

1194 Male 3.46 No 640 600

1048 Male 3.51 No 650 590

1200 Male 3.21 No 650 590

1206 Male 3.53 No 620 630

1119 Male 3.2 No 640 630

1047 Male 3.02 No 700 570

1150 Male 3.55 No 630 650

1042 Male 3.66 No 640 640

1130 Male 3.06 No 660 620

1036 Male 3.42 No 530 760

1034 Male 3.53 No 670 630

1017 Male 3.38 No 720 580

1071 Male 3.2 No 650 660

1009 Male 3.48 No 690 620

1080 Male 3.72 No 660 660

1169 Male 3.67 No 650 670

1019 Male 3.7 No 600 720

1006 Male 3.86 No 610 720

1024 Male 3.51 No 600 730

1079 Male 3.72 No 700 640

1062 Male 3.76 No 670 670

1095 Male 3.33 No 690 650

1172 Male 3.87 Yes 780 580

1029 Male 3.5 No 680 680

1041 Male 3.16 No 620 760

1007 Male 3.94 No 710 670

1084 Male 3.03 No 690 690

1045 Male 3.2 No 680 710

1162 Male 3.83 No 710 710

1131 Male 3.92 No 730 800

1059 Male 3.72 No 800 750

1037 Female 3.3 No 500 400

1030 Female 3.36 No 450 450

1145 Female 3.7 No 500 450

1208 Female 3.78 No 550 400

1125 Female 3.08 No 470 510

1109 Female 3.31 No 500 490

1133 Female 3.27 No 450 550

1015 Female 3.55 No 400 600

1202 Female 3.6 No 460 540

1100 Female 3.4 No 490 520

1177 Female 3.53 No 520 500

1196 Female 3.41 No 480 560

1129 Female 3.4 No 500 540

1035 Female 3.61 No 500 550

1078 Female 3.33 No 570 490

1165 Female 3.45 No 560 500

1159 Female 3.59 No 640 440

1069 Female 3.3 No 600 490

1060 Female 3.36 No 540 550

1123 Female 3.51 No 560 530

1214 Female 4 No 590 500

1089 Female 3.25 Yes 500 600

1218 Female 3.71 No 540 560

1205 Female 3.8 No 550 550

1086 Female 3.6 No 540 570

1116 Female 3.1 No 560 550

1203 Female 3.1 No 500 610

1039 Female 3.02 No 560 560

1010 Female 3.8 No 580 540

1005 Female 3.76 No 600 520

1044 Female 3.7 No 570 560

1139 Female 3.1 No 530 600

1072 Female 3.1 No 570 570

1138 Female 3.12 No 560 580

1181 Female 3.5 No 600 550

1073 Female 3.76 No 600 550

1063 Female 3.53 No 560 590

1068 Female 3.1 No 550 610

1077 Female 3.81 No 560 610

1065 Female 3.9 No 575 600

1058 Female 3.55 No 585 590

1163 Female 3.3 No 600 590

1105 Female 3.81 No 510 680

1075 Female 4 No 650 550

1067 Female 3.9 No 640 560

1057 Female 3.3 No 600 600

1115 Female 3.3 No 580 620

1168 Female 3.5 No 650 560

1081 Female 3.94 No 620 600

1142 Female 3.2 No 720 500

1052 Female 3.7 No 600 620

1219 Female 3.01 No 630 590

1020 Female 3.2 No 600 630

1082 Female 3.48 No 550 690

1144 Female 3.27 No 640 600

1046 Female 3.42 No 660 580

1033 Female 3.3 No 650 600

1134 Female 3.78 No 650 600

1102 Female 3.2 No 620 630

1148 Female 3.46 No 620 640

1031 Female 3.75 No 640 620

1199 Female 3.77 No 620 640

1004 Female 3.6 No 710 560

1174 Female 3.74 No 680 600

1215 Female 3.33 No 650 630

1204 Female 3.77 No 650 630

1110 Female 3.63 No 700 600

1096 Female 3.48 No 750 550

1197 Female 3.8 No 600 700

1207 Female 3.7 No 640 670

1179 Female 3.83 No 660 660

1038 Female 3.3 No 680 650

1149 Female 3.63 No 640 700

1023 Female 3.88 No 670 680

1018 Female 3.5 No 600 750

1120 Female 3.16 No 680 670

1118 Female 3.74 No 650 700

1094 Female 3.89 No 640 710

1055 Female 3.76 No 760 600

1011 Female 3.67 Yes 690 690

1160 Female 3.57 No 700 700

1014 Female 4 No 700 700

1201 Female 3.21 Yes 760 660

1187 Female 3.62 No 780 660

1091 Female 3.5 No 800 650

Since all columns are used in the query, * is specified after SELECT to select all columns. The WHERE clause remains the same.
In the ORDER BY clause, besides Gender, the MEAN() function calculates the average of SATM and SATV for each row, and the
result is used to sort the data within each gender group. To get the same result, could you try other SAS steps and count how
many of them would be needed?
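
As one possible answer to that question, here is a minimal sketch of the same task written with conventional steps. The data set name work.survey_sub and the variable sat_mean are hypothetical helpers introduced only for this illustration. Rows that tie on both sort keys may come out in a different order than in the PROC SQL output.

```sas
/* Step 1: subset the rows and compute the row-wise mean.        */
/* work.survey_sub and sat_mean are hypothetical helper names.   */
DATA work.survey_sub;
    set phc6089.survey;
    where SATV is not missing and GPA > 3;
    sat_mean = mean(SATM, SATV);
RUN;

/* Step 2: sort by Gender (descending), then by the helper variable. */
PROC SORT data=work.survey_sub;
    by descending Gender sat_mean;
RUN;

/* Step 3: print the result without the helper variable. */
PROC PRINT data=work.survey_sub(drop=sat_mean) noobs;
RUN;
```

Under these assumptions, three steps are needed where PROC SQL used a single query.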

Launch and run the PROC SQL program, and review the output to convince yourself that the data have been sorted in the desired order.

One more thing: you may notice a note in the log window when running this PROC SQL program.

NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.

That’s because MEAN(SATM,SATV) is not listed in the SELECT clause, only in the ORDER BY clause.
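
If you would rather see the sort key in the output, a small variation is to compute it in the SELECT clause and order by its alias. This is only a sketch; the alias SAT_mean is a name chosen for this illustration, and with the item now listed in SELECT the note above should no longer be written to the log.

```sas
PROC SQL;
    /* SAT_mean is a hypothetical alias; it adds one column to the output */
    select *, mean(SATM, SATV) as SAT_mean
        from phc6089.survey
        where SATV is not null and GPA > 3
        order by Gender desc, SAT_mean;
QUIT;
```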

16.5. Summarizing and Grouping Data


In previous sections, we used the SQL procedure to generate detailed reports. Sometimes a summary report is also needed
to explore the data. For that, we need summary functions and/or the GROUP BY clause in PROC SQL.

Many summary functions that are used in other SAS steps also work in PROC SQL. The table below lists the summary functions
you can request:

Summary function    Description
AVG, MEAN           mean or average of values
COUNT, FREQ, N      number of non-missing values
CSS                 corrected sum of squares
CV                  coefficient of variation (percent)
MAX                 largest value
MIN                 smallest value
NMISS               number of missing values
PRT                 probability of a greater absolute value of Student's t
RANGE               range of values
STD                 standard deviation
STDERR              standard error of the mean
SUM                 sum of values
T                   Student's t value for testing hypothesis
USS                 uncorrected sum of squares
VAR                 variance

Note: some functions have multiple names. The first listed is the SQL name.

Next we will work through examples to see how these functions perform calculations in PROC SQL. Along the way, the GROUP BY
clause will be introduced and used together with the functions.

Example
The following program uses the AVG() function to calculate the mean scores of SAT math and verbal test:

PROC SQL;
select avg(SATM) as average_Math,
avg(SATV) as average_Verbal
from phc6089.survey;
QUIT;

average_Math average_Verbal

599.0046 580.3256

First launch and run the SAS program. When checking the output you will see two overall average scores have been calculated for
SATM and SATV separately. There is only one observation in the output window.

Let’s review the function in the code. To calculate average, either MEAN() or AVG() can be used in this case. Note that there is
only one argument (column) inside the function AVG(). So the statistic is calculated across all rows for one column.

AVG(SATM): the overall average score of SATM


AVG(SATV): the overall average score of SATV
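
The other summary functions in the table above follow the same pattern when given a single column. The sketch below requests a few of them for SATM; the column aliases are names chosen for this illustration.

```sas
PROC SQL;
    /* Each function takes one column, so each is computed down all rows */
    select min(SATM)   as min_Math,
           max(SATM)   as max_Math,
           range(SATM) as range_Math,
           std(SATM)   as sd_Math
        from phc6089.survey;
QUIT;
```

As with AVG(), the output should contain a single row.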

Quite simple, right? Let’s add one more argument into the function. Can you guess how many observations will be in the output?

Example
In the following program, two columns are the arguments of the function MEAN():

PROC SQL;
select mean(SATM, SATV) as average
from phc6089.survey;
QUIT;

average

700

600

470

635

560

665

690

595

655

560

690

515

700

500

620

650

675

660

615

450

650

675

665

620

560

575

605

680

450

630

350

625

650

525

645

450

665

560

690

640

615

565

695

620

635

620

630

450

600

610

455

605

680

550

600

587.5

775

545

615

670

575

625

587.5

425

600

580

545

605

655

570

575

475

600

580

585

530

670

660

610

620

570

690

710

555

575

550

445

725

575

560

675

670

650

535

530

650

505

530

625

475

595

560

580

525

495

650

555

600

530

600

555

525

675

635

675

475

590

545

435

490

645

625

610

520

640

765

420

500

625

585

625

610

570

565

570

610

525

620

475

590

600

630

670

640

590

550

590

485

660

400

540

540

700

615

710

595

625

530

610

535

605

660

600

500

680

585

640

620

595

510

650

660

610

575

550

505

600

530

720

610

590

540

530

500

620

555

520

650

525

630

620

710

500

555

640

550

625

655

475

580

625

610

525

425

545

640

640

510

550

610

480

600

595

640

635

575

We changed the program a little: both SATM and SATV are now arguments of the function. Launch and run the SAS
program. You will see 226 observations in the output, the same number as in the original survey data.

If you give a summary function more than one column as arguments, SAS performs the calculation across those columns for
each row, which produces the above output.

In this case, the function no longer aggregates. Instead, SAS looks for a function of the same name in Base SAS.
If one exists, the calculation is performed for each row; if not, an error message is written to the log window. Try
changing MEAN() to AVG() to see what happens:

ERROR: Function AVG could not be located.
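
The two behaviors can also be combined. In the sketch below, the inner MEAN() has two arguments and so works row by row, while the outer AVG() has a single argument and so aggregates over all rows. The alias overall_average is a name chosen for this illustration.

```sas
PROC SQL;
    /* inner mean(): row-wise average of the two scores (Base SAS function) */
    /* outer avg():  summary function applied to those row-wise averages    */
    select avg(mean(SATM, SATV)) as overall_average
        from phc6089.survey;
QUIT;
```

Because the outermost function aggregates, the output should contain one row rather than 226.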

Example
The following program uses only one argument for MEAN(), but adds one more column to the SELECT clause:

PROC SQL;
select Gender,
mean(SATM) as average_Math
from phc6089.survey;
QUIT;

Gender average_Math

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Female 599.0046

Female 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Male 599.0046

Female 599.0046

Male 599.0046

In the above program, the SELECT statement has changed again. This time, the MEAN() function has only one argument and
calculates the overall average SAT math score. Outside the function, another column has been selected as well. What
output will it produce?

Launch and run the SAS program. You may be surprised that the output contains 226 rows. Reviewing the output, you will see
that the above code did two things:

1. It calculated the overall average math score for all students.


2. It displayed all rows in the output because Gender is not an argument of MEAN() function.

Note that the overall average math score is simply repeated on every row. You can find a message like the one below in the log
window. When you submit such a program, SAS calculates the statistic first and then merges it back with the other columns. That is how
“remerging” happens.

NOTE: The query requires remerging summary statistics back with the original data
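
Remerging is not always a mistake. As a sketch of a case where it is intended, the query below compares each student's math score with the overall mean. The alias diff_from_mean and the 8.1 format are choices made for this illustration.

```sas
PROC SQL;
    /* mean(SATM) is computed once over all selected rows, then remerged */
    /* so that it can be subtracted from SATM on every row               */
    select ID, Gender, SATM,
           SATM - mean(SATM) as diff_from_mean format=8.1
        from phc6089.survey
        where SATM is not null;
QUIT;
```

This query should write the same remerging note to the log, and here the row-level output is the point.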

In this example, repeating the overall mean on every row is not what we wanted. Now, let’s see how the GROUP BY clause produces one average per gender.

Example
The following example calculates the average SAT math score for each gender group:

PROC SQL;
select Gender,
mean(SATM) as average_Math
from phc6089.survey
group by Gender;
QUIT;

Gender average_Math

Female 589.5082

Male 611.3298

The above program seems identical to the program in the previous example except for one more clause: GROUP BY. Finally, we
get it right and obtain the desired result: the average SAT math scores for female and male students. Of course, you can make
further use of GROUP BY by adding multiple columns. Let’s find out with the next example.

Example
The following program uses both Gender and SmokeCigarettes in the GROUP BY clause to calculate the average SAT math
scores:

PROC SQL;
select Gender, SmokeCigarettes,
mean(SATM) as average_Math
from phc6089.survey
group by 1, 2;
QUIT;

Gender SmokeCigarettes average_Math

Female No 589.6552

Female Yes 586.6667

Male No 613.2353

Male Yes 593.3333

Launch and run the SAS program, then review the output. As you can see, the average math scores are calculated for each
smoking group (Yes or No) inside each gender group (Female or Male).

One more thing about this program: columns in the GROUP BY clause can also be referred to by their positions in the SELECT
clause, as in the ORDER BY clause. Here, 1 and 2 refer to Gender and SmokeCigarettes.
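
For comparison, here is a sketch of the same grouping written with column names instead of positions, with a COUNT(*) column added to show how many students fall into each group. The alias n_students is a name chosen for this illustration.

```sas
PROC SQL;
    select Gender, SmokeCigarettes,
           count(*)   as n_students,      /* rows in each Gender x SmokeCigarettes group */
           mean(SATM) as average_Math
        from phc6089.survey
        group by Gender, SmokeCigarettes; /* equivalent to: group by 1, 2 */
QUIT;
```

Names keep working if the SELECT list is later reordered, whereas positions are shorter to type.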

Next, we turn to one special summary function in SQL, COUNT(). You can use the COUNT() function to count
non-missing values.

Example
The following example counts the number of rows in the survey data, the number of non-missing records for math and verbal test
scores, and the number of distinct values of gender:

PROC SQL;
select count(*) as No_obs,
count(SATM) as No_Math_records,
count(SATV) as No_Verbal_records,
count(distinct Gender) as Gender_group
from phc6089.survey;
QUIT;

No_obs No_Math_records No_Verbal_records Gender_group

226 216 215 2

The above code shows three common ways of using the COUNT() function.

1. Count(*) counts the total number of rows in a table. COUNT() is the only function that allows you to use * as an argument.
2. Count(column) counts the number of non-missing values in a column. In the program, we count the number of
non-missing values for the math and verbal scores.
3. Count(distinct column) counts the number of unique values in a column. In the above example, we count the
number of gender categories.

Launch and run the SAS program, then review the output. Knowing that the table contains some missing values, we are
not surprised that the first three numbers differ. The total number of rows in the survey data is 226. The numbers of
non-missing math and verbal scores are 216 and 215, respectively. Both are less than 226, which means each column has
missing values, and SATV has one more missing value than SATM. Gender has only two categories, Male and
Female, so the last count is 2.
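
The number of missing values can be obtained directly as well. The sketch below computes it two ways, as the difference between COUNT(*) and COUNT(column) and with the NMISS() summary function from the table earlier in this section. The aliases are names chosen for this illustration.

```sas
PROC SQL;
    select count(*) - count(SATM) as Math_missing_by_diff,
           nmiss(SATM)            as Math_missing,
           count(*) - count(SATV) as Verbal_missing_by_diff,
           nmiss(SATV)            as Verbal_missing
        from phc6089.survey;
QUIT;
```

Given the counts reported above (226 rows, 216 and 215 non-missing scores), the two methods should agree on 10 missing math scores and 11 missing verbal scores.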

16.6. Using the HAVING Clause


Previously we learned how to use the GROUP BY clause to group and summarize data. Sometimes, we want to select certain groups
from the result. That’s when the HAVING clause comes into play.

Example
The following program calculates the average salary for each department, then selects three departments of interest for the
query output:

PROC SQL;
select Department,
avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
from phc6089.salary
group by Department
having Department in ('LAW','FINANCE','FIRE')
order by Avg_salary;
QUIT;

Department Avg_salary

LAW $71,082.20

FINANCE $82,184.00

FIRE $90,742.33

Let’s review the program first. The code selects the column Department and uses the summary function AVG() to compute the
average salaries. Since the GROUP BY clause is also present in the SELECT statement, the averages are computed for each
department. We are only interested in three departments, Law, Finance and Fire, so we use the HAVING clause to output only
these three. Finally, we ask SAS to sort the data by average salary. This program contains every clause we have
learned so far except the WHERE clause, which we will address later.

Launch and run the SAS program and review the output to make sure you understand it.

You may wonder whether WHERE can do the same thing as HAVING does in the above program. Try replacing the HAVING clause with
a WHERE clause as follows. You will get output identical to before.

PROC SQL;
select Department,
avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
from phc6089.salary
where Department in ('LAW','FINANCE','FIRE')
group by Department
order by Avg_salary;
QUIT;

Department Avg_salary

LAW $71,082.20

FINANCE $82,184.00

FIRE $90,742.33

However, let’s not assume from this that WHERE and HAVING are the same. There are some big differences between them.
Generally speaking, HAVING filters grouped data at output; WHERE filters input data row by row. Let’s see more
examples of these two clauses.

Example
The following program calculates the average salary for each department and keeps the departments whose average is more than $70,000:

PROC SQL;
select Department,
avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
from phc6089.salary
group by Department
having Avg_salary > 70000
order by Avg_salary;
QUIT;

Department Avg_salary

LAW $71,082.20

BOARD OF ELECTION $74,988.00

HEALTH $75,066.86

TRANSPORTN $79,438.18

POLICE $81,850.26

FINANCE $82,184.00

WATER MGMNT $84,780.42

PROCUREMENT $89,236.00

COMMUNITY DEVELOPMENT $90,096.00

DoIT $90,252.00

FIRE $90,742.33

ADMIN HEARNG $91,980.00

BUILDINGS $94,793.01

Only a small change has been made to this program. The condition in the HAVING clause now requires the department average salary
to be more than $70,000, so the expression used in the HAVING clause involves a summary function. The data are again sorted by
average salary.

Launch and run the SAS program and review the output. As expected, all departments with an average salary above $70,000
are listed in the query result.
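
The HAVING condition does not have to go through the column alias. As a sketch of an equivalent form, the summary function can be written directly in the HAVING clause:

```sas
PROC SQL;
    select Department,
           avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
        from phc6089.salary
        group by Department
        having avg(Employee_annual_salary) > 70000  /* summary function used directly */
        order by Avg_salary;
QUIT;
```

This version should list the same departments as the program above.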

Next, let’s try using WHERE to perform the same task.

PROC SQL;
select Department,
avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
from phc6089.salary
where calculated Avg_salary > 70000
group by Department
order by Avg_salary;
QUIT;

351 PROC SQL;
352 select Department,
353 avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
354 from phc6089.salary
355 where calculated Avg_salary > 70000
356 group by Department
357 order by Avg_salary;
NOTE: Data file PHC6089.SALARY.DATA is in a format that is native to another host, or the file encoding does not match the
      session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might
      reduce performance.
ERROR: Summary functions are restricted to the SELECT and HAVING clauses only.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
358 QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time 0.00 seconds
      cpu time 0.01 seconds

Remember that to use a computed column in the WHERE clause, the keyword CALCULATED must be inserted before its name.
Even so, SAS gives us an error message like this:

ERROR: Summary functions are restricted to the SELECT and HAVING clauses only.

This example illustrates a big difference between HAVING and WHERE. Summary functions can be used in a HAVING clause
but not in a WHERE clause, because HAVING works on grouped data, whereas WHERE evaluates existing or calculated data row by
row.
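
The two clauses can also appear in the same query, each doing its own job. The following is only a sketch: the threshold of 10 employees and the alias Employees are arbitrary choices made for this illustration.

```sas
PROC SQL;
    select Department,
           count(*) as Employees,
           avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
        from phc6089.salary
        where Employee_annual_salary is not missing  /* row-level filter on input rows */
        group by Department
        having count(*) >= 10                        /* group-level filter; 10 is an arbitrary cutoff */
        order by Avg_salary;
QUIT;
```

WHERE removes individual rows before the groups are formed, and HAVING then keeps or drops whole departments based on a summary of what remains.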

Based on our current experiences with these two clauses, you might prefer to use HAVING since it can be used for both situations.
However, don’t rush to this conclusion either. You will find out more in the next example.

Example
The following two SAS programs are similar. The only difference is that the first program uses a WHERE clause and the second
uses a HAVING clause. Both try to accomplish the same task: count how many employees hold each position in the Police
Department:

PROC SQL;
select Position_Title,
count(*) as Employees
from phc6089.salary
where Department='POLICE'
group by Position_Title;
QUIT;

PROC SQL;
select Position_Title,
count(*) as Employees
from phc6089.salary
group by Position_Title
having Department='POLICE';
QUIT;

Position Title Employees

ACCOUNTANT II 1

CLINICAL THERAPIST III 1

CROSSING GUARD 5

CROSSING GUARD - PER AGREEMENT 3

DIR OF POLICE RECORDS 1

FISCAL ADMINISTRATOR 1

POLICE OFFICER 85

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER / FLD TRNG OFFICER 1

SENIOR DATA ENTRY OPERATOR 2

SERGEANT 11

Position Title Employees

ACCOUNTANT II 1

CLINICAL THERAPIST III 2

CROSSING GUARD 5

CROSSING GUARD 5

CROSSING GUARD 5

CROSSING GUARD 5

CROSSING GUARD 5

CROSSING GUARD - PER AGREEMENT 3

CROSSING GUARD - PER AGREEMENT 3

CROSSING GUARD - PER AGREEMENT 3

DIR OF POLICE RECORDS 1

FISCAL ADMINISTRATOR 1

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85
POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER 85

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) 13

POLICE OFFICER / FLD TRNG OFFICER 1

SENIOR DATA ENTRY OPERATOR 4

SENIOR DATA ENTRY OPERATOR 4

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

SERGEANT 11

Now, launch and run both programs. The output on the top is from the program using the WHERE clause; the output on the bottom is
the partial output from the program using the HAVING clause.

You might be surprised to see how different these two results are. One would expect a result like the output on the top. But the
output on the bottom has many more rows, and some of the numbers do not even match! Let’s review the code to understand what
happened. There are two columns in the SELECT clause: Position_Title and a summary function, count(*), which counts the total
number of rows for each position group since we specify Position_Title in the GROUP BY clause. Unlike the programs in the
previous example, the expression used inside WHERE and HAVING references another column, Department, which is not in the
SELECT clause. Therefore, SAS handles them differently in the two programs.

The first program uses the WHERE clause. Since SAS processes the WHERE clause before SELECT and on a row-by-row basis,
the records from the Police Department are selected from the data first. Then SAS counts the number of employees under each
position title inside the department. For example, there is only one person who is a “CLINICAL THERAPIST III” in the Police
Department, so the count is 1. We obtained the desired output.

On the other hand, the second program uses the HAVING clause. It is equivalent to the following program but without Department
column in the output:

PROC SQL;
    select Position_Title,
           Department,
           count(*) as Employees
        from phc6089.salary
        group by Position_Title
        having Department='POLICE';
QUIT;

Position Title Department Employees

ACCOUNTANT II POLICE 1

CLINICAL THERAPIST III POLICE 2

CROSSING GUARD POLICE 5

CROSSING GUARD POLICE 5

CROSSING GUARD POLICE 5

CROSSING GUARD POLICE 5

CROSSING GUARD POLICE 5

CROSSING GUARD - PER AGREEMENT POLICE 3

CROSSING GUARD - PER AGREEMENT POLICE 3

CROSSING GUARD - PER AGREEMENT POLICE 3

DIR OF POLICE RECORDS POLICE 1

FISCAL ADMINISTRATOR POLICE 1

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER POLICE 85

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE 13

POLICE OFFICER / FLD TRNG OFFICER POLICE 1

SENIOR DATA ENTRY OPERATOR POLICE 4

SENIOR DATA ENTRY OPERATOR POLICE 4

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

SERGEANT POLICE 11

In this program, SAS counts the number of employees in each position across all departments because of the GROUP BY clause. For
example, there is one person titled “CLINICAL THERAPIST III” in each of the POLICE and HEALTH departments, so the total
count for this position is 2. Since the SELECT clause contains an extra column (Department) besides the summary function and the
GROUP BY column, all rows are kept in the output, each carrying the count for its job position. For instance, under the position
title “CLINICAL THERAPIST III”, both records have 2 as the value of “Employees”. Finally, SAS evaluates the condition
(Department='POLICE') in the HAVING clause to select rows for the output. That’s why you see Employees=2 for the position title
“CLINICAL THERAPIST III” in the output from the second query.

We have seen two examples that show the differences between HAVING and WHERE so far. Since SAS handles them so
differently, when it comes to WHERE or HAVING, pick one that fits your needs the best.
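
The two clauses can also be used together, each for the job it suits. The sketch below is an illustration rather than part of the original examples: WHERE restricts the rows to the Police Department before grouping, and HAVING then keeps only the larger position groups. The cutoff of 10 is hypothetical.

```sas
PROC SQL;
    select Position_Title,
           count(*) as Employees
        from phc6089.salary
        where Department='POLICE'    /* row-level filter, applied before grouping */
        group by Position_Title
        having count(*) >= 10;       /* group-level filter; 10 is a hypothetical cutoff */
QUIT;
```

Judging from the WHERE output shown earlier, only the POLICE OFFICER, POLICE OFFICER (ASSIGNED AS DETECTIVE) and SERGEANT groups would be expected to remain.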

Last but not least, let’s check out one more cool feature of the HAVING clause.

Example
The following program selects the departments whose average salary is lower than the overall average salary:

PROC SQL;
    select Department,
           avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
        from phc6089.salary
        group by Department
        having Avg_salary < (select avg(Employee_annual_salary) from phc6089.salary)
        order by Avg_salary;
QUIT;

Department Avg_salary

DISABILITIES $36,264.00

OEMC $49,116.80

FAMILY & SUPPORT $53,642.00

PUBLIC LIBRARY $59,030.00

GENERAL SERVICES $65,600.80

BUSINESS AFFAIRS $65,652.00

CITY COUNCIL $66,983.00

AVIATION $67,704.48

STREETS & SAN $68,625.08

LAW $71,082.20

BOARD OF ELECTION $74,988.00

HEALTH $75,066.86

Going through this program, you may not find anything unusual until the HAVING clause. Inside the clause is not a standard
expression as before, but a query:

(select avg(Employee_annual_salary) from phc6089.salary)

This kind of query is called a subquery, inner query or nested query. You can use such a query-expression in a HAVING or WHERE
clause. The subquery in this example calculates the overall average salary. The result is compared with the average salary
of each department. Then SAS evaluates the “less than” condition in the HAVING clause and outputs the departments whose
average salary is below the overall average.

Launch and run the SAS program, and review the query result. Convince yourself that the departments’ information has been
selected as described.
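
Since a subquery may also appear in a WHERE clause, here is a minimal sketch of that variant. It is an illustration only, assuming the same phc6089.salary table and column names used above: it lists the individual records whose salary exceeds the overall average.

```sas
PROC SQL;
    select Position_Title,
           Department,
           Employee_annual_salary format=DOLLAR12.2
        from phc6089.salary
        /* The subquery returns a single value: the overall average salary. */
        where Employee_annual_salary >
              (select avg(Employee_annual_salary) from phc6089.salary)
        order by Employee_annual_salary desc;
QUIT;
```

The summary function sits inside the subquery's own SELECT clause, so the restriction on summary functions in WHERE does not apply here.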

16.7. Querying Multiple Tables


So far, all the examples in this lesson have queried a single table. However, as a matter of fact, you can specify multiple tables in the
FROM clause. Querying more than one table at a time makes PROC SQL even more powerful for data manipulation.

The following examples use two tables:

Survey Data (survey.sas7bdat) contains:

ID, Gender, GPA, SmokeCigarettes, SATM, SATV

id Gender GPA SmokeCig SATM SATV

1001 Male 2.67 No 700 700

1002 Female 2.98 No 700 500

1003 Female 2.67 No 470 470

1004 Female 3.6 No 710 560

1005 Female 3.76 No 600 520

1006 Male 3.86 No 610 720

1007 Male 3.94 No 710 670

1008 Male 2.8 Yes 610 580

1009 Male 3.48 No 690 620

Survey2 Data (survey2.sas7bdat) contains:

ID, Seating, DriverInfluence, Height, Weight

id Seating DriverInfluen Height Weight

1001 Middle No 68 190

1002 Middle No 54 110

1003 Middle No 65 225

1004 Middle No 52 135


1005 Back No 72 128

1006 Middle No 70 188

1007 Back No 70 155

1008 Middle Yes 68 160

1009 Front No 72 160

Download these two tables if you have not done so. Revise the LIBNAME statement to reflect the directory where you saved the files.

Example
The following program attempts to get demographic information about students from two separate tables, survey and survey2:

PROC SQL;
    create table demo_info as
        select ID, Gender, Height, Weight
            from phc6089.survey, phc6089.survey2;
QUIT;

423 ods listing close;ods html5 (id=saspy_internal) file=stdout
options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
423! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
424
425 PROC SQL;
426 create table demo_info as
427 select ID, Gender, Height, Weight
428 from phc6089.survey, phc6089.survey2;
NOTE: Data file PHC6089.SURVEY.DATA is in a format that is native to
another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which
might require additional CPU resources and might reduce
performance.
NOTE: Data file PHC6089.SURVEY2.DATA is in a format that is native
to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which
might require additional CPU resources and might reduce
performance.
ERROR: Ambiguous reference, column ID is in more than one table.
NOTE: PROC SQL set option NOEXEC and will continue to check the
syntax of statements.
429 QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

430
431 ods html5 (id=saspy_internal) close;ods listing;

432

Let’s review the code. In this SQL procedure, we used the CREATE TABLE statement to save the new table under the name demo_info.
The subsequent SELECT clause chooses the ID, Gender, Height and Weight columns from the two tables. In the FROM clause, the two
tables’ names are listed.

Launch and run the SAS program. You should expect no result in the output window because the CREATE TABLE statement
suppresses output. On the other hand, check the log window and you will find the error message: “Ambiguous reference, column
ID is in more than one table”.

As you observed in the two tables, ID is in both tables and contains the same information. If a column in the SELECT statement appears
in multiple tables, the table it is taken from has to be specified by adding the table’s name in front, like this:

Table.Column

So to make it right, we revise the previous program a little bit: change ID to survey.ID, which means that we use ID from the survey
data. The other change is to the tables’ names. You can give a table an alias, with or without the keyword AS, after its original name. In
the following program, we use S1 for the survey data and S2 for the survey2 data. And as you can see, it’s okay to use a one-level alias
even for a permanent file. This makes life easier! In this way, ID can be specified as S1.ID.

PROC SQL;
    create table demo_info as
        select s1.ID, Gender, Height, Weight
            from phc6089.survey as s1, phc6089.survey2 as s2;
QUIT;

434 ods listing close;ods html5 (id=saspy_internal) file=stdout
options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
434! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
435
436 PROC SQL;
437 create table demo_info as
438 select s1.ID, Gender, Height, Weight
439 from phc6089.survey as s1, phc6089.survey2 as s2;
NOTE: Data file PHC6089.SURVEY.DATA is in a format that is native to
another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which
might require additional CPU resources and might reduce
performance.
NOTE: Data file PHC6089.SURVEY2.DATA is in a format that is native
to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which
might require additional CPU resources and might reduce
performance.
NOTE: The execution of this query involves performing one or more
Cartesian product joins that can not be optimized.
NOTE: Table WORK.DEMO_INFO created, with 51076 rows and 4 columns.

440 QUIT;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

441
442 ods html5 (id=saspy_internal) close;ods listing;

443

Everything seems good. Now launch and run the SAS program. As before, there is no output because of the CREATE TABLE
statement. Check the log file in which there are two notes that need your attention (see the last two notes above).

The first is “The execution of this query involves performing one or more Cartesian product joins that can not be optimized”. What
is a Cartesian product? It refers to a query result in which each row in the first table is combined with every row in the second
table. If you specify multiple tables in FROM clause but do not use a WHERE clause to choose needed rows, a Cartesian product
is generated. For example, if we submit the following program:

PROC SQL;
    select *
        from table1, table2;
QUIT;

Table1 has 3 rows; Table2 has 3 rows as well. Their Cartesian product contains 3*3 = 9 rows.


Table1

name value1
x 1
y 2
z 3

Table2

name value2
A 4
B 5
C 6

Result:

name value1 name value2


x 1 A 4
x 1 B 5
x 1 C 6
y 2 A 4
y 2 B 5
y 2 C 6
z 3 A 4
z 3 B 5
z 3 C 6
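
To reproduce this small illustration yourself, the two tables can be built with DATA steps first. This is only a sketch: it assumes the tables live in the WORK library under the names table1 and table2, with the values shown above.

```sas
/* Build the two small illustrative tables in the WORK library. */
DATA table1;
    input name $ value1;
    datalines;
x 1
y 2
z 3
;
RUN;

DATA table2;
    input name $ value2;
    datalines;
A 4
B 5
C 6
;
RUN;

/* No WHERE clause: every row of table1 is paired with every row of table2. */
PROC SQL;
    select *
        from table1, table2;
QUIT;
```

Running it should list the nine combinations in the Result table, and the log should carry the Cartesian product note discussed above.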

In the program for this example, there is no WHERE clause, so SAS generated a Cartesian product and gave you the note. Survey
and Survey2 each have 226 rows, so the query produces 226*226 = 51076 rows. That’s why you got
the other note, “Table WORK.DEMO_INFO created, with 51076 rows and 4 columns.” Clearly, this can’t be correct. How do we get the
desired result? Let’s make a final push.

Example
The following program selects the demographic information of students (ID, gender, height and weight) from two tables, survey
and survey2:

PROC SQL;
    create table demo_info as
        select s1.ID, Gender, Height, Weight
            from phc6089.survey as s1, phc6089.survey2 as s2
            where s1.ID = s2.ID;
    select *
        from demo_info
        where ID < 1010;
QUIT;

id Gender Height Weight

1001 Male 67 190

1002 Female 54 110

1003 Female 65 225

1004 Female 52 135

1005 Female 72 128

1006 Male 70 188

1007 Male 70 155

1008 Male 68 160

1009 Male 72 160

Let’s check through the code. Only one more clause has been added to the query: WHERE. We use the WHERE clause to subset
the whole Cartesian product by selecting only the rows with matching ID numbers. Note that the column names in the WHERE
clause do not have to be the same. Finally, to be able to check the table ourselves, another query is added to display the data in
the output window.

Launch and run the SAS program, and review the log file and the output.

NOTE: Table WORK.DEMO_INFO created, with 226 rows and 4 columns.

Finally, we got what we wanted. As you can see from the query result, it’s like combining two columns from each table horizontally.
SAS calls this a join. In this particular case, since we only chose the matched rows, it’s also called an inner join. This type of join
is very similar to MERGE with a BY statement in the DATA step, but requires less computing resources and less coding. There are other
types of joins and data unions (a vertical combination of rows) in PROC SQL, which are beyond this lesson’s scope. If you are interested,
you can explore them yourself with the foundation of this lesson!
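
For reference, PROC SQL also accepts an explicit INNER JOIN ... ON form. The sketch below is an alternative spelling of the same query, not the form used in this lesson, and is expected to produce the same demo_info table as the WHERE version above.

```sas
PROC SQL;
    create table demo_info as
        select s1.ID, Gender, Height, Weight
            from phc6089.survey as s1
                 inner join phc6089.survey2 as s2
                 on s1.ID = s2.ID;   /* the matching condition moves from WHERE to ON */
QUIT;
```

Writing the matching condition in an ON clause keeps it separate from any row filters you may later add in a WHERE clause.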

By Dr. Robert Parker


© Copyright 2020.
