SQL
Introduction
• Introduction to SQL
• Introduction to PROC SQL
• Writing a PROC SQL Step
• Selecting Rows and Columns
• Ordering Rows
• Querying Multiple Tables
• Using SQL Joins
• Summarizing Data
• Additional Features
• Using Union, Outer Union, Except and
Intersect Operators
• Creating Tables and Views
Select columns
Summarize data
Create a table as the output of a query
Private and Confidential
Why PROC SQL?
• Intuitive language
• Fuzzy merge
• Summarize data
Why PROC SQL?
10 Text wrapping
When you have a long character variable (such as a COMMENT field in
the questionnaire), and you want to print the values using PROC
PRINT, you will get the warning message:
• WARNING: Data too long for column "COMMENT"; truncated to 124
characters to fit.
• A simple solution is to use PROC SQL with the flow option. An
alternative would be to use PROC REPORT.
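The FLOW option mentioned above can be sketched as follows. This is a minimal illustration, not the deck's own example: the table work.survey and its columns id and comment are hypothetical. FLOW=20 40 asks PROC SQL to wrap long character columns onto several lines, letting the column width float between 20 and 40 characters, instead of truncating them in the listing output.

```sas
/* Hypothetical data: work.survey is an assumption, not a table from this deck */
data work.survey;
  length comment $200;
  id = 1;
  comment = "The training was useful, but the room was too small and the sessions ran over the scheduled time on both days.";
  output;
run;

/* FLOW=20 40: wrap character columns, width floats between 20 and 40 */
proc sql flow=20 40;
  select id, comment
    from work.survey;
quit;
```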
Introduction to PROC SQL
The following figure summarizes the variety of source material that you can use
with PROC SQL and what the procedure can produce.
PROC SQL differs from most other SAS procedures in several ways:
Unlike other PROC statements, many statements in PROC SQL are composed
of clauses.
Example
The following PROC SQL step contains two statements: the
PROC SQL statement and the SELECT statement. The SELECT
statement
contains several clauses: SELECT, FROM, WHERE, and ORDER
BY.
proc sql;
select empid,jobcode,salary, salary*.06 as bonus
from sasuser.payrollmaster where salary<32000
order by jobcode;
Quit;
Example
If you run the query below, a message appears in the SAS log.
proc sql; select empid,jobcode,salary, salary*.06 as
bonus
from sasuser.payrollmaster where
salary<32000 order by jobcode;
Quit;
• Unlike many other SAS procedures, PROC SQL continues to run after you submit
a step.
• To end the procedure, you must submit another PROC step, a DATA step, or a
QUIT statement.
Example
proc sql;
select empid,jobcode,salary, salary*.06 as bonus from
sasuser.payrollmaster where salary<32000 order by jobcode;
quit;
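Because the procedure keeps running until it is ended, one PROC SQL step can hold several statements. The sketch below assumes only the sashelp.class sample table that ships with SAS; the two queries are illustrative.

```sas
proc sql;
  /* First statement: executes without a RUN statement */
  select Name, Age
    from sashelp.class
    where Age < 13;

  /* Second statement in the same step; PROC SQL is still running */
  select Sex, count(*) as N
    from sashelp.class
    group by Sex;
quit;  /* ends the procedure */
```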
• Before creating a query, you must first reference the library in which your
table is stored.
• Then you write a PROC SQL step to query your table.
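The two steps above can be sketched as follows. The libref mylib, the directory path, and the table name are placeholders chosen for illustration; substitute the location of your own SAS library.

```sas
/* The path below is a placeholder; replace it with a real directory */
libname mylib "/path/to/sas/data";

proc sql;
  select *
    from mylib.payrollmaster;  /* hypothetical table stored in mylib */
quit;
```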
For all employees in the payroll master table whose salary is less
than 32,000, the query below retrieves the employee ID, job code, and
salary, calculates a bonus (6% of salary), and sorts the output in
ascending order of jobcode.
proc sql;
select empid,jobcode,salary, salary*.06 as bonus
from sasuser.payrollmaster
where salary<32000
order by jobcode;
Quit;
SELECT column_name,column_name
FROM table_name;
or
SELECT * FROM table_name;
Example
Suppose you want to print only the name, age, and sex of all students from the
sashelp.class table:
Proc Sql;
Select Name,Age,Sex from sashelp.class;
Quit;
• To create a new column, include any valid SAS expression in the SELECT clause
list of columns.
• You can optionally assign a column alias, a name, to a new column by using the
keyword AS followed by the name that you would like to use.
Example
Proc sql;
select empid,jobcode,salary, salary*.06 as bonus
from sasuser.payrollmaster where salary<32000
order by jobcode ;
Quit;
• After writing the SELECT clause, you have to specify the table to be queried in
the FROM clause.
• Type the keyword FROM, followed by the name of the table.
Proc Sql;
SELECT CustomerName, City FROM
Customers;
Quit;
The PROC SQL step above queries the "CustomerName" and
"City" columns from the "Customers" table.
In the following PROC SQL query, the WHERE clause selects rows in which the
value of the column Salary is less than 32,000.
proc sql;
select empid,jobcode,salary,
salary*.06 as bonus
from sasuser.payrollmaster
where salary<32000
order by jobcode;
Quit;
PROC SQL supports almost all of the functions available in the SAS DATA
step; they can be used in a PROC SQL SELECT statement.
Common Functions
COUNT NMISS
DISTINCT RANGE
MAX SUBSTR
MIN LENGTH
SUM UPPER
AVG LOWER
VAR CONCAT
STD ROUND
STDERR MOD
proc sql;
select Count(country) as count
from country;
quit;
Output:
Count
11
proc sql;
select sum(population) as tot_population
from country;
quit;
Output:
Total population
4,010,014,754
proc sql;
select Count(Distinct continent) as dist_count
from country;
quit;
Output:
Dist_Count
5
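The summary functions above return one value per query or group. The row-level functions from the list (UPPER, SUBSTR, LENGTH, ROUND, MOD) return one value per row. A minimal sketch, assuming the sashelp.class sample table; the column aliases are illustrative.

```sas
proc sql;
  select Name,
         upper(Name)        as Name_Upper,      /* uppercase copy of Name   */
         substr(Name, 1, 3) as Name_Prefix,     /* first three characters   */
         length(Name)       as Name_Length,     /* length without trailing blanks */
         round(Height)      as Height_Rounded,  /* nearest integer          */
         mod(Age, 2)        as Age_Is_Odd       /* 1 if Age is odd, else 0  */
    from sashelp.class;
quit;
```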
• The order of rows in the output of a PROC SQL query cannot be guaranteed, unless you
specify a sort order.
• To sort rows by the values of specific columns, you can use the ORDER BY clause in the
SELECT statement.
• Specify the keywords ORDER BY, followed by one or more column names separated by
commas.
In the following PROC SQL query, the SELECT statement selects all customers from the
"Customers" table, sorted by the "Country" column:
Proc sql;
SELECT * FROM Customers
ORDER BY Country;
Quit;
Example
Proc sql;
SELECT * FROM Customers
ORDER BY Country DESC;
Quit;
Example
In the following PROC SQL query, the ORDER BY clause sorts by the
values of two columns, JobCode and EmpID:
Proc sql;
select empid,jobcode,salary, salary*.06 as bonus
from sasuser.payrollmaster
where salary<32000
order by jobcode,empid;
Quit;
In the above example, the rows are sorted first by JobCode and then by EmpID.
Example
The preceding ORDER BY clause could be
rewritten as follows:
order by 2,empid;
The PROC SQL steps shown so far create detail reports. But you might also
want to summarize data in groups.
To summarize your data, you can submit the following PROC SQL step:
Proc sql;
select membertype, sum(milestraveled) as TotalMiles from
sasuser.frequentflyers group by membertype;
Quit;
• This topic deals with the more complex task of extracting data from two or
more tables.
• In the previous practice, we wrote a PROC SQL step to query a single table.
Suppose you now want to examine data that is stored in two tables.
• PROC SQL allows you to combine tables horizontally, in other words, to
combine rows of data.
Table A Table B
PROC SQL;
CREATE TABLE new AS
SELECT a.patient,
a.date,
a.med,
b.pulse,
b.temp
FROM dosing as a, vitals as b
WHERE a.patient=b.patient
AND a.date=b.date;
QUIT;
PROC SQL;
CREATE TABLE new1 AS
SELECT a.patient,
a.date,
a.med,
b.pulse,
b.temp
FROM dosing as a LEFT JOIN vitals as b
ON a.patient=b.patient
AND a.date=b.date;
QUIT;
The resulting dataset contains all observations from the DOSING dataset, and
only those: VITALS observations without a matching DOSING row are excluded.
PROC SQL;
CREATE TABLE new1 AS
SELECT a.patient,
a.date,
a.med,
b.pulse,
b.temp
FROM dosing as a RIGHT JOIN vitals as b
ON a.patient=b.patient
AND a.date=b.date;
QUIT;
The resulting dataset contains all observations from the VITALS dataset, and
only those: DOSING observations without a matching VITALS row are excluded.
PROC SQL;
CREATE TABLE new1 AS
SELECT a.patient,
a.date,
a.med,
b.pulse,
b.temp
FROM dosing as a FULL JOIN vitals as b
ON a.patient=b.patient
AND a.date=b.date;
QUIT;
The resulting dataset contains every observation that appears in at least
one of the two datasets.
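In an outer join, rows that exist in only one table have missing values for the columns selected from the other table, including the key columns. A common way to keep the keys populated is COALESCE, sketched below. It assumes that patient and date have the same type in both tables; the output name work.new_full is illustrative.

```sas
proc sql;
  create table work.new_full as
  select coalesce(a.patient, b.patient) as patient,  /* key from whichever table has it */
         coalesce(a.date, b.date)       as date,
         a.med,
         b.pulse,
         b.temp
    from dosing as a full join vitals as b
      on a.patient = b.patient
     and a.date = b.date;
quit;
```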
Action: specify whether PROC SQL prints the query’s result
Keyword: PRINT|NOPRINT
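A typical use of NOPRINT is a query whose result is needed only as a value, not as a report. The sketch below stores a row count in a macro variable; the INTO clause and the macro variable name n_students go beyond what the slide shows and are used here only for illustration, with the sashelp.class sample table.

```sas
proc sql noprint;
  /* No report is produced; the count is stored in a macro variable instead */
  select count(*)
    into :n_students trimmed
    from sashelp.class;
quit;

/* Write the stored value to the SAS log */
%put NOTE: sashelp.class has &n_students rows.;
```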
General form, basic PROC SQL step for creating a table from a query result:
PROC SQL;
CREATE TABLE table-name AS
SELECT column-1<, . . . column-n>
FROM table-1 | view-1<, . . . table-n | view-n>
<WHERE expression>
<GROUP BY column-1<, ... column-n>>
<ORDER BY column-1<, . . . column-n>>
Quit;
where table-name specifies the name of the table to be created.
Suppose that after determining the total miles traveled for each frequent-flyer
membership class in the Sasuser.Frequentflyers table, you want to store this
information in the temporary table Work.Miles.
Proc sql;
create table work.miles as
select membertype, sum(milestraveled) as TotalMiles
from sasuser.frequentflyers group by membertype;
Quit;
Example
The following PROC SQL query groups the output rows by JobCode.
The HAVING clause uses the summary function AVG to specify that only the
groups that have an average salary that is greater than 40,000 will be
displayed in the output.
Proc sql;
select jobcode,avg(salary) as Avg
from sasuser.payrollmaster group by jobcode having
avg(salary)>40000 order by jobcode;
Quit;
Using PROC SQL, you can quickly count non-missing values for several variables and output
the result on one line. (PROC FREQ would produce several output tables with the output
sometimes continuing on to the next page.)
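A minimal sketch of this one-line count, assuming the sashelp.heart sample table and three of its columns. COUNT(column) counts only non-missing values, while COUNT(*) counts all rows, so the difference is the number of missing values.

```sas
proc sql;
  select count(*)           as N_Rows,         /* all rows                   */
         count(AgeAtDeath)  as N_AgeAtDeath,   /* non-missing values only    */
         count(Cholesterol) as N_Cholesterol,
         count(Smoking)     as N_Smoking
    from sashelp.heart;
quit;
```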
• In our administrative tables, the Social Security Number (SSN) field is not very
well populated.
• We need to go after different sources: membership tables (old, current, and
daily) and other utilization tables.
• PROC SQL makes the selection process very easy, where the COALESCE function
will pick the first non-missing value.
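The selection described above might look like the sketch below. All table and column names (work.admin, work.member_current, work.member_old, id, ssn) are hypothetical stand-ins for the administrative and membership tables. COALESCE returns its first non-missing argument, so the order of the arguments sets the priority of the sources.

```sas
proc sql;
  create table work.ssn_filled as
  select a.id,
         /* Take the SSN from the first source that has one */
         coalesce(a.ssn, c.ssn, o.ssn) as ssn
    from work.admin as a
         left join work.member_current as c on a.id = c.id
         left join work.member_old     as o on a.id = o.id;
quit;
```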
PROC SQL is more intuitive than PROC MEANS or PROC SUMMARY, where by default SAS
creates an output table that contains more rows and columns than you need
and you have to choose the right _TYPE_ value.
PROC SQL below summarizes the total number of cases by age group for each year.
PROC SQL;
SELECT YEAR, AGEGRP, SUM(CASES) AS CASES
FROM work.data1
GROUP BY 1,2;
QUIT;
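PROC SQL;
/* Fuzzy merge: each comparison evaluates to 1 (true) or 0 (false) and is
   multiplied by a weight. Pairs whose total score is at least 9 are kept. */
CREATE TABLE REVIEW AS
SELECT *
FROM SAMPLE, OW_DEATH
WHERE SUM( ((KBMON =SBMON)*2), ((KBDAY =SBDAY)*1),
((KBYEAR =SBYEAR)*2),
((KFSNDX =SFSNDX)*1), ((KFNAME =SFNAME)*1),
((KMNAME =SMNAME)*1),
((KLSNDX =SLSNDX)*1), ((KLNAME =SLNAME)*1),
/* SSN counts only when it is not blank */
((KSEX =SSEX)*2) , ((KSSN =SSSN and KSSN ne ' ')*4) )
>=9;
QUIT;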
Partial train_a
ID   Name   End_Date
11   Bob    15JUN2012
16   Sam    5JUN2012
14   Pete   21JUN2012
Training class A is completed in a single session.
End_Date represents the date of training.
Partial train_b
Name    ID   SDate       EDate
Bob     11   9JUL2012    13JUL2012
Pam     15   25JUL2012   27JUL2012
Kyle    19   12JUL2012   20JUL2012
Chris   21   29JUL2012   .
Training class B is a multi-session class.
SDate is recorded on the first training day.
EDate is recorded when the course is complete.
Discussion
Set operators use the intermediate result sets from two queries to create a
final result set.
Query 1: intermediate result set 1 (RS1)
Query 2: intermediate result set 2 (RS2)
UNION
Question: Which employees have completed training A or B?
Query 1: List employees that have completed train_a. (RS1)
Query 2: List employees that have completed train_b. (RS2)
Final result set: RS1 UNION RS2
OUTER UNION
Question: Which employees have completed training A and/or B and on what dates?
Query 1: List employees that have completed train_a and the completion date. (RS1)
Query 2: List employees that have completed train_b and the completion date. (RS2)
Final result set: RS1 OUTER UNION RS2
EXCEPT
Question: Which employees have completed training A, but not training B?
Query 1: List employees that have completed train_a. (RS1)
Query 2: List employees that have completed train_b. (RS2)
Final result set: RS1 EXCEPT RS2
INTERSECT
Question: Which employees have completed both training A and B?
Query 1: List employees that have completed train_a. (RS1)
Query 2: List employees that have completed train_b. (RS2)
Final result set: RS1 INTERSECT RS2
OUTER UNION: all rows from both result sets; all columns from both result sets.
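The four set operators can be tried on the partial train_a and train_b tables. The sketch below rebuilds them in the Work library from the rows shown earlier. It treats a missing EDate as "class B not yet completed", which is an interpretation of the slide notes, and it uses the CORR keyword so that OUTER UNION aligns columns by name instead of placing them side by side.

```sas
/* Rebuild the partial tables from the slides in the Work library */
data work.train_a;
  input ID Name $ End_Date :date9.;
  format End_Date date9.;
  datalines;
11 Bob 15JUN2012
16 Sam 5JUN2012
14 Pete 21JUN2012
;
run;

data work.train_b;
  input Name $ ID SDate :date9. EDate :date9.;
  format SDate EDate date9.;
  datalines;
Bob 11 9JUL2012 13JUL2012
Pam 15 25JUL2012 27JUL2012
Kyle 19 12JUL2012 20JUL2012
Chris 21 29JUL2012 .
;
run;

proc sql;
  title "UNION: completed training A or B";
  select ID, Name from work.train_a
  union
  select ID, Name from work.train_b where EDate is not missing;

  title "OUTER UNION CORR: completed A and/or B, with dates";
  select ID, Name, End_Date from work.train_a
  outer union corr
  select ID, Name, EDate as End_Date from work.train_b
    where EDate is not missing;

  title "EXCEPT: completed A but not B";
  select ID, Name from work.train_a
  except
  select ID, Name from work.train_b where EDate is not missing;

  title "INTERSECT: completed both A and B";
  select ID, Name from work.train_a
  intersect
  select ID, Name from work.train_b where EDate is not missing;
quit;
title;  /* clear the title */
```

UNION, EXCEPT, and INTERSECT match columns by position and remove duplicate rows, so both queries select ID and Name in the same order.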
PROC SQL combines two input tables into one output table.
Input tables:
• orion.employee_addresses: Employee_ID, Employee_Name, City
• orion.employee_payroll: Employee_ID, Birth_Date
Output table: orion.birthmonths
Name             City        Birth Month
Aisbitt, Sandy   Melbourne   1
Sheedy, Sherie   Melbourne   1
Tannous, Cos     Melbourne   1
Creating a Table: Method 1
For this task, use method 1.
Create a table that contains columns and rows returned by a query on
existing tables.
proc sql;
create table orion.birthmonths as
/* Define table columns */
select Employee_Name as Name format=$25.,
City format=$25.,
month(Birth_Date) as BirthMonth
'Birth Month' format=3.
/* Join tables */
from orion.employee_payroll as p,
orion.employee_addresses as a
where p.Employee_ID=a.Employee_ID
and Employee_Term_Date is missing
order by BirthMonth,City,Name;
quit;
Partial SAS Log
NOTE: Table ORION.BIRTHMONTHS created, with 308 rows and 3 columns.
proc sql;
describe table orion.birthmonths;
select * from orion.birthmonths;
quit;
The DESCRIBE statement writes information about the table to the SAS log.
Partial SAS Log
NOTE: SQL table ORION.BIRTHMONTHS was created like:
• The SELECT statement creates a report that lists the contents of the table.
• Partial PROC SQL Output
Name                  City         Birth Month
----------------------------------------------
Aisbitt, Sandy        Melbourne    1
Sheedy, Sherie        Melbourne    1
Tannous, Cos          Melbourne    1
Boocks, Michael. R.   Miami-Dade   1
Chinnis, Kumar        Miami-Dade   1
Management wants a table for new sales staff that is structured like
the orion.sales table.
Existing table: orion.sales. New table created by PROC SQL: work.new_sales_staff.
Copy the table structure from an existing table with the LIKE clause.
proc sql;
create table work.new_sales_staff
like orion.sales;
quit;
Partial SAS Log
NOTE: Table WORK.NEW_SALES_STAFF created, with 0 rows and 9 columns.
PROC SQL
Tom Zhou
The data that Tom needs is name, job title, salary, and years of service.
This data is contained in three tables.
orion.employee_addresses
orion.employee_payroll
orion.employee_organization
What is the best way to help Tom, given the following requirements:
proc sql;
create view orion.tom_zhou as
select Employee_Name as Name format=$25.0,
Job_Title as Title format=$15.0,
Salary 'Annual Salary' format=comma10.2,
int((today()-Employee_Hire_Date)/365.25)
as YOS 'Years of Service'
/* One-level table names in a stored view default to the library
   that contains the view (orion) */
from employee_addresses as a,
employee_payroll as p,
employee_organization as o
where a.Employee_ID=p.Employee_ID and
o.Employee_ID=p.Employee_ID and
Manager_ID=120102;
quit;
Partial SAS Log
NOTE: SQL view ORION.TOM_ZHOU has been defined.
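Once the view is defined, it can be queried like a table. A minimal sketch, assuming the orion library is assigned and the view above exists; the choice of columns and sort order is illustrative.

```sas
proc sql;
  /* The view's stored query runs each time the view is referenced */
  select Name, Title, YOS
    from orion.tom_zhou
    order by YOS desc;
quit;
```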