SQL

The document provides an introduction to using PROC SQL in SAS. It discusses: 1) PROC SQL allows querying data from SAS data files, views, and other databases. It can produce reports, views, tables, and macro variables. 2) PROC SQL differs from other SAS procedures in that its statements are composed of clauses and it executes queries immediately without a RUN statement. 3) Writing a basic PROC SQL query involves specifying the table or view, columns, optional WHERE, GROUP BY, and ORDER BY clauses between a PROC SQL and QUIT statement.

Uploaded by

Rohit Ghai
Copyright
© © All Rights Reserved
SQL PROC in SAS

Introduction

In this session, you will learn about:

• Introduction to SQL
• Introduction to PROC SQL
• Writing a PROC SQL Step
• Selecting Rows and Columns
• Ordering Rows
• Querying Multiple Tables
• Using SQL Joins
• Summarizing Data
• Additional Features
• Using UNION, OUTER UNION, EXCEPT, and INTERSECT Operators
• Creating Tables and Views

Private and Confidential 2


Introduction to SQL

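Introduction to SQL

• SQL stands for Structured Query Language.
• It is based on the relational model that E. F. Codd published at IBM in 1970; the language itself was developed at IBM by Donald Chamberlin and Raymond Boyce in the early 1970s.
• It is mainly used for updating and retrieving data from relational databases.
• SQL is now a public standard (ANSI/ISO).
• PROC SQL can be viewed as a language within a language.
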


Introduction to
Proc SQL
Introduction to Proc SQL
The SAS SQL procedure enables you to:

• Invoke the SQL procedure
• Select columns
• Define new columns
• Specify the table(s) to be read
• Specify subsetting criteria
• Order rows by values of one or more columns
• Group results by values of one or more columns
• End the SQL procedure
• Summarize data
• Generate a report as the output of a query
• Create a table as the output of a query

Why PROC SQL?

01 Intuitive language
02 To produce summarized statistics (reports) without modifying the parent table
03 Creating a new dataset from multiple datasets
04 It is good at accessing data stored in multiple data sets at different levels
05 It can perform matching where the condition of a match is not equality
06 Useful in mining text



Why PROC SQL?

Top 10 reasons to use SAS PROC SQL:

1 Fuzzy merge
2 Summarize data
3 COALESCE function
4 Insert records to a table
5 Count frequencies
   Using PROC SQL, you can quickly count non-missing values for several variables and output the result on one line.
6 Matching multiple tables at different levels
7 Access other databases
   PROC SQL lets you join a SAS table and an Oracle table in a single query.
8 Join tables
   The advantages are:
   • No sorting needed
   • Two tables can join on different variable names
9 Build macro value list
   • Outputting information to title/footnote statements using macro variables when you don't know the new values (department code) in advance.
   • Using the macro variable as the value list for the IN operator
10 Text wrapping
   When you have a long character variable (such as a COMMENT field in a questionnaire) and you print its values using PROC PRINT, you get the warning message:
   • WARNING: Data too long for column "COMMENT"; truncated to 124 characters to fit.
   • A simple solution is to use PROC SQL with the FLOW option. An alternative would be to use PROC REPORT.
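Reason 9 (building a macro value list) can be sketched as follows. This is a minimal illustration, assuming the sample table sashelp.class that ships with SAS; the macro variable name sexlist is hypothetical.

```sas
/* Build a comma-separated, quoted list of the distinct values of SEX */
proc sql noprint;
   select distinct quote(strip(sex))
      into :sexlist separated by ','
      from sashelp.class;
quit;

/* Write the list to the log to check what was stored */
%put NOTE: sexlist=&sexlist;

/* Reuse the list as the value list of an IN operator */
proc sql;
   select name, sex
      from sashelp.class
      where sex in (&sexlist);
quit;
```

The INTO clause stores query results in macro variables, so the list does not have to be known in advance.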
Introduction to PROC SQL

The following figure summarizes the variety of source material that you can use
with PROC SQL and what the procedure can produce.

PROC SQL Input


• PROC SQL tables (SAS data files)
• SAS data views (PROC SQL views/DATA
step views/SAS/ACCESS views)
• DBMS tables

PROC SQL Output


• Reports
• PROC SQL views
• PROC SQL tables (SAS data files)
• Macro Variables

Ref: http://www.koerup.dk/Cert/


How is PROC SQL Unique?

PROC SQL differs from most other SAS procedures in several ways:

Unlike other PROC statements, many statements in PROC SQL are composed
of clauses.

Example
The following PROC SQL step contains three statements: the PROC SQL statement, the SELECT statement, and the QUIT statement that ends the procedure. The SELECT statement contains several clauses: SELECT, FROM, WHERE, and ORDER BY.

proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
      from sasuser.payrollmaster
      where salary<32000
      order by jobcode;
quit;

How is PROC SQL Unique?
PROC SQL differs from most other SAS procedures in several ways:
• The PROC SQL step does not require a RUN statement. PROC SQL executes each query
automatically.
• If you use a RUN statement with a PROC SQL step, SAS ignores the RUN statement,
executes the statements as usual, and generates the note shown below in the SAS log.

Example

If you submit the query below with a RUN statement, you see the following note in the SAS log:

proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
      from sasuser.payrollmaster
      where salary<32000
      order by jobcode;
run;
quit;

NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
How is PROC SQL Unique?
PROC SQL differs from most other SAS procedures in several ways:

• Unlike many other SAS procedures, PROC SQL continues to run after you submit
a step.
• To end the procedure, you must submit another PROC step, a DATA step, or a
QUIT statement.

Example
proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
      from sasuser.payrollmaster
      where salary<32000
      order by jobcode;
quit;

Quiz
How many SAS statements does the PROC SQL step below
contain?
proc sql;
select empid,jobcode,salary,
salary*.06 as bonus
from sasuser.payrollmaster
where salary<32000
order by jobcode;

• Two
• Five
• Six
• None



Quiz – Answer
How many SAS statements does the PROC SQL step below contain?
proc sql;
select empid,jobcode,salary,
salary*.06 as bonus
from sasuser.payrollmaster
where salary<32000
order by jobcode;

Answer: Two. The step contains the PROC SQL statement and one SELECT statement. SELECT, FROM, WHERE, and ORDER BY are clauses of that single SELECT statement.



Writing a PROC
SQL Step
Writing a PROC SQL Step

• Before creating a query, you must first reference the library in which your
table is stored.
• Then you write a PROC SQL step to query your table.
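As a minimal sketch of these two steps, the code below first references a library and then queries a table in it. The libref mylib and the table mylib.class are hypothetical; the libref is pointed at the WORK directory only so that the sketch runs on any SAS installation, and sashelp.class is a sample table that ships with SAS.

```sas
/* Step 1: reference the library (here the WORK directory, as an assumption) */
libname mylib "%sysfunc(pathname(work))";

/* Put a table into the library so there is something to query */
data mylib.class;
   set sashelp.class;
run;

/* Step 2: query the table by its two-level name, libref.table */
proc sql;
   select name, age
      from mylib.class;
quit;

/* Release the libref when it is no longer needed */
libname mylib clear;
```

In practice the LIBNAME statement would point at the directory or database where your own tables are stored.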

General form, basic PROC SQL step:

PROC SQL;
   SELECT column-1<, ... column-n>
      FROM table-1 | view-1<, ... table-n | view-n>
      <WHERE expression>
      <GROUP BY column-1<, ... column-n>>
      <ORDER BY column-1<, ... column-n>>;
QUIT;

Writing a PROC SQL Step

Where:

PROC SQL    Invokes the SQL procedure
SELECT      Specifies the column(s) to be selected
FROM        Specifies the table(s) to be queried
WHERE       Subsets the data based on a condition
GROUP BY    Classifies the data into groups based on the specified column(s)
ORDER BY    Sorts the rows that the query returns by the value(s) of the specified column(s)

Writing a PROC SQL Step

The query below fetches the employee ID, job code, and salary, and calculates a bonus, for all employees in the payroll master table whose salary is less than 32,000. It sorts the output in ascending order of job code.

proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
      from sasuser.payrollmaster
      where salary<32000
      order by jobcode;
quit;

Selecting Columns
Selecting Columns

• To specify which column(s) to display in a query, you write a SELECT clause, the first clause in the SELECT statement.
• After the keyword SELECT, list one or more column names, separated by commas.

General syntax of the SELECT statement:

SELECT column_name, column_name
   FROM table_name;
or
SELECT * FROM table_name;

Selecting Columns
With the SELECT clause, you can do the following.

Specify existing columns
You can specify particular columns to display (columns that are already stored in a table).

Example

Suppose you want to print only the name, age, and sex of every student in the sashelp.class table:
proc sql;
   select Name, Age, Sex from sashelp.class;
quit;

Create new columns
You can create new columns that contain either text or a calculation.
New columns appear in the output, along with any existing columns that are selected.

Selecting Columns

• To create a new column, include any valid SAS expression in the SELECT clause
list of columns.
• You can optionally assign a column alias, a name, to a new column by using the
keyword AS followed by the name that you would like to use.

Example

• The following SELECT clause specifies the columns EmpID, JobCode, Salary, and bonus.
• The columns EmpID, JobCode, and Salary are existing columns. The column named bonus is a new column.

A column alias must follow the rules for SAS names.

Selecting Columns

proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
      from sasuser.payrollmaster
      where salary<32000
      order by jobcode;
quit;

The newly created columns exist only for the duration of the query, unless a table or a view is created.

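The two kinds of new column described above (a calculation and a text constant) can be sketched against the sample table sashelp.class, which ships with SAS. The aliases height_cm and note are hypothetical, and the sketch assumes that Height in sashelp.class is recorded in inches.

```sas
proc sql;
   select name,
          height,
          height*2.54 as height_cm format=6.1,  /* new column from a calculation */
          'converted from inches' as note       /* new column holding constant text */
      from sashelp.class;
quit;
```

Both new columns appear only in the query output; sashelp.class itself is unchanged.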
Specifying the Table

• After writing the SELECT clause, you have to specify the table to be queried in
the FROM clause.
• Type the keyword FROM, followed by the name of the table.

proc sql;
   select CustomerName, City
      from Customers;
quit;
The PROC SQL step above queries the "CustomerName" and "City" columns from the "Customers" table.

You can also assign an alias to a table by following the table name with a space (or the optional keyword AS) and the alias.

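A table alias might look like the following minimal sketch, which uses the sample table sashelp.class; the alias c is an arbitrary, hypothetical choice.

```sas
proc sql;
   /* c is an alias for sashelp.class; columns are then qualified as c.column */
   select c.name, c.age
      from sashelp.class as c
      where c.age > 13;
quit;
```

Aliases matter most when a query reads several tables, because they keep qualified column names short.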
Specifying Subsetting Criteria

• To subset data based on a condition, use a WHERE clause in the SELECT statement.
• As in the WHERE statement and the WHERE command used in other SAS procedures, the expression in the WHERE clause can be any valid SAS expression.

In the following PROC SQL query, the WHERE clause selects rows in which the
value of the column Salary is less than 32,000.

proc sql;
select empid,jobcode,salary,
salary*.06 as bonus
from sasuser.payrollmaster
where salary<32000
order by jobcode;
Quit;
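A WHERE clause can combine several conditions. The sketch below is an illustration only, using the sample table sashelp.class rather than the payroll table, with the BETWEEN, IN, and LIKE operators.

```sas
proc sql;
   select name, sex, age
      from sashelp.class
      where age between 12 and 14   /* inclusive range */
        and sex in ('F')            /* membership in a value list */
        and name like 'J%';         /* names that start with J */
quit;
```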

SQL Functions

PROC SQL supports almost all of the functions that are available in the SAS DATA step; they can be used in a PROC SQL SELECT statement.

Common Functions
COUNT       NMISS
DISTINCT    RANGE
MAX         SUBSTR
MIN         LENGTH
SUM         UPPER
AVG         LOWER
VAR         CONCAT
STD         ROUND
STDERR      MOD

DISTINCT is a keyword rather than a function; it is listed here because it is often combined with COUNT.

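A few of the functions listed above can be tried on the sample table sashelp.class. This is a sketch only; the column aliases are hypothetical.

```sas
proc sql;
   /* Functions applied row by row */
   select upper(name)        as name_upper,
          substr(name, 1, 3) as name_prefix,
          length(name)       as name_length,
          round(weight, 10)  as weight_rounded,
          mod(age, 2)        as age_mod_2
      from sashelp.class;

   /* Summary functions applied to the whole table */
   select count(*)      as n_rows,
          nmiss(height) as n_missing_height,
          range(age)    as age_range
      from sashelp.class;
quit;
```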


Example – SQL Function

proc sql;
   select count(country) as count
      from country;
quit;

Output:
Count
11

proc sql;
   select sum(population) as tot_population
      from country;
quit;

Output:
Total population
4,010,014,754



Example – DISTINCT Keyword

proc sql;
   select distinct continent as dist_continent
      from country;
quit;

Output:
Dist_continent
ASIA
N. AMERICA
AFRICA
S. AMERICA
EUROPE

proc sql;
   select count(distinct continent) as dist_count
      from country;
quit;

Output:
Dist_Count
5



Ordering Rows
Ordering Rows

• The order of rows in the output of a PROC SQL query cannot be guaranteed, unless you
specify a sort order.
• To sort rows by the values of specific columns, you can use the ORDER BY clause in the
SELECT statement.
• Specify the keywords ORDER BY, followed by one or more column names separated by
commas.

Syntax of the ORDER BY clause:

SELECT column_name, column_name
   FROM table_name
   ORDER BY column_name <ASC|DESC>, column_name <ASC|DESC>;

Ordering Rows

In the following PROC SQL query, the SELECT statement selects all customers from the
"Customers" table, sorted by the "Country" column:

Proc sql;
SELECT * FROM Customers
ORDER BY Country;
Quit;

• By default, the ORDER BY clause sorts rows in ascending order.
• To sort rows in descending order, use the DESC/DESCENDING keyword after the column name.

Example
proc sql;
   select * from Customers
      order by Country desc;
quit;

Ordering Rows
• You can also use ORDER BY clause to sort the result set by Multiple columns.
• To sort rows by the values of two or more columns, list multiple column names (or
numbers) in the ORDER BY clause, and use commas to separate the column names (or
numbers).

Example
In the following PROC SQL query, the ORDER BY clause sorts by the
values of two columns, JobCode and EmpID:

Proc sql;
select empid,jobcode,salary, salary*.06 as bonus
from sasuser.payrollmaster
where salary<32000
order by jobcode,empid;
Quit;

Ordering Rows

In the above example ,the rows are sorted first by JobCode and then by EmpID.

You can mix the two types of column references, names and numbers, in the ORDER BY clause.

Example
The preceding ORDER BY clause could be
rewritten as follows:
order by 2,empid;
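
A runnable variation on the same idea, sketched against the sample table sashelp.class: a column number and column names are mixed, and DESC is applied to one column only.

```sas
proc sql;
   select name, sex, age
      from sashelp.class
      /* 2 refers to the second column in the SELECT list (sex) */
      order by 2, age desc, name;
quit;
```

A column number refers to a position in the SELECT list, so it has to be revisited if the SELECT list changes.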

Summarizing Groups of Data

So far, the PROC SQL steps shown have created detail reports. You might also want to summarize data in groups.

To group data for summarizing, you can use the GROUP BY clause.

The GROUP BY clause is used in queries that include one or more summary functions.

Summary functions produce a statistical summary for each group that is defined in the GROUP BY clause.

Summarizing Groups of Data – Example

• Suppose you want to determine the total number of miles traveled by frequent-flyer program members in each of three membership classes (Gold, Silver, and Bronze).
• Frequent-flyer program information is stored in the table Sasuser.Frequentflyers.

To summarize your data, you can submit the following PROC SQL step:

proc sql;
   select membertype, sum(milestraveled) as TotalMiles
      from sasuser.frequentflyers
      group by membertype;
quit;

Summarizing Groups of Data

• In this case, the SUM function totals the values of the MilesTraveled column to create the TotalMiles column.
• The GROUP BY clause groups the data by the values of MemberType.
• As in the ORDER BY clause, in the GROUP BY clause you specify the keywords GROUP BY, followed by one or more column names separated by commas.
• The results show total miles by membership class (MemberType).

If you specify a GROUP BY clause in a query that does not contain a summary function, your clause is changed to an ORDER BY clause, and a message to that effect is written to the SAS log.

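The difference between a GROUP BY clause with and without a summary function can be seen in a small sketch on the sample table sashelp.class (an assumption; the frequent-flyer table is not needed to try it).

```sas
proc sql;
   /* No summary function: GROUP BY is treated as ORDER BY,
      every row is returned, and the log reports the change */
   select name, sex, age
      from sashelp.class
      group by sex;

   /* With a summary function: one row per value of sex */
   select sex, avg(age) as avg_age
      from sashelp.class
      group by sex;
quit;
```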
Summarizing Groups of Data
• To summarize data, you can use the following summary functions with
PROC SQL.
• Notice that some functions have more than one name to accommodate both
SAS and SQL conventions.
• Where multiple names are listed, the first name is the SQL name.

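The table of summary functions from the slide is not reproduced here. As a hedged illustration of the naming point only, the sketch below uses two pairs of names that compute the same statistic (AVG and MEAN, COUNT and N) on the sample table sashelp.class.

```sas
proc sql;
   select sex,
          avg(height)   as avg_height,    /* SQL name */
          mean(height)  as mean_height,   /* SAS name, same statistic */
          count(height) as count_height,  /* SQL name */
          n(height)     as n_height       /* SAS name, same statistic */
      from sashelp.class
      group by sex;
quit;
```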
Querying Multiple
Tables and Using SQL
Joins
Querying Multiple Tables

• This topic deals with the more complex task of extracting data from two or more tables.
• In the previous practice, we wrote a PROC SQL step to query a single table. Suppose you now want to examine data that is stored in two tables.
• PROC SQL allows you to combine tables horizontally, in other words, to combine rows of data.

In SQL terminology, combining tables horizontally is called joining tables. Joins do not alter the original tables.

[Figure: Table A and Table B]



Combining Datasets: Joins

Join type     Rows kept
Full Join     If a or b
Inner Join    If a and b
Left Join     If a
Right Join    If b

Combining Datasets: Joins

[Figure: example datasets DOSING and VITALS]



Join Tables (Merge Datasets) – Inner Join: Using WHERE

PROC SQL;
CREATE TABLE new AS
SELECT a.patient,
a.date,
a.med,
b.pulse,
b.temp
FROM dosing as a, vitals as b
WHERE a.patient=b.patient
AND a.date=b.date;
QUIT;

• No prior sorting is required, which is one advantage over the DATA step MERGE.
• Use a comma (,) to separate the two datasets in the FROM clause.
• Without a WHERE clause, the query produces every possible combination of rows from the two tables (a Cartesian product).



Join Tables (Merge Datasets) – Left Join: Using ON

PROC SQL;
   CREATE TABLE new1 AS
      SELECT a.patient,
             a.date,
             a.med,
             b.pulse,
             b.temp
         FROM dosing AS a LEFT JOIN vitals AS b
            ON a.patient=b.patient
               AND a.date=b.date;
QUIT;

The resulting dataset contains every observation from the DOSING dataset and no observations that appear only in VITALS. PULSE and TEMP are filled in where a matching VITALS observation exists and are missing otherwise.



Join Tables (Merge Datasets) – Right Join: Using ON

PROC SQL;
   CREATE TABLE new1 AS
      SELECT a.patient,
             a.date,
             a.med,
             b.pulse,
             b.temp
         FROM dosing AS a RIGHT JOIN vitals AS b
            ON a.patient=b.patient
               AND a.date=b.date;
QUIT;

The resulting dataset contains every observation from the VITALS dataset and no observations that appear only in DOSING. Because PATIENT and DATE are selected from DOSING (alias a), they are missing for VITALS observations that have no match.



Join Tables (Merge Datasets) – Full Join: Using ON

PROC SQL;
   CREATE TABLE new1 AS
      SELECT a.patient,
             a.date,
             a.med,
             b.pulse,
             b.temp
         FROM dosing AS a FULL JOIN vitals AS b
            ON a.patient=b.patient
               AND a.date=b.date;
QUIT;

The resulting dataset contains every observation that comes from at least one of the two datasets. PATIENT and DATE are selected from DOSING (alias a), so they are missing for observations that appear only in VITALS.

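In a right or full join, key columns taken from only one table are missing for unmatched rows from the other table. One common remedy is the COALESCE function, sketched below. The DOSING and VITALS rows here are hypothetical stand-ins, because the data on the slides is not available in text form.

```sas
/* Hypothetical sample data: patient 1 has dosing only, patient 3 has vitals only */
data dosing;
   input patient date :date9. med $;
   format date date9.;
   datalines;
1 01JAN2012 A
2 01JAN2012 B
;

data vitals;
   input patient date :date9. pulse temp;
   format date date9.;
   datalines;
2 01JAN2012 72 98.6
3 01JAN2012 80 99.1
;

proc sql;
   create table joined as
      /* COALESCE returns the first non-missing key, whichever table supplied it */
      select coalesce(a.patient, b.patient) as patient,
             coalesce(a.date, b.date) as date format=date9.,
             a.med,
             b.pulse,
             b.temp
         from dosing as a full join vitals as b
            on a.patient=b.patient
               and a.date=b.date;

   select * from joined;
quit;
```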


Important Options

Keyword                    Action
PRINT | NOPRINT            Specify whether PROC SQL prints the query's result
ERRORSTOP | NOERRORSTOP    Specify whether PROC SQL should stop executing after an error
INOBS=                     Restrict the number of input rows
OUTOBS=                    Restrict the number of output rows
LOOPS=                     Restrict the number of loops
PROMPT | NOPROMPT          Specify whether PROC SQL prompts you when a limit is reached with the INOBS=, OUTOBS=, or LOOPS= options

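Two of these options are sketched below on the sample table sashelp.class. The macro variable name n_students is hypothetical.

```sas
/* OUTOBS= limits the number of rows that the query returns */
proc sql outobs=5;
   select name, age
      from sashelp.class
      order by age;
quit;

/* NOPRINT suppresses the report, which is useful when only a macro variable is wanted */
proc sql noprint;
   select count(*)
      into :n_students
      from sashelp.class;
quit;

%put NOTE: n_students=&n_students;
```

Options are written on the PROC SQL statement itself and stay in effect until the procedure ends or a RESET statement changes them.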


Creating Output Tables

You can also direct the output of your query to a dataset/table.

To create a new table from the results of a query, use a CREATE TABLE statement that includes the keyword AS and the clauses that are used in a PROC SQL query: SELECT, FROM, and any optional clauses, such as ORDER BY. The CREATE TABLE statement stores your query results in a table instead of displaying the results as a report.



Creating Output Tables

General form, basic PROC SQL step for creating a table from a query result:

PROC SQL;
   CREATE TABLE table-name AS
      SELECT column-1<, ... column-n>
         FROM table-1 | view-1<, ... table-n | view-n>
         <WHERE expression>
         <GROUP BY column-1<, ... column-n>>
         <ORDER BY column-1<, ... column-n>>;
QUIT;

where table-name specifies the name of the table to be created.

Creating Output Tables – Example

Suppose that after determining the total miles traveled for each frequent-flyer
membership class in the Sasuser.Frequentflyers table, you want to store this
information in the temporary table Work.Miles.

To do so, you can submit the following PROC SQL step:

proc sql;
   create table work.miles as
      select membertype, sum(milestraveled) as TotalMiles
         from sasuser.frequentflyers
         group by membertype;
quit;

Creating Output Tables

Because the CREATE TABLE statement is used, this query does not create a report.
The SAS log verifies that the table was created and indicates how many rows and columns the table contains.

NOTE: Table WORK.MILES created, with 3 rows and 2 columns.

Additional Features
Additional Features
• To further refine a PROC SQL query that contains a GROUP BY clause, you can
use a HAVING clause.
• A HAVING clause works with the GROUP BY clause to restrict the groups that
are displayed in the output, based on one or more specified conditions.

Example
The following PROC SQL query groups the output rows by JobCode.
The HAVING clause uses the summary function AVG to specify that only the
groups that have an average salary that is greater than 40,000 will be
displayed in the output.

proc sql;
   select jobcode, avg(salary) as Avg
      from sasuser.payrollmaster
      group by jobcode
      having avg(salary)>40000
      order by jobcode;
quit;

Text Wrapping

When you have a long character variable (such as a COMMENT field in a questionnaire) and you print its values using PROC PRINT, you get the warning message:

WARNING: Data too long for column "COMMENT"; truncated to 124 characters to fit.

A solution is to use PROC SQL with the FLOW option. An alternative would be to use PROC REPORT.

PROC SQL FLOW=30;
   SELECT var1, var2
      FROM data1;
QUIT;

FLOW=30 affects all character columns in the output.



Count Frequencies

Using PROC SQL, you can quickly count non-missing values for several variables and output
the result on one line. (PROC FREQ would produce several output tables with the output
sometimes continuing on to the next page.)

PROC SQL;
   SELECT COUNT(*) AS TOTAL,
          COUNT(var1) AS X1,
          COUNT(var2) AS X2,
          COUNT(var3) AS X3,
          COUNT(var4) AS X4
      FROM data1;
QUIT;



Coalesce Function

• In our administrative tables, the Social Security Number (SSN) field is not very
well populated.
• We need to go after different sources: membership tables (old, current, and
daily) and other utilization tables.
• PROC SQL makes the selection process very easy, where the COALESCE function
will pick the first non-missing value.

PROC SQL;
   CREATE TABLE _SSNINFO AS
      SELECT S.*, COALESCE(C.SSN, A.SSN, B.SSN, D.SSN) AS SSN
         FROM _SAMPLE AS S
            LEFT JOIN _CMS AS A ON S.HRN=A.HRN
            LEFT JOIN _MG AS B ON S.HRN=B.HRN
            LEFT JOIN _CMSDL AS C ON S.HRN=C.HRN
            LEFT JOIN _DOCPLUS AS D ON S.HRN=D.HRN;
QUIT;



Summarizing Data
Summarize Data

PROC SQL is more intuitive than PROC MEANS or PROC SUMMARY, where SAS creates an output table that often contains more rows and columns than you need, and you have to choose the right _TYPE_ value.

The PROC SQL step below summarizes the total number of cases by age group for each year.

PROC SQL;
   SELECT YEAR, AGEGRP, SUM(CASES) AS CASES
      FROM work.data1
      GROUP BY 1,2;
QUIT;



Fuzzy Merge

Fuzzy merge is the process of matching records where the condition of a match is close but not exact.

In survival analysis where we need to know whether a member is dead, one important step is to match our sample to the records in the death tape by SSN, birthday, and name.

We compare matching variables and assign points for every match.



Fuzzy Merge

• A score of 16 would be a perfect match.
• In real life, this is not always the case, so our rule is that a score of 13 and above will be considered a match.
• For any score between 9 and 12 we will do a manual check to determine whether it is a match.

PROC SQL;
   CREATE TABLE REVIEW AS
      SELECT *
         FROM SAMPLE, OW_DEATH
         WHERE SUM( ((KBMON =SBMON)*2),   ((KBDAY =SBDAY)*1),
                    ((KBYEAR=SBYEAR)*2),
                    ((KFSNDX=SFSNDX)*1),  ((KFNAME=SFNAME)*1),
                    ((KMNAME=SMNAME)*1),
                    ((KLSNDX=SLSNDX)*1),  ((KLNAME=SLNAME)*1),
                    ((KSEX  =SSEX)*2),    ((KSSN=SSSN and KSSN ne ' ')*4) )
               >= 9;
QUIT;

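The scoring works because a comparison in SAS evaluates to 1 when true and 0 when false, so each comparison multiplied by its weight contributes that many points. A minimal sketch of the same mechanism on the sample table sashelp.class follows; the weights and the threshold here are hypothetical.

```sas
proc sql;
   select name, sex, age,
          (sex='F') as is_female,                 /* 1 if true, 0 if false */
          sum((sex='F')*2, (age>=14)*1) as score  /* weighted points, maximum 3 */
      from sashelp.class
      /* CALCULATED lets WHERE refer to a column computed in the SELECT clause */
      where calculated score >= 2;
quit;
```

With several arguments, SUM acts as the row-wise SAS function rather than as a summary function, which is how it is used in the fuzzy-merge query as well.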


Business Scenario

Your manager has requested reports that answer questions, including the following:

1 Which employees have completed training A or B?
2 Which employees have completed training A and/or B, and on what dates?
3 Which employees have completed training A, but not training B?
4 Which employees have completed both classes?



Business Data

The data required to answer the questions is stored in two tables.

Partial train_a
ID    Name    End_Date
11    Bob     15JUN2012
16    Sam     5JUN2012
14    Pete    21JUN2012

Training class A is completed in a single session. End_Date represents the date of training.

Partial train_b
Name     ID    SDate        EDate
Bob      11    9JUL2012     13JUL2012
Pam      15    25JUL2012    27JUL2012
Kyle     19    12JUL2012    20JUL2012
Chris    21    29JUL2012    .

Training class B is a multi-session class. SDate is recorded on the first training day. EDate is recorded when the course is complete.
Discussion

Can you answer any of the four questions by querying only one table?

1 Which employees have completed training A or B?
2 Which employees have completed training A and/or B, and on what dates?
3 Which employees have completed training A, but not training B?
4 Which employees have completed both classes?
Discussion – Answer

No, answering each question requires you to use data from both tables.

1 Which employees have completed training A or B?
2 Which employees have completed training A and/or B, and on what dates?
3 Which employees have completed training A, but not training B?
4 Which employees have completed both classes?
Using Set Operators

Set operators use the intermediate result sets from two queries to create a
final result set.

• Query 1 produces intermediate result set 1 (RS1).
• Query 2 produces intermediate result set 2 (RS2).
• The set operator combines RS1 and RS2 into the final result set.



Question 1: UNION Set Operator

Which employees have completed training A or B?

Query 1: List employees that have completed train_a. (RS1)
UNION
Query 2: List employees that have completed train_b. (RS2)

RS1 UNION RS2 gives the final result set.



Question 2: OUTER UNION Set Operator

Which employees have completed training A and/or B, and on what dates?

Query 1: List employees that have completed train_a and the completion date. (RS1)
OUTER UNION
Query 2: List employees that have completed train_b and the completion date. (RS2)

RS1 OUTER UNION RS2 gives the final result set.



Question 3: EXCEPT Set Operator

Which employees have completed training A, but not training B?

Query 1: List employees that have completed train_a. (RS1)
EXCEPT
Query 2: List employees that have completed train_b. (RS2)

RS1 EXCEPT RS2 gives the final result set.



Question 4: INTERSECT Set Operator

Which employees have completed both training A and B?

Query 1: List employees that have completed train_a. (RS1)
INTERSECT
Query 2: List employees that have completed train_b. (RS2)

RS1 INTERSECT RS2 gives the final result set.



Default Behavior of Set Operators

Set Operator: UNION
   Rows: Unique rows from both result sets
   Columns: Aligned by column position in both result sets

Set Operator: OUTER UNION
   Rows: All rows from both result sets
   Columns: All columns from both result sets

Set Operator: EXCEPT
   Rows: Unique rows from the first result set that are not in the second result set
   Columns: Aligned by column position in both result sets

Set Operator: INTERSECT
   Rows: Unique rows from the first result set that are in the second result set
   Columns: Aligned by column position in both result sets

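The four questions can be sketched in code using the partial train_a and train_b rows shown earlier. This is an illustration under assumptions: the tables are rebuilt in the WORK library with DATA steps, "completed training B" is taken to mean a non-missing EDate, and the CORR keyword is added to OUTER UNION so that same-named columns are overlaid.

```sas
data train_a;
   input ID Name $ End_Date :date9.;
   format End_Date date9.;
   datalines;
11 Bob 15JUN2012
16 Sam 5JUN2012
14 Pete 21JUN2012
;

data train_b;
   input Name $ ID SDate :date9. EDate :date9.;
   format SDate EDate date9.;
   datalines;
Bob 11 9JUL2012 13JUL2012
Pam 15 25JUL2012 27JUL2012
Kyle 19 12JUL2012 20JUL2012
Chris 21 29JUL2012 .
;

proc sql;
   /* Question 1: completed training A or B */
   select ID, Name from train_a
   union
   select ID, Name from train_b where EDate is not missing;

   /* Question 2: completed A and/or B, and on what dates */
   select ID, Name, End_Date from train_a
   outer union corr
   select ID, Name, EDate as End_Date from train_b where EDate is not missing;

   /* Question 3: completed A, but not B */
   select ID, Name from train_a
   except
   select ID, Name from train_b where EDate is not missing;

   /* Question 4: completed both A and B */
   select ID, Name from train_a
   intersect
   select ID, Name from train_b where EDate is not missing;
quit;
```

Because UNION, EXCEPT, and INTERSECT align columns by position, both queries list ID before Name even though train_b stores the columns in the other order.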


Creating Tables
and Views
Creating a New Table

There are three ways to create new tables in PROC SQL.

Method 1   Copy columns and rows from existing table(s).
Method 2   Copy columns but no rows from an existing table.
Method 3   Define only the columns in the PROC SQL code.

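Methods 2 and 3 might look like the following sketch. The table names work.class_empty and work.training and the column definitions are hypothetical; sashelp.class is a sample table that ships with SAS.

```sas
proc sql;
   /* Method 2: copy the column definitions of an existing table, but no rows */
   create table work.class_empty like sashelp.class;

   /* Method 3: define the columns directly in the PROC SQL code */
   create table work.training
      (ID       num,
       Name     char(20),
       End_Date num format=date9. label='End Date');

   /* Write both table definitions to the SAS log */
   describe table work.class_empty, work.training;
quit;
```

Both tables are created with zero rows; rows can be added afterwards with the INSERT statement.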


Business Scenario: Method 1

• Management wants to recognize employee birthdays. You need to write code to generate a table with each employee's birth month.
• Existing tables contain the columns and rows that you need.



Business Data

orion.employee_addresses
• Employee_ID
• Employee_Name
• City

orion.employee_payroll
• Employee_ID
• Birth_Date

orion.birthmonths

Name              City         Birth Month
Aisbitt, Sandy    Melbourne              1
Sheedy, Sherie    Melbourne              1
Tannous, Cos      Melbourne              1
Creating a Table: Method 1
For this task, use method 1.
Create a table that contains columns and rows returned by a query on
existing tables.

CREATE TABLE table-name AS
SELECT …;

proc sql;
create table orion.birthmonths as
   /* Define table columns */
   select Employee_Name as Name format=$25.,
          City format=$25.,
          month(Birth_Date) as BirthMonth
             'Birth Month' format=3.
      /* Join tables */
      from orion.employee_payroll as p,
           orion.employee_addresses as a
      /* Filter the data rows added to the table */
      where p.Employee_ID=a.Employee_ID
            and Employee_Term_Date is missing
      order by BirthMonth,City,Name;
quit;
Viewing the Log

proc sql;
create table orion.birthmonths as
select Employee_Name as Name format=$25.,
City format=$25.,
month(Birth_Date) as BirthMonth
'Birth Month' format=3.
from orion.employee_payroll as p,
orion.employee_addresses as a
where p.Employee_ID=a.Employee_ID
and Employee_Term_Date is missing
order by BirthMonth,City,Name;
NOTE: Table ORION.BIRTHMONTHS created, with 308 rows and 3 columns.

quit;



Verifying the New Table

proc sql;
describe table orion.birthmonths;
select * from orion.birthmonths;
quit;

The DESCRIBE statement writes information about the table to the SAS log.

proc sql;
describe table orion.birthmonths;
NOTE: SQL table ORION.BIRTHMONTHS was created like:

create table ORION.BIRTHMONTHS( bufsize=8192 )
  (
   Name char(40) format=$25.,
   City char(30) format=$25.,
   BirthMonth num format=3. label='Birth Month'
  );
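
As an alternative check, the same column attributes can be read from the DICTIONARY.COLUMNS table and shown in the output window rather than the log. This is a sketch that assumes the orion libref is assigned and the table was created as shown; library and member names are stored in uppercase in the dictionary tables.

```sas
proc sql;
   /* One row per column of ORION.BIRTHMONTHS */
   select name, type, length, format, label
      from dictionary.columns
      where libname='ORION' and memname='BIRTHMONTHS';
quit;
```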




• The SELECT statement creates a report that lists the contents of the table.

Partial PROC SQL Output

                                      Birth
Name                  City            Month
-------------------------------------------
Aisbitt, Sandy        Melbourne           1
Sheedy, Sherie        Melbourne           1
Tannous, Cos          Melbourne           1
Boocks, Michael. R.   Miami-Dade          1
Chinnis, Kumar        Miami-Dade          1



Business Scenario: Method 2

Management wants a table for new sales staff that is structured like
the orion.sales table.

orion.sales → PROC SQL → work.new_sales_staff



Creating a Table: Method 2

Copy the table structure from an existing table with the LIKE clause.

CREATE TABLE table-name-2
LIKE table-name-1;

proc sql;
create table work.new_sales_staff
   like orion.sales;   /* Name the source table. */
quit;

Partial SAS Log

proc sql;
create table work.new_sales_staff
like orion.sales;
NOTE: Table WORK.NEW_SALES_STAFF created, with 0 rows and 9 columns.
quit;
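
Because a table created with LIKE starts out empty, a typical next step is to load rows into it. The sketch below illustrates one way to do that with INSERT INTO and a query. The tables work.staff and work.staff_template, their columns, and the data values are invented so that the example can run on its own.

```sas
/* Hypothetical source table */
data work.staff;
   input Employee_ID Salary;
   datalines;
101 30000
102 45000
;
run;

proc sql;
   /* Copy the structure only: the new table has 0 rows */
   create table work.staff_template
      like work.staff;

   /* Load selected rows from the source into the empty copy */
   insert into work.staff_template
      select *
         from work.staff
         where Salary > 40000;

   select * from work.staff_template;
quit;
```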



Business Scenario: Method 3

• You need to create a new table to contain discount information.
• The structure and data needed are not in an existing table.



Creating a Table: Method 3

Define the columns in the CREATE TABLE statement.

CREATE TABLE table-name
(column-name type(length)
<, ...column-name type(length)> );

proc sql;
create table discounts
   (Product_ID num format=z12.,
    Start_Date date,
    End_Date date,
    Discount num format=percent.);
quit;

• The table definition is enclosed in parentheses.
• Individual column definitions are separated by commas.
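
A table defined this way contains no rows until data is inserted. The following sketch shows two forms of the INSERT statement, VALUES and SET. It re-creates the table under the hypothetical name work.discounts_demo so that it can run on its own, and the product IDs, dates, and discount rates are made-up sample values.

```sas
proc sql;
   /* Same column definitions as above, under a demo name */
   create table work.discounts_demo
      (Product_ID num format=z12.,
       Start_Date date,
       End_Date date,
       Discount num format=percent.);

   /* VALUES form: values are listed in column order */
   insert into work.discounts_demo
      values (230100300006, '01Mar2007'd, '15Mar2007'd, .33);

   /* SET form: each value is assigned to a named column */
   insert into work.discounts_demo
      set Product_ID=230100600018,
          Start_Date='16Mar2007'd,
          End_Date='31Mar2007'd,
          Discount=.15;

   select * from work.discounts_demo;
quit;
```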



Business Scenario

Tom Zhou is a sales manager who needs access to personnel information for his staff.



Business Data

The data that Tom needs is name, job title, salary, and years of service.
This data is contained in three tables.

orion.employee_addresses

orion.employee_payroll

orion.employee_organization



Considerations

What is the best way to help Tom, given the following requirements?

• He can write simple PROC SQL queries and use basic SAS procedures, but cannot write complex joins.
• He should not be allowed access to personnel data for any employee that is not his direct report.

A PROC SQL view accessing data for Tom Zhou’s direct reports can provide the information that Tom needs in a secure manner.



What Is a PROC SQL View?

A PROC SQL view

• Is a stored query
• Contains no actual data
• Extracts underlying data each time it is used and accesses the most current data
• Can be referenced in SAS programs in the same way as a data table
• Can be derived from one or more tables, PROC SQL views, DATA step views, or SAS/ACCESS views
• Cannot have the same name as a data table stored in the same SAS library



Creating a PROC SQL View

To create a PROC SQL view, use the CREATE VIEW statement.

CREATE VIEW view-name AS
SELECT …;

proc sql;
create view orion.tom_zhou as
select Employee_Name as Name format=$25.0,
Job_Title as Title format=$15.0,
Salary 'Annual Salary' format=comma10.2,
int((today()-Employee_Hire_Date)/365.25)
as YOS 'Years of Service'
from employee_addresses as a,
employee_payroll as p,
employee_organization as o
where a.Employee_ID=p.Employee_ID and
o.Employee_ID=p.Employee_ID and
Manager_ID=120102;
quit;



View the Log

Partial SAS Log

16 proc sql;
17 create view orion.tom_zhou as
18 select Employee_Name as Name format=$25.0,
19 Job_Title as Title format=$15.0,
20 Salary 'Annual Salary' format=comma10.2,
21 int((today()-Employee_Hire_Date)/365.25)
22 as YOS 'Years of Service'
23 from employee_addresses as a,
24 employee_payroll as p,
25 employee_organization as o
26 where a.Employee_ID=p.Employee_ID and
27 o.Employee_ID=p.Employee_ID and
28 Manager_ID=120102;
NOTE: SQL view ORION.TOM_ZHOU has been defined.
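
Once the view is defined, Tom can reference it like a table, both in a simple query and in other SAS procedures. The sketch below assumes the orion libref is assigned and the view was created as shown; the report title and the choice of PROC MEANS statistics are illustrative only.

```sas
proc sql;
   title "Tom Zhou's Direct Reports";
   /* A simple query against the view; no joins are needed */
   select *
      from orion.tom_zhou
      order by Title desc, YOS desc;
quit;
title;

/* The view can also feed a basic SAS procedure */
proc means data=orion.tom_zhou min mean max;
   var Salary;
   class Title;
run;
```

Each time either step runs, the stored query is executed again, so the results reflect the current contents of the underlying tables.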


