SAS SQL - Course Notes PDF
Course Notes
SAS® SQL 1: Essentials Course Notes was developed by Peter Styliadis, Charlot Bennett, Johnny
Johnson, and Mark Jordan. Additional contributions were made by Michele Austin, Brittany Coleman,
Bruce Dawless, Davetta Dunlap, Marty Hultgren, John McCall, Rich Papel, Ross Richards, Lorilyn
Russell, Allison Saito, Ian Sedgwick, Charu Shankar, Jim Simon, Theresa Stemler, Stacey Syphus,
Chris Warters, and Anna Yarbrough. Instructional design, editing, and production support was
provided by the Learning Design and Development team.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
Copyright © 2019 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States
of America. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise,
without the prior written permission of the publisher, SAS Institute Inc.
Book code E71409, course code LWSQ1M6/SQ194, prepared date 11Jun2019. LWSQ1M6_001
ISBN 978-1-64295-094-6
Table of Contents
Lesson 4 Subqueries
To learn more…
For information about other courses in the curriculum, contact the
SAS Education Division at 1-800-333-7660, or send e-mail to
[email protected]. You can also find this information on the web at
http://support.sas.com/training/ as well as in the Training Course
Catalog.
For a list of SAS books (including e-books) that relate to the topics
covered in these course notes, visit https://www.sas.com/sas/books.html or
call 1-800-727-0025. US customers receive free shipping to US
addresses.
Lesson 1 Essentials
1.1 Setting Up for the Course
Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Course Overview
[Figure: course topics: PROC SQL Fundamentals, Joins, Subqueries, Accessing DBMS Data, and PROC FEDSQL]
• Demonstration: Performed by your instructor as an example for you to observe
• Activity: Short practice opportunities for you to perform in SAS, either independently or with the
guidance of your instructor
• Practice: Extended practice opportunities for you to work on independently
• Case Study: A comprehensive practice opportunity at the end of the class
When you come to a practice, you can choose an appropriate level: basic, intermediate, or complex.
SAS Windowing Environment and SAS Studio
[Figure: the Editor, Log, and Results and Output Data areas in each interface, shown running the
following program]
proc sql;
select *
from work.targetcust;
quit;
Make a note of the location of your course files folder.
[Figure: the course files folder, which contains the activities, data, database, demos, and practices
subfolders]
For this course, you use a variety of data files and SAS programs. The SAS program files are
organized into folders for activities, demos, and practices.
[Figure: program file naming convention. For example, s104d01.sas in the demos folder is SQL,
Lesson 4, demo 1.]
[Figure: the cre8data.sas program is located in the data subfolder of the course files folder]
Open and run the cre8data.sas program in the data folder to create the tables for the course. If your
files will not be in s:/workshop, change the value of PATH= in the %LET statement to reflect your
actual course folder location.
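As a rough sketch of the kind of change described above, the following shows a %LET statement pointing at a different course folder. The folder /home/student/sqlcourse is a hypothetical location used only for illustration, and the actual contents of cre8data.sas might reference the path differently.

```sas
/* Hypothetical course folder: replace with your actual location */
%let path=/home/student/sqlcourse;

/* Double quotation marks are needed so that &path resolves */
libname sq "&path/data";
```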
[Figure: course data tables, including customer, transaction, merchant, statecode, and globalpop]
United States population data was obtained from the US Census Bureau:
• United States Census Bureau: US Census State Population Totals and Components of Change:
https://www.census.gov/newsroom/press-kits/2018/pop-estimates-national-state.html
Global data was obtained from the World Bank:
• The World Bank: Global Financial Inclusion (Global Findex) Database:
https://datacatalog.worldbank.org/dataset/global-financial-inclusion-global-findex-database
For more information about the LIBNAME statement, see "LIBNAME Statement" in the SAS® 9.4 and
SAS® Viya® 3.4 Programming documentation. You can also find the direct link in the Course Links
section on the Extended Learning page.
LIBNAME is a global statement and does not require a RUN statement.
1.02 Activity
1. Open libname.sas from the main folder. Complete the LIBNAME statement to create a library
named sq that reads SAS tables in the data folder. The path should be the folder where your
course files are located.
Note: In Enterprise Guide, click Libraries > Refresh to update the library list.
libname sq "s:/workshop/data";
2. Run the code and verify that the library was successfully assigned in the log.
3. Save the updated libname.sas program.
4. Navigate to your list of libraries and expand the sq library. Verify that the library exists and
tables are available.
Throughout each lesson, you will see references to topics in the SAS documentation. All links are
provided on the ELP for each lesson.
1.2 What Is SQL?
Tables
[Figure: the bank table and its columns, including BankID, BankName, Address, City, and State]
Timeline
1970 - Dr. Edgar F. Codd proposes a relational data model (“A Relational Model of Data for Large
Shared Data Banks”)
1973 - Donald D. Chamberlin and Raymond F. Boyce develop SEQUEL (Structured English Query
Language) at IBM
1986 - First ANSI standard for SQL (SQL-87)
1992 - First major revision to the ANSI standard (SQL-2)
1993 - SAS SQL procedure added (based on SQL-2 standard). Although PROC SQL incorporated
many of the standards, SAS SQL is not ANSI compliant.
1999 - Last major revision to the ANSI standard (SQL-3)
SQL Implementations
[Figure: logos of SQL implementations, and many more]
1.3 Introduction to the SQL Procedure
SQL Procedure
PROC SQL enables the use of SQL in SAS and includes non-ANSI-compliant SAS enhancements.
PROC SQL follows most of the guidelines set by the American National Standards Institute (ANSI) in
its implementation of SQL. However, it is not fully compliant with the current ANSI standard for SQL.
For more information about SAS and the ANSI standard, see "PROC SQL and the ANSI Standard"
in the SAS SQL Procedure User’s Guide documentation. You can also find the direct link in the
Course Links section on the ELP.
[Figure: data workflow with PROC SQL: access data, explore data, prepare data, analyze and report
on data, export results]
As you go through the process of making data meaningful and actionable using SQL, you will likely
follow these basic steps: access data or read it with a program, explore the data to see what is there
and what you might need to add or change, prepare the data to get it ready for analysis, analyze and
report on the data, and export results to various report and data formats.
SQL Procedure
[Figure: PROC SQL reads SAS tables, views, and DBMS tables as input and produces SAS tables,
views, DBMS tables, and reports as output]
Multiple statements can be included in a PROC SQL step. Each statement defines a process and is
executed immediately.
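A minimal sketch of a single PROC SQL step that contains two statements is shown here. It assumes that the sq library from the setup activity is assigned and that the customer table has FirstName and LastName columns, as in the demos later in this lesson.

```sas
proc sql;
    /* First statement: executes as soon as it is submitted */
    describe table sq.customer;

    /* Second statement: a query in the same step, no extra PROC SQL needed */
    select FirstName, LastName
        from sq.customer(obs=5);
quit;   /* QUIT ends the PROC SQL step */
```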
The SELECT statement is the most commonly used SQL statement and is usually referred to as a
query.
When you construct a SELECT statement, you must specify the clauses in the following order. The
clauses after SELECT and FROM are optional, but if present, they must be in this order:
• The SELECT clause selects columns.
• The FROM clause selects one or more source tables or views.
• The WHERE clause enables you to filter rows of data.
• The GROUP BY clause enables you to process data in groups.
• The HAVING clause works with the GROUP BY clause to filter grouped results.
• The ORDER BY clause specifies the order of rows.
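The clause order above can be sketched in one query. This is an illustration only: it assumes the sq.customer table with the State, Income, and CreditScore columns used elsewhere in these notes, and the thresholds 700 and 50000 are arbitrary values chosen for the example.

```sas
proc sql;
    select State,
           avg(Income) as AvgIncome format=dollar16.   /* SELECT: columns */
        from sq.customer                                /* FROM: source table */
        where CreditScore > 700                         /* WHERE: filter rows */
        group by State                                  /* GROUP BY: one row per state */
        having avg(Income) > 50000                      /* HAVING: filter groups */
        order by AvgIncome desc;                        /* ORDER BY: sort results */
quit;
```

Placing a clause out of sequence (for example, ORDER BY before WHERE) results in a syntax error.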
For more information about the available PROC SQL statements, see "SQL Procedure" in the SAS®
9.4 SQL Procedure User’s Guide documentation. You can also find the direct link in the Course
Links section on the ELP.
Exploring Tables
Explore the customer table:
• column attributes
• preview first 10 rows
The DESCRIBE TABLE statement returns results comparable to those of PROC CONTENTS, which
shows the contents of a SAS table along with more in-depth table information and can print the
directory of the SAS library.
For more information about the CONTENTS procedure, see “CONTENTS Procedure” in Base SAS®
9.4 Procedures Guide, Seventh Edition. You can also find the direct link in the Course Links section
on the ELP.
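To compare the two approaches side by side, a short sketch follows. It assumes that the sq library is assigned and contains the customer table.

```sas
/* DESCRIBE TABLE writes the column attributes to the log
   in the form of a CREATE TABLE statement */
proc sql;
    describe table sq.customer;
quit;

/* PROC CONTENTS produces a report of table and column attributes */
proc contents data=sq.customer;
run;
```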
To select all of a table’s columns in the order in which they were stored, specify an asterisk (*) in a
SELECT clause instead of column names.
The OBS= data set option specifies the last observation that SAS processes in a data set.
For more information about data set options, see “Dictionary of Data Set Options” in the SAS® 9.4
and SAS® Viya® 3.4 Programming documentation. You can also find the direct link in the Course
Links section on the ELP.
Scenario
Use PROC SQL to explore the customer table.
Files
• s101d01.sas
• customer – a SAS table that contains one row per customer.
Syntax
PROC SQL;
DESCRIBE TABLE table-name;
QUIT;
PROC SQL;
SELECT col-name, col-name
FROM input-table(OBS=n);
QUIT;
Notes
• The DESCRIBE TABLE statement displays each column's attributes in the log.
• The SELECT statement is the primary tool of PROC SQL. You use it to identify, retrieve, and
manipulate columns of data from a table. You can also use several optional clauses within the
SELECT statement to place restrictions on a query.
• The SELECT clause lists the columns that will appear in the results. The asterisk (*) selects all
columns.
• The FROM clause specifies source tables or views.
Demo
1. Open s101d01.sas from the demos folder and find the Demo section of the program. In the first
SQL procedure step, add a DESCRIBE TABLE statement to see column attributes of the
sq.customer table. Highlight the step and run the selected code. Examine the log and results.
proc sql;
describe table sq.customer;
quit;
2. In the second SQL procedure step, add a SELECT statement and select all the columns from the
sq.customer table using an asterisk. Add the OBS=10 data set option to the table to limit the
report to 10 rows.
Note: In this example, the customer table contains more than 100,000 rows. Running the
query without the OBS= option displays all the rows and might cause system issues.
proc sql;
select *
from sq.customer(obs=10);
quit;
3. Modify the SELECT statement to select the columns FirstName, LastName, and DOB. Highlight
the step and run the selected code. Examine the log and results.
Note: PROC SQL displays the permanently assigned labels if they exist.
proc sql;
select FirstName, LastName, DOB
from sq.customer(obs=10);
quit;
4. Modify the SELECT statement to select the columns CustomerID, LastName, UserID, and
DOB. Highlight the step and run the selected code. Examine the log and results.
proc sql;
select CustomerID, LastName, UserID, DOB
from sq.customer(obs=10);
quit;
INOBS=n: Limits rows from each source table that contribute to a query
OUTOBS=n: Restricts the number of rows that a query outputs
For more information about PROC SQL options, see “PROC SQL Options” in the SAS® 9.4 SQL
Procedure User’s Guide, Fourth Edition documentation. You can also find the direct link in the
Course Links section on the ELP.
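The two PROC SQL options can be sketched as follows. The example assumes the sq.customer table from the demos; the limit of 10 rows and the NC filter are arbitrary choices for illustration.

```sas
/* INOBS= limits the rows read from each source table */
proc sql inobs=10;
    select FirstName, LastName, State
        from sq.customer;
quit;

/* OUTOBS= limits the rows that the query returns */
proc sql outobs=10;
    select FirstName, LastName, State
        from sq.customer
        where State = 'NC';
quit;
```

When one of these limits cuts a query short, expect a warning in the log, as the INOBS= activity in this lesson shows.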
1.04 Activity
Open s101a04.sas from the activities folder and perform the following tasks
to explore the sq.customer table:
1. Remove the asterisk and select only the FirstName, LastName, and State
columns. Run the query. View the log and results.
2. Remove the OBS=10 data set option and add the INOBS=10 PROC SQL
option after the PROC SQL keywords and before the semicolon. Run the
query. Are the results the same using the INOBS=10 option? What about
the log?
3. After the INOBS= option, add the NUMBER option. Run the query. Which
column was added to the results?
DATA step:
• More control of how SAS processes data
• Can create multiple tables in one step
• Includes looping and array processing
PROC SQL:
• Describe what you want to do, not how to do it
• SQL optimizer
PROC SQL is a complement to the DATA step, not a replacement for the DATA step. Sometimes
PROC SQL is the best tool to use, but in other situations, it is better to use the DATA step.
1.4 Solutions
Solutions to Activities and Questions
Confirm that 27 SAS tables are created.
%let path=s:/workshop;
libname sq "s:/workshop/data";
1.04 Activity – Correct Answer
1. Remove the asterisk and select only the FirstName, LastName, and State
columns. Run the query. View the log and results.
proc sql;
select FirstName, LastName, State
from sq.customer(obs=10);
quit;
Using SAS data set options is a SAS enhancement.
2. Are the results the same using the INOBS=10 option? Yes, the results
are the same. What about the log? The INOBS= option restricts the
number of rows that PROC SQL retrieves from any single source. The
results are the same, but the log returns a warning.
Lesson 2 PROC SQL Fundamentals
2.1 Generating Simple Reports
Demonstration: Creating Simple Reports
Demonstration: Assigning Values Conditionally
Practice
The WHERE clause enables you to retrieve only the rows that satisfy a condition.
WHERE clauses can contain any of the columns in a table, including columns that are not specified
in the SELECT clause.
The WHERE clause must come after the SELECT clause and the FROM clause.
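For instance, the following sketch filters on State without displaying it. It assumes the sq.customer table used in the demos; the value 'NC' is just an example.

```sas
proc sql;
    select FirstName, LastName
        from sq.customer
        where State = 'NC';   /* State filters rows but is not in the SELECT clause */
quit;
```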
[Figure: examples of WHERE expressions with character and numeric values]
Mnemonic   Symbol
LT         <
GT         >
EQ         =
LE         <=
GE         >=
NE         <>, ¬= (EBCDIC), ^= (ASCII)
A SAS date constant is a date written in the following form: 'ddmmm<yy>yy'd or "ddmmm<yy>yy"d.
• date='1JAN2013'd;
• date='01jan09'D;
A SAS time constant is a time written in the following form: 'hh:mm<:ss.s>'t or "hh:mm<:ss.s>"t.
• time='9:25't;
• time='9:25:19pm'T;
A SAS datetime constant is a datetime value written in the following form:
'ddmmm<yy>yy:hh:mm<:ss.s>'dt or "ddmmm<yy>yy:hh:mm<:ss.s>"dt.
• begin='01may12:9:30:00'dt;
• end='31dec13:5:00:00'dt;
• dtime='18jan2003:9:27:05am'DT;
Common Date and Time Functions:
• MONTH(date) - returns the month as a numeric value from a SAS date value.
• DAY(date) - returns the numeric day of the month from a SAS date value.
• YEAR(date) - returns the numeric year from a SAS date value.
• YRDIF(start-date, end-date, <basis>) - returns the difference in years between two dates
according to specified day count conventions.
• TODAY() - returns the current date as a numeric SAS date value.
• TIME() - returns the current time of day as a numeric SAS time value.
• DATETIME() - returns the current date and time of day as a SAS datetime value.
• DATEPART(datetime) - extracts the date from a SAS datetime value.
• TIMEPART(datetime) - extracts a time value from a SAS datetime value.
• MDY(month, day, year) - returns a SAS date value from month, day, and year values.
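As an illustration of a date constant combined with several of these functions, consider the sketch below. It assumes the sq.customer table with a DOB column that holds SAS date values; the cutoff date and the column aliases are arbitrary, and the 'AGE' basis is one of the bases that YRDIF accepts.

```sas
proc sql;
    select CustomerID,
           DOB format=date9.,
           year(DOB)  as BirthYear,                     /* numeric year */
           month(DOB) as BirthMonth,                    /* numeric month, 1-12 */
           int(yrdif(DOB, today(), 'AGE')) as Age       /* whole years as of today */
        from sq.customer(obs=10)
        where DOB < '01JAN1960'd;                       /* SAS date constant */
quit;
```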
For more information about the SAS functions, see “SAS Functions and CALL Routines by
Category” in the SAS® 9.4 Functions and CALL Routines: Reference, Fifth Edition documentation.
You can also find the direct link in the Course Links section on the ELP.
Combining Expressions
WHERE expression-1 OR | AND expression-n;
OR
proc sql;
select CustomerID, DOB
from sq.customer
where State = 'NY' or
State = 'NC' or
State = 'CA';
quit;
AND
proc sql;
select CustomerID, DOB
from sq.customer
where Income > 30000 and
State = 'NC';
quit;
IN Operator
The IN operator tests for values that match one of a list of values.
You can use the IN operator with character strings or numeric values to determine whether a
variable's value is among a list of values. Character values must be enclosed in quotation marks.
Numeric values must be in standard form.
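As a sketch, a chain of OR conditions on one column can be written more compactly with IN. The example assumes the sq.customer table and uses the same three states as the OR example in this section.

```sas
proc sql;
    select CustomerID, DOB
        from sq.customer
        where State in ('NY', 'NC', 'CA');   /* same rows as three OR conditions */
quit;
```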
2.01 Activity
Open s102a01.sas from the activities folder and perform the following tasks
to find all customers in the states VT, SC, or GA:
1. Complete the WHERE clause to filter for customers in the state of VT and
run the query.
2. Add another expression using the OR operator to select only customers
from the state of VT or SC. How many customers are from either VT or
SC?
3. Switch your current expression to use the IN operator. Add the state of
GA. How many customers are from either VT, SC, or GA?
The IS NULL operator
• is ANSI standard and can be used in DBMS environments that distinguish between missing and
null values.
• is the same as, and interchangeable with, the IS MISSING operator in SAS. However, IS
MISSING is not ANSI standard.
• can be prefixed with the NOT operator to form a negative condition.
In SAS,
• a blank represents a missing character value
• a . represents a missing numeric value.
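The two equivalent forms can be sketched as follows, assuming the sq.customer table and its BankID column from the demos in this lesson.

```sas
/* ANSI-standard form */
proc sql;
    select CustomerID, BankID
        from sq.customer(obs=100)
        where BankID is null;
quit;

/* SAS-specific form, prefixed with NOT to keep only nonmissing values */
proc sql;
    select CustomerID, BankID
        from sq.customer(obs=100)
        where BankID is not missing;
quit;
```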
2.02 Activity
Open s102a02.sas from the activities folder and perform the following
tasks to find all customers with a nonmissing CreditScore value that is
less than 500.
1. Examine the query. Add a WHERE clause to find all customers with a
CreditScore value that is less than 500 and run the query. What do you
notice about the values in the CreditScore column? How many rows are
in your report?
2. Include the AND operator in the WHERE clause to find all rows that are
less than 500 and not null. Use a method of your choice. How many rows
are in your final report?
[Figure: names matched by LIKE patterns. The wildcard for any number of characters matches
Zachary, Zelda, Zulma, Zula, Zoe, and Zandra. The wildcard for a single character matches Zelda,
Zulma, Zelma, and Zola.]
The LIKE operator tests for values that match a specified pattern. The pattern can include the
following wildcard characters:
• a percent sign (%) to match any number of characters
• an underscore (_) to match one arbitrary character
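Both wildcards can be sketched against the FirstName column of sq.customer. The patterns 'Z%' and 'Z_l_a' are hypothetical examples chosen for illustration.

```sas
/* % matches any number of characters: names that start with Z */
proc sql outobs=10;
    select FirstName, LastName
        from sq.customer
        where FirstName like 'Z%';
quit;

/* _ matches exactly one character: Z, any character, l, any character, a */
proc sql outobs=10;
    select FirstName, LastName
        from sq.customer
        where FirstName like 'Z_l_a';
quit;
```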
Unless an ORDER BY clause is included in the SELECT statement, a particular order to the output
rows (such as the order in which the rows are encountered in the queried table) cannot be
guaranteed. Without an ORDER BY clause, the order of the output rows is determined by the
internal processing of PROC SQL, the default collating sequence of SAS, and your operating
environment. Therefore, if you want your query results to appear in a particular order, use the
ORDER BY clause.
• The PROC SQL default sort order is ascending.
• When you use an ORDER BY clause, you change the order of the results but not the order of the
rows that are stored in the source table.
• PROC SQL sorts missing values before nonmissing values. Therefore, when you specify
ascending order, missing values appear first in the query results.
• If multiple ORDER BY columns are specified, the first one determines the major sort order.
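A sketch of a two-column sort follows. It assumes the sq.customer table with the State and Income columns; the choice of sort columns is arbitrary.

```sas
proc sql;
    select FirstName, LastName, State, Income
        from sq.customer(obs=100)
        order by State,          /* major sort: ascending by default */
                 Income desc;    /* minor sort: highest income first within each state */
quit;
```

Because missing values sort before nonmissing values, rows with a missing State would appear first here, and rows with a missing Income would appear last within each state.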
2.03 Activity
Open s102a03.sas from the activities folder and perform the following tasks
to sort the report by CreditScore and LastName:
1. Complete the ORDER BY clause and sort by CreditScore. Run the query
and examine the report. What is the default sort order?
2. Add the keyword DESC after the CreditScore column in the ORDER BY
clause. Run the query and examine the report. What does the DESC
option do?
3. Add a secondary sort column to sort by LastName. Run the query. Who is
the first customer on the report?
4. Remove LastName from the SELECT clause and rerun the query. Are the
results still sorted by LastName within CreditScore?
Enhancing Reports
[Figure: a report enhanced by adding a title, a footnote, labels, and formats]
TITLE is a global statement that establishes a permanent title for all reports created in your SAS
session. The syntax is the keyword TITLE followed by the title text enclosed in quotation marks. You
can have up to 10 titles. Specify a number 1 through 10 after the keyword TITLE to indicate the line
number. TITLE and TITLE1 are equivalent. FOOTNOTE follows the same rules as TITLE.
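The following sketch shows two title lines and a footnote around a query, and then clears them. The title and footnote text is made up for illustration, and the query assumes the sq.customer table.

```sas
title  'Customer Listing';        /* TITLE is the same as TITLE1 */
title2 'First 10 Rows';           /* second title line */
footnote 'Source: sq.customer';

proc sql;
    select FirstName, LastName, State
        from sq.customer(obs=10);
quit;

title;       /* a null TITLE statement clears all titles */
footnote;    /* a null FOOTNOTE statement clears all footnotes */
```

Because titles and footnotes stay in effect for the rest of the session, clearing them after the report keeps them from appearing on later output.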
Format names always contain a period (.) as a required delimiter. The w specifies the total width of
the value, including decimal places and special characters. If you do not specify a format width that
is large enough to accommodate a value, SAS automatically adjusts to display as much of the stored
value as possible.
Depending on the particular format, the d specifies the number of decimal places in the value. The d
is optional for numeric formats. For example, to display the value 1234 as $1,234 in a report, you
can use the DOLLAR6.0 or DOLLAR6. format.
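To see the effect of w and d, the sketch below displays the same column with two DOLLAR widths and a date with DATE9. It assumes the sq.customer table with the Income and DOB columns; the labels are made up for illustration.

```sas
proc sql;
    select CustomerID,
           Income format=dollar12.2 label='Income (two decimals)',
           Income format=dollar12.  label='Income (no decimals)',
           DOB    format=date9.
        from sq.customer(obs=10);
quit;
```

The format changes only how a value is displayed; the stored value is unchanged.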
2.04 Activity
Open s102a04.sas from the activities folder and perform the following tasks
to enhance a report:
1. Examine the query. Add the title "Customers from Hawaii" and a
footnote using today's date. Run the program and examine the new title
and footnote in your report.
2. Apply LABEL="Email Address" to the UserID column and LABEL=
"Estimated Income" to the Income column.
3. Apply FORMAT=DATE9. to the DOB column and FORMAT=DOLLAR16.2
to the Income column. Run the program and examine the report.
4. Change the DOLLAR16.2 format to DOLLAR7.2. Run the program. What
happens to the values in the Income column?
SAS Formats
For more information about SAS formats, see “Formats by Category” in the SAS® 9.4 Formats and
Informats: Reference documentation. You can also find the direct link in the Course Links section on
the ELP.
For more information about PROC FORMAT, see “FORMAT Procedure” in the Base SAS® 9.4
Procedures Guide, Seventh Edition documentation. You can also find the direct link in the Course
Links section on the ELP.
Scenario
Use the WHERE and ORDER BY clauses to create three reports.
Files
• s102d01.sas
• customer – a SAS table that contains one row per customer
Syntax
TITLE<n> 'title-text';
PROC SQL OUTOBS=n;
SELECT col-name <FORMAT=formatw.d> <LABEL='label'>
FROM input-table(OBS=n)
WHERE expression
ORDER BY col-name <DESC>;
QUIT;
TITLE;
Notes
• The WHERE clause filters rows based on the expression (or expressions).
• The ORDER BY clause arranges rows based on the listed columns. The default order is
ascending. Use DESC after a column name to reverse the sort sequence.
• The OBS= data set option specifies the last row that SAS processes from a table.
• The OUTOBS= option restricts the number of rows in the results.
• TITLE is a global statement that establishes a permanent title for all reports created in your SAS
session.
• The FORMAT= column modifier specifies a SAS format for determining how character and
numeric values in a column are displayed by the query expression.
• The LABEL= column modifier specifies a column label.
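The options in these notes can be tried on a small table before they are used on sq.customer. The following is a minimal sketch, not part of the course files: the work.demo table and its ID and Amount columns are hypothetical and are created only for this illustration.

```sas
/* Hypothetical demo table: ten rows with a sequence number and an amount */
data work.demo;
   do ID = 1 to 10;
      Amount = ID * 1000;
      output;
   end;
run;

title 'WHERE, ORDER BY, FORMAT=, and LABEL=';
proc sql;
   select ID label='Row ID',
          Amount format=dollar10. label='Amount'
      from work.demo
      where Amount > 2000          /* keep rows that satisfy the expression */
      order by Amount desc;        /* largest amount first */
quit;

title 'OBS= versus OUTOBS=';
proc sql outobs=3;                 /* at most 3 rows in the results */
   select ID, Amount format=dollar10.
      from work.demo(obs=8)        /* row 8 is the last row read from the table */
      order by Amount desc;
quit;
title;                             /* null TITLE statement cancels the title */
```

When OUTOBS= truncates the results, PROC SQL is expected to write a warning to the log, so check the log as well as the report.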
Demo
1. Open the s102d01.sas program in the demos folder and find the Demo section. Move to
Report 1. Complete the query.
a. Complete the WHERE clause to filter for a missing BankID value and a value of
CreditScore greater than 700.
proc sql;
select FirstName, LastName, State,
Income, UserID
from sq.customer (obs=100)
where BankID is null and CreditScore > 700;
quit;
b. Complete the ORDER BY clause to arrange rows by descending Income.
proc sql;
select FirstName, LastName, State, Income, UserID
from sq.customer (obs=100)
where BankID is null and CreditScore > 700
order by Income desc;
quit;
c. Add the column modifiers FORMAT=DOLLAR16. to the Income column and LABEL='Email'
to the UserID column. Remove the OBS= data set option and add the OUTOBS=10 option in
the PROC SQL statement.
proc sql outobs=10;
select FirstName, LastName, State,
Income format=dollar16.,
UserID label='Email'
from sq.customer
where BankID is null and CreditScore > 700
order by Income desc;
quit;
b. Complete the ORDER BY clause to arrange rows by descending DOB. Run the query and
view the results.
proc sql;
select CustomerID, State, Zip,
DOB, UserID,
HomePhone, CellPhone
from sq.customer(obs=100)
where DOB < '31DEC1940'd and Employed='Y'
order by DOB desc;
quit;
c. Add the column modifiers FORMAT= to the DOB and Zip columns. Remove the OBS= data
set option and highlight and run the query. Examine the log and results.
Note: The Z format writes standard numeric data with leading 0s. Scroll in the results and show
ZIP codes with fewer than five digits.
proc sql;
select CustomerID, State, Zip format=z5.,
DOB format=date9., UserID,
HomePhone, CellPhone
from sq.customer
…
expression AS alias
proc sql;
select FirstName, LastName, UserID,
yrdif(dob,'01jan2019'd) as Age
from sq.customer(obs=100);
quit;
Create a report that contains customers who are age 70 and over.
In addition to selecting columns that are stored in a table, you can create new columns that exist for
the duration of the query. These columns can contain text or calculations. PROC SQL writes the
columns that you create as if they were columns from the table.
The YRDIF function enables us to calculate the age of all of our customers. We begin with the start
date, which is their date of birth value. Then we use the constant of January 1, 2019, as the end
date. If you want today's date, you can use the TODAY() function. The optional third argument, basis,
describes how SAS calculates the date difference. The Age basis, which is the default when the
argument is omitted, indicates that we want the person's age computed. We name the column Age and
run the query to create a new column with each customer’s age.
For more information about the YRDIF function, see “YRDIF Function” in the SAS® 9.4 Functions
and CALL Routines: Reference, Fifth Edition documentation. You can also find the direct link in the
Course Links section on the ELP.
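As an illustration of the YRDIF arguments described above, the following sketch computes an age against a fixed end date and against today's date. The work.people table and its columns are hypothetical and exist only for this example.

```sas
/* Hypothetical demo table with two birth dates */
data work.people;
   input Name $ DOB :date9.;
   format DOB date9.;
   datalines;
Ana 15MAR1940
Raj 02NOV1990
;
run;

proc sql;
   select Name, DOB,
          /* fixed end date, with the Age basis written explicitly */
          yrdif(DOB, '01JAN2019'd, 'age') as AgeFixed format=5.1,
          /* end date supplied by TODAY(), so the value changes each day */
          yrdif(DOB, today(), 'age') as AgeToday format=5.1
      from work.people;
quit;
```

The AgeToday values depend on the date the program is run, which is why the course demo uses a date constant to keep results reproducible.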
2.05 Activity
Open s102a05.sas from the activities folder and perform the following tasks
to find all customers 70 years old and older:
1. Examine and run the query. View the results.
2. Add the expression yrdif(dob,'01jan2019'd) in the SELECT clause after
UserID to create a new column. Run the query and examine the results.
What is the name of the new column?
3. Add as Age after your function. Run the query and examine the results.
What changes?
4. Remove the OBS= data set option in the FROM clause and add a WHERE
clause to return rows where Age is greater than or equal to 70. Run the
query. Did the query run successfully?
When you use a column alias to refer to a calculated value, you must use the CALCULATED
keyword with the alias to inform PROC SQL that the value is calculated within the query. You can
use an alias to refer to a calculated column in a SELECT clause, a WHERE clause, or an ORDER
BY clause. In the SELECT and WHERE clauses, precede the alias with the CALCULATED keyword.
The ORDER BY clause does not require the keyword.
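The following sketch shows the CALCULATED keyword in a WHERE clause and the plain alias in an ORDER BY clause. It assumes a hypothetical work.people table that is built in the first step; it is not one of the course tables.

```sas
/* Hypothetical demo table */
data work.people;
   input Name $ DOB :date9.;
   format DOB date9.;
   datalines;
Ana 15MAR1940
Raj 02NOV1990
Lee 30JUN1945
;
run;

proc sql;
   select Name,
          yrdif(DOB, '01JAN2019'd, 'age') as Age format=5.1
      from work.people
      where calculated Age >= 70   /* alias is not a table column, so CALCULATED is needed */
      order by Age desc;           /* ORDER BY accepts the alias without the keyword */
quit;
```

An alternative is to repeat the full expression in the WHERE clause, at the cost of maintaining the same expression in two places.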
CATEGORY   RANGE
Excellent  750+
Good       700 – 749
Fair       650 – 699
Poor       550 – 649
Bad        Below 550
proc sql;
select FirstName, LastName, State, CreditScore,
case
when CreditScore >= 750 then "Excellent"
when CreditScore >= 700 then "Good"
when CreditScore >= 650 then "Fair"
when CreditScore >= 550 then "Poor"
when CreditScore >= 0 then "Bad"
else "Unknown"
end as Category
from sq.customer(obs=1000);
quit;
WHEN conditions can use comparison operators such as GT, LT, <=, >=, NE, and =.
The first WHEN clause evaluated as true determines which value the CASE expression returns.
ELSE provides an alternate action if no WHEN expressions are true.
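Because the first true WHEN clause wins, the order of the WHEN clauses matters. The sketch below uses a hypothetical work.scores table (not a course table) to compare a correctly ordered CASE expression with one whose second WHEN clause can never be reached.

```sas
/* Hypothetical demo table, including one missing score */
data work.scores;
   input CreditScore;
   datalines;
780
720
.
;
run;

proc sql;
   select CreditScore,
          case
             when CreditScore >= 750 then 'Excellent'
             when CreditScore >= 700 then 'Good'
             else 'Other'
          end as InOrder,
          case
             when CreditScore >= 700 then 'Good'
             when CreditScore >= 750 then 'Excellent'  /* never reached: 750+ already matched above */
             else 'Other'
          end as OutOfOrder
      from work.scores;
quit;
```

The row with the missing score satisfies neither WHEN condition, so both columns fall through to the ELSE value.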
CASE-Operand Form
proc sql;
select FirstName, LastName, State, CreditScore,
case Married
when "M" then "Married"
when "D" then "Divorced" /* equivalent of Married="D" */
when "S" then "Single"
when "W" then "Widowed"
else "Unknown"
end as Category
from sq.customer(obs=1000);
quit;
A test of equality is implied.
Scenario
Use the CASE expression to create columns conditionally.
Files
• s102d02.sas
• customer – a SAS table that contains one row per customer
Syntax
PROC SQL;
SELECT col-name, <col-name>,
CASE <case-operand>
WHEN condition THEN result-expression
<WHEN condition THEN result-expression>
<ELSE result-expression>
END <AS column>
FROM input-table;
QUIT;
Notes
• CASE expressions enable you to interpret and change some or all of the data values in a column to
make the data more useful or meaningful.
• You can use a CASE expression anywhere that you can use a column name.
Demo
1. Open the s102d02.sas program in the demos folder and find the Demo section.
2. In the Simple Case Expression section:
a. Highlight and run the query. Examine the log and results.
b. Complete the WHEN expressions in the simple CASE expression. Highlight and
run the query. Examine the log and results.
proc sql;
select FirstName, LastName, State, CreditScore,
case
when CreditScore >=750 then 'Excellent'
when CreditScore >=700 then 'Good'
when CreditScore >=650 then 'Fair'
when CreditScore >=550 then 'Poor'
when CreditScore >=0 then 'Bad'
end as CreditCategory
from sq.customer(obs=1000);
quit;
c. Add the ELSE expression to change remaining values to Unknown. Highlight and run the
query. Examine the log and results.
…
else 'Unknown'
end as CreditCategory
…
d. Add a WHERE clause to filter the table for customers with Excellent credit and remove the
OBS=1000 data set option. Highlight and run the query. Examine the log and results.
proc sql;
select FirstName, LastName, State, CreditScore,
case
when CreditScore >=750 then 'Excellent'
when CreditScore >=700 then 'Good'
when CreditScore >=650 then 'Fair'
when CreditScore >=550 then 'Poor'
when CreditScore >=0 then 'Bad'
else 'Unknown'
end as CreditCategory
from sq.customer
where calculated CreditCategory = 'Excellent';
quit;
3. Move to the CASE-OPERAND FORM section.
a. Highlight and run the query. Examine the log and results.
b. Complete the WHEN and ELSE expressions in the simple CASE-Operand expression.
Highlight and run the query. Examine the log and results.
proc sql;
select FirstName, LastName, State, CreditScore, Married,
case Married
when 'M' then 'Married'
when 'S' then 'Single'
when 'D' then 'Divorced'
when 'W' then 'Widowed'
else 'Unknown'
end as MarriedCategory
from sq.customer(obs=1000);
quit;
Syntax Summary

Query
SELECT col-name, col-name
FROM input-table
WHERE expression
ORDER BY col-name <DESC>;

Enhance Reports
TITLE<n> "title-text";
FOOTNOTE<n> "footnote-text";
FORMAT=formatw.d
LABEL='label'

Create Columns
column AS alias
CASE EXPRESSION

Filter Data
WHERE col-name IS NULL
WHERE col-name LIKE
WHERE col-name BETWEEN-AND
WHERE CALCULATED column

SAS Dates and Times
"01JAN2000 12:00:00"dt
"01JAN2019"d
"14:45:32"t
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
1. Querying a Table
The sq.transactionfull table contains customer and transaction information. Write a PROC SQL
step to generate a report for large transactions that are not related to tuition payments to
universities. Use the following requirements as you generate the report:
a. Write a query to display the following columns in this order: CustomerName,
MerchantName, Type, Service, and Amount from the sq.transactionfull table.
1) Select rows that have a transaction Amount value greater than $1,000 and a Service
value not equal to University.
2) Order the rows such that the largest transaction is listed first.
3) Format the Amount column with the DOLLAR10.2 format.
4) Label CustomerName as Customer Name, and Amount as Transaction Amount.
5) Add the following title: Large Non-Educational Transactions
6) Run the program and review the results.
Partial Results
4) Add the following title: Customers with May Birthdays in North Carolina
5) Run the program and review the results.
Partial Results
Level 2
3. Working with Datetime Values
The sq.transactionfull table contains a list of customer and transaction information. Write a
query to create a report displaying transactions that took place in November and December of
any year. Use the following requirements as you generate the report:
a. Use the sq.transactionfull table to select the following columns in this order:
CustomerName, MerchantName, Amount.
1) Create a new column named TransactionDate by using the DATEPART function to
extract the SAS date value from the DateTime column. Format the new column using the
DATE9. format.
2) Filter the data to select rows where the month of the transaction date is November or
December and the Service value is not equal to University.
3) Order the report by the original DateTime column.
4) Format the Amount column with DOLLAR10.2.
5) Label CustomerName as Customer Name, MerchantName as Merchant Name,
Amount as Transaction Amount, and TransactionDate as Transaction Date.
6) Add the following title: November/December Transactions
7) Run the program and review the results.
Partial Results
Challenge
4. Conditional Processing with a Dynamic Title
Write a query to conditionally create a new column based on a customer’s age. Use the following
requirements as you generate the report:
a. Using the sq.customer file as input, use a character format to display the first initial of
FirstName labeled as Initial and to display LastName labeled as Last Name. In addition,
display CreditScore with the label Credit Score.
1) Create a new column named Generation based on the customer’s date of birth (DOB),
using the following logic:
a) DOB between 01JAN1928 and 31DEC1945 Generation = Silent.
b) DOB between 01JAN1946 and 31DEC1964 Generation = Boomer.
c) DOB between 01JAN1965 and 31DEC1979 Generation = GenX.
d) DOB between 01JAN1980 and 31DEC1996 Generation = Millennial.
e) DOB on or after 01JAN1997 Generation = Post-Millennial.
f) Otherwise, set Generation = Unknown.
2) Filter the data to select rows where CreditScore is not missing and State = VT.
3) Order the report by descending CreditScore within each value of Generation.
4) Add a dynamic title that specifies the exact date the report was run: Created on
Dynamic Date Value (Hint: Research the %QSYSFUNC macro function). The date
value in the title shown below is formatted using the WEEKDATE format.
5) Run the program and review the results.
Partial Results (The date in the title changes based on when the program is run.)
2.2 Summarizing and Grouping Data
Distinct Keyword
PROC SQL;
SELECT DISTINCT col-name <,col-name>
FROM input-table;
QUIT;
proc sql;
select distinct State
from sq.customer;
quit;
Use the DISTINCT keyword to eliminate duplicate rows. The DISTINCT keyword applies to all
columns in the SELECT list. One row is displayed for each unique combination of values.
When you specify all of a table's columns in a SELECT clause with the DISTINCT keyword, PROC
SQL eliminates duplicate rows, or rows in which the values in all of the columns match, from the
results.
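To see that DISTINCT works on combinations of values rather than on a single column, consider this small sketch. The work.cust table is hypothetical and is created only for this example.

```sas
/* Hypothetical demo table: four rows, one exact duplicate */
data work.cust;
   input State $ Employed $;
   datalines;
NC Y
NC Y
NC N
VT Y
;
run;

proc sql;
   /* One column: one row per unique State value (NC, VT) */
   select distinct State
      from work.cust;

   /* Two columns: one row per unique State/Employed combination (3 rows) */
   select distinct State, Employed
      from work.cust;
quit;
```

The duplicate NC/Y row is collapsed in the second query, while NC/N is kept because the combination differs.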
2.06 Activity
Open s102a06.sas from the activities folder and perform the following tasks
to eliminate duplicate values in a table:
1. Examine and run the query. View the results.
2. Change the State column in the SELECT clause to the Employed column.
Run the query. What does this query show?
3. Add the Married column in the SELECT clause after the Employed
column. Run the query. What does this query show?
Summarizing Data
When you use a summary function with a single argument, nonmissing values are totaled down a
column.
When you use a summary function with multiple arguments, nonmissing values are totaled across a row.
VAR Variance
When more than one argument is used within an SQL aggregate function, the function is no longer
considered to be an SQL aggregate or summary function. If there is a like-named Base SAS
function, then PROC SQL executes the Base SAS function, and the results that are returned are
based on the values for the current row. If no like-named Base SAS function exists, then an error
occurs. For example, if you use multiple arguments for the AVG function, an error occurs because
there is no AVG function in Base SAS.
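The difference between the two behaviors can be sketched with a tiny table. The work.pop table and its Est1-Est3 columns are hypothetical stand-ins for the population estimate columns used in this lesson.

```sas
/* Hypothetical demo table with one missing value */
data work.pop;
   input Name $ Est1 Est2 Est3;
   datalines;
A 10 20 30
B 40 . 60
;
run;

proc sql;
   /* One argument: SQL summary function, one value for the whole column.
      (10 + 40) / 2 = 25 */
   select mean(Est1) as ColumnMean
      from work.pop;

   /* Several arguments: Base SAS MEAN function, one value per row.
      Row A: 20. Row B: (40 + 60) / 2 = 50, because the missing value is ignored. */
   select Name, mean(Est1, Est2, Est3) as RowMean
      from work.pop;
quit;
```

Replacing MEAN with AVG in the second query would be expected to fail, for the reason given in the paragraph above.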
For more information about SAS functions, see the SAS® 9.4 Functions and CALL Routines:
Reference, Fifth Edition documentation. You can also find the direct link in the Course Links section
on the ELP.
Summary Functions
Scenario
Use summary functions to analyze a table.
Files
• s102d03.sas
• statepopulation – a SAS table that contains forecasted US population information by states
Syntax
Notes
• When you use a summary function with multiple arguments, nonmissing values are totaled
across a row.
• When you use a summary function with a single argument, nonmissing values are totaled down
a column.
Demo
1. Open the s102d03.sas program in the demos folder and find the Demo section. Highlight and
run the DESCRIBE TABLE statement and query in the Explore the sq.statepopulation table
section. Examine the log and results.
proc sql inobs=10;
describe table sq.statepopulation;
select Region, Division, Name, PopEstimate1, PopEstimate2,
PopEstimate3
from sq.statepopulation;
quit;
2. Move to Method 1 – Down a Column. In the query, complete the following:
a. Run the query and examine the results.
proc sql;
select count(PopEstimate1) as TotalStates,
mean(PopEstimate1) as Mean format=comma16.
from sq.statepopulation;
quit;
b. In the SELECT clause, add three columns to find the standard deviation, minimum, and
maximum of PopEstimate1 using the STD, MIN, and MAX functions. Use the COMMA16.
format for all new columns. Highlight and run the query. Examine the log and results.
proc sql;
select count(PopEstimate1) as TotalStates,
mean(PopEstimate1) as Mean format=comma16.,
std(PopEstimate1) as StdDev format=comma16. ,
min(PopEstimate1) as Min format=comma16.,
max(PopEstimate1) as Max format=comma16.
from sq.statepopulation;
quit;
c. Move to SAS Method – PROC MEANS below the query. SAS has procedures to do similar
summarization. Highlight and run the MEANS procedure. Examine the log and results.
proc means data=sq.statepopulation maxdec=0;
var PopEstimate1;
run;
b. Replace the AVG function with the MEAN function. Highlight and run the query. Examine
the log and results.
…
mean(PopEstimate1, PopEstimate2, PopEstimate3) as Mean
format=comma16.
…
c. In the MAX function, change the arguments to of PopEstimate1-PopEstimate3. Highlight
and run the query. Examine and discuss the syntax error.
…
mean(PopEstimate1, PopEstimate2, PopEstimate3) as Mean
format=comma16.,
min(PopEstimate1, PopEstimate2, PopEstimate3) as Min
format=comma16. ,
max(of PopEstimate1-PopEstimate3) as Max
format=comma16.
…
SELECT COUNT(argument);
An asterisk specifies all rows.
proc sql;
select count(*) as TotalCustomers format=comma12.
from sq.customer;
quit;
2.07 Activity
Open s102a07.sas from the activities folder and perform the following tasks
to summarize a table using the COUNT function:
1. Examine and run the query. View the results. Why is the value of
MaritalStatus different from the value of TotalRows?
2. Inside the COUNT function, add the DISTINCT keyword in front of the
Married column and run the query. What does the new report show?
Grouping Data
SELECT col-name, summary function(column)
FROM input-table
WHERE expression
GROUP BY col-name <,col-name>
ORDER BY col-name DESC;
The GROUP BY clause summarizes groups of data by a specified column or columns.
You can use a HAVING clause with a GROUP BY clause to filter grouped data. The HAVING clause
affects groups in a way that is similar to how a WHERE clause affects individual rows. When you
use a HAVING clause, PROC SQL displays only the groups that satisfy the HAVING expression.
Because the WHERE clause is evaluated before a row is available for processing and determines
which individual rows are available for grouping, you cannot use a WHERE clause to subset
grouped rows by referring to the calculated summary column.
You can still use the WHERE clause with the HAVING clause. The WHERE clause filters the
rows coming from the input table or tables, and the HAVING clause filters the aggregated data
post-summarization.
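The division of labor between WHERE and HAVING can be sketched as follows. The work.cust table and its values are hypothetical and serve only to make the example self-contained.

```sas
/* Hypothetical demo table */
data work.cust;
   input State $ CreditScore;
   datalines;
NC 710
NC 640
NC 780
VT 720
VT 590
SC 800
;
run;

proc sql;
   select State, count(*) as HighScoreCustomers
      from work.cust
      where CreditScore >= 700          /* row filter: applied before grouping */
      group by State
      having count(*) >= 2              /* group filter: applied after summarizing */
      order by HighScoreCustomers desc;
quit;
```

Four rows pass the WHERE clause (two for NC, one each for VT and SC), and only the NC group satisfies the HAVING expression.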
Scenario
Use the GROUP BY clause to group data and produce summary statistics for each group.
Files
• s102d04.sas
• customer – a SAS table that contains one row per customer
Syntax
PROC SQL;
SELECT col-name, col-name
FROM input-table
WHERE expression
GROUP BY col-name
HAVING expression
ORDER BY col-name <DESC>;
QUIT;
Notes
• The GROUP BY clause enables you to break query results into subsets of rows. When you use
the GROUP BY clause, you typically use an aggregate function in the SELECT clause.
• The HAVING clause is a “WHERE” clause for grouped data.
Demo
1. Open the s102d04.sas program in the demos folder and find the Demo section. Notice that the
query creates a report of the State column in the customer table and limits the output to 1,000
rows. Highlight and run the query. Examine the log and results.
Note: The table is limited for development purposes because the customer table has more
than 100,000 rows. After we finalize our query, we can run it on the entire table.
Note: When you use a GROUP BY clause without an aggregate function, PROC SQL treats
the GROUP BY clause as if it were an ORDER BY clause, displaying a corresponding
message in the log.
proc sql;
select State
from sq.customer(obs=1000)
group by State;
quit;
2. Modify the query to count the total number of customers in each state by using the COUNT
function. Name the column TotalCustomers. Highlight and run the query. Examine the log and
results.
proc sql;
select State, count(*) as TotalCustomers format=comma7.
from sq.customer(obs=1000)
group by state;
quit;
3. Add an ORDER BY clause to the query to sort the report by descending TotalCustomers.
Remove the OBS=1000 data set option and run the final query. Examine the log and results.
proc sql;
select State, count(*) as TotalCustomers format=comma7.
from sq.customer
group by state
order by TotalCustomers desc;
quit;
4. Replace State in the SELECT and GROUP BY clauses with BankID. Highlight and run the
query. Examine the log and the results.
Note: Missing values are grouped and summarized.
proc sql;
select BankID, count(*) as TotalCustomers format=comma7.
from sq.customer
group by BankID
order by TotalCustomers desc;
quit;
5. Add the Employed column after BankID in the SELECT and GROUP BY clauses. Highlight and
run the query. Examine the log and the results.
proc sql;
select BankID, Employed,
count(*) as TotalCustomers format=comma7.
from sq.customer
group by BankID, Employed
order by TotalCustomers desc;
quit;
6. Add a WHERE clause to filter for TotalCustomers greater than 10,000. Highlight and run the
query. Examine the log and the results.
Note: Because the WHERE clause is evaluated before a row is available for processing and
determines which individual rows are available for grouping, you cannot use a WHERE
clause to subset grouped rows by referring to the calculated summary column.
proc sql;
select BankID, Employed,
count(*) as TotalCustomers format=comma7.,
avg(CreditScore) as avgCreditScore format=3.
from sq.customer
where calculated TotalCustomers > 10000
group by BankID, Employed
order by TotalCustomers desc;
quit;
7. Remove the WHERE clause and insert a HAVING clause below the GROUP BY clause.
Highlight and run the query. Examine the log and the results.
Note: The order of the clauses is required.
proc sql;
select BankID, Employed,
count(*) as TotalCustomers format=comma7.
from sq.customer
group by BankID, Employed
having TotalCustomers > 10000
order by TotalCustomers desc;
quit;
DATEPART(datetime-value) TIMEPART(datetime-value)
select DateTime,
datepart(DateTime) as Date format=date9.,
timepart(DateTime) as Time format=time.,
Amount
from sq.transaction;
Extract the date and time values from the DateTime column.
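Date functions such as MONTH expect a SAS date value, so a datetime value is usually passed through DATEPART first. The sketch below assumes a hypothetical work.trans table with a DateTime column; it is not one of the course tables.

```sas
/* Hypothetical demo table with two datetime values */
data work.trans;
   input DateTime :datetime18. Amount;
   format DateTime datetime20.;
   datalines;
15NOV2018:08:30:00 120.50
03JUL2018:17:05:10 40.00
;
run;

proc sql;
   select DateTime,
          datepart(DateTime) as Date format=date9.,     /* date portion */
          timepart(DateTime) as Time format=time8.,     /* time portion */
          month(datepart(DateTime)) as Month            /* 11 and 7 */
      from work.trans;
quit;
```

Applying MONTH directly to the datetime value would treat a count of seconds as a count of days, which is why the nested DATEPART call is used here.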
2.08 Activity
Open s102a08.sas from the activities folder and perform the following tasks
to summarize data using date functions:
1. Examine and run the query. View the results. Which month has the
highest value for MedianSpent?
2. Replace the MONTH function with the QTR function. Change the name
of the Month column to Qtr. Run the query. What is the error?
3. Replace Month in the GROUP BY clause with Qtr. Run the query. Which
quarter has the highest value for MedianSpent?
Scenario
Use functions to summarize the number of customers under 25 and over 64 for each state.
Files
• s102d05.sas
• customer – a SAS table that contains customer information
Syntax
Notes
YRDIF Function
• The start date and end date specify SAS date values.
• The basis value identifies a character constant or variable that describes how SAS calculates a
date difference.
• If you do not specify a third argument, Age becomes the default value for basis.
• The Age basis specifies that a person’s age is computed.
Demo
1. Open the s102d05.sas program in the demos folder and find the Demo section. Run the query
and examine the results.
proc sql inobs=1000;
create table CustomerCount as
select State,
yrdif(DOB,'01JAN2020'd,'age') as Age
from sq.customer;
quit;
2. Add the less than operator after the YRDIF function to test whether the customer in each row is
under 25 years old. Rename the column Under25. Run the query and examine the results.
proc sql inobs=1000;
create table CustomerCount as
select State,
yrdif(DOB,'01JAN2020'd,'age')<25 as Under25
from sq.customer;
quit;
3. Copy the expression. Replace the < comparison operator with the > comparison operator.
Change the value from 25 to 64 and the name from Under25 to Over64. Run the query and
examine the results.
proc sql inobs=1000;
create table CustomerCount as
select State,
yrdif(DOB,'01JAN2020'd,'age')<25 as Under25,
yrdif(DOB,'01JAN2020'd,'age')>64 as Over64
from sq.customer;
quit;
4. Summarize the data by wrapping each new column with the SUM function to add all the values
of 1 to count the number of customers. Add a GROUP BY clause with the State column.
Remove the INOBS= option. Run the query and examine the results.
proc sql;
create table CustomerCount as
select State,
sum(yrdif(DOB,'01JAN2020'd,'age')<25) as Under25,
sum(yrdif(DOB,'01JAN2020'd,'age')>64) as Over64
from sq.customer
group by State;
quit;
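The counting technique in this demo relies on a comparison returning 1 when true and 0 when false, so summing the comparison counts the true rows. A minimal sketch with a hypothetical work.cust table that already holds an Age column:

```sas
/* Hypothetical demo table */
data work.cust;
   input State $ Age;
   datalines;
NC 22
NC 70
NC 40
VT 19
VT 24
;
run;

proc sql;
   select State,
          sum(Age < 25) as Under25,   /* NC: 1, VT: 2 */
          sum(Age > 64) as Over64,    /* NC: 1, VT: 0 */
          count(*) as Total           /* NC: 3, VT: 2 */
      from work.cust
      group by State;
quit;
```

This produces several conditional counts in one pass, where separate WHERE clauses would require one query per condition.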
Syntax Summary
Practice
Level 1
5. Eliminating Duplicates
The sq.globalfull table contains estimated financial information by geographic region and
country for the population age 15 years and older.
a. Write a query to generate a report that displays the unique CountryCode values from
sq.globalfull.
1) Order the report by the CountryCode.
2) Add the title Unique Country Codes.
3) Run the program and review the results.
Partial Results
Level 2
6. Grouping and Summarizing Data
Create a new program.
a. Write a query to generate a report that identifies which customers have the greatest
percentage of suspiciously large transactions (over $500). Use the following requirements as
you generate the report:
1) Using the sq.transactionfull table as input, select and group the report by CustomerID.
2) Create the following columns:
a) TotalTransactions using the COUNT(*) function to count the number of transactions
for each value of CustomerID.
b) SuspiciousTransactions as SUM(Amount >= 500) to count the number of
transactions greater than 500.
c) PCTSuspicious by dividing SuspiciousTransactions by TotalTransactions.
Format the new column with PERCENT8.2.
3) Select only transactions where the Service value is not equal to University.
4) Filter the output to display only summary rows where PCTSuspicious > .05.
5) Order the report by descending PCTSuspicious.
6) Add the title Customers with High Percentage of Suspicious Transactions.
7) Run the program and review the results.
Results
Challenge
7. Grouping and Summarizing Data with Calculations
The sq.globalfull table contains estimated financial information by geographic region and
country for the population age 15 years and older.
a. Write a query to generate a report that displays how many people in each region ages 15 or
older will borrow for health or medical purposes next year and three years from now. The
report will include the one-year and three-year forecasted counts and a percent increase
between the two.
b. Which region had the largest percent decrease from year 1 to year 3?
2.3 Creating and Managing Tables
Creating Tables
To create a table from a query result, use a CREATE TABLE statement with the AS keyword, and
place it before the SELECT statement. When a table is created this way, its data is derived from the
table or view that is referenced in the query's FROM clause. The new table's column names are as
specified in the query's SELECT clause list. The new table’s column attributes (the type, length,
informat, format, and extended attributes) are the same as the selected source columns.
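As a minimal sketch of this pattern, the following step stores a query result as a table. It assumes that the sq library is assigned; the output table name work.highcredit and the filter value are illustrative choices.

```sas
proc sql;
/* CREATE TABLE ... AS goes before the SELECT statement.
   The new table inherits column names and attributes from the query. */
create table work.highcredit as
    select FirstName, LastName, UserID, CreditScore
        from sq.customer
        where CreditScore > 700;
quit;
```

When a query is stored as a table, no report is produced. The log typically notes how many rows and columns were written.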
Each CREATE TABLE statement creates only one table. Use the DATA step if you want to create
multiple output tables in a single step:
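The following DATA step is a sketch of that approach. It reads sq.customer once and writes two tables. The output table names and the cutoff of 700 are illustrative assumptions, not part of the course program.

```sas
/* Two output tables are named in the DATA statement. */
data work.highcredit work.othercredit;
    set sq.customer;
    /* Route each row to exactly one table. Rows with a missing
       CreditScore fail the comparison and go to work.othercredit. */
    if CreditScore > 700 then output work.highcredit;
    else output work.othercredit;
run;
```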
2.09 Activity
Open s102a09.sas from the activities folder and perform the following
tasks:
1. Examine and run the query in the Create a Table from a Query section.
View the results.
2. Add the CREATE TABLE statement and create a table named Top5States.
Run the query and confirm that the table was created successfully.
3. Run the code below your SQL query. What did the code produce?
To create an empty table that has the same columns and attributes as an existing table or view, use
the LIKE clause in the CREATE TABLE statement.
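A short sketch of the LIKE clause follows. It assumes that the sq library is assigned; the table name work.customer_empty is a hypothetical name chosen for illustration.

```sas
proc sql;
/* Copy the column definitions of sq.customer, but none of its rows. */
create table work.customer_empty
    like sq.customer;
/* Write the new table's definition to the log to confirm the columns. */
describe table work.customer_empty;
quit;
```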
You can create a new table without rows by using the CREATE TABLE statement to define the
columns and their attributes. You can specify a column's name, type, length, informat, format, and
label. The table definition is enclosed in parentheses. Individual column definitions are separated by
commas.
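For example, a table definition along these lines creates an empty employee table. The column lengths, format, and label shown here are illustrative assumptions.

```sas
proc sql;
/* The table definition is enclosed in parentheses, and
   column definitions are separated by commas. */
create table work.employee
    (FirstName char(20),
     LastName  char(20),
     DOB       num format=date9. label="Date of Birth",
     EmpID     num);
quit;
```

PROC SQL stores SAS dates as numeric values, which is why DOB is defined as num with a date format.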
For more information about the CREATE TABLE statement, see “CREATE TABLE Statement” in the
SAS® 9.4 SQL Procedure User’s Guide, Fourth Edition documentation. You can also find the direct
link in the Course Links section on the ELP.
After tables are created, you can use the INSERT statement to insert data values into tables. You
can use an empty table or a table that is already populated.
proc sql;
insert into work.highcredit
(FirstName, LastName, UserID, CreditScore)
select FirstName, LastName,
UserID, CreditScore
from sq.customer
where CreditScore > 700;
quit;
Note: Columns from the query must be in the same position as in the INSERT column list.
proc sql;
insert into employee
(FirstName, LastName, DOB, EmpID)
values("Diego", "Lopez", "01SEP1980"d, 1280)
values("Omar", "Fayed", "21MAR1989"d, 1310);
quit;
proc sql;
insert into employee
set FirstName= "Diego",
LastName= "Lopez",
DOB = "01SEP1980"d,
EmpID = 1280;
quit;
Note: Columns within the SET clause must exist in the table.
With the SET clause, you assign values to columns by name. The columns can appear in any order
in the SET clause.
Note the following features of SET clauses:
• As with other SQL clauses, use commas to separate columns. In addition, you must use a
semicolon after the last SET clause only.
• If you omit data for a column, then the value in that column is a missing value.
• To specify that a value is missing, use a blank in single quotation marks for character values and a
period for numeric values.
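The features listed above can be sketched with two SET clauses in one INSERT statement. The work.employee table and its column definitions are assumptions made so that the example runs on its own.

```sas
proc sql;
/* Hypothetical empty table to insert into. */
create table work.employee
    (FirstName char(20),
     LastName  char(20),
     DOB       num format=date9.,
     EmpID     num);

insert into work.employee
    /* First row: DOB is omitted, so it is stored as a missing value. */
    set FirstName="Diego",
        LastName="Lopez",
        EmpID=1280
    /* Second row: missing values are stated explicitly, with a blank in
       quotation marks for character and a period for numeric. */
    set FirstName="Omar",
        LastName=' ',
        DOB=.,
        EmpID=1310;   /* the semicolon follows the last SET clause only */

select * from work.employee;
quit;
```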
Description                            Syntax
A query returning multiple rows        INSERT INTO table-name <(column list)>
based on positional values                 SELECT columns
                                           FROM table-name;
One clause per row using positional    INSERT INTO table-name <(column list)>
values                                     VALUES (value1,value2,...);
One clause per row using column-       INSERT INTO table-name
value pairs                                SET column-name=value,
                                               column-name=value,...;
For more information about the INSERT Statement, see “INSERT Statement” in the SAS® 9.4 SQL
Procedure User’s Guide, Fourth Edition documentation. You can also find the direct link in the
Course Links section on the ELP.
2.10 Activity
Open s102a10.sas from the activities folder and perform the following tasks
to create a new table and insert rows into it:
1. Examine the CREATE TABLE statement and run the query only. Confirm
that an empty table was created.
2. In the Inserting Rows with a Query section, enter the correct column
names to complete the INSERT INTO statement. Run the query. How
many rows were inserted into the table highcredit?
3. In the Inserting Rows with the SET Clause section, complete the INSERT
INTO statement with the SET clause and insert yourself as a customer
into the highcredit table. Run the query. What does the note in the log
say?
4. Complete the code to drop the highcredit table.
Additional Statements
Statement     Description
ALTER TABLE   Adds columns to, drops columns from, and changes column
              attributes in an existing table
UPDATE        Modifies a column's values in existing rows of a table or view
DELETE        Removes one or more rows from a table or view that is
              specified in the FROM clause
For more information about the ALTER TABLE, UPDATE, or DELETE Statement, see “SQL
Procedure Reference” in the SAS® 9.4 SQL Procedure User’s Guide, Fourth Edition documentation.
You can also find the direct link in the Course Links section on the ELP.
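The three statements can be sketched together on a small table. The work.employee table, the Salary column, and all values below are hypothetical and exist only to make the example self-contained.

```sas
proc sql;
/* Hypothetical table with two rows to modify. */
create table work.employee
    (FirstName char(20),
     LastName  char(20),
     DOB       num format=date9.,
     EmpID     num);
insert into work.employee
    values("Diego", "Lopez", "01SEP1980"d, 1280)
    values("Omar", "Fayed", "21MAR1989"d, 1310);

/* ALTER TABLE: add a column and change an existing column's format. */
alter table work.employee
    add Salary num format=dollar10.
    modify DOB format=mmddyy10.;

/* UPDATE: change values in existing rows that satisfy the WHERE clause. */
update work.employee
    set Salary = 50000
    where EmpID = 1280;

/* DELETE: remove rows that satisfy the WHERE clause. */
delete from work.employee
    where EmpID = 1310;

select * from work.employee;
quit;
```

An UPDATE or DELETE statement without a WHERE clause affects every row in the table, so it is worth checking the condition before running either statement.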
A SAS method to alter table metadata is the DATASETS procedure. For more information about
PROC DATASETS, see “DATASETS Procedure” in the Base SAS® 9.4 Procedures Guide, Seventh
Edition documentation. You can also find the direct link in the Course Links section on the ELP.
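As a brief sketch of the PROC DATASETS approach, the following step renames a column and changes a label and a format without rewriting the data. The work.employee table and the new names are illustrative assumptions.

```sas
/* Hypothetical table whose metadata is changed below. */
proc sql;
create table work.employee
    (FirstName char(20),
     DOB       num format=date9.,
     EmpID     num);
quit;

proc datasets library=work nolist;
    /* MODIFY names the table; the statements after it change column metadata. */
    modify employee;
        rename EmpID = EmployeeID;
        label DOB = "Date of Birth";
        format DOB mmddyy10.;
    run;
quit;
```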
2.4 Using DICTIONARY Tables
DICTIONARY Tables
(Diagram: DICTIONARY tables describe SAS libraries, tables, columns, and many more.)
• information about each SAS session
• updated automatically by SAS
• Read-only
• metadata: data about other data
• valid in PROC SQL only
DICTIONARY tables are special Read-only PROC SQL tables or views. They retrieve information
about all the SAS libraries, SAS data sets, SAS system options, and external files that are
associated with the current SAS session. For example, the DICTIONARY.columns table contains
information such as name, type, length, and format, about all columns in all tables that are known to
the current SAS session. PROC SQL automatically assigns the DICTIONARY libref. To get
information from DICTIONARY tables, specify DICTIONARY.table-name in the FROM clause in a
SELECT statement in PROC SQL.
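For example, the following sketch lists the column metadata for one table. It uses sashelp.class, a sample table that ships with SAS, so that it does not depend on the course data.

```sas
proc sql;
/* Write the definition of the DICTIONARY table to the log. */
describe table dictionary.columns;

/* Libname and Memname values are stored in uppercase. */
select memname, name, type, length, format
    from dictionary.columns
    where libname = "SASHELP" and memname = "CLASS";
quit;
```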
For more information about DICTIONARY tables, see “Accessing SAS Information by Using
DICTIONARY Tables” in the SAS® 9.4 SQL Procedure User’s Guide, Fourth Edition documentation.
You can also find the direct link in the Course Links section on the ELP.
2.11 Activity
Open s102a11.sas from the activities folder and perform the following tasks
to find all the available DICTIONARY tables in your SAS session:
1. Examine and run the program. View the log and results.
2. Note the column labels for the first two columns: Member Name is the
DICTIONARY table, and Data Set Label is the description of that table.
3. Replace the asterisk in the SELECT clause and select the DISTINCT
memname and memlabel columns. Run the query and examine all the
available DICTIONARY tables in your SAS session.
4. What is the data set label of the members DICTIONARY table?
Scenario
Use SQL to query DICTIONARY information.
Files
• s102d06.sas
• DICTIONARY.tables, DICTIONARY.columns, DICTIONARY.libnames
Syntax
PROC SQL;
DESCRIBE TABLE DICTIONARY.<input-table>;
SELECT *
FROM DICTIONARY.<input-table>
WHERE expression
ORDER BY col-name <DESC>;
QUIT;
Notes
• Dictionary tables are Read-only metadata views that contain session metadata, such as
information about SAS libraries, data sets, and external files in use or available in the current SAS
session.
• DICTIONARY.tables provides detailed information about tables.
• DICTIONARY.columns provides detailed information about all columns in all tables.
• DICTIONARY.libnames provides detailed information about current SAS LIBNAME connections.
Demo
1. Open the s102d06.sas program in the demos folder and find the Demo section.
2. In the Explore dictionary.tables section:
a. Highlight and run the procedure. Examine the log and the results.
b. Add a WHERE clause to subset the Libname column for libraries named SQ and remove
the INOBS= option. Highlight and run the procedure. Examine the log and the results.
Note: The Libname and Memname columns are stored in all uppercase.
proc sql;
describe table dictionary.tables;
select *
from dictionary.tables
where Libname = 'SQ';
quit;
c. Discuss the code for the SAS equivalent of DICTIONARY.tables. Highlight and run the
procedure. Examine the log and the results.
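The demo code for this step is not reproduced in these notes. As a sketch, SAS provides views in the Sashelp library that mirror the DICTIONARY tables, and sashelp.vtable corresponds to DICTIONARY.tables. Unlike the DICTIONARY libref, these views can be used outside PROC SQL. The choice of PROC PRINT and of the columns shown is an assumption.

```sas
/* Query the same metadata with a non-SQL procedure. */
proc print data=sashelp.vtable;
    where libname = "SQ";
    var libname memname nobs nvar;
run;
```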
Practice
Level 1
8. Counting the Number of Tables in a Library
a. Write a query to create a report that displays the count of the number of tables in the SQ
library.
1) Use DICTIONARY.tables as input.
2) Name the calculated column TableCount.
3) Add an appropriate title and run the program to review the results.
Results
Level 2
9. Counting the Number of Tables in All Libraries
a. Write a query to create a report that displays the count of the number of tables in all libraries.
1) Use DICTIONARY.tables as input.
2) Name the calculated column TableCount.
3) Group the results by the library name.
4) Add an appropriate title and display the library name and table count as shown below.
(Your library list and counts might differ.)
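One way to approach this practice is sketched below. The title text is an illustrative choice, and this is not the course solution file.

```sas
title "Number of Tables in Each Library";
proc sql;
/* COUNT(*) with GROUP BY returns one row per library. */
select libname, count(*) as TableCount
    from dictionary.tables
    group by libname;
quit;
title;
```

Adding where libname = "SQ" and removing the GROUP BY clause and the libname column would give the single count requested in the Level 1 practice.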
Results
Challenge
10. Finding Tables with a Column Using Pattern Matching
a. Write a query to create a report that displays the list of all tables in the SQ library with a
column containing ID in its name.
Display the table name and the column name containing ID as shown below.
Partial Results
b. How many unique tables have a column that contains ID in its name?
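A possible approach to both parts uses DICTIONARY.columns with the LIKE operator. The alias TablesWithID is a hypothetical name. The pattern is written in single quotation marks so that the macro processor does not treat %ID as a macro call.

```sas
proc sql;
/* Part a: tables in SQ with a column whose name contains ID. */
select memname, name
    from dictionary.columns
    where libname = "SQ" and upcase(name) like '%ID%';

/* Part b: count each table once, even if several of its columns match. */
select count(distinct memname) as TablesWithID
    from dictionary.columns
    where libname = "SQ" and upcase(name) like '%ID%';
quit;
```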
2.5 Solutions
Solutions to Practices
1. Querying a Table
/*s102s01.sas*/
quit;
title;
What value of Merchant Name is on the first documented transaction in December?
Big Burgers, Inc.
4. Conditional Processing with a Dynamic Title
/*s102s04.sas*/
How many unique country codes are in the sq.globalfull table? 151
6. Grouping and Summarizing Data
/*s102s06.sas*/
3. How many customers are from either VT, SC, or GA? 2,704 customers
2.02 Activity – Correct Answer
1. What do you notice about the values in the CreditScore column? How
many rows are in your report? Missing values are included in the
results, for a total of 2,197 rows.
2.04 Activity – Correct Answer
To dynamically create a footnote or title using today’s date, use this statement:
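The statement from the original slide is not reproduced in these notes. One common way to do this, shown here as a sketch rather than the course's exact statement, is to call the TODAY function through %SYSFUNC inside a title in double quotation marks. The title text and the DATE9. format are illustrative choices.

```sas
/* Double quotation marks are needed so that the macro function resolves. */
title "Report Created on %sysfunc(today(), date9.)";
proc sql;
select Name, Age
    from sashelp.class;
quit;
title;
```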
2. What does the new report show? You can use DISTINCT with the
COUNT function to return the number of distinct, nonmissing values
from a column.
select count(*) as TotalRows format=comma10.,
count(distinct Married) as MaritalStatus format=comma10.
from sq.customer;
2.09 Activity – Correct Answer
1. Examine and run the query. View the results.
2.10 Activity – Correct Answer
1. There are four columns and zero rows in the new highcredit table.
2. How many rows were inserted into the table highcredit? 26,006 rows
were inserted into work.highcredit.
insert into work.highcredit(FirstName,LastName,
UserID,CreditScore)
select FirstName, LastName,
UserID, CreditScore
from sq.customer
where CreditScore > 700;
4.
proc sql;
drop table highcredit;
quit;
2.11 Activity – Correct Answer
1.
(Screenshot of the results, annotated with DICTIONARY table names, DICTIONARY table columns,
and DICTIONARY table labels.)
Lesson 3 SQL Joins
3.1 Introduction to SQL Joins
Joining Tables
smallcustomer (partial)
FirstName  LastName     …  AccountID
Gary       Sienkiewicz  …  1010159565
Sergio     Lefeld       …  1010367330
John       Oliver       …  2020012887
Iva        Bower        …  3030085224

smalltransaction (partial)
AccountID   DateTime          BankID     …
.           07MAY18:15:35:02  .          …
1010159565  16SEP18:14:57:08  101010101  …
1010183063  24FEB18:17:27:42  101010101  …
1010367330  15MAY18:17:54:21  101010101  …
1010367330  17OCT18:11:02:38  101010101  …
A primary key is a column that contains a unique value for each row of data. A primary key cannot
contain missing values.
A foreign key is a column in one table that refers to the primary key in another table.
Joins combine data horizontally from multiple source tables to produce either a report or an output
table. The source tables are left intact and untouched.
Default Join
When you run this query, the Cartesian product creates a report with a total of 96 rows (8 customer
rows × 12 transaction rows). A Cartesian product is rarely the result that you want when you join
tables. When working with large tables, a Cartesian product can create an unnecessarily large report
or table and can consume excessive system
resources.
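The query that the paragraph above refers to is not reproduced in these notes. A default join of the two course tables would look something like the following sketch, which assumes that the sq library is assigned.

```sas
proc sql;
/* Tables separated by a comma and no join condition:
   every row of the first table is paired with every row of the second. */
select *
    from sq.smallcustomer, sq.smalltransaction;
quit;
```

With 8 rows in sq.smallcustomer and 12 rows in sq.smalltransaction, the result has 8 × 12 = 96 rows.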
3.01 Activity
Open s103a01.sas from the activities folder and perform the following tasks
to perform a default join of two tables:
1. Examine and run the two queries to explore the sq.smallcustomer and
sq.smalltransaction tables. Confirm that the sq.smallcustomer table contains
8 rows and the sq.smalltransaction table contains 12 rows.
2. In the next section, list the sq.smallcustomer and sq.smalltransaction
tables in the FROM clause and separate the tables by a comma. Run the
query and view the log. What note do you see?
3. View the results. Name two issues with the report.
Types of Joins
Inner Join: Returns only rows that match
Joining tables is a very common requirement when working with data. There are multiple methods
available in SAS to join tables. The most common are SQL and the DATA step. This course focuses
on SQL joins. The SAS® Programming 2 course addresses the DATA step merge.
Table Relationships
(Figure: four panels of example key values illustrating One-to-One, One-to-Many, Many-to-Many,
and Nonmatches.)
One-to-One: A single observation in one data set is related to exactly one observation in another
data set based on the values of one or more selected variables.
One-to-Many: A single observation in one data set is related to more than one observation in
another data set based on the values of one or more selected variables.
Many-to-Many: Occurs when multiple records in a table are associated with multiple records in
another table.
Nonmatches: At least one observation in one data set is unrelated to any observation in another
data set based on the values of one or more selected variables.
3.2 Inner Joins
Scenario
Files
• s103d01.sas
• smallcustomer – a SAS table that contains one row per customer
• smalltransaction – a SAS table that contains one row per customer transaction
Syntax
PROC SQL;
SELECT col-name, col-name
FROM table1 INNER JOIN table2
ON table1.col-name=table2.col-name;
QUIT;
Notes
• An SQL inner join combines matching rows between two tables.
• The two tables to be joined are listed in the FROM clause separated by INNER JOIN.
• The ON expression indicates how rows should be matched.
Demo
1. Open the s103d01.sas program in the demos folder and find the Demo section of the program.
Run the queries in the Explore the Tables section to compare the columns of the
sq.smallcustomer and sq.smalltransaction tables.
proc sql;
select *
from sq.smallcustomer;
select *
from sq.smalltransaction;
quit;
2. Find the Perform the INNER JOIN section and add sq.smallcustomer and
sq.smalltransaction to the FROM clause to perform an inner join on AccountID. Qualify
AccountID columns as table-name.col-name in the ON expression only. Highlight and run the
query.
proc sql;
select FirstName, LastName, State, Income, DateTime, MerchantID,
Amount
from sq.smallcustomer inner join
sq.smalltransaction
on smallcustomer.AccountID = smalltransaction.AccountID;
quit;
3. Add the AccountID column to the query after Amount. Highlight and run the query. Examine the
log. Why does the program fail?
Note: There is an ambiguous reference. The column AccountID is in more than one table.
proc sql;
select FirstName, LastName, State, Income, DateTime, MerchantID,
Amount, AccountID
from sq.smallcustomer inner join
sq.smalltransaction
on smallcustomer.AccountID = smalltransaction.AccountID;
quit;
4. Modify the query to qualify the AccountID column in the SELECT clause. Highlight the step and
run the selected code.
Note: Because AccountID occurs in both tables, you must qualify the column with the table
name to indicate which column you want to select.
proc sql;
select FirstName, LastName, State, Income, DateTime, MerchantID,
Amount, smallcustomer.AccountID
from sq.smallcustomer inner join
sq.smalltransaction
on smallcustomer.AccountID = smalltransaction.AccountID;
quit;
5. Modify the query to include a WHERE clause to subset for customers who have a State value of
NY (New York) and an ORDER BY clause that sorts by descending Amount.
proc sql;
select FirstName, LastName, State, Income, DateTime, MerchantID,
Amount, smallcustomer.AccountID
from sq.smallcustomer inner join
sq.smalltransaction
on smallcustomer.AccountID = smalltransaction.AccountID
where State = "NY"
order by Amount desc;
quit;
proc sql;
select FirstName, LastName, State, Income, DateTime, c.AccountID
from sq.smallcustomer as c inner join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
A table alias is a temporary alternate name for a table. You specify table aliases in the FROM
clause.
3.02 Activity
Open s103a02.sas from the activities folder and perform the following tasks
to perform an inner join on tables with different column names:
1. Examine and run the two queries to explore the sq.statepopulation and
sq.statecode tables. What columns can you use to join the tables?
2. Specify the tables in the FROM clause and perform an inner join. Add
the alias p for the sq.statepopulation table, and the alias s for the
sq.statecode table.
3. Complete the ON expression to match rows where p.Name =
s.StateCode. Highlight and run the query. How many rows are in the new
report?
A natural join selects rows from two tables that have equal values in columns that share the same
name and the same type. A natural join is requested with the syntax NATURAL JOIN. If like columns
are not found, then a Cartesian product is performed.
Do not use an ON clause with a natural join. When using a natural join, an ON clause is implied,
matching all like columns. You can use a WHERE clause to subset the query results.
A natural join can be an inner join or an outer join, which is requested with the syntax INNER or
OUTER. If the join type specification is omitted, then an inner join is implied.
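A natural join can be sketched with two small tables that share exactly one like-named column. The tables work.cust and work.trans and their values are hypothetical and are created here so that the example runs on its own.

```sas
/* Both tables contain a numeric AccountID column, and no other shared names. */
data work.cust;
    input AccountID Name $;
    datalines;
101 Gary
102 Sergio
103 Iva
;
run;

data work.trans;
    input AccountID Amount;
    datalines;
101 25.00
101 310.50
103 12.75
;
run;

proc sql;
/* No ON clause: rows are matched on every column with the same name
   and type, which here is AccountID only. */
select *
    from work.cust natural join work.trans;
quit;
```

Because the match columns are implied, it is worth confirming which column names two tables share before relying on a natural join.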
FEEDBACK Option
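The slide content for this topic is not reproduced in these notes. As a sketch, the FEEDBACK option in the PROC SQL statement writes the query to the log after PROC SQL expands it, for example replacing SELECT * with the full list of qualified column names. The example uses the sashelp.class sample table.

```sas
proc sql feedback;
/* The log shows this query with the asterisk expanded into column names. */
select *
    from sashelp.class
    where Age > 14;
quit;
```

This can help you see which columns an asterisk or an implied join condition resolves to.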
3.03 Activity
Open s103a03.sas from the activities folder and perform the following tasks
to find tables that contain BankID and MerchantID columns:
1. Complete the first query by adding the BANKID column name in the
WHERE clause. How many tables contain the BankID column?
2. Replace BANKID with MERCHANTID. How many tables contain the
MerchantID column?
Scenario
Files
• s103d02.sas
• smallcustomer – a SAS table that contains one row per customer
• smalltransaction – a SAS table that contains one row per customer transaction
• bank – a SAS table that contains bank information
• merchant – a SAS table that contains merchant information
Syntax
PROC SQL;
SELECT col-name, col-name
FROM table1 INNER JOIN table2
ON table1.col-name=table2.col-name INNER JOIN
table3
ON join-criteria INNER JOIN
table4
ON join-criteria;
QUIT;
Notes
• To join more than two tables, each join must be listed individually in the query.
Demo
1. Open the s103d02.sas program in the demos folder and find the Demo section of the program.
Under the Explore the Tables section, run the queries to explore the sq.smallcustomer,
sq.smalltransaction, sq.bank, and sq.merchant tables. Describe the relationships between
the tables.
proc sql inobs=5;
title "Table: SMALLCUSTOMER";
select *
from sq.smallcustomer;
title "Table: SMALLTRANSACTION";
select *
from sq.smalltransaction;
title "Table: MERCHANT";
select *
from sq.merchant;
title "Table: BANK";
select *
from sq.bank;
title;
quit;
2. Find the Joining Data from More Than Two Tables section. Highlight and run the query to join
sq.smallcustomer with sq.smalltransaction. Examine the results.
proc sql;
select FirstName, LastName, c.State, Income, DateTime, MerchantID,
Amount, c.AccountID, c.BankID
from sq.smallcustomer as c inner join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
3. Add a second inner join and join the MerchantID column from the sq.merchant table with the
MerchantID column of the previous join. Replace MerchantID in the SELECT clause with
MerchantName. Highlight and run the query. Examine the results.
proc sql;
select FirstName, LastName, c.State, Income, DateTime,
MerchantName, Amount, c.AccountID, c.BankID
from sq.smallcustomer as c inner join sq.smalltransaction as t
on c.AccountID = t.AccountID inner join
sq.merchant as m
on t.MerchantID = m.MerchantID;
quit;
4. Add a third inner join and join the BankID column from the sq.bank table with the BankID
column of the previous join. Replace BankID in the SELECT clause with the bank name.
Highlight and run the query. Examine the results.
proc sql;
select FirstName, LastName, c.State, Income, DateTime,
MerchantName, Amount, c.AccountID, b.Name
from sq.smallcustomer as c inner join
sq.smalltransaction as t
on c.AccountID = t.AccountID inner join
sq.merchant as m
on t.MerchantID = m.MerchantID inner join
sq.bank as b
on t.BankID = b.BankID;
quit;
Most database products treat missing as the absence of a value (nulls). Because they do not contain
any value, they are excluded from any conditional evaluation.
PROC SQL treats a missing value as a value and matches it in joins. Any missing value will match
with any other missing value of the same type (character or numeric) in a join. This could return
unexpected results.
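The behavior described above can be sketched with two small tables that each contain a missing key. The tables work.c and work.t and their values are hypothetical.

```sas
/* One customer has a missing AccountID. */
data work.c;
    input Name $ AccountID;
    datalines;
Gary 101
Iva .
;
run;

/* Two transactions have a missing AccountID. */
data work.t;
    input AccountID Amount;
    datalines;
101 25
. 40
. 60
;
run;

proc sql;
/* Iva's missing AccountID matches both transactions with a missing AccountID,
   so she appears in the result even though no real key links the rows. */
select c.Name, c.AccountID, t.Amount
    from work.c as c inner join work.t as t
    on c.AccountID = t.AccountID;
quit;
```

Adding a condition such as c.AccountID is not null to the ON clause removes the rows that are matched only on missing values.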
proc sql;
select *
from sq.smallcustomer2 as c inner join
sq.smalltransaction2 as t
on c.AccountID = t.AccountID and
c.AccountID is not null;
quit;
Adding the IS NOT NULL operator to
the ON clause prevents the missing
values from joining.
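The matching behavior described above can be checked with a small sketch. The tables work.cust_demo and work.trans_demo and their values are hypothetical and are not part of the course data. Each table holds one real AccountID and two missing ones.

```sas
/* Hypothetical demo tables; names and values are illustrative only */
data work.cust_demo;
   input AccountID Name $;
   datalines;
101 Ann
. Bob
. Cara
;
run;

data work.trans_demo;
   input AccountID Amount;
   datalines;
101 25
. 40
. 60
;
run;

proc sql;
   /* Missing keys match each other: 1 real match plus 2 x 2 missing pairs */
   select count(*) as RowsWithMissingMatched
      from work.cust_demo as c inner join
           work.trans_demo as t
      on c.AccountID = t.AccountID;

   /* Excluding missing keys in the ON clause leaves only the real match */
   select count(*) as RowsWithoutMissing
      from work.cust_demo as c inner join
           work.trans_demo as t
      on c.AccountID = t.AccountID and
         c.AccountID is not null;
quit;
```

Comparing the two counts shows how many rows come only from missing values pairing with each other.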
Non-Equijoin
3.04 Activity
Open s103a04.sas from the activities folder and perform the following tasks
to use a non-equijoin.
1. Complete the ON clause to join on rows where customer Income is
greater than or equal to the LowIncome range and less than or equal to the
HighIncome range by using the BETWEEN-AND operator.
2. What tax bracket is Olga Comstock in?
3. View your log. What note do you see?
Syntax Summary
Inner Join
SELECT col-name, col-name
FROM table1 INNER JOIN table2
ON table1.column = table2.column;
Practice
Level 1
1. Inner Join
Open s103p01.sas from the practices folder. Modify the program to generate a report that
shows the breakdown of employment and marital status for customers in New York City.
a. Add a PROC SQL step to create a table named work.nyc that combines sq.customer and
sq.maritalcode. Follow the requirements below.
1) This table should include only FirstName, LastName, Employed, and MaritalStatus.
2) Perform an inner join on the Married column in the sq.customer table and MaritalCode
column in the sq.maritalcode table.
3) Filter the Zip column for customers in the 10001 ZIP code.
b. Execute the PROC FREQ step to generate the crosstabulation of MaritalStatus and
Employed.
Level 2
2. Join on Inequality
Open s103p02.sas from the practices folder. Modify the program to join the sq.customer and
sq.agegroup tables based on a customer’s year of birth.
a. Add a PROC SQL step to the top of the program to create a table named work.generation
that combines sq.customer and sq.agegroup. Follow the requirements below.
1) Select FirstName and LastName, and create a column named Year to determine the
DOB year of the customer from the sq.customer table. Select the Name column from
the sq.agegroup table.
2) The StartYear and EndYear columns in the sq.agegroup table indicate the starting and
ending years for each generation. Use these columns to perform a non-equijoin using the
calculated Year value from step 1) above.
b. Execute the PROC SGPLOT step below your query to generate the bar chart shown below.
Challenge
Open s103p03.sas from the practices folder. Modify the program to create a table named
work.births3 that will be used to create a report showing a three-year projection of births by
Region, Division, and State. The data to create this report is stored in four tables as described
below:
a. sq.statepopulation contains
b. sq.regioncode contains
1) RegionCode
2) RegionName.
c. sq.divisioncode contains
1) DivisionCode
2) DivisionName.
d. sq.statecode contains
1) StateCode
2) StateName.
e. Using all inner joins, combine the four tables above such that the descriptive Region,
Division, and State names are combined with the Births3 data.
f. Execute the PROC TABULATE step to generate the required report and answer the following
questions:
1) Which division has the highest projected three-year births, and which has the lowest?
2) Which region has the highest projected three-year births, and which has the lowest?
3.3 Outer Joins
Left joins return all rows from the left table and matching rows from the right table.
Right joins return all rows from the right table and the matching rows from the left.
Full joins return all matching and nonmatching rows from both tables.
Join Type
SELECT columns
FROM table1 LEFT | RIGHT | FULL JOIN table2
ON table1.column = table2.column;
Scenario: Left Join
Inputs: smallcustomer (partial) and smalltransaction (partial)
Output: Report of all customers with or without a transaction
A left outer join lists matching rows and rows from the left-hand table (the first table listed in the
FROM clause) that do not match any row in the right-hand table. A left join is specified with the
keywords LEFT JOIN and ON.
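As a minimal sketch of the left join described above, the query below keeps every row from sq.smallcustomer and adds transaction details where an AccountID matches. It assumes that FirstName and LastName come from sq.smallcustomer and that DateTime and Amount come from sq.smalltransaction; adjust the qualifiers if your tables differ.

```sas
proc sql;
   /* Left table: sq.smallcustomer (listed first in the FROM clause) */
   select c.FirstName, c.LastName, c.AccountID,
          t.DateTime, t.Amount
      from sq.smallcustomer as c left join
           sq.smalltransaction as t
      on c.AccountID = t.AccountID;
quit;
```

Customers without a matching transaction still appear in the results, with missing values in the columns that come from the transaction table.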
Scenario: Right Join
Inputs: smallcustomer (partial) and smalltransaction (partial)
Output: Report of all transactions with or without a customer
A right join, specified with the keywords RIGHT JOIN and ON, is the opposite of a left join:
nonmatching rows from the right-hand table (the second table listed in the FROM clause) are
included with all matching rows in the output. This example reverses the join of the last example. It
uses a right join to select all the transactions from the smalltransaction table and displays the
customers only if the customer matches.
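The right join described above might look like the following sketch. The column-to-table assignments (customer names from sq.smallcustomer; AccountID, DateTime, and Amount from sq.smalltransaction) are assumptions based on the other queries in this lesson.

```sas
proc sql;
   /* Right table: sq.smalltransaction (listed second in the FROM clause) */
   select c.FirstName, c.LastName, t.AccountID,
          t.DateTime, t.Amount
      from sq.smallcustomer as c right join
           sq.smalltransaction as t
      on c.AccountID = t.AccountID;
quit;
```

Selecting t.AccountID rather than c.AccountID keeps the account number visible for transactions that have no matching customer.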
proc sql;
select FirstName, LastName, Income, c.AccountID,
DateTime, MerchantID, Amount
from sq.smallcustomer as c inner join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
With an inner join, you can select either AccountID column (c.AccountID or t.AccountID).
proc sql;
select FirstName, LastName, Income, c.AccountID,
DateTime, MerchantID, Amount
from sq.smallcustomer as c left join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
With a left join, the results differ depending on which AccountID column (c.AccountID or
t.AccountID) we choose.
3.05 Activity
Open s103a05.sas from the activities folder and perform the following tasks
to perform a left join:
1. Run the query to create a left join between the sq.smallcustomer and
sq.smalltransaction tables. Notice the difference within the AccountID
columns in rows 8 and 9.
2. Remove the column t.AccountID in the SELECT clause. Run the query and
examine the results. How many missing AccountID values are in the
results?
3. Replace c.AccountID with t.AccountID. Replace the c in the column label
with a t. How many missing AccountID values are in the results?
Scenario: Full Join
Inputs: smallcustomer (partial) and smalltransaction (partial)
Output: Report of all customers and transactions
Full Join
select *
from sq.smallcustomer as c full join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
COALESCE Function
Scenario
Files
• s103d03.sas
• smallcustomer – a SAS table that contains one row per customer
• smalltransaction – a SAS table that contains customer transaction data
Syntax
PROC SQL;
SELECT col-name, col-name
FROM table1 FULL JOIN table2
ON table1.col-name=table2.col-name;
QUIT;
Notes
• The two tables to be joined are listed in the FROM clause separated by FULL JOIN.
• A full join lists all values in both tables, with or without a match.
• The COALESCE function returns the value of the first nonmissing argument.
Demo
1. Open the s103d03.sas program in the demos folder and find the Demo section of the program.
Highlight and run the query. Examine the results. Discuss the values in both AccountID
columns.
proc sql;
select FirstName, LastName, Income, c.AccountID, t.AccountID,
DateTime, MerchantID, Amount
from sq.smallcustomer as c full join sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
2. Modify the SELECT clause and remove the t.AccountID column. Highlight and run the query.
Examine the results. Discuss the missing values in the c.AccountID column.
proc sql;
select FirstName, LastName, Income, c.AccountID,
DateTime, MerchantID, Amount
from sq.smallcustomer as c full join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
3. Modify the SELECT clause and replace c.AccountID with t.AccountID. Highlight and run the
query. Examine the results. Discuss the missing values in the t.AccountID column.
proc sql;
select FirstName, LastName, Income, t.AccountID,
DateTime, MerchantID, Amount
from sq.smallcustomer as c full join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
4. Modify the SELECT clause, use the COALESCE function, and add c.AccountID and
t.AccountID as arguments. Add the alias AccountID and the FORMAT=10. column modifier to
the newly created column. Highlight and run the query. Examine the results.
proc sql;
select FirstName, LastName, Income,
coalesce(c.AccountID,t.AccountID) as AccountID format=10.,
DateTime, MerchantID, Amount
from sq.smallcustomer as c full join
sq.smalltransaction as t
on c.AccountID = t.AccountID;
quit;
3.06 Activity
Open s103a06.sas from the activities folder and perform the following tasks
to find all transactions not associated with a documented customer:
1. Run the query to create a left join between the sq.smalltransaction2 and
sq.smallcustomer2 tables. Examine the report. Notice that the rows with
missing values in AccountID have been joined.
2. In the ON clause, add the expression AND t.AccountID is not null. Run
the query. Confirm that missing values were not joined.
3. Add a WHERE clause with the expression c.AccountID is NULL to filter for
all transactions without a documented customer. Run the query and
examine the report. How many transactions do not have a customer
associated with them?
Syntax Summary
Outer Join
SELECT columns
FROM table1 <LEFT | RIGHT | FULL> JOIN table2
ON table1.column = table2.column;
Practice
Level 1
Join the sq.globalpop and sq.globalmetadata tables to create the work.meta table. Use the
work.meta table to generate a report showing the country codes for countries in the
sq.globalpop table that do not have any country metadata in the sq.globalmetadata table.
b. Select the CountryCode, SeriesName, EstYear1, and EstYear3 columns from the
sq.globalpop table and the ShortName and IncomeGroup columns from the
sq.globalmetadata table.
d. Use the CountryCode column in both tables for the join criteria.
e. Create a report showing the unique country codes for which there is no global metadata
using the work.meta table.
1) Select the CountryCode column from the work.meta table and eliminate duplicate
values.
2) Filter for rows where the ShortName column is missing. The ShortName column
contains values from the sq.globalmetadata table. If the results are missing, then the
row did not retrieve information from sq.globalmetadata.
Partial Results
Level 2
Generate a report showing the count of customer marital status descriptions for each primary
bank. The sq.customer table contains a marital code (Married) and a primary bank ID. The final
results should contain BankID, MaritalStatus, Name (name of bank), and Count.
a. Write a PROC SQL step to join the sq.customer table and the sq.maritalcode table.
1) Select BankID from the sq.customer table and MaritalStatus value from the
sq.maritalcode table. Create a new column named Count to count the number of
customers. Format the new column using commas.
2) Use a left join to select all customers from the sq.customer table, with or without
matches in the sq.maritalcode table.
3) Use the Married column in the sq.customer table and the MaritalCode column in the
sq.maritalcode table as the join criteria.
Partial Results
b. In the same PROC SQL step, add the descriptive bank Name column to the results.
1) After the MaritalStatus column in the SELECT clause, add Name from the sq.bank
table based on matching BankID values, again using a left join.
Partial Results
c. Which combination of MaritalStatus and Name had the lowest count of customers?
Challenge
a. Some of our New York City merchants are having difficulty with sales. Your job is to identify
merchants who have had no recent sales, capture what industry they are in, and then
generate a list of customers in their area who are transacting with other vendors to help them
market to the appropriate customers and generate sales.
1) Use a left outer join to combine the sq.merchant and sq.transaction tables to identify
New York City merchants (Zip=10001) with no transactions. From the sq.merchant
table, select MerchantName, MerchantID, Type, and Zip. Label Type as Merchant
Type and Zip as Merchant Zipcode. Merchants who do not have any transactions will
be those who have no entries in the sq.transaction table.
Results
b. The next step is to find the customers in NYC who have similar Type transactions with other
vendors. The problem is that the sq.transactionfull table (where transaction information is
combined with customer information) does not contain a column for the customer’s ZIP code
for us to join on, only the full address. Therefore, we need to extract the ZIP code from the
customer.Address column to perform the final join.
1) In the same PROC SQL step, use an inner join to combine the above list with the
sq.transactionfull table by Type and Zip to generate a report with customer contact
information for all NYC customers with similar Type transactions.
Hint: Assign the sq.transactionfull table an alias of c and use the following expression
to extract Zip from the c.address column to perform this join:
input(scan(c.address,-1),5.)
2) Include CustomerID, CustomerName, and Address from the sq.transactionfull table.
Label Address as Customer Address.
3) Ensure that the report shows unique rows of data and is ordered by merchantID and
CustomerID. Add an appropriate title to the report.
c. How many potential new customers can the merchant Miasma Mitigation, Inc. now market to?
3.4 Complex Joins
Reflexive Join
Scenario
Files
• s103d04.sas
• employee – a SAS table that contains one row per employee
Syntax
PROC SQL;
SELECT col-name, col-name
FROM table1 AS alias1 INNER JOIN table1 AS alias2
ON alias1.col-name=alias2.col-name;
QUIT;
Notes
• A reflexive join or self-join is used to join a table to itself as if the table were two tables.
• The ON expression indicates how rows should be matched. Because both copies of the table have
the same column names, each copy needs its own alias, and the column names must be qualified
as alias.col-name.
Demo
1. Open the s103d04.sas program in the demos folder and find the Demo section of the program.
Highlight and run the query. Examine the results.
proc sql;
select e.EmployeeID, e.EmployeeName, e.StartDate format=date9.,
e.ManagerID
from sq.employee as e;
quit;
2. Modify the query to create a reflexive join. In the FROM clause, add an inner join followed by the
sq.employee table again. Add the alias m to the second sq.employee table. Add the ON clause
and set e.ManagerID equal to m.EmployeeID.
proc sql;
select e.EmployeeID, e.EmployeeName, e.StartDate format=date9.,
e.ManagerID
from sq.employee as e inner join
sq.employee as m
on e.ManagerID = m.EmployeeID;
quit;
3. Add the EmployeeName column in the SELECT clause. Qualify the new EmployeeName
column with the table alias m. Highlight and run the query. Examine the results.
proc sql;
select e.EmployeeID, e.EmployeeName, e.StartDate format=date9.,
e.ManagerID, m.EmployeeName
from sq.employee as e inner join
sq.employee as m
on e.ManagerID = m.EmployeeID;
quit;
4. Add the column alias ManagerName to the m.EmployeeName column and an ORDER BY
clause to sort by ManagerName. Highlight and run the query. Examine the results.
proc sql;
select e.EmployeeID, e.EmployeeName, e.StartDate format=date9.,
e.ManagerID, m.EmployeeName as ManagerName
from sq.employee as e inner join
sq.employee as m
on e.ManagerID = m.EmployeeID
order by ManagerName;
quit;
Scenario
Tables: transactionfull and statecode
3.07 Activity
Open s103a07.sas from the activities folder and perform the following tasks
to join two tables using the SUBSTR function:
1. Examine and run the query. Did you receive a syntax error?
2. In the ON clause, use the SUBSTR function on t.StateID to extract the
first two characters. Run the query. Which StateName is Caberto, Glen
Daniel from?
Numeric ZIP
3.08 Activity
Open s103a08.sas from the activities folder and perform the following tasks
to join a numeric column with a character column:
1. Run the queries in the Create a Table and Insert Values section. View the
newly created table.
2. Run the query in the Join Different Column Types section. What syntax
error was generated when you joined columns of different types?
3. Leave the program open for the next activity.
For more information about the PUT function, see "PUT Function" in the SAS® 9.4 Functions and
CALL Routines: Reference, Fifth Edition documentation. You can also find the direct link in the
Course Links section on the ELP.
For more information about the INPUT function, see "INPUT Function" in the SAS® 9.4 Functions
and CALL Routines: Reference, Fifth Edition documentation. You can also find the direct link in the
Course Links section on the ELP.
put(z.Zip,z5.)
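To show where an expression like this fits in a join, here is a small sketch. The tables work.zip_num and work.zip_char and their values are hypothetical and are not the tables used in the activities. Zip is numeric in the first table and ZipCode is character in the second, so one side must be converted before the columns can be compared.

```sas
/* Hypothetical tables: Zip is numeric, ZipCode is character */
data work.zip_num;
   input Zip City $;
   datalines;
27513 Cary
2134 Boston
;
run;

data work.zip_char;
   input ZipCode $ Customers;
   datalines;
27513 12
02134 7
;
run;

proc sql;
   /* PUT with the Z5. format converts numeric Zip to a zero-padded string */
   select c.ZipCode, z.City, c.Customers
      from work.zip_char as c inner join
           work.zip_num as z
      on c.ZipCode = put(z.Zip, z5.);

   /* Alternative: INPUT converts the character column to a number */
   select c.ZipCode, z.City, c.Customers
      from work.zip_char as c inner join
           work.zip_num as z
      on input(c.ZipCode, 5.) = z.Zip;
quit;
```

The Z5. format matters for ZIP codes with a leading zero: the numeric value 2134 becomes the character value 02134, which then matches the character column.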
3.09 Activity
Use s103a08.sas from the previous activity and perform the following tasks
to join a numeric column with a character column by using the PUT
function.
1. If you have not run the queries in the Create a Table and Insert Values
section, run those now.
2. Use the PUT function to convert z.Zip in the ON clause to a character
value using the z5. format. Run the query.
3. What city does the ZipCode value 14216 represent?
• Take the SAS Programming 2 course to learn more about DATA step match-merges.
• Visit the SQL and the DATA Step section on the ELP for additional resources about
comparisons of the DATA step and PROC SQL.
• Visit the Course Handouts section on the ELP and download the SQL Join Summary PDF.
• Read the SAS paper Top 10 Most Powerful Functions for PROC SQL.
3.5 Solutions
Solutions to Practices
1. Inner Join
/*s103s01.sas*/
proc sql;
create table work.NYC as
select c.FirstName, c.LastName, c.Employed, m.MaritalStatus
from sq.customer as c inner join
sq.maritalcode as m
on c.married=m.MaritalCode
where zip=10001;
quit;
/*Alternate Solution*/
proc sql;
create table work.NYC as
select c.FirstName, c.LastName, c.Employed, m.MaritalStatus
from sq.customer as c, sq.maritalcode as m
where c.married=m.MaritalCode and
zip=10001;
quit;
Does this vary across marital status? Yes, NYC customers whose marital status is Single
are more likely to be unemployed.
2. Join on Inequality
/*s103s02.sas*/
/*Solution*/
proc sql;
create table work.generation as
select c.FirstName,
c.LastName,
year(c.DOB) as Year,
a.Name
from sq.Customer as c inner join
sq.AgeGroup as a
on year(c.DOB) between a.StartYear and a.EndYear;
quit;
/*Alternate Solution*/
proc sql;
create table work.generation as
select c.FirstName,
c.LastName,
year(c.DOB) as Year,
a.Name
from sq.Customer as c, sq.AgeGroup as a
where calculated Year between a.StartYear and a.EndYear;
quit;
/*Bar Chart*/
title 'Count of Customers by Generation';
proc sgplot data=work.generation noautolegend;
hbar Name /
stat=freq
dataskin=sheen
categoryorder=respdesc
datalabel
datalabelattrs=(size=9pt)
FILLATTRS=(color=cx6f7eb3);
yaxis label='Generation';
xaxis grid label='Count';
run;
title;
Which age group has the most customers? Millennials or Generation Y
proc sql;
create table Births3 as
select c.RegionName, d.DivisionName, b.Statename, a.Births3
from sq.statepopulation as a inner join
sq.statecode as b
on a.name=b.statecode inner join
sq.RegionCode as c
on a.region=c.regionCode inner join
sq.divisioncode as d
on a.Division=d.divisioncode;
quit;
/*Alternate Solution*/
proc sql;
create table Births3 as
Which region has the highest projected three-year births, and which has the lowest?
South region has the highest. Northeast region has the lowest.
************************************;
*Solution with Temporary Table *;
************************************;
proc sql;
create table work.meta as
select p.CountryCode, p.SeriesName, p.EstYear1, p.EstYear3,
m.ShortName, m.IncomeGroup
from sq.globalpop as p left join
sq.globalmetadata as m
on p.countrycode= m.countrycode;
quit;
************************************;
*Solution with No Temporary Table *;
************************************;
title 'Countries with no Metadata';
title2 'No Temporary Table';
proc sql;
select distinct p.CountryCode
from sq.globalpop as p left join
sq.globalmetadata as m
on p.CountryCode= m.CountryCode
where ShortName is null;
quit;
title;
What is the last CountryCode value in your results? WSM
/*a*/
proc sql;
select c.BankID,
m.MaritalStatus,
count(*) as Count format=comma10.
from sq.customer as c left join
sq.maritalcode as m
on c.Married=m.MaritalCode
where c.BankID is not null
group by c.BankID, m.MaritalStatus
order by Count desc;
quit;
/*b*/
title 'Count of Marital Status by Bank';
proc sql;
select c.BankID,
m.MaritalStatus,
b.Name,
count(*) as Count format=comma10.
from sq.customer as c left join
sq.maritalcode as m
on c.Married=m.MaritalCode left join
sq.bank as b
on c.BankID=b.BankID
where c.BankID is not null
group by c.BankID, m.MaritalStatus, b.Name
order by Count desc;
quit;
title;
Which combination of MaritalStatus and Name had the lowest count of customers?
Widowed and Wheatberry Bank, Inc. had a total count of 362.
/*a*/
title 'NYC Merchants with no Transactions';
proc sql;
select a.MerchantName,
a.MerchantID,
a.Type 'Merchant Type',
a.Zip 'Merchant Zipcode'
from sq.merchant as a left join
sq.transaction as b
on a.MerchantID=b.MerchantID
where a.Zip=10001 and b.MerchantID is null;
quit;
title;
/*b*/
proc sql;
title 'Customers with similar Type Transactions';
title2 'as NYC Merchants with no Sales';
select distinct a.MerchantName,
a.MerchantID,
a.type 'Merchant Type',
a.Zip 'Merchant Zipcode',
c.CustomerID,
c.Customername,
c.Address 'Customer Address'
/*Join merchants with transactions. Notice where clause below
that determines which rows are returned.*/
from sq.merchant as a left join
sq.transaction as b
on a.MerchantID=b.MerchantID
/*Combine the above list with all customers with similar type
transactions.*/
inner join sq.transactionfull as c
on a.type=c.type and input(scan(c.address,-1),5.)=a.zip
/*the where filter selects NYC merchants who do not exist in
the transaction table=no sales*/
where a.zip=10001 and b.MerchantID is null
order by 2,5;
quit;
title;
How many potential new customers can the merchant Miasma Mitigation, Inc. now market to?
Six potential new customers
3.01 Activity – Correct Answer
2. What note do you see?
Nonmatching IDs
3.02 Activity – Correct Answer
1. What columns can you use to join the tables? The Name column from
the statepopulation table and the StateCode column from the statecode
table.
3.04 Activity – Correct Answer
1.
select FirstName, LastName,
Income format=dollar16., TaxBracket
from sq.smallcustomer as c inner join
sq.taxbracket as t
on c.Income between t.LowIncome and t.HighIncome
order by TaxBracket desc, Income desc;
3.05 Activity – Correct Answer
1. Notice the difference within the AccountID columns in rows 8 and 9.
3.06 Activity – Correct Answer
1. Notice that the rows with missing values in AccountID have been joined.
2. Confirm that missing values were not joined.
ERROR: Expression using equals (=) has components that are of different data types.
Lesson 4 Subqueries
4.1 Subquery in WHERE and HAVING Clauses
Demonstration: Subquery That Returns a Single Value
Demonstration: Subquery That Returns Multiple Values
Practice
4.1 Subquery in WHERE and HAVING Clauses
Noncorrelated Subqueries
Outer Query
SELECT …
FROM …
<WHERE …>
<GROUP BY …>
<HAVING …>
<ORDER BY …>;
Independent Subquery
(select avg(PopEstimate1)
from sq.statepopulation
where … )…
This course focuses on noncorrelated subqueries and, for simplicity, refers to them as subqueries.
Scenario
Subqueries Steps
1
proc sql;
select avg(PopEstimate1) as Average
from sq.statepopulation;
quit;
2
proc sql;
select Name, PopEstimate1
from sq.statepopulation
where PopEstimate1 > 6278420;
quit;
What happens if PopEstimate1 changes in the data?
3
select Name, PopEstimate1
from sq.statepopulation
where PopEstimate1 > (select avg(PopEstimate1)
from sq.statepopulation);
The subquery is evaluated first.
The subquery executes independently of the outer query.
3
select Name, PopEstimate1
from sq.statepopulation
where PopEstimate1 > (6278420);
Scenario
Use a noncorrelated subquery that returns a single value.
Files
• s104d01.sas
• statepopulation – a SAS table that contains estimated state populations for the next three years
Syntax
PROC SQL;
SELECT col-name, col-name
FROM input-table
WHERE column operator (SELECT col-name
FROM input-table…)
ORDER BY col-name <DESC>;
QUIT;
Notes
• A subquery is a query expression that is nested as part of another query expression.
• A single-value subquery returns a single row and column.
Demo
1. Open the s104d01.sas program in the demos folder and find the Demo section. Run the query
to explore the sq.statepopulation table.
2. Run the query in the Future Subquery section to find the average of PopEstimate1 in the
sq.statepopulation table. View the results.
proc sql;
select avg(PopEstimate1)
from sq.statepopulation;
quit;
3. Complete the outer query to find all states that have a higher PopEstimate1 value than the
result of the previous query's value of 6278420. Sort the results by descending PopEstimate1.
proc sql;
select Name, PopEstimate1
from sq.statepopulation
where PopEstimate1 > 6278420
order by PopEstimate1 desc;
quit;
4. Remove the value 6278420 in the WHERE clause, replace it with the query that found the
average of PopEstimate1, and enclose the inner query in parentheses.
Note: Remember to remove the semicolon from the subquery when copying and pasting.
proc sql;
select Name, PopEstimate1
from sq.statepopulation
where PopEstimate1 > (select avg(PopEstimate1)
from sq.statepopulation)
order by PopEstimate1 desc;
quit;
5. Replace the subquery with a new query. Find the average of Population_2010 from the
sashelp.us_data table.
Note: The table in a subquery can be a different table than the outer query.
proc sql;
select Name, PopEstimate1
from sq.statepopulation
where PopEstimate1 > (select avg(Population_2010)
from sashelp.us_data)
order by PopEstimate1 desc;
quit;
4.01 Activity
Open s104a01.sas from the activities folder and perform the following tasks
to use a subquery to return two columns in the WHERE clause:
1. Examine and run the first query. Confirm that the results contain one
row and two columns.
2. Add the first query as a subquery in the second query to find all states
with PopEstimate1 higher than the average estimated state population.
3. Run the query. What is the syntax error in the log?
1
proc sql;
select avg(PopEstimate1) as Average
from sq.statepopulation;
quit;
2
proc sql;
select Division, avg(PopEstimate1) as avgDivisionPop
from sq.statepopulation
group by Division
having avgDivisionPop > 6278420;
quit;
4.02 Activity
Open s104a02.sas from the activities folder and perform the following tasks
to perform a subquery in the HAVING clause:
1. Examine and run the first query. View the results.
2. Modify the second query. Copy the value returned by the first query
into the HAVING clause to return divisions with an average PopEstimate1
value greater than the overall average of PopEstimate1.
3. Remove the static value and add the subquery in the HAVING clause.
4. How many divisions have a higher average PopEstimate1 than the
average PopEstimate1 of all the states?
1
proc sql;
select Name
from sq.statepopulation
where Division = '3';
quit;
2
proc sql;
create table division3 as
select *
from sq.customer
where State in ("IL","IN","MI","OH","WI");
quit;
Scenario
Use a noncorrelated subquery that returns multiple values.
Files
• s104d02.sas
• customer – a SAS table that contains one row per customer
• statepopulation – a SAS table that contains estimated state populations for the next three years
Syntax
PROC SQL;
SELECT col-name, col-name
FROM input-table
WHERE column IN (SELECT column
FROM input-table…)
ORDER BY col-name <DESC>;
QUIT;
Notes
• A multiple-value subquery can return more than one value from a column.
• A multiple-value subquery can be used in a WHERE or HAVING expression that contains an
IN operator.
Demo
1. Open the s104d02.sas program in the demos folder and find the Demo section.
2. Examine the Future Subquery section and run the query to find the states that reside in
Division 3. View the results.
proc sql;
select Division, Name
from sq.statepopulation
where Division = '3';
quit;
3. Using the results from the Future Subquery section, modify the outer query by entering the
State abbreviations ("IL", "IN","MI","OH","WI") in the WHERE clause. Run the query and view
the results.
proc sql;
create table Division3 as
select *
from sq.customer
where State in ("IL", "IN","MI","OH","WI");
quit;
4. Replace the values in parentheses with the query from step 2 by copying and pasting the query
inside the parentheses. The query is now the subquery. Be sure to remove the semicolon from
inside the parentheses. Highlight and run the query. Discuss the error in the log.
Note: This program will return a syntax error.
…
where State in (select Division, Name
from sq.statepopulation
where Division = '3');
quit;
5. Remove the Division column from the subquery. Highlight and run the query. Discuss the
results.
…
where State in (select Name
from sq.statepopulation
where Division = '3');
quit;
6. Change the Division value from 3 to 6 in the subquery. Change the 3 at the end of the new
table name in the CREATE TABLE statement to a 6. Highlight and run the query. Discuss the
results.
proc sql;
create table division6 as
select *
from sq.customer
where State in (select Name
from sq.statepopulation
where Division = '6');
quit;
ANY Keyword
Similar to the IN operator, the ANY keyword compares a value against all the values returned by
the subquery.
Scenario
Which states have a PopEstimate1 value that is greater than New York or Florida?
Values returned by the subquery: 20629982, 19641589
With the > operator, the ANY keyword is true when the value of the specified column
is greater than at least one of the values returned by the subquery.
MIN Function
Value returned by the subquery: 19641589
You can also use the MIN function inside the subquery to
return the minimum PopEstimate1.
You can use > ANY or < ANY to return values that are greater than any of the returned values or
less than any of the returned values.
In this example, you can also use the MIN function inside the subquery to return the minimum value
of PopEstimate1 and use a comparison operator to compare the values.
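The two approaches described above can be sketched as follows. This is a minimal illustration rather than the slide code: it assumes that the Name column of sq.statepopulation holds state abbreviations (as the earlier IN list suggests), so "NY" and "FL" are assumed values.

```sas
proc sql;
/* > ANY: true when PopEstimate1 exceeds at least one returned value */
select Name, PopEstimate1
    from sq.statepopulation
    where PopEstimate1 > any (select PopEstimate1
                                  from sq.statepopulation
                                  where Name in ("NY","FL"));

/* Alternative: compare against the smallest returned value */
select Name, PopEstimate1
    from sq.statepopulation
    where PopEstimate1 > (select min(PopEstimate1)
                              from sq.statepopulation
                              where Name in ("NY","FL"));
quit;
```

When the subquery returns no missing values, both queries should select the same rows, because exceeding at least one value is the same as exceeding the minimum.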
4.03 Activity
Open s104a03.sas from the activities folder and perform the following tasks
to find all states with a PopEstimate1 value that is lower than the value for
New York or Florida:
1. Complete the query using the ANY keyword or MAX statistic.
2. Run the query. How many states have estimated populations lower than
New York or Florida?
Correlated Subqueries
Outer Query
SELECT …
FROM …
<WHERE …>
<GROUP BY …>
<HAVING …>
<ORDER BY …>;
Dependent Correlated Subquery
(SELECT …
FROM …
<WHERE …>)
Correlated subqueries are resource intensive.
The previous subqueries have been noncorrelated subqueries that are self-contained and that
execute independently of the outer query.
A correlated subquery is dependent on the outer query. A correlated subquery requires one or more
values to be passed to it by the outer query before the subquery can be resolved. This means that
PROC SQL must process the correlated subquery multiple times: once for each table row that the
outer query processes. Correlated subqueries tend to use more resources than noncorrelated
subqueries. You typically want to avoid a correlated subquery, and we will briefly discuss why.
A correlated subquery is not stand-alone. It needs additional information from the main query.
Correlated Subqueries
Executing a correlated subquery
can be resource intensive.
select count(*) as TotalCustomer
from sq.customer as c
where '1' = (select Division
from sq.statepopulation as s
where s.Name = c.State);
Test WHERE
condition
Conceptually, correlated subqueries are tricky, and they are resource intensive. Using correlated
subqueries can reduce performance in your system. A better method is often to join the tables and
use a WHERE clause to obtain the necessary rows.
Using Joins
select count(*) as TotalCustomer
from sq.customer as c
where '1' = (select Division
from sq.statepopulation as s
where s.Name = c.State);
Subqueries can sometimes be replaced with joins. In this example, instead of the tricky and
resource-intensive correlated subquery, you can use a join to retrieve the results.
The join joins the sq.customer and sq.statepopulation tables on State abbreviations. Then the
query filters for rows where Division equals 1.
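A join along the lines just described might look like the following sketch. It reuses the table and column names from the correlated subquery and assumes that each Name value appears only once in sq.statepopulation, so that the join does not duplicate customer rows.

```sas
proc sql;
/* Join customers to their state row, then keep Division 1 only */
select count(*) as TotalCustomer
    from sq.customer as c inner join
         sq.statepopulation as s
    on c.State = s.Name
    where s.Division = '1';
quit;
```

Under that assumption, the count should match the result of the correlated subquery, while the filter is applied once to the joined rows instead of re-running a subquery per customer.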
Syntax Summary
SELECT col-name, col-name
FROM input-table
WHERE column operator (SELECT col-name
FROM input-table…)
GROUP BY col-name
HAVING column operator (SELECT col-name
FROM input-table…);
WHERE and HAVING Subqueries
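To make the HAVING form of the syntax concrete, here is a sketch based on the earlier division query, with the static value 6278420 replaced by a subquery. The table and column names come from the examples above; treat it as one possible arrangement rather than the official activity solution.

```sas
proc sql;
/* Divisions whose average exceeds the average across all states */
select Division, avg(PopEstimate1) as avgDivisionPop
    from sq.statepopulation
    group by Division
    having avgDivisionPop > (select avg(PopEstimate1)
                                 from sq.statepopulation);
quit;
```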
Practice
Level 1
1. Subquery That Returns a Single Value
The sq.statepopulation table contains estimated population statistics for every state in the US
and its territories. Which states have an estimated three-year population growth greater than the
average for all states?
a. Write a query that displays the average three-year population growth for all states. Use the
nPopChg3 column to calculate the average from the sq.statepopulation table.
Results
b. Use the query from step a to display the states that have a projected three-year population
growth greater than the overall average.
1) Include Name and nPopChg3 in the results.
2) Label nPopChg3 as Estimated Growth and format the values with commas.
3) Use the query from step a to subset the table.
4) Order the results by descending nPopChg3.
5) Add an appropriate title to the report.
Partial Results
Level 2
2. Subquery with Multiple Functions
Determine which Texas customers have higher than average credit scores.
a. Using the sq.customer table, write a query to define the high-credit threshold as 2 standard
deviations greater than the mean of CreditScore for TX customers.
1) Use the following expression to create the column HighScore. This will be your future
subquery result.
sum(avg(creditscore),(2*std(creditscore)))
2) Run the query and compare your results.
Results
b. Write a second query to create a report showing Texas customers whose CreditScore value
is greater than the results from the previous query. Use the query from part a as a subquery.
1) Select CustomerID, FirstName, LastName, and CreditScore.
2) Subset the table for customers who are from the state of Texas and who have a greater
CreditScore value than the value from the query in part a.
3) Order the report by descending CreditScore.
4) Add an appropriate title.
Partial Results
Create a report that lists all high-income countries in Europe and Central Asia and their
forecasted percentage for the indicator Borrowed for health or medical purposes (% age 15+).
a. Using the sq.globalmetadata table, return all the country codes for the region of Europe &
Central Asia that are in the High Income group.
Partial Report
b. Write the second query using the sq.globalfindex table and view the estimated percentage
of population who borrowed for health or medical purposes (% age 15+) for the region of
Europe & Central Asia.
1) Select CountryCode and create two new columns that convert EstYear1 and EstYear3
to percentages.
• Calculate the columns by dividing EstYear1 and EstYear3 by 100. Name the new
columns EstPct1 and EstPct3 respectively and format them using the percent format.
2) Subset the table where IndicatorName is equal to Borrowed for health or medical
purposes (% age 15+), and use the previous query to subset the data by CountryCode
values in Europe and Central Asia.
Hint: You can use the UPCASE function to standardize character casing.
3) Order the report by descending EstYear1.
4) Add an appropriate title.
Partial Report
c. Which country in your report has the lowest EstPct1 value for next year?
4. Using a Subquery with Summarized Data
Determine which customers spend less on average on utility bills than the overall utility bill average.
a. Use the sq.transactionfull table to create a report showing CustomerID and the average
utility payment amount, named UtilityAmt, for customers whose average utility payment is
less than the overall average utility payment for all customers.
Hint: Use the Type column to determine which transactions are utility payments.
1) Order by descending UtilityAmt.
2) Add an appropriate title to the report.
Results
b. How many customers in the sq.transaction table have lower than average utility payments?
Challenge
5. Using Nested Subqueries
Create a report that lists all countries that have a higher population than the average of all
countries and where the EstYear1 forecast for Outstanding housing loan (% age 15+) increases
in EstYear3.
a. Open s104p05.sas from the practices folder. Run the query to create the table
CountryEstPop. The table contains total estimated population for ages 15+ for each
country.
b. Write a query to find all countries in the CountryEstPop table with a higher population than
the mean of all countries.
Partial Results
c. Use the previous query as a subquery. Select all columns from the sq.globalfindex table
where IndicatorName is Outstanding housing loan (% age 15+), the EstYear1 forecast
increases in three years, EstYear1 and EstYear3 are not null, and the country has a
population estimate greater than the mean of all countries.
1) Convert EstYear1 and EstYear3 to a percent and format the values. Create a new
column to determine the percent increase by subtracting EstYear1 from EstYear3. Name
the column PctIncrease and format the values using a percent format.
2) Order the results by descending PctIncrease. Add an appropriate title.
Results
4.2 In-Line Views (Query in the FROM Clause)
In-Line View
Outer Query
SELECT …
FROM (SELECT col-name
FROM …
<WHERE …>)
<WHERE …>
<GROUP BY …>
<HAVING …>
<ORDER BY …>;
An in-line view acts as a virtual table.
proc sql;
select *
from (select CustomerID, State, Income
from sq.customer
where Income > 100000);
quit;
Creates a virtual table to use in the outer query.
You cannot use an ORDER BY clause in an in-line view.
Scenario
Use an in-line view to create a virtual table.
Files
• s104d03.sas
• customer – a SAS table that contains one row per customer
• statepopulation – a SAS table that contains estimated state populations for the next three years
Syntax
PROC SQL;
SELECT col-name, col-name
FROM (SELECT column,…
FROM input-table…)
WHERE expression
ORDER BY col-name <DESC>;
QUIT;
Notes
• An in-line view is a query in the FROM clause.
• An in-line view produces a virtual table that the outer query uses to select data.
• An in-line view can be referenced only in the query in which it is defined.
Demo
1. Open the s104d03.sas program in the demos folder and find the Demo section. Run the first
query to explore the statepopulation and customer tables.
2. Move to the next section, Temporary Table Solution. Discuss both queries in the section.
a. Run the first query to create the totalcustomer temporary table. View the results.
b. Run the second query to join the totalcustomer and sq.statepopulation tables and
calculate the new column PctCustomer that calculates the percentage of customers in each
state based on the current year's estimated population. View the results.
3. Copy only the query that is used to create the totalcustomer table. Move to the Using an In-
Line View section of the program. Paste the query (not including the CREATE TABLE
statement) in the FROM clause to create an in-line view. Be sure to remove the semicolon.
Highlight and run the query. View the syntax error in the log.
…
from (select State,count(*) as TotalCustomer
from sq.customer
group by State
order by TotalCustomer desc) as c inner join
sq.statepopulation as s
on c.State = s.Name
…
4. Remove the ORDER BY clause in the subquery. Run the query and view the results.
…
from (select State,count(*) as TotalCustomer
from sq.customer
group by State) as c inner join
sq.statepopulation as s
on c.State = s.Name
…
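For reference, a complete query of this shape might look like the following sketch. The elided parts of the demo program are not reproduced here: the SELECT list, the PctCustomer calculation (TotalCustomer divided by PopEstimate1), and the format are assumptions based on the description in step 2.

```sas
proc sql;
select c.State,
       c.TotalCustomer,
       s.PopEstimate1,
       /* Assumed calculation: customers as a share of the state estimate */
       c.TotalCustomer / s.PopEstimate1 as PctCustomer format=percent8.3
    from (select State, count(*) as TotalCustomer
              from sq.customer
              group by State) as c inner join
         sq.statepopulation as s
    on c.State = s.Name
    /* Sorting belongs in the outer query, not in the in-line view */
    order by PctCustomer desc;
quit;
```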
…
from (select State,count(*) as TotalCustomer
from sq.customer
group by State)
…;
Creating a View
For the sake of efficiency, it is recommended that you avoid using the ORDER BY clause in a query that
defines a view. Using the ORDER BY clause in a view definition forces PROC SQL to sort the data
every time that the view is referenced. Instead, you can use an ORDER BY clause in queries that
reference the view.
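As an illustration of this recommendation, the following sketch defines a view without an ORDER BY clause and sorts in the query that references it. The view name sq.totalcustomer comes from the slides; using the two-level name sq.customer in the stored query is an assumption made so that the example is self-contained.

```sas
proc sql;
/* View definition: no ORDER BY, so nothing is sorted on every reference */
create view sq.totalcustomer as
    select State, count(*) as TotalCustomer
        from sq.customer
        group by State;

/* Sort only where the sorted output is actually needed */
select *
    from sq.totalcustomer
    order by TotalCustomer desc;
quit;
```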
Using a VIEW
The view executes the stored query and extracts the most current data.
4.04 Activity
Open s104a04.sas from the activities folder and perform the following tasks
to create and use a view:
1. Create a view named VWtotalcustomer from the query. Run the query
and examine the log.
2. Run the code in the section Use the View in the PROCS Below. Which
state has the highest number of customers?
proc sql;
create view sq.totalcustomer as
select State,count(*) as TotalCustomer
from customer
group by State;
quit;
When a view's stored query uses a one-level table name, PROC SQL expects the view to reside
in the same SAS library as the contributing table or tables.
The sq library points to S:\workshop\data.
Scenario
The sq library (s:\workshop\data) contains the customer table. The totalcustomer view is moved
to the mkt library (s:\workshop).
Moving the view broke the one-level table name used in the CREATE VIEW query. When marketing
used the view, it retrieved the stored query, which refers to the customer table by a one-level
name. When the view was created, PROC SQL assumed that the customer table was in the sq
library. After marketing moved the view and executed a query using it, the stored query
assumed that the customer table was in the mkt library in s:\workshop. However, the customer
table is not in that location.
By using a SAS enhancement, you can create a usable view that is stored in a different physical
location than its source tables. In other words, you can make the view portable.
You can embed a SAS LIBNAME statement or a SAS/ACCESS LIBNAME statement in a view by
using the USING LIBNAME clause. When PROC SQL executes the view, the stored query assigns
the libref. For SAS/ACCESS librefs, PROC SQL establishes a connection to a DBMS. The scope of
the libref is local to the view and does not conflict with any identically named librefs in the SAS
session. When the query finishes, the libref is disassociated. The connection to the DBMS is
terminated, and all data in the library becomes unavailable.
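A portable version of the view might be sketched as follows. The library paths are taken from the scenario above; the LIBNAME statement for mkt and the two-level name sq.customer inside the stored query are assumptions for illustration.

```sas
/* Assumed: marketing's library, as in the scenario */
libname mkt "s:\workshop";

proc sql;
create view mkt.totalcustomer as
    select State, count(*) as TotalCustomer
        from sq.customer
        group by State
    /* The embedded libref sq is assigned only while the view executes */
    using libname sq "s:\workshop\data";
quit;
```

Because the libref is embedded in the view, a query against mkt.totalcustomer should resolve sq.customer even in a session where sq is not assigned or points somewhere else.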
Views
Advantages
• avoid storing copies of large tables
• avoid a frequent refresh of table copies; when the underlying data
changes, a view surfaces the most current data
• combine data from multiple database tables and multiple libraries or
databases
• simplify complex queries
• prevent other users from inadvertently altering the query code
Views
Disadvantages
• Views might produce different results each time they are accessed if the
data in the underlying data sources changes.
• Views can require significant resources each time that they execute.
With a view, you save disk storage space at the cost of extra CPU and
memory usage.
Syntax Summary
SELECT …
FROM (SELECT col-name
FROM …
<WHERE …>) ;
In-Line View
CREATE VIEW view-name AS query
CREATE VIEW
CREATE VIEW …
USING LIBNAME libref engine "path";
USING Clause
Practice
Level 1
6. In-Line View Summarizing CreditScore
Determine which customers have an extremely high credit score relative to other customers in
their ZIP code (Zip). Similar to the practice in the previous section, extremely high credit is
defined as greater than 2 standard deviations above the mean of CreditScore. However, rather
than use an overall high-credit threshold for all customers, we need to calculate the threshold for
each value of Zip. This can be accomplished using an in-line view.
a. Open s104p06.sas from the practices folder. Run the query to summarize the
HighZipCredit threshold for each Zip value for the first 1000 rows.
Partial Results
b. Use the query from step a as an in-line view to join with the sq.customer table.
1) Select c.CustomerID, c.Zip, and c.CreditScore from the sq.customer table, and select
s.HighZipCredit from the in-line view. Format the c.Zip column using the Z5. format.
2) Perform an inner join with the sq.customer table and the in-line view from step a. Give
the sq.customer table the alias c and the in-line view the alias s. Remove the INOBS=
option from the in-line view.
3) Use c.Zip = s.Zip as the join criteria.
4) Filter rows where the customer's c.CreditScore value is greater than the
s.HighZipCredit value.
5) Order the results by Zip and descending CreditScore.
6) Add an appropriate title to the report.
Partial Results
c. What is the last Zip value and the corresponding CreditScore value in your final report?
Level 2
7. Building a Complex Query with In-Line Views
Determine which employees have the highest salary for their job title in every state. Using the
sq.employee table, write a query to calculate the maximum Salary value for each value of
JobTitle within each state.
a. Using the sq.employee table, write a query to calculate the maximum salary for each job title
within each state.
1) Convert the values of the State column to uppercase and name the column State to
standardize the state code values. Select JobTitle and calculate the maximum salary.
Name the column MaxJobSalary.
2) Filter rows where the State is not null.
3) Group the results by State and JobTitle.
4) Order the results by State.
5) Add an appropriate title.
6) Run the query and compare your results.
Partial Results
b. Using the query in step a as an in-line view, create a report by joining that result with the
sq.employee table to do the following:
1) Display EmployeeID, EmployeeName, State, JobTitle, and Salary for the highest paid
employee for each value of JobTitle in every state.
2) Perform an inner join with the sq.employee table and the in-line view from step a. Give
the sq.employee table the alias detail and the in-line view the alias summary.
Hint: Remove the ORDER BY clause when using an in-line view.
3) Use JobTitle, State, and Salary equal to MaxJobSalary as the join criteria.
4) Order the report by State and JobTitle.
5) Add an appropriate title and format the Salary values with a dollar sign and comma.
Partial Results
Challenge
8. Building a Complex Query Using a Join and Subquery
Generate a report of the total estimated number of individuals next year who made or received
digital payments in the past year (% age 15+) in South Asia to determine the best country to
promote a digital payment app.
a. Calculate the total estimated population for ages 15+ for countries in South Asia in the
sq.globalpop table.
1) Select CountryCode and the sum of EstYear1. Name the new column EstYear1Pop and
format it using commas.
2) Filter the SeriesName column for estimated population greater than 15 years of age.
Use a subquery from the sq.globalmetadata table to include only CountryCode in
South Asia.
Hint: Use the LIKE operator in the WHERE clause to include rows for ages 15+.
3) Group by CountryCode.
4) Run the query and compare your results.
Results
b. Use the previous query as an in-line view to join by CountryCode and multiply the estimated
percentage of individuals who made or received digital payments in the past year (% age
15+) by the estimated population for next year.
1) Select the CountryCode and IndicatorName columns from the sq.globalfindex table.
Convert the EstYear1 value from sq.globalfindex to a percentage by dividing by 100
and then multiply by the EstYear1Pop value from the in-line view. Name the column
Estimate and format using commas.
Note: Multiplying the EstYear1 percentage of individuals by the EstYear1Pop total
population for the country returns an estimated number of individuals who used
digital payments.
2) Perform an inner join of sq.globalfindex and the in-line view from step a. Assign
sq.globalfindex the alias f, and assign the in-line view the alias pop. Use CountryCode
as the join criteria.
3) Filter the rows by the IndicatorName value of Made or received digital payments in the
past year (% age 15+).
4) Order the results by Estimate descending. Add an appropriate title.
Results
c. Which value of CountryCode has the highest estimated use of digital payments?
4.3 Subquery in the SELECT Clause
Scenario
[Slide: each state's share of the total population (for example, 12.0%) is computed as the State's PopEstimate1 / 326477837.]
When using a subquery in the SELECT clause, you do not have to use a value from the table in the
outer query. You can retrieve values from other tables.
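A minimal sketch of a SELECT-clause subquery that reads from a different table than the outer query. The tables work.state_pop and work.national and their values are hypothetical and exist only for illustration; they are not part of the course data.

```sas
/* Hypothetical detail table (illustrative values only) */
data work.state_pop;
   input Name $ PopEstimate1;
   datalines;
AA 600
BB 300
CC 100
;
run;

/* Hypothetical one-row table that holds the total */
data work.national;
   input TotalPop;
   datalines;
1000
;
run;

proc sql;
   select Name,
          /* the subquery returns a single value from another table */
          PopEstimate1 / (select TotalPop from work.national)
             as PctPop format=percent7.1
      from work.state_pop
      order by PctPop desc;
quit;
```

The subquery must return exactly one value, because it is evaluated as part of an expression for each row of the outer query.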
The remerge feature of SAS makes two passes through a table.
select Name, PopEstimate1 / sum(PopEstimate1)
          as PctPop format=percent7.2
   from sq.statepopulation
   order by PctPop desc;
Aggregate functions, such as the SUM function, can cause the same calculation to repeat for every
row. This occurs whenever PROC SQL remerges data. Remerging occurs whenever any of the
following conditions exist:
• The SELECT clause references a column that contains an aggregate function and other columns
that are not listed in the GROUP BY clause.
• The ORDER BY clause references a column that is not referenced by the SELECT clause.
When a query remerges data, PROC SQL displays a note in the log to indicate that data remerging
has occurred. PROC SQL runs an internal query to find the sum and then runs another internal
query to divide each state's population by the sum.
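The two internal passes can be made explicit with a subquery. The sketch below uses a hypothetical table, work.state_pop, with illustrative values. The first query relies on remerging; the second computes the total in a SELECT-clause subquery and should return the same percentages without the remerge note.

```sas
/* Hypothetical table (illustrative values only) */
data work.state_pop;
   input Name $ PopEstimate1;
   datalines;
AA 600
BB 300
CC 100
;
run;

proc sql;
   /* Remerged form: a detail column combined with a summary function */
   select Name,
          PopEstimate1 / sum(PopEstimate1) as PctPop format=percent7.1
      from work.state_pop;

   /* Explicit form: the total is computed once in a subquery */
   select Name,
          PopEstimate1 / (select sum(PopEstimate1) from work.state_pop)
             as PctPop format=percent7.1
      from work.state_pop;
quit;
```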
Scenario
Remerge summary statistics in SAS to find the percentage population of each state.
Files
• s104d04.sas
• statepopulation – a SAS table that contains estimated state populations for the next three years
Syntax
PROC SQL;
SELECT col-name, summary-function(column)
FROM table;
QUIT;
Notes
• The SELECT clause references a column that contains an aggregate function and other columns
that are not listed in the GROUP BY clause.
Demo
1. Open the s104d04.sas program in the demos folder and find the Demo section. Run the query
to select the Name and PopEstimate1 columns from the sq.statepopulation table.
proc sql;
select Name, PopEstimate1
from sq.statepopulation;
quit;
2. In the SELECT clause, add the SUM function to sum PopEstimate1. Format the column using
the COMMA12. format. Run the query and examine the log and results.
Note: Every value in the new column is the sum of the entire PopEstimate1 column, whereas
the Name and PopEstimate1 columns keep the individual row values from the input table.
proc sql;
select Name, PopEstimate1, sum(PopEstimate1) format=comma12.
from sq.statepopulation;
quit;
3. Modify the SELECT clause by dividing PopEstimate1 by sum(PopEstimate1). Replace the
COMMA12. format with the PERCENT7.2 format. Name the new column PctPop. Run the query
and examine the log and results.
proc sql;
select Name,
PopEstimate1/sum(PopEstimate1) as PctPop format=percent7.2
from sq.statepopulation;
quit;
4. Add an ORDER BY clause and sort the results by descending PctPop. Run the query and
examine the results.
proc sql;
select Name,
PopEstimate1/sum(PopEstimate1) as PctPop format=percent7.2
from sq.statepopulation
order by PctPop desc;
quit;
select Region,
sum(PopEstimate1) as TotalRegion format=comma14.
from sq.statepopulation;
Remerging can be a powerful tool. However, it does not always produce the desired result. The most
common example is when you forget the GROUP BY clause in a query and specify a grouping
column and a summary function. Always check your log to help avoid this error and others.
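A small sketch of the missing GROUP BY case, using a hypothetical table (work.state_pop) with illustrative values. Without GROUP BY, the grand total is remerged onto every detail row. With GROUP BY, the query returns one summary row per Region.

```sas
/* Hypothetical table (illustrative values only) */
data work.state_pop;
   input Name $ Region $ PopEstimate1;
   datalines;
AA East 600
BB East 300
CC West 100
;
run;

proc sql;
   /* GROUP BY omitted: three rows, each showing the grand total of 1,000 */
   select Region, sum(PopEstimate1) as TotalRegion format=comma14.
      from work.state_pop;

   /* GROUP BY included: two rows, one total per Region */
   select Region, sum(PopEstimate1) as TotalRegion format=comma14.
      from work.state_pop
      group by Region;
quit;
```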
ERROR: The query requires remerging summary statistics back with the
original data. This is disallowed due to the NOREMERGE proc option
or NOSQLREMERGE system option.
Resubmitting the query with the NOREMERGE option in the PROC SQL statement produces no
output and results in an error message in the SAS log.
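A sketch of the NOREMERGE option as a safeguard, again with a hypothetical table and illustrative values. The first step is expected to stop with the remerge error. The second step adds a GROUP BY clause, so no remerging is needed. Separate PROC SQL steps are used here so that the second query does not depend on how the session handles the earlier error.

```sas
/* Hypothetical table (illustrative values only) */
data work.state_pop;
   input Name $ Region $ PopEstimate1;
   datalines;
AA East 600
BB East 300
CC West 100
;
run;

/* Expected to fail: the query would require remerging */
proc sql noremerge;
   select Region, sum(PopEstimate1) as TotalRegion format=comma14.
      from work.state_pop;
quit;

/* Expected to run: GROUP BY makes remerging unnecessary */
proc sql noremerge;
   select Region, sum(PopEstimate1) as TotalRegion format=comma14.
      from work.state_pop
      group by Region;
quit;
```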
4.05 Activity
Open s104a05.sas from the activities folder and perform the following tasks
to disable remerging of summary statistics:
1. Examine and run the query. Examine the log and the results. What note
do you see in the log?
2. Add the PROC SQL option NOREMERGE. Run the query. Did it run
successfully? What was the error in the log?
3. Add a GROUP BY clause after the FROM clause and group by Region. Run
the query. Did it run successfully?
• View the SAS paper Nifty Uses of SQL Reflexive Join and Sub-query in SAS.
• Read the SAS blog Building an SQL subquery in SAS Enterprise Guide.
• Visit Using Subqueries to Select Data in the SAS documentation.
Practice
Level 1
9. Remerging Summary Statistics
Determine which states have the most estimated births for next year. Include a column showing
each state's births as the percent of national births.
a. Select the Name and Births1 columns. Create a new column named PctBirth by dividing
Births1 for each state by the sum of Births1 for all states. Format the new column using the
PERCENT format.
b. Order the results by descending PctBirth.
c. Add an appropriate title.
Partial Results
d. Which state has the highest percentage of estimated births for next year?
Level 2
10. Subquery in the SELECT Clause with an In-Line View
Find the top 10 countries that have the highest percentage of estimated global population using
the population estimates of ages 15+ in the sq.globalfull table.
a. Use the sq.globalfull table to write a query to sum the EstYear1Pop of all countries.
1) Use an in-line view to select the distinct CountryCode and EstYear1Pop values from
the sq.globalfull table.
Note: You must find the distinct estimated population of each country because the
data contains the estimated population for each country multiple times.
2) Sum the EstYear1Pop column and name the new column EstPct. Format the
column using the COMMA format.
3) Run the query and compare your results.
Results
b. Create a new query using the query from step a in the SELECT clause to determine the
estimated global population percentage of each country.
1) Select the distinct CountryCode and ShortName values. Determine the percentage
population by creating a new column. Divide the EstYear1Pop value of each country by
the total value calculated in step a. Name the new column PctPop and format using the
PERCENT format.
2) Use the sq.globalfull table.
3) Order the results by PctPop descending.
4) Limit the results to the top 10 countries.
5) Add an appropriate title.
Results
Challenge
11. Remerging GROUP BY Summary Statistics
In the sq.statepopulation table, states in the United States are categorized by divisions.
Calculate the percentage of births for next year by each state for its division.
a. Use the sq.statepopulation table to write a query to sum the values of Births1 for each
division and divide the Births1 value of each state by the summarized value.
1) Select the Name, Division, and Births1 columns. Create two new calculated columns.
a) The first column should sum all Births1 values. Name the column TotalDivisionEst,
and format it using the COMMA format.
b) The second column should divide Births1 by the new column, TotalDivisionEst.
Name the column PctDivision, and format it using the PERCENT format.
2) Group the query by Division.
3) Order the results by Division and PctDivision descending.
4) Format the Births1 column using the COMMA format.
5) Add an appropriate title.
Partial Results
4.4 Solutions
Solutions to Practices
1. Subquery That Returns a Single Value
/*s104s01.sas*/
/*a*/
proc sql;
select mean(nPopChg3)
from sq.statepopulation;
quit;
/*b*/
title "States with an estimated 3-Year Population Growth";
title2 "Greater than the Overall Average";
proc sql;
select Name, nPopChg3 label="Estimated Growth" format=comma16.
from sq.statepopulation
where nPopChg3 > (select mean(nPopChg3)
from sq.statepopulation)
order by nPopChg3 desc;
quit;
title;
Which state has the lowest estimated growth in your results? MA, with an estimated growth of
38,903
2. Subquery with Multiple Functions
/*s104s02.sas*/
/*a*/
proc sql;
select sum(avg(CreditScore),(2*std(CreditScore))) as HighScore
from sq.customer
where state='TX';
quit;
/*b*/
title 'Texas Customers with Higher than Average Credit Scores';
proc sql;
select CustomerID, FirstName, LastName, CreditScore
from sq.customer
where state="TX" and
CreditScore >
(select sum(avg(CreditScore),(2*std(CreditScore)))
as HighScore
from sq.customer
where state='TX')
/*a*/
proc sql;
select CountryCode
from sq.globalmetadata
where upcase(Region)='EUROPE & CENTRAL ASIA' and
upcase(IncomeGroup)='HIGH INCOME';
quit;
/*b*/
title 'Estimated Forecasted Percentages';
title2 'Countries in Europe & Central Asia with High Incomes';
title3 'Borrowed Money for Health or Medical Purposes';
proc sql;
select CountryCode,
EstYear1/100 as EstPct1 format=percent7.2,
EstYear3/100 as EstPct3 format=percent7.2
from sq.globalfindex
where IndicatorName='Borrowed for health or medical purposes (% age 15+)' and
CountryCode in
(select CountryCode
from sq.globalmetadata
where upcase(Region)='EUROPE & CENTRAL ASIA'
and upcase(IncomeGroup)='HIGH INCOME')
order by EstYear1 desc;
quit;
title;
Which country in your report has the lowest EstPct1 value for next year? FIN
4. Using a Subquery with Summarized Data
/*s104s04.sas*/
where Type="Utilities")
order by UtilityAmt desc;
quit;
title;
How many customers in the sq.transaction table have lower than average utility payments? Five
5. Using Nested Subqueries
title "Countries with a Higher Population Estimate than the Mean";
title2 "Number of Outstanding Housing Loan Increases from Year1 to Year3";
/*b*/
proc sql;
select CountryCode
from CountryEstPop
where EstPop > (select mean(EstPop)
from CountryEstPop);
quit;
/*c*/
proc sql;
select CountryCode, IndicatorName,
EstYear1/100 as EstYear1 format=percent7.2,
EstYear3/100 as EstYear3 format=percent7.2,
calculated EstYear3 - calculated EstYear1
as PctIncrease format=percent7.2
from sq.globalfindex
where IndicatorName = "Outstanding housing loan (% age 15+)"
and EstYear1 is not null
and EstYear3 is not null
and EstYear1 < EstYear3
and CountryCode in
(select CountryCode
from CountryEstPop
where EstPop > (select mean(EstPop)
from CountryEstPop))
order by PctIncrease desc;
quit;
title;
Which country had the largest value for PctIncrease? BGD
/*a*/
title 'Maximum Salary for Each Job in every State';
proc sql;
select upcase(State) as State, JobTitle,
max(Salary) as MaxJobSalary
from sq.employee
where State is not null
group by State, JobTitle
order by State;
quit;
title;
/*b*/
title 'Employees with Highest Salary for their Job in every State';
proc sql;
select detail.EmployeeID, detail.EmployeeName, detail.State,
detail.JobTitle, detail.Salary format=dollar12.
from sq.employee as detail inner join
(select upcase(State) as State,
JobTitle, max(Salary) as MaxJobSalary
from sq.employee
where State is not null
group by State, JobTitle) as summary
on detail.Jobtitle=summary.JobTitle and
detail.State=Summary.State and
detail.Salary=Summary.MaxJobSalary
order by detail.State, detail.JobTitle;
quit;
title;
Who is the last employee in the final report? The last employee is Pongor, Katherine.
8. Building a Complex Query Using a Join and Subquery
/*s104s07.sas*/
/*a*/
title 'Estimated Population for Next Year in South Asia ';
title2 'Ages 15+';
proc sql;
select CountryCode,
sum(EstYear1) as EstYear1Pop format=comma14.
from sq.globalpop
where SeriesName not like '%00-04' and
SeriesName not like '%05-09' and
SeriesName not like '%10-14' and
CountryCode in (select CountryCode
from sq.globalmetadata
where Region = 'South Asia')
group by CountryCode;
quit;
title;
/*b*/
title 'Estimated Population who Made or Received Digital Payments in the Past Year';
title2 'South Asia Ages 15+';
proc sql;
select f.CountryCode, f.IndicatorName,
((f.EstYear1/100) * pop.EstYear1Pop)
as Estimate format=comma14.
from sq.globalfindex as f inner join
(select CountryCode, sum(EstYear1)
as EstYear1Pop format=comma14.
from sq.globalpop
where SeriesName not like '%00-04' and
SeriesName not like '%05-09' and
SeriesName not like '%10-14' and
CountryCode in (select CountryCode
from sq.globalmetadata
where Region = 'South Asia')
group by CountryCode) as pop
on f.CountryCode = pop.CountryCode
where IndicatorName = 'Made or received digital payments in the past year (% age 15+)'
/*Alternate Solution*/
title 'Estimated Percentage of Births by Each State';
proc sql;
select Name, Births1,
Births1/(select sum(Births1)
from sq.statepopulation)
as PctBirth format=percent7.2
from sq.statepopulation
order by PctBirth desc;
quit;
title;
Which state has the highest percentage of estimated births for next year? CA (California)
10. Subquery in the SELECT Clause with an In-Line View
/*s104s10.sas*/
/*a*/
title 'Estimated Population of Ages 15+ for Next Year';
proc sql;
select sum(EstYear1Pop) as EstPct format=comma16.
from (select distinct CountryCode, EstYear1Pop
from sq.globalfull);
quit;
title;
/*b*/
title 'Top 10 Countries by Estimated Population';
title2 'Ages 15+';
proc sql outobs=10;
select distinct CountryCode, ShortName,
EstYear1Pop/
(select sum(EstYear1Pop) as EstPct format=comma16.
from (select distinct CountryCode, EstYear1Pop
from sq.globalfull))
as PctPop format=percent7.2
from sq.globalfull
order by PctPop desc;
quit;
title;
Which country has the highest estimated population of individuals 15 or over? China
11. Remerging GROUP BY Summary Statistics
/*s104s11.sas*/
…
where PopEstimate1 > (select avg(PopEstimate1),
"Average Estimated Population"
from sq.statepopulation);
…
where PopEstimate1 > (select avg(PopEstimate1),
'Average Population'
from sq.statepopulation);
Subquery
49 states
2. Run the code in the section Use the View in the PROCS Below. Which
state has the highest number of customers? California
4.05 Activity – Correct Answer
1. What note do you see in the log?
NOTE: The query requires remerging summary statistics back with the
original data.
ERROR: The query requires remerging summary statistics back with the original
data. This is disallowed due to the NOREMERGE proc option or NOSQLREMERGE
system option.
Lesson 5 Set Operators
5.1 Introduction to Set Operators
[Diagram: Set Operators. Each query produces a result set: Result Set 1 and Result Set 2.]
A set operator vertically combines the intermediate result sets from two queries to produce a final
result set.
Business Data
Scenario
• Which customers responded to either phone or email?
• Complete list of all customer responses
Target Customer List
saleslist: List of target customers and their contact information.
Response Tables
Customer responses by email and phone.
• Email Respondents: salesemail
• Phone Respondents: salesphone
[Slide: Scenario. Tables used: saleslist, salesemail, and salesphone.]
[Slide: the set operators are INTERSECT, EXCEPT, UNION, and OUTER UNION.]
You must place a semicolon after the last SELECT statement only.
SELECT query…
UNION | EXCEPT | INTERSECT | OUTER UNION <ALL> <CORR>
SELECT query…;
5.2 INTERSECT, EXCEPT, and UNION
INTERSECT Operator
Which customers have responded to both email and phone sales?

salesemail
CustomerID   EmailResp
1939774314   Accepted
1999302252   Declined
1963960449   Declined
1908694347   Accepted
1960311448   Accepted
1905044343   Declined

salesphone
CustomerID   SalesRep   PhoneResp
1939774314   121038     Declined
1999302252   120145     Call Back
1999302252   120145     Accepted
1963960449   120145     Declined
1987175132   120145     Declined
1970095442   121137     Accepted
INTERSECT Operator
title "Customers Who Have Responded to Both Email and Phone Sales";
proc sql;
select CustomerID
from sq.salesemail
intersect
select CustomerID
from sq.salesphone;
quit;
5.01 Activity
Open s105a01.sas from the activities folder and perform the following to
find unique customers who have responded by phone and email:
1. Run the first queries to preview the sq.salesemail and sq.salesphone
tables. Examine the columns in both tables.
2. In the Intersect section, examine and run the query. Did the query run
successfully? Why not?
3. Add the CORR keyword after the INTERSECT set operator. Run the query.
Did the query run successfully? Why?
You can use an INNER JOIN to produce identical results. In this example, we use an inner join to
find all matches of the two tables. The DISTINCT keyword removes any duplicate rows resulting in
the same result as the INTERSECT set operator.
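A minimal sketch of the inner join alternative. The work tables below hold only the CustomerID values from the slide data, and the aliases e and p are illustrative choices. This is one way to write the join, not necessarily the code shown in class.

```sas
/* CustomerID values copied from the salesemail slide data */
data work.salesemail;
   input CustomerID;
   datalines;
1939774314
1999302252
1963960449
1908694347
1960311448
1905044343
;
run;

/* CustomerID values copied from the salesphone slide data (one duplicate) */
data work.salesphone;
   input CustomerID;
   datalines;
1939774314
1999302252
1999302252
1963960449
1987175132
1970095442
;
run;

proc sql;
   /* The inner join keeps matching rows; DISTINCT removes the duplicate match */
   select distinct e.CustomerID
      from work.salesemail as e inner join
           work.salesphone as p
           on e.CustomerID = p.CustomerID;
quit;
```

With these values, the join returns three CustomerID values (1939774314, 1963960449, and 1999302252). Without DISTINCT, 1999302252 would appear twice because it has two phone responses.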
EXCEPT Operator
List of customers who have not responded to our sales email

saleslist
UserID                     CustomerID   …
[email protected]            1939774314   …
[email protected]            1958716829   …
[email protected]            1999302252   …
[email protected]            1963960449   …
[email protected]            1987175132   …
[email protected]            1970095442   …
[email protected]            1908694347   …
barmablanton521@n/a.com    1918638906   …
[email protected]            1960311448   …
[email protected]            1905044343   …

salesemail
CustomerID   EmailResp
1939774314   Accepted
1999302252   Declined
1963960449   Declined
1908694347   Accepted
1960311448   Accepted
1905044343   Declined
EXCEPT Operator
title "Customers Who Haven't Responded to the Sales Email";
proc sql;
select CustomerID
from sq.saleslist
except
select CustomerID
from sq.salesemail;
quit;
5.02 Activity
Open s105a02.sas from the activities folder and perform the following to
find all target customers who have not responded to our sales phone call:
1. Run the first queries to preview the sq.saleslist and sq.salesphone
tables. Examine the columns in both tables.
2. Complete the query to find all customers from the sq.saleslist table who
have not responded to our sales call in sq.salesphone.
3. How many customers have not responded to our phone call?
You can use a subquery to produce identical results. In this example, we use a subquery to return all
CustomerID values in the salesphone table. Then we use the WHERE clause to subset for all
CustomerID values in the saleslist table that are not in the list returned by our subquery. The
DISTINCT keyword removes any duplicate rows, resulting in the same result as the EXCEPT set
operator.
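A minimal sketch of the subquery alternative to EXCEPT. The work tables are small hypothetical stand-ins for sq.saleslist and sq.salesphone, reusing a few CustomerID values from the slide data, so the row counts do not reflect the full course tables.

```sas
/* Hypothetical subset of the target customer list */
data work.saleslist;
   input CustomerID;
   datalines;
1939774314
1958716829
1999302252
1918638906
;
run;

/* Hypothetical subset of the phone responses (one duplicate) */
data work.salesphone;
   input CustomerID;
   datalines;
1939774314
1999302252
1999302252
;
run;

proc sql;
   /* Keep target customers whose ID is not in the list returned by the subquery */
   select distinct CustomerID
      from work.saleslist
      where CustomerID not in
            (select CustomerID
                from work.salesphone);
quit;
```

With these values, the query returns 1918638906 and 1958716829, the same rows that saleslist EXCEPT salesphone would return.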
UNION Operator
Find the total number of unique customers who responded to either phone or email.

salesemail
CustomerID   EmailResp
1939774314   Accepted
1999302252   Declined
1963960449   Declined
1908694347   Accepted
1960311448   Accepted
1905044343   Declined

salesphone
CustomerID   SalesRep   PhoneResp
1939774314   121038     Declined
1999302252   120145     Call Back
1999302252   120145     Accepted
1963960449   120145     Declined
1987175132   120145     Declined
1970095442   121137     Accepted
1955264298   121038     Declined
Scenario
Use the UNION set operator to count the number of unique customers who responded to an email or
a phone sales attempt.
Files
• s105d01.sas
• salesemail – a SAS table that contains email response information from targeted customers
• salesphone – a SAS table that contains phone response information from targeted customers
Syntax
PROC SQL;
SELECT query…
UNION <ALL> <CORR>
SELECT query…;
QUIT;
Notes
• The UNION set operator produces all unique rows from both queries.
• The CORR keyword overlays columns that have the same name in both tables. When used with
EXCEPT, INTERSECT, and UNION, CORR suppresses columns that are not in both tables.
Demo
1. Open the s105d01.sas program in the demos folder and find the Demo section. Under Explore
the SALESEMAIL and SALESPHONE Tables, run the two queries. Discuss the tables and the
desired outcome.
proc sql;
select *
from sq.salesemail;
select *
from sq.salesphone;
quit;
2. In the next section, complete the query to find all unique customers who responded to either an
email or phone call. Begin with the SELECT statement and select all columns from the
sq.salesemail table.
proc sql;
select *
from sq.salesemail;
quit;
3. Use the UNION set operator followed by another SELECT statement to select all columns from
the sq.salesphone table. Run the query and examine the syntax error.
proc sql;
select *
from sq.salesemail
union
select *
from sq.salesphone;
quit;
4. Add the keyword CORR after the UNION set operator. Run the query and examine the log and
results.
Note: The CORR keyword aligns the columns that have the same name in both tables and
removes any columns not found in both tables.
proc sql;
select *
from sq.salesemail
union corr
select *
from sq.salesphone;
quit;
5. Remove the CORR keyword, and specify the CustomerID column in both SELECT clauses. Run
the query and examine the log and results.
proc sql;
select CustomerID
from sq.salesemail
union
select CustomerID
from sq.salesphone;
quit;
6. Add another SELECT statement at the first line in the SQL procedure. Use the COUNT(*)
function to count all rows. Name the column TotalNum. Add a FROM clause and use the
previous query as a subquery in the FROM clause (in-line view). Be sure to add parentheses
around the subquery. Run the query and examine the results.
proc sql;
select count(*) as TotalNum
from (select CustomerID
from sq.salesemail
union
select CustomerID
from sq.salesphone);
quit;
The UNION set operator works in a different order than the INTERSECT and EXCEPT operators.
The UNION set operator first combines result sets and then removes duplicate rows. INTERSECT
and EXCEPT remove duplicate rows and then combine result sets.
In these cases, a warning is written in the log: “WARNING: A table has been extended with null columns
to perform the UNION set operation.”
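One situation that should produce this warning is a UNION of two queries that return different numbers of columns. The sketch below is an assumption-based illustration: work.email_ids and work.phone_calls are hypothetical tables, and the exact log text can vary by SAS release.

```sas
/* Hypothetical one-column table */
data work.email_ids;
   input CustomerID;
   datalines;
1939774314
1908694347
;
run;

/* Hypothetical two-column table */
data work.phone_calls;
   input CustomerID SalesRep;
   datalines;
1939774314 121038
1987175132 120145
;
run;

proc sql;
   /* The first query returns one column and the second returns two, */
   /* so the shorter result is padded with a null (missing) column   */
   select CustomerID
      from work.email_ids
   union
   select CustomerID, SalesRep
      from work.phone_calls;
quit;
```

Listing the same columns in both SELECT clauses avoids the padding.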
The ALL keyword is applicable only in the INTERSECT, UNION, and EXCEPT set operators.
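A short sketch of the effect of ALL on UNION, using work tables that hold only the CustomerID values from the UNION slide data. The column names UniqueRows and AllRows are illustrative.

```sas
/* CustomerID values copied from the salesemail slide data (6 rows) */
data work.salesemail;
   input CustomerID;
   datalines;
1939774314
1999302252
1963960449
1908694347
1960311448
1905044343
;
run;

/* CustomerID values copied from the salesphone slide data (7 rows) */
data work.salesphone;
   input CustomerID;
   datalines;
1939774314
1999302252
1999302252
1963960449
1987175132
1970095442
1955264298
;
run;

proc sql;
   /* UNION removes duplicate rows: 9 unique CustomerID values */
   select count(*) as UniqueRows
      from (select CustomerID from work.salesemail
            union
            select CustomerID from work.salesphone);

   /* UNION ALL keeps every row from both queries: 6 + 7 = 13 rows */
   select count(*) as AllRows
      from (select CustomerID from work.salesemail
            union all
            select CustomerID from work.salesphone);
quit;
```

Because ALL skips the duplicate check, it suits cases where duplicates are known to be absent or do not matter.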
Create a list of customers who have not responded to either email or phone
sales attempts.
[Diagram: saleslist EXCEPT (salesemail UNION salesphone)]
proc sql;
select CustomerID
from sq.saleslist
except
(select customerid
from sq.salesemail
union
select customerID
from sq.salesphone);
quit;
We have two customers who
have not responded to any of
our sales attempts.
Syntax Summary
SELECT query…
UNION | EXCEPT | INTERSECT <ALL> <CORR>
SELECT query…;
Set Operators
Practice
Level 1
1. Using the EXCEPT SET Operator
Use the EXCEPT SET operator to generate a report listing of merchants in the sq.merchant
table who are not listed in the sq.transaction table.
a. Select MerchantID from the sq.merchant table.
b. Use the EXCEPT SET operator.
c. Select MerchantID from the sq.transaction table.
d. Order the results by MerchantID.
e. Add an appropriate title.
Results
c. Write another query to create the last row of the report for all AU employees.
1) Follow the previous steps. Replace US with AU.
2) Run the query and compare your results.
Results
d. Combine the results of all three queries into a single query using UNION SET operators.
1) Order the results by descending total salary.
2) Add an appropriate title.
Results
Level 2
3. Using the EXCEPT SET Operator with the DISTINCT Keyword
Using the sq.statepopulation table, generate a list of state codes for states without any
customers.
a. Write a query to list the unique Name values in the sq.statepopulation table. This list
represents all available states.
b. Write a separate query to list the unique State values from the sq.customer table. This list
represents all states where customers reside.
c. Combine the queries using the EXCEPT SET operator to display states with no customers.
Add an appropriate title.
Results
Challenge
4. Using SET Operators to Summarize Data
Determine what percentage of customers have accepted either the phone or email offer. The
sq.saleslist table contains the full list of customers presented with an offer. The sq.salesemail
and sq.salesphone tables contain email and phone responses.
a. Write a query to use a UNION SET operation to combine the CustomerID values for
customers who accepted either the phone or email offer. This will form the basis of your in-
line view.
Results
b. Write a query to count the number of customers who have accepted either offer (step a
above).
1) Use the following formula to calculate the rate of offer acceptance:
select count(*)/(select count(*) from sq.saleslist)
from (your in-line view code from step a)
2) Name the calculated column PctResp. Format it as a percent with no decimals.
3) Label the new column Offer Acceptance Rate.
4) Add an appropriate title to the report.
Results
Desired Results
salesemail
5.3 OUTER UNION 5-29
Scenario
Use the OUTER UNION set operator with the CORR keyword to combine the salesemail and
salesphone tables.
Files
• s105d02.sas
• salesemail – a SAS table that contains email response information from targeted customers
• salesphone – a SAS table that contains phone response information from targeted customers
Syntax
PROC SQL;
SELECT query…
OUTER UNION <CORR>
SELECT query…;
QUIT;
table(RENAME=(old-name-1=new-name-1 …))
Notes
• The OUTER UNION set operator concatenates the results of both queries. All rows are returned, and duplicate rows are not removed.
• The CORR keyword overlays columns that have the same name in both tables. When used with
OUTER UNION, CORR also keeps columns that appear in only one of the tables.
• The RENAME= data set option changes the name of a column.
Demo
1. Open the s105d02.sas program in the demos folder and find the Demo section. Run the query
to perform an OUTER UNION concatenation of the sq.salesemail and sq.salesphone tables.
Examine the results.
proc sql;
select *
from sq.salesemail
outer union
select *
from sq.salesphone;
quit;
2. Add the CORR keyword after the OUTER UNION set operator. Run the query and examine the
results.
proc sql;
select *
from sq.salesemail
outer union corr
select *
from sq.salesphone;
quit;
3. Add the RENAME= option after the salesemail table and rename the column EmailResp to
Resp. After the salesphone table, rename the column PhoneResp to Resp. Run the query and
examine the results.
proc sql;
select *
from sq.salesemail(rename=(EmailResp=Resp))
outer union corr
select *
from sq.salesphone(rename=(PhoneResp=Resp));
quit;
4. Remove the RENAME= option after each table. Modify the first SELECT clause and select the
column CustomerID. In the first clause, also select EmailResp and change the column name to
Resp using the AS keyword. Modify the second SELECT clause and select the CustomerID,
SalesRep, and PhoneResp columns. Change the PhoneResp column name to Resp using the
AS keyword. Run the query and examine the results.
proc sql;
select CustomerID, EmailResp as Resp
from sq.salesemail
outer union corr
select CustomerID, SalesRep, PhoneResp as Resp
from sq.salesphone;
quit;
5. Add the CREATE TABLE statement to create a table from the query results. Name the table
response1.
proc sql;
create table response1 as
select CustomerID, EmailResp as Resp
from sq.salesemail
outer union corr
select CustomerID, SalesRep, PhoneResp as Resp
from sq.salesphone;
quit;
6. Find the SAS DATA Step section. Briefly describe how you can retrieve the same results using
the SAS DATA step.
data response2;
length Resp $12;
set sq.salesemail(rename=(EmailResp=Resp))
sq.salesphone(rename=(PhoneResp=Resp));
run;
The OUTER UNION set operator is similar to a DATA step that lists multiple tables in the SET
statement. Both approaches concatenate the input tables and can produce similar results.
Syntax Summary
OUTER UNION
SELECT query…
OUTER UNION <CORR>
SELECT query…;
RENAME Data Set Option
table(RENAME=(old-name-1=new-name-1 …))
Practice
Level 1
5. Using the OUTER UNION SET Operator
Create a report that shows the email and phone offer responses along with the sales
representative if available.
a. Using the sq.salesphone table as input, write a query to list the following columns:
1) CustomerID
2) a new column named Response based on the existing PhoneResp column
3) SalesRep labeled Sales Rep
4) a new column named Channel with the constant text Phone
Run the query and compare your results.
Results
b. Using the sq.salesemail table as input, write a query to list the following columns:
1) CustomerID
2) a new column named Response based on the existing EmailResp column
3) a new column named Channel with the constant text Email
Run the query and compare your results.
Results
c. Combine the two query results using the OUTER UNION SET operation.
1) Be mindful of the column alignment. Use the SET operator modifiers as needed.
2) Order the results by CustomerID and Response.
3) Add an appropriate title.
4) Run the query to generate the final results.
Results
5.4 Solutions
Solutions to Practices
1. Using the EXCEPT SET Operator
/*s105s01.sas*/
/*a*/
proc sql;
select 'Total Paid to All Employees',
sum(Salary) format=dollar16.,
count(*) as Total
from sq.employee;
quit;
/*b*/
proc sql;
select 'Total Paid to US Employees',
sum(Salary) format=dollar16.,
count(*) as Total
from sq.employee
where upcase(Country)='US';
quit;
/*c*/
proc sql;
select 'Total Paid to AU Employees',
sum(Salary) format=dollar16.,
count(*) as Total
from sq.employee
where upcase(Country)='AU';
quit;
/*d*/
title 'Country Specific Salary Information';
proc sql;
select 'Total Paid to All Employees',
sum(Salary) format=dollar16.,
count(*) as Total
from sq.employee
union
select 'Total Paid to US Employees',
sum(Salary) format=dollar16.,
count(*) as Total
from sq.employee
where upcase(Country)='US'
union
select 'Total Paid to AU Employees',
sum(Salary) format=dollar16.,
count(*) as Total
from sq.employee
where upcase(Country)='AU'
order by 2 desc;
quit;
title;
Which country has more employees? The US has more employees.
3. Using the EXCEPT SET Operator with the DISTINCT Keyword
/*s105s03.sas*/
from sq.salesemail
where EmailResp = 'Accepted'
union
select CustomerID
from sq.salesphone
where PhoneResp = 'Accepted');
quit;
title;
What is the acceptance rate? 50%
5. Using the OUTER UNION Set Operator
/*s105s05.sas*/
proc sql;
select *
from sq.salesemail
intersect corr
select *
from sq.salesphone;
quit;
CORR matches by column names and suppresses columns not in both tables.
Five customers
Lesson 6 Using and Creating
Macro Variables in SQL
6.1 Creating User-Defined Macro Variables
6.2 Creating Data-Driven Macro Variables with PROC SQL
Demonstration: Using a PROC SQL Data-Driven Macro Variable
Demonstration: Concatenating Values in Macro Variables
Practice
6.1 Creating User-Defined Macro Variables 6-3
proc sql;
create table CustomerGA as
select CustomerID, Employed, Race,
Married, State, CreditScore
from sq.customer
where State="GA" and
CreditScore > 650;
quit;
To change the values to NC and 700, you would have to manually adjust the query.
A SAS macro variable stores text that is substituted in your code when it runs. It’s like
an automatic find-and-replace.
Macro language statements begin with a % sign. To create a user-defined macro variable, you use
the %LET statement. In the %LET statement, you specify the name of the macro variable, followed
by an equal sign and then the text string that you want to store.
It is recommended that you do not include quotation marks when you define the macro variable
value. Use quotation marks when necessary after the macro variable is resolved.
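The %LET statements described above might look like the following minimal sketch. The macro variable names State and CreditMin and the values GA and 650 follow the query and activity in this section. The %PUT statements are added here only to show the stored values in the log.

```sas
/* Create two user-defined macro variables.               */
/* The values are stored as text, without quotation marks. */
%let State=GA;
%let CreditMin=650;

/* Write each macro variable name and value to the SAS log */
%put &=State;
%put &=CreditMin;
```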
proc sql;
create table Customer&State as
select CustomerID, Employed, Race,
Married, State, CreditScore
from sq.customer
where State="&State" and
CreditScore > &CreditMin;
quit;
When the query executes, the references to State and CreditMin are replaced by GA and 650.
Macro variable references must be enclosed in double quotation marks when they are used in
a character string. References inside single quotation marks are not resolved.
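The effect of the quotation marks can be sketched with two small queries. This is an illustration rather than part of the course files. It assumes the sq.customer table and the State macro variable used earlier, and the column names DoubleQuoted and SingleQuoted are hypothetical labels.

```sas
%let State=GA;

proc sql;
/* Double quotation marks: &State resolves, so the filter is State="GA" */
select count(*) as DoubleQuoted
    from sq.customer
    where State="&State";

/* Single quotation marks: &State is not resolved, so State is compared */
/* with the literal text &State and is unlikely to match any row        */
select count(*) as SingleQuoted
    from sq.customer
    where State='&State';
quit;
```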
6.01 Activity
Open s106a01.sas from the activities folder and perform the following tasks
to create and use user-defined macro variables:
1. Run the program. Examine the log and results. Confirm that the name of
the newly created table is customerga and contains 957 rows.
2. Replace the values GA and 650 in the %LET statements with NC and 700.
Run the program. Examine the log and results. What is the name of the
newly created table? How many rows?
3. Change the double quotation marks in the WHERE clause expression to
single quotation marks. Run the query. How many rows are in the new
table?
PROC SQL Query: Create macro variables from values returned by a query.
In PROC SQL, you can create macro variables from values returned by a query. You can then
reference those stored macro variables in other PROC steps, DATA steps, titles, and footnotes.
Think of it this way: Build once, use many.
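A minimal sketch of the "build once, use many" idea follows. It assumes the sq.statepopulation table and its Name and PopEstimate1 columns, which are used in the demonstrations in this section. The macro variable name AvgEst1 matches the one used there, and the PROC PRINT step is an illustrative choice of a second step.

```sas
/* Build once: store the average estimate in a macro variable */
proc sql noprint;
select avg(PopEstimate1)
    into :AvgEst1
    from sq.statepopulation;
quit;

/* Use many: reference the value in a footnote and in another PROC step */
footnote "Average estimate: &AvgEst1";
proc print data=sq.statepopulation;
    where PopEstimate1 > &AvgEst1;
    var Name PopEstimate1;
run;
footnote;
```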
6.2 Creating Data-Driven Macro Variables with PROC SQL
Average Estimated Population for Next Year is: ???
How can we document the average value of PopEstimate1 in the title?
SELECT col-name
INTO :macvar
FROM table;
select avg(PopEstimate1)
into :AvgEst1
from sq.statepopulation;
The value is stored in the AvgEst1 macro variable.
Macro Variables
Name Value
AvgEst1 6278420
If the query produces more than one row of output, the macro variable will contain only the value
from the first row.
If the query has no rows in its output, then the macro variable is not modified. If the macro variable
does not exist yet, it will not be created.
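One way to guard against a stale value is to initialize the macro variable before the query, as in this sketch. The macro variable name FirstState, the placeholder text NotSet, and the filter value ZZ are hypothetical. ZZ is assumed not to match any row in sq.statepopulation.

```sas
/* Initialize the macro variable so that a leftover value */
/* from an earlier run cannot be mistaken for a result    */
%let FirstState=NotSet;

proc sql noprint;
select Name
    into :FirstState
    from sq.statepopulation
    where Name="ZZ";   /* assumed to return no rows */
quit;

/* If the query returned no rows, the initial value is still in place */
%put &=FirstState;
```

A query that uses a summary function without GROUP BY generally returns one row even when no input rows qualify, so this sketch selects a plain column instead.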
%put &=AvgEst1;
SAS log:
%put &=AvgEst1;
AVGEST1= 6278420
A %PUT statement that contains &= followed by the name of the macro variable displays the name
and value of a macro variable.
Scenario
Use PROC SQL to create and use a macro variable.
Files
• s106d01.sas
• statepopulation – a SAS table that contains estimated yearly population by state
Syntax
%PUT &=macvar;
OPTIONS SYMBOLGEN;
Notes
• A macro variable stores a text string that can be substituted into a SAS program.
• If the query produces more than one row of output, the macro variable will contain only the value
from the first row.
• Macro variables can be referenced in a program by preceding the macro variable name with
& (ampersand).
• The %PUT statement writes the value of the macro variable to the SAS log.
• The SYMBOLGEN option displays a note identifying the macro variable and its resolved value
in the log.
Demo
1. Open the s106d01.sas program in the demos folder and find the Demo section. Run the first
query in the Create the Macro Variable Using PROC SQL section. View the results.
proc sql;
select avg(PopEstimate1)
into :AvgEst1
from sq.statepopulation;
quit;
2. In the same query, add the NOPRINT option in the PROC SQL statement. At the end of the
query, add a %PUT statement to view the value of the newly created macro variable in the SAS
log. Run the query and view the SAS log.
proc sql noprint;
select avg(PopEstimate1)
into :AvgEst1
from sq.statepopulation;
quit;
%put &=AvgEst1;
3. Move to STEP 2 and apply the newly created macro variable AvgEst1 in the TITLE statement
and the WHERE clause. Run the query. View the log and results.
Note: To format a macro variable in the TITLE statement, you can use
%left(%qsysfunc(putn(&AvgEst1,dollar16.))). Scroll to the bottom of the program file for
this demo to view the solution code.
title "Average Estimated Population for Next Year: &AvgEst1";
proc sql;
select Name, PopEstimate1
from sq.statepopulation
where PopEstimate1 > &AvgEst1;
quit;
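The formatting technique mentioned in the note for step 3 could be applied as in the following sketch. It uses the COMMA16. format instead of the DOLLAR16. format shown in the note, which is a substitution made here because the value is a population count. The solution code in the demo program might differ.

```sas
/* Create the macro variable first so that the sketch is self-contained */
proc sql noprint;
select avg(PopEstimate1)
    into :AvgEst1
    from sq.statepopulation;
quit;

/* PUTN applies the numeric format, %QSYSFUNC calls it from the macro */
/* language, and %LEFT removes the leading blanks from the result     */
title "Average Estimated Population for Next Year: %left(%qsysfunc(putn(&AvgEst1,comma16.)))";
proc sql;
select Name, PopEstimate1
    from sq.statepopulation
    where PopEstimate1 > &AvgEst1;
quit;
title;
```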
4. Above the query, add the statement OPTIONS SYMBOLGEN to see the value that was
substituted in the code. Below the query, turn off the option by adding the OPTIONS
NOSYMBOLGEN statement. Run the query and examine the log.
options symbolgen;
…sql query
options nosymbolgen;
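Written out in full, step 4 might look like the following sketch. It assumes that the elided query is the one from step 3, and it re-creates AvgEst1 first so that the example runs on its own.

```sas
/* Create the macro variable used by the query */
proc sql noprint;
select avg(PopEstimate1)
    into :AvgEst1
    from sq.statepopulation;
quit;

/* Turn on SYMBOLGEN so that the log notes how &AvgEst1 resolves */
options symbolgen;
proc sql;
select Name, PopEstimate1
    from sq.statepopulation
    where PopEstimate1 > &AvgEst1;
quit;
/* Turn the option off again to keep later logs uncluttered */
options nosymbolgen;
```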
The SYMBOLGEN option displays the results of resolving macro variable references in the SAS log.
This option is useful for debugging.
The INTO clause can create multiple macro variables from query results.
Macro Variables
Name Value
AvgEst1 6278420
MinEst1 584290
MaxEst1 39209127
TotalCount 52
You list the names of the macro variables to be created in the INTO clause. Precede each macro
variable name with a colon. PROC SQL generates the values that are assigned to these variables by
executing the query.
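A query that creates the four macro variables listed above might look like this sketch. The macro variable names come from the table above. The summary functions and the PopEstimate1 column are assumptions based on the earlier AvgEst1 example.

```sas
proc sql noprint;
/* Each selected value is assigned, in order, to the macro */
/* variable in the same position in the INTO clause        */
select avg(PopEstimate1),
       min(PopEstimate1),
       max(PopEstimate1),
       count(*)
    into :AvgEst1, :MinEst1, :MaxEst1, :TotalCount
    from sq.statepopulation;
quit;

%put &=AvgEst1;
%put &=MinEst1;
%put &=MaxEst1;
%put &=TotalCount;
```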
When storing a value in a single macro variable, PROC SQL preserves leading or trailing blanks. By
default, when creating macro variables that contain numeric values, the values are formatted using
the BEST8. format, which preserves leading blanks for numbers that have fewer than eight digits.
This can also cause the value to be displayed using scientific notation for numbers that have more
than eight digits. You can use another format, such as the w. format, to display the value without
using scientific notation.
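The following sketch shows one way to avoid both the scientific notation and the leading blanks. It applies the w. format with a width of 10 (an assumed width that is large enough for the values in this table) and adds the TRIMMED keyword, which is used in the practices later in this lesson.

```sas
proc sql noprint;
/* FORMAT=10. stores the value as plain digits, and TRIMMED */
/* removes leading and trailing blanks from the stored text */
select max(PopEstimate1) format=10.
    into :MaxEst1 trimmed
    from sq.statepopulation;
quit;

%put &=MaxEst1;
```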
6.02 Activity
Open s106a02.sas from the activities folder and perform the following tasks
to store a number larger than eight digits in a macro variable:
1. Run the program in STEP 1 to create the macro variables MaxPop and
TotalCtry. The macro variables store the estimated maximum three-year
population estimate for a country and the total number of countries.
View the log. Notice that MaxPop stores a value in scientific notation.
2. Examine STEP 2 of the program to find the country with the maximum
estimated three-year population. Run the program to use the macro
variables that you created. Confirm that no rows were returned.
3. Add the FORMAT=10. column modifier to the max(estYear3Pop) column
in STEP 1. Run the entire program. Which country has the largest three-year
estimated population?
%put &=StateList;
48 %put &=StateList;
STATELIST=IL,IN,MI,OH,WI
Scenario
Use PROC SQL to create a concatenated list of state values separated by a comma, with each
value enclosed in quotation marks.
Files
• s106d02.sas
• statepopulation – a SAS table that contains estimated yearly population by state
• customer – a SAS table that contains one row per customer
Syntax
PROC SQL;
SELECT col-name
INTO :macvar SEPARATED BY "delimiter"
FROM table;
QUIT;
QUOTE(argument)
STRIP(argument)
Notes
• You can concatenate the values of one column into one macro variable.
• Use the SEPARATED BY keywords to specify a character to delimit the values in the macro
variable.
• The QUOTE function adds double quotation marks to a character value.
• The STRIP function returns a character string with all leading and trailing blanks removed.
Demo
1. Open the s106d02.sas program in the demos folder and find the Demo section. Run the code
in Step 1: Create Macro Variables. Examine the results and log to see the values of the newly
created macro variables &Division and &StateList.
%let Division=3;
proc sql;
select Name
into :StateList SEPARATED BY ","
from sq.statepopulation
where Division = "&Division";
quit;
%put &=Division;
%put &=StateList;
2. Discuss the query under Step 2: Use Macro Variables. The name of the table being created ends
with the value of the macro variable &Division. The WHERE clause attempts to subset the customer
table by the list of values in the &StateList macro variable. Run the query and examine the errors.
Note: The name of the new table will end with the value of the macro variable.
options symbolgen;
proc sql;
create table division&Division as
select *
from sq.customer
where State in (&StateList);
quit;
options nosymbolgen;
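One way to understand the errors is to write out the query as it would look after macro substitution. With Division set to 3, the unquoted list is IL,IN,MI,OH,WI, so the IN operator receives bare names rather than character constants. The sketch below shows the form that the query needs to take, with each value enclosed in quotation marks. The literal values are taken from the StateList example in this section, and the table name division3 follows from Division=3.

```sas
/* Sketch of the query after substitution once each value is quoted. */
/* The failing version resolves to: where State in (IL,IN,MI,OH,WI)  */
proc sql;
create table division3 as
select *
    from sq.customer
    where State in ("IL","IN","MI","OH","WI");
quit;
```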
3. Move back to Step 1. Add the QUOTE function around the Name column. Run the query
and %PUT statements. Examine the results and log.
proc sql;
select quote(Name)
into :StateList SEPARATED BY ","
from sq.statepopulation
where Division = "&Division";
quit;
%put &=Division;
%put &=StateList;
4. Add the STRIP function inside the QUOTE function. Run the query and %PUT statements.
Examine the results and log.
proc sql;
select quote(strip(Name))
into :StateList SEPARATED BY ","
from sq.statepopulation
where Division = "&Division";
quit;
%put &=Division;
%put &=StateList;
5. Move to Step 2. Run the query. Examine the results and log.
options symbolgen;
proc sql;
create table division&Division as
select *
from sq.customer
where State in (&StateList);
quit;
options nosymbolgen;
6. Move to Step 1. Change the value in the %LET statement to 9 and add the NOPRINT option
in the PROC SQL statement to finalize the program. Run the entire program. Examine the
results and log.
%let Division=9;
FORMAT=$QUOTEw.
STATELIST="IL","IN","MI","OH","WI"
Strings longer than six characters are truncated when you use the $QUOTE format. To ensure that
values are not truncated, specify a width at least as long as the longest value plus 2, which
leaves room for the two quotation marks. When using other formats, be sure to check the SAS documentation.
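As a sketch of the format-based alternative to the QUOTE function, the query below applies $QUOTE4. to the Name column. The width 4 assumes that Name holds two-character state codes, as in the StateList value shown above (two characters plus two quotation marks).

```sas
%let Division=3;

proc sql noprint;
/* The $QUOTEw. format adds double quotation marks around each value */
select Name format=$quote4.
    into :StateList separated by ","
    from sq.statepopulation
    where Division = "&Division";
quit;

%put &=StateList;
```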
Syntax Summary
• Take the SAS Functions by Example course.
• Read SAS Functions by Example.
• View the SAS Function Documentation.
• Create your own functions with PROC FCMP.
• Take the SAS Macro Language 1: Essentials course.
• Read SAS Macro Programming Made Easy.
• View the SAS 9.4 Macro Language: Reference Documentation.
• Visit the SAS Macro Language section on the ELP for a list of SAS papers and blogs.
Practice
Level 1
1. Creating a Macro Variable from an SQL Query
Using the sq.statepopulation table, write a program that dynamically returns states in a
specified region that have a three-year estimated population change greater than the median
population change for the entire region.
a. Write a query to create the macro variable MedianEst to store the value of the median
nPopChg3 of Region 1.
1) Calculate the median of nPopChg3.
2) Use the INTO clause to create a macro variable named MedianEst. Use the TRIMMED
keyword.
3) Use the sq.statepopulation table.
4) Filter the results for rows in the Region column with the value of 1. The value 1 is
character.
5) Add the NOPRINT option in the PROC SQL statement.
6) Below the query, view the new macro variable in the log using %PUT.
%put &=MedianEst;
7) Run the query and compare your log.
b. Write another query to select states and their three-year population change in Region 1 that
have a higher estimate than the median of all states in that region.
1) Select the Name and nPopChg3 columns. Format nPopChg3 so that values are
displayed with commas.
2) Use the sq.statepopulation table.
3) Filter rows for states in region 1 that have an nPopChg3 value greater than the macro
variable &MedianEst created in step 1.
4) Order by nPopChg3 descending.
5) Add the following titles:
a) title 1 – States in Region 1 with a 3-Year Estimated Population Change Greater
than the Median
b) title 2 – Median Estimate: &MedianEst
6) Run the query and compare your results.
Results
c. At the beginning of your program, create a macro variable named RegionNum to specify the
region to analyze.
1) Use the %LET statement to create the macro variable RegionNum and set the value
equal to 1.
2) In your program, replace every occurrence of the character value 1 with &RegionNum. Make
sure that the macro variable is enclosed in double quotation marks.
Hint: You need to do this replacement in the WHERE clause in each query and in the
TITLE statement.
3) Run the entire program and confirm that the results are the same as step b.
d. In the %LET statement replace the value 1 with the value 2. Run the entire program.
Results
e. How many states in Region 2 have a higher three-year estimated population change than the
median of all states in that region?
Level 2
2. Creating a Macro Variable with a List of Values from an SQL Query
Write a program that dynamically determines a list of countries in a specified region. Then
determine the percentage of people who have borrowed from a financial institution or used a
credit card (% age 15+), and whether it is increasing or decreasing from the year 1 estimate to
the year 3 estimate. Use the sq.globalmetadata table for country region information and the
sq.globalfindex table for country financial and population information.
a. Using sq.globalmetadata, write a query to create a macro variable that lists the distinct
values of CountryCode for a specified region. Ensure that the values are enclosed in
quotation marks and separated by commas.
1) Create a macro variable named RegionValue with the value South Asia.
2) Use the RegionValue macro variable in a query to create a data-driven macro variable
named Countries that lists the country codes in the specified region. The value list
should be comma separated, and each value should be enclosed in quotation marks.
3) Use the NOPRINT option in the PROC SQL statement.
4) Use %PUT below your query to view the value of the macro variable Countries.
5) Run the query and %PUT statement. Compare your results.
Results
b. Write another query to select the countries in the region that you specified. Convert the
whole numbers to percentages, and determine whether the estimated value is increasing,
decreasing, or unknown using the sq.globalfindex table.
1) Select the CountryCode and IndicatorName columns. Create three new columns.
a) First divide EstYear1 by 100 and name the column EstYear1PCT. Format using
percentages.
b) Follow the previous step for EstYear3.
c) Use the CASE expression to determine whether the value is Increasing, Decreasing,
or Unknown. The value is Unknown when the value is null. Name the column
Forecast.
Hint: In the CASE expression, make sure that the first WHEN tests whether a value
is missing.
2) Filter rows by CountryCode using the Countries macro variable and whether
IndicatorName is equal to Borrowed from a financial institution or used a credit card
(% age 15+).
3) Order by Forecast.
4) Add the title Countries in &RegionValue.
5) Run the query and compare the results.
Results
c. Change the RegionValue macro variable at the beginning of the program to North America.
Run the program.
Results
b. Write another query to calculate the percentage of customers in each ZIP code for the
specified state.
1) Select Zip and calculate the percentage of customers in a ZIP code by counting the
number of customers with that Zip value and dividing it by the TotalCust macro variable.
Name the new column PctZip and format it using a percent.
2) Filter the rows by State using the macro variable StateValue.
c. Change the value of the macro variable StateValue from NC to TX. Run the program and compare
the results.
Partial Results
Challenge
4. Splitting One Table into Many
Write a program that dynamically creates a new table for each distinct value in a column. Use
the Region column in the sq.globalmetadata table.
Hint: Visit the Extended Learning page and view the SAS blog How to split one data set into
many in the SAS Macro Language section or follow the direct link:
https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/
How many tables were created?
6.3 Solutions
Solutions to Practices
1. Creating a Macro Variable from an SQL Query
/*s106s01.sas*/
/*a*/
proc sql noprint;
select median(nPopChg3)
into :MedianEst trimmed
from sq.statepopulation
where Region="1";
quit;
%put &=MedianEst;
/*b*/
title "States in Region 1 with a 3-Year Estimated Population
Change Greater than the Median";
title2 "Median Estimate: &MedianEst";
proc sql;
select Name,nPopChg3 format=comma14.
from sq.statepopulation
where Region="1" and
nPopChg3 > &MedianEst
order by nPopChg3 desc;
quit;
title;
/*c*/
%let RegionNum=2;
/*a*/
%let RegionValue=South Asia;
proc sql noprint;
select quote(strip(CountryCode))
into :Countries separated by ","
from sq.globalmetadata
where Region="&RegionValue";
quit;
%put &=Countries;
/*b*/
title "Countries in &RegionValue";
proc sql;
select CountryCode, IndicatorName,
EstYear1/100 as EstYear1PCT format=percent7.2,
EstYear3/100 as EstYear3PCT format=percent7.2,
case
when (EstYear1 is null or EstYear3 is null) then "Unknown"
when calculated EstYear1PCT < calculated EstYear3PCT
then "Increasing"
when calculated EstYear1PCT > calculated EstYear3PCT
then "Decreasing"
else "No Change"
end as Forecast
from sq.globalfindex
where CountryCode in (&Countries) and
IndicatorName="Borrowed from a financial institution or
used a credit card (% age 15+)"
order by Forecast;
quit;
title;
/*c*/
%let RegionValue=North America;
proc sql noprint;
select quote(strip(CountryCode))
into :Countries separated by ","
from sq.globalmetadata
where Region="&RegionValue";
quit;
%put &=Countries;
/*a*/
%let StateValue=NC;
proc sql noprint;
select count(*) as Total
into :TotalCust trimmed
from sq.customer
where State="&StateValue";
quit;
%put &=TotalCust;
/*b*/
title "Total Customers in &StateValue:
%left(%qsysfunc(putn(&TotalCust,comma15.)))";
title2 "Percentage of Customers by Zip";
title3 "Report Created on %left(%qsysfunc(today(),weekdate.))";
proc sql;
select Zip, count(*)/&TotalCust as PctZip format=percent8.2
from sq.customer
where State="&StateValue"
group by Zip
order by PctZip desc;
quit;
title;
/*c*/
%let StateValue=TX;
%let table=sq.globalmetadata;
%let column=Region;
%macro runSteps;
&allsteps.;
%mend;
%runSteps;
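The fragment above resolves a macro variable named allsteps, but the step that creates it does not appear here. The following is a minimal sketch of the whole program, assuming the approach described in the linked blog post: PROC SQL builds the text of one DATA step per distinct value and stores the generated code in allsteps. The out_ table prefix and the use of the Work library are assumptions for illustration, and values that contain single quotation marks would need extra handling.

```sas
%let table=sq.globalmetadata;
%let column=Region;

proc sql noprint;
    /* Generate the text of one DATA step for each distinct value of the  */
    /* column. COMPRESS with the 'kad' modifiers keeps only letters and   */
    /* digits so that the value can be used as part of a table name.      */
    select distinct
           cat("data work.out_", compress(&column,,'kad'),
               "; set &table(where=(&column='", strip(&column),
               "')); run;")
        into :allsteps separated by ';'
        from &table;
quit;

/* The macro wrapper runs the generated DATA steps. */
%macro runSteps;
    &allsteps.;
%mend;
%runSteps;
```

One table is created for each distinct value of the column, so the log shows one DATA step note per table.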
proc sql;
create table Customer&State as
select CustomerID, Employed, Race,
Married, State, CreditScore
from sq.customer
where State="&State" and
CreditScore > &CreditMin;
quit;
The new table contains rows where State equals GA and CreditScore is greater than 650.
6.01 Activity – Correct Answer
2. What is the name of the newly created table? The table name is CUSTOMERNC.
How many rows? 700 rows
Change the values as necessary.
%let State=NC;
%let CreditMin=700;
%let State=NC;
%let CreditMin=700;
proc sql;
create table Customer&State as
select CustomerID, Employed, Race,
Married, State, CreditScore
from sq.customer
where State='&State' and
CreditScore > &CreditMin;
quit;
The macro variable does not resolve in single quotation marks. The WHERE clause is searching for
the character string '&State'.
6.02 Activity – Correct Answer
1.
2. select distinct CountryCode, ShortName, Region,
EstYear3Pop format=comma16.
from sq.globalfull
where EstYear3Pop = &MaxPop;
MaxPop resolves to 1.1413E9, which is not equal to the actual maximum value 1141324637, so the
WHERE clause does not match the intended row.
3.
proc sql noprint;
select max(EstYear3Pop) format=10.,
count(distinct CountryCode)
into :MaxPop trimmed, :TotalCtry trimmed
from sq.globalfull;
quit;
%put &=MaxPop &=TotalCtry;
The FORMAT column modifier formats the stored value of the macro variable.
Lesson 7 Accessing DBMS Data
with SAS/ACCESS®
7.1 Overview of SAS/ACCESS Technology
SAS/ACCESS Technology
SAS provides data access to more of your data sources so that you can make better decisions
faster. These interfaces are out-of-the-box solutions that provide enterprise data access and
integration between SAS and third-party databases. SAS/ACCESS interfaces enable your SAS
solutions to read, write, and update data no matter what native databases or platforms you use.
For example, if you need to read data in Teradata, there is SAS/ACCESS Interface to Teradata. If
you need to read data in Oracle, there is SAS/ACCESS Interface to Oracle.
There are many factors affecting the optimization of your programs when working with DBMSs. The
best advice is to read the SAS/ACCESS documentation for your specific DBMS. Then try multiple
solutions to a program and benchmark the programs to see which program is running more
efficiently.
Connecting to DBMS
The SQL pass-through facility is also known as explicit pass-through. Accessing DBMS data through
the SAS/ACCESS LIBNAME statement is also known as implicit pass-through.
The PRODUCT_STATUS procedure returns a list of the SAS Foundation products that are installed
on your system, along with the version numbers of those products. It provides a quick method to
determine whether a SAS product is available for your use. The results from PROC
PRODUCT_STATUS are returned to the SAS log.
PROC PRODUCT_STATUS;
RUN;
Scenario
[Slide diagram: DATA step, PROC step, legacy Oracle SQL programs, visualizations]
The customer data resides in your company's Oracle database, and you are already familiar with
native Oracle SQL. You would like to pass native Oracle SQL from SAS to the database for
processing.
7.2 SQL Pass-Through Facility
The SQL pass-through facility enables you to send DBMS-specific SQL statements directly to a
DBMS for execution. The pass-through facility uses a SAS/ACCESS interface engine to connect to
the DBMS. Therefore, you must have SAS/ACCESS software installed for your DBMS.
1. Establish a connection with the DBMS.
2. Use native DBMS syntax to retrieve data to be used in a PROC SQL query.
3. Terminate the connection to the DBMS.
PROC SQL;
CONNECT TO DBMS-name <AS alias> (DBMS-connection-options);
SELECT column-list
FROM CONNECTION TO DBMS-name|alias
(DBMS-query);
DISCONNECT FROM DBMS-name|alias;
QUIT;
The SELECT statement with the FROM CONNECTION TO component (step 2) retrieves and uses DBMS
data in a PROC SQL query. The DBMS-query in parentheses is a native DBMS query.
proc sql;
connect to oracle(user=sas_user pw=sastest
path=localhost);
select UserID, Income format=dollar16., State
from connection to oracle
(select UserID, Income, State
from customer
where Income is not null
order by Income desc
fetch first 10 rows only);
disconnect from oracle;
quit;
Make your connection to Oracle using your specific DBMS options.
Option Specifies
USER= Oracle user name
PW= Oracle password associated with the Oracle user name
PATH= Oracle driver, node, and database, or a database alias
For specific DBMS connection information, view the SAS documentation. For specific values of
connectivity options or table names, contact your DBMS administrator.
proc sql;
connect to oracle(user=sas_user pw=sastest
path=localhost);
select UserID, Income format=dollar16., State
from connection to oracle
(select UserID, Income, State
from customer
where Income is not null
order by Income desc
fetch first 10 rows only);
disconnect from oracle;
quit;
The SELECT statement in PROC SQL returns results from the native Oracle query.
proc sql;
connect to oracle(user=sas_user pw=sastest
path=localhost);
select UserID, Income format=dollar16., State
from connection to oracle
(select UserID, Income, State
from customer
where Income is not null
order by Income desc
fetch first 10 rows only);
disconnect from oracle;
quit;
Native Oracle SQL syntax is sent to the DBMS.
This query selects from the customer table in the Oracle DBMS and is executed directly in Oracle. The DBMS
processes the syntax as if you were coding directly in the DBMS. The DBMS results are read by SAS
and processed based on the outer SELECT statement. You cannot use features specific to SAS inside the
parentheses. Everything inside the parentheses is SQL that is native to the Oracle DBMS.
When using a different schema from the default, the database table in the pass-through query is
specified with the syntax schema.table_name.
The FETCH FIRST n ROWS ONLY syntax was used in Oracle Database Express Edition Release
18.4.0.0.0 (18c).
proc sql;
connect to oracle(user=sas_user pw=sastest
path=localhost);
select UserID, Income format=dollar16., State
from connection to oracle
(select UserID, Income, State
from customer
where Income is not null
order by Income desc
fetch first 10 rows only);
disconnect from oracle;
quit;
It's good practice to disconnect from the DBMS.
You can close the connection to the DBMS by using one of the following methods:
• submitting a DISCONNECT statement
• terminating the SQL procedure (for example, with a QUIT statement)
SQL Pass-Through
proc sql;
/* 1 */ connect to oracle(user=sas_user pw=sastest
path=localhost);
/* 2 */ select UserID, Income format=dollar16., State
from connection to oracle
(select UserID, Income, State
from customer
where Income is not null
order by Income desc
fetch first 10 rows only);
/* 3 */ disconnect from oracle;
quit;
The query in parentheses is the native DBMS query.
Query results from the SQL pass-through must be saved to a SAS table or view to be used in
subsequent SAS processes.
The CREATE VIEW statement stores the pass-through query but does not check the validity of the
query. It is important to execute the view after it has been created.
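As an illustration of the two options described above, the following sketch saves pass-through results first as a SAS table and then as a view. It reuses the Oracle connection values from the earlier slides; the names work.CustIncome and work.CustIncomeV are hypothetical, and the sketch assumes that the connection options are stored with the view, as the SAS/ACCESS documentation describes for pass-through views.

```sas
proc sql;
    connect to oracle(user=sas_user pw=sastest path=localhost);

    /* Save the pass-through results in a SAS table. */
    create table work.CustIncome as
        select UserID, Income, State
            from connection to oracle
                (select UserID, Income, State
                    from customer
                    where Income is not null);

    /* Store the pass-through query as a view. The native query */
    /* is not validated when the view is created.               */
    create view work.CustIncomeV as
        select UserID, Income, State
            from connection to oracle
                (select UserID, Income, State
                    from customer
                    where Income is not null);

    /* Execute the view to confirm that the stored query is valid. */
    select * from work.CustIncomeV;

    disconnect from oracle;
quit;
```

The table holds a copy of the data as of the time it was created, whereas the view runs the native query each time it is referenced.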
Scenario
Use an SQL pass-through query to a Microsoft Access database using the SAS/ACCESS Interface
to PC Files engine.
Files
• s107d01.sas
• customer – a Microsoft Access table that contains customer information
Syntax
PROC SQL;
CONNECT TO DBMS-name <AS alias> (DBMS-connection-options);
SELECT col-name, col-name
FROM CONNECTION TO DBMS-name|alias
(DBMS-query);
DISCONNECT FROM DBMS-name|alias;
QUIT;
Notes
• The CONNECT statement establishes the connection to the DBMS.
• The SELECT statement selects the results of the DBMS query.
• The DISCONNECT statement closes the connection to the DBMS.
Demo
1. Open the s107d01.sas program in the demos folder and find the Demo section. If you have
not already done so, run the libname.sas program to define the PATH macro variable.
2. Run the Microsoft Access query. Discuss the syntax error.
3. Add a CONNECT TO PCFILES statement above the SELECT statement. After PCFILES,
add the PATH= and DBPASSWORD=SASTEST options in parentheses. End the statement
with a semicolon.
proc sql;
connect to pcfiles(path="&path/database/SQL_DB.accdb"
dbpassword=sastest);
select top 10 UserID, Income, State
from customer
order by Income desc;
quit;
4. Add the SELECT statement after the CONNECT TO statement, and select all columns using an
asterisk. Add the FROM CONNECTION TO component and reference the pcfiles database.
Enclose the original Microsoft Access query in parentheses.
proc sql;
connect to pcfiles(path="&path/database/SQL_DB.accdb"
dbpassword=sastest);
select * from connection to pcfiles
(select top 10 UserID, Income, State
from customer
order by Income desc);
quit;
5. Add the DISCONNECT FROM statement to disconnect from the pcfiles database.
proc sql;
connect to pcfiles(path="&path/database/SQL_DB.accdb"
dbpassword=sastest);
select * from connection to pcfiles
(select top 10 UserID, Income, State
from customer
order by Income desc);
disconnect from pcfiles;
quit;
6. Run the SQL pass-through query and view the results.
7. In the SELECT statement, remove the asterisk and add the columns UserID, Income, and
State. Format the Income column using the DOLLAR16. format. Run the query and view the
results.
proc sql;
connect to pcfiles(path="&path/database/SQL_DB.accdb"
dbpassword=sastest);
select UserID, Income format=dollar16., State
from connection to pcfiles
(select top 10 UserID, Income, State
from customer
order by Income desc);
disconnect from pcfiles;
quit;
7.01 Activity
Open s107a01.sas from the activities folder and perform the following tasks
to pass Microsoft Access SQL to the database:
1. Examine the native DBMS query. Run the entire query. Did it produce an
error?
2. In the WHERE clause, replace IS MISSING with IS NULL. Run the entire
query. Did it run successfully?
Syntax Summary
PROC SQL;
CONNECT TO DBMS-name <AS alias> (DBMS-connection-options);
SELECT column-list
FROM CONNECTION TO DBMS-name|alias
(DBMS-query);
DISCONNECT FROM DBMS-name|alias;
QUIT;
SAS Engine
The SAS/ACCESS LIBNAME engine translates your PROC SQL syntax to native DBMS SQL when
possible. This is important because SAS SQL and native DBMS SQL can differ. So if you are
working with multiple DBMSs, you can learn SAS SQL instead of learning native DBMS SQL
implementations, and let the engine do the work for you.
SAS library
7.3 SAS/ACCESS LIBNAME Statement
Submit a LIBNAME statement with a CLEAR option to release the DBMS and associated resources.
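A minimal sketch of the full life cycle of a SAS/ACCESS library, assuming the same Oracle connection values that appear in the pass-through examples. The libref mydb is a hypothetical name, and your DBMS might require different connection options.

```sas
/* Assign the libref mydb to the Oracle database. */
libname mydb oracle user=sas_user password=sastest path=localhost;

/* Reference the DBMS table with a two-level name, as with any SAS library. */
proc sql;
    select UserID, Income format=dollar16., State
        from mydb.customer
        where Income is not null;
quit;

/* Release the DBMS connection and associated resources. */
libname mydb clear;
```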
Scenario
Use the SAS/ACCESS LIBNAME statement to read a table in Microsoft Access.
Files
• s107d02.sas
• customer – a Microsoft Access table that contains customer information
Syntax
Notes
The SAS/ACCESS LIBNAME statement
• uses the SAS/ACCESS engine to assign a libref to a DBMS
• enables you to reference a DBMS object directly in a DATA step or SAS procedure
• has many options to control the connection to the DBMS.
Demo
1. Open the s107d02.sas program in the demos folder and find the Demo section. If you have not
already done so, run the libname.sas program to define the PATH macro variable.
2. Begin by viewing all available libraries in your current session.
Note: Your libraries might differ. Notice that there is no library named db.
3. Complete the LIBNAME statement to define a library named db that uses the PCFILES engine.
Add the PATH= option to connect to the SQL_DB.accdb Microsoft Access database, and add
the DBPASSWORD= option using the password sastest. Highlight the LIBNAME statement and
run the selected code. Use the navigation pane to expand the db library.
Note: In SAS Enterprise Guide, click Libraries and select Refresh to update the library list.
libname db pcfiles path="&path/database/SQL_DB.accdb"
dbpassword=sastest;
4. Review the query below the LIBNAME statement. Add the library db to the beginning of the
customer table name, and apply the DOLLAR16. format to the Income column.
libname db pcfiles path="&path/database/SQL_DB.accdb"
dbpassword=sastest;
libname db clear;
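The query that step 4 refers to is not reproduced above. The following sketch shows what the completed program might look like, assuming that the query selects the same columns as the pass-through demo. The column list and the ORDER BY clause are assumptions, and the PATH macro variable is expected to be defined by libname.sas.

```sas
libname db pcfiles path="&path/database/SQL_DB.accdb"
                   dbpassword=sastest;

proc sql;
    /* The db libref qualifies the table name. */
    select UserID, Income format=dollar16., State
        from db.customer
        order by Income desc;
quit;

libname db clear;
```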
Behind the scenes, the SAS/ACCESS engine attempts to convert your SAS code to an SQL query
that is passed to the DBMS. This enables as much work to be done in the DBMS as possible.
SAS Engine
If SAS cannot pass the query (or parts of the query) to the DBMS, then SAS processes the code itself.
Operations such as a DATA step merge, certain functions, certain data set options, and combining different
data sources must be done in SAS. In those cases, data is transferred between the DBMS and SAS.
For large DBMS tables, this can result in slow performance, because a large volume of data might need
to be returned to SAS.
For PROC SQL code to be converted to native DBMS SQL, the code must meet these basic requirements:
• refer to a single SAS/ACCESS LIBNAME engine
• use SAS functions that can be translated into DBMS functions
See the documentation for your specific DBMS for more information.
To examine the SQL that SAS/ACCESS submits, use the following option:
The SASTRACE= option enables you to examine the SQL that the SAS/ACCESS engine submits
to the DBMS.
• ',,,d' – Specifies that all SQL statements sent to the DBMS are sent to the log.
• ',,,s' – Specifies that a summary of timing information for calls made to the DBMS is sent
to the log.
The SASTRACELOC= option defines where to send the SASTRACE information.
The NOSTSUFFIX option limits the amount of information displayed in the log.
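The options described above can be combined in a single OPTIONS statement. The following sketch assumes an Oracle library assigned with the hypothetical libref mydb and sends the trace output to the SAS log.

```sas
libname mydb oracle user=sas_user password=sastest path=localhost;

/* Write the SQL that the engine sends to the DBMS to the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
    select State, count(*) as TotalCustomer
        from mydb.customer
        group by State;
quit;

/* Turn tracing off when it is no longer needed. */
options sastrace=off;

libname mydb clear;
```

If the log shows a native SELECT statement that contains the GROUP BY clause, the summarization was passed to the DBMS. If it shows only a simple SELECT of the columns, the rows were returned to SAS for processing.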
For more information about how to determine where the processing is occurring, visit the SAS
documentation or the ELP.
For more information about DBMS-specific references, see the SAS/ACCESS® 9.4 for Relational
Databases: Reference, Ninth Edition documentation. You can also find the direct link in the Course
Links section on the ELP.
7.02 Activity
Open a browser and perform the following tasks to view SAS DBMS-Specific
Reference documentation:
1. In a web browser, access SAS Help at
http://support.sas.com/documentation.
2. Under Popular Documentation, select Programming: SAS 9.4 and Viya.
3. Scroll down until you find the Accessing Data section. Inside the section,
find SAS/ACCESS and select Relational Databases.
4. Scroll down and find the DBMS-Specific Reference section and select a
reference of your choice.
5. Review the topics in your DBMS-specific reference. Click the Passing SAS
Functions to link and review the SAS functions that pass to your DBMS
for processing.
Syntax Summary
7.4 FEDSQL Procedure
Vendor Neutral
ANSI SQL
PROC FEDSQL is not a replacement for PROC SQL. Both are tools that are used
for specific scenarios, and they offer different strengths for different situations. Both are
implementations of SQL, with some slight differences in syntax. When you know one,
you can easily transition to the other.
ANSI 3 Compliant
SQL-2: first major revision to the ANSI Standard
SQL-3: last major revision to the ANSI Standard
The FEDSQL procedure is a SAS proprietary implementation of the ANSI SQL:1999 (SQL-3) core
standard. It provides ANSI 1999 core compliance features and proprietary extensions.
For more information about PROC FEDSQL and the ANSI Standard, see "FedSQL and the ANSI
Standard" in the SAS® 9.4 FedSQL Language Reference, Fifth Edition documentation. You can also
find the direct link in the Course Links section on the ELP.
PROC FEDSQL supports many more data types than previous SAS SQL implementations.
Traditional DBMS access through SAS/ACCESS LIBNAME engines translates target DBMS data
types to and from two legacy SAS data types: SAS numeric and SAS character. PROC FEDSQL
processes ANSI data types at native precision using a threaded driver instead of the typical data
access and translation by the LIBNAME engine.
PROC SQL
ANSI Type          Resulting SAS Type   Default Length
CHAR(n)            Character            8
VARCHAR(n)         Character            8
INTEGER            Numeric              8
SMALLINT           Numeric              8
DECIMAL            Numeric              8
NUMERIC            Numeric              8
FLOAT              Numeric              8
REAL               Numeric              8
DOUBLE PRECISION   Numeric              8
DATE               Numeric              8
Traditional DBMS access translates target DBMS data types to SAS data types.
PROC SQL can work only with the data types defined in SAS: numeric and character. When working
with DBMSs, SAS translates ANSI data types to SAS data types.
For example, when working with a DBMS that has data defined as integer or float, that is not a
problem for PROC SQL: it converts the value to a SAS numeric. SAS numeric values are accurate
to approximately 15 digits.
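The following sketch illustrates the 15-digit limit with a 16-digit integer. It assumes a platform on which SAS stores numeric values as 8-byte IEEE floating-point numbers; the variable name big is hypothetical.

```sas
data _null_;
    /* 2**53 + 1 has 16 digits. On IEEE platforms it cannot be stored */
    /* exactly in an 8-byte SAS numeric, so the value that is written */
    /* to the log can differ from the value that was typed.           */
    big = 9007199254740993;
    put big= 20.;
run;
```

A DBMS column defined as BIGINT or as a high-precision DECIMAL can hold such values exactly, which is why the translation to a SAS numeric matters for large keys and identifiers.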
For more information about data types for SAS data, see "Data Types for SAS Data Sets" in the
SAS® 9.4 FedSQL Language Reference, Fifth Edition documentation. You can also find the direct
link in the Course Links section on the ELP.
For more information about the numerical accuracy in SAS software, see "Numerical Accuracy in
SAS Software" in the SAS® 9.4 FedSQL Language Reference, Fifth Edition documentation. You can
also find the direct link in the Course Links section on the ELP.
DATA TYPES
BIGINT FLOAT(p) TIME(p)
BINARY(n) INTEGER TIMESTAMP(p)
CHAR(n) NCHAR(n) TINYINT
DATE NVARCHAR(n) VARBINARY(n)
DOUBLE REAL VARCHAR(n)
DECIMAL|NUMERIC(p,s) SMALLINT
PROC FEDSQL provides a scalable, threaded, high-performance way to access, manage, and
share relational data in multiple data sources. When possible, PROC FEDSQL queries are optimized
with multi-threaded algorithms in order to resolve large-scale operations.
FedSQL DICTIONARY
• COLUMNS
• CATALOGS
• COLUMN_STATISTICS
• TABLES
• STATISTICS
A PROC FEDSQL DICTIONARY table is a read-only table that contains information about columns,
tables, and catalogs, as well as statistics about tables and their associated indexes.
PROC FEDSQL DICTIONARY tables can give you a more in-depth look into your database tables
and columns.
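A minimal sketch of querying two of the DICTIONARY tables listed above. It assumes that at least one library is assigned in the session. The columns that are returned are not listed here, so consult the FedSQL Language Reference before filtering on specific column names.

```sas
proc fedsql;
    /* List the tables that PROC FEDSQL can see in the assigned libraries. */
    select * from dictionary.tables;

    /* List column-level metadata for those tables. */
    select * from dictionary.columns;
quit;
```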
For more information about PROC FEDSQL DICTIONARY tables, see "FedSQL DICTIONARY
Tables" in the SAS® 9.4 FedSQL Language Reference, Fifth Edition documentation. You can also
find the direct link in the Course Links section on the ELP.
proc fedsql;
select State,
count(*) as TotalCustomer
from mkt.customer
where CreditScore > 700
group by State
order by TotalCustomer desc;
quit;
Reference the table in the FROM clause.
7.03 Activity
Open s107a03.sas from the activities folder and perform the following tasks
to perform a PROC FEDSQL query:
1. Examine and run the query. Did it produce an error?
2. In the WHERE clause, replace the double quotation marks around NC
with single quotation marks. Run the query. Did it run successfully?
For debugging purposes, you can see what PROC FEDSQL submits to the DBMS by specifying the
undocumented IPTRACE option. This option causes PROC FEDSQL to write the SQL sent to the
RDBMS in the log as the "Retextualized child query." If the query is not accepted by the DBMS,
IPTRACE writes the error message generated by the RDBMS in the log. PROC FEDSQL continues
to reformulate the query and try again until the DBMS executes the query and returns a result set.
IPTRACE then reports how much of the original query was pushed down into the DBMS.
Another method that prints the query plan is the _METHOD option. For more information about
_METHOD, see "_METHOD FedSQL Option" in the Base SAS® 9.4 Procedures Guide, Seventh
Edition documentation. You can also find the direct link in the Course Links section on the ELP.
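A sketch of how the two options might be specified, reusing the query and the mkt libref from the earlier slide. Both options are placed in the PROC FEDSQL statement. Because IPTRACE is undocumented, its log output might differ between releases.

```sas
/* IPTRACE writes the SQL that is sent to the DBMS to the log. */
/* _METHOD writes the query plan to the log.                   */
proc fedsql iptrace _method;
    select State,
           count(*) as TotalCustomer
        from mkt.customer
        where CreditScore > 700
        group by State
        order by TotalCustomer desc;
quit;
```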
SAS Platform
SAS Viya
• an open, cloud-enabled, analytic run-time environment
SAS Viya is the latest enhancement of the SAS Platform. It is an open, cloud-enabled, analytic run-
time environment with a number of supporting services.
One of those supporting services is SAS Cloud Analytic Services, or CAS. CAS provides a powerful
in-memory engine that delivers blazing speed to accurately process your big data. It uses scalable,
high-performance, multi-threaded algorithms to rapidly perform analytical processing on in-memory
data of any size.
[Slide: PROC SQL compared with PROC FEDSQL. Labels: Mnemonics, INTO, Calculated, Remerge Summary
Statistics, Limited Column Modifiers, Limited SAS Functions, Limited Options]
PROC SQL: Includes many non-ANSI standard SAS enhancements
PROC FEDSQL: Includes very few non-ANSI SAS enhancements
For a summary document about PROC SQL versus PROC FEDSQL, see "PROC SQL vs. PROC
FEDSQL Summary" on the ELP.
Syntax Summary
Case Study
https://communities.sas.com/sas-training
7.5 Solutions
Solutions to Activities and Questions
proc sql;
connect to pcfiles(path="&path\database\SQL_DB.accdb"
dbpassword=sastest);
select UserID, Income format=dollar16., State
from connection to pcfiles
(select top 10 UserID, Income, State
from customer
where BankID is null
order by Income desc);
disconnect from pcfiles;
quit;
A native Access SQL query requires IS NULL.
proc fedsql;
select UserID, Income, State
from sq.customer
where State='NC'
order by Income desc
limit 10;
quit;
Single quotation marks are ANSI standard.
The LIMIT clause limits the number of records returned in FedSQL.
To use a macro variable in a character string in PROC FEDSQL, you must use
%TSLIT(&macro-variable).
%LET StateVar=NC;
PROC FEDSQL;
SELECT UserID, Income, State
FROM sq.customer
WHERE State=%tslit(&StateVar)
ORDER BY Income DESC
LIMIT 10;
QUIT;
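To see why this works, it can help to write the generated text to the log. In the sketch below, the TSLIT autocall macro is expected to resolve the macro variable and enclose the result in single quotation marks, which FedSQL treats as a character literal. A macro variable reference typed directly inside single quotation marks would not resolve.

```sas
%let StateVar=NC;

/* Write the text that the TSLIT macro generates to the log. */
%put %tslit(&StateVar);
```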