
PLSQL


LESSON 1 : PROC SQL TUTORIAL FOR BEGINNERS (20 EXAMPLES)

Deepanshu Bhalla


This tutorial is designed for beginners who want to get started with PROC SQL.
Also, it will attempt to compare the techniques of DATA Step and PROC SQL.
TERMINOLOGY
The difference between SAS and SQL terminology is shown in the table below.

SAS term        SQL term
Data set        Table
Observation     Row
Variable        Column

SYNTAX
PROC SQL;
SELECT column(s)
FROM table(s) | view(s)
WHERE expression
GROUP BY column(s)
HAVING expression
ORDER BY column(s);
QUIT;
The clauses of the SELECT statement must be specified in the following order:
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY
Note: Only the SELECT and FROM clauses are required. All the other clauses are
optional.

EXPLANATION
PROC SQL: calls the SQL procedure
SELECT: specifies the column(s) (variables) to be selected
FROM: specifies the table(s) (data sets) to be queried
WHERE: subsets the data based on a condition
GROUP BY: classifies the data into groups based on the specified column(s)
HAVING: subsets the groups based on a condition, usually a summary statistic
ORDER BY: sorts the resulting rows (observations) by the specified column(s)
QUIT: ends the SQL procedure
We are going to look at the difference between Non-SQL Base SAS and PROC
SQL.

PROC SQL STATEMENTS


1. Selecting all variables from the data set
proc sql;
select *
from sasuser.outdata;
Quit;
Asterisk (*) is used to select all columns (variables) in the order in which they
are stored in the table.
Outdata is the table (data set) from which we need to select the columns
(variables) . It is stored in SASUSER library.
To write the expanded list of columns to the SAS log, use the FEEDBACK option
in the PROC SQL statement:
proc sql feedback;
select *
from sasuser.outdata;
Quit;

The SAS log then shows the query with the asterisk expanded into the full list
of columns.

2. Selecting specific variables from the data set


In the SELECT clause, multiple columns are separated by commas.
proc sql;
select weight,married
from sasuser.outdata;
Quit;
In the SELECT clause, Weight and Married columns (variables) are specified so
that we can select them from OUTDATA table (data set).
3. Limiting the number of rows
To limit the number of rows (observations) that PROC SQL displays, use the
OUTOBS= option in the PROC SQL statement.
proc sql outobs=50;
select weight,married
from sasuser.outdata;
Quit;
4. Renaming a variable in output
To rename a variable in the output, assign a column alias with the AS keyword
in the SELECT clause.
proc sql;
select weight,married as marriage
from sasuser.outdata;
Quit;
In the output, the variable married is displayed as marriage. The column in the
source table is unchanged.
5. Creating a new variable
Suppose you want to create a new variable that holds the result of a calculation.
proc sql;
select weight, (weight*0.5) as newweight
from sasuser.outdata;
Quit;
A new variable has been created and named newweight which is calculated on
the basis of the existing variable weight.
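Since this tutorial compares PROC SQL with the DATA step, here is a minimal DATA step sketch of the same calculation. The output data set name newdata is a hypothetical name chosen for illustration. Unlike the query above, a DATA step does not print anything, so PROC PRINT is added to display the result.

```sas
/* DATA step sketch of example 5; "newdata" is a hypothetical output name */
data newdata;
   set sasuser.outdata (keep=weight);   /* read only the weight variable */
   newweight = weight * 0.5;            /* same expression as in the SELECT clause */
run;

/* PROC SQL prints a report by default; the DATA step needs PROC PRINT */
proc print data=newdata;
run;
```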
6. Referring to a previously calculated variable
The keyword CALCULATED is used to refer to a previously calculated variable.
proc sql;
select weight, (weight*0.5) as newweight,
CALCULATED newweight*0.25 as revweight
from sasuser.outdata;
Quit;
7. Removing duplicate rows
The keyword DISTINCT is used to eliminate duplicate rows (observations) from
your query results.
proc sql;
select DISTINCT weight, married
from sasuser.outdata;
Quit;
8. Labeling and formatting variables
SAS-defined formats can be used to improve the appearance of the body of a
report. You can also label the variables using the LABEL= keyword.
proc sql;
select weight FORMAT=8.2,
married LABEL="Married People"
from sasuser.outdata;
Quit;
9. Sorting data
The ORDER BY clause returns the data in sorted order.
The ASC option sorts the data in ascending order. It is the default.
The DESC option sorts the data in descending order.
proc sql;
select age, weight, married
from sasuser.outdata
ORDER BY weight ASC, married DESC;
Quit;
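For comparison, a non-SQL Base SAS sketch of the same sort uses PROC SORT. The output data set name sorted is an assumption made for this example; the variables are the ones used in the query above.

```sas
/* PROC SORT sketch of example 9; "sorted" is a hypothetical output name */
proc sort data=sasuser.outdata (keep=age weight married)
          out=sorted;
   by weight descending married;   /* ascending weight, then descending married */
run;

proc print data=sorted;
run;
```

In PROC SORT the DESCENDING keyword comes before the variable it applies to, whereas in PROC SQL the DESC keyword comes after the column.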
10. Subsetting data with the WHERE clause
Use the WHERE clause with any valid SAS expression to subset data.
List of conditional operators :
1. BETWEEN-AND
The BETWEEN-AND operator selects values within an inclusive range.
Example : where salary between 4500 and 6000;
2. CONTAINS or ?
The CONTAINS or ? operator selects observations by searching for a specified
set of characters within the values of a character variable. The search string
must be quoted.
Example : where firstname contains 'DE';
OR where firstname ? 'DE';
3. IN
The IN operator selects values that match any value in a fixed list.
Example : where state = 'NC' or state = 'TX';
An easier way to write the above condition is to use the IN operator:
where state IN ('NC','TX');
4. IS MISSING or IS NULL
The IS MISSING or IS NULL operator selects missing values.
Example : where dateofbirth is missing;
OR where dateofbirth is null;
5. LIKE
The LIKE operator selects values that match a pattern. An underscore (_)
matches exactly one character and a percent sign (%) matches any number of
characters. The pattern below matches names such as Dean or Diana.

proc sql;
select name, age, weight
from sasuser.outdata
WHERE name LIKE 'D_an%';
Quit;
Important Points :
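1. The WHERE clause cannot reference a computed variable by its alias alone.
To use a computed variable in the WHERE clause, either repeat the expression,
as in the query that follows, or prefix the alias with the CALCULATED keyword.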
PROC SQL;
SELECT NAME, (WEIGHT * .01) AS NEWWEIGHT
FROM HEALTH
WHERE (WEIGHT * .01) > 5;
QUIT;
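As an alternative to repeating the expression, the CALCULATED keyword from example 6 can also be used in the WHERE clause. This is a short sketch against the same HEALTH table used above.

```sas
/* Same filter as above, written with CALCULATED instead of repeating the expression */
proc sql;
   select name, (weight * .01) as newweight
   from health
   where calculated newweight > 5;   /* refers to the alias defined in SELECT */
quit;
```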
2. The WHERE clause cannot be used to subset on summary statistics.
The WHERE clause filters individual rows before they are grouped. To subset
groups on a summary statistic, use the HAVING clause with the GROUP BY clause.
11. Multiple Conditions / Criteria
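PROC SQL;
SELECT WEIGHT,
CASE
WHEN WEIGHT BETWEEN 0 AND 50 THEN 'LOW'
WHEN WEIGHT BETWEEN 51 AND 70 THEN 'MEDIUM'
WHEN WEIGHT BETWEEN 71 AND 100 THEN 'HIGH'
ELSE 'VERY HIGH'
END AS NEWWEIGHT
FROM HEALTH;
QUIT;
The END keyword is required to close a CASE expression, and character results
such as 'LOW' must be quoted. Values that fall between the ranges (for example
50.5) and missing values go to the ELSE branch.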
The following operators can be used in CASE expression:
All operators that IF uses (= , <, >, NOT, NE, AND, OR, IN, etc)
BETWEEN AND
CONTAINS or ?
IS NULL or IS MISSING
=* (sounds-like operator)
LIKE
12. Aggregating or summarizing data

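Use the GROUP BY clause to summarize data. Summary functions are used in the
SELECT clause to produce a summary for each group.
proc sql;
select age, COUNT(married) AS marriage
from sasuser.outdata
GROUP BY age
ORDER BY age;
Quit;
Every column in the SELECT clause that is not inside a summary function should
also appear in the GROUP BY clause. Otherwise PROC SQL remerges the summary
values with the detail rows instead of returning one row per group.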

The summary functions available are listed below:


1. AVG/MEAN
2. COUNT/FREQ/N
3. SUM
4. MAX
5. MIN
6. NMISS
7. STD
8. VAR
9. T (t value)
10. USS (Uncorrected Sum of Squares)
11. CSS (Corrected Sum of Squares)
12. RANGE
13. Subsetting data in the groups
In order to subset data when grouping is in effect, the HAVING clause must be
used. The condition in the HAVING clause is normally based on a summary
statistic.
proc sql;
select age, COUNT(married) AS marriage
from sasuser.outdata
GROUP BY age
HAVING COUNT(married) > 2
ORDER BY age;
Quit;
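A row filter and a group filter can appear in the same query. The sketch below assumes the same sasuser.outdata table; the cutoff of 50 for weight is an arbitrary value chosen only for illustration.

```sas
/* WHERE filters rows before grouping; HAVING filters groups after grouping */
proc sql;
   select age, count(married) as marriage
   from sasuser.outdata
   where weight > 50              /* row-level filter (arbitrary cutoff) */
   group by age
   having count(married) > 2      /* group-level filter on a summary statistic */
   order by age;
quit;
```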
14. Creating a new data set as output
The CREATE TABLE statement can be used to create a new data set as output
instead of a report in the output window.

SYNTAX
PROC SQL;
CREATE TABLE table-name AS
SELECT column(s)
FROM table(s) | view(s)
WHERE expression
GROUP BY column(s)
ORDER BY column(s);
QUIT;
proc sql;
create table health AS
select weight, married
from sasuser.outdata
ORDER BY weight ASC, married DESC;
Quit;
15. Limiting the number of rows in the new created data set
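To limit the number of rows (observations) used to build the new data set, use
the INOBS= option in the PROC SQL statement. INOBS= restricts the number of rows
that PROC SQL reads from each source table, so in this simple single-table query
the new data set has at most 50 rows.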
proc sql INOBS=50;
create table health AS
select weight,married
from sasuser.outdata;
Quit;
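If the goal is to cap the rows written to the new table rather than the rows read from the source, the OUTOBS= option from example 3 can be combined with CREATE TABLE. This is a sketch; the table name health_out is a hypothetical name used to avoid overwriting the table above.

```sas
/* OUTOBS= limits the rows in the output, INOBS= limits the rows read from each source */
proc sql outobs=50;
   create table health_out as      /* hypothetical table name */
   select weight, married
   from sasuser.outdata
   order by weight;                /* all rows are read and sorted, the first 50 are kept */
quit;
```

With joins or summary queries the two options can give different results, because INOBS= truncates the input before the query logic is applied.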
16. Remove duplicates
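The first query removes rows that are duplicated across all columns:
Proc SQL noprint;
Create Table inter.Merged1 as
Select distinct * from inter.readin ;
Quit;
The second query keeps only the unique values of var2. It writes to the same
table name, so it overwrites the result of the first query:
Proc SQL noprint;
Create Table inter.Merged1 as
Select distinct var2 from inter.readin ;
Quit;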
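A non-SQL sketch of the same two tasks uses PROC SORT with the NODUPKEY option. The output data set names in the WORK library are assumptions made for this example.

```sas
/* Remove rows duplicated across all variables */
proc sort data=inter.readin out=work.nodup_rows nodupkey;
   by _all_;                       /* every variable is part of the sort key */
run;

/* Keep one row per distinct value of var2 */
proc sort data=inter.readin (keep=var2) out=work.unique_var2 nodupkey;
   by var2;
run;
```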
17. Counting unique values by a grouping variable
Suppose you want to count the number of different scores obtained by students
in each college.
You can use PROC SQL with COUNT(DISTINCT variable_name) to determine the
number of unique values for a column.
PROC SQL;
CREATE TABLE TEST1 as
SELECT college_names,count(distinct score) AS unique_count
FROM test
GROUP BY college_names;
QUIT;
18. Sub Query

Find employee IDs who have records in table file1 but not in table file2.
Proc SQL;
Select ID from file1
Where ID not in (select ID from file2);
Quit;
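The same question can also be phrased with a correlated subquery and NOT EXISTS. This sketch assumes the same file1 and file2 tables; the aliases a and b are introduced only for the example.

```sas
/* IDs present in file1 with no matching ID in file2 */
proc sql;
   select a.ID
   from file1 as a
   where not exists
         (select * from file2 as b
          where b.ID = a.ID);      /* inner query is evaluated for each row of file1 */
quit;
```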
19. Sub Query - Part II
Find employee IDs whose age is within 10 years of the average age.
Proc SQL;
Select id from file1
where age between (select Avg(age) from file1) - 10 and
(select avg(age) from file1)+10;
Quit;
20. Sub Query - Part III
Flag each student according to whether any of the student's test scores is
below 70.
proc sql;
select Name, Grade, Teacher,
Case
When Student_ID in
(select Student_ID from Tests where Score lt 70) then 'Failed one or more tests'
else 'Passed all tests'
end as Test_Results
from Students;
quit;
LESSON 2 : PROC SQL : MERGING
Deepanshu Bhalla
This tutorial is designed for beginners who want to get started with PROC SQL
Joins. It explains different types of joins and the equivalent data step merge
code for these joins.
Advantages of PROC SQL Joins over Data Step Merging
1. PROC SQL joins do not require sorted tables (data sets), while both data sets
must be sorted by the common variable when using the MERGE statement with a BY
statement.
2. PROC SQL joins do not require the common variable to have the same name
in the data sets you are joining, while the MERGE statement needs the common
variable under the same name in the BY statement.
3. PROC SQL joins can use comparison operators other than the equal sign
(=).
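The sketch below illustrates these points with two small hypothetical data sets, emp and grades, invented for this example. The data sets are not sorted for a merge, the join columns have different names, and the join condition is a range comparison rather than an equality.

```sas
/* Hypothetical example data: employees and salary bands */
data emp;
   input emp_id salary;
   cards;
3 7000
1 4000
2 5500
;
run;

data grades;
   input grade $ low high;
   cards;
G1 0 4999
G2 5000 6499
G3 6500 9999
;
run;

/* Join on a range condition: no sorting, no shared variable name, no equal sign */
proc sql;
   select e.emp_id, e.salary, g.grade
   from emp as e inner join grades as g
   on e.salary between g.low and g.high
   order by e.emp_id;
quit;
```

A match-merge in the DATA step has no direct counterpart for this kind of range condition, because the BY statement matches on equal values only.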
1. Cartesian product (All rows in all the tables)
The Cartesian product returns a number of rows equal to the product of the
numbers of rows (observations) in the tables (data sets) being joined. For
example, if the first table has 10 rows and the second table has 10 rows, there
will be 100 rows (10 * 10) in the merged table (data set).

Create these two data sets in SAS:


Data A;
Input ID Name $ Height;
cards;
1 A 1
3 B 2
5 C 2
7 D 2
9 E 2
;
run;
Data B;
Input ID Name $ Weight;
cards;
2 A 2
4 B 3
5 C 4
7 D 5
;
run;
Cartesian Product : SQL Code
PROC SQL;
Create table dummy as
Select * from A , B;
Quit;
Key takeaways
1. Since the first data set has 5 rows and the second data set has 4 rows, there
are 20 rows (5 * 4) in the merged data set.
2. Since the ID values of the first data set differ from those of the second
data set, the ID shown in the joined data set (taken from the first data set)
is misleading.
In the final merged file, number of columns would be (Common columns in both
the data sets + uncommon columns from data set A + uncommon columns from
data set B).
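A quick way to confirm the row count is to query the new table. This sketch assumes the table dummy was created by the Cartesian product query above; the alias n_rows is a hypothetical name.

```sas
/* Count the rows of the Cartesian product table */
proc sql;
   select count(*) as n_rows
   from dummy;
quit;
```

If A and B hold 5 and 4 rows as described, the count should equal their product.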
2. Inner Join (rows common to both tables)

It returns rows common to both tables (data sets).

PROC SQL;
Create table dummy as
Select * from A,B
where A.ID = B.ID;
Quit;

Another way to write the above code:

PROC SQL;
Create table dummy as
Select * from A inner join B
On A.ID = B.ID;
Quit;
Both versions produce the same result.
Inner Join : Data Step Code
Data dummy;
Merge A (IN = X) B (IN=Y);
by ID;
If X and Y;
run;
In the final merged file, number of columns would be (Common columns in both
the data sets + uncommon columns from data set A + uncommon columns from
data set B).
Missing values in the common variable

In PROC SQL, a missing value is equal to another missing value, so an inner join
returns the Cartesian product of the rows whose common variable is missing. For
example, if data set A has 3 rows with a missing ID and data set B has 1 row
with a missing ID, the merged data set contains 3 (3*1) rows with a missing ID.
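The behavior can be reproduced with two small hypothetical data sets, A_miss and B_miss, invented for this sketch. The first has 3 rows with a missing ID and the second has 1. The second query shows one possible way to keep missing IDs out of the result.

```sas
/* Hypothetical data: a period (.) is read as a missing numeric value */
data A_miss;
   input ID Name $;
   cards;
. P
. Q
. R
1 S
;
run;

data B_miss;
   input ID Weight;
   cards;
. 10
1 20
;
run;

proc sql;
   /* Missing IDs match each other: 3*1 rows with a missing ID, plus the row for ID 1 */
   select a.ID, a.Name, b.Weight
   from A_miss as a inner join B_miss as b
   on a.ID = b.ID;

   /* Excluding missing IDs in the join condition leaves only the row for ID 1 */
   select a.ID, a.Name, b.Weight
   from A_miss as a inner join B_miss as b
   on a.ID = b.ID and a.ID is not missing;
quit;
```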
3. Left Join (Return all rows from the left table, and the matched rows from the
right table)
It returns all rows from the left table, and the matched rows from the right
table.
PROC SQL;
Create table dummy as
Select * from A left join B
On A.ID = B.ID;
Quit;

Left Join : Data Step Code


Data dummy;
Merge A (IN = X) B (IN=Y);
by ID;
If X ;
run;
Missing values in the common variable
As with the inner join, a SQL left join returns the Cartesian product of the
rows whose common variable is missing in both data sets.

In the final merged file, number of columns would be (Common columns in both
the data sets + uncommon columns from data set A + uncommon columns from
data set B).
4. Right Join (Return all rows from the right table, and the matched rows from
the left table)
It returns all rows from the right table, and the matched rows from the left
table.

PROC SQL;
Create table dummy as
Select * from A right join B
On A.ID = B.ID;
Quit;

Note : With SELECT *, the merged table keeps the ID column of the left-hand
table (A), so ID is missing for rows of B that have no match in A.
To fill in these right-hand table ID values in a right join, you can use the
SQL COALESCE function. The COALESCE function returns the first non-missing
argument.
PROC SQL;
Create table dummy as
Select coalesce (A.ID,B.ID) as ID,*
from A right join B
on A.ID = B.ID;
Quit;

Right Join : Data Step Code


Data dummy;
Merge A (IN = X) B (IN=Y);
by ID;
If Y ;
run;
In the final merged file, number of columns would be (Common columns in both
the data sets + uncommon columns from data set A + uncommon columns from
data set B).
5. Full Join (Return all rows from the left table and from the right table)
It returns all rows from the left table and from the right table

Key takeaway : The FULL JOIN suffers the same difficulty as the RIGHT JOIN.
Namely, the common variable values are lost from the right-hand data set.
The COALESCE function can solve this difficulty.
PROC SQL;
Create table dummy as
Select coalesce (A.ID,B.ID) as ID,*
from A full join B
on A.ID = B.ID;
Quit;

Full Join : Data Step Code


Data dummy;
Merge A (IN = X) B (IN=Y);
by ID;
run;
In the final merged file, number of columns would be (Common columns in both
the data sets + uncommon columns from data set A + uncommon columns from
data set B).
How to refer to a permanent library in SQL joins
PROC SQL;
Create table dummy as
Select * from readin.A as file1 left join readin.B as file2
On file1.ID = file2.ID;
Quit;
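The libref readin must be assigned before the query runs. A minimal sketch follows; the folder path is a placeholder and not a real location, so replace it with the folder that holds the data sets A and B.

```sas
/* Assign the libref used in the join; the path below is a placeholder */
libname readin 'C:\path\to\your\data';

/* List the members of the library to confirm that A and B are available */
proc datasets library=readin;
run;
quit;
```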
