SQL Basics: Prepared by Destiny Corporation
PROGRAM EDITOR
Basic Examples
to delete a table, index or view
to change the values in a table
to add rows to a table

The SQL procedure uses the SELECT statement to perform a wide variety of queries. Within the SELECT statement are different clauses:
*Q02E01 The Select statement;
proc sql;
select *
from saved.computer;

The Proc SQL statement is loaded into memory and remains resident until another Data or Proc step is run or a QUIT; statement is executed. Therefore, subsequent queries or other SQL statements can be run without the need to re-submit the Proc SQL statement. Each SQL statement is processed individually. No RUN; is required.
PROGRAM EDITOR
proc sort data = saved.computer;
by disk;
proc print data = saved.computer;
run;

Advantages of SQL:
Proc Print style output is automatically produced. Note there is no observation number and there is a line below the variable names. This output can be suppressed with the NOPRINT option on the Proc SQL statement:
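A minimal sketch of the NOPRINT option described above. It assumes the SAVED libref is already assigned and that saved.computer is the table used throughout this paper:

```sas
/* Assumption: libref SAVED is assigned and contains the COMPUTER table */
proc sql noprint;
   /* The query is processed, but no listing is written to the output window */
   select *
   from saved.computer;
quit;
```

The option sits on the Proc SQL statement itself, so it applies to every query submitted until the QUIT; statement or the next step boundary.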
Notice that no physical output file has been produced from this query, simply a listing as output.
Usage
Invoke Proc SQL and then use a SELECT statement:
PROGRAM EDITOR
proc sql;
Selecting Rows
Selecting Rows is done with the WHERE clause:
PROGRAM EDITOR
proc sort data = saved.computer;
by disk;
run;
proc print data = saved.computer (where=(supplier='KETCHUP COMPUTERS'));
var type disk retail;
run;

Again, the SQL advantage is the lack of the sort step. Note that the list of column names must be delimited by commas:

select type, disk, retail

All lists of column names and table names require commas in SQL syntax.
PROGRAM EDITOR
*Q02E03 Subsetting rows with where;
proc sql;
select *
from saved.computer
where supplier = 'KETCHUP COMPUTERS'
order by disk;
quit;
Calculating Columns
Columns can be calculated by assigning an expression to an item name:
PROGRAM EDITOR
*Q02E05 Assigning an expression to column name;
proc sql;
select type, disk, retail, retail * 7/47 as vat
from saved.computer
where supplier = 'KETCHUP COMPUTERS'
order by disk;
quit;

Notice the structure:

expression AS variable

as opposed to the traditional SAS:

variable = expression;
Selecting columns
Using the column names from the SAS file allows choice of the columns in the report:
Notice that the traditional equivalent would now require an additional step:

PROGRAM EDITOR
data new;
set saved.computer;
vat = retail * 7/47;
proc sort data = new;
by disk;
proc print data = new (where=(supplier='KETCHUP COMPUTERS'));
var type disk retail vat;
run;
PROGRAM EDITOR
*Q02E04 Selecting columns to display in select statement;
proc sql;
select type, disk, retail
from saved.computer
where supplier = 'KETCHUP COMPUTERS'
order by disk;
daychg label='Daily Charge',
(endate - stdate +1) as duration label='Days Hired',
(endate - stdate +1)*daychg format=pound7.2,
round(((endate - stdate +1)*daychg*7/47),0.01) as vat format=pound7.2
from saved.carhire
order by 5 desc;

Notes:
Functions can be used to derive calculated columns, as shown by the SUBSTR function in Q02E08, and to change calculated values, as shown by the ROUND function.
Derived columns need not be given aliases using the AS syntax.
Aliases cannot be used in further calculations. For example, if the expression (endate - stdate +1)*daychg in the query above were replaced by duration*daychg, an error would result, duration not being found.
The ORDER BY clause can use the ordinal position of the column. In the example above the 5th column is used.
Ordering can be done on a calculated column with no alias.
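The alias restriction noted above has a workaround that the original example does not use: Proc SQL provides the CALCULATED keyword for referring to a column defined earlier in the same SELECT. The sketch below is illustrative only. The carhire table and its rows are hypothetical stand-ins for saved.carhire, and the alias charge is an assumed name:

```sas
/* Hypothetical rows standing in for saved.carhire; values are illustrative only */
data carhire;
   input custkey $ stdate :date9. endate :date9. daychg;
   format stdate endate date9.;
   datalines;
C001 01JAN1999 05JAN1999 30
C002 03JAN1999 04JAN1999 45
;
run;

proc sql;
   select custkey,
          (endate - stdate + 1) as duration label='Days Hired',
          /* CALCULATED reuses the alias defined earlier in this SELECT;   */
          /* writing duration*daychg without it fails, as the notes state */
          calculated duration * daychg as charge
   from carhire
   order by charge desc;
quit;
```

Repeating the full expression, as the original query does, remains valid; CALCULATED simply avoids writing it twice.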
PROGRAM EDITOR
*Q02E07 Labeling columns with a column modifier;
proc format;
picture pound low-high='000,000,009.99' (prefix='$');
proc sql;
title 'Ketchup Computers, VAT element of prices';
select type label='Computer Type',
retail label='Retail Price' format=pound9.2,
retail*7/47 as vat format=pound7.2
from saved.computer
where supplier='KETCHUP COMPUTERS'
order by disk;
Distinct Values
A useful keyword is DISTINCT, which allows selection of unique values of a column:

PROGRAM EDITOR
*Q02E09 Selection of unique key values with Distinct;
title 'Which disk types are sold by Ketchup?';
select distinct disk
from saved.computer
where supplier='KETCHUP COMPUTERS'
order by disk   /* <-- not required */
;
Using Functions
Ordinary 'Data Step' functions can be used in these expressions as noted in Q2.1:

PROGRAM EDITOR
*Q02E08 Using functions;
options nodate nonumber;
title 'Charges raised for car hire';
proc format;
picture pound low-high = '000,009.99' (prefix='$');
value $model 'F' = 'Fiesta'
             'E' = 'Escort'
             'O' = 'Orion'
             'S' = 'Sierra'
             'G' = 'Granada';
proc sql;
select substr(carkey,2,1) format=$model.,
custkey label='Customer Code',
OUTPUT
Which disk types are sold by Ketchup?

DISK
  20
  40
  60
 100
 120
 200
PROGRAM EDITOR
*Q02E11 Checking select statement syntax without executing;
validate
select distinct disk label='Hard Disk Size',
type label='Computer Type'
from saved.computer
where supplier='KETCHUP COMPUTERS'
order by disk;
PROGRAM EDITOR
title 'Which disk types are sold by Ketchup?';
proc sort data=saved.computer out=sorted;   /* <----- required */
by disk;
data unique(keep=disk);
set sorted(keep=disk supplier);
by disk;
if last.disk;
where supplier='KETCHUP COMPUTERS';
proc print data=unique;
run;
LOG
Proc SQL has valid syntax
This facility can only be used with a Query-expression i.e. to qualify the syntax of a SELECT.
Syntax Errors
This syntax error is caused by the ORDER BY option:
PROGRAM EDITOR
*Q02E10 Distinct can apply to unique combinations;
title 'All combinations of disk and type';
select distinct disk label='Hard Disk Size',
type label='Computer Type'
from saved.computer
where supplier='KETCHUP COMPUTERS'
order by disk;
PROGRAM EDITOR
*Q02E12 Where is the syntax error here?;
validate
select distinct disk label='Hard Disk Size',
type label='Computer Type'
from saved.computer
order by disk
where supplier='KETCHUP COMPUTERS';
The DISTINCT keyword applies to all the column names and each unique combination of values is returned. Data/Proc Step methods would use two BY variables. Note the effect on the inner variable (type) when the outer variable (disk) changes value:
LOG
validate
select distinct disk label='Hard Disk Size',
type label='Computer Type'
from saved.computer
order by disk
where supplier='KETCHUP COMPUTERS';
-------
22
202
ERROR 22-322: Expecting one of the following: (, **, *, /, +, -, !!, , <,<=,<>,=,>,>=,?,CONTAINS, eq, ge, gt, le, lt, ne, ^=, ~=, &, AND, !, or, , OR, , , The statement is being ignored.
ERROR 202-322: The option or parameter is not recognized.

Make sure the ORDER BY clause is the last clause on the SELECT statement.
PROGRAM EDITOR
title 'Which disk types are sold by Ketchup?';
proc sort data=saved.computer out=sorted;
by disk type;
data unique(keep=disk type);
set sorted(keep=disk type supplier);
by disk type;
if last.type;
where supplier='KETCHUP COMPUTERS';
proc print data=unique;
run;
Syntax Checking
Use the VALIDATE statement, before the SELECT statement, to check the SQL statements without executing them:
Analysis on Groups
PROGRAM EDITOR
*Q02E13 Analysis down a column for groups;
select mean(retail) as avprice
from saved.computer;
OUTPUT
 AVPRICE
1929.167

This is the equivalent of:
PROGRAM EDITOR
proc means data = saved.computer mean;
var retail;
run;

With more than one argument, the function performs for each row:
PROGRAM EDITOR
*Q02E14 More than one argument to analyze each row;
select retail format=pound10.2,
retail * 7/47 as vat format=pound8.2,
sum(retail,retail*7/47) as gross format=pound10.2
from saved.computer;

With a single argument, but with other selected columns, the function gives a result for all the rows, then merges the summary back with each row:
PROGRAM EDITOR
*Q02E16 The count function supplies the number of rows;
select count(*) as no_rows
from saved.computer;
select sum(retail)/count(*) as average
from saved.computer;
quit;
Prepared by: Destiny Corporation 100 Great Meadow Rd, Suite 601 Wethersfield, CT 06109-2379 Phone: (860) 721-1684 - 1-800-7TRAINING Fax: (860) 721-9784 Web: www.destinycorp.com Email: [email protected]
SAS and all other SAS Institute, Inc. product or service names are registered trademarks or trademarks of SAS Institute, Inc. in the USA and other countries.
PROGRAM EDITOR
*Q02E15 Merges summary value onto each row of output;
select cpu, disk,
(retail - wholesal) as profit label='Profit',
mean(retail - wholesal) as avprofit label='Average Profit',
(retail - wholesal) - mean(retail - wholesal) as diff label='Difference'
from saved.computer
where supplier contains 'FLOPPY';

To accomplish the same thing in Data/Proc step requires either the use of Proc Means/Summary to create a one-observation, one-variable data set, which is then read into the data step alongside saved.computer, or two passes of the data in the same data step:
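The first of these two approaches (Proc Means followed by a data step) might look like the sketch below. It is an assumption-laden illustration, not the paper's original listing: the computer table and its rows are hypothetical stand-ins for saved.computer, and the data set names floppy, avg and report are invented for the example.

```sas
/* Hypothetical rows standing in for saved.computer; values are illustrative only */
data computer;
   input supplier :$20. cpu $ disk retail wholesal;
   datalines;
FLOPPYCO 386 40 1500 1200
FLOPPYCO 486 100 2400 1900
KETCHUP 486 120 2600 2100
;
run;

/* Step 1: subset the rows and derive the profit column */
data floppy;
   set computer;
   where supplier contains 'FLOPPY';
   profit = retail - wholesal;
run;

/* Step 2: create a one-observation, one-variable data set holding the mean */
proc means data=floppy noprint;
   var profit;
   output out=avg(keep=avprofit) mean=avprofit;
run;

/* Step 3: read the summary once; its value is retained for every detail row */
data report;
   if _n_ = 1 then set avg;
   set floppy;
   diff = profit - avprofit;
run;

proc print data=report;
   var cpu disk profit avprofit diff;
run;
```

The three steps make explicit what the single SELECT statement does implicitly: summarise, then merge the summary value back onto each row.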